Safe-Linking - Blocking a 20 Year Old Exploit Primitive:
=========================================================

Background:
-----------
From the early days of binary exploitation, the heap's internal data structures
have been a prime target for attackers. By understanding the way the heap's
malloc() and free() work, attackers were able to leverage an initial
vulnerability in a heap buffer, such as a linear buffer overflow, into a
stronger exploit primitive such as an Arbitrary-Write.

One such detailed example is the Phrack article from 2001, "Vudo Malloc Tricks"
(http://phrack.org/issues/57/8.html). This article details the internals of
multiple heap implementations, and describes what is now called
"Unsafe-Unlinking." Attackers that modify the FD and BK pointers of
double-linked lists (such as the small bins, for example) can use the unlink()
operation to trigger an Arbitrary-Write, and thus achieve code execution over
the target program.

Indeed, Glibc version 2.3.6 from 2005 embedded a fix to this known exploit
primitive, called "Safe-Unlinking." This elegant fix verifies the integrity of
the double-linked node before unlinking it from the list, by checking that
p->FD->BK == p.

While this exploit primitive was blocked more than 14 years ago, at the time no
one came up with a similar solution to protect the pointers of the
single-linked lists. Leveraging this weak spot, attackers shifted their focus
to these unprotected single-linked lists, such as the fastbins, and now also
the tcache. In 2013, Chromium integrated a new functionality dubbed
"MaskPtr()", which went relatively unnoticed. In a later section we describe
our variant of this functionality, and the reasons behind it.

Introducing Safe-Linking:
-------------------------
In this white paper, we present a security mechanism to close this almost
20 year old security gap.

Safe-Linking is a security mechanism to protect single-linked lists (such as
the fastbins and the tcache) from tampering by attackers. The mechanism makes
use of randomness from the Address Space Layout Randomization (ASLR) that is
now heavily deployed in most modern operating systems to "sign" the list's
pointers. When combined with chunk alignment integrity checks, this new
technique protects the pointers from hijacking attempts.

Our solution protects against 3 common attacks regularly used in modern day
exploits:
* Partial pointer override: Modifying the lower bytes of the pointer
  (Little Endian).
* Full pointer override: Hijacking the pointer to a chosen location.
* Unaligned chunks: Pointing the list to an unaligned address.

Threat Modeling:
----------------
In our threat model, the attacker has the following capabilities:
* Controlled linear buffer overflow / underflow over a heap buffer.
* Relative Arbitrary-Write over a heap buffer.

It's important to note that our attacker doesn't know where the heap is
located, as the heap's base address is randomized together with the mmap_base
by the ASLR (more on this topic in the next section).

Our solution raises the bar and blocks our attacker's heap-based exploit
attempts. Once deployed, attackers must have an additional capability in the
form of a heap-leak / pointer-leak in order to successfully build an exploit.

An example scenario for our protection is a position-dependent binary (one that
loads without ASLR) that has a heap overflow when parsing incoming user input.
The toy program below sketches the exploit primitive we assume in this
scenario.
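The following self-contained toy program is our own simplified illustration for
this white paper - it is not glibc's allocator, and the "fixed address" in it is
hypothetical. It shows the assumed primitive: a linear overflow of one heap
buffer spills into the "next" pointer stored inside an adjacent free chunk,
redirecting a future allocation to an attacker-chosen address.

    #include <stdio.h>
    #include <string.h>

    /* A toy single-linked free list: a free chunk stores its "next" pointer
       in its own first bytes, just like fastbin and tcache chunks do.       */
    struct chunk {
        struct chunk *next;            /* valid only while the chunk is free */
        char user_data[24];
    };

    static unsigned char heap[2 * sizeof(struct chunk)]; /* two adjacent chunks */
    static struct chunk *free_list;

    static void toy_free(struct chunk *c)          /* push onto the free list */
    {
        c->next = free_list;
        free_list = c;
    }

    static struct chunk *toy_malloc(void)          /* pop from the free list  */
    {
        struct chunk *c = free_list;
        free_list = c->next;
        return c;
    }

    int main(void)
    {
        struct chunk *a = (struct chunk *)heap;
        struct chunk *b = (struct chunk *)(heap + sizeof(struct chunk));

        toy_free(b);   /* b is free; its "next" field is now live metadata    */

        /* The assumed primitive: "user input" that is 8 bytes too long is
           copied into a's buffer and spills into b's "next" field, planting a
           hypothetical fixed address from a position-dependent binary.       */
        struct chunk *fake_target = (struct chunk *)0x601050;  /* hypothetical */
        unsigned char input[sizeof(a->user_data) + sizeof(fake_target)];
        memset(input, 'A', sizeof(input));
        memcpy(input + sizeof(a->user_data), &fake_target, sizeof(fake_target));
        memcpy(a->user_data, input, sizeof(input));            /* the overflow */

        toy_malloc();  /* pops b; the poisoned "next" becomes the new head    */
        printf("the next allocation will be served at %p\n", (void *)free_list);
        return 0;
    }

With Safe-Linking, the stored "next" field would hold PROTECT(P) rather than P,
so the value planted by the overflow would no longer translate into the
attacker's chosen address.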
Until now, an attacker was able to exploit such targets without any heap leak,
and with only minimal control over the heap's allocations, by relying solely on
the fixed addresses of the binary itself. We block such exploit attempts by
leveraging the heap's ASLR: its randomness now stands between the attacker and
any attempt to redirect heap allocations to fixed addresses in such target
binaries.

Protection:
-----------
On Linux machines, the heap is randomized via mmap_base; the bits that get
randomized are described by the following mask:

    ((1 << rndbits) - 1) << PAGE_SHIFT

where "rndbits" defaults to 8 on 32 bit Linux machines, and to 28 on 64 bit
machines.

We denote by L the address at which the single-linked list pointer is stored,
and define the following calculation:

    Mask := (L >> PAGE_SHIFT)

According to the ASLR formula shown above, this shift positions the first
random bits of the memory address right at the LSBit of the mask itself.

This leads us to our protection scheme. Denoting the single-linked list pointer
by P, the scheme looks like this:
* PROTECT(P) := (L >> PAGE_SHIFT) XOR (P)
* *L = PROTECT(P)

Or in its code version:

    #define PROTECT_PTR(pos, ptr, type)  \
            ((type)((((size_t)pos) >> PAGE_SHIFT) ^ ((size_t)ptr)))
    #define REVEAL_PTR(pos, ptr, type)   \
            PROTECT_PTR(pos, ptr, type)

(Since XOR is its own inverse, revealing a pointer is the exact same operation
as protecting it.)

This way, the random bits from the address L are placed on top of the LSBs of
the stored protected pointer. This protection layer prevents an attacker
without knowledge of the random ASLR bits from modifying the pointer to a
controlled value.

Compared to the "Safe-Unlinking" mechanism, however, we are at a disadvantage:
while the attacker can't properly hijack the pointer, we also can't tell
whether a pointer modification occurred. This is where an additional check
comes in. All allocated chunks in the heap are aligned to a known value,
usually 8 bytes on 32 bit machines and 16 bytes on 64 bit machines. By checking
that each reveal()ed pointer is aligned accordingly, we add two important
layers:
* Attackers must correctly guess the alignment bits.
* Attackers can't point chunks to unaligned memory addresses.

On 64 bit machines, this statistical protection causes an attack attempt to
fail 15 out of 16 times: the revealed pointer is the attacker's value XORed
with random bits the attacker doesn't know, so its 4 alignment bits are all
zero only with probability 1/16. Even on its own, this alignment check prevents
known exploit primitives, such as the one described in:
https://quentinmeffre.fr/exploit/heap/2018/11/02/fastbin_attack.html

Example Implementation:
-----------------------
Here is a snippet from the proposed patched version of glibc 2.30:

    tcache_put (...)
    {
      ...
    - e->next = tcache->entries[tc_idx];
    + e->next = PROTECT_PTR(&e->next, tcache->entries[tc_idx], tcache_entry *);
      tcache->entries[tc_idx] = e;
      ++(tcache->counts[tc_idx]);
    }

    tcache_get (size_t tc_idx)
    {
      tcache_entry *e = tcache->entries[tc_idx];
    - tcache->entries[tc_idx] = e->next;
    + tcache->entries[tc_idx] = REVEAL_PTR(&e->next, e->next, tcache_entry *);
      --(tcache->counts[tc_idx]);
      e->key = NULL;
    + if (__glibc_unlikely(((size_t)e) & MALLOC_ALIGN_MASK))
    +   malloc_printerr ("malloc(): un-aligned tcache chunk detected");
      return (void *) e;
    }
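The same scheme can also be exercised outside of glibc. The following
stand-alone sketch is our own illustration: PAGE_SHIFT=12 and a 16-byte
MALLOC_ALIGN_MASK are assumed values for a typical 64 bit Linux configuration.
It protects a pointer, reveals it back, and then measures how often the
alignment check catches a value overwritten by an attacker who doesn't know the
masking bits.

    #include <stdio.h>
    #include <stdlib.h>

    #define PAGE_SHIFT 12                 /* assumed: 4 KB pages              */
    #define MALLOC_ALIGN_MASK 0xF         /* assumed: 16-byte chunk alignment */

    /* The proposed macros, as defined above. */
    #define PROTECT_PTR(pos, ptr, type) \
            ((type)((((size_t)pos) >> PAGE_SHIFT) ^ ((size_t)ptr)))
    #define REVEAL_PTR(pos, ptr, type) \
            PROTECT_PTR(pos, ptr, type)

    int main(void)
    {
        /* A stand-in for a tcache/fastbin "next" field living on the heap.   */
        void **slot = malloc(sizeof(void *));
        void *chunk = malloc(32);          /* a legitimate, aligned chunk      */

        /* put: store the protected pointer; get: reveal it back.             */
        *slot = PROTECT_PTR(slot, chunk, void *);
        void *out = REVEAL_PTR(slot, *slot, void *);
        printf("round trip intact: %s\n", out == chunk ? "yes" : "no");

        /* Model an attacker who overwrites *slot without knowing the mask by
           drawing the overwritten value at random; the revealed pointer then
           has random alignment bits and is flagged roughly 15 out of 16 times. */
        int caught = 0, trials = 100000;
        for (int i = 0; i < trials; i++) {
            *slot = (void *)(size_t)rand();
            if ((size_t)REVEAL_PTR(slot, *slot, void *) & MALLOC_ALIGN_MASK)
                caught++;
        }
        printf("detected %d out of %d forged pointers\n", caught, trials);

        free(chunk);
        free(slot);
        return 0;
    }

In the glibc patch above, the same condition triggers malloc_printerr() instead
of silently dereferencing a corrupted pointer.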
Benchmarking:
-------------
Benchmarking showed that the added code sums up to 2-3 asm instructions for
free() and 3-4 asm instructions for malloc(). On glibc, the change was
negligible and wasn't measurable even when summing up 1 billion (!)
malloc()/free() operations on a single vCPU in GCP. The tests were made on a
64 bit version of the library. The worst impact on TCMalloc's benchmarking
tests was 1.5%, while the average was a mere 0.02%.

These benchmarking results stem from the slim design of the proposed mechanism:
* The protection has no memory overhead.
* The protection needs no initialization.
* No special source of randomness is needed.
* The protection uses only L and P, which are both present at the moment the
  pointer needs to be protect()ed or reveal()ed.

It is important to note, and is detailed in glibc's documentation, that both
the fastbins and the tcache use single-linked lists to hold data. They only
support a put/get API, and no common functionality traverses the entire list
while keeping it intact. Such functionality does exist, but it is only used for
gathering malloc statistics (mallinfo), and so the added overhead of accessing
the single-linked pointer is negligible.

Revisiting our Threat Model:
----------------------------
The alignment check reduces the attack surface and mandates that a fastbin or a
tcache chunk point to an aligned memory address. This directly blocks known
exploit variants, as mentioned above.

Just like Safe-Unlinking (for double-linked lists), our protection relies on
the fact that the attacker doesn't know what legitimate heap pointers look
like. An attacker that can forge a memory struct and knows what a valid heap
pointer looks like can forge valid FD/BK pointers that won't trigger an
Arbitrary-Write primitive, but will still allow placing a chunk at an
attacker-controlled address.

An attacker without a pointer leak won't be able to fully control an overridden
pointer, due to the protection layer that relies on the randomness inherited
from the deployed ASLR. The proposed PAGE_SHIFT places the random bits right
over the LSBit of the stored pointer. Together with the alignment check, this
statistically blocks attackers from changing even the lowest bit / byte
(Little Endian) of the stored single-linked pointer.

Back to Chromium's MaskPtr():
-----------------------------
In 2013, Chromium added a similar functionality dubbed "MaskPtr()", used only
when the free list (FL) is a double-linked list. In their variant, the code is
as follows:

    inline void* MaskPtr(void* p) {
      // Maximize ASLR entropy and guarantee the result is an invalid address.
      const uintptr_t mask =
          ~(reinterpret_cast<uintptr_t>(TCMalloc_SystemAlloc) >> 13);
      return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(p) ^ mask);
    }

We can see that they use a random pointer from the code section, and not a heap
location as is used in our implementation. In addition, they shift the address
by the hard-coded value 13, and they also invert the bits of their mask. While
we could not find documentation for these design choices, the code itself tells
us that the bit inversion is used to guarantee that the result will be an
invalid address.

Compared to our design, Chromium's implementation implies an additional memory
reference (to the code function) and an additional asm instruction for the bit
flipping. Moreover, their pointer masking is used without the additional
alignment check, so the code can't catch a pointer modification in advance; the
process will simply crash once the corrupted pointer is used.
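To make this contrast concrete, here is a small stand-alone sketch. It is our
own illustration: the function names and the PAGE_SHIFT value are ours, and the
snippet is not taken from either codebase.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                        /* assumed: 4 KB pages       */

    /* Chromium-style mask: derived from a code address (any function pointer
       serves for this illustration), shifted by the hard-coded 13 and inverted
       so that the masked value is guaranteed to be an invalid address.        */
    static uintptr_t chromium_style_mask(void)
    {
        return ~((uintptr_t)&chromium_style_mask >> 13);
    }

    /* Safe-Linking-style mask: derived from L, the address where the pointer
       is stored, so it is already at hand and needs no extra memory access.   */
    static uintptr_t safe_linking_mask(void *L)
    {
        return (uintptr_t)L >> PAGE_SHIFT;
    }

    int main(void)
    {
        void **slot = malloc(sizeof(void *));    /* where the pointer is stored */
        void *p     = malloc(32);                /* the pointer being stored    */

        printf("Chromium-style stored value: %#zx\n",
               (size_t)((uintptr_t)p ^ chromium_style_mask()));
        printf("Safe-Linking   stored value: %#zx\n",
               (size_t)((uintptr_t)p ^ safe_linking_mask(slot)));

        free(p);
        free(slot);
        return 0;
    }

Only the Safe-Linking variant pairs the masking with an alignment check on the
revealed pointer, which is what allows tampering to be detected before the
corrupted pointer is ever used.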
Integration:
------------
We implemented and tested patches that integrate the proposed mitigation into
the latest versions of glibc (ptmalloc), uClibc (dlmalloc) and gperftools
(TCMalloc). In addition, we pointed Chromium's development team to our version
of Safe-Linking as it was submitted to gperftools, and we hope it will be
included in all of their free list (FL) implementations.

We believe that integrating Safe-Linking into these 3 dominant libraries will
lead to wider adoption by other libraries, both in the open source community
and in closed source software in the industry. The fact that a basic version of
Safe-Linking has been embedded in Chromium since 2013 attests to the maturity
of the solution.

Final Note:
-----------
Safe-Linking is not a magic bullet that will stop all exploit attempts against
modern day heap implementations. However, this is another step in the right
direction. By forcing an attacker to have a pointer-leak vulnerability before
they can even start their heap-based exploit, we gradually raise the bar.

From our past experience, this specific mitigation would have blocked several
major exploits that we implemented throughout the years, thus turning "broken"
software products into "unexploitable" ones (at least with the vulnerabilities
we had at the time of the respective research projects).

It is also important to note that our solution isn't limited to heap
implementations. It also enables endangered data structures, whose
single-linked list pointers are located near user buffers, to gain integrity
protection without any additional memory overhead and with a negligible
run-time impact. The solution is easily extendable to every single-linked list
in a system with ASLR, as the sketch below demonstrates.
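To illustrate that last point, here is a self-contained sketch of the same
scheme applied to a generic single-linked list. It is our own example, with an
assumed PAGE_SHIFT of 12 and an assumed 16-byte node alignment; none of this
code comes from the libraries discussed above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                  /* assumed: 4 KB pages             */
    #define ALIGN_MASK 0xF                 /* assumed: 16-byte node alignment */

    struct item {
        struct item *next;                 /* always stored in protected form */
        int value;
    };

    /* PROTECT()/REVEAL() need only L (where the pointer is stored) and P.    */
    static struct item *protect_ptr(struct item **L, struct item *P)
    {
        return (struct item *)(((uintptr_t)L >> PAGE_SHIFT) ^ (uintptr_t)P);
    }

    static struct item *reveal_ptr(struct item **L, struct item *P)
    {
        return protect_ptr(L, P);          /* XOR is its own inverse          */
    }

    static void push(struct item **head, struct item *n)
    {
        n->next = protect_ptr(&n->next, *head);
        *head = n;
    }

    static struct item *pop(struct item **head)
    {
        struct item *n = *head;
        if (n == NULL)
            return NULL;
        struct item *next = reveal_ptr(&n->next, n->next);
        if ((uintptr_t)next & ALIGN_MASK) {     /* tampering detected         */
            fprintf(stderr, "corrupted list pointer detected\n");
            abort();
        }
        *head = next;
        return n;
    }

    int main(void)
    {
        struct item *head = NULL;
        struct item *a = aligned_alloc(16, sizeof(struct item));
        struct item *b = aligned_alloc(16, sizeof(struct item));
        a->value = 1;
        b->value = 2;

        push(&head, a);
        push(&head, b);

        struct item *first  = pop(&head);
        struct item *second = pop(&head);
        printf("%d %d\n", first->value, second->value);  /* LIFO: prints "2 1" */

        free(a);
        free(b);
        return 0;
    }

The list keeps working exactly as before, yet any overwrite of a stored next
pointer by an attacker who doesn't know the random bits of its storage address
is either neutralized or detected by the alignment check.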