Safe-Linking - Blocking a 20 Year Old Exploit Primitive:
=========================================================

Background:
-----------
From the early days of binary exploitation, the heap's internal data structures
have been a prime target for attackers. By understanding the way the heap's
malloc() and free() work, attackers were able to leverage an initial
vulnerability in a heap buffer, such as a linear buffer overflow, into a
stronger exploit primitive such as an Arbitrary-Write.

One such detailed example is the Phrack article from 2001, "Vudo Malloc Tricks"
(http://phrack.org/issues/57/8.html). This article details the internals of
multiple heap implementations, and describes what is now called
"Unsafe-Unlinking." Attackers that modify the FD and BK pointers of
double-linked lists (such as the small bins, for example) can use the unlink()
operation to trigger an Arbitrary-Write, and thus achieve code execution over
the target program.

Indeed, Glibc version 2.3.6 from 2005 embedded a fix to this known exploit
primitive, called "Safe-Unlinking." This elegant fix verifies the integrity of
the double-linked node before unlinking it from the list, by checking that
p->FD->BK == p.

While this exploit primitive was blocked more than 14 years ago, at the time no
one came up with a similar solution to protect the pointers of the
single-linked lists. Leveraging this weak spot, attackers shifted their focus
to these unprotected single-linked lists, such as the fastbins, and now also
the tcache. In 2013, Chromium integrated a new functionality dubbed
"MaskPtr()", which went relatively unnoticed. In a later section we describe
our variant of this functionality, and the reasons behind it.

Introducing Safe-Linking:
-------------------------
In this white paper, we present a security mechanism to close this almost
20 year old security gap.

Safe-Linking is a security mechanism to protect single-linked lists (such as
the fastbins and the tcache) from tampering by attackers. The mechanism makes
use of randomness from the Address Space Layout Randomization (ASLR) that is
now heavily deployed in most modern operating systems to "sign" the list's
pointers. When combined with chunk alignment integrity checks, this new
technique protects the pointers from hijacking attempts.

Our solution protects against 3 common attacks regularly used in modern day
exploits:
* Partial pointer override: Modifying the lower bytes of the pointer
  (Little Endian).
* Full pointer override: Hijacking the pointer to a chosen location.
* Unaligned chunks: Pointing the list to an unaligned address.

Threat Modeling:
----------------
In our threat model, the attacker has the following capabilities:
* Controlled linear buffer overflow / underflow over a heap buffer.
* Relative Arbitrary-Write over a heap buffer.

It's important to note that our attacker doesn't know where the heap is
located, as the heap's base address is randomized together with the mmap_base
by the ASLR (more on this topic in the next section).

Our solution raises the bar and blocks our attacker's heap-based exploit
attempts. Once deployed, attackers must have an additional capability in the
form of a heap-leak / pointer-leak in order to successfully build an exploit.

An example scenario for our protection is a position-dependent binary (one that
loads without ASLR) that has a heap overflow when parsing incoming user input.
The toy program below sketches the exploit primitive we assume in this
scenario.
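The following self-contained toy program is our own simplified illustration for
this white paper - it is not glibc's allocator, and the "fixed address" in it is
hypothetical. It shows the assumed primitive: a linear overflow of one heap
buffer spills into the "next" pointer stored inside an adjacent free chunk,
redirecting a future allocation to an attacker-chosen address.

    #include <stdio.h>
    #include <string.h>

    /* A toy single-linked free list: a free chunk stores its "next" pointer
       in its own first bytes, just like fastbin and tcache chunks do.       */
    struct chunk {
        struct chunk *next;            /* valid only while the chunk is free */
        char user_data[24];
    };

    static unsigned char heap[2 * sizeof(struct chunk)]; /* two adjacent chunks */
    static struct chunk *free_list;

    static void toy_free(struct chunk *c)          /* push onto the free list */
    {
        c->next = free_list;
        free_list = c;
    }

    static struct chunk *toy_malloc(void)          /* pop from the free list  */
    {
        struct chunk *c = free_list;
        free_list = c->next;
        return c;
    }

    int main(void)
    {
        struct chunk *a = (struct chunk *)heap;
        struct chunk *b = (struct chunk *)(heap + sizeof(struct chunk));

        toy_free(b);   /* b is free; its "next" field is now live metadata    */

        /* The assumed primitive: "user input" that is 8 bytes too long is
           copied into a's buffer and spills into b's "next" field, planting a
           hypothetical fixed address from a position-dependent binary.       */
        struct chunk *fake_target = (struct chunk *)0x601050;  /* hypothetical */
        unsigned char input[sizeof(a->user_data) + sizeof(fake_target)];
        memset(input, 'A', sizeof(input));
        memcpy(input + sizeof(a->user_data), &fake_target, sizeof(fake_target));
        memcpy(a->user_data, input, sizeof(input));            /* the overflow */

        toy_malloc();  /* pops b; the poisoned "next" becomes the new head    */
        printf("the next allocation will be served at %p\n", (void *)free_list);
        return 0;
    }

With Safe-Linking, the stored "next" field would hold PROTECT(P) rather than P,
so the value planted by the overflow would no longer translate into the
attacker's chosen address.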
Until now, an attacker was able to exploit such targets without any heap leak,
and with only minimal control over the heap's allocations, by relying solely on
the fixed addresses of the binary itself. We block such exploit attempts by
leveraging the heap's ASLR: its randomness now stands between the attacker and
any attempt to redirect heap allocations to fixed addresses in such target
binaries.

Protection:
-----------
On Linux machines, the heap is randomized via mmap_base; the bits that get
randomized are described by the following mask:

    ((1 << rndbits) - 1) << PAGE_SHIFT

where "rndbits" defaults to 8 on 32 bit Linux machines, and to 28 on 64 bit
machines.

We denote by L the address at which the single-linked list pointer is stored,
and define the following calculation:

    Mask := (L >> PAGE_SHIFT)

According to the ASLR formula shown above, this shift positions the first
random bits of the memory address right at the LSBit of the mask itself.

This leads us to our protection scheme. Denoting the single-linked list pointer
by P, the scheme looks like this:
* PROTECT(P) := (L >> PAGE_SHIFT) XOR (P)
* *L = PROTECT(P)

Or in its code version:

    #define PROTECT_PTR(pos, ptr, type)  \
            ((type)((((size_t)pos) >> PAGE_SHIFT) ^ ((size_t)ptr)))
    #define REVEAL_PTR(pos, ptr, type)   \
            PROTECT_PTR(pos, ptr, type)

(Since XOR is its own inverse, revealing a pointer is the exact same operation
as protecting it.)

This way, the random bits from the address L are placed on top of the LSBs of
the stored protected pointer. This protection layer prevents an attacker
without knowledge of the random ASLR bits from modifying the pointer to a
controlled value.

Compared to the "Safe-Unlinking" mechanism, however, we are at a disadvantage:
while the attacker can't properly hijack the pointer, we also can't tell
whether a pointer modification occurred. This is where an additional check
comes in. All allocated chunks in the heap are aligned to a known value,
usually 8 bytes on 32 bit machines and 16 bytes on 64 bit machines. By checking
that each reveal()ed pointer is aligned accordingly, we add two important
layers:
* Attackers must correctly guess the alignment bits.
* Attackers can't point chunks to unaligned memory addresses.

On 64 bit machines, this statistical protection causes an attack attempt to
fail 15 out of 16 times: the revealed pointer is the attacker's value XORed
with random bits the attacker doesn't know, so its 4 alignment bits are all
zero only with probability 1/16. Even on its own, this alignment check prevents
known exploit primitives, such as the one described in:
https://quentinmeffre.fr/exploit/heap/2018/11/02/fastbin_attack.html

Example Implementation:
-----------------------
Here is a snippet from the proposed patched version of glibc 2.30:

    tcache_put (...)
    {
      ...
    - e->next = tcache->entries[tc_idx];
    + e->next = PROTECT_PTR(&e->next, tcache->entries[tc_idx], tcache_entry *);
      tcache->entries[tc_idx] = e;
      ++(tcache->counts[tc_idx]);
    }

    tcache_get (size_t tc_idx)
    {
      tcache_entry *e = tcache->entries[tc_idx];
    - tcache->entries[tc_idx] = e->next;
    + tcache->entries[tc_idx] = REVEAL_PTR(&e->next, e->next, tcache_entry *);
      --(tcache->counts[tc_idx]);
      e->key = NULL;
    + if (__glibc_unlikely(((size_t)e) & MALLOC_ALIGN_MASK))
    +   malloc_printerr ("malloc(): un-aligned tcache chunk detected");
      return (void *) e;
    }
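The same scheme can also be exercised outside of glibc. The following
stand-alone sketch is our own illustration: PAGE_SHIFT=12 and a 16-byte
MALLOC_ALIGN_MASK are assumed values for a typical 64 bit Linux configuration.
It protects a pointer, reveals it back, and then measures how often the
alignment check catches a value overwritten by an attacker who doesn't know the
masking bits.

    #include <stdio.h>
    #include <stdlib.h>

    #define PAGE_SHIFT 12                 /* assumed: 4 KB pages              */
    #define MALLOC_ALIGN_MASK 0xF         /* assumed: 16-byte chunk alignment */

    /* The proposed macros, as defined above. */
    #define PROTECT_PTR(pos, ptr, type) \
            ((type)((((size_t)pos) >> PAGE_SHIFT) ^ ((size_t)ptr)))
    #define REVEAL_PTR(pos, ptr, type) \
            PROTECT_PTR(pos, ptr, type)

    int main(void)
    {
        /* A stand-in for a tcache/fastbin "next" field living on the heap.   */
        void **slot = malloc(sizeof(void *));
        void *chunk = malloc(32);          /* a legitimate, aligned chunk      */

        /* put: store the protected pointer; get: reveal it back.             */
        *slot = PROTECT_PTR(slot, chunk, void *);
        void *out = REVEAL_PTR(slot, *slot, void *);
        printf("round trip intact: %s\n", out == chunk ? "yes" : "no");

        /* Model an attacker who overwrites *slot without knowing the mask by
           drawing the overwritten value at random; the revealed pointer then
           has random alignment bits and is flagged roughly 15 out of 16 times. */
        int caught = 0, trials = 100000;
        for (int i = 0; i < trials; i++) {
            *slot = (void *)(size_t)rand();
            if ((size_t)REVEAL_PTR(slot, *slot, void *) & MALLOC_ALIGN_MASK)
                caught++;
        }
        printf("detected %d out of %d forged pointers\n", caught, trials);

        free(chunk);
        free(slot);
        return 0;
    }

In the glibc patch above, the same condition triggers malloc_printerr() instead
of silently dereferencing a corrupted pointer.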
Benchmarking:
-------------
Benchmarking showed that the added code sums up to 2-3 asm instructions for
free() and 3-4 asm instructions for malloc(). On glibc, the change was
negligible and wasn't measurable even when summing up 1 billion (!)
malloc()/free() operations on a single vCPU in GCP. The tests were made on a
64 bit version of the library. The worst impact on TCMalloc's benchmarking
tests was 1.5%, while the average was a mere 0.02%.

These benchmarking results stem from the slim design of the proposed mechanism:
* The protection has no memory overhead.
* The protection needs no initialization.
* No special source of randomness is needed.
* The protection uses only L and P, which are both present at the moment the
  pointer needs to be protect()ed or reveal()ed.

It is important to note, and is detailed in glibc's documentation, that both
the fastbins and the tcache use single-linked lists to hold data. They only
support a put/get API, and no common functionality traverses the entire list
while keeping it intact. Such functionality does exist, but it is only used for
gathering malloc statistics (mallinfo), and so the added overhead of accessing
the single-linked pointer is negligible.

Revisiting our Threat Model:
----------------------------
The alignment check reduces the attack surface and mandates that a fastbin or a
tcache chunk point to an aligned memory address. This directly blocks known
exploit variants, as mentioned above.

Just like Safe-Unlinking (for double-linked lists), our protection relies on
the fact that the attacker doesn't know what legitimate heap pointers look
like. An attacker that can forge a memory struct and knows what a valid heap
pointer looks like can forge valid FD/BK pointers that won't trigger an
Arbitrary-Write primitive, but will still allow placing a chunk at an
attacker-controlled address.

An attacker without a pointer leak won't be able to fully control an overridden
pointer, due to the protection layer that relies on the randomness inherited
from the deployed ASLR. The proposed PAGE_SHIFT places the random bits right
over the LSBit of the stored pointer. Together with the alignment check, this
statistically blocks attackers from changing even the lowest bit / byte
(Little Endian) of the stored single-linked pointer.

Back to Chromium's MaskPtr():
-----------------------------
In 2013, Chromium added a similar functionality dubbed "MaskPtr()", used only
when the free list (FL) is a double-linked list. In their variant, the code is
as follows:

    inline void* MaskPtr(void* p) {
      // Maximize ASLR entropy and guarantee the result is an invalid address.
      const uintptr_t mask =
          ~(reinterpret_cast<uintptr_t>(TCMalloc_SystemAlloc) >> 13);
      return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(p) ^ mask);
    }

We can see that they use a random pointer from the code section, and not a heap
location as is used in our implementation. In addition, they shift the address
by the hard-coded value 13, and they also invert the bits of their mask. While
we could not find documentation for these design choices, the code itself tells
us that the bit inversion is used to guarantee that the result will be an
invalid address.

Compared to our design, Chromium's implementation implies an additional memory
reference (to the code function) and an additional asm instruction for the bit
flipping. Moreover, their pointer masking is used without the additional
alignment check, so the code can't catch a pointer modification in advance; the
process will simply crash once the corrupted pointer is used.
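To make this contrast concrete, here is a small stand-alone sketch. It is our
own illustration: the function names and the PAGE_SHIFT value are ours, and the
snippet is not taken from either codebase.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                        /* assumed: 4 KB pages       */

    /* Chromium-style mask: derived from a code address (any function pointer
       serves for this illustration), shifted by the hard-coded 13 and inverted
       so that the masked value is guaranteed to be an invalid address.        */
    static uintptr_t chromium_style_mask(void)
    {
        return ~((uintptr_t)&chromium_style_mask >> 13);
    }

    /* Safe-Linking-style mask: derived from L, the address where the pointer
       is stored, so it is already at hand and needs no extra memory access.   */
    static uintptr_t safe_linking_mask(void *L)
    {
        return (uintptr_t)L >> PAGE_SHIFT;
    }

    int main(void)
    {
        void **slot = malloc(sizeof(void *));    /* where the pointer is stored */
        void *p     = malloc(32);                /* the pointer being stored    */

        printf("Chromium-style stored value: %#zx\n",
               (size_t)((uintptr_t)p ^ chromium_style_mask()));
        printf("Safe-Linking   stored value: %#zx\n",
               (size_t)((uintptr_t)p ^ safe_linking_mask(slot)));

        free(p);
        free(slot);
        return 0;
    }

Only the Safe-Linking variant pairs the masking with an alignment check on the
revealed pointer, which is what allows tampering to be detected before the
corrupted pointer is ever used.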
Integration:
------------
We implemented and tested patches that integrate the proposed mitigation into
the latest versions of glibc (ptmalloc), uClibc (dlmalloc) and gperftools
(TCMalloc). In addition, we pointed Chromium's development team to our version
of Safe-Linking as it was submitted to gperftools, and we hope it will be
included in all of their free list (FL) implementations.

We believe that integrating Safe-Linking into these 3 dominant libraries will
lead to wider adoption by other libraries, both in the open source community
and in closed source software in the industry. The fact that a basic version of
Safe-Linking has been embedded in Chromium since 2013 attests to the maturity
of the solution.

Final Note:
-----------
Safe-Linking is not a magic bullet that will stop all exploit attempts against
modern day heap implementations. However, this is another step in the right
direction. By forcing an attacker to have a pointer-leak vulnerability before
they can even start their heap-based exploit, we gradually raise the bar.

From our past experience, this specific mitigation would have blocked several
major exploits that we implemented throughout the years, thus turning "broken"
software products into "unexploitable" ones (at least with the vulnerabilities
we had at the time of the respective research projects).

It is also important to note that our solution isn't limited to heap
implementations. It also enables endangered data structures, whose
single-linked list pointers are located near user buffers, to gain integrity
protection without any additional memory overhead and with a negligible
run-time impact. The solution is easily extendable to every single-linked list
in a system with ASLR, as the sketch below demonstrates.
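To illustrate that last point, here is a self-contained sketch of the same
scheme applied to a generic single-linked list. It is our own example, with an
assumed PAGE_SHIFT of 12 and an assumed 16-byte node alignment; none of this
code comes from the libraries discussed above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                  /* assumed: 4 KB pages             */
    #define ALIGN_MASK 0xF                 /* assumed: 16-byte node alignment */

    struct item {
        struct item *next;                 /* always stored in protected form */
        int value;
    };

    /* PROTECT()/REVEAL() need only L (where the pointer is stored) and P.    */
    static struct item *protect_ptr(struct item **L, struct item *P)
    {
        return (struct item *)(((uintptr_t)L >> PAGE_SHIFT) ^ (uintptr_t)P);
    }

    static struct item *reveal_ptr(struct item **L, struct item *P)
    {
        return protect_ptr(L, P);          /* XOR is its own inverse          */
    }

    static void push(struct item **head, struct item *n)
    {
        n->next = protect_ptr(&n->next, *head);
        *head = n;
    }

    static struct item *pop(struct item **head)
    {
        struct item *n = *head;
        if (n == NULL)
            return NULL;
        struct item *next = reveal_ptr(&n->next, n->next);
        if ((uintptr_t)next & ALIGN_MASK) {     /* tampering detected         */
            fprintf(stderr, "corrupted list pointer detected\n");
            abort();
        }
        *head = next;
        return n;
    }

    int main(void)
    {
        struct item *head = NULL;
        struct item *a = aligned_alloc(16, sizeof(struct item));
        struct item *b = aligned_alloc(16, sizeof(struct item));
        a->value = 1;
        b->value = 2;

        push(&head, a);
        push(&head, b);

        struct item *first  = pop(&head);
        struct item *second = pop(&head);
        printf("%d %d\n", first->value, second->value);  /* LIFO: prints "2 1" */

        free(a);
        free(b);
        return 0;
    }

The list keeps working exactly as before, yet any overwrite of a stored next
pointer by an attacker who doesn't know the random bits of its storage address
is either neutralized or detected by the alignment check.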