PACMAN: Breaking ARM Pointer Authentication with Speculative Execution
ARM Pointer Authentication was designed around a simple and elegant principle: if an attacker corrupts a protected pointer, the program crashes. No crash suppression, no oracle, no way to distinguish a correct guess from a wrong one. The PAC space is small enough to brute-force in theory, but every wrong guess terminates the process and rotates the keys. Security by crash.
PACMAN breaks that design at the hardware level. The key insight is that speculative execution already suppresses exceptions by design. The CPU speculatively executes past a potential fault, leaves micro-architectural side effects behind, then squashes the faulting instruction without raising any architecturally visible exception. If you can route a PAC verification through a speculative path and transmit the result through a side channel, you get an oracle: a primitive that tells you whether a guessed PAC is correct or not, without ever crashing. Then it’s just a matter of iterating over all $2^n$ possible PAC values.
The paper demonstrates this against the Apple M1, the first desktop processor supporting ARMv8.3. A full cross-privilege attack, userspace against kernel, is shown working. 55,159 PACMAN gadgets were found in the XNU kernel alone.
ARM Pointer Authentication: The Mechanism Being Broken
ARM PA was introduced in ARMv8.3 (2017) and has shipped in Apple silicon since the A12 (2018). The M1, M1 Pro, M1 Max, and every subsequent Apple chip support it. Qualcomm, Samsung, and most ARMv8.3+ vendors either ship it or have announced support.
The mechanism works by embedding a cryptographic hash of a pointer into the pointer’s unused upper bits. On macOS 12.2.1 on M1, virtual addresses are 48-bit, leaving 16 bits for the PAC. The PAC is computed from:
\[\text{PAC} = \text{QARMA}(\text{ptr}, \text{key}, \text{salt})\]

where QARMA is a lightweight tweakable block cipher, key is one of five hardware keys stored in privileged registers inaccessible to userspace (IA, IB, DA, DB, GA), and salt is a program-specified context value (typically the stack pointer or object address).
The signing and verification instructions are:
| Instruction | Operation | Key |
|---|---|---|
| pacia ptr, salt | Sign instruction pointer with IA key | IA |
| pacib ptr, salt | Sign instruction pointer with IB key | IB |
| pacda ptr, salt | Sign data pointer with DA key | DA |
| pacdb ptr, salt | Sign data pointer with DB key | DB |
| autia ptr, salt | Verify and strip PAC (IA key) | IA |
| autib ptr, salt | Verify and strip PAC (IB key) | IB |
| autda ptr, salt | Verify and strip PAC (DA key) | DA |
| autdb ptr, salt | Verify and strip PAC (DB key) | DB |
Verification failure poisons the pointer by setting specific bits in the PAC region. Any subsequent dereference triggers a translation fault. No crash suppression, no retry, no oracle. That’s the design guarantee.
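The sign/verify/poison behaviour described above can be modelled in a few lines. This is a minimal sketch under loud assumptions: a truncated HMAC-SHA256 stands in for QARMA, the PAC is treated as the full upper 16 bits of a userspace-style pointer, and `POISON_BIT` is a made-up bit position chosen only for illustration. The function names are hypothetical and do not correspond to any real API.

```python
import hashlib
import hmac

VA_BITS = 48                       # virtual address width stated in the text
PAC_BITS = 64 - VA_BITS            # 16 upper bits left over for the PAC
VA_MASK = (1 << VA_BITS) - 1
POISON_BIT = 1 << 62               # hypothetical poison position (illustration only)


def compute_pac(ptr: int, key: bytes, salt: int) -> int:
    """Stand-in for QARMA: truncated HMAC-SHA256 over (address, salt)."""
    msg = (ptr & VA_MASK).to_bytes(8, "little") + salt.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "little") & ((1 << PAC_BITS) - 1)


def sign(ptr: int, key: bytes, salt: int) -> int:
    """Model of pac*: embed the PAC in the unused upper bits."""
    return (compute_pac(ptr, key, salt) << VA_BITS) | (ptr & VA_MASK)


def authenticate(signed_ptr: int, key: bytes, salt: int) -> int:
    """Model of aut*: strip the PAC if it matches, otherwise poison the pointer."""
    addr = signed_ptr & VA_MASK
    if (signed_ptr >> VA_BITS) == compute_pac(addr, key, salt):
        return addr                # valid: plain address, safe to dereference
    return addr | POISON_BIT       # invalid: non-canonical, faults when used
```

For example, `authenticate(sign(p, key, salt), key, salt)` returns `p`, while flipping any PAC bit before authenticating yields a pointer with `POISON_BIT` set. In the real design that poisoned pointer faults on dereference, which is exactly the event PACMAN keeps inside a speculative window.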
Stack protection looks like this in practice:
; filename: pa-stack-protection.asm
; Function prologue — sign the return address
pacia lr, sp ; sign lr using stack pointer as salt
sub sp, sp, #0x40
str lr, [sp, #0x30] ; push signed return address
; Function epilogue — verify before use
ldr lr, [sp, #0x30] ; pop signed return address
add sp, sp, #0x40
autia lr, sp ; verify: strip PAC or poison if tampered
ret ; fault here if lr was corrupted
macOS uses PA for return addresses, C++ vtable pointers, vtable entries, and Objective-C method caches.
The PACMAN Gadget
A PACMAN gadget is a code pattern that, when executed speculatively, leaks whether a PAC verification succeeded or failed through a micro-architectural side channel. Two variants exist based on how the result is transmitted.
Data PACMAN gadget — transmits via a speculative memory load:
flowchart LR
t1["t1 · BR1 mis-speculates\ncond=false, trained=true"]
t2["t2 · AUT(guessed_ptr)\nverify PAC under speculation"]
ok["valid_ptr"]
bad["poisoned_ptr"]
t3a["t3 · Load(valid_ptr)\nTLB side effect observable\nPAC is CORRECT"]
t3b["t3 · speculative exception\nnot issued to memory\nPAC is INCORRECT"]
t4["t4 · BR1 squashed\nexception suppressed, no crash"]
t1 --> t2
t2 -->|correct PAC| ok --> t3a --> t4
t2 -->|incorrect PAC| bad --> t3b --> t4
Instruction PACMAN gadget — transmits via a speculative instruction fetch (requires eager nested-branch squash):
flowchart LR
i1["t1 · BR1 mis-speculates"]
i2["t2 · AUT(guessed_ptr)\nBTB predicts BR2 target in parallel"]
i3["t3 · AUT completes\nBR2 misprediction → eager squash"]
iok["valid_ptr"]
ibad["poisoned_ptr"]
i4a["t4 · fetch from valid_ptr\niTLB side effect observable\nPAC is CORRECT"]
i4b["t4 · fetch not issued\nspeculative exception\nPAC is INCORRECT"]
i5["t5 · BR1 squashed\nexception suppressed, no crash"]
i1 --> i2 --> i3
i3 -->|correct PAC| iok --> i4a --> i5
i3 -->|incorrect PAC| ibad --> i4b --> i5
The data gadget in pseudocode:
// filename: data-pacman-gadget.c
if (cond) { // BR1: trained to be taken, then mis-speculated
verified_ptr = AUT(guessed_ptr); // verify under speculation
Load(verified_ptr); // transmit via TLB side effect
}
The instruction gadget needs one additional hardware property: eager squashing of nested branches. When the processor speculatively executes past BR1 and encounters BR2 (the indirect branch using verified_ptr), it uses the BTB to predict BR2’s target before verified_ptr is resolved. Once AUT completes and verified_ptr is available, the processor detects BR2 misprediction and eagerly squashes BR2 while fetching from the actual target. That fetch is the side channel. The Apple M1’s aggressive out-of-order speculation makes it particularly susceptible here.
Gadget prevalence in XNU
The paper scanned the XNU kernel shipped with macOS 12.2.1 (xnu-8019.80.24) using a Ghidra script: find conditional branches, inspect 32 instructions in both directions, and match the destination register of each aut* instruction to the source register of a subsequent load or branch.
| Gadget type | Count |
|---|---|
| Data PACMAN gadgets | 13,867 |
| Instruction PACMAN gadgets | 41,292 |
| Total | 55,159 |
| Average branch-to-transmit distance | 8.1 instructions |
The scan was conservative: only register data-dependence tracked, only 32 instructions inspected per branch. The real count is higher.
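The scanning rule can be illustrated with a toy version that works on a hand-written instruction list instead of a Ghidra listing. This is a sketch, not the paper's script: the `Insn` tuple, the mnemonic lists, and `find_gadgets` are all hypothetical, and only the straight-line window after each branch is inspected rather than both directions.

```python
from typing import List, NamedTuple, Optional, Tuple

WINDOW = 32  # instructions inspected per conditional branch (from the text)


class Insn(NamedTuple):
    mnemonic: str
    dst: Optional[str]          # destination register, if any
    srcs: Tuple[str, ...]       # source registers


COND_BRANCHES = ("b.", "cbz", "cbnz", "tbz", "tbnz")
LOADS = ("ldr", "ldp", "ldur")
INDIRECT_BRANCHES = ("br", "blr")


def find_gadgets(code: List[Insn]) -> List[Tuple[int, int, str]]:
    """Return (branch index, transmit index, kind) for each candidate gadget."""
    hits = []
    for i, insn in enumerate(code):
        if not insn.mnemonic.startswith(COND_BRANCHES):
            continue
        authed = set()          # registers currently holding an aut* result
        for j in range(i + 1, min(i + 1 + WINDOW, len(code))):
            cur = code[j]
            uses_authed = any(reg in authed for reg in cur.srcs)
            if uses_authed and cur.mnemonic.startswith(LOADS):
                hits.append((i, j, "data"))
            elif uses_authed and cur.mnemonic in INDIRECT_BRANCHES:
                hits.append((i, j, "instruction"))
            if cur.mnemonic.startswith("aut"):
                authed.add(cur.dst)
            elif cur.dst in authed:
                authed.discard(cur.dst)   # overwritten by an unrelated value
    return hits
```

Like the scan described above, this tracks only direct register dependence: an authenticated value that is copied or offset before use is missed, which is one reason such a count is a lower bound.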
Reverse Engineering the Apple M1
The M1 is a 4 p-core + 4 e-core big.LITTLE design on AArch64 with full ARMv8.3 support. Apple does not publish micro-architectural details. No public documentation on the M1’s TLB organization existed at the time of the paper. The attack requires knowing the TLB parameters precisely to build the Prime+Probe eviction sets.
Cache hierarchy (from system registers via kext)
| Structure | Ways | Sets | Line Size | Total Size |
|---|---|---|---|---|
| p-core L1I | 6 | 512 | 64 B | 192 KB |
| p-core L1D | 8 | 256 | 64 B | 128 KB |
| p-core L2 | 12 | 8192 | 128 B | 12 MB |
| e-core L1I | 8 | 256 | 64 B | 128 KB |
| e-core L1D | 8 | 128 | 64 B | 64 KB |
| e-core L2 | 16 | 2048 | 128 B | 4 MB |
| System L3 (shared across all SoC components) | - | - | - | 16 MB |
macOS 12.2.1 uses 48-bit virtual addresses with 16KB pages and 16-bit PACs.
TLB hierarchy (reverse engineered)
The TLB experiments used the eviction address formula:
\[\text{Addrs}[i] = x + i \times \text{stride} + i \times 128 \text{ B}, \quad 1 \leq i \leq N\]

where the $i \times 128\text{ B}$ term maps each address to a distinct cache set to isolate TLB conflicts from cache conflicts. The paper varied stride (in multiples of 16 KB) and N from 1 to 30 and observed three distinct latency jumps:
| Access latency range | Explanation | Eviction trigger |
|---|---|---|
| ~60 cycles | L1 dTLB hit | baseline |
| ~80 cycles | L2 cache hit + L1 dTLB hit | L1D cache miss |
| ~95 cycles | L1 dTLB miss | stride ≥ 256×16KB, N ≥ 12 |
| ~115 cycles | L2 TLB miss | stride ≥ 2048×16KB, N ≥ 23 |
| ~130 cycles | L2 TLB miss + L1D miss | both |
Derived TLB parameters:
| TLB | Ways | Sets | Stride to evict | Addresses needed |
|---|---|---|---|---|
| L1 dTLB | 12 | 256 | 256 × 16 KB = 4 MB | ≥ 12 |
| L2 TLB | 23 | 2048 | 2048 × 16 KB = 32 MB | ≥ 23 |
| L1 iTLB | 4 | 32 | 32 × 16 KB = 512 KB | ≥ 4 |
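The eviction address formula and the derived parameters fit together as follows. This is a sketch under one explicit assumption: that a TLB set is selected by the virtual page number modulo the number of sets, which is what the stride column implies but is not stated as a documented fact. The helper names are hypothetical.

```python
PAGE = 16 * 1024        # 16 KB pages
CACHE_STEP = 128        # per-address offset that spreads accesses over cache sets

# (ways, sets) copied from the derived-parameters table
TLBS = {"L1 dTLB": (12, 256), "L2 TLB": (23, 2048), "L1 iTLB": (4, 32)}


def eviction_addresses(base: int, stride: int, n: int) -> list:
    """Addrs[i] = base + i*stride + i*128 B for 1 <= i <= n."""
    return [base + i * stride + i * CACHE_STEP for i in range(1, n + 1)]


def tlb_set(addr: int, sets: int) -> int:
    """Assumed index function: virtual page number modulo the number of sets."""
    return (addr // PAGE) % sets


def eviction_set_for(name: str, base: int) -> list:
    """One address per way, spaced so that all of them share a TLB set."""
    ways, sets = TLBS[name]
    return eviction_addresses(base, sets * PAGE, ways)
```

With a page-aligned `base`, `eviction_set_for("L1 dTLB", base)` yields 12 addresses spaced 4 MB + 128 B apart. Under the assumed index function they all fall in one L1 dTLB set, while the 128 B step keeps them in different L1D cache sets.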
The L1 iTLB discovery revealed something unexpected: the L1 dTLB acts as a non-inclusive backing store for the L1 iTLB. When a page table entry is evicted from the L1 iTLB, it migrates to the L1 dTLB. This is critical for the cross-privilege attack because the L1 dTLB and L2 TLB are shared between userspace and kernelspace, while the L1 iTLB is private per privilege level.
flowchart TD
US_iTLB["Userspace L1 iTLB\n4-way · 32 sets (private)"]
KS_iTLB["Kernelspace L1 iTLB\n4-way · 32 sets (private)"]
L1D["L1 dTLB\n12-way · 256 sets\nSHARED (EL0 + EL1)"]
L2["L2 TLB\n23-way · 2048 sets\nSHARED (EL0 + EL1)"]
MEM["Page Table Walker"]
US_iTLB -->|"evict → insert"| L1D
KS_iTLB -->|"evict → insert"| L1D
L1D -->|miss| L2
L2 -->|miss| MEM
The shared L1 dTLB is the cross-privilege side channel. A userspace attacker can Prime+Probe the L1 dTLB to observe kernel TLB activity.
The timer problem
The attack requires a high-resolution timer. On M1:
| Timer | Register | EL0 accessible? | Resolution |
|---|---|---|---|
| System counter | CNTPCT_EL0 | Yes | 24 MHz (too slow) |
| ARM cycle counter | PMCCNTR_EL0 | Not on M1 | does not exist on M1 |
| Apple PMC0 | S3_2_c15_c0_0 | No (kernel only) | Cycle-accurate |
| Multi-thread counter | shared memory | Yes | Sufficient |
The custom multi-thread timer:
// filename: multithread-timer.c
// Thread 1: dedicated timer — no ISB to maximize resolution
volatile uint64_t counter;
void timerthread() {
while (1) { counter++; }
}
// Thread 2: measuring thread — ISB enforces ordering
isb
ldr time1, [counter_addr] // read counter before
isb
// ... operations to time ...
isb
ldr time2, [counter_addr] // read counter after
isb
sub latency, time2, time1
The timer thread deliberately omits isb; the serialization penalty would slow counter increments and reduce resolution. The variance tradeoff is worth it. The resulting threshold for L1 dTLB hit vs. miss:
| Measurement | Threshold | Notes |
|---|---|---|
| L1 dTLB hit | ≤ 27 counts | never exceeded in experiments |
| L1 dTLB miss | ≥ 32 counts | never below in experiments |
| Decision threshold | 30 counts | used consistently in PoC attacks |
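Turning raw counter deltas into a hit/miss decision is then a single comparison. The sketch below assumes that a delta exactly at the threshold counts as a miss; the table leaves that open, and it should rarely matter given that no hit exceeded 27 and no miss fell below 32. The function names are hypothetical.

```python
THRESHOLD = 30  # counter ticks, the decision threshold from the table


def is_dtlb_miss(latency_counts: int) -> bool:
    """Classify one timed access (assumption: the boundary value is a miss)."""
    return latency_counts >= THRESHOLD


def count_misses(probe_latencies) -> int:
    """Number of probe accesses classified as L1 dTLB misses."""
    return sum(is_dtlb_miss(t) for t in probe_latencies)
```

For a probe that returns deltas of `[25, 27, 32, 40]`, `count_misses` reports two misses.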
The paper also built PacmanOS: a bare-metal Rust hypervisor that boots directly on M1 hardware, providing noise-free reverse engineering experiments with full MSR access. It was used for TLB characterization only; the actual attack uses the multi-thread timer.
PAC Oracle Results
The PAC oracle is built on top of the TLB Prime+Probe channel. The full protocol for the data PACMAN gadget cross-privilege attack:
1. Train the branch predictor in the kernel PACMAN gadget to be taken (64 iterations)
2. Reset TLB hierarchy by accessing 23 addresses mapping to the same L2 TLB set (not part of the eviction set)
3. Prime the L1 dTLB set with 12 addresses at 256×16KB stride
4. Trigger the PACMAN gadget via syscall with the pointer embedding the guessed PAC
5. Probe the L1 dTLB: count misses
For the instruction PACMAN gadget, step 5 adds: evict the target entry from the kernelspace L1 iTLB to the shared L1 dTLB by fetching 4 instructions at 32×16KB stride (via 4 syscalls), then probe the L1 dTLB.
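The prime, trigger, and probe phases of the protocol can be mimicked with a toy model of one 12-way TLB set. This is an illustration only. It assumes LRU replacement, which is not a confirmed property of the M1, and it skips branch training, the TLB reset, and all timing. `TlbSet` and `oracle_trial` are hypothetical names.

```python
from collections import OrderedDict

WAYS = 12  # L1 dTLB associativity reported in the text


class TlbSet:
    """One TLB set with LRU replacement (assumed policy)."""

    def __init__(self, ways: int = WAYS):
        self.ways = ways
        self.entries = OrderedDict()

    def access(self, page: str) -> bool:
        """Touch a page; return True on a hit, False on a miss."""
        hit = page in self.entries
        if hit:
            self.entries.move_to_end(page)
        else:
            if len(self.entries) >= self.ways:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[page] = True
        return hit


def oracle_trial(pac_is_correct: bool) -> int:
    """Return the number of probe misses observed for one guess."""
    tlb = TlbSet()
    prime = [f"attacker-{i}" for i in range(WAYS)]
    for page in prime:                    # prime: fill the set
        tlb.access(page)
    if pac_is_correct:                    # trigger: the gadget loads only via a valid pointer
        tlb.access("kernel-target")
    return sum(not tlb.access(page) for page in prime)   # probe: count misses
```

In this model a correct guess evicts one primed entry, and under LRU the in-order probe then misses on every way, so `oracle_trial(True)` returns 12 and `oracle_trial(False)` returns 0. Real hardware is noisier and may use a different replacement policy, so measured miss counts will not match these numbers exactly.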
Results across 20,000 trials (10,000 correct PAC, 10,000 incorrect PAC):
| Gadget type | Signal for incorrect PAC | Signal for correct PAC |
|---|---|---|
| Data PACMAN | 0 misses in 99.2% of trials | ≥ 5 misses in 99.6% of trials |
| Instruction PACMAN | ≤ 1 miss in 99.2% of trials | ≥ 5 misses in 99.8% of trials |
The distributions are clearly bimodal. A threshold of 5 misses cleanly separates correct from incorrect PACs.
Brute-Force Attack
With a reliable oracle, the attack becomes iteration. The M1 uses 16-bit PACs, so the search space is $2^{16} = 65{,}536$ values. Each oracle query takes 2.69 ms (dominated by the 64 branch-predictor training iterations and kernel context-switch overhead).
Expected brute-force time:
\[\mathbb{E}[T_{\text{brute}}] = \frac{2^{16}}{2} \times 2.69 \text{ ms} = 88.1 \text{ seconds}\]

The paper reports ~2.94 minutes, which is the worst case: an exhaustive search of all $2^{16}$ values ($2^{16} \times 2.69$ ms ≈ 176 seconds). The average case is half that, matching the 88.1 seconds above.
Accuracy across 50 brute-force runs under realistic noise (web browsing, video call running concurrently):
| Outcome | Count | Rate |
|---|---|---|
| True positive (correct PAC found) | 45 / 50 | 90% |
| False negative (no PAC found, repeat required) | 5 / 50 | 10% |
| False positive (wrong PAC returned) | 0 / 50 | 0% |
Zero false positives is the critical number. A false positive would produce an incorrect PAC that crashes the system when used. Since false negatives are tolerable (retry until the correct PAC is found), the brute-force is both reliable and safe to run.
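The search loop itself is short. The sketch below assumes a hypothetical `oracle(guess)` callable that returns True only for the correct PAC, stops at the first hit, and repeats the whole pass when nothing is found, which is how a false negative would be absorbed. The pass limit and function names are illustrative choices, not taken from the paper.

```python
from typing import Callable, Optional, Tuple

PAC_BITS = 16
QUERY_MS = 2.69   # per-query cost quoted in the text


def brute_force(oracle: Callable[[int], bool],
                max_passes: int = 3) -> Tuple[Optional[int], int]:
    """Return (pac, queries); pac is None if every pass ended without a hit."""
    queries = 0
    for _ in range(max_passes):               # a false negative costs one more pass
        for guess in range(1 << PAC_BITS):
            queries += 1
            if oracle(guess):
                return guess, queries
    return None, queries


def seconds(queries: int) -> float:
    """Wall-clock estimate for a given number of oracle queries."""
    return queries * QUERY_MS / 1000.0
```

With a simulated oracle such as `lambda g: g == 0xBEEF`, the loop returns after `0xBEEF + 1` queries. `seconds(1 << 15)` gives about 88.1, the expected time, and `seconds(1 << 16)` gives about 176, the worst case for a single pass.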
Jump2Win: Kernel Control-Flow Hijack
The end-to-end attack targets the C++ method dispatch process, which macOS uses pervasively for kernel object method calls:
// filename: vtable-dispatch-pa.c
vtable_ptr = AUT(*object_addr); // verify vtable pointer (DA key)
fp = AUT(vtable_ptr[i]); // verify function pointer (IA key)
call fp; // execute
Two separate PAC values must be forged: one for the vtable pointer (DA key) and one for the function pointer (IA key), each with salt derived from the object’s address plus a compile-time constant.
The attack target: two objects allocated contiguously in memory where object1 contains a buffer immediately preceding object2’s vtable pointer. The buffer overflow in object1 overwrites object2’s vtable pointer.
The attack steps:
1. Use PACMAN to brute-force the PAC for the address of win_function (the jump target) under the IA key with the appropriate salt
2. Use PACMAN to brute-force the PAC for the buffer address (which will serve as the fake vtable) under the DA key
3. Trigger the buffer overflow: fill object1’s buffer with a signed win_function address (PAC from step 1), overwrite object2’s vtable pointer with a signed buffer address (PAC from step 2)
4. Trigger a method call on object2: the kernel verifies and loads the buffer address as the vtable, indexes it to get the win_function address, verifies the function pointer PAC, and branches to win_function
The paper implemented this as a kext with an intentional buffer overflow, attacked from an unprivileged userspace process. The win_function executed in kernel context.
Countermeasures
The paper evaluates three mitigation directions. None are fully satisfying.
| Direction | Approach | Problem |
|---|---|---|
| PAC-agnostic execution | isb after every aut* instruction | Enormous performance penalty; aut* is extremely common in PA-enabled code |
| PAC-agnostic execution | Always speculate AUT as success | Introduces Meltdown-style vulnerability by allowing speculative deref of invalid pointers |
| Invisible speculation | InvisiSpec, SafeSpec, Delay-on-Miss | Broken by speculative interference attacks; also need to extend from caches to TLBs |
| Information flow tracking | STT, NDA, Dolma | These taint from load instructions; PACMAN taint starts from aut*, and needs re-purposing |
| Software | Fix memory corruption bugs | Necessary but not sufficient; PACMAN exists as long as speculation + PA coexist |
The information flow tracking fix is straightforward in principle: mark the output register of any aut* instruction as tainted and propagate. In practice, none of the existing IFT frameworks implement this. It would require hardware changes and compiler cooperation.
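The taint rule can be sketched in software to show what the hardware would need to track. This is a toy model over `(mnemonic, dst, srcs)` tuples, not a description of STT, NDA, or Dolma: speculation windows and untainting are not modelled, and `TRANSMITTERS` and `tainted_transmitters` are hypothetical names.

```python
TRANSMITTERS = {"ldr", "br", "blr"}   # instructions that could leak via TLB state


def tainted_transmitters(code):
    """Return indices of loads/branches whose operands derive from an aut* result."""
    tainted = set()
    flagged = []
    for idx, (mnemonic, dst, srcs) in enumerate(code):
        uses_taint = any(reg in tainted for reg in srcs)
        if uses_taint and mnemonic in TRANSMITTERS:
            flagged.append(idx)           # would be delayed or blocked in hardware
        if mnemonic.startswith("aut") or uses_taint:
            if dst:
                tainted.add(dst)          # taint starts at aut* and propagates
        elif dst:
            tainted.discard(dst)          # overwritten with an untainted value
    return flagged
```

Unlike a pure register-match scan, propagation follows derived values: in the sequence `autda x0`, `add x2, x0`, `ldr x3, [x2]`, the load is flagged even though it never names `x0` directly.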
The real problem is architectural. PAC’s security-by-crash guarantee and speculative execution’s crash-suppression capability are fundamentally at odds. You cannot have aggressive out-of-order speculation and unconditional crash-on-bad-PAC simultaneously. One of them has to give.
Why This Matters
Pointer Authentication is the primary hardware-assisted CFI primitive for the ARM ecosystem right now. PAC It Up, PTAuth, PACStack, AOS, and several other research systems all build on PA’s security properties, evaluated exclusively under the memory-safety threat model. PACMAN invalidates those evaluations under any threat model that includes speculative execution primitives, which is every real-world deployment.
The attack generalizes beyond M1 in two ways. First, the PACMAN gadget structure is generic: any AArch64 processor with ARMv8.3 PA and sufficient speculation depth is potentially vulnerable. Second, TLB-based Prime+Probe cross-privilege attacks will work on any processor where TLBs are shared across privilege levels, which is a common design choice. The paper is the first to demonstrate TLB-based side-channel attacks with speculative execution on Apple processors specifically, but the methodology extends.
The immediate practical scope is every M1, M1 Pro, M1 Max, M2, and subsequent Apple Silicon device running macOS and iOS. In the longer term, it potentially extends to every server and mobile device shipping ARMv8.3+ silicon.
Source and PoC: pacmanattack.com.
References
- Ravichandran, J., Na, W.T., Lang, J. & Yan, M. (2022). PACMAN: Attacking ARM Pointer Authentication with Speculative Execution. ISCA 2022.
- Kocher, P. et al. (2019). Spectre Attacks: Exploiting Speculative Execution. IEEE S&P 2019.
- Lipp, M. et al. (2018). Meltdown: Reading Kernel Memory from User Space. USENIX Security 2018.
- Gras, B., Razavi, K., Bos, H. & Giuffrida, C. (2018). Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. USENIX Security 2018.
- Yarom, Y. & Falkner, K. (2014). Flush+Reload: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. USENIX Security 2014.
- Liljestrand, H. et al. (2019). PAC it up: Towards Pointer Integrity using ARM Pointer Authentication. USENIX Security 2019.
- Farkhani, R.M., Ahmadi, M. & Lu, L. (2021). PTAuth: Temporal Memory Safety via Robust Points-to Authentication. USENIX Security 2021.
- Liljestrand, H. et al. (2021). PACStack: an Authenticated Call Stack. USENIX Security 2021.
- Göktas, E. et al. (2020). Speculative Probing: Hacking Blind in the Spectre Era. CCS 2020.
- Yan, M. et al. (2018). InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy. MICRO 2018.
- Yu, J. et al. (2019). Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. MICRO 2019.
- Qualcomm Technologies. (2017). Pointer Authentication on ARMv8.3: Design and Analysis of the New Software Security Instructions.
- Azad, B. (2019). Examining Pointer Authentication on the iPhone XS. Google Project Zero.