ARM Pointer Authentication was designed around a simple and elegant principle: if an attacker corrupts a protected pointer, the program crashes. No crash suppression, no oracle, no way to distinguish a correct guess from a wrong one. The PAC space is small enough to brute-force in theory, but every wrong guess terminates the process and rotates the keys. Security by crash.

PACMAN breaks that design at the hardware level. The key insight is that speculative execution already suppresses exceptions by design. The CPU speculatively executes past a potential fault, leaves micro-architectural side effects behind, then squashes the faulting instruction without raising any architecturally visible exception. If you can route a PAC verification through a speculative path and transmit the result through a side channel, you get an oracle: a primitive that tells you whether a guessed PAC is correct or not, without ever crashing. Then it’s just a matter of iterating over all $2^n$ possible PAC values.

The paper demonstrates this against the Apple M1, the first desktop processor supporting ARMv8.3. A full cross-privilege attack, userspace against kernel, is shown working. 55,159 PACMAN gadgets were found in the XNU kernel alone.

ARM Pointer Authentication: The Mechanism Being Broken

ARM PA was introduced in ARMv8.3 (2017) and has shipped in Apple silicon since the A12 (2018). The M1, M1 Pro, M1 Max, and every subsequent Apple chip supports it. Qualcomm, Samsung, and most ARMv8.3+ vendors either ship it or have announced support.

The mechanism works by embedding a cryptographic hash of a pointer into the pointer’s unused upper bits. On macOS 12.2.1 on M1, virtual addresses are 48-bit, leaving 16 bits for the PAC. The PAC is computed from:

\[\text{PAC} = \text{QARMA}(\text{ptr}, \text{key}, \text{salt})\]

where QARMA is a lightweight tweakable block cipher, key is one of five hardware keys stored in privileged registers inaccessible to userspace (IA, IB, DA, DB, GA), and salt is a program-specified context value (typically the stack pointer or object address).

The signing and verification instructions are:

| Instruction | Operation | Key |
| --- | --- | --- |
| pacia ptr, salt | Sign instruction pointer with IA key | IA |
| pacib ptr, salt | Sign instruction pointer with IB key | IB |
| pacda ptr, salt | Sign data pointer with DA key | DA |
| pacdb ptr, salt | Sign data pointer with DB key | DB |
| autia ptr, salt | Verify and strip PAC (IA key) | IA |
| autib ptr, salt | Verify and strip PAC (IB key) | IB |
| autda ptr, salt | Verify and strip PAC (DA key) | DA |
| autdb ptr, salt | Verify and strip PAC (DB key) | DB |

Verification failure poisons the pointer by setting specific bits in the PAC region. Any subsequent dereference triggers a translation fault. No crash suppression, no retry, no oracle. That’s the design guarantee.
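The sign/verify round trip can be modeled in a few lines. This is a minimal sketch, not the hardware algorithm: it assumes a truncated HMAC-SHA256 as a stand-in for QARMA, and a hypothetical `POISON_BIT` to mark a failed verification. The names `compute_pac`, `sign`, and `authenticate` are illustrative only.

```python
import hashlib
import hmac

VA_BITS = 48                      # virtual-address width described above
VA_MASK = (1 << VA_BITS) - 1
POISON_BIT = 1 << 62              # hypothetical error marker for this toy model


def compute_pac(ptr: int, key: bytes, salt: int) -> int:
    """Stand-in for QARMA: truncated HMAC-SHA256 over (address, salt)."""
    msg = (ptr & VA_MASK).to_bytes(8, "little") + salt.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")   # keep 16 bits


def sign(ptr: int, key: bytes, salt: int) -> int:
    """Model of pac*: place the PAC in the unused upper 16 bits."""
    return (compute_pac(ptr, key, salt) << VA_BITS) | (ptr & VA_MASK)


def authenticate(signed_ptr: int, key: bytes, salt: int) -> int:
    """Model of aut*: strip the PAC if it matches, otherwise poison the pointer."""
    addr = signed_ptr & VA_MASK
    if (signed_ptr >> VA_BITS) == compute_pac(addr, key, salt):
        return addr
    return addr | POISON_BIT      # non-canonical: a dereference would fault


key = b"toy-ia-key"                                 # hypothetical key material
ret_addr, sp = 0x7FFF_1234_5678, 0x7FFF_FFFF_E000   # hypothetical pointer and salt
signed = sign(ret_addr, key, sp)
tampered = signed ^ (1 << VA_BITS)                  # flip one PAC bit
print(hex(authenticate(signed, key, sp)))           # original address restored
print(hex(authenticate(tampered, key, sp)))         # poisoned pointer
```

In the real design the key never leaves privileged registers; here it is an ordinary variable purely so the example runs.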

Stack protection looks like this in practice:

; filename: pa-stack-protection.asm

; Function prologue — sign the return address
pacia lr, sp        ; sign lr using stack pointer as salt
sub   sp, sp, #0x40
str   lr, [sp, #0x30]  ; push signed return address

; Function epilogue — verify before use
ldr   lr, [sp, #0x30]  ; pop signed return address
add   sp, sp, #0x40
autia lr, sp        ; verify: strip PAC or poison if tampered
ret                 ; fault here if lr was corrupted

macOS uses PA for return addresses, C++ vtable pointers, vtable entries, and Objective-C method caches.

The PACMAN Gadget

A PACMAN gadget is a code pattern that, when executed speculatively, leaks whether a PAC verification succeeded or failed through a micro-architectural side channel. Two variants exist based on how the result is transmitted.

Data PACMAN gadget — transmits via a speculative memory load:

flowchart LR
    t1["t1 · BR1 mis-speculates\ncond=false, trained=true"]
    t2["t2 · AUT(guessed_ptr)\nverify PAC under speculation"]
    ok["valid_ptr"]
    bad["poisoned_ptr"]
    t3a["t3 · Load(valid_ptr)\nTLB side effect observable\nPAC is CORRECT"]
    t3b["t3 · speculative exception\nnot issued to memory\nPAC is INCORRECT"]
    t4["t4 · BR1 squashed\nexception suppressed, no crash"]

    t1 --> t2
    t2 -->|correct PAC| ok --> t3a --> t4
    t2 -->|incorrect PAC| bad --> t3b --> t4

Instruction PACMAN gadget — transmits via a speculative instruction fetch (requires eager nested-branch squash):

flowchart LR
    i1["t1 · BR1 mis-speculates"]
    i2["t2 · AUT(guessed_ptr)\nBTB predicts BR2 target in parallel"]
    i3["t3 · AUT completes\nBR2 misprediction → eager squash"]
    iok["valid_ptr"]
    ibad["poisoned_ptr"]
    i4a["t4 · fetch from valid_ptr\niTLB side effect observable\nPAC is CORRECT"]
    i4b["t4 · fetch not issued\nspeculative exception\nPAC is INCORRECT"]
    i5["t5 · BR1 squashed\nexception suppressed, no crash"]

    i1 --> i2 --> i3
    i3 -->|correct PAC| iok --> i4a --> i5
    i3 -->|incorrect PAC| ibad --> i4b --> i5

The data gadget in pseudocode:

// filename: data-pacman-gadget.c
if (cond) {             // BR1: trained to be taken, then mis-speculated
    verified_ptr = AUT(guessed_ptr);  // verify under speculation
    Load(verified_ptr);               // transmit via TLB side effect
}
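The difference between the architectural and the speculative path through this gadget can be illustrated with a toy model. This is a sketch under simplifying assumptions: the TLB is a plain Python set of page numbers, a poisoned pointer is represented as `None`, and the `mispredicted` flag stands in for the trained branch predictor. None of these names come from the paper.

```python
PAGE_SHIFT = 14   # 16 KB pages


class Fault(Exception):
    """Architectural exception from dereferencing a poisoned pointer."""


def aut(ptr: int, pac_is_correct: bool):
    """Toy AUT: return the pointer if the PAC checks out, else None (poisoned)."""
    return ptr if pac_is_correct else None


def load(ptr, tlb: set, speculative: bool) -> None:
    """Toy Load: a valid address leaves a TLB entry; a poisoned one faults."""
    if ptr is None:
        if speculative:
            return            # fault is squashed along with the mispredicted branch
        raise Fault("poisoned pointer dereferenced")
    tlb.add(ptr >> PAGE_SHIFT)


def gadget(cond: bool, ptr: int, pac_is_correct: bool, tlb: set,
           mispredicted: bool = False) -> None:
    """BR1 guards the body; a mispredicted BR1 runs the body speculatively."""
    if cond:
        load(aut(ptr, pac_is_correct), tlb, speculative=False)
    elif mispredicted:
        load(aut(ptr, pac_is_correct), tlb, speculative=True)


tlb_good, tlb_bad = set(), set()
gadget(False, 0x8000, True, tlb_good, mispredicted=True)    # correct guess
gadget(False, 0x8000, False, tlb_bad, mispredicted=True)    # wrong guess
print(tlb_good, tlb_bad)   # only the correct guess leaves a TLB footprint
```

Neither speculative call raises, which is the whole point: the only observable difference is the TLB state.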

The instruction gadget needs one additional hardware property: eager squashing of nested branches. When the processor speculatively executes past BR1 and encounters BR2 (the indirect branch using verified_ptr), it uses the BTB to predict BR2’s target before verified_ptr is resolved. Once AUT completes and verified_ptr is available, the processor detects BR2 misprediction and eagerly squashes BR2 while fetching from the actual target. That fetch is the side channel. The Apple M1’s aggressive out-of-order speculation makes it particularly susceptible here.

Gadget prevalence in XNU

The authors scanned the XNU kernel shipped with macOS 12.2.1 (xnu-8019.80.24) using a Ghidra script that finds conditional branches, inspects 32 instructions in both directions, and matches aut* destination registers to the source registers of subsequent loads or branches.

| Gadget type | Count |
| --- | --- |
| Data PACMAN gadgets | 13,867 |
| Instruction PACMAN gadgets | 41,292 |
| Total | 55,159 |
| Average branch-to-transmit distance | 8.1 instructions |

The scan was conservative: it tracked only register data dependences and inspected only 32 instructions per branch, so the real count is likely higher.
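The matching heuristic can be sketched over a toy instruction list. This is not the paper's Ghidra script: the `(mnemonic, dest, srcs)` tuple format, the mnemonic sets, and the function names are assumptions made for illustration, and the sketch only follows the fall-through path after each branch rather than both directions.

```python
WINDOW = 32   # instructions inspected after each conditional branch
COND_BRANCHES = {"b.eq", "b.ne", "cbz", "cbnz", "tbz", "tbnz"}
TERMINATORS = {"ret", "b"}
LOADS = {"ldr"}
INDIRECT_BRANCHES = {"br", "blr"}


def window_after(code, i):
    """Instructions following index i, cut at WINDOW or the first terminator."""
    out = []
    for insn in code[i + 1 : i + 1 + WINDOW]:
        out.append(insn)
        if insn[0] in TERMINATORS:
            break
    return out


def classify(window, start, reg):
    """How is `reg` (an aut* result) first consumed after position `start`?"""
    for mnemonic, dest, srcs in window[start + 1:]:
        if reg in srcs and mnemonic in LOADS:
            return "data"
        if reg in srcs and mnemonic in INDIRECT_BRANCHES:
            return "instruction"
        if dest == reg:
            return None      # overwritten before reaching a transmitter
    return None


def find_gadgets(code):
    """Return (branch index, gadget kind) pairs found by the toy heuristic."""
    gadgets = []
    for i, (mnemonic, _, _) in enumerate(code):
        if mnemonic not in COND_BRANCHES:
            continue
        window = window_after(code, i)
        for j, (m, dest, _) in enumerate(window):
            if m.startswith("aut"):
                kind = classify(window, j, dest)
                if kind:
                    gadgets.append((i, kind))
    return gadgets


code = [
    ("cbz", None, ("x2",)),
    ("autda", "x0", ("x0", "x1")),
    ("add", "x3", ("x3", "x4")),
    ("ldr", "x5", ("x0",)),
    ("ret", None, ()),
    ("b.ne", None, ()),
    ("autia", "x8", ("x8", "x9")),
    ("blr", None, ("x8",)),
]
print(find_gadgets(code))
```

Like the original scan, this only follows direct register dependences, so a pointer that passes through memory or arithmetic on another register between aut* and its use would be missed.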

Reverse Engineering the Apple M1

The M1 is a 4 p-core + 4 e-core big.LITTLE design on AArch64 with full ARMv8.3 support. Apple does not publish micro-architectural details. No public documentation on the M1’s TLB organization existed at the time of the paper. The attack requires knowing the TLB parameters precisely to build the Prime+Probe eviction sets.

Cache hierarchy (from system registers via kext)

| Structure | Ways | Sets | Line size | Total size |
| --- | --- | --- | --- | --- |
| p-core L1I | 6 | 512 | 64 B | 192 KB |
| p-core L1D | 8 | 256 | 64 B | 128 KB |
| p-core L2 | 12 | 8192 | 128 B | 12 MB |
| e-core L1I | 8 | 256 | 64 B | 128 KB |
| e-core L1D | 8 | 128 | 64 B | 64 KB |
| e-core L2 | 16 | 2048 | 128 B | 4 MB |
| System L3 (shared across all SoC components) | | | | 16 MB |

macOS 12.2.1 uses 48-bit virtual addresses with 16KB pages and 16-bit PACs.

TLB hierarchy (reverse engineered)

The TLB experiments used the eviction address formula:

\[\text{Addrs}[i] = x + i \times \text{stride} + i \times 128 \text{ B}, \quad 1 \leq i \leq N\]

where the $i \times 128\text{B}$ term maps each address to a distinct cache set to isolate TLB conflicts from cache conflicts. The paper varied stride (in multiples of 16KB) and N from 1 to 30 and observed three distinct latency jumps:

| Access latency range | Explanation | Eviction trigger |
| --- | --- | --- |
| ~60 cycles | L1 dTLB hit | baseline |
| ~80 cycles | L2 cache hit + L1 dTLB hit | L1D cache miss |
| ~95 cycles | L1 dTLB miss | stride ≥ 256 × 16 KB, N ≥ 12 |
| ~115 cycles | L2 TLB miss | stride ≥ 2048 × 16 KB, N ≥ 23 |
| ~130 cycles | L2 TLB miss + L1D miss | both |

Derived TLB parameters:

| TLB | Ways | Sets | Stride to evict | Addresses needed |
| --- | --- | --- | --- | --- |
| L1 dTLB | 12 | 256 | 256 × 16 KB = 4 MB | ≥ 12 |
| L2 TLB | 23 | 2048 | 2048 × 16 KB = 32 MB | ≥ 23 |
| L1 iTLB | 4 | 32 | 32 × 16 KB = 512 KB | ≥ 4 |
The L1 iTLB discovery revealed something unexpected: the L1 dTLB acts as a non-inclusive backing store for the L1 iTLB. When a page table entry is evicted from the L1 iTLB, it migrates to the L1 dTLB. This is critical for the cross-privilege attack because the L1 dTLB and L2 TLB are shared between userspace and kernelspace, while the L1 iTLB is private per privilege level.

flowchart TD
    US_iTLB["Userspace L1 iTLB\n4-way · 32 sets (private)"]
    KS_iTLB["Kernelspace L1 iTLB\n4-way · 32 sets (private)"]
    L1D["L1 dTLB\n12-way · 256 sets\nSHARED (EL0 + EL1)"]
    L2["L2 TLB\n23-way · 2048 sets\nSHARED (EL0 + EL1)"]
    MEM["Page Table Walker"]

    US_iTLB -->|"evict → insert"| L1D
    KS_iTLB -->|"evict → insert"| L1D
    L1D -->|miss| L2
    L2 -->|miss| MEM

The shared L1 dTLB is the cross-privilege side channel. A userspace attacker can Prime+Probe the L1 dTLB to observe kernel TLB activity.

The timer problem

The attack requires a high-resolution timer. On M1:

| Timer | Register | EL0 accessible? | Resolution |
| --- | --- | --- | --- |
| System counter | CNTPCT_EL0 | Yes | 24 MHz (too slow) |
| ARM cycle counter | PMCCNTR_EL0 | Not on M1 | N/A (does not exist on M1) |
| Apple PMC0 | S3_2_c15_c0_0 | No (kernel only) | Cycle-accurate |
| Multi-thread counter | shared memory | Yes | Sufficient |

The custom multi-thread timer:

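// filename: multithread-timer.c
#include <stdint.h>

// Thread 1: dedicated timer; no ISB, to maximize resolution
volatile uint64_t counter;
void timerthread(void) {
    while (1) { counter++; }
}

// Thread 2: measuring thread; an ISB before and after each read enforces ordering
static inline uint64_t read_counter(void) {
    __asm__ volatile("isb" ::: "memory");
    uint64_t t = counter;
    __asm__ volatile("isb" ::: "memory");
    return t;
}

// uint64_t time1 = read_counter();   // read counter before
// ... operations to time ...
// uint64_t time2 = read_counter();   // read counter after
// uint64_t latency = time2 - time1;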

The timer thread deliberately omits isb; the serialization penalty would slow counter increments and reduce resolution. The variance tradeoff is worth it. The resulting threshold for L1 dTLB hit vs. miss:

| Measurement | Threshold | Notes |
| --- | --- | --- |
| L1 dTLB hit | ≤ 27 counts | never exceeded in experiments |
| L1 dTLB miss | ≥ 32 counts | never below in experiments |
| Decision threshold | 30 counts | used consistently in PoC attacks |
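Turning raw counter deltas into a miss count is then a one-line comparison. The sketch below assumes that a delta at or above the threshold counts as a miss; since deltas between 28 and 31 were never observed, that boundary choice is arbitrary. The sample latencies are made up for illustration.

```python
THRESHOLD = 30   # counter ticks, the decision threshold from the table above


def is_dtlb_miss(latency: int) -> bool:
    """Classify one probe access (assumption: threshold itself counts as a miss)."""
    return latency >= THRESHOLD


def count_misses(probe_latencies) -> int:
    """Number of probe accesses classified as L1 dTLB misses."""
    return sum(is_dtlb_miss(t) for t in probe_latencies)


sample = [25, 26, 33, 24, 35, 27]   # hypothetical counter deltas
print(count_misses(sample))
```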

The paper also built PacmanOS: a bare-metal Rust hypervisor that boots directly on M1 hardware, providing noise-free reverse engineering experiments with full MSR access. It was used for TLB characterization only; the actual attack uses the multi-thread timer.

PAC Oracle Results

The PAC oracle is built on top of the TLB Prime+Probe channel. The full protocol for the data PACMAN gadget cross-privilege attack:

  1. Train the branch predictor in the kernel PACMAN gadget to be taken (64 iterations)
  2. Reset TLB hierarchy by accessing 23 addresses mapping to the same L2 TLB set (not part of the eviction set)
  3. Prime the L1 dTLB set with 12 addresses at 256×16KB stride
  4. Trigger the PACMAN gadget via syscall with the pointer embedding the guessed PAC
  5. Probe the L1 dTLB: count misses

For the instruction PACMAN gadget, step 5 adds: evict the target entry from the kernelspace L1 iTLB to the shared L1 dTLB by fetching 4 instructions at 32×16KB stride (via 4 syscalls), then probe the L1 dTLB.
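The prime, trigger, and probe steps can be simulated on a single toy TLB set. This sketch assumes true-LRU replacement, which is not known to match the M1, and it skips the training and reset steps. The class and function names are illustrative only.

```python
from collections import OrderedDict

WAYS = 12   # L1 dTLB associativity


class TlbSet:
    """One set of a set-associative TLB with true-LRU replacement (an assumption)."""

    def __init__(self, ways: int = WAYS):
        self.ways = ways
        self.entries = OrderedDict()   # least recently used first

    def access(self, page) -> bool:
        """Touch a page; return True on a hit, False on a miss (with eviction)."""
        if page in self.entries:
            self.entries.move_to_end(page)
            return True
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[page] = True
        return False


def oracle_query(pac_correct: bool) -> int:
    """Run prime / trigger / probe once and return the number of probe misses."""
    tlb = TlbSet()
    attacker_pages = [f"attacker-{i}" for i in range(WAYS)]
    for page in attacker_pages:            # step 3: prime the set
        tlb.access(page)
    if pac_correct:                        # step 4: speculative load only if PAC valid
        tlb.access("kernel-target")
    return sum(not tlb.access(page) for page in attacker_pages)   # step 5: probe


print(oracle_query(True), oracle_query(False))
```

Under LRU, one kernel insertion starts a chain of evictions during the in-order probe, so the toy reports every probe access as a miss for a correct PAC and none for an incorrect one. Real hardware is noisier and its replacement policy differs, so the measured counts below are what matter.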

Results across 20,000 trials (10,000 correct PAC, 10,000 incorrect PAC):

| Gadget type | Signal for incorrect PAC | Signal for correct PAC |
| --- | --- | --- |
| Data PACMAN | 0 misses in 99.2% of trials | ≥ 5 misses in 99.6% of trials |
| Instruction PACMAN | ≤ 1 miss in 99.2% of trials | ≥ 5 misses in 99.8% of trials |

The distributions are clearly bimodal. A threshold of 5 misses cleanly separates correct from incorrect PACs.

Brute-Force Attack

With a reliable oracle, the attack becomes iteration. The M1 uses 16-bit PACs, so the search space is $2^{16} = 65{,}536$ values. Each oracle query takes 2.69 ms (dominated by the 64 branch-predictor training iterations and kernel context-switch overhead).

Expected brute-force time:

\[\mathbb{E}[T_{\text{brute}}] = \frac{2^{16}}{2} \times 2.69 \text{ ms} = 88.1 \text{ seconds}\]

The paper reports ~2.94 minutes, which corresponds to the worst case: an exhaustive sweep of all $2^{16}$ values ($65{,}536 \times 2.69$ ms $\approx 176$ s). The 88.1-second figure above is the average case, half of that.
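The two timing figures follow directly from the per-query cost. A quick check, using only the numbers quoted above:

```python
PAC_BITS = 16
QUERY_MS = 2.69   # time per oracle query, from the text

worst_case_s = (2 ** PAC_BITS) * QUERY_MS / 1000   # sweep every candidate
average_s = worst_case_s / 2                       # expect to hit halfway through

print(f"worst case: {worst_case_s:.1f} s = {worst_case_s / 60:.2f} min")
print(f"average:    {average_s:.1f} s")
```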

Accuracy across 50 brute-force runs under realistic noise (web browsing, video call running concurrently):

| Outcome | Count | Rate |
| --- | --- | --- |
| True positive (correct PAC found) | 45 / 50 | 90% |
| False negative (no PAC found, repeat required) | 5 / 50 | 10% |
| False positive (wrong PAC returned) | 0 / 50 | 0% |

Zero false positives is the critical number. A false positive would produce an incorrect PAC that crashes the system when used. Since false negatives are tolerable (retry until the correct PAC is found), the brute-force is both reliable and safe to run.
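The retry logic this implies is simple. The sketch below uses a toy oracle that never returns a false positive and produces a scripted number of false negatives; the oracle, its parameters, and the sweep limit are assumptions for illustration, not the PoC's code.

```python
def make_flaky_oracle(secret_pac: int, false_negatives: int = 1):
    """Toy oracle: never a false positive, misses the secret a fixed number of times."""
    remaining = false_negatives

    def oracle(guess: int) -> bool:
        nonlocal remaining
        if guess != secret_pac:
            return False
        if remaining > 0:
            remaining -= 1        # simulated noisy measurement
            return False
        return True

    return oracle


def sweep(oracle, pac_bits: int = 16):
    """Try every PAC value once; return the first accepted guess or None."""
    for guess in range(1 << pac_bits):
        if oracle(guess):
            return guess
    return None


def brute_force(oracle, max_sweeps: int = 3, pac_bits: int = 16):
    """Repeat full sweeps until the oracle accepts a guess; return (pac, sweeps used)."""
    for attempt in range(1, max_sweeps + 1):
        found = sweep(oracle, pac_bits)
        if found is not None:
            return found, attempt
    return None, max_sweeps


print(brute_force(make_flaky_oracle(0xBEEF, false_negatives=1)))
```

Because a sweep can only end with the right PAC or with nothing, repeating it costs time but never risks using a bad pointer.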

Jump2Win: Kernel Control-Flow Hijack

The end-to-end attack targets the C++ method dispatch process, which macOS uses pervasively for kernel object method calls:

// filename: vtable-dispatch-pa.c
vtable_ptr = AUT(*object_addr);  // verify vtable pointer (DA key)
fp         = AUT(vtable_ptr[i]); // verify function pointer (IA key)
call fp;                          // execute

Two separate PAC values must be forged: one for the vtable pointer (DA key) and one for the function pointer (IA key), each with salt derived from the object’s address plus a compile-time constant.

The attack target: two objects allocated contiguously in memory where object1 contains a buffer immediately preceding object2’s vtable pointer. The buffer overflow in object1 overwrites object2’s vtable pointer.

The attack steps:

  1. Use PACMAN to brute-force the PAC for the address of win_function (the jump target) under the IA key with the appropriate salt
  2. Use PACMAN to brute-force the PAC for the buffer address (which will serve as the fake vtable) under the DA key
  3. Trigger the buffer overflow: fill object1’s buffer with a signed win_function address (PAC from step 1), overwrite object2’s vtable pointer with a signed buffer address (PAC from step 2)
  4. Trigger a method call on object2: the kernel verifies and loads the buffer address as the vtable, indexes it to get the win_function address, verifies the function pointer PAC, and branches to win_function

The paper implemented this as a kext with an intentional buffer overflow, attacked from an unprivileged userspace process. The win_function executed in kernel context.
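The four steps can be walked through in a toy model of kernel memory. Everything here is an illustrative assumption: the truncated HMAC standing in for QARMA, the addresses, the salt constants, and the `pacman_oracle` function, which simply reports whether authentication would succeed (the real attack learns this through the TLB side channel). The attacker-side functions never read the keys directly; only the simulated hardware does.

```python
import hashlib
import hmac

VA_BITS = 48
VA_MASK = (1 << VA_BITS) - 1
KEY_IA, KEY_DA = b"toy-ia-key", b"toy-da-key"   # hypothetical hardware keys


def pac(addr, key, salt):
    """Stand-in for QARMA: 16 bits of HMAC-SHA256 over (address, salt)."""
    msg = addr.to_bytes(8, "little") + salt.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(addr, key, salt):
    return (pac(addr, key, salt) << VA_BITS) | addr


def aut(signed, key, salt):
    """Return the stripped address, or None for a poisoned pointer."""
    addr = signed & VA_MASK
    return addr if (signed >> VA_BITS) == pac(addr, key, salt) else None


# Toy kernel memory: object1's 64-byte buffer sits directly before object2.
BUF, OBJ2, LEGIT_VTABLE = 0x1000, 0x1040, 0x2000
LEGIT_FN, WIN_FN = 0x5000, 0x4000
SALT_VT, SALT_FP = OBJ2 + 0x11, OBJ2 + 0x22     # hypothetical salt constants
functions = {LEGIT_FN: lambda: "legit method", WIN_FN: lambda: "win_function"}
memory = {LEGIT_VTABLE: sign(LEGIT_FN, KEY_IA, SALT_FP),
          OBJ2: sign(LEGIT_VTABLE, KEY_DA, SALT_VT)}


def dispatch(obj_addr):
    """Authenticated method call, mirroring the dispatch pseudocode above."""
    vtable = aut(memory[obj_addr], KEY_DA, SALT_VT)
    if vtable is None:
        raise RuntimeError("panic: vtable pointer failed authentication")
    fp = aut(memory[vtable], KEY_IA, SALT_FP)
    if fp is None:
        raise RuntimeError("panic: function pointer failed authentication")
    return functions[fp]()


def pacman_oracle(candidate, key, salt):
    """Stand-in for the side channel: validity without a crash."""
    return aut(candidate, key, salt) is not None


def brute_force(addr, key, salt):
    for guess in range(1 << 16):
        candidate = (guess << VA_BITS) | addr
        if pacman_oracle(candidate, key, salt):
            return candidate
    raise AssertionError("unreachable in this toy: one of 2^16 guesses matches")


def overflow(words):
    """Write 8-byte words from BUF onward; the ninth word lands on OBJ2."""
    for i, word in enumerate(words):
        memory[BUF + 8 * i] = word


def run_attack():
    signed_win = brute_force(WIN_FN, KEY_IA, SALT_FP)   # step 1
    signed_buf = brute_force(BUF, KEY_DA, SALT_VT)      # step 2
    overflow([signed_win] + [0] * 7 + [signed_buf])     # step 3
    return dispatch(OBJ2)                               # step 4
```

Calling `dispatch(OBJ2)` before the attack runs the legitimate method; after `run_attack()` the same call reaches `win_function`, and at no point does a failed guess reach the panic path.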

Countermeasures

The paper evaluates several mitigation directions, in hardware and in software. None are fully satisfying.

| Direction | Approach | Problem |
| --- | --- | --- |
| PAC-agnostic execution | isb after every aut* instruction | Enormous performance penalty; aut* is extremely common in PA-enabled code |
| PAC-agnostic execution | Always speculate AUT as success | Introduces Meltdown-style vulnerability by allowing speculative deref of invalid pointers |
| Invisible speculation | InvisiSpec, SafeSpec, Delay-on-Miss | Broken by speculative interference attacks; also need to extend from caches to TLBs |
| Information flow tracking | STT, NDA, Dolma | These taint from load instructions; PACMAN taint starts from aut*, and needs re-purposing |
| Software | Fix memory corruption bugs | Necessary but not sufficient; PACMAN exists as long as speculation + PA coexist |

The information flow tracking fix is straightforward in principle: mark the output register of any aut* instruction as tainted and propagate. In practice, none of the existing IFT frameworks implement this. It would require hardware changes and compiler cooperation.
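The taint rule itself fits in a few lines. This is a sketch of the idea only, not of STT, NDA, or Dolma: it assumes a toy `(mnemonic, dest, srcs)` instruction format, treats loads and indirect branches as the transmitters to hold back, and ignores when speculation resolves.

```python
TRANSMITTERS = {"ldr", "br", "blr"}   # instructions whose address operand can leak


def blocked_transmitters(code):
    """Indices of transmitters whose operands derive from an aut* result."""
    tainted = set()
    blocked = []
    for idx, (mnemonic, dest, srcs) in enumerate(code):
        uses_taint = any(src in tainted for src in srcs)
        if mnemonic in TRANSMITTERS and uses_taint:
            blocked.append(idx)            # would be delayed until non-speculative
        if mnemonic.startswith("aut"):
            tainted.add(dest)              # taint source: the authenticated pointer
        elif dest is not None:
            if uses_taint:
                tainted.add(dest)          # propagate through dependent results
            else:
                tainted.discard(dest)      # overwritten with untainted data
    return blocked


code = [
    ("autda", "x0", ("x0", "x1")),
    ("add", "x2", ("x0", "x3")),
    ("ldr", "x4", ("x2",)),     # address derived from the aut* result
    ("mov", "x0", ("x5",)),
    ("ldr", "x6", ("x0",)),     # x0 no longer tainted
]
print(blocked_transmitters(code))
```

Only the first load is held back; once `x0` is overwritten with untainted data, later uses proceed normally.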

The real problem is architectural. PAC’s security-by-crash guarantee and speculative execution’s crash-suppression capability are fundamentally at odds. You cannot have aggressive out-of-order speculation and unconditional crash-on-bad-PAC simultaneously. One of them has to give.

Why This Matters

Pointer Authentication is the primary hardware-assisted CFI primitive for the ARM ecosystem right now. PAC It Up, PTAuth, PACStack, AOS, and several other research systems all build on PA’s security properties, evaluated exclusively under the memory-safety threat model. PACMAN invalidates those evaluations under any threat model that includes speculative execution primitives, which is every real-world deployment.

The attack generalizes beyond M1 in two ways. First, the PACMAN gadget structure is generic: any AArch64 processor with ARMv8.3 PA and sufficient speculation depth is potentially vulnerable. Second, TLB-based Prime+Probe cross-privilege attacks will work on any processor where TLBs are shared across privilege levels, which is a common design choice. The paper is the first to demonstrate TLB-based side-channel attacks with speculative execution on Apple processors specifically, but the methodology extends.

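The immediate practical scope is the M1 family on which the attack was demonstrated (M1, M1 Pro, M1 Max) and, plausibly, M2 and subsequent Apple Silicon devices running macOS or iOS. In the longer term, it potentially extends to any server or mobile device shipping ARMv8.3+ silicon with PA enabled.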

Source and PoC: pacmanattack.com.

References

  1. Ravichandran, J., Na, W.T., Lang, J. & Yan, M. (2022). PACMAN: Attacking ARM Pointer Authentication with Speculative Execution. ISCA 2022.
  2. Kocher, P. et al. (2019). Spectre Attacks: Exploiting Speculative Execution. IEEE S&P 2019.
  3. Lipp, M. et al. (2018). Meltdown: Reading Kernel Memory from User Space. USENIX Security 2018.
  4. Gras, B., Razavi, K., Bos, H. & Giuffrida, C. (2018). Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks. USENIX Security 2018.
  5. Yarom, Y. & Falkner, K. (2014). Flush+Reload: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. USENIX Security 2014.
  6. Liljestrand, H. et al. (2019). PAC it up: Towards Pointer Integrity using ARM Pointer Authentication. USENIX Security 2019.
  7. Farkhani, R.M., Ahmadi, M. & Lu, L. (2021). PTAuth: Temporal Memory Safety via Robust Points-to Authentication. USENIX Security 2021.
  8. Liljestrand, H. et al. (2021). PACStack: an Authenticated Call Stack. USENIX Security 2021.
  9. Göktas, E. et al. (2020). Speculative Probing: Hacking Blind in the Spectre Era. CCS 2020.
  10. Yan, M. et al. (2018). InvisiSpec: Making Speculative Execution Invisible in the Cache Hierarchy. MICRO 2018.
  11. Yu, J. et al. (2019). Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. MICRO 2019.
  12. Qualcomm Technologies. (2017). Pointer Authentication on ARMv8.3: Design and Analysis of the New Software Security Instructions.
  13. Azad, B. (2019). Examining Pointer Authentication on the iPhone XS. Google Project Zero.