PACMAN: Breaking ARM Pointer Authentication with Speculative Execution

ARM Pointer Authentication was designed around a simple and elegant principle: if an attacker corrupts a protected pointer, the program crashes. No crash suppression, no oracle, no way to distinguish a correct guess from a wrong one. The PAC space is small enough to brute-force in theory, but every wrong guess terminates the process and rotates the keys. Security by crash.

PACMAN breaks that design at the hardware level.¹ The key insight is that speculative execution already suppresses exceptions by design. The CPU speculatively executes past a potential fault, leaves micro-architectural side effects behind, then squashes the faulting instruction without raising any architecturally visible exception. If you can route a PAC verification through a speculative path and transmit the result through a side channel, you get an oracle: a primitive that tells you whether a guessed PAC is correct, without ever crashing. Then it’s just a matter of iterating over all $2^n$ possible PAC values.

The paper demonstrates this against the Apple M1, the first desktop processor supporting ARMv8.3. A full cross-privilege attack, userspace against kernel, is shown working. 55,159 PACMAN gadgets were found in the XNU kernel alone.

ARM Pointer Authentication: The Mechanism Being Broken

ARM PA was introduced in ARMv8.3 (2017) and has shipped in Apple silicon since the A12 (2018). The M1, M1 Pro, M1 Max, and every subsequent Apple chip supports it. Qualcomm, Samsung, and most ARMv8.3+ vendors either ship it or have announced support.

The mechanism works by embedding a cryptographic hash of a pointer into the pointer’s unused upper bits. On macOS 12.2.1 on M1, virtual addresses are 48-bit, leaving 16 bits for the PAC. The PAC is computed from:

\[\text{PAC} = \text{QARMA}(\text{ptr}, \text{key}, \text{salt})\]

where QARMA is a lightweight tweakable block cipher, key is one of five hardware keys stored in privileged registers inaccessible to userspace (IA, IB, DA, DB, GA), and salt is a program-specified context value (typically the stack pointer or object address).

The signing and verification instructions are:

Instruction	Operation	Key
`pacia ptr, salt`	Sign instruction pointer with IA key	IA
`pacib ptr, salt`	Sign instruction pointer with IB key	IB
`pacda ptr, salt`	Sign data pointer with DA key	DA
`pacdb ptr, salt`	Sign data pointer with DB key	DB
`autia ptr, salt`	Verify and strip PAC (IA key)	IA
`autib ptr, salt`	Verify and strip PAC (IB key)	IB
`autda ptr, salt`	Verify and strip PAC (DA key)	DA
`autdb ptr, salt`	Verify and strip PAC (DB key)	DB

Verification failure poisons the pointer by setting specific bits in the PAC region. Any subsequent dereference triggers a translation fault. No crash suppression, no retry, no oracle. That’s the design guarantee.
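The sign/verify/poison behavior can be sketched as a toy model. This is a minimal illustration, not the hardware algorithm: truncated HMAC-SHA256 stands in for QARMA, the bit layout is simplified to "16 PAC bits above a 48-bit address", and `POISON_BIT` is a hypothetical marker rather than the real poison encoding.

```python
import hashlib
import hmac

VA_BITS = 48                      # virtual-address width stated in the text
VA_MASK = (1 << VA_BITS) - 1
POISON_BIT = 1 << 62              # hypothetical marker bit, not the real encoding


def compute_pac(ptr: int, key: bytes, salt: int) -> int:
    """Stand-in for QARMA: 16-bit truncated HMAC-SHA256 over (address, salt)."""
    msg = (ptr & VA_MASK).to_bytes(8, "little") + salt.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def sign(ptr: int, key: bytes, salt: int) -> int:
    """Model of pac*: place the PAC in the unused upper 16 bits."""
    return (compute_pac(ptr, key, salt) << VA_BITS) | (ptr & VA_MASK)


def authenticate(signed: int, key: bytes, salt: int) -> int:
    """Model of aut*: strip the PAC on success, poison the pointer on failure."""
    addr = signed & VA_MASK
    if (signed >> VA_BITS) == compute_pac(addr, key, salt):
        return addr
    return addr | POISON_BIT      # non-canonical: a later dereference would fault
```

Note that `authenticate` itself never faults in this model. The crash comes from the later dereference of the poisoned pointer, and that gap is what the attack exploits.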

Stack protection looks like this in practice:

; filename: pa-stack-protection.asm

; Function prologue — sign the return address
pacia lr, sp        ; sign lr using stack pointer as salt
sub   sp, sp, #0x40
str   lr, [sp, #0x30]  ; push signed return address

; Function epilogue — verify before use
ldr   lr, [sp, #0x30]  ; pop signed return address
add   sp, sp, #0x40
autia lr, sp        ; verify: strip PAC or poison if tampered
ret                 ; fault here if lr was corrupted

macOS uses PA for return addresses, C++ vtable pointers, vtable entries, and Objective-C method caches.

The PACMAN Gadget

A PACMAN gadget is a code pattern that, when executed speculatively, leaks whether a PAC verification succeeded or failed through a micro-architectural side channel. Two variants exist based on how the result is transmitted.

Data PACMAN gadget — transmits via a speculative memory load:

Diagram 1 — PACMAN through a data access — under a mis-speculated branch, the load's TLB side effect reveals whether the PAC guess verified, before the fault is suppressed.

Instruction PACMAN gadget — transmits via a speculative instruction fetch (requires eager nested-branch squash):

Diagram 2 — The instruction-fetch variant — a second mispredicted branch turns the iTLB fill into the oracle, leaking PAC validity with no architectural fault.

The data gadget in pseudocode:

// filename: data-pacman-gadget.c
if (cond) {             // BR1: trained to be taken, then mis-speculated
    verified_ptr = AUT(guessed_ptr);  // verify under speculation
    Load(verified_ptr);               // transmit via TLB side effect
}
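To make the leak concrete, here is a toy model of the gadget body running under mis-speculation. It is a sketch under loud assumptions: `correct_pac` stands in for the hardware check against the secret key, the TLB is a plain set of page numbers, and the function names are hypothetical. The point is only that no exception escapes while the TLB state still depends on the PAC check.

```python
PAGE_SHIFT = 14                    # 16 KB pages, as stated for macOS on M1
VA_BITS = 48
VA_MASK = (1 << VA_BITS) - 1


def speculative_gadget(guessed_ptr: int, correct_pac: int, tlb: set) -> None:
    """Toy model of the data gadget's body under a mispredicted branch.

    Nothing is returned and nothing is raised: the squash hides any fault,
    so the only surviving effect is what was added to `tlb`.
    """
    addr = guessed_ptr & VA_MASK
    if (guessed_ptr >> VA_BITS) == correct_pac:   # AUT succeeds
        tlb.add(addr >> PAGE_SHIFT)               # load translates: TLB fill
    # AUT fails: the pointer is poisoned, translation fails, no fill happens


def pac_guess_is_correct(addr: int, guess: int, correct_pac: int) -> bool:
    """Oracle: run the gadget, then look for the TLB footprint."""
    tlb = set()
    speculative_gadget((guess << VA_BITS) | addr, correct_pac, tlb)
    return (addr >> PAGE_SHIFT) in tlb
```

In the real attack the attacker cannot read the TLB directly, so the membership test at the end is replaced by a timing measurement.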

The instruction gadget needs one additional hardware property: eager squashing of nested branches. When the processor speculatively executes past BR1 and encounters BR2 (the indirect branch using verified_ptr), it uses the BTB to predict BR2’s target before verified_ptr is resolved. Once AUT completes and verified_ptr is available, the processor detects BR2 misprediction and eagerly squashes BR2 while fetching from the actual target. That fetch is the side channel. The Apple M1’s aggressive out-of-order speculation makes it particularly susceptible here.

Gadget prevalence in XNU

The paper scanned the XNU kernel from macOS 12.2.1 (xnu-8019.80.24) using a Ghidra script: find conditional branches, inspect 32 instructions in both directions, match aut* destinations to subsequent load/branch source registers.

Gadget type	Count
Data PACMAN gadgets	13,867
Instruction PACMAN gadgets	41,292
Total	55,159
Average branch-to-transmit distance	8.1 instructions

The scan was conservative: only register data-dependence tracked, only 32 instructions inspected per branch. The real count is higher.
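The matching rule can be sketched over a simplified instruction listing. This is not the paper's Ghidra script: the tuple format, the mnemonic sets, and `find_gadgets` are assumptions made for illustration, and only the fall-through path after each branch is inspected.

```python
WINDOW = 32   # instructions inspected after each conditional branch (from the text)

COND_BRANCHES = {"b.eq", "b.ne", "cbz", "cbnz", "tbz", "tbnz"}
LOADS = {"ldr", "ldp", "ldrb", "ldrh"}
INDIRECT_BRANCHES = {"br", "blr"}


def find_gadgets(insns):
    """Return (branch_idx, aut_idx, use_idx, kind) tuples for a linear listing.

    Each instruction is (mnemonic, dest_register_or_None, [source_registers]).
    Only direct register reuse is tracked, mirroring the conservative scan.
    """
    found = []
    for b, (mnem, _, _) in enumerate(insns):
        if mnem not in COND_BRANCHES:
            continue
        window = range(b + 1, min(b + 1 + WINDOW, len(insns)))
        for a in window:
            a_mnem, a_dst, _ = insns[a]
            if not a_mnem.startswith("aut"):
                continue
            for u in range(a + 1, window.stop):
                u_mnem, u_dst, u_srcs = insns[u]
                if a_dst in u_srcs and u_mnem in LOADS:
                    found.append((b, a, u, "data"))
                elif a_dst in u_srcs and u_mnem in INDIRECT_BRANCHES:
                    found.append((b, a, u, "instruction"))
                if u_dst == a_dst:      # register overwritten: dependence ends
                    break
    return found
```

A dependence that flows through an intermediate register (for example an `add` between the `aut*` and the load) is missed by this rule, which is one reason the reported count is a lower bound.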

Reverse Engineering the Apple M1

The M1 is a 4 p-core + 4 e-core big.LITTLE design on AArch64 with full ARMv8.3 support. Apple does not publish micro-architectural details. No public documentation on the M1’s TLB organization existed at the time of the paper. The attack requires knowing the TLB parameters precisely to build the Prime+Probe eviction sets.

Cache hierarchy (from system registers via kext)

Structure	Ways	Sets	Line Size	Total Size
p-core L1I	6	512	64 B	192 KB
p-core L1D	8	256	64 B	128 KB
p-core L2	12	8192	128 B	12 MB
e-core L1I	8	256	64 B	128 KB
e-core L1D	8	128	64 B	64 KB
e-core L2	16	2048	128 B	4 MB
System L3	shared across all SoC components			16 MB
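The per-core rows of the table are internally consistent, which a few lines of arithmetic can confirm. The sketch below only recomputes capacity as ways × sets × line size; `index_span` additionally assumes the set index is the line number modulo the set count, which is a common design but not something Apple documents.

```python
KB = 1024

# name: (ways, sets, line size in bytes), copied from the table above
CACHES = {
    "p-core L1I": (6, 512, 64),
    "p-core L1D": (8, 256, 64),
    "p-core L2": (12, 8192, 128),
    "e-core L1I": (8, 256, 64),
    "e-core L1D": (8, 128, 64),
    "e-core L2": (16, 2048, 128),
}


def size_kb(ways: int, sets: int, line: int) -> int:
    """Total capacity in KB of a set-associative cache."""
    return ways * sets * line // KB


def index_span(sets: int, line: int) -> int:
    """Bytes after which the set index wraps, assuming index = line number mod sets."""
    return sets * line


sizes = {name: size_kb(*geom) for name, geom in CACHES.items()}
```

Under that indexing assumption the p-core L1D index wraps every 16 KB, exactly one page, so addresses spaced by whole pages all fall into the same L1D set unless a small extra offset is added.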

macOS 12.2.1 uses 48-bit virtual addresses with 16KB pages and 16-bit PACs.

TLB hierarchy (reverse engineered)

The TLB experiments used the eviction address formula:

\[\text{Addrs}[i] = x + i \times \text{stride} + i \times 128 \text{ B}, \quad 1 \leq i \leq N\]

where the $i \times 128\text{B}$ term maps each address to a distinct cache set to isolate TLB conflicts from cache conflicts. The paper varied stride (in multiples of 16KB) and N from 1 to 30 and observed three distinct latency jumps:

Access latency range	Explanation	Eviction trigger
~60 cycles	L1 dTLB hit	baseline
~80 cycles	L2 cache hit + L1 dTLB hit	L1D cache miss
~95 cycles	L1 dTLB miss	stride ≥ 256×16KB, N ≥ 12
~115 cycles	L2 TLB miss	stride ≥ 2048×16KB, N ≥ 23
~130 cycles	L2 TLB miss + L1D miss	both

Derived TLB parameters:

TLB	Ways	Sets	Stride to evict	Addresses needed
L1 dTLB	12	256	256 × 16 KB = 4 MB	≥ 12
L2 TLB	23	2048	2048 × 16 KB = 32 MB	≥ 23
L1 iTLB	4	32	32 × 16 KB = 512 KB	≥ 4

The L1 iTLB discovery revealed something unexpected: the L1 dTLB acts as a non-inclusive backing store for the L1 iTLB. When a page table entry is evicted from the L1 iTLB, it migrates to the L1 dTLB. This is critical for the cross-privilege attack because the L1 dTLB and L2 TLB are shared between userspace and kernelspace, while the L1 iTLB is private per privilege level.

Diagram 3 — M1 TLB hierarchy — private per-privilege L1 iTLBs feed a shared L1 dTLB and L2 TLB; the shared levels are what enable the cross-privilege channel.

The shared L1 dTLB is the cross-privilege side channel. A userspace attacker can Prime+Probe the L1 dTLB to observe kernel TLB activity.

The timer problem

The attack requires a high-resolution timer. On M1:

Timer	Register	EL0 accessible?	Resolution
System counter	`CNTPCT_EL0`	Yes	24 MHz (too slow)
ARM cycle counter	`PMCCNTR_EL0`	Not on M1	does not exist on M1
Apple PMC0	`S3_2_c15_c0_0`	No (kernel only)	Cycle-accurate
Multi-thread counter	shared memory	Yes	Sufficient

The custom multi-thread timer:

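// filename: multithread-timer.c
#include <stdint.h>

// Thread 1: dedicated timer; no ISB, to maximize resolution
volatile uint64_t counter;
void timerthread(void) {
    while (1) { counter++; }
}

// Thread 2: measuring thread; an ISB on each side of every read enforces ordering
static inline uint64_t read_counter(void) {
    __asm__ volatile("isb" ::: "memory");
    uint64_t t = counter;               // load the shared counter
    __asm__ volatile("isb" ::: "memory");
    return t;
}

uint64_t time_operation(void (*op)(void)) {
    uint64_t time1 = read_counter();    // read counter before
    op();                               // ... operations to time ...
    uint64_t time2 = read_counter();    // read counter after
    return time2 - time1;               // latency in counter increments
}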

The timer thread deliberately omits isb; the serialization penalty would slow counter increments and reduce resolution. The variance tradeoff is worth it. The resulting threshold for L1 dTLB hit vs. miss:

Measurement	Threshold	Notes
L1 dTLB hit	≤ 27 counts	never exceeded in experiments
L1 dTLB miss	≥ 32 counts	never below in experiments
Decision threshold	30 counts	used consistently in PoC attacks

The paper also built PacmanOS: a bare-metal Rust hypervisor that boots directly on M1 hardware, providing noise-free reverse engineering experiments with full MSR access. Used for TLB characterization only; the actual attack uses the multi-thread timer.

PAC Oracle Results

The PAC oracle is built on top of the TLB Prime+Probe channel. The full protocol for the data PACMAN gadget cross-privilege attack:

1. Train the branch predictor in the kernel PACMAN gadget to be taken (64 iterations)
2. Reset TLB hierarchy by accessing 23 addresses mapping to the same L2 TLB set (not part of the eviction set)
3. Prime the L1 dTLB set with 12 addresses at 256×16KB stride
4. Trigger the PACMAN gadget via syscall with the pointer embedding the guessed PAC
5. Probe the L1 dTLB: count misses

For the instruction PACMAN gadget, step 5 adds: evict the target entry from the kernelspace L1 iTLB to the shared L1 dTLB by fetching 4 instructions at 32×16KB stride (via 4 syscalls), then probe the L1 dTLB.

Results across 20,000 trials (10,000 correct PAC, 10,000 incorrect PAC):

Gadget type	Signal for incorrect PAC	Signal for correct PAC
Data PACMAN	0 misses in 99.2% of trials	≥ 5 misses in 99.6% of trials
Instruction PACMAN	≤ 1 miss in 99.2% of trials	≥ 5 misses in 99.8% of trials

The distributions are clearly bimodal. A threshold of 5 misses cleanly separates correct from incorrect PACs.
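The prime, trigger, and probe steps can be simulated on a single TLB set to see where the miss count comes from. This is a sketch under a strong assumption: the set uses true LRU replacement, which is not documented for the M1. Page numbers are hypothetical placeholders for addresses that map to one set.

```python
from collections import OrderedDict

WAYS = 12            # L1 dTLB associativity from the derived parameters
MISS_THRESHOLD = 5   # decision threshold on probe misses, from the text


class TlbSet:
    """One set of a set-associative TLB with an assumed true-LRU policy."""

    def __init__(self, ways: int):
        self.ways = ways
        self.entries = OrderedDict()   # oldest entry first

    def access(self, page: int) -> bool:
        """Touch a page; return True on hit, False on miss (with fill)."""
        if page in self.entries:
            self.entries.move_to_end(page)
            return True
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[page] = None
        return False


def oracle_query(pac_is_correct: bool) -> int:
    """Prime, run the gadget, probe; return the number of probe misses."""
    tlb = TlbSet(WAYS)
    prime_pages = list(range(1, WAYS + 1))   # hypothetical pages in one set
    for p in prime_pages:                    # prime: fill the whole set
        tlb.access(p)
    if pac_is_correct:                       # the gadget's load only translates
        tlb.access(0)                        # when AUT succeeded
    return sum(not tlb.access(p) for p in prime_pages)   # probe
```

Under LRU a single extra fill cascades: each probe miss evicts the next address to be probed, so one victim access turns into a full set of misses. That is one plausible reason a correct PAC shows up as many misses rather than one, though the real replacement policy may behave differently.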

Brute-Force Attack

With a reliable oracle, the attack becomes iteration. The M1 uses 16-bit PACs, so the search space is $2^{16} = 65{,}536$ values. Each oracle query takes 2.69 ms (dominated by the 64 branch-predictor training iterations and kernel context-switch overhead).

Expected brute-force time:

\[\mathbb{E}[T_{\text{brute}}] = \frac{2^{16}}{2} \times 2.69 \text{ ms} = 88.1 \text{ seconds}\]

The paper reports ~2.94 minutes, which is the worst case: an exhaustive search of all $2^{16}$ values. The average case computed above is half that.
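As a quick check that the two figures agree, the worst case follows from the same per-query cost, assuming the 2.69 ms query time stays constant across the whole sweep:

\[T_{\text{worst}} = 2^{16} \times 2.69 \text{ ms} \approx 176.3 \text{ seconds} \approx 2.94 \text{ minutes} = 2 \times \mathbb{E}[T_{\text{brute}}]\]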

Accuracy across 50 brute-force runs under realistic noise (web browsing, video call running concurrently):

Outcome	Count	Rate
True positive (correct PAC found)	45 / 50	90%
False negative (no PAC found, repeat required)	5 / 50	10%
False positive (wrong PAC returned)	0 / 50	0%

Zero false positives is the critical number. A false positive would produce an incorrect PAC that crashes the system when used. Since false negatives are tolerable (retry until the correct PAC is found), the brute-force is both reliable and safe to run.
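The retry logic this implies is simple enough to sketch. The oracle here is hypothetical: it models the measured behavior (it may miss the correct PAC but never accepts a wrong one) with a seeded random number generator, and `max_sweeps` is an arbitrary cap added for the illustration.

```python
import random

PAC_SPACE = 2 ** 16


def make_oracle(secret_pac: int, false_negative_rate: float, rng: random.Random):
    """Hypothetical oracle: may miss the correct PAC, never accepts a wrong one."""
    def oracle(guess: int) -> bool:
        if guess != secret_pac:
            return False                          # zero false positives
        return rng.random() >= false_negative_rate
    return oracle


def brute_force(oracle, max_sweeps: int = 10):
    """Sweep the PAC space; if a sweep finds nothing, start another one."""
    queries = 0
    for _ in range(max_sweeps):
        for guess in range(PAC_SPACE):
            queries += 1
            if oracle(guess):
                return guess, queries
    return None, queries
```

Because a positive answer is always trustworthy, the loop can return on the first hit without re-checking, and a missed sweep costs only time.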

Jump2Win: Kernel Control-Flow Hijack

The end-to-end attack targets the C++ method dispatch process, which macOS uses pervasively for kernel object method calls:

// filename: vtable-dispatch-pa.c
vtable_ptr = AUT(*object_addr);  // verify vtable pointer (DA key)
fp         = AUT(vtable_ptr[i]); // verify function pointer (IA key)
call fp;                          // execute

Two separate PAC values must be forged: one for the vtable pointer (DA key) and one for the function pointer (IA key), each with salt derived from the object’s address plus a compile-time constant.

The attack target: two objects allocated contiguously in memory where object1 contains a buffer immediately preceding object2’s vtable pointer. The buffer overflow in object1 overwrites object2’s vtable pointer.

The attack steps:

1. Use PACMAN to brute-force the PAC for the address of win_function (the jump target) under the IA key with the appropriate salt
2. Use PACMAN to brute-force the PAC for the buffer address (which will serve as the fake vtable) under the DA key
3. Trigger the buffer overflow: fill object1’s buffer with a signed win_function address (PAC from step 1), overwrite object2’s vtable pointer with a signed buffer address (PAC from step 2)
4. Trigger a method call on object2: the kernel verifies and loads the buffer address as the vtable, indexes it to get the win_function address, verifies the function pointer PAC, and branches to win_function

The paper implemented this as a kext with an intentional buffer overflow, attacked from an unprivileged userspace process. The win_function executed in kernel context.
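The four steps can be walked through on a toy memory model. Everything concrete here is an assumption made for illustration: HMAC-SHA256 replaces QARMA, the keys and salts are fixed made-up constants (the real salts depend on the object address), the addresses are arbitrary, and `forge_with_oracle` compresses the whole PACMAN brute force into a loop that asks a yes/no question per guess.

```python
import hashlib
import hmac

VA_BITS = 48
VA_MASK = (1 << VA_BITS) - 1
DA_KEY, IA_KEY = b"toy-da-key", b"toy-ia-key"     # hypothetical; real keys are hidden
VT_SALT, FP_SALT = 0x1111, 0x2222                  # hypothetical fixed salts


def pac(addr, key, salt):
    """Stand-in for QARMA: 16-bit truncated HMAC-SHA256."""
    msg = addr.to_bytes(8, "little") + salt.to_bytes(8, "little")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:2], "little")


def sign(addr, key, salt):
    return (pac(addr, key, salt) << VA_BITS) | addr


def aut(signed, key, salt):
    """Architectural AUT followed by use: a bad PAC ends in a crash."""
    addr = signed & VA_MASK
    if signed >> VA_BITS != pac(addr, key, salt):
        raise RuntimeError("PAC check failed")
    return addr


def dispatch(mem, code, obj_addr, index=0):
    """Authenticated virtual call: verify vtable pointer, then function pointer."""
    vtable = aut(mem[obj_addr], DA_KEY, VT_SALT)
    fp = aut(mem[vtable + 8 * index], IA_KEY, FP_SALT)
    return code[fp]()


def forge_with_oracle(addr, key, salt):
    """Stand-in for the PACMAN brute force: try guesses, never crash."""
    secret = pac(addr, key, salt)                  # known only to the hardware
    for guess in range(1 << 16):
        if guess == secret:                        # the oracle answers yes or no
            return (guess << VA_BITS) | addr


def run_attack():
    legit, win, legit_vt, buf, obj2 = 0x8000, 0x9000, 0x7000, 0x5000, 0x5040
    code = {legit: lambda: "legit", win: lambda: "win"}
    mem = {obj2: sign(legit_vt, DA_KEY, VT_SALT),
           legit_vt: sign(legit, IA_KEY, FP_SALT)}
    before = dispatch(mem, code, obj2)
    mem[buf] = forge_with_oracle(win, IA_KEY, FP_SALT)    # fake vtable entry
    mem[obj2] = forge_with_oracle(buf, DA_KEY, VT_SALT)   # overflow hits object2
    return before, dispatch(mem, code, obj2)
```

Calling `run_attack()` returns the result of the method call before and after the overflow. Writing the same addresses without forged PACs would raise in `aut`, which is the crash the oracle lets the attacker avoid.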

Countermeasures

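The paper evaluates three mitigation directions: keeping PAC verification results from being used under speculation (PAC-agnostic execution), adapting existing Spectre defenses (invisible speculation and information flow tracking), and fixing the underlying memory corruption bugs in software. None is fully satisfying.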

Direction	Approach	Problem
PAC-agnostic execution	`isb` after every `aut*` instruction	Enormous performance penalty; `aut*` is extremely common in PA-enabled code
PAC-agnostic execution	Always speculate AUT as success	Introduces Meltdown-style vulnerability by allowing speculative deref of invalid pointers
Invisible speculation	InvisiSpec, SafeSpec, Delay-on-Miss	Broken by speculative interference attacks; also need to extend from caches to TLBs
Information flow tracking	STT, NDA, Dolma	These taint from load instructions; PACMAN taint starts from `aut*`, and needs re-purposing
Software	Fix memory corruption bugs	Necessary but not sufficient; PACMAN exists as long as speculation + PA coexist

The information flow tracking fix is straightforward in principle: mark the output register of any aut* instruction as tainted and propagate. In practice, none of the existing IFT frameworks implement this. It would require hardware changes and compiler cooperation.
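The taint rule itself fits in a few lines. The sketch below is a software illustration of the policy, not a description of STT, NDA, or Dolma: the instruction tuples and `run_with_aut_taint` are hypothetical, and it ignores the question of when taint is cleared once an `aut*` becomes non-speculative.

```python
TRANSMITTERS = {"ldr", "br", "blr"}   # instructions that can leak through the TLB


def run_with_aut_taint(insns):
    """Return indices of transmitters that the policy would hold back.

    Each instruction is (mnemonic, dest_register_or_None, [source_registers]).
    """
    tainted = set()
    blocked = []
    for idx, (mnem, dst, srcs) in enumerate(insns):
        src_tainted = any(r in tainted for r in srcs)
        if mnem in TRANSMITTERS and src_tainted:
            blocked.append(idx)          # delayed until the aut* result is safe
        if dst is not None:
            if mnem.startswith("aut") or src_tainted:
                tainted.add(dst)         # taint source, or propagation
            else:
                tainted.discard(dst)     # overwritten with untainted data
    return blocked
```

Unlike the conservative gadget scan, this rule follows the dependence through intermediate registers, which is what a hardware implementation would need to do as well.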

The real problem is architectural. PAC’s security-by-crash guarantee and speculative execution’s crash-suppression capability are fundamentally at odds. You cannot have aggressive out-of-order speculation and unconditional crash-on-bad-PAC simultaneously. One of them has to give.

Why This Matters

Pointer Authentication is the primary hardware-assisted CFI primitive for the ARM ecosystem right now. PAC It Up², PTAuth³, PACStack⁴, AOS, and several other research systems all build on PA’s security properties, evaluated exclusively under the memory-safety threat model. PACMAN invalidates those evaluations under any threat model that includes speculative execution primitives, which is every real-world deployment.

The attack generalizes beyond M1 in two ways. First, the PACMAN gadget structure is generic: any AArch64 processor with ARMv8.3 PA and sufficient speculation depth is potentially vulnerable. Second, TLB-based Prime+Probe cross-privilege attacks will work on any processor where TLBs are shared across privilege levels, which is a common design choice. The paper is the first to demonstrate TLB-based side-channel attacks with speculative execution on Apple processors specifically, but the methodology extends.

The immediate practical scope is every M1, M1 Pro, M1 Max, M2, and subsequent Apple Silicon device running macOS or iOS. In the longer term, it is every server and mobile device shipping ARMv8.3+ silicon.

Source and PoC: pacmanattack.com.

References

1. Ravichandran, J., Na, W.T., Lang, J. & Yan, M. (2022). PACMAN: Attacking ARM Pointer Authentication with Speculative Execution. ISCA 2022.
2. Liljestrand, H. et al. (2019). PAC it up: Towards Pointer Integrity using ARM Pointer Authentication. USENIX Security 2019.
3. Farkhani, R.M., Ahmadi, M. & Lu, L. (2021). PTAuth: Temporal Memory Safety via Robust Points-to Authentication. USENIX Security 2021.
4. Liljestrand, H. et al. (2021). PACStack: an Authenticated Call Stack. USENIX Security 2021.