SoK: Bootloader Security Is Worse Than You Think

Bootloaders sit at the most privileged layer of any system. They run before the OS, before virtual memory is initialized, before most exploit mitigations exist as a concept. A compromise here doesn’t just give you ring 0. It gives you something that survives OS reinstalls, disk replacements, and most incident response playbooks. That’s a bootkit, and bootkits are not theoretical.¹

In 2024 alone there were 204 bootloader CVEs. Some led to remote code execution in privileged environments. What this SoK from Purdue reveals is that the field’s ability to detect and prevent these vulnerabilities is much weaker than the threat model warrants: 25 analysis tools evaluated, 3 that could actually run, 1 attack surface out of 6 where anything was detected. The tools that exist are specialized to surfaces that aren’t where the interesting attacks live.

Three Types, Eight Stages

The paper classifies 43 bootloaders into three types based on starting point and target software. Every downstream finding about attack surfaces, tool coverage, and mitigation applicability maps back to this taxonomy.

Diagram 1: Bootloader taxonomy. Firmware (Type 1) either hands off to a second-stage OS bootloader (Type 2) or boots a monolithic image directly (Type 3).

The paper deep-dives on 7 bootloaders selected for distinct design characteristics and CVE density. Their CI/CD integration tells you a lot about what automated security analysis looks like in practice:

Bootloader	Type	Key Architectures	CI/CD Security Tools	Attack Surfaces
coreboot	1	RISC-V, x86, ARM, PPC	CodeQL, Cppcheck, OSS-Fuzz, OpenSSF	has1, has2, sas1–4
EDK-II (UEFI)	1	x86, ARM, RISC-V, MIPS, LoongArch	CodeQL, Coverity, OSS-Fuzz	has1, has2, sas1–4
SeaBIOS	1	x86	none	has1, has2, sas1–4
GRUB2	2	x86, ARM, PPC, MIPS, SPARC, RISC-V	OSS-Fuzz	has1, has2, sas1, sas2, sas4
Windows Boot Manager	2	x86, ARM	(closed source)	has1, has2, sas1, sas2, sas4
MCUboot	3	32-bit MCU	CodeQL, Cppcheck, libFuzzer, OpenSSF	has1, sas2, sas3
Das U-Boot	3	ARM, x86, ARC, MIPS, PPC, RISC-V, Xtensa	CodeQL, Cppcheck, OSS-Fuzz, OpenSSF	has1, sas1, sas2, sas3, sas4

Type 1 bootloaders expose all 6 attack surfaces. Type 2 exposes 5 (no post-boot SMI handlers). Type 3 shows the most variability, with several surfaces only partially applicable depending on board configuration.

The 8-stage boot process cuts across these types differently. Stages 5–8 (OS handoff and beyond) are absent in type 1 since it remains OS-agnostic. Stage 4 (bootloader handoff) is absent in type 3 since there’s no staged boot. Stage 3 implementations also diverge significantly: coreboot uses multiple sub-components (bootblock, verstage, romstage, ramstage), while EDK-II wraps it in a single DXE phase.
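The type rules above can be written down as a small lookup. This is a minimal sketch in C under stated assumptions: the enum names and bit layout are illustrative and not from the paper, Type 3 is given the union of the MCUboot and U-Boot rows because its surfaces vary by board, and only the two stage absences stated above are encoded (every other type/stage combination is assumed present).

```c
#include <stdbool.h>

/* Illustrative names; the bit layout is an assumption of this sketch. */
enum surface {
    HAS1 = 1 << 0, HAS2 = 1 << 1,
    SAS1 = 1 << 2, SAS2 = 1 << 3, SAS3 = 1 << 4, SAS4 = 1 << 5
};

/* Attack surfaces per bootloader type (1..3), following the table above.
 * Type 3 uses the union of the MCUboot and U-Boot rows. */
unsigned surfaces_for_type(int type)
{
    switch (type) {
    case 1:  return HAS1 | HAS2 | SAS1 | SAS2 | SAS3 | SAS4;
    case 2:  return HAS1 | HAS2 | SAS1 | SAS2 | SAS4;   /* no sas3 */
    case 3:  return HAS1 | SAS1 | SAS2 | SAS3 | SAS4;   /* no has2 */
    default: return 0;
    }
}

/* Whether boot stage 1..8 exists for a given type. Only the absences
 * stated in the text are modeled; everything else is assumed present. */
bool stage_present(int type, int stage)
{
    if (type < 1 || type > 3 || stage < 1 || stage > 8) return false;
    if (type == 1 && stage >= 5) return false;  /* OS-agnostic firmware */
    if (type == 3 && stage == 4) return false;  /* no staged handoff */
    return true;
}

/* Count the surfaces set in a mask. */
int surface_count(unsigned mask)
{
    int n = 0;
    for (; mask; mask &= mask - 1) n++;
    return n;
}
```

Calling `surface_count(surfaces_for_type(2))` gives 5, matching the Type 2 count stated above.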

Six Attack Surfaces, Most of Them Ignored

The paper defines 6 attack surfaces across hardware and software categories. Not all surfaces apply to all types.

Surface	Category	What the Attacker Needs	Known Vulns in Dataset	Representative Attacks
`has1`	Hardware - Invasive	Physical disassembly access	2	MoonBounce (SPI flash, EDK-II), NAND glitching (U-Boot)
`has2`	Hardware - External	Peripheral plugging	4	ThunderStrike2 (Thunderbolt OptionROMs, EDK-II), CVE-2020-25647 (USB, GRUB)
`sas1`	Software - Remote	Network boot exposure	12	PixieFAIL (9 CVEs in EDK-II PXE IPv6), CVE-2023-40547 shim PXE, CVE-2018-18439 U-Boot TFTP
`sas2`	Software - Persistent	Write access to storage	16	BootHole CVE-2020-10713 (GRUB grub.cfg), LogoFAIL (EDK-II logo parsing), CVE-2019-13104 (U-Boot ext4)
`sas3`	Software - Post-boot	OS-level access	10	TrickBoot (SMI callbacks, EDK-II/SeaBIOS), UbootKit (flash rewrite, U-Boot)
`sas4`	Software - Boot Time	Interactive boot interface	45	CVE-2024-7756 (EDK-II shell), CVE-2024-49504 (GRUB), CVE-2023-48426 (U-Boot)

The surface distribution by bootloader type:

Surface	Type 1	Type 2	Type 3
`has1`	MoonBounce	FinFisher (MBR)	Glitching U-Boot
`has2`	ThunderStrike2	CVE-2020-25647	not applicable
`sas1`	PixieFAIL	CVE-2023-40547	CVE-2018-18439 (partial)
`sas2`	LogoFAIL	BootHole CVE-2020-10713	CVE-2019-13104
`sas3`	TrickBoot	not applicable	UbootKit (partial)
`sas4`	CVE-2024-7756	CVE-2024-49504	CVE-2023-48426 (partial)

The vulnerability count per surface reveals where the tooling gap is worst: sas4 has 45 known vulnerabilities and the most tool coverage, while has1 and has2 together have 6 known vulnerabilities and essentially zero automated detection capability. sas2 with 16 known vulns (BootHole, LogoFAIL class) is the most dangerous underexplored software surface.

The paper found over 40 vulnerabilities in BOOTBENCH along surfaces covered by none of the 25 evaluated tools. Not low coverage. Zero coverage.

BOOTBENCH: What 3,658 Vulnerabilities Look Like

There was no comprehensive bootloader vulnerability dataset before this paper. Prior tools were evaluated only on upstream bootloaders or used fewer than 10 known bugs as ground truth. BOOTBENCH is built from two sources: mining the CVE database for 1,157 bootloader CVEs, and filtering the commit histories of all 43 bootloaders with keywords derived from MITRE’s CWE list, which surfaced 2,501 additional security bugs that were fixed without a CVE assignment.
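The commit-mining step can be pictured as a case-insensitive keyword match over commit messages. The sketch below is an assumption-laden illustration, not the paper’s pipeline: the keyword list is hypothetical (the paper derives its own from MITRE’s CWE list), and `looks_security_relevant` is a name invented here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical keyword list; the paper's real list comes from CWE. */
static const char *const KEYWORDS[] = {
    "overflow", "use-after-free", "out-of-bounds", "double free"
};

/* Case-insensitive substring search. */
static bool contains_ci(const char *text, const char *needle)
{
    for (; *text; text++) {
        size_t i = 0;
        while (needle[i] &&
               tolower((unsigned char)text[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (!needle[i]) return true;   /* matched the whole needle */
    }
    return false;
}

/* True if a commit message mentions any keyword from the list. */
bool looks_security_relevant(const char *commit_msg)
{
    assert(commit_msg != NULL);
    for (size_t k = 0; k < sizeof KEYWORDS / sizeof KEYWORDS[0]; k++)
        if (contains_ci(commit_msg, KEYWORDS[k])) return true;
    return false;
}
```

A fix whose message reads only "net: tighten length check" passes through this filter unflagged, which is the kind of miss any keyword-based approach is prone to.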

Vulnerability distribution by bootloader:

Bootloader	Bug Count	Notes
Legacy BIOS	367	Largest share; legacy codebase, decades of debt
EDK-II	357	Most active research target
GRUB	74	High-profile CVEs (BootHole, Shim Shady)
U-Boot	55	Broad hardware support creates large attack surface
Others (39 bootloaders)	2,805	Mostly non-CVE commit-mined bugs

Dominant vulnerability classes: privilege escalation, memory overflows (stack and heap), and information disclosure. The paper acknowledges dataset bias: CVE mining relies on vendor disclosure practices, and commit keyword filtering can miss vulnerabilities patched under innocuous commit messages.

The evaluation methodology addresses this by selecting, for each bootloader, the single commit at which the largest number of verifiable vulnerabilities are simultaneously present, enabling direct pre/post-patch comparison. Table 7 in the paper lists the 7 evaluation bootloaders, their commit hashes, and verified vulnerability counts at each revision.
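One way to picture that commit selection is as a maximum-overlap search. The following sketch assumes a linearized history in which each vulnerability is present from the commit that introduced it up to, but not including, the commit that fixed it. That model, the struct, and the function name are assumptions made for illustration; the paper does not describe its selection procedure at this level of detail.

```c
#include <assert.h>
#include <stddef.h>

/* A vulnerability is present from commit index `introduced` (inclusive)
 * up to `fixed` (exclusive). Hypothetical model of a linear history. */
struct vuln_range { size_t introduced; size_t fixed; };

/* Returns the earliest commit index at which the most vulnerabilities
 * coexist, and stores that count in *best_count. */
size_t pick_eval_commit(const struct vuln_range *v, size_t n_vulns,
                        size_t n_commits, size_t *best_count)
{
    assert(v != NULL && best_count != NULL);
    size_t best = 0;
    *best_count = 0;
    for (size_t c = 0; c < n_commits; c++) {
        size_t present = 0;
        for (size_t i = 0; i < n_vulns; i++)
            if (v[i].introduced <= c && c < v[i].fixed) present++;
        if (present > *best_count) {
            *best_count = present;
            best = c;
        }
    }
    return best;
}
```

The quadratic scan is deliberate: with a few dozen verified bugs per bootloader, clarity matters more than an event-sweep optimization.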

25 Tools, 3 That Run, 1 That Finds Anything

The paper sources 25 tools from the top security and software engineering venues over the past decade: 10 static, 15 dynamic. Selection criteria for evaluation: open-source, documented, runnable without specialized hardware. The tool evaluation results are the paper’s most damning finding.

Static Analysis

Tool	EDK-II (16 bugs)	coreboot (8)	SeaBIOS (3)	GRUB (29)	shim (11)	U-Boot (12)	TF-A (11)
CodeQL	0 TP / 23 FP	0 / 0	0 / 5	0 / 17	0 / 0	0 / 56	0 / 0
FwHunt	0 / 0	failed	failed	failed	failed	failed	failed
STASE	5 / 0	failed	failed	failed	failed	failed	failed

\[\text{FNR}(T) = \frac{\text{FN}(T)}{\text{TP}(T) + \text{FN}(T)}\]

STASE is the best static tool in the study: it found 5 of the 16 EDK-II bugs, so $\text{FNR} = 11/16 = 68.8\%$. CodeQL and FwHunt sit at a 100% false negative rate. Neither FwHunt nor STASE can run on anything outside EDK-II. CodeQL ran across all 7 bootloaders and produced zero true positives everywhere, with particularly bad noise on U-Boot (56 false positives) and EDK-II (23).
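The arithmetic behind these rates is easy to reproduce from the table. The sketch below hard-codes the CodeQL row and the STASE EDK-II cell; the struct and function names are made up for illustration, and false negatives are taken as verified bugs minus true positives.

```c
#include <assert.h>
#include <stddef.h>

/* One table cell: a tool's result on one bootloader. */
struct result { int bugs; int tp; int fp; };

/* False negative rate in percent: FN / (TP + FN), with FN = bugs - TP. */
double fnr_percent(int tp, int bugs)
{
    assert(bugs > 0 && tp >= 0 && tp <= bugs);
    return 100.0 * (bugs - tp) / bugs;
}

/* CodeQL row in table order:
 * EDK-II, coreboot, SeaBIOS, GRUB, shim, U-Boot, TF-A. */
const struct result CODEQL[7] = {
    {16, 0, 23}, {8, 0, 0}, {3, 0, 5}, {29, 0, 17},
    {11, 0, 0}, {12, 0, 56}, {11, 0, 0}
};

/* Sum of false positives across a row. */
int total_false_positives(const struct result *r, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += r[i].fp;
    return sum;
}

/* Sum of verified bugs across a row. */
int total_bugs(const struct result *r, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) sum += r[i].bugs;
    return sum;
}
```

Summed over the row, CodeQL reports 101 false positives against 90 verified bugs without a single true positive, while `fnr_percent(5, 16)` reproduces the 68.75% behind the rounded STASE figure.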

The reason CodeQL fails isn’t a bug in CodeQL. Firmware-specific taint sources and sinks aren’t modeled. MMIO reads, NVRAM variable reads, and interrupt-delivered buffers are all valid attacker-controlled inputs in bootloader code. CodeQL’s default queries know none of them.

Dynamic Analysis

Tool	EDK-II (16 bugs)	coreboot (8)	SeaBIOS (3)	GRUB (29)	shim (11)	U-Boot (12)	TF-A (11)
FuzzUEr	1 TP / 0 FP	failed	failed	failed	failed	failed	failed
HBFA	0 / 0	failed	failed	failed	failed	failed	failed
efi_fuzz	0 / 0	failed	failed	failed	failed	failed	failed
BootFuzz	failed	failed	0 / 0	failed	failed	failed	failed

FuzzUEr found 1 vulnerability: a memory safety bug in EDK-II. It missed the other 15 because it doesn’t support the relevant attack surfaces and couldn’t reconstruct the state required to reach deeper code paths.

HBFA stubs driver functionality to run in userspace. The USB_IO_PROTOCOL is fully stubbed: drivers that depend on it compile, but most of their code is replaced by return EFI_SUCCESS. That isn’t fuzzing the target; it’s fuzzing a shadow of it.

efi_fuzz targets exclusively NVRAM variable fuzzing (sas2), making it the most constrained tool in the evaluation.

OSS-Fuzz adoption: only EDK-II and systemd-boot participate, achieving 25.72% and 29.37% code coverage respectively. Both reflect generic rather than bootloader-specific fuzzing configurations.

SAST adoption more broadly: only 20.1% (9 of 43+) open-source bootloaders integrate any general-purpose static analysis, and all leave tools at default configuration, which prior work shows is largely ineffective without firmware-specific tuning.

Why Static Analysis Fails: The Four Hard Problems

The paper breaks down static analysis failure into four technical challenges worth understanding in detail.

SATC1: Points-to analysis. Bootloader code is pointer-heavy and manages memory through direct physical addresses rather than malloc/free. EDK-II in particular uses both data and function pointers extensively, with targets that depend on dynamic configuration rather than static structure. Standard points-to analysis techniques assume memory allocation APIs as anchor points. Those don’t exist here.

SATC2: Taint source and sink identification. This is illustrated directly in the paper with a code example from EDK-II’s SMI handler interface. The interface accepts a VOID* communication buffer that gets cast to different concrete structures depending on which handler is invoked:

// filename: EDK-II SMI handler (Listing 1 from paper)
typedef struct { UINT32 KeyId; UINT8 Data[32]; } CREATE_KEY_COMM;
typedef struct { UINT32 KeyId; }                 DELETE_KEY_COMM;

EFI_STATUS CreateKeyHandler(EFI_HANDLE DispatchHandle,
    CONST VOID *Context, VOID *CommBuffer, UINTN *CommBufferSize)
{
    // VOID* recast: structure layout depends on which handler was dispatched
    CREATE_KEY_COMM *Req = (CREATE_KEY_COMM *)CommBuffer;
    // attacker controls CommBuffer contents; Req->KeyId and Req->Data are tainted
    return EFI_SUCCESS;
}

EFI_STATUS DeleteKeyHandler(EFI_HANDLE DispatchHandle,
    CONST VOID *Context, VOID *CommBuffer, UINTN *CommBufferSize)
{
    // same VOID* type, different concrete layout: 4 bytes here vs 36 bytes for CREATE_KEY_COMM
    DELETE_KEY_COMM *Req = (DELETE_KEY_COMM *)CommBuffer;
    // a fuzzer must send a different structure to each handler
    return EFI_SUCCESS;
}

Even within a single attack surface, required fields differ across protocols. Fs->OpenVolume(Fs, &Root) and Rng->GetRNG(Rng, NULL, Size, Buf) are both sas4 protocol calls but have entirely different argument structures. No existing tool automatically identifies this. They rely on manual annotations, protocol-specific heuristics, or fixed patterns.
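The manual annotation that such tools depend on often amounts to a hand-written table mapping each handler to the payload layout it expects. The sketch below shows that idea with plain-C stand-ins for the two structures from the listing (UINT32 becomes uint32_t, UINT8 becomes uint8_t). The `handler_id` enum, `PAYLOAD_SIZE` table, and `build_payload` helper are hypothetical and belong to no evaluated tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plain-C stand-ins for the structures in the listing above. */
typedef struct { uint32_t KeyId; uint8_t Data[32]; } CREATE_KEY_COMM;
typedef struct { uint32_t KeyId; }                   DELETE_KEY_COMM;

enum handler_id { HANDLER_CREATE_KEY, HANDLER_DELETE_KEY, HANDLER_COUNT };

/* Hand-written annotation: the payload size each handler expects. */
static const size_t PAYLOAD_SIZE[HANDLER_COUNT] = {
    [HANDLER_CREATE_KEY] = sizeof(CREATE_KEY_COMM),
    [HANDLER_DELETE_KEY] = sizeof(DELETE_KEY_COMM),
};

/* Build a correctly sized payload from raw fuzzer bytes, zero-padding
 * short input. Returns the payload size, or 0 if the handler is
 * unknown or `out` is too small. */
size_t build_payload(enum handler_id h, const uint8_t *fuzz, size_t fuzz_len,
                     uint8_t *out, size_t out_cap)
{
    assert(fuzz != NULL && out != NULL);
    if ((int)h < 0 || (int)h >= HANDLER_COUNT) return 0;
    size_t need = PAYLOAD_SIZE[h];
    if (need > out_cap) return 0;
    memset(out, 0, need);
    memcpy(out, fuzz, fuzz_len < need ? fuzz_len : need);
    return need;
}
```

Every new handler needs a new row in the table, which is the scaling problem the paragraph above describes: nothing discovers these layouts automatically.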

The practical consequence is illustrated by BootHole (CVE-2020-10713). The vulnerability requires tracking attacker-controlled data from grub.cfg through the flex-generated parser:

grub.cfg  (attacker writes this)
  └─► flex parser reads config
        └─► GRUB redefines YY_FATAL_ERROR() to return instead of abort
              └─► unchecked input reaches yy_flex_strncpy()
                    └─► heap buffer overflow → code execution + Secure Boot bypass

Detecting this requires inter-procedural, alias-aware taint tracking that also accounts for GRUB’s YY_FATAL_ERROR redefinition: because the macro returns instead of aborting, the copy in yy_flex_strncpy() becomes reachable with unchecked input. None of the evaluated tools handle it.
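The control-flow pattern behind that chain is small enough to model directly. The following is a simplified illustration, not GRUB’s or flex’s code: `fatal_error` stands in for a fatal-error hook that returns, `copy_token` stands in for the copy that follows the check, and the actual write is clamped so the example is safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOKEN_CAP 8

int fatal_calls;   /* how many times the "fatal" hook fired */

/* A fatal-error hook that records the error and returns. The code that
 * calls it was written on the assumption that it never returns. */
static void fatal_error(const char *msg)
{
    (void)msg;
    fatal_calls++;
}

/* Models the copy that follows the size check. Returns how many bytes
 * an unchecked copy would write; the real write is clamped to TOKEN_CAP
 * so this demonstration has no undefined behavior. */
size_t copy_token(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > TOKEN_CAP)
        fatal_error("token too large");  /* returns, so execution continues */

    size_t clamped = len < TOKEN_CAP ? len : TOKEN_CAP;
    memcpy(dst, src, clamped);           /* a vulnerable parser copies `len` */
    return len;
}
```

With a 12-byte token the hook fires once and the function still reports 12 bytes headed for an 8-byte buffer. A static analyzer that treats the hook as non-returning never sees that path.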

SATC3: Interrupt-driven control flow. Bootloader execution is not sequential. Hardware interrupts, firmware exceptions, and asynchronous callbacks introduce non-linear flows invisible in static call graphs. SMI handlers are the canonical example: they fire asynchronously and are invisible to analysis that doesn’t explicitly model SMM dispatch. Most tools handle protocol-based callbacks in limited ways but don’t model hardware interrupt delivery at all.

SATC4: Cross-stage state persistence. Bootloaders pass state across stages through NVRAM, system tables, and reserved memory regions. A taint originating in EDK-II’s DXE phase (an NVRAM variable write) may be read later by GRUB. No tool models this. SPENDER and Exite handle SMM/SMRAM persistence. BootStomp modeled NVRAM in Android bootloaders. Full cross-stage tracking across different bootloaders in the same boot chain remains unsolved.
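What cross-stage tracking would need, at minimum, is for taint to be stored alongside persisted values so that a later stage inherits it on read. The toy store below illustrates that requirement only. All names are hypothetical, the "stages" are just two function calls, and it models none of the tools mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* A toy NVRAM store whose entries remember whether their value came
 * from attacker-controlled input. */
struct nvram_var { char name[16]; int value; bool tainted; bool used; };
struct nvram { struct nvram_var vars[MAX_VARS]; };

/* Earlier stage: write a variable and record the taint of its value. */
bool nvram_write(struct nvram *nv, const char *name, int value, bool tainted)
{
    assert(nv != NULL && name != NULL);
    struct nvram_var *slot = NULL;
    for (size_t i = 0; i < MAX_VARS; i++) {
        struct nvram_var *v = &nv->vars[i];
        if (v->used && strcmp(v->name, name) == 0) { slot = v; break; }
        if (!v->used && !slot) slot = v;   /* remember first free slot */
    }
    if (!slot || strlen(name) >= sizeof slot->name) return false;
    strcpy(slot->name, name);
    slot->value = value;
    slot->tainted = tainted;
    slot->used = true;
    return true;
}

/* Later stage: read a variable; the taint travels with the value. */
bool nvram_read(const struct nvram *nv, const char *name,
                int *value, bool *tainted)
{
    assert(nv != NULL && name != NULL && value != NULL && tainted != NULL);
    for (size_t i = 0; i < MAX_VARS; i++) {
        const struct nvram_var *v = &nv->vars[i];
        if (v->used && strcmp(v->name, name) == 0) {
            *value = v->value;
            *tainted = v->tainted;
            return true;
        }
    }
    return false;
}
```

The hard part in practice is that the writer and the reader are different codebases with different analysis front ends, so no single tool owns both halves of this table.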

Why Dynamic Analysis Fails: The Three Hard Problems

Execution environment. Bootloaders are hardware-dependent and interrupt-triggered. QEMU supports bootloader execution, but coverage is limited by driver emulation gaps. barebox supports 20 different SPI drivers across all configurations; only 5 work in QEMU. Differences in board-specific initialization sequences compound this: initializing permanent memory on an ARM mach-rockchip board takes 4 steps while Freescale imx8mm_evk requires 7 due to a different memory layout. No automated rehosting technique exists for COTS bootloaders.

Interface invocation. Even when the execution environment works, identifying which interfaces exist, triggering them correctly, and reconstructing the state they depend on is largely manual. In EDK-II, protocol drivers perform a DriverBindingSupported check before loading to confirm required hardware is present. Even when compiled, some protocols aren’t active at runtime. Once identified, interfaces like CreateKeyHandler (sas3) can be invoked externally, but others (certain SMM protocols) require firmware modification to reach. Most tools handle this through manual harness writing.

Bug oracles. Bootloaders in early stages (1–3) lack virtual memory protections, so memory corruption bugs often don’t crash anything observable. Standard sanitizers (ASan, UBSan) rely on virtual memory. The bare-metal KASAN implementation has the same limitations. FuzzUEr required a heavily customized ASan adaptation that only works in stage 3. Standard U-Boot has enough contiguous memory for ASan’s shadow mapping, but handling shadow and poisoned memory before permanent memory initialization in stage 2 is an unsolved engineering problem.
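To make the oracle problem concrete, here is a minimal software-only shadow map over a static arena: every store is checked against a poison byte instead of relying on a page fault. This is a sketch of the general shadow-memory idea and not FuzzUEr’s ASan adaptation. Real ASan uses a compact 8-to-1 shadow mapping with compiler-inserted checks, whereas this version spends one shadow byte per data byte and requires explicit calls. All names are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256

/* Statically allocated arena with one shadow byte per data byte.
 * No MMU is involved; every access is checked in software. */
static uint8_t arena[ARENA_SIZE];
static uint8_t shadow[ARENA_SIZE];   /* 1 = poisoned, 0 = addressable */

/* Start with the whole arena poisoned. */
void arena_reset(void)
{
    memset(arena, 0, sizeof arena);
    memset(shadow, 1, sizeof shadow);
}

/* Mark [off, off + len) addressable; false if the range is invalid. */
bool arena_unpoison(size_t off, size_t len)
{
    if (off > ARENA_SIZE || len > ARENA_SIZE - off) return false;
    memset(shadow + off, 0, len);
    return true;
}

/* Checked store: refuses poisoned or out-of-range bytes. */
bool checked_store(size_t off, uint8_t value)
{
    if (off >= ARENA_SIZE || shadow[off]) return false;  /* oracle fires */
    arena[off] = value;
    return true;
}

/* Checked load: same rule as the store. */
bool checked_load(size_t off, uint8_t *out)
{
    if (off >= ARENA_SIZE || shadow[off]) return false;
    *out = arena[off];
    return true;
}
```

Both arrays live in ordinary static storage, which is exactly what is scarce before permanent memory is initialized. That is where the stage 2 problem described above comes from.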

Exploit Mitigations: None of the Standard Options Work

The paper checked adoption of 6 standard exploit mitigations across all bootloaders. The constraint is fundamental: most mitigations require an MMU, entropy sources, or virtual-memory-based compiler instrumentation, none of which exist in early boot.

Mitigation	Requirement	Early Boot Feasibility	Adoption
ASLR	Entropy source, relocatable code, loader	Not feasible (flat address space, static linking)	Essentially absent
Stack Canaries	Entropy source	Feasible in later stages once entropy available	Sparse
NX	MMU	Feasible post-MMU initialization	Partial
PAC	ARM hardware + secure key material	Feasible on ARM once keys initialized	ARM-only, limited
CFI	Compiler instrumentation, virtual memory	Incomplete toolchain support	Very limited
RELRO	ELF runtime linker	Bootloaders don’t have an ELF runtime	Absent
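The Requirement column reads naturally as a capability check: a mitigation is usable only once every prerequisite exists at that point in boot. The sketch below encodes the table that way. The capability names are invented for illustration, and mapping CFI’s "virtual memory" requirement onto the MMU bit is an assumption of this sketch.

```c
#include <stdbool.h>

/* Platform capabilities at a given point in boot. Illustrative names. */
enum cap {
    CAP_ENTROPY       = 1 << 0,
    CAP_MMU           = 1 << 1,
    CAP_RELOC_LOADER  = 1 << 2,  /* relocatable code plus a loader */
    CAP_ARM_PAC_KEYS  = 1 << 3,  /* ARM PAC hardware with keys set up */
    CAP_CFI_TOOLCHAIN = 1 << 4,  /* compiler instrumentation support */
    CAP_ELF_RUNTIME   = 1 << 5   /* ELF runtime linker */
};

enum mitigation {
    MIT_ASLR, MIT_STACK_CANARY, MIT_NX, MIT_PAC, MIT_CFI, MIT_RELRO,
    MIT_COUNT
};

/* Prerequisites per mitigation, from the Requirement column above.
 * CFI's "virtual memory" is mapped to CAP_MMU (assumption). */
static const unsigned REQUIRES[MIT_COUNT] = {
    [MIT_ASLR]         = CAP_ENTROPY | CAP_RELOC_LOADER,
    [MIT_STACK_CANARY] = CAP_ENTROPY,
    [MIT_NX]           = CAP_MMU,
    [MIT_PAC]          = CAP_ARM_PAC_KEYS,
    [MIT_CFI]          = CAP_CFI_TOOLCHAIN | CAP_MMU,
    [MIT_RELRO]        = CAP_ELF_RUNTIME,
};

/* True if every prerequisite of `m` is present in `caps`. */
bool mitigation_available(enum mitigation m, unsigned caps)
{
    if ((int)m < 0 || (int)m >= MIT_COUNT) return false;
    return (caps & REQUIRES[m]) == REQUIRES[m];
}
```

With `caps == 0`, which approximates the earliest boot stages, every call returns false.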

Privilege separation is similarly limited. Most bootloaders execute monolithically at the highest privilege level: x86 ring 0 or ARM EL3/EL2 with no component isolation. A single module compromise reaches everything in the boot chain. Partial exceptions exist in UEFI (SMM provides isolation) and ARM TF-A (TrustZone), but SPENDER showed that SMM compromise still endangers the full chain.

Vulnerability prevention techniques face the same barriers. Retrofitting memory safety onto existing C code runs into one of two walls: LowFat Pointers rely on virtual address partitioning, which isn’t available, and CETS and SoftBound impose runtime overhead exceeding 100%, which isn’t viable in a boot context. A Checked C retrofit of EDK-II managed to instrument 86% of pointers but couldn’t reach 100% due to bootloader-specific layout constraints. Memory-safe languages are rare: two open-source bootloaders use Rust, one uses Go. RustEDK-II enables Rust UEFI applications but not the bootloader core.

DECAF (automated debloating) is the most practically useful defense result: it removes up to 70% of UEFI code without breaking functionality, reducing attack surface rather than hardening the existing code.

Open Problems

The paper identifies 7 open problems. Even the ones it labels as engineering are open gaps, not routine backlog items:

#	Problem	Type	Notes
OP1	Automated taint source/sink identification in bootloaders	Technical	LLM-assisted static analysis (e.g., IRIS) cited as near-term direction
OP2	Automated rehosting for COTS bootloaders	Engineering	Current tools need manual platform-specific support
OP3	Automated input type identification for type 2/3 bootloaders	Technical	Only exists for specific type 1 surfaces
OP4	Lightweight runtime checks without hardware protections	Engineering	Strong bug oracles under early-boot memory constraints
OP5	Cross-surface and cross-stage analysis framework	Technical	Current tools treat each surface and stage in isolation
OP6	Domain-specific exploit mitigations	Engineering	Current mitigations all assume OS primitives
OP7	Automatic attack surface discovery across hardware and boot stages	Technical	Especially early-boot interfaces that aren’t documented

OP5 is the most fundamental: even in the controlled evaluation where vulnerabilities were injected across all 6 attack surfaces, existing tools only detected issues in sas1 and sas4. No framework can simultaneously reason across both attack surfaces and boot stages. The gaps are structural, not a tuning problem.

Why This Matters

A kernel exploit gives you the machine until the next reboot. A bootkit gives you the machine. BlackLotus² (2023) survived OS reinstallation and disk replacement. Complete hardware reformatting was the remediation path for affected Windows systems on commodity hardware.

The 2024 CVE count of 204 sits in this context: RSFUZZER, one of the more capable EDK-II fuzzers, focuses exclusively on SMI handlers and can’t find PixieFAIL or LogoFAIL. FuzzUEr focuses on boot-time interfaces and can’t find LogoFAIL. The tools are optimized for sas4 surfaces where 45 known vulnerabilities exist, while the sas1/sas2 surfaces where the high-impact attacks actually live have 28 known vulnerabilities and near-zero tool coverage.

BOOTBENCH is at github.com/BreakingBoot/oss-bootloaders. 3,658 vulnerabilities with commits and patch context across 43 bootloaders is the kind of dataset that usually takes a research group years to assemble. If you’re building firmware analysis tooling or doing UEFI exploitation research, it’s worth pulling.

References

1. Glosner, C. & Machiry, A. (2026). SoK: All You Ever Wanted to Know About Bootloader Security But Were Afraid to Ask. IEEE S&P 2026.
2. Matrosov, A. (2023). The Untold Story of the BlackLotus UEFI Bootkit. Binarly.