Bootloaders sit at the most privileged layer of any system. They run before the OS, before virtual memory is initialized, before most exploit mitigations exist as a concept. A compromise here doesn’t just give you ring 0. It gives you something that survives OS reinstalls, disk replacements, and most incident response playbooks. That’s a bootkit, and they’re not theoretical.

In 2024 alone there were 204 bootloader CVEs. Some led to remote code execution in privileged environments. What this SoK from Purdue reveals is that the field’s ability to detect and prevent these vulnerabilities is much weaker than the threat model warrants: 25 analysis tools evaluated, 3 that could actually run, 1 attack surface out of 6 where anything was detected. The tools that exist are specialized to surfaces that aren’t where the interesting attacks live.

Three Types, Eight Stages

The paper classifies 43 bootloaders into three types based on starting point and target software. Every downstream finding about attack surfaces, tool coverage, and mitigation applicability maps back to this taxonomy.

```mermaid
flowchart LR
    HW(["Hardware<br/>Root of Trust"]) --> T1["Type 1 · Firmware Bootloader<br/>EDK-II · SeaBIOS · coreboot"]
    T1 -->|split boot| T2["Type 2 · OS Bootloader<br/>GRUB2 · Windows Boot Manager"]
    T2 --> OS(["OS / Hypervisor"])
    HW -->|monolithic| T3["Type 3 · Monolithic<br/>U-Boot · MCUboot"]
    T3 --> EM(["Embedded Application"])
```

The paper takes a deep dive into 7 bootloaders selected for distinct design characteristics and CVE density. Their CI/CD integration tells you a lot about what automated security analysis looks like in practice:

| Bootloader | Type | Key Architectures | CI/CD Security Tools | Attack Surfaces |
|---|---|---|---|---|
| coreboot | 1 | RISC-V, x86, ARM, PPC | CodeQL, Cppcheck, OSS-Fuzz, OpenSSF | has1, has2, sas1–4 |
| EDK-II (UEFI) | 1 | x86, ARM, RISC-V, MIPS, LoongArch | CodeQL, Coverity, OSS-Fuzz | has1, has2, sas1–4 |
| SeaBIOS | 1 | x86 | none | has1, has2, sas1–4 |
| GRUB2 | 2 | x86, ARM, PPC, MIPS, SPARC, RISC-V | OSS-Fuzz | has1, has2, sas1, sas2, sas4 |
| Windows Boot Manager | 2 | x86, ARM | n/a (closed source) | has1, has2, sas1, sas2, sas4 |
| MCUboot | 3 | 32-bit MCU | CodeQL, Cppcheck, libFuzzer, OpenSSF | has1, sas2, sas3 |
| Das U-Boot | 3 | ARM, x86, ARC, MIPS, PPC, RISC-V, Xtensa | CodeQL, Cppcheck, OSS-Fuzz, OpenSSF | has1, sas1, sas2, sas3, sas4 |

Type 1 bootloaders expose all 6 attack surfaces. Type 2 exposes 5 (no post-boot SMI handlers). Type 3 shows the most variability, with several surfaces only partially applicable depending on board configuration.

The 8-stage boot process cuts across these types differently. Stages 5–8 (OS handoff and beyond) are absent in type 1, since it remains OS-agnostic. Stage 4 (bootloader handoff) is absent in type 3, since there's no staged boot. Stage 3 implementations also diverge significantly: coreboot splits it across multiple sub-components (bootblock, verstage, romstage, ramstage), while EDK-II implements it as a single DXE phase.

Six Attack Surfaces, Most of Them Ignored

The paper defines 6 attack surfaces across hardware and software categories. Not all surfaces apply to all types.

| Surface | Category | What the Attacker Needs | Known Vulns in Dataset | Representative Attacks |
|---|---|---|---|---|
| has1 | Hardware · Invasive | Physical access with device disassembly | 2 | MoonBounce (SPI flash, EDK-II), NAND glitching (U-Boot) |
| has2 | Hardware · External | Ability to plug in peripherals | 4 | ThunderStrike2 (Thunderbolt OptionROMs, EDK-II), CVE-2020-25647 (USB, GRUB) |
| sas1 | Software · Remote | Network boot exposure | 12 | PixieFAIL (9 CVEs in EDK-II PXE IPv6), CVE-2023-40547 (shim HTTP boot), CVE-2018-18439 (U-Boot TFTP) |
| sas2 | Software · Persistent | Write access to storage | 16 | BootHole CVE-2020-10713 (GRUB grub.cfg), LogoFAIL (EDK-II logo parsing), CVE-2019-13104 (U-Boot ext4) |
| sas3 | Software · Post-boot | OS-level access | 10 | TrickBoot (SMI callbacks, EDK-II/SeaBIOS), UbootKit (flash rewrite, U-Boot) |
| sas4 | Software · Boot Time | Interactive boot interface | 45 | CVE-2024-7756 (EDK-II shell), CVE-2024-49504 (GRUB), CVE-2023-48426 (U-Boot) |

The surface distribution by bootloader type:

| Surface | Type 1 | Type 2 | Type 3 |
|---|---|---|---|
| has1 | MoonBounce | FinFisher (MBR) | Glitching U-Boot |
| has2 | ThunderStrike2 | CVE-2020-25647 | not applicable |
| sas1 | PixieFAIL | CVE-2023-40547 | CVE-2018-18439 (partial) |
| sas2 | LogoFAIL | BootHole CVE-2020-10713 | CVE-2019-13104 |
| sas3 | TrickBoot | not applicable | UbootKit (partial) |
| sas4 | CVE-2024-7756 | CVE-2024-49504 | CVE-2023-48426 (partial) |

The vulnerability count per surface reveals where the tooling gap is worst: sas4 has 45 known vulnerabilities and the most tool coverage, while has1 and has2 together have 6 known vulnerabilities and essentially zero automated detection capability. sas2 with 16 known vulns (BootHole, LogoFAIL class) is the most dangerous underexplored software surface.

> [!NOTE]
> The paper found over 40 vulnerabilities in BOOTBENCH along surfaces covered by none of the 25 evaluated tools. Not low coverage. Zero coverage.

BOOTBENCH: What 3,658 Vulnerabilities Look Like

There was no comprehensive bootloader vulnerability dataset before this paper. Prior tools were evaluated on upstream bootloaders only, or used fewer than 10 known bugs as ground truth. BOOTBENCH is built from two sources: 1,157 bootloader CVEs mined from the CVE database, and 2,501 additional security bugs that were fixed without a CVE assignment, surfaced by keyword-filtering the commit histories of all 43 bootloaders with keywords derived from MITRE's CWE list.
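The commit-mining half can be pictured as a keyword match over commit messages. Below is a minimal C sketch under stated assumptions: the keyword list is hypothetical (the paper's CWE-derived list is not reproduced here), and matching is a plain case-insensitive substring search.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical keyword list for illustration; NOT the paper's CWE-derived list. */
static const char *const kSecurityKeywords[] = {
    "overflow", "out-of-bounds", "use-after-free",
    "double free", "null pointer", "underflow",
};

/* Case-insensitive substring search: does `needle` occur in `hay`? */
static bool contains_ci(const char *hay, const char *needle) {
    size_t n = 0;
    while (needle[n] != '\0') n++;
    if (n == 0) return true;
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return true;
    }
    return false;
}

/* A commit is a candidate security fix if any keyword appears in its message. */
static bool is_candidate_security_fix(const char *commit_msg) {
    size_t count = sizeof kSecurityKeywords / sizeof kSecurityKeywords[0];
    for (size_t k = 0; k < count; k++)
        if (contains_ci(commit_msg, kSecurityKeywords[k])) return true;
    return false;
}
```

The same sketch shows the blind spot: a fix committed as "tidy up length handling" matches no keyword and would never enter the dataset.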

Vulnerability distribution by bootloader:

| Bootloader | Bug Count | Notes |
|---|---|---|
| Legacy BIOS | 367 | Largest single-bootloader share; legacy codebase, decades of debt |
| EDK-II | 357 | Most active research target |
| GRUB | 74 | High-profile CVEs (BootHole, Shim Shady) |
| U-Boot | 55 | Broad hardware support creates large attack surface |
| Others (39 bootloaders) | 2,805 | Mostly non-CVE commit-mined bugs |

Dominant vulnerability classes: privilege escalation, memory overflows (stack and heap), and information disclosure. The paper acknowledges dataset bias: CVE mining relies on vendor disclosure practices, and commit keyword filtering can miss vulnerabilities patched under innocuous commit messages.

The evaluation limits the impact of that bias by using only verified vulnerabilities: for each bootloader, the paper selects a single commit at which the maximum number of verifiable vulnerabilities are simultaneously present, enabling direct pre-/post-patch comparison. Table 7 in the paper lists the 7 evaluation bootloaders, their commit hashes, and the verified vulnerability counts at each revision.
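Picking that commit is a maximum-overlap problem: each verified vulnerability is live from the commit that introduced it up to, but not including, the commit that fixed it. A C sketch, assuming a linear history indexed 0..n-1 and a hypothetical `vuln_window` record (a real repository with merges would first need a topological ordering of commits):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: the vulnerability is present in commits [introduced, fixed). */
typedef struct {
    size_t introduced;
    size_t fixed;     /* >= n_commits means "still unfixed at HEAD" */
} vuln_window;

/* Returns the commit index with the most simultaneously live vulnerabilities
   (earliest index on ties) and writes that count to *best_count.
   Difference-array sweep: O(n_commits + n_vulns). */
static size_t best_eval_commit(const vuln_window *v, size_t n_vulns,
                               size_t n_commits, size_t *best_count) {
    long *delta = calloc(n_commits + 1, sizeof *delta);
    size_t best = 0;
    long live = 0, best_live = 0;
    if (delta == NULL) { *best_count = 0; return 0; }
    for (size_t i = 0; i < n_vulns; i++) {
        if (v[i].introduced >= v[i].fixed || v[i].introduced >= n_commits) continue;
        delta[v[i].introduced] += 1;                                   /* goes live */
        delta[v[i].fixed < n_commits ? v[i].fixed : n_commits] -= 1;   /* patched */
    }
    for (size_t c = 0; c < n_commits; c++) {
        live += delta[c];
        if (live > best_live) { best_live = live; best = c; }
    }
    free(delta);
    *best_count = (size_t)best_live;
    return best;
}
```

A difference array keeps this linear in commits plus vulnerabilities, rather than rescanning every vulnerability for every candidate commit.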

25 Tools, 3 That Run, 1 That Finds Anything

The paper sources 25 tools from the top security and software engineering venues over the past decade: 10 static, 15 dynamic. Selection criteria for evaluation: open-source, documented, runnable without specialized hardware. The tool evaluation results are the paper’s most damning finding.

Static Analysis

| Tool | EDK-II (16 bugs) | coreboot (8) | SeaBIOS (3) | GRUB (29) | shim (11) | U-Boot (12) | TF-A (11) |
|---|---|---|---|---|---|---|---|
| CodeQL | 0 TP / 23 FP | 0 / 0 | 0 / 5 | 0 / 17 | 0 / 0 | 0 / 56 | 0 / 0 |
| FwHunt | 0 / 0 | failed | failed | failed | failed | failed | failed |
| STASE | 5 / 0 | failed | failed | failed | failed | failed | failed |
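Each tool $T$ is scored by its false negative rate, where $\text{FN}(T)$ counts the known vulnerabilities it misses and $\text{TP}(T)$ the ones it finds:

\[\text{FNR}(T) = \frac{\text{FN}(T)}{\text{TP}(T) + \text{FN}(T)}\]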

STASE is the best static tool in the study: $\text{FNR} = 11/16 = 68.8\%$. CodeQL and FwHunt are at 100% false negative rate. Neither FwHunt nor STASE can run on anything outside EDK-II. CodeQL ran across all 7 bootloaders and produced zero true positives everywhere, with particularly bad noise on U-Boot (56 false positives) and EDK-II (23).
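The percentages follow directly from the EDK-II column. A small C helper, assuming FN is simply the known bug count minus the tool's true positives:

```c
/* False negative rate for one tool on one target:
   FNR = FN / (TP + FN), with FN = known_bugs - TP.
   Returns -1.0 for invalid input (no known bugs, or TP > known bugs). */
static double false_negative_rate(unsigned true_positives, unsigned known_bugs) {
    if (known_bugs == 0 || true_positives > known_bugs) return -1.0;
    unsigned false_negatives = known_bugs - true_positives;
    return (double)false_negatives / (double)(true_positives + false_negatives);
}
```

For STASE this gives 11/16 = 0.6875; for CodeQL and FwHunt, with no true positives, it gives 1.0.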

CodeQL's failure isn't a bug in CodeQL: firmware-specific taint sources and sinks simply aren't modeled. MMIO reads, NVRAM variable reads, and interrupt-delivered buffers are all attacker-controllable inputs in bootloader code, and CodeQL's default queries know none of them.

Dynamic Analysis

| Tool | EDK-II (16 bugs) | coreboot (8) | SeaBIOS (3) | GRUB (29) | shim (11) | U-Boot (12) | TF-A (11) |
|---|---|---|---|---|---|---|---|
| FuzzUEr | 1 TP / 0 FP | failed | failed | failed | failed | failed | failed |
| HBFA | 0 / 0 | failed | failed | failed | failed | failed | failed |
| efi_fuzz | 0 / 0 | failed | failed | failed | failed | failed | failed |
| BootFuzz | failed | failed | 0 / 0 | failed | failed | failed | failed |

FuzzUEr found 1 vulnerability: a memory safety bug in EDK-II. It missed the other 15 because it doesn’t support the relevant attack surfaces and couldn’t reconstruct the state required to reach deeper code paths.

HBFA stubs out driver functionality so that drivers can run in userspace. The USB_IO_PROTOCOL is fully stubbed: drivers that depend on it compile, but the protocol's functions do little more than return EFI_SUCCESS. That isn't fuzzing the driver; it's fuzzing a shadow of it.
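The mechanism is easy to see in miniature. A hypothetical C sketch (none of these names are HBFA or EDK-II APIs): a driver's parser only runs as deep as the bytes its transport delivers, and a stub that returns success without writing anything never delivers the fuzzer's input.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int status_t;               /* hypothetical stand-in for EFI_STATUS */
#define STATUS_SUCCESS 0

/* Hypothetical transport call: fill `buf` with a device descriptor. */
typedef status_t (*get_descriptor_fn)(uint8_t *buf, size_t len);

/* Stub in the style described above: reports success, writes nothing. */
static status_t stub_get_descriptor(uint8_t *buf, size_t len) {
    (void)buf; (void)len;
    return STATUS_SUCCESS;
}

/* Fuzz-backed transport: copies the fuzzer's bytes into the buffer. */
static const uint8_t *g_fuzz_data;
static size_t g_fuzz_len;
static status_t fuzz_get_descriptor(uint8_t *buf, size_t len) {
    memcpy(buf, g_fuzz_data, len < g_fuzz_len ? len : g_fuzz_len);
    return STATUS_SUCCESS;
}

/* Driver logic under test: returns how deep parsing got (-1 on transport error). */
static int driver_parse(get_descriptor_fn get) {
    uint8_t desc[8] = {0};
    if (get(desc, sizeof desc) != STATUS_SUCCESS) return -1;
    if (desc[0] == 0) return 0;      /* zero length field: parser bails out */
    return desc[1] == 0x01 ? 2 : 1;  /* deeper paths depend on input bytes */
}
```

With the stub, the parser sees an all-zero descriptor and exits at the first check no matter what the fuzzer generates.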

efi_fuzz fuzzes NVRAM variables exclusively (sas2), making it the most narrowly scoped tool in the evaluation.

OSS-Fuzz adoption: only EDK-II and systemd-boot participate, achieving 25.72% and 29.37% code coverage respectively. Both reflect generic rather than bootloader-specific fuzzing configurations.

SAST adoption more broadly: only 20.1% (9 of 43+) of open-source bootloaders integrate any general-purpose static analysis, and all of them run those tools in their default configuration, which prior work shows is largely ineffective without firmware-specific tuning.

Why Static Analysis Fails: The Four Hard Problems

The paper breaks down static analysis failure into four technical challenges worth understanding in detail.

SATC1: Points-to analysis. Bootloader code is pointer-heavy and manages memory through direct physical addresses rather than malloc/free. EDK-II in particular uses both data and function pointers extensively, with targets that depend on dynamic configuration rather than static structure. Standard points-to analysis techniques assume memory allocation APIs as anchor points. Those don’t exist here.
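A minimal sketch of why the anchors disappear, assuming a hypothetical pre-RAM window modeled as a static array instead of a fixed physical address: memory is handed out by bumping an offset, so every pointer comes from address arithmetic and there is no malloc call site for a points-to analysis to treat as an abstract object.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a fixed physical window (e.g. cache-as-RAM); size is hypothetical. */
static uint8_t g_region[4096];
static size_t g_top = 0;

/* Bump allocator: returns an address inside the window, aligned to `align`
   (a power of two). There is no free(); later stages may simply reinterpret
   the same addresses as different structures. */
static void *region_alloc(size_t size, size_t align) {
    uintptr_t base = (uintptr_t)g_region;
    uintptr_t p = (base + g_top + (align - 1)) & ~(uintptr_t)(align - 1);
    if (p + size > base + sizeof g_region) return NULL;
    g_top = (size_t)(p - base) + size;
    return (void *)p;
}
```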

SATC2: Taint source and sink identification. This is illustrated directly in the paper with a code example from EDK-II’s SMI handler interface. The interface accepts a VOID* communication buffer that gets cast to different concrete structures depending on which handler is invoked:

```c
// EDK-II SMI handler interface (adapted from Listing 1 in the paper)
#include <Uefi.h>

typedef struct { UINT32 KeyId; UINT8 Data[32]; } CREATE_KEY_COMM;   // 36 bytes
typedef struct { UINT32 KeyId; }                 DELETE_KEY_COMM;   // 4 bytes

EFI_STATUS EFIAPI CreateKeyHandler(EFI_HANDLE DispatchHandle,
    CONST VOID *Context, VOID *CommBuffer, UINTN *CommBufferSize)
{
    // VOID* recast: structure layout depends on which handler was dispatched
    CREATE_KEY_COMM *Req = (CREATE_KEY_COMM *)CommBuffer;
    // attacker controls CommBuffer contents; Req->KeyId and Req->Data are tainted
    (VOID)Req;  // placeholder: real handler logic would consume Req here
    return EFI_SUCCESS;
}

EFI_STATUS EFIAPI DeleteKeyHandler(EFI_HANDLE DispatchHandle,
    CONST VOID *Context, VOID *CommBuffer, UINTN *CommBufferSize)
{
    // same VOID* type, different concrete layout: 4 bytes vs 36 bytes
    DELETE_KEY_COMM *Req = (DELETE_KEY_COMM *)CommBuffer;
    // a fuzzer must know to send a different structure per handler
    (VOID)Req;  // placeholder: real handler logic would consume Req here
    return EFI_SUCCESS;
}
```

Even within a single attack surface, required fields differ across protocols. Fs->OpenVolume(Fs, &Root) and Rng->GetRNG(Rng, NULL, Size, Buf) are both sas4 protocol calls but have entirely different argument structures. No existing tool identifies these per-protocol layouts automatically; tools rely instead on manual annotations, protocol-specific heuristics, or fixed patterns.
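At minimum, a harness needs the per-handler layout that the VOID* erases. The C sketch below mirrors the Listing 1 structures with <stdint.h> types and pairs each handler with the CommBuffer size it actually reads; the table and lookup function are hypothetical, and building such a table automatically is exactly what current tools can't do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors of the Listing 1 structures, using standard fixed-width types. */
typedef struct { uint32_t KeyId; uint8_t Data[32]; } create_key_comm;
typedef struct { uint32_t KeyId; } delete_key_comm;

/* Hypothetical harness table: which concrete layout each handler expects. */
typedef struct {
    const char *handler;
    size_t comm_size;
} handler_layout;

static const handler_layout kLayouts[] = {
    { "CreateKeyHandler", sizeof(create_key_comm) },
    { "DeleteKeyHandler", sizeof(delete_key_comm) },
};

/* CommBuffer size the harness should generate for `handler`, or 0 if unknown. */
static size_t comm_size_for(const char *handler) {
    for (size_t i = 0; i < sizeof kLayouts / sizeof kLayouts[0]; i++)
        if (strcmp(kLayouts[i].handler, handler) == 0) return kLayouts[i].comm_size;
    return 0;
}
```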

The practical consequence is illustrated by BootHole (CVE-2020-10713). The vulnerability requires tracking attacker-controlled data from grub.cfg through the flex-generated parser:

```text
grub.cfg  (attacker writes this)
  └─► flex parser reads config
        └─► GRUB redefines YY_FATAL_ERROR() to return instead of abort
              └─► unchecked input reaches yy_flex_strncpy()
                    └─► heap buffer overflow → code execution + Secure Boot bypass
```

Detecting this requires inter-procedural, alias-aware taint tracking that models GRUB’s YY_FATAL_ERROR redefinition as a flow-altering sink modification. None of the evaluated tools handle it.

SATC3: Interrupt-driven control flow. Bootloader execution is not sequential. Hardware interrupts, firmware exceptions, and asynchronous callbacks introduce non-linear flows invisible in static call graphs. SMI handlers are the canonical example: they fire asynchronously and are invisible to analysis that doesn’t explicitly model SMM dispatch. Most tools handle protocol-based callbacks in limited ways but don’t model hardware interrupt delivery at all.
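A small dispatcher model shows why call graphs miss these flows. This is a hypothetical C sketch, not EDK-II's SMM dispatch API: handlers are stored in a table at runtime and selected by an index that, on real x86 hardware, would arrive with the interrupt (for software SMIs, a value written to an I/O port). No direct call edge from any caller to a handler ever appears in the source.

```c
#include <stddef.h>

#define MAX_HANDLERS 8

typedef int (*smi_handler_fn)(void *comm_buffer, size_t size);

static smi_handler_fn g_handlers[MAX_HANDLERS];

/* Runtime registration: the only place a handler's address is taken. */
static int register_handler(unsigned index, smi_handler_fn fn) {
    if (index >= MAX_HANDLERS) return -1;
    g_handlers[index] = fn;
    return 0;
}

/* Models interrupt delivery: `index` originates outside the program. */
static int dispatch_interrupt(unsigned index, void *comm_buffer, size_t size) {
    if (index >= MAX_HANDLERS || g_handlers[index] == NULL) return -1;
    return g_handlers[index](comm_buffer, size);   /* indirect call only */
}

/* Example handler: reports the buffer size it was given. */
static int echo_size_handler(void *comm_buffer, size_t size) {
    (void)comm_buffer;
    return (int)size;
}
```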

SATC4: Cross-stage state persistence. Bootloaders pass state across stages through NVRAM, system tables, and reserved memory regions. A taint originating in EDK-II's DXE phase (an NVRAM variable write) may be read later by GRUB. No tool models this. SPENDER and Exite handle SMM/SMRAM persistence. BootStomp modeled NVRAM in Android bootloaders. Full cross-stage tracking across different bootloaders in the same boot chain remains unsolved.
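The cross-stage case boils down to a write in one program and a read in another. A hypothetical C sketch in which one function stands in for a DXE driver setting an NVRAM variable and another for a later-stage loader reading it back; the store, names, and one-byte length field are invented for illustration. In a real boot chain the two sides are separately built binaries, which is why no single-program analysis connects them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical persistent store standing in for one NVRAM variable. */
static uint8_t g_nvram_value[256];
static size_t g_nvram_len = 0;

/* Stage A (e.g. a DXE driver): persists data an attacker may control. */
static void stage_a_set_variable(const uint8_t *data, size_t len) {
    if (len > sizeof g_nvram_value) len = sizeof g_nvram_value;
    memcpy(g_nvram_value, data, len);
    g_nvram_len = len;
}

/* Stage B (a later loader): the taint re-enters here. Byte 0 is treated as a
   length field, so it must be checked against the stored size and dst_size. */
static int stage_b_load_setting(uint8_t *dst, size_t dst_size) {
    if (g_nvram_len == 0) return -1;
    size_t claimed = g_nvram_value[0];
    if (claimed > g_nvram_len - 1 || claimed > dst_size) return -1;  /* reject */
    memcpy(dst, g_nvram_value + 1, claimed);
    return (int)claimed;
}
```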

Why Dynamic Analysis Fails: The Three Hard Problems

Execution environment. Bootloaders are hardware-dependent and interrupt-triggered. QEMU supports bootloader execution, but coverage is limited by driver emulation gaps. barebox supports 20 different SPI drivers across all configurations; only 5 work in QEMU. Differences in board-specific initialization sequences compound this: initializing permanent memory on an ARM mach-rockchip board takes 4 steps while Freescale imx8mm_evk requires 7 due to a different memory layout. No automated rehosting technique exists for COTS bootloaders.

Interface invocation. Even when the execution environment works, identifying which interfaces exist, triggering them correctly, and reconstructing the state they depend on is largely manual. In EDK-II, protocol drivers perform a DriverBindingSupported check before loading to confirm required hardware is present. Even when compiled, some protocols aren’t active at runtime. Once identified, interfaces like CreateKeyHandler (sas3) can be invoked externally, but others (certain SMM protocols) require firmware modification to reach. Most tools handle this through manual harness writing.
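The gating can be modeled in a few lines. This is a simplified, hypothetical C sketch of the UEFI driver-binding pattern: the real EFI_DRIVER_BINDING_PROTOCOL Supported() callback takes the protocol instance, a controller handle, and an optional remaining device path, whereas here it takes a fabricated controller record. The driver is started only when Supported() accepts a controller, so a harness that never presents a matching one never reaches the driver's protocol code, even though it was compiled in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical controller description a harness can fabricate. */
typedef struct {
    uint16_t vendor_id;
    uint8_t  class_code;     /* PCI-style base class; 0x0C = serial bus (incl. USB) */
} fake_controller;

typedef struct {
    bool (*supported)(const fake_controller *c);
    bool installed;          /* set once Start() would have installed the protocol */
} driver_binding;

/* Supported(): this model driver only claims serial-bus controllers. */
static bool usb_supported(const fake_controller *c) {
    return c->class_code == 0x0C;
}

/* Connect step: run Supported(), and "start" the driver only on success. */
static void connect_controller(driver_binding *drv, const fake_controller *c) {
    if (!drv->installed && drv->supported(c)) drv->installed = true;
}
```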

Bug oracles. Bootloaders in early stages (1–3) lack virtual memory protections, so memory corruption bugs often don’t crash anything observable. Standard sanitizers (ASan, UBSan) rely on virtual memory. The bare-metal KASAN implementation has the same limitations. FuzzUEr required a heavily customized ASan adaptation that only works in stage 3. Standard U-Boot has enough contiguous memory for ASan’s shadow mapping, but handling shadow and poisoned memory before permanent memory initialization in stage 2 is an unsolved engineering problem.
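ASan's shadow encoding itself is small; the hard part in early boot is where the shadow lives before permanent memory exists. A minimal C sketch of that encoding over a hypothetical 256-byte window addressed by offset (one shadow byte per 8-byte granule: 0 means all 8 bytes addressable, k in 1..7 means only the first k are, negative means poisoned). It is not FuzzUEr's adaptation, and real ASan computes the shadow address as (addr >> 3) + offset rather than indexing a local array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REGION_SIZE 256   /* hypothetical RAM window, addressed by offset */
#define GRANULE 8         /* 8 application bytes per shadow byte, as in ASan */

/* Shadow for the whole window. In early boot this storage has to exist
   before permanent memory does, which is the unsolved part. */
static int8_t g_shadow[REGION_SIZE / GRANULE];

/* Poison everything (e.g. at startup, or when memory is released). */
static void poison_all(void) {
    for (size_t i = 0; i < sizeof g_shadow; i++) g_shadow[i] = -1;
}

/* Mark [offset, offset + len) addressable; offset must be granule-aligned. */
static void unpoison(size_t offset, size_t len) {
    if (offset % GRANULE != 0 || offset + len > REGION_SIZE) return;
    size_t g = offset / GRANULE;
    for (; len >= GRANULE; len -= GRANULE) g_shadow[g++] = 0;
    if (len > 0) g_shadow[g] = (int8_t)len;   /* only first `len` bytes valid */
}

/* The check an instrumented load/store of `size` bytes would perform. */
static bool access_ok(size_t offset, size_t size) {
    if (offset + size > REGION_SIZE) return false;
    for (size_t a = offset; a < offset + size; a++) {
        int8_t s = g_shadow[a / GRANULE];
        if (s < 0 || (s > 0 && (int8_t)(a % GRANULE) >= s)) return false;
    }
    return true;
}
```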

Exploit Mitigations: None of the Standard Options Work

The paper checked adoption of 6 standard exploit mitigations across all bootloaders. The constraint is fundamental: most mitigations require an MMU, entropy sources, or virtual-memory-based compiler instrumentation, none of which exist in early boot.

| Mitigation | Requirement | Early Boot Feasibility | Adoption |
|---|---|---|---|
| ASLR | Entropy source, relocatable code, loader | Not feasible (flat address space, static linking) | Essentially absent |
| Stack Canaries | Entropy source | Feasible in later stages once entropy is available | Sparse |
| NX | MMU | Feasible after MMU initialization | Partial |
| PAC | ARM hardware + secure key material | Feasible on ARM once keys are initialized | ARM-only, limited |
| CFI | Compiler instrumentation, virtual memory | Incomplete toolchain support | Very limited |
| RELRO | ELF runtime linker | Bootloaders don't have an ELF runtime | Absent |

Privilege separation is similarly limited. Most bootloaders execute monolithically at the highest privilege level: x86 ring 0 or ARM EL3/EL2 with no component isolation. A single module compromise reaches everything in the boot chain. Partial exceptions exist in UEFI (SMM provides isolation) and ARM TF-A (TrustZone), but SPENDER showed that SMM compromise still endangers the full chain.

Vulnerability prevention techniques hit the same barriers. LowFat Pointers rely on virtual address partitioning, which isn't available in early boot. SoftBound and CETS impose runtime overhead exceeding 100%, which isn't viable in a boot context. A Checked C retrofit of EDK-II managed to instrument 86% of pointers but couldn't reach 100% due to bootloader-specific layout constraints. Memory-safe languages remain rare: two open-source bootloaders use Rust and one uses Go, and RustEDK-II enables Rust UEFI applications but not the bootloader core.

DECAF (automated debloating) is the most practically useful defense result: it removes up to 70% of UEFI code without breaking functionality, reducing attack surface rather than hardening the existing code.
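The debloating idea can be sketched as greedy removal guarded by a functional test. This is a hypothetical C illustration of the general approach, not DECAF's actual algorithm: drop one module at a time and keep the removal only if a boot test still passes. The toy oracle stands in for what would really be a full emulated boot plus a check.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_MODULES 6

/* Hypothetical boot oracle: true if the image boots with the modules marked
   in `enabled`. In practice, an emulator run plus a functional check. */
typedef bool (*boot_test_fn)(const bool enabled[N_MODULES]);

/* Greedy pass: disable each module; restore it if the boot test fails.
   Returns the number of modules removed. */
static size_t greedy_debloat(bool enabled[N_MODULES], boot_test_fn boots) {
    size_t removed = 0;
    for (size_t m = 0; m < N_MODULES; m++) {
        if (!enabled[m]) continue;
        enabled[m] = false;
        if (boots(enabled)) removed++;
        else enabled[m] = true;                  /* still needed: put it back */
    }
    return removed;
}

/* Toy oracle: modules 0 (core) and 3 (storage) are required to boot. */
static bool toy_boot_test(const bool enabled[N_MODULES]) {
    return enabled[0] && enabled[3];
}
```

Each trial costs a full boot, so the oracle, not the loop, dominates the cost of this kind of approach.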

Open Problems

The paper identifies 7 open problems. These are research-level gaps, not engineering backlogs:

| # | Problem | Type | Notes |
|---|---|---|---|
| OP1 | Automated taint source/sink identification in bootloaders | Technical | LLM-assisted static analysis (e.g., IRIS) cited as near-term direction |
| OP2 | Automated rehosting for COTS bootloaders | Engineering | Current tools need manual platform-specific support |
| OP3 | Automated input type identification for type 2/3 bootloaders | Technical | Only exists for specific type 1 surfaces |
| OP4 | Lightweight runtime checks without hardware protections | Engineering | Strong bug oracles under early-boot memory constraints |
| OP5 | Cross-surface and cross-stage analysis framework | Technical | Current tools treat each surface and stage in isolation |
| OP6 | Domain-specific exploit mitigations | Engineering | Current mitigations all assume OS primitives |
| OP7 | Automatic attack surface discovery across hardware and boot stages | Technical | Especially early-boot interfaces that aren't documented |

OP5 is the most fundamental: even in the controlled evaluation where vulnerabilities were injected across all 6 attack surfaces, existing tools only detected issues in sas1 and sas4. No existing framework reasons across attack surfaces and boot stages at the same time. The gaps are structural, not a matter of tuning.

Why This Matters

A kernel exploit gives you the machine until the next reboot. A bootkit gives you the machine. BlackLotus (2023) survived OS reinstallation; fully reformatting the disk was the remediation path for affected Windows systems on commodity hardware.

Set the 204 CVEs of 2024 against what today's tools can reach. RSFUZZER, one of the more capable EDK-II fuzzers, focuses exclusively on SMI handlers and can't find PixieFAIL or LogoFAIL. FuzzUEr focuses on boot-time interfaces and can't find LogoFAIL either. The tools are optimized for sas4 surfaces, where 45 known vulnerabilities exist, while sas1 and sas2, where the high-impact attacks actually live, have 28 known vulnerabilities between them and near-zero tool coverage.

BOOTBENCH is at github.com/BreakingBoot/oss-bootloaders. 3,658 vulnerabilities with commits and patch context across 43 bootloaders is the kind of dataset that usually takes a research group years to assemble. If you’re building firmware analysis tooling or doing UEFI exploitation research, it’s worth pulling.

References

  1. Glosner, C. & Machiry, A. (2026). SoK: All You Ever Wanted to Know About Bootloader Security But Were Afraid to Ask. IEEE S&P 2026.
  2. Glosner, C. & Machiry, A. (2025). FUZZUER: Enabling Fuzzing of UEFI Interfaces on EDK-2. NDSS 2025.
  3. Shafiuzzaman, M. et al. (2024). STASE: Static Analysis Guided Symbolic Execution for UEFI Vulnerability Signature Generation. ASE 2024.
  4. Falcon, F. & Arce, I. (2024). PixieFAIL: Nine Vulnerabilities in TianoCore’s EDK II IPv6 Network Stack. Quarkslab.
  5. Binarly Research Team. (2023). Finding LogoFAIL: The Dangers of Image Parsing During System Boot. Binarly.
  6. Eclypsium. (2020). BootHole (CVE-2020-10713): There’s a Hole in the Boot. Eclypsium Research.
  7. Eclypsium. (2020). TrickBoot: Persist, Brick, Profit. Eclypsium Research.
  8. Matrosov, A. (2023). The Untold Story of the BlackLotus UEFI Bootkit. Binarly.
  9. Redini, N. et al. (2017). BootStomp: On the Security of Bootloaders in Mobile Devices. USENIX Security 2017.
  10. Cherupattamoolayil, S. et al. (2025). Adding Spatial Memory Safety to EDK II through Checked C. ISSTA 2025.
  11. Christensen, J. et al. (2020). DECAF: Automatic, Adaptive De-bloating and Hardening of COTS Firmware. USENIX Security 2020.
  12. Sharma, A. et al. (2024). Rust for Embedded Systems: Current State and Open Problems. CCS 2024.
  13. Li, Z., Dutta, S. & Naik, M. (2025). IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. arXiv:2501.10793.