In the wake of devastating software supply chain attacks like SolarWinds and the XZ Utils backdoor, the industry has standardized on “receipts”: SBOMs (Software Bills of Materials) and SLSA (Supply-chain Levels for Software Artifacts). These frameworks tell us who built the software and what went inside it. They are essential for provenance, but they are fundamentally reactive: they can tell us that a breach happened, but only after the artifact has been signed and distributed.
A new paper by Syed et al. titled “Agentic AI for Autonomous Defense in Software Supply Chain Security” proposes a paradigm shift: moving from static verification to active, autonomous mitigation [1].
Instead of just checking a checklist, the authors propose a multi-agent AI system that lives inside the CI/CD pipeline, reasoning about code and vulnerabilities in real-time, and autonomously executing fixes.
As a security researcher focusing on AI, I find this approach compelling. It attempts to solve the “speed vs. safety” trade-off by using Agentic AI not just to detect, but to defend.
Here is a deep dive into the architecture and its implications.
The Limitation of Current Provenance
Current frameworks like SLSA and in-toto focus on integrity. They ensure that once a build finishes, the artifact hasn’t been tampered with. However, Syed et al. argue that these tools lack the ability to intervene during the build process, when a malicious dependency is injected or a configuration drifts [1].
If a malicious actor commits a backdoor, current provenance tools faithfully record that the malicious actor committed it. That is not defense; that is logging.
The Agentic AI Architecture
The proposed framework is built on Agentic AI: systems composed of autonomous agents that can perceive, reason, and act. The paper outlines a sophisticated architecture combining Large Language Models (LLMs) for reasoning and Reinforcement Learning (RL) for decision-making.
High-Level Architecture
The system is not a monolithic AI but a team of specialized agents communicating via LangChain and orchestrated by LangGraph. They interact with the real world (GitHub, Jenkins) via the Model Context Protocol (MCP).
Here is a simplified view of how these components interact:
+-----------------------------------------------------------------------+
| EXTERNAL CI/CD ENVIRONMENT |
| (GitHub Actions, Jenkins, Docker Registries, Dependency Repos) |
+---------------------------+---------------------------+---------------+
| ^
v |
+-----------------------------------------------------------------------+
| MODEL CONTEXT PROTOCOL (MCP) |
| (Standardized Interface for Tools & Data Access) |
+---------------------------+---------------------------+---------------+
| ^
v |
+-----------------------------------------------------------------------+
| AGENTIC AI CORE |
| |
| +----------------+ +------------------+ +-----------------------+ |
| | Code Analysis | | Dependency | | CI/CD Monitoring | |
| | Agent | | Intelligence | | Agent | |
| +-------+--------+ +--------+---------+ +-----------+-----------+ |
| | | | |
| +--------------------+------------------------+ |
| | |
| v |
| +-----------------------------------------------------------------+ |
| | MULTI-AGENT ORCHESTRATION (LangChain & LangGraph) | |
| | - Coordinates Handoffs | |
| | - Manages Execution Graphs | |
| +-----------------------------------+-----------------------------+ |
| | |
| v |
| +-----------------------------------------------------------------+ |
| | REASONING & LEARNING LAYER | |
| | +-------------------+ +-----------------------------+ | |
| | | LLM Engine | <--> | Reinforcement Learning (RL) | | |
| | | (Semantic | | (Adaptive Decision Policy) | | |
| | | Vulnerability | | | | |
| | | Analysis) | | | | |
| | +-------------------+ +-----------------------------+ | |
| +-----------------------------------+-----------------------------+ |
| | |
| v |
| +-----------------------------------------------------------------+ |
| | BLOCKCHAIN SECURITY LEDGER (Audit Trail) | |
| +-----------------------------------------------------------------+ |
+-----------------------------------------------------------------------+
1. The Agents (The Eyes and Hands)
The framework deploys specialized agents, each with a distinct role [1]:
- Code Analysis Agent: Scans commits and pull requests.
- Dependency Intelligence Agent: Evaluates third-party libraries against SBOMs.
- CI/CD Monitoring Agent: Watches pipeline logs for anomalies.
- Access Control Agent: Enforces permissions and secrets management.
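To make the orchestration concrete, here is a minimal sketch of how such agents might be wired together with LangGraph. The state schema, node names, and the naive “block on any finding” triage are my own illustrative assumptions; the paper’s actual handoff logic (and its RL-driven decision step) is richer.

```python
# Minimal multi-agent pipeline graph (illustrative sketch, not the paper's code).
from typing import List, TypedDict

from langgraph.graph import END, StateGraph


class PipelineState(TypedDict):
    commit_diff: str     # raw diff under review
    findings: List[str]  # accumulated security findings
    verdict: str         # "allow" or "block"


def code_analysis_agent(state: PipelineState) -> PipelineState:
    # Stub: a real agent would ask an LLM to semantically analyze the diff.
    if "pickle.loads" in state["commit_diff"]:
        state["findings"].append("possible insecure deserialization")
    return state


def dependency_agent(state: PipelineState) -> PipelineState:
    # Stub: a real agent would cross-check new dependencies against the SBOM.
    return state


def decision_agent(state: PipelineState) -> PipelineState:
    # Stub: the paper places an RL policy here; this placeholder simply
    # blocks whenever any upstream agent reported a finding.
    state["verdict"] = "block" if state["findings"] else "allow"
    return state


graph = StateGraph(PipelineState)
graph.add_node("code_analysis", code_analysis_agent)
graph.add_node("dependencies", dependency_agent)
graph.add_node("decide", decision_agent)
graph.set_entry_point("code_analysis")
graph.add_edge("code_analysis", "dependencies")
graph.add_edge("dependencies", "decide")
graph.add_edge("decide", END)
app = graph.compile()

result = app.invoke({"commit_diff": "data = pickle.loads(blob)",
                     "findings": [], "verdict": ""})
print(result["verdict"])  # -> "block"
```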
2. The Brain: LLMs + Reinforcement Learning
This is where the magic happens. The system doesn’t just use regex to match known bad patterns, the way a standard SAST tool does.
- LLMs (The Reasoner): They perform semantic analysis. For example, they can understand that a piece of code performing insecure deserialization is dangerous, even if the syntax varies. They use Chain-of-Thought prompting to generate explainable security rationales (see the prompt sketch after this list).
- RL (The Decision Maker): The LLM identifies what is wrong, but the RL agent decides what to do.
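The paper does not publish its prompts, so the following Chain-of-Thought review prompt is purely hypothetical, as is the llm_complete callable standing in for whatever chat-completion client the Code Analysis Agent uses:

```python
# Hypothetical Chain-of-Thought security-review prompt (not from the paper).
def build_review_prompt(diff: str) -> str:
    return (
        "You are a software supply chain security reviewer.\n"
        "Analyze the following diff step by step:\n"
        "1. Summarize what the change does.\n"
        "2. List any security-relevant behaviors (deserialization, network\n"
        "   calls, credential access, dynamic code execution).\n"
        "3. Conclude with a verdict (SAFE, SUSPICIOUS, or MALICIOUS) and a\n"
        "   one-paragraph rationale.\n\n"
        f"Diff:\n{diff}"
    )


def analyze_commit(diff: str, llm_complete) -> str:
    """Return the model's step-by-step rationale and verdict.

    llm_complete is a placeholder: any function that takes a prompt
    string and returns the model's completion.
    """
    return llm_complete(build_review_prompt(diff))
```

The step-by-step structure is what yields the “explainable security rationale”: the verdict arrives together with the reasoning that produced it.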
The RL agent treats the pipeline as a Markov Decision Process (MDP). It learns a policy $\pi^*$ to maximize a reward function that balances security effectiveness against operational overhead [1].
The Reward Function: \(R_t = \alpha \cdot \mathbb{1}_{\text{attack mitigated}} - \beta \cdot \mathbb{1}_{\text{false positive}} - \delta \cdot \Delta t_{\text{build}} + \eta \cdot \mathbb{1}_{\text{developer-accepted fix}}\)
This is crucial. The AI learns not to block every build (which annoys developers) but to block only the high-risk ones, maximizing the “accepted fix” reward.
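Translated into code, the reward computation is a one-liner over four signals. The weights below are placeholders of my choosing; the paper defines the structure of $R_t$, not these particular constants:

```python
# Sketch of the paper's reward R_t with illustrative placeholder weights.
from dataclasses import dataclass


@dataclass
class RewardWeights:
    alpha: float = 1.0   # reward for mitigating an attack
    beta: float = 0.5    # penalty for a false positive
    delta: float = 0.01  # penalty per second of added build time
    eta: float = 0.3     # reward when the developer accepts the fix


def reward(w: RewardWeights, attack_mitigated: bool, false_positive: bool,
           build_delay_s: float, fix_accepted: bool) -> float:
    """R_t: security wins minus developer friction (bools coerce to 0/1)."""
    return (w.alpha * attack_mitigated
            - w.beta * false_positive
            - w.delta * build_delay_s
            + w.eta * fix_accepted)
```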
The Defense Loop
The paper details a continuous cycle of activity. Unlike static scans that happen once, this loop runs throughout the build lifecycle.
+------------+
| OBSERVE | <--- (Monitor CI/CD indicators)
+------------+
|
v
+------------+
| REASON | <--- (LLM performs semantic analysis)
+------------+
|
v
+------------+ (Action: Block, Quarantine, Fix)
| ACT | ----------------------------------------->+
+------------+ |
| |
v |
+------------+ |
| VERIFY | <--- (Feedback loop to RL Agent) |
+------------+ |
| |
+---------------------------------------------------+
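Stripped of the agent internals, the loop reduces to a handful of lines. The five callables are placeholders for the paper’s components (the function names are mine):

```python
# Skeleton of the OBSERVE -> REASON -> ACT -> VERIFY cycle (illustrative).
import time


def defense_loop(observe, reason, act, verify, update_policy,
                 interval_s: float = 30.0) -> None:
    """Run the continuous defense cycle against a live pipeline.

    observe pulls CI/CD indicators, reason runs the LLM's semantic
    analysis, act applies a mitigation (block, quarantine, fix),
    verify checks the outcome, and update_policy feeds the result
    back to the RL agent as a reward signal.
    """
    while True:
        signals = observe()
        assessment = reason(signals)
        action = act(assessment)
        outcome = verify(action)
        update_policy(assessment, action, outcome)
        time.sleep(interval_s)
```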
Every step of this loop (the observation, the LLM’s reasoning, and the final action) is recorded in an immutable Blockchain Security Ledger. This addresses the “Black Box” problem of AI: we can audit why the AI decided to kill a build process [1].
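The paper does not detail the ledger’s implementation, but the property that matters here (tamper evidence) comes down to hash chaining. A minimal sketch, assuming each record links to the hash of its predecessor:

```python
# Hash-chained audit record: a minimal stand-in for the security ledger.
import hashlib
import json
import time


def append_audit_record(chain: list, observation: str,
                        rationale: str, action: str) -> dict:
    """Append a tamper-evident record linking back to the previous entry.

    A real blockchain ledger adds consensus and replication on top;
    this sketch shows only the chaining that makes the trail auditable:
    altering any past record invalidates every hash that follows it.
    """
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"ts": time.time(), "observation": observation,
            "rationale": rationale, "action": action, "prev": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return body
```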
Experimental Results & Implications
The authors tested this framework against rule-based systems and “RL-only” baselines using simulated and real GitHub Actions/Jenkins pipelines. The results are notable:
- Detection Accuracy: The LLM-based reasoning significantly improved recall on semantic vulnerabilities (like insecure deserialization) compared to static rule-sets.
- Mitigation Latency: The autonomous agents reduced the time-to-mitigation by acting immediately, rather than waiting for human triage.
- Overhead: The build time increase was kept under 6%, which is a reasonable tax for autonomous security.
Researcher’s Critical Take
While promising, the introduction of autonomous agents into the supply chain introduces new attack vectors:
- LLM Hallucinations: An LLM might mistakenly flag safe code as an “Injection Attack” (False Positive) or, worse, fail to detect a novel polymorphic attack (False Negative).
- Prompt Injection: If an attacker can manipulate the build artifacts or logs that the LLM reads, they might attempt to jailbreak the Code Analysis Agent into approving malicious code.
- RL Adversaries: A sophisticated attacker could try to poison the RL environment, teaching the agent that allowing a specific type of malicious behavior is rewarding (Data Poisoning).
However, the use of a Blockchain Ledger for logging and Multi-Agent Consensus (where agents cross-check each other) acts as a strong mitigation against these risks.
Conclusion
The paper by Syed et al. provides a glimpse into the future of DevSecOps: a shift from “Verifying Integrity” to “Enforcing Integrity.”
By combining the semantic understanding of LLMs with the adaptive decision-making of Reinforcement Learning, we move towards “Self-Defending Supply Chains.” The days of simply reading an SBOM and hoping for the best may be numbered. The future belongs to the agents that can defend the code while it is being written.
References
[1] Syed, T. A., Belgaum, M. R., Jan, S., Khan, A. A., & Alqahtani, S. S. (2025). Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation. arXiv preprint arXiv:2512.23480.