In the wake of devastating software supply chain attacks like SolarWinds and the XZ Utils backdoor, the industry has standardized on “receipts”: SBOMs (Software Bills of Materials) and SLSA (Supply-chain Levels for Software Artifacts). These frameworks tell us who built the software and what went into it. They are essential for provenance, but they are fundamentally reactive: they tell us that a breach happened only after the artifact has been signed and distributed.

A new paper by Syed et al. titled “Agentic AI for Autonomous Defense in Software Supply Chain Security” proposes a paradigm shift: moving from static verification to active, autonomous mitigation [1].

Instead of just ticking boxes on a checklist, the authors propose a multi-agent AI system that lives inside the CI/CD pipeline, reasons about code and vulnerabilities in real time, and autonomously executes fixes.

As a security researcher focusing on AI, I find this approach compelling. It attempts to solve the “speed vs. safety” trade-off by using Agentic AI not just to detect, but to defend.

Here is a deep dive into the architecture and its implications.

The Limitation of Current Provenance

Current frameworks like SLSA and in-toto focus on integrity. They ensure that once a build finishes, the artifact hasn’t been tampered with. However, Syed et al. argue that these tools lack the ability to intervene during the build process when a malicious dependency is injected or a configuration drifts [1].

If a malicious actor commits a backdoor, current provenance tools faithfully record that the malicious actor committed it. That is not defense; that is logging.

The Agentic AI Architecture

The proposed framework is built on Agentic AI: systems composed of autonomous agents that can perceive, reason, and act. The paper outlines a sophisticated architecture that combines Large Language Models (LLMs) for reasoning with Reinforcement Learning (RL) for decision-making.

High-Level Architecture

The system is not a monolithic AI but a team of specialized agents communicating via LangChain and orchestrated by LangGraph. They interact with the real world (GitHub, Jenkins) via the Model Context Protocol (MCP).
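The paper does not publish code for its MCP layer, so here is a hypothetical sketch of what a single CI/CD action exposed over MCP could look like, using the FastMCP helper from the official Python SDK. The tool name and behavior are my own assumptions for illustration:

```python
# Hypothetical MCP tool server exposing one CI/CD action to the agents.
# Uses FastMCP from the official "mcp" Python SDK; the tool itself
# (quarantine_artifact) is an illustrative assumption, not the paper's code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cicd-tools")


@mcp.tool()
def quarantine_artifact(artifact_digest: str) -> str:
    """Move a suspect artifact out of the release channel."""
    # A real implementation would call the registry's API here.
    return f"quarantined {artifact_digest}"


if __name__ == "__main__":
    mcp.run()
```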

Here is a simplified view of how these components interact:

+-----------------------------------------------------------------------+
|                   EXTERNAL CI/CD ENVIRONMENT                          |
|   (GitHub Actions, Jenkins, Docker Registries, Dependency Repos)      |
+---------------------------+---------------------------+---------------+
                            |                           ^
                            v                           |
+-----------------------------------------------------------------------+
|                    MODEL CONTEXT PROTOCOL (MCP)                       |
|         (Standardized Interface for Tools & Data Access)              |
+---------------------------+---------------------------+---------------+
                            |                           ^
                            v                           |
+-----------------------------------------------------------------------+
|                       AGENTIC AI CORE                                 |
|                                                                       |
|  +----------------+  +------------------+  +-----------------------+  |
|  | Code Analysis  |  |   Dependency     |  |  CI/CD Monitoring     |  |
|  |    Agent       |  |  Intelligence    |  |       Agent           |  |
|  +-------+--------+  +--------+---------+  +-----------+-----------+  |
|          |                    |                        |              |
|          +--------------------+------------------------+              |
|                               |                                       |
|                               v                                       |
|  +-----------------------------------------------------------------+  |
|  |     MULTI-AGENT ORCHESTRATION (LangChain & LangGraph)           |  |
|  |       - Coordinates Handoffs                                    |  |
|  |       - Manages Execution Graphs                                |  |
|  +-----------------------------------+-----------------------------+  |
|                                      |                                |
|                                      v                                |
|  +-----------------------------------------------------------------+  |
|  |              REASONING & LEARNING LAYER                         |  |
|  |  +-------------------+       +-----------------------------+    |  |
|  |  | LLM Engine        | <-->  | Reinforcement Learning (RL) |    |  |
|  |  | (Semantic         |       | (Adaptive Decision Policy)  |    |  |
|  |  |  Vulnerability    |       |                             |    |  |
|  |  |  Analysis)        |       |                             |    |  |
|  |  +-------------------+       +-----------------------------+    |  |
|  +-----------------------------------+-----------------------------+  |
|                                      |                                |
|                                      v                                |
|  +-----------------------------------------------------------------+  |
|  |          BLOCKCHAIN SECURITY LEDGER (Audit Trail)               |  |
|  +-----------------------------------------------------------------+  |
+-----------------------------------------------------------------------+
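To make the orchestration layer concrete, here is a minimal sketch of how such an agent graph might be wired with LangGraph. The state schema, node names, and canned findings are my own illustrative assumptions, not the authors' implementation; real nodes would invoke LLMs and MCP tools like the one sketched above.

```python
# Illustrative multi-agent orchestration graph (LangGraph). All state fields,
# node logic, and routing are assumptions, not the paper's implementation.
from typing import List, TypedDict

from langgraph.graph import END, START, StateGraph


class PipelineState(TypedDict):
    commit_sha: str      # build under inspection
    findings: List[str]  # accumulated agent observations
    verdict: str         # "allow" | "block" | "quarantine"


def code_analysis_agent(state: PipelineState) -> dict:
    # In the real system this would be an LLM call, fetching the diff via MCP.
    return {"findings": state["findings"] + ["code: no obvious backdoor"]}


def dependency_intelligence_agent(state: PipelineState) -> dict:
    # Would query dependency repos and advisories through MCP tool servers.
    return {"findings": state["findings"] + ["deps: 1 package flagged"]}


def orchestrator(state: PipelineState) -> dict:
    # Stand-in for the RL-driven decision policy described in the paper.
    risky = any("flagged" in f for f in state["findings"])
    return {"verdict": "quarantine" if risky else "allow"}


graph = StateGraph(PipelineState)
graph.add_node("code_analysis", code_analysis_agent)
graph.add_node("dependency_intel", dependency_intelligence_agent)
graph.add_node("orchestrate", orchestrator)

graph.add_edge(START, "code_analysis")
graph.add_edge("code_analysis", "dependency_intel")
graph.add_edge("dependency_intel", "orchestrate")
graph.add_edge("orchestrate", END)

app = graph.compile()
result = app.invoke({"commit_sha": "abc123", "findings": [], "verdict": ""})
print(result["verdict"])  # "quarantine"
```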

1. The Agents (The Eyes and Hands)

The framework deploys specialized agents, each with a distinct role [1]:

- Code Analysis Agent: inspects commits and diffs for suspicious or semantically malicious changes.
- Dependency Intelligence Agent: vets third-party packages and flags injected or compromised dependencies.
- CI/CD Monitoring Agent: watches pipeline events and build configurations for drift or tampering.

2. The Brain: LLMs + Reinforcement Learning

This is where the magic happens. The system doesn’t just use regex to look for known bad patterns (like a standard SAST tool).

The RL agent treats the pipeline as a Markov Decision Process (MDP). It learns a policy $\pi^*$ to maximize a reward function that balances security effectiveness against operational overhead [1].

The Reward Function: $R_t = \alpha \cdot \mathbb{1}_{\text{attack mitigated}} - \beta \cdot \mathbb{1}_{\text{false positive}} - \delta \cdot \Delta t_{\text{build}} + \eta \cdot \mathbb{1}_{\text{developer-accepted fix}}$

This is crucial: the agent is penalized for false positives ($\beta$) and for slowing the build ($\delta$), so it learns not to block every build (which annoys developers) but only the high-risk ones, while the $\eta$ term rewards fixes that developers actually accept.
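As a quick illustration, here is that reward computed in code. The paper defines the structure of $R_t$, but the weight values below are arbitrary placeholders I chose, not the authors' tuned parameters:

```python
# Illustrative computation of the reward R_t. Weight values are placeholder
# assumptions, not the authors' tuned parameters.
from dataclasses import dataclass


@dataclass
class RewardWeights:
    alpha: float = 1.0   # reward for a mitigated attack
    beta: float = 0.5    # penalty for a false positive
    delta: float = 0.01  # penalty per second of added build time
    eta: float = 0.3     # reward when a developer accepts the proposed fix


def reward(attack_mitigated: bool, false_positive: bool,
           build_delay_s: float, fix_accepted: bool,
           w: RewardWeights = RewardWeights()) -> float:
    """R_t = a*1[mitigated] - b*1[false positive] - d*dt_build + e*1[fix accepted]"""
    return (w.alpha * attack_mitigated
            - w.beta * false_positive
            - w.delta * build_delay_s
            + w.eta * fix_accepted)


# A true positive that adds 30s to the build and whose fix is accepted:
print(reward(True, False, 30.0, True))   # 1.0 - 0.3 + 0.3 = 1.0
# A false positive that blocks a clean build for 30s:
print(reward(False, True, 30.0, False))  # -0.5 - 0.3 = -0.8
```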

The Defense Loop

The paper details a continuous cycle of activity. Unlike static scans that happen once, this loop runs throughout the build lifecycle.

     +------------+
     |   OBSERVE  | <--- (Monitor CI/CD indicators)
     +------------+
           |
           v
     +------------+
     |  REASON    | <--- (LLM performs semantic analysis)
     +------------+
           |
           v
     +------------+       (Action: Block, Quarantine, Fix)
     |    ACT     | ----------------------------------------->+
     +------------+                                            |
           |                                                   |
           v                                                   |
     +------------+                                            |
     |   VERIFY   | <--- (Feedback loop to RL Agent)           |
     +------------+                                            |
           |                                                   |
           +---------------------------------------------------+
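
Here is a minimal, self-contained sketch of that cycle, with a stub standing in for the real CI/CD connection. All function bodies are illustrative assumptions; real agents would call LLMs and drive GitHub Actions or Jenkins through MCP tools.

```python
# Skeleton of the observe -> reason -> act -> verify loop. The FakePipeline
# stub and all function bodies are illustrative assumptions, not the
# paper's code.
import time


class FakePipeline:
    """Stand-in for a real CI/CD connection (GitHub Actions, Jenkins)."""
    def poll_events(self):
        return {"type": "dependency_added", "verified": False}

    def quarantine_artifact(self):
        print("artifact quarantined")

    def is_healthy(self):
        return True


def observe(pipeline):
    # OBSERVE: monitor CI/CD indicators (commits, deps, config drift).
    return pipeline.poll_events()


def reason(event):
    # REASON: the LLM would perform semantic analysis of the event here.
    risky = event.get("type") == "dependency_added" and not event.get("verified")
    return {"risk": "high" if risky else "low"}


def act(assessment, pipeline):
    # ACT: block, quarantine, or fix, per the RL policy's decision.
    if assessment["risk"] == "high":
        pipeline.quarantine_artifact()
        return "quarantine"
    return "allow"


def verify(action, pipeline):
    # VERIFY: the outcome feeds back into the RL agent's reward.
    return pipeline.is_healthy()


def defense_loop(pipeline, cycles=1, interval_s=0.0):
    for _ in range(cycles):
        event = observe(pipeline)
        action = act(reason(event), pipeline)
        verify(action, pipeline)  # result would update the RL policy
        time.sleep(interval_s)


defense_loop(FakePipeline())  # prints: artifact quarantined
```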

Every step of this loop (the observation, the LLM’s reasoning, and the final action) is recorded in an immutable Blockchain Security Ledger. This addresses the “Black Box” problem of AI: we can audit why the AI decided to kill a build process [1].
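The paper does not publish the ledger’s data model, but the tamper-evidence it relies on can be illustrated with a simple hash chain, where each entry commits to the hash of its predecessor. The entry schema here is my own assumption:

```python
# Minimal hash-chained audit log illustrating the tamper-evidence property
# of the Blockchain Security Ledger. The entry schema is an assumption;
# the paper does not publish its ledger format.
import hashlib
import json

ledger = []


def append_entry(observation: str, reasoning: str, action: str) -> dict:
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = {"observation": observation, "reasoning": reasoning,
            "action": action, "prev_hash": prev_hash}
    # The hash covers the entry plus its predecessor's hash, chaining them.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    ledger.append(body)
    return body


def verify_chain() -> bool:
    prev = "0" * 64
    for entry in ledger:
        unhashed = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(unhashed, sort_keys=True).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != recomputed:
            return False
        prev = entry["hash"]
    return True


append_entry("unverified dependency added", "LLM: likely typosquat", "quarantine")
append_entry("build resumed after fix", "LLM: dependency pinned", "allow")
print(verify_chain())            # True
ledger[0]["action"] = "allow"    # tamper with history...
print(verify_chain())            # False: the chain detects it
```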

Experimental Results & Implications

The authors tested this framework against rule-based systems and “RL-only” baselines using simulated and real GitHub Actions/Jenkins pipelines. The results are notable:

  1. Detection Accuracy: The LLM-based reasoning significantly improved recall on semantic vulnerabilities (like insecure deserialization) compared to static rule-sets.
  2. Mitigation Latency: The autonomous agents reduced the time-to-mitigation by acting immediately, rather than waiting for human triage.
  3. Overhead: The build time increase was kept under 6%, which is a reasonable tax for autonomous security.

Researcher’s Critical Take

While promising, the introduction of autonomous agents into the supply chain creates new attack vectors of its own. An agent with write access to the pipeline is itself a high-value target: an adversary who can manipulate the inputs the LLM reasons over (prompt injection) or poison the signals the RL policy learns from effectively controls the pipeline’s immune system.

However, the use of a Blockchain Ledger for logging and Multi-Agent Consensus (where agents cross-check each other) acts as a strong mitigation against these risks.

Conclusion

The paper by Syed et al. provides a glimpse into the future of DevSecOps: a shift from “Verifying Integrity” to “Enforcing Integrity.”

By combining the semantic understanding of LLMs with the adaptive decision-making of Reinforcement Learning, we move towards “Self-Defending Supply Chains.” The days of simply reading an SBOM and hoping for the best may be numbered. The future belongs to the agents that can defend the code while it is being written.


References

[1] Syed, T. A., Belgaum, M. R., Jan, S., Khan, A. A., & Alqahtani, S. S. (2025). Agentic AI for Autonomous Defense in Software Supply Chain Security: Beyond Provenance to Vulnerability Mitigation. arXiv preprint arXiv:2512.23480.