Meet Co-RedTeam: How Multi-Agent AI is Automating Red Teaming

In the modern world of cybersecurity, red teaming—the practice of proactively attacking systems to find vulnerabilities—is essential. However, it is also notoriously difficult. It requires deep domain expertise, patience, and the ability to reason across massive, complex codebases.

While Large Language Models (LLMs) have shown promise in writing code and reasoning, they often struggle with the rigorous demands of security testing. They lack execution grounding (they guess instead of testing) and fail to learn from past mistakes.

But what if we didn’t just use one AI agent? What if we built a team of specialists?

Researchers from Google Cloud AI Research, Google, and Michigan State University have introduced Co-RedTeam, a security-aware multi-agent framework designed to mirror real-world red-teaming workflows. Let’s dive into how it works and why it outperforms existing methods.

The Problem: Why LLMs Struggle with Security

Current approaches to automated red teaming often fall short due to three main limitations:

Limited Interaction: Single-agent systems struggle to coordinate the complex, multi-step workflows required for real-world hacking.
Weak Execution Grounding: Many systems rely on static analysis, trying to find bugs without running the code. This leads to false positives.
No Experience Reuse: The system starts from scratch every time, failing to learn patterns from previous vulnerabilities.

Co-RedTeam solves these issues by introducing an Orchestrator that coordinates two distinct stages: Vulnerability Discovery and Iterative Exploitation, all backed by a long-term memory.

Stage 1: Vulnerability Discovery

Before an AI can hack a system, it needs to know what to hack. Co-RedTeam handles this through a collaborative debate between two agents: the Analysis Agent and the Critique Agent.

The Workflow

Analysis Agent: This agent browses the code using specialized tools. It doesn’t just look at code snippets; it grounds its reasoning in established security standards like CWE (Common Weakness Enumeration) and OWASP Top 10. It identifies suspicious code patterns and drafts a hypothesis.
Critique Agent: Acting as a peer reviewer, this agent checks the hypothesis. Is the evidence concrete? Is the risk level accurate? If the hypothesis is weak, it is rejected or sent back for refinement.

+----------------+           +-------------------+
|  Target        |           | Security Docs     |
|  Codebase      |           | (CWE, OWASP)      |
+-------+--------+           +---------+---------+
        ^                              ^
        |                              |
        | (Browses Files)              | (Retrieves Context)
        |                              |
+-------+--------+           +---------+---------+
| Analysis       | --------> | Critique Agent   |
| Agent          | (Draft)   | (Validates)      |
+----------------+           +-------------------+
        |                              ^
        | (Refined Hypotheses)         |
        v                              |
  Validated Vulnerability Candidates -+

This loop continues until a reliable list of potential vulnerabilities is generated, complete with file paths, line numbers, and risk ratings.

Stage 2: Iterative Exploitation

Finding a bug is only half the battle. Proving it requires execution. This stage is where Co-RedTeam truly shines, utilizing a closed-loop system involving three agents.

The Team

Planner: Decomposes the vulnerability into a multi-step plan (e.g., Set up environment -> Craft payload -> Send request).
Validation Agent: A safety gate that checks if the planned commands are safe and syntactically correct before execution.
Execution Agent: Runs the actual code in an isolated Docker environment.
Evaluation Agent: Analyzes the output. Did the code crash? Did we get a shell?

The Loop

The magic happens here: The Evaluation agent feeds the results back to the Planner. If the exploit fails, the Planner updates the plan, modifies the payload, and tries again. This prevents the system from getting stuck in infinite loops of bad commands.

      +-------------------+
      |   Long-Term       |<---(Retrieve Experience)
      |   Memory          |------+
      +-------------------+      |
            ^                     |
            |                     |
            v                     |
      +-------------------+      | (Updated Plan)
      |    Planner        |<-----+
      | (Plan & Refine)   |
      +-------------------+
            |
      (Action) |
            v
      +-------------------+
      |   Validation      |
      |  Agent (Gate)     |
      +-------------------+
            |
      (Safe?) |
            v
      +-------------------+
      |   Execution       | (Isolated Docker)
      |   Agent           |
      +-------------------+
            |
      (Result) |
            v
      +-------------------+
      |   Evaluation      |
      |   (Success/Fail)  |
      +-------------------+
            |
            +-----> Planner (Update Strategy)

The “Brain”: Layered Long-Term Memory

Unlike static tools, Co-RedTeam learns. It utilizes a layered memory system to store experience from previous attacks:

Vulnerability Pattern Memory: Stores abstract patterns of bugs (e.g., “When function X is combined with flag Y, it becomes dangerous”).
Strategy Memory: Remembers high-level strategies (e.g., “Always check the configuration file first”).
Technical Action Memory: Records specific commands or scripts that worked (or failed) in the past.

This allows the system to improve over time. As seen in the paper, the system’s success rate increases as it processes more tasks, particularly when initialized with “warm” security knowledge.

Performance: Does It Work?

The researchers evaluated Co-RedTeam against strong baselines—including Vanilla LLMs, generic coding agents like OpenHands, and specialized security agents like RepoAudit and C-Agent—using benchmarks like CyBench, BountyBench, and CyberGym.

Key Results

CyBench (Exploitation): Co-RedTeam (backed by Gemini 3 Pro) achieved a 63.7% success rate, significantly outperforming the strongest baseline (C-Agent) at 47.8%.
BountyBench (Detection): It achieved a detection accuracy of 20%, an improvement of over 10% in absolute terms compared to baselines.
CyberGym (PoC Exploits): It achieved a 37.3% success rate in generating working proof-of-concept exploits.

Ablation Studies: What matters most?

The researchers removed components of Co-RedTeam to see which features were critical:

Removing Execution Feedback: Performance crashed. This confirms that static analysis alone is insufficient for real-world hacking.
Removing Memory: Success rates dropped, particularly on complex tasks, proving the value of experience reuse.
Removing Validation: The system wasted time on malformed commands, reducing overall efficiency.

Despite its complex architecture, Co-RedTeam is surprisingly efficient, often running faster than generic agents like OpenHands because it avoids fruitless loops of invalid code execution.

Conclusion

Co-RedTeam represents a significant step forward in automated cybersecurity. By moving away from “single-shot” prompts and toward a multi-agent, execution-grounded system with memory, it bridges the gap between AI reasoning and practical red teaming.

It demonstrates that the future of AI security isn’t just about having a smarter model; it’s about building a smarter team.

References

Paper: He, P., Fox, A., Miculicich, L., et al. (2025). Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents. arXiv preprint arXiv:2602.02164.
Benchmarks Used:
- CyBench: A framework for evaluating cybersecurity capabilities of LLMs (Zhang et al., 2024).
- BountyBench: Dollar impact of AI agent attackers and defenders on real-world systems (Zhang et al., 2025a).
- CyberGym: Evaluating AI agents’ cybersecurity capabilities with real-world vulnerabilities at scale (Wang et al., 2025).
Standards:
- CWE (Common Weakness Enumeration): MITRE Corporation.
- OWASP Top 10: OWASP Foundation.