For decades, internet users have relied on a comforting shield known as “practical obscurity.” The idea is simple: while you could theoretically be identified by your zip code, birth date, and movie ratings (as famously demonstrated in the Netflix Prize study), actually doing so requires structured data and expensive, manual detective work.

Most of us assume that our pseudonymous Reddit throwaway account or our Hacker News handle is safe because no one has the time or money to manually sift through thousands of posts to find a clue.

That era is over.

A groundbreaking new paper titled “Large-scale online deanonymization with LLMs” by Lermen et al. (2026) demonstrates that Large Language Models (LLMs) have fundamentally broken this shield. They can now automate the process of linking anonymous profiles to real-world identities with frightening precision.


The Old World vs. The LLM World

In the past, deanonymization attacks (like the Netflix/IMDb linkage) relied on structured data—rows of numbers, dates, and fixed attributes.

Today, LLMs can process unstructured text. They don’t need a spreadsheet; they read your comments, analyze your writing style, infer your demographics, and connect the dots.

The Attack Pipeline: How It Works

The researchers developed a framework called ESRC to systematize this attack. It stands for Extract, Search, Reason, Calibrate.

Here is a visualization of how an AI agent turns a random forum post into a real name:

      [ Unstructured Text Input ] 
                |
         (User Comments/Bio)
      "I hate Python, love Rust."
      "Working at a startup in Berlin."
                |
                v
    +-------------------------+
    |  1. EXTRACT (The LLM)   | -> Features: 
    +-------------------------+    - Location: Berlin
                                   - Role: Developer
                                   - Interests: Rust, Systems
                |
                v
    +-------------------------+
    |  2. SEARCH (Embeddings) | -> Vector Search against 
    +-------------------------+    Database of 100k profiles
                |
                v
    +-------------------------+
    |  3. REASON (The Agent)  | -> "Candidate A lives in Munich.
    +-------------------------+    Candidate B is in Berlin.
                                   B is the match."
                |
                v
    +-------------------------+
    |  4. CALIBRATE           | -> Output: Match with 94% confidence
    +-------------------------+
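The Extract stage in the diagram above can be sketched in miniature. The following is my own toy illustration, not code from the paper: crude keyword rules stand in for the LLM, and the hint tables are invented for the example.

```python
# Toy illustration of the EXTRACT step. In the paper an LLM does this;
# here simple keyword rules stand in for it so the sketch is runnable.

CITY_HINTS = {"berlin": "Berlin", "munich": "Munich", "london": "London"}
TECH_HINTS = {"rust": "Rust", "python": "Python", "go": "Go"}

def extract_features(comments):
    """Distill free-text comments into a structured feature dict."""
    features = {"location": None, "interests": set()}
    for comment in comments:
        text = comment.lower()
        for hint, city in CITY_HINTS.items():
            if hint in text:
                features["location"] = city
        for hint, tech in TECH_HINTS.items():
            if hint in text:
                features["interests"].add(tech)
    return features

profile = [
    "I hate Python, love Rust.",
    "Working at a startup in Berlin.",
]
features = extract_features(profile)
print(features["location"])  # → Berlin
```

Note that the keyword rules also flag "Python" as an interest despite the negation in the first comment; handling that kind of nuance is exactly why the paper uses an LLM for extraction rather than handcrafted rules.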

Experiment 1: The “James Bond” Agent

The most chilling demonstration involved an autonomous LLM agent with web access.

The Setup: The researchers took anonymous profiles (Hacker News, Reddit) and simply told the LLM: “Find out who this person is.”

The Process:

  1. The LLM reads the profile and builds a “dossier” (e.g., “PhD student in Biology, UK-based, uses specific Python library”).
  2. It autonomously searches Google/LinkedIn.
  3. It cross-references the dossier with search results.
  4. It outputs a name.
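The four steps above can be sketched as a single loop. This is a hedged illustration of the control flow only: llm and web_search are placeholder stubs with canned outputs (including the name "Jane Doe"), not real APIs from the paper.

```python
# Hedged sketch of the agent loop. llm() and web_search() are canned
# stubs so the example runs; a real agent would call a model and a
# search API here.

def llm(prompt):
    if prompt.startswith("Build a dossier"):
        return "PhD student in Biology, UK-based, uses scikit-bio"
    return "Jane Doe"  # canned final answer for this illustration

def web_search(query):
    return ["linkedin.com/in/janedoe - PhD Biology, Cambridge, UK"]

def identify(profile_posts):
    # 1. Read the profile and build a dossier of identifying features
    dossier = llm("Build a dossier from these posts:\n" + "\n".join(profile_posts))
    # 2. Autonomously search the open web using the dossier
    results = web_search(dossier)
    # 3-4. Cross-reference dossier against results and output a name
    return llm(f"Dossier: {dossier}\nResults: {results}\nWho is this person?")

print(identify(["Defending my thesis next month.", "scikit-bio saved me weeks."]))
# → Jane Doe (with these canned stubs)
```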

The Results:

+-----------------------+-------------------------------------+
|     Attack Target     |  Success Rate (Recall @ 90% Prec)   |
+-----------------------+-------------------------------------+
|   Hacker News Users   |               67%                   |
|   Reddit Academic     |               52%                   |
|   Anthropic Subjects  |               ~27%                  |
+-----------------------+-------------------------------------+

Table 1: Agentic Re-identification Rates


Experiment 2: The Modular ESRC Framework

Agents are impressive, but they are expensive and hard to scale. The researchers therefore built a modular pipeline that can deanonymize users across massive candidate databases (up to 100,000 profiles) efficiently.

Case Study: Hacker News ↔ LinkedIn

The goal was to link a pseudonymous Hacker News account to a real LinkedIn profile using only text data.

The Baseline: The classical “Netflix Prize” style attack (using handcrafted features) failed spectacularly, achieving near 0% recall.

The LLM Method:

  1. Extract: Summarized user activity.
  2. Search: Used semantic embeddings to find the top 100 similar candidates.
  3. Reason: Used GPT-5.2 to verify the match.
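The Search step can be illustrated with a minimal top-k similarity lookup. This is my own sketch: bag-of-words vectors stand in for the semantic embeddings so the example runs without an embedding model, and the candidate summaries are invented.

```python
# Toy sketch of the SEARCH step: rank candidate profiles by cosine
# similarity to the query summary. Bag-of-words counts stand in for
# real semantic embeddings.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query_summary, candidate_summaries, k=100):
    """Return the k candidates most similar to the query summary."""
    q = embed(query_summary)
    scored = [(cosine(q, embed(c)), c) for c in candidate_summaries]
    return [c for _, c in sorted(scored, reverse=True)[:k]]

candidates = [
    "Rust developer at a Berlin startup",
    "Baker in Lyon who loves cycling",
]
print(top_k("Rust systems programmer based in Berlin", candidates, k=1))
# → ['Rust developer at a Berlin startup']
```

In the actual pipeline this shortlist (the top 100 candidates out of up to 100,000) is what gets handed to the Reason step for verification.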

Graph 1: Recall vs. Precision

Notice how the classical baseline collapses, while the LLM reasoning approach maintains high recall even as the precision requirement gets stricter.

Recall (%)
  |
70|    * (Reasoning)
  |     \
60|      \
  |       \
50|        \   * (Search Only)
  |         \   \
40|          \   \
  |           \   \
30|            \   \   * (Classical Baseline)
  |             \   \   \
20|              \   \   \
  |               \   \   \
10|                \   \   \__________________
  |                 \   \____________________
 0-------------------------------------------------> Precision (%)
      90%      95%      99%

The “Reasoning” step allowed the model to distinguish between similar candidates, boosting performance from near zero to 45.1% recall at 99% precision.
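To unpack the metric: "recall at 99% precision" is the fraction of all true matches recovered at the strictest confidence threshold whose accepted matches are still at least 99% correct. The helper below is my own illustration of that computation, not code from the paper.

```python
# Illustration of "recall at a precision target": sweep a confidence
# threshold over the ranked matches and report the best recall among
# operating points that still meet the required precision.

def recall_at_precision(scored_matches, total_true, target_precision):
    """scored_matches: list of (confidence, is_correct), one per match."""
    best_recall = 0.0
    ranked = sorted(scored_matches, reverse=True)  # most confident first
    correct = 0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        if correct / i >= target_precision:  # precision at this cutoff
            best_recall = max(best_recall, correct / total_true)
    return best_recall

matches = [(0.99, True), (0.95, True), (0.90, False), (0.85, True)]
print(recall_at_precision(matches, total_true=4, target_precision=0.9))
# → 0.5 (the two most confident matches are correct: 2/4 recall)
```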


The Code: How the “Tournament” Works

One of the clever innovations in the paper is the Calibration step. LLMs aren’t great at giving exact probability numbers (e.g., “I am 94% sure”). They are, however, great at comparisons (“Match A is better than Match B”).

To sort matches by confidence, they used a Swiss-system tournament.

Here is a Pythonic sketch of the algorithm, where llm_judge stands in for the external LLM comparison call:

def calibrate_matches(query_candidate_pairs, n_rounds=5):
    # Initialize ratings for all proposed (query, candidate) matches
    ratings = {pair: 0 for pair in query_candidate_pairs}

    # Run a fixed number of tournament rounds
    for _ in range(n_rounds):
        # Swiss-system: pair up matches with similar current ratings
        matchups = swiss_pairing(ratings)

        for pair_a, pair_b in matchups:
            # Ask the LLM judge: "Which is the more plausible match?"
            winner = llm_judge(pair_a, pair_b)

            # Update ratings (win/loss scoring, as in chess Elo)
            if winner == pair_a:
                ratings[pair_a] += 1
                ratings[pair_b] -= 1
            else:
                ratings[pair_b] += 1
                ratings[pair_a] -= 1

    # Most confident matches first
    return sorted(ratings, key=ratings.get, reverse=True)

def swiss_pairing(ratings):
    # Sort by current rating and pair adjacent entries, so similarly
    # rated matches face each other in the next round
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    return list(zip(ranked[::2], ranked[1::2]))

This approach allows an attacker to scale the attack to thousands of users, prioritizing the “easy” matches first.


Implications: Why This Matters

The paper concludes with a stark warning: The threat model for online privacy needs to be rewritten.

  1. Cost vs. Feasibility: Previously, you were safe because a human investigator cost $100/hr. An LLM agent costs cents.
  2. Unstructured Data is a Fingerprint: We used to worry about metadata (GPS, Zip codes). Now, your writing style, your specific interest in “neon noir aesthetics,” and your dog’s name “Biscuit” are enough to identify you.
  3. False Sense of Security: Splitting your personality across platforms (LinkedIn for work, Reddit for hobbies) no longer works. The LLM finds the bridge between them.

What Can You Do?

The authors suggest that simply not publishing data is the only true mitigation. However, that defeats the purpose of online communities.


References

  1. Lermen, S., Paleka, D., Swanson, J., Aerni, M., Carlini, N., & Tramèr, F. (2026). Large-scale online deanonymization with LLMs. arXiv preprint arXiv:2602.16800.
  2. Narayanan, A., & Shmatikov, V. (2008). Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy. (The “Netflix Prize” paper).
  3. Sweeney, L. (2000). Simple Demographics Often Identify People Uniquely. Carnegie Mellon University.
  4. Li, C. (2025). Contextual Integrity and AI Agents. (Referenced regarding Anthropic Interviewer Dataset).