Don't Trust Stubborn Neighbors:
A Security Framework for Agentic Networks

Multi-agent LLM systems are vulnerable to malicious agents that attempt to manipulate group decisions through spreading misinformation during discussion. This repository provides a framework for:

Running adversarial multi-agent experiments on CommonsenseQA and ToolBench datasets across different network topologies
Evaluating trust-adaptive defenses that dynamically reweight peer influence to isolate malicious agents
Fitting the Friedkin–Johnsen (FJ) opinion dynamics model to LLM agent behavior to analytically predict belief outcomes

Repository Structure

cascade/
├── core/
│   ├── methods.py          # LLM client/backend adapters (OpenAI, vLLM, Gemini)
│   ├── prompts.py          # Agent prompt templates (persuasion & agreeableness traits)
│   └── utils.py            # Belief normalization, adjacency matrix generation
├── experiments/
│   ├── config.py           # YAML/JSON config loader
│   ├── batch.py            # Multi-scenario batch runner
│   ├── trust.py            # Trust matrix construction and updates
│   ├── csqa/
│   │   ├── agents.py       # Agent and AgentGraph classes
│   │   ├── runnerCQ.py     # CommonsenseQA orchestrator
│   │   └── cli.py          # CLI argument parser
│   └── toolbench/
│       ├── agents.py       # ToolBench task builder
│       ├── runnerTB.py     # ToolBench orchestrator
│       └── cli.py
└── analysis/
    ├── compute_asr.py              # Attack Success Rate (ASR) computation
    ├── consolidate_belief_logs.py  # Belief trajectory aggregation
    ├── fit_fj_complete_full.py     # FJ model fitting for complete graphs
    ├── fit_fj_star_full.py         # FJ model fitting for star graphs
    ├── star_predict.py             # FJ predictions for star topology
    ├── complete_predict.py         # FJ predictions for complete topology
    └── summarize_beliefs.py        # Belief summary generation
configs/                    # YAML experiment configs
data/                       # JSONL datasets (csqa, toolbench)

Setup

Install dependencies:

pip install -r requirements.txt

Create a .env file in the project root:

OPENAI_API_KEY=your_key
OPENAI_BASE_URL=https://api.openai.com/v1

# Only required for Gemini backend
GOOGLE_CLOUD_PROJECT=your_project_id

Running Experiments

There are two ways to run experiments:

Option 1: Config file (recommended)

Use cascade.cli.runner with a YAML config. This is the standard approach and handles all experiment parameters automatically.

python -m cascade.cli.runner --config configs/qwen3-235b_all_experiments.yaml

cascade.cli.runner automatically routes to the CommonsenseQA or ToolBench runner based on the dataset field in each config entry — no need to run separate commands.

Ready-to-use configs are provided:

# Quick test (OpenAI gpt-4o-mini, complete graph, no trust)
python -m cascade.cli.runner --config configs/test_gpt5mini.yaml

# Full experiment suite for Qwen3-235B (all topologies, datasets, trust experiments, traits, agent scaling)
python -m cascade.cli.runner --config configs/qwen3-235b_all_experiments.yaml

configs/qwen3-235b_all_experiments.yaml is the reference config covering all 68 scenarios used in the paper: 3 topologies × 2 datasets × no-trust baseline, trust experiments (Exp1–4), trait ablations, and agent scaling.

Option 2: Direct CLI args

Use cascade.experiments.csqa or cascade.experiments.toolbench directly. The --scenario flag is required — it sets the output folder name under output/.

# Adversarial — 1 attacker, 5 benign agents, complete graph, no trust
python -m cascade.experiments.csqa \
  --model gpt-4o-mini \
  --backend openai \
  --graph complete \
  --agents 6 \
  --attackers 1 \
  --placement 0 \
  --rounds 10 \
  --dataset csqa_100 \
  --no-trust \
  --scenario complete-6a-notrust-csqa

Backends

Backend	Description
`openai`	Any OpenAI-compatible API (default)
`vllm`	Self-hosted open-weight models via vLLM
`gemini`	Google Vertex AI
`blablador`	Helmholtz Blablador API

Network Topologies

`--graph`	Description
`complete`	Fully connected — all agents hear all others
`pure_star`	Hub + isolated leaves — only hub aggregates

Trust Experiment Configurations

Config	Name	Description
`exp1`	T-W	Warmup + fixed trust, static attacker
`exp2`	T-WA	Warmup + fixed trust, adaptive attacker
`exp3`	T-WS	Warmup + sparse trust updates, adaptive attacker
`exp4`	T-S	Random sparse trust, no warmup, static attacker

Attacker types:

Static — defends the incorrect answer every round
Adaptive — mimics a benign agent during warmup, then pivots to attack

Analysis

# Compute Attack Success Rate across all completed runs
python -m cascade.analysis.compute_asr --output-dir output

# Aggregate belief trajectory logs
python -m cascade.analysis.consolidate_belief_logs --output-dir output

# Fit Friedkin–Johnsen model (complete graphs)
python cascade/analysis/fit_fj_complete_full.py output/

# Fit Friedkin–Johnsen model (star graphs)
python cascade/analysis/fit_fj_star_full.py output/

ASR definition:

ASR = (# benign agents initially correct who flip to wrong answer at final round)
    / (# benign agents initially correct)

Results are written to output/{model}/{dataset}/{scenario}/summaries/.

Output Structure

output/
└── {model}/
    └── {dataset}/
        └── {scenario}/
            └── {sample_id}/
                ├── records/        # Per-question JSON logs (full dialogue)
                ├── summaries/      # ASR and accuracy summaries
                ├── belief_logs/    # Belief trajectory CSVs (per agent, per round)
                └── trust_logs/     # Trust matrix evolution

Data

File	Description
`data/csqa_100.jsonl`	100-question CommonsenseQA subset
`data/toolbench_100.jsonl`	100-question ToolBench subset

Questions follow the standard CSQA format:

{
  "id": "...",
  "question": "What do people aim to do at work?",
  "choices": {"label": ["A","B","C","D","E"], "text": ["complete", "learn", ...]},
  "answerKey": "A"
}

Citation

To cite our work:

@misc{abedini2026donttruststubbornneighbors,
      title={Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks}, 
      author={Samira Abedini and Sina Mavali and Lea Schönherr and Martin Pawelczyk and Rebekka Burkholz},
      year={2026},
      eprint={2603.15809},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2603.15809}, 
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Don't Trust Stubborn Neighbors:
A Security Framework for Agentic Networks

Repository Structure

Setup

Running Experiments

Option 1: Config file (recommended)

Option 2: Direct CLI args

Backends

Network Topologies

Trust Experiment Configurations

Analysis

Output Structure

Data

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
cascade		cascade
configs		configs
data		data
figures		figures
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Don't Trust Stubborn Neighbors:A Security Framework for Agentic Networks

Repository Structure

Setup

Running Experiments

Option 1: Config file (recommended)

Option 2: Direct CLI args

Backends

Network Topologies

Trust Experiment Configurations

Analysis

Output Structure

Data

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Don't Trust Stubborn Neighbors:
A Security Framework for Agentic Networks

Packages