Skip to content

Dormant-Neurons/MAS-Cascade

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Don't Trust Stubborn Neighbors:
A Security Framework for Agentic Networks

Code for the paper: arXiv:2603.15809

Teaser


Multi-agent LLM systems are vulnerable to malicious agents that attempt to manipulate group decisions through spreading misinformation during discussion. This repository provides a framework for:

  • Running adversarial multi-agent experiments on CommonsenseQA and ToolBench datasets across different network topologies
  • Evaluating trust-adaptive defenses that dynamically reweight peer influence to isolate malicious agents
  • Fitting the Friedkin–Johnsen (FJ) opinion dynamics model to LLM agent behavior to analytically predict belief outcomes

Repository Structure

cascade/
├── core/
│   ├── methods.py          # LLM client/backend adapters (OpenAI, vLLM, Gemini)
│   ├── prompts.py          # Agent prompt templates (persuasion & agreeableness traits)
│   └── utils.py            # Belief normalization, adjacency matrix generation
├── experiments/
│   ├── config.py           # YAML/JSON config loader
│   ├── batch.py            # Multi-scenario batch runner
│   ├── trust.py            # Trust matrix construction and updates
│   ├── csqa/
│   │   ├── agents.py       # Agent and AgentGraph classes
│   │   ├── runnerCQ.py     # CommonsenseQA orchestrator
│   │   └── cli.py          # CLI argument parser
│   └── toolbench/
│       ├── agents.py       # ToolBench task builder
│       ├── runnerTB.py     # ToolBench orchestrator
│       └── cli.py
└── analysis/
    ├── compute_asr.py              # Attack Success Rate (ASR) computation
    ├── consolidate_belief_logs.py  # Belief trajectory aggregation
    ├── fit_fj_complete_full.py     # FJ model fitting for complete graphs
    ├── fit_fj_star_full.py         # FJ model fitting for star graphs
    ├── star_predict.py             # FJ predictions for star topology
    ├── complete_predict.py         # FJ predictions for complete topology
    └── summarize_beliefs.py        # Belief summary generation
configs/                    # YAML experiment configs
data/                       # JSONL datasets (csqa, toolbench)

Setup

Install dependencies:

pip install -r requirements.txt

Create a .env file in the project root:

OPENAI_API_KEY=your_key
OPENAI_BASE_URL=https://api.openai.com/v1

# Only required for Gemini backend
GOOGLE_CLOUD_PROJECT=your_project_id

Running Experiments

There are two ways to run experiments:

Option 1: Config file (recommended)

Use cascade.cli.runner with a YAML config. This is the standard approach and handles all experiment parameters automatically.

python -m cascade.cli.runner --config configs/qwen3-235b_all_experiments.yaml

cascade.cli.runner automatically routes to the CommonsenseQA or ToolBench runner based on the dataset field in each config entry — no need to run separate commands.

Ready-to-use configs are provided:

# Quick test (OpenAI gpt-4o-mini, complete graph, no trust)
python -m cascade.cli.runner --config configs/test_gpt5mini.yaml

# Full experiment suite for Qwen3-235B (all topologies, datasets, trust experiments, traits, agent scaling)
python -m cascade.cli.runner --config configs/qwen3-235b_all_experiments.yaml

configs/qwen3-235b_all_experiments.yaml is the reference config covering all 68 scenarios used in the paper: 3 topologies × 2 datasets × no-trust baseline, trust experiments (Exp1–4), trait ablations, and agent scaling.

Option 2: Direct CLI args

Use cascade.experiments.csqa or cascade.experiments.toolbench directly. The --scenario flag is required — it sets the output folder name under output/.

# Adversarial — 1 attacker, 5 benign agents, complete graph, no trust
python -m cascade.experiments.csqa \
  --model gpt-4o-mini \
  --backend openai \
  --graph complete \
  --agents 6 \
  --attackers 1 \
  --placement 0 \
  --rounds 10 \
  --dataset csqa_100 \
  --no-trust \
  --scenario complete-6a-notrust-csqa

Backends

Backend Description
openai Any OpenAI-compatible API (default)
vllm Self-hosted open-weight models via vLLM
gemini Google Vertex AI
blablador Helmholtz Blablador API

Network Topologies

--graph Description
complete Fully connected — all agents hear all others
pure_star Hub + isolated leaves — only hub aggregates

Trust Experiment Configurations

Config Name Description
exp1 T-W Warmup + fixed trust, static attacker
exp2 T-WA Warmup + fixed trust, adaptive attacker
exp3 T-WS Warmup + sparse trust updates, adaptive attacker
exp4 T-S Random sparse trust, no warmup, static attacker

Attacker types:

  • Static — defends the incorrect answer every round
  • Adaptive — mimics a benign agent during warmup, then pivots to attack

Analysis

# Compute Attack Success Rate across all completed runs
python -m cascade.analysis.compute_asr --output-dir output

# Aggregate belief trajectory logs
python -m cascade.analysis.consolidate_belief_logs --output-dir output

# Fit Friedkin–Johnsen model (complete graphs)
python cascade/analysis/fit_fj_complete_full.py output/

# Fit Friedkin–Johnsen model (star graphs)
python cascade/analysis/fit_fj_star_full.py output/

ASR definition:

ASR = (# benign agents initially correct who flip to wrong answer at final round)
    / (# benign agents initially correct)

Results are written to output/{model}/{dataset}/{scenario}/summaries/.


Output Structure

output/
└── {model}/
    └── {dataset}/
        └── {scenario}/
            └── {sample_id}/
                ├── records/        # Per-question JSON logs (full dialogue)
                ├── summaries/      # ASR and accuracy summaries
                ├── belief_logs/    # Belief trajectory CSVs (per agent, per round)
                └── trust_logs/     # Trust matrix evolution

Data

File Description
data/csqa_100.jsonl 100-question CommonsenseQA subset
data/toolbench_100.jsonl 100-question ToolBench subset

Questions follow the standard CSQA format:

{
  "id": "...",
  "question": "What do people aim to do at work?",
  "choices": {"label": ["A","B","C","D","E"], "text": ["complete", "learn", ...]},
  "answerKey": "A"
}

Citation

To cite our work:

@misc{abedini2026donttruststubbornneighbors,
      title={Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks}, 
      author={Samira Abedini and Sina Mavali and Lea Schönherr and Martin Pawelczyk and Rebekka Burkholz},
      year={2026},
      eprint={2603.15809},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2603.15809}, 
}

About

Code and experiments for paper: "Don't Trust Stubborn Neighbors: A Security Framework for Agentic Networks"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages