Skip to content

DaDominio/graphstrike

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

title GraphStrike
emoji 🕵️
colorFrom blue
colorTo indigo
sdk docker
app_port 7860
pinned false
license mit
tags
reinforcement-learning
social-network
fraud-detection
openenv
llm-agent



An OpenEnv-compatible reinforcement learning environment where an LLM agent must identify all 10 members of a coordinated fake account network hidden inside a synthetic social network. The agent learns via Reflexion and a dynamic hybrid rule/LLM policy , not via gradient updates or fine-tuning.


Theme

SUPPORT

Customer Service Agents

Complex environment where agents resolve multi-step queries using external tools and APIs.

Problem Statement

The task: A social network contains fake accounts organised into a single coordinated ring of 10. The ring behaves in a coordinated way — same posting hour, same IP subnet, stolen celebrity photos, copy-paste bios. The agent must find all 10 by navigating a limited step budget, inspecting accounts, and flagging suspects.

Proposed Solution

An OpenEnv-compatible reinforcement learning environment where an LLM agent must identify all 10 members of a coordinated fake account ring hidden inside a synthetic social network. The agent learns via Reflexion and a dynamic hybrid rule/LLM policy — not via gradient updates or fine-tuning.


Novelty Highlights

  • Adaptive Hybrid Intelligence (Rules + LLM): Unlike static ensembles, GraphStrike dynamically blends deterministic rules and LLM reasoning using a trust gate, shifting control as performance improves.
  • Learning Without Fine-Tuning: Instead of updating model weights, the agent learns through Reflexion lessons and best-trajectory memory injected into future prompts.
  • Graph-First Detection Pipeline: Detection is not account-by-account only; it uses cascade effects, neighbor propagation, and multi-hop graph expansion to uncover coordinated rings.
  • Math-Grounded Decision Control: Risk composition, trust calibration, and grader alignment are formula-driven, making behavior interpretable and reproducible.
  • Adversarial Evasion Benchmarking: Hard-mode includes timed evasion events, so success reflects robustness under disruption rather than overfitting to static patterns.
  • Safety-Net by Design: High-confidence rule overrides prevent catastrophic LLM errors while preserving LLM flexibility for strategic exploration.

Performance Summary

We evaluate GraphStrike's hybrid rule/LLM policy across multiple frontier models to measure how well each model handles the investigation task. All runs use the same inference pipeline (inference.py) with identical system prompts and structured logging. Each model ran: (1) seed=0 on all 3 tasks, and (2) seeds 0-2 on all 3 tasks for variance measurement.

Seed=0 scores (single episode per task):

Model Performance Table


3-seed variance scores (mean across seeds 0, 1, 2):

Model Performance Table


Rule-Based Baseline (no LLM, deterministic)

Model Performance Table



Table of Contents

  1. What This Is
  2. The Problem: How Fake Detection Actually Works
  3. Synthetic Data Generation
  4. Data Model
  5. The RL Environment
  6. Risk Scoring Mathematics
  7. The LLM Policy (Qwen3 via Bedrock)
  8. Reflexion — How the Agent Learns
  9. Hybrid Policy — The Novel Contribution
  10. Training Loop End-to-End
  11. API Reference
  12. Docker Deployment
  13. Submission Requirements
  14. Verification & Validation

1. What is this !?

This is an OpenEnv hackathon submission. OpenEnv is a framework for building RL environments with a standard microservice interface (/reset, /step, /state) so that any agent implementation can plug in.

What makes this non-trivial:

  • The network is large (50–1000 accounts depending on difficulty).
  • Fake accounts are mixed with innocent high-signal "decoy" accounts.
  • In hard mode, the gang actively evades — dropping intra-gang follows, renaming profiles — while the agent is mid-investigation.
  • The agent cannot see the full network upfront: it must explore via INSPECT and INVESTIGATE_NETWORK actions, spending steps to reveal information.

What makes the learning novel:

  • The LLM (inference via AWS Bedrock) cannot be fine-tuned — it is a black-box API.
  • The agent learns via Reflexion: post-episode lessons are written back into memory and injected into every future prompt.
  • A dynamic hybrid policy (α-weighted) blends the LLM with a deterministic rule engine, with the blend weight α updating based on recent win rate. Rules dominate early; the LLM takes over as it proves itself.

System Architecture

System Architecture


2. The Problem: How Fake Detection Actually Works

A real-world fake account detector does not read post content. Detection relies on three categories of signals computed from metadata:

Signal Hierarchy (Node -> Behavioral -> Graph)

Signal Hierarchy

  • Node signals (offline): content fingerprints like photo reuse, bio-template similarity, and comment repetition provide the first suspicion layer.
  • Behavioral signals (temporal/device): coordinated posting hour, account-age clustering, and shared IP subnet add stronger gang-level evidence.
  • Graph signals (live at INSPECT): mutual follows, flagged-neighbor growth, and cluster alignment are hardest to evade, so they carry the highest weight in risk scoring.
  • False-positive control: high-legitimacy hubs (for example celebrities) are down-weighted through hub-legitimacy discounting.

3. Synthetic Data Generation

File: server/generator.py

Episodes are generated deterministically by seed. 150 episodes are pre-generated (50 per task) and cached as JSON files in episodes/.

Network Composition

Task Network size Gang Decoys Real Max steps
easy 50 10 0 40 30
medium 200 10 20 170 50
hard 1000 10 50 940 80
  • Gang accounts: All 10 share base_age (same creation week), tightly clustered avg_post_hour, high photo_reuse_score/bio_template_score, comment_repeat_score in [0.60, 0.90], ip_cluster_id = "ip_gang_{seed}", and dense intra-gang follow edges (density 0.60–0.80).
  • Real accounts: Log-normal follower distributions, unique IP clusters, low fake scores.
  • Decoy accounts (medium/hard): Real accounts with elevated fraud scores (0.20–0.40 range) — they look suspicious but are NOT gang members and penalise reckless flagging.
  • Celebrity accounts (2 per episode): 100k–5M followers, very low fake scores, high hub_legitimacy_score.
  • Zero-edge isolates (2 per episode): No edges — test whether the agent wastes steps on disconnected nodes.

4. Data Model

File: models.py

ActionType

Value Cost Effect
inspect 1 step Reveals full AccountProfile + follow list
investigate_network 2 steps Expands 2 hops; reveals account IDs only
flag 0 steps Marks account as gang member; triggers SUSPECT cascade
unflag 0 steps Removes flag; clears CONFIRMED_FAKE status
submit 0 steps Ends episode; triggers scoring

AccountProfile — key fields

Category Fields
Raw counts follower_count, following_count, post_count
Temporal avg_post_hour, account_age_days
Content pipeline (0–1) photo_reuse_score, bio_template_score, comment_repeat_score
IP/device shared_ip_count, ip_cluster_id
Graph (live at INSPECT) mutual_follow_rate, flagged_neighbor_count, avg_neighbor_photo_reuse, post_hour_cluster_score
Risk breakdown fake_risk_score, node_risk, behavior_risk, graph_risk, hub_legitimacy_score
Evasion/status name_change_count, status (NORMAL/SUSPECT/CONFIRMED_FAKE)

FakeGangObservation — what the agent sees each step

done, reward, visible_accounts, visible_account_ids, flagged_ids, inspected_ids, suspect_ids, graph_edges, steps_remaining, evasion_triggered, evasion_count, task, message


5. The RL Environment

File: server/environment.py

Episode Lifecycle & Action Mechanics

Episode Flow

FLAG cascade (dual): When FLAG(X) is called — (1) every visible account that X follows becomes SUSPECT via the follow-graph, and (2) every visible account sharing X's ip_cluster_id becomes SUSPECT. Gang members share ip_gang_{seed}; real accounts have unique IPs → zero false positives.

Reward Function

base_reward = tp×1.0 − fp×0.5 − fn×0.3

Win condition:
  easy/medium:  recall ≥ 0.8 AND precision ≥ 0.7
  hard:         recall ≥ 0.9 AND precision ≥ 0.8

Bonuses:
  +5.0   full win
  +3.0   perfect recall
  +2.0   partial win (high recall, low precision)
  +1.0   efficiency (SUBMIT with ≥50% steps remaining)
  −1.0   per evasion event (hard mode)
  −2.0   forced submit (ran out of steps)

Evasion (hard mode)

  • unfollow_intragang: 30% of intra-gang edges randomly removed mid-investigation — destroys graph signal. Fires 4 times (steps 15, 30, 45, 60).
  • rename_count: Random gang members get name_change_count += 1 — a visual evasion signal.

6. Risk Scoring Mathematics

File: server/scoring.py — all functions are stateless and deterministic.

pipw

Risk Scoring Formulas 1

Risk Scoring Formulas 2


8. The LLM Policy (Qwen3 via Bedrock)

File: agent/policy.py

Model: qwen.qwen3-next-80b-a3b via AWS Bedrock Converse API (maxTokens=512, temperature=0.4)

Prompt Structure

Every step, the policy builds a prompt from three components:

[reflections from past episodes]       ← grows richer every episode
[best trajectory few-shot example]     ← best win ever, showing the full action log
━━━ CURRENT CASE ━━━
[formatted observation]                ← status badges, risk scores, suspect list
What is your next action?

Accounts in the observation are sorted by fake_risk_score descending, with status badges prepended. fnbr=N(!) highlights when flagged_neighbor_count > 0; [HUB?] warns the LLM not to flag high-legitimacy accounts.

Required Response Format

<thinking>
Reasoning — which account is most suspicious and why.
</thinking>
<action>
INSPECT acc_0041
</action>

If parsing fails, a heuristic fallback inspects the highest-scored uninspected account. Retries use exponential backoff (1s, 2s, 4s) up to 3 attempts.


9. Reflexion — How the Agent Learns

Files: agent/reflection.py, agent/memory.py

The agent cannot update Qwen3's weights — Bedrock is a black-box API. Instead, it learns via Reflexion: post-episode lessons are written as text and injected into future prompts.

Reflexion Learning Loop

Reflexion Learning Loop

Episode N:
  1. LLM acts using: system_prompt + reflections[last 4] + best_trajectory
  2. Episode ends → WIN or LOSS
  3. Post-episode:
     LOSS → generate_reflection(action_log, outcome) → lesson stored
     WIN  → save trajectory if better reward + generate_success_reflection

Episode N+1:
  → last 4 reflections + best win trajectory injected into prompt
  → LLM has learned from its past

Example generated reflection:

"The starting accounts were all real; I wasted 8 steps inspecting low-signal nodes before pivoting. When photo_reuse and bio_template are both below 0.3 after 3 inspections, immediately use INVESTIGATE_NETWORK to jump to a different graph region."

All memory persists in a Docker volume (memory/) across container restarts — reflections, best trajectories, win history, and α values per task.


10. Hybrid Policy — The Novel Contribution

File: agent/hybrid_policy.py

Key insight: A new LLM agent starts dumb but improves over time. A rule engine is always consistent but cannot adapt. The hybrid policy exploits both — rules provide a safety net early while the LLM builds its track record; once the LLM proves itself, rules step back.

Architecture

Hybrid Policy Architecture

Alpha (α): The Trust Weight

α is a per-task value in [0.20, cap] representing current trust in the LLM:

reflection_factor = min(1.0, n_reflections / 4.0)
raw = 0.20 + reflection_factor × (0.80 × recent_win_rate + 0.12)
α = clamp(raw, 0.20, cap)
Task α cap Rationale
easy 0.50 Rule engine alone achieves ~91% — LLM should assist, not override
medium 0.70 Decoys require some LLM judgment, but cascade must stay
hard 0.85 LLM needs latitude for evasion adaptation, but safety rules remain

Alpha trajectory over training (easy task, cap=0.50):

Episode Win rate Reflections α (capped)
1 0% 0 0.20
5 20% 4 0.48
10 50% 9 0.50
20 80% 19 0.50

System Architecture

Rule Confidence Levels

Situation Action Confidence
Steps remaining = 0 SUBMIT 1.00
Uninspected SUSPECT accounts exist INSPECT suspects[0] 0.95
fake_risk ≥ 0.85 FLAG that account 0.95
fake_risk in [threshold, 0.85) FLAG that account 0.70+
10 accounts already flagged SUBMIT 0.85
Steps remaining ≤ 3 SUBMIT 0.90
Uninspected accounts available INSPECT top candidate 0.30

At α=0.20 (early): rules dominate (~90% of decisions). At α=0.50 (moderate): LLM controls exploration; rules control safety. At α=0.85 (high): LLM controls most decisions; rules only override forced submits and uninspected suspects.

α is saved to memory/alpha_{task}.json and persists across Docker restarts — the agent doesn't reset to 0.20 every time.


11. Training Loop End-to-End

File: train.py

Curriculum

Phase Episodes Task Goal
1 1–20 easy Learn basic signal thresholds, build first reflections
2 21–35 medium Handle decoys, learn evasion response
3 36–50 hard Feature-only detection, persistent evasion

Seeds rotate deterministically: seed = (episode_num + task_offset) % 50

Per-Episode Flow

for ep in range(n_episodes):

  1. DETERMINE TASK      curriculum_task(ep) or fixed task
  2. COMPUTE ALPHA       compute_alpha(win_rate, n_reflections, task)
  3. LOAD CONTEXT        last 4 reflections + best win trajectory
  4. RUN EPISODE         while not obs.done:
                           blend(rule_action, llm_action, rule_conf, α)
                           → obs = env.step(final)
  5. POST-EPISODE        record_win → update α → generate reflection
  6. LOG                 task | win/loss | reward | recall | precision | α | modes

Episode metrics (flushed to runs/metrics.jsonl every 5 episodes) include: episode, task, won, reward, recall, precision, steps_used, alpha_used, mode_agree, mode_rule, mode_llm, n_reflections_used.

You can watch the transition: early episodes have high rule counts; later episodes have high agree counts (LLM learned to make the same decisions as the rules, but also brings strategic reasoning the rules can't).


12. API Reference

File: server/app.py

Endpoint Method Description
/health GET {"status": "healthy"}
/tasks GET Task list + action_schema + score_range: [0.0, 1.0]
/reset POST Accepts {task, seed} → returns initial observation
/step POST Accepts any FakeGangAction → returns updated observation
/state GET Current episode metadata (step count, task, score)
/grader GET Normalised [0.0, 1.0] score after SUBMIT
/baseline POST Runs rule-based agent on all 3 tasks, returns scores

Baseline performance:

Task Seed=0 score Win rate (50 seeds) Mean score (50 seeds)
easy 0.91 100% ~0.91
medium 0.906 84% ~0.77
hard 0.9038 52% ~0.47

13. Docker Deployment

# Build
docker build -f server/Dockerfile -t graphstrike .

# Run
docker run -it \
  -e AWS_ACCESS_KEY_ID=your_key \
  -e AWS_SECRET_ACCESS_KEY=your_secret \
  -v $(pwd)/memory:/app/memory \
  -v $(pwd)/runs:/app/runs \
  -p 8000:8000 \
  graphstrike

The memory/ and runs/ volumes preserve all learning between container restarts.

Environment Variables

Variable Default Description
AWS_ACCESS_KEY_ID (required) For Bedrock/Qwen3 access
AWS_SECRET_ACCESS_KEY (required) For Bedrock/Qwen3 access
AWS_DEFAULT_REGION us-east-1 Bedrock region
TRAIN_TASK (curriculum) Fix to easy/medium/hard
TRAIN_EPISODES 50 Total training episodes
TRAIN_TEMP 0.4 LLM sampling temperature
TRAIN_VERBOSE 0 Set 1 for per-step action logging
SERVER_PORT 8000 FastAPI port

Startup Sequence (run.sh)

1. Validate AWS credentials
2. python server/generator.py    → generates 150 episode JSON files
3. uvicorn server.app:app        → starts the environment server
4. Health check polling          → waits until /health responds
5. python train.py               → runs the full training loop

Full HTTP validation

python3 -m uvicorn server.app:app --port 8001 &
sleep 3
python3 validate.py --url http://localhost:8001
# Expected: Results: 24/24 passed — all OK

Deployed Endpoint Verification

curl https://pandago-graphstrike.hf.space/health
# → {"status": "healthy"}

curl https://pandago-graphstrike.hf.space/tasks
# → {"tasks": ["easy","medium","hard"], "action_schema": {...}, "score_range": [0.0, 1.0]}

curl -X POST https://pandago-graphstrike.hf.space/baseline
# → {"scores": {"easy": 0.91, "medium": 0.906, "hard": 0.9038}, "agent": "rule_based"}

Material wave loading

Developed with ❤️ by Team ComputeXOR

{

}

About

we're addicted to solve some real issues ~ Team ComputeXOR

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 98.2%
  • Other 1.8%