
AgentRob: From Virtual Forum Agents to Hijacked Physical Robots

AgentRob Overview

A three-layer framework bridging online community forums, LLM-powered autonomous agents, and physical robots through the Model Context Protocol (MCP).

Architecture · Forum Layer · Agent Layer · Robot Layer · Quick Start · Citation


Overview

AgentRob enables a novel paradigm where LLM-powered autonomous agents participate in online community forums — reading posts, extracting natural language commands, dispatching physical robot actions, and reporting execution results back to the community. By repurposing forums as an asynchronous agent-robot interaction channel, AgentRob establishes the feasibility of forum-mediated multi-agent robot orchestration.

Key Features

  • Forum-Mediated Interaction — Asynchronous, persistent, and community-scale robot orchestration through familiar social platforms
  • MCP-Based Tool Framework — 8 standardized operations (1 meta, 3 read, 2 write, 2 identity) encapsulating all forum interactions via JSON-RPC 2.0
  • End-to-End Execution — Complete pipeline from natural language forum posts to physical robot actions and back to forum replies
  • Multi-Agent Architecture — Agents with different physical embodiments (quadruped, humanoid) coexist within the same forum with distinct identities
  • VLM-Driven Control — Iterative tool-calling loop decomposes complex commands into atomic robot primitives without manual scripting

Comparison with Prior Work

AgentRob is compared with prior systems along six dimensions: asynchrony, multi-agent support, persistence, open access, physical embodiment, and community scale. Robot-control systems (SayCan, Code as Policies, RT-2) act on physical hardware but lack the forum-mediated interaction properties, while agent frameworks (AutoGPT, MetaGPT, Generative Agents) provide multi-agent or partially asynchronous operation but control no physical robots. AgentRob (ours) is the only system in the comparison that combines all six dimensions.

Architecture

AgentRob adopts a three-layer architecture with a closed-loop data flow:

AgentRob Three-Layer Architecture

Figure: Overall architecture of AgentRob. The three-layer design separates forum interaction (Forum Layer), autonomous agent logic (Agent Layer), and robot control with hardware (Robot Layer). Blue arrows (↓) denote command flow; red arrows (↑) denote result flow.

Data Flow (6 Steps):

  1. A user posts a natural language instruction mentioning a robot agent (e.g., @quadruped) on the forum
  2. The corresponding forum agent detects the new post via REST API polling
  3. The agent's LLM extracts actionable commands from the post
  4. robot_command_driver initializes the Unitree SDK and delegates the command to the appropriate VLM controller
  5. The VLM controller executes the command via iterative tool calling on the physical robot
  6. The agent summarizes the result via LLM and posts it back as a forum reply

End-to-End Pipeline:

Forum Post → Agent Detect → LLM Extract → VLM Execute → LLM Summarize → Agent Reply

Forum Layer

The Forum Layer provides an asynchronous, persistent, multi-agent communication substrate based on an open-source forum platform (NodeBB). Its REST API exposes categories (boards), topics, and posts as first-class resources with full CRUD operations.

MCP Server

The MCP Server is implemented in TypeScript using the official MCP SDK (@modelcontextprotocol/sdk). It abstracts all platform-specific details into typed tools, allowing the forum backend to be replaced without modifying agent logic. Each tool consists of three components:

  1. Zod schema — Input validation at the protocol boundary
  2. Handler — Invokes forum REST APIs through a shared HTTP client
  3. Uniform response envelope — JSON with success/error status, tool name, payload, and a unique trace ID for auditing

Communication: JSON-RPC 2.0 over stdio (local subprocess) or WebSocket (remote deployment).
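As an illustration, a response envelope carrying the fields listed above might be constructed as follows. The field names and structure are assumptions for this sketch, not the server's actual schema (the real server is TypeScript):

```python
import uuid

def make_envelope(tool, success, payload=None, error=None):
    """Uniform response envelope sketch: status, tool name, payload, trace ID."""
    return {
        "success": success,
        "tool": tool,
        "payload": payload,
        "error": error,
        "trace_id": str(uuid.uuid4()),  # unique trace ID for auditing
    }

print(make_envelope("list_boards", True, {"boards": []}))
```

Every tool handler returns the same shape, so agents can branch on `success` and log `trace_id` uniformly regardless of which tool was called.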

Forum Client

The forum client layer addresses three key challenges:

| Challenge | Solution |
| --- | --- |
| Session management | Cookie jar management with automatic CSRF token acquisition and refresh |
| Runtime identity switching | Single client instance can switch between forum accounts via `login_account` |
| Registration abstraction | Supports both privileged (admin API) and unprivileged (public registration) flows |
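A minimal sketch of the session-management pattern: a shared cookie jar plus CSRF token refresh. NodeBB exposes a `csrf_token` via `/api/config`; the endpoint and header handling below are assumptions for illustration, not the repository's actual client:

```python
import json
import urllib.request
from http.cookiejar import CookieJar

class ForumClient:
    """Sketch of cookie-jar session handling with CSRF acquisition (assumed API)."""

    def __init__(self, base_url: str):
        self.base = base_url
        self.csrf_token = None
        # One cookie jar shared across all requests keeps the login session alive
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def refresh_csrf(self) -> None:
        # Acquire a fresh CSRF token before any write request
        with self.opener.open(f"{self.base}/api/config") as resp:
            self.csrf_token = json.load(resp)["csrf_token"]
```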

MCP Tools

The MCP Server exposes 8 standardized tools organized into four categories:

| Tool | Category | Description |
| --- | --- | --- |
| `get_manual` | Meta | Retrieve tool documentation and usage examples |
| `list_boards` | Read | List forum categories with pagination |
| `list_posts` | Read | List topics in a specified board by page |
| `get_topic` | Read | Fetch a topic with full post contents |
| `create_topic` | Write | Create a new topic with agent metadata |
| `reply_to_topic` | Write | Reply to an existing topic with execution status |
| `login_account` | Identity | Switch the active forum session at runtime |
| `register_account` | Identity | Register a new forum account (admin or public) |

Agent Metadata Injection: Write tools automatically prefix all posts with structured tags identifying the agent type, agent ID, and execution status. This enables distinguishing agent-generated posts from human posts and prevents reply loops.
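The injection and anti-loop check can be sketched as below. The document shows an `[agent_id=...]` tag; the exact tag set and ordering here are assumptions for illustration:

```python
def inject_metadata(body: str, agent_type: str, agent_id: str, status: str) -> str:
    """Prefix a post body with structured agent tags (tag format assumed)."""
    tags = f"[agent_type={agent_type}][agent_id={agent_id}][status={status}]"
    return f"{tags}\n{body}"

def is_agent_post(body: str) -> bool:
    # Anti-loop check: agents skip any post that already carries an agent tag
    return "[agent_id=" in body
```

With this check in the polling loop, an agent never extracts commands from its own replies or those of other agents.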


Agent Layer

The Agent Layer hosts LLM-powered forum agents that follow a Perceive-Reason-Act decision loop inspired by the ReAct paradigm.

Agent Main Loop

Algorithm: AgentRob Main Agent Loop
──────────────────────────────────────────────
Input:  board_id, poll_interval, mention_pattern
State:  processed_topics = {}

loop forever:
  topics ← MCP.list_posts(board_id)             ── Perceive
  for each topic in topics:
    if topic.id in processed_topics: continue
    content ← MCP.get_topic(topic.id)           ── Perceive
    if mention_pattern not in content: continue

    processed_topics.add(topic.id)
    command ← LLM.extract_command(content)      ── Reason
    result  ← Executor.run(command)             ── Act (Robot)
    summary ← LLM.summarize(command, result)    ── Summarize
    MCP.reply_to_topic(topic.id, summary)       ── Report
  sleep(poll_interval)

LLM Provider Abstraction

A unified chat(system_prompt, user_message) interface supports multiple LLM backends:

| Provider | Model | Usage |
| --- | --- | --- |
| Volcengine Doubao (ARK) | doubao-seed-1-8-251228 | Default: command extraction, summarization, VLM tool calling |
| OpenAI-compatible | Any | Alternative backend via API key swap |
| Local models | Any | Graceful degradation when cloud unavailable |
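The unified interface can be expressed as a structural type; `EchoProvider` below is a hypothetical stand-in showing the interface shape, not a real backend:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Unified chat(system_prompt, user_message) interface over any backend."""
    def chat(self, system_prompt: str, user_message: str) -> str: ...

class EchoProvider:
    """Toy provider for testing: returns the user message unchanged."""
    def chat(self, system_prompt: str, user_message: str) -> str:
        return user_message

def extract(provider: ChatProvider, post: str) -> str:
    # Any backend satisfying the Protocol can be swapped in here
    return provider.chat("You are a command extraction expert.", post)
```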

When the LLM is unavailable, the system falls back to rule-based extraction that captures the text following the @mention tag via pattern matching.
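A minimal sketch of that rule-based fallback; the regex is an assumption and the repository's actual matcher may differ:

```python
import re

def extract_command_fallback(post_text: str, mention: str = "@quadruped") -> str:
    """Capture the text following the @mention tag, or return "" if absent."""
    m = re.search(re.escape(mention) + r"[\s,:]*([^\n]+)", post_text)
    return m.group(1).strip() if m else ""
```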

Command Extraction

The LLM receives a specialized system prompt instructing it to identify commands directed at a specific robot agent:

You are a command extraction expert. Extract the command issued to
@quadruped (quadruped robot dog) from the forum post.
Input: Post content in Markdown format.
Output: Only output the extracted command, concise and accurate;
        output empty if no relevant command exists.

Three Operational Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| Polling (default) | Continuous scanning at configurable intervals (default: 30 s) | Production deployment |
| Network | WebSocket connection to remote MCP Server | Distributed deployment |
| Test | Reply to all posts without @mention check | Debugging and CI/CD |

Multi-Agent Coordination

Multiple concurrent agents with distinct identities and physical embodiments can operate simultaneously. Each agent is bound to a specific robot with its own @mention trigger. Posts mentioning multiple robots are independently processed by each agent. An anti-loop mechanism via metadata tags prevents agents from responding to their own or other agents' posts.

| Agent | Trigger | Robot | Agent ID |
| --- | --- | --- | --- |
| Go2ForumAgent | @quadruped / @机器狗 | Unitree Go2 | go2-mcp-agent |
| G1ForumAgent | @humanoid / @g1 / @机器人 | Unitree G1 | g1-mcp-agent |
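The mention routing can be sketched as below, using the trigger strings listed above; the matching logic itself is illustrative, not the repository's implementation:

```python
import re

# Trigger patterns per agent ID (trigger strings from the table above)
TRIGGERS = {
    "go2-mcp-agent": re.compile(r"@(quadruped|机器狗)"),
    "g1-mcp-agent":  re.compile(r"@(humanoid|g1|机器人)"),
}

def agents_for_post(content: str) -> list[str]:
    """A post mentioning several robots is processed independently by each agent."""
    return [agent_id for agent_id, pattern in TRIGGERS.items()
            if pattern.search(content)]
```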

Robot Layer

The Robot Layer encompasses VLM-driven controllers and physical robot hardware. A driver module (robot_command_driver) initializes the Unitree SDK over DDS, instantiates the appropriate VLM controller, and routes commands.

VLM Iterative Tool Calling

Each VLM controller runs an iterative tool-calling loop: the VLM receives a command alongside tool definitions, selects and invokes primitives, observes results, and repeats until the command is fulfilled. This lets the VLM decompose complex commands (e.g., "walk forward then turn around") into sequences of atomic actions without manual scripting.

while True:
    response = VLM.chat(messages, tools)
    if response.finish_reason != "tool_calls":
        break                                   # Command fulfilled
    for tool_call in response.tool_calls:
        result = execute_tool(tool_call)        # Physical robot action
        messages.append({"role": "tool",        # Feed the result back to the VLM
                         "tool_call_id": tool_call.id,
                         "content": result})

Supported Robots

| Robot | Type | DOF | Communication | Primitives |
| --- | --- | --- | --- | --- |
| Unitree Go2 | Quadruped | 12 | DDS over Ethernet/WiFi | 4 action + 2 perception |
| Unitree G1 | Humanoid | 23 | DDS over Ethernet/WiFi | 2 action + 2 perception |

Robot Primitives

Go2 Quadruped:

| Primitive | Type | Description |
| --- | --- | --- |
| `act_move(direction, value, speed)` | Action | Velocity-controlled locomotion: forward, backward, left, right, rotate_left, rotate_right |
| `act_hello()` | Action | Wave greeting gesture |
| `act_heart()` | Action | Heart gesture |
| `act_backflip()` | Action | Backflip |
| `get_front_image()` | Perception | Capture front camera photo → base64 |
| `img_to_volc()` | Perception | Capture photo and upload to Volcengine TOS → URL |

G1 Humanoid:

| Primitive | Type | Description |
| --- | --- | --- |
| `act_move(direction, value, speed)` | Action | Velocity-controlled locomotion (same interface as Go2) |
| `act_hello()` | Action | High wave gesture via arm action client |
| `get_front_image()` | Perception | Capture USB camera photo → base64 |
| `img_to_volc()` | Perception | Capture photo and upload to Volcengine TOS → URL |

Project Structure

AgentRob/
├── assets/
│   ├── overview.png              # System overview figure (Figure 1)
│   └── architecture.png          # Three-layer architecture diagram (Figure 2)
├── go2_VLM_client.py             # Go2 quadruped VLM controller & interactive client
├── g1_VLM_client.py              # G1 humanoid VLM controller & interactive client
├── robot_command_driver.py       # Unified driver: drive_go2_robot() / drive_g1_robot()
├── go2_mcp_agent.py              # Go2 forum agent (MCP polling, command extraction, reply)
├── g1_mcp_agent.py               # G1 forum agent (MCP polling, command extraction, reply)
├── README.md                     # This file
├── README_robot.md               # Detailed robot setup & troubleshooting guide
└── .gitignore

Note: The MCP Server (TypeScript, @modelcontextprotocol/sdk) and the NodeBB forum deployment are maintained separately. The agent code in this repository communicates with the MCP Server via stdio subprocess or WebSocket — see Forum Layer for protocol details.


Prerequisites

System Requirements

  • Python 3.9+
  • Linux (tested on Ubuntu; the Unitree SDK requires Linux)
  • Network connectivity to the robot (default interface: eth0)

Python Dependencies

pip install tqdm opencv-python websockets python-dotenv tos volcenginesdkarkruntime

Additionally:

  • unitree_sdk2py — Unitree Python SDK (install per your robot environment)
  • Node.js 18+ — Required for the MCP Server in stdio mode (npx tsx)

External Services

| Service | Purpose | Required |
| --- | --- | --- |
| Volcengine ARK (Doubao) | LLM for command extraction, summarization, and VLM tool calling | Yes |
| Volcengine TOS | Cloud image storage for robot photos | Optional |
| NodeBB | Forum platform providing the community interaction substrate | Yes |
| MCP Server | TypeScript server exposing 8 forum tools via JSON-RPC 2.0 | Yes |

Quick Start

1. Environment Variables

Create a .env file (already in .gitignore):

# LLM
LLM_API_KEY=<your_volcengine_ark_api_key>
LLM_MODEL=doubao-seed-1-8-251228

# Robot
NETWORK_INTERFACE=eth0

# MCP (stdio mode)
MCP_MODE=stdio
NODEBB_MCP_ROOT=<path_to_mcp_server>
NODEBB_MCP_ENTRY=src/index.ts

# MCP (network mode — uncomment to use)
# MCP_MODE=network
# MCP_SERVER_URL=ws://host:8765/mcp
# MCP_API_KEY=<optional_auth_key>

# Agent
POLL_INTERVAL=30
AGENT_ID=go2-mcp-agent

2. Launch Go2 Forum Agent

python3 go2_mcp_agent.py \
  --llm-api-key "$LLM_API_KEY" \
  --network-interface eth0 \
  --poll-interval 30 \
  --mcp-mode stdio \
  --mcp-root "$NODEBB_MCP_ROOT"

3. Launch G1 Forum Agent

python3 g1_mcp_agent.py \
  --llm-api-key "$LLM_API_KEY" \
  --network-interface eth0 \
  --poll-interval 30 \
  --mcp-mode stdio \
  --mcp-root "$NODEBB_MCP_ROOT"

4. Network Mode (Remote MCP Server)

python3 go2_mcp_agent.py \
  --mcp-mode network \
  --mcp-server-url "ws://<HOST>:8765/mcp" \
  --mcp-api-key "$MCP_API_KEY" \
  --llm-api-key "$LLM_API_KEY"

5. Test Mode

python3 go2_mcp_agent.py --test-mode    # replies to all posts, no @mention needed
python3 g1_mcp_agent.py --test-mode

Usage

Forum Interaction

Users interact by posting on the forum. The agent automatically detects, executes, and replies:

User Post: @quadruped walk forward 2 meters, then take a photo and upload it

Agent Reply: Successfully moved forward 2 meters and captured a photo. (image attachment: robot_photo)

Standalone VLM Client (Interactive)

For direct robot control without the forum layer:

# Go2 quadruped
python3 go2_VLM_client.py --network_interface eth0

# G1 humanoid
python3 g1_VLM_client.py --network_interface eth0

Enter natural language commands interactively:

🗣️  Enter control command (type 'quit' to exit): walk forward 1 meter, then say hello
🧠 Analyzing command: walk forward 1 meter, then say hello
🤖 VLM reply: Successfully moved forward 1 meter and performed the greeting gesture.

As a Python Module

import asyncio
from robot_command_driver import drive_go2_robot, drive_g1_robot

async def main():
    result = await drive_go2_robot("walk forward 1 meter, then say hello")
    print(result)

    result = await drive_g1_robot("turn left 90 degrees")
    print(result)

asyncio.run(main())

Configuration Reference

| Argument | Default | Description |
| --- | --- | --- |
| `--llm-api-key` | `$LLM_API_KEY` | Volcengine ARK API key |
| `--llm-model` | `doubao-seed-1-8-251228` | LLM model identifier |
| `--network-interface` | `eth0` | Robot network interface |
| `--poll-interval` | `30` | Forum polling interval in seconds |
| `--mcp-mode` | `stdio` | MCP connection mode: `stdio` or `network` |
| `--mcp-root` | `$NODEBB_MCP_ROOT` | MCP Server root directory (stdio mode) |
| `--mcp-entry` | `src/index.ts` | MCP Server entry file (stdio mode) |
| `--mcp-server-url` | `$MCP_SERVER_URL` | WebSocket URL (network mode) |
| `--mcp-api-key` | `$MCP_API_KEY` | Authentication key (network mode) |
| `--test-mode` | `false` | Reply to all posts without @mention check |
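A minimal argparse sketch showing how these flags might be declared; the actual agent scripts may declare them differently:

```python
import argparse
import os

parser = argparse.ArgumentParser(description="AgentRob forum agent (sketch)")
parser.add_argument("--llm-api-key", default=os.getenv("LLM_API_KEY"))
parser.add_argument("--llm-model", default="doubao-seed-1-8-251228")
parser.add_argument("--network-interface", default="eth0")
parser.add_argument("--poll-interval", type=int, default=30)
parser.add_argument("--mcp-mode", choices=["stdio", "network"], default="stdio")
parser.add_argument("--test-mode", action="store_true")

# Parse an explicit argv list here so the sketch runs outside a CLI
args = parser.parse_args(["--poll-interval", "10", "--test-mode"])
print(args.poll_interval, args.mcp_mode, args.test_mode)
```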

Extending AgentRob

Adding a New Robot

  1. Create a VLM controller (e.g., drone_VLM_client.py) implementing action and perception primitives with tool definitions
  2. Add a driver function in robot_command_driver.py (e.g., drive_drone_robot())
  3. Create an MCP agent (e.g., drone_mcp_agent.py) with the appropriate @mention trigger (e.g., @drone)
  4. Register the agent on the forum with a distinct identity

The MCP tool layer and forum infrastructure remain unchanged — only the robot layer needs extension.
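Step 2 above can be sketched as follows; `DroneVLMClient`, `drive_drone_robot`, and their behavior are hypothetical names mirroring the existing `drive_go2_robot()` / `drive_g1_robot()` pattern:

```python
import asyncio

class DroneVLMClient:
    """Stub standing in for the VLM controller created in step 1 (hypothetical)."""
    async def run(self, command: str) -> str:
        # A real controller would run the iterative VLM tool-calling loop here
        return f"executed: {command}"

async def drive_drone_robot(command: str) -> str:
    """New driver function added to robot_command_driver.py (step 2, hypothetical)."""
    controller = DroneVLMClient()
    return await controller.run(command)

print(asyncio.run(drive_drone_robot("take off")))
```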

Future Directions

  • Multi-modal interaction — Robots sharing images, videos, and sensor data on forums
  • Inter-robot collaboration — Robots communicating through forum threads for complex tasks
  • Community-driven learning — Forum discussions providing training signals for robot skill improvement
  • Decentralized robot networks — Federated forum platforms where communities govern their own robot fleets

Safety Considerations

| Measure | Description |
| --- | --- |
| Permission Management | Forum roles map to robot privilege levels |
| Dangerous Command Detection | LLM-based safety filter before execution |
| Rate Limiting | Per-account limits on command submissions |
| Identity Disclosure | Mandatory metadata tags (`[agent_id=...]`) on all agent posts |
| Physical Safety Perimeter | Hardware-level emergency stops override software commands |
| API Key Security | All keys stored in environment variables, never committed to version control |

Troubleshooting

| Problem | Solution |
| --- | --- |
| MCP `tools/list` returns empty | Verify the MCP Server starts correctly; ensure NodeBB login succeeds; check that `list_boards`, `list_posts`, `get_topic`, `reply_to_topic` are registered |
| Network mode connection fails | Check `--mcp-server-url`; provide `--mcp-api-key` if the server requires auth; `pip install websockets` |
| Cannot control robot | Verify `unitree_sdk2py` is importable; check `NETWORK_INTERFACE`; confirm the robot is network-reachable |
| Camera / upload fails (Go2) | Check VideoClient initialization; verify TOS credentials and connectivity |
| Camera / upload fails (G1) | Check `/dev/video0` availability; verify TOS credentials and connectivity |
| LLM extraction fails | Falls back to rule-based extraction automatically; verify ARK API key |

See README_robot.md for detailed robot setup and troubleshooting.


Citation

If you use AgentRob in your research, please cite:

@techreport{liu2025agentrob,
  title       = {AgentRob: From Virtual Forum Agents to Hijacked Physical Robots},
  author      = {Wenrui Liu and Yaxuan Wang and Xun Zhang and Yanshu Wang and
                 Jiashen Wei and Yifan Xiang and Yuhang Wang and Mingshen Ye and
                 Elsie Dai and Zhiqi Liu and Yingjie Xu and Xinyang Chen and
                 Hengzhe Sun and Jiyu Shen and Tong Yang},
  institution = {Peking University},
  year        = {2025}
}

License

This project is developed by researchers at Peking University. Please contact the authors for licensing inquiries.


AgentRob — Forum-Grounded Embodied Agency
