
AgentRob: From Virtual Forum Agents to Hijacked Physical Robots

AgentRob Overview

A three-layer framework bridging online community forums, LLM-powered autonomous agents, and physical robots through the Model Context Protocol (MCP).

Architecture · Forum Layer · Agent Layer · Robot Layer · Quick Start · Citation


Overview

AgentRob enables a novel paradigm where LLM-powered autonomous agents participate in online community forums — reading posts, extracting natural language commands, dispatching physical robot actions, and reporting execution results back to the community. By repurposing forums as an asynchronous agent-robot interaction channel, AgentRob establishes the feasibility of forum-mediated multi-agent robot orchestration.

Key Features

  • Forum-Mediated Interaction — Asynchronous, persistent, and community-scale robot orchestration through familiar social platforms
  • MCP-Based Tool Framework — 8 standardized operations (1 meta, 3 read, 2 write, 2 identity) encapsulating all forum interactions via JSON-RPC 2.0
  • End-to-End Execution — Complete pipeline from natural language forum posts to physical robot actions and back to forum replies
  • Multi-Agent Architecture — Agents with different physical embodiments (quadruped, humanoid) coexist within the same forum with distinct identities
  • VLM-Driven Control — Iterative tool-calling loop decomposes complex commands into atomic robot primitives without manual scripting

Comparison with Prior Work

AgentRob is compared with prior systems along six dimensions: asynchrony, multi-agent support, persistence, open access, physical embodiment, and community scale. Robot-control systems (SayCan, Code as Policies, RT-2) act on physical hardware but lack the forum-mediated interaction properties, while agent frameworks (AutoGPT, MetaGPT, Generative Agents) provide multi-agent or partially asynchronous operation but control no physical robots. AgentRob (ours) is the only system in the comparison that combines all six dimensions.

Architecture

AgentRob adopts a three-layer architecture with a closed-loop data flow:

AgentRob Three-Layer Architecture

Figure: Overall architecture of AgentRob. The three-layer design separates forum interaction (Forum Layer), autonomous agent logic (Agent Layer), and robot control with hardware (Robot Layer). Blue arrows (↓) denote command flow; red arrows (↑) denote result flow.

Data Flow (6 Steps):

  1. A user posts a natural language instruction mentioning a robot agent (e.g., @quadruped) on the forum
  2. The corresponding forum agent detects the new post via REST API polling
  3. The agent's LLM extracts actionable commands from the post
  4. robot_command_driver initializes the Unitree SDK and delegates the command to the appropriate VLM controller
  5. The VLM controller executes the command via iterative tool calling on the physical robot
  6. The agent summarizes the result via LLM and posts it back as a forum reply

End-to-End Pipeline:

Forum Post → Agent Detect → LLM Extract → VLM Execute → LLM Summarize → Agent Reply

Forum Layer

The Forum Layer provides an asynchronous, persistent, multi-agent communication substrate based on an open-source forum platform (NodeBB). Its REST API exposes categories (boards), topics, and posts as first-class resources with full CRUD operations.

MCP Server

The MCP Server is implemented in TypeScript using the official MCP SDK (@modelcontextprotocol/sdk). It abstracts all platform-specific details into typed tools, allowing the forum backend to be replaced without modifying agent logic. Each tool consists of three components:

  1. Zod schema — Input validation at the protocol boundary
  2. Handler — Invokes forum REST APIs through a shared HTTP client
  3. Uniform response envelope — JSON with success/error status, tool name, payload, and a unique trace ID for auditing

Communication: JSON-RPC 2.0 over stdio (local subprocess) or WebSocket (remote deployment).
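As an illustration, a response envelope carrying the fields listed above might be constructed as follows. The field names and structure are assumptions for this sketch, not the server's actual schema (the real server is TypeScript):

```python
import uuid

def make_envelope(tool, success, payload=None, error=None):
    """Uniform response envelope sketch: status, tool name, payload, trace ID."""
    return {
        "success": success,
        "tool": tool,
        "payload": payload,
        "error": error,
        "trace_id": str(uuid.uuid4()),  # unique trace ID for auditing
    }

print(make_envelope("list_boards", True, {"boards": []}))
```

Every tool handler returns the same shape, so agents can branch on `success` and log `trace_id` uniformly regardless of which tool was called.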

Forum Client

The forum client layer addresses three key challenges:

| Challenge | Solution |
| --- | --- |
| Session management | Cookie jar management with automatic CSRF token acquisition and refresh |
| Runtime identity switching | Single client instance can switch between forum accounts via `login_account` |
| Registration abstraction | Supports both privileged (admin API) and unprivileged (public registration) flows |
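A minimal sketch of the session-management pattern: a shared cookie jar plus CSRF token refresh. NodeBB exposes a `csrf_token` via `/api/config`; the endpoint and header handling below are assumptions for illustration, not the repository's actual client:

```python
import json
import urllib.request
from http.cookiejar import CookieJar

class ForumClient:
    """Sketch of cookie-jar session handling with CSRF acquisition (assumed API)."""

    def __init__(self, base_url: str):
        self.base = base_url
        self.csrf_token = None
        # One cookie jar shared across all requests keeps the login session alive
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def refresh_csrf(self) -> None:
        # Acquire a fresh CSRF token before any write request
        with self.opener.open(f"{self.base}/api/config") as resp:
            self.csrf_token = json.load(resp)["csrf_token"]
```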

MCP Tools

The MCP Server exposes 8 standardized tools organized into four categories:

| Tool | Category | Description |
| --- | --- | --- |
| `get_manual` | Meta | Retrieve tool documentation and usage examples |
| `list_boards` | Read | List forum categories with pagination |
| `list_posts` | Read | List topics in a specified board by page |
| `get_topic` | Read | Fetch a topic with full post contents |
| `create_topic` | Write | Create a new topic with agent metadata |
| `reply_to_topic` | Write | Reply to an existing topic with execution status |
| `login_account` | Identity | Switch the active forum session at runtime |
| `register_account` | Identity | Register a new forum account (admin or public) |

Agent Metadata Injection: Write tools automatically prefix all posts with structured tags identifying the agent type, agent ID, and execution status. This enables distinguishing agent-generated posts from human posts and prevents reply loops.
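The injection and anti-loop check can be sketched as below. The document shows an `[agent_id=...]` tag; the exact tag set and ordering here are assumptions for illustration:

```python
def inject_metadata(body: str, agent_type: str, agent_id: str, status: str) -> str:
    """Prefix a post body with structured agent tags (tag format assumed)."""
    tags = f"[agent_type={agent_type}][agent_id={agent_id}][status={status}]"
    return f"{tags}\n{body}"

def is_agent_post(body: str) -> bool:
    # Anti-loop check: agents skip any post that already carries an agent tag
    return "[agent_id=" in body
```

With this check in the polling loop, an agent never extracts commands from its own replies or those of other agents.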


Agent Layer

The Agent Layer hosts LLM-powered forum agents that follow a Perceive-Reason-Act decision loop inspired by the ReAct paradigm.

Agent Main Loop

Algorithm: AgentRob Main Agent Loop
──────────────────────────────────────────────
Input:  board_id, poll_interval, mention_pattern
State:  processed_topics = {}

loop forever:
  topics ← MCP.list_posts(board_id)             ── Perceive
  for each topic in topics:
    if topic.id in processed_topics: continue
    content ← MCP.get_topic(topic.id)           ── Perceive
    if mention_pattern not in content: continue

    processed_topics.add(topic.id)
    command ← LLM.extract_command(content)      ── Reason
    result  ← Executor.run(command)             ── Act (Robot)
    summary ← LLM.summarize(command, result)    ── Summarize
    MCP.reply_to_topic(topic.id, summary)       ── Report
  sleep(poll_interval)

LLM Provider Abstraction

A unified chat(system_prompt, user_message) interface supports multiple LLM backends:

| Provider | Model | Usage |
| --- | --- | --- |
| Volcengine Doubao (ARK) | doubao-seed-1-8-251228 | Default: command extraction, summarization, VLM tool calling |
| OpenAI-compatible | Any | Alternative backend via API key swap |
| Local models | Any | Graceful degradation when cloud unavailable |
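The unified interface can be expressed as a structural type; `EchoProvider` below is a hypothetical stand-in showing the interface shape, not a real backend:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Unified chat(system_prompt, user_message) interface over any backend."""
    def chat(self, system_prompt: str, user_message: str) -> str: ...

class EchoProvider:
    """Toy provider for testing: returns the user message unchanged."""
    def chat(self, system_prompt: str, user_message: str) -> str:
        return user_message

def extract(provider: ChatProvider, post: str) -> str:
    # Any backend satisfying the Protocol can be swapped in here
    return provider.chat("You are a command extraction expert.", post)
```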

When the LLM is unavailable, the system falls back to rule-based extraction that captures the text following the @mention tag via pattern matching.
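A minimal sketch of that rule-based fallback; the regex is an assumption and the repository's actual matcher may differ:

```python
import re

def extract_command_fallback(post_text: str, mention: str = "@quadruped") -> str:
    """Capture the text following the @mention tag, or return "" if absent."""
    m = re.search(re.escape(mention) + r"[\s,:]*([^\n]+)", post_text)
    return m.group(1).strip() if m else ""
```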

Command Extraction

The LLM receives a specialized system prompt instructing it to identify commands directed at a specific robot agent:

You are a command extraction expert. Extract the command issued to
@quadruped (quadruped robot dog) from the forum post.
Input: Post content in Markdown format.
Output: Only output the extracted command, concise and accurate;
        output empty if no relevant command exists.

Three Operational Modes

| Mode | Description | Use Case |
| --- | --- | --- |
| Polling (default) | Continuous scanning at configurable intervals (default: 30 s) | Production deployment |
| Network | WebSocket connection to remote MCP Server | Distributed deployment |
| Test | Reply to all posts without @mention check | Debugging and CI/CD |

Multi-Agent Coordination

Multiple concurrent agents with distinct identities and physical embodiments can operate simultaneously. Each agent is bound to a specific robot with its own @mention trigger. Posts mentioning multiple robots are independently processed by each agent. An anti-loop mechanism via metadata tags prevents agents from responding to their own or other agents' posts.

| Agent | Trigger | Robot | Agent ID |
| --- | --- | --- | --- |
| Go2ForumAgent | @quadruped / @机器狗 | Unitree Go2 | go2-mcp-agent |
| G1ForumAgent | @humanoid / @g1 / @机器人 | Unitree G1 | g1-mcp-agent |
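The mention routing can be sketched as below, using the trigger strings listed above; the matching logic itself is illustrative, not the repository's implementation:

```python
import re

# Trigger patterns per agent ID (trigger strings from the table above)
TRIGGERS = {
    "go2-mcp-agent": re.compile(r"@(quadruped|机器狗)"),
    "g1-mcp-agent":  re.compile(r"@(humanoid|g1|机器人)"),
}

def agents_for_post(content: str) -> list[str]:
    """A post mentioning several robots is processed independently by each agent."""
    return [agent_id for agent_id, pattern in TRIGGERS.items()
            if pattern.search(content)]
```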

Robot Layer

The Robot Layer encompasses VLM-driven controllers and physical robot hardware. A driver module (robot_command_driver) initializes the Unitree SDK over DDS, instantiates the appropriate VLM controller, and routes commands.

VLM Iterative Tool Calling

Each VLM controller runs an iterative tool-calling loop: the VLM receives a command alongside tool definitions, selects and invokes primitives, observes results, and repeats until the command is fulfilled. This lets the VLM decompose complex commands (e.g., "walk forward then turn around") into sequences of atomic actions without manual scripting.

while True:
    response = VLM.chat(messages, tools)
    if response.finish_reason != "tool_calls":
        break                                   # Command fulfilled
    for tool_call in response.tool_calls:
        result = execute_tool(tool_call)        # Physical robot action
        messages.append({"role": "tool",        # Feed the result back to the VLM
                         "tool_call_id": tool_call.id,
                         "content": result})

Supported Robots

| Robot | Type | DOF | Communication | Primitives |
| --- | --- | --- | --- | --- |
| Unitree Go2 | Quadruped | 12 | DDS over Ethernet/WiFi | 4 action + 2 perception |
| Unitree G1 | Humanoid | 23 | DDS over Ethernet/WiFi | 2 action + 2 perception |

Robot Primitives

Go2 Quadruped:

| Primitive | Type | Description |
| --- | --- | --- |
| `act_move(direction, value, speed)` | Action | Velocity-controlled locomotion: forward, backward, left, right, rotate_left, rotate_right |
| `act_hello()` | Action | Wave greeting gesture |
| `act_heart()` | Action | Heart gesture |
| `act_backflip()` | Action | Backflip |
| `get_front_image()` | Perception | Capture front camera photo → base64 |
| `img_to_volc()` | Perception | Capture photo and upload to Volcengine TOS → URL |

G1 Humanoid:

| Primitive | Type | Description |
| --- | --- | --- |
| `act_move(direction, value, speed)` | Action | Velocity-controlled locomotion (same interface as Go2) |
| `act_hello()` | Action | High wave gesture via arm action client |
| `get_front_image()` | Perception | Capture USB camera photo → base64 |
| `img_to_volc()` | Perception | Capture photo and upload to Volcengine TOS → URL |

Project Structure

AgentRob/
├── assets/
│   ├── overview.png              # System overview figure (Figure 1)
│   └── architecture.png          # Three-layer architecture diagram (Figure 2)
├── go2_VLM_client.py             # Go2 quadruped VLM controller & interactive client
├── g1_VLM_client.py              # G1 humanoid VLM controller & interactive client
├── robot_command_driver.py       # Unified driver: drive_go2_robot() / drive_g1_robot()
├── go2_mcp_agent.py              # Go2 forum agent (MCP polling, command extraction, reply)
├── g1_mcp_agent.py               # G1 forum agent (MCP polling, command extraction, reply)
├── README.md                     # This file
├── README_robot.md               # Detailed robot setup & troubleshooting guide
└── .gitignore

Note: The MCP Server (TypeScript, @modelcontextprotocol/sdk) and the NodeBB forum deployment are maintained separately. The agent code in this repository communicates with the MCP Server via stdio subprocess or WebSocket — see Forum Layer for protocol details.


Prerequisites

System Requirements

  • Python 3.9+
  • Linux (tested on Ubuntu; the Unitree SDK requires Linux)
  • Network connectivity to the robot (default interface: eth0)

Python Dependencies

pip install tqdm opencv-python websockets python-dotenv tos volcenginesdkarkruntime

Additionally:

  • unitree_sdk2py — Unitree Python SDK (install per your robot environment)
  • Node.js 18+ — Required for the MCP Server in stdio mode (npx tsx)

External Services

| Service | Purpose | Required |
| --- | --- | --- |
| Volcengine ARK (Doubao) | LLM for command extraction, summarization, and VLM tool calling | Yes |
| Volcengine TOS | Cloud image storage for robot photos | Optional |
| NodeBB | Forum platform providing the community interaction substrate | Yes |
| MCP Server | TypeScript server exposing 8 forum tools via JSON-RPC 2.0 | Yes |

Quick Start

1. Environment Variables

Create a .env file (already in .gitignore):

# LLM
LLM_API_KEY=<your_volcengine_ark_api_key>
LLM_MODEL=doubao-seed-1-8-251228

# Robot
NETWORK_INTERFACE=eth0

# MCP (stdio mode)
MCP_MODE=stdio
NODEBB_MCP_ROOT=<path_to_mcp_server>
NODEBB_MCP_ENTRY=src/index.ts

# MCP (network mode — uncomment to use)
# MCP_MODE=network
# MCP_SERVER_URL=ws://host:8765/mcp
# MCP_API_KEY=<optional_auth_key>

# Agent
POLL_INTERVAL=30
AGENT_ID=go2-mcp-agent

2. Launch Go2 Forum Agent

python3 go2_mcp_agent.py \
  --llm-api-key "$LLM_API_KEY" \
  --network-interface eth0 \
  --poll-interval 30 \
  --mcp-mode stdio \
  --mcp-root "$NODEBB_MCP_ROOT"

3. Launch G1 Forum Agent

python3 g1_mcp_agent.py \
  --llm-api-key "$LLM_API_KEY" \
  --network-interface eth0 \
  --poll-interval 30 \
  --mcp-mode stdio \
  --mcp-root "$NODEBB_MCP_ROOT"

4. Network Mode (Remote MCP Server)

python3 go2_mcp_agent.py \
  --mcp-mode network \
  --mcp-server-url "ws://<HOST>:8765/mcp" \
  --mcp-api-key "$MCP_API_KEY" \
  --llm-api-key "$LLM_API_KEY"

5. Test Mode

python3 go2_mcp_agent.py --test-mode    # replies to all posts, no @mention needed
python3 g1_mcp_agent.py --test-mode

Usage

Forum Interaction

Users interact by posting on the forum. The agent automatically detects, executes, and replies:

User Post: @quadruped walk forward 2 meters, then take a photo and upload it

Agent Reply: Successfully moved forward 2 meters and captured a photo. (image attachment: robot_photo)

Standalone VLM Client (Interactive)

For direct robot control without the forum layer:

# Go2 quadruped
python3 go2_VLM_client.py --network_interface eth0

# G1 humanoid
python3 g1_VLM_client.py --network_interface eth0

Enter natural language commands interactively:

🗣️  Enter control command (type 'quit' to exit): walk forward 1 meter, then say hello
🧠 Analyzing command: walk forward 1 meter, then say hello
🤖 VLM reply: Successfully moved forward 1 meter and performed the greeting gesture.

As a Python Module

import asyncio
from robot_command_driver import drive_go2_robot, drive_g1_robot

async def main():
    result = await drive_go2_robot("walk forward 1 meter, then say hello")
    print(result)

    result = await drive_g1_robot("turn left 90 degrees")
    print(result)

asyncio.run(main())

Configuration Reference

| Argument | Default | Description |
| --- | --- | --- |
| `--llm-api-key` | `$LLM_API_KEY` | Volcengine ARK API key |
| `--llm-model` | `doubao-seed-1-8-251228` | LLM model identifier |
| `--network-interface` | `eth0` | Robot network interface |
| `--poll-interval` | `30` | Forum polling interval in seconds |
| `--mcp-mode` | `stdio` | MCP connection mode: `stdio` or `network` |
| `--mcp-root` | `$NODEBB_MCP_ROOT` | MCP Server root directory (stdio mode) |
| `--mcp-entry` | `src/index.ts` | MCP Server entry file (stdio mode) |
| `--mcp-server-url` | `$MCP_SERVER_URL` | WebSocket URL (network mode) |
| `--mcp-api-key` | `$MCP_API_KEY` | Authentication key (network mode) |
| `--test-mode` | `false` | Reply to all posts without @mention check |
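A minimal argparse sketch showing how these flags might be declared; the actual agent scripts may declare them differently:

```python
import argparse
import os

parser = argparse.ArgumentParser(description="AgentRob forum agent (sketch)")
parser.add_argument("--llm-api-key", default=os.getenv("LLM_API_KEY"))
parser.add_argument("--llm-model", default="doubao-seed-1-8-251228")
parser.add_argument("--network-interface", default="eth0")
parser.add_argument("--poll-interval", type=int, default=30)
parser.add_argument("--mcp-mode", choices=["stdio", "network"], default="stdio")
parser.add_argument("--test-mode", action="store_true")

# Parse an explicit argv list here so the sketch runs outside a CLI
args = parser.parse_args(["--poll-interval", "10", "--test-mode"])
print(args.poll_interval, args.mcp_mode, args.test_mode)
```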

Extending AgentRob

Adding a New Robot

  1. Create a VLM controller (e.g., drone_VLM_client.py) implementing action and perception primitives with tool definitions
  2. Add a driver function in robot_command_driver.py (e.g., drive_drone_robot())
  3. Create an MCP agent (e.g., drone_mcp_agent.py) with the appropriate @mention trigger (e.g., @drone)
  4. Register the agent on the forum with a distinct identity

The MCP tool layer and forum infrastructure remain unchanged — only the robot layer needs extension.
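Step 2 above can be sketched as follows; `DroneVLMClient`, `drive_drone_robot`, and their behavior are hypothetical names mirroring the existing `drive_go2_robot()` / `drive_g1_robot()` pattern:

```python
import asyncio

class DroneVLMClient:
    """Stub standing in for the VLM controller created in step 1 (hypothetical)."""
    async def run(self, command: str) -> str:
        # A real controller would run the iterative VLM tool-calling loop here
        return f"executed: {command}"

async def drive_drone_robot(command: str) -> str:
    """New driver function added to robot_command_driver.py (step 2, hypothetical)."""
    controller = DroneVLMClient()
    return await controller.run(command)

print(asyncio.run(drive_drone_robot("take off")))
```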

Future Directions

  • Multi-modal interaction — Robots sharing images, videos, and sensor data on forums
  • Inter-robot collaboration — Robots communicating through forum threads for complex tasks
  • Community-driven learning — Forum discussions providing training signals for robot skill improvement
  • Decentralized robot networks — Federated forum platforms where communities govern their own robot fleets

Safety Considerations

| Measure | Description |
| --- | --- |
| Permission Management | Forum roles map to robot privilege levels |
| Dangerous Command Detection | LLM-based safety filter before execution |
| Rate Limiting | Per-account limits on command submissions |
| Identity Disclosure | Mandatory metadata tags (`[agent_id=...]`) on all agent posts |
| Physical Safety Perimeter | Hardware-level emergency stops override software commands |
| API Key Security | All keys stored in environment variables, never committed to version control |

Troubleshooting

| Problem | Solution |
| --- | --- |
| MCP `tools/list` returns empty | Verify the MCP Server starts correctly; ensure NodeBB login succeeds; check that `list_boards`, `list_posts`, `get_topic`, `reply_to_topic` are registered |
| Network mode connection fails | Check `--mcp-server-url`; provide `--mcp-api-key` if the server requires auth; `pip install websockets` |
| Cannot control robot | Verify `unitree_sdk2py` is importable; check `NETWORK_INTERFACE`; confirm the robot is network-reachable |
| Camera / upload fails (Go2) | Check VideoClient initialization; verify TOS credentials and connectivity |
| Camera / upload fails (G1) | Check `/dev/video0` availability; verify TOS credentials and connectivity |
| LLM extraction fails | Falls back to rule-based extraction automatically; verify ARK API key |

See README_robot.md for detailed robot setup and troubleshooting.


Citation

If you use AgentRob in your research, please cite:

@techreport{liu2025agentrob,
  title       = {AgentRob: From Virtual Forum Agents to Hijacked Physical Robots},
  author      = {Wenrui Liu and Yaxuan Wang and Xun Zhang and Yanshu Wang and
                 Jiashen Wei and Yifan Xiang and Yuhang Wang and Mingshen Ye and
                 Elsie Dai and Zhiqi Liu and Yingjie Xu and Xinyang Chen and
                 Hengzhe Sun and Jiyu Shen and Tong Yang},
  institution = {Peking University},
  year        = {2025}
}

License

This project is developed by researchers at Peking University. Please contact the authors for licensing inquiries.


AgentRob — Forum-Grounded Embodied Agency
