From Token-Level Context to Emergent System-Level Intelligence
Research implementation of the AgentOS architecture proposed in *Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence*.
AgentOS redefines the LLM as a "Reasoning Kernel" governed by structured operating system logic. The core innovation is treating the context window as an Addressable Semantic Space rather than a passive buffer.
| Traditional OS | AgentOS |
|---|---|
| CPU | Reasoning Kernel (RK) |
| RAM | Addressable Semantic Space (L2) |
| Page Tables | Semantic Page Tables |
| Interrupts | Reasoning Interrupts |
| Process Scheduler | Cognitive Scheduler |
- Semantic Slicing - Aggregate tokens into coherent "cognitive pages" based on attention patterns
- Cognitive Memory Hierarchy - L1 (active attention) → L2 (deep context) → L3 (knowledge base)
- Cognitive Sync Pulses - Event-driven synchronization for multi-agent coherence
- Perception Alignment - Optimal timing for merging semantic slices across agents
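The L1/L2/L3 tiering above can be sketched as a token-budgeted eviction chain: new slices enter L1, least-recently-used slices demote to L2, and L2 overflow demotes to L3. This is an illustrative toy only; `MemoryHierarchy`, `admit`, and the budgets are our names and defaults, not the repo's API.

```python
from collections import OrderedDict

class MemoryHierarchy:
    """Toy sketch of the L1/L2/L3 cognitive memory tiers.

    Capacities are token budgets. Slices evicted from L1 demote to L2;
    L2 overflow demotes to L3, which is unbounded (knowledge base).
    """

    def __init__(self, l1_budget=512, l2_budget=2048):
        self.l1 = OrderedDict()  # slice_id -> token count (active attention)
        self.l2 = OrderedDict()  # deep context
        self.l3 = {}             # knowledge base, no budget
        self.l1_budget, self.l2_budget = l1_budget, l2_budget

    def admit(self, slice_id, tokens):
        """Place a new semantic slice in L1, demoting LRU slices as needed."""
        self.l1[slice_id] = tokens
        while sum(self.l1.values()) > self.l1_budget:
            sid, t = self.l1.popitem(last=False)   # evict oldest L1 slice
            self.l2[sid] = t
        while sum(self.l2.values()) > self.l2_budget:
            sid, t = self.l2.popitem(last=False)   # spill oldest L2 slice
            self.l3[sid] = t

hier = MemoryHierarchy(l1_budget=10, l2_budget=20)
for i in range(8):
    hier.admit(f"slice-{i}", tokens=4)
print(len(hier.l1), len(hier.l2), len(hier.l3))  # 2 5 1
```

The key property this models is the bounded-memory trade-off listed below: L1 never exceeds its budget, no matter how long the session runs.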
📖 Read the full comparison: AgentOS vs Traditional Systems
AgentOS offers bounded, scalable performance for long-running multi-agent conversations:
| Scenario | Traditional | AgentOS | Benefit |
|---|---|---|---|
| 5-turn chat | 500ms | 350ms | 1.4x faster |
| 20-turn chat | 5000ms | 1200ms | 4x faster |
| 100-turn session | 50000ms | 4000ms | 12x faster |
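The scaling pattern in the table follows from a simple back-of-envelope model: a traditional agent re-reads the full history every turn, so per-turn cost grows with turn count, while AgentOS reads only a bounded L1 window. All constants here are purely illustrative, not measurements from this repo.

```python
def traditional_cost_ms(turns, tokens_per_turn=100, ms_per_token=0.1):
    """Full history is re-read every turn: per-turn cost grows linearly
    with turn number, so cumulative cost grows quadratically."""
    return sum(t * tokens_per_turn * ms_per_token for t in range(1, turns + 1))

def agentos_cost_ms(turns, l1_tokens=500, ms_per_token=0.1, overhead_ms=10):
    """L1 is bounded (~500 tokens), so every turn costs roughly the same."""
    return turns * (l1_tokens * ms_per_token + overhead_ms)

for turns in (5, 20, 100):
    print(turns, traditional_cost_ms(turns), agentos_cost_ms(turns))
```

The exact numbers differ from the table, but the shape matches: traditional cost is superlinear in session length, AgentOS cost is linear with a constant per-turn bound.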
Trade-offs:
- ✅ Bounded memory (L1 always ~500 tokens)
- ✅ Semantic selectivity (focus on what matters)
- ✅ True parallelism (agents work independently)
- ❌ Higher complexity (5 interconnected subsystems)
- ❌ Cold start (needs warm-up for optimal performance)
- ❌ Parameter tuning (20+ sensitive settings)
When to use:
- Long-running conversations (10+ turns)
- Multiple agents collaborating
- Need fine-grained memory control
- Building production multi-agent systems
When NOT to use:
- Single-turn Q&A
- Using API-based models (GPT-4, Claude), which do not expose the attention weights AgentOS needs
- Simplicity is more important than optimization
```
┌──────────────────────────────────────────────────────────────┐
│                           AgentOS                            │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Reasoning Kernel (RK)                                 │  │
│  │  Contextual Transition: δ(S_t, I_addr) → S_{t+1}       │  │
│  └────────────────────────────────────────────────────────┘  │
│                             │                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  S-MMU (Semantic Memory Management Unit)               │  │
│  │  ┌──────────────┬──────────────┬──────────────┐        │  │
│  │  │  L1 Cache    │  L2 RAM      │  L3 Storage  │        │  │
│  │  │  (Active)    │  (Deep Ctx)  │  (Knowledge) │        │  │
│  │  │  KV-Cache    │  Vector DB   │  RAG Systems │        │  │
│  │  └──────────────┴──────────────┴──────────────┘        │  │
│  └────────────────────────────────────────────────────────┘  │
│                             │                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Cognitive Scheduler                                   │  │
│  │  Optimizes for Cognitive Fidelity, not CPU time        │  │
│  └────────────────────────────────────────────────────────┘  │
│                             │                                │
│  ┌────────────────────────────────────────────────────────┐  │
│  │  Multi-Agent Sync (CSP)                                │  │
│  │  Cognitive Sync Pulses for temporal coherence          │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
```
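The Reasoning Kernel's contextual transition (state plus addressed input yields next state) can be sketched as a pure state update over semantic slices rather than a mutation of a flat token buffer. `KernelState` and `transition` are our illustrative names, not the repo's API.

```python
from dataclasses import dataclass, field

@dataclass
class KernelState:
    """S_t: the kernel's context as a list of addressed semantic slices."""
    slices: list = field(default_factory=list)

def transition(state: KernelState, interrupt: str) -> KernelState:
    """One contextual transition delta(S_t, I_addr) -> S_{t+1}.

    The interrupt is admitted as a new addressable slice; the previous
    state is left intact, so transitions compose and can be replayed.
    """
    return KernelState(slices=state.slices + [interrupt])

s0 = KernelState()
s1 = transition(s0, "user: What is consciousness?")
s2 = transition(s1, "sync-pulse: align agents")
print(len(s2.slices))  # 2
```

Keeping the transition non-destructive is what lets the S-MMU below page slices between tiers without the kernel tracking raw token offsets.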
✅ All 6 Phases Complete - Full multi-agent system with semantic memory, sync, and metrics
📖 ISSUES.md - 10 prioritized improvement items for production readiness
📖 docs/comparison.md - AgentOS vs Traditional: Unbiased analysis
📖 docs/ - Component documentation and explanations
- Python: 3.10 or later (for modern type hint syntax)
- PyTorch: 2.0+ with MPS (Mac M1/M2) or CUDA support
- Local LLM: Qwen2.5-0.5B-Instruct or similar (auto-downloaded)
```bash
# Clone repository
git clone https://github.com/yourusername/agentos.git
cd agentos

# Install (requires pip 21.3+)
pip install -e .

# Or for development
pip install -e ".[dev]"
```

Note: Editable install requires pip 21.3 or later. For older versions:

```bash
# Alternative: use PYTHONPATH instead of installing
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
python -m agentos.cli --generate
```

Run the AgentOS CLI for an interactive multi-agent experience:
After installation:
```bash
# Fast mode (placeholder responses, ~8s startup)
agentos

# Full mode (actual LLM generation, ~40s startup)
agentos --generate

# Custom model
agentos --generate --model Qwen/Qwen2.5-0.5B-Instruct
```

CLI Commands:

- `/help` - Show help
- `/agents` - List all agents
- `/stats` - Show system statistics
- `/memory` - Show memory utilization
- `/sync` - Trigger manual sync
- `/quit` or `/exit` - Exit
Sample CLI Output:

```text
$ agentos --generate
Initializing AgentOS...
Ready! 2 agents loaded.
LLM Generation: ENABLED

============================================================
AgentOS CLI - Interactive Multi-Agent System
============================================================

Commands:
  /help   - Show this help message
  /agents - List all agents
  /stats  - Show system statistics
  /sync   - Trigger manual sync
  /memory - Show memory utilization
  /quit or /exit - Quit the application

Just type your message and agents will respond!

You> What is consciousness?

Processing: What is consciousness?
----------------------------------------
Agent Contributions:
----------------------------------------

Researcher (researcher):
  This requires LLM generation. Placeholder: Consciousness is
  the state of being aware of and responsive to one's surroundings.

Critic (critic):
  This requires LLM generation. Placeholder: The concept of
  consciousness remains one of philosophy's deepest mysteries.

----------------------------------------
Final Synthesis:
----------------------------------------
This requires LLM generation. Placeholder: The synthesis of these
perspectives reveals that consciousness encompasses both subjective
experience and objective awareness.

Duration: 150ms | Sync pulses: 1

You> /stats

System Statistics:
----------------------------------------
Uptime: 25.3s
Agents: 2
Memory:
  L1 Cache: 45.2%
  L2 RAM: 38.1%
  L3 Storage: 8 slices
Cognitive Drift:
  Average: 0.125
  Max: 0.187
Sync pulses: 3

You> /memory

Memory Hierarchy:
----------------------------------------
L1 Cache (Active Attention Window):
  Utilization: 45.2%
  Tokens: 231/512
  Slices: 3
L2 RAM (Deep Context):
  Utilization: 38.1%
  Tokens: 780/2048
  Slices: 12
L3 Storage (Knowledge Base):
  Slices: 8
  Total size: 2048 bytes
Page Table:
  L1 entries: 3
  L2 entries: 12
  L3 entries: 8

You> /quit
Goodbye!
```
Note: This is a research prototype requiring local models for attention access.
```python
from agentos import AgentOS, create_agentos
from agentos.scheduler import ThreadPriority

# Create system
system = create_agentos()

# Spawn specialized agents
researcher = system.spawn_agent("Alice", "researcher", ThreadPriority.HIGH)
writer = system.spawn_agent("Bob", "writer", ThreadPriority.NORMAL)
analyst = system.spawn_agent("Charlie", "analyst", ThreadPriority.NORMAL)

# Collaborative task
result = system.collaborate("Analyze the differences between AI and human cognition")
for agent_id, contribution in result.agent_contributions.items():
    agent = system.get_agent(agent_id)
    print(f"{agent.config.name}: {contribution}")
```

Development checks:

```bash
pytest
ruff check src/
ruff format src/
mypy src/
pre-commit install
```

Project layout:

```
agentos/
├── src/agentos/
│   ├── kernel/                 # Reasoning Kernel with semantic slicing
│   ├── memory/
│   │   ├── slicing/            # Semantic Slicing (CID, boundaries)
│   │   └── tiers/              # L1/L2/L3 memory tiers
│   ├── scheduler/              # Cognitive Scheduler & RCB
│   ├── sync/                   # Multi-agent CSP & DSM
│   ├── io/                     # Interrupt handling & peripherals
│   ├── synthesis/              # Semantic synthesis for multi-agent output
│   ├── cli.py                  # CLI module with app() entry point
│   └── eval/                   # Metrics and visualization
│
├── examples/
│   ├── semantic_slicing_demo.py
│   ├── memory_hierarchy_demo.py
│   ├── scheduler_demo.py
│   ├── multi_agent_sync_demo.py
│   ├── metrics_demo.py
│   ├── integration_demo.py     # Full system
│   └── test_system.py          # Quick test script
│
├── docs/
│   ├── comparison.md           # AgentOS vs Traditional comparison
│   ├── reasoning-kernel.md     # Semantic slicing explained
│   ├── memory-hierarchy.md     # L1/L2/L3 memory management
│   ├── scheduler-io.md         # Cognitive scheduling
│   ├── multi-agent-sync.md     # Synchronization & CSP
│   ├── evaluation-metrics.md   # Metrics and measurement
│   └── integration.md          # Full system overview
│
├── tests/                      # Test files
├── ISSUES.md                   # Improvement roadmap (10 prioritized issues)
├── LICENSE                     # MIT License
├── pyproject.toml              # Project configuration
└── README.md                   # This file
```
- Does attention-based slicing actually work? - Validate paper's core claim
- What's the optimal ε threshold? - Paper leaves this as dynamic
- At what scale does CSP overhead > benefit? - Find "Cognitive Collapse Point"
- Can we achieve linear scalability? - Paper's claim about schema-based reasoning
| Metric | Symbol | Description |
|---|---|---|
| Cognitive Latency | Lκ | Time from interrupt to stable state |
| Contextual Utilization | η | Information-gain tokens / total tokens |
| Sync Stability | Σ | Probability of maintaining unified state |
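Contextual Utilization is the most directly computable of these metrics: the fraction of context tokens that actually carry information gain. A minimal sketch, assuming each slice is tagged with a token count and an information-gain flag (the function name and input shape are ours, not the repo's):

```python
def contextual_utilization(slices):
    """eta: information-gain tokens divided by total context tokens.

    `slices` is a list of (token_count, carries_info_gain) pairs --
    an illustrative input shape, not the repo's slice type.
    """
    total = sum(n for n, _ in slices)
    useful = sum(n for n, gain in slices if gain)
    return useful / total if total else 0.0

# 170 useful tokens out of 250 total
print(contextual_utilization([(120, True), (80, False), (50, True)]))  # 0.68
```

A high η means the L1 window is spent on slices that matter; a traditional flat context tends toward low η as stale history accumulates.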
- Paper - Architecting AgentOS
- MemGPT - LLMs as Operating Systems
- AIOS - LLM Agent Operating System
- FlashAttention - Fast attention
MIT License - see LICENSE for details.
Based on research by ChengYou Li, XiaoDong Liu, XiangBao Meng, and XinYu Zhao.