Feynman: The AI Research Agent That Does in Seconds What Takes PhDs Hours

Best for: Researchers, academics, developers, and anyone who reads papers regularly and wants AI to handle the grunt work of searching, cross referencing, and verifying sources.

Not ideal for: Non-technical users who aren’t comfortable with a terminal. Feynman is a CLI tool, not a web app.

Feynman is an open source AI research agent that does something no other tool in 2026 is doing. While every AI tool wants to help you write faster, build faster, ship faster, almost none of them want to help you think better.

(Note: this is not Opennote’s Feynman-3 or MIT’s AI Feynman symbolic regression project. This is a completely different tool by Companion AI, built for a completely different purpose.)

Feynman runs in your terminal, and it works differently from everything else in the agent space right now. You give it a topic. It searches academic papers, synthesizes findings across multiple sources, verifies every claim against real citations, and hands you a structured research brief with working links to everything it referenced.

Not a chatbot. Not a summary tool. A multi-agent research system that dispatches four specialized agents in parallel, each handling a different part of the research process. The kind of work that takes a graduate student an afternoon takes Feynman about 30 seconds.

The project hit 2,300+ GitHub stars within days of launching. The announcement tweet got 1,708 likes and 2,768 bookmarks from an account with 1,400 followers. The bookmark count tells the real story: people aren’t just liking it. They’re saving it to come back to. And right now, nobody else has covered it.


What Feynman Actually Does

The core idea is simple: you type a research question into your terminal and get back a cited, structured answer built from real sources. But the way it gets there is where it gets interesting.

Feynman runs four subagents automatically for every query:

Researcher pulls evidence from academic papers, the open web, GitHub repositories, and documentation. It’s not searching Google and summarizing the first page of results. It’s querying alphaXiv (an academic paper search engine) and cross referencing what it finds across multiple source types. The agent follows strict integrity constraints: it never fabricates a source, never claims a project exists without checking, and requires a verifiable URL for every citation.

Reviewer runs a simulated peer review on the findings. It grades feedback by severity and flags weak claims, missing context, or contradictions between sources. This is the part most AI tools skip entirely. They give you an answer. Feynman gives you an answer and then tells you what’s wrong with it.

Writer takes the research notes and produces structured, paper style output. Literature reviews, research briefs, summaries with clear sections for consensus, disagreements, and open questions.

Verifier checks every citation in the output. Dead links get killed. Claims that don’t match their cited source get flagged. This is the difference between “AI research” that hallucinates citations (which is most of them) and research you can actually trust enough to use.
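To make the Verifier's job concrete, here is a minimal sketch of that kind of citation check. This is an illustration, not Feynman's actual implementation: the function name, the verdict labels, and the substring matching are all assumptions, and a real verifier would use semantic comparison rather than simple text containment. The `fetch` callable is injected so the logic can be tested without network access.

```python
# Hypothetical sketch of a citation-verification step, in the spirit of
# the Verifier agent described above (not Feynman's real code).

def verify_citation(claim: str, url: str, fetch) -> dict:
    """Return a verdict for one citation.

    `fetch(url)` should return the source's text, or raise on a dead link.
    """
    try:
        source_text = fetch(url)
    except Exception:
        # The link did not resolve at all.
        return {"url": url, "status": "dead-link"}
    # Loose containment stands in for real claim-to-source matching.
    if claim.lower() in source_text.lower():
        return {"url": url, "status": "verified"}
    # The link works but does not support the claim.
    return {"url": url, "status": "claim-mismatch"}
```

The three verdicts map onto the behavior described above: dead links get killed, mismatched claims get flagged, and only verified citations survive.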

The system is built on Pi for the agent runtime and alphaXiv for paper search and analysis. Every output is source grounded. Claims link to papers, docs, or repos with direct URLs.
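The overall flow of the four subagents can be sketched as a simple pipeline. Again, this is a conceptual illustration under stated assumptions, not Feynman's internals: the real agents are LLM-driven and run partly in parallel, while this sketch chains plain functions.

```python
# Conceptual sketch of the four-stage pipeline described above.
# Each stage is passed in as a callable so the shape of the flow is clear.

def run_pipeline(query, researcher, reviewer, writer, verifier):
    findings = researcher(query)      # gather source-grounded evidence
    review = reviewer(findings)       # flag weak or contradictory claims
    draft = writer(findings, review)  # structure the brief
    return verifier(draft)            # check every citation before output
```

The point of the shape is that no stage is optional: nothing reaches the reader without passing through review and verification.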


Feynman’s Research Commands

You can talk to Feynman in plain English or use specific commands:

feynman "what do we know about scaling laws" gives you a cited research brief.

feynman deepresearch "mechanistic interpretability" triggers the full multi-agent investigation with parallel researchers, synthesis, and verification. This is the heavy mode.

feynman lit "RLHF alternatives" produces a literature review with consensus views, disagreements between researchers, and open questions nobody has answered yet.

feynman audit 2401.12345 takes an arXiv paper ID, pulls the paper’s claims, then compares them against the actual public codebase. This is genuinely unique. Most people reading papers have no way to quickly check whether the code matches what the paper says it does. Feynman automates that entire process.
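Since the audit command takes a raw arXiv ID, it can be worth sanity-checking the ID format before running it. Post-2007 arXiv identifiers follow the YYMM.NNNNN pattern (four digits, a dot, four or five digits, optional version suffix). This small check is my own illustration, not part of Feynman:

```python
import re

# Modern (post-2007) arXiv identifiers: YYMM.NNNNN with an optional
# version suffix, e.g. 2401.12345 or 2401.12345v2.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")

def is_arxiv_id(s: str) -> bool:
    """True if `s` looks like a modern arXiv identifier."""
    return bool(ARXIV_ID.match(s))
```

Older pre-2007 IDs (like `hep-th/9901001`) use a different scheme and would need a separate pattern.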

feynman replicate "chain-of-thought improves math" goes even further: it attempts to replicate experiments on local or cloud GPUs via Modal or RunPod.

Each of these would be a meaningful tool on its own. Feynman bundles all of them into one CLI with a single install command.


Feynman’s Hidden Feature Most People Miss

There’s one more command worth knowing about that doesn’t show up in the launch tweet.

feynman autoresearch "transformer efficiency" starts an autonomous research loop. You give it a topic, it runs continuous investigation cycles without you babysitting it. Each cycle refines the previous findings, adds new sources, and deepens the analysis. Walk away, come back, and the research brief has evolved.
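The loop structure is easy to picture. This is a hedged sketch of what an autonomous refinement cycle looks like in general, with illustrative function names; it is not Feynman's internal logic, and the real tool decides for itself when to keep going:

```python
# Illustrative shape of an autonomous research loop: an initial pass,
# then repeated refinement cycles that each build on the last result.

def autoresearch(topic, investigate, refine, cycles=3):
    brief = investigate(topic)       # initial investigation
    for _ in range(cycles - 1):
        brief = refine(brief)        # each cycle deepens the previous findings
    return brief
```

The key property is that each cycle's input is the previous cycle's output, so the brief accumulates depth instead of starting over.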

This is the feature that turns Feynman from a tool you use once into a tool that works while you’re not looking. It’s the same pattern that made OpenClaw and Hermes Agent compelling: agents that operate independently between sessions. Feynman applies it to research instead of task automation.


How to Install Feynman

One command:

curl -fsSL https://feynman.is/install | bash

Requires Node.js 20.19 or newer. The installer handles everything else. Full documentation is at feynman.is.
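Before running the installer, you can confirm your Node.js version meets the 20.19 floor by checking the output of `node --version`. The parsing helper below is my own convenience sketch, not part of the installer:

```python
import re
import subprocess

# Minimum Node.js version Feynman's installer requires.
MIN_NODE = (20, 19)

def node_ok(version_output: str) -> bool:
    """Check a `node --version` string like 'v20.19.0' against MIN_NODE."""
    m = re.match(r"v?(\d+)\.(\d+)", version_output.strip())
    return bool(m) and (int(m.group(1)), int(m.group(2))) >= MIN_NODE

# e.g. node_ok(subprocess.run(["node", "--version"],
#              capture_output=True, text=True).stdout)
```

Comparing major/minor as a tuple gets the ordering right: v20.18 fails, v20.19 and anything newer passes.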

You’ll need an API key from an LLM provider (the agents need a brain to reason with) and optionally an alphaXiv account for deeper paper search capabilities. Multiple providers are supported including Anthropic, OpenAI, Gemini, and Perplexity for web search.

If you don’t want the full terminal app and just want the research skills, you can install them directly into Claude Code or Codex:

# For Claude Code
npx feynman install --target claude-code

# For Codex
npx feynman install --target codex

That drops the skill library into your agent’s skills directory. Now your existing coding agent can also do research. The skills are just markdown files following the same pattern as OpenClaw skills and the broader agent skills ecosystem.


Who Built Feynman

Feynman comes from Companion AI. The project is MIT licensed with 87 commits and 2,300+ stars as of this writing. It’s actively shipping with a full documentation site at feynman.is and a changelog that shows consistent development momentum.

The team also built Pi, the underlying agent runtime that powers Feynman. This means the agent framework and the research tool were designed together, not bolted onto each other as an afterthought. That integration shows in how cleanly the four subagents coordinate. There’s no janky handoff between steps. The Researcher findings flow directly into the Reviewer’s analysis which feeds the Writer’s output which gets checked by the Verifier. One pipeline, four agents, zero manual intervention.


How Feynman Compares to Other Research Tools

| Feature | Feynman | ChatGPT | Perplexity | Elicit |
| --- | --- | --- | --- | --- |
| Citation verification | Automated (Verifier agent) | None (hallucinates) | Links but no verification | Paper-level only |
| Multi-agent research | 4 specialized agents | Single model | Single model | Single model |
| Paper auditing (code vs claims) | Yes | No | No | No |
| Self-hosted / open source | Yes (MIT license) | No | No | No |
| Cost | Free + API costs | $20/mo | $20/mo | $10/mo |
| Autonomous research loops | Yes (autoresearch) | No | No | No |
| Best for | Deep academic research with verification | General questions | Sourced web search | Paper discovery |

The obvious question: why not just use ChatGPT, Perplexity, or Elicit?

ChatGPT is fast and conversational but it hallucinates citations constantly. Ask it for a research brief and you’ll get confident references to papers that don’t exist. There’s no verification layer. You have to check every source yourself, which means you’re doing the research anyway.

Perplexity is better at sourcing. It shows you where its answers come from and links to real pages. But it’s a search tool, not a research tool. It answers questions. It doesn’t synthesize findings across multiple papers, identify disagreements between researchers, or flag weak methodology. It gives you information. It doesn’t give you analysis.

Elicit is the closest competitor. It’s built for academic research, searches real papers, and extracts structured data from studies. But it’s a web app with a subscription model, not an open source CLI you can run locally. You also can’t extend it with custom skills, audit papers against their codebases, or plug it into your existing Claude Code workflow.

Feynman sits in a different category because of the four agent architecture. The Researcher finds sources. The Reviewer critiques them. The Writer structures the output. The Verifier checks every citation. No other tool runs all four steps automatically. Most tools stop at step one and call it done.

The paper audit command (feynman audit) has no equivalent anywhere. The gap between what researchers claim in papers and what their code actually does is a well documented problem in machine learning. Feynman is the first tool that automates checking for it.


What Happens When You Actually Use Feynman

The install takes about two minutes. Run the one liner, set your API key, and you’re in a REPL that accepts natural language queries.

A basic query like feynman "what do we know about scaling laws" returns a structured brief in about 30 seconds. The output includes a summary, key findings organized by theme, source URLs for every claim, and a confidence assessment on each finding. The citations are real. We spot checked a dozen and every link resolved to an actual paper or document.

The deep research mode takes longer (a few minutes depending on the topic) but produces something closer to a proper literature review. Multiple researchers run in parallel, each tackling a different angle, then the findings get synthesized into a single document with consensus views, disagreements, and open questions clearly separated.

Where it gets interesting is the autoresearch command, the autonomous loop described earlier. Each cycle refines the previous findings, adds new sources, and deepens the analysis, so the research keeps evolving while you’re away. It’s the same pattern that makes OpenClaw and Hermes Agent compelling: agents that keep working between sessions.

Where it struggles: niche topics with limited published research. If there aren’t many papers or web sources on your query, Feynman doesn’t have much to work with and the output gets thin. It’s also CLI only, which means no visual interface, no drag and drop, no collaboration features. This is a solo researcher’s tool, not a team platform.


Why Feynman Matters Right Now

The AI agent space has been dominated by tools that do things: send emails, manage calendars, write code, automate workflows. OpenClaw does things. Claude Cowork does things. Hermes Agent does things and remembers what it learned.

Feynman is the first serious open source agent built specifically for understanding things.

That distinction matters because the biggest problem with AI research tools right now isn’t capability. It’s trust. ChatGPT will confidently cite papers that don’t exist. Perplexity will summarize articles it didn’t fully read. Every AI search tool has the same fundamental weakness: you can’t verify the output without doing the research yourself, which defeats the entire point.

Feynman’s architecture addresses this directly. The Verifier agent exists for one reason: to catch the lies before they reach you. The paper audit command exists because the gap between what researchers claim in papers and what their code actually does is a known, widespread problem that nobody had automated a solution for.


Feynman: The Honest Take

Feynman is still early. The community is growing fast (2,300+ stars in the first week is serious traction) but it’s not battle tested at the scale of something like OpenClaw or Hermes Agent.

What it is: the most interesting research agent architecture released this year. The four agent pipeline (search, review, write, verify) is how research should work when AI is involved. The paper audit against actual codebases is a feature that should have existed years ago. And the fact that it installs as skills into Claude Code or Codex means it doesn’t have to replace your existing workflow. It just makes it smarter.

The 2,768 bookmarks on a 1,400 follower account tell you what the market thinks. People want this. They’ve been waiting for an AI research tool they can actually trust. The star growth in the first week suggests they’re not just bookmarking. They’re installing.

The bet Feynman is making is that research agents need a verification layer baked into the architecture, not bolted on after the fact. Nobody else is making that bet yet. And that’s usually the bet worth paying attention to.


Feynman AI FAQ: Installation, Models and Research Commands

What is Feynman AI?

Feynman is an open source AI research agent built by Companion AI. It runs in your terminal and uses four specialized subagents to search academic papers, run simulated peer review, write structured research briefs, and verify every citation against its actual source. It is MIT licensed and free to use with your own API key from Anthropic, OpenAI, or Google.

Is Feynman the same as Opennote’s Feynman-3?

No. There are at least five different AI products using the name Feynman. This is the open source CLI research agent by Companion AI (getcompanion-ai on GitHub), not Opennote’s Feynman-3, MIT’s AI Feynman symbolic regression project, or any other product sharing the name.

Does Feynman require coding experience?

You need basic comfort with a terminal since Feynman is a command line tool. The install is a single command (curl -fsSL https://feynman.is/install | bash) and queries use natural language, but there is no web interface or graphical UI. If you can open a terminal and paste a command, you can use Feynman.

What AI models does Feynman use?

Feynman supports multiple LLM providers including Anthropic (Claude), OpenAI, Google (Gemini), and Perplexity for web search. You supply your own API key. The four subagents (Researcher, Reviewer, Writer, Verifier) all use whichever model you configure, so you can choose based on cost and quality preferences.

Can I use Feynman with Claude Code?

Yes. You can install Feynman’s research skills directly into Claude Code with npx feynman install --target claude-code. This gives your existing coding agent research capabilities using the same SKILL.md pattern as other Claude Code skills, without installing the full Feynman terminal app.