This repo contains two end-user tools for running repo-specific autoresearch with Codex:
- skills/setup-autoresearch/SKILL.md: a skill that inspects a target repo, asks only high-risk follow-up questions, generates a repo-specific program.md, and scaffolds a minimal autoresearch harness when the repo lacks one
- codex-autoresearch.sh: a shell supervisor that keeps Codex working on the same autoresearch task unattended
The design follows the original karpathy/autoresearch idea: the human defines the research operating rules in program.md, and the agent iterates within those rules.
The shell supervisor in this repo was also inspired by congwa/codex-autoresearch, which provided the original shell-script direction for wrapping long-running Codex sessions.
Use setup-autoresearch when you are in a target repository and need to create the first program.md, and possibly the minimal harness scripts and TSV ledger that autoresearch needs.
Use codex-autoresearch.sh either to start a new autoresearch session from a prompt, or to keep resuming an existing session with minimal supervision.
You can install the skill directly from this GitHub repository with npx skills:
npx skills add benyue1978/codex-autoresearch --skill setup-autoresearch
For Codex specifically, install it to the Codex skill directory:
npx skills add benyue1978/codex-autoresearch --skill setup-autoresearch -a codex
The skill is meant to be available to Codex in the target repository. Once available, ask Codex to use setup-autoresearch.
Example request:
Use the setup-autoresearch skill to inspect this repository and generate program.md.
What the skill is expected to do:
- inspect the current repo and infer as much as possible
- identify the setup, baseline command, authoritative measure, editable surface, and verification gates
- ask follow-up questions only when a wrong assumption would be costly
- decide whether the repo already has a usable autoresearch harness or needs a minimal scaffold
- after confirmation, generate program.md and only the missing harness pieces, such as thin scripts and a TSV ledger
- stop and ask whether you want to start immediately or use the shell runner
The generated program.md should make the keep/discard rule explicit:
- create a candidate git commit before the authoritative experiment runs
- record every attempted experiment in the result ledger, including keep, discard, and crash
- if the measure improves, keep the experiment commit as the new baseline
- if the measure stays the same with simpler logic, keep the experiment commit as the new baseline
- otherwise discard it by resetting git back to the previously kept baseline
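The keep/discard rule above can be sketched as a small decision helper. This is a minimal illustration, not the contents of any generated program.md: the function names `decide` and `record`, the three-column ledger layout, and the assumption that a lower measure is better are all hypothetical, as is mapping a non-zero experiment exit code to `crash`.

```shell
# Hypothetical sketch of the keep/discard rule.
# Assumption: the measure is numeric and lower is better.

# decide BASELINE CANDIDATE SIMPLER EXIT_CODE -> prints keep, discard, or crash
decide() {
  local baseline=$1 candidate=$2 simpler=$3 exit_code=$4
  if [ "$exit_code" -ne 0 ]; then
    echo crash                                  # the experiment itself failed
    return
  fi
  awk -v b="$baseline" -v c="$candidate" -v s="$simpler" 'BEGIN {
    if (c < b) print "keep"                     # measure improved
    else if (c == b && s == 1) print "keep"     # same measure, simpler logic
    else print "discard"
  }'
}

# record LEDGER COMMIT MEASURE STATUS -> appends one tab-separated row
record() {
  printf '%s\t%s\t%s\n' "$2" "$3" "$4" >> "$1"
}
```

In a real loop, a `discard` or `crash` result would be followed by resetting the working tree to the previously kept commit, for example with `git reset --hard` on that commit, while a `keep` result makes the candidate commit the new baseline.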
When the skill finishes, one valid next prompt is:
read program.md and begin autoresearch
codex-autoresearch.sh is a long-running supervisor around codex exec.
It does four important things:
- starts with codex exec when given a prompt source, then continues with codex exec resume
- stores the latest Codex message and session metadata in a state directory
- prints the last few lines from Codex after each unfinished attempt so you can monitor progress live
- stops only when Codex returns the expected completion token and confirmation line
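The stop condition and the live preview from the list above could look roughly like this. The token and confirmation strings are placeholders invented for illustration (the real values are defined by the supervisor script), and `is_finished` and `preview` are hypothetical helper names.

```shell
# Placeholder markers; the actual token and confirmation line are set by the supervisor.
DONE_TOKEN='AUTORESEARCH_DONE'
CONFIRM_LINE='CONFIRMED: all work complete'

# is_finished FILE -> succeeds only when the token appears and the
# confirmation line is present as a whole line in the last message
is_finished() {
  grep -qF -- "$DONE_TOKEN" "$1" && grep -qxF -- "$CONFIRM_LINE" "$1"
}

# preview FILE [LINES] -> prints the trailing lines of the last message (default 3)
preview() {
  tail -n "${2:-3}" "$1"
}
```

Requiring both markers, rather than the token alone, makes it less likely that a message which merely mentions the token ends the run early.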
If the supervisor is terminated externally with signals like SIGTERM, SIGINT, or SIGHUP, it now logs the signal explicitly before exiting instead of leaving you with only a generic Terminated.
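One common way to get this behavior in a shell script is a `trap` per signal. The sketch below is an assumption about the general technique, not the supervisor's actual code; the function names and the log wording are hypothetical.

```shell
# log_signal NAME -> writes an explicit line to stderr (wording is hypothetical)
log_signal() {
  echo "supervisor: received SIG$1, exiting" >&2
}

# install_signal_traps -> log the signal, then exit with the conventional 128+N status
install_signal_traps() {
  trap 'log_signal TERM; exit 143' TERM   # 128 + 15
  trap 'log_signal INT; exit 130' INT     # 128 + 2
  trap 'log_signal HUP; exit 129' HUP     # 128 + 1
}
```

A script would call `install_signal_traps` once near the top, before entering its main loop.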
If Codex exits because a usage limit is exhausted, the supervisor now detects that from recent Codex stderr, backs off instead of blindly retrying, and sleeps until the parsed reset time when one is available. If no reset time can be extracted, it falls back to a longer retry delay.
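The backoff decision can be illustrated with a small helper. This sketch assumes the reset time has already been parsed into Unix epoch seconds, which is a simplification: the format Codex actually reports is not stated here. The name `backoff_seconds` and the fallback value used in the example are hypothetical.

```shell
# backoff_seconds NOW RESET FALLBACK -> prints how many seconds to sleep
# NOW and RESET are Unix epoch seconds; RESET may be empty when nothing was parsed.
backoff_seconds() {
  local now=$1 reset=$2 fallback=$3
  case $reset in
    ''|*[!0-9]*)            # missing or non-numeric reset time
      echo "$fallback"
      return ;;
  esac
  if [ "$reset" -gt "$now" ]; then
    echo $(( reset - now )) # sleep until the limit resets
  else
    echo "$fallback"        # reset time already passed; use the longer delay
  fi
}
```

For example, `sleep "$(backoff_seconds "$(date +%s)" "$reset" 900)"` would wait until the reset time, or 900 seconds when no usable reset time was found.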
To start a new autoresearch session, pass a prompt source. For example:
WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh ./prompt.md
If you prefer stdin:
printf 'read program.md and begin autoresearch\n' | WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh -
Typical prompt content:
read program.md and begin autoresearch
If you already know the Codex session id:
WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh --session-id 11111111-2222-3333-4444-555555555555
If you want the wrapper to target the last Codex session in the working directory, say so explicitly:
WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh --last
Resume target selection is explicit. For resume mode, provide either --session-id or --last.
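To make the explicit target selection concrete, here is a hedged sketch of how such argument validation might be written. It is not the script's real parser: `select_target`, its output format, and its error messages are hypothetical, and it only handles a prompt source, --session-id, and --last.

```shell
# select_target ARGS... -> prints the chosen target, or fails on a conflicting combination
select_target() {
  local prompt='' session_id='' use_last=0
  while [ $# -gt 0 ]; do
    case $1 in
      --session-id)
        [ $# -ge 2 ] || { echo "error: --session-id needs a value" >&2; return 2; }
        session_id=$2; shift 2 ;;
      --last) use_last=1; shift ;;
      *) prompt=$1; shift ;;            # a file path, or - for stdin
    esac
  done
  if [ -n "$session_id" ] && [ "$use_last" -eq 1 ]; then
    echo "error: use either --session-id or --last, not both" >&2; return 2
  fi
  if [ -n "$prompt" ] && { [ -n "$session_id" ] || [ "$use_last" -eq 1 ]; }; then
    echo "error: a prompt source cannot be combined with a resume flag" >&2; return 2
  fi
  if [ -n "$session_id" ]; then echo "resume-id:$session_id"
  elif [ "$use_last" -eq 1 ]; then echo "resume-last"
  elif [ -n "$prompt" ]; then echo "prompt:$prompt"
  else echo "error: provide a prompt source, --session-id, or --last" >&2; return 2
  fi
}
```

Rejecting ambiguous combinations up front means a stale session id can never silently override a prompt the user just passed.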
If you want native Codex --full-auto behavior, use:
WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh --full-auto ./prompt.md
If you want Codex to run without human approval prompts, use:
WORKDIR=/path/to/target-repo bash /path/to/codex-autoresearch.sh --full-permission ./prompt.md
This passes --dangerously-bypass-approvals-and-sandbox to Codex. Use it only when you are comfortable giving Codex unrestricted execution for that run.
--full-auto and --full-permission are mutually exclusive.
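A mutual-exclusion check of this kind might be sketched as follows. The helper name `pick_mode` is hypothetical, and the sketch deliberately ignores any environment-variable fallbacks: it prints normal when neither flag is given, which may differ from the script's real default.

```shell
# pick_mode ARGS... -> prints normal, full-auto, or full-permission
# Fails when both mode flags appear on the same command line.
pick_mode() {
  local mode=normal arg
  for arg in "$@"; do
    case $arg in
      --full-auto|--full-permission)
        if [ "$mode" != normal ] && [ "$mode" != "${arg#--}" ]; then
          echo "error: --full-auto and --full-permission are mutually exclusive" >&2
          return 2
        fi
        mode=${arg#--} ;;               # strip the leading dashes
    esac
  done
  echo "$mode"
}
```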
- WORKDIR: target repository where Codex should run
- STATE_DIR: directory for saved session state, logs, prompts, and last-message snapshots
- INTERVAL: sleep time between unfinished attempts, default 3
- MODEL: optional Codex model override
- PROFILE: optional Codex profile
- MONITOR_LINES: how many trailing lines from the latest Codex message to print after each unfinished attempt, default 3
- EXECUTION_MODE: optional fallback mode when no CLI flag is provided, one of normal, full-auto, or full-permission
- USE_FULL_AUTO: whether to pass --full-auto by default, 1 unless overridden
- DANGEROUSLY_BYPASS: legacy fallback for full-permission mode
- SKIP_GIT_REPO_CHECK: whether to pass --skip-git-repo-check
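Defaults like the ones listed above are typically applied with shell parameter expansion. The following is a minimal sketch of that idiom under the stated defaults, not an excerpt from the script; `apply_defaults` is a hypothetical name.

```shell
# apply_defaults -> assigns a default only when the variable is unset or empty
apply_defaults() {
  : "${INTERVAL:=3}"        # sleep time between unfinished attempts
  : "${MONITOR_LINES:=3}"   # trailing lines printed after each unfinished attempt
  : "${USE_FULL_AUTO:=1}"   # pass --full-auto unless overridden
}
```

With this pattern, a value supplied on the command line, such as `INTERVAL=10` placed before the `bash` invocation alongside `WORKDIR`, takes precedence over the default.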
The shell writes state under STATE_DIR or a temporary directory if none is provided.
Useful files include:
- events.jsonl: raw Codex JSON output
- runner.log: stderr from Codex invocations
- last-message.txt: latest assistant message
- attempt-####.last.txt: per-attempt snapshots of the last message
- session-id.txt: captured Codex session id
- initial-prompt.txt and resume-prompt.txt: exact prompts sent by the supervisor
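When checking on a run from another terminal, a small helper can summarize the state directory. This is an illustrative sketch that relies only on the file names listed above; `show_state` is a hypothetical helper and is not part of this repo.

```shell
# show_state STATE_DIR [LINES] -> prints the captured session id and the
# trailing lines of the latest assistant message (default 3 lines)
show_state() {
  local dir=$1 lines=${2:-3}
  if [ -f "$dir/session-id.txt" ]; then
    echo "session: $(cat "$dir/session-id.txt")"
  fi
  if [ -f "$dir/last-message.txt" ]; then
    tail -n "$lines" "$dir/last-message.txt"
  fi
}
```

The printed session id is the value you would pass to --session-id when resuming later.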
- open the target repository in Codex
- use setup-autoresearch to generate program.md
- review the inferred setup and confirmation question
- start the autoresearch task with codex-autoresearch.sh ./prompt.md
- if needed later, resume it with codex-autoresearch.sh --session-id <uuid> or codex-autoresearch.sh --last
- monitor the live 3-line Codex preview while the run continues
- if Codex reports a usage limit, the supervisor should back off using the reset time from events.jsonl when available, or a longer fallback delay otherwise
This repo currently includes shell tests for the setup skill contract and the supervisor behavior:
bash tests/run-all.sh
The runner currently executes:
bash tests/setup-autoresearch-template.sh
bash tests/full-auto-flag.sh
bash tests/last-flag-resume.sh
bash tests/prompt-and-resume-are-mutually-exclusive.sh
bash tests/resume-flags-are-mutually-exclusive.sh
bash tests/prompt-mode-ignores-stale-session-id.sh
bash tests/stale-state-does-not-force-resume.sh
bash tests/rate-limit-reset-backoff.sh
bash tests/rate-limit-fallback-backoff.sh
bash tests/monitor-and-full-permission.sh
bash tests/resume-with-session-id.sh
bash tests/termination-signal-is-explicit.sh
bash tests/readme.sh
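For readers adding their own checks, a runner of this shape can be sketched in a few lines. Note the assumptions: the actual tests/run-all.sh lists its scripts explicitly as shown above, whereas this hypothetical `run_all` function discovers them with a glob and reports PASS or FAIL per script.

```shell
# run_all DIR -> runs every *.sh in DIR except run-all.sh itself;
# returns non-zero if any script fails
run_all() {
  local dir=$1 failed=0 t
  for t in "$dir"/*.sh; do
    [ -e "$t" ] || continue                           # no matches: skip the literal glob
    [ "$(basename "$t")" = run-all.sh ] && continue   # do not recurse into the runner
    if bash "$t"; then
      echo "PASS $t"
    else
      echo "FAIL $t"
      failed=1
    fi
  done
  return "$failed"
}
```

Running every script and reporting at the end, rather than stopping at the first failure, shows all broken contracts in a single pass.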