A small, reinforcement learning project: train PPO on CartPole, evaluate with baselines + confidence intervals, and keep runs reproducible (CI included).
- Demo
- Results (example)
- Skills / signals
- Tech stack
- Quickstart
- Reproducibility
- Repo structure
- Future plans
Generated via multi-seed evaluation (20 episodes/seed × 3 seeds) and exported to CSV/Markdown.
| Policy | Mean Return | Std | 95% CI | Success Rate |
|---|---|---|---|---|
| Baseline: random | 22.95 | 11.29 | [20.01, 25.89] | 0.00% |
| Baseline: do_nothing | 9.33 | 0.79 | [9.13, 9.54] | 0.00% |
| Trained PPO | 500.00 | 0.00 | [500.00, 500.00] | 100.00% |
Artifacts: outputs/comparisons/results.md, outputs/comparisons/results.csv
- RL training loop + evaluation discipline (baselines, multi-seed, confidence intervals)
- Reproducibility (configs saved per run, deterministic seeds)
- ML/SWE hygiene (CI, tests, formatting, artifacts)
- Gymnasium (classic-control), Stable-Baselines3 (PPO), PyTorch
- Experiment artifacts: JSON metrics + TensorBoard
- Evaluation: baselines + multi-seed rollouts + 95% confidence intervals
- Python 3.8+
- FFmpeg (required only for video/GIF recording via
moviepy/imageio[ffmpeg])
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
# Install (matches CI)
python -m pip install --upgrade pip
pip install -e ".[dev]"
# 1) Train (writes outputs/runs/<timestamp>_<env>_seed<seed>_<git-hash>/)
python -m src.train --config configs/default.yaml
# 2) Compare vs baselines (writes outputs/comparisons/results.*)
python -m src.compare --model-path outputs/runs/<run-dir>/model.zip --episodes 20 --num-seeds 3 --seed 123
# 3) View training curves in TensorBoard
tensorboard --logdir outputs/runs/<run-dir>/tensorboard
# 4) Record a short demo GIF (writes videos/ and/or assets/ depending on your config)
python -m src.record_video --model-path outputs/runs/<run-dir>/model.zip --gifOptional: Multi-seed evaluation with saved results
python -m src.evaluate --model-path outputs/runs/<run-dir>/model.zip --episodes 20 --num-seeds 5 --save-results- Re-run an existing experiment from its saved config:
python -m src.reproduce outputs/runs/<run-dir>- Validate determinism and core behaviors via tests:
pytest -qsrc/- core scripts:train,evaluate,compare,reproduce,record_video, plus shared utilities incommonconfigs/- YAML configs (default.yaml,quick_test.yaml)outputs/runs/- per-run artifacts (config, model, metrics, TensorBoard logs)outputs/comparisons/- exported results tables (CSV/Markdown)assets/,videos/- demo media
More details (tracking, evaluation, reproducibility)
- Per-run directories under
outputs/runs/with savedconfig.json, metrics, and TensorBoard logs. - Multi-seed evaluation and baseline comparisons with 95% confidence intervals.
- Reproduce any run from its saved config via
python -m src.reproduce outputs/runs/<run-dir>.
Developer Guide (optional)
pip install -e ".[dev]"
rl-train --config configs/default.yaml
rl-evaluate --model-path outputs/runs/<run-dir>/model.zip
rl-compare --model-path outputs/runs/<run-dir>/model.zip
rl-reproduce outputs/runs/<run-dir>
rl-video --model-path outputs/runs/<run-dir>/model.zip# Quality
make lint
make format
make test
# Training / evaluation
make train-quick
make evaluate
make compare- CI runs lint + tests on Python 3.8–3.10 across Linux/Windows/macOS.
- Outputs are written under
outputs/(runs, TensorBoard logs, and comparison exports).
- Add a second environment (
Acrobot-v1), compare training curves - Add hyperparameter sweep with results table
