EvoCode Project

LLM Code Evaluation Framework with test-driven feedback loops.

Components

judge0-setup/ - Local Judge0 code execution sandbox (Docker-based)
evocode/ - LLM evaluation framework with Streamlit UI

Quick Start

Prerequisites

Docker and Docker Compose
Python 3.10+
An OpenAI-compatible LLM server (e.g., LM Studio on localhost:1234)

Setup

# 1. Clone the repo
git clone <repo-url>
cd sandbox

# 2. Create Python virtual environment
cd judge0-setup
python3 -m venv venv
./venv/bin/pip install requests
./venv/bin/pip install -r ../evocode/requirements.txt

# 3. Start Judge0 (pulls Docker images on first run)
./start.sh

# 4. Initialize EvoCode database
cd ../evocode
../judge0-setup/venv/bin/python scripts/init_db.py

# 5. Start your LLM server (e.g., LM Studio on localhost:1234)

# 6. Run EvoCode UI
../judge0-setup/venv/bin/streamlit run ui/app.py

Usage

Open http://localhost:8501 in your browser
Go to Settings and add your LLM model endpoint
Go to Run Evaluation to test LLMs on coding challenges
View results in Dashboard and Model Comparison

CLI Usage

cd evocode

# List challenges
../judge0-setup/venv/bin/python scripts/run_cli.py --list-challenges

# Run evaluation
../judge0-setup/venv/bin/python scripts/run_cli.py fizzbuzz

Managing Judge0

cd judge0-setup
./start.sh    # Start services
./stop.sh     # Stop services
./restart.sh  # Restart services
./status.sh   # Check status

How It Works

Select a coding challenge and LLM model
LLM generates code to solve the problem
Judge0 executes the code against test cases
If tests fail, feedback is provided and LLM tries again
Continues until all tests pass or max attempts reached
Results are stored and can be compared across models

Project Structure

sandbox/
├── judge0-setup/          # Judge0 Docker setup
│   ├── docker-compose.yml
│   ├── judge0.conf
│   ├── start.sh / stop.sh / restart.sh / status.sh
│   ├── verify_judge0.py
│   └── venv/              # Python virtual environment
├── evocode/               # Evaluation framework
│   ├── core/              # Core modules (llm, judge, evaluation)
│   ├── storage/           # SQLite database
│   ├── ui/                # Streamlit pages
│   ├── challenges/        # YAML challenge definitions
│   └── scripts/           # CLI tools
├── CLAUDE.md              # Instructions for Claude Code
└── README.md              # This file

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
evocode		evocode
judge0-setup		judge0-setup
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md
research.md		research.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EvoCode Project

Components

Quick Start

Prerequisites

Setup

Usage

CLI Usage

Managing Judge0

How It Works

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EvoCode Project

Components

Quick Start

Prerequisites

Setup

Usage

CLI Usage

Managing Judge0

How It Works

Project Structure

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages