Anjiang-Wei/CodeARC

CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis (COLM'25)


Quick Start

Setting Up the Environment

  1. Create and activate a Conda environment:

    conda create -y -n CodeARC python=3.10.12
    conda activate CodeARC
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set API keys for the model providers you plan to use:

    export OPENAI_API_KEY=<your_openai_api_key>
    export ANTHROPIC_API_KEY=<your_anthropic_api_key>
    export TOGETHER_API_KEY=<your_together_api_key>
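If runs fail with authentication errors, a quick stdlib check can confirm the environment is set up. This is a sketch, not part of the repository; the key names match the exports above:

```python
import os

# Keys the evaluation expects (per the exports above).
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TOGETHER_API_KEY"]

def missing_keys(env=os.environ):
    """Return the names of required API keys that are unset or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```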

Running Main Evaluation

python3 run.py --model_name openai/gpt-4o-mini --total_idx 20

We support OpenAI models (e.g., openai/gpt-4o), Anthropic models (e.g., anthropic/claude-3-7-sonnet-20250219), and models served by Together AI (e.g., meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo). For testing purposes, you can pass --total_idx 20 to limit evaluation to 20 problems instead of the full dataset (1114 problems). See run.py for additional configuration options.
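To evaluate several models in one go, a small driver script can shell out to run.py. This is a sketch under the assumption that run.py accepts exactly the flags shown above (its full CLI may offer more); the model identifiers are the examples from this section:

```python
import subprocess
import sys

# Example model identifiers from the providers listed above.
MODELS = [
    "openai/gpt-4o-mini",
    "anthropic/claude-3-7-sonnet-20250219",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
]

def build_command(model_name, total_idx=20):
    """Build the run.py invocation for one model."""
    return [
        sys.executable, "run.py",
        "--model_name", model_name,
        "--total_idx", str(total_idx),
    ]

for model in MODELS:
    cmd = build_command(model)
    # subprocess.run(cmd, check=True)  # uncomment to actually launch the runs
    print(" ".join(cmd))
```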

To summarize results:

python3 src/compute_metrics.py
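At its core, summarizing results means aggregating per-problem outcomes into a pass rate. A minimal sketch of that computation (the actual compute_metrics.py may report more detailed statistics):

```python
def pass_rate(results):
    """Fraction of problems solved, given per-problem booleans."""
    if not results:
        return 0.0
    return sum(results) / len(results)

# e.g. 3 of 4 problems solved
print(f"pass rate: {pass_rate([True, True, False, True]):.2%}")
```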

HuggingFace Dataset

The CodeARC datasets, CodeARC-Problems and CodeARC-Invocations, are hosted on HuggingFace.

Setting up HuggingFace Account

  1. Obtain an access token from your HuggingFace account settings (Settings → Access Tokens):

  2. Log in using the token:

    Option A: Use the command line:

    huggingface-cli login
    huggingface-cli whoami

    Option B: Add the token to the environment variable:

    export HF_TOKEN=<your_huggingface_token>
    
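Either option makes the token available to Python code. A stdlib-only sketch of how a script might resolve it, preferring the environment variable and falling back to the token cached by huggingface-cli login (the cache path below is an assumption matching recent versions of the CLI):

```python
import os
from pathlib import Path

def resolve_hf_token(env=os.environ):
    """Prefer HF_TOKEN from the environment, else fall back to the
    token cached by `huggingface-cli login` (assumed default path)."""
    token = env.get("HF_TOKEN")
    if token:
        return token
    cached = Path.home() / ".cache" / "huggingface" / "token"
    if cached.is_file():
        return cached.read_text().strip()
    return None
```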

Accessing Datasets via the HuggingFace datasets Library

You can directly load the datasets using the HuggingFace datasets library:

from datasets import load_dataset

# Define dataset paths
hf_problems_path = "anjiangwei/CodeARC-Problems"
hf_invocations_path = "anjiangwei/CodeARC-Invocations"

# Load datasets
problems_dataset = load_dataset(hf_problems_path)
invocations_dataset = load_dataset(hf_invocations_path)

# Example: Access the first training sample
print(problems_dataset["train"][0])
print(invocations_dataset["train"][0])
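Each problem typically comes with multiple input-output invocations, so grouping invocations by a shared problem identifier is a common first step. A toy sketch with illustrative records (the field names here are assumptions — inspect `invocations_dataset["train"].features` for the actual schema):

```python
from collections import defaultdict

# Toy records standing in for dataset rows; real field names may
# differ -- check the dataset's .features for the actual schema.
invocations = [
    {"problem_id": 0, "input": "[1, 2]", "output": "3"},
    {"problem_id": 0, "input": "[5, 5]", "output": "10"},
    {"problem_id": 1, "input": "'abc'", "output": "'cba'"},
]

def group_by_problem(rows, key="problem_id"):
    """Group invocation rows by their problem identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

grouped = group_by_problem(invocations)
print({pid: len(rows) for pid, rows in grouped.items()})  # {0: 2, 1: 1}
```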

Citation

If you find our work useful, please cite our paper:

@inproceedings{wei2025codearc,
  title={Code{ARC}: Benchmarking Reasoning Capabilities of {LLM} Agents for Inductive Program Synthesis},
  author={Anjiang Wei and Tarun Suresh and Jiannan Cao and Naveen Kannan and Yuheng Wu and Kai Yan and Thiago S. F. X. Teixeira and Ke Wang and Alex Aiken},
  booktitle={Second Conference on Language Modeling},
  year={2025},
  url={https://openreview.net/forum?id=Q5pVZCrrKr}
}

License

This project is licensed under the Apache 2.0 License.
