
AutoResearch Music

Autonomous pretraining research on Irish traditional music using the IrishMAN dataset (216,284 tunes in ABC notation). This project adapts Karpathy's autoresearch framework to train a GPT-style language model on musical notation, letting an AI agent autonomously experiment with model architectures, hyperparameters, and training strategies to minimize validation bits-per-byte. Once training is complete, a generation script samples new Irish tunes from the trained model and renders them as playable sheet music in the browser.

What This Project Does

The pipeline works in three stages. First, it downloads the IrishMAN dataset from Hugging Face and trains a BPE tokenizer specialized for ABC notation. Second, it runs an autonomous experiment loop where Claude Code iteratively modifies the model architecture and hyperparameters, trains for exactly 5 minutes per experiment, evaluates the result, and keeps or discards the change based on whether validation loss improved. Third, a generation script loads the best model checkpoint, samples new ABC notation tunes, and produces an interactive HTML page with rendered sheet music you can view and listen to in any web browser.

The IrishMAN dataset contains Irish tunes transcribed in ABC notation, an ASCII-based musical notation system. Each tune includes structural control codes (section count, bar count, edit distance between sections) that the model learns to condition on. All tunes are in the public domain; the dataset, collected from thesession.org and abcnotation.com, is released under the MIT license.

Project Structure

prepare.py           Data download, tokenizer training, dataloading, evaluation (do not modify)
train.py             Model architecture, optimizer, training loop (agent modifies this)
generate.py          Sample ABC tunes from a trained checkpoint and render sheet music
program.md           Agent instructions for the autonomous experiment loop
run_autoresearch.sh  Launch script with Claude Code authentication fix
check_results.py     Utility to inspect experiment results, logs, and data status
pyproject.toml       Dependencies (PyTorch, kernels, rustbpe, tiktoken, etc.)
checkpoints/         Saved model weights (created automatically after training)
output/              Generated ABC notation and HTML sheet music
logs/                Training and launch logs

Step-by-Step Setup on Lightning.ai with H100 GPU

The following instructions walk you through the entire process from a fresh Lightning.ai studio to generated sheet music. Each step is written to be executed in sequence from a terminal session.

Step 1: Create a Lightning.ai Studio

Log in to lightning.ai and create a new Studio. Select the GPU tier that includes an H100 (typically listed as "GPU H100" under the compute options). Choose Ubuntu as the base image. Once the studio provisions and boots, open the terminal.

Step 2: Clone the Repository

cd ~
git clone https://github.com/alessoh/Autoresearch-Music.git
cd Autoresearch-Music

Step 3: Install the uv Package Manager

The project uses uv for Python version management and dependency resolution. Install it with the official installer and reload your shell profile so the command is available:

curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

Verify the installation by running uv --version.

Step 4: Install Claude Code

Claude Code is the AI agent that runs the autonomous research loop. Install Node.js 20 and then Claude Code globally:

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
npm install -g @anthropic-ai/claude-code

You may see some DNS warnings about eu-north1.ec2.archive.ubuntu.com during the apt update. These are harmless as long as the fallback mirrors resolve, which they do on Lightning.ai.

Step 5: Set Your Anthropic API Key

Export your Anthropic API key and persist it in your shell profile. Replace the placeholder below with your actual key from console.anthropic.com:

export ANTHROPIC_API_KEY="sk-ant-api03-your-real-key-here"
echo 'export ANTHROPIC_API_KEY="sk-ant-api03-your-real-key-here"' >> ~/.bashrc

Verify it prints correctly:

echo $ANTHROPIC_API_KEY

If it prints nothing, the export did not take effect. Fix this before proceeding, as the autonomous launch script will refuse to start without a valid key.

Step 6: Pin Python Version and Install Dependencies

Lightning.ai studios ship with Python 3.14 by default, but PyTorch's torch.compile does not support Python 3.14 yet. Pin the project to Python 3.12 and install all dependencies:

sed -i 's/requires-python = ">=3.10"/requires-python = ">=3.10,<3.14"/' pyproject.toml
uv python install 3.12
uv venv --python 3.12
uv sync

This downloads PyTorch with CUDA 12.8 support, Flash Attention 3 kernels, the BPE tokenizer, and all other packages. It may take a few minutes on a fresh studio.

Step 7: Verify GPU Access

Confirm that the H100 is visible to PyTorch:

uv run python -c "import torch; print(torch.cuda.get_device_name(0))"

You should see "NVIDIA H100 80GB HBM3" in the output.

Step 8: Download and Prepare the IrishMAN Dataset

Run the data preparation script. This downloads the IrishMAN train and validation JSON files from Hugging Face (about 80 MB total), converts them into parquet shards, and trains a 4096-vocab BPE tokenizer specialized for ABC notation:

uv run prepare.py

Everything is stored in ~/.cache/autoresearch-music/. This takes about 1 to 2 minutes on an H100 instance.

Step 9: Run the Baseline Training Experiment

Run a single 5-minute training experiment to verify the full pipeline and establish a baseline score:

uv run train.py > run.log 2>&1
grep "^val_bpb:\|^peak_vram_mb:" run.log

You should see output like val_bpb: 0.862268 and peak_vram_mb: 42892.2. The model checkpoint is automatically saved to checkpoints/model_latest.pt at the end of the run.

Step 10: Generate Sheet Music

After the baseline run completes, you can immediately generate tunes and sheet music from the trained model:

uv run generate.py --num-tunes 8

This produces two files in the output/ folder:

  • output/generated_tunes.html — interactive sheet music rendered with the abcjs library, one card per tune
  • output/generated_tunes.abc — raw ABC notation text

The generation script cycles through different structural prompts to produce variety: free generation, two-section tunes, single-section tunes, tunes with similar sections, tunes with contrasting sections, and so on. All tunes come from the same model checkpoint but with different structural control codes as starting prompts.
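The prompt rotation described above might be implemented roughly as follows; the PROMPTS list here is hypothetical, chosen only to illustrate the structural variety described (generate.py's actual list may differ):

```python
# Hypothetical rotation of structural control-code prompts:
# free generation, two sections, one long section, similar sections, contrasting sections.
PROMPTS = ["", "S:2 B:8", "S:1 B:16", "S:2 B:8 E:8", "S:2 B:8 E:2"]

def prompt_for(tune_index):
    """Cycle through the prompt list so consecutive tunes vary in structure."""
    return PROMPTS[tune_index % len(PROMPTS)]
```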

You can customize generation with various flags:

uv run generate.py --temperature 0.7 --num-tunes 10         # more conservative/coherent
uv run generate.py --temperature 1.0 --num-tunes 5          # more creative/adventurous
uv run generate.py --prompt "S:2 B:8 E:8" --num-tunes 5     # two similar 8-bar sections
uv run generate.py --prompt "S:3 B:4" --num-tunes 5          # three short sections
uv run generate.py --prompt "S:1 B:16" --num-tunes 5         # one long section

Step 11: View the Sheet Music

There are several ways to see and hear the generated music on Lightning.ai.

The quickest method is to print the ABC notation in your terminal and paste it into an online player. Run cat ~/Autoresearch-Music/output/generated_tunes.abc, copy the output, and paste it into editor.drawthedots.com. You will see the sheet music rendered and can click play to hear it.

Alternatively, use the Lightning.ai file browser in the left sidebar of your studio. Navigate to Autoresearch-Music/output/ and click on generated_tunes.html to preview or download it. You can also serve the HTML locally by running python -m http.server 8080 from the output directory and using Lightning's port forwarding to open it in your browser.

Step 12: Launch the Autonomous Research Loop (Optional)

To let Claude Code autonomously optimize the model beyond the baseline, make the launch script executable and run it:

chmod +x run_autoresearch.sh
./run_autoresearch.sh

The agent reads program.md, creates a git branch (e.g. autoresearch/mar21), and begins the autonomous experiment loop. Each experiment takes about 5 minutes, so you get roughly 12 experiments per hour. For a shorter test run of about 10 experiments:

claude -p "Read program.md. The setup is already done. Start the experiment loop now. NEVER STOP." \
  --dangerously-skip-permissions \
  --max-turns 100

For an overnight session of about 100 experiments:

claude -p "Read program.md. The setup is already done. Start the experiment loop now. NEVER STOP." \
  --dangerously-skip-permissions \
  --max-turns 1000

After the autonomous loop finishes, generate new tunes from the improved model:

uv run generate.py --num-tunes 8

Check experiment results anytime with:

cat results.tsv
python check_results.py --best

Optional: Compare Baseline vs Optimized Tunes

To hear how much the agent's experiments improved the model, generate tunes from both the original baseline and the optimized version. First, save the optimized checkpoint, then rerun baseline training and generate from both:

cd ~/Autoresearch-Music
mv checkpoints/model_latest.pt checkpoints/model_optimized.pt
git stash
uv run train.py > run_baseline.log 2>&1
mv checkpoints/model_latest.pt checkpoints/model_baseline.pt
git stash pop
mv checkpoints/model_optimized.pt checkpoints/model_latest.pt
uv run generate.py --checkpoint checkpoints/model_baseline.pt --output-html output/baseline_tunes.html --output-abc output/baseline_tunes.abc
uv run generate.py --checkpoint checkpoints/model_latest.pt --output-html output/optimized_tunes.html --output-abc output/optimized_tunes.abc

This takes about 10 minutes (two 5-minute training runs). You can then compare baseline_tunes.html against optimized_tunes.html to hear the difference the autonomous optimization made.

The Claude Code -p Authentication Fix

Claude Code v2.1.x has a confirmed bug where interactive mode ignores the ANTHROPIC_API_KEY environment variable and always forces the browser-based OAuth login flow. This fails in headless environments like Lightning.ai GPU instances where no browser is available.

The root cause is that Claude Code's interactive session handler has a hard-coded preference for OAuth tokens. Even when a valid API key exists in the environment, the interactive launcher bypasses it.

The fix, implemented in run_autoresearch.sh, uses claude -p (print/headless mode) instead of claude (interactive mode). The -p flag activates the non-interactive pipeline that correctly reads ANTHROPIC_API_KEY from the environment and authenticates via the API directly. The agent retains all tool-use capabilities including file editing, shell commands, and code execution. The only difference is the absence of the interactive conversational UI, which is not needed for autonomous research.

Configuration Reference

Parameters in prepare.py (data preparation, do not modify during experiments):

Parameter     Value       Purpose
MAX_SEQ_LEN   2048        Context window length
TIME_BUDGET   300         Training time limit in seconds (5 minutes)
VOCAB_SIZE    4096        BPE vocabulary size (tuned for ABC notation)
EVAL_TOKENS   5,242,880   Tokens used for validation evaluation
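The val_bpb score that the loop minimizes presumably converts cross-entropy loss into bits per byte of the underlying text, so that scores are comparable across tokenizers. A sketch of that conversion (the function name and exact accounting are assumptions, not taken from prepare.py):

```python
import math

def bits_per_byte(total_loss_nats, total_bytes):
    """Convert cross-entropy loss summed over the eval set (in nats)
    into bits per byte of the raw text: divide by ln(2) to get bits,
    then normalize by byte count instead of token count."""
    return total_loss_nats / (math.log(2) * total_bytes)
```

For example, a mean loss of 1.2 nats per token at roughly 2 bytes per token gives about 0.866 bits per byte, which is in the same range as the baseline score reported in Step 9.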

Parameters in train.py (these are what the agent experiments with):

Parameter           Default   Purpose
DEPTH               8         Number of transformer layers
ASPECT_RATIO        64        Model dimension equals depth times this value
HEAD_DIM            128       Attention head dimension
WINDOW_PATTERN      "L"       Full attention to capture musical repetition
TOTAL_BATCH_SIZE    524,288   Tokens per optimizer step
DEVICE_BATCH_SIZE   128       Micro-batch size per forward pass
MATRIX_LR           0.04      Learning rate for the Muon optimizer
EMBEDDING_LR        0.6       Learning rate for token embeddings
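The defaults imply a few derived quantities worth knowing before changing them; the head-count formula below (model dimension divided by head dimension) is a common convention and an assumption about train.py, not a line quoted from it:

```python
DEPTH = 8
ASPECT_RATIO = 64
HEAD_DIM = 128
TOTAL_BATCH_SIZE = 524_288
DEVICE_BATCH_SIZE = 128
MAX_SEQ_LEN = 2048  # from prepare.py

model_dim = DEPTH * ASPECT_RATIO                       # 512
num_heads = model_dim // HEAD_DIM                      # 4 heads per layer, assuming heads span the full model dim
tokens_per_micro_batch = DEVICE_BATCH_SIZE * MAX_SEQ_LEN
grad_accum_steps = TOTAL_BATCH_SIZE // tokens_per_micro_batch  # 2 micro-batches per optimizer step

print(model_dim, num_heads, grad_accum_steps)
```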

Parameters for generate.py:

Flag            Default                       Purpose
--num-tunes     5                             Number of tunes to generate
--temperature   0.85                          Sampling temperature (lower is more conservative)
--top-k         50                            Top-k sampling filter
--top-p         0.95                          Nucleus sampling threshold
--max-tokens    1024                          Maximum tokens per tune
--prompt        (empty)                       Control code to start generation
--checkpoint    checkpoints/model_latest.pt   Path to model weights
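The temperature, top-k, and top-p flags compose in the standard way: scale the logits, keep the k most likely tokens, then keep the smallest nucleus whose probability mass reaches p. This is a generic sketch of that scheme, not generate.py's actual implementation:

```python
import math
import random

def sample_next(logits, temperature=0.85, top_k=50, top_p=0.95, rng=random):
    """Sample a token id from raw logits with temperature, top-k, then top-p filtering."""
    # Temperature: values below 1.0 sharpen the distribution (more conservative output).
    scaled = [l / temperature for l in logits]
    # Softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep only the k most likely tokens.
    probs.sort(key=lambda ip: ip[1], reverse=True)
    probs = probs[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize the surviving tokens and draw one.
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```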

Dataset Details

The IrishMAN (Irish Massive ABC Notation) dataset contains 216,284 Irish tunes split 99/1 for training and validation. Each tune is prepended with structural control codes:

  • S: (sections) indicates how many sections the tune has, ranging from 1 to 8
  • B: (bars) specifies the number of bars per section, ranging from 1 to 32
  • E: (edit distance) quantifies similarity between consecutive sections, from 0 (no match) to 10 (exact match)

A typical training example looks like:

S:2 B:8 E:5 B:8
X:1 L:1/8 M:4/4 K:G |: GABc d2 Bd | cBAG FGAB | cdef g2 fg | edcB A2 AB :|
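The control-code header on each example is simple enough to parse in a few lines; this sketch assumes only the space-separated S:/B:/E: format shown above:

```python
def parse_control_codes(header):
    """Parse a header like 'S:2 B:8 E:5 B:8' into the section count,
    per-section bar counts, and edit distances between consecutive sections."""
    sections, bars, edits = None, [], []
    for token in header.split():
        key, _, value = token.partition(":")
        if key == "S":
            sections = int(value)
        elif key == "B":
            bars.append(int(value))
        elif key == "E":
            edits.append(int(value))
    return sections, bars, edits
```

For the example above, this yields 2 sections, two 8-bar sections, and an edit distance of 5 between them.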

Troubleshooting

If torch.compile raises a RuntimeError about Python 3.14, you need to pin to Python 3.12. See Step 6 above.

If the launch script reports that ANTHROPIC_API_KEY is not set, the key did not persist from a previous terminal session. Re-export it with export ANTHROPIC_API_KEY="your-key" and try again.

If generate.py reports that no checkpoint was found, you need to run at least one training pass with the current train.py that includes checkpoint saving. Run uv run train.py > run.log 2>&1 and the checkpoint will appear in the checkpoints/ directory.

If the generated HTML sheet music does not render in the Lightning.ai preview, download the file to your local machine and open it in Chrome or Firefox. The abcjs library requires a modern browser with JavaScript enabled.

License

MIT
