A Claude/Codex skill for ensembling gradient boosting models on large-scale tabular datasets, optimized for the Numerai competition.
Built from *The Kaggle Grandmasters Playbook: 7 Battle-Tested Modeling Techniques for Tabular Data*. I wanted to see how much of that accumulated wisdom could be distilled into a skill that Codex CLI could actually execute on.
Generated using Claude Opus 4.5 with Anthropic's official skill-creator.
Claude recommended embedding Numerai specializations (era-based validation, embargo handling, multi-target ensembling) in the skill instead of the prompt.
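Era-based validation with an embargo is the core of those specializations: Numerai targets overlap across neighboring eras, so eras adjacent to the validation block must be dropped from training. A minimal sketch of what such a splitter might look like (the fold count and embargo width here are illustrative assumptions, not the skill's exact settings):

```python
import numpy as np

def era_splits(eras, n_folds=4, embargo=4):
    """Yield (train_eras, val_eras) pairs: contiguous era blocks for
    validation, with `embargo` eras dropped on each side of the block
    to limit leakage from overlapping targets."""
    unique = np.array(sorted(set(eras)))
    for val_eras in np.array_split(unique, n_folds):
        lo, hi = val_eras[0], val_eras[-1]
        mask = (unique < lo - embargo) | (unique > hi + embargo)
        yield unique[mask], val_eras

# Example over the era range this repo uses:
for train_eras, val_eras in era_splits(range(200, 1001)):
    assert not set(train_eras) & set(val_eras)  # no era appears in both
```

Multi-target ensembling then averages predictions from models trained on each of the six targets, validated fold by fold with splits like these.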
The skill assumes a Colab Pro+ environment with an A100-80GB GPU. I wanted to give the agent unfettered runway to go deep (30K trees on GPU frameworks). Adjust these assumptions if you're working with different hardware.
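To make the "deep runway" concrete, here is the kind of GPU boosting configuration that assumption implies. These are hypothetical XGBoost-style settings I'm using for illustration; the skill's actual hyperparameters may differ:

```python
# Hypothetical settings sized for an A100-80GB run; the values are
# illustrative assumptions, not the skill's exact configuration.
xgb_params = dict(
    n_estimators=30_000,   # deep tree budget enabled by the GPU runway
    learning_rate=0.01,    # small step size to match the large ensemble
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    device="cuda",         # GPU device selector (XGBoost >= 2.0)
    tree_method="hist",
)
# model = xgboost.XGBRegressor(**xgb_params)  # requires xgboost + CUDA
```

On smaller GPUs you would shrink `n_estimators` (or rely on early stopping) and raise `learning_rate` accordingly.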
If you are using your ChatGPT subscription for Codex instead of an API key, you will need to copy the ~/.codex/auth.json file from your local computer to Colab after installing the Codex CLI in the Colab terminal.
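A sketch of that step in the Colab terminal; the /content/auth.json upload path is an assumption (adjust it to wherever you upload the file):

```shell
# After installing the Codex CLI on Colab, create its config directory
mkdir -p ~/.codex
# Upload auth.json from your local ~/.codex/ (e.g. via the Colab file
# browser), then move it into place. /content/auth.json is an assumed
# upload location, not a fixed path.
if [ -f /content/auth.json ]; then
  mv /content/auth.json ~/.codex/auth.json
  chmod 600 ~/.codex/auth.json   # keep the credential file private
fi
```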
The skill and prompt expect a preprocessed dataset:
- Merged `train.parquet` and `validation.parquet` into a single file
- Eras 0200–1000 only
- Six target columns
You'll need to generate a similar data file or modify the skill to work with the official Numerai data directly.
This isn't a finished, ready-to-use project. It's an experiment on what AI skills can accomplish. That said, you should be able to copy the `tabular-ml-modeling` folder into `.codex/skills/` or `.claude/skills/` and reference it in your prompt. The included `numerai_prompt.md` shows how I ran the task. And yes, the prompt was also generated by Claude.