Work

I'm chief scientist and co-founder of Goodfire. We're an AI interpretability startup.

I was at DeepMind from 2019 to late 2023, where I worked on:

Interpretability for LLMs (e.g. the Hydra Effect, Copy Suppression) and AlphaZero.
Science of training data.
RLHF data quality and self-annotation.
Evaluation of generalist deep RL agents.

I did my PhD (thesis) at Imperial College with Nick Jones and Kevin Murphy.

Research

If there's something in my research projects list you're interested in collaborating on, please get in touch!

I have a Substack if you prefer to read there.

Email is probably best, but you can reach me on Twitter or LinkedIn as well.
