Update: I'm currently on leave from Cambridge to work as a student researcher at Google DeepMind, focusing on character and model specs.
Hey! I'm a second-year PhD student in the Language Technology Lab at Cambridge, where I work on aligning AI assistants. I'm supervised by Anna Korhonen and Ramit Debnath.
I want the future to go well because of AI, not in spite of it, so I work on problems I think will bring us closer to that future. My main research focus is currently the character and value systems of AI assistants. I work on shaping these qualities during post-training, a process called character training. This work began last year during my time at MATS, where I was mentored by Evan Hubinger and Nathan Lambert. See our paper, and stay tuned for upcoming work. I'm very excited about open-source work in this area, so please get in touch if you'd like to get involved!
AI assistants can also be unintuitive to talk to: in some ways, they are much better at inhabiting fiction than we are. I think humans should have tools and techniques that help us avoid being deceived by them (whether intentionally or not), and this is something I work on at Cadenza Labs. We're currently running an AI lie-detection competition, so consider participating if this sounds fun!
You can reach me at sm2783[at]cam[dot]ac[dot]uk or through the links below.