Aidan Ewart
Updated 18/02/2024
Hi! I'm an undergrad studying maths at the University of Bristol, and I do independent research into ML safety in my free time. My current main interests are in machine learning model interpretability and alignment robustness.
Publications
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Demonstrates an unsupervised method for finding human-understandable decompositions of LM activations.
ICLR 2024, ATTRIB Workshop @ NeurIPS 2023
Eight Methods to Evaluate Robust Unlearning in LLMs
Rigorously evaluates the machine unlearning done in Eldan and Russinovich (2023).
Interesting Projects
- An Open Type Theory
- A Compiler for an ML-Style Language
- A Compiler for a Haskell-Style Language
- A Theorem-Proving 'Extension' for Lua
- A Compiler for AQA Pseudocode to AQA Assembly
- A Reverse Polish Notation Calculator (plus s)