As a PhD student working on NLP and ML, I often come across interesting ideas, debugging stories, and “aha!” moments that I think could be valuable to share. This blog will serve as both a personal record and, hopefully, a resource for others in the field.

I’m particularly excited to write about my research adventures, and I’ll be posting regularly. Feel free to reach out if you have any questions or topics you’d like me to cover!

Happy researching!
Attribution, in the context of language models, refers to the process of identifying which parts of the training data, model components, or input contributed most to a particular prediction. Think of it as asking: “Why did the model produce this specific output?”
There are several flavors of attribution that researchers work on.

From my research experience, I’ve found that attribution is crucial for making models more interpretable and trustworthy.

Working on attribution research has also taught me that the area comes with fundamental challenges, and the field is rapidly evolving in response.
Here’s a quick example of how you might compute simple gradient-based attribution:
```python
import torch

def compute_input_attribution(model, input_ids, target_token_id):
    """
    Compute gradient-based attribution scores for input tokens.

    Returns a tensor of shape (batch, seq_len) holding the gradient
    norm of the target token's probability w.r.t. each input embedding.
    """
    # Embed the inputs, then detach so the embeddings become leaf
    # tensors whose .grad field gets populated on backward()
    embeddings = model.get_input_embeddings()(input_ids).detach()
    embeddings.requires_grad_(True)

    # Forward pass, feeding the embeddings directly
    outputs = model(inputs_embeds=embeddings)
    logits = outputs.logits

    # Probability of the target token at the final position
    target_prob = torch.softmax(logits[0, -1], dim=-1)[target_token_id]

    # Backward pass: gradients flow into embeddings.grad
    target_prob.backward()

    # Attribution score: gradient magnitude per token
    attribution = embeddings.grad.norm(dim=-1)
    return attribution.detach()
```
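To see the function in action, here is a minimal, self-contained sketch. The `TinyLM` class below is hypothetical, invented purely so the example runs without downloading anything; in practice you would pass any model that exposes `get_input_embeddings`, accepts `inputs_embeds`, and returns an object with a `.logits` attribute (as Hugging Face causal LMs do):

```python
import torch
import torch.nn as nn

def compute_input_attribution(model, input_ids, target_token_id):
    """Gradient-based attribution: gradient norm of the target
    token's probability w.r.t. each input embedding."""
    embeddings = model.get_input_embeddings()(input_ids).detach()
    embeddings.requires_grad_(True)
    outputs = model(inputs_embeds=embeddings)
    target_prob = torch.softmax(outputs.logits[0, -1], dim=-1)[target_token_id]
    target_prob.backward()
    return embeddings.grad.norm(dim=-1).detach()

class TinyLM(nn.Module):
    """Hypothetical toy model mimicking the interface of a causal LM:
    get_input_embeddings(), inputs_embeds kwarg, and a .logits output."""
    def __init__(self, vocab_size=32, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def get_input_embeddings(self):
        return self.embed

    def forward(self, inputs_embeds):
        logits = self.proj(inputs_embeds)
        # Mimic the Hugging Face output object with a .logits attribute
        return type("Output", (), {"logits": logits})()

torch.manual_seed(0)
model = TinyLM()
input_ids = torch.tensor([[3, 7, 1, 9]])  # a batch with one 4-token sequence
scores = compute_input_attribution(model, input_ids, target_token_id=5)
print(scores.shape)  # one attribution score per input token
```

One score comes back per input token; tokens with larger gradient norms are the ones whose embeddings, locally, most influence the target token's probability.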
This is just scratching the surface, but it gives you an idea of how we can start understanding what drives model predictions.
As language models become more powerful and ubiquitous, the need for robust attribution methods will only grow. I’m excited to continue working on making these models more interpretable and trustworthy.
What aspects of attribution are you most interested in? Feel free to reach out if you’d like to discuss any of these topics further!
This post is based on insights from my ongoing research on attribution methods. Stay tuned for more technical deep-dives!