I am a final-year Ph.D. candidate at Mila & Université de Montréal, advised by Ioannis Mitliagkas.
Most recently, I was a visiting researcher at Meta Super Intelligence Labs (FAIR) working on pretraining large language models with the memory and generalization team.
The central theme of my research is understanding and improving how machine learning systems generalize to novel tasks and environments. My work spans causal representation learning for robustness under distribution shifts (1), as well as modern paradigms such as in-context learning (2) and large-scale pretraining (3). Going forward, I am broadly interested in the following research directions for improving the capabilities and reliability of foundation models.
- Novel approaches for pretraining. I am interested in pretraining strategies that help language models learn richer representations and improve long-horizon reasoning & planning. A direction I find especially promising is data-constrained pretraining, where better objectives, architectures, and synthetic data may become increasingly important as compute scales faster than the supply of high-quality data.
- Reusable skills for continual learning. I am interested in approaches for discovering reusable skills/strategies from reasoning traces that can help "amortize" the reasoning process. I am especially interested in exploring how to consolidate skills over time, enabling efficient adaptation to new tasks and self-improvement.
- Causal approaches for alignment and safety. I am interested in alignment methods that move beyond spurious correlations and better capture the underlying intent and causal structure. In particular, I am excited by causal approaches for reward design and concept learning that enable better understanding and more reliable steering of LLM behavior.
My research is supported by the FRQNT doctoral fellowship, and I am deeply grateful for the amazing collaborations that have enriched my Ph.D. journey.
I was advised by Kartik Ahuja and Pascal Vincent under the Meta AIM Program, and also did a summer internship at Microsoft Research Cambridge with Cheng Zhang and Meyer Scetbon. Further, I worked with Vasilis Syrgkanis at Stanford, and prior to my Ph.D., I was a research fellow at Microsoft Research India with Amit Sharma.
Select Publications & Preprints
-
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
Divyat Mahajan, Sachin Goyal, Badr Youbi Idrissi, Mohammad Pezeshki, Ioannis Mitliagkas, David Lopez-Paz, Kartik Ahuja
ICLR 2026
[arxiv]
[twitter]
-
Amortized Inference of Causal Models via Conditional Fixed-Point Iterations
Divyat Mahajan*, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon*
TMLR 2025 (J2C Certification), ICLR 2026
[arxiv]
[code]
-
Compositional Risk Minimization
Divyat Mahajan, Mohammad Pezeshki, Charles Arnal, Ioannis Mitliagkas, Kartik Ahuja, Pascal Vincent
ICML 2025
[arxiv]
[code]
[presentation]
[poster]
[twitter]
-
Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation
Divyat Mahajan, Ioannis Mitliagkas, Brady Neal, Vasilis Syrgkanis
ICLR 2024 (Spotlight)
[arxiv]
[code]
[presentation]
[poster]
[twitter]
-
Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation
Sébastien Lachapelle*, Divyat Mahajan*, Ioannis Mitliagkas, Simon Lacoste-Julien
NeurIPS 2023 (Oral)
[arxiv]
[code]
[blog]
[talk(conference)]
[talk(reading group)]
[presentation]
[poster]
-
Interventional Causal Representation Learning
Kartik Ahuja, Divyat Mahajan, Yixin Wang, Yoshua Bengio
ICML 2023 (Oral)
[arxiv]
[code]
[talk]
[presentation]
[poster]
-
Towards efficient representation identification in supervised learning
Kartik Ahuja*, Divyat Mahajan*, Vasilis Syrgkanis, Ioannis Mitliagkas
CLeaR 2022
[arxiv]
[code]
[talk]
[presentation]
[poster]
-
Domain Generalization using Causal Matching
Divyat Mahajan, Shruti Tople, Amit Sharma
ICML 2021 (Oral)
[arxiv]
[code]
[talk]
[presentation]
[poster]
-
Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers
Divyat Mahajan, Chenhao Tan, Amit Sharma
CausalML@NeurIPS 2019 (Oral)
[arxiv]
[code]
[talk]
[presentation]
[poster]
Select Awards & Honours
Software
-
RobustDG
Toolkit for building robust ML models that generalize to unseen domains | GitHub | Microsoft