About
DATA-FM @ ICLR 2026
Welcome to the Navigating and Addressing Data Problems for Foundation Models Workshop (DATA-FM), co-located with ICLR 2026!
Foundation models (FMs) continue to progress rapidly, with advances in reasoning, multimodal understanding and generation, and emerging agentic behaviors. These developments rely on increasingly diverse forms of data, including large-scale pre-training corpora; post-training data such as instruction, preference, reasoning, and multi-turn interaction traces; aligned multimodal datasets; and high-quality synthetic data throughout the pipeline. As reliance on broad and heterogeneous data sources grows, longstanding challenges in curation, attribution, copyright, privacy, fairness, safety, and evaluation have become more pressing. Understanding and improving the data layer is now a central scientific and engineering priority for the next generation of FMs.
Building on the success of the previous two editions (DPFM @ ICLR 2024 and DATA-FM @ ICLR 2025), the 3rd DATA-FM workshop aims to deepen a principled understanding of data challenges across the FM pipeline. We welcome a broad community of participants, including but not limited to researchers and engineers working on pre-training, post-training, multimodality, and agentic systems; experts in law, policy, and economics; and practitioners from industry, including frontier labs and startups. Our goal is to clarify emerging data problems, identify actionable research opportunities, and foster interdisciplinary collaboration toward a more rigorous and responsible data ecosystem for AI.
Topics of interest include, but are not limited to:
- Data curation: collection, cleaning, deduplication, selection, and mixture optimization
- Data attribution, provenance, and valuation
- Data marketplaces and emerging economic models for data exchange
- Data scarcity, discovery, and sourcing strategies
- Synthetic data generation: quality, diversity, and mitigation of model collapse
- Principled methodologies for model evaluation and benchmark design
- Small-scale experimentation for guiding large-scale training (e.g., scaling laws, μP)
- Data-centric approaches to alignment and AI safety
- Responsible data practices: privacy, security, copyright, and fairness
- Legal, regulatory, and governance frameworks for data in foundation models
Calls
Call for Papers
Important Dates
- Submission Deadline: Feb 8th, 2026, AoE (extended from Feb 6th, 2026) — Submissions Closed
- Notification of Acceptance: March 1st, 2026, AoE
- Camera-ready Deadline: April 1st, 2026, 11:59pm AoE
- Workshop Date (Finalized): April 26th, 2026 @ Rio de Janeiro, Brazil
Regular Submission Instructions
Regular submissions may be research or position papers. All submissions are handled through OpenReview and must be anonymized for double-blind review. Papers should be no more than 10 pages (excluding references) and follow the Overleaf template adapted from ICLR. An optional appendix of any length may be included at the end of the draft after the references.
Our workshop does not have formal proceedings, i.e., it is non-archival. Accepted papers and their reviews will be made public on OpenReview after the review process concludes, while rejected and withdrawn papers and their reviews will remain private.
We welcome submissions presenting novel research, ongoing or incomplete projects, manuscripts currently under review at other venues, as well as recently published results. In addition, we adopt the following policies:
- [Submission on previous conference papers] We allow submissions that have been accepted at major machine learning conferences within one year of ICLR 2026 (i.e., after May 2025), including papers recently accepted to the ICLR 2026 main conference. However, as workshops are primarily intended to showcase novel or ongoing research, submissions based on previously published work may be deprioritized for oral presentations.
- [Submission on previous journal papers] For work published in journals, we leave it to the authors to assess the novelty and relevance of the submission for the community. While the machine learning field moves quickly, this workshop aims to be inclusive of subareas that may progress at a different pace and values contributions that emphasize fundamental and long-lasting research.
Short Paper Submission Instructions (3–5 pages)
Since 2025, ICLR has discontinued the separate “Tiny Papers” track and instead requires each workshop to accept short paper submissions (3–5 pages in ICLR format, with the exact page limit set by each workshop), with an eye towards inclusion; see https://iclr.cc/Conferences/2025/CallForTinyPapers for the history of the ICLR tiny papers initiative. Authors of short papers may be eligible for funding from ICLR, but must submit a separate Financial Assistance application that evaluates their eligibility. The Financial Assistance application for ICLR 2026 will open on https://iclr.cc/Conferences/2026/ at the beginning of February and close in early March.
Building on last year's practice, our workshop continues to welcome short paper submissions intended to support underrepresented, under-resourced, and early-career researchers who may not yet have the means to submit full papers. This track is intended for work at the early stages of a project: for example, a concise but self-contained theoretical result, a novel observation from preliminary experiments, or a fresh perspective on an existing problem. The goal is to foster early-stage ideas and provide a platform for researchers to receive constructive feedback and guidance as they develop their work further.
Short papers will be peer reviewed. Submissions should be anonymized, 3–5 pages long (excluding references), submitted through the same OpenReview portal, and formatted with the same Overleaf template. In addition, please add the tag [Short] at the beginning of the submission title.
In accordance with ICLR policy, AI-generated papers are not permitted in the short paper track.
Author-Reviewer Policy
The workshop program committee plays an important role in identifying and giving feedback on up-and-coming work that would most benefit from discussion and visibility at the workshop. To sustain our review and program selection processes, we expect at least one author of each submitted paper to volunteer to participate as a reviewer for the DATA-FM 2026 workshop.
Large Language Model Usage Policy
DATA-FM 2026 adheres to the ICLR 2026 policies on large language model (LLM) usage: https://blog.iclr.cc/2025/08/26/policies-on-large-language-model-usage-at-iclr-2026/.
In particular, authors may use LLM-based tools to assist with writing, editing, coding, or experimentation, provided that any such use is disclosed, and that all human authors take full responsibility for the content and originality of the submission.
Awards
Awards & Complimentary Registration
Best Paper Awards
We will select 4–6 submissions for oral presentations (15 minutes each). Among these, we will recognize one Best Paper award and one Best Paper Honorable Mention to acknowledge outstanding research contributions.
Early Career Free Registration
To promote diversity, equity, and inclusion, the workshop offers a limited number of free full ICLR 2026 conference registrations for early-career researchers and students. Awardees will be selected with priority given to early-career attendees.
How to Apply:
- Fill out this application form
- Deadline: March 6th, 2026, AoE
- Awardees will be announced by: TBD
Outstanding Reviewers Free Registration
The workshop values high-quality peer review. We offer a limited number of free full ICLR 2026 conference registrations for reviewers who provide exceptional reviews. This is a self-nominated award—if you have contributed outstanding reviews to our workshop, we encourage you to apply.
How to Apply:
- Fill out this application form
- Deadline: March 6th, 2026, AoE
- Awardees will be announced by: TBD
Talks
Invited Speakers
Maria De-Arteaga
ESADE Business School
Kelvin Guu
Google DeepMind
Hanna Hajishirzi
AI2 / University of Washington
Junyang Lin
Alibaba Qwen
Organization
Workshop Organizers
Luxi He
Princeton University
Yuzheng Hu
University of Illinois Urbana-Champaign
Martin Jaggi
EPFL
Ruoxi Jia
Virginia Tech
Pratyush Maini
DatologyAI / CMU
Monica Ribero
Google
Jiachen (Tianhao) Wang
Princeton University
Zheng Xu
Meta
Program Committee
Abhay Kumar
Abhaya Trivedi
Ahmed M. Abdelmoniem
Ajay Yadav
Alfy Samuel
Amin Banayeeanzade
Amr Abourayya
Anmol Goel
Anmol Kabra
Arinbjörn Kolbeinsson
Arun Ganesh
Aryansh Shrivastava
Aurélien Bellet
Benedikt Droste
Bowen Tan
Buxin Su
Cathy Jiao
Chendi Wang
Chunhui Zhang
Clara Na
Daogao Liu
Dario Loi
David Heineman
Dequan Wang
Dhruv Nathawani
Divyansh Pareek
Eddison Pham
Erchi Wang
Fan Wu
Firas Darwish
Francesco Tonin
Frederic Sala
Gaurav Rohit Ghosal
Götz-Henrik Wiegand
Guy Rosman
Haibo Yang
Haodong Wen
Haonan Duan
Harsh Raj
Haruka Kiyohara
Hongbin Liu
Iacopo Masi
Iris Dominguez-Catena
Ishika Agarwal
Jacqueline He
Jalaj Upadhyay
James Flemings
Jan Geffert
Jasin Cekinmez
Jhalak Gupta
Jialu Wang
Jiayi Wang
Jiayuan Ye
Jingtan Wang
Jingwei Zuo
Jingyan Shen
Jinhyun So
Jinlong Pang
Joris Guerin
Junwei Deng
Kevin Christian Wibisono
Kijung Shin
Lalchand Pandia
Lie He
Lingcheng Kong
Lingxiao Wang
Lorenzo Rossi
Lorenzo Sani
Lun Wang
Luyang Zhang
Mahule Roy
Manoj Saravanan
Maximilian Idahl
Mayee F Chen
Mehak
Meng Ding
Michael Handley
Michael Johnston
MingYu Lu
Miroojin Bakshi
Murali Emani
Neslihan Bulut
Nick Rui
Nikola Konstantinov
Peizhi Niu
Pingbang Hu
Qiaobo Li
Qirun Dai
Quan Gan
Reem I. Masoud
Robin Staab
Rohit Kumar Salla
Ryan McKenna
Ryan Wang
Ryo Mitsuhashi
Saksham Rastogi
Salma Kharrat
Sattvik Sahai
Sebastian U Stich
Shaobo Wang
Shiqiang Wang
Shixuan Liu
Simin Fan
Simon Park
Sneha Kudugunta
Spencer Hong
Stefanos Laskaridis
Swastik Nanda
Tianyi Xu
Tianyuan Zou
Tiejin Chen
Timo Hromadka
Tom Julian Viering
Umar Farooqi
Valentin NOËL
Valter Hudovernik
Vethavikashini Chithrra Raghuram
Victor Moreli dos Santos
Vijay Prakash Dwivedi
Vishakh Padmakumar
Wanyun Xie
Wei-Ning Chen
Weida Li
Wenkai Li
Xiaoqing Sun
Xinjie Shen
Xinyan Velocity Yu
Xinyang Lu
Xuan Ouyang
Yae Jee Cho
Yanlin Zhang
Yanqi Luo
Yao Tong
Yasuhiro Yoshida
Yi Sui
Yi Zhou
Yifan Zhang
Yifei Zhang
Yingrui Ji
Yu-Chen Den
Yurong Liu
Zeman Li
Zhengyao Gu
Zhengyuan Jiang
Zhiliang Chen
Ziao Yang
Zichen Wen
Zichun Yu
Zilin Du
Zillur Rahman
Sponsors