About

DATA-FM @ ICLR 2026

Welcome to the Navigating and Addressing Data Problems for Foundation Models Workshop (DATA-FM), co-located with ICLR 2026!

Foundation models (FMs) continue to progress rapidly, with advances in reasoning, multimodal understanding and generation, and emerging agentic behaviors. These developments rely on increasingly diverse forms of data, including large-scale pre-training corpora; post-training data such as instruction, preference, reasoning, and multi-turn interaction traces; aligned multimodal datasets; and high-quality synthetic data throughout the pipeline. As reliance on broad and heterogeneous data sources grows, longstanding challenges in curation, attribution, copyright, privacy, fairness, safety, and evaluation have become more pressing. Understanding and improving the data layer is now a central scientific and engineering priority for the next generation of FMs.

Building on the success of the previous two editions (DPFM @ ICLR 2024 and DATA-FM @ ICLR 2025), the 3rd DATA-FM workshop aims to deepen a principled understanding of data challenges across the FM pipeline. We welcome a broad community of participants, including but not limited to researchers and engineers working on pre-training, post-training, multimodality, and agentic systems; experts in law, policy, and economics; and practitioners from industry, including frontier labs and startups. Our goal is to clarify emerging data problems, identify actionable research opportunities, and foster interdisciplinary collaboration toward a more rigorous and responsible data ecosystem for AI.


Topics of interest include, but are not limited to:

  • Data curation: collection, cleaning, deduplication, selection, and mixture optimization
  • Data attribution, provenance, and valuation
  • Data marketplaces and emerging economic models for data exchange
  • Data scarcity, discovery, and sourcing strategies
  • Synthetic data generation: quality, diversity, and mitigation of model collapse
  • Principled methodologies for model evaluation and benchmark design
  • Small-scale experimentation for guiding large-scale training (e.g., scaling laws, μP)
  • Data-centric approaches to alignment and AI safety
  • Responsible data practices: privacy, security, copyright, and fairness
  • Legal, regulatory, and governance frameworks for data in foundation models

Calls

Call for Papers

Important Dates
  • Submission Deadline: Feb 8th, 2026, AoE (extended from Feb 6th, 2026) — Submissions Closed
  • Notification of Acceptance: March 1st, 2026, AoE
  • Camera-ready Deadline: April 1st, 2026, 11:59pm AoE
  • Workshop Date (Finalized): April 26th, 2026 @ Rio de Janeiro, Brazil
Regular Submission Instructions

Regular submissions may be research or position papers. All submissions are handled through OpenReview and must be anonymized for double-blind review. Papers should be no more than 10 pages (excluding references) and follow the Overleaf template adapted from ICLR. An optional appendix of any length may be included at the end of the draft after the references.

Our workshop does not have formal proceedings (i.e., it is non-archival). Accepted papers and their reviews will be posted publicly on OpenReview after the review process concludes, while rejected and withdrawn papers and their reviews will remain private.

We welcome submissions presenting novel research, ongoing or incomplete projects, manuscripts currently under review at other venues, as well as recently published results. In addition, we adopt the following policies:

  • [Submissions of previously accepted conference papers] We allow submissions that have been accepted at major machine learning conferences within one year of ICLR 2026 (i.e., after May 2025), including papers recently accepted to the ICLR 2026 main conference. However, as workshops are primarily intended to showcase novel or ongoing research, submissions based on previously published work may be deprioritized for oral presentations.
  • [Submissions of previously published journal papers] For work published in journals, we leave it to the authors to assess the novelty and relevance of the submission for the community. While the machine learning field moves quickly, this workshop aims to be inclusive of subareas that may progress at a different pace, and it values contributions that emphasize fundamental and long-lasting research.
Short Paper Submission Instructions (3–5 pages)

Since 2025, ICLR has discontinued the separate “Tiny Papers” track and instead requires each workshop to accept short paper submissions (3–5 pages in ICLR format; the exact page limit is set by each workshop), with an eye toward inclusion; see https://iclr.cc/Conferences/2025/CallForTinyPapers for the history of the ICLR Tiny Papers initiative. Authors of these papers will be earmarked for potential funding from ICLR but must submit a separate application for Financial Assistance, which evaluates their eligibility. The application for Financial Assistance to attend ICLR 2026 will become available on https://iclr.cc/Conferences/2026/ at the beginning of February and close in early March.

Building on last year's practice, our workshop continues to welcome short paper submissions intended to support underrepresented, under-resourced, and early-career researchers who may not yet have the means to submit full papers. This track is intended for work at the early stages of a project: for example, a concise but self-contained theoretical result, a novel observation from preliminary experiments, or a fresh perspective on an existing problem. The goal is to foster early-stage ideas and provide a platform for researchers to receive constructive feedback and guidance as they develop their work further.

Short papers will be peer reviewed. Submissions should be anonymized, 3–5 pages long (excluding references), submitted through the same OpenReview portal, and formatted with the same Overleaf template. In addition, please prepend the tag [Short] to the submission title.

In accordance with ICLR policy, AI-generated papers are not permitted in the short paper track.

Author-Reviewer Policy

The workshop program committee plays an important role in identifying and giving feedback on up-and-coming work that would most benefit from discussion and visibility at the workshop. To sustain our review and program selection processes, we expect at least one author of each submitted paper to volunteer to serve as a reviewer for the DATA-FM 2026 workshop.

Large Language Model Usage Policy

DATA-FM 2026 adheres to the ICLR 2026 policies on large language model (LLM) usage: https://blog.iclr.cc/2025/08/26/policies-on-large-language-model-usage-at-iclr-2026/.

In particular, authors may use LLM-based tools to assist with writing, editing, coding, or experimentation, provided that any such use is disclosed, and that all human authors take full responsibility for the content and originality of the submission.

Awards

Awards & Complimentary Registration

Best Paper Awards

We will select 4–6 submissions for oral presentations (15 minutes each). From these, we will recognize one Best Paper Award and one Best Paper Honorable Mention for outstanding research contributions.

Early Career Free Registration

To promote diversity, equity, and inclusion, the workshop offers a limited number of free full ICLR 2026 conference registrations for early-career researchers and students. Awardees will be selected with priority given to early-career attendees.

How to Apply:

  • Fill out this application form
  • Deadline: March 6th, 2026, AoE
  • Awardees will be announced by: TBD
Outstanding Reviewers Free Registration

The workshop values high-quality peer review and offers a limited number of free full ICLR 2026 conference registrations for reviewers who provide exceptional reviews. This is a self-nominated award: if you have contributed outstanding reviews to our workshop, we encourage you to apply.

How to Apply:

  • Fill out this application form
  • Deadline: March 6th, 2026, AoE
  • Awardees will be announced by: TBD

Talks

Invited Speakers

Maria De-Arteaga

ESADE Business School

Kelvin Guu

Google DeepMind

Hanna Hajishirzi

AI2 / University of Washington

Junyang Lin

Alibaba Qwen

Organization

Workshop Organizers

Luxi He

Princeton University

Yuzheng Hu

University of Illinois Urbana-Champaign

Ruoxi Jia

Virginia Tech

Pratyush Maini

DatologyAI / CMU

Monica Ribero

Google

Jiachen (Tianhao) Wang

Princeton University

Zheng Xu

Meta

Program Committee

Abhay Kumar

Abhaya Trivedi

Ahmed M. Abdelmoniem

Ajay Yadav

Alfy Samuel

Amin Banayeeanzade

Amr Abourayya

Anmol Goel

Anmol Kabra

Arinbjörn Kolbeinsson

Arun Ganesh

Aryansh Shrivastava

Aurélien Bellet

Benedikt Droste

Bowen Tan

Buxin Su

Cathy Jiao

Chendi Wang

Chunhui Zhang

Clara Na

Daogao Liu

Dario Loi

David Heineman

Dequan Wang

Dhruv Nathawani

Divyansh Pareek

Eddison Pham

Erchi Wang

Fan Wu

Firas Darwish

Francesco Tonin

Frederic Sala

Gaurav Rohit Ghosal

Götz-Henrik Wiegand

Guy Rosman

Haibo Yang

Haodong Wen

Haonan Duan

Harsh Raj

Haruka Kiyohara

Hongbin Liu

Iacopo Masi

Iris Dominguez-Catena

Ishika Agarwal

Jacqueline He

Jalaj Upadhyay

James Flemings

Jan Geffert

Jasin Cekinmez

Jhalak Gupta

Jialu Wang

Jiayi Wang

Jiayuan Ye

Jingtan Wang

Jingwei Zuo

Jingyan Shen

Jinhyun So

Jinlong Pang

Joris Guerin

Junwei Deng

Kevin Christian Wibisono

Kijung Shin

Lalchand Pandia

Lie He

Lingcheng Kong

Lingxiao Wang

Lorenzo Rossi

Lorenzo Sani

Lun Wang

Luyang Zhang

Mahule Roy

Manoj Saravanan

Maximilian Idahl

Mayee F Chen

Mehak

Meng Ding

Michael Handley

Michael Johnston

MingYu Lu

Miroojin Bakshi

Murali Emani

Neslihan Bulut

Nick Rui

Nikola Konstantinov

Peizhi Niu

Pingbang Hu

Qiaobo Li

Qirun Dai

Quan Gan

Reem I. Masoud

Robin Staab

Rohit Kumar Salla

Ryan McKenna

Ryan Wang

Ryo Mitsuhashi

Saksham Rastogi

Salma Kharrat

Sattvik Sahai

Sebastian U Stich

Shaobo Wang

Shiqiang Wang

Shixuan Liu

Simin Fan

Simon Park

Sneha Kudugunta

Spencer Hong

Stefanos Laskaridis

Swastik Nanda

Tianyi Xu

Tianyuan Zou

Tiejin Chen

Timo Hromadka

Tom Julian Viering

Umar Farooqi

Valentin NOËL

Valter Hudovernik

Vethavikashini Chithrra Raghuram

Victor Moreli dos Santos

Vijay Prakash Dwivedi

Vishakh Padmakumar

Wanyun Xie

Wei-Ning Chen

Weida Li

Wenkai Li

Xiaoqing Sun

Xinjie Shen

Xinyan Velocity Yu

Xinyang Lu

Xuan Ouyang

Yae Jee Cho

Yanlin Zhang

Yanqi Luo

Yao Tong

Yasuhiro Yoshida

Yi Sui

Yi Zhou

Yifan Zhang

Yifei Zhang

Yingrui Ji

Yu-Chen Den

Yurong Liu

Zeman Li

Zhengyao Gu

Zhengyuan Jiang

Zhiliang Chen

Ziao Yang

Zichen Wen

Zichun Yu

Zilin Du

Zillur Rahman

Sponsors

Contact us

Email us at [email protected]