About

DATA-FM @ ICLR 2026

Welcome to the Navigating and Addressing Data Problems for Foundation Models Workshop (DATA-FM), co-located with ICLR 2026!

Foundation models (FMs) continue to progress rapidly, with advances in reasoning, multimodal understanding and generation, and emerging agentic behaviors. These developments rely on increasingly diverse forms of data, including large-scale pre-training corpora; post-training data such as instruction, preference, reasoning, and multi-turn interaction traces; aligned multimodal datasets; and high-quality synthetic data throughout the pipeline. As reliance on broad and heterogeneous data sources grows, longstanding challenges in curation, attribution, copyright, privacy, fairness, safety, and evaluation have become more pressing. Understanding and improving the data layer is now a central scientific and engineering priority for the next generation of FMs.

Building on the success of the previous two editions (DPFM @ ICLR 2024 and DATA-FM @ ICLR 2025), the 3rd DATA-FM workshop aims to deepen a principled understanding of data challenges across the FM pipeline. We welcome a broad community of participants, including but not limited to researchers and engineers working on pre-training, post-training, multimodality, and agentic systems; experts in law, policy, and economics; and practitioners from industry, including frontier labs and startups. Our goal is to clarify emerging data problems, identify actionable research opportunities, and foster interdisciplinary collaboration toward a more rigorous and responsible data ecosystem for AI.


Topics of interest include, but are not limited to:

  • Data curation: collection, cleaning, deduplication, selection, and mixture optimization
  • Data attribution, provenance, and valuation
  • Data marketplaces and emerging economic models for data exchange
  • Data scarcity, discovery, and sourcing strategies
  • Synthetic data generation: quality, diversity, and mitigation of model collapse
  • Principled methodologies for model evaluation and benchmark design
  • Small-scale experimentation for guiding large-scale training (e.g., scaling laws, μP)
  • Data-centric approaches to alignment and AI safety
  • Responsible data practices: privacy, security, copyright, and fairness
  • Legal, regulatory, and governance frameworks for data in foundation models

Calls

Call for Papers

Important Dates
  • Submission Deadline: Feb 8th, 2026, AoE (extended from Feb 6th, 2026) — Submissions Closed
  • Notification of Acceptance: March 1st, 2026, AoE
  • Camera-ready Deadline: April 1st, 2026, 11:59pm AoE
  • Workshop Date (Finalized): April 26th, 2026 @ Rio de Janeiro, Brazil
Regular Submission Instructions

Regular submissions may be research or position papers. All submissions are handled through OpenReview and must be anonymized for double-blind review. Papers should be no more than 10 pages (excluding references) and follow the Overleaf template adapted from ICLR. An optional appendix of any length may be included at the end of the draft after the references.

Our workshop does not have formal proceedings (i.e., it is non-archival). Accepted papers and their reviews will be posted publicly on OpenReview after the review process concludes, while rejected and withdrawn papers and their reviews will remain private.

We welcome submissions presenting novel research, ongoing or incomplete projects, manuscripts currently under review at other venues, as well as recently published results. In addition, we adopt the following policies:

  • [Submissions of previously accepted conference papers] We allow submissions that have been accepted at major machine learning conferences within one year of ICLR 2026 (i.e., after May 2025), including papers recently accepted to the ICLR 2026 main conference. However, as workshops are primarily intended to showcase novel or ongoing research, submissions based on previously published work may be deprioritized for oral presentations.
  • [Submissions of previously published journal papers] For work published in journals, we leave it to the authors to assess the novelty and relevance of the submission for the community. While the machine learning field moves quickly, this workshop aims to be inclusive of subareas that may progress at a different pace, and it values contributions that emphasize fundamental and long-lasting research.
Short Paper Submission Instructions (3–5 pages)

Since 2025, ICLR has discontinued the separate “Tiny Papers” track and instead requires each workshop to accept short paper submissions (3–5 pages in ICLR format; the exact page limit is set by each workshop), with an eye toward inclusion; see https://iclr.cc/Conferences/2025/CallForTinyPapers for the history of the ICLR Tiny Papers initiative. Authors of these papers will be earmarked for potential funding from ICLR but must submit a separate application for Financial Assistance, which evaluates their eligibility. The application for Financial Assistance to attend ICLR 2026 will become available on https://iclr.cc/Conferences/2026/ at the beginning of February and close in early March.

Building on last year's practice, our workshop continues to welcome short paper submissions intended to support underrepresented, under-resourced, and early-career researchers who may not yet have the means to submit full papers. This track is intended for work at the early stages of a project: for example, a concise but self-contained theoretical result, a novel observation from preliminary experiments, or a fresh perspective on an existing problem. The goal is to foster early-stage ideas and provide a platform for researchers to receive constructive feedback and guidance as they develop their work further.

Short papers will be peer reviewed. Submissions should be anonymized, 3–5 pages long (excluding references), submitted through the same OpenReview portal, and formatted with the same Overleaf template. In addition, please prepend the tag [Short] to the submission title.

In accordance with ICLR policy, AI-generated papers are not permitted in the short paper track.

Author-Reviewer Policy

The workshop program committee plays an important role in identifying and giving feedback on up-and-coming work that would most benefit from discussion and visibility at the workshop. To sustain our review and program selection processes, we expect at least one author of each submitted paper to volunteer to serve as a reviewer for the DATA-FM 2026 workshop.

Large Language Model Usage Policy

DATA-FM 2026 adheres to the ICLR 2026 policies on large language model (LLM) usage: https://blog.iclr.cc/2025/08/26/policies-on-large-language-model-usage-at-iclr-2026/.

In particular, authors may use LLM-based tools to assist with writing, editing, coding, or experimentation, provided that any such use is disclosed, and that all human authors take full responsibility for the content and originality of the submission.

Awards

Awards & Complimentary Registration

Best Paper Awards

We will select 4–6 submissions for oral presentations (15 minutes each). From these, we will recognize one Best Paper Award and one Best Paper Honorable Mention for outstanding research contributions.

Early Career Free Registration

To promote diversity, equity, and inclusion, the workshop offers a limited number of free full ICLR 2026 conference registrations for early-career researchers and students. Awardees will be selected with priority given to early-career attendees.

How to Apply:

  • Fill out this application form
  • Deadline: March 6th, 2026, AoE
  • Awardees will be announced by: TBD
Outstanding Reviewers Free Registration

The workshop values high-quality peer review and offers a limited number of free full ICLR 2026 conference registrations for reviewers who provide exceptional reviews. This is a self-nominated award: if you have contributed outstanding reviews to our workshop, we encourage you to apply.

How to Apply:

  • Fill out this application form
  • Deadline: March 6th, 2026, AoE
  • Awardees will be announced by: TBD

Talks

Invited Speakers

Maria De-Arteaga

ESADE Business School

Kelvin Guu

Google DeepMind

Hanna Hajishirzi

AI2 / University of Washington

Junyang Lin

Alibaba Qwen

Organization

Workshop Organizers

Luxi He

Princeton University

Yuzheng Hu

University of Illinois Urbana-Champaign

Ruoxi Jia

Virginia Tech

Pratyush Maini

DatologyAI / CMU

Monica Ribero

Google

Jiachen (Tianhao) Wang

Princeton University

Zheng Xu

Meta

Program Committee

Abhay Kumar

Abhaya Trivedi

Ahmed M. Abdelmoniem

Ajay Yadav

Alfy Samuel

Amin Banayeeanzade

Amr Abourayya

Anmol Goel

Anmol Kabra

Arinbjörn Kolbeinsson

Arun Ganesh

Aryansh Shrivastava

Aurélien Bellet

Benedikt Droste

Bowen Tan

Buxin Su

Cathy Jiao

Chendi Wang

Chunhui Zhang

Clara Na

Daogao Liu

Dario Loi

David Heineman

Dequan Wang

Dhruv Nathawani

Divyansh Pareek

Eddison Pham

Erchi Wang

Fan Wu

Firas Darwish

Francesco Tonin

Frederic Sala

Gaurav Rohit Ghosal

Götz-Henrik Wiegand

Guy Rosman

Haibo Yang

Haodong Wen

Haonan Duan

Harsh Raj

Haruka Kiyohara

Hongbin Liu

Iacopo Masi

Iris Dominguez-Catena

Ishika Agarwal

Jacqueline He

Jalaj Upadhyay

James Flemings

Jan Geffert

Jasin Cekinmez

Jhalak Gupta

Jialu Wang

Jiayi Wang

Jiayuan Ye

Jingtan Wang

Jingwei Zuo

Jingyan Shen

Jinhyun So

Jinlong Pang

Joris Guerin

Junwei Deng

Kevin Christian Wibisono

Kijung Shin

Lalchand Pandia

Lie He

Lingcheng Kong

Lingxiao Wang

Lorenzo Rossi

Lorenzo Sani

Lun Wang

Luyang Zhang

Mahule Roy

Manoj Saravanan

Maximilian Idahl

Mayee F Chen

Mehak

Meng Ding

Michael Handley

Michael Johnston

MingYu Lu

Miroojin Bakshi

Murali Emani

Neslihan Bulut

Nick Rui

Nikola Konstantinov

Peizhi Niu

Pingbang Hu

Qiaobo Li

Qirun Dai

Quan Gan

Reem I. Masoud

Robin Staab

Rohit Kumar Salla

Ryan McKenna

Ryan Wang

Ryo Mitsuhashi

Saksham Rastogi

Salma Kharrat

Sattvik Sahai

Sebastian U Stich

Shaobo Wang

Shiqiang Wang

Shixuan Liu

Simin Fan

Simon Park

Sneha Kudugunta

Spencer Hong

Stefanos Laskaridis

Swastik Nanda

Tianyi Xu

Tianyuan Zou

Tiejin Chen

Timo Hromadka

Tom Julian Viering

Umar Farooqi

Valentin NOËL

Valter Hudovernik

Vethavikashini Chithrra Raghuram

Victor Moreli dos Santos

Vijay Prakash Dwivedi

Vishakh Padmakumar

Wanyun Xie

Wei-Ning Chen

Weida Li

Wenkai Li

Xiaoqing Sun

Xinjie Shen

Xinyan Velocity Yu

Xinyang Lu

Xuan Ouyang

Yae Jee Cho

Yanlin Zhang

Yanqi Luo

Yao Tong

Yasuhiro Yoshida

Yi Sui

Yi Zhou

Yifan Zhang

Yifei Zhang

Yingrui Ji

Yu-Chen Den

Yurong Liu

Zeman Li

Zhengyao Gu

Zhengyuan Jiang

Zhiliang Chen

Ziao Yang

Zichen Wen

Zichun Yu

Zilin Du

Zillur Rahman

Sponsors

Contact us

Email us at [email protected]