The Second Workshop on Evaluation for Multimodal Generation
Multimodal generation and retrieval systems are increasingly central to modern information retrieval, powering retrieval-augmented generation (RAG), multimodal search, recommendation, and knowledge-intensive applications. Despite rapid progress in multimodal large language models (MLLMs), robust and principled evaluation of multimodal generation and retrieval remains a major open challenge for the IR community. This workshop aims to foster discussions and research efforts by bringing together researchers and practitioners in information retrieval, natural language processing, computer vision, and multimodal AI. Our goal is to establish evaluation methods for multimodal research and advance research efforts in this direction.
Call for Papers
Both long papers and short papers (up to 9 pages and 4 pages respectively, with unlimited references and appendices) are welcome for submission.
Topics relevant to this workshop include (but are not limited to):
- Multimodal retrieval for RAG, agentic AI, and recommendation systems
- Evaluation of retrieved cross-modal samples, without relying on augmented generation
- Multi-aspect evaluation methods capturing inter- and intra-modal coherence, relevance, grounding, and contextual consistency
- Benchmark retrieval datasets, evaluation protocols, and annotations for text–image–audio–video–3D generation
- Automatic and human-centric metrics for informativeness, factuality, fluency, faithfulness, calibration, and usability in multimodal generation
- Methodology for detecting, analysing, and mitigating multimodal bias, stereotypes, toxicity, and hallucinations
- Evaluation in low-resource and multilingual multimodal settings, including culturally aware and cross-lingual metrics
- Agent-based evaluation of multimodal generation in multi-turn, tool-use, or iterative editing scenarios
- Game-theoretic or optimisation-based formulations of evaluation objectives and protocols
- Evaluation of the generation quality of synthetic multimodal data, its provenance/attribution, and its downstream impact on training and deployment
- Ethical considerations in the evaluation of multimodal text generation, including bias detection and mitigation strategies
- Evaluation of security and privacy dimensions in multimodal applications
Invited Speakers
Mark Sanderson
Talk Title: TBD
Mark Sanderson is Professor of Information Retrieval at RMIT University, where he is Dean of Research for the STEM College. He received his Ph.D. in Computer Science from the University of Glasgow, United Kingdom, in 1997. Mark was the first researcher to show the value of snippets, a component of search interfaces that is now a standard feature of all search engines. While a faculty member at the Sheffield Information School, Mark co-founded, in 2003, the annual ImageCLEF evaluation campaign, which continues to run today. The event has created over 60 research evaluation tasks for the image retrieval and image processing community, involving over 500 international research groups. Mark was general chair of ACM SIGIR 2004, and PC chair of ACM SIGIR 2009 and 2012 and of ACM CIKM 2017. He was inducted into the ACM SIGIR Academy in 2024. His work in information retrieval, data analysis, and recommender systems has attracted over 15,000 Google Scholar citations.
Jing Jiang
Talk Title: TBD
Jing Jiang is a Professor in the School of Computing at the Australian National University. Prior to joining the ANU, she was a Professor of Computer Science at Singapore Management University and Director of the AI and Data Science Cluster of the School of Computing and Information Systems at SMU. Jing's research focuses on the applied side of natural language processing. She has worked on a broad range of topics in NLP, including information extraction, topic modelling, sentiment analysis, social media analysis, question answering, and the evaluation of vision-language models. She served as a program co-chair for the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2019 and as an action editor for the Transactions of the Association for Computational Linguistics (TACL), and currently serves as one of the Editors-in-Chief of ACL Rolling Review (ARR). Jing received her PhD in Computer Science from the University of Illinois at Urbana-Champaign and her B.S. and M.S. degrees from Stanford University. She was named one of the Singapore 100 Women in Tech in 2021.
Paul Thomas
Talk Title: TBD
Paul Thomas is a senior applied scientist at Microsoft, where he focuses on measurement for Bing and other products. His research interests include information retrieval, particularly how people use web search systems and how these systems should be evaluated. He has co-authored numerous publications across a range of research topics and has hosted guest seminars on subjects such as using language models for relevance labelling, which is central to evaluating the quality of search engines. His work has been widely cited, reflecting his significant impact in the field of information retrieval.
Javen Shi
Talk Title: TBD
Professor Javen Qinfeng Shi is the Founding Director of the Causal AI Group and one of the directors of the Australian Institute for Machine Learning (AIML). His research spans causation, AI, mind, and metaphysics. Globally ranked 6th in probabilistic graphical models and 4th in causation by Google Scholar, he has contributed to industries including material discovery, agriculture, mining, sport, manufacturing, bushfire management, health, and education. His awards include the ACM SIGIR 2025 Test of Time Award; first place in the Open Catalyst Challenge on AI-driven material discovery at NeurIPS AI for Science 2023; victory in the AUS/NZ Bushfire Data Quest 2020; finalist recognition in the SA Department of Energy and Mining's Gawler Challenge 2020; 2nd place in the global OZ Minerals Explorer Challenge 2019 (with 1,000+ participants from 62 countries); and the Golden Prize (1st place) from Volkswagen in 2019 for AI-powered digital factory innovation.
Submission Instructions
We welcome both ARR paper commitments and direct paper submissions. More details on ARR paper commitment will be announced soon. For direct submission, you are invited to submit your paper via our OpenReview portal. Papers are required to strictly follow the SIGIR submission guidelines. We invite both long paper (9 pages) and short paper (4 pages) submissions. All submitted papers must be anonymized for double-blind review. All accepted papers must be presented in person at the workshop.
Important Dates
- Mar 25, 2026: Submission Open
- May 2, 2026: Workshop Paper Direct Submission Deadline
- May 27, 2026: ARR Commitment Deadline
- June 2, 2026: Workshop Paper Notification
- July 24, 2026: Workshop Day
Note: All deadlines are 11:59 PM UTC-12:00 ("Anywhere on Earth").
Organisers
- Wei Emma Zhang, Adelaide University
- Xiang Dai, CSIRO
- Sarvnaz Karimi, CSIRO
- Desmond Elliott, University of Copenhagen
- Byron Fang, Oracle
- Mong Yuan Sim, Adelaide University & CSIRO