Scientific Image Quality Assessment Challenge (SIQA)

ICME 2026 · Advancing AI’s Ability to Evaluate Scientific Integrity in Visual Data

Introduction

Multimodal large language models (MLLMs) are revolutionizing image quality assessment (IQA) by bringing semantic understanding and world knowledge into evaluation. Yet nearly all existing IQA benchmarks ignore scientific imagery, a cornerstone of research, education, and discovery.

Unlike general photos, where quality is judged by blur or noise, scientific images must also be correct, complete, clear, and conventional. A high-quality scientific visualization must: accurately reflect scientific facts (Validity); include all necessary labels, scales, and context (Completeness); be instantly interpretable by experts (Clarity); and follow field-specific norms in style and notation (Conformity).

To address this gap, the SIQA Challenge introduces two tracks (SIQA-U & SIQA-S) that push models beyond pixels—to evaluate scientific integrity through visual reasoning and domain-aware judgment.

Task Example

Challenge Tracks

The SIQA Challenge consists of two independent tracks. Participants may compete in either one or in both, and each track has its own set of awards.

Baseline Results
| Model | Perception SRCC | Perception PLCC | Knowledge SRCC | Knowledge PLCC | Mean SRCC | Mean PLCC |
|---|---|---|---|---|---|---|
| NIQE (zero-shot) | 0.345 | 0.235 | 0.447 | 0.410 | 0.396 | 0.322 |
| Q-Align (zero-shot) | 0.749 | 0.762 | 0.285 | 0.400 | 0.517 | 0.581 |
| CLIP-IQA (zero-shot) | 0.496 | 0.520 | 0.362 | 0.435 | 0.429 | 0.478 |
| CLIP-IQA+ (trained) | 0.724 | 0.676 | 0.862 | 0.801 | 0.793 | 0.741 |
| HyperIQA (trained) | 0.773 | 0.783 | 0.897 | 0.895 | 0.835 | 0.839 |
| InternVL3.5 (fine-tuned) | 0.857 | 0.881 | 0.915 | 0.937 | 0.886 | 0.909 |

Note: While MLLMs are not mandatory for this challenge, baseline experiments across diverse architectures show that fine-tuned MLLMs achieve superior performance on SIQA-S.

🔍 Challenge Track Details

Track 1: SIQA Understanding (SIQA-U)

Evaluates a model’s ability to reason about scientific image quality through structured visual question answering aligned with the four SIQA dimensions.

Question Types:

  • Yes/No: Binary verification of factual, structural, or representational conditions.
  • What: Multiple-choice comprehension of scientific entities, relationships, or context.
  • How: Quality judgment on completeness, clarity, and disciplinary conventions.

This track measures factual verification, semantic understanding, and scientific reasoning; an illustrative item format is sketched below.
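For illustration only, the snippet below sketches one item of each question type. The field names, questions, and answer formats are assumptions made for readability, not the official data schema.

```python
# Hypothetical SIQA-U items, one per question type. Field names, questions,
# and answer formats are illustrative assumptions, not the official schema.
example_items = [
    {"image": "figure_001.png", "type": "yes/no",
     "question": "Does the y-axis include a unit label?",
     "answer": "No"},
    {"image": "figure_001.png", "type": "what",
     "question": "What quantity does the x-axis represent?",
     "choices": ["Time", "Temperature", "Pressure", "Wavelength"],
     "answer": "Time"},
    {"image": "figure_001.png", "type": "how",
     "question": "How well does the figure follow disciplinary plotting conventions?",
     "answer": "Partially"},
]
```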

Evaluation Metric:
Final Score₁ = 0.2 × ACC(Yes/No) + 0.3 × ACC(What) + 0.5 × ACC(How)
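A minimal sketch of how this weighted accuracy could be computed, assuming exact-match accuracy per question type; the official evaluation script may differ in detail.

```python
# Sketch of the SIQA-U final score (assumed implementation; the official
# evaluation script may differ). Each argument pairs predictions with
# ground-truth answers for one question type.
def accuracy(preds, gts):
    """Fraction of exactly matching answers."""
    return sum(p == g for p, g in zip(preds, gts)) / len(gts)

def final_score_u(yes_no, what, how):
    """Final Score_1 = 0.2*ACC(Yes/No) + 0.3*ACC(What) + 0.5*ACC(How)."""
    return (0.2 * accuracy(*yes_no)
            + 0.3 * accuracy(*what)
            + 0.5 * accuracy(*how))

# Illustrative dummy example.
score = final_score_u(
    yes_no=(["Yes", "No", "Yes"], ["Yes", "No", "No"]),
    what=(["B", "C"], ["B", "A"]),
    how=(["Good", "Fair"], ["Good", "Fair"]),
)
print(round(score, 3))  # 0.2*(2/3) + 0.3*(1/2) + 0.5*(2/2) ≈ 0.783
```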

Track 2: SIQA Scoring (SIQA-S)

Predicts continuous quality scores along two complementary dimensions:

  • Knowledge-Driven: Factual correctness (validity + completeness) — does the image convey accurate science?
  • Perception-Driven: Human-expert judgments on clarity and discipline-specific usability (clarity + conformity).

Models predict scores directly from images—no text input required.
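As one possible starting point, the sketch below shows a simple no-reference scorer that maps an image directly to two continuous scores, one per dimension. The backbone and regression head are assumptions, not a provided baseline.

```python
# Minimal no-reference scorer sketch (an assumption, not an official baseline):
# maps an image batch to two continuous scores (knowledge, perception).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SIQAScorer(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)   # any visual backbone could be used
        backbone.fc = nn.Identity()         # expose the 512-d pooled features
        self.backbone = backbone
        self.head = nn.Linear(512, 2)       # [knowledge score, perception score]

    def forward(self, images):              # images: (B, 3, H, W)
        return self.head(self.backbone(images))

scores = SIQAScorer()(torch.randn(4, 3, 224, 224))
print(scores.shape)  # torch.Size([4, 2])
```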

Evaluation Metric:
For each dimension d ∈ {Factual, Perceptual}:
Score(d) = max( (SRCC(d) + PLCC(d)) / 2 , 0 ) × 100
Final Score₂ = (Score(Factual) + Score(Perceptual)) / 2
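A minimal sketch of this protocol using SciPy's rank and linear correlation functions; the official evaluation script may differ in detail (for example, in how scores are normalized before correlation).

```python
# Sketch of the SIQA-S scoring protocol (assumed implementation; the official
# evaluation script may differ in detail).
import numpy as np
from scipy.stats import pearsonr, spearmanr

def dimension_score(pred, gt):
    """Score(d) = max((SRCC(d) + PLCC(d)) / 2, 0) * 100 for one dimension."""
    srcc, _ = spearmanr(pred, gt)
    plcc, _ = pearsonr(pred, gt)
    return max((srcc + plcc) / 2.0, 0.0) * 100.0

def final_score_s(pred_factual, gt_factual, pred_perceptual, gt_perceptual):
    """Final Score_2: mean of the factual and perceptual dimension scores."""
    return (dimension_score(pred_factual, gt_factual)
            + dimension_score(pred_perceptual, gt_perceptual)) / 2.0

# Illustrative dummy example with synthetic scores.
rng = np.random.default_rng(0)
gt_f, gt_p = rng.random(50), rng.random(50)
pred_f = gt_f + 0.05 * rng.standard_normal(50)
pred_p = gt_p + 0.05 * rng.standard_normal(50)
print(round(final_score_s(pred_f, gt_f, pred_p, gt_p), 2))
```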

The SIQA dataset is built around two complementary tracks: SIQA-U for structured reasoning and SIQA-S for continuous quality scoring. To help participants better understand the challenge design and data structure, we provide a visual overview of the dataset below.

SIQA Dataset Overview

Timeline

February 7, 2026
Registration opens at Form; confirmation emails are delivered via [email protected]. Training dataset released at Hugging Face Train.
March 10, 2026
Validation dataset released at Hugging Face ValidSet. Submission Phase Begins: Please submit your result files via email to [email protected] following the official format guidelines.

Note: To ensure fairness, all submissions are evaluated locally by the committee. The leaderboard is updated every Wednesday.
April 1, 2026
Registration deadline
April 25, 2026
Code & model submission deadline. Code submission is optional; if you wish, submit your code via GitHub.
May 4, 2026
Paper submission deadline (submit via the official email).
May 15, 2026
Final results announced (to be posted on the results page)

Top-performing teams will be invited to extend their solutions into full papers for publication in the journal Displays.

Organizers

Wenzhe Li
Shanghai Artificial Intelligence Laboratory
Liang Chen
Shanghai Artificial Intelligence Laboratory
Junying Wang
Shanghai Artificial Intelligence Laboratory
Farong Wen
Shanghai Artificial Intelligence Laboratory
Yijing Guo
Shanghai Artificial Intelligence Laboratory
Ye Shen
Shanghai Artificial Intelligence Laboratory
Qihang Yan
Shanghai Artificial Intelligence Laboratory

In partnership with universities and open-data initiatives worldwide.

Advisory Committee

Zicheng Zhang*
Shanghai Artificial Intelligence Laboratory
Chunyi Li
Shanghai Artificial Intelligence Laboratory
Wenlong Zhang
Shanghai Artificial Intelligence Laboratory
Wei Zhou
Cardiff University
Xiaohong Liu
Shanghai Jiao Tong University
Xiongkuo Min
Shanghai Jiao Tong University
Guangtao Zhai
Shanghai Artificial Intelligence Laboratory

* denotes the project lead