Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
[ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
[CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models"
General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
On Path to Multimodal Generalist: General-Level and General-Bench
The official code and data for paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"
The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"
[ICASSP'26] PictOBI-20k: A dataset designed to evaluate large multimodal models on the visual decipherment tasks of pictographic OBCs
A Multimodal Benchmark for Evaluating Scientific Reasoning Capabilities of VLMs
Official implementation of "Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders" (ICLR 2026)
Official implementation of EgoPrivacy (ICML 2025)
Official codebase for the ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"
A high-performance highly-customizable reverse OCR tool that renders text or huggingface-compatible datasets to images. Dimension, DPI, CSS configurable!
Multi-modal and Vision Language Model Spatial Reasoning Benchmark
Official repository of the paper "FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis"
Benchmark for evaluating MLLMs as judges of vision-task outputs across intrinsic and tool-mediated settings
(WACV 2026) The Perceptual Observatory: Characterizing robustness and grounding in multimodal LLMs (MLLMs)