iSearch++: An Augmented State-of-the-Art Information Retrieval Test Collection for Integrated Academic Search
Note
This repository is a landing page that collects curated resources:
- Zenodo archive with updated test collection resources: https://zenodo.org/
- General statistics about licenses in iSearch: https://github.com/irgroup/isearch-statistics
- ir_datasets integration: https://github.com/breuert/ir_datasets
Abstract:
The iSearch test collection remains a unique resource for evaluating information access systems such as academic search engines. Built following the Cranfield evaluation paradigm, it combines arXiv full texts and metadata with detailed descriptions of users’ information needs across different expertise levels. Although the collection is now over 15 years old and relatively small by modern standards (~160,000 documents), its structured relevance assessments make it an ideal foundation for evaluating contemporary systems.
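Under the Cranfield paradigm, a system is scored by comparing its ranked output for each topic against a fixed set of relevance assessments. The sketch below illustrates that mechanism with nDCG@k, one common metric for graded relevance. The function names, the toy qrels and run, and the choice of linear gains are illustrative assumptions, not the official iSearch evaluation protocol.

```python
import math

def dcg(gains):
    # Discounted cumulative gain with linear gains: the item at 1-based rank r
    # contributes gain / log2(r + 1); enumerate is 0-based, hence i + 2.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(topic_qrels, ranking, k=10):
    """nDCG@k for one topic.

    topic_qrels: dict doc_id -> graded relevance (unjudged docs count as 0)
    ranking: doc_ids in the order the system returned them
    """
    gains = [topic_qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    # Ideal ranking: all judged documents sorted by decreasing relevance.
    ideal_dcg = dcg(sorted(topic_qrels.values(), reverse=True)[:k])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def mean_ndcg(qrels, run, k=10):
    """Average nDCG@k over every assessed topic; topics missing from the run score 0."""
    scores = [ndcg_at_k(qrels[t], run.get(t, []), k) for t in qrels]
    return sum(scores) / len(scores)

# Toy data (hypothetical topic and document IDs, not taken from iSearch).
qrels = {"t1": {"d1": 3, "d2": 1, "d3": 0}, "t2": {"d4": 2}}
run = {"t1": ["d2", "d1", "d5"], "t2": ["d4", "d6"]}
print(round(mean_ndcg(qrels, run), 3))  # prints 0.898
```

In practice, established tools such as trec_eval compute these measures from standard qrels and run files; the hand-rolled version only makes the mechanics visible.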
The iSearch++ project aims to modernize this dataset by improving full-text extraction (e.g., table extraction), re-evaluating relevance using LLM-as-a-Judge methods, integrating the collection into the ir_datasets framework, and aligning it with FAIR principles. Within the NFDIxCS context, iSearch++ demonstrates how legacy research datasets can be updated to meet current technical and accessibility standards while preserving their original research value.
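One way to check LLM-as-a-Judge labels against the original human assessments is a chance-corrected agreement statistic such as Cohen's kappa. The following is a minimal sketch, assuming both sets of judgments are stored as dicts keyed by hypothetical `(topic_id, doc_id)` pairs; the data are toy values, and kappa is shown only as an example, not necessarily the agreement measure iSearch++ uses.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined here, so treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)

def judge_agreement(human_qrels, llm_qrels):
    """Kappa over the (topic_id, doc_id) pairs judged by both sources."""
    shared = sorted(set(human_qrels) & set(llm_qrels))
    return cohens_kappa([human_qrels[p] for p in shared],
                        [llm_qrels[p] for p in shared])

# Toy graded labels (hypothetical IDs and values, not iSearch data).
human = {("t1", "d1"): 3, ("t1", "d2"): 1, ("t1", "d3"): 0, ("t2", "d4"): 2}
llm = {("t1", "d1"): 3, ("t1", "d2"): 0, ("t1", "d3"): 0, ("t2", "d4"): 2}
print(round(judge_agreement(human, llm), 3))  # prints 0.667
```

Plain Cohen's kappa treats relevance grades as unordered categories, so a 1-versus-0 disagreement counts the same as a 3-versus-0 one; a weighted kappa or Krippendorff's alpha would give partial credit to near-misses on a graded scale.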