
VisSparse – Skip 90% of image tokens without losing accuracy

Made autonomously using NEO · Install NEO Extension

Python 3.8+ License: MIT Tests

Benchmark Vision-Language model efficiency by dynamically masking image tokens during inference.

Install

git clone https://github.com/dakshjain-1616/vissparse
cd vissparse
pip install -r requirements.txt

Quickstart

Run a full VQA accuracy-vs-latency sweep in mock mode (no GPU or model download needed):

python run_sparse_vqa.py --num-samples 100 --keep-ratio 0.1 --mode mock
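To compare several sparsity levels in one run, you can loop over keep ratios with a small driver script. This is only a convenience sketch built on the CLI flags shown above; the script name is illustrative:

# sweep_keep_ratios.py — run the mock-mode benchmark at several keep ratios.
# Uses only the run_sparse_vqa.py flags shown above.
import subprocess

for keep_ratio in (0.05, 0.1, 0.25, 0.5):
    subprocess.run(
        [
            "python", "run_sparse_vqa.py",
            "--num-samples", "100",
            "--keep-ratio", str(keep_ratio),
            "--mode", "mock",
        ],
        check=True,  # abort the sweep if any run fails
    )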

Or use the library directly in your code:

from vissparse.token_selector import TokenSelector
from vissparse.sparse_attention import SparseAttentionMask

# image_tokens: image-patch embeddings produced by your vision encoder
selector = TokenSelector(similarity_threshold=0.5)
mask = SparseAttentionMask.generate(selector, image_tokens, keep_ratio=0.1)
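The snippet above leaves the shape of mask unspecified. If it is a boolean keep-mask over the image-token axis (an assumption, not documented API), pruning the token sequence before it reaches the language model looks roughly like this:

import torch

# Sketch only: assumes `mask` is a bool tensor of shape (num_image_tokens,)
# and `image_tokens` has shape (num_image_tokens, hidden_dim).
kept_tokens = image_tokens[mask]  # drop the masked-out tokens
print(f"kept {kept_tokens.shape[0]} of {image_tokens.shape[0]} image tokens")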

Key features

  • Dynamic Token Selection: Cosine-similarity-based selector that identifies informative image tokens at inference time (see the sketch after this list).
  • Sparse Attention Masking: Custom mask generator compatible with standard VLM architectures like Qwen2-VL.
  • VQA Benchmarking: Evaluate accuracy vs. tokens skipped on VQA v2 datasets with automated CSV reporting.
  • Mock Mode: Test logic and metrics without downloading heavy vision-language models or requiring a GPU.
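To make the first bullet concrete, here is a minimal PyTorch sketch of cosine-similarity token selection: score each image token against a reference embedding (the mean token, purely as an assumption) and keep the keep_ratio fraction of tokens least redundant with it. The actual TokenSelector may use a different reference, threshold, or scoring rule:

import torch
import torch.nn.functional as F

def select_tokens(image_tokens: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    # Return a boolean keep-mask over image tokens (illustrative sketch only).
    reference = image_tokens.mean(dim=0, keepdim=True)                 # (1, dim)
    similarity = F.cosine_similarity(image_tokens, reference, dim=-1)  # (N,)
    num_keep = max(1, int(keep_ratio * image_tokens.shape[0]))
    keep_idx = similarity.argsort()[:num_keep]   # least redundant tokens first
    mask = torch.zeros(image_tokens.shape[0], dtype=torch.bool)
    mask[keep_idx] = True
    return mask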

Run tests

pytest tests/ -q
# 91 passed

Project structure

vissparse/
├── vissparse/      ← main library (token_selector, sparse_attention, metrics)
├── tests/          ← test suite (integration, metrics, attention, selector)
├── scripts/        ← demo scripts (demo.py)
├── run_sparse_vqa.py ← main CLI entry point
├── conftest.py     ← pytest configuration
└── requirements.txt

About

Benchmarks Vision-Language model efficiency by dynamically masking image tokens during inference. Skip up to 90% of visual tokens with minimal accuracy loss, with per-layer sparsity heatmaps and throughput charts.
