Add support for fully local tracing via InSPyReNet, a background-removal model: no API key, no network access, no cost. It should be the default backend when no `GOOGLE_API_KEY` is set.
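The default-backend rule could be as simple as the following sketch (the function name `pick_tracing_backend` is hypothetical, not an existing API in this repo):

```python
import os

def pick_tracing_backend(env=os.environ):
    """Illustrative backend choice: Gemini when an API key is present,
    otherwise the fully local InSPyReNet path."""
    if env.get("GOOGLE_API_KEY"):
        return "gemini"
    return "inspyrenet"
```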
## Why InSPyReNet
I evaluated several local approaches against 110 Gemini-traced ground truth images:
| Approach | % of images with IoU > 0.9 | % of images with IoU > 0.7 | Median IoU | Speed |
|---|---|---|---|---|
| InSPyReNet | 69% | 72% | 0.949 | 0.7s |
| ISNet (rembg) | 62% | 75% | 0.934 | 0.9s |
| SAM2-large (ultralytics) | 39% | 45% | 0.588 | 1.8s |
| U2Net (rembg) | 20% | 60% | 0.873 | 0.4s |
| FastSAM | 7% | 30% | 0.380 | 0.5s |
| Ollama VLMs | N/A | N/A | N/A | 90s+ |
InSPyReNet won head-to-head against ISNet (the runner-up) 31 to 7 across 110 images, with 72 ties.
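For reference, the IoU metric used above compares a candidate mask against the Gemini-traced ground truth; on flattened binary masks it reduces to:

```python
def mask_iou(a, b):
    """Intersection-over-union of two binary masks, given as flat 0/1
    sequences of equal length. 1.0 = identical, 0.0 = no overlap."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0
```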
Why not SAM2? SAM2-large produces excellent masks (0.92+ IoU) when it works, but it's bimodal -- either near-perfect or complete failure, depending on prompting. Its `_C` post-processing extension also doesn't compile on Apple Silicon.
Why not Ollama VLMs? Vision models (qwen3.5:35b, qwen3-vl:8b) can identify tools but can't generate mask images. I tried extracting polygon vertices as structured JSON instead -- the models return plausible-looking coordinates but they don't map to actual pixel positions. VLMs lack the spatial precision needed for contour tracing. See #11.
## How it should work
InSPyReNet is a salient object detection model trained to separate foreground from background. It outputs a foreground mask that can feed into the existing `_trace_mask()` pipeline -- the same OpenCV contour extraction, smoothing, and polygon output used for Gemini.
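A rough sketch of the wiring, assuming InSPyReNet's published `transparent-background` pip package (its `Remover` class and `process(..., type='map')` saliency output); the threshold value and the exact `_trace_mask()` signature are assumptions here:

```python
def binarize_saliency(saliency_rows, thresh=128):
    """Turn an 8-bit saliency map (rows of pixel values) into a 0/255
    mask ready for OpenCV contour extraction."""
    return [[255 if px >= thresh else 0 for px in row] for row in saliency_rows]

def local_trace(image_path):
    """Hypothetical wiring: InSPyReNet via `transparent-background`,
    feeding the existing _trace_mask() OpenCV pipeline."""
    from PIL import Image
    from transparent_background import Remover  # pip install transparent-background
    remover = Remover(mode="base")              # ~80MB weights fetched on first use
    img = Image.open(image_path).convert("RGB")
    saliency = remover.process(img, type="map") # grayscale foreground map
    # Same thresholding as binarize_saliency(), applied per pixel:
    mask = saliency.convert("L").point(lambda px: 255 if px >= 128 else 0)
    return _trace_mask(mask)                    # same contour/smoothing path as Gemini
```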
Some notes on implementation and testing:
- ~80MB model weights, downloaded automatically on first trace
- Runs on Apple Silicon (MPS) or CPU via PyTorch
- Sub-second inference on Apple Silicon, ~2-3s on CPU
- No prompting, no configuration, no mask selection heuristics
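The MPS/CPU note amounts to a small device-selection step; in real code the flags below would come from `torch.backends.mps.is_available()` and `torch.cuda.is_available()`, factored out here so the logic is plain:

```python
def select_device(mps_available: bool, cuda_available: bool = False) -> str:
    """Illustrative device choice: prefer Apple Silicon's MPS backend,
    then CUDA if present, else fall back to CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```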
## When to use Gemini instead
InSPyReNet struggles slightly with highly reflective/metallic tools, poor lighting, and images where the perspective correction leaves visible table edges. Gemini handles these better because it can reason about the scene rather than just separating foreground from background. Users may need to crop their photos more tightly so the background stays plain, showing only the tool being traced and the paper it sits on.