vggt-mps

Port of Facebook Research's VGGT (Visual Geometry Grounded Transformer) to Apple Silicon via PyTorch's MPS backend. Takes single or multi-view images and produces depth maps, camera poses, and 3D point clouds.

Version 2.0.0
Python 3.10+
Platform macOS 13+ on Apple Silicon (M1/M2/M3)
Model facebook/VGGT-1B (1B params, ~5 GB on disk)
License MIT
PyPI Not yet published

What it produces

Given N input images, VGGT predicts:

  • Depth maps: per-pixel depth estimation
  • Camera poses: 6-DOF camera parameters for each view
  • 3D point clouds: dense reconstruction (exportable as PLY, OBJ, GLB)
  • Confidence maps: per-pixel reliability scores
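The point clouds are exportable as PLY, among other formats. As an illustration of what that export amounts to, here is a minimal ASCII PLY writer; `write_ply` is a hypothetical helper for this sketch, not the repo's exporter (which lives under utils/ and likely also handles colors and binary encoding):

```python
def write_ply(path, points):
    """Write an Nx3 list of (x, y, z) points as an ASCII PLY file."""
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",  # vertex count declared up front
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ])
    body = "\n".join(f"{x} {y} {z}" for x, y, z in points)
    with open(path, "w") as f:
        f.write(header + "\n" + body + "\n")

# Two-point toy cloud; real output would hold one point per confident pixel.
write_ply("cloud.ply", [(0.0, 0.0, 1.0), (0.5, 0.5, 2.0)])
```

Any standard viewer (MeshLab, CloudCompare) can open the resulting file.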

Architecture

The upstream VGGT model is a 1B-parameter transformer trained on multi-view geometry tasks. This repo wraps it with:

  • MPS device detection and dtype handling (float32 for Metal compatibility)
  • A sparse attention module (vggt_sparse_attention.py) that patches the model at runtime for O(n) memory scaling instead of O(n^2)
  • A unified CLI (vggt command with subcommands)
  • A Gradio web interface
  • An MCP server for Claude Desktop integration
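The device/dtype policy described above (MPS with float32, CPU fallback) can be sketched as a small helper. `pick_device` is hypothetical and mirrors the behavior this README describes, not the repo's actual API; in real code the availability flag would come from `torch.backends.mps.is_available()`, injected here so the sketch runs anywhere:

```python
def pick_device(mps_available: bool) -> tuple[str, str]:
    """Choose device and dtype for VGGT inference on macOS.

    float32 is used on MPS because float16 autocast is not
    supported for this model on Metal (per the repo's notes).
    """
    if mps_available:
        return ("mps", "float32")
    return ("cpu", "float32")  # slow but correct fallback

print(pick_device(True))   # -> ('mps', 'float32')
```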
vggt-mps/
  src/
    vggt_core.py                # Core VGGT processing
    vggt_sparse_attention.py    # Runtime sparse attention patch
    config.py                   # Centralized configuration
    visualization.py            # 3D visualization
    commands/                   # CLI subcommands (demo, reconstruct, test, benchmark, web)
    utils/                      # Model loader, image utils, export
  tests/                        # MPS, sparse attention, integration tests
  repo/vggt/                    # Vendored upstream VGGT source

Sparse attention

The sparse attention module replaces the standard O(n^2) cross-view attention with a covisibility-masked variant. No retraining is required; the loaded model is patched at runtime.

  • 100 images: O(10K) standard vs. O(1K) sparse (10x reduction)
  • 500 images: O(250K) standard vs. O(5K) sparse (50x reduction)
  • 1000 images: O(1M) standard vs. O(10K) sparse (100x reduction)

Output difference vs. standard attention is reported as 0.000000 in tests, i.e. the sparse variant is numerically identical within float32 precision.
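The numbers above are asymptotic pair counts, not bytes: standard attention relates every image to every other (n * n pairs), while a covisibility mask limits each image to a fixed number of neighbours (n * k pairs). A quick back-of-envelope check, with k = 10 chosen purely to reproduce the table (the repo's actual neighbour count is not stated here):

```python
def attention_pair_counts(n_images: int, covisible: int = 10):
    """Return (standard pairs, sparse pairs, reduction factor)."""
    standard = n_images * n_images      # all-pairs cross-view attention
    sparse = n_images * covisible       # covisibility-masked attention
    return standard, sparse, standard // sparse

for n in (100, 500, 1000):
    std, sp, ratio = attention_pair_counts(n)
    print(n, std, sp, f"{ratio}x")
# 100 10000 1000 10x
# 500 250000 5000 50x
# 1000 1000000 10000 100x
```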

Requirements

  • Apple Silicon Mac (M1, M2, or M3)
  • 8 GB+ RAM
  • 6 GB disk for model weights
  • Python 3.10+

Install

From source (recommended)

git clone https://github.com/jmanhype/vggt-mps.git
cd vggt-mps
pip install -e .

Or with uv:

make install   # uses uv pip install -e .

Download model weights

vggt download
# or: python main.py download

The model downloads from Hugging Face (~5 GB).

Usage

CLI

vggt demo                              # run with sample images
vggt demo --kitchen --images 4         # kitchen dataset, 4 views
vggt reconstruct data/*.jpg            # your own images
vggt reconstruct --sparse data/*.jpg   # sparse attention for large sets
vggt reconstruct --export ply data/*.jpg
vggt web                               # launch Gradio UI
vggt web --port 8080 --share           # public link
vggt test --suite all                  # run test suite
vggt benchmark --compare               # performance comparison

Python

from src.vggt_sparse_attention import make_vggt_sparse

# Patch any loaded VGGT model for sparse attention
sparse_model = make_vggt_sparse(model, device="mps")
output = sparse_model(images)
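"Patching at runtime" here means swapping attention forward functions on the already-loaded model rather than retraining it. The general pattern, sketched on a toy class (`ToyModel`, `ToyAttention`, and `make_sparse` are hypothetical stand-ins; the real `make_vggt_sparse` operates on VGGT's transformer blocks):

```python
class ToyAttention:
    def forward(self, x):
        return ("dense", x)

class ToyModel:
    def __init__(self):
        self.attn = ToyAttention()

def make_sparse(model):
    """Replace the attention forward with a masked variant in place."""
    original = model.attn.forward          # keep the dense implementation
    def sparse_forward(x):
        _, out = original(x)               # stand-in for covisibility masking
        return ("sparse", out)
    model.attn.forward = sparse_forward    # monkey-patch the live model
    return model

m = make_sparse(ToyModel())
print(m.attn.forward(3))  # -> ('sparse', 3)
```

Because the patch wraps the original forward, the model's weights are untouched, which is why no retraining is needed.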

MCP server (Claude Desktop)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "vggt-agent": {
      "command": "uv",
      "args": [
        "run", "--python", "/path/to/vggt-mps/vggt-env/bin/python",
        "--with", "fastmcp", "fastmcp", "run",
        "/path/to/vggt-mps/src/vggt_mps_mcp.py"
      ]
    }
  }
}

Available MCP tools: vggt_quick_start_inference, vggt_extract_video_frames, vggt_process_images, vggt_create_3d_scene, vggt_reconstruct_3d_scene, vggt_visualize_reconstruction.

Dependencies

  • torch >= 2.0.0: computation backend (MPS)
  • torchvision >= 0.15.0: image transforms
  • einops >= 0.6.1: tensor reshaping
  • transformers >= 4.30.0: model loading
  • huggingface-hub >= 0.16.0: weight download
  • timm >= 0.9.0: vision model components
  • opencv-python >= 4.7.0: image I/O
  • gradio >= 3.40.0: web interface (optional)
  • fastmcp >= 0.1.0: MCP server (optional)

Limitations

  • Runs on Apple Silicon only. No CUDA path in this repo (use upstream VGGT for that).
  • Uses float32 exclusively; MPS does not support float16 autocast for this model.
  • The vggt download command pulls ~5 GB over the network with no resume support.
  • Not published to PyPI yet. Install from source.
  • Sparse attention memory numbers in the table above are asymptotic ratios, not measured byte counts.
  • The vendored repo/vggt/ tree is a snapshot and may drift from upstream.

Contributing

Development happens on the develop branch. See CONTRIBUTING.md.

License

MIT