tpeople - High-Scale Local Image Analysis System

A high-performance biometric recognition and semantic search system for 300k+ images, using LanceDB, CLIP/SigLIP embeddings, and ArcFace facial recognition.

🏗️ Project Structure

tpeople/                      # Repository (T:\tpeople)
├── server/                   # FastAPI/MCP Server
│   ├── lab_server1.py        # Hybrid FastAPI/MCP server
│   ├── lab_server1_core.py   # Shared business logic
│   ├── lab_server1_db.py     # Database config & management
│   └── lab_server1_schema.py # LanceDB schemas
│
├── builders/                 # Data Pipeline Builders
│   ├── import_image_semantic.py  # Image semantic (CLIP/SigLIP)
│   ├── import_image_face.py      # Face detection (YOLO+ArcFace)
│   ├── import_text_semantic.py   # Text semantic (BGE embeddings)
│   ├── import_text_fts.py        # Text FTS5 (SQLite full-text)
│   ├── builder_utils1.py     # Shared builder utilities
│   └── lab_logging.py        # Structured logging
│
├── explorers/                # Interactive UI Tools
│   ├── clusterexplorer.py    # Face clustering & identity
│   ├── imageexplorer1.py     # Image search UI
│   └── textexplorer1.py      # Text search UI
│
├── pipelines/                # Core Pipeline Logic
│   ├── semantic_image.py     # CLIP/SigLIP pipeline
│   ├── face_image.py         # YOLO + ArcFace pipeline
│   ├── text_semantic.py      # BGE embedding pipeline
│   ├── text_fts.py           # FTS5 indexing pipeline
│   ├── lab_utils.py          # Image utilities
│   └── lab_file_scout.py     # File discovery
│
├── server_client_samples/    # Client Examples
│   ├── lab_client1_simple.py
│   └── lab_client1_simple.cs
│
├── testers/                  # Testing & Utilities
│   ├── lab_bigtext_generator.py # BIGTEST corpus generator
│   └── lab_texttester.py     # Text search tester
│
├── utils/                    # Utility Scripts
│   ├── UtilDedupe.py         # Duplicate cleanup
│   └── utilHtmlTxt.py        # HTML text extraction
│
├── research_lab/             # Experimental Tools
│   └── lab_describe_snapshot.py # UI analysis (Moondream2)
│
├── oldTools/                 # Legacy Diagnostic Tools
│   ├── lab_check_indices.py
│   ├── lab_diagnose_schema.py
│   └── lab_migrate_cluster_summary.py
│
├── test_data/                # Test Datasets (Source for auto-init)
│   ├── text/
│   │   ├── stories/          # Fairy tale corpus
│   │   └── cinderella.txt    # Needle-in-haystack test
│   └── images/
│       └── sg9.jpg           # Needle-in-haystack image
│
└── .github/
    ├── TODO.md               # Project roadmap
    └── copilot-instructions.md # AI coding guidelines

tserverData/                  # Outside Repo (T:\tserverData)
├── test/                     # Test Environment
│   ├── database/             # Test database (auto-created)
│   ├── images/               # Test images (auto-populated)
│   └── text/                 # Test documents (auto-populated)
│
├── ref/                      # Reference Environment
│   ├── database/             # Reference database
│   ├── images/               # Reference images
│   └── text/                 # Reference documents
│
└── bigtest/                  # BIGTEST Environment
    ├── database/             # BIGTEST database
    ├── images/               # BIGTEST images
    └── text/                 # BIGTEST documents

🚀 Quick Start

1. Start the Server (Hybrid MCP + HTTP)

python server/lab_server1.py --test
# Dashboard: http://127.0.0.1:8000/docs
# MCP: http://127.0.0.1:8000/mcp
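
Once the server is running, a quick reachability check can confirm that the dashboard is being served. The following is a minimal sketch using only the standard library. It assumes FastAPI's default behavior of serving the OpenAPI document at /openapi.json alongside /docs, which this README does not state explicitly. The helper names `endpoint` and `server_is_up` are hypothetical and not part of the repository.

```python
import urllib.request
from urllib.parse import urljoin

BASE_URL = "http://127.0.0.1:8000"  # address from the Quick Start step above


def endpoint(base_url: str, path: str) -> str:
    """Join the server base URL and a path such as 'docs' or 'mcp'."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def server_is_up(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the OpenAPI document answers with HTTP 200.

    Assumption: FastAPI's default /openapi.json route has not been disabled.
    """
    url = endpoint(base_url, "openapi.json")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

Calling `server_is_up()` after launching `lab_server1.py` returns False while the server is still loading or if it is bound to a different port.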

2. Run Data Ingestion (Phase 3 Specialized Builders)

Image Ingestion

# Semantic embeddings (CLIP/SigLIP)
python builders/import_image_semantic.py --root T:\tserverData\test\images --test --model clip

# Face detection (YOLO + ArcFace)
python builders/import_image_face.py --root T:\tserverData\test\images --test

Text Ingestion

# Semantic embeddings (BGE)
python builders/import_text_semantic.py --root T:\tserverData\test\text --test

# Full-text search (FTS5)
python builders/import_text_fts.py --root T:\tserverData\test\text --test

3. Explorers (Interactive UI)

# Face Cluster Explorer
python explorers/clusterexplorer.py --test

# Image Search Explorer
python explorers/imageexplorer1.py --test

# Text Search Explorer
python explorers/textexplorer1.py --test

4. Advanced Operations

Cluster Auto-Tuning

# Headless auto-tuning (uses REF corpus)
python explorers/clusterexplorer.py --test --headless --autoeps

# Reset cluster assignments
python explorers/clusterexplorer.py --test --init

BIGTEST Corpus Generation

# Generate 10k test files
python testers/lab_bigtext_generator.py 10000 --threads 12

# Ingest into BIGTEST database
python builders/import_text_semantic.py --root T:\tserverData\bigtest\text --bigtest
python builders/import_text_fts.py --root T:\tserverData\bigtest\text --bigtest
python builders/import_image_semantic.py --root T:\tserverData\bigtest\images --bigtest --model clip
python builders/import_image_face.py --root T:\tserverData\bigtest\images --bigtest
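
The ingestion commands in this Quick Start differ only in the script, the corpus subfolder, and the environment flag, so they can be scripted. The sketch below rebuilds the same argument lists shown above and runs them in order. The names `BUILDERS`, `builder_commands`, and `run_all` are hypothetical. It assumes the corpus lives under `T:\tserverData\<env>\`, which holds for TEST, REF, and BIGTEST but not for PROD.

```python
import subprocess
import sys
from pathlib import PureWindowsPath

# Builder scripts, the corpus subfolder each one reads, and extra arguments,
# copied from the commands shown in this README.
BUILDERS = [
    ("builders/import_text_semantic.py", "text", []),
    ("builders/import_text_fts.py", "text", []),
    ("builders/import_image_semantic.py", "images", ["--model", "clip"]),
    ("builders/import_image_face.py", "images", []),
]


def builder_commands(env: str, data_root: str = r"T:\tserverData") -> list[list[str]]:
    """Build one argument list per builder for the given environment flag."""
    commands = []
    for script, subfolder, extra in BUILDERS:
        corpus = PureWindowsPath(data_root) / env / subfolder
        commands.append([sys.executable, script, "--root", str(corpus), f"--{env}", *extra])
    return commands


def run_all(env: str) -> None:
    """Run the builders in order from the repository root; stop on the first failure."""
    for cmd in builder_commands(env):
        subprocess.run(cmd, check=True)
```

For example, `run_all("bigtest")` issues the same four commands as the BIGTEST ingestion block above. Passing arguments as a list avoids shell quoting problems with backslashes in Windows paths.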

🌍 Environments

The system supports four environments via CLI flags (--prod, --test, --ref, --bigtest):

| Environment | Database | Image Corpus | Text Corpus | Purpose |
| --- | --- | --- | --- | --- |
| PROD | `T:\tserverData\prod\database` | `T:\_ALLPIC`* | `T:\_ALLTEXT`* | Production (300k+ images) |
| TEST | `T:\tserverData\test\database` | `T:\tserverData\test\images` | `T:\tserverData\test\text` | Development/Testing |
| REF | `T:\tserverData\ref\database` | `T:\tserverData\ref\images` | `T:\tserverData\ref\text` | Ground Truth (AutoTuner) |
| BIGTEST | `T:\tserverData\bigtest\database` | `T:\tserverData\bigtest\images` | `T:\tserverData\bigtest\text` | Load Testing (generated) |

Notes:

- All environments are automatically initialized under T:\tserverData\ (drive root, outside repo)
- TEST, REF, and BIGTEST corpus folders are auto-populated from test_data/ on first use
- *PROD corpus paths are configurable via T:\tserverData\prod\corpus_config.json (created with defaults on first use)
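
As an illustration of how one of the four flags can select a set of paths, the sketch below uses an argparse mutually exclusive group and derives the folders from the table above. It is not the project's actual implementation: the function names are hypothetical, and whether the real scripts require a flag or fall back to a default is not stated here.

```python
import argparse
from pathlib import PureWindowsPath

DATA_ROOT = PureWindowsPath(r"T:\tserverData")
ENVIRONMENTS = ("prod", "test", "ref", "bigtest")


def parse_environment(argv: list[str]) -> str:
    """Return the environment selected by exactly one of the four flags."""
    parser = argparse.ArgumentParser()
    # Assumption: exactly one flag must be given; the real scripts may differ.
    group = parser.add_mutually_exclusive_group(required=True)
    for name in ENVIRONMENTS:
        group.add_argument(f"--{name}", dest="env", action="store_const", const=name)
    return parser.parse_args(argv).env


def database_path(env: str) -> PureWindowsPath:
    """Database folder for an environment, following the table above."""
    return DATA_ROOT / env / "database"


def corpus_paths(env: str) -> tuple[PureWindowsPath, PureWindowsPath]:
    """Image and text corpus folders for TEST, REF, and BIGTEST.

    PROD is excluded because its corpus paths come from corpus_config.json.
    """
    if env == "prod":
        raise ValueError("PROD corpus paths are read from corpus_config.json")
    return DATA_ROOT / env / "images", DATA_ROOT / env / "text"
```

With this layout, `database_path(parse_environment(["--ref"]))` resolves to `T:\tserverData\ref\database`, and passing two flags at once makes argparse exit with a usage error.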

📚 Key Features

- Phase 3 Pipeline Architecture: Single-responsibility specialized builders for optimal performance
  - Image Semantic (import_image_semantic.py): CLIP/SigLIP embeddings with 13 loaders (224×224)
  - Image Face (import_image_face.py): YOLO detection + ArcFace embeddings with 6 loaders (640×640)
  - Text Semantic (import_text_semantic.py): BGE-large-en-v1.5 embeddings with paragraph chunking
  - Text FTS5 (import_text_fts.py): SQLite full-text search indexing
- Multi-Model Support: CLIP, SigLIP, BGE, ArcFace in unified schema
- Face Recognition: ArcFace (512-dim) with DBSCAN clustering, auto-tuning via reference data
- Text Search: Semantic embeddings + FTS5 full-text (phrase queries, boolean logic)
- Hybrid Server: FastAPI + Model Context Protocol (MCP) for AI agents
- Performance: 45-50 files/sec face detection, responsive shutdown (<2s on Ctrl+C)
- Unicode Safety: Automatic path sanitization for Windows (prevents OpenCV/LanceDB crashes)
- Quality Control: Size filtering (48px minimum), duplicate detection, invalid record filtering
- Load Testing: BIGTEST corpus generator for performance validation (1k-100k+ files)
- Comprehensive Telemetry: Real-time monitoring with detection speed, ETA, GPU utilization
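
The paragraph chunking listed for the Text Semantic builder can be pictured with a short sketch. This README does not describe the actual chunking rules, so the blank-line split, the `max_chars` limit, and the function name `chunk_paragraphs` are all assumptions made for illustration.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 1000) -> list[str]:
    """Split text on blank lines, then pack paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut mid-sentence.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded separately, so a search hit can point at a passage rather than a whole file. Packing short paragraphs together keeps the number of embeddings down for dialogue-heavy text.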

📋 Documentation

- Copilot Instructions - AI coding guidelines & framework rules
- TODO - Project roadmap and tasks
- Database Schema - Schema versioning & migration guide

🛠️ Technology Stack

- Database: LanceDB (vector storage) + SQLite (FTS5 full-text)
- Embeddings: CLIP, SigLIP, BGE-large-en-v1.5, ArcFace
- Detection: YOLOv9-Face
- Framework: FastAPI, FastMCP
- UI: PySide6 (Qt)
- ML: PyTorch, Transformers, InsightFace
- Analysis: Moondream2 (UI snapshot analysis)
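
To make the SQLite FTS5 part of the stack concrete, the sketch below shows the phrase and boolean queries mentioned under Key Features against an in-memory database. The table name `docs`, its columns, and the sample rows are hypothetical; the project's real FTS schema is not shown in this README. It also assumes the local SQLite build includes FTS5, which is true of most Python distributions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: path is stored but not indexed, body is searchable.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("cinderella.txt", "The glass slipper fit Cinderella perfectly."),
        ("wolf.txt", "The wolf knocked on the door of the brick house."),
        ("slipper_ad.txt", "A slipper made of wool, not glass."),
    ],
)


def search(query: str) -> list[str]:
    """Return matching paths, best match first (FTS5 'rank' uses bm25 by default)."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [path for (path,) in rows]


phrase_hits = search('"glass slipper"')          # exact phrase, words adjacent and in order
boolean_hits = search("slipper NOT cinderella")  # boolean operators must be uppercase
either_hits = search("wolf OR cinderella")
```

Here the phrase query matches only `cinderella.txt`, because the third document contains both words but not next to each other, while the NOT query returns only `slipper_ad.txt`.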
