Privacy firewall for your PDFs before sending to LLMs
Large Language Models are becoming a default tool for reviewing, summarizing, and extracting insights from documents. But there is a hidden cost.
Most LLMs require raw document input. When you upload a contract, medical record, financial statement, or internal report, you are often sending unfiltered sensitive data along with it.
This creates real risks:
Personally identifiable information is exposed unintentionally
Confidential or regulated data leaves your control
Manual redaction is slow, error-prone, and inconsistent
Existing redaction tools are either rule-based, cloud-only, or break document structure
In practice, teams are forced to choose between using LLMs effectively and protecting privacy. That trade-off should not exist.
RedactAI is an MCP (Model Context Protocol) server that provides AI-powered sensitive data detection and redaction for PDF documents. It leverages local Ollama models to identify and permanently remove personally identifiable information (PII) from PDFs while maintaining document integrity.
Simply provide a PDF file path, and RedactAI will:
- ✅ Automatically detect sensitive data (names, emails, dates, IDs, medical info, financial data)
- ✅ Redact permanently by blacking out sensitive information
- ✅ Let you preview with side-by-side comparison before and after
- ✅ Choose your model for speed vs. accuracy trade-offs
- ✅ Customize redactions by excluding false positives or adding missed items
Privacy First: All processing happens locally with Ollama - your data never leaves your machine.
Click to watch the full demo on YouTube
| Feature | RedactAI | Manual Redaction | Cloud Services |
|---|---|---|---|
| Privacy | ✅ 100% Local | ✅ Local | ❌ Data sent to cloud |
| Speed | ✅ Seconds | ❌ Hours | ✅ Fast |
| Accuracy | ✅ AI-Powered | ❌ Error-prone | ✅ AI-Powered |
| Cost | ✅ Free | ✅ Free | ❌ Subscription fees |
| Customization | ✅ Full control | ✅ Full control | ❌ Limited |
| Audit Trail | ✅ Highlighted preview | ❌ Manual tracking | ❌ Limited |
- Why RedactAI?
- Features
- Architecture
- Prerequisites
- Installation
- MCP Configuration
- Model Selection
- Usage Examples
- Available Tools
- Workflow Example
- Technical Details
- Troubleshooting
- Project Structure
- Contributing
- License
- Choose from any locally installed Ollama model (gemma3:1b, llama3.2:3b, mistral:7b, etc.)
- Trade-off between speed and accuracy based on model size
- Automatic model caching for improved performance
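The caching behavior can be sketched with a memoized factory. This is an illustrative stand-in, not the project's actual wrapper code; the function name and returned handle are assumptions:

```python
from functools import lru_cache

@lru_cache(maxsize=8)
def get_llm(model: str, base_url: str = "http://localhost:11434"):
    """Return a cached handle for a given Ollama model.

    Hypothetical stand-in for the server's LLM wrapper: caching means
    repeated tool calls reuse one instance instead of re-initializing.
    """
    return {"model": model, "base_url": base_url}
```

Because the factory is memoized, calling `get_llm("gemma3:1b")` twice returns the identical cached object.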
Automatically detects and redacts:
- Names: Full names of people (John Doe, Dr. Smith, Jane M. Johnson)
- Emails: Email addresses ([email protected], [email protected])
- Phones: Phone numbers (+1-555-123-4567, (555) 123-4567)
- Addresses: Physical addresses (123 Main St, Apt 4B, New York, NY 10001)
- IDs/SSNs: ID numbers (123-45-6789, Passport: AB1234567)
- Credit Cards: Card numbers (1234-5678-9012-3456)
- Dates of Birth: Birth dates (DOB: 01/15/1990, Born: January 15, 1990)
- Medical Info: Diagnosis codes, patient IDs, prescription info
- Financial Data: Account numbers, transaction details, salary info
- Other PII: Social media handles, URLs with personal info
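For illustration, a structured detection result might look like the following. The category names and the flattening step are assumptions, not the server's actual schema:

```python
# Illustrative shape of a structured detection result (not the actual schema).
detection = {
    "names": ["John Doe", "Dr. Smith"],
    "emails": ["jane@example.com"],
    "phones": ["+1-555-123-4567"],
    "ids": ["123-45-6789"],
}

# Flatten all categories into one list of strings to redact,
# dropping duplicates while preserving order:
to_redact = list(dict.fromkeys(
    item for items in detection.values() for item in items
))
```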
1. Automatic Redaction - Full AI-powered detection and redaction
2. Analysis Mode - Preview sensitive data before redacting
3. Custom Redaction - Fine-tune results with exclude/include lists
- Redacted PDF: Permanently blacks out sensitive information
- Highlighted PDF: Preview showing what was detected (yellow highlights)
- Auto-opens both original and redacted PDFs for side-by-side comparison
- Detailed progress tracking for each operation step
- Masked data reporting (shows first/last characters only)
- Cross-platform support (Windows, macOS, Linux)
- MCP Framework: FastMCP for Model Context Protocol implementation
- LLM Integration: Ollama API with structured JSON responses
- PDF Processing: PyMuPDF (fitz) for text extraction and redaction
- Text Analysis: Custom data processor with masking utilities
1. MCP Server (src/server.py)
- Exposes 5 primary tools via MCP protocol
- Handles LLM instance caching
- Progress tracking and error recovery
2. Ollama LLM Wrapper (src/tools/ollama_llm.py)
- Robust JSON parsing with error recovery
- Structured schema for consistent output
- Connection health checking
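The "robust JSON parsing" step can be sketched as follows. LLM output often wraps JSON in markdown fences or surrounding prose, so a tolerant parser trims to the outermost braces before decoding. A minimal sketch, not the project's actual implementation:

```python
import json
import re

def parse_llm_json(text: str):
    """Extract the first JSON object from raw LLM output.

    Tolerates markdown code fences and surrounding prose; returns
    None instead of raising when no valid JSON object is found.
    """
    text = re.sub(r"```(?:json)?", "", text)  # strip code fences
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
```

Returning `None` on failure lets the server retry or report a clean error rather than crashing mid-redaction.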
3. PDF Extractor (src/tools/pdf_extractor.py)
- Text extraction from PDF documents
- Support for page-by-page or full document extraction
4. Data Processor (src/tools/data_processor.py)
- Flattens and deduplicates sensitive data
- Creates masked versions for secure reporting
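Masked reporting might be implemented along these lines (an illustrative helper; the real utility's exact format may differ):

```python
def mask_value(value: str, keep: int = 2) -> str:
    """Mask a sensitive string, keeping only the first/last characters.

    Illustrative helper: short strings are fully masked so nothing leaks.
    """
    if len(value) <= keep * 2:
        return "*" * len(value)
    return value[:keep] + "*" * (len(value) - keep * 2) + value[-keep:]
```

For example, `mask_value("123-45-6789")` yields `"12*******89"`, enough for the user to recognize the item without exposing it in logs.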
5. PDF Redactor (src/tools/pdf_redactor.py)
- Applies black redactions to matched text
- Generates highlighted preview version
- Per-page redaction statistics
Before installing RedactAI, ensure you have:
- Python 3.8 or higher

  ```bash
  python --version
  ```

- Ollama installed and running
  - Download from: https://ollama.ai
  - After installation, start the service:

    ```bash
    ollama serve
    ```

- At least one Ollama model (recommended):

  ```bash
  # Fast model (recommended for getting started)
  ollama pull gemma3:1b

  # Balanced model (recommended for most documents)
  ollama pull gemma3:4b

  # Accurate model (for maximum precision)
  ollama pull gemma3:12b
  ```

- Claude Desktop (for MCP integration)
  - Download from: https://claude.ai/download
Choose your platform and run the automated installation:

**Windows (PowerShell):**

```powershell
# Download the script
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/AtharvSabde/RedactAI/main/setup.ps1" -OutFile "setup.ps1"

# Run it
powershell -ExecutionPolicy Bypass -File setup.ps1
```

**macOS/Linux:**

```bash
chmod +x setup.sh
./setup.sh
```

The automated script will:
- ✅ Install all dependencies
- ✅ Set up a virtual environment
- ✅ Configure Ollama and pull the recommended model
- ✅ Automatically configure Claude Desktop
- ✅ Verify the installation
After installation completes:
- Restart Claude Desktop
- Type in Claude: `List available Ollama models`
- If you see models listed, you're ready to go! 🎉

📖 Complete Installation Guide (INSTALLATION.md)
The detailed guide includes:
- Manual installation steps
- Prerequisites checklist
- Configuration helper scripts
- Troubleshooting common issues
- Platform-specific instructions
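For reference, a `claude_desktop_config.json` entry for a local MCP server typically looks roughly like this. The server name and paths below are illustrative; see INSTALLATION.md for the exact values the setup scripts write:

```json
{
  "mcpServers": {
    "redactai": {
      "command": "C:/path/to/RedactAI/venv/Scripts/python.exe",
      "args": ["C:/path/to/RedactAI/src/server.py"]
    }
  }
}
```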
Choose the right model based on your needs:
| Model | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| gemma3:1b | 1 Billion | ⚡ Fast (14s) | Basic | Quick scans, simple documents |
| gemma3:4b | 4 Billion | ⚖️ Balanced (49s) | High | Recommended - balanced use |
| gemma3:12b | 12 Billion | 🐢 Slow (108s) | Highest | Maximum accuracy, complex documents |
- gemma3:1b: 9 redactions in 14 seconds (basic detection)
- gemma3:4b: 38 redactions in 49 seconds (aggressive detection) β Recommended
- gemma3:12b: 14 redactions in 108 seconds (smart/selective)
Recommendation: Start with gemma3:4b for the best balance of speed and accuracy.
| Model Size | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| Small | 1B-4B | ⚡⚡⚡ | ⭐⭐ | Quick processing, simple documents |
| Medium | 4B-12B | ⚡⚡ | ⭐⭐⭐ | Balanced use, most documents |
| Large | 12B+ | ⚡ | ⭐⭐⭐⭐ | High accuracy, complex documents |
In Claude Desktop, simply say:
Redact "C:\Users\atharv\Desktop\resume.pdf"
RedactAI will:
- ✅ Analyze the PDF with the default model (gemma3:1b)
- ✅ Detect all sensitive information
- ✅ Create a redacted version
- ✅ Auto-open both PDFs side-by-side for comparison
Redact my resume using gemma3:4b model for better accuracy
After seeing the first redaction:
Redact again but don't redact my name "John Doe" and DO redact "Google" and "Project X"
RedactAI will use the redact_pdf_custom tool to:
- Exclude: "John Doe"
- Include: "Google", "Project X"
Analyze "C:\Documents\contract.pdf" without redacting
This shows you what would be redacted without creating a new file.
What Ollama models do I have available?
Is Ollama running and ready?
RedactAI provides 5 MCP tools:
Lists all Ollama models installed on your system with size and details.
Returns: JSON with model list and size-to-accuracy guidance
Use case: Check which models you can use before redacting.
Verifies Ollama service is running and specified model is available.
Parameters:
- `model` (optional): Model name to check (default: `gemma3:1b`)
- `base_url` (optional): Ollama API URL (default: `http://localhost:11434`)
Use case: Troubleshooting connection issues.
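A health check of this kind needs only one HTTP call to Ollama's `/api/tags` endpoint. A stdlib sketch (an assumed helper, not the tool's actual code):

```python
import json
from urllib.request import urlopen
from urllib.error import URLError

def ollama_is_up(base_url: str = "http://localhost:11434",
                 timeout: float = 2.0) -> bool:
    """Return True if the Ollama API answers /api/tags with a model list."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return "models" in json.load(resp)
    except (URLError, OSError, ValueError):
        return False
```

Catching the error instead of raising lets the MCP tool report "Ollama not reachable" as a structured result rather than a stack trace.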
Analyzes PDF to detect sensitive information WITHOUT redacting.
Parameters:
- `pdf_path`: Local file path to PDF
- `pdf_base64` (optional): Base64-encoded PDF data
- `model` (optional): Ollama model to use (default: `gemma3:1b`)
Returns:
- Masked preview of detected data
- Categories and counts
- No files created
Use case: Preview before permanent redaction.
Permanently redacts sensitive data from PDF.
Parameters:
- `pdf_path`: Local file path to PDF
- `pdf_base64` (optional): Base64-encoded PDF data
- `model` (optional): Model to use (default: `gemma3:1b`)
- `return_base64` (optional): Return as base64 (default: `false`)
- `auto_open` (optional): Auto-open PDFs (default: `true`)
Returns:
- Redacted PDF (blacked out sensitive data)
- Highlighted preview PDF (shows what was redacted)
- Detailed summary with masked data
- Statistics per page
Use case: Main redaction workflow.
Custom redaction with user-specified exclusions and additions.
Parameters:
- `pdf_path`: Path to the ORIGINAL PDF (required)
- `exclude_items`: List of strings to NOT redact
- `include_items`: List of strings to forcefully redact
- `model` (optional): Model to use (default: `gemma3:1b`)
- `auto_open` (optional): Auto-open PDFs (default: `true`)
- `return_base64` (optional): Return as base64 (default: `false`)
Example:
```json
{
  "pdf_path": "resume.pdf",
  "exclude_items": ["John Doe", "[email protected]"],
  "include_items": ["Secret Project", "XYZ Corp"],
  "model": "gemma3:4b"
}
```

Use case: Fine-tune redactions after the initial pass. The user reviews the first result and says "don't redact my name, but DO redact 'Google'".
1. User uploads PDF → `analyze_pdf_sensitive_data()`
2. Review masked sensitive data → decide what to redact
3. Run `redact_pdf()` → get redacted + highlighted PDFs
4. Both PDFs auto-open side-by-side
5. If adjustments are needed → use `redact_pdf_custom()`
   - Exclude false positives
   - Include additional items
6. Final redacted PDF ready for sharing
**Ollama is not running**

Solution:

```bash
# Start Ollama service
ollama serve

# Verify it's running
curl http://localhost:11434/api/tags
```

**Model not found**

Solution:
```bash
# List installed models
ollama list

# Install missing model
ollama pull gemma3:1b
```

**Claude Desktop doesn't detect the RedactAI tools**

Solution:
- Verify paths in `claude_desktop_config.json` are correct
- Use forward slashes (`/`) even on Windows, or escape backslashes (`\\`)
- Restart Claude Desktop completely
- Check Claude logs:
  - Windows: `%APPDATA%\Claude\logs`
  - macOS: `~/Library/Logs/Claude`
**PDFs don't auto-open**

Solution:
- Ensure you have a PDF viewer installed (Adobe Reader, a browser, etc.)
- Try manually opening the files from the output path
- Set `auto_open: false` in the tool call if it's causing issues
**Processing is slow**

Solution:
- Use a smaller model: `gemma3:1b` (fastest)
- Process fewer pages at once
- Upgrade your hardware (more RAM/CPU helps)
**Ollama crashes or hangs**

Solution:

```bash
# Check Ollama logs
ollama logs

# Restart Ollama
# Windows: stop and restart the service
# macOS/Linux:
killall ollama
ollama serve
```

**Redaction results are inaccurate**

Solution:
- Try a larger model: `gemma3:4b` or `gemma3:12b`
- Use `redact_pdf_custom` to fine-tune results
- Exclude false positives or include missed items
```
RedactAI/
├── src/
│   ├── server.py              # Main MCP server (FastMCP)
│   └── tools/
│       ├── ollama_llm.py      # Ollama LLM integration
│       ├── pdf_extractor.py   # PDF text extraction
│       ├── data_processor.py  # Sensitive data processing
│       └── pdf_redactor.py    # PDF redaction logic
├── scripts/
│   └── configure_claude.py    # Configuration helper script
├── setup.sh                   # Automated setup (macOS/Linux)
├── setup.ps1                  # Automated setup (Windows)
├── requirements.txt           # Python dependencies
├── INSTALLATION.md            # Detailed installation guide
├── README.md                  # This file
├── LICENSE                    # MIT License
└── .gitignore
```
- Original files are never modified: redactions create new files
- Temporary files are cleaned up: automatic cleanup in `finally` blocks
- Masked reporting: sensitive data is never exposed in full in logs or responses
- Local processing: all LLM operations run locally via Ollama (no cloud APIs)
- No data transmission: your sensitive documents stay on your machine
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
```bash
# Clone your fork
git clone https://github.com/YOUR_USERNAME/RedactAI.git
cd RedactAI

# Create venv and install dependencies
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

# Test changes
python src/server.py
```

- This is an MCP server: it exposes tools via the Model Context Protocol, not a standalone CLI application
- The server must be running and connected to an MCP client (like Claude Desktop) to use the tools
- All operations return detailed JSON responses with progress tracking and error information
- The system requires Ollama to be running locally at `http://localhost:11434` by default
- Larger models provide better accuracy but require more computational resources and time
- The highlighted PDF serves as a preview/audit trail of what was redacted
- Anthropic for Claude and MCP
- Ollama for local LLM infrastructure
- PyMuPDF for PDF processing
- FastMCP for MCP server framework
Atharv Sabde
- GitHub: @AtharvSabde
- Project: RedactAI
If RedactAI helped you protect your privacy, please ⭐ star the repo on GitHub!

Built with ❤️ for privacy-conscious AI users
