A comprehensive solution for anonymizing and pseudonymizing personally identifiable information (PII) in both textual and structured data. Built on Microsoft Presidio, this API provides enterprise-grade data deidentification with configurable operators and methods.
- Text & Structured Data Processing - Support for plain text and JSON
- Flexible Anonymization - Multiple operators: replace, redact, mask, hash, encrypt
- Consistent Pseudonymization - Random number, counter, and cryptographic hash methods. Maintains consistency within a single request.
- Production Ready - Thread-safe, scalable FastAPI service with comprehensive error handling
Supported entity types: `PERSON`, `EMAIL_ADDRESS`, `PHONE_NUMBER`, `CREDIT_CARD`, `IP_ADDRESS`, `LOCATION`, `DATE_TIME`, `URL`.
Anonymization Operators:
- `replace` - Replace with generic placeholders
- `redact` - Remove entirely
- `mask` - Mask with a character
- `hash` - Cryptographic hash
- `encrypt` - Encryption
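As a rough illustration of the operator semantics, here is a simplified sketch of what each operator does to a detected span (illustrative only, not Presidio's actual implementation; `encrypt` is omitted for brevity):

```python
import hashlib

def apply_operator(text: str, start: int, end: int, entity_type: str,
                   operator: str, masking_char: str = "*") -> str:
    """Apply an anonymization operator to the span text[start:end].

    Simplified sketch of the operator semantics; the real service
    delegates this to Presidio's anonymizer engine.
    """
    span = text[start:end]
    if operator == "replace":
        replacement = f"<{entity_type}>"          # generic placeholder
    elif operator == "redact":
        replacement = ""                          # remove entirely
    elif operator == "mask":
        replacement = masking_char * len(span)    # mask every character
    elif operator == "hash":
        replacement = hashlib.sha256(span.encode()).hexdigest()
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return text[:start] + replacement + text[end:]

text = "John Doe lives in New York"
print(apply_operator(text, 0, 8, "PERSON", "replace"))
# → <PERSON> lives in New York
print(apply_operator(text, 0, 8, "PERSON", "mask"))
# → ******** lives in New York
```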
Pseudonymization Methods:
- `random_number` - Cryptographically secure random pseudonyms
- `counter` - Sequential numbering
- `crypto_hash` - BLAKE2b-based pseudonyms
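The "consistent within a single request" guarantee means the same input value always maps to the same pseudonym for the lifetime of one request. A minimal sketch of how that can work (class and method names are illustrative, not the project's actual code):

```python
import hashlib
import secrets

class Pseudonymizer:
    """Sketch of per-request pseudonym generation (illustrative only)."""

    def __init__(self, method: str = "counter", key: bytes = b"per-request-key"):
        self.method = method
        self.key = key
        self.counters: dict[str, int] = {}          # per-entity-type counters
        self.seen: dict[tuple[str, str], str] = {}  # cache -> consistency

    def pseudonym(self, entity_type: str, value: str) -> str:
        cache_key = (entity_type, value)
        if cache_key in self.seen:
            return self.seen[cache_key]  # same value -> same pseudonym
        if self.method == "counter":
            self.counters[entity_type] = self.counters.get(entity_type, 0) + 1
            suffix = str(self.counters[entity_type])
        elif self.method == "crypto_hash":
            # keyed BLAKE2b digest, truncated for readability
            suffix = hashlib.blake2b(value.encode(), key=self.key,
                                     digest_size=4).hexdigest()
        else:  # random_number
            suffix = str(secrets.randbelow(10**8))
        result = f"<{entity_type}_{suffix}>"
        self.seen[cache_key] = result
        return result

p = Pseudonymizer("counter")
print(p.pseudonym("PERSON", "Jane Smith"))  # → <PERSON_1>
print(p.pseudonym("PERSON", "John Doe"))    # → <PERSON_2>
print(p.pseudonym("PERSON", "Jane Smith"))  # → <PERSON_1> (reused)
```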
You can run the application either directly with uv or using Docker.
- Clone the repository
- Set up environment variables:
Create a `.env` file in the project root by copying `.env.default`:

```
cp .env.default .env
```

You can then modify the variables in `.env` as needed.
The application is containerized using Docker, with a robust and flexible deployment strategy that leverages:
- Docker for containerization, with multi-environment support (dev and prod) via Docker Compose profiles
- Traefik as a reverse proxy and load balancer, with built-in SSL/TLS support via Let's Encrypt and a dashboard in the dev environment
- Gunicorn as the production-grade HTTP server (FastAPI is an ASGI framework, so Gunicorn is typically paired with Uvicorn workers), with configurable worker processes and threads and dynamic scaling based on system resources
- Docker and Docker Compose installed on your machine.
Build and run the development environment:
```
docker compose --profile dev up --build
```
The API will be available at http://ddi.localhost
The Traefik dashboard will be available at http://traefik.ddi.localhost
For a quick test without the full stack:

```
docker build --target dev-standalone -t ddi:dev-standalone .
docker run --env-file .env -p 8005:8005 ddi:dev-standalone
```

Note: this version won't reflect source code changes in real time.
Configure production-specific settings, then build and run the production environment:
```
docker compose --profile prod up --build
```
- Python 3.13 or higher
- uv for dependency management
- Install uv (see https://docs.astral.sh/uv/getting-started/installation/)
- Install dependencies:

  ```
  make init
  ```

- Start the server:

  ```
  make start
  ```
The API will be available at http://localhost:8005.
```bash
curl -X POST "http://localhost:8005/anonymize/text" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Doe lives in New York and his email is [email protected]",
    "operator": "replace"
  }'
```

Response:
```json
{
  "anonymized_text": "<PERSON> lives in <LOCATION> and his email is <EMAIL_ADDRESS>",
  "detected_entities": [
    {
      "type": "PERSON",
      "start": 0,
      "end": 8,
      "score": 0.85,
      "text": "John Doe"
    },
    {
      "type": "LOCATION",
      "start": 18,
      "end": 26,
      "score": 0.85,
      "text": "New York"
    },
    {
      "type": "EMAIL_ADDRESS",
      "start": 44,
      "end": 60,
      "score": 1.0,
      "text": "[email protected]"
    }
  ]
}
```

```bash
curl -X POST "http://localhost:8005/anonymize/structured" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "user": {
        "name": "Alice Johnson",
        "email": "[email protected]",
        "address": "123 Main St, Boston"
      }
    },
    "operator": "mask",
    "operator_params": {
      "masking_char": "*",
      "chars_to_mask": 999
    }
  }'
```

Response:
```json
{
  "anonymized_data": {
    "user": {
      "name": "*************",
      "email": "*****************",
      "address": "*******************"
    }
  },
  "detected_fields": {
    "user.name": "PERSON",
    "user.email": "EMAIL_ADDRESS",
    "user.address": "LOCATION"
  }
}
```

```bash
curl -X POST "http://localhost:8005/pseudonymize/text" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "John Doe was born on January 24, 1985, and Jane Smith lives in 221B Baker Street London",
    "method": "counter"
  }'
```

Response:
```json
{
  "pseudonymized_text": "<PERSON_2> was born on <DATE_TIME_1>, and <PERSON_1> lives in <LOCATION_1>",
  "detected_entities": [
    {
      "type": "PERSON",
      "start": 0,
      "end": 8,
      "score": 0.85,
      "text": "John Doe"
    },
    {
      "type": "DATE_TIME",
      "start": 21,
      "end": 37,
      "score": 0.85,
      "text": "January 24, 1985"
    },
    {
      "type": "PERSON",
      "start": 43,
      "end": 53,
      "score": 0.85,
      "text": "Jane Smith"
    },
    {
      "type": "LOCATION",
      "start": 63,
      "end": 87,
      "score": 0.85,
      "text": "221B Baker Street London"
    }
  ]
}
```

Configure external services to add contextual information to pseudonyms, by entity type:
```
# In .env file
ENRICHMENT_CONFIGURATIONS='{
  "LOCATION": {
    "type": "http",
    "url": "http://your-geo-service.example.com/enrich"
  }
}'
```

This transforms, for example, `<LOCATION_123>` into `<LOCATION_123> (United Kingdom)` when the service returns country information.
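Conceptually, enrichment is a post-processing step: the service looks up each pseudonym placeholder and appends the returned context. A minimal sketch of that substitution step (function name and the `enrichments` mapping are illustrative, not the actual wire format of an enrichment service):

```python
import re

def enrich_pseudonyms(text: str, enrichments: dict[str, str]) -> str:
    """Append contextual info after matching pseudonym placeholders.

    `enrichments` maps a placeholder to a context string, e.g. the
    country returned by a configured geo-enrichment service.
    """
    def substitute(match: re.Match) -> str:
        placeholder = match.group(0)
        context = enrichments.get(placeholder)
        return f"{placeholder} ({context})" if context else placeholder

    # Placeholders look like <LOCATION_123>: entity type + counter.
    return re.sub(r"<[A-Z_]+_\d+>", substitute, text)

print(enrich_pseudonyms(
    "She moved to <LOCATION_123> last year.",
    {"<LOCATION_123>": "United Kingdom"},
))
# → She moved to <LOCATION_123> (United Kingdom) last year.
```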
Once the server is running, you can access the interactive API documentation:

- Swagger UI: available at `/docs`
- ReDoc: available at `/redoc`
These interfaces provide detailed information about all available endpoints, request/response schemas, and allow you to test the API directly from your browser.
The project uses Ruff for linting and formatting, with pre-commit hooks for automated quality checks. Code documentation is built with MkDocs and Material theme.
Key commands for development:

```
make help        # Display all available commands
make init        # Initialize project (first installation)
make start       # Start application
make check       # Run all checks (precommit + test)
make format      # Format code
make lint        # Run linting checks
make docs-serve  # Serve project documentation locally
```

| Variable | Description | Required | Default Value | Possible Values |
|---|---|---|---|---|
| **Application Configuration** | | | | |
| `DEFAULT_LANGUAGE` | Default language for text analysis | No | `en` | `en`, `fr` |
| `DEFAULT_MINIMUM_SCORE` | Default confidence threshold | No | `0.5` | `0.0` to `1.0` |
| `DEFAULT_ANONYMIZATION_OPERATOR` | Default anonymization method | No | `replace` | `replace`, `redact`, `mask`, `hash`, `encrypt` |
| `DEFAULT_PSEUDONYMIZATION_METHOD` | Default pseudonymization method | No | `random_number` | `random_number`, `counter`, `crypto_hash` |
| `ENRICHMENT_CONFIGURATIONS` | Entity enrichment service configs | No | `{}` | JSON object |
| **Environment Configuration** | | | | |
| `ENVIRONMENT` | Affects error handling and logging throughout the application | No | `development` | `development`, `production` |
| `LOG_LEVEL` | Minimum logging level | No | `info` | `debug`, `info`, `warning`, `error`, `critical` |
| **Internal Application Configuration** | | | | |
| `APP_INTERNAL_HOST` | Host for internal application binding | No | `0.0.0.0` | Valid host/IP |
| `APP_INTERNAL_PORT` | Port for internal application binding | No | `8005` | Any valid port |
| **External Routing Configuration** | | | | |
| `APP_EXTERNAL_HOST` | External hostname for the application | Yes | `ddi.localhost` | Valid hostname |
| `APP_EXTERNAL_PORT` | External port for routing (dev env only) | No | `80` | Any valid port |
| **Traefik Configuration** | | | | |
| `TRAEFIK_RELEASE` | Traefik image version | No | `v3.4.4` | Valid Traefik version |
| `LETS_ENCRYPT_EMAIL` | Email for Let's Encrypt certificate | Yes | `[email protected]` | Valid email |
| **Performance Configuration** | | | | |
| `WORKERS_COUNT` | Number of worker processes | No | `4` | Positive integer |
| `THREADS_PER_WORKER` | Number of threads per worker | No | `2` | Positive integer |
Refer to .env.default for a complete list of configurable environment variables and their default values.
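Since `ENRICHMENT_CONFIGURATIONS` is a JSON object embedded in an environment variable, a quick sanity check at startup can catch malformed values early. A sketch, assuming the field names (`type`, `url`) shown in the enrichment example above; the real service may validate this differently:

```python
import json
import os

# Simulate the environment variable (normally set via the .env file).
os.environ["ENRICHMENT_CONFIGURATIONS"] = (
    '{"LOCATION": {"type": "http",'
    ' "url": "http://your-geo-service.example.com/enrich"}}'
)

# Parse the JSON object; an empty object disables enrichment.
configs = json.loads(os.environ.get("ENRICHMENT_CONFIGURATIONS", "{}"))

# Minimal validation: each entity type needs "type" and "url" fields.
for entity_type, cfg in configs.items():
    missing = {"type", "url"} - cfg.keys()
    if missing:
        raise ValueError(f"config for {entity_type} is missing: {missing}")

print(configs["LOCATION"]["url"])
```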
The project follows Domain-Driven Design principles with clean separation of concerns:
```
├── domain/              # Core business logic
│   ├── contracts/       # Abstract interfaces
│   ├── services/        # Application services
│   ├── types/           # Domain models and enums
│   └── exceptions.py    # Domain exceptions
├── adapters/            # External integrations
│   ├── api/             # FastAPI routes and schemas
│   ├── presidio/        # Microsoft Presidio integration
│   └── infrastructure/  # Config, HTTP client, enrichment
```
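One way the `domain/contracts/` layer can decouple the core logic from Presidio is through structural interfaces: domain services depend on a contract, and the Presidio adapter satisfies it. A hypothetical sketch (these class and method names are illustrative, not the project's actual contracts):

```python
from typing import Protocol

class TextAnonymizer(Protocol):
    """Hypothetical contract the Presidio adapter would implement."""
    def anonymize(self, text: str, operator: str) -> str: ...

class FakeAnonymizer:
    """Test double: satisfies the contract without Presidio installed."""
    def anonymize(self, text: str, operator: str) -> str:
        return f"[{operator}] {text}"

def run(service: TextAnonymizer, text: str) -> str:
    # Domain code depends only on the contract, not on Presidio,
    # so adapters can be swapped or faked in tests.
    return service.anonymize(text, "replace")

print(run(FakeAnonymizer(), "John Doe"))  # → [replace] John Doe
```

This is the main payoff of the contracts layer: domain services stay testable and Presidio remains an implementation detail behind an adapter.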
We welcome contributions to this project! Please see the CONTRIBUTING.md file for guidelines on how to contribute, including:
- How to set up your development environment
- Coding standards and style guidelines
- Pull request process
- Testing requirements
This project is licensed under the MIT License - see the LICENSE file for details.