🧪 Science-Backed Text Anonymisation

Anonymize Text.
Retain Data.

Research-grade PII detection for Mac, Windows, Linux, and Mobile. Scale from individual research papers to enterprise-wide cloud APIs.

Available for texts in English, Dutch, French, Spanish, German, Italian & many more


Input Text

"Jane Doe lives at 123 Baker St and works at Apple."

Anonymised Output

"PERSON_1 lives at ADDRESS_1 and works at ORG_1."
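The mapping from detected entities to numbered placeholders can be sketched in a few lines of Python. This is only an illustration of the output format shown above, not the Textwash implementation: the entity list is written by hand where a detection model would normally supply it, and the function name replace_entities is hypothetical.

```python
import re
from collections import defaultdict


def replace_entities(text, entities):
    """Replace each detected surface form with a numbered category placeholder.

    `entities` is a hand-written stand-in for the output of an entity
    detector: a list of (surface_form, category) pairs.
    """
    counters = defaultdict(int)  # running index per category
    placeholders = {}            # surface form -> placeholder
    for surface, category in entities:
        if surface not in placeholders:
            counters[category] += 1
            placeholders[surface] = f"{category}_{counters[category]}"
    if not placeholders:
        return text
    # Single pass over the text; longer surface forms are tried first so that
    # a short form never matches inside a longer one.
    ordered = sorted(placeholders, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in ordered))
    return pattern.sub(lambda m: placeholders[m.group(0)], text)


text = "Jane Doe lives at 123 Baker St and works at Apple."
entities = [("Jane Doe", "PERSON"), ("123 Baker St", "ADDRESS"), ("Apple", "ORG")]
print(replace_entities(text, entities))
# prints: PERSON_1 lives at ADDRESS_1 and works at ORG_1.
```

Because the same surface form always receives the same placeholder, repeated mentions of one person stay linked in the anonymised text, which helps retain its analytical value.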

Developed for Smart Privacy

Zero Code Required: Intuitive GUI designed for researchers and non-technical staff.

Air-Gapped Ready: Local processing ensures data never leaves your infrastructure.

ML-Powered NER: Probabilistic entity detection outperforms static dictionary lists.

Intruder Tested: Empirically validated against human re-identification attempts.

Core Principles

Scientific Foundation

Based on the peer-reviewed Textwash project (GPL-3.0). Auditable, transparent, and built by academic researchers.

Contextual Privacy

Uses category probabilities to anonymise phrases based on linguistic context, not just simple keywords.

Local-First Architecture

Designed for sensitive institutional data. No internet connection required for the desktop application.

ISO 9001 Certified Development Company

🧩 Product family

Choose the setup that best fits your workflow, from a user-friendly desktop app to cloud APIs and the original open-source script.

All variants are built around the same research-grade anonymisation approach and evaluation framework.

Desktop & mobile app

Textwash Pro

Mac · Windows · Linux · iOS · Android

A user-friendly application that runs entirely on your devices. Import unstructured text data and export anonymised versions without sending anything to external servers.

Supports English, Dutch, French, Spanish, German, Italian, and many more; designed to be easy to use for non-technical users.

Offline by default · GUI-based

API & integrations

Textwash Pro API

Cloud-based processing · Zapier-ready

Cloud API for integrating Textwash anonymisation into your own systems and workflows. Ideal for automated pipelines, web apps, and low-code tools such as Zapier.

Process text from forms, CRMs, or ticket systems before storage or analysis.
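As a rough idea of what such an integration could look like, the sketch below builds an HTTP request for a text that should be anonymised before storage. Everything specific in it is an assumption made for illustration: the URL, the JSON field names, and the bearer-token header are hypothetical placeholders and are not taken from the Textwash Pro API documentation. The request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical values for illustration only; use the endpoint and credentials
# from your own API documentation and account.
API_URL = "https://api.example.com/v1/anonymise"
API_KEY = "YOUR_API_KEY"


def build_anonymise_request(text, language="en"):
    """Build (but do not send) a POST request carrying the text as JSON."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


req = build_anonymise_request("Ticket from Jane Doe: my order never arrived.")
print(req.get_method(), req.full_url)
# With a real endpoint, the request would be sent with
# urllib.request.urlopen(req) and the JSON response parsed with json.load().
```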

REST API · Integrations

Cloud workspace

Textwash Pro Cloud

Browser-based batch processing

Use Textwash in a cloud environment hosted by us or in your own organisation's cloud. Upload datasets, configure entity types, and run anonymisation jobs directly from your browser.

Ideal for teams who need shared project dashboards and result logs.

Hosted by us or in your own cloud · Team-ready

Open-source foundations

Textwash Free

Original script · No GUI

The original open-source Textwash project that Textwash Pro builds on. A script-based anonymisation tool without a graphical interface, intended for technical users who want direct access to the underlying code.

Includes the full anonymisation pipeline and evaluation materials under GPL-3.0.

Source code & paper · The open-source original

🏢 Typical use cases

Textwash Pro is built to support real-world anonymisation workflows in research, industry, and the public sector.

If your use case involves unstructured text and personal data, Textwash Pro is likely relevant. Not sure? Reach out at [email protected]

GDPR-compliant data anonymisation

Anonymise free-text fields that contain personal data before storing or sharing them:

Customer support logs and email archives
Contact forms and CRM notes
Internal reports with narrative descriptions

Archiving & records retention

Keep long-term text repositories usable while reducing privacy risk before archival storage:

Legacy case files and historical correspondence
Retention-ready email and document collections
Internal knowledge bases with personal references

Open Science & data sharing

Prepare research datasets for sharing while protecting participants’ identities:

Survey open-ended responses
Interview and focus group transcripts
Field notes and qualitative research data

Legal, Health, & Social services

Remove direct and indirect identifiers from sensitive case descriptions:

Clinical notes and case vignettes
Legal case summaries and memos
Social work documentation and protocols

User research & UX feedback

Anonymise qualitative feedback before sharing within teams or with external partners:

User interviews and usability tests
App store reviews and support tickets
Internal product discovery notes

Logs & monitoring data

Remove PII from semi-structured logs before central storage or analysis:

Application and server logs containing user details
Chat logs from support systems
Exported audit trails and monitoring outputs

Proxy & preprocessing for LLM workflows

Route prompts and free-text inputs through anonymisation before they reach external or internal LLM systems:

PII-safe prompt proxy for shared AI assistants
Preprocess support tickets before summarisation
Mask identifiers prior to retrieval, ranking, and generation
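A minimal sketch of such a proxy is shown below: identifiers are masked before the prompt is passed on, and the placeholders are mapped back in the reply. The two regular expressions are a deliberately simple stand-in for detection and are not how Textwash finds entities, and fake_llm is a hypothetical placeholder for whichever LLM call sits behind the proxy.

```python
import re

# Stand-in detector for illustration only: two simple regular expressions.
# Textwash itself uses a machine learning model, which is not reproduced here.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def mask(text):
    """Replace matches with numbered placeholders; return text and mapping."""
    mapping = {}  # placeholder -> original value
    for category, pattern in PATTERNS.items():
        seen = {}  # original value -> placeholder, for this category

        def substitute(match):
            value = match.group(0)
            if value not in seen:
                seen[value] = f"{category}_{len(seen) + 1}"
                mapping[seen[value]] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def unmask(text, mapping):
    """Restore original values; longer placeholders first (e.g. _10 before _1)."""
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


def fake_llm(prompt):
    # Hypothetical stand-in for an LLM call; it simply echoes the prompt.
    return "Summary: " + prompt


prompt = "Customer jane@example.org asked for a callback on +31 20 123 4567."
masked, mapping = mask(prompt)
reply = fake_llm(masked)      # the model only ever sees the masked text
print(masked)
print(unmask(reply, mapping))
```

Keeping the mapping on the proxy side means the original values never reach the model, while the caller can still receive a reply with the identifiers restored where that is permitted.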

🏛️ Custom institutional workflows

We design end-to-end anonymisation workflows that align with institutional governance, legal obligations, and research quality standards.

📋 Governance alignment

Policy mapping, retention definitions, and approval checkpoints for every data flow stage

🛡️ Controlled processing

Role-based access setup, secure review loops, and privacy controls for internal and external teams

📈 Audit readiness

Documented procedures, quality evidence, and repeatable validation protocols for compliance reviews

For institutional rollouts, integration planning, or compliance questions, contact [email protected]

🤝 Optional Services

Textwash Pro works as a standalone product. If helpful, we additionally offer optional implementation and consultancy support for research teams, companies, and public-sector organisations.

Advisory and implementation support

The optional service package combines operational design, integration planning, and quality assurance for sensitive text workflows.

Workflow assessment for privacy, compliance, and data utility
PII preprocessing and guardrail design before model usage
Output postprocessing checks to reduce leakage risk

Cross-checking between anonymisation quality and business requirements
Human-in-the-loop review strategies for high-impact datasets
Integration recommendations across on-prem and cloud stacks

🔎 Phase 1: Discovery

Data landscape review
Risk and exposure mapping
Target workflow definition

🧪 Phase 2: Pilot

Dataset onboarding
Entity type calibration
Human quality review setup

⚙️ Phase 3: Integration

System and API integration
Operational runbook creation
Monitoring dashboard rollout

✅ Phase 4: Governance

Audit & evidence reviews
Policy and training package
Continuous improvements

Optional services are available for SMEs, enterprise teams, universities, healthcare, and public sector

Built for serious data protection work

Textwash Pro was designed to meet high standards for text anonymisation. The following principles guide its development.

1. Complete and transparent evaluation

The underlying anonymisation approach has been evaluated empirically. This includes tests of what the tool can and cannot do, as well as a motivated intruder test where humans attempt to re-identify persons in anonymised documents.

2. Data never leave your system

The Textwash Pro application does not require you to upload text data or use any remote API. You can disconnect from the internet and continue anonymising documents. This minimises leakage and reduces risks for sensitive data.

3. Transparent foundations

Textwash Pro is based on the open, research-driven Textwash project. The foundations can be inspected, tested, and extended by the community.

4. Learning-based anonymisation

Personal information is complex and context-dependent. Textwash therefore does not rely on simple dictionary lookups. Instead, it uses a machine learning model that assigns category probabilities to phrases and anonymises them accordingly.
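The idea of acting on category probabilities can be illustrated with a small sketch. The probabilities, the NONE category, and the 0.5 threshold below are assumptions chosen for the example; they are not values or names taken from the Textwash model.

```python
def decide(phrase_probs, threshold=0.5):
    """Return the category to anonymise a phrase as, or None to keep it.

    `phrase_probs` maps category names to probabilities for one phrase in
    its context. The "NONE" category and the threshold are assumptions.
    """
    category, prob = max(phrase_probs.items(), key=lambda item: item[1])
    if category == "NONE" or prob < threshold:
        return None
    return category


# Made-up probabilities for illustration; a trained model would supply them.
scored = [
    ("Apple", {"ORGANISATION": 0.91, "NONE": 0.09}),  # "works at Apple"
    ("apple", {"ORGANISATION": 0.04, "NONE": 0.96}),  # "ate an apple"
]
for phrase, probs in scored:
    print(phrase, "->", decide(probs))
# prints:
# Apple -> ORGANISATION
# apple -> None
```

A dictionary lookup would treat both occurrences the same way; scoring each phrase in its context is what allows the two to be handled differently.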

Considering other anonymisation tools?

Even if you do not use Textwash Pro, we strongly encourage you to ask any tool provider for:

An empirical evaluation that clearly shows what their tool can and cannot do (you can point them to the Textwash evaluation approach and dataset)
A clear justification for why data must be sent to online services or APIs. In many cases, strong anonymisation does not require central data collection.

If this level of transparency is not available, treat risk claims with caution.

You can always reach us at [email protected] if you have questions about evaluation details.

European Data Laws (GDPR)

Compliance by design, not as an add-on

Textwash Pro is built for current EU privacy requirements and aligns with GDPR principles, especially data minimisation, purpose limitation, and privacy by design/default (Articles 5 and 25 GDPR).

European AI Sovereignty

Full local deployment on Windows, Linux, and macOS

The Windows/Linux/macOS app runs fully local, offline, and air-gapped. No data leaves the client environment, and no external APIs are required.

❓ FAQ

Common questions about deployment models, support levels, and governance requirements

❓ Is Textwash Pro usable without optional services?

Yes. The product is fully usable on its own, and services are optional.

❓ Do you provide SLA options?

Yes. We can define service levels, support windows, response targets, and escalation paths for qualifying organisations.

❓ Is Textwash Pro suitable for public sector programmes?

Yes. We support public sector, research, healthcare, and regulated environments with governance-aligned implementation plans.

❓ Can on-premise and cloud setups be combined?

Yes. Hybrid architectures can combine local processing with API or cloud components, depending on policy and risk constraints.

❓ How do you support audits and compliance reviews?

We provide documentation inputs, quality checkpoints, and implementation evidence to support internal governance and external audits.

❓ Who should contact you for enterprise or institutional rollout?

Programme managers, data protection teams, and technical leads can contact us at [email protected] to discuss fit and rollout options

❓ How does Textwash Pro support GDPR compliance in practice?

Textwash Pro is aligned with GDPR principles including data minimisation, purpose limitation, and privacy by design/default (Articles 5 and 25), and supports compliance-focused workflows for sensitive text handling.

❓ Can Textwash Pro be used in sovereign or air-gapped AI environments?

Yes. The Windows/Linux/macOS deployment runs fully local and offline, so no data leaves the client environment and no external API connectivity is required.

🚀 Quick Start Guide

Textwash Pro offers a graphical user interface (GUI) for anonymising text files with no command line required:

1. Open the Textwash Pro app on your Mac, Windows, Linux, iOS, or Android device
2. Import data by selecting individual files or folders in the GUI
3. Set the language (supports English, Dutch, French, Spanish, German, Italian, and many more)
4. Choose the output folder where anonymised files should be saved
5. Start the anonymisation run; anonymised files are written to the chosen directory

Textwash Pro is designed to be user-friendly and works well for both small and large text collections. It can take advantage of powerful hardware where available, but does not require any technical setup.

Need a walkthrough?

If you would like a short demo or have specific questions about your use case, we are happy to help.

Examples & sample data

The original open-source Textwash project also includes detailed person descriptions and their anonymised counterparts. These examples illustrate how the underlying anonymisation behaves.

Original, detail-rich descriptions in the examples directory
Corresponding anonymised versions in examples_anonymised

You can use these example files to understand how different entity types are treated, and as a starting point for your own evaluation.

Browse Textwash Free on GitHub

🏷️ Fine-grained control over entity types

Textwash can anonymise a rich set of entity types and can be restricted to a subset as needed.

This allows you to align anonymisation with legal and methodological requirements while preserving as much non-identifying information as possible.

PRONOUNS · PHONE NUMBER · EMAIL ADDRESS · NUMERICS · MONTHS · DATE · PERSON · LOCATION · OCCUPATION · TITLE · AGE · CULTURAL IDENTITY · TIME · ADDRESS · ORGANISATION · OTHER IDENTIFIABLE ATTRIBUTE

By selecting only the entity types you need, you can tailor anonymisation to your context while keeping as much useful, non-identifying information as possible.
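Restricting anonymisation to a subset of entity types can be pictured as a filter applied before replacement. The sketch below assumes a hand-written list of detected entities and a hypothetical helper named anonymise_subset; it illustrates the idea only and does not reflect how Textwash Pro exposes this setting.

```python
def anonymise_subset(text, detected, enabled):
    """Replace only entities whose category is in `enabled`.

    `detected` is a hand-written stand-in for detector output: a list of
    (surface_form, category) pairs with distinct surface forms.
    """
    counters = {}
    for surface, category in detected:
        if category not in enabled:
            continue  # this entity type is switched off, so leave it as-is
        counters[category] = counters.get(category, 0) + 1
        text = text.replace(surface, f"{category}_{counters[category]}")
    return text


text = "Jane Doe, a nurse from Utrecht, called on 3 May."
detected = [
    ("Jane Doe", "PERSON"),
    ("nurse", "OCCUPATION"),
    ("Utrecht", "LOCATION"),
    ("3 May", "DATE"),
]

print(anonymise_subset(text, detected, enabled={"PERSON", "LOCATION"}))
# prints: PERSON_1, a nurse from LOCATION_1, called on 3 May.
```

In this example the occupation and date remain available for analysis, while the name and location are removed.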

Research Info

Textwash Pro is a commercial product built on a research-driven and openly documented foundation. For procurement and governance processes, we recommend reviewing independent evidence alongside product claims.

Empirical evaluation showing what a tool can and cannot anonymise, ideally against shared benchmark datasets
A clear explanation of data flow, including why remote APIs are needed and which safeguards apply
Governance artefacts such as validation reports, audit evidence, and documented limits of the method

These materials help organisations make informed, auditable decisions about deployment readiness.

🔬 Technical Reports

🛡️ Privacy in benchmark comparisons

Independent benchmark work compares Textwash against multiple anonymisation approaches and indicates that base Textwash is highly competitive and dependable on privacy performance

⚖️ Trade-offs over single-metric claims

Published studies discuss utility/privacy trade-offs (e.g., BLEU, re-identification risk, and computation cost), supporting practical model selection for real deployment constraints

📚 Evidence across publications

The evidence base spans peer-reviewed journals, conference proceedings, and preprints, and supports procurement, requirements governance, and internal documentation needs.

• arXiv (2026): arxiv.org/pdf/2602.12806

• Nature Scientific Reports: nature.com/articles/s41598-023-42977-3

• Procedia Computer Science: sciencedirect.com/science/article/pii/S1877050925008518

• arXiv (2024): arxiv.org/abs/2411.05978

• ACM Digital Library: dl.acm.org/doi/abs/10.1145/3576050.3576070

• arXiv (2021): arxiv.org/abs/2103.09263

• and many more

Benchmark comparison from the technical report


For research collaborations, interoperability discussions, or evaluation questions, contact [email protected]

🤝 Become a Textwash Pro Partner

Looking for an anonymisation partner for your product, organisation, or research project? Let’s discuss integrations, pilots, and custom deployment options.

✉️ Email us

👥 Who developed Textwash Pro?

Textwash Pro is developed and distributed by Dr. Bennett Kleinberg & jocapps® GmbH and is based on Textwash (github.com/ben-aaron188/textwash) under the GNU General Public License v3.0. The original Textwash project was developed by Dr. Maximilian Mozes and Dr. Bennett Kleinberg.

Textwash Pro extends this foundation with a multi-platform GUI, deployment options, and additional tooling while preserving the open, research-driven ethos of the original project.

Paper: Kleinberg, B., Davies, T., & Mozes, M. (2022). Textwash, automated open-source text anonymisation. arXiv:2208.13081.

Questions? Mail us!
[email protected]
