Live demo → zooman33.github.io/rcc-v16 — interactive walkthrough with synthetic clinical trial data. No API key required.
A browser-based tool that compares new translations against client-provided reference files, flagging deviations from approved phrasing. Built for localization teams working on regulated content where reference reuse is mandatory.
About the demo: The live demo above simulates the RCC workflow using pre-computed matching results on fabricated data. It is not the real tool — it exists so recruiters and collaborators can see what RCC does without needing access to the proprietary codebase. See Note on code below.
In clinical trial localization, clients provide previously approved translations as reference files. Linguists are expected to reuse approved phrasing wherever the source text matches. Checking this manually means opening two files side by side and scanning segment by segment. On a 40-page protocol with 200+ reusable segments, that's hours of tedious cross-referencing per language pair, and things get missed.
RCC takes a new translation file (XLZ/XLIFF or DOCX) and one or more reference files, then uses an LLM to identify segments where the source text matches (or nearly matches) the reference source. For each match, it compares the new target against the approved target and flags any deviations.
Source file (XLZ/XLIFF/DOCX)          Reference files (XLZ/XLIFF/DOCX)
         |                                        |
         v                                        v
 +--------------+                        +--------------+
 |   Segment    |                        |   Segment    |
 |  Extraction  |                        |  Extraction  |
 +--------------+                        +--------------+
         |                                        |
         v                                        |
 +----------------+                               |
 |   Source-to-   |<------------------------------+
 |  Source Match  |
 |   (LLM layer)  |
 +----------------+
         |
         v
 +----------------+
 |   Target-to-   |
 | Target Compare |
 |   (LLM layer)  |
 +----------------+
         |
         v
 +--------------+
 |  Deviation   |
 |    Report    |
 +--------------+
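The data flow above can be sketched as a small orchestration function. This is a minimal sketch, not the production code: the segment shape and the helper names (`matchSources`, `compareTargets`, `runCheck`) are illustrative assumptions, and the two LLM calls are stubbed with naive exact-match logic so the pipeline shape is visible.

```javascript
// Sketch of the RCC pipeline. In the real tool, matchSources and
// compareTargets call an LLM; here they are stubbed with naive logic
// purely to illustrate the data flow.
async function matchSources(newSegments, refSegments) {
  // Pair each new segment with a reference segment whose source text
  // matches. (The LLM layer also handles paraphrase and partial match;
  // this stub only does exact comparison.)
  return newSegments.flatMap(seg => {
    const ref = refSegments.find(r => r.source === seg.source);
    return ref ? [{ seg, ref }] : [];
  });
}

async function compareTargets(pairs) {
  // Flag pairs whose new target deviates from the approved target.
  return pairs
    .filter(({ seg, ref }) => seg.target !== ref.target)
    .map(({ seg, ref }) => ({
      source: seg.source,
      newTarget: seg.target,
      approvedTarget: ref.target,
    }));
}

async function runCheck(newSegments, refSegments) {
  const pairs = await matchSources(newSegments, refSegments);
  return compareTargets(pairs); // → deviation report
}
```
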
- Pure LLM matching, no LCS heuristics. Earlier versions (v1-v12) used longest-common-subsequence and fuzzy string matching, which broke on paraphrased sources and produced too many false positives. v13+ switched to LLM-based semantic matching, which handles paraphrase, reordering, and partial matches far better.
- Placeholder filtering. Clinical trial documents are full of placeholders ([Study Drug Name], <Protocol Number>) that differ between source and reference. RCC strips these before comparison so they don't trigger false deviations.
- Multiple LLM provider support. Runs against Anthropic Claude by default, but supports swapping providers. The matching prompts are tuned for clinical/regulatory content.
- Browser-based, runs locally. No server, no installation. Open the HTML file, paste your API key, drop your files. Deployed across the team via OneDrive/SharePoint.
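The placeholder-filtering step can be approximated with a regex pass. A minimal sketch, assuming the two bracket styles shown above ([...] and <...>); the production patterns may cover more formats:

```javascript
// Strip bracketed placeholders such as [Study Drug Name] or
// <Protocol Number> before comparison, so placeholder differences
// between new and reference text don't register as deviations.
function stripPlaceholders(text) {
  return text
    .replace(/\[[^\]]*\]/g, " ") // [square-bracket placeholders]
    .replace(/<[^>]*>/g, " ")    // <angle-bracket placeholders>
    .replace(/\s+/g, " ")        // collapse leftover whitespace
    .trim();
}
```
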
- HTML/CSS/JavaScript (single-file browser app)
- Anthropic Claude API (for semantic matching)
- XLZ/XLIFF parsing (JavaScript, client-side)
- DOCX extraction (JSZip + XML parsing)
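To illustrate the client-side parsing layer, here is a minimal XLIFF trans-unit extractor. It is regex-based for brevity and assumes XLIFF 1.2 element names (`trans-unit`, `source`, `target`); in the browser, a real implementation would more likely use `DOMParser`:

```javascript
// Minimal XLIFF 1.2 segment extractor (illustrative sketch only).
// Pulls { source, target } pairs out of each <trans-unit>.
function extractSegments(xliff) {
  const segments = [];
  const unitRe = /<trans-unit\b[^>]*>([\s\S]*?)<\/trans-unit>/g;
  for (const [, body] of xliff.matchAll(unitRe)) {
    const source = (body.match(/<source[^>]*>([\s\S]*?)<\/source>/) || [])[1];
    const target = (body.match(/<target[^>]*>([\s\S]*?)<\/target>/) || [])[1];
    if (source !== undefined) segments.push({ source, target: target ?? "" });
  }
  return segments;
}
```
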
Manual cross-referencing took 2-4 hours per file per language pair, with an error rate that rose sharply after the first 100 segments (reviewer fatigue). RCC runs the same check in minutes and catches deviations that humans reliably miss on long documents.
| Version | Change |
|---|---|
| v1-v12 | LCS/fuzzy string matching, high false positive rate |
| v13 | Switched to LLM-based matching |
| v14 | Added placeholder filtering |
| v15 | Multi-reference file support |
| v16 | Multiple LLM provider support, improved prompt tuning |
This tool was built for internal use at Lionbridge on the Merck/MSD account. The source code contains client-specific logic and is not published here. This README documents the architecture, design decisions, and problem it solves.
If you're building something similar for your own localization team, I'm happy to talk through the approach. Reach out at [email protected].