Document Automation
JENNY converts draft SOPs into properly formatted federal compliant documents. An LLM extracts structured content from the draft, then a deterministic Python pipeline handles all template mutation with a structural integrity gate. The pipeline is model-agnostic -- any LLM that produces a valid config can drive it.
Upload User uploads the template (.docx) and source draft (.docx or .pdf)
Extract LLM reads the draft and produces a structured config
Review User verifies and edits the config in the browser UI
Generate Pipeline mutates the template, validates output, produces .docx
Two extraction paths:
- Keyed build (
JENNY_keyed.exe): Anthropic API key embedded -- extraction runs automatically - Keyless build (
JENNY.exe): User copies the extraction prompt into any LLM (ChatGPT, Gemini, Copilot, etc.) and pastes the output back
See QUICKSTART.md for end-user instructions.
From the source draft, JENNY extracts or derives:
| Section | Source |
|---|---|
| Title, Purpose | Verbatim from draft |
| Scope | Verbatim if present, otherwise derived from step content |
| Step-by-step instructions (S6) | Verbatim text with hierarchy assignment |
| Images / Screenshots | Extracted from .docx or .pdf, positioned at original locations |
| Hyperlinks | Extracted and preserved in output |
| Highlighted steps | Detected from .docx formatting, flagged for review |
| Roles, Materials, Guidelines | Derived from S6 content by the LLM |
| Cover date | From source, or defaults to current month/year |
LLM-derived sections (Scope, Roles, Materials, Guidelines, Hierarchy) require SOP owner review.
JENNY ships as a single .exe -- no Python, Node.js, or environment setup required.
| Executable | Description |
|---|---|
JENNY_keyed.exe |
API key embedded. Double-click and go. |
JENNY.exe |
No key. User extracts via external LLM and pastes config back. |
Place the .exe in its own folder and double-click. A browser window opens at http://localhost:5000. The app creates uploads/ and jobs/ folders next to the exe at runtime.
See Efficiency by Design: Reducing AI Cost Through Smart Architecture for a detailed look at how JENNY's split-phase architecture achieves a 15-25x cost reduction over LLM-heavy approaches by limiting the model to text comprehension and handling all document construction in code.
Current prototype validates against 82-84 checks and scores 100% on tested SOPs:
| Draft | Type | Score | Features |
|---|---|---|---|
| DTS Job Aid (Cancelling Training Event) | 82/82 (100%) | Images, hyperlinks | |
| CBA Reconciliation | DOCX | 83/83 (100%) | Highlights, nested hierarchy |
| Travel Memo | DOCX | 83/83 (100%) | Baseline text-only |
| Onboarding Travel Tracker | DOCX | 83/83 100% | Highlights, complex hierarchy |
- Full extraction-to-docx pipeline for both .docx and .pdf drafts
- Image extraction, positioning, and embedding (PDF and DOCX)
- Hyperlink preservation
- Highlight detection from .docx and application to config
- Cover page uses full title, all other locations use short title
- External LLM flow: copy prompt with draft text + image markers, paste config back
- Config review UI with sequential numbering, section labels, editable fields
- Validation gate with score reporting
- Single-file executables (keyed and keyless)
- PDF text extraction depends on PyMuPDF -- scanned/image-only PDFs not supported
- Multi-procedure SOPs (
structure_type: "multi") not yet implemented - External LLM cannot detect highlights -- JENNY splices them from the uploaded draft
- Step hierarchy is an LLM judgment call and may need manual correction
Backend:
pip install -r requirements.txt
set ANTHROPIC_API_KEY=sk-ant-...
cd backend
python jenny_backend.py
Frontend:
cd frontend
npm install
npm run dev
Build executables:
pip install pyinstaller
python build.py
JENNY/
README.md
QUICKSTART.md End-user guide
requirements.txt
build.py Build script for executables
jenny.spec PyInstaller spec
.env API key (not committed)
backend/
jenny_backend.py Flask API + session management
jenny_pipeline.py Deterministic template mutation
unpack_docx.py Unpack .docx to XML
pack_docx.py Pack XML to .docx
JENNY_Phase0_Extraction_Prompt.md Extraction prompt (single source of truth)
frontend/
src/App.jsx React UI
index.html
package.json
data/ Test drafts and templates
dist/ Built executables (not committed)
| Endpoint | Method | Purpose |
|---|---|---|
/api/health |
GET | Status check |
/api/upload |
POST | Upload template and draft |
/api/extract |
POST | LLM extracts config from draft (keyed build) |
/api/get-prompt |
POST | Get assembled prompt for external LLM (keyless build) |
/api/import-config |
POST | Import config from external LLM output |
/api/generate |
POST | Run pipeline, return validated .docx |
/api/download/<job_id> |
GET | Download generated .docx |
/api/image/<session_id>/<filename> |
GET | Serve extracted image |
/api/key-status |
GET | Check API key and build type |
- No
exec()-- all config parsing usesast.literal_eval - Debug mode disabled by default
- 50 MB upload size limit
- Server-side error logging only
- Path traversal protection on image serving endpoint