Inspiration
Technical documents are everywhere — engineering specs, student textbooks, patent filings, maintenance manuals — and they all share one fatal flaw: photographs that should be engineering diagrams.
We watched classmates study from blurry, low-resolution scans where critical details were unreadable. We saw patent attorneys spend \$500–\$2,000 per illustration hiring CAD technicians to manually redraw photographs. Across the \$12B+ technical documentation market, no tool automates the transformation of in-document photographs into clean, precise line art.
Then we realized something deeper: converting a photograph to 2-color line art isn't just aesthetic — it's mathematically effective compression. A color photograph stores $\sim 24$ bits per pixel across millions of pixels. Line art reduces that to $1$ bit per pixel (black or white), eliminating color channels, texture, and noise entirely. The result:
$$\text{Size Reduction} = 1 - \frac{\text{Line Art Size}}{\text{Photo Size}} \approx 70\text{–}90\%$$
All structural information — edges, shapes, annotations — is preserved. Only the data that adds weight but not meaning is removed. That's not lossy compression. That's visual intelligence.
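The arithmetic behind that claim can be sketched in a few lines. Note that the raw bits-per-pixel ratio (1/24) is an upper bound on savings; the quoted 70–90% figure reflects real encoded files, where codecs like PNG already compress the photograph somewhat. The dimensions below are illustrative:

```typescript
// Back-of-envelope size model: 24-bit RGB photo vs. 1-bit line art,
// counting raw bits before any entropy coding. Illustrative only —
// actual on-disk savings depend on the codec and image content.
function rawBits(width: number, height: number, bitsPerPixel: number): number {
  return width * height * bitsPerPixel;
}

function sizeReduction(photoBits: number, lineArtBits: number): number {
  return 1 - lineArtBits / photoBits;
}

const w = 2000, h = 1500;
const photo = rawBits(w, h, 24);   // uncompressed RGB
const lineArt = rawBits(w, h, 1);  // 1-bit black/white
console.log(sizeReduction(photo, lineArt)); // ≈ 0.958 for raw bitmaps
```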
What It Does
LineForge transforms any document into a clean, professional technical publication in one upload.
Upload a PDF, Word document, Markdown file, or HTML page. LineForge:
- Extracts every embedded image
- Converts each photograph into CAD-style line art via Google Gemini 3 Pro
- Generates technical captions describing each image
- Reassembles the complete document — preserving layout, structure, and formatting
- Optionally vectorizes line art to SVG via potrace for infinite-resolution scaling
Beyond documents, LineForge offers three more AI-powered tools:
- Pipeline Builder — Drag-and-drop composable AI blocks: chain line art → 3D model → cinematic video from a single image
- Image2STL — Generate 3D printable models (Meshy.ai) with an interactive Three.js viewer
- Video Guide — Create cinematic video walkthroughs from a single image using Google Veo 3.1
How We Built It
Stack: Next.js 16, React 19, TypeScript, Tailwind CSS
AI Services:
- Gemini 3 Pro (`gemini-3-pro-image-preview`) with `responseModalities: ["IMAGE", "TEXT"]` for simultaneous line art generation and technical captioning
- Veo 3.1 (`veo-3.1-generate-preview`) for async video synthesis with operation polling
- Meshy.ai for image-to-3D model generation
Document Processing — Four format-specific parsers:
- `pdf-lib` for PDF extraction and reassembly
- `mammoth` for DOCX/Word documents
- `remark`/`unified` for Markdown AST manipulation
- `cheerio` for HTML DOM processing
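One way to route an upload to the right parser is a simple extension-to-format dispatch table. This is a hypothetical sketch — the real parsers are stubbed out and the function names are not LineForge's actual API:

```typescript
// Hypothetical format-dispatch sketch; pdf-lib, mammoth, remark/unified,
// and cheerio would each back one branch of this table.
type Format = "pdf" | "docx" | "md" | "html";

const extensionMap: Record<string, Format> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".md": "md",
  ".markdown": "md",
  ".html": "html",
  ".htm": "html",
};

function detectFormat(filename: string): Format {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  const format = extensionMap[ext];
  if (!format) throw new Error(`Unsupported format: ${ext || "(none)"}`);
  return format;
}

console.log(detectFormat("spec.PDF")); // → "pdf"
```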
Architecture Highlights:
- NDJSON streaming — Results stream back image-by-image so users see real-time progress instead of waiting for entire documents
- Typed pipeline blocks — Each block declares input/output types with compatibility validation at build time
- Resilience layer — Exponential backoff with
retryDelayheader parsing, 3 retries per Gemini API call - Image processing —
sharpfor raster manipulation,potracefor raster-to-SVG vectorization
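The typed-block idea can be sketched as a chain validator: each block declares what it consumes and produces, and a chain is valid only when adjacent types line up. Block names and data types here are illustrative, not the actual LineForge definitions:

```typescript
// Minimal sketch of typed pipeline blocks with compatibility validation.
type DataType = "image" | "svg" | "model3d" | "video";

interface Block {
  name: string;
  input: DataType;
  output: DataType;
}

// A chain is valid when each block's output type matches the next input.
function validatePipeline(blocks: Block[]): boolean {
  return blocks.every(
    (block, i) => i === 0 || blocks[i - 1].output === block.input
  );
}

const lineArt: Block = { name: "lineArt", input: "image", output: "image" };
const toModel: Block = { name: "image2stl", input: "image", output: "model3d" };
const toVideo: Block = { name: "videoGuide", input: "image", output: "video" };

console.log(validatePipeline([lineArt, toModel])); // → true
console.log(validatePipeline([toModel, toVideo])); // → false: model3d ≠ image
```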
Design System — Blueprint engineering aesthetic:
- Dark navy background (`#0a0e27`) with cyan accent lines
- JetBrains Mono typography
- SVG-based grid backgrounds
- `anime.js` crosshair targeting animations, technical drawing loaders, and corner accent components
Challenges We Faced
Gemini rate limits during hackathon crunch. With dozens of images per document, we hit 429s fast. We built an exponential backoff system that parses retryDelay from Gemini's response headers and waits the exact specified duration — not a fixed delay, but the server's own recommendation. This made our pipeline resilient even under heavy load.
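The strategy above can be sketched as a small retry wrapper. The `"7s"`-style seconds string and the `retry-delay` header name are assumptions for illustration; adapt them to wherever the actual Gemini error payload carries its `retryDelay`:

```typescript
// Sketch of server-directed backoff: honor the API's own retry hint
// when present, fall back to exponential backoff otherwise.
function parseRetryDelayMs(retryDelay: string | null): number | null {
  if (!retryDelay) return null;
  const match = /^(\d+(?:\.\d+)?)s$/.exec(retryDelay.trim());
  return match ? Math.ceil(parseFloat(match[1]) * 1000) : null;
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(
  call: () => Promise<Response>,
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await call();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Prefer the server's recommendation over a guessed delay.
    const serverMs = parseRetryDelayMs(res.headers.get("retry-delay"));
    await sleep(serverMs ?? 1000 * 2 ** attempt);
  }
}
```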
Preserving document structure across 4 formats. Each format stores images differently — PDFs embed binary streams, DOCX uses relationship references, Markdown uses file paths, HTML uses src attributes. We had to write format-specific extraction and reinsertion logic that swaps images without corrupting the surrounding document structure.
Potrace integration. The Node.js potrace wrapper had a broken image loader (Jimp compatibility issue). We bypassed it entirely by preprocessing images through sharp to raw bitmap buffers, then feeding those directly to potrace's tracing engine — effectively rewriting the integration layer.
NDJSON streaming across Next.js API routes. Streaming newline-delimited JSON from server to client through Next.js required careful handling of ReadableStream, chunked encoding, and progressive parsing on the frontend to update the UI as each image completed.
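The client-side half of that can be sketched as an incremental parser: buffer incoming chunks, split on newlines, parse each complete line, and hold back the trailing partial line until the next chunk. The record fields below are illustrative:

```typescript
// Incremental NDJSON parser: chunk boundaries rarely align with record
// boundaries, so keep the unfinished tail in a buffer between chunks.
function makeNdjsonParser<T>(onRecord: (record: T) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // retain the trailing partial line
    for (const line of lines) {
      if (line.trim()) onRecord(JSON.parse(line) as T);
    }
  };
}

// Feed it chunks as they arrive (e.g. from a fetch ReadableStream reader);
// note the second record is split across two chunks.
const results: unknown[] = [];
const push = makeNdjsonParser((r) => results.push(r));
push('{"image":1,"status":"done"}\n{"image"');
push(':2,"status":"done"}\n');
console.log(results.length); // → 2
```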
Learning 8 technologies in 24 hours. None of us had used Gemini image generation, Veo 3.1, NDJSON streaming, pdf-lib, mammoth, potrace, Three.js, or anime.js before this hackathon. Every feature required simultaneously learning the technology and building production code with it.
What We Learned
We integrated 8 new technologies in 24 hours:
- Google Gemini 3 Pro — `responseModalities` for combined image + text output
- Google Veo 3.1 — Asynchronous video generation with operation polling
- NDJSON streaming — Real-time progress for long-running document processing
- pdf-lib — PDF parsing, image extraction, and document reassembly
- mammoth — Word document processing while preserving formatting
- Pipeline architecture — Typed blocks with compatibility checking
- Three.js — Interactive 3D model rendering in the browser
- potrace + anime.js — SVG vectorization with engineering-style loading animations
The biggest takeaway: line art is a form of intelligent compression. By understanding what information matters in a technical image (edges, structure, annotations) and discarding what doesn't (color, texture, noise), you can reduce file size by $70$–$90\%$ while actually improving readability. That's not a trade-off — it's a free lunch.
Built With
- css
- gemini
- javascript
- meshy.ai
- next.js
- notebooklm
- potrace
- react
- tailwind
- typescript
- veo