Inspiration

Technical documents are everywhere — engineering specs, student textbooks, patent filings, maintenance manuals — and they all share one fatal flaw: photographs that should be engineering diagrams.

We watched classmates study from blurry, low-resolution scans where critical details were unreadable. We saw patent attorneys spend \$500–\$2,000 per illustration hiring CAD technicians to manually redraw photographs. Across the \$12B+ technical documentation market, no tool automates the transformation of in-document photographs into clean, precise line art.

Then we realized something deeper: converting a photograph to 2-color line art isn't just aesthetic — it's mathematically effective compression. A color photograph stores $\sim 24$ bits per pixel across millions of pixels. Line art reduces that to $1$ bit per pixel (black or white), eliminating color channels, texture, and noise entirely. The result:

$$\text{Size Reduction} = 1 - \frac{\text{Line Art Size}}{\text{Photo Size}} \approx 70\text{–}90\%$$

All structural information — edges, shapes, annotations — is preserved. Only the data that adds weight but not meaning is removed. That isn't indiscriminate lossy compression. That's visual intelligence.
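The arithmetic behind that figure can be sketched directly. This is a back-of-the-envelope calculation over raw pixel data; real PNG/SVG sizes depend on compression on both sides, which is why the observed range lands at 70–90% rather than the raw ratio.

```typescript
// Raw size reduction when going from 24-bit color to 1-bit line art.
// Assumes uncompressed pixel data; actual file sizes vary with encoding.
function sizeReduction(widthPx: number, heightPx: number): number {
  const photoBits = widthPx * heightPx * 24; // 8 bits × 3 RGB channels
  const lineArtBits = widthPx * heightPx * 1; // each pixel: black or white
  return 1 - lineArtBits / photoBits;
}

// A 2000×1500 photo: raw reduction is 23/24 ≈ 95.8%. Compressed formats
// (JPEG vs. PNG/SVG) narrow this to the 70–90% range quoted above.
console.log(sizeReduction(2000, 1500)); // ≈ 0.958
```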

What It Does

LineForge transforms any document into a clean, professional technical publication in one upload.

Upload a PDF, Word document, Markdown file, or HTML page. LineForge:

  1. Extracts every embedded image
  2. Converts each photograph into CAD-style line art via Google Gemini 3 Pro
  3. Generates technical captions describing each image
  4. Reassembles the complete document — preserving layout, structure, and formatting
  5. Optionally vectorizes line art to SVG via potrace for infinite-resolution scaling

Beyond documents, LineForge offers three more AI-powered tools:

  • Pipeline Builder — Drag-and-drop composable AI blocks: chain line art → 3D model → cinematic video from a single image
  • Image2STL — Generate 3D printable models (Meshy.ai) with an interactive Three.js viewer
  • Video Guide — Create cinematic video walkthroughs from a single image using Google Veo 3.1

How We Built It

Stack: Next.js 16, React 19, TypeScript, Tailwind CSS

AI Services:

  • Gemini 3 Pro (gemini-3-pro-image-preview) with responseModalities: ["IMAGE", "TEXT"] for simultaneous line art generation and technical captioning
  • Veo 3.1 (veo-3.1-generate-preview) for async video synthesis with operation polling
  • Meshy.ai for image-to-3D model generation
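The dual-modality Gemini call hinges on the request payload. The shape below follows `@google/genai` SDK conventions, but treat it as a sketch of the payload rather than a verified call; the prompt text is illustrative:

```typescript
// Assumed request shape for simultaneous line art + caption generation.
interface LineArtRequest {
  model: string;
  contents: Array<{ role: string; parts: Array<Record<string, unknown>> }>;
  config: { responseModalities: string[] };
}

function buildLineArtRequest(imageBase64: string, mimeType: string): LineArtRequest {
  return {
    model: "gemini-3-pro-image-preview",
    contents: [{
      role: "user",
      parts: [
        // The source photograph, inlined as base64.
        { inlineData: { mimeType, data: imageBase64 } },
        { text: "Redraw this photograph as clean CAD-style line art and write a one-sentence technical caption." },
      ],
    }],
    // Ask for both an image and text back in a single response.
    config: { responseModalities: ["IMAGE", "TEXT"] },
  };
}
```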

Document Processing — Four format-specific parsers:

  • pdf-lib for PDF extraction and reassembly
  • mammoth for DOCX/Word documents
  • remark/unified for Markdown AST manipulation
  • cheerio for HTML DOM processing

Architecture Highlights:

  • NDJSON streaming — Results stream back image-by-image so users see real-time progress instead of waiting for entire documents
  • Typed pipeline blocks — Each block declares input/output types with compatibility validation at build time
  • Resilience layer — Exponential backoff with retryDelay header parsing, 3 retries per Gemini API call
  • Image processing — sharp for raster manipulation, potrace for raster-to-SVG vectorization
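The typed-block idea can be shown in miniature. The data kinds and block names below are illustrative, not the real type system; the point is that a chain validates only when each block's output kind matches the next block's input kind:

```typescript
// Minimal sketch of typed pipeline blocks with compatibility checking.
type DataKind = "image" | "lineart" | "model3d" | "video";

interface Block { name: string; input: DataKind; output: DataKind }

// A chain is valid when each block's output feeds the next block's input.
function validateChain(blocks: Block[]): boolean {
  return blocks.every((b, i) => i === 0 || blocks[i - 1].output === b.input);
}

const chain: Block[] = [
  { name: "line-art", input: "image", output: "lineart" },
  { name: "image2stl", input: "lineart", output: "model3d" },
  { name: "video-guide", input: "model3d", output: "video" },
];
console.log(validateChain(chain)); // true
```

In the real builder this check runs as blocks are dragged into place, so incompatible chains are rejected before anything is sent to an API.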

Design System — Blueprint engineering aesthetic:

  • Dark navy background (#0a0e27) with cyan accent lines
  • JetBrains Mono typography
  • SVG-based grid backgrounds
  • anime.js crosshair targeting animations, technical drawing loaders, and corner accent components

Challenges We Faced

Gemini rate limits during hackathon crunch. With dozens of images per document, we hit 429s fast. We built an exponential backoff system that parses retryDelay from Gemini's response headers and waits the exact specified duration — not a fixed delay, but the server's own recommendation. This made our pipeline resilient even under heavy load.
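The retry policy described above reduces to a small function: honor the server's `retryDelay` when one is present, otherwise fall back to capped exponential backoff. The `"30s"`-style duration format is an assumption about how the delay is reported:

```typescript
// Parse a server-supplied retry delay like "30s" into milliseconds.
// Returns null when absent or unparseable, so the caller can fall back.
function parseRetryDelayMs(retryDelay: string | null): number | null {
  if (!retryDelay) return null;
  const m = /^(\d+(?:\.\d+)?)s$/.exec(retryDelay.trim()); // e.g. "30s", "2.5s"
  return m ? Number(m[1]) * 1000 : null;
}

function backoffMs(attempt: number, retryDelay: string | null): number {
  // The server's own recommendation wins; otherwise 1s, 2s, 4s, ... capped at 30s.
  return parseRetryDelayMs(retryDelay) ?? Math.min(1000 * 2 ** attempt, 30_000);
}
```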

Preserving document structure across 4 formats. Each format stores images differently — PDFs embed binary streams, DOCX uses relationship references, Markdown uses file paths, HTML uses src attributes. We had to write format-specific extraction and reinsertion logic that swaps images without corrupting the surrounding document structure.

Potrace integration. The Node.js potrace wrapper had a broken image loader (Jimp compatibility issue). We bypassed it entirely by preprocessing images through sharp to raw bitmap buffers, then feeding those directly to potrace's tracing engine — effectively rewriting the integration layer.

NDJSON streaming across Next.js API routes. Streaming newline-delimited JSON from server to client through Next.js required careful handling of ReadableStream, chunked encoding, and progressive parsing on the frontend to update the UI as each image completed.
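The tricky part of that progressive parsing is that stream chunks can split a JSON line anywhere, so the client must buffer until a newline completes each record. A minimal sketch of the client side, independent of any framework:

```typescript
// Progressive NDJSON parsing: buffer incoming chunks and emit a record
// each time a newline completes one JSON line.
function makeNdjsonParser<T>(onRecord: (rec: T) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    let nl: number;
    while ((nl = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onRecord(JSON.parse(line) as T);
    }
  };
}

// Records surface as soon as their line completes, even across chunk splits.
const seen: number[] = [];
const feed = makeNdjsonParser<{ index: number }>((r) => seen.push(r.index));
feed('{"index":0}\n{"ind');
feed('ex":1}\n');
console.log(seen); // [0, 1]
```

On the server side, each completed image result is serialized as one JSON line and enqueued into the response stream, which is what lets the UI update image-by-image.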

Learning 8 technologies in 24 hours. None of us had used Gemini image generation, Veo 3.1, NDJSON streaming, pdf-lib, mammoth, potrace, Three.js, or anime.js before this hackathon. Every feature required simultaneously learning the technology and building production code with it.

What We Learned

We integrated 8 new technologies in 24 hours:

  1. Google Gemini 3 Pro — responseModalities for combined image + text output
  2. Google Veo 3.1 — Asynchronous video generation with operation polling
  3. NDJSON streaming — Real-time progress for long-running document processing
  4. pdf-lib — PDF parsing, image extraction, and document reassembly
  5. mammoth — Word document processing while preserving formatting
  6. Pipeline architecture — Typed blocks with compatibility checking
  7. Three.js — Interactive 3D model rendering in the browser
  8. potrace + anime.js — SVG vectorization with engineering-style loading animations

The biggest takeaway: line art is a form of intelligent compression. By understanding what information matters in a technical image (edges, structure, annotations) and discarding what doesn't (color, texture, noise), you can reduce file size by $70$–$90\%$ while actually improving readability. That's not a trade-off — it's a free lunch.
