A 100% open-source, high-fidelity rendering engine for OpenXML documents. PanoramicData.Render acts as a virtual layout engine, calculating exact glyph positions, line breaks, and object anchors to produce a visually faithful representation of the document in SVG and PDF formats.
Fidelity goal: "Visually indistinguishable at normal zoom" — not pixel-identical with Microsoft Word, but close enough that a human viewer cannot tell the difference without overlaying outputs and zooming.
Primary use case: Server-side or client-side DOCX-to-SVG conversion for web-based document viewing. The library produces SVG strings and PDF streams; UI and visualization are the consumer's responsibility.
The library follows a Measure-then-Paint pipeline with strict separation between layout computation and output rendering.

    ┌─────────────┐    ┌──────────────────┐    ┌─────────────────┐    ┌──────────────┐    ┌───────────────┐
    │ OpenXML     │───>│ Style            │───>│ Layout Engine   │───>│ Render       │───>│ Output        │
    │ Ingestion   │    │ Resolution       │    │ (The "Brain")   │    │ Abstraction  │    │ Drivers       │
    └─────────────┘    └──────────────────┘    └─────────────────┘    └──────────────┘    └───────────────┘
           │                    │                       │                    │                    │
    Open-XML-SDK       Full OOXML cascade      SkiaSharp metrics      IRenderTarget       SVG / PDF
    DOM loading        Theme → Direct fmt      Knuth-Plass breaks     Drawing commands    One per page
Uses the Open-XML-SDK (DocumentFormat.OpenXml) to load document parts:
- Document body — paragraphs, tables, content controls
- Styles — style definitions with `basedOn` chains
- Theme — theme colors, theme fonts, tint/shade modifiers
- Numbering — multi-level list definitions, abstract numbering, number format overrides
- Settings — default tab stop, compatibility settings, document grid
- Headers/Footers — per-section, with first-page and odd/even variants
- Relationships — images, hyperlinks, OLE objects
- Embedded media — images stored in the package
A cascading engine that resolves the effective formatting for every text run. The full OOXML cascade order:
- Document Defaults (`w:docDefaults`) — base paragraph and run properties
- Theme — theme fonts (`majorFont`/`minorFont`), theme colors with tint/shade
- Numbering Styles — formatting inherited from list level definitions
- Table Styles — conditional formatting bands (first row, last column, etc.)
- Paragraph Style Hierarchy — `basedOn` chains (potentially 10+ levels deep)
- Character Style Hierarchy — `basedOn` chains for run-level styles
- Toggle Properties — `w:b`, `w:i`, `w:caps`, `w:smallCaps` toggle rather than set when the inherited value is already `true`
- Direct Formatting — explicit properties on the paragraph/run element
- Revision Overrides — tracked change formatting (rendered as final state; revision marks not displayed)
Key complexity: Toggle properties. `<w:b/>` on a run inside a bold character style turns bold off rather than reinforcing it. This applies to: bold, italic, caps, smallCaps, strike, dstrike, vanish, emboss, imprint, outline, shadow.
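The toggle behaviour across style levels can be sketched as an XOR fold. This is a minimal illustration, assuming each level of the style hierarchy carries an optional boolean for the property; `ToggleResolver` and `ResolveToggle` are hypothetical names, not part of the library, and the handling of direct formatting is deliberately left out.

```csharp
using System.Collections.Generic;

public static class ToggleResolver
{
    // Resolves a toggle property (e.g. bold) across style levels, ordered
    // from the base of the cascade to the most specific style.
    //   null  = the level does not mention the property
    //   true  = the level flips the inherited state
    //   false = the level leaves the inherited state unchanged
    public static bool ResolveToggle(IEnumerable<bool?> levels)
    {
        bool effective = false;
        foreach (bool? level in levels)
        {
            if (level == true)
            {
                effective = !effective; // XOR with the inherited value
            }
        }
        return effective;
    }
}
```

For example, `ResolveToggle(new bool?[] { true, true })` returns `false`: a bold style layered on an already bold style comes out non-bold.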
The computational core. Iterates through resolved blocks and computes exact positions.
Units: All internal calculations use twips (1/1440 inch) to match Word's internal precision. Conversion to output units (SVG px, PDF points) happens only at render time.
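The unit conversions follow directly from the definition of a twip. The sketch below assumes integer twip inputs and a DPI-based pixel conversion; the `TwipConverter` members shown here are illustrative and may not match the library's actual helper.

```csharp
public static class TwipConverter
{
    public const double TwipsPerInch = 1440.0;
    public const double TwipsPerPoint = 20.0; // 1440 twips per inch / 72 points per inch

    // PDF coordinates are expressed in points (1/72 inch).
    public static double ToPoints(int twips) => twips / TwipsPerPoint;

    // SVG px depends on the target DPI (96 by default).
    public static double ToPixels(int twips, double dpi = 96.0) => twips * dpi / TwipsPerInch;
}
```

A US Letter page width of 12240 twips (8.5 in) converts to 612 points, or 816 px at 96 DPI.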
Font Metrics: Uses SkiaSharp with HarfBuzz (SKShaper) for:
- Glyph measurement (advance widths, ink bounds)
- Complex script shaping (Arabic, Devanagari, Thai, etc.)
- Kerning pair adjustments
- Ligature substitution
Line Breaking: Implements the Knuth-Plass algorithm from day one:
- Considers the entire paragraph to minimize overall "badness"
- Supports hyphenation via TeX hyphenation patterns (optional)
- Handles justification by distributing space across glue items
- Produces line breaks that closely match Word's output on justified text
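To show what "considers the entire paragraph" means in practice, here is a simplified total-fit breaker in the spirit of Knuth-Plass. It is not the full algorithm: glue stretch and shrink, penalties, and hyphenation are omitted, and the cost is simply the squared leftover space on each line except the last. `TotalFitBreaker` and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TotalFitBreaker
{
    // Returns the index of the first word on each line.
    // wordWidths, spaceWidth and lineWidth share one unit (e.g. twips).
    public static List<int> BreakLines(IReadOnlyList<double> wordWidths, double spaceWidth, double lineWidth)
    {
        int n = wordWidths.Count;
        // best[i] = minimal cost of setting words i..n-1
        // next[i] = index of the first word on the following line
        var best = new double[n + 1];
        var next = new int[n + 1];
        best[n] = 0;

        for (int i = n - 1; i >= 0; i--)
        {
            best[i] = double.PositiveInfinity;
            double width = 0;
            for (int j = i; j < n; j++)
            {
                width += wordWidths[j] + (j > i ? spaceWidth : 0);
                bool overfull = width > lineWidth;
                if (overfull && j > i) break; // a lone overlong word still gets its own line

                double slack = Math.Max(0, lineWidth - width);
                // The last line is not justified, so its slack costs nothing.
                double lineCost = (j == n - 1) ? 0 : slack * slack;
                double total = lineCost + best[j + 1];
                if (total < best[i])
                {
                    best[i] = total;
                    next[i] = j + 1;
                }
                if (overfull) break;
            }
        }

        var starts = new List<int>();
        for (int i = 0; i < n; i = next[i]) starts.Add(i);
        return starts;
    }
}
```

With word widths `{ 3, 2, 2, 5 }`, a space width of 1 and a line width of 6, this returns line starts `[0, 1, 3]`. A greedy first-fit breaker would pack the first two words together and give `[0, 2, 3]`, leaving one very loose line; the whole-paragraph optimisation trades a slightly looser first line for a better overall result.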
Pagination: Determines page breaks based on:
- Page dimensions and margins (per section)
- Widow/orphan control
- Keep-with-next / keep-lines-together
- Section breaks (next page, continuous, odd/even)
- Fixed page break characters
- Footnote/endnote space reservation
An IRenderTarget interface that accepts drawing commands:
- `DrawText(glyphs, positions, font, color)` — positioned glyph run
- `DrawLine(from, to, stroke)` — line segment
- `DrawRect(rect, fill, stroke)` — rectangle (borders, backgrounds)
- `DrawImage(data, rect)` — raster image
- `DrawPath(path, fill, stroke)` — arbitrary vector path
- `PushClip(rect)` / `PopClip()` — clipping regions
- `SetHyperlink(rect, uri)` — clickable region
This abstraction decouples layout from output format — the layout engine emits drawing commands without knowing whether the target is SVG, PDF, or something else.
An opt-in pre-processing step that replaces cached field result text with dynamically computed values derived from the rendered layout. Activated by setting RenderOptions.FieldUpdate to a non-null FieldUpdateOptions instance.
DOCX files store field results as cached text. When a document is rendered without being opened in Word first, these cached values may be stale or absent (e.g., TOC page numbers pointing to the wrong pages, "Page X of Y" showing "1 of 1").
The field update engine uses an iterative convergence loop:
- Initial layout — the document is laid out with the existing (stale) cached field values
- Field computation — field values are recomputed from the layout results:
  - `PAGE` / `NUMPAGES` — from the page map
  - Document properties (`TITLE`, `AUTHOR`, `FILENAME`, etc.) — from package metadata
  - `SEQ` — sequential counter values per identifier
  - `TOC` — rebuilt from heading paragraphs and outline levels
  - `TOC \f` (Table of Figures) — rebuilt from Caption-style paragraphs
  - `PAGEREF` — resolved from bookmark-to-page map
  - `REF` — resolved from bookmark text content
- Convergence check — if all field values match the previous pass, stop
- Re-layout — if any value changed, the document model is updated in-memory and re-laid out
- Iteration cap — if `MaxIterations` is reached without convergence, log a warning and use the last computed values
Convergence is typically reached in ≤ 3 passes. The worst case (TOC expansion changes page numbers, which changes the TOC itself) is handled by the iteration cap.
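The convergence loop described above is a bounded fixed-point iteration. The following sketch assumes field values can be modelled as a string dictionary and that a single delegate performs "lay out, then recompute fields"; `FieldConvergence`, `Run` and `layoutAndCompute` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FieldConvergence
{
    // layoutAndCompute: lays out the document with the given field values and
    // returns the values recomputed from that layout (hypothetical delegate).
    // Returns the final values, the number of layout passes, and whether the
    // values stabilised before the iteration cap.
    public static (IReadOnlyDictionary<string, string> Values, int Passes, bool Converged) Run(
        IReadOnlyDictionary<string, string> cachedValues,
        Func<IReadOnlyDictionary<string, string>, IReadOnlyDictionary<string, string>> layoutAndCompute,
        int maxIterations)
    {
        if (maxIterations < 1) throw new ArgumentOutOfRangeException(nameof(maxIterations));

        var current = cachedValues;
        for (int pass = 1; pass <= maxIterations; pass++)
        {
            var computed = layoutAndCompute(current);
            if (SameValues(current, computed))
            {
                return (computed, pass, true);
            }
            current = computed; // the next pass re-lays out with the updated values
        }
        // Iteration cap reached: the caller should log a warning and use these values.
        return (current, maxIterations, false);
    }

    private static bool SameValues(
        IReadOnlyDictionary<string, string> a,
        IReadOnlyDictionary<string, string> b) =>
        a.Count == b.Count && a.All(kv => b.TryGetValue(kv.Key, out var v) && v == kv.Value);
}
```

In this model a document whose cached `NUMPAGES` is stale needs two passes: the first pass computes the correct value, and the second confirms that nothing changes when the layout is repeated with it.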
| Field | Source | Switches Supported |
|---|---|---|
| `PAGE` | Block-to-page map | — |
| `NUMPAGES` | Total page count | — |
| `TITLE`, `AUTHOR`, `SUBJECT`, `KEYWORDS`, `DESCRIPTION` | Core file properties | — |
| `FILENAME` | `RenderOptions.SourceFilename` | — |
| `SEQ` | Document-order counter | `\r N` (reset), `\h` (hidden) |
| `TOC` | Heading paragraphs + outline levels | `\o`, `\h`, `\n`, `\p`, `\t` |
| `TOC \f` | Caption-style paragraphs | — |
| `PAGEREF` | Bookmark-to-page map | — |
| `REF` | Bookmark text content | — |
SvgRenderer:
- Generates one SVG string per page
- Fonts optionally embedded as Base64 WOFF2 in `<style>` blocks
- Text positioned via `<text>` elements with explicit `x`/`y` per glyph run
- Hyperlinks emitted as `<a>` wrappers
- Images embedded as Base64 data URIs
PdfRenderer:
- Uses SkiaSharp's PDF backend (`SKDocument`)
- Generates a single PDF file with one page per document page
- Font embedding handled by SkiaSharp (note: no font subsetting or tagged PDF — known limitations)
Fonts are resolved in priority order:
- Fonts embedded in the DOCX (if present in the package)
- Configured font directories (via `RenderOptions.FontDirectories`)
- System font directories (platform-dependent defaults)

Supported font file formats:
- `.ttf` — TrueType fonts
- `.otf` — OpenType fonts
- `.ttc` — TrueType Collections (multiple faces per file, common for CJK)
When a requested font is not available:
- Check `RenderOptions.FontSubstitutions` for explicit mapping (e.g., `"Calibri" → "Liberation Sans"`)
- Fall back to `RenderOptions.FallbackFontFamily`
- If still unresolved, log a warning and use the first available sans-serif font
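The fallback chain above can be sketched as a short sequence of lookups. This is an illustration only: `FontFallback.Resolve`, the `available` set, and the `firstSansSerif` parameter are hypothetical, and the sketch assumes that a substitution or fallback family is only used if it is itself available.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class FontFallback
{
    // Picks the family name to use for `requested`.
    // `available` is the set of families that could be loaded;
    // `firstSansSerif` stands in for "the first available sans-serif font".
    public static string Resolve(
        string requested,
        ISet<string> available,
        IReadOnlyDictionary<string, string> substitutions,
        string? fallbackFamily,
        string firstSansSerif,
        Action<string> warn)
    {
        if (available.Contains(requested)) return requested;

        // 1. Explicit substitution mapping.
        if (substitutions.TryGetValue(requested, out var mapped) && available.Contains(mapped))
            return mapped;

        // 2. Configured fallback family.
        if (fallbackFamily is not null && available.Contains(fallbackFamily))
            return fallbackFamily;

        // 3. Last resort: warn and use a sans-serif font.
        warn($"Font '{requested}' unresolved; using '{firstSansSerif}'.");
        return firstSansSerif;
    }
}
```

With `"Calibri"` mapped to `"Liberation Sans"` and only Liberation Sans installed, a request for Calibri resolves to Liberation Sans without reaching the fallback family.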
Embedding fonts (SVG WOFF2, PDF) may violate font licenses. This is the caller's responsibility. The library provides RenderOptions.EmbedFonts to control this behaviour.

    /// <summary>
    /// Configuration for document rendering.
    /// </summary>
    public class RenderOptions
    {
        /// <summary>Directories to search for font files.</summary>
        public List<string> FontDirectories { get; set; }

        /// <summary>Explicit font name substitutions (key=requested, value=replacement).</summary>
        public Dictionary<string, string> FontSubstitutions { get; set; }

        /// <summary>Font family to use when no match is found.</summary>
        public string FallbackFontFamily { get; set; }

        /// <summary>Target DPI for SVG output (default: 96).</summary>
        public double TargetDpi { get; set; }

        /// <summary>Whether to embed fonts in SVG output as WOFF2 (default: false).</summary>
        public bool EmbedFonts { get; set; }

        /// <summary>Whether to embed images as data URIs in SVG (default: true).</summary>
        public bool EmbedImages { get; set; }

        /// <summary>Optional page range to render (null = all pages).</summary>
        public Range? PageRange { get; set; }

        /// <summary>Optional field update configuration (null = disabled, fields render cached values).</summary>
        public FieldUpdateOptions? FieldUpdate { get; set; }

        /// <summary>Original filename for FILENAME field substitution.</summary>
        public string? SourceFilename { get; set; }
    }

    /// <summary>
    /// Configuration for the field update engine.
    /// </summary>
    public class FieldUpdateOptions
    {
        /// <summary>Update PAGE and NUMPAGES fields (default: true).</summary>
        public bool UpdatePageFields { get; set; } = true;

        /// <summary>Update document property fields (default: true).</summary>
        public bool UpdateDocumentProperties { get; set; } = true;

        /// <summary>Rebuild Table of Contents fields (default: true).</summary>
        public bool UpdateTableOfContents { get; set; } = true;

        /// <summary>Rebuild Table of Figures fields (default: true).</summary>
        public bool UpdateTableOfFigures { get; set; } = true;

        /// <summary>Update SEQ sequence number fields (default: true).</summary>
        public bool UpdateSequenceFields { get; set; } = true;

        /// <summary>Update PAGEREF and REF cross-reference fields (default: true).</summary>
        public bool UpdateCrossReferences { get; set; } = true;

        /// <summary>Maximum convergence iterations (default: 3, must be >= 1).</summary>
        public int MaxIterations { get; set; } = 3;
    }

    /// <summary>
    /// Main entry point for rendering DOCX documents.
    /// </summary>
    public class DocxRenderer
    {
        public DocxRenderer(RenderOptions options);
        public DocxRenderer(RenderOptions options, ILogger logger);

        public Task<RenderResult> RenderAsync(
            Stream docxStream,
            CancellationToken cancellationToken = default);
    }

    /// <summary>
    /// Result of rendering a document.
    /// </summary>
    public class RenderResult
    {
        public IReadOnlyList<RenderedPage> Pages { get; }

        /// <summary>Field update diagnostics (null if FieldUpdate was not enabled).</summary>
        public FieldUpdateResult? FieldUpdateResult { get; }

        public Task ToPdfAsync(
            Stream output,
            CancellationToken cancellationToken = default);
    }

    /// <summary>
    /// Diagnostics from the field update engine.
    /// </summary>
    public class FieldUpdateResult
    {
        /// <summary>Number of layout passes required for convergence.</summary>
        public int IterationsRequired { get; }

        /// <summary>Field types that were updated (e.g., "TOC", "PAGE", "PAGEREF").</summary>
        public IReadOnlyList<string> UpdatedFields { get; }
    }

    /// <summary>
    /// A single rendered page.
    /// </summary>
    public class RenderedPage
    {
        public double WidthPoints { get; }
        public double HeightPoints { get; }
        public string ToSvg();
    }

- Immutable results. `RenderResult` and `RenderedPage` are immutable once produced.
- Async throughout. I/O-bound operations (reading streams, writing PDF) are async with `CancellationToken`.
- No global state. `DocxRenderer` is stateless after construction; safe to reuse across calls.
- Logging via `ILogger`. No `Console.Write` or `Trace`; structured logging only.
| Metric | Target |
|---|---|
| Simple 1-page document | < 500ms |
| 50-page business report | < 10s |
| 500-page document | < 120s |
| Throughput (concurrent) | Linear scaling up to CPU core count |
- Peak memory should not exceed 3× the DOCX file size for text-heavy documents
- Image-heavy documents may use more; images are streamed where possible rather than buffered entirely
- No memory leaks on repeated renders (verified via long-running tests)
- `DocxRenderer` is thread-safe: multiple documents can render concurrently
- `RenderResult` is immutable and safe to read from multiple threads
- Font caches are shared (thread-safe) across renders for efficiency
- Malformed DOCX files: Best-effort rendering with warnings logged, not exceptions thrown
- Missing fonts: Substitution with fallback + logged warning
- Unsupported features: Rendered as empty space or placeholder with logged warning; never crashes
- Corrupt images: Replaced with a placeholder rectangle
CancellationToken is threaded through all pipeline stages. Cancellation is checked:
- Between pages during pagination
- Between paragraphs during layout
- During font loading
- During output generation
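A cooperative cancellation check between work items might look like the sketch below. It assumes paragraphs are processed sequentially and uses the standard `CancellationToken.ThrowIfCancellationRequested` call; `CancellableLayout` and `layoutOne` are hypothetical stand-ins for the real pipeline stage.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CancellableLayout
{
    // Lays out paragraphs one at a time, observing the token between items.
    // Returns the number of paragraphs processed; throws
    // OperationCanceledException if cancellation is requested first.
    public static int LayoutParagraphs(
        IEnumerable<string> paragraphs,
        Action<string> layoutOne,
        CancellationToken token)
    {
        int done = 0;
        foreach (var paragraph in paragraphs)
        {
            token.ThrowIfCancellationRequested();
            layoutOne(paragraph);
            done++;
        }
        return done;
    }
}
```

Checking between items rather than inside them keeps each paragraph's layout atomic, at the cost of cancellation latency bounded by the slowest single paragraph.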
| Feature | Status |
|---|---|
| `.doc` (binary Word format) | Never supported |
| Macro execution (`.docm`) | Macros stripped; content rendered |
| Document editing / round-trip | Read-only rendering only |
| HTML output | Not planned |
| CLI tool | Not planned (library only) |
| SmartArt | Future phase — best-effort fallback image if available |
| OLE embedded objects | Future phase — best-effort fallback image if available |
| Revision marks / comments display | Renders final document state only |
| Accessible / tagged PDF | Known SkiaSharp limitation; not in scope |
| PDF/A compliance | Known SkiaSharp limitation; not in scope |
| Font subsetting for PDF | Known SkiaSharp limitation; not in scope |
| Package | Purpose | License |
|---|---|---|
| `DocumentFormat.OpenXml` | OOXML parsing | MIT |
| `SkiaSharp` | Font metrics, image processing, PDF backend | MIT |
| `SkiaSharp.HarfBuzz` | Complex script shaping, kerning | MIT |
| `Microsoft.Extensions.Logging.Abstractions` | Structured logging | MIT |
All dependencies are MIT-licensed, consistent with this project's MIT license.
- StyleResolverTests — Verify cascade resolution: toggle properties, basedOn chains, theme inheritance
- TwipConverterTests — Ensure rounding errors don't accumulate over long documents
- KnuthPlassTests — Verify line break positions against known-good outputs
- FontResolverTests — Fallback chains, substitution mappings, missing font handling
Deterministic tests that verify computed positions:
- "Given this paragraph with these styles and this page width, line breaks occur at word indices [X, Y, Z]"
- "Given this table with these column widths, cell (2,3) is positioned at (x, y) with dimensions (w, h)"
These are fast, run without image comparison, and are the primary diagnostic tool.
Image comparison for end-to-end fidelity:
- Baseline: Reference PNGs generated from a controlled Word version (pinned, documented)
- Test: PanoramicData.Render generates SVG → rasterized to PNG at 150 DPI
- Comparison: Perceptual diff (not raw pixel comparison) with configurable threshold
- Threshold: Tests fail if perceptual difference exceeds a defined threshold per test document
- Tool: A perceptual diff library (e.g., pixelmatch-style algorithm) to avoid false positives from anti-aliasing
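The threshold check can be illustrated with a much simpler comparison than a true perceptual diff. The sketch below is a stand-in, not a pixelmatch implementation: it counts pixels whose largest per-channel difference exceeds a tolerance, which absorbs small anti-aliasing differences. `ImageDiff` and its parameters are hypothetical, and the buffers are assumed to be tightly packed RGBA of equal size.

```csharp
using System;

public static class ImageDiff
{
    // Fraction (0..1) of pixels whose largest per-channel difference exceeds `tolerance`.
    // Both buffers are tightly packed RGBA (4 bytes per pixel) with equal dimensions.
    public static double DifferingFraction(byte[] baseline, byte[] candidate, int tolerance)
    {
        if (baseline.Length != candidate.Length || baseline.Length % 4 != 0)
            throw new ArgumentException("Buffers must be RGBA and the same size.");

        int pixels = baseline.Length / 4;
        if (pixels == 0) return 0.0;

        int differing = 0;
        for (int p = 0; p < pixels; p++)
        {
            int maxDelta = 0;
            for (int c = 0; c < 4; c++)
            {
                int delta = Math.Abs(baseline[p * 4 + c] - candidate[p * 4 + c]);
                if (delta > maxDelta) maxDelta = delta;
            }
            if (maxDelta > tolerance) differing++;
        }
        return (double)differing / pixels;
    }

    // A test passes when the differing fraction stays within the per-document threshold.
    public static bool Passes(byte[] baseline, byte[] candidate, int tolerance, double maxFraction) =>
        DifferingFraction(baseline, candidate, tolerance) <= maxFraction;
}
```

A per-document `maxFraction` lets feature-specific test documents carry tighter or looser limits, matching the configurable threshold described above.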
A curated set of .docx files covering:
- Basic text formatting (bold, italic, sizes, colors)
- Paragraph alignment and indentation
- Multi-level numbered and bulleted lists
- Tables (simple, merged cells, nested, auto-fit)
- Headers, footers, page numbers
- Sections with different page sizes/orientations
- Inline and floating images
- Tab stops and leaders
- Footnotes and endnotes
- Columns
- Watermarks
- RTL text
- Complex scripts (Arabic, CJK)
Each test document is small and tests one feature to keep failures diagnostic.