eeseol/hwpx-converter

HwpxConverter

What is this project?

HwpxConverter is a Windows converter that parses HWPX (OWPML) documents and converts them to HTML. It uses Hancom’s public OWPML SDK libraries to traverse the document structure (sections/paragraphs/runs/tables, etc.) and produces HTML in a form that’s easy to inspect, debug, and feed into downstream pipelines.

Typical use cases:

  • HWPX → HTML conversion (primary goal)
  • Preprocessing for RAG / search pipelines (HTML/text extraction as an intermediate format)
  • Batch conversion tests for public-sector documents (tables/outlines/lists)

Key features

  • Text extraction based on Paragraph / TextRun traversal
  • Outline (Outline 1–10) → HTML tag mapping (e.g., h1–h6)
  • Table (<table>) layout + text rendering
  • List rendering policy: list numbers are not computed; lists are still emitted as <ol>, but CSS styles them to show bullets only
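
The outline mapping and the list policy above can be sketched in a few lines of C++. This is only an illustration, not the converter's actual code: the helper names, the `hwpx-list` class name, and the choice to clamp outline levels 7–10 to h6 are all assumptions made for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: map an outline level (1-10) to an HTML heading tag.
// Assumption: levels outside 1-6 are clamped, because HTML only defines h1-h6.
std::string headingTagForOutline(int outlineLevel) {
    const int level = std::max(1, std::min(outlineLevel, 6));
    return "h" + std::to_string(level);
}

// Hypothetical helper: emit list items inside <ol> without computing numbers.
// Items are assumed to be HTML-escaped already; "hwpx-list" is a made-up class name.
std::string renderList(const std::vector<std::string>& items) {
    std::string html = "<ol class=\"hwpx-list\">";
    for (const std::string& item : items) {
        html += "<li>" + item + "</li>";
    }
    html += "</ol>";
    return html;
}

// CSS rule that makes the <ol> display bullets instead of numbers.
const char* const kListCss = "ol.hwpx-list { list-style-type: disc; }";
```

Keeping `<ol>` in the markup preserves the fact that the source list was ordered, while the CSS rule avoids showing numbers that were never computed.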

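Sample document source: official publication of the National Police Agency

(Announcement of the Selection Plan for New Projects under the 2026 Future Policing Challenge Technology Development Program; original title: 2026년도 미래치안도전기술개발사업 신규과제 선정계획 공고)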


Support and limitations

  • Input: .hwpx only

  • Even with a .hwpx extension, some files may fail to open if they are non-standard HWPX or corrupted

    • Examples: legacy HWP renamed to .hwpx, institution-provided files with non-standard packaging, damaged archives
  • Currently targets Windows + Visual Studio 2022 (as the baseline environment)
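
One way to tell a renamed legacy HWP file from a real HWPX package before conversion is to look at the first bytes of the file: HWPX is a ZIP-based package, while legacy HWP 5.x files are OLE compound files. The sketch below shows that check; the enum and function names are hypothetical, and nothing here implies the converter performs this check itself.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical classification of a file's container format by its leading bytes.
enum class ContainerKind { Zip, OleCompound, Unknown };

ContainerKind sniffContainer(const unsigned char* data, std::size_t size) {
    // ZIP local file header signature: "PK\x03\x04".
    static const unsigned char kZip[] = {0x50, 0x4B, 0x03, 0x04};
    // OLE compound file signature (used by legacy binary HWP 5.x documents).
    static const unsigned char kOle[] = {0xD0, 0xCF, 0x11, 0xE0,
                                         0xA1, 0xB1, 0x1A, 0xE1};

    if (size >= sizeof(kZip) && std::memcmp(data, kZip, sizeof(kZip)) == 0) {
        return ContainerKind::Zip;
    }
    if (size >= sizeof(kOle) && std::memcmp(data, kOle, sizeof(kOle)) == 0) {
        return ContainerKind::OleCompound;
    }
    return ContainerKind::Unknown;
}
```

A `Zip` result only means the file is a ZIP archive; it can still be a damaged or non-standard package, so a failed conversion remains possible.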


1) Quick start (GitHub Releases)

If you want to run it without building from source, download the latest executable (HwpxConverter.exe) from GitHub Releases and run:

HwpxConverter.exe "input.hwpx" "output.html"
  • If a path or file name contains spaces, wrap it in quotes; otherwise the shell splits it into separate arguments.

2) Build on Windows (Visual Studio)

This project requires Hancom’s OWPML SDK libraries.

Build environment

  • Windows 11
  • Visual Studio 2022
  • Platform: Win32 (x86) (based on current project settings)

Prerequisite: Hancom OWPML SDK libraries

Prepare the SDK from Hancom’s public repo:

Build that repo to obtain the required libraries (e.g. Owpml.lib, OWPMLApi.lib, OWPMLUtil.lib) and headers, then place them in the layout this repo expects (include/, lib/), matching the include and library paths configured in the .vcxproj.

Build steps

  1. Open the solution (HwpxConverter.sln) in Visual Studio
  2. Configuration: Release
  3. Platform: Win32
  4. Build

A successful build produces Release/HwpxConverter.exe.


Usage

Command line

HwpxConverter.exe "InputFile.hwpx" "OutputFile.html"
  • If the input file is not .hwpx, the program prints an error and exits immediately.
  • If it is .hwpx but conversion fails, it prints guidance indicating the file may be non-standard or corrupted.
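
The argument handling described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the converter's source: the function and enum names are hypothetical, and treating the extension check as case-insensitive is a guess about the intended behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Case-insensitive check that `path` ends with ".hwpx".
// Assumption: "REPORT.HWPX" should be accepted just like "report.hwpx".
bool hasHwpxExtension(const std::string& path) {
    const std::string ext = ".hwpx";
    if (path.size() < ext.size()) return false;
    const std::size_t offset = path.size() - ext.size();
    for (std::size_t i = 0; i < ext.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(path[offset + i]);
        if (std::tolower(c) != ext[i]) return false;
    }
    return true;
}

// Hypothetical result codes for validating the two positional arguments.
enum class ArgCheck { Ok, WrongArgCount, NotHwpx };

// `args` holds the arguments after the program name: input path, output path.
ArgCheck checkArgs(const std::vector<std::string>& args) {
    if (args.size() != 2) return ArgCheck::WrongArgCount;
    if (!hasHwpxExtension(args[0])) return ArgCheck::NotHwpx;
    return ArgCheck::Ok;
}
```

A `main` function would print an error and return a non-zero exit code for anything other than `ArgCheck::Ok`, and only then hand the input path to the OWPML-based conversion.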

Testing

A practical local test layout:

  • test/cases/: input .hwpx files
  • test/expected/: expected output (HTML) or baseline outputs
  • test/out/: actual outputs generated by running the converter (recommended to gitignore)

Example:

HwpxConverter.exe "test/cases/table_only.hwpx" "test/out/table_only.html"

Minimum recommended set (3 docs):

  • Outline-only document
  • Table-only document
  • List-only document
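
With the layout above, a small checker can compare each generated file in test/out/ against its baseline in test/expected/. The helpers below are a sketch of that idea; their names are hypothetical, and the byte-for-byte comparison is an assumption (a real harness might normalize line endings or whitespace first).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: map "test/cases/name.hwpx" to "test/out/name.html".
std::string outputPathForCase(const std::string& casePath) {
    const std::size_t slash = casePath.find_last_of("/\\");
    std::string name =
        (slash == std::string::npos) ? casePath : casePath.substr(slash + 1);
    const std::size_t dot = name.rfind('.');
    if (dot != std::string::npos) name.erase(dot);  // drop the extension
    return "test/out/" + name + ".html";
}

// Read a whole file as raw bytes; returns false if it cannot be opened.
bool readFile(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// True only if both files exist and their contents are byte-identical.
bool matchesBaseline(const std::string& actualPath,
                     const std::string& expectedPath) {
    std::string actual;
    std::string expected;
    return readFile(actualPath, actual) &&
           readFile(expectedPath, expected) &&
           actual == expected;
}
```

For example, after converting test/cases/table_only.hwpx, calling `matchesBaseline("test/out/table_only.html", "test/expected/table_only.html")` reports whether the output still matches the stored baseline.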

Contributing (short)

  • Indentation: spaces
  • Encoding: UTF-8 recommended
  • Commit example: converter: add hwpx extension validation

License

See LICENSE for details.


Reference


Example
