@datalab-to

Datalab

Developing state-of-the-art document intelligence models.

Pinned

  1. marker (Public)

    Convert PDFs to Markdown and JSON quickly and with high accuracy

    Python · 32.9k stars · 2.3k forks

  2. surya (Public)

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · 19.5k stars · 1.3k forks

  3. pdftext (Public)

    Extract structured text from PDFs quickly

    Python · 674 stars · 65 forks

  4. chandra (Public)

    OCR model that handles complex tables, forms, and handwriting with full layout

    Python · 5.1k stars · 572 forks

Repositories

Showing 10 of 10 repositories
  • sdk (Public)

    Python · MIT · 10 stars · 7 forks · 3 open issues · 14 open pull requests · Updated Mar 20, 2026
  • chandra (Public)

    OCR model that handles complex tables, forms, and handwriting with full layout

    Python · Apache-2.0 · 5,064 stars · 572 forks · 24 open issues · 5 open pull requests · Updated Mar 18, 2026
  • marker (Public)

    Convert PDFs to Markdown and JSON quickly and with high accuracy

    Python · GPL-3.0 · 32,895 stars · 2,276 forks · 333 open issues · 63 open pull requests · Updated Mar 10, 2026
  • surya (Public)

    OCR, layout analysis, reading order, table recognition in 90+ languages

    Python · GPL-3.0 · 19,484 stars · 1,334 forks · 139 open issues · 15 open pull requests · Updated Mar 1, 2026
  • datalab-on-prem (Public)

    Scripts to run Datalab's self-service on-prem container

    Shell · 5 stars · 1 fork · 0 open issues · 0 open pull requests · Updated Feb 12, 2026
  • pykatex (Public)

    Python · 2 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Feb 5, 2026
  • oss_container (Public)

    Python · 1 star · 1 fork · 0 open issues · 0 open pull requests · Updated Oct 2, 2025
  • inference-mirror

    Python · 3 stars · 1 fork · 0 open issues · 1 open pull request · Updated Aug 13, 2025
  • docext (Public)

    An on-premises, OCR-free toolkit for unstructured data extraction, Markdown conversion, and benchmarking (https://idp-leaderboard.org/)

    Python · Apache-2.0 · 9 stars · 4 forks · 0 open issues · 0 open pull requests · Updated Jun 18, 2025
  • pdftext (Public)

    Extract structured text from PDFs quickly

    Python · Apache-2.0 · 674 stars · 65 forks · 12 open issues · 6 open pull requests · Updated Jun 11, 2025

Top languages

Python, Shell
