digitalpalidictionary/dpd-pali-courses

DPD Pāḷi Courses

This repository contains the source materials for the Digital Pāḷi Dictionary (DPD) Pāḷi courses, transformed from original Google Docs into a modern, searchable static website.

Repository Contents

  • docs/: Markdown source files for all course materials.
    • bpc/: Beginner Pāḷi Course (BPC) lessons.
    • bpc_ex/: BPC Exercises.
    • bpc_key/: BPC Answer Keys.
    • ipc/: Intermediate Pāḷi Course (IPC) lessons.
    • ipc_ex/: IPC Exercises.
    • ipc_key/: IPC Answer Keys.
  • identity/: DPD CSS and JavaScript assets used for the website and document generation.
  • scripts/: Regularly used maintenance and generation scripts (runnable with uv run).
  • tools/: Python modules used by scripts (imports only).
  • mkdocs.yaml: Configuration for the MkDocs static site generator.

Static Site Generation

The website is built using MkDocs with the Material for MkDocs theme. It serves as the primary way to interact with the course materials.

Document Generation (PDF & DOCX)

In addition to the static website, this project can generate high-quality PDF and Word (.docx) documents for offline study and editing. These documents are generated directly from the same Markdown source files used for the website, ensuring consistency across all formats.

System Dependencies

To generate documents locally, you must install the following system-level dependencies:

macOS (using Homebrew):

# For PDF generation: install WeasyPrint together with its native libraries
brew install weasyprint
# or install only the native libraries that WeasyPrint depends on
brew install pango libffi

# For DOCX generation (Pandoc)
brew install pandoc

Linux (Ubuntu/Debian):

# For PDF generation
sudo apt-get install python3-pip python3-cffi python3-brotli libpango-1.0-0 libpangoft2-1.0-0

# For DOCX generation
sudo apt-get install pandoc

Generating Documents Locally

  1. Install Python Dependencies: Ensure your local environment is up to date:
    uv sync
  2. Run the Scripts:
    # Generate PDFs
    uv run python scripts/generate_pdfs.py
    
    # Generate DOCX
    uv run python scripts/generate_docx.py
    The generated files will be placed in the pdf_exports/ and docx_exports/ directories respectively.

Maintenance & Generation Scripts

All scripts are located in the scripts/ directory and can be run using uv run python scripts/<script_name>.py (for Python scripts) or uv run bash scripts/<script_name>.sh (for shell scripts).

Quick Start Commands

Build the website locally:

./scripts/cl/pali-build-website

Generate PDFs and DOCX documents:

./scripts/cl/pali-build-pdf-doc

Content Verification & Validation

  • verify_sources.py: Interactive source verification tool that compares original (old) DOCX materials against generated DOCX and PDF outputs. Helps identify discrepancies between source and generated formats.

    • Usage: uv run python scripts/verify_sources.py
  • verify_pdf_content.py: Extracts text from generated PDFs and compares with source Markdown to ensure no data loss during PDF generation.

    • Usage: uv run python scripts/verify_pdf_content.py
  • verify_docx_content.py: Verification tool for DOCX content integrity. Compares text extracted from generated Word documents with source Markdown.

    • Usage: uv run python scripts/verify_docx_content.py
  • verify_numbering.py: Verifies consistency of sentence numbering (footnotes, lists) across Markdown, website, and PDF. Identifies discrepancies where numbering resets or differs between formats.

    • Usage: uv run python scripts/verify_numbering.py
  • compare_md_sources.py: Compares current Markdown files against an older Git commit to detect potential data loss or regressions in course content.

    • Usage: uv run python scripts/compare_md_sources.py [--commit <hash>]
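The content checks above share one idea: normalize both texts, then look for words from the source that are absent from the output. A minimal sketch of that idea follows. It is not the code in verify_pdf_content.py or verify_docx_content.py. The function names and the list of stripped Markdown characters are assumptions, and text extraction from the PDF or DOCX file is left out.

```python
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Return lower-cased word tokens with basic Markdown markup removed."""
    # NFC makes precomposed and decomposed diacritics (as in "Pāḷi") compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace common Markdown markup characters with spaces (assumed list).
    text = re.sub(r"[#*_`>|\[\]()]", " ", text)
    return text.lower().split()


def missing_words(source_md: str, extracted: str) -> list[str]:
    """List words from the Markdown source that never occur in the extracted text."""
    seen = set(normalize(extracted))
    missing: list[str] = []
    for word in normalize(source_md):
        if word not in seen and word not in missing:
            missing.append(word)
    return missing
```

An empty result suggests that no words were lost. It does not show that word order or numbering survived, which is why a separate numbering check is still useful.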

Document Generation

  • generate_pdfs.py: Generates high-quality PDF course materials from Markdown source files using WeasyPrint. Also generates pdf_exports/vocab.pdf and pdf_exports/abbreviations.pdf.

    • Usage: uv run python scripts/generate_pdfs.py
  • generate_docx.py: Generates Word (.docx) documents from Markdown source using Pandoc. Maintains visual parity with PDF output for offline study. Also generates docx_exports/vocab.docx and docx_exports/abbreviations.docx.

    • Usage: uv run python scripts/generate_docx.py

Content Cleanup & Maintenance

  • renumber_footnotes.py: Renumbers footnotes sequentially across all files in a course folder. The counter starts at 1 and continues across files in course order, correcting duplicate numbers and out-of-order references automatically.

    • Usage: uv run python scripts/renumber_footnotes.py [--dry-run]
  • check_renumber.py: Detects and corrects numbering inconsistencies in Pāḷi sentence lists. Supports dry-run and automatic re-numbering of exercises and answer keys.

    • Usage: uv run python scripts/check_renumber.py [--dry-run]
  • clean_dead_links.py: Finds and removes dead links in Markdown files. Specifically targets list items in index files that link to removed .md files.

    • Usage: uv run python scripts/clean_dead_links.py
  • fix_heading_hierarchy.py: Normalizes heading levels across all Markdown files. Converts bolded top lines to H1 headings and ensures no heading levels are skipped (e.g., # to ### becomes # to ##).

    • Usage: uv run python scripts/fix_heading_hierarchy.py
  • fixing_tables.py: Performs automated cleanup of Markdown tables. Normalizes cell padding, standardizes separator rows, and strips unnecessary bolding from footnote definitions.

    • Usage: uv run python scripts/fixing_tables.py
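The cross-file counter described for renumber_footnotes.py can be sketched as follows. This is an illustration only, not the script's implementation. It assumes footnotes use the `[^label]` Markdown syntax for both references and definitions, and the function name `renumber` is hypothetical. It works on in-memory (name, text) pairs so it never touches files.

```python
import re

# Matches both references "[^3]" and the label part of definitions "[^3]: text".
FOOTNOTE = re.compile(r"\[\^([^\]]+)\]")


def renumber(files: list[tuple[str, str]], start: int = 1) -> list[tuple[str, str]]:
    """Renumber footnote labels across (name, text) pairs given in course order."""
    counter = start
    result = []
    for name, text in files:
        mapping: dict[str, int] = {}  # old label -> new number, reset for each file

        def replace(match: re.Match) -> str:
            nonlocal counter
            old = match.group(1)
            if old not in mapping:
                # First time this label is seen in the file: take the next number.
                mapping[old] = counter
                counter += 1
            return f"[^{mapping[old]}]"

        result.append((name, FOOTNOTE.sub(replace, text)))
    return result
```

Labels are numbered in order of first appearance, and the counter carries over from one file to the next. This sketch does not separate two distinct footnotes that share the same label within one file, so that case would need extra handling.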

Site & Metadata Generation

  • generate_mkdocs_yaml.py: Helper script to update mkdocs.yaml based on course folder structure. Automatically generates the navigation section using headings from Markdown files.

    • Usage: uv run python scripts/generate_mkdocs_yaml.py
  • generate_indexes.py: Generates index.md pages for course categories, creating a Table of Contents based on individual lesson headings.

    • Usage: uv run python scripts/generate_indexes.py
  • update_css.py: Synchronizes CSS variables from source configurations to the Identity stylesheet directory.

    • Usage: uv run python scripts/update_css.py
  • vocab_abbrev_pali_course.py (in dpd-db): Generates Markdown vocabulary and abbreviation reference pages from the DPD database. These pages are published in the "Reference" section of the website.

    • Usage: cd ../dpd-db && uv run python scripts/export/vocab_abbrev_pali_course.py
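Building navigation entries from lesson headings, as generate_mkdocs_yaml.py and generate_indexes.py are described as doing, might look roughly like this. The sketch assumes each lesson's title is its first level-1 heading and that index.md is excluded. The names `first_heading` and `build_nav` are hypothetical, and the real scripts may order or label entries differently.

```python
import re
from pathlib import Path

# A level-1 heading: a single "#" followed by spaces or tabs, then the title.
H1 = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def first_heading(markdown: str, fallback: str) -> str:
    """Return the first H1 title, or the fallback when the file has none."""
    match = H1.search(markdown)
    return match.group(1) if match else fallback


def build_nav(docs_dir: Path, folder: str) -> list[dict[str, str]]:
    """Return MkDocs-style nav entries ({title: relative path}) for one folder."""
    entries = []
    # Plain lexicographic order; "10.md" would sort before "2.md".
    for path in sorted((docs_dir / folder).glob("*.md")):
        if path.name == "index.md":
            continue  # assumed to be a generated page, so it is not listed
        title = first_heading(path.read_text(encoding="utf-8"), path.stem)
        entries.append({title: f"{folder}/{path.name}"})
    return entries
```

Calling `build_nav(Path("docs"), "bpc")` would return a list that could be written under the nav key of mkdocs.yaml.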

Pre-Processing Workflows

Pre-processing scripts run a series of checks and corrections before building the website or documents:

  • web_preprocessing.sh: Runs all pre-processing steps required before building the MkDocs website (generates metadata, renumbers content, cleans links, updates CSS).

    • Usage: uv run bash scripts/web_preprocessing.sh
  • pdf_preprocessing.sh: Runs pre-processing steps required before generating PDF and DOCX documents.

    • Usage: uv run bash scripts/pdf_preprocessing.sh
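A pre-processing workflow of this kind is essentially an ordered list of scripts that stops at the first failure. The sketch below shows that pattern in Python. The step list and its order are assumptions inferred from the description of web_preprocessing.sh and from the script names in this README. They are not copied from the shell script, which may run different steps.

```python
import subprocess

# Assumed step order; the real web_preprocessing.sh may differ.
WEB_STEPS = [
    "generate_mkdocs_yaml.py",
    "generate_indexes.py",
    "renumber_footnotes.py",
    "check_renumber.py",
    "clean_dead_links.py",
    "update_css.py",
]


def build_commands(steps: list[str]) -> list[list[str]]:
    """Turn script names into 'uv run python scripts/<name>' argument lists."""
    return [["uv", "run", "python", f"scripts/{step}"] for step in steps]


def run_steps(steps: list[str]) -> None:
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(steps):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run after a failed one.
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to print or inspect the planned commands without running anything.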

Utilities & Legacy Tools

  • download_all_materials.py: Downloads the old source materials from Google Docs as a ZIP archive. Useful as a reference or backup when checking the Markdown source files against the original documents.
    • Usage: uv run python scripts/download_all_materials.py

Local Development

This project uses uv for Python dependency management.

  1. Install uv: Follow the instructions at astral.sh/uv.

  2. Install Dependencies:

    uv sync
  3. Build and Serve Website Locally: We recommend using the included unified build script, which handles metadata generation, renumbering, and starting the local server:

    ./scripts/cl/pali-build-website

    To run this from anywhere on your system, add scripts/cl/ to your PATH (e.g., fish_add_path /path/to/dpd-pali-courses/scripts/cl in Fish).

    The site will be available at http://127.0.0.1:8000.

  4. Generate Documents Locally:

    uv run python scripts/generate_pdfs.py
    uv run python scripts/generate_docx.py

Automated Deployment & Generation

The website, PDF volumes, and DOCX volumes are automatically updated whenever changes are pushed to the main branch.

  • Website Deployment: Handled by .github/workflows/deploy_site.yaml.
  • Document Generation: Handled by a unified workflow that generates both PDF and DOCX artifacts and publishes them to the latest GitHub Release.

Useful Links (Original Sources)

Note: These Google Docs are the original sources. They will be removed once the Markdown conversion and website are fully verified.

  • Project Management
  • Beginner Pāḷi Course
  • Intermediate Pāḷi Course