
# Markdown Library Programmer's Guide

This guide provides an overview and usage instructions for the io.github.edadma.markdown library, a Scala implementation of the CommonMark specification with extensions like tables.

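## Installation

Add the following dependency to your `build.sbt`:

```scala
libraryDependencies += "io.github.edadma" %%% "markdown" % "0.0.1"
```

The library is cross-built for the JVM, Scala.js, and Scala Native. The `%%%` operator is provided by the Scala.js and Scala Native sbt plugins. In a JVM-only build, use `%%` instead.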

## Basic Usage

### Parsing Markdown

The simplest way to parse Markdown into an AST:

```scala
import io.github.edadma.markdown._

// Parse Markdown into a Document AST
val document = parseDocumentContent("# Hello, *world*!")
```

If you also need the link reference definitions:

```scala
// Get both the document and its link reference definitions
val (doc, linkRefs) = parseDocumentContentWithRefs("# Hello\n\n[ref]: /url")
```

### Rendering to HTML

To render Markdown as HTML:

```scala
import io.github.edadma.markdown._

// Parse and render in one step
val html = renderToHTML("# Hello, *world*!")

// Or render an existing Document (a distinct name avoids redefining `html`)
val document = parseDocumentContent("# Hello, *world*!")
val htmlFromDoc = renderToHTML(document)
```

### Extracting Headers

Collect all headings, for example to build a table of contents:

```scala
val document = parseDocumentContent("# Title\n## Section\n### Subsection")
val headers = extractHeaders(document) // List[(Int, String)] of (level, text)

// Print the headings, indented by one tab per level below 1
headers.foreach { case (level, text) =>
  println(s"${"\t" * (level - 1)}$text")
}
```
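`extractHeaders` returns plain `(level, text)` pairs, so you can turn them into a nested HTML table of contents without further library calls. The sketch below is minimal and makes several assumptions. `tocHtml`, `slug` and `escapeHtml` are hypothetical helpers, not part of the library. The slug rule is an arbitrary choice. A heading that skips a level, such as 1 followed by 3, still opens only one nested list.

```scala
object TocSketch {
  // Minimal HTML escaping for the link text (stand-in for the library's escapeXml)
  private def escapeHtml(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;")

  // Hypothetical slug rule: lowercase, runs of other characters become "-", trim dashes
  def slug(text: String): String =
    text.toLowerCase.replaceAll("[^a-z0-9]+", "-").replaceAll("^-+|-+$", "")

  /** Nested <ul> table of contents from (level, text) pairs, e.g. the result of extractHeaders. */
  def tocHtml(headers: List[(Int, String)]): String = {
    val sb = new StringBuilder
    var open: List[Int] = Nil // heading levels of the currently open <ul>s, innermost first

    headers.foreach { case (level, text) =>
      if (open.nonEmpty && level <= open.head) {
        sb.append("</li>") // close the previous item
        // Leave nested lists that are deeper than this heading
        while (open.size > 1 && level < open.head) {
          open = open.tail
          sb.append("</ul>")
          if (level <= open.head) sb.append("</li>") // also close the item that held that list
        }
      }
      // Starting, or going deeper: open a new list inside the still-open <li>
      if (open.isEmpty || level > open.head) {
        sb.append("<ul>")
        open = level :: open
      }
      sb.append(s"""<li><a href="#${slug(text)}">${escapeHtml(text)}</a>""")
    }

    open.foreach(_ => sb.append("</li></ul>")) // close whatever is still open
    sb.toString
  }
}
```

For example, `TocSketch.tocHtml(List((1, "Title"), (2, "Section"), (1, "Next")))` nests "Section" inside the "Title" item and makes "Next" a sibling of "Title".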

## Core API

### Main Functions

| Function | Description |
| --- | --- |
| `parseDocumentContent(input: String): Document` | Parse Markdown to a Document AST |
| `parseDocumentContentWithRefs(input: String): (Document, Map[String, LinkReference])` | Parse Markdown, returning the Document and its link references |
| `renderToHTML(md: String): String` | Parse and render Markdown to HTML |
| `renderToHTML(document: Document): String` | Render a Document AST to HTML |
| `renderToXML(doc: Document, system: String = "document"): String` | Render a Document AST to XML (useful for debugging) |
| `extractHeaders(document: Document): List[(Int, String)]` | Extract all headings with their levels |
| `inlinesToPlainText(inlines: List[Inline]): String` | Convert inline elements to plain text |
| `escapeXml(s: String): String` | Escape special characters for XML/HTML output |

### Document Model

The AST is structured hierarchically around these core types:

- `Document(children: List[Block])`: container for all blocks in a document
- `Block`: base trait for all block-level elements
- `Inline`: base trait for all inline-level elements

#### Block-Level Elements

| Class | Description |
| --- | --- |
| `Paragraph(inlines: List[Inline])` | A paragraph of text |
| `Heading(level: Int, inlines: List[Inline])` | A heading with level 1-6 |
| `BlockQuote(children: List[Block])` | A block quote containing other blocks |
| `Code(content: String, infoString: Option[String])` | A code block with optional language info |
| `ThematicBreak()` | A horizontal rule (HR) |
| `HTMLBlock(content: String)` | A raw HTML block |

#### List-Related Elements

| Class | Description |
| --- | --- |
| `ListBlock(data: ListData, items: List[ListItem])` | A list (ordered or unordered) |
| `ListItem(content: List[Block])` | An item in a list |
| `ListData(isOrdered, bulletChar, startNumber, delimiter, isTight, indent)` | Metadata about a list's structure |

#### Table-Related Elements

| Class | Description |
| --- | --- |
| `Table(headerRow, rows, alignments)` | A table with a header row, body rows, and column alignments |
| `TableRow(cells: List[TableCell])` | A row in a table |
| `TableCell(content: List[Inline])` | A cell in a table |
| `TableAlignment` (enum) | Alignment types: `Left`, `Center`, `Right`, `None` |

#### Inline-Level Elements

| Class | Description |
| --- | --- |
| `Text(content: String)` | Plain text |
| `Emphasis(inlines: List[Inline])` | Emphasized text (`*text*`) |
| `Strong(inlines: List[Inline])` | Strongly emphasized text (`**text**`) |
| `CodeSpan(content: String)` | Inline code (`` `code` ``) |
| `Link(destination, title, inlines)` | A hyperlink |
| `Image(destination, title, inlines)` | An image |
| `AutoLink(destination, text)` | An autolink (`<https://example.com>`) |
| `SoftLineBreak()` | A newline in the source, rendered as a space |
| `HardLineBreak()` | A forced line break |
| `RawHTML(content: String)` | Inline HTML |

#### Other Types

| Type | Description |
| --- | --- |
| `LinkReference(destination, title)` | Reference-style link definition |
| `C(char, pos, line, column, isLiteral)` | Character cursor for tracking position in the source |

## Working with the AST

### Walking the Document Tree

You can traverse the document structure recursively:

```scala
def walkDocument(doc: Document): Unit = {
  def walkNode(node: Node): Unit = node match {
    case Document(children) =>
      println("Document:")
      children.foreach(walkNode)

    case Heading(level, inlines) =>
      println(s"Heading (level $level): ${inlinesToPlainText(inlines)}")

    case Paragraph(inlines) =>
      println(s"Paragraph: ${inlinesToPlainText(inlines)}")

    case BlockQuote(children) =>
      println("BlockQuote:")
      children.foreach(walkNode)

    case ListBlock(data, items) =>
      println(s"${if (data.isOrdered) "Ordered" else "Unordered"} List:")
      items.foreach { item =>
        println("  - ListItem:")
        item.content.foreach(walkNode)
      }

    case Code(content, infoString) =>
      println(s"Code block${infoString.map(info => s" ($info)").getOrElse("")}: $content")

    case other => println(s"Other node: $other")
  }

  walkNode(doc)
}
```

### Finding Specific Content

Extract specific content from the document:

```scala
import scala.collection.mutable.ListBuffer

// Find all links in a document, including links inside emphasis, block quotes, lists and tables
def findLinks(doc: Document): List[Link] = {
  val links = ListBuffer[Link]()

  def searchNode(node: Node): Unit = node match {
    case link: Link => links += link

    case Document(children)  => children.foreach(searchNode)
    case Paragraph(inlines)  => inlines.foreach(searchNode)
    case Heading(_, inlines) => inlines.foreach(searchNode)
    case Emphasis(inlines)   => inlines.foreach(searchNode)
    case Strong(inlines)     => inlines.foreach(searchNode)
    case bq: BlockQuote      => bq.children.foreach(searchNode)
    case lb: ListBlock       => lb.items.foreach(_.content.foreach(searchNode))
    case t: Table =>
      (t.headerRow :: t.rows).foreach(_.cells.foreach(_.content.foreach(searchNode)))

    // Remaining node types are skipped
    case _ =>
  }

  searchNode(doc)
  links.toList
}
```
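Listing every container node by hand, as `findLinks` does, is easy to get wrong when node types are added. One alternative is to define each node's children once and reuse that definition for every search. The sketch below is minimal and runs on hypothetical stand-in types. Their names mirror the tables above, but they are not the library's actual definitions, and blocks and inlines are merged into a single `Node` trait for brevity.

```scala
object CollectSketch {
  // Simplified stand-ins for a few of the documented node types (hypothetical)
  sealed trait Node
  final case class Document(children: List[Node]) extends Node
  final case class Paragraph(inlines: List[Node]) extends Node
  final case class BlockQuote(children: List[Node]) extends Node
  final case class Emphasis(inlines: List[Node]) extends Node
  final case class Text(content: String) extends Node
  final case class Link(destination: String, title: Option[String], inlines: List[Node]) extends Node

  // The only place that knows how nodes nest
  def children(node: Node): List[Node] = node match {
    case Document(cs)   => cs
    case Paragraph(is)  => is
    case BlockQuote(cs) => cs
    case Emphasis(is)   => is
    case Link(_, _, is) => is
    case Text(_)        => Nil
  }

  // Depth-first, pre-order collection of every node matched by `pf`
  def collectAll[A](node: Node)(pf: PartialFunction[Node, A]): List[A] =
    pf.lift(node).toList ++ children(node).flatMap(child => collectAll[A](child)(pf))
}
```

Each new query then needs only the pattern it cares about, e.g. `collectAll[String](doc) { case l: Link => l.destination }`.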

### Transforming the AST

The AST is immutable, so transformations create new nodes:

```scala
// Append a word (e.g. a class name) to the info string of every code block,
// including code blocks nested in block quotes and lists.
// Note: HTML renderers typically use only the first word of the info string as the
// language class, so the appended word is not guaranteed to appear in the output.
def addClassToCodeBlocks(doc: Document, className: String): Document = {
  def transformBlock(block: Block): Block = block match {
    case c: Code =>
      val newInfoString = c.infoString match {
        case Some(info) => Some(s"$info $className")
        case None       => Some(className)
      }
      c.copy(infoString = newInfoString)

    case bq: BlockQuote =>
      BlockQuote(bq.children.map(transformBlock))

    case lb: ListBlock =>
      val newItems = lb.items.map(item => ListItem(item.content.map(transformBlock)))
      ListBlock(lb.data, newItems)

    // Other block types are returned unchanged
    case other => other
  }

  Document(doc.children.map(transformBlock))
}
```
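The recursion into containers can also be factored out. Then each rewrite handles only the nodes it cares about, and a single helper rebuilds the tree. The sketch below is minimal and uses hypothetical stand-in block types, not the library's own classes. `mapBlocks` transforms the children first and then applies `f` to the rebuilt node.

```scala
object TransformSketch {
  // Simplified, hypothetical stand-ins for some block types
  sealed trait Block
  final case class Paragraph(text: String) extends Block
  final case class Code(content: String, infoString: Option[String]) extends Block
  final case class BlockQuote(children: List[Block]) extends Block
  final case class ListItem(content: List[Block])
  final case class ListBlock(items: List[ListItem]) extends Block
  final case class Document(children: List[Block])

  // Rebuild containers bottom-up, applying `f` to every block after its children are transformed
  def mapBlocks(doc: Document)(f: Block => Block): Document = {
    def go(block: Block): Block = f(block match {
      case bq: BlockQuote => bq.copy(children = bq.children.map(go))
      case lb: ListBlock  => lb.copy(items = lb.items.map(i => i.copy(content = i.content.map(go))))
      case leaf           => leaf
    })
    doc.copy(children = doc.children.map(go))
  }

  // Example rewrite: give code blocks without an info string a default one
  def defaultLanguage(doc: Document, lang: String): Document =
    mapBlocks(doc) {
      case c @ Code(_, None) => c.copy(infoString = Some(lang))
      case other             => other
    }
}
```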

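## Common Tasks

### Adding a Custom Header ID

```scala
// Render a heading with an id attribute derived from its text.
// `renderInlines` is assumed to be an inline-to-HTML renderer; it is not listed in the Core API above.
def customRenderHeading(heading: Heading): String = {
  val text = inlinesToPlainText(heading.inlines)
  // Lowercase, turn runs of other characters into "-", then strip leading/trailing dashes
  val id = text.toLowerCase.replaceAll("[^a-z0-9]+", "-").replaceAll("^-+|-+$", "")
  s"""<h${heading.level} id="$id">${renderInlines(heading.inlines)}</h${heading.level}>"""
}
```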
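IDs derived from heading text collide when two headings share a title, for example two "Examples" sections. A common remedy is to append a counter to repeated IDs, similar to what GitHub does. The sketch below is minimal. `uniqueSlugs` is a hypothetical helper, not part of the library, and the `"section"` fallback for headings with no usable characters is an arbitrary choice.

```scala
object SlugSketch {
  // Lowercase, collapse runs of non-alphanumerics into "-", and trim leading/trailing dashes
  def slug(text: String): String =
    text.toLowerCase.replaceAll("[^a-z0-9]+", "-").replaceAll("^-+|-+$", "")

  // One ID per heading text, in order; repeats get "-1", "-2", ... appended
  def uniqueSlugs(texts: List[String]): List[String] = {
    val seen = scala.collection.mutable.Map.empty[String, Int] // base slug -> times seen
    val used = scala.collection.mutable.Set.empty[String]      // every ID handed out so far
    texts.map { text =>
      val s = slug(text)
      val base = if (s.isEmpty) "section" else s
      var n = seen.getOrElse(base, 0)
      var id = if (n == 0) base else s"$base-$n"
      // Skip candidates that collide with an ID already produced (e.g. a heading named "Intro 1")
      while (used(id)) { n += 1; id = s"$base-$n" }
      seen(base) = n + 1
      used += id
      id
    }
  }
}
```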

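### Syntax Highlighting for Code Blocks

The renderer below does not highlight code itself. It adds a `language-*` class, which client-side highlighters such as Prism or highlight.js use to pick a grammar.

```scala
// Render a code block with a language class taken from the first word of its info string
def renderCodeBlockWithHighlighting(code: Code): String = {
  val content = escapeXml(code.content)
  val langClass = code.infoString
    .flatMap(_.trim.split("\\s+").headOption) // first word, if any
    .filter(_.nonEmpty)
    .map(lang => s""" class="language-${escapeXml(lang)}"""")
    .getOrElse("")

  s"""<pre><code$langClass>$content</code></pre>"""
}
```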

### Custom Table Rendering

```scala
// Custom HTML table renderer with additional classes
def renderCustomTable(table: Table): String = {
  val alignAttrs = table.alignments.map {
    case TableAlignment.Left   => """ align="left""""
    case TableAlignment.Right  => """ align="right""""
    case TableAlignment.Center => """ align="center""""
    case TableAlignment.None   => ""
  }

  val headerCells = table.headerRow.cells.zip(alignAttrs).map { case (cell, align) =>
    s"""<th$align>${renderInlines(cell.content)}</th>"""
  }.mkString

  val rows = table.rows.map { row =>
    val cells = row.cells.zip(alignAttrs).map { case (cell, align) =>
      s"""<td$align>${renderInlines(cell.content)}</td>"""
    }.mkString
    s"<tr>$cells</tr>"
  }.mkString

  s"""<table class="markdown-table">
     |  <thead>
     |    <tr>$headerCells</tr>
     |  </thead>
     |  <tbody>
     |    $rows
     |  </tbody>
     |</table>""".stripMargin
}
```
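HTML5 treats the `align` attribute on table cells as obsolete, and CSS `text-align` is its current equivalent. The sketch below is a minimal version of that mapping. It uses a hypothetical stand-in for `TableAlignment` so it compiles on its own. The stand-in's "no alignment" case is named `NoAlign` to avoid clashing with `scala.None`. With the library you would match on `TableAlignment.Left` and its siblings, as in the renderer above.

```scala
object AlignSketch {
  // Stand-in for the library's TableAlignment (hypothetical definition)
  sealed trait Alignment
  object Alignment {
    case object Left extends Alignment
    case object Center extends Alignment
    case object Right extends Alignment
    case object NoAlign extends Alignment
  }

  // Attribute text to splice into <th>/<td>; empty when no alignment is specified
  def alignStyle(a: Alignment): String = a match {
    case Alignment.Left    => """ style="text-align: left""""
    case Alignment.Center  => """ style="text-align: center""""
    case Alignment.Right   => """ style="text-align: right""""
    case Alignment.NoAlign => ""
  }
}
```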

## Advanced Usage

### Custom Block Parser

To extend the library with a custom block parser (this relies on parser internals such as `BlockParser`, `C`, `InputReader`, `parseDocument` and `blockParsers`, which are not part of the Core API listed above):

```scala
import scala.collection.mutable

object CustomDivBlockParser extends BlockParser {
  val name: String = "custom divs"

  // Text of one source line, without the trailing newline
  private def lineText(line: LazyList[C]): String =
    line.takeWhile(_.char != '\n').map(_.char).mkString

  // A ::: block starts on a line whose first non-blank characters are ":::"
  def canStart(lines: List[LazyList[C]]): Boolean =
    lines.nonEmpty && lineText(lines.head).trim.startsWith(":::")

  def parse(
      lines: List[LazyList[C]],
      linkRefs: mutable.Map[String, LinkReference]
  ): (Block, Int) = {
    val content = new StringBuilder
    var foundClosing = false

    // The opening line gives the div's class, e.g. "::: warning" -> "warning"
    val divType = lineText(lines.head).trim.stripPrefix(":::").trim
    var currentLine = 1

    // Collect all lines until the closing ":::" marker (or the end of input)
    var remaining = lines.tail
    while (remaining.nonEmpty && !foundClosing) {
      val line = lineText(remaining.head)
      if (line.trim == ":::") foundClosing = true
      else content.append(line).append('\n')
      remaining = remaining.tail
      currentLine += 1
    }

    // Parse the collected content recursively as Markdown
    val reader = new InputReader(content.toString)
    val (innerDoc, _) = parseDocument(reader.stream)

    // Wrap the rendered content in a div. The class name comes from the source, so escape it.
    // Built without stripMargin so that '|' characters in the rendered content stay intact.
    val html = s"""<div class="${escapeXml(divType)}">""" + "\n" + renderToHTML(innerDoc) + "\n</div>"

    // Return the block and the number of lines consumed
    (HTMLBlock(html), currentLine)
  }
}

// Register the custom parser ahead of the built-in block parsers
blockParsers.prepend(CustomDivBlockParser)
```
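You can try the line-scanning logic of `CustomDivBlockParser` without the parser internals. The sketch below is minimal and works over plain `String` lines. `scanDiv` and `DivBlock` are hypothetical names. Like the parser above, it treats a missing closing `:::` as running to the end of the input, and it does not support nested `:::` blocks.

```scala
object DivScanSketch {
  // Result of scanning one ::: block: its class, raw body, and how many input lines it used
  final case class DivBlock(divType: String, body: String, linesConsumed: Int)

  // Returns None if the first line does not open a ::: block
  def scanDiv(lines: List[String]): Option[DivBlock] = lines match {
    case first :: rest if first.trim.startsWith(":::") =>
      val divType = first.trim.stripPrefix(":::").trim
      // Body is everything up to (not including) the first line that is exactly ":::"
      val (body, after) = rest.span(_.trim != ":::")
      val closing = if (after.nonEmpty) 1 else 0 // count the closing marker if present
      Some(DivBlock(divType, body.map(_ + "\n").mkString, 1 + body.size + closing))
    case _ => None
  }
}
```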

### Custom Inline Renderer

To customize how inline elements are rendered:

```scala
def customRenderInlines(inlines: List[Inline]): String = {
  inlines.map {
    case Text(content) => escapeXml(content)
    case Emphasis(children) => s"<em class='custom-em'>${customRenderInlines(children)}</em>"
    case Strong(children) => s"<strong class='custom-strong'>${customRenderInlines(children)}</strong>"
    case Link(dest, title, children) =>
      val titleAttr = title.map(t => s""" title="${escapeXml(t)}"""").getOrElse("")
      val classes = if (dest.startsWith("http")) "external-link" else "internal-link"
      s"""<a href="${escapeXml(dest)}"$titleAttr class="$classes">${customRenderInlines(children)}</a>"""
    // renderToHTML accepts only a String or a Document, so it cannot render a single Inline.
    // Fall back to an inline renderer instead (renderInlines is assumed, as in the examples above).
    case other => renderInlines(List(other))
  }.mkString
}
```
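The check `dest.startsWith("http")` also matches relative paths such as `httpdocs/index.html`. It also misses protocol-relative URLs like `//cdn.example.com/x.js`. The sketch below is a minimal, stricter alternative. `isExternal` and `linkClass` are hypothetical helpers. Which schemes count as external is a policy choice, and here `mailto:` links are classed as internal.

```scala
object LinkClassSketch {
  // Absolute http(s) URLs and protocol-relative "//host/..." URLs count as external
  def isExternal(dest: String): Boolean =
    dest.trim.matches("(?i)(https?:)?//.*")

  def linkClass(dest: String): String =
    if (isExternal(dest)) "external-link" else "internal-link"
}
```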

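## Performance Considerations

- The parser works over lazy lists of characters (`LazyList[C]`), so the position-tracking wrappers are created on demand. The input `String` and the resulting `Document` are still held in memory in full.
- For very large inputs you can split the source and parse the pieces separately, but only at block boundaries. For example, split before top-level headings, and never inside a fenced code block. Link reference definitions apply to the whole document, so a chunk cannot resolve a reference that is defined in another chunk.
- The block parsers, including any you add to `blockParsers`, are shared by every parse. Register custom parsers once at startup rather than before each document.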
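Chunking is safe only where the split cannot fall inside a block. The sketch below is minimal. It cuts a source string before level-1 and level-2 ATX headings (`# ` and `## `) and skips lines inside fenced code blocks that open with three or more backticks or tildes. `splitAtHeadings` is a hypothetical helper, not part of the library. It does not handle setext headings or HTML blocks, and it does not copy link reference definitions between chunks.

```scala
object ChunkSketch {
  private val FenceOpen = """^ {0,3}(`{3,}|~{3,}).*""".r // opening code fence; group 1 is the marker
  private val Heading12 = """^ {0,3}#{1,2}(\s.*)?""".r   // "# ..." or "## ...", but not "###"

  // Split Markdown source into chunks that each start at a level-1 or level-2 ATX heading
  def splitAtHeadings(source: String): List[String] = {
    val chunks = List.newBuilder[String]
    val current = new StringBuilder
    var openFence: Option[String] = None // fence marker while inside a fenced code block

    source.linesWithSeparators.foreach { raw =>
      val line = raw.stripLineEnd
      openFence match {
        case Some(open) =>
          // A closing fence: same character, at least as long, nothing else on the line
          val t = line.trim
          if (t.nonEmpty && t.forall(_ == open.head) && t.length >= open.length &&
              line.takeWhile(_ == ' ').length <= 3)
            openFence = None
        case None =>
          line match {
            case FenceOpen(marker) => openFence = Some(marker)
            case Heading12(_) if current.nonEmpty =>
              chunks += current.toString // heading starts a new chunk
              current.clear()
            case _ =>
          }
      }
      current.append(raw) // keep the original line ending
    }
    if (current.nonEmpty) chunks += current.toString
    chunks.result()
  }
}
```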

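## Error Handling

CommonMark defines any sequence of characters as a valid document, so malformed Markdown should not make the parser fail. An exception would indicate a bug in the library. If you want a safety net anyway, wrap the call:

```scala
import scala.util.control.NonFatal

def safeParseMarkdown(input: String): Document =
  try parseDocumentContent(input)
  catch {
    case NonFatal(e) =>
      // Log the error
      Console.err.println(s"Error parsing markdown: ${e.getMessage}")
      // Return a document containing an error message instead of failing
      Document(List(Paragraph(List(Text(s"Error parsing content: ${e.getMessage}")))))
  }
```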

## Debugging

For debugging, you can render the document as XML:

```scala
val doc = parseDocumentContent("# Test\n\nParagraph")
val xml = renderToXML(doc)
println(xml)
```

The output describes the structure of the AST, for example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
  <h1>Test</h1>
  <paragraph>Paragraph</paragraph>
</document>
```
