Skip to content

lambdaclass/lambda_compiler_kit

Repository files navigation

Lambda Compiler Kit (LCK)

A collection of Lean 4 libraries for building verified compiler infrastructure.

Overview

Lambda Compiler Kit provides formally verified implementations of common compiler components. The first library is a regular expression engine with pattern matching capabilities.

Features

Regex Library

A complete regular expression engine implemented in Lean 4 with formal verification:

  • Pattern Syntax: literals, character classes [abc], ranges [a-z], anchors ^$, wildcards ., alternation |
  • Quantifiers: * (zero or more), + (one or more), ? (zero or one)
  • Escape Sequences: \n, \t, \., \^, \$, \[, \], \|, \\, \*, \+, \?
  • Implementation: NFA-based matching using Thompson's construction
  • Matching Modes: Full string match and grep-style substring search

CLI Tool: lck-grep

A grep-like utility for testing the regex engine:

# Build the tool
lake build lck-grep

# Basic usage
echo "hello world" | lake exe lck-grep "world"

# Search log files
lake exe lck-grep "^ERROR" < application.log

# Match IP addresses
lake exe lck-grep "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" < server.log

Quick Start

Prerequisites

  • Lean 4 (v4.26.0)
  • Lake (Lean's build system)

Building

# Clone the repository
git clone https://github.com/lambdaclass/lambda_compiler_kit.git
cd lambda_compiler_kit

# Build the library
lake build

# Run tests
lake build test && lake exe test

# Build CLI tool
lake build lck-grep

Examples

Try the interactive demo:

./examples/demo.sh

Or explore individual examples:

# Find lines starting with specific text
lake exe lck-grep "^2024" < examples/logs.txt

# Match email addresses
lake exe lck-grep "[a-z]+@[a-z]+\.[a-z]+" < examples/emails.txt

# Find words with alternation
lake exe lck-grep "cat|dog|bird" < examples/words.txt

See examples/README.md for comprehensive pattern examples.

Library Usage

Import the regex library in your Lean code:

import Lck.Regex

open Lck.Regex

-- Compile and search for a pattern
def example1 : IO Unit := do
  match search "hello" "hello world" with
  | .ok true => IO.println "Match found!"
  | .ok false => IO.println "No match"
  | .error e => IO.eprintln s!"Error: {e}"

-- Pre-compile for efficiency
def example2 : IO Unit := do
  match compile "^[0-9]+" with
  | .ok regex =>
      let result := searchCompiled regex "123 test"
      IO.println s!"Match: {result}"
  | .error e => IO.eprintln s!"Error: {e}"

Project Structure

  • Lck/ - Library source code
    • Lck/Regex/ - Regex library implementation
      • Syntax.lean - AST representation
      • Parser.lean - Pattern parser
      • Nfa.lean - NFA construction (Thompson's construction)
      • Matcher.lean - NFA simulation and matching
      • Regex.lean - High-level API
  • cli/ - Command-line tools
    • LckGrep.lean - lck-grep utility
  • test/ - Test suites
  • examples/ - Example files and demo scripts

Development

Running Tests

lake build test && lake exe test

All tests should pass:

  • Parser tests (22 tests, including edge cases for escape sequences, anchors, and reversed ranges)
  • NFA construction tests (7 tests)
  • Matcher tests (23 tests, including negated classes, ranges, anchors, complex patterns)

Documentation

See CLAUDE.md for development guidelines and project details.

Pattern Syntax Reference

Syntax Description Example
abc Literal characters hello matches "hello"
. Any character (except newline) a.c matches "abc", "adc"
^ Start of line anchor ^ERROR matches lines starting with "ERROR"
$ End of line anchor end$ matches lines ending with "end"
[abc] Character class [aeiou] matches any vowel
[a-z] Character range [0-9] matches any digit
[^abc] Negated character class [^0-9] matches non-digits
| Alternation (OR) cat|dog matches "cat" or "dog"
* Zero or more ab*c matches "ac", "abc", "abbc"
+ One or more ab+c matches "abc", "abbc" (not "ac")
? Zero or one (optional) ab?c matches "ac" or "abc"

Technology

  • Language: Lean 4 (v4.26.0)
  • Build System: Lake
  • Dependencies: Mathlib v4.26.0, LSpec (testing)
  • Algorithm: Thompson's NFA construction with NFA simulation

License

Released under MIT license. See LICENSE for details.

Contributing

This project follows Mathlib conventions and linting standards. All contributions should:

  • Pass the linter: lake lint
  • Pass all tests: lake build test && lake exe test
  • Follow Lean 4 best practices

Future Work

Planned additions to the Lambda Compiler Kit:

  • Lexer generator
  • Parser combinators
  • Semantic analysis utilities
  • Code generation framework

GitHub Configuration

To set up your new GitHub repository:

  1. Under your repository name, click Settings
  2. In the Actions section, click "General"
  3. Check Allow GitHub Actions to create and approve pull requests
  4. In Pages section, set Source to "GitHub Actions"

After setup, documentation will be automatically generated at your GitHub Pages URL.

About

Lambda Compiler Kit: formally verified toolkit for building compilers

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages