Tokit

Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.


Overview

Tokit is a blazing fast parser combinator library for Rust that uniquely combines:

  • Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
  • Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
  • Flexible Error Handling: Same parser code adapts for fail-fast runtime or greedy compiler diagnostics via the Emitter trait

Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.

Key Features

  • Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
  • No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
  • Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
  • Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the Emitter type - same parser, different behavior
  • Token-Based Parsing: Works directly on token streams from any lexer implementing the Lexer<'inp> trait
  • Composable Combinators: Build complex parsers from simple, reusable building blocks
  • Configurable Emitters: Built-in error emission strategies (Fatal, Verbose, Silent, Ignored)
  • Rich Error Recovery: Built-in recover() and inplace_recover() combinators for resilient parsing with backtracking or synchronization
  • Zero-Cost Abstractions: All configuration resolved at compile time
  • No-std Support: Core functionality works without allocator
  • Multiple Source Types: Support for str, [u8], Bytes, BStr, HipStr
  • Logos Integration: Optional LogosLexer adapter for seamless logos integration
  • CST Support: Optional Concrete Syntax Tree support via rowan

Installation

Add this to your Cargo.toml:

[dependencies]
tokit = "0.1"

Feature Flags

  • std (default) - Enable standard library support
  • alloc - Enable allocator support for no-std environments
  • rowan - Enable CST (Concrete Syntax Tree) support with rowan integration
  • logos - Enable integration with latest logos crate
  • bytes - Enable integration with latest bytes crate
    • bytes_1 - Enable integration with bytes@1
  • bstr - Enable integration with latest bstr crate
    • bstr_1 - Enable integration with bstr@1
  • hipstr - Enable integration with latest hipstr crate
  • smallvec - Enable integration with latest smallvec crate
    • smallvec_1 - Enable integration with smallvec@1
  • heapless - Enable integration with latest heapless crate
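
For example, a front-end that lexes with logos and builds a rowan CST might enable both flags (one possible combination; pick whichever flags your project needs):

```toml
[dependencies]
tokit = { version = "0.1", features = ["logos", "rowan"] }
```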

Core Components

Lexer Layer

  • Lexer<'inp> Trait

    Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.

  • Token<'a> Trait

    Defines token types with:

    • Kind: Token kind discriminator
    • Error: Associated error type
  • LogosLexer<'inp, T, L> (feature: logos)

    Ready-to-use adapter for integrating Logos lexers.
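
The shape of this contract can be sketched in a few lines of plain Rust. Note that every name and signature below is invented for illustration — the real Lexer<'inp> and Token<'a> traits in Tokit may look different. The point is the streaming, zero-copy idea: tokens are handed out on demand and borrow slices of the source.

```rust
// Illustrative sketch only: these shapes approximate a streaming lexer
// contract; they are NOT tokit's actual `Lexer<'inp>` / `Token<'a>` traits.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { Num, Plus, Eof }

#[derive(Debug, Clone, Copy)]
struct Tok<'a> { kind: Kind, text: &'a str }

/// A lexer hands out tokens on demand; nothing is buffered up front.
trait StreamLexer<'inp> {
    fn next_token(&mut self) -> Tok<'inp>;
}

struct DigitsLexer<'inp> { src: &'inp str, pos: usize }

impl<'inp> StreamLexer<'inp> for DigitsLexer<'inp> {
    fn next_token(&mut self) -> Tok<'inp> {
        let bytes = self.src.as_bytes();
        // Skip whitespace.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return Tok { kind: Kind::Eof, text: "" };
        }
        let start = self.pos;
        if bytes[self.pos] == b'+' {
            self.pos += 1;
            return Tok { kind: Kind::Plus, text: &self.src[start..self.pos] };
        }
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_digit() {
            self.pos += 1;
        }
        // `text` is a zero-copy slice of the source, not an owned String.
        Tok { kind: Kind::Num, text: &self.src[start..self.pos] }
    }
}

fn main() {
    let mut lx = DigitsLexer { src: "12 + 34", pos: 0 };
    assert_eq!(lx.next_token().text, "12");
    assert_eq!(lx.next_token().kind, Kind::Plus);
    assert_eq!(lx.next_token().text, "34");
    assert_eq!(lx.next_token().kind, Kind::Eof);
}
```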

Error Handling

Tokit's flexible Emitter system allows the same parser to adapt to different use cases by simply changing the error handling strategy.

Atomically Composable Trait Design

Tokit's emitter system uses atomically composable traits - small, focused traits that each handle a specific parsing scenario. Instead of one monolithic interface, error handling is broken down into atomic building blocks:

  • Core: Emitter - Base error handling (lexer errors, unexpected tokens)
  • Repetition: TooFewEmitter, TooManyEmitter, FullContainerEmitter
  • Separation: SeparatedEmitter, UnexpectedLeadingSeparatorEmitter, UnexpectedTrailingSeparatorEmitter

This design provides:

  • Fine-grained control: Implement only the traits you need
  • Composability: Mix and match traits to build custom strategies
  • Extensibility: Create specialized emitters for specific use cases
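
The composition idea can be sketched as follows. The trait names mirror the list above, but the method signatures here are invented for illustration and are not Tokit's actual API:

```rust
// Illustrative sketch only: invented signatures showing how small, focused
// error-handling traits compose into a custom strategy.

/// Base error handling: every strategy implements this.
trait Emitter {
    fn unexpected_token(&mut self, at: usize);
}

/// Atomic add-on: only parsers using bounded repetition need it.
trait TooFewEmitter: Emitter {
    fn too_few(&mut self, got: usize, want: usize);
}

/// A custom strategy implements exactly the traits its parsers need.
struct Counting { errors: usize }

impl Emitter for Counting {
    fn unexpected_token(&mut self, _at: usize) { self.errors += 1; }
}

impl TooFewEmitter for Counting {
    fn too_few(&mut self, _got: usize, _want: usize) { self.errors += 1; }
}

fn main() {
    let mut e = Counting { errors: 0 };
    e.unexpected_token(3); // base trait
    e.too_few(1, 2);       // atomic add-on
    assert_eq!(e.errors, 2);
}
```

A parser generic over `TooFewEmitter` accepts this custom strategy as readily as a pre-built one, which is the payoff of keeping the traits small.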

Built-in Emitter Strategies

Tokit provides ready-made strategies that implement all the atomic traits with consistent behavior:

  • Fatal - Fail-fast parsing: Stop on first error (default) - perfect for runtime parsing and REPLs
  • Verbose - Comprehensive error collection: Collect all errors and continue parsing - perfect for compiler diagnostics and IDEs
  • Silent - Silently ignore errors
  • Ignored - Ignore errors completely

Key Design: Change the Emitter type to switch between fail-fast runtime parsing and comprehensive compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:

  • Runtime/REPL: Fast feedback with Fatal emitter
  • Compiler/IDE: Comprehensive diagnostics with Verbose emitter

The Verbose Emitter: Unlike Fatal which stops at the first error, the Verbose emitter collects all errors during parsing and continues where possible. Errors are stored in a BTreeMap indexed by span, automatically sorted by their position in the source code. After parsing, retrieve all errors via the errors() method for display, analysis, or further processing. This is ideal for:

  • Showing users all issues at once in compiler output
  • Providing real-time diagnostics in IDE error panels
  • Collecting comprehensive error information for debugging
  • Generating detailed error reports for language servers
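
The span-indexed collection idea reduces to a BTreeMap whose ordering does the sorting for free. This is a toy stand-in with invented names, not the real Verbose type:

```rust
// Illustrative sketch only: a Verbose-style collector that keys errors by
// span, so iteration yields them in source order. Names are invented.
use std::collections::BTreeMap;

type Span = (usize, usize); // (start, end) byte offsets

#[derive(Default)]
struct VerboseSketch {
    errors: BTreeMap<Span, String>,
}

impl VerboseSketch {
    fn emit(&mut self, span: Span, msg: impl Into<String>) {
        self.errors.insert(span, msg.into());
    }
    /// Errors come back sorted by position, ready for diagnostics output.
    fn errors(&self) -> impl Iterator<Item = (&Span, &String)> {
        self.errors.iter()
    }
}

fn main() {
    let mut v = VerboseSketch::default();
    // Emitted out of order during parsing...
    v.emit((40, 41), "unexpected `}`");
    v.emit((7, 12), "unterminated string");
    // ...but reported in source order, thanks to BTreeMap ordering.
    let starts: Vec<usize> = v.errors().map(|(s, _)| s.0).collect();
    assert_eq!(starts, vec![7, 40]);
    for (span, err) in v.errors() {
        eprintln!("error at {:?}: {}", span, err);
    }
}
```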

Custom Emitters: Thanks to the atomically composable trait design, you can create custom error handling strategies by implementing only the traits you need. You compose small, focused traits to build exactly the behavior you want. For example, you could build an emitter that:

  • Implements only Emitter + TooFewEmitter for a parser that only needs those scenarios

  • Limits the maximum number of errors before stopping

  • Filters errors by severity level

  • Sends errors to a logging system or telemetry service

  • Implements domain-specific error recovery strategies for specific error types

  • Wraps an existing emitter and adds custom behavior for certain atomic traits

  • Rich Error Types (in error/ module)

    • Token-level: UnexpectedToken, MissingToken, UnexpectedEot
    • Syntax-level: Unclosed, Unterminated, Malformed, Invalid
    • Escape sequences: HexEscape, UnicodeEscape
    • All errors include span tracking

Error Recovery

Tokit provides built-in parser combinators for resilient parsing that can continue after errors:

Recovery Strategies

  • recover(recovery_parser) - Error recovery with backtracking

    • If primary parser fails, resets to starting position and tries recovery parser
    • Use for: Alternative interpretations, fallback values, error nodes
    • Example: parse_expr().recover(parse_error_node())
  • inplace_recover(recovery_parser) - Error recovery without backtracking

    • If primary parser fails, continues from error position with recovery parser
    • Use for: Panic mode recovery, resynchronization, skipping to safe points
    • Example: parse_stmt().inplace_recover(skip_to_semicolon())

Recovery Patterns

Alternative Parsing (with backtracking):

// Try parsing as function, fall back to error item
let parser = parse_function()
    .recover(parse_error_item());

// Input with error → recovers gracefully

Synchronization Points (without backtracking):

// Parse statement, skip to semicolon on error
let parser = parse_statement()
    .inplace_recover(
        skip_to(|tok| matches!(tok, Token::Semicolon))
            .then_ignore(any())
            .map(|_| Statement::Error)
    );

// Continues parsing from next statement

Comprehensive Error Collection:

// Use with Verbose emitter to collect all errors
let emitter = Verbose::new();
let items = many(
    parse_item().recover(parse_error_item())
);

// After parsing, retrieve all errors:
for (span, error) in emitter.errors() {
    eprintln!("Error at {:?}: {}", span, error);
}

Error recovery works seamlessly with the atomically composable emitter system - combine Verbose emitter with recovery combinators to build robust parsers that report all issues in a single pass.

Utilities

  • Span Tracking

    • SimpleSpan - Lightweight span representation
    • Spanned<T, S> - Wrap value with span
    • Located<T, S> - Wrap value with span and source slice
    • Sliced<T, S> - Wrap value with source slice
  • Parser Configuration

    • Parser<F, L, O, Error, Context> - Configurable parser
    • ParseContext - Context for emitter and cache
    • Window - Type-level peek buffer capacity for deterministic lookahead
    • Note: Lookahead windows support 1-32 token capacity via typenum::{U1..U32}
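
The compile-time-capacity idea can be sketched with a const-generic buffer. This stand-in uses a const generic instead of typenum, and every name below is invented — it only illustrates why exceeding the declared window is a hard error rather than silent buffering:

```rust
// Illustrative sketch only: a stack-allocated peek buffer whose capacity is
// fixed at compile time, analogous in spirit to a typenum-sized `Window`.

struct PeekWindow<T, const CAP: usize> {
    buf: [Option<T>; CAP],
    len: usize,
}

impl<T: Copy, const CAP: usize> PeekWindow<T, CAP> {
    fn new() -> Self {
        Self { buf: [None; CAP], len: 0 }
    }

    /// Pull from the lexer until `n + 1` tokens are buffered, then peek the
    /// `n`-th. Lookahead beyond CAP is rejected rather than quietly buffered.
    fn peek(&mut self, n: usize, lexer: &mut impl Iterator<Item = T>) -> Option<T> {
        assert!(n < CAP, "lookahead exceeds compile-time window capacity");
        while self.len <= n {
            self.buf[self.len] = Some(lexer.next()?);
            self.len += 1;
        }
        self.buf[n]
    }
}

fn main() {
    let mut lexer = vec![10, 20, 30, 40].into_iter();
    let mut win: PeekWindow<i32, 2> = PeekWindow::new();
    assert_eq!(win.peek(1, &mut lexer), Some(20)); // buffers 2 tokens on the stack
    assert_eq!(win.peek(0, &mut lexer), Some(10)); // already buffered, no lexer pull
}
```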

Examples

All examples are self-contained and runnable with cargo run --example <name> --features std,logos. They are also compiled as integration tests (cargo test --example <name> --features std,logos).

json — JSON value parser

Demonstrates map-based combinators (separated_by, fold, peek_then) on a recursive JSON grammar. Produces an enum Value (null, bool, number, string, array, object) from the token stream without any intermediate allocation.

cargo run --example json --features std,logos

calculator — arithmetic expression evaluator (token-level Pratt)

Demonstrates the token-level Pratt API (InputRef::pratt) where Token implements PrattToken to classify itself as an operand, prefix, or infix/postfix operator. Fold functions receive raw Spanned<Token> values and encode computed f64 results back into Token::Num.

Operator table: + - (infix, left, precedence 1), * / (infix, left, 2), unary - (prefix, 3), ^ (infix, right, 4), ( ) (grouping via PREC_PAREN sentinel).

cargo run --example calculator --features std,logos

s_expression — Lisp S-expression interpreter (recursive descent)

Demonstrates pure recursive-descent parsing with InputRef::next and InputRef::try_expect — no Pratt parsing involved. The evaluator reduces the parsed AST to an Atom value.

Supports: integer and boolean literals, keyword atoms (:foo), quoted lists ('(1 2 3)), the built-in functions + - * / = not, and (if cond then [else]) conditionals.

cargo run --example s_expression --features std,logos

c_expression — C-style expression parser (combinator-level Pratt)

Demonstrates the combinator-level Pratt API (pratt_of) where separate parse_lhs / parse_rhs functions return PrattLHS / PrattRHS values and fold functions receive fully typed AST nodes and an &mut InputRef — enabling complex postfix forms that consume additional tokens.

Supported operators (in precedence order): || &&, == != < <= > >=, << >>, + -, * / %, unary ! - ~ ++ --, postfix ++ --, array subscript a[i], function call f(args...), and ternary cond ? then : else.

cargo run --example c_expression --features std,logos

Architecture

Tokit's architecture follows a layered design:

  1. Lexer Layer - Token production and source abstraction
  2. Parser Layer - Composable parser combinators
  3. Error Layer - Rich error types and emission strategies
  4. Utility Layer - Spans, containers, and helpers

This separation enables:

  • Use any lexer by implementing Lexer<'inp>
  • Mix and match parser combinators
  • Customize error handling per-parser or globally
  • Zero-cost abstractions through compile-time configuration

Design Philosophy

Parse-While-Lexing: Zero-Copy Streaming

Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:

Traditional Approach (Two-Phase):

Source → Lexer → [Token Buffer] → Parser
         ↓
    Allocate Vec<Token>  ← Extra allocation!

Tokit Approach (Streaming):

Source → Lexer ←→ Parser
         ↑________↓
    Zero-copy streaming, no buffer

Benefits:

  • Zero Extra Allocations: No token buffer, tokens consumed on-demand
  • Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
  • Better Cache Locality: Tokens processed immediately after lexing
  • Predictable Performance: No large allocations, deterministic memory usage
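
The contrast above can be sketched in a few lines of plain Rust. This is a toy illustration, not Tokit's API — the "parser" just sums numbers — but it shows where the two-phase approach pays for an allocation the streaming one never makes:

```rust
// Illustrative sketch only: two-phase (buffer everything) vs. streaming
// (consume tokens the moment the lexer produces them).

fn lex(src: &str) -> impl Iterator<Item = i64> + '_ {
    src.split_whitespace().filter_map(|w| w.parse().ok())
}

/// Two-phase: the whole token stream is materialized before parsing starts.
fn parse_buffered(src: &str) -> i64 {
    let tokens: Vec<i64> = lex(src).collect(); // O(n) allocation up front
    tokens.iter().sum()
}

/// Streaming: each token flows straight from lexer to parser, no buffer.
fn parse_streaming(src: &str) -> i64 {
    lex(src).sum()
}

fn main() {
    let src = "1 2 3 4";
    // Same result either way; only one path allocated a Vec<Token>.
    assert_eq!(parse_buffered(src), parse_streaming(src));
    assert_eq!(parse_streaming(src), 10);
}
```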

No Hidden Backtracking

Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:

  • Predictable Performance: No hidden exponential backtracking scenarios
  • Explicit Control: Developers decide when and where to peek ahead via peek_then() and peek_then_choice()
  • Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (Window trait)
  • Better Error Messages: Failed alternatives don't hide earlier, more relevant errors

// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3)  // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.front() {
            ...
        }
    });

Parser Combinators + Deterministic Table Parsing

Tokit uniquely combines:

  • Parser Combinator Flexibility: Compose small parsers into complex grammars
  • LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
  • Type-Level Capacity: Lookahead buffer size known at compile time (Window::CAPACITY)

This hybrid approach gives you composable abstractions without sacrificing performance or predictability.

Atomically Composable Error Handling

Tokit's error handling system breaks down error scenarios into small, focused traits. Each trait handles one specific parsing situation (like TooFewEmitter for "too few elements" or UnexpectedLeadingSeparatorEmitter for leading separators).

Benefits of the Atomically Composable Trait Design:

  • Implement only what you need: Your parser only uses TooFewEmitter? Just implement that trait
  • Compose custom strategies: Mix and match atomic traits to build specialized error handlers
  • Pre-built bundles: Fatal, Verbose, and Silent implement all traits for convenience
  • Fine-grained control: Small, reusable pieces that compose into complex behavior

This is fundamentally different from traditional monolithic error handler interfaces - you get both the simplicity of pre-built strategies and the flexibility to implement only what you need.

Fail-Fast Runtime ↔ Comprehensive Compiler Diagnostics

Tokit's architecture decouples parsing logic from error handling strategy through the atomic Emitter trait system. This means:

Same Parser, Different Contexts:

  • Runtime/REPL Mode: Use Fatal emitter → stop on first error for immediate feedback
  • Compiler/IDE Mode: Use Verbose emitter → collect all errors for comprehensive diagnostics
  • Testing/Fuzzing: Use Ignored emitter → parse through all errors for robustness testing

Benefits:

  • ✅ Write parsers once, deploy everywhere
  • ✅ No separate "error recovery mode" - it's just a different emitter
  • ✅ Custom emitters can implement domain-specific error handling
  • ✅ Zero-cost abstraction - emitter behavior resolved at compile time

Inspirations

Tokit takes inspiration from:

  • winnow - For ergonomic parser API design
  • chumsky - For composable parser combinator patterns
  • logos - For high-performance lexing
  • rowan - For lossless syntax tree representation

Core Priorities

  1. Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
  2. Predictability - No hidden backtracking, explicit control flow, deterministic decisions
  3. Composability - Small parsers combine into complex ones; atomic emitter traits compose into custom strategies
  4. Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (comprehensive) via atomic Emitter traits
  5. Flexibility - Work with any lexer, atomic error handling traits, support both AST and CST
  6. Correctness - Rich error types, span tracking, validation

Who Uses Tokit?

  • smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs

License

tokit is dual-licensed under:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

You may choose either license for your purposes.

Copyright (c) 2026 Al Liu.
