Tokit

Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.


Overview

Tokit is a blazing fast parser combinator library for Rust that uniquely combines:

  • Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
  • Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
  • Flexible Error Handling: Same parser code adapts for fail-fast runtime or greedy compiler diagnostics via the Emitter trait

Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.

Key Features

  • Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
  • No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
  • Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
  • Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the Emitter type - same parser, different behavior
  • Token-Based Parsing: Works directly on token streams from any lexer implementing the Lexer<'inp> trait
  • Composable Combinators: Build complex parsers from simple, reusable building blocks
  • Configurable Emitters: Built-in error emission strategies (Fatal, Verbose, Silent, Ignored)
  • Rich Error Recovery: Built-in recover() and inplace_recover() combinators for resilient parsing with backtracking or synchronization
  • Zero-Cost Abstractions: All configuration resolved at compile time
  • No-std Support: Core functionality works without allocator
  • Multiple Source Types: Support for str, [u8], Bytes, BStr, HipStr
  • Logos Integration: Optional LogosLexer adapter for seamless logos integration
  • CST Support: Optional Concrete Syntax Tree support via rowan

Installation

Add this to your Cargo.toml:

[dependencies]
tokit = "0.1"

Feature Flags

  • std (default) - Enable standard library support
  • alloc - Enable allocator support for no-std environments
  • rowan - Enable CST (Concrete Syntax Tree) support with rowan integration
  • logos - Enable integration with latest logos crate
  • bytes - Enable integration with latest bytes crate
    • bytes_1 - Enable integration with bytes@1
  • bstr - Enable integration with latest bstr crate
    • bstr_1 - Enable integration with bstr@1
  • hipstr - Enable integration with latest hipstr crate
  • smallvec - Enable integration with latest smallvec crate
    • smallvec_1 - Enable integration with smallvec@1
  • heapless - Enable integration with latest heapless crate
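
For example, a front-end that lexes with logos and builds a rowan CST might enable both flags (one possible combination; pick whichever flags your project needs):

```toml
[dependencies]
tokit = { version = "0.1", features = ["logos", "rowan"] }
```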

Core Components

Lexer Layer

  • Lexer<'inp> Trait

    Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.

  • Token<'a> Trait

    Defines token types with:

    • Kind: Token kind discriminator
    • Error: Associated error type
  • LogosLexer<'inp, T, L> (feature: logos)

    Ready-to-use adapter for integrating Logos lexers.
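
The shape of this contract can be sketched in a few lines of plain Rust. Note that every name and signature below is invented for illustration — the real Lexer<'inp> and Token<'a> traits in Tokit may look different. The point is the streaming, zero-copy idea: tokens are handed out on demand and borrow slices of the source.

```rust
// Illustrative sketch only: these shapes approximate a streaming lexer
// contract; they are NOT tokit's actual `Lexer<'inp>` / `Token<'a>` traits.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { Num, Plus, Eof }

#[derive(Debug, Clone, Copy)]
struct Tok<'a> { kind: Kind, text: &'a str }

/// A lexer hands out tokens on demand; nothing is buffered up front.
trait StreamLexer<'inp> {
    fn next_token(&mut self) -> Tok<'inp>;
}

struct DigitsLexer<'inp> { src: &'inp str, pos: usize }

impl<'inp> StreamLexer<'inp> for DigitsLexer<'inp> {
    fn next_token(&mut self) -> Tok<'inp> {
        let bytes = self.src.as_bytes();
        // Skip whitespace.
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_whitespace() {
            self.pos += 1;
        }
        if self.pos >= bytes.len() {
            return Tok { kind: Kind::Eof, text: "" };
        }
        let start = self.pos;
        if bytes[self.pos] == b'+' {
            self.pos += 1;
            return Tok { kind: Kind::Plus, text: &self.src[start..self.pos] };
        }
        while self.pos < bytes.len() && bytes[self.pos].is_ascii_digit() {
            self.pos += 1;
        }
        // `text` is a zero-copy slice of the source, not an owned String.
        Tok { kind: Kind::Num, text: &self.src[start..self.pos] }
    }
}

fn main() {
    let mut lx = DigitsLexer { src: "12 + 34", pos: 0 };
    assert_eq!(lx.next_token().text, "12");
    assert_eq!(lx.next_token().kind, Kind::Plus);
    assert_eq!(lx.next_token().text, "34");
    assert_eq!(lx.next_token().kind, Kind::Eof);
}
```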

Error Handling

Tokit's flexible Emitter system allows the same parser to adapt to different use cases by simply changing the error handling strategy.

Atomically Composable Trait Design

Tokit's emitter system uses atomically composable traits - small, focused traits that each handle a specific parsing scenario. Instead of one monolithic interface, error handling is broken down into atomic building blocks:

  • Core: Emitter - Base error handling (lexer errors, unexpected tokens)
  • Repetition: TooFewEmitter, TooManyEmitter, FullContainerEmitter
  • Separation: SeparatedEmitter, UnexpectedLeadingSeparatorEmitter, UnexpectedTrailingSeparatorEmitter

This design provides:

  • Fine-grained control: Implement only the traits you need
  • Composability: Mix and match traits to build custom strategies
  • Extensibility: Create specialized emitters for specific use cases
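
The composition idea can be sketched as follows. The trait names mirror the list above, but the method signatures here are invented for illustration and are not Tokit's actual API:

```rust
// Illustrative sketch only: invented signatures showing how small, focused
// error-handling traits compose into a custom strategy.

/// Base error handling: every strategy implements this.
trait Emitter {
    fn unexpected_token(&mut self, at: usize);
}

/// Atomic add-on: only parsers using bounded repetition need it.
trait TooFewEmitter: Emitter {
    fn too_few(&mut self, got: usize, want: usize);
}

/// A custom strategy implements exactly the traits its parsers need.
struct Counting { errors: usize }

impl Emitter for Counting {
    fn unexpected_token(&mut self, _at: usize) { self.errors += 1; }
}

impl TooFewEmitter for Counting {
    fn too_few(&mut self, _got: usize, _want: usize) { self.errors += 1; }
}

fn main() {
    let mut e = Counting { errors: 0 };
    e.unexpected_token(3); // base trait
    e.too_few(1, 2);       // atomic add-on
    assert_eq!(e.errors, 2);
}
```

A parser generic over `TooFewEmitter` accepts this custom strategy as readily as a pre-built one, which is the payoff of keeping the traits small.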

Built-in Emitter Strategies

Tokit provides ready-made strategies that implement all the atomic traits with consistent behavior:

  • Fatal - Fail-fast parsing: Stop on first error (default) - perfect for runtime parsing and REPLs
  • Verbose - Comprehensive error collection: Collect all errors and continue parsing - perfect for compiler diagnostics and IDEs
  • Silent - Silently ignore errors
  • Ignored - Ignore errors completely

Key Design: Change the Emitter type to switch between fail-fast runtime parsing and comprehensive compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:

  • Runtime/REPL: Fast feedback with Fatal emitter
  • Compiler/IDE: Comprehensive diagnostics with Verbose emitter

The Verbose Emitter: Unlike Fatal which stops at the first error, the Verbose emitter collects all errors during parsing and continues where possible. Errors are stored in a BTreeMap indexed by span, automatically sorted by their position in the source code. After parsing, retrieve all errors via the errors() method for display, analysis, or further processing. This is ideal for:

  • Showing users all issues at once in compiler output
  • Providing real-time diagnostics in IDE error panels
  • Collecting comprehensive error information for debugging
  • Generating detailed error reports for language servers
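
The span-indexed collection idea reduces to a BTreeMap whose ordering does the sorting for free. This is a toy stand-in with invented names, not the real Verbose type:

```rust
// Illustrative sketch only: a Verbose-style collector that keys errors by
// span, so iteration yields them in source order. Names are invented.
use std::collections::BTreeMap;

type Span = (usize, usize); // (start, end) byte offsets

#[derive(Default)]
struct VerboseSketch {
    errors: BTreeMap<Span, String>,
}

impl VerboseSketch {
    fn emit(&mut self, span: Span, msg: impl Into<String>) {
        self.errors.insert(span, msg.into());
    }
    /// Errors come back sorted by position, ready for diagnostics output.
    fn errors(&self) -> impl Iterator<Item = (&Span, &String)> {
        self.errors.iter()
    }
}

fn main() {
    let mut v = VerboseSketch::default();
    // Emitted out of order during parsing...
    v.emit((40, 41), "unexpected `}`");
    v.emit((7, 12), "unterminated string");
    // ...but reported in source order, thanks to BTreeMap ordering.
    let starts: Vec<usize> = v.errors().map(|(s, _)| s.0).collect();
    assert_eq!(starts, vec![7, 40]);
    for (span, err) in v.errors() {
        eprintln!("error at {:?}: {}", span, err);
    }
}
```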

Custom Emitters: Thanks to the atomically composable trait design, you can create custom error handling strategies by implementing only the traits you need. You compose small, focused traits to build exactly the behavior you want. For example, you could build an emitter that:

  • Implements only Emitter + TooFewEmitter for a parser that only needs those scenarios

  • Limits the maximum number of errors before stopping

  • Filters errors by severity level

  • Sends errors to a logging system or telemetry service

  • Implements domain-specific error recovery strategies for specific error types

  • Wraps an existing emitter and adds custom behavior for certain atomic traits

  • Rich Error Types (in error/ module)

    • Token-level: UnexpectedToken, MissingToken, UnexpectedEot
    • Syntax-level: Unclosed, Unterminated, Malformed, Invalid
    • Escape sequences: HexEscape, UnicodeEscape
    • All errors include span tracking

Error Recovery

Tokit provides built-in parser combinators for resilient parsing that can continue after errors:

Recovery Strategies

  • recover(recovery_parser) - Error recovery with backtracking

    • If primary parser fails, resets to starting position and tries recovery parser
    • Use for: Alternative interpretations, fallback values, error nodes
    • Example: parse_expr().recover(parse_error_node())
  • inplace_recover(recovery_parser) - Error recovery without backtracking

    • If primary parser fails, continues from error position with recovery parser
    • Use for: Panic mode recovery, resynchronization, skipping to safe points
    • Example: parse_stmt().inplace_recover(skip_to_semicolon())

Recovery Patterns

Alternative Parsing (with backtracking):

// Try parsing as function, fall back to error item
let parser = parse_function()
    .recover(parse_error_item());

// Input with error → recovers gracefully

Synchronization Points (without backtracking):

// Parse statement, skip to semicolon on error
let parser = parse_statement()
    .inplace_recover(
        skip_to(|tok| matches!(tok, Token::Semicolon))
            .then_ignore(any())
            .map(|_| Statement::Error)
    );

// Continues parsing from next statement

Comprehensive Error Collection:

// Use with Verbose emitter to collect all errors
let emitter = Verbose::new();
let items = many(
    parse_item().recover(parse_error_item())
);

// After parsing, retrieve all errors:
for (span, error) in emitter.errors() {
    eprintln!("Error at {:?}: {}", span, error);
}

Error recovery works seamlessly with the atomically composable emitter system - combine Verbose emitter with recovery combinators to build robust parsers that report all issues in a single pass.

Utilities

  • Span Tracking

    • SimpleSpan - Lightweight span representation
    • Spanned<T, S> - Wrap value with span
    • Located<T, S> - Wrap value with span and source slice
    • Sliced<T, S> - Wrap value with source slice
  • Parser Configuration

    • Parser<F, L, O, Error, Context> - Configurable parser
    • ParseContext - Context for emitter and cache
    • Window - Type-level peek buffer capacity for deterministic lookahead
    • Note: Lookahead windows support 1-32 token capacity via typenum::{U1..U32}
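
The compile-time-capacity idea can be sketched with a const-generic buffer. This stand-in uses a const generic instead of typenum, and every name below is invented — it only illustrates why exceeding the declared window is a hard error rather than silent buffering:

```rust
// Illustrative sketch only: a stack-allocated peek buffer whose capacity is
// fixed at compile time, analogous in spirit to a typenum-sized `Window`.

struct PeekWindow<T, const CAP: usize> {
    buf: [Option<T>; CAP],
    len: usize,
}

impl<T: Copy, const CAP: usize> PeekWindow<T, CAP> {
    fn new() -> Self {
        Self { buf: [None; CAP], len: 0 }
    }

    /// Pull from the lexer until `n + 1` tokens are buffered, then peek the
    /// `n`-th. Lookahead beyond CAP is rejected rather than quietly buffered.
    fn peek(&mut self, n: usize, lexer: &mut impl Iterator<Item = T>) -> Option<T> {
        assert!(n < CAP, "lookahead exceeds compile-time window capacity");
        while self.len <= n {
            self.buf[self.len] = Some(lexer.next()?);
            self.len += 1;
        }
        self.buf[n]
    }
}

fn main() {
    let mut lexer = vec![10, 20, 30, 40].into_iter();
    let mut win: PeekWindow<i32, 2> = PeekWindow::new();
    assert_eq!(win.peek(1, &mut lexer), Some(20)); // buffers 2 tokens on the stack
    assert_eq!(win.peek(0, &mut lexer), Some(10)); // already buffered, no lexer pull
}
```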

Examples

All examples are self-contained and runnable with cargo run --example <name> --features std,logos. They are also compiled as integration tests (cargo test --example <name> --features std,logos).

json — JSON value parser

Demonstrates map-based combinators (separated_by, fold, peek_then) on a recursive JSON grammar. Produces an enum Value (null, bool, number, string, array, object) from the token stream without any intermediate allocation.

cargo run --example json --features std,logos

calculator — arithmetic expression evaluator (token-level Pratt)

Demonstrates the token-level Pratt API (InputRef::pratt) where Token implements PrattToken to classify itself as an operand, prefix, or infix/postfix operator. Fold functions receive raw Spanned<Token> values and encode computed f64 results back into Token::Num.

Operator table: + - (infix, left, precedence 1), * / (infix, left, 2), unary - (prefix, 3), ^ (infix, right, 4), ( ) (grouping via PREC_PAREN sentinel).

cargo run --example calculator --features std,logos

s_expression — Lisp S-expression interpreter (recursive descent)

Demonstrates pure recursive-descent parsing with InputRef::next and InputRef::try_expect — no Pratt parsing involved. The evaluator reduces the parsed AST to an Atom value.

Supports: integer and boolean literals, keyword atoms (:foo), quoted lists ('(1 2 3)), the built-in functions + - * / = not, and (if cond then [else]) conditionals.

cargo run --example s_expression --features std,logos

c_expression — C-style expression parser (combinator-level Pratt)

Demonstrates the combinator-level Pratt API (pratt_of) where separate parse_lhs / parse_rhs functions return PrattLHS / PrattRHS values and fold functions receive fully typed AST nodes and an &mut InputRef — enabling complex postfix forms that consume additional tokens.

Supported operators (in precedence order): || &&, == != < <= > >=, << >>, + -, * / %, unary ! - ~ ++ --, postfix ++ --, array subscript a[i], function call f(args...), and ternary cond ? then : else.

cargo run --example c_expression --features std,logos

Architecture

Tokit's architecture follows a layered design:

  1. Lexer Layer - Token production and source abstraction
  2. Parser Layer - Composable parser combinators
  3. Error Layer - Rich error types and emission strategies
  4. Utility Layer - Spans, containers, and helpers

This separation enables:

  • Use any lexer by implementing Lexer<'inp>
  • Mix and match parser combinators
  • Customize error handling per-parser or globally
  • Zero-cost abstractions through compile-time configuration

Design Philosophy

Parse-While-Lexing: Zero-Copy Streaming

Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:

Traditional Approach (Two-Phase):

Source → Lexer → [Token Buffer] → Parser
         ↓
    Allocate Vec<Token>  ← Extra allocation!

Tokit Approach (Streaming):

Source → Lexer ←→ Parser
         ↑________↓
    Zero-copy streaming, no buffer

Benefits:

  • Zero Extra Allocations: No token buffer, tokens consumed on-demand
  • Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
  • Better Cache Locality: Tokens processed immediately after lexing
  • Predictable Performance: No large allocations, deterministic memory usage
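
The contrast above can be sketched in a few lines of plain Rust. This is a toy illustration, not Tokit's API — the "parser" just sums numbers — but it shows where the two-phase approach pays for an allocation the streaming one never makes:

```rust
// Illustrative sketch only: two-phase (buffer everything) vs. streaming
// (consume tokens the moment the lexer produces them).

fn lex(src: &str) -> impl Iterator<Item = i64> + '_ {
    src.split_whitespace().filter_map(|w| w.parse().ok())
}

/// Two-phase: the whole token stream is materialized before parsing starts.
fn parse_buffered(src: &str) -> i64 {
    let tokens: Vec<i64> = lex(src).collect(); // O(n) allocation up front
    tokens.iter().sum()
}

/// Streaming: each token flows straight from lexer to parser, no buffer.
fn parse_streaming(src: &str) -> i64 {
    lex(src).sum()
}

fn main() {
    let src = "1 2 3 4";
    // Same result either way; only one path allocated a Vec<Token>.
    assert_eq!(parse_buffered(src), parse_streaming(src));
    assert_eq!(parse_streaming(src), 10);
}
```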

No Hidden Backtracking

Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:

  • Predictable Performance: No hidden exponential backtracking scenarios
  • Explicit Control: Developers decide when and where to peek ahead via peek_then() and peek_then_choice()
  • Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (Window trait)
  • Better Error Messages: Failed alternatives don't hide earlier, more relevant errors

// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3)  // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.front() {
            ...
        }
    });

Parser Combinators + Deterministic Table Parsing

Tokit uniquely combines:

  • Parser Combinator Flexibility: Compose small parsers into complex grammars
  • LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
  • Type-Level Capacity: Lookahead buffer size known at compile time (Window::CAPACITY)

This hybrid approach gives you composable abstractions without sacrificing performance or predictability.

Atomically Composable Error Handling

Tokit's error handling system breaks down error scenarios into small, focused traits. Each trait handles one specific parsing situation (like TooFewEmitter for "too few elements" or UnexpectedLeadingSeparatorEmitter for leading separators).

Benefits of the Atomically Composable Trait Design:

  • Implement only what you need: Your parser only uses TooFewEmitter? Just implement that trait
  • Compose custom strategies: Mix and match atomic traits to build specialized error handlers
  • Pre-built bundles: Fatal, Verbose, and Silent implement all traits for convenience
  • Fine-grained control: Small, reusable pieces that compose into complex behavior

This is fundamentally different from traditional monolithic error handler interfaces - you get both the simplicity of pre-built strategies and the flexibility to implement only what you need.

Fail-Fast Runtime ↔ Comprehensive Compiler Diagnostics

Tokit's architecture decouples parsing logic from error handling strategy through the atomic Emitter trait system. This means:

Same Parser, Different Contexts:

  • Runtime/REPL Mode: Use Fatal emitter → stop on first error for immediate feedback
  • Compiler/IDE Mode: Use Verbose emitter → collect all errors for comprehensive diagnostics
  • Testing/Fuzzing: Use Ignored emitter → parse through all errors for robustness testing

Benefits:

  • ✅ Write parsers once, deploy everywhere
  • ✅ No separate "error recovery mode" - it's just a different emitter
  • ✅ Custom emitters can implement domain-specific error handling
  • ✅ Zero-cost abstraction - emitter behavior resolved at compile time

Inspirations

Tokit takes inspiration from:

  • winnow - For ergonomic parser API design
  • chumsky - For composable parser combinator patterns
  • logos - For high-performance lexing
  • rowan - For lossless syntax tree representation

Core Priorities

  1. Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
  2. Predictability - No hidden backtracking, explicit control flow, deterministic decisions
  3. Composability - Small parsers combine into complex ones; atomic emitter traits compose into custom strategies
  4. Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (comprehensive) via atomic Emitter traits
  5. Flexibility - Work with any lexer, atomic error handling traits, support both AST and CST
  6. Correctness - Rich error types, span tracking, validation

Who Uses Tokit?

  • smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs

License

tokit is dual-licensed under:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

You may choose either license for your purposes.

Copyright (c) 2026 Al Liu.
