Blazing fast parser combinators with parse-while-lexing architecture (zero-copy), deterministic LALR-style parsing, and no hidden backtracking.
- Overview
- Installation
- Core Components
- Examples
- Architecture
- Design Philosophy
- Who Uses Tokit?
- License
Tokit is a blazing fast parser combinator library for Rust that uniquely combines:
- Parse-While-Lexing Architecture: Zero-copy streaming - parsers consume tokens directly from the lexer without buffering, eliminating allocation overhead
- Deterministic LALR-Style Parsing: Explicit lookahead with compile-time buffer capacity, no hidden backtracking
- Flexible Error Handling: Same parser code adapts to fail-fast runtime parsing or greedy compiler diagnostics via the `Emitter` trait
Unlike traditional parser combinators that buffer tokens and rely on implicit backtracking, Tokit streams tokens on-demand with predictable, deterministic decisions. This makes it ideal for building high-performance language tooling, DSL parsers, compilers, and REPLs that need both speed and comprehensive error reporting.
- Parse-While-Lexing: Zero-copy streaming architecture - no token buffering, no extra allocations
- No Hidden Backtracking: Explicit, predictable parsing with lookahead-based decisions instead of implicit backtracking
- Deterministic + Composable: Combines the flexibility of parser combinators with LALR-style deterministic table parsing
- Flexible Error Handling Architecture: Designed to support both fail-fast parsing (runtime) and greedy parsing (compiler diagnostics) by swapping the `Emitter` type (same parser, different behavior)
- Token-Based Parsing: Works directly on token streams from any lexer implementing the `Lexer<'inp>` trait
- Composable Combinators: Build complex parsers from simple, reusable building blocks
- Flexible Error Handling: Configurable error emission strategies (`Fatal`, `Silent`, `Ignored`)
- Rich Error Recovery: Built-in `recover()` and `inplace_recover()` combinators for resilient parsing with backtracking or synchronization
- Zero-Cost Abstractions: All configuration resolved at compile time
- No-std Support: Core functionality works without an allocator
- Multiple Source Types: Support for `str`, `[u8]`, `Bytes`, `BStr`, `HipStr`
- Logos Integration: Optional `LogosLexer` adapter for seamless logos integration
- CST Support: Optional Concrete Syntax Tree support via rowan
Add this to your Cargo.toml:

```toml
[dependencies]
tokit = "0.1"
```

Feature flags:

- `std` (default) - Enable standard library support
- `alloc` - Enable allocator support for no-std environments
- `rowan` - Enable CST (Concrete Syntax Tree) support with rowan integration
- `logos` - Enable integration with the latest `logos` crate
- `logos_0_16` - Enable integration with `[email protected]`
- `logos_0_15` - Enable integration with `[email protected]`
- `logos_0_14` - Enable integration with `[email protected]`
- `bytes` - Enable integration with the latest `bytes` crate
- `bytes_1` - Enable integration with `bytes@1`
- `bstr` - Enable integration with the latest `bstr` crate
- `bstr_1` - Enable integration with `bstr@1`
- `hipstr` - Enable integration with the latest `hipstr` crate
- `hipstr_0_8` - Enable integration with `[email protected]`
- `smallvec` - Enable integration with the latest `smallvec` crate
- `smallvec_1` - Enable integration with `smallvec@1`
- `heapless` - Enable integration with the latest `heapless` crate
- `heapless_0_9` - Enable integration with `[email protected]`
- `Lexer<'inp>` Trait - Core trait for lexers that produce token streams. Implement this to use any lexer with Tokit.
- `Token<'a>` Trait - Defines token types with:
  - `Kind`: Token kind discriminator
  - `Error`: Associated error type
- `LogosLexer<'inp, T, L>` (feature: `logos`) - Ready-to-use adapter for integrating Logos lexers.
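To illustrate the shape of this contract, here is a minimal hand-rolled sketch. The trait and type names only echo the README's `Lexer` and `Token` descriptions; they are not Tokit's actual definitions, and `ToyLexer` is purely hypothetical:

```rust
// Hypothetical sketch of a lexer/token contract; NOT Tokit's real API.

/// Stand-in for an associated `Kind` discriminator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Num,
    Plus,
}

/// A zero-copy token that borrows its text from the input.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tok<'a> {
    kind: Kind,
    text: &'a str,
}

/// Minimal `Lexer<'inp>`-style trait: produce tokens on demand.
trait Lexer<'inp> {
    type Token;
    fn next_token(&mut self) -> Option<Self::Token>;
}

/// A toy lexer over `1+2`-style input.
struct ToyLexer<'inp> {
    rest: &'inp str,
}

impl<'inp> Lexer<'inp> for ToyLexer<'inp> {
    type Token = Tok<'inp>;

    fn next_token(&mut self) -> Option<Tok<'inp>> {
        let rest = self.rest.trim_start();
        let first = rest.chars().next()?;
        let (kind, len) = if first.is_ascii_digit() {
            // Take the longest run of digits.
            let len = rest
                .find(|c: char| !c.is_ascii_digit())
                .unwrap_or(rest.len());
            (Kind::Num, len)
        } else if first == '+' {
            (Kind::Plus, 1)
        } else {
            return None;
        };
        let (text, tail) = rest.split_at(len);
        self.rest = tail;
        Some(Tok { kind, text })
    }
}
```

The key point is that the parser pulls each token on demand; no `Vec<Token>` is ever built.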
Tokit's flexible `Emitter` system allows the same parser to adapt to different use cases by simply changing the error handling strategy.
Tokit's emitter system uses atomically composable traits - small, focused traits that each handle a specific parsing scenario. Instead of one monolithic interface, error handling is broken down into atomic building blocks:
- Core: `Emitter` - Base error handling (lexer errors, unexpected tokens)
- Repetition: `TooFewEmitter`, `TooManyEmitter`, `FullContainerEmitter`
- Separation: `SeparatedEmitter`, `UnexpectedLeadingSeparatorEmitter`, `UnexpectedTrailingSeparatorEmitter`
This design provides:
- Fine-grained control: Implement only the traits you need
- Composability: Mix and match traits to build custom strategies
- Extensibility: Create specialized emitters for specific use cases
Tokit provides complete emitters that implement all atomic traits with consistent behavior:

- `Fatal` - Fail-fast parsing: stop on the first error (default); perfect for runtime parsing and REPLs
- `Verbose` - Comprehensive error collection: collect all errors and continue parsing; perfect for compiler diagnostics and IDEs
- `Silent` - Silently ignore errors
- `Ignored` - Ignore errors completely
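The contrast between the fail-fast and collecting strategies can be sketched with simplified stand-in traits. Everything below (`Emitter`, `TooFewEmitter`, `Fatal`, `Verbose`, `check_list`) mirrors the README's naming but is an illustrative toy, not Tokit's real API:

```rust
// Hypothetical sketch of atomically composable emitter traits.

/// Core atomic trait: report an unexpected situation.
/// Returns whether parsing should continue.
trait Emitter {
    fn unexpected(&mut self, msg: &str) -> bool;
}

/// Another atomic trait, for "too few elements" in a repetition.
trait TooFewEmitter: Emitter {
    fn too_few(&mut self, expected: usize, got: usize) -> bool;
}

/// Fail-fast strategy: record the first error, then stop.
#[derive(Default)]
struct Fatal {
    first_error: Option<String>,
}

impl Emitter for Fatal {
    fn unexpected(&mut self, msg: &str) -> bool {
        if self.first_error.is_none() {
            self.first_error = Some(msg.to_string());
        }
        false // ask the parser to stop
    }
}

impl TooFewEmitter for Fatal {
    fn too_few(&mut self, expected: usize, got: usize) -> bool {
        self.unexpected(&format!("expected {expected} items, got {got}"))
    }
}

/// Collecting strategy: record every error and keep going.
#[derive(Default)]
struct Verbose {
    errors: Vec<String>,
}

impl Emitter for Verbose {
    fn unexpected(&mut self, msg: &str) -> bool {
        self.errors.push(msg.to_string());
        true // keep parsing
    }
}

impl TooFewEmitter for Verbose {
    fn too_few(&mut self, expected: usize, got: usize) -> bool {
        self.unexpected(&format!("expected {expected} items, got {got}"))
    }
}

/// Parser code written once against the atomic traits:
/// swapping the emitter swaps the behavior.
fn check_list<E: TooFewEmitter>(items: &[i32], min: usize, e: &mut E) -> bool {
    if items.len() < min {
        e.too_few(min, items.len())
    } else {
        true
    }
}
```

`check_list` never mentions a concrete strategy; choosing `Fatal` or `Verbose` at the call site is what flips it between fail-fast and error-collecting behavior.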
Key Design: Change the `Emitter` type to switch between fail-fast runtime parsing and comprehensive compiler diagnostics - same parser code, different behavior. This makes Tokit suitable for both:

- Runtime/REPL: Fast feedback with the `Fatal` emitter
- Compiler/IDE: Comprehensive diagnostics with the `Verbose` emitter
The `Verbose` Emitter: Unlike `Fatal`, which stops at the first error, the `Verbose` emitter collects all errors during parsing and continues where possible. Errors are stored in a `BTreeMap` indexed by span, automatically sorted by their position in the source code. After parsing, retrieve all errors via the `errors()` method for display, analysis, or further processing. This is ideal for:
- Showing users all issues at once in compiler output
- Providing real-time diagnostics in IDE error panels
- Collecting comprehensive error information for debugging
- Generating detailed error reports for language servers
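The span-indexed collection described above can be sketched in a few lines. This is an illustrative stand-in, assuming a simple `(start, end)` span; the `emit`/`errors` names here are hypothetical, not Tokit's exact signatures:

```rust
use std::collections::BTreeMap;

// Sketch of Verbose-style error collection: errors keyed by span in a
// BTreeMap, so iterating yields them in source order automatically.

/// Simple (start, end) byte-offset span, for illustration only.
type Span = (usize, usize);

#[derive(Default)]
struct VerboseErrors {
    errors: BTreeMap<Span, String>,
}

impl VerboseErrors {
    /// Record an error at a span.
    fn emit(&mut self, span: Span, msg: impl Into<String>) {
        self.errors.insert(span, msg.into());
    }

    /// All collected errors; BTreeMap iteration is sorted by key,
    /// i.e. by position in the source.
    fn errors(&self) -> &BTreeMap<Span, String> {
        &self.errors
    }
}
```

Because the map is ordered by span, diagnostics come out in source order with no extra sorting pass.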
Custom Emitters: Thanks to the atomically composable trait design, you can create custom error handling strategies by implementing only the traits you need. You compose small, focused traits to build exactly the behavior you want. For example, you could build an emitter that:
- Implements only `Emitter` + `TooFewEmitter` for a parser that only needs those scenarios
- Limits the maximum number of errors before stopping
- Filters errors by severity level
- Sends errors to a logging system or telemetry service
- Implements domain-specific error recovery strategies for specific error types
- Wraps an existing emitter and adds custom behavior for certain atomic traits
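As one concrete illustration of the wrapper idea, here is a sketch of an emitter that caps the number of reported errors. The single-method `Emitter` trait below is a simplified stand-in, and `Collect`/`Limited` are hypothetical names:

```rust
// Sketch of a wrapper emitter that limits error count; simplified trait.

trait Emitter {
    /// Report an error; return `false` to ask the parser to stop.
    fn emit(&mut self, msg: &str) -> bool;
}

/// Collects every error and never stops.
#[derive(Default)]
struct Collect {
    errors: Vec<String>,
}

impl Emitter for Collect {
    fn emit(&mut self, msg: &str) -> bool {
        self.errors.push(msg.to_string());
        true
    }
}

/// Wraps any emitter and stops after `max` errors.
struct Limited<E> {
    inner: E,
    max: usize,
    seen: usize,
}

impl<E: Emitter> Emitter for Limited<E> {
    fn emit(&mut self, msg: &str) -> bool {
        self.seen += 1;
        // Delegate to the wrapped emitter, then apply the cap.
        let keep_going = self.inner.emit(msg);
        keep_going && self.seen < self.max
    }
}
```

Because `Limited<E>` is generic over the inner emitter, the cap composes with any underlying strategy.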
- Rich Error Types (in the `error/` module)
  - Token-level: `UnexpectedToken`, `MissingToken`, `UnexpectedEot`
  - Syntax-level: `Unclosed`, `Unterminated`, `Malformed`, `Invalid`
  - Escape sequences: `HexEscape`, `UnicodeEscape`
  - All errors include span tracking
Tokit provides built-in parser combinators for resilient parsing that can continue after errors:
- `recover(recovery_parser)` - Error recovery with backtracking
  - If the primary parser fails, resets to the starting position and tries the recovery parser
  - Use for: alternative interpretations, fallback values, error nodes
  - Example: `parse_expr().recover(parse_error_node())`
- `inplace_recover(recovery_parser)` - Error recovery without backtracking
  - If the primary parser fails, continues from the error position with the recovery parser
  - Use for: panic mode recovery, resynchronization, skipping to safe points
  - Example: `parse_stmt().inplace_recover(skip_to_semicolon())`
Alternative Parsing (with backtracking):

```rust
// Try parsing as function, fall back to error item
let parser = parse_function()
    .recover(parse_error_item());
// Input with error → recovers gracefully
```

Synchronization Points (without backtracking):

```rust
// Parse statement, skip to semicolon on error
let parser = parse_statement()
    .inplace_recover(
        skip_to(|tok| matches!(tok, Token::Semicolon))
            .then_ignore(any())
            .map(|_| Statement::Error)
    );
// Continues parsing from next statement
```

Comprehensive Error Collection:

```rust
// Use with Verbose emitter to collect all errors
let emitter = Verbose::new();
let items = many(
    parse_item().recover(parse_error_item())
);
// After parsing, retrieve all errors:
for (span, error) in emitter.errors() {
    eprintln!("Error at {:?}: {}", span, error);
}
```

Error recovery works seamlessly with the atomically composable emitter system - combine the `Verbose` emitter with recovery combinators to build robust parsers that report all issues in a single pass.
- Span Tracking
  - `SimpleSpan` - Lightweight span representation
  - `Spanned<T, S>` - Wrap a value with its span
  - `Located<T, S>` - Wrap a value with its span and source slice
  - `Sliced<T, S>` - Wrap a value with its source slice
- Parser Configuration
  - `Parser<F, L, O, Error, Context>` - Configurable parser
  - `ParseContext` - Context for the emitter and cache
  - `Window` - Type-level peek buffer capacity for deterministic lookahead
  - Note: Lookahead windows support 1-32 token capacity via `typenum::{U1..U32}`
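A fixed-capacity, stack-allocated lookahead window can be sketched with a const generic standing in for the type-level `typenum` sizes. This is a conceptual toy, not Tokit's `Window` implementation:

```rust
// Sketch of a compile-time-sized peek buffer; capacity N is a const
// generic here, playing the role of typenum::{U1..U32} in the README.

struct Window<T, const N: usize> {
    buf: [Option<T>; N], // lives on the stack, no heap allocation
    len: usize,
}

impl<T: Copy, const N: usize> Window<T, N> {
    /// Capacity known at compile time.
    const CAPACITY: usize = N;

    fn new() -> Self {
        Self { buf: [None; N], len: 0 }
    }

    /// Pull tokens from a source until the window is full or the
    /// source is exhausted; tokens beyond N stay unconsumed.
    fn fill(&mut self, src: &mut impl Iterator<Item = T>) {
        while self.len < N {
            match src.next() {
                Some(t) => {
                    self.buf[self.len] = Some(t);
                    self.len += 1;
                }
                None => break,
            }
        }
    }

    /// Peek the k-th buffered token without consuming it.
    fn peek(&self, k: usize) -> Option<&T> {
        self.buf.get(k).and_then(|slot| slot.as_ref())
    }
}
```

Because the capacity is a type-level constant, peeking past it is impossible and the buffer's memory usage is fixed and predictable.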
All examples are self-contained and runnable with `cargo run --example <name> --features std,logos`.
They are also compiled as integration tests (`cargo test --example <name> --features std,logos`).
Demonstrates map-based combinators (`separated_by`, `fold`, `peek_then`) on a recursive JSON grammar. Produces an enum `Value` (null, bool, number, string, array, object) from the token stream without any intermediate allocation.

```shell
cargo run --example json --features std,logos
```

Demonstrates the token-level Pratt API (`InputRef::pratt`) where `Token` implements `PrattToken` to classify itself as an operand, prefix, or infix/postfix operator. Fold functions receive raw `Spanned<Token>` values and encode computed `f64` results back into `Token::Num`.

Operator table: `+` `-` (infix, left, precedence 1), `*` `/` (infix, left, 2), unary `-` (prefix, 3), `^` (infix, right, 4), `(` `)` (grouping via the `PREC_PAREN` sentinel).

```shell
cargo run --example calculator --features std,logos
```

Demonstrates pure recursive-descent parsing with `InputRef::next` and `InputRef::try_expect` — no Pratt parsing involved. The evaluator reduces the parsed AST to an `Atom` value.

Supports: integer and boolean literals, keyword atoms (`:foo`), quoted lists (`'(1 2 3)`), the built-in functions `+` `-` `*` `/` `=` `not`, and `(if cond then [else])` conditionals.

```shell
cargo run --example s_expression --features std,logos
```

Demonstrates the combinator-level Pratt API (`pratt_of`) where separate `parse_lhs` / `parse_rhs` functions return `PrattLHS` / `PrattRHS` values and fold functions receive fully typed AST nodes and an `&mut InputRef` — enabling complex postfix forms that consume additional tokens.

Supported operators (in precedence order): `||` `&&`, `==` `!=` `<` `<=` `>` `>=`, `<<` `>>`, `+` `-`, `*` `/` `%`, unary `!` `-` `~` `++` `--`, postfix `++` `--`, array subscript `a[i]`, function call `f(args...)`, and ternary `cond ? then : else`.

```shell
cargo run --example c_expression --features std,logos
```

Tokit's architecture follows a layered design:
- Lexer Layer - Token production and source abstraction
- Parser Layer - Composable parser combinators
- Error Layer - Rich error types and emission strategies
- Utility Layer - Spans, containers, and helpers
This separation enables:
- Use any lexer by implementing `Lexer<'inp>`
- Mix and match parser combinators
- Customize error handling per-parser or globally
- Zero-cost abstractions through compile-time configuration
Tokit uses a parse-while-lexing architecture where parsers consume tokens directly from the lexer as needed, without intermediate buffering:
Traditional Approach (Two-Phase):

```text
Source → Lexer → [Token Buffer] → Parser
                       ↓
             Allocate Vec<Token>  ← Extra allocation!
```

Tokit Approach (Streaming):

```text
Source → Lexer ←→ Parser
         ↑________↓
  Zero-copy streaming, no buffer
```
Benefits:
- ✅ Zero Extra Allocations: No token buffer, tokens consumed on-demand
- ✅ Lower Memory Footprint: Only lookahead window buffered on stack, not entire token stream
- ✅ Better Cache Locality: Tokens processed immediately after lexing
- ✅ Predictable Performance: No large allocations, deterministic memory usage
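The difference between the two approaches can be made concrete with a toy lexer and two drivers. Everything here (`lex_one`, `sum_buffered`, `sum_streaming`) is an illustrative sketch of the idea, not Tokit code:

```rust
// Contrast: buffer-then-parse vs. parse-while-lexing, on a toy grammar.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Num(i64),
    Plus,
}

/// Lex a single token off the front of the input, advancing `rest`.
fn lex_one<'a>(rest: &mut &'a str) -> Option<Tok> {
    let s: &'a str = rest.trim_start();
    let c = s.chars().next()?;
    if c == '+' {
        *rest = &s[1..];
        Some(Tok::Plus)
    } else if c.is_ascii_digit() {
        let end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
        let n: i64 = s[..end].parse().ok()?;
        *rest = &s[end..];
        Some(Tok::Num(n))
    } else {
        None
    }
}

/// Two-phase: lex everything into a Vec, then walk the buffer.
fn sum_buffered(input: &str) -> i64 {
    let mut rest = input;
    let mut buf = Vec::new(); // the extra allocation
    while let Some(t) = lex_one(&mut rest) {
        buf.push(t);
    }
    buf.iter()
        .filter_map(|t| match t {
            Tok::Num(n) => Some(*n),
            _ => None,
        })
        .sum()
}

/// Streaming: the "parser" pulls tokens straight from the lexer,
/// consuming each one immediately after it is produced.
fn sum_streaming(input: &str) -> i64 {
    let mut rest = input;
    let mut total = 0;
    while let Some(t) = lex_one(&mut rest) {
        if let Tok::Num(n) = t {
            total += n;
        }
    }
    total
}
```

Both functions compute the same result, but the streaming version never allocates a token buffer; each token is processed while it is still hot in cache.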
No Hidden Backtracking
Unlike traditional parser combinators that rely on implicit backtracking (trying alternatives until one succeeds), Tokit uses explicit lookahead-based decisions. This design choice provides:
- Predictable Performance: No hidden exponential backtracking scenarios
- Explicit Control: Developers decide when and where to peek ahead via `peek_then()` and `peek_then_choice()`
- Deterministic Parsing: LALR-style table-driven decisions using fixed-capacity lookahead windows (the `Window` trait)
- Better Error Messages: Failed alternatives don't hide earlier, more relevant errors
```rust
// Traditional parser combinator (hidden backtracking):
// try_parser1.or(try_parser2).or(try_parser3) // May backtrack!

// Tokit approach (explicit lookahead, no backtracking):
let parser = any()
    .peek_then::<_, typenum::U2>(|peeked, _| {
        match peeked.front() {
            ...
        }
    });
```

Tokit uniquely combines:
- Parser Combinator Flexibility: Compose small parsers into complex grammars
- LALR-Style Determinism: Fixed lookahead windows with deterministic decisions
- Type-Level Capacity: Lookahead buffer size known at compile time (`Window::CAPACITY`)
This hybrid approach gives you composable abstractions without sacrificing performance or predictability.
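A runnable analogue of the explicit-lookahead idea: decide between alternatives by inspecting a fixed window of upcoming tokens, never by trying and unwinding. The names below (`Tok`, `Decision`, `classify`) are illustrative, not Tokit's API:

```rust
// LALR-style decision from a fixed 2-token lookahead window:
// one O(1) match, no retries, no backtracking.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tok {
    Ident,
    Eq,
    LParen,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Assignment, // Ident '='
    Call,       // Ident '('
    Other,
}

/// Classify what comes next from at most two peeked tokens.
fn classify(window: &[Tok]) -> Decision {
    match window {
        [Tok::Ident, Tok::Eq, ..] => Decision::Assignment,
        [Tok::Ident, Tok::LParen, ..] => Decision::Call,
        _ => Decision::Other,
    }
}
```

Each input is examined exactly once; a wrong guess is impossible because the decision is made before committing to a branch, which is what makes the cost predictable.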
Tokit's error handling system breaks down error scenarios into small, focused traits. Each trait handles one specific parsing situation (like `TooFewEmitter` for "too few elements" or `UnexpectedLeadingSeparatorEmitter` for leading separators).
Benefits of the Atomically Composable Trait Design:
- ✅ Implement only what you need: Your parser only uses `TooFewEmitter`? Just implement that trait
- ✅ Compose custom strategies: Mix and match atomic traits to build specialized error handlers
- ✅ Pre-built bundles: `Fatal`, `Verbose`, and `Silent` implement all traits for convenience
- ✅ Fine-grained control: Small, reusable pieces that compose into complex behavior
This is fundamentally different from traditional monolithic error handler interfaces - you get both the simplicity of pre-built strategies and the flexibility to implement only what you need.
Tokit's architecture decouples parsing logic from error handling strategy through the atomic `Emitter` trait system. This means:
Same Parser, Different Contexts:
- Runtime/REPL Mode: Use the `Fatal` emitter → stop on the first error for immediate feedback
- Compiler/IDE Mode: Use the `Verbose` emitter → collect all errors for comprehensive diagnostics
- Testing/Fuzzing: Use the `Ignored` emitter → parse through all errors for robustness testing
Benefits:
- ✅ Write parsers once, deploy everywhere
- ✅ No separate "error recovery mode" - it's just a different emitter
- ✅ Custom emitters can implement domain-specific error handling
- ✅ Zero-cost abstraction - emitter behavior resolved at compile time
Tokit takes inspiration from:
- winnow - For ergonomic parser API design
- chumsky - For composable parser combinator patterns
- logos - For high-performance lexing
- rowan - For lossless syntax tree representation
- Performance - Parse-while-lexing (zero-copy streaming), zero-cost abstractions, no hidden allocations
- Predictability - No hidden backtracking, explicit control flow, deterministic decisions
- Composability - Small parsers combine into complex ones; atomic emitter traits compose into custom strategies
- Versatility - Same parser works for runtime (fail-fast) and compiler diagnostics (comprehensive) via atomic `Emitter` traits
- Flexibility - Work with any lexer, atomic error handling traits, support both AST and CST
- Correctness - Rich error types, span tracking, validation
smear: Blazing fast, fully spec-compliant, reusable parser combinators for standard GraphQL and GraphQL-like DSLs
tokit is dual-licensed under:
- MIT License (LICENSE-MIT)
- Apache License, Version 2.0 (LICENSE-APACHE)
You may choose either license for your purposes.
Copyright (c) 2026 Al Liu.