Optimize Viterbi cost calculation by batching word processing by mosuka · Pull Request #598 · lindera/lindera

mosuka · 2026-01-09T11:19:34Z

Optimize Viterbi cost calculation by batching word processing

Introduced a batch processing approach in the Viterbi algorithm to improve performance and cache locality.
Added buffers to the Lattice struct to store word entries and preceding state costs in a Struct-of-Arrays (SoA) format during processing.
Implemented add_edges_in_lattice_batched to process multiple candidate words starting at the same position in a single pass over preceding edges.
Optimized the Mode::Normal path by inlining cost calculations and skipping penalty overhead.
Reduced memory allocation and pointer chasing by reusing internal buffers within the Lattice structure.
Achieved the following performance improvements:
- Up to 37% speedup when using user dictionaries.
- 13% speedup for long text tokenization with detailed output.
- 7-9% improvement in standard tokenization scenarios.
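
The batching idea described above can be illustrated with a minimal sketch. This is not the code from this PR: `ConnectionMatrix`, `Candidate`, `BatchBuffers`, and `best_paths_batched` are hypothetical names, and the cost model (path cost + connection cost + word cost) is the textbook Viterbi formulation, assumed here for illustration. The sketch copies the preceding edges into Struct-of-Arrays buffers that are reused between calls, then scores every candidate word starting at the same position in a single pass over those edges.

```rust
/// Hypothetical connection-cost matrix, stored row-major:
/// row = right-context id of the preceding word,
/// column = left-context id of the candidate word.
struct ConnectionMatrix {
    num_left: usize,
    costs: Vec<i16>,
}

impl ConnectionMatrix {
    fn cost(&self, prev_right_id: u16, next_left_id: u16) -> i32 {
        self.costs[prev_right_id as usize * self.num_left + next_left_id as usize] as i32
    }
}

/// Hypothetical candidate word starting at the current position.
#[derive(Clone, Copy)]
struct Candidate {
    left_id: u16,
    word_cost: i16,
}

/// Reusable Struct-of-Arrays buffers (assumed layout, not the PR's).
/// Keeping them in one long-lived struct avoids reallocating per position.
#[derive(Default)]
struct BatchBuffers {
    prev_right_ids: Vec<u16>,
    prev_path_costs: Vec<i32>,
    best_costs: Vec<i32>,
    best_prev: Vec<usize>,
}

/// Scores all `candidates` against all `prev_edges` in one pass.
/// `prev_edges` holds (right_id, accumulated path cost) for each edge
/// ending at the current position. Results land in `buf.best_costs`
/// and `buf.best_prev`; `usize::MAX` marks "no predecessor found".
fn best_paths_batched(
    matrix: &ConnectionMatrix,
    prev_edges: &[(u16, i32)],
    candidates: &[Candidate],
    buf: &mut BatchBuffers,
) {
    // Reuse the existing allocations; `clear` keeps capacity.
    buf.prev_right_ids.clear();
    buf.prev_path_costs.clear();
    buf.best_costs.clear();
    buf.best_prev.clear();

    for &(right_id, path_cost) in prev_edges {
        buf.prev_right_ids.push(right_id);
        buf.prev_path_costs.push(path_cost);
    }
    buf.best_costs.resize(candidates.len(), i32::MAX);
    buf.best_prev.resize(candidates.len(), usize::MAX);

    // Single pass over the preceding edges; each edge is compared
    // against every candidate before moving on to the next edge.
    for i in 0..buf.prev_right_ids.len() {
        let right_id = buf.prev_right_ids[i];
        let path_cost = buf.prev_path_costs[i];
        for (j, cand) in candidates.iter().enumerate() {
            let cost = path_cost + matrix.cost(right_id, cand.left_id);
            if cost < buf.best_costs[j] {
                buf.best_costs[j] = cost;
                buf.best_prev[j] = i;
            }
        }
    }

    // The word cost does not depend on the predecessor, so add it once.
    for (j, cand) in candidates.iter().enumerate() {
        if buf.best_prev[j] != usize::MAX {
            buf.best_costs[j] += cand.word_cost as i32;
        }
    }
}
```

In this sketch a caller would keep one `BatchBuffers` alive for the whole tokenization run and call `best_paths_batched` once per start position, reading `best_costs` and `best_prev` before the next call overwrites them.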

Optimize Viterbi cost calculation by batching word processing

385d36e

mosuka merged commit 75e0b2a into main Jan 9, 2026
8 checks passed

mosuka deleted the viterbi branch January 9, 2026 12:51

mosuka added a commit that referenced this pull request Jan 10, 2026

Revert "Optimize Viterbi cost calculation by batching word processing (#598)"

This reverts commit 75e0b2a.

4f415a8
