    Repositories list

    • wikilangs

      Public
      Pre-trained tokenizers, n-gram models, Markov chains, vocabularies, and embeddings for 340+ languages. Built for researchers, educators, and developers.
      Astro
      Updated Mar 6, 2026
    • wikisets

      Public
      Flexible Wikipedia dataset builder with sampling and pretraining support. Built on top of wikipedia-monthly, providing fresh, clean Wikipedia dumps updated mont…
      Python
      MIT License
      Updated Nov 11, 2025