    Repositories list

    • Jupyter Notebook
      0100 Updated Mar 15, 2026
    • LLM Reinforcement Learning Data Synthesis
      Python
      0000 Updated Feb 12, 2026
    • ModernBERT trained from scratch on Cantonese
      Python
      0300 Updated Nov 28, 2025
    • Hong Kong Location Knowledge Base
      0100 Updated Nov 20, 2025
    • [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
      Jupyter Notebook
      MIT License
      194000 Updated Oct 30, 2025
    • whistle (Public)
      Text-Only Domain Adaptation for Pretrained Speech Recognition Transformers
      Python
      3400 Updated Oct 20, 2025
    • Python
      11910 Updated Aug 12, 2025
    • yuesub (Public)
      Cantonese Subtitle Editor
      TypeScript
      MIT License
      0000 Updated Jul 29, 2025
    • Cantonese Video Transcribe Service
      Python
      52411 Updated Jul 25, 2025
    • Hong Kong Web Corpus Pipeline
      Python
      0200 Updated Jul 18, 2025
    • .github (Public)
      Apache License 2.0
      0000 Updated Jul 13, 2025
    • Python
      21210 Updated Jun 29, 2025
    • gaakzi (Public)
      Investigate one thing today and another tomorrow; once enough practice has accumulated, understanding breaks through of its own accord.
      0000 Updated Jun 17, 2025
    • Python
      0100 Updated May 15, 2025
    • vits2 backbone with multilingual-bert, modified for Cantonese support
      Python
      GNU Affero General Public License v3.0
      1.3k2602 Updated Apr 16, 2025
    • Jyutping Bert model pre-training with word mask filling
      0000 Updated Apr 13, 2025
    • Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
      Python
      MIT License
      57101 Updated Apr 5, 2025
    • bots (Public)
      Bot for Hon9Kon9ize's Telegram/Discord server
      TypeScript
      0000 Updated Mar 26, 2025
    • Generate a knowledge dataset in a format that is easy for an LLM to consume during training
      Python
      0000 Updated Mar 22, 2025
    • Cantonese Instruction Evolver
      Python
      1100 Updated Mar 8, 2025
    • Evaluation of Open Source Cantonese ASR Models in Diverse Domains
      Python
      4200 Updated Feb 16, 2025
    • Python
      1400 Updated Feb 8, 2025
    • Tokenize Cantonese and English text into phonemes
      Python
      BSD 2-Clause "Simplified" License
      0000 Updated Jan 16, 2025
    • Use BERT fill-mask model to fix typos
      Python
      0210 Updated Dec 15, 2024
    • Fine-Tune Wav2Vec2 Bert 2.0 for Jyutping Recognition
      Jupyter Notebook
      0100 Updated Dec 11, 2024
    • SovitsTokenizer: A low-bitrate audio tokenizer that converts speech into discrete tokens
      Jupyter Notebook
      0300 Updated Dec 4, 2024
    • Python
      0100 Updated Nov 1, 2024
    • hon9kon9ize's website
      TypeScript
      0100 Updated Oct 23, 2024
    • Generate Cantonese Instruction dataset by Gemini Pro using Stanford's Alpaca prompts for fine-tuning LLMs.
      Python
      Apache License 2.0
      0200 Updated Apr 20, 2024
    • A monorepo for hon9kon9ize's services
      Python
      MIT License
      0400 Updated Feb 6, 2024