| | |
|---|---|
| 🎓 Affiliation | MPhil student, University of Macau |
| 🔬 Research | Multimodal LLMs · Speech Models · Vision-Language |
| 🛠️ Stack | Python, PyTorch, HuggingFace, CUDA |
| 📍 Location | Macau, China |
I work on making multimodal models actually understand what they hear and see — not just pattern-match on text. Mostly this means fighting with audio tokenizers, writing data pipelines at 2am, and questioning my life choices when another training run diverges.
Currently thinking about: how to build better speech benchmarks, why audio LLMs underperform vision LLMs by so much, and whether we can close that gap with better discrete representations.
`Speech LLMs` · `Multimodal Benchmarking` · `Vision-Language Data` · `Audio Tokenization` · `Efficient Inference`
| Project | What it does |
|---|---|
| speech-star | Benchmark testing whether speech LLMs actually need the audio, or just its transcript (probe sketched below) |
| audiotoken-bridge | Framework for injecting discrete speech tokens into LLMs via LoRA finetuning (see the PEFT sketch below) |
| vl-caption-engine | Automated pipeline for generating and filtering vision-language instruction data (filtering sketch below) |
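The core probe behind speech-star, roughly: score the same question set under both input conditions and see whether the audio actually buys anything. A minimal sketch; `model_answer` is a hypothetical stand-in for whatever inference call a given speech LLM exposes, and the record fields are assumptions, not the benchmark's real schema.

```python
# Hypothetical sketch of the audio-vs-transcript probe, not the real
# speech-star harness. Each record is assumed to carry the audio, its
# transcript, a question, and a gold answer.

def accuracy(records, condition, model_answer):
    """Fraction of questions answered correctly under one input condition."""
    correct = 0
    for r in records:
        context = r["audio"] if condition == "audio" else r["transcript"]
        pred = model_answer(context, r["question"])
        correct += int(pred.strip().lower() == r["answer"].strip().lower())
    return correct / len(records)

# If these two numbers come out close, the model never needed to listen:
# acc_audio = accuracy(records, "audio", model_answer)
# acc_text  = accuracy(records, "transcript", model_answer)
```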
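The general shape of the audiotoken-bridge idea, sketched with HuggingFace `transformers` and `peft`. This is not the repo's actual code; the base model, token count, and target module names are assumptions:

```python
# Sketch: graft discrete speech tokens onto a text LLM, then finetune
# only LoRA adapters plus the new embedding rows. Assumed setup, not
# the actual audiotoken-bridge implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "Qwen/Qwen2-0.5B"  # placeholder base model
N_SPEECH = 1024           # e.g. one token per codec codebook entry

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# New tokens <speech_0> ... <speech_1023>, one per discrete speech unit.
tokenizer.add_tokens([f"<speech_{i}>" for i in range(N_SPEECH)])
model.resize_token_embeddings(len(tokenizer))

# Low-rank adapters on the attention projections; embeddings and LM head
# stay fully trainable so the new speech rows can actually learn.
cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, cfg)
model.print_trainable_parameters()
```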
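For the filtering half of vl-caption-engine, one common trick is CLIP-score gating: drop generated captions whose image-text similarity is too low. The sketch below shows that generic technique, not necessarily what the repo does, and the threshold is a placeholder:

```python
# Sketch: CLIP-score filter for generated captions. Generic technique,
# not necessarily vl-caption-engine's pipeline; 0.25 is an arbitrary cutoff.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_caption(image: Image.Image, caption: str, threshold: float = 0.25) -> bool:
    """Keep a caption only if its CLIP similarity to the image clears the cutoff."""
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    sim = torch.cosine_similarity(out.image_embeds, out.text_embeds).item()
    return sim >= threshold
```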