# Chen Hao (陈昊)

| | |
|---|---|
| 🎓 Affiliation | MPhil student, University of Macau |
| 🔬 Research | Multimodal LLMs · Speech Models · Vision-Language |
| 🛠️ Stack | Python, PyTorch, HuggingFace, CUDA |
| 📍 Location | Macau, China |

I work on making multimodal models actually understand what they hear and see — not just pattern-match on text. Mostly this means fighting with audio tokenizers, writing data pipelines at 2am, and questioning my life choices when another training run diverges.

Currently thinking about: how to build better speech benchmarks, why audio LLMs underperform vision LLMs by so much, and whether we can close that gap with better discrete representations.


## Research Interests

`Speech LLMs` · `Multimodal Benchmarking` · `Vision-Language Data` · `Audio Tokenization` · `Efficient Inference`


## Recent Projects

| Project | What it does |
|---|---|
| speech-star | Benchmark measuring whether speech LLMs actually need audio — or just transcripts |
| audiotoken-bridge | Framework for injecting discrete speech tokens into LLMs via LoRA finetuning (sketched below) |
| vl-caption-engine | Automated pipeline for generating + filtering vision-language instruction data |
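
For a rough sense of what "injecting discrete speech tokens via LoRA finetuning" can look like, here is a minimal sketch, not audiotoken-bridge's actual API: discrete audio codes are registered as new vocabulary entries, and LoRA adapters keep the finetuning footprint small. The base model name, token count, and target modules below are placeholder assumptions.

```python
# Sketch only: extend an LLM's vocabulary with discrete speech-token IDs,
# then attach LoRA adapters so only adapter weights (plus the new
# embedding rows) are trained. Names below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base LLM
NUM_SPEECH_TOKENS = 1024                 # e.g. one ID per codec codebook entry

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

# 1. Register discrete speech codes as new vocabulary entries so a codec's
#    output (e.g. "<audio_17> <audio_512> ...") tokenizes to single IDs.
speech_tokens = [f"<audio_{i}>" for i in range(NUM_SPEECH_TOKENS)]
tokenizer.add_tokens(speech_tokens)
model.resize_token_embeddings(len(tokenizer))

# 2. Wrap the model with LoRA; modules_to_save keeps the resized embedding
#    and output head trainable alongside the low-rank adapters.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```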

