🎯 Focusing
MPhil student @ UM · multimodal LLMs / speech models / vision · sometimes the code even runs
- Macau, China
- 01:08 (UTC +04:00)
Popular repositories
- speech-star (Public, Python)
  SpeechStar: An Audio-Indispensable Benchmark for Evaluating Speech LLMs
- audiotoken-bridge (Public, Python)
  A training framework for integrating discrete speech tokens into large language models via instruction tuning
- vl-caption-engine (Public, Python)
  Scalable vision-language instruction data synthesis pipeline with quality-aware filtering for VLM training
- WAM-Diff (Public, Python), forked from fudan-generative-vision/WAM-Diff
  WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving