Teaching machines to see, understand, and reason like humans
About
Researching the intersection of vision & language
I work at the frontier where machines learn to see, reason, and understand the world like we do.
Ph.D. expected Winter 2026 at Dartmouth College, advised by SouYoung Jin. Dissertation committee: Soroush Vosoughi, Nikhil Singh, and Juan Carlos Niebles. My research focuses on Multimodal Large Language Models, spanning evaluation, multimodal reasoning, and multimodal fusion through learnable masks and cross-modal alignment.
I've built AI and machine learning systems for enterprise and government clients, including computer vision pipelines, large-scale data processing, and real-time analytics. I have collaborated on research with DARPA/IARPA, CMU, KAUST, Adobe Research, Samsung Research, Northeastern University, Mount Sinai, the Lunenfeld Institute, Universidad del Norte, EAFIT, Universidad CES, and Universidad de Antioquia, among others. I am also the founder of Wiqonn, a LATAM-based initiative delivering research-backed AI solutions for real-world impact.
vllm-mlx: A framework for Apple Silicon delivering 21-87% higher throughput than llama.cpp, with content-based prefix caching for multimodal workloads and speeds of up to 525 tokens/sec.
OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support on M1/M2/M3/M4 chips.
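Because the server speaks the OpenAI API, any standard OpenAI client can talk to it. The sketch below shows the general pattern with the official Python client; the port, model identifier, and image URL are illustrative placeholders, not vllm-mlx's actual defaults.

```python
# Minimal sketch: sending a multimodal chat request to a local
# OpenAI-compatible endpoint. The base_url, model name, and image URL
# are assumptions for illustration, not the project's real defaults.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # hypothetical model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same request shape works for text-only models by dropping the image_url entry, which is what makes an OpenAI-compatible server convenient: existing tooling needs only a changed base_url.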