Multimodal Generative Systems • Video Understanding • World Models
Building next-generation generative AI systems that understand and create multimodal content, from text to motion to photorealistic video.
Transformer-based architectures for text-to-video synthesis
Distributed training pipelines (PyTorch DDP)
- Generative Video Systems
  - Transformer-based architectures
  - Diffusion model research
  - Temporal consistency modeling
  - Text-to-motion-to-video synthesis
- World Models
  - Environment dynamics learning
  - Predictive modeling
  - Temporal coherence
- Multimodal AI
  - Cross-modal fusion
  - Vision-language models
  - Audio-visual learning
- Distributed Training
  - PyTorch DDP (8-GPU clusters)
  - Large-scale dataset pipelines
  - Systematic experimentation
  - Mixed precision training
- Production ML
  - ONNX deployment
  - TensorRT optimization
  - Model quantization
  - Inference acceleration
- Research Infrastructure
  - Experiment tracking
  - Metric dashboards
  - Reproducible pipelines
| Domain | Technologies | Experience Level |
|---|---|---|
| Generative Modeling | Diffusion • GANs • VAEs • Transformers • Temporal Consistency | 95% |
| Video & Motion | Pose Estimation • Temporal Modeling • Sequence Alignment • Motion Synthesis | 90% |
| Multimodal Systems | Cross-modal Fusion • Vision-Language • Audio-Visual Learning | 85% |
| Distributed Training | PyTorch DDP • 8-GPU Clusters • Large-scale Pipelines • Experiment Tracking | 88% |
| Model Deployment | ONNX • TensorRT • Quantization • Inference Optimization | 82% |
MSc Computer Vision • Dissertation: Cross-modal latent fusion for multimodal face generation
2 Research Papers • Generative AI Architectures • Computer Vision Systems
AI Encode London Winner • Built a real-time AI prototype in 48 hours • Production-ready system demo
MSc Dissertation: Latent fusion for multimodal synthesis
Tech: VAEs • Cross-modal Fusion • PyTorch
Status: Published Research
I'm actively seeking opportunities to collaborate on cutting-edge research problems:
| Research Area | Specific Interests |
|---|---|
| World Models | Environment dynamics • Predictive modeling • Spatial reasoning |
| Video Generation | Temporal coherence • Motion synthesis • Controllable generation |
| Multimodal AI | Cross-modal fusion • Unified representations • Audio-visual learning |
| Temporal Reasoning | Long-range dependencies • Sequence modeling • Event prediction |
| Interactive Systems | Real-time generation • Human-AI collaboration • Embodied AI |
London, United Kingdom
MSc Computer Vision, Robotics & ML @ University of Surrey
Researching at the intersection of generative AI, video understanding, and world models