Shot in Tamarindo, Costa Rica
Sicong Jiang
Final-year PhD @ McGill | Student Researcher @ Google DeepMind
I am a PhD candidate at McGill University and a Research Intern at Google DeepMind, focusing on LLM evaluation, reward modeling, and self-improving agents. Previously, I led research at Abaka AI as a founding scientist, where I developed large-scale datasets across pre/mid/post-training pipelines for frontier AI labs.
My research addresses the fundamental challenge of building reliable AI agents through the lens of automated evaluation and reward modeling. I develop structured benchmarks and reward models—such as EditReward (ICLR 2026) and AgentThink (EMNLP 2025)—to enhance long-horizon reasoning and robustness. My goal is to create high-fidelity feedback loops that enable agents to self-improve.
Scholar • X • WeChat • Email
Mar 2026
🚀 Excited to join Google DeepMind as a Research Intern in London, UK.
Feb 2026
🎉 Two papers accepted by CVPR 2026 (one Main + one Findings).
Jan 2026
🎉 One paper accepted by ICLR 2026. Check EditReward.
Nov 2025
🎉 One paper accepted (oral) by Bridge Program of AAAI 2026.
Aug 2025
🎉 One paper accepted by EMNLP 2025. Check AgentThink.
Aug 2025
🤝 Joined 2077AI-Foundation, thrilled to contribute to the AI open-source community!
Jul 2025
🚀 Joined Abaka AI as a Founding Technical Member in Palo Alto, California.
Jul 2025
🎉 One paper accepted by ICCV 2025 Foundation Models for AD Workshop. Check VLA4AD Survey.
Mar 2025
✉️ Invited to contribute to Humanity's Last Exam, an AGI reasoning benchmark.
Feb 2025
🎉 One paper accepted by ICLR 2025 Trustworthy LLM Workshop. Check SparseAttack-LLM4TS.
Jan 2025
🎉 One paper accepted by AISTATS 2025. Check Attack-LLM4TS.
AI Agents, Benchmarks & Evaluation
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
K. Wu*, S. Jiang*, M. Ku, P. Nie, M. Liu, W. Chen
ICLR 2026
Website • Paper • GitHub ⭐ 120
AgentThink: Tool-Augmented Reasoning in VLMs for Autonomous Driving
K. Qian*, S. Jiang*, Y. Zhong*, Z. Luo, Z. Huang, et al.
EMNLP 2025
Website • Paper • GitHub ⭐ 138
VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking
2077AI Team
Under review, 2025
Website • Paper • GitHub ⭐ 86 • HuggingFace Daily Paper #2
EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks
L. Liu, D. Li, Y. Liang, S. Jiang, H. Vijay, H. Hu, et al.
CVPR 2026 Findings
Releasing soon
Foundation Models: Robustness, Safety & Applications
A Survey on Vision–Language–Action Models for Autonomous Driving
S. Jiang*, Z. Huang*, K. Qian*, Z. Luo, T. Zhu, et al.
ICCV Workshop, 2025
Paper • GitHub ⭐ 532 • Tech Channel Report
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting
F. Liu*, S. Jiang*, L. Miranda-Moreno, S. Choi, L. Sun
AISTATS 2025
Paper • GitHub ⭐ 15
FASIONAD+: Enhanced Safety in Autonomous Driving with Adaptive Feedback
Z. Luo*, S. Jiang*, K. Qian*, Z. Huang, J. Miao, et al.
ICRA 2026
Paper
MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases
Z. Luo*, K. Qian*, J. Wang, Y. Luo, J. Miao, Z. Fu, Y. Wang, S. Jiang, Z. Huang, et al.
ICRA 2026
Paper
Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control
S. Jiang, S. Choi, L. Sun
TRB Annual Meeting (Oral), 2024
Paper
LLM-as-Judge for Open-ended Tasks: Researching rubric-based LLM evaluators for open-ended outputs and trajectories, and systematically probing judge failure modes to support trustworthy agent self-improvement.
Self-evolving Agent: Building agents that iteratively improve their policies via self-refinement loops, with a focus on reliable feedback signals and long-horizon behavior.
Research: As a founding member of the Research team, I lead benchmarking and evaluation for agentic and multimodal LLMs. I led the EditReward project and co-developed large-scale benchmarks including SuperGPQA and VeriWeb.
Advanced Dataset & Pipeline Design: Led several zero-to-one pipeline builds, architecting and deploying high-difficulty dataset solutions and production pipelines from scratch across coding, IMO-level math, multimodal data, agentic trajectories, and RL environments. Multiple frontier AI labs use these datasets and pipelines directly for model training and evaluation.
Multimodal Data Pipelines: Built data pipelines and multi-stage QA systems for multimodal LLM projects, overseeing large-scale annotation workflows and label consistency.
Dataset Quality & Validation: Conducted analysis and validation to refine annotations and ensure robust datasets for LLM post-training.
AgentThink (Agent Reasoning): Led a collaboration with Xiaomi and Tsinghua on tool-augmented reasoning for vision-language models in autonomous driving, achieving +54% answer accuracy on open-source models.
Adversarial LLM4TS: Developed a black-box attack framework and public benchmarks for LLM-based time-series forecasting, in collaboration with the Amazon Chronos and Nixtla teams.
Multi-Agent RL Exploration: Developed a multi-agent search strategy combining MADDPG with frontier-based exploration, and built evaluation benchmarks for exploration efficiency.
Awards
2024
McGill Engineering Doctoral Award (MEDA)
2021
TISED Doctoral Recruitment Award (DRA), McGill University
2019
Outstanding Graduate of Liaoning Province; Most Influential Graduate, Northeastern University
2017
National 1st Prize, China Undergraduate Mathematical Contest in Modeling
2017
1st Class Academic Scholarship, Northeastern University
Academic Service
Workshop Organizer
- CVPR 2026 Workshop on Video Generative Models: Benchmarks and Evaluation
- ICCV 2025 Workshop on Memory and Vision
- COLM 2025 Workshop on AI Agents: Capabilities and Safety
Conference Reviewer
- Advances in Neural Information Processing Systems (NeurIPS)
- International Conference on Learning Representations (ICLR)
- International Conference on Artificial Intelligence and Statistics (AISTATS)
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- International Conference on Computer Vision (ICCV)
- Conference on Language Modeling (COLM)
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Association for the Advancement of Artificial Intelligence (AAAI)
- IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
- IEEE International Conference on Robotics and Automation (ICRA)
- IEEE Intelligent Transportation Systems Conference (ITSC)
Journal Reviewer
- IEEE Robotics and Automation Letters (RA-L)
- Transportation Research Part C: Emerging Technologies (TRC)
- IEEE Transactions on Intelligent Transportation Systems (T-ITS)
I enjoy music by Tyler, the Creator, SZA and Chappell Roan.
Sometimes I also listen to Taylor Swift, Olivia Rodrigo and 9m88.
My favorite influencer is Allywoo on RedNote.
Cat: Bobo, a golden shaded British Shorthair who is good at programming with buttons.


