Bolin Ding
Researcher & Engineer

Senior Director and Research Scientist
Data Analytics and Intelligence Lab (DAIL)
Tongyi Lab, Alibaba Group

My work focuses on making systems intelligent and efficient with machine learning and optimization techniques. My current research interests include programming frameworks for building LLM agents, systems and algorithms for tuning agent models, efficient data pipelines for training LLMs and agent models, and novel agent applications (e.g., for data analytics and social science).

Previously, I also worked on database systems, data privacy, data pricing, and federated learning.

I lead several projects including AgentScope (agent programming framework, runtime & infra of agent applications, agent memory, and agent tuning) and Data-Juicer (multimodal data processing for LLMs and agents), which have been widely adopted in production. Some earlier research initiatives include PilotScope (AI4DB middleware) and FederatedScope (federated learning platform).

I'm always interested in discussing research collaborations and speaking opportunities. We are hiring research scientists, engineers, and research interns for our lab!

Research

Research Interests

Making systems intelligent and efficient with machine learning and optimization techniques.

01. Agent Programming Frameworks & Infrastructure

Programming frameworks, runtime, and infrastructure for building LLM agent applications, including agent memory and multi-agent coordination (e.g., AgentScope).

02. Agent & LLM Tuning

Systems and algorithms for tuning agent models and large language models, including preference optimization and scaling laws for test-time compute.

03. Data Pipelines for LLMs & Agents

Efficient multimodal data processing pipelines for training LLMs and agent models at scale (e.g., Data-Juicer).

04. Agent Applications

Novel agent applications, particularly for data analytics and social science, including Text-to-SQL, AI4DB, and large-scale multi-agent simulation.

05. Earlier Work: Databases, Privacy & Federated Learning

Database systems, data privacy, data pricing, and federated learning (e.g., PilotScope, FederatedScope).

Projects

Active Projects

Major research initiatives and open-source platforms.

AgentScope

A flexible yet robust multi-agent platform that enables agent-oriented programming for building diverse LLM applications, including very large-scale multi-agent simulations.

arXiv 2024

Data-Juicer

A one-stop multimodal data processing system for large language models, providing 50+ core operators, data-model co-development, and cloud-scale adaptive processing.

SIGMOD 2024, NeurIPS 2025 (Spotlight), ICML 2025 (Spotlight)

LLM4Analytics

Tools for translating natural language to executable analytics actions, including Text-to-SQL benchmarks and unified data manipulation frameworks empowered by large language models.

VLDB 2024, MLSys 2024

AI4SocialScience

Interdisciplinary research combining economics, social science, and machine learning through agent-based simulation, auction design, and information design.

ICML 2024/2025, SODA 2023, NAACL 2025

PilotScope

An AI4DB middleware system that steers databases with machine learning drivers, enabling easy deployment of learned database components in real database systems.

VLDB 2024

FederatedScope

A comprehensive federated learning platform with packages for privacy-preserving learning, GNN federation, and LLM fine-tuning in federated settings.

VLDB 2023, KDD 2022 (Best Paper in ADS), KDD 2024

Publications

Selected Publications

A selection of recent research contributions.

2009

Query translation from XPath to SQL in the presence of recursive DTDs

Wenfei Fan, Jeffrey Xu Yu, Jianzhong Li, Bolin Ding, Lu Qin

VLDB Journal 2009

For Students

If you are interested in my research and projects and would like to join our lab as a research scientist, engineer, or research intern, please drop me a line.
I have worked with some amazing interns at Microsoft Research (2013–2017) and Alibaba (2018–present):

2025: Haoming Meng
2024: Yuntao Du, Pengfei He
2023: Derek Hu, Yin Lin, Mike Wang, Fan Wu
2022: Song Bian, Yang Guo, Yuzheng Hu
2021: Renzhi Wu
2020: Zitao Li, Wei Tang, Renzhi Wu
2019: Amrita Roy Chowdhury, Yuchao Tao, Zhuolun Xiang, Min Xu, Huaxiu Yao
2018: Yihan Gao, Jinglin Peng, Tianhao Wang
2017: Silu Huang, Zhuoyue Zhao
2016: Kolya Malkin, Dominik Moritz, Zhao Chang
2015: Silu Huang, Vasileios Verroios
2014: Fotis Psallidas, Saravanan Thirumuruganathan
2013: Fabian Hueske, Yanyan Shen, Mohan Yang

Awards

  • Best Paper Award, KDD 2022 (ADS track)
  • Technical Excellence Award, Microsoft Privacy, FY17
  • Yahoo!-DAIS Research Excellence Award Gold, 2012
  • Best Student Paper, ICDE 2007
  • Richard T. Cheng Fellowship, UIUC, 2007–2008
  • TopCoder Programming Competition, 1st Place, 2007
  • ACM-ICPC World Finals, Honorable Mention, 2005
  • ACM-ICPC Asia Regional, Gold Medal (3rd place), 2004

Service

Program Committee

  • SIGMOD 2020–2022
  • PVLDB 2017–2026
  • ICDE 2019–2023
  • KDD 2017–2026 (Area Chair 2023–2026)
  • ICML 2021–2026 (Area Chair 2023–2026)
  • ICLR 2021–2026 (Area Chair 2024–2026)
  • NeurIPS 2020–2025 (Area Chair 2021–2025)
  • CCS 2022–2024
  • CIKM 2024 (Senior PC)
  • WINE 2022 (Senior PC)
  • NSF Panelist 2016

Journal Reviewing

  • ACM Transactions on Database Systems (TODS)
  • IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • ACM Transactions on Knowledge Discovery from Data (TKDD)