Anas Dorbani

PhD Student in Computer Engineering

I am a PhD student at Polytechnique Montreal specializing in the intersection of AI and data systems. My research focuses on multimodal data integration, tabular understanding, and enhancing database systems with large language models. I am passionate about building the next generation of intelligent data systems.

Data & AI Systems, Multimodal Data Integration, Tabular Understanding

Research

Published Papers

Factorized and Vectorized Execution: Optimizing Analytical and Semantic Queries over Relations

Sunny Yasser, Anas Dorbani, Amine Mhedhbi

ACM SIGMOD 2026

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

Anas Dorbani, Sunny Yasser, Jimmy Lin, Amine Mhedhbi

Very Large Data Base Endowment (VLDB) - Demonstration Track 2025

Academic Journey

Education

PhD in Computer Engineering

Polytechnique Montreal & Mila

Montreal, Canada

2025 - Present

Focus

Researching multimodal data integration and tabular understanding, with a focus on large language models and database systems.

B.Sc. in Computer Science

National School of Computer Science and Systems Analysis

Rabat, Morocco

2019 - 2024

Thesis

Developed an automated approach to schema generation and data processing for financial compliance systems using advanced language models, improving metadata consistency and enhancing overall data integration and interpretability.

Industry Collaboration

Oracle Labs

Experience

Work Experience

Oracle Labs

Casablanca, Morocco

February 2024 - July 2024

Research Assistant

Data Integration Team

Automated schema generation for Oracle's Financial Crimes & Compliance systems to improve data processing. Fine-tuned 7B-parameter language models to optimize schema generation and handle abbreviated column names. Created a framework to evaluate schema generation and data integration accuracy. Improved metadata consistency from 0.4 to 0.6, boosting data interpretability. Optimized parsing of the 7B models' output for better data flow and results.

Oracle Labs

Casablanca, Morocco

June 2023 - August 2023

Research Assistant

AutoMLx Team

Enhanced machine learning explainability for the AutoMLx project by optimizing LFI/GFI explainers, reducing their processing time by 80% and improving inference speed. Reduced memory usage from 20 GB to 4 GB, lowering operational costs for explanation services. Achieved 83% code coverage to ensure reliability and maintainability of explainability features. Collaborated with cross-functional teams to deliver scalable, high-performance ML explainability solutions within AutoMLx.

National University of Rabat

Rabat, Morocco

July 2022 - August 2022

Research Assistant

Valuation and Transfer Management

Engineered a deep learning model to predict RFID pricing from scraped specifications and market data. Deployed the solution on GCP using Docker for scalable performance and built a Django web application to streamline data collection and real-time model testing.

Teaching

TAships

INF3710: Files and Databases

Polytechnique Montreal

Introduction to files and databases: needs analysis via the entity-relationship model; relational model and relational algebra; SQL DDL/DML and embedded SQL; concurrency control and transaction management; relational schema design (functional dependencies and normal forms); storage models and file structures; indexing and hashing.

Fall 2025 - TA for Prof. Amine Mhedhbi
Winter 2026 - TA for Dr. Franjieh El Khoury

Recognition

Awards & Grants

VLDB Travel Grant

Grant
2025

Very Large Data Base Endowment Inc.

Funding support for students, researchers, and faculty to attend the VLDB 2025 conference in London, covering travel, lodging, and registration to promote participation in database research.

Portfolio

Projects

FFX

August 2025 - Present

Fast Factorized eXecution engine for join-heavy analytical and semantic queries. Built in C++ with factorized intermediates and vectorized execution to optimize performance for modern data workloads.

Flock

September 2024 - Present

DBMS extension integrating LLMs and RAG into OLAP systems. Developed FlockMTL from infrastructure design to code implementation and optimization. Designed custom map and reduce functions to integrate advanced workflows into relational database systems. Implemented dynamic batching over tuples to improve query execution efficiency.

OpenHands

March 2024 - August 2024

Platform for software development agents. As a core maintainer, I helped reimplement the SWE agent and fix its benchmark to improve performance and reliability. I also assisted with issue resolution and reviewed pull requests to maintain project quality.

SecureStream

February 2024

A network security project that employs machine learning and real-time traffic monitoring to detect anomalies in network data. Powered by the CSE-CIC-IDS2018 dataset and cicflowmeter, it enables swift identification of potential threats, enhancing overall network security.