2025 AI Algorithm Internship: Multi-table Join Reasoning for Text-to-SQL under Large-Scale Schemas
Research Focus: Multi-table Join Reasoning for Text-to-SQL in Large-Scale Schemas
This repository documents my work as an AI Algorithm Intern (Dec 2024 - Present), researching Text-to-SQL over large-scale database schemas.
- Problem: LLM accuracy drops from 86% to 5% once a database exceeds 100 tables (the "Scale Wall" problem)
- Focus: Multi-table join reasoning, Schema Linking optimization
- Methods: Graph algorithms, semantic retrieval enhancement, query rewriting
LLM-Research-Internship-2025/
├── README.md # This file
├── 实习日志_第一周_12月2日至12月11日.md # Weekly log (Chinese)
└── Internship_Log_Week1_Dec2_to_Dec11.md # Weekly log (English)
| Paper | Venue | Core Contribution |
|---|---|---|
| SteinerSQL | arXiv 2509.19623 | Formulates Schema Linking as a Steiner Tree problem; reports a SOTA score of 40.04% |
| LinkAlign | arXiv 2503.18596 | Multi-round semantic enhanced retrieval |
| UNJOIN | arXiv 2505.18122 | Schema simplification via virtual wide table |
| SchemaGraphSQL | arXiv 2505.18363 | Pathfinding graph algorithms for Schema Linking |
| CHESS | arXiv 2405.16755 | Multi-agent framework for Text-to-SQL |
| Multi-hop Reasoning | arXiv 2405.09593 | LLM-based multi-hop reasoning |
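The graph-based entries in the table share one idea: treat tables as nodes, foreign keys as edges, and find a small connected subgraph that covers every table the question needs. The sketch below illustrates that idea with a greedy shortest-path heuristic. It is not the algorithm from SteinerSQL or SchemaGraphSQL, and the schema in `FK_EDGES` plus the helper names `build_graph`, `shortest_path`, and `connect_tables` are hypothetical.

```python
from collections import deque

# Hypothetical foreign-key graph: each pair is two tables that can be joined.
FK_EDGES = [
    ("orders", "customers"),
    ("orders", "order_items"),
    ("order_items", "products"),
    ("products", "suppliers"),
    ("customers", "regions"),
]


def build_graph(edges):
    """Build an undirected adjacency map from table pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of tables on a shortest path, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def connect_tables(graph, required):
    """Greedy Steiner-tree heuristic: start from one required table and
    repeatedly attach the closest remaining required table via a shortest path."""
    required = list(dict.fromkeys(required))  # de-duplicate, keep order
    tree = {required[0]}
    remaining = set(required[1:])
    while remaining:
        best = None
        for target in sorted(remaining):
            for source in sorted(tree):
                path = shortest_path(graph, source, target)
                if path is not None and (best is None or len(path) < len(best)):
                    best = path
        if best is None:
            raise ValueError("required tables are not connected by foreign keys")
        tree.update(best)
        remaining -= tree
    return tree


graph = build_graph(FK_EDGES)
join_tables = connect_tables(graph, ["customers", "products"])
print(sorted(join_tables))
```

For a question that mentions only `customers` and `products`, the heuristic also pulls in the bridge tables `orders` and `order_items`, which are the intermediate joins an LLM tends to miss on large schemas.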
- LLM: Qwen3-14B-awq (Local Deployment)
- Embedding: BGE-large-en-v1.5
- Framework: LlamaIndex, sentence-transformers
- Database: MySQL (344 tables, 3353 columns)
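With 344 tables, the schema cannot all go into one prompt, so a retrieval step has to shortlist candidate tables first. The sketch below shows only the ranking logic: score each table description against the question by cosine similarity and keep the top `k`. To stay runnable without downloading a model, it uses a bag-of-words stand-in (`toy_embed`) rather than BGE-large-en-v1.5; in the actual stack the vectors would presumably come from the embedding model. The table descriptions and all function names are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedder: bag-of-words token counts (NOT a real BGE embedding)."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tables(question, table_docs, k=2, embed=toy_embed):
    """Rank table descriptions by similarity to the question; return the top k names."""
    q = embed(question)
    scored = [(cosine(q, embed(doc)), name) for name, doc in table_docs.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, name breaks ties
    return [name for _, name in scored[:k]]


# Hypothetical table descriptions (table name followed by its column words).
TABLE_DOCS = {
    "orders": "orders order id customer id order date total amount",
    "customers": "customers customer id name region",
    "products": "products product id name price",
}

shortlist = top_k_tables("total amount of orders per customer", TABLE_DOCS, k=2)
print(shortlist)
```

Passing `embed` as a parameter keeps the ranking code independent of the embedding backend, so a real model can be swapped in without touching `top_k_tables`.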
- 📖 Paper survey on Text-to-SQL multi-table join reasoning
- 🎤 Team presentation on research findings
- 🔧 LinkAlign implementation and optimization
- ✅ Query expansion for Chinese-English mixed retrieval
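One simple form of query expansion for Chinese-English mixed retrieval is glossary-based: when a question contains a Chinese business term, append its English equivalents so an English embedding model has matching vocabulary to work with. The sketch below assumes that approach; it is not necessarily how the expansion in this project was implemented. `GLOSSARY` and `expand_query` are hypothetical.

```python
# Hypothetical glossary mapping Chinese business terms to English schema vocabulary.
GLOSSARY = {
    "订单": ["order", "orders"],
    "客户": ["customer", "client"],
    "金额": ["amount", "total"],
}


def expand_query(question, glossary=GLOSSARY):
    """Append English equivalents of every Chinese glossary term found in the question."""
    extra = []
    for term, translations in glossary.items():
        if term in question:
            for word in translations:
                if word not in extra:  # avoid repeating an expansion word
                    extra.append(word)
    if not extra:
        return question
    return question + " " + " ".join(extra)


expanded = expand_query("查询每个客户的订单")
print(expanded)
```

The original question is kept intact and the English terms are only appended, so a retriever that handles Chinese well loses nothing, while an English-only one gains usable keywords.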
"Academia and industry are different—papers look elegant, but the real challenges begin when you try to implement them."
- Graph theory concepts (learned at university) are directly applicable in AI algorithm implementation
- AI assistance makes paper reading and debugging much more accessible
- The gap between academic benchmarks and real-world business scenarios requires creative engineering solutions
Feel free to reach out for discussions on Text-to-SQL research or AI internship experiences!
Last Updated: December 2024