Git Lineage – Query System
AI-powered understanding and querying of codebases, built with Amazon Bedrock, AWS Lambda, and SQLite.
Inspiration
As developers, we often spend countless hours digging through codebases — trying to understand who wrote what, where a function originated, or how a module evolved. Traditional tools like grep, GitHub search, or even AI code assistants often fail to capture context or lineage.
We wanted to solve that.
Git Lineage was born from a simple idea: “What if an AI agent could understand, not just search, our repositories?”
What it does
Git Lineage transforms any GitHub repository into an AI-queryable knowledge base. Instead of just searching code by keyword, it allows users to ask natural-language questions like:
- “Where is the SceneManager class defined?”
- “How does process_scene() work?”
- “Which commit introduced render_frame?”
Under the hood, the system:
- Indexes all classes and functions using AST and Tree-sitter parsers.
- Generates embeddings via Amazon Titan for semantic similarity.
- Stores them in FAISS for ultra-fast retrieval.
When a query comes in, it performs a semantic search → fetches context → calls Bedrock (Claude 3.7 Sonnet) to summarize or explain. This results in a natural conversation-like experience with your codebase — no manual searching, no reading through 20 files.
How we built it
The system is built as an end-to-end pipeline using AWS services and local intelligence layers:
- Repository Indexing
  - Uses GitPython, AST, and tree-sitter to extract all classes and functions.
  - Stores metadata (file paths, symbols, commits) in a lightweight SQLite database.
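The indexing step can be sketched with the stdlib `ast` module (tree-sitter covers non-Python languages in the real pipeline). This is a minimal illustration, not the project's actual code; `extract_symbols` and the table schema are assumed names:

```python
import ast
import sqlite3

SAMPLE = """
class SceneManager:
    def process_scene(self):
        pass

def render_frame():
    pass
"""

def extract_symbols(source, path):
    """Walk the AST and record every class and function definition."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            symbols.append((node.name, kind, path, node.lineno))
    return symbols

conn = sqlite3.connect(":memory:")  # on Lambda the DB file would live in /tmp
conn.execute("CREATE TABLE symbols (name TEXT, kind TEXT, path TEXT, line INTEGER)")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)",
                 extract_symbols(SAMPLE, "scene.py"))
row = conn.execute(
    "SELECT kind, line FROM symbols WHERE name = 'SceneManager'").fetchone()
print(row)  # ('class', 2)
```

Each symbol row can later be joined against commit metadata pulled via GitPython.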
- Vector Store
  - Embeds symbol names and code snippets using Amazon Titan Embeddings via Bedrock.
  - Stores and indexes all embeddings in FAISS for semantic retrieval.
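Conceptually, the FAISS lookup over normalized Titan embeddings is a cosine-similarity search. A dependency-free sketch with toy, hand-picked vectors (standing in for real Titan embeddings) shows the idea; in production this would be a `faiss.IndexFlatIP` over the real vectors:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy 4-d vectors as stand-ins for Titan embeddings (hypothetical values).
corpus = {
    "SceneManager.process_scene": [0.9, 0.1, 0.0, 0.1],
    "render_frame":               [0.1, 0.9, 0.2, 0.0],
    "load_assets":                [0.0, 0.2, 0.9, 0.1],
}
# Normalizing first makes inner product equal cosine similarity,
# which is what IndexFlatIP computes over L2-normalized vectors.
index = {name: normalize(vec) for name, vec in corpus.items()}

def search(query, k=1):
    q = normalize(query)
    ranked = sorted(index.items(), key=lambda kv: dot(kv[1], q), reverse=True)
    return [name for name, _ in ranked[:k]]

print(search([0.85, 0.15, 0.05, 0.1]))  # ['SceneManager.process_scene']
```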
- LLM Query Agent
  - A lightweight AWS Lambda function receives user queries.
  - Performs a FAISS vector search → fetches the relevant code snippets → calls Claude 3.7 Sonnet via Bedrock.
  - Returns a natural-language summary or explanation, so responses appear instantly and show code understanding in action.
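A minimal sketch of such a Lambda handler, with the Bedrock client and FAISS lookup injected so it can be exercised offline. The model ID and prompt wording are assumptions to verify against your region's Bedrock model list:

```python
import io
import json

# Hypothetical model ID -- confirm against the Bedrock console for your region.
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"

def build_prompt(question, snippets):
    """Pack the retrieved snippets into a single grounded prompt."""
    context = "\n\n".join(snippets)
    return f"Answer using only this code context:\n{context}\n\nQuestion: {question}"

def handler(event, context=None, bedrock=None, retrieve=None):
    """Lambda entry point: vector search, then one Bedrock call.
    `bedrock` and `retrieve` are injected for testability; in production,
    bedrock = boto3.client("bedrock-runtime") and retrieve is the FAISS top-k
    lookup."""
    question = event["question"]
    snippets = retrieve(question)
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user",
                      "content": build_prompt(question, snippets)}],
    }
    resp = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    answer = json.loads(resp["body"].read())["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}

# Offline demo with a stubbed Bedrock client (no AWS calls made):
class _StubBedrock:
    def invoke_model(self, modelId, body):
        payload = json.dumps(
            {"content": [{"text": "SceneManager is defined in scene.py"}]})
        return {"body": io.BytesIO(payload.encode())}

out = handler({"question": "Where is SceneManager defined?"},
              bedrock=_StubBedrock(),
              retrieve=lambda q: ["class SceneManager: ..."])
print(json.loads(out["body"])["answer"])
```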
Challenges we ran into
- Managing token efficiency when sending contextual code to the LLM.
- Tuning FAISS similarity thresholds to retrieve genuinely relevant snippets.
- Balancing speed and accuracy between local indexing and cloud inference.
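For the token-efficiency problem, one common approach is greedy packing of ranked snippets under a rough character-based token estimate. This is an illustrative heuristic (the chars/4 ratio is an approximation, not a real tokenizer), not necessarily the project's exact method:

```python
def fit_context(snippets, budget_tokens=2000, chars_per_token=4):
    """Greedily keep the highest-ranked snippets until the rough token
    budget is spent. `snippets` is assumed already sorted by FAISS score."""
    kept, used = [], 0
    for snip in snippets:
        cost = len(snip) // chars_per_token + 1  # crude token estimate
        if used + cost > budget_tokens:
            break
        kept.append(snip)
        used += cost
    return kept

# Two small snippets fit a 300-token budget; a huge third one is dropped.
print(len(fit_context(["a" * 400, "b" * 400, "c" * 40000],
                      budget_tokens=300)))  # 2
```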
Accomplishments that we're proud of
- Built a fully autonomous pipeline that turns raw code into an AI-searchable structure.
- Successfully integrated Amazon Bedrock + FAISS for hybrid semantic reasoning.
- Achieved query responses in under 3 seconds on AWS Lambda.
- Made LLMs understand repository evolution (commits, PRs, and structure) — not just snippets.
- Created a scalable and reproducible workflow that can analyze any public GitHub repo on demand.
What we learned
- How to combine retrieval-augmented generation (RAG) with code intelligence.
- The challenge of keeping embeddings up to date as new Git commits land.
- How Amazon Bedrock simplifies integrating multiple LLMs into serverless pipelines.
- The importance of efficient data storage: SQLite was ideal for ephemeral Lambda storage.
What's next for Git Lineage
- Graph Database Integration (Amazon Neptune)
We initially planned to integrate Amazon Neptune for a graph-based lineage view connecting functions, classes, commits, and authors. Due to budget limits, this feature hasn't been deployed yet.
Multi-Repository Reasoning
Expand to support dependency tracing across interconnected projects (e.g., microservices).
Temporal Lineage Visualization
Introduce time-based visual graphs showing how code evolves per commit or release.
VS Code Integration
Bring Git Lineage directly into the developer workflow for instant in-editor explanations.
Knowledge Graph Compression & Summarization
Use dynamic context distillation to keep large codebases queryable under strict token limits.
Built With
- amazon-web-services
- ast
- bedrock
- faiss
- github
- lambda
- python
- sqlite
- treesitter