Git Lineage – Query System

AI-powered understanding and querying of codebases, built with Amazon Bedrock, AWS Lambda, and SQLite.

Inspiration

As developers, we often spend countless hours digging through codebases — trying to understand who wrote what, where a function originated, or how a module evolved. Traditional tools like grep, GitHub search, or even AI code assistants often fail to capture context or lineage.

We wanted to solve that.

Git Lineage was born from a simple idea: “What if an AI agent could understand, not just search, our repositories?”

What it does

Git Lineage transforms any GitHub repository into an AI-queryable knowledge base. Instead of just searching code by keyword, it allows users to ask natural-language questions like:

  • “Where is the SceneManager class defined?”
  • “How does process_scene() work?”
  • “Which commit introduced render_frame?”

The system:

  • Indexes all classes and functions using AST and Tree-sitter parsers.
  • Generates embeddings via Amazon Titan for semantic similarity.
  • Stores them in FAISS for ultra-fast retrieval.
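The indexing step can be sketched with Python's standard `ast` module and SQLite. This is a minimal illustration, not the project's actual code: the real pipeline also uses tree-sitter for non-Python languages, and the table schema here is assumed.

```python
import ast
import sqlite3

# Example source standing in for one repository file (symbol names taken
# from the sample queries above; the real indexer walks files via GitPython).
SOURCE = '''
class SceneManager:
    def render_frame(self):
        pass

def process_scene():
    pass
'''

def index_source(source: str, path: str, conn: sqlite3.Connection) -> int:
    """Parse one file, store every class/function symbol, return the count."""
    tree = ast.parse(source, filename=path)
    rows = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            rows.append((path, node.name, kind, node.lineno))
    conn.executemany(
        "INSERT INTO symbols (file, name, kind, line) VALUES (?, ?, ?, ?)", rows
    )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (file TEXT, name TEXT, kind TEXT, line INTEGER)")
count = index_source(SOURCE, "scene.py", conn)
```

`ast.walk` visits nested definitions too, so methods like `render_frame` are indexed alongside top-level classes and functions.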

When a query comes in, the system performs a semantic search → fetches context → calls Bedrock (Claude 3.7 Sonnet) to summarize or explain. The result is a natural, conversation-like experience with your codebase — no manual searching, no reading through 20 files.
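The embed-and-retrieve step looks roughly like this. The Titan request shape follows the public Bedrock API, but the model ID is an assumption for this project; similarity scoring is shown with plain cosine similarity, which is what a FAISS inner-product index computes over normalized vectors.

```python
import json
import math

def embed_with_titan(client, text: str) -> list[float]:
    """Embed text with Amazon Titan via a boto3 'bedrock-runtime' client.
    (Model ID assumed; use whichever Titan version the account has enabled.)"""
    resp = client.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Return the k symbol names most similar to the query vector.
    `index` holds (symbol_name, embedding) pairs, as the vector store would."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

In production the `top_k` step is replaced by a FAISS index lookup; the ranking semantics are the same.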

How we built it

The system is built as an end-to-end pipeline using AWS services and local intelligence layers:

  1. Repository Indexing
     • Uses GitPython, AST, and tree-sitter to extract all classes and functions.
     • Stores metadata — file paths, symbols, commits — in a lightweight SQLite database.

  2. Vector Store
     • Embeds symbol names and snippets using Amazon Titan Embeddings from Bedrock.
     • Stores and indexes all embeddings in FAISS for semantic retrieval.

  3. LLM Query Agent
     • A lightweight AWS Lambda function receives user queries.
     • Performs a FAISS vector search → fetches the relevant code snippets → calls Claude 3.7 Sonnet via Bedrock.
     • Returns a natural-language summary or explanation.
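The query-agent stage above can be sketched as a single Lambda handler. The Claude model ID and the event shape are assumptions, and the `bedrock` parameter is injectable so the handler can be exercised without AWS credentials; the real function also runs the FAISS search before this point.

```python
import json

# Assumed Bedrock model ID for Claude 3.7 Sonnet; verify against the account.
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"

def build_prompt(question: str, snippets: list[str]) -> str:
    """Combine the retrieved snippets and the user question into one prompt."""
    context = "\n\n".join(snippets)
    return (
        "Using only the code context below, answer the question.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def handler(event, context=None, bedrock=None):
    """AWS Lambda entry point (event shape is illustrative)."""
    if bedrock is None:  # in production, a real boto3 client
        import boto3
        bedrock = boto3.client("bedrock-runtime")
    prompt = build_prompt(event["question"], event.get("snippets", []))
    resp = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = resp["output"]["message"]["content"][0]["text"]
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```

The `converse` call is the standard boto3 Bedrock Runtime chat API; its response nests the generated text under `output.message.content`.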

Responses appear instantly, showing code understanding in action.

Challenges we ran into

  • Managing token efficiency when sending contextual code to the LLM.
  • Fine-tuning FAISS similarity thresholds for relevant snippet retrieval.
  • Balancing speed and accuracy between local indexing and cloud inference.
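For the token-efficiency problem, one simple approach is a greedy budget over the ranked snippets. This is a sketch under stated assumptions: the ~4 characters-per-token ratio is a rough heuristic, and the default budget value is illustrative.

```python
def fit_to_budget(snippets: list[str], max_tokens: int = 4000) -> list[str]:
    """Keep the best-ranked snippets that fit an approximate token budget.

    `snippets` is assumed to arrive sorted by similarity, best first.
    """
    budget = max_tokens * 4  # rough heuristic: ~4 characters per token
    kept, used = [], 0
    for snippet in snippets:
        if used + len(snippet) > budget:
            continue  # snippet too big for the remainder; try smaller ones
        kept.append(snippet)
        used += len(snippet)
    return kept
```

Skipping oversized snippets instead of stopping outright lets smaller, lower-ranked snippets still fill the remaining context window.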

Accomplishments that we're proud of

  • Built a fully autonomous pipeline that turns raw code into an AI-searchable structure.
  • Successfully integrated Amazon Bedrock + FAISS for hybrid semantic reasoning.
  • Achieved query responses in under 3 seconds on AWS Lambda.
  • Made LLMs understand repository evolution (commits, PRs, and structure) — not just snippets.
  • Created a scalable and reproducible workflow that can analyze any public GitHub repo on demand.

What we learned

  • How to combine retrieval-augmented generation (RAG) with code intelligence.
  • The challenge of keeping embeddings up to date with Git commits.
  • How Amazon Bedrock simplifies integration of multiple LLMs within serverless pipelines.
  • The importance of efficient data storage — SQLite was ideal for ephemeral Lambda storage.

What's next for Git Lineage

  1. Graph Database Integration (Amazon Neptune)
     • We initially planned to integrate Amazon Neptune for a graph-based lineage view — connecting functions, classes, commits, and authors. Due to budget limits, this feature hasn't been deployed yet.

  2. Multi-Repository Reasoning
     • Expand to support dependency tracing across interconnected projects (e.g., microservices).

  3. Temporal Lineage Visualization
     • Introduce time-based visual graphs showing how code evolves per commit or release.

  4. VS Code Integration
     • Bring Git Lineage directly into the developer workflow for instant in-editor explanations.

  5. Knowledge Graph Compression & Summarization
     • Use dynamic context distillation to keep large codebases queryable under strict token limits.

Built With

  • Amazon Bedrock (Titan Embeddings, Claude 3.7 Sonnet)
  • AWS Lambda
  • FAISS
  • SQLite
  • GitPython, AST, and tree-sitter
