
amrgaberM/codelens-intelligence


CodeLens 🔍

Live Demo | Python 3.10+ | License: MIT

AI-powered codebase intelligence. Ask questions about any GitHub repo in plain English.


What is CodeLens?

Point CodeLens at any GitHub repository; once it is indexed, you can:

  • 💬 Ask questions in natural language
  • 🔍 Search code semantically (not just by keyword)
  • 📖 Explain functions with full context
  • 🔗 Understand dependencies between files

Quick Start

# 1. Clone & install
git clone [repo link]
cd codelens && pip install -r requirements.txt

# 2. Add your Groq API key
echo "GROQ_API_KEY=your_key" > .env

# 3. Run
streamlit run streamlit_app.py

Performance

Metric               Value
Retrieval latency    38 ms
Answer generation    1.9 s
Retrieval accuracy   85%
Chunks supported     2,000+

Tested on tiangolo/typer (605 files, 2,117 chunks)


How It Works

GitHub Repo → AST Parser → Chunker → Embeddings → Vector DB
                                                      ↓
    User Question → Hybrid Search (Dense + BM25) → Reranker → LLM → Answer
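The AST Parser → Chunker stage can be illustrated for Python files with the standard-library `ast` module. This is a minimal sketch, not the project's chunker: it assumes one chunk per top-level function or class (decorators included), so no chunk ever cuts a definition in half. `Chunk` and `chunk_python_source` are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit of code (hypothetical structure)."""
    name: str
    kind: str        # "function" or "class"
    start_line: int  # 1-based, includes decorators
    end_line: int    # 1-based, inclusive
    text: str


def chunk_python_source(source: str) -> list[Chunk]:
    """Split Python source into one chunk per top-level function or class.

    Module-level statements (imports, constants) are skipped in this sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            # node.lineno points at the def/class line; start at the first decorator instead.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            text = "\n".join(lines[start - 1 : node.end_lineno])
            chunks.append(Chunk(node.name, kind, start, node.end_lineno, text))
    return chunks


if __name__ == "__main__":
    src = (
        "import os\n\n@cache\ndef helper(x):\n    return x + 1\n\n"
        "class Greeter:\n    def greet(self, name):\n        return name\n"
    )
    for c in chunk_python_source(src):
        print(c.kind, c.name, c.start_line, c.end_line)
    # → function helper 3 5
    # → class Greeter 7 9
```

A multi-language version would swap `ast` for a tree-sitter grammar, which the roadmap lists as a planned item.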

Key techniques:

  • Hybrid retrieval: 70% semantic + 30% keyword search
  • AST-based chunking: Preserves function/class boundaries
  • Dependency expansion: Adds related files automatically
  • Reciprocal Rank Fusion: Combines multiple search strategies
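Hybrid retrieval and Reciprocal Rank Fusion can be combined by giving each ranker a weight inside RRF. The sketch below assumes that is how the 70/30 split is applied (the project might instead blend raw scores), and it uses k = 60, the constant commonly used for RRF. `weighted_rrf` and the chunk IDs are hypothetical.

```python
def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with weighted Reciprocal Rank Fusion.

    Every list adds weight / (k + rank) to each chunk it contains (rank starts
    at 1), so chunks ranked highly by several retrievers accumulate the most score.
    """
    scores: dict[str, float] = {}
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + w / (k + rank)
    # Highest fused score first; ties broken by chunk ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    dense = ["auth.py::login", "auth.py::logout", "db.py::connect"]
    bm25 = ["db.py::connect", "auth.py::login", "cli.py::main"]
    fused = weighted_rrf({"dense": dense, "bm25": bm25}, {"dense": 0.7, "bm25": 0.3})
    print([chunk_id for chunk_id, _ in fused])
    # → ['auth.py::login', 'db.py::connect', 'auth.py::logout', 'cli.py::main']
```

Because RRF uses only ranks, the dense and BM25 scores never need to be normalized onto a common scale, which is the main reason it is a popular fusion choice.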

Tech Stack

Component       Technology
LLM             Groq (Llama 3.3 70B)
Embeddings      MiniLM-L6-v2 (384-dim)
Vector store    ChromaDB
Sparse search   BM25
Backend         FastAPI
Frontend        Streamlit
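The sparse-search component can be illustrated with a pure-Python Okapi BM25 scorer. This is a sketch, not the project's implementation: the tokenizer (lowercase, split on non-alphanumerics so `process_data` becomes `process` and `data`), the defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant ln(1 + (N − n + 0.5) / (n + 0.5)) are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics; snake_case names split into parts."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class BM25:
    """Minimal Okapi BM25 index over a list of text chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]   # term frequencies per chunk
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.n = len(docs)
        self.avgdl = sum(self.lengths) / self.n if self.n else 0.0
        # Document frequency: number of chunks containing each term at least once.
        self.df = Counter(term for tf in self.tfs for term in tf)

    def idf(self, term: str) -> float:
        n_q = self.df.get(term, 0)
        return math.log(1 + (self.n - n_q + 0.5) / (n_q + 0.5))

    def score(self, query: str, i: int) -> float:
        """BM25 score of chunk i for the query."""
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: long chunks need more matches to score as high.
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str) -> list[int]:
        """Chunk indices sorted by descending BM25 score."""
        scores = [self.score(query, i) for i in range(self.n)]
        return sorted(range(self.n), key=lambda i: -scores[i])
```

Splitting identifiers into their parts is what lets a keyword query like "parse args" hit a function named `parse_args`, which pure embedding search can miss.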

Features

💬 Natural Language Q&A

Ask anything about the codebase:

  • "How does authentication work?"
  • "What does the process_data function do?"
  • "Where is error handling implemented?"
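Before a question like these reaches the LLM, the retrieved chunks have to be packed into a prompt. The sketch below shows one plausible format; it is not the project's prompt (the real prompts live under `src/generation/` and are not reproduced here), and the chunk dictionary keys `path`, `start`, `end`, and `text` are assumptions.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved code chunks.

    Each chunk dict is assumed to carry "path", "start", "end", and "text" keys.
    """
    # Label every chunk with its file and line range so the answer can cite them.
    context = "\n\n".join(
        f"# {c['path']} (lines {c['start']}-{c['end']})\n{c['text']}" for c in chunks
    )
    return (
        "Answer the question using only the code below. "
        "Cite file paths and line numbers.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Keeping file paths and line ranges in the context is what makes answers checkable against the repository.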

πŸ” Hybrid Search

Combines semantic search over embeddings with BM25 keyword matching, so a query can match code by meaning as well as by exact identifiers.
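The semantic half of hybrid search ranks chunks by cosine similarity between the query embedding and each chunk embedding. This sketch assumes the embeddings are already computed (by MiniLM-L6-v2 and stored in ChromaDB, per the tech stack) and uses plain lists with tiny 3-dimensional vectors instead of real 384-dimensional ones. `dense_rank` is a hypothetical helper, not ChromaDB's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def dense_rank(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], top_k: int = 5
) -> list[str]:
    """IDs of the top_k chunks most similar to the query vector."""
    scored = sorted(
        chunk_vecs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True
    )
    return [chunk_id for chunk_id, _ in scored[:top_k]]
```

A vector store does the same ranking with an index so it does not have to compare the query against every chunk.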

📊 Code Intelligence

  • Explain Function: Detailed breakdown of any function
  • Find Similar: Discover similar code patterns
  • Usage Analysis: Track where symbols are used
  • Auto-Documentation: Generate docs for any file
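Usage analysis can be approximated for Python code with the standard-library `ast` module by walking each file's syntax tree and recording where a name is called. This is an assumed approach, not the project's code: it matches calls by name only (`process_data(...)` or `utils.process_data(...)`), so two unrelated functions that share a name are not told apart. `find_call_sites` is hypothetical.

```python
import ast


def find_call_sites(files: dict[str, str], symbol: str) -> list[tuple[str, int]]:
    """Return sorted (path, line) pairs where `symbol` is called.

    `files` maps file paths to Python source. Matching is by name only.
    """
    sites = []
    for path, source in files.items():
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, ast.Call):
                continue
            if isinstance(node.func, ast.Name):
                name = node.func.id        # plain call: process_data(...)
            elif isinstance(node.func, ast.Attribute):
                name = node.func.attr      # attribute call: utils.process_data(...)
            else:
                continue                   # e.g. calls on subscripts or call results
            if name == symbol:
                sites.append((path, node.lineno))
    return sorted(sites)
```

Resolving each call to its actual definition would require following imports, which is where a dependency graph helps.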

Installation

Prerequisites

  • Python 3.10+
  • A Groq API key

Setup

# Clone
git clone https://github.com/amr-khalil/codelens.git
cd codelens

# Install
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Configure
echo "GROQ_API_KEY=your_key" > .env

# Run
streamlit run streamlit_app.py

Usage

Web UI

streamlit run streamlit_app.py

CLI

python cli.py ingest https://github.com/tiangolo/typer
python cli.py query "How do I create a CLI command?"
python cli.py chat  # Interactive mode

API

uvicorn src.api.main:app --reload

# Index
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{"repo_url": "https://github.com/tiangolo/typer"}'

# Query
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How does argument parsing work?"}'
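The same requests can be sent from Python with only the standard library. The endpoint paths and JSON fields (`repo_url`, `query`) come from the curl examples above; the base URL assumes uvicorn's default port 8000, and because the response format is not documented here, the sketch simply decodes whatever JSON comes back. `build_request` and `post_json` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed base URL: uvicorn's default port, matching the curl examples.
BASE_URL = "http://localhost:8000/api/v1"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an API endpoint such as "ingest" or "query"."""
    return urllib.request.Request(
        f"{BASE_URL}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> object:
    """Send the request and decode the JSON response (its shape is not specified here)."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_json("ingest", {"repo_url": "https://github.com/tiangolo/typer"})` followed by `post_json("query", {"query": "How does argument parsing work?"})` mirrors the two curl commands.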

Architecture

codelens/
├── src/
│   ├── api/            # FastAPI REST API
│   ├── ingestion/      # GitHub loader, AST parser
│   ├── chunking/       # AST & semantic chunkers
│   ├── embeddings/     # Embedding model
│   ├── retrieval/      # Vector store, BM25, hybrid search
│   ├── generation/     # LLM integration, prompts
│   └── utils/          # Config, logging, dependency graph
├── streamlit_app.py    # Web UI
├── cli.py              # CLI interface
└── api.py              # Standalone API

Roadmap

  • Hybrid retrieval (dense + sparse)
  • AST-based chunking
  • Dependency graph expansion
  • Multi-language AST (tree-sitter)
  • Streaming responses
  • Redis caching
  • Evaluation metrics (RAGAS)

Contributing

# Setup
pip install pytest black isort

# Test
pytest tests/ -v

# Format
black src/ && isort src/

License

MIT © 2025 Amr


Acknowledgments

Built with Groq, ChromaDB, HuggingFace, Streamlit

About

Production-grade RAG system that understands entire codebases. AST-aware chunking, hybrid retrieval, dependency graphs, and multi-file reasoning for GitHub repositories.
