GitHub Code Reviewer

🌟 Inspiration

The inspiration for Code Archaeologist came from a common frustration every developer faces: onboarding to a new codebase is painful. Whether joining a new team, contributing to open source, or reviewing a colleague's project, understanding unfamiliar code can take days or even weeks.

We've all experienced it:

- Spending hours tracing through files to understand architecture
- Struggling to identify where to start making changes
- Missing critical technical debt hiding in legacy code
- Wishing for a "guided tour" of the codebase

We thought: What if AI could be your expert guide? What if, instead of manually exploring thousands of lines of code, you could point an AI at any GitHub repository and instantly get:

- A complete architectural overview
- Documentation explaining how everything works
- Identified technical debt and code quality issues
- A personalized onboarding guide for new developers

That's when Code Archaeologist was born, with one goal: to transform the intimidating task of understanding unfamiliar code into an exciting exploration powered by Google's Gemini AI.

🔍 What it does

Code Archaeologist is an AI-powered codebase analysis tool that transforms any public GitHub repository into comprehensive, human-readable documentation in seconds.

Core Features:

🏛️ Intelligent Code Analysis
- Analyzes repository architecture and design patterns
- Identifies key modules and their relationships
- Maps out the entire codebase structure
📚 Auto-Generated Documentation
- System Architecture: Deep dive into how components interact
- Module Overview: Breakdown of each major component
- Technical Debt Analysis: Identifies code quality issues, missing tests, and areas needing improvement
- Developer Onboarding Guide: Step-by-step setup and contribution instructions
🌳 Interactive Visualizations
- File Tree Explorer: Navigate the entire project structure with expandable folders
- Dependency Graph: Visual map of import relationships between modules
- Click-and-explore interface showing which files depend on each other
🤖 AI Code Detection
- Estimates what percentage of the code appears AI-generated versus human-written
- Identifies common AI coding patterns (verbose comments, generic naming, boilerplate code)
- Provides confidence levels and specific indicators found
- Helps teams understand their AI-assisted development patterns
💬 Interactive AI Chat
- Ask questions about any aspect of the codebase
- Get instant answers powered by Gemini AI
- Contextual responses based on the actual repository code
📄 Professional PDF Reports
- One-click export of complete analysis
- Beautifully formatted documentation ready to share
- Perfect for team onboarding or code reviews
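The File Tree Explorer needs nested data that it can expand and collapse. The write-up does not show how that data is produced, so here is a minimal sketch that assumes the backend already has a flat list of repository-relative paths. The function name `build_file_tree` and the `{"name", "type", "children"}` node shape are hypothetical.

```python
from typing import Any


def build_file_tree(paths: list[str]) -> dict[str, Any]:
    """Turn flat repo-relative paths into a nested folder/file tree.

    Hypothetical node shape: {"name": str, "type": "folder" | "file", "children": [...]}
    (files carry no "children" key).
    """
    root: dict[str, Any] = {"name": "/", "type": "folder", "children": []}
    for path in sorted(paths):
        node = root
        parts = path.strip("/").split("/")
        for i, part in enumerate(parts):
            is_file = i == len(parts) - 1
            # Reuse an existing child with the same name, otherwise create it.
            child = next((c for c in node["children"] if c["name"] == part), None)
            if child is None:
                child = {"name": part, "type": "file" if is_file else "folder"}
                if not is_file:
                    child["children"] = []
                node["children"].append(child)
            node = child
    return root


tree = build_file_tree(["src/main.py", "src/utils/helpers.py", "README.md"])
```

Sorting the paths first keeps sibling order stable between runs, so the rendered tree does not reshuffle when the same repository is analyzed again.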

🛠️ How we built it

Code Archaeologist combines cutting-edge AI with modern web technologies:

Frontend Stack:

- React 18 - Modern UI framework with hooks
- Vite - Lightning-fast build tool and dev server
- Tailwind CSS v4 - Utility-first styling with custom design system
- Framer Motion - Smooth animations and transitions
- React Markdown - Rich text rendering for AI responses

Backend Stack:

- FastAPI (Python) - High-performance async API framework
- Google Gemini AI (gemini-3-pro-preview) - Advanced code analysis and natural language understanding
- GitPython - Repository cloning and management
- Uvicorn - Lightning-fast ASGI server

AI Integration:

The heart of Code Archaeologist is its sophisticated prompting system:

1. Repository Ingestion: Clone GitHub repos and extract their code files
2. Context Building: Concatenate code with file paths and structure
3. Intelligent Prompting: Send structured prompts to Gemini for:
- Architecture analysis
- Module identification
- Technical debt detection
- Onboarding guide generation
- AI code pattern recognition
4. Multi-Modal Analysis:
- Text analysis for code understanding
- Dependency graph construction from import statements
- File tree generation from repository structure
- Pattern matching for AI-generated code detection
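The dependency-graph step can be sketched as follows. The sketch assumes imports have already been extracted and resolved to repository paths, with external packages dropped. Besides the forward edges, it keeps a reverse index of which files import a given file, which is what a "which files depend on this one" view needs. `build_dependency_graph` and the returned keys are illustrative names, not the project's API.

```python
from collections import defaultdict


def build_dependency_graph(imports_by_file: dict[str, list[str]]) -> dict:
    """Build nodes, edges, and a reverse "dependents" index from resolved imports.

    `imports_by_file` maps each repo file to the repo files it imports
    (external packages are assumed to have been filtered out already).
    """
    nodes = set(imports_by_file)
    edges: list[tuple[str, str]] = []
    dependents: dict[str, set[str]] = defaultdict(set)
    for source, targets in imports_by_file.items():
        # dict.fromkeys drops duplicate imports while keeping their order.
        for target in dict.fromkeys(targets):
            nodes.add(target)
            edges.append((source, target))
            # Reverse index: which files import `target`.
            dependents[target].add(source)
    return {
        "nodes": sorted(nodes),
        "edges": edges,
        "dependents": {k: sorted(v) for k, v in dependents.items()},
    }


graph = build_dependency_graph({
    "src/app.js": ["src/api.js", "src/utils.js"],
    "src/api.js": ["src/utils.js"],
})
print(graph["dependents"]["src/utils.js"])  # ['src/api.js', 'src/app.js']
```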

Key Technical Achievements:

Smart Code Parsing:

```python
# Extract dependencies from Python, JavaScript, and TypeScript files.
# Supports: import, from ... import, require(), and more.
def extract_imports_from_content(content, file_type, file_path):
    ...  # implementation not shown in this write-up
```
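The body of `extract_imports_from_content` is not shown above. Here is a minimal regex-based sketch of what the raw extraction step might look like. The name `extract_imports_sketch`, the `file_type` strings, and the regular expressions are all assumptions. The sketch drops the `file_path` argument because it only returns raw module specifiers and leaves resolving relative paths to a later step.

```python
import re

# Hypothetical patterns. Multi-line or dynamic imports and some aliasing
# forms are not fully covered.
PY_IMPORT = re.compile(r"^\s*import\s+([\w.]+(?:\s*,\s*[\w.]+)*)", re.MULTILINE)
PY_FROM = re.compile(r"^\s*from\s+([\w.]+)\s+import\b", re.MULTILINE)
JS_IMPORT = re.compile(
    r"""^\s*import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]""", re.MULTILINE
)
JS_REQUIRE = re.compile(r"""\brequire\(\s*['"]([^'"]+)['"]\s*\)""")


def extract_imports_sketch(content: str, file_type: str) -> list[str]:
    """Return the raw module specifiers a source file imports (illustrative only)."""
    found: list[str] = []
    if file_type == "python":
        for match in PY_IMPORT.finditer(content):
            # `import os, sys` lists several modules in one statement.
            found.extend(name.strip() for name in match.group(1).split(","))
        found.extend(PY_FROM.findall(content))  # keeps leading dots, e.g. ".utils"
    elif file_type in ("javascript", "typescript"):
        found.extend(JS_IMPORT.findall(content))  # import X from 'Y', import 'Y'
        found.extend(JS_REQUIRE.findall(content))  # require('Y')
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))
```

Regexes are a pragmatic shortcut that miss some edge cases. For Python files, walking the tree produced by the standard-library `ast` module would give exact results.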

AI Detection Algorithm:

- Analyzes 10+ indicators of AI-generated code
- Provides percentage breakdowns and confidence levels
- Identifies specific patterns with examples

Real-time Chat:

- Maintains conversation history
- Contextual responses based on repository content
- Streaming responses for better UX
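The sketch below shows one way to keep conversation history while staying inside the model's context window. The class name `ChatSession`, the `max_turns` cap, and the prompt wording are assumptions. The Gemini call itself and response streaming are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """Hypothetical chat state: cached repository text plus a bounded history."""

    repo_context: str
    max_turns: int = 10  # assumed cap to keep prompts inside the context window
    history: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, role: str, text: str) -> None:
        self.history.append((role, text))
        # Keep only the most recent messages (one turn = user + assistant message).
        overflow = len(self.history) - 2 * self.max_turns
        if overflow > 0:
            del self.history[:overflow]

    def build_prompt(self, question: str) -> str:
        # Repository context first, then prior turns, then the new question.
        parts = [
            "You are answering questions about the following repository.",
            self.repo_context,
            "Conversation so far:",
        ]
        parts += [f"{role}: {text}" for role, text in self.history]
        parts.append(f"user: {question}")
        return "\n\n".join(parts)
```

Sending the cached repository text with every request keeps answers grounded in the code without re-cloning. Trimming old turns bounds the prompt size as the conversation grows.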

Beautiful UI/UX:

- Glassmorphism design with backdrop blur
- Animated gradient backgrounds
- Mouse-following glow effects
- Smooth tab transitions
- Print-optimized PDF generation

🚧 Challenges we ran into

1. Gemini API Context Length Limits

Problem: Large repositories exceeded Gemini's token limits.

Solution: Implemented smart truncation:

- Prioritize important files (README, package.json, main entry points)
- Limit to 50 files and 50 KB per file
- Focus on code files; exclude node_modules and binary files
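The truncation rules above translate fairly directly into code. This sketch applies the stated limits: 50 files, 50 KB per file, skipping `node_modules` and binaries, and putting README and `package.json` first. The specific entry-point names, the extension allow-list used to recognize code files, and the `=== path ===` separator format are assumptions.

```python
from pathlib import Path

MAX_FILES = 50
MAX_BYTES = 50 * 1024  # "50 KB per file"; assumes KB means 1024 bytes
SKIP_DIRS = {"node_modules", ".git", "dist", "build"}
# README and package.json come from the write-up; the entry-point names are assumed.
PRIORITY_NAMES = {"readme.md", "package.json", "main.py", "app.py", "index.js", "index.ts"}
CODE_SUFFIXES = {".py", ".js", ".jsx", ".ts", ".tsx", ".json", ".md"}


def select_files(repo_root: str) -> list[tuple[str, str]]:
    """Pick at most MAX_FILES files, priority files first, each truncated to MAX_BYTES."""
    root = Path(repo_root)
    candidates = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        if path.suffix.lower() not in CODE_SUFFIXES:
            continue  # skips binaries and unsupported file types
        candidates.append(rel)
    # Priority files first, then shallower paths, then alphabetical.
    candidates.sort(key=lambda p: (p.name.lower() not in PRIORITY_NAMES, len(p.parts), str(p)))
    selected = []
    for rel in candidates[:MAX_FILES]:
        data = (root / rel).read_bytes()[:MAX_BYTES]
        selected.append((rel.as_posix(), data.decode("utf-8", errors="ignore")))
    return selected


def build_context(files: list[tuple[str, str]]) -> str:
    # Prefix each file with its path so the model can refer to locations.
    return "\n\n".join(f"=== {path} ===\n{text}" for path, text in files)
```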

2. Dependency Graph Complexity

Problem: Extracting accurate import relationships across different languages.

Solution: Built language-specific parsers:

```python
# Python: import X, from X import Y
# JavaScript/TypeScript: import X from 'Y', require('Y')
# Handles relative paths: ./file, ../folder/file
```
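After raw specifiers are extracted, the relative ones have to be mapped back to files in the repository. This is a minimal sketch of that resolution step, assuming the repository is represented as a set of repository-relative POSIX paths. The function names and the extension and `index`-file lookup order are hypothetical, loosely modeled on how JavaScript tooling tries extensions and `index` files.

```python
import posixpath
from typing import Optional

JS_EXTENSIONS = (".js", ".jsx", ".ts", ".tsx")  # assumed lookup order


def resolve_js_import(importer: str, specifier: str, repo_files: set[str]) -> Optional[str]:
    """Map './file' or '../folder/file' to a repo path, or None for external packages."""
    if not specifier.startswith(("./", "../")):
        return None  # bare specifier such as 'react'
    base = posixpath.dirname(importer)
    target = posixpath.normpath(posixpath.join(base, specifier))
    # Try the exact path, then known extensions, then a folder index file.
    candidates = [target]
    candidates += [target + ext for ext in JS_EXTENSIONS]
    candidates += [posixpath.join(target, "index" + ext) for ext in JS_EXTENSIONS]
    return next((c for c in candidates if c in repo_files), None)


def resolve_python_relative(importer: str, module: str, repo_files: set[str]) -> Optional[str]:
    """Resolve '.utils' or '..models' relative to the importing file's package."""
    dots = len(module) - len(module.lstrip("."))
    if dots == 0:
        return None  # absolute import; would need sys.path-style lookup
    base = posixpath.dirname(importer)
    for _ in range(dots - 1):  # each extra dot climbs one package level
        base = posixpath.dirname(base)
    rest = module[dots:].replace(".", "/")
    target = posixpath.join(base, rest) if rest else base
    for candidate in (target + ".py", posixpath.join(target, "__init__.py")):
        if candidate in repo_files:
            return candidate
    return None
```

Python counts package levels with leading dots rather than writing out `../` hops. This is why the two languages need separate resolution logic.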

3. AI Response Reliability

Problem: Gemini sometimes returned malformed JSON or included markdown code blocks.

Solution:

- Strict prompt engineering specifying exact JSON format
- Response cleaning (strip ```json markers)
- Fallback error handling with user-friendly messages
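The three solution steps can be combined into a small prompt-and-parse helper. This is a minimal sketch: the schema text, the required keys, and the function names are illustrative assumptions, and the actual model call is left out.

```python
import json
import re
from typing import Optional

# Hypothetical schema instructions; the project's real prompts are not shown.
SCHEMA_HINT = """Respond with JSON only, matching exactly:
{"summary": string, "modules": [{"name": string, "purpose": string}]}"""

# Matches an opening ``` or ```json fence at the start, or a closing ``` at the end.
FENCE = re.compile(r"^\s*```(?:json)?\s*|\s*```\s*$")


def build_prompt(task: str, code_context: str) -> str:
    return f"{task}\n\n{SCHEMA_HINT}\n\nRepository:\n{code_context}"


def parse_model_json(raw: str, required_keys=("summary", "modules")) -> dict:
    """Strip optional ```json fences, parse, and check required top-level keys."""
    cleaned = FENCE.sub("", raw.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("Model returned JSON, but not an object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"Model response is missing keys: {', '.join(missing)}")
    return data


def safe_parse(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Return (data, None) on success or (None, user-facing message) on failure."""
    try:
        return parse_model_json(raw), None
    except ValueError as err:
        return None, f"Analysis failed, please try again. ({err})"
```

The parser raises `ValueError` internally, and only the outer layer converts it to a friendly message. This keeps the parsing logic testable while the UI still receives a readable error.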

4. Print/PDF Generation

Problem: Browser print dialog doesn't work well with dark backgrounds and complex layouts.

Solution:

- Created a separate <Report> component optimized for print
- Added print-specific CSS media queries
- Hid interactive UI elements with a .no-print class
- Forced a white background and black text for PDFs

5. Real-time Chat Performance

Problem: Chat responses felt slow and disconnected.

Solution:

- Loading indicators with animated dots
- Smooth scrolling to the latest message
- Cached repository content to avoid re-analysis
- Streamed responses to improve perceived performance
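To avoid a fresh clone and analysis for every follow-up question, repository content can be cached in an in-memory map keyed by a normalized repository URL. The class name, the URL normalization rules, and the one-hour expiry are assumptions, not details from the project.

```python
import time
from typing import Callable


class AnalysisCache:
    """Hypothetical in-memory cache keyed by normalized repository URL."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds  # assumed expiry window
        self._entries: dict[str, tuple[float, dict]] = {}

    @staticmethod
    def key_for(repo_url: str) -> str:
        # Treat case, trailing slashes, and a ".git" suffix as the same repo.
        return repo_url.strip().lower().rstrip("/").removesuffix(".git")

    def get_or_compute(self, repo_url: str, compute: Callable[[str], dict]) -> dict:
        key = self.key_for(repo_url)
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: skip cloning and re-analysis
        result = compute(repo_url)
        self._entries[key] = (now, result)
        return result
```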

6. AI Code Detection Accuracy

Problem: It is hard to reliably distinguish AI-generated code from human-written code.

Solution:

- Multi-factor analysis with 10+ indicators
- Confidence levels (low/medium/high)
- Specific examples and file patterns
- Transparent breakdown of scoring methodology
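The write-up does not give the scoring formula. The following is only one plausible way to turn per-indicator scores into a percentage, a low/medium/high confidence label, and a transparent breakdown. The weighted average, the indicator-count and agreement thresholds, and the sample `comment_density` indicator (a rough proxy for "verbose comments") are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    score: float   # assumed scale: 0.0 = human-like, 1.0 = strongly AI-like
    weight: float  # assumed relative importance


def comment_density(source: str) -> float:
    """Share of non-blank lines that are `#` or `//` comments (one possible indicator)."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(ln.startswith(("#", "//")) for ln in lines) / len(lines)


def aggregate(indicators: list[Indicator]) -> dict:
    """Weighted average of indicator scores plus an assumed confidence rule."""
    total_weight = sum(i.weight for i in indicators)
    if total_weight == 0:
        return {"ai_percentage": 0.0, "confidence": "low", "breakdown": {}}
    weighted = sum(i.score * i.weight for i in indicators) / total_weight
    # Confidence rises with the number of indicators and how closely they agree.
    spread = max(i.score for i in indicators) - min(i.score for i in indicators)
    if len(indicators) >= 10 and spread <= 0.3:
        confidence = "high"
    elif len(indicators) >= 5:
        confidence = "medium"
    else:
        confidence = "low"
    return {
        "ai_percentage": round(100 * weighted, 1),
        "confidence": confidence,
        "breakdown": {i.name: round(100 * i.score, 1) for i in indicators},
    }
```

Returning the per-indicator breakdown alongside the headline number lets users inspect how the result was reached.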

🏆 Accomplishments that we're proud of

✨ Beautiful, Polished UI
- Professional glassmorphism design
- Smooth animations throughout
- Mobile-responsive layout
- Print-ready PDF generation
🤖 Advanced AI Integration
- Successfully integrated Gemini for complex code analysis
- Built sophisticated prompting system for accurate results
- Implemented conversational AI chat with context retention
📊 Rich Visualizations
- Interactive file tree with expand/collapse
- Dependency graph with connection tracking
- AI detection gauge with breakdown scores
⚡ Performance Optimization
- Fast analysis even for large repositories, thanks to file prioritization and truncation
- Efficient caching of analyzed data
- Async processing for better UX
🎯 Real-World Utility
- Actually solves a painful developer problem
- Production-ready documentation generation
- Saves hours of manual code exploration
🔒 Robust Error Handling
- Graceful fallbacks for API failures
- Clear error messages for users
- Validation of GitHub URLs
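URL validation can catch malformed input before any cloning starts. This is a loose sketch that assumes only `https://github.com/<owner>/<repo>` URLs are accepted, optionally ending in `.git` or `/`. The pattern and the name `parse_github_url` are assumptions, and the pattern does not enforce every GitHub naming rule.

```python
import re
from typing import Optional

# Loose, assumed pattern: owner (1-39 chars) and repo name, nothing after them.
GITHUB_URL = re.compile(
    r"^https?://(?:www\.)?github\.com/"
    r"(?P<owner>[A-Za-z0-9-]{1,39})/"
    r"(?P<repo>[A-Za-z0-9._-]+?)"
    r"(?:\.git)?/?$"
)


def parse_github_url(url: str) -> Optional[tuple[str, str]]:
    """Return (owner, repo) for a repository URL, or None if it does not match."""
    match = GITHUB_URL.match(url.strip())
    if match is None:
        return None
    return match.group("owner"), match.group("repo")
```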

📚 What we learned

Technical Learnings:

Prompt Engineering is an Art
- Learned to craft precise, structured prompts for consistent AI responses
- Discovered the importance of example outputs in prompts
- Found that explicit JSON schemas dramatically improve response quality
AI Context Management
- Balancing context length vs information density
- Prioritizing the most relevant code files
- Truncation strategies that maintain usefulness
Multi-Language Code Parsing
- Each language has unique import syntax
- Relative vs absolute imports require different resolution logic
- File extensions matter for accurate classification
React Performance Patterns
- useMemo for expensive computations (dependency graphs)
- Efficient state management for large datasets
- Lazy loading and progressive rendering
Print CSS is Tricky
- Media queries behave differently for print
- Need explicit white backgrounds for PDFs
- Page breaks and overflow handling

Design Learnings:

Glassmorphism Best Practices
- Proper use of backdrop-filter and blur
- Layering gradients for depth
- Balancing transparency with readability
Animation Principles
- Stagger delays for list items (0.1s intervals)
- Spring animations feel more natural than linear
- Loading indicators reduce perceived wait time

Product Learnings:

Developer Tool UX
- Developers want speed AND beauty
- Clear progress indicators reduce anxiety
- Export features (PDF) add tremendous value
AI Transparency
- Users want to know confidence levels
- Showing specific examples builds trust
- Explaining how analysis works improves adoption

🚀 What's next for Code Archaeologist

Immediate Roadmap:

🔐 Support for Private Repositories
- OAuth integration with GitHub
- Personal access token support
- Organization-level analysis
📈 Advanced Metrics
- Code complexity scores (cyclomatic complexity)
- Test coverage analysis
- Security vulnerability detection
- Performance hotspot identification
🌍 Multi-Language Support
- Better support for Java, Go, Rust, Ruby
- Language-specific best practices
- Framework detection (React, Django, Spring, etc.)
💾 Analysis History
- Save and compare analyses over time
- Track technical debt trends
- Monitor code quality improvements

Long-term Vision:

🤝 Team Collaboration
- Share analyses within teams
- Commenting and annotation features
- Integration with Jira/Linear for issue tracking
🔄 CI/CD Integration
- GitHub Actions for automatic analysis
- Pull request comments with insights
- Automated documentation updates
📊 Repository Comparisons
- Compare similar projects
- Benchmark against industry standards
- Identify best practices from top repos
🎓 Learning Recommendations
- Suggest tutorials based on codebase tech stack
- Identify knowledge gaps for new team members
- Personalized onboarding paths
🔍 Code Search & Navigation
- Semantic code search powered by AI
- "Show me where authentication happens"
- "Find all API endpoints"
🌐 Browser Extension
- Analyze repos directly from GitHub UI
- Quick summaries on hover
- One-click documentation generation