Automated AI-Powered Code Review System
DTSC-5253 Data Scale Computing - Final Project
- Atharva Patil
- Mihir Chauhan
CodeSense is a serverless, event-driven code review platform that automatically analyzes GitHub pull requests using OpenAI's GPT-4o-mini. When developers push code to GitHub, webhooks trigger our AWS Lambda-based system to queue review jobs, analyze code changes, and provide intelligent feedback on code quality, security issues, and best practices.
- Automated Code Reviews: AI-powered analysis using OpenAI GPT-4o-mini
- GitHub Integration: Real-time webhook-based event processing
- Serverless Architecture: AWS Lambda for scalable, cost-effective compute
- Event-Driven Processing: SQS queue for reliable asynchronous job processing
- Web Dashboard: React-based frontend for viewing reviews and findings
- User Authentication: JWT-based secure authentication system
GitHub → API Gateway → Lambda (API) → RDS PostgreSQL
                           ↓
                       SQS Queue
                           ↓
              Lambda (Worker) → OpenAI API
                           ↓
                    Store Findings
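The hand-off from the API Lambda to the queue can be sketched as a small helper that turns a GitHub push payload into an SQS message body (the function name and message schema here are illustrative assumptions, not taken from the actual codebase):

```python
import json

def build_review_job(push_payload: dict, review_id: int) -> str:
    """Build the SQS message body for one review job from a GitHub push payload.

    Hypothetical sketch: the real message schema may differ.
    """
    return json.dumps({
        "review_id": review_id,
        "repo": push_payload["repository"]["full_name"],
        "before": push_payload["before"],   # base commit for the diff
        "after": push_payload["after"],     # head commit for the diff
        "ref": push_payload["ref"],
    })

# Sending is then a single boto3 call, roughly:
# sqs.send_message(QueueUrl=REVIEW_QUEUE_URL, MessageBody=build_review_job(payload, 42))
```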
AWS Services Used:
- AWS Lambda: Serverless compute (API handler + Worker processor)
- API Gateway: HTTP API v2 for webhook and REST endpoints
- RDS PostgreSQL: Relational database for events, reviews, and findings
- SQS: Message queue for decoupled asynchronous processing
- S3: Static website hosting for React frontend
- VPC: Private networking for secure database access
- CloudWatch: Logging and monitoring
External APIs:
- GitHub API: Webhook delivery, repository access, code comparison
- OpenAI API: GPT-4o-mini for intelligent code analysis
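A minimal sketch of how the worker might assemble the chat request for GPT-4o-mini (the prompt wording and function name are assumptions, not the project's actual prompt):

```python
def build_review_messages(diff: str, filename: str) -> list[dict]:
    """Assemble chat messages for a single-file review request.

    Hypothetical prompt; the real system prompt likely differs.
    """
    system = (
        "You are a code reviewer. Report code-quality, security, and "
        "best-practice issues as a concise bulleted list."
    )
    user = f"Review the following diff for {filename}:\n\n{diff}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# The worker would then call, roughly:
# client.chat.completions.create(model="gpt-4o-mini",
#                                messages=build_review_messages(diff, path))
```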
Tables:
- users: User accounts with hashed passwords
- repositories: Connected GitHub repositories with webhook secrets
- events: GitHub push events (commits, branches, timestamps)
- reviews: Review jobs with status tracking
- findings: Individual code issues identified by AI
- deliveries: Webhook delivery tracking
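The reviews-to-findings relationship can be illustrated with a stdlib-only SQLite stand-in (column names are assumptions based on the table descriptions above; the real schema is PostgreSQL):

```python
import sqlite3

# In-memory stand-in for the RDS PostgreSQL schema (illustrative columns only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (
    id     INTEGER PRIMARY KEY,
    repo   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'   -- status tracking
);
CREATE TABLE findings (
    id        INTEGER PRIMARY KEY,
    review_id INTEGER NOT NULL REFERENCES reviews(id),
    severity  TEXT,
    message   TEXT
);
""")
conn.execute("INSERT INTO reviews (repo) VALUES ('octocat/hello')")
conn.execute(
    "INSERT INTO findings (review_id, severity, message) "
    "VALUES (1, 'warning', 'Unused import')"
)
row = conn.execute(
    "SELECT r.status, f.message FROM findings f "
    "JOIN reviews r ON r.id = f.review_id"
).fetchone()
```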
- AWS Account with appropriate permissions
- GitHub account with repository access
- OpenAI API key
- Docker installed (for Lambda deployment)
- AWS CLI configured
- Python 3.11+
- Node.js 18+ (for frontend)
Backend (Lambda):
DATABASE_URL=postgresql://username:password@host:5432/dbname
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o-mini
SECRET_KEY=your-jwt-secret-key
REVIEW_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/.../cloudsense-review-queue
GITHUB_TOKEN=ghp_...

Frontend:
VITE_API_URL=https://your-api-gateway-id.execute-api.us-east-1.amazonaws.com/production

# Create RDS PostgreSQL instance (db.t3.micro recommended)
aws rds create-db-instance \
--db-instance-identifier cloudsense-postgres \
--db-instance-class db.t3.micro \
--engine postgres \
--master-username postgres \
--master-user-password YOUR_PASSWORD \
--allocated-storage 20 \
--vpc-security-group-ids sg-XXXXXXXX
# Run database migrations
python app/database.py

aws sqs create-queue \
--queue-name cloudsense-review-queue \
--attributes VisibilityTimeout=300,MessageRetentionPeriod=345600

Build the Lambda package (using Docker for Linux compatibility):
# Build dependencies
docker run --rm --entrypoint pip \
-v "${PWD}/build/api-lambda:/var/task" \
-v "${PWD}/requirements-lambda.txt:/tmp/requirements.txt" \
public.ecr.aws/lambda/python:3.11 \
install -r /tmp/requirements.txt -t /var/task
# Copy application code
cp -r app build/api-lambda/

Deploy API Lambda:
aws lambda create-function \
--function-name cloudsense-api \
--runtime python3.11 \
--handler app.lambda_handler.lambda_handler \
--role arn:aws:iam::ACCOUNT:role/cloudsense-lambda-exec \
--memory-size 512 \
--timeout 60 \
--zip-file fileb://lambda-package.zip \
--environment Variables="{DATABASE_URL=...,SECRET_KEY=...,GITHUB_TOKEN=...}"

Deploy Worker Lambda:
aws lambda create-function \
--function-name cloudsense-worker \
--runtime python3.11 \
--handler app.lambda_worker.lambda_handler \
--role arn:aws:iam::ACCOUNT:role/cloudsense-lambda-exec \
--memory-size 1024 \
--timeout 300 \
--zip-file fileb://lambda-package.zip \
--environment Variables="{DATABASE_URL=...,OPENAI_API_KEY=...,OPENAI_MODEL=gpt-4o-mini}"
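The worker's entry point (the handler path app.lambda_worker.lambda_handler above) receives SQS records one at a time, since the event source mapping uses batch size 1. A stripped-down sketch with the analysis step stubbed out:

```python
import json

def lambda_handler(event, context):
    """Process SQS records containing review jobs (analysis step stubbed).

    Illustrative sketch; the real worker fetches the diff from GitHub,
    calls OpenAI, and stores findings in PostgreSQL.
    """
    processed = []
    for record in event.get("Records", []):
        job = json.loads(record["body"])
        # ... fetch diff, call OpenAI, persist findings ...
        processed.append(job["review_id"])
    return {"processed": processed}
```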
# Create SQS event source mapping
aws lambda create-event-source-mapping \
--function-name cloudsense-worker \
--event-source-arn arn:aws:sqs:us-east-1:ACCOUNT:cloudsense-review-queue \
--batch-size 1

# Create HTTP API
aws apigatewayv2 create-api \
--name cloudsense-api \
--protocol-type HTTP \
--target arn:aws:lambda:us-east-1:ACCOUNT:function:cloudsense-api
# Add routes
aws apigatewayv2 create-route \
--api-id API_ID \
--route-key 'POST /webhook/{secret}'
aws apigatewayv2 create-route \
--api-id API_ID \
--route-key 'ANY /api/{proxy+}'

cd frontend
# Install dependencies
npm install
# Set production API URL
echo "VITE_API_URL=https://API_ID.execute-api.us-east-1.amazonaws.com/production" > .env.production
# Build
npm run build
# Deploy to S3
aws s3 sync dist/ s3://cloudsense-frontend/
# Configure S3 for SPA routing
aws s3api put-bucket-website \
--bucket cloudsense-frontend \
--website-configuration '{"IndexDocument":{"Suffix":"index.html"},"ErrorDocument":{"Key":"index.html"}}'

- Go to your GitHub repository → Settings → Webhooks
- Add webhook:
  - Payload URL: https://API_ID.execute-api.us-east-1.amazonaws.com/production/webhook/YOUR_UNIQUE_SECRET
  - Content type: application/json
  - Events: Push events
  - Active: ✓
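GitHub signs each delivery with an HMAC of the request body (sent in the X-Hub-Signature-256 header), which the API Lambda can verify before trusting the payload. A stdlib-only sketch (function names are illustrative):

```python
import hashlib
import hmac

def sign_payload(secret: str, body: bytes) -> str:
    """Compute the X-Hub-Signature-256 header value GitHub would send."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Constant-time comparison against the received signature header."""
    return hmac.compare_digest(sign_payload(secret, body), header)
```

Using `hmac.compare_digest` rather than `==` avoids leaking signature bytes through timing differences.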
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\Activate.ps1
# Install dependencies
pip install -r requirements.txt
# Set environment variables
cp .env.example .env
# Edit .env with your values
# Run local server
python -m uvicorn app.web:app --reload --port 8000

cd frontend
# Install dependencies
npm install
# Set development API URL
echo "VITE_API_URL=http://localhost:8000" > .env.development
# Run dev server
npm run dev

# Start all services (API + PostgreSQL)
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down

# Test webhook endpoint
.\test-review-flow.ps1
# Check reviews in database
.\check-reviews.ps1

- Make a commit to the connected GitHub repository
- Verify webhook delivery in GitHub (green checkmark)
- Check CloudWatch logs:
aws logs tail /aws/lambda/cloudsense-api --follow

- Verify event created in database
- Check SQS queue for message
- Monitor worker Lambda processing
- View findings in web dashboard
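For a quick smoke test of the flow above without touching GitHub, a push-style webhook request can be constructed with the stdlib (URL and payload are placeholders):

```python
import json
import urllib.request

def build_webhook_request(base_url: str, secret: str,
                          payload: dict) -> urllib.request.Request:
    """Construct (but do not send) a GitHub-style push webhook POST."""
    return urllib.request.Request(
        f"{base_url}/webhook/{secret}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-GitHub-Event": "push"},
        method="POST",
    )

# To actually send it against a deployed stage:
# urllib.request.urlopen(build_webhook_request(API_URL, SECRET, payload))
```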
CloudWatch Logs:
# API Lambda logs
aws logs tail /aws/lambda/cloudsense-api --follow
# Worker Lambda logs
aws logs tail /aws/lambda/cloudsense-worker --follow

SQS Queue Monitoring:
aws sqs get-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/ACCOUNT/cloudsense-review-queue \
--attribute-names All

Lambda Metrics:
- Invocations, duration, errors, throttles available in CloudWatch
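The attribute dump from get-queue-attributes can be reduced to a single backlog number; a small helper (the attribute keys are standard SQS attributes, the function name is illustrative):

```python
def queue_backlog(attrs: dict) -> int:
    """Total messages waiting or in flight, from get-queue-attributes output.

    SQS returns attribute values as strings, hence the int() conversions.
    """
    visible = int(attrs.get("ApproximateNumberOfMessages", 0))
    in_flight = int(attrs.get("ApproximateNumberOfMessagesNotVisible", 0))
    return visible + in_flight
```

A steadily growing backlog suggests the worker Lambda is erroring or throttled.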
Monthly cost for moderate usage (~1000 reviews/month):
- Lambda: ~$2-3 (512MB API + 1024MB Worker)
- API Gateway: ~$1-2 (HTTP API requests)
- RDS: ~$10-12 (db.t3.micro with 20GB storage)
- SQS: ~$0.50 (1M requests free tier)
- S3: ~$0.50 (frontend hosting)
- CloudWatch: ~$0.50 (logs)
Total: ~$15-18/month
- JWT-based authentication for API endpoints
- GitHub webhook signature validation
- Unique webhook secrets per repository
- Database credentials in environment variables (never committed)
- VPC isolation for RDS database
- IAM least-privilege permissions for Lambda
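The JWT flow can be illustrated with a stdlib-only HS256 sketch; the service would typically use a library such as PyJWT, and this version omits expiry handling, so it is purely illustrative:

```python
import base64
import hashlib
import hmac
import json

def _b64(data: bytes) -> bytes:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def encode_jwt(claims: dict, secret: str) -> str:
    """Minimal HS256 JWT encoder (no exp/iat claims; illustrative only)."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = _b64(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def decode_jwt(token: str, secret: str) -> dict:
    """Verify the signature and return the claims; raises on tampering."""
    header, payload, sig = token.encode().split(b".")
    signing_input = header + b"." + payload
    expected = _b64(hmac.new(secret.encode(), signing_input,
                             hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    padded = payload + b"=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```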
- Large commits (>50 files) may time out against the worker Lambda's 300 s limit
- OpenAI API rate limits may cause delays during high traffic
- Database transactions require explicit commits in the Lambda environment
- Cross-platform Python dependency builds require Docker
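The large-commit timeout could be mitigated by fanning one review out into several queue messages, one per batch of changed files (the batch size and function name here are hypothetical, not part of the current system):

```python
def batch_changed_files(files: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split a commit's changed files into batches, one SQS message per batch,
    so each worker invocation stays well under the 300 s timeout."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]
```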
MIT License - See LICENSE file for details
- Course: DTSC-5253 Data Scale Computing, University of Colorado Boulder
- Instructor: Eric Goodman
- Technologies: AWS, OpenAI, GitHub, FastAPI, React, PostgreSQL
