A production-ready Retrieval-Augmented Generation system that enables document upload, intelligent chunking, semantic search, and context-aware question answering with automatic citations.
Core Technologies:
- Node.js + TypeScript + Express
- Transformers.js (local embeddings)
- Groq (LLM inference)
- In-memory vector store with cosine similarity
Key Features:
- PDF and text file upload with automatic processing
- Configurable chunking with overlap (2000/500 chars)
- Local embedding generation (384-dimensional vectors)
- Semantic search with similarity threshold filtering
- LLM-powered answers with source attribution
- Retrieval statistics and transparency metrics
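The overlapping chunker behind the 2000/500 defaults above can be sketched as follows (a minimal illustration; the function name is ours, not necessarily the one in src/core/chunker.ts):

```typescript
// Split text into fixed-size chunks that overlap, so a sentence cut at a
// chunk boundary still appears whole in the neighboring chunk.
function chunkText(text: string, chunkSize = 2000, overlap = 500): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const stride = chunkSize - overlap; // 1500 new characters per chunk
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const chunks = chunkText("x".repeat(5000));
console.log(chunks.length);    // 3
console.log(chunks[1].length); // 2000
```

With these defaults a 5000-character document yields 3 chunks, each sharing 500 characters with its neighbor.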
Upload Flow:
File → Parse → Chunk → Embed (local) → Store
Query Flow:
Question → Embed → Vector Search → Retrieve Top-K → LLM → Response + Citations
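The vector-search step of the query flow reduces to cosine similarity plus threshold filtering and a top-K sort. A self-contained sketch (interface and function names are illustrative):

```typescript
interface StoredChunk { text: string; embedding: number[]; }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored chunk, drop those below the similarity threshold,
// and return the K best matches (mirrors retrievalK / similarityThreshold).
function search(store: StoredChunk[], query: number[], k = 5, threshold = 0.3) {
  return store
    .map(c => ({ chunk: c, score: cosine(c.embedding, query) }))
    .filter(r => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const store: StoredChunk[] = [
  { text: "cats purr", embedding: [1, 0] },
  { text: "dogs bark", embedding: [0, 1] },  // orthogonal: filtered out
  { text: "cats meow", embedding: [0.9, 0.1] },
];
console.log(search(store, [1, 0], 2).map(r => r.chunk.text));
// → ["cats purr", "cats meow"]
```

The real system does this over 384-dimensional embeddings, but the logic is identical.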
Components:
- fileParser: Handles PDF and TXT extraction
- chunker: Splits documents with configurable overlap
- embeddingService: Generates embeddings using Transformers.js
- vectorStore: In-memory storage with cosine similarity search
- llmService: Groq integration for response generation
- promptService: Context-aware prompt construction
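The promptService step can be illustrated with a minimal context-stuffing template that numbers each retrieved chunk so the LLM can cite it (a sketch; the actual template in src/core/prompt.ts may differ):

```typescript
interface RetrievedChunk { text: string; source: string; }

// Number each retrieved chunk as [1], [2], ... and instruct the model
// to answer only from the supplied context, citing those numbers.
function buildPrompt(chunks: RetrievedChunk[], question: string): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");
  return [
    "Answer the question using ONLY the context below.",
    "Cite sources with bracketed numbers, e.g. [1].",
    'If the context is insufficient, say "I don\'t know."',
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt(
  [{ text: "Groq serves open models.", source: "notes.txt" }],
  "Who serves open models?"
);
console.log(prompt.includes("[1] (notes.txt)")); // true
```

Keeping citation markers in the prompt is what lets the response stage map bracketed numbers back to source files for attribution.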
npm install
Create a .env file:
GROQ_API_KEY=your_groq_api_key
PORT=3000
NODE_ENV=development
Get your Groq API key at: https://console.groq.com/keys
Start server:
npm run dev
API Endpoints:
# Upload document
POST /api/upload
Content-Type: multipart/form-data
Body: file (PDF or TXT)
# Ask question
POST /api/ask
Content-Type: application/json
Body: { "question": "Your question here" }
# Health check
GET /health
Web Interface:
Navigate to http://localhost:3000 for the web UI.
Local Embeddings: Chose Transformers.js over API-based solutions to eliminate rate limits, reduce operational costs, and improve latency (50-100ms vs 200-500ms). Trade-off: 90MB model download on first run.
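Transformers.js produces token-level activations that are mean-pooled and L2-normalized into a single 384-dimensional sentence vector. The math of that pooling step, in isolation (a pure TypeScript sketch, no model download; the helper name is ours):

```typescript
// Mean-pool token embeddings (one row per token) into a single sentence
// vector, then L2-normalize so cosine similarity reduces to a dot product.
function meanPoolNormalize(tokenEmbeddings: number[][]): number[] {
  const dim = tokenEmbeddings[0].length;
  const pooled = new Array<number>(dim).fill(0);
  for (const tok of tokenEmbeddings) {
    for (let d = 0; d < dim; d++) pooled[d] += tok[d] / tokenEmbeddings.length;
  }
  const norm = Math.sqrt(pooled.reduce((s, v) => s + v * v, 0));
  return pooled.map(v => v / norm);
}

// Two "tokens" in a toy 2-dimensional space: mean is [2, 0], norm is 2.
const vec = meanPoolNormalize([[3, 0], [1, 0]]);
console.log(vec); // [1, 0]
```

Because the stored vectors are unit-length, the vector store's cosine similarity is effectively a dot product, which keeps search fast.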
Groq for LLM: Selected for superior inference speed, generous free tier, and stable performance during development. Architecture remains provider-agnostic for easy migration.
In-Memory Vector Store: Optimal for prototype and demonstration. Production deployment would migrate to Pinecone or Weaviate for persistence and scalability.
For production deployment:
- Migrate vector store to Pinecone/Weaviate
- Implement Redis caching for frequent queries
- Add JWT authentication
- Configure rate limiting per user
- Set up monitoring and logging
- Consider hybrid search (semantic + keyword)
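The hybrid-search item above is commonly implemented with reciprocal rank fusion (RRF), which merges a semantic ranking and a keyword ranking without needing comparable scores. A minimal sketch (an assumption about one possible approach, not existing code in this repo):

```typescript
// Reciprocal rank fusion: score(doc) = Σ 1 / (k + rank), summed over every
// ranked list the doc appears in; k damps the dominance of top ranks.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, rank) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

const semantic = ["c1", "c2", "c3"]; // ranked by cosine similarity
const keyword  = ["c1", "c4", "c2"]; // ranked by keyword/BM25 match
console.log(rrf([semantic, keyword])); // → ["c1", "c2", "c4", "c3"]
```

Documents appearing high in both lists ("c1") win; documents found by only one retriever still surface, which is the main benefit over thresholded semantic search alone.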
All parameters are configurable in src/utils/config.ts:
rag: {
  chunkSize: 2000,          // Characters per chunk
  chunkOverlap: 500,        // Overlap between chunks
  retrievalK: 5,            // Number of chunks to retrieve
  similarityThreshold: 0.3, // Minimum similarity score
  llmTemperature: 0.1       // LLM creativity (lower = more factual)
}
Project Structure:
src/
├── api/
│ ├── controllers/ # Request handlers
│ ├── middleware/ # Error handling
│ └── routes/ # Route definitions
├── core/
│ ├── chunker.ts # Text chunking logic
│ ├── embeddings.ts # Local embedding generation
│ ├── fileParser.ts # PDF/TXT parsing
│ ├── llm.ts # Groq integration
│ └── prompt.ts # Prompt construction
├── storage/
│ └── vectorStore.ts # In-memory vector storage
├── types/
│ └── index.ts # TypeScript interfaces
└── utils/
├── config.ts # Configuration management
└── logger.ts # Logging utilities
The system has been tested with:
- Multiple document types (PDF, TXT)
- Various question formats (factual, conceptual)
- Edge cases (no documents, irrelevant questions)
- Multi-document retrieval scenarios
Typical query latency:
- Embedding generation: ~50-100ms
- Vector search: ~5-10ms
- LLM inference: ~500-1500ms
- Total: ~1-2 seconds
License: MIT