hfesc/contexto

Contexto 🚀

Up-to-date documentation for LLMs and AI code editors

Contexto is a documentation aggregation platform inspired by Context7 from Upstash. It helps developers provide fresh, accurate documentation to AI assistants like Claude, GPT, and AI code editors like Cursor and Windsurf.

🌟 Features

Core Functionality

  • πŸ•·οΈ Intelligent Crawler: Automatically crawl entire documentation sites with GitHub integration
  • πŸ“š Documentation Aggregation: Index documentation from any library or framework
  • πŸ” Semantic Search: Vector-based search using OpenAI embeddings
  • πŸ€– LLM Enrichment: Automatically enhance documentation with explanations
  • ⚑ Lightning Fast: Redis caching for optimal performance
  • πŸ“ llms.txt Generation: Export documentation in LLM-friendly format

Advanced Features

  • 🗺️ Sitemap Support: Automatic sitemap.xml detection and parsing for efficient crawling
  • ⏱️ Rate Limiting: Built-in rate limiting to respect server resources (1 req/sec)
  • 📊 Analytics Dashboard: Real-time statistics and activity tracking
  • 🔗 MCP Server: Model Context Protocol for AI editor integration
  • 📊 Job Queue: Real-time progress tracking for crawl operations
  • 🎨 Modern UI: Clean, responsive interface built with Next.js 15 and Tailwind CSS
  • 🐳 Docker Ready: Complete Docker setup for easy deployment

πŸ—οΈ Architecture

Contexto uses a 5-stage processing pipeline:

  1. Parse: Extract code snippets and examples from documentation
  2. Enrich: Add short explanations and metadata using LLMs
  3. Vectorize: Embed content for semantic search
  4. Rerank: Score results for relevance using a custom algorithm
  5. Cache: Serve requests from Redis for best performance
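The five stages above can be sketched as a chain of stage functions. The following TypeScript sketch stubs out each stage to show the data flow; the type and function names are illustrative, not Contexto's actual internals:

```typescript
// Illustrative sketch of the 5-stage pipeline; stage internals are stubbed.
interface Chunk {
  text: string;
  explanation?: string;
  embedding?: number[];
  score?: number;
}

// 1. Parse: split raw documentation into chunks (the real parser uses Cheerio/Markdown-it).
function parse(raw: string): Chunk[] {
  return raw.split("\n\n").filter(Boolean).map((text) => ({ text }));
}

// 2. Enrich: attach a short explanation (the real version calls an LLM).
function enrich(chunks: Chunk[]): Chunk[] {
  return chunks.map((c) => ({ ...c, explanation: `Summary of: ${c.text.slice(0, 40)}` }));
}

// 3. Vectorize: embed each chunk (the real version calls the OpenAI embeddings API).
function vectorize(chunks: Chunk[]): Chunk[] {
  return chunks.map((c) => ({ ...c, embedding: [c.text.length] }));
}

// 4. Rerank: score chunks against a query (the real version uses a custom algorithm).
function rerank(chunks: Chunk[], query: string): Chunk[] {
  return chunks
    .map((c) => ({ ...c, score: c.text.includes(query) ? 1 : 0 }))
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0));
}

// 5. Cache: memoize results per query (the real version stores them in Redis).
const cache = new Map<string, Chunk[]>();
function cached(query: string, compute: () => Chunk[]): Chunk[] {
  if (!cache.has(query)) cache.set(query, compute());
  return cache.get(query)!;
}

function runPipeline(raw: string, query: string): Chunk[] {
  return cached(query, () => rerank(vectorize(enrich(parse(raw))), query));
}
```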

πŸ› οΈ Tech Stack

  • Frontend: Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS
  • Backend: Next.js API Routes
  • Database: Upstash Redis (caching), Upstash Vector (embeddings)
  • AI: OpenAI API (embeddings + enrichment)
  • Parsing: Cheerio, Markdown-it
  • Icons: Lucide React

🚀 Getting Started

Prerequisites

  • Node.js 18+ and npm
  • Upstash Redis database
  • Upstash Vector database
  • OpenAI API key

Installation

  1. Clone the repository:

```bash
git clone https://github.com/hfesc/contexto.git
cd contexto
```

  2. Install dependencies:

```bash
npm install
```

  3. Set up environment variables:

```bash
cp .env.example .env
```

Edit `.env` and add your credentials:

```env
# Upstash Redis
UPSTASH_REDIS_REST_URL=your_redis_url
UPSTASH_REDIS_REST_TOKEN=your_redis_token

# Upstash Vector
UPSTASH_VECTOR_REST_URL=your_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_vector_token

# OpenAI API
OPENAI_API_KEY=your_openai_api_key
```

  4. Run the development server:

```bash
npm run dev
```

  5. Open http://localhost:3000 in your browser

Docker Installation (Recommended)

  1. Clone the repository and navigate to the directory

  2. Create a `.env` file with your credentials:

```bash
cp .env.example .env
# Edit .env with your credentials
```

  3. Build and run with Docker Compose:

```bash
docker-compose up -d
```

  4. Access the application at http://localhost:3000

See DEPLOYMENT.md for detailed deployment options.

📖 Usage

Adding a Library

  1. Click "Add Library" in the navigation
  2. Fill in the library details:
    • Name (e.g., "React")
    • Description
    • Version
    • Documentation URL
    • Category
  3. Click "Add Library" to create the library

Crawling Documentation

After adding a library, you have two indexing options:

Option 1: Intelligent Crawler (Recommended)

The crawler automatically discovers and indexes all documentation pages:

  1. Navigate to your library's detail page

  2. Click "Crawl Site"

  3. Configure crawler options:

    • Max Pages: Maximum number of pages to crawl (default: 100)
    • Max Depth: How deep to follow links (default: 5)
    • Require Code: Only index pages with code snippets
    • Follow External Links: Whether to crawl external domains
    • Include/Exclude Patterns: Filter URLs (e.g., /docs/, /api/)
  4. Click "Start Crawling" and monitor the progress
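Taken together, the options above amount to a small configuration object passed to each crawl. A hypothetical TypeScript shape (the field names mirror the UI labels but are assumptions, not Contexto's actual API):

```typescript
// Hypothetical crawler configuration mirroring the UI options above.
interface CrawlerOptions {
  maxPages: number;            // maximum number of pages to crawl
  maxDepth: number;            // how deep to follow links
  requireCode: boolean;        // only index pages with code snippets
  followExternalLinks: boolean;
  includePatterns: string[];   // e.g. ["/docs/", "/api/"]
  excludePatterns: string[];
}

const defaultOptions: CrawlerOptions = {
  maxPages: 100,
  maxDepth: 5,
  requireCode: false,
  followExternalLinks: false,
  includePatterns: [],
  excludePatterns: [],
};

// Decide whether a discovered URL should be crawled under the given options.
function shouldCrawl(url: string, opts: CrawlerOptions): boolean {
  if (opts.excludePatterns.some((p) => url.includes(p))) return false;
  if (opts.includePatterns.length === 0) return true;
  return opts.includePatterns.some((p) => url.includes(p));
}
```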

Supported File Types:

  • .md, .mdx - Markdown files
  • .html, .htm - HTML documentation
  • .rst - reStructuredText
  • .ipynb - Jupyter notebooks

GitHub Integration:

  • Automatically detects GitHub repository URLs
  • Uses GitHub API for faster and more reliable crawling
  • Supports branch and path selection
  • Ideal for open-source project documentation

Example URLs:

  • https://nextjs.org/docs - Regular site crawl
  • https://github.com/facebook/react/tree/main/docs - GitHub repo crawl
  • https://docs.python.org/3/ - Python documentation

Option 2: Quick Reindex

For single-page documentation or quick updates:

  1. Click "Quick Reindex" on the library detail page
  2. The system will re-process only the main documentation URL

Analytics Dashboard

Monitor your platform's performance:

  1. Navigate to the Dashboard page
  2. View statistics:
    • Total libraries and documentation chunks
    • Search analytics
    • Crawl job success/failure rates
    • Library-specific statistics
    • Recent activity feed

Searching Documentation

  1. Use the search bar on the homepage or navigate to the Search page
  2. Enter your query (e.g., "useState hook", "routing in Next.js")
  3. View results with enriched explanations and code examples
  4. Copy code snippets directly to your editor

Downloading llms.txt

Each library can be exported as an `llms.txt` file:

  1. Visit a library detail page
  2. Click "Download llms.txt"
  3. Paste the content into your AI editor's context
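The exported file is plain text structured for LLM consumption. A minimal sketch of how such an export could be assembled (the exact format Contexto emits may differ):

```typescript
interface DocChunk {
  title: string;
  source: string; // URL the chunk was crawled from
  content: string;
}

// Assemble an llms.txt-style export: a library header followed by
// titled, source-attributed sections. Format details are illustrative.
function buildLlmsTxt(library: string, chunks: DocChunk[]): string {
  const header = `# ${library}\n`;
  const sections = chunks.map(
    (c) => `## ${c.title}\nSource: ${c.source}\n\n${c.content}\n`
  );
  return [header, ...sections].join("\n");
}
```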

🔌 MCP Server Integration

Contexto includes an MCP (Model Context Protocol) server for integration with AI editors like Cursor.

Running the MCP Server

```bash
npx tsx mcp-server.ts
```

Available Tools

  • `search_documentation`: Search across all libraries
  • `get_library`: Get details about a specific library
  • `list_libraries`: List all available libraries
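For reference, an MCP client invokes one of these tools with a standard `tools/call` request. A sketch of what a call to `search_documentation` might look like; the argument names (`query`, `library`) are assumptions about this server's schema, not confirmed by the source:

```typescript
// Illustrative MCP tools/call request for the search_documentation tool.
// Argument names are assumed; check the tool's declared input schema.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "search_documentation",
    arguments: { query: "useState hook", library: "react" },
  },
};
```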

Cursor Integration

Add to your Cursor configuration:

```json
{
  "mcpServers": {
    "contexto": {
      "command": "npx",
      "args": ["tsx", "/path/to/contexto/mcp-server.ts"]
    }
  }
}
```

πŸƒ Development

Advanced Features

Sitemap Support

The crawler automatically detects and parses sitemap.xml files for efficient URL discovery:

  • Checks multiple sitemap locations (/sitemap.xml, /sitemap_index.xml, etc.)
  • Parses sitemap indexes recursively
  • Falls back to regular crawling if no sitemap found
  • Respects sitemap priorities and update frequencies
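The discovery step above boils down to probing well-known sitemap paths and pulling `<loc>` entries out of the XML. A minimal sketch, assuming regex-based extraction (a real parser would use a proper XML parser and recurse into `<sitemapindex>` files, as Contexto's does):

```typescript
// Common sitemap locations to probe, relative to the site root.
const SITEMAP_LOCATIONS = ["/sitemap.xml", "/sitemap_index.xml"];

// Extract URLs from sitemap XML with a simple regex (illustrative only;
// a robust implementation would parse the XML and handle sitemap indexes).
function extractLocs(xml: string): string[] {
  const matches = xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g);
  return Array.from(matches, (m) => m[1]);
}

// Resolve the candidate sitemap URLs for a documentation site.
function candidateSitemapUrls(baseUrl: string): string[] {
  return SITEMAP_LOCATIONS.map((p) => new URL(p, baseUrl).toString());
}
```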

Rate Limiting

Built-in rate limiting ensures respectful crawling:

  • 1 second delay between requests (default)
  • Prevents server overload
  • Configurable per crawler instance
  • Applies to both web and sitemap-based crawling
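A fixed-delay limiter like the one described can be sketched in a few lines. This version computes how long each request must wait so that consecutive requests are at least `intervalMs` apart; it is a sketch of the general technique, not Contexto's actual implementation:

```typescript
// Simple fixed-delay rate limiter sketch (default: 1 request per second).
class RateLimiter {
  private nextFreeAt = 0;

  constructor(private intervalMs: number = 1000) {}

  // Returns the delay (ms) the caller should sleep before sending,
  // given the current time in ms. Updates the schedule as a side effect.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextFreeAt);
    this.nextFreeAt = startAt + this.intervalMs;
    return startAt - nowMs;
  }
}
```

In a crawler this would wrap each request, e.g. sleeping for `limiter.reserve(Date.now())` milliseconds before every fetch.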

Analytics System

Comprehensive analytics tracking:

  • Search query tracking
  • Library statistics
  • Crawl job monitoring
  • Activity feed
  • Real-time dashboard updates

Project Structure

```
contexto/
├── app/                      # Next.js app directory
│   ├── api/                  # API routes
│   │   ├── analytics/        # Analytics endpoint
│   │   ├── jobs/             # Job status endpoints
│   │   ├── libraries/        # Library CRUD operations
│   │   ├── search/           # Search endpoint
│   │   └── llmstxt/          # llms.txt generation
│   ├── dashboard/            # Analytics dashboard
│   ├── libraries/            # Library pages
│   ├── search/               # Search page
│   ├── add/                  # Add library page
│   ├── layout.tsx            # Root layout
│   ├── page.tsx              # Homepage
│   └── globals.css           # Global styles
├── components/               # React components
│   ├── Header.tsx
│   ├── CrawlerConfig.tsx     # Crawler configuration UI
│   ├── JobProgress.tsx       # Job progress tracking
│   ├── LibraryCard.tsx
│   ├── SearchBar.tsx
│   └── ...
├── lib/                      # Utility libraries
│   ├── analytics.ts          # Analytics service
│   ├── crawler.ts            # Web crawler
│   ├── github-crawler.ts     # GitHub API crawler
│   ├── job-queue.ts          # Job queue management
│   ├── sitemap.ts            # Sitemap parser
│   ├── redis.ts              # Redis client
│   ├── vector.ts             # Vector store client
│   ├── openai.ts             # OpenAI client
│   └── parser.ts             # Documentation parser
├── types/                    # TypeScript types
│   ├── index.ts
│   ├── crawler.ts
│   └── analytics.ts
├── examples/                 # Example data
│   └── example-libraries.json
├── mcp-server.ts             # MCP server
├── Dockerfile                # Docker configuration
├── docker-compose.yml        # Docker Compose setup
├── .dockerignore
├── DEPLOYMENT.md             # Deployment guide
├── package.json
├── tsconfig.json
├── tailwind.config.ts
└── README.md
```

Available Scripts

  • `npm run dev`: Start development server
  • `npm run build`: Build for production
  • `npm run start`: Start production server
  • `npm run lint`: Run ESLint

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - feel free to use this project for personal or commercial purposes.

🙏 Acknowledgments

Inspired by Context7 from Upstash.

📞 Support

For questions or issues, please open an issue on GitHub.


Built with ❤️ for the AI development community
