Up-to-date documentation for LLMs and AI code editors
Contexto is a documentation aggregation platform inspired by Context7 from Upstash. It helps developers provide fresh, accurate documentation to AI assistants like Claude and GPT, and to AI code editors like Cursor and Windsurf.
- 🕷️ Intelligent Crawler: Automatically crawl entire documentation sites with GitHub integration
- 📚 Documentation Aggregation: Index documentation from any library or framework
- 🔍 Semantic Search: Vector-based search using OpenAI embeddings
- 🤖 LLM Enrichment: Automatically enhance documentation with explanations
- ⚡ Lightning Fast: Redis caching for optimal performance
- 📄 llms.txt Generation: Export documentation in LLM-friendly format
- 🗺️ Sitemap Support: Automatic sitemap.xml detection and parsing for efficient crawling
- ⏱️ Rate Limiting: Built-in rate limiting to respect server resources (1 req/sec)
- 📊 Analytics Dashboard: Real-time statistics and activity tracking
- 🔌 MCP Server: Model Context Protocol for AI editor integration
- 📋 Job Queue: Real-time progress tracking for crawl operations
- 🎨 Modern UI: Clean, responsive interface built with Next.js 15 and Tailwind CSS
- 🐳 Docker Ready: Complete Docker setup for easy deployment
Contexto uses a 5-stage processing pipeline:
- Parse: Extract code snippets and examples from documentation
- Enrich: Add short explanations and metadata using LLMs
- Vectorize: Embed content for semantic search
- Rerank: Score results for relevance using a custom algorithm
- Cache: Serve repeated requests from Redis for best performance
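The five stages above can be sketched as a composed pipeline. This is a minimal illustration only: the function names and the `Chunk` shape are assumptions, not Contexto's actual internals, and the cache stage is omitted since it lives outside the per-document flow.

```typescript
// Illustrative 5-stage pipeline sketch; names and shapes are assumptions.
interface Chunk {
  text: string;
  code: string[];        // extracted code snippets (parse stage)
  explanation?: string;  // added by the enrich stage (LLM in the real system)
  embedding?: number[];  // added by the vectorize stage (OpenAI in the real system)
  score?: number;        // added by the rerank stage
}

// 1. Parse: pull fenced code blocks out of raw markdown.
function parse(markdown: string): Chunk {
  const code = [...markdown.matchAll(/```[\s\S]*?```/g)].map((m) => m[0]);
  return { text: markdown.replace(/```[\s\S]*?```/g, "").trim(), code };
}

// 2. Enrich: stand-in for an LLM-written explanation.
function enrich(chunk: Chunk): Chunk {
  return { ...chunk, explanation: `Contains ${chunk.code.length} code snippet(s).` };
}

// 3. Vectorize: stand-in for an OpenAI embedding call.
function vectorize(chunk: Chunk): Chunk {
  return { ...chunk, embedding: [chunk.text.length, chunk.code.length] };
}

// 4. Rerank: toy relevance score (the real system uses a custom algorithm).
function rerank(chunk: Chunk, query: string): Chunk {
  const hits = query.split(/\s+/).filter((w) => chunk.text.includes(w)).length;
  return { ...chunk, score: hits };
}

const doc =
  "useState returns a stateful value.\n```js\nconst [n, setN] = useState(0);\n```";
const result = rerank(vectorize(enrich(parse(doc))), "useState value");
console.log(result.code.length, result.score); // 1 2
```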
- Frontend: Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS
- Backend: Next.js API Routes
- Database: Upstash Redis (caching), Upstash Vector (embeddings)
- AI: OpenAI API (embeddings + enrichment)
- Parsing: Cheerio, Markdown-it
- Icons: Lucide React
- Node.js 18+ and npm
- Upstash Redis database
- Upstash Vector database
- OpenAI API key
- Clone the repository:
```bash
git clone <repository-url>
cd contexto
```
- Install dependencies:
```bash
npm install
```
- Set up environment variables:
```bash
cp .env.example .env
```
Edit `.env` and add your credentials:
```env
UPSTASH_REDIS_REST_URL=your_redis_url
UPSTASH_REDIS_REST_TOKEN=your_redis_token
UPSTASH_VECTOR_REST_URL=your_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_vector_token
OPENAI_API_KEY=your_openai_api_key
```
- Run the development server:
```bash
npm run dev
```
- Open http://localhost:3000 in your browser
- Clone the repository and navigate to the directory
- Create a `.env` file with your credentials:

```bash
cp .env.example .env
# Edit .env with your credentials
```

- Build and run with Docker Compose:

```bash
docker-compose up -d
```

- Access the application at http://localhost:3000
See DEPLOYMENT.md for detailed deployment options.
- Click "Add Library" in the navigation
- Fill in the library details:
- Name (e.g., "React")
- Description
- Version
- Documentation URL
- Category
- Click "Add Library" to create the library
After adding a library, you have two indexing options:
The crawler automatically discovers and indexes all documentation pages:
- Navigate to your library's detail page
- Click "Crawl Site"
- Configure crawler options:
  - Max Pages: Maximum number of pages to crawl (default: 100)
  - Max Depth: How deep to follow links (default: 5)
  - Require Code: Only index pages with code snippets
  - Follow External Links: Whether to crawl external domains
  - Include/Exclude Patterns: Filter URLs (e.g., `/docs/`, `/api/`)
- Click "Start Crawling" and monitor the progress
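The options above amount to a small configuration object. A hypothetical sketch (field names are assumptions, not Contexto's actual API) showing how include/exclude patterns might gate which URLs get crawled:

```typescript
// Hypothetical crawler options mirroring the UI fields above.
// Field names are illustrative assumptions, not Contexto's actual API.
interface CrawlerOptions {
  maxPages: number;
  maxDepth: number;
  requireCode: boolean;
  followExternalLinks: boolean;
  includePatterns: string[]; // a URL must contain one of these (when any are set)
  excludePatterns: string[]; // a URL containing any of these is skipped
}

const defaults: CrawlerOptions = {
  maxPages: 100,
  maxDepth: 5,
  requireCode: false,
  followExternalLinks: false,
  includePatterns: ["/docs/"],
  excludePatterns: ["/api/legacy/"],
};

function shouldCrawl(url: string, opts: CrawlerOptions): boolean {
  if (opts.excludePatterns.some((p) => url.includes(p))) return false;
  if (opts.includePatterns.length === 0) return true;
  return opts.includePatterns.some((p) => url.includes(p));
}

console.log(shouldCrawl("https://nextjs.org/docs/routing", defaults)); // true
console.log(shouldCrawl("https://nextjs.org/blog/launch", defaults));  // false
```

Exclude patterns win over include patterns here, which is the conservative choice when the two overlap.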
Supported File Types:
- `.md`, `.mdx`: Markdown files
- `.html`, `.htm`: HTML documentation
- `.rst`: reStructuredText
- `.ipynb`: Jupyter notebooks
GitHub Integration:
- Automatically detects GitHub repository URLs
- Uses GitHub API for faster and more reliable crawling
- Supports branch and path selection
- Ideal for open-source project documentation
Example URLs:
- `https://nextjs.org/docs`: Regular site crawl
- `https://github.com/facebook/react/tree/main/docs`: GitHub repo crawl
- `https://docs.python.org/3/`: Python documentation
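GitHub detection boils down to parsing the repository URL into owner, repo, branch, and path. A minimal sketch of that parsing (not Contexto's actual implementation):

```typescript
// Minimal GitHub tree-URL parser; illustrative only.
interface GitHubTarget {
  owner: string;
  repo: string;
  branch: string; // assume "main" when the URL has no /tree/<branch> segment
  path: string;
}

function parseGitHubUrl(url: string): GitHubTarget | null {
  const u = new URL(url);
  if (u.hostname !== "github.com") return null; // not a GitHub URL
  const parts = u.pathname.split("/").filter(Boolean);
  if (parts.length < 2) return null; // need at least owner/repo
  const [owner, repo, tree, branch, ...rest] = parts;
  if (tree === "tree" && branch) {
    return { owner, repo, branch, path: rest.join("/") };
  }
  return { owner, repo, branch: "main", path: "" };
}

const t = parseGitHubUrl("https://github.com/facebook/react/tree/main/docs");
console.log(t); // owner "facebook", repo "react", branch "main", path "docs"
```

A non-GitHub URL like `https://nextjs.org/docs` returns `null`, which is where a crawler would fall back to regular site crawling.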
For single-page documentation or quick updates:
- Click "Quick Reindex" on the library detail page
- The system will re-process only the main documentation URL
Monitor your platform's performance:
- Navigate to the Dashboard page
- View statistics:
- Total libraries and documentation chunks
- Search analytics
- Crawl job success/failure rates
- Library-specific statistics
- Recent activity feed
- Use the search bar on the homepage or navigate to the Search page
- Enter your query (e.g., "useState hook", "routing in Next.js")
- View results with enriched explanations and code examples
- Copy code snippets directly to your editor
Each library can be exported as an `llms.txt` file:
- Visit a library detail page
- Click "Download llms.txt"
- Paste the content into your AI editor's context
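The llms.txt convention is a plain-text index an LLM can ingest: an H1 title, a blockquote summary, and sections of annotated links. A sketch of what the export might assemble (the exact layout Contexto emits is an assumption):

```typescript
// Illustrative llms.txt assembly following the llmstxt.org convention;
// the exact format Contexto emits may differ.
interface DocEntry {
  title: string;
  url: string;
  summary: string;
}

function buildLlmsTxt(library: string, description: string, docs: DocEntry[]): string {
  const lines = [
    `# ${library}`,
    ``,
    `> ${description}`,
    ``,
    `## Docs`,
    ...docs.map((d) => `- [${d.title}](${d.url}): ${d.summary}`),
  ];
  return lines.join("\n");
}

const txt = buildLlmsTxt("React", "A library for web and native user interfaces", [
  { title: "useState", url: "https://react.dev/reference/react/useState", summary: "State hook" },
]);
console.log(txt.split("\n")[0]); // "# React"
```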
Contexto includes an MCP (Model Context Protocol) server for integration with AI editors like Cursor.
```bash
npx tsx mcp-server.ts
```
- `search_documentation`: Search across all libraries
- `get_library`: Get details about a specific library
- `list_libraries`: List all available libraries
Add to your Cursor configuration:
```json
{
  "mcpServers": {
    "contexto": {
      "command": "npx",
      "args": ["tsx", "/path/to/contexto/mcp-server.ts"]
    }
  }
}
```
The crawler automatically detects and parses sitemap.xml files for efficient URL discovery:
- Checks multiple sitemap locations (`/sitemap.xml`, `/sitemap_index.xml`, etc.)
- Parses sitemap indexes recursively
- Falls back to regular crawling if no sitemap found
- Respects sitemap priorities and update frequencies
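Sitemap discovery typically means probing a few well-known paths and then extracting `<loc>` entries, recursing when the file is a sitemap index. A minimal sketch (the candidate path list and the regex-based parsing are simplifying assumptions):

```typescript
// Illustrative sitemap discovery helpers; not Contexto's actual implementation.
const SITEMAP_CANDIDATES = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap-index.xml"];

// Build the list of well-known sitemap URLs to probe for a given site.
function sitemapCandidates(origin: string): string[] {
  return SITEMAP_CANDIDATES.map((p) => new URL(p, origin).toString());
}

// Extract <loc> entries from sitemap XML. The same helper works for both
// <urlset> files (page URLs) and <sitemapindex> files (child sitemap URLs),
// so recursion is just: if the document is an index, call this again on each child.
function extractLocs(xml: string): string[] {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1].trim());
}

console.log(sitemapCandidates("https://docs.python.org")[0]);
// "https://docs.python.org/sitemap.xml"
console.log(extractLocs("<urlset><url><loc>https://a.dev/docs</loc></url></urlset>"));
// ["https://a.dev/docs"]
```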
Built-in rate limiting ensures respectful crawling:
- 1 second delay between requests (default)
- Prevents server overload
- Configurable per crawler instance
- Applies to both web and sitemap-based crawling
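A fixed delay between requests is the simplest form of rate limiting. A sketch with a configurable per-instance interval (Contexto's internals may differ):

```typescript
// Minimal per-instance rate limiter: enforces a fixed delay between calls.
// Illustrative only; Contexto's crawler may implement this differently.
class RateLimiter {
  private last = 0;
  constructor(private readonly intervalMs: number = 1000) {} // default: 1 req/sec

  // Resolves once at least `intervalMs` has elapsed since the previous call.
  async wait(): Promise<void> {
    const remaining = this.last + this.intervalMs - Date.now();
    if (remaining > 0) await new Promise((r) => setTimeout(r, remaining));
    this.last = Date.now();
  }
}

// Usage: await the limiter before each page fetch.
async function crawl(urls: string[], fetchPage: (u: string) => Promise<string>) {
  const limiter = new RateLimiter(1000);
  for (const url of urls) {
    await limiter.wait();
    await fetchPage(url);
  }
}
```

Because the limiter is constructed per crawler run, two concurrent crawls each get their own 1 req/sec budget rather than sharing one.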
Comprehensive analytics tracking:
- Search query tracking
- Library statistics
- Crawl job monitoring
- Activity feed
- Real-time dashboard updates
```
contexto/
├── app/                     # Next.js app directory
│   ├── api/                 # API routes
│   │   ├── analytics/       # Analytics endpoint
│   │   ├── jobs/            # Job status endpoints
│   │   ├── libraries/       # Library CRUD operations
│   │   ├── search/          # Search endpoint
│   │   └── llmstxt/         # llms.txt generation
│   ├── dashboard/           # Analytics dashboard
│   ├── libraries/           # Library pages
│   ├── search/              # Search page
│   ├── add/                 # Add library page
│   ├── layout.tsx           # Root layout
│   ├── page.tsx             # Homepage
│   └── globals.css          # Global styles
├── components/              # React components
│   ├── Header.tsx
│   ├── CrawlerConfig.tsx    # Crawler configuration UI
│   ├── JobProgress.tsx      # Job progress tracking
│   ├── ...
│   ├── LibraryCard.tsx
│   └── SearchBar.tsx
├── lib/                     # Utility libraries
│   ├── analytics.ts         # Analytics service
│   ├── crawler.ts           # Web crawler
│   ├── github-crawler.ts    # GitHub API crawler
│   ├── job-queue.ts         # Job queue management
│   ├── sitemap.ts           # Sitemap parser
│   ├── redis.ts             # Redis client
│   ├── vector.ts            # Vector store client
│   ├── openai.ts            # OpenAI client
│   └── parser.ts            # Documentation parser
├── types/                   # TypeScript types
│   ├── index.ts
│   ├── crawler.ts
│   └── analytics.ts
├── examples/                # Example data
│   └── example-libraries.json
├── mcp-server.ts            # MCP server
├── Dockerfile               # Docker configuration
├── docker-compose.yml       # Docker Compose setup
├── .dockerignore
├── DEPLOYMENT.md            # Deployment guide
├── package.json
├── tsconfig.json
├── tailwind.config.ts
└── README.md
```
- `npm run dev`: Start development server
- `npm run build`: Build for production
- `npm run start`: Start production server
- `npm run lint`: Run ESLint
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - feel free to use this project for personal or commercial purposes.
For questions or issues, please open an issue on GitHub.
Built with ❤️ for the AI development community