A production-ready RAG (Retrieval-Augmented Generation) chatbot AI Agent for WeChat integration using GPT via API2D/OpenAI, with vector search capabilities powered by Qdrant.
- 🤖 RAG-Powered Chatbot: Retrieval-Augmented Generation with vector similarity search
- 💬 WeChat Integration: Public account integration with duplicate message prevention
- 📝 Conversation History: Redis-backed persistent conversation tracking
- 🔍 Vector Search: Qdrant vector database for semantic similarity search
- 🗄️ Knowledge Base Management: CRUD operations for knowledge documents
- 🎯 Context-Aware Responses: Multi-turn conversations with conversation history
- 🔒 Security: Security headers, CORS, rate limiting, and admin API key protection
- 🛡️ Production-Ready: Error handling, graceful shutdown, health checks
- 🐳 Docker Support: Multi-stage builds with Docker Compose
- 🔧 Code Quality: ESLint and Prettier for consistent code style
- ✅ Environment Validation: Zod-based environment variable validation with type safety
- 🏗️ Clean Architecture: Repository pattern, service layer separation
- 📊 Admin Endpoints: Knowledge base reindexing and management
User Question → Embedding → Vector Search (Qdrant) → Context Retrieval → LLM (GPT) → Response
- Backend: Node.js 18+, TypeScript 5.3+, Express
- AI/LLM: OpenAI/API2D (GPT models)
- Vector Database: Qdrant (for embeddings and similarity search)
- Cache/Storage: Redis (conversation history, message deduplication)
- Database: MySQL (metadata, content, brands)
- Embeddings: OpenAI text-embedding models
- Node.js 18+ (ES modules support)
- TypeScript 5.3+
- MySQL database
- Redis server
- Qdrant vector database
- API2D/OpenAI account and API key
- WeChat public account (for WeChat integration)
- Clone and install dependencies:

      npm install

- Set up environment variables:

      cp .env.example .env

- Configure the .env file (see Configuration section below)
- Build the project:

      npm run build

- Start the server:

      npm start

- Development mode (with auto-reload):

      npm run dev

The server will start on port 3000 (or the port specified in .env).
The project uses Zod for environment variable validation:
- ✅ Early validation - Fails fast at startup if variables are missing/invalid
- ✅ Type safety - Automatic type coercion (strings → numbers/booleans)
- ✅ Clear errors - Shows exactly what's wrong with helpful messages
- ✅ URL validation - Validates URLs and enums
- ✅ Range validation - Validates numeric ranges (e.g., temperature 0-2)
If validation fails, the app will exit with clear error messages showing which variables need to be fixed.
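The fail-fast behaviour described above can be illustrated without any dependencies. The sketch below is not the project's Zod schema: `parseEnv`, `AppEnv`, and the choice of four variables are hypothetical, and the checks are hand-rolled stand-ins for what a Zod schema would express declaratively. It only shows the idea of coercing strings, applying defaults, and reporting every problem at once.

```typescript
// Hypothetical, dependency-free sketch of fail-fast environment validation.
// The project itself uses Zod; none of these names come from its source.
type RawEnv = Record<string, string | undefined>;

interface AppEnv {
  PORT: number;
  GPT_TEMPERATURE: number;
  OPENAI_API_KEY: string;
  QDRANT_URL: string;
}

function parseEnv(raw: RawEnv): AppEnv {
  const errors: string[] = [];

  // Coerce a string to a number, fall back to a default, and check the range.
  const num = (key: string, fallback: number, min: number, max: number): number => {
    const text = raw[key];
    const value = text === undefined || text === "" ? fallback : Number(text);
    if (Number.isNaN(value) || value < min || value > max) {
      errors.push(`${key}: expected a number between ${min} and ${max}`);
    }
    return value;
  };

  // Require a non-empty string.
  const required = (key: string): string => {
    const value = raw[key] ?? "";
    if (value === "") errors.push(`${key}: required`);
    return value;
  };

  // Require a string that looks like an http(s) URL.
  const url = (key: string): string => {
    const value = required(key);
    if (value !== "" && !/^https?:\/\/\S+$/.test(value)) {
      errors.push(`${key}: not a valid URL`);
    }
    return value;
  };

  const env: AppEnv = {
    PORT: num("PORT", 3000, 1, 65535),
    GPT_TEMPERATURE: num("GPT_TEMPERATURE", 0.7, 0, 2),
    OPENAI_API_KEY: required("OPENAI_API_KEY"),
    QDRANT_URL: url("QDRANT_URL"),
  };

  // Report all problems together so one restart is enough to fix them.
  if (errors.length > 0) {
    throw new Error(`Invalid environment:\n${errors.join("\n")}`);
  }
  return env;
}
```

Calling `parseEnv(process.env)` once at startup, before any client is created, would give the exit-early behaviour; collecting errors rather than throwing on the first one is what makes the message list every variable that needs attention.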
| Variable | Description | Default | Required |
|---|---|---|---|
| PORT | Server port | 3000 | No |
| NODE_ENV | Environment (development/production) | development | No |
| OpenAI/API2D | | | |
| OPENAI_API_KEY | Your OpenAI/API2D API key | - | ✅ Yes |
| OPENAI_BASE_URL | API base URL | https://openai.api2d.net | No |
| GPT_MODEL | GPT model to use | gpt-3.5-turbo | No |
| GPT_MAX_TOKENS | Maximum tokens in response | 1000 | No |
| GPT_TEMPERATURE | Response creativity (0-2) | 0.7 | No |
| Embeddings | | | |
| EMBEDDING_MODEL | Embedding model name | text-embedding-ada-002 | ✅ Yes |
| EMBEDDING_DIMENSIONS | Embedding dimensions | 1536 | ✅ Yes |
| EMBEDDING_BATCH_SIZE | Batch size for embeddings | 100 | No |
| Vector Database (Qdrant) | | | |
| QDRANT_URL | Qdrant server URL | - | ✅ Yes |
| QDRANT_API_KEY | Qdrant API key (if required) | - | No |
| VECTOR_CHUNK_SIZE | Text chunk size for vectorization | 500 | No |
| VECTOR_TOP_K | Number of top results to retrieve | 3 | No |
| VECTOR_MIN_SCORE | Minimum similarity score | 0.75 | No |
| Database (MySQL) | | | |
| MYSQL_HOST | MySQL host | - | ✅ Yes |
| MYSQL_USER | MySQL username | - | ✅ Yes |
| MYSQL_PASSWORD | MySQL password | - | ✅ Yes |
| MYSQL_DATABASE | MySQL database name | - | ✅ Yes |
| Redis | | | |
| REDIS_HOST | Redis host | localhost | ✅ Yes |
| REDIS_PORT | Redis port | 6379 | ✅ Yes |
| REDIS_PASSWORD | Redis password | - | No |
| CONVERSATION_TTL_SECONDS | Conversation history TTL | 604800 (7 days) | No |
| WeChat | | | |
| WECHAT_TOKEN | WeChat verification token | - | ✅ Yes (for WeChat) |
| WECHAT_APPID | WeChat App ID | - | No |
| WECHAT_APPSECRET | WeChat App Secret | - | No |
| WECHAT_ENCODING_AES_KEY | WeChat encoding AES key | - | No |
| Admin | | | |
| ADMIN_API_KEY | Admin API key for protected endpoints | - | No |
| Features | | | |
| RAG_BOOTSTRAP | Auto-build knowledge base on startup | false | No |
| ALLOWED_ORIGINS | CORS allowed origins (comma-separated) | * | No |
Base URL: http://localhost:3000
API Prefix: /api/chat (unique namespace to avoid conflicts with yoda-app's /api/v1/)
- GET /api/chat/health - Production-ready health check with detailed status
- GET /health - Simple health check (legacy compatibility)
- GET / - Service information and available endpoints
- POST /api/chat/chatbot/ask - Send a question and get an AI-powered answer

      { "question": "什么是褪黑素?", "userId": "user123" }

- GET /api/chat/chatbot/history/:userId - Get conversation history for a user
- POST /api/chat/chatbot/knowledge/:id - Add a knowledge document
- POST /api/chat/chatbot/knowledge/bulk - Bulk import knowledge documents
- GET /api/chat/chatbot/knowledge/search?q=query - Search knowledge base
- GET /api/chat/chatbot/knowledge/:id - Get a knowledge document
- PUT /api/chat/chatbot/knowledge/:id - Update a knowledge document
- DELETE /api/chat/chatbot/knowledge/:id - Delete a knowledge document
- GET /api/chat/wx - WeChat server verification
  - Query params: signature, timestamp, nonce, echostr
- POST /api/chat/wx - Receive WeChat messages (XML format)
  - Automatically handles duplicate messages (prevents retry processing)
  - Supports text messages and events (subscribe, etc.)
WeChat Configuration:
- Set the server URL in your WeChat public account settings: https://your-domain.com/api/chat/wx
- Use the token from your .env file (WECHAT_TOKEN)
- WeChat will verify the server automatically
WeChat Duplicate Prevention:
- Messages are cached by MsgId in Redis (1 hour TTL)
- Retry requests return cached response immediately
- Prevents duplicate processing when WeChat retries (>5 second timeout)
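The retry handling above amounts to a keyed cache with an expiry. The following is a minimal sketch under stated assumptions: an in-memory `Map` stands in for Redis, and `ReplyCache`, `handleMessage`, and the synchronous `answer` callback are hypothetical names chosen for illustration, not the project's API.

```typescript
// In-memory stand-in for the Redis cache (assumption: the real service stores
// the reply under a key derived from MsgId with a 1-hour expiry).
const DEDUPE_TTL_MS = 60 * 60 * 1000;

interface CachedReply {
  reply: string;
  expiresAt: number;
}

class ReplyCache {
  private readonly entries = new Map<string, CachedReply>();

  // Return the cached reply, or undefined if it is missing or expired.
  get(msgId: string, now: number): string | undefined {
    const hit = this.entries.get(msgId);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(msgId);
      return undefined;
    }
    return hit.reply;
  }

  set(msgId: string, reply: string, now: number): void {
    this.entries.set(msgId, { reply, expiresAt: now + DEDUPE_TTL_MS });
  }
}

// Hypothetical handler: answer once per MsgId and replay the cached reply on retries.
function handleMessage(
  cache: ReplyCache,
  msgId: string,
  answer: () => string,
  now: number = Date.now(),
): string {
  const cached = cache.get(msgId, now);
  if (cached !== undefined) return cached;
  const reply = answer();
  cache.set(msgId, reply, now);
  return reply;
}
```

This sketch only covers retries that arrive after the first reply has been stored. A retry that arrives while the first request is still being answered would need an extra in-flight marker, which is left out here.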
- POST /api/chat/admin/reindex - Rebuild knowledge base from MySQL
  - Requires x-api-key header or apiKey query parameter
  - Body: { "types": ["content", "brand"] } (optional, defaults to both)
  - Returns 202 Accepted with ingestion status
# Build TypeScript to JavaScript
npm run build
# Start production server
npm start
# Development mode with auto-reload
npm run dev
# Linting
npm run lint # Check for linting errors
npm run lint:fix # Auto-fix linting errors
# Code formatting
npm run format # Format all TypeScript files
npm run format:check # Check if files are formatted
# Type checking
npm run type-check # Check TypeScript types without building

The project uses ESLint and Prettier for code quality:
- ESLint: Catches bugs, enforces best practices, type safety
- Prettier: Automatic code formatting for consistency
See LINTING_SETUP.md for detailed setup and usage.
yoda-chat/
├── src/
│ ├── index.ts # Main server entry point
│ ├── config/ # Configuration modules
│ │ ├── db.ts # MySQL database config
│ │ ├── embed.ts # Embedding client config
│ │ ├── env.ts # Environment variables (Zod validated)
│ │ ├── openai.ts # OpenAI/API2D config
│ │ └── qdrant.ts # Qdrant vector DB config
│ ├── controllers/ # Request handlers (HTTP layer)
│ │ ├── chatbotController.ts # Chatbot API endpoints
│ │ └── wechatController.ts # WeChat message handling
│ ├── services/ # Business logic layer
│ │ ├── cacheService.ts # Redis client and connection
│ │ ├── chatService.ts # Chatbot agent (conversation management)
│ │ ├── chunkingService.ts # Text chunking for RAG
│ │ ├── dbService.ts # Database operations
│ │ ├── embeddingService.ts # Text embedding generation
│ │ ├── llmService.ts # LLM/OpenAI API integration
│ │ ├── vectorService.ts # Qdrant vector operations
│ │ └── wechatService.ts # WeChat business logic
│ ├── repositories/ # Data access layer
│ │ ├── conversationRepository.ts # Conversation database operations
│ │ └── knowledgeRepository.ts # Content/brand database operations
│ ├── routes/ # Express routes
│ │ ├── admin.ts # Admin endpoints
│ │ ├── chatbot.ts # Chatbot routes
│ │ └── wechat.ts # WeChat routes
│ ├── middleware/ # Express middleware
│ │ ├── errorHandler.ts # Centralized error handling
│ │ └── security.ts # Security headers, CORS, rate limiting
│ ├── domain/ # Domain types and models
│ │ └── types/ # TypeScript type definitions
│ │   ├── chatbot.ts
│ │   ├── chatcompletion.ts
│ │   ├── chatConversations.ts
│ │   ├── chunkResult.ts
│ │   ├── dbContent.ts
│ │   ├── knowledge.ts
│ │   ├── qdrant.ts
│ │   └── wechat.ts
│ └── utils/ # Utility functions
│   ├── chunkText.ts # Text chunking utilities
│   ├── extract.ts # HTML/text extraction
│   ├── hash.ts # Hashing utilities
│   ├── logger.ts # Logging utility
│   ├── promise.ts # Promise utilities
│   └── similarity.ts # Similarity calculations
├── dist/ # Compiled JavaScript output
├── .eslintrc.json # ESLint configuration
├── .prettierrc.json # Prettier configuration
├── docker-compose.yml # Docker Compose configuration
├── Dockerfile # Docker multi-stage build
├── tsconfig.json # TypeScript configuration
├── package.json
└── README.md
- User Question → Received via API or WeChat
- Embedding Generation → Convert question to vector using OpenAI embeddings
- Vector Search → Query Qdrant for similar chunks (top-K results)
- Context Retrieval → Filter by similarity score threshold
- LLM Processing → Send question + context + history to GPT
- Response Generation → GPT generates answer based on retrieved knowledge
- History Storage → Save conversation to Redis for context in future messages
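The retrieval steps in this flow (vector search, top-K, score threshold) can be sketched in memory. This is an illustration only: in the project the search is performed by Qdrant, whereas the code below ranks a plain array, and it assumes cosine similarity as the metric. `Chunk`, `ScoredChunk`, and `retrieve` are hypothetical names; the defaults mirror VECTOR_TOP_K and VECTOR_MIN_SCORE from the configuration table.

```typescript
interface Chunk {
  id: string;
  text: string;
  vector: number[];
}

interface ScoredChunk {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity of two vectors; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all chunks against the query, keep the top K, then drop weak matches.
function retrieve(
  query: number[],
  chunks: Chunk[],
  topK: number = 3,
  minScore: number = 0.75,
): ScoredChunk[] {
  const ranked = chunks
    .map((c) => ({ id: c.id, text: c.text, score: cosineSimilarity(query, c.vector) }))
    .sort((left, right) => right.score - left.score);
  return ranked.slice(0, topK).filter((c) => c.score >= minScore);
}
```

When nothing clears the threshold the result is an empty array, which corresponds to the "Vector search returns no results" case in the troubleshooting list; lowering `minScore` is the first thing to try there.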
- Redis Storage: Conversation history stored in Redis lists
- Context Window: Last 10 messages used for LLM context
- TTL: Conversations expire after 7 days (configurable)
- Efficient Operations: Uses Redis LPUSH/LRANGE/LTRIM; appends are O(1), and reads and trims stay cheap because the list is bounded by the context window
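The list operations above can be mimicked in memory to show how a bounded history behaves. This is a sketch, not the project's repository code: `HistoryStore` and `ChatMessage` are hypothetical, a `Map` of arrays stands in for Redis lists, trimming to exactly the context window is an assumption, and the 7-day expiry is left out.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Assumption: the stored list is trimmed to the same size as the LLM context window.
const CONTEXT_WINDOW = 10;

class HistoryStore {
  private readonly lists = new Map<string, string[]>();

  // Mirrors LPUSH key value followed by LTRIM key 0 (limit - 1).
  append(userId: string, message: ChatMessage, limit: number = CONTEXT_WINDOW): void {
    const list = this.lists.get(userId) ?? [];
    list.unshift(JSON.stringify(message));
    list.length = Math.min(list.length, limit);
    this.lists.set(userId, list);
  }

  // Mirrors LRANGE key 0 -1, then reverses into chronological order.
  recent(userId: string): ChatMessage[] {
    const list = this.lists.get(userId) ?? [];
    return list.map((raw) => JSON.parse(raw) as ChatMessage).reverse();
  }
}
```

Because new messages are pushed at the head, the newest entry is always at index 0 and the trim drops the oldest ones; reversing on read gives the oldest-first order that chat completion APIs expect.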
- Source: MySQL database (contents and brands)
- Processing: HTML extraction → Text chunking → Embedding → Vector storage
- Chunking: Optimized for Chinese text (200-500 characters)
- Indexing: Automatic on startup (if RAG_BOOTSTRAP=true) or via admin API
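One way to chunk Chinese text within a character budget is to split on sentence-ending punctuation and pack whole sentences into chunks. The sketch below illustrates that idea only; it is not the project's `chunkText.ts`. `chunkBySentence` is a hypothetical name, the 500-character default mirrors VECTOR_CHUNK_SIZE, and the 200-character lower bound mentioned above is not enforced.

```typescript
// Split text into chunks of at most maxChars, breaking at sentence boundaries
// where possible. Lengths are counted in UTF-16 code units.
function chunkBySentence(text: string, maxChars: number = 500): string[] {
  // Split after Chinese or Latin sentence-ending punctuation, keeping the punctuation.
  const sentences = text.split(/(?<=[。！？!?])/u).filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // A single sentence longer than the limit is hard-split into pieces.
    const pieces: string[] = [];
    for (let i = 0; i < sentence.length; i += maxChars) {
      pieces.push(sentence.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      // Start a new chunk when adding this piece would exceed the limit.
      if (current.length > 0 && current.length + piece.length > maxChars) {
        chunks.push(current);
        current = "";
      }
      current += piece;
    }
  }

  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Keeping sentences intact tends to matter for retrieval quality because each chunk is embedded on its own; a chunk that starts mid-sentence carries less usable meaning.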
- Message Verification: SHA1 signature verification
- XML Parsing: Automatic XML to JSON conversion
- Duplicate Prevention: Redis-cached responses by MsgId
- Event Handling: Subscribe events, text messages, etc.
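The SHA1 verification mentioned above follows WeChat's plaintext-mode scheme: sort the token, timestamp, and nonce as strings, concatenate them, hash with SHA1, and compare the hex digest with the `signature` query parameter. The sketch below shows that check using Node's `crypto` module; `computeSignature` and `verifySignature` are hypothetical names, not the project's functions.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Sort the three values, join them, and return the SHA1 hex digest.
function computeSignature(token: string, timestamp: string, nonce: string): string {
  const joined = [token, timestamp, nonce].sort().join("");
  return createHash("sha1").update(joined).digest("hex");
}

// Compare the received signature with the expected one in constant time.
function verifySignature(
  signature: string,
  token: string,
  timestamp: string,
  nonce: string,
): boolean {
  const expected = Buffer.from(computeSignature(token, timestamp, nonce), "utf8");
  const received = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on unequal lengths, so check the length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

For the GET verification request, a handler would typically respond with the `echostr` value when this check passes and reject the request otherwise.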
# Build and start all services
docker compose up -d --build
# View logs
docker compose logs -f yoda-chat
# Stop services
docker compose down

- yoda-chat: Main application container
- mysql: MySQL database (if not using external)
- redis: Redis cache (if not using external)
- qdrant: Qdrant vector database (if not using external)
# Check container health
docker compose ps
# Test health endpoint
curl http://localhost:3000/api/chat/health

See DOCKER_COMPOSE_TEST.md for detailed testing guide.
- Docker and Docker Compose installed
- Environment variables configured
- External services (MySQL, Redis, Qdrant) accessible
- Domain and SSL certificate configured
- Set up environment variables (see ECS_SETUP.md)
- Build and deploy:

      docker compose up -d --build

- Configure reverse proxy (nginx) to route /api/chat/* to this service
- Set up monitoring and logging
- Configure CI/CD (see .github/workflows/)
See PRODUCTION_READINESS.md and ECS_SETUP.md for detailed guides.
- API_DOCUMENTATION.md - Complete API reference
- PRODUCTION_READINESS.md - Production checklist
- ECS_SETUP.md - ECS deployment guide
- RAG_ROADMAP.md - RAG implementation roadmap
- AGENT_STRATEGY.md - AI Agent strategy and assessment
- CODE_REVIEW.md - Code review and improvements
- LINTING_SETUP.md - ESLint and Prettier setup
- TEST_WECHAT_API.md - WeChat API testing guide
- WeChat verification fails
  - Check that WECHAT_TOKEN matches your WeChat account settings
  - Verify callback URL: https://your-domain.com/api/chat/wx
  - Check server logs for signature verification errors
- OpenAI/API2D errors
  - Verify OPENAI_API_KEY is correct and has sufficient credits
  - Check OPENAI_BASE_URL is correctly configured
  - Review API logs for detailed error messages
- Vector search returns no results
  - Ensure knowledge base is indexed (check Qdrant collection)
  - Verify VECTOR_MIN_SCORE threshold is not too high
  - Check embedding model matches between indexing and search
  - Run admin reindex: POST /api/chat/admin/reindex
- Redis connection errors
  - Verify REDIS_HOST and REDIS_PORT are correct
  - Check Redis server is running and accessible
  - Verify REDIS_PASSWORD if authentication is enabled
- Qdrant connection errors
  - Verify QDRANT_URL is correct and accessible
  - Check QDRANT_API_KEY if authentication is required
  - Ensure Qdrant collection exists and is initialized
- Port already in use
  - Change PORT in .env to a different port
  - Or stop the process using port 3000: lsof -ti:3000 | xargs kill
- Docker issues
  - Ensure taklip-shared-network exists: docker network create taklip-shared-network
  - Check .env file is properly configured
  - Review container logs: docker compose logs yoda-chat
- test-wechat-api.sh - Test WeChat API endpoints
- test-wechat-api-verbose.sh - Verbose WeChat testing
- test-wechat-curl.sh - cURL-based WeChat tests
- test-wechat.py - Python-based WeChat tests
- test-docker-compose.sh - Docker Compose testing
- test-full-deploy.sh - Full deployment testing
See TEST_WECHAT_API.md for detailed testing instructions.
- ✅ Security Headers: X-Frame-Options, X-Content-Type-Options, etc.
- ✅ CORS Configuration: Configurable allowed origins
- ✅ Rate Limiting: 100 requests/minute per IP
- ✅ Admin API Protection: API key authentication
- ✅ Input Validation: Request validation and sanitization (XSS pattern detection)
- ✅ Error Handling: Centralized error handling without information leakage
- ✅ WeChat Signature Verification: SHA1 signature validation
- ✅ Environment Validation: Zod-based validation ensures all required secrets are set
- ✅ Parameterized Queries: SQL injection prevention
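A limit such as 100 requests per minute per IP can be enforced with a fixed-window counter. The README does not say which algorithm or library the project's middleware uses, so the following is a sketch under that assumption; `FixedWindowLimiter` and its `allow` method are hypothetical.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

// Fixed-window limiter: each IP gets `limit` requests per `windowMs` window.
class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 100, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed and counts it against the window.
  allow(ip: string, now: number = Date.now()): boolean {
    const state = this.windows.get(ip);
    // Open a fresh window for a new IP or once the previous window has elapsed.
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      this.windows.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}
```

In an Express middleware, a `false` result would normally translate into a 429 response. Behind a reverse proxy such as nginx, the client IP has to come from a trusted forwarded header rather than the socket address, otherwise every request appears to share one IP.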
- Vector Search: Optimized Qdrant queries with similarity thresholds
- Caching: Redis caching for conversation history and duplicate prevention
- Efficient Chunking: Chinese-optimized text chunking (200-500 chars)
- Batch Processing: Configurable embedding batch sizes
- Connection Pooling: Database connection management
- Repository Pattern: Clean data access layer for better maintainability
- Service Layer: Separated business logic for better testability
- Follow the code style (ESLint + Prettier)
- Run npm run lint and npm run format before committing
- Follow the architecture patterns:
  - Repositories for database operations
  - Services for business logic
  - Controllers for HTTP handling
- Add tests for new features
- Update documentation as needed
- Database operations are in the repositories/ directory
  - conversationRepository.ts - Conversation database operations
  - knowledgeRepository.ts - Content/brand database operations
- Business logic is in the services/ directory
- Services handle orchestration and business rules
- Controllers delegate to services
- Type definitions are in the domain/types/ directory
- Clear separation of domain models from infrastructure
See CODE_REVIEW.md for detailed architecture review and improvement recommendations.
ISC
Built with ❤️ for intelligent conversations