Model Context Protocol (MCP) server for comprehensive AI testing, evaluation, and quality assurance.
AI Testing MCP provides standardized testing methodologies, evaluation metrics, and automated testing workflows for AI/ML systems. It implements the Model Context Protocol for seamless integration with AI development tools.
- Unit Tests: Component-level testing
- Integration Tests: End-to-end workflow testing
- Performance Tests: Latency, throughput, resource usage
- Security Tests: Adversarial attacks, prompt injection
- Quality Tests: Output validation, consistency checks
- Accuracy: Precision, recall, F1 score
- Quality: Coherence, relevance, fluency
- Safety: Toxicity, bias detection
- Performance: Response time, token usage
- Cost: API costs, compute resources
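As a simplified illustration of how the accuracy metrics above are defined, precision, recall, and F1 can be computed over binary pass/fail outcomes. This is a minimal sketch, not the server's implementation:

```python
# Minimal sketch of precision/recall/F1 over binary labels (1 = positive).
# Illustrative only; the server's metric implementations may differ.

def precision_recall_f1(predicted, expected):
    """Compute precision, recall, and F1 for paired binary label lists."""
    tp = sum(1 for p, e in zip(predicted, expected) if p == 1 and e == 1)
    fp = sum(1 for p, e in zip(predicted, expected) if p == 1 and e == 0)
    fn = sum(1 for p, e in zip(predicted, expected) if p == 0 and e == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1([1, 1, 0, 1], [1, 0, 0, 1])
```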
- Standard Protocol: Implements MCP specification
- Tool Definitions: Testing tools for AI assistants
- Resource Management: Test data and configuration
- Prompt Templates: Reusable test prompts
# Clone the repository
git clone https://github.com/groovy-web/ai-testing-mcp.git
cd ai-testing-mcp
# Install dependencies
npm install
# Configure environment
cp .env.example .env
# Edit .env with your API keys
# Start the MCP server
npm start
# Use with MCP client
# Add the server configuration to your MCP client
{
  "mcpServers": {
    "ai-testing": {
      "command": "node",
      "args": ["/path/to/ai-testing-mcp/dist/index.js"],
      "env": {
        "OPENAI_API_KEY": "your-key",
        "ANTHROPIC_API_KEY": "your-key"
      }
    }
  }
}

Execute a comprehensive test suite for an AI model.
{
  "name": "run_test_suite",
  "description": "Run tests on an AI model",
  "inputSchema": {
    "type": "object",
    "properties": {
      "model": { "type": "string" },
      "testCategory": {
        "type": "string",
        "enum": ["accuracy", "performance", "security", "quality"]
      },
      "testCases": { "type": "array" }
    }
  }
}

Evaluate AI model outputs against metrics.
{
  "name": "evaluate_output",
  "description": "Evaluate model output quality",
  "inputSchema": {
    "type": "object",
    "properties": {
      "output": { "type": "string" },
      "expected": { "type": "string" },
      "metrics": { "type": "array" }
    }
  }
}

Generate test cases for specific scenarios.
{
  "name": "generate_test_cases",
  "description": "Generate test cases",
  "inputSchema": {
    "type": "object",
    "properties": {
      "scenario": { "type": "string" },
      "count": { "type": "number" }
    }
  }
}

ai-testing-mcp/
├── docs/ # Documentation
│ ├── mcp-protocol.md
│ ├── testing-guide.md
│ └── metrics.md
├── examples/ # Usage examples
│ ├── basic-testing/
│ ├── custom-metrics/
│ └── integration-examples/
├── src/ # Source code
│ ├── server/ # MCP server
│ ├── tools/ # Tool implementations
│ ├── metrics/ # Evaluation metrics
│ └── tests/ # Test definitions
└── schemas/ # JSON schemas
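Before dispatching a call, a client can sanity-check its payload against the `run_test_suite` schema shown above. The sketch below is a hypothetical stdlib-only helper mirroring that schema; it is illustrative and not part of the server's actual validation logic:

```python
# Hypothetical client-side check mirroring the run_test_suite inputSchema.
# Illustrative only; the server's own validation may differ.

VALID_CATEGORIES = {"accuracy", "performance", "security", "quality"}

def validate_run_test_suite(args: dict) -> list:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors = []
    if not isinstance(args.get("model"), str):
        errors.append("model must be a string")
    if args.get("testCategory") not in VALID_CATEGORIES:
        errors.append("testCategory must be one of: " + ", ".join(sorted(VALID_CATEGORIES)))
    if not isinstance(args.get("testCases"), list):
        errors.append("testCases must be an array")
    return errors
```

Running this check before `call_tool` turns schema mismatches into readable errors instead of a failed round trip to the server.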
from mcp_client import MCPClient

client = MCPClient("ai-testing")

# Run accuracy tests
result = client.call_tool("run_test_suite", {
    "model": "gpt-4",
    "testCategory": "accuracy",
    "testCases": [
        {
            "input": "What is 2+2?",
            "expected": "4"
        }
    ]
})
print(f"Accuracy: {result.metrics.accuracy}")

# Evaluate with custom metrics
evaluation = client.call_tool("evaluate_output", {
    "output": model_response,
    "expected": expected_response,
    "metrics": [
        "exact_match",
        "semantic_similarity",
        "coherence",
        "relevance"
    ]
})

- MCP Protocol - Protocol specification
- Testing Guide - How to write tests
- Metrics Reference - Available evaluation metrics
- API Documentation - Complete API reference
- Exact match
- Semantic similarity
- Factual correctness
- Mathematical accuracy
- Response time
- Throughput
- Token efficiency
- Resource usage
- Prompt injection
- Jailbreak attempts
- Toxic content
- Bias detection
- Coherence
- Fluency
- Relevance
- Completeness
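To make the metrics above concrete, exact match and a crude stand-in for semantic similarity (token-overlap Jaccard) can be sketched in a few lines. Real semantic similarity would typically use embedding models; this is a simplified illustration only:

```python
# Simplified metric sketches: exact match and token-overlap (Jaccard) similarity.
# Illustrative stand-ins; not the library's actual metric implementations.

def exact_match(output: str, expected: str) -> bool:
    """True when the trimmed, lowercased strings are identical."""
    return output.strip().lower() == expected.strip().lower()

def token_overlap(output: str, expected: str) -> float:
    """Jaccard similarity over whitespace tokens, in [0, 1]."""
    a, b = set(output.lower().split()), set(expected.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

score = token_overlap("the capital of France is Paris", "Paris is the capital of France")
```

Note how token overlap credits a correct answer with reordered wording, where exact match would score it as a failure.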
module.exports = {
  // Models to test
  models: [
    {
      name: "gpt-4",
      provider: "openai",
      apikey: process.env.OPENAI_API_KEY
    },
    {
      name: "claude-3-opus",
      provider: "anthropic",
      apikey: process.env.ANTHROPIC_API_KEY
    }
  ],

  // Test configurations
  tests: {
    accuracy: {
      enabled: true,
      threshold: 0.95
    },
    performance: {
      maxLatency: 2000,
      maxTokens: 1000
    },
    security: {
      enabled: true,
      strict: true
    }
  },

  // Output format
  reports: {
    format: "json",
    destination: "./test-results"
  }
};

# Run all tests
npm test
# Run specific category
npm test -- --category accuracy
# Run with specific model
npm test -- --model gpt-4
# Generate report
npm run report

We welcome contributions! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see LICENSE for details.
Please read CODE_OF_CONDUCT.md to understand our community standards.
- GitHub Issues: Bug reports and feature requests
- Discussions: Community questions
- MCP Documentation: https://modelcontextprotocol.io
Explore more open-source tools from Groovy Web:
- langchain-multi-agent-example -- Multi-agent systems tutorial with LangChain
- rag-system-pgvector -- Production RAG with PostgreSQL + pgvector
- rag-systems-production -- Enterprise-grade RAG systems
- ai-testing-mcp -- AI testing via Model Context Protocol
- edge-computing-starter -- Cloudflare Workers + Hono template
- claude-code-workflows -- Workflows for Claude Code
- groovy-web-ai-agents -- Production AI agent configs
- groovy-web-examples -- Groovy/Grails examples