The Mandatory Model Code Verification System ensures that coding models can actually see and process code with tooling support before being marked as usable. This system sends "Do you see my code?" test requests to each coding model and analyzes their responses to confirm code visibility.
- Automated Code Visibility Testing: Sends test code samples to models and verifies they can see the code
- Multi-Language Support: Tests code visibility across Python, JavaScript, Go, Java, and C#
- Response Analysis: Analyzes model responses for affirmative confirmation of code visibility
- Verification Status Tracking: Maintains verification status for each model in the database
- Integration with Model Discovery: Seamlessly integrates with the existing model provider service
- Comprehensive Reporting: Generates detailed reports in JSON, CSV, and Markdown formats
- CodeVerificationService (`llm-verifier/verification/code_verification.go`): Handles the core verification logic
  - Sends test requests to models
  - Analyzes responses for code visibility confirmation
  - Calculates confidence scores
- CodeVerificationIntegration (`llm-verifier/verification/code_verification_integration.go`): Integrates verification with the model discovery process
  - Manages verification status in the database
  - Provides APIs for querying verification results
- CLI Tool (`llm-verifier/cmd/code-verification/main.go`): Command-line interface for running verifications
  - Supports filtering by providers and models
  - Generates comprehensive reports
1. Model Selection: Identifies models that support code generation or have coding-related features
2. Test Request: Sends the "Do you see my code?" prompt with sample code in multiple languages
3. Response Analysis: Analyzes responses for affirmative confirmation and code understanding
4. Status Update: Updates model metadata with verification status and scores
5. Report Generation: Creates detailed verification reports
- Go 1.21 or higher
- SQLite database
- API keys for model providers (configured in the `.env` file)
```bash
cd llm-verifier/cmd/code-verification
go build -o code-verification main.go
```

Create a configuration file `code_verification_config.json`:
```json
{
  "provider_filter": [],
  "model_filter": [],
  "max_concurrency": 5,
  "timeout_seconds": 60,
  "output_format": "json"
}
```

```bash
# Verify all models from all providers
./code-verification

# Verify specific providers
./code-verification -providers openai,anthropic

# Verify specific models
./code-verification -models gpt-4,claude-3.5-sonnet

# Custom output format and directory
./code-verification -output ./results -format markdown
```

```
-config string       Path to configuration file
-output string       Output directory for results (default "verification_results")
-providers string    Comma-separated list of providers to verify
-models string       Comma-separated list of models to verify
-concurrency int     Maximum number of concurrent verifications (default 5)
-timeout int         Timeout in seconds for each verification (default 60)
-format string       Output format: json, csv, markdown (default "json")
-db string           Database path (default "../llm-verifier.db")
-help                Show help information
```
The system uses representative code samples in multiple languages:
- Python: Fibonacci function with recursion
- JavaScript: QuickSort implementation
- Go: Basic package and main function
- Java: Calculator class with static methods
- C#: Program class with string interpolation
The system analyzes model responses for:
- Affirmative Keywords: "yes", "i can see", "i see", "visible", "can see"
- Negative Keywords: "no", "cannot see", "can't see", "not visible", "do not see"
- Code References: Mentions of functions, classes, variables, etc.
- Language Detection: Recognition of the programming language
Models receive verification scores based on:
- Affirmative response confirmation (50% weight)
- Language understanding level (30% weight)
- Absence of negative responses (20% weight)
- Code reference detection (10% weight)
Models are marked with one of these statuses:
- verified: Successfully confirmed code visibility
- failed: Could not confirm code visibility
- error: Technical error during verification
- not_verified: Has not been verified yet
The system stores verification results in the verification_results table with:
- Model identification and metadata
- Verification status and scores
- Code capability flags
- Response analysis data
- Timestamps and error messages
Verified models receive updated metadata:
- `code_visibility_verified`: Boolean indicating successful verification
- `tool_support_verified`: Boolean indicating tooling support
- `verification_score`: Numerical confidence score (0-1)
- `verification_status`: Current verification status
- `last_verified`: Timestamp of last verification
The system integrates with the existing ModelProviderService:
```go
// Create verification service
verificationService := verification.NewCodeVerificationService(httpClient, logger, providerService)

// Create integration
integration := verification.NewCodeVerificationIntegration(verificationService, db, logger, providerService)

// Run verification
results, err := integration.VerifyAllModelsWithCodeSupport(ctx)
```

```go
// Get verification status for a specific model
status, err := integration.GetVerificationStatus(modelID, providerID)

// Get all verified models
verifiedModels, err := integration.GetAllVerifiedModels()
```

Example JSON report:

```json
{
  "timestamp": "2025-01-01T12:00:00Z",
  "total_models": 150,
  "verified_models": 120,
  "failed_models": 25,
  "error_models": 5,
  "average_score": 8.5,
  "results": [...],
  "summary": {...}
}
```

Example CSV report:

```csv
Provider,Model,Status,VerificationScore,CodeVisibility,ToolSupport,VerifiedAt
openai,gpt-4,verified,9.2,true,true,2025-01-01T12:00:00Z
anthropic,claude-3.5-sonnet,verified,8.8,true,true,2025-01-01T12:00:01Z
```

Example Markdown report:

```markdown
# Code Verification Report
**Generated:** 2025-01-01T12:00:00Z
**Total Models:** 150
**Verified Models:** 120
**Average Score:** 8.5

## Summary by Provider
| Provider | Total | Verified | Failed | Average Score |
|----------|-------|----------|--------|---------------|
| openai   | 50    | 45       | 5      | 9.2           |
```

```bash
# Run the test script
./test_code_verification.sh

# Test individual components
go test ./llm-verifier/verification/...
```

The system includes comprehensive tests for:
- Code verification logic
- Response analysis algorithms
- Database integration
- CLI functionality
- Report generation
- Supports configurable concurrent verifications (default: 5)
- Thread-safe database operations
- Efficient HTTP client pooling
- Verification results cached for 24 hours
- Provider model lists cached to reduce API calls
- Database query optimization
- API keys stored encrypted in database
- Environment variable support
- Secure HTTP client configuration
- Configurable request timeouts
- Provider-specific rate limit handling
- Automatic retry with exponential backoff
- Structured logging with JSON output
- Configurable log levels
- Request/response logging for debugging
- Verification success/failure rates
- Average response times
- Provider performance metrics
- API Key Errors: Ensure API keys are properly configured in the `.env` file
- Timeout Issues: Increase the timeout value for slow providers
- Database Errors: Check database path and permissions
- Rate Limiting: Reduce concurrency for rate-limited providers
Enable debug logging:
```bash
export LOG_LEVEL=debug
./code-verification -providers openai
```

- Support for additional programming languages
- Advanced code understanding tests
- Visual code verification for multimodal models
- Integration with continuous verification pipelines
- Machine learning-based response analysis
- REST API for verification management
- Webhook notifications for verification completion
- Real-time verification status updates
- Verification scheduling and automation
- Clone the repository
- Install Go 1.21+
- Set up API keys in the `.env` file
- Run tests: `go test ./...`
- Build tools: `go build ./...`
- Follow Go best practices
- Use structured logging
- Include comprehensive tests
- Document public APIs
This project is part of the LLM Verifier system and follows the same licensing terms.
For issues and questions:
- Check the troubleshooting section
- Review logs for error details
- Submit issues to the project repository
- Contact the development team
Note: This verification system is critical for ensuring that models can actually process code before being used in production environments. Always run verification before deploying new models or providers.