LLM Verifier is a comprehensive Go application for verifying and benchmarking Large Language Models across multiple providers. The system provides automated testing, performance scoring, and configuration export capabilities.
llm-verifier/
├── cmd/ # Command-line interface
├── llmverifier/ # Core business logic
│ ├── config_export.go # Configuration export functionality
│ ├── verifier.go # Model verification engine
│ ├── analytics.go # Analytics and monitoring
│ └── migration.go # Configuration migration tools
├── providers/ # Provider-specific implementations
├── database/ # Data persistence layer
├── logging/ # Structured logging
└── tests/ # Comprehensive test suite
- Dependency Injection: Services accept dependencies through interfaces
- Strategy Pattern: Different verification strategies for different providers
- Observer Pattern: Event-driven architecture for monitoring
- Factory Pattern: Provider and service instantiation
- Repository Pattern: Data access abstraction
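As a small sketch of how two of these patterns combine, the code below shows provider-specific verification logic behind a common interface (Strategy) selected by a constructor function (Factory). All names here (VerificationStrategy, NewStrategy, the per-provider structs) are hypothetical illustrations, not actual LLM Verifier APIs:

```go
package main

import "fmt"

// VerificationStrategy is the Strategy interface: each provider family
// implements its own verification behavior behind a common method set.
type VerificationStrategy interface {
	Verify(modelID string) string
}

type openAIStrategy struct{}

func (openAIStrategy) Verify(modelID string) string {
	return "openai-verified:" + modelID
}

type anthropicStrategy struct{}

func (anthropicStrategy) Verify(modelID string) string {
	return "anthropic-verified:" + modelID
}

// NewStrategy is the Factory: it instantiates the right strategy by name,
// so callers never depend on concrete provider types.
func NewStrategy(provider string) (VerificationStrategy, error) {
	switch provider {
	case "openai":
		return openAIStrategy{}, nil
	case "anthropic":
		return anthropicStrategy{}, nil
	default:
		return nil, fmt.Errorf("unknown provider %q", provider)
	}
}

func main() {
	s, err := NewStrategy("openai")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Verify("gpt-4o"))
}
```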
- Go 1.21 or later
- Git
- Make (optional, for build automation)
- Docker (for integration testing)

Clone and setup:
git clone https://github.com/your-org/llm-verifier.git
cd llm-verifier
go mod download
Run tests:
go test ./... -v
Build the application:
go build -o llm-verifier ./cmd/main.go
Run in development mode:
export LLM_VERIFIER_DEBUG=true
./llm-verifier --help
- Create a feature branch:
git checkout -b feature/your-feature
- Make changes with tests
- Run full test suite:
go test ./... -race -cover
- Update documentation if needed
- Submit pull request
The verification process consists of several stages:
- Discovery: Identify available models from providers
- Preparation: Set up test scenarios and prompts
- Execution: Run tests against each model
- Scoring: Calculate performance metrics
- Reporting: Generate comprehensive reports
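As a rough sketch of how these stages compose, the pipeline below chains one function per stage. The helper names and placeholder data are hypothetical, not the project's real implementation:

```go
package main

import "fmt"

// Result is a minimal stand-in for a per-model verification outcome.
type Result struct {
	Model string
	Score float64
}

// discover identifies available models (placeholder list here).
func discover() []string { return []string{"model-a", "model-b"} }

// prepare sets up test scenarios and prompts for each model.
func prepare(models []string) []string { return models }

// execute runs the tests against each model.
func execute(models []string) []Result {
	out := make([]Result, 0, len(models))
	for _, m := range models {
		out = append(out, Result{Model: m})
	}
	return out
}

// score calculates performance metrics (placeholder value here).
func score(results []Result) []Result {
	for i := range results {
		results[i].Score = 0.9
	}
	return results
}

// report generates a summary of the run.
func report(results []Result) string {
	return fmt.Sprintf("verified %d models", len(results))
}

func main() {
	fmt.Println(report(score(execute(prepare(discover())))))
}
```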
Models are scored across multiple dimensions:
type PerformanceScore struct {
OverallScore float64 // Weighted average of all metrics
CodeCapability float64 // Code generation and analysis ability
Responsiveness float64 // API response times
Reliability float64 // Error rates and consistency
FeatureRichness float64 // Advanced feature support
ValueProposition float64 // Cost vs. performance ratio
}
Scoring Weights:
- Code Capability: 25%
- Responsiveness: 20%
- Reliability: 20%
- Feature Richness: 20%
- Value Proposition: 15%
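Given these weights, the overall score is a straightforward weighted average of the five dimensions. The Overall method below is an illustrative sketch of that calculation, not the project's actual code:

```go
package main

import "fmt"

// PerformanceScore mirrors the struct above (minus OverallScore);
// Overall applies the documented weights: 25/20/20/20/15.
type PerformanceScore struct {
	CodeCapability   float64
	Responsiveness   float64
	Reliability      float64
	FeatureRichness  float64
	ValueProposition float64
}

// Overall returns the weighted average of all metrics.
func (s PerformanceScore) Overall() float64 {
	return 0.25*s.CodeCapability +
		0.20*s.Responsiveness +
		0.20*s.Reliability +
		0.20*s.FeatureRichness +
		0.15*s.ValueProposition
}

func main() {
	s := PerformanceScore{0.8, 0.9, 0.7, 0.6, 0.5}
	fmt.Printf("%.3f\n", s.Overall())
}
```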
Each provider implements the Provider interface:
type Provider interface {
SendMessages(ctx context.Context, messages []message.Message, tools []tools.BaseTool) (*ProviderResponse, error)
StreamResponse(ctx context.Context, messages []message.Message, tools []tools.BaseTool) <-chan ProviderEvent
Model() models.Model
}
Supported Providers:
- OpenAI (GPT-3.5, GPT-4, GPT-4o)
- Anthropic (Claude models)
- Google (Gemini)
- Groq (Fast inference)
- Together AI
- Fireworks AI
- And 15+ more providers
Create a new provider file in providers/:
// providers/custom_provider.go
package providers
type CustomProvider struct {
apiKey string
baseURL string
model models.Model
httpClient *http.Client
}
func NewCustomProvider(apiKey, baseURL string, model models.Model) *CustomProvider {
return &CustomProvider{
apiKey: apiKey,
baseURL: baseURL,
model: model,
httpClient: &http.Client{Timeout: 30 * time.Second},
}
}
func (p *CustomProvider) SendMessages(ctx context.Context, messages []message.Message, tools []tools.BaseTool) (*ProviderResponse, error) {
// Convert messages to provider format
requestBody := p.convertMessages(messages)
// Add tools if supported
if len(tools) > 0 {
requestBody.Tools = p.convertTools(tools)
}
// Make API request
resp, err := p.makeRequest(ctx, "POST", "/chat/completions", requestBody)
if err != nil {
return nil, fmt.Errorf("custom provider request failed: %w", err)
}
// Parse response
return p.parseResponse(resp)
}
func (p *CustomProvider) StreamResponse(ctx context.Context, messages []message.Message, tools []tools.BaseTool) <-chan ProviderEvent {
// Implement streaming if supported
ch := make(chan ProviderEvent)
go func() {
defer close(ch)
// Streaming implementation
}()
return ch
}
func (p *CustomProvider) Model() models.Model {
return p.model
}
Update config_export.go to include the new provider:
func NewProvider(providerName models.ModelProvider, opts ...ProviderClientOption) (Provider, error) {
// ... existing cases ...
case models.ProviderCustom:
return &baseProvider[CustomClient]{
options: clientOptions,
client: newCustomClient(clientOptions),
}, nil
// ... rest of cases ...
}
Add custom provider detection in extractProvider():
func extractProvider(endpoint string) string {
// ... existing patterns ...
if strings.Contains(endpoint, "custom-api.com") {
return "custom"
}
// ... existing logic ...
}
Create comprehensive tests:
// providers/custom_provider_test.go
func TestCustomProvider_SendMessages(t *testing.T) {
// Test message sending
}
func TestCustomProvider_StreamResponse(t *testing.T) {
// Test streaming functionality
}
func TestCustomProvider_ErrorHandling(t *testing.T) {
// Test error scenarios
}
Add provider documentation to the user manual and API reference.
- Unit Tests: Individual function/component testing
- Integration Tests: Component interaction testing
- End-to-End Tests: Complete workflow testing
- Performance Tests: Load and performance benchmarking
- Security Tests: Vulnerability and sanitization testing
tests/
├── unit/ # Unit tests (90%+ coverage)
├── integration/ # Integration tests
├── e2e/ # End-to-end tests
├── performance/ # Performance benchmarks
├── security/ # Security validation
└── compatibility/ # Cross-platform testing
Full test suite:
go test ./... -v -race -cover
Specific test categories:
# Unit tests only
go test ./llmverifier -v -short
# Integration tests
go test ./tests/integration -v
# Performance benchmarks
go test -bench=. -benchmem ./...
# Coverage report
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out
Basic test structure:
func TestFunctionName(t *testing.T) {
// Arrange
setupTestData()
// Act
result, err := functionUnderTest(input)
// Assert
assert.NoError(t, err)
assert.Equal(t, expectedResult, result)
}
Table-driven tests:
func TestFunctionName(t *testing.T) {
testCases := []struct {
name string
input TestInput
expected TestOutput
}{
{"case1", input1, expected1},
{"case2", input2, expected2},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
result := functionUnderTest(tc.input)
assert.Equal(t, tc.expected, result)
})
}
}
type Provider interface {
SendMessages(ctx context.Context, messages []message.Message, tools []tools.BaseTool) (*ProviderResponse, error)
StreamResponse(ctx context.Context, messages []message.Message, tools []tools.BaseTool) <-chan ProviderEvent
Model() models.Model
}
type ModelVerifier interface {
VerifyModel(ctx context.Context, model models.Model) (*VerificationResult, error)
VerifyMultipleModels(ctx context.Context, models []models.Model) ([]VerificationResult, error)
GetVerificationHistory(modelID string) ([]VerificationResult, error)
}
type ExportOptions struct {
IncludeAPIKey bool // Include API keys in export
MinScore float64 // Minimum score threshold
Providers []string // Specific providers to include
OutputFormat string // Export format
Compression bool // Compress output
}
type VerificationResult struct {
ModelInfo ModelInfo `json:"model_info"`
PerformanceScores PerformanceScore `json:"performance_scores"`
Error string `json:"error,omitempty"`
Timestamp time.Time `json:"timestamp"`
}
LLM Verifier uses structured error handling:
// Custom error types
type VerificationError struct {
ModelID string
Provider string
ErrorType string
Message string
}
func (e *VerificationError) Error() string {
return fmt.Sprintf("[%s] %s: %s", e.Provider, e.ModelID, e.Message)
}
// Error wrapping
if err := verifyModel(model); err != nil {
return fmt.Errorf("model verification failed for %s: %w", model.ID, err)
}
CPU profiling:
go test -cpuprofile=cpu.prof -bench=.
go tool pprof cpu.prof
Memory profiling:
go test -memprofile=mem.prof -bench=.
go tool pprof mem.prof
- Connection Pooling: Reuse HTTP connections
- Request Batching: Group multiple requests
- Caching: Cache verification results
- Parallel Processing: Concurrent verification
- Resource Limits: Control memory and CPU usage
Performance benchmarks:
func BenchmarkVerification(b *testing.B) {
for i := 0; i < b.N; i++ {
verifyModel(testModel)
}
}
Load testing:
func BenchmarkConcurrentVerification(b *testing.B) {
// Test concurrent model verification
sem := make(chan struct{}, 10) // Limit concurrency
// ... benchmark implementation
}
All inputs are validated and sanitized:
func validateInput(input, inputType string) bool {
switch inputType {
case "model_id":
return validateModelID(input)
case "api_key":
return validateAPIKey(input)
case "endpoint":
return validateEndpoint(input)
default:
return false
}
}
API keys and sensitive data are handled securely:
// Mask sensitive data in logs
func maskAPIKey(apiKey string) string {
if len(apiKey) <= 8 {
return "***"
}
return apiKey[:4] + "***" + apiKey[len(apiKey)-4:]
}
// Secure configuration storage
func saveSecureConfig(config map[string]interface{}, filePath string) error {
// Encrypt sensitive fields
encrypted := encryptSensitiveFields(config)
// Save with restrictive permissions
return saveWithPermissions(encrypted, filePath, 0600)
}
Prevent API abuse:
type RateLimiter struct {
mu sync.Mutex
lastSeen map[string]time.Time
limits map[string]time.Duration // minimum interval between requests per provider
}
func (rl *RateLimiter) Allow(provider string) bool {
rl.mu.Lock()
defer rl.mu.Unlock()
if time.Since(rl.lastSeen[provider]) < rl.limits[provider] {
return false
}
rl.lastSeen[provider] = time.Now()
return true
}
Dockerfile:
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -o llm-verifier ./cmd/main.go
FROM alpine:latest
RUN apk --no-cache add ca-certificates
COPY --from=builder /app/llm-verifier /usr/local/bin/
EXPOSE 8080
CMD ["llm-verifier", "serve"]
Docker Compose:
version: '3.8'
services:
llm-verifier:
build: .
ports:
- "8080:8080"
environment:
- LLM_VERIFIER_DATABASE_URL=postgres://...
volumes:
- ./config:/app/config
Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: llm-verifier
spec:
replicas: 3
template:
spec:
containers:
- name: llm-verifier
image: your-org/llm-verifier:latest
ports:
- containerPort: 8080
env:
- name: LLM_VERIFIER_DATABASE_URL
valueFrom:
secretKeyRef:
name: llm-verifier-secrets
key: database-url
Metrics collection:
// Prometheus metrics
var (
verificationDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "llm_verifier_duration_seconds",
Help: "Time taken for model verification",
},
[]string{"provider", "model"},
)
verificationErrors = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "llm_verifier_errors_total",
Help: "Total number of verification errors",
},
[]string{"provider", "error_type"},
)
)
Logging:
// Structured logging
logger := logrus.New()
logger.SetFormatter(&logrus.JSONFormatter{})
logger.WithFields(logrus.Fields{
"provider": providerName,
"model": modelID,
"duration": duration,
}).Info("Model verification completed")
- Go Style: Follow standard Go formatting (gofmt)
- Documentation: Document all public APIs
- Testing: 90%+ test coverage for new code
- Error Handling: Use error wrapping and structured errors
- Logging: Use structured logging with appropriate levels
feat: add support for new AI provider
fix: resolve ProviderInitError in OpenCode configs
docs: update user manual with troubleshooting guide
test: add comprehensive integration tests
refactor: optimize model verification performance
- Branch naming: feature/description or fix/issue-number
- Tests: All tests pass, new tests added
- Documentation: Updated if needed
- Review: At least one maintainer review
- Merge: Squash merge with descriptive commit message
Bug reports should include:
- Go version and OS
- Full error messages and stack traces
- Steps to reproduce
- Expected vs. actual behavior
- Configuration files (with sensitive data removed)
# Clean and rebuild
go clean -cache
go mod tidy
go build ./...
# Run tests with verbose output
go test -v -run TestFailingTest
# Debug with race detector
go test -race -run TestFailingTest
# Update dependencies
go get -u ./...
# Clean module cache
go clean -modcache
# Profile application
go tool pprof http://localhost:8080/debug/pprof/profile
- Q4 2024: Advanced model comparison tools
- Q1 2025: Real-time performance monitoring dashboard
- Q2 2025: Custom verification test frameworks
- Q3 2025: Multi-cloud provider optimization
- Q4 2025: AI-powered test case generation
- Go 1.22+ Migration: Utilize new language features
- Performance Optimizations: Further reduce latency
- Security Enhancements: Advanced threat detection
- Scalability Improvements: Support for 1000+ concurrent verifications
This developer guide provides comprehensive information for contributing to and extending LLM Verifier. For user-facing documentation, see the User Manual.