HTTP service for PDF/UA and PDF/A validation using VeraPDF.
Live Demo: https://pdf-verapdf-service.onrender.com
The German Barrierefreiheitsstärkungsgesetz (BFSG), which came into force on 28 June 2025, implements the European Accessibility Act (EAA) at national level. Together with increasingly strict procurement requirements from libraries and public institutions, this means that accessibility compliance is no longer optional for publishers. In practice, PDF/UA has become the central standard for meeting these obligations.
Enterprise-grade validation tools exist, but not every team or organisation has the budget, infrastructure, or organisational capacity to deploy them at scale. This service explores what is realistically achievable with constrained resources (limited CPU, limited memory, and simple cloud infrastructure) while still addressing real publishing requirements.
At the same time, in many publishing workflows accessibility validation still happens late, manually, and file by file. This makes it hard to scale, difficult to audit, and poorly suited for integration into modern production pipelines. Treating PDF validation as an API rather than a desktop tool enables earlier feedback, reproducible checks, and clearer responsibility boundaries between editorial, production, and technology.
This service wraps the open-source VeraPDF validation engine in a modern, cloud-native API. It enables:
- Accessibility audits at scale — batch validation of entire document repositories
- CI/CD integration — validate PDFs as part of automated publishing workflows
- Real-time feedback — WebSocket-based progress reporting for responsive UIs
- Cloud infrastructure — deploy on Render’s free tier or your own AWS environment
This repository is a deliberately small, opinionated reference implementation of how PDF/UA and PDF/A validation can be exposed as a cloud-native service using open standards.
It is not a finished product, not a hosted offering, and not intended to replace existing enterprise validation tools.
It is meant to spark discussion around:
- automation vs. manual QA in publishing
- accessibility validation as infrastructure, not as a desktop task
- what 'good enough' cloud deployments look like for mid-sized publishers
┌─────────────────────────────────────────────────────────────────┐
│                            Internet                             │
└─────────────────────────────────┬───────────────────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │  Load Balancer   │
                        │  (ALB / Render)  │
                        └─────────┬────────┘
                                  │
                                  ▼
                 ┌────────────────────────────────┐
                 │        VeraPDF Service         │
                 │      (Java 21 / Javalin)       │
                 │                                │
                 │  ┌──────────────────────────┐  │
                 │  │ Validation Engine        │  │
                 │  │ - Queue management       │  │
                 │  │ - Progress tracking      │  │
                 │  │ - Concurrent execution   │  │
                 │  └──────────────────────────┘  │
                 │                                │
                 │  ┌──────────────────────────┐  │
                 │  │ WebSocket Handler        │  │
                 │  │ - Real-time updates      │  │
                 │  │ - Session management     │  │
                 │  └──────────────────────────┘  │
                 └────────────────────────────────┘
| Decision | Rationale |
|---|---|
| Javalin over Spring | Minimal footprint for a focused microservice; faster cold starts on free-tier hosting |
| WebSocket for progress | PDF validation can take 30+ seconds; real-time progress keeps clients from mistaking a long-running job for a timeout |
| Queue with admission control | Graceful degradation under load; capacity signals via Retry-After headers |
| OpenAPI as contract | Contract tests validate responses against the spec; consumers get accurate documentation |
| Environment-based config | 12-factor compliance; same image works from laptop to production |
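The admission-control row implies that clients should back off when the queue is full. The following is a minimal client-side sketch of that idea, not the service's own code. It assumes the service answers with status 429 or 503 plus a standard Retry-After header when it is at capacity (the exact status code is not stated here). The names `retryDelayMs` and `submitWithBackoff` and the 30-second fallback are illustrative.

```javascript
// Hypothetical helper: turn a Retry-After header value into a delay in milliseconds.
// Retry-After can carry either a number of seconds ("30") or an HTTP-date.
function retryDelayMs(retryAfter, nowMs = Date.now(), fallbackMs = 30000) {
  if (retryAfter == null || String(retryAfter).trim() === '') return fallbackMs;
  const value = String(retryAfter).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return fallbackMs; // unparseable value: use the fallback
  return Math.max(0, dateMs - nowMs);          // dates in the past mean "retry now"
}

// Hypothetical wrapper: `send` is any function returning a fetch-style Response.
// Assumption: 429 or 503 means "queue full, try again later".
async function submitWithBackoff(
  send,
  maxAttempts = 3,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await send();
    const busy = res.status === 429 || res.status === 503;
    if (!busy || attempt === maxAttempts) return res;
    await sleep(retryDelayMs(res.headers.get('Retry-After')));
  }
}
```

A caller would pass a function that performs the upload, for example `submitWithBackoff(() => fetch(url, { method: 'POST', body: formData }))`, and inspect the returned response as usual.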
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /status | GET | Server status, queue info, and capacity |
| /config | GET | Current service configuration |
| /profiles | GET | List available validation profiles |
| /validate/async | POST | Validate a PDF with WebSocket progress updates |
| /validate/batch | POST | Validate multiple PDFs synchronously |
Validate multiple PDFs in a single synchronous request:
# Single file
curl -F "[email protected]" -F "profile=ua1" \
https://verapdf-service.onrender.com/validate/batch
# Multiple files
curl -F "[email protected]" -F "[email protected]" -F "profile=ua1" \
https://verapdf-service.onrender.com/validate/batch
Response:
{
"totalFiles": 2,
"compliantCount": 1,
"nonCompliantCount": 1,
"totalDurationSeconds": 28.5,
"results": [
{
"compliant": true,
"profile": "ua1",
"profileName": "PDF/UA-1 (Universal Accessibility)",
"rulesViolated": 0,
"failedChecks": 0,
"passedChecks": 150,
"violations": [],
"validationDurationSeconds": 12.3,
"fileSize": 1024000,
"summary": "Document is compliant with PDF/UA-1",
"filename": "doc1.pdf"
}
]
}
For responsive UIs, use async validation with real-time progress:
curl -F "[email protected]" -F "profile=ua1" \
https://verapdf-service.onrender.com/validate/async
Response (202 Accepted):
{
"validationSessionId": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"queuePosition": 1,
"estimatedWaitSeconds": 30,
"message": "Connect to WebSocket for progress updates."
}
Connect to WebSocket at wss://verapdf-service.onrender.com/ws:
const ws = new WebSocket('wss://verapdf-service.onrender.com/ws');
// Register only once the connection is open; send() throws while the socket is still connecting
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'register',
    validationSessionId: 'your-session-id' // value returned by POST /validate/async
  }));
};
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  // msg.type: 'queued' | 'started' | 'progress' | 'complete' | 'error'
};
Default limits (configurable via environment):
- Maximum 10 files per batch
- Maximum 20MB per file
- Maximum 200MB total request size
Available profiles: ua1, ua2, 1a, 1b, 2a, 2b, 2u, 3a, 3b, 3u, 4, 4e, 4f
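Checking a batch against these limits before uploading avoids sending a request that will be rejected. The sketch below is a hypothetical client-side pre-flight check, not part of the service. It copies the default limits and profile IDs listed above, assumes 1 MB means 1024 × 1024 bytes (the service may count differently), and ignores multipart overhead. `checkBatch` is an illustrative name.

```javascript
// Defaults copied from the limits listed above; a deployment may override them.
// Assumption: 1 MB = 1024 * 1024 bytes.
const LIMITS = {
  maxFiles: 10,
  maxFileBytes: 20 * 1024 * 1024,
  maxTotalBytes: 200 * 1024 * 1024,
};
const PROFILES = ['ua1', 'ua2', '1a', '1b', '2a', '2b', '2u', '3a', '3b', '3u', '4', '4e', '4f'];

// Hypothetical helper: returns a list of problems; an empty list means the batch looks acceptable.
// `files` is an array of { name, size } objects (browser File objects have this shape).
function checkBatch(files, profile, limits = LIMITS) {
  const problems = [];
  if (!PROFILES.includes(profile)) problems.push(`unknown profile: ${profile}`);
  if (files.length === 0) problems.push('no files selected');
  if (files.length > limits.maxFiles) {
    problems.push(`too many files: ${files.length} > ${limits.maxFiles}`);
  }
  for (const f of files) {
    if (f.size > limits.maxFileBytes) problems.push(`file too large: ${f.name}`);
  }
  const total = files.reduce((sum, f) => sum + f.size, 0);
  if (total > limits.maxTotalBytes) problems.push('request too large');
  return problems;
}
```

In a browser, `checkBatch(Array.from(input.files), 'ua1')` could run in the form's submit handler, with the returned messages shown to the user instead of starting the upload.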
cd backend
docker build -t verapdf-service .
docker run -p 8080:8080 verapdf-service
Requires Java 21+.
cd backend
mvn package
java -jar target/verapdf-service-1.0.0.jar
Or for development:
mvn compile exec:java
Service runs at http://localhost:8080.
Copy .env.example to .env and customize:
# Resource limits
VERAPDF_LIMIT_MAX_QUEUE_SIZE=5
VERAPDF_LIMIT_MAX_CONCURRENT=1
VERAPDF_LIMIT_MAX_FILE_SIZE_MB=20
# Validation defaults
VERAPDF_VALIDATION_DEFAULT_PROFILE=ua1
See backend/.env.example for all options including deployment profiles for different resource tiers.
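A typo in a limit value is easy to miss until the service starts. As a rough illustration, a small pre-deploy check could parse the .env text and flag values that are not positive integers. This is a hypothetical sketch, not how the service itself reads its configuration. The variable names come from the example above, the positive-integer rule is an assumption, and the parser handles only plain KEY=VALUE lines (no quoting or `export`).

```javascript
// Hypothetical parser for simple KEY=VALUE .env text.
function parseDotEnv(text) {
  const env = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // ignore lines that are not KEY=VALUE
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

// Assumption: each numeric limit, when set, must be a positive integer.
const NUMERIC_KEYS = [
  'VERAPDF_LIMIT_MAX_QUEUE_SIZE',
  'VERAPDF_LIMIT_MAX_CONCURRENT',
  'VERAPDF_LIMIT_MAX_FILE_SIZE_MB',
];

// Returns a list of problems; an empty list means no issue was detected.
function checkServiceEnv(env) {
  const problems = [];
  for (const key of NUMERIC_KEYS) {
    if (key in env && !/^[1-9]\d*$/.test(env[key])) {
      problems.push(`${key} must be a positive integer`);
    }
  }
  return problems;
}
```

Under Node.js, `checkServiceEnv(parseDotEnv(fs.readFileSync('.env', 'utf8')))` would return the list of detected problems.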
This project uses a deliberately minimal CI pipeline that focuses on build reproducibility and API stability rather than exhaustive quality gates. The goal is fast feedback and production confidence, not pipeline complexity.
- Push this repo to GitHub
- Go to render.com → New + → Web Service
- Connect your repo
- Render auto-detects the Dockerfile
- Select Free plan, confirm port 8080
- Click Deploy
The terraform/ directory contains a complete ECS Fargate deployment:
cd terraform
# Configure
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your container image URI
# Deploy
terraform init
terraform plan
terraform apply
See terraform/README.md for detailed configuration options.
When deploying to production, review these settings:
| Setting | Default | Production Recommendation |
|---|---|---|
| VERAPDF_SERVER_CORS_ALLOW_ALL | true | Set to false and specify allowed origins |
| VERAPDF_LIMIT_MAX_CONCURRENT | 1 | Increase based on available CPU |
| Log retention | 7 days | Adjust in Terraform for compliance needs |
| HTTPS | Not included | Add ACM certificate and HTTPS listener |
pdf-verapdf-service/
├── openapi.yaml # API specification (contract source of truth)
├── backend/
│ ├── Dockerfile
│ ├── pom.xml
│ └── src/
│ ├── main/java/com/pdfvalidator/
│ │ ├── Application.java # Entry point, route definitions
│ │ ├── Config.java # Environment-based configuration
│ │ ├── ProgressAwarePdfValidator.java
│ │ ├── ValidationWebSocket.java
│ │ └── ...
│ └── test/java/com/pdfvalidator/
│ └── ContractTest.java # OpenAPI contract validation
├── frontend/
│ └── src/
│ ├── App.jsx
│ ├── useValidationWebSocket.jsx # WebSocket hook
│ └── ProgressIndicator.jsx
└── terraform/
├── main.tf
├── ecs.tf
├── alb.tf
└── ...
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Run tests (cd backend && mvn test)
- Submit a pull request
For major changes, please open an issue first to discuss the approach.
This project is licensed under the MIT License.
This software uses VeraPDF (MPL-2.0 / GPL-3.0), Javalin (Apache 2.0), and other open-source libraries.