AI-powered schema validation for scientific mission datasets (NASA PDS4, XML, JSON, CSV)
AI Ops Data Validator is a Python tool that validates scientific datasets against official schemas (NASA PDS4, JSON Schema, CSV metadata).
It goes beyond schema checks: the built-in AI reasoning layer explains validation errors in plain English and suggests concrete fixes, making it easier for researchers to debug and correct data.
- ✅ XML Validation — supports XSD and Schematron (PDS4-compliant).
- ✅ JSON Validation — fully compliant with JSON Schema Draft 2020-12.
- ✅ Human-readable reports — outputs Markdown/HTML summaries for researchers.
- ✅ AI Reasoning Layer — groups issues, explains them in plain English, and suggests fixes.
- ✅ Extensible — designed for anomaly detection and automated fix suggestions.
- ✅ CLI Tool — run validations directly from the terminal.
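To make the reasoning-layer idea concrete, here is a minimal, rule-based sketch of grouping raw findings and attaching a plain-English explanation and a suggested fix. Everything in it (`Finding`, `RULES`, `classify`, `explain`) is hypothetical and keyword-driven. It stands in for the project's AI layer, whose actual implementation is not shown in this README, and the sample findings are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One raw validator message (hypothetical shape, not the project's model)."""
    path: str
    message: str


# Hypothetical table: category -> (plain-English explanation, suggested fix).
RULES = {
    "missing_element": (
        "The schema requires an element that the file does not contain.",
        "Add the missing element at the reported path.",
    ),
    "invalid_value": (
        "A value does not match the type or pattern the schema allows.",
        "Correct the value so it matches the schema's type or pattern.",
    ),
    "other": (
        "The validator reported a problem this sketch does not recognise.",
        "Inspect the raw message and the schema definition for this path.",
    ),
}


def classify(finding: Finding) -> str:
    """Assign a category from keywords in the raw message (assumed heuristic)."""
    text = finding.message.lower()
    if "missing" in text:
        return "missing_element"
    if "invalid" in text or "not valid" in text:
        return "invalid_value"
    return "other"


def explain(findings: list[Finding]) -> list[dict]:
    """Group findings by category and attach an explanation and a fix."""
    groups: dict[str, list[Finding]] = defaultdict(list)
    for finding in findings:
        groups[classify(finding)].append(finding)
    return [
        {
            "category": category,
            "count": len(members),
            "paths": [member.path for member in members],
            "explanation": RULES[category][0],
            "suggested_fix": RULES[category][1],
        }
        for category, members in groups.items()
    ]


if __name__ == "__main__":
    # Made-up findings, loosely modelled on a PDS4 label.
    sample = [
        Finding("/Product_Observational/Observation_Area",
                "Missing child element required by XSD."),
        Finding("/Product_Observational/Identification_Area/version_id",
                "Value '1' is not valid for the declared pattern."),
    ]
    for group in explain(sample):
        print(group["category"], group["count"], group["suggested_fix"])
```

Grouping before explaining keeps a report short when one root cause produces many near-identical messages.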
$ aiops-validate examples/bad_label.xml --kind xml --xsd schemas/pds4.xsd --schematron schemas/pds4.sch
Wrote report.md

# Validation Report — bad_label.xml
**Summary:** FAIL
Errors: 3 | Warnings: 1
1. ERROR — schema
- Path: `/Product_Observational/Observation_Area`
- Message: Missing child element required by XSD.
- Suggested fix: Add `<Observation_Area>` element according to schema.

git clone https://github.com/YugynDprodigy10/aiops-data-validator.git
cd aiops-data-validator
pip install -r requirements.txt

python -m aiops_validator.cli validate examples/bad_label.xml --kind xml --xsd schemas/pds4.xsd --schematron schemas/pds4.sch

python -m aiops_validator.cli validate examples/sample.json --kind json --json-schema schemas/schema.json

aiops_validator/
├── core/ # Models, reasoner, reporting
├── validators/ # XML, JSON, CSV validators
├── fixes/ # Suggested fix generation
├── templates/ # Report templates
└── cli.py # Command-line entrypoint
- Validators: Run schema-level checks (XSD, Schematron, JSON Schema).
- Reasoner: Interprets raw validator output and produces plain-English explanations.
- Reporter: Outputs Markdown/HTML/JSON reports.
- Fixes: Suggests patches or snippets to resolve issues.
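The hand-off between these components can be sketched as follows. This is an illustration only: the `Issue` model, `validate_required_children`, and `render_markdown` are hypothetical names, not the project's API. The check uses the standard library's `xml.etree.ElementTree` to test for required child elements as a stand-in for real XSD validation, which ElementTree does not perform, and the toy label omits the XML namespaces a real PDS4 label carries.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Issue:
    """Record passed from validator to reporter (hypothetical model)."""
    severity: str       # "ERROR" or "WARNING"
    kind: str           # which check produced the issue, e.g. "schema"
    path: str           # location of the problem in the document
    message: str        # what went wrong
    suggested_fix: str  # how to resolve it


def validate_required_children(xml_text: str, required: list[str]) -> list[Issue]:
    """Validator stand-in: report required direct children missing from the root."""
    root = ET.fromstring(xml_text)
    issues = []
    for name in required:
        if root.find(name) is None:  # find() with a bare tag searches direct children
            issues.append(Issue(
                severity="ERROR",
                kind="schema",
                path=f"/{root.tag}/{name}",
                message=f"Missing required child element <{name}>.",
                suggested_fix=f"Add a <{name}> element under <{root.tag}>.",
            ))
    return issues


def render_markdown(filename: str, issues: list[Issue]) -> str:
    """Reporter stand-in: turn issues into a short Markdown summary."""
    errors = sum(1 for issue in issues if issue.severity == "ERROR")
    warnings = sum(1 for issue in issues if issue.severity == "WARNING")
    lines = [
        f"# Validation Report: {filename}",
        f"**Summary:** {'FAIL' if errors else 'PASS'}",
        f"Errors: {errors} | Warnings: {warnings}",
    ]
    for number, issue in enumerate(issues, start=1):
        lines.append(f"{number}. {issue.severity} ({issue.kind})")
        lines.append(f"   - Path: `{issue.path}`")
        lines.append(f"   - Message: {issue.message}")
        lines.append(f"   - Suggested fix: {issue.suggested_fix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy label without namespaces; Observation_Area is deliberately absent.
    label = "<Product_Observational><Identification_Area/></Product_Observational>"
    found = validate_required_children(
        label, ["Identification_Area", "Observation_Area"]
    )
    print(render_markdown("bad_label.xml", found))
```

Keeping the validator's output as plain records means the same issues could feed a Markdown, HTML, or JSON reporter without the validators knowing about output formats.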
- Add CSV validation (via `frictionless` or `pandera`).
- Implement anomaly detection (range checks, statistical outliers).
- Enable automated fixes with JSON Patch / XML transformations.
- Dockerize for deployment in research pipelines.
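As a rough idea of what the planned anomaly detection might involve, the sketch below shows a range check and a z-score outlier test using only the standard library. The function names, the threshold, and the temperature readings are all assumptions made for illustration; nothing here reflects an existing implementation in the project.

```python
from statistics import mean, stdev


def range_violations(values: list[float], low: float, high: float) -> list[int]:
    """Return the indices of values outside the inclusive range [low, high]."""
    return [i for i, value in enumerate(values) if not (low <= value <= high)]


def zscore_outliers(values: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices whose distance from the mean exceeds threshold * stdev.

    Uses the sample standard deviation. With very small samples the largest
    possible z-score is limited, so a high threshold may never be exceeded.
    """
    if len(values) < 2:
        return []  # stdev needs at least two data points
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # all values identical, nothing can be an outlier
    return [
        i for i, value in enumerate(values)
        if abs(value - centre) / spread > threshold
    ]


if __name__ == "__main__":
    # Made-up temperature readings in degrees Celsius; the last one is suspect.
    temperatures = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 95.0]
    print("Out of range:", range_violations(temperatures, -50.0, 60.0))
    print("Z-score outliers:", zscore_outliers(temperatures, threshold=2.0))
```

The two checks are complementary: a range check needs domain knowledge about physically plausible values, while the z-score test only needs the data itself.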
Pull requests welcome! See CONTRIBUTING.md.
MIT © 2025 Eugene Taabazuing
Eugene Taabazuing — LinkedIn

