A Python library for parsing, validating, and generating XARF v4 (eXtended Abuse Reporting Format) reports.
- Parse XARF reports from JSON with validation and typed results
- Generate XARF-compliant reports with auto-generated metadata (UUIDs, timestamps)
- Validate reports against the official JSON schemas with detailed errors and warnings
- Full type support with Pydantic v2 discriminated union models for all 7 categories
- v3 backward compatibility with automatic detection and conversion
- Schema-driven — validation rules derived from the official xarf-spec schemas, not hardcoded
pip install xarf

from xarf import parse
# Missing first_seen and source_port produce validation warnings.
result = parse({
"xarf_version": "4.2.0",
"report_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"timestamp": "2024-01-15T10:30:00Z",
# "first_seen": "2024-01-15T10:00:00Z",
"reporter": {
"org": "Security Team",
"contact": "[email protected]",
"domain": "example.com",
},
"sender": {
"org": "Security Team",
"contact": "[email protected]",
"domain": "example.com",
},
"source_identifier": "192.0.2.100",
# "source_port": 1234,
"category": "connection",
"type": "ddos",
"evidence_source": "honeypot",
"destination_ip": "203.0.113.10",
"protocol": "tcp",
})
if not result.errors:
    print(result.report.category)  # 'connection'
else:
    for e in result.errors:
        print(f"{e.field}: {e.message}")

from xarf import create_report, create_evidence
# Returns XARFEvidence with content_type, payload (base64), hash, size, description
evidence = create_evidence(
"message/rfc822",
raw_email_bytes,
description="Original spam email",
hash_algorithm="sha256",
)
# xarf_version, report_id, and timestamp are auto-generated
result = create_report(
category="messaging",
type="spam",
source_identifier="192.0.2.100",
reporter={
"org": "Example Security",
"contact": "[email protected]",
"domain": "example.com",
},
sender={
"org": "Example Security",
"contact": "[email protected]",
"domain": "example.com",
},
evidence_source="spamtrap",
description="Spam email detected from source",
protocol="smtp",
smtp_from="[email protected]",
evidence=[evidence],
)
import json
print(json.dumps(result.report.model_dump(by_alias=True, exclude_none=True), indent=2))

Parse and validate a XARF report from JSON. Both v4 and v3 (legacy) formats are supported; v3 reports are automatically converted to v4 with deprecation warnings.
from xarf import parse
result = parse(json_data, strict=False, show_missing_optional=False)

Parameters:
- json_data (str | dict): JSON string or dict containing a XARF report
- strict (bool): Return report=None on validation failures (default: False)
- show_missing_optional (bool): Populate result.info with missing optional field details (default: False)

Returns ParseResult:
- report (AnyXARFReport | None): The parsed report, typed by category (e.g., DdosReport, SpamReport)
- errors (list[ValidationError]): Structured validation errors (each has .field, .message, .value)
- warnings (list[ValidationWarning]): Structured validation warnings
- info (list[dict[str, str]] | None): Missing optional field info (only when show_missing_optional=True)
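As a rough illustration of consuming the result fields listed above, the helper below turns errors, warnings, and info entries into printable lines. The name summarize_result and the SimpleNamespace stand-in data are hypothetical, and the sketch assumes warnings expose .field and .message the same way errors do; it does not import or call the library.

```python
from types import SimpleNamespace


def summarize_result(result) -> list[str]:
    """Format errors, warnings, and info from a ParseResult-like object."""
    lines = []
    for err in result.errors:
        lines.append(f"ERROR {err.field}: {err.message}")
    # Assumption: warnings carry .field and .message like errors do.
    for warn in result.warnings:
        lines.append(f"WARNING {warn.field}: {warn.message}")
    # info is None unless show_missing_optional=True was requested.
    for item in result.info or []:
        lines.append(f"INFO {item['field']}: {item['message']}")
    return lines


# Stand-in object shaped like the documented ParseResult (hypothetical data).
demo = SimpleNamespace(
    report=None,
    errors=[SimpleNamespace(field="category", message="is required", value=None)],
    warnings=[SimpleNamespace(field="source_port", message="recommended field missing")],
    info=None,
)
summary = summarize_result(demo)
print("\n".join(summary))
```

Because the helper only reads attributes, the same function should work on a CreateReportResult, which documents the same four fields.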
Create a validated XARF report with auto-generated metadata. Automatically fills xarf_version, report_id (UUID v4), and timestamp (ISO 8601 UTC).
from xarf import create_report
result = create_report(
category="messaging",
type="spam",
source_identifier="192.0.2.100",
reporter={"org": "...", "contact": "...", "domain": "..."},
sender={"org": "...", "contact": "...", "domain": "..."},
# category-specific fields as keyword arguments
protocol="smtp",
)

Parameters:
- category (str): One of the 7 XARF categories
- type (str): Report type within the category
- source_identifier (str): IP address or identifier of the abuse source
- reporter (dict | ContactInfo): Reporting organization details
- sender (dict | ContactInfo): Sending organization details
- strict (bool): Return report=None on validation failures (default: False)
- show_missing_optional (bool): Populate result.info with missing optional field details (default: False)
- **kwargs: Category-specific fields (e.g., protocol, destination_ip, smtp_from)

Returns CreateReportResult:
- report (AnyXARFReport | None): The generated report
- errors (list[ValidationError]): Structured validation errors (field, message, value)
- warnings (list[ValidationWarning]): Structured validation warnings
- info (list[dict[str, str]] | None): Missing optional field info (only when show_missing_optional=True)
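To make the auto-generated metadata concrete, here is a minimal standard-library sketch of producing the three fields. The function name generate_metadata is hypothetical, and the second-level timestamp precision with a trailing "Z" is an assumption taken from the example reports, not from the library's code.

```python
import uuid
from datetime import datetime, timezone


def generate_metadata(spec_version: str = "4.2.0") -> dict[str, str]:
    """Build xarf_version, report_id, and timestamp with the stdlib only."""
    now = datetime.now(timezone.utc)
    return {
        "xarf_version": spec_version,
        "report_id": str(uuid.uuid4()),  # random UUID (version 4)
        # ISO 8601 in UTC; the "Z" suffix mirrors the example reports (assumed).
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


metadata = generate_metadata()
print(metadata)
```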
Create an evidence object with automatic base64 encoding, hashing, and size calculation.
from xarf import create_evidence
evidence = create_evidence(
"message/rfc822",
raw_bytes,
description="Original email",
hash_algorithm="sha256",
)

Parameters:
- content_type (str): MIME type of the evidence (e.g., 'message/rfc822')
- payload (bytes | str): The evidence data (strings are UTF-8 encoded)
- description (str | None): Human-readable description
- hash_algorithm (Literal["sha256", "sha512", "sha1", "md5"]): Hash algorithm (default: "sha256")
Returns XARFEvidence with computed hash, size, and base64-encoded payload.
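The encoding, hashing, and size steps can be approximated with the standard library alone. The sketch below is not the library's implementation: build_evidence is a hypothetical name, it returns a plain dict rather than XARFEvidence, and the hash is kept as a bare hex digest because the exact hash string format is not shown here.

```python
import base64
import hashlib


def build_evidence(content_type, payload, description=None, hash_algorithm="sha256"):
    """Return a dict with base64 payload, hex digest, and byte size."""
    # Strings are UTF-8 encoded first, as the parameter description states.
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    return {
        "content_type": content_type,
        "payload": base64.b64encode(data).decode("ascii"),
        # Assumption: plain hex digest; the library may format this differently.
        "hash": hashlib.new(hash_algorithm, data).hexdigest(),
        "size": len(data),  # size in bytes of the raw (unencoded) data
        "description": description,
    }


evidence_sketch = build_evidence("text/plain", "hello", description="demo")
print(evidence_sketch["payload"], evidence_sketch["size"])
```

Hashing the raw bytes rather than the base64 text is a design choice in this sketch, so a recipient can verify the digest after decoding the payload.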
Access schema-derived validation rules and metadata programmatically.
from xarf import schema_registry
# Get all valid categories
schema_registry.get_categories()
# {'messaging', 'connection', 'content', 'infrastructure', 'copyright', 'vulnerability', 'reputation'}
# Get valid types for a category
schema_registry.get_types_for_category("connection")
# {'ddos', 'port_scan', 'login_attack', ...}
# Check if a category/type combination is valid
schema_registry.is_valid_type("connection", "ddos") # True
# Get field metadata including descriptions
schema_registry.get_field_metadata("confidence")
# FieldMetadata(description='...', required=False, recommended=True, ...)

Both parse() and create_report() run validation internally. Additional behaviors:
- Unknown fields trigger warnings (or cause report=None in strict mode)
- Missing optional fields can be discovered with show_missing_optional=True:

result = parse(report, show_missing_optional=True)
if result.info:
    for item in result.info:
        print(f"{item['field']}: {item['message']}")
        # e.g., "description: OPTIONAL - Human-readable description of the abuse"
        # e.g., "confidence: RECOMMENDED - Confidence score between 0.0 and 1.0"

For type narrowing after parsing, use isinstance or check .category/.type:
from xarf import parse, DdosReport
result = parse(json_data)
if isinstance(result.report, DdosReport):
    print(result.report.destination_ip)
# or check attributes directly
if result.report and result.report.category == "connection":
    print(result.report.type)

The library automatically detects XARF v3 reports (by the Version field) and converts them to v4 during parsing. Converted reports include legacy_version: '3' and deprecation warnings.
from xarf import parse
result = parse(v3_report)
print(result.report.xarf_version) # '4.2.0'
print(result.report.category) # mapped category (e.g., 'messaging')
print(result.report.legacy_version) # '3'
# result.warnings includes deprecation notice + conversion details

You can also use the low-level utilities directly:
from xarf import is_v3_report, convert_v3_to_v4, get_v3_deprecation_warning
if is_v3_report(json_data):
    v4_data = convert_v3_to_v4(json_data)
    print(get_v3_deprecation_warning())

Unknown v3 report types cause a parse error listing the supported types. See MIGRATION_V3_TO_V4.md for the full type mapping and migration strategies.
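For intuition about version detection, the following sketch shows one way a check based on the Version field could look. It is a guess at the idea, not the logic of is_v3_report: the name looks_like_v3 is hypothetical, and it assumes that "Version" is a top-level key and that v4 reports carry "xarf_version" instead.

```python
import json


def looks_like_v3(report) -> bool:
    """Heuristic v3 check (assumed behavior, not the library's logic)."""
    # Accept either a JSON string or an already-decoded dict.
    data = json.loads(report) if isinstance(report, str) else report
    # Assumption: v3 uses a top-level "Version" key, v4 uses "xarf_version".
    return isinstance(data, dict) and "Version" in data and "xarf_version" not in data


v3_like = {"Version": "3"}  # hypothetical minimal input
v4_like = '{"xarf_version": "4.2.0"}'
print(looks_like_v3(v3_like), looks_like_v3(v4_like))
```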
This library validates against the official xarf-spec JSON schemas. Schemas are bundled with the package and pinned to the spec version configured in pyproject.toml:
[tool.xarf]
spec_version = "v4.2.0"

# Re-fetch schemas (e.g., to pick up a newer spec version)
python -m xarf fetch-schemas
# Check whether a newer spec version is available
python -m xarf check-schema-updates

To update to a newer spec version, change spec_version in pyproject.toml and run python -m xarf fetch-schemas.
pytest # Run tests
pytest --cov=xarf # Run tests with coverage
ruff check xarf/ # Lint
ruff format --check xarf/ # Check formatting
mypy --strict xarf/ # Type-check

See CONTRIBUTING.md for development guidelines.