Releases: VynFi/VynFi-python

v1.4.0 — DataSynth 3.0 + VynFi API Adoption

16 Apr 18:30


Minor release adding support for DataSynth 3.0 features: scenario packs,
fingerprint synthesis, adversarial ONNX probing, AI-assisted config tuning,
and the dashboard co-pilot. All features verified end-to-end against the live API.

New Endpoints

Scenario Packs (client.scenarios.packs())

Eleven built-in counterfactual simulations across four categories:

Category          Packs
Fraud             vendor_collusion_ring, management_override, ghost_employee, procurement_kickback, channel_stuffing
Control failures  sox_material_weakness, it_control_breakdown
Macro             recession_2008_replay, supply_chain_disruption_q3, interest_rate_shock_300bp
Operational       erp_migration_cutover

packs = client.scenarios.packs()
scenario = client.scenarios.create(
    name="Q3 revenue stress",
    generation_config={
        "sector": "retail", "rows": 10000,
        "scenarios": {"enabled": True, "packs": ["channel_stuffing"]},
    },
)
client.scenarios.run(scenario.id)
diff = client.scenarios.diff(scenario.id)

AI Tuning (client.jobs.tune(), Scale+)

suggestion = client.jobs.tune(job_id, target_scores={"overall": 0.95})
print(suggestion.explanation)
# -> {original_config, suggested_config, explanation, quality_summary}

Dashboard Co-pilot (client.ai.chat(), Scale+)

reply = client.ai.chat("Which fraud packs are right for audit training?")

Fingerprint Synthesis (client.fingerprint.synthesize(), Team+)

# Privacy-preserving synthesis from a .dsf fingerprint
submission = client.fingerprint.synthesize(
    "./private_data.dsf",
    rows=10000,
    backend="statistical",  # or "neural"/"hybrid" (Scale+)
)

Adversarial Probing (client.adversarial.probe(), Enterprise)

# Probe an ONNX fraud detector for decision-boundary weaknesses
probe = client.adversarial.probe("./model.onnx", n_probes=10000)
results = client.adversarial.results(probe.id)

Config-side DS 3.0 features (no SDK changes needed)

  • Neural diffusion: diffusion.backend = "neural" | "hybrid" with neural.* subsection (Scale+)
  • Quality gates: qualityGates.profile = "standard" | "strict" | "audit" (Team+)
  • Custom interventions: scenarios.interventions[].target/value/timing (Scale+)
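The bullets above are all expressed through the generation config rather than new SDK methods. A hypothetical config sketch exercising all three features (the key names come from these notes; the exact nesting, especially where the neural.* subsection lives, is an assumption, not a confirmed API shape):

```python
# Illustrative generation_config touching the three config-side DS 3.0
# features. Values are made up for the example.
ds3_config = {
    "sector": "retail",
    "rows": 10000,
    "diffusion": {
        "backend": "hybrid",           # "neural" | "hybrid" (Scale+)
        "neural": {"epochs": 50},      # neural.* subsection; key is illustrative
    },
    "qualityGates": {"profile": "audit"},  # "standard" | "strict" | "audit" (Team+)
    "scenarios": {
        "interventions": [             # custom interventions (Scale+)
            {"target": "revenue", "value": -0.15, "timing": "2024-Q3"},
        ],
    },
}
```

Such a dict would then be passed as the config argument to client.jobs.generate_config(), matching the call shape shown in the v1.3.0 notes below.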

Upstream DataSynth fixes now live

  • OCPM fields populated on JE headers: ocpm_event_ids, ocpm_object_ids, and ocpm_case_id now carry full process mining metadata (they were empty in 2.3.x). Verified on 209/300 entries in a sample retail job.
  • is_fraud on document flow records, display_name on banking customers, numericMode: native, analytics/labels/process_mining output dirs — all confirmed.

Still upstream

  • exportLayout: flat hangs the DataSynth binary — use the default nested layout until upstream fix lands.

Other changes

  • Default client timeout raised from 30s to 60s (the generate_quick server-side limit is 30s, so a 30s client default left no headroom for network latency).
  • Scenarios.create() contract updated to the DS 3.0 shape ({name, generation_config}). Legacy template_id/interventions kwargs still work; they are auto-folded into generation_config.
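One plausible shape for that auto-folding, as a self-contained sketch. The {name, generation_config} envelope comes from these notes; the helper name and the mapping of template_id onto scenarios.packs are assumptions, not the SDK's actual internals:

```python
def fold_legacy_kwargs(name, generation_config=None, *, template_id=None, interventions=None):
    """Illustrative sketch of folding legacy Scenarios.create() kwargs into
    the DS 3.0 {name, generation_config} shape. The exact keys written are
    assumed; only the envelope shape is confirmed by the release notes."""
    config = dict(generation_config or {})
    scenarios = config.setdefault("scenarios", {})
    if template_id is not None:
        # Assumed mapping: a legacy template id becomes a scenario pack entry.
        scenarios.setdefault("packs", []).append(template_id)
    if interventions is not None:
        scenarios["interventions"] = interventions
    return {"name": name, "generation_config": config}
```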

Four new examples

Full Changelog

v1.3.0...v1.4.0

v1.3.0 — DataSynth 2.3 + VynFi API 2.0 Features

13 Apr 07:46


Major release adding support for DataSynth 2.3 + VynFi API 2.0 features.
All features verified end-to-end against the live API.

New Endpoints

# Pre-built statistical analytics for a completed job
a = client.jobs.analytics(job_id)
print(f"Benford MAD: {a.benford_analysis.mad:.4f}")
print(f"AML coverage: {a.banking_evaluation.aml.typology_coverage:.2%}")

# Rate-controlled NDJSON streaming for TB-scale jobs (Scale tier+)
for envelope in client.jobs.stream_ndjson(job_id, rate=500, progress_interval=1000):
    if envelope.get("type") == "_progress":
        print(f"  {envelope['lines_emitted']:,} lines emitted")
    else:
        my_pipeline.send(envelope)

# Storage quota validation for TB-scale jobs
size = client.configs.estimate_size(config=my_config)
print(f"~{size.estimated_files} files, ~{size.estimated_bytes / 1e9:.1f} GB")
print(f"Tier quota: {size.tier_quota_bytes / 1e12:.1f} TB")

# Raw DataSynth YAML config submission (Scale tier+)
result = client.configs.submit_raw(yaml="rows: 1000\nsector: retail")

Transparent Archive Backends

JobArchive now seamlessly handles both legacy zip archives and the new TB-scale managed_blob manifests with presigned URLs:

archive = client.jobs.download_archive(job_id)
print(archive.backend)          # "zip" or "managed_blob"
entries = archive.json("journal_entries.json")  # lazy fetch via presigned URL if blob

DataSynth 2.3 Output Modes

job = client.jobs.generate_config(config={
    "sector": "retail",
    "rows": 1000,
    "output": {
        "exportLayout": "flat",    # one row per line, header merged ✓ verified live
        "numericMode": "native",   # JSON numbers (upstream DataSynth bug pending)
    },
})

New Models

  • Analytics (15 models): JobAnalytics, BenfordAnalysis, AmountDistributionAnalysis, VariantAnalysis, BankingEvaluation, KycCompletenessAnalysis, AmlDetectabilityAnalysis, CrossLayerCoherenceAnalysis, VelocityQualityAnalysis, FalsePositiveAnalysis, TypologyDetection
  • Sizing: EstimateSizeResponse, SizeBucket
  • Raw config: RawConfigResponse

Bug Fixes

  • CamelCase deserialization fixed for JobFileList, JobFile, EstimateSizeResponse (these models were silently returning field defaults even when the API response contained data)
  • Download timeout extended from 30s to 5 min (large archive downloads were timing out)
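The root cause of the first fix is the key-name mismatch: the API sends camelCase keys, but the models expect snake_case fields. A minimal pure-Python sketch of the mapping (the SDK itself presumably handles this via its Pydantic models; the field names in the example are illustrative):

```python
import re

def camel_to_snake(key: str) -> str:
    """Convert a camelCase API key to the snake_case field name."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()

def normalize(payload: dict) -> dict:
    """Recursively rename keys so model fields get populated instead of
    silently falling back to their defaults."""
    return {
        camel_to_snake(k): normalize(v) if isinstance(v, dict) else v
        for k, v in payload.items()
    }

normalize({"totalFiles": 82, "totalSizeBytes": 104857600})
# -> {'total_files': 82, 'total_size_bytes': 104857600}
```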

Process Mining Notebook Enhanced

05_process_mining_ocel.ipynb now covers:

  • All 8 DataSynth processes (O2C, P2P, S2C, H2R, MFG, Banking, Audit, BankRecon)
  • OCEL 2.0 readiness section
  • Cross-process traceability via cross_process_links.json

New Examples

  • analytics_export.py — pre-built analytics workflow
  • ndjson_streaming.py — rate-controlled streaming for TB-scale
  • native_mode.py — DataSynth 2.3 native + flat layout

New Output Categories (DataSynth 2.3)

Category         Description
analytics/       Pre-built statistical evaluations (Benford, distributions, variants, banking)
labels/          Anomaly labels + fraud red flags (CSV/JSON/JSONL formats)
process_mining/  Full OCEL 2.0 event log + objects + relationships (19,974 events + 7,381 objects in a sample retail job)

Verification

10 of 11 server-side fixes verified live. See docs/v1.3.0-verification-report.md for details.

Full Changelog

v1.2.0...v1.3.0

v1.2.0 — File Listing, Output Estimates, Per-File Download

11 Apr 11:23


What's New

Ships support for 3 API features deployed today by the API team.

File listing with schemas

List all files in a completed job's archive without downloading the full zip:

file_list = client.jobs.list_files(job_id)
print(f"{file_list.total_files} files, {file_list.total_size_bytes / 1e6:.0f} MB")

for f in file_list.files:
    cols = ", ".join(s.name for s in f.schema_[:3])
    print(f"  {f.path} ({f.size_bytes:,} bytes) [{cols}, ...]")

Output size estimates

estimate_cost() now returns expected output dimensions before you run a job:

est = client.configs.estimate_cost(config=my_config)
print(f"Credits: {est.total_credits}")
print(f"Output: ~{est.output.estimated_files} files, ~{est.output.estimated_size_bytes / 1e6:.0f} MB")
print(f"Note: {est.output.note}")

Per-file download (now working)

Download individual files from a job without pulling the full archive:

data = client.jobs.download_file(job_id, "journal_entries.json")
# Also supports subdirectory paths:
data = client.jobs.download_file(job_id, "banking/banking_customers.json")

New types

  • JobFileList, JobFile, FileSchema -- file listing response models
  • OutputEstimate -- output size estimate on EstimateCostResponse.output

Full Changelog

v1.1.0...v1.2.0

v1.1.0 — JobArchive, Examples Suite, Endpoint Fixes

11 Apr 11:01


What's New

JobArchive — ergonomic archive access

Downloaded job archives are now wrapped in a JobArchive class for easy file access:

archive = client.jobs.download_archive(job_id)
archive.files()            # list all 80+ files
archive.categories()       # ['banking', 'document_flows', 'esg', ...]
archive.json("journal_entries.json")  # parse JSON directly
archive.find("esg/*")     # glob-style search
archive.summary()          # file counts and sizes by category
archive.extract_to("./output")  # extract to disk

pandas: archive_to_dataframes()

Convert all JSON files in an archive to DataFrames in one call, with automatic header/lines flattening for journal entries:

from vynfi.integrations.pandas import archive_to_dataframes

frames = archive_to_dataframes(archive)
# {'journal_entries.json': DataFrame(95881 rows), 'banking/banking_customers.json': DataFrame(620 rows), ...}

14 examples — notebooks + scripts

Notebook                         Use Case
01_quickstart                    5-minute getting started
02_audit_data_deep_dive          Benford's law, debit/credit validation, SOX controls
03_fraud_detection_lab           Labeled fraud data, RF classifier (98.3% accuracy)
04_document_flow_audit_trail     P2P/O2C chains, three-way matching, gap analysis
05_process_mining_ocel           Event log reconstruction, variant analysis
06_esg_sustainability_reporting  Emissions, energy, diversity, materiality matrix
07_aml_compliance_testing        KYC, transaction monitoring, risk scoring, SAR

Plus 7 standalone scripts: quickstart, streaming, pandas workflow, config management, multi-period sessions, what-if scenarios, quality monitoring.

Bug fixes

  • Config endpoints used the wrong URL path (/v1/configs/... → /v1/config/... for validate, estimate-cost, compose)
  • Sessions generate_next() used the wrong URL path (/generate-next → /generate)

All fixes confirmed against the Rust SDK reference and live API.

Full Changelog

v1.0.0...v1.1.0

v1.0.0

10 Apr 13:28


VynFi Python SDK v1.0.0

First stable release. Full API parity with the VynFi Rust SDK reference implementation.

New Resources

  • Configs — save, validate, estimate cost, and compose generation configs
  • Credits — purchase prepaid packs, check balance, view history
  • Sessions — multi-period generation sessions
  • Scenarios — what-if scenarios with causal graph templates
  • Notifications — list and mark-read

New Methods on Existing Resources

  • jobs.generate_config(), jobs.download_file(), jobs.wait()
  • catalog.list_templates()
  • billing.checkout(), billing.portal()

Ecosystem Integrations

  • pip install vynfi[pandas] — download job output as DataFrames
  • pip install vynfi[polars] — Polars DataFrame support

Other

  • 12 resources, 52 tests, all verified against the live API
  • ForbiddenError (403), dedicated QuickJobResponse / CancelJobResponse types
  • All Pydantic models aligned with actual API response shapes

See CHANGELOG.md for full details.

v0.1.0

05 Mar 13:24


Initial release

First public release of the VynFi Python SDK.

Features

  • Full coverage of VynFi API resources: Jobs, Catalog, Usage, API Keys, Quality, Webhooks, Billing
  • Automatic retry on 429/5xx with exponential backoff
  • SSE streaming for job progress
  • Typed responses via Pydantic v2
  • Python 3.9–3.13 support
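The retry behaviour listed above can be pictured with a small self-contained sketch (pure Python, not the SDK's actual implementation; the function and parameter names are hypothetical):

```python
import random
import time

RETRYABLE = {429, 500, 502, 503, 504}

def request_with_retry(send, max_retries=3, base_delay=0.5):
    """Retry a request on 429/5xx with exponential backoff plus jitter.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute (a stand-in for the real HTTP call).
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code not in RETRYABLE or attempt == max_retries:
            return response
        # Delays grow 0.5s, 1s, 2s, ...; jitter spreads out retries
        # so many clients don't hammer the API in lockstep.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```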

Installation

pip install vynfi

Quick start

from vynfi import VynFi

client = VynFi(api_key="vf_live_...")
job = client.jobs.generate(sector="banking", tables={"transactions": 1000})