Hands-on examples demonstrating synthetic financial data generation with the VynFi Python SDK. Each example is tested against the live API and runs end-to-end.

    pip install vynfi[pandas]
    export VYNFI_API_KEY="vf_live_..."

| Script | Description |
|---|---|
| quickstart.py | Generate your first dataset in 10 lines |
| streaming_progress.py | Real-time SSE progress monitoring for async jobs |
| pandas_workflow.py | DataFrame-native analysis with archive_to_dataframes |
| config_cost_estimation.py | Validate configs, estimate costs, save/delete configs |
| multi_period_sessions.py | Coherent multi-period data with the Sessions API |
| what_if_scenarios.py | Causal counterfactual analysis with the Scenarios API |
| quality_monitoring.py | Track data quality scores (Benford, correlation, distribution) |
| analytics_export.py | Pre-built analytics from DataSynth 2.3 (Benford, distributions, variants, banking) |
| ndjson_streaming.py | Rate-controlled NDJSON streaming for TB-scale jobs (Scale tier+) |
| streaming_anomaly_detection.py | Real-time anomaly alerting from anomaly_labels.jsonl |
| streaming_aggregator.py | Running aggregates (totals, top accounts) with O(1) memory |
| streaming_etl.py | Chunk-batched DataFrame → Parquet/CSV writes for TB-scale jobs |
| streaming_fraud_monitor.py | Back-pressured fraud alerting (token-bucket rate limit) for Slack/PagerDuty/SIEM |
| scenario_packs.py | DS 3.0: List + run the 11 built-in scenario packs (fraud, control failure, macro, operational) |
| ai_tune.py | DS 3.0: LLM-powered config suggestion based on job quality scores (Scale+) |
| fingerprint_synthesis.py | DS 3.0: Privacy-preserving synthesis from .dsf fingerprint files (Team+) |
| neural_diffusion.py | DS 3.0: Neural / hybrid diffusion backends for higher-fidelity synth (Scale+) |
| pm4py_integration.py | Load VynFi OCEL events into pm4py → Petri net discovery + conformance checking |
| celonis_integration.py | Export VynFi events as Celonis IBC-compatible event/case CSVs |
| sap_test_data.py | Map journal entries + vendors → SAP BKPF/BSEG/LFA1/LFB1 tables for S/4HANA testing |
| ml_training_pipeline.py | End-to-end fraud detection: generate → flatten → train RandomForest → CV + save model |
| adversarial_fraud_training.py | Train fraud classifier → export ONNX → probe with adversarial samples (Enterprise) |
| document_level_fraud.py | DS 3.1: doc-level fraudRate + propagateToLines + is_fraud_propagated stratification |
| behavioral_fraud_patterns.py | DS 3.1: weekend/round/off-hours/post-close fraud biases; feature importance showcase |
| sector_dag_presets.py | DS 3.1: retail/manufacturing/financial_services causal DAG presets |
| audit_opinions_kam.py | DS 3.1: ISA 700 audit opinions + ISA 701 key audit matters from audit/ |
| native_mode.py | DataSynth 2.3 native numbers + flat layout; drops the conversion boilerplate |
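
The `streaming_fraud_monitor.py` entry mentions back-pressured alerting behind a token-bucket rate limit. The sketch below illustrates that general technique in plain Python. It is not taken from the example script: `TokenBucket`, its parameters, and `forward_alerts` are hypothetical names chosen for illustration, and no VynFi SDK calls are used. The clock and sleep functions are injectable so the limiter can be exercised without real waiting.

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self):
        # Add tokens in proportion to elapsed time, never beyond capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def acquire(self, cost=1.0):
        """Block until `cost` tokens are available, then spend them."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        while True:
            self._refill()
            if self.tokens >= cost:
                self.tokens -= cost
                return
            # Wait just long enough for the missing tokens to accumulate.
            self.sleep((cost - self.tokens) / self.rate)


def forward_alerts(alerts, sink, bucket):
    """Send each alert to `sink`, blocking on the bucket; return the number sent."""
    sent = 0
    for alert in alerts:
        bucket.acquire()  # blocking here is what slows the upstream reader
        sink(alert)
        sent += 1
    return sent
```

With `rate=1` and `capacity=5`, the first five alerts go out immediately and later ones at a pace of at most one per second. Because `acquire` blocks rather than dropping alerts, a slow sink pushes back on whatever iterator feeds `forward_alerts`.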

| Notebook | Description |
|---|---|
| 01_quickstart.ipynb | 5-minute getting started -- catalog, generation, download, pandas |

| Notebook | Use Case | Key Data |
|---|---|---|
| 02_audit_data_deep_dive.ipynb | Journal entry forensics and audit analytics | Benford's law, debit/credit validation, SOX controls, manual entries |
| 03_fraud_detection_lab.ipynb | ML fraud detection with labeled data | Fraud packs, feature engineering, detection models, SOD violations |
| 04_document_flow_audit_trail.ipynb | P2P/O2C traceability and three-way matching | PO, GR, Invoice, Payment chains, gap analysis |
| 05_process_mining_ocel.ipynb | Process mining with OCEL 2.0 event logs | Variant analysis, bottleneck detection, conformance checking |
| 06_esg_sustainability_reporting.ipynb | ESG/sustainability reporting (GRI, TCFD) | Emissions, energy, water, diversity, materiality |
| 07_aml_compliance_testing.ipynb | AML/KYC compliance testing | Transaction monitoring, structuring detection, SAR narratives |
| counterfactual_simulation.ipynb | DS 3.0: Scenario packs, baseline vs counterfactual, recession 2008 replay, diff analysis | |
| sox_compliance_testing.ipynb | DS 3.0: SOX material weakness simulation, SoD violations, PCAOB AS 2201 mapping | |
| gnn_vendor_networks.ipynb | AML networks: customer graph build, Louvain communities, link prediction, optional torch-geometric GCN | |

VynFi-generated journal entries closely follow Benford's Law (MAD = 0.004) across 95,000+ line items, which is consistent with realistic leading-digit distributions.

The chart of accounts contains 430 accounts with full GAAP classification -- assets, liabilities, equity, revenue, and expense accounts.

Transaction volume across the top 15 general ledger accounts shows realistic concentration in AP (2900), AR (2000), and revenue sub-accounts.

ESG data includes Scope 1, 2, and 3 greenhouse gas emission records with realistic proportions.
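
The Benford figure quoted above is a mean absolute deviation (MAD) between observed and expected leading-digit frequencies. As a rough illustration of how such a number can be computed, here is a minimal sketch in plain Python. It assumes the amounts are finite numbers already extracted from the journal entry lines; the function names are illustrative and this is not necessarily how VynFi computes its own quality score.

```python
import math
from collections import Counter

# Benford's expected probability for leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """Return the first significant digit of a finite number, or None for zero."""
    value = abs(amount)
    if value == 0:
        return None
    # Scientific notation places the first significant digit at position 0.
    # 15 decimals avoids rounding values such as 9.9999999 up to 1.0e+01.
    return int(f"{value:.15e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no nonzero amounts to analyse")
    counts = Counter(digits)
    total = len(digits)
    return sum(abs(counts.get(d, 0) / total - BENFORD[d]) for d in range(1, 10)) / 9
```

A commonly cited rule of thumb (Nigrini) treats a first-digit MAD below 0.006 as close conformity, so a value of 0.004 falls in that band. Zero amounts are skipped because they have no leading digit.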
Each generation produces a rich archive with 60-80+ files across categories:

    (root)/
      journal_entries.json     -- Double-entry journal entries (header + lines)
      chart_of_accounts.json   -- 430-account hierarchical CoA
      lineage_graph.json       -- Data provenance graph
      run_manifest.json        -- Generation metadata and config
      data_quality_stats.json  -- Inline quality metrics
      balance_validation.json  -- Debit/credit balance proof
      banking/                 -- 700k+ transactions, customer KYC, AML labels
      document_flows/          -- PO, GR, Invoice, Payment, SO, Delivery chains
      esg/                     -- 14 files: emissions, energy, water, waste, diversity
      internal_controls/       -- SOD violations, COSO mappings, control rules
      master_data/             -- Vendors, customers, employees, materials, assets
      subledger/               -- AP/AR aging, depreciation, inventory, fixed assets
      tax/                     -- Provisions, deferred tax, ETR reconciliation
      treasury/                -- Cash positions, forecasts
      events/                  -- Process evolution, organizational events
      project_accounting/      -- Projects, milestones, earned value metrics
      hr/                      -- Employee change history
      relationships/           -- Cross-process links

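
The archive ships a `balance_validation.json` proof alongside `journal_entries.json`. To show what a debit/credit balance check involves, the sketch below re-derives it from entries shaped as a header plus lines. The field names (`header`, `document_id`, `lines`, `debit_amount`, `credit_amount`) are assumptions made for illustration and may not match the real schema; adjust them after inspecting an actual file.

```python
from decimal import Decimal


def _total(lines, field):
    # Decimal(str(...)) accepts both string and numeric amounts and avoids
    # binary floating-point drift when summing currency values.
    return sum((Decimal(str(line.get(field, 0))) for line in lines), Decimal(0))


def unbalanced_entries(entries, tolerance=Decimal("0")):
    """Return document IDs whose debits and credits differ by more than `tolerance`.

    NOTE: all field names used here are hypothetical, not the confirmed schema.
    """
    unbalanced = []
    for entry in entries:
        debits = _total(entry["lines"], "debit_amount")
        credits = _total(entry["lines"], "credit_amount")
        if abs(debits - credits) > tolerance:
            unbalanced.append(entry["header"]["document_id"])
    return unbalanced
```

For a real archive, the entries would typically be loaded with `json.load` from `journal_entries.json` and passed in as a list. An empty result corresponds to every document balancing.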
| Metric | Value |
|---|---|
| Journal entry line items | 95,881 per generation |
| Debit/credit balance | 10,296/10,296 documents balanced |
| Benford's Law MAD | 0.004 (excellent conformance) |
| Chart of accounts | 430 accounts across 5 types |
| Banking transactions | 697,897 per generation |
| AML suspicious labels | 286 flagged transactions |
| KYC profiles | 620 customers with risk tiers |
| ESG data files | 14 covering emissions, energy, water, waste, diversity |
| SOD violations | 1,026 segregation-of-duty records |
| Fraud detection accuracy | 98.3% (Random Forest on labeled data) |
| Document flow references | 441 cross-document links |

| Sector | Tables | Quality Score | Multiplier |
|---|---|---|---|
| Retail | 4 | 96 | 1.5x |
| Financial Services | 4 | 98 | 1.5x |
| Manufacturing | 5 | 94 | 1.5x |

| Pack | Description |
|---|---|
| revenue_fraud | Fictitious revenue, channel stuffing, cut-off manipulation |
| vendor_kickback | Shell vendors, inflated invoices, split purchases |
| payroll_ghost | Ghost employees, unauthorized salary changes, overtime abuse |
| management_override | Journal entry manipulation, related party transactions |
| comprehensive | All fraud types combined |

16 pre-built templates covering:
- Sector packages: Retail Complete, Manufacturing Full Cost Accounting, Financial Services Treasury
- Fraud scenarios: Revenue Manipulation, Procurement Kickbacks, Ghost Employees, Management Override, Multi-Scheme
- Country variants: Germany (HGB), France (PCG), Japan (IFRS)
- Special purpose: ESG Reporting, Fraud Investigation



