VynFi SDK Examples

Hands-on examples demonstrating synthetic financial data generation with the VynFi Python SDK. Each example is tested against the live API and runs end-to-end.

Setup

pip install vynfi[pandas]
export VYNFI_API_KEY="vf_live_..."
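
Before running any of the scripts, it can help to fail fast when the key is missing. The sketch below is one way to do that with the standard library only; `require_api_key` is a hypothetical helper written for this README, not part of the VynFi SDK.

```python
import os


def require_api_key(env=os.environ):
    """Return the VynFi API key from the environment or raise a clear error.

    `require_api_key` is a hypothetical helper, not an SDK function.
    It only checks that VYNFI_API_KEY is present and non-empty.
    """
    key = env.get("VYNFI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "VYNFI_API_KEY is not set; export it before running the examples"
        )
    return key


# Typical use at the top of a script:
#     api_key = require_api_key()
```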

Quick Start Scripts

| Script | Description |
| --- | --- |
| quickstart.py | Generate your first dataset in 10 lines |
| streaming_progress.py | Real-time SSE progress monitoring for async jobs |
| pandas_workflow.py | DataFrame-native analysis with archive_to_dataframes |
| config_cost_estimation.py | Validate configs, estimate costs, save/delete configs |
| multi_period_sessions.py | Coherent multi-period data with the Sessions API |
| what_if_scenarios.py | Causal counterfactual analysis with the Scenarios API |
| quality_monitoring.py | Track data quality scores (Benford, correlation, distribution) |
| analytics_export.py | Pre-built analytics from DataSynth 2.3 (Benford, distributions, variants, banking) |
| ndjson_streaming.py | Rate-controlled NDJSON streaming for TB-scale jobs (Scale tier+) |
| streaming_anomaly_detection.py | Real-time anomaly alerting from anomaly_labels.jsonl |
| streaming_aggregator.py | Running aggregates (totals, top accounts) with O(1) memory |
| streaming_etl.py | Chunk-batched DataFrame → Parquet/CSV writes for TB-scale jobs |
| streaming_fraud_monitor.py | Back-pressured fraud alerting (token-bucket rate limit) for Slack/PagerDuty/SIEM |
| scenario_packs.py | DS 3.0: List + run the 11 built-in scenario packs (fraud, control failure, macro, operational) |
| ai_tune.py | DS 3.0: LLM-powered config suggestion based on job quality scores (Scale+) |
| fingerprint_synthesis.py | DS 3.0: Privacy-preserving synthesis from .dsf fingerprint files (Team+) |
| neural_diffusion.py | DS 3.0: Neural / hybrid diffusion backends for higher-fidelity synth (Scale+) |
| pm4py_integration.py | Load VynFi OCEL events into pm4py → Petri net discovery + conformance checking |
| celonis_integration.py | Export VynFi events as Celonis IBC-compatible event/case CSVs |
| sap_test_data.py | Map journal entries + vendors → SAP BKPF/BSEG/LFA1/LFB1 tables for S/4HANA testing |
| ml_training_pipeline.py | End-to-end fraud detection: generate → flatten → train RandomForest → CV + save model |
| adversarial_fraud_training.py | Train fraud classifier → export ONNX → probe with adversarial samples (Enterprise) |
| document_level_fraud.py | DS 3.1: doc-level fraudRate + propagateToLines + is_fraud_propagated stratification |
| behavioral_fraud_patterns.py | DS 3.1: weekend/round/off-hours/post-close fraud biases; feature importance showcase |
| sector_dag_presets.py | DS 3.1: retail/manufacturing/financial_services causal DAG presets |
| audit_opinions_kam.py | DS 3.1: ISA 700 audit opinions + ISA 701 key audit matters from audit/ |
| native_mode.py | DataSynth 2.3 native numbers + flat layout: drop the conversion boilerplate |
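
As a rough illustration of the running-aggregate idea behind streaming_aggregator.py, the sketch below folds NDJSON records into totals one line at a time. It is not the script itself and does not call the VynFi SDK. The field names `account` and `amount` and the helper `aggregate_ndjson` are assumptions made for this example.

```python
import io
import json
from collections import Counter


def aggregate_ndjson(lines, top_n=3):
    """Fold an iterable of NDJSON lines into running aggregates.

    Assumed (hypothetical) record shape: {"account": str, "amount": number}.
    Only the running totals are kept, never the records themselves.
    """
    count = 0
    total = 0.0
    per_account = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank keep-alive lines
        record = json.loads(raw)
        amount = float(record["amount"])
        count += 1
        total += amount
        per_account[record["account"]] += amount
    return {
        "count": count,
        "total": total,
        "top_accounts": per_account.most_common(top_n),
    }


# Stand-in for a streamed response body; any iterable of text lines works.
sample_stream = io.StringIO(
    '{"account": "2000", "amount": 100.0}\n'
    '{"account": "4000", "amount": 250.5}\n'
    "\n"
    '{"account": "2000", "amount": 50.0}\n'
)
summary = aggregate_ndjson(sample_stream)
print(summary)
```

In this sketch the per-account counter grows with the number of distinct accounts rather than the number of records, so memory stays bounded by the size of the chart of accounts.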

Jupyter Notebooks

Core Workflow

| Notebook | Description |
| --- | --- |
| 01_quickstart.ipynb | 5-minute getting started: catalog, generation, download, pandas |

Domain Deep Dives

| Notebook | Use Case | Key Data |
| --- | --- | --- |
| 02_audit_data_deep_dive.ipynb | Journal entry forensics and audit analytics | Benford's law, debit/credit validation, SOX controls, manual entries |
| 03_fraud_detection_lab.ipynb | ML fraud detection with labeled data | Fraud packs, feature engineering, detection models, SOD violations |
| 04_document_flow_audit_trail.ipynb | P2P/O2C traceability and three-way matching | PO, GR, Invoice, Payment chains, gap analysis |
| 05_process_mining_ocel.ipynb | Process mining with OCEL 2.0 event logs | Variant analysis, bottleneck detection, conformance checking |
| 06_esg_sustainability_reporting.ipynb | ESG/sustainability reporting (GRI, TCFD) | Emissions, energy, water, diversity, materiality |
| 07_aml_compliance_testing.ipynb | AML/KYC compliance testing | Transaction monitoring, structuring detection, SAR narratives |
| counterfactual_simulation.ipynb | DS 3.0: Scenario packs, baseline vs counterfactual, recession 2008 replay, diff analysis | |
| sox_compliance_testing.ipynb | DS 3.0: SOX material weakness simulation, SoD violations, PCAOB AS 2201 mapping | |
| gnn_vendor_networks.ipynb | AML networks: customer graph build, Louvain communities, link prediction, optional torch-geometric GCN | |
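
The three-way matching mentioned for the document-flow notebook compares each purchase order against its goods receipts and invoice. The following is a minimal sketch of that check on hand-written records. The field names (`po_id`, `quantity`, `unit_price`, `amount`), the one-invoice-per-PO simplification, and the helper `three_way_match` are all assumptions for illustration and are not taken from the notebook or the VynFi schema.

```python
def three_way_match(purchase_orders, goods_receipts, invoices, price_tolerance=0.01):
    """Return {po_id: [issues]} with ["matched"] when PO, GR and invoice agree.

    Hypothetical record shapes:
      purchase order: {"po_id", "quantity", "unit_price"}
      goods receipt:  {"po_id", "quantity"}   (several receipts per PO allowed)
      invoice:        {"po_id", "amount"}     (one invoice per PO assumed)
    """
    received = {}
    for gr in goods_receipts:
        received[gr["po_id"]] = received.get(gr["po_id"], 0) + gr["quantity"]
    invoiced = {inv["po_id"]: inv for inv in invoices}

    results = {}
    for po in purchase_orders:
        po_id = po["po_id"]
        issues = []
        if po_id not in received:
            issues.append("missing goods receipt")
        elif received[po_id] != po["quantity"]:
            issues.append("quantity mismatch")
        invoice = invoiced.get(po_id)
        if invoice is None:
            issues.append("missing invoice")
        else:
            expected = po["quantity"] * po["unit_price"]
            if abs(invoice["amount"] - expected) > price_tolerance:
                issues.append("amount mismatch")
        results[po_id] = issues or ["matched"]
    return results


sample_pos = [
    {"po_id": "PO-1", "quantity": 10, "unit_price": 5.0},
    {"po_id": "PO-2", "quantity": 4, "unit_price": 20.0},
    {"po_id": "PO-3", "quantity": 1, "unit_price": 99.0},
]
sample_receipts = [
    {"po_id": "PO-1", "quantity": 6},
    {"po_id": "PO-1", "quantity": 4},
    {"po_id": "PO-2", "quantity": 3},
]
sample_invoices = [
    {"po_id": "PO-1", "amount": 50.0},
    {"po_id": "PO-2", "amount": 80.0},
]
match_report = three_way_match(sample_pos, sample_receipts, sample_invoices)
print(match_report)
```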

Sample Output

Benford's Law Analysis

VynFi-generated journal entries closely follow Benford's Law (MAD = 0.004), confirming realistic leading-digit distributions across 95,000+ line items:

[Figure: Benford's Law]
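
The MAD figure quoted above is the mean absolute deviation between observed and expected leading-digit frequencies. A minimal, SDK-independent sketch of that calculation follows. The helper names `leading_digit` and `benford_mad` are illustrative, and the input is a list of powers of two (a sequence known to follow Benford's Law closely), not VynFi output.

```python
import math
from collections import Counter

# Expected Benford probability for each leading digit 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(amount):
    """Return the first significant digit of a finite non-zero number, else None."""
    x = abs(float(amount))
    if x == 0 or not math.isfinite(x):
        return None
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{x:.15e}"[0])


def benford_mad(amounts):
    """Mean absolute deviation between observed and Benford first-digit shares."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        raise ValueError("no non-zero amounts to analyse")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - BENFORD[d]) for d in range(1, 10)) / 9


# Illustrative input only: powers of two instead of journal entry amounts.
sample_amounts = [2 ** k for k in range(1, 1001)]
print(f"MAD = {benford_mad(sample_amounts):.4f}")
```

A commonly cited rule of thumb (Nigrini) treats a first-digit MAD below 0.006 as close conformity and a value above 0.015 as nonconformity.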

Chart of Accounts

A 430-account chart of accounts with full GAAP classification -- assets, liabilities, equity, revenue, and expense accounts:

[Figure: Chart of Accounts]

GL Account Activity

Transaction volume distribution across the top 15 general ledger accounts, showing realistic concentration in AP (2900), AR (2000), and revenue sub-accounts:

[Figure: Top Accounts]

GHG Emissions by Scope

ESG data includes Scope 1, 2, and 3 greenhouse gas emission records with realistic proportions:

[Figure: Emissions]


What Data Does VynFi Generate?

Each generation produces an archive of 60 to 80 or more files, grouped into the following categories:

(root)/
  journal_entries.json      -- Double-entry journal entries (header + lines)
  chart_of_accounts.json    -- 430-account hierarchical CoA
  lineage_graph.json        -- Data provenance graph
  run_manifest.json         -- Generation metadata and config
  data_quality_stats.json   -- Inline quality metrics
  balance_validation.json   -- Debit/credit balance proof

banking/                    -- 700k+ transactions, customer KYC, AML labels
document_flows/             -- PO, GR, Invoice, Payment, SO, Delivery chains
esg/                        -- 14 files: emissions, energy, water, waste, diversity
internal_controls/          -- SOD violations, COSO mappings, control rules
master_data/                -- Vendors, customers, employees, materials, assets
subledger/                  -- AP/AR aging, depreciation, inventory, fixed assets
tax/                        -- Provisions, deferred tax, ETR reconciliation
treasury/                   -- Cash positions, forecasts
events/                     -- Process evolution, organizational events
project_accounting/         -- Projects, milestones, earned value metrics
hr/                         -- Employee change history
relationships/              -- Cross-process links
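
The debit/credit balance check reported in balance_validation.json can also be reproduced by hand on the journal entries. The sketch below shows the idea on a small in-memory sample. The record layout (`document_id`, `lines`, `debit`, `credit`) and the helper `unbalanced_documents` are assumptions for illustration; the actual journal_entries.json schema may differ.

```python
from decimal import Decimal

# Hypothetical layout: one header per document with a list of lines.
sample_entries = [
    {
        "document_id": "JE-0001",
        "lines": [
            {"account": "1000", "debit": "250.00", "credit": "0.00"},
            {"account": "4000", "debit": "0.00", "credit": "250.00"},
        ],
    },
    {
        "document_id": "JE-0002",
        "lines": [
            {"account": "6000", "debit": "99.99", "credit": "0.00"},
            {"account": "2000", "debit": "0.00", "credit": "99.98"},
        ],
    },
]


def unbalanced_documents(entries):
    """Return {document_id: debit - credit} for documents that do not balance.

    Amounts go through Decimal so that cent-level differences are exact.
    """
    differences = {}
    for entry in entries:
        debit = sum((Decimal(str(l["debit"])) for l in entry["lines"]), Decimal("0"))
        credit = sum((Decimal(str(l["credit"])) for l in entry["lines"]), Decimal("0"))
        if debit != credit:
            differences[entry["document_id"]] = debit - credit
    return differences


problems = unbalanced_documents(sample_entries)
balanced = len(sample_entries) - len(problems)
print(f"{balanced}/{len(sample_entries)} documents balanced")
```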

Key Facts from Live Testing

| Metric | Value |
| --- | --- |
| Journal entry line items | 95,881 per generation |
| Debit/credit balance | 10,296/10,296 documents balanced |
| Benford's Law MAD | 0.004 (excellent conformance) |
| Chart of accounts | 430 accounts across 5 types |
| Banking transactions | 697,897 per generation |
| AML suspicious labels | 286 flagged transactions |
| KYC profiles | 620 customers with risk tiers |
| ESG data files | 14, covering emissions, energy, water, waste, diversity |
| SOD violations | 1,026 segregation-of-duty records |
| Fraud detection accuracy | 98.3% (Random Forest on labeled data) |
| Document flow references | 441 cross-document links |

Available Sectors

| Sector | Tables | Quality Score | Multiplier |
| --- | --- | --- | --- |
| Retail | 4 | 96 | 1.5x |
| Financial Services | 4 | 98 | 1.5x |
| Manufacturing | 5 | 94 | 1.5x |

Fraud Packs

| Pack | Description |
| --- | --- |
| revenue_fraud | Fictitious revenue, channel stuffing, cut-off manipulation |
| vendor_kickback | Shell vendors, inflated invoices, split purchases |
| payroll_ghost | Ghost employees, unauthorized salary changes, overtime abuse |
| management_override | Journal entry manipulation, related party transactions |
| comprehensive | All fraud types combined |

Generation Templates

16 pre-built templates covering:

  • Sector packages: Retail Complete, Manufacturing Full Cost Accounting, Financial Services Treasury
  • Fraud scenarios: Revenue Manipulation, Procurement Kickbacks, Ghost Employees, Management Override, Multi-Scheme
  • Country variants: Germany (HGB), France (PCG), Japan (IFRS)
  • Special purpose: ESG Reporting, Fraud Investigation