Blog
News, tutorials, and deep dives on synthetic financial data
Privacy-Preserving Synthesis: From Fingerprint to Dataset
VynFi's fingerprint-to-synthesis pipeline extracts differentially private statistical summaries from real data, then generates synthetic datasets that match those summaries without ever seeing the original records. This post walks through the full pipeline.
GNN-Generated Vendor Networks for AML Detection
VynFi 3.0 uses graph neural networks to generate realistic entity relationship graphs — vendor networks, correspondent banking chains, shell company structures — for AML model training. This post covers the GNN edge predictor architecture and criminal network simulation.
SOX Compliance Testing with Simulated Material Weaknesses
VynFi 3.0 can simulate SOX-relevant control failures — segregation of duties violations, unauthorized journal entries, IT access control breakdowns — and generate datasets that exhibit the downstream financial statement impact. This post walks through the simulation pipeline.
Stress Testing with Causal DAGs: A 2008 Recession Replay
Walk through VynFi's recession_2008_replay scenario pack step by step: the causal DAG structure, calibrated macro parameters, intervention sequence, and how to compare baseline vs. stressed portfolios in Python.
Neural Diffusion for Tabular Financial Data
VynFi 3.0 adds a score-based diffusion model for tabular financial data generation. This post covers the architecture — score networks, denoising score matching, classifier-free guidance — and shows how hybrid mode combines neural and rule-based generation.
Training Fraud Models with Adversarial Synthetic Data
VynFi 3.0's adversarial mode takes an ONNX model, probes its decision boundary, and generates targeted synthetic examples where the model is least confident. This post covers the full pipeline from model upload to augmented retraining.
Counterfactual Simulation: What Would Happen If...?
VynFi 3.0's counterfactual engine lets you define a causal DAG, inject interventions, and generate paired baseline/counterfactual datasets. This post walks through the structural causal model, do-calculus semantics, and a complete Python example.
Introducing VynFi 3.0: From Generation to Simulation
VynFi 3.0 moves beyond data generation into full scenario simulation. Three new pillars — counterfactual simulation, adversarial ML augmentation, and neural diffusion — transform how teams stress-test models, audit controls, and train AI.
pm4py + VynFi: Process Mining on Synthetic OCEL Event Logs
Generate OCEL 2.0 event logs with VynFi, load them into pm4py, discover process models, check conformance, and detect bottlenecks — no ERP data extraction needed.
Celonis + VynFi: Load Synthetic Process Mining Data via IBC
Generate Celonis IBC-compatible event logs with VynFi, import them into Celonis Process Mining, and run analysis — without extracting from your ERP.
Real-Time Fraud Pipelines: From Synthetic Labels to SIEM Alerts
Build a back-pressured fraud alerting pipeline that streams AML labels from VynFi, classifies typologies, and dispatches to Slack/PagerDuty/SIEM without overwhelming downstream rate limits.
Streaming TB-Scale Financial ETL to Parquet
Stream journal entries from VynFi's NDJSON endpoint, flatten header+lines into one row per line item, and write chunk-batched Parquet files — all without loading the full dataset into memory.
Journal Entry Forensics: Benford's Law, Anomaly Detection, and Pre-Built Analytics
Use VynFi's pre-built analytics API to validate Benford's Law conformity, inspect amount distributions, and assess process variant entropy — without computing anything client-side.
Process Mining on OCEL 2.0 Financial Event Logs
Generate OCEL-compliant event logs from synthetic P2P/O2C/manufacturing processes, reconstruct case traces from document references, and run variant analysis — all from a single VynFi job.
Document Flow Traceability: P2P/O2C Three-Way Matching
Every payment should trace back through an invoice, a goods receipt, and a purchase order. Here is how to reconstruct, validate, and audit those document chains with VynFi and Python.
Cleaner SDK Output: Native Decimals, Flat Layout, and DataSynth 2.3.1
DataSynth 2.3.1 fixes three output bugs and delivers two ergonomic wins: native JSON numbers and flat document layout. Here is what changed and how to use it.
Multi-Party AML Networks, Cross-Layer Fraud Propagation, and 14 Typologies
DataSynth 2.3 rebuilds the banking module from the ground up: synthetic identity, trade-based money laundering, crypto integration, sanctions evasion, real-estate integration, Barabási-Albert network topology, Payment ↔ BankTransaction bridging, velocity features, and device fingerprints. Here is what changes for AML model training.
Streaming TB-Scale Synthetic Datasets Without Disk Hell
Customers were hitting OOM kills and disk-full errors when generating terabyte datasets. We rebuilt the output pipeline around Azure Blob, per-file SAS URLs, BYO storage, and rate-controlled NDJSON streaming, so a 1 TB job now ships end-to-end with zero buffering.
Build a Fraud Detector in 30 Minutes with Python
Generate fully labeled fraud data, engineer features, train a RandomForest classifier, and compare it against rule-based audit analytics — all in one notebook session.
Audit Trail Analytics: Tracing P2P and O2C Document Flows
Every payment should trace back through an invoice, a goods receipt, and a purchase order. Here is how to reconstruct, validate, and audit those document chains with Python.
Process Mining with Synthetic Manufacturing Data and OCEL 2.0
Before Six Sigma consultants spend months mapping your processes, let process mining show you where the bottlenecks are. Here is how to do it with VynFi's manufacturing event logs.
ESG Data Analytics: From Carbon Footprint to Materiality Matrix
CSRD is now mandatory for large EU companies. Here is how to generate and analyze a full ESG data package — 14 files across all three pillars — with VynFi and Python.
AML Compliance Testing with 697K Synthetic Banking Transactions
AML-labeled transaction data is the most expensive and legally restricted dataset in financial services. Here is how to build and test a complete compliance program without it.
Synthetic Audit Data for PCAOB and SOC 2 Testing
Auditors need realistic test data to validate tools and train teams, but real client data is off limits. Here is how synthetic data solves the compliance testing problem.
How to Generate SAP-Compatible Test Data with VynFi
SAP implementations need realistic test data but getting it is painful. VynFi generates journal entries, trial balances, and subledgers in SAP-importable formats.
Building Financial AI Models? Here's Your Training Data Pipeline
Synthetic financial data beats anonymized real data for ML training. Benford compliance, balanced entries, ground-truth labels, and unlimited scale via API.
Introducing VynFi: Synthetic Financial Data for Everyone
Today we are launching VynFi, a cloud-native API that generates realistic synthetic financial data at 100K+ rows per second. Here is why we built it and what you can do with it.
Why Synthetic Financial Data Matters for Audit Training
Audit teams train on flat, unrealistic data. Synthetic financial data changes that by providing configurable complexity, labeled anomalies, and unlimited scale.
Getting Started with VynFi in 5 Minutes
A quick walkthrough: sign up, create an API key, generate your first dataset, and inspect the results. All in under 5 minutes.
The Ground Truth Problem in Enterprise Audit Analytics
Why you cannot use production data to build audit knowledge systems. The inverse problem is computationally infeasible, systematic errors propagate undetected, and internal consistency does not imply correctness.
How VynFi Generates Statistically Rigorous Financial Data
Inside the three-layer knowledge model, Benford compliance, copula-based dependencies, and calibration against 155 real-world datasets that power VynFi's generation engine.
130+ Fraud Scenarios: Building Better Fraud Detection Models
How VynFi generates labeled fraud training data with 130+ anomaly subtypes, multi-stage fraud schemes, and ground-truth labels across all five knowledge dimensions.
Privacy-Preserving Data Sharing with Differential Privacy Fingerprints
How VynFi enables cross-firm analytics without data exposure using epsilon-differential privacy fingerprints that separate the privacy boundary from data generation.