Changelog

All notable changes to the VynFi Python SDK are documented here.

[1.7.0] - 2026-04-21

Minor release that fills in the example and notebook coverage for every v1.6.x resource. Every script and notebook (except neural_diffusion.py and gnn_vendor_networks.ipynb, both of which require torch) now runs end-to-end against DS 4.1.x / API 4.1.x.

New example scripts

  • audit_optimizer.py — drives three of the six optimizer endpoints (risk_scope, portfolio, monte_carlo) with a sketched audit engagement. Keeps working across DS 4.1.x stub → 4.2.x real-analytics migrations because OptimizerResponse.report is opaque.
  • template_packs_crud.py — create → upsert vendor_names → validate → fetch → list → cleanup. Shows the enrichment-and-linking patterns as comments.
  • nl_config.py — configs.from_description(...) walkthrough plus a commented-out configs.from_company(...) example. The dry-run submission is gated behind VYNFI_RUN_NL=1.
  • ds_41_features.py — single generation job that exercises every new DS 4.1 config surface: analyticsMetadata, audit, complianceRegulations, accountingStandards, interconnectivity, llm. Reports what actually landed in the archive rather than predicting, so it keeps working as DS fills in the pipelines.

Notebook fixes

  • 01_quickstart.ipynb — switch from generate_quick(tables=...) + journal-entries (hyphens) to the async generate_config pattern that emits journal_entries.json.
  • counterfactual_simulation.ipynb — reindex baseline/counterfactual period series to the union of their indexes before plotting (fixes matplotlib shape-mismatch).
  • sox_compliance_testing.ipynb — trim scenario scope to 300 rows × 2 companies × 2 periods so the paired baseline + counterfactual generation fits within the 500 s cell timeout.
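
The reindexing fix in counterfactual_simulation.ipynb can be illustrated with a small, pandas-free sketch. It is not the notebook's code: the period labels and values are made up, and plain dicts stand in for the period series. The point is only that both series are expanded to the union of their period keys, so they end up with the same length before plotting.

```python
import math


def align_to_union(baseline, counterfactual, fill=math.nan):
    """Return (periods, baseline_values, counterfactual_values) of equal length."""
    # The union of both key sets plays the role of the shared index.
    periods = sorted(set(baseline) | set(counterfactual))
    base_vals = [baseline.get(p, fill) for p in periods]
    cf_vals = [counterfactual.get(p, fill) for p in periods]
    return periods, base_vals, cf_vals


# Hypothetical period totals; each run is missing one period the other has.
baseline = {"2026-01": 120.0, "2026-02": 135.5, "2026-03": 128.0}
counterfactual = {"2026-02": 140.0, "2026-03": 131.25, "2026-04": 150.0}

periods, base_vals, cf_vals = align_to_union(baseline, counterfactual)
```

Missing periods are filled with NaN rather than zero, so a plot shows a gap instead of a misleading drop.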

[1.6.1] - 2026-04-21

Patch release from a live regression run of the full examples suite against DS 4.1.x / API 4.1.x.

Fixed

  • scenarios.create() — stop sending the legacy top-level interventions field that DS 3.1 removed. Interventions are now folded exclusively into generationConfig.scenarios.interventions (the backward-compat mapping was already there; only the stray top-level key was tripping server validation).
  • jobs.list_files() — retry on 404 up to ~4 s. The managed_blob file index can lag a second or two behind job completion; a single 404 right after wait() is almost always a race, not a real miss.
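
The retry behaviour described for jobs.list_files() can be sketched roughly as follows. This is not the SDK's implementation: NotFound, retry_on_not_found, the 0.5 s delay, and the simulated file index are all hypothetical stand-ins. Only the idea of retrying a 404 within a budget of about 4 s comes from the entry above.

```python
import time


class NotFound(Exception):
    """Stand-in for whatever error the client raises on HTTP 404 (hypothetical name)."""


def retry_on_not_found(fetch, budget_s=4.0, delay_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it succeeds or the time budget is spent."""
    deadline = clock() + budget_s
    while True:
        try:
            return fetch()
        except NotFound:
            # Give up once another delay would overrun the budget.
            if clock() + delay_s > deadline:
                raise
            sleep(delay_s)


# Simulated file index that only becomes visible on the third call.
attempts = {"count": 0}


def flaky_list_files():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise NotFound("file index not ready yet")
    return ["journal_entries.json"]


# sleep is replaced with a no-op so the demonstration returns immediately.
files = retry_on_not_found(flaky_list_files, sleep=lambda seconds: None)
```

Injecting sleep and clock keeps the helper testable without real waiting. A 404 that persists past the budget is re-raised unchanged.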

Regression-run fixes to examples

  • quickstart.py / pandas_workflow.py — switch from generate_quick (30 s server cap, overruns on DS 4.1.x full-domain retail) to the async generate_config + wait pattern. Fix the tables name typo (journal-entries → journal_entries).
  • native_mode.py — drop exportLayout: "flat" (still hanging upstream per docs/ds-3.1.1-verification.md § D); flatten nested output in-script and coerce amounts before summing.
  • multi_period_sessions.py — shrink from 1000 rows × 5 companies to 300 × 2 to fit a 5-minute run budget.
  • streaming_aggregator.py — aggregate 100 envelopes instead of 500 (Scale-tier NDJSON stream ~4/s for this workload).
  • quality_monitoring.py — iterate completed jobs until one with a live archive is found, rather than failing on a GC'd most-recent job.
  • fingerprint_synthesis.py — exit 0 (skip) when VYNFI_FINGERPRINT is unset, so regression runners don't flag it.

[1.6.0] - 2026-04-21

Adopts DataSynth 4.1.x + VynFi API 4.1.x. Surfaces the portal's new audit-optimizer CLI wrappers, user-uploaded template packs, natural-language config generation, and aggregated audit artifacts.

Added — new resources

  • client.optimizer — six POST /v1/optimizer/* wrappers (Scale+ tier). Each returns a typed OptimizerResponse whose .report carries the CLI's opaque JSON so the SDK doesn't have to track every DS 4.1.x stub → real-analytics migration:
    • risk_scope(engagement, top_n=None)
    • portfolio(candidates, budget_hours)
    • resources(schedule)
    • conformance(trace, blueprint)
    • monte_carlo(engagement, runs=None, seed=None) (server defaults: runs=1000, seed=42)
    • calibration(findings)
  • client.template_packs — user-uploaded DataSynth template packs (Team+ tier, DS 3.2+ templates.path):
    • list() / create() / get() / update() / delete()
    • categories() — supported category keys
    • get_category() / upsert_category() / delete_category()
    • validate() — re-validate every category
    • enrich_category() — LLM-enrich a category (Scale+ tier)

Added — new methods on existing resources

  • client.configs.from_description(description) — natural-language → validated PortalGenerationConfig (Scale+ tier).
  • client.configs.from_company(uid=..., name=..., periods=None, fraud_rate=None) — Swiss VynCo company profile → config (Scale+).
  • client.jobs.audit_artifacts(job_id) — aggregated reader for audit/audit_opinions.json, audit/key_audit_matters.json, and anomaly_labels.json via GET /v1/jobs/{id}/audit-artifacts.

Added — new models

OptimizerResponse, TemplatePack, TemplatePackList, TemplatePackCategorySummary, TemplatePackCategoryContent, TemplatePackValidation, TemplatePackValidationIssue, TemplatePackEnrichResponse, NlConfigResponse, CompanyConfigResponse, BatchCompanyResponse, AuditArtifacts.

DS 4.1 wire-visible fields supported via existing passthrough

config.analyticsMetadata, config.audit, config.complianceRegulations, config.accountingStandards, config.interconnectivity, config.templates.packId, config.llm.* — all pass through Jobs.generate_config(config=...) unchanged.

DS 4.1.0 statistical_validation and DS 4.1.3 interconnectivity result fields are internal to DS and not surfaced by the VynFi API; nothing to bind on the SDK side yet.

[1.5.1] - 2026-04-19

Adopts DataSynth 3.1.1 + VynFi API 3.1.1, which decisively fixed 7 of the 10 findings from docs/semantics-and-optimization-2026-04-18.md. See docs/ds-3.1.1-verification.md for the full scorecard.

Added

  • Jobs.fraud_split(job_id) — wraps the new GET /v1/jobs/{id}/fraud-split endpoint. Returns a typed FraudSplit with scheme-propagated vs direct-injection counts, propagation rate, and a by_fraud_type dict of FraudTypeSplit entries. Useful for stratified ML training (cross-document detectors on the scheme population, noise-robust detectors on the direct population).
  • New models: FraudSplit, FraudTypeSplit.
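
To make the scheme-propagated versus direct-injection split concrete, here is a client-side sketch that computes the same kind of summary from labelled records. It is an assumption-laden illustration, not the endpoint's logic: the sample records and fraud_type values are invented, and defining the propagation rate as propagated fraud entries divided by all fraud entries is a guess at what the field means.

```python
from collections import Counter


def split_fraud(entries):
    """Split fraud entries into scheme-propagated and direct-injection groups."""
    fraud = [e for e in entries if e.get("is_fraud")]
    scheme = [e for e in fraud if e.get("is_fraud_propagated")]
    direct = [e for e in fraud if not e.get("is_fraud_propagated")]
    # Assumed definition: share of fraud entries that came from a scheme.
    rate = len(scheme) / len(fraud) if fraud else 0.0
    by_type = {}
    for e in fraud:
        origin = "scheme" if e.get("is_fraud_propagated") else "direct"
        by_type.setdefault(e.get("fraud_type", "unknown"), Counter())[origin] += 1
    return scheme, direct, rate, by_type


# Invented sample records; only the three flag/field names appear in this changelog.
entries = [
    {"id": 1, "is_fraud": True, "fraud_type": "duplicate_payment", "is_fraud_propagated": True},
    {"id": 2, "is_fraud": True, "fraud_type": "duplicate_payment", "is_fraud_propagated": False},
    {"id": 3, "is_fraud": True, "fraud_type": "round_tripping", "is_fraud_propagated": False},
    {"id": 4, "is_fraud": False},
]

scheme, direct, rate, by_type = split_fraud(entries)
```

The two returned lists can then feed separate training sets, one per population.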

Verified live (DS 3.1.1)

  • ✓ round_dollar_bias — 0× → 170× lift
  • ✓ is_weekend bias — 1.83× → 32× lift
  • ✓ is_post_close bias — ∞ → ~3,106× lift
  • ✓ is_fraud_propagated — 0/388 → 12/33 populated on fraud entries
  • ✓ process_variant_summary.json — now emitted (162 variants, 55 % happy-path concentration)
  • ✓ audit/audit_opinions.json + key_audit_matters.json — now materialized (empty arrays when audit phase disabled)
  • ✓ AML typology coverage — 0.000 → 0.857 (passes the ≥ 0.80 threshold)
  • ⚠ off_hours_bias still 0 % in both populations (upstream)
  • ⚠ AML relationship mix still 96.8 % TransactionCounterparty (upstream)
  • ⚠ /v1/scenarios/templates still returns 1 (portal-side gap)
  • ⚠ exportLayout: "flat" still hangs at 0 % (upstream writer bug)

[1.5.0] - 2026-04-18

Adapts the SDK to DataSynth 3.1, which addressed 5 of the 12 findings from docs/insights-2026-04-18.md. Also ships three quality-of-life helpers that emerged as patterns across the example suite.

Added

  • DataSynthQualityReport — one-call aggregator of every quality signal a job produces (Benford, amount distribution, balance validation, process variants with rework/skip/OOO rates, banking evaluation, data-quality stats). Renders to Markdown or a flat dict.
  • Jobs.wait_for_many() — parallel-aware waiter for paired jobs (baseline + counterfactual, session period pairs, etc.).
  • JobArchive.dataframes() — shortcut wrapping archive_to_dataframes() with auto-coercion of *_amount / *_date columns.
  • JobArchive.audit_opinions() / .key_audit_matters() — DS 3.1 now writes ISA 700 + ISA 701 outputs under audit/.
  • New models: AuditOpinion, KeyAuditMatter.
  • VariantAnalysis extended with rework_rate, skipped_step_rate, out_of_order_rate (DS 3.1 injects realistic process imperfections).
  • 4 new examples: document_level_fraud.py, behavioral_fraud_patterns.py, sector_dag_presets.py, audit_opinions_kam.py.
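
The idea behind a waiter for paired jobs can be sketched with a thread pool. This is not how Jobs.wait_for_many() is implemented, and its real signature is not shown here. wait_for_all, fake_wait, and the job IDs are hypothetical; wait_one stands in for any blocking per-job wait call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_all(job_ids, wait_one, max_workers=4):
    """Wait on several jobs concurrently; return {job_id: result} in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first failure.
        results = list(pool.map(wait_one, job_ids))
    return dict(zip(job_ids, results))


def fake_wait(job_id):
    """Pretend to block until the job finishes (hypothetical stand-in)."""
    time.sleep(0.01)
    return f"completed:{job_id}"


results = wait_for_all(["job_baseline", "job_counterfactual"], fake_wait)
```

Waiting in parallel means the total wall time is close to the slowest job rather than the sum of both.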

Changed

  • archive_to_dataframes() now auto-coerces common financial columns by default (new coerce=True kwarg).
  • ml_training_pipeline.py demonstrates 3.1 is_fraud_propagated stratification.
  • pm4py_integration.py prints timestamp retention (DS 3.1 microsecond truncation means 100 % of events now survive pandas parsing).
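
The suffix-based coercion of *_amount and *_date columns can be pictured with a stdlib sketch over plain row dicts. The SDK works on DataFrames and its exact rules are not documented here, so coerce_row, the column names, and the choice of Decimal and date as target types are assumptions for illustration.

```python
from datetime import date
from decimal import Decimal


def coerce_row(row):
    """Convert string values in *_amount and *_date fields to typed values."""
    out = {}
    for key, value in row.items():
        if isinstance(value, str) and key.endswith("_amount"):
            out[key] = Decimal(value)
        elif isinstance(value, str) and key.endswith("_date"):
            out[key] = date.fromisoformat(value)
        else:
            out[key] = value  # everything else is passed through untouched
    return out


# Hypothetical rows with amounts and dates serialized as strings.
raw_rows = [
    {"document_id": "JE-1", "debit_amount": "100.10", "posting_date": "2026-04-01"},
    {"document_id": "JE-2", "debit_amount": "200.20", "posting_date": "2026-04-02"},
]

rows = [coerce_row(r) for r in raw_rows]
total_debit = sum(r["debit_amount"] for r in rows)
```

Coercing once at load time avoids repeating the conversion before every sum or date comparison.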

DS 3.1 config fields supported via existing endpoints

  • fraud.documentFraudRate, fraud.propagateToLines, fraud.propagateToDocument
  • businessProcesses.{o2c,p2p,r2r,h2r,a2r}Weight
  • scenarios.causalModel.preset — manufacturing / retail / financial_services / custom / default / minimal
  • scenarios.causalModel.nodes + edges for custom DAGs
  • banking.typologies.networkTypologyRate
  • diffusion.neural.{hybridWeight, hybridStrategy, neuralColumns}

Verified live (post-3.1 deploy)

  • ✓ AML network density 0.0014 → 0.053 (38× richer); 0 → 35 mule_links
  • ✓ OCPM on JE headers 47 % → 100 %
  • ✓ Timestamp parsing 95 % row loss → 0 %
  • ✓ download_file for managed_blob works
  • ✓ banking_evaluation.json in archive
  • ⚠ process_variant_summary.json still missing (deploy lag)
  • ⚠ Behavioral fraud biases + is_fraud_propagated + audit_opinions.json not yet active — SDK code is in place and will pick them up when the deploy rolls

[1.4.1] - 2026-04-17

Fixed

  • jobs.download_file() on managed-blob jobs now returns the actual bytes instead of an empty response. The VynFi API issues a 302 redirect to an Azure Blob SAS URL for managed-blob jobs; httpx.Client defaults to follow_redirects=False, so the SDK was returning the empty body of the 302 response. Pass follow_redirects=True to the internal httpx.Client. Regression guard added in tests/test_resources.py::test_download_file_follows_managed_blob_redirect.

    Paired with the VynFi API DataSynth 3.1 upgrade (SDK-feedback item #12).
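
The failure mode is easy to reproduce without a network. The sketch below uses an in-memory table of responses instead of httpx, and every name in it (Response, ROUTES, fetch, the two paths) is hypothetical. It shows only why a client that does not follow redirects returns the empty body of the 302, while one that does returns the file bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: bytes = b""
    headers: dict = field(default_factory=dict)


# Hypothetical in-memory "server": the API path redirects to a blob path.
ROUTES = {
    "/api/download": Response(302, b"", {"Location": "/blob/data.json"}),
    "/blob/data.json": Response(200, b'{"rows": 3}'),
}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch(path, follow_redirects=False, max_redirects=5):
    """Return the response for path, optionally following Location headers."""
    response = ROUTES[path]
    hops = 0
    while follow_redirects and response.status in REDIRECT_CODES:
        if hops >= max_redirects:
            raise RuntimeError("too many redirects")
        response = ROUTES[response.headers["Location"]]
        hops += 1
    return response


without = fetch("/api/download")
with_follow = fetch("/api/download", follow_redirects=True)
```

Here without.body is empty because it is the redirect response itself, which mirrors the symptom described in the fix.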

[1.4.0] - 2026-04-14

Adds support for DataSynth 3.0 + VynFi API features (scenario packs, fingerprint synthesis, adversarial probing, AI tuning, co-pilot chat). All features verified end-to-end against the live API.

Added

  • Scenarios.packs() — lists the 11 built-in DataSynth scenario packs:
    • Fraud: vendor_collusion_ring, management_override, ghost_employee, procurement_kickback, channel_stuffing
    • Control failures: sox_material_weakness, it_control_breakdown
    • Macro: recession_2008_replay, supply_chain_disruption_q3, interest_rate_shock_300bp
    • Operational: erp_migration_cutover
  • Fingerprint resource (client.fingerprint) — privacy-preserving synthesis from .dsf fingerprint files via POST /v1/fingerprint/synthesize. Team+ for statistical, Scale+ for neural/hybrid backends.
  • Adversarial resource (client.adversarial) — ONNX model probing for fraud detection robustness (Enterprise tier). Methods: probe(), results().
  • Jobs.tune() — LLM-powered config suggestion based on a completed job's quality scores (POST /v1/jobs/{id}/tune). Scale+.
  • AI resource (client.ai) — free-text chat with the dashboard co-pilot (POST /v1/ai/chat). Scale+.
  • New models: ScenarioPack, ScenarioPackList, FingerprintSynthesisResponse, AdversarialProbeResponse, AdversarialProbeResults, ProbeSample, AiTuneResponse, AiChatResponse, QualitySummary.
  • Four new examples: scenario_packs.py, ai_tune.py, fingerprint_synthesis.py, neural_diffusion.py.

Changed

  • Scenarios.create() — updated to the DS 3.0 request shape ({name, generation_config}). Legacy template_id and interventions kwargs are still accepted and are auto-folded into generation_config.scenarios.* for backward compatibility.
  • Default client timeout — bumped from 30s to 60s. The 30s default was too tight for generate_quick (30s server-side limit) with realistic network latency.
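
The backward-compatible folding of a legacy interventions kwarg can be sketched as a small payload builder. This is an illustration, not the SDK's code: build_scenario_payload and the sample intervention are hypothetical, the wire key is written as generation_config as in the entry above, and the legacy template_id kwarg is left out because its target key under scenarios.* is not stated here.

```python
import copy


def build_scenario_payload(name, generation_config=None, interventions=None):
    """Build a {name, generation_config} body, folding legacy interventions in."""
    # Work on a copy so the caller's config dict is never mutated.
    config = copy.deepcopy(generation_config) if generation_config else {}
    if interventions:
        scenarios = config.setdefault("scenarios", {})
        scenarios.setdefault("interventions", []).extend(interventions)
    # No top-level "interventions" key is emitted.
    return {"name": name, "generation_config": config}


base_config = {"scenarios": {"packs": ["vendor_collusion_ring"]}}
# Hypothetical intervention; only the timing key names come from this changelog.
legacy = [{"type": "hypothetical_shock", "startMonth": 3, "durationMonths": 2}]

payload = build_scenario_payload("what-if", base_config, interventions=legacy)
```

Keeping the fold in one place means old call sites keep working while the request body has a single shape.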

DataSynth 3.0 features supported via existing endpoints (no SDK changes)

  • Scenario packs in config — generation_config.scenarios.packs = [...]
  • Custom interventions — generation_config.scenarios.interventions = [...] with timing (startMonth, durationMonths, onset) and constraints (preserveAccountingIdentity, etc.). Scale+.
  • Neural diffusion — generation_config.diffusion.backend = "neural"|"hybrid" with neural.* subsection. Scale+.
  • Quality gates — generation_config.qualityGates.profile = "standard"|"strict"|"audit". Team+.
  • OCPM fields populated — ocpm_event_ids, ocpm_object_ids, ocpm_case_id are now populated on journal entry headers (they were empty in 2.3.x). Verified 209/300 entries on a sample retail job.
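
As a rough picture of how these config fields combine, the fragment below builds a generation_config dict and checks it against the values named in this changelog. The check_config helper is hypothetical, and the allow-lists cover only what is listed here; the server may accept other values (for example other diffusion backends).

```python
# Pack names as listed under Scenarios.packs() in this release.
KNOWN_PACKS = {
    "vendor_collusion_ring", "management_override", "ghost_employee",
    "procurement_kickback", "channel_stuffing", "sox_material_weakness",
    "it_control_breakdown", "recession_2008_replay",
    "supply_chain_disruption_q3", "interest_rate_shock_300bp",
    "erp_migration_cutover",
}
NEURAL_BACKENDS = {"neural", "hybrid"}
GATE_PROFILES = {"standard", "strict", "audit"}


def check_config(config):
    """Return a list of problems; an empty list means nothing looked off."""
    problems = []
    for pack in config.get("scenarios", {}).get("packs", []):
        if pack not in KNOWN_PACKS:
            problems.append(f"unknown scenario pack: {pack}")
    backend = config.get("diffusion", {}).get("backend")
    if backend is not None and backend not in NEURAL_BACKENDS:
        problems.append(f"backend not named in this changelog: {backend}")
    profile = config.get("qualityGates", {}).get("profile")
    if profile is not None and profile not in GATE_PROFILES:
        problems.append(f"unknown quality gate profile: {profile}")
    return problems


generation_config = {
    "scenarios": {"packs": ["vendor_collusion_ring", "recession_2008_replay"]},
    "diffusion": {"backend": "hybrid"},
    "qualityGates": {"profile": "strict"},
}

problems = check_config(generation_config)
```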

Still pending upstream (DataSynth team)

  • exportLayout: flat hangs the DataSynth binary — use the default nested layout for now.

[1.3.0] - 2026-04-13

Adds support for DataSynth 2.3 + VynFi API 2.0 features released by the platform team. All features verified end-to-end against the live API (verification report).

Fixed

  • CamelCase deserialization for JobFileList, JobFile, EstimateSizeResponse — these were declared with plain BaseModel but the API returns camelCase fields. Pydantic silently fell back to defaults (e.g. list_files() reported 0 files even when 85 existed). Switched to _CamelModel.
  • Download timeout for request_raw() extended from 30s to a 5min default. Large archive downloads (>100MB) were timing out.
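
The camelCase issue can be reproduced in miniature without Pydantic. The sketch below uses a dataclass and a hand-written key converter; FileList, its fields, and from_api are hypothetical and stand in for the real models. It shows how unconverted keys fall back to defaults with no error, and how converting keys restores the values.

```python
import re
from dataclasses import dataclass, fields


def camel_to_snake(name):
    """Convert 'totalFiles' to 'total_files'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


@dataclass
class FileList:
    job_id: str = ""
    total_files: int = 0


def from_api(cls, payload, convert_keys=True):
    """Build cls from an API payload, ignoring keys the model does not declare."""
    known = {f.name for f in fields(cls)}
    data = {}
    for key, value in payload.items():
        name = camel_to_snake(key) if convert_keys else key
        if name in known:
            data[name] = value
    return cls(**data)


payload = {"jobId": "job_123", "totalFiles": 85}

broken = from_api(FileList, payload, convert_keys=False)  # keys never match
fixed = from_api(FileList, payload)
```

In this sketch broken.total_files stays at its default of 0 even though the payload reports 85 files, which is the same silent failure described above.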

Added

  • Jobs.analytics() — fetch pre-built statistical analytics for a completed job (GET /v1/jobs/{id}/analytics). Returns Benford's Law conformity, amount distribution, process variant summary, and banking evaluation in one call.
  • Jobs.stream_ndjson() — rate-controlled NDJSON streaming (GET /v1/jobs/{id}/stream/ndjson) with rate, burst, progress_interval, file query parameters. Yields data records and _progress envelopes. Scale tier+.
  • Configs.estimate_size() — TB-scale storage quota validation (POST /v1/configs/estimate-size) with per-domain breakdown.
  • Configs.submit_raw() — raw DataSynth YAML config submission (POST /v1/configs/raw). Scale tier+.
  • JobArchive now supports both backends transparently — legacy zip archives and new TB-scale managed_blob manifests with presigned URLs. New archive.backend property and archive.url(path) / archive.ttl_seconds() methods.
  • Analytics models: JobAnalytics, BenfordAnalysis, AmountDistributionAnalysis, VariantAnalysis, BankingEvaluation, KycCompletenessAnalysis, AmlDetectabilityAnalysis, CrossLayerCoherenceAnalysis, VelocityQualityAnalysis, FalsePositiveAnalysis, TypologyDetection.
  • Config models: EstimateSizeResponse, SizeBucket, RawConfigResponse.
  • Three new examples: analytics_export.py, ndjson_streaming.py, native_mode.py.
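
A consumer of the NDJSON stream has to tell data records apart from progress envelopes. The sketch below parses a few hand-written lines. The envelope shape is an assumption: it treats any object with a top-level "_progress" key as an envelope, and the field names inside the sample lines are invented. Check the actual stream before relying on this.

```python
import json


def split_stream(lines):
    """Separate NDJSON lines into data records and progress envelopes."""
    records, progress = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        obj = json.loads(line)
        # Assumed envelope shape: {"_progress": {...}}
        if isinstance(obj, dict) and "_progress" in obj:
            progress.append(obj["_progress"])
        else:
            records.append(obj)
    return records, progress


# Hypothetical stream content for illustration only.
sample_lines = [
    '{"document_id": "JE-1", "amount": 100.0}',
    '{"_progress": {"sent": 1, "total": 2}}',
    "",
    '{"document_id": "JE-2", "amount": 250.5}',
    '{"_progress": {"sent": 2, "total": 2}}',
]

records, progress = split_stream(sample_lines)
```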

DataSynth 2.3 features (server-side, supported by SDK)

  • output.numericMode = "native" — decimals as JSON numbers (eliminates the need for pd.to_numeric() in consumer code)
  • output.exportLayout = "flat" — header fields merged onto each line (eliminates manual flattening of journal entries)
  • banking_customers.display_name — pre-flattened name field
  • Fraud label propagation — is_fraud and fraud_type now appear on document flow records (PO, GR, VI, Payment, SO, Delivery, CI), not just journal entries
  • Pre-built analytics — analytics/ directory in every archive with Benford, distribution, variant, and banking evaluation files
  • TB-scale managed blob storage — large jobs return manifest with presigned URLs instead of inline zip archive
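
For context, the manual work that the flat layout and native numeric mode are meant to remove looks roughly like this. The sketch assumes each nested journal entry has a "header" dict and a "lines" list, and it invents the field names; amounts arrive as strings, as they would without native numeric mode.

```python
from decimal import Decimal


def flatten_entries(entries):
    """Merge each entry's header fields onto every one of its lines."""
    rows = []
    for entry in entries:
        header = entry.get("header", {})
        for line in entry.get("lines", []):
            # Line values win if a key exists in both header and line.
            rows.append({**header, **line})
    return rows


# Hypothetical nested output with string-encoded amounts.
entries = [
    {
        "header": {"document_id": "JE-1", "posting_date": "2026-04-01"},
        "lines": [
            {"line_number": 1, "debit_amount": "100.00"},
            {"line_number": 2, "debit_amount": "0.00"},
        ],
    },
    {
        "header": {"document_id": "JE-2", "posting_date": "2026-04-02"},
        "lines": [{"line_number": 1, "debit_amount": "250.50"}],
    },
]

rows = flatten_entries(entries)
# Amounts must be converted before summing because they are strings here.
total_debit = sum(Decimal(r["debit_amount"]) for r in rows)
```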

[1.2.0] - 2026-04-11

Added

  • Jobs.list_files() — list all files in a completed job's archive with sizes, content types, and per-file column schemas (GET /v1/jobs/{id}/files)
  • JobFileList, JobFile, FileSchema models for the file listing response
  • OutputEstimate model and output field on EstimateCostResponse — shows estimated file count and archive size before running a job
  • OutputEstimate.note field — explains that the rows parameter is a scale factor

[1.1.0] - 2026-04-11

Added

  • JobArchive class -- wraps downloaded zip archives for easy file access: archive.json(), archive.files(), archive.categories(), archive.find(), archive.summary(), archive.extract_to()
  • Jobs.download_archive() -- returns a JobArchive instead of raw bytes
  • archive_to_dataframes() in pandas integration -- converts all JSON files in an archive to a dict of DataFrames in one call, with automatic header/lines flattening for journal entries
  • 7 Jupyter notebooks -- quickstart, audit analytics, fraud detection, document flow audit trail, process mining, ESG reporting, AML compliance testing
  • 7 standalone scripts -- quickstart, streaming progress, pandas workflow, config management, multi-period sessions, what-if scenarios, quality monitoring
  • Examples README with sample output figures and key metrics from live testing

Fixed

  • Config endpoints used wrong URL path: /v1/configs/validate -> /v1/config/validate, /v1/configs/estimate-cost -> /v1/config/estimate-cost, /v1/configs/compose -> /v1/config/compose (singular, matching Rust SDK reference)
  • Sessions generate_next() used wrong URL path: /v1/sessions/{id}/generate-next -> /v1/sessions/{id}/generate (matching Rust SDK reference)

[1.0.0] - 2026-04-10

First stable release. Full API parity with the VynFi Rust SDK reference implementation.

Added

  • Configs resource — save, list, get, update, delete generation configs; validate configs, estimate cost, and compose from layers
  • Credits resource — purchase prepaid credit packs, check balance, view history
  • Sessions resource — create multi-period generation sessions, extend, and generate each period sequentially
  • Scenarios resource — create what-if scenarios with causal graph templates, run baseline vs counterfactual, get diff analysis
  • Notifications resource — list and mark-read user notifications
  • Catalog templates — catalog.list_templates() for browsing system generation templates
  • Billing checkout & portal — billing.checkout() and billing.portal() for Stripe integration
  • Jobs enhancements — generate_config() for config-based generation, download_file() for specific artifacts, wait() for polling convenience, download() returns bytes directly
  • ForbiddenError (403) exception type
  • QuickJobResponse and CancelJobResponse dedicated return types
  • WebhookDetail with delivery history
  • RevokeKeyResponse return type for API key revocation
  • pandas integration (vynfi.integrations.pandas) — download_dataframe(), job_to_dataframe(), usage_to_dataframe(), quality_to_dataframe()
  • polars integration (vynfi.integrations.polars) — download_frame(), job_to_frame(), usage_to_frame(), quality_to_frame()
  • Optional dependency groups: pip install vynfi[pandas], vynfi[polars], vynfi[all]

Changed

  • Version bumped to 1.0.0 (production-stable)
  • Job model updated to match live API: owner_id, config, artifacts, error_detail fields; removed legacy tables, format, sector_slug, output_path, error fields
  • JobList uses data field (was jobs) to match API response shape
  • Sector/SectorSummary/CatalogItem now include id, multiplier, quality_score (int), popularity fields
  • Column now includes example_values
  • TableDef now includes id, slug
  • Fingerprint model restructured: sector (value) + table (TableDef)
  • ApiKey/ApiKeyCreated use environment field (was scopes/expires_in_days)
  • UsageSummary.burn_rate is int (was float)
  • DailyUsageResponse.by_table is list[TableUsage] (was dict)
  • Invoice fields updated to match Stripe format (amount_due, amount_paid, number, created, hosted_invoice_url, pdf)
  • Subscription now includes stripe_price_id
  • Billing.payment_method() returns raw JSON (was PaymentMethod model)
  • Jobs.download() returns bytes (was file-path based); use download_to() for file output
  • Development status classifier: Alpha → Production/Stable

Removed

  • PaymentMethod model (API returns raw JSON)
  • Legacy JobProgress model (job progress is now untyped Any matching API)

[0.1.0] - 2026-04-09

Initial alpha release with 7 resource modules: jobs, catalog, usage, api_keys, quality, webhooks, billing.