This document provides technical details about the transaction data generation process and design decisions.
The transaction generator uses Python's Decimal type for all financial calculations to ensure precision. However, the final output is stored as floating-point numbers in CSV/JSON for broader compatibility.
- Generation: All calculations (interest rates, fees, amounts) use `Decimal` with `quantize()` to maintain precision
- Storage: Output uses float for:
  - Wide compatibility with data analysis tools (Excel, pandas, R, etc.)
  - JSON standard number format
  - Acceptable precision loss for test data (amounts rounded to 2 decimal places)
Maximum precision loss per transaction: ±0.005 florins (due to rounding to 2 decimal places). For 20,000 transactions, cumulative rounding effects are negligible (<0.01% of total volume).
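The split between `Decimal` generation and float storage can be sketched as follows. This is a minimal illustration, not the generator's actual code: the helper name `to_florins`, the rounding mode `ROUND_HALF_UP`, and the sample principal and rate are all assumptions. The last lines work through the worst-case cumulative rounding for 20,000 transactions against the total debit volume reported in this document.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_florins(value: Decimal) -> Decimal:
    """Round a Decimal amount to 2 decimal places (hypothetical helper)."""
    # ROUND_HALF_UP is an assumed rounding mode; the source does not name one.
    return value.quantize(CENT, rounding=ROUND_HALF_UP)

principal = Decimal("1250.00")         # illustrative loan principal
annual_rate = Decimal("0.125")         # illustrative 12.5% per annum
interest = to_florins(principal * annual_rate)  # exact Decimal arithmetic
stored = float(interest)               # converted to float only at output time

# Worst case: every one of 20,000 transactions is off by the full 0.005 florins.
worst_case = Decimal("0.005") * 20_000
share = worst_case / Decimal("5376606368.51")   # fraction of total debit volume
```

Even in this worst case the accumulated error is 100 florins, far below 0.01% of the total volume.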
All transactions maintain the fundamental principle: Debits = Credits
Validation results:
- Total debits: 5,376,606,368.51 florins
- Total credits: 5,376,606,368.51 florins
- Difference: 0.00 florins ✓
Each transaction has:
- Single debit account with amount
- One or more credit accounts with amounts
- Validation ensures sum(debits) = sum(credits) for each transaction
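A per-transaction balance check along these lines would enforce the rule above. The dictionary layout and field names (`debit`, `credits`, `account`, `amount`) are assumptions made for illustration; the real record structure may differ.

```python
from decimal import Decimal

def is_balanced(txn: dict) -> bool:
    """Return True when the single debit equals the sum of all credit lines."""
    total_credits = sum((c["amount"] for c in txn["credits"]), Decimal("0"))
    return txn["debit"]["amount"] == total_credits

# One debit split across two credit accounts (illustrative values).
txn = {
    "debit": {"account": "Cash", "amount": Decimal("500.00")},
    "credits": [
        {"account": "Loans Receivable", "amount": Decimal("450.00")},
        {"account": "Interest Income", "amount": Decimal("50.00")},
    ],
}
```

Comparing `Decimal` values avoids the false mismatches that float addition could introduce.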
The generator uses weighted probabilities to reflect historical banking patterns:
- 31% deposits (reflecting strong papal banking relationship)
- 13% war financing (reflecting frequent conflicts)
- 13% operating expenses (daily operations across 8 branches)
- 10% loan repayments with realistic interest (8-25% per annum)
- 9% loan issuances (core banking activity)
- 8% bills of exchange (Medici innovation in international banking)
- 8% withdrawals (customer activity)
- 6% alum trade (papal monopoly)
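Weighted selection of this kind is commonly done with `random.choices`. The sketch below assumes that approach; the dictionary keys are hypothetical labels for the categories listed above. `random.choices` treats weights as relative, so they do not need to sum to 100 (the listed percentages sum to 98).

```python
import random

# Keys are assumed names; values are the percentages listed above.
transaction_weights = {
    "deposit": 31,
    "war_financing": 13,
    "operating_expense": 13,
    "loan_repayment": 10,
    "loan_issuance": 9,
    "bill_of_exchange": 8,
    "withdrawal": 8,
    "alum_trade": 6,
}

rng = random.Random(42)
types = list(transaction_weights)
weights = list(transaction_weights.values())

# Draw 10,000 transaction types and measure how often deposits appear.
sample = rng.choices(types, weights=weights, k=10_000)
deposit_share = sample.count("deposit") / len(sample)
```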
Amounts follow an exponential distribution to reflect realistic banking:
- Most transactions are small (daily operations)
- Some transactions are very large (war financing, major loans)
- Multiplier weights favor smaller amounts (1x, 1x, 1x, 2x, 5x, 10x, 20x, 50x)
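One way to combine an exponential base amount with the multiplier list is sketched below. The function name `random_amount`, the mean of 100 florins, and the interpretation of the multipliers as a uniform pick from a list in which 1x appears three times are assumptions, not details confirmed by the source.

```python
import random
from decimal import Decimal

# 1x appears three times, so small multipliers are drawn more often.
MULTIPLIERS = [1, 1, 1, 2, 5, 10, 20, 50]

def random_amount(rng: random.Random, mean: float = 100.0) -> Decimal:
    """Draw a skewed amount: exponential base times a random multiplier."""
    base = rng.expovariate(1.0 / mean)        # exponential with the given mean
    multiplier = rng.choice(MULTIPLIERS)
    amount = Decimal(str(base * multiplier)).quantize(Decimal("0.01"))
    return max(amount, Decimal("0.01"))       # never emit a zero amount

rng = random.Random(42)
amounts = [random_amount(rng) for _ in range(1000)]
```

The resulting sample is right-skewed: most values are small and a few are very large, so the median sits well below the mean.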
Special transactions represent documented historical events:
- Council of Constance Ransom (May 29, 1415): Exactly 35,000 florins
  - Historical fact: Giovanni di Bicci paid this to free Pope John XXIII
  - Represented almost half of the bank's first 20 years of profits
- War Financing Spikes: During documented war periods
  - First Milanese War (1390-1402)
  - Second Milanese War (1422-1426)
  - Wars in Lombardy (1423-1454)
- Papal Banking Boom: After 1410 when John XXIII appointed Medici as papal bankers
  - Increased deposit activity at Rome branch
  - Rome branch shows 32% of all transactions
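Period-dependent behaviour like this can be modelled by scaling the base weights whenever a date falls inside a known period. The sketch below is an assumption-heavy stand-in: the dataclass fields, the boost factors (2.0 and 1.5), the weight keys, and the 1440 end date for the papal boom (taken from the end of the generated date range) are all illustrative, and the real `HistoricalPeriod` class may look quite different.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class HistoricalPeriod:
    """Simplified stand-in; field names are assumed, not the real class."""
    name: str
    start: date
    end: date
    boosted_type: str
    factor: float

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end

PERIODS = [
    HistoricalPeriod("Second Milanese War", date(1422, 1, 1),
                     date(1426, 12, 31), "war_financing", 2.0),
    HistoricalPeriod("Papal Banking Boom", date(1410, 1, 1),
                     date(1440, 12, 31), "deposit", 1.5),
]

def adjusted_weights(base: dict, day: date) -> dict:
    """Return a copy of the base weights, boosted for periods covering `day`."""
    weights = dict(base)
    for period in PERIODS:
        if period.contains(day) and period.boosted_type in weights:
            weights[period.boosted_type] *= period.factor
    return weights
```

Returning a copy keeps the base weights untouched, so each transaction date starts from the same baseline.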
The Medici banking network spanned Europe:
- Rome (32%): Papal banking center
- Florence (22%): Home base and headquarters
- Venice (10%): Major trading partner
- London, Bruges, Avignon, Geneva, Milan (6-7% each): International branches
Each branch conducts appropriate activities:
- Rome: Heavy papal deposits and religious institution transactions
- Florence: War financing and government loans
- All branches: Customer deposits/withdrawals, loans, operating expenses
Generation algorithm:
- Initialize with a known historical event (Council of Constance ransom)
- Generate random dates across the 1390-1440 period
- Select transaction type based on weighted probabilities
- Adjust probabilities during historical events (wars, papal banking boom)
- Generate realistic amounts using exponential distribution
- Ensure each transaction is balanced (debits = credits)
- Sort all transactions by date
- Renumber sequentially
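The first and last steps of the list above (seed the known event, draw random dates, sort, renumber) can be sketched as follows. The record fields and the helper `random_date` are assumptions; type selection and amount generation are omitted to keep the example short.

```python
import random
from datetime import date, timedelta

def random_date(rng, start=date(1390, 1, 1), end=date(1440, 12, 31)):
    """Pick a uniformly random day in the inclusive range [start, end]."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))

rng = random.Random(42)

# Start from the known historical event, then add randomly dated records.
transactions = [{"id": 0, "date": date(1415, 5, 29), "type": "ransom"}]
transactions += [
    {"id": 0, "date": random_date(rng), "type": "deposit"} for _ in range(5)
]

# Sort chronologically, then assign sequential ids starting at 1.
transactions.sort(key=lambda t: t["date"])
for number, txn in enumerate(transactions, start=1):
    txn["id"] = number
```

Renumbering after the sort means ids always follow chronological order, regardless of the order in which records were generated.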
- Seed: 42 (for reproducibility)
- Same seed always generates same dataset
- Change seed to generate different but statistically similar dataset
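Reproducibility of this kind typically comes from a dedicated, seeded `random.Random` instance. The helper `draw_amounts` below is hypothetical and only demonstrates the property: the same seed yields the same sequence, and a different seed yields a different one.

```python
import random

def draw_amounts(seed, n=5):
    """Draw n pseudo-random amounts from a generator seeded with `seed`."""
    rng = random.Random(seed)      # private generator, independent of global state
    return [round(rng.expovariate(0.01), 2) for _ in range(n)]

first_run = draw_amounts(42)
second_run = draw_amounts(42)      # identical to first_run
other_seed = draw_amounts(43)      # statistically similar, different values
```

Using a private `random.Random` rather than the module-level functions keeps the output stable even if other code also draws random numbers.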
CSV format:
- Pros: Universal compatibility, human-readable, easy to import
- Cons: Less type safety, potential for precision loss
- Usage: Best for spreadsheet analysis and general data exploration
JSON format:
- Pros: Structured data, better for programmatic access
- Cons: Larger file size (7.2 MB vs 2.9 MB for CSV)
- Usage: Best for application integration and API usage
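The trade-offs between the two formats can be seen by writing the same records both ways. The field names and values below are purely illustrative and do not reflect the real output schema.

```python
import csv
import io
import json

# Illustrative records; the real schema has more fields.
rows = [
    {"id": 1, "date": "1420-03-14", "branch": "Rome", "amount": 1250.75},
    {"id": 2, "date": "1420-03-15", "branch": "Florence", "amount": 120.5},
]

# CSV: one header line, then one compact line per record.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# JSON: field names repeat in every record, so the output is larger.
json_text = json.dumps(rows, indent=2)

# Reading back: JSON restores numbers, CSV returns every value as a string.
from_json = json.loads(json_text)
from_csv = list(csv.DictReader(io.StringIO(csv_text)))
```

The repeated keys explain most of the size difference, and the string-only values read back from CSV illustrate the weaker type safety.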
- 20,000 transactions: ~2-3 seconds on modern hardware
- Primarily limited by random number generation and list operations
- CSV: 2.9 MB (145 bytes per transaction average)
- JSON: 7.2 MB (360 bytes per transaction average)
- Peak memory: ~50 MB (all transactions held in memory)
- Could be optimized for streaming if generating millions of transactions
To add a new transaction type:
- Create a generator method in the `TransactionGenerator` class
- Add it to the `transaction_weights` dictionary
- Update the documentation
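The steps above might look roughly like this in code. The `TransactionGenerator` shown here is a heavily simplified stand-in written for this example, not the real class; the `generate_<type>` naming convention, the `wool_trade` type, and its weight of 5 are all hypothetical.

```python
import random
from decimal import Decimal

class TransactionGenerator:
    """Simplified stand-in for the real class (structure is assumed)."""

    def __init__(self, seed=42):
        self.rng = random.Random(seed)
        self.transaction_weights = {"deposit": 31, "withdrawal": 8}

    def _amount(self):
        raw = self.rng.expovariate(0.01)
        return Decimal(str(raw)).quantize(Decimal("0.01"))

    def generate_deposit(self):
        return {"type": "deposit", "amount": self._amount()}

    def generate_withdrawal(self):
        return {"type": "withdrawal", "amount": self._amount()}

    def generate(self):
        # Pick a type by weight, then dispatch to the matching generate_* method.
        kinds = list(self.transaction_weights)
        weights = list(self.transaction_weights.values())
        kind = self.rng.choices(kinds, weights=weights)[0]
        return getattr(self, f"generate_{kind}")()

class ExtendedGenerator(TransactionGenerator):
    def __init__(self, seed=42):
        super().__init__(seed)
        self.transaction_weights["wool_trade"] = 5   # register the new weight

    def generate_wool_trade(self):                   # new generator method
        return {"type": "wool_trade", "amount": self._amount()}

gen = ExtendedGenerator()
kinds_seen = {gen.generate()["type"] for _ in range(500)}
```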
To add a new historical period or event:
- Update the `HistoricalPeriod` class with new date ranges
- Add conditional logic in the `generate_transactions()` method
- Create a specific transaction for major events
To generate more transactions:
- Simply change the `num_transactions` parameter
- The current algorithm is O(n) and can handle 100,000+ transactions
- For millions of transactions, consider streaming output
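Streaming output could be approached by yielding transactions one at a time and writing each row immediately, so memory use stays roughly constant. The function names and the two-field record below are assumptions for illustration.

```python
import csv
import io
import random

def iter_transactions(n, seed=42):
    """Yield transactions lazily instead of building a full list."""
    rng = random.Random(seed)
    for number in range(1, n + 1):
        yield {"id": number, "amount": round(rng.expovariate(0.01), 2)}

def stream_to_csv(handle, n):
    """Write n transactions to an open text handle, one row at a time."""
    writer = csv.DictWriter(handle, fieldnames=["id", "amount"])
    writer.writeheader()
    count = 0
    for txn in iter_transactions(n):
        writer.writerow(txn)
        count += 1
    return count

buffer = io.StringIO()          # stands in for an open file
written = stream_to_csv(buffer, 100)
```

One caveat with this design: the final sort-by-date step needs all records at once, so a streaming variant would have to generate dates in chronological order or sort the output externally.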
The `validate_transactions.py` script checks:
- ✓ CSV/JSON structure validity
- ✓ Required fields present
- ✓ Date format correctness
- ✓ Transaction balance (debits = credits)
- ✓ Historical event presence
- ✓ Data distribution
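A few of these checks can be sketched per row as shown below. This is not the content of `validate_transactions.py`; the function `validate_row`, the required field names, and the ISO `YYYY-MM-DD` date format are assumptions.

```python
from datetime import datetime

# Assumed field names; the real schema may differ.
REQUIRED_FIELDS = {"id", "date", "debit_amount", "credit_amount"}

def validate_row(row: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    errors = []
    missing = REQUIRED_FIELDS - set(row)
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
        return errors               # later checks need all fields present
    try:
        datetime.strptime(row["date"], "%Y-%m-%d")
    except ValueError:
        errors.append(f"bad date: {row['date']}")
    # Amounts are stored as floats, so compare after rounding to 2 places.
    if round(row["debit_amount"] - row["credit_amount"], 2) != 0:
        errors.append("unbalanced transaction")
    return errors

good = {"id": 1, "date": "1415-05-29",
        "debit_amount": 35000.0, "credit_amount": 35000.0}
problems = validate_row(good)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a file in a single pass.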
Spot checks recommended:
- View sample transactions
- Verify historical events (Council of Constance)
- Check branch distribution
- Examine transaction type distribution
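Distribution spot checks are straightforward with `collections.Counter`. The helper `distribution` and the tiny sample below are hypothetical; in practice the input would be the rows loaded from the generated CSV or JSON file.

```python
from collections import Counter

def distribution(transactions, field):
    """Return the percentage share of each distinct value of `field`."""
    counts = Counter(t[field] for t in transactions)
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.most_common()}

# Tiny illustrative sample, not real generator output.
sample = [
    {"branch": "Rome"}, {"branch": "Rome"}, {"branch": "Rome"},
    {"branch": "Florence"},
]
branch_shares = distribution(sample, "branch")
```

The same helper works for the transaction type field, so both spot checks can share one function.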
- Simplified Accounting: Real Medici records would have more complex account structures
- Historical Estimates: Exact transaction amounts are estimated based on historical research
- Currency Simplification: All amounts in florins; real operations used multiple currencies
- Float Precision: Output uses float instead of preserving full Decimal precision
- Static Data: Generated once; doesn't reflect temporal business growth patterns
Potential improvements:
- Add temporal business growth (increasing volumes over time)
- Multi-currency support with exchange rates
- More complex transaction types (partnerships, investments)
- Customer relationship tracking
- Seasonal business variations
- Regional economic events impact
- Employee and merchant name generation
- Full ledger account hierarchy
- de Roover, Raymond. "The Rise and Decline of the Medici Bank, 1397-1494" (1963)
- Historical records of Renaissance Italian banking
- Python Decimal documentation for financial calculations
- Double-entry accounting principles
Version: 1.0
Last Updated: 2025-11-21
Author: GitHub Copilot
License: MIT