BaoStock + Mootdx + EastMoney + yfinance Multi-Source | China A-Shares + US Stocks | PTrade Compatible | DuckDB + Parquet Storage
SimTradeData is an efficient data download tool designed for SimTradeLab. It downloads China A-share data (from BaoStock, Mootdx, and EastMoney) and US stock data (from yfinance), automatically using each source for what it does best. Data is stored in DuckDB as an intermediate store and exported to Parquet, with efficient incremental updates and querying.
Fully PTrade Compatible | A-Shares + US Stocks | 20x+ Backtesting Speedup
No PTrade Strategy Code Changes Needed | Ultra-Fast Local Backtesting | Zero-Cost Solution
- DuckDB Intermediate Storage: High-performance columnar database with SQL queries and incremental updates
- Parquet Export Format: Highly compressed, cross-platform compatible, ideal for large-scale data analysis
- Automatic Incremental Updates: Intelligently detects existing data, only downloads new records
- Market Data: OHLCV daily bars with limit-up/down prices and previous close
- Valuation Metrics: PE/PB/PS/PCF/Turnover Rate/Total Shares/Float Shares
- Financial Data: 23 quarterly financial indicators + automatic TTM calculation
- Corporate Actions: Dividends, bonus shares, rights offerings (with forward adjustment factors)
- Metadata: Stock info, trading calendar, index constituents, ST/suspension status
- US Stock Support: 6,000+ US common stocks, S&P 500 / NASDAQ-100 index constituents
- Auto-Validation: Data integrity validation before writes
- Export-Time Calculation: Limit prices, TTM metrics computed at export for consistency
- Detailed Logging: Comprehensive error logs and warnings
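The auto-validation feature above can be pictured as a set of row-level sanity checks that run before a write. The sketch below is illustrative only: the function name `validate_bar` and the specific rules are assumptions, not the checks actually implemented in `data_validator.py`.

```python
def validate_bar(bar: dict) -> list[str]:
    """Return a list of problems found in one daily OHLCV bar (empty if clean)."""
    required = ("date", "open", "high", "low", "close", "volume")
    missing = [k for k in required if bar.get(k) is None]
    if missing:
        # Numeric checks are meaningless on an incomplete row, so stop here.
        return [f"missing fields: {', '.join(missing)}"]

    problems = []
    if bar["high"] < max(bar["open"], bar["close"]):
        problems.append("high is below open or close")
    if bar["low"] > min(bar["open"], bar["close"]):
        problems.append("low is above open or close")
    if min(bar["open"], bar["high"], bar["low"], bar["close"]) <= 0:
        problems.append("non-positive price")
    if bar["volume"] < 0:
        problems.append("negative volume")
    return problems


good = {"date": "2024-01-02", "open": 10.0, "high": 10.5,
        "low": 9.8, "close": 10.2, "volume": 1000}
bad = dict(good, high=9.0)  # high is now below both open and close

print(validate_bar(good))
print(validate_bar(bad))
```

Returning a list of messages rather than raising on the first failure lets a caller log every problem for a row in one pass.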
data/
├── cn.duckdb  # DuckDB database - A-shares (download source)
├── us.duckdb  # DuckDB database - US stocks (download source)
└── export/  # Exported Parquet files (by market)
    ├── cn/  # A-shares export
    │   ├── stocks/  # Daily bars (one file per stock)
    │   │   ├── 000001.SZ.parquet
    │   │   └── 600519.SS.parquet
    │   ├── exrights/  # Corporate action events
    │   ├── fundamentals/  # Quarterly financials (with TTM)
    │   ├── valuation/  # Valuation metrics (daily)
    │   ├── metadata/  # Metadata
    │   └── manifest.json
    └── us/  # US stocks export
        ├── stocks/
        │   ├── AAPL.US.parquet
        │   └── MSFT.US.parquet
        ├── exrights/
        ├── fundamentals/
        ├── valuation/
        ├── metadata/
        └── manifest.json
- Python: 3.10 or higher
- Poetry: Installation guide
- Network: Required for downloading data from BaoStock/Mootdx/EastMoney/yfinance (China mainland network recommended for A-share data)
Download the latest data from Releases:
- A-shares: data-cn-v* release → extract to data/cn/
- US stocks: data-us-v* release → extract to data/us/
# A-shares
mkdir -p /path/to/SimTradeLab/data/cn
tar -xzf simtradelab-data-cn-*.tar.gz -C /path/to/SimTradeLab/data/cn/
# US stocks
mkdir -p /path/to/SimTradeLab/data/us
tar -xzf simtradelab-data-us-*.tar.gz -C /path/to/SimTradeLab/data/us/

# Clone the project
git clone https://github.com/kay-ou/SimTradeData.git
cd SimTradeData
# Install dependencies
poetry install
# Activate virtual environment
poetry shell

Recommended: Unified Download Command
A single command downloads all data, automatically orchestrating Mootdx and BaoStock for their respective strengths:
# Full download (recommended)
# Mootdx: market data, corporate actions, bulk financials, trading calendar, benchmark index
# BaoStock: valuation metrics, ST/suspension status, index constituents
poetry run python scripts/download.py
# Fast first-time download: import TDX daily package first, then supplement with corporate actions etc.
# (6,000+ stocks OHLCV reduced from hours to minutes)
poetry run python scripts/download.py --tdx-download --source mootdx --skip-fundamentals
# Use an already-downloaded TDX ZIP file
poetry run python scripts/download.py --tdx-source data/downloads/hsjday.zip --source mootdx
# Check data status
poetry run python scripts/download.py --status
# Skip financial data (faster)
poetry run python scripts/download.py --skip-fundamentals
# Run Mootdx phase only
poetry run python scripts/download.py --source mootdx
# Run BaoStock phase only
poetry run python scripts/download.py --source baostock

Data Source Division of Labor
| Data Type | Source | Reason |
|---|---|---|
| OHLCV Market Data (first time) | TDX Daily Package | Fastest, ~500MB bulk import of full history |
| OHLCV Market Data (incremental) | Mootdx | Fast, local network |
| Corporate Actions (XDXR) | Mootdx | More complete data |
| Bulk Financial Data | Mootdx | One ZIP = all stocks, far better than per-stock queries |
| Valuation PE/PB/PS/Turnover | BaoStock | Exclusive data |
| ST/Suspension Status | BaoStock | Exclusive data |
| Index Constituents | BaoStock | Exclusive data |
| Trading Calendar | Mootdx | Comes with market data |
| Benchmark Index | Mootdx | Comes with market data |
Using Individual Data Sources
# BaoStock (includes valuation data, but slower)
poetry run python scripts/download_efficient.py
poetry run python scripts/download_efficient.py --skip-fundamentals
poetry run python scripts/download_efficient.py --valuation-only # Valuation + status only
# Mootdx (faster, but no valuation data)
poetry run python scripts/download_mootdx.py
poetry run python scripts/download_mootdx.py --skip-fundamentals

EastMoney Complementary Data (Money Flow, Dragon Tiger Board, Margin Trading)
# Download last 30 days of complementary data (requires existing market data)
poetry run python scripts/download_daily_extras.py
# Specify number of days (LHB API only retains ~30 days, run regularly)
poetry run python scripts/download_daily_extras.py --days 7

US Stock Data (yfinance)
Free US stock data via yfinance, no API key required:
# Full download (6,000+ US stocks with OHLCV + financials + valuation + metadata)
poetry run python scripts/download_us.py
# Specific symbols (small-scale testing)
poetry run python scripts/download_us.py --symbols AAPL,MSFT,GOOGL
# Market data only (skip time-consuming per-stock financials and metadata)
poetry run python scripts/download_us.py --skip-fundamentals --skip-metadata
# Specify start date
poetry run python scripts/download_us.py --start-date 2020-01-01

US stock tickers use the format AAPL.US, following the same {code}.{market} convention as A-shares (e.g. 600000.SS). US data is stored in a separate database, data/us.duckdb.
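The `{code}.{market}` convention can be handled with a small parser. This is a sketch under assumptions: `split_symbol`, `database_for`, and the suffix-to-database mapping are hypothetical helpers written for illustration, not functions from `code_utils.py`, and only the suffixes that appear in this README are mapped.

```python
def split_symbol(symbol: str) -> tuple[str, str]:
    """Split 'CODE.MARKET' into (code, market), using the last dot as separator."""
    code, sep, market = symbol.rpartition(".")
    if not sep or not code or not market:
        raise ValueError(f"expected {{code}}.{{market}}, got {symbol!r}")
    return code, market.upper()


# Assumed mapping from market suffix to the DuckDB files described above.
DB_FOR_SUFFIX = {
    "SS": "data/cn.duckdb",
    "SZ": "data/cn.duckdb",
    "US": "data/us.duckdb",
}


def database_for(symbol: str) -> str:
    """Return the database file that would hold this symbol's data."""
    _, market = split_symbol(symbol)
    return DB_FOR_SUFFIX[market]


print(split_symbol("600000.SS"))
print(database_for("AAPL.US"))
```

Splitting on the last dot rather than the first keeps codes that themselves contain a dot intact: a symbol such as `BRK.B.US` parses as code `BRK.B` and market `US`.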
TDX Official Data Package (Fastest Way to Get Full Historical Data)
# Auto-download official TDX Shanghai/Shenzhen/Beijing daily data package (~500MB)
poetry run python scripts/download_tdx_day.py
# Force re-download
poetry run python scripts/download_tdx_day.py --force-download
# Use an already-downloaded file
poetry run python scripts/download_tdx_day.py --file hsjday.zip

# Export A-shares → data/export/cn/
poetry run python scripts/export_parquet.py
# Export US stocks → data/export/us/
poetry run python scripts/export_parquet.py --market us
# Custom output directory
poetry run python scripts/export_parquet.py --market cn --output /custom/path

# Release A-shares data
bash scripts/release_data.sh --market cn
# Release US stock data
bash scripts/release_data.sh --market us
# Specify version
bash scripts/release_data.sh --market cn 1.3.0

# Copy exported data to SimTradeLab data directory
rsync -a data/export/cn/ /path/to/SimTradeLab/data/cn/
rsync -a data/export/us/ /path/to/SimTradeLab/data/us/

SimTradeData/
├── scripts/
│   ├── download.py  # Unified download entry (recommended for A-shares)
│   ├── download_efficient.py  # BaoStock download script
│   ├── download_mootdx.py  # Mootdx (TDX API) download script
│   ├── download_daily_extras.py  # EastMoney complementary data download script
│   ├── download_tdx_day.py  # TDX official daily data package download/import
│   ├── download_us.py  # US stock download script (yfinance)
│   ├── import_tdx_day.py  # TDX .day file import script
│   ├── export_parquet.py  # Parquet export script
│   └── release_data.sh  # GitHub Release publishing script
├── simtradedata/
│   ├── router/
│   │   ├── smart_router.py  # SmartRouter - smart data source routing
│   │   ├── route_config.py  # Route table configuration
│   │   └── exceptions.py  # Router exceptions
│   ├── fetchers/
│   │   ├── base_fetcher.py  # Base Fetcher class
│   │   ├── baostock_fetcher.py  # BaoStock data fetching
│   │   ├── unified_fetcher.py  # BaoStock unified fetching (optimized)
│   │   ├── mootdx_fetcher.py  # Mootdx basic data fetching
│   │   ├── mootdx_unified_fetcher.py  # Mootdx unified data fetching
│   │   ├── mootdx_affair_fetcher.py  # Mootdx financial data fetching
│   │   ├── eastmoney_fetcher.py  # EastMoney complementary data fetching
│   │   └── yfinance_fetcher.py  # yfinance US stock data fetching
│   ├── processors/
│   │   └── data_splitter.py  # Data stream splitting
│   ├── writers/
│   │   └── duckdb_writer.py  # DuckDB write and export
│   ├── validators/
│   │   └── data_validator.py  # Data quality validation
│   ├── config/
│   │   ├── field_mappings.py  # A-share field mapping config
│   │   ├── us_field_mappings.py  # US stock field mapping config
│   │   └── mootdx_finvalue_map.py  # Mootdx financial field mapping
│   └── utils/
│       ├── code_utils.py  # Stock code conversion
│       └── ttm_calculator.py  # Quarterly range calculation
├── data/  # Data directory (gitignored)
│   ├── cn.duckdb  # A-shares DuckDB source
│   ├── us.duckdb  # US stocks DuckDB source
│   └── export/  # Parquet exports
│       ├── cn/  # A-shares export
│       └── us/  # US stocks export
└── docs/  # Documentation
    ├── PTRADE_PARQUET_FORMAT.md  # Parquet format specification
    └── PTrade_API_mini_Reference.md
1. SmartRouter - Smart Data Source Router
- Unified data access API, automatically selects the best data source by data type and market
- Static priority + health-aware: auto fallback to backup sources when primary fails
- Integrates Phase 1 circuit breaker, skips unhealthy sources
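The combination of static priority, fallback, and a circuit breaker described in the bullets above can be sketched in a few lines. This is not SmartRouter's implementation: the class `FallbackRouter`, the `max_failures` threshold, and the callable-per-source design are all assumptions made for illustration.

```python
from typing import Callable


class SourceUnavailable(Exception):
    """Raised when every candidate source is skipped or fails."""


class FallbackRouter:
    def __init__(self, sources: dict[str, Callable[[str], list]],
                 priority: list[str], max_failures: int = 3):
        self.sources = sources
        self.priority = priority          # static order: first entry is tried first
        self.max_failures = max_failures  # consecutive failures before a source is skipped
        self.failures = {name: 0 for name in priority}

    def fetch(self, symbol: str) -> tuple[str, list]:
        for name in self.priority:
            if self.failures[name] >= self.max_failures:
                continue  # circuit open: skip the unhealthy source
            try:
                rows = self.sources[name](symbol)
            except Exception:
                self.failures[name] += 1
                continue  # fall back to the next source
            self.failures[name] = 0  # a success resets the failure count
            return name, rows
        raise SourceUnavailable(symbol)


def flaky(symbol: str) -> list:
    raise ConnectionError("primary down")


def backup(symbol: str) -> list:
    return [(symbol, "2024-01-02", 10.2)]


router = FallbackRouter({"primary": flaky, "backup": backup},
                        ["primary", "backup"], max_failures=2)
for _ in range(3):
    used, rows = router.fetch("600000.SS")
print(used, router.failures["primary"])
```

In this sketch an open circuit never closes again. A production breaker would typically retry the source after a cool-down period.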
from simtradedata.router import SmartRouter

with SmartRouter() as router:
    # Auto-selects best source: mootdx → eastmoney → baostock
    df = router.get_daily_bars("600000.SS", "2024-01-01", "2024-12-31")
    # Single-source data also goes through router for unified API
    mf = router.get_money_flow("600000.SS", "2024-01-01", "2024-12-31")
    # US stocks auto-route to yfinance
    us = router.get_daily_bars("AAPL.US", "2024-01-01", "2024-12-31")

2. UnifiedDataFetcher - Unified Data Fetching
- Single API call fetches market, valuation, and status data
- Reduces API calls by 33%
3. DuckDBWriter - Data Storage and Export
- Efficient incremental writes (upsert)
- Computes limit prices and TTM metrics at export time
- Forward-fills quarterly data to daily frequency
4. DataSplitter - Data Stream Splitting
- Routes unified data to appropriate tables by type
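Routing a unified record to per-table rows can be pictured as a projection onto each table's field list. The sketch below is a guess at the general shape, not the code in `data_splitter.py`: the `TABLE_FIELDS` mapping, the `symbol` key column, and the function `split_row` are assumptions, with field names borrowed from the field tables in this README.

```python
# Assumed field lists per destination table (names taken from this README's tables).
TABLE_FIELDS = {
    "market": ("open", "high", "low", "close", "preclose", "volume", "money"),
    "valuation": ("pe_ttm", "pb", "ps_ttm", "pcf", "turnover_rate"),
}


def split_row(row: dict) -> dict[str, dict]:
    """Project one unified row onto each table, keeping the shared key columns."""
    key = {"symbol": row["symbol"], "date": row["date"]}
    return {
        table: {**key, **{f: row[f] for f in fields if f in row}}
        for table, fields in TABLE_FIELDS.items()
    }


row = {"symbol": "600000.SS", "date": "2024-01-02",
       "open": 10.0, "close": 10.2, "pe_ttm": 5.1, "pb": 0.6}
parts = split_row(row)
print(parts["market"])
print(parts["valuation"])
```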
| Field | Description |
|---|---|
| date | Trading date |
| open/high/low/close | OHLC prices |
| high_limit/low_limit | Limit-up/down prices (computed at export) |
| preclose | Previous close price |
| volume | Trading volume (shares) |
| money | Trading amount (CNY) |
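Since `high_limit` and `low_limit` are computed at export from `preclose`, a worked example may help. The sketch assumes a limit ratio applied to the previous close and rounded half-up to 0.01. The function name `limit_prices` and the default 10% ratio are assumptions, and the exporter's actual per-board rules (for example a different ratio for ST stocks) are not documented here.

```python
from decimal import Decimal, ROUND_HALF_UP


def limit_prices(preclose: float, ratio: float = 0.10) -> tuple[float, float]:
    """Return (high_limit, low_limit) for a given previous close and limit ratio."""
    base = Decimal(str(preclose))
    pct = Decimal(str(ratio))
    tick = Decimal("0.01")  # prices are quoted to two decimal places
    high = (base * (1 + pct)).quantize(tick, rounding=ROUND_HALF_UP)
    low = (base * (1 - pct)).quantize(tick, rounding=ROUND_HALF_UP)
    return float(high), float(low)


print(limit_prices(10.25))        # → (11.28, 9.23)
print(limit_prices(10.25, 0.05))  # → (10.76, 9.74)
```

`Decimal` is used because the raw products (11.275 and 9.225 here) sit exactly on a rounding boundary, where binary floating point and Python's built-in `round` can land on either side.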
| Field | Description |
|---|---|
| pe_ttm/pb/ps_ttm/pcf | Valuation ratios |
| roe/roe_ttm/roa/roa_ttm | Profitability metrics (forward-filled from quarterly reports) |
| naps | Net asset per share (computed at export) |
| total_shares/a_floats | Total shares / float shares |
| turnover_rate | Turnover rate |
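Forward-filling quarterly values onto trading days means each day takes the most recent report dated on or before it. A minimal sketch follows. The function `forward_fill` is hypothetical, and whether the exporter keys on the report period end or the publication date is not stated in this README.

```python
from bisect import bisect_right
from typing import Optional


def forward_fill(report_dates: list[str], values: list[float],
                 trading_days: list[str]) -> list[Optional[float]]:
    """For each trading day, return the latest report value dated on or before it.

    Dates are ISO strings (YYYY-MM-DD), which sort correctly as plain text.
    report_dates must be sorted ascending and aligned with values.
    """
    filled = []
    for day in trading_days:
        count = bisect_right(report_dates, day)  # reports dated <= day
        filled.append(values[count - 1] if count else None)
    return filled


roe = forward_fill(
    report_dates=["2024-03-31", "2024-06-30"],
    values=[0.031, 0.065],
    trading_days=["2024-03-29", "2024-04-01", "2024-06-28", "2024-07-01"],
)
print(roe)  # → [None, 0.031, 0.031, 0.065]
```

For backtesting, keying on the publication date rather than the period end avoids exposing a value before it was actually public.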
Contains 23 financial indicators and their TTM versions. See PTRADE_PARQUET_FORMAT.md for details.
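A trailing-twelve-month (TTM) figure is the sum of the latest four single-quarter values. The sketch below assumes the inputs are already single-quarter amounts in chronological order. Reports that are cumulative year-to-date would need to be differenced first, and the approach taken in `ttm_calculator.py` may differ. The function name `ttm` is illustrative.

```python
from typing import Optional


def ttm(quarterly: list[float]) -> list[Optional[float]]:
    """Rolling four-quarter sum; None until four quarters are available."""
    result = []
    for i in range(len(quarterly)):
        if i < 3:
            result.append(None)
        else:
            result.append(sum(quarterly[i - 3:i + 1]))
    return result


net_profit = [10, 12, 11, 15, 13]  # five consecutive single-quarter values
print(ttm(net_profit))  # → [None, None, None, 48, 51]
```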
Edit scripts/download_efficient.py:
# Date range
START_DATE = "2017-01-01"
END_DATE = None # None = current date
# Output directory
OUTPUT_DIR = "data"
# Batch size
BATCH_SIZE = 20

| Document | Description |
|---|---|
| PTRADE_PARQUET_FORMAT.md | Parquet data format specification |
| PTrade_API_mini_Reference.md | PTrade API reference |
| Feature | BaoStock | Mootdx API | EastMoney | TDX Official Package | yfinance (US) |
|---|---|---|---|---|---|
| Market | A-shares | A-shares | A-shares | A-shares | US stocks |
| Speed | Slower | Fast | Fast | Fastest (bulk download) | Medium |
| Valuation Data | Yes (PE/PB/PS etc.) | No | No | No | Yes (computed) |
| Financial Data | Yes (per-stock query) | Yes (bulk ZIP, faster) | No | No | Yes (per-stock query) |
| Money Flow | No | No | Yes (exclusive) | No | No |
| Dragon Tiger Board | No | No | Yes (exclusive) | No | No |
| Margin Trading | No | No | Yes (exclusive) | No | No |
| History Start | 2015 | 2015 | 2015 | Full history | Full history |
| API Key | Not required | Not required | Not required | N/A | Not required |
Recommended: Use the unified scripts/download.py command, which automatically assigns market data and financials to Mootdx and valuation and status data to BaoStock, leveraging each source's strengths.
- Market Data: Checks for new trading days; skips in seconds when no new data
- Financial Data: Incremental checks based on remote file hash; only downloads changed quarters
- Index Constituents: Tracks downloaded months; only downloads new months
- Interrupt Recovery: Financial data progress and data are committed in the same transaction; resumes after interruption
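The hash-based check for financial data can be sketched as a comparison between the hashes recorded at the last download and those currently published. Everything named here is an assumption for illustration: the functions `file_digest` and `changed_quarters`, the quarter labels, and the choice of SHA-256 (the hash the project actually compares is not stated).

```python
import hashlib


def file_digest(payload: bytes) -> str:
    """Hex digest of a file's contents (SHA-256 chosen for this sketch)."""
    return hashlib.sha256(payload).hexdigest()


def changed_quarters(remote: dict[str, str], local: dict[str, str]) -> list[str]:
    """Quarters whose remote hash is new or differs from the locally recorded one."""
    return sorted(q for q, digest in remote.items() if local.get(q) != digest)


local = {
    "2023Q4": file_digest(b"report-a"),
    "2024Q1": file_digest(b"report-b"),
}
remote = {
    "2023Q4": file_digest(b"report-a"),          # unchanged, skipped
    "2024Q1": file_digest(b"report-b-revised"),  # restated, re-downloaded
    "2024Q2": file_digest(b"report-c"),          # new, downloaded
}
print(changed_quarters(remote, local))  # → ['2024Q1', '2024Q2']
```

Comparing hashes rather than dates also catches quarters that were restated after the first download.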
# 1. Incremental download (fetches only new data, automatically skips existing)
poetry run python scripts/download.py
# 2. Export to Parquet
poetry run python scripts/export_parquet.py # CN → data/export/cn/
poetry run python scripts/export_parquet.py --market us  # US → data/export/us/

Step 1 automatically detects the latest date of existing data in DuckDB and only downloads the delta. When there are no new trading days, all stocks are skipped in seconds.
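The delta detection amounts to comparing the latest stored date with the trading calendar. A minimal sketch, assuming the latest stored date has already been read from the database. The function `pending_trading_days` is hypothetical and not part of the download scripts.

```python
from datetime import date
from typing import Optional


def pending_trading_days(calendar: list[date],
                         last_stored: Optional[date]) -> list[date]:
    """Trading days still to download; everything if nothing is stored yet."""
    if last_stored is None:
        return list(calendar)
    return [day for day in calendar if day > last_stored]


calendar = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)]

print(pending_trading_days(calendar, date(2024, 1, 3)))  # one day left
print(pending_trading_days(calendar, date(2024, 1, 4)))  # nothing to do
```

An empty result is what allows a run with no new trading days to finish almost immediately.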
- Data sourced from free data services (BaoStock, Mootdx, EastMoney, yfinance)
- For research and educational purposes only
# Unit tests (no network required)
poetry run pytest tests/ -v
# SmartRouter routing and fallback tests
poetry run pytest tests/router/ -v
# SmartRouter live integration test (requires network)
poetry run python scripts/test_smart_router_live.py

See CHANGELOG.md for the full version history.
Latest: v1.2.0 (2026-03-13) - Smart Data Source Router
- SimTradeLab: https://github.com/kay-ou/SimTradeLab
- BaoStock: http://baostock.com/
- Mootdx: https://github.com/mootdx/mootdx
- EastMoney: https://www.eastmoney.com/
- yfinance: https://github.com/ranaroussi/yfinance
If this project helps you, consider sponsoring!
| WeChat Pay | Alipay |
|---|---|
| (image) | (image) |
This project is licensed under AGPL-3.0. See the LICENSE file for details.
Status: Production Ready | Version: v1.2.0 | Last Updated: 2026-03-13

