A production-grade, fully serverless ETL pipeline on GCP that ingests daily Japanese stock market data from the J-Quants API, stores it in BigQuery, and powers a Looker Studio monitoring dashboard — all deployed via Infrastructure as Code and GitOps CI/CD.
```
Cloud Scheduler (OIDC)
 └─► workflow_dispatcher (Cloud Run Service)
      └─► Cloud Workflows
           ├─► market_open (Cloud Run Service)     ── skip if market closed
           ├─► ingest_to_gcs (Cloud Run Job)       ── J-Quants API → GCS Parquet
           └─► load_to_bigquery (Cloud Run Job)    ── GCS → BigQuery (partitioned)
                     │
           Dataform (triggered after load)
            └─► mart.report_stock_dashboard
                     │
           Looker Studio Dashboard
```
| Pipeline | Data Source | Schedule (JST) | BigQuery Table |
|---|---|---|---|
| `jquants_ohlcv` | Daily OHLCV (bars) | 18:00 weekdays | `staging_jquants.daily_quotes` |
| `jquants_fins_summary` | Financial summary | 19:30 weekdays | `staging_jquants.fins_summary` |
| `jquants_fins_summary_confirmed` | Confirmed financials | 01:30 weekdays | `staging_jquants.fins_summary` |
| `jquants_earnings_calendar` | Earnings calendar | 19:30 weekdays | `staging_jquants.earnings_calendar` |
| `jquants_master` | Company master | 03:00 on 10th of month | `staging_jquants.master` |
| `daily_monitoring_dashboard` | Dataform transform | 18:20 weekdays | `mart.report_stock_dashboard` |
- Serverless: Cloud Run Jobs + Cloud Workflows + Cloud Scheduler — zero persistent servers, pay-per-use
- GitOps CI/CD: `build_deploy.yaml` is the single source of truth. Push to `develop` → dev deploy; push to `main` → prod deploy
- Declarative pipelines: Each `workflow.yaml` defines its own schedule and jobs via `__metadata__`. Adding a new pipeline requires only a new `workflow.yaml`
- Two-stage BigQuery load: GCS Parquet → temp table → CAST SQL → final partitioned table. Ensures type safety without schema management overhead
- Market-aware scheduling: Workflows skip execution on non-business days by checking the J-Quants calendar API
- Self-healing cleanup: Cloud Build deletes Cloud Run / Workflows / Scheduler resources that no longer exist in the repository
- Secrets in Secret Manager: Zero credentials in the repository. All API keys and tokens stored in GCP Secret Manager, accessed at runtime
- Idempotent loads: Daily partitions are overwritten on re-run (WRITE_TRUNCATE), making backfills safe
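A minimal sketch of the CAST step in the two-stage load, as a pure function. The real implementation lives in `libs/utils_bigquery.py`; the function name, column names, and the `date` partition column here are illustrative assumptions:

```python
def build_cast_sql(columns: dict, temp_table: str, partition_date: str) -> str:
    """Render the SELECT that casts raw temp-table columns to their final types.

    columns maps column name -> target BigQuery type. The resulting query is
    intended to be written with WRITE_TRUNCATE into the target date partition,
    so re-running a day (a backfill) replaces it instead of duplicating rows.
    """
    casts = ",\n  ".join(
        f"CAST({name} AS {typ}) AS {name}" for name, typ in columns.items()
    )
    return (
        f"SELECT\n  {casts}\n"
        f"FROM `{temp_table}`\n"
        f"WHERE DATE(date) = '{partition_date}'"
    )

# Hypothetical usage for the OHLCV pipeline:
sql = build_cast_sql(
    {"code": "STRING", "close": "FLOAT64", "volume": "INT64"},
    "my-project.staging_jquants.daily_quotes_tmp",
    "2025-01-06",
)
```

Keeping the cast in generated SQL means the Parquet files can stay loosely typed while the final partitioned table enforces a stable schema.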
| Service | Role |
|---|---|
| Cloud Run (Services) | market_open health check, workflow_dispatcher trigger bridge |
| Cloud Run (Jobs) | Per-pipeline ingest and load jobs |
| Cloud Workflows | Orchestration, market-open gating, error handling |
| Cloud Scheduler | Cron triggers via OIDC to dispatcher |
| Cloud Storage | Data lake (GCS Parquet, partitioned by date) |
| BigQuery | Staging tables (partitioned) + Dataform mart |
| Dataform | SQL transformations for reporting |
| Artifact Registry | Docker image registry |
| Cloud Build | CI/CD pipeline |
| Secret Manager | API keys and tokens |
```
.
├── build_deploy.yaml                 # Cloud Build CI/CD pipeline (single source of truth)
├── Dockerfile                        # Shared container image for all jobs and services
├── requirements.txt                  # Python dependencies
├── package.json                      # Dataform Node.js dependencies (@dataform/core)
├── dataform.json                     # Dataform project configuration
├── libs/                             # Shared Python libraries
│   ├── config.py                     # GCP resource naming conventions
│   ├── utils.py                      # GCS, Secret Manager, env helpers
│   ├── utils_bigquery.py             # BigQuery load with 2-stage CAST
│   ├── utils_discord.py              # Discord error notifications
│   ├── utils_jquants.py              # J-Quants API client (paginated)
│   └── utils_metadata.py             # Pipeline run metadata recording
├── services/
│   ├── market_open/                  # FastAPI: checks if market is open via J-Quants
│   └── workflow_dispatcher/          # FastAPI: bridges Cloud Scheduler → Cloud Workflows
├── workflows/
│   ├── ingest/                       # One directory per ingestion pipeline
│   │   ├── jquants_ohlcv/
│   │   │   ├── workflow.yaml               # Cloud Workflows definition + schedule metadata
│   │   │   ├── job__ingest_to_gcs.py       # Fetch from J-Quants API → GCS
│   │   │   └── job__load_to_bigquery.py    # GCS Parquet → BigQuery
│   │   ├── jquants_fins_summary/
│   │   ├── jquants_fins_summary_confirmed/
│   │   ├── jquants_earnings_calendar/
│   │   └── jquants_master/
│   ├── transform/
│   │   └── daily_monitoring_dashboard/     # Triggers Dataform compilation + invocation
│   └── utils/
│       └── backfill_executor/        # Sequential backfill for a date range
├── definitions/                      # Dataform SQL transformations
│   ├── sources/jquants/              # Source declarations (point to staging tables)
│   ├── intermediate/
│   │   └── int_daily_stock_metrics.sqlx    # Enriched metrics (price changes, flags, market cap)
│   └── reports/
│       └── report_stock_dashboard.sqlx     # Final Looker Studio table (last 5 trading days)
├── includes/
│   └── env.js                        # Dataform env vars (dataset names)
└── terraform/                        # Infrastructure as Code
    ├── bootstrap.sh                  # One-time setup: create state bucket + terraform init
    ├── terraform.tfvars.template
    ├── main.tf                       # Terraform backend (GCS) + provider config
    ├── variables.tf
    ├── apis.tf                       # GCP API enablement
    ├── iam.tf                        # Service accounts + IAM bindings
    ├── gcs.tf                        # Data lake buckets
    ├── bigquery.tf                   # BigQuery datasets + IAM
    ├── artifact_registry.tf          # Docker registry
    ├── cloudbuild.tf                 # Cloud Build triggers (dev/prod)
    ├── dataform.tf                   # Dataform repositories
    └── secret_manager.tf             # Secret shells (values set manually)
```
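The paginated client in `libs/utils_jquants.py` above can be pictured as a loop over the API's `pagination_key` cursor. A hedged sketch with the HTTP call abstracted behind a `fetch_page` callable (the function names and the stubbed payload are illustrative, not the repo's actual code):

```python
from typing import Callable, Dict, Iterator, Optional

def fetch_all_pages(fetch_page: Callable[[Optional[str]], Dict],
                    data_key: str) -> Iterator[dict]:
    """Yield every row across the pages of a J-Quants style response.

    Responses include `pagination_key` while more pages remain; fetch_page
    performs one request (passing the key, or None for the first page) and
    returns the decoded JSON body.
    """
    key = None
    while True:
        body = fetch_page(key)
        yield from body.get(data_key, [])
        key = body.get("pagination_key")
        if not key:
            break

# Usage with a stubbed two-page response:
pages = {
    None: {"daily_quotes": [{"Code": "7203"}], "pagination_key": "next"},
    "next": {"daily_quotes": [{"Code": "6758"}]},
}
rows = list(fetch_all_pages(lambda key: pages[key], "daily_quotes"))
```

Abstracting the request behind a callable keeps the pagination loop trivially unit-testable without network access.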
- GCP project with billing enabled
- `gcloud` CLI authenticated (`gcloud auth application-default login`)
- Terraform >= 1.5
- GitHub repository connected to Cloud Build (via Cloud Build GitHub App)
- J-Quants API key (sign up)
```bash
cd terraform/
./bootstrap.sh gcp-jquants-etl-sample asia-northeast1
```

This creates the GCS state bucket, generates `terraform.tfvars` from the template, and runs `terraform init`.
Edit `terraform/terraform.tfvars`:

```hcl
project_id      = "gcp-jquants-etl-sample"
region          = "asia-northeast1"
github_repo_url = "https://github.com/<YOUR_OWNER>/jquants-ETL-sample.git"
```

```bash
cd terraform/
terraform plan
terraform apply
```

This provisions all GCP resources: service accounts, IAM roles, GCS buckets, BigQuery datasets, Artifact Registry, Cloud Build triggers, Dataform repositories, and Secret Manager shells.
```bash
# J-Quants API key (required)
echo -n "<YOUR_JQUANTS_API_KEY>" | \
  gcloud secrets versions add jquants-api-key \
  --project=gcp-jquants-etl-sample \
  --data-file=-

# Discord webhook for error notifications (required)
echo -n "<YOUR_DISCORD_WEBHOOK_URL>" | \
  gcloud secrets versions add discord-webhook-url \
  --project=gcp-jquants-etl-sample \
  --data-file=-

# GitHub token for Dataform Git sync (required)
echo -n "<YOUR_GITHUB_PERSONAL_ACCESS_TOKEN>" | \
  gcloud secrets versions add dataform-github \
  --project=gcp-jquants-etl-sample \
  --data-file=-
```

In the GCP Console → Cloud Build → Triggers, connect the GitHub App to your repository. The `dev-deploy` and `prod-deploy` triggers are already configured by Terraform.
Push to the `develop` branch to trigger the dev deployment:

```bash
git checkout -b develop
git push origin develop
```

Cloud Build will:
- Build and push the Docker image to Artifact Registry
- Deploy Cloud Run Services (market_open, workflow_dispatcher)
- Deploy Cloud Run Jobs for each pipeline step
- Deploy Cloud Workflows with rendered YAML
- Create/update Cloud Scheduler jobs
- Clean up orphaned resources
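The cleanup step is conceptually a set difference between what is deployed and what the repo declares, scoped to the current environment's name prefix. A simplified sketch (the actual Cloud Build step shells out to `gcloud`; these resource names are made up):

```python
def find_orphans(deployed: set, declared: set, env_prefix: str) -> set:
    """Resources deployed under this environment but no longer in the repo.

    Filtering by env_prefix ensures a dev deploy never deletes prod resources.
    """
    return {r for r in deployed - declared if r.startswith(env_prefix)}

orphans = find_orphans(
    deployed={"dev--jquants_ohlcv", "dev--old_pipeline", "prod--jquants_ohlcv"},
    declared={"dev--jquants_ohlcv"},
    env_prefix="dev--",
)
```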
To backfill historical data for a pipeline, trigger the `backfill_executor` workflow manually from the GCP Console or via `gcloud`:

```bash
gcloud workflows run dev--backfill_executor \
  --location=asia-northeast1 \
  --data='{"workflow_name": "dev--jquants_ohlcv", "start": "2025-01-01", "end": "2025-01-31"}'
```

The `mart.report_stock_dashboard` table is refreshed daily after market close. It contains the last 5 trading days of enriched stock metrics:
- Price metrics: OHLCV, adjusted close/volume, stop-high/low flags
- Momentum: Price change % (1d / 5d / 20d / 60d), turnover change %
- Fundamentals: Market cap (million JPY), last earnings date, next earnings date
- Rankings: Turnover rank within each market (Prime / Growth / Standard)
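For illustration, the shape of the momentum calculation, sketched in plain Python (the actual metric is computed in SQL inside `int_daily_stock_metrics.sqlx`; the rounding and edge-case handling here are assumptions):

```python
def price_change_pct(closes: list, horizon: int):
    """% change between the latest close and the close `horizon` sessions back.

    closes is ordered oldest -> newest; returns None when there is not enough
    history for the requested horizon.
    """
    if len(closes) <= horizon:
        return None
    latest, past = closes[-1], closes[-1 - horizon]
    return round((latest - past) / past * 100, 2)

closes = [100.0, 102.0, 101.0, 105.0, 110.0, 121.0]
change_1d = price_change_pct(closes, 1)  # vs previous session
change_5d = price_change_pct(closes, 5)  # vs five sessions back
```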
Connect Looker Studio to the `mart_prod.report_stock_dashboard` BigQuery table and set a daily refresh schedule.
The following is an example of what can be built on top of this pipeline. The application (not included in this repository) consumes the BigQuery mart tables and delivers daily analysis reports to Discord.
The example above shows one of several analysis reports delivered each evening after market close. Reports include (but are not limited to):
- Market summary: Overall fund flow trends, new money / rotation signals
- Sector analysis: Top inflows/outflows by sector with specific stock highlights
- Themes: Momentum themes active on the day
| Branch | Environment | BigQuery Datasets |
|---|---|---|
| `develop` | dev | `staging_jquants_dev`, `pipeline_metadata_dev`, `mart_dev` |
| `main` | prod | `staging_jquants_prod`, `pipeline_metadata_prod`, `mart_prod` |
Dev schedulers are automatically paused when a prod deployment runs.
