Reliable, scalable, fault-tolerant data pipeline simulation project.
This branch supports two deployment modes, both without Kubernetes:
- Single-machine
- Distributed multi-machine
```
.
├── config
├── data
│   ├── analysis
│   ├── preprocess
│   ├── pyproject.toml
│   └── uv.lock
├── infra
│   ├── clickhouse
│   ├── connectors
│   ├── flink
│   ├── kafka
│   ├── nginx
│   ├── spark
│   └── terraform
├── model
└── services
    ├── flink-job
    ├── generator
    └── ingestor
```
Core path:
generator -> nginx -> ingestor -> kafka -> flink -> clickhouse
Prediction branch (current default):
model (ONNX artifact) -> flink -> clickhouse.taxi_predictions
Optional branches:
- kafka -> kafka connect s3 sink -> S3
- prometheus + kafka-exporter + grafana
Documentation:
- Runtime runbook index (scenario split): docs/runbooks/runtime.md
- Single-machine from scratch: docs/runbooks/runtime-single-machine-from-scratch.md
- Distributed from scratch: docs/runbooks/runtime-distributed-from-scratch.md
- Validation checklist: docs/runbooks/validation.md
- Overall architecture overview: docs/system-architecture.md
- Config model + runtime ownership invariants: config/EXPLANATION.md
- Refactor history: docs/history/demolish-ops.md, docs/history/merge-issues-2026-02-04.md
Configuration conventions:
- config/.env manages IP/PORT values only.
- Non-network values (topics, tuning, tables, etc.) are hardcoded in each component's compose file.
- Run each component's compose file directly (there is no root launcher).
- Always pass --env-file config/.env when running docker compose.
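To illustrate the IP/PORT-only convention, a config/.env might look like the sketch below. The variable names and addresses here are hypothetical; the actual names are whatever the component compose files reference.

```shell
# Hypothetical sketch of an IP/PORT-only env file.
# Actual variable names are defined by the compose files, not this README.
KAFKA_HOST=10.0.0.11
KAFKA_PORT=9092
CLICKHOUSE_HOST=10.0.0.12
CLICKHOUSE_PORT=8123
NGINX_HOST=10.0.0.10
NGINX_PORT=8080
```

Anything that is not an address or a port (topic names, table DDL, tuning knobs) stays out of this file by design.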
Single-machine:

```shell
cp config/.env.single-machine config/.env
# Run each command in a separate terminal (foreground mode)
docker compose -f infra/kafka/docker-compose.yml --env-file config/.env up
docker compose -f infra/clickhouse/docker-compose.yml --env-file config/.env up
docker compose -f services/ingestor/docker-compose.yml --env-file config/.env up --build
docker compose -f infra/nginx/docker-compose.yml --env-file config/.env up
docker compose -f infra/flink/docker-compose.yml --env-file config/.env up --build
cd services/generator
./build/generate
```

Distributed commands and machine-by-machine startup order are in docs/runbooks/runtime-distributed-from-scratch.md.
- flink-job: build with JDK 17+, target bytecode Java 11 (Flink 1.17.2 compatibility)
- ingestor: JDK 17+ (Spring Boot 3.2)
- generator: C++ toolchain (Conan + CMake)
- Python workspace: data/ with uv (data/pyproject.toml, data/uv.lock)
flink-job build:

```shell
cd services/flink-job
mvn clean package
```

Version Compatibility:
| Component | Version | Notes |
|---|---|---|
| Flink runtime | 1.17.2 | Docker image: flink:1.17.2-scala_2.12 |
| Target bytecode | Java 11 | Matches Flink 1.17.2 runtime |
| flink-connector-jdbc | 3.1.1-1.17 | Must match Flink version (not 1.18) |
| flink-connector-kafka | 1.17.2 | Must match Flink version |
| Lombok | 1.18.30+ | Required if building with JDK 21+ |
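The table above implies pins along the following lines in the flink-job pom.xml. This is a sketch built from the table, not the actual file; the Maven coordinates are the standard Apache Flink and Lombok ones.

```xml
<!-- Sketch: version pins matching the compatibility table above. -->
<properties>
  <!-- Target bytecode Java 11 to match the Flink 1.17.2 runtime -->
  <maven.compiler.source>11</maven.compiler.source>
  <maven.compiler.target>11</maven.compiler.target>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>1.17.2</version>
  </dependency>
  <dependency>
    <!-- Externalized connector versioning: 3.1.1-1.17, not 1.18 -->
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-jdbc</artifactId>
    <version>3.1.1-1.17</version>
  </dependency>
  <dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.30</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```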
Common issues:
- `NoSuchFieldError: JCTree$JCImport` → Lombok version too old for your JDK. Upgrade Lombok.
- `ClassNotFoundException` at runtime → connector version mismatch with the Flink runtime.
ingestor build:

```shell
cd services/ingestor
./gradlew build
# or via Docker
docker build -t ingestor .
```

Requires JDK 17+ (Spring Boot 3.2).
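Since the ingestor requires JDK 17+, its build.gradle presumably pins the Java toolchain along these lines. This is a sketch of the standard Gradle toolchain mechanism, not a copy of the actual build file.

```groovy
// Sketch: pinning the Gradle Java toolchain to 17 for Spring Boot 3.2.
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(17)
    }
}
```

With a toolchain pin, the build resolves (or downloads) a matching JDK regardless of which JDK launched Gradle itself.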