System Architecture for Convert and CDC Workloads

Understand how control plane, data plane, and monitoring layers work together for migration and replication across MySQL, PostgreSQL, files, and S3.

Architecture summary

  • Sub-second CDC latency from transaction logs
  • At-least-once delivery with automatic deduplication
  • Horizontally scalable writers with concurrent worker pools
  • No ZooKeeper, no Kafka cluster, no JVM tuning
WHY NOT KAFKA

Why This Architecture Exists

Kafka is powerful — but for database-to-database streaming, it brings operational weight that most teams don't need. DBConvert Streams takes a different approach.

Traditional Approach

  • Separate Kafka cluster to deploy and operate
  • ZooKeeper or KRaft for coordination
  • Partition rebalancing on scaling events
  • JVM heap tuning and GC optimization
  • Separate Connect workers for source and sink

DBConvert Streams

  • Embedded NATS JetStream — starts with the app
  • No external coordination service needed
  • Desktop package or one-line Docker setup — no JVM tuning
  • Reader and Writer built in — no plugins to configure
OVERVIEW

System Architecture

Three planes — control, data, and monitoring — work independently for maximum reliability.

Control Plane

UI + API manage stream lifecycle — create, start, pause, stop

Data Plane

Reader captures events, NATS brokers them, Writers apply in parallel

Monitoring

Metrics collected independently — no impact on data throughput
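
The separation between the data-plane roles can be sketched in miniature: a reader batches rows onto a broker queue and a writer drains it independently. This is an illustrative sketch only; a plain `queue.Queue` stands in for NATS JetStream, and the names `reader`/`writer` are ours, not the product's API.

```python
import queue
import threading

def reader(source_rows, broker, batch_size=2):
    """Batch source rows and publish them to the broker (stands in for NATS JetStream)."""
    for i in range(0, len(source_rows), batch_size):
        broker.put(source_rows[i:i + batch_size])
    broker.put(None)  # sentinel: no more batches

def writer(broker, target):
    """Consume batches and apply them to the target until the sentinel arrives."""
    while True:
        batch = broker.get()
        if batch is None:
            break
        target.extend(batch)

broker = queue.Queue()
target = []
t = threading.Thread(target=writer, args=(broker, target))
t.start()
reader([1, 2, 3, 4, 5], broker)
t.join()
```

Because the broker sits between them, the reader never blocks on a slow target and the writer never blocks on a slow source.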

COMPONENTS

Core Components

Four components work together to move data reliably between any supported databases.

Source Reader

Connects to source databases and captures data changes.

  • CDC mode: reads from transaction logs (MySQL binlog, PostgreSQL WAL)
  • Convert mode: direct table reads with chunked processing
  • Publishes batched events to NATS JetStream
  • Automatic schema detection and type mapping
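
Chunked processing in Convert mode can be illustrated with keyset pagination: each query fetches the next chunk after the last key seen, so no query ever loads the whole table. This is a hedged sketch using SQLite for demonstration, not the product's actual reader code; the function and table names are ours.

```python
import sqlite3

def read_chunks(conn, table, key, chunk_size):
    """Yield rows in fixed-size chunks using keyset pagination."""
    last = None
    while True:
        if last is None:
            rows = conn.execute(
                f"SELECT {key} FROM {table} ORDER BY {key} LIMIT ?",
                (chunk_size,)).fetchall()
        else:
            rows = conn.execute(
                f"SELECT {key} FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last, chunk_size)).fetchall()
        if not rows:
            return
        yield rows
        last = rows[-1][0]  # resume after the last key of this chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 8)])
chunks = list(read_chunks(conn, "t", "id", 3))
```

Keyset pagination keeps each chunk query cheap even on very large tables, because it seeks by index rather than scanning an ever-growing OFFSET.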

NATS JetStream

Embedded message broker that decouples readers from writers.

  • Runs embedded — no external Kafka or RabbitMQ to manage
  • Persistent streams with at-least-once delivery
  • Thread-safe deduplication via composite batch keys
  • Unacknowledged messages are automatically redelivered on failure
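
The at-least-once contract can be simulated in a few lines: a batch stays pending until the writer acknowledges it, and unacknowledged batches go back on the queue. This is a toy model of the behavior described above, not JetStream's implementation; the `deliver` and `flaky_writer` names are illustrative.

```python
def deliver(batches, writer, max_redeliveries=3):
    """At-least-once delivery: a batch stays pending until the writer acks it."""
    pending = list(batches)
    delivered = []
    attempts = 0
    while pending and attempts < max_redeliveries * len(batches):
        batch = pending.pop(0)
        attempts += 1
        if writer(batch):        # writer returns True to acknowledge
            delivered.append(batch)
        else:                    # no ack: put it back for redelivery
            pending.append(batch)
    return delivered

calls = {"n": 0}
def flaky_writer(batch):
    calls["n"] += 1
    return calls["n"] != 1       # fail only the very first delivery

result = deliver(["b1", "b2"], flaky_writer)
```

Note that at-least-once means a batch may arrive twice after a failure, which is why deduplication (below) is part of the design.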

Target Writers

Consume events and write to destination systems in parallel.

  • Horizontally scalable — add more writers for higher throughput
  • Concurrent worker pool per writer instance
  • Transactional writes with automatic retry on failure
  • Supports MySQL, PostgreSQL, CSV, Parquet, and S3 (Snowflake coming soon)
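
A concurrent worker pool per writer instance might look like the following sketch, where a thread pool applies batches in parallel and a lock stands in for a transactional write. This is an assumption-laden illustration in Python, not the product's writer code.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

lock = threading.Lock()
target = []

def write_batch(batch):
    """Apply one batch to the target; the lock stands in for a transaction."""
    with lock:
        target.extend(batch)

batches = [[1, 2], [3, 4], [5, 6], [7, 8]]
with ThreadPoolExecutor(max_workers=4) as pool:  # concurrent worker pool
    list(pool.map(write_batch, batches))
```

Scaling horizontally then amounts to running more such writer instances against the same stream, each with its own pool.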

API Server

Central control plane for managing streams and connections.

  • RESTful API for stream lifecycle (create, start, pause, stop)
  • Real-time metrics and progress reporting
  • Connection and credential management
  • Authentication via API keys
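
A stream definition submitted to the API might look roughly like the JSON below. The field names here are illustrative assumptions, not the actual request schema; consult the API reference for the real shape.

```json
{
  "name": "orders-replication",
  "mode": "cdc",
  "source": "mysql-prod",
  "target": "postgres-analytics",
  "tables": ["orders", "order_items"]
}
```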
RELIABILITY

Delivery Guarantees

Every event is captured, delivered, and written: at-least-once delivery plus batch deduplication yields effectively exactly-once writes to the target.

At-Least-Once Delivery

Every event published to NATS JetStream is persisted until acknowledged by the writer. No silent data loss.

Batch Deduplication

Writers track dispatched batch IDs in memory. If NATS redelivers a message, the writer recognizes it and skips the duplicate — no double-writes.
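
The dedup check reduces to a set membership test on a composite batch key. A minimal single-threaded sketch follows; the class and key format are ours, and a real implementation would add locking to make the check thread-safe, as the components section notes.

```python
class BatchDeduplicator:
    """Track batch keys already dispatched; skip redelivered duplicates."""
    def __init__(self):
        self.seen = set()

    def should_write(self, stream, table, batch_no):
        key = f"{stream}:{table}:{batch_no}"   # composite batch key
        if key in self.seen:
            return False                       # duplicate redelivery: skip
        self.seen.add(key)
        return True

dedup = BatchDeduplicator()
first = dedup.should_write("s1", "orders", 42)
second = dedup.should_write("s1", "orders", 42)  # simulated redelivery
```

The first delivery is written; the redelivered copy is recognized and dropped, so at-least-once transport never becomes a double-write.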

Transaction Boundaries

CDC mode preserves operation order within transactions. INSERT, UPDATE, and DELETE arrive in the correct sequence.

Automatic Redelivery

If a writer fails to acknowledge a message, JetStream redelivers it automatically. No manual intervention, no lost events.

DATA FLOW

How Data Moves

From source to target in three phases — initialization, transfer, and monitoring.

1. Initialization

The API server validates the stream configuration and coordinates startup across components.

  • API creates a NATS JetStream stream and consumer groups
  • Source Reader connects to the source database and detects schema
  • Target Writers subscribe to the NATS stream and prepare target tables

2. Data Transfer

Data flows from source through the event hub to target writers in parallel batches.

  • Reader captures data (CDC events or table rows) and batches them
  • Batches are published to NATS JetStream with unique message IDs
  • Writers consume batches in parallel and write to targets
  • Each batch is acknowledged only after successful write

3. Monitoring & Completion

Real-time statistics track progress, and the system detects completion automatically.

  • Metrics collector aggregates row counts, throughput, and error rates
  • Statistics are available via API and displayed in the dashboard
  • For conversion mode, the stream completes when all data is transferred
  • For CDC mode, the stream runs continuously until paused or stopped
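
Aggregating per-batch reports into the dashboard numbers is simple arithmetic, sketched below under our own assumed report shape (row count plus a timestamp); the real collector's schema may differ.

```python
def aggregate(events):
    """Fold per-batch reports into totals and throughput (rows/sec)."""
    rows = sum(e["rows"] for e in events)
    errors = sum(e.get("errors", 0) for e in events)
    elapsed = max(e["t"] for e in events) - min(e["t"] for e in events)
    return {"rows": rows, "errors": errors,
            "rows_per_sec": rows / elapsed if elapsed else float(rows)}

stats = aggregate([
    {"rows": 500, "t": 0.0},
    {"rows": 700, "t": 2.0, "errors": 1},
])
```

Because the collector only reads these reports, it adds no work to the data path itself.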
RESILIENCE

Failure & Recovery

Built to handle interruptions gracefully — failed writes roll back, unprocessed messages get redelivered.

Persistent Event Log

JetStream persists all events until writers acknowledge them. If a writer crashes, unacknowledged events are redelivered automatically.

Automatic Redelivery

When a writer fails to process a batch, it sends a Nak to JetStream, which redelivers the message. Connection and DNS errors are retried with backoff.
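
Retry with exponential backoff is a standard pattern; a sketch under our own names (`retry_with_backoff`, `flaky_write`) is below. The real writer's attempt counts and delays are not specified here, so the numbers are illustrative.

```python
def retry_with_backoff(write, attempts=4, base_delay=0.5):
    """Retry a failing write, doubling the delay each time; re-raise at the end."""
    delays = []
    for attempt in range(attempts):
        try:
            return write(), delays
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            delays.append(base_delay * 2 ** attempt)  # 0.5, 1.0, 2.0, ...
            # a real writer would time.sleep(delays[-1]) here

state = {"n": 0}
def flaky_write():
    state["n"] += 1
    if state["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result, delays = retry_with_backoff(flaky_write)
```

Transient connection and DNS failures are absorbed by the backoff; persistent failures surface as an error after the final attempt.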

Pause & Resume

Running streams can be paused and resumed without losing progress. CDC mode tracks binlog/WAL position so replication continues from where it left off.
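
Position tracking is what makes pause and resume lossless: events at or before the committed position are skipped on resume. The sketch below models a binlog/WAL offset with a plain integer and an in-memory checkpoint; names and storage are our assumptions.

```python
class Checkpoint:
    """Remember the last applied log position; a real store persists this durably."""
    def __init__(self):
        self.position = None

    def commit(self, position):
        self.position = position

def replicate(events, checkpoint):
    """Apply events after the saved position, committing as we go."""
    applied = []
    for pos, change in events:
        if checkpoint.position is not None and pos <= checkpoint.position:
            continue               # already applied before the pause
        applied.append(change)
        checkpoint.commit(pos)
    return applied

cp = Checkpoint()
log = [(1, "INSERT"), (2, "UPDATE"), (3, "DELETE")]
first = replicate(log[:2], cp)     # stream paused after position 2
resumed = replicate(log, cp)       # resume: positions 1 and 2 are skipped
```

On resume the stream replays from the full log but only applies what the checkpoint has not yet seen.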

No State Corruption

Writers are stateless. A failed write does not leave partial data — transactions roll back cleanly on error.

MODES

Two Modes of Operation

Continuous replication or deterministic batch migration — same engine, different strategies.

CDC Mode

Continuous replication

Real-time streaming from database transaction logs. Captures every INSERT, UPDATE, and DELETE as it happens.

  • MySQL binlog and PostgreSQL WAL support
  • Continuous streaming — runs until stopped
  • Sub-second latency for change propagation
  • Preserves operation order and transaction boundaries

Convert Mode

Deterministic batch migration

Bulk data transfer via direct table reads. Ideal for one-time migrations and periodic syncs.

  • Efficient chunked reads for large tables
  • Automatic schema conversion between database types
  • Parallel processing across multiple tables
  • Completes automatically when all data is transferred
DEPLOYMENT

Self-Hosted by Design

Runs inside your VPC. No SaaS dependency. No data leaves your infrastructure. Same architecture across every environment.

  • Desktop App — all-in-one binary
  • Docker — Docker Compose stack
  • Cloud — AWS, GCP, Azure, DO
  • On-Premises — inside your VPC

Simplify CDC infrastructure

Deploy without Kafka, then manage CDC replication and monitoring from the same workflow.