System Architecture for Convert and CDC Workloads

Understand how control plane, data plane, and monitoring layers work together for migration and replication across MySQL, PostgreSQL, files, and S3.

Architecture summary

  • Sub-second CDC latency from transaction logs
  • At-least-once delivery with automatic deduplication
  • Horizontally scalable writers with concurrent worker pools
  • No ZooKeeper, no Kafka cluster, no JVM tuning
WHY NOT KAFKA

Why This Architecture Exists

Kafka is powerful — but for database-to-database streaming, it brings operational weight that most teams don't need. DBConvert Streams takes a different approach.

Traditional Approach

  • Separate Kafka cluster to deploy and operate
  • ZooKeeper or KRaft for coordination
  • Partition rebalancing on scaling events
  • JVM heap tuning and GC optimization
  • Separate Connect workers for source and sink

DBConvert Streams

  • Embedded NATS JetStream — starts with the app
  • No external coordination service needed
  • Desktop package or one-line Docker setup — no JVM tuning
  • Reader and Writer built in — no plugins to configure
OVERVIEW

System Architecture

Three planes — control, data, and monitoring — work independently for maximum reliability.

Control Plane

UI + API manage stream lifecycle — create, start, pause, stop

Data Plane

Reader captures events, NATS brokers them, Writers apply in parallel

Monitoring

Metrics collected independently — no impact on data throughput
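
The separation between the data-plane roles can be sketched in miniature: a reader batches rows onto a broker queue and a writer drains it independently. This is an illustrative sketch only; a plain `queue.Queue` stands in for NATS JetStream, and the names `reader`/`writer` are ours, not the product's API.

```python
import queue
import threading

def reader(source_rows, broker, batch_size=2):
    """Batch source rows and publish them to the broker (stands in for NATS JetStream)."""
    for i in range(0, len(source_rows), batch_size):
        broker.put(source_rows[i:i + batch_size])
    broker.put(None)  # sentinel: no more batches

def writer(broker, target):
    """Consume batches and apply them to the target until the sentinel arrives."""
    while True:
        batch = broker.get()
        if batch is None:
            break
        target.extend(batch)

broker = queue.Queue()
target = []
t = threading.Thread(target=writer, args=(broker, target))
t.start()
reader([1, 2, 3, 4, 5], broker)
t.join()
```

Because the broker sits between them, the reader never blocks on a slow target and the writer never blocks on a slow source.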

COMPONENTS

Core Components

Four components work together to move data reliably between any supported databases.

Source Reader

Connects to source databases and captures data changes.

  • CDC mode: reads from transaction logs (MySQL binlog, PostgreSQL WAL)
  • Convert mode: direct table reads with chunked processing
  • Publishes batched events to NATS JetStream
  • Automatic schema detection and type mapping
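
Chunked processing in Convert mode can be illustrated with keyset pagination: each query fetches the next chunk after the last key seen, so no query ever loads the whole table. This is a hedged sketch using SQLite for demonstration, not the product's actual reader code; the function and table names are ours.

```python
import sqlite3

def read_chunks(conn, table, key, chunk_size):
    """Yield rows in fixed-size chunks using keyset pagination."""
    last = None
    while True:
        if last is None:
            rows = conn.execute(
                f"SELECT {key} FROM {table} ORDER BY {key} LIMIT ?",
                (chunk_size,)).fetchall()
        else:
            rows = conn.execute(
                f"SELECT {key} FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last, chunk_size)).fetchall()
        if not rows:
            return
        yield rows
        last = rows[-1][0]  # resume after the last key of this chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 8)])
chunks = list(read_chunks(conn, "t", "id", 3))
```

Keyset pagination keeps each chunk query cheap even on very large tables, because it seeks by index rather than scanning an ever-growing OFFSET.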

NATS JetStream

Embedded message broker that decouples readers from writers.

  • Runs embedded — no external Kafka or RabbitMQ to manage
  • Persistent streams with at-least-once delivery
  • Thread-safe deduplication via composite batch keys
  • Unacknowledged messages are automatically redelivered on failure
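
The at-least-once contract can be simulated in a few lines: a batch stays pending until the writer acknowledges it, and unacknowledged batches go back on the queue. This is a toy model of the behavior described above, not JetStream's implementation; the `deliver` and `flaky_writer` names are illustrative.

```python
def deliver(batches, writer, max_redeliveries=3):
    """At-least-once delivery: a batch stays pending until the writer acks it."""
    pending = list(batches)
    delivered = []
    attempts = 0
    while pending and attempts < max_redeliveries * len(batches):
        batch = pending.pop(0)
        attempts += 1
        if writer(batch):        # writer returns True to acknowledge
            delivered.append(batch)
        else:                    # no ack: put it back for redelivery
            pending.append(batch)
    return delivered

calls = {"n": 0}
def flaky_writer(batch):
    calls["n"] += 1
    return calls["n"] != 1       # fail only the very first delivery

result = deliver(["b1", "b2"], flaky_writer)
```

Note that at-least-once means a batch may arrive twice after a failure, which is why deduplication (below) is part of the design.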

Target Writers

Consume events and write to destination systems in parallel.

  • Horizontally scalable — add more writers for higher throughput
  • Concurrent worker pool per writer instance
  • Transactional writes with automatic retry on failure
  • Supports MySQL, PostgreSQL, CSV, Parquet, and S3 (Snowflake coming soon)
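
A concurrent worker pool per writer instance might look like the following sketch, where a thread pool applies batches in parallel and a lock stands in for a transactional write. This is an assumption-laden illustration in Python, not the product's writer code.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

lock = threading.Lock()
target = []

def write_batch(batch):
    """Apply one batch to the target; the lock stands in for a transaction."""
    with lock:
        target.extend(batch)

batches = [[1, 2], [3, 4], [5, 6], [7, 8]]
with ThreadPoolExecutor(max_workers=4) as pool:  # concurrent worker pool
    list(pool.map(write_batch, batches))
```

Scaling horizontally then amounts to running more such writer instances against the same stream, each with its own pool.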

API Server

Central control plane for managing streams and connections.

  • RESTful API for stream lifecycle (create, start, pause, stop)
  • Real-time metrics and progress reporting
  • Connection and credential management
  • Authentication via API keys
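
A stream definition submitted to the API might look roughly like the JSON below. The field names here are illustrative assumptions, not the actual request schema; consult the API reference for the real shape.

```json
{
  "name": "orders-replication",
  "mode": "cdc",
  "source": "mysql-prod",
  "target": "postgres-analytics",
  "tables": ["orders", "order_items"]
}
```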
RELIABILITY

Delivery Guarantees

Every event is captured, delivered, and written: at-least-once delivery plus batch deduplication yields effectively exactly-once writes to the target.

At-Least-Once Delivery

Every event published to NATS JetStream is persisted until acknowledged by the writer. No silent data loss.

Batch Deduplication

Writers track dispatched batch IDs in memory. If NATS redelivers a message, the writer recognizes it and skips the duplicate — no double-writes.
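
The dedup check reduces to a set membership test on a composite batch key. A minimal single-threaded sketch follows; the class and key format are ours, and a real implementation would add locking to make the check thread-safe, as the components section notes.

```python
class BatchDeduplicator:
    """Track batch keys already dispatched; skip redelivered duplicates."""
    def __init__(self):
        self.seen = set()

    def should_write(self, stream, table, batch_no):
        key = f"{stream}:{table}:{batch_no}"   # composite batch key
        if key in self.seen:
            return False                       # duplicate redelivery: skip
        self.seen.add(key)
        return True

dedup = BatchDeduplicator()
first = dedup.should_write("s1", "orders", 42)
second = dedup.should_write("s1", "orders", 42)  # simulated redelivery
```

The first delivery is written; the redelivered copy is recognized and dropped, so at-least-once transport never becomes a double-write.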

Transaction Boundaries

CDC mode preserves operation order within transactions. INSERT, UPDATE, and DELETE arrive in the correct sequence.

Automatic Redelivery

If a writer fails to acknowledge a message, JetStream redelivers it automatically. No manual intervention, no lost events.

DATA FLOW

How Data Moves

From source to target in three phases — initialization, transfer, and monitoring.

1. Initialization

The API server validates the stream configuration and coordinates startup across components.

  • API creates a NATS JetStream stream and consumer groups
  • Source Reader connects to the source database and detects schema
  • Target Writers subscribe to the NATS stream and prepare target tables

2. Data Transfer

Data flows from source through the event hub to target writers in parallel batches.

  • Reader captures data (CDC events or table rows) and batches them
  • Batches are published to NATS JetStream with unique message IDs
  • Writers consume batches in parallel and write to targets
  • Each batch is acknowledged only after successful write

3. Monitoring & Completion

Real-time statistics track progress, and the system detects completion automatically.

  • Metrics collector aggregates row counts, throughput, and error rates
  • Statistics are available via API and displayed in the dashboard
  • For conversion mode, the stream completes when all data is transferred
  • For CDC mode, the stream runs continuously until paused or stopped
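
Aggregating per-batch reports into the dashboard numbers is simple arithmetic, sketched below under our own assumed report shape (row count plus a timestamp); the real collector's schema may differ.

```python
def aggregate(events):
    """Fold per-batch reports into totals and throughput (rows/sec)."""
    rows = sum(e["rows"] for e in events)
    errors = sum(e.get("errors", 0) for e in events)
    elapsed = max(e["t"] for e in events) - min(e["t"] for e in events)
    return {"rows": rows, "errors": errors,
            "rows_per_sec": rows / elapsed if elapsed else float(rows)}

stats = aggregate([
    {"rows": 500, "t": 0.0},
    {"rows": 700, "t": 2.0, "errors": 1},
])
```

Because the collector only reads these reports, it adds no work to the data path itself.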
RESILIENCE

Failure & Recovery

Built to handle interruptions gracefully — failed writes roll back, unprocessed messages get redelivered.

Persistent Event Log

JetStream persists all events until writers acknowledge them. If a writer crashes, unacknowledged events are redelivered automatically.

Automatic Redelivery

When a writer fails to process a batch, it sends a Nak to JetStream, which redelivers the message. Connection and DNS errors are retried with backoff.
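
Retry with exponential backoff is a standard pattern; a sketch under our own names (`retry_with_backoff`, `flaky_write`) is below. The real writer's attempt counts and delays are not specified here, so the numbers are illustrative.

```python
def retry_with_backoff(write, attempts=4, base_delay=0.5):
    """Retry a failing write, doubling the delay each time; re-raise at the end."""
    delays = []
    for attempt in range(attempts):
        try:
            return write(), delays
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            delays.append(base_delay * 2 ** attempt)  # 0.5, 1.0, 2.0, ...
            # a real writer would time.sleep(delays[-1]) here

state = {"n": 0}
def flaky_write():
    state["n"] += 1
    if state["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result, delays = retry_with_backoff(flaky_write)
```

Transient connection and DNS failures are absorbed by the backoff; persistent failures surface as an error after the final attempt.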

Pause & Resume

Running streams can be paused and resumed without losing progress. CDC mode tracks binlog/WAL position so replication continues from where it left off.
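
Position tracking is what makes pause and resume lossless: events at or before the committed position are skipped on resume. The sketch below models a binlog/WAL offset with a plain integer and an in-memory checkpoint; names and storage are our assumptions.

```python
class Checkpoint:
    """Remember the last applied log position; a real store persists this durably."""
    def __init__(self):
        self.position = None

    def commit(self, position):
        self.position = position

def replicate(events, checkpoint):
    """Apply events after the saved position, committing as we go."""
    applied = []
    for pos, change in events:
        if checkpoint.position is not None and pos <= checkpoint.position:
            continue               # already applied before the pause
        applied.append(change)
        checkpoint.commit(pos)
    return applied

cp = Checkpoint()
log = [(1, "INSERT"), (2, "UPDATE"), (3, "DELETE")]
first = replicate(log[:2], cp)     # stream paused after position 2
resumed = replicate(log, cp)       # resume: positions 1 and 2 are skipped
```

On resume the stream replays from the full log but only applies what the checkpoint has not yet seen.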

No State Corruption

Writers are stateless. A failed write does not leave partial data — transactions roll back cleanly on error.

MODES

Two Modes of Operation

Continuous replication or deterministic batch migration — same engine, different strategies.

CDC Mode

Continuous replication

Real-time streaming from database transaction logs. Captures every INSERT, UPDATE, and DELETE as it happens.

  • MySQL binlog and PostgreSQL WAL support
  • Continuous streaming — runs until stopped
  • Sub-second latency for change propagation
  • Preserves operation order and transaction boundaries

Convert Mode

Deterministic batch migration

Bulk data transfer via direct table reads. Ideal for one-time migrations and periodic syncs.

  • Efficient chunked reads for large tables
  • Automatic schema conversion between database types
  • Parallel processing across multiple tables
  • Completes automatically when all data is transferred
DEPLOYMENT

Self-Hosted by Design

Runs inside your VPC. No SaaS dependency. No data leaves your infrastructure. Same architecture across every environment.

  • Desktop App — all-in-one binary
  • Docker — Docker Compose stack
  • Cloud — AWS, GCP, Azure, DO
  • On-Premises — inside your VPC

Simplify CDC infrastructure

Deploy without Kafka, then manage CDC replication and monitoring from the same workflow.