fix: handle orphaned S3 segments gracefully in RestoreFromS3 and Read by klaudworks · Pull Request #114 · KafScale/platform

klaudworks · 2026-02-26T22:52:38Z

Motivation

Addresses #112

When a partition is first accessed, RestoreFromS3 rebuilds the in-memory segment index from S3. It lists all .kfs segment files, then downloads each corresponding .index file. If any .index download fails, the entire partition becomes permanently unavailable until someone manually deletes the orphan from S3.
Orphaned .kfs files are a normal byproduct of partial flush failures. uploadFlush uploads .kfs and .index in parallel — when .kfs succeeds but .index fails, the .kfs persists in S3 with no matching index.

Changes

No error classification → typed ErrNotFound error — DownloadIndex returned a plain fmt.Errorf string. The restore loop treated "key doesn't exist" and "S3 is temporarily down" identically — both killed the partition. Both S3 client implementations now return a wrapped ErrNotFound when the index key doesn't exist, so callers can distinguish the two cases.

Single-failure partition kill → metadata-aware orphan skip — One .index upload failure (S3 503, network blip) made the partition permanently unavailable. Now, when an .index is confirmed missing (ErrNotFound) and the segment was never acknowledged to a producer, it's skipped with a warning log. We know a segment was never acknowledged because the metadata store (etcd) tracks the committed offset boundary, and it's only updated after a fully successful flush — if the .index upload failed, the flush failed, and the committed offset was never advanced. Transient errors always propagate regardless, because we can't tell if the .index is truly missing or if S3 is just temporarily unreachable.

Read() returned ErrOffsetOutOfRange for gaps → snap forward — If an intermediate segment was skipped during restore, consumers got an error instead of data from the next available segment. Now Read snaps forward to the next available segment, matching what Kafka does.

last derived from all listed entries, not valid segments — After skipping orphans, the last offset could reference a segment whose .index was never loaded. Now derived from valid segments only, with an empty-segments guard.

MemoryS3Client and awsS3Client now wrap ErrNotFound via %w when the index key does not exist. The AWS client detects NoSuchKey/404 via isNotFoundErr() helper. This lets callers distinguish 'key missing' from transient S3 errors.

Adds an optional *slog.Logger field and a logger() helper that falls back to slog.Default(). Wired in cmd/broker/main.go.

When DownloadIndex returns ErrNotFound and the segment's base offset is >= l.nextOffset (the committed-offset boundary from etcd), skip the segment with a warning log. This handles orphaned .kfs files left by partial flush failures where .index upload failed. Transient errors (non-ErrNotFound) always propagate — we cannot distinguish an orphan from S3 being temporarily down. Also: remove redundant entry struct (reuse segmentRange), derive last offset from valid segments only, guard against len(segments) == 0.

When an orphaned segment is skipped during restore, consumers requesting offsets in the gap get ErrOffsetOutOfRange. Instead, snap forward to the next available segment — matching Kafka's LocalLog.read() behavior.

All tests use MaxBytes: 1 with 70-byte batches, so ShouldFlush triggers synchronously inside AppendBatch. The 2ms sleeps before Flush() were no-ops. Verified with 50x race-detector runs.

novatechflow

thanks again @klaudworks!

…KafScale#114)

klaudworks added 5 commits February 26, 2026 23:34

storage: add ErrNotFound sentinel for DownloadIndex

d2fa5fd

MemoryS3Client and awsS3Client now wrap ErrNotFound via %w when the index key does not exist. The AWS client detects NoSuchKey/404 via isNotFoundErr() helper. This lets callers distinguish 'key missing' from transient S3 errors.

storage: add Logger to PartitionLogConfig

8c72be8

Adds an optional *slog.Logger field and a logger() helper that falls back to slog.Default(). Wired in cmd/broker/main.go.

storage: snap forward in Read() when offset falls in a gap

59f8459

When an orphaned segment is skipped during restore, consumers requesting offsets in the gap get ErrOffsetOutOfRange. Instead, snap forward to the next available segment — matching Kafka's LocalLog.read() behavior.

test: remove dead time.Sleep calls before Flush

fdffa03

All tests use MaxBytes: 1 with 70-byte batches, so ShouldFlush triggers synchronously inside AppendBatch. The 2ms sleeps before Flush() were no-ops. Verified with 50x race-detector runs.

novatechflow approved these changes Feb 27, 2026

View reviewed changes

novatechflow merged commit 16f6140 into KafScale:main Feb 27, 2026
4 checks passed

kamir pushed a commit to kamir/kafscale that referenced this pull request Mar 5, 2026

fix: handle orphaned S3 segments gracefully in RestoreFromS3 and Read (…

039f2e0

…KafScale#114)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle orphaned S3 segments gracefully in RestoreFromS3 and Read#114

fix: handle orphaned S3 segments gracefully in RestoreFromS3 and Read#114
novatechflow merged 5 commits intoKafScale:mainfrom
klaudworks:fix-restore-missing-index

klaudworks commented Feb 26, 2026 •

edited

Loading

Uh oh!

novatechflow left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

klaudworks commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Uh oh!

novatechflow left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

klaudworks commented Feb 26, 2026 •

edited

Loading