tsdb: use defer to release locks in mmapHeadChunks by roidelapluie · Pull Request #18422 · prometheus/prometheus

roidelapluie · 2026-04-01T15:43:39Z

handleChunkWriteError panics on chunk disk mapper errors, which occurs in TestDiskFillingUpAfterDisablingOOO. Without defer, the series mutex held inside mmapHeadChunks was not released during panic unwinding. A subsequent Close() call (e.g. from t.Cleanup) also calls mmapHeadChunks and deadlocks trying to re-acquire the same lock.

Extract the per-stripe and per-series iterations into dedicated helpers so that defer can release both the stripe read lock and the series mutex during panic unwinding.
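The locking pattern described above can be sketched in isolation. This is a minimal illustration, not the actual Prometheus code: the `series` and `stripe` types, the `failing` flag, and the helper names `mmapSeries`, `mmapStripe`, `mmapAll`, and `tryMmapAll` are all hypothetical stand-ins. The point it shows is that once each lock is taken in its own function, a deferred unlock runs even when a panic unwinds through it, so a later pass over the same series can take the locks again.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// series is a hypothetical stand-in for an in-memory series.
type series struct {
	mtx     sync.Mutex
	pending int  // head chunks waiting to be m-mapped
	failing bool // simulates a chunk write error
}

// stripe is a hypothetical stand-in for one shard of the series map.
type stripe struct {
	mtx     sync.RWMutex
	members []*series
}

var errDiskFull = errors.New("disk full")

// mmapSeries handles a single series. The deferred Unlock runs on a
// normal return and also while a panic unwinds through this frame.
func mmapSeries(s *series) int {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	if s.failing {
		panic(errDiskFull) // mimics an error handler that panics
	}
	n := s.pending
	s.pending = 0
	return n
}

// mmapStripe handles a single stripe under its read lock.
func mmapStripe(st *stripe) int {
	st.mtx.RLock()
	defer st.mtx.RUnlock()
	total := 0
	for _, s := range st.members {
		total += mmapSeries(s)
	}
	return total
}

// mmapAll walks every stripe and returns the number of chunks handled.
func mmapAll(stripes []*stripe) int {
	total := 0
	for _, st := range stripes {
		total += mmapStripe(st)
	}
	return total
}

// tryMmapAll recovers the panic, roughly as a test harness would.
func tryMmapAll(stripes []*stripe) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return mmapAll(stripes), nil
}

func main() {
	bad := &series{pending: 2, failing: true}
	stripes := []*stripe{{members: []*series{{pending: 3}, bad}}}

	_, err := tryMmapAll(stripes)
	fmt.Println(err) // recovered: disk full

	// A second pass, as a cleanup call might make, does not block
	// because both locks were released during unwinding.
	bad.failing = false
	n, err := tryMmapAll(stripes)
	fmt.Println(n, err) // 2 <nil>
}
```

If the unlock calls were placed inline after the loop body instead of being deferred, the panic in `mmapSeries` would skip them and the second pass would block forever on `s.mtx.Lock()`.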

Update the test comment to reflect that the deadlock is now resolved.

See #17941

Which issue(s) does the PR fix:

Release notes for end users (ALL commits must be considered).

Reviewers should verify clarity and quality.

NONE

handleChunkWriteError panics on chunk disk mapper errors, which occurs under i386 in TestDiskFillingUpAfterDisablingOOO. Without defer, the series mutex held inside mmapHeadChunks was not released during panic unwinding. A subsequent Close() call (e.g. from t.Cleanup) also calls mmapHeadChunks and deadlocks trying to re-acquire the same lock.

Extract the per-stripe and per-series iterations into dedicated helpers so that defer can release both the stripe read lock and the series mutex during panic unwinding.

Update the test comment to reflect that the deadlock is now resolved.

See prometheus#17941

Signed-off-by: Julien Pivotto <[email protected]>

The test only writes ~80 samples, so the default 512MB chunk segment pre-allocation during compaction is unnecessary. Use 1MB instead to avoid large file allocations on constrained CI environments.

Signed-off-by: Julien Pivotto <[email protected]>

Running count=500 sequentially on Windows causes SetFileInformationByHandle and FlushFileBuffers to hang, due to I/O pressure from the 128MB head chunk file preallocation on Windows after hundreds of iterations. Use 3 parallel matrix workers with count=50 each to distribute the load while preserving coverage.

Signed-off-by: Julien Pivotto <[email protected]>

roidelapluie added 9 commits April 1, 2026 17:41

Focus tests

5737e8b

Signed-off-by: Julien Pivotto <[email protected]>

Test more...

d4769ea

Signed-off-by: Julien Pivotto <[email protected]>

ci: also run 500 tests on Windows

b944582

Signed-off-by: Julien Pivotto <[email protected]>

ci: also run 500 tests on Windows

a7815fe

Signed-off-by: Julien Pivotto <[email protected]>

ci: use -parallel 1 to serialize tests

2b8dc57

Signed-off-by: Julien Pivotto <[email protected]>

ci: add i386 workers for TestDiskFillingUpAfterDisablingOOO

4438728

Signed-off-by: Julien Pivotto <[email protected]>

roidelapluie wants to merge 9 commits into prometheus:main from roidelapluie:roidelapluie/tsdb-mmap-defer-locks
