tsdb: Refactor Head.Init(...) for fast startup#18442
tsdb: Refactor Head.Init(...) for fast startup#18442RushabhMehta2005 wants to merge 6 commits intoprometheus:mainfrom
Head.Init(...) for fast startup#18442Conversation
Signed-off-by: Rushabh Mehta <[email protected]>
bboreham
left a comment
There was a problem hiding this comment.
Seems like a plausible direction; take a look at the CI failures and see if there is anything you can clean up there.
| // Wait for background WAL replay to finish before shutdown. | ||
| h.walReplayWg.Wait() |
There was a problem hiding this comment.
Is that a good idea? WAL replay can take ten minutes; if someone requested to shut down do we want to wait that long?
Could it be cancelable?
There was a problem hiding this comment.
I am waiting here for it to complete as it is the simplest thing to do. What if we kill that goroutine when Close is called? Could it lead to wal file corruption or mess up some other state? I am not sure of the answer to these questions.
| // loadChunkSnapshot replays the chunk snapshot and restores the Head state from it. If there was any error returned, | ||
| // it is the responsibility of the caller to clear the contents of the Head. | ||
| func (h *Head) loadChunkSnapshot() (int, int, map[chunks.HeadSeriesRef]*memSeries, error) { | ||
| func (h *Head) loadChunkSnapshot(multiRef map[chunks.HeadSeriesRef]chunks.HeadSeriesRef, multiRefMtx *sync.Mutex) (int, int, map[chunks.HeadSeriesRef]*memSeries, error) { |
There was a problem hiding this comment.
Consider adding the new parameters to the docstring comment.
There was a problem hiding this comment.
Currently I am not sure if this is the right approach to make loadChunkSnapshot access multiRef (and that too, safely), I think it would be better to update the docstrings once its set in stone.
| if h.opts.EnableFastStartup { | ||
| state, err := h.readSeriesStateFile() |
There was a problem hiding this comment.
Style point: this if-statement is nearly the entire function; could it be two different functions with and without fast startup?
There was a problem hiding this comment.
Ganesh said once we are done with this we would want to remove this if entirely, making fast startup the default behaviour, hence it is kept as so for now.
|
Thanks for the review Bryan, will fix these. |
|
It seems to be hard to make sense of these CI failures, still will look into it. |
Signed-off-by: Rushabh Mehta <[email protected]>
…s happening Signed-off-by: Rushabh Mehta <[email protected]>
Signed-off-by: Rushabh Mehta <[email protected]>
Signed-off-by: Rushabh Mehta <[email protected]>
Signed-off-by: Rushabh Mehta <[email protected]>
Head.Init(...)has to be refactored to make sure it does not block onHead.loadWAL(...). The latter is launched in its own goroutine, there is a check added inHead.Close(...)as well, which makes sure the WAL loading is finished before attempting a graceful shutdown, if it hasn't, we wait for it. This makes the WAL replay run in parallel with scraping.Which issue(s) does the PR fix:
NA
Release notes for end users (ALL commits must be considered).
Reviewers should verify clarity and quality.