Scrape: add string interning to scrape cache #18501
bobrik wants to merge 1 commit into prometheus:main
Conversation
```diff
 	if e.Help != string(help) {
-		e.Help = string(help)
+		e.help = unique.Make(string(help))
+		e.Help = e.help.Value()
```
This awkwardness avoids breaking changes in the public model, which exposes string fields.

Mostly opening as an RFC first; will fix CI once there's some consensus on this being a good idea.
A lot of memory is used by scrape cache:
```
(pprof) list addRef
Total: 99.23GB
ROUTINE ======================== github.com/prometheus/prometheus/scrape.(*scrapeCache).addRef in scrape/scrape.go
6.78GB 6.78GB (flat, cum) 6.83% of Total
. . 1085:func (c *scrapeCache) addRef(met []byte, ref storage.SeriesRef, lset labels.Labels, hash uint64) (ce *cacheEntry) {
. . 1086: if ref == 0 {
. . 1087: return nil
. . 1088: }
1.73GB 1.73GB 1089: ce = &cacheEntry{ref: ref, lastIter: c.iter, lset: lset, hash: hash}
5.04GB 5.04GB 1090: c.series[string(met)] = ce
. . 1091: return ce
. . 1092:}
. . 1093:
. . 1094:func (c *scrapeCache) addDropped(met []byte) {
. . 1095: iter := c.iter
```
Here `met` is a timeseries identity as it comes from the scrape, so it is very much shared
between instances. Instead of storing 255 copies of a string like this coming from every target:
```
edgeworker_request_internalExceptions{isActor="true",exceptionType="overloaded",stableId="cloudflare/cf_imgresize",plan="ss",cordon="paid"}
```
One can store 255 pointer-sized handles, which is a lot more compact.
I built a vanilla Prometheus binary and a patched binary and let them both scrape
the same targets in a datacenter with `sum(scrape_samples_scraped)` of about 10 million.
After running both in parallel for a while:
* Control:
```
(pprof) top30
Showing nodes accounting for 33.55GB, 94.28% of 35.58GB total
Dropped 702 nodes (cum <= 0.18GB)
Showing top 30 nodes out of 95
flat flat% sum% cum cum%
4.84GB 13.60% 13.60% 4.84GB 13.60% github.com/prometheus/prometheus/model/labels.(*Builder).Labels
4.10GB 11.53% 25.14% 4.10GB 11.53% github.com/prometheus/prometheus/scrape.(*scrapeCache).addRef
3.90GB 10.96% 36.09% 4.58GB 12.88% github.com/prometheus/prometheus/tsdb.newMemSeries (inline)
3.56GB 9.99% 46.09% 3.56GB 9.99% github.com/prometheus/prometheus/tsdb/chunkenc.NewXORChunk (inline)
2.99GB 8.40% 54.49% 2.99GB 8.40% github.com/prometheus/prometheus/tsdb/index.appendWithExponentialGrowth[go.shape.uint64] (inline)
2.41GB 6.78% 61.27% 2.41GB 6.78% github.com/prometheus/prometheus/tsdb.(*txRing).add
1.37GB 3.84% 65.11% 1.37GB 3.84% github.com/prometheus/prometheus/scrape.(*scrapeCache).trackStaleness
1.31GB 3.68% 68.79% 1.31GB 3.68% github.com/prometheus/prometheus/scrape.(*scrapeCache).addDropped
1.22GB 3.44% 72.23% 1.22GB 3.44% github.com/prometheus/prometheus/scrape.NewManager.func1
1.10GB 3.08% 75.31% 1.10GB 3.08% github.com/prometheus/prometheus/tsdb/chunkenc.(*XORChunk).Appender
1.08GB 3.05% 78.36% 5.76GB 16.19% github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk
0.91GB 2.56% 80.93% 0.92GB 2.58% github.com/prometheus/prometheus/tsdb.(*memSeries).mmapChunks
0.70GB 1.97% 82.90% 0.70GB 1.97% github.com/prometheus/prometheus/scrape.(*scrapeCache).setHelp
0.68GB 1.92% 84.82% 0.68GB 1.92% github.com/prometheus/prometheus/tsdb.newTxRing (inline)
0.56GB 1.59% 86.41% 0.56GB 1.59% github.com/prometheus/prometheus/tsdb.(*seriesHashmap).set
0.55GB 1.55% 87.95% 5.70GB 16.02% github.com/prometheus/prometheus/tsdb.(*stripeSeries).getOrSet
0.35GB 0.99% 88.94% 0.36GB 1.01% golang.org/x/net/trace.NewEventLog
0.28GB 0.78% 89.73% 0.28GB 0.78% github.com/prometheus/prometheus/tsdb.NewCircularExemplarStorage
0.26GB 0.72% 90.45% 0.26GB 0.72% github.com/prometheus/prometheus/model/labels.(*ScratchBuilder).Labels
0.25GB 0.71% 91.16% 0.25GB 0.71% bufio.NewReaderSize (inline)
0.23GB 0.66% 91.81% 0.23GB 0.66% bufio.NewWriterSize (inline)
0.20GB 0.57% 92.39% 0.20GB 0.57% github.com/prometheus/prometheus/tsdb.(*blockSeriesSet).At
0.18GB 0.51% 92.89% 0.90GB 2.52% github.com/prometheus/prometheus/promql.(*evaluator).rangeEval
0.12GB 0.34% 93.23% 0.47GB 1.33% github.com/prometheus/prometheus/promql.expandSeriesSet
0.11GB 0.3% 93.53% 8.67GB 24.38% github.com/prometheus/prometheus/tsdb.(*headAppender).Append
0.10GB 0.29% 93.82% 1.88GB 5.28% github.com/prometheus/prometheus/rules.(*Group).Eval.func1
0.05GB 0.15% 93.97% 1.38GB 3.89% github.com/prometheus/prometheus/promql.(*evaluator).eval
0.05GB 0.15% 94.12% 0.92GB 2.59% net/http.(*Transport).dialConn
0.04GB 0.11% 94.22% 8.57GB 24.10% github.com/prometheus/prometheus/tsdb.(*headAppender).getOrCreate
0.02GB 0.059% 94.28% 0.18GB 0.52% github.com/prometheus/prometheus/promql.(*evaluator).VectorBinop
```
* Test:
```
(pprof) top30
Showing nodes accounting for 28.31GB, 94.43% of 29.98GB total
Dropped 684 nodes (cum <= 0.15GB)
Showing top 30 nodes out of 96
flat flat% sum% cum cum%
4.41GB 14.72% 14.72% 4.41GB 14.72% github.com/prometheus/prometheus/model/labels.(*Builder).Labels
3.74GB 12.46% 27.18% 4.38GB 14.63% github.com/prometheus/prometheus/tsdb.newMemSeries (inline)
3.15GB 10.51% 37.69% 3.15GB 10.51% github.com/prometheus/prometheus/tsdb/chunkenc.NewXORChunk (inline)
2.98GB 9.93% 47.61% 2.98GB 9.93% github.com/prometheus/prometheus/tsdb/index.appendWithExponentialGrowth[go.shape.uint64] (inline)
1.93GB 6.42% 54.03% 1.93GB 6.42% github.com/prometheus/prometheus/tsdb.(*txRing).add
1.88GB 6.28% 60.31% 1.88GB 6.28% bytes.growSlice
1.63GB 5.45% 65.76% 1.64GB 5.46% github.com/prometheus/prometheus/scrape.(*scrapeCache).addRef
1.17GB 3.90% 69.66% 1.17GB 3.90% github.com/prometheus/prometheus/scrape.(*scrapeCache).trackStaleness
0.97GB 3.23% 72.89% 5.03GB 16.78% github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk
0.89GB 2.95% 75.84% 0.89GB 2.95% github.com/prometheus/prometheus/tsdb/chunkenc.(*XORChunk).Appender
0.68GB 2.28% 78.12% 0.71GB 2.36% github.com/prometheus/prometheus/tsdb.(*memSeries).mmapChunks
0.65GB 2.17% 80.29% 0.65GB 2.17% github.com/prometheus/prometheus/tsdb.newTxRing (inline)
0.56GB 1.88% 82.17% 5.50GB 18.35% github.com/prometheus/prometheus/tsdb.(*stripeSeries).getOrSet
0.55GB 1.84% 84.02% 0.55GB 1.84% github.com/prometheus/prometheus/tsdb.(*seriesHashmap).set
0.42GB 1.40% 85.42% 0.42GB 1.40% github.com/prometheus/prometheus/scrape.NewManager.func1
0.36GB 1.20% 86.61% 0.36GB 1.21% github.com/prometheus/prometheus/scrape.(*scrapeCache).setHelp
0.33GB 1.09% 87.70% 0.34GB 1.13% golang.org/x/net/trace.NewEventLog
0.28GB 0.92% 88.62% 0.28GB 0.92% github.com/prometheus/prometheus/tsdb.NewCircularExemplarStorage
0.26GB 0.88% 89.50% 0.26GB 0.88% github.com/prometheus/prometheus/scrape.(*scrapeCache).addDropped
0.24GB 0.8% 90.30% 0.24GB 0.8% bufio.NewReaderSize (inline)
0.24GB 0.79% 91.09% 0.24GB 0.79% bufio.NewWriterSize (inline)
0.20GB 0.68% 91.77% 8.95GB 29.86% github.com/prometheus/prometheus/tsdb.(*headAppender).Append
0.19GB 0.63% 92.40% 0.19GB 0.63% internal/stringslite.Clone
0.18GB 0.59% 92.99% 8.69GB 28.97% github.com/prometheus/prometheus/tsdb.(*headAppender).getOrCreate
0.17GB 0.58% 93.57% 0.17GB 0.58% github.com/prometheus/prometheus/tsdb/encoding.(*Encbuf).PutString
0.10GB 0.33% 93.90% 0.26GB 0.86% unique.(*canonMap[go.shape.string]).LoadOrStore
0.06GB 0.22% 94.11% 0.91GB 3.05% net/http.(*Transport).dialConn
0.04GB 0.13% 94.24% 0.38GB 1.27% github.com/prometheus/prometheus/promql.(*evaluator).eval
0.04GB 0.12% 94.37% 0.18GB 0.59% github.com/prometheus/prometheus/promql.expandSeriesSet
0.02GB 0.06% 94.43% 0.52GB 1.72% github.com/prometheus/prometheus/rules.(*Group).Eval.func1
```
Key changes:
* `addRef`: 4.10GB -> 1.63GB
* `addDropped`: 1.31GB -> 0.26GB
* `setHelp`: 0.70GB -> 0.36GB
Given the significant overhead the Go GC requires, these numbers need to be multiplied
by 1.5-2x to estimate the actual amount of memory saved from the OS perspective.
Signed-off-by: Ivan Babrou <[email protected]>
Force-pushed from 7dc8e26 to b0fd424
/prombench main

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️ Compared versions: After the successful deployment (check status here), the benchmarking results can be viewed at: Available Commands:

/prombench cancel

Benchmark cancel is in progress.
bboreham left a comment:
Good idea. Prombench showed a small increase in CPU (which is expected) but no decrease in memory (which was also expected). However we don't always trust Prombench.
> it is very much shared between instances

To nitpick this wording, I would say it is expected that instances of the same program export the same series.
```diff
 	// Parsed string to an entry with information about the actual label set
 	// and its storage reference.
-	series map[string]*cacheEntry
+	series map[unique.Handle[string]]*cacheEntry
```
I would add a note that unique is used to share memory across targets.
```diff
 	metaMtx  sync.Mutex // Mutex is needed due to api touching it when metadata is queried.
-	metadata map[string]*metaEntry // metadata by metric family name.
+	metadata map[unique.Handle[string]]*metaEntry // metadata by metric family name.
```
I wonder whether metadata is worth doing, since it is one per family not one per series.
```diff
 func (c *scrapeCache) get(met []byte) (*cacheEntry, bool, bool) {
-	e, ok := c.series[string(met)]
+	e, ok := c.series[unique.Make(string(met))]
```
Suggest that `unique.Make(string(met))` is passed in and cached in the caller so we don't recompute it on line 1016, etc. The `[]byte` -> `string` map-key pattern is a specific compiler optimisation.
👋 @bobrik is away but I promised him I'll try this patch on our production workload to get a better idea of mem/cpu impact (at least in our environment). I'll deploy the patch with all the comments addressed and report back here.
I think:

So the number of string hashes should come down, and some extra

Would be good to see some benchmark results.
The calls in get-path functions (

Agreed that caching the Handle at the call site would reduce repeated
Generally all stack allocations in a function happen at the same time, with a single addition to the stack pointer.

I already suggested removing the metadata changes.