
[improve][common] Optimize TopicName.get() to reduce lock contention on cache lookup#25367

Merged
liangyepianzhou merged 7 commits into apache:master from liangyepianzhou:optimize/TopicName-get
Apr 21, 2026

Conversation

liangyepianzhou (Contributor) commented Mar 20, 2026

Motivation

TopicName.get() previously used ConcurrentHashMap.computeIfAbsent() to populate the topic-name cache. Although computeIfAbsent is atomic, it holds the internal bin-lock for the entire duration of the mapping function, which includes the non-trivial TopicName construction (string splitting, validation, etc.).

Under high-concurrency workloads where many threads simultaneously encounter the same uncached topic name, this causes unnecessary lock contention and can degrade throughput.

Modifications

Replace computeIfAbsent with an explicit two-step pattern:

  1. Fast path: call cache.get(topic) first — a single volatile read with no locking — and return immediately on a cache hit (steady-state case).
  2. Slow path (cache miss): construct TopicName outside the lock, then use cache.putIfAbsent() to insert. If two threads race on the same key, one wins the putIfAbsent and the other's instance is discarded; this is safe because TopicName is immutable.
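The two steps above can be sketched as a small self-contained class. This is an illustrative model only: `TopicName` here is a stand-in record rather than the actual Pulsar class, and the real implementation includes validation logic and a bounded cache.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the fast-path/slow-path lookup; the "TopicName"
// record is a stand-in for the real (immutable) Pulsar class.
class TopicNameCacheSketch {
    record TopicName(String topic) { }   // immutable, so losing the race is harmless

    private static final ConcurrentHashMap<String, TopicName> cache = new ConcurrentHashMap<>();

    static TopicName get(String topic) {
        // Fast path: a plain volatile read, no locking.
        TopicName cached = cache.get(topic);
        if (cached != null) {
            return cached;
        }
        // Slow path: construct outside any map lock, then race via putIfAbsent.
        TopicName created = new TopicName(topic);
        TopicName existing = cache.putIfAbsent(topic, created);
        return existing != null ? existing : created;
    }

    public static void main(String[] args) {
        TopicName a = get("persistent://tenant/ns/topic-1");
        TopicName b = get("persistent://tenant/ns/topic-1");
        // The second call hits the fast path and returns the identical instance.
        System.out.println(a == b);  // prints "true"
    }
}
```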

Verifying this change

  • Make sure that the change passes the CI checks.

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

…on cache lookup

### Motivation

`TopicName.get()` previously used `ConcurrentHashMap.computeIfAbsent()` to populate
the topic-name cache. Although `computeIfAbsent` is atomic, it holds the internal
bin-lock for the entire duration of the mapping function, which includes the
non-trivial `TopicName` construction (string splitting, validation, etc.).

Under high-concurrency workloads where many threads simultaneously encounter the same
uncached topic name, this causes unnecessary lock contention and can degrade throughput.

### Modifications

Replace `computeIfAbsent` with an explicit two-step pattern:

1. **Fast path**: call `cache.get(topic)` first — a single volatile read with no
   locking — and return immediately on a cache hit (steady-state case).
2. **Slow path** (cache miss): construct `TopicName` *outside* the lock, then use
   `cache.putIfAbsent()` to insert. If two threads race on the same key, one wins
   the `putIfAbsent` and the other's instance is discarded; this is safe because
   `TopicName` is immutable.

Add a Javadoc comment on `get()` explaining the rationale.
liangyepianzhou (Contributor, Author) commented Mar 20, 2026

[flame graph screenshot]

Background

While load-testing Pulsar with a single topic containing 1,000,000 partitions, we observed that TopicName.get() was consuming a disproportionate amount of CPU, showing up prominently in flame graphs.

Root Cause Analysis

Two compounding issues were identified:

  1. topicNameCacheMaxCapacity too small — With 1M partitions, the default cache capacity is inevitably exceeded, triggering repeated clearIfReachedMaxCapacity calls and causing cache stampedes where all partitions miss simultaneously.

  2. new TopicName() executed inside the lock — The current implementation uses computeIfAbsent(key, TopicName::new), which holds the ConcurrentHashMap bin-lock for the entire duration of object construction. Under concurrent cache misses, threads serialize on the same lock, degrading throughput significantly.

Fix

Replace computeIfAbsent with a lock-free get + out-of-lock construction + putIfAbsent pattern:

```java
// Before
return cache.computeIfAbsent(topic, TopicName::new);

// After
TopicName cached = cache.get(topic);
if (cached != null) {
    return cached;
}
TopicName created = new TopicName(topic);           // constructed outside lock
TopicName existing = cache.putIfAbsent(topic, created);
return existing != null ? existing : created;
```

Under concurrent misses on the same key, each thread constructs its own instance independently; putIfAbsent elects the winner and the losers are simply GC'd — eliminating bin-lock contention entirely.

Benchmark Results (JMH · SingleShotTime · 8 threads · 1M partitions · full cold cache)

| Benchmark | Avg (ms) | Median (ms) | p99 (ms) | Min (ms) |
| --- | --- | --- | --- | --- |
| miss_A: computeIfAbsent | 427.1 | 395.3 | 644.5 | 316.5 |
| miss_B: get + putIfAbsent | 287.1 | 278.9 | 368.0 | 236.1 |
| Improvement | ~1.49× | ~1.42× | ~1.75× | ~1.34× |

The tail latency improvement (1.75× at p99) is especially significant: computeIfAbsent causes severe jitter under lock contention (316–644 ms range), while get+putIfAbsent stays stable (236–368 ms).
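For readers who want to reproduce the contention effect without the JMH setup, a rough stdlib-only sketch follows. This is not the benchmark quoted above; the thread and key counts are illustrative, and the printed timings will vary by machine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Rough contention demo: all threads miss on the same key set at once.
// Not the JMH benchmark from this PR; numbers are illustrative only.
class ContentionSketch {
    // Stand-in for the non-trivial TopicName parsing done in the constructor.
    static String slowConstruct(String s) {
        return String.join("/", s.split("/"));
    }

    static long runMillis(boolean useComputeIfAbsent, int threads, int keys) throws InterruptedException {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    start.await();
                    for (int i = 0; i < keys; i++) {
                        String k = "persistent://tenant/ns/topic-" + i;
                        if (useComputeIfAbsent) {
                            // Mapping function runs while holding the bin lock.
                            cache.computeIfAbsent(k, ContentionSketch::slowConstruct);
                        } else {
                            String v = cache.get(k);                // lock-free fast path
                            if (v == null) {
                                String created = slowConstruct(k);  // outside the lock
                                cache.putIfAbsent(k, created);
                            }
                        }
                    }
                } catch (InterruptedException ignored) {
                } finally {
                    done.countDown();
                }
            });
        }
        long t0 = System.nanoTime();
        start.countDown();
        done.await();
        pool.shutdown();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("computeIfAbsent: " + runMillis(true, 8, 100_000) + " ms");
        System.out.println("get+putIfAbsent: " + runMillis(false, 8, 100_000) + " ms");
    }
}
```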


The github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label on Mar 20, 2026.
merlimat (Contributor) left a comment:

Change looks good.

Can you try comparing with just calling cache.put() instead of putIfAbsent()?

It would be good if you could add the JMH code to the microbenchmarks module.

lhotari (Member) commented Mar 20, 2026

Related PR from the past #24457 (change was rejected so I closed it) with a lot of interesting comments.
One of the comments came from Caffeine author: #24457 (comment).
Besides performance another detail to consider is the duplication of java.lang.String instances in memory.

lhotari (Member) left a comment:

This doesn't address one of the key problems which is that the cache is cleared if the size exceeds the max size setting (topicNameCacheMaxCapacity):

```java
public static void clearIfReachedMaxCapacity(int maxCapacity) {
    if (maxCapacity < 0) {
        // Unlimited cache.
        return;
    }
    if (cache.size() > maxCapacity) {
        cache.clear();
    }
}
```

We'd be better off by switching to use Caffeine again. The Caffeine author commented in #24457 (comment) that the bottleneck in the earlier version has been addressed. Since we have upgraded to Java 17 for the client, we can use the newer Caffeine version.

lhotari (Member) commented Mar 20, 2026

The flamegraph in #25367 (comment) looks like it was from an old version of Pulsar that uses Guava Cache (before #23052 <3.0.6, <3.3.1).
@liangyepianzhou What version of Pulsar are you load testing?

lhotari (Member) commented Mar 20, 2026

The tail latency improvement (1.75× at p99) is especially significant: computeIfAbsent causes severe jitter under lock contention (316–644 ms range), while get+putIfAbsent stays stable (236–368 ms).

The benchmark seems to measure a miss of all 1M cache entries at once. It's not a very realistic scenario that there would be such an amount of misses at the same time.

I think it would be more useful to have a solution that ensures that the memory use of the cache is bounded.
That is something that I have added quite recently to AbstractMetadataStore children cache:

```java
long childrenCacheMaxSizeBytes = getChildrenCacheMaxSizeBytes();
Caffeine<Object, Object> childrenCacheBuilder = Caffeine.newBuilder()
        .recordStats()
        .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
        .expireAfterWrite(CACHE_REFRESH_TIME_MILLIS * 2, TimeUnit.MILLISECONDS);
if (childrenCacheMaxSizeBytes > 0) {
    childrenCacheBuilder.maximumWeight(childrenCacheMaxSizeBytes)
            .weigher((String key, List<String> children) -> {
                // calculate the total byte size of the key and entries in the children list
                // to get some estimation of the heap memory required for the entry.
                // add 16 bytes overhead for Java object header and 16 bytes for java.lang.String fields.
                int totalSize = ByteBufUtil.utf8Bytes(key) + 32;
                for (String child : children) {
                    totalSize += ByteBufUtil.utf8Bytes(child) + 32;
                }
                return totalSize;
            });
}
```

in #24868
The max size is limited to 20% of the max heap size.

For TopicName cache there could be a bytesize limit too. The StringInterner solution would be useful at least for namespace and tenant Strings to ensure that those don't cause heap duplication if the namespace or tenant caches expire or overflow.
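The interning idea for tenant/namespace strings can be illustrated with the stdlib alone. StringInterner is Pulsar-internal; this sketch uses a plain ConcurrentHashMap, whereas a production interner would need weak references or a size bound to stay leak-free.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of deduplicating namespace/tenant strings so that repeated
// parses share one String instance. An unbounded strong map is used
// here for brevity; a real interner would bound it or use weak refs.
class InternSketch {
    private static final ConcurrentHashMap<String, String> interned = new ConcurrentHashMap<>();

    static String intern(String s) {
        String existing = interned.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        // Two distinct String objects with equal content...
        String a = intern(new String("my-tenant/my-namespace"));
        String b = intern(new String("my-tenant/my-namespace"));
        // ...collapse to one shared instance after interning.
        System.out.println(a == b);  // prints "true"
    }
}
```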

lhotari (Member) commented Mar 20, 2026

The benchmark doesn't use actual code from the Pulsar code base.
I believe that this is a more representative way: https://github.com/lhotari/pulsar/blob/lh-fix-topicname-memory-leak/microbench/src/main/java/org/apache/pulsar/common/naming/TopicNameBenchmark.java
That was done in #25367, which wasn't merged.

liangyepianzhou (Contributor, Author):

The flamegraph in #25367 (comment) looks like it was from an old version of Pulsar that uses Guava Cache (before #23052 <3.0.6, <3.3.1). @liangyepianzhou What version of Pulsar are you load testing?

The flamegraph comes from 3.0.5. I changed the version to 3.0.13 and set topicNameCacheMaxCapacity = -1. CPU usage dropped from 90% to 9%, but TopicName.get() still accounts for a very high percentage of the flame graph.

The benchmark seems to measure the miss of 1M cache entries. That's not a very realistic scenario that there would be such amount of misses at once.

In my scenario, one topic has 1 million partitions, so there are 1 million cache misses during the initial startup. Topic loading is currently very slow, and I suspect this may be one of the reasons.

liangyepianzhou (Contributor, Author):

The flamegraph in #25367 (comment) looks like it was from an old version of Pulsar that uses Guava Cache (before #23052 <3.0.6, <3.3.1). @liangyepianzhou What version of Pulsar are you load testing?

Thanks for pointing out the historical context and Ben Manes' insights!

Just to clarify the scope of this PR: the get + putIfAbsent change is intentionally a minimal, targeted fix to reduce bin-lock contention during cache misses, independent of the broader cache design questions (capacity bounding, Caffeine version, soft references, etc.).

That said, I'd like to confirm whether this small optimization is still worthwhile on its own:

  1. Even with a properly bounded cache, concurrent cache misses (e.g. during a cold start or after a cache clear) will still happen. In those cases, putIfAbsent avoids holding the bin-lock during TopicName construction, which seems beneficial regardless of the eviction strategy.
  2. The fast path (cache.get() first) is a strict improvement for the steady-state hit case with zero synchronization overhead.

So the question is: do you see any reason this low-risk change should be blocked on the larger cache redesign? Happy to fold it into a bigger PR if that's preferred, but wanted to check if it can stand alone first.

Review thread on pulsar-common/src/main/java/org/apache/pulsar/common/naming/TopicName.java (outdated)
lhotari (Member) left a comment:

LGTM, great solution @liangyepianzhou. Just one comment about documenting implementation details in javadoc.

liangyepianzhou (Contributor, Author):

Change looks good.

Can you try comparing with just calling cache.put() instead of putIfAbsent()?

It would be good if you could add the JMH code to the microbenchmarks module.

Thanks for the suggestion. Fixed.

BewareMyPower (Contributor):

I think we should not depend too much on the cache itself. TopicName construction is inefficient and used nearly everywhere. Could you try my previous patch here (#24463) to see if there is any improvement? If so, I can resolve the conflicts again.

lhotari (Member) commented Apr 20, 2026

I think we should not depend too much on the cache itself. TopicName construction is inefficient and used nearly everywhere. Could you try my previous patch here (#24463) to see if there is any improvement? If so, I can resolve the conflicts again.

@BewareMyPower Could you rebase #24463? I think we can handle that optimization separately.

On the caching: one benefit worth preserving is that the TopicName and NamespaceName caches reduce java.lang.String instance duplication, which is valuable beyond just avoiding the construction cost itself. So I'd lean toward keeping the caches rather than removing them.

liangyepianzhou (Contributor, Author) commented Apr 21, 2026

I think we should not depend too much on the cache itself. TopicName construction is inefficient and used nearly everywhere. Could you try my previous patch here (#24463) to see if there is any improvement? If so, I can resolve the conflicts again.

@BewareMyPower After applying your optimization, the benchmark results are as follows: jmh2 uses computeIfAbsent, jmh1 uses put.

```
jmh1 (put):
Benchmark                           Mode  Cnt  Score   Error  Units
TopicNameGetBenchmark.coldStartGet    ss   50  5.600 ± 1.722  us/op

jmh2 (computeIfAbsent):
Benchmark                           Mode  Cnt  Score   Error  Units
TopicNameGetBenchmark.coldStartGet    ss   50  6.446 ± 2.168  us/op
```

BewareMyPower (Contributor) commented Apr 21, 2026

So the result now with this PR (replacing computeIfAbsent with put) is…

Then I believe both PRs are valuable. I will rebase my PR soon; for this PR, just go ahead and fix the CI.

BewareMyPower (Contributor):

I'm rebasing my PR currently. But your test result is a bit confusing: it seems that switching to put would be slower with my improvement. Is there anything I misunderstood?

liangyepianzhou (Contributor, Author):

I'm rebasing my PR currently. But your test result is a bit confusing: it seems that switching to put would be slower with my improvement. Is there anything I misunderstood?

Are you testing the cold start / cache-miss scenario?

My understanding is that the main latency comes from the cache-miss startup phase, where the optimization is most noticeable. You can use my JMH program to test it.

BewareMyPower (Contributor) commented Apr 21, 2026

No. I didn't run any test for now. I'm just analyzing your test report here: #25367 (comment)

jmh2 uses computeIfAbsent, jmh1 uses put.

  • jmh1: 5.600 ± 1.722 us/op
  • jmh2: 6.446 ± 2.168 us/op

Oh, my bad, I misunderstood the unit (us/op). I thought it was ops/us.

@liangyepianzhou liangyepianzhou merged commit 8c4e83d into apache:master Apr 21, 2026
79 of 82 checks passed
@liangyepianzhou liangyepianzhou deleted the optimize/TopicName-get branch April 21, 2026 08:56
lhotari pushed a commit that referenced this pull request Apr 21, 2026
…on cache lookup (#25367)

### Motivation

`TopicName.get()` previously used `ConcurrentHashMap.computeIfAbsent()` to populate the topic-name cache. Although `computeIfAbsent` is atomic, it holds the internal bin-lock for the entire duration of the mapping function, which includes the non-trivial `TopicName` construction (string splitting, validation, etc.).

Under high-concurrency workloads where many threads simultaneously encounter the same uncached topic name, this causes unnecessary lock contention and can degrade throughput.

### Modifications

Replace `computeIfAbsent` with an explicit two-step pattern:

1. **Fast path**: call `cache.get(topic)` first — a single volatile read with no locking — and return immediately on a cache hit (steady-state case).
2. **Slow path** (cache miss): construct `TopicName` *outside* the lock, then use `cache.put()` to insert.

(cherry picked from commit 8c4e83d)
lhotari pushed a commit that referenced this pull request Apr 21, 2026
…on cache lookup (#25367) (cherry picked from commit 8c4e83d)

lhotari pushed a commit that referenced this pull request Apr 21, 2026
…on cache lookup (#25367) (cherry picked from commit 8c4e83d)

lhotari pushed a commit that referenced this pull request Apr 21, 2026
…on cache lookup (#25367) (cherry picked from commit 8c4e83d)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Apr 22, 2026
…on cache lookup (apache#25367) (cherry picked from commit 8c4e83d) (cherry picked from commit 069d1a4)