fix(spanner): preserve all async cache updates #12740
Code Review
This pull request refactors the ChannelFinder class to support queuing multiple asynchronous cache updates using a ConcurrentLinkedQueue, replacing the previous logic that only retained the latest pending update. It introduces helper methods to filter out non-material updates and improves the handling of database ID transitions to prevent unnecessary cache clears. The update draining mechanism was updated to ensure all queued tasks are processed, and awaitPendingUpdates was adjusted for better synchronization in tests. New unit tests were added to verify that updates are processed in order without being dropped and to validate edge cases for database ID changes. I have no feedback to provide as there were no review comments to assess.
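The queue-and-drain idea summarized in this review can be sketched as follows. This is a minimal illustration, not the actual ChannelFinder code: the class name `LosslessUpdateQueue`, its methods, and the choice to drain on the submitting thread are all assumptions made for the example (the real change may drain on a separate executor). It shows only the core property: every submitted update is applied, in order, by one drainer at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical illustration only; names and structure are not taken from the PR.
final class LosslessUpdateQueue<T> {
  private final Queue<T> pending = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean draining = new AtomicBoolean(false);
  private final Consumer<T> applier;

  LosslessUpdateQueue(Consumer<T> applier) {
    this.applier = applier;
  }

  /** Enqueues an update, then drains unless another thread is already draining. */
  void submit(T update) {
    pending.add(update);
    drain();
  }

  private void drain() {
    // Re-check the queue after releasing the flag so that an update enqueued
    // just after the last poll() is picked up instead of being stranded.
    while (!pending.isEmpty() && draining.compareAndSet(false, true)) {
      try {
        T next;
        while ((next = pending.poll()) != null) {
          applier.accept(next); // applied in FIFO order, none overwritten
        }
      } finally {
        draining.set(false);
      }
    }
  }

  /** Small usage example: five updates in, five updates applied in order. */
  static List<Integer> demo() {
    List<Integer> applied = new ArrayList<>();
    LosslessUpdateQueue<Integer> queue = new LosslessUpdateQueue<>(applied::add);
    for (int i = 1; i <= 5; i++) {
      queue.submit(i);
    }
    return applied;
  }
}
```

By contrast, a "latest pending update" slot would keep only the most recent value whenever updates arrive faster than they are applied, which is the lossy behavior this PR removes.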
🤖 I have created a release *beep* *boop* --- <details><summary>1.84.0</summary> ## [1.84.0](v1.83.0...v1.84.0) (2026-04-10) ### Features * [aiplatform] [Memorystore for Redis Cluster] Add support for ([0bd7666](0bd7666)) * [aiplatform] Add container_spec to Reasoning Engine public protos ([0bd7666](0bd7666)) * [aiplatform] Add container_spec to Reasoning Engine public protos ([0bd7666](0bd7666)) * [aiplatform] Add container_spec to Reasoning Engine public protos ([3ba3854](3ba3854)) * [aiplatform] Add container_spec to Reasoning Engine public protos ([3ba3854](3ba3854)) * [aiplatform] Add custom session id field to create session v1 ([e29dd99](e29dd99)) * [aiplatform] add evaluation metrics and autorater configuration to ([0bd7666](0bd7666)) * [aiplatform] RagMetadata and RagDataSchema concepts and Batch API ([e29dd99](e29dd99)) * [backupdr] Adding new workload specific fields for AlloyDB ([6344cb0](6344cb0)) * [ces] update public libraries for CES v1 ([6344cb0](6344cb0)) * [ces] update public libraries for CES v1beta ([0bd7666](0bd7666)) * [ces] update public libraries for CES v1beta ([0bd7666](0bd7666)) * [chat] Addition of Section and SectionItem APIs ([0bd7666](0bd7666)) * [chat] Support app authentication with admin-consent scopes for ([0bd7666](0bd7666)) * [compute] Update Compute Engine v1 API to revision 20260227 ([e29dd99](e29dd99)) * [databasecenter] A new value `SUB_RESOURCE_TYPE_READ_POOL` is ([6344cb0](6344cb0)) * [dataflow] Add Pausing/Yaml capabilities to public protos ([3ba3854](3ba3854)) * [dataflow] add sha256 field to Package proto ([0bd7666](0bd7666)) * [dataflow] add sha256 field to Package proto ([3ba3854](3ba3854)) * [dataform] add folders and teamFolders related changes to v1 ([6344cb0](6344cb0)) * [datalineage] add configmanagement v1 module ([#12355](#12355)) ([2def625](2def625)) * [datamanager] add INVALID_MERCHANT_ID to the ErrorReason enum for ([6344cb0](6344cb0)) * [dialogflow-cx] updated v3 dialogflow client libraries with 
([6344cb0](6344cb0)) * [dialogflow] updated v2 dialogflow client libraries ([6344cb0](6344cb0)) * [dialogflow] updated v2beta1 dialogflow client libraries ([6344cb0](6344cb0)) * [dlp] added support for detecting key-value pairs in client ([e5e22ed](e5e22ed)) * [document-ai] Add a field for upgrading previous processor version ([e29dd99](e29dd99)) * [document-ai] Added a fields for image and table annotation output ([0bd7666](0bd7666)) * [geminidataanalytics] add `ParameterizedSecureViewParameters` ([e29dd99](e29dd99)) * [geocode] new module for geocode ([#12343](#12343)) ([474efb1](474efb1)) * [grafeas] Added line_number to FileLocation ([e29dd99](e29dd99)) * [iap] add oauth fields for IapSettings ([e29dd99](e29dd99)) * [netapp] Add ONTAP passthrough APIs ([6344cb0](6344cb0)) * [network-security] Publish proto definitions for AuthzPolicy, ([6344cb0](6344cb0)) * [redis-cluster] [Memorystore for Redis Cluster] Add support for ([0bd7666](0bd7666)) * [redis-cluster] [Memorystore for Redis Cluster] Add support for ([3ba3854](3ba3854)) * [redis-cluster] [Memorystore for Redis Cluster] Add support for ([3ba3854](3ba3854)) * [securesourcemanager] Add CustomHostConfig to configure custom ([6344cb0](6344cb0)) * [securitycenter] Support Chokepoint and external exposure in ([e29dd99](e29dd99)) * [shopping-css] add product rating fields to CSS API v1. 
This is in ([e29dd99](e29dd99)) * [shopping-merchant-products] update products_common fields to ([e29dd99](e29dd99)) * [storage] populate the `persisted_data_checksums` field with ([e5e22ed](e5e22ed)) * [texttospeech] Support safety settings for Gemini voices and ([0bd7666](0bd7666)) * [texttospeech] Support safety settings for Gemini voices and ([0bd7666](0bd7666)) * [texttospeech] Support safety settings for Gemini voices and ([0bd7666](0bd7666)) * [texttospeech] Support safety settings for Gemini voices and ([0bd7666](0bd7666)) * [translate] A new field `mime_type` is added to message ([e5e22ed](e5e22ed)) * [valkey] [Memorystore for Valkey] Add support for Flexible CA ([0bd7666](0bd7666)) * [valkey] [Memorystore for Valkey] Add support for Flexible CA ([0bd7666](0bd7666)) * [valkey] [Memorystore for Valkey] Add support for Flexible CA ([3ba3854](3ba3854)) * [vectorsearch] Mark Vector Search v1 API as GA ([e29dd99](e29dd99)) * Add getProjectId getter for ComputeEngineCredentials ([#1833](#1833)) ([0a7895a](0a7895a)) * add initial span tracing to http calls ([#12089](#12089)) ([db50ccd](db50ccd)) * Add more attributes to golden signals metrics. 
([#4135](#4135)) ([bc82dcb](bc82dcb)) * Add v1 hypercomputecluster client library ([#12110](#12110)) ([b3e042a](b3e042a)) * Add v1beta BigLake Hive client library ([#12118](#12118)) ([73c0bd3](73c0bd3)) * Add v1beta networkconnectivity client library ([#12111](#12111)) ([2ae693e](2ae693e)) * **bigguery:** add url.domain to span tracing ([#12208](#12208)) ([6f79c2d](6f79c2d)) * **bigquery observability:** add version attribute to span tracing ([#12132](#12132)) ([95c3eb8](95c3eb8)) * **bigquery:** add gcp.resource.destination.id for span tracing ([#12134](#12134)) ([5f31ded](5f31ded)) * **bigquery:** add HTTP response attribute tracing ([#12109](#12109)) ([f8a13e5](f8a13e5)) * **bigquery:** add opentelemetry W3C Trace Context to headers ([#12203](#12203)) ([965761a](965761a)) * **bigquery:** add resend attribute to span tracing + integration tests ([#12313](#12313)) ([167722d](167722d)) * **bigquery:** add url.full attribute to span tracing ([#12176](#12176)) ([7fdf9ff](7fdf9ff)) * **bigquery:** add url.template to span tracing ([#12181](#12181)) ([30f8afb](30f8afb)) * **bigquery:** added error attributes to span tracing ([#12115](#12115)) ([863d23b](863d23b)) * **dataplex:** add DataProductService to manage data products and ([e29dd99](e29dd99)) * Extract resource name from unary requests for tracing ([#4159](#4159)) ([23b16b7](23b16b7)) * **gapic-generator-java:** Extract resource name heuristicly ([#12207](#12207)) ([f46480a](f46480a)) * **gax-httpjson:** add HttpJsonErrorParser utility ([#4137](#4137)) ([6fe2446](6fe2446)) * **gax-httpjson:** populate ErrorDetails in HttpJsonApiExceptionFactory ([#4145](#4145)) ([63f5be9](63f5be9)) * **gax:** Actionable Errors Logging API Tracer ([#12202](#12202)) ([8d23279](8d23279)) * **gax:** Add error attributes to golden signal metrics. 
([#12564](#12564)) ([063dfe5](063dfe5)) * **gax:** add utility for logging actionable errors ([#4144](#4144)) ([54fb8a5](54fb8a5)) * **gax:** Implement trace context extraction and injection with integration test ([#12625](#12625)) ([6675310](6675310)) * **observability:** Implement gcp.client.service attribute ([#12315](#12315)) ([e99812f](e99812f)) * **observability:** implement url.domain attribute ([#12316](#12316)) ([0a865bf](0a865bf)) * **sdk-platform-java:** Add CompositeTracer and CompositeTracerFactory. ([#12321](#12321)) ([4b5e8af](4b5e8af)) * Switch Eef metrics to using built in open telemetry ([#4385](#4385)) ([759bb22](759bb22)) ### Bug Fixes * Add error attributes to logging ([#12685](#12685)) ([a9198ee](a9198ee)) * **bq jdbc:** allow & ignore unknown parameters ([#12352](#12352)) ([2332ff1](2332ff1)) * **bq jdbc:** ensure getMoreResults() always moves the cursor ([#12353](#12353)) ([ac1f0f4](ac1f0f4)) * **ci:** consolidate duplicate yaml keys in github actions workflows ([#12306](#12306)) ([f644a19](f644a19)) * Clean up attributes for traces and metrics ([#12677](#12677)) ([914f97f](914f97f)) * Decrease log level for directpath warnings outside GCE ([#4139](#4139)) ([5151f34](5151f34)) * fix getLong on NUMERIC ([#2420](#2420)) ([75ec5c2](75ec5c2)) * fix unclosed literal error for consecutive backslashes ([#4387](#4387)) ([c00c633](c00c633)) * **gax-grpc:** add pick_first fallback to direct path service config ([#4143](#4143)) ([4934ad8](4934ad8)) * **gax:** Implement lazy resource name evaluation in ApiTracerContext ([#12618](#12618)) ([5e47749](5e47749)) * **gax:** reduce visibility of ApiTracerContext methods ([#12738](#12738)) ([1760164](1760164)) * Handle null server address ([#12184](#12184)) ([435dd8c](435dd8c)) * **hermetic-build:** do not add release please comments on vertexai poms ([#12559](#12559)) ([5e161a7](5e161a7)) * **hermetic-build:** prevent overwrite of Version.java in new libraries' ([#12742](#12742)) ([3bcca8e](3bcca8e)) * 
**java-spanner:** use the existing dependency versions ([#12746](#12746)) ([8650bc6](8650bc6)) * **jdbc bq:** unshade BQ SDK in uber jar ([#12078](#12078)) ([c4cabde](c4cabde)) * normalize indentation in owlbot yamls for new libraries ([#12759](#12759)) ([a4997cb](a4997cb)) * **o11y:** composite tracer to process url changes ([#12754](#12754)) ([70f75bd](70f75bd)) * **o11y:** create noop tracer when artifact ID is not set ([#12307](#12307)) ([630d83d](630d83d)) * **o11y:** do not record error.type in successful runs ([#12620](#12620)) ([28eebf0](28eebf0)) * **o11y:** remove `gpc.client.language` attribute ([#12621](#12621)) ([40d2e43](40d2e43)) * **oauth2:** mask sensitive tokens in HTTP logs ([#1900](#1900)) ([3e4ccb7](3e4ccb7)) * Populate method level attributes in metrics recording ([#4149](#4149)) ([69aabf8](69aabf8)) * **release:** add Version.java as extra files in release-please ([#12617](#12617)) ([f5101d9](f5101d9)) * remove Version.java from extra-files in release-please-config ([#12717](#12717)) ([ab29069](ab29069)) * **spanner:** enforce READY-only location aware routing and add endpoint lifecycle management ([ecb86fd](ecb86fd)) * **spanner:** enforce READY-only location aware routing and add endpoint lifecycle management ([#12678](#12678)) ([ca9edb9](ca9edb9)) * **spanner:** ensure executeQueryAsync is non-blocking ([#12715](#12715)) ([b7e34d2](b7e34d2)) * **spanner:** fix grpc-gcp affinity cleanup and multiplexed channel usage leaks ([#12726](#12726)) ([55c9857](55c9857)) * **spanner:** honor built-in metrics opt-out for gRPC metrics exporter ([#12711](#12711)) ([57baaea](57baaea)) * **spanner:** improve grpc-gcp affinity cleanup and location-aware retries ([a157c2f](a157c2f)) * **spanner:** improve grpc-gcp affinity cleanup and location-aware retries ([#12682](#12682)) ([aca0428](aca0428)) * **spanner:** preserve all async cache updates ([#12740](#12740)) ([b8bf432](b8bf432)) * StreamWriterTest to allow version in trace ID ([#12084](#12084)) 
([d463c15](d463c15)) * **telemetry:** fix incorrect span titles and missing otelAttributes ([#12080](#12080)) ([f1c04a9](f1c04a9)) * update dependency com.google.cloud:sdk-platform-java-config to v3.58.0 ([#12106](#12106)) ([15fa933](15fa933)) * update Version.java and correct spanner version for 1.83.0 release ([#12712](#12712)) ([c2147fc](c2147fc)) * use dynamic tracer name instead of hardcoded gax-java ([#12190](#12190)) ([dea24db](dea24db)) ### Performance Improvements * **spanner:** reduce autoboxing when reading data ([#12741](#12741)) ([6b83871](6b83871)) ### Dependencies * bump jackson version to 2.18.3 ([#12351](#12351)) ([50304c1](50304c1)) * update dependencies.txt for grpc-gcp to 1.9.2 ([#4164](#4164)) ([f336fdc](f336fdc)) * update dependency com.google.apis:google-api-services-storage to v1-rev20260204-2.0.0 ([#1750](#1750)) ([340ecbe](340ecbe)) * update dependency com.google.apis:google-api-services-storage to v1-rev20260204-2.0.0 ([#3519](#3519)) ([1531107](1531107)) * update dependency com.google.cloud:google-cloud-storage to v2.64.1 ([#1752](#1752)) ([8fb6935](8fb6935)) * update dependency com.google.cloud:sdk-platform-java-config to v3.58.0 ([#1751](#1751)) ([9cc3e22](9cc3e22)) * update dependency com.google.cloud:sdk-platform-java-config to v3.58.0 ([#3523](#3523)) ([26d772a](26d772a)) * update dependency node to v24 ([#3509](#3509)) ([f308477](f308477)) * update gcr.io/cloud-devrel-public-resources/storage-testbench docker tag to v0.62.0 ([#3526](#3526)) ([ca29d5e](ca29d5e)) * update googleapis/sdk-platform-java action to v2.68.0 ([#3522](#3522)) ([abae1ac](abae1ac)) ### Reverts * ci: only run default list of graalvm tests if too many modules are touched ([#12292](#12292)) ([92bcdf4](92bcdf4)) ### Documentation * [dataplex] Change Dataplex library from `ALPHA` to `GA` ([6344cb0](6344cb0)) * [errorreporting] automated code change ([e29dd99](e29dd99)) * [run] An existing repeated string field custom_audiences is marked ([015d9a1](015d9a1)) * add 
warning about potential sensitive data in tracing ([#12701](#12701)) ([684511a](684511a)) * **hermetic-build:** improve usability of development guide ([#12362](#12362)) ([5944127](5944127)) </details> --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
This change fixes cache updates being missed after cache application moved off the request hot path. It replaces the lossy async updater with a lossless queue, skips truly empty async cache updates, and moves endpoint lifecycle reconciliation off the cache-apply critical path.
Problem
After moving cache application off the request hot path, cache updates could either be lost or applied too slowly. In production this showed up as:
We also observed that a large number of responses carried a cache_update field that was present but effectively empty, which added queue pressure without changing routing state.
Root Cause
There were a few separate issues:
databaseId=0 updates could clear an already initialized finder state even when they did not carry meaningful cache contents
What This PR Changes
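The filtering of empty updates and the guard against databaseId=0 clearing initialized state might look roughly like the sketch below. Everything here is hypothetical: the `Update` class, its `entryCount` field, and the exact rules in `isMaterial` and `shouldClear` are assumptions chosen for illustration, not the real cache_update message or the helper methods added in this PR. In particular, the sketch assumes a databaseId of 0 never triggers a clear, which may be stricter than the actual change.

```java
// Hypothetical types and rules for illustration only.
final class CacheUpdateFilter {

  /** Stand-in for a cache update: a database ID plus a count of cache entries. */
  static final class Update {
    final long databaseId;
    final int entryCount;

    Update(long databaseId, int entryCount) {
      this.databaseId = databaseId;
      this.entryCount = entryCount;
    }
  }

  /** Assumed rule: an update is material only if it carries at least one entry. */
  static boolean isMaterial(Update update) {
    return update != null && update.entryCount > 0;
  }

  /**
   * Assumed rule: clear existing state only when a material update names a
   * non-zero database ID that differs from an already initialized, non-zero one.
   */
  static boolean shouldClear(long currentDatabaseId, Update update) {
    return isMaterial(update)
        && update.databaseId != 0
        && currentDatabaseId != 0
        && update.databaseId != currentDatabaseId;
  }
}
```

With these assumed rules, a caller would check `isMaterial` before enqueuing, so present-but-empty updates never add queue pressure, and would check `shouldClear` before resetting state, so an update with databaseId=0 leaves an initialized finder untouched.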
Result
Based on production logs after the fix:
This indicates the main issue was queue pressure from lossy handling and empty/no-op updates, not slow cache map mutation itself.