[{"id":"41E5S3mkTGDfkZuJZH5k","number":"6876619551109882402","begin":"2026-02-27T12:37:00+00:00","created":"2026-02-27T16:12:30+00:00","end":"2026-02-27T14:35:00+00:00","modified":"2026-03-09T05:25:43+00:00","external_desc":"Vertex AI Gemini API customers experienced increased error rates when accessing the global endpoint.","updates":[{"created":"2026-03-09T05:25:43+00:00","modified":"2026-03-09T05:25:43+00:00","when":"2026-03-09T05:25:43+00:00","text":"# Incident Report\n## Summary\nOn Friday, 27 February 2026 at 04:37 US/Pacific, customers using Vertex AI Gemini API models experienced increased error rates. Impacted services included Google Cloud Support, Agent Assist, Vertex Gemini API and Dialogflow CX in US regions and the global endpoint. The issue persisted for a duration of 1 hour and 58 minutes.\nThis is not the level of quality and reliability we strive to offer you, and we have taken immediate steps to improve the platform’s performance and availability.\n## Root Cause\nThis incident was caused by a configuration change to a safety filtering service that supports all Gemini models. For some specific requests, this created code paths that eventually led to service disruptions and capacity loss for the safety filtering service. Consequently, customers encountered overload (429 and 503) errors for their queries, with some users reporting elevated error rates for specific models in US regions.\n## Remediation and Prevention\nGoogle engineers were alerted to the issue via our automated monitoring system on Friday, 27 February 2026 04:54 US/Pacific and immediately started an investigation.\nEngineers identified the faulty configuration change and initiated a rollback to restore the previous stable configuration. Engineers also added more capacity to the service to stabilize it. Full service restoration was confirmed by 06:35 US/Pacific as the rollback propagated and servers became healthy. \\ \\\nGoogle is committed to preventing a repeat of this issue and is taking the following actions:\n* Reinforcing rollout processes to include mandatory validation checkpoints.\n* Improving alerting systems to monitor critical dependencies more closely.\n## ## Detailed Description of Impact\nOn Friday, 27 February 2026 between 04:37 and 06:35 US/Pacific, customers accessing Vertex Gemini APIs may have experienced the following:\n* **Affected Models:** All Vertex AI Gemini API models were affected, including gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3.0-flash-preview, gemini-3.0-pro-preview, gemini-2.0-flash, gemini-2.0-flash-lite.\n* **Error Experience:** * **PayGo Customers:** Experienced primarily 429 Resource Exhausted errors. * **Provisioned Throughput (PT) Customers:** Received 503 Service Unavailable errors. * For PT customers, most errors stopped at **06:00**. For PayGo customers, most errors stopped at **06:20**.\n* **Geographic Scope:** Global endpoint, us-central1, us-east4, and other US regions were impacted.","status":"AVAILABLE","affected_locations":[]},{"created":"2026-03-04T23:23:18+00:00","modified":"2026-03-09T05:25:43+00:00","when":"2026-03-04T23:23:18+00:00","text":"# Preliminary Incident Report\nWe apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. 
A final Incident Report with preventative actions will be posted once our investigation is complete. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support.\n## Date/Time of the Issue (All times US/Pacific)\nIncident Start: 27 February 2026 04:37\nIncident End: 27 February 2026 06:35\nDuration: 1 hour, 58 minutes\n## Summary\nOn Friday, 27 February 2026 at 04:37 US/Pacific, customers using Vertex AI Gemini API models (including Gemini 2.0, 2.5, and 3.0 previews) experienced increased error rates. Impacted services included Google Cloud Support, Agent Assist, the Vertex Gemini API and Dialogflow CX in US regions and the global endpoint for a duration of 1 hour and 58 minutes.\nThis is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.\n## Preliminary Root Cause\nThis incident was caused by a configuration change to a safety filtering service that supports all Gemini models. This configuration change enabled a code path that interacted poorly with specific requests, leading to service disruption for the safety filtering service. This in turn led to customers seeing overload (429 and 503) errors for their queries.\nGoogle engineers have begun a full root cause analysis and will provide additional information once it is available.\n## Remediation\nGoogle engineers were alerted to the service disruption via automated alert on Friday, 27 February 2026 04:54 US/Pacific and immediately started an investigation.\nEngineers identified the faulty configuration change for a safety filtering service and initiated a rollback to restore the previous stable configuration. Additionally, engineers added more capacity to the service. Full service restoration was confirmed by 06:35 US/Pacific as the rollback propagated and servers became healthy.\n## Description of Impact\nOn Friday, 27 February 2026 between 04:37 and 06:35 US/Pacific, customers accessing Vertex Gemini APIs may have experienced the following:\n- Affected Models: All Gemini versions were affected, including gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3.0-flash-preview, gemini-3.0-pro-preview, gemini-2.0-flash, gemini-2.0-flash-lite.\n- Error Experience:\n  - PayGo Customers: Experienced primarily 429 Resource Exhausted errors.\n  - Provisioned Throughput (PT) Customers: Received 503 Service Unavailable errors.\n  - For PT customers, most errors stopped at 06:00. For PayGo customers, most errors stopped at 06:20.\n- Geographic Scope: Global endpoint, us-central1, us-east4, and other US regions were impacted.","status":"AVAILABLE","affected_locations":[]},{"created":"2026-02-27T16:12:30+00:00","modified":"2026-03-04T23:23:18+00:00","when":"2026-02-27T16:12:30+00:00","text":"**Description** \\\nBetween Friday, 2026-02-27, 04:36 and 06:45 PST, customers experienced increased error rates when accessing the Vertex Gemini API Global endpoint. The issue impacted API requests to multiple Gemini models.\nThe incident also caused downstream impact to Dialogflow CX, Agent Assist, Google Cloud Support AI agent, and Customer Experience Agent Studio, which rely on Gemini APIs.\nPreliminary analysis indicates the issue was triggered by a recent configuration change.
Service was fully restored after the configuration change was rolled back.\nWe thank you for your patience while we worked on resolving the issue.\n**Symptom**\nCustomers experienced increased error rates when sending API requests to multiple impacted Gemini models through the global endpoint.","status":"SERVICE_INFORMATION","affected_locations":[{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Oregon (us-west1)","id":"us-west1"}]}],"most_recent_update":{"created":"2026-03-09T05:25:43+00:00","modified":"2026-03-09T05:25:43+00:00","when":"2026-03-09T05:25:43+00:00","text":"# Incident Report\n## Summary\nOn Friday, 27 February 2026 at 04:37 US/Pacific, customers using Vertex AI Gemini API models experienced increased error rates. Impacted services included Google Cloud Support, Agent Assist, Vertex Gemini API and Dialogflow CX in US regions and the global endpoint. The issue persisted for 1 hour and 58 minutes.\nThis is not the level of quality and reliability we strive to offer you, and we have taken immediate steps to improve the platform’s performance and availability.\n## Root Cause\nThis incident was caused by a configuration change to a safety filtering service that supports all Gemini models. For some specific requests, this created code paths that eventually led to service disruptions and capacity loss for the safety filtering service. Consequently, customers encountered overload (429 and 503) errors for their queries, with some users reporting elevated error rates for specific models in US regions.\n## Remediation and Prevention\nGoogle engineers were alerted to the issue via our automated monitoring system on Friday, 27 February 2026 04:54 US/Pacific and immediately started an investigation.\nEngineers identified the faulty configuration change and initiated a rollback to restore the previous stable configuration. Engineers also added more capacity to the service to stabilize it. Full service restoration was confirmed by 06:35 US/Pacific as the rollback propagated and servers became healthy.\nGoogle is committed to preventing a repeat of this issue and is taking the following actions:\n* Reinforcing rollout processes to include mandatory validation checkpoints.\n* Improving alerting systems to monitor critical dependencies more closely.\n## Detailed Description of Impact\nOn Friday, 27 February 2026 between 04:37 and 06:35 US/Pacific, customers accessing Vertex Gemini APIs may have experienced the following:\n* **Affected Models:** All Vertex AI Gemini API models were affected, including gemini-2.5-flash, gemini-2.5-flash-lite, gemini-2.5-pro, gemini-3.0-flash-preview, gemini-3.0-pro-preview, gemini-2.0-flash, gemini-2.0-flash-lite.\n* **Error Experience:**\n  * **PayGo Customers:** Experienced primarily 429 Resource Exhausted errors.\n  * **Provisioned Throughput (PT) Customers:** Received 503 Service Unavailable errors.\n  * For PT customers, most errors stopped at **06:00**.
For PayGo customers, most errors stopped at **06:20**.\n* **Geographic Scope:** Global endpoint, us-central1, us-east4, and other US regions were impacted.","status":"AVAILABLE","affected_locations":[]},"status_impact":"SERVICE_INFORMATION","severity":"low","service_key":"zall","service_name":"Multiple Products","affected_products":[{"title":"Agent Assist","id":"eUntUKqUrHdbBLNcVVXq"},{"title":"Dialogflow CX","id":"BnCicQdHSdxaCv8Ya6Vm"},{"title":"Google Cloud Support","id":"bGThzF7oEGP5jcuDdMuk"},{"title":"Vertex Gemini API","id":"Z0FZJAMvEB4j3NbCJs6B"}],"uri":"incidents/41E5S3mkTGDfkZuJZH5k","currently_affected_locations":[],"previously_affected_locations":[{"title":"Global","id":"global"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Oregon (us-west1)","id":"us-west1"}]},{"id":"8cY8jdUpEGGbsSMSQk7J","number":"15787347096705530732","begin":"2025-07-18T14:42:00+00:00","created":"2025-07-18T15:54:23+00:00","end":"2025-07-18T16:47:00+00:00","modified":"2025-07-23T09:26:58+00:00","external_desc":"We are investigating elevated error rates with multiple products in us-east1","updates":[{"created":"2025-07-22T13:42:49+00:00","modified":"2025-07-23T09:26:58+00:00","when":"2025-07-22T13:42:49+00:00","text":"# Incident Report\n## Summary\nOn Friday, 18 July 2025 07:50 US/Pacific, several Google Cloud Platform (GCP) and Google Workspace (GWS) products experienced elevated latencies and error rates in the us-east1 region for a duration of up to 1 hour and 57 minutes.\n**GCP Impact Duration:** 18 July 2025 07:50 - 09:47 US/Pacific : 1 hour 57 minutes\n**GWS Impact Duration:** 18 July 2025 07:50 - 08:40 US/Pacific : 50 minutes\nWe sincerely apologize for this incident, which does not reflect the level of quality and reliability we strive to offer. We are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause\nThe service interruption was triggered by a procedural error during a planned hardware replacement in our datacenter. An incorrect physical disconnection was made to the active network switch serving our control plane, rather than the redundant unit scheduled for removal. The redundant unit had been properly de-configured as part of the procedure, and the combination of these two events led to partitioning of the network control plane. Our network is designed to withstand this type of control plane failure by failing open and continuing operation.\nHowever, an operational topology change while the network control plane was in a failed open state caused our network fabric's topology information to become stale. This led to packet loss and service disruption until services were moved away from the fabric and control plane connectivity was restored.\n## Remediation and Prevention\nGoogle engineers were alerted to the outage by our monitoring system on 18 July 2025 07:06 US/Pacific and immediately started an investigation. The following timeline details the remediation and restoration efforts:\n* **07:39 US/Pacific**: The underlying root cause (device disconnect) was identified and onsite technicians were engaged to reconnect the control plane device and restore control plane connectivity.
At that moment, network fail-open mechanisms worked as expected and no impact was observed.\n* **07:50 US/Pacific**: A topology change led to traffic being routed suboptimally, due to the network being in a fail open state. This caused congestion on a subset of links, packet loss, and latency to customer traffic. Engineers made a decision to move traffic away from the affected fabric, which mitigated the impact for the majority of the services.\n* **08:40 US/Pacific**: Engineers mitigated Workspace impact by shifting traffic away from the affected region.\n* **09:47 US/Pacific**: Onsite technicians reconnected the device, control plane connectivity was fully restored and all services were back to a stable state.\nGoogle is committed to preventing a repeat of the issue in the future, and is completing the following actions:\n* Pause non-critical workflows until safety controls are implemented (complete).\n* Strengthen safety controls for hardware upgrade workflows by end of Q3 2025.\n* Design and implement a mechanism to prevent control plane partitioning in case of dual failure of upstream routers by end of Q4 2025.\n## Detailed Description of Impact\n### GCP Impact:\nMultiple products in us-east1 were affected by the loss of network connectivity, with the most significant impacts seen in us-east1-b. Other regions were not affected.\nThe outage caused a range of issues for customers with zonal resources in the region, including packet loss across VPC networks, increased error rates and latency, service unavailable (503) errors, and slow or stuck operations up to loss of networking connectivity. While regional products were briefly impacted, they recovered quickly by failing over to unaffected zones.\nA small number (0.1%) of Persistent Disks in us-east1-b were unavailable for the duration of the outage: these disks became available once the outage was mitigated, with no customer data loss.\n### GWS Impact:\nA small subset of Workspace users, primarily around the Southeast US, experienced varying degrees of unavailability and increased delays across multiple products, including Gmail, Google Meet, Google Drive, Google Chat, Google Calendar, Google Groups, Google Doc/Editors, and Google Voice.","status":"AVAILABLE","affected_locations":[]},{"created":"2025-07-18T22:08:16+00:00","modified":"2025-07-22T13:42:49+00:00","when":"2025-07-18T22:08:16+00:00","text":"# Mini Incident Report\nWe apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues.
If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.\n(All Times US/Pacific)\n**GCP Impact start and end time:** 18 July 2025 08:10 - 09:47\n**Duration:** 1 hour 37 minutes\n**GWS Impact start and end time:** 18 July 2025 08:10 - 08:40\n**Duration:** 30 minutes\n**Regions/Zones:** us-east1\n**Description:**\nOn Friday, 18 July 2025 08:10 US/Pacific, multiple GCP and GWS products experienced elevated latencies and error rates in the us-east1 region for a duration of up to 1 hour and 37 minutes.\nBased on the preliminary analysis, the root cause of the issue is a procedural error during planned hardware maintenance in one of our data centers in the us-east1 region. Our engineering team mitigated the issue by draining traffic away from the clusters and then restoring the affected hardware.\nGoogle will be completing a full incident report in the following days that will provide a full root cause and preventive actions.\n**Customer Impact:**\nThe affected GCP and GWS products experienced elevated latencies and error rates in the us-east1 region.\n**Affected Products:**\n**GCP:**\nAlloyDB for PostgreSQL, Apigee, Artifact Registry, Cloud Armor, Cloud Billing, Cloud Build, Cloud External Key Manager, Cloud Filestore, Cloud HSM, Cloud Key Management Service, Cloud Load Balancing, Cloud Monitoring, Cloud Run, Cloud Spanner, Cloud Storage for Firebase, Cloud Workflows, Database Migration Service, Dialogflow CX, Dialogflow ES, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Cloud Storage, Google Cloud Support, Google Cloud Tasks, Google Compute Engine, Hybrid Connectivity, Media CDN, Network Telemetry, Private Service Connect, Secret Manager, Service Directory, Vertex AI Online Prediction, Virtual Private Cloud (VPC)\n**Workspace:**\nGmail, Google Meet, Google Drive, Google Chat, Google Calendar, Google Groups, Google Doc/Editors, Google Voice\n**Google SecOps:**\nGoogle SecOps SOAR & Google SecOps","status":"AVAILABLE","affected_locations":[]},{"created":"2025-07-18T18:03:11+00:00","modified":"2025-07-18T22:08:16+00:00","when":"2025-07-18T18:03:11+00:00","text":"The issue has been resolved for all affected products as of 2025-07-18 09:47 US/Pacific.\nFrom preliminary analysis, during routine maintenance of our network in us-east1-b, we experienced elevated packet loss, causing service disruption in the zone.\nWe will publish a full Incident Report with root cause once we have completed our internal investigations.\nWe thank you for your patience while we worked on resolving the issue.","status":"AVAILABLE","affected_locations":[]},{"created":"2025-07-18T17:32:00+00:00","modified":"2025-07-18T18:03:11+00:00","when":"2025-07-18T17:32:00+00:00","text":"Our engineers have successfully recovered the network control plane in the affected us-east1 zones.\nWe're seeing multiple services reporting full recovery, and product engineers continue to validate the remaining services.\nWe'll provide another update with more details by 11:00 AM US/Pacific, July 18, 2025.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"South Carolina (us-east1)","id":"us-east1"}]},{"created":"2025-07-18T16:58:34+00:00","modified":"2025-07-18T17:32:00+00:00","when":"2025-07-18T16:58:34+00:00","text":"Our engineers have successfully recovered the network control plane in the affected us-east1 zones.
We're seeing multiple services reporting full recovery, and product engineers are now validating the remaining services.\nWe'll provide another update with more details by 10:30 AM US/Pacific, July 18, 2025.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"South Carolina (us-east1)","id":"us-east1"}]},{"created":"2025-07-18T16:29:02+00:00","modified":"2025-07-18T16:58:34+00:00","when":"2025-07-18T16:29:02+00:00","text":"Our engineers have confirmed that us-east1-b is partially affected. All other zones in us-east1 are currently operating normally.\nOur engineers have recovered the failed hardware and are currently recovering the network control plane in the affected zones.\nWe'll provide another update by 10:00 AM US/Pacific, July 18, 2025.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"South Carolina (us-east1)","id":"us-east1"}]},{"created":"2025-07-18T15:54:23+00:00","modified":"2025-07-18T16:29:02+00:00","when":"2025-07-18T15:54:23+00:00","text":"We're currently experiencing elevated latency and error rates for several Cloud services in the us-east1 region, beginning at 7:06 AM PDT today, July 18, 2025. Our initial investigation points to a hardware infrastructure failure as the likely cause.\nWe apologize for any disruption this may be causing. We'll provide an update with more details by 9:15 AM PDT today.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"South Carolina (us-east1)","id":"us-east1"}]}],"most_recent_update":{"created":"2025-07-22T13:42:49+00:00","modified":"2025-07-23T09:26:58+00:00","when":"2025-07-22T13:42:49+00:00","text":"# Incident Report\n## Summary\nOn Friday, 18 July 2025 07:50 US/Pacific, several Google Cloud Platform (GCP) and Google Workspace (GWS) products experienced elevated latencies and error rates in the us-east1 region for a duration of up to 1 hour and 57 minutes.\n**GCP Impact Duration:** 18 July 2025 07:50 - 09:47 US/Pacific : 1 hour 57 minutes\n**GWS Impact Duration:** 18 July 2025 07:50 - 08:40 US/Pacific : 50 minutes\nWe sincerely apologize for this incident, which does not reflect the level of quality and reliability we strive to offer. We are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause\nThe service interruption was triggered by a procedural error during a planned hardware replacement in our datacenter. An incorrect physical disconnection was made to the active network switch serving our control plane, rather than the redundant unit scheduled for removal. The redundant unit had been properly de-configured as part of the procedure, and the combination of these two events led to partitioning of the network control plane. Our network is designed to withstand this type of control plane failure by failing open and continuing operation.\nHowever, an operational topology change while the network control plane was in a failed open state caused our network fabric's topology information to become stale. This led to packet loss and service disruption until services were moved away from the fabric and control plane connectivity was restored.\n## Remediation and Prevention\nGoogle engineers were alerted to the outage by our monitoring system on 18 July 2025 07:06 US/Pacific and immediately started an investigation.
The following timeline details the remediation and restoration efforts:\n* **07:39 US/Pacific**: The underlying root cause (device disconnect) was identified and onsite technicians were engaged to reconnect the control plane device and restore control plane connectivity. At that moment, network fail-open mechanisms worked as expected and no impact was observed.\n* **07:50 US/Pacific**: A topology change led to traffic being routed suboptimally, due to the network being in a fail open state. This caused congestion on a subset of links, packet loss, and latency to customer traffic. Engineers made a decision to move traffic away from the affected fabric, which mitigated the impact for the majority of the services.\n* **08:40 US/Pacific**: Engineers mitigated Workspace impact by shifting traffic away from the affected region.\n* **09:47 US/Pacific**: Onsite technicians reconnected the device, control plane connectivity was fully restored and all services were back to a stable state.\nGoogle is committed to preventing a repeat of the issue in the future, and is completing the following actions:\n* Pause non-critical workflows until safety controls are implemented (complete).\n* Strengthen safety controls for hardware upgrade workflows by end of Q3 2025.\n* Design and implement a mechanism to prevent control plane partitioning in case of dual failure of upstream routers by end of Q4 2025.\n## Detailed Description of Impact\n### GCP Impact:\nMultiple products in us-east1 were affected by the loss of network connectivity, with the most significant impacts seen in us-east1-b. Other regions were not affected.\nThe outage caused a range of issues for customers with zonal resources in the region, including packet loss across VPC networks, increased error rates and latency, service unavailable (503) errors, and slow or stuck operations up to loss of networking connectivity.
While regional products were briefly impacted, they recovered quickly by failing over to unaffected zones.\nA small number (0.1%) of Persistent Disks in us-east1-b were unavailable for the duration of the outage: these disks became available once the outage was mitigated, with no customer data loss.\n### GWS Impact:\nA small subset of Workspace users, primarily around the Southeast US, experienced varying degrees of unavailability and increased delays across multiple products, including Gmail, Google Meet, Google Drive, Google Chat, Google Calendar, Google Groups, Google Doc/Editors, and Google Voice.","status":"AVAILABLE","affected_locations":[]},"status_impact":"SERVICE_DISRUPTION","severity":"medium","service_key":"zall","service_name":"Multiple Products","affected_products":[{"title":"AlloyDB for PostgreSQL","id":"fPovtKbaWN9UTepMm3kJ"},{"title":"Apigee","id":"9Y13BNFy4fJydvjdsN3X"},{"title":"Artifact Registry","id":"QbBuuiRdsLpMr9WmGwm5"},{"title":"Certificate Authority Service","id":"PvdE3tt1VdxKXzSyd8WF"},{"title":"Cloud Armor","id":"Kakg69gTC3xFyeJCY2va"},{"title":"Cloud Billing","id":"oLCqDYkE9NFWQVgctQTL"},{"title":"Cloud Build","id":"fw8GzBdZdqy4THau7e1y"},{"title":"Cloud External Key Manager","id":"GXALzYBgpi3XpsLLxLgu"},{"title":"Cloud Firestore","id":"CETSkT92V21G6A1x28me"},{"title":"Cloud HSM","id":"R3HPPUbVeFrApLaqQB4B"},{"title":"Cloud Key Management Service","id":"67cSySTL7dwJZo9JWUGU"},{"title":"Cloud Load Balancing","id":"ix7u9beT8ivBdjApTif3"},{"title":"Cloud Memorystore","id":"LGPLu3M5pcUAKU1z6eP3"},{"title":"Cloud Monitoring","id":"3zaaDb7antc73BM1UAVT"},{"title":"Cloud Run","id":"9D7d2iNBQWN24zc1VamE"},{"title":"Cloud Spanner","id":"EcNGGUgBtBLrtm4mWvqC"},{"title":"Cloud Storage for Firebase","id":"aY6Fbgy6TV4YWoutjhfe"},{"title":"Cloud Workflows","id":"C4P62W9Xc2zZ1Sk52bbw"},{"title":"Database Migration Service","id":"vY4CRgRFNbqUXWWyYGFS"},{"title":"Dataproc Metastore","id":"PXZh68NPz9auRyo4tVfy"},{"title":"Dialogflow CX","id":"BnCicQdHSdxaCv8Ya6Vm"},{"title":"Eventarc","id":"YaFawoMaXnqgY4keUBnW"},{"title":"Google App Engine","id":"kchyUtnkMHJWaAva8aYc"},{"title":"Google BigQuery","id":"9CcrhHUcFevXPSVaSxkf"},{"title":"Google Cloud Bigtable","id":"LfZSuE3xdQU46YMFV5fy"},{"title":"Google Cloud Console","id":"Wdsr1n5vyDvCt78qEifm"},{"title":"Google Cloud Dataflow","id":"T9bFoXPqG8w8g1YbWTKY"},{"title":"Google Cloud Dataproc","id":"yjXrEg3Yvy26BauMwr69"},{"title":"Google Cloud Pub/Sub","id":"dFjdLh2v6zuES6t9ADCB"},{"title":"Google Cloud SQL","id":"hV87iK5DcEXKgWU2kDri"},{"title":"Google Cloud Storage","id":"UwaYoXQ5bHYHG6EdiPB8"},{"title":"Google Cloud Support","id":"bGThzF7oEGP5jcuDdMuk"},{"title":"Google Cloud Tasks","id":"tMWyzhyKK4rAzAf7x62h"},{"title":"Google Compute Engine","id":"L3ggmi3Jy4xJmgodFA9K"},{"title":"Google Kubernetes Engine","id":"LCSbT57h59oR4W98NHuz"},{"title":"Hybrid Connectivity","id":"5x6CGnZvSHQZ26KtxpK1"},{"title":"Identity and Access Management","id":"adnGEDEt9zWzs8uF1oKA"},{"title":"Media CDN","id":"FK8WX6iZ3FuQL6qUwski"},{"title":"Memorystore for Memcached","id":"paC6vmsvnjCHsBkp4Wva"},{"title":"Memorystore for Redis","id":"3yFciKa9NQH7pmbnUYUs"},{"title":"Memorystore for Redis Cluster","id":"pAQRwuhqRn7Y1E2we8ds"},{"title":"Persistent Disk","id":"SzESm2Ux129pjDGKWD68"},{"title":"Private Service Connect","id":"fbzQRKqPfxZ2DUScMGV2"},{"title":"Secret Manager","id":"kzGfErQK3HzkFhptoeHH"},{"title":"Service Directory","id":"vmq8TsEZwitKYM6V9BaM"},{"title":"Vertex AI Online Prediction","id":"sdXM79fz1FS6ekNpu37K"},
{"title":"Virtual Private Cloud (VPC)","id":"BSGtCUnz6ZmyajsjgTKv"}],"uri":"incidents/8cY8jdUpEGGbsSMSQk7J","currently_affected_locations":[],"previously_affected_locations":[{"title":"South Carolina (us-east1)","id":"us-east1"}]},{"id":"ow5i3PPK96RduMcb1SsW","number":"12995900318995415150","begin":"2025-06-12T17:51:00+00:00","created":"2025-06-12T18:46:38+00:00","end":"2025-06-13T01:18:00+00:00","modified":"2025-07-19T03:34:44+00:00","external_desc":"Multiple GCP products are experiencing Service issues.","updates":[{"created":"2025-06-13T23:45:21+00:00","modified":"2025-06-13T23:48:18+00:00","when":"2025-06-13T23:45:21+00:00","text":"# Incident Report\n## **Summary**\n*Google Cloud, Google Workspace and Google Security Operations products experienced increased 503 errors in external API requests, impacting customers.*\n***We deeply apologize for the impact this outage has had. Google Cloud customers and their users trust their businesses to Google, and we will do better. We apologize for the impact this has had not only on our customers’ businesses and their users but also on the trust of our systems. We are committed to making improvements to help avoid outages like this moving forward.***\n### **What happened?**\nGoogle and Google Cloud APIs are served through our Google API management and control planes. Distributed regionally, these management and control planes are responsible for ensuring each API request that comes in is authorized, has the policy and appropriate checks (like quota) to meet their endpoints. The core binary that is part of this policy check system is known as Service Control. Service Control is a regional service that has a regional datastore that it reads quota and policy information from. This datastore metadata gets replicated almost instantly globally to manage quota policies for Google Cloud and our customers.\nOn May 29, 2025, a new feature was added to Service Control for additional quota policy checks. This code change and binary release went through our region by region rollout, but the code path that failed was never exercised during this rollout due to needing a policy change that would trigger the code. As a safety precaution, this code change came with a red-button to turn off that particular policy serving path. The issue with this change was that it did not have appropriate error handling, nor was it feature flag protected. Without the appropriate error handling, the null pointer caused the binary to crash. Feature flags are used to gradually enable the feature region by region per project, starting with internal projects, to enable us to catch issues. If this had been flag protected, the issue would have been caught in staging.\nOn June 12, 2025 at ~10:45am PDT, a policy change was inserted into the regional Spanner tables that Service Control uses for policies. Given the global nature of quota management, this metadata was replicated globally within seconds. This policy data contained unintended blank fields. Service Control then regionally exercised quota checks on policies in each regional datastore. This pulled in blank fields for this respective policy change and exercised the code path that hit the null pointer, causing the binaries to go into a crash loop. This occurred globally given each regional deployment.\nWithin 2 minutes, our Site Reliability Engineering team was triaging the incident. Within 10 minutes, the root cause was identified and the red-button (to disable the serving path) was being put in place.
The red-button was ready to roll out ~25 minutes from the start of the incident. Within 40 minutes of the incident, the red-button rollout was completed, and we started seeing recovery across regions, starting with the smaller ones first.\nWithin some of our larger regions, such as us-central1, as Service Control tasks restarted, it created a herd effect on the underlying infrastructure it depends on (i.e. that Spanner table), overloading the infrastructure. Service Control did not have the appropriate randomized exponential backoff implemented to avoid this. It took up to ~2h 40 mins to fully resolve in us-central1 as we throttled task creation to minimize the impact on the underlying infrastructure and routed traffic to multi-regional databases to reduce the load. At that point, Service Control and API serving was fully recovered across all regions. Corresponding Google and Google Cloud products started recovering, with some taking longer depending upon their architecture.\n### **What is our immediate path forward?**\nImmediately upon recovery, we froze all changes to the Service Control stack and manual policy pushes until we can completely remediate the system.\n### **How did we communicate?**\nWe posted our first incident report to Cloud Service Health about 1 hour after the start of the crashes, due to the Cloud Service Health infrastructure being down due to this outage. For some customers, the monitoring infrastructure they had running on Google Cloud was also failing, leaving them without a signal of the incident or an understanding of the impact to their business and/or infrastructure. We will address this going forward.\n### **What’s our approach moving forward?**\nBeyond freezing the system as mentioned above, we will prioritize and safely complete the following:\n* We will modularize Service Control’s architecture, so the functionality is isolated and fails open. Thus, if a corresponding check fails, Service Control can still serve API requests.\n* We will audit all systems that consume globally replicated data. Regardless of the business need for near-instantaneous consistency of the data globally (i.e. quota management settings are global), data replication needs to be propagated incrementally with sufficient time to validate and detect issues.\n* We will enforce all changes to critical binaries to be feature flag protected and disabled by default.\n* We will improve our static analysis and testing practices to correctly handle errors and, if need be, fail open.\n* We will audit and ensure our systems employ randomized exponential backoff.\n* We will improve our external communications, both automated and human, so our customers get the information they need as soon as possible to react to issues, manage their systems and help their customers.\n* We'll ensure our monitoring and communication infrastructure remains operational to serve customers even when Google Cloud and our primary monitoring products are down, ensuring business continuity.\n-------","status":"AVAILABLE","affected_locations":[]},{"created":"2025-06-13T06:34:31+00:00","modified":"2025-06-13T23:45:21+00:00","when":"2025-06-13T06:34:31+00:00","text":"# Mini Incident Report\nWe are deeply sorry for the impact to all of our users and their customers that this service disruption/outage caused. Businesses large and small trust Google Cloud with their workloads and we will do better. In the coming days, we will publish a full incident report of the root cause, detailed timeline and robust remediation steps we will be taking.
Given the size and impact of this incident, we would like to provide some information below.\nPlease note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support or to Google Workspace Support using help article https://support.google.com/a/answer/1047213.\n**(All Times US/Pacific)**\n**Incident Start:** 12 June, 2025 10:49\n**All regions except us-central1 mitigated:** 12 June, 2025 12:48\n**Incident End:** 12 June, 2025 13:49\n**Duration:** 3 hours\n**Regions/Zones:** Global\n**Description:**\nMultiple Google Cloud and Google Workspace products experienced increased 503 errors in external API requests, impacting customers.\nFrom our initial analysis, the issue occurred due to an invalid automated quota update to our API management system which was distributed globally, causing external API requests to be rejected. To recover we bypassed the offending quota check, which allowed recovery in most regions within 2 hours. However, the quota policy database in us-central1 became overloaded, resulting in much longer recovery in that region. Several products had moderate residual impact (e.g. backlogs) for up to an hour after the primary issue was mitigated, with a small number recovering after that.\nGoogle will complete a full Incident Report in the following days that will provide a detailed root cause.\n**Customer Impact:**\nCustomers had intermittent API and user-interface access issues to the impacted services. Existing streaming and IaaS resources were not impacted.\n**Additional details:**\nThis incident should not have happened, and we will take the following measures to prevent future recurrence:\n* Prevent our API management platform from failing due to invalid or corrupt data.\n* Prevent metadata from propagating globally without appropriate protection, testing and monitoring in place.\n* Improve system error handling and comprehensive testing for handling of invalid data.\n**Affected Services and Features:**\n**Google Cloud Products:**\n* Identity and Access Management\n* Cloud Build\n* Cloud Key Management Service\n* Google Cloud Storage\n* Cloud Monitoring\n* Google Cloud Dataproc\n* Cloud Security Command Center\n* Artifact Registry\n* Cloud Workflows\n* Cloud Healthcare\n* Resource Manager API\n* Dataproc Metastore\n* Cloud Run\n* VMware Engine\n* Dataplex\n* Migrate to Virtual Machines\n* Google BigQuery\n* Contact Center AI Platform\n* Google Cloud Deploy\n* Media CDN\n* Colab Enterprise\n* Vertex Gemini API\n* Cloud Data Fusion\n* Cloud Asset Inventory\n* Datastream\n* Integration Connectors\n* Apigee\n* Google Cloud NetApp Volumes\n* Google Cloud Bigtable\n* Looker (Google Cloud core)\n* Looker Studio\n* Google Cloud Functions\n* Cloud Load Balancing\n* Traffic Director\n* Document AI\n* AutoML Translation\n* Pub/Sub Lite\n* API Gateway\n* Agent Assist\n* AlloyDB for PostgreSQL\n* Cloud Firestore\n* Cloud Logging\n* Cloud Shell\n* Cloud Memorystore\n* Cloud Spanner\n* Contact Center Insights\n* Database Migration Service\n* Dialogflow CX\n* Dialogflow ES\n* Google App Engine\n* Google Cloud Composer\n* Google Cloud Console\n* Google Cloud DNS\n* Google Cloud Pub/Sub\n* Google Cloud SQL\n* Google Compute Engine\n* Identity Platform\n* Managed Service for Apache Kafka\n* Memorystore for Memcached\n* Memorystore for Redis\n* Memorystore for Redis Cluster
\n* Persistent Disk\n* Personalized Service Health\n* Speech-to-Text\n* Text-to-Speech\n* Vertex AI Search\n* Retail API\n* Vertex AI Feature Store\n* BigQuery Data Transfer Service\n* Google Cloud Marketplace\n* Cloud NAT\n* Hybrid Connectivity\n* Cloud Vision\n* Network Connectivity Center\n* Cloud Workstations\n* Google Security Operations\n**Google Workspace Products:**\n* AppSheet\n* Gmail\n* Google Calendar\n* Google Drive\n* Google Chat\n* Google Voice\n* Google Docs\n* Google Meet\n* Google Cloud Search\n* Google Tasks","status":"AVAILABLE","affected_locations":[]},{"created":"2025-06-13T01:27:32+00:00","modified":"2025-06-13T06:34:31+00:00","when":"2025-06-13T01:27:32+00:00","text":"Vertex AI Online Prediction is fully recovered as of 18:18 PDT.\nAll services are fully recovered from the service issue.\nWe will publish analysis of this incident once we have completed our internal investigation.\nWe thank you for your patience while we worked on resolving the issue.","status":"AVAILABLE","affected_locations":[]},{"created":"2025-06-13T00:59:00+00:00","modified":"2025-06-13T01:27:32+00:00","when":"2025-06-13T00:59:00+00:00","text":"**Vertex AI Online Prediction:**\nThe issue causing elevated 5xx errors with some Model Garden models was fully resolved as of 17:05 PDT. Vertex AI serving is now back to normal in all regions except europe-west1 and asia-southeast1. Engineers are actively working to restore normal serving capacity in these two regions.\nThe ETA for restoring normal serving capacity in europe-west1 and asia-southeast1 is 19:45 PDT.\nWe will provide an update by Thursday, 2025-06-12 19:45 PDT with current details.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas 
(us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-13T00:33:34+00:00","modified":"2025-06-13T00:59:00+00:00","when":"2025-06-13T00:33:34+00:00","text":"The impact on Personalized Service Health is now resolved and the updates should be reflected without any issues.\nThe issue with Google Cloud Dataflow is fully resolved as of 17:10 PDT\nThe only remaining impact is on Vertex AI Online Prediction as follows:\n**Vertex AI Online Prediction:** Customers may continue to experience elevated 5xx errors with some of the models available in the Model Garden. We are seeing gradual decrease in error rates as our engineers perform appropriate mitigation actions.\nThe ETA for full resolution of these 5xx errors is 22:00 PDT\nWe will provide an update by Thursday, 2025-06-12 22:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-13T00:06:24+00:00","modified":"2025-06-13T00:33:34+00:00","when":"2025-06-13T00:06:24+00:00","text":"The following Google Cloud products are still experiencing residual impact:\n**Google Cloud Dataflow:** Dataflow backlog has cleared up in all regions except us-central1. Customers may experience delays with Dataflow operations in us-central1 as the backlog clears up gradually. 
We do not have an ETA for Cloud Dataflow recovery in us-central1.\n**Vertex AI Online Prediction:** Customers may continue to experience elevated 5xx errors with some of the models available in the Model Garden. We are seeing gradual decrease in error rates as our engineers perform appropriate mitigation actions. The ETA for full resolution of these 5xx errors is 22:00 PDT\n**Personalized Service Health:** Updates on the Personalized Service Health are delayed and we recommend customers to continue using Cloud Service Health dashboard for updates.\nWe will provide an update by Thursday, 2025-06-12 17:45 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T23:13:50+00:00","modified":"2025-06-13T00:06:24+00:00","when":"2025-06-12T23:13:50+00:00","text":"The following Google Cloud products are still experiencing residual impact:\n**Google Cloud Dataflow:** Customers may experience delays with Dataflow operations as the backlog is clearing up gradually.\n**Vertex AI Online Prediction:** Customers may continue to experience elevated 5xx errors with some of the models available in the Model Garden.\n**Personalized Service Health:** Updates on the Personalized Service Health are delayed and we recommend customers to continue using Cloud Service Health dashboard for updates.\nWe currently do not have an ETA for full mitigation of the above services.\nWe will provide an update by Thursday, 
2025-06-12 17:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T22:16:06+00:00","modified":"2025-06-12T23:13:50+00:00","when":"2025-06-12T22:16:06+00:00","text":"Most of the Google Cloud products are fully recovered as of 13:45 PDT.\nThere is some residual impact for the products currently marked as affected on the dashboard. 
Please continue to monitor the services and the dashboard for individual product recoveries.\nWe will provide an update by Thursday, 2025-06-12 16:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T21:23:42+00:00","modified":"2025-06-12T22:16:06+00:00","when":"2025-06-12T21:23:42+00:00","text":"Most of the Google Cloud products have confirmed full service recovery.\nA few services are still seeing some residual impact and the respective engineering teams are actively working on recovery of those services.\nWe expect the recovery to complete in less than an hour.\nWe will provide an update by Thursday, 2025-06-12 15:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo 
(asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T21:00:07+00:00","modified":"2025-06-12T21:23:42+00:00","when":"2025-06-12T21:00:07+00:00","text":"We have implemented mitigation for the issue in us-central1 and multi-region/us and we are seeing signs of recovery.\nWe have received confirmation from our internal monitoring and customers that Google Cloud products are seeing recovery in multiple regions, with signs of some recovery in us-central1 and multi-region/us.\nWe expect the recovery to complete in less than an hour.\nWe will provide an update by Thursday, 2025-06-12 14:30 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore 
(asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur3","id":"eur3"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Multi-region: eur5","id":"eur5"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Multi-region: nam-eur-asia1","id":"nam-eur-asia1"},{"title":"Multi-region: nam10","id":"nam10"},{"title":"Multi-region: nam11","id":"nam11"},{"title":"Multi-region: nam12","id":"nam12"},{"title":"Multi-region: nam13","id":"nam13"},{"title":"Multi-region: nam3","id":"nam3"},{"title":"Multi-region: nam5","id":"nam5"},{"title":"Multi-region: nam6","id":"nam6"},{"title":"Multi-region: nam7","id":"nam7"},{"title":"Multi-region: nam8","id":"nam8"},{"title":"Multi-region: nam9","id":"nam9"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T20:16:22+00:00","modified":"2025-06-12T21:00:07+00:00","when":"2025-06-12T20:16:22+00:00","text":"We have identified the root cause and applied appropriate mitigations.\nOur infrastructure has recovered in all regions except us-central1.\nGoogle Cloud products that rely on the affected infrastructure are seeing recovery in multiple locations.\nOur engineers are aware of the customers still experiencing issues on us-central1 and multi-region/us and are actively working on full recovery.\nWe do not have an ETA for full recovery.\nWe will provide an update by Thursday, 2025-06-12 14:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan 
(asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur3","id":"eur3"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Multi-region: eur5","id":"eur5"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Multi-region: nam-eur-asia1","id":"nam-eur-asia1"},{"title":"Multi-region: nam10","id":"nam10"},{"title":"Multi-region: nam11","id":"nam11"},{"title":"Multi-region: nam12","id":"nam12"},{"title":"Multi-region: nam13","id":"nam13"},{"title":"Multi-region: nam3","id":"nam3"},{"title":"Multi-region: nam5","id":"nam5"},{"title":"Multi-region: nam6","id":"nam6"},{"title":"Multi-region: nam7","id":"nam7"},{"title":"Multi-region: nam8","id":"nam8"},{"title":"Multi-region: nam9","id":"nam9"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T19:41:55+00:00","modified":"2025-06-12T20:16:22+00:00","when":"2025-06-12T19:41:55+00:00","text":"Our engineers have identified the root cause and have applied appropriate mitigations.\nWhile our engineers have confirmed that the underlying dependency is recovered in all locations except us-central1, ***we are aware that customers are still experiencing varying degrees of impact on individual google cloud 
products***. All the respective engineering teams are actively engaged and working on service recovery. We do not have an ETA for full service recovery.\nWe will provide an update by Thursday, 2025-06-12 13:30 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur3","id":"eur3"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Multi-region: nam5","id":"nam5"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T19:30:44+00:00","modified":"2025-06-12T19:41:55+00:00","when":"2025-06-12T19:30:44+00:00","text":"All locations except us-central1 have fully recovered. us-central1 is mostly recovered. 
We do not have an ETA for full recovery in us-central1.\nWe will provide an update by Thursday, 2025-06-12 13:00 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur3","id":"eur3"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Multi-region: nam5","id":"nam5"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T19:09:08+00:00","modified":"2025-06-12T19:30:44+00:00","when":"2025-06-12T19:09:08+00:00","text":"Our engineers are continuing to mitigate the issue and we have confirmation that the issue is recovered in some locations.\nWe do not have an ETA on full mitigation at this point.\nWe will provide an update by Thursday, 2025-06-12 12:45 PDT with current details.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong 
(asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T18:59:31+00:00","modified":"2025-06-12T19:15:51+00:00","when":"2025-06-12T18:59:31+00:00","text":"**Summary:**\nMultiple GCP products are experiencing Service issues with API requests\n**Description**\nWe are experiencing service issues with multiple GCP products beginning at Thursday, 2025-06-12 10:51 PDT.\nOur engineering team continues to investigate the issue.\nWe will provide an update by Thursday, 2025-06-12 12:15 PDT with current details.\nWe apologize to all who are affected by the disruption.\n**Symptoms:**\nMultiple GCP products are experiencing varying level of service impacts with API requests.\n**Workaround:**\nNone at this time.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul 
(asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"created":"2025-06-12T18:46:38+00:00","modified":"2025-06-12T18:59:31+00:00","when":"2025-06-12T18:46:38+00:00","text":"**Summary:**\nMultiple GCP products are experiencing Service issues\n**Description**\nWe are experiencing service issues with multiple GCP products beginning at Thursday, 2025-06-12 10:51 PDT.\nOur engineering team continues to investigate the issue.\nWe will provide an update by Thursday, 2025-06-12 12:15 PDT with current details.\nWe apologize to all who are affected by the disruption.\n**Symptoms:**\nMultiple GCP products are experiencing varying level of service impacts.\n**Workaround:**\nNone at this time.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore 
(asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]}],"most_recent_update":{"created":"2025-06-13T23:45:21+00:00","modified":"2025-06-13T23:48:18+00:00","when":"2025-06-13T23:45:21+00:00","text":"# Incident Report\n## **Summary**\n*Google Cloud, Google Workspace and Google Security Operations products experienced increased 503 errors in external API requests, impacting customers.*\n***We deeply apologize for the impact this outage has had. Google Cloud customers and their users trust their businesses to Google, and we will do better. We apologize for the impact this has had not only on our customers’ businesses and their users but also on the trust of our systems. We are committed to making improvements to help avoid outages like this moving forward.***\n### **What happened?**\nGoogle and Google Cloud APIs are served through our Google API management and control planes. Distributed regionally, these management and control planes are responsible for ensuring each API request that comes in is authorized, has the policy and appropriate checks (like quota) to meet their endpoints. The core binary that is part of this policy check system is known as Service Control. Service Control is a regional service that has a regional datastore that it reads quota and policy information from. 
This datastore metadata is replicated globally almost instantly to manage quota policies for Google Cloud and our customers.\nOn May 29, 2025, a new feature was added to Service Control for additional quota policy checks. This code change and binary release went through our region-by-region rollout, but the code path that later failed was never exercised during the rollout because it could only be triggered by a specific kind of policy change. As a safety precaution, the change shipped with a red-button to turn off that particular policy serving path. The problem was that the change had neither appropriate error handling nor feature flag protection. Without error handling, a null-pointer dereference in this path crashed the binary. Feature flags are used to enable a feature gradually, region by region and per project, starting with internal projects, so that issues are caught early. Had this change been flag protected, the issue would have been caught in staging.\nOn June 12, 2025 at \~10:45am PDT, a policy change was inserted into the regional Spanner tables that Service Control uses for policies. Given the global nature of quota management, this metadata was replicated globally within seconds. The policy data contained unintended blank fields. As Service Control exercised quota checks on policies in each regional datastore, it read the blank fields for this policy change and hit the unprotected code path and its null pointer, sending the binaries into a crash loop. Because each region runs its own deployment against its own datastore, the crash loop occurred globally.\nWithin 2 minutes, our Site Reliability Engineering team was triaging the incident. Within 10 minutes, the root cause was identified and the red-button (to disable the serving path) was being put in place. The red-button was ready to roll out \~25 minutes from the start of the incident, and within 40 minutes the rollout was complete and we started seeing recovery across regions, smaller ones first.\nIn some of our larger regions, such as us-central1, the restarting Service Control tasks created a herd effect on the underlying infrastructure they depend on (i.e. the regional Spanner table), overloading it. Service Control did not implement the randomized exponential backoff that would have avoided this. It took up to \~2h 40 mins to fully resolve in us-central1, as we throttled task creation to minimize load on the underlying infrastructure and routed traffic to multi-regional databases to reduce it further. At that point, Service Control and API serving were fully recovered across all regions; the corresponding Google and Google Cloud products then recovered, some taking longer depending on their architecture.\n### **What is our immediate path forward?**\nImmediately upon recovery, we froze all changes to the Service Control stack, including manual policy pushes, until we can fully remediate the system.\n### **How did we communicate?**\nWe posted our first incident report to Cloud Service Health about \~1h after the crashes started, because this outage had taken down the Cloud Service Health infrastructure itself. For some customers, the monitoring infrastructure they run on Google Cloud was also failing, leaving them without a signal of the incident or an understanding of the impact to their business and infrastructure.
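To make the failure mode above concrete, here is a minimal, hypothetical sketch in Go. It is not Service Control's actual code; `QuotaPolicy`, `PolicySpec`, `policyServingEnabled`, and `checkQuota` are invented names. It shows a version of the quota check that, unlike the path the report describes, tolerates blank policy fields by failing open and keeps the new path behind a feature flag that is disabled by default:

```go
package quota

import (
	"errors"
	"log"
)

// QuotaPolicy is a hypothetical stand-in for a policy row read from the
// regional datastore; a malformed insert can leave Spec nil ("blank fields").
type QuotaPolicy struct {
	Name string
	Spec *PolicySpec
}

// PolicySpec holds the limit a quota check compares against.
type PolicySpec struct {
	Limit int64
}

// policyServingEnabled stands in for a per-region feature flag, disabled by
// default, so a new code path is enabled gradually rather than everywhere.
var policyServingEnabled = false

// checkQuota is the guarded version of the check: malformed policies are
// reported and the check fails open (the request is allowed) instead of
// dereferencing a nil pointer and crashing the binary.
func checkQuota(p *QuotaPolicy, used int64) (bool, error) {
	if !policyServingEnabled {
		return true, nil // new path off by default; legacy behavior applies
	}
	if p == nil || p.Spec == nil {
		// The unprotected path described in the report would read
		// p.Spec.Limit here and crash; instead we log and fail open.
		log.Printf("malformed quota policy; failing open")
		return true, errors.New("malformed quota policy")
	}
	return used < p.Spec.Limit, nil
}
```

The design choice mirrors two of the report's own conclusions: a failed policy check should degrade to serving rather than crash the binary, and a new code path should be reachable only where the flag has been deliberately enabled.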
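The herd effect during recovery points at the second technique the report says was missing: randomized exponential backoff. The sketch below is again purely illustrative (hypothetical package and function names), showing the common full-jitter variant:

```go
package retry

import (
	"math/rand"
	"time"
)

// Backoff returns how long to sleep before retry attempt n (0-based):
// exponential growth capped at max, with full jitter, so that thousands of
// restarting tasks do not hit a shared backend in synchronized waves.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	d := base << uint(attempt) // base * 2^attempt
	if d <= 0 || d > max {     // d <= 0 guards against shift overflow
		d = max
	}
	return time.Duration(rand.Int63n(int64(d) + 1)) // uniform in [0, d]
}
```

A caller would sleep for, say, `Backoff(n, 100*time.Millisecond, 30*time.Second)` before its nth reconnect attempt. With full jitter, retry load spreads across the whole backoff window instead of arriving in lockstep, which is exactly the herd effect the report describes against the regional Spanner tables.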
We will address this communication gap going forward.\n### **What’s our approach moving forward?**\nBeyond freezing the system as mentioned above, we will prioritize and safely complete the following:\n* We will modularize Service Control’s architecture so that its functionality is isolated and fails open: if a corresponding check fails, Service Control can still serve API requests.\n* We will audit all systems that consume globally replicated data. Regardless of the business need for near-instantaneous global consistency of the data (quota management settings, for example, are global), data replication needs to be propagated incrementally, with sufficient time to validate and detect issues.\n* We will require all changes to critical binaries to be feature flag protected and disabled by default.\n* We will improve our static analysis and testing practices so that errors are handled correctly and, where needed, checks fail open.\n* We will audit our systems and ensure they employ randomized exponential backoff.\n* We will improve our external communications, both automated and human, so our customers get the information they need as soon as possible to react to issues, manage their systems, and help their customers.\n* We will ensure our monitoring and communication infrastructure remains operational to serve customers even when Google Cloud and our primary monitoring products are down, ensuring business continuity.\n-------","status":"AVAILABLE","affected_locations":[]},"status_impact":"SERVICE_OUTAGE","severity":"high","service_key":"zall","service_name":"Multiple Products","affected_products":[{"title":"API Gateway","id":"VzyLPL7CtWQqJ9WeKAjp"},{"title":"Agent Assist","id":"eUntUKqUrHdbBLNcVVXq"},{"title":"AlloyDB for PostgreSQL","id":"fPovtKbaWN9UTepMm3kJ"},{"title":"Apigee","id":"9Y13BNFy4fJydvjdsN3X"},{"title":"Apigee Edge Public Cloud","id":"SumcdgBT6GQBzp1vmdXu"},{"title":"Apigee Hybrid","id":"6gaft97Gv5hGQAJg6D3J"},{"title":"Cloud Data Fusion","id":"rLKDHeeaBiXTeutF1air"},{"title":"Cloud Firestore","id":"CETSkT92V21G6A1x28me"},{"title":"Cloud Logging","id":"PuCJ6W2ovoDhLcyvZ1xa"},{"title":"Cloud Memorystore","id":"LGPLu3M5pcUAKU1z6eP3"},{"title":"Cloud Monitoring","id":"3zaaDb7antc73BM1UAVT"},{"title":"Cloud Run","id":"9D7d2iNBQWN24zc1VamE"},{"title":"Cloud Security Command Center","id":"csyyfUYy88hkeqbv23Mc"},{"title":"Cloud Shell","id":"wF3PG44o1RzTnUW5dycy"},{"title":"Cloud Spanner","id":"EcNGGUgBtBLrtm4mWvqC"},{"title":"Cloud Workstations","id":"5UUXCiH1vfFHXmbDixrB"},{"title":"Contact Center AI Platform","id":"eSAGSSEKoxh8tTJucdYg"},{"title":"Contact Center Insights","id":"WYJx5eWkh8ZrCSQUcP4i"},{"title":"Data Catalog","id":"TFedVRYgKGRGMSJrUpup"},{"title":"Database Migration Service","id":"vY4CRgRFNbqUXWWyYGFS"},{"title":"Dataform","id":"JSShQKADMU3uXYNbCRCh"},{"title":"Dataplex","id":"Xx5qm9U2ovrN11z2Gd9Q"},{"title":"Dataproc Metastore","id":"PXZh68NPz9auRyo4tVfy"},{"title":"Datastream","id":"ibJgP4CNKnFojHHw8L3s"},{"title":"Dialogflow CX","id":"BnCicQdHSdxaCv8Ya6Vm"},{"title":"Dialogflow ES","id":"sQqrYvhjMT5crPHKWJFY"},{"title":"Google App Engine","id":"kchyUtnkMHJWaAva8aYc"},{"title":"Google BigQuery","id":"9CcrhHUcFevXPSVaSxkf"},{"title":"Google Cloud Bigtable","id":"LfZSuE3xdQU46YMFV5fy"},{"title":"Google Cloud Composer","id":"YxkG5FfcC42cQmvBCk4j"},{"title":"Google Cloud Console","id":"Wdsr1n5vyDvCt78qEifm"},{"title":"Google Cloud DNS","id":"TUZUsWSJUVJGW97Jq2sH"},{"title":"Google Cloud Dataflow","id":"T9bFoXPqG8w8g1YbWTKY"},{"title":"Google Cloud Dataproc","id":"yjXrEg3Yvy26BauMwr69"},{"title":"Google Cloud
Pub/Sub","id":"dFjdLh2v6zuES6t9ADCB"},{"title":"Google Cloud SQL","id":"hV87iK5DcEXKgWU2kDri"},{"title":"Google Cloud Storage","id":"UwaYoXQ5bHYHG6EdiPB8"},{"title":"Google Compute Engine","id":"L3ggmi3Jy4xJmgodFA9K"},{"title":"Identity Platform","id":"LE1X2BHYANNsHtG1NM1M"},{"title":"Identity and Access Management","id":"adnGEDEt9zWzs8uF1oKA"},{"title":"Looker Studio","id":"kEYNqRYFXXHxP9QeFJ1d"},{"title":"Managed Service for Apache Kafka","id":"QMZ3IpyG3Ooxotv7JOKV"},{"title":"Memorystore for Memcached","id":"paC6vmsvnjCHsBkp4Wva"},{"title":"Memorystore for Redis","id":"3yFciKa9NQH7pmbnUYUs"},{"title":"Memorystore for Redis Cluster","id":"pAQRwuhqRn7Y1E2we8ds"},{"title":"Persistent Disk","id":"SzESm2Ux129pjDGKWD68"},{"title":"Personalized Service Health","id":"jY8GKegoC5RUVERU7vUG"},{"title":"Pub/Sub Lite","id":"5DWkcStmv4dFHRHLaRXb"},{"title":"Speech-to-Text","id":"5f5oET9B3whnSFHfwy4d"},{"title":"Text-to-Speech","id":"2Xt4Wt8rVvbz3UPsHBvx"},{"title":"Vertex AI Online Prediction","id":"sdXM79fz1FS6ekNpu37K"},{"title":"Vertex AI Search","id":"vNncXxtSVvqyhvSkQ6PJ"},{"title":"Vertex Gemini API","id":"Z0FZJAMvEB4j3NbCJs6B"},{"title":"Vertex Imagen API","id":"zeBmbgdSyHGTvPAiXwVS"},{"title":"reCAPTCHA Enterprise","id":"BubghYKyn8WLY5wnSjZL"}],"uri":"incidents/ow5i3PPK96RduMcb1SsW","currently_affected_locations":[],"previously_affected_locations":[{"title":"Johannesburg (africa-south1)","id":"africa-south1"},{"title":"Multi-region: asia","id":"asia"},{"title":"Taiwan (asia-east1)","id":"asia-east1"},{"title":"Hong Kong (asia-east2)","id":"asia-east2"},{"title":"Tokyo (asia-northeast1)","id":"asia-northeast1"},{"title":"Osaka (asia-northeast2)","id":"asia-northeast2"},{"title":"Seoul (asia-northeast3)","id":"asia-northeast3"},{"title":"Mumbai (asia-south1)","id":"asia-south1"},{"title":"Delhi (asia-south2)","id":"asia-south2"},{"title":"Singapore (asia-southeast1)","id":"asia-southeast1"},{"title":"Jakarta (asia-southeast2)","id":"asia-southeast2"},{"title":"Multi-region: asia1","id":"asia1"},{"title":"Sydney (australia-southeast1)","id":"australia-southeast1"},{"title":"Melbourne (australia-southeast2)","id":"australia-southeast2"},{"title":"Multi-region: eu","id":"eu"},{"title":"Multi-region: eur3","id":"eur3"},{"title":"Multi-region: eur4","id":"eur4"},{"title":"Multi-region: eur5","id":"eur5"},{"title":"Warsaw (europe-central2)","id":"europe-central2"},{"title":"Finland (europe-north1)","id":"europe-north1"},{"title":"Stockholm (europe-north2)","id":"europe-north2"},{"title":"Madrid (europe-southwest1)","id":"europe-southwest1"},{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Berlin (europe-west10)","id":"europe-west10"},{"title":"Turin (europe-west12)","id":"europe-west12"},{"title":"London (europe-west2)","id":"europe-west2"},{"title":"Frankfurt (europe-west3)","id":"europe-west3"},{"title":"Netherlands (europe-west4)","id":"europe-west4"},{"title":"Zurich (europe-west6)","id":"europe-west6"},{"title":"Milan (europe-west8)","id":"europe-west8"},{"title":"Paris (europe-west9)","id":"europe-west9"},{"title":"Global","id":"global"},{"title":"Doha (me-central1)","id":"me-central1"},{"title":"Dammam (me-central2)","id":"me-central2"},{"title":"Tel Aviv (me-west1)","id":"me-west1"},{"title":"Multi-region: nam-eur-asia1","id":"nam-eur-asia1"},{"title":"Multi-region: nam10","id":"nam10"},{"title":"Multi-region: nam11","id":"nam11"},{"title":"Multi-region: nam12","id":"nam12"},{"title":"Multi-region: nam13","id":"nam13"},{"title":"Multi-region: 
nam3","id":"nam3"},{"title":"Multi-region: nam5","id":"nam5"},{"title":"Multi-region: nam6","id":"nam6"},{"title":"Multi-region: nam7","id":"nam7"},{"title":"Multi-region: nam8","id":"nam8"},{"title":"Multi-region: nam9","id":"nam9"},{"title":"Montréal (northamerica-northeast1)","id":"northamerica-northeast1"},{"title":"Toronto (northamerica-northeast2)","id":"northamerica-northeast2"},{"title":"Mexico (northamerica-south1)","id":"northamerica-south1"},{"title":"São Paulo (southamerica-east1)","id":"southamerica-east1"},{"title":"Santiago (southamerica-west1)","id":"southamerica-west1"},{"title":"Multi-region: us","id":"us"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Columbus (us-east5)","id":"us-east5"},{"title":"Dallas (us-south1)","id":"us-south1"},{"title":"Oregon (us-west1)","id":"us-west1"},{"title":"Los Angeles (us-west2)","id":"us-west2"},{"title":"Salt Lake City (us-west3)","id":"us-west3"},{"title":"Las Vegas (us-west4)","id":"us-west4"}]},{"id":"SXRPpPwx2RZ5VHjTwFLx","number":"13271141640052664026","begin":"2025-05-20T03:23:00+00:00","created":"2025-05-20T11:07:41+00:00","end":"2025-05-20T12:05:00+00:00","modified":"2025-05-27T23:06:21+00:00","external_desc":"Google Compute Engine (GCE) issue impacting multiple dependent GCP services across zones","updates":[{"created":"2025-05-27T23:06:21+00:00","modified":"2025-05-27T23:06:22+00:00","when":"2025-05-27T23:06:21+00:00","text":"# Incident Report\n## Summary\nOn 19 May 2025, Google Compute Engine (GCE) encountered problems affecting Spot VM termination globally, and performance degradation and timeouts of reservation consumption / VM creation in us-central1 and us-east4 for a duration of 8 hours, 42 minutes. Consequently, multiple other Google Cloud Platform (GCP) products relying on GCE also experienced increased latencies and timeouts.\nTo our customers who were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause\nA recently deployed configuration change to a Google Compute Engine (GCE) component mistakenly disabled a feature flag that controlled how VM instance states are reported to other components. Safety checks intended to ensure gradual rollout of this type of change failed to be triggered, resulting in an unplanned rapid rollout of the change.\nThis caused Spot VMs to be stuck in an unexpected state. Consequently, Spot VMs that had initiated their standard termination process due to preemption began to accumulate as they failed to complete termination, creating a backlog that degraded performance for all VM types in some regions.\n## Remediation and Prevention\nGoogle engineers were alerted to the outage via internal monitoring on 19 May 2025, at 21:08 US/Pacific, and immediately started an investigation. Once the nature and scope of the issue became clear, Google engineers initiated a rollback of the change on 20 May 2025 at 03:29 US/Pacific.\nThe rollback completed at 03:55 US/Pacific, mitigating the impact.\nGoogle is committed to preventing a repeat of this issue in the future and is completing the following actions:\n* Google Cloud employs a robust and well-defined methodology for production updates, including a phased rollout approach as standard practice to avoid rapid global changes. 
This phased approach is meant to ensure that changes are introduced into production gradually and as safely as possible; in this case, however, the safety checks were not enforced. We have paused further feature flag rollouts for the affected system while we undertake a comprehensive audit of safety checks and fix any exposed gaps that led to the unplanned rapid rollout of this change (a brief illustrative sketch of this kind of rollout gating follows below).\n* We will review and address scalability issues encountered by GCE during the incident.\n* We will improve monitoring coverage of Spot VM deletion workflows.\nGoogle is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.\n## Detailed Description of Impact\nCustomers experienced increased latency for VM control plane operations in us-central1 and us-east4. VM control plane operations include creating, modifying, or deleting VMs. For some customers, Spot VM instances became stuck while terminating. Customers were not billed for Spot VM instances in this state. Furthermore, running virtual machines and the data plane were not impacted.\nVM control plane latency in the us-central1 and us-east4 regions began increasing at the start of the incident (19 May 2025 20:23 US/Pacific), and peaked around 20 May 2025 03:40 US/Pacific. At peak, median latency went from seconds to minutes, and tail latency went from minutes to hours. Several other regions experienced increased tail latency during the outage, but most operations in these regions completed as normal. Once mitigations took effect, median and tail latencies started falling and returned to normal by 05:15 US/Pacific.\nCustomers may have experienced similar latency increases in products which create, modify, fail over or delete VM instances: GCE, GKE, Dataflow, Cloud SQL, Google Cloud Dataproc, Google App Engine, Cloud Deploy, Memorystore for Redis, Cloud Filestore, among others.","status":"AVAILABLE","affected_locations":[]},{"created":"2025-05-20T18:58:49+00:00","modified":"2025-05-27T23:06:21+00:00","when":"2025-05-20T18:58:49+00:00","text":"## Mini Incident Report\nWe apologize for the inconvenience this service disruption may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues.
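The root cause above turns on gradual-rollout safety checks that never fired. Purely as an illustration of the pattern such checks enforce (hypothetical names in Go; this is not Google's release tooling), a per-region rollout gate with soak periods might look like this:

```go
package rollout

import (
	"fmt"
	"time"
)

// Stage is one step of a phased rollout: a region plus the minimum time the
// change must soak there before the next region may proceed.
type Stage struct {
	Region string
	Soak   time.Duration
}

// Next decides whether a rollout may advance. started records when each
// region was enabled; healthy is a hypothetical hook into monitoring. The
// gate halts on any unhealthy region and refuses to advance until the
// current region has soaked.
func Next(stages []Stage, started map[string]time.Time, healthy func(string) bool) (string, error) {
	for _, s := range stages {
		t, begun := started[s.Region]
		if !begun {
			return s.Region, nil // next region to enable
		}
		if !healthy(s.Region) {
			return "", fmt.Errorf("rollout halted: %s unhealthy", s.Region)
		}
		if elapsed := time.Since(t); elapsed < s.Soak {
			return "", fmt.Errorf("still soaking in %s (%s of %s)", s.Region, elapsed.Round(time.Second), s.Soak)
		}
	}
	return "", nil // all stages complete
}
```

The value of a gate like this is that a bad change can be live in at most one new region at a time, bounding the blast radius that both this incident and the Service Control incident above describe.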
If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using [***https://cloud.google.com/support***](https://cloud.google.com/support).\n(All Times US/Pacific)\n**Incident Start:** 19 May 2025 20:23:00\n**Incident End:** 20 May 2025 05:05:00\n**Duration:** 8 hours, 42 minutes\n**Affected Services and Features:**\nGoogle Compute Engine, Google Kubernetes Engine, Cloud Dataflow, Cloud SQL, AlloyDB for PostgreSQL, Cloud Composer, Cloud Build, Cloud Dataproc, Google App Engine, Migrate to Virtual Machines, Vertex GenAI, Cloud Deploy and Memorystore for Redis.\n**Regions/Zones:**\n***VM creation issues:***\n* us-central1 all zones\n* us-east4 all zones\n***VM termination issues:***\nasia-east1, asia-northeast1, asia-south1, asia-southeast1, australia-southeast1, europe-central2, europe-north1, europe-west1, europe-west12, europe-west2, europe-west3, europe-west4, me-central2, southamerica-east1, us-central1, us-east1, us-east4, us-east5, us-west1, us-west2, us-west4\n**Note:** VM terminate operations and products dependent on VM creation and termination may have seen impact outside the zones listed above.\n**Description:**\nGoogle Compute Engine (GCE) encountered problems affecting VM creation, termination, and reservation consumption. Consequently, multiple Google Cloud products experienced increased latencies and timeouts during create, update, and terminate operations.\nPreliminary analysis indicates that a recent configuration change negatively impacted GCE handling of routine Spot virtual machine (VM) terminations. As a result, GCE control plane services became overloaded, causing disruptions for VM instance creation, termination, and reservation consumption.\nThe issue was mitigated by reverting the configuration to its previous state, which resolved the impact on all affected products.\nGoogle will publish a full Incident Report in the coming days with the complete root cause.\n**Customer Impact:**\n**Google Compute Engine:** Customers may have observed elevated latency or timeouts for VM instance operations like creation, reservation consumption, etc.\n**Google Kubernetes Engine:** Customers may have observed latency while performing operations like creating or deleting clusters, adding or resizing nodepools, etc.\n**Google Cloud Dataproc:** Customers may have observed elevated latency while performing operations like creating or deleting clusters, and scale up and scale down operations, etc.\n**Google Cloud Dataflow:** Customers may have observed elevated latency for start-up / scale-up / shut-down of Dataflow jobs.\n**Cloud Filestore:** Customers may have observed create instance failures.\n**Cloud Build:** Customers using private pools may have observed elevated latency in build completion or sporadic build failures due to workers failing to start.\n**Cloud SQL:** Customers may have observed failures or elevated latency for instance creation, resizing and high-availability update operations. As a workaround for failures in create operations, customers can delete the failed instances and retry the operation.\n**Cloud Composer:** Customers may have experienced failures in new Composer environment creation and in upgrades of Composer/Airflow versions, as well as delays in up-scaling of new airflow-workers and in KubernetesPodOperator tasks.\n**AlloyDB for PostgreSQL:** Customers may have experienced failures in instance creation operations.
In addition, a small number of instance update operations may also see failures.\n**Google App Engine:** Customers may have experienced failures in insert/update/create/delete operations.\n**Migrate to Virtual Machines:** Customers may have experienced timeouts or errors.\n**Vertex GenAI:** Customers may have experienced issues in creating cluster operations.\n**Cloud Deploy:** Customers may have experienced Cloud Deploy operations (e.g. Render, Deploy, Verify, etc.) as “in progress” for a long time or failed to start.\n**Memorystore for Redis:** Customers may have experienced increased latency or timeouts for some CreateCluster operations.\n------","status":"AVAILABLE","affected_locations":[]},{"created":"2025-05-20T12:17:55+00:00","modified":"2025-05-20T18:58:49+00:00","when":"2025-05-20T12:17:55+00:00","text":"The issue with multiple dependent GCP services has been resolved for all affected users as of Tuesday, 2025-05-20 05:05 US/Pacific.\nWe thank you for your patience while we worked on resolving the issue.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Oregon (us-west1)","id":"us-west1"}]},{"created":"2025-05-20T12:05:43+00:00","modified":"2025-05-20T12:17:55+00:00","when":"2025-05-20T12:05:43+00:00","text":"Description:\nWe are experiencing an issue with multiple dependent GCP services beginning on Monday, 2025-05-19 20:23 US/Pacific.\nOur engineering team has deployed a mitigation and are seeing improvement across all affected zones. Most of the impacted products have been mitigated and the work towards full mitigation is ongoing.\nWe will provide more information by Tuesday, 2025-05-20 05:30 US/Pacific.\nDiagnosis:\nGoogle Cloud Dataproc: Customers may experience elevated latency while performing operations like creating or deleting clusters, and scale up and scale down operations, etc.\nGoogle Compute Engine: Now Mitigated\nCustomers might experience increased latency or timeouts when performing VM instance operations, including creation and reservation consumption.\nGoogle Kubernetes Engine: Now Mitigated\nCustomers may experience latency while performing operations like creating or deleting clusters, adding or resizing nodepools, etc.\nGoogle Cloud Dataflow: Now Mitigated\nCustomers may experience elevated latency for start-up / scaleup / shut-downs for Dataflow jobs..\nCloud Filestore: Now Mitigated\nCustomers may experience create instance failures.\nCloud Build: Now Mitigated\nCustomers may experience elevated latency in build completion or sporadic build failures due to workers failing to start. Default pools (including the legacy \"global\" region) and private pools are both impacted.\nCloud SQL: Now Mitigated\nCustomers may experience failures or elevated latency for instance creation, resizing and high-availability update operations. As a workaround, for failure in the create operations, customers can retry by deleting the failed instances and re-attempt the operation.\nCloud Composer: Now Mitigated\nCustomers may experience failures in creation of new Composer environments and in upgrade of Composer/Airflow versions, as well as delays in up-scaling of new airflow-workers and in KubernetesPodOperator tasks.\nAlloyDB for PostgreSQL: Now Mitigated\nCustomers may experience failures in instance creation operations. 
In addition, a small number of instance update operations may also see failures.\nGoogle App Engine (Google App Engine Flexible): Customers may experience failures in insert/update/create/delete operations.\nMigrate to Virtual Machines: Now Mitigated\nCustomers may experience timeouts or errors.\nVertex GenAI: Now Mitigated\nCustomers may experience issues in creating cluster operations.\nCloud Deploy: Now Mitigated\nCustomers may see Cloud Deploy operations (e.g. Render, Deploy, Verify, etc.) as “in progress” for a long time or fail to start.\nMemorystore for Redis: Now Mitigated\nCustomers may experience increased latency or timeouts for some CreateCluster operations.\nWorkaround:\nCustomers who are experiencing impact are advised to use alternate zones.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Oregon (us-west1)","id":"us-west1"}]},{"created":"2025-05-20T11:28:15+00:00","modified":"2025-05-20T12:05:43+00:00","when":"2025-05-20T11:28:15+00:00","text":"Description:\nWe are experiencing an issue with Google Compute Engine, Google Kubernetes Engine, Cloud Dataflow, Cloud SQL, AlloyDB for PostgreSQL, Cloud Composer, Cloud Build, Cloud Dataproc, Cloud Filestore, Google App Engine (Google App Engine Flexible) beginning on Monday, 2025-05-19 20:23 US/Pacific.\nOur engineering team has deployed a mitigation and are seeing improvement across all affected zones.\nWe will provide more information by Tuesday, 2025-05-20 05:00 US/Pacific.\nDiagnosis:\nGoogle Compute Engine: Customers might experience increased latency or timeouts when performing VM instance operations, including creation and reservation consumption.\nGoogle Kubernetes Engine: Customers may experience latency while performing operations like creating or deleting clusters, adding or resizing nodepools, etc.\nGoogle Cloud Dataproc: Customers may experience elevated latency while performing operations like creating or deleting clusters, and scale up and scale down operations, etc.\nGoogle Cloud Dataflow: Customers may experience elevated latency for start-up / scaleup / shut-downs for Dataflow jobs..\nCloud Filestore: Customers may experience create instance failures.\nCloud Build: Customers may experience elevated latency in build completion or sporadic build failures due to workers failing to start. Default pools (including the legacy \"global\" region) and private pools are both impacted.\nCloud SQL: Customers may experience failures or elevated latency for instance creation, resizing and high-availability update operations. As a workaround, for failure in the create operations, customers can retry by deleting the failed instances and re-attempt the operation.\nCloud Composer: Customers may experience failures in creation of new Composer environments and in upgrade of Composer/Airflow versions, as well as delays in up-scaling of new airflow-workers and in KubernetesPodOperator tasks.\nAlloyDB for PostgreSQL: Customers may experience failures in instance creation operations. 
In addition, a small number of instance update operations may also see failures.\nGoogle App Engine (Google App Engine Flexible): Customers may experience failures in insert/update/create/delete operations.\nWorkaround:\nCustomers who are experiencing impact are advised to use alternate zones.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Oregon (us-west1)","id":"us-west1"}]},{"created":"2025-05-20T11:07:41+00:00","modified":"2025-05-20T11:28:15+00:00","when":"2025-05-20T11:07:41+00:00","text":"Description:\nWe are experiencing an issue with Google Compute Engine, Google Kubernetes Engine, Cloud Dataflow, Cloud SQL, AlloyDB for PostgreSQL, Cloud Composer, Cloud Build, Cloud Dataproc, Cloud Filestore, Google App Engine (Google App Engine Flexible) beginning on Monday, 2025-05-19 20:23 US/Pacific.\nMitigation work is currently underway by our engineering team.\nWe do not have an ETA for mitigation at this point.\nWe will provide more information by Tuesday, 2025-05-20 04:30 US/Pacific.\nDiagnosis:\nGoogle Compute Engine: Customers might experience increased latency or timeouts when performing VM instance operations, including creation and reservation consumption.\nGoogle Kubernetes Engine: Customers may experience latency while performing operations like creating or deleting clusters, adding or resizing nodepools, etc.\nGoogle Cloud Dataproc: Customers may experience elevated latency while performing operations like creating or deleting clusters, and scale up and scale down operations, etc.\nGoogle Cloud Dataflow: Customers may experience elevated latency for start-up / scaleup / shut-downs for Dataflow jobs..\nCloud Filestore: Customers may experience create instance failures.\nCloud Build: Customers may experience elevated latency in build completion or sporadic build failures due to workers failing to start. Default pools (including the legacy \"global\" region) and private pools are both impacted.\nCloud SQL: Customers may experience failures or elevated latency for instance creation, resizing and high-availability update operations. As a workaround, for failure in the create operations, customers can retry by deleting the failed instances and re-attempt the operation.\nCloud Composer: Customers may experience failures in creation of new Composer environments and in upgrade of Composer/Airflow versions, as well as delays in up-scaling of new airflow-workers and in KubernetesPodOperator tasks.\nAlloyDB for PostgreSQL: Customers may experience failures in instance creation operations. 
In addition, a small number of instance update operations may also see failures.\nGoogle App Engine (Google App Engine Flexible): Customers may experience failures in insert/update/create/delete operations.\nWorkaround:\nCustomers who are experiencing impact are advised to use alternate zones.","status":"SERVICE_DISRUPTION","affected_locations":[{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Oregon (us-west1)","id":"us-west1"}]}],"most_recent_update":{"created":"2025-05-27T23:06:21+00:00","modified":"2025-05-27T23:06:22+00:00","when":"2025-05-27T23:06:21+00:00","text":"# Incident Report\n## Summary\nOn 19 May 2025, Google Compute Engine (GCE) encountered problems affecting Spot VM termination globally, and performance degradation and timeouts of reservation consumption / VM creation in us-central1 and us-east4 for a duration of 8 hours, 42 minutes. Consequently, multiple other Google Cloud Platform (GCP) products relying on GCE also experienced increased latencies and timeouts.\nTo our customers who were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause\nA recently deployed configuration change to a Google Compute Engine (GCE) component mistakenly disabled a feature flag that controlled how VM instance states are reported to other components. Safety checks intended to ensure gradual rollout of this type of change failed to be triggered, resulting in an unplanned rapid rollout of the change.\nThis caused Spot VMs to be stuck in an unexpected state. Consequently, Spot VMs that had initiated their standard termination process due to preemption began to accumulate as they failed to complete termination, creating a backlog that degraded performance for all VM types in some regions.\n## Remediation and Prevention\nGoogle engineers were alerted to the outage via internal monitoring on 19 May 2025, at 21:08 US/Pacific, and immediately started an investigation. Once the nature and scope of the issue became clear, Google engineers initiated a rollback of the change on 20 May 2025 at 03:29 US/Pacific.\nThe rollback completed at 03:55 US/Pacific, mitigating the impact.\nGoogle is committed to preventing a repeat of this issue in the future and is completing the following actions:\n* Google Cloud employs a robust and well-defined methodology for production updates, including a phased rollout approach as standard practice to avoid rapid global changes. This phased approach is meant to ensure that changes are introduced into production gradually and as safely as possible, however, in this case, the safety checks were not enforced. We have paused further feature flag rollouts for the affected system, while we undertake a comprehensive audit of safety checks and fix any exposed gaps that led to the unplanned rapid rollout of this change.\n* We will review and address scalability issues encountered by GCE during the incident.\n* We will improve monitoring coverage of Spot VM deletion workflows.\nGoogle is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. 
## Detailed Description of Impact\nCustomers experienced increased latency for VM control plane operations in us-central1 and us-east4. VM control plane operations include creating, modifying, or deleting VMs. For some customers, Spot VM instances became stuck while terminating. Customers were not billed for Spot VM instances in this state. Furthermore, running virtual machines and the data plane were not impacted.\nVM control plane latency in the us-central1 and us-east4 regions began increasing at the start of the incident (19 May 2025 20:23 US/Pacific), and peaked around 20 May 2025 03:40 US/Pacific. At peak, median latency went from seconds to minutes, and tail latency went from minutes to hours. Several other regions experienced increased tail latency during the outage, but most operations in these regions completed as normal. Once mitigations took effect, median and tail latencies started falling and returned to normal by 05:15 US/Pacific.\nCustomers may have experienced similar latency increases in products that create, modify, fail over, or delete VM instances: GCE, GKE, Dataflow, Cloud SQL, Google Cloud Dataproc, Google App Engine, Cloud Deploy, Memorystore for Redis, Cloud Filestore, among others.","status":"AVAILABLE","affected_locations":[]},"status_impact":"SERVICE_DISRUPTION","severity":"medium","service_key":"zall","service_name":"Multiple Products","affected_products":[{"title":"AlloyDB for PostgreSQL","id":"fPovtKbaWN9UTepMm3kJ"},{"title":"Cloud Build","id":"fw8GzBdZdqy4THau7e1y"},{"title":"Cloud Filestore","id":"jog4nyYkquiLeSK5s26q"},{"title":"Colab Enterprise","id":"7Nbc1kZUvPLiihodettN"},{"title":"Google App Engine","id":"kchyUtnkMHJWaAva8aYc"},{"title":"Google Cloud Composer","id":"YxkG5FfcC42cQmvBCk4j"},{"title":"Google Cloud Dataflow","id":"T9bFoXPqG8w8g1YbWTKY"},{"title":"Google Cloud Dataproc","id":"yjXrEg3Yvy26BauMwr69"},{"title":"Google Cloud Deploy","id":"6z5SnvJrJMJQSdJmUQjH"},{"title":"Google Cloud SQL","id":"hV87iK5DcEXKgWU2kDri"},{"title":"Google Compute Engine","id":"L3ggmi3Jy4xJmgodFA9K"},{"title":"Google Kubernetes Engine","id":"LCSbT57h59oR4W98NHuz"},{"title":"Managed Service for Apache Kafka","id":"QMZ3IpyG3Ooxotv7JOKV"},{"title":"Migrate to Virtual Machines","id":"EwEFrihT41NLB9mhyWhz"}],"uri":"incidents/SXRPpPwx2RZ5VHjTwFLx","currently_affected_locations":[],"previously_affected_locations":[{"title":"Belgium (europe-west1)","id":"europe-west1"},{"title":"Iowa (us-central1)","id":"us-central1"},{"title":"South Carolina (us-east1)","id":"us-east1"},{"title":"Northern Virginia (us-east4)","id":"us-east4"},{"title":"Oregon (us-west1)","id":"us-west1"}]},{"id":"N3Dw7nbJ7rk7qwrtwh7X","number":"6284910072052476183","begin":"2025-03-29T19:53:00+00:00","created":"2025-03-30T01:30:30+00:00","end":"2025-03-30T02:15:00+00:00","modified":"2025-04-11T16:10:00+00:00","external_desc":"Customers are experiencing connectivity issues with multiple Google Cloud services in zone us-east5-c","updates":[{"created":"2025-04-11T16:10:00+00:00","modified":"2025-04-11T16:10:00+00:00","when":"2025-04-11T16:10:00+00:00","text":"# Incident Report\n## Summary:\nOn Saturday, 29 March 2025, multiple Google Cloud services in the us-east5-c zone experienced degraded service or unavailability for a duration of 6 hours and 10 minutes. To our Google Cloud customers whose services were impacted during this disruption, we sincerely apologize.
This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause:\nThe root cause of the service disruption was a loss of utility power in the affected zone. This power outage triggered a cascading failure within the uninterruptible power supply (UPS) system responsible for maintaining power to the zone during such events. The UPS system, which relies on batteries to bridge the gap between utility power loss and generator power activation, experienced a critical battery failure.\nThis failure rendered the UPS unable to perform its core function of ensuring continuous power to the system. As a direct consequence of the UPS failure, virtual machine instances within the affected zone lost power and went offline, resulting in service downtime for customers. The power outage and subsequent UPS failure also triggered a series of secondary issues, including packet loss within the us-east5-c zone, which impacted network communication and performance. Additionally, a limited number of storage disks within the zone became unavailable during the outage.\n## Remediation and Prevention:\nGoogle engineers were alerted to the incident by our internal monitoring alerts at 12:54 US/Pacific on Saturday, 29 March and immediately started an investigation.\nGoogle engineers diverted traffic away from the impacted location to partially mitigate impact for some services that did not have zonal resource dependencies. Engineers bypassed the failed UPS and restored power via generator by 14:49 US/Pacific on Saturday, 29 March. The majority of Google Cloud services recovered shortly thereafter. A few services experienced longer restoration times, as manual actions were required in some cases to complete full recovery.\nGoogle is committed to preventing a repeat of this issue in the future and is completing the following actions:\n* Harden the cluster power failure and recovery path to achieve a predictable and faster time-to-serving after power is restored.\n* Audit systems that did not automatically fail over and close any gaps that prevented this function.\n* Work with our uninterruptible power supply (UPS) vendor to understand and remediate issues in the battery backup system.\nGoogle is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.\n## Detailed Description of Impact:\nCustomers experienced degraded service or unavailability for multiple Google Cloud products in the us-east5-c zone, with varying impact and severity as noted below:\n**AlloyDB for PostgreSQL:** A few clusters experienced transient unavailability during the failover. Two impacted clusters did not fail over automatically and required manual intervention from Google engineers to complete the failover.\n**BigQuery:** A few customers in the impacted region experienced brief unavailability of the product between 12:57 and 13:19 US/Pacific.\n**Cloud Bigtable:** The outage resulted in increased errors and latency for a few customers between 12:47 and 19:37 US/Pacific.\n**Cloud Composer:** External streaming jobs for a few customers experienced increased latency for a period of 16 minutes.\n**Cloud Dataflow:** Streaming and batch jobs saw brief periods of performance degradation.
17% of streaming jobs experienced degradation from 12:52 to 13:08 US/Pacific, while 14% of batch jobs experienced degradation from 15:42 to 16:00 US/Pacific.\n**Cloud Filestore:** All basic, high-scale, and zonal instances in us-east5-c were unavailable, and all enterprise and regional instances in us-east5 were operating in degraded mode from 12:54 to 18:47 US/Pacific on Saturday, 29 March 2025.\n**Cloud Firestore:** Limited impact of approximately 2 minutes during which customers experienced elevated unavailability and latency, as jobs were being rerouted automatically.\n**Cloud Identity and Access Management:** A few customers experienced slight latency or errors while retrying for a short period of time.\n**Cloud Interconnect:** All us-east5 attachments connected to zone1 were unavailable for a duration of 2 hours, 7 minutes.\n**Cloud Key Management Service:** Customers experienced 5XX errors for a brief period of time (less than 4 minutes). Google engineers rerouted the traffic to healthy cells shortly after the power loss to mitigate the impact.\n**Google Kubernetes Engine:** Customers experienced terminations of their nodes in us-east5-c. Some zonal clusters in us-east5-c experienced loss of connectivity to their control plane. No impact was observed for nodes or control planes outside of us-east5-c.\n**Cloud NAT:** Transient control plane outage affecting new VM creation processes and/or dynamic port allocation.\n**Cloud Router:** Cloud Router was unavailable for up to 30 seconds while leadership shifted to other clusters. This downtime was within the thresholds of most customers' graceful restart configurations (60 seconds).\n**Cloud SQL:** Based on monitoring data, 318 zonal instances experienced 3 hours of downtime in the us-east5-c zone. All external high-availability instances successfully failed out of the impacted zone.\n**Cloud Spanner:** Customers in the us-east5 region may have seen elevated errors or increased latency for a few minutes after 12:52 US/Pacific, when the cluster first failed.\n**Cloud VPN:** A few legacy customers experienced loss of session connectivity for up to 5 minutes.\n**Compute Engine:** Customers experienced instance unavailability and inability to manage instances in us-east5-c from 12:54 to 18:30 US/Pacific on Saturday, 29 March 2025.\n**Managed Service for Apache Kafka:** CreateCluster and some UpdateCluster commands (those that increased capacity configuration) had a 100% error rate in the region, with the symptom being INTERNAL errors or timeouts. Based on our monitoring, the impact was limited to one customer who attempted to use these methods during the incident.\n**Memorystore for Redis:** High availability instances failed over to healthy zones during the incident. 12 instances required manual intervention to bring back provisioned capacity. All instances were recovered by 19:28 US/Pacific.\n**Persistent Disk:** Customers experienced very high I/O latency, including stalled I/O operations or errors in some disks in us-east5-c from 12:54 to 20:45 US/Pacific on Saturday, 29 March 2025. Other products using PD or communicating with impacted PD devices experienced service issues with varied symptoms.\n**Secret Manager:** Customers experienced 5XX errors for a brief period of time (less than 4 minutes). Google engineers rerouted the traffic to healthy cells shortly after the power loss to mitigate the impact.\n**Virtual Private Cloud:** Virtual machine instances running in the us-east5-c zone were unable to reach the network. Services were partially unavailable from the impacted zone. Where applicable, customers were able to fail over workloads to different Cloud zones (see the illustrative sketch below).
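To make the alternate-zone guidance concrete, here is a minimal, illustrative sketch using the google-cloud-compute Python client; the zone list and the make_instance() builder are placeholders for your own configuration, not part of this report:
```python
# Purely illustrative: re-attempt VM creation in alternate zones when the
# primary zone is degraded. The zone list and make_instance() builder are
# hypothetical placeholders.
from google.api_core.exceptions import GoogleAPICallError
from google.cloud import compute_v1  # pip install google-cloud-compute

CANDIDATE_ZONES = ['us-east5-a', 'us-east5-b']  # healthy zones in the region

def create_with_fallback(project, make_instance):
    # make_instance(zone) must return a compute_v1.Instance whose zonal
    # fields (e.g. the machine_type path) reference the given zone.
    client = compute_v1.InstancesClient()
    for zone in CANDIDATE_ZONES:
        try:
            op = client.insert(project=project, zone=zone,
                               instance_resource=make_instance(zone))
            op.result(timeout=300)  # block until the operation completes
            return zone             # created successfully in this zone
        except GoogleAPICallError:
            continue                # zone degraded; try the next candidate
    raise RuntimeError('VM creation failed in all candidate zones')
```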
","status":"AVAILABLE","affected_locations":[]},{"created":"2025-04-01T08:53:47+00:00","modified":"2025-04-11T16:10:00+00:00","when":"2025-04-01T08:53:47+00:00","text":"# Mini Incident Report\nWe apologize for the inconvenience this outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Cloud Support using https://cloud.google.com/support\n(All Times US/Pacific)\n**Incident Start:** 29 March 2025 12:53\n**Incident End:** 29 March 2025 19:12\n**Duration:** 6 hours, 19 minutes\n**Affected Services and Features:**\n- AlloyDB for PostgreSQL\n- BigQuery\n- Cloud Bigtable\n- Cloud Composer\n- Cloud Dataflow\n- Cloud Filestore\n- Cloud Firestore\n- Cloud Identity and Access Management\n- Cloud Interconnect\n- Cloud Key Management Service\n- Google Kubernetes Engine\n- Cloud NAT\n- Cloud Router\n- Cloud SQL\n- Cloud Spanner\n- Cloud VPN\n- Compute Engine\n- Managed Service for Apache Kafka\n- Memorystore for Redis\n- Persistent Disk\n- Secret Manager\n- Virtual Private Cloud\n**Regions/Zones:** us-east5-c\n**Description:**\nMultiple Google Cloud products were impacted in us-east5-c, with some zonal resources unavailable, for a duration of 6 hours and 19 minutes.\nThe root cause of the issue was a utility power outage in the zone and a subsequent failure of batteries within the uninterruptible power supply (UPS) system supporting a portion of the impacted zone. This failure left the UPS unable to operate correctly, preventing a transfer to generator power during the utility outage. As a result, some Compute Engine instances in the zone experienced downtime. The incident also caused some packet loss within the us-east5-c zone, as well as some capacity constraints for Google Kubernetes Engine in other zones of us-east5. Additionally, a small number of Persistent Disks were unavailable during the outage.\nGoogle engineers diverted traffic away from the impacted location to partially mitigate impact for some services that did not have zonal resource dependencies. Engineers bypassed the failed UPS and restored power via generator, allowing the underlying infrastructure to come back online. Impact to all affected Cloud services was mitigated by 29 March 2025 at 19:12 US/Pacific.\nGoogle will complete a full Incident Report in the following days that will provide a detailed root cause analysis.\n**Customer Impact:**\nCustomers experienced degraded service or zonal unavailability for multiple Google Cloud products in us-east5-c.\n**Additional details:**\nThe us-east5-c zone has transitioned back to primary power without further impact as of 30 March 2025 at 17:30 US/Pacific.","status":"AVAILABLE","affected_locations":[]},{"created":"2025-03-30T02:43:53+00:00","modified":"2025-04-01T08:55:14+00:00","when":"2025-03-30T02:43:53+00:00","text":"Currently, the us-east5-c zone is stable on an alternate power source. All previously impacted products are mitigated as of 19:12 US/Pacific.\nA small number of Persistent Disks remain in recovery and are actively being worked on. Customers still experiencing issues attaching Persistent Disks should open a support case.\nOur engineers continue to monitor service stability prior to transitioning back to primary power.\nWe will provide continuing updates via Personalized Service Health (PSH) by Sunday, 2025-03-30 01:30 US/Pacific with current details.\nWe apologize to all who are affected by the disruption.
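As a general note for clients affected by the brief 5XX windows described in this incident, retrying idempotent calls with exponential backoff and jitter usually rides out disruptions of a few minutes; below is a minimal, generic sketch in which every name is illustrative:
```python
# Illustrative only: generic retry with exponential backoff and jitter, the
# usual client-side way to ride out brief 5XX windows like those noted above.
import random
import time

class TransientServerError(Exception):
    pass  # placeholder for whatever 5XX error type your client raises

def call_with_backoff(fn, max_attempts=6, base_delay=1.0, max_delay=60.0):
    # Call fn(); on a transient error, sleep base_delay * 2**attempt seconds
    # (capped at max_delay, with +/-50% jitter), then retry.
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.5))
```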
","status":"AVAILABLE","affected_locations":[]},{"created":"2025-03-30T01:30:30+00:00","modified":"2025-04-01T08:39:26+00:00","when":"2025-03-30T01:30:30+00:00","text":"Our engineers are actively working on recovery following a power event in the affected zone. Full recovery is currently expected to take several hours.\nThe impacted services include Cloud Interconnect, Virtual Private Cloud (VPC), Google Compute Engine, Persistent Disk, AlloyDB for PostgreSQL, Cloud Dataproc, Cloud Dataflow, Cloud Filestore, Identity and Access Management, Cloud SQL, Google Kubernetes Engine, Cloud Composer, BigQuery, Cloud Bigtable, and more.\nWe have determined that no other zones (us-east5-a, us-east5-b) in the us-east5 region are impacted.\nWe will provide an update by Saturday, 2025-03-29 20:00 US/Pacific with current details.\nWe apologize to all who are affected by the disruption.","status":"SERVICE_OUTAGE","affected_locations":[{"title":"Columbus (us-east5)","id":"us-east5"}]}],"most_recent_update":{"created":"2025-04-11T16:10:00+00:00","modified":"2025-04-11T16:10:00+00:00","when":"2025-04-11T16:10:00+00:00","text":"# Incident Report\n## Summary:\nOn Saturday, 29 March 2025, multiple Google Cloud services in the us-east5-c zone experienced degraded service or unavailability for a duration of 6 hours and 10 minutes. To our Google Cloud customers whose services were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.\n## Root Cause:\nThe root cause of the service disruption was a loss of utility power in the affected zone. This power outage triggered a cascading failure within the uninterruptible power supply (UPS) system responsible for maintaining power to the zone during such events. The UPS system, which relies on batteries to bridge the gap between utility power loss and generator power activation, experienced a critical battery failure.\nThis failure rendered the UPS unable to perform its core function of ensuring continuous power to the system. As a direct consequence of the UPS failure, virtual machine instances within the affected zone lost power and went offline, resulting in service downtime for customers. The power outage and subsequent UPS failure also triggered a series of secondary issues, including packet loss within the us-east5-c zone, which impacted network communication and performance. Additionally, a limited number of storage disks within the zone became unavailable during the outage.\n## Remediation and Prevention:\nGoogle engineers were alerted to the incident by our internal monitoring alerts at 12:54 US/Pacific on Saturday, 29 March and immediately started an investigation.\nGoogle engineers diverted traffic away from the impacted location to partially mitigate impact for some services that did not have zonal resource dependencies.
Engineers bypassed the failed UPS and restored power via generator by 14:49 US/Pacific on Saturday, 29 March. The majority of Google Cloud services recovered shortly thereafter. A few services experienced longer restoration times, as manual actions were required in some cases to complete full recovery.\nGoogle is committed to preventing a repeat of this issue in the future and is completing the following actions:\n* Harden the cluster power failure and recovery path to achieve a predictable and faster time-to-serving after power is restored.\n* Audit systems that did not automatically fail over and close any gaps that prevented this function.\n* Work with our uninterruptible power supply (UPS) vendor to understand and remediate issues in the battery backup system.\nGoogle is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.\n## Detailed Description of Impact:\nCustomers experienced degraded service or unavailability for multiple Google Cloud products in the us-east5-c zone, with varying impact and severity as noted below:\n**AlloyDB for PostgreSQL:** A few clusters experienced transient unavailability during the failover. Two impacted clusters did not fail over automatically and required manual intervention from Google engineers to complete the failover.\n**BigQuery:** A few customers in the impacted region experienced brief unavailability of the product between 12:57 and 13:19 US/Pacific.\n**Cloud Bigtable:** The outage resulted in increased errors and latency for a few customers between 12:47 and 19:37 US/Pacific.\n**Cloud Composer:** External streaming jobs for a few customers experienced increased latency for a period of 16 minutes.\n**Cloud Dataflow:** Streaming and batch jobs saw brief periods of performance degradation. 17% of streaming jobs experienced degradation from 12:52 to 13:08 US/Pacific, while 14% of batch jobs experienced degradation from 15:42 to 16:00 US/Pacific.\n**Cloud Filestore:** All basic, high-scale, and zonal instances in us-east5-c were unavailable, and all enterprise and regional instances in us-east5 were operating in degraded mode from 12:54 to 18:47 US/Pacific on Saturday, 29 March 2025.\n**Cloud Firestore:** Limited impact of approximately 2 minutes during which customers experienced elevated unavailability and latency, as jobs were being rerouted automatically.\n**Cloud Identity and Access Management:** A few customers experienced slight latency or errors while retrying for a short period of time.\n**Cloud Interconnect:** All us-east5 attachments connected to zone1 were unavailable for a duration of 2 hours, 7 minutes.\n**Cloud Key Management Service:** Customers experienced 5XX errors for a brief period of time (less than 4 minutes). Google engineers rerouted the traffic to healthy cells shortly after the power loss to mitigate the impact.\n**Google Kubernetes Engine:** Customers experienced terminations of their nodes in us-east5-c. Some zonal clusters in us-east5-c experienced loss of connectivity to their control plane. No impact was observed for nodes or control planes outside of us-east5-c.\n**Cloud NAT:** Transient control plane outage affecting new VM creation processes and/or dynamic port allocation.\n**Cloud Router:** Cloud Router was unavailable for up to 30 seconds while leadership shifted to other clusters.
This downtime was within the thresholds of most customers' graceful restart configurations (60 seconds).\n**Cloud SQL:** Based on monitoring data, 318 zonal instances experienced 3 hours of downtime in the us-east5-c zone. All external high-availability instances successfully failed out of the impacted zone.\n**Cloud Spanner:** Customers in the us-east5 region may have seen elevated errors or increased latency for a few minutes after 12:52 US/Pacific, when the cluster first failed.\n**Cloud VPN:** A few legacy customers experienced loss of session connectivity for up to 5 minutes.\n**Compute Engine:** Customers experienced instance unavailability and inability to manage instances in us-east5-c from 12:54 to 18:30 US/Pacific on Saturday, 29 March 2025.\n**Managed Service for Apache Kafka:** CreateCluster and some UpdateCluster commands (those that increased capacity configuration) had a 100% error rate in the region, with the symptom being INTERNAL errors or timeouts. Based on our monitoring, the impact was limited to one customer who attempted to use these methods during the incident.\n**Memorystore for Redis:** High availability instances failed over to healthy zones during the incident. 12 instances required manual intervention to bring back provisioned capacity. All instances were recovered by 19:28 US/Pacific.\n**Persistent Disk:** Customers experienced very high I/O latency, including stalled I/O operations or errors in some disks in us-east5-c from 12:54 to 20:45 US/Pacific on Saturday, 29 March 2025. Other products using PD or communicating with impacted PD devices experienced service issues with varied symptoms.\n**Secret Manager:** Customers experienced 5XX errors for a brief period of time (less than 4 minutes). Google engineers rerouted the traffic to healthy cells shortly after the power loss to mitigate the impact.\n**Virtual Private Cloud:** Virtual machine instances running in the us-east5-c zone were unable to reach the network. Services were partially unavailable from the impacted zone. Where applicable, customers were able to fail over workloads to different Cloud zones.","status":"AVAILABLE","affected_locations":[]},"status_impact":"SERVICE_OUTAGE","severity":"high","service_key":"zall","service_name":"Multiple Products","affected_products":[{"title":"AlloyDB for PostgreSQL","id":"fPovtKbaWN9UTepMm3kJ"},{"title":"Cloud Firestore","id":"CETSkT92V21G6A1x28me"},{"title":"Google BigQuery","id":"9CcrhHUcFevXPSVaSxkf"},{"title":"Google Cloud Bigtable","id":"LfZSuE3xdQU46YMFV5fy"},{"title":"Google Cloud Composer","id":"YxkG5FfcC42cQmvBCk4j"},{"title":"Google Cloud Dataflow","id":"T9bFoXPqG8w8g1YbWTKY"},{"title":"Google Cloud Dataproc","id":"yjXrEg3Yvy26BauMwr69"},{"title":"Google Cloud SQL","id":"hV87iK5DcEXKgWU2kDri"},{"title":"Google Compute Engine","id":"L3ggmi3Jy4xJmgodFA9K"},{"title":"Google Kubernetes Engine","id":"LCSbT57h59oR4W98NHuz"},{"title":"Hybrid Connectivity","id":"5x6CGnZvSHQZ26KtxpK1"},{"title":"Identity and Access Management","id":"adnGEDEt9zWzs8uF1oKA"},{"title":"Persistent Disk","id":"SzESm2Ux129pjDGKWD68"},{"title":"Virtual Private Cloud (VPC)","id":"BSGtCUnz6ZmyajsjgTKv"}],"uri":"incidents/N3Dw7nbJ7rk7qwrtwh7X","currently_affected_locations":[],"previously_affected_locations":[{"title":"Columbus (us-east5)","id":"us-east5"}]}]