Gigalixir Status Page - Incident history

tag:status.gigalixir.com,2005:Incident/cmmb86phm001yzox1extcviqn 2026-03-03T23:14:59.103+00:00 2026-03-05T17:29:22.851+00:00 Us-Central1 - Investigating Performance Degradation Type: Incident

Duration: 1 hour and 19 minutes

Affected Components: US-Central1 (Google Cloud)

Mar 5, 17:29:22 GMT+0
Postmortem - Updated March 4, 2026

# Description of Issue

A configuration distribution issue within our ingress system caused certificate and routing disruptions across the us-central1 region. This resulted in intermittent 502 responses, TLS/certificate errors, and temporary unavailability of the Gigalixir console.

# Scope of the Issue

The issue affected applications running in the us-central1 GCP region. During the remediation process, each ingress system required a rolling restart, which caused a brief but broader period of disruption. For the affected applications, this downtime lasted 9 minutes on average.

# Prevention Measures

We are implementing changes to further isolate ingress configurations across our infrastructure. These measures include, but are not limited to:

* improved isolation of ingress configuration distribution
* additional validation of configuration changes before application (a sketch of this idea follows this postmortem)
* enhanced monitoring and alerting for ingress health

# Customer Recommendations

There are no customer recommendations for this incident. The incident was entirely internal to Gigalixir's infrastructure.

# Incident Timeline

## 3 March - 22:15 UTC / 16:15 CST

We began to see intermittent disruptions affecting a small number of applications on one of our shared ingress systems. The impact was limited and sporadic at this stage.

## 3 March - 22:35 UTC / 16:35 CST

Our monitoring systems raised alerts indicating elevated error rates. Our engineering team began investigating the issue as potentially affecting more than the one application we had detected at 16:15 CST. Our first customer report came in at 22:40 UTC.

## 3 March - 23:13 UTC / 17:13 CST

We determined the issue was not isolated to individual applications but was a system-level ingress configuration problem. We declared an incident and began working to determine the root cause.

## 3 March - 23:32 UTC / 17:32 CST

The root cause was identified: invalid configurations were being applied to the ingress systems in an intermittently destructive manner. We began to investigate the solution that would resolve the problem with the least additional disruption.

## 3 March - 23:46 UTC / 17:46 CST

We determined the best fix would require each of our ingress systems to be forcibly restarted. Although this would result in guaranteed downtime for applications, it was deemed the safest way to recover fully. We began applying the fix to the ingress system. This required a rolling restart of the ingress infrastructure, which temporarily caused a broader disruption. During this window, most applications experienced intermittent SSL errors and 503 responses. The average downtime per application during this phase was approximately 9 minutes.

## 4 March - 00:04 UTC / 18:04 CST

The fix was fully applied and all services were restored. We continued to monitor the system to ensure stability.

## After resolution

We have added additional verifications in our configuration distribution to prevent errant configurations from being applied. We are implementing additional isolation measures for our ingress configuration distribution to prevent similar issues in the future.
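
One of the prevention measures above is additional validation of configuration changes before they are applied. Here is a minimal sketch of that idea, assuming an nginx-style ingress whose configuration can be checked with `nginx -t`; the command, file names, and workflow are illustrative placeholders, not Gigalixir's actual configuration pipeline.

```python
# Minimal "validate before distribute" gate, sketched under the assumption of an
# nginx-style ingress whose config can be checked with `nginx -t`. The command,
# paths, and workflow are hypothetical; Gigalixir has not published the details
# of its ingress configuration pipeline.
import subprocess
import sys
from pathlib import Path


def config_is_valid(candidate: Path) -> bool:
    """Return True only if the candidate config passes a syntax/consistency check."""
    result = subprocess.run(
        ["nginx", "-t", "-c", str(candidate)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr.strip(), file=sys.stderr)
    return result.returncode == 0


def main() -> None:
    candidate = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("candidate-ingress.conf")
    if not config_is_valid(candidate):
        # Refuse to distribute an errant configuration to the ingress fleet.
        sys.exit(f"validation failed; not distributing {candidate}")
    print(f"{candidate} validated; safe to distribute")


if __name__ == "__main__":
    main()
```

The point of a gate like this is ordering: the check runs before the configuration is distributed, so an invalid file is rejected at one place instead of being rolled out and breaking routing fleet-wide.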

Mar 3, 23:14:59 GMT+0
Investigating - We are currently investigating this incident.

Mar 3, 23:29:15 GMT+0
Identified - We are continuing to work on a fix for this incident.

Mar 3, 23:54:59 GMT+0
Investigating - Some applications in the US-Central-1 region are experiencing intermittent failures, including elevated 502 responses, occasional TLS/certificate errors, and the Gigalixir console (<https://console.gigalixir.com>) currently being unavailable. Our engineering team is actively investigating the issue and working on mitigation. Updates will be posted here as we learn more.

Mar 4, 00:16:49 GMT+0
Monitoring - The earlier issues affecting applications in the US-Central-1 region, including intermittent 502 errors, TLS/certificate errors, and temporary unavailability of the Gigalixir console (<https://console.gigalixir.com>), appear to be resolved. Services have recovered and applications should be operating normally. We are continuing to monitor the system to ensure stability.

Mar 4, 00:34:27 GMT+0
Resolved - Applications in the US-Central-1 region experienced intermittent 502 responses, TLS/certificate errors, and temporary unavailability of the Gigalixir console. The root cause was a configuration distribution issue within our ingress system that caused certificate and routing disruptions across the region. Our engineering team identified and resolved the issue, and all services have been restored. We are implementing changes to further isolate ingress configurations and prevent similar issues in the future. We will continue monitoring to ensure stability.
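
The symptoms called out here, intermittent 502 responses and TLS errors, are also the simplest things to watch for from the outside. Below is a minimal external probe along those lines, sketched in Python; the URL, probe count, and alert threshold are hypothetical placeholders, not Gigalixir's monitoring setup.

```python
# Minimal ingress health probe (illustrative only; not Gigalixir's actual tooling).
# Counts 502 responses and TLS handshake failures over a batch of probes and
# flags the endpoint when the error rate crosses a threshold.
import ssl
import urllib.request
import urllib.error

PROBE_URL = "https://example-app.gigalixirapp.com/healthz"  # hypothetical endpoint
PROBES = 20
ERROR_RATE_THRESHOLD = 0.2  # illustrative: alert at >20% failed probes


def probe_once(url: str, timeout: float = 5.0) -> str:
    """Return 'ok', '502', 'tls', or 'other' for a single probe."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "ok" if resp.status < 500 else str(resp.status)
    except urllib.error.HTTPError as exc:
        return "502" if exc.code == 502 else "other"
    except ssl.SSLError:
        return "tls"
    except (urllib.error.URLError, OSError):
        return "other"


def main() -> None:
    results = [probe_once(PROBE_URL) for _ in range(PROBES)]
    bad = sum(1 for r in results if r in ("502", "tls"))
    rate = bad / PROBES
    status = "ALERT" if rate > ERROR_RATE_THRESHOLD else "healthy"
    print(f"{status}: {bad}/{PROBES} probes failed (502 or TLS) -> rate {rate:.0%}")


if __name__ == "__main__":
    main()
```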

tag:status.gigalixir.com,2005:Incident/cmlfjtmaq058pzgcp7iwjqlmi 2026-02-09T19:12:06.213+00:00 2026-02-09T19:12:06.213+00:00 GitHub Incident with Issues, Actions and Git Operations Type: Incident

Duration: 1 hour and 37 minutes

Affected Components: US-Central1 (Google Cloud), Europe-West1 (Google Cloud), US-East-1 (AWS), US-West-2 (AWS)

Feb 9, 19:12:06 GMT+0
Investigating - We are currently investigating the repercussions of GitHub's issues for git operations. This may affect some Gigalixir operations if you are pushing new builds to production. <https://www.githubstatus.com/>.

Feb 9, 19:36:58 GMT+0
Identified - We are continuing to monitor GitHub's status for their expected timeline to fix the incident.

Feb 9, 20:26:56 GMT+0
Monitoring - We continue to monitor; please reach out to [[email protected]](mailto:[email protected]) if you are affected and need assistance.

Feb 9, 20:49:31 GMT+0
Resolved - This incident has been resolved. We'll continue to monitor; please reach out if you have questions.

Feb 9, 21:03:09 GMT+0
Resolved - This incident has been resolved.

tag:status.gigalixir.com,2005:Incident/cmfr6fx2701mshc8gpoxlpt2m 2025-09-19T18:10:00.000+00:00 2025-09-19T18:10:00.000+00:00 GCP-US Central - Degraded performance on the shared ingress systems Type: Incident

Duration: 3 hours and 25 minutes

Affected Components: US-Central1 (Google Cloud)

Sep 19, 18:10:00 GMT+0
Identified - GCP us-central1 region: an increase in traffic started at approximately 1:10 PM Central. Requests are intermittently affected, but we have put in some fixes to relieve the system. We are monitoring and continuing to investigate the performance.

Sep 19, 19:14:22 GMT+0
Monitoring - We have applied mitigations. Our systems have recovered. We are actively monitoring the situation.

Sep 19, 21:35:15 GMT+0
Resolved - This incident has been resolved and we will continue to monitor.

Sep 22, 14:48:36 GMT+0
Postmortem - 2025-09-19 us-central1 Ingress Incident Report

Updated September 22, 2025

# Description of Issue

We received a DDoS attack on one of our shared ingress systems. The attack originated from many distinct sources. A distributed denial-of-service (DDoS) attack is a malicious attempt to disrupt the normal traffic of a targeted server, service, or network by overwhelming the target or its surrounding infrastructure with a flood of Internet traffic. DDoS attacks achieve effectiveness by utilizing multiple compromised computer systems as sources of attack traffic.

# Scope of the Issue

The attack degraded our ability to process incoming requests on one of our shared ingress systems in the us-central1 GCP region. It prevented or degraded incoming traffic to applications on the same shared ingress setup. Applications on our dedicated ingress infrastructure were not affected. Applications on adjacent shared ingress systems were also not affected. We were able to identify and mitigate the effects relatively quickly, restoring traffic to affected applications. We were able to block the attacker(s) after applying additional measures.

# Prevention Measures

We have applied several layers of protections to make our system more resilient to this type of attack in the future. These measures include, but are not limited to:

* dynamic rate limiting
* dynamic firewall rules
* additional monitors and alerts

Additionally, we are working to split our shared ingress systems into smaller pieces to limit the scope of any similar attacks in the future.

# Customer Recommendations

Gigalixir is always working to improve our infrastructure to prevent and lessen the impact of attacks like this. However, for general application protection we would recommend the following:

1. **Run more than one Replica**
   When you run more than one replica, we run them on multiple servers and across multiple zones. This helps prevent an issue on a single server or zone from completely taking your application offline.

2. **Consider Dedicated Ingress**
   Dedicated Ingress gives your applications their own load balancer and ingress resources. Applications on dedicated ingress also run in a separate runtime server pool from our common runtime server pool, which provides further isolation.

3. **Consider DDoS Protections and/or WAF**
   There are a handful of good products that offer protections for traffic coming through your domain and/or hostnames. _(We can provide individual recommendations upon request.)_

4. **Source Filtering**
   We commonly work with customers to add source filtering in our system to ensure traffic only comes from trusted sources. The details vary depending on the customer's domain setup. One of our preferred solutions is to utilize Cloudflare for your DNS with a rule that applies a signature to all requests. If you couple this with our Dedicated Ingress, we can limit all traffic through the ingress system to only traffic coming from your Cloudflare setup. Cloudflare offers DDoS protections at all plan levels, including their free plan. (A minimal sketch of this signature check appears after this report.)

# Incident Timeline

## 19 September - 16:40 UTC / 11:40 CDT

We started to receive elevated levels of ingress traffic on the affected shared ingress system. Our system was handling this scenario, scaling to account for the additional traffic. The traffic remained elevated but manageable until 12:50pm CDT, when it settled back to a nominal level.

## 19 September - 18:10 UTC / 13:10 CDT

A new flood of traffic came into the affected ingress system. This time our system began to drop packets in the affected ingress system. Our team increased the resources we were dedicating to ingress and worked to identify the patterns and sources of the traffic. After these changes, our system was delivering traffic to customer applications again, but the threat was still active.

## 19 September - 18:40 UTC / 13:40 CDT

With the additional resources we added, customer traffic was healthy until 1:40pm CDT. At that time a few of our ingress servers restarted, but the large majority remained online and healthy. We increased our resource limits further to help with the situation.

## 19 September - 18:50 UTC / 13:50 CDT

At this time we were able to uniquely identify the traffic pattern used in the attack. We were able to isolate this traffic away from the other applications on the affected ingress system. We added additional identification systems to help us block/reject the traffic further upstream. We began to block the traffic at our firewall. Our team continued to monitor the traffic and our system's response.

## 19 September - 21:30 UTC / 16:30 CDT

We considered the incident to be resolved. We were able to block the attack at our firewall and had implemented several measures to identify and block similar attacks in the future.

## After resolution

We continued to implement measures, alerts, and strategies to prevent the problem as well as to improve our response and recovery times.
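
As a concrete illustration of the source-filtering recommendation above, here is a minimal Python sketch of verifying a shared-secret signature that an edge rule (for example, a Cloudflare rule or Worker) could attach to every request. The header name, environment variable, and WSGI wiring are hypothetical placeholders; in the arrangement described in the report, Gigalixir applies this kind of filtering at its dedicated ingress rather than inside your application.

```python
# Illustrative sketch of the "signature on every request" idea described above.
# Assumes an edge proxy (e.g. a Cloudflare rule/Worker) adds a shared-secret
# HMAC header; the origin side verifies it and rejects anything unsigned.
# The header name, secret handling, and WSGI wiring here are hypothetical,
# not Gigalixir's actual filtering implementation.
import hashlib
import hmac
import os

SHARED_SECRET = os.environ.get("EDGE_SHARED_SECRET", "change-me").encode()
SIGNATURE_HEADER = "HTTP_X_EDGE_SIGNATURE"  # hypothetical header name


def expected_signature(method: str, path: str) -> str:
    """HMAC-SHA256 over method and path with the shared secret."""
    message = f"{method} {path}".encode()
    return hmac.new(SHARED_SECRET, message, hashlib.sha256).hexdigest()


def app(environ, start_response):
    """Tiny WSGI app that drops requests without a valid edge signature."""
    provided = environ.get(SIGNATURE_HEADER, "")
    expected = expected_signature(environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/"))
    if not hmac.compare_digest(provided, expected):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"missing or invalid edge signature\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from behind the edge\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

The value of the pattern comes from rejecting unsigned traffic before it ever reaches your replicas; the check itself is just a constant-time HMAC comparison against whatever the edge rule signs.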

tag:status.gigalixir.com,2005:Incident/cmbtr0f0n003nlw6biddchs7w 2025-06-12T18:10:00.000+00:00 2025-06-12T18:10:00.000+00:00 Global Internet Issues Type: Incident

Duration: 3 hours and 57 minutes

Affected Components: US-Central1 (Google Cloud), US-West-2 (AWS), Dashboard, Europe-West1 (Google Cloud), US-East-1 (AWS)

Jun 12, 18:10:00 GMT+0
Identified - AWS, GCP, and Cloudflare are showing large outages. These are all downstream providers for us. At the moment, new builds/deploys are not going out. Applications largely continue to run, but degraded performance can be expected. (Updated status with proper start date).

Jun 12, 19:06:49 GMT+0
Monitoring - Per Google: **Multiple GCP products are experiencing impact due to Identity and Access Management Service Issue** <https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW>.

Jun 12, 20:07:26 GMT+0
Monitoring - Cloudflare and GCP continue to try to get things back to normal, but there are no new updates providing a clear path to resolution yet. <https://www.cloudflarestatus.com/> <https://status.cloud.google.com/>.

Jun 12, 20:23:21 GMT+0
Monitoring - It appears we're seeing progress towards recovery, except for GCP US-Central1. Per Google: _Our infrastructure has recovered in all regions except us-central1._

Jun 12, 22:06:30 GMT+0
Resolved - It appears the multiple downstream incidents have been resolved. All services and components should be fully functional. As always, we'll keep monitoring; let us know if you see anything not behaving as expected so we can investigate.

Jun 13, 20:36:10 GMT+0
Postmortem - Postmortem from Google: Multiple GCP products are experiencing Service issues - <https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW>.

tag:status.gigalixir.com,2005:Incident/cm8vvk2l7007v14aczx1x7pch 2025-03-30T16:15:00.000+00:00 2025-03-30T16:49:52.926+00:00 Unusually High Flow of Traffic Type: Incident

Duration: 1 day, 10 hours and 53 minutes

Affected Components: US-West-2 (AWS)

Mar 30, 16:49:52 GMT+0
Resolved - We will continue to monitor the situation, but everything is operating as expected, so we will mark this resolved.

Mar 30, 16:15:00 GMT+0
Monitoring - We noticed an issue for roughly 25 minutes due to processing an unusual flow of traffic through this region. We have taken steps to mitigate the impacts on the system and are addressing the root cause. We will continue to monitor the situation, but everything is operating as expected.

tag:status.gigalixir.com,2005:Incident/cm37izgj400dbbaw20pwvr6yb 2024-11-07T16:31:47.522+00:00 2024-11-07T16:31:47.522+00:00 NPM Builds Not Completing Type: Incident

Duration: 1 day, 14 hours and 58 minutes

Affected Components: US-Central1 (Google Cloud), Europe-West1 (Google Cloud), US-East-1 (AWS), US-West-2 (AWS)

Nov 7, 16:31:47 GMT+0
Investigating - We've seen a few customers with issues related to NPM builds not deploying successfully. It appears that NPM has changed their packaging system, causing the build failures. We should have a solution deployed shortly.

Nov 7, 17:10:44 GMT+0
Resolved - As we suspected, there was a change in the packaging system that caused the build failures. We have applied a fix for the Node.js resolution issues during the build process. Please contact support if you are still experiencing issues with deploying.

tag:status.gigalixir.com,2005:Incident/clzik4r3g0004hgn7acdphjiq-rjjnb2cykyx1 2024-07-16T11:31:00.000+00:00 2024-07-16T11:31:00.000+00:00 GCP us-central1 Service Outage Type: Incident

Duration: 11 days, 19 hours and 20 minutes

Affected Components: US-Central1 (Google Cloud)

Jul 16, 11:31:00 GMT+0
Investigating - We are experiencing a resource attack in the GCP us-central1 region. We are working to mitigate the issue. Applications are currently experiencing degraded performance and outages. We are working with our Google partners to resolve this issue.

Jul 16, 14:57:00 GMT+0
Resolved - July 15-16, 2024 us-central Ingress Incident Report

Updated July 18, 2024

# **Description of Issue**

We received two separate occasions of SYN Flood attacks. A SYN Flood attack is when a source (_in this case many_) sends TCP SYN packets with illegitimate source addresses. Servers attempt to respond to the illegitimate source addresses multiple times and hold onto these "half-open" connections waiting for a reply. Given enough of these packets over a short period of time, they can overwhelm buffers in servers and prevent them from handling "_proper_" traffic. The source of the attack has not been identified. The attackers were clever enough to use a distributed attack. This attack was targeted at our load balancers directly and was not an attack on Google Cloud.

# **Scope of the Issue**

Due to the nature of the attack, it had an impact on the packet delivery of the network interfaces on our servers. This means that all applications running on the impacted server(s) could experience connectivity issues (_ingress and egress_) as the network interfaces would become unusable at times. Though the attack came in on our default ingress load balancers, it was able to affect our dedicated ingress applications as well, since it was halting traffic across the entire network interface.

# **Prevention Measures**

We have applied a handful of mitigations to prevent this exact type of attack from happening again, many detailed below. Our system will now detect and mitigate heavy volumes of failed SYN retries and SYN ACK retries (a rough illustration of this half-open-connection signal follows this report). We have also increased our processing volume for network traffic on all of our servers. We have also set up more policies, via [Google Cloud Armor](https://cloud.google.com/security/products/armor) and our own firewalls, to detect and mitigate this type of attack and similar DDoS-style attacks. Additionally, we are actively working with our Google partners to identify areas we can improve even further. We are also looking into creating more isolation in our system to limit the scope of similar issues in the future. We are considering moving applications with dedicated ingress to isolated servers to further protect them against multiple types of attacks.

# **Customer Recommendations**

For this particular type of issue, the prevention efforts largely fall on Gigalixir, as outlined in the previous section. However, for general application protection we would recommend the following:

**Run more than one Replica**

When you run more than one replica, we run them on multiple servers and across multiple zones. This helps prevent an issue on a single server or zone from completely taking your application offline.

**Consider Dedicated Ingress**

Dedicated Ingress gives your applications their own load balancer and ingress resources. In the near future, we are considering moving applications out of the common runtime server pool, which would provide further isolation, including for similar events.

**Consider DDoS Protections and/or WAF**

There are a handful of good products that offer protections for traffic coming through your domain and/or hostnames. _(We can provide individual recommendations upon request.)_ One of our preferred solutions is to utilize [CloudFlare](https://www.cloudflare.com/) for your DNS with a rule that applies a signature to all requests. If you couple this with our Dedicated Ingress, we can limit all traffic through the ingress system to only traffic coming from your Cloudflare setup. CloudFlare offers DDoS protections at all plan levels, including their free plan.

# **Incident Timeline**

### **15 July - 13:57 UTC / 08:57 CDT**

Several of our health check alerts went off simultaneously and we began to investigate the cause. We quickly found that many applications were experiencing intermittent connectivity issues. We subsequently identified a heavy volume of packets coming in through the load balancers for custom domains in the us-central1 region. The volume of packets was 1000x normal traffic volume. We expected this would cause issues with the default ingress system, which the load balancers were pointing to. However, this was not the case, as the ingress controllers were running normally. At this time, we continued to investigate the traffic. Given the volume, our running assumption was that this was a DDoS attack.

### **15 July - 14:11 UTC / 09:11 CDT**

The surge of traffic was over and network connectivity returned to normal operating levels. During the attack, we layered in additional DDoS protection (_via Google Cloud Armor settings_), which we hoped had kicked in and resulted in the decrease in volume. Unfortunately, this was not the case, and it appears the attacker had just given up for the day.

### **15 July**

We continued to investigate the attack and preventive measures throughout the day. At that time, we were able to determine the packets were failing at layer 4 (TCP), which means the packets never made it to our ingress system. The flood of packets had overwhelmed the network interfaces themselves on our servers, which was causing sporadic network connectivity with heavy loads of dropped packets. We were then comfortable concluding this was a DDoS / SYN Flood attack. We continued to work through the day to investigate the situation and discuss with our Google partners how the traffic was making it through to us and what could be done to mitigate this particular issue and others like it. They suggested we layer some additional policies through Google Cloud Armor into our network setup to help with additional DDoS protections, which was put into place.

### **16 July - 12:57 UTC / 07:57 CDT**

We started receiving alerts for various resource outages. We identified that the issue was the same as the day before. This time the attack was even stronger and did not stop after a short period of time. We were seeing over 1500x the number of packets per second than we have on a normal day of operation. Unfortunately, the attack was not coming from a single source IP or range, so blocking traffic by source was not feasible. We spun up more servers and ingress controllers to attempt to "handle" the load. This had a positive impact on app connectivity, but the root cause was clearly still there. We needed to shed the incoming load.

### **16 July - 13:29 UTC / 08:29 CDT**

We applied stricter policies to our firewall and Google Cloud Armor to help mitigate the attack. This had an immediate impact, bringing the packet load down to only 30x our normal network load. Applications largely started working again, but there was clearly degraded performance within our network.

### **16 July - 14:33 UTC / 09:33 CDT**

To try to break up the inflow of traffic, we stopped processing all traffic to the affected load balancer. This took applications offline that were on the default ingress system. We applied some new changes to our firewall rules to attempt to solve the problem, which were ultimately unsuccessful.

### **16 July - 14:43 UTC / 09:43 CDT**

We restarted traffic on the offending load balancer. The attack was still present, but the load lessened to about 28x normal. We continued to dig through logs and monitors to try to find any way to reasonably filter out the traffic.

### **16 July - 14:56 UTC / 09:56 CDT**

At this time, we were able to identify points where the system was handling the problem poorly. The attack was taking advantage of spoofed IP addresses and heavy amounts of TCP retries. That knowledge allowed us to apply changes to detect this situation and silently reject these TCP packets, which were previously being sent back to bogus destinations. Anticipating that these attacks may recur, we applied additional changes to our network to recognize and handle the same type of attack at a volume several orders of magnitude greater than the one we experienced during this timeline.

### **16 July**

We continued to monitor and harden the system. We added new policies to ensure all _new_ systems would also have these rules applied at creation. We added additional traffic alerts to our system to help identify similar situations more quickly in the future. Finally, we continue to speak with our Google partners' Cloud Armor experts about the situation and get their advice on additional strategies to put in place. We expected (_and still expect_) that Cloud Armor should be able to mitigate these types of attacks. We will continue to work with our partners to apply any changes they recommend and improve our network protection.
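
The report above describes half-open connections piling up during a SYN flood and the new detection of failed SYN/SYN-ACK retries. As a rough illustration of that signal, here is a small Python sketch that counts SYN_RECV sockets on a Linux host; the alert threshold is a hypothetical placeholder and this is not the detection pipeline Gigalixir actually deployed.

```python
# Illustrative Linux-only sketch of spotting SYN-flood pressure by counting
# half-open (SYN_RECV) sockets in /proc/net/tcp and /proc/net/tcp6.
# The threshold is a hypothetical placeholder; this is not Gigalixir's
# actual detection pipeline, just a demonstration of the signal described above.
from pathlib import Path

SYN_RECV = "03"  # TCP state code for SYN_RECV in /proc/net/tcp
HALF_OPEN_ALERT_THRESHOLD = 500  # illustrative value


def count_syn_recv(proc_file: Path) -> int:
    """Count sockets currently in SYN_RECV in a /proc/net/tcp-format file."""
    if not proc_file.exists():
        return 0
    count = 0
    lines = proc_file.read_text().splitlines()[1:]  # skip header row
    for line in lines:
        fields = line.split()
        # Field 3 (0-indexed) is the connection state as a hex code.
        if len(fields) > 3 and fields[3] == SYN_RECV:
            count += 1
    return count


def main() -> None:
    total = sum(count_syn_recv(Path(p)) for p in ("/proc/net/tcp", "/proc/net/tcp6"))
    status = "ALERT" if total > HALF_OPEN_ALERT_THRESHOLD else "ok"
    print(f"{status}: {total} half-open (SYN_RECV) connections")


if __name__ == "__main__":
    main()
```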

tag:status.gigalixir.com,2005:Incident/clzim4xaw160166iyn7avwhswg8 2024-07-15T12:57:00.000+00:00 2024-07-15T12:57:00.000+00:00 GCP us-central1 Degraded Performance Type: Incident

Duration: 3 hours and 58 minutes

Affected Components: US-Central1 (Google Cloud)

Jul 15, 12:57:00 GMT+0
Investigating -

## **Scope**

Some applications in the us-central1 region experienced degraded packet delivery from approximately 13:57 UTC (08:57 CDT) to 14:11 UTC (09:11 CDT), with most applications recovering by 14:04 UTC (09:04 CDT). [gigalixir.com](http://gigalixir.com), the [Gigalixir console](https://console.gigalixir.com/), and use of the Gigalixir CLI and APIs were also affected during this time.

Jul 15, 16:55:00 GMT+0
Resolved - This incident has been resolved.

## **Timeline**

### **16:55 UTC / 11:55 CDT**

The root cause has been identified as a flood of malformed requests into our load balancers. We are investigating measures to help prevent similar issues in the future. If you have any questions, feel free to contact support at [[email protected]](mailto:[email protected]).

### **14:11 UTC / 09:11 CDT**

At this time we show everything is operational and we are continuing to monitor the situation. Please report back if you are still having any issues. Contact us at [[email protected]](mailto:[email protected]) if you have any questions.

### **13:57 UTC / 08:57 CDT**

We are experiencing delayed packet delivery and increased packet loss in the us-central1 region. At this time we are investigating the issue and will provide updates as we have them.
