Chloe Liu activity — https://gitlab.com/chloeliu (updated 2026-03-18T16:37:32Z)

Chloe Liu pushed to project branch master at GitLab.org / GitLab — 2026-03-17T22:09:22Z

Chloe Liu (2f632e5d) at 17 Mar 22:09

Merge branch 'fix-webhook-auto-disable-flaky-test-18789' into 'master'

... and 1 more commit

Chloe Liu deleted project branch fix-webhook-auto-disable-flaky-test-18789 at GitLab.org / GitLab — 2026-03-17T22:08:44Z

Chloe Liu (c48a6a4c) at 17 Mar 22:08

Chloe Liu accepted merge request !227239: Fix flaky webhook auto-disable test at GitLab.org / GitLab — 2026-03-17T22:08:06Z

Root Cause

The flaky test (webhook_events_spec.rb:108 — "hook is auto-disabled") had two problems:

  1. Wrong assertion target. The test asserted alert_status == 'disabled' (permanently disabled), but permanent disable requires recent_failures > 39 (40 consecutive failures). The test only triggers 5 issues, so the hook can reach at most recent_failures == 4, which is temporarily_disabled — never disabled. The test was likely written when the threshold was lower and was not updated when it changed.

  2. Race condition on async worker. The alert_status was checked with a single-shot webhook.reload! + expect(...).to eql(...). Since failures are processed asynchronously via WebHooks::LogExecutionWorker (protected by an ExclusiveLease), the failure count may not have been fully persisted at the time of the assertion. This explains why the test sometimes saw executable instead of any disabled state.

Stack trace from the issue confirms this: expected: "disabled", got: "executable".
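The threshold arithmetic above can be sketched as a small model. This is illustrative only — the constant names are taken from the MR description, and the real GitLab implementation lives elsewhere and may differ:

```ruby
# Illustrative model of webhook alert-status thresholds; not GitLab's
# actual implementation. Constant names follow the MR description.
TEMPORARILY_DISABLED_FAILURE_THRESHOLD = 3
PERMANENTLY_DISABLED_FAILURE_THRESHOLD = 39

def alert_status(recent_failures)
  if recent_failures > PERMANENTLY_DISABLED_FAILURE_THRESHOLD
    'disabled'              # permanent: 40+ consecutive failures
  elsif recent_failures > TEMPORARILY_DISABLED_FAILURE_THRESHOLD
    'temporarily_disabled'  # backoff: 4 to 39 consecutive failures
  else
    'executable'
  end
end

# Five triggered events yield at most recent_failures == 4,
# so 'disabled' is unreachable in this test.
puts alert_status(4)   # => temporarily_disabled
puts alert_status(40)  # => disabled
```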

Changes

  • Assert temporarily_disabled instead of disabled — This matches what 4 consecutive failures actually produce given the current TEMPORARILY_DISABLED_FAILURE_THRESHOLD = 3 and PERMANENTLY_DISABLED_FAILURE_THRESHOLD = 39.
  • Wrap alert_status check in eventually_eq with retry — Polls webhook.reload!.alert_status for up to 30 seconds, accommodating async worker processing time.
  • Use eventually_be_truthy for event count — Assert at least 4 events arrived (not exactly 4), since timing may allow a 5th event before the hook is disabled.
  • Increase inter-trigger sleep from 0.5s to 1s — Reduces the chance of concurrent backoff! calls racing on the ExclusiveLease, where one call may see the hook as already temporarily disabled and skip incrementing.
  • Remove quarantine annotation — The test should now be stable.
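The retry-until-timeout pattern that eventually_eq relies on can be sketched as a standalone helper (a hypothetical simplification for illustration — GitLab QA's actual matcher is implemented differently):

```ruby
# Hypothetical polling helper: re-evaluates the block until it returns the
# expected value or the timeout elapses. GitLab QA's eventually_eq matcher
# follows the same idea; this standalone version is a sketch only.
def eventually_eq(expected, timeout: 30, interval: 1)
  deadline = Time.now + timeout
  loop do
    actual = yield
    return actual if actual == expected
    raise "expected #{expected.inspect}, got #{actual.inspect}" if Time.now >= deadline
    sleep interval
  end
end

# Usage against an async value that converges after a few polls:
attempts = 0
result = eventually_eq('temporarily_disabled', timeout: 5, interval: 0) do
  attempts += 1
  attempts < 3 ? 'executable' : 'temporarily_disabled'
end
puts result  # => temporarily_disabled
```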

Resolves https://gitlab.com/gitlab-org/quality/test-failure-issues/-/work_items/18789

Chloe Liu commented on merge request !227239 at GitLab.org / GitLab — 2026-03-17T22:07:24Z

@carlad-gl Thanks for the update! Changes LGTM 🚀

Chloe Liu approved merge request !227239: Fix flaky webhook auto-disable test at GitLab.org / GitLab — 2026-03-17T22:06:30Z

Chloe Liu commented on merge request !227239 at GitLab.org / GitLab — 2026-03-17T18:20:14Z

@SamWord Thanks for the clarification 🙏 If we're certain the failure is fully completed by the time we assert, a single reload is sufficient. We probably don't need an eventually_eq in this scenario, because we'd just be polling the same cached value with a 30s timeout. I'd suggest a direct eq assertion would be sufficient.

Chloe Liu pushed to project branch master at GitLab.org / GitLab — 2026-03-17T11:24:17Z

Chloe Liu (ccc19ffb) at 17 Mar 11:24

Merge branch 'bump-helm-kind-ci' into 'master'

... and 1 more commit

Chloe Liu deleted project branch bump-helm-kind-ci at GitLab.org / GitLab — 2026-03-17T11:24:05Z

Chloe Liu (a141b806) at 17 Mar 11:24

Chloe Liu accepted merge request !227458: Update kind and helm versions in CI at GitLab.org / GitLab — 2026-03-17T11:23:30Z

What does this MR do and why?

Helm v4 is the recommended version for the latest version of the GitLab chart.

Updates CI tool versions:

  • kind: 0.29 → 0.31
  • Helm: 3.16 → 4.1

Pinned kind node image (kindest/node:v1.34.0)

kind 0.31 ships with a default node image of kindest/node:v1.35.0 (Kubernetes v1.35.0). This caused kind cluster creation to fail in CI because the kubelet inside the kind node never becomes healthy in the Docker-in-Docker environment used by the docker+machine executor.

The failure manifests during kubeadm init at the wait-control-plane phase:

[kubelet-check] The kubelet is not healthy after 4m0.000442569s

Kubernetes v1.35.0 is very new and has compatibility issues with the nested DinD setup on CI runners. To resolve this, a --kind-image option has been added to the orchestrator, allowing the kind node image to be pinned independently of the kind binary version. The CI configuration pins kindest/node:v1.34.0 to maintain a working Kubernetes version while still benefiting from kind 0.31 improvements.

Changes

  • .gitlab/ci/version.yml: Bumps kind (0.29 → 0.31) and Helm (3.16 → 4.1) versions
  • .gitlab/ci/test-on-cng/main.gitlab-ci.yml: Passes --kind-image kindest/node:v1.34.0 to the orchestrator
  • qa/gems/gitlab-orchestrator/lib/gitlab/orchestrator/commands/subcommands/deployment.rb: Adds --kind-image CLI option
  • qa/gems/gitlab-orchestrator/lib/gitlab/orchestrator/lib/kind/cluster.rb: Accepts and passes the kind_image parameter to kind create cluster --image


Chloe Liu approved merge request !227458: Update kind and helm versions in CI at GitLab.org / GitLab — 2026-03-17T09:53:51Z

Chloe Liu commented on merge request !227458 at GitLab.org / GitLab — 2026-03-17T09:52:25Z

> given helm v4 is literal recommendation from our chart so it's probably been tested by team developing chart itself

Good point! I don’t have anything particular in mind, so I’m good with merging this 🚀 Thanks @acunskis 🙏

Chloe Liu commented on merge request !227458 at GitLab.org / GitLab — 2026-03-16T23:59:32Z

@acunskis Thanks for working on this! Just one nitpick, and an overall question: since we're bumping a major Helm version with breaking changes, have we verified that all chart operations are compatible? We can go ahead and merge once this is confirmed 🙏

Chloe Liu approved merge request !227458: Update kind and helm versions in CI at GitLab.org / GitLab — 2026-03-16T23:59:32Z

Chloe Liu commented on merge request !227458 at GitLab.org / GitLab — 2026-03-16T23:59:32Z
        --kind-image "${KIND_NODE_IMAGE}" \

nitpick