Furhan Shabir activity https://gitlab.com/fshabir 2026-03-19T16:22:01Z

tag:gitlab.com,2026-03-19:5223024484 Furhan Shabir deleted project branch revert-2543ec35 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-19T16:22:01Z fshabir Furhan Shabir [email protected]

Furhan Shabir (efd2eb14) at 19 Mar 16:22

tag:gitlab.com,2026-03-19:5223023862 Furhan Shabir pushed to project branch master at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-19T16:21:51Z fshabir Furhan Shabir [email protected]

Furhan Shabir (d11e142d) at 19 Mar 16:21

Merge branch 'revert-2543ec35' into 'master'

... and 1 more commit

tag:gitlab.com,2026-03-19:5223023836 Furhan Shabir accepted merge request !5286: Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'" at GitLab.com / GitLab Infrastructure Team ... 2026-03-19T16:21:51Z fshabir Furhan Shabir [email protected]

What does this MR do?

Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'"

This reverts merge request !5284

Adding a name to the trigger didn't change the external metric name for the KEDA scaler.

Author Check-list

Please read the Contributing document and once you do, complete the following:

  • Check if all of the following apply:
    • Assign to the correct reviewer per the contributing document
    • Apply the correct metadata per the contributing document
    • Link to related MRs for applying the changes on other environments
    • Link to related Chef changes
    • If necessary link to a Criticality 4 Change Request issue

Reviewer Check-list

  • Check if all of the following apply:
    • Reviewed the diff jobs to confirm changes are as expected
    • No changes shown in the diffs not associated with this MR - This may require a rebase or further investigation

Applier Check-list

  • Make sure there is no ongoing deployment for the affected envs before merging (see #announcements slack channel)

tag:gitlab.com,2026-03-19:5223020261 Furhan Shabir commented on issue #375 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-19T16:20:55Z fshabir Furhan Shabir [email protected]

+1 to not paging on the 6-hour burn rate for Sidekiq and using a shorter (1-hour?) burn rate for alerting. We can then evaluate the new queuing apdex using the 1-hour burn rate.
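
For reference, a minimal sketch of how a burn rate over a window is usually computed, and how a 1-hour paging check might look. The function names are hypothetical, and the default threshold of 14.4 is the commonly cited value for a 1-hour window against a 30-day SLO; it is an assumption here, not the value used in GitLab's alerting rules. For an apdex SLO, the error ratio would be 1 minus the apdex ratio measured over the window.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' the error budget is spent.

    error_ratio: fraction of bad events in the window (for apdex: 1 - apdex).
    slo_target:  target fraction of good events, e.g. 0.99.
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_page(error_ratio_1h: float, slo_target: float, threshold: float = 14.4) -> bool:
    """Page when the 1-hour burn rate reaches the threshold (threshold is an assumption)."""
    return burn_rate(error_ratio_1h, slo_target) >= threshold
```

With a 99% target, a 1-hour error ratio of 20% gives a burn rate of about 20 and would page under these assumptions, while 5% gives a burn rate of about 5 and would not.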

tag:gitlab.com,2026-03-19:5220596998 Furhan Shabir commented on merge request !5286 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-19T06:37:38Z fshabir Furhan Shabir [email protected]

Adding a name to the trigger didn't change the external metric name, which is what we wanted to change in the first place.

Screenshot_2026-03-19_at_15.35.55 Source

tag:gitlab.com,2026-03-19:5220592397 Furhan Shabir opened merge request !5286: Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'" at GitLab.com / GitLab Infrastructure Team / ... 2026-03-19T06:35:28Z fshabir Furhan Shabir [email protected]

tag:gitlab.com,2026-03-19:5220590441 Furhan Shabir pushed new project branch revert-2543ec35 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-19T06:34:31Z fshabir Furhan Shabir [email protected]

Furhan Shabir (efd2eb14) at 19 Mar 06:34

Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'"

tag:gitlab.com,2026-03-18:5217851272 Furhan Shabir deleted project branch chore/add-name-to-keda-triggers at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-18T14:14:58Z fshabir Furhan Shabir [email protected]

Furhan Shabir (3c99fd0c) at 18 Mar 14:14

tag:gitlab.com,2026-03-18:5217848760 Furhan Shabir pushed to project branch master at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-18T14:14:30Z fshabir Furhan Shabir [email protected]

Furhan Shabir (2543ec35) at 18 Mar 14:14

Merge branch 'chore/add-name-to-keda-triggers' into 'master'

... and 1 more commit

tag:gitlab.com,2026-03-18:5217848719 Furhan Shabir accepted merge request !5284: chore: Add name to KEDA prometheus triggers for urgent-cpu-bound at GitLab.com / GitLab Infrastructure Team / Kub... 2026-03-18T14:14:30Z fshabir Furhan Shabir [email protected]

What

Add a name to the KEDA Prometheus triggers for the urgent-cpu-bound Sidekiq shard.

Why

We suspect that the default name, which is shared by multiple Sidekiq shards, is causing Prometheus scrape failures, and that this trigger is often erroring out as a result.

Reference issue: gitlab-com/gl-infra/tenant-scale/tenant-services/team#373 (comment 3170674284)

tag:gitlab.com,2026-03-18:5217751422 Furhan Shabir opened merge request !5284: chore: Add name to KEDA prometheus triggers for urgent-cpu-bound at GitLab.com / GitLab Infrastructure Team / Kuber... 2026-03-18T13:56:38Z fshabir Furhan Shabir [email protected]

tag:gitlab.com,2026-03-18:5217748583 Furhan Shabir pushed new project branch chore/add-name-to-keda-triggers at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com 2026-03-18T13:56:07Z fshabir Furhan Shabir [email protected]

Furhan Shabir (3c99fd0c) at 18 Mar 13:56

chore: Add name to KEDA prometheus triggers for urgent-cpu-bound

tag:gitlab.com,2026-03-18:5217691310 Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-18T13:44:36Z fshabir Furhan Shabir [email protected]

HPA scaling during peak hours has improved: flat-line saturation now lasts for shorter periods than in the week before we decreased concurrency, which is a good sign.

Screenshot_2026-03-18_at_22.41.40 Source

tag:gitlab.com,2026-03-18:5217672556 Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-18T13:40:44Z fshabir Furhan Shabir [email protected]

KEDA investigation

The horizontal scaler relies on CPU utilization and shard worker saturation. However, the scaler logs suggest that scaling was almost always driven by CPU utilization.

The Prometheus metrics-based scaler is almost always erroring out:

Screenshot_2026-03-18_at_22.37.40 Source

Mimir credentials look alright, and the scaler logs give no explicit reason for the failure:

unable to get external metric gitlab/s1-prometheus/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: urgent-sidekiq-cpu-bound-v2,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: rpc error: code = Unknown desc = error when getting metric values metric:s1-prometheus encountered error

Since there is no explicit name for the KEDA scaling source, it defaults to s1-prometheus. The same default applies to all the other scalers (shards) too, which could be the cause of the Unknown errors.
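
One way to check whether the same metric name is failing across several shards is to group the error lines by metric name and ScaledObject. The sketch below is a rough illustration only: it assumes the scaler errors follow the shape of the line quoted above, and the helper names (`parse_scaler_error`, `count_failures`) are hypothetical, not part of KEDA or any GitLab tooling.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Sample taken from the error quoted above.
SAMPLE = (
    "unable to get external metric gitlab/s1-prometheus/&LabelSelector{"
    "MatchLabels:map[string]string{scaledobject.keda.sh/name: "
    "urgent-sidekiq-cpu-bound-v2,},MatchExpressions:[]LabelSelectorRequirement{},}: "
    "unable to fetch metrics from external metrics API: rpc error: code = Unknown "
    "desc = error when getting metric values metric:s1-prometheus encountered error"
)

# Assumed shape: "<namespace>/<metric>/<label selector containing the ScaledObject name>".
_ERROR_RE = re.compile(
    r"unable to get external metric "
    r"(?P<namespace>[^/\s]+)/(?P<metric>[^/\s]+)/"
    r".*?scaledobject\.keda\.sh/name: (?P<scaled_object>[^,}\s]+)"
)


def parse_scaler_error(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (namespace, metric name, ScaledObject name), or None if the line does not match."""
    match = _ERROR_RE.search(line)
    if match is None:
        return None
    return match["namespace"], match["metric"], match["scaled_object"]


def count_failures(lines: Iterable[str]) -> Counter:
    """Count failures per (metric name, ScaledObject name) pair."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_scaler_error(line)
        if parsed is not None:
            _, metric, scaled_object = parsed
            counts[(metric, scaled_object)] += 1
    return counts
```

If the resulting counts show s1-prometheus failing for only one ScaledObject, the shared default name is a less likely cause than if it fails for every shard.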

tag:gitlab.com,2026-03-18:5217613592 Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-18T13:29:03Z fshabir Furhan Shabir [email protected]

Logged an issue to add GVL wait time measurements to sidekiq job logs: #384

tag:gitlab.com,2026-03-18:5217579793 Furhan Shabir opened issue #384: Add GVL wait time to Sidekiq logs at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenan... 2026-03-18T13:22:27Z fshabir Furhan Shabir [email protected]

tag:gitlab.com,2026-03-18:5216619605 Furhan Shabir commented on issue #383 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-18T09:57:13Z fshabir Furhan Shabir [email protected]

Duplicate of #373

tag:gitlab.com,2026-03-18:5216619255 Furhan Shabir closed issue #383: Using one hour burn rates for Sidekiq SLO alerts at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant S... 2026-03-18T09:57:08Z fshabir Furhan Shabir [email protected]

tag:gitlab.com,2026-03-18:5216311705 Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues 2026-03-18T08:49:05Z fshabir Furhan Shabir [email protected]

GVL metrics were enabled here, but they are not very useful in their current form: we only get the GVL wait time for a random job and can't compare it with the job's total duration to measure the share of time spent waiting.

Screenshot_2026-03-18_at_17.48.31 Source

We would need to add this metric to the logs, which already have duration_s, to compute the wait-time percentage.
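
As a rough sketch of that calculation, assuming each job log entry carried the GVL wait time alongside duration_s: the field name `gvl_wait_s` is hypothetical (the actual field would be decided in #384), and the helper below is only an illustration.

```python
from typing import Mapping, Optional


def gvl_wait_percentage(
    entry: Mapping[str, float], wait_field: str = "gvl_wait_s"
) -> Optional[float]:
    """Return GVL wait time as a percentage of the job's total duration.

    `wait_field` is a hypothetical log field name; `duration_s` already exists
    in the Sidekiq job logs. Returns None when either value is missing or the
    duration is not positive.
    """
    duration = entry.get("duration_s")
    wait = entry.get(wait_field)
    if duration is None or wait is None or duration <= 0:
        return None
    return 100.0 * wait / duration
```

For example, a job with duration_s of 2.0 and a GVL wait of 0.5 seconds would have spent 25% of its time waiting for the GVL.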