Furhan Shabir activity

Furhan Shabir deleted project branch revert-2543ec35 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-19T16:22:01Z

Furhan Shabir (efd2eb14) at 19 Mar 16:22

Furhan Shabir pushed to project branch master at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-19T16:21:51Z

Furhan Shabir (d11e142d) at 19 Mar 16:21

Merge branch 'revert-2543ec35' into 'master'

... and 1 more commit

Furhan Shabir accepted merge request !5286: Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'" at GitLab.com / GitLab Infrastructure Team ...

2026-03-19T16:21:51Z

What does this MR do?

Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'"

This reverts merge request !5284

Adding a name to the trigger did not change the external metric name for the KEDA scaler.

Author Check-list

Please read the Contributing document and once you do, complete the following:

Check if all of the following apply:
- Assign to the correct reviewer per the contributing document
- Apply the correct metadata per the contributing document
- Link to related MRs for applying the changes on other environments
- Link to related Chef changes
- If necessary link to a Criticality 4 Change Request issue

Reviewer Check-list

Check if all of the following apply:
- Reviewed the diff jobs to confirm changes are as expected
- No changes shown in the diffs not associated with this MR - This may require a rebase or further investigation

Applier Check-list

Make sure there is no ongoing deployment for the affected envs before merging (see #announcements slack channel)

Furhan Shabir commented on issue #375 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-19T16:20:55Z

+1 to not paging on the 6h burn rate for Sidekiq and using a shorter (1 hour?) burn rate for alerting. We can then evaluate the new queuing apdex using the 1-hour burn rate.
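
To make the 6h versus 1h comparison concrete, here is a minimal sketch of a burn-rate check. The SLO target, the sample error ratios, and the 14.4 threshold are illustrative assumptions (14.4 is the commonly cited multiplier for a 1-hour window against a 30-day error budget), not values taken from the Sidekiq alerting configuration.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than sustainable the error budget is spent.

    error_ratio is the fraction of bad events in the window (for an apdex,
    1 - apdex); slo_target is the objective, e.g. 0.999.
    """
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / error_budget


def should_alert(long_window_ratio: float, short_window_ratio: float,
                 slo_target: float, threshold: float) -> bool:
    """Alert only when both the long and the short window exceed the threshold."""
    return (burn_rate(long_window_ratio, slo_target) > threshold
            and burn_rate(short_window_ratio, slo_target) > threshold)


# Hypothetical numbers: 99.9% target, 2% bad events in both the 1h and 5m windows.
ASSUMED_SLO = 0.999
ASSUMED_1H_THRESHOLD = 14.4
print(should_alert(0.02, 0.02, ASSUMED_SLO, ASSUMED_1H_THRESHOLD))
```

A shorter long window crosses its threshold sooner after saturation begins, at the cost of being more sensitive to brief spikes.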

Furhan Shabir commented on merge request !5286 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-19T06:37:38Z

Adding a name to the trigger didn't change the external metric name, which is what we wanted to change in the first place.

Furhan Shabir opened merge request !5286: Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'" at GitLab.com / GitLab Infrastructure Team / ...

2026-03-19T06:35:28Z

Furhan Shabir pushed new project branch revert-2543ec35 at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-19T06:34:31Z

Furhan Shabir (efd2eb14) at 19 Mar 06:34

Revert "Merge branch 'chore/add-name-to-keda-triggers' into 'master'"

Furhan Shabir deleted project branch chore/add-name-to-keda-triggers at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-18T14:14:58Z

Furhan Shabir (3c99fd0c) at 18 Mar 14:14

Furhan Shabir pushed to project branch master at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-18T14:14:30Z

Furhan Shabir (2543ec35) at 18 Mar 14:14

Merge branch 'chore/add-name-to-keda-triggers' into 'master'

... and 1 more commit

Furhan Shabir accepted merge request !5284: chore: Add name to KEDA prometheus triggers for urgent-cpu-bound at GitLab.com / GitLab Infrastructure Team / Kub...

2026-03-18T14:14:30Z

What

Add a name to the KEDA Prometheus triggers for the urgent-cpu-bound Sidekiq shard.

Why

We suspect that the default name, which is shared by multiple Sidekiq shards, is causing Prometheus scrape failures, and that this trigger is often erroring out as a result.

Reference issue: gitlab-com/gl-infra/tenant-scale/tenant-services/team#373 (comment 3170674284)

Furhan Shabir opened merge request !5284: chore: Add name to KEDA prometheus triggers for urgent-cpu-bound at GitLab.com / GitLab Infrastructure Team / Kuber...

2026-03-18T13:56:38Z

Furhan Shabir pushed new project branch chore/add-name-to-keda-triggers at GitLab.com / GitLab Infrastructure Team / Kubernetes Workloads / GitLab.com

2026-03-18T13:56:07Z

Furhan Shabir (3c99fd0c) at 18 Mar 13:56

chore: Add name to KEDA prometheus triggers for urgent-cpu-bound

Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-18T13:44:36Z

HPA scaling during peak hours has improved: flat-line saturation now lasts for shorter periods than in the week before we decreased concurrency, which is a good sign.

Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-18T13:40:44Z

KEDA investigation

The horizontal scaler relies on CPU utilization and shard worker saturation. However, the scaler logs suggest that scaling was almost always driven by CPU utilization.

The Prometheus metrics-based scaler is almost always erroring out.

Mimir credentials look alright and there is no explicit reason for the failure in scaler logs:

unable to get external metric gitlab/s1-prometheus/&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: urgent-sidekiq-cpu-bound-v2,},MatchExpressions:[]LabelSelectorRequirement{},}: unable to fetch metrics from external metrics API: rpc error: code = Unknown desc = error when getting metric values metric:s1-prometheus encountered error

Since there is no explicit name for the KEDA scaling source, it defaults to s1-prometheus. The same default applies to all the other scalers (shards) too, which could be the cause of the Unknown errors.
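
The naming behaviour described above can be modelled with a small sketch. It assumes the external metric name is built only from the trigger's position in the trigger list and the scaler type, which is consistent with the s1-prometheus name in the error and with the later finding that adding a name to the trigger did not change it. The functions, the trigger lists, and the trigger name value are hypothetical; this is a simplified model, not KEDA's code.

```python
def external_metric_name(trigger_index: int, scaler_type: str) -> str:
    """Assumed naming scheme: s<trigger index>-<scaler type>."""
    return f"s{trigger_index}-{scaler_type}"


def metric_names(triggers: list[dict]) -> list[str]:
    """Name every trigger by position and type, ignoring any 'name' field."""
    return [external_metric_name(i, t["type"]) for i, t in enumerate(triggers)]


# Hypothetical trigger lists for one shard, before and after naming the trigger.
# The cpu entry is included only so that the prometheus trigger sits at index 1.
unnamed = [{"type": "cpu"}, {"type": "prometheus"}]
named = [{"type": "cpu"}, {"type": "prometheus", "name": "example-trigger-name"}]

print(metric_names(unnamed))
print(metric_names(named))
```

Under this assumption, every shard with the same trigger layout reports s1-prometheus, and the shards are told apart only by the scaledobject.keda.sh/name label selector visible in the error.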

Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-18T13:29:03Z

Logged an issue to add GVL wait time measurements to sidekiq job logs: #384

Furhan Shabir opened issue #384: Add GVL wait time to Sidekiq logs at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenan...

2026-03-18T13:22:27Z

Furhan Shabir commented on issue #383 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-18T09:57:13Z

Duplicate of #373

Furhan Shabir closed issue #383: Using one hour burn rates for Sidekiq SLO alerts at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant S...

2026-03-18T09:57:08Z

Furhan Shabir commented on issue #373 at GitLab.com / GitLab Infrastructure Team / GitLab Tenant Scale / Tenant Services / Tenant Services team issues

2026-03-18T08:49:05Z

GVL metrics were enabled here, but they don't seem very useful in their current form: we only get the GVL wait time for a random job, and we can't compare it with the job's total duration to measure the share of time spent waiting.

We would need to add this metric to the logs, which already have duration_s, to come up with a wait time percentage.
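
Once both values sit in the same log entry, the percentage is a simple ratio. A minimal sketch, assuming JSON log lines and a hypothetical gvl_wait_s field alongside the existing duration_s (the field name, the worker class, and the sample values are made up for illustration):

```python
import json
from typing import Optional


def gvl_wait_percentage(log_line: str) -> Optional[float]:
    """Return GVL wait time as a percentage of job duration, or None if unavailable."""
    entry = json.loads(log_line)
    duration = entry.get("duration_s")
    wait = entry.get("gvl_wait_s")  # hypothetical field name
    if not duration or wait is None:
        # Missing fields or a zero duration: no meaningful percentage.
        return None
    return 100.0 * wait / duration


# Made-up log entry: a 2-second job that spent 0.5 seconds waiting on the GVL.
sample = '{"class": "ExampleWorker", "duration_s": 2.0, "gvl_wait_s": 0.5}'
print(gvl_wait_percentage(sample))
```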