There does seem to be a metrics gap, but it affects all metrics, not just application SLIs, so it is unrelated to this.
Marking this as done, so we can proceed with the production side tomorrow.
QA is still running, but the deploy completed: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/pipelines/5587799
Bob Van Landuyt (57555e04) at 17 Mar 16:55
Implement support for passing saturation dimensions (starting with shard) as Grafana URL variables on Service Overview dashboard links generated from capacity warning issues.
In capacity warning threads, links to Service Overview dashboards are not immediately useful for debugging unless engineers manually apply shard filters.
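As a rough illustration of the kind of link this proposal is after, here is a minimal Python sketch. It relies on Grafana reading template variables from `var-<name>=<value>` query parameters. The function name `dashboard_link`, the example host, and the dashboard path are hypothetical, and the actual capacity warning tooling may build its links differently.

```python
from typing import Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dashboard_link(base_url: str, dimensions: Mapping[str, str]) -> str:
    """Return base_url with each dimension added as a Grafana URL variable.

    Hypothetical helper for illustration only, not the real link generator.
    """
    parts = urlsplit(base_url)
    # Keep any query parameters the dashboard link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Grafana picks up template variables from `var-<name>=<value>` parameters.
    # Sorting keeps the generated link stable between runs.
    query += [(f"var-{name}", value) for name, value in sorted(dimensions.items())]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # The host and path are placeholders; the shard value is only illustrative.
    print(
        dashboard_link(
            "https://dashboards.example.com/d/abc123/service-overview",
            {"shard": "import-shared-storage"},
        )
    )
```

With a link built this way, the dashboard would open already filtered to the shard in question, provided the dashboard defines a matching `shard` variable.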
Example context:
import-shared-storage) to quickly assess trend and saturation context.
shard (and other available saturation dimensions where applicable) as URL variables in Service Overview dashboard links.
This issue tracks the proposal discussed in:
Bob Van Landuyt (296434ef) at 17 Mar 16:54
Merge branch 'feat/filter-service-dashboard-by-dimension' into 'main'
... and 4 more commits
Bob Van Landuyt (a939c4ff) at 17 Mar 16:51
Bob Van Landuyt (ee7167df) at 17 Mar 16:51
Merge branch 'reprazent/enable-labkit-sli-staging' into 'master'
... and 1 more commit
Enable the GITLAB_LABKIT_SLI environment variable in the staging environment for both webservice and sidekiq.
When set, this switches Gitlab::Metrics::Sli from the in-repo implementation to the one provided by the Labkit gem (Labkit::Metrics::Sli). The implementations are functionally identical — this toggle validates correctness before removing the in-repo copy entirely.
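The toggle pattern described here can be sketched as follows. This is a Python analogy only: the real code is Ruby, the class names `InRepoSli` and `LabkitSli` are hypothetical stand-ins for the two implementations, and treating any non-empty value as "set" is an assumption about how the variable is read.

```python
import os
from typing import Mapping, Type

ENV_VAR = "GITLAB_LABKIT_SLI"


class InRepoSli:
    """Stand-in for the in-repo implementation (hypothetical)."""

    source = "in-repo"

    def __init__(self) -> None:
        self.total = 0
        self.successes = 0

    def increment(self, success: bool) -> None:
        # Count every measurement, and separately the successful ones.
        self.total += 1
        self.successes += int(success)


class LabkitSli(InRepoSli):
    """Stand-in for the gem-provided implementation (hypothetical).

    It inherits all behaviour to mirror "functionally identical".
    """

    source = "labkit"


def sli_class(env: Mapping[str, str] = os.environ) -> Type[InRepoSli]:
    # Assumption: any non-empty value counts as "set".
    return LabkitSli if env.get(ENV_VAR) else InRepoSli


if __name__ == "__main__":
    sli = sli_class({ENV_VAR: "1"})()
    sli.increment(success=True)
    print(sli.source, sli.total, sli.successes)
```

Because both classes expose the same interface, callers do not change when the variable is flipped, which is what makes a staging-first rollout a low-risk way to compare the two.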
- GITLAB_LABKIT_SLI=1 in gitlab.webservice.extraEnv (gstg)
- GITLAB_LABKIT_SLI=1 in gitlab.sidekiq.extraEnv (gstg)
Production checks pass now, so I'll continue with this staging change and we can still do the production one tomorrow.
This is in progress in gitlab-com/gl-infra/tenant-scale/cells-infrastructure/team#616 by the tenant services team (with help from us, and setup that Calliope has done).
- Alerts are not yet available for Cells. [In progress Work Item](https://gitlab.com/gitlab-com/gl-infra/tenant-scale/cells-infrastructure/team/-/work_items/616).
We've touched on this discussion a bit, but it's currently not in progress as such. @knottos has been working on logging for the Component Ownership Model (component-tenant) in gitlab-com/gl-infra#1711. We've touched on cells in discussions on how those things fit together, but there's no conclusion yet.
@knottos Please correct me if I'm wrong or outdated here
- Alerts are available through a tenant-local alertmanager and can be configured through the metrics-catalog.
- Logs are available using a Cloud-provided service: OpenSearch (AWS)
We don't have dedicated on GCP, as far as I know.