Fabien Catteau activity https://gitlab.com/fcatteau 2026-03-18T17:07:38Z tag:gitlab.com,2026-03-18:5218725477 Fabien Catteau commented on issue #570943 at GitLab.org / GitLab 2026-03-18T17:07:38Z fcatteau Fabien Catteau [email protected]

@jtouchstone1 I have some answers but I'm still figuring this out.

Parent interface

I would suggest the admin area.

  • We need the recovery key generation page for an OpenBao deployment.
  • We have a single OpenBao deployment per GitLab instance.

The SaaS platform would be the exception though.

  • Right now we have a single OpenBao deployment for SaaS.
  • However, eventually we'll have one deployment per cell.
  • GitLab will operate the cells, not the customers.

We might not want this UI on SaaS anyway. Let's say we can turn that off on that platform.

When do we generate the recovery key?

We only generate the recovery key once. That would happen:

  • after installing GitLab with OpenBao
  • after enabling OpenBao on an existing GitLab instance
  • after a reset of the OpenBao database

Display the recovery key

According to the issue, it should be displayed. This ensures that admins on Dedicated can get the key at all times. I'd like to discuss this with SREs familiar with OpenBao/Vault though.

Verify the recovery key

According to the issue, GitLab would verify the recovery key. Admins don't need to type anything since the GitLab backend has the recovery key, and it can connect to OpenBao API to verify it.

cc @jmallissery

tag:gitlab.com,2026-03-18:5218491891 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T16:23:34Z fcatteau Fabien Catteau [email protected]

@jmallissery Could you go ahead and create these two issues?

  • Production change issue to generate recovery key – or at least discuss doing it. You might want to ping @pguinoiseau there – or any SRE familiar with Vault. Issue to be added to gitlab-org#19390.
  • Documentation issue to mention recovery key generation and the rake task in the post-install section of the admin docs. We might also mention the warning in the troubleshooting section (which we don't have at the moment). Issue to be added to gitlab-org#17903.

Then we can close this very issue.

tag:gitlab.com,2026-03-18:5218469173 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T16:18:19Z fcatteau Fabien Catteau [email protected]

@jmallissery Right, that makes sense.

quoting !208680 (merged):

OpenBao provides more documentation, but the summary of it is that the recovery key can be retrieved once. Once retrieved, the endpoint returns an empty array. To my mind, this enforces two requirements:

  • We should enforce that there is only one recovery key in the database.
  • We should avoid losing the recovery key.

Also, the implementation and its spec make it clear that this is expected.

We can only generate a new recovery key when we reset the OpenBao database.

That's not so obvious in the docs though, in my opinion.
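
To make the two requirements concrete, here is a minimal Python sketch, not the actual GitLab implementation. `RecoveryKeyStore` is a hypothetical in-memory stand-in for the database, and it assumes the retrieval call returns a list holding the key the first time and an empty list afterwards, as described in the quote above.

```python
class RecoveryKeyStore:
    """Holds at most one recovery key (hypothetical stand-in for the database)."""

    def __init__(self):
        self._key = None

    @property
    def key(self):
        return self._key

    def save_retrieved(self, shares):
        """Store the result of a retrieval call.

        `shares` is assumed to be the list returned by the endpoint:
        non-empty on the first retrieval, empty on every later call.
        Returns True if a key was stored, False if there was nothing to store.
        """
        if not shares:
            # The key was already retrieved once; the endpoint has nothing left.
            return False
        if self._key is not None:
            # Requirement 1: never keep more than one recovery key.
            # Requirement 2: never lose the one we already have.
            raise RuntimeError("a recovery key is already stored")
        self._key = shares[0]  # assumption: a single share is returned
        return True
```

With this shape, a second retrieval that returns an empty list is a no-op, and an unexpected second key is rejected instead of overwriting the stored one.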

tag:gitlab.com,2026-03-18:5218435598 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T16:11:01Z fcatteau Fabien Catteau [email protected]

@jmallissery Yes, I believe this should be documented in the post-install instructions of the admin docs. It could be repeated in the troubleshooting section – that would be a second entry point. That would be worth a small doc issue.

Do we need to create an issue to track the tasks that we need to do w.r.t. running the rake task in Staging and production to generate recovery keys? Do we need to do that before G/A?

Yes, let's create a Change Request issue. We need a separate confidential issue to discuss this with SREs anyway.

tag:gitlab.com,2026-03-18:5218055329 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T14:53:34Z fcatteau Fabien Catteau [email protected]

@jmallissery Thanks. Let's try running the rake task a second time, and see what happens. I'm adding that to the action items.

tag:gitlab.com,2026-03-18:5217937133 Fabien Catteau commented on merge request !227654 at GitLab.org / GitLab 2026-03-18T14:31:45Z fcatteau Fabien Catteau [email protected]

I've updated the user docs but with a different wording:

Value: Cannot be more than 10 KB (10,000 bytes).

Validation passes if the secret is exactly 10 KB.
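
A minimal Python sketch of that boundary, for illustration only. The function and constant names are hypothetical, and measuring the size on the UTF-8 encoded value is an assumption, not something confirmed by the MR.

```python
MAX_SECRET_VALUE_BYTES = 10_000  # 10 KB, per the wording above


def secret_value_size_valid(value: str) -> bool:
    """Return True when the value is at most 10,000 bytes.

    Assumption: the size is measured on the UTF-8 encoding of the value.
    """
    return len(value.encode("utf-8")) <= MAX_SECRET_VALUE_BYTES
```

A value of exactly 10,000 bytes passes, and 10,001 bytes fails.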

tag:gitlab.com,2026-03-18:5217926238 Fabien Catteau pushed to project branch duo-edit-20260317-131751 at GitLab.org / GitLab 2026-03-18T14:29:38Z fcatteau Fabien Catteau [email protected]

Fabien Catteau (b4408e44) at 18 Mar 14:29

Fix secret value size limit wording

tag:gitlab.com,2026-03-18:5217907196 Fabien Catteau pushed to project branch duo-edit-20260317-131751 at GitLab.org / GitLab 2026-03-18T14:25:55Z fcatteau Fabien Catteau [email protected]

Fabien Catteau (3e32356f) at 18 Mar 14:25

Document secret value size limit

tag:gitlab.com,2026-03-18:5217889292 Fabien Catteau commented on merge request !227654 at GitLab.org / GitLab 2026-03-18T14:22:15Z fcatteau Fabien Catteau [email protected]

@marcel.amirault Thanks for taking a look. I thought this would be an instance limit similar to https://docs.gitlab.com/administration/instance_limits/#size-of-commit-titles-and-descriptions for instance. In any case, I agree we should fix user documentation first since it's incorrect.

tag:gitlab.com,2026-03-18:5217869811 Fabien Catteau approved merge request !227604: Update details about secret availability and masking at GitLab.org / GitLab 2026-03-18T14:18:25Z fcatteau Fabien Catteau [email protected]

What does this MR do?

As mentioned in the related issue discussion, secrets that you try to expose in the job log are [masked], like other CI/CD variables, so we should clarify this. While here, it's worth pointing out that they act like file type variables, so they'd get exposed with cat, not echo.

Additionally, instead of using cat in the example, which is like showing people echo $MY_SECRET (risky example), let's put a more realistic fake example of using a command that accepts credentials from a file.
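
As a rough illustration of the file type behavior described above (a sketch, not the example used in the MR): the environment variable holds a path, and the value lives in the file at that path. `read_secret` and the variable name passed to it are hypothetical.

```python
import os
from pathlib import Path


def read_secret(var_name: str) -> str:
    """Read a secret that is exposed like a file type variable.

    The environment variable contains a file path, not the secret itself,
    so the value has to be read from that file.
    """
    secret_path = Path(os.environ[var_name])
    return secret_path.read_text(encoding="utf-8")
```

A tool that accepts a credentials file can be given the path from the variable directly, so the value never needs to be printed.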

Author's checklist

If you are a GitLab team member and only adding documentation, do not add any of the following labels:

  • ~"frontend"
  • ~"backend"
  • ~"type::bug"
  • ~"database"

These labels cause the MR to be added to code verification QA issues.

Reviewer's checklist

Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.

If you aren't sure which tech writer to ask, use roulette or ask in the #docs Slack channel.

  • If the content requires it, ensure the information is reviewed by a subject matter expert.
  • Technical writer review items:
    • Ensure docs metadata is present and up-to-date.
    • Ensure the appropriate labels are added to this MR.
    • Ensure a release milestone is set.
    • If relevant to this MR, ensure content topic type principles are in use, including:
      • The headings should be something you'd do a Google search for. Instead of Default behavior, say something like Default behavior when you close an issue.
      • The headings (other than the page title) should be active. Instead of Configuring GDK, say something like Configure GDK.
      • Any task steps should be written as a numbered list.
      • If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
  • Review by assigned maintainer, who can always request/require the reviews above. Maintainer's review can occur before or after a technical writer review.
tag:gitlab.com,2026-03-18:5217829080 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T14:10:48Z fcatteau Fabien Catteau [email protected]

@jmallissery Thanks for sharing this. Overall that seems correct – though I would have to read everything that's been shared thoroughly. 😅

When a proper rotation is eventually implemented (the current rake task only handles the initial bootstrap), the flow would be:

My understanding is that the recovery_key_retrieve task rotates the recovery key and can be called a second time.

tag:gitlab.com,2026-03-18:5217739550 Fabien Catteau commented on issue #592905 at GitLab.org / GitLab 2026-03-18T13:54:21Z fcatteau Fabien Catteau [email protected]

@jmallissery Thanks a lot for researching. Indeed this warning makes sense since we haven't generated the recovery key. Related issues:

I wouldn't run the Rake task on staging – at least not for testing purposes. To run a rake task on staging and production we would need to go through the change management process. See https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/

We could wait for the rake task to be ported to the GitLab UI but that would delay verification.

I suggest we test the Rake task on a GitLab CN or CNH deployment. WDYT?

Indeed we don't have documentation for the rake task. That would probably go under https://docs.gitlab.com/administration/secrets_manager/.

Action items

  • Verification: Run gitlab:secrets_management:openbao:recovery_key_retrieve rake task on a GitLab CN/CNH deployment. We expect the warning to no longer appear when the server starts.
    • Additionally, test using the recovery key.
    • Additionally, try to run the command a second time.
  • If confirmed, document that in troubleshooting sections for SaaS and self-managed.
  • Additionally, document the rake task – we missed that.

None of this seems critical.

tag:gitlab.com,2026-03-18:5217250692 Fabien Catteau commented on issue #592988 at GitLab.org / GitLab 2026-03-18T12:13:50Z fcatteau Fabien Catteau [email protected]

@jrandazzo I'm confirming the two problems reported here.

  • When extraVolumes and extraVolumeMounts are set by users, the predefined secrets for the unseal key and the audit token are no longer set in the OpenBao chart. There's a workaround though. See #592988 (comment 3169984641)
  • OpenBao Chart requires its own configuration for certificates. It doesn't get the ones passed to the GitLab Chart. #592988 (comment 3148898103)

I need to open two issues for this.

tag:gitlab.com,2026-03-18:5217004716 Fabien Catteau commented on issue #561299 at GitLab.org / GitLab 2026-03-18T11:17:07Z fcatteau Fabien Catteau [email protected]

@reprazent As discussed above, the openbao_core_active metric tells us if we have 1 or 0 active nodes. We should always have 1. Is that something we can leverage in service metrics? It's redundant with the liveness probe. Does it matter?

We can't get an apdex from HTTP requests since we don't have an error rate. But can we have an apdex based on the active state? Should we?
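
To make the question concrete, here is a minimal Python sketch of a ratio derived from the active state. It is only an illustration of the idea, not how the runbooks compute SLIs. `active_ratio` is a hypothetical helper, and `samples` is assumed to be a list of scraped openbao_core_active values.

```python
def active_ratio(samples):
    """Fraction of samples where exactly one node was active.

    `samples` is assumed to be a list of scraped gauge values (1 or 0).
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    healthy = sum(1 for value in samples if value == 1)
    return healthy / len(samples)
```

For instance, four samples with one of them at 0 would give 0.75.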

tag:gitlab.com,2026-03-18:5216959069 Fabien Catteau commented on issue #561299 at GitLab.org / GitLab 2026-03-18T11:06:22Z fcatteau Fabien Catteau [email protected]

I think this is finally ready for dev.

tag:gitlab.com,2026-03-18:5216910164 Fabien Catteau commented on issue #592988 at GitLab.org / GitLab 2026-03-18T10:55:40Z fcatteau Fabien Catteau [email protected]

@clemensbeck OK, so we would update the OpenBao Chart to leverage what's already defined in https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/templates/_certificates.tpl. No need for additional values. 👍

I assume there's no workaround for this b/c initContainers can't be set, right?

tag:gitlab.com,2026-03-18:5216848830 Fabien Catteau commented on issue #592988 at GitLab.org / GitLab 2026-03-18T10:43:02Z fcatteau Fabien Catteau [email protected]

Thanks for sharing a workaround!

tag:gitlab.com,2026-03-18:5216753807 Fabien Catteau commented on issue #592988 at GitLab.org / GitLab 2026-03-18T10:23:13Z fcatteau Fabien Catteau [email protected]

@clemensbeck Thanks! I finally get it. 😅

tag:gitlab.com,2026-03-18:5216538718 Fabien Catteau commented on issue #561299 at GitLab.org / GitLab 2026-03-18T09:39:31Z fcatteau Fabien Catteau [email protected]

NOTE: The updated https://gitlab.com/gitlab-com/runbooks/-/blob/master/metrics-catalog/services/secrets-manager.jsonnet would look like this:

metricsCatalog.serviceDefinition(
  runwayArchetype(
    type='secrets-manager',
    team='pipeline_security',
    featureCategory='secrets_management'
  ) + {
    serviceLevelIndicators+: {
      openbao_requests: {
        userImpacting: true,
        featureCategory: 'secrets_management',
        requestRate: rateMetric(
          counter='secrets_manager_openbao_core_handle_request_count'
        ),
        significantLabels: [],
      },
    },
  }
)

For GET Hybrid the metric is openbao_core_handle_request_count. The Runway service and the OpenBao chart use a different prefix for metrics.
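
The naming difference can be summed up in a small Python sketch. The helper and the deployment labels "runway" and "get-hybrid" are hypothetical; the two metric names are the ones mentioned above.

```python
BASE_COUNTER = "openbao_core_handle_request_count"
RUNWAY_PREFIX = "secrets_manager_"  # prefix seen in the Runway service definition


def request_counter_name(deployment: str) -> str:
    """Return the request counter name for a deployment type.

    The labels "runway" and "get-hybrid" are illustrative, not real config keys.
    """
    if deployment == "runway":
        return RUNWAY_PREFIX + BASE_COUNTER
    if deployment == "get-hybrid":
        return BASE_COUNTER
    raise ValueError(f"unknown deployment: {deployment}")
```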

tag:gitlab.com,2026-03-18:5216536322 Fabien Catteau commented on issue #561299 at GitLab.org / GitLab 2026-03-18T09:38:58Z fcatteau Fabien Catteau [email protected]

@reprazent Thanks again for your help. So we would follow these steps:

  1. Augment service definition for the secrets-manager Runway service with SLI based on OpenBao metrics.
  2. Check on gitlab.dashboard.net.
  3. Create service definition and SLI for GET Hybrid.
  4. Check this too.

I can't find anything on testing what ends up in get-hybrid/config though. How do we do that? Do we have dev docs for this?