Fabien Catteau activity

Fabien Catteau commented on issue #570943 at GitLab.org / GitLab

2026-03-18T17:07:38Z

@jtouchstone1 I have some answers but I'm still figuring this out.

Parent interface

I would suggest the admin area.

We need the recovery key generation page for an OpenBao deployment.
We have a single OpenBao deployment per GitLab instance.

The SaaS platform would be the exception though.

Right now we have a single OpenBao deployment for SaaS.
However, eventually we'll have one deployment per cell.
GitLab will operate the cells, not the customers.

We might not want this UI on SaaS anyway. Let's say we can turn that off on that platform.

When do we generate the recovery key?

We only generate the recovery key once. That would happen:

after installing GitLab with OpenBao
after enabling OpenBao on an existing GitLab instance
after a reset of the OpenBao database

Display the recovery key

According to the issue, it should be displayed. This ensures that admins on Dedicated can get the key at all times. I'd like to discuss this with SREs familiar with OpenBao/Vault though.

Verify the recovery key

According to the issue, GitLab would verify the recovery key. Admins don't need to type anything since the GitLab backend has the recovery key, and it can connect to the OpenBao API to verify it.

cc @jmallissery

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T16:23:34Z

@jmallissery Could you go ahead and create these two issues?

Production change issue to generate recovery key – or at least discuss doing it. You might want to ping @pguinoiseau there – or any SRE familiar with Vault. Issue to be added to gitlab-org#19390.
Documentation issue to mention recovery key generation and the rake task in the post-install section of the admin docs. We might also mention the warning in the troubleshooting section (which we don't have at the moment). Issue to be added to gitlab-org#17903.

Then we can close this very issue.

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T16:18:19Z

@jmallissery Right, that makes sense.

quoting !208680 (merged):

OpenBao provides more documentation, but the summary of it is that the recovery key can be retrieved once. Once retrieved, the endpoint returns an empty array. To my mind, this enforces two requirements:

We should enforce that there is only one recovery key in the database.

We should avoid losing the recovery key.

Also, the implementation and its spec make it clear that this is expected.

We can only generate a new recovery key when we reset the OpenBao database.

That's not so obvious in the docs though, in my opinion.
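
The two requirements quoted above can be sketched as follows. This is a minimal Python model, not the GitLab implementation: `FakeOpenBao`, `RecoveryKeyStore`, `bootstrap_recovery_key`, and their methods are hypothetical names. The only behavior taken from the discussion is that the key can be retrieved once, after which the endpoint returns an empty array.

```python
class FakeOpenBao:
    """Hypothetical stand-in for the OpenBao endpoint: hands out keys once."""

    def __init__(self, keys):
        self._keys = list(keys)

    def retrieve_recovery_keys(self):
        # First call returns the keys; every later call returns an empty list.
        keys, self._keys = self._keys, []
        return keys


class RecoveryKeyStore:
    """Hypothetical store that holds at most one recovery key."""

    def __init__(self):
        self._key = None

    def save(self, key):
        # Refuse to overwrite: there must be only one key, and it must not be lost.
        if self._key is not None:
            raise RuntimeError("a recovery key is already stored")
        self._key = key

    def get(self):
        return self._key


def bootstrap_recovery_key(openbao, store):
    """Retrieve the key and persist it; do nothing if one is already stored."""
    if store.get() is not None:
        return False
    keys = openbao.retrieve_recovery_keys()
    if not keys:
        raise RuntimeError("no recovery key returned; it was already retrieved")
    store.save(keys[0])
    return True
```

In this model a second call to `bootstrap_recovery_key` is a no-op, which is the behavior we would want to confirm when running the real rake task twice.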

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T16:11:01Z

@jmallissery Yes, I believe this should be documented in the post-install instructions of the admin docs. It could be repeated in the troubleshooting section – that would be a second entry point. That would be worth a small doc issue.

Do we need to create an issue to track the tasks that we need to do w.r.t. running the rake task in Staging and production to generate recovery keys ? Do we need to do that before G/A ?

Yes, let's create a Change Request issue. We need a separate confidential issue to discuss this with SREs anyways.

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T14:53:34Z

@jmallissery Thanks. Let's try running the rake task a second time, and see what happens. I'm adding that to the action items.

Fabien Catteau commented on merge request !227654 at GitLab.org / GitLab

2026-03-18T14:31:45Z

I've updated the user docs but with a different wording:

Value: Cannot be more than 10 KB (10,000 bytes).

Validation passes if the secret is exactly 10 KB.
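
As an illustration of the boundary described above, here is a minimal Python sketch. It assumes the size is measured on the UTF-8 encoded value, which the wording does not state; `MAX_SECRET_VALUE_BYTES` and `secret_value_within_limit` are hypothetical names, not the GitLab validation code.

```python
MAX_SECRET_VALUE_BYTES = 10_000  # limit from the docs wording: 10 KB (10,000 bytes)


def secret_value_within_limit(value: str) -> bool:
    """Return True when the value is at most 10,000 bytes.

    Assumption: the size is counted in UTF-8 bytes, not characters.
    """
    return len(value.encode("utf-8")) <= MAX_SECRET_VALUE_BYTES
```

A value of exactly 10,000 bytes passes, and one more byte fails.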

Fabien Catteau pushed to project branch duo-edit-20260317-131751 at GitLab.org / GitLab

2026-03-18T14:29:38Z

Fabien Catteau (b4408e44) at 18 Mar 14:29

Fix secret value size limit wording

Fabien Catteau pushed to project branch duo-edit-20260317-131751 at GitLab.org / GitLab

2026-03-18T14:25:55Z

Fabien Catteau (3e32356f) at 18 Mar 14:25

Document secret value size limit

Fabien Catteau commented on merge request !227654 at GitLab.org / GitLab

2026-03-18T14:22:15Z

@marcel.amirault Thanks for taking a look. I thought this would be an instance limit similar to https://docs.gitlab.com/administration/instance_limits/#size-of-commit-titles-and-descriptions for instance. In any case, I agree we should fix user documentation first since it's incorrect.

Fabien Catteau approved merge request !227604: Update details about secret availability and masking at GitLab.org / GitLab

2026-03-18T14:18:25Z

What does this MR do?

As mentioned in the related issue discussion, secrets that you try to expose in the job log are [masked], like other CI/CD variables, so we should clarify this. While here, it's worth pointing out that they act like file type variables, so they'd get exposed with cat, not echo.

Additionally, instead of using cat in the example, which is like showing people echo $MY_SECRET (risky example), let's put a more realistic fake example of using a command that accepts credentials from a file.

Related to #592309 (comment 3159698766)

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T14:10:48Z

@jmallissery Thanks for sharing this. Overall that seems correct – though I would have to read thoroughly all that's been shared. 😅

When a proper rotation is eventually implemented (the current rake task only handles the initial bootstrap), the flow would be:

My understanding is that the recovery_key_retrieve task rotates the recovery key and can be called a second time.

Fabien Catteau commented on issue #592905 at GitLab.org / GitLab

2026-03-18T13:54:21Z

@jmallissery Thanks a lot for researching. Indeed this warning makes sense since we haven't generated the recovery key. Related issues:

I wouldn't run the Rake task on staging – at least not for testing purposes. To run a rake task on staging and production we would need to go through the change management process. See https://handbook.gitlab.com/handbook/engineering/infrastructure-platforms/change-management/

We could wait for the rake task to be ported to the GitLab UI but that would delay verification.

I suggest we test the Rake task on a GitLab CN or CNH deployment. WDYT?

Indeed we don't have documentation for the rake task. That would probably go under https://docs.gitlab.com/administration/secrets_manager/.

Action items

Verification: Run gitlab:secrets_management:openbao:recovery_key_retrieve rake task on a GitLab CN/CNH deployment. We expect the warning to no longer appear when the server starts.
- Additionally, test using the recovery key.
- Additionally, try to run the command a second time.
If confirmed, document that in troubleshooting sections for SaaS and self-managed.
- https://runbooks.gitlab.com/secrets-manager/
- https://docs.gitlab.com/administration/secrets_manager/
Additionally, document the rake task – we missed that.

None of this seems critical.

Fabien Catteau commented on issue #592988 at GitLab.org / GitLab

2026-03-18T12:13:50Z

@jrandazzo I'm confirming the two problems reported here.

When extraVolumes and extraVolumeMounts are set by users, the predefined secrets for the unseal key and the audit token are no longer set in the OpenBao chart. There's a workaround though. See #592988 (comment 3169984641)
The OpenBao chart requires its own configuration for certificates. It doesn't get the ones passed to the GitLab chart. #592988 (comment 3148898103)

I need to open two issues for this.

Fabien Catteau commented on issue #561299 at GitLab.org / GitLab

2026-03-18T11:17:07Z

@reprazent As discussed above, the openbao_core_active metric tells us if we have 1 or 0 active nodes. We should always have 1. Is that something we can leverage in service metrics? It's redundant with the liveness probe. Does it matter?

We can't get an apdex from HTTP requests since we don't have an error rate. But can we have an apdex based on the active state? Should we?
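
To make the question concrete, here is one possible reading of an "apdex based on the active state", as a minimal Python sketch. It assumes we sample the sum of openbao_core_active across nodes at each scrape and count a sample as good when exactly one node is active. `active_state_ratio` is a hypothetical helper, not something that exists in the runbooks, and whether this ratio is a meaningful SLI is still the open question.

```python
def active_state_ratio(samples):
    """Fraction of samples in which exactly one node reports as active.

    `samples` is a list of per-scrape sums of openbao_core_active
    (assumption: one value per scrape interval).
    """
    if not samples:
        raise ValueError("no samples")
    good = sum(1 for value in samples if value == 1)
    return good / len(samples)
```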

Fabien Catteau commented on issue #561299 at GitLab.org / GitLab

2026-03-18T11:06:22Z

I think this is finally ready for dev.

Fabien Catteau commented on issue #592988 at GitLab.org / GitLab

2026-03-18T10:55:40Z

@clemensbeck OK, so we would update the OpenBao chart to leverage what's already defined in https://gitlab.com/gitlab-org/charts/gitlab/-/blob/master/templates/_certificates.tpl. No need for additional values. 👍

I assume there's no workaround for this b/c initContainers can't be set, right?

Fabien Catteau commented on issue #592988 at GitLab.org / GitLab

2026-03-18T10:43:02Z

Thanks for sharing a workaround!

Fabien Catteau commented on issue #592988 at GitLab.org / GitLab

2026-03-18T10:23:13Z

@clemensbeck Thanks! I finally get it. 😅

The OpenBao chart adds static-unseal and http-audit secrets to volumes and volumeMounts. That doesn't collide with extraVolumes and extraVolumeMounts. https://gitlab.com/gitlab-org/cloud-native/charts/openbao/-/blob/406a95f3e3f85668e986fbf7704085cd7434db6c/templates/deployment.yaml#L76-112
However, the OpenBao chart only adds these secrets to volumes and mounts when generate is true.
The GitLab chart sets generate to false. It prepares secrets for the static unseal key and the HTTP audit token, and sets these using extraVolumes and extraVolumeMounts.
As a result, users of the GitLab chart can't really use extraVolumes and extraVolumeMounts themselves.
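
The interaction can be modelled in a few lines of Python. This is a simplified sketch of the behavior described above, not the actual Helm templates: the function names are hypothetical, volumes are reduced to their names, and it assumes that a user-supplied extraVolumes value replaces the GitLab chart default wholesale, which matches the reported symptom.

```python
PREDEFINED = ["static-unseal", "http-audit"]


def openbao_chart_volumes(generate, extra_volumes):
    """Volumes rendered by the OpenBao chart in this simplified model."""
    # The chart only adds its own secrets when generate is true.
    volumes = list(PREDEFINED) if generate else []
    return volumes + list(extra_volumes)


def gitlab_chart_extra_volumes(user_extra_volumes=None):
    """extraVolumes passed down by the GitLab chart (hypothetical model).

    Assumption: a user-supplied list replaces the default list entirely.
    """
    if user_extra_volumes is not None:
        return list(user_extra_volumes)
    return list(PREDEFINED)
```

With generate set to false and no user override, both secrets are mounted. As soon as the user sets extraVolumes, only the user's volumes remain, and the unseal key and audit token are gone.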

Fabien Catteau commented on issue #561299 at GitLab.org / GitLab

2026-03-18T09:39:31Z

NOTE: The updated https://gitlab.com/gitlab-com/runbooks/-/blob/master/metrics-catalog/services/secrets-manager.jsonnet would look like this:

metricsCatalog.serviceDefinition(
  runwayArchetype(
    type='secrets-manager',
    team='pipeline_security',
    featureCategory='secrets_management'
  ) + {
    serviceLevelIndicators+: {
      openbao_requests: {
        userImpacting: true,
        featureCategory: 'secrets_management',
        requestRate: rateMetric(
          counter='secrets_manager_openbao_core_handle_request_count'
        ),
        significantLabels: [],
      },
    },
  }
)

For GET Hybrid the metric is openbao_core_handle_request_count. The Runway service and the OpenBao chart use a different prefix for metrics.
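
The prefix difference can be captured with a small lookup, sketched here in Python. The two counter names come from this comment. The deployment labels `"runway"` and `"get-hybrid"` and the helper `request_counter` are hypothetical, for illustration only.

```python
BASE_COUNTER = "openbao_core_handle_request_count"

# Hypothetical labels for the two deployment types; prefixes as observed above.
METRIC_PREFIX = {
    "runway": "secrets_manager_",
    "get-hybrid": "",
}


def request_counter(deployment):
    """Return the request counter name to use for a given deployment type."""
    return METRIC_PREFIX[deployment] + BASE_COUNTER
```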

Fabien Catteau commented on issue #561299 at GitLab.org / GitLab

2026-03-18T09:38:58Z

@reprazent Thanks again for your help. So we would follow these steps:

Augment service definition for the secrets-manager Runway service with SLI based on OpenBao metrics.
Check on gitlab.dashboard.net.
Create service definition and SLI for GET Hybrid.
Check this too.

I can't find anything on testing what ends up in get-hybrid/config though. How do we do that? Do we have dev docs for this?