Patrick Bajao activity https://gitlab.com/patrickbajao 2026-03-16T07:04:07Z tag:gitlab.com,2026-03-16:5206708455 Patrick Bajao commented on issue #502515 at GitLab.org / GitLab 2026-03-16T07:04:07Z patrickbajao Patrick Bajao [email protected]

Assigning this to me for 18.11.

tag:gitlab.com,2026-03-16:5206674438 Patrick Bajao commented on issue #593667 at GitLab.org / GitLab 2026-03-16T06:53:31Z patrickbajao Patrick Bajao [email protected]

Capturing the discussion from Slack that started on Feb 24, 2026:

@patrickbajao:

I just caught up from the last week's recording. Just sharing my thoughts on the "diffs when whitespace is disabled" item.

RE: caching diffs when whitespace is disabled, I think we should look into improving the Gitaly RPC first (the new one) as other Rapid Diffs pages (Commit, Compare, MR creation) will also benefit from it. @slashmanov opened a new issue to improve that RPC and I think that should be the next step: gitaly#7087 (closed).

I'm 20/80 on caching diffs when whitespace is disabled because of the difference in usage. It's not really being used a lot based on the metrics mentioned during the call. So storing it like how we store the default diffs can be wasteful. We can be "smart" to only cache it on demand so subsequent requests will be faster. We can also opt in to only store it when instance is using external diffs. But that's additional complexity and storage for something that is not being used a lot. (edited)

@slashmanov:

I see an ‘ideal’ architecture as the following:

Rapid Diffs → RPC Cache service → Highlighter service (Headwaters; we can also delegate sorting and binary detection to it) → Gitaly

This allows us to prune diffs in the database because caching service will be responsible for that. Same for highlighting. We can also configure cache with better precision: decide what and when to cache, how long, etc. Since we cache RPC responses we remain flexible on the Rapid Diffs app part.

We always store keeparound refs, might as well reuse those to reduce requirements for existing caching layers: DB (or Object storage), Redis, etc. (edited)

Whole architecture.

image

@patrickbajao:

Yup, my ideal architecture is we have a separate "diffs" service that we can ask diffs from based on specific diff refs. That abstracts the caching and highlighting and will return diffs the way we need it in Rapid Diffs and do it "rapidly". This is because I want to reduce the footprint of diffs in DB (I don't think we can remove all diff related tables though). This is becoming a crossover between 2 initiatives. 😄

There are things that make me hesitate here though:

  1. Where will we store the cache? Not all instances have object storage. This is why we have external diffs configurable and by default uses the DB. So if we use DB by default, it doesn't make a difference in terms of DB footprint. 😅 Another storage that instances always have would be redis (which has its own constraints), which would mean we need to have our own redis instance for this or use the existing redis cache cluster.
  2. Additional complexity. We need to make sure this service scales for different instances. The team needs to manage another service and have expertise on it. The same concern I had when I looked into separate highlighting service when we were starting Rapid Diffs.

I do think it's an idea that we need to keep in mind. We are already caching now but not for everything which I think is acceptable given the usage. Right now, I still feel that  improving the Gitaly RPC first is what we need to do.

tag:gitlab.com,2026-03-16:5206653094 Patrick Bajao opened issue #593667: Discussion: Future Rapid Diffs Backend architecture at GitLab.org / GitLab 2026-03-16T06:47:00Z patrickbajao Patrick Bajao [email protected] tag:gitlab.com,2026-03-16:5206635795 Patrick Bajao commented on issue #7087 at GitLab.org / Gitaly 2026-03-16T06:42:43Z patrickbajao Patrick Bajao [email protected]

@jamesliu-gitlab we can try to use that since it's already being used in gitlab-rails codebase. One question I have though is why are we not allowing concurrency for Gitaly client? What are possible gotchas that can happen if we do so?

tag:gitlab.com,2026-03-16:5206504111 Patrick Bajao commented on merge request !227425 at GitLab.org / GitLab 2026-03-16T05:48:48Z patrickbajao Patrick Bajao [email protected]

Hi @krasio, this is the MR to document the sequence_name option. Can you please take a look when you have time? 🙏

tag:gitlab.com,2026-03-16:5206499937 Patrick Bajao commented on issue #590307 at GitLab.org / GitLab 2026-03-16T05:45:57Z patrickbajao Patrick Bajao [email protected]

@Saahmed yup, it can take time. As a team member work on groupcode review specific work, they can gain expertise or knowledge around our domain.

I'm not opposed to have a more specific list of things to look at. Like examples for each item I listed above. Would that help? My worry is that it can become outdated as our codebase/feature evolves.

FWIW, my mindset when reviewing is that if I'm not confident I am the best reviewer, I ask another reviewer. This is inline with our generic Code Review guidelines.

In this case of groupcode review domain expert review, if the reviewer being asked isn't really familiar with the change, they should ask a groupcode review engineer who is more familiar. But I don't think this should stop them from asking questions, reviewing, giving their opinions and suggestions. This way, they can learn and collaborate at the same time.

I'm keen to hear/read other engineers thoughts about this. 🙂

tag:gitlab.com,2026-03-16:5206470845 Patrick Bajao pushed to project branch pb-partitioned-by-sequence-name-doc at GitLab.org / GitLab 2026-03-16T05:27:27Z patrickbajao Patrick Bajao [email protected]

Patrick Bajao (4cef5abe) at 16 Mar 05:27

Doc: add info about sequence_name in partition config

tag:gitlab.com,2026-03-16:5206446095 Patrick Bajao opened merge request !227425: Doc: add info about sequence_name in partition config at GitLab.org / GitLab 2026-03-16T05:13:57Z patrickbajao Patrick Bajao [email protected]

What does this MR do?

sequence_name option has been added but it's not documented. This adds info about that option, when to set it and why.

!226992 (comment 3161137470)

Author's checklist

If you are a GitLab team member and only adding documentation, do not add any of the following labels:

  • ~"frontend"
  • ~"backend"
  • ~"type::bug"
  • ~"database"

These labels cause the MR to be added to code verification QA issues.

Reviewer's checklist

Documentation-related MRs should be reviewed by a Technical Writer for a non-blocking review, based on Documentation Guidelines and the Style Guide.

If you aren't sure which tech writer to ask, use roulette or ask in the #docs Slack channel.

  • If the content requires it, ensure the information is reviewed by a subject matter expert.
  • Technical writer review items:
    • Ensure docs metadata is present and up-to-date.
    • Ensure the appropriate labels are added to this MR.
    • Ensure a release milestone is set.
    • If relevant to this MR, ensure content topic type principles are in use, including:
      • The headings should be something you'd do a Google search for. Instead of Default behavior, say something like Default behavior when you close an issue.
      • The headings (other than the page title) should be active. Instead of Configuring GDK, say something like Configure GDK.
      • Any task steps should be written as a numbered list.
      • If the content still needs to be edited for topic types, you can create a follow-up issue with the docs-technical-debt label.
  • Review by assigned maintainer, who can always request/require the reviews above. Maintainer's review can occur before or after a technical writer review.
tag:gitlab.com,2026-03-16:5206443287 Patrick Bajao pushed new project branch pb-partitioned-by-sequence-name-doc at GitLab.org / GitLab 2026-03-16T05:12:13Z patrickbajao Patrick Bajao [email protected]

Patrick Bajao (ea1a28f5) at 16 Mar 05:12

Doc: add info about sequence_name in partition config

tag:gitlab.com,2026-03-16:5206420248 Patrick Bajao commented on merge request !227418 at GitLab.org / GitLab 2026-03-16T05:00:36Z patrickbajao Patrick Bajao [email protected]

@krasio my understanding is that the sequence_name is being used to get the min_id. If it's not, then we return MIN_ID.

In this model's case, sequence_name by default is public.merge_request_commits_metadata_id_seq. So it takes the starting ID of that as the min_id. When this table was created, it starts with 1 so it gets that as min_id.

If it was altered to support cells, then it should still get whatever the min_value of public.merge_request_commits_metadata_id_seq. I'm thinking both min_value of merge_request_commits_metadata_id_seq and projects_id_seq should be the same but this MR makes sure we're getting it from the appropriate column. Unless of course sequences were updated and they both have a different min_value (not sure why would that happen).

For GitLab.com, it won't do anything since the partitions were already created (we created them during creation of this table via DB migration). New partitions will continue from the last partition's upper bound.

For existing self-managed instances, if they already have this table + partitions, then it won't have any effect as well given that partitions already exists.

WDYT? Am I missing anything here?

tag:gitlab.com,2026-03-16:5206356050 Patrick Bajao commented on merge request !226992 at GitLab.org / GitLab 2026-03-16T04:21:34Z patrickbajao Patrick Bajao [email protected]

Thanks @krasio! I replied to your questions. 🏓

tag:gitlab.com,2026-03-16:5206356046 Patrick Bajao commented on merge request !226992 at GitLab.org / GitLab 2026-03-16T04:21:34Z patrickbajao Patrick Bajao [email protected]

This is also related to this discussion: !223703 (comment 3118326876).

We bulk create merge_request_diff_commits records in MergeRequestDiffCommit.create_bulk. If we use FOR EACH ROW, it means we will have N+1 INSERT calls.

tag:gitlab.com,2026-03-16:5206356041 Patrick Bajao commented on merge request !226992 at GitLab.org / GitLab 2026-03-16T04:21:34Z patrickbajao Patrick Bajao [email protected]

This is related to this discussion in original MR: !223703 (comment 3118326876).

We are currently not setting project_id in old merge_request_diff_commits table. It exists but we only set it if merge_request_diff_commits_partition FF is enabled (which is disabled right now). Since this partitioned table is partitioned by project_id and is required, we don't want to insert when project_id is not set.

We also do not want to insert to this table if merge_request_commits_metadata_id isn't set because that means the old record isn't deduplicated.

tag:gitlab.com,2026-03-16:5206356030 Patrick Bajao commented on merge request !226992 at GitLab.org / GitLab 2026-03-16T04:21:33Z patrickbajao Patrick Bajao [email protected]

Yup, we don't update merge_request_diff_commits records.

tag:gitlab.com,2026-03-16:5206326680 Patrick Bajao commented on merge request !227418 at GitLab.org / GitLab 2026-03-16T04:03:56Z patrickbajao Patrick Bajao [email protected]

Hi @egrieff, would you be able to review this tiny MR? 🙏 Made this change based on the suggestion made in !226992 (comment 3161137470). I know you have the busy status but I feel it's better to inform you this way about related changes to MRDC dedup/partition work. But please let me know if I need to find another reviewer. 🙇

Hi @krasio, this is related to the suggestion you made in another MR. This is another table wherein we partition by project_id but it's not a sequence in the table. We should also use projects_id_seq as sequence name in this case. Can you please review this as well? 🙏

tag:gitlab.com,2026-03-16:5206323126 Patrick Bajao commented on merge request !227418 at GitLab.org / GitLab 2026-03-16T04:02:38Z patrickbajao Patrick Bajao [email protected]

@GitLabDuo This table is being partitioned by project_id so we need to check the sequence for that which is projects_id_seq.

tag:gitlab.com,2026-03-16:5206315159 Patrick Bajao opened merge request !227418: Set sequence_name for commits metadata partition config at GitLab.org / GitLab 2026-03-16T03:58:55Z patrickbajao Patrick Bajao [email protected]

What does this MR do and why?

The merge_request_commits_metadata table is being partitioned by project_id but it's not a sequence in the table. We need to set the sequence_name to support cell-specific ID ranges.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

tag:gitlab.com,2026-03-16:5206312350 Patrick Bajao pushed new project branch pb-merge-request-commits-metadata-projects-id-seq at GitLab.org / GitLab 2026-03-16T03:57:25Z patrickbajao Patrick Bajao [email protected]

Patrick Bajao (cc74a13d) at 16 Mar 03:57

Set sequence_name for commits metadata partition config

tag:gitlab.com,2026-03-16:5206297551 Patrick Bajao commented on merge request !226992 at GitLab.org / GitLab 2026-03-16T03:46:46Z patrickbajao Patrick Bajao [email protected]

Thanks @krasio! I made the change in !226992 (0bbaf218). I also had to change milestones to 18.11 since we're now in 18.11. As of this writing, pipeline is still running but DB pipeline testing comment has been updated: !226992 (comment 3153019297). Can you please take another look? 🙏

@egrieff @tskorupa-gl since I pushed new changes, your approvals were reset again. Can you please re-review.