@manuelgrabowski @fabiopitino @rutshah @dhershkovitch I wanted to raise this as an additional performance-oriented motivation for prioritizing the duplicate pipeline problem; solving it wouldn't just fix the functional issues, but would also eliminate a meaningful amount of wasted infrastructure work on pipelines we end up discarding.
Beyond the broad functional impact we already know about, I want to highlight another perspective on this problem: performance (resource waste). Duplicate pipelines increase compute and infrastructure costs, so there's a significant cost we're paying unnecessarily.
Looking at the pipeline creation chain in Ci::CreatePipelineService, the SEQUENCE processes steps in this order (simplified):
1. Build
2. Validate::Abilities
3. Validate::Repository
4. Build::Associations
5. Limit::RateLimit
6. Validate::SecurityOrchestrationPolicy
7. AssignPartition
8. PipelineExecutionPolicies::EvaluatePolicies
9. Skip
10. Validate::Config
11. Config::Content ← Fetches CI config content
12. Config::Process ← THE EXPENSIVE STEP: parses YAML, resolves includes, expands variables, creates jobs
13. StopLinting
14. Validate::AfterConfig
15. RemoveUnwantedChatJobs
16. SeedBlock
17. EvaluateWorkflowRules ← DUPLICATE PIPELINE DETECTION HAPPENS HERE (too late!)
18. Seed
The key problem: EvaluateWorkflowRules runs after Config::Process. Config::Process is an expensive step in the chain; it parses the full YAML, fetches all include: files (potentially many Gitaly/network calls to other projects, remote URLs, components), expands variables, resolves extends:, etc. We have a 30-second timeout on config processing, and recently we've been hitting it more frequently due to infra issues.
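A minimal sketch of the chain shape (simplified, invented class bodies; not the actual Gitlab::Ci::Pipeline::Chain code) showing why ordering matters: each step's perform! runs before its break? is consulted, so a pipeline rejected in EvaluateWorkflowRules has already paid for Config::Process:

```ruby
# Simplified model: the chain runs each step's perform!, then stops at the
# first step whose break? returns true.
ConfigProcess = Struct.new(:pipeline) do
  def perform!
    # Stand-in for the expensive work: YAML parsing, include: fetching
    # (Gitaly/network calls), variable expansion, extends: resolution.
    pipeline[:expensive_work_done] = true
  end

  def break?
    false
  end
end

EvaluateWorkflowRules = Struct.new(:pipeline) do
  def perform!
    # workflow:rules rejects the duplicate pipeline here.
    pipeline[:dropped] = true
  end

  def break?
    pipeline[:dropped]
  end
end

pipeline = { expensive_work_done: false, dropped: false }

[ConfigProcess, EvaluateWorkflowRules].each do |step_class|
  step = step_class.new(pipeline)
  step.perform!
  break if step.break?
end

# The pipeline is dropped, yet the expensive step already ran.
puts "dropped=#{pipeline[:dropped]} expensive_work_done=#{pipeline[:expensive_work_done]}"
```

If the rules evaluation (or a cheap pre-check of workflow:rules) ran before Config::Process in this sequence, the chain would break before the expensive work is done.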
This means that for every duplicate pipeline that will ultimately be rejected by workflow:rules, we are paying the full cost of Config::Process (parsing the YAML, fetching every include:, expanding variables) only to discard the result. Solving this would free up those resources.
Furkan Ayhan (0df72fd2) at 17 Mar 10:06
Furkan Ayhan (9c98287d) at 17 Mar 10:05
Merge branch 'revert-4b9ecfc2' into 'master'
Slack message (Internal) about possible incidents in customer pipelines because of this change.
We are reverting this and allowing job names longer than 255 characters.
Furkan Ayhan (18e32374) at 17 Mar 09:26
Fix undercoverage
From https://docs.gitlab.com/development/database/query_recorder/#recommended-pattern:
"warm-up: Handles one-time initialization queries that do not repeat on subsequent requests, such as schema loading."
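The warm-up point can be illustrated with a small self-contained sketch (FakeApp is invented for illustration; it is not QueryRecorder): the very first request triggers a one-time schema query, so counting queries without a warm-up request would inflate the control count:

```ruby
# Toy app where the first request performs a one-time schema load,
# mimicking initialization queries that don't repeat on later requests.
class FakeApp
  attr_reader :queries

  def initialize
    @schema_loaded = false
    @queries = []
  end

  def request
    unless @schema_loaded
      @queries << "SELECT * FROM information_schema.tables" # one-time init
      @schema_loaded = true
    end
    @queries << "SELECT * FROM projects"
  end
end

app = FakeApp.new
app.request                    # warm-up: absorbs the one-time schema query
before = app.queries.size
app.request                    # the request we actually measure
puts app.queries.size - before # only the repeatable query is counted
```

Without the warm-up call, the measured request would appear to issue two queries instead of one, skewing any N+1 comparison against the control.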
do you mean whether component_name is present or not? I haven't thought about it much, but this logic was already here before this MR and we did not see a problem with it :)
@lauraXD Yes! That's because our goal here is to detect whether there is any call to this method with string keys. If we see even a single entry, we'll disable the FF and fix the call so that we no longer pass a hash with string keys.
Furkan Ayhan (69a55785) at 16 Mar 16:39
Optimize CI component fetching with request-scoped caching and batc...
@skarbek Thank you, I understand the point.
My initial assumption was that we should always wait for the pipeline/job to finish before checking the output; that's the typical pattern we (or I) follow in most QA specs.
Are you suggesting that we should check the job trace without waiting for the job to complete? In other words, instead of using wait_until for the pipeline to finish, we would wait_until all the expect(trace) conditions are met?
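To make sure I understand the suggestion, here is a rough sketch of that polling approach (wait_until, fetch_trace, and the expected lines are illustrative stand-ins, not the real QA helpers): poll the trace until every expected line appears, or give up after a timeout:

```ruby
require "timeout"

# Hypothetical trace source that only becomes complete after a few polls,
# standing in for a job that is still writing its log.
poll_count = 0
fetch_trace = lambda do
  poll_count += 1
  poll_count >= 3 ? "step one done\nstep two done" : "step one done"
end

# Poll the block until it returns truthy; raise if max_duration elapses first.
def wait_until(max_duration: 5, sleep_interval: 0.05)
  Timeout.timeout(max_duration) do
    loop do
      return true if yield
      sleep sleep_interval
    end
  end
end

expected_lines = ["step one done", "step two done"]
wait_until do
  trace = fetch_trace.call
  expected_lines.all? { |line| trace.include?(line) }
end
puts "all expected lines appeared after #{poll_count} polls"
```

The upside is that the spec passes as soon as the output is there, instead of waiting for job completion; the downside is that a missing line only surfaces as a timeout rather than a direct assertion failure.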
@mmichaux-ext this is the only one left
thanks!
yes, please
@manuelgrabowski I don't think we can do anything about it. When our customers suffer from incidents, our tests do too.
@treagitlab Maybe, we should not quarantine these kinds of tests. WDYT?