[release/v7.4] Split TPN manifest and Component Governance manifest (#26891)#26955
Conversation
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Pull request overview
This PR backports PR #26891 to the release/v7.4 branch, splitting the single tools/cgmanifest.json into two separate manifests: tools/cgmanifest/main/cgmanifest.json (for Component Governance / component detection) and tools/cgmanifest/tpn/cgmanifest.json (for Third Party Notices generation). It also adds new ClearlyDefined tooling including cache persistence, a package version search function, and a script to find the last harvested version of a NuGet package.
Changes:
- New manifests:
tools/cgmanifest/main/cgmanifest.jsonandtools/cgmanifest/tpn/cgmanifest.jsonreplace the oldtools/cgmanifest.json - All path references to
cgmanifest.jsonupdated in scripts and CI/pipeline YAML files to use the new split-manifest locations - New ClearlyDefined module functions for cache persistence (
Save-/Import-ClearlyDefinedCache, etc.), package version search (Search-ClearlyDefined,Get-ClearlyDefinedPackageVersions), and enhanced retry/error handling; newFind-LastHarvestedVersion.ps1script
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
tools/cgmanifest/main/cgmanifest.json |
New CG manifest (moved from tools/cgmanifest.json) with v7.4 package versions |
tools/cgmanifest/tpn/cgmanifest.json |
New TPN manifest with harvested-only packages filtered for notice generation |
tools/packaging/packaging.psm1 |
Path updated to tools/cgmanifest/main/cgmanifest.json |
tools/findMissingNotices.ps1 |
New -ForceHarvestedOnly parameter; new TPN manifest update logic |
tools/clearlyDefined/src/ClearlyDefined/ClearlyDefined.psm1 |
New cache persistence functions, Search-ClearlyDefined, Get-ClearlyDefinedPackageVersions, retry/error handling improvements |
tools/clearlyDefined/Find-LastHarvestedVersion.ps1 |
New script to find the last harvested version of a NuGet package |
tools/clearlyDefined/ClearlyDefined.ps1 |
Updated cgmanifest path |
.vsts-ci/windows-arm64.yml |
PR path filter updated to tools/cgmanifest/* |
.vsts-ci/psresourceget-acr.yml |
PR path filter updated to tools/cgmanifest/* |
.vsts-ci/mac.yml |
PR path filter updated to tools/cgmanifest/* |
.vsts-ci/linux-internal.yml |
PR path filter updated to tools/cgmanifest/* |
.pipelines/templates/compliance/generateNotice.yml |
Component Detection now scans only tools\cgmanifest\tpn |
| if ($retryIntervalSec -lt 300) { | ||
| $retryIntervalSec++ | ||
| } |
There was a problem hiding this comment.
The $retryIntervalSec variable is declared at module scope and incremented inside the catch blocks of both Search-ClearlyDefined (line 215) and Get-ClearlyDefinedData (line 357). Because this is a module-level variable, every failure in any call permanently increases the retry interval for all subsequent API calls within the same PowerShell session. For example, after processing hundreds of packages, a single transient failure will raise the retry interval from 90 seconds to 91, 92, etc., progressively slowing all future requests for the lifetime of the module.
This mutation is also redundant: Invoke-RestMethod already handles retries internally with -MaximumRetryCount $maxRetryCount -RetryIntervalSec $retryIntervalSec. By the time the catch block is reached, all retries have been exhausted. The intent seems to be to implement progressive backoff across distinct calls, but this approach uses a shared module-level variable rather than a local variable, and does not reset to the baseline after successes.
Consider either: (a) using a local variable for per-call backoff, or (b) using a module-level variable that resets on success, or (c) removing the manual increment and relying solely on the built-in -MaximumRetryCount/-RetryIntervalSec retry mechanism of Invoke-RestMethod.
| if ($retryIntervalSec -lt 300) { | ||
| $retryIntervalSec++ | ||
| } |
There was a problem hiding this comment.
Same issue as in Search-ClearlyDefined: the module-level $retryIntervalSec variable is incremented inside the catch block (line 357), permanently raising the retry interval for all subsequent calls in the same session. After a transient failure, all future API calls will use a longer retry interval, even when those calls succeed immediately. This is also redundant with the built-in retry handling of Invoke-RestMethod.
| Start-job -ScriptBlock { | ||
| Invoke-WebRequest -Method Post -Uri 'https://api.clearlydefined.io/harvest' -Body $using:body -ContentType 'application/json' -MaximumRetryCount $using:maxRetryCount -RetryIntervalSec $using:retryIntervalSec | ||
| } |
There was a problem hiding this comment.
Start-ClearlyDefinedHarvest now wraps Invoke-WebRequest in Start-Job, which creates a background job and emits the job object to the output pipeline. The callers in ClearlyDefined.ps1 (line 42) and findMissingNotices.ps1 (line 478) discard this output by not capturing or piping it further.
Because the job objects are never assigned, waited on, or cleaned up, they accumulate as orphaned background jobs in the PowerShell session with no way to check their success or failure. If the HTTP request fails (after all retries), the error is silently lost. Additionally, these jobs bypass the error handling that was previously provided by (Invoke-WebRequest ...).Content.
If the intent is fire-and-forget async harvesting, the callers should at minimum save and later clean up the jobs (e.g., with Remove-Job), or the function should document this fire-and-forget pattern explicitly. If the caller needs to know whether harvesting succeeded, Wait-Job and Receive-Job would be needed.
| # Build a lookup table of harvest status by package name + version | ||
| $harvestStatus = @{} | ||
| foreach ($item in $fullList) { | ||
| $key = "$($item.Name)|$($item.PackageVersion)" | ||
| $harvestStatus[$key] = $item.harvested |
There was a problem hiding this comment.
The $harvestStatus lookup table is built using $item.Name and $item.PackageVersion (line 377) where $item is an object returned by Get-ClearlyDefinedData. However, Get-ClearlyDefinedData outputs the raw ClearlyDefined REST API response objects (with added cachedTime, harvested, and harvestedResult properties). The ClearlyDefined API response does not include top-level Name or PackageVersion properties — the package name is in coordinates.name and the version is in coordinates.revision.
As a result, $item.Name and $item.PackageVersion will both be $null for every non-cached entry, and $harvestStatus will contain only the key "|" for each package (all entries sharing the same key, with the last overwriting previous ones). The lookup at line 400 ($harvestStatus.ContainsKey($key)) will never match a valid package key, so every package will fall through to the Find-LastHarvestedVersion path, making the harvest status check in lines 372-378 completely ineffective.
The same issue affects the second TPN update block at lines 515-517 where $item.Name and $item.PackageVersion are used to find matching new registrations.
| foreach ($item in $finalHarvestData) { | ||
| $matchingNewRegistration = $newRegistrations | Where-Object { | ||
| $_.Component.Nuget.Name -eq $item.Name -and | ||
| $_.Component.Nuget.Version -eq $item.PackageVersion |
There was a problem hiding this comment.
Same issue as lines 374-378: $item.Name and $item.PackageVersion are accessed on the ClearlyDefined API response objects, but those top-level properties don't exist in the response. The package name is in $item.coordinates.name and the version is in $item.coordinates.revision. As a result, $matchingNewRegistration will always be $null (no package will match), and the entire TPN manifest update logic in this block will produce an empty $tpnRegistrations list, causing the TPN manifest not to be saved.
…owerShell#26891) (PowerShell#26955) Co-authored-by: Travis Plunk <[email protected]> Co-authored-by: Copilot <[email protected]>
Backport of #26891 to release/v7.4
Triggered by @jshigetomi
Original CL Label: CL-BuildPackaging
/cc @PowerShell/powershell-maintainers
Impact
REQUIRED: Choose either Tooling Impact or Customer Impact (or both). At least one checkbox must be selected.
Tooling Impact
Customer Impact
Regression
REQUIRED: Check exactly one box.
This is not a regression.
Testing
Risk
REQUIRED: Check exactly one box.