A comprehensive tool for validating Snyk SCA (Software Composition Analysis) files against their actual repository sources. This validator uses an efficient join-based approach to match GitLab repositories with Snyk targets, ensuring that Snyk projects reference files that actually exist in the source repositories.
The validator uses a batch join approach for efficient validation:
- Builds a GitLab Repository Catalog: Lists all accessible GitLab repositories with their metadata (default branch, project path, etc.). By default, fetches all accessible repos; use `--gitlab-membership-only` to restrict to membership repos, or `--matched-repos-only` for an optimized mode that only fetches repos in Snyk targets.
- Collects Snyk Targets: Gathers all Snyk targets from specified organizations and normalizes their repository URLs
- Joins the Datasets: Matches GitLab repositories with Snyk targets using canonical repository keys
- Validates and Reports:
  - Validates that Snyk-tracked files exist in GitLab repositories
  - Identifies Snyk-supported files that aren't being tracked by Snyk
  - Reports on stale Snyk targets (repositories no longer in GitLab)
  - Reports on GitLab repositories with no Snyk coverage
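
The join step above can be pictured as a set comparison over canonical repository keys. The following is a minimal sketch, not the validator's actual code: `JoinResult`, `join_catalogs`, and the key format `host/owner/repo` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JoinResult:
    matched: list = field(default_factory=list)      # keys present on both sides
    snyk_only: list = field(default_factory=list)    # stale Snyk targets
    gitlab_only: list = field(default_factory=list)  # repos with no Snyk coverage


def join_catalogs(gitlab_repos, snyk_targets):
    """Join two dicts keyed by a canonical repository key (assumed format)."""
    gitlab_keys = set(gitlab_repos)
    snyk_keys = set(snyk_targets)
    return JoinResult(
        matched=sorted(gitlab_keys & snyk_keys),
        snyk_only=sorted(snyk_keys - gitlab_keys),
        gitlab_only=sorted(gitlab_keys - snyk_keys),
    )


# Hypothetical sample data: values would normally hold repo or target metadata.
gitlab_repos = {
    "gitlab.com/acme/api": {"default_branch": "main"},
    "gitlab.com/acme/web": {"default_branch": "main"},
}
snyk_targets = {
    "gitlab.com/acme/api": {"target_id": "t-1"},
    "gitlab.com/acme/legacy": {"target_id": "t-2"},
}

result = join_catalogs(gitlab_repos, snyk_targets)
print(result)
```

Building both sides once and joining in memory avoids one GitLab lookup per Snyk target, which is presumably where the efficiency of the batch approach comes from.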
- Comprehensive Coverage Analysis: Identifies matched repos, stale Snyk targets, and untracked GitLab repositories
- Flexible GitLab Fetching:
  - Default: Fetches all accessible GitLab repos (complete visibility)
  - `--gitlab-membership-only`: Restricts to repos where the token is a member
  - `--matched-repos-only`: Optimized mode for large GitLab instances (only fetches repos in Snyk targets)
- Detailed Reporting: Generates comprehensive reports showing:
  - Matched repositories with file validation results
  - Stale Snyk targets (repositories no longer in GitLab)
  - GitLab repositories with no Snyk coverage
  - Snyk-supported files not being tracked
  - Duplicate projects with KEEP/REMOVE recommendations
- CSV Export: Generate filterable CSV reports for duplicate projects
- Flexible Configuration: Supports different Snyk regions and GitLab instances
- Debug Mode: Optional detailed logging for troubleshooting
- Standard GitLab.com repositories
- Custom GitLab instances
- Local file paths (`file://` and absolute paths)
- SSH URLs (`git@host:owner/repo.git`)
- Clone the repository:
  `git clone https://github.com/tsrobsworld/snyk-sca-validator`
  `cd snyk-sca-validator`
- Install dependencies:
  `pip install -r requirements.txt`

Validate all organizations:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN`

Validate a specific organization:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --org-id ORG_ID`

Validate all organizations in a group:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --group-id GROUP_ID`

With GitLab token for private repositories:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --gitlab-token YOUR_GITLAB_TOKEN`

Generate CSV report for duplicate projects:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --org-id ORG_ID --duplicates-csv duplicates.csv`

Optimized mode (only fetch repos in Snyk targets):
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --org-id ORG_ID --gitlab-token YOUR_GITLAB_TOKEN --matched-repos-only`

Restrict to membership repos only:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --org-id ORG_ID --gitlab-token YOUR_GITLAB_TOKEN --gitlab-membership-only`

Custom GitLab instance:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --gitlab-url https://gitlab.company.com`

Different Snyk region:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --snyk-region SNYK-EU-01`

Custom output report:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --output-report my_report.txt`

Debug logging for troubleshooting:
`python3 snyk_sca_validator.py --snyk-token YOUR_SNYK_TOKEN --debug`

| Option | Description | Required | Default |
|---|---|---|---|
| `--snyk-token` | Snyk API token | Yes | - |
| `--org-id` | Specific Snyk organization ID (mutually exclusive with `--group-id`) | No | All organizations |
| `--group-id` | Snyk group ID to process all organizations in group (mutually exclusive with `--org-id`) | No | - |
| `--snyk-region` | Snyk API region | No | SNYK-US-01 |
| `--gitlab-token` | GitLab API token for private repos | No | - |
| `--gitlab-url` | GitLab instance URL | No | https://gitlab.com |
| `--gitlab-membership-only` | Only fetch GitLab repos where token is a member (default: fetch all accessible repos) | No | False |
| `--matched-repos-only` | Optimized mode: Only fetch GitLab repos that are in Snyk targets. Requires `--gitlab-token`. Assumes all Snyk target URLs point to GitLab. | No | False |
| `--output-report` | Custom report filename | No | batch_report.txt |
| `--duplicates-csv` | Generate CSV file with duplicate projects (KEEP and REMOVE) | No | - |
| `--timeout` | HTTP request timeout in seconds | No | 60 |
| `--max-retries` | Maximum retry attempts for failed requests | No | 3 |
| `--no-ssl-verify` | Disable SSL certificate verification for GitLab API calls | No | False |
| `--skip-org-validation` | Skip Snyk org access validation and fetch targets directly | No | False |
| `--debug` | Enable debug logging for troubleshooting | No | False |
- `SNYK-US-01` (default): https://api.snyk.io
- `SNYK-US-02`: https://api.us.snyk.io
- `SNYK-EU-01`: https://api.eu.snyk.io
- `SNYK-AU-01`: https://api.au.snyk.io
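
As an illustration of how a `--snyk-region` value could be resolved to an API base URL, here is a small sketch. The region names and URLs come from the list above; the constant `SNYK_REGION_URLS` and the function `resolve_snyk_api_url` are hypothetical and may not match the script's internals.

```python
# Region-to-URL mapping taken from the documented list of supported regions.
SNYK_REGION_URLS = {
    "SNYK-US-01": "https://api.snyk.io",
    "SNYK-US-02": "https://api.us.snyk.io",
    "SNYK-EU-01": "https://api.eu.snyk.io",
    "SNYK-AU-01": "https://api.au.snyk.io",
}


def resolve_snyk_api_url(region="SNYK-US-01"):
    """Return the API base URL for a region name (case-insensitive)."""
    try:
        return SNYK_REGION_URLS[region.upper()]
    except KeyError:
        known = ", ".join(sorted(SNYK_REGION_URLS))
        raise ValueError(f"Unknown Snyk region {region!r}; expected one of: {known}") from None


print(resolve_snyk_api_url("SNYK-EU-01"))
```

Failing fast on an unknown region avoids sending a token to an unintended host.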
The validator generates a comprehensive text report (batch_report.txt by default) containing:
Summary Section:
- Total number of matched repositories
- Number of stale Snyk targets (repositories no longer in GitLab)
- Number of GitLab repositories with no Snyk coverage
Snyk-Only (Stale Targets) Section:
- Lists repositories that have Snyk targets but are no longer accessible in GitLab
- Useful for cleaning up old Snyk projects
GitLab-Only (No Snyk Targets) Section:
- Lists GitLab repositories that have no Snyk coverage
- Useful for identifying repositories that should be imported into Snyk
Matched Repositories Section:
- For each matched repository:
  - Number of files tracked by Snyk
  - Number of Snyk-supported files found in the repository
  - List of supported files not being tracked by Snyk (potential missing projects)
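
The per-repository comparison can be sketched as two set differences. This is an assumption-laden illustration: `SUPPORTED_MANIFESTS` is a small, hypothetical subset of manifest names rather than the validator's real list, and `compare_files` is an invented helper.

```python
import posixpath

# Illustrative subset only; the real list of Snyk-supported files is longer.
SUPPORTED_MANIFESTS = {"package.json", "pom.xml", "requirements.txt", "build.gradle", "go.mod"}


def compare_files(repo_files, snyk_tracked):
    """Compare file paths found in a repo with the paths Snyk tracks for it."""
    repo = set(repo_files)
    tracked = set(snyk_tracked)
    supported_in_repo = {
        path for path in repo if posixpath.basename(path) in SUPPORTED_MANIFESTS
    }
    return {
        # Tracked by Snyk but absent from the repository.
        "missing_in_repo": sorted(tracked - repo),
        # Supported manifests in the repository that Snyk does not track.
        "untracked_supported": sorted(supported_in_repo - tracked),
    }


report = compare_files(
    repo_files=["package.json", "services/api/pom.xml", "README.md"],
    snyk_tracked=["package.json", "legacy/requirements.txt"],
)
print(report)
```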
Duplicate Projects Section:
- Lists duplicate Snyk projects detected within the same target
- Shows which project to KEEP (newest) and which to REMOVE (stale duplicates)
- For Maven projects, includes artifactId validation:
  - Expected artifactId (from project name suffix after ':')
  - Found artifactId (from pom.xml in repository)
  - Match status (MATCH/MISMATCH)
  - Discovered pom.xml paths and their artifactIds
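
A rough sketch of the duplicate handling described above, under stated assumptions: the expected artifactId is taken as the text after the last ':' in the project name, the found artifactId is the `<artifactId>` that is a direct child of `<project>`, and projects are plain dicts with hypothetical `id` and `created` keys. None of these helper names come from the validator itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def expected_artifact_id(project_name):
    """Return the text after the last ':' in a project name, or None."""
    _, sep, suffix = project_name.rpartition(":")
    suffix = suffix.strip()
    return suffix if sep and suffix else None


def found_artifact_id(pom_xml):
    """Return the project's own <artifactId>, ignoring the one inside <parent>."""
    root = ET.fromstring(pom_xml)
    for child in root:
        # Strip any XML namespace prefix such as '{http://maven.apache.org/POM/4.0.0}'.
        if child.tag.rsplit("}", 1)[-1] == "artifactId":
            return (child.text or "").strip() or None
    return None


def match_status(project_name, pom_xml):
    expected = expected_artifact_id(project_name)
    found = found_artifact_id(pom_xml)
    return "MATCH" if expected is not None and expected == found else "MISMATCH"


def label_duplicates(projects):
    """Mark the most recently created project KEEP and all others REMOVE."""
    def created(project):
        return datetime.fromisoformat(project["created"].replace("Z", "+00:00"))

    ordered = sorted(projects, key=created, reverse=True)
    return [(p["id"], "KEEP" if i == 0 else "REMOVE") for i, p in enumerate(ordered)]


POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent><artifactId>parent-pom</artifactId></parent>
  <artifactId>billing-service</artifactId>
</project>"""

print(match_status("acme/billing:billing-service", POM))
print(label_duplicates([
    {"id": "p-old", "created": "2023-01-10T08:00:00Z"},
    {"id": "p-new", "created": "2024-06-02T12:30:00Z"},
]))
```

Reading only direct children of `<project>` matters because a pom.xml with a `<parent>` block contains a second `<artifactId>` that would otherwise be picked up first.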
When using the --duplicates-csv flag, a CSV file is generated with all duplicate projects in a filterable format. The CSV includes:
- Action: KEEP or REMOVE
- Unique Identifier: The part of the project name after ':'
- Project Name: Full Snyk project name
- Project ID: Snyk project UUID
- Type: Project type (maven, npm, etc.)
- Created Date: When the project was created
- Org ID: Snyk organization ID
- Project URL: Direct link to the Snyk project
- Expected ArtifactId: For Maven projects, the expected artifactId
- Found ArtifactId: The actual artifactId found in pom.xml
- ArtifactId Match Status: MATCH or MISMATCH
- Reason: Why the project should be kept or removed
This CSV format makes it easy to filter and sort duplicate projects for review and cleanup.
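
For example, the rows marked REMOVE could be pulled out with the standard `csv` module. This sketch assumes the CSV header names match the field names listed above exactly (`Action`, `Project ID`, and so on), which has not been verified against real output; `projects_to_remove` is a hypothetical helper.

```python
import csv
import io


def projects_to_remove(csv_file):
    """Return (Project ID, Reason) pairs for rows whose Action is REMOVE."""
    reader = csv.DictReader(csv_file)
    return [
        (row["Project ID"], row.get("Reason", ""))
        for row in reader
        if row.get("Action", "").strip().upper() == "REMOVE"
    ]


# Hypothetical sample standing in for a file written via --duplicates-csv.
sample = io.StringIO(
    "Action,Project Name,Project ID,Reason\n"
    "KEEP,acme/billing:billing-service,p-new,Newest project\n"
    "REMOVE,acme/billing:billing-service,p-old,Stale duplicate\n"
)

removals = projects_to_remove(sample)
print(removals)
```

With a real report, the same function could be called on `open("duplicates.csv", newline="")` instead of the in-memory sample.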
- `https://gitlab.com/owner/repo`
- `https://gitlab.com/owner/repo/tree/branch`
- `https://gitlab.com/owner/repo/-/tree/branch`
- `https://gitlab.com/owner/repo/-/blob/branch/file`
- Custom instances: `https://gitlab.company.com/owner/repo`
- Local paths: `file:///path/to/repo` or `/path/to/repo`
- SSH URLs: `git@gitlab.com:owner/repo.git`
- GitHub: `https://github.com/owner/repo`
- Bitbucket: `https://bitbucket.org/owner/repo`
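
To show how these URL shapes might collapse into one canonical key, here is a minimal sketch. The function `canonical_repo_key`, the `host/owner/repo` key format, and the choice to lowercase the key are assumptions for illustration; local paths are deliberately left unhandled and return `None`.

```python
import re
from urllib.parse import urlparse

# Matches scp-style SSH URLs such as git@host:owner/repo.git
_SSH_RE = re.compile(r"^(?:ssh://)?git@(?P<host>[^:/]+)[:/](?P<path>.+)$")


def canonical_repo_key(url):
    """Normalize an HTTPS or SSH repository URL to 'host/owner/repo', else None."""
    url = url.strip()
    ssh = _SSH_RE.match(url)
    if ssh:
        host, path = ssh.group("host"), ssh.group("path")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return None  # file:// URLs and bare local paths are not handled here
        host, path = parsed.hostname, parsed.path
    # Drop UI suffixes such as /tree/<branch>, /-/tree/<branch>, /-/blob/<branch>/<file>.
    path = re.split(r"/(?:-/)?(?:tree|blob)/", path, maxsplit=1)[0]
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if not path:
        return None
    return f"{host}/{path}".lower()


for candidate in (
    "https://gitlab.com/owner/repo/-/blob/branch/file",
    "git@gitlab.com:owner/repo.git",
    "file:///path/to/repo",
):
    print(candidate, "->", canonical_repo_key(candidate))
```

One known limitation of this sketch: a group or project literally named `tree` or `blob` would be truncated by the suffix-stripping step, so a production parser would need a stricter rule.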
The --debug flag enables comprehensive debug logging to help troubleshoot repository mapping issues. When enabled, the script will log:
- URL Parsing: Detailed information about how repository URLs are parsed and which patterns match/fail
- Repository Mapping: Shows the mapping from Snyk target URLs to parsed repository information
- API Calls: Complete request/response details for all API calls to Snyk, GitLab, GitHub, and Bitbucket
- File Validation: Shows which repository each file validation is being performed against
- Missing File Detection: Details about what files are found in repos vs. what Snyk is tracking
Identify Snyk projects that reference files no longer present in repositories, allowing you to clean up stale projects.
After migrating repositories or changing file structures, validate that Snyk projects still reference the correct files.
Ensure that all SCA files tracked by Snyk are actually present in the source repositories for compliance purposes.
Identify discrepancies between Snyk project configurations and actual repository contents.
- File Content Validation: Currently only checks file existence, not content differences
- Branch Detection: Uses default branch detection logic; may need adjustment for specific workflows
- Private Repository Access: Requires appropriate API tokens for private repositories
- Rate Limiting: Subject to API rate limits of the respective platforms
- Python 3.7+
- Snyk API token
- GitLab API token (for private GitLab repositories)
- Internet access for API calls
- `requests`: HTTP library for API calls
- `argparse`: Command-line argument parsing
- `csv`: CSV file handling
- `json`: JSON data processing
- `re`: Regular expressions for URL parsing
- `os`: Operating system interface
- `datetime`: Date and time handling
- Added duplicate project detection based on name patterns
- Implemented Maven artifactId validation for duplicate projects
- Added CSV export for duplicate projects (`--duplicates-csv` flag)
- Added support for processing all organizations in a group (`--group-id`)
- Added `--skip-org-validation` flag for organizations with validation endpoint issues
- Improved GitLab API pagination for nested file discovery
- Enhanced pom.xml discovery with recursive repository scanning
- Added support for CLI projects
- Enhanced URL parsing for multiple platforms
- Improved API efficiency with source filtering
- Added support for GitHub and Bitbucket repositories
- Better error handling and logging
- Initial release with GitLab integration support
- Basic file validation functionality
- CSV and text report generation
This repository is closed to public contributions.
This project is maintained by Snyk and is not accepting external contributions at this time.
If you are a Snyk employee contributing to this project:
All contributors must have a valid Contributor License Agreement (CLA) on file with Snyk. This protects both the contributor and Snyk's intellectual property.
- Fork the repo and create your branch from `main`.
- If you've added code that should be tested, add tests.
- If you've changed APIs, update the documentation.
- Ensure the test suite passes.
- Make sure your code lints.
- Issue that pull request!
- Follow the existing code style
- Add appropriate tests for new functionality
- Update documentation for any API changes
- Ensure all tests pass before submitting
For internal issue reporting and tracking, please use the project's issue tracker.
For security issues, please follow the process outlined in SECURITY.md.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This tool is not officially affiliated with Snyk. It's a community-driven utility for validating Snyk SCA file tracking. Use at your own discretion and ensure you comply with Snyk's terms of service and API usage policies.