fix: preserve protocol (e.g. ssh:// or https://) and port in dependency URL#665
Conversation
Fixes microsoft#661. When a user specifies an explicit ssh:// URL in apm.yml (e.g. ssh://[email protected]:7999/project/repo.git), APM was silently stripping the port during ssh:// → git@ normalisation and then falling back to https:// after the portless SSH clone attempt failed. Store the original ssh:// string in DependencyReference.original_ssh_url before normalisation, and pass it verbatim to git clone in _clone_with_fallback so the port and protocol are preserved.
sergio-sisternes-epam
left a comment
There was a problem hiding this comment.
Clean, focused fix — the "preserve original URL, don't fix normalization" approach is the right call. Minimal risk, no side effects on existing URL handling. The PR description is excellent with clear root cause analysis and before/after examples.
The test coverage is thorough: custom port, standard ssh, https exclusion, git@ exclusion, and clone URL ordering all covered.
One request before merge: please add a CHANGELOG entry under ## [Unreleased]:
### Fixed
- Preserve `ssh://` dependency URLs with custom ports for Bitbucket Datacenter repositories instead of silently falling back to HTTPS (#661)
Thanks for another solid contribution!
There was a problem hiding this comment.
Pull request overview
This PR fixes cloning failures for self-hosted Git servers (notably Bitbucket Datacenter) when dependencies are specified with an explicit ssh://...:PORT/... URL by preserving the original SSH URL and using it for the SSH clone attempt.
Changes:
- Add
DependencyReference.original_ssh_urland populate it when parsingssh://dependency strings. - Update
_clone_with_fallbackto preferoriginal_ssh_urlfor the SSH clone method. - Add regression tests covering parsing and clone URL selection for
ssh://URLs with custom ports.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/models/dependency/reference.py |
Adds original_ssh_url and captures it during parsing for ssh:// inputs. |
src/apm_cli/deps/github_downloader.py |
Uses original_ssh_url directly for the SSH clone attempt to preserve custom ports. |
tests/unit/test_generic_git_urls.py |
Adds parsing regression tests for Bitbucket Datacenter-style ssh:// URLs with ports. |
tests/unit/test_auth_scoping.py |
Adds downloader regression tests to assert the SSH clone attempt uses the original ssh:// URL verbatim. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 3
Raw dependency strings can include '#ref' or '@alias' suffixes which are not valid git clone URL syntax. Strip these from original_ssh_url at capture time so the port-preserving clone path does not silently fail and re-trigger the https fallback. Also narrows the over-broad except clause in regression tests from (RuntimeError, Exception) to (RuntimeError, GitCommandError), and adds the CHANGELOG entry requested in review.
|
@edenfunf looks like we got a conflict now. We will need that solved before we can merge the PR. Can you review this please? |
|
@sergio-sisternes-epam Conflict has been resolved — merged the latest main into the branch. Ready for review. |
a5917bf to
c75b2e6
Compare
|
@edenfunf the lockfile needs to track provenance URL with the port too in this case. We need to test this also works. |
When a dependency is specified as ssh://host:port/repo, the lockfile was storing only host and repo_url, silently dropping the custom port. Add original_ssh_url to LockedDependency so the verbatim ssh:// URL (including any non-standard port) is written to apm.lock.yaml. from_dependency_ref copies it from DependencyReference; to_dict/from_dict round-trip it correctly. Add three regression tests to TestBitbucketDatacenterSSH verifying that the lockfile captures the port, that the field survives a to_dict/from_dict round-trip, and that HTTPS deps do not set the field.
|
@danielmeppiel additional comments from my side:
The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:
The only trade-off is slightly noisier lockfile diffs/conflicts when two devs install the same package with different protocols, but that's minor compared to broken installs. We might want to update the documentation and recommend teams to use the same clone method to avoid this inconvenience. |
|
@danielmeppiel Weighing in on your two questions: 1. Normalise SCP to 2. Protocol fallback for team UX? I think we should fall back. The scenario you described is real -- one dev installs with SSH, another teammate doesn't have SSH keys configured, and they're stuck. The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:
The only trade-off is slightly noisier lockfile diffs when two devs install the same package with different protocols, but that's minor compared to broken installs. |
|
Thanks @danielmeppiel and @sergio-sisternes-epam for the thoughtful discussion. Here's my planned approach before I start reworking the PR: Lockfile format
Clone-time protocol fallback
Field changes
Tests
Does this match what you both had in mind? Happy to adjust before I start — otherwise I'll push the rework shortly. |
Maintainer verdict (port preservation design)@edenfunf @sergio-sisternes-epam thanks for the thoughtful back-and-forth. I ran this through our python-architect persona and a security review. Here's where I land — and I'd like to redirect the rework before you push v3. TL;DRReject both v2 ( Why not v2
Why not v3It's over-engineered for what is a port-preservation bug. Specifically:
What to build instead1. Single new field# src/apm_cli/models/dependency/reference.py
port: Optional[int] = None # Non-standard SSH/HTTPS port
# src/apm_cli/deps/lockfile.py — LockedDependency
port: Optional[int] = None2. Stop normalizing
|
| File | Change |
|---|---|
src/apm_cli/models/dependency/reference.py |
Add port, replace _normalize_ssh_protocol_url with a real ssh:// parser via urlparse |
src/apm_cli/utils/github_host.py |
build_ssh_url / build_https_clone_url accept port |
src/apm_cli/deps/lockfile.py |
Add port, serialize when non-default, validate on deserialize |
src/apm_cli/deps/github_downloader.py |
Thread dep_ref.port through _build_repo_url (no fallback chain changes) |
Closes the same root cause for both #661 (SSH) and #731 (HTTPS).
@edenfunf — happy to discuss any specific tradeoff, but this is the direction I'd like the PR to take. Let me know if anything is unclear before you start.
|
@edenfunf here's a concrete checklist to land this. Sing out if any item is unclear and I'll expand. Code
TestsTwo existing test files are the right home — please add to them rather than creating a new file:
DocsThese need updating in the same PR (our convention is code + docs land together):
CHANGELOG
Out of scope (separate follow-up)
Validation before pushinguv run pytest tests/unit/test_generic_git_urls.py tests/unit/test_lockfile.py tests/unit/test_auth_scoping.py -x
uv run pytest tests/unit tests/test_console.py -x # full unit suite, must stay greenOnce all of the above is in, I'll do another review pass and we can ship. Thanks for sticking with this — the original bug report is going to make a lot of self-hosted users happy. |
Adopt the maintainer's preferred design: capture non-default git ports as a single port: Optional[int] field on DependencyReference and thread it through the URL builders, rather than stashing the original ssh:// URL for verbatim replay. - reference.py: replace _normalize_ssh_protocol_url with a real urlparse based _parse_ssh_protocol_url; drop original_ssh_url; update to_canonical / get_identity / to_github_url / __str__ to render host:port when port is set - github_host.py: build_ssh_url and build_https_clone_url accept an optional port; build_ssh_url switches from SCP shorthand to ssh:// when a port is supplied (SCP syntax cannot carry a port) - lockfile.py: serialize port only when non-None; defensive int + range check (1-65535) on read to reject tampered values - github_downloader.py: thread dep_ref.port into _build_repo_url so all three fallback methods (auth-HTTPS, SSH, plain HTTPS) emit URLs that retain the port; same port is reused across protocols - tests: new TestCustomPortParsing covering scp-shorthand/ssh/https/http parsing, canonical rendering, identity equivalence, clone URL building, and lockfile round-trip; TestCloneWithFallbackPortPreservation verifies both attempts keep the port - docs: update dependencies.md, manifest-schema.md, private-packages.md with custom-port grammar and examples - CHANGELOG: note fix for issues microsoft#661 and microsoft#731 Fixes microsoft#661, microsoft#731
…acenter' into fix/preserve-ssh-url-bitbucket-datacenter
|
Hi @danielmeppiel — thanks again for the detailed walkthrough. Summary of what landed: Design
Defensive readsLockfile Small additions beyond the checklist
Tests
Validation
Docs & CHANGELOG
One trade-off I'd like to flag explicitly, since Sergio raised it: same port across both protocols on fallback. For Bitbucket Datacenter the defaults are 7999 (SSH) and 7990 (HTTPS), so a failed SSH clone will fall back to an HTTPS URL that speaks SSH — loud failure, not silent misdirection. I'd rather that than silently dropping the port and hitting the default 443 on a different host surface, but happy to revisit if you'd prefer an explicit per-protocol port field. Let me know. Ready for another pass whenever you have time. |
APM Review Panel -- PR #665Reviewed by the APM Review Panel: a 7-agent multi-disciplinary review (architecture, auth, CLI logging, DevX UX, supply-chain security, growth, strategy) wired into Headline: Strong, well-tested fix for a real enterprise pain. One blocker (Rule 4 doc sync), one nice-to-have follow-up, several green lights. Recommend merge once the Rule 4 fix is added. [auth-expert]
[python-architect]
[supply-chain-security-expert]
[devx-ux-expert]
[cli-logging-expert]
[oss-growth-hacker] -- side-channel
[apm-ceo] -- final callDecision: APPROVE with one required change before merge.
Required before merge:
Recommended follow-ups (separate issues, not blockers): Quality gates
Once the Rule 4 fix lands, this PR is ready to merge. This review was produced by an automated multi-agent panel. Findings flagged as |
The starlight docs (docs/guides/dependencies.md, reference/manifest-schema.md, guides/private-packages.md) gained the custom-port grammar and examples in the earlier refactor commit, but the agent-consumed skill resource was not updated alongside them. Agents installing the apm-guide skill therefore would not learn about the :port syntax or the SCP-shorthand caveat. Add a "Custom git ports" sub-section documenting: - ssh:// and https:// URL forms with :port - SCP shorthand cannot carry a port (path separator collision) - port preservation across SSH -> HTTPS fallback - lockfile port: <int> (1-65535), emitted only when non-default - port is a transport detail, not part of package identity Also add a custom-port example to the object form block so the pattern is discoverable from both string and object surfaces.
…bitbucket-datacenter # Conflicts: # CHANGELOG.md # tests/test_lockfile.py
|
Thanks for the panel review. Blocking fix landed (Rule 4 doc sync) Updated
Also added a custom-port example in the object-form block so the pattern is discoverable from both the string and object surfaces. Pushed as a separate commit on top of the existing branch — rebase-friendly if you prefer a squash. Follow-ups (not in this PR, per the CEO decision) I'll open these as separate issues after merge:
Cross-protocol port trade-off (raised by @sergio-sisternes-epam) Worth being upfront about, since the panel flagged the BB DC case indirectly: for Bitbucket Datacenter the defaults are 7999 (SSH) and 7990 (HTTPS), so an SSH → HTTPS fallback built with same-port will hit an SSH-only port and fail with a protocol error. The alternative (drop port, fall back to default 443) risks reaching a different service on the same hostname behind corporate reverse proxies. Both options are guesses. I picked the one that fails visibly rather than silently. Per-protocol port fields ( |
CodeQL flagged the `"host:port" in url` checks in
TestCloneWithFallbackPortPreservation as potential incomplete URL
substring sanitization (py/incomplete-url-substring-sanitization, High).
In a test assertion context it is a false positive -- we are not
sanitizing user input, we are verifying a URL our own builder emitted.
But the alert is worth clearing because:
1. urlparse-based assertions are stricter: they reject a URL that
happens to embed "bitbucket.example.com:7999" somewhere in the
path or query string but routes elsewhere.
2. Leaving High-severity code-scanning alerts on a PR obscures future
real findings.
Switch the HTTPS-fallback port assertions to parse the URL and compare
parsed.hostname / parsed.port separately. Also tighten the no-port
scp-shorthand case to check the path segment is not purely numeric,
rather than testing for arbitrary port substrings.
danielmeppiel
left a comment
There was a problem hiding this comment.
APM Review Panel -- approved
The blocking finding from the previous panel review is resolved:
- [+] 65d2675
docs(apm-guide): mirror :port syntax into shipped skill resource-- closes the Rule 4 doc-sync gap. The added section even goes beyond what the panel asked for: it documents that port is a transport detail and not part of package identity, which matches the implementation semantics exactly. Nice. - [+] a10fe33
test: assert URL host/port via urlparse instead of substring match-- bonus improvement. Substring matches like"host:7999" in urlare brittle (e.g. would also matchhost:7999something); urlparse is precise.
All CI green, mergeable. Net change: +443 / -98 across 12 files; no breaking changes.
Recommended follow-ups (separate issues, not blockers, restated for tracking):
- Actionable error for SCP-shorthand-with-port confusion (
git@host:7999/path-- devx-ux-expert finding) - Document credential-helper per-host limitation when ports differ (auth-expert finding)
Thanks @edenfunf -- this is exactly the kind of enterprise-self-host fix that compounds APM's positioning. Approved.
) (microsoft#788) * fix(auth): thread dep_ref.port into credential resolution (microsoft#785) DependencyReference.port was parsed and stored but never reached the authentication path, so same-host multi-port setups (e.g. Bitbucket Datacenter with distinct PATs on 7990 and 7991) could collapse onto a single credential entry via the AuthResolver cache. - HostInfo gains a port field and a display_name property for user-facing identifiers; classify_host stays transport-agnostic -- port does not influence host kind. - AuthResolver._cache key widens to (host, port, org). port=None keeps the pre-fix key for the >99% common case; custom-port setups get per-port entries. - TokenManager._credential_cache mirrors the widening, keyed by (host, port). A half-widened cache would return the wrong cached credential at the collapsed layer. - git credential fill sends host=host:port per gitcredentials(7). The credential protocol has no standalone port= attribute; a separate port= line is silently dropped by every helper. - resolve_for_dep threads dep_ref.port; downloader, validation, and sources call sites forward the port argument. - build_error_context surfaces host:port in the failure summary and appends an [i] hint when a port is set, pointing users at credential helpers that key by hostname only. Lockfile identity is unaffected -- DependencyReference.get_unique_key() still excludes port by design; the widened keys are purely in-process memoisation. Refs: microsoft#661, microsoft#665, microsoft#784 * fix(auth): address review feedback on PR microsoft#788 - _try_credential_fallback now reads host_info.host / host_info.port instead of the closure host / port variables. Behaviourally identical today (the resolver always passes the same host into both) but matches the symmetry of _resolve_token at the same indirection layer and removes a future drift hazard if try_with_fallback ever takes a pre-resolved AuthContext directly. - HostInfo.display_name and _format_credential_host now use ``port is not None`` instead of truthy checks. ``None`` is the sentinel for "no port" everywhere else in the resolver (build_error_context already used this form), so the truthy fallback was the only inconsistency. Functionally equivalent for valid TCP ports; tightens the contract against any future port=0 misuse. - Mirror the new "Custom-port hosts and per-port credentials" section from docs/getting-started/authentication.md into the in-repo APM usage skill at packages/apm-guide/.apm/skills/apm-usage/, so the skill APM ships with stays consistent with the public docs. - Anchor the build_error_context substring assertion in test_auth.py with the surrounding " on " / "." tokens. The previous "<host:port>" in <message> form was flagged by CodeQL's py/incomplete-url-substring-sanitization rule (false positive in test context, but the anchored form is also strictly more precise -- it pins the rendered position rather than just substring existence). Default-port normalisation (port=443/80/22 still rendering host:port) is intentionally left for a follow-up issue per the panel review -- that fix belongs at the parser or HostInfo.__post_init__ layer and needs its own contract test.
Fixes #661.
Root Cause
When APM parses a dependency URL that starts with
ssh://,_normalize_ssh_protocol_urlconverts it to thegit@host:pathformat and silently drops any custom port:_clone_with_fallbackthen tries SSH on the reconstructed URL (default port 22, wrong) which fails, and falls back to:This affects Bitbucket Datacenter and any other self-hosted git server that uses a non-standard SSH port.
Fix
Add an
original_ssh_urlfield toDependencyReference. Whenparse_from_strencounters a URL that begins withssh://, the verbatim string is stored in this field before the normalisation step.In
_clone_with_fallback, Method 2 (SSH) now usesdep_ref.original_ssh_urldirectly when it is set, instead of rebuilding the URL from host + repo_ref (which loses the port):Before
After
Changes
src/apm_cli/models/dependency/reference.pyoriginal_ssh_urldataclass field; capture it inparse_from_strbefore normalisation; pass to constructorsrc/apm_cli/deps/github_downloader.pydep_ref.original_ssh_urlverbatim in SSH clone attempt when presenttests/unit/test_generic_git_urls.pyTestBitbucketDatacenterSSHclass with 5 regression testsRegression Protection
Added
TestBitbucketDatacenterSSHcovering:ssh://with custom port stores the original URL inoriginal_ssh_urlrepo_urlfields are still correctly parsedssh://without a port also preserves the original URLoriginal_ssh_urlgit@shorthand URLs do not setoriginal_ssh_urlAll 3 763 existing unit tests continue to pass.