Skip to content

fix: preserve protocol (e.g. ssh:// or https://) and port in dependency URL#665

Merged
danielmeppiel merged 22 commits intomicrosoft:mainfrom
edenfunf:fix/preserve-ssh-url-bitbucket-datacenter
Apr 20, 2026
Merged

fix: preserve protocol (e.g. ssh:// or https://) and port in dependency URL#665
danielmeppiel merged 22 commits intomicrosoft:mainfrom
edenfunf:fix/preserve-ssh-url-bitbucket-datacenter

Conversation

@edenfunf
Copy link
Copy Markdown
Contributor

Fixes #661.

Root Cause

When APM parses a dependency URL that starts with ssh://, _normalize_ssh_protocol_url converts it to the git@host:path format and silently drops any custom port:

ssh://[email protected]:7999/project/repo.git
→ [email protected]:project/repo.git   (port 7999 gone)

_clone_with_fallback then tries SSH on the reconstructed URL (default port 22, wrong) which fails, and falls back to:

https://bitbucket.domain.ext/project/repo   ← not what the user asked for

This affects Bitbucket Datacenter and any other self-hosted git server that uses a non-standard SSH port.

Fix

Add an original_ssh_url field to DependencyReference. When parse_from_str encounters a URL that begins with ssh://, the verbatim string is stored in this field before the normalisation step.

In _clone_with_fallback, Method 2 (SSH) now uses dep_ref.original_ssh_url directly when it is set, instead of rebuilding the URL from host + repo_ref (which loses the port):

if dep_ref and dep_ref.original_ssh_url:
    ssh_url = dep_ref.original_ssh_url      # exact user-supplied URL
else:
    ssh_url = self._build_repo_url(...)     # existing behaviour

Before

ssh://[email protected]:7999/project/repo.git
→ git clone ... https://bitbucket.domain.ext/project/repo   ❌

After

ssh://[email protected]:7999/project/repo.git
→ git clone ... ssh://[email protected]:7999/project/repo.git   ✅

Changes

File Change
src/apm_cli/models/dependency/reference.py Add original_ssh_url dataclass field; capture it in parse_from_str before normalisation; pass to constructor
src/apm_cli/deps/github_downloader.py Use dep_ref.original_ssh_url verbatim in SSH clone attempt when present
tests/unit/test_generic_git_urls.py New TestBitbucketDatacenterSSH class with 5 regression tests

Regression Protection

Added TestBitbucketDatacenterSSH covering:

  • ssh:// with custom port stores the original URL in original_ssh_url
  • Host and repo_url fields are still correctly parsed
  • ssh:// without a port also preserves the original URL
  • HTTPS URLs do not set original_ssh_url
  • git@ shorthand URLs do not set original_ssh_url

All 3 763 existing unit tests continue to pass.

Fixes microsoft#661. When a user specifies an explicit ssh:// URL in apm.yml
(e.g. ssh://[email protected]:7999/project/repo.git), APM was
silently stripping the port during ssh:// → git@ normalisation and then
falling back to https:// after the portless SSH clone attempt failed.

Store the original ssh:// string in DependencyReference.original_ssh_url
before normalisation, and pass it verbatim to git clone in
_clone_with_fallback so the port and protocol are preserved.
Copy link
Copy Markdown
Collaborator

@sergio-sisternes-epam sergio-sisternes-epam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, focused fix — the "preserve original URL, don't fix normalization" approach is the right call. Minimal risk, no side effects on existing URL handling. The PR description is excellent with clear root cause analysis and before/after examples.

The test coverage is thorough: custom port, standard ssh, https exclusion, git@ exclusion, and clone URL ordering all covered.

One request before merge: please add a CHANGELOG entry under ## [Unreleased]:

### Fixed
- Preserve `ssh://` dependency URLs with custom ports for Bitbucket Datacenter repositories instead of silently falling back to HTTPS (#661)

Thanks for another solid contribution!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes cloning failures for self-hosted Git servers (notably Bitbucket Datacenter) when dependencies are specified with an explicit ssh://...:PORT/... URL by preserving the original SSH URL and using it for the SSH clone attempt.

Changes:

  • Add DependencyReference.original_ssh_url and populate it when parsing ssh:// dependency strings.
  • Update _clone_with_fallback to prefer original_ssh_url for the SSH clone method.
  • Add regression tests covering parsing and clone URL selection for ssh:// URLs with custom ports.
Show a summary per file
File Description
src/apm_cli/models/dependency/reference.py Adds original_ssh_url and captures it during parsing for ssh:// inputs.
src/apm_cli/deps/github_downloader.py Uses original_ssh_url directly for the SSH clone attempt to preserve custom ports.
tests/unit/test_generic_git_urls.py Adds parsing regression tests for Bitbucket Datacenter-style ssh:// URLs with ports.
tests/unit/test_auth_scoping.py Adds downloader regression tests to assert the SSH clone attempt uses the original ssh:// URL verbatim.

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment thread src/apm_cli/deps/github_downloader.py Outdated
Comment thread src/apm_cli/models/dependency/reference.py Outdated
Comment thread tests/unit/test_auth_scoping.py Outdated
Raw dependency strings can include '#ref' or '@alias' suffixes which are
not valid git clone URL syntax. Strip these from original_ssh_url at capture
time so the port-preserving clone path does not silently fail and re-trigger
the https fallback.

Also narrows the over-broad except clause in regression tests from
(RuntimeError, Exception) to (RuntimeError, GitCommandError), and adds the
CHANGELOG entry requested in review.
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

@edenfunf looks like we got a conflict now. We will need that solved before we can merge the PR. Can you review this please?

@edenfunf
Copy link
Copy Markdown
Contributor Author

@sergio-sisternes-epam Conflict has been resolved — merged the latest main into the branch. Ready for review.

danielmeppiel
danielmeppiel previously approved these changes Apr 14, 2026
@edenfunf edenfunf force-pushed the fix/preserve-ssh-url-bitbucket-datacenter branch from a5917bf to c75b2e6 Compare April 14, 2026 17:56
@danielmeppiel
Copy link
Copy Markdown
Collaborator

@edenfunf the lockfile needs to track provenance URL with the port too in this case. We need to test this also works.

When a dependency is specified as ssh://host:port/repo, the lockfile was
storing only host and repo_url, silently dropping the custom port.

Add original_ssh_url to LockedDependency so the verbatim ssh:// URL
(including any non-standard port) is written to apm.lock.yaml.
from_dependency_ref copies it from DependencyReference; to_dict/from_dict
round-trip it correctly.

Add three regression tests to TestBitbucketDatacenterSSH verifying that
the lockfile captures the port, that the field survives a to_dict/from_dict
round-trip, and that HTTPS deps do not set the field.
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

sergio-sisternes-epam commented Apr 18, 2026

@danielmeppiel additional comments from my side:

  1. Normalise SCP → ssh:// URL in lockfile? Agree 100%. SCP syntax (git@host:path) isn't a real URL — it's a Git convenience. Normalising to ssh://git@host/path gives us a proper URL that standard parsers can
    handle, while Git still accepts it. No information loss.

  2. Protocol fallback for team UX? I think we should fall back. The scenario you described is real — one dev installs with SSH, another teammate doesn't have SSH keys configured, and they're stuck.
    _clone_with_fallback already does this at install time, so extending that behaviour to lockfile-driven installs is consistent.

The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:

  • The lockfile is honest (auditability preserved)
  • Teammates aren't blocked (UX preserved)
  • Git's own insteadOf config still works for users who want to force a protocol globally

The only trade-off is slightly noisier lockfile diffs/conflicts when two devs install the same package with different protocols, but that's minor compared to broken installs. We might want to update the documentation and recommend teams to use the same clone method to avoid this inconvenience.

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

@danielmeppiel Weighing in on your two questions:

1. Normalise SCP to ssh:// URL in lockfile? Agree 100%. SCP syntax (git@host:path) isn't a real URL -- it's a Git convenience. Normalising to ssh://git@host/path gives us a proper URL that standard parsers can handle, while Git still accepts it. No information loss.

2. Protocol fallback for team UX? I think we should fall back. The scenario you described is real -- one dev installs with SSH, another teammate doesn't have SSH keys configured, and they're stuck. _clone_with_fallback already does this at install time, so extending that behaviour to lockfile-driven installs is consistent.

The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:

  • The lockfile is honest (auditability preserved)
  • Teammates aren't blocked (UX preserved)
  • Git's own insteadOf config still works for users who want to force a protocol globally

The only trade-off is slightly noisier lockfile diffs when two devs install the same package with different protocols, but that's minor compared to broken installs.

@edenfunf
Copy link
Copy Markdown
Contributor Author

Thanks @danielmeppiel and @sergio-sisternes-epam for the thoughtful discussion. Here's my planned approach before I start reworking the PR:

Lockfile format

  • Normalize SCP syntax (git@host:path) to true SSH URL (ssh://git@host/path.git) at parse time
  • Store the normalized URL honestly in the lockfile — including the user's chosen protocol and any custom port
  • Drop the original_ssh_url workaround; a single source_url field carries protocol + host + port

Clone-time protocol fallback

  • Extend the existing _clone_with_fallback behavior to lockfile-driven installs
  • Try the protocol declared in the lockfile first (respects the author's choice)
  • Fall back to the alternative protocol on failure, so teammates without SSH keys aren't blocked when the author committed an ssh:// URL
  • Ports are protocol-specific, so on fallback we drop the source-protocol port and use the target protocol's default (or a known alternative)

Field changes

  • DependencyReference / LockedDependency: add source_url (normalized) + protocol; host stays port-free for identity matching
  • build_ssh_url goes back to emitting a clean ssh:// URL — no more stuffing host:port into the host field
  • A small normalize_to_ssh_url() helper handles the SCP → URL conversion

Tests

  • SCP normalization round-trips
  • Port preservation for ssh://host:7999 and https://host:8443
  • Protocol fallback in both directions (SSH → HTTPS and vice versa)
  • Identity matching across the same repo reached via different ports

Does this match what you both had in mind? Happy to adjust before I start — otherwise I'll push the rework shortly.

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Maintainer verdict (port preservation design)

@edenfunf @sergio-sisternes-epam thanks for the thoughtful back-and-forth. I ran this through our python-architect persona and a security review. Here's where I land — and I'd like to redirect the rework before you push v3.

TL;DR

Reject both v2 (original_ssh_url parallel field) and v3 (source_url + protocol + fallback). The right fix is smaller and more direct: add a single port: Optional[int] field, thread it through the URL builders, and stop normalizing ssh:// to SCP form during parsing. ~4 files, ~50 lines, zero schema breakage.

Why not v2

original_ssh_url is a duplicate of repo_url/host for one transport. It only fixes SSH and ignores #731 (HTTPS https://host:8443/... has the same bug). Cluttering both DependencyReference and LockedDependency with a field that exists only because the parser drops a number is the wrong abstraction.

Why not v3

It's over-engineered for what is a port-preservation bug. Specifically:

  • Adding source_url + protocol to both dataclasses re-creates the same duplication problem on a bigger surface — these fields are derivable from host + port + repo_url if we treat the port as a first-class field.
  • The protocol-fallback-with-port-drop has a real security risk. ssh://internal-host:7999/... failing and silently retrying as https://internal-host/... (default 443) routes to a potentially different service on a corporate proxy. Worse: an attacker who can poison the HTTPS path but not SSH (or vice versa) gets a free transport swap. Our content_hash only catches this on re-install, not first install.
  • _clone_with_fallback already runs for lockfile-driven installs — we don't need to extend it. The right fix is making sure the URL builders carry the port; the existing chain then does the right thing.

What to build instead

1. Single new field

# src/apm_cli/models/dependency/reference.py
port: Optional[int] = None  # Non-standard SSH/HTTPS port

# src/apm_cli/deps/lockfile.py — LockedDependency
port: Optional[int] = None

2. Stop normalizing ssh:// to SCP

The current chain _normalize_ssh_protocol_url (reference.py:380) -> SCP -> _parse_ssh_url is what drops the port (SCP git@host:path cannot represent a port — the : is a path separator).

Replace it with a real ssh:// parser that mirrors how we already handle https:// URLs in _parse_standard_url (reference.py:711):

@staticmethod
def _parse_ssh_protocol_url(url: str):
    if not url.startswith("ssh://"):
        return None
    parsed = urllib.parse.urlparse(url)
    return parsed.hostname, parsed.port, repo_url, reference, alias

This also addresses Copilot inline comments #1/#2 (#ref/@alias stripping) cleanly via urlparse — fragment is separate from path.

3. URL builders accept port

# src/apm_cli/utils/github_host.py
def build_ssh_url(host, repo_ref, port=None):
    if port:
        return f"ssh://git@{host}:{port}/{repo_ref}.git"  # ssh:// form (SCP can't carry port)
    return f"git@{host}:{repo_ref}.git"  # SCP shorthand for default port

def build_https_clone_url(host, repo_ref, token=None, port=None):
    netloc = f"{host}:{port}" if port else host
    ...

When port is set, builders emit ssh:// form; without port they keep emitting SCP shorthand (no behavioural change for the 99% case).

4. Same port across protocols on fallback

_clone_with_fallback doesn't change structurally. SSH attempt uses ssh://host:7999, HTTPS attempt uses https://host:7999. If 7999 doesn't speak HTTPS, fail loudly — better than silently hitting a different service.

5. Identity/dedup unchanged

get_unique_key() stays repo_url (bare). get_identity() stays host/repo_url (bare). Port is a transport detail, not an identity component — two devs referencing the same logical repo via different protocols/ports should still dedup to the same package.

6. Lockfile validation (security)

In LockedDependency.from_dict, validate the port:

_p = data.get("port")
port = int(_p) if _p is not None and 1 <= int(_p) <= 65535 else None

We already do validate_path_segments on repo_url; the port deserves the same defensive cast against lockfile tampering.

Tests to add

  • ssh://host:7999/path.git parses with port=7999 (no SCP intermediate)
  • https://host:8443/path parses with port=8443 (covers Using dependencies within non-default ports #731)
  • SCP shorthand git@host:path still parses with port=None (no behaviour change)
  • Lockfile round-trip preserves port
  • Lockfile rejects garbage port (port: 'x', port: 99999, port: -1)
  • Same logical repo on two ports still dedups via get_unique_key()
  • _clone_with_fallback builds both SSH and HTTPS URLs with the port when set

Why I'm not asking for protocol fallback in this PR

The team-UX scenario (one dev installs via SSH, teammate has no SSH keys) is real but separate from this bug and bigger than it looks — it changes lockfile semantics across the whole codebase. Let's land port preservation cleanly first, then open a dedicated discussion for cross-protocol fallback design.

Scope summary

File Change
src/apm_cli/models/dependency/reference.py Add port, replace _normalize_ssh_protocol_url with a real ssh:// parser via urlparse
src/apm_cli/utils/github_host.py build_ssh_url / build_https_clone_url accept port
src/apm_cli/deps/lockfile.py Add port, serialize when non-default, validate on deserialize
src/apm_cli/deps/github_downloader.py Thread dep_ref.port through _build_repo_url (no fallback chain changes)

Closes the same root cause for both #661 (SSH) and #731 (HTTPS).

@edenfunf — happy to discuss any specific tradeoff, but this is the direction I'd like the PR to take. Let me know if anything is unclear before you start.

@danielmeppiel
Copy link
Copy Markdown
Collaborator

@edenfunf here's a concrete checklist to land this. Sing out if any item is unclear and I'll expand.

Code

  • src/apm_cli/models/dependency/reference.py
    • Add port: Optional[int] = None to the dataclass (right after host)
    • Replace _normalize_ssh_protocol_url (line ~380) with _parse_ssh_protocol_url that returns (host, port, repo_url, reference, alias) using urllib.parse.urlparse
    • Update parse() to try the new ssh:// parser before the SCP _parse_ssh_url regex
    • Update _parse_standard_url (line ~711) to capture parsed_url.port for https:///http://
    • In to_canonical()/get_identity(): if port is set, render as host:port/repo_url for display only — get_unique_key() stays repo_url bare
  • src/apm_cli/utils/github_host.py
    • build_ssh_url(host, repo_ref, port=None) — emit ssh://git@host:port/repo.git when port is set, keep SCP shorthand otherwise
    • build_https_clone_url(host, repo_ref, token=None, port=None) — embed port in netloc
  • src/apm_cli/deps/lockfile.py
    • Add port: Optional[int] = None to LockedDependency
    • to_dict: emit port only when non-None (backward compat)
    • from_dict: defensive cast — int(p) only if 1 <= int(p) <= 65535, else None
    • from_dependency_ref: pass port=dep_ref.port
  • src/apm_cli/deps/github_downloader.py
    • In _build_repo_url (line ~571), thread dep_ref.port to the URL builders
    • Drop the original_ssh_url branch in _clone_with_fallback (lines 697-701) — once port is in the URL builders, the existing fallback chain produces the right URLs naturally

Tests

Two existing test files are the right home — please add to them rather than creating a new file:

  • tests/unit/test_generic_git_urls.py (parsing)
    • ssh://host:7999/path.git -> host=..., port=7999, repo_url=path
    • https://host:8443/path -> port=8443 (covers Using dependencies within non-default ports #731)
    • git@host:path (SCP) -> port=None (no behaviour change)
    • ssh://host/path.git (no port) -> port=None
    • #ref and @alias correctly extracted from ssh://host:7999/path.git#main@alias
  • tests/unit/test_lockfile.py (or wherever LockedDependency round-trip lives)
    • port round-trips via to_dict/from_dict
    • Lockfile without port field still loads cleanly
    • Garbage port values rejected: 'x', 99999, -1, 0 -> None
    • Same logical repo_url with two different ports still collides under get_unique_key() (dedup invariant)
  • tests/unit/test_auth_scoping.py (clone behaviour — you already added cases here)
    • _clone_with_fallback builds ssh://host:7999/... for the SSH attempt when dep_ref.port=7999
    • Same builds https://host:7999/... for the HTTPS attempt — port preserved across protocols
    • Replace the except Exception swallow with except (RuntimeError, GitCommandError) (Copilot inline Will there be MCP coverage? #3)
  • Drop the existing TestBitbucketDatacenterSSH class — it's testing original_ssh_url which goes away

Docs

These need updating in the same PR (our convention is code + docs land together):

  • docs/src/content/docs/guides/dependencies.md
    • Around line 124 ("SSH URL" bullet) add a sub-bullet: ssh://git@host:PORT/owner/repo.git -- explicit ssh:// form is required when using a non-default SSH port (SCP shorthand cannot carry ports)
    • Around line 139 (the [email protected]:team/rules.git example) add a Bitbucket Datacenter / self-hosted example with custom port
  • docs/src/content/docs/reference/manifest-schema.md
    • Line ~177 (URL grammar): note that ssh:// and https:// URLs may include :PORT; git@ SCP shorthand may not
    • Line ~206 (examples block): add a ssh://[email protected]:7999/team/repo.git example
    • If there's a lockfile schema section, add the new port field
  • docs/src/content/docs/guides/private-packages.md (line ~39 self-hosted section) — one-line note that custom SSH/HTTPS ports are supported via the URL form

CHANGELOG

Out of scope (separate follow-up)

Validation before pushing

uv run pytest tests/unit/test_generic_git_urls.py tests/unit/test_lockfile.py tests/unit/test_auth_scoping.py -x
uv run pytest tests/unit tests/test_console.py -x   # full unit suite, must stay green

Once all of the above is in, I'll do another review pass and we can ship. Thanks for sticking with this — the original bug report is going to make a lot of self-hosted users happy.

@danielmeppiel danielmeppiel added CI/CD Deprecated: use area/ci-cd. Kept for issue history; will be removed in milestone 0.10.0. and removed CI/CD Deprecated: use area/ci-cd. Kept for issue history; will be removed in milestone 0.10.0. labels Apr 19, 2026
Adopt the maintainer's preferred design: capture non-default git ports
as a single port: Optional[int] field on DependencyReference and thread
it through the URL builders, rather than stashing the original ssh://
URL for verbatim replay.

- reference.py: replace _normalize_ssh_protocol_url with a real urlparse
  based _parse_ssh_protocol_url; drop original_ssh_url; update
  to_canonical / get_identity / to_github_url / __str__ to render
  host:port when port is set
- github_host.py: build_ssh_url and build_https_clone_url accept an
  optional port; build_ssh_url switches from SCP shorthand to ssh://
  when a port is supplied (SCP syntax cannot carry a port)
- lockfile.py: serialize port only when non-None; defensive int + range
  check (1-65535) on read to reject tampered values
- github_downloader.py: thread dep_ref.port into _build_repo_url so all
  three fallback methods (auth-HTTPS, SSH, plain HTTPS) emit URLs that
  retain the port; same port is reused across protocols
- tests: new TestCustomPortParsing covering scp-shorthand/ssh/https/http
  parsing, canonical rendering, identity equivalence, clone URL
  building, and lockfile round-trip; TestCloneWithFallbackPortPreservation
  verifies both attempts keep the port
- docs: update dependencies.md, manifest-schema.md, private-packages.md
  with custom-port grammar and examples
- CHANGELOG: note fix for issues microsoft#661 and microsoft#731

Fixes microsoft#661, microsoft#731
…acenter' into fix/preserve-ssh-url-bitbucket-datacenter
@edenfunf
Copy link
Copy Markdown
Contributor Author

Hi @danielmeppiel — thanks again for the detailed walkthrough. Summary of what landed:

Design

  • Added a single port: Optional[int] field on DependencyReference and LockedDependency. Port is a transport detail, not identity — get_unique_key() stays bare repo_url, so the same repo reachable on different ports dedupes to one entry.
  • Replaced _normalize_ssh_protocol_url with a real urlparse-based _parse_ssh_protocol_url that returns (host, port, repo_url, reference, alias). The SCP shorthand parser returns the same tuple shape with port=None, since git@host:path cannot carry a port (the : is the path separator).
  • build_ssh_url(host, repo_ref, port=None) emits ssh://git@host:PORT/repo.git when a port is set and falls back to SCP shorthand when not. build_https_clone_url(..., port=None) preserves the port on the netloc.
  • In _clone_with_fallback, dep_ref.port is threaded through _build_repo_url so both the SSH attempt and the HTTPS fallback reuse the same port. Protocol fallback with port dropping is explicitly out of scope.
  • Dropped original_ssh_url entirely — no verbatim URL stash anywhere.

Defensive reads

Lockfile port values are cast via int(...) inside a try/except (TypeError, ValueError) and range-checked 1..65535. Garbage values ("rm -rf /", 99999, 0, -1) resolve to port=None rather than raising or propagating.

Small additions beyond the checklist

  • _detect_virtual_package now short-circuits ssh:// (previously only git@, https://, http://), otherwise a bare ssh://... string tripped the virtual-package validator before reaching parse().
  • to_canonical(), get_identity(), to_github_url(), and __str__() all render host:port when a port is set, so serialized forms are lossless.

Tests

  • New TestCustomPortParsing in test_generic_git_urls.py (14 cases) covering scp / ssh / https / http parsing, canonical rendering, identity-equivalence across ports, and clone-URL builder behaviour.
  • New TestCloneWithFallbackPortPreservation in test_auth_scoping.py mocking Repo.clone_from and asserting both the SSH attempt and the HTTPS fallback carry the port.
  • TestLockedDependency in tests/test_lockfile.py gets port round-trip and defensive-cast cases.

Validation

  • pytest tests/unit tests/test_console.py tests/test_lockfile.py — all green (3,891 unit + 40 root tests).
  • End-to-end smoke test: parsed ssh://[email protected]:7999/project/repo.git and https://git.internal:8443/team/repo.git, ran them through _clone_with_fallback with a mocked Repo, and confirmed every URL handed to git retains the port. Also verified ssh://...#v1.0 still captures reference=v1.0 correctly alongside port=7999.

Docs & CHANGELOG


One trade-off I'd like to flag explicitly, since Sergio raised it: same port across both protocols on fallback. For Bitbucket Datacenter the defaults are 7999 (SSH) and 7990 (HTTPS), so a failed SSH clone will fall back to an HTTPS URL that speaks SSH — loud failure, not silent misdirection. I'd rather that than silently dropping the port and hitting the default 443 on a different host surface, but happy to revisit if you'd prefer an explicit per-protocol port field. Let me know.

Ready for another pass whenever you have time.

@danielmeppiel
Copy link
Copy Markdown
Collaborator

APM Review Panel -- PR #665

Reviewed by the APM Review Panel: a 7-agent multi-disciplinary review (architecture, auth, CLI logging, DevX UX, supply-chain security, growth, strategy) wired into microsoft/apm to instrument the maintainer's review process per the Instrumented Codebase chapter of the Agentic SDLC Handbook.

Headline: Strong, well-tested fix for a real enterprise pain. One blocker (Rule 4 doc sync), one nice-to-have follow-up, several green lights. Recommend merge once the Rule 4 fix is added.


[auth-expert]

  • [+] Token isolation preserved. _build_repo_url still gates x-access-token: injection on is_github and github_token -- a custom port on bitbucket.example.com:7999 will NOT receive a GitHub PAT. Method 1 (auth URL) remains GitHub-only. Good.
  • [+] build_https_clone_url correctly threads port into the netloc on both branches (with and without token). The token-bearing form https://x-access-token:TOKEN@host:port/repo.git is RFC-3986 compliant.
  • [+] Per-dep auth pattern intact. dep_ref is threaded through _clone_with_fallback -> _build_repo_url; no instance-var token leakage.
  • [+] SSH form switch is safe. When port is set, build_ssh_url correctly switches from SCP shorthand to ssh://git@host:port/repo.git. The existing GIT_SSH_COMMAND ConnectTimeout merge logic and GIT_ASKPASS=echo apply uniformly to both forms.
  • [!] Limitation worth documenting (not blocking): git credential fill is called per-host, not per-(host, port). If a user ever needs different credentials for bitbucket.example.com:7999 vs bitbucket.example.com:443, the credential helper will not disambiguate. Pre-existing; surfaces more often now that custom-port hosts are first-class. Suggest a one-line note in getting-started/authentication.md under "Git credential helper not found".

[python-architect]

  • [+] _normalize_ssh_protocol_url -> _parse_ssh_protocol_url is a clean win. Replacing hand-rolled string slicing with urllib.parse.urlparse removes a class of bugs (#ref@alias fragment handling, port preservation, git@ user prefix) and reduces LoC. The new helper has a clearly-typed return tuple.
  • [+] 5-tuple shape is consistent across _parse_ssh_url, _parse_ssh_protocol_url, _parse_standard_url -- no asymmetry for the caller in parse().
  • [+] Defensive port casting on lockfile read (from_dict: try/except, range check 1-65535) is the right pattern for an external-data boundary. Mirrors the path-segment validation pattern.
  • [i] Minor: port: Optional[int] is a new dataclass field on both DependencyReference and LockedDependency -- positioned correctly (between host and reference/registry_prefix). Backwards-compatible via Optional[int] = None.
  • [i] Test coverage is excellent: new TestCustomPortParsing (12 cases), TestCloneWithFallbackPortPreservation (4 end-to-end cases capturing emitted URLs), lockfile round-trip. The fallback-chain capture test is particularly valuable because it asserts the observable behavior at the Repo.clone_from boundary, not just the field shape.

[supply-chain-security-expert]

  • [+] No new attack surface. port is a bounded integer (1-65535), validated on both parse and lockfile read. No path-traversal vector.
  • [+] Identity dedup via get_unique_key() returns bare repo_url -- port is correctly NOT part of identity (asserted by test_same_repo_different_ports_dedup_by_repo_url). This is the right call: same code on different transports is the same package. Prevents identity-spoofing where repo and repo:7999 could be treated as distinct deps and shadow each other.
  • [+] Token never emitted into SSH URLs. Custom SSH port does not introduce a new token-leak vector (tokens flow only through build_https_clone_url token branch).
  • [+] enterprise/security.md invariants intact: still no runtime component, still no arbitrary code execution, still git-only. Custom port is a transport detail, not an execution surface change. The Security Model contract holds.
  • BLOCKING -- Rule 4 violation. Per repo copilot-instructions.md doc-sync rules: when changing dependency formats, packages/apm-guide/.apm/skills/apm-usage/dependencies.md must be updated alongside docs/src/content/docs/guides/dependencies.md. The starlight docs were updated (good); the shipped skill resource was NOT. Agents consuming the apm-guide skill will not learn about the :port syntax. Fix: mirror the new "Custom port" sub-bullets and the SCP-shorthand caveat into packages/apm-guide/.apm/skills/apm-usage/dependencies.md.

[devx-ux-expert]

  • [+] Three doc surfaces updated (guides/dependencies.md, guides/private-packages.md, reference/manifest-schema.md) with consistent terminology. Custom-port examples are realistic (Bitbucket DC port 7999, self-hosted GitLab 8443).
  • [+] Mental model preserved: the git-key object form is presented as the default for non-trivial cases; SCP shorthand limitation is explained inline rather than as a footnote. New users won't get surprised.
  • [+] __str__ and to_canonical() correctly include :port -- error messages will show the URL the user actually provided, not a sanitized one.
  • [!] UX gap (suggested follow-up, not blocking): if a user pastes [email protected]:7999/project/repo.git (SCP form, intending 7999 as a port), parsing succeeds but treats 7999/project/repo.git as the repo path. Clone then fails with a 404 from git -- not an APM error pointing at the SCP/port mismatch. Detecting "leading numeric segment after :" in the SCP path and raising an actionable error ("port not supported in SCP shorthand; use ssh://git@host:7999/...") would close the loop. Worth opening as a follow-up issue.
  • [i] CHANGELOG entry is accurate but reads as one long sentence. Style only.

[cli-logging-expert]

  • [+] No CLI output paths touched. Verbose-callback messages on success (Cloned from: {url}) will now correctly include the port, which is the right behavior. No new _rich_* calls in non-CLI modules.
  • [i] N/A for the rest of the LOC.

[oss-growth-hacker] -- side-channel

  • [+] High-conversion fix. Bitbucket Datacenter is the largest enterprise self-hosted git surface APM does not yet handle gracefully; this fix removes a hard "doesn't work for us" gate for an entire class of internal-platform users. Story angle for release notes: "APM now works on every git server you have, including the ones behind a custom port." Pairs naturally with the existing private-packages narrative.
  • [+] Bonus angle: the enterprise/security.md "no runtime component" framing now extends cleanly to "and works on your air-gapped Bitbucket DC on port 7999 too." Reinforces the enterprise-trust positioning.
  • [i] Issues #661 and #731 are external -- contributor @edenfunf is from outside the core team. Worth a callout in the release notes (community contribution).

[apm-ceo] -- final call

Decision: APPROVE with one required change before merge.

  • The fix is correct, well-tested, and protective. The architecture refactor (urlparse-based SSH parser) is a strict improvement.
  • The supply-chain panel correctly identified that Rule 4 is violated -- packages/apm-guide/.apm/skills/apm-usage/dependencies.md must be updated in this PR. This is the contract that keeps agent-consumed docs in sync with user-facing docs; we do not make exceptions.
  • The DevX UX follow-up (actionable error for git@host:NNNN/... SCP-with-port confusion) is real but should be a separate issue -- shipping this fix today is more valuable than holding it for a polish iteration.
  • Breaking-change check: zero. New optional dataclass fields, new optional kwargs, defensive lockfile parse. Existing port-less workflows unchanged.
  • CHANGELOG entry is accurate and lands under ### Fixed. Good.

Required before merge:

  1. Update packages/apm-guide/.apm/skills/apm-usage/dependencies.md to mention :port syntax and SCP-shorthand limitation.

Recommended follow-ups (separate issues, not blockers):
2. Open issue: actionable error for SCP-shorthand-with-port confusion (devx-ux-expert finding).
3. Open issue: document credential-helper per-host limitation when ports differ (auth-expert finding).


Quality gates

  • [+] Python Architect -- structural improvement; no anti-patterns introduced
  • [+] CLI Logging Expert -- no logging paths affected
  • [+] DevX UX Expert -- public surface familiar; one follow-up tracked
  • Supply Chain Security Expert -- Rule 4 doc-sync gap (blocking, see above)
  • [+] APM CEO -- positioning intact, no breaking changes, CHANGELOG correct
  • [+] OSS Growth Hacker -- net positive for adoption; release-note angle ready

Once the Rule 4 fix lands, this PR is ready to merge.


This review was produced by an automated multi-agent panel. Findings flagged as [x] are blocking; [!] are notable-but-not-blocking; [+] are confirmations of good practice; [i] are informational. Each specialist reasoned from a single canonical reference (per-persona doc links in their .agent.md) so the bar is auditable and consistent across reviews.

The starlight docs (docs/guides/dependencies.md, reference/manifest-schema.md,
guides/private-packages.md) gained the custom-port grammar and examples in
the earlier refactor commit, but the agent-consumed skill resource was not
updated alongside them. Agents installing the apm-guide skill therefore
would not learn about the :port syntax or the SCP-shorthand caveat.

Add a "Custom git ports" sub-section documenting:
  - ssh:// and https:// URL forms with :port
  - SCP shorthand cannot carry a port (path separator collision)
  - port preservation across SSH -> HTTPS fallback
  - lockfile port: <int> (1-65535), emitted only when non-default
  - port is a transport detail, not part of package identity

Also add a custom-port example to the object form block so the pattern is
discoverable from both string and object surfaces.
…bitbucket-datacenter

# Conflicts:
#	CHANGELOG.md
#	tests/test_lockfile.py
@edenfunf
Copy link
Copy Markdown
Contributor Author

Thanks for the panel review.

Blocking fix landed (Rule 4 doc sync)

Updated packages/apm-guide/.apm/skills/apm-usage/dependencies.md to mirror the starlight docs. The shipped skill resource now covers:

  • ssh:// and https:// URL forms with :port
  • SCP-shorthand caveat (the : is the path separator, so git@host:port/... is unreachable — must use ssh:// form for a custom port)
  • Port preservation across SSH → HTTPS fallback
  • Lockfile port: <int> (1–65535), emitted only when non-default
  • Port is a transport detail, not part of package identity (dedup stays on bare repo_url)

Also added a custom-port example in the object-form block so the pattern is discoverable from both the string and object surfaces. Pushed as a separate commit on top of the existing branch — rebase-friendly if you prefer a squash.

Follow-ups (not in this PR, per the CEO decision)

I'll open these as separate issues after merge:

  1. Actionable error for SCP-shorthand-with-port confusion. If a user writes [email protected]:7999/project/repo.git intending 7999 as a port, parsing currently succeeds and treats 7999/project/repo as the path. Clone fails with a 404 from git. A leading-numeric-segment heuristic in _parse_ssh_url could raise "port not supported in SCP shorthand; use ssh://git@host:7999/..." and close the loop.
  2. Credential-helper per-host limitation. git credential fill is per-host, not per-(host, port) — users needing distinct creds for host:7999 vs host:443 will hit ambiguity. Pre-existing, but more user-visible now that custom-port hosts are first-class. Worth a short note in getting-started/authentication.md under "Git credential helper not found".

Cross-protocol port trade-off (raised by @sergio-sisternes-epam)

Worth being upfront about, since the panel flagged the BB DC case indirectly: for Bitbucket Datacenter the defaults are 7999 (SSH) and 7990 (HTTPS), so an SSH → HTTPS fallback built with same-port will hit an SSH-only port and fail with a protocol error. The alternative (drop port, fall back to default 443) risks reaching a different service on the same hostname behind corporate reverse proxies. Both options are guesses. I picked the one that fails visibly rather than silently. Per-protocol port fields (ssh_port / https_port on the object form) would remove the guess entirely — happy to add as a follow-up if users hit the BB DC path in practice, but held off on expanding the schema in this PR.

Comment thread tests/unit/test_auth_scoping.py Fixed
CodeQL flagged the `"host:port" in url` checks in
TestCloneWithFallbackPortPreservation as potential incomplete URL
substring sanitization (py/incomplete-url-substring-sanitization, High).
In a test assertion context it is a false positive -- we are not
sanitizing user input, we are verifying a URL our own builder emitted.
But the alert is worth clearing because:

  1. urlparse-based assertions are stricter: they reject a URL that
     happens to embed "bitbucket.example.com:7999" somewhere in the
     path or query string but routes elsewhere.
  2. Leaving High-severity code-scanning alerts on a PR obscures future
     real findings.

Switch the HTTPS-fallback port assertions to parse the URL and compare
parsed.hostname / parsed.port separately. Also tighten the no-port
scp-shorthand case to check the path segment is not purely numeric,
rather than testing for arbitrary port substrings.
Copy link
Copy Markdown
Collaborator

@danielmeppiel danielmeppiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APM Review Panel -- approved

The blocking finding from the previous panel review is resolved:

  • [+] 65d2675 docs(apm-guide): mirror :port syntax into shipped skill resource -- closes the Rule 4 doc-sync gap. The added section even goes beyond what the panel asked for: it documents that port is a transport detail and not part of package identity, which matches the implementation semantics exactly. Nice.
  • [+] a10fe33 test: assert URL host/port via urlparse instead of substring match -- bonus improvement. Substring matches like "host:7999" in url are brittle (e.g. would also match host:7999something); urlparse is precise.

All CI green, mergeable. Net change: +443 / -98 across 12 files; no breaking changes.

Recommended follow-ups (separate issues, not blockers, restated for tracking):

  • Actionable error for SCP-shorthand-with-port confusion (git@host:7999/path -- devx-ux-expert finding)
  • Document credential-helper per-host limitation when ports differ (auth-expert finding)

Thanks @edenfunf -- this is exactly the kind of enterprise-self-host fix that compounds APM's positioning. Approved.

@danielmeppiel danielmeppiel added this pull request to the merge queue Apr 20, 2026
Merged via the queue into microsoft:main with commit 15ed194 Apr 20, 2026
11 checks passed
edenfunf added a commit to edenfunf/apm that referenced this pull request Apr 21, 2026
) (microsoft#788)

* fix(auth): thread dep_ref.port into credential resolution (microsoft#785)

DependencyReference.port was parsed and stored but never reached the
authentication path, so same-host multi-port setups (e.g. Bitbucket
Datacenter with distinct PATs on 7990 and 7991) could collapse onto a
single credential entry via the AuthResolver cache.

- HostInfo gains a port field and a display_name property for
  user-facing identifiers; classify_host stays transport-agnostic --
  port does not influence host kind.
- AuthResolver._cache key widens to (host, port, org). port=None keeps
  the pre-fix key for the >99% common case; custom-port setups get
  per-port entries.
- TokenManager._credential_cache mirrors the widening, keyed by
  (host, port). A half-widened cache would return the wrong cached
  credential at the collapsed layer.
- git credential fill sends host=host:port per gitcredentials(7). The
  credential protocol has no standalone port= attribute; a separate
  port= line is silently dropped by every helper.
- resolve_for_dep threads dep_ref.port; downloader, validation, and
  sources call sites forward the port argument.
- build_error_context surfaces host:port in the failure summary and
  appends an [i] hint when a port is set, pointing users at credential
  helpers that key by hostname only.

Lockfile identity is unaffected -- DependencyReference.get_unique_key()
still excludes port by design; the widened keys are purely in-process
memoisation.

Refs: microsoft#661, microsoft#665, microsoft#784

* fix(auth): address review feedback on PR microsoft#788

- _try_credential_fallback now reads host_info.host / host_info.port
  instead of the closure host / port variables. Behaviourally identical
  today (the resolver always passes the same host into both) but matches
  the symmetry of _resolve_token at the same indirection layer and
  removes a future drift hazard if try_with_fallback ever takes a
  pre-resolved AuthContext directly.
- HostInfo.display_name and _format_credential_host now use
  ``port is not None`` instead of truthy checks. ``None`` is the
  sentinel for "no port" everywhere else in the resolver
  (build_error_context already used this form), so the truthy fallback
  was the only inconsistency. Functionally equivalent for valid TCP
  ports; tightens the contract against any future port=0 misuse.
- Mirror the new "Custom-port hosts and per-port credentials" section
  from docs/getting-started/authentication.md into the in-repo APM
  usage skill at packages/apm-guide/.apm/skills/apm-usage/, so the
  skill APM ships with stays consistent with the public docs.
- Anchor the build_error_context substring assertion in
  test_auth.py with the surrounding " on " / "." tokens. The previous
  "<host:port>" in <message> form was flagged by CodeQL's
  py/incomplete-url-substring-sanitization rule (false positive in test
  context, but the anchored form is also strictly more precise -- it
  pins the rendered position rather than just substring existence).

Default-port normalisation (port=443/80/22 still rendering host:port)
is intentionally left for a follow-up issue per the panel review --
that fix belongs at the parser or HostInfo.__post_init__ layer and
needs its own contract test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Ignores ssh://, tries https (Bitbucket Datacenter)

5 participants