allow ssh to be used for debugging github test runners #1276

Open
BrentBaccala wants to merge 9 commits into Singular:spielwiese from BrentBaccala:github-actions

@BrentBaccala
Contributor

This PR extends the existing github test harness with an option, available when the workflow is triggered manually, to wait at the end of the run for an ssh connection into the test runner, which can then be used for debugging.

@dimpase
Contributor

dimpase commented Nov 23, 2025

Does this reliably work? We might have a use for it in @sagemath

@BrentBaccala
Contributor Author

I haven't used it recently, but it worked fine this spring. It's a bit of a pain to configure.

To get it working, you need to have the changes made on the github repository's default branch (having it on another branch doesn't work). You can either change the default branch on the repository settings, or merge the patch into the default branch. Then you can trigger the workflow manually, as documented here.

This also means you can't test it normally as a PR. You can change the git default branch on your own copy of the Singular repository, or merge into it, as I just described, and then make your own PR on your own repository, and run the ssh runner there. I've done that; it should work.

If the maintainers accept it into the main repository, and put it on the default branch, it will be easier to use.

I was also thinking of using it to diagnose the sagemath test suite failures, as I think you are, but I am currently working on a different project.

Could you try it and see if it works for you?

action-tmate itself has no timeout for how long it will wait for
someone to ssh in — it waits indefinitely. On a forgotten manual
dispatch across a 6-job matrix that can burn up to 36 hours of runner
time before GitHub's default 6-hour job timeout fires on each job.

Cap the wait at the step level with timeout-minutes, and expose it as
a workflow_dispatch input (default 10 minutes).
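
A minimal sketch of the step-level cap described above, assuming the session is started with mxschmitt/action-tmate (the action named elsewhere in this PR) and that the input is declared with type: number. The step name, the pinned @v3 tag, and the `|| 10` fallback are illustrative assumptions, not copied from the PR's workflow file.

```yaml
# Sketch only: step name, action tag, and fallback value are assumptions.
- name: Wait for ssh debugging session
  # Run after the tests whether they passed or failed, but only when
  # the (assumed boolean) tmate input was set on a manual dispatch.
  if: ${{ always() && inputs.tmate }}
  uses: mxschmitt/action-tmate@v3
  # Step-level cap on how long the runner waits for a connection.
  timeout-minutes: ${{ inputs.tmate_timeout_minutes || 10 }}
```

If the input value reaches the expression as a string rather than a number, wrapping it in fromJSON() is a common workaround.
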

The workflow file has to live on a branch of the repo where the run
executes, but actions/checkout can pull source from anywhere. Exposing
repo and ref as workflow_dispatch inputs lets a tmate-enabled workflow
on one branch debug a build of any other branch (or upstream repo)
without having to merge workflow changes into the branch under test.

Defaults preserve existing behavior: without inputs (push/pull_request
events or manual dispatch without overrides), checkout uses the same
repository and ref it would have used before.
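
A sketch of how the checkout override might be wired, assuming the inputs are named repo and ref as listed in this PR. The pinned @v4 tag and the exact fallback expressions are assumptions and may differ from the PR's workflow file.

```yaml
# Sketch only: action tag and fallback expressions are assumptions.
- uses: actions/checkout@v4
  with:
    # On push/pull_request events inputs.repo is empty, so this falls
    # back to the repository that triggered the run.
    repository: ${{ inputs.repo || github.repository }}
    # An empty ref lets actions/checkout pick its usual default for
    # the triggering event.
    ref: ${{ inputs.ref }}
```

With both inputs left empty, the step should behave like a plain actions/checkout with no `with:` block.
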

Without limit-access-to-actor, the 25-character session token embedded
in the printed ssh URL is the only credential — anyone who sees the
workflow log (public for public repos) can connect to the runner and
read its environment, including any secrets the job has access to.

limit-access-to-actor=true fetches the dispatching user's public keys
from github.com/<user>.keys and writes them to the session's
authorized_keys, so a valid private key is required in addition to
the token. Expose it as an input so a user without github-registered
ssh keys can still opt into token-only access by setting it false.
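
A sketch of passing the setting through to the action, assuming the workflow input is named tmate_limit_access_to_actor (as listed in this PR) and is declared as a boolean defaulting to true. The @v3 tag is an assumption.

```yaml
# Sketch only: action tag is an assumption.
- uses: mxschmitt/action-tmate@v3
  if: ${{ always() && inputs.tmate }}
  with:
    # true: a private key matching the dispatching user's GitHub keys
    #       is required in addition to the session token.
    # false: token-only access, for users without registered ssh keys.
    limit-access-to-actor: ${{ inputs.tmate_limit_access_to_actor }}
```
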

Explains the workflow_dispatch inputs added by the preceding commits
(repo, ref, tmate, tmate_timeout_minutes, tmate_limit_access_to_actor),
how to dispatch a manual run from the web UI or gh CLI, how to connect
to and exit a tmate session, the security implications of running with
or without limit-access-to-actor, and the tmate upstream deprecation.

Linked from doc/How-To-Contribute.md so a contributor hitting a
CI-only failure can find it. A comment at the top of runtests.yml
points at the doc for anyone reading the workflow directly.

NAT/firewall rationale: GitHub-hosted runners have no inbound
connectivity, so direct ssh is impossible and a relay is necessary.
Also note the self-hosted tmate-ssh-server option for projects that
don't want to trust the public tmate.io relay.
@BrentBaccala
Contributor Author

This PR has been rebased onto current spielwiese and expanded with several new commits beyond the original three.

New workflow inputs (available when dispatching manually from the Actions tab or gh workflow run):

  • repo / ref — check out and test a different repository or branch than the one the workflow lives on
  • tmate — open an ssh-accessible shell on the runner after tests finish (pass or fail), using mxschmitt/action-tmate
  • tmate_timeout_minutes — cap how long the session waits for a connection (default 10 min)
  • tmate_limit_access_to_actor — require the connecting user's ssh key to match the dispatching user's GitHub profile (default true)
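
For orientation, the inputs listed above could be declared roughly as follows. This is a sketch under assumptions: the input names and the 10-minute and true defaults come from this PR, while the descriptions, the remaining defaults, and the types are guesses rather than the contents of the actual workflow file.

```yaml
# Sketch only: descriptions, types, and unlisted defaults are assumptions.
on:
  push:
  pull_request:
  workflow_dispatch:
    inputs:
      repo:
        description: "Repository to check out (owner/name); empty = this repository"
        type: string
        default: ""
      ref:
        description: "Branch, tag, or SHA to check out; empty = default for the event"
        type: string
        default: ""
      tmate:
        description: "Open a tmate ssh session after the tests finish"
        type: boolean
        default: false
      tmate_timeout_minutes:
        description: "Minutes to wait for an ssh connection"
        type: number
        default: 10
      tmate_limit_access_to_actor:
        description: "Require an ssh key registered to the dispatching user"
        type: boolean
        default: true
```

With a declaration like this, a manual run could be started from the gh CLI with something like `gh workflow run runtests.yml -f tmate=true`, where runtests.yml is the workflow file named in the commit messages above.
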

Push and pull_request triggers are unchanged — the tmate step is skipped and the checkout behaves as before.

Documentation added in doc/Debugging-CI.md covering usage, security considerations, and the tmate upstream deprecation timeline (Homebrew plans to drop the formula on 2026-12-11). A pointer was added from doc/How-To-Contribute.md.

Tested on the fork:

  • Push events correctly skip the tmate step
  • Manual dispatch with tmate=true reaches the tmate step on all 6 matrix jobs
  • limit-access-to-actor=true: matching ssh key accepted, non-matching key rejected with "Permission denied (publickey)"

This comment was researched and written by an AI assistant (Claude) on behalf of Brent Baccala.

@jankoboehm
Member

Thanks, we will have a look again, but please don't be disappointed if we currently go for something more standard.

@BrentBaccala
Contributor Author

I didn't realize it had been reviewed at all.
