build(deps): Bump NVIDIA/holodeck from 0.2.18 to 0.3.1 #2260

Merged
tariq1890 merged 2 commits into main from dependabot/github_actions/NVIDIA/holodeck-0.3.1 on Apr 21, 2026

@dependabot dependabot Bot commented on behalf of github Apr 1, 2026

Bumps NVIDIA/holodeck from 0.2.18 to 0.3.1.

Release notes

Sourced from NVIDIA/holodeck's releases.

v0.3.1

What's Changed

SSH Reliability

  • SSH keepalive probes (30s interval) prevent session drops during long operations like kubeadm init
  • 15s handshake timeout prevents connectOrDie from blocking indefinitely on unresponsive hosts

AWS Resource Cleanup — Provider

  • Handle InvalidInternetGatewayID.NotFound in IGW detach (skip retries)
  • Handle NotFound errors in NLB/listener/target-group deletion
  • HA NLB hairpin routing fix (localhost:6443 for kubectl)
  • Switch HA NLB to internal scheme

AWS Resource Cleanup — Periodic Cleanup Action

  • NLB cleanup before subnet/IGW/VPC deletion (prevents DependencyViolation)
  • Revoke cross-referencing SG rules before deletion
  • Treat InvalidVpcID.NotFound as success
  • Suppress NotFound warnings for all resource types (IGW, SG, subnet, route table)

CI

  • Periodic cleanup workflow updated with manual trigger support

Full Changelog: NVIDIA/holodeck@v0.3.0...v0.3.1

Closes #771

v0.3.0

Holodeck v0.3.0

A major release with production-grade cluster networking, custom templates, RPM distribution support, ARM64 improvements, and comprehensive CI/CD enhancements.

Highlights

  • Production-Grade Cluster Networking — Private subnets, NAT gateways, separate security groups, SSM transport, and NLB support for HA clusters (#720–#728)
  • Custom Templates — User-defined provisioning templates with full lifecycle phase support (#701–#706)
  • RPM Support — First-class support for Rocky Linux 9, Amazon Linux 2023, and Fedora 42 across all runtime stacks (#676–#681)
  • ARM64 Support — Automatic architecture inference from instance type, cross-validation, and runtime arch detection (#661–#669)
  • Multi-Source Installation — Install NVIDIA drivers and container runtimes from package, runfile, or git sources (#635–#637)
  • E2E Test Tiering — Smoke tests (pre-merge) and full suite (post-merge) for faster CI feedback (#740)
  • Coverage Threshold Gate — 40% minimum unit test coverage enforced in CI (#754)

Bug Fixes

  • Fixed node name newline injection in cluster mode (#757)
  • Fixed NAT Gateway race condition (#735)
  • Fixed API server verification before NLB switch (#721)
  • Numerous SSH, security, and AWS provider fixes

See the full CHANGELOG for all changes.

Changelog

Sourced from NVIDIA/holodeck's changelog.

[v0.3.1] - 2026-04-02

Bug Fixes

SSH Reliability

  • fix: add SSH keepalive and handshake timeout (#772) — SSH connections now send keepalive probes every 30 seconds to prevent session drops during long operations (e.g., kubeadm init). A 15-second handshake timeout prevents connectOrDie from blocking indefinitely against hosts that accept TCP but never complete the SSH handshake.
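
The two mechanisms described in #772 can be illustrated with a small standard-library Python sketch. This is not holodeck's implementation; `read_banner_with_timeout` and `start_keepalive` are hypothetical names, and the probe itself is left to the caller. The first function bounds the initial read with a deadline, so a peer that accepts TCP but never speaks fails fast. The second sends a probe at a fixed interval so an idle session is not dropped.

```python
import socket
import threading


def read_banner_with_timeout(sock: socket.socket, timeout_s: float = 15.0) -> bytes:
    """Read the peer's first bytes, failing if nothing arrives within timeout_s."""
    sock.settimeout(timeout_s)      # deadline applies only to the handshake
    try:
        banner = sock.recv(255)
    finally:
        sock.settimeout(None)       # back to blocking mode for the session
    if not banner:
        raise ConnectionError("peer closed the connection during handshake")
    return banner


def start_keepalive(send_probe, interval_s: float = 30.0) -> threading.Event:
    """Call send_probe() every interval_s seconds until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            try:
                send_probe()
            except OSError:
                break               # connection is gone; stop probing

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

A caller would run `read_banner_with_timeout` right after the TCP connect, then call `start_keepalive` with whatever no-op request the protocol offers, and set the returned event when the session closes.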

AWS Resource Cleanup — Provider

  • fix: HA NLB hairpin routing (#746, #762) — Control-plane nodes now use localhost:6443 for kubectl instead of the NLB endpoint, avoiding AWS NLB hairpin/loopback timeouts.
  • fix: switch HA NLB to internal scheme (#760) — NLB uses internal scheme to keep traffic within the VPC.
  • fix: handle InvalidInternetGatewayID.NotFound in IGW detach (#772) — The detach step now recognizes InvalidInternetGatewayID.NotFound alongside Gateway.NotAttached and skips retries.
  • fix: handle NotFound errors in NLB/listener/target-group deletion (#772) — All NLB cleanup paths now check for LoadBalancerNotFound, ListenerNotFound, and TargetGroupNotFound, treating already-deleted resources as success.
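
The pattern shared by these provider fixes is idempotent cleanup: an error saying the resource is already gone counts as success rather than a reason to retry. A minimal Python sketch follows, assuming a hypothetical `CloudError` exception that carries the AWS error code. The error-code strings are the ones named in the entries above; `delete_if_exists` and `delete_fn` are illustrative names, not holodeck functions.

```python
class CloudError(Exception):
    """Hypothetical stand-in for an AWS SDK error that carries an error code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Error codes named in the changelog entries above.
ALREADY_GONE = frozenset({
    "InvalidInternetGatewayID.NotFound",
    "Gateway.NotAttached",
    "LoadBalancerNotFound",
    "ListenerNotFound",
    "TargetGroupNotFound",
})


def delete_if_exists(delete_fn, resource_id: str) -> bool:
    """Run delete_fn(resource_id); return False if the resource was already gone.

    Any other error propagates so the caller can retry or fail.
    """
    try:
        delete_fn(resource_id)
    except CloudError as err:
        if err.code in ALREADY_GONE:
            return False    # nothing left to delete: success, no retry
        raise
    return True
```

Returning a boolean instead of logging lets the caller decide whether "already gone" is worth reporting at all.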

AWS Resource Cleanup — Periodic Cleanup Action

  • fix: NLB cleanup in periodic VPC cleaner (#762) — DeleteVPCResources now deletes NLB listeners, target groups, and load balancers before attempting subnet/IGW/VPC deletion, preventing DependencyViolation errors from NLB-owned ENIs.
  • fix: revoke cross-referencing SG rules before deletion (#766) — Security groups that reference each other are now cleaned up by revoking all ingress/egress rules before attempting deletion.
  • fix: treat InvalidVpcID.NotFound as success in VPC cleanup (#769) — VPCs that no longer exist are treated as successfully cleaned up.
  • fix: suppress NotFound warnings in all cleanup delete functions (#772) — The periodic cleanup job no longer logs misleading warnings when IGWs, security groups, subnets, or route tables are already gone.
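
The ordering constraints in these fixes can be made explicit. The sketch below is illustrative Python, not the `DeleteVPCResources` implementation: `CLEANUP_ORDER`, `plan_cleanup`, and the resource-kind strings are hypothetical names. It sorts discovered resources so that NLB parts go first, security-group rules are revoked before the groups are deleted, and the VPC goes last. Where the changelog does not order two kinds (route tables relative to subnets, for instance), the position chosen here is an assumption.

```python
# Deletion order implied by the changelog: NLB listeners, target groups and
# load balancers first; SG rules revoked before SG deletion; VPC last.
# Positions of kinds the changelog does not order are assumptions.
CLEANUP_ORDER = [
    "nlb-listener",
    "nlb-target-group",
    "nlb-load-balancer",
    "sg-rules",          # revoke ingress/egress so groups stop referencing each other
    "security-group",
    "subnet",
    "route-table",
    "internet-gateway",
    "vpc",
]
_RANK = {kind: i for i, kind in enumerate(CLEANUP_ORDER)}


def plan_cleanup(resources):
    """Return (kind, resource_id) pairs sorted into a safe deletion order.

    sorted() is stable, so resources of the same kind keep their input order.
    Unknown kinds raise KeyError rather than being deleted at a guessed point.
    """
    return sorted(resources, key=lambda item: _RANK[item[0]])
```

Keeping the order in one list makes a dependency fix, such as moving NLB cleanup ahead of subnets, a one-line change.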

CI

  • ci: update periodic cleanup and add manual trigger (#758, #765) — Periodic cleanup workflow uses the latest holodeck binary and supports manual dispatch.

[v0.3.0] - 2026-03-30

Features

Production-Grade Cluster Networking (#720–#728)

A complete overhaul of multi-node cluster networking for production workloads:

Custom Templates (#701–#706)

User-defined provisioning templates with full lifecycle phase support:

RPM Support (#676–#681, #693)

... (truncated)

Commits
  • d81971e release: consolidate v0.3.1-v0.3.4 into v0.3.1 (#773)
  • d2c0a36 fix: harden error handling and add SSH keepalive for v0.3.4 (#772)
  • d5a497e release: bump version to v0.3.3 (#770)
  • babab84 fix: treat InvalidVpcID.NotFound as success in VPC cleanup (#769)
  • 2f11ae4 release: bump version to v0.3.2 (#767)
  • c62e351 fix: revoke cross-referencing SG rules before deletion in cleanup (#766)
  • f58fa3b ci: update periodic cleanup to v0.3.1 (#765)
  • 4600c5a release: bump version to v0.3.1 (#763)
  • 41454b9 fix: HA NLB hairpin routing and cleanup (#746) (#762)
  • cd30218 fix: switch HA NLB to internal scheme to fix hairpin routing (#746) (#760)
  • Additional commits viewable in compare view

@dependabot dependabot Bot added the dependencies and github_actions labels Apr 1, 2026
@dependabot dependabot Bot requested a review from tariq1890 as a code owner April 1, 2026 07:54

copy-pr-bot Bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rahulait commented Apr 8, 2026

/ok to test f1b1736

coveralls commented Apr 8, 2026

Coverage: 28.201%, unchanged, for dependabot/github_actions/NVIDIA/holodeck-0.3.1 into main

tariq1890 commented

Hi @ArangoGutierrez, I believe this PR is the root cause of the failures we see here. What would you suggest we do?

tariq1890 commented

@dependabot rebase

Bumps [NVIDIA/holodeck](https://github.com/nvidia/holodeck) from 0.2.18 to 0.3.1.
- [Release notes](https://github.com/nvidia/holodeck/releases)
- [Changelog](https://github.com/NVIDIA/holodeck/blob/main/CHANGELOG.md)
- [Commits](NVIDIA/holodeck@v0.2.18...v0.3.1)

---
updated-dependencies:
- dependency-name: NVIDIA/holodeck
  dependency-version: 0.3.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot Bot force-pushed the dependabot/github_actions/NVIDIA/holodeck-0.3.1 branch from f1b1736 to ea92535 April 21, 2026 19:26
tariq1890 commented

/ok to test eb8eb70

@ArangoGutierrez ArangoGutierrez left a comment

MEGA - LGTM
Thanks a lot @tariq1890

@tariq1890 tariq1890 merged commit ad9b4a3 into main Apr 21, 2026
20 checks passed
@tariq1890 tariq1890 deleted the dependabot/github_actions/NVIDIA/holodeck-0.3.1 branch April 21, 2026 22:16