Omer Tuchfeld activity https://gitlab.com/omertuchfeld 2026-01-24T15:46:46Z tag:gitlab.com,2026-01-24:5029440217 Omer Tuchfeld commented on issue #15 at Leonhard LLC / ops 2026-01-24T15:46:46Z omertuchfeld Omer Tuchfeld

Perhaps it would be better to have:

impl AsRef<Path> for &TempDir {
    fn as_ref(&self) -> &Path {
        self.path()
    }
}

Rather than:

impl AsRef<Path> for TempDir {
    fn as_ref(&self) -> &Path {
        self.path()
    }
}

This way users will not be able to move, e.g.:

Diagnostics:
1. the trait bound `TempDir: std::convert::AsRef<std::path::Path>` is not satisfied
   the trait `std::convert::AsRef<std::path::Path>` is not implemented for `TempDir` [E0277]
2. required by a bound introduced by this call [E0277]
3. consider borrowing here: `&` [E0277]
tag:gitlab.com,2026-01-24:5029360022 Omer Tuchfeld commented on issue #15 at Leonhard LLC / ops 2026-01-24T14:36:12Z omertuchfeld Omer Tuchfeld

This leads to a rather unfortunate subtle footgun:

use std::process::Command;

fn main() {
    // Works
    {
        let temp_dir = temp_dir::TempDir::with_prefix("foo").unwrap();
        Command::new("ls").current_dir(&temp_dir).status().expect("example 1");
    }

    // Doesn't work
    {
        let temp_dir = temp_dir::TempDir::with_prefix("foo").unwrap();
        Command::new("ls").current_dir(temp_dir).status().expect("example 2");
    }

    println!("Hello, world!");
}
thread 'main' (1162815) panicked at src/main.rs:13:59:
example 2: Os { code: 2, kind: NotFound, message: "No such file or directory" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

temp_dir gets dropped by current_dir as it's moved

tag:gitlab.com,2021-11-24:1582078584 Omer Tuchfeld commented on merge request !343 at nvidia / kubernetes / gpu-operator 2021-11-24T10:29:59Z omertuchfeld Omer Tuchfeld

Can omit the INFO: as the log formatter is responsible for printing severity

Could you add to this log "... by labeling the $namespace with $label=$value" so the user has a better idea of how this "enabling" was done exactly

Maybe also tell them they can edit the label value to disable monitoring (and the operator won't fight them for that)

tag:gitlab.com,2021-11-24:1582057301 Omer Tuchfeld commented on merge request !343 at nvidia / kubernetes / gpu-operator 2021-11-24T10:22:19Z omertuchfeld Omer Tuchfeld

Maybe also log the expected label value here ocpNamespaceMonitoringLabelValue

tag:gitlab.com,2021-11-24:1582048386 Omer Tuchfeld commented on merge request !343 at nvidia / kubernetes / gpu-operator 2021-11-24T10:19:12Z omertuchfeld Omer Tuchfeld

Could you give some background in a comment about what is the "suggested namespace", and why do we not ocpEnsureNamespaceMonitoring when the operator is not installed on the suggested namespace?

tag:gitlab.com,2021-11-24:1582044074 Omer Tuchfeld commented on merge request !343 at nvidia / kubernetes / gpu-operator 2021-11-24T10:17:38Z omertuchfeld Omer Tuchfeld

nit: enabled*

tag:gitlab.com,2021-09-30:1488750833 Omer Tuchfeld pushed new project branch omerupgrade at Omer Tuchfeld / gpu-operator 2021-09-30T13:03:49Z omertuchfeld Omer Tuchfeld

Omer Tuchfeld (ea5a9229) at 30 Sep 13:03

Update path for validation init container image

... and 126 more commits

tag:gitlab.com,2021-09-16:1467128763 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T23:13:18Z omertuchfeld Omer Tuchfeld

Could we possibly do something more robust, to allow this to be more compatible with future versions?

I think we should brainstorm on how to make sure this doesn't break / has to be tailored for every driver container version in existence. Even make changes to the driver image to make this easier.

One direction I thought about that I feel is really good is:

  • Have this script be already part the driver container - this way the driver container decides what needs to be copied.

And

  • Have the next script (the one that runs on the DTK mainContainer) be ALSO part of the driver container.

This way the driver container is in full control of what needs to be ran in the original container and in the DTK container, which means transformOpenShiftDriverToolkitContainer can be much more stupid and thus much more forwards compatible.

All transformOpenShiftDriverToolkitContainer would have to do is look for those scripts in an agreed-upon location (something like /dtk-helpers/init.sh and dtk-helpers/dtk.sh), run one script in the original driver image initContainer, and copy the other one to the /shared/nvidia dir so it can be ran in the DTK image mainContainer.

What do you think? Of-course it doesn't have to happen in this PR

tag:gitlab.com,2021-09-16:1467071240 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T22:01:48Z omertuchfeld Omer Tuchfeld

I think a more detailed overview of the recursion flow here would make this easier to understand for future readers

tag:gitlab.com,2021-09-16:1467069534 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T21:59:50Z omertuchfeld Omer Tuchfeld

Okay I wrapped my head around the recursion... func DaemonSet becomes func ocpDriverDaemonSet, which in turn calls multiple func DaemonSets, aggregates their state, then returns their overallState which gets returned from the original func Daemonset call, and this is all controlled via the currentRhcosVersion "global" variable. That's pretty complex. But I guess other designs might be more disruptive to the existing code...

tag:gitlab.com,2021-09-16:1467059299 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T21:50:18Z omertuchfeld Omer Tuchfeld

Could you elaborate on the difference between ocpCleanupUnusedDriverDaemonSets and ocpCleanupUnusedDriverToolkitDaemonSets?

Isn't the "DriverToolkitDaemonset" supposed to completely replace the regular driver daemonset? Isn't there only a single "Regular" driver daemonset for all versions, and if so, how can it be unused / why is it plural, why does it need cleaning up?

tag:gitlab.com,2021-09-16:1467053341 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T21:43:25Z omertuchfeld Omer Tuchfeld

FYI https://github.com/openshift/driver-toolkit/pull/60 has been reverted for now, due to it causing widespread CI issues

tag:gitlab.com,2021-09-16:1466951329 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T20:14:56Z omertuchfeld Omer Tuchfeld

Not a list of deployments

tag:gitlab.com,2021-09-16:1466951080 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T20:14:46Z omertuchfeld Omer Tuchfeld

Not a list of deployments

tag:gitlab.com,2021-09-16:1466946349 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T20:11:41Z omertuchfeld Omer Tuchfeld

Is this indirect recursion on purpose?

tag:gitlab.com,2021-09-16:1466944006 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T20:09:56Z omertuchfeld Omer Tuchfeld

ocpDriverDaemonSet calls DaemonSet which calls ocpDriverDaemonSet... I'm confused

tag:gitlab.com,2021-09-16:1466812641 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T18:19:13Z omertuchfeld Omer Tuchfeld

same for the consts below

tag:gitlab.com,2021-09-16:1466812455 Omer Tuchfeld commented on merge request !298 at nvidia / kubernetes / gpu-operator 2021-09-16T18:19:04Z omertuchfeld Omer Tuchfeld

s/ocpDtkVersionLabel/ocpDriverToolkitVersionLabel

It's not well-known enough outside of RedHat, I don't feel the acronym is fair