Unable to run cuda:latest image - nvidia: version magic #16

@Aspart

Shortly after starting the container, stdout prints:

ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built
       against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one
       used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents
       the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed
       in this system is supported by this NVIDIA Linux graphics driver release.

       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file
       '/var/log/nvidia-installer.log' for more information.

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find
       suggestions on fixing installation problems in the README available on the Linux driver download page at
       www.nvidia.com.

In /var/log/nvidia-installer.log:

-> done.
-> Kernel module compilation complete.
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if a driver such as rivafb, nvidiafb, or nouveau is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA graphics device(s), or no NVIDIA GPU installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: Exec format error
-> Kernel messages:
[245414.289012] vgaarb: device changed decodes: PCI:0000:06:00.0,olddecodes=none,decodes=none:owns=none
[245414.289295] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015
[245414.290603] nvidia_uvm: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
[245530.126055] vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
[245530.126355] vgaarb: device changed decodes: PCI:0000:06:00.0,olddecodes=none,decodes=none:owns=none
[245530.126647] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015
[245530.127915] nvidia_uvm: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
[245683.707725] docker0: port 1(veth22fab08) entered disabled state
[245683.714388] vethac0f204: renamed from eth0
[245683.736914] docker0: port 1(veth22fab08) entered forwarding state
[245683.743732] docker0: port 1(veth22fab08) entered forwarding state
[245683.838647] docker0: port 1(veth22fab08) entered disabled state
[245683.848227] device veth22fab08 left promiscuous mode
[245683.854539] docker0: port 1(veth22fab08) entered disabled state
[245689.662547] device veth1ced9a7 entered promiscuous mode
[245689.668489] IPv6: ADDRCONF(NETDEV_UP): veth1ced9a7: link is not ready
[245689.676076] IPv6: ADDRCONF(NETDEV_CHANGE): veth1ced9a7: link becomes ready
[245689.683594] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.690294] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.850591] docker0: port 1(veth1ced9a7) entered disabled state
[245689.857775] eth0: renamed from veth91b59ba
[245689.871251] docker0: port 1(veth1ced9a7) entered forwarding state
[245689.878092] docker0: port 1(veth1ced9a7) entered forwarding state
[245704.917474] docker0: port 1(veth1ced9a7) entered forwarding state
[245729.922871] nvidia: version magic '4.5.0-coreos-r1.0 SMP mod_unload ' should be '4.5.0-coreos-r1 SMP mod_unload '
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
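The relevant lines in the kernel messages above are the `version magic` ones: the module was built for release `4.5.0-coreos-r1.0`, but the running kernel expects `4.5.0-coreos-r1`. As a rough sketch (the function name `find_vermagic_mismatches` is made up for illustration, and the regular expression only assumes the message format seen in this log), the mismatches can be pulled out of a log like this:

```python
import re

# Matches kernel log lines of the form seen above:
#   <module>: version magic '<built for>' should be '<running kernel>'
VERMAGIC_RE = re.compile(
    r"(?P<module>\w+): version magic '(?P<built>[^']*)' "
    r"should be '(?P<expected>[^']*)'"
)


def _release(vermagic):
    """Return the first field of a vermagic string (the kernel release)."""
    parts = vermagic.split()
    return parts[0] if parts else ""


def find_vermagic_mismatches(log_text):
    """Return sorted, de-duplicated (module, built_release, expected_release) tuples."""
    found = set()
    for m in VERMAGIC_RE.finditer(log_text):
        found.add((m.group("module"),
                   _release(m.group("built")),
                   _release(m.group("expected"))))
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "[245414.290603] nvidia_uvm: version magic "
        "'4.5.0-coreos-r1.0 SMP mod_unload ' should be "
        "'4.5.0-coreos-r1 SMP mod_unload '\n"
    )
    for module, built, expected in find_vermagic_mismatches(sample):
        print(f"{module}: built for {built}, kernel expects {expected}")
```

Feeding it the contents of `/var/log/nvidia-installer.log` should list each affected module once, even though the message repeats in the log.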

I am running kernel 4.5.0-coreos-r1, while /usr/src/kernels/linux contains a 4.5.5 kernel.

I forced the container to run by editing /usr/src/kernels/linux/include/generated/utsrelease.h, changing:

#define UTS_RELEASE "4.5.0-coreos-r1.0"

to

#define UTS_RELEASE "4.5.0-coreos-r1"
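For anyone who wants to check for the same mismatch before building, here is a minimal sketch that compares `UTS_RELEASE` in the header with the running kernel release. The helper names `read_uts_release` and `set_uts_release` are hypothetical, the header path is the one from this report, and the script only reports the difference; it does not write the file. Patching the header is a workaround, not a proper fix.

```python
import platform
import re

# Matches: #define UTS_RELEASE "<release>"
UTS_RE = re.compile(r'^[ \t]*#[ \t]*define[ \t]+UTS_RELEASE[ \t]+"([^"]*)"',
                    re.MULTILINE)


def read_uts_release(header_text):
    """Return the UTS_RELEASE string from utsrelease.h contents, or None."""
    m = UTS_RE.search(header_text)
    return m.group(1) if m else None


def set_uts_release(header_text, release):
    """Return header_text with the first UTS_RELEASE define set to `release`."""
    # A function replacement avoids backslash/group interpretation in `release`.
    return UTS_RE.sub(lambda m: '#define UTS_RELEASE "%s"' % release,
                      header_text, count=1)


if __name__ == "__main__":
    # Path taken from this report; adjust for your container.
    path = "/usr/src/kernels/linux/include/generated/utsrelease.h"
    running = platform.release()  # same value as `uname -r`
    try:
        with open(path) as f:
            header = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        built = read_uts_release(header)
        if built == running:
            print(f"OK: sources and running kernel are both {running}")
        else:
            print(f"mismatch: sources say {built!r}, running kernel is {running!r}")
```

`set_uts_release(header, running)` returns the patched text if you decide to apply the same edit by hand.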

Line 21 of the current master Dockerfile is probably redundant.
