Guys,
does anyone know what the likely cause is when you get an error at the following code line?
https://github.com/gridengine/gridengine/blob/master/source/daemons/shepherd/shepherd.c#L325
We had a working GridEngine setup, and it broke after we upgraded from Debian 8 to 9.
We get errors like:
Cannot reset euid dpovey due to Operation not permitted
when a user's qlogin gets scheduled on a node.
Job 8883854 caused action: none
User = dpovey
Queue = [email protected]
Start Time = <unknown>
End Time = <unknown>
failed in prolog: 04/30/2018 15:52:43 [119:9698]: exit_status of prolog = 143
Shepherd trace:
04/30/2018 15:52:43 [0:9698]: shepherd called with uid = 0, euid = 0
04/30/2018 15:52:43 [0:9698]: qlogin_daemon = builtin
04/30/2018 15:52:43 [119:9698]: starting up 8.1.9
04/30/2018 15:52:43 [119:9698]: setpgid(9698, 9698) returned 0
04/30/2018 15:52:43 [119:9698]: do_core_binding: "binding" parameter not found in config file
04/30/2018 15:52:43 [119:9698]: calling fork_pty()
04/30/2018 15:52:43 [119:9698]: parent: forked "prolog" with pid 9700
04/30/2018 15:52:43 [119:9698]: using signal delivery delay of 120 seconds
04/30/2018 15:52:43 [119:9698]: parent: prolog-pid: 9700
04/30/2018 15:52:43 [60410:9698]: Cannot reset euid dpovey due to Operation not permitted
04/30/2018 15:52:43 [60410:9698]: now sending signal TERM to pid -9700
04/30/2018 15:52:43 [60410:9698]: Cannot reset euid dpovey due to Operation not permitted
04/30/2018 15:52:43 [60410:9698]: now sending signal TERM to pid -9700
04/30/2018 15:52:43 [119:9698]: Poll received POLLHUP (Hang up). Unregister the FD.
04/30/2018 15:52:43 [119:9698]: wait3 returned 9700 (status: 15; WIFSIGNALED: 1, WIFEXITED: 0, WEXITSTATUS: 0)
04/30/2018 15:52:43 [119:9698]: prolog exited with exit status 0
04/30/2018 15:52:43 [119:9698]: reaped "prolog" with pid 9700
04/30/2018 15:52:43 [119:9698]: prolog exited due to signal
04/30/2018 15:52:43 [119:9698]: prolog signaled: 15
04/30/2018 15:52:43 [119:9698]: exit_status of prolog = 143
04/30/2018 15:52:43 [119:9698]: no epilog script to start
04/30/2018 15:52:43 [119:9698]: writing exit status to qrsh: 0
04/30/2018 15:52:43 [119:9698]: sending UNREGISTER_CTRL_MSG with exit_status = "0"
04/30/2018 15:52:43 [119:9698]: sending to host: <null>
04/30/2018 15:52:43 [119:9698]: comm_write_message returned: can't find handle
04/30/2018 15:52:43 [119:9698]: close_parent_loop: comm_write_message() returned 0 instead of 1!!!
04/30/2018 15:52:43 [119:9698]: waiting for UNREGISTER_RESPONSE_CTRL_MSG
04/30/2018 15:52:43 [119:9698]: No connection or problem while waiting for message: 1
04/30/2018 15:52:43 [119:9698]: parent: cl_com_ignore_timeouts
04/30/2018 15:52:43 [119:9698]: parent: error in comm_cleanup_lib(): 3
04/30/2018 15:52:43 [119:9698]: parent: leaving close_parent_loop()
Shepherd error:
04/30/2018 15:52:43 [119:9698]: exit_status of prolog = 143
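A note on reading the trace: the exit_status of 143 looks like 128 + 15, which would mean the prolog did not fail on its own but was killed by the TERM signal that the shepherd sent right after the "Cannot reset euid" message. Below is a minimal Python sketch of that decoding, assuming the shepherd follows the usual shell convention of 128 + signal number and that this runs on Linux, where signal 15 is SIGTERM. The function name is made up for illustration and is not part of GridEngine.

```python
import os
import signal


def shell_style_exit_status(wait_status: int) -> int:
    """Map a raw wait status to an exit code, using 128 + signal for signals.

    This mirrors the common shell convention; whether the shepherd uses
    exactly this rule is an assumption based on the trace (15 -> 143).
    """
    if os.WIFSIGNALED(wait_status):
        # Child was terminated by a signal: report 128 + signal number.
        return 128 + os.WTERMSIG(wait_status)
    if os.WIFEXITED(wait_status):
        # Child exited normally: report its own exit code.
        return os.WEXITSTATUS(wait_status)
    raise ValueError(f"unhandled wait status: {wait_status}")


# "status: 15" taken from the wait3 line in the shepherd trace.
raw_status = 15
exit_status = shell_style_exit_status(raw_status)
term_signal = signal.Signals(os.WTERMSIG(raw_status)).name

print(exit_status, term_signal)
```

If that reading is right, the 143 is only a symptom, and the thing to chase is the failed euid switch at 15:52:43, not the prolog script itself.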