I also think it's a rather hacky approach. Especially in the Kubernetes context, embedding YAML in TOML that is itself embedded in YAML feels odd, to say the least.
I totally get why it feels a bit hacky!
Oops, I just saw your comment. I suggested a config below (deleted comment). I think what was missing from your config is the patch_type. I tested with the snippet below and the podSpec was updated; I just didn't have the volume available (job log below):
Running with gitlab-runner development version (HEAD)
on investigation __REDACTED__, system ID: s_b188029b2abb
feature flags: FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY:true, FF_USE_ADVANCED_POD_SPEC_CONFIGURATION:true, FF_PRINT_POD_EVENTS:true, FF_SCRIPT_TO_STEP_MIGRATION:true
Preparing the "kubernetes" executor
00:02
WARNING: Namespace is empty, therefore assuming 'default'.
Using Kubernetes namespace: default
Using Kubernetes executor with image alpine ...
Using effective pull policy of [Always] for container build
Using effective pull policy of [Always] for container helper
Using effective pull policy of [Always] for container init-permissions
Preparing environment
02:28
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
WARNING: Advanced Pod Spec configuration enabled, merging the provided PodSpec to the generated one. This is a beta feature and is subject to change. Feedback is collected in this issue: https://gitlab.com/gitlab-org/gitlab-runner/-/issues/29659 ...
Subscribing to Kubernetes Pod events...
Type Reason Message
Normal Scheduled Successfully assigned default/runner-__REDACTED__-project-25452826-concurrent-0-ah2vrwxx to gke-ra-cluster-linux-pool-7ae7231b-ttub
Warning FailedMount MountVolume.SetUp failed for volume "nfs-data" : mount failed: exit status 1
Mounting command: /home/kubernetes/containerized_mounter/mounter
Mounting arguments: mount -t nfs nfs-server.example.com:/exported/path /var/lib/kubelet/pods/c098f15d-808d-43ac-b296-5bf6d2373bdd/volumes/kubernetes.io~nfs/nfs-data
Output: Mount failed: mount failed: exit status 32
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs nfs-server.example.com:/exported/path /var/lib/kubelet/pods/c098f15d-808d-43ac-b296-5bf6d2373bdd/volumes/kubernetes.io~nfs/nfs-data]
Output: MOUNT_WRAPPER: starting rpcbind
* Starting RPC port mapper daemon rpcbind
...done.
mount.nfs: Failed to resolve server nfs-server.example.com: Name or service not known
Warning FailedMount MountVolume.SetUp failed for volume "nfs-data" : mount failed: exit status 1
Mounting command: /home/kubernetes/containerized_mounter/mounter
Mounting arguments: mount -t nfs nfs-server.example.com:/exported/path /var/lib/kubelet/pods/c098f15d-808d-43ac-b296-5bf6d2373bdd/volumes/kubernetes.io~nfs/nfs-data
Output: Mount failed: mount failed: exit status 32
Mounting command: chroot
Mounting arguments: [/home/kubernetes/containerized_mounter/rootfs mount -t nfs nfs-server.example.com:/exported/path /var/lib/kubelet/pods/c098f15d-808d-43ac-b296-5bf6d2373bdd/volumes/kubernetes.io~nfs/nfs-data]
Output: MOUNT_WRAPPER: starting rpcbind
* Already running: rpcbind
mount.nfs: Failed to resolve server nfs-server.example.com: Name or service not known
ERROR: Job failed: canceled
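For context on the failure mode above: the FailedMount loop bottoms out in name resolution, since mount.nfs cannot resolve nfs-server.example.com (the placeholder host in this log). A minimal Python sketch of that check, assuming only that the NFS server hostname should be resolvable from the node:

```python
import socket

def can_resolve(host: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # mount.nfs's "Name or service not known" corresponds to this error
        return False

# "nfs-server.example.com" is the redacted placeholder from the log above;
# an unresolvable name is exactly what produces the repeated FailedMount events.
print(can_resolve("nfs-server.example.com"))
```

If this returns False from the node (or from a debug pod in the same namespace), the problem is cluster DNS or the hostname itself, not the runner's pod_spec patch.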
Would you mind giving it another try using this config: (I deleted the comment below to keep the discussion in this thread)
concurrent = 1
check_interval = 30

[[runners]]
  name = "kubernetes-runner"
  url = "https://gitlab.example.com"
  token = "__REDACTED__"
  executor = "kubernetes"
  # Enable the feature flag for pod_spec
  environment = ["FF_USE_ADVANCED_POD_SPEC_CONFIGURATION=true"]
  [runners.kubernetes]
    namespace = "gitlab-runner"
    image = "alpine:latest"
    [[runners.kubernetes.pod_spec]]
      name = "nfs-volume"
      patch_type = "strategic"
      patch = '''
containers:
- name: build
  volumeMounts:
  - name: nfs-data
    mountPath: /mnt/nfs
- name: helper
  volumeMounts:
  - name: nfs-data
    mountPath: /mnt/nfs
volumes:
- name: nfs-data
  nfs:
    server: nfs-server.example.com
    path: /exported/path
    readOnly: false
'''
@Emmanuel326
The monitoring was actually intentional. GitLab Runner relies on the helper image being present because all logs are proxied through it to the Runner. This is especially critical with the Attach strategy, where trap commands signal to the Runner when a stage passes or fails.
If we remove this monitoring, jobs could hang indefinitely since the helper container would exit before the build container finishes processing.
I think it's worth digging into why the helper container is exiting with code 137. That could give us important clues about what's happening and help us fix the issue properly.
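As a side note on exit code 137: by shell convention, an exit status of 128 + N means the process was terminated by signal N, so 137 corresponds to SIGKILL (9), which on Kubernetes commonly points at an OOM kill or a forced eviction rather than the process exiting on its own. A quick check of that arithmetic:

```python
import signal

# 137 = 128 + 9: the helper was killed by SIGKILL rather than exiting
# normally; on Kubernetes this usually means an OOM kill or eviction.
helper_exit_code = 137
assert helper_exit_code - 128 == signal.SIGKILL
print(signal.Signals(helper_exit_code - 128).name)  # → SIGKILL
```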
But at the same time I am under the impression that the fix has not been deployed. This job: https://gitlab.com/gitlab-org/gitlab-runner/-/jobs/13590211080 was supposed to pass with the fix implemented. Anyway, I will keep digging just in case.
@stanhu Quick update
The error is still there
$ if [[ https://gitlab.com/gitlab-org/gitlab-runner != https://gitlab.com/gitlab-org/gitlab-runner || -z [MASKED] ]]; then
/tmp/step-runner-script-2559448087.sh: line 20: DANGER_GITLAB_API_TOKEN: command not found
awk: 1: unexpected character '''
awk: line 2: missing } near end of file
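The awk errors above are the signature of an unescaped single quote reaching a second round of shell parsing. Python's shlex.quote (used here purely as an illustration, not as part of the runner) shows the escaping needed for a string with embedded single quotes to survive re-evaluation intact:

```python
import shlex

# A fragment like the one in the failing job: embedded single quotes.
fragment = "echo '`DANGER_GITLAB_API_TOKEN` is not set...'"

# Without quoting, pasting this into another shell layer splits it at the
# quotes; shlex.quote wraps it so one extra parse returns it unchanged.
quoted = shlex.quote(fragment)
assert shlex.split(quoted) == [fragment]
print(quoted)
```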
What we did in the previous MR isn't enough, then. I will go through Claude's suggestion and see what is missing.
I am not closing the issue yet, because I want to double-check that those jobs no longer fail against the Pilot Runners.
Job passes without failure: https://gitlab.com/ra-group2/playground-bis/-/jobs/13583612811
The helper used is the one from the MR linked above.
Running with gitlab-runner development version (HEAD)
on Local GitLab Runner for tests and debugging REDACTED, system ID: s_b188029b2abb
feature flags: FF_USE_FASTZIP:true, FF_USE_NEW_BASH_EVAL_STRATEGY:true, FF_SCRIPT_SECTIONS:true, FF_EXPORT_HIGH_CARDINALITY_METRICS:true, FF_USE_ADAPTIVE_REQUEST_CONCURRENCY:false, FF_SCRIPT_TO_STEP_MIGRATION:true
Preparing the "docker" executor
00:03
Using Docker executor with image alpine ...
Using helper image: registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd (overridden, default would be registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:arm64-latest )
Using effective pull policy of [if-not-present] for container registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd
Authenticating with credentials from job payload (GitLab Registry)
Pulling docker image registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd ...
Using docker image sha256:27587da377165ef43957f8e7f859ed9d43c7dbe44bb576b382785853069970f2 for registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd with digest registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev@sha256:27587da377165ef43957f8e7f859ed9d43c7dbe44bb576b382785853069970f2 ...
Using effective pull policy of [if-not-present] for container alpine
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:25109184c71bdad752c8312a8623239686a9a2071e8825f20acb8f2198c3f659 for alpine with digest alpine@sha256:25109184c71bdad752c8312a8623239686a9a2071e8825f20acb8f2198c3f659 ...
Preparing environment
00:00
Using helper image: registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd (overridden, default would be registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:arm64-latest )
Using effective pull policy of [if-not-present] for container registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd
Using docker image sha256:27587da377165ef43957f8e7f859ed9d43c7dbe44bb576b382785853069970f2 for registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:alpine-latest-arm64-943917bd with digest registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev@sha256:27587da377165ef43957f8e7f859ed9d43c7dbe44bb576b382785853069970f2 ...
Using effective pull policy of [if-not-present] for container sha256:27587da377165ef43957f8e7f859ed9d43c7dbe44bb576b382785853069970f2
Running on runner-idukxkzgd-project-25452826-concurrent-0 via ratchade--20240612-H2W0T...
Getting source from Git repository
00:01
Gitaly correlation ID: 9df6788c6eb4a294-YUL
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /builds/ra-group2/playground-bis/.git/
Created fresh repository.
Checking out 9849ed52 as detached HEAD (ref is hello)...
Removing artifact.txt
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
Using effective pull policy of [if-not-present] for container alpine
Using docker image sha256:25109184c71bdad752c8312a8623239686a9a2071e8825f20acb8f2198c3f659 for alpine with digest alpine@sha256:25109184c71bdad752c8312a8623239686a9a2071e8825f20acb8f2198c3f659 ...
step-runner is listening on socket /tmp/step-runner.sock
Running step name=user_script
$ echo $'\033[1;32m$ if [ -z "${DANGER_GITLAB_API_TOKEN}" ]; then echo \'`DANGER_GITLAB_API_TOKEN` is not set...\' unset GITLAB_CI; ... awk \'{print $1}\' ...\033[0m'
$ if [ -z "${DANGER_GITLAB_API_TOKEN}" ]; then echo '`DANGER_GITLAB_API_TOKEN` is not set...' unset GITLAB_CI; ... awk '{print $1}' ...
$ echo $TEST > artifact.txt
Uploading artifacts for successful job
00:01
Uploading artifacts...
artifact.txt: found 1 matching artifact files and directories
Uploading artifacts as "archive" to coordinator... 201 Created correlation_id=9df678af7908a269-YUL id=13583612811 responseStatus=201 Created token=6d_y6XDFM
Cleaning up project directory and file based variables
00:01
Job succeeded
Romuald Atchadé (9849ed52) at 20 Mar 17:34
Edit .gitlab-ci.yml
Earlier today we merged fix: Properly escape ANSI color codes in shell ... (gitlab-runner!6527 - merged), which improves the escaping. It should be deployed on the Pilot Runners by tomorrow. I was planning to check whether that solves the issue (I think it does).
I will run a quick job with just
echo $'\033[1;32m$ if [ -z "${DANGER_GITLAB_API_TOKEN}" ]; then echo \'`DANGER_GITLAB_API_TOKEN` is not set...\' unset GITLAB_CI; ... awk \'{print $1}\' ...\033[0m'
Closing as gitlab-runner!6527 (merged) has been merged
Revert "Remove GPG signing color"
This reverts commit 2b238fce.
This reverts the changes made in !6484 as a fix has been implemented in !6527
This MR cannot be merged until !6527 is deployed with the Pilot Runners.
After deployment of !6527, all the packages' jobs must pass.
Relates to step-runner#414
Romuald Atchadé (9fd9f923) at 20 Mar 15:20
Revert "Remove GPG signing color"
Thanks @stanhu. I am planning to check whether I can find out where the issue with the script is.