update kselftest, use Linaro test-defintions by khilman · Pull Request #445 · kernelci/kernelci-core

khilman · 2020-07-17T17:13:07Z

Update the kselftest test-plan:

use the kselftest tarball that is now part of kernelci builds.
use debian NFS root (ramdisk + modules, then pivot-root to NFS)
use Linaro test-definitions repo for running kselftests

Sharing test-defintions with LInaro/LKFT will allow better collaboration on keeping kselftests running/reliable.

NOTE: this draft PR is based on staging due to dependencies on existing PRs not yet merged into master.

khilman · 2020-08-12T17:44:04Z

FYI: still using staging-skip because this depends on the fix here for a working rootfs: #463

khilman · 2020-08-18T19:43:27Z

Update includes:

rootfs image update for: rootfs_configs: fix baseline overlay path #463
rebase onto latest master

Successful LAVA job: https://lava.baylibre.com/scheduler/job/1757#bottom

Removing staging-skip now that rootfs is available.

khilman · 2020-08-21T18:33:49Z

Looks good in staging:

test-runner job: https://bot.staging.kernelci.org/job/test-runner/4720/console
LAVA job (lab-baylibre: https://lava.baylibre.com/scheduler/job/17034

However, while the LAVA job runs successfully, I don't see any kselftest results in the staging frontend: https://staging.kernelci.org/test/

@gctucker any idea what might be going on here?
The lab-baylibre callback tokens etc. are fine because lab-baylibre is sending staging results for all the other test plans.

gctucker · 2020-08-21T19:47:11Z

Looking at the backend logs, it seems to be some bug in the log line number translation:

Aug 21 16:40:25 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Processing LAVA test data: job 17034 from lab-baylibre
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Received task: error-handler[be666017-4980-47d7-b388-b5c7d98561de]
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [   ERROR/MainThread] Task failed, UUID: fcf4f889-8db7-4b12-8846-e68c2a5d4bd3, error: '12612'
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Task error-handler[be666017-4980-47d7-b388-b5c7d98561de] succeeded in 0.00206007994711s: None
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [   ERROR/MainThread] Task lava-test[fcf4f889-8db7-4b12-8846-e68c2a5d4bd3] raised unexpected: KeyError('12612',)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: Traceback (most recent call last):
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/.venv/api.staging.kernelci.org/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     R = retval = fun(*args, **kwargs)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/.venv/api.staging.kernelci.org/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     return self.run(*args, **kwargs)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/taskqueue/tasks/callback.py", line 42, in lava_test
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     taskc.app.conf.db_options)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/utils/callback/lava.py", line 652, in add_tests
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     _translate_log_end_lines(results, translate_map)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/utils/callback/lava.py", line 590, in _translate_log_end_lines
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     new_log_end_line = translate_map[result['log_end_line']]
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: KeyError: '12612'

I guess the kselftest job logs happen to hit that condition for some reason, maybe because they're longer than the other test jobs we're currently running. I'll take a closer look, it should be easy to reproduce by capturing the callback JSON and re-injecting it directly in the callback handler.

khilman · 2020-08-24T20:52:08Z

FYI @mgalka

gctucker · 2020-08-26T09:04:08Z

The backend exception has been fixed: kernelci/kernelci-backend#251

gctucker

What about the other template variants though (barebox, grub)? Even if they can't all be tested right now, I think they should be updated to also include kselftest/kselftest.jinja2. And the ipxe one should probably be dropped altogether as nothing is using that any more.

gctucker · 2020-08-26T09:25:09Z

templates/kselftest/kselftest.jinja2

+
+    - repository: https://github.com/Linaro/test-definitions.git
+      from: git
+      revision: master


One issue with using the head of the master branch is that it makes it harder to reproduce results. There doesn't seem to be any regular release tags in the test-definitions repository so we should either ask for tags to be created or keep a mirror and add KernelCI tags for each production update.

As discussed, we will need a fork of test-definitions for KernelCI in any case to have a staging branch. We could use that fork with a kernelci.org branch used in production too.

gctucker · 2020-08-26T09:27:50Z

templates/kselftest/kselftest.jinja2

+      name: kselftest
+      parameters:
+        TESTPROG_URL: {{ kselftests_url }}
+        SKIPFILE: skipfile-lkft.yaml


Some skips will be platform-dependent, and there are many more platforms being tested in KernelCI than LKFT. So we'll need to either have a common file for everyone, or skipfile-kernelci.yaml added to the repo, or have a separate file (probably in a fork).

Also worth mentioning that KernelCI tracks and reports regressions only, not test cases that have never passed. So we may be OK without maintaining a full list of tests that won't be fixed, only the broken ones that really don't work or cause some bad side-effects and prevent other tests from running.

test-configs.yaml

gctucker · 2020-08-26T09:32:24Z

test-configs.yaml

    filters:
      - passlist: {defconfig: ['kselftest']}
+    params:
+      job_timeout: '60'


This reminds me, we should change the parameter name to make it clear the timeout is a number of minutes...

gctucker · 2020-08-26T09:37:53Z

Other than the comments above, this was tested OK on staging:
https://staging.kernelci.org/test/plan/id/5f45aff802c5b617e7b1458c/

gctucker · 2020-08-26T10:19:32Z

@khilman Please also drop the "WIP" from the PR title if/when you think it's ready to be merged.

kees · 2020-08-26T13:43:24Z

Yay; this looks like great progress! Selftest tarball built, saved, fetched, and run! :)

Next seems to be fixing the execution environment? It seems like something is mangling the log output (and likely interfering with the tests):

TAP version 13
13: _ TAP
1..72
: _ 1..72
# selftests lkdtm PANIC.sh
lkdtm: PANIC.sh_ #
# Skipping PANIC crashes entire system
PANIC: crashes_entire #
[SKIP] 1 selftests lkdtm PANIC.sh # SKIP
selftests: lkdtm_PANIC.sh [SKIP]
# selftests lkdtm BUG.sh
lkdtm: BUG.sh_ #
# BUG missing 'kernel BUG at' [FAIL]
missing: 'kernel_BUG #
[FAIL] 2 selftests lkdtm BUG.sh # exit=1
selftests: lkdtm_BUG.sh [FAIL]

All the weird "$word: _ $truncated_later_words" lines are coming from somewhere external to the kselftest run. Is that Lava getting in the way? e.g. "13: _ TAP", etc.

And maybe some portion of the test's stdout/stderr is missing? This lkdtm "BUG" test can't actually fail: something central seems to be missing.

gctucker · 2020-08-26T15:21:39Z

@kees Thanks a lot for checking this out! Yes LAVA adds some noise because it uses the serial console to communicate with the platform. Then there is some logic in KernelCI backend to reduce the noise but maybe it's not tuned correctly. Let's dig out the raw log and what the filters produced to see what's going on.

This is actually not a kselftest specific issue, the same logic is used for all LAVA tests. Maybe it's just more visible with kselftest.

khilman · 2020-08-26T18:54:48Z

@khilman Please also drop the "WIP" from the PR title if/when you think it's ready to be merged.

Removed the WIP. I think the issues with the output mangling are not directly related to this PR. Some tweaking of the test-defintion itself to be sure that the results are parsed from a log file and not from stdout should probably be explored.

khilman · 2020-08-26T18:57:21Z

I wonder if @danrue might have any feedback on the output mangling issues above. We're using the LKFT LAVA test-definition un-modified.

kees · 2020-08-26T21:10:09Z

Should I open a separate bug for the "log mangling"? I can't quite figure out yet how to reproduce the runtime environment the tests are in. (As in, how do I set up Lava locally to try to debug this myself, etc.)

khilman · 2020-08-28T00:03:02Z

@kees setting up LAVA might be a bit overkill, but what you could try is running the wrapper scripts that the LInaro test-defintion runs.

Here's the test-definition: https://github.com/Linaro/test-definitions/tree/master/automated/linux/kselftest
and the import part is the shell script that it runs: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/kselftest.yaml#L50
That's the script that calls ./run_kselftest.sh but it appears that does some of its own sed magic which helps LAVA parse results.

It does tee the output to a logfile, which the kselftest.sh script then uses to parse and collect LAVA results (which are then sent to KernelCI), so in theory, even with some mess on stdout, the result parsing should not be affected since it's using the logfile and not stdtout.

gctucker · 2020-08-28T08:36:08Z

@kees @khilman I've captured the raw log produced by LAVA and dumped all the "target" messages in a plain text file:
https://people.collabora.com/~gtucker/tmp/kselftest-raw-lava-target-log.txt

The only difference between that and what is stored by KernelCI is some LAVA signals, here's the diff with what the filters did:

--- kselftest-raw-lava-target-log.txt	2020-08-28 09:29:10.661301920 +0100
+++ kselftest-meson-g12b-odroid-n2.txt	2020-08-28 09:14:01.000000000 +0100
@@ -1975,17 +1975,14 @@
 + cat uuid
 + UUID=31179_1.6.2.4.1
 + set +x
-<LAVA_SIGNAL_STARTRUN 0_timesync-off 31179_1.6.2.4.1>
 + systemctl stop systemd-timesyncd
 [0;1;31mWarning:[0m The unit file, source configuration file or drop-ins of systemd-timesyncd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
 + set +x
-<LAVA_SIGNAL_ENDRUN 0_timesync-off 31179_1.6.2.4.1>
 + export TESTRUN_ID=1_kselftest
 + cd /lava-31179/0/tests/1_kselftest
 + cat uuid
 + UUID=31179_1.6.2.4.5
 + set +x
-<LAVA_SIGNAL_STARTRUN 1_kselftest 31179_1.6.2.4.5>
 + cd ./automated/linux/kselftest/
 + ./kselftest.sh -t kselftest_armhf.tar.gz -s false -u http://storage.staging.kernelci.org/kernelci/staging.kernelci.org/staging-20200825.2/arm64/defconfig+kselftest/gcc-8/kselftest.tar.gz -L  -S skipfile-lkft.yaml -b  -g  -e  -p /opt/kselftests/mainline/
 INFO: Generating a skipfile based on /lava-31179/0/tests/1_kselftest/automated/linux/kselftest/skipfile-lkft.yaml

So if the kselftest log output doesn't look right, it's indeed a problem with how the tests are being run in LAVA. So it's not a KernelCI issue per se but rather something with either kselftest or the Linaro test definitions. Let's see if @danrue has some thoughts on that; we can also check what the LKFT logs and results look like.

gctucker · 2020-08-28T08:44:03Z

FYI, some LKFT results and sample log:
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.8-oe/build/v5.8.5/testrun/3125177/suite/kselftest/tests/
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.8-oe/build/v5.8.5/testrun/3125177/suite/kselftest/test/bpf_get_cgroup_id_user/log

Looks similar to what we get (with apparently an extra bug when calling diff...):

TAP version 13
13: _ TAP
1..71
: _ 1..71
# selftests lkdtm PANIC.sh
lkdtm: PANIC.sh_ #
# Skipping PANIC crashes entire system
PANIC: crashes_entire #
[SKIP] 1 selftests lkdtm PANIC.sh # SKIP
selftests: lkdtm_PANIC.sh [SKIP]
# selftests lkdtm BUG.sh
lkdtm: BUG.sh_ #
# diff unrecognized option '--changed-group-format=%>'
unrecognized: option_'--changed-group-format=%>' #
# BusyBox v1.27.2 (2020-08-04 221300 UTC) multi-call binary.
v1.27.2: (2020-08-04_221300 #
# 
: _ #
# Usage diff [-abBdiNqrTstw] [-L LABEL] [-S FILE] [-U LINES] FILE1 FILE2
diff: [-abBdiNqrTstw]_[-L #
# BUG missing 'kernel BUG at' [FAIL]

danrue · 2020-08-28T13:49:22Z

I haven't looked at LKFT's kselftest implementation details in a while, but I know that we often have problems getting results from kselftest runs. The diff thing looks like a busybox vs gnu diff type problem.

The kselftest test-plan has been long out of date and mostly unused. Remove all current users since results have not been looked at, and because the test-plan is changing to require NFSroot, which not all devices support. Devices can be added back as they are validated. Signed-off-by: Kevin Hilman <[email protected]>

Update the kselftest testplan to use the kselftest tarball that is generated during the kernel build process. Also use the Linaro test-definitions repo for the test-plan. Signed-off-by: Kevin Hilman <[email protected]>

Signed-off-by: Kevin Hilman <[email protected]>

gctucker · 2020-09-09T14:11:44Z

lab-configs.yaml

@@ -10,6 +10,7 @@ labs:
          plan:
            - baseline
            - baseline-fastboot


I believe it's worth adding baseline-nfs since kselftest uses nfsroot, to check it's working with regular kernel configs:

Suggested change

- baseline-fastboot

- baseline-nfs

- baseline-fastboot

I left that out on purpose because it's an extra test that doesn't add any value, other than sanity checking NFS.

Since there's no way to check the status of baseline-nfs before running kselftest, I don't see any extra value here.

Well it can be used to verify manually that nfsroot works as expected, when kselftest jobs fail to run.

Sure, but that's orthogonal to this patch, and should be up to the lab owner to decide. Let's not hold this series up for unrelated issues.

Sure, this is not what's holding this PR. It just seemed like an obvious thing to me when I started enabling kselftests on more devices.

gctucker

Tested OK in combination with #487, which will need to be rebased and tested again once this is is merged.

kees · 2020-09-15T01:53:17Z

@kees setting up LAVA might be a bit overkill, but what you could try is running the wrapper scripts that the LInaro test-defintion runs.

Here's the test-definition: https://github.com/Linaro/test-definitions/tree/master/automated/linux/kselftest
and the import part is the shell script that it runs: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/kselftest.yaml#L50
That's the script that calls ./run_kselftest.sh but it appears that does some of its own sed magic which helps LAVA parse results.

Well this looks like exactly the problem:

./run_kselftest.sh 2>&1 | tee "${LOGFILE}" | sed 's/^not ok/[FAIL]/'|sed 's/^ok/[PASS]/'|sed 's/://g'|awk '{if ($0 ~ "# SKIP") {$1 = "[SKIP]"} print $0"\n"$3 ": " $4"_"$5 " " $1}'

It assume LAVA doesn't actually parse TAP? Hmmm

gctucker · 2020-09-15T07:53:12Z

Right, so that's an issue with the Linaro test definitions rather than KernelCI which runs them.

I've created a fork in kernelci/test-definitions so we can work on things like that, and a first issue to follow up from this PR: kernelci/test-definitions#1

khilman added the staging-skip Don't test automatically on staging.kernelci.org label Jul 17, 2020

khilman linked an issue Jul 22, 2020 that may be closed by this pull request

kselftest #331

Closed

5 tasks

khilman force-pushed the wip/kselftest-template branch from e9412c2 to 73f1f54 Compare August 11, 2020 16:48

khilman marked this pull request as ready for review August 11, 2020 16:49

khilman added staging-skip Don't test automatically on staging.kernelci.org and removed staging-skip Don't test automatically on staging.kernelci.org labels Aug 11, 2020

gctucker requested a review from a team August 12, 2020 20:42

khilman force-pushed the wip/kselftest-template branch from 73f1f54 to 1548086 Compare August 18, 2020 19:18

khilman removed the staging-skip Don't test automatically on staging.kernelci.org label Aug 18, 2020

khilman force-pushed the wip/kselftest-template branch from 1548086 to aaf3b56 Compare August 25, 2020 20:22

gctucker mentioned this pull request Aug 26, 2020

build.py: build/install kselftests #339

Merged

gctucker reviewed Aug 26, 2020

View reviewed changes

khilman changed the title ~~WIP: update kselftest, use Linaro test-defintions~~ update kselftest, use Linaro test-defintions Aug 26, 2020

khilman mentioned this pull request Sep 1, 2020

add build config and test-plan for PREEMPT_RT #475

Merged

1 task

khilman added 3 commits September 2, 2020 07:27

templates/kselftest: use kernelci tarball, Linaro test-definitions

5ac9866

Update the kselftest testplan to use the kselftest tarball that is generated during the kernel build process. Also use the Linaro test-definitions repo for the test-plan. Signed-off-by: Kevin Hilman <[email protected]>

kselftest: add odroid-n2, enable in lab-baylibre

fb3bc4b

Signed-off-by: Kevin Hilman <[email protected]>

khilman force-pushed the wip/kselftest-template branch from aaf3b56 to fb3bc4b Compare September 2, 2020 14:28

gctucker mentioned this pull request Sep 2, 2020

T23362 kselftest setup #487

Merged

gctucker reviewed Sep 9, 2020

View reviewed changes

gctucker approved these changes Sep 14, 2020

View reviewed changes

gctucker merged commit 6ec6343 into kernelci:master Sep 15, 2020

gctucker mentioned this pull request Sep 15, 2020

kselftest log and results appear to be mangled kernelci/test-definitions#1

Open

khilman deleted the wip/kselftest-template branch January 18, 2023 18:10

Conversation

khilman commented Jul 17, 2020

Uh oh!

khilman commented Aug 12, 2020

Uh oh!

khilman commented Aug 18, 2020

Uh oh!

khilman commented Aug 21, 2020

Uh oh!

gctucker commented Aug 21, 2020

Uh oh!

khilman commented Aug 24, 2020

Uh oh!

gctucker commented Aug 26, 2020

Uh oh!

gctucker left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gctucker commented Aug 26, 2020

Uh oh!

gctucker commented Aug 26, 2020

Uh oh!

kees commented Aug 26, 2020

Uh oh!

gctucker commented Aug 26, 2020

Uh oh!

khilman commented Aug 26, 2020

Uh oh!

khilman commented Aug 26, 2020

Uh oh!

kees commented Aug 26, 2020

Uh oh!

khilman commented Aug 28, 2020

Uh oh!

gctucker commented Aug 28, 2020

Uh oh!

gctucker commented Aug 28, 2020

Uh oh!

danrue commented Aug 28, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gctucker left a comment

Choose a reason for hiding this comment

Uh oh!

kees commented Sep 15, 2020

Uh oh!

gctucker commented Sep 15, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants