Skip to content

update kselftest, use Linaro test-defintions#445

Merged
gctucker merged 3 commits intokernelci:masterfrom
khilman:wip/kselftest-template
Sep 15, 2020
Merged

update kselftest, use Linaro test-defintions#445
gctucker merged 3 commits intokernelci:masterfrom
khilman:wip/kselftest-template

Conversation

@khilman
Copy link
Contributor

@khilman khilman commented Jul 17, 2020

Update the kselftest test-plan:

  1. use the kselftest tarball that is now part of kernelci builds.
  2. use debian NFS root (ramdisk + modules, then pivot-root to NFS)
  3. use Linaro test-definitions repo for running kselftests

Sharing test-defintions with LInaro/LKFT will allow better collaboration on keeping kselftests running/reliable.

NOTE: this draft PR is based on staging due to dependencies on existing PRs not yet merged into master.

@khilman khilman added the staging-skip Don't test automatically on staging.kernelci.org label Jul 17, 2020
@khilman khilman linked an issue Jul 22, 2020 that may be closed by this pull request
5 tasks
@khilman khilman force-pushed the wip/kselftest-template branch from e9412c2 to 73f1f54 Compare August 11, 2020 16:48
@khilman khilman marked this pull request as ready for review August 11, 2020 16:49
@khilman khilman added staging-skip Don't test automatically on staging.kernelci.org and removed staging-skip Don't test automatically on staging.kernelci.org labels Aug 11, 2020
@khilman
Copy link
Contributor Author

khilman commented Aug 12, 2020

FYI: still using staging-skip because this depends on the fix here for a working rootfs: #463

@gctucker gctucker requested a review from a team August 12, 2020 20:42
@khilman khilman force-pushed the wip/kselftest-template branch from 73f1f54 to 1548086 Compare August 18, 2020 19:18
@khilman
Copy link
Contributor Author

khilman commented Aug 18, 2020

Update includes:

Successful LAVA job: https://lava.baylibre.com/scheduler/job/1757#bottom

Removing staging-skip now that rootfs is available.

@khilman khilman removed the staging-skip Don't test automatically on staging.kernelci.org label Aug 18, 2020
@khilman
Copy link
Contributor Author

khilman commented Aug 21, 2020

Looks good in staging:

However, while the LAVA job runs successfully, I don't see any kselftest results in the staging frontend: https://staging.kernelci.org/test/

@gctucker any idea what might be going on here?
The lab-baylibre callback tokens etc. are fine because lab-baylibre is sending staging results for all the other test plans.

@gctucker
Copy link
Collaborator

Looking at the backend logs, it seems to be some bug in the log line number translation:

Aug 21 16:40:25 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Processing LAVA test data: job 17034 from lab-baylibre
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Received task: error-handler[be666017-4980-47d7-b388-b5c7d98561de]
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [   ERROR/MainThread] Task failed, UUID: fcf4f889-8db7-4b12-8846-e68c2a5d4bd3, error: '12612'
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [    INFO/MainThread] Task error-handler[be666017-4980-47d7-b388-b5c7d98561de] succeeded in 0.00206007994711s: None
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: [   ERROR/MainThread] Task lava-test[fcf4f889-8db7-4b12-8846-e68c2a5d4bd3] raised unexpected: KeyError('12612',)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: Traceback (most recent call last):
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/.venv/api.staging.kernelci.org/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     R = retval = fun(*args, **kwargs)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/.venv/api.staging.kernelci.org/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     return self.run(*args, **kwargs)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/taskqueue/tasks/callback.py", line 42, in lava_test
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     taskc.app.conf.db_options)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/utils/callback/lava.py", line 652, in add_tests
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     _translate_log_end_lines(results, translate_map)
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:   File "/srv/api.staging.kernelci.org/app/utils/callback/lava.py", line 590, in _translate_log_end_lines
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]:     new_log_end_line = translate_map[result['log_end_line']]
Aug 21 16:40:27 kernelci-staging kernelci-celery[32919]: KeyError: '12612'

I guess the kselftest job logs happen to hit that condition for some reason, maybe because they're longer than the other test jobs we're currently running. I'll take a closer look, it should be easy to reproduce by capturing the callback JSON and re-injecting it directly in the callback handler.

@khilman
Copy link
Contributor Author

khilman commented Aug 24, 2020

FYI @mgalka

@khilman khilman force-pushed the wip/kselftest-template branch from 1548086 to aaf3b56 Compare August 25, 2020 20:22
@gctucker
Copy link
Collaborator

The backend exception has been fixed: kernelci/kernelci-backend#251

Copy link
Collaborator

@gctucker gctucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the other template variants though (barebox, grub)? Even if they can't all be tested right now, I think they should be updated to also include kselftest/kselftest.jinja2. And the ipxe one should probably be dropped altogether as nothing is using that any more.


- repository: https://github.com/Linaro/test-definitions.git
from: git
revision: master
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One issue with using the head of the master branch is that it makes it harder to reproduce results. There doesn't seem to be any regular release tags in the test-definitions repository so we should either ask for tags to be created or keep a mirror and add KernelCI tags for each production update.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we will need a fork of test-definitions for KernelCI in any case to have a staging branch. We could use that fork with a kernelci.org branch used in production too.

name: kselftest
parameters:
TESTPROG_URL: {{ kselftests_url }}
SKIPFILE: skipfile-lkft.yaml
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some skips will be platform-dependent, and there are many more platforms being tested in KernelCI than LKFT. So we'll need to either have a common file for everyone, or skipfile-kernelci.yaml added to the repo, or have a separate file (probably in a fork).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth mentioning that KernelCI tracks and reports regressions only, not test cases that have never passed. So we may be OK without maintaining a full list of tests that won't be fixed, only the broken ones that really don't work or cause some bad side-effects and prevent other tests from running.

filters:
- passlist: {defconfig: ['kselftest']}
params:
job_timeout: '60'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me, we should change the parameter name to make it clear the timeout is a number of minutes...

@gctucker
Copy link
Collaborator

Other than the comments above, this was tested OK on staging:
https://staging.kernelci.org/test/plan/id/5f45aff802c5b617e7b1458c/

@gctucker
Copy link
Collaborator

@khilman Please also drop the "WIP" from the PR title if/when you think it's ready to be merged.

@kees
Copy link
Contributor

kees commented Aug 26, 2020

Yay; this looks like great progress! Selftest tarball built, saved, fetched, and run! :)

Next seems to be fixing the execution environment? It seems like something is mangling the log output (and likely interfering with the tests):

TAP version 13
13: _ TAP
1..72
: _ 1..72
# selftests lkdtm PANIC.sh
lkdtm: PANIC.sh_ #
# Skipping PANIC crashes entire system
PANIC: crashes_entire #
[SKIP] 1 selftests lkdtm PANIC.sh # SKIP
selftests: lkdtm_PANIC.sh [SKIP]
# selftests lkdtm BUG.sh
lkdtm: BUG.sh_ #
# BUG missing 'kernel BUG at' [FAIL]
missing: 'kernel_BUG #
[FAIL] 2 selftests lkdtm BUG.sh # exit=1
selftests: lkdtm_BUG.sh [FAIL]

All the weird "$word: _ $truncated_later_words" lines are coming from somewhere external to the kselftest run. Is that Lava getting in the way? e.g. "13: _ TAP", etc.

And maybe some portion of the test's stdout/stderr is missing? This lkdtm "BUG" test can't actually fail: something central seems to be missing.

@gctucker
Copy link
Collaborator

@kees Thanks a lot for checking this out! Yes LAVA adds some noise because it uses the serial console to communicate with the platform. Then there is some logic in KernelCI backend to reduce the noise but maybe it's not tuned correctly. Let's dig out the raw log and what the filters produced to see what's going on.

This is actually not a kselftest specific issue, the same logic is used for all LAVA tests. Maybe it's just more visible with kselftest.

@khilman khilman changed the title WIP: update kselftest, use Linaro test-defintions update kselftest, use Linaro test-defintions Aug 26, 2020
@khilman
Copy link
Contributor Author

khilman commented Aug 26, 2020

@khilman Please also drop the "WIP" from the PR title if/when you think it's ready to be merged.

Removed the WIP. I think the issues with the output mangling are not directly related to this PR. Some tweaking of the test-defintion itself to be sure that the results are parsed from a log file and not from stdout should probably be explored.

@khilman
Copy link
Contributor Author

khilman commented Aug 26, 2020

I wonder if @danrue might have any feedback on the output mangling issues above. We're using the LKFT LAVA test-definition un-modified.

@kees
Copy link
Contributor

kees commented Aug 26, 2020

Should I open a separate bug for the "log mangling"? I can't quite figure out yet how to reproduce the runtime environment the tests are in. (As in, how do I set up Lava locally to try to debug this myself, etc.)

@khilman
Copy link
Contributor Author

khilman commented Aug 28, 2020

@kees setting up LAVA might be a bit overkill, but what you could try is running the wrapper scripts that the LInaro test-defintion runs.

Here's the test-definition: https://github.com/Linaro/test-definitions/tree/master/automated/linux/kselftest
and the import part is the shell script that it runs: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/kselftest.yaml#L50
That's the script that calls ./run_kselftest.sh but it appears that does some of its own sed magic which helps LAVA parse results.

It does tee the output to a logfile, which the kselftest.sh script then uses to parse and collect LAVA results (which are then sent to KernelCI), so in theory, even with some mess on stdout, the result parsing should not be affected since it's using the logfile and not stdtout.

@gctucker
Copy link
Collaborator

@kees @khilman I've captured the raw log produced by LAVA and dumped all the "target" messages in a plain text file:
https://people.collabora.com/~gtucker/tmp/kselftest-raw-lava-target-log.txt

The only difference between that and what is stored by KernelCI is some LAVA signals, here's the diff with what the filters did:

--- kselftest-raw-lava-target-log.txt	2020-08-28 09:29:10.661301920 +0100
+++ kselftest-meson-g12b-odroid-n2.txt	2020-08-28 09:14:01.000000000 +0100
@@ -1975,17 +1975,14 @@
 + cat uuid
 + UUID=31179_1.6.2.4.1
 + set +x
-<LAVA_SIGNAL_STARTRUN 0_timesync-off 31179_1.6.2.4.1>
 + systemctl stop systemd-timesyncd
 [0;1;31mWarning:[0m The unit file, source configuration file or drop-ins of systemd-timesyncd.service changed on disk. Run 'systemctl daemon-reload' to reload units.
 + set +x
-<LAVA_SIGNAL_ENDRUN 0_timesync-off 31179_1.6.2.4.1>
 + export TESTRUN_ID=1_kselftest
 + cd /lava-31179/0/tests/1_kselftest
 + cat uuid
 + UUID=31179_1.6.2.4.5
 + set +x
-<LAVA_SIGNAL_STARTRUN 1_kselftest 31179_1.6.2.4.5>
 + cd ./automated/linux/kselftest/
 + ./kselftest.sh -t kselftest_armhf.tar.gz -s false -u http://storage.staging.kernelci.org/kernelci/staging.kernelci.org/staging-20200825.2/arm64/defconfig+kselftest/gcc-8/kselftest.tar.gz -L  -S skipfile-lkft.yaml -b  -g  -e  -p /opt/kselftests/mainline/
 INFO: Generating a skipfile based on /lava-31179/0/tests/1_kselftest/automated/linux/kselftest/skipfile-lkft.yaml

So if the kselftest log output doesn't look right, it's indeed a problem with how the tests are being run in LAVA. So it's not a KernelCI issue per se but rather something with either kselftest or the Linaro test definitions. Let's see if @danrue has some thoughts on that; we can also check what the LKFT logs and results look like.

@gctucker
Copy link
Collaborator

FYI, some LKFT results and sample log:
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.8-oe/build/v5.8.5/testrun/3125177/suite/kselftest/tests/
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.8-oe/build/v5.8.5/testrun/3125177/suite/kselftest/test/bpf_get_cgroup_id_user/log

Looks similar to what we get (with apparently an extra bug when calling diff...):

TAP version 13
13: _ TAP
1..71
: _ 1..71
# selftests lkdtm PANIC.sh
lkdtm: PANIC.sh_ #
# Skipping PANIC crashes entire system
PANIC: crashes_entire #
[SKIP] 1 selftests lkdtm PANIC.sh # SKIP
selftests: lkdtm_PANIC.sh [SKIP]
# selftests lkdtm BUG.sh
lkdtm: BUG.sh_ #
# diff unrecognized option '--changed-group-format=%>'
unrecognized: option_'--changed-group-format=%>' #
# BusyBox v1.27.2 (2020-08-04 221300 UTC) multi-call binary.
v1.27.2: (2020-08-04_221300 #
# 
: _ #
# Usage diff [-abBdiNqrTstw] [-L LABEL] [-S FILE] [-U LINES] FILE1 FILE2
diff: [-abBdiNqrTstw]_[-L #
# BUG missing 'kernel BUG at' [FAIL]

@danrue
Copy link
Contributor

danrue commented Aug 28, 2020

I haven't looked at LKFT's kselftest implementation details in a while, but I know that we often have problems getting results from kselftest runs. The diff thing looks like a busybox vs gnu diff type problem.

The kselftest test-plan has been long out of date and mostly unused.
Remove all current users since results have not been looked at, and
because the test-plan is changing to require NFSroot, which not
all devices support.

Devices can be added back as they are validated.

Signed-off-by: Kevin Hilman <[email protected]>
Update the kselftest testplan to use the kselftest tarball that is
generated during the kernel build process.

Also use the Linaro test-definitions repo for the test-plan.

Signed-off-by: Kevin Hilman <[email protected]>
@khilman khilman force-pushed the wip/kselftest-template branch from aaf3b56 to fb3bc4b Compare September 2, 2020 14:28
@gctucker gctucker mentioned this pull request Sep 2, 2020
@@ -10,6 +10,7 @@ labs:
plan:
- baseline
- baseline-fastboot
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's worth adding baseline-nfs since kselftest uses nfsroot, to check it's working with regular kernel configs:

Suggested change
- baseline-fastboot
- baseline-nfs
- baseline-fastboot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left that out on purpose because it's an extra test that doesn't add any value, other than sanity checking NFS.

Since there's no way to check the status of baseline-nfs before running kselftest, I don't see any extra value here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it can be used to verify manually that nfsroot works as expected, when kselftest jobs fail to run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but that's orthogonal to this patch, and should be up to the lab owner to decide. Let's not hold this series up for unrelated issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is not what's holding this PR. It just seemed like an obvious thing to me when I started enabling kselftests on more devices.

Copy link
Collaborator

@gctucker gctucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested OK in combination with #487, which will need to be rebased and tested again once this is is merged.

@gctucker gctucker merged commit 6ec6343 into kernelci:master Sep 15, 2020
@kees
Copy link
Contributor

kees commented Sep 15, 2020

@kees setting up LAVA might be a bit overkill, but what you could try is running the wrapper scripts that the LInaro test-defintion runs.

Here's the test-definition: https://github.com/Linaro/test-definitions/tree/master/automated/linux/kselftest
and the import part is the shell script that it runs: https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/kselftest.yaml#L50
That's the script that calls ./run_kselftest.sh but it appears that does some of its own sed magic which helps LAVA parse results.

Well this looks like exactly the problem:

./run_kselftest.sh 2>&1 | tee "${LOGFILE}" | sed 's/^not ok/[FAIL]/'|sed 's/^ok/[PASS]/'|sed 's/://g'|awk '{if ($0 ~ "# SKIP") {$1 = "[SKIP]"} print $0"\n"$3 ": " $4"_"$5 " " $1}'

It assume LAVA doesn't actually parse TAP? Hmmm

@gctucker
Copy link
Collaborator

Right, so that's an issue with the Linaro test definitions rather than KernelCI which runs them.

I've created a fork in kernelci/test-definitions so we can work on things like that, and a first issue to follow up from this PR: kernelci/test-definitions#1

@khilman khilman deleted the wip/kselftest-template branch January 18, 2023 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kselftest

4 participants