Enable GitHub action to test the patches as early and as often as possible #21

coiby · 2024-07-03T03:47:49Z

This patch set enables GitHub Action to test the patches whenever there is a pull request (PR) or a change to the PR. By testing the patches as early and as often as possible, we can prevent problems or catch problems early.

Four kinds of tests will be triggered automatically,

Static analysis run by shellcheck
Unit tests run by ShellSpec
Integration tests
- nfs-kdump
- nfs-early-kdump,
- ssh-kdump
- local-kdump (disabled for now due to https://bugzilla.redhat.com/show_bug.cgi?id=2270423)
- lvm2-thinp-kdump

For integration tests, the tests are now re-written in TMT. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT.

Barathrum-Liu

I like this change.

To be honest, I don't have enough knowledge about github actions to confirm the correctness of this PR. But I think it will not affect the functionality of kdump-utils itself, I am willing to give a try.

prudo1

Hi @coiby,
I second @Barathrum-Liu, it's great to have those tests run as early as possible. I'm a little bit concerned about all those hard coded version being used in the tests. Is there a way to stick with some official packages instead? Or maybe let all tests run on the same container, so we only need to update the container?
Thanks
Philipp

prudo1 · 2024-07-18T09:53:48Z

.github/workflows/main.yml

+      - uses: actions/checkout@v2
+      - run: wget https://github.com/mvdan/sh/releases/download/v3.4.3/shfmt_v3.4.3_linux_amd64 -O /usr/local/bin/shfmt && chmod +x /usr/local/bin/shfmt
+      - run: shfmt -d *.sh kdumpctl mk*dumprd


is there a reason why you download shfmt directly from github instead of simply installing it via the official repo? If so could you please add a comment why you are doing this.
I'm afraid with all the hard coded versions it will be pretty painful to maintain these tests as it would require to manually update them regularly when we run into problems...

Thanks for raising the concern! I downloaded shfmt from github mainly wanting to use latest shfmt. This patch dates back to one or two years ago and that time latest shfmt is v3.4.3 but the default ubuntu image provided by github uses a much older version (in fact Github's latest ubuntu is 22.04 and still has v3.4.3). Now for new version, I use Fedora's latest docker image which ships v3.7.0 (the latest upstream version is 3.8.0) instead. So we can use a relatively new shfmt without maintaining the Github Action .yml file.

prudo1 · 2024-07-18T09:54:12Z

.github/workflows/main.yml

+    steps:
+      - uses: actions/checkout@v2
+      - run: curl -L -O https://github.com/koalaman/shellcheck/releases/download/v0.8.0/shellcheck-v0.8.0.linux.x86_64.tar.xz && tar -xJf shellcheck-v0.8.0.linux.x86_64.tar.xz


same like for shfmt

For shellcheck, I also use Fedora's latest docker image which ships shellcheck:0.9.0 (the latest upstream version is v0.10.0) in new version now.

prudo1 · 2024-07-18T09:56:36Z

tools/run-integration-tests.sh

+	USE_GUESTMOUNT=1
+fi
+
+KUMP_TEST_QEMU_TIMEOUT=20m USE_GUESTMOUNT=$USE_GUESTMOUNT BASE_IMAGE=/usr/share/cloud_images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2 RELEASE=$VERSION_ID make test-run


What happens when F40 runs out of service? Do we need to provide a new docker image then?

Current tests just fail F40 and they are difficult to fix. So I decide to abandon our own testing framework and choose TMT instead. Now the code to maintain is significantly less

47 files changed, 519 insertions(+), 1393 deletions(-)

For new approach, and there is no need to maintain the docker image.

coiby · 2024-08-01T01:38:42Z

Hi @Barathrum-Liu and @prudo1,

Thanks for the encouragements! I've pushed a new version and the biggest change is I abandon current testing framework due to some difficult issues and switch to TMT.

prudo1

Hi @coiby,
I finally had the time to take a look. Besides some nits the PR looks fine. My biggest critique is the coding style in the new scripts. There's a weird mixture of different indentations. So best to use shfmt to fix it.
In addition there are a lot of shellcheck findings for the new tools/ and tests/ directories.

tests/Makefile

tests/setup/trigger_crash/test.sh

prudo1 · 2024-08-30T13:16:24Z

tests/setup/trigger_crash/test.sh

+function get_IP() {
+    if echo $1 | grep -E -q '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; then
+        echo $1
+    else
+        host $1 | sed -n -e 's/.*has address //p' | head -n 1
+    fi
+}


Usually we don't add the function keyword. I would prefer to keep it this way. Same for assign_server_roles.

In addition get_IP only works for IPv4. Does it make sense to expand it to IPv6 as well?

I drop the function keyword now. Thanks!

As for get_ip, currently the test VMs don't have IPv6. So unless you have a test case requiring IPv6, I'll defer it to future work.

prudo1 · 2024-08-30T13:17:35Z

tests/tests/check_vmcore/test.sh

+#!/bin/sh -eux
+


Is there a reason why you use /bin/sh here? All the other scripts use /bin/bash.

Thanks for catching it! tmt tests create /test/check_vmcore -t shell simply used /bin/sh and the new version uses bash instead.

prudo1 · 2024-08-30T13:18:21Z

tests/tests/check_vmcore/test.sh

+	MAX_WAIT_TIME=300
+	_waited=0
+	while [ $_waited -le $MAX_WAIT_TIME ]; do
+		if ls -1 $path &> /dev/null; then
+			vmcore_dir=$path/$(ls -1 $path | tail -n 1)
+			break
+		fi
+		_waited=$((_waited+1))
+	done


What unit does MAX_WAIT_TIME have? Seconds? Don't we need something like sleep 1 here?

Yeah, sleep 1 is needed, thanks for catching this issue!

prudo1 · 2024-08-30T13:54:24Z

tests/setup/lvm2_thinp/test.sh

+  elif [ $TMT_REBOOT_COUNT == 1 ]; then
+    rlPhaseStartTest
+        rlRun "grep crashkernel=$_default_crashkernel /proc/cmdline"
+    rlPhaseEnd
+  fi


Do we need this here? Isn't that already done in setup/default_crashkernel? Especially, where is $default_crashkernel set in this script?

You are right, it's not needed. New version drops it.

prudo1 · 2024-08-30T13:55:45Z

.github/workflows/main.yml

+name: kdump-utils tests
+


In commit message
s/kexec-tools repo/kdump-utils repo/

Fixed in new version, thanks!

prudo1 · 2024-08-30T14:03:44Z

tools/run-integration-tests.sh

+fedora_version=40
+if [[ -n $1 ]]; then
+  fedora_version=$1
+fi


fedora_version=${1:-40}?

Applied to new version, good suggestion!

prudo1 · 2024-08-30T14:03:48Z

.github/workflows/main.yml

+  static-analysis:
+    runs-on: ubuntu-latest
+    container: docker.io/fedora:latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: dnf install shellcheck -y
+      # Currently, for kdump-utils, there is need for shellcheck to require
+      # the sourced file to give correct warnings about the checked file
+      - run: shellcheck --exclude=1090,1091 *.sh spec/*.sh kdumpctl mk*dumprd


Instead of disabling SC1091 I would add option -x so shellcheck follows the external files. That prevents false positive errors, e.g. when a pull request defines a new variable in kdump-lib.sh and references it in kdumpctl. With #33 merged there should be a directive on all sourced files now, so shellcheck knows where to find the files. If there isn't I would consider it a bug in our code we should fix rather than simply hiding the error. Same for SC1090.

Plus as above we need to add the new directories.

I disable SC109{0,1} in new version. New directories are also added now. Thanks!

prudo1 · 2024-08-30T14:03:51Z

.github/workflows/main.yml

+  format-check:
+    runs-on: ubuntu-latest
+    container: docker.io/fedora:latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: dnf install shfmt -y
+      - run: shfmt -d *.sh kdumpctl mk*dumprd


dracut also uses option -s for their tests. Shall we use that as well?
In addition we need to add the new dracut/, tools/ and tests/ directories.

I've used -s and added the new directories now, thanks!

Currently, all integration tests fail F40 because of failing to install package and enable services on the cloud image properly. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT. Signed-off-by: Coiby Xu <[email protected]>

Currently, all integration tests fail F40 because of failing to install package and enable services on the cloud image properly. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT. There are six test steps in TMT. Quoting [1], here is an overview, - discover Gather information about test cases to be executed. - provision Provision an environment for testing or use localhost. - prepare Prepare the environment for testing. - execute Run tests using the specified executor. - report Provide test results overview and send reports. - finish Perform the finishing tasks and clean up provisioned guests. And each step step is configurable via the so-called plans [2]. TMT also uses plans to group relevant tests and select needed tests. This commit adds the test of sending crash dump to a NFS target by setting two VMs using TMT's MultiHost Testing feature [3], - client Setup up NFS dump target and trigger crash - server acts as an NFS server and check sent vmcore The scaffold of TMT test project can be created as follows, mkdir tests && cd tests tmt init --template mini mv plans/example/{example,kdump}.fmf tmt tests create /setup/default_crashkernel -t beakerlib tmt tests create /setup/nfs_server -t beakerlib tmt tests create /test/trigger_crash -t beakerlib tmt tests create /test/check_vmcore -t beakerlib Then we fill in the content for created plans and tests, # tmt run -a /var/tmp/tmt/run-017 /plans/kdump discover how: fmf name: client-setup directory: /root/kdump-tests tests: /setup/kdump how: fmf name: server-setup ... execute task #4: server-test on server how: tmt summary: 4 tests executed report how: display summary: 4 tests passed finish A main.fmf is put under the root directory of tests and setup for other tests to inherit [4]. When there is a kernel panic, ssh access to the guest will exit with with 255 and TMT will retreat it as an error. We can use restart-on-exit-code=255 [5] to let TMT this is expected. [1] https://tmt.readthedocs.io/en/stable/overview.html#run [2] https://tmt.readthedocs.io/en/stable/spec/plans.html [3] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing [4] https://tmt.readthedocs.io/en/stable/guide.html#inheritance [5] https://tmt.readthedocs.io/en/latest/spec/tests.html#restart Signed-off-by: Coiby Xu <[email protected]>

Somehow the dmesg log extracted by makedumpfile vmcore-dmesg.2.txt has "[ T[0-9]]" like follows, --- /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt 2024-07-30 21:01:51.865079103 +0000 +++ /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt.2 2024-07-30 21:02:05.504060793 +0000 @@ -1,830 +1,830 @@ -[ 0.000000] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024 -[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M -[ 0.000000] BIOS-provided physical RAM map: -[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable -[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved -[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved ... +[ 0.000000] [ T0] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024 +[ 0.000000] [ T0] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M +[ 0.000000] [ T0] BIOS-provided physical RAM map: +[ 0.000000] [ T0] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable +[ 0.000000] [ T0] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved +[ 0.000000] [ T0] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved Signed-off-by: Coiby Xu <[email protected]>

To make it easier to troubleshot test failures, set root password as kdump. Then we can use "virsh console VM" to have console access to a VM. Note we can also have ssh access to the VMs using tmt [1]. For example, "tmt run login --step prepare" allows you to access the VM at the prepare step. [1] https://tmt.readthedocs.io/en/stable/examples.html#guest-login Signed-off-by: Coiby Xu <[email protected]>

Currently it takes about half an hour to finish the test on dell-per340-03. If we use a faster mirror, e.g. https://mirrors.tuna.tsinghua.edu.cn/fedora, only 4m is needed. Support using custom mirror by "tmt run --environment" e.g. tmt run --environment CUSTOM_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/fedora ... Note fedora-cisco-openh264 is disabled because it's not needed. Signed-off-by: Coiby Xu <[email protected]>

We can use "tmt run --environment KDUMP_UTILS_RPM=PATH_TO_RPM" to install custom built kdump-utils. Signed-off-by: Coiby Xu <[email protected]>

For local vmcore dumping, we only need to trigger a kernel crash and check if vmcore exists and it can re-use some setups and tests from NFS dumping. Note 1. plans/main.fmf is created for NFS and local dumping test plans to inherit. 2. Virtual tests [1] are created so local dumping can re-use the code of checking vmcore on an NFS server 3. The local test is disabled for now due to [2] 4. exit-first=true [3] is to make the executor stop executing tests once a test failure is encountered. But TMT currently has the problem of exit-first not applying across discover phase boundaries [4]. So for local dumping test plan, simply remove multiple discover phases [5]. [1] https://tmt.readthedocs.io/en/stable/guide.html#virtual-tests [2] https://bugzilla.redhat.com/show_bug.cgi?id=2270423 [3] https://tmt.readthedocs.io/en/stable/spec/plans.html#exit-first [4] teemtee/tmt#3116 [5] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing Signed-off-by: Coiby Xu <[email protected]>

TMT stores server SSH key in /var/tmp/tmt/run-*/plans/ssh/provision/server/id_ecdsa. Signed-off-by: Coiby Xu <[email protected]>

To avoid triggering the kernel panic repeatedly, this test use kexec reboot to only boot into the initrd with early kdump enabled once. Signed-off-by: Coiby Xu <[email protected]>

Use hardware.disk[1] to specify an additional disk for LVM2 thin provision as follows, provision: - name: client how: virtual connection: system hardware: disk: - size: = 40GB - size: = 1GB Note tmt doesn't support specify a size of smaller unit like 300MB. [1] https://tmt.readthedocs.io/en/stable/spec/hardware.html#disk Signed-off-by: Coiby Xu <[email protected]>

With this patch, four kinds of tests are triggered when there is a pull request created in a Github kdump-uitls repo, - format check using shfmt - static analysis using shellcheck - ShellSpec unit tests (test cases in spec/) - integration tests (tests cases in tests/) Signed-off-by: Coiby Xu <[email protected]>

coiby · 2024-10-31T07:43:44Z

Hi @coiby, I finally had the time to take a look. Besides some nits the PR looks fine. My biggest critique is the coding style in the new scripts. There's a weird mixture of different indentations. So best to use shfmt to fix it. In addition there are a lot of shellcheck findings for the new tools/ and tests/ directories.

Hi @prudo1, thanks for carefully reviewing the patch set! I believe all your concerns have been addressed now:)

coiby changed the title ~~Enable GitHub action to test the patches as early and as often as possible~~ [RFC] Enable GitHub action to test the patches as early and as often as possible Jul 3, 2024

Barathrum-Liu reviewed Jul 11, 2024

View reviewed changes

Barathrum-Liu approved these changes Jul 11, 2024

View reviewed changes

prudo1 requested changes Jul 18, 2024

View reviewed changes

coiby force-pushed the github_action branch 2 times, most recently from 1b73bcd to 267c113 Compare August 1, 2024 01:24

coiby changed the title ~~[RFC] Enable GitHub action to test the patches as early and as often as possible~~ Enable GitHub action to test the patches as early and as often as possible Aug 1, 2024

coiby force-pushed the github_action branch from 267c113 to 3994a75 Compare August 1, 2024 12:28

prudo1 self-requested a review August 30, 2024 12:42

prudo1 requested changes Aug 30, 2024

View reviewed changes

coiby force-pushed the github_action branch 2 times, most recently from b6f85a7 to e277c1b Compare October 24, 2024 10:08

coiby added 9 commits October 25, 2024 12:36

tests: install a custom built kdump-utils RPM

44329e5

We can use "tmt run --environment KDUMP_UTILS_RPM=PATH_TO_RPM" to install custom built kdump-utils. Signed-off-by: Coiby Xu <[email protected]>

tests: add SSH dumping test plan

803dad4

TMT stores server SSH key in /var/tmp/tmt/run-*/plans/ssh/provision/server/id_ecdsa. Signed-off-by: Coiby Xu <[email protected]>

tests: add NFS early dumping test plan

77f3c85

To avoid triggering the kernel panic repeatedly, this test use kexec reboot to only boot into the initrd with early kdump enabled once. Signed-off-by: Coiby Xu <[email protected]>

coiby force-pushed the github_action branch from e277c1b to c224b63 Compare October 29, 2024 09:17

coiby force-pushed the github_action branch from c224b63 to 934f624 Compare October 31, 2024 07:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable GitHub action to test the patches as early and as often as possible #21

Enable GitHub action to test the patches as early and as often as possible #21

coiby commented Jul 3, 2024 •

edited

Loading

Barathrum-Liu left a comment

prudo1 left a comment

prudo1 Jul 18, 2024

coiby Aug 1, 2024

prudo1 Jul 18, 2024

coiby Aug 1, 2024

prudo1 Jul 18, 2024

coiby Aug 1, 2024

coiby commented Aug 1, 2024

prudo1 left a comment

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

prudo1 Aug 30, 2024

coiby Oct 31, 2024

coiby commented Oct 31, 2024

		#!/bin/sh -eux

Enable GitHub action to test the patches as early and as often as possible #21

Are you sure you want to change the base?

Enable GitHub action to test the patches as early and as often as possible #21

Conversation

coiby commented Jul 3, 2024 • edited Loading

Barathrum-Liu left a comment

Choose a reason for hiding this comment

prudo1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coiby commented Aug 1, 2024

prudo1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coiby commented Oct 31, 2024

coiby commented Jul 3, 2024 •

edited

Loading