-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GitHub action to test the patches as early and as often as possible #21
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change.
To be honest, I don't have enough knowledge about github actions to confirm the correctness of this PR. But I think it will not affect the functionality of kdump-utils itself, I am willing to give a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @coiby,
I second @Barathrum-Liu, it's great to have those tests run as early as possible. I'm a little bit concerned about all those hard coded version being used in the tests. Is there a way to stick with some official packages instead? Or maybe let all tests run on the same container, so we only need to update the container?
Thanks
Philipp
.github/workflows/main.yml
Outdated
- uses: actions/checkout@v2 | ||
- run: wget https://github.com/mvdan/sh/releases/download/v3.4.3/shfmt_v3.4.3_linux_amd64 -O /usr/local/bin/shfmt && chmod +x /usr/local/bin/shfmt | ||
- run: shfmt -d *.sh kdumpctl mk*dumprd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason why you download shfmt directly from github instead of simply installing it via the official repo? If so could you please add a comment why you are doing this.
I'm afraid with all the hard coded versions it will be pretty painful to maintain these tests as it would require to manually update them regularly when we run into problems...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising the concern! I downloaded shfmt from github mainly wanting to use latest shfmt. This patch dates back to one or two years ago and that time latest shfmt is v3.4.3 but the default ubuntu image provided by github uses a much older version (in fact Github's latest ubuntu is 22.04 and still has v3.4.3). Now for new version, I use Fedora's latest docker image which ships v3.7.0 (the latest upstream version is 3.8.0) instead. So we can use a relatively new shfmt without maintaining the Github Action .yml file.
.github/workflows/main.yml
Outdated
steps: | ||
- uses: actions/checkout@v2 | ||
- run: curl -L -O https://github.com/koalaman/shellcheck/releases/download/v0.8.0/shellcheck-v0.8.0.linux.x86_64.tar.xz && tar -xJf shellcheck-v0.8.0.linux.x86_64.tar.xz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same like for shfmt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For shellcheck, I also use Fedora's latest docker image which ships shellcheck:0.9.0 (the latest upstream version is v0.10.0) in new version now.
tools/run-integration-tests.sh
Outdated
USE_GUESTMOUNT=1 | ||
fi | ||
|
||
KUMP_TEST_QEMU_TIMEOUT=20m USE_GUESTMOUNT=$USE_GUESTMOUNT BASE_IMAGE=/usr/share/cloud_images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2 RELEASE=$VERSION_ID make test-run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when F40 runs out of service? Do we need to provide a new docker image then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current tests just fail F40 and they are difficult to fix. So I decide to abandon our own testing framework and choose TMT instead. Now the code to maintain is significantly less
47 files changed, 519 insertions(+), 1393 deletions(-)
For new approach, and there is no need to maintain the docker image.
1b73bcd
to
267c113
Compare
Hi @Barathrum-Liu and @prudo1, Thanks for the encouragements! I've pushed a new version and the biggest change is I abandon current testing framework due to some difficult issues and switch to TMT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @coiby,
I finally had the time to take a look. Besides some nits the PR looks fine. My biggest critique is the coding style in the new scripts. There's a weird mixture of different indentations. So best to use shfmt
to fix it.
In addition there are a lot of shellcheck findings for the new tools/
and tests/
directories.
tests/setup/trigger_crash/test.sh
Outdated
function get_IP() { | ||
if echo $1 | grep -E -q '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; then | ||
echo $1 | ||
else | ||
host $1 | sed -n -e 's/.*has address //p' | head -n 1 | ||
fi | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we don't add the function
keyword. I would prefer to keep it this way. Same for assign_server_roles
.
In addition get_IP
only works for IPv4. Does it make sense to expand it to IPv6 as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I drop the function
keyword now. Thanks!
As for get_ip
, currently the test VMs don't have IPv6. So unless you have a test case requiring IPv6, I'll defer it to future work.
tests/tests/check_vmcore/test.sh
Outdated
#!/bin/sh -eux | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you use /bin/sh
here? All the other scripts use /bin/bash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching it! tmt tests create /test/check_vmcore -t shell
simply used /bin/sh
and the new version uses bash instead.
tests/tests/check_vmcore/test.sh
Outdated
MAX_WAIT_TIME=300 | ||
_waited=0 | ||
while [ $_waited -le $MAX_WAIT_TIME ]; do | ||
if ls -1 $path &> /dev/null; then | ||
vmcore_dir=$path/$(ls -1 $path | tail -n 1) | ||
break | ||
fi | ||
_waited=$((_waited+1)) | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What unit does MAX_WAIT_TIME
have? Seconds? Don't we need something like sleep 1
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sleep 1
is needed, thanks for catching this issue!
tests/setup/lvm2_thinp/test.sh
Outdated
elif [ $TMT_REBOOT_COUNT == 1 ]; then | ||
rlPhaseStartTest | ||
rlRun "grep crashkernel=$_default_crashkernel /proc/cmdline" | ||
rlPhaseEnd | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this here? Isn't that already done in setup/default_crashkernel
? Especially, where is $default_crashkernel
set in this script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, it's not needed. New version drops it.
name: kdump-utils tests | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit message
s/kexec-tools repo/kdump-utils repo/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in new version, thanks!
tools/run-integration-tests.sh
Outdated
fedora_version=40 | ||
if [[ -n $1 ]]; then | ||
fedora_version=$1 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fedora_version=${1:-40}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied to new version, good suggestion!
.github/workflows/main.yml
Outdated
static-analysis: | ||
runs-on: ubuntu-latest | ||
container: docker.io/fedora:latest | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- run: dnf install shellcheck -y | ||
# Currently, for kdump-utils, there is need for shellcheck to require | ||
# the sourced file to give correct warnings about the checked file | ||
- run: shellcheck --exclude=1090,1091 *.sh spec/*.sh kdumpctl mk*dumprd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of disabling SC1091 I would add option -x
so shellcheck follows the external files. That prevents false positive errors, e.g. when a pull request defines a new variable in kdump-lib.sh
and references it in kdumpctl
. With #33 merged there should be a directive on all sourced files now, so shellcheck knows where to find the files. If there isn't I would consider it a bug in our code we should fix rather than simply hiding the error. Same for SC1090.
Plus as above we need to add the new directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disable SC109{0,1} in new version. New directories are also added now. Thanks!
.github/workflows/main.yml
Outdated
format-check: | ||
runs-on: ubuntu-latest | ||
container: docker.io/fedora:latest | ||
steps: | ||
- uses: actions/checkout@v4 | ||
- run: dnf install shfmt -y | ||
- run: shfmt -d *.sh kdumpctl mk*dumprd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dracut
also uses option -s
for their tests. Shall we use that as well?
In addition we need to add the new dracut/
, tools/
and tests/
directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used -s
and added the new directories now, thanks!
Currently, all integration tests fail F40 because of failing to install package and enable services on the cloud image properly. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT. Signed-off-by: Coiby Xu <[email protected]>
b6f85a7
to
e277c1b
Compare
Currently, all integration tests fail F40 because of failing to install package and enable services on the cloud image properly. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT. There are six test steps in TMT. Quoting [1], here is an overview, - discover Gather information about test cases to be executed. - provision Provision an environment for testing or use localhost. - prepare Prepare the environment for testing. - execute Run tests using the specified executor. - report Provide test results overview and send reports. - finish Perform the finishing tasks and clean up provisioned guests. And each step step is configurable via the so-called plans [2]. TMT also uses plans to group relevant tests and select needed tests. This commit adds the test of sending crash dump to a NFS target by setting two VMs using TMT's MultiHost Testing feature [3], - client Setup up NFS dump target and trigger crash - server acts as an NFS server and check sent vmcore The scaffold of TMT test project can be created as follows, mkdir tests && cd tests tmt init --template mini mv plans/example/{example,kdump}.fmf tmt tests create /setup/default_crashkernel -t beakerlib tmt tests create /setup/nfs_server -t beakerlib tmt tests create /test/trigger_crash -t beakerlib tmt tests create /test/check_vmcore -t beakerlib Then we fill in the content for created plans and tests, # tmt run -a /var/tmp/tmt/run-017 /plans/kdump discover how: fmf name: client-setup directory: /root/kdump-tests tests: /setup/kdump how: fmf name: server-setup ... execute task #4: server-test on server how: tmt summary: 4 tests executed report how: display summary: 4 tests passed finish A main.fmf is put under the root directory of tests and setup for other tests to inherit [4]. When there is a kernel panic, ssh access to the guest will exit with with 255 and TMT will retreat it as an error. We can use restart-on-exit-code=255 [5] to let TMT this is expected. [1] https://tmt.readthedocs.io/en/stable/overview.html#run [2] https://tmt.readthedocs.io/en/stable/spec/plans.html [3] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing [4] https://tmt.readthedocs.io/en/stable/guide.html#inheritance [5] https://tmt.readthedocs.io/en/latest/spec/tests.html#restart Signed-off-by: Coiby Xu <[email protected]>
Somehow the dmesg log extracted by makedumpfile vmcore-dmesg.2.txt has "[ T[0-9]]" like follows, --- /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt 2024-07-30 21:01:51.865079103 +0000 +++ /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt.2 2024-07-30 21:02:05.504060793 +0000 @@ -1,830 +1,830 @@ -[ 0.000000] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024 -[ 0.000000] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M -[ 0.000000] BIOS-provided physical RAM map: -[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable -[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved -[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved ... +[ 0.000000] [ T0] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024 +[ 0.000000] [ T0] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M +[ 0.000000] [ T0] BIOS-provided physical RAM map: +[ 0.000000] [ T0] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable +[ 0.000000] [ T0] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved +[ 0.000000] [ T0] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved Signed-off-by: Coiby Xu <[email protected]>
To make it easier to troubleshot test failures, set root password as kdump. Then we can use "virsh console VM" to have console access to a VM. Note we can also have ssh access to the VMs using tmt [1]. For example, "tmt run login --step prepare" allows you to access the VM at the prepare step. [1] https://tmt.readthedocs.io/en/stable/examples.html#guest-login Signed-off-by: Coiby Xu <[email protected]>
Currently it takes about half an hour to finish the test on dell-per340-03. If we use a faster mirror, e.g. https://mirrors.tuna.tsinghua.edu.cn/fedora, only 4m is needed. Support using custom mirror by "tmt run --environment" e.g. tmt run --environment CUSTOM_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/fedora ... Note fedora-cisco-openh264 is disabled because it's not needed. Signed-off-by: Coiby Xu <[email protected]>
We can use "tmt run --environment KDUMP_UTILS_RPM=PATH_TO_RPM" to install custom built kdump-utils. Signed-off-by: Coiby Xu <[email protected]>
For local vmcore dumping, we only need to trigger a kernel crash and check if vmcore exists and it can re-use some setups and tests from NFS dumping. Note 1. plans/main.fmf is created for NFS and local dumping test plans to inherit. 2. Virtual tests [1] are created so local dumping can re-use the code of checking vmcore on an NFS server 3. The local test is disabled for now due to [2] 4. exit-first=true [3] is to make the executor stop executing tests once a test failure is encountered. But TMT currently has the problem of exit-first not applying across discover phase boundaries [4]. So for local dumping test plan, simply remove multiple discover phases [5]. [1] https://tmt.readthedocs.io/en/stable/guide.html#virtual-tests [2] https://bugzilla.redhat.com/show_bug.cgi?id=2270423 [3] https://tmt.readthedocs.io/en/stable/spec/plans.html#exit-first [4] teemtee/tmt#3116 [5] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing Signed-off-by: Coiby Xu <[email protected]>
TMT stores server SSH key in /var/tmp/tmt/run-*/plans/ssh/provision/server/id_ecdsa. Signed-off-by: Coiby Xu <[email protected]>
To avoid triggering the kernel panic repeatedly, this test use kexec reboot to only boot into the initrd with early kdump enabled once. Signed-off-by: Coiby Xu <[email protected]>
Use hardware.disk[1] to specify an additional disk for LVM2 thin provision as follows, provision: - name: client how: virtual connection: system hardware: disk: - size: = 40GB - size: = 1GB Note tmt doesn't support specify a size of smaller unit like 300MB. [1] https://tmt.readthedocs.io/en/stable/spec/hardware.html#disk Signed-off-by: Coiby Xu <[email protected]>
With this patch, four kinds of tests are triggered when there is a pull request created in a Github kdump-uitls repo, - format check using shfmt - static analysis using shellcheck - ShellSpec unit tests (test cases in spec/) - integration tests (tests cases in tests/) Signed-off-by: Coiby Xu <[email protected]>
Hi @prudo1, thanks for carefully reviewing the patch set! I believe all your concerns have been addressed now:) |
This patch set enables GitHub Action to test the patches whenever there is a pull request (PR) or a change to the PR. By testing the patches as early and as often as possible, we can prevent problems or catch problems early.
Four kinds of tests will be triggered automatically,
For integration tests, the tests are now re-written in TMT. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT.