Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable GitHub action to test the patches as early and as often as possible #21

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

coiby
Copy link
Member

@coiby coiby commented Jul 3, 2024

This patch set enables GitHub Action to test the patches whenever there is a pull request (PR) or a change to the PR. By testing the patches as early and as often as possible, we can prevent problems or catch problems early.

Four kinds of tests will be triggered automatically,

For integration tests, the tests are now re-written in TMT. TMT (Test Management Tool) supports running tests on virtual machines and provides an user-friendly interface (YAML) to define tasks like installing packages, running commands on the virtual machines. By using TMT, we no longer need to spend time and efforts on maintaining the current test framework and can also enjoy rich handy features like managing tests provided by TMT.

@coiby coiby changed the title Enable GitHub action to test the patches as early and as often as possible [RFC] Enable GitHub action to test the patches as early and as often as possible Jul 3, 2024
Copy link
Collaborator

@Barathrum-Liu Barathrum-Liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change.

To be honest, I don't have enough knowledge about github actions to confirm the correctness of this PR. But I think it will not affect the functionality of kdump-utils itself, I am willing to give a try.

Copy link
Collaborator

@prudo1 prudo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @coiby,
I second @Barathrum-Liu, it's great to have those tests run as early as possible. I'm a little bit concerned about all those hard coded version being used in the tests. Is there a way to stick with some official packages instead? Or maybe let all tests run on the same container, so we only need to update the container?
Thanks
Philipp

Comment on lines 9 to 12
- uses: actions/checkout@v2
- run: wget https://github.com/mvdan/sh/releases/download/v3.4.3/shfmt_v3.4.3_linux_amd64 -O /usr/local/bin/shfmt && chmod +x /usr/local/bin/shfmt
- run: shfmt -d *.sh kdumpctl mk*dumprd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why you download shfmt directly from github instead of simply installing it via the official repo? If so could you please add a comment why you are doing this.
I'm afraid with all the hard coded versions it will be pretty painful to maintain these tests as it would require to manually update them regularly when we run into problems...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the concern! I downloaded shfmt from github mainly wanting to use latest shfmt. This patch dates back to one or two years ago and that time latest shfmt is v3.4.3 but the default ubuntu image provided by github uses a much older version (in fact Github's latest ubuntu is 22.04 and still has v3.4.3). Now for new version, I use Fedora's latest docker image which ships v3.7.0 (the latest upstream version is 3.8.0) instead. So we can use a relatively new shfmt without maintaining the Github Action .yml file.

Comment on lines 15 to 17
steps:
- uses: actions/checkout@v2
- run: curl -L -O https://github.com/koalaman/shellcheck/releases/download/v0.8.0/shellcheck-v0.8.0.linux.x86_64.tar.xz && tar -xJf shellcheck-v0.8.0.linux.x86_64.tar.xz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same like for shfmt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For shellcheck, I also use Fedora's latest docker image which ships shellcheck:0.9.0 (the latest upstream version is v0.10.0) in new version now.

USE_GUESTMOUNT=1
fi

KUMP_TEST_QEMU_TIMEOUT=20m USE_GUESTMOUNT=$USE_GUESTMOUNT BASE_IMAGE=/usr/share/cloud_images/Fedora-Cloud-Base-Generic.x86_64-40-1.14.qcow2 RELEASE=$VERSION_ID make test-run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when F40 runs out of service? Do we need to provide a new docker image then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current tests just fail F40 and they are difficult to fix. So I decide to abandon our own testing framework and choose TMT instead. Now the code to maintain is significantly less

47 files changed, 519 insertions(+), 1393 deletions(-)

For new approach, and there is no need to maintain the docker image.

@coiby coiby force-pushed the github_action branch 2 times, most recently from 1b73bcd to 267c113 Compare August 1, 2024 01:24
@coiby
Copy link
Member Author

coiby commented Aug 1, 2024

Hi @Barathrum-Liu and @prudo1,

Thanks for the encouragements! I've pushed a new version and the biggest change is I abandon current testing framework due to some difficult issues and switch to TMT.

@coiby coiby changed the title [RFC] Enable GitHub action to test the patches as early and as often as possible Enable GitHub action to test the patches as early and as often as possible Aug 1, 2024
@prudo1 prudo1 self-requested a review August 30, 2024 12:42
Copy link
Collaborator

@prudo1 prudo1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @coiby,
I finally had the time to take a look. Besides some nits the PR looks fine. My biggest critique is the coding style in the new scripts. There's a weird mixture of different indentations. So best to use shfmt to fix it.
In addition there are a lot of shellcheck findings for the new tools/ and tests/ directories.

tests/Makefile Outdated Show resolved Hide resolved
tests/setup/trigger_crash/test.sh Outdated Show resolved Hide resolved
Comment on lines 5 to 11
function get_IP() {
if echo $1 | grep -E -q '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'; then
echo $1
else
host $1 | sed -n -e 's/.*has address //p' | head -n 1
fi
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we don't add the function keyword. I would prefer to keep it this way. Same for assign_server_roles.

In addition get_IP only works for IPv4. Does it make sense to expand it to IPv6 as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I drop the function keyword now. Thanks!

As for get_ip, currently the test VMs don't have IPv6. So unless you have a test case requiring IPv6, I'll defer it to future work.

Comment on lines 1 to 2
#!/bin/sh -eux

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why you use /bin/sh here? All the other scripts use /bin/bash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching it! tmt tests create /test/check_vmcore -t shell simply used /bin/sh and the new version uses bash instead.

Comment on lines 8 to 16
MAX_WAIT_TIME=300
_waited=0
while [ $_waited -le $MAX_WAIT_TIME ]; do
if ls -1 $path &> /dev/null; then
vmcore_dir=$path/$(ls -1 $path | tail -n 1)
break
fi
_waited=$((_waited+1))
done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What unit does MAX_WAIT_TIME have? Seconds? Don't we need something like sleep 1 here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sleep 1 is needed, thanks for catching this issue!

Comment on lines 42 to 46
elif [ $TMT_REBOOT_COUNT == 1 ]; then
rlPhaseStartTest
rlRun "grep crashkernel=$_default_crashkernel /proc/cmdline"
rlPhaseEnd
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this here? Isn't that already done in setup/default_crashkernel? Especially, where is $default_crashkernel set in this script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it's not needed. New version drops it.

Comment on lines +1 to +2
name: kdump-utils tests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit message
s/kexec-tools repo/kdump-utils repo/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in new version, thanks!

Comment on lines 7 to 10
fedora_version=40
if [[ -n $1 ]]; then
fedora_version=$1
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fedora_version=${1:-40}?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied to new version, good suggestion!

Comment on lines 14 to 22
static-analysis:
runs-on: ubuntu-latest
container: docker.io/fedora:latest
steps:
- uses: actions/checkout@v4
- run: dnf install shellcheck -y
# Currently, for kdump-utils, there is need for shellcheck to require
# the sourced file to give correct warnings about the checked file
- run: shellcheck --exclude=1090,1091 *.sh spec/*.sh kdumpctl mk*dumprd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of disabling SC1091 I would add option -x so shellcheck follows the external files. That prevents false positive errors, e.g. when a pull request defines a new variable in kdump-lib.sh and references it in kdumpctl. With #33 merged there should be a directive on all sourced files now, so shellcheck knows where to find the files. If there isn't I would consider it a bug in our code we should fix rather than simply hiding the error. Same for SC1090.

Plus as above we need to add the new directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disable SC109{0,1} in new version. New directories are also added now. Thanks!

Comment on lines 6 to 12
format-check:
runs-on: ubuntu-latest
container: docker.io/fedora:latest
steps:
- uses: actions/checkout@v4
- run: dnf install shfmt -y
- run: shfmt -d *.sh kdumpctl mk*dumprd
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dracut also uses option -s for their tests. Shall we use that as well?
In addition we need to add the new dracut/, tools/ and tests/ directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used -s and added the new directories now, thanks!

Currently, all integration tests fail F40 because of failing to install
package and enable services on the cloud image properly. TMT (Test Management
Tool) supports running tests on virtual machines and provides an
user-friendly interface (YAML) to define tasks like installing packages,
running commands on the virtual machines. By using TMT, we no longer
need to spend time and efforts on maintaining the current test framework
and can also enjoy rich handy features like managing tests provided by TMT.

Signed-off-by: Coiby Xu <[email protected]>
@coiby coiby force-pushed the github_action branch 2 times, most recently from b6f85a7 to e277c1b Compare October 24, 2024 10:08
Currently, all integration tests fail F40 because of failing to install
package and enable services on the cloud image properly. TMT (Test Management
Tool) supports running tests on virtual machines and provides an
user-friendly interface (YAML) to define tasks like installing packages,
running commands on the virtual machines. By using TMT, we no longer
need to spend time and efforts on maintaining the current test framework
and can also enjoy rich handy features like managing tests provided by TMT.

There are six test steps in TMT. Quoting [1], here is an overview,
 - discover
   Gather information about test cases to be executed.

 - provision
   Provision an environment for testing or use localhost.

 - prepare
   Prepare the environment for testing.

 - execute
   Run tests using the specified executor.

 - report
   Provide test results overview and send reports.

 - finish
   Perform the finishing tasks and clean up provisioned guests.

And each step step is configurable via the so-called plans [2]. TMT also
uses plans to group relevant tests and select needed tests.

This commit adds the test of sending crash dump to a NFS target by
setting two VMs using TMT's MultiHost Testing feature [3],
 - client
   Setup up NFS dump target and trigger crash
 - server
   acts as an NFS server and check sent vmcore

The scaffold of TMT test project can be created as follows,

    mkdir tests && cd tests
    tmt init --template mini
    mv plans/example/{example,kdump}.fmf

    tmt tests create /setup/default_crashkernel -t beakerlib
    tmt tests create /setup/nfs_server -t beakerlib
    tmt tests create /test/trigger_crash -t beakerlib
    tmt tests create /test/check_vmcore -t beakerlib

Then we fill in the content for created plans and tests,

    # tmt run -a
    /var/tmp/tmt/run-017

    /plans/kdump
        discover
            how: fmf
            name: client-setup
            directory: /root/kdump-tests
            tests: /setup/kdump
            how: fmf
            name: server-setup
    ...
            execute task #4: server-test on server
            how: tmt

            summary: 4 tests executed
        report
            how: display
            summary: 4 tests passed
        finish

A main.fmf is put under the root directory of tests and setup for other
tests to inherit [4].

When there is a kernel panic, ssh access to the guest will exit with
with 255 and TMT will retreat it as an error. We can use
restart-on-exit-code=255 [5] to let TMT this is expected.

[1] https://tmt.readthedocs.io/en/stable/overview.html#run
[2] https://tmt.readthedocs.io/en/stable/spec/plans.html
[3] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing
[4] https://tmt.readthedocs.io/en/stable/guide.html#inheritance
[5] https://tmt.readthedocs.io/en/latest/spec/tests.html#restart

Signed-off-by: Coiby Xu <[email protected]>
Somehow the dmesg log extracted by makedumpfile vmcore-dmesg.2.txt has
"[      T[0-9]]" like follows,

    --- /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt      2024-07-30 21:01:51.865079103 +0000
    +++ /var/crash/192.168.122.27-2024-07-30-21:01:49/vmcore-dmesg.txt.2    2024-07-30 21:02:05.504060793 +0000
    @@ -1,830 +1,830 @@
    -[    0.000000] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024
    -[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
    -[    0.000000] BIOS-provided physical RAM map:
    -[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    -[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    -[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
    ...
    +[    0.000000] [      T0] Linux version 6.8.5-301.fc40.x86_64 (mockbuild@0bc0cc78c12e4762acf61c209bd02e96) (gcc (GCC) 14.0.1 20240328 (Red Hat 14.0.1-0), GNU ld version 2.41-34.fc40) #1 SMP PREEMPT_DYNAMIC Thu Apr 11 20:00:10 UTC 2024
    +[    0.000000] [      T0] Command line: BOOT_IMAGE=(hd0,gpt3)/vmlinuz-6.8.5-301.fc40.x86_64 no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 root=UUID=8424dfd0-f878-4302-8ee5-b2ec6e5eb868 rootflags=subvol=root crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M
    +[    0.000000] [      T0] BIOS-provided physical RAM map:
    +[    0.000000] [      T0] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
    +[    0.000000] [      T0] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
    +[    0.000000] [      T0] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved

Signed-off-by: Coiby Xu <[email protected]>
To make it easier to troubleshot test failures, set root password as
kdump. Then we can use "virsh console VM" to have console access to a
VM.

Note we can also have ssh access to the VMs using tmt [1]. For example,
"tmt run login --step prepare" allows you to access the VM at the
prepare step.

[1] https://tmt.readthedocs.io/en/stable/examples.html#guest-login

Signed-off-by: Coiby Xu <[email protected]>
Currently it takes about half an hour to finish the test on
dell-per340-03. If we use a faster mirror, e.g.
https://mirrors.tuna.tsinghua.edu.cn/fedora, only 4m is needed.

Support using custom mirror by "tmt run --environment" e.g.
tmt run --environment CUSTOM_MIRROR=https://mirrors.tuna.tsinghua.edu.cn/fedora ...

Note fedora-cisco-openh264 is disabled because it's not needed.

Signed-off-by: Coiby Xu <[email protected]>
We can use "tmt run --environment KDUMP_UTILS_RPM=PATH_TO_RPM" to install
custom built kdump-utils.

Signed-off-by: Coiby Xu <[email protected]>
For local vmcore dumping, we only need to trigger a kernel crash and
check if vmcore exists and it can re-use some setups and tests from NFS
dumping.

Note
1. plans/main.fmf is created for NFS and local dumping test plans to
   inherit.
2. Virtual tests [1] are created so local dumping can re-use the code of
   checking vmcore on an NFS server
3. The local test is disabled for now due to [2]
4. exit-first=true [3] is to make the executor stop executing tests once
   a test failure is encountered. But TMT currently has the problem of
   exit-first not applying across discover phase boundaries [4]. So for local
   dumping test plan, simply remove multiple discover phases [5].

[1] https://tmt.readthedocs.io/en/stable/guide.html#virtual-tests
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2270423
[3] https://tmt.readthedocs.io/en/stable/spec/plans.html#exit-first
[4] teemtee/tmt#3116
[5] https://tmt.readthedocs.io/en/stable/guide.html#multihost-testing

Signed-off-by: Coiby Xu <[email protected]>
TMT stores server SSH key in
/var/tmp/tmt/run-*/plans/ssh/provision/server/id_ecdsa.

Signed-off-by: Coiby Xu <[email protected]>
To avoid triggering the kernel panic repeatedly, this test use kexec
reboot to only boot into the initrd with early kdump enabled once.

Signed-off-by: Coiby Xu <[email protected]>
Use hardware.disk[1] to specify an additional disk for LVM2 thin
provision as follows,

    provision:
       - name: client
         how: virtual
         connection: system
         hardware:
            disk:
             - size: = 40GB
             - size: = 1GB

Note tmt doesn't support specify a size of smaller unit like 300MB.

[1] https://tmt.readthedocs.io/en/stable/spec/hardware.html#disk

Signed-off-by: Coiby Xu <[email protected]>
With this patch, four kinds of tests are triggered when there is a
pull request created in a Github kdump-uitls repo,
   - format check using shfmt
   - static analysis using shellcheck
   - ShellSpec unit tests (test cases in spec/)
   - integration tests (tests cases in tests/)

Signed-off-by: Coiby Xu <[email protected]>
@coiby
Copy link
Member Author

coiby commented Oct 31, 2024

Hi @coiby, I finally had the time to take a look. Besides some nits the PR looks fine. My biggest critique is the coding style in the new scripts. There's a weird mixture of different indentations. So best to use shfmt to fix it. In addition there are a lot of shellcheck findings for the new tools/ and tests/ directories.

Hi @prudo1, thanks for carefully reviewing the patch set! I believe all your concerns have been addressed now:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants