-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI Improvements, Minor documentation changes #18
Conversation
geotrieu
commented
Oct 27, 2023
•
edited
Loading
edited
- Changed test names to include std_ prefix for tests to be run on the CI pipeline
- Added CI testing for MLP, DLRM, and NPU tests
- CI testing displays NPU test results on the summary page
- Implemented failing states for NPU tests, both for failing to compile/run, and also failing a QoR tolerance of 50.00%
- The MLP and DLRM tests remain smoke tests, only checks for compilation success/fail.
- Removed redundant tests from the rad-sim/test directory (as they are duplicated in rad-sim/test_legacy folder already
- RAD-Sim Adder example "Running the Example" section has added instructions on how to run it (feature request)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember the baseline numbers of the NPU test were outdated. We should update them so the QoR comparison is actually 0% for all tests. Is that already done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just updated the baseline values, running now here: https://github.com/andrewboutros/rad-flow/actions/runs/6726373725
I wonder if it might be worth it to change the tolerance from 50% if all the QoR is now reset back to 0%.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing it to something like 5-10% should be better. You are right, 50% sounds like a very loose upper/lower bound. We should add to the documentation how to change this tolerance value for each test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These npu-rads/rad_{1,2,3} tests are basically simulating the NPU on the 3 RAD examples in the IEEE Access paper. They could be redundant in terms of workloads but they test different variations of RAD architectures which is something that is not covered by other test cases. We might want to think if we want to include some of these in our tests.