test: add failing test for large dataset performance (#420) #421
base: main
Conversation
- Test documents performance degradation with >10K observations
- check_model() currently takes 5+ minutes on a 25K dataset
- Test expects completion in <30 seconds
- Initially FAILS as expected (proving the bug exists)

This test will pass after implementing data sampling in plot.check_model().

Relates to easystats#420
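The failing test described in this PR might be sketched as below. This is a hypothetical reconstruction, not the actual test file: the dataset size (25K rows), the 30-second budget, and the `large_data_lm` variable name come from the PR description and commit messages, while the model formula and column names are invented for illustration.

```r
# Hypothetical sketch of the failing performance test (testthat).
library(testthat)
library(performance)

test_that("check_model() completes quickly on large datasets", {
  skip_on_cran()
  set.seed(123)
  n <- 25000
  large_data_lm <- data.frame(
    y = rnorm(n),
    x1 = rnorm(n),
    x2 = rnorm(n)
  )
  m <- lm(y ~ x1 + x2, data = large_data_lm)

  # Time the full diagnostic-plot pipeline, which is where
  # the reported 5+ minute slowdown occurs.
  elapsed <- system.time({
    plot(check_model(m))
  })["elapsed"]

  # Initially FAILS: should finish in under 30 seconds.
  expect_lt(elapsed, 30)
})
```

Wrapping the call in `system.time()` and asserting with `expect_lt()` keeps the test self-documenting: the failure message reports the actual elapsed time against the 30-second budget.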
Summary of Changes

Hello @ANAMASGARD, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a comprehensive set of R tests designed to expose and quantify a significant performance bottleneck in check_model().
Code Review
This pull request adds a new test file to document and test a performance issue with check_model() on large datasets. The new tests are designed to fail initially and will pass once the performance fix is implemented. My review focuses on improving the clarity and maintainability of the new test code: I've suggested renaming a misleadingly named test, removing an unnecessary dependency skip, and using a more specific variable name to avoid confusion. Overall, this is a good step towards addressing the performance problem.
- Rename test to better describe what it checks
- Remove unnecessary lme4 dependency skip for the lm() test
- Use a unique variable name (large_data_lm) to avoid confusion

Based on Gemini Code Assist review feedback.
- Add .sample_for_plot() helper (default 3000 points) and apply sampling in the QQ, REQQ, linearity, and homogeneity plots
- Preserve all influential points when sampling the outliers plot
- Add performance test for large datasets

Fixes easystats#420
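A helper with the behavior this commit describes could look roughly like the following. This is a sketch, not the actual implementation: only the helper's name, the 3000-point default, and the "preserve influential points" rule come from the commit message; the `keep` argument and the internal logic are assumptions.

```r
# Hypothetical sketch of the sampling helper: down-sample plot data to at
# most `n_max` rows (default 3000), always retaining rows flagged in `keep`
# (e.g. influential points) so outliers are never dropped.
.sample_for_plot <- function(data, n_max = 3000, keep = NULL) {
  n <- nrow(data)
  if (n <= n_max) {
    data
  } else {
    keep_idx <- which(keep)                      # rows that must survive
    pool <- setdiff(seq_len(n), keep_idx)        # rows eligible for sampling
    n_draw <- min(max(n_max - length(keep_idx), 0), length(pool))
    drawn <- pool[sample.int(length(pool), n_draw)]
    data[sort(c(keep_idx, drawn)), , drop = FALSE]
  }
}
```

Sorting the combined indices keeps the sampled rows in their original order, which matters for plots that connect points (e.g. reference lines in QQ panels).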
- Remove trailing whitespace from all modified files
- Fix line length in roxygen comment (split to <120 chars)
- Use sample.int() instead of sample(seq_len())
- Remove explicit return() statements per linter
- Run styler on all modified files

All tests pass. Ready for CI checks.
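The `sample.int()` change in this commit is purely stylistic: `sample(x, k)` on a vector `x` is implemented as `x[sample.int(length(x), k)]`, so for an integer sequence the two forms draw identical indices and `sample.int()` simply avoids materializing the sequence first. A quick illustration:

```r
# sample(seq_len(n), k) and sample.int(n, k) follow the same RNG path,
# so with the same seed they produce the same draw.
set.seed(42)
a <- sample(seq_len(10), 3)
set.seed(42)
b <- sample.int(10, 3)
identical(a, b)
```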
This test will pass after implementing data sampling in plot.check_model().
I will push the code with all the necessary fixes in the next commit.
Relates to #420