Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default limit for fetching failed rows in DbSample #1991

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NathanBick
Copy link

Github Issue: #1985

Currently the failed rows check sample fetches all rows in dataset that fails. This fetches arbitrarily large data into memory, causing bugs. A comment in the code currently acknowledges this risk. The check does not respect either the default limit or the specified limit in a check yaml.

In PR proposes to resolve this bug by always respecting the default limit of 100. It is a very simple and contained PR. However, it does not resolve the issue of not respecting a user-specified samples limit. Nonetheless, this is an improvement. It makes the situation better and no worse.

@CLAassistant
Copy link

CLAassistant commented Jan 18, 2024

CLA assistant check
All committers have signed the CLA.

Copy link

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

2 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@m1n0
Copy link
Contributor

m1n0 commented Jan 25, 2024

Hi, thanks for the contribution! Without taking the configured limit into consideration we cannot merge this as-is unfortunately. Yes indeed the comment acknowledges a potential issue if too many rows are fetched, but the way this works is that the failed rows samples queries have a built in LIMIT clause in case samples limit configuration is present, with 100 being a default.
There main reason why this is not done while fetching the results but in the query is that a large, non-limited query might cause performance issues, even if only a subset of rows is fetched afterwards.

That being said:

  • there might still be a case where LIMIT is not applied correctly - if so, could you please report it by creating an issue?
  • IF the configured sample is propagated into the DbSample fetch method then yes, limiting the result there would make 100% sense as a last resort

Copy link
Contributor

@m1n0 m1n0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just applying the limit here would break the samples limit configuration, please see my comment about how I suggest we proceed here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants