Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: 400 bad request error after loading Parquet with inf values #4601

Closed
cdeil opened this issue Apr 14, 2024 · 10 comments
Closed

Fix: 400 bad request error after loading Parquet with inf values #4601

cdeil opened this issue Apr 14, 2024 · 10 comments
Assignees
Labels
Type:Bug Something isn't working

Comments

@cdeil
Copy link

cdeil commented Apr 14, 2024

Describe the bug

After loading a Parquet data files with inf values I get 400 bad request errors when trying to make a model or dashboard.

To Reproduce

Create a Parquet file that has some inf values e.g. like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({"a": [np.inf, 42, np.nan, np.inf, -np.inf, 99]})
df.to_parquet("example.parquet")

Load it with rill

Open browser dev tools and see the error.

Expected behavior

rill should be able to handle data with inf values.

Screenshots

Screenshot 2024-04-14 at 23 53 55

Desktop (please complete the following information):

rill version 0.42.3 (build commit: cd0501f date: 2024-04-01T13:39:29Z)

Chrome Version 123.0.6312.59 (Official Build) (arm64)

@cdeil cdeil added the Type:Bug Something isn't working label Apr 14, 2024
@mindspank mindspank changed the title 400 bad request error after loading Parquet with inf values Fix: 400 bad request error after loading Parquet with inf values Jun 3, 2024
@mindspank
Copy link
Contributor

@begelundmuller just a nudge so we don't miss it for this sprint.

@egor-ryashin
Copy link
Contributor

egor-ryashin commented Jun 25, 2024

stddev_pop doesn’t work when there’s Infinity value, should we return empty stats for such a column? @begelundmuller

or we can skip those values

D select case when isinf('Infinity'::DOUBLE) then 1 else 0 end as a;
┌───────┐
│   a   │
│ int32 │
├───────┤
│     1 │

@begelundmuller
Copy link
Contributor

If the function doesn't support infinity, I guess ideally we return null when infinity is present? Because we don't know if the std. dev. should be 0 or an infinity. We should also file an issue for this with DuckDB.

@egor-ryashin
Copy link
Contributor

egor-ryashin commented Jun 26, 2024

I think std dev should not work with Infinity values (the same way a division shoudn't work with zeros), those values usually filtered out.
The question if a user supplies non-prepared data - my opinion we should show at least something but indicate the data contains incorrect values. The user can filter them out in the model.

@egor-ryashin
Copy link
Contributor

How about we add additional field here so that UI can decide how to indicate it:

message NumericStatistics {
  double min = 1;
  double max = 2;
  double mean = 3;
  double q25 = 4;
  double q50 = 5;
  double q75 = 6;
  double sd = 7;
  bool inifinity = 8;  // here
}

@egor-ryashin
Copy link
Contributor

btw, we don't user stddev in UI
image

@egor-ryashin
Copy link
Contributor

image

@egor-ryashin
Copy link
Contributor

btw, we don't support Infinity in UI, min(-Infinity) returns -Infinity, right now it shows
image

@egor-ryashin
Copy link
Contributor

egor-ryashin commented Jun 26, 2024

btw, Clickhouse reports 0 for such column, and Pandas reports NaN. I prefer to rely on Pandas in statistics as it was used for statistical methods long before Clickhouse, meaning we shoudn't skip Infinity but should report incorrect values.
Anyway we can just skip using stddev as we don't use it in UI.

@egor-ryashin
Copy link
Contributor

#5156

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type:Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants