Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinct values from a cube #24

Open
longhotsummer opened this issue Oct 28, 2016 · 1 comment
Open

Distinct values from a cube #24

longhotsummer opened this issue Oct 28, 2016 · 1 comment

Comments

@longhotsummer
Copy link

longhotsummer commented Oct 28, 2016

We have a postgresql table with about 28 million facts with a financial_year column. Users can use the babbage API to essentially query the distinct financial_year values, which is about 10 unique values.

Postgresql seems to be very naive when doing SELECT DISTINCT financial_year FROM table because it runs a table scan even though financial_year has an index, which takes 60+ seconds. This seems to be a known problem with postgresql.

How have others solved this problem? Do we split out the financial_year data (and all the other dimensions of a fact) into a separate table?

@akariv
Copy link
Member

akariv commented Oct 29, 2016

@longhotsummer yes, I hit that brick wall myself a couple of months ago.

I did solve it partially, by changing the DISTINCT queries to GROUP BY ones - turns out they're much quicker (still inexcusably slow, though).
The changes are in this pull request: #22

I didn't merge it yet cause tests were failing for some reason and I didn't have time to understand why (but I'm still using that branch for the time being).

If you want to CR it, and maybe try and understand why those two tests are failing, it could be great. Either way you can take this branch and see if it works for you as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants