
PERFORMANCE: SELECT COUNT() #30

Open
kfo2010 opened this issue Sep 6, 2018 · 1 comment
kfo2010 commented Sep 6, 2018

PROBLEM: The use of SELECT COUNT() in MySQL is extremely inefficient and will not scale well on tables with millions or even billions of rows.

PROOF:
sql: EXPLAIN SELECT COUNT(1) FROM transactions; # this shows a full index/table scan of every row in the table: 1.5M rows. Full index/table scans are bad. They are SLOOOW and will not scale.

sql: EXPLAIN SELECT COUNT(id) FROM transactions; # still a full scan: 1.5M rows

sql: EXPLAIN SELECT id FROM transactions ORDER BY id LIMIT 0,1; # still a full scan: 1.5M rows

sql: EXPLAIN SELECT id FROM transactions ORDER BY id LIMIT 1; # scanned ONE row

sql: SHOW TABLE STATUS LIKE 'transactions'; # this took 80 milliseconds

RESULT: The difference is 2+ seconds vs 80 milliseconds. This is huge, and the 2+ seconds will keep growing as the data grows; the 80 milliseconds will not.
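For what it's worth, the LIMIT 1 result above generalizes to any "is there at least one row?" question; a minimal sketch (the `status` column in the second query is a hypothetical, just to show the filtered form):

```sql
-- Existence check that stops at the first matching row instead of counting.
SELECT 1 FROM transactions LIMIT 1;

-- Same idea for a filtered check; `status` is a hypothetical column here.
SELECT EXISTS(SELECT 1 FROM transactions WHERE status = 'pending') AS has_pending;
```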

SOLUTION:

  1. If you only need to check that at least one result is returned, use ORDER BY id LIMIT 0,1; instead of SELECT COUNT. Avoid COUNT wherever possible; it will not scale.
  2. SQL queries in code should be tested with EXPLAIN. Any query doing a full table scan should be removed from the code entirely and replaced with a method that can scale to billions of rows.
  3. NOTE: InnoDB (MyISAM is dead) does not store a row count and will COUNT() all rows each time.
  4. If you need table counts, try "SHOW TABLE STATUS LIKE ...". NOTE: this is an approximation, but it should be fine for large tables when you need to do some sort of paging.
  5. This should also speed up PAGING on the block explorer. I am seeing the explorer start to get slower, and this is one reason why.
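Point 4 is also queryable from information_schema, which is easier to consume from code than parsing SHOW TABLE STATUS output; a sketch:

```sql
-- Approximate row count for one table. This is the same estimate that
-- SHOW TABLE STATUS reports; for InnoDB it is NOT an exact count.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'transactions';
```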

angelexevior commented Mar 24, 2020

I know this is old, but I'm throwing this out there.
Just a thought, and a wicked technique I used in the past to optimize queries on a DB with millions of rows. Not sure it will apply here, as we don't have auto-increment fields in the DB.

Where a count was necessary, I added a primary-key auto-increment id field (if not already present). For this to work, no gaps should exist between the auto-increment ids (let's call this AIid).
The last id is your total table count... minus 1 if you want to count from 0 instead of 1.
If you need the count for a specific set of results: SELECT (MAX(AIid)-MIN(AIid)) AS counttx FROM transactions WHERE.... WHATEVER...
Of course, an auto-increment index comes with its own performance impact. Will have to test!
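A sketch of that MAX/MIN idea; note that an inclusive, gap-free id range holds MAX - MIN + 1 rows, so the bare subtraction is off by one. The BETWEEN bounds below are made-up placeholders:

```sql
-- Count a gap-free auto-increment range without scanning the rows.
-- Only valid when no AIid values inside the range have been deleted.
SELECT MAX(AIid) - MIN(AIid) + 1 AS counttx
FROM transactions
WHERE AIid BETWEEN 1000 AND 2000;  -- hypothetical range filter
```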

Thoughts?
