You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We have encountered a problem in our testing environment.
I have a scenario where I am running a similarity match for an address
I have created a table with following number of records
1603423
And I have created GIN index (since i am using pg_similarity) library jaccard similarity method
when I run the Explain analyze
EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad 201017' order by qsim DESC;
QUERY PLAN
Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual time=12101.194..12101.197 rows=6 loops=1)
Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)) DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93) (actual time=3516.233..12101.172 rows=6 loops=1)
Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Rows Removed by Index Recheck: 1039186
Heap Blocks: exact=58018
-> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02 rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Planning Time: 0.141 ms
Execution Time: 12101.245 ms
(11 rows)
it took 12 seconds
following are my pgconf file settings
shared buffer as 4GB
work_mem 256 MB
maintenence_work_mem 512MB
autovacuum_work_mem 20MB
My index definition is this "address_complete_address_idx1" gin (complete_address gin_similarity_ops)
It is taking the index correctly. But why it took 12 seconds to process I really don't understand.
Please help.
Thanks
C.R.Bala
The text was updated successfully, but these errors were encountered:
Your WHERE clause has low selectivity (1039192 of 1603423 rows), hence, Bitmap Heap Scan has to read and discard more than 1 million rows (actual time=3516.233..12101.172).
If you add BUFFERS as an EXPLAIN option, it shows how many rows pages it is reading.
Please find the stats as given below:
dedupechecksample=# EXPLAIN (ANALYZE, BUFFERS) select complete_address, jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad 201017' order by qsim DESC;
QUERY PLAN
Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual time=12023.801..12023.803 rows=0 loops=1)
Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)) DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=58306
-> Bitmap Heap Scan on address20 (cost=172.43..5770.80 rows=1603 width=93) (actual time=12023.793..12023.794 rows=0 loops=1)
Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Rows Removed by Index Recheck: 1039192
Heap Blocks: exact=58018
Buffers: shared hit=58306
-> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02 rows=1603 width=0) (actual time=260.443..260.443 rows=1039192 loops=1)
Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Buffers: shared hit=288
Planning:
Buffers: shared hit=1
Planning Time: 0.129 ms
Execution Time: 12025.412 ms
(16 rows)
Am I missing anything here?
If I need to improve the performance any other settings has to be done.
Please help here.
If this can't be improved they are thinking of alternatives to use faiss library.
Hi Team
We have encountered a problem in our testing environment.
I have a scenario where I am running a similarity match for an address
I have created a table with following number of records
1603423
And I have created GIN index (since i am using pg_similarity) library jaccard similarity method
when I run the Explain analyze
EXPLAIN ANALYZE select complete_address, jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017') as qsim from address where complete_address~??'raj nagar ext near ajnara integrity up ghaziabad 201017' order by qsim DESC;
QUERY PLAN
Sort (cost=5856.13..5860.14 rows=1603 width=93) (actual time=12101.194..12101.197 rows=6 loops=1)
Sort Key: (jaccard(complete_address, 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)) DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on address (cost=172.43..5770.80 rows=1603 width=93) (actual time=3516.233..12101.172 rows=6 loops=1)
Recheck Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Rows Removed by Index Recheck: 1039186
Heap Blocks: exact=58018
-> Bitmap Index Scan on address_complete_address_idx1 (cost=0.00..172.02 rows=1603 width=0) (actual time=256.037..256.037 rows=1039192 loops=1)
Index Cond: (complete_address ~?? 'raj nagar ext near ajnara integrity up ghaziabad 201017'::text)
Planning Time: 0.141 ms
Execution Time: 12101.245 ms
(11 rows)
it took 12 seconds
following are my pgconf file settings
shared buffer as 4GB
work_mem 256 MB
maintenence_work_mem 512MB
autovacuum_work_mem 20MB
My index definition is this "address_complete_address_idx1" gin (complete_address gin_similarity_ops)
It is taking the index correctly. But why it took 12 seconds to process I really don't understand.
Please help.
Thanks
C.R.Bala
The text was updated successfully, but these errors were encountered: