Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve species rematching process #288

Open
qifeng-bai opened this issue Oct 11, 2023 · 9 comments
Open

Improve species rematching process #288

qifeng-bai opened this issue Oct 11, 2023 · 9 comments
Assignees
Milestone

Comments

@qifeng-bai
Copy link
Contributor

qifeng-bai commented Oct 11, 2023

Once a namematching service is reindexed, the lists need to have a way to rematch all existing lists.
add a function to trace the progress. We can continue aborted mission if necessary

Memo:
Rematching after > 100,000 species, starts significantly affect CPU load

@qifeng-bai qifeng-bai self-assigned this Oct 11, 2023
@qifeng-bai qifeng-bai added this to the 4.5.0 milestone Oct 11, 2023
@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Oct 14, 2023

4.5.0 Snapshot has been deployed on lists-test. Database has been reset
Screenshot 2023-10-14 at 1 32 50 pm

We can click "Rematch" button to do full rematching. After the rematching is on process, we will see:
Screenshot 2023-10-17 at 11 10 36 am

By click, we can see the progress:
Screenshot 2023-10-17 at 11 10 30 am

@charvolant

@qifeng-bai qifeng-bai changed the title Ability of rematching again Improve species rematching process Oct 17, 2023
qifeng-bai added a commit that referenced this issue Oct 17, 2023
Track rematching process etc
qifeng-bai added a commit that referenced this issue Oct 18, 2023
qifeng-bai added a commit that referenced this issue Oct 18, 2023
#288 Record rematching status
qifeng-bai added a commit that referenced this issue Oct 18, 2023
#279 Fixing missing supplied fields
#288 Track rematching process
@qifeng-bai qifeng-bai reopened this Oct 19, 2023
@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Feb 8, 2024

We have noticed that the process is slowing down when it processes large amount species:

The list dr18679, which has 166,350 records. The time completed for the first 500 records starts from 2 seconds, increased to 14seconds (~7 seconds in reading those 500 records) after 147,500 of 166,350 has been processed.

Cpu hits > 85%, Memory 6.2G/7.6.

CPU reduced to 2% 15 secs after the process is completed.

We believed that the offset relating Hibernate pagination is the possible reason. The fellowing query were used in the rematching process

SpeciesListItem.findAllByDataResourceUid(id, [max: BATCH_SIZE, offset: offset])

` def c = SpeciesListItem.createCriteria()

                    items = c.list(max: BATCH_SIZE, offset: offset) {

                        order "id", "desc"

                        le("id", beforeId)

                    }`

I built a test case to check if the assumption is correct. I use two different query to query from the first available species id, then do 50,000 offset, 50,500, 100,000 offset. The reading speed should obviously reduce if our assumption is correct.

And also, I recorded the id of the first available specie id, No. 50,000, 100,000 id, and used the where id > offset to fetch 500 records, to check if there is any difference.

The results shew the reading speeds have no obvious difference. The each reading process has been finished in 1 second.

Here is the result:

2024-02-08 16:00:59.354  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : The list {dr18679} has totla 166350

2024-02-08 16:00:59.354  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records with offset 0

2024-02-08 16:00:59.923  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:02.928  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records with offset 50000

2024-02-08 16:01:03.053  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:03.053  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records with offset 50500

2024-02-08 16:01:03.134  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:06.139  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records with from 2*50000
2024-02-08 16:01:06.341  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:09.341  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Testing clause where id > ?? 
2024-02-08 16:01:09.342  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records from the first item

2024-02-08 16:01:09.851  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:12.857  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records from NO.50,000 

2024-02-08 16:01:12.881  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed

2024-02-08 16:01:15.887  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching 500 records from No. 100,000 

2024-02-08 16:01:15.911  INFO --- [nio-8080-exec-7] au.org.ala.specieslist.HelperService     : Fetching completed
```




@qifeng-bai
Copy link
Contributor Author

Few attempts:
DB:SpeciesListItem * :1 MatchedSpecies. When a rematched species is found , first find it from MatchedSpecies. Link it to the speciesListItem if found. Or created a new MatchedSpecies and then link.

DO NOT WORK. We batch processing 500 species (at this moment) in a transaction. So when we find a new matched species and save to DB, but this record is actually not pushed into DB until the transaction is completed

@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Jun 6, 2024

Performance records:
Process the largest list of dr18679 (6 month ago) , has 166350 species

For each 500 species rematching process, the time elapsed gradually increased from ~2.6s to ~8.5s,
The process CPU load increased from ~12% to ~26.5%
Reading speed for each 500 species increased from 0.5s to 1s

However, when system started rematching 151219 species in the list dr18685 after the previous list is completed
reading speed was reduced to 0.023s. Reason is yet known
Uploading cpu_log.txt…

We believed that querying a large dataset with offset slows down the query. Using ScorllableResults would be faster.

However, when we use this snippet of reading code. It ended with ~1s slower than 'offset'

                def session = sessionFactory.currentSession
                def hql = "FROM SpeciesListItem WHERE dataResourceUid = :listDRId AND id > :startId ORDER BY id "
                def query = session.createQuery(hql)
                query.setParameter("listDRId", listDRId)
                query.setParameter("startId", startId)
                def scrollableResults = query.scroll(ScrollMode.FORWARD_ONLY)

                while (scrollableResults.next()) {
                    def species = (SpeciesListItem) scrollableResults.get(0)
                    items.add(species)
                    if (items.size() > BATCH_SIZE ) {
                        startId = species.id
                        break;
                    }
                }

                scrollableResults.close()

                log.info("reading time: ${TimeCategory.minus(new Date(), `startReading)}")`
Rematched 500 of 166350 completed, time elapsed: 3.854 seconds, System CPU load:  0.0%, Process CPU load : 12.063%
 reading time: 1.591 seconds
Rematched 1000 of 166350 completed, time elapsed: 1.823 seconds, System CPU load:  60.8824%, Process CPU load : 12.8777%
 reading time: 1.721 seconds
 Rematched 1500 of 166350 completed, time elapsed: 1.609 seconds, System CPU load:  53.8346%, Process CPU load : 12.4646%
 reading time: 1.566 seconds

**

For improve the reading speed of scrollableResult, we need to keep Session is open and scrollableResult remains open after , for example, 500 records were loaded and being processed. The benefit is the reading speed maintain stable ~0.8s, compare with Offset solution. the time won't increase with the more records have been read

In this way, we need to create a new Session, with long timeout to ~4Hours for the largest list. Otherwise it may lead to ScrollableResults "expiring" if the cursor is left open too long without being used.

**

2024-06-07 12:02:03.907  INFO --- au.org.ala.specieslist.HelperService     : Reading 161000 / 166350 took 0.681 seconds
2024-06-07 12:02:04.673  INFO --- au.org.ala.specieslist.HelperService     : Reading 161500 / 166350 took 0.766 seconds
2024-06-07 12:02:05.516  INFO --- au.org.ala.specieslist.HelperService     : Reading 162000 / 166350 took 0.843 seconds
2024-06-07 12:02:06.352  INFO --- au.org.ala.specieslist.HelperService     : Reading 162500 / 166350 took 0.836 seconds
2024-06-07 12:02:07.115  INFO --- au.org.ala.specieslist.HelperService     : Reading 163000 / 166350 took 0.763 seconds
2024-06-07 12:02:07.740  INFO --- au.org.ala.specieslist.HelperService     : Reading 163500 / 166350 took 0.625 seconds
2024-06-07 12:02:08.674  INFO --- au.org.ala.specieslist.HelperService     : Reading 164000 / 166350 took 0.934 seconds
2024-06-07 12:02:09.709  INFO --- au.org.ala.specieslist.HelperService     : Reading 164500 / 166350 took 1.034 seconds
2024-06-07 12:02:10.726  INFO --- au.org.ala.specieslist.HelperService     : Reading 165000 / 166350 took 1.017 seconds
2024-06-07 12:02:11.492  INFO --- au.org.ala.specieslist.HelperService     : Reading 165500 / 166350 took 0.766 seconds
2024-06-07 12:02:12.213  INFO --- au.org.ala.specieslist.HelperService     : Reading 166000 / 166350 took 0.720 seconds

If we use

query.setParameter("listDRId", listDRId)
                    query.setFirstResult(offset)  // Set the starting point of the results
                    query.setMaxResults(BATCH_SIZE)  // Limit the number of results to fetch

It does not improve the reading process. The reading time starts from ~0.8s, increase slightly to 1.2s, (Without CPU burden caused by Rematching

2024-06-07 10:53:26.775  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 1500 / 166350 took 0.785 secondss
2024-06-07 10:53:27.431  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 2000 / 166350 took 0.656 secondss
2024-06-07 10:53:28.146  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 2500 / 166350 took 0.715 secondss
2024-06-07 10:53:29.066  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 3000 / 166350 took 0.920 secondss
2024-06-07 10:53:29.822  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 3500 / 166350 took 0.756 secondss

2024-06-07 10:58:26.471  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 164500 / 166350 took 1.221 secondss
2024-06-07 10:58:27.701  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 165000 / 166350 took 1.230 secondss
2024-06-07 10:58:28.830  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 from 165500 / 166350 took 1.129 secondss


@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Jun 13, 2024

Using a new session with a long timeout connection, the reading / writing speeds on processing a large list is very stable.

Starts with reading/writing : ~0.025S/0.838s from the beginning, and R/W 0.024S/0.98S

Total time cost to complete the list of dr18679 [166350] : 18 minutes, 26.660 seconds

2024-06-13 13:12:00.258  INFO --- au.org.ala.specieslist.HelperService     : Reading 500 / 166350 took 0.025 seconds
2024-06-13 13:12:02.014  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 1.748 seconds
2024-06-13 13:12:02.873  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.848 seconds
2024-06-13 13:12:02.889  INFO --- au.org.ala.specieslist.HelperService     : Reading 1000 / 166350 took 0.014 seconds
2024-06-13 13:12:04.997  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 2.108 seconds
2024-06-13 13:12:05.855  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.858 seconds
2024-06-13 13:12:05.872  INFO --- au.org.ala.specieslist.HelperService     : Reading 1500 / 166350 took 0.017 seconds
2024-06-13 13:12:08.169  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 2.297 seconds
2024-06-13 13:12:09.037  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.867 seconds
2024-06-13 13:12:09.054  INFO --- au.org.ala.specieslist.HelperService     : Reading 2000 / 166350 took 0.017 seconds
2024-06-13 13:12:10.977  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 1.922 seconds
2024-06-13 13:12:11.824  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.847 seconds
2024-06-13 13:29:37.804  INFO --- au.org.ala.specieslist.HelperService     : Reading 165000 / 166350 took 0.027 seconds
2024-06-13 13:29:40.376  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 2.572 seconds
2024-06-13 13:29:41.300  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.923 seconds
2024-06-13 13:29:41.326  INFO --- au.org.ala.specieslist.HelperService     : Reading 165500 / 166350 took 0.026 seconds
2024-06-13 13:29:43.088  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 1.762 seconds
2024-06-13 13:29:44.009  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.921 seconds
2024-06-13 13:29:44.033  INFO --- au.org.ala.specieslist.HelperService     : Reading 166000 / 166350 took 0.024 seconds
2024-06-13 13:29:46.641  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 2.608 seconds
2024-06-13 13:29:47.609  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.968 seconds
2024-06-13 13:29:49.607  INFO --- au.org.ala.specieslist.HelperService     : Rematching took 1.977 seconds
2024-06-13 13:29:50.343  INFO --- au.org.ala.specieslist.HelperService     : Saving to DB took 0.736 seconds

@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Jun 14, 2024

Full rematch testing:

~166K ~14 minutes -> 200/s
~38K  ~3m ->220/s
~1017, 6s -> 169/s
~248, 2s -> 124/s
~20, 2s -> 10/s
2024-06-14 09:40:49.416  INFO --- au.org.ala.specieslist.HelperService     : Rematching 166350 species in the list dr18679
2024-06-14 09:55:22.167  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr18679 [166350] : 14 minutes, 32.746 seconds
2024-06-14 09:55:25.278  INFO --- au.org.ala.specieslist.HelperService     : Rematching 151219 species in the list dr18685
2024-06-14 10:08:33.134  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr18685 [151219] : 13 minutes, 7.856 seconds
2024-06-14 10:08:35.488  INFO --- au.org.ala.specieslist.HelperService     : Rematching 38412 species in the list dr13406
2024-06-14 10:11:33.622  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr13406 [38412] : 2 minutes, 58.134 seconds
2024-06-14 10:11:35.864  INFO --- au.org.ala.specieslist.HelperService     : Rematching 35277 species in the list dr8309
2024-06-14 10:14:15.864  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr8309 [35277] : 2 minutes, 40.000 seconds
2024-06-14 10:51:23.047  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr8046 [7307] : 39.216 seconds
2024-06-14 10:51:25.761  INFO --- au.org.ala.specieslist.HelperService     : Rematching 7147 species in the list dr2720
2024-06-14 10:51:55.762  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr2720 [7147] : 30.001 seconds
2024-06-14 10:51:58.325  INFO --- au.org.ala.specieslist.HelperService     : Rematching 7063 species in the list dr7781
2024-06-14 10:52:26.489  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr7781 [7063] : 28.164 seconds
2024-06-14 10:52:29.533  INFO --- au.org.ala.specieslist.HelperService     : Rematching 7031 species in the list dr22372
2024-06-14 12:03:07.785  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr8126 [1017] : 6.634 seconds
2024-06-14 12:03:11.480  INFO --- au.org.ala.specieslist.HelperService     : Rematching 1017 species in the list dr8127
2024-06-14 12:03:18.303  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr8127 [1017] : 6.823 seconds
2024-06-14 12:03:21.941  INFO --- au.org.ala.specieslist.HelperService     : Rematching 1017 species in the list dr9045
2024-06-14 12:03:27.811  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr9045 [1017] : 5.870 seconds
2024-06-14 12:25:36.580  INFO --- au.org.ala.specieslist.HelperService     : Rematching 250 species in the list dr1946
2024-06-14 12:25:41.740  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr1946 [250] : 5.160 seconds
2024-06-14 12:25:45.659  INFO --- au.org.ala.specieslist.HelperService     : Rematching 250 species in the list dr1949
2024-06-14 12:25:51.606  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr1949 [250] : 5.947 seconds
2024-06-14 12:25:55.373  INFO --- au.org.ala.specieslist.HelperService     : Rematching 248 species in the list dr7389
2024-06-14 12:25:58.606  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr7389 [248] : 3.233 seconds
2024-06-14 12:26:02.377  INFO --- au.org.ala.specieslist.HelperService     : Rematching 242 species in the list dr6796
2024-06-14 13:49:56.551  INFO --- au.org.ala.specieslist.HelperService     : Rematching 38 species in the list dr11501
2024-06-14 13:50:00.274  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr11501 [19] : 3.723 seconds
2024-06-14 13:50:04.347  INFO --- au.org.ala.specieslist.HelperService     : Rematching 19 species in the list dr11500
2024-06-14 13:50:06.873  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr11500 [19] : 2.526 seconds
2024-06-14 13:50:10.948  INFO --- au.org.ala.specieslist.HelperService     : Rematching 38 species in the list dr11501
2024-06-14 13:50:14.165  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr11501 [19] : 3.217 seconds
2024-06-14 13:50:18.137  INFO --- au.org.ala.specieslist.HelperService     : Rematching 19 species in the list dr11502
2024-06-14 13:50:20.523  INFO --- au.org.ala.specieslist.HelperService     : Total time cost to complete the list of dr11502 [19] : 2.386 seconds
2024-06-14 13:50:24.471  INFO --- au.org.ala.specieslist.HelperService     : Rematching 19 species in the list dr11503

qifeng-bai added a commit that referenced this issue Jun 20, 2024
qifeng-bai added a commit that referenced this issue Jun 20, 2024
qifeng-bai added a commit that referenced this issue Jun 20, 2024
qifeng-bai added a commit that referenced this issue Jun 21, 2024
Amend the DB update script
hamzajaved-csiro added a commit that referenced this issue Jun 21, 2024
@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Jun 23, 2024

Rematching on lists-test was completed :
Total time to complete 3631/3631 lists : 1 hours, 10 minutes, 54.559 seconds

There is still a small issue in rematchLog: https://lists-test.ala.org.au/ws/rematchLog/1. [Deleted]
The last two logs were not written into the database

2024-06-21 17:11:11.248  INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService     : dr22734 [ 1 ] completed, time cost : 0.303 seconds
2024-06-21 17:11:11.250  INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService     : Rematching 1 species in the list dr22735
2024-06-21 17:11:11.551  INFO --- [0.1-8080-exec-9] au.org.ala.specieslist.HelperService     : dr22735 [ 1 ] completed, time cost : 0.301 seconds
2024-06-21 17:11:11.568  INFO --- [0.1-8080-exec-9] a.o.a.specieslist.SpeciesListController  : Total time to complete 3631/3631 lists : 1 hours, 10 minutes, 54.559 seconds

@hamzajaved-csiro
Copy link
Contributor

Probably why the status of the rematching does not change as well, remains at RUNNING

qifeng-bai added a commit that referenced this issue Jun 24, 2024
qifeng-bai added a commit that referenced this issue Jun 24, 2024
Using a long live session to process a large list
Rematching based on list not species
@qifeng-bai
Copy link
Contributor Author

qifeng-bai commented Jun 24, 2024

Another round rematching was tested. It was completed in 2 hours. https://lists-test.ala.org.au/ws/rematchLog/2

@hamzajaved-csiro and I figured out a workaround to solve the issue the last rematching log was not written in DB. However, we did not understand the exact reasons why the last log was not saved .

qifeng-bai added a commit that referenced this issue Jun 28, 2024
#288 Optimised rematch process
qifeng-bai added a commit that referenced this issue Jun 30, 2024
updates included in 5.2.0 snapshot
#292 Grails 6 upgrade
#295 Support the latest AuthService Plugin
#299 Fixed the broken list upload from Spatial
#302 Fixed download list csv issue
#306 Fixed 'family(matched)' facade
#288 Improved rematching process performance
qifeng-bai added a commit that referenced this issue Jun 30, 2024
Including updates from 5.2.1 snapshot

updates included in 5.2.0 snapshot
#292 Grails 6 upgrade
#295 Support the latest AuthService Plugin
#299 Fixed the broken list upload from Spatial
#302 Fixed download list csv issue
#306 Fixed 'family(matched)' facade
#288 Improved rematching process performance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants