Benchmarks & System Requirements
The more cores the merrier, but you can scale it to suit your system. However, at the moment you'll need to manually configure whichever entry point you use in lib/tasks/*.py. Long term, I intend to detect system capacity and use it to scale the number of workers.
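Until that detection exists, here is a minimal sketch of how a sensible `--workers` value could be derived from the machine. `default_worker_count` and its `reserve`/`cap` parameters are hypothetical and not part of the project. It uses only the standard library's `os.cpu_count()`.

```python
import os
from typing import Optional


def default_worker_count(reserve: int = 1, cap: Optional[int] = None) -> int:
    """Suggest a --workers value from the CPUs visible to this process.

    Hypothetical helper: the project currently expects workers to be set by hand.
    """
    # os.cpu_count() may return None when the count can't be determined.
    cpus = os.cpu_count() or 1
    # Keep `reserve` cores free (e.g. for the database server), never dropping below 1.
    workers = max(1, cpus - reserve)
    # Optionally cap the result, e.g. to stay within database connection limits.
    if cap is not None:
        workers = min(workers, cap)
    return workers
```

The result would then be passed as the `--workers ?` argument in the commands below.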
Besides the code and the Python dependencies, this project downloads a lot of data off the net, such as:
- 10 GB of cached API requests against GIS servers
  - The cache means you can rerun the ingestion from scratch in a fraction of the time the first run took.
- 12 GB of compressed zips
  - As well as 86.31 GB of the same data uncompressed
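The cache makes reruns cheap because any request that was made before is answered from disk. Below is a minimal sketch of that idea. It assumes one file per request, keyed by a hash of the URL. `cached_fetch` and the injected `fetch` callable are hypothetical and are not the project's actual cache layer.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return the response body for `url`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key each entry by a hash of the full URL (query string included) so it maps
    # to a safe, fixed-length filename.
    entry = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if entry.exists():
        return entry.read_bytes()
    body = fetch(url)
    # Write to a temporary file, then rename, so an interrupted run never leaves
    # a half-written entry that later runs would treat as a hit.
    tmp = entry.with_suffix(".tmp")
    tmp.write_bytes(body)
    tmp.replace(entry)
    return body
```

Injecting `fetch` keeps the caching logic independent of whichever HTTP client does the real request.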
Overall that's roughly 108 GB of storage. If you only intend to build the database once, you can delete this data after the build, as you won't really need it. To do so, run the following:
rm -rf _out_{state,cache,web,zip} && mkdir -p _out_{state,cache,web,zip}
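The 108 GB figure is the sum of the downloads listed above (10 + 12 + 86.31 = 108.31 GB). Below is a minimal pre-flight check using the standard library's `shutil.disk_usage`. It assumes decimal gigabytes. The constants and `has_room` are hypothetical names. The check does not count the space the database itself will need.

```python
import shutil

# Download sizes listed above, in decimal GB (10**9 bytes).
CACHE_GB = 10
ZIPS_GB = 12
UNZIPPED_GB = 86.31
REQUIRED_GB = CACHE_GB + ZIPS_GB + UNZIPPED_GB  # about 108.31


def has_room(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes / 1e9 >= required_gb
```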
Date | RAM | Storage | CPU | OS |
---|---|---|---|---|
< 20240601 | 128 GB | 4 TB | Apple M3 Max | macOS 15.2 |
This is partly a CPU-bound and partly an IO-bound operation. It will also download any zips that haven't already been downloaded.
Date | Commit | Time | Workers | Years | ZIP Already Downloaded |
---|---|---|---|---|---|
20250114 | 9ac07b52 | 2m 20s | 8 | 2024, 2020, 2017 | Yes |
python -m lib.tasks.nsw_vg.ingest_land_values --instance ? --workers ? --truncate-raw-earlier
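The land-values ingest only fetches zips it doesn't already have; see the "ZIP Already Downloaded" column above. Below is a minimal sketch of that skip-if-present behaviour. The zips are presumably stored under _out_zip, but that is an assumption. `download_missing` and the injected `fetch` callable are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing(urls: Iterable[str], dest: Path, fetch: Callable[[str], bytes]) -> List[Path]:
    """Fetch each zip not already present in `dest`; return the newly written paths."""
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        # Name the local file after the last segment of the URL path.
        target = dest / url.rstrip("/").rsplit("/", 1)[-1]
        if target.exists():
            continue  # downloaded on an earlier run, so skip the network entirely
        # Download to a ".part" file first so a crash never leaves a truncated zip.
        tmp = target.with_name(target.name + ".part")
        tmp.write_bytes(fetch(url))
        tmp.replace(target)
        written.append(target)
    return written
```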
This is partly a CPU-bound and partly an IO-bound operation.
Date | Commit | Time | Workers | DB pool size | DB batch size | Public Year Filter |
---|---|---|---|---|---|---|
20250114 | 9ac07b52 | 38m 19s | 8 | 16 | 1000 | None |
20250116 | 8b279452 | 27s | 8 | 16 | 1000 | >= 2024 |
python -m lib.tasks.nsw_vg.ingest_property_sales --instance ? --debug \
--workers ? --worker-db-pool-size ? --worker-db-batch-size ? \
--truncate-earlier
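The runs above used a DB batch size of 1000. That setting presumably caps how many rows a worker sends to the database at once; this reading is an assumption, not confirmed by the project. Below is a minimal sketch of that kind of batching. `batched` is a hypothetical helper here. On Python 3.12+, `itertools.batched` does the same but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` rows; the final batch may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        # Pull up to `size` rows from the shared iterator.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

With a batch size of 1000, a worker holding 2,500 rows would issue three writes of 1000, 1000 and 500 rows. Larger batches mean fewer round trips but more memory per write.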
Date | Commit | Time |
---|---|---|
20250114 | 9ac07b52 | 49m 54s |
python -m lib.tasks.nsw_vg.ingest_deduplicate \
--instance 3 --debug --initial-truncate
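The runtimes in these tables are recorded as strings such as `2m 20s`, `38m 19s` and `49m 54s`. Converting them to seconds makes runs easier to compare. Below is a small hypothetical parser, not part of the project. It also accepts `hr` and `h` for hours and ignores unknown parts like `??s`.

```python
import re

# Seconds per unit; "hr" is listed before "h" in the regex so it matches first.
_UNIT_SECONDS = {"hr": 3600, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)\s*(hr|h|m|s)\b")


def to_seconds(text: str) -> int:
    """Convert a runtime such as '49m 54s' or '2hr 35m' into whole seconds."""
    matches = _TOKEN.findall(text)
    if not matches:
        raise ValueError(f"no duration found in {text!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in matches)
```

For example, `to_seconds("38m 19s") / to_seconds("27s")` is about 85. That compares the two property-sales runs, although the second run only processed years >= 2024, so it is not a like-for-like speedup.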
Date | Commit | Time | Workers | Sub Workers |
---|---|---|---|---|
20250114 | 9ac07b52 | 8m | 8 | 8 |
python -m lib.tasks.nsw_vg.ingest_property_descriptions --instance ? \
--workers ? --sub-workers ?
This is partly a CPU-bound and partly an IO-bound operation.
Date | Commit | Time | Workers |
---|---|---|---|
20250114 | 9ac07b52 | < 1m | 8 |
This is partly a CPU-bound and partly an IO-bound operation.
Date | Commit | Time | States | Workers |
---|---|---|---|---|
20250114 | 9ac07b52 | 3m | NSW | 8 |
This is mostly network-bound, although reading from the cache would be faster if multiprocessing were added.
Date | Commit | Runtime | Cached | Scope |
---|---|---|---|---|
20250114 | d528222a | 2hr 35m ??s | No | Only nsw_spatial lot feature |
20250114 | 9ac07b52 | 0hr 18m 40s | Yes | Only nsw_spatial lot feature |
20250114 | 9ac07b52 | Roughly 9hr | No | Only nsw_spatial prop feature |
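Even when cached, the lot-feature run still takes 18m 40s. Since reading from the cache would presumably benefit from multiprocessing, below is a minimal sketch of spreading cache reads across a process pool with `concurrent.futures`. It assumes each cached response is a separate file. `read_entry` and `read_cache_parallel` are hypothetical; `read_entry` stands in for whatever parsing the real task does per entry.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from pathlib import Path
from typing import List, Type


def read_entry(path: Path) -> bytes:
    # Placeholder for the real per-entry work (read + parse a cached response).
    return path.read_bytes()


def read_cache_parallel(
    paths: List[Path],
    workers: int = 8,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    chunksize: int = 64,
) -> List[bytes]:
    """Process cached entries across a pool of workers, preserving input order."""
    with executor_cls(max_workers=workers) as pool:
        # chunksize sends many small files per task to cut inter-process overhead;
        # ThreadPoolExecutor accepts but ignores it.
        return list(pool.map(read_entry, paths, chunksize=chunksize))
```

With the default `ProcessPoolExecutor`, call this from under an `if __name__ == "__main__":` guard, because macOS starts worker processes with the spawn method. Processes rather than threads are the point here: parsing is CPU work that would otherwise be serialised by the GIL.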