
Benchmarks & System Requirements

Angus edited this page Jan 16, 2025 · 18 revisions

Recommendations

CPU

The more cores the merrier, but you can scale it to your system. At the moment, however, you'll need to manually configure the worker count for whichever entry point you use in lib/tasks/*.py. Long term, I intend to detect system capacity and use it to scale the number of workers.
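As a rough sketch of what that detection could look like, the standard library's `os.cpu_count()` reports the number of logical CPUs. The helper name `recommended_workers`, and the choice to leave one core free for the database and OS, are assumptions for illustration only; this is not how the project currently picks workers.

```python
import os
from typing import Optional


def recommended_workers(reserve: int = 1, cap: Optional[int] = None) -> int:
    """Hypothetical helper: derive a worker count from the logical CPU count.

    `reserve` cores are left free (e.g. for the database server and the OS),
    and `cap` optionally limits the result. Always returns at least 1.
    """
    # os.cpu_count() may return None if the count cannot be determined.
    cpus = os.cpu_count() or 1
    workers = max(1, cpus - reserve)
    if cap is not None:
        workers = min(workers, cap)
    return workers


if __name__ == "__main__":
    print(f"suggested --workers value: {recommended_workers()}")
```

On Python 3.13 and later, `os.process_cpu_count()` is an alternative that only counts the CPUs the current process is allowed to use.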

Storage Usage

Besides the code and the Python dependencies, this project downloads a lot of data from the internet, such as:

  • 10 GB of cached API requests against GIS servers
    • The cache means you can rerun the ingestion from scratch in a fraction of the original time.
  • 12 GB of compressed zips
    • Plus 86.31 GB for the same data uncompressed

Overall that's roughly 108 GB of storage. If you only intend to build the database once, you can delete this data after building it, as you won't really need it again. To do so, run the following:

rm -rf _out_{state,cache,web,zip} && mkdir -p _out_{state,cache,web,zip}
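Before deleting anything, you may want to check how much space these directories actually take up on your machine. This is a minimal standard-library sketch; the helper name `dir_size_bytes` is hypothetical, and it assumes the `_out_*` directories sit in the current working directory.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total


if __name__ == "__main__":
    for name in ("_out_state", "_out_cache", "_out_web", "_out_zip"):
        path = Path(name)
        if path.is_dir():
            print(f"{name}: {dir_size_bytes(path) / 1e9:.2f} GB")
        else:
            print(f"{name}: missing")
```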

Benchmarks

My System Benchmarks

Date | RAM | Storage | CPU | OS
< 20240601 | 128 GB | 4 TB | Apple M3 Max | macOS 15.2

Operation Performance

NSW VG LV Ingestion

This is partly a CPU-bound and partly an IO-bound operation. It will also download any zips that haven't already been downloaded.

Date | Commit | Time | Workers | Years | ZIP Already Downloaded
20250114 | 9ac07b52 | 2m 20s | 8 | 2024, 2020, 2017 | Yes
python -m lib.tasks.nsw_vg.ingest_land_values --instance ? --workers ? --truncate-raw-earlier
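If you want to record your own timings in the same `Xm Ys` style as the tables on this page, a small wrapper around `subprocess.run` and `time.perf_counter` is enough. This is a sketch: the `format_duration` and `time_command` helpers are hypothetical, it assumes you run it from the repository root, and the `--instance`/`--workers` values below are placeholders you should replace with your own choices for each `?`.

```python
import subprocess
import sys
import time


def format_duration(seconds: float) -> str:
    """Format elapsed seconds like the tables on this page, e.g. 140 -> '2m 20s'."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours}hr")
    if minutes:
        parts.append(f"{minutes}m")
    if secs or not parts:
        parts.append(f"{secs}s")
    return " ".join(parts)


def time_command(args: list[str]) -> str:
    """Run a command to completion and return its formatted wall-clock time."""
    start = time.perf_counter()
    subprocess.run(args, check=True)  # raises CalledProcessError on failure
    return format_duration(time.perf_counter() - start)


if __name__ == "__main__":
    # Example values only; substitute your own for the ? placeholders.
    elapsed = time_command([
        sys.executable, "-m", "lib.tasks.nsw_vg.ingest_land_values",
        "--instance", "1", "--workers", "8", "--truncate-raw-earlier",
    ])
    print(elapsed)
```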

NSW VG PSI Ingestion

This is partly a CPU-bound and partly an IO-bound operation.

Date | Commit | Time | Workers | DB pool size | DB batch size | Public Year Filter
20250114 | 9ac07b52 | 38m 19s | 8 | 16 | 1000 | None
20250116 | 8b279452 | 27s | 8 | 16 | 1000 | >= 2024
python -m lib.tasks.nsw_vg.ingest_property_sales --instance ? --debug \
  --workers ? --worker-db-pool-size ? --worker-db-batch-size ? \
  --truncate-earlier
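The `--worker-db-batch-size` option suggests rows are written to the database in fixed-size batches rather than one at a time. The sketch below only illustrates that general chunking technique under that assumption; it is not the project's implementation, and the `batched` helper here is hypothetical. With a batch size of 1000, a worker would, under this assumption, issue one write per 1000 rows.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items from `rows`, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        # Pull up to batch_size items; an empty list means the input is exhausted.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # 2500 rows with a batch size of 1000 -> batches of 1000, 1000 and 500.
    print([len(b) for b in batched(range(2500), 1000)])
```

On Python 3.12 and later, `itertools.batched` behaves similarly but yields tuples instead of lists.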

NSW VG Deduplicate

Date | Commit | Time
20250114 | 9ac07b52 | 49m 54s
python -m lib.tasks.nsw_vg.ingest_deduplicate \
  --instance 3 --debug --initial-truncate

NSW VG Property Description Processing

Date | Commit | Time | Workers | Sub Workers
20250114 | 9ac07b52 | 8m | 8 | 8
python -m lib.tasks.nsw_vg.ingest_property_descriptions --instance ? \
  --workers ? --sub-workers ?

ABS Ingestion (minus download times)

This is partly a CPU-bound and partly an IO-bound operation.

Date | Commit | Time | Workers
20250114 | 9ac07b52 | < 1m | 8

Gnaf Ingestion (minus download times)

This is partly a CPU-bound and partly an IO-bound operation.

Date | Commit | Time | States | Workers
20250114 | 9ac07b52 | 3m | NSW | 8

GIS Scraping

This is more network-bound, though reading from the cache would be faster if multiprocessing were added.

Date | Commit | Runtime | Cached | Scope
2025-01-14 | d528222a | 2hr 35m ??s | No | Only nsw_spatial lot feature
2025-01-14 | 9ac07b52 | 0hr 18m 40s | Yes | Only nsw_spatial lot feature
2025-01-14 | 9ac07b52 | Roughly 9hr | No | Only nsw_spatial prop feature
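One way the cache reads could be parallelised is with a process pool from `concurrent.futures`. This is a hedged sketch only: it assumes cached responses are stored as individual JSON files under `_out_cache`, which this page doesn't describe, and the `load_cached_response`/`load_all` names are hypothetical.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Any, List


def load_cached_response(path: Path) -> Any:
    """Parse one cached response file (assumed here to be JSON)."""
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def load_all(paths: List[Path], workers: int = 4) -> List[Any]:
    """Parse cached files in parallel; results keep the order of `paths`."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize > 1 cuts inter-process overhead when there are many small files.
        return list(pool.map(load_cached_response, paths, chunksize=64))


if __name__ == "__main__":
    # Hypothetical cache layout; adjust the glob to match the real one.
    cache_files = sorted(Path("_out_cache").rglob("*.json"))
    results = load_all(cache_files)
    print(f"loaded {len(results)} cached responses")
```

The `if __name__ == "__main__":` guard matters on macOS, where the default multiprocessing start method is `spawn` and worker processes re-import the main module.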