- The user interface is a read-only PostgreSQL database in AWS RDS, with Azure authentication.
- Accessible to all current BDE users.
- Process BDE files every day except Sunday into a structure conforming to the legacy BDE Processor DB. The processing should take much less than a day.
- Optional post-processing pulled in from a separate internal repository.
- All changes for a single day are imported within a single transaction.
- Immediate notification of completed import (like the legacy BDE Processor SNS topic).
- Documentation (including SQL) of how the simplified layers are generated.
- Schema changes synced with users.
- End user documentation about how
- Table versioning. TBD: Verify that the known use cases are still valid:
  - Positioning uses it ("mostly parcels") for further analysis in QGIS.
  - The Koordinates import process requires it for a speed-up.
- Keep BDE files. This could be done as a separate system, but that would complicate the architecture and release process. TBD: How long do we need to keep these files? Keeping them indefinitely would be very costly. A year would be cheaper, but completely arbitrary. Keeping just the last L0 and the associated L5 files would require some code, but is probably the minimum we could keep to enable a restore from a blank slate. Also, which use cases do we have for this data?
- Thorough testing of every part of the process before merging any pull requests.
- Separate, managed accounts for infrastructure testing and production.
- Trivial deployment.
- Monitor performance characteristics.
- Verbose logging to a central Elasticsearch instance.
- Alerts:
  - Process start/finish with result and auxiliary information in case of failure.
  - Big subprocesses' start/finish.
- No/minimal Koordinates changes necessary, since these would be very costly.
- Identical schema to current BDE. This ensures users do not need to change anything in their import process.
- Pull rather than push data from Landonline. This way we can keep the entire system together, rather than having a costly and awkward split between on-premise and cloud architecture.
- Continuous deployment. Any time a change is merged to the main branch some smoke tests are performed and the result is deployed to production.
- React to change rather than run on a timer, where possible. Ideally the BDE Processor should know about new files as soon as the last file has been closed for writing, and should immediately start processing.
- From-scratch deployment friendly. That is, we should always have the option to deploy from scratch, deleting all the resources we've created and getting back to an identical state (from the end users' perspective) in a timely fashion (that is, before the next data dump).
- Public access, which in turn requires:
  - A separate user with read-only access to only the appropriately licensed tables (some have a restrictive customer license).
  - Some way to make sure users know about the Creative Commons licensing of the content, maybe by requiring an API key to access it.
- Minimal code changes for upstream schema changes. For example, rather than using complex schema migration code, we should be able to just change the ETL code and redeploy.
- Read replica use does not affect the performance of the pipeline to LDS. In practice this means using a separate read replica for the LDS pipeline.
- Zero configuration. The service should rely on Azure SSO so we don't need to maintain users. We should not need to do any IP whitelisting. The current version of the BDE Processor should only support the current version of Landonline.
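
The single-transaction import and completion notification requirements above can be illustrated with a minimal sketch. It is not the BDE Processor's actual code: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the `import_day` function, the `(sql, params)` change format and the `notify` callback (standing in for something like an SNS publish) are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, Iterable, Tuple

# Hypothetical shape of one change: an SQL statement plus its parameters.
Change = Tuple[str, tuple]


def import_day(
    conn: sqlite3.Connection,
    changes: Iterable[Change],
    notify: Callable[[str], None],
) -> bool:
    """Apply one day's changes atomically, then report the outcome.

    sqlite3 stands in for PostgreSQL here; `notify` is a hypothetical
    callback standing in for a real notification channel.
    """
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises, so readers see either all
        # of the day's changes or none of them.
        with conn:
            for sql, params in changes:
                conn.execute(sql, params)
    except sqlite3.Error as error:
        notify(f"import failed: {error}")
        return False
    # Notify only after the commit, so consumers never react to a
    # partially imported day.
    notify("import completed")
    return True
```

If any statement in the batch fails, the whole day is rolled back and the failure is reported, which matches the requirement to include auxiliary information in failure alerts. A real implementation would need the equivalent transaction handling from whichever PostgreSQL driver is chosen.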
graph LR
subgraph Landonline
informix[(Informix)] -->|daily| file-share[file share]
end
subgraph processor-graph[BDE Processor]
file-share --> processor[/processor/] --> reference[(reference)]
end
reference -->|read-only| LDS
reference -->|read-only| public
- Original BDE Processor and dependent repositories
- bde-processor-deployment wiki
- Internal documentation