Releases: OHNLP/Backbone
Release v1.0.11
- Updated Beam version to support Spark 3.x
Release v1.0.10
- Improved the local run script: the configuration to use can now be selected interactively at runtime instead of by editing the script
- A non-interactive run is still possible by supplying the configuration name as an argument to the script, e.g.
./run_pipeline_local.sh your_config_name.json
- The script now auto-packages the Flink jarfile on new runs
Release v1.0.9
Note: This release has significant changes to the pipeline packaging and execution. Please redownload in full and copy only the "configs", "modules", and "resources" folders over from your previous setup and rerun packaging.
Changes:
- The Local Direct Runner has been removed due to performance issues and replaced with an embedded Flink cluster. 'run_pipeline_local.sh' has been updated accordingly to set up and use this embedded Flink cluster
- Please follow the instructions given during the install process, which runs automatically the first time 'run_pipeline_local.sh' is called.
- Example configuration updated for PASC/RECOVER task. Please update your NLP run configurations accordingly
Release v1.0.8
Transient Release. Use v1.0.9 instead.
Release v1.0.7
Draft Release Autonomously Generated By CI
Release v1.0.6
Note: This release has significant changes to the pipeline packaging and execution. Please redownload in full and copy only the "configs", "modules", and "resources" folders over from your previous setup and rerun packaging.
Changes:
- JDBC reads are now performed in parallel using OFFSET/FETCH (or the equivalent, depending on SQL dialect)
- Pipeline options for setting parallel read optimizations on JDBCExtract have been added. Please refer to the new example configs
- Of particular note, because each parallel query must be sorted on the SQL server side (required to ensure result consistency across parallel queries), please ensure:
  - batch_size is reasonably large, as memory permits. A reasonable starting point for textual narratives is a batch size of 10000
  - The identifier_col parameter is highly recommended: it should be unique and, if possible, numeric (although this is not required). It is even better if this column is indexed on the SQL side, as it will be used for sorting optimization. If the parameter is not provided, Backbone will default to sorting on all columns in column declaration order for result consistency, which may be slow.
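To make the sorting requirement concrete, the sketch below shows how a single read could be split into ordered OFFSET/FETCH pages. It is an illustration only, not Backbone's actual implementation: the function name `build_batch_queries`, the table `notes`, and the columns `note_id` and `note_text` are hypothetical, and the SQL uses SQL Server-style OFFSET/FETCH syntax, which other dialects express differently.

```python
from typing import List, Optional, Sequence


def build_batch_queries(
    base_query: str,
    total_rows: int,
    batch_size: int,
    columns: Sequence[str],
    identifier_col: Optional[str] = None,
) -> List[str]:
    """Split one query into ordered OFFSET/FETCH pages (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Sort on the identifier column when one is given; otherwise fall back to
    # every column in declaration order so that pages do not overlap.
    order_by = identifier_col if identifier_col else ", ".join(columns)
    queries = []
    for offset in range(0, total_rows, batch_size):
        queries.append(
            f"SELECT * FROM ({base_query}) AS src "
            f"ORDER BY {order_by} "
            f"OFFSET {offset} ROWS FETCH NEXT {batch_size} ROWS ONLY"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    for query in build_batch_queries(
        "SELECT note_id, note_text FROM notes",
        total_rows=25000,
        batch_size=10000,
        columns=["note_id", "note_text"],
        identifier_col="note_id",
    ):
        print(query)
```

Every page repeats the same ORDER BY, which is why a larger batch size (fewer sorted queries) and an indexed identifier column both reduce load on the SQL server.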
Release v1.0.5
Transitional/Draft Release. Do not use.
Release v1.0.4
Transitional/Draft Release. Do not use.
Release v1.0.3
Explicitly exit after the main thread unblocks from pipeline completion, to clean up other hanging threads.
Release v1.0.2
- Platform-specific builds for:
  - Direct Local (Debugging)
  - Apache Flink
  - Google Cloud Platform Dataflow
  - Apache Spark
    - Standalone/Local Mode (v2.4.7)
    - Cluster Mode (v2.x)
- Updated example configs