Releases: bottomless-archive-project/library-of-alexandria
1.3.0
The documentation for this release is available here.
Features
- #283 Migrated the UI of the Web Application to the latest version of Angular. This change required a full rewrite of the UI.
- #306 Improved the loading speed of the dashboard screen by parallelizing queries and requests to MongoDB and Elasticsearch.
- #284 Added a debug screen to the dashboard where an individual document's database information can be requested by providing the document's id.
- #285 Added loading indicators for the search and dashboard pages.
- #297 Added an option to configure the MongoDB connection with a URI.
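The dashboard speed-up in #306 comes from running independent lookups concurrently. A minimal sketch of that idea with `CompletableFuture` follows. The class, record, and supplier names are hypothetical, and the suppliers stand in for the real MongoDB and Elasticsearch calls, which are not shown in these notes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

public class DashboardLoader {

    // Hypothetical holder for two figures a dashboard might display.
    public record DashboardStats(long documentCount, long indexedCount) {}

    // Starts both lookups at once and combines the results when both have finished.
    public static DashboardStats load(Supplier<Long> mongoCount, Supplier<Long> elasticCount) {
        CompletableFuture<Long> documents = CompletableFuture.supplyAsync(mongoCount);
        CompletableFuture<Long> indexed = CompletableFuture.supplyAsync(elasticCount);
        return documents.thenCombine(indexed, DashboardStats::new).join();
    }

    public static void main(String[] args) {
        // Stand-ins for the real database calls (assumed values).
        DashboardStats stats = load(() -> 1_000L, () -> 900L);
        System.out.println(stats);
    }
}
```

With sequential calls the total wait is the sum of both lookups; started together, it is roughly the slower of the two.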
Documentation
- #303 Updated the hardware recommendations in the documentation.
- #301 Added documentation for the folder-based "downloading".
Bugfixes
- #254 Fixed a bug that killed the Generator Application in rare cases.
1.2.0
For a full changelog between 1.2.0 and 1.1.1 please check the changelogs of the release candidates and milestones.
The documentation for this release is available here.
Bugfixes
- #293 Fixed a buffer leak in the Indexer Application.
- #294 Files that had no content were never set to INDEXED. Because of this, these documents were re-parsed every time the Indexer Application was started.
- #291 The DebugController returned DocumentEntities instead of a DebugResponse. This made the response cluttered with some meaningless information.
1.2.0-milestone.1
The next release will most likely be 1.2.0.
The documentation for this release is available here.
Features
- #152 Replaced the Vault Application's web endpoints with RSocket-based ones. This makes indexing more performant.
- #278 Created a debug endpoint to make querying a document's info from the database easier. Later on, this endpoint will be integrated into the administrator dashboard on the Web Application. Until then, it can be called manually from the browser.
- #273 Upgraded the application to Java 16.
- #270 Renamed the INDEXING_FAILURE document status to CORRUPT. The new name describes the state of the document more accurately: corruption is the cause, and an indexing failure is just a symptom. The INDEXING_FAILURE status will be removed in the future. Please use the cleanup command to remove documents that have this status from the database.
- #271 Created a cleanup command that removes the corrupt documents from the database and corresponding vaults. It removes every document that has the status INDEXING_FAILURE or CORRUPT.
- #282 Re-added some optimizations that were removed when the whole application was moved to the reactive stack.
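The selection rule of the cleanup command (#271) can be sketched as a filter over document statuses. This is only an illustration: the status names come from these notes, but the `Document` record, the enum, and the in-memory list are assumptions standing in for the real database and vault access.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class CleanupSketch {

    // Status names are taken from the release notes; the enum itself is illustrative.
    public enum DocumentStatus { DOWNLOADED, INDEXED, INDEXING_FAILURE, CORRUPT }

    // Hypothetical minimal document: an id and a status.
    public record Document(String id, DocumentStatus status) {}

    // Both the legacy and the new status qualify for removal.
    private static final Set<DocumentStatus> REMOVABLE =
            EnumSet.of(DocumentStatus.INDEXING_FAILURE, DocumentStatus.CORRUPT);

    // Removes every document with a removable status and returns the removed ids.
    public static List<String> cleanup(List<Document> documents) {
        List<String> removedIds = new ArrayList<>();
        documents.removeIf(document -> {
            boolean remove = REMOVABLE.contains(document.status());
            if (remove) {
                removedIds.add(document.id());
            }
            return remove;
        });
        return removedIds;
    }
}
```

Treating both statuses the same way means documents marked before the rename in #270 are cleaned up together with newly marked ones.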
Bugfixes
- #267 Fixed a bug that made the Vault Application occasionally fail because it tried to read the queue on multiple threads while initializing.
1.1.1
1.1.0
For a full changelog between 1.1.0 and 1.0.1 please check the changelogs of the release candidates and milestones.
The documentation for this release is available here.
Features
- #227 Created an official webpage for the project.
Documentation
- #262 Fixed that the documentation still mentioned Java 14 while the requirement is Java 15.
- #263 Added the Indexer Application's missing concurrent-indexer-threads property to the documentation.
- #258 Added the Web Application's missing queue properties to the documentation.
- #256 Added the Generator Application's missing skip lines property to the documentation.
- #253 The documentation mentioned commoncrawl while the parameter should be common-crawl.
Bugfixes
- #261 Fixed that the mapper.json file was not found when the Indexer Application was started as a JAR file.
- #260 The skip lines property in the Generator Application was an integer instead of long.
- #259 Bad URLs were saved to the location database because of a faulty toString() call.
- #255 The Common Crawl document source was always registered, even when file was specified as the document source.
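Why the type change in #260 matters can be shown with a value that does not fit into an int. The sketch below is an assumption about how such a property might be parsed; the class and method names are hypothetical and not taken from the Generator Application.

```java
public class SkipLinesProperty {

    // Parses the property as a long so values above Integer.MAX_VALUE are accepted.
    public static long parseSkipLines(String rawValue) {
        return Long.parseLong(rawValue.trim());
    }

    public static void main(String[] args) {
        String raw = "3000000000"; // larger than Integer.MAX_VALUE (2147483647)
        System.out.println(parseSkipLines(raw));
        try {
            Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // The same value is rejected when an int is expected.
            System.out.println("does not fit in an int");
        }
    }
}
```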
Maintenance
- #257 Updated Spring Boot version to 2.4.2.
1.1.0-release-candidate.2
The next release will be the final (non-rc) release for 1.1.0. There is a feature-freeze in place until the 1.1.0 release.
The documentation for this release is available here.
Features
- #246 Switched to the new Spring Boot Docker plugin for publishing, which removes one more dependency.
- #230 Added a log that prints the WARC file's location that is under processing by the Generator Application.
- #235 Replaced the Spring Boot logo on startup with our own.
- #231 The Web Application shows unavailable vaults on the dashboard.
- #220 The indexer now initializes itself in Elasticsearch without running the initialize-indexer command.
- #203 Created a diagram about the flow and layout of the applications, added it to the documentation.
Optimization
- #251 Optimized the document location generating by doing the validation in the main source module.
- #245 Made the configuration properties final everywhere in the code.
- #242 Added Checkstyle to the project.
- #240 Improved the speed of the WARC based document location generator.
- #238 Made the recompress endpoint reactive.
- #228 Refactored a class that used a deprecated method in the ArchivingService.
- #221 Refactored a class that used a deprecated method in the DocumentSearchService.
- #226 Fixed minor typos in the documentation.
- #225 Fixed warnings in the queue service.
Bugfixes
- #239 Fixed a Chart.js exception that occurred when there was a lot of data on the dashboard screen in the Web Application.
- #223 Fixed a bug where documents were marked as indexing failures by accident.
1.1.0-release-candidate.1
The 1.1.0 version was officially intended to be released without any RC releases. Unfortunately, this is no longer possible because, for personal reasons, the necessary test systems are not available to the main developer at the moment. The 1.1.0 release will probably happen in January 2021.
The documentation for this release is available here.
Features
- #201 Created a dashboard for the Web Application.
- #41 Added the ability to configure more than one vault.
- #95 Dockerised the project. The docker containers will be released alongside this release and the upcoming ones as well.
- #3 Created a reindex command that reverts every already indexed document to the DOWNLOADED status.
- #206 Added new languages to the language detector (Ganda, Shona, Sotho, Swahili, Tsonga, Tswana, Xhosa, Yoruba, Zulu). Removed Norwegian in favor of Bokmal and Nynorsk.
- #207 Replaced the existing flag-based language display with a new solution that shows the name of the language instead.
- #200 Added the modification-enabled configuration flag to the Vault Application. If this flag is set to true, the existing archived documents in the vault can be modified or even removed.
- #202 Added a remove-document endpoint to the Vault Application. For this endpoint to work, modifications must be enabled on the vault.
- #178 Added an AWS S3 compatible storage backend for the Vault Application.
- #186 Removed the version 1 fix from the FileVaultLocationFactory. This is a breaking change for anybody using milestone versions before 1.0.0-RC1. We don't know about any instances that seriously use these versions so no upgrade path is provided for this issue.
- #31 Added the ability to skip lines at the beginning of the source file when using the file-based document location service.
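The line skipping added in #31 can be sketched with a stream over the source lines. This is a minimal illustration, not the project's actual reader: the class and method names are hypothetical, and a `StringReader` stands in for the source file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class SkipLinesSketch {

    // Reads every line after the first skipLines lines; the reader is closed on exit.
    public static List<String> readLocations(Reader source, long skipLines) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines().skip(skipLines).collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a source file with one location per line (made-up URLs).
        String file = "http://example.com/a.pdf\nhttp://example.com/b.pdf\nhttp://example.com/c.pdf\n";
        System.out.println(readLocations(new StringReader(file), 2));
    }
}
```

Skipping is useful for resuming a long run: the lines that were already processed can be passed over without editing the source file.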
Optimization
- #211 Improved the parallel processing when using the WARC location generator.
Bugfixes
- #197 Fixed a rarely occurring bug that stopped the ClientSession from closing when deallocation happened.
- #196 The BufferedReader in the FileDocumentLocationFactory was not closed correctly.
- #192 Fixed the MongoCursorNotFoundExceptions.
1.0.1
1.0.0
This is our first fully-fledged release. It was tested by collecting more than 10 million documents. Unfortunately, it is not perfect, but with some manual intervention it is capable of collecting and indexing millions of documents. We will focus on more stability fixes in the upcoming 1.1.0 release.
The documentation for this release is available here.
Features
- #179 - We created new and extended documentation for this release. It's available here.
- #187 - We disabled and minimized the extra (noisy) logging that is happening in the background. We weren't able to disable everything we wanted because of some technical limitations but we will try to follow up on them in the upcoming releases.
Bugfixes
- #189 - The Queue Application had multiple problems (partially saved messages) while storing documents. This has been fixed.
- #188 - Sometimes HTML pages that were collected as PDF documents were able to pollute the database/archive. This has been fixed; we made the document verification a lot more robust.
- #181 - The parser's InputStream was not closed correctly. Because of this, it left temporary files on the disk.
- #176 - The duplicate document detection was failing occasionally.
1.0.0-release-candidate.7
The next release will be 1.0.0. Until then a feature freeze is in place.
Features
- #172 Replaced the Spring Boot based MongoDB implementation with a pure MongoDB one. The representation of a document in the database also changed significantly. Because of this, all documents should be recollected. This was the last time that such a thing was needed; in the future we will provide migration scripts for the database.
- #154 The Downloader and the Vault Application use a queue to communicate.
- #153 Updated to Lingua 0.6.1, adding 11 new languages to the existing language detection solution (Armenian, Bosnian, Azerbaijani, Esperanto, Georgian, Kazakh, Macedonian, Marathi, Mongolian, Serbian, Ukrainian).
- #158 Improved the logging of the applications to make them easier to diagnose when an error occurs.
- #138 Added a retry mechanism in the case when the server responds with TOO_MANY_REQUESTS.
- #161 Archiving can now be done from a folder (using the Downloader Application).
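A retry on TOO_MANY_REQUESTS, as mentioned in #138, could look roughly like the sketch below. The notes do not say how the project implements it, so the attempt limit, the doubling delay, and all names here are assumptions; the `IntSupplier` stands in for a real HTTP request and returns a status code.

```java
import java.util.function.IntSupplier;

public class RetrySketch {

    public static final int TOO_MANY_REQUESTS = 429;

    // Repeats the request while it answers 429, up to maxAttempts calls in total.
    // Returns the last status code seen.
    public static int executeWithRetry(IntSupplier request, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == TOO_MANY_REQUESTS; attempt++) {
            try {
                Thread.sleep(delay);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up on further retries.
                Thread.currentThread().interrupt();
                return status;
            }
            delay *= 2; // exponential backoff, an assumption rather than the project's policy
            status = request.getAsInt();
        }
        return status;
    }
}
```

Capping the number of attempts keeps a permanently throttled server from blocking the download of other documents.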
Optimization
- #159 Improved the archiving speed of the Vault Application.
- #150 Made some optimizations with the schedulers in the Downloader Application.
Bugfixes
- #170 Fixed a memory leak in the Vault Application.