Releases: bottomless-archive-project/library-of-alexandria

2.0.0

05 Jul 09:43

At least a year later than the original plans, this release is done and dusted! Sorry for the delay.

Bugfixes

  • #531 Fixed that the Vault Application used incorrect version numbers when saving documents into the database.

Maintenance

  • #518 Updated the Java version to 20.
  • #520 Updated the Angular version used on the frontend to 17.

2.0.0-milestone.1

06 Jun 08:23
50b7220
Pre-release

The pace of releases has slowed down recently. This is unfortunate, but there are a few good reasons for it:

  • The application is mature and works well. Only small improvements are missing, but the core features are already there.
  • The scope of this release is rather large, and the existing applications are quite complex as well.
  • We are working on improving the testing of the application. It doesn't add any new features but is necessary for further feature development.
  • My son was born a year ago, and he just started walking in the past 6 months, adding a lot more headaches to my day and lowering the amount of free time I have quite significantly. :)

The documentation for this release is available here.

Features

  • #463 Created the Beacon Application. It is helpful when the majority of the bandwidth usage should be done off-site. The documentation for this application is not yet available but will be added once 2.0.0 is released. Until then, you can ask the developers on Discord or in the Discussions.
  • #510 Added validation for every application parameter. If an invalid value is passed to one of them, the application will not start and will print an error message to the console.
  • #460 Added a popup to every search result that shows the source location for the document as well as more debug information.
  • #500 Renamed some of the staging directory-related parameters in every application that used temporary directories. Please review the application input parameters while updating to the new version.
  • #474 The applications auto-create the configured folders if they do not exist. This feature was added to help with the already quite overwhelming first setup of the application suite.
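
As a rough illustration of #474, the sketch below shows one way an application could create a configured folder at startup when it is missing. The ConfiguredFolders class and its ensureExists method are hypothetical names made up for this example and are not taken from the project's code; only the JDK call Files.createDirectories is real.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Library of Alexandria code base.
final class ConfiguredFolders {

    private ConfiguredFolders() {
    }

    // Creates the configured folder (and any missing parent folders).
    // Files.createDirectories does nothing if the directory already exists
    // and throws if the path exists but is not a directory.
    static Path ensureExists(String configuredLocation) throws IOException {
        if (configuredLocation == null || configuredLocation.isBlank()) {
            throw new IllegalArgumentException("The folder location must not be empty!");
        }

        return Files.createDirectories(Path.of(configuredLocation));
    }
}
```

Calling ensureExists a second time with the same location is harmless, which is why such a call can run on every startup.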

Bugfixes

  • #494 Fixed that the Vault Application's recompress endpoint updated neither the compression of the file itself nor the compression recorded for the file in the database.

Documentation

  • #473 Documented the disk usage patterns of the applications for every folder that is being used. The documentation is available here.

Maintenance

  • #493 Updated many dependencies to their latest versions.
  • #493 Updated Elasticsearch to 8.7.
  • #509 Updated the Java version to 20.
  • #505 Updated Angular version to 15.
  • #488 Updated Spring Boot version to 3.0.0.

1.10.0

18 Nov 09:48

Please upgrade the MongoDB version to 6.0 if you are upgrading from 1.9.x. Doing a full database dump/backup is highly advised before doing both the database and the LoA upgrade.

The queues should be emptied before upgrading from 1.9.x to 1.10.0!

This is the last major release before 2.0.0 is released. The new version's ETA is around Q1 of 2023 but this might change depending on the scope of the release.

The documentation for this release is available here.

Features

  • #428 The result of a download request made to a document location is now saved in the database. If the request downloaded something (even an invalid document), OK is saved as the result. In other scenarios, the error type (e.g. TIMEOUT, ORIGIN_NOT_FOUND, etc.) is saved.
  • #465 Added the duplication filtering logic to the Downloader Application as well, so it no longer sends the obvious duplicates to the Vault/Staging Applications. We kept the duplication logic in the Vault Application though, because the same duplicate can be present in the Staging Application twice while not being present in the database yet.
  • #439 Added the Brotli compression format. It is a very CPU-heavy compression algorithm but produces the best compression ratios in the vast majority of cases.
  • #425 Added support for the .txt file format.
  • #468 Added a debug endpoint for the document locations. It is very similar to the debug endpoint that existed for the documents previously.
  • #419 Added some explanation text to the administration dashboard about what each application does.
  • #461 Added the Administration Application instances and the command they are executing to the administration dashboard as well.
  • #286 Added users to the applications/UI. This is just preliminary work and the users are disabled by default. At the moment, users can be registered and can log in, but no other features are available to them. More user-related changes and features will come in the future though.
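
To make the download results from #428 more concrete, here is a minimal sketch of how the outcome of a download attempt could be turned into a stored value. Only OK, TIMEOUT and ORIGIN_NOT_FOUND are named in these notes; the UNKNOWN value, the DownloadResultMapper class, and the mapping from exceptions to results are assumptions made for this example.

```java
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.net.http.HttpTimeoutException;

// Illustrative result type. UNKNOWN is a placeholder added for this sketch.
enum DownloadResult {
    OK, TIMEOUT, ORIGIN_NOT_FOUND, UNKNOWN
}

// Hypothetical mapper, not part of the Library of Alexandria code base.
final class DownloadResultMapper {

    private DownloadResultMapper() {
    }

    // A null failure means that the request downloaded something.
    // Which exception maps to which result is an assumption.
    static DownloadResult fromFailure(Throwable failure) {
        if (failure == null) {
            return DownloadResult.OK;
        }
        if (failure instanceof SocketTimeoutException || failure instanceof HttpTimeoutException) {
            return DownloadResult.TIMEOUT;
        }
        if (failure instanceof UnknownHostException) {
            return DownloadResult.ORIGIN_NOT_FOUND;
        }

        return DownloadResult.UNKNOWN;
    }
}
```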

Bugfixes

  • #441 Disabled the HTTPS/SSL host verifications. We do not care about the validity of a certificate when downloading documents. It caused weird errors and in general slowed down the download operations (because the SSL certificates had to be downloaded, validated, etc.).
  • #458 Fixed that pushing enter while the focus was in the search input field did not start the search on the search page.
  • #459 Fixed a NullPointerException that happened when certain applications were started (Web Application for example) without any Vault Applications running.
  • #462 Fixed that the Administrator Application was missing some default configurations that were required to connect to the Conductor Application.

Documentation

  • #464 Removed some no longer used runtime parameters from the documentation (loa.queue.producer-pool-size and loa.queue.consumer-pool-size).
  • #471 Documented that the -- prefix is needed before the runtime parameters when the applications are started.
  • #470 Fixed some documentation errors regarding the loa.conductor.application-type runtime parameter.
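
The -- prefix mentioned in #471 matters because Spring Boot only treats command-line arguments that start with -- as properties. The sketch below mimics that behaviour with a tiny parser. It is a simplified stand-in written for this example, not Spring Boot's implementation, and the RuntimeParameters class is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser that only illustrates the "--" convention.
final class RuntimeParameters {

    private RuntimeParameters() {
    }

    // Collects "--name=value" arguments into a map.
    // Arguments without the "--" prefix are not treated as parameters and are skipped.
    static Map<String, String> parse(String... args) {
        Map<String, String> parameters = new LinkedHashMap<>();

        for (String arg : args) {
            if (!arg.startsWith("--")) {
                continue;
            }

            String option = arg.substring(2);
            int separator = option.indexOf('=');
            if (separator < 0) {
                parameters.put(option, "");
            } else {
                parameters.put(option.substring(0, separator), option.substring(separator + 1));
            }
        }

        return parameters;
    }
}
```

For example, parse("--loa.downloader.source=FOLDER") yields one parameter, while parse("loa.downloader.source=FOLDER") yields none.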

Maintenance

  • #432 Updated the supported MongoDB version to 6.0.
  • #437 Updated the Angular version used on the frontend to 14.1.0.
  • #438 Updated the versions of many minor dependencies used by the applications.

1.9.4

10 Nov 12:41

The documentation for this release is available here.

Bugfixes

  • #475 The Downloader Application stops after the processing of the last document is finished when the loa.downloader.source is set to FOLDER. Previously the application kept running, doing nothing at all.

Documentation

  • #478 Removed a parameter (loa.stage.location) from the documentation of the Vault Application because it was no longer used by the application after the 1.9.0 update.
  • #477 The documentation incorrectly stated the parameter name for the Staging Application's stage path. The correct name is loa.staging.location, not loa.staging.path.

1.9.3

26 Sep 07:54

The documentation for this release is available here.

Bugfixes

  • #449 Fixed that the applications page was not responsive on the administration UI.
  • #466 Fixed that changing the language or the document length did not reset the page to the first (unlike everything else on the search UI).
  • #467 Fixed that the Queue Application disregarded the data directory path when creating the paging folder. The paging folder was rarely created, which is why this bug was able to hide for so long.
  • #448 Fixed that the Staging Application was not visible on the applications admin page.

1.9.2

20 Aug 12:44

The documentation for this release is available here.

Bugfixes

  • #451 The Downloader Application did not stop when its stage folder was full or almost full. Now, when the stage folder has less free disk space than the maximum allowed document file size (8 GB by default), the Downloader Application stops.
  • #455 The parallelism level didn't work correctly above 64 threads in the Downloader Application, because the OkHTTP client used by the downloader limited the maximum open HTTP connections to 64. This has been fixed. The maximum HTTP connection count is set to the same number as the parallelism level set for the application.
  • #454 Occasionally, the applications called the Conductor Application more than once, instead of always caching the response. This has been fixed.
  • #452 The Downloader Application printed a lot of unnecessary logs. This has been changed: the application no longer prints the unnecessary parsing-related logs, but prints more fine-grained log lines that make debugging easier.
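
The free-space rule from #451 boils down to a single comparison, sketched below. The StageSpaceCheck class is a hypothetical name, and treating "8 GB" as 8 * 1024^3 bytes is an assumption; the notes do not say which unit the application uses internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the Library of Alexandria code base.
final class StageSpaceCheck {

    // Assumption: the default of "8 GB" is interpreted as 8 * 1024^3 bytes.
    static final long DEFAULT_MAX_DOCUMENT_SIZE_IN_BYTES = 8L * 1024 * 1024 * 1024;

    private StageSpaceCheck() {
    }

    // The download can continue only if the largest allowed document would still fit.
    static boolean hasRoomFor(long usableBytes, long maxDocumentSizeInBytes) {
        return usableBytes >= maxDocumentSizeInBytes;
    }

    // Reads the usable space of the file store that holds the stage folder.
    static boolean stageHasRoom(Path stageFolder, long maxDocumentSizeInBytes) throws IOException {
        long usableBytes = Files.getFileStore(stageFolder).getUsableSpace();

        return hasRoomFor(usableBytes, maxDocumentSizeInBytes);
    }
}
```

When stageHasRoom returns false, a downloader following the rule above would stop instead of fetching further documents.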

Maintenance

  • #453 The Downloader and the Indexer Applications complained in the logs about an XML parser parameter that was unsupported by the default Java parser. To fix this, an XML parser called Woodstox got included in these applications. This parser is more performant and supports the parameter required by the parsers.

1.9.1

06 Aug 12:23

The documentation for this release is available here.

Bugfixes

  • #443 The Staging Application did not delete some temporarily created files after each upload. This after some time caused the disks to fill up, effectively stopping new uploads. This has been fixed.
  • #444 There was a minor resource leak in the DocumentFolderReader class. It has been fixed; the class doesn't leak file handles anymore.
  • #445 Fixed a deadlock that could happen in the Downloader Application when files with unknown file types were being parsed.

1.9.0

20 Jul 09:53
d625054

The documentation for this release is available here.

Please update the Elasticsearch version used to 8.3.x!

Because of the changes regarding the document sending logic between the Downloader and the Vault Applications, the queues should be emptied (all messages drained) before updating to the new version.

Features

  • #369 Added a Staging Application to the application suite. It is responsible for holding the content of the documents when they are moved between the Downloader Application and the Vault Applications. The queues are no longer used to send large messages. This significantly lowered the memory requirements in the Vault Application (from ~16 GB to ~256 MB).
  • #423 Added a search button to the UI and disabled the previously existing auto-search. Now, if users want to search for documents, they have to click on the search button. This was done because the existing auto-search was unreliable. For example, when a user mistyped something and then corrected the typo, it fired two requests, using up resources.
  • #424 Added support for the FictionBook (FB2) format.
  • #419 Extended the documentation for the Queue, Vault, and Generator Applications. The new documentation changes are available on the "Applications" page on the UI (under a question mark, next to the application's name) as well as in the documentation. This makes the documentation easily accessible. This work is ongoing, and the UI will be updated with hints for every application in the future.

Bugfixes

  • #429 The VaultQueueListener in the Vault Application did not support parallelism. This made the archiving quite slow compared to how quick it could have been with parallelism. This was especially problematic when the user wanted to save to more disks at the same time. The document archiving was made to support parallelism to fix this problem.
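
To show the general idea behind #429, the sketch below hands documents to a fixed-size thread pool instead of archiving them one by one. It is not the actual VaultQueueListener; the ParallelArchiver class, its archiveAll method, and the parallelism argument are assumptions made for this example. Failures inside the archive callback are ignored here to keep the sketch short.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Library of Alexandria code base.
final class ParallelArchiver {

    private ParallelArchiver() {
    }

    // Archives the documents on "parallelism" worker threads and waits until all of them are done.
    static <T> void archiveAll(List<T> documents, int parallelism, Consumer<T> archive)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);

        for (T document : documents) {
            pool.submit(() -> archive.accept(document));
        }

        // No new tasks are accepted after this point; the queued ones still run.
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    }
}
```

With several disks behind the vault, a slow write to one disk no longer holds up the documents that are headed to the others.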

Maintenance

  • #430 Removed the connection pooling for the queue client. It is no longer necessary because all of our messages in the queue are reasonably small, and the Queue Application can handle more messages per second than the IO-bound parts of the application suite can produce. In other words, downloading a document takes significantly (!!!) longer than sending the messages through the queues.
  • #422 Updated Elasticsearch to 8.3.1 and also updated other minor dependencies to their latest versions.

1.8.1

13 Jun 18:54

The documentation for this release is available here.

Bugfixes

  • #411 Fixed that the indexing logic was not run in a loop. Before, when the Indexer Application started to run, it indexed all of the documents that were in the DB at startup time, but nothing else. Now, once the processing of the initial batch is finished, the Indexer Application requests the documents that were added while the indexing was running.
  • #414 The Indexer and Web Applications never refreshed the vaults after startup. So, if the Web Application was started first, the vaults that were started later on were never discovered.
  • #415 The Downloader Application had a resource leak in its HTTP client. This has been fixed.
  • #417 Previously, the statistics page on the administrator dashboard did not use the MongoDB indexes in the database. This made the query super slow when there were a lot of documents in the database. This has been fixed; the page now uses the indexes when querying the data. A new index was added for the document type aggregation for this reason as well. The page still loads rather slowly with ~80 million documents, so more optimizations will come.
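
The looping behaviour described in #411 can be sketched as follows. This is a simplified illustration, not the Indexer Application's code: the IndexingLoop class and the nextBatch supplier are hypothetical, and the loop ends when an empty batch arrives, whereas a long-running service would more likely wait and ask again.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Library of Alexandria code base.
final class IndexingLoop {

    private IndexingLoop() {
    }

    // Keeps requesting batches of not yet indexed documents until an empty batch is returned.
    // Returns the number of documents that were indexed.
    static <T> int run(Supplier<List<T>> nextBatch, Consumer<T> index) {
        int indexed = 0;

        List<T> batch = nextBatch.get();
        while (!batch.isEmpty()) {
            for (T document : batch) {
                index.accept(document);
                indexed++;
            }

            // Documents added while the previous batch was processed are picked up here.
            batch = nextBatch.get();
        }

        return indexed;
    }
}
```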

Maintenance

  • #416 Updated Lingua's version to 1.2.1. Usually, we do not increase library versions on minor releases, but the memory optimizations for the library, while it does the language detection, were too big to ignore.

1.8.0

07 Jun 09:17

The documentation for this release is available here.

Please update the Elasticsearch version used to 8.2.x!

The Java version should be updated to 18 for this release to run.

Features

  • #403 Reworked the search UI to be more mobile-friendly. Searching should be a lot easier on mobile.
  • #389 Split the dashboard into two pages. One of the pages is responsible for showing the statistics about the collected documents, debugging document data, and so on, while the other one is responsible for showing the administration-related information like the running applications and their parameters.
  • #404 Made the batch size configurable on the Indexer Application. This will help with the fight against the "cursor timeout" errors.
  • #388 Added the parallelism and batchSize parameters to the dashboard for the Downloader and the Indexer Applications.
  • #383 Added a download button to the PDF files next to the open/close buttons on the search page.

Bugfixes

  • #410 Fixed that the Downloader Application printed out an incorrect message when it found a document with an unknown document type.

Maintenance

  • #384 Updated the Java version to 18.
  • #406 Updated Spring Boot to version 2.7.0.
  • #390 Updated Elasticsearch to version 8.2.0.
  • #399 Updated Angular to version 14.