Releases: bottomless-archive-project/library-of-alexandria

1.0.0-release-candidate.6

29 Dec 15:00
Pre-release

We are getting very close to the final release of the application. If no major problem comes up, RC.7 will hopefully be the last release candidate before the live product.

Features

  • #145 Improved the logging of the applications. More needs to be done later on.

Optimization

  • #132 Made the Vault Application's archive endpoint no longer memory dependent. The Vault Application can now easily run on as little as 256 MB of RAM (depending on the number of downloaders running).
  • #131 Made the Vault Application's archive endpoint non-blocking, improving the throughput significantly.
  • #128 Updated the queue service and the applications to use native Artemis instead of the JMS implementation.
  • #126 Removed the web module from the applications that don't strictly require it.
  • #124 The Tika parser is now reused between documents, speeding up the parsing and lowering the memory requirements.
  • #123 We are using TikaInputStream whenever possible. This could lower the memory requirements for the parsing of certain document types.
  • #122 The document parsing is no longer run twice to determine the page count of the documents.

Bugfixes

  • #148 The generator retries instead of failing when parsing a WARC file.
  • #142 The vault could get overwhelmed when the downloader was processing the documents too fast. Added a circuit breaker to handle this. This is not a long-term fix, however. We want to replace the existing HTTP implementation with RSocket.
  • #140 Updated the WebClient configuration and fixed a stream timeout bug in the Downloader Application.
  • #129 The MongoDB connection was initialized twice in every application. Now we only initialize the reactive MongoDB driver.
  • #130 The Queue Application was only accessible from the localhost. This has been fixed.
  • #125 The downloader version was always 0. This has been fixed.
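
The circuit breaker mentioned in #142 can be illustrated with a small hand-written sketch. This is not the project's implementation: the class name `SimpleCircuitBreaker`, the failure threshold, the cool-down period and the injected clock are all assumptions made for the example. The idea is that a caller stops sending requests to an overloaded service after a number of consecutive failures and only tries again once a cool-down has passed.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a circuit breaker; names and thresholds are illustrative only.
final class SimpleCircuitBreaker {

    private final int failureThreshold;   // consecutive failures needed to open the breaker
    private final long openMillis;        // how long the breaker stays open
    private final LongSupplier clock;     // time source in milliseconds, injected for testability

    private int consecutiveFailures;
    private long openedAt = -1;           // -1 means the breaker is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Returns true when a request may be sent to the protected service.
    synchronized boolean allowRequest() {
        if (openedAt < 0) {
            return true;
        }
        if (clock.getAsLong() - openedAt >= openMillis) {
            // The cool-down has elapsed: close the breaker and start counting again.
            openedAt = -1;
            consecutiveFailures = 0;
            return true;
        }
        return false;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong();
        }
    }
}
```

A downloader-like caller would check `allowRequest()` before each upload, then call `recordSuccess()` or `recordFailure()` depending on the outcome. In production, `System::currentTimeMillis` could serve as the clock.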

Maintenance

  • #137 Update the Spring Boot version to 2.2.2.RELEASE.
  • #136 Update the Elasticsearch version to 7.5.

Because of the changed Artemis logic, the queue should be emptied (deleted) before starting the new version.

1.0.0-release-candidate.5

13 Nov 14:30
Pre-release

Features

  • #118 Moved the document creation logic to the Vault Application. This way only valid documents will be registered in the database.
  • #115 Moved the document validity detection to the downloader application. This way we can get rid of the REMOVED document status.
  • #102 Using Lingua for language detection, scaling the detectable languages from 15 to 50+.
  • #110 Split the document location generation logic to a new service. This will make the services easier to scale and maintain in the long run.
  • #104 Converted the Downloader Application to the reactive stack.

Bugfixes

  • #117 Fixed a bug that prevented the compression from working.
  • #109 Files were not closed when compression was used to save them to the vault.
  • #108 Removed the _class field when saving entities to the database.

Maintenance

  • #116 Update the Spring Boot version to 2.2.1.RELEASE.
  • #114 Redirects were not followed when downloading the documents.
  • #113 Fixed a race condition when downloading documents.
  • #111 Updated Elasticsearch to 7.4.2.
  • #107 Removed Micrometer as a logging statistic collector.
  • #106 Fixed the MongoDB deprecation warning on startup.

Because of the changed mappings and other improvements, a reindexing is required to migrate to this new release!

1.0.0-release-candidate.4

22 Oct 15:25
Pre-release

Features

  • #98 Optimized the Elasticsearch mappings to save space and improve indexing throughput.
  • #93 Added a document type based filtering feature to the search in the web application.
  • #91 Added a document page count based filtering feature to the search in the web application.

Bugfixes

  • #90 Fixed a bug that caused the UI to try to search when every character was deleted from the search input field.
  • #89 Fixed a bug that prevented index.html from being shown by default when "http://localhost/" was accessed.

Maintenance

  • #85 Updated MongoDB to 4.2.
  • #87 Updated to Java 13.
  • #86 Updated to Spring Boot 2.2.0.RELEASE.
  • #84 Updated to Elasticsearch 7.4.

Please update the Java version to 13 and also update the Elasticsearch version used to 7.4.0 if you index the archived documents!

Because of the changed mappings and other improvements, a reindexing is required to migrate to this new release!

1.0.0-release-candidate.3

07 Oct 19:03
Pre-release

Features

  • #78 The applications were switched to a fully reactive stack (Spring WebFlux). This was a huge change and improved the performance of the applications by anywhere between 25% and 400%, depending on the hardware used and other factors.
  • #76 Added a command that removes invalid PDF documents. (The command's name is --document-validator, the documentation will be updated later on.)
  • #82 Added a loading bar to the UI when searching.
  • #48 Added the "indexed" and "available in vault" document count to the UI.
  • #55 Added a max size filter property in the downloader application. (The property is "loa.downloader.maximum-archive-size", the default value is 8589934592 bytes.)
  • #70 Added language as a search parameter on the UI.
  • #71 Re-added the exact search removed by the previous release.
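
The default of 8589934592 bytes for "loa.downloader.maximum-archive-size" (#55) is exactly 8 GiB (8 × 1024³). The sketch below shows how such a limit could be checked. The class and method names are hypothetical, and treating the limit as inclusive is an assumption; the release notes do not say how the downloader compares sizes.

```java
// Hypothetical sketch of a maximum size filter; not taken from the downloader's code.
final class ArchiveSizeFilter {

    // 8 * 1024^3 = 8589934592 bytes (8 GiB). The value exceeds Integer.MAX_VALUE,
    // so it has to be computed and stored as a long.
    static final long DEFAULT_MAXIMUM_ARCHIVE_SIZE = 8L * 1024 * 1024 * 1024;

    // Assumption: a document exactly at the limit is still accepted.
    static boolean isWithinLimit(long sizeInBytes, long maximumArchiveSize) {
        return sizeInBytes >= 0 && sizeInBytes <= maximumArchiveSize;
    }
}
```

Keeping the size in a `long` matters here: casting 8589934592 to `int` would silently wrap around to 0.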

Maintenance

  • #80 Updated Spring Boot version to 2.1.8.RELEASE.
  • #50 Updated Elasticsearch to 7.3.1.

Deprecations

  • #77 Removed SMB as a vault backend. For more information see the ticket.

Please update the Elasticsearch version used to 7.3.1 if you index the archived documents!

1.0.0-release-candidate.2

03 Aug 13:01
Pre-release

Features

  • #67 Did some performance tuning on the indexing application. (Using CBOR for indexing etc.)
  • #65 Replaced the existing processing solution with Java Task Force in the indexer and downloader applications.
  • #59 Made the web application mobile ready (responsive).
  • #56 Added a size limit to the documents when indexing.

Bugfixes

  • #60 Fixed an exception "index has exceeded [1000000] - maximum allowed to be analyzed for highlighting." thrown by Elasticsearch while indexing.

Maintenance

  • #64 Updated Spring Boot version to 2.1.6.RELEASE.
  • #64 Updated Morphia version to 1.5.3.

1.0.0-release-candidate.1

01 Jun 10:46
Pre-release

Features

  • #46 Every application now starts on a different web port. This stops the "port already used" messages from popping up when more than one application is running on the same machine.

Bugfixes

  • #45 Checking the file size and checksum is no longer enough for duplicate detection. The document type should be included too.
  • #47 The VaultController returned 500 (Internal Server Error) when no document was found with the provided document id. This has been fixed.
  • #49 Increased indexing timeouts.
  • #52 The web application was only working from the http://localhost/ URL. This has been fixed, now you can access it with an IP address too.
  • #53 Fixed an integer overflow that killed the application while trying to load huge documents for indexing.
  • #54 Fixed an out of memory error in the Base64Encoder.
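
The duplicate detection change in #45 can be pictured as building a key from three parts instead of two. The sketch below is only an illustration: the class name `DuplicateKey`, the key format and the use of SHA-256 as the checksum are assumptions, since the release notes do not name the checksum algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: two documents count as duplicates only if type, size and checksum all match.
final class DuplicateKey {

    static String of(byte[] content, String documentType) {
        try {
            // Assumption: SHA-256 stands in for whatever checksum the project uses.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return documentType + ":" + content.length + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

With a key like this, the same bytes stored once as "PDF" and once as "DOC" produce different keys and are no longer treated as the same document.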

1.0.0-milestone.4

19 May 21:04
Pre-release

Features

  • #28 Replaced MySQL with MongoDB.
  • #30 Split the documents table to document and document_location.
  • #24 Created a vault application. This application will save the documents into the vault. The downloader and indexer applications are able to save/read documents from the vault via this application.
  • #32 Added support to save doc, docx, ppt, pptx, xls, xlsx, mobi, epub and rtf documents.
  • #43 When the documents are downloaded the Accept-Encoding header is set to GZIP. If the document's host supports the GZIP encoding then it will be used for the downloading. This will save bandwidth in the long run while sacrificing some CPU cycles.
  • #20 Added a failed to index document status.

Bugfixes

  • #38 The indexer application saved the raw document data to Elasticsearch. This has been fixed.

Maintenance

  • #34 Renamed the backend application to web application.
  • #21 Renamed the database application to administrator application.
  • #37 Updated the JSoup version to 1.12.1.
  • #39 Updated the Spring Boot version to 2.1.5.RELEASE.

Documents should be recollected because of the database changes.

Reindexing is required to migrate to this new release!

1.0.0-milestone.3

06 May 08:59
Pre-release

Features

  • #12 Added the backend application. It provides search functionality for indexed documents.
  • #14 Added a way to compress documents in the vault. The GZIP and LZMA compression algorithms are supported. The compression is done on a per document basis.
  • #16 Added a command to the database application that can go over existing documents in the vault and compress them.
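
Per-document compression, as described in #14, can be sketched with the JDK's built-in GZIP streams. This is an assumption-laden illustration rather than the vault's actual code: the class and method names are made up, and only GZIP is shown because the JDK does not ship an LZMA implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: every document is compressed and decompressed on its own.
final class DocumentCompression {

    static byte[] compress(byte[] document) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(document);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each document is compressed independently, a single document can be read back without touching any other file in the vault.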

Bugfixes

  • #15 Fixed a bug in the URLEncoder that could kill the downloader application.
  • #17 Fixed that the indexer initializer command was unable to run.
  • #18 Fixed the application using mismatched Elasticsearch dependencies (mixed versions from 6.4.x and 7.0.0).

Maintenance

  • #2 Updated the Elasticsearch version used by the indexer application to 7.0.1.
  • #22 The indexer application no longer excludes the contents of the documents. This uses more disk space but is necessary for the highlighting used by the backend application.

Running the database application to migrate to a new database schema is required before updating any other applications in this release!

Reindexing is required to migrate to this new release!

1.0.0-milestone.2

09 Apr 07:30
Pre-release

Features

  • #7 The URLs are saved percent-encoded to the MySQL database. This lowers the number of duplicate URLs.
  • #6 Added an SMB based implementation to the Vault.
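
The percent-encoding from #7 can be sketched with `java.net.URI`, whose multi-argument constructors quote characters that are illegal in a URL. The class and method names below are hypothetical, and the release notes do not say which encoder the project uses, so this only shows the encoding step itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: build a URL from its parts and return the percent-encoded, ASCII-only form.
final class UrlEncodingSketch {

    static String percentEncode(String scheme, String host, String path) {
        try {
            // The multi-argument constructor quotes illegal characters such as spaces.
            // Note that it also quotes '%', so the path must be passed in its decoded form.
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, `UrlEncodingSketch.percentEncode("http", "example.com", "/my file.pdf")` returns `http://example.com/my%20file.pdf`. Storing only this one form lets differently written copies of the same link be compared as plain strings.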

Bugfixes

  • #11 Fixed the downloader application being unable to shut down after downloading everything that was available in the source.

Maintenance

  • #9 Updated Spring Boot version to 2.1.4.RELEASE.
  • #5 Decoupled the vault location from the filesystem.

1.0.0-milestone.1

28 Mar 08:16
Pre-release

Features

  • First milestone release of the project.
  • The database/downloader/indexer applications are fully functional but not yet production ready.