Releases: abrom/henkei

v2.2.1.2

24 Feb 04:19
0b8e79a

What's Changed

  • Update gemspec to allow Ruby 3.1 by @abrom in #26

Full Changelog: v2.2.1.1...v2.2.1.2

v2.2.1.1

27 Dec 06:59
3fca9d2

Update Apache Tika to v2.2.1

Tika 2.2.1 includes log4j 2.17.0

v2.2.0.1

18 Dec 10:24
8e37793

Update Apache Tika to v2.2.0

Apache Tika v2.x brings some changes. One key change is that the Tika client and server applications have
been split up. To keep the gem size down, Henkei only includes the client app. As a result, each call to
Henkei starts a new Java process, which runs your command and then terminates.
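The process-per-call model can be illustrated with a minimal sketch. It is not Henkei's code: a child Ruby interpreter stands in for the Tika client, and the constant `CHILD_CMD` and the helper `run_in_fresh_process` are hypothetical names used only for this illustration.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for the Tika client command: a child Ruby
# interpreter that prints its own process ID and exits.
CHILD_CMD = [RbConfig.ruby, '-e', 'print Process.pid'].freeze

# Start a new child process, wait for it to finish, and return the
# PID it reported. Every invocation pays the full start-up cost.
def run_in_fresh_process
  output, status = Open3.capture2(*CHILD_CMD)
  raise "child exited with status #{status.exitstatus}" unless status.success?

  Integer(output)
end

first  = run_in_fresh_process
second = run_in_fresh_process
puts "parent=#{Process.pid} first child=#{first} second child=#{second}"
```

Because every call pays the start-up cost of a new process (for the real client, a new JVM), callers that need several pieces of information about one file may want to fetch them in as few calls as possible.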

Another change is the metadata keys. Many duplicate keys have been removed in favour of a more
standards-based approach. A list of the old vs new key names can be found here

Note 1: For anyone concerned about log4j's CVE-2021-44228: Tika 2.2.0 includes log4j 2.15.0 (which disables JndiLookup).

Note 2: By default, the updated Tika logs an INFO message about the performance impact of the TesseractOCR library. I have made Henkei v2.x behave the same as v1.x by making the loading of the OCR library opt-in.

I've tried to disable the INFO message by specifying a Log4j configuration file (see below). However, my knowledge of Log4j is limited, and specifying the config file appears to disable logging of all messages. I don't think that is a good option, as it would also mute any "real" errors. Enabling Log4j debugging showed that the config was loading successfully, but there was still no output. Any input on how to do this "properly" would be appreciated!

java -Dlog4j.configurationFile=path/to/log4j-config.xml -Dlog4j2.debug=true -jar path/to/tika-app.jar .... etc etc ....

v1.27.1

14 Dec 01:28
8e37793

Update Apache Tika to v1.27

v1.26.1

19 Jul 15:39
d3d6ea5

Update Apache Tika to v1.26

v1.25.1

02 Feb 00:53
e84c44b

Update Apache Tika to v1.25

v1.24.1

02 Feb 00:49
e3c1b7c

Update Apache Tika to v1.24.1

v1.23.3

02 Feb 00:47
11cf3cf

PERF: Eliminate dependency on mime-types gem (#17) - @BigBigDoudou
Fix bug where Java or Jar/config paths were not escaped (#19)
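To show why unescaped paths are a problem, here is a minimal sketch using Ruby's standard `Shellwords` module. It is not the actual fix from #19; the paths and variable names below are hypothetical and were chosen only because they contain spaces and a quote character.

```ruby
require 'shellwords'

# Hypothetical install locations containing spaces and an apostrophe.
java_path = '/opt/java 17/bin/java'
jar_path  = "/srv/my app's/tika-app.jar"

# Naive interpolation: the shell would split these paths on whitespace
# and treat the apostrophe as the start of a quoted string.
unsafe_command = "#{java_path} -jar #{jar_path}"

# Escaping each argument keeps every path intact as a single shell word.
safe_command = [java_path, '-jar', jar_path].shelljoin

puts unsafe_command
puts safe_command
```

Passing the arguments as an array to `Open3.capture2` or `system` avoids the shell altogether, which sidesteps the escaping question when no shell features are needed.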

v1.23.1

02 Feb 00:46
6650406

Fix data streaming error for web-sourced PDFs (#12)

v1.23.0

26 Dec 14:36
0437714

Updated Apache Tika to v1.23 (#10)
For changes, see https://tika.apache.org/1.23/index.html

Update Tika read function to use Open3.capture2 instead of IO.popen due to memory corruption issues with certain files (#9)
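As a rough illustration of the `Open3.capture2` approach, the sketch below pipes binary data through a child process and collects its output. It is not Henkei's read function: a child Ruby interpreter that echoes stdin stands in for the Tika jar, and `ECHO_CMD` and `pipe_through` are hypothetical names.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for the Tika command: a child Ruby interpreter
# that copies its stdin to stdout unchanged, in binary mode.
ECHO_CMD = [
  RbConfig.ruby, '-e',
  '$stdin.binmode; $stdout.binmode; IO.copy_stream($stdin, $stdout)'
].freeze

# Send `data` to the command's stdin and return everything it writes
# to stdout. `binmode: true` keeps the pipes binary so bytes are not
# transcoded, and capture2 reads the output while it writes the input.
def pipe_through(cmd, data)
  output, status = Open3.capture2(*cmd, stdin_data: data, binmode: true)
  raise "command failed with status #{status.exitstatus}" unless status.success?

  output
end

# 256 KiB of binary data covering every byte value.
payload = (0..255).to_a.pack('C*') * 1024
result  = pipe_through(ECHO_CMD, payload)
puts "sent #{payload.bytesize} bytes, received #{result.bytesize} bytes"
```

`Open3.capture2` also returns the child's `Process::Status`, so a failed run can be detected instead of silently returning partial output.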