Updates NLM branch to 2.9.2 #5

Open: wants to merge 1,948 commits into base 2.4.1-nlm

@jamesvillarrubia jamesvillarrubia commented Jun 11, 2024

PR Summary

This PR updates the 2.4.1-nlm branch to Tika 2.9.2. In addition to this core update, it addresses the OCR issues reported in issue #50 and potentially resolves several other related issues.

Details

  • Branch Update: Upgraded from Tika 2.4.1 to 2.9.2.
  • Issue Resolution: Expected to resolve OCR issues documented in issue #50.
  • Test Status: While some tests pass, others exhibit flakiness on the Tika side. As Java is not my primary language, further review and testing are necessary.

Additional Notes

Because this PR is large, the custom functions ported from 2.4.1 have been diffed separately against 2.9.3. Those diffs are available for review here:

Custom Functions Diff

Request for Review

  • Expertise Needed: Review and confirmation from someone with more Java expertise.
  • Action Required: Please review the changes, especially the custom functions diff linked above, and confirm that the update and fixes are correctly implemented.

THausherr and others added 30 commits September 11, 2023 07:58
…2.547

Bump aws.version from 1.12.546 to 1.12.547
* TIKA-4125 -- tweak rfc822 detection a bit
* TIKA-4124 -- add test documents and turn on unit tests for altchunk in docx
Bumps `aws.version` from 1.12.547 to 1.12.548.

Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.547 to 1.12.548
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-java@1.12.547...1.12.548)

Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.547 to 1.12.548
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-java@1.12.547...1.12.548)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-s3
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.amazonaws:aws-java-sdk-transcribe
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…2.548

Bump aws.version from 1.12.547 to 1.12.548
Bumps `aws.version` from 1.12.548 to 1.12.549.

Updates `com.amazonaws:aws-java-sdk-s3` from 1.12.548 to 1.12.549
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-java@1.12.548...1.12.549)

Updates `com.amazonaws:aws-java-sdk-transcribe` from 1.12.548 to 1.12.549
- [Changelog](https://github.com/aws/aws-sdk-java/blob/master/CHANGELOG.md)
- [Commits](aws/aws-sdk-java@1.12.548...1.12.549)

---
updated-dependencies:
- dependency-name: com.amazonaws:aws-java-sdk-s3
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.amazonaws:aws-java-sdk-transcribe
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…2.549

Bump aws.version from 1.12.548 to 1.12.549
Bumps [io.projectreactor:reactor-core](https://github.com/reactor/reactor-core) from 3.5.9 to 3.5.10.
- [Release notes](https://github.com/reactor/reactor-core/releases)
- [Commits](reactor/reactor-core@v3.5.9...v3.5.10)

---
updated-dependencies:
- dependency-name: io.projectreactor:reactor-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps `reactor.netty.version` from 1.1.10 to 1.1.11.

Updates `io.projectreactor.netty:reactor-netty-core` from 1.1.10 to 1.1.11
- [Release notes](https://github.com/reactor/reactor-netty/releases)
- [Commits](reactor/reactor-netty@v1.1.10...v1.1.11)

Updates `io.projectreactor.netty:reactor-netty-http` from 1.1.10 to 1.1.11
- [Release notes](https://github.com/reactor/reactor-netty/releases)
- [Commits](reactor/reactor-netty@v1.1.10...v1.1.11)

---
updated-dependencies:
- dependency-name: io.projectreactor.netty:reactor-netty-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: io.projectreactor.netty:reactor-netty-http
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
…ersion-1.1.11

Bump reactor.netty.version from 1.1.10 to 1.1.11
…or-reactor-core-3.5.10

Bump io.projectreactor:reactor-core from 3.5.9 to 3.5.10
* TIKA-4133 -- add a capture group metadata filter
* TIKA-4108 -- update jetbrain's annotations dependency

(cherry picked from commit 55989e1f81a1d626c30177f1068ae4c3a2a13679)
6 participants