Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update and reorganize the XOAI dependencies under local_lib #8372

Closed
poikilotherm opened this issue Jan 27, 2022 · 18 comments · Fixed by #8753
Closed

Update and reorganize the XOAI dependencies under local_lib #8372

poikilotherm opened this issue Jan 27, 2022 · 18 comments · Fixed by #8753
Assignees
Labels
Component: Code Infrastructure formerly "Feature: Code Infrastructure" Feature: Harvesting NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... pm.epic.nih_harvesting pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh
Milestone

Comments

@poikilotherm
Copy link
Contributor

Currently, Dataverse codebase uses a custom patched XOAI 4.1.0, provided in /local_lib.

It would be a good idea to

  1. Find out what patches have been applied (needs anatomization of patched JAR sources with original sources)
  2. Fork https://github.com/DSpace/xoai (which is the moved repo from https://github.com/lyncode) to @gdcc
  3. Find out if our patches are still needed
  4. As development in https://github.com/DSpace/xoai has stalled (see e.g. Several potential NullPointerException bugs. DSpace/xoai#72 (comment)), make it our package
  5. Update LOTS of dependencies and especially get rid of log4j-1.2!
  6. Setup Github Workflow, Code Coverage etc.
  7. Make releases to Maven Central
  8. Use & test in Dataverse upstream code

It might be worth a try to 1) not rename the packages but still publish under our @gdcc Maven group id and 2) create fork and a pull request for upstream plus setting up wei/pull to auto-create pull requests when DSpace updates their branch.

@poikilotherm poikilotherm added Component: Code Infrastructure formerly "Feature: Code Infrastructure" User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh labels Jan 27, 2022
@landreev
Copy link
Contributor

I am removing the log4j dependency that's being brought in by XOAI as part of https://github.com/IQSS/dataverse-security/issues/48; (I only found this a couple of days ago myself; about to make a PR into the main Dataverse project.)

But yes, in the process of tracing this dependency, I was reminded of this XOAI debacle, that we still have these locally-built libraries in the project, etc. (I can comment more on this;)
Once the immediate log4j issue is addressed, we can use this issue to address cleaning up the XOAI. I was going to open one myself, but thanks for beating me to it.

landreev added a commit that referenced this issue Jan 27, 2022
@landreev
Copy link
Contributor

(Once we merge the pom.xml PR that removes the log4j jar from the war file, I'm going to update the title of this issue to reflect the fact that it is mainly about refactoring these XOAI dependencies, and not specifically about log4j. I will add some information on these XOAI libraries in the meantime).

@landreev landreev changed the title As sysadmin, I would sleep better with no more log4j present (bisect and update XOAI) Update and reorganize the XOAI dependencies under local_lib Feb 2, 2022
@landreev
Copy link
Contributor

landreev commented Feb 3, 2022

As development in https://github.com/DSpace/xoai has stalled (see e.g. DSpace/xoai#72 (comment)), make it our package

Yes, the situation is back to where it was 6 years ago, when lyncode still nominally owned the package; but it wasn't being actively maintained and it was not possible to submit and merge a PR. We really liked the package, but it was impossible to use it without a few minor fixes. So that's what forced us to build those jars locally in the first place. We appear to have missed the window when DSpace was somewhat actively maintaining the package. And then these libraries got further stuck in our local_lib limbo. So we definitely need to assume a more active role there in order to clean up this dependency. But we will need to decide what exactly "mak[ing] it our package" should entail.

Bad news:

  1. I'm seeing no trace of the local fork of the old lyncode repo from which those jars were built, anywhere.

Better news:

  1. We still have all the sources (and they are included under local_lib, as source jars, I believe).
  2. I will recreate that fork somewhere on github; either under IQSS/, or as my own, as the first step.
  3. The changes were really minimal.
  4. It's a fairly simple package to start with.

It might be worth a try to ... not rename the packages but still publish under our https://github.com/gdcc Maven group id and

That's an interesting idea. I never even considered that - publishing somebody else's packages? But that appears to be exactly what they did with reload4j - forked from log4j-1.2.17, released with the package pathnames unchanged, but under their own group id ... (assuming that's what you meant/had in mind above - ?). (I guess it's a slightly different situation with reload4j - since log4j-1 was officially retired/EOL-ed; and the new incarnation was made by the original author of log4j).

OK, that part we can figure out as we go. But we can definitely proceed with the other parts. Create a fork of the latest DSpace xoai-4 under gdcc, patch/fix what's necessary to get it to work with Dataverse, make pull requests, see how that goes etc.

@poikilotherm
Copy link
Contributor Author

@landreev please take a look at https://github.com/gdcc/sword2-server for an example of what I did to release our own version to Maven Central.

This was a fork of what DANS did (which also was a fork of the original source) to push necessary changes from us, plus some more stuff to modernize and adapt for dependency updates. And it still has the old Java package layout, but being released under our Maven group ID. 🤠

landreev added a commit to landreev/xoai that referenced this issue Feb 8, 2022
@poikilotherm
Copy link
Contributor Author

@landreev I started moving stuff around in https://github.com/gdcc/xoai

@landreev
Copy link
Contributor

Added "feature: Harvesting" label; to make sure the issue is added to the pile of Harvesting issues we are prioritizing now as part of the GREI effort.

@mreekie
Copy link

mreekie commented Apr 27, 2022

sprint

  • This came out of the spike in the previous sprint.

  • This is a cleanup/maintenance issue. Came to attention during log4j security issue.

  • This is a code dependecy for DV. We need to put this somewhere we can maintain it and control it.

  • a lot of work has been done on this.

  • medium. It's close to being able to build DV with the updated library version. However harvesting is a complicated subsystem and so everything will need to be verified and this will be labor consuming.

@mreekie
Copy link

mreekie commented Apr 27, 2022

The library has had some bugs addressed. This lead to DV creating it's own jar.

  • The bugs are addressed.
  • native json harvesting was added.
  • left: completely eliminate log4j as opposed to the current patch that was applied.
  • left: remove other dependencies that have potential security risks.
  • biggest chunk. Recompile and test with the goal to confirm that it works as expected. This includes some known bugs whichwill not be addressed as part of the effort

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Apr 27, 2022

I'd appreciate help on this in https://github.com/gdcc/xoai (currently crammed with lots of other work)

I started moving lots of things into the lib, as some of the deps aren't maintained either. Happy to answer questions and give permissions.

@pdurbin
Copy link
Member

pdurbin commented Apr 29, 2022

I checked in with @poikilotherm about this in Matrix today: https://matrix.to/#/!AmypvmJtUjBesRrnLM:matrix.org/$s548c8B-SZ-rdF5GN9LyZZ94YoZxDMkSW1QXLzvCc3A

One thing I didn't realize is that @poikilotherm is hoping that this issue could be worked in in parallel with the Payara 6/Jakarta EE 9 upgrade in #8305. (We're pretty sure we'd like to go ahead and upgrade to EE9 per discussion in Slack: https://iqss.slack.com/archives/C010LA04BCG/p1651155098181049 ). When I asked, "Wait. Is this all dependent on the upgrade to Payara 6 and Jakarta EE 9?" he said, "That's half way correct... The Jakarta EE 9 move was the motivation to create a branch with these breaking changes, so we avoid any inconsistencies... So this is mostly about getting rid of old Java EE and refactor ye old deps". The branch he's talking about is https://github.com/gdcc/xoai/tree/branch-5.0 .

While there are unit tests to work on, it sounds like @poikilotherm's plan is to build and test the refactored xoai jar with Payara 6. We aren't working on Payara 6 in the sprint we just started, but I said I'd at least poke around.

A little poking seems helpful. mvn package didn't "just work" and @poikilotherm is helping me with it at gdcc/xoai#11

I'll keep poking and will coordinate more with @poikilotherm next week. There's still refactoring to do and other tasks such as pushing a beta (or snapshot or whatever) to Maven Central.

@poikilotherm
Copy link
Contributor Author

Let me add some notes here:

  • Refactoring of the library is mostly independent of any Payara 6 / Dataverse 6 activities.
  • Using the old version might work with Payara 6 as a fallback.
  • Preparing the library to be incorporated once we move on to creating Dataverse 6 + Payara 6 seems like a good thing. This is not depending on the move and may be slower or faster, but can happen in parallel.
  • This doesn't have to happen within this sprint if there are more urgent things.
  • I expect 2 workdays refactoring on this to have a snapshot to be usable for integration testing with a Jakarta EE 9 app.
  • We can iterate from there with thing like Java 17 compatibility, PMD/Spotbugs, CI + test coverage, etc. Makes it adjustable to programming resources.
  • I am able and willing to spend dev time in this.

@landreev
Copy link
Contributor

Just reading the updates from the last 2 days now. Yes, I'm happy to help with this work in /gdcc/xoai. That's why I pushed for prioritizing the issue/adding it to the schedule.

I thought we were going to start with getting Dataverse to build with the new, gdcc-branded jars in its (Dataverse's) current state; just to make sure everything worked as it should. If there's a compelling reason to go straight to Payara 6/EE 9, we could do that too. But then we can't switch to gdcc/xoai until we upgrade.

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Apr 29, 2022

Well there is still room for going with Jakarta EE 8 if this is what you want.

We could create a v5.0 for DV 5 and then create a v6.0 for DV 6 and Jakarta EE 9 (10?)

The code refactors already done are still targeting Jakarta EE 8 (so far only JAXB 2.3 in use), see https://github.com/gdcc/xoai/blob/06dbf60994fee51246e4b6b0c436d00d16b3b652/pom.xml#L46.

Now would be a good time to decide on this. 😎 I'm fine with sticking to EE 8 for now, too.

@landreev
Copy link
Contributor

I would personally vote for starting with EE 8. I am operating under the assumption that it would then be possible to make a PR switching Dataverse to gdcc/xoai fairly quickly/easily. Is that correct?

Thank you for leading this effort/all the work that's already been done, @poikilotherm.

@mreekie
Copy link

mreekie commented Jun 8, 2022

Sprint:

  • pm.sprint.2022_05_25 ended WIP

landreev added a commit that referenced this issue Jul 13, 2022
…manipulating the "until" date parameter to ensure inclusivity. (#8372)
landreev added a commit that referenced this issue Jul 13, 2022
…the jdk http client and stream .transferTo() (#8372)
@pdurbin pdurbin added this to the 5.12.1 milestone Oct 28, 2022
@mreekie mreekie added the NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... label Dec 5, 2022
@mreekie mreekie added the pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues label Mar 20, 2023
@mreekie mreekie added the pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards label Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Code Infrastructure formerly "Feature: Code Infrastructure" Feature: Harvesting NIH OTA DC Grant: The Harvard Dataverse repository: A generalist repository integrated with a Data Commons NIH OTA: 1.4.1 4 | 1.4.1 | Resolve OAI-PMH harvesting issues | 5 prdOwnThis is an item synched from the product ... pm.epic.nih_harvesting pm.GREI-d-1.4.1 NIH, yr1, aim4, task1: Resolve OAI-PMH harvesting issues pm.GREI-d-1.4.2 NIH, yr1, aim4, task2: Create working group on packaging standards User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants