Skip to content

In 1614 dspace 8 support#50

Merged
ehanson8 merged 3 commits intomainfrom
IN-1614-dspace-8-support
Feb 13, 2026
Merged

In 1614 dspace 8 support#50
ehanson8 merged 3 commits intomainfrom
IN-1614-dspace-8-support

Conversation

@ehanson8
Copy link
Copy Markdown
Contributor

Purpose and background context

Adding support for the DSpace 8 REST API via dspace-rest-python.

How can a reviewer manually see the effects of these changes?

While local Docker testing has established successful ingests into DSpace 8, the DSpace 8 Python client needs minor updates before we can successfully submit to a DSpace 8 instance via the ECR image. However, this DSS run shows that DSS can still successfully submit to DSpace 6 after the ECR image was updated with these commits.

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

Comment thread submitter/config.py
Comment on lines +43 to +45
self.LOCAL_DSPACE_API_URL = "mock://dspace.edu/server/api"
self.LOCAL_DSPACE_USER = "local_test"
self.LOCAL_DSPACE_PASSWORD = "local_test" # nosec # noqa: S105
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these will be deprecated once we shift to connection secrets

@ehanson8 ehanson8 marked this pull request as ready for review February 10, 2026 21:59
@ehanson8 ehanson8 requested a review from a team as a code owner February 10, 2026 21:59
Copy link
Copy Markdown
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some good stuff here! Despite my deluge of comments, they are all kind of saying the same thing, pulled from one of the comments:

It's kind of the same story everywhere: while we're juggling two clients / two DSpace types, I really like encapsulating client or instance type specific logic in private methods. At these higher level public methods, you have checking and branching, so anything client or instance specific looks off here.

The rub, is that when we collapse down to only DSpace8 support, many of those private *_dspace8 methods may go away, and their business logic is moved back to this higher level, public method.

Using this comment to propose that anywhere we logic branch between instance type support, we take a close look and see if we can make it nearly identical, with private methods doing the work.

The changes I'm actually requesting are related to DSpace6Client | DSpace8Client type hinting.

The rest are comments and suggestions for discussion. As noted, most of them kind of orient around the sentiment above.

But really, looking good. I'm feeling confident this dual client / dual DSpace instance type is doable, understandable, and will be removable without too much headache.

Comment thread submitter/submission.py
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment on lines +13 to +14
from dspace.client import DSpaceClient as DSpace6Client
from dspace_rest_client.client import DSpaceClient as DSpace8Client
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting here, but might not need a change. This was a little surprising, but maybe okay overall. And, acknowledging this juggling of two clients is somewhat temporary.

If we had dedicated methods like _init_dspace6_client() and _init_dspace8_client(), then the imports could be in those methods as they would never collide for a DSpaceClient type.

However, I see how this could complicate the return types for methods. I'm okay with this as-is! Just noting it did throw me for a second.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on a throw but appreciate the neat use of aliases!

Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment thread submitter/submission.py Outdated
Comment on lines +265 to +273
if self.destination == "DSpace@MIT":
# DSpace 6 instances, to be removed after migration to DSpace 8
item = self.create_item_dspace6()
item = self.add_bitstreams_to_item_dspace6(item)
self.post_item_dspace6(item, self.collection_handle)
self.post_bitstreams_dspace6(item)
elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
# DSpace 8 instances
item = self.create_item_dspace8()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit jarring that for DS6 we're doing a lot of the work here -- e.g. adding bitstreams, posting items, etc. -- when we have a method create_item_dspace6() that could encapsulate that?

Seems like there is world where this could look like:

if self.destination == "DSpace@MIT":
    # DSpace 6 instances, to be removed after migration to DSpace 8
    item = self.create_item_dspace6()
elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
    # DSpace 8 instances
    item = self.create_item_dspace8()

It's kind of the same story everywhere: while we're juggling two clients / two DSpace types, I really like encapsulating client or instance type specific logic in private methods. At these higher level public methods, you have checking and branching, so anything client or instance specific looks off here.

The rub, is that when we collapse down to only DSpace8 support, many of those private *_dspace8 methods may go away, and their business logic is moved back to this higher level, public method.

Using this comment to propose that anywhere we logic branch between instance type support, we take a close look and see if we can make it nearly identical, with private methods doing the work.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on grouping all individual steps as private methods under a 'catch-all' _dspace6/8 method. I can see that the individual create_, add_bistreams, post_item, and post_bitstreams have some excellent exception handling.

Not certain it should all be under create_item but rather another catch-all method that also calls _create_item_dspace. It would be an opportunity to abstract away some of the exception handling here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.

So my thinking was:

  • Leave the DSpace 6 code largely as is
  • Add conditional logic to switch between DSpace versions
  • Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)

If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.

So my thinking was:

  • Leave the DSpace 6 code largely as is
  • Add conditional logic to switch between DSpace versions
  • Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)

If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.

So my thinking was:

  • Leave the DSpace 6 code largely as is
  • Add conditional logic to switch between DSpace versions
  • Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)

If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially

Copy link
Copy Markdown
Contributor

@ghukill ghukill Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehanson8 - I can appreciate that was present in your original PR, and my apologies if maybe I didn't see it then.

Well organized and modular code is easy to move; and the modularization itself has value. So if there are lots of _dspace6|8 methods in this class in the short-term, but eventually many of them will collapse back into the main method, that does feel ideal.

Silly toy example:

DSpace 6 only

def foo():  ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
def bar():  ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.

DSpace 6 + 8 support

def foo():
  if instance == 6:
    foo_6()
  else:
    foo_8()

def bar():
  if instance == 6:
    bar_6()
  else:
    bar_8()

def _foo_6(): ...
def _foo_8(): ...

def _bar_6(): ...
def _bar_8(): ...

DSpace 8 support only:

def foo():  ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
  # logic from _foo_8() collapses here

def bar():  ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
  # logic from _bar_8() collapses here

So yeah, I have no doubt your original PR was heading this direction. I have hunch my comments then and now are kind of the same, that the symmetry wasn't quite high enough to be obvious. And in that middleground, it felt more confusing. Or, I just understand the approach better now, and the original PR was already pretty spot on! Either way, thanks for the flexibility and willingness to toss code around.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea why that comment appeared so many times! All good points, implementing these updates now

Comment thread submitter/submission.py Outdated
it is re-raised with some useful message information and stops the
entire SQS message loop process until someone can investigate further.
"""
item = None
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes revealed/created a logic hole in submit requiring this addition, I'm creating a ticket to address the unwieldy exception handling in this method

@ehanson8
Copy link
Copy Markdown
Contributor Author

ehanson8 commented Feb 11, 2026

@ghukill @jonavellecuerdo Pushed updates based on your comments! Also, updated the ECR image and did a DSS run to confirm these changes didn't break DSpace 6 submissions

@ehanson8 ehanson8 requested a review from ghukill February 11, 2026 19:41
Copy link
Copy Markdown
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tilted into "Comment" with the revisions. Nothing here is a blocking request, but some suggestions and thoughts.

One is just some slight tweaking of submit() to provide some developer code scanning ease and remove the item = None requirement.

Much bigger picture, a reflection on all the *_dspace6 methods.

To state something first and clearly, I think the encapsulation is clean now. It's clear that when we migrate to DS8, we can:

  1. remove every method that has a _dspace6 suffix
  2. look at remaining methods, and if it contains a _dspace8 suffixed method, consider moving that logic directly into that calling method

What I had kind of imagined very early on, maybe even before the first PR, was the possibility of how this could be handled by classes. This main Submission class would attach either a DSpace6Submission or a DSpace8Submission instance to something like self.instance_submitter. Those two classes could implemente a base DSpaceSubmission class that had required methods like create_client(), create_item(), etc.

Without going into the nitty gritty, one can imagine how each class would just do what it needs to do, and this Submission would just call entrypoint methods it knows they must have.

If we were going to support 6 & 8 long-term, I would have pushed for this immediately at the onset. And I imagine you may have landed on something similar; modular and maintainable.

I mention all this because I was surprised how many _dspace6 methods there ended up being compared to only a single _create_item_dspace8 method. But.... if it's temporary.... then who cares?

My TLDR: The lopsidedness of _dspace6 to _dspace8 methods is surprising at first, and feels like it would be difficult to maintain in the long run. But if we're truly dropping DSpace6 support in the near future, and most of those methods will dissappear, and the remaining ones will consolidate around DS8 behavior, I'm good with it 😎.

This review remains a comment abouto the submit() method, but the rest is effectively approved!

Comment thread submitter/submission.py
}
)

def submit(self) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting here at the submit() method level, given some churn in the lines and comments.

You asked for a nit picky PR review, so not holding back @ehanson8!

Personally, I think I would find this major / important method easier to reason about with a) some additional whitespace and b) some comments. It might also orgnanically provide a way to move item = None more localized to when it's set.

Something like:

if CONFIG.SKIP_PROCESSING != "true":
    self.client = self.get_dspace_client()

item = None
try:
    if self.destination == "DSpace@MIT":  # Update after DSpace 8 migration
        item = self._create_item_dspace6()
    elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
        item = self._create_item_dspace8()
    self.result_success_message(item)

# TODO: what situation does this handle?
except requests.exceptions.Timeout as e:
    dspace_url = self.client.base_url if self.client else "Unknown DSpace URL"
    raise errors.DSpaceTimeoutError(dspace_url, self.result_attributes) from e

# TODO: what situation does this handle?
except (  # Update after DSpace 8 migration
    errors.ItemCreateError,
    errors.BitstreamAddError,
    errors.ItemPostError,
) as e:
    self.result_error_message(e.message, getattr(e, "dspace_error", None))

# TODO: what situation does this handle?
except (errors.BitstreamOpenError, errors.BitstreamPostError) as e:
    self.result_error_message(e.message, getattr(e, "dspace_error", None))
    if item:
        self.clean_up_partial_success(item)

# TODO: what situation does this handle?
except Exception:
    logger.exception(
        "Unexpected exception, aborting DSpace Submission Service processing"
    )
    raise

Further thinking about this method, and item = None, perhaps it's suggesting that an else block could handle a non-recognized destination?

try:
    if self.destination == "DSpace@MIT":  # Update after DSpace 8 migration
        item = self._create_item_dspace6()
    elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
        item = self._create_item_dspace8()
    else:
        raise ValueError(f"Unrecognized destination: {self.destination}")
    self.result_success_message(item)

@ghukill ghukill self-requested a review February 11, 2026 20:31
Copy link
Copy Markdown
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, see actual comment above.

@jonavellecuerdo
Copy link
Copy Markdown
Contributor

I was noodling with some code and regarding the issue of encapsulation resulting failure to run clean_up_partial_success yesterday, I'm curious what both of you (@ehanson8 , @ghukill) think of the proposed fix:

  • Move the clean_up_partial_success() call inside of the post_bitstreams_dspace_6().
  • Delete the call from submit().
def _post_bitstreams_dspace6(  # Update after DSpace 8 migration
        self, item: dspace.item.Item
    ) -> None:
        """Post all bitstreams to an existing DSpace item."""
        logger.debug(
            "Posting %d bitstream(s) to item '%s' in DSpace",
            len(item.bitstreams),
            item.handle,
        )
        for bitstream in item.bitstreams:
            try:
                bitstream.post(self.client, item_uuid=item.uuid)
                logger.debug(
                    "Posted bitstream '%s' to item '%s', new bitstream uuid is '%s'",
                    bitstream.name,
                    item.handle,
                    bitstream.uuid,
                )
            except (FileNotFoundError, requests.exceptions.HTTPError) as e:
                partial_item_handle = item.handle
                self.clean_up_partial_success(item)
                if isinstance(e, FileNotFoundError):
                    raise errors.BitstreamOpenError(
                        bitstream.file_path, partial_item_handle
                    ) from e

                raise errors.BitstreamPostError(
                    e, bitstream.name, partial_item_handle
                ) from e

This avoids the need to include an item=None at the beginning of submit(). From looking at the code, my understanding is that we only have a partial success (i.e., an item was created in DSpace) at the start of post_bitstreams_dspace_6() because the call to post_item_dspace6() would've needed to run without error.

Why these changes are being introduced:
* DSS needs to support the DSpace 8 REST API as we migrate away from DSpace 6

How this addresses that need:
* Add dspace-rest-client as dependency
* Add LOCAL_ DSpace env var to test block of Config class
* Add DSpaceAuthenticationError custom exception
* Add _create_authenticated_dspace8_client method and update _create_dspace_client method to call it
* Add create_item_dspace8 and _create_bundle_and_bitstream methods
* Update Submission.submit method with conditional code block to switch between DSpace 6 and 8 methods
* Add DSpace 8 fixtures to conftest.py
* Add unit tests for the new methods

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1614
* Add inline comments to mark code for future deprecation
* Update DSpace client type hints
* Rename _create_authenticated_dspace8_client > _create_dspace8_client
* Add _create_dspace6_client method and update _create_dspace_client to call it
@ehanson8 ehanson8 force-pushed the IN-1614-dspace-8-support branch from 59477a1 to 8c311eb Compare February 13, 2026 15:24
@ehanson8
Copy link
Copy Markdown
Contributor Author

I was noodling with some code and regarding the issue of encapsulation resulting failure to run clean_up_partial_success yesterday, I'm curious what both of you (@ehanson8 , @ghukill) think of the proposed fix:

Slack recap: I think this will work @jonavellecuerdo but I'm shifting it to my next batch of work focusing on streamlining the exception handling and updating the DSpace 8 code to generate the expected exceptions

@ghukill @jonavellecuerdo Reverted the last commit to be addressed in the next PR and successfully ran DSS with the updated ECR image

@ghukill ghukill self-requested a review February 13, 2026 16:06
Copy link
Copy Markdown
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved! Given the upcoming ticket to streamline exception handling, which will take another pass at temproary DS6/8 juggling, I think things are looking good. Thanks for all the discussion.

@ehanson8 ehanson8 merged commit f3fe8d9 into main Feb 13, 2026
4 checks passed
@ehanson8 ehanson8 deleted the IN-1614-dspace-8-support branch February 13, 2026 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants