Conversation
| self.LOCAL_DSPACE_API_URL = "mock://dspace.edu/server/api" | ||
| self.LOCAL_DSPACE_USER = "local_test" | ||
| self.LOCAL_DSPACE_PASSWORD = "local_test" # nosec # noqa: S105 |
There was a problem hiding this comment.
Again, these will be deprecated once we shift to connection secrets
ghukill
left a comment
There was a problem hiding this comment.
Some good stuff here! Despite my deluge of comments, they are all kind of saying the same thing, pulled from one of the comments:
It's kind of the same story everywhere: while we're juggling two clients / two DSpace types, I really like encapsulating client or instance type specific logic in private methods. At these higher level public methods, you have checking and branching, so anything client or instance specific looks off here.
The rub, is that when we collapse down to only DSpace8 support, many of those private *_dspace8 methods may go away, and their business logic is moved back to this higher level, public method.
Using this comment to propose that anywhere we logic branch between instance type support, we take a close look and see if we can make it nearly identical, with private methods doing the work.
The changes I'm actually requesting are related to DSpace6Client | DSpace8Client type hinting.
The rest are comments and suggestions for discussion. As noted, most of them kind of orient around the sentiment above.
But really, looking good. I'm feeling confident this dual client / dual DSpace instance type is doable, understandable, and will be removable without too much headache.
| from dspace.client import DSpaceClient as DSpace6Client | ||
| from dspace_rest_client.client import DSpaceClient as DSpace8Client |
There was a problem hiding this comment.
Commenting here, but might not need a change. This was a little surprising, but maybe okay overall. And, acknowledging this juggling of two clients is somewhat temporary.
If we had dedicated methods like _init_dspace6_client() and _init_dspace8_client(), then the imports could be in those methods as they would never collide for a DSpaceClient type.
However, I see how this could complicate the return types for methods. I'm okay with this as-is! Just noting it did throw me for a second.
There was a problem hiding this comment.
+1 on a throw but appreciate the neat use of aliases!
| if self.destination == "DSpace@MIT": | ||
| # DSpace 6 instances, to be removed after migration to DSpace 8 | ||
| item = self.create_item_dspace6() | ||
| item = self.add_bitstreams_to_item_dspace6(item) | ||
| self.post_item_dspace6(item, self.collection_handle) | ||
| self.post_bitstreams_dspace6(item) | ||
| elif self.destination in ["DSpace8Local", "DSpace8MIT"]: | ||
| # DSpace 8 instances | ||
| item = self.create_item_dspace8() |
There was a problem hiding this comment.
I find it a bit jarring that for DS6 we're doing a lot of the work here -- e.g. adding bitstreams, posting items, etc. -- when we have a method create_item_dspace6() that could encapsulate that?
Seems like there is world where this could look like:
if self.destination == "DSpace@MIT":
# DSpace 6 instances, to be removed after migration to DSpace 8
item = self.create_item_dspace6()
elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
# DSpace 8 instances
item = self.create_item_dspace8()It's kind of the same story everywhere: while we're juggling two clients / two DSpace types, I really like encapsulating client or instance type specific logic in private methods. At these higher level public methods, you have checking and branching, so anything client or instance specific looks off here.
The rub, is that when we collapse down to only DSpace8 support, many of those private *_dspace8 methods may go away, and their business logic is moved back to this higher level, public method.
Using this comment to propose that anywhere we logic branch between instance type support, we take a close look and see if we can make it nearly identical, with private methods doing the work.
There was a problem hiding this comment.
Agree on grouping all individual steps as private methods under a 'catch-all' _dspace6/8 method. I can see that the individual create_, add_bistreams, post_item, and post_bitstreams have some excellent exception handling.
Not certain it should all be under create_item but rather another catch-all method that also calls _create_item_dspace. It would be an opportunity to abstract away some of the exception handling here
There was a problem hiding this comment.
@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.
So my thinking was:
- Leave the DSpace 6 code largely as is
- Add conditional logic to switch between DSpace versions
- Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)
If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially
There was a problem hiding this comment.
@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.
So my thinking was:
- Leave the DSpace 6 code largely as is
- Add conditional logic to switch between DSpace versions
- Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)
If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially
There was a problem hiding this comment.
@ghukill @jcuerdo Ah, that's what I was trying to do in a previous PR (albeit with clumsy naming 🙃) but when @ghukill pushed back on the naming, I realized that mucking around with the DSpace 6 code felt like rearranging deck chairs on the Titanic given its short lifespan at this stage.
So my thinking was:
- Leave the DSpace 6 code largely as is
- Add conditional logic to switch between DSpace versions
- Design the DSpace 8 methods as the ideal approach going forward (but again leave DSpace 6 methods alone since they're going away soon)
If the incongruity is too much during this interim period, I can definitely encapsulate the DSpace 6 code in the same manner as the DSpace 8 code but that's the logic behind why I didn't do it initially
There was a problem hiding this comment.
@ehanson8 - I can appreciate that was present in your original PR, and my apologies if maybe I didn't see it then.
Well organized and modular code is easy to move; and the modularization itself has value. So if there are lots of _dspace6|8 methods in this class in the short-term, but eventually many of them will collapse back into the main method, that does feel ideal.
Silly toy example:
DSpace 6 only
def foo(): ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
def bar(): ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.DSpace 6 + 8 support
def foo():
if instance == 6:
foo_6()
else:
foo_8()
def bar():
if instance == 6:
bar_6()
else:
bar_8()
def _foo_6(): ...
def _foo_8(): ...
def _bar_6(): ...
def _bar_8(): ...DSpace 8 support only:
def foo(): ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
# logic from _foo_8() collapses here
def bar(): ... # e.g. create_item(), create_dspace_client(), submit_item(), etc.
# logic from _bar_8() collapses hereSo yeah, I have no doubt your original PR was heading this direction. I have hunch my comments then and now are kind of the same, that the symmetry wasn't quite high enough to be obvious. And in that middleground, it felt more confusing. Or, I just understand the approach better now, and the original PR was already pretty spot on! Either way, thanks for the flexibility and willingness to toss code around.
There was a problem hiding this comment.
No idea why that comment appeared so many times! All good points, implementing these updates now
| it is re-raised with some useful message information and stops the | ||
| entire SQS message loop process until someone can investigate further. | ||
| """ | ||
| item = None |
There was a problem hiding this comment.
These changes revealed/created a logic hole in submit requiring this addition, I'm creating a ticket to address the unwieldy exception handling in this method
|
@ghukill @jonavellecuerdo Pushed updates based on your comments! Also, updated the ECR image and did a DSS run to confirm these changes didn't break DSpace 6 submissions |
ghukill
left a comment
There was a problem hiding this comment.
I've tilted into "Comment" with the revisions. Nothing here is a blocking request, but some suggestions and thoughts.
One is just some slight tweaking of submit() to provide some developer code scanning ease and remove the item = None requirement.
Much bigger picture, a reflection on all the *_dspace6 methods.
To state something first and clearly, I think the encapsulation is clean now. It's clear that when we migrate to DS8, we can:
- remove every method that has a
_dspace6suffix - look at remaining methods, and if it contains a
_dspace8suffixed method, consider moving that logic directly into that calling method
What I had kind of imagined very early on, maybe even before the first PR, was the possibility of how this could be handled by classes. This main Submission class would attach either a DSpace6Submission or a DSpace8Submission instance to something like self.instance_submitter. Those two classes could implemente a base DSpaceSubmission class that had required methods like create_client(), create_item(), etc.
Without going into the nitty gritty, one can imagine how each class would just do what it needs to do, and this Submission would just call entrypoint methods it knows they must have.
If we were going to support 6 & 8 long-term, I would have pushed for this immediately at the onset. And I imagine you may have landed on something similar; modular and maintainable.
I mention all this because I was surprised how many _dspace6 methods there ended up being compared to only a single _create_item_dspace8 method. But.... if it's temporary.... then who cares?
My TLDR: The lopsidedness of _dspace6 to _dspace8 methods is surprising at first, and feels like it would be difficult to maintain in the long run. But if we're truly dropping DSpace6 support in the near future, and most of those methods will dissappear, and the remaining ones will consolidate around DS8 behavior, I'm good with it 😎.
This review remains a comment abouto the submit() method, but the rest is effectively approved!
| } | ||
| ) | ||
|
|
||
| def submit(self) -> None: |
There was a problem hiding this comment.
Commenting here at the submit() method level, given some churn in the lines and comments.
You asked for a nit picky PR review, so not holding back @ehanson8!
Personally, I think I would find this major / important method easier to reason about with a) some additional whitespace and b) some comments. It might also orgnanically provide a way to move item = None more localized to when it's set.
Something like:
if CONFIG.SKIP_PROCESSING != "true":
self.client = self.get_dspace_client()
item = None
try:
if self.destination == "DSpace@MIT": # Update after DSpace 8 migration
item = self._create_item_dspace6()
elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
item = self._create_item_dspace8()
self.result_success_message(item)
# TODO: what situation does this handle?
except requests.exceptions.Timeout as e:
dspace_url = self.client.base_url if self.client else "Unknown DSpace URL"
raise errors.DSpaceTimeoutError(dspace_url, self.result_attributes) from e
# TODO: what situation does this handle?
except ( # Update after DSpace 8 migration
errors.ItemCreateError,
errors.BitstreamAddError,
errors.ItemPostError,
) as e:
self.result_error_message(e.message, getattr(e, "dspace_error", None))
# TODO: what situation does this handle?
except (errors.BitstreamOpenError, errors.BitstreamPostError) as e:
self.result_error_message(e.message, getattr(e, "dspace_error", None))
if item:
self.clean_up_partial_success(item)
# TODO: what situation does this handle?
except Exception:
logger.exception(
"Unexpected exception, aborting DSpace Submission Service processing"
)
raiseFurther thinking about this method, and item = None, perhaps it's suggesting that an else block could handle a non-recognized destination?
try:
if self.destination == "DSpace@MIT": # Update after DSpace 8 migration
item = self._create_item_dspace6()
elif self.destination in ["DSpace8Local", "DSpace8MIT"]:
item = self._create_item_dspace8()
else:
raise ValueError(f"Unrecognized destination: {self.destination}")
self.result_success_message(item)
ghukill
left a comment
There was a problem hiding this comment.
Oops, see actual comment above.
|
I was noodling with some code and regarding the issue of encapsulation resulting failure to run
def _post_bitstreams_dspace6( # Update after DSpace 8 migration
self, item: dspace.item.Item
) -> None:
"""Post all bitstreams to an existing DSpace item."""
logger.debug(
"Posting %d bitstream(s) to item '%s' in DSpace",
len(item.bitstreams),
item.handle,
)
for bitstream in item.bitstreams:
try:
bitstream.post(self.client, item_uuid=item.uuid)
logger.debug(
"Posted bitstream '%s' to item '%s', new bitstream uuid is '%s'",
bitstream.name,
item.handle,
bitstream.uuid,
)
except (FileNotFoundError, requests.exceptions.HTTPError) as e:
partial_item_handle = item.handle
self.clean_up_partial_success(item)
if isinstance(e, FileNotFoundError):
raise errors.BitstreamOpenError(
bitstream.file_path, partial_item_handle
) from e
raise errors.BitstreamPostError(
e, bitstream.name, partial_item_handle
) from eThis avoids the need to include an |
Why these changes are being introduced: * DSS needs to support the DSpace 8 REST API as we migrate away from DSpace 6 How this addresses that need: * Add dspace-rest-client as dependency * Add LOCAL_ DSpace env var to test block of Config class * Add DSpaceAuthenticationError custom exception * Add _create_authenticated_dspace8_client method and update _create_dspace_client method to call it * Add create_item_dspace8 and _create_bundle_and_bitstream methods * Update Submission.submit method with conditional code block to switch between DSpace 6 and 8 methods * Add DSpace 8 fixtures to conftest.py * Add unit tests for the new methods Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1614
* Add inline comments to mark code for future deprecation * Update DSpace client type hints * Rename _create_authenticated_dspace8_client > _create_dspace8_client * Add _create_dspace6_client method and update _create_dspace_client to call it
59477a1 to
8c311eb
Compare
Slack recap: I think this will work @jonavellecuerdo but I'm shifting it to my next batch of work focusing on streamlining the exception handling and updating the DSpace 8 code to generate the expected exceptions @ghukill @jonavellecuerdo Reverted the last commit to be addressed in the next PR and successfully ran DSS with the updated ECR image |
Purpose and background context
Adding support for the DSpace 8 REST API via dspace-rest-python.
How can a reviewer manually see the effects of these changes?
While local Docker testing has established successful ingests into DSpace 8, the DSpace 8 Python client needs minor updates before we can successfully submit to a DSpace 8 instance via the ECR image. However, this DSS run shows that DSS can still successfully submit to DSpace 6 after the ECR image was updated with these commits.
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Code review