Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Published datasets should contain files #10994

Open
wants to merge 17 commits into
base: develop
Choose a base branch
from

Conversation

stevenwinship
Copy link
Contributor

What this PR does / why we need it: The existing publishing model in HDV allows the publishing of datasets without files. This lends to the curation team contacting depositors about uploading files via support and often to deleting "empty" datasets.

To decrease the support contact and dataset deletions, change the publishing model to require all deposits contain at least one file before publishing can happen.

Which issue(s) this PR closes:#10981

Special notes for your reviewer:

Suggestions on how to test this: See the added IT test

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Included

Additional documentation:

@stevenwinship stevenwinship self-assigned this Oct 31, 2024
@stevenwinship stevenwinship added Size: 30 A percentage of a sprint. 21 hours. (formerly size:33) FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) labels Oct 31, 2024
@stevenwinship stevenwinship force-pushed the 10981-published-datasets-should-contain-files branch from d5bb6c2 to 85af55f Compare November 1, 2024 17:04

This comment has been minimized.

@stevenwinship stevenwinship removed their assignment Nov 1, 2024
@coveralls
Copy link

coveralls commented Nov 6, 2024

Coverage Status

coverage: 22.576% (+0.005%) from 22.571%
when pulling 179d085 on 10981-published-datasets-should-contain-files
into e3b5795 on develop.

This comment has been minimized.

@cmbz cmbz added the FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) label Nov 7, 2024
Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I've suggested that the Harvard team verify the UI is acceptable and made a note questioning whether we need a feature flag for this, but assuming no other feedback, the only change I'd suggest is just moving/renaming one method.

}
}

private boolean requiresFilesToPublishDataset() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in other cases we've used the patter getEffectiveRequireFilesToPublishDataset which might be useful for consistency. I also think this should probably be in Dataverse.java with the get/set - when there's a UI, we need to know the effective setting at other times than just publishing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed and added to Dataverse.Java

@@ -292,6 +292,9 @@ public static JsonObjectBuilder json(Dataverse dv, Boolean hideEmail, Boolean re
if (dv.getFilePIDsEnabled() != null) {
bld.add("filePIDsEnabled", dv.getFilePIDsEnabled());
}
if (dv.getRequireFilesToPublishDataset() != null) {
bld.add("requireFilesToPublishDataset", dv.getRequireFilesToPublishDataset());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW: My guess is that the SPA will want the effective settings for this and other things (file pids, storage driver, etc.). Not sure if that would go here or not. I guess it can wait until we address the whole set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

swapped the requireFilesToPublishDataset for effectiveRequiresFilesToPublishDataset


if (getDataset().getFiles().isEmpty() && requiresFilesToPublishDataset()) {
throw new IllegalCommandException(BundleUtil.getStringFromBundle("dataset.mayNotPublish.FilesRequired"), this);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gives a warning in the current UI, right? It would probably be better to just disable the publish button, or have a persistent message that files must be added before this dataset can be published, but if just failing with a warning is OK with @sbarbosadataverse, et. al., the current code avoids changes to the current UI code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenwinship - can you post a UI image for what this looks like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevenwinship images like this are so helpful. Can you please also add it under "Does this PR introduce a user interface change?" above?

src/main/resources/db/migration/V6.5.0.1.sql Show resolved Hide resolved

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@sbarbosadataverse
Copy link

I have one comment @stevenwinship can we say "published datasets should contain at least one 'data' file? To help users understand we want data and not just a pdf of am article?

@qqmyers qqmyers removed their assignment Nov 21, 2024
@cmbz cmbz added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 21, 2024
@stevenwinship
Copy link
Contributor Author

Changed the message

image

This comment has been minimized.

@ofahimIQSS
Copy link
Contributor

@stevenwinship Looks like the branch has conflicts which must be resolved.

This comment has been minimized.

1 similar comment

This comment has been minimized.

@ofahimIQSS ofahimIQSS self-assigned this Dec 4, 2024
@cmbz cmbz added the FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) label Dec 5, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

1 similar comment
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:10981-published-datasets-should-contain-files
ghcr.io/gdcc/configbaker:10981-published-datasets-should-contain-files

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@ofahimIQSS
Copy link
Contributor

I faced some issues in docker while trying to build and test this PR in local. Server log attached
server.log --- docker fail.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 9 FY25 Sprint 9 (2024-10-23 - 2024-11-06) FY25 Sprint 10 FY25 Sprint 10 (2024-11-06 - 2024-11-20) FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) FY25 Sprint 12 FY25 Sprint 12 (2024-12-04 - 2024-12-18) Size: 30 A percentage of a sprint. 21 hours. (formerly size:33)
Projects
Status: QA ✅
Development

Successfully merging this pull request may close these issues.

Publishing: Published datasets should contain files
8 participants