v6.2
Dataverse 6.2
Please note: As of 2024-05-16, the "dvinstall.zip" file below has been updated (PR #10563) to fix a bug (#10557) in the as-install.sh script as detailed in an announcement.
Please note: To read these instructions in full, please go to https://github.com/IQSS/dataverse/releases/tag/v6.2 rather than the list of releases, which will cut them off.
This release brings new features, enhancements, and bug fixes to the Dataverse software.
Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.
Table of Contents
- 💡 Release Highlights
- 🪲 Bug fixes
- 💾 Persistence
- 🌐 API
⚠️ Backward Incompatibilities- 📖 Guides
- ⚙️ New Settings
- 📋 Complete List of Changes
- 🛟 Getting Help
- 💻 Upgrade instructions
💡Release Highlights
Search and Facet by License
License have been added to the search facets in the search side panel to filter datasets by license (e.g. CC0).
Datasets with Custom Terms are aggregated under the "Custom Terms" value of this facet. See the Licensing section of the guide for more details on configured Licenses and Custom Terms.
For more information, see #9060.
Licenses can also be used to filter the Search API results using the fq
parameter, for example : /api/search?q=*&fq=license%3A%22CC0+1.0%22
for CC0 1.0, see the Search API guide for more examples.
For more information, see #10204.
When Returning Datasets to Authors, Reviewers Can Add a Note to the Author
The Popup for returning to author now allows to type in a message to explain the reasons of return and potential edits needed, that will be sent by email to the author.
Please note that this note is mandatory, but that you can still type a creative and meaningful comment such as "The author would like to modify his dataset", "Files are missing", "Nothing to report" or "A curation report with comments and suggestions/instructions will follow in another email" that suits your situation.
For more information, see #10137.
Support for Using Multiple PID Providers
This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts
(managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections,
and supporting transferred PIDs (where a PID is managed by an account when its authority, separator, and/or shoulder don't match
the combination where the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing
DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.
These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility
for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended
and will be required in a future version.
For more information check the PID settings on this link.
Rate Limiting
The option to rate limit has been added to prevent users from over taxing the system either deliberately or by runaway automated processes.
Rate limiting can be configured on a tier level with tier 0 being reserved for guest users and tiers 1-any for authenticated users.
Superuser accounts are exempt from rate limiting.
Rate limits can be imposed on command APIs by configuring the tier, the command, and the hourly limit in the database.
Two database settings configure the rate limiting :RateLimitingDefaultCapacityTiers and RateLimitingCapacityByTierAndAction, If either of these settings exist in the database rate limiting will be enabled and If neither setting exists rate limiting is disabled.
For more details check the detailed guide on this link.
Simplified SMTP Configuration
With this release, we deprecate the usage of asadmin create-javamail-resource
to configure Dataverse to send mail using your SMTP server and provide a simplified, standard alternative using JVM options or MicroProfile Config.
At this point, no action is required if you want to keep your current configuration.
Warnings will show in your server logs to inform and remind you about the deprecation.
A future major release of Dataverse may remove this way of configuration.
Please do take the opportunity to update your SMTP configuration. Details can be found in section of the Installation Guide starting with the SMTP/Email Configuration section of the Installation Guide.
Once reconfiguration is complete, you should remove legacy, unused config. First, run asadmin delete-javamail-resource mail/notifyMailSession
as described in the 6.2 guides. Then run curl -X DELETE http://localhost:8080/api/admin/settings/:SystemEmail
as this database setting has been replace with dataverse.mail.system-email
as described below.
Please note: as there have been problems with email delivered to SPAM folders when the "From" within mail envelope and the mail session configuration didn't match (#4210), as of this version the sole source for the "From" address is the setting dataverse.mail.system-email
once you migrate to the new way of configuration.
Binder Redirect
If your installation is configured to use Binder, you should remove the old "girder_ythub" tool and replace it with the tool described at https://github.com/IQSS/dataverse-binder-redirect
For more information, see #10360.
Optional Croissant 🥐 Exporter Support
When a Dataverse installation is configured to use a metadata exporter for the Croissant format, the content of the JSON-LD in the <head> of dataset landing pages will be replaced with that format. However, both JSON-LD and Croissant will still be available for download from the dataset page and API.
For more information, see #10382.
Harvesting Handle Missing Controlled Values
Allows datasets to be harvested with Controlled Vocabulary Values that existed in the originating Dataverse installation but are not in the harvesting Dataverse installation. For more information, view the changes to the endpoint here.
Add .QPJ and .QMD Extensions to Shapefile Handling
Support for .qpj and .qmd files in shapefile uploads has been introduced, ensuring that these files are properly recognized and handled as part of geospatial datasets in Dataverse.
For more information, see #10305.
Ingested Tabular Data Files Can Be Stored Without the Variable Name Header
Tabular Data Ingest can now save the generated archival files with the list of variable names added as the first tab-delimited line.
Access API will be able to take advantage of Direct Download for .tab files saved with these headers on S3 - since they no longer have to be generated and added to the streamed content on the fly.
This behavior is controlled by the new setting :StoreIngestedTabularFilesWithVarHeaders. It is false by default, preserving the legacy behavior. When enabled, Dataverse will be able to handle both the newly ingested files, and any already-existing legacy files stored without these headers transparently to the user. E.g. the access API will continue delivering tab-delimited files with this header line, whether it needs to add it dynamically for the legacy files, or reading complete files directly from storage for the ones stored with it.
We are planning to add an API for converting existing legacy tabular files in a future release.
For more information, see #10282.
Uningest/Reingest Options Available in the File Page Edit Menu
New Uningest/Reingest options are available in the File Page Edit menu. Ingest errors can be cleared by users who can published the associated dataset and by superusers, allowing for a successful ingest to be undone or retried (e.g. after a Dataverse version update or if ingest size limits are changed).
The /api/files//uningest api also now allows users who can publish the dataset to undo an ingest failure.
For more information, see #10319.
Sphinx Guides Now Support Markdown Format and Tabs
Our guides now support the Markdown format with the extension .md. Additionally, an option to create tabs in the guides using Sphinx Tabs has been added. (You can see the tabs in action in the "dev usage" page of the Container Guide.) To continue building the guides, you will need to install this new dependency by re-running:
pip install -r requirements.txt
For more information, see #10111.
Number of Concurrent Indexing Operations Now Configurable
A new MicroProfile setting called dataverse.solr.concurrency.max-async-indexes
has been added that controls the maximum number of simultaneously running asynchronous dataset index operations (defaults to 4).
For more information, see #10388.
🪲 Bug fixes
Publication Status Facet Restored
In version 6.1, the publication status facet location was unintentionally moved to the bottom. In this version, we have restored the original order.
Assign a Role With Higher Permissions Than Its Own Role Has Been Fixed
The permissions required to assign a role have been fixed. It is no longer possible to assign a role that includes permissions that the assigning user doesn't have.
Geospatial Metadata Block Fields for North and South Renamed
The Geospatial metadata block fields for north and south were labeled incorrectly as longitudes, as reported in #5645. After updating to this version of Dataverse, users will need to update any API client code used "northLongitude" and "southLongitude" to "northLatitude" and "southLatitude", respectively, as mentioned on the mailing list.
Also, we have updated the tooltips in the Geospatial metadata block, where the use of commas instead of dots in coordinate values was incorrectly suggested.
OAI-PMH Error Handling Has Been Improved
OAI-PMH error handling has been improved to display a machine-readable error in XML rather than a 500 error with no further information.
- /oai?foo=bar will show "No argument 'verb' found"
- /oai?verb=foo&verb=bar will show "Verb must be singular, given: '[foo, bar]'"
Granting File Access Without Access Request
A bug introduced with the guestbook-at-request, requests are not deleted when granted, they are now given the state granted.
Harvesting redirects fixed
Redirects from search cards back to the original source for datasets harvested from "Generic OAI Archives", i.e. non-Dataverse OAI servers, have been fixed.
💾 Persistence
Missing Database Constraints
This release adds two missing database constraints that will assure that the externalvocabularyvalue table only has one entry for each uri and that the oaiset table only has one set for each spec. (In the very unlikely case that your existing database has duplicate entries now, install would fail. This can be checked by running the following commands:
SELECT uri, count(*) FROM externalvocabularyvalue group by uri;
And:
SELECT spec, count(*) FROM oaiset group by spec;
Then removing any duplicate rows (where count>1).
Universe Field in Variablemetadata Table Changed
Universe field in variablemetadata table was changed from varchar(255) to text. The change was made to support longer strings in "universe" metadata field, similar to the rest of text fields in variablemetadata table.
PostgreSQL Versions
This release adds install script support for the new permissions model in PostgreSQL versions 15+, and bumps Flyway to support PostgreSQL 16.
PostgreSQL 13 remains the version used with automated testing.
🌐 API
Listing Collection/Dataverse API
Listing collection/dataverse role assignments via API still requires ManageDataversePermissions, but listing dataset role assignments via API now requires only ManageDatasetPermissions.
New API Endpoint for Clearing an Individual Dataset From Solr
A new Index API endpoint has been added allowing an admin to clear an individual dataset from Solr.
For more information visit the documentation on this link
New Accounts Metrics API
Users can retrieve new types of metrics related to user accounts. The new capabilities are described in the guides.
New canDownloadAtLeastOneFile Endpoint
The /api/datasets/{id}/versions/{versionId}/canDownloadAtLeastOneFile
endpoint has been created.
This API endpoint indicates if the calling user can download at least one file from a dataset version. Note that Shibboleth group permissions are not considered.
Harvesting Client Endpoint Extended
The API endpoint api/harvest/clients/{harvestingClientNickname}
has been extended to include the following fields:
- allowHarvestingMissingCVV: enable/disable allowing datasets to be harvested with controlled vocabulary values that exist in the originating Dataverse server but are not present in the harvesting Dataverse server. The default is false.
Note: This setting is only available to the API and not currently accessible/settable via the UI.
Version Files Endpoint Extended
The response for getVersionFiles /api/datasets/{id}/versions/{versionId}/files
endpoint has been modified to include a total count of records available totalCount:x.
This will aid in pagination by allowing the caller to know how many pages can be iterated through. The existing API (getVersionFileCounts) to return the count will still be available.
Metadata Blocks Endpoint Extended
The API endpoint /api/metadatablocks/{block_id}
has been extended to include the following fields:
- isRequired: Whether or not this field is required
- displayOrder: The display order of the field in create/edit forms
- typeClass: The type class of this field ("controlledVocabulary", "compound", or "primitive")
Get File Citation as JSON
It is now possible to retrieve via API the file citation as it appears on the file landing page. It is formatted in HTML and encoded in JSON.
This API is not for downloading various citation formats such as EndNote XML, RIS, or BibTeX.
For more information check the documentation on this link
Files Endpoint Extended
The API endpoint api/files/{id}
has been extended to support the following optional query parameters:
- includeDeaccessioned: Indicates whether or not to consider deaccessioned dataset versions in the latest file search. (Default:
false
). - returnDatasetVersion: Indicates whether or not to include the dataset version of the file in the response. (Default:
false
).
A new endpoint api/files/{id}/versions/{datasetVersionId}
has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use :latest-published
, :latest
, :draft
or 1.0
or any other available version identifier.
The endpoint supports the includeDeaccessioned and returnDatasetVersion optional query parameters, as does the api/files/{id}
endpoint.
api/files/{id}/draft
endpoint is no longer available in favor of the new endpoint api/files/{id}/versions/{datasetVersionId}
, which can use the version identifier :draft
(api/files/{id}/versions/:draft
) to obtain the same result.
Datasets, Dataverse Collections, and Datafiles Endpoints Extended
The API endpoints for getting datasets, Dataverse collections, and datafiles have been extended to support the following optional 'returnOwners' query parameter.
Including the parameter and setting it to true will add a hierarchy showing which dataset and dataverse collection(s) the object is part of to the json object returned.
For more information visit the full native API guide on this link
Endpoint Fixed: Datasets Metadata
The API endpoint api/datasets/{id}/metadata
has been changed to default to the latest version of the dataset to which the user has access.
Experimental Make Data Count processingState API
An experimental Make Data Count processingState API has been added. For now it has been documented in the (developer guide)[https://guides.dataverse.org/en/6.2/developers/make-data-count.html#processing-archived-logs].
⚠️ Backward Incompatibilities
To view a list of changes that can be impactful to your implementation please visit our detailed list of changes to the API.
📖 Guides
Container Guide, Documentation for Faster Redeploy
In the Container Guide, documentation for developers on how to quickly redeploy code has been added for Netbeans and improved for IntelliJ.
Also in the context of containers, a new option to skip deployment has been added and the war file is now consistently named "dataverse.war" rather than having a version in the filename, such as "dataverse-6.1.war". This predictability makes tooling easier.
Evaluation Version Tutorial on the Containers Guide
The Container Guide now containers a tutorial for running Dataverse in containers for demo or evaluation purposes: https://guides.dataverse.org/en/6.2/container/running/demo.html
New QA Guide
A new QA Guide is intended mostly for the core development team but may be of interest to contributors on: https://guides.dataverse.org/en/6.2/develop/qa
⚙️ New Settings
MicroProfile Settings
The * indicates a provider id indicating which provider the setting is for
- dataverse.pid.providers
- dataverse.pid.default-provider
- dataverse.pid.*.type
- dataverse.pid.*.label
- dataverse.pid.*.authority
- dataverse.pid.*.shoulder
- dataverse.pid.*.identifier-generation-style
- dataverse.pid.*.datafile-pid-format
- dataverse.pid.*.managed-list
- dataverse.pid.*.excluded-list
- dataverse.pid.*.datacite.mds-api-url
- dataverse.pid.*.datacite.rest-api-url
- dataverse.pid.*.datacite.username
- dataverse.pid.*.datacite.password
- dataverse.pid.*.ezid.api-url
- dataverse.pid.*.ezid.username
- dataverse.pid.*.ezid.password
- dataverse.pid.*.permalink.base-url
- dataverse.pid.*.permalink.separator
- dataverse.pid.*.handlenet.index
- dataverse.pid.*.handlenet.independent-service
- dataverse.pid.*.handlenet.auth-handle
- dataverse.pid.*.handlenet.key.path
- dataverse.pid.*.handlenet.key.passphrase
- dataverse.spi.pidproviders.directory
- dataverse.solr.concurrency.max-async-indexes
SMTP Settings:
- dataverse.mail.system-email
- dataverse.mail.mta.host
- dataverse.mail.mta.port
- dataverse.mail.mta.ssl.enable
- dataverse.mail.mta.auth
- dataverse.mail.mta.user
- dataverse.mail.mta.password
- dataverse.mail.mta.allow-utf8-addresses
- Plus many more for advanced usage and special provider requirements. See configuration guide for a full list.
Database Settings:
- :RateLimitingDefaultCapacityTiers
- :RateLimitingCapacityByTierAndAction
- :StoreIngestedTabularFilesWithVarHeaders
📋 Complete List of Changes
For the complete list of code changes in this release, see the 6.2 Milestone in GitHub.
🛟 Getting Help
For help with upgrading, installing, or general questions please post to the Dataverse Community Google Group or email [email protected].
💻 Upgrade Instructions
Upgrading requires a maintenance window and downtime. Please plan ahead, create backups of your database, etc.
These instructions assume that you've already upgraded through all the 5.x releases and are now running Dataverse 6.1.
0. These instructions assume that you are upgrading from the immediate previous version. If you are running an earlier version, the only safe way to upgrade is to progress through the upgrades to all the releases in between before attempting the upgrade to this version.
If you are running Payara as a non-root user (and you should be!), remember not to execute the commands below as root. Use sudo
to change to that user first. For example, sudo -i -u dataverse
if dataverse
is your dedicated application user.
In the following commands we assume that Payara 6 is installed in /usr/local/payara6
. If not, adjust as needed.
export PAYARA=/usr/local/payara6
(or setenv PAYARA /usr/local/payara6
if you are using a csh
-like shell)
1. Usually, when a Solr schema update is released, we recommend deploying the new version of Dataverse, then updating the schema.xml
on the solr side. With 6.2, we recommend to install the base schema first. Without it Dataverse 6.2 is not going to be able to show any results after the initial deployment. If your instance is using any custom metadata blocks, you will need to further modify the schema, see the last step of this instruction (step 8).
-
Stop Solr instance (usually
service solr stop
, depending on Solr installation/OS, see the Installation Guide) -
Replace schema.xml
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/schema.xml
cp schema.xml /usr/local/solr/solr-9.3.0/server/solr/collection1/conf
-
Start Solr instance (usually
service solr start
, depending on Solr/OS)
2. Undeploy the previous version.
$PAYARA/bin/asadmin undeploy dataverse-6.1
3. Stop Payara and remove the generated directory
service payara stop
rm -rf $PAYARA/glassfish/domains/domain1/generated
4. Start Payara
service payara start
5. Deploy this version.
$PAYARA/bin/asadmin deploy dataverse-6.2.war
The deployment of the war file may take some time on a large production database due to the database migration scripts that are part of the release.
6. For installations with internationalization:
- Please remember to update translations via Dataverse language packs.
7. Restart Payara
service payara stop
service payara start
8. Update the following Metadata Blocks to reflect the incremental improvements made to the handling of core metadata fields:
wget https://github.com/IQSS/dataverse/releases/download/v6.2/geospatial.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file geospatial.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/citation.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/astrophysics.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file astrophysics.tsv
wget https://github.com/IQSS/dataverse/releases/download/v6.2/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file biomedical.tsv
9. For installations with custom or experimental metadata blocks:
-
Stop Solr instance (usually
service solr stop
, depending on Solr installation/OS, see the Installation Guide) -
Run the
update-fields.sh
script that we supply, as in the example below (modify the command lines as needed to reflect the correct path of your solr installation):
wget https://raw.githubusercontent.com/IQSS/dataverse/v6.2/conf/solr/9.3.0/update-fields.sh
chmod +x update-fields.sh
curl "http://localhost:8080/api/admin/index/solr/schema" | ./update-fields.sh /usr/local/solr/solr-9.3.0/server/solr/collection1/conf/schema.xml
- Restart Solr instance (usually
service solr restart
depending on solr/OS)
10. Reindex Solr:
For details, see https://guides.dataverse.org/en/6.2/admin/solr-search-index.html but here is the reindex command:
curl http://localhost:8080/api/admin/index