diff --git a/.gitignore b/.gitignore index 1e802991547..624691049bc 100644 --- a/.gitignore +++ b/.gitignore @@ -2,6 +2,9 @@ nb-configuration.xml target infer-out nbactions.xml +.settings +.classpath +.project michael-local GPATH GTAGS diff --git a/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html new file mode 100644 index 00000000000..bd24ad20dfa --- /dev/null +++ b/doc/sphinx-guides/source/_static/installation/files/var/www/dataverse/branding/analytics-code.html @@ -0,0 +1,148 @@ [148 added lines of HTML/JavaScript analytics example code; markup lost in extraction] diff --git a/doc/sphinx-guides/source/admin/dashboard.rst b/doc/sphinx-guides/source/admin/dashboard.rst index b411402812d..5ee53790471 100644 --- a/doc/sphinx-guides/source/admin/dashboard.rst +++ b/doc/sphinx-guides/source/admin/dashboard.rst @@ -29,3 +29,7 @@ Users This dashboard tool allows you to search a list of all users of your Dataverse installation. You can remove roles from user accounts and assign or remove superuser status. See the :doc:`user-administration` section for more details. +Move Data +--------- + +This tool allows you to move datasets. To move dataverses, see the :doc:`dataverses-datasets` section. diff --git a/doc/sphinx-guides/source/admin/dataverses-datasets.rst b/doc/sphinx-guides/source/admin/dataverses-datasets.rst index 69abae42308..2e8893f5f07 100644 --- a/doc/sphinx-guides/source/admin/dataverses-datasets.rst +++ b/doc/sphinx-guides/source/admin/dataverses-datasets.rst @@ -46,7 +46,9 @@ Datasets Move a Dataset ^^^^^^^^^^^^^^ -Moves a dataset whose id is passed to a dataverse whose alias is passed. If the moved dataset has a guestbook or a dataverse link that is not compatible with the destination dataverse, you will be informed and given the option to force the move and remove the guestbook or link. Only accessible to users with permission to publish the dataset in the original and destination dataverse. :: +Superusers can move datasets using the dashboard. See also :doc:`dashboard`. + +Moves a dataset whose id is passed to a dataverse whose alias is passed. If the moved dataset has a guestbook or a dataverse link that is not compatible with the destination dataverse, you will be informed and given the option to force the move (with ``forceMove=true`` as a query parameter) and remove the guestbook or link (or both), as sketched below. Only accessible to users with permission to publish the dataset in the original and destination dataverse. :: curl -H "X-Dataverse-key: $API_TOKEN" -X POST http://$SERVER/api/datasets/$id/move/$alias
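If the initial request is rejected because of an incompatible guestbook or dataverse link, a second call with the ``forceMove`` query parameter completes the move. A minimal sketch, using the same placeholder variables as above::

    curl -H "X-Dataverse-key: $API_TOKEN" -X POST "http://$SERVER/api/datasets/$id/move/$alias?forceMove=true"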
@@ -82,3 +84,9 @@ Make Metadata Updates Without Changing Dataset Version ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ As a superuser, click "Update Current Version" when publishing. (This option is only available when a 'Minor' update would be allowed.) + +Diagnose Constraint Violation Issues in Datasets +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To identify invalid data values in specific datasets (if, for example, an attempt to edit a dataset results in a ConstraintViolationException in the server log), or to check all the datasets in the Dataverse for constraint violations, see :ref:`Dataset Validation <dataset-validation-api>` in the :doc:`/api/native-api` section of the User Guide. + diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst index 8c13233d5c7..85145be95cf 100644 --- a/doc/sphinx-guides/source/admin/metadatacustomization.rst +++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst @@ -606,7 +606,7 @@ Reloading a Metadata Block As mentioned above, changes to metadata blocks that ship with Dataverse will be made over time to improve them and release notes will sometimes instruct you to reload an existing metadata block. The syntax for reloading is the same as for loading. Here's an example with the "citation" metadata block: -``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file --upload-file citation.tsv`` +``curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file citation.tsv`` Great care must be taken when reloading a metadata block. Matching is done on field names (or identifiers and then names in the case of controlled vocabulary values) so it's easy to accidentally create duplicate fields. diff --git a/doc/sphinx-guides/source/admin/metadataexport.rst b/doc/sphinx-guides/source/admin/metadataexport.rst index 27df2d4e7a3..078177b609a 100644 --- a/doc/sphinx-guides/source/admin/metadataexport.rst +++ b/doc/sphinx-guides/source/admin/metadataexport.rst @@ -7,14 +7,7 @@ Metadata Export Automatic Exports ----------------- -Publishing a dataset automatically starts a metadata export job, that will run in the background, asynchronously. Once completed, it will make the dataset metadata exported and cached in all the supported formats: - -- Dublin Core -- Data Documentation Initiative (DDI) -- DataCite 4 -- native JSON (Dataverse-specific) -- OAI_ORE -- Schema.org JSON-LD +Publishing a dataset automatically starts a metadata export job that runs in the background, asynchronously. Once completed, it leaves the dataset metadata exported and cached in all the supported formats listed under :ref:`Supported Metadata Export Formats <metadata-export-formats>` in the :doc:`/user/dataset-management` section of the User Guide. A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the "Application Timers" section of this guide for more information.) diff --git a/doc/sphinx-guides/source/admin/troubleshooting.rst b/doc/sphinx-guides/source/admin/troubleshooting.rst index 662060b7438..8cec4431947 100644 --- a/doc/sphinx-guides/source/admin/troubleshooting.rst +++ b/doc/sphinx-guides/source/admin/troubleshooting.rst @@ -58,3 +58,16 @@ followed by an Exception stack trace with these lines in it: Make sure you install the correct version of the driver. For example, if you are running version 9.3 of PostgreSQL, make sure you have the driver postgresql-9.3-1104.jdbc4.jar in your :fixedwidthplain:`/glassfish/lib` directory. Go `here `_ to download the correct version of the driver. If you have an older driver in glassfish/lib, make sure to remove it, replace it with the new version, and restart Glassfish. (You may need to remove the entire contents of :fixedwidthplain:`/glassfish/domains/domain1/generated` before you start Glassfish.)
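A sketch of that driver swap, assuming a default Glassfish 4 location under :fixedwidthplain:`/usr/local/glassfish4` (adjust the paths to your installation)::

    # remove the old driver and drop in the one matching your PostgreSQL version
    rm /usr/local/glassfish4/glassfish/lib/postgresql-*.jar
    cp postgresql-9.3-1104.jdbc4.jar /usr/local/glassfish4/glassfish/lib/
    # clear generated artifacts, then restart Glassfish
    rm -rf /usr/local/glassfish4/glassfish/domains/domain1/generated/*
    asadmin restart-domain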
+ +Constraint Violation Issues +--------------------------- + +In real-life production use, it is possible to end up in a situation where some values associated with the datasets in your database are no longer valid under the constraints enforced by the latest version of Dataverse. This is not very likely to happen, but if it does, the symptoms are as follows: some datasets can no longer be edited, and long exception stack traces are logged in the Glassfish server log, caused by:: + + javax.validation.ConstraintViolationException: + Bean Validation constraint(s) violated while executing Automatic Bean Validation on callback event:'preUpdate'. + Please refer to embedded ConstraintViolations for details. + +(Contrary to what the message suggests, there are no specific "details" anywhere in the stack trace that would explain which values violate which constraints.) + +To identify the specific invalid values in the affected datasets, or to check all the datasets in the Dataverse for constraint violations, see :ref:`Dataset Validation <dataset-validation-api>` in the :doc:`/api/native-api` section of the User Guide. diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index e715541b7ff..c0d1bfd8e11 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -291,7 +291,7 @@ Export Metadata of a Dataset in Various Formats GET http://$SERVER/api/datasets/export?exporter=ddi&persistentId=$persistentId -.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org`` , ``OAI_ORE`` , ``Datacite`` and ``dataverse_json``. +.. note:: Supported exporters (export formats) are ``ddi``, ``oai_ddi``, ``dcterms``, ``oai_dc``, ``schema.org``, ``OAI_ORE``, ``Datacite``, ``oai_datacite``, and ``dataverse_json``. Schema.org JSON-LD ^^^^^^^^^^^^^^^^^^ @@ -1325,6 +1325,43 @@ Recalculate the UNF value of a dataset version, if it's missing, by supplying the POST http://$SERVER/api/admin/datasets/integrity/{datasetVersionId}/fixmissingunf +.. _dataset-validation-api: + +Dataset Validation +~~~~~~~~~~~~~~~~~~ + +Validate the dataset and its components (DatasetVersion, FileMetadatas, etc.) for constraint violations:: + + curl $SERVER_URL/api/admin/validate/dataset/{datasetId} + +If validation fails, the output will report the specific database entity and the offending value. For example:: + + {"status":"OK","data":{"entityClassDatabaseTableRowId":"[DatasetVersion id:73]","field":"archiveNote","invalidValue":"random text, not a url"}} + + +Validate all the datasets in the Dataverse, reporting any constraint violations found:: + + curl $SERVER_URL/api/admin/validate/datasets + +This API streams its output in real time, i.e., it starts producing output immediately and reports on its progress as it validates one dataset at a time. For example:: + + {"datasets": [ + {"datasetId":27,"status":"valid"}, + {"datasetId":29,"status":"valid"}, + {"datasetId":31,"status":"valid"}, + {"datasetId":33,"status":"valid"}, + {"datasetId":35,"status":"valid"}, + {"datasetId":41,"status":"invalid","entityClassDatabaseTableRowId":"[DatasetVersion id:73]","field":"archiveNote","invalidValue":"random text, not a url"}, + {"datasetId":57,"status":"valid"} + ] + }
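When scripting against this endpoint, it can be convenient to keep only the failures. A sketch using ``jq`` (assuming it is installed on the machine running the check)::

    curl -s $SERVER_URL/api/admin/validate/datasets | jq '.datasets[] | select(.status == "invalid")'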
Note that if you are attempting to validate a very large number of datasets in your Dataverse, this API may time out, subject to the timeout limit set in your Glassfish configuration. If this is a production Dataverse instance serving large amounts of data, you most likely have that timeout set to some high value already. But if you need to increase it, it can be done with the asadmin command. For example:: + + asadmin set server-config.network-config.protocols.protocol.http-listener-1.http.request-timeout-seconds=3600 + + + Workflows ~~~~~~~~~ diff --git a/doc/sphinx-guides/source/conf.py b/doc/sphinx-guides/source/conf.py index c772a03f986..cfb42d72255 100755 --- a/doc/sphinx-guides/source/conf.py +++ b/doc/sphinx-guides/source/conf.py @@ -65,9 +65,9 @@ # built documents. # # The short X.Y version. -version = '4.13' +version = '4.14' # The full version, including alpha/beta/rc tags. -release = '4.13' +release = '4.14' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/doc/sphinx-guides/source/developers/coding-style.rst b/doc/sphinx-guides/source/developers/coding-style.rst index 2ac40784c05..dcdf54b17be 100755 --- a/doc/sphinx-guides/source/developers/coding-style.rst +++ b/doc/sphinx-guides/source/developers/coding-style.rst @@ -89,11 +89,18 @@ Generally speaking you should use ``fine`` for everything that you don't want to When adding logging, do not simply add ``System.out.println()`` lines because the logging level cannot be controlled. -Avoid Hard-Coding Strings -~~~~~~~~~~~~~~~~~~~~~~~~~ +Avoid Hard-Coding Strings (Use Constants) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Special strings should be defined as public constants. For example, ``DatasetFieldConstant.java`` contains a field for "title" and it's used in many places in the code (try "Find Usages" in Netbeans). This is better than writing the string "title" in all those places. +Avoid Hard-Coding User-Facing Messaging in English +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There is an ongoing effort to translate Dataverse into various languages. Look for "lang" or "languages" in the :doc:`/installation/config` section of the Installation Guide for details if you'd like to help or play around with this feature. + +The translation effort is hampered if you hard-code user-facing messages in English in the Java code. Put English strings in ``Bundle.properties`` and use ``BundleUtil`` to pull them out. This is especially important for messages that appear in the UI. We are aware that the API has many, many hard-coded English strings in it. If you touch a method in the API and notice English strings, you are strongly encouraged to use that opportunity to move the English to ``Bundle.properties``. + Type Safety ~~~~~~~~~~~ diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index 6574142e1e8..c19bfde3b75 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -569,6 +569,16 @@ Once you have created the analytics file, run this curl command to add it to your Dataverse installation: ``curl -X PUT -d '/var/www/dataverse/branding/analytics-code.html' http://localhost:8080/api/admin/settings/:WebAnalyticsCode``
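For context, a minimal, hypothetical analytics file might look like the following, which simply installs Google's generic gtag.js page-tracking snippet (the tracking ID is a placeholder)::

    mkdir -p /var/www/dataverse/branding
    cat > /var/www/dataverse/branding/analytics-code.html <<'EOF'
    <!-- hypothetical minimal example; replace UA-XXXXXXX-1 with your account code -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-XXXXXXX-1"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'UA-XXXXXXX-1');
    </script>
    EOF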
+Tracking Button Clicks +++++++++++++++++++++++ + +The basic analytics configuration above tracks page navigation. However, it does not capture potentially interesting events that do not result in a new page opening, such as users clicking buttons on pages. In Dataverse, these events include file downloads, requesting access to restricted data, exporting metadata, social media sharing, requesting citation text, launching external tools or WorldMap, contacting authors, and launching computations. + +Both Google and Matomo provide the optional capability to track such events, and Dataverse has added CSS style classes (``btn-compute``, ``btn-contact``, ``btn-download``, ``btn-explore``, ``btn-export``, ``btn-preview``, ``btn-request``, ``btn-share``, and ``downloadCitation``) to its HTML to facilitate this. + +For Google Analytics, the example script at :download:`analytics-code.html <../_static/installation/files/var/www/dataverse/branding/analytics-code.html>` will track both page hits and events within Dataverse. You would use this file in the same way as the shorter example above, putting it somewhere outside your deployment directory, replacing ``YOUR ACCOUNT CODE`` with your actual code, and setting ``:WebAnalyticsCode`` to reference it, as sketched below.
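A sketch of that setup, assuming the example file has been downloaded to the branding directory used above (the account code value is illustrative)::

    # replace the placeholder with your actual account code
    sed -i 's/YOUR ACCOUNT CODE/UA-XXXXXXX-1/' /var/www/dataverse/branding/analytics-code.html
    # point Dataverse at the file
    curl -X PUT -d '/var/www/dataverse/branding/analytics-code.html' http://localhost:8080/api/admin/settings/:WebAnalyticsCode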
+ +Once this script is running, you can look in the Google Analytics console (Realtime/Events or Behavior/Events) and view events by type and/or the Dataset or File the event involves. DuraCloud/Chronopolis Integration --------------------------------- diff --git a/doc/sphinx-guides/source/installation/prerequisites.rst b/doc/sphinx-guides/source/installation/prerequisites.rst index ac7a49e6ba6..72bc64a19f0 100644 --- a/doc/sphinx-guides/source/installation/prerequisites.rst +++ b/doc/sphinx-guides/source/installation/prerequisites.rst @@ -124,9 +124,7 @@ PostgreSQL Installing PostgreSQL ======================= -Version 9.3 is required. Previous versions have not been tested. - -Version 9.6 is strongly recommended:: +Version 9.6 is strongly recommended because it is the version developers and QA test with:: # yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm # yum makecache fast diff --git a/doc/sphinx-guides/source/user/dataset-management.rst b/doc/sphinx-guides/source/user/dataset-management.rst index 835594b3410..c2b272725dd 100755 --- a/doc/sphinx-guides/source/user/dataset-management.rst +++ b/doc/sphinx-guides/source/user/dataset-management.rst @@ -20,7 +20,20 @@ A dataset contains three levels of metadata: For more details about what Citation and Domain Specific Metadata is supported please see our :ref:`user-appendix`. -Note that once a dataset has been published its metadata may be exported. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are DDI, Dublin Core, Datacite 4, OAI_ORE, Schema.org JSON-LD, and Dataverse's native JSON format. +.. _metadata-export-formats: + +Supported Metadata Export Formats +--------------------------------- + +Once a dataset has been published, its metadata is exported in a variety of formats. A button on the dataset page's metadata tab will allow a user to export the metadata of the most recently published version of the dataset. Currently supported export formats are: + +- Dublin Core +- DDI (Data Documentation Initiative) +- DataCite 4 +- JSON (native Dataverse format) +- OAI_ORE +- OpenAIRE +- Schema.org JSON-LD Adding a New Dataset ==================== @@ -510,4 +523,4 @@ If you deaccession the most recently published version of the dataset but not al .. |file-upload-prov-window| image:: ./img/prov1.png :class: img-responsive .. |image-file-tree-view| image:: ./img/file-tree-view.png - :class: img-responsive \ No newline at end of file + :class: img-responsive diff --git a/doc/sphinx-guides/source/user/find-use-data.rst b/doc/sphinx-guides/source/user/find-use-data.rst index 3fd2c6439b2..91947ea80e2 100755 --- a/doc/sphinx-guides/source/user/find-use-data.rst +++ b/doc/sphinx-guides/source/user/find-use-data.rst @@ -121,7 +121,7 @@ rsync is typically used for synchronizing files and directories between two diff rsync-enabled Dataverse installations offer a new file download process that differs from traditional browser-based downloading. Instead of multiple files, each dataset uploaded via rsync contains a single "Dataverse Package". When you download this package you will receive a folder that contains all files from the dataset, arranged in the exact folder structure in which they were originally uploaded. -In a dataset containing a Dataverse Package, at the bottom of the dataset page, under the **Data Access** tab, instead of a download button you will find the information you need in order to download the Dataverse Package using rsync. If the data is locally available to you (on a shared drive, for example) then you can find it at the folder path under **Local Access**. Otherwise, to download the Dataverse Package you will have to use one of the rsync commands under **Download Access**. There may be multiple commands listed, each corresponding to a different mirror that hosts the Dataverse Package. Go outside your browser and open a terminal (AKA command line) window on your computer. Use the terminal to run the command that corresponds with the mirror of your choice. It's usually best to choose the mirror that is geographically closest to you. Running this command will initiate the download process. +In a dataset containing a Dataverse Package, the information you need to download and/or access the data is in two places: on the **dataset page** under the **Files** tab, and on the **file page** under the **Data Access** tab. If the data is locally available to you (on a shared drive, for example), you will find the folder path to access the data locally. To download, use one of the rsync commands provided. There may be multiple commands, each corresponding to a different mirror that hosts the Dataverse Package. Go outside your browser and open a terminal (AKA command line) window on your computer. Use the terminal to run the command that corresponds with the mirror of your choice. It’s usually best to choose the mirror that is geographically closest to you. Running this command will initiate the download process. After you've downloaded the Dataverse Package, you may want to double-check that your download went perfectly. Under **Verify Data**, you'll find a command that you can run in your terminal to initiate a checksum, ensuring that the data you downloaded precisely matches the data in Dataverse. This way, you can ensure the integrity of the data you're working with.
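The actual commands are shown on the page itself; the following is a purely illustrative sketch of the workflow (the mirror host, package path, and checksum file name are all hypothetical)::

    # a "Download Access" rsync command copied from the page
    rsync -av rsync://mirror.example.edu/packages/10.5072/FK2ABCDE .
    # the "Verify Data" checksum command copied from the page
    cd FK2ABCDE && shasum -c files.sha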
diff --git a/doc/sphinx-guides/source/versions.rst b/doc/sphinx-guides/source/versions.rst index 9454d58720a..da9e429820c 100755 --- a/doc/sphinx-guides/source/versions.rst +++ b/doc/sphinx-guides/source/versions.rst @@ -6,8 +6,10 @@ Dataverse Guides Versions This list provides a way to refer to previous versions of the Dataverse guides, which we still host. In order to learn more about the updates delivered from one version to another, visit the `Releases <https://github.com/IQSS/dataverse/releases>`__ page in our GitHub repo. -- 4.13 +- 4.14 + +- `4.13 <http://guides.dataverse.org/en/4.13/>`__ - `4.12 <http://guides.dataverse.org/en/4.12/>`__ - `4.11 <http://guides.dataverse.org/en/4.11/>`__ - `4.10.1 <http://guides.dataverse.org/en/4.10.1/>`__ diff --git a/pom.xml b/pom.xml index 75cb754f9d1..95ba816701d 100644 --- a/pom.xml +++ b/pom.xml @@ -7,7 +7,7 @@ --> <groupId>edu.harvard.iq</groupId> <artifactId>dataverse</artifactId> - <version>4.13</version> + <version>4.14</version> <packaging>war</packaging> <name>dataverse</name> @@ -599,6 +599,12 @@ <artifactId>tika-parsers</artifactId> <version>1.19</version> + <dependency> + <groupId>org.apache.opennlp</groupId> + <artifactId>opennlp-tools</artifactId> + <version>1.9.1</version> + </dependency>
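With the POM updated, one quick way to confirm that the new OpenNLP dependency resolves is Maven's dependency tree (a sketch, run from the repository root)::

    mvn dependency:tree -Dincludes=org.apache.opennlp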
[new-file diff for a Move Data dashboard page; the XHTML markup was lost in extraction, leaving only the bundle keys #{bundle['dashboard.card.datamove.newdataverse.header']} and #{bundle['dashboard.card.datamove.confirm.dialog']}] diff --git a/src/main/webapp/dashboard.xhtml b/src/main/webapp/dashboard.xhtml index 8128c3168fe..b49ae3f88f0 100644 --- a/src/main/webapp/dashboard.xhtml +++ b/src/main/webapp/dashboard.xhtml @@ -123,6 +123,30 @@
[24 added lines of XHTML markup for the Move Data dashboard card lost in extraction, leaving only the bundle keys #{bundle['dashboard.card.datamove']}, #{bundle['dataverses']}, and #{bundle['datasets']}] diff --git a/src/main/webapp/dataset.xhtml b/src/main/webapp/dataset.xhtml index 2dc63759558..779cf02edb3 100644 --- a/src/main/webapp/dataset.xhtml +++ b/src/main/webapp/dataset.xhtml @@ -59,6 +59,7 @@ [one added line of markup lost in extraction] @@ -524,11 +525,11 @@ [changed markup lost in extraction]