Feature Request: Enabling configuration of Dataverse using simple files (TOML) #10684

Open
poikilotherm opened this issue Jul 14, 2024 · 4 comments
Labels
Component: Code Infrastructure (formerly "Feature: Code Infrastructure"); Component: Containers (Anything related to cloudy Dataverse, shipped in containers.); Feature: Installation Guide; Feature: Installer; Size: 10 (A percentage of a sprint. 7 hours.); Status: Needs Input (Applied to issues in need of input from someone currently unavailable); Type: Feature (a feature request); User Role: Sysadmin (Installs, upgrades, and configures the system, connects via ssh)

Comments

@poikilotherm
Contributor

poikilotherm commented Jul 14, 2024

Overview of the Feature Request

Let's enable using a TOML file to configure JVM options for starters. Ideally, extend this to configuring "DB" options and REST API DTO-based options, too, providing a unified approach to configuring a Dataverse instance. Basically, enable something like /etc/dataverse/config.toml in the spirit of many UNIX/Linux services.

What kind of user is the feature intended for?

Sysadmin

What inspired the request?

Many not-so-experienced Dataverse admins have a hard time setting up their JVM options right.
With our now ever-growing list of options (recently the PID providers were added), it's easy to end up with a mess of options.
It's friggin' complicated!

DB options these days cannot be provisioned from the same place as JVM options. Having two very different ways of configuring things, plus API endpoints for model-based configuration (auth, licenses, ...), does not make it easier for newbies and non-superhero admins to follow.

Even though our JVM options are starting to be scoped and hierarchical, in reality the configuration requires a flat structure: system properties in domain.xml, long env var names, etc. There is one exception: the Dir Config Source allows creating folders with files. But it seems hardly used in classic installations and is clearly geared towards container usage.
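To illustrate the flattening: here is a minimal sketch of how a scoped option name turns into one of those long env var names. It assumes the MicroProfile Config mapping rule (every character that is not a letter, digit or underscore becomes `_`, then the result is uppercased). The function name `env_var_name` is made up for this illustration and is not part of Dataverse or Payara.

```python
import re


def env_var_name(property_name: str) -> str:
    """Map a dotted option name to an environment variable name.

    Assumption: the MicroProfile Config rule applies, i.e. every character
    that is not a letter, digit or underscore is replaced by "_" and the
    result is uppercased.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", property_name).upper()


print(env_var_name("dataverse.pid.zb-test.datacite.rest-api-url"))
# prints DATAVERSE_PID_ZB_TEST_DATACITE_REST_API_URL
```

Note that both `.` and `-` collapse into `_`, so the hierarchy of the option name is no longer visible in the env var name.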

Enabling configuration file(s) makes it much easier to provision a Dataverse instance from a configuration management system. There is loads of tooling around to manage TOML files in idempotent ways, while editing domain.xml is a lot harder. Humans tend to like TOML more than YAML and even more than XML. (Let alone the fact that domain.xml is a VERY large and complex file.) Serving one or more of these files from a K8s ConfigMap, maybe even generated by a K8s Operator, is simple and gives people more "control" over their deployments.

What existing behavior do you want changed?

I want to be able to configure JVM options using a TOML file. Here's an example.

Instead of configuring all these options:

DATAVERSE_PID_PROVIDERS: "zb-test"
DATAVERSE_PID_DEFAULT_PROVIDER: "zb-test"
DATAVERSE_PID_ZB_TEST_TYPE: "datacite"
DATAVERSE_PID_ZB_TEST_LABEL: "DataCite Test Fabrica"
DATAVERSE_PID_ZB_TEST_AUTHORITY: "10.0346"
DATAVERSE_PID_ZB_TEST_SHOULDER: "JUELICH-DATA-BETA/"
DATAVERSE_PID_ZB_TEST_IDENTIFIER_GENERATION_STYLE: "randomString"
DATAVERSE_PID_ZB_TEST_DATACITE_REST_API_URL: "https://api.test.datacite.org/"
DATAVERSE_PID_ZB_TEST_DATACITE_MDS_API_URL: "https://mds.test.datacite.org/"
DATAVERSE_PID_ZB_TEST_DATACITE_USERNAME: "FOO.BAR"
DATAVERSE_PID_ZB_TEST_DATACITE_PASSWORD: "whatever"

Let's put this into a TOML file:

[dataverse.pid]
providers        = "zb-test"
default-provider = "zb-test"

[dataverse.pid.zb-test]
type                        = "datacite"
label                       = "DataCite Test Fabrica"
authority                   = "10.0346"
shoulder                    = "JUELICH-DATA-BETA/"
identifier-generation-style = "randomString"

[dataverse.pid.zb-test.datacite]
rest-api-url = "https://api.test.datacite.org/"
mds-api-url  = "https://mds.test.datacite.org/"
username     = "FOO.BAR"
password     = "whatever"

Isn't this a lot easier to read and maintain? (It's a lot more DRY-compliant and less chatty...)
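For anyone wondering how such a file would relate to the existing option names: a rough sketch, assuming a config source simply collapses the nested TOML tables into the dot-separated names we already use. The `flatten` helper and the shortened `config` dict (standing in for what a TOML parser would typically return) are hypothetical and only meant to show the idea, not an actual implementation.

```python
def flatten(table: dict, prefix: str = "") -> dict:
    """Collapse nested tables into flat, dot-separated option names."""
    flat = {}
    for key, value in table.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A sub-table: descend and keep extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Shortened version of the example, in the nested shape a TOML parser
# would typically hand back.
config = {
    "dataverse": {
        "pid": {
            "providers": "zb-test",
            "default-provider": "zb-test",
            "zb-test": {
                "type": "datacite",
                "datacite": {"username": "FOO.BAR"},
            },
        }
    }
}

for name, value in flatten(config).items():
    print(f"{name} = {value}")
```

Each printed name, such as `dataverse.pid.zb-test.datacite.username`, is the same flat name used today as a system property.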

A different way to write this, which some might prefer, is:

[dataverse.pid]
providers        = "zb-test"
default-provider = "zb-test"

[dataverse.pid.zb-test]
type                        = "datacite"
label                       = "DataCite Test Fabrica"
authority                   = "10.0346"
shoulder                    = "JUELICH-DATA-BETA/"
identifier-generation-style = "randomString"
datacite.rest-api-url       = "https://api.test.datacite.org/"
datacite.mds-api-url        = "https://mds.test.datacite.org/"
datacite.username           = "FOO.BAR"
datacite.password           = "whatever"

Any brand new behavior do you want to add to Dataverse?

It's not really brand new as far as JVM options are concerned. It would be brand new for DB options and things like auth providers, licenses, etc. (which are configured by REST API calls with a DTO).

Any open or closed issues related to this feature request?

@poikilotherm poikilotherm added Type: Feature a feature request Component: Code Infrastructure formerly "Feature: Code Infrastructure" Feature: Installer Feature: Installation Guide User Role: Sysadmin Installs, upgrades, and configures the system, connects via ssh Component: Containers Anything related to cloudy Dataverse, shipped in containers. labels Jul 14, 2024
@poikilotherm poikilotherm added the Size: 10 A percentage of a sprint. 7 hours. label Jul 14, 2024
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Jul 14, 2024
@DS-INRAE DS-INRAE moved this to 🙏 Wanted for next version in Recherche Data Gouv Jul 15, 2024
@DS-INRAE DS-INRAE moved this from 🙏 Wanted for next version to 🔍 Interest in Recherche Data Gouv Jul 15, 2024
@cmbz

cmbz commented Jul 18, 2024

2024/07/18 - 6.4 proposal request from @poikilotherm

@pdurbin pdurbin changed the title Feature Request: Enabling configuration of Dataverse using simple file(s) Feature Request: Enabling configuration of Dataverse using simple files (TOML) Oct 31, 2024
@pdurbin pdurbin added the Status: Needs Input Applied to issues in need of input from someone currently unavailable label Oct 31, 2024
@cmbz cmbz moved this to SPRINT READY in IQSS Dataverse Project Nov 18, 2024
@cmbz

cmbz commented Nov 18, 2024

2024/11/18: Whoever picks up this issue, please check with @poikilotherm first to coordinate with him and discuss past work he may have already completed.

@pdurbin
Member

pdurbin commented Nov 18, 2024

We discussed this a few weeks ago in the container meeting: https://docs.google.com/document/d/1AN6aAX5rt4lS5fEGYY1Q7Feug2GkIoLve-8WSOppVpA/edit?usp=sharing

My understanding is that the commit is available upstream in Payara and now we're waiting for payara-server-6.2024.11 to come out. (payara-server-6.2024.11.RC1 is already available.)

@pdurbin
Member

pdurbin commented Dec 19, 2024

> My understanding is that the commit is available upstream in Payara and now we're waiting for payara-server-6.2024.11 to come out. (payara-server-6.2024.11.RC1 is already available.)

These are both out now:

I believe the next steps are:

I think I'm ok with these being separate pull requests or a single one.

Projects
Status: Important
Status: SPRINT READY
Status: 🔍 Interest
Development

No branches or pull requests

3 participants