This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

EML.XML validation not supported or broken? #98

Open
CecSve opened this issue May 5, 2022 · 14 comments

Comments

@CecSve

CecSve commented May 5, 2022

A GBIF publisher is experiencing issues with validation of his eml.xml file in the data validator.

Data validation result here:

Regardless, it was able to parse out three issues with the .eml file:

1. "The licence can not be parsed, is not supported by GBIF or is simply missing"

2. I also get "The description of the dataset is missing or too short", which I don't understand.

and lastly, for the EML file:

3. "The EML document does not validate against the schema"

ERROR
cvc-complex-type.3.2.2: Attribute 'core_scope' is not allowed to appear in element 'eml:eml'.
cvc-complex-type.4: Attribute 'scope' must appear on element 'eml:eml'.
cvc-complex-type.2.4.a: Invalid content was found starting with element '{"eml://ecoinformatics.org/eml-2.1.1":dataset}'. One of '{dataset}' is expected.

Not sure what any of that means. I built this EML file from our old EML file that worked and a few other places.
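
For anyone hitting the same three schema errors, here is a rough sketch of what each one is complaining about, written as plain structural checks with Python's standard library. It is not the GBIF validator and does not do real XSD validation; the function name `check_eml_root` and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def check_eml_root(xml_text):
    """Return a list of problems that mirror the three cvc-* errors above."""
    root = ET.fromstring(xml_text)
    problems = []

    # cvc-complex-type.3.2.2: the root carries an attribute the schema does not declare.
    if "core_scope" in root.attrib:
        problems.append("unexpected attribute 'core_scope' on the root element")

    # cvc-complex-type.4: a required attribute is missing from the root.
    if "scope" not in root.attrib:
        problems.append("missing required attribute 'scope' on the root element")

    # cvc-complex-type.2.4.a: the child is namespace-qualified, but the schema
    # expects an unqualified <dataset> (no prefix, no inherited default namespace).
    for child in root:
        if child.tag.startswith("{") and child.tag.endswith("}dataset"):
            problems.append("<dataset> is namespace-qualified; expected unqualified <dataset>")

    return problems


# Hypothetical documents: BAD reproduces all three errors, GOOD avoids them.
BAD = (
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" core_scope="system">'
    "<eml:dataset/></eml:eml>"
)
GOOD = (
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" scope="system">'
    "<dataset/></eml:eml>"
)

for problem in check_eml_root(BAD):
    print(problem)
```

A qualified `dataset` element usually comes from either an `eml:` prefix on the child or a default `xmlns="..."` declaration on the root, so both are worth looking for in the file.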

@CecSve
Author

CecSve commented May 5, 2022

The first issue could be due to changes in EML versions where license has been changed to licensed.

Perhaps validation works for a DwC archive that contains only the EML, but not for a standalone XML file?

@MattBlissett
Member

What's the EML version? We support versions 1.0, 1.0.1, 1.0.2 and 1.1 at present, and have a longer-term task to update to the newer versions.

(The version is probably in the first few lines of the EML).
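
One way to read the version off the first lines without opening the file by hand is to look at the namespace of the root element. The sketch below assumes the root is named `eml` and that its namespace URI ends in `/eml-<version>`; the helper name `eml_namespace_version` is hypothetical. Note that this reports the EML namespace version, which is not necessarily the same number as the GBIF profile version named in the schema URL.

```python
import re
import xml.etree.ElementTree as ET


def eml_namespace_version(xml_text):
    """Return the version embedded in the root namespace, or None if absent."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified tag as "{namespace-uri}localname".
    match = re.match(r"\{.*/eml-(\d+(?:\.\d+)*)\}eml$", root.tag)
    return match.group(1) if match else None


# Hypothetical minimal document for illustration.
SAMPLE = (
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" scope="system">'
    "<dataset/></eml:eml>"
)

print(eml_namespace_version(SAMPLE))  # prints 2.1.1
```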

@CecSve
Author

CecSve commented May 5, 2022

eml-2.1.1

Then I suppose changing license to licensed, which was introduced with version 2, will not fix the first issue either.

@albenson-usgs

@tmcelrath I'm trying to figure out the best way to proceed on this. I could help with building an EML file using the IPT that you could work with as your starting point instead? Would that be helpful?

Also note that when I tried to load your eml.xml file into XMLNotepad it gave this error:
[Image: XMLNotepadError]

@tmcelrath

Idea, can you send me a properly formatted EML file that has been built by the IPT? That way I can compare with what I have. We used the same text on our last upload in 2015 so I don't have anything to compare it to.

@mjy

mjy commented May 17, 2022

@tmcelrath I will resolve this: SpeciesFileGroup/taxonworks#2986, and try to validate from there.

@mjy

mjy commented May 17, 2022

Idea, can you send me a properly formatted EML file that has been built by the IPT?

^ Please attach the very latest version of the EML that is supported to this issue if possible.

@mjy

mjy commented May 17, 2022

Is there a stand-alone web-based EML validator that I can check against for 1.1.2? I.e. for only that file? Never mind this.

@mjy

mjy commented May 17, 2022

Now I am confused. Here is EML from a very recent ALA file upload to GBIF. It appears to use 2.1. Perhaps our issue is something else.

<?xml version="1.0" encoding="utf-8"?>
<eml:eml xmlns:d="eml://ecoinformatics.org/dataset-2.1.0" xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/terms/" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 http://rs.gbif.org/schema/eml-gbif-profile/1.1/eml-gbif-profile.xsd" system="ALA-Registry" scope="system" xml:lang="en">
  <dataset>
  ...

@mjy

mjy commented May 17, 2022

@tmcelrath I believe the problem is that the file you supplied is a Frankenstein of old and new. For example, the licensed section you merged in is not valid in the 2.1 version. I would prefer that we start from the ALA example and update from there. I'll send you a copy offline.

@mjy

mjy commented May 17, 2022

I think I found our main problem:

The validator is likely running the second. We'll update our generator to use that and go from there.

@mdoering
Member

Yes, eml-gbif-profile.xsd is the correct schema for the GBIF profile. It uses a subset of the entire EML and adds GBIF specific elements under the additionalMetadata extension slot. The eml.xsd is only the part of the EML schema that is used for the GBIF profile. The GBIF additions live in their own namespace and hence need a separate schema file.
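
To see which schema file a given eml.xml actually declares, the `xsi:schemaLocation` attribute on the root can be split into its namespace/URL pairs. This is only a sketch using the standard library: the helper names `declared_schemas` and `uses_gbif_profile` are invented here, the sample header is modelled on the ALA example quoted earlier in this thread, and nothing is downloaded or validated.

```python
import xml.etree.ElementTree as ET

# Qualified name of xsi:schemaLocation as ElementTree exposes it.
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def declared_schemas(xml_text):
    """Map each namespace URI in xsi:schemaLocation to its schema URL."""
    root = ET.fromstring(xml_text)
    # The attribute value is a whitespace-separated list of (namespace, URL) pairs.
    tokens = root.attrib.get(XSI_SCHEMA_LOCATION, "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def uses_gbif_profile(xml_text):
    """True if any declared schema URL points at eml-gbif-profile.xsd."""
    return any(
        url.endswith("/eml-gbif-profile.xsd")
        for url in declared_schemas(xml_text).values()
    )


# Header modelled on the ALA example in this thread, closed so that it parses.
SAMPLE = (
    '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 '
    'http://rs.gbif.org/schema/eml-gbif-profile/1.1/eml-gbif-profile.xsd" '
    'system="ALA-Registry" scope="system"><dataset/></eml:eml>'
)

print(declared_schemas(SAMPLE))
print(uses_gbif_profile(SAMPLE))  # prints True
```

If a generator writes a different schema URL here, that is a quick thing to compare against a file produced by the IPT.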

@CecSve
Author

CecSve commented Jun 28, 2022

@mjy and @tmcelrath were you able to solve this issue?

@tmcelrath

Yes, we were. @mjy what exactly was the solution in the XML stuff at the top of the file?
