Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trt 556 final schema clean up #38

Conversation

owenlittlejohns
Copy link
Member

@owenlittlejohns owenlittlejohns commented Sep 10, 2024

Description

This PR performs the final clean-up relating to the recommendations for cleaning up the earthdata-varinfo configuration file schema (recommendations 4 to 7 here). Specifically:

  • Grid_Mapping_Data is removed (as this information wasn't directly used, and can be included as metadata attributes in CFOverrides.
  • ProductEpochs is removed (as this information wasn't directly used, and can be included as a units metadata attribute in the more CF Convention compliant manner via CFOverrides).
  • The schema now uses PascalCase for all attributes of the schema.

This is mostly small stuff updates, and I think it closes out the last of the proposed schema changes.

Jira Issue ID

TRT-556

Local Test Steps

  • Pull this branch.
  • Pip install all the local dependencies.
  • Try and parse a local NetCDF-4 or DMR file. It should all work.
# Install dependencies:
$ pip install -r requirements.txt

# Install this version of the package (optional, if running from the root of the directory):
$ pip install -e .

# Parse a file (with/without a configuration file, this is now in a Python repl):
> from varinfo import VarInfoFromNetCDF4
>
> vi_instance = VarInfoFromNetCDF4('path/to/file.nc4', short_name='<collection short name>', config_file='/path/to/config.json')

PR Acceptance Checklist

  • Jira ticket acceptance criteria met.
  • CHANGELOG.md updated to include high level summary of PR changes.
  • VERSION updated if publishing a release. (This is a feature branch)
  • Tests added/updated and passing.
  • Documentation updated (if needed).

@owenlittlejohns owenlittlejohns changed the base branch from main to TRT-552-configuration-file-schema-v2 September 10, 2024 20:42

* The casing of all attributes in the configuration file schema has been
updated to be `PascalCase` throughout. Affected schema attributes include:
* `CFOverrides`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I've been toying with through the last couple of PRs... is CFOverrides the best name now? Part of me thinks MetadataOverrides would be better, but that feels like a big change. I'd love to hear some thoughts here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only in the codebase 19 times. It is a big change, but is it better? Probably.

Copy link
Member Author

@owenlittlejohns owenlittlejohns Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main client impact is really the change in the configuration file section name.

I think the only other two user-facing places that would be impacted are the CFConfig.cf_overrides attribute and the CFConfig.get_cf_overrides method (I think everything else is just plumbing). If I'm honest, my guess is that those things aren't being called by too many end users.

Given that this overall effort to update the schema is going to be the most disruptive change to the package, now is the time to do it. I think I'm leaning to making this change. (And if it seems terrible, I can always revert the commit before merging this PR)

cls.namespace = 'namespace_string'
cls.sample_config = 'config/0.0.1/sample_config_0.0.1.json'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt off that some of the tests were coupled to the sample configuration file from the config directory. I've added the minimal amount of information to the test_config.json file that is used everywhere else to make all the tests pass (primarily adding a couple of short name paths, and a couple of entries in the mapping between mission and short name). Now all the tests that use a configuration file use the one in the test directory, which feels better.

},
"required": ["Grid_Mapping_Dataset_Name", "grid_mapping_name"]
}
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a big change to the schema. The biggest contribution to the lines deleted are the attributes allowed in the Grid_Mapping_Data section, which are a limited set of attributes, whereas the CFOverrides (or MetadataOverrides?) cover a much broader array of metadata attributes, so it becomes untenable to define all the options for attributes there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you losing any of that information? or was most of it copied from somewhere else in the first place? I'm ok with it all going since it can be added as a MetadataOverrides object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the specific attributes definitions were just copied from the CF Conventions page itself.

"type": "array",
"items": {
"$ref": "#/$defs/MissionVariablePatternType"
}
},
"Required_Fields": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I meant to add a comment here with the rest. The use of "Fields" here is out of sync with the rest of the schema, which uses "Variables" throughout for the same concept. I wanted to make this more consistent, particularly with ExcludedScienceVariables.

Copy link
Member

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, one question about dropping stuff from the sample config (but I'm pretty sure the answer is yet) and I'd rename CFOverrides now because just because we use it for that, it's not specific to CF.

},
"required": ["Grid_Mapping_Dataset_Name", "grid_mapping_name"]
}
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you losing any of that information? or was most of it copied from somewhere else in the first place? I'm ok with it all going since it can be added as a MetadataOverrides object.

".*shot_number"
]
}
],
"ProductEpochs": [
"CFOverrides": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do it. MetadataOverrides

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

578faf8 🎉

Comment on lines -46 to -64
"Mission": "ICESat2"
"Mission": "SMAP",
"ShortNamePath": "SPL4.*"
},
"Epoch": "2005-01-01T00:00:00.000000"
"Attributes": [
{
"Name": "grid_mapping",
"Value": "/EASE2_global_projection"
}
]
},
{
"Applicability": {
"Mission": "GEDI"
"Mission": "SMAP",
"ShortNamePath": "SPL4.*",
"VariablePattern": "/EASE2_global_projection"
},
"Epoch": "2018-01-01T00:00:00.000000"
}
],
"Grid_Mapping_Data": [
{
"Grid_Mapping_Dataset_Name": "EASE2_Global",
"grid_mapping_name": "lambert_cylindrical_equal_area",
"standard_parallel": 30.0,
"longitude_of_central_meridian": 0.0,
"false_easting": 0.0,
"false_northing": 0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is all just deleted, and not added because it wasn't used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProductEpoch was entirely unused, so I just deleted it. I moved the content from Grid_Mapping_Data to the CFOverrides/MetadataOverrides section, though, just to provide examples for how that information would be included in the future. (In theory that sample configuration file should probably be slimmed down to a minimum number of examples, but I think that some of those rules were placed there so DAS has them somewhere)

Copy link
Member Author

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick summary of new changes:

  • Rename CFOverrides to MetadataOverrides.
  • Rename CFConfig.cf_overrides to CFConfig.metadata_overrides.
  • Rename CFConfig.get_cf_overrides to CFConfig.get_metadata_overrides.
  • Rename the cf_overrides property on all the Attribute/Group/Variable classes to be metadata_overrides. (That sounds like a lot, but it's mostly sorted via inheritance)
  • Review change logs (both schema and overall package) and reword/reorder things based on importance rather than the ordering in which they've been done on this feature branch.

Copy link
Collaborator

@eni-awowale eni-awowale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with ATL06 and worked as expected!

@owenlittlejohns owenlittlejohns merged commit 17c57a7 into TRT-552-configuration-file-schema-v2 Sep 11, 2024
3 checks passed
@owenlittlejohns owenlittlejohns deleted the TRT-556-final-schema-clean-up branch September 11, 2024 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants