Commit 1f913b4

Authored by: jamesfwood, l2ss-py bot, dkauf42, podaac-cloud-dsa, frankinspace
Release/2.2.0 (#124)
* /version 2.2.0-alpha.0
* Feature/issue 85 (#112)
* Add initial poetry setup guidance to the README
* Update CHANGELOG "unreleased" with issue 85
* Adding collections UAT: C1242387621-POCLOUD
* Adding collections UAT: C1238658389-POCLOUD
* Feature/issue 115 (#116)
* Make note in README of `-E harmony` install option for tests
* Update CHANGELOG.md
* Co-authored-by: dkaufma3 <[email protected]>
* Adding collections OPS: C2152045877-POCLOUD
* Feature/issue-110 (#117)
* Add extra line of logic to catch timedelta time cases
* Updated Changelog
* Add logic to handle time attributes for he5 time-converted files
* Linted code
* Feature/issue 119 (#120)
* Add extra line of logic to catch timedelta time cases
* Updated Changelog
* Add logic to handle time attributes for he5 time-converted files
* Linted code
* Added extra logic to compute time vars for cases without any variables
* Change back chunking
* Update Changelog
* Feature/issue 122 (#123)
* Add extra line of logic to catch timedelta time cases
* Updated Changelog
* Add logic to handle time attributes for he5 time-converted files
* Linted code
* Fix for ncdataset rename deprecation - test writing to follow
* Remove unnecessary comments, changing to issue 119
* Added test and linted for duplicate dimension name change
* Updated Changelog.md
* Release 2.2.0
* /version 2.2.0-rc.1
* Updated to rc.2
* /version 2.2.0-rc.3

Co-authored-by: l2ss-py bot <[email protected]>
Co-authored-by: Daniel Kaufman <[email protected]>
Co-authored-by: podaac-cloud-dsa <[email protected]>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: dkaufma3 <[email protected]>
Co-authored-by: James Wood <[email protected]>
Co-authored-by: Nick Lenssen <[email protected]>
1 parent 53b28cc commit 1f913b4

9 files changed (+166, -28 lines)

CHANGELOG.md

Lines changed: 21 additions & 1 deletion
@@ -5,6 +5,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
+### Added
+### Changed
+### Deprecated
+### Removed
+### Fixed
+### Security
+
+## [2.2.0]
+### Added
+### Changed
+- [issue/115](https://github.com/podaac/l2ss-py/issues/115): Added notes to README about installing "extra" harmony dependencies to avoid test suite failures.
+- [issue/85](https://github.com/podaac/l2ss-py/issues/85): Added initial poetry setup guidance to the README
+- [issue/122](https://github.com/podaac/l2ss-py/issues/122): Changed renaming of duplicate dimensions from netCDF4 to xarray, due to issues in the netCDF rename function. https://github.com/Unidata/netcdf-c/issues/1672
+### Deprecated
+### Removed
+### Fixed
+- [issue/119](https://github.com/podaac/l2ss-py/issues/119): Added an extra check for variables without any dimensions after a squeeze in compute_time_variable_name()
+- [issue/110](https://github.com/podaac/l2ss-py/issues/110): Get the start date in convert_to_datetime and reconvert times to their original type when recombining grouped datasets
+### Security
 
 ## [2.1.1]
 ### Changed
@@ -32,8 +51,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Breaking Change** [issue/99](https://github.com/podaac/l2ss-py/issues/99): Removed support for python 3.7
 ### Fixed
 - [issue/95](https://github.com/podaac/l2ss-py/issues/95): Fix non-variable subsets for OMI, since variables are not in the same group as the lat/lon variables
-
 ### Security
+
+
 ## [1.5.0]
 ### Added
 - Added Shapefile option to UMM-S entry
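
Background on the issue/119 entry above: squeezing an xarray variable whose dimensions are all size 1 yields a scalar with an empty `.dims` tuple, so indexing `dims[0]` raises an IndexError. A minimal sketch of the failure mode and the added guard (illustrative only, not repo code; the variable name is made up):

```python
import numpy as np
import xarray as xr

# A variable whose only dimension has length 1 squeezes down to a scalar.
scalar_var = xr.DataArray(np.zeros((1,)), dims=['delta_time'])
assert scalar_var.squeeze().dims == ()  # empty tuple; dims[0] would raise IndexError

# The guard added in compute_time_variable_name() skips such variables
# instead of indexing dims[0]:
if len(scalar_var.squeeze().dims) == 0:
    pass  # continue to the next candidate variable
```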

README.md

Lines changed: 17 additions & 0 deletions
@@ -16,6 +16,18 @@ Harmony service for subsetting L2 data. l2ss-py supports:
 
 If you would like to contribute to l2ss-py, refer to the [contribution document](CONTRIBUTING.md).
 
+## Initial setup, with poetry
+
+1. Follow the instructions for installing `poetry` [here](https://python-poetry.org/docs/).
+2. Install l2ss-py, with its dependencies, by running the following from the repository directory:
+
+```
+poetry install
+```
+
+***Note:*** l2ss-py can be installed as above and run without any dependency on `harmony`.
+However, to additionally test the harmony adapter layer,
+extra dependencies can be installed with `poetry install -E harmony`.
 
 ## How to test l2ss-py locally
 
@@ -33,6 +45,11 @@ You can generate coverage reports as follows:
 poetry run pytest --junitxml=build/reports/pytest.xml --cov=podaac/ --cov-report=html -m "not aws and not integration" tests/
 ```
 
+***Note:*** The majority of the tests exercise core functionality of l2ss-py without ever interacting with the harmony python modules.
+The `test_subset_harmony` tests, however, are explicitly for testing the harmony adapter layer
+and do require the harmony optional dependencies to be installed,
+as described above with the `-E harmony` argument.
+
 ### l2ss-py script
 
 You can run l2ss-py on a single granule without using Harmony. In order

cmr/ops_associations.txt

Lines changed: 1 addition & 0 deletions
@@ -52,3 +52,4 @@ C2251465126-POCLOUD
 C2254232941-POCLOUD
 C2251464384-POCLOUD
 C2247621105-POCLOUD
+C2152045877-POCLOUD

cmr/uat_associations.txt

Lines changed: 1 addition & 0 deletions
@@ -38,3 +38,4 @@ C1238621088-POCLOUD
 C1240739713-POCLOUD
 C1244459498-POCLOUD
 C1242387621-POCLOUD
+C1238658389-POCLOUD

podaac/subsetter/dimension_cleanup.py

Lines changed: 17 additions & 4 deletions
@@ -24,17 +24,18 @@ def remove_duplicate_dims(nc_dataset):
     is changed to the original name.
     """
     dup_vars = {}
+    dup_new_varnames = []
     for var_name, var in nc_dataset.variables.items():
         dim_list = list(var.dimensions)
         if len(set(dim_list)) != len(dim_list):  # get true if var.dimensions has a duplicate
             dup_vars[var_name] = var  # populate dictionary with variables that have dup dims
     for dup_var_name, dup_var in dup_vars.items():
         dim_list = list(dup_var.dimensions)  # list of original dimensions of variable with dup dims
-        # get the dimensions that is duplicated
+        # get the dimensions that are duplicated
         dim_dup = [item for item, count in collections.Counter(dim_list).items() if count > 1][0]
         dim_dup_new = dim_dup+'_1'
-
         var_name_new = dup_var_name+'_1'
+        dup_new_varnames.append(var_name_new)
 
         # create new dimension by copying from the duplicated dimension
 
@@ -61,6 +62,18 @@ def remove_duplicate_dims(nc_dataset):
             data[var_name_new].setncattr(attrname, nc_dataset.variables[dup_var_name].getncattr(attrname))
         data[var_name_new][:] = nc_dataset.variables[dup_var_name][:]
         del nc_dataset.variables[dup_var_name]
-        nc_dataset.renameVariable(var_name_new, dup_var_name)
 
-    return nc_dataset
+    # Return the variables that will need to be renamed: the rename method is still an issue per https://github.com/Unidata/netcdf-c/issues/1672
+    return nc_dataset, dup_new_varnames
+
+
+def rename_dup_vars(dataset, rename_vars):
+    """
+    The netCDF4 rename function raises an HDF error for variables in S5P files with duplicate dimensions.
+    This method uses xarray to rename those variables instead.
+    """
+    for i in rename_vars:
+        original_name = i[:-2]
+        dataset = dataset.rename({i: original_name})
+
+    return dataset
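
For context, `netCDF4.Dataset.renameVariable` can raise an HDF error when applied to variables that had duplicate dimensions (the Unidata/netcdf-c#1672 issue linked above), which is why the rename is deferred and done through xarray. A minimal, self-contained sketch of the pattern; the file, dimension, and variable names here are stand-ins, not real l2ss-py names:

```python
import netCDF4 as nc
import xarray as xr

# Build a small in-memory netCDF file standing in for the state after
# remove_duplicate_dims(): the variable has been copied to a temporary
# name ending in '_1'.
nc_dataset = nc.Dataset('inmemory.nc', 'w', diskless=True)
nc_dataset.createDimension('corner', 4)
nc_dataset.createVariable('var_1', 'f4', ('corner',))
rename_vars = ['var_1']

# Re-open through xarray and rename the temporaries back to their original
# names, as rename_dup_vars() does; xarray's rename avoids the netCDF4/HDF
# rename bug.
dataset = xr.open_dataset(xr.backends.NetCDF4DataStore(nc_dataset))
for tmp_name in rename_vars:
    dataset = dataset.rename({tmp_name: tmp_name[:-2]})  # strip the '_1' suffix
print(list(dataset.variables))  # ['var']
```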

podaac/subsetter/subset.py

Lines changed: 27 additions & 16 deletions
@@ -522,6 +522,8 @@ def compute_time_variable_name(dataset, lat_var):
         if "time" in var_name and dataset[var_name].squeeze().dims == lat_var.squeeze().dims:
             return var_name
     for var_name in list(dataset.data_vars.keys()):
+        if len(dataset[var_name].squeeze().dims) == 0:
+            continue
         if 'time' in var_name.lower() and dataset[var_name].squeeze().dims[0] in lat_var.squeeze().dims:
             return var_name
 
@@ -946,7 +948,7 @@ def walk(group_node, path):
     return nc_dataset
 
 
-def recombine_grouped_datasets(datasets, output_file):  # pylint: disable=too-many-branches
+def recombine_grouped_datasets(datasets, output_file, start_date):  # pylint: disable=too-many-branches
     """
     Given a list of xarray datasets, combine those datasets into a
     single netCDF4 Dataset and write to the disk. Each dataset has been
@@ -978,7 +980,7 @@ def recombine_grouped_datasets(datasets, output_file):  # pylint: disable=too-ma
             dim_group.createDimension(new_dim_name, dataset.dims[dim_name])
 
         # Rename variables
-        _rename_variables(dataset, base_dataset)
+        _rename_variables(dataset, base_dataset, start_date)
 
         # Remove group vars from base dataset
         for var_name in list(base_dataset.variables.keys()):
@@ -1003,7 +1005,7 @@ def _get_nested_group(dataset, group_path):
     return nested_group
 
 
-def _rename_variables(dataset, base_dataset):
+def _rename_variables(dataset, base_dataset, start_date):
     for var_name in list(dataset.variables.keys()):
         new_var_name = var_name.split(GROUP_DELIM)[-1]
         var_group = _get_nested_group(base_dataset, var_name)
@@ -1014,10 +1016,13 @@ def _rename_variables(dataset, base_dataset):
         ) or np.issubdtype(
             dataset.variables[var_name].dtype, np.dtype(np.timedelta64)
         ):
-
-            cf_dt_coder = xr.coding.times.CFDatetimeCoder()
-            encoded_var = cf_dt_coder.encode(dataset.variables[var_name])
-            variable = encoded_var
+            if start_date:
+                dataset.variables[var_name].values = (dataset.variables[var_name].values - np.datetime64(start_date)) / np.timedelta64(1, 's')
+                variable = dataset.variables[var_name]
+            else:
+                cf_dt_coder = xr.coding.times.CFDatetimeCoder()
+                encoded_var = cf_dt_coder.encode(dataset.variables[var_name])
+                variable = encoded_var
 
         var_attrs = variable.attrs
         fill_value = var_attrs.get('_FillValue')
@@ -1134,15 +1139,19 @@ def convert_to_datetime(dataset, time_vars):
             # adjust the time values from the start date
             if start_date:
                 dataset[var].values = [start_date + datetime.timedelta(seconds=i) for i in dataset[var].values]
-            # copy the values from the utc time variable to the time variable
-            else:
-                utc_var_name = compute_utc_name(dataset)
-                if utc_var_name:
-                    dataset[var].values = [datetime.datetime(i[0], i[1], i[2], hour=i[3], minute=i[4], second=i[5]) for i in dataset[utc_var_name].values]
+                return dataset, start_date
+
+            utc_var_name = compute_utc_name(dataset)
+            if utc_var_name:
+                start_seconds = dataset[var].values[0]
+                dataset[var].values = [datetime.datetime(i[0], i[1], i[2], hour=i[3], minute=i[4], second=i[5]) for i in dataset[utc_var_name].values]
+                start_date = dataset[var].values[0] - np.timedelta64(int(start_seconds), 's')
+                return dataset, start_date
+
         else:
             pass
 
-    return dataset
+    return dataset, start_date
 
 
 def subset(file_to_subset, bbox, output_file, variables=None,
@@ -1210,7 +1219,7 @@ def subset(file_to_subset, bbox, output_file, variables=None,
     if has_groups:
         nc_dataset = transform_grouped_dataset(nc_dataset, file_to_subset)
 
-    nc_dataset = dc.remove_duplicate_dims(nc_dataset)
+    nc_dataset, rename_vars = dc.remove_duplicate_dims(nc_dataset)
 
     if variables:
         variables = [x.replace('/', GROUP_DELIM) for x in variables]
@@ -1227,14 +1236,16 @@ def subset(file_to_subset, bbox, output_file, variables=None,
         xr.backends.NetCDF4DataStore(nc_dataset),
         **args
     ) as dataset:
+        dataset = dc.rename_dup_vars(dataset, rename_vars)
         lat_var_names, lon_var_names, time_var_names = get_coordinate_variable_names(
            dataset=dataset,
            lat_var_names=lat_var_names,
            lon_var_names=lon_var_names,
            time_var_names=time_var_names
        )
+        start_date = None
        if min_time or max_time:
-            dataset = convert_to_datetime(dataset, time_var_names)
+            dataset, start_date = convert_to_datetime(dataset, time_var_names)
        chunks = calculate_chunks(dataset)
        if chunks:
            dataset = dataset.chunk(chunks)
@@ -1306,7 +1317,7 @@ def subset(file_to_subset, bbox, output_file, variables=None,
         dataset.load().to_netcdf(output_file, 'w', encoding=encoding)
 
     if has_groups:
-        recombine_grouped_datasets(datasets, output_file)
+        recombine_grouped_datasets(datasets, output_file, start_date)
     # Check if the spatial bounds are all 'None'. This means the
     # subset result is empty.
     if any(bound is None for bound in spatial_bounds):
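
A note on the start_date round-trip introduced above: convert_to_datetime() now returns the start date it used to decode the time values, and _rename_variables() uses it to re-encode datetimes back into the file's original seconds-since-start representation when groups are recombined. A minimal, self-contained sketch of that decode/re-encode step (the dates and values are made up for illustration; this is not repo code):

```python
import datetime

import numpy as np

# Decode, as in convert_to_datetime(): float seconds become datetimes
# relative to a known start date.
start_date = datetime.datetime(2020, 1, 16, 12, 0, 0)
seconds = np.array([0.0, 30.0, 60.0])
times = np.array(
    [start_date + datetime.timedelta(seconds=s) for s in seconds],
    dtype='datetime64[ns]'
)

# Re-encode, as in _rename_variables() when start_date is set: subtract the
# start date and divide by one second to recover the original float values.
recovered = (times - np.datetime64(start_date)) / np.timedelta64(1, 's')
assert np.allclose(recovered, seconds)
```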

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 
 [tool.poetry]
 name = "l2ss-py"
-version = "2.1.1"
+version = "2.2.0-rc.3"
 description = "L2 Subsetter Service"
 authors = ["podaac-tva <[email protected]>"]
 license = "Apache-2.0"
(binary file, 31.9 MB)

Binary file not shown.

tests/test_subset.py

Lines changed: 81 additions & 6 deletions
@@ -1290,6 +1290,34 @@ def test_duplicate_dims_sndr(self):
         for var_name, variable in in_nc.variables.items():
             assert in_nc[var_name].shape == out_nc[var_name].shape
 
+    def test_duplicate_dims_tropomi(self):
+        """
+        Check if TROPOMI files run successfully even though
+        these files have variables with duplicate dimensions
+        """
+        TROP_dir = join(self.test_data_dir, 'tropomi')
+        trop_file = 'S5P_OFFL_L2__AER_LH_20210704T005246_20210704T023416_19290_02_020200_20210708T023111.nc'
+
+        bbox = np.array(((-180, 180), (-90, 90)))
+        output_file = "{}_{}".format(self._testMethodName, trop_file)
+        shutil.copyfile(
+            os.path.join(TROP_dir, trop_file),
+            os.path.join(self.subset_output_dir, trop_file)
+        )
+        box_test = subset.subset(
+            file_to_subset=join(self.subset_output_dir, trop_file),
+            bbox=bbox,
+            output_file=join(self.subset_output_dir, output_file)
+        )
+        # check if the box_test is
+
+        in_nc = nc.Dataset(join(TROP_dir, trop_file))
+        out_nc = nc.Dataset(join(self.subset_output_dir, output_file))
+
+        for var_name, variable in in_nc.groups['PRODUCT'].groups['SUPPORT_DATA'].groups['DETAILED_RESULTS'].variables.items():
+            assert variable.shape == out_nc.groups['PRODUCT'].groups['SUPPORT_DATA'].groups['DETAILED_RESULTS'].variables[var_name].shape
+
+
     def test_omi_novars_subset(self):
         """
         Check that the OMI variables are conserved when no variables are specified
@@ -1314,8 +1342,9 @@ def test_omi_novars_subset(self):
         in_nc = nc.Dataset(join(omi_dir, omi_file))
         out_nc = nc.Dataset(join(self.subset_output_dir, output_file))
 
-        for var_name, variable in in_nc.variables.items():
-            assert in_nc[var_name].shape == out_nc[var_name].shape
+        for var_name, variable in in_nc.groups['HDFEOS'].groups['SWATHS'].groups['OMI Total Column Amount SO2'].groups['Geolocation Fields'].variables.items():
+            assert in_nc.groups['HDFEOS'].groups['SWATHS'].groups['OMI Total Column Amount SO2'].groups['Geolocation Fields'].variables[var_name].shape == \
+                out_nc.groups['HDFEOS'].groups['SWATHS'].groups['OMI Total Column Amount SO2'].groups['Geolocation Fields'].variables[var_name].shape
 
 
     def test_root_group(self):
@@ -1691,12 +1720,58 @@ def test_temporal_he5file_subset(self):
             if 'BRO' in i:
                 assert any('utc' in x.lower() for x in time_var_names)
 
-
-        dataset = subset.convert_to_datetime(dataset, time_var_names)
-
+        dataset, start_date = subset.convert_to_datetime(dataset, time_var_names)
         assert dataset[time_var_names[0]].dtype == 'datetime64[ns]'
-
 
+
+    def test_he5_timeattrs_output(self):
+        """Test that the time attributes in the output match the attributes of the input for OMI test files"""
+
+        omi_dir = join(self.test_data_dir, 'OMI')
+        omi_file = 'OMI-Aura_L2-OMBRO_2020m0116t1207-o82471_v003-2020m0116t182003.he5'
+        omi_file_input = 'input' + omi_file
+        bbox = np.array(((-180, 90), (-90, 90)))
+        output_file = "{}_{}".format(self._testMethodName, omi_file)
+        shutil.copyfile(
+            os.path.join(omi_dir, omi_file),
+            os.path.join(self.subset_output_dir, omi_file)
+        )
+        shutil.copyfile(
+            os.path.join(omi_dir, omi_file),
+            os.path.join(self.subset_output_dir, omi_file_input)
+        )
+
+        min_time = '2020-01-16T12:30:00Z'
+        max_time = '2020-01-16T12:40:00Z'
+        bbox = np.array(((-180, 180), (-90, 90)))
+        nc_dataset_input = nc.Dataset(os.path.join(self.subset_output_dir, omi_file_input))
+        incut_set = nc_dataset_input.groups['HDFEOS'].groups['SWATHS'].groups['OMI Total Column Amount BrO'].groups['Geolocation Fields']
+        xr_dataset_input = xr.open_dataset(xr.backends.NetCDF4DataStore(incut_set))
+        inattrs = xr_dataset_input['Time'].attrs
+
+        subset.subset(
+            file_to_subset=os.path.join(self.subset_output_dir, omi_file),
+            bbox=bbox,
+            output_file=os.path.join(self.subset_output_dir, output_file),
+            min_time=min_time,
+            max_time=max_time
+        )
+
+        output_ncdataset = nc.Dataset(os.path.join(self.subset_output_dir, output_file))
+        outcut_set = output_ncdataset.groups['HDFEOS'].groups['SWATHS'].groups['OMI Total Column Amount BrO'].groups['Geolocation Fields']
+        xrout_dataset = xr.open_dataset(xr.backends.NetCDF4DataStore(outcut_set))
+        outattrs = xrout_dataset['Time'].attrs
+
+        for key in inattrs.keys():
+            if isinstance(inattrs[key], np.ndarray):
+                if np.array_equal(inattrs[key], outattrs[key]):
+                    pass
+                else:
+                    raise AssertionError('Attributes for {} do not equal each other'.format(key))
+            else:
+                assert inattrs[key] == outattrs[key]
+
+
     def test_temporal_subset_lines(self):
         bbox = np.array(((-180, 180), (-90, 90)))
         file = 'SWOT_L2_LR_SSH_Expert_368_012_20121111T235910_20121112T005015_DG10_01.nc'
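
One detail worth noting in test_he5_timeattrs_output above: array-valued attributes cannot be compared with a bare `assert a == b`, because `==` on numpy arrays is element-wise and its truth value is ambiguous; hence the `np.array_equal` branch. A condensed sketch of the same comparison logic, using hypothetical attribute dicts:

```python
import numpy as np

def attrs_equal(inattrs, outattrs):
    """Compare two attribute dicts, special-casing array-valued attributes."""
    for key, in_val in inattrs.items():
        if isinstance(in_val, np.ndarray):
            # '==' on arrays returns an element-wise array, so compare
            # shape and contents with np.array_equal instead.
            if not np.array_equal(in_val, outattrs[key]):
                return False
        elif in_val != outattrs[key]:
            return False
    return True

assert attrs_equal({'scale': np.array([1.0, 2.0]), 'units': 's'},
                   {'scale': np.array([1.0, 2.0]), 'units': 's'})
```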
