
Add dataset.services() method to list available Harmony services #500

Open · wants to merge 38 commits into base: main

Conversation

nikki-t (Collaborator) commented Mar 25, 2024

GitHub Issue: #447

Description

As a first step toward facilitating the use of services in earthaccess, earthaccess was modified so that it can list the available Harmony services for a collection.

Overview of work done

A new services module was created that performs service queries. The results module was updated to include a services function that uses the services module to query a collection's services and return the provider-id and UMM JSON for each service.
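
For context, a minimal usage sketch of the new method (it mirrors the snippet shown later in this thread; the short name is one of the collections tested below):

    import earthaccess

    earthaccess.login()

    datasets = earthaccess.search_datasets(
        short_name="MUR-JPL-L4-GLOB-v4.1",
        cloud_hosted=True,
    )
    for dataset in datasets:
        # Maps each associated service concept-id to its provider-id and UMM record
        print(dataset.services())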

Sample services results:

{
    "provider-id": "POCLOUD",
    "umm": {
        "URL": {
            "Description": "the main access point",
            "URLValue": "https://opendap.earthdata.nasa.gov/"
        },
        "Type": "OPeNDAP",
        "ServiceKeywords": [
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "SPATIAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "TEMPORAL SUBSETTING"
            },
            {
                "ServiceCategory": "EARTH SCIENCE SERVICES",
                "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                "ServiceTerm": "DATA SUBSETTING/SUPERSETTING",
                "ServiceSpecificTerm": "VARIABLE SUBSETTING"
            }
        ],
        "ServiceOrganizations": [
            {
                "Roles": [
                    "DEVELOPER"
                ],
                "ShortName": "UCAR/UNIDATA",
                "LongName": "Unidata, University Corporation for Atmospheric Research"
            }
        ],
        "OperationMetadata": [
            {
                "DistributedComputingPlatform": [
                    "WEBSERVICES"
                ]
            }
        ],
        "Description": "Earthdata OPEnDAP in the cloud",
        "Version": "9",
        "Name": "PO.DAAC Cloud OPeNDAP",
        "ServiceOptions": {
            "Subset": {
                "SpatialSubset": {
                    "BoundingBox": {
                        "AllowMultipleValues": false
                    }
                },
                "TemporalSubset": {
                    "AllowMultipleValues": false
                },
                "VariableSubset": {
                    "AllowMultipleValues": true
                }
            },
            "SupportedReformattings": [
                {
                    "SupportedInputFormat": "NETCDF-4",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                },
                {
                    "SupportedInputFormat": "HDF5",
                    "SupportedOutputFormats": [
                        "ASCII",
                        "CSV",
                        "NETCDF-3",
                        "NETCDF-4"
                    ]
                }
            ]
        },
        "MetadataSpecification": {
            "URL": "https://cdn.earthdata.nasa.gov/umm/service/v1.5.2",
            "Name": "UMM-S",
            "Version": "1.5.2"
        },
        "LongName": "PO.DAAC OPeNDADP In the Cloud"
    }
}

Overview of verification done

Tested the new service functionality on five collections:

  • HLSS30
  • MUR-JPL-L4-GLOB-v4.1
  • VIIRS_NPP-JPL-L2P-v2016.2
  • ATL22
  • TOMSN7SO2

Overview of integration done

Created a new unit test for DataService.get. Unit test coverage:

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                    Stmts   Miss  Cover   Missing
---------------------------------------------------------------------
earthaccess/__init__.py                    31      3    90%   77-81
earthaccess/api.py                        107     77    28%   26-28, 65-78, 117, 124, 143-158, 182-193, 211-213, 234-239, 248-252, 261-265, 284-285, 307-308, 328-336, 345-346, 350-355
earthaccess/auth.py                       207     80    61%   17-18, 49-63, 95-96, 122-154, 188-206, 211-215, 242, 251-252, 260-270, 316-317, 346-358, 362-372, 384
earthaccess/daac.py                        20      6    70%   136, 140-147
earthaccess/formatters.py                  17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                    32     26    19%   13-22, 32-58
earthaccess/results.py                    152     81    47%   28-31, 34-47, 88-97, 107-109, 117, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 184-200, 212-220, 223, 258-261, 268-276, 283-284, 287-290, 306-317, 321-329, 355, 379-380
earthaccess/search.py                     305    139    54%   47, 63-72, 88-89, 99-100, 114, 129-133, 146-150, 165-180, 184-186, 194-195, 203-204, 217, 235-236, 244, 276-312, 374, 392-396, 421, 441-442, 450, 458-464, 474-475, 489-498, 511-516, 525-526, 534-535, 543-544, 552-553, 564-565, 573-574, 582, 588, 628, 634-638, 641, 650, 654, 660-662, 665, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                    34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                      289    190    34%   27-28, 31, 34, 42, 50-55, 62-75, 82-86, 109-114, 118-121, 124-125, 128-131, 134-137, 148, 156-159, 183-190, 193-194, 214, 224, 239-242, 292, 310-312, 331, 340-385, 394-440, 467-478, 506, 516-536, 546-582, 595-618, 634-649, 656-665
earthaccess/utils/_validation.py            5      3    40%   5-7
earthaccess/widgets.py                      0      0   100%
tests/unit/__init__.py                      0      0   100%
tests/unit/conftest.py                      0      0   100%
tests/unit/test_auth.py                    60      2    97%   133-134
tests/unit/test_collection_queries.py      30      0   100%
tests/unit/test_formatters.py               0      0   100%
tests/unit/test_granule_queries.py         22      0   100%
tests/unit/test_results.py                 22      0   100%
tests/unit/test_store.py                   58      0   100%
---------------------------------------------------------------------
TOTAL                                    1391    623    55%

========================================================================================================= 26 passed, 2 warnings in 1.75s =========================================================================================================
+ bash ./scripts/lint.sh
+ mypy earthaccess --disallow-untyped-defs
Success: no issues found in 12 source files
+ ruff check .

Created a new integration test that exercises the services functionality from the service query through the parsing of query results.
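
For illustration, a hypothetical shape for such a test, assuming pytest plus vcrpy (the VCR approach adopted later in this thread); the cassette path and the assertion are assumptions, not the PR's actual test code:

    import vcr

    import earthaccess

    @vcr.use_cassette("tests/integration/fixtures/services.yaml")
    def test_collection_services():
        datasets = earthaccess.search_datasets(short_name="MUR-JPL-L4-GLOB-v4.1")
        services = datasets[0].services()
        # Each entry maps a service concept-id to its parsed service record(s)
        assert services

Integration test coverage: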

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                        Stmts   Miss  Cover   Missing
-------------------------------------------------------------------------
earthaccess/__init__.py                        31      4    87%   68-72, 75
earthaccess/api.py                            107     33    69%   77, 124, 144-152, 190-192, 213, 237-238, 248-252, 261-265, 284-285, 331-334, 336, 345-346
earthaccess/auth.py                           207     59    71%   17-18, 122-154, 188-206, 211-212, 242, 251-252, 260-261, 266, 269, 346-358, 362-372, 384
earthaccess/daac.py                            20      2    90%   136, 147
earthaccess/formatters.py                      17     10    41%   11, 18, 22-59
earthaccess/kerchunk.py                        32     26    19%   13-22, 32-58
earthaccess/results.py                        152     58    62%   28-31, 34-47, 88-97, 107-109, 124-130, 137-139, 146-148, 155-158, 165-166, 174-176, 193, 223, 258-261, 268-276, 283-284, 287-290, 316-317, 321-329, 355, 379-380
earthaccess/search.py                         305     81    73%   69-70, 88-89, 114, 129-133, 146-150, 172, 184-186, 194-195, 203-204, 217, 235-236, 284, 290-294, 297, 304, 392-396, 421, 441-442, 459, 474-475, 490, 511-516, 525-526, 543-544, 552-553, 574, 582, 628, 634-638, 641, 650, 662, 678-679, 721-722, 731-732, 741-742, 772-773, 782-783, 795-803
earthaccess/services.py                        34      6    82%   31, 51, 57-58, 61, 66
earthaccess/store.py                          289     89    69%   34, 42, 62-75, 109-114, 118-121, 124-125, 128-131, 159, 183-190, 193-194, 214, 224, 241-242, 312, 331, 345, 373-374, 383-385, 394-440, 468-470, 478, 506, 523-536, 596, 612-617, 635, 637, 660-661
earthaccess/utils/_validation.py                5      0   100%
earthaccess/widgets.py                          0      0   100%
tests/integration/conftest.py                   9      5    44%   8-12
tests/integration/test_api.py                  51      0   100%
tests/integration/test_auth.py                 82     15    82%   56, 64-65, 87-95, 106, 117-118
tests/integration/test_cloud_download.py       80     13    84%   88-89, 130-133, 141-142, 147-156, 169
tests/integration/test_cloud_open.py           82      5    94%   101, 134-135, 159, 169
tests/integration/test_kerchunk.py             41     33    20%   11-87
tests/integration/test_onprem_download.py      82      5    94%   94, 127-128, 159, 161
tests/integration/test_onprem_open.py          75      4    95%   93, 126-127, 151
tests/integration/test_services.py             24      0   100%
-------------------------------------------------------------------------
TOTAL                                        1725    448    74%

================================================================================================================= short test summary info ==================================================================================================================
FAILED tests/integration/test_api.py::test_download[True-0] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_api.py::test_download[True-selection1] - ValueError: earthaccess can't yet guess the provider for cloud collections, we need to use one from earthaccess.list_cloud_providers()
FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac0] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac1] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac2] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac3] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_earthaccess_can_download_cloud_collection_granules[daac4] - AssertionError: assert False
FAILED tests/integration/test_cloud_download.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_cloud_open.py::test_multi_file_granule - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac0] - AssertionError: False is not true
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac1] - AssertionError: False is not true
============================================================================================= 18 failed, 52 passed, 1 skipped, 3 warnings in 174.40s (0:02:54) =============================================================================================

PR checklist:

  • Linted & Formatted
  • Updated unit & integration tests
  • Updated changelog & readme
  • Updated documentation

Pending:

  • Pass currently failing integration tests.

📚 Documentation preview 📚: https://earthaccess--500.org.readthedocs.build/en/500/

mfisher87 changed the title from "Feature/issue 447" to "Add dataset.services() method to list available Harmony services" on Mar 25, 2024
nikki-t (Collaborator Author) commented Mar 25, 2024

It seems like the existing integration tests are failing due to credentials, which I had set up as environment variables and in a .netrc file when running them locally. What is the best way to handle credentials in these tests? Also happy to move this to a draft if we want to work out those details first.

mfisher87 (Member):

Hope you don't mind the rename, was finding the old name difficult to remember in my notifications :)

mfisher87 (Member) commented Mar 25, 2024

What is the best way to handle credentials in these tests?

@betolink @andypbarrett I think this may need documenting! As a developer, how do I run integration tests on my laptop?
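
(One common local setup, sketched here from earthaccess's documented login strategies: export EARTHDATA_USERNAME and EARTHDATA_PASSWORD, or add a urs.earthdata.nasa.gov entry to ~/.netrc, then authenticate before the tests run.)

    import earthaccess

    # Picks up EARTHDATA_USERNAME / EARTHDATA_PASSWORD from the environment;
    # use strategy="netrc" to read ~/.netrc instead.
    earthaccess.login(strategy="environment")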

nikki-t (Collaborator Author) commented Apr 26, 2024

@mfisher87 - I think that I was able to fix the integration test and also modified the services unit and integration tests to use VCR. All unit tests are passing.

I ran the integration tests locally and here is the summary:

==================== short test summary info =================================

FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location0] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
FAILED tests/integration/test_auth.py::test_get_s3fs_session_lowercase_location[location1] - requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

====== 7 failed, 71 passed, 1 skipped in 496.39s (0:08:16) ==========================================

I wonder if these tests need to be run in the same AWS region as the data?

Other than that I think this PR can be merged but let me know how that typically works for earthaccess development.

mfisher87 (Member):

Will you be at hack day this coming week? I may not have time to review this before then 😬

nikki-t (Collaborator Author) commented Apr 28, 2024

@mfisher87 - Unfortunately I am on travel this coming week so I can't make it but I do plan to attend the next earthaccess hack day in a few weeks. We can discuss then if it's easier! 😄

mfisher87 (Member):

Sounds like a good plan :) Safe and fun travels!

mfisher87 (Member):

@nikki-t I'll only attend 2nd half of hack day this week, had to schedule a dentist appointment at that time. I may be co-working on that call (I've used my funded Openscapes allocation), but feel free to interrupt me so we can have a quick chat. I love the how-to you added 🤩

earthaccess/services.py: two outdated review threads (resolved)
nikki-t and others added 3 commits June 3, 2024 14:13
Co-authored-by: Jessica Scheick <[email protected]>
Co-authored-by: Jessica Scheick <[email protected]>
nikki-t (Collaborator Author) commented Jun 3, 2024

@JessicaS11 - Thank you for the suggested updates, I have applied them!

@mfisher87 , @betolink - Do you think this PR is ready to be merged?

betolink (Member) commented Jun 3, 2024

Hi @nikki-t, the code looks good to me. There are some failing tests due to a print statement, which I'm not opposed to. @mfisher87, do you know if there is a way to skip this rule on a file?

chuckwondo (Collaborator):

Hi @nikki-t, the code looks good to me. There are some failing tests due to a print statement, which I'm not opposed to. @mfisher87, do you know if there is a way to skip this rule on a file?

Please use logging, not printing. We addressed this with #511.

betolink (Member) commented Jun 3, 2024

Ahh right @chuckwondo cc @nikki-t

Comment on lines 44 to 73

page_size = min(limit, 2000)
url = self._build_url()

results = []  # type: List[str]
page = 1
while len(results) < limit:
    params = {"page_size": page_size, "page_num": page}
    if self._debug:
        print(f"Fetching: {url}")
    # TODO: implement caching
    response = self.session.get(url, params=params)

    try:
        response.raise_for_status()
    except exceptions.HTTPError as ex:
        raise RuntimeError(ex.response.text)

    if self._format == "json":
        latest = response.json()["items"]
    else:
        latest = [response.text]

    if len(latest) == 0:
        break

    results.extend(latest)
    page += 1

return results
Collaborator:

This code already exists in ServiceQuery, so simply call the superclass method. Ideally, this method wouldn't need to be here at all, but for now we keep it for generating docs.

Suggested change. Replace:

    page_size = min(limit, 2000)
    url = self._build_url()
    results = []  # type: List[str]
    page = 1
    while len(results) < limit:
        params = {"page_size": page_size, "page_num": page}
        if self._debug:
            print(f"Fetching: {url}")
        # TODO: implement caching
        response = self.session.get(url, params=params)
        try:
            response.raise_for_status()
        except exceptions.HTTPError as ex:
            raise RuntimeError(ex.response.text)
        if self._format == "json":
            latest = response.json()["items"]
        else:
            latest = [response.text]
        if len(latest) == 0:
            break
        results.extend(latest)
        page += 1
    return results

with:

    return super.get(limit)

Collaborator Author:

I think this needs to be a call to super like this: super().get(limit). Will test and implement in code.
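
For reference, the distinction (a minimal illustration, not the PR's final code):

    from typing import Any, List

    from cmr import ServiceQuery

    class DataService(ServiceQuery):
        def get(self, limit: int = 2000) -> List[Any]:
            # `super` by itself is the built-in type; calling `super()` returns
            # the bound proxy needed to reach ServiceQuery.get.
            return super().get(limit)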

Collaborator:

Oh, hang on. We want umm_json, not json, so we do need to implement this here because python_cmr currently parses only when the format is json, not umm_json.

However, we already implement this generally in search.get_results, so this should likely be something like this:

    from .search import get_results

    def get(self, limit: int = 2000) -> List[Any]:
        return get_results(self.session, self, limit)

This will require you to tweak search.py as well, as follows:

First, change from cmr import CollectionQuery, GranuleQuery to from cmr import CollectionQuery, GranuleQuery, ServiceQuery

Next, change this:

def get_results(
    session: requests.Session,
    query: Union[CollectionQuery, GranuleQuery],
    limit: int = 2000,
) -> List[Any]:

to this:

def get_results(
    session: requests.Session,
    query: Union[CollectionQuery, GranuleQuery, ServiceQuery],
    limit: int = 2000,
) -> List[Any]:

Collaborator Author:

When I make these changes I get a circular dependency as services.py tries to import from .search while results.py is trying to import DataCollection, DataGranule from .search. The results module uses DataService to query and parse results. Also see this comment.

Collaborator:

Hmmm, yeah, I see the circularity. We may need to rethink where to put things to avoid circularities.
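
One standard workaround, sketched under the module layout described above (not necessarily the right long-term architecture), is to defer the import to call time so neither module needs the other while it is being loaded:

    from typing import Any, List

    from cmr import ServiceQuery

    class DataService(ServiceQuery):
        def get(self, limit: int = 2000) -> List[Any]:
            # A function-local import resolves only when get() is called, after
            # both modules have finished loading, breaking the import-time cycle.
            from .search import get_results

            return get_results(self.session, self, limit)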

earthaccess/results.py: outdated review thread (resolved)
Comment on lines 190 to 199
if services:
    for service in services:
        if earthaccess.__auth__.authenticated:
            query = DataService(auth=earthaccess.__auth__).parameters(
                concept_id=service
            )
        else:
            query = DataService().parameters(concept_id=service)
        results = query.get(query.hits())
        parsed[service] = self._parse_service_result(results)
Collaborator:

Suggested change. Replace:

    if services:
        for service in services:
            if earthaccess.__auth__.authenticated:
                query = DataService(auth=earthaccess.__auth__).parameters(
                    concept_id=service
                )
            else:
                query = DataService().parameters(concept_id=service)
            results = query.get(query.hits())
            parsed[service] = self._parse_service_result(results)

with:

    for service in services:
        if earthaccess.__auth__.authenticated:
            query = DataService(auth=earthaccess.__auth__).parameters(
                concept_id=service
            )
        else:
            query = DataService().parameters(concept_id=service)
        results = query.get(query.hits())
        parsed[service] = self._parse_service_result(results)

Comment on lines 216 to 217
"provider-id": result_json["items"][0]["meta"]["provider-id"],
"umm": result_json["items"][0]["umm"],
Collaborator:

Why are we getting only the item at index 0?

nikki-t (Collaborator Author) commented Jun 3, 2024

In the results response, the data always seems to be returned in a one-element list that contains all of the metadata. In order to provide some filtering, I chose to return only the provider-id and the UMM JSON response for each service. See attached sample-results-response.json.

It also looks like this is the case in the CMR API documentation, but to be on the safe side I will figure out how to iterate over the list to make sure we don't miss anything.
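
A hypothetical rework of the parsing (the name and shape here are assumed) that keeps every item rather than only index 0:

    def parse_service_results(result_json: dict) -> list:
        # One record per item, instead of hard-coding result_json["items"][0]
        return [
            {"provider-id": item["meta"]["provider-id"], "umm": item["umm"]}
            for item in result_json["items"]
        ]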

Collaborator:

When you address the suggestion I just added to the get method, you should not need to use json.loads because the get method will return parsed results.

Collaborator Author:

@chuckwondo - I can update the code to iterate over the "items" list in the CMR service query response but then I end up returning a list of lists to the end user. See attached parsed.json. What do you think of returning a list of lists that contains the items?

Here is an example:

"S2839491596-XYZ_PROV": [
        [
            {
                "provider-id": "XYZ_PROV",
                "umm": {
                    "URL": {
                        "Description": "https://harmony.earthdata.nasa.gov",
                        "URLValue": "This is the Harmony root endpoint."
                    },
                    "Type": "Harmony",
                    "ServiceKeywords": [
                        {
                            "ServiceCategory": "EARTH SCIENCE SERVICES",
                            "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                            "ServiceTerm": "DATA ACCESS/RETRIEVAL"
                        },
                        {
                            "ServiceCategory": "EARTH SCIENCE SERVICES",
                            "ServiceTopic": "DATA MANAGEMENT/DATA HANDLING",
                            "ServiceTerm": "DATA INTEROPERABILITY",
                            "ServiceSpecificTerm": "DATA REFORMATTING"
                        }
                    ],
                    "ServiceOrganizations": [
                        {
                            "Roles": [
                                "DEVELOPER",
                                "PUBLISHER",
                                "SERVICE PROVIDER"
                            ],
                            "ShortName": "NASA/GSFC/EOS/EOSDIS/EMD",
                            "LongName": "Maintenance and Development, Earth Observing System Data and Information System, Earth Observing System,Goddard Space Flight Center, NASA"
                        }
                    ],
                    "Description": "Backend NetCDF-to-Zarr service option description for Harmony data transformations. Cannot be chained with other operations from this record.",
                    "VersionDescription": "Semantic version number for the NetCDF-to-Zarr Docker image used by Harmony in production.",
                    "Version": "1.2.0",
                    "Name": "Harmony NetCDF-to-Zarr Service",
                    "ContactPersons": [
                        {
                            "Roles": [
                                "DEVELOPER"
                            ],
                            "FirstName": "Owen",
                            "LastName": "Littlejohns",
                            "ContactInformation": {
                                "ContactMechanisms": [
                                    {
                                        "Type": "Email",
                                        "Value": "[email protected]"
                                    }
                                ]
                            }
                        },
                        {
                            "Roles": [
                                "SERVICE PROVIDER"
                            ],
                            "FirstName": "David",
                            "LastName": "Auty",
                            "ContactInformation": {
                                "ContactMechanisms": [
                                    {
                                        "Type": "Email",
                                        "Value": "[email protected]"
                                    }
                                ]
                            }
                        }
                    ],
                    "ServiceOptions": {
                        "Aggregation": {
                            "Concatenate": {
                                "ConcatenateDefault": "False"
                            }
                        },
                        "SupportedReformattings": [
                            {
                                "SupportedInputFormat": "NETCDF-4",
                                "SupportedOutputFormats": [
                                    "ZARR"
                                ]
                            }
                        ]
                    },
                    "MetadataSpecification": {
                        "URL": "https://cdn.earthdata.nasa.gov/umm/service/v1.5.3",
                        "Name": "UMM-S",
                        "Version": "1.5.3"
                    },
                    "LongName": "Harmony NetCDF-to-Zarr Service"
                }
            }
        ]
    ]

Member:

Is there meaning to the nested list structure? If not, we could use itertools.chain() to flatten it out, IIRC.
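
A sketch of that flattening, where `nested` stands in for the list-of-lists of parsed service records shown above:

    from itertools import chain

    nested = [[{"provider-id": "XYZ_PROV"}], [{"provider-id": "POCLOUD"}]]
    flat = list(chain.from_iterable(nested))
    # -> [{'provider-id': 'XYZ_PROV'}, {'provider-id': 'POCLOUD'}]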

nikki-t and others added 2 commits June 3, 2024 16:16
Co-authored-by: Chuck Daniels <[email protected]>
Co-authored-by: Chuck Daniels <[email protected]>

Comment on lines 194 to 195
results = query.get(query.hits())
parsed[service] = self._parse_service_result(results)
Collaborator:

Once you make the change I suggested in the get method, you should be able to eliminate the _parse_service_result method and just do this:

Suggested change. Replace:

    results = query.get(query.hits())
    parsed[service] = self._parse_service_result(results)

with:

    parsed[service] = query.get(query.hits())

CHANGELOG.md: two outdated review threads (resolved)
from .auth import Auth


class DataService(ServiceQuery):
Collaborator:

Any reason this is not in search.py, like DataCollections and DataGranules are?

I'm not opposed to keeping this in a separate file, but splitting each type of query class into its own module might perhaps be left as a task for a separate issue (after discussing whether or not we want to do so).

For consistency with existing code, I suggest moving this to search.py and also renaming it to be pluralized: DataServices.

nikki-t (Collaborator Author) commented Jun 7, 2024

I had initially wanted to place DataService in search.py but because the results module uses the DataService class to query and parse results, a circular dependency is created between the search and results modules. I also thought the DataService class might grow as we add in the plugin architecture and could serve as the main entrypoint for loading and accessing plugins.

I can look into re-architecting the code so that there is a DataServices class in search.py and DataService class in results.py to be more consistent with the Collections and Granules structure. I was initially thinking that the services would be returned for a collection rather than having a separate search_services function.

Right now the end user can query services like this:

datasets = search_datasets(
    short_name="MUR-JPL-L4-GLOB-v4.1",
    cloud_hosted=True,
    temporal=("2024-02-27T00:00:00Z", "2024-02-29T00:00:00Z"),
)
for dataset in datasets:
    print(dataset.services())

This makes it pretty easy to return service data for a collection. If I re-architect as mentioned above, the user would search a service like this:

datasets = search_datasets(
    short_name="MUR-JPL-L4-GLOB-v4.1",
    cloud_hosted=True,
    temporal=("2024-02-27T00:00:00Z", "2024-02-29T00:00:00Z"),
)
for dataset in datasets:
    services = dataset["meta"]["associations"]["services"]
    for service in services:
        print(search_services(service))

I like how easy it is to return service data in the first code snippet, but I also want to make sure we are building a codebase that is consistent and easy to extend, since I think we want to build off of this with the plugin architecture.

Open to suggestions and/or discussing at the next hackday.

Collaborator:

Ah, I see the circular dependency you're referring to.

I still suggest you rename DataService (singular) to DataServices (plural), making it consistent with DataCollections and DataGranules.

Regarding the "easier" code you mention above, I agree, but that doesn't preclude providing a search_services method as well. Users can then do both things: (a) call dataset.services() to get the services associated with a dataset, or (b) call search_services to search for services more generally, not necessarily specific to a dataset.

Member:

Users can then do both things: (a) call dataset.services() to get the services associated with a dataset, or (b) call search_services to search for services more generally, not necessarily specific to a dataset.

Should we consider these separate features and follow-up later to add even more convenience?

Collaborator:

I don't think so. In fact, I suggest that dataset.services() simply invoke search_services.
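
A minimal sketch of that delegation (all names hypothetical; search_services is the proposed function, and the association path is the one shown in the snippet above):

    import earthaccess

    class DataCollection(dict):
        def services(self) -> dict:
            # Service concept-ids associated with this collection in its CMR metadata
            service_ids = self["meta"].get("associations", {}).get("services", [])
            # search_services is the proposed, not-yet-existing API
            return {sid: earthaccess.search_services(concept_id=sid) for sid in service_ids}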

Collaborator:

Sounds reasonable to me. My only request is to not name anything "utils." It's a pet peeve of mine because it's such a generic name as to have no meaning. Happy to iron out kinks with you at the next hack day.

Member:

My only request is to not name anything "utils."

I love this and feel so called out 🤣 I'm very prone to creating a utils subpackage but I always regret it later and am trying to get better at it.

Collaborator Author:

Totally agree on utils 😄 but it does look like there is already a utils directory. Should we consider moving that to a different name? Or maybe I am misunderstanding!

Collaborator:

Oh, I hadn't noticed that there's already a utils. Oh well. We can worry about that another time.

Member:

If that was me, sorry 🤣

- Modify services.py to use get_results function
- Update results.DataCollection.services function to retrieve and return UMM-JSON for services
- Update unit and integration tests
nikki-t (Collaborator Author) commented Jun 11, 2024

Suggested modifications have been made and unit and integration tests have been updated.

Unit Test Summary

================================================================================================================ short test summary info =================================================================================================================
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_results.py::TestResults::test_data_links - ValueError: time data '2020' does not match format '%Y-%m-%dT%H:%M:%SZ'
============================================================================================================== 9 failed, 31 passed in 2.56s ==============================================================================================================

Integration Test Summary

================================================================================================================ short test summary info =================================================================================================================
FAILED tests/integration/test_auth.py::test_auth_can_read_earthdata_env_variables - AttributeError: 'Auth' object has no attribute 'username'
FAILED tests/integration/test_auth.py::test_auth_can_read_from_netrc_file - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_populates_attrs - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_auth_can_fetch_s3_credentials - AssertionError: False is not true
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location0] - assert {}
FAILED tests/integration/test_auth.py::test_get_s3_credentials_lowercase_location[location1] - assert {}
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac1] - AssertionError: False is not true
FAILED tests/integration/test_onprem_download.py::test_earthaccess_can_download_onprem_collection_granules[daac3] - AssertionError: 0 not greater than 3
FAILED tests/integration/test_onprem_open.py::test_earthaccess_can_open_onprem_collection_granules[daac3] - AssertionError: 0 not greater than 2
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_collection_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2001-12-12-2001-12-21-2001-12-12T00:00:00Z,2001-12-21T23:59:59Z] - ValueError: time data '2001-12-12' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2021-02-01--2021-02-01T00:00:00Z,] - ValueError: time data '2021-02-01' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[1999-02-01 06:00-2009-01-01-1999-02-01T06:00:00Z,2009-01-01T23:59:59Z] - ValueError: time data '1999-02-01 06:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_granule_queries.py::test_query_can_parse_single_dates[2019-03-10T00:00:00Z-2019-03-10T00:00:00-01:00-2019-03-10T00:00:00Z,2019-03-10T01:00:00Z] - ValueError: time data '2019-03-10T00:00:00-01:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
FAILED tests/unit/test_results.py::TestResults::test_data_links - ValueError: time data '2020' does not match format '%Y-%m-%dT%H:%M:%SZ'
============================================================================================ 18 failed, 66 passed, 1 skipped, 1 warning in 596.56s (0:09:56) =============================================================================================

It looks like a lot of the tests are failing due to the date format. I pulled in the most recent changes from main to see if they had been fixed, but it doesn't look like it. I also typically see some test failures when running the integration tests locally.

@mfisher87 or @chuckwondo - Do you mind approving the workflows so that they can run? If these pass I think we might be in good shape to merge the PR.

mfisher87 (Member):

Workflows kicked off! You should not have to worry about that... I sent you an invitation with "triage" rights (includes the ability to add labels, close issues, etc., but not quite as much as "maintainer"), which you can accept or decline :)
