Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor astroquery.heasarc to use VO protocols #2997

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

zoghbi-a
Copy link
Contributor

@zoghbi-a zoghbi-a commented Apr 25, 2024

This a major refactor of the heasarc module to use the VO interface to the archive. The main motivation is:

  • Allow for complex region and table queries.
  • Expose the TAP service.
  • Cleanup the tests.
  • Since the VO interface is the main archive interface, the archive will be able to support this module more.

The main changes inlcude:

  • The old class has been renamed HeasarcClass -> HeasarcBrowseClass. The initialized instance is also rename Heasarc -> HeasarcBrowse. The same for the test files.
  • The new HeasarcClass class uses an interface similar to those used in other modules e.g. ipac.irsa.
  • A deprecation message has been added to the methods used for querying the tables and columns.
  • Added the ability to download data from the main heasarc servers, Sciserver and from the cloud.

@pep8speaks
Copy link

pep8speaks commented Apr 25, 2024

Hello @zoghbi-a! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 99:13: W503 line break before binary operator
Line 120:13: W503 line break before binary operator
Line 486:17: W503 line break before binary operator
Line 506:13: W503 line break before binary operator

Comment last updated at 2024-05-15 14:58:22 UTC

@zoghbi-a zoghbi-a force-pushed the heasarc-refactor branch 2 times, most recently from 43e9fe4 to 7f74228 Compare April 25, 2024 20:42
@zoghbi-a zoghbi-a marked this pull request as ready for review April 25, 2024 20:52
@bsipocz bsipocz added this to the v0.4.8 milestone May 10, 2024
@bsipocz
Copy link
Member

bsipocz commented May 10, 2024

The old class has been renamed HeasarcClass -> HeasarcBrowseClass. The initialized instance is also renamed Heasarc -> HeasarcBrowse. The same for the test files.

What is the motivation for this? Are there any datasets that are only accessible using the webform, but not the VO backends?

(If the only motivation is to keep what has been here, then it's not necessary, removing everything as part of the refactor is totally fine. Ideally, the old user codes should keep working, but with such a large backend restructure we also have precedence for breaking those)

Doing a proper review may take me until I'm back from the interop.

@zoghbi-a
Copy link
Contributor Author

All the tables should available through the new API, so it is kept in case some people are using it. If it is ok to remove the old class, I don't see a stopper.

@bsipocz
Copy link
Member

bsipocz commented May 10, 2024

so it is kept in case some people are using it

A rename doesn't really solve this scenario, as the continued support would have only worked if the name was kept the same, maybe with a deprecation warning.

So, before diving into a review, I would suggest cleaning up the old class. Maybe try to keep as much of the test examples/docs examples working as possible, or maybe working with a deprecation warning (e.g. in case some of the keywords need to be dropped, or renamed).

@zoghbi-a
Copy link
Contributor Author

The new class implements most of the useful methods of the old class with a deprecation warning. I will then delete the old class and keep the methods and warnings in the new class.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only leave a WIP review for now, that focuses only on the API, and will try to come back and look into the code itself later, hopefully later this week or next week. I was thinking it's useful to leave what I have for now, than hold it back until I have the time to do the full review.

@@ -9,6 +9,9 @@
The initial version of this was coded in a sprint at the
"Python in astronomy" workshop in April 2015 by Jean-Christophe Leyder,
Abigail Stevens, Antonio Martin-Carrillo and Christoph Deil.

Updates to use the XAMIN service was added by Abdu Zoghbi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this and the above author 3 lines can be cleaned-up, too. Git carries the history, and we'll also work on a team website that will list whoever is actively maintain the module.

heasarc
^^^^^^^

- Refactor heasarc to use the VO backend. The old Heasarc class is now HeasarcBrowser [#2997]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the second sentence as we don't keep the old class

return Table.read(*args, **kwargs)
else:
return Table.read(*args, **kwargs, unit_parse_strict='silent')


@async_to_sync
class HeasarcClass(BaseQuery):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do inherit from the VO baseclass, too, using the order in the other modules (so e.g. the session is properly get passed back to pyvo, and a couple of other small things)

return self._tap

@property
def _meta(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea, and great that the server returns the info in the table and not in the metadata of the table, as e.g astropy has serious toubles with parsing that non-column metadata

defaults = meta['par']
return defaults

def get_default_radius(self, table_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this private, too, similar to _get_default_cols


return self.query_async(request_payload, cache=cache)
def get_links(self, query_result=None, tablename=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these datalinks? Then call it get_datalinks instead (like in gaia), though here were are inconsistent yet again as alma has datalink instead. But not many modules have datalinks yet, so we can sort out the API now and use that for the future (and then clean up the inconsistent, but existing modules)


return dl_result

def enable_cloud(self, provider='aws', profile=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully everthing cloud related will make its way up to pyvo, eventually, but until then it all can stay of course.


fits_content = fits.open(BytesIO(content))
def download_data(self, links, host='heasarc', location='.'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how much of this can be put in the baseclass

The result from get_links
host : str
The data host. The options are: heasarc (defaul), sciserver, aws.
If host == 'sciserver', data is copied from the local mounted
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very specific corner case only for sciserver. Will it work for smce, too?


def get_package_data():
paths = [
os.path.join("data", "*.dat"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the datafiles being deleted. Should they be?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants