Skip to content

Gaia: change the signature of the method load_data #3014

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
bd8f332
GAIAMNGT-1700 change signature of the method Gaia.load_data
May 28, 2024
a304b13
GAIAMNGT-1700 change signature of the method Gaia.load_data
May 28, 2024
54bd20e
GAIAMNGT-1700 change signature of the method Gaia.load_data
May 28, 2024
040b1dc
GAIAMNGT-1700 change signature of the method Gaia.load_data
May 28, 2024
80275c2
GAIAMNGT-1700 New tests
May 28, 2024
53233b8
GAIAMNGT-1700 New tests
May 29, 2024
095483a
GAIAMNGT-1700 New tests
May 29, 2024
181e858
GAIAMNGT-1700 New tests
May 29, 2024
71ad0ae
GAIAMNGT-1700 New tests
May 29, 2024
eddf96a
GAIAMNGT-1700 New function build_general_output_filename
May 29, 2024
72617e6
GAIAMNGT-1700 New function test
May 29, 2024
94b8099
GAIAMNGT-1700 Remove test
May 29, 2024
536f39c
GAIAMNGT-1700 Remove wrong line
May 29, 2024
45b0461
GAIAMNGT-1700 New test to check the exception
May 29, 2024
0f8b0cf
GAIAMNGT-1700 New test to check the exception
May 29, 2024
6aa4f6b
GAIAMNGT-1700 Update PR number
May 29, 2024
6845bc8
GAIAMNGT-1700 Fix code style issues
May 29, 2024
12efad1
GAIAMNGT-1700 Fix code style issues
May 29, 2024
ea2c414
GAIAMNGT-1700 The function build_general_output_filename is removed.
Jun 21, 2024
154d5e7
GAIAMNGT-1700 The name of the output file contains the timestamp with…
Sep 30, 2024
510fe02
GAIAMNGT-1700 Change format
Sep 30, 2024
db87123
GAIAMNGT-1700 Update test
Sep 30, 2024
443f422
GAIAMNGT-1700 Fix test to make use of a mocked datetime
Oct 1, 2024
220b885
GAIAMNGT-1700 Resolve conflicts
Oct 1, 2024
75b7124
Merge branch 'main' into ESA_gaia_GAIAMNGT-1700_load_data
cosmoJFH Oct 1, 2024
1b8aaed
GAIAMNGT-1700 Make use of Path instead of os.path
Oct 1, 2024
f96d621
GAIAMNGT-1700 Updte information for the parameter dump_to_file
Oct 2, 2024
775e8e0
GAIAMNGT-1700 Fix unexpected indentation
Oct 2, 2024
e24b67c
GAIAMNGT-1700 Fix unexpected indentation
Oct 2, 2024
58f8692
MAINT: adding deprecation for the removed argument
bsipocz Oct 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -179,8 +179,13 @@ gaia

- Fix method search_async_jobs in the class TapPlus. [#2967]

- Change the signature of the function load_data: the parameter output_file that defined the file where the results were
saved, is replaced by boolean parameter dump_to_file, that in case it is true, a compressed directory named "datalink_output.zip" with
all the DataLink files is made. So the users cannot specified the output file anymore [#3014]

- New retrieval types for datalink (Gaia DR4 release). [#3110]


jplhorizons
^^^^^^^^^^^

Expand Down Expand Up @@ -415,6 +420,8 @@ gaia
epoch photometry service to return all data associated to a given source.
[#2376]

- New retrieval types for datalink (Gaia DR4 release). [#3110]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is at the wrong location, but I'm cleaning up the changelog for release time anyway, so it doesn't matter here.


- Default Gaia catalog updated to DR3. [#2596]

heasarc
Expand Down
85 changes: 48 additions & 37 deletions astroquery/gaia/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,12 @@
Created on 30 jun. 2016
Modified on 18 Ene. 2022 by mhsarmiento
"""
import datetime
import json
import os
import shutil
import zipfile
from collections.abc import Iterable
from datetime import datetime, timezone
from pathlib import Path

from astropy import units
from astropy import units as u
Expand All @@ -28,6 +27,7 @@
from astropy.io import votable
from astropy.table import Table
from astropy.units import Quantity
from astropy.utils.decorators import deprecated_renamed_argument
from requests import HTTPError

from astroquery import log
Expand Down Expand Up @@ -168,9 +168,11 @@
except HTTPError:
log.error("Error logging out data server")

@deprecated_renamed_argument("output_file", None, since="0.4.8")
def load_data(self, ids, *, data_release=None, data_structure='INDIVIDUAL', retrieval_type="ALL",
linking_parameter='SOURCE_ID', valid_data=False, band=None, avoid_datatype_check=False,
format="votable", output_file=None, overwrite_output_file=False, verbose=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing a parameter from the public APIs should be done through a deprecation period.

But also, I do wonder why is this necessary, to remove the ability from the users to define a filename/location that better suits them?

So, my suggestion would be to change the default None to datalink_output.zip, and add the new boolean parameter. That way the default will be what you proposed here, but the users still have the ability to select a better location.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a problem, we should not break people's code without noticing them first.

format="votable", dump_to_file=False, overwrite_output_file=False, verbose=False,
output_file=None):
"""Loads the specified table
TAP+ only

Expand Down Expand Up @@ -218,44 +220,53 @@
By default, this value will be set to False. If it is set to 'true'
the Datalink items tags will not be checked.
format : str, optional, default 'votable'
loading format. Other available formats are 'csv', 'ecsv','votable_plain' and 'fits'
output_file : string or pathlib.PosixPath, optional, default None
file where the results are saved.
If it is not provided, the http response contents are returned.
loading format. Other available formats are 'csv', 'ecsv','votable_plain', 'json' and 'fits'
dump_to_file: boolean, optional, default False.
If it is true, a compressed directory named "datalink_output_<time_stamp>.zip" with all the DataLink
files is made in the current working directory. The <time_stamp> format follows the ISO 8601 standard:
"yyyymmddThhmmss".
overwrite_output_file : boolean, optional, default False
To overwrite the output_file if it already exists.
To overwrite the output file ("datalink_output.zip") if it already exists.
verbose : bool, optional, default 'False'
flag to display information about the process

Returns
-------
A dictionary where the keys are the file names and its value is a list of astropy.table.table.Table objects
"""
now = datetime.now(timezone.utc)
now_formatted = now.strftime("%Y%m%d_%H%M%S")
temp_dirname = "temp_" + now_formatted
downloadname_formated = "download_" + now_formatted

output_file_specified = False
if output_file is None:

now = datetime.datetime.now(datetime.timezone.utc)
if not dump_to_file:
now_formatted = now.strftime("%Y%m%d_%H%M%S")
temp_dirname = "temp_" + now_formatted
downloadname_formated = "download_" + now_formatted

Check warning on line 244 in astroquery/gaia/core.py

View check run for this annotation

Codecov / codecov/patch

astroquery/gaia/core.py#L242-L244

Added lines #L242 - L244 were not covered by tests
output_file = os.path.join(os.getcwd(), temp_dirname, downloadname_formated)

else:
output_file = 'datalink_output_' + now.strftime("%Y%m%dT%H%M%S") + '.zip'
output_file_specified = True

if isinstance(output_file, str):
if not output_file.lower().endswith('.zip'):
output_file = output_file + '.zip'
elif isinstance(output_file, Path):
if not output_file.suffix.endswith('.zip'):
output_file.with_suffix('.zip')

output_file = os.path.abspath(output_file)
log.info(f"DataLink products will be stored in the {output_file} file")

if not overwrite_output_file and os.path.exists(output_file):
raise ValueError(f"{output_file} file already exists. Please use overwrite_output_file='True' to "
f"overwrite output file.")

path = os.path.dirname(output_file)

log.debug(f"Directory where the data will be saved: {path}")

if path != '':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maybe cleaner as '' will be False here:

Suggested change
if path != '':
if path:

if not os.path.isdir(path):
try:
os.mkdir(path)
except FileExistsError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is the overwrite_output_file parameter, it should be used here isn't it?

log.warn("Path %s already exist" % path)
except OSError:
log.error("Creation of the directory %s failed" % path)

Check warning on line 268 in astroquery/gaia/core.py

View check run for this annotation

Codecov / codecov/patch

astroquery/gaia/core.py#L263-L268

Added lines #L263 - L268 were not covered by tests

if avoid_datatype_check is False:
# we need to check params
rt = str(retrieval_type).upper()
Expand Down Expand Up @@ -298,14 +309,7 @@
if linking_parameter != 'SOURCE_ID':
params_dict['LINKING_PARAMETER'] = linking_parameter

if path != '':
try:
os.mkdir(path)
except FileExistsError:
log.error("Path %s already exist" % path)
except OSError:
log.error("Creation of the directory %s failed" % path)

files = dict()
try:
self.__gaiadata.load_data(params_dict=params_dict, output_file=output_file, verbose=verbose)
files = Gaia.__get_data_files(output_file=output_file, path=path)
Expand All @@ -314,6 +318,9 @@
finally:
if not output_file_specified:
shutil.rmtree(path)
else:
for file in files.keys():
os.remove(os.path.join(os.getcwd(), path, file))

if verbose:
if output_file_specified:
Expand All @@ -329,18 +336,21 @@
@staticmethod
def __get_data_files(output_file, path):
files = {}
if zipfile.is_zipfile(output_file):
with zipfile.ZipFile(output_file, 'r') as zip_ref:
zip_ref.extractall(os.path.dirname(output_file))
extracted_files = []

with zipfile.ZipFile(output_file, "r") as zip_ref:
extracted_files.extend(zip_ref.namelist())
zip_ref.extractall(os.path.dirname(output_file))

# r=root, d=directories, f = files
for r, d, f in os.walk(path):
for file in f:
if file.lower().endswith(('.fits', '.xml', '.csv', '.ecsv')):
if file in extracted_files:
files[file] = os.path.join(r, file)

for key, value in files.items():
if '.fits' in key:

if key.endswith('.fits'):
tables = []
with fits.open(value) as hduList:
num_hdus = len(hduList)
Expand All @@ -349,19 +359,20 @@
Gaia.correct_table_units(table)
tables.append(table)
files[key] = tables
elif '.xml' in key:

elif key.endswith('.xml'):
tables = []
for table in votable.parse(value).iter_tables():
tables.append(table)
files[key] = tables

elif '.csv' in key:
elif key.endswith('.csv'):
tables = []
table = Table.read(value, format='ascii.csv', fast_reader=False)
tables.append(table)
files[key] = tables

elif '.json' in key:
elif key.endswith('.json'):
tables = []
with open(value) as f:
data = json.load(f)
Expand Down
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
3 changes: 2 additions & 1 deletion astroquery/gaia/tests/setup_package.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@ def get_package_data():
paths = [os.path.join('data', '*.vot'),
os.path.join('data', '*.vot.gz'),
os.path.join('data', '*.json'),
os.path.join('data', '*.ecsv')
os.path.join('data', '*.ecsv'),
os.path.join('data', '*.zip')
] # etc, add other extensions
# you can also enlist files individually by names
# finally construct and return a dict for the sub module
Expand Down
Loading