-
-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gaia: change the signature of the method load_data #3014
base: main
Are you sure you want to change the base?
Gaia: change the signature of the method load_data #3014
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to understand the motivation behind removing the ability from the users to define the output name. At the minimum, it should go through a deprecation cycle, but I would rather keep the functionality.
Besides that it looks fine, minus a minor bug about having it but not using the overwrite parameter.
astroquery/gaia/core.py
Outdated
@@ -325,21 +323,31 @@ def load_data(self, ids, *, data_release=None, data_structure='INDIVIDUAL', retr | |||
|
|||
return files | |||
|
|||
def build_general_output_filename(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we try to slowly simplify the API whereever we can, so I would suggest to make this private from the beginning.
Or just as well, not have a separate method, but add the 4 lines at the place of usage (as I see it is only used in one place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have removed the function build_general_output_filename.
@@ -170,7 +169,7 @@ def logout(self, *, verbose=False): | |||
|
|||
def load_data(self, ids, *, data_release=None, data_structure='INDIVIDUAL', retrieval_type="ALL", | |||
linking_parameter='SOURCE_ID', valid_data=False, band=None, avoid_datatype_check=False, | |||
format="votable", output_file=None, overwrite_output_file=False, verbose=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing a parameter from the public APIs should be done through a deprecation period.
But also, I do wonder why is this necessary, to remove the ability from the users to define a filename/location that better suits them?
So, my suggestion would be to change the default None to datalink_output.zip
, and add the new boolean parameter. That way the default will be what you proposed here, but the users still have the ability to select a better location.
if not os.path.isdir(path): | ||
try: | ||
os.mkdir(path) | ||
except FileExistsError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is the overwrite_output_file
parameter, it should be used here isn't it?
I would also raise the idea of maybe having a separate method for the download instead of the Anyway, this API question points further than this PR, and for that I would like to also ping @keflavich and @andamian for their opinions. |
The proposed change is a bit counterintuitive - it's hard to tell what drives it (maybe subsequent processing on the resulting file?). If the user has no control on the name of the file I would expect it to be at least timestamped. Also, current working directory is a bit vague. What is current working directory when a script calls the |
Sorry I'm a bit late to this, but I agree that removing users' ability to specify the output location does not seem like a good idea. However, I had a hard time parsing the original intent: is the idea that |
Dear all, Thanks for your valuable feedback. Let me give you an overview of the way this functionality is currently implemented and the reasons behind our proposal.
Examples of key names are: “EPOCH_PHOTOMETRY-Gaia DR3 2263166706630078848.xml” or “EPOCH_PHOTOMETRY_RAW.xml”. This file-naming procedure reproduces the behavior of the web interface of the Gaia ESA Archive (which does not allow to specify the names of the files to be downloaded). That said, it is possible to retrieve:
From the web interface, the data is always downloaded as a single compressed (.zip) file that is named as: <user_name><job_id>. or <job_id>. (depending on the request type: registered or anonymous).
We could consider the possibility of adding a separated method to download the ancillary files, but there are few reasons that make me reluctant to implement it (although I am not totally against this):
All in all, we considered that the proposed change would benefit the users by simplifying the way that the DataLink products can be stored using Astroquery.Gaia). But if you think that this change can be improved (by e.g. automatically adding a timestamp to the currently fixed file name), please let us know. Kind regards, ================================== Telespazio for ESA - European Space Agency |
Thank you for your great description @hectorcanovas. Yes, trying to emulate the Web Interface functionality is a great goal as it will make it easier for user to use both mechanisms to access their data. I'm not familiar with the Gaia Archive, but in general, the Web interface is used for exploration while the library is used for processing. |
The present implementation of the method GaiaClass.load_data makes use of the parameter output_file that defines a user defined zipped file were the data is saved.
We would like to change this parameter by the boolean parameter dump_to_file to save the results in the hard-code file name "datalink_output.zip" that will be saved in the current working directory.
This PR also contains 2 bug fixes in the present implementation:
if the value of the parameter output_file is a simple name, i.e., without a path, the code will collect all the files with extensions '.fits', '.xml', '.csv', or '.ecsv' from the current working directory. This must be changed, since this implies that any file with any of those extensions will be included in zipped file, i.e., independently if they come from the execution of the method load_data, or from previous executions.
If the parameter output_file is used, the built zip file is used to save the files coming from by the datalink service. Then the file is unzipped and the files that it contains are read, so that a list of Tables is built and returned to the user. But the the unzipped files are never removed. We think that this is a bug and it would be good to remove the unzipped files.
New units tests were developed.
cc @esdc-esac-esa-int
jira: GAIAMNGT-1700