Support for using local data files? #6

Closed
tuckermcclure opened this issue May 8, 2024 · 12 comments

Comments

@tuckermcclure

This package is very helpful. We use it in our simulation. One key thing is that, given a fixed Manifest.toml file, our sims should always produce the same results. When space indices are updated, that requirement is violated. That is, you can run a simulation and then re-run that exact same simulation, and if the space indices happen to have been updated, you may get different results, and that's very bad for us.

Back in v0.10, I was calling this:

    ST.init_space_indices(;
        enabled_files  = [:dtcfile, :fluxtable, :solfsmy, :wdcfiles],
        dtcfile_path, fluxtable_path, solfsmy_path, wdcfiles_dir,
    )

(We added the data files directly to a package we use so that they were essentially version-controlled just like any other package.)

Looking through the source code, I don't see any equivalent way to pass in paths for those files today. Am I missing anything?

@ronisbr
Member

ronisbr commented May 9, 2024

Hi @tuckermcclure !

There is in fact a way, but it is undocumented, and we could develop a nicer API.

I created a new project with the following files:

Project.toml

[deps]
Scratch = "6c6a2e73-6563-6170-7368-637461726353"
SpaceIndices = "5a540a4e-639f-452a-b107-23ea09ed4d36"

SW-ALL.csv
DTCFILE.txt
SOLFSMY.txt

The last three files were obtained from their respective websites.

Now, we need to override the urls function of the API so that the system copies the local files to the Scratch space.

using Scratch
using SpaceIndices

# Make sure we do not have any cache in the scratch space.
Scratch.clear_scratchspaces!()

# Use the local files instead of the URLs.
local_dir = dirname(@__FILE__)
SpaceIndices.urls(::Type{SpaceIndices.Celestrak}) = ["file://$local_dir/SW-ALL.csv"]
SpaceIndices.urls(::Type{SpaceIndices.JB2008})    = [
    "file://$local_dir/DTCFILE.txt"
    "file://$local_dir/SOLFSMY.txt"
]

SpaceIndices.init()

I got the following by running this script:

julia> include("test.jl")
[ Info: Downloading the file 'DTCFILE.txt' from 'file:///Users/ronan.arraes/tmp/my_project/DTCFILE.txt'...
[ Info: Downloading the file 'SOLFSMY.txt' from 'file:///Users/ronan.arraes/tmp/my_project/SOLFSMY.txt'...
[ Info: Downloading the file 'SW-ALL.csv' from 'file:///Users/ronan.arraes/tmp/my_project/SW-ALL.csv'...

julia> space_index(Val(:F10obs), DateTime("2024-01-01"))
146.2

I really think we can improve this but I have no idea of a good API for the general case.

@tuckermcclure
Author

Hi @ronisbr, and thanks for your reply!

Ok, got it. You're reassigning the urls methods in SpaceIndices, and you're using file://... as the URL, so SpaceIndices is still "downloading" the files, but really that just results in the files being copied to the scratch space.
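To make the "download is really a copy" point concrete, here is a minimal sketch that uses only Julia's Downloads standard library. It assumes a Unix-like absolute path, and the file names and contents are placeholders invented for the illustration. It does not show how SpaceIndices fetches files internally.

```julia
using Downloads

# Work in a throwaway directory so the sketch touches no real data.
workdir = mktempdir()

# Stand-in for a version-controlled data file (contents are made up).
src = joinpath(workdir, "local_indices.txt")
write(src, "placeholder space-index data\n")

# Stand-in for the scratch-space location.
dst = joinpath(workdir, "cached_indices.txt")

# A file:// URL turns the "download" into a local copy; no network is used.
# Assumes a Unix-like absolute path that starts with "/".
Downloads.download("file://" * src, dst)

copied = read(dst, String)
println(copied == read(src, String))
```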

To brainstorm on a potential API for this, I think something like this would work well for my use cases:

indices = SpaceIndices.load_indices(dtcfile_path = "...", fluxtable_path = "...", ...)
atmosphere = SatelliteToolbox.AtmosphereModels.jb2008(indices)
rho = get_density(atmosphere, julian_day, position)

I recognize that's a big change from the way SatelliteToolbox and SpaceIndices work today. I'm not really sure why the indices are loaded as a global today though; maybe it's to save from doing the same load over and over if there are different models in different places of the code where sharing a common indices variable would be hard? I would actually like to have the ability to use different indices for different things, all running in the same Julia session.
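As a rough illustration of the non-global idea, the sketch below passes an indices object explicitly, so two different sets can coexist in one session. The names IndexSet and f107 are hypothetical and the values are made up; neither SpaceIndices nor SatelliteToolbox provides them.

```julia
using Dates

# Hypothetical container; not a type from SpaceIndices.jl or SatelliteToolbox.jl.
struct IndexSet
    f107_obs::Dict{Date, Float64}
end

# Hypothetical lookup that reads from the set it is handed, not from a global.
f107(set::IndexSet, day::Date) = set.f107_obs[day]

# Two independent sets with made-up values, alive in the same session.
baseline = IndexSet(Dict(Date(2024, 1, 1) => 100.0))
whatif   = IndexSet(Dict(Date(2024, 1, 1) => 250.0))

day = Date(2024, 1, 1)
println(f107(baseline, day), " vs ", f107(whatif, day))
```

The trade-off is that every model call has to receive the set it should use, in exchange for the freedom to run different data side by side.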

While we're here, it would also be pretty cool to put the relevant indices data files into a Julia package that's updated every two weeks or so. That could be directly in SpaceIndices or in another package like SpaceIndicesData. Then, users would only get different results in their sims after explicitly updating the SpaceIndicesData package. That would let the data function like all other package management rather than always using the bleeding edge of released data. What do you think?

@ronisbr
Member

ronisbr commented May 11, 2024

Hi @tuckermcclure !

I had an idea! What if we add a keyword filepath to the SpaceIndices.init function (used when calling it for each set)? In that case, we would use this file instead of the one specified in the urls function. It would solve your problem with only minor modifications to the package. What do you think?

> While we're here, it would also be pretty cool to put the relevant indices data files into a Julia package that's updated every two weeks or so. That could be directly in SpaceIndices or in another package like SpaceIndicesData. Then, users would only get different results in their sims after explicitly updating the SpaceIndicesData package. That would let the data function like all other package management rather than always using the bleeding edge of released data. What do you think?

It would be awesome! However, we would need something to constantly update the files (some are updated daily). I am not sure how we can do this safely.

@tuckermcclure
Author

> What if we create a keyword filepath to be passed to SpaceIndices.init

Sounds good to me!

> we will need something to constantly update the files (some are daily)

I think if you want bleeding edge, you should just pull the files like today, but if you want a predictable environment at a slower pace than daily updates, the SpaceIndicesData idea might be useful. A two-week release cadence would certainly be fine for lots of folks. Perhaps the releases could be automated by making sure there are new data files and they appear to load correctly. I mean, there's nothing today that checks that the files are validly interpreted; the most recent files are pulled down, and those are what get used. In fact, if there were a breaking change in the files today, that would break SatelliteToolbox, but if there were this SpaceIndicesData package that had to load the indices before committing/releasing them, then a breaking change in the upstream files wouldn't break a user's work. 🤷
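One way to picture the "make sure they appear to load correctly" check is a small gate that accepts a snapshot only if a parser can read it. This is a sketch under stated assumptions: snapshot_loads and toy_loader are hypothetical names, and the toy loader (one number per line) stands in for a real index-file parser.

```julia
# Hypothetical release gate: a snapshot passes only if `loader` parses it
# without throwing and yields at least one record.
function snapshot_loads(loader, path::AbstractString)
    isfile(path) || return false
    try
        return !isempty(loader(path))
    catch
        return false
    end
end

# Toy loader standing in for a real parser: one number per line.
toy_loader(path) = [parse(Float64, line) for line in eachline(path)]

dir  = mktempdir()
good = joinpath(dir, "good.txt")
bad  = joinpath(dir, "bad.txt")
write(good, "146.2\n150.1\n")          # made-up values
write(bad, "146.2\nnot-a-number\n")    # simulates an upstream format break

good_ok = snapshot_loads(toy_loader, good)
bad_ok  = snapshot_loads(toy_loader, bad)
println("good: ", good_ok, ", bad: ", bad_ok)
```

An automated release job could run such a gate and skip the release whenever it fails, so a format change upstream stops at the data package instead of reaching users.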

@ronisbr
Member

ronisbr commented May 14, 2024

> Sounds good to me!

Perfect! I will implement that.

> I think if you want bleeding edge, you should just pull the files like today, but if you want a predictable environment at a slower pace than daily updates, the SpaceIndicesData idea might be useful. A two-week release cadence would certainly be fine for lots of folks. Perhaps the releases could be automated by making sure there are new data files and they appear to load correctly. I mean, there's nothing today that checks that the files are validly interpreted; the most recent files are pulled down, and those are what get used. In fact, if there were a breaking change in the files today, that would break SatelliteToolbox, but if there were this SpaceIndicesData package that had to load the indices before committing/releasing them, then a breaking change in the upstream files wouldn't break a user's work. 🤷

I fully agree. I will investigate the best way to do this, if this can be automated, and if it will require big changes. I was thinking about using Artifacts, but I am not sure if they are suitable for data that changes daily.

EDIT: As I learned from JuliaSlack, Artifacts are not a good fit for this type of task. Every new version of the data has a new hash, so I would need to update this package every day.
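The hash problem can be seen with a small sketch using the SHA standard library. It is only an analogy: Julia artifacts are identified by a git tree hash of their contents rather than by a SHA-256 of a single file, and the two data rows below are made up. The underlying point is the same: any change in the data yields a different identifier.

```julia
using SHA

# Hex digest of some content; stands in for a content-addressed identifier.
digest(content) = bytes2hex(sha256(content))

# Made-up rows; the second snapshot is the first plus one more day of data.
monday  = "2024-05-13,146.2\n"
tuesday = monday * "2024-05-14,150.1\n"

same    = digest(monday) == digest(monday)    # identical content, identical id
changed = digest(monday) != digest(tuesday)   # one new row, new id
println("same: ", same, ", changed: ", changed)
```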

@ronisbr
Member

ronisbr commented May 15, 2024

@tuckermcclure Done!

Now you can do:

SpaceIndices.init(SpaceIndices.Celestrak; filepaths = ["./SW-All.csv"])

@ronisbr
Member

ronisbr commented May 15, 2024

Btw @tuckermcclure, can you please test the dev version before I tag a new version?

@tuckermcclure
Author

Wow, so fast! I barely had time to get out of my meetings before this was done. I'll give this a shot.

@tuckermcclure
Author

This seems to work fine for the files on my end.

I know some things have changed recently. Do I understand correctly that https://celestrak.org/SpaceData/SW-All.csv has replaced the need for the previous fluxtable.txt and various kp****.wdc files?

@ronisbr
Member

ronisbr commented May 15, 2024

> I know some things have changed recently. Do I understand correctly that https://celestrak.org/SpaceData/SW-All.csv has replaced the need for the previous fluxtable.txt and various kp****.wdc files?

Yes! This file has everything in a much easier format to handle.

@tuckermcclure
Author

Thank you! This is very helpful.

@ronisbr
Member

ronisbr commented May 16, 2024

Perfect! I will tag a new version.
