Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration with RAIL #9

Closed
jlvdb opened this issue May 16, 2024 · 5 comments · Fixed by #14
Closed

Integration with RAIL #9

jlvdb opened this issue May 16, 2024 · 5 comments · Fixed by #14
Labels
help wanted Extra attention is needed

Comments

@jlvdb
Copy link
Collaborator

jlvdb commented May 16, 2024

Since the project code is stable now I'm starting to prepare a release and integrating it correctly with the RAIL ecosystem. There are a few items that are not fully clear to me:

  1. Did I choose the correct path structure in src/? As what kind of algorithm should the code be classified and in which place exactly should the final YawSummarizer stage go?
  2. Where do the examples and notebooks belong? Do they stay in this repository or should they move some of the top level RAIL repos?
  3. It would be great if someone experienced with RAIL could have a look at the repository in general to ensure that the quality requirements are met.
  4. Are there any other steps required other than releasing the package on PyPI and registering it as a RAIL algorithm?
@jlvdb jlvdb added the help wanted Extra attention is needed label May 16, 2024
@aimalz
Copy link

aimalz commented May 16, 2024

Thanks for setting this up!

  1. The actual informer/summarizer pair should be in a single module in src/rail/estimation/algos. To match style, it would be best for the one module per algorithm containing all informer/estimator/classifier/summarizer stages that use it to be called yaw.py (or yaw_cc.py, yaw_clust_z.py, etc.) without having 'summarizer' in the module name, as we hope to eventually support both summarization and estimation with more methods that can actually be applied in both ways, and users can see what stages are available in a module pretty easily using python's standard help functionality. However, I don't think we've actually achieved standardization on whether anything besides the __init__.py can go in src/rail/[algo-name] (seems like usually no, but there are examples that do have other modules there), and we're a bit all over the place on what other supporting modules can go in src/rail/estimation/algos (and similarly for creation/engines, creation/degraders, and evaluation/metrics). Input from others would definitely be helpful here! For now, I'd recommend at least prepending an underscore if you don't expect users to directly call anything in the supporting modules and ensuring that there aren't already modules with identical names in the RAILiverse (to make it easier for folks to debug something like an import error that could arise in any of the same-named modules that might be hinted at in error messages without showing the full path).
  2. Demos should indeed go in the primary RAIL repo's examples/estimation directory, as that's what gets synced for rendering the notebooks on ReadTheDocs. However, while the package is in development, it's fine to leave them in a top-level `examples' directory.
  3. This should probably be reviewed by more than one person and perhaps not yet, so I'll leave that for later -- definitely worth mentioning it in a tag-up when you're ready!
  4. I think @hangqianjun and @ztq1996 can advise best (and maybe identify if there are any holes in the development principles doc to help address all these questions).

@jlvdb
Copy link
Collaborator Author

jlvdb commented May 16, 2024

I think one item to add is
5. add the algorithm to the citations page on the RAIL docs.

@jlvdb jlvdb linked a pull request May 17, 2024 that will close this issue
4 tasks
@jlvdb jlvdb closed this as completed in #14 May 17, 2024
@jlvdb jlvdb reopened this May 17, 2024
@jlvdb
Copy link
Collaborator Author

jlvdb commented May 17, 2024

The code does not fit into the currently existing summarizers (CatSummarizer, PZSummarizer, SZPZSummarizer) because, even if you wrap/group some of the stages, there are always different numbers or types of inputs as can be seen here:
rail_yaw_network
So subclassing anything is not an option at the moment.

If I understand you correctly @aimalz, you are suggesting that I put all five stages into src/rail/estimation/algos/cc_yaw.py and move the package with the remaining code to something like src/rail/yaw_rail/ (following the example of pz-rail-som)?

@jlvdb
Copy link
Collaborator Author

jlvdb commented May 17, 2024

I restructured the code accordingly on the associated branch. Feel free to check if that better matches the RAIL principles.

  • 1. Code structure
  • 2. Collecting all examples and moving them to RAIL
  • 3. Code review
  • 4. Registering package in rail/rail_packages.yml
  • 5. Listing package in rail/docs/source/citing.rst
  • 6. Publishing v1.0

@aimalz
Copy link

aimalz commented May 17, 2024

Gotcha, thanks for the diagram, that makes a lot of sense. (Also, I would love to make the figures for the v1 paper using whatever you used for this -- let's compare notes offline!)

Organizationally, I don't want to suggest significantly rearranging the code you've written when it's already grouped reasonably given the complexity of the algorithm. I can, however, think of a way to use import statements so users can find the YAW functionality in parallel places relative to other algos. My suggestion is to make two new modules:

  • src/rail/tools/yaw_handles.py would contain import statements for all the YAW-specific DataHandle subclasses.
  • src/rail/estimation/algos/yaw_xcorr.py (where "xcorr" could be replaced with "clust_z", "cc_z" or whatever shorthand folks prefer for the spatial correlation/clustering-based approach to non-photometric redshift estimation) would contain import statements for all the YawRailStage subclasses.

Relatedly, I would also propose the following changes to follow the current naming conventions:

  • YawCacheCreate --> YawCacheMaker (because Creator already means something unrelated in the RAILiverse)
  • YawCacheDrop --> YawCacheDropper
  • YawCrossCorrelate --> YawCrossCorrelator
  • YawAutoCorrelate --> YawAutoCorrelator
  • YawSummarize --> YawSummarizer

As for where to put the modules currently in src/rail/yaw_rail, I think it would be best to move them out of there because, while I don't know the details of how the namespace trick works, nearly all the other rail_* repositories have nothing but an __init__.py in their equivalent of src/rail/yaw directory (yes, another suggested renaming, sorry), so . (Exceptions include rail_sompz, rail_deepdisc, and rail_bpz, which each have a utils.py that should be renamed to disambiguate if not also moved elsewhere in the directory structure.) In any case, the closest analogue to YAW is rail_tpz, which pretty reasonably has directories under src/rail/estimation/algos, so I'd suggest following that model of moving src/rail/yaw_rail into src/rail/estimation/algos until we establish a standard (and this is something we should definitely add to the development guide).

How does that sound?

@jlvdb jlvdb mentioned this issue Jul 10, 2024
4 tasks
@jlvdb jlvdb closed this as completed Jan 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants