[FEATURE] Implement the Plasticc pipeline functionalities into snmachine [RFC] #111

Closed
Catarina-Alves opened this issue Apr 8, 2019 · 5 comments
Labels
enhancement Improvement to existing functionality or implementation, including adding a new functions/methods. feature To add a new feature, new standalone files. (High level) pre-v2.0.0 Issues that should be completed prior to public release of v2.0.0

Comments

@Catarina-Alves
Collaborator

There are several useful functions in pipeline.py (https://github.com/tallamjr/plasticc/blob/develop/pipeline.py).
I propose copying them to snmachine. I think most of them would fit well in analysis.py, as they are useful for analysing the data and saving the analysis we did.

@tallamjr
Collaborator

@Catarina-Alves could you comment on which functions in pipeline.py you find most useful? We can then write a plan to bring them over to the snmachine codebase, in analysis.py or other files.

@Catarina-Alves
Collaborator Author

The ones I find most useful (not all of them will be suitable for snmachine) are:

  • createFolderStructure
  • loadDataset (although I now use it less and less)
  • reduceDataset and augmentData (they need some modifications because we changed PlasticcData)
  • combineAdditionalFeatures (and associated functions)

@tallamjr tallamjr added enhancement Improvement to existing functionality or implementation, including adding a new functions/methods. feature To add a new feature, new standalone files. (High level) labels May 3, 2019
@tallamjr
Collaborator

tallamjr commented May 3, 2019

As requested above, the proposal for this feature is to bring several useful functions from pipeline.py into snmachine.

The functions that will be migrated are:
This is a mutable table

| Function | Where to live | Migrated |
| --- | --- | --- |
| def createFolderStructure(ANALYSIS_DIR, ANALYSIS_NAME): | | |
| def loadDataset(DATA_PATH): | | |
| def saveConfigurationFile(dirs): | | |
| def reduceDataset(dat, dirs, subset_size, SEED): | | |
| def mergeFeatures(some_features, other_features): | | |
| def combineAdditionalFeatures(wavelet_features, dat): | | |

@MichelleLochner, @rbiswas4, @Catarina-Alves: please reply with comments on which functions users would like, and on where each function should best live.

For example, should they all be housed in a new run_plasticc_pipeline.py file, akin to the run_pipeline.py that was made for SPCC data, or should they be incorporated into the snmachine module files, such as sndata for, say, reduceDataset?
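To make the discussion concrete, here is a minimal sketch of what a utility like createFolderStructure might look like if it lived in a module such as utils.py. Only the two arguments come from the table above; the snake_case name, the subfolder names, and the returned dictionary of paths are assumptions for illustration, not the actual implementation in pipeline.py.

```python
from pathlib import Path
import tempfile


def create_folder_structure(analysis_dir, analysis_name):
    """Create <analysis_dir>/<analysis_name>/ plus a few subfolders.

    Returns a dict mapping a short label to each created path.
    The subfolder names below are hypothetical placeholders.
    """
    root = Path(analysis_dir) / analysis_name
    subfolders = ("features", "classifications", "plots")  # assumed names

    dirs = {"analysis_home": root}
    for name in subfolders:
        dirs[name] = root / name

    # parents=True creates intermediate folders; exist_ok=True makes
    # re-running the same analysis name safe.
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Usage: build the structure inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    dirs = create_folder_structure(tmp, "example_run")
    print(sorted(dirs))
```

Returning the paths in one dictionary would let other functions that take a `dirs` argument share them, but that coupling is a guess based on the signatures listed above.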

@tallamjr tallamjr changed the title Implement the Plasticc pipeline functionalities into snmachine [FEATURE] Implement the Plasticc pipeline functionalities into snmachine [RFC] May 3, 2019
@tallamjr tallamjr added the pre-v2.0.0 Issues that should be completed prior to public release of v2.0.0 label May 8, 2019
@MichelleLochner
Contributor

I'd propose that these could all go into one of two modules: analysis.py and utils.py. I'd say all the functions in that table could go in utils.py. The only caveat is that I really didn't like mergeFeatures and combineAdditionalFeatures. In general, when working with numpy arrays or pandas dataframes, combining features should be trivial and left to the user rather than put in a very opaque function. The exception might be particular types of features that need some additional processing; for those you might want to write a separate function. But those two general functions (which, judging by their names, seem to do the same thing anyway) seem far too ambiguous to me. The two of you are the ones doing most of the processing, though, so let me know if you think that kind of behaviour is really useful and maybe we can rewrite them to be clearer.
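As an illustration of the point about pandas, here is a minimal sketch of combining two feature tables directly. The column names, object ids, and values are made up for the example; it assumes both tables are indexed by the same object identifier.

```python
import pandas as pd

# Hypothetical wavelet features, one row per object (values are made up).
wavelet_features = pd.DataFrame(
    {"wavelet_0": [0.12, -0.40, 0.93], "wavelet_1": [1.5, 0.2, -0.7]},
    index=pd.Index(["obj_1", "obj_2", "obj_3"], name="object_id"),
)

# Hypothetical additional features for the same objects, in a different order.
additional_features = pd.DataFrame(
    {"extra_feature": [0.31, 0.05, 0.78]},
    index=pd.Index(["obj_3", "obj_1", "obj_2"], name="object_id"),
)

# join aligns rows on the index, so row order in the inputs does not matter.
# how="inner" keeps only objects present in both tables.
combined = wavelet_features.join(additional_features, how="inner")
print(combined)
```

Because the alignment is done on the index, the user stays in control of which objects and columns end up in the final table.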

@tallamjr
Collaborator

With the majority of the above merged into dev in #157, I will close this issue in favour of more specific issues relating to adding functionality to the pipeline.
