[WIP] Feature/111/pipeline functions #157

Merged
merged 58 commits into from
Jun 4, 2019

Conversation

tallamjr
Collaborator

@tallamjr tallamjr commented Jun 3, 2019

Early PR to allow for comments as well as CI integration tests

This PR aims to fix:

This baseline commit brings in the file that has been used in the
exploratory repo of https://github.com/tallamjr/plasticc/pipeline.py
Although the code in this file is expected to change a lot, PEP8 linting
was carried out to encourage a consistent style.

Comments added to areas of code which need further discussion or will
indeed be adapted further
File mode changed to 644 from 755. This puts all files in the same
permissions bracket to allow for consistency across the files.
Renaming to be in line with code style conventions
Updating with docstrings and examples. Also including a helper function to
obtain the git revision hash to include in the analysis folder name
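
A minimal sketch of such a helper, assuming the hash is read with `git rev-parse --short HEAD`. The function names, the fallback value, and the folder naming scheme are hypothetical and are not taken from the snmachine source:

```python
import subprocess


def get_git_revision_short_hash(repo_dir=".", fallback="unknown"):
    """Return the short hash of HEAD, or `fallback` if git cannot provide one."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            check=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir does not exist, or it is not a repository
        return fallback
    return result.stdout.strip() or fallback


def analysis_folder_name(analysis_name, repo_dir="."):
    """Hypothetical naming scheme: append the short hash to the analysis name."""
    return "{}_{}".format(analysis_name, get_git_revision_short_hash(repo_dir))
```

Returning a fallback instead of raising keeps the pipeline usable when it is run from an installed copy rather than from a git checkout.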
Certain options would be better served as defaults in the script and the
user can change these as they wish in the source file
These files may still have merit for processing the data but as the
pipeline is being developed it is felt they are better served in a
separate folder
Removed the code that adds to sys.path, as it is no longer necessary now
that the pipeline script resides in the main snmachine repo
Several functions have been updated with docstrings and examples of how
to run them
Renaming of files to make it easier to follow how the modern workflow
takes place. Put old run_pipeline.py file in archive as this is no
longer used
gps.py defines variable names for the kernel parameters and number of
points for the GPs. This change updates the configuration file and
pipeline to be in line with that file
Reducing the number of PCA components from 200 to 10, as the number of
components must be less than or equal to the number of objects. For the
dataset used here, "training_set_snia.pickle", 10 is appropriate.

This should fix this error:

    Running PCA...
    The condition number in the SVD is 1.02688179587e+23 and the normalized
    one is 5.00036575467e+22
    Traceback (most recent call last):
      File "plasticc_pipeline.py", line 469, in <module>
        tol=None, pca_path=None, save_output=True,
    output_root=dirs.get("intermediate_files_directory"))
      File
    "/home/tallam/.conda/envs/snmachine/lib/python3.6/site-packages/snmachine/snfeatures.py",
    line 2005, in extract_pca
        normalize_variance=normalize_variance)
      File
    "/home/tallam/.conda/envs/snmachine/lib/python3.6/site-packages/snmachine/snfeatures.py",
    line 1873, in _pca
        return self.pca_SVD(dataMatrix, ncomp, tol, normalize_variance)
      File
    "/home/tallam/.conda/envs/snmachine/lib/python3.6/site-packages/snmachine/snfeatures.py",
    line 1714, in pca_SVD
        assert isinstance(tol, np.float)
    AssertionError
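
The constraint behind this change can be sketched as a small guard that caps the requested number of components at the number of objects. This is only an illustration: `clamp_n_components` is a hypothetical helper, not part of snmachine, and the PR itself changes the configured value by hand rather than adding such a check.

```python
def clamp_n_components(requested, n_objects):
    """Cap the requested number of PCA components at the number of objects."""
    if requested < 1 or n_objects < 1:
        raise ValueError("requested and n_objects must both be at least 1")
    return min(requested, n_objects)


# With 200 requested components and only 10 objects, 10 components are used.
n_components = clamp_n_components(200, 10)
```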
If one attempts to get the "method_directory" parameter from the
dictionary but it does not exist, None is returned
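
This is the standard behaviour of `dict.get`. A short illustration, where the dictionary contents are made up and `require_directory` is a hypothetical helper showing one way to fail early instead of passing None on:

```python
# Example configuration dictionary; the path is illustrative only.
dirs = {"intermediate_files_directory": "/tmp/intermediate"}

# dict.get returns None for a missing key instead of raising KeyError.
method_directory = dirs.get("method_directory")


def require_directory(dirs, key):
    """Return dirs[key], raising a descriptive error when the key is absent."""
    value = dirs.get(key)
    if value is None:
        raise KeyError("'{}' is not defined in the configuration".format(key))
    return value
```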
This function is used to determine the last modified time of the
configuration file that is being used and to place this in the name of
the analysis run.
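
One way such a function could look, using `os.path.getmtime`. The function name and the timestamp format are assumptions for illustration; the actual implementation in the pipeline may differ:

```python
import os
import time


def config_last_modified(path, fmt="%Y%m%d-%H%M%S"):
    """Return the last modified time of `path` as a formatted local-time string."""
    modified = os.path.getmtime(path)  # seconds since the epoch
    return time.strftime(fmt, time.localtime(modified))
```

The returned string contains no spaces or colons, so it can be placed in a folder name directly.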
This function now displays the confusion matrix as an ASCII table in the
console as well as returning a seaborn figure
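
The console half of this can be sketched with the standard library alone. Both functions below are hypothetical and cover only the ASCII table; the seaborn figure is not reproduced here:

```python
def confusion_counts(y_true, y_pred, labels):
    """Count predictions per (true, predicted) pair; rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix


def format_ascii_table(matrix, labels):
    """Render the matrix as a plain-text table with a header row."""
    cells = [["true \\ pred"] + [str(label) for label in labels]]
    for label, row in zip(labels, matrix):
        cells.append([str(label)] + [str(count) for count in row])
    width = max(len(cell) for row in cells for cell in row)
    lines = [" | ".join(cell.rjust(width) for cell in row) for row in cells]
    lines.insert(1, "-" * len(lines[0]))  # separator under the header
    return "\n".join(lines)


labels = ["Ia", "II"]
matrix = confusion_counts(["Ia", "Ia", "II", "II"], ["Ia", "II", "II", "II"], labels)
print(format_ascii_table(matrix, labels))
```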
Removal of Log Loss function call as well as stripping unused functions
within 'create_classifier'
Previously one would prepend the hash and timestamp to the folder, but
this became too verbose

Adding checks if analysis name already created

This checks whether an analysis with the given name already exists and,
if so, asks the user whether to overwrite the results in that folder or
create a new one.
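
A minimal sketch of this check. The function name, the prompt wording, and the numbered-suffix scheme for the new folder are all assumptions; the source only states that the user is asked and that a new folder can be created:

```python
import os


def resolve_analysis_directory(path, ask=input):
    """Return the directory to write results to, asking before reusing one."""
    if not os.path.isdir(path):
        os.makedirs(path)
        return path

    answer = ask("'{}' already exists. Overwrite its results? [y/N] ".format(path))
    if answer.strip().lower() in ("y", "yes"):
        return path

    # Hypothetical scheme: append the first unused numeric suffix.
    counter = 1
    while os.path.exists("{}_{}".format(path, counter)):
        counter += 1
    new_path = "{}_{}".format(path, counter)
    os.makedirs(new_path)
    return new_path
```

Passing the prompt function as `ask` keeps the check usable in non-interactive runs and in tests, for example `resolve_analysis_directory(path, ask=lambda _: "y")`.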
Sending stderr to /dev/null if folder overwritten
@tallamjr tallamjr self-assigned this Jun 3, 2019
tallamjr and others added 4 commits June 3, 2019 16:23
The recent HTTP 404 error discovered in the CI suggests that a recent
change to sncosmo might be the reason for failing to find salt2 models

Latest version = 1.8.0, which is where the error occurs, bumping down to
1.7.1 (previous release) to test outcome
With the inclusion of this feature set, although not fully complete,
a MINOR bump is felt necessary.
Collaborator

@Catarina-Alves Catarina-Alves left a comment

Answer my comments. I am ok with merging it but I would like for you to address my comments.

snmachine/snaugment.py (resolved)
snmachine/snclassifier.py (resolved)
snmachine/snfeatures.py (resolved)
utils/archive/conquer.py (resolved)
utils/plasticc_pipeline.py (resolved)
Collaborator

@Catarina-Alves Catarina-Alves left a comment

All comments addressed.

@tallamjr tallamjr merged commit 90377de into dev Jun 4, 2019
Collaborator Author

tallamjr commented Jun 4, 2019

This branch, feature/111/pipeline-functions, should remain alive for now.
