
Refactor interface #88

Merged: 13 commits, Nov 30, 2023


yashpatel6
Collaborator

Description

Refactors the interface from separate validate and generate-checksum commands into a single pipeval command with validate and generate-checksum subcommands.
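For readers unfamiliar with the subcommand pattern, here is a minimal sketch of how a single pipeval entry point with two subcommands could be laid out using argparse subparsers. This is a hypothetical illustration, not the code merged in this PR: the function name build_parser, the dest names reference and checksum_type, the md5 default, and the help strings are all assumptions. Only the subcommand names and the -r and -t flags come from the test script below.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: one top-level `pipeval` command with two subcommands.
    parser = argparse.ArgumentParser(
        prog='pipeval',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # `dest='command'` records which subcommand was chosen; required=True needs Python 3.7+.
    subparsers = parser.add_subparsers(dest='command', required=True)

    # pipeval validate PATH [PATH ...] [-r REFERENCE]
    validate = subparsers.add_parser(
        'validate',
        help='Validate one or more files',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    validate.add_argument('path', nargs='+', type=str,
                          help='One or more paths of files to validate')
    # `-r` is used in the test script for CRAM inputs; the dest name is an assumption.
    validate.add_argument('-r', dest='reference', default=None,
                          help='Reference FASTA used when validating CRAM files')

    # pipeval generate-checksum -t {md5,sha512} PATH [PATH ...]
    checksum = subparsers.add_parser(
        'generate-checksum',
        help='Generate checksum files',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    checksum.add_argument('path', nargs='+', type=str,
                          help='One or more paths of files to checksum')
    # Choices mirror the test script; the default value is an assumption.
    checksum.add_argument('-t', dest='checksum_type', choices=['md5', 'sha512'],
                          default='md5', help='Checksum algorithm')
    return parser
```

With this layout, build_parser().parse_args(['validate', 'a.bam', 'b.cram', '-r', 'ref.fasta']) yields command='validate', path=['a.bam', 'b.cram'] and reference='ref.fasta'; each subcommand keeps its own flags, so -r never leaks into generate-checksum.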


Test Results

#!/bin/bash
echo "empty BAM"
pipeval validate test_files/empty_bam.bam

printf "\n"

echo "invalid BAM"
pipeval validate test_files/invalid.bam

printf "\n"

echo "pass BAM"
pipeval validate test_files/pass.bam

printf "\n"

echo "BAM with no index"
pipeval validate test_files/noindex.bam

printf "\n"

echo "Just text file"
pipeval validate test_files/hello.txt

printf "\n"

echo "Failing checksum MD5"
pipeval validate test_files/hello_bad_md5.txt

echo "Failing checksum SHA512"
pipeval validate test_files/hello_bad_sha512.txt

printf "\n"

echo "Generate md5 checksum"
pipeval generate-checksum -t md5 test_files/togen.txt

echo "Generate sha512 checksum"
pipeval generate-checksum -t sha512 test_files/togen.txt

echo "Validate generated checksums"
pipeval validate test_files/togen.txt

rm test_files/togen.txt.md5
rm test_files/togen.txt.sha512

printf "\n"

echo "Valid VCF"
pipeval validate test_files/test_vcf.vcf.gz

printf "\n"

echo "Valid CRAM"
pipeval validate test_files/valid.cram -r /hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta

printf "\n"

echo "CRAM with no index"
pipeval validate test_files/noindex.cram -r /hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta

printf "\n"

echo "CRAM with default reference"
pipeval validate test_files/default_ref.cram

printf "\n"

echo "Invalid CRAM"
pipeval validate test_files/invalid.cram -r /hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta

printf "\n"

echo "Valid SAM"
pipeval validate test_files/valid.sam
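The checksum steps in the script suggest that generate-checksum writes a sidecar file next to the input (the script removes togen.txt.md5 and togen.txt.sha512 afterwards) and that validate picks those sidecars up automatically. A minimal sketch of that round trip with Python's hashlib follows. It is an assumption-laden illustration, not pipeval's implementation: the function names are hypothetical, and the md5sum-style sidecar content ("<digest>  <filename>") is assumed.

```python
import hashlib
from pathlib import Path
from typing import Dict


def compute_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large BAM/CRAM files never sit fully in memory."""
    hasher = hashlib.new(algorithm)  # e.g. 'md5' or 'sha512'
    with path.open('rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def generate_checksum(path: Path, algorithm: str) -> Path:
    """Write a sidecar such as togen.txt.md5 (assumed md5sum-style '<digest>  <name>' line)."""
    sidecar = path.with_name(f'{path.name}.{algorithm}')
    sidecar.write_text(f'{compute_digest(path, algorithm)}  {path.name}\n')
    return sidecar


def validate_checksums(path: Path) -> Dict[str, bool]:
    """Check every sidecar found next to `path`; returns {algorithm: passed}."""
    results = {}
    for algorithm in ('md5', 'sha512'):
        sidecar = path.with_name(f'{path.name}.{algorithm}')
        if sidecar.exists():
            fields = sidecar.read_text().split()
            # An empty sidecar counts as a failure rather than raising IndexError.
            results[algorithm] = bool(fields) and fields[0].lower() == compute_digest(path, algorithm)
    return results
```

Reading in fixed-size chunks is the usual choice for genomics inputs, where a single file can be tens of gigabytes.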

Checklist

File Commits

  • This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
    Disclosing PHI is a major problem [1]; even a small leak can be costly [2].

  • This PR does NOT contain germline genetic data [3], RNA-Seq, DNA methylation, microbiome or other molecular data [4].

  • This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.

  To automatically exclude such files using a .gitignore file, see here for example.

Code Review Best Practices

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have set up or verified the main branch protection rule following the GitHub standards before opening this pull request.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

Testing

  • I have added unit tests for the new feature(s).

  • I modified the integration test(s) to include the new feature.

  • All new and previously existing tests passed locally and/or on the cluster.

  • The docker image built successfully on the cluster.

Footnotes

  1. UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records

  2. The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.

  3. Genetic information is considered PHI. Forensic assays can identify patients with as few as 21 SNPs.

  4. RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.

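Review thread on the argument parser (diff excerpt; the start of the parser definition is not shown):

    formatter_class=argparse.ArgumentDefaultsHelpFormatter
)

parser.add_argument('path', help='One or more paths of files to validate', type=str, nargs='+')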
Member

It's not a critical point, but for input paths I generally use something like this extant_file type rather than str.

import argparse
import os


def extant_file(x):
    """
    'Type' for argparse: checks that the path exists (file or directory) but does not open it.
    """
    if not os.path.exists(x):
        # Argparse uses the ArgumentTypeError to give a rejection message like:
        # error: argument input: x does not exist
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x
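To show where the suggestion plugs in, the sketch below repeats the reviewer's helper and passes it as type= on the path argument from the snippet under review. The build_validate_parser function and its prog name are hypothetical; only the add_argument call and extant_file come from the thread. argparse applies the type callable to each of the nargs='+' values in turn, so the first missing path aborts parsing with exit status 2.

```python
import argparse
import os


def extant_file(x):
    """argparse 'type' that rejects paths that do not exist (the reviewer's helper)."""
    if not os.path.exists(x):
        raise argparse.ArgumentTypeError("{0} does not exist".format(x))
    return x


def build_validate_parser():
    # Hypothetical parser: identical to the reviewed line except type=extant_file.
    parser = argparse.ArgumentParser(prog='pipeval validate')
    parser.add_argument('path', help='One or more paths of files to validate',
                        type=extant_file, nargs='+')
    return parser
```

For example, build_validate_parser().parse_args(['missing.bam']) prints a usage line followed by "pipeval validate: error: argument path: missing.bam does not exist" to stderr and exits with status 2.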

Collaborator Author

That's a good point. Since the tool's purpose is to validate files, the existence check has so far lived in the validation step itself rather than in argparse. I'll open an issue to move the existence check into argument parsing, though!
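One practical difference between the two placements: an argparse type= check stops at the first missing path, while a check inside validation can report on every input in a single run. The sketch below illustrates the second approach under stated assumptions. validate_paths and its (path, passed, message) result tuples are hypothetical, the real format-specific checks are only a placeholder comment, and os.path.isfile is used so that directories are rejected as well.

```python
import os
from typing import Iterable, List, Tuple


def validate_paths(paths: Iterable[str]) -> List[Tuple[str, bool, str]]:
    """Check existence per file inside validation so one missing path does not stop the rest.

    Hypothetical sketch: pipeval's real BAM/CRAM/VCF/checksum checks would
    run where the placeholder comment is.
    """
    results = []
    for path in paths:
        if not os.path.isfile(path):
            results.append((path, False, 'file does not exist'))
            continue  # keep going so every input gets a result
        # ... format-specific validation would go here ...
        results.append((path, True, 'exists'))
    return results
```

Moving the check into argparse gives earlier, uniform error messages; keeping it here lets a batch run list every bad file at once. Either choice is reasonable depending on how pipeval is invoked.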

yashpatel6 merged commit ee3dbe1 into main on Nov 30, 2023 (1 check passed).
yashpatel6 deleted the yashpatel-refactor-cli branch on November 30, 2023 at 18:11.