Issue about TAG value Liftover in BAM file. #1

Open
yangyxt opened this issue Nov 17, 2020 · 4 comments

yangyxt commented Nov 17, 2020

First of all, thanks for making this tool; it opens up a new possibility for lifting over BAM files.

I tried to use CrossMap to lift over BAM files, but it does not seem able to update the values of some important tags such as MD, NM, MC, XA, and SA.

Can AirLift also lift over these tag values in BAM files?

canfirtina (Member) commented

AirLift mainly performs the following operations:

  1. AirLift extracts the reads in the existing alignment file that fall into updated or retired regions of the new reference genome (see the AirLift preprint for detailed definitions of these regions). These reads are aligned from scratch (not lifted) to the new reference genome, so all fields in the BAM/SAM file are updated.
  2. AirLift updates the positions of the alignments that fall into constant regions so that they match the new reference genome (i.e., lifting). AirLift updates nothing but the positions. The assumption behind this behavior is that the edit distance should stay unchanged, since constant regions are defined by blocks that match exactly between the two references.
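
To make the second operation concrete, here is a minimal sketch of position lifting through exact-match blocks. It is an illustration under stated assumptions, not AirLift's actual code: the `Block` type, the `lift_position` function, and the example coordinates are all hypothetical, and the sketch assumes a single chromosome, a single strand, and blocks sorted by their old start.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical constant-region block: an exact match between references."""
    old_start: int  # 0-based start in the old reference
    new_start: int  # 0-based start in the new reference
    length: int     # block length in bases


def lift_position(blocks: List[Block], pos: int, ref_span: int) -> Optional[int]:
    """Lift an alignment start from the old to the new reference.

    Returns the new start if the alignment lies entirely inside one block,
    otherwise None (such a read would have to be realigned instead).
    Assumes `blocks` is sorted by old_start and non-overlapping.
    """
    starts = [b.old_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos + ref_span > block.old_start + block.length:
        return None  # alignment runs past the end of the block
    # Inside an exact-match block only the offset changes, so CIGAR and
    # edit distance are expected to carry over unchanged.
    return block.new_start + (pos - block.old_start)


# Made-up example blocks, for illustration only.
example_blocks = [Block(0, 0, 1000), Block(1500, 1200, 800)]
print(lift_position(example_blocks, 1600, 100))  # 1300
print(lift_position(example_blocks, 950, 100))   # None
```

Position-dependent tags that refer to other alignments (for example MC, SA, or XA) are not covered by this sketch, which is one reason lifting only the start position can leave a record inconsistent.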

yangyxt commented Dec 1, 2020 via email

yangyxt commented Dec 2, 2020

Dear Firtina,

Because some of the BAM tag values are not lifted for reads in constant regions, I found that these tag values violate the format rules that GATK enforces.

Could you please make sure that the BAM returned by AirLift passes validation by ValidateSamFile (Picard) (https://gatk.broadinstitute.org/hc/en-us/articles/360036854731-ValidateSamFile-Picard-)? Then we can use the returned BAM files for downstream analysis.

Much appreciated! Thanks!
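
For reference, the validation requested above can be scripted roughly as follows. This is a sketch only: the helper names `build_validate_cmd` and `validate_bam`, the jar location `picard.jar`, and the file names are assumptions, and it presumes Java and Picard are installed. It uses Picard's `I=`, `R=`, and `MODE=SUMMARY` arguments for ValidateSamFile.

```python
import subprocess
from typing import List, Optional


def build_validate_cmd(bam: str,
                       picard_jar: str = "picard.jar",
                       reference: Optional[str] = None) -> List[str]:
    """Build a Picard ValidateSamFile command line in summary mode.

    `picard_jar` is an assumed path; adjust it to the local installation.
    """
    cmd = ["java", "-jar", picard_jar, "ValidateSamFile",
           f"I={bam}", "MODE=SUMMARY"]
    if reference is not None:
        # A reference lets Picard check tags that depend on the sequence.
        cmd.append(f"R={reference}")
    return cmd


def validate_bam(bam: str,
                 picard_jar: str = "picard.jar",
                 reference: Optional[str] = None) -> bool:
    """Run the validation and treat a non-zero exit status as a failure."""
    result = subprocess.run(build_validate_cmd(bam, picard_jar, reference),
                            capture_output=True, text=True)
    print(result.stdout)
    return result.returncode == 0
```

Calling `validate_bam("lifted.bam", reference="new_ref.fa")` (both file names are placeholders) prints Picard's summary of error and warning types and returns whether the run exited cleanly.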

canfirtina (Member) commented

Hi @yangyxt,

We made several changes to our implementation. One of them replaces liftOver with CrossMap, so that we update all the relevant fields in the BAM file, not just the start positions.

Please note that we have modified the original implementation of CrossMap and made our version available at https://github.com/canfirtina/CrossMap. You can run bash install.sh in the AirLift/dependencies/ directory to install the modified CrossMap for use with AirLift, along with the other dependencies. Please also note that you need to provide $PWD/bin as the BINDIR parameter when calling run_pipeline.sh (assuming $PWD points to /path/to/AirLift/dependencies/). We hope you can now run AirLift successfully.
