Reproducing results from the UMI paper #4

Open
Redmar-van-den-Berg opened this issue Jun 13, 2022 · 8 comments

@Redmar-van-den-Berg

I want to replicate the findings from the UMI-tools publication, but I cannot find the 'reaper' tool that is mentioned in the requirements.

@IanSudbery
Member

Hi. You can get reaper here: https://www.ebi.ac.uk/research/enright/software/kraken

I'll add it to the requirements documentation.

@Redmar-van-den-Berg
Author

Thanks, I was able to install reaper from the link you posted.

I ran into another issue, though. CGAT is also listed as a requirement, but the repository appears to be archived, and there is no release 0.2.4. When I try to run the pipelines I get:

Traceback (most recent call last):
  File "../UMI-tools_pipelines/pipeline_scRNASeq.py", line 83, in <module>
    import CGAT.Experiment as E
ImportError: No module named CGAT.Experiment

Do you have any suggestions on how to install the correct version of the CGAT module?
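To check which generation of the CGAT code an environment actually provides, it can help to probe the top-level package names before launching a pipeline. This is a minimal sketch, not part of UMI-tools_pipelines: the helper name detect_cgat_generation is hypothetical, and it assumes the 2015-era code installs as the CGAT package and the modern code as cgatcore.

```python
import importlib.util


def detect_cgat_generation():
    """Report which generation of the CGAT code is importable (hypothetical helper).

    Only top-level package names are probed: find_spec() on a dotted name
    such as "CGAT.Experiment" imports the parent package first and raises
    ModuleNotFoundError when that parent is missing.
    """
    found = {
        name: importlib.util.find_spec(name) is not None
        for name in ("CGAT", "cgatcore")
    }
    if found["cgatcore"]:
        return "modern"   # cgatcore is installed (assumed modern layout)
    if found["CGAT"]:
        return "legacy"   # the old CGAT package is installed
    return None           # neither is importable in this environment


if __name__ == "__main__":
    print(detect_cgat_generation())
```

A result of None here would explain the ImportError above: neither code base is on the interpreter's path.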

@IanSudbery
Member

The best course of action here probably depends on what your aim is. If you are undertaking a reproducibility exercise, then the code for this particular version is here:

https://github.com/CGATOxford/cgat/tree/v0.2.4

You are also going to need the code from https://github.com/CGATOxford/CGATPipelines (the commit hash is in the requirements).

However, I don't know how well the installation procedures will work.

So, if you are just interested in reproducing the results, rather than testing how reproducible the code is, then I recommend you install the modern versions. You will need cgat-apps, which is installable from bioconda, and cgat-flow, which you can get from here:
https://github.com/cgat-developers/cgat-flow

cgat-flow has an installation script which will build a conda environment to run it in. However, I tend to do it manually by cloning the repo and typing:

mamba env create -f conda/environments/cgat-flow-pipelines.yml -p /PATH/TO/LOCATION/FOR/ENV

then activating the environment:

source activate /PATH/TO/LOCATION/FOR/ENV

and finally running python setup.py develop.

If you take this path, the pipelines will need a little light refactoring, but I can probably help with that once exam season is over.
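The manual installation steps above can be scripted so they are repeatable. This is a hypothetical sketch, not cgat-flow's own installer: the helper names are invented, the environment file path is the one quoted above, and it swaps in conda run -p for source activate, because activating an environment inside a subprocess does not carry over to later commands. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

# Path quoted in the comment above; assumed relative to a cgat-flow clone.
ENV_FILE = "conda/environments/cgat-flow-pipelines.yml"


def cgat_flow_install_commands(env_prefix, env_file=ENV_FILE):
    """Return the manual install steps as argument lists (hypothetical helper)."""
    return [
        # Build the environment at an explicit prefix.
        ["mamba", "env", "create", "-f", env_file, "-p", env_prefix],
        # Run setup.py inside that environment without activating it.
        ["conda", "run", "-p", env_prefix, "python", "setup.py", "develop"],
    ]


def install(env_prefix, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd in cgat_flow_install_commands(env_prefix):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    install("/PATH/TO/LOCATION/FOR/ENV")
```

Run it from the root of the cloned cgat-flow repository, and pass dry_run=False only once the printed commands look right.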

@Redmar-van-den-Berg
Author

The background for my request is that I have developed a reference-free UMI deduplication tool, and I would like to compare its results against UMI-tools. I've been trying to install the original versions in a conda environment, but it appears they are too old to install easily.

Would you recommend I use the latest version of cgat-apps and cgat-flow, and then run the UMI-tools_pipelines in that environment?

Redmar-van-den-Berg changed the title from "Add link to reaper tool to readme" to "Reproducing results from the UMI paper" on Jun 15, 2022
@IanSudbery
Member

Yes, that is probably the best way. Some things have changed in the API since 2015. CGAT was split into cgat and cgatcore. Several of the modules now have lowercase names, e.g. experiment rather than Experiment and pipeline rather than Pipeline, and most CamelCase names are now underscore_names (e.g. openFile is now open_file), but it's not too difficult a transition, and installation is loads easier.
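The renames described above are mechanical enough that a small script can do a first pass of the refactor. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the MODULE_RENAMES table covers only the two modules mentioned in this thread (it assumes the old pipelines import CGATPipelines.Pipeline as P). Check each target module against the installed cgatcore before relying on it, and apply camel_to_snake to function names only, not class names.

```python
import re

# Hypothetical mapping based only on the renames described in this thread;
# verify each target against the installed cgatcore before using it.
MODULE_RENAMES = {
    "CGAT.Experiment": "cgatcore.experiment",
    "CGATPipelines.Pipeline": "cgatcore.pipeline",
}

# An uppercase letter preceded by a lowercase letter or digit starts a new word.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])([A-Z])")

# Matches "import pkg.Module" with an optional "as alias".
_IMPORT_LINE = re.compile(r"(\s*)import\s+([\w.]+)(\s+as\s+\w+)?\s*$")


def camel_to_snake(name):
    """Convert a camelCase function name to underscore_name."""
    return _CAMEL_BOUNDARY.sub(r"_\1", name).lower()


def rewrite_import(line):
    """Rewrite a legacy import line using MODULE_RENAMES; leave others as-is."""
    match = _IMPORT_LINE.match(line)
    if match and match.group(2) in MODULE_RENAMES:
        indent, module, alias = match.groups()
        return f"{indent}import {MODULE_RENAMES[module]}{alias or ''}"
    return line


if __name__ == "__main__":
    print(rewrite_import("import CGAT.Experiment as E"))
    print(camel_to_snake("openFile"))
```

Anything the table does not cover is passed through unchanged, so the remaining imports still need a manual review.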

@Redmar-van-den-Berg
Author

@IanSudbery I've been refactoring the pipeline to work with the latest versions of cgat, but I'm stuck on the report section of the pipeline (P.run_report and P.publish_report, see here), which do not appear to exist. Has this functionality been removed from the latest version? If so, do you have any guidance on what these tasks should do, and how to replicate them in the latest version?
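While the report steps are unresolved, one option is a small compatibility shim so that the rest of the refactored pipeline can still run. This is a hypothetical sketch: report_task_or_stub is not part of cgatcore, and the stub does not reproduce whatever P.run_report and P.publish_report did in the old code. It only returns the real function when the module provides it, and otherwise warns and skips the step.

```python
import types
import warnings


def report_task_or_stub(pipeline_module, name):
    """Return pipeline_module.<name> if present, else a stub that skips it."""
    task = getattr(pipeline_module, name, None)
    if task is not None:
        return task

    def stub(*args, **kwargs):
        # Placeholder only: report generation is skipped, not reimplemented.
        warnings.warn(
            f"{name} is not available in this pipeline module; skipping report step",
            RuntimeWarning,
        )
        return None

    return stub


if __name__ == "__main__":
    # An empty namespace stands in for a pipeline module that lacks run_report.
    P = types.SimpleNamespace()
    run_report = report_task_or_stub(P, "run_report")
    run_report()  # emits a RuntimeWarning and returns None
```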

@IanSudbery
Member

IanSudbery commented Jun 20, 2022 via email

@Redmar-van-den-Berg
Author

@IanSudbery
I've made a start with refactoring (see here), but I'm having a hard time installing all required tools in a single conda environment, especially since some of the old cgat code the pipelines rely on is still written for Python 2. Do you have time to take a look at creating a conda environment that can run these pipelines?
