BUSCO datasets #29

Open
kokyriakidis opened this issue Apr 26, 2019 · 12 comments

Comments

@kokyriakidis

kokyriakidis commented Apr 26, 2019

Hello, do I have to choose which datasets to include, or can I use them all? I am running an analysis on Chelonia mydas.

The lineage is

Lineage( full )
cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Sauropsida; Sauria; Archelosauria; Testudines; Cryptodira; Durocryptodira; Americhelydia; Chelonioidea; Cheloniidae; Caretta

Should I use the Tetrapoda dataset? Tetrapoda AND Eukaryota? Or should I use more?

@AdamStuckert
Contributor

Tetrapoda will probably be the most informative for your purposes.

@macmanes
Contributor

But to keep it simple, for running the assembly itself, just stick with the default Euk database. Once you have an assembly, I agree that Tetrapoda will be good!
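
For anyone making the same choice for another species, here is a minimal Python sketch of the reasoning above: walk the lineage from the most specific rank back toward the root and take the first clade that has a BUSCO dataset. The helper name `most_specific_dataset` and the short `AVAILABLE_CLADES` list are assumptions made for illustration. The list is only a small subset of clade names, so check it against the datasets you actually have installed.

```python
# Hypothetical helper, not part of BUSCO or the pipeline discussed here.
# AVAILABLE_CLADES is an assumed, incomplete list of clades that have a dataset.
AVAILABLE_CLADES = ["Eukaryota", "Metazoa", "Vertebrata", "Tetrapoda"]


def most_specific_dataset(lineage, available=AVAILABLE_CLADES):
    """Return the deepest rank in `lineage` that has a dataset, or None."""
    ranks = [name.strip() for name in lineage.split(";")]
    available_lower = {clade.lower() for clade in available}
    # Walk from the most specific rank back toward the root.
    for rank in reversed(ranks):
        if rank.lower() in available_lower:
            return rank
    return None


# Full lineage as posted at the top of this issue.
lineage = (
    "cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; "
    "Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; "
    "Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; "
    "Amniota; Sauropsida; Sauria; Archelosauria; Testudines; Cryptodira; "
    "Durocryptodira; Americhelydia; Chelonioidea; Cheloniidae; Caretta"
)

print(most_specific_dataset(lineage))  # prints "Tetrapoda"
```

With the assumed list, the posted lineage resolves to Tetrapoda, which matches the suggestion above for evaluating the finished assembly.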

@kokyriakidis
Author

kokyriakidis commented Apr 29, 2019

I am using the latest Docker image. Running the first command runs the whole pipeline at once, as it says. Should I also run the annotation and evaluation commands, or are these already run by the first command?

@macmanes
Contributor

Running the first command runs the entire pipeline, including TransRate and BUSCO (with the Euk database). After that, you can annotate or do whatever else you want. Does this make sense?

@kokyriakidis
Author

Yes, and thank you both very much for this work!

@kokyriakidis
Author

@macmanes Another question! Can I use several samples together? Or do I have to concatenate their _1 and _2 fastq files?

@macmanes
Contributor

macmanes commented Apr 29, 2019 via email

@kokyriakidis
Author

@macmanes Could you please explain why that is? Wouldn't biological replicates help assemble lowly expressed regions?

@kokyriakidis
Author

kokyriakidis commented Apr 30, 2019

@macmanes I have 6 RNA-seq libraries (~35M reads each): 3 are normal and 3 are not normal. Should I run the pipeline 3 times for the normal libraries and fuse the results with orthofuser, do the same for the other 3, and then fuse the 2 merged assemblies? I have read that going above 40M reads gives little to no improvement. Would using 2 samples, 1 normal and 1 not normal, help recover more transcripts?

@macmanes
Contributor

macmanes commented Apr 30, 2019 via email

@kokyriakidis
Author

kokyriakidis commented Apr 30, 2019

@macmanes Thank you for your reply!
These 6 samples are from 3 pairs of siblings. Do you think I should choose 1 normal sample and its not-normal sibling, or choose one from another family?

@macmanes
Contributor

macmanes commented May 1, 2019 via email
