
Select reduced transcriptome from clusters #18

Open
mpaya opened this issue Oct 12, 2017 · 2 comments

@mpaya

mpaya commented Oct 12, 2017

Hi,
I am comparing clustering results from CD-HIT and RapClust. One characteristic of CD-HIT is that it selects one representative transcript per cluster, while RapClust does not. Would it be reasonable to also select the longest transcript from each RapClust cluster to generate assemblies with reduced redundancy?
Thank you

@rob-p
Contributor

rob-p commented Oct 13, 2017

Hi @mpaya,

The clustering methodology of CD-HIT is considerably different from that of RapClust. Specifically, in CD-HIT, selecting a single cluster member as a representative is often reasonable because the clusters are formed from sequences that are generally very similar. RapClust, however, aims to cluster together multiple transcript isoforms of the same gene, which can vary considerably in their length and sequence composition (e.g. through the inclusion or exclusion of alternatively-spliced exons). Hence, selecting a single representative sequence from the cluster isn't as straightforward. While selecting the longest transcript is likely to choose the one that contains much of the sequence in the cluster, that transcript is not necessarily pairwise-similar to all cluster members.
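As a toy illustration of that last point, here is a minimal sketch with hypothetical exon sequences. It uses k-mer containment (the fraction of a member's k-mers that also occur in the representative) as a stand-in for pairwise similarity; that measure, `k=5`, and the exon strings are all assumptions for illustration, not how RapClust scores transcripts. The longest isoform covers an exon-skipping isoform fairly well, but covers an isoform with an alternative exon (`exon_d`) poorly.

```python
from __future__ import annotations

# Hypothetical exons of one gene (toy data, not real sequences).
exon_a = "ATGGCGTACCTGAAG"
exon_b = "GTTCCAGATCGGACT"
exon_c = "CCATTGACGAGTTAC"
exon_d = "TGGAAACCCTTTGGG"  # alternative exon absent from the longest isoform

cluster = {
    "iso1": exon_a + exon_b + exon_c,  # longest isoform
    "iso2": exon_a + exon_c,           # skips exon_b
    "iso3": exon_a + exon_d,           # uses the alternative exon_d
}


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(member: str, rep: str, k: int = 5) -> float:
    """Fraction of the member's k-mers that also occur in the representative."""
    member_kmers = kmers(member, k)
    if not member_kmers:
        return 0.0
    return len(member_kmers & kmers(rep, k)) / len(member_kmers)


def longest(clust: dict[str, str]) -> str:
    """Name of the longest sequence (max() keeps the first one on ties)."""
    return max(clust, key=lambda name: len(clust[name]))


if __name__ == "__main__":
    rep = longest(cluster)
    for name, seq in cluster.items():
        print(f"{name}: containment in {rep} = {containment(seq, cluster[rep]):.2f}")
```

Junction k-mers (those spanning a splice site that the representative lacks) lower containment even for the exon-skipping isoform, and an alternative exon lowers it much further.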

More generally, how you select a representative might depend on which type of analysis you hope to do. One approach to representative generation that is compatible with RapClust is the Lace method from the Oshlack group. It's probably worth looking over that paper, if you're not already familiar with it, to see whether it will suit your needs.

@mpaya
Author

mpaya commented Oct 13, 2017

Hi @rob-p,

For this current project, the analysis I was planning was just a comparison of the CD-HIT and RapClust results. On the reduced assemblies, after selecting a single representative per cluster, the purpose is to compare the Transrate, TransDecoder and BUSCO results. So our concern was whether this naive representative selection on RapClust clusters, used to generate an artificial reduced assembly, may or may not be acceptable. I wasn't familiar with Lace; would you recommend using its output instead for this purpose?
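For reference, the naive selection I have in mind could be sketched roughly as below. This is only a sketch under assumptions: the cluster file is taken to be tab-separated with one cluster ID and one transcript ID per line (the column order is a parameter because it should be checked against the actual RapClust output), FASTA IDs are the first word of each header, and transcripts that never appear in the cluster file are kept as their own representatives. All function names are hypothetical.

```python
from __future__ import annotations

import io
from typing import TextIO


def read_fasta(handle: TextIO) -> dict[str, str]:
    """Parse FASTA into {id: sequence}; the id is the first word of the header."""
    seqs: dict[str, list[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def read_clusters(handle: TextIO, cluster_col: int = 0,
                  transcript_col: int = 1) -> dict[str, list[str]]:
    """Group transcript IDs by cluster from a two-column, tab-separated file.

    ASSUMPTION: the column order is configurable; verify it for your file.
    """
    clusters: dict[str, list[str]] = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        clusters.setdefault(fields[cluster_col], []).append(fields[transcript_col])
    return clusters


def longest_per_cluster(clusters: dict[str, list[str]],
                        seqs: dict[str, str]) -> dict[str, str]:
    """Map each cluster to its longest member present in seqs (first wins ties)."""
    reps: dict[str, str] = {}
    for cid, members in clusters.items():
        present = [m for m in members if m in seqs]
        if present:
            reps[cid] = max(present, key=lambda m: len(seqs[m]))
    return reps


def reduced_assembly(clusters: dict[str, list[str]],
                     seqs: dict[str, str]) -> list[str]:
    """IDs kept: one longest member per cluster plus never-clustered transcripts."""
    reps = longest_per_cluster(clusters, seqs)
    clustered = {m for members in clusters.values() for m in members}
    singletons = [n for n in seqs if n not in clustered]
    return list(reps.values()) + singletons


def write_fasta(names: list[str], seqs: dict[str, str], out: TextIO,
                width: int = 60) -> None:
    """Write the selected sequences as FASTA, wrapped at `width` columns."""
    for n in names:
        out.write(f">{n}\n")
        s = seqs[n]
        for i in range(0, len(s), width):
            out.write(s[i:i + width] + "\n")


if __name__ == "__main__":
    # Toy inputs standing in for the assembly FASTA and the cluster file.
    fasta = io.StringIO(">t1\nACGTACGT\n>t2\nACGTACGTAA\n>t3\nGGGG\n>t4\nTTTT\n")
    clust = io.StringIO("c1\tt1\nc1\tt2\nc2\tt3\n")
    seqs = read_fasta(fasta)
    out = io.StringIO()
    write_fasta(reduced_assembly(read_clusters(clust), seqs), seqs, out)
    print(out.getvalue(), end="")
```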

Thank you for your kind help
