Error - zero dimension #2

Open
asmagen opened this issue Apr 18, 2017 · 19 comments

Comments

@asmagen

asmagen commented Apr 18, 2017

Hello,
I get the following error after following the manual for a single-cell dataset I'm working with.

data_obj = featureConstruct(normalized,method = "SelfProjection")
Error in cor(fpkm_for_clust0, method = "pearson") :
'x' has a zero dimension

Why does it happen and how can I solve this?
Thanks, A

@asmagen
Author

asmagen commented Apr 18, 2017

Also I get this error:
Error in cor(fpkm_temp, method = "pearson") :
Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'.
I didn't have any NA values in my dataset. Any idea what might cause that?
Thank you.

@GIS-SP-Group
Owner

Dear Asmagen,

Have you followed the gene name requirement as stated in the manual?

#########################################################################
Input data:
A data frame of expression values (FPKM, TPM, UMI counts ...), with rows representing genes and columns representing cells. Note that the current version of RCA only accepts gene names in the following format: "GenomeLocation_HGNCGeneName_EnsembleID", from which the "HGNCGeneName" is extracted for RCA analysis. For input data with only HGNC names, users need to attach two strings to the HGNC names to put them into the "XXXX_HGNCGeneNames_YYYY" format.
#########################################################################
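
For a matrix whose row names are plain HGNC symbols, the renaming described in the excerpt above could be sketched in base R as follows. The placeholder strings `XXXX` and `YYYY` come from the excerpt; the toy matrix `expr` and the helper `wrap_gene_names` are hypothetical names used only for illustration. The last step only shows that the middle field can be recovered by splitting on `_`; how RCA itself parses the name is not shown in this thread.

```r
# Toy expression matrix: rows are genes (plain HGNC symbols), columns are cells.
expr <- matrix(
  c(5, 0, 3, 1, 2, 0),
  nrow = 3,
  dimnames = list(c("CD3E", "MS4A1", "LYZ"), c("cell1", "cell2"))
)

# Hypothetical helper: wrap each symbol as "XXXX_<symbol>_YYYY".
wrap_gene_names <- function(symbols, prefix = "XXXX", suffix = "YYYY") {
  paste(prefix, symbols, suffix, sep = "_")
}

rownames(expr) <- wrap_gene_names(rownames(expr))

# Sanity check: the second "_"-separated field is the original symbol.
middle_field <- vapply(
  strsplit(rownames(expr), "_", fixed = TRUE),
  `[`, character(1), 2
)
print(middle_field)
```

A symbol that itself contains an underscore would break this simple split, so it may be worth checking for such names before renaming.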

@asmagen
Author

asmagen commented Apr 19, 2017 via email

@GIS-SP-Group
Owner

Correct. Sorry for the inconvenience and we will improve this in the next version.

Huipeng

@asmagen
Author

asmagen commented Apr 20, 2017

The same issue still occurs. It doesn't have anything to do with the gene names. What can be done about it?

@GIS-SP-Group
Owner

Asmagen,

I wonder whether you followed the procedure in the vignettes.

Please paste your script here.

Huipeng

@asmagen
Author

asmagen commented Apr 22, 2017

library(RCA)

# construct data object
rownames(dataset$counts) = sapply(rownames(dataset$counts),function(v) paste('XXXX',v,'YYYY',sep='_'))
data_obj = dataConstruct(dataset$counts);

# filter out lowly expressed genes
data_obj = geneFilt(obj_in = data_obj);

# normalize gene expression data
data_obj = cellNormalize(data_obj,method='scQ');

# log transform the data
normalized = dataTransform(data_obj,"log10");

# project the expression data into Reference Component space
data_obj = featureConstruct(normalized,method = "SelfProjection")

# generate cell clusters
data_obj = cellClust(data_obj,method="hclust",deepSplit_wgcna=environment$cluster.param2,min_group_Size_wgcna=2)

cluster.association = data_obj$group_labels_color$groupLabel

@GIS-SP-Group
Owner

Hi, Asmagen,

Could you provide the table of "normalized$fpkm_transformed" via email? It seems that the "featureConstruct" failed to select any features.

Huipeng

@asmagen
Author

asmagen commented Apr 24, 2017 via email

@GIS-SP-Group
Owner

Ok, since your script works well on our data set, this issue is likely specific to your data set.

Let me know if you are ok with sharing the following information, which might help us to figure out what's going on.

dim(normalized$fpkm_raw)
dim(normalized$fpkm)
sum(normalized$geneFilter)
dim(normalized$fpkm_transformed)
max(normalized$fpkm_transformed)
min(normalized$fpkm_transformed)
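
Alongside the dimension and range checks listed above, a generic check for values that tend to break a correlation step might look like the sketch below. It is base R only and does not call RCA. The helper `check_matrix` and the toy matrix are hypothetical, and nothing in this thread confirms that any of these conditions is the actual cause of the errors reported here. For reference, `log10(0)` is `-Inf` in R, and base R's `cor()` returns `NA` for a column whose standard deviation is zero.

```r
# Hypothetical helper: summarise properties of a genes-x-cells matrix that
# commonly cause trouble when correlations between cells are computed.
check_matrix <- function(m) {
  m <- as.matrix(m)
  col_sd <- apply(m, 2, sd)  # per-cell standard deviation across genes
  list(
    dims = dim(m),
    n_na = sum(is.na(m)),                # counts both NA and NaN
    n_infinite = sum(is.infinite(m)),    # e.g. -Inf from log10(0)
    n_constant_cells = sum(!is.na(col_sd) & col_sd == 0)
  )
}

# Toy example: cell2 contains one -Inf entry and cell3 is all zeros.
toy <- cbind(cell1 = c(1, 2, 3), cell2 = c(2, -Inf, 1), cell3 = c(0, 0, 0))
res <- check_matrix(toy)
str(res)
```

If something like this were run on `normalized$fpkm_transformed`, non-zero counts in any of the last three fields would be worth reporting together with the dimensions.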

@asmagen
Author

asmagen commented Apr 24, 2017 via email

@asmagen
Author

asmagen commented Apr 24, 2017

Any news?

@GIS-SP-Group
Owner

Dear Asmagen,

My guess is that the size of your matrix is not compatible with some hard-coded parameters in the package. We need to explore more for a solid answer though.

You could try to run the package with a randomly chosen subset (~500 cells) and see if the problem still exists.
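
A minimal sketch of drawing such a random subset of cells in base R, assuming cells are the columns of the matrix. The toy matrix `counts` and the other variable names are hypothetical stand-ins, not objects from the RCA package.

```r
set.seed(1)  # make the random draw reproducible

# Toy stand-in for a genes-x-cells count matrix (20 genes, 2000 cells).
counts <- matrix(
  rpois(20 * 2000, lambda = 1),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:2000))
)

# Draw up to 500 cells (columns) without replacement.
n_keep <- min(500, ncol(counts))
keep <- sample(ncol(counts), n_keep)
counts_subset <- counts[, keep, drop = FALSE]

dim(counts_subset)
```

The subset can then be passed through the same pipeline in place of the full matrix to see whether the error persists.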

H

@asmagen
Author

asmagen commented Apr 27, 2017

Hello,
featureConstruct works when I randomly select 500 cells, which is a very small number compared with what recent scRNA-seq technologies produce. But the actual clustering fails:
Error in cor(fpkm_temp, method = "pearson") :
Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'.

Does the code have hard-coded parameters that relate to the matrix size? How can this be resolved as soon as possible?
Thanks, A

@GIS-SP-Group
Owner

GIS-SP-Group commented Apr 28, 2017

Hi, Asmagen,

We have tested our package on many data sets available on our side and it seems to work fine. We are indeed optimizing the package and will release the next version in the next couple of months.

But to have a quick solution for you, we really need something to mimic the difficulty you encountered. We don't need to see your full raw data set. But if you could generate a fake set that could be representative of the original one, that would be great.

Let me know what you think.

H

@asmagen
Author

asmagen commented Apr 28, 2017

Attached is a subset of the 3k PBMC dataset that is distributed as an example with the Seurat package. RCA did not work on this public dataset either. Please let me know the status when you have news.
example.data.RData.zip

@asmagen
Author

asmagen commented May 10, 2017

Hello,
What's the status?
Thanks, A

@enhaofrank

Hi both,
Has the problem been solved?
I get the same error. My data were produced by the 10x Genomics Cell Ranger single-cell pipeline. The expression data frame contains UMI counts, with rows representing genes and columns representing cells, and the gene names were changed to the following format: "GenomeLocation_HGNCGeneName_EnsembleID". The error message:
data_obj = featureConstruct(normalized,method = "SelfProjection")
Error in cor(fpkm_for_clust0, method = "pearson") :
'x' has a zero dimension

Thank you very much!
Frank

@wiseflying

Dear all,

We have been testing the performance of RCA on multiple datasets on our side. Data sets from the Drop-seq protocol are usually sequenced shallowly, so some of the cells might have very few expressed genes (FPKM or UMI count > 0). This can cause problems for RCA.

So when running RCA on large data sets, please do a preliminary QC to filter out low-quality cells (those with sum(FPKM>0) <= 1000, or sum(FPKM>0) <= 500; the same applies to UMI count data).
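
A minimal sketch of such a preliminary filter in base R, assuming a genes-by-cells matrix of FPKM values or UMI counts. The helper `filter_cells` and its `min_genes` argument are hypothetical names and not part of the RCA package. The default of 1000 mirrors one of the thresholds suggested above, and the toy call uses a much smaller value so the effect is visible.

```r
# Hypothetical helper: keep only cells in which more than `min_genes` genes
# have a value above zero, i.e. drop cells with sum(value > 0) <= min_genes.
filter_cells <- function(expr, min_genes = 1000) {
  expr <- as.matrix(expr)
  detected <- colSums(expr > 0)  # number of detected genes per cell
  expr[, detected > min_genes, drop = FALSE]
}

# Toy example with a small threshold: cell2 has only one detected gene.
toy <- cbind(
  cell1 = c(3, 1, 2, 5),
  cell2 = c(0, 0, 4, 0),
  cell3 = c(1, 1, 0, 2)
)
filtered <- filter_cells(toy, min_genes = 2)
colnames(filtered)
```

The filtered matrix would then be used as the input to the first step of the pipeline.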

Please let me know if more stringent QC would solve the problem.

best
Huipeng
