Makefile not able to run in one go #1

Open
Citugulia40 opened this issue May 4, 2021 · 12 comments

Comments

@Citugulia40

Hi,
I am using the Docker image for the analysis, but the Makefile does not run in one go; it stops after the first step.

rsat_user@cde778d90cbd:~/rsat_results$ make -f peak-motifs.mk
Creating the result directories
Retrieving 4 upstream sequences boundaries from the genes of module M11
~/rsat_results/upstream1/regulonM11_up1.rm.fna
~/rsat_results/upstream2/regulonM11_up2.rm.fna
~/rsat_results/upstream3/regulonM11_up3.rm.fna
~/rsat_results/upstream4/regulonM11_up4.rm.fna
Retrieving 4 upstream sequences boundaries from all Prunus persica genome

@najlaksouri
Collaborator

Dear Citu,
To run the makefile in one go, you need to specify the target "all".
That is, use the following command: $ make -f peak-motifs.mk all
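
For context, when no target is given, make by default builds only the first target defined in the makefile, which would explain why the run stopped after the first step. A minimal sketch of that behaviour, using a made-up makefile (demo.mk and the targets step1, step2 and all are illustrative assumptions, not the contents of peak-motifs.mk):

```shell
# Hypothetical two-step makefile written to a temporary directory.
# The target names (step1, step2, all) are illustrative only.
demo_dir=$(mktemp -d)
printf 'step1:\n\t@echo "running step1"\n\nstep2:\n\t@echo "running step2"\n\nall: step1 step2\n' > "$demo_dir/demo.mk"

# No target given: make builds only the first target in the file (step1).
run_default() { make -f "$demo_dir/demo.mk"; }

# Explicit target: "all" depends on both steps, so both recipes run.
run_all() { make -f "$demo_dir/demo.mk" all; }
```

Calling run_default prints only the step1 line, whereas run_all prints the lines for both steps.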

@Citugulia40
Author

Citugulia40 commented May 5, 2021

Dear Najla,
Thanks for your kind response.
I am now able to run the makefile in one go, but I am getting an error in the peak-motifs command.
Please let me know if I am doing anything wrong and how I can resolve this.

rsat_user@cde778d90cbd:~/rsat_results$ make -f peak-motifs.mk all
Creating the result directories
Retrieving 4 upstream sequences boundaries from the genes of module M11
~/rsat_results/upstream1/regulonM11_up1.rm.fna
~/rsat_results/upstream2/regulonM11_up2.rm.fna
~/rsat_results/upstream3/regulonM11_up3.rm.fna
~/rsat_results/upstream4/regulonM11_up4.rm.fna
Retrieving 4 upstream sequences boundaries from all Prunus persica genome
Creating 2 random clusters
Replicate 1
Replicate 2
retrieve the different upstream sequence lengths from the random clusters
Replicate 1
Replicate 2
Running peak-motifs for M11 within upstream1
sequence length: 392375, number of masked symbols: 50808 (12.95 percent of the sequences)
sh: 1: Syntax error: word unexpected (expecting ")")
Error
OpenInputFile: File /home/rsat_user/rsat_results/upstream1/regulonM11.rm.fna.peaks-rm/results/composition/peaks_test_freq-1str-ovlp_1nt.tab does not exist.
Error
OpenInputFile: File /home/rsat_user/rsat_results/upstream1/regulonM11.rm.fna.peaks-rm/results/composition/peaks_test_freq-1str-ovlp_1nt.tab does not exist.
sh: 1: Syntax error: word unexpected (expecting ")")
Error

Thanks in advance

@najlaksouri
Collaborator

Hi Citu,
Are you running the analysis on a Mac? On Linux I am not getting this error.
Could you please confirm that, and we will try to figure out the problem.

Thanks

@Citugulia40
Author

Hi Najla,
I am running this on CentOS 7 Linux.

Thanks

@brunocontrerasmoreira
Contributor

Thanks @Citugulia40, I have tested the container with Docker version 20.10.6, build 370c289, on two hosts:
i) Ubuntu 18.04 -> works fine
ii) macOS Big Sur 11.31.1 -> I get the same errors you report

This is apparently related to the way arguments to scripts are handled by the shell within the container. We will investigate and produce a new container, but this will take us some time. Our only suggestion for the moment is to run it on Ubuntu. Hope this helps,
Bruno

@brunocontrerasmoreira
Contributor

brunocontrerasmoreira commented May 6, 2021

> Hi Najla,
> I am running this on CentOS 7 Linux.
>
> Thanks

In your running container, if you type ps, which shell is running?
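
The "sh: 1:" prefix in the log suggests that the failing command was launched through /bin/sh rather than through the interactive shell, so it may help to check both. A rough sketch, assuming ps and readlink are available in the container (the function names are made up for this example):

```shell
# Name of the shell process you are currently typing in (for example, bash).
current_shell() { ps -p $$ -o comm=; }

# What /bin/sh resolves to; commands started via "sh -c" are run by this binary.
sh_target() { readlink -f /bin/sh; }

current_shell || echo "ps is not available here"
sh_target
```

If the two differ, a command that relies on syntax accepted by the interactive shell can still fail when another program hands it to /bin/sh.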

@Citugulia40
Author

Thank you so much.
I will try it on Ubuntu.
My shell is bash.

@Citugulia40
Author

I have another question regarding de novo motif discovery with RSAT peak-motifs. I have a set of co-expressed genes and have used your pipeline to discover de novo motifs in them, but I am not getting any significant motif in my gene list. Could you please recommend any parameters that could be changed to obtain statistically significant motifs in my set of genes?

@najlaksouri
Collaborator

najlaksouri commented May 17, 2021

Hi Citu,
If I'm not mistaken, you are comparing the significance of the motifs identified in your genes of interest with that of the motifs found in the negative control clusters.

  • Could you please show us an example of these results?

  • How many genes do you have in your list? To apply our methodology we recommend using clusters with at least 15 sequences.

  • If you have a positive control cluster (group of genes with an experimentally verified motif), you can verify whether the motif discovery protocol will be able to return this same motif.

  • How did you define the boundaries of the proximal promoter region? As you are working with Chlamydomonas, we suggest analyzing a short interval (for example, from -250 bp to +100 bp).
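
To illustrate the last point, a -250 bp to +100 bp window can be derived from the transcription start site (TSS) and the strand. This is only a sketch: the three-column input format, the gene names and the function name are hypothetical, and the exact inclusive/exclusive convention should be matched to whatever tool retrieves the sequences.

```shell
# Input (hypothetical format): gene_id <TAB> TSS position (1-based) <TAB> strand
# Output: gene_id <TAB> start <TAB> end <TAB> strand, spanning -250..+100 around the TSS.
promoter_window() {
  awk -v up=250 -v down=100 'BEGIN { FS = OFS = "\t" }
  {
    # Upstream lies at lower coordinates on "+" and at higher coordinates on "-".
    if ($3 == "+") { start = $2 - up;   end = $2 + down }
    else           { start = $2 - down; end = $2 + up }
    if (start < 1) start = 1   # do not run off the beginning of the sequence
    print $1, start, end, $3
  }'
}

# Made-up genes, for illustration only.
printf 'geneA\t1000\t+\ngeneB\t5000\t-\ngeneC\t120\t+\n' | promoter_window
```

For geneB on the minus strand, the window runs from 4900 to 5250, because upstream of the TSS lies at higher genomic coordinates.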

@Citugulia40
Author

Thanks.

  1. When I run RSAT peak-motifs, the highest significance for my genes of interest is "k-mer sig= 2.19; evalue=0.0065". Compared with the negative control clusters, this value falls within their range, so it is not particularly high.
  2. I have 519 genes in my list.
  3. I don't have any experimentally verified motif for my set of genes.
  4. I tried the same approach that you described in your paper.
    Yes, you are right; I will try the promoter region from -250 bp to +100 bp.
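
One simple way to express the comparison in point 1 is to count how many negative control clusters reach the observed significance. A sketch only: the function name is made up, and the control values below are placeholders invented for illustration, not results from this analysis.

```shell
# Usage: rank_against_controls OBSERVED_SIG < control_sigs.txt
# Reads one control significance per line and reports how many reach OBSERVED_SIG.
rank_against_controls() {
  awk -v obs="$1" '
    NF { n++; if ($1 + 0 >= obs + 0) ge++ }
    END { printf "%d of %d control clusters have sig >= %s\n", ge, n, obs }'
}

# Placeholder control values, invented purely for illustration.
printf '1.2\n2.5\n0.8\n3.1\n' | rank_against_controls 2.19
```

If a sizeable share of the controls reaches the observed value, the motif is hard to distinguish from background.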

@Citugulia40
Author

Hi,
I have also tried the -250 bp to +100 bp region, but I am still not getting any significant motif in my set of genes. Can you suggest any parameter I could change to obtain significant motifs?

Thanks in advance

@brunocontrerasmoreira
Contributor

Good morning @Citugulia40, at this point I have two suggestions:

  1. Find yourself a good positive control, such as a regulon or group of promoters known from the literature to be bound by the same transcription factor. This would be handy to validate the protocol and optimize it for your setting.
  2. Refine your cluster by using additional expression or GO data to split it into smaller clusters.

Please let us know how that goes,
Bruno
