GIMSAN

GIMSAN (GIbbsMarkov with Significance ANalysis) is a novel web-server tool for de novo motif discovery.

It is currently available as a web application at:

GIMSAN @ BioHPC

It is also available as a stand-alone application for Unix and PBS (Portable Batch System) clusters:

Download GIMSAN @ Unix



Commercial use of GIMSAN without written permission from the authors is prohibited.

If you use this program in your research, please cite:

The motif significance analysis approach is described in detail in:

If you use the sequence logos from GIMSAN in your research, please cite WebLogo.


 
GIMSAN input options
Options | Cmdline | Description
input FASTA | --f | a set of sequences in FASTA format for de novo motif discovery.
background model | --bg | a file in FASTA format used for background model estimation, for example a set of S. cerevisiae intergenic sequences. This data is used to generate null sets of sequences that preserve the dimensions and local GC-content of the input set, and to estimate the background model for the de novo motif-finding task. Note: it is recommended to either "upload your own genomic file" or use "one of our standard genomic files".
motif widths | --w | a range of motif widths specified by the user. We previously showed that selecting the optimal width based on our significance analysis can improve the results of de novo motif discovery.
size of nullset | --nullset | size of the randomly drawn null set used to estimate the motif finder's null distribution via the 3-Gamma approximation. A larger null set gives a more accurate p-value at the expense of a longer runtime.
number of processors |  | the number of processors to allocate on the computer cluster. All requested processors are allocated before the job starts executing, so it is recommended to set this parameter to less than 5.
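
The trade-off described for the size of the null set can be illustrated with a minimal Python sketch. It is not GIMSAN's method: it uses a plain empirical p-value rather than the 3-Gamma approximation, the function names are hypothetical, and the scores are synthetic random numbers rather than motif-finder output. It only shows why a larger null set allows a finer p-value estimate.

```python
import random


def empirical_p_value(observed_score, null_scores):
    """Empirical p-value: share of null scores >= the observed score.

    The +1 in numerator and denominator keeps the estimate above zero.
    This is a stand-in for illustration, not GIMSAN's 3-Gamma approximation.
    """
    exceed = sum(1 for s in null_scores if s >= observed_score)
    return (exceed + 1) / (len(null_scores) + 1)


def smallest_p_value(nullset_size):
    """Smallest p-value the empirical estimate can report for a given null set size."""
    return 1 / (nullset_size + 1)


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed so the run is repeatable
    observed = 3.0           # hypothetical score of the motif found in the input set
    for n in (10, 100, 1000):
        # Synthetic null scores; in practice these would come from running
        # the motif finder on each randomly drawn null sequence set.
        null_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
        p = empirical_p_value(observed, null_scores)
        print(f"nullset={n:5d}  p-value={p:.4f}  smallest possible={smallest_p_value(n):.4f}")
```

Each null score in practice costs one full motif-finder run on a null sequence set, which is why a larger null set lengthens the runtime.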



GIMSAN sample output on FHL1 motif



We thank Robert Bukowski for deploying GIMSAN as a web application. GIMSAN is based upon work supported by the National Science Foundation under Grant No. 0644136.

Please send any questions, comments, or suggestions to ppn3@cs.cornell.edu