CS Colloquium
Thursday, January 29, 2004
B17 Upson Hall

Golan Yona
Cornell University


The function of genes depends on their extended biological context: their relations to other genes, the set of interactions they form, the pathways they participate in, their subcellular location, and so on.  In this view, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze new genes effectively. To address this need, the BIOZON project aims to construct a new unified biological resource and a comprehensive protein and DNA characterization, classification, and management system that analyzes biological entities ranging from genes to protein families, biochemical pathways, and organisms.  BIOZON is based on an extensive database schema that integrates information from a variety of resources at the macromolecular level as well as at the cellular level.

In this seminar I will present several elements of the BIOZON system.  The system uses algorithms and mathematical models that we have developed for detecting domains and similarities between proteins and protein families, as well as novel embedding techniques that we use to construct a complete "road map" of the protein universe.

Biozon website: biozon.cornell.edu (accessible starting February 1, 2004)


Relevant references:

Niranjan Nagarajan and Golan Yona. (2003). Automatic prediction of protein domains from sequence information using a hybrid learning system. Bioinformatics (in press).

William Dirks and Golan Yona. (2003). A comprehensive study of the notion of functional link between genes based on microarray data, promoter signals, protein-protein interactions and pathway analysis. Technical report TR2004-1921, Computing and Information Science, Cornell University.

Jason Davis and Golan Yona. (2003). Prediction of protein-protein interactions and the interaction site from sequence information - an extensive study of the co-evolution model. Technical report TR2004-1919, Computing and Information Science, Cornell University.

Michael Quist and Golan Yona. (2003). Distributional scaling: an algorithm for structure-preserving embedding of metric and nonmetric spaces. http://www.cs.cornell.edu/golan/Papers/embedding.ps

Umar Syed and Golan Yona. (2002). Using a mixture of probabilistic decision trees for direct prediction of protein function. In the proceedings of RECOMB 2003.

Golan Yona and Klara Kedem. (2003). The URMS-RMS hybrid algorithm for fast and sensitive local protein structure alignment. Technical report TR2004-1922, Computing and Information Science, Cornell University.

Golan Yona and Michael Levitt. (2002). Within the twilight zone: A sensitive profile-profile comparison tool based on information theory. Journal of Molecular Biology 315, 1257-1275.

Gill Bejerano and Golan Yona. (2001). Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics 17, 23-43.

Golan Yona and Michael Levitt. (2000). Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins. In the proceedings of ISMB 2000, 395-406, AAAI Press.

Golan Yona, Nathan Linial, and Michal Linial. (1999). ProtoMap: Automatic classification of protein sequences, a hierarchy of protein families, and local maps of the protein space. Proteins: Structure, Function and Genetics 37, 360-378.