Structural SVM for Protein Sequence Alignment
This is an implementation of structural SVM for training complex
alignment models for protein sequence alignment, especially for
homology modeling. The structural SVM algorithm can incorporate many
relevant features, such as secondary structure, relative exposed surface
area, profiles, and their various interactions, into the alignment model.
It was developed under Linux, compiles with gcc, and is built upon
the SVM^light software by Thorsten Joachims.
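As a rough illustration of how several feature types can be combined in one alignment model, the sketch below scores an alignment as a sum of learned weights over the features that fire at each aligned pair, and finds the best-scoring global alignment by dynamic programming. It is not taken from the package: the feature names, the single "gap" weight, and the functions pair_features, score, and best_alignment_score are all hypothetical, and the weights are supplied by hand rather than learned by the structural SVM.

```python
from typing import Dict, List, Tuple

# Hypothetical per-position annotation: (amino acid, secondary structure class).
Position = Tuple[str, str]


def pair_features(a: Position, b: Position) -> List[str]:
    """Names of the illustrative features active when a is aligned to b."""
    return [f"aa:{a[0]}{b[0]}", f"ss:{a[1]}{b[1]}"]


def score(weights: Dict[str, float], names: List[str]) -> float:
    """Linear score: sum of the weights of the active features (missing = 0)."""
    return sum(weights.get(name, 0.0) for name in names)


def best_alignment_score(
    x: List[Position], y: List[Position], weights: Dict[str, float]
) -> float:
    """Score of the best global alignment of x and y under a linear model."""
    gap = weights.get("gap", 0.0)  # assumed: one weight per gapped position
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + score(
                weights, pair_features(x[i - 1], y[j - 1])
            )
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


# Toy weights chosen by hand for illustration only.
toy_weights = {"aa:AA": 2.0, "ss:HH": 1.0, "gap": -1.0}
target = [("A", "H"), ("G", "C")]
template = [("A", "H")]
print(best_alignment_score(target, template, toy_weights))
```

Because the score is linear in the weights, adding a new feature type only means emitting another feature name per aligned pair; the dynamic program itself does not change.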
Please send comments and questions to Chun-Nam Yu.
Source Code
svm_alignment_0.7.tar.gz - which includes the code with documentation and some toy examples.
Datasets
TRAIN_ALI.tar.gz - the training set (a slightly smaller subset than the one used in the paper)
TEST_ALI.tar.gz - the validation set (again a slightly smaller subset than the one used in the paper)
sable_annot.tar.gz - the set of SABLE annotations of predicted secondary structure and relative exposed surface area for the training and validation sets
dssp_annot.tar.gz - the set of DSSP annotations of secondary structure and exposed surface area for the training and validation sets
Links
The following programs are used to generate the alignment examples and
to obtain structural annotations for the target and templates as input
features to the SVM alignment program:
CE - the structural alignment program used to generate the examples in the training and validation sets
SABLE - the program for predicting secondary structure and relative exposed surface area for the target sequence
DSSP - the program for computing secondary structure and exposed surface area for the template structure
References
The following paper describes this work in detail, and contains many references:
Support Vector Training of Protein Alignment Models (pdf)
C.-N. Yu, T. Joachims, R. Elber, J. Pillardy
Proceedings of the International Conference on Research in Computational
Molecular Biology (RECOMB), 2007