Structural SVM for Protein Sequence Alignment

This is an implementation of the structural SVM algorithm for training complex alignment models for protein sequence alignment, especially for homology modeling. The algorithm can incorporate many relevant features into the alignment model, such as secondary structure, relative exposed surface area, profiles, and their various interactions. It was developed under Linux, compiles with gcc, and is built upon the svm^light software by Thorsten Joachims.
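
As a rough illustration of the kind of model described above, the following Python sketch scores an alignment as a weighted sum of features and finds the highest-scoring alignment by dynamic programming. It is not taken from the package. The feature set (residue identity, secondary-structure agreement, and a single gap feature), the global-alignment recursion, and all function and variable names are assumptions chosen for illustration. In structural SVM training the weight vector would be learned from example alignments; here it is set by hand.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-position annotation: (residue, secondary-structure label).
Position = Tuple[str, str]
# One alignment column: indices into target/template, None marks a gap.
Column = Tuple[Optional[int], Optional[int]]

# Toy feature layout (an assumption): [same residue, same sec. structure, gap].
GAP_FEATURES = [0.0, 0.0, 1.0]


def match_features(a: Position, b: Position) -> List[float]:
    """Feature vector for aligning target position a with template position b."""
    return [float(a[0] == b[0]), float(a[1] == b[1]), 0.0]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def best_alignment(x: Sequence[Position], y: Sequence[Position],
                   w: Sequence[float]) -> Tuple[float, List[Column]]:
    """Global alignment maximizing the linear score w . Psi(x, y, alignment)."""
    n, m = len(x), len(y)
    gap = dot(w, GAP_FEATURES)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = score[i - 1][0] + gap, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = score[0][j - 1] + gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + dot(w, match_features(x[i - 1], y[j - 1])), "diag"),
                (score[i - 1][j] + gap, "up"),
                (score[i][j - 1] + gap, "left"),
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right cell to recover the alignment columns.
    columns: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            columns.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            columns.append((i - 1, None))
            i -= 1
        else:
            columns.append((None, j - 1))
            j -= 1
    columns.reverse()
    return score[n][m], columns


def joint_features(x: Sequence[Position], y: Sequence[Position],
                   columns: Sequence[Column]) -> List[float]:
    """Sum the per-column feature vectors to get Psi for a whole alignment."""
    total = [0.0, 0.0, 0.0]
    for i, j in columns:
        f = GAP_FEATURES if i is None or j is None else match_features(x[i], y[j])
        total = [t + fi for t, fi in zip(total, f)]
    return total


# Made-up toy data and hand-set weights, for illustration only.
target = [("A", "H"), ("C", "H"), ("D", "E")]
template = [("A", "H"), ("D", "E")]
weights = [2.0, 1.0, -1.5]

best_score, best_columns = best_alignment(target, template, weights)
print(best_score)     # 4.5
print(best_columns)   # [(0, 0), (1, None), (2, 1)]
```

Because the score is linear in the features, the score returned by the dynamic program equals `dot(weights, joint_features(target, template, best_columns))`. This linearity is what lets a structural SVM treat the alignment model as a single weight vector to be trained.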

Please send comments and questions to Chun-Nam Yu.



Source Code

svm_alignment_0.7.tar.gz - the code, with documentation and some toy examples



Datasets


TRAIN_ALI.tar.gz - the training set (a slightly smaller subset than the one used in the paper)
TEST_ALI.tar.gz - the validation set (again a slightly smaller subset than the one used in the paper)
sable_annot.tar.gz - the set of SABLE annotations of predicted secondary structure and relative exposed surface area for the training and validation sets
dssp_annot.tar.gz - the set of DSSP annotations of secondary structure and exposed surface area for the training and validation sets



Links


The following programs are used to generate the alignment examples and to obtain the structural annotations of the target and templates that serve as input features to the SVM alignment program:

CE - the structural alignment program used to generate the examples in the training and validation sets
SABLE - the program for predicting secondary structure and relative exposed surface area for the target sequence
DSSP - the program for computing secondary structure and exposed surface area for the template structure



References

The following paper describes this work in detail, and contains many references:

Support Vector Training of Protein Alignment Models (pdf)
C.-N. Yu, T. Joachims, R. Elber, J. Pillardy
Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB), 2007