SVM^struct

Support Vector Machine for Complex Outputs

Author: Thorsten Joachims <thorsten@joachims.org>
Cornell University
Department of Computer Science

Version: 3.10
Date: 14.08.2008

Overview

SVM^struct is a Support Vector Machine (SVM) algorithm for predicting multivariate or structured outputs. It performs supervised learning by approximating a mapping

h: X --> Y

using labeled training examples (x_1,y_1), ..., (x_n,y_n). Unlike regular SVMs, which consider only univariate predictions as in classification and regression, SVM^struct can predict complex objects y like trees, sequences, or sets. Examples of problems with complex outputs are natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. The SVM^struct algorithm can also be used for linear-time training of binary and multi-class SVMs under the linear kernel [4].

The 1-slack cutting-plane algorithm implemented in SVM^struct V3.10 uses a new but equivalent formulation of the structural SVM quadratic program and is several orders of magnitude faster than prior methods. The algorithm is described in [5]. The n-slack algorithm of SVM^struct V2.50 is described in [1][2]. The SVM^struct implementation is based on the SVM^light quadratic optimizer [3].
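To make the "new but equivalent formulation" a little more concrete, the following is a minimal sketch, assuming a toy multi-class problem with zero/one loss and margin rescaling. It compares the average of the per-example slacks (n-slack view) with the single slack shared by all examples (1-slack view) at a fixed weight vector. All names (score, n_slack_value, one_slack_value) are illustrative and not part of the SVM^struct code; the brute-force enumeration is only feasible for tiny examples and is not how the cutting-plane algorithm in [5] proceeds.

```python
import itertools

def score(w, x, y):
    """w . Psi(x, y) for a toy feature map with one weight block per class."""
    d = len(x)
    return sum(w[y * d + j] * x[j] for j in range(d))

def zero_one(y, ybar):
    """Zero/one loss between the correct label y and a candidate ybar."""
    return 0.0 if y == ybar else 1.0

def violation(w, x, y, ybar):
    """Margin-rescaling violation: loss minus the score difference."""
    return zero_one(y, ybar) - (score(w, x, y) - score(w, x, ybar))

def n_slack_value(w, data, k):
    """Average of the n per-example slacks, each a max over single labels."""
    total = 0.0
    for x, y in data:
        total += max(violation(w, x, y, ybar) for ybar in range(k))
    return total / len(data)

def one_slack_value(w, data, k):
    """Single shared slack: max over joint labelings of the averaged violation."""
    n = len(data)
    best = float("-inf")
    for joint in itertools.product(range(k), repeat=n):
        v = sum(violation(w, x, y, ybar)
                for (x, y), ybar in zip(data, joint)) / n
        best = max(best, v)
    return best

# Toy data: three examples, three classes, two features each.
data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2)]
w = [0.5, -0.2, 0.1, 0.3, -0.4, 0.2]
print(n_slack_value(w, data, 3), one_slack_value(w, data, 3))
```

In this sketch the two values agree because the joint maximum decomposes into independent per-example maxima. This is the intuition behind both formulations having the same solution, although the full argument is given in [5].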

Existing Instantiations

SVM^struct can be thought of as an API for implementing different kinds of complex prediction algorithms. Currently, we have implemented the following learning tasks:

SVM^struct Python: A Python interface to the SVM^struct API for implementing your own structured prediction method. The Python interface makes prototyping much easier and faster than working in C.
More information and source code.
SVM^struct Matlab: A Matlab interface to the SVM^struct API for implementing your own structured prediction method. Again, prototyping should be much easier and faster than working in C.
More information and source code.
Latent SVM^struct: Training of structural SVM prediction rules when the training labels are not fully observed (e.g. unobserved dependency structure in NP-coref, motif finding, ranking with weak orderings).
More information and source code.
SVM^multiclass: Multi-class classification. Learns to predict one of k mutually exclusive classes. This is probably the simplest possible instance of SVM^struct and serves as a tutorial example of how to use the programming interface.
More information and source code.
SVM^cfg: Learns a weighted context free grammar from examples. Training examples (e.g. for natural language parsing) specify the sentence along with the correct parse tree. The goal is to predict the parse tree of new sentences.
More information and source code.
SVM^align: Learning to align sequences. Given examples of how sequence pairs align, the goal is to learn a complex substitution and insertion/deletion model so that one can predict alignments of new sequences.
More information and source code.
SVM^hmm: Learns a hidden Markov model from examples. Training examples (e.g. for part-of-speech tagging) specify the sequence of words along with the correct assignment of tags (i.e. states). The goal is to predict the tag sequences for new sentences.
More information and source code.
SVM^map: Learns rankings that optimize Mean Average Precision (MAP) as the performance metric.
More information and source code.
SVM^div: Learns to predict diversified rankings and sets for Information Retrieval.
More information and source code.
SVM^perf: Learns a binary classification rule that directly optimizes ROC-Area, F1-Score, or the Precision/Recall Break-Even Point. It is also a training algorithm for conventional linear binary classification SVMs that can be orders of magnitude faster than SVM-light for large datasets.
More information and source code.
SVM^rank: Learns a rule for predicting rankings as typically used in search engines and other retrieval systems. It is equivalent to SVM-light in '-z p' mode, but it is a much more efficient algorithm for training Ranking SVMs.
More information and source code.
Propensity SVM^rank: Learns a ranking function from biased and incomplete data, especially data that comes from clicks, through counterfactual empirical risk minimization.
More information and source code.
SVM^pairMRF: Semantic scene labeling for 3D point cloud data. Basically learns a general Markov Random Field model with pairwise potentials and can be used beyond that specific application.
More information and source code.
SVM^sle: Learning algorithm for predicting document-level sentiment polarities with latent explanations.
More information and source code.
SVM^struct for activity recognition: Learning algorithm for training activity recognizers for video.
More information and source code.
SVM^struct for web-page segmentation: Learning algorithm for segmenting web pages based on directed acyclic graph structure.
More information and source code.

Please let me know if you want me to add your implementation to this list.

Source Code for Implementing your Own Instantiation

Instead of using one of the existing instantiations of SVM^struct listed above, you can implement your own. SVM^struct contains an API that lets you specialize the general sparse approximation training algorithm for your particular application. Referring to the algorithm as presented in [1], you merely need to provide the code for the following:

A function for computing the feature vector Psi.
A function for computing the argmax over the (kernelized) linear discriminant function.
A function for computing the argmax over the loss-augmented (kernelized) linear discriminant function.
A loss function.
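As an illustration of these four pieces, here is a minimal sketch for multi-class classification with a linear discriminant. It is written in Python for brevity and does not use the actual C API; the function names and signatures are assumptions made for this example. The loss-augmented argmax is shown for both margin rescaling and slack rescaling, following the definitions in [1].

```python
def psi(x, y, num_classes):
    """Joint feature vector Psi(x, y): x copied into the block of class y."""
    d = len(x)
    vec = [0.0] * (num_classes * d)
    vec[y * d:(y + 1) * d] = x
    return vec

def dot(a, b):
    """Plain inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def loss(y, ybar):
    """Zero/one loss; other instantiations may define richer losses."""
    return 0.0 if y == ybar else 1.0

def classify(w, x, num_classes):
    """Argmax over the linear discriminant w . Psi(x, y)."""
    return max(range(num_classes),
               key=lambda y: dot(w, psi(x, y, num_classes)))

def most_violated(w, x, y, num_classes, rescaling="margin"):
    """Argmax over the loss-augmented discriminant function."""
    true_score = dot(w, psi(x, y, num_classes))

    def objective(ybar):
        s = dot(w, psi(x, ybar, num_classes))
        if rescaling == "margin":
            return loss(y, ybar) + s
        # slack rescaling: the loss scales the margin violation
        return loss(y, ybar) * (1.0 - (true_score - s))

    return max(range(num_classes), key=objective)

w = [1.0, 0.0, 0.0, 1.0]   # two classes, two features
x = [1.0, 0.2]
print(classify(w, x, 2), most_violated(w, x, 0, 2))
```

For structured outputs such as trees or sequences, the exhaustive max over range(num_classes) would be replaced by a problem-specific search (for example dynamic programming), which is the part each instantiation has to supply.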

You can download the source code of the algorithm and the API from the following location:

      https://osmot.cs.cornell.edu/svm_struct/current/svm_struct.tar.gz

If you are not keen on C programming, you might want to look at the Python API by Thomas Finley or the Matlab API by Andrea Vedaldi. They make prototyping substantially easier than the original C API, while offering essentially the same functionality and calling the original C code internally. Also, both the Python and the Matlab APIs are identical in structure to the C API described below, so it is easy to switch between them.

If you decide to use the C version, the file you downloaded above contains the source code of the most recent version of SVM^struct as well as the source code of the SVM^light quadratic optimizer. Unpack the archive using the shell command:

      gunzip -c svm_struct.tar.gz | tar xvf -

This expands the archive into the current directory, which now contains all relevant files. You can compile SVM^struct with the empty API using the command

      make

in the root directory of the archive. It will output some warnings, since the functions of the API are only templates and do not return values as required. However, it should produce the executables svm_empty_learn and svm_empty_classify. "empty" is a placeholder where you can substitute a meaningful name for your particular instance of SVM^struct. To implement your own instantiation, you will need to edit the following files:

svm_struct_api_types.h
svm_struct_api.c

Both files already contain empty templates. The first file contains the type definitions that need to be changed. PATTERN is the structure for storing the x-part of an example (x,y), LABEL is the y-part. The learned model will be stored in STRUCTMODEL. Finally, STRUCT_LEARN_PARM can be used to store any parameters that you might want to pass to the function. The second file, svm_struct_api.c, contains the functions you need to implement. See the documentation in the file for details. You might also want to look at the other instantiations of SVM^struct for examples of how to use the API.
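To give a rough idea of what these four types might hold, the following sketch mirrors them as Python dataclasses for a multi-class instantiation. The field names and the helper init_struct_model are hypothetical choices for this illustration; the real definitions are C structs in svm_struct_api_types.h and will differ per application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Pattern:
    """x-part of an example (PATTERN); here a dense feature vector."""
    features: List[float]

@dataclass
class Label:
    """y-part of an example (LABEL); here a single class index."""
    class_id: int

@dataclass
class StructModel:
    """Learned model (STRUCTMODEL); here just the weight vector."""
    w: List[float] = field(default_factory=list)

@dataclass
class StructLearnParm:
    """Instance-specific parameters (STRUCT_LEARN_PARM); assumed fields."""
    num_classes: int = 2
    num_features: int = 0

@dataclass
class Example:
    """One training pair (x, y)."""
    x: Pattern
    y: Label

def init_struct_model(parm: StructLearnParm) -> StructModel:
    """Hypothetical helper: allocate one weight block per class."""
    return StructModel(w=[0.0] * (parm.num_classes * parm.num_features))

parm = StructLearnParm(num_classes=3, num_features=4)
model = init_struct_model(parm)
example = Example(Pattern([0.1, 0.0, 0.7, 0.2]), Label(2))
print(len(model.w), example.y.class_id)
```

For structured outputs, Label would typically hold a tree, a tag sequence, or an alignment rather than a single integer.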

Finally, you can also implement your own structural SVM training algorithm in SVM^struct using the file svm_struct_learn_custom.c. By putting your algorithm into the empty function there, you can access the API and all the instance-specific functions that the algorithms already implemented in SVM^struct are using. Your custom algorithm is then selected via the -w 9 option. This makes it easy to test new algorithms and compare them against the existing algorithms.

How to Use

Compiling creates the executable svm_empty_learn, which performs the learning, and the executable svm_empty_classify for classifying new examples. Usage is much like SVM^light. You call it like

      svm_empty_learn -c 1.0 train.dat model.dat

which trains an SVM on the training set train.dat and outputs the learned rule to model.dat using the regularization parameter C set to 1.0 (note that this crashes for the empty API -- use one of the other instantiations from above for a working example). The formats of the training file and the model file depend on the particular instantiation of SVM^struct. Other options are:

General Options:
         -?          -> this help
         -v [0..3]   -> verbosity level (default 1)
         -y [0..3]   -> verbosity level for svm_light (default 0)
Learning Options:
         -c float    -> C: trade-off between training error
                        and margin (default 0.01)
         -p [1,2]    -> L-norm to use for slack variables. Use 1 for L1-norm,
                        use 2 for squared slacks. (default 1)
         -o [1,2]    -> Rescaling method to use for loss.
                        1: slack rescaling
                        2: margin rescaling
                        (default 2)
         -l [0..]    -> Loss function to use.
                        0: zero/one loss
                        ?: see below in application specific options
                        (default 0)
Optimization Options (see [2][5]):
         -w [0,..,9] -> choice of structural learning algorithm (default 3):
                        0: n-slack algorithm described in [1]
                        1: n-slack algorithm with shrinking heuristic
                        2: 1-slack algorithm (primal) described in [5]
                        3: 1-slack algorithm (dual) described in [5]
                        4: 1-slack algorithm (dual) with constraint cache [5]
                        9: custom algorithm in svm_struct_learn_custom.c
         -e float    -> epsilon: allow that tolerance for termination
                        criterion (default 0.100000)
         -k [1..]    -> number of new constraints to accumulate before
                        recomputing the QP solution (default 100) (-w 0 and 1 only)
         -f [5..]    -> number of constraints to cache for each example
                        (default 5) (used with -w 4)
         -b [1..100] -> percentage of training set for which to refresh cache
                        when no epsilon violated constraint can be constructed
                        from current cache (default 100%) (used with -w 4)
SVM-light Options for Solving QP Subproblems (see [3]):
         -n [2..q]   -> number of new variables entering the working set
                        in each svm-light iteration (default n = q).
                        Set n < q to prevent zig-zagging.
         -m [5..]    -> size of svm-light cache for kernel evaluations in MB
                        (default 40) (used only for -w 1 with kernels)
         -h [5..]    -> number of svm-light iterations a variable needs to be
                        optimal before considered for shrinking (default 100)
         -# int      -> terminate svm-light QP subproblem optimization, if no
                        progress after this number of iterations.
                        (default 100000)
Kernel Options:
         -t int      -> type of kernel function:
                        0: linear (default)
                        1: polynomial (s a*b+c)^d
                        2: radial basis function exp(-gamma ||a-b||^2)
                        3: sigmoid tanh(s a*b + c)
                        4: user defined kernel from kernel.h
         -d int      -> parameter d in polynomial kernel
         -g float    -> parameter gamma in rbf kernel
         -s float    -> parameter s in sigmoid/poly kernel
         -r float    -> parameter c in sigmoid/poly kernel
         -u string   -> parameter of user defined kernel
Output Options:
         -a string   -> write all alphas to this file after learning
                        (in the same order as in the training set)
Application-Specific Options:
         --* string  -> custom parameters that can be adapted for struct
                        learning. The * can be replaced by any character
                        and there can be multiple options starting with --.

For more details on the meaning of these options consult references [1][3][5] and the description of SVM^light. The options starting with -- are those specific to the instantiation and are specified via the API.
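The kernel formulas listed under the -t option can be written out as follows. This is only a sketch of the formulas as they appear in the option listing, with parameters named after the options -d, -g, -s and -r. It is not the SVM^light implementation, which operates on sparse vectors, and the function names are assumptions made for this example.

```python
import math

def dot(a, b):
    """Inner product a*b of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def linear(a, b):
    """-t 0: linear kernel."""
    return dot(a, b)

def polynomial(a, b, s, c, d):
    """-t 1: (s a*b + c)^d with s from -s, c from -r, d from -d."""
    return (s * dot(a, b) + c) ** d

def rbf(a, b, gamma):
    """-t 2: exp(-gamma ||a-b||^2) with gamma from -g."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def sigmoid(a, b, s, c):
    """-t 3: tanh(s a*b + c) with s from -s, c from -r."""
    return math.tanh(s * dot(a, b) + c)

a, b = [1.0, 2.0], [3.0, 4.0]
print(linear(a, b), polynomial(a, b, s=1.0, c=1.0, d=2), rbf(a, b, gamma=0.5))
```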

Disclaimer

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

Known Problems

none

History

V3.00 - 3.10

Reimplementation of -w 3 and -w 4 algorithms to improve memory management and speed.
Added "mini-batch" updates via the -b option.
Added the option to implement additional algorithms in svm_struct_learn_custom.c and select them via -w 9.
Fixed bug in RBF Kernel.
Fixed precision issues on 64-bit AMD and Intel machines.
Cleaned up the API to improve passing optional arguments to the classification module.
Source code for SVM^struct V3.00.

V2.00 - 3.00

This version implements a new algorithm for structural SVM training (options -w 2, -w 3, -w 4). The algorithm is based on an alternative formulation of the structural SVM training problem that has the same solution as the conventional formulation. This new one-slack formulation allows a cutting-plane algorithm that has time complexity linear in the number of training examples. For large datasets, it is several orders of magnitude faster than the old cutting-plane algorithm.
New IO routines that are faster for reading large data and model files.
Source code for SVM^struct V2.00.

References

[1] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004. [Postscript (gz)] [PDF] [BibTeX]

[2] T. Joachims. Learning to Align Sequences: A Maximum Margin Approach, Technical Report, August, 2003. [Postscript (gz)] [PDF] [BibTeX]

[3] T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999. [Postscript (gz)] [PDF] [BibTeX]

[4] T. Joachims, Training Linear SVMs in Linear Time, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2006. [Postscript (gz)] [PDF] [BibTeX]

[5] T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning Journal, 77(1):27-59, 2009. [PDF] [BibTeX]
