SVM^multiclass

Multi-Class Support Vector Machine

Author: Thorsten Joachims <thorsten@joachims.org>
Cornell University
Department of Computer Science

Version: 2.20
Date: 14.08.2008

Overview

SVM^multiclass uses the multi-class formulation described in [1], but optimizes it with an algorithm that is very fast in the linear case. For a training set (x₁,y₁) ... (x_n,y_n) with labels y_i in [1..k], it finds the solution of the following optimization problem during training.

        min 1/2 Σ_i=1..k w_i*w_i + C/n Σ_{i = 1..n} ξ_i
         s.t. for all y in [1..k]: [ x₁ • w_yi ] >= [ x₁ • w_y ] + 100*Δ(y₁,y) - ξ_{1


...} for all y in [1..k]: [ x_n • w_yn ] >= [ x_n • w_y ] + 100*Δ(y_n,y) - ξ_n

C is the usual regularization parameter that trades off margin size and training error. Δ(y_n,y) is the loss function that returns 0 if y_n equals y, and 1 otherwise.

To solve this optimization problem, SVM^multiclass uses an algorithm that is different from the one in [1]. The algorithm is based on Structural SVMs [2] and it is an instance of SVM^struct. For linear kernels, SVM^multiclass V2.20 is very fast and runtime scales linearly with the number of training examples. Non-linear kernels are not (really) supported. It also serves as a easy tutorial example of how to use the SVM^struct programming interface. More information on SVM^struct is available here.

Source Code and Binaries

The program is free for scientific use. Please contact me, if you are planning to use the software for commercial purposes. The software must not be further distributed without prior permission of the author. The implementation was developed on Linux with gcc, but compiles also on Cygwin, Windows, and Mac (after small modifications, see FAQ).

The source code is available at the following location:

https://osmot.cs.cornell.edu/svm_multiclass/current/svm_multiclass.tar.gz If you just want the binaries, you can download them for the following systems:

Please send me email and let me know that you got it. The archive contains the source code of the most recent version of SVM^multiclass, which includes the source code of SVM^struct and the SVM^light quadratic optimizer. Unpack the archive using the shell command:

      gunzip –c svm_multiclass.tar.gz | tar xvf –

This expands the archive into the current directory, which now contains all relevant files. You can compile SVM^multiclass using the command

      make

in the top-level directory of the archive. This will produce the executables svm_multiclass_learn and svm_multiclass_classify. If the system does not compile properly, check this FAQ.

How to Use

SVM^multiclass consists of a learning module (svm_multiclass_learn) and a classification module (svm_multiclass_classify). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use svm_multiclass_learn and svm_multiclass_classify.

Usage is much like SVM^light. You call it like

      svm_multiclass_learn -c 1.0 example_file model_file

which trains an SVM on the training set example_file and outputs the learned rule to model_file using the regularization parameter C set to 1.0. Note that the C parameters is scaled differently from SVM^light. Other options are:

General Options:
         -?          -> this help
         -v [0..3]   -> verbosity level (default 1)
         -y [0..3]   -> verbosity level for svm_light (default 0)
Learning Options:
         -c float    -> C: trade-off between training error
                        and margin (default 0.01)
         -p [1,2]    -> L-norm to use for slack variables. Use 1 for L1-norm,
                        use 2 for squared slacks. (default 1)
         -o [1,2]    -> Rescaling method to use for loss.
                        1: slack rescaling
                        2: margin rescaling
                        (default 2)
         -l [0..]    -> Loss function to use.
                        0: zero/one loss
                        ?: see below in application specific options
                        (default 0)
Optimization Options (see [2][4]):
         -w [0,..,9] -> choice of structural learning algorithm (default 4):
                        0: n-slack algorithm described in [2]
                        1: n-slack algorithm with shrinking heuristic
                        2: 1-slack algorithm (primal) described in [4]
                        3: 1-slack algorithm (dual) described in [4]
                        4: 1-slack algorithm (dual) with constraint cache [4]
                        9: custom algorithm in svm_struct_learn_custom.c
         -e float    -> epsilon: allow that tolerance for termination
                        criterion (default 0.100000)
         -k [1..]    -> number of new constraints to accumulate before
                        recomputing the QP solution (default 100) (-w 0 and 1 only)
         -f [5..]    -> number of constraints to cache for each example
                        (default 5) (used with -w 4)
         -b [1..100] -> percentage of training set for which to refresh cache
                        when no epsilon violated constraint can be constructed
                        from current cache (default 100%) (used with -w 4)
         -n [2..q]   -> number of new variables entering the working set
                        in each svm-light iteration (default n = q).
                        Set n < q to prevent zig-zagging.
         -m [5..]    -> size of svm-light cache for kernel evaluations in MB
                        (default 40) (used only for -w 1 with kernels)
         -h [5..]    -> number of svm-light iterations a variable needs to be
                        optimal before considered for shrinking (default 100)
         -# int      -> terminate svm-light QP subproblem optimization, if no
                        progress after this number of iterations.
                        (default 100000)
Kernel Options:
         -t int      -> type of kernel function:
                        0: linear (default)
                        1: polynomial (s a*b+c)^d
                        2: radial basis function exp(-gamma ||a-b||^2)
                        3: sigmoid tanh(s a*b + c)
                        4: user defined kernel from kernel.h
         -d int      -> parameter d in polynomial kernel
         -g float    -> parameter gamma in rbf kernel
         -s float    -> parameter s in sigmoid/poly kernel
         -r float    -> parameter c in sigmoid/poly kernel
         -u string   -> parameter of user defined kernel
Output Options:
         -a string   -> write all alphas to this file after learning
                        (in the same order as in the training set)
Application-Specific Options:
          none

For more details on the meaning of these options consult the description of SVM^struct, the description of SVM^light, and references [2][4].

The input file example_file contains the training examples. The file format is the same as for SVM^light, just that the target value is now a positive integer that indicates the class. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. <integer>
<feature> .=. <integer>
<value> .=. <float>
<info> .=. <string>

The target value and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The target value denotes the class of the example via a positive (non-zero) integer. So, for example, the line

3 1:0.43 3:0.12 9284:0.2 # abcdef

specifies an example of class 3 for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string abcdef is stored with the vector, which can serve as a way of providing additional information when adding user defined kernels.

The result of svm_multiclass_learn is the model which is learned from the training data in example_file. The model is written to model_file. To make predictions on test examples, svm_multiclass_classify reads this file. svm_multiclass_classify is called as follows:

svm_multiclass_classify [options] test_example_file model_file output_file

For all test examples in test_example_file the predicted classes (and the values of x • w_i for each class) are written to output_file. There is one line per test example in output_file in the same order as in test_example_file. The first value in each line is the predicted class, and each of the following numbers are the discriminant values for each of the k classes.

Getting started: an Example Problem

You will find an example problem with 7 classes at

https://osmot.cs.cornell.edu/svm_multiclass/examples/example4.tar.gz

Download this file into your svm_multiclass directory and unpack it with

gunzip -c example4.tar.gz | tar xvf -

This will create a subdirectory example4. There are 300 examples in the file train.dat and 2000 in the file test.dat. To run the example, execute the commands:

svm_multiclass_learn -c 5000 example4/train.dat example4/model
svm_multiclass_classify example4/test.dat example4/model example4/predictions

The accuracy on the test set is printed to stdout.

Extensions and Additions

Ruby Interface: interface to call SVM^multiclass from Ruby, written by Vicente Bosch.

Disclaimer

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

Known Problems

Training with kernels is not really supported, since it is very very slow in the current version.

History

V2.12 - V2.20

Includes version V3.10 of SVM^struct, which makes it substantially faster.
Now writes linear model files in a more compact format (i.e. stores only the weight vectors, not the support vectors).
Kernels now actually work correctly, but using a non-linear kernel is still very slow.
Fixed bug in RBF Kernel.
Fixed precision issues on 64-bit AMD and Intel machines.
Source code for SVM^multiclass V2.12.

V1.01 - V2.12

New training algorithm based on equivalent 1-slack reformulation of the training problem. This makes training on the linear kernel several orders of magnitude faster than in V1.01. See changes introduced by SVM^struct V3.00 for a general properties of the new structural SVM training algorithm.
New IO routines that are faster for reading large data and model files than in
Source code for SVM^multiclass V1.01.

V1.00 - V1.01

svm_multiclass_classify that occurred when true class labels are not given in test file (thanks to Jouni Paulus).

References

[1] K. Crammer and Y. Singer. On the Algorithmic Implementation of Multi-class SVMs, JMLR, 2001.

[2] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004. [Postscript]] [PDF]

[3] T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999. [Postscript (gz)] [PDF]

[4] T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning Journal, to appear. [Draft PDF]

SVMmulticlass