## SVM
^{multiclass} ## Multi-Class Support Vector MachineAuthor: Thorsten Joachims <thorsten@joachims.org> Version: 2.20 |

*SVM ^{multiclass}* uses the multi-class formulation described in [1],
but optimizes it with an algorithm that is very fast in the linear case.
For a training set (x

min 1/2 Σ_{i=1..k} w_{i}*w_{i} + C/n
Σ_{i = 1..n} ξ_{i}

s.t. for all y
in [1..k]:
[ x_{1}
• w_{yi} ]
>= [ x_{1}
• w_{y} ] + 100*Δ(y_{1},y) - ξ_{1
...
}
for all y in [1..k]:
[ x_{n}
• w_{yn} ]
>= [ x_{n}
• w_{y} ] + 100*Δ(y_{n},y) - ξ_{n}

C is the usual regularization parameter that trades off margin size and
training error. Δ(y_{n},y)
is the loss function that returns 0 if y_{n}
equals y, and 1 otherwise.

To solve this optimization problem, *SVM ^{multiclass}* uses an
algorithm that is different from the one in [1]. The algorithm
is based on Structural SVMs [2] and it is an instance of

The source code is available at the following location:

- Linux (32-bit): http://download.joachims.org/svm_multiclass/current/svm_multiclass_linux32.tar.gz
- Linux (64-bit): http://download.joachims.org/svm_multiclass/current/svm_multiclass_linux64.tar.gz
- Cygwin: http://download.joachims.org/svm_multiclass/current/svm_multiclass_cygwin.tar.gz
- Windows: http://download.joachims.org/svm_multiclass/current/svm_multiclass_windows.zip

Please send me email and let me know that you got it. The archive contains the source code of the most recent version of *SVM ^{multiclass}*, which includes the source code of

gunzip –c svm_multiclass.tar.gz | tar xvf –This expands the archive into the current directory, which now contains all relevant files. You can compile

makein the top-level directory of the archive. This will produce the executables

SVM* ^{multiclass}* consists of a learning module (

svm_multiclass_learn -c 1.0 example_file model_filewhich trains an SVM on the training set

General Options: -? -> this help -v [0..3] -> verbosity level (default 1) -y [0..3] -> verbosity level for svm_light (default 0) Learning Options: -c float -> C: trade-off between training error and margin (default 0.01) -p [1,2] -> L-norm to use for slack variables. Use 1 for L1-norm, use 2 for squared slacks. (default 1) -o [1,2] -> Rescaling method to use for loss. 1: slack rescaling 2: margin rescaling (default 2) -l [0..] -> Loss function to use. 0: zero/one loss ?: see below in application specific options (default 0) Optimization Options (see [2][4]): -w [0,..,9] -> choice of structural learning algorithm (default 4): 0: n-slack algorithm described in [2] 1: n-slack algorithm with shrinking heuristic 2: 1-slack algorithm (primal) described in [4] 3: 1-slack algorithm (dual) described in [4] 4: 1-slack algorithm (dual) with constraint cache [4] 9: custom algorithm in svm_struct_learn_custom.c -e float -> epsilon: allow that tolerance for termination criterion (default 0.100000) -k [1..] -> number of new constraints to accumulate before recomputing the QP solution (default 100) (-w 0 and 1 only) -f [5..] -> number of constraints to cache for each example (default 5) (used with -w 4) -b [1..100] -> percentage of training set for which to refresh cache when no epsilon violated constraint can be constructed from current cache (default 100%) (used with -w 4) -n [2..q] -> number of new variables entering the working set in each svm-light iteration (default n = q). Set n < q to prevent zig-zagging. -m [5..] -> size of svm-light cache for kernel evaluations in MB (default 40) (used only for -w 1 with kernels) -h [5..] -> number of svm-light iterations a variable needs to be optimal before considered for shrinking (default 100) -# int -> terminate svm-light QP subproblem optimization, if no progress after this number of iterations. (default 100000) Kernel Options: -t int -> type of kernel function: 0: linear (default) 1: polynomial (s a*b+c)^d 2: radial basis function exp(-gamma ||a-b||^2) 3: sigmoid tanh(s a*b + c) 4: user defined kernel from kernel.h -d int -> parameter d in polynomial kernel -g float -> parameter gamma in rbf kernel -s float -> parameter s in sigmoid/poly kernel -r float -> parameter c in sigmoid/poly kernel -u string -> parameter of user defined kernel Output Options: -a string -> write all alphas to this file after learning (in the same order as in the training set) Application-Specific Options: noneFor more details on the meaning of these options consult the description of

The input file `example_file` contains the training examples. The file format is the same as for *SVM ^{light}*, just that the target value is now a positive integer that indicates the class. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

The target value and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The target value denotes the class of the example via a positive (non-zero) integer. So, for example, the line

3 1:0.43 3:0.12 9284:0.2 # abcdef

specifies an example of class 3 for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string `abcdef` is stored with the vector, which can serve as a way of providing additional information when adding user defined kernels.

The result of `svm_multiclass_learn` is the model which is learned from the training data in `example_file`. The model is written to `model_file`. To make predictions on test examples, `svm_multiclass_classify` reads this file. `svm_multiclass_classify` is called as follows:

svm_multiclass_classify [options] test_example_file model_file output_file

For all test examples in `test_example_file` the predicted classes
(and the values of x
• w_{i} for each class) are written to `output_file`. There is one line per test example in `output_file` in the same order as in `
test_example_file`.
The first value in each line is the predicted class, and each of the following
numbers are the discriminant values for each of the k classes.

You will find an example problem with 7 classes at

http://download.joachims.org/svm_multiclass/examples/example4.tar.gz

Download this file into your svm_multiclass directory and unpack it with

gunzip -c example4.tar.gz | tar xvf -

This will create a subdirectory `example4`. There are 300 examples in the file `train.dat` and 2000 in the file `test.dat`. To run the example, execute the commands:

`svm_multiclass_learn -c 5000 example4/train.dat example4/model`

`svm_multiclass_classify example4/test.dat example4/model example4/predictions`

The accuracy on the test set is printed to stdout.

- Ruby
Interface: interface to call SVM
^{multiclass}from Ruby, written by Vicente Bosch.

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

- Training with kernels is not really supported, since it is very very slow in the current version.

- Includes version V3.10 of
*SVM*, which makes it substantially faster.^{struct} - Now writes linear model files in a more compact format (i.e. stores only the weight vectors, not the support vectors).
- Kernels now actually work correctly, but using a non-linear kernel is still very slow.
- Fixed bug in RBF Kernel.
- Fixed precision issues on 64-bit AMD and Intel machines.
- Source code for
*SVM*V2.12.^{multiclass}

- New training algorithm based on equivalent 1-slack reformulation of the
training problem. This makes training on the linear kernel several orders of
magnitude faster than in V1.01. See changes introduced by
*SVM*V3.00 for a general properties of the new structural SVM training algorithm.^{struct} - New IO routines that are faster for reading large data and model files than in
- Source code for
*SVM*V1.01.^{multiclass}

**[1]** K. Crammer and Y. Singer. On the Algorithmic Implementation of Multi-class SVMs, JMLR, 2001.

**[2]** I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004. [Postscript]] [PDF]

**[3]** T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999. [Postscript (gz)] [PDF]

**[4] **T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of
Structural SVMs, Machine Learning Journal, to appear.
[Draft PDF]