## SVM^{perf}: Support Vector Machine for Multivariate Performance Measures

Author: Thorsten Joachims <thorsten@joachims.org>
Version: 2.00

*SVM ^{perf}* is an implementation of the Support Vector Machine (SVM)
formulation for optimizing multivariate performance measures described in
[Joachims, 2005]. Furthermore, it can be used to:

- Optimize binary SVM classification rules directly for ROC-Area, F1-Score, and Precision/Recall Break-Even Point (see [Joachims, 2005]).
- Train conventional linear classification SVMs optimizing error rate in time that is linear in the size of the training data, through an alternative but equivalent formulation of the training problem (see [Joachims, 2006]). This can be much faster than *SVM ^{light}* for large training sets.
- Train conventional linear ordinal regression SVMs [Herbrich et al., 1999] optimizing the number of misordered pairs in time that is O(n log n) in the size of the training data, through an alternative but equivalent formulation of the training problem (see [Joachims, 2006]). Currently this is implemented only for rankings with two ranks.

This implementation is an instance of *SVM ^{struct}*. You can download the
source code of *SVM ^{perf}* from:

http://download.joachims.org/svm_perf/v2.00/svm_perf.tar.gz

Please send me email and let me know that you got it. The archive contains the source code of the most recent version of *SVM ^{perf}*, which includes the source code of *SVM ^{struct}* and the *SVM ^{light}* quadratic optimizer.

gunzip -c svm_perf.tar.gz | tar xvf -

This expands the archive into the current directory, which now contains all relevant files. You can compile *SVM ^{perf}* with the command

make

This will produce the executables `svm_perf_learn` and `svm_perf_classify`.

*SVM ^{perf}* consists of a learning module (`svm_perf_learn`) and a classification module (`svm_perf_classify`).

Usage is much like *SVM ^{light}*. You call it like

svm_perf_learn -c 20.0 train.dat model.dat

which trains an SVM on the training set `train.dat` and writes the learned rule to `model.dat`, using the regularization parameter C set to 20.0. The "structure learning options" listed below are particular to *SVM ^{perf}*:

    General options:
        -?          -> this help
        -v [0..3]   -> verbosity level (default 1)
        -y [0..3]   -> verbosity level for svm_light (default 0)
    Learning options:
        -c float    -> C: trade-off between training error and margin
                       (default 0.01)
        -p [1,2]    -> L-norm to use for slack variables. Use 1 for L1-norm,
                       use 2 for squared slacks. (default 1)
        -o [1,2]    -> Rescaling method to use for loss.
                       1: slack rescaling
                       2: margin rescaling
                       (default 2)
        -l [0..]    -> Loss function to use.
                       0: zero/one loss
                       (default 2)
    Kernel options:
        -t int      -> type of kernel function:
                       0: linear (default)
                       1: polynomial (s a*b+c)^d
                       2: radial basis function exp(-gamma ||a-b||^2)
                       3: sigmoid tanh(s a*b + c)
                       4: user defined kernel from kernel.h
        -d int      -> parameter d in polynomial kernel
        -g float    -> parameter gamma in rbf kernel
        -s float    -> parameter s in sigmoid/poly kernel
        -r float    -> parameter c in sigmoid/poly kernel
        -u string   -> parameter of user defined kernel
    Optimization options (see [2][3]):
        -q [2..]    -> maximum size of QP-subproblems (default 10)
        -n [2..q]   -> number of new variables entering the working set
                       in each iteration (default n = q)
        -m [5..]    -> size of cache for kernel evaluations in MB (default 40)
                       The larger the faster...
        -e float    -> eps: Allow that error for termination criterion
                       (default 0.100000)
        -h [5..]    -> number of iterations a variable needs to be optimal
                       before considered for shrinking (default 100)
        -k [1..]    -> number of new constraints to accumulate before
                       recomputing the QP solution (default 100)
        -# int      -> terminate optimization, if no progress after this
                       number of iterations (default 100000)
    Output options:
        -a string   -> write all alphas to this file after learning
                       (in the same order as in the training set)
    Structure learning options:
        --b float   -> value of L2-bias feature. A value of 0 implies not
                       having a bias feature. (default 1)
                       WARNING: This is implemented only for linear kernel!
        --p [0..]   -> fraction of positive examples to use as value of k for
                       Prec@k and Rec@k. 0 indicates to use (0.5 * #pos) for
                       Prec@k and (2 * #pos) for Rec@k. #pos is the number of
                       positive examples in the training set. (default 0)
        --t [0..]   -> use sparse kernel expansion. Values like those for the
                       normal kernel (i.e. option -t). (default 0)
        --k [0..]   -> number of basis functions to use for the sparse kernel
                       approximation. (default 500)
        --f string  -> file that contains the basis functions to use for the
                       sparse kernel approximation. (default training file)

The following loss functions can be selected with the -l option:

     0   Zero/one loss: 1 if the vector of predictions contains an error, 0 otherwise.
     1   F1: 100 minus the F1-score in percent.
     2   Errorrate: percentage of errors in the prediction vector.
     3   Prec/Rec Breakeven: 100 minus PRBEP in percent.
     4   Prec@k: 100 minus precision at k in percent.
     5   Rec@k: 100 minus recall at k in percent.
    10   ROCArea: percentage of swapped pos/neg pairs (i.e. 100 - ROCArea).
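As a rough illustration of how the losses above are defined, here is a small Python sketch (not code from the *SVM ^{perf}* source; the function names are my own, and ties are counted as correctly ordered pairs, which may differ from the implementation) computing the Errorrate, F1, and ROCArea losses for a vector of predictions:

```python
def errorrate_loss(y_true, y_pred):
    """Loss 2 (Errorrate): percentage of errors in the prediction vector."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * errors / len(y_true)

def f1_loss(y_true, y_pred):
    """Loss 1 (F1): 100 minus the F1-score in percent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == +1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    if tp == 0:
        return 100.0  # no true positives -> F1 is 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return 100.0 - 100.0 * f1

def rocarea_loss(y_true, scores):
    """Loss 10 (ROCArea): percentage of swapped pos/neg pairs."""
    pos = [s for t, s in zip(y_true, scores) if t == +1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    # a pair is "swapped" when a negative example outscores a positive one
    swapped = sum(1 for p in pos for n in neg if p < n)
    return 100.0 * swapped / (len(pos) * len(neg))
```

For instance, with `y_true = [+1, +1, -1, -1]` and predictions `[+1, -1, +1, -1]`, half the labels are wrong, so the Errorrate loss is 50.0.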

SVM^{perf} learns an unbiased linear classification rule (i.e.
a rule `sign(w*x)` without an explicit threshold). Using the parameter '--b',
an artificial feature with constant value is added to each training example;
this feature acts like a threshold. The loss function to be optimized is
selected using the '-l' option. Note that I never really tested Prec@k and
Rec@k, so take those with a grain of salt.
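To make the effect of '--b' concrete, here is a tiny Python sketch (illustrative only, not code from *SVM ^{perf}*): the learned rule is `sign(w*x)`, and a nonzero bias value appends one constant feature to every example, whose learned weight then plays the role of a threshold.

```python
def predict(w, x, b=1.0):
    """Classify x with an unbiased linear rule sign(w*x).

    If b != 0, an artificial feature with constant value b is appended
    to x (as '--b' does), so the last weight in w acts like a threshold."""
    if b != 0.0:
        x = x + [b]  # the artificial bias feature
    score = sum(wi * xi for wi, xi in zip(w, x))
    return +1 if score > 0 else -1
```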

You can in principle use kernels in SVM^{perf} using the '-t'
option just like in SVM^{light}, but it is painfully slow. Much
faster (but only approximate) is using a sparse kernel expansion via the option
'--t' (to be described in a future paper). The number of basis functions to use
is specified with '--k'. The more basis functions, the more accurate the
approximation but the longer the training time. Normally, the
basis functions are sampled from the training set, but you can give a file with
an explicit set of basis functions using the '--f' option. The file has the same
format as a training/test file.

The file format of the training and test files is the same as for *SVM ^{light}*. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

<target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>

The target value and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. For classification, the target value denotes the class of the example and is either +1 or -1. So, for example, the line

-1 1:0.43 3:0.12 9284:0.2 # abcdef

specifies a negative example (i.e. class -1) for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string `abcdef` is stored with the vector, which can serve as a way of providing additional information when adding user defined kernels.
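A reader for this sparse format can be sketched in a few lines of Python (an illustration only; *SVM ^{perf}* has its own parser in C):

```python
def parse_example(line):
    """Parse one line in SVM^light sparse format:
    <target> <feature>:<value> ... [# <info>]
    Returns (target, feature_dict, info_string)."""
    data, _, info = line.partition('#')
    tokens = data.split()
    target = int(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(':')
        features[int(index)] = float(value)  # unlisted features default to 0
    return target, features, info.strip()
```

For the line above, `parse_example("-1 1:0.43 3:0.12 9284:0.2 # abcdef")` returns `(-1, {1: 0.43, 3: 0.12, 9284: 0.2}, "abcdef")`.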

The result of `svm_perf_learn` is the model which is learned from the training data in
`train.dat`. The model is written to `model.dat`. To make predictions on test examples, `svm_perf_classify` reads this file. `svm_perf_classify` is called as follows:

svm_perf_classify [options] test.dat model.dat predictions

For all test examples in `test.dat` the predicted classes are written to
the file `predictions`. There is one line per test example in `predictions` in the same order as in
`test.dat`.

You will find an example problem at

http://download.joachims.org/svm_light/examples/example1.tar.gz

Download this file into your svm_perf directory and unpack it with

gunzip -c example1.tar.gz | tar xvf -

This will create a subdirectory `example1`. There are 2000 examples in the file `train.dat` and
600 in the file `test.dat`. To run the example, execute the commands:

`svm_perf_learn -c 20 -l 2 --b 0 example1/train.dat example1/model`

`svm_perf_classify example1/test.dat example1/model example1/predictions`

The accuracy on the test set is printed to stdout. The equivalent call to SVM^{light}
that will find the same classification rule (up to numerical precision) is
`svm_learn -c 1 -b 0 example1/train.dat example1/model`.

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

Changes in Version 2.00:

- Extended algorithm for fast training of linear models, and added the sparse kernel approximation.

References:

- T. Joachims, *Training Linear SVMs in Linear Time*, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2006.
- T. Joachims, *A Support Vector Method for Multivariate Performance Measures*, Proceedings of the International Conference on Machine Learning (ICML), 2005.
- I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, *Large Margin Methods for Structured and Interdependent Output Variables*, Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.
- I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, *Support Vector Machine Learning for Interdependent and Structured Output Spaces*, International Conference on Machine Learning (ICML), 2004.
- T. Joachims, *Making Large-Scale SVM Learning Practical*. In: Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola (eds.), MIT Press, 1999.