SVMlight

Support Vector Machine

Author: Thorsten Joachims <thorsten@joachims.org>
Cornell University
Department of Computer Science

Developed at:
University of Dortmund, Informatik, AI-Unit
Collaborative Research Center on 'Complexity Reduction in Multivariate Data' (SFB475)

Version: 4.00
Date: 11.02.2002

Overview

SVMlight is an implementation of Support Vector Machines (SVMs) in C. The main features of the program are the following:

There is also another regression support vector machine based on SVMlight available at the AI-Unit: mySVM.

Description

SVMlight is an implementation of Vapnik's Support Vector Machine [Vapnik, 1995] for the problem of pattern recognition and for the problem of regression. The optimization algorithm used in SVMlight is described in [Joachims, 1999a]. The algorithm has scalable memory requirements and can handle problems with many thousands of support vectors efficiently.

This version also provides methods for assessing the generalization performance efficiently. It includes two efficient estimation methods for both error rate and precision/recall. XiAlpha-estimates [Joachims, 2000a, Joachims, 2000b] can be computed at essentially no computational expense, but they are conservatively biased. Almost unbiased estimates provides leave-one-out testing. SVMlight exploits that the results of most leave-one-outs (often more than 99%) are predetermined and need not be computed [Joachims, 2000b].

Futhermore, this version includes an algorithm for training large-scale transductive SVMs. The algorithm proceeds by solving a sequence of optimization problems lower-bounding the solution using a form of local search. A detailed description of the algorithm can be found in [Joachims, 1999c].

SVMlight can also train SVMs with cost models (see [Morik et al., 1999]).

The code has been used on a large range of problems, including text classification [Joachims, 1999c][Joachims, 1998a], several image recognition tasks, and medical applications. Many tasks have the property of sparse instance vectors. This implementation makes use of this property which leads to a very compact and efficient representation.

Source Code and Binaries

The program is free for scientific use. Please contact me, if you are planning to use the software for commercial purposes. The software must not be modified and distributed without prior permission of the author. If you use SVMlight in your scientific work, please cite as

I would also appreciate, if you sent me (a link to) your papers so that I can learn about your research. The implementation was developed on Solaris 2.5 with gcc, but compiles also on SunOS 3.1.4, Solaris 2.7, Linux, IRIX, Windows NT, and Powermac (after small modifications, see FAQ). The source code is available at the following location (NOTE: If your computer does not have a valid DNS entry, the FTP links might not work. Try a different computer or send email.):

http://download.joachims.org/svm_light/v4.00/svm_light.tar.gz

If you just want the binaries, you can download them for the following systems:

Please send me email and let me know that you got svm-light. I will put you on my mailing list to inform you about new versions and bug-fixes. SVMlight comes with a quadratic programming tool for solving small intermediate quadratic programming problems. It is based on the method of Hildreth and D'Espo and solves small quadratic programs very efficiently. Nevertheless, if for some reason you want to use another solver, the new version still comes with an interface to PR_LOQO. The PR_LOQO optimizer was written by A. Smola. It can be requested from http://www.kernel-machines.org/code/prloqo.tar.gz.

Installation

To install SVMlight you need to download svm_light.tar.gz. Create a new directory:

mkdir svm_light

Move svm_light.tar.gz to this directory and unpack it with

gunzip -c svm_light.tar.gz | tar xvf -

Now execute

make or make all

which compiles the system and creates the two executables

svm_learn (learning module)
svm_classify (classification module)

If you do not want to use the built-in optimizer but PR_LOQO instead, create a subdirectory in the svm_light directory with

mkdir pr_loqo

and copy the files pr_loqo.c and pr_loqo.h in there. Now execute

make svm_learn_loqo

If the system does not compile properly, check this FAQ.

How to use

This section explains how to use the SVMlight software. A good introduction to the theory of SVMs is Chris Burges' tutorial.

SVMlight consists of a learning module (svm_learn) and a classification module (svm_classify). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use svm_learn and svm_classify.

svm_learn is called with the following parameters:

svm_learn [options] example_file model_file

Available options are:

General options:
         -?          - this help
         -v [0..3]   - verbosity level (default 1)
Learning options:
         -z {c,r}    -> select between classification and regression
                        (default classification)
         -c float    - C: trade-off between training error
                        and margin (default [avg. x*x]^-1)
         -w [0..]    -> epsilon width of tube for regression
                        (default 0.1)
         -j float    - Cost: cost-factor, by which training errors on
                        positive examples outweight errors on negative
                        examples (default 1) (see [4])
         -b [0,1]    - use biased hyperplane (i.e. x*w+b0) instead
                        of unbiased hyperplane (i.e. x*w0) (default 1)
         -i [0,1]    - remove inconsistent training examples
                        and retrain (default 0)
Performance estimation options:
         -x [0,1]    - compute leave-one-out estimates (default 0)
                        (see [5])
         -o ]0..2]   - value of rho for XiAlpha-estimator and for pruning
                        leave-one-out computation (default 1.0) (see [2])
         -k [0..100] - search depth for extended XiAlpha-estimator
                        (default 0)
Transduction options (see [3]):
         -p [0..1]   - fraction of unlabeled examples to be classified
                        into the positive class (default is the ratio of
                        positive and negative examples in the training data)
Kernel options:
         -t int      - type of kernel function:
                        0: linear (default)
                        1: polynomial (s a*b+c)^d
                        2: radial basis function exp(-gamma ||a-b||^2)
                        3: sigmoid tanh(s a*b + c)
                        4: user defined kernel from kernel.h
         -d int      - parameter d in polynomial kernel
         -g float    - parameter gamma in rbf kernel
         -s float    - parameter s in sigmoid/poly kernel
         -r float    - parameter c in sigmoid/poly kernel
         -u string   - parameter of user defined kernel
Optimization options (see [1]):
         -q [2..]    - maximum size of QP-subproblems (default 10)
         -n [2..q]   - number of new variables entering the working set
                        in each iteration (default n = q). Set n<q to prevent
                        zig-zagging.
         -m [5..]    - size of cache for kernel evaluations in MB (default 40)
                        The larger the faster...
         -e float    - eps: Allow that error for termination criterion
                        [y [w*x+b] - 1] = eps (default 0.001)
         -h [5..]    - number of iterations a variable needs to be
                        optimal before considered for shrinking (default 100)
         -f [0,1]    - do final optimality check for variables removed
                        by shrinking. Although this test is usually
                        positive, there is no guarantee that the optimum
                        was found if the test is omitted. (default 1)
Output options:
         -l char     - file to write predicted labels of unlabeled
                        examples into after transductive learning
         -a char     - write all alphas to this file after learning
                        (in the same order as in the training set)

More details in:
[1] T. Joachims, Making Large-Scale SVM Learning Practical. Advances in
    Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and
    A. Smola (ed.), MIT Press, 1999.
[2] T. Joachims, Estimating the Generalization performance of an SVM
    Efficiently. International Conference on Machine Learning (ICML), 2000.
[3] T. Joachims, Transductive Inference for Text Classification using Support
    Vector Machines. International Conference on Machine Learning (ICML),
    1999.
[4] K. Morik, P. Brockhausen, and T. Joachims, Combining statistical learning
    with a knowledge-based approach - A case study in intensive care
    monitoring. International Conference on Machine Learning (ICML), 1999.
[5] T. Joachims, Learning to Classify Text Using Support Vector Machines.
    Dissertation, Universitaet Dortmund, 2000, to appear with Kluwer early 2002.

A more detailed description of the parameters and how they link to the respective algorithms is given in the appendix of [5] [Joachims/00a].

The input file example_file contains the training examples. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

<class> .=. +1 | -1 | 0
<feature> .=. integer
<value> .=. real
<line> .=. <class> <feature>:<value> <feature>:<value> ... <feature>:<value>

The class label and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The +1 as class label marks a positive example, -1 a negative example respectively. So, for example, the line

-1 1:0.43 3:0.12 9284:0.2

specifies an negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0.

A class label of 0 indicates that this example should be classified using transduction. The predictions for the examples classified by transduction are written to the file specified through the -l option. The order of the predictions is the same as in the training data.

The result of svm_learn is the model which is learned from the training data in example_file. The model is written to model_file. To classify test examples, svm_classify reads this file. svm_classify is called with the following parameters:

svm_classify [options] example_file model_file output_file

Available options are:

-h         Help. 
-v [0..3]  Verbosity level (default 2).
-f [0,1]   0: old output format of V1.0
           1: output the value of decision function (default)

All test examples in example_file are classified and the predicted classes are written to output_file. There is one line per test example in output_file containing the value of the decision function on that example. The sign of this value determines the predicted class. The test example file has the same format as the one for svm_learn. Again, <class> can have the value zero indicating unknown.

If you want to find out more, try this FAQ.

Getting started: an Example Problem

Inductive SVM

You will find an example text classification problem at

ftp://ftp-ai.cs.uni-dortmund.de/pub/Users/thorsten/svm_light/examples/example1.tar.gz

Download this file into your svm_light directory and unpack it with

gunzip -c example1.tar.gz | tar xvf -

This will create a subdirectory example1. Documents are represented as feature vectors. Each feature corresponds to a word stem (9947 features). The task is to learn which Reuters articles are about "corporate acquisitions". There are 1000 positive and 1000 negative examples in the file train.dat. The file test.dat contains 600 test examples. The feature numbers correspond to the line numbers in the file words. To run the example, execute the commands:

svm_learn example1/train.dat example1/model
svm_classify example1/test.dat example1/model example1/predictions

The accuracy on the test set is printed to stdout.

Transductive SVM

To try out the transductive learner, you can use the following dataset. I compiled it from the same Reuters articles as used in the example for the inductive SVM. The dataset consists of only 10 training examples (5 positive and 5 negative) and the same 600 test examples as above. You find it at

ftp://ftp-ai.cs.uni-dortmund.de/pub/Users/thorsten/svm_light/examples/example2.tar.gz

Download this file into your svm_light directory and unpack it with

gunzip -c example2.tar.gz | tar xvf -

This will create a subdirectory example2. To run the example, execute the commands:

svm_learn example2/train_transduction.dat example2/model
svm_classify example2/test.dat example2/model example2/predictions

The classification module is called only to get the accuracy printed. The transductive learner is invoced automatically, since train_transduction.dat contains unlabeled examples (i. e. the 600 test examples). You can compare the results to those of the inductive SVM by running:

svm_learn example2/train_induction.dat example2/model
svm_classify example2/test.dat example2/model example2/predictions

The file train_induction.dat contains the same 10 (labeled) training examples as train_transduction.dat.

Extensions and Additions

Questions and Bug Reports

If you find bugs or you have problems with the code you cannot solve by yourself, please contact me via email <svm-light@ls8.cs.uni-dortmund.de.

Disclaimer

This software is free only for non-commercial use. It must not be modified and distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

History

V3.50 - V4.00

V3.02 - V3.50

V3.01 - V3.02

V3.00 - V3.01

V2.01 - V3.00

V2.00 - V2.01

V1.00 - V2.00

V0.91 - V1.00

V0.9 - V0.91

References

[Joachims/00a]

Thorsten Joachims, Learning to Classify Text Using Support Vector Machines. Dissertation, Universitaet Dortmund, February 2001. To appear with Kluwer early 2002.

[Klinkenberg/Joachims/2000a]

R. Klinkenberg and T. Joachims, Detecting Concept Drift with Support Vector Machines. Proceedings of the Seventeenth International Conference on Machine Learning (ICML), Morgan Kaufmann, 2000.
Online [Postscript (gz)] [PDF (gz)]

[Joachims/00b]

T. Joachims, Estimating the Generalization Performance of a SVM Efficiently. Proceedings of the International Conference on Machine Learning, Morgan Kaufman, 2000.
Online [Postscript (gz)] [PDF]

[Joachims/99a]

T. Joachims, 11 in: Making large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999.
Online [Postscript (gz)] [PDF]

[Joachims/99c]

Thorsten Joachims, Transductive Inference for Text Classification using Support Vector Machines. International Conference on Machine Learning (ICML), 1999.
Online [Postscript (gz)] [PDF]

[Morik/etal/99a]

K. Morik, P. Brockhausen, and T. Joachims, Combining statistical learning with a knowledge-based approach - A case study in intensive care monitoring. Proc. 16th Int'l Conf. on Machine Learning (ICML-99), 1999.
Online [Postscript (gz)] [PDF]

[Joachims/98a]

T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proceedings of the European Conference on Machine Learning, Springer, 1998.
Online [Postscript (gz)] [PDF]

[Joachims/98c]

Thorsten Joachims, Making Large-Scale SVM Learning Practical. LS8-Report, 24, Universität Dortmund, LS VIII-Report, 1998.
Online [Postscript (gz)] [PDF]

[Vapnik/95a]

Vladimir N. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995.

Other SVM Resources

Last modified May 13th, 2002 by Thorsten Joachims <thorsten@joachims.org>