SVM^cfg

Support Vector Machine for Weighted Context Free Grammars

Author: Thorsten Joachims <thorsten@joachims.org>
Cornell University
Department of Computer Science

Version: 3.00
Date: 01.07.2007

Overview

SVM^cfg is an implementation of the Support Vector Machine (SVM) algorithm for learning a weighted context free grammar as described in [1]. The goal is to learn an accurate model from supervised training data, so that this model predicts the correct tree y for a given input x (as, e.g., in natural language parsing). It includes a modified version of the CKY parser written by Mark Johnson, which is described in [2].

Compared to a conventional probablistic context free grammar (PCFG), the SVM algorithm can handle overlapping and dependent attributes attached to grammar rules. In particular, the weight of an instantiated rule can depend on the complete terminal sequence, the span of the rule, and the spans of the children trees. This implementation is not meant to be a specialized and properly tuned parser for natural language, but a flexible and extensible tool for learning models in a wide range of domains. In particular, it is easy to add attributes that reflect the properties for the particular domain at hand.

SVM^cfg V3.00 uses the one-slack formulation introduced in SVM^struct V3.00, which makes it substantially faster than previous versions. More information on SVM^struct is available here.

Source Code and Binaries

The program is free for scientific use. Please contact me, if you are planning to use the software for commercial purposes. The software must not be further distributed without prior permission of the author. The implementation was developed on Linux with gcc, but compiles also on Cygwin, Windows (using the MinGW option of Cygwin), and Mac (after small modifications, see FAQ). The source code is available at the following location:

http://download.joachims.org/svm_cfg/v3.00/svm_cfg.tar.gz

If you just want the binaries, you can download them for the following systems:

Please send me email and let me know that you got it. The archive contains the source code of the most recent version of SVM^cfg, which includes the source code of SVM^struct, the CKY parser of Mark Johnson, and the SVM^light quadratic optimizer. Unpack the archive using the shell command:

      gunzip –c svm_cfg.tar.gz | tar xvf –

This expands the archive into the current directory, which now contains all relevant files. You can compile SVM^cfg using the command:

           make

This will produce the executables svm_cfg_learn and svm_cfg_classify. If the system does not compile properly, check this FAQ.

How to Use

SVM^cfg consists of a learning module (svm_cfg_learn) and a classification module (svm_cfg_classify). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use svm_cfg_learn and svm_cfg_classify.

Usage is much like SVM^light. You call it like

      svm_cfg_learn -c 1.0 example_file model_file

which trains an SVM on the training set example_file and outputs the learned grammar to the files model_file.svm and model_file.grammar using the regularization parameter C set to 1.0. Note that the loss functions are scaled by the factor 100/n, which affects the scaling and the sensible range of C. Other options are:

General options:

         -?          -> this help

         -v [0..3]   -> verbosity level (default 1)

         -y [0..3]   -> verbosity level for svm_light (default 0)

Learning options:

         -c float    -> C: trade-off between training error

                        and margin (default 0.01)

         -p [1,2]    -> L-norm to use for slack variables. Use 1 for L1-norm,

                        use 2 for squared slacks. (default 1)

         -o [1,2]    -> Rescaling method to use for loss.

                        1: slack rescaling

                        2: margin rescaling

                        (default 2)

         -l [0..]    -> Loss function to use.

                        0: zero/one loss

                        (default 0)

Kernel options:

         -t int      -> type of kernel function:

                        0: linear (default)

                        1: polynomial (s a*b+c)^d

                        2: radial basis function exp(-gamma ||a-b||^2)

                        3: sigmoid tanh(s a*b + c)

                        4: user defined kernel from kernel.h

         -d int      -> parameter d in polynomial kernel

         -g float    -> parameter gamma in rbf kernel

         -s float    -> parameter s in sigmoid/poly kernel

         -r float    -> parameter c in sigmoid/poly kernel

         -u string   -> parameter of user defined kernel

Optimization options (see [2][3]):

         -w [1,2,3,4]-> choice of structural learning algorithm (default 4):

                        1: algorithm described in [2]

                        2: joint constraint algorithm (primal) [to be published]

                        3: joint constraint algorithm (dual) [to be published]

                        4: joint constraint algorithm (dual) with constr. cache

         -q [2..]    -> maximum size of QP-subproblems (default 10)

         -n [2..q]   -> number of new variables entering the working set

                        in each iteration (default n = q). Set n < q to prevent

                        zig-zagging.

         -m [5..]    -> size of cache for kernel evaluations in MB (default 40)

                        (used only for -w 1 with kernels)

         -f [5..]    -> number of constraints to cache for each example

                        (default 5) (used with -w 4)

         -e float    -> eps: Allow that error for termination criterion

                        (default 0.100000)

         -h [5..]    -> number of iterations a variable needs to be

                        optimal before considered for shrinking (default 100)

         -k [1..]    -> number of new constraints to accumulate before

                        recomputing the QP solution (default 100) (-w 1 only)

         -# int      -> terminate QP subproblem optimization, if no progress

                        after this number of iterations. (default 100000)

Output options:

         -a string   -> write all alphas to this file after learning

                        (in the same order as in the training set)

Structure learning options:

         --s [1..]   -> Maximum length of sentences to use (default 10)

         --p {0,1}   -> Use parent annotation for rules (default 0)

         --a {0,1}   -> Use borders as feature (default 0)

         --b {0,1}   -> Use length of parent span as feature (default 0)

         --c {0,1}   -> Use length of childrens spans as features (default 0)

         --d {0,1}   -> Use difference between children spans as feature

                        (default 0)

         -l {0,1}    NOTE: Added loss type '1' in addition to the 0/1-loss '0'.

                     Loss type '1' uses (1-F_1) as the loss as in [2].

The option --s allows you to specify a maximum sentence length, so that all longer sentences are ignored in the experiment. Option --p turns on parent annotation of rules as described in [2]. The remaining structure option (--a to --d) select different types of features to include in the representation. For more details on the meaning of these options consult the description of SVM^struct , the description of SVM^light, and references [1][3].

The input file example_file contains the training examples. The file format is the same in the Penn Treebank with the slight modifications described in [2] (e.g. removal of epsilon rules). For example, the following bracketed tree is an example with the terminal symbols in the leaves.

('root' (S (NP (DET the) (N cat)) (VP (V ran))))

Note that all trees have to start with the special root symbol 'root'. Note also that the current version of the code ignores the terminals, but starts with the pre-terminals.

The result of svm_cfg_learn is the model which is learned from the training data in example_file. The model is written to two files starting with model_file and having suffixes .svm and .grammar. To make predictions on test examples, svm_cfg_classify reads this file. svm_cfg_classify is called as follows:

svm_cfg_classify [options] test_example_file model_file output_file

For all test examples in test_example_file the predicted parse trees are written to output_file. There is one line per test example in output_file in the same order as in test_example_file. The test file has the same format as the training file (i.e. if you do not know the correct parse tree for the test example, you have to make up a bogus one).

Getting started: an Example Problem

You will find a small example problem with 29 examples from the Penn Treebank at

http://download.joachims.org/svm_cfg/examples/example5.tar.gz

Download this file into your svm_cfg directory and unpack it with

gunzip -c example5.tar.gz | tar xvf -

This will create a subdirectory example5. There are 29 examples in the file train.dat. To run the example, execute the commands:

svm_cfg_learn --s 40 -c 1.0 -l 1 --a 1 --b 1 --c 1 example5/train.dat example5/model
svm_cfg_classify example5/train.dat example5/model example5/predictions

The first command trains using all sentences of length at most 40, the regularization parameter C equal to 1.0, the F₁-loss, and features a,b,c. The second command reclassifies the training set. The accuracy and the average loss on the training set is printed to stdout.

Disclaimer

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

Known Problems

On 64-bit AMD and Intel machines, there can be a precision problem when using the w3 or w4 algorithms on problems with many features. To fix this problem, change "#define CFLOAT" to "double" in svm_common.h. Thank you to Chih-Jen Lin for pointing this out.
The RBF kernel is broken.

History

V2.01 - V3.00

New training algorithm based on equivalent 1-slack reformulation of the training problem. This makes training on the linear kernel several orders of magnitude faster than in V2.01. See changes introduced by SVM^struct V3.00 for a general properties of the new structural SVM training algorithm.
New IO routines that are faster for reading large data files.
Source code for SVM^cfg V2.01.

V2.00 - V2.01

Fixed bug that lead to ignoring additional rule features in svm_cfg_classify .

References

[1] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004. [Postscript] [PDF]

[2] M. Johnson, PCFG models of linguistic tree representations. Computational Linguistics, 1999.

[3] T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999. [Postscript (gz)] [PDF]

SVMcfg