Learning to Classify Text Using Support Vector Machines
Methods, Theory, and Algorithms
Academic Publishers / Springer
Text classification, the task of automatically assigning semantic categories to natural language text, has become one of the key methods for organizing online information. Hand-coding classification rules is costly or even impractical, so most modern approaches use machine learning techniques to learn text classifiers automatically from examples. However, none of these conventional approaches combines good prediction performance, theoretical understanding, and efficient training algorithms.
Based on ideas from Support Vector Machines (SVMs), Learning to Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in both training and classification, and it comes with a learning theory that can guide real-world applications.
Learning to Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers. It covers training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. It also includes an overview of the field of text classification, which makes it self-contained even for newcomers. The book gives a concise introduction to SVMs for pattern recognition and describes in detail how to formulate text-classification tasks for machine learning.
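A minimal sketch can show what it means to formulate text classification as a machine learning task and learn a linear SVM from labeled examples. It assumes a bag-of-words representation, with each document mapped to a unit-length term-frequency vector. It trains the SVM with Pegasos-style stochastic subgradient descent on the regularized hinge loss. That training method is a swapped-in technique chosen for brevity and is not the training algorithm the book describes. The helpers `tokenize`, `build_vocabulary`, `to_vector`, `train_linear_svm`, and `classify` are hypothetical names, and the four-document corpus is invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric runs (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(docs):
    # Assign each distinct word its own feature index (one dimension per word).
    vocab = {}
    for doc in docs:
        for word in tokenize(doc):
            vocab.setdefault(word, len(vocab))
    return vocab

def to_vector(doc, vocab):
    # Sparse term-frequency vector {index: weight}, scaled to unit Euclidean length.
    # Words missing from the vocabulary are ignored.
    counts = Counter(vocab[w] for w in tokenize(doc) if w in vocab)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {i: c / norm for i, c in counts.items()} if norm else {}

def dot(w, x):
    # Inner product of a dense weight list with a sparse vector.
    return sum(w[i] * v for i, v in x.items())

def train_linear_svm(vectors, labels, dim, lam=0.01, epochs=20):
    # Pegasos-style subgradient descent on
    #   lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)),
    # visiting examples in a fixed order, with no bias term and no projection step.
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            margin = y * dot(w, x)           # margin under the current weights
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularization)
            if margin < 1.0:                 # hinge loss active: step toward y * x
                for i, v in x.items():
                    w[i] += eta * y * v
    return w

def classify(w, doc, vocab):
    # Sign of the linear decision function: +1 or -1.
    return 1 if dot(w, to_vector(doc, vocab)) > 0 else -1

# Toy corpus (invented): +1 = sports, -1 = finance.
train = [
    ("the striker scored a late goal", +1),
    ("the bank raised interest rates", -1),
    ("the team won the match", +1),
    ("stock prices fell as rates rose", -1),
]
docs = [d for d, _ in train]
labels = [y for _, y in train]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
w = train_linear_svm(vectors, labels, len(vocab))
print(classify(w, "a late goal", vocab))     # 1
print(classify(w, "interest rates", vocab))  # -1
```

With these step sizes, the learned weight vector is a positive combination of the label-signed training vectors that violated the margin. Words that occur in only one class therefore always pull the score toward that class. Real collections would need a larger vocabulary, a weighting scheme such as TF-IDF, and a dedicated SVM solver.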
Learning to Classify Text Using Support Vector Machines is designed as a reference for researchers and practitioners. It is also suitable as a secondary text for graduate-level students of computer science specializing in machine learning and language technology.
Table of Contents
Foreword by Prof. Tom Mitchell and Prof. Katharina Morik
1. INTRODUCTION
2. TEXT CLASSIFICATION
3. SUPPORT VECTOR MACHINES
4. A STATISTICAL LEARNING MODEL OF TEXT CLASSIFICATION FOR SVMS
5. EFFICIENT PERFORMANCE ESTIMATORS FOR SVMS
6. INDUCTIVE TEXT CLASSIFICATION
7. TRANSDUCTIVE TEXT CLASSIFICATION
8. TRAINING INDUCTIVE SUPPORT VECTOR MACHINES
9. TRAINING TRANSDUCTIVE SUPPORT VECTOR MACHINES
10. CONCLUSIONS
11. Open Question
Appendix: SVM-Light Commands and Options
Thorsten Joachims is an Assistant Professor in the Department of Computer Science at Cornell University. He completed his Ph.D. in 2001 as a student of Prof. Morik at the AI Unit of the University of Dortmund, where he had also received a Diplom in Computer Science in 1997. From 2000 to 2001 he worked as a postdoc at the GMD in the Knowledge Discovery Team of the Institute for Autonomous Intelligent Systems. From 1994 to 1996 he spent one and a half years at Carnegie Mellon University as a visiting scholar of Prof.