Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms
Thorsten Joachims
Kluwer Academic Publishers / Springer
Abstract
Text classification, the task of automatically assigning semantic categories to natural-language text, has become one of the key methods for organizing online information. Since hand-coding classification rules is costly or even impractical, most modern approaches employ machine learning techniques to learn text classifiers automatically from examples. However, none of these conventional approaches combines good prediction performance, theoretical understanding, and efficient training algorithms.

Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without relying on greedy heuristic components. The SVM approach is computationally efficient in both training and classification, and it comes with a learning theory that can guide real-world applications.

Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. The book gives a concise introduction to SVMs for pattern recognition, along with a detailed description of how to formulate text-classification tasks for machine learning.

Learning To Classify Text Using Support Vector Machines is designed as a reference for researchers and practitioners, and is suitable as a secondary text for graduate-level Computer Science students specializing in Machine Learning and Language Technology.
Table of Contents
Foreword by Prof. Tom Mitchell and Prof. Katharina Morik (p. xi)
Preface (p. xiii)
Acknowledgments (p. xv)
Notation (p. xvii)
1. Introduction (p. 1)
2. Text Classification (p. 7)
3. Support Vector Machines (p. 35)
Part: Theory
4. A Statistical Learning Model of Text Classification for SVMs (p. 45)
5. Efficient Performance Estimators for SVMs (p. 75)
Part: Methods
6. Inductive Text Classification (p. 103)
7. Transductive Text Classification (p. 119)
Part: Algorithms
8. Training Inductive Support Vector Machines (p. 141)
9. Training Transductive Support Vector Machines (p. 163)
10. Conclusions (p. 175)
11. Open Question (p. 177)
Bibliography (p. 180)
Appendix: SVM-Light Commands and Options (p. 197)
Index (p. 203)
Bio
Thorsten Joachims is an Assistant Professor in the Department of Computer Science at Cornell University. In 2001 he completed his Ph.D. as a student of Prof. Katharina Morik at the AI Unit of the University of Dortmund, from which he also received a Diplom in Computer Science in 1997. Between 2000 and 2001 he worked as a postdoc at the GMD in the Knowledge Discovery Team of the Institute for Autonomous Intelligent Systems. From 1994 to 1996 he spent one and a half years at Carnegie Mellon University as a visiting scholar of Prof. Tom Mitchell.