E-book, English, 473 pages
Abe, Support Vector Machines for Pattern Classification
2nd edition, 2010
Series: Advances in Computer Vision and Pattern Recognition
ISBN: 978-1-84996-098-4
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
A guide to the use of SVMs in pattern classification, including a rigorous performance comparison of classifiers and regressors. The book presents architectures for multiclass classification and function approximation problems, as well as evaluation criteria for classifiers and regressors.

Features:
- Clarifies the characteristics of two-class SVMs
- Discusses kernel methods for improving the generalization ability of neural networks and fuzzy systems
- Contains ample illustrations and examples
- Includes performance evaluation using publicly available data sets
- Examines Mahalanobis kernels, the empirical feature space, and the effect of model selection by cross-validation
- Covers sparse SVMs, learning using privileged information, semi-supervised learning, multiple classifier systems, and multiple kernel learning
- Explores batch training based on incremental training, active-set training methods, and decomposition techniques for linear programming SVMs
- Discusses variable selection for support vector regressors
Authors/Editors
Further Information & Material
Preface  6
Acknowledgments  12
Contents  14
Symbols  20
1 Introduction  21
  1.1 Decision Functions  22
    1.1.1 Decision Functions for Two-Class Problems  22
    1.1.2 Decision Functions for Multiclass Problems  24
  1.2 Determination of Decision Functions  28
  1.3 Data Sets Used in the Book  29
  1.4 Classifier Evaluation  33
  References  36
2 Two-Class Support Vector Machines  40
  2.1 Hard-Margin Support Vector Machines  40
  2.2 L1 Soft-Margin Support Vector Machines  47
  2.3 Mapping to a High-Dimensional Space  50
    2.3.1 Kernel Tricks  50
    2.3.2 Kernels  52
    2.3.3 Normalizing Kernels  62
    2.3.4 Properties of Mapping Functions Associated with Kernels  63
    2.3.5 Implicit Bias Terms  66
    2.3.6 Empirical Feature Space  69
  2.4 L2 Soft-Margin Support Vector Machines  75
  2.5 Advantages and Disadvantages  77
    2.5.1 Advantages  77
    2.5.2 Disadvantages  78
  2.6 Characteristics of Solutions  79
    2.6.1 Hessian Matrix  79
    2.6.2 Dependence of Solutions on C  81
    2.6.3 Equivalence of L1 and L2 Support Vector Machines  86
    2.6.4 Nonunique Solutions  89
    2.6.5 Reducing the Number of Support Vectors  97
    2.6.6 Degenerate Solutions  100
    2.6.7 Duplicate Copies of Data  102
    2.6.8 Imbalanced Data  104
    2.6.9 Classification for the Blood Cell Data  104
  2.7 Class Boundaries for Different Kernels  107
  2.8 Developing Classifiers  112
    2.8.1 Model Selection  112
    2.8.2 Estimating Generalization Errors  112
    2.8.3 Sophistication of Model Selection  116
    2.8.4 Effect of Model Selection by Cross-Validation  117
  2.9 Invariance for Linear Transformation  121
  References  125
3 Multiclass Support Vector Machines  132
  3.1 One-Against-All Support Vector Machines  133
    3.1.1 Conventional Support Vector Machines  133
    3.1.2 Fuzzy Support Vector Machines  135
    3.1.3 Equivalence of Fuzzy Support Vector Machines and Support Vector Machines with Continuous Decision Functions  138
    3.1.4 Decision-Tree-Based Support Vector Machines  141
  3.2 Pairwise Support Vector Machines  146
    3.2.1 Conventional Support Vector Machines  146
    3.2.2 Fuzzy Support Vector Machines  147
    3.2.3 Performance Comparison of Fuzzy Support Vector Machines  148
    3.2.4 Cluster-Based Support Vector Machines  151
    3.2.5 Decision-Tree-Based Support Vector Machines  152
    3.2.6 Pairwise Classification with Correcting Classifiers  162
  3.3 Error-Correcting Output Codes  163
    3.3.1 Output Coding by Error-Correcting Codes  164
    3.3.2 Unified Scheme for Output Coding  165
    3.3.3 Equivalence of ECOC with Membership Functions  166
    3.3.4 Performance Evaluation  166
  3.4 All-at-Once Support Vector Machines  168
  3.5 Comparisons of Architectures  171
    3.5.1 One-Against-All Support Vector Machines  171
    3.5.2 Pairwise Support Vector Machines  171
    3.5.3 ECOC Support Vector Machines  172
    3.5.4 All-at-Once Support Vector Machines  172
    3.5.5 Training Difficulty  172
    3.5.6 Training Time Comparison  176
  References  177
4 Variants of Support Vector Machines  181
  4.1 Least-Squares Support Vector Machines  181
    4.1.1 Two-Class Least-Squares Support Vector Machines  182
    4.1.2 One-Against-All Least-Squares Support Vector Machines  184
    4.1.3 Pairwise Least-Squares Support Vector Machines  186
    4.1.4 All-at-Once Least-Squares Support Vector Machines  187
    4.1.5 Performance Comparison  188
  4.2 Linear Programming Support Vector Machines  192
    4.2.1 Architecture  193
    4.2.2 Performance Evaluation  196
  4.3 Sparse Support Vector Machines  198
    4.3.1 Several Approaches for Sparse Support Vector Machines  199
    4.3.2 Idea  201
    4.3.3 Support Vector Machines Trained in the Empirical Feature Space  202
    4.3.4 Selection of Linearly Independent Data  205
    4.3.5 Performance Evaluation  207
  4.4 Performance Comparison of Different Classifiers  210
  4.5 Robust Support Vector Machines  214
  4.6 Bayesian Support Vector Machines  215
    4.6.1 One-Dimensional Bayesian Decision Functions  217
    4.6.2 Parallel Displacement of a Hyperplane  218
    4.6.3 Normal Test  219
  4.7 Incremental Training  219
    4.7.1 Overview  219
    4.7.2 Incremental Training Using Hyperspheres  222
  4.8 Learning Using Privileged Information  231
  4.9 Semi-Supervised Learning  234
  4.10 Multiple Classifier Systems  235
  4.11 Multiple Kernel Learning  236
  4.12 Confidence Level  237
  4.13 Visualization  238
  References  238
5 Training Methods  245
  5.1 Preselecting Support Vector Candidates  245
    5.1.1 Approximation of Boundary Data  246
    5.1.2 Performance Evaluation  248
  5.2 Decomposition Techniques  249
  5.3 KKT Conditions Revisited  252
  5.4 Overview of Training Methods  257
  5.5 Primal-Dual Interior-Point Methods  260
    5.5.1 Primal-Dual Interior-Point Methods for Linear Programming  260
    5.5.2 Primal-Dual Interior-Point Methods for Quadratic Programming  264
    5.5.3 Performance Evaluation  266
  5.6 Steepest Ascent Methods and Newton's Methods  270
    5.6.1 Solving Quadratic Programming Problems Without Constraints  270
    5.6.2 Training of L1 Soft-Margin Support Vector Machines  272
    5.6.3 Sequential Minimal Optimization  277
    5.6.4 Training of L2 Soft-Margin Support Vector Machines  278
    5.6.5 Performance Evaluation  279
  5.7 Batch Training by Exact Incremental Training  280
    5.7.1 KKT Conditions  281
    5.7.2 Training by Solving a Set of Linear Equations  282
    5.7.3 Performance Evaluation  290
  5.8 Active Set Training in Primal and Dual  291
    5.8.1 Training Support Vector Machines in the Primal  291
    5.8.2 Comparison of Training Support Vector Machines in the Primal and the Dual  294
    5.8.3 Performance Evaluation  297
  5.9 Training of Linear Programming Support Vector Machines  299
    5.9.1 Decomposition Techniques  300
    5.9.2 Decomposition Techniques for Linear Programming Support Vector Machines  307
    5.9.3 Computer Experiments  315
  References  317
6 Kernel-Based Methods  322
  6.1 Kernel Least Squares  322
    6.1.1 Algorithm  322
    6.1.2 Performance Evaluation  325
  6.2 Kernel Principal Component Analysis  328
  6.3 Kernel Mahalanobis Distance  331
    6.3.1 SVD-Based Kernel Mahalanobis Distance  332
    6.3.2 KPCA-Based Mahalanobis Distance  335
  6.4 Principal Component Analysis in the Empirical Feature Space  336
  6.5 Kernel Discriminant Analysis  337
    6.5.1 Kernel Discriminant Analysis for Two-Class Problems  338
    6.5.2 Linear Discriminant Analysis for Two-Class Problems in the Empirical Feature Space  341
    6.5.3 Kernel Discriminant Analysis for Multiclass Problems  342
  References  344
7 Feature Selection and Extraction  347
  7.1 Selecting an Initial Set of Features  347
  7.2 Procedure for Feature Selection  348
  7.3 Feature Selection Using Support Vector Machines  349
    7.3.1 Backward or Forward Feature Selection  349
    7.3.2 Support Vector Machine-Based Feature Selection  352
    7.3.3 Feature Selection by Cross-Validation  353
  7.4 Feature Extraction  355
  References  356
8 Clustering  358
  8.1 Domain Description  358
  8.2 Extension to Clustering  364
  References  366
9 Maximum-Margin Multilayer Neural Networks  368
  9.1 Approach  368
  9.2 Three-Layer Neural Networks  369
  9.3 CARVE Algorithm  372
  9.4 Determination of Hidden-Layer Hyperplanes  373
    9.4.1 Rotation of Hyperplanes  374
    9.4.2 Training Algorithm  377
  9.5 Determination of Output-Layer Hyperplanes  378
  9.6 Determination of Parameter Values  378
  9.7 Performance Evaluation  379
  References  380
10 Maximum-Margin Fuzzy Classifiers  382
  10.1 Kernel Fuzzy Classifiers with Ellipsoidal Regions  383
    10.1.1 Conventional Fuzzy Classifiers with Ellipsoidal Regions  383
    10.1.2 Extension to a Feature Space  384
    10.1.3 Transductive Training  385
    10.1.4 Maximizing Margins  390
    10.1.5 Performance Evaluation  393
  10.2 Fuzzy Classifiers with Polyhedral Regions  397
    10.2.1 Training Methods  398
    10.2.2 Performance Evaluation  406
  References  408
11 Function Approximation  410
  11.1 Optimal Hyperplanes  410
  11.2 L1 Soft-Margin Support Vector Regressors  414
  11.3 L2 Soft-Margin Support Vector Regressors  416
  11.4 Model Selection  418
  11.5 Training Methods  418
    11.5.1 Overview  418
    11.5.2 Newton's Methods  420
    11.5.3 Active Set Training  437
  11.6 Variants of Support Vector Regressors  444
    11.6.1 Linear Programming Support Vector Regressors  445
    11.6.2 ν-Support Vector Regressors  446
    11.6.3 Least-Squares Support Vector Regressors  447
  11.7 Variable Selection  450
    11.7.1 Overview  450
    11.7.2 Variable Selection by Block Deletion  451
    11.7.3 Performance Evaluation  452
  References  453
A Conventional Classifiers  458
  A.1 Bayesian Classifiers  458
  A.2 Nearest-Neighbor Classifiers  459
  References  460
B Matrices  462
  B.1 Matrix Properties  462
  B.2 Least-Squares Methods and Singular Value Decomposition  464
  B.3 Covariance Matrices  467
  References  469
C Quadratic Programming  470
  C.1 Optimality Conditions  470
  C.2 Properties of Solutions  471
D Positive Semidefinite Kernels and Reproducing Kernel Hilbert Space  474
  D.1 Positive Semidefinite Kernels  474
  D.2 Reproducing Kernel Hilbert Space  478
  References  480
Index  482