Clarke / Fokoue / Zhang
Principles and Theory for Data Mining and Machine Learning
1st edition, 2009
Series: Springer Series in Statistics
E-book, English, 793 pages
ISBN: 978-0-387-98135-2
Publisher: Springer
Format: PDF
Copy protection: PDF watermark
- Extensive treatment of the most up-to-date topics
- Provides the theory and concepts behind popular and emerging methods
- Range of topics drawn from Statistics, Computer Science, and Electrical Engineering
Further Information & Material
1 Preface (p. 1)
2 Variability, Information, and Prediction (p. 16)
  2.1 The Curse of Dimensionality (p. 18)
    2.1.1 The Two Extremes (p. 19)
  2.2 Perspectives on the Curse (p. 20)
    2.2.1 Sparsity (p. 21)
    2.2.2 Exploding Numbers of Models (p. 23)
    2.2.3 Multicollinearity and Concurvity (p. 24)
    2.2.4 The Effect of Noise (p. 25)
  2.3 Coping with the Curse (p. 26)
    2.3.1 Selecting Design Points (p. 26)
    2.3.2 Local Dimension (p. 27)
    2.3.3 Parsimony (p. 32)
  2.4 Two Techniques (p. 33)
    2.4.1 The Bootstrap (p. 33)
    2.4.2 Cross-Validation (p. 42)
  2.5 Optimization and Search (p. 47)
    2.5.1 Univariate Search (p. 47)
    2.5.2 Multivariate Search (p. 48)
    2.5.3 General Searches (p. 49)
    2.5.4 Constraint Satisfaction and Combinatorial Search (p. 50)
  2.6 Notes (p. 53)
    2.6.1 Hammersley Points (p. 53)
    2.6.2 Edgeworth Expansions for the Mean (p. 54)
    2.6.3 Bootstrap Asymptotics for the Studentized Mean (p. 56)
  2.7 Exercises (p. 58)
3 Local Smoothers (p. 68)
  3.1 Early Smoothers (p. 70)
  3.2 Transition to Classical Smoothers (p. 74)
    3.2.1 Global Versus Local Approximations (p. 75)
    3.2.2 LOESS (p. 79)
  3.3 Kernel Smoothers (p. 82)
    3.3.1 Statistical Function Approximation (p. 83)
    3.3.2 The Concept of Kernel Methods and the Discrete Case (p. 88)
    3.3.3 Kernels and Stochastic Designs: Density Estimation (p. 93)
    3.3.4 Stochastic Designs: Asymptotics for Kernel Smoothers (p. 96)
    3.3.5 Convergence Theorems and Rates for Kernel Smoothers (p. 101)
    3.3.6 Kernel and Bandwidth Selection (p. 105)
    3.3.7 Linear Smoothers (p. 110)
  3.4 Nearest Neighbors (p. 111)
  3.5 Applications of Kernel Regression (p. 115)
    3.5.1 A Simulated Example (p. 115)
    3.5.2 Ethanol Data (p. 117)
  3.6 Exercises (p. 122)
4 Spline Smoothing (p. 132)
  4.1 Interpolating Splines (p. 132)
  4.2 Natural Cubic Splines (p. 138)
  4.3 Smoothing Splines for Regression (p. 141)
    4.3.1 Model Selection for Spline Smoothing (p. 144)
    4.3.2 Spline Smoothing Meets Kernel Smoothing (p. 145)
  4.4 Asymptotic Bias, Variance, and MISE for Spline Smoothers (p. 146)
    4.4.1 Ethanol Data Example -- Continued (p. 148)
  4.5 Splines Redux: Hilbert Space Formulation (p. 151)
    4.5.1 Reproducing Kernels (p. 153)
    4.5.2 Constructing an RKHS (p. 156)
    4.5.3 Direct Sum Construction for Splines (p. 161)
    4.5.4 Explicit Forms (p. 164)
    4.5.5 Nonparametrics in Data Mining and Machine Learning (p. 167)
  4.6 Simulated Comparisons (p. 169)
    4.6.1 What Happens with Dependent Noise Models? (p. 172)
    4.6.2 Higher Dimensions and the Curse of Dimensionality (p. 174)
  4.7 Notes (p. 178)
    4.7.1 Sobolev Spaces: Definition (p. 178)
  4.8 Exercises (p. 179)
5 New Wave Nonparametrics (p. 186)
  5.1 Additive Models (p. 187)
    5.1.1 The Backfitting Algorithm (p. 188)
    5.1.2 Concurvity and Inference (p. 192)
    5.1.3 Nonparametric Optimality (p. 195)
  5.2 Generalized Additive Models (p. 196)
  5.3 Projection Pursuit Regression (p. 199)
  5.4 Neural Networks (p. 204)
    5.4.1 Backpropagation and Inference (p. 207)
    5.4.2 Barron's Result and the Curse (p. 212)
    5.4.3 Approximation Properties (p. 213)
    5.4.4 Barron's Theorem: Formal Statement (p. 215)
  5.5 Recursive Partitioning Regression (p. 217)
    5.5.1 Growing Trees (p. 219)
    5.5.2 Pruning and Selection (p. 222)
    5.5.3 Regression (p. 223)
    5.5.4 Bayesian Additive Regression Trees: BART (p. 225)
  5.6 MARS (p. 225)
  5.7 Sliced Inverse Regression (p. 230)
  5.8 ACE and AVAS (p. 233)
  5.9 Notes (p. 235)
    5.9.1 Proof of Barron's Theorem (p. 235)
  5.10 Exercises (p. 239)
6 Supervised Learning: Partition Methods (p. 246)
  6.1 Multiclass Learning (p. 248)
  6.2 Discriminant Analysis (p. 250)
    6.2.1 Distance-Based Discriminant Analysis (p. 251)
    6.2.2 Bayes Rules (p. 256)
    6.2.3 Probability-Based Discriminant Analysis (p. 260)
  6.3 Tree-Based Classifiers (p. 264)
    6.3.1 Splitting Rules (p. 264)
    6.3.2 Logic Trees (p. 268)
    6.3.3 Random Forests (p. 269)
  6.4 Support Vector Machines (p. 277)
    6.4.1 Margins and Distances (p. 277)
    6.4.2 Binary Classification and Risk (p. 280)
    6.4.3 Prediction Bounds for Function Classes (p. 283)
    6.4.4 Constructing SVM Classifiers (p. 286)
    6.4.5 SVM Classification for Nonlinearly Separable Populations (p. 294)
    6.4.6 SVMs in the General Nonlinear Case (p. 297)
    6.4.7 Some Kernels Used in SVM Classification (p. 303)
    6.4.8 Kernel Choice, SVMs and Model Selection (p. 304)
    6.4.9 Support Vector Regression (p. 305)
    6.4.10 Multiclass Support Vector Machines (p. 308)
  6.5 Neural Networks (p. 309)
  6.6 Notes (p. 311)
    6.6.1 Hoeffding's Inequality (p. 311)
    6.6.2 VC Dimension (p. 312)
  6.7 Exercises (p. 315)
7 Alternative Nonparametrics (p. 322)
  7.1 Ensemble Methods (p. 323)
    7.1.1 Bayes Model Averaging (p. 325)
    7.1.2 Bagging (p. 327)
    7.1.3 Stacking (p. 331)
    7.1.4 Boosting (p. 333)
    7.1.5 Other Averaging Methods (p. 341)
    7.1.6 Oracle Inequalities (p. 343)
  7.2 Bayes Nonparametrics (p. 349)
    7.2.1 Dirichlet Process Priors (p. 349)
    7.2.2 Polya Tree Priors (p. 351)
    7.2.3 Gaussian Process Priors (p. 353)
  7.3 The Relevance Vector Machine (p. 359)
    7.3.1 RVM Regression: Formal Description (p. 360)
    7.3.2 RVM Classification (p. 364)
  7.4 Hidden Markov Models -- Sequential Classification (p. 367)
  7.5 Notes (p. 369)
    7.5.1 Proof of Yang's Oracle Inequality (p. 369)
    7.5.2 Proof of Lecue's Oracle Inequality (p. 372)
  7.6 Exercises (p. 374)
8 Computational Comparisons (p. 379)
  8.1 Computational Results: Classification (p. 380)
    8.1.1 Comparison on Fisher's Iris Data (p. 380)
    8.1.2 Comparison on Ripley's Data (p. 383)
  8.2 Computational Results: Regression (p. 390)
    8.2.1 Vapnik's sinc Function (p. 391)
    8.2.2 Friedman's Function (p. 403)
    8.2.3 Conclusions (p. 406)
  8.3 Systematic Simulation Study (p. 411)
  8.4 No Free Lunch (p. 414)
  8.5 Exercises (p. 416)
9 Unsupervised Learning: Clustering (p. 419)
  9.1 Centroid-Based Clustering (p. 422)
    9.1.1 K-Means Clustering (p. 423)
    9.1.2 Variants (p. 426)
  9.2 Hierarchical Clustering (p. 427)
    9.2.1 Agglomerative Hierarchical Clustering (p. 428)
    9.2.2 Divisive Hierarchical Clustering (p. 436)
    9.2.3 Theory for Hierarchical Clustering (p. 440)
  9.3 Partitional Clustering (p. 444)
    9.3.1 Model-Based Clustering (p. 446)
    9.3.2 Graph-Theoretic Clustering (p. 461)
    9.3.3 Spectral Clustering (p. 466)
  9.4 Bayesian Clustering (p. 472)
    9.4.1 Probabilistic Clustering (p. 472)
    9.4.2 Hypothesis Testing (p. 475)
  9.5 Computed Examples (p. 477)
    9.5.1 Ripley's Data (p. 479)
    9.5.2 Iris Data (p. 489)
  9.6 Cluster Validation (p. 494)
  9.7 Notes (p. 498)
    9.7.1 Derivatives of Functions of a Matrix (p. 498)
    9.7.2 Kruskal's Algorithm: Proof (p. 498)
    9.7.3 Prim's Algorithm: Proof (p. 499)
  9.8 Exercises (p. 499)
10 Learning in High Dimensions (p. 506)
  10.1 Principal Components (p. 508)
    10.1.1 Main Theorem (p. 509)
    10.1.2 Key Properties (p. 511)
    10.1.3 Extensions (p. 513)
  10.2 Factor Analysis (p. 515)
    10.2.1 Finding Λ and Ψ (p. 517)
    10.2.2 Finding K (p. 519)
    10.2.3 Estimating Factor Scores (p. 520)
  10.3 Projection Pursuit (p. 521)
  10.4 Independent Components Analysis (p. 524)
    10.4.1 Main Definitions (p. 524)
    10.4.2 Key Results (p. 526)
    10.4.3 Computational Approach (p. 528)
  10.5 Nonlinear PCs and ICA (p. 529)
    10.5.1 Nonlinear PCs (p. 530)
    10.5.2 Nonlinear ICA (p. 531)
  10.6 Geometric Summarization (p. 531)
    10.6.1 Measuring Distances to an Algebraic Shape (p. 532)
    10.6.2 Principal Curves and Surfaces (p. 533)
  10.7 Supervised Dimension Reduction: Partial Least Squares (p. 536)
    10.7.1 Simple PLS (p. 536)
    10.7.2 PLS Procedures (p. 537)
    10.7.3 Properties of PLS (p. 539)
  10.8 Supervised Dimension Reduction: Sufficient Dimensions in Regression (p. 540)
  10.9 Visualization I: Basic Plots (p. 544)
    10.9.1 Elementary Visualization (p. 547)
    10.9.2 Projections (p. 554)
    10.9.3 Time Dependence (p. 556)
  10.10 Visualization II: Transformations (p. 559)
    10.10.1 Chernoff Faces (p. 559)
    10.10.2 Multidimensional Scaling (p. 560)
    10.10.3 Self-Organizing Maps (p. 566)
  10.11 Exercises (p. 573)
11 Variable Selection (p. 582)
  11.1 Concepts from Linear Regression (p. 583)
    11.1.1 Subset Selection (p. 585)
    11.1.2 Variable Ranking (p. 588)
    11.1.3 Overview (p. 590)
  11.2 Traditional Criteria (p. 591)
    11.2.1 Akaike Information Criterion (AIC) (p. 593)
    11.2.2 Bayesian Information Criterion (BIC) (p. 596)
    11.2.3 Choices of Information Criteria (p. 598)
    11.2.4 Cross Validation (p. 600)
  11.3 Shrinkage Methods (p. 612)
    11.3.1 Shrinkage Methods for Linear Models (p. 614)
    11.3.2 Grouping in Variable Selection (p. 628)
    11.3.3 Least Angle Regression (p. 630)
    11.3.4 Shrinkage Methods for Model Classes (p. 633)
    11.3.5 Cautionary Notes (p. 644)
  11.4 Bayes Variable Selection (p. 645)
    11.4.1 Prior Specification (p. 648)
    11.4.2 Posterior Calculation and Exploration (p. 656)
    11.4.3 Evaluating Evidence (p. 660)
    11.4.4 Connections Between Bayesian and Frequentist Methods (p. 663)
  11.5 Computational Comparisons (p. 666)
    11.5.1 The n > p Case (p. 666)
    11.5.2 When p > n (p. 678)
  11.6 Notes (p. 680)
    11.6.1 Code for Generating Data in Section 10.5 (p. 680)
  11.7 Exercises (p. 684)
12 Multiple Testing (p. 692)
  12.1 Analyzing the Hypothesis Testing Problem (p. 694)
    12.1.1 A Paradigmatic Setting (p. 694)
    12.1.2 Counts for Multiple Tests (p. 697)
    12.1.3 Measures of Error in Multiple Testing (p. 698)
    12.1.4 Aspects of Error Control (p. 700)
  12.2 Controlling the Familywise Error Rate (p. 703)
    12.2.1 One-Step Adjustments (p. 703)
    12.2.2 Stepwise p-Value Adjustments (p. 706)
  12.3 PCER and PFER (p. 708)
    12.3.1 Null Domination (p. 709)
    12.3.2 Two Procedures (p. 710)
    12.3.3 Controlling the Type I Error Rate (p. 715)
    12.3.4 Adjusted p-Values for PFER/PCER (p. 719)
  12.4 Controlling the False Discovery Rate (p. 720)
    12.4.1 FDR and Other Measures of Error (p. 722)
    12.4.2 The Benjamini-Hochberg Procedure (p. 723)
    12.4.3 A BH Theorem for a Dependent Setting (p. 724)
    12.4.4 Variations on BH (p. 726)
  12.5 Controlling the Positive False Discovery Rate (p. 732)
    12.5.1 Bayesian Interpretations (p. 732)
    12.5.2 Aspects of Implementation (p. 736)
  12.6 Bayesian Multiple Testing (p. 740)
    12.6.1 Fully Bayes: Hierarchical (p. 741)
    12.6.2 Fully Bayes: Decision Theory (p. 744)
  12.7 Notes (p. 749)
    12.7.1 Proof of the Benjamini-Hochberg Theorem (p. 749)
    12.7.2 Proof of the Benjamini-Yekutieli Theorem (p. 752)
13 References (p. 756)
14 Index (p. 785)




