
E-Book, English, 793 pages

Series: Springer Series in Statistics

Clarke / Fokoue / Zhang: Principles and Theory for Data Mining and Machine Learning


1st edition, 2009
ISBN: 978-0-387-98135-2
Publisher: Springer
Format: PDF
Copy protection: PDF watermark


Extensive treatment of the most up-to-date topics.
Provides the theory and concepts behind popular and emerging methods.
Range of topics drawn from Statistics, Computer Science, and Electrical Engineering.


Contents


1 Preface
2 Variability, Information, and Prediction
  2.1 The Curse of Dimensionality
    2.1.1 The Two Extremes
  2.2 Perspectives on the Curse
    2.2.1 Sparsity
    2.2.2 Exploding Numbers of Models
    2.2.3 Multicollinearity and Concurvity
    2.2.4 The Effect of Noise
  2.3 Coping with the Curse
    2.3.1 Selecting Design Points
    2.3.2 Local Dimension
    2.3.3 Parsimony
  2.4 Two Techniques
    2.4.1 The Bootstrap
    2.4.2 Cross-Validation
  2.5 Optimization and Search
    2.5.1 Univariate Search
    2.5.2 Multivariate Search
    2.5.3 General Searches
    2.5.4 Constraint Satisfaction and Combinatorial Search
  2.6 Notes
    2.6.1 Hammersley Points
    2.6.2 Edgeworth Expansions for the Mean
    2.6.3 Bootstrap Asymptotics for the Studentized Mean
  2.7 Exercises
3 Local Smoothers
  3.1 Early Smoothers
  3.2 Transition to Classical Smoothers
    3.2.1 Global Versus Local Approximations
    3.2.2 LOESS
  3.3 Kernel Smoothers
    3.3.1 Statistical Function Approximation
    3.3.2 The Concept of Kernel Methods and the Discrete Case
    3.3.3 Kernels and Stochastic Designs: Density Estimation
    3.3.4 Stochastic Designs: Asymptotics for Kernel Smoothers
    3.3.5 Convergence Theorems and Rates for Kernel Smoothers
    3.3.6 Kernel and Bandwidth Selection
    3.3.7 Linear Smoothers
  3.4 Nearest Neighbors
  3.5 Applications of Kernel Regression
    3.5.1 A Simulated Example
    3.5.2 Ethanol Data
  3.6 Exercises
4 Spline Smoothing
  4.1 Interpolating Splines
  4.2 Natural Cubic Splines
  4.3 Smoothing Splines for Regression
    4.3.1 Model Selection for Spline Smoothing
    4.3.2 Spline Smoothing Meets Kernel Smoothing
  4.4 Asymptotic Bias, Variance, and MISE for Spline Smoothers
    4.4.1 Ethanol Data Example -- Continued
  4.5 Splines Redux: Hilbert Space Formulation
    4.5.1 Reproducing Kernels
    4.5.2 Constructing an RKHS
    4.5.3 Direct Sum Construction for Splines
    4.5.4 Explicit Forms
    4.5.5 Nonparametrics in Data Mining and Machine Learning
  4.6 Simulated Comparisons
    4.6.1 What Happens with Dependent Noise Models?
    4.6.2 Higher Dimensions and the Curse of Dimensionality
  4.7 Notes
    4.7.1 Sobolev Spaces: Definition
  4.8 Exercises
5 New Wave Nonparametrics
  5.1 Additive Models
    5.1.1 The Backfitting Algorithm
    5.1.2 Concurvity and Inference
    5.1.3 Nonparametric Optimality
  5.2 Generalized Additive Models
  5.3 Projection Pursuit Regression
  5.4 Neural Networks
    5.4.1 Backpropagation and Inference
    5.4.2 Barron's Result and the Curse
    5.4.3 Approximation Properties
    5.4.4 Barron's Theorem: Formal Statement
  5.5 Recursive Partitioning Regression
    5.5.1 Growing Trees
    5.5.2 Pruning and Selection
    5.5.3 Regression
    5.5.4 Bayesian Additive Regression Trees: BART
  5.6 MARS
  5.7 Sliced Inverse Regression
  5.8 ACE and AVAS
  5.9 Notes
    5.9.1 Proof of Barron's Theorem
  5.10 Exercises
6 Supervised Learning: Partition Methods
  6.1 Multiclass Learning
  6.2 Discriminant Analysis
    6.2.1 Distance-Based Discriminant Analysis
    6.2.2 Bayes Rules
    6.2.3 Probability-Based Discriminant Analysis
  6.3 Tree-Based Classifiers
    6.3.1 Splitting Rules
    6.3.2 Logic Trees
    6.3.3 Random Forests
  6.4 Support Vector Machines
    6.4.1 Margins and Distances
    6.4.2 Binary Classification and Risk
    6.4.3 Prediction Bounds for Function Classes
    6.4.4 Constructing SVM Classifiers
    6.4.5 SVM Classification for Nonlinearly Separable Populations
    6.4.6 SVMs in the General Nonlinear Case
    6.4.7 Some Kernels Used in SVM Classification
    6.4.8 Kernel Choice, SVMs and Model Selection
    6.4.9 Support Vector Regression
    6.4.10 Multiclass Support Vector Machines
  6.5 Neural Networks
  6.6 Notes
    6.6.1 Hoeffding's Inequality
    6.6.2 VC Dimension
  6.7 Exercises
7 Alternative Nonparametrics
  7.1 Ensemble Methods
    7.1.1 Bayes Model Averaging
    7.1.2 Bagging
    7.1.3 Stacking
    7.1.4 Boosting
    7.1.5 Other Averaging Methods
    7.1.6 Oracle Inequalities
  7.2 Bayes Nonparametrics
    7.2.1 Dirichlet Process Priors
    7.2.2 Polya Tree Priors
    7.2.3 Gaussian Process Priors
  7.3 The Relevance Vector Machine
    7.3.1 RVM Regression: Formal Description
    7.3.2 RVM Classification
  7.4 Hidden Markov Models -- Sequential Classification
  7.5 Notes
    7.5.1 Proof of Yang's Oracle Inequality
    7.5.2 Proof of Lecue's Oracle Inequality
  7.6 Exercises
8 Computational Comparisons
  8.1 Computational Results: Classification
    8.1.1 Comparison on Fisher's Iris Data
    8.1.2 Comparison on Ripley's Data
  8.2 Computational Results: Regression
    8.2.1 Vapnik's sinc Function
    8.2.2 Friedman's Function
    8.2.3 Conclusions
  8.3 Systematic Simulation Study
  8.4 No Free Lunch
  8.5 Exercises
9 Unsupervised Learning: Clustering
  9.1 Centroid-Based Clustering
    9.1.1 K-Means Clustering
    9.1.2 Variants
  9.2 Hierarchical Clustering
    9.2.1 Agglomerative Hierarchical Clustering
    9.2.2 Divisive Hierarchical Clustering
    9.2.3 Theory for Hierarchical Clustering
  9.3 Partitional Clustering
    9.3.1 Model-Based Clustering
    9.3.2 Graph-Theoretic Clustering
    9.3.3 Spectral Clustering
  9.4 Bayesian Clustering
    9.4.1 Probabilistic Clustering
    9.4.2 Hypothesis Testing
  9.5 Computed Examples
    9.5.1 Ripley's Data
    9.5.2 Iris Data
  9.6 Cluster Validation
  9.7 Notes
    9.7.1 Derivatives of Functions of a Matrix
    9.7.2 Kruskal's Algorithm: Proof
    9.7.3 Prim's Algorithm: Proof
  9.8 Exercises
10 Learning in High Dimensions
  10.1 Principal Components
    10.1.1 Main Theorem
    10.1.2 Key Properties
    10.1.3 Extensions
  10.2 Factor Analysis
    10.2.1 Finding Λ and Ψ
    10.2.2 Finding K
    10.2.3 Estimating Factor Scores
  10.3 Projection Pursuit
  10.4 Independent Components Analysis
    10.4.1 Main Definitions
    10.4.2 Key Results
    10.4.3 Computational Approach
  10.5 Nonlinear PCs and ICA
    10.5.1 Nonlinear PCs
    10.5.2 Nonlinear ICA
  10.6 Geometric Summarization
    10.6.1 Measuring Distances to an Algebraic Shape
    10.6.2 Principal Curves and Surfaces
  10.7 Supervised Dimension Reduction: Partial Least Squares
    10.7.1 Simple PLS
    10.7.2 PLS Procedures
    10.7.3 Properties of PLS
  10.8 Supervised Dimension Reduction: Sufficient Dimensions in Regression
  10.9 Visualization I: Basic Plots
    10.9.1 Elementary Visualization
    10.9.2 Projections
    10.9.3 Time Dependence
  10.10 Visualization II: Transformations
    10.10.1 Chernoff Faces
    10.10.2 Multidimensional Scaling
    10.10.3 Self-Organizing Maps
  10.11 Exercises
11 Variable Selection
  11.1 Concepts from Linear Regression
    11.1.1 Subset Selection
    11.1.2 Variable Ranking
    11.1.3 Overview
  11.2 Traditional Criteria
    11.2.1 Akaike Information Criterion (AIC)
    11.2.2 Bayesian Information Criterion (BIC)
    11.2.3 Choices of Information Criteria
    11.2.4 Cross Validation
  11.3 Shrinkage Methods
    11.3.1 Shrinkage Methods for Linear Models
    11.3.2 Grouping in Variable Selection
    11.3.3 Least Angle Regression
    11.3.4 Shrinkage Methods for Model Classes
    11.3.5 Cautionary Notes
  11.4 Bayes Variable Selection
    11.4.1 Prior Specification
    11.4.2 Posterior Calculation and Exploration
    11.4.3 Evaluating Evidence
    11.4.4 Connections Between Bayesian and Frequentist Methods
  11.5 Computational Comparisons
    11.5.1 The n > p Case
    11.5.2 When p > n
  11.6 Notes
    11.6.1 Code for Generating Data in Section 10.5
  11.7 Exercises
12 Multiple Testing
  12.1 Analyzing the Hypothesis Testing Problem
    12.1.1 A Paradigmatic Setting
    12.1.2 Counts for Multiple Tests
    12.1.3 Measures of Error in Multiple Testing
    12.1.4 Aspects of Error Control
  12.2 Controlling the Familywise Error Rate
    12.2.1 One-Step Adjustments
    12.2.2 Stepwise p-Value Adjustments
  12.3 PCER and PFER
    12.3.1 Null Domination
    12.3.2 Two Procedures
    12.3.3 Controlling the Type I Error Rate
    12.3.4 Adjusted p-Values for PFER/PCER
  12.4 Controlling the False Discovery Rate
    12.4.1 FDR and Other Measures of Error
    12.4.2 The Benjamini-Hochberg Procedure
    12.4.3 A BH Theorem for a Dependent Setting
    12.4.4 Variations on BH
  12.5 Controlling the Positive False Discovery Rate
    12.5.1 Bayesian Interpretations
    12.5.2 Aspects of Implementation
  12.6 Bayesian Multiple Testing
    12.6.1 Fully Bayes: Hierarchical
    12.6.2 Fully Bayes: Decision Theory
  12.7 Notes
    12.7.1 Proof of the Benjamini-Hochberg Theorem
    12.7.2 Proof of the Benjamini-Yekutieli Theorem
13 References
14 Index


