E-book, English, 758 pages
Hastie / Tibshirani / Friedman: The Elements of Statistical Learning
Data Mining, Inference, and Prediction, Second Edition
2nd edition, 2009
Series: Springer Series in Statistics
ISBN: 978-0-387-84858-7
Publisher: Springer US
Format: PDF
Copy protection: PDF watermark
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and for anyone interested in data mining in science or industry.

The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p much larger than N), including multiple testing and false discovery rates.
Further Information & Material
1;Preface to the Second Edition;7
2;Preface to the First Edition;10
3;Contents;12
4;Introduction;22
5;Overview of Supervised Learning;30
5.1;2.1 Introduction;30
5.2;2.2 Variable Types and Terminology;30
5.3;2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors;32
5.4;2.4 Statistical Decision Theory;39
5.5;2.5 Local Methods in High Dimensions;43
5.6;2.6 Statistical Models, Supervised Learning and Function Approximation;49
5.7;2.7 Structured Regression Models;53
5.8;2.8 Classes of Restricted Estimators;54
5.9;2.9 Model Selection and the Bias–Variance Tradeoff;58
5.10;Bibliographic Notes;60
5.11;Exercises;60
6;Linear Methods for Regression;63
6.1;3.1 Introduction;63
6.2;3.2 Linear Regression Models and Least Squares;64
6.3;3.3 Subset Selection;77
6.4;3.4 Shrinkage Methods;81
6.5;3.5 Methods Using Derived Input Directions;99
6.6;3.6 Discussion: A Comparison of the Selection and Shrinkage Methods;102
6.7;3.7 Multiple Outcome Shrinkage and Selection;104
6.8;3.8 More on the Lasso and Related Path Algorithms;106
6.9;3.9 Computational Considerations;113
6.10;Bibliographic Notes;114
6.11;Exercises;114
7;Linear Methods for Classification;120
7.1;4.1 Introduction;120
7.2;4.2 Linear Regression of an Indicator Matrix;122
7.3;4.3 Linear Discriminant Analysis;125
7.4;4.4 Logistic Regression;138
7.5;4.5 Separating Hyperplanes;148
7.6;Bibliographic Notes;154
7.7;Exercises;154
8;Basis Expansions and Regularization;157
8.1;5.1 Introduction;157
8.2;5.2 Piecewise Polynomials and Splines;159
8.3;5.3 Filtering and Feature Extraction;168
8.4;5.4 Smoothing Splines;169
8.5;5.5 Automatic Selection of the Smoothing Parameters;174
8.6;5.6 Nonparametric Logistic Regression;179
8.7;5.7 Multidimensional Splines;180
8.8;5.8 Regularization and Reproducing Kernel Hilbert Spaces;185
8.9;5.9 Wavelet Smoothing;192
8.10;Bibliographic Notes;199
8.11;Exercises;199
8.12;Appendix: Computations for Splines;204
9;Kernel Smoothing Methods;208
9.1;6.1 One-Dimensional Kernel Smoothers;209
9.2;6.2 Selecting the Width of the Kernel;215
9.3;6.3 Local Regression in ℝ^p;217
9.4;6.4 Structured Local Regression Models in ℝ^p;218
9.5;6.5 Local Likelihood and Other Models;222
9.6;6.6 Kernel Density Estimation and Classification;225
9.7;6.7 Radial Basis Functions and Kernels;229
9.8;6.8 Mixture Models for Density Estimation and Classification;231
9.9;6.9 Computational Considerations;233
9.10;Bibliographic Notes;233
9.11;Exercises;233
10;Model Assessment and Selection;236
10.1;7.1 Introduction;236
10.2;7.2 Bias, Variance and Model Complexity;236
10.3;7.3 The Bias–Variance Decomposition;240
10.4;7.4 Optimism of the Training Error Rate;245
10.5;7.5 Estimates of In-Sample Prediction Error;247
10.6;7.6 The Effective Number of Parameters;249
10.7;7.7 The Bayesian Approach and BIC;250
10.8;7.8 Minimum Description Length;252
10.9;7.9 Vapnik–Chervonenkis Dimension;254
10.10;7.10 Cross-Validation;258
10.11;7.11 Bootstrap Methods;266
10.12;7.12 Conditional or Expected Test Error?;271
10.13;Bibliographic Notes;274
10.14;Exercises;274
11;Model Inference and Averaging;277
11.1;8.1 Introduction;277
11.2;8.2 The Bootstrap and Maximum Likelihood Methods;277
11.3;8.3 Bayesian Methods;283
11.4;8.4 Relationship Between the Bootstrap and Bayesian Inference;287
11.5;8.5 The EM Algorithm;288
11.6;8.6 MCMC for Sampling from the Posterior;295
11.7;8.7 Bagging;298
11.8;8.8 Model Averaging and Stacking;304
11.9;8.9 Stochastic Search: Bumping;306
11.10;Bibliographic Notes;308
11.11;Exercises;309
12;Additive Models, Trees, and Related Methods;311
12.1;9.1 Generalized Additive Models;311
12.2;9.2 Tree-Based Methods;321
12.3;9.3 PRIM: Bump Hunting;333
12.4;9.4 MARS: Multivariate Adaptive Regression Splines;337
12.5;9.5 Hierarchical Mixtures of Experts;345
12.6;9.6 Missing Data;348
12.7;9.7 Computational Considerations;350
12.8;Bibliographic Notes;350
12.9;Exercises;351
13;Boosting and Additive Trees;353
13.1;10.1 Boosting Methods;353
13.2;10.2 Boosting Fits an Additive Model;357
13.3;10.3 Forward Stagewise Additive Modeling;358
13.4;10.4 Exponential Loss and AdaBoost;359
13.5;10.5 Why Exponential Loss?;361
13.6;10.6 Loss Functions and Robustness;362
13.7;10.7 “Off-the-Shelf” Procedures for Data Mining;366
13.8;10.8 Example: Spam Data;368
13.9;10.9 Boosting Trees;369
13.10;10.10 Numerical Optimization via Gradient Boosting;374
13.11;10.11 Right-Sized Trees for Boosting;377
13.12;10.12 Regularization;380
13.13;10.13 Interpretation;383
13.14;10.14 Illustrations;387
13.15;Bibliographic Notes;396
13.16;Exercises;400
14;Neural Networks;404
14.1;11.1 Introduction;404
14.2;11.2 Projection Pursuit Regression;404
14.3;11.3 Neural Networks;407
14.4;11.4 Fitting Neural Networks;410
14.5;11.5 Some Issues in Training Neural Networks;412
14.6;11.6 Example: Simulated Data;416
14.7;11.7 Example: ZIP Code Data;419
14.8;11.8 Discussion;423
14.9;11.9 Bayesian Neural Nets and the NIPS 2003 Challenge;424
14.10;11.10 Computational Considerations;429
14.11;Bibliographic Notes;430
14.12;Exercises;430
15;Support Vector Machines and Flexible Discriminants;432
15.1;12.1 Introduction;432
15.2;12.2 The Support Vector Classifier;432
15.3;12.3 Support Vector Machines and Kernels;438
15.4;12.4 Generalizing Linear Discriminant Analysis;453
15.5;12.5 Flexible Discriminant Analysis;455
15.6;12.6 Penalized Discriminant Analysis;461
15.7;12.7 Mixture Discriminant Analysis;464
15.8;Bibliographic Notes;470
15.9;Exercises;470
16;Prototype Methods and Nearest-Neighbors;474
16.1;13.1 Introduction;474
16.2;13.2 Prototype Methods;474
16.3;13.3 k-Nearest-Neighbor Classifiers;478
16.4;13.4 Adaptive Nearest-Neighbor Methods;490
16.5;13.5 Computational Considerations;495
16.6;Bibliographic Notes;496
16.7;Exercises;496
17;Unsupervised Learning;499
17.1;14.1 Introduction;499
17.2;14.2 Association Rules;501
17.3;14.3 Cluster Analysis;515
17.4;14.4 Self-Organizing Maps;542
17.5;14.5 Principal Components, Curves and Surfaces;548
17.6;14.6 Non-negative Matrix Factorization;567
17.7;14.7 Independent Component Analysis and Exploratory Projection Pursuit;571
17.8;14.8 Multidimensional Scaling;584
17.9;14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling;586
17.10;14.10 The Google PageRank Algorithm;590
17.11;Bibliographic Notes;592
17.12;Exercises;593
18;Random Forests;600
18.1;15.1 Introduction;600
18.2;15.2 Definition of Random Forests;600
18.3;15.3 Details of Random Forests;605
18.4;15.4 Analysis of Random Forests;610
18.5;Bibliographic Notes;615
18.6;Exercises;616
19;Ensemble Learning;618
19.1;16.1 Introduction;618
19.2;16.2 Boosting and Regularization Paths;620
19.3;16.3 Learning Ensembles;629
19.4;Bibliographic Notes;636
19.5;Exercises;637
20;Undirected Graphical Models;638
20.1;17.1 Introduction;638
20.2;17.2 Markov Graphs and Their Properties;640
20.3;17.3 Undirected Graphical Models for Continuous Variables;643
20.4;17.4 Undirected Graphical Models for Discrete Variables;651
20.5;Bibliographic Notes;658
20.6;Exercises;658
21;High-Dimensional Problems: p ≫ N;662
21.1;18.1 When p is Much Bigger than N;662
21.2;18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids;664
21.3;18.3 Linear Classifiers with Quadratic Regularization;667
21.4;18.4 Linear Classifiers with L1 Regularization;674
21.5;18.5 Classification When Features are Unavailable;681
21.6;18.6 High-Dimensional Regression: Supervised Principal Components;687
21.7;18.7 Feature Assessment and the Multiple-Testing Problem;696
21.8;18.8 Bibliographic Notes;706
21.9;Exercises;707
22;References;712
23;Author Index;741
24;Index;749




