Machine Learning: A Bayesian and Optimization Perspective
Sergios Theodoridis
1st edition, 2015
E-book, English, 1062 pages
ISBN: 978-0-12-801722-7
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: Adobe DRM (see system requirements)
This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which are based on optimization techniques, together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing, and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models.
- All major classical techniques: mean/least-squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression, and boosting methods.
- The latest trends: sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning, and latent variables modeling.
- Case studies - protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization and echo cancellation - show how the theory can be applied.
- MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code.
Sergios Theodoridis is professor emeritus of machine learning and data processing with the National and Kapodistrian University of Athens, Greece. He is a Fellow of EURASIP and a Life Fellow of IEEE. He is the coauthor of the best-selling book Pattern Recognition, 4th edition, Academic Press, 2009, and of the book Introduction to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.
Authors/Editors
Further information & material
Chapter 1 Introduction
Abstract
This chapter serves as an introduction to the text and an overview of machine learning. It deals with two problems at the heart of machine learning and of the book: the classification and regression tasks. The chapter also outlines the structure of the book and provides a road map for students and instructors. A summary of each chapter is provided. The first six chapters of the book deal with classical topics, while the remaining twelve cover more advanced techniques. Finally, the author offers suggestions on which chapters to cover based on the focus of the particular course.

Keywords: Machine learning; Statistical signal processing; Adaptive signal processing; Bayesian learning; Classification; Regression

Chapter Outline
1.1 What Machine Learning is About
1.1.1 Classification
1.1.2 Regression
1.2 Structure and a Road Map of the Book
References

1.1 What Machine Learning is About
Learning through personal experience and knowledge, which propagates from generation to generation, is at the heart of human intelligence. Also, at the heart of any scientific field lies the development of models (often, they are called theories) in order to explain the available experimental evidence at each time period. In other words, we always learn from data. Different data and different focuses on the data give rise to different scientific disciplines. This book is about learning from data; in particular, our intent is to detect and unveil a possible hidden structure and regularity patterns associated with their generation mechanism. This information in turn helps our analysis and understanding of the nature of the data, which can be used to make predictions for the future. Besides modeling the underlying structure, a major direction of significant interest in Machine Learning is to develop efficient algorithms for designing the models and also for analysis and prediction. The latter part is gaining importance in the dawn of what we call the big data era, when one has to deal with massive amounts of data, which may be represented in spaces of very large dimensionality. Analyzing data for such applications sets demands on algorithms to be computationally efficient and at the same time robust in their performance, because some of these data are contaminated with large noise and also, in some cases, the data may have missing values. Such methods and techniques have been at the center of scientific research for a number of decades in various disciplines, such as Statistics and Statistical Learning, Pattern Recognition, Signal and Image Processing and Analysis, Computer Science, Data Mining, Machine Vision, Bioinformatics, Industrial Automation, and Computer-Aided Medical Diagnosis, to name a few. In spite of the different names, there is a common corpus of techniques that are used in all of them, and we will refer to such methods as Machine Learning. 
This name has gained popularity over the last decade or so. The name suggests the use of a machine/computer to learn in analogy to how the brain learns and predicts. In some cases, the methods are directly inspired by the way the brain works, as is the case with neural networks, covered in Chapter 18. Two problems at the heart of machine learning, which also comprise the backbone of this book, are the classification and the regression tasks.

1.1.1 Classification
The goal in classification is to assign an unknown pattern to one out of a number of classes that are considered to be known. For example, in X-ray mammography, we are given an image where a region indicates the existence of a tumor. The goal of a computer-aided diagnosis system is to predict whether this tumor corresponds to the benign or the malignant class. Optical character recognition (OCR) systems are also built around a classification system, in which the image corresponding to each letter of the alphabet has to be recognized and assigned to one of the twenty-six (for the Latin alphabet) classes; see Section 18.11 for a related case study. Another example is the prediction of the authorship of a given text. Given a text written by an unknown author, the goal of a classification system is to predict the author among a number of authors (classes); this application is treated in Section 11.15.

The first step in designing any machine learning task is to decide how to represent each pattern in the computer. This is achieved during the preprocessing stage; one has to "encode" related information that resides in the raw data (image pixels or strings of letters in the previous examples) in an efficient and information-rich way. This is usually done by transforming the raw data into a new space, with each pattern represented by a vector, x ∈ ℝ^l. This is known as the feature vector, and its l elements are known as the features. In this way, each pattern becomes a single point in an l-dimensional space, known as the feature space or the input space. We refer to this as the feature generation stage. Usually, one starts with some large number, K, of features and eventually selects the l most informative ones via an optimizing procedure known as the feature selection stage. Having decided upon the input space in which the data are represented, one has to train a classifier.
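The feature generation and selection stages described above can be sketched in a few lines of Python. The raw data, the choice K = 5, and the variance-based ranking used for selection are illustrative assumptions for this sketch, not a method prescribed by the book.

```python
import numpy as np

# Hypothetical raw data: 6 patterns, each described by K = 5 candidate
# features. The values are made up purely for illustration.
X_raw = np.array([
    [1.0, 0.2, 5.1, 0.0, 3.3],
    [1.1, 0.1, 4.9, 0.0, 3.1],
    [0.9, 0.3, 5.0, 0.1, 3.2],
    [3.0, 0.2, 1.1, 0.0, 0.9],
    [3.2, 0.1, 0.9, 0.1, 1.1],
    [2.9, 0.3, 1.0, 0.0, 1.0],
])

# Feature selection: keep the l most informative features. As a stand-in
# criterion, rank candidate features by their variance across the patterns;
# near-constant features carry little discriminatory information.
l = 2
scores = X_raw.var(axis=0)
selected = np.argsort(scores)[::-1][:l]   # indices of the l top-scoring features

# Each pattern is now a single point x in an l-dimensional feature space.
X = X_raw[:, selected]
print(X.shape)   # (6, 2)
```

In practice the selection criterion would be tied to the classification task itself (e.g., how well a feature separates the classes); variance is used here only to keep the sketch self-contained.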
This is achieved by first selecting a set of data whose class is known, which comprises the training set. This is a set of pairs, (y_n, x_n), n = 1, 2, …, N, where y_n is the (output) variable denoting the class in which x_n belongs, and it is known as the corresponding class label; the class labels, y, take values over a discrete set, {1, 2, …, M}, for an M-class classification task. For example, for a two-class classification task, y_n ∈ {−1, +1}. To keep our discussion simple, let us focus on the two-class case. Based on the training data, one then designs a function, f, which predicts the output label given the input; that is, given the measured values of the features. This function is known as the classifier. In general, we need to design a set of such functions. Once the classifier has been designed, the system is ready for predictions. Given an unknown pattern, we form the corresponding feature vector, x, from the raw data, and we plug this value into the classifier; depending on the value of f(x) (usually on the respective sign, ŷ = sgn f(x)), the pattern is classified in one of the two classes.

Figure 1.1 illustrates the classification task. Initially, we are given the set of points, each representing a pattern in the two-dimensional space (two features used, x1, x2). Stars belong to one class, say ω1, and the crosses to the other, ω2, in a two-class classification task. These are the training points. Based on these points, a classifier was learned; for our very simple case, this turned out to be a linear function,

f(x) = θ1 x1 + θ2 x2 + θ0,  (1.1)

Figure 1.1 The classifier (linear in this simple case) has been designed in order to separate the training data into the two classes, having on its positive side the points coming from one class and on its negative side those of the other. The "red" point, whose class is unknown, is classified to the same class as the "star" points, since it lies on the positive side of the classifier.
whose graph, comprising all the points for which f(x) = 0, is the straight line shown in the figure. Then, we are given the point denoted by the red circle; this corresponds to the measured values from a pattern whose class is unknown to us. According to the classification system which we have designed, this belongs to the same class as the points denoted by stars. Indeed, every point on one side of the straight line will give a positive value, f(x) > 0, and all the points on its other side will give a negative value, f(x) < 0. The point denoted by the red circle will then result in f(x) > 0, as do all the star points, and it is classified in the same class, ω1. This type of learning is known as supervised learning, since a set of training data with known labels is available. Note that the training data can be seen as the available previous experience, and based on this, one builds a model to make predictions for the future....
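The supervised workflow just described can be sketched in Python: learn a linear two-class classifier of the form f(x) = θ1 x1 + θ2 x2 + θ0 from labeled training points, then classify a new point by the sign of f(x). The toy data and the perceptron-style training rule below are illustrative assumptions; the book discusses many ways to design such a classifier, and this is just one simple choice.

```python
import numpy as np

# Toy training set: two well-separated clusters in the (x1, x2) plane.
# Labels y_n in {-1, +1}; the numbers are made up for illustration.
X = np.array([[2.0, 2.5], [1.5, 3.0], [2.5, 2.0],        # class +1 ("stars")
              [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]]) # class -1 ("crosses")
y = np.array([1, 1, 1, -1, -1, -1])

# Augment each x with a constant 1 so theta = (theta1, theta2, theta0)
# absorbs the bias term theta0.
Xa = np.hstack([X, np.ones((len(X), 1))])

# Perceptron-style training: repeatedly nudge theta on misclassified points.
theta = np.zeros(3)
for _ in range(100):
    for xn, yn in zip(Xa, y):
        if yn * (theta @ xn) <= 0:   # point on the wrong side of (or on) the line
            theta += yn * xn         # move the line toward classifying it correctly

def classify(x):
    """Assign x to class +1 or -1 according to the sign of f(x)."""
    f = theta @ np.append(x, 1.0)
    return 1 if f > 0 else -1

print(classify(np.array([1.8, 2.2])))   # lies on the "stars" side -> 1
```

For linearly separable data like this toy set, the loop converges to a line with all +1 points on its positive side and all -1 points on its negative side, exactly the situation depicted in Figure 1.1.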