E-book, English, 132 pages
Series: De Gruyter Textbook
An Introduction
ISBN: 978-3-11-102580-3
Publisher: De Gruyter
Format: EPUB
Copy protection: Adobe DRM
Target audience
Graduate students, and professionals interested in the mathematic
Authors/Editors
Subject areas
- Mathematics | Computer Science: Mathematics, Numerical Analysis and Scientific Computing, Applied Mathematics, Mathematical Models
- Mathematics | Computer Science: Computing | Computer Science, Programming | Software Development, Algorithms & Data Structures
- Mathematics | Computer Science: Computing | Computer Science, Computer Science, Artificial Intelligence
Further information & material
4 The fundamentals of artificial neural networks
4.1 Basic definitions
In this book, we focus on deep learning, a type of machine learning based on artificial neural networks (ANNs), which involve several layers of so-called neuron functions. This kind of architecture was loosely inspired by the biological neural networks in animals’ brains. The key building block of ANNs is what we call here a neuron function. A neuron function is a very simplified mathematical representation of a biological neuron dating back to [34]. This naturally leads to various ways of organizing those neuron functions into interacting networks that are capable of complex tasks. The perceptron, introduced in [43] (see also [35]), is one of the earliest one-layer ANNs. Nowadays, multiple layers are typically used instead because neurons in the same layer are not connected, but neurons can interact with neurons in other layers.
We present in this chapter a rather common architecture for classifiers, which is based on so-called feedforward networks, that is, networks with no cycles. A “cycle” means that the output of one layer becomes the input of a previous layer. More complex networks with some cycles between layers can also be used (so-called recurrent neural networks; see, for example, [11], [17]). Finally, as one readily sees, ANNs typically require a very large number of parameters. This makes identifying the best choice for those coefficients delicate and leads to the important question of learning the parameters, which is discussed later in this book.
Definition 4.1.1.
A neuron function $f:\mathbb{R}^n\to\mathbb{R}$ is a mapping of the form (see Fig. 4.1)
$$f(x)=\lambda(a\cdot x+\beta), \tag{4.1}$$
where $\lambda:\mathbb{R}\to\mathbb{R}$ is a continuous non-linear function called the activation function, $a\in\mathbb{R}^n$ is a vector of weights, and the scalar $\beta\in\mathbb{R}$ is called the bias. Here, $a\cdot x$ is the inner product on $\mathbb{R}^n$. A typical example of an activation function is ReLU (Rectified Linear Unit), defined as follows (see Fig. 4.2):
$$\mathrm{ReLU}(x)=\begin{cases} x, & x>0,\\ 0, & x\le 0.\end{cases} \tag{4.2}$$
ReLU is a very simple non-linear function composed of two linear functions that model the threshold between an “inactive” and an “active” state of the neuron.
Figure 4.1 A neuron function with input $x\in\mathbb{R}^n$, vector of weights $a=(a_1,\dots,a_n)$, bias $\beta$, and output scalar $f(x)=\lambda(a\cdot x+\beta)$.
Figure 4.2 ReLU function. $\beta$, “bias” constant.
ReLU is a simple way to model a neuron in a brain which has two states: resting (resting potential) and active (firing). Incoming impulses from other neurons can change the state from resting to active, but the impulse needs to reach a certain threshold first. Thus, ReLU’s change from constant 0 to a linear function at $x_{\mathrm{thresh}}=\beta$ reflects the change from resting to active at this threshold.
Figure 4.3 Layer of neurons $f_i$, $1\le i\le 4$, with vector of input data $x_j$. Weights $a_{ij}$ are assigned to each edge connecting $x_j$ and $f_i$, $n=3$, $m=4$.
Since the output of a neuron function is a single number, we can combine several neurons to create a vector-valued function called a layer function.
Definition 4.1.2.
A layer function $g:\mathbb{R}^n\to\mathbb{R}^m$ is a mapping of the form
$$g(x)=(f_1(x),f_2(x),\dots,f_m(x)), \tag{4.3}$$
where each $f_i:\mathbb{R}^n\to\mathbb{R}$ is a neuron function of the form (4.1) with its own vector of parameters $a_i=(a_{i1},\dots,a_{in})$ and bias $\beta_i$, $i=1,\dots,m$.
Remark 4.1.1.
When $m=1$ in (4.3), the layer function reduces to a neuron function.
Remark 4.1.2.
When discussing layers, it is important to distinguish between the layer nodes and the layer function that connects them. There are two columns of nodes in Fig. 4.3, which are commonly referred to as two layers. However, according to Definition 4.1.2, Fig. 4.3 depicts a single layer function. Thus, it is important to distinguish between columns of nodes in diagrammatic representations of layers, as in Fig. 4.3, and the layer function that connects them, defined in (4.3). This is illustrated in Fig. 4.3, where the nodes $x=(x_1,x_2,x_3)$ can be referred to as the nodes of the input vector, and the output vector of the layer function is given by (4.3). In such a situation, one may commonly refer to two layers of nodes: the input layer, composed of the nodes corresponding to the coordinates $x_i$ of $x$, and the output layer, composed of the $m$ nodes corresponding to the coordinates $y_i$ of $y=g(x)$. For a general multilayer network as defined in Definition 4.1.3, if we consider $M$ layer functions, we will have $M+1$ layers of nodes. We simply write layer for layer of nodes, and often refer to individual nodes within a layer as neurons (vs. the layer functions or neuron functions connecting those).
Thus, the layer function is determined by a matrix of parameters,
$$A=\begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}, \tag{4.4}$$
and a vector of biases,
$$\beta=\begin{pmatrix}\beta_1\\ \vdots\\ \beta_m\end{pmatrix}. \tag{4.5}$$
Hence, (4.3) may be written
$$g(x)=\bar{\lambda}(Ax+\beta), \tag{4.6}$$
where $\bar{\lambda}:\mathbb{R}^m\to\mathbb{R}^m$ is the vectorial activation function defined as
$$\bar{\lambda}(x_1,\dots,x_m)=(\lambda(x_1),\dots,\lambda(x_m)) \tag{4.7}$$
for a scalar activation function $\lambda$ as in Definition 4.1.1. See Fig. 4.3 for a diagram of a layer function. This figure shows a layer as a graph with two columns of nodes. The right column depicts neuron functions in the layer, while the left column depicts three real numbers (data) which are input to the layer. The value carried by the edge connecting the $j$th input with the $i$th neuron is multiplied by the parameter value $a_{ij}$ from the matrix of parameters.
Definition 4.1.3.
An artificial neural network (ANN) is a function $h:\mathbb{R}^n\to\mathbb{R}^m$ of the form
$$h(x)=h_M\circ h_{M-1}\circ\cdots\circ h_1(x),\quad M\ge 1, \tag{4.8}$$
where each $h_i:\mathbb{R}^{n_{i-1}}\to\mathbb{R}^{n_i}$ is a layer function (see Definition 4.1.2) with its own matrix of parameters $A_i$ and its own vector of biases $\beta_i$.
Fig. 4.4 shows an ANN composed of two layers. The layer function in Fig. 4.4 between input and output is called a hidden layer (hidden from the user) because its output is passed to another layer, not directly to the user. The number of neurons $n_i$ in the $i$th layer is called the layer’s width, while the total number $M$ of layers in an ANN is the depth of the ANN. The numbers $n_1,\dots,n_M$ and $M$ comprise the architecture of this network. ANNs with more than one layer are referred to as deep neural networks (DNNs).
Figure 4.4 Simple network with input layer and two layer functions, $n=n_0=4$, $n_1=6$, $m=n_2=3$.
This network is called fully connected because for all but the last layer, each neuron provides input to each neuron in the next layer. That is, each node in each column of the graph in Fig. 4.4 is connected to each node in the...
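The three definitions of this section, the neuron function (4.1), the layer function (4.3) and the network (4.8), can be sketched in a few lines of plain Python. This is only an illustrative sketch and not code from the book: the function names `relu`, `neuron`, `layer` and `network`, the tiny architecture ($n_0=2$, $n_1=3$, $n_2=1$) and all numerical weights and biases below are assumptions chosen so that the arithmetic can be checked by hand.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def relu(t: float) -> float:
    """ReLU as in (4.2): t for t > 0, and 0 otherwise."""
    return t if t > 0 else 0.0


def neuron(x: Vector, a: Vector, beta: float,
           activation: Callable[[float], float] = relu) -> float:
    """Neuron function (4.1): activation(a . x + beta)."""
    if len(x) != len(a):
        raise ValueError("x and a must have the same length")
    return activation(sum(a_j * x_j for a_j, x_j in zip(a, x)) + beta)


def layer(x: Vector, A: Matrix, beta: Vector,
          activation: Callable[[float], float] = relu) -> List[float]:
    """Layer function (4.3): row i of A and beta[i] define neuron f_i."""
    if len(A) != len(beta):
        raise ValueError("A needs one row per bias")
    return [neuron(x, row, b, activation) for row, b in zip(A, beta)]


def network(x: Vector, params: Sequence[Tuple[Matrix, Vector]],
            activation: Callable[[float], float] = relu) -> List[float]:
    """ANN (4.8): apply h_1 first, then h_2, ..., up to h_M."""
    y = list(x)
    for A, beta in params:
        y = layer(y, A, beta, activation)
    return y


# Hypothetical parameters (not from the book): widths n0 = 2, n1 = 3, n2 = 1.
A1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
beta1 = [0.0, -1.0, 0.5]
A2 = [[1.0, 2.0, -4.0]]
beta2 = [0.25]

x = [2.0, 1.0]
print(layer(x, A1, beta1))                      # prints [1.0, 0.5, 0.5]
print(network(x, [(A1, beta1), (A2, beta2)]))   # prints [0.25]
```

Each row of `A1` plays the role of one weight vector $a_i$, so `layer` is just (4.3) evaluated neuron by neuron, which agrees with the matrix form (4.6). The list `params` has length $M=2$, the depth of this toy network, and the activation is applied in every layer function, including the last one, exactly as Definition 4.1.3 prescribes.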