
E-Book, English, 329 pages

Series: Signals and Communication Technology

Yu / Deng: Automatic Speech Recognition

A Deep Learning Approach
2015
ISBN: 978-1-4471-5779-3
Publisher: Springer
Format: PDF
Copy protection: PDF watermark


This book provides a comprehensive overview of recent advances in automatic speech recognition, with a focus on deep learning models, including deep neural networks and many of their variants. It is the first book on automatic speech recognition dedicated to the deep learning approach. In addition to a rigorous mathematical treatment of the subject, the book presents the insights and theoretical foundations behind a series of highly successful deep learning models.

Further Information & Material


Foreword
Preface
Contents
Acronyms
Symbols

1 Introduction
  1.1 Automatic Speech Recognition: A Bridge for Better Communication
    1.1.1 Human-Human Communication
    1.1.2 Human-Machine Communication
  1.2 Basic Architecture of ASR Systems
  1.3 Book Organization
    1.3.1 Part I: Conventional Acoustic Models
    1.3.2 Part II: Deep Neural Networks
    1.3.3 Part III: DNN-HMM Hybrid Systems for ASR
    1.3.4 Part IV: Representation Learning in Deep Neural Networks
    1.3.5 Part V: Advanced Deep Models
  References

Part I: Conventional Acoustic Models

2 Gaussian Mixture Models
  2.1 Random Variables
  2.2 Gaussian and Gaussian-Mixture Random Variables
  2.3 Parameter Estimation
  2.4 Mixture of Gaussians as a Model for the Distribution of Speech Features
  References

3 Hidden Markov Models and the Variants
  3.1 Introduction
  3.2 Markov Chains
  3.3 Hidden Markov Sequences and Models
    3.3.1 Characterization of a Hidden Markov Model
    3.3.2 Simulation of a Hidden Markov Model
    3.3.3 Likelihood Evaluation of a Hidden Markov Model
    3.3.4 An Algorithm for Efficient Likelihood Evaluation
    3.3.5 Proofs of the Forward and Backward Recursions
  3.4 EM Algorithm and Its Application to Learning HMM Parameters
    3.4.1 Introduction to EM Algorithm
    3.4.2 Applying EM to Learning the HMM: Baum-Welch Algorithm
  3.5 Viterbi Algorithm for Decoding HMM State Sequences
    3.5.1 Dynamic Programming and Viterbi Algorithm
    3.5.2 Dynamic Programming for Decoding HMM States
  3.6 The HMM and Variants for Generative Speech Modeling and Recognition
    3.6.1 GMM-HMMs for Speech Modeling and Recognition
    3.6.2 Trajectory and Hidden Dynamic Models for Speech Modeling and Recognition
    3.6.3 The Speech Recognition Problem Using Generative Models of HMM and Its Variants
  References

Part II: Deep Neural Networks

4 Deep Neural Networks
  4.1 The Deep Neural Network Architecture
  4.2 Parameter Estimation with Error Backpropagation
    4.2.1 Training Criteria
    4.2.2 Training Algorithms
  4.3 Practical Considerations
    4.3.1 Data Preprocessing
    4.3.2 Model Initialization
    4.3.3 Weight Decay
    4.3.4 Dropout
    4.3.5 Batch Size Selection
    4.3.6 Sample Randomization
    4.3.7 Momentum
    4.3.8 Learning Rate and Stopping Criterion
    4.3.9 Network Architecture
    4.3.10 Reproducibility and Restartability
  References

5 Advanced Model Initialization Techniques
  5.1 Restricted Boltzmann Machines
    5.1.1 Properties of RBMs
    5.1.2 RBM Parameter Learning
  5.2 Deep Belief Network Pretraining
  5.3 Pretraining with Denoising Autoencoder
  5.4 Discriminative Pretraining
  5.5 Hybrid Pretraining
  5.6 Dropout Pretraining
  References

Part III: Deep Neural Network-Hidden Markov Model Hybrid Systems for Automatic Speech Recognition

6 Deep Neural Network-Hidden Markov Model Hybrid Systems
  6.1 DNN-HMM Hybrid Systems
    6.1.1 Architecture
    6.1.2 Decoding with CD-DNN-HMM
    6.1.3 Training Procedure for CD-DNN-HMMs
    6.1.4 Effects of Contextual Window
  6.2 Key Components in the CD-DNN-HMM and Their Analysis
    6.2.1 Datasets and Baselines for Comparisons and Analysis
    6.2.2 Modeling Monophone States or Senones
    6.2.3 Deeper Is Better
    6.2.4 Exploit Neighboring Frames
    6.2.5 Pretraining
    6.2.6 Better Alignment Helps
    6.2.7 Tuning Transition Probability
  6.3 Kullback-Leibler Divergence-Based HMM
  References

7 Training and Decoding Speedup
  7.1 Training Speedup
    7.1.1 Pipelined Backpropagation Using Multiple GPUs
    7.1.2 Asynchronous SGD
    7.1.3 Augmented Lagrangian Methods and Alternating Directions Method of Multipliers
    7.1.4 Reduce Model Size
    7.1.5 Other Approaches
  7.2 Decoding Speedup
    7.2.1 Parallel Computation
    7.2.2 Sparse Network
    7.2.3 Low-Rank Approximation
    7.2.4 Teach Small DNN with Large DNN
    7.2.5 Multiframe DNN
  References

8 Deep Neural Network Sequence-Discriminative Training
  8.1 Sequence-Discriminative Training Criteria
    8.1.1 Maximum Mutual Information
    8.1.2 Boosted MMI
    8.1.3 MPE/sMBR
    8.1.4 A Uniformed Formulation
  8.2 Practical Considerations
    8.2.1 Lattice Generation
    8.2.2 Lattice Compensation
    8.2.3 Frame Smoothing
    8.2.4 Learning Rate Adjustment
    8.2.5 Training Criterion Selection
    8.2.6 Other Considerations
  8.3 Noise Contrastive Estimation
    8.3.1 Casting Probability Density Estimation Problem as a Classifier Design Problem
    8.3.2 Extension to Unnormalized Models
    8.3.3 Apply NCE in DNN Training
  References

Part IV: Representation Learning in Deep Neural Networks

9 Feature Representation Learning in Deep Neural Networks
  9.1 Joint Learning of Feature Representation and Classifier
  9.2 Feature Hierarchy
  9.3 Flexibility in Using Arbitrary Input Features
  9.4 Robustness of Features
    9.4.1 Robust to Speaker Variations
    9.4.2 Robust to Environment Variations
  9.5 Robustness Across All Conditions
    9.5.1 Robustness Across Noise Levels
    9.5.2 Robustness Across Speaking Rates
  9.6 Lack of Generalization Over Large Distortions
  References

10 Fuse Deep Neural Network and Gaussian Mixture Model Systems
  10.1 Use DNN-Derived Features in GMM-HMM Systems
    10.1.1 GMM-HMM with Tandem and Bottleneck Features
    10.1.2 DNN-HMM Hybrid System Versus GMM-HMM System with DNN-Derived Features
  10.2 Fuse Recognition Results
    10.2.1 ROVER
    10.2.2 SCARF
    10.2.3 MBR Lattice Combination
  10.3 Fuse Frame-Level Acoustic Scores
  10.4 Multistream Speech Recognition
  References

11 Adaptation of Deep Neural Networks
  11.1 The Adaptation Problem for Deep Neural Networks
  11.2 Linear Transformations
    11.2.1 Linear Input Networks
    11.2.2 Linear Output Networks
  11.3 Linear Hidden Networks
  11.4 Conservative Training
    11.4.1 L2 Regularization
    11.4.2 KL-Divergence Regularization
    11.4.3 Reducing Per-Speaker Footprint
  11.5 Subspace Methods
    11.5.1 Subspace Construction Through Principal Component Analysis
    11.5.2 Noise-Aware, Speaker-Aware, and Device-Aware Training
    11.5.3 Tensor
  11.6 Effectiveness of DNN Speaker Adaptation
    11.6.1 KL-Divergence Regularization Approach
    11.6.2 Speaker-Aware Training
  References

Part V: Advanced Deep Models

12 Representation Sharing and Transfer in Deep Neural Networks
  12.1 Multitask and Transfer Learning
    12.1.1 Multitask Learning
    12.1.2 Transfer Learning
  12.2 Multilingual and Crosslingual Speech Recognition
    12.2.1 Tandem/Bottleneck-Based Crosslingual Speech Recognition
    12.2.2 Shared-Hidden-Layer Multilingual DNN
    12.2.3 Crosslingual Model Transfer
  12.3 Multiobjective Training of Deep Neural Networks for Speech Recognition
    12.3.1 Robust Speech Recognition with Multitask Learning
    12.3.2 Improved Phone Recognition with Multitask Learning
    12.3.3 Recognizing Both Phonemes and Graphemes
  12.4 Robust Speech Recognition Exploiting Audio-Visual Information
  References

13 Recurrent Neural Networks and Related Models
  13.1 Introduction
  13.2 State-Space Formulation of the Basic Recurrent Neural Network
  13.3 The Backpropagation-Through-Time Learning Algorithm
    13.3.1 Objective Function for Minimization
    13.3.2 Recursive Computation of Error Terms
    13.3.3 Update of RNN Weights
  13.4 A Primal-Dual Technique for Learning Recurrent Neural Networks
    13.4.1 Difficulties in Learning RNNs
    13.4.2 Echo-State Property and Its Sufficient Condition
    13.4.3 Learning RNNs as a Constrained Optimization Problem
    13.4.4 A Primal-Dual Method for Learning RNNs
  13.5 Recurrent Neural Networks Incorporating LSTM Cells
    13.5.1 Motivations and Applications
    13.5.2 The Architecture of LSTM Cells
    13.5.3 Training the LSTM-RNN
  13.6 Analyzing Recurrent Neural Networks: A Contrastive Approach
    13.6.1 Direction of Information Flow: Top-Down versus Bottom-Up
    13.6.2 The Nature of Representations: Localist or Distributed
    13.6.3 Interpretability: Inferring Latent Layers versus End-to-End Learning
    13.6.4 Parameterization: Parsimonious Conditionals versus Massive Weight Matrices
    13.6.5 Methods of Model Learning: Variational Inference versus Gradient Descent
    13.6.6 Recognition Accuracy Comparisons
  13.7 Discussions
  References

14 Computational Network
  14.1 Computational Network
  14.2 Forward Computation
  14.3 Model Training
  14.4 Typical Computation Nodes
    14.4.1 Computation Node Types with No Operand
    14.4.2 Computation Node Types with One Operand
    14.4.3 Computation Node Types with Two Operands
    14.4.4 Computation Node Types for Computing Statistics
  14.5 Convolutional Neural Network
  14.6 Recurrent Connections
    14.6.1 Sample by Sample Processing Only Within Loops
    14.6.2 Processing Multiple Utterances Simultaneously
    14.6.3 Building Arbitrary Recurrent Neural Networks
  References

15 Summary and Future Directions
  15.1 Road Map
    15.1.1 Debut of DNNs for ASR
    15.1.2 Speedup of DNN Training and Decoding
    15.1.3 Sequence Discriminative Training
    15.1.4 Feature Processing
    15.1.5 Adaptation
    15.1.6 Multitask and Transfer Learning
    15.1.7 Convolutional Neural Networks
    15.1.8 Recurrent Neural Networks and LSTM
    15.1.9 Other Deep Models
  15.2 State of the Art and Future Directions
    15.2.1 State of the Art: A Brief Analysis
    15.2.2 Future Directions
  References

Index


