E-book, English, 402 pages
Tan / Lindberg Automatic Speech Recognition on Mobile Devices and over Communication Networks
1st edition, 2008
ISBN: 978-1-84800-143-5
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
Series: Advances in Computer Vision and Pattern Recognition
The advances in computing and networking have sparked an enormous interest in deploying automatic speech recognition on mobile devices and over communication networks. This book brings together academic researchers and industrial practitioners to address the issues in this emerging realm and presents the reader with a comprehensive introduction to the subject of speech recognition in devices and networks. It covers network, distributed and embedded speech recognition systems.
Further Information & Material
1;Preface;6
2;Contents;10
3;Contributors;20
4;1 Network, Distributed and Embedded Speech Recognition: An Overview;22
4.1;1.1 Introduction;22
4.2;1.2 ASR and Its Deployment in Devices and Networks;24
4.3;1.3 Network Speech Recognition;30
4.4;1.4 Distributed Speech Recognition;32
4.5;1.5 Embedded Speech Recognition;36
4.6;1.6 Discussion;41
4.7;References;42
5;Part I Network Speech Recognition;46
5.1;2 Speech Coding and Packet Loss Effects on Speech and Speaker Recognition;48
5.1.1;2.1 Introduction;48
5.1.2;2.2 Sources of Degradation in Network Speech Recognition;49
5.1.3;2.3 Effects on the Automatic Speech Recognition Task;53
5.1.4;2.4 Effect for the Automatic Speaker Verification Task;56
5.1.5;2.5 Conclusion;59
5.1.6;Acknowledgments;59
5.1.7;References;60
5.2;3 Speech Recognition Over Mobile Networks;62
5.2.1;3.1 Introduction;62
5.2.2;3.2 Techniques for Improving ASR Performance Over Mobile Networks;64
5.2.3;3.3 Bitstream-Based Approach;67
5.2.4;3.4 Feature Transform;71
5.2.5;3.5 Enhancement of ASR Performance Over Mobile Networks;74
5.2.6;3.6 Conclusion;78
5.2.7;References;79
5.3;4 Speech Recognition Over IP Networks;84
5.3.1;4.1 Introduction;84
5.3.2;4.2 Speech Recognition and IP Networks;86
5.3.3;4.3 Robustness Against Packet Loss;90
5.3.4;4.4 Speech Coder for Speech Recognition Over IP Networks;92
5.3.5;4.5 Conclusion;103
5.3.6;References;103
6;Part II Distributed Speech Recognition;106
6.1;5 Distributed Speech Recognition Standards;108
6.1.1;5.1 Introduction;108
6.1.2;5.2 Overview of the Set of DSR Standards;110
6.1.3;5.3 Scope of the Standards;111
6.1.4;5.4 DSR Basic Front-End ES 201 108;115
6.1.5;5.5 DSR Advanced Front-End ES 202 050;117
6.1.6;5.6 Recognition Performance of the DSR Front-Ends;118
6.1.7;5.7 3GPP Evaluations and Comparisons to AMR Coded Speech;120
6.1.8;5.8 ETSI DSR Extended Front-End Standards ES 202 211 and ES 202 212;123
6.1.9;5.9 Transport Protocols: The IETF RTP Payload Formats for DSR;125
6.1.10;5.10 Conclusion;126
6.1.11;Acknowledgments;126
6.1.12;References;126
6.2;6 Speech Feature Extraction and Reconstruction;128
6.2.1;6.1 Introduction;128
6.2.2;6.2 Feature Extraction;130
6.2.3;6.3 Speech Reconstruction;138
6.2.4;6.4 Prediction of Voicing and Fundamental Frequency;144
6.2.5;6.5 Conclusion;150
6.2.6;References;150
6.3;7 Quantization of Speech Features: Source Coding;152
6.3.1;7.1 Introduction;152
6.3.2;7.2 Quantization Schemes;153
6.3.3;7.3 Quantization of ASR Feature Vectors;162
6.3.4;7.4 Experimental Results;174
6.3.5;7.5 Conclusion;179
6.3.6;References;180
6.4;8 Error Recovery: Channel Coding and Packetization;184
6.4.1;8.1 Distributed Speech Recognition Systems;184
6.4.2;8.2 Characterization and Modeling of Communication Channels;185
6.4.3;8.3 Media-Specific FEC;188
6.4.4;8.4 Media-Independent FEC;189
6.4.5;8.5 Unequal Error Protection;197
6.4.6;8.6 Frame Interleaving;198
6.4.7;8.7 Examples of Modern Error Recovery Standards;202
6.4.8;8.8 Summary;204
6.4.9;Acknowledgments;205
6.4.10;References;205
6.5;9 Error Concealment;208
6.5.1;9.1 Introduction;208
6.5.2;9.2 Speech Recognition in the Presence of Corrupted Features;211
6.5.3;9.3 Feature Posterior Estimation in a DSR Framework;215
6.5.4;9.4 Performance Evaluations;223
6.5.5;9.5 Conclusion;228
6.5.6;Acknowledgments;229
6.5.7;References;229
7;Part III Embedded Speech Recognition;232
7.1;10 Algorithm Optimizations: Low Computational Complexity;234
7.1.1;10.1 Introduction;234
7.1.2;10.2 Common Limitations of Embedded Platforms;235
7.1.3;10.3 Overview of an ASR System;236
7.1.4;10.4 Front End;237
7.1.5;10.5 Observation Model;238
7.1.6;10.6 Search;242
7.1.7;10.7 Conclusion;250
7.1.8;Acknowledgments;250
7.1.9;References;251
7.2;11 Algorithm Optimizations: Low Memory Footprint;254
7.2.1;11.1 Introduction;254
7.2.2;11.2 Notations and Problem Statement;255
7.2.3;11.3 Model Complexity Control;258
7.2.4;11.4 Parameter Tying;260
7.2.5;11.5 Parameter Representations;264
7.2.6;11.6 Quantized Parameters HMMs;266
7.2.7;11.7 Subspace Distribution Clustering HMM;268
7.2.8;11.8 Computational Complexity Implications;270
7.2.9;11.9 Practicalities and Conclusion;271
7.2.10;References;272
7.3;12 Fixed-Point Arithmetic;276
7.3.1;12.1 Introduction;276
7.3.2;12.2 Fixed-Point Arithmetic;278
7.3.3;12.3 LVCSR MAP Recognizer;280
7.3.4;12.4 Fixed-Point Implementation of the Recognizer;285
7.3.5;12.5 Experiments;290
7.3.6;12.6 Conclusion;295
7.3.7;Acknowledgments;295
7.3.8;References;295
8;Part IV Systems and Applications;298
8.1;13 Software Architectures for Networked Mobile Speech Applications;300
8.1.1;13.1 Introduction;300
8.1.2;13.2 Classes of Multimodal Architectures;309
8.1.3;13.3 The “Plus V” Distributed Multimodal Architecture;314
8.1.4;13.4 Other Distributed Multimodal Architectures;316
8.1.5;13.5 Toward a Commercial Ecosystem;318
8.1.6;13.6 Conclusion;319
8.1.7;References;319
8.2;14 Speech Recognition in Mobile Phones;322
8.2.1;14.1 Introduction;322
8.2.2;14.2 Applications of Speech Recognition for Mobile Phones;323
8.2.3;14.3 Multilinguality and Language Support;326
8.2.4;14.4 Noise Robustness;330
8.2.5;14.5 Footprint and Complexity Reduction;335
8.2.6;14.6 Platforms and an Example Application;340
8.2.7;14.7 Conclusion and Outlook;344
8.2.8;References;344
8.3;15 Handheld Speech to Speech Translation System;348
8.3.1;15.1 Introduction;348
8.3.2;15.2 System Overview;349
8.3.3;15.3 System Components and Optimization;353
8.3.4;15.4 Experiments and Discussions;362
8.3.5;15.5 Conclusion;365
8.3.6;References;366
8.4;16 Automotive Speech Recognition;368
8.4.1;16.1 Introduction;368
8.4.2;16.2 Siemens Speech Processing—From Research to Products;369
8.4.3;16.3 Example Automotive Voice Applications: Infotainment, Navigation, Manuals, and Internet;372
8.4.4;16.4 Automotive Platform Issues and Challenges;378
8.4.5;16.5 Noise Robust Recognition Technology;381
8.4.6;16.6 Methodology for Evaluation of Automotive Recognizers Quality Measurement Using SNR Curves;388
8.4.7;16.7 Conclusion;393
8.4.8;References;393
8.5;17 Energy Aware Speech Recognition for Mobile Devices;396
8.5.1;17.1 Introduction;396
8.5.2;17.2 Case Study of Distributed Speech Recognition Using the HP Labs Smartbadge System;400
8.5.3;17.3 Conclusion;416
8.5.4;References;416
9;Index;418




