
E-Book, English, 417 pages

Virtanen / Plumbley / Ellis: Computational Analysis of Sound Scenes and Events


1st edition, 2018
ISBN: 978-3-319-63450-0
Publisher: Springer Nature Switzerland
Format: PDF
Copy protection: PDF watermark




This book presents computational methods for extracting useful information from audio signals, collecting the state of the art in the field of sound event and scene analysis. The authors cover the entire procedure for developing such methods, ranging from data acquisition and labeling, through the design of taxonomies used in the systems, to signal processing methods for feature extraction and machine learning methods for sound recognition. The book also covers advanced techniques for dealing with environmental variation and multiple overlapping sound sources, and for taking advantage of multiple microphones or other modalities. It gives examples of usage scenarios in large media databases, acoustic monitoring, bioacoustics, and context-aware devices, and presents graphical illustrations of sound signals and their spectrographic representations, as well as block diagrams and pseudocode of algorithms.
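To make the development procedure described above more concrete, the following is a minimal, hypothetical sketch of such a system in Python: log-mel features are extracted from labeled audio clips and a standard supervised classifier is trained on them. It is an illustration only, not code from the book; the file names, class labels, and parameter values are invented, and it assumes the librosa and scikit-learn libraries are available.

    # Minimal illustrative sketch of a sound classification pipeline
    # (hypothetical example, not code from the book).
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def extract_features(path, sr=22050, n_mels=40):
        # Load a clip, compute a log-mel spectrogram, and summarize it
        # over time by its mean and standard deviation per mel band.
        y, _ = librosa.load(path, sr=sr, mono=True)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel)
        return np.concatenate([log_mel.mean(axis=1), log_mel.std(axis=1)])

    # Hypothetical labeled training clips and their classes.
    train_files = ["clip_001.wav", "clip_002.wav", "clip_003.wav"]
    train_labels = ["car_horn", "dog_bark", "car_horn"]

    X = np.stack([extract_features(f) for f in train_files])
    clf = SVC(kernel="rbf").fit(X, train_labels)

    # Classify a previously unseen recording.
    print(clf.predict(extract_features("new_clip.wav")[None, :]))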


Tuomas Virtanen is Professor at the Laboratory of Signal Processing, Tampere University of Technology (TUT), Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from TUT in 2001 and 2006, respectively, and has also worked as a research associate at the Cambridge University Engineering Department, UK. He is known for his pioneering work on single-channel sound source separation using non-negative matrix factorization techniques and their application to noise-robust speech recognition, music content analysis, and audio event detection. His research interests also include content analysis of audio signals in general and machine learning. He has authored more than 100 scientific publications on these topics, which have been cited more than 5000 times. He received the IEEE Signal Processing Society 2012 Best Paper Award for his article 'Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria', as well as three other best paper awards. He is an IEEE Senior Member, a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society, Associate Editor of IEEE/ACM Transactions on Audio, Speech, and Language Processing, and recipient of a 2014 ERC Starting Grant.

Mark Plumbley is Professor of Signal Processing at the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey in Guildford, UK. After receiving his Ph.D. degree in neural networks in 1991, he became a Lecturer at King's College London before moving to Queen Mary University of London in 2002, where he subsequently became Professor and Director of the Centre for Digital Music; he joined the University of Surrey in 2015. He is known for his work on the analysis and processing of audio and music using a wide range of signal processing techniques, including independent component analysis, sparse representations, and deep learning. He is also keen to promote the importance of research software and data in audio and music research, including training researchers to follow the principles of reproducible research, and he led the 2013 D-CASE data challenge on Detection and Classification of Acoustic Scenes and Events. He currently leads two EU-funded research training networks on sparse representations, compressed sensing, and machine sensing, and two major UK-funded projects on audio source separation and on making sense of everyday sounds. He is a Fellow of the IET and the IEEE.

Dan Ellis joined Google Inc. in 2015 as a Research Scientist after spending 15 years as a tenured professor in the Electrical Engineering Department of Columbia University, where he founded and led the Laboratory for Recognition and Organization of Speech and Audio (LabROSA), which conducted research into all aspects of extracting information from sound. He is also an External Fellow of the International Computer Science Institute in Berkeley, CA, where he researched approaches to robust speech recognition. He is known for his contributions to Computational Auditory Scene Analysis and for developing and transferring techniques across different kinds of audio processing, including speech, music, and environmental sounds. He has a long track record of supporting the community through public releases of code and data, including the Million Song Dataset of features and metadata for one million pop music tracks, which has become the standard large-scale research set in the Music Information Retrieval field.


Further information & material


Preface (p. 5)
Contents (p. 6)
Contributors (p. 8)

Part I Foundations (p. 10)

  1 Introduction to Sound Scene and Event Analysis (p. 11)
    1.1 Motivation (p. 11)
    1.2 What is Computational Analysis of Sound Scenes and Events? (p. 12)
    1.3 Related Fields (p. 14)
    1.4 Scientific and Technical Challenges in Computational Analysis of Sound Scenes and Events (p. 15)
    1.5 About This Book (p. 16)
    References (p. 19)

  2 The Machine Learning Approach for Analysis of Sound Scenes and Events (p. 21)
    2.1 Introduction (p. 21)
    2.2 Analysis Systems Overview (p. 23)
    2.3 Data Acquisition (p. 24)
      2.3.1 Source Audio (p. 25)
      2.3.2 Reference Annotations (p. 26)
    2.4 Audio Processing (p. 27)
      2.4.1 Pre-processing (p. 27)
      2.4.2 Feature Extraction (p. 28)
    2.5 Supervised Learning and Recognition (p. 30)
      2.5.1 Learning (p. 31)
      2.5.2 Generalization (p. 32)
      2.5.3 Recognition (p. 33)
    2.6 An Example Approach Based on Neural Networks (p. 36)
      2.6.1 Sound Classification (p. 38)
      2.6.2 Sound Event Detection (p. 39)
    2.7 Development Process of Audio Analysis Systems (p. 40)
      2.7.1 Technological Research (p. 41)
      2.7.2 Product Demonstrations (p. 43)
      2.7.3 Development Process (p. 44)
    2.8 Conclusions (p. 45)
    References (p. 46)

  3 Acoustics and Psychoacoustics of Sound Scenes and Events (p. 49)
    3.1 Introduction (p. 50)
    3.2 Acoustic and Psychoacoustic Characteristics of Auditory Scenes and Events (p. 51)
      3.2.1 Acoustic Characteristics of Sound Scenes and Events (p. 51)
        3.2.1.1 Periodic and Non-periodic Signals (p. 52)
        3.2.1.2 Sound Production and Propagation (p. 53)
      3.2.2 Psychoacoustics of Auditory Scenes and Events (p. 54)
        3.2.2.1 Models of Peripheral Auditory Processing (p. 55)
        3.2.2.2 Pitch and Loudness (p. 56)
        3.2.2.3 The Dimensional Approach to Timbre (p. 57)
    3.3 The Perception of Auditory Scenes (p. 58)
      3.3.1 Multidimensional Representation (p. 60)
      3.3.2 Temporal Coherence (p. 60)
      3.3.3 Other Effects in Segregation (p. 61)
    3.4 The Perception of Sound Events (p. 62)
      3.4.1 Perception of the Properties of Sound Events: Psychomechanics (p. 62)
        3.4.1.1 Material (p. 62)
        3.4.1.2 Shape and Size (p. 64)
        3.4.1.3 Parameters of Actions (p. 64)
      3.4.2 Minimal and Sparse Features for Sound Recognition (p. 65)
        3.4.2.1 Spectral Regions, Minimal Durations, and Spectro-Temporal Modulations (p. 65)
        3.4.2.2 Sparse Features (p. 67)
      3.4.3 Discussion: On the Dimensionality of Auditory Representations (p. 68)
    3.5 Summary (p. 69)
    References (p. 69)

Part II Core Methods (p. 76)

  4 Acoustic Features for Environmental Sound Analysis (p. 77)
    4.1 Introduction (p. 78)
    4.2 Signal Representations (p. 78)
      4.2.1 Signal Acquisition and Preprocessing (p. 79)
      4.2.2 General Time-Frequency Representations (p. 80)
      4.2.3 Log-Frequency and Perceptually Motivated Representations (p. 82)
      4.2.4 Multiscale Representations (p. 83)
      4.2.5 Discussion (p. 84)
    4.3 Feature Engineering (p. 84)
      4.3.1 Temporal Features (p. 85)
      4.3.2 Spectral Shape Features (p. 86)
      4.3.3 Cepstral Features (p. 87)
      4.3.4 Perceptually Motivated Features (p. 88)
      4.3.5 Spectrogram Image-Based Features (p. 89)
      4.3.6 Discussion (p. 90)
    4.4 Feature Learning (p. 91)
      4.4.1 Deep Learning for Feature Extraction (p. 91)
      4.4.2 Matrix Factorisation Techniques (p. 92)
      4.4.3 Discussion (p. 94)
    4.5 Dimensionality Reduction and Feature Selection (p. 94)
      4.5.1 Dimensionality Reduction (p. 95)
      4.5.2 Feature Selection Paradigms (p. 95)
      4.5.3 Filter Approaches (p. 96)
      4.5.4 Embedded Feature Selection (p. 97)
        4.5.4.1 Feature Selection by Sparsity-Inducing Norms (p. 97)
        4.5.4.2 Multiple Kernel Learning (p. 97)
        4.5.4.3 Feature Selection in Tree-Based Classifiers (p. 98)
    4.6 Temporal Integration and Pooling (p. 98)
      4.6.1 Temporal Integration by Simple Statistics (p. 98)
      4.6.2 Model-Based Integration (p. 99)
      4.6.3 Discussion (p. 100)
    4.7 Relation to Work on Speech and Music Processing (p. 101)
    4.8 Conclusion and Future Directions (p. 101)
    References (p. 103)

  5 Statistical Methods for Scene and Event Classification (p. 108)
    5.1 Introduction (p. 108)
      5.1.1 Preliminaries (p. 109)
      5.1.2 Validation and Testing (p. 111)
    5.2 Discriminative Models (p. 112)
      5.2.1 Binary Linear Models (p. 112)
        5.2.1.1 Support Vector Machines (p. 113)
        5.2.1.2 Logistic Regression (p. 114)
      5.2.2 Multi-Class Linear Models (p. 115)
      5.2.3 Non-linear Discriminative Models (p. 116)
    5.3 Generative Models (p. 117)
      5.3.1 Maximum Likelihood Estimation (p. 118)
      5.3.2 Bayesian Estimation: Maximum A Posteriori (p. 118)
      5.3.3 Aside: Fully Bayesian Inference (p. 119)
      5.3.4 Gaussian Mixture Models (p. 120)
        5.3.4.1 Classification with GMMs (p. 121)
        5.3.4.2 Simplifications (p. 121)
        5.3.4.3 Aside: Maximum Likelihood, or MAP? (p. 122)
        5.3.4.4 Parameter Estimation (p. 123)
        5.3.4.5 How Many Components? (p. 124)
      5.3.5 Hidden Markov Models (p. 124)
        5.3.5.1 Discriminative HMMs (p. 126)
        5.3.5.2 Priors and Parameter Estimation (p. 128)
    5.4 Deep Models (p. 128)
      5.4.1 Notation (p. 128)
      5.4.2 Multi-Layer Perceptrons (p. 129)
        5.4.2.1 Transfer and Objective Functions (p. 130)
        5.4.2.2 Initialization (p. 131)
        5.4.2.3 Learning and Optimization (p. 132)
        5.4.2.4 Discussion: MLP for Audio (p. 133)
      5.4.3 Convolutional Networks (p. 134)
        5.4.3.1 One-Dimensional Convolutional Networks (p. 134)
        5.4.3.2 Two-Dimensional Convolutional Networks (p. 137)
      5.4.4 Recurrent Networks (p. 138)
        5.4.4.1 Recursive Networks (p. 138)
        5.4.4.2 Gated Recurrent Units (p. 139)
        5.4.4.3 Long Short-Term Memory Networks (p. 140)
        5.4.4.4 Bi-directional Networks (p. 142)
      5.4.5 Hybrid Architectures (p. 142)
        5.4.5.1 Convolutional+Dense (p. 142)
        5.4.5.2 Convolutional+Recurrent (p. 143)
    5.5 Improving Model Stability (p. 144)
      5.5.1 Data Augmentation (p. 144)
      5.5.2 Domain Adaptation (p. 144)
      5.5.3 Ensemble Methods (p. 145)
    5.6 Conclusions and Further Reading (p. 145)
    References (p. 146)

  6 Datasets and Evaluation (p. 152)
    6.1 Introduction (p. 152)
    6.2 Properties of Audio and Labels (p. 153)
      6.2.1 Audio Content (p. 154)
        6.2.1.1 Sound Scene Audio Data (p. 155)
        6.2.1.2 Sound Events Audio Data (p. 156)
      6.2.2 Textual Labels (p. 156)
        6.2.2.1 Sound Scene Labels (p. 157)
        6.2.2.2 Sound Event Labels (p. 158)
    6.3 Obtaining Reference Annotations (p. 159)
      6.3.1 Designing Manual Annotation Tasks (p. 160)
        6.3.1.1 Annotation with Preselected Labels (p. 160)
        6.3.1.2 Annotation of Presegmented Audio (p. 161)
        6.3.1.3 Free Segmentation and Labeling (p. 161)
      6.3.2 Inter-Annotator Agreement and Data Reliability (p. 163)
    6.4 Datasets for Environmental Sound Classification and Detection (p. 164)
      6.4.1 Creating New Datasets (p. 164)
        6.4.1.1 Recording New Data (p. 164)
        6.4.1.2 Collecting a Set of Existing Recordings (p. 165)
        6.4.1.3 Data Simulation (p. 166)
        6.4.1.4 Typical Pitfalls in Data Collection (p. 166)
      6.4.2 Available Datasets (p. 167)
      6.4.3 Data Augmentation (p. 169)
    6.5 Evaluation (p. 170)
      6.5.1 Evaluation Setup (p. 170)
      6.5.2 Evaluation Measures (p. 172)
        6.5.2.1 Intermediate Statistics (p. 172)
        6.5.2.2 Metrics (p. 173)
    6.6 Advice on Devising Evaluation Protocols (p. 180)
    References (p. 182)

Part III Advanced Methods (p. 185)

  7 Everyday Sound Categorization (p. 186)
    7.1 Introduction (p. 186)
    7.2 Theories of Categorization (p. 188)
      7.2.1 Classical Theory of Categorization (p. 188)
      7.2.2 Holistic Perception (p. 188)
      7.2.3 Prototype Theory of Categorization (p. 189)
      7.2.4 Exemplar Theory of Categorization (p. 190)
      7.2.5 Bottom-Up and Top-Down Processes (p. 191)
    7.3 Research Methods for Sound Categorization (p. 193)
      7.3.1 Data Collection (p. 193)
        7.3.1.1 Dissimilarity Estimation (p. 193)
        7.3.1.2 Sorting Tasks (p. 194)
      7.3.2 Data Analysis (p. 194)
        7.3.2.1 Multidimensional Scaling (p. 195)
        7.3.2.2 Additive-Tree Representations (p. 196)
        7.3.2.3 Mantel Test (p. 197)
    7.4 How Do People Categorize Sounds in Everyday Life? (p. 197)
      7.4.1 Isolated Environmental Sounds (p. 197)
      7.4.2 Linguistic Labelling of Auditory Categories (p. 200)
      7.4.3 Factors Influencing Everyday Sound Categorization (p. 201)
      7.4.4 Complex Auditory Scenes (p. 202)
        7.4.4.1 Categories of Soundscapes as "Acts of Meaning" (p. 202)
    7.5 Organizing Everyday Sounds (p. 205)
      7.5.1 Taxonomies of Sound Events (p. 205)
      7.5.2 Taxonomies of Complex Auditory Scenes (p. 206)
      7.5.3 Toward a Soundscape Ontology (p. 208)
      7.5.4 Sound Events (p. 209)
      7.5.5 Comparison (p. 211)
    7.6 Conclusion (p. 212)
    References (p. 213)

  8 Approaches to Complex Sound Scene Analysis (p. 217)
    8.1 Introduction (p. 217)
    8.2 Sound Scene Recognition (p. 218)
      8.2.1 Methods (p. 219)
      8.2.2 Practical Considerations (p. 221)
    8.3 Sound Event Detection and Classification (p. 222)
      8.3.1 Paradigms and Techniques (p. 222)
      8.3.2 Monophonic Event Detection/Classification (p. 224)
        8.3.2.1 Detection Workflows (p. 225)
        8.3.2.2 Modeling Temporal Structure (p. 226)
      8.3.3 Polyphonic Sound Event Detection/Classification (p. 229)
        8.3.3.1 Multiple Monophonic Detectors (p. 229)
        8.3.3.2 Joint Approaches (p. 230)
      8.3.4 Post-Processing (p. 233)
      8.3.5 Which Comes First: Classification or Detection? (p. 233)
    8.4 Context and Acoustic Language Models (p. 234)
      8.4.1 Context-Dependent Sound Event Detection (p. 234)
      8.4.2 Acoustic Language Models (p. 235)
    8.5 Event Detection for Scene Analysis (p. 237)
    8.6 Conclusions and Future Directions (p. 239)
    References (p. 240)

  9 Multiview Approaches to Event Detection and Scene Analysis (p. 245)
    9.1 Introduction (p. 246)
    9.2 Background and Overview (p. 247)
      9.2.1 Multiview Architectures (p. 247)
      9.2.2 Visual Features (p. 247)
    9.3 General Techniques for Multiview Data Analysis (p. 248)
      9.3.1 Representation and Feature Integration/Fusion (p. 248)
        9.3.1.1 Feature-Space Transformation (p. 249)
        9.3.1.2 Multimodal Dictionary Learning (p. 250)
        9.3.1.3 Co-Factorization Techniques (p. 251)
        9.3.1.4 Neural Networks and Deep Learning (p. 253)
      9.3.2 Decision-Level Integration/Fusion (p. 254)
        9.3.2.1 Probabilistic Combination Rules (p. 254)
        9.3.2.2 Neural Networks (p. 254)
        9.3.2.3 Other Methods (p. 255)
    9.4 Audiovisual Event Detection (p. 255)
      9.4.1 Motivation (p. 255)
        9.4.1.1 Examples in Video Content Analysis and Indexing (p. 255)
        9.4.1.2 Examples in AV Surveillance and Robot Perception (p. 256)
      9.4.2 AV Event Detection Approaches (p. 257)
        9.4.2.1 AV Event Detection and Concept Classification (p. 257)
        9.4.2.2 AV Object Localization and Extraction (p. 258)
    9.5 Microphone Array-Based Sound Scene Analysis (p. 259)
      9.5.1 Spatial Cues Modeling (p. 260)
        9.5.1.1 Binaural Approach (p. 260)
        9.5.1.2 Beamforming Methods (p. 262)
        9.5.1.3 Nonstationary Gaussian Model (p. 262)
      9.5.2 Spatial Cues-Based Sound Scene Analysis (p. 263)
        9.5.2.1 Sound Source Separation (p. 263)
        9.5.2.2 Sound Event Detection (p. 264)
        9.5.2.3 Localization and Tracking of Sound Sources (p. 264)
    9.6 Conclusion and Outlook (p. 268)
    References (p. 270)

Part IV Applications (p. 279)

  10 Sound Sharing and Retrieval (p. 280)
    10.1 Introduction (p. 280)
    10.2 Database Creation (p. 283)
      10.2.1 Licensing, File Formats, and Size (p. 284)
      10.2.2 Metadata (p. 285)
      10.2.3 Audio Features (p. 286)
    10.3 Metadata-Based Sound Retrieval (p. 287)
      10.3.1 Metadata for Audio Content (p. 287)
      10.3.2 Search and Discovery of Indexed Content (p. 289)
    10.4 Audio-Based Sound Retrieval (p. 291)
      10.4.1 Audio Features (p. 292)
      10.4.2 Feature Space (p. 292)
      10.4.3 Descriptor-Based Queries (p. 293)
      10.4.4 Query by Example (p. 294)
      10.4.5 Audio Fingerprints and Thumbnails (p. 294)
    10.5 Further Approaches to Sound Retrieval (p. 295)
    10.6 Conclusions (p. 297)
    References (p. 299)

  11 Computational Bioacoustic Scene Analysis (p. 303)
    11.1 Introduction (p. 303)
    11.2 Tasks in Bioacoustics (p. 305)
      11.2.1 Population Monitoring, Localisation, and Ranging (p. 305)
      11.2.2 Species and Subspecies Identification (p. 307)
      11.2.3 "Vocabulary" Analysis, and the Study of Invariance and Change in Animal Communication Systems (p. 307)
      11.2.4 Data Mining and Archive Management, Citizen Science (p. 308)
    11.3 Methods and Methodological Issues (p. 309)
      11.3.1 Detection, Segmentation, and Classification (p. 309)
      11.3.2 Source Separation (p. 313)
      11.3.3 Measuring Similarity Between Animal Sounds (p. 314)
      11.3.4 Sequences of Vocalisations (p. 316)
      11.3.5 Holistic Soundscape Analysis: Ecoacoustics (p. 318)
      11.3.6 Visualisation and Data Mining (p. 321)
    11.4 Large-Scale Analysis Techniques (p. 321)
      11.4.1 Classifiers and Detectors (p. 323)
      11.4.2 Reducing Computation Via Low-Complexity Front-Ends (p. 324)
      11.4.3 Features (p. 325)
    11.5 Perspectives and Open Problems (p. 326)
    References (p. 328)

  12 Audio Event Recognition in the Smart Home (p. 334)
    12.1 Introduction (p. 335)
    12.2 Novel Research Directions Elicited by AER Applications in the Smart Home (p. 340)
      12.2.1 Audio Events as Structured Interrupted Sequences (p. 340)
      12.2.2 Continuous 24/7 Sound Recognition as an Open-Set Problem (p. 342)
      12.2.3 "The Real World": Coping with Limited Audio Quality and Computational Power (p. 344)
    12.3 User Experience (p. 349)
      12.3.1 User Interface Aspects (p. 349)
      12.3.2 The Significance of Subjectivity in AER Evaluation (p. 351)
      12.3.3 Distinguishing Objectivity from Subjectivity in AER Evaluation (p. 354)
      12.3.4 Summary: Objectivity, Subjectivity, and User Experience (p. 358)
    12.4 Ethical Issues: Privacy and Data Protection (p. 358)
      12.4.1 Which Country are We Talking About? (p. 359)
      12.4.2 Is Environmental Audio Data Actually Private? (p. 360)
      12.4.3 Consent and Ownership (p. 362)
      12.4.4 Data Protection and Security (p. 364)
    12.5 Conclusion (p. 365)
    References (p. 366)

  13 Sound Analysis in Smart Cities (p. 371)
    13.1 Introduction (p. 372)
      13.1.1 Smart Cities (p. 372)
      13.1.2 Urban Sound Sensing and Analysis (p. 372)
      13.1.3 Overview of this Chapter (p. 373)
    13.2 Smart City Applications (p. 374)
    13.3 Acoustic Sensor Networks (p. 376)
      13.3.1 Mobile Sound Sensing (p. 376)
      13.3.2 Static Sound Sensing (p. 377)
      13.3.3 Designing a Low-Cost Acoustic Sensing Device (p. 377)
        13.3.3.1 Microphone Module (p. 378)
        13.3.3.2 Form Factor, Cost, and Calibration (p. 378)
      13.3.4 Network Design & Infrastructure (p. 379)
    13.4 Understanding Urban Soundscapes (p. 380)
      13.4.1 Urban Sound Dataset (p. 381)
      13.4.2 Engineered vs Learned Features (p. 384)
      13.4.3 Shift Invariance via Convolutions (p. 387)
      13.4.4 Deep Learning and Data Augmentation (p. 388)
    13.5 Conclusion and Future Perspectives (p. 389)
    References (p. 391)

Part V Perspectives (p. 396)

  14 Future Perspective (p. 397)
    14.1 Introduction (p. 397)
    14.2 Obtaining Training Data (p. 398)
      14.2.1 Cataloguing Sounds (p. 398)
        14.2.1.1 Manual Sound Event Vocabularies (p. 399)
        14.2.1.2 Automatic Creation of Sound Event Vocabularies (p. 399)
      14.2.2 Opportunistic Data Collection (p. 402)
      14.2.3 Active Learning (p. 402)
      14.2.4 Using Unsupervised Data (p. 403)
        14.2.4.1 Training with Weak Labels (p. 403)
        14.2.4.2 Exploiting Visual Information (p. 405)
      14.2.5 Evaluation Tasks (p. 406)
    14.3 Future Perspectives (p. 407)
      14.3.1 Applications (p. 407)
      14.3.2 Approaches (p. 407)
    14.4 Summary (p. 409)
    References (p. 409)

Index (p. 412)


