Berry / Mohamed / Yap | Supervised and Unsupervised Learning for Data Science | E-Book | www2.sack.de

E-Book, English, 191 pages

Series: Unsupervised and Semi-Supervised Learning

Berry / Mohamed / Yap Supervised and Unsupervised Learning for Data Science


1st edition 2019
ISBN: 978-3-030-22475-2
Publisher: Springer International Publishing
Format: PDF
Copy protection: 1 - PDF Watermark



This book covers the state of the art in learning algorithms, including semi-supervised methods, to provide a broad scope of clustering and classification solutions for big data applications. Case studies and best practices are included along with theoretical models of learning, making the book a comprehensive reference to the field. The book is organized into eight chapters covering the following topics: discretization, feature extraction and selection, classification, clustering, topic modeling, graph analysis, and applications. Practitioners and graduate students can use the volume as an important reference for their current and future research, and faculty will find it useful for assignments presenting current approaches to unsupervised and semi-supervised learning in graduate-level seminar courses. The book is based on selected, expanded papers from the Fourth International Conference on Soft Computing in Data Science (2018).

- Includes new advances in clustering and classification using semi-supervised and unsupervised learning;
- Addresses new challenges arising in feature extraction and selection using semi-supervised and unsupervised learning;
- Features applications from healthcare, engineering, and text/social media mining that exploit techniques from semi-supervised and unsupervised learning.


Professor Michael W. Berry is a Full Professor in the Departments of Electrical Engineering and Computer Science (EECS) and Mathematics at the University of Tennessee, Knoxville. He served as Interim Department Head of Computer Science from January 2004 to June 2007, and as Associate Head in the Department of Electrical Engineering and Computer Science from July 2007 to July 2012. He worked in the Communications Product Division of IBM in Raleigh, NC, for about one year before accepting a research staff position in the Center for Supercomputing Research and Development at the University of Illinois at Urbana-Champaign. In 1990, he received a PhD in Computer Science from the University of Illinois at Urbana-Champaign. Prof. Berry is the co-author of 'Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods' (SIAM, 1994) and 'Understanding Search Engines: Mathematical Modeling and Text Retrieval, Second Edition' (Bestseller, SIAM, 2005), and the editor of 'Computational Information Retrieval' (SIAM, 2001), 'Survey of Text Mining: Clustering, Classification, and Retrieval' (Springer-Verlag, 2003, 2007), 'Lecture Notes in Data Mining' (Bestseller, World Scientific, 2006), 'Text Mining: Applications and Theory' (Wiley, 2010), and 'High-Performance Scientific Computing' (Springer, 2012). He has published well over 150 peer-refereed journal and conference publications and book chapters. He has organized numerous workshops on text mining and was Conference Co-Chair of the 2003 SIAM Third International Conference on Data Mining (May 1-3) in San Francisco, CA. He was Program Co-Chair of the 2004 SIAM Fourth International Conference on Data Mining (April 22-24) in Orlando, FL, and he was a keynote speaker at the 2015 International Conference on Soft Computing in Data Science (SCDS2015). He was also honorary chair of the 2016 International Conference on Soft Computing in Data Science (SCDS2016) in Kuala Lumpur, Malaysia.
His research interests include information retrieval, data and text mining, computational science, bioinformatics, and parallel computing. Prof. Berry's research has been supported by grants and contracts from organizations such as the National Science Foundation, the National Institutes of Health, the U.S. Department of Energy, the National Aeronautics and Space Administration, and the Intel Corporation.
Professor Dr Azlinah Mohamed is a Professor at the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia. She currently serves as the Dean of the faculty; she was previously the Special Officer to the Vice Chancellor and Head of the Academic Affairs and Development Unit of Universiti Teknologi MARA. She received her MSc (Artificial Intelligence) from the University of Bristol, UK, and her PhD (Decision Support Systems) from Universiti Kebangsaan Malaysia. Her recent research activities and numerous professional publications in international conferences and local journals focus on artificial intelligence, decision support systems, and soft computing. She has published well over 180 peer-refereed journal and conference publications and book chapters. She was the Honorary Chair of the 2015, 2016, and 2017 International Conference on Soft Computing in Data Science, and she was a keynote speaker at the 2016 International Conference on Soft Computing in Data Science (SCDS2016). She has also been awarded many competitive grants from ScienceFund, MOSTI, and others for both academic and industrial projects, serving industry as well as government. Her research works include the Information Professionals' Competency Assessment Model and the Multi-Parametric Pectin Lyase-Like Protein Function Classifier, which have won many awards. She is also an active member of the Malaysia Information Technology Society (MITS), the Lembaga Akreditasi Negara, Malaysia, and the Artificial Intelligence Society.
Professor Bee Wah Yap is a Professor at the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia. She is the Head of the Advanced Analytics Engineering Centre (AAEC), a Centre of Excellence in FSKM. She received her Bachelor of Science (Education) (Hons) degree, majoring in Mathematics, from the University of Science Malaysia, a Master of Statistics from the University of California, Riverside, and a PhD (Statistics) from the University of Malaya. Her research interests are in data mining, computational statistics, and multivariate data analysis. She has actively organized the SCDS2015, SCDS2016, and SCDS2017 conferences, which focus on Soft Computing in Data Science. She also actively conducts statistical workshops (IBM SPSS STATISTICS, IBM SPSS AMOS, PLS-SEM, SAS EMINER). She has published papers in ISI journals such as Expert Systems with Applications, Journal of Statistical Computation and Simulation, and Communications in Statistics - Simulation and Computation, and also in Scopus-indexed journals. She is also an active reviewer for international journals such as the International Journal of Bank Marketing, Communications in Statistics - Simulation and Computation, and Neurocomputing.


Further information & material


Preface
Contents
Part I Algorithms
  1 A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science
    1.1 Introduction
      1.1.1 Motivation and Scope
      1.1.2 Novelty and Review Approach
    1.2 Search Results
      1.2.1 EBSCO and ProQuest Central Database Results
      1.2.2 Distribution of Included Articles
    1.3 Discussion
      1.3.1 Decision Tree
      1.3.2 Naïve Bayes
      1.3.3 Support Vector Machine
      1.3.4 k-Means Algorithms
      1.3.5 Semisupervised and Other Learners
    1.4 Conclusion and Future Work
    References
  2 Overview of One-Pass and Discard-After-Learn Concepts for Classification and Clustering in Streaming Environment with Constraints
    2.1 Introduction
    2.2 Constraints and Conditions
    2.3 Concept of One-Pass and Discard-After-Learn for Classification and Clustering
    2.4 Structure of Malleable Hyper-ellipsoid Function
    2.5 Updating Malleable Hyper-ellipsoid Function
      2.5.1 Recursively Updating Center
      2.5.2 Recursively Updating Covariance Matrix
      2.5.3 Merging Two Covariance Matrices
    2.6 Analysis of Time and Space Complexities of Updating Computation
    2.7 Applying Discard-After-Learn to Arbitrary Class Drift
    2.8 Applying Discard-After-Learn to Expired Data in Clustering
    2.9 Discussion
    2.10 Conclusion
    References
  3 Distributed Single-Source Shortest Path Algorithms with Two-Dimensional Graph Layout
    3.1 Introduction
    3.2 Overviews
      3.2.1 Single-Source Shortest Path Algorithms
      3.2.2 Two-Dimensional Graph Layout
    3.3 Novel Parallel SSSP Implementations
      3.3.1 General Parallel SSSP for Distributed Memory Systems
      3.3.2 Parallel SSSP with 2D Graph Layout
      3.3.3 Other Optimizations
      3.3.4 Summary of Implementations
    3.4 Performance Results and Analysis
      3.4.1 Experimental Setup
      3.4.2 Algorithm and Communication Cost Analysis
      3.4.3 Benefits of 2D SSSP Algorithms
      3.4.4 Communication Cost Analysis
    3.5 Conclusion and Future Work
    References
  4 Using Non-negative Tensor Decomposition for Unsupervised Textual Influence Modeling
    4.1 Introduction
    4.2 Modeling Influence
      4.2.1 Tensors and Decompositions
      4.2.2 Representing Documents as Tensors
      4.2.3 Modeling Influence
      4.2.4 Summary of Influence Modeling Procedure
    4.3 Related Work
    4.4 Influence Model
      4.4.1 Approach Overview and Document Preparation
      4.4.2 Tensor Construction
      4.4.3 Tensor Decomposition
      4.4.4 Factor Classification
    4.5 Implementation
      4.5.1 Constraining Vocabularies
    4.6 A Conference Paper Case Study
    4.7 Conclusions and Future Work
    References
Part II Applications
  5 Survival Support Vector Machines: A Simulation Study and Its Health-Related Application
    5.1 Introduction
    5.2 SURLS-SVM for Survival Analysis
    5.3 Data Description and Methodology
    5.4 Empirical Results
      5.4.1 Effect of Features Dimension and Sample Size
      5.4.2 Effect of Censoring Percentage
      5.4.3 Effect of Sample Size
      5.4.4 Discussion of the Results of the Simulation
      5.4.5 Application to Health Data
    5.5 Conclusion
    References
  6 Semantic Unsupervised Learning for Word Sense Disambiguation
    6.1 Introduction
      6.1.1 Word Sense Disambiguation
      6.1.2 History and Approaches
    6.2 Latent Semantic Analysis
    6.3 LSA-WSD Approach
      6.3.1 Sense Discovery
      6.3.2 Sense Identification
      6.3.3 Semantic Mean Clustering
    6.4 Sense Discovery Using Synclustering
      6.4.1 Experimentation Parameters
      6.4.2 Observations and Results
    6.5 Sense Identification Using the Context Comparison Method
      6.5.1 Experimentation Parameters
      6.5.2 Observations and Results
    6.6 Conclusion and Future Research
    References
  7 Enhanced Tweet Hybrid Recommender System Using Unsupervised Topic Modeling and Matrix Factorization-Based Neural Network
    7.1 Introduction
    7.2 Related Works
      7.2.1 Recommender System
      7.2.2 Twitter
        User Interest Prediction in Microblog Using the Recommendation Method
        Collaborative Personalized Tweet Recommendation
      7.2.3 Latent Dirichlet Allocation
      7.2.4 Recommender System with LDA
        Content-Based Filtering with LDA
        Collaborative Filtering with LDA
      7.2.5 Generalized Matrix Factorization
        Matrix Factorization
        Neural Network
    7.3 The Proposed Method
      7.3.1 Data Preparation
      7.3.2 Content-Based Filtering Part
      7.3.3 Collaborative Filtering Part
      7.3.4 Prediction Step
    7.4 Experimental Results
      7.4.1 Dataset
      7.4.2 Evaluation Metrics
      7.4.3 Experimental Results
    7.5 Discussion
      7.5.1 Comparison Between the Proposed Method and User Interest Prediction in Microblog Using the Recommendation Method (CBF with LDA)
      7.5.2 Comparison Between the Proposed Method and the Improved Collaborative Filtering Algorithm Using the Topic Model (CF with LDA)
    7.6 Conclusion
    References
  8 New Applications of a Supervised Computational Intelligence (CI) Approach: Case Study in Civil Engineering
    8.1 Introduction
    8.2 Prediction of Hyperbolic Nonlinear Soil Stress–Strain Parameters (log k and Rf) by a Supervised Artificial Neural Network (ANN)
      8.2.1 Development of ANN Models
      8.2.2 Model Inputs and Outputs
      8.2.3 Preprocessing and Data Division
      8.2.4 Scaling of Data
      8.2.5 Model Architecture, Optimization, and Stopping Criteria
      8.2.6 Parametric Study
      8.2.7 Sensitivity Analysis of the ANN Model Inputs
    8.3 ANN Model Equations
      8.3.1 ANN Model Equation for log k
      8.3.2 ANN Model Equation for Rf
    8.4 Validity of the ANN Models Equation
    8.5 Comparison Between Measured and Predicted Stress–Strain Relationship
    8.6 Concluding Remarks
    B.1 Appendix 2
    References
Index


