E-book, English, 166 pages
Mazumder / Singh Bhadoria / Deka: Distributed Computing in Big Data Analytics
1st edition, 2017
ISBN: 978-3-319-59834-5
Publisher: Springer Nature Switzerland
Format: PDF
Copy protection: 1 - PDF Watermark
Concepts, Technologies and Applications
Series: Scalable Computing and Communications
Big data technologies make analytics of any type fast and predictable, enabling better decision making by both humans and machines. The principles of distributed computing are the key to these technologies. Mechanisms for data storage, data access, data transfer, visualization, and predictive modeling, distributed across multiple low-cost machines, are what make big data analytics achievable within a stipulated cost and time, and therefore practical for consumption by humans and machines. However, the current literature on big data analytics lacks a holistic perspective that highlights the relation between big data analytics and distributed processing in a form that is easy to understand and usable by practitioners.
This book fills that gap by addressing key aspects of distributed processing in big data analytics. The chapters tackle the essential concepts and patterns of distributed computing widely used in big data analytics. The book also covers the main technologies that support distributed processing. Finally, it provides insight into applications of big data analytics, highlighting how the principles of distributed computing are used in those situations. Practitioners and researchers alike will find this book a valuable tool for their work, helping them to select the appropriate technologies while understanding the inherent strengths and drawbacks of those technologies.
Further information & material
1;Editor’s Notes;5
2;Contents;8
3;On the Role of Distributed Computing in Big Data Analytics;9
3.1;1 Introduction;9
3.2;2 History and Key Characteristics of Big Data;11
3.3;3 Key Aspects of Big Data Analytics;14
3.4;4 Popular Technologies for Big Data Analytics Utilizing Concepts of Distributed Computing;15
3.4.1;4.1 Hadoop;15
3.4.2;4.2 Yarn;16
3.4.3;4.3 Hadoop Map Reduce;16
3.4.4;4.4 Spark;16
3.5;5 Conclusion;17
3.6;References;17
4;Fundamental Concepts of Distributed Computing Used in Big Data Analytics;19
4.1;1 Introduction;19
4.2;2 Multithreading and Multiprocessing;20
4.2.1;2.1 Concept of Multiprocessing;20
4.2.2;2.2 Example of Multiprocessing;20
4.2.3;2.3 Concept of Multithreading;20
4.2.4;2.4 Example of Multithreading;21
4.2.5;2.5 Difference between Multiprocessing and Multithreading;22
4.3;3 Computing Architecture in Distributed Computing;24
4.3.1;3.1 SISD;24
4.3.2;3.2 Vector Processor;24
4.3.3;3.3 SIMD;24
4.3.4;3.4 MIMD;26
4.3.5;3.5 SM-MIMD;26
4.3.6;3.6 DM-MIMD;27
4.4;4 Scalability in Distributing Computing;28
4.4.1;4.1 Scalability Requirement and Category;28
4.4.2;4.2 Scaling Up;29
4.4.3;4.3 Scaling Out;30
4.4.4;4.4 Prospect of Scale Up and Scale Out;31
4.5;5 Queuing Network Model for Distributed Computing;31
4.5.1;5.1 Asynchronous Communication;32
4.5.2;5.2 Queue System;32
4.5.3;5.3 Queue Modeling;33
4.6;6 Application of CAP Theorem;34
4.6.1;6.1 Basic Concepts of Consistency, Availability, and Partition Tolerance;34
4.6.2;6.2 Combination of Consistency, Availability, and Partition Tolerance;35
4.7;7 Quality of Service (QoS) Requirements in Big Data Analytics;36
4.7.1;7.1 Performance;36
4.7.2;7.2 Interoperability;36
4.7.3;7.3 Fault-Tolerance;37
4.7.4;7.4 Security;37
4.7.5;7.5 Manageability;38
4.7.6;7.6 Load-Balance;39
4.7.7;7.7 High-Availability (HA);39
4.7.8;7.8 SLA;40
4.8;8 Conclusion;41
4.9;References;41
5;Distributed Computing Patterns Useful in Big Data Analytics;43
5.1;1 Introduction;43
5.2;2 Primitives for Concurrent Programming;45
5.2.1;2.1 Concurrency Expression;45
5.2.2;2.2 Synchronization;46
5.3;3 Communication Protocols and Message Exchange;47
5.3.1;3.1 Synchronous Communication;47
5.3.2;3.2 Asynchronous Communication;48
5.3.3;3.3 Pseudo-Synchronous Communication;48
5.3.4;3.4 Client/Server Paradigm;49
5.3.5;3.5 Communication Deployment in Big Data;49
5.4;4 Data Distribution in Big Data on Distributed Environments;51
5.5;5 Implementation Problems;56
5.5.1;5.1 Race Condition Problems;56
5.5.2;5.2 Message Exchange;58
5.6;6 Conclusion;59
5.7;References;60
6;Distributed Computing Technologies in Big Data Analytics;64
6.1;1 Introduction;64
6.2;2 Distributed Database;66
6.2.1;2.1 NoSQL Database;67
6.3;3 Distributed Storage;71
6.3.1;3.1 Hadoop Distributed File System (HDFS);72
6.4;4 Distributed Computation;74
6.4.1;4.1 Map-Reduce in Hadoop;75
6.4.2;4.2 Spark;77
6.5;5 Machine Learning Platforms;78
6.6;6 Search System;79
6.6.1;6.1 Search Software;80
6.7;7 Big Data Messaging Software;82
6.8;8 Cache;84
6.8.1;8.1 Distributed Caching Systems;84
6.9;9 Data Visualization;86
6.10;10 Conclusion;86
6.11;References;88
7;Security Issues and Challenges in Big Data Analytics in Distributed Environment;90
7.1;1 Introduction;90
7.1.1;1.1 Security Issues in Big Data in Distributed Environment;92
7.2;2 Infrastructure Based Security;92
7.2.1;2.1 Secure Computations;92
7.2.2;2.2 Secure Non-relational Data Stores;94
7.3;3 Data Privacy;94
7.3.1;3.1 Privacy Preservation in Data Mining;94
7.3.2;3.2 Cryptography Control Mechanism;95
7.3.3;3.3 Granular Access Control;95
7.4;4 Data Integrity and Data Management;96
7.4.1;4.1 Granular Audits;96
7.4.2;4.2 Secure Transactions and Transaction Logs;96
7.4.3;4.3 Data Provenance;97
7.5;5 Reactive Security;97
7.5.1;5.1 Input Validation at Distributed Nodes;97
7.5.2;5.2 Real Time Security;98
7.6;6 Countermeasures;98
7.7;7 Conclusion;100
7.8;References;100
8;Scientific Computing and Big Data Analytics: Application in Climate Science;102
8.1;1 Introduction;102
8.2;2 Computational Challenges in Solving Scientific Problems;103
8.3;3 Climate Change and Big Data Analytics;105
8.4;4 Use Case on Climate Analytics;105
8.4.1;4.1 The Scientific Challenge of the Climate System;105
8.4.2;4.2 Computational Challenge of the Climate Modeling;107
8.4.3;4.3 Post-processing Climate Model Output;109
8.4.4;4.4 BigData Climate Analytics Using Spark;109
8.5;5 Conclusions;111
8.6;References;112
9;Distributed Computing in Cognitive Analytics;114
9.1;1 Introduction;114
9.2;2 Building Blocks of Cognitive Analytic System;115
9.2.1;2.1 The Data Repositories;115
9.2.2;2.2 The Data Ingestion Tools;115
9.2.3;2.3 The Analytical Frameworks;116
9.2.4;2.4 The Hardware Components;118
9.2.5;2.5 Key Non-functional Requirements to Consider;118
9.2.5.1;2.5.1 High Concurrency Throughput;118
9.2.5.2;2.5.2 Interfaces for Interaction with Systems;118
9.2.5.3;2.5.3 High Availability and Disaster Recovery;119
9.2.5.4;2.5.4 Linear Scalability;119
9.2.5.5;2.5.5 Ability to Prioritize Workload;119
9.2.6;2.6 Cognitive System – Implementation Patterns;120
9.3;3 Cognitive System – Use Cases;120
9.3.1;3.1 Cognitive Systems in Health Care;121
9.3.2;3.2 Cognitive Systems in Internet of Things Domain;122
9.3.3;3.3 Cognitive Analytics to Become a Customer Centric Organization;124
9.3.3.1;3.3.1 Next Best Action;124
9.3.3.2;3.3.2 Changing Engagement Patterns;124
9.3.3.3;3.3.3 360° View of Customer;124
9.3.3.4;3.3.4 Understand Thy Customer;125
9.4;4 Conclusion;126
9.5;References;127
10;Distributed Computing in Social Media Analytics;128
10.1;1 Introduction;128
10.2;2 Open Source Tools for Social Media Analytics;129
10.3;3 Influencer Analytics;129
10.3.1;3.1 Understanding the Impact of Influencers;129
10.3.2;3.2 Wimbledon Influencer Case Study;130
10.4;4 Social Polling;132
10.4.1;4.1 Sentiment Analysis;132
10.4.2;4.2 Intent Detection;134
10.4.3;4.3 Topic Monitoring;134
10.4.4;4.4 User Segmentation;136
10.4.5;4.5 Some Social Polling Examples;137
10.4.6;4.6 Social Polling for Demand Planning;138
10.5;5 Conclusion;139
10.6;References;140
11;Utilizing Big Data Analytics for Automatic Building of Language-agnostic Semantic Knowledge Bases;143
11.1;1 Introduction;143
11.2;2 Search Engines;144
11.2.1;2.1 Key Technologies;144
11.2.2;2.2 Inverted Index;145
11.2.3;2.3 Sharding of Data;145
11.2.4;2.4 Replication of Data;146
11.2.5;2.5 Denormalized Data Model;147
11.2.6;2.6 Distributed Aggregation and Scoring;147
11.3;3 Recommendation Systems;148
11.4;4 Semantic Discovery;149
11.4.1;4.1 Problem Description;149
11.4.2;4.2 Semantic Similarity;150
11.4.3;4.3 Probabilistic Semantic Similarity Scoring Using PGMHD;151
11.4.4;4.4 Distributed PGMHD;152
11.5;5 Word Sense Ambiguity Detection;152
11.5.1;5.1 Ambiguity Score;154
11.5.2;5.2 Resolving Word Sense Ambiguity;155
11.6;6 Semantic Knowledge Graph;157
11.6.1;6.1 Model Structure;158
11.6.2;6.2 Materialization of Nodes and Edges;158
11.6.3;6.3 Discovering Semantic Relationships;160
11.6.4;6.4 Scoring Semantic Relationships;160
11.6.5;6.5 Scaling Characteristics;163
11.7;7 Real World Applications;164
11.8;8 Conclusion;165
11.9;References;165




