Book, English, 400 pages, format (W × H): 178 mm x 254 mm
Harness the Power and Promise of Big Data with HDP
ISBN: 978-1-4842-0669-0
Publisher: Apress
Companies are finding new sources of valuable data, from social media to clickstreams to server logs to machine and geolocation data, and they know they need to employ big data tools, like Hadoop, to make the best use of it. Many organizations are turning to Hortonworks, a company started by twenty-four of the original team of Yahoo! engineers that developed Hadoop, which has emerged as one of the key vendors helping enterprise customers make use of Hadoop to gain new, powerful insights into customer needs and wants. The company's flagship product, Hortonworks Data Platform (HDP), is more than a product: it is a platform and a suite of tools that together create a framework for loading, managing, accessing, and analyzing massive volumes of data, no matter the format or schema.

Pro Hortonworks Data Platform: Harness the Power and Promise of Big Data with HDP, written by Hadoop and HDP expert Stephen Giles, is designed to help readers gain full advantage from HDP. The book, which assumes no prior knowledge of Hadoop, provides an understanding of all facets of HDP and how the various parts work both together and within a larger data platform. It is the insightful "missing manual" that all HDP users need to understand the platform in depth and how to use it to best advantage.

Pro Hortonworks Data Platform provides a deep understanding of the specific components that make HDP so powerful. The book will:

- Show how to install, configure, and secure HDP and all its components
- Illustrate the full lifecycle of a big data project using HDP
- Provide a deep understanding of YARN, the core engine of Hadoop
- Show how to leverage HDP/Apache tools like Pig, Hive, HBase, and Solr to harness data

Pro Hortonworks Data Platform provides insight and hands-on examples of how to work with each tool within the Hortonworks framework. Developers and IT pros will be able to get an understanding of Hadoop and its supporting tools, as well as a clear sense of where and when to take advantage of its power.
This book will not only show you how to process data effectively—it will show you how to take advantage of the business opportunities that lie within that data.
Target audience
Popular/general
Authors/Editors
Subject areas
Further information & material
Chapter 1: Introduction to Hortonworks Data Platform (HDP)
Chapter Goal: This chapter will set the stage for the rest of the book. It will discuss Hadoop and Big Data at a high level for those not familiar with these concepts, and it will be the only general-knowledge chapter in the book. Its secondary purpose is to give the big picture of all the parts of the Hortonworks HDP ecosystem and put those parts in context.
- A brief history of Hadoop
- Brief overview of the big data landscape and where Hadoop fits in
- Top-level overview of the Hortonworks Data Platform and Enterprise Hadoop

Chapter 2: Understanding HDFS
Chapter Goal: HDFS is the distributed storage system that forms the foundation of Hadoop. This chapter will define the base principles of Hadoop in HDP and how to work with MapReduce.
- Understanding HDFS architecture
- Understanding how data is stored in HDFS
- Understanding the relationship between NameNodes and DataNodes
- Working with WebHDFS and Hadoop fs commands

Chapter 3: Understanding YARN
Chapter Goal: YARN is the "operating system" of HDP. YARN allows both batch and real-time access to data. This chapter will provide a deep understanding of YARN and how it is employed in HDP.
- Description of the architecture of YARN and its relationship to HDFS
- Understanding the components of YARN (ResourceManager, NodeManager, ApplicationMasters, and Containers) as configured in HDP
- Understanding MapReduce and how MapReduce jobs are executed under YARN

Chapter 4: Getting at Your Data
Chapter Goal: HDP has a number of tools to query and explore your data without needing to write complex MapReduce jobs. This chapter will look at the key tools for accessing data in HDP.
- Scripting data access with Pig
- Querying data with Hive and HCatalog
- Creating Hadoop data applications with Tez

Chapter 5: Bringing NoSQL to Hadoop in HDP
Chapter Goal: This chapter builds on Chapter 4 and discusses how some NoSQL tools, built on top of YARN in HDP, can provide greater access to data.
- Understanding and working with HBase
- Understanding and working with Accumulo

Chapter 6: Working with HDP in Real Time
Chapter Goal: Traditional Hadoop was a batch-based process. YARN introduced the ability to add real-time or near-real-time access to your data. This chapter will look at how developers can use Storm in HDP to process streaming data into their data applications.
- Working with Storm
- Understanding the Trident API
- Combining Storm with HDFS for data
- Use cases for streaming data

Chapter 7: Installing and Configuring HDP
Chapter Goal: The next three chapters will pivot from the developer side of Hadoop to the administration of Hadoop within HDP. This chapter will walk through the process of installing and configuring Hadoop.
- Installing Hortonworks HDP
- Configuring HDP
- HDP deployments in Windows, Linux, and private clouds

Chapter 8: Securing HDP
Chapter Goal: Security and governance are among the biggest concerns of all administrators. HDP provides particular security assurances that will help admins sleep better at night. This chapter will show how to secure Hadoop within HDP and how to integrate Hadoop into common directory services.
- Understanding Hadoop security concepts
- Setting up authentication and authorization in HDP
- Auditing security access
- Linking to other directory services
- Securing a cluster with Knox

Chapter 9: Monitoring and Managing Data in HDP
Chapter Goal: This chapter will explain how to monitor and manage a Hadoop cluster once it has been created in HDP.
- Monitoring and management approaches
- Scheduling jobs with Oozie
- Deploying and managing Hadoop with Ambari
- Working with ZooKeeper

Chapter 10: Getting Your Data into HDP
Chapter Goal: Once you have configured your Hadoop instance, the next step is to get data into the cluster. This chapter will look at a number of tools for providing ETL (Extract, Transform, Load) processes to load data into HDP for Hadoop processing.
- Executing bulk transfers of data into and out of Hadoop using Sqoop
- Managing data processing and governance with Falcon
- Loading high-volume streaming data into HDFS using Flume

Chapter 11: Understanding HDP Architectural Patterns
Chapter Goal: This chapter will look at some common architectural patterns for working effectively with HDP.
- Working with the Lambda architecture
- Thinking in terms of data lakes

Chapter 12: Incorporating HDP into Your Larger Data Infrastructure
Chapter Goal: This chapter will look at how HDP can be incorporated into a larger data platform. It will place Hadoop within the context of BI solutions, data warehouses, and other MPP appliances (like Teradata and Netezza).
- Integrating HDP with enterprise data warehouses, RDBMS, and MPP systems
- Connecting BI tools to Hadoop
- Integrating HDP with its ecosystem of analytics partners

Chapter 13: Adding Advanced Search in HDP with Solr
Chapter Goal: This chapter will examine some advanced data access features in HDP, primarily Solr.
- Leveraging Apache Solr in HDP
- Full-text indexing with Solr
- Searching Hadoop data with Apache Solr

Chapter 14: Bringing HDP into the Cloud
Chapter Goal: This final chapter will look at building Hadoop solutions in the cloud, covering both HDInsight on Microsoft Azure and Hadoop on Amazon's AWS platform.
- Hadoop on Azure and HDInsight
- Limitations of Hadoop with HDInsight
- Running HDP on AWS

Appendix: HDP Add-Ons
Covers Spark, Advanced Security, ODBC Driver, Teradata Connector, SCOM Management, and the Oracle Quest Data Connector.
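The MapReduce model that Chapters 2 and 3 build on can be previewed with a minimal, pure-Python sketch of the map, shuffle, and reduce phases. This is a conceptual illustration only, not code from the book: a real Hadoop job would be written against the MapReduce API and scheduled by YARN across a cluster, but the data flow is the same.

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in an input split.
    for word in document.lower().split():
        yield (word, 1)

def shuffle_phase(mapped_pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate the grouped values -- here, sum the counts.
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(documents):
    # Run the three phases in sequence over a list of "input splits".
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))

if __name__ == "__main__":
    docs = ["big data big insights", "data platform"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'insights': 1, 'platform': 1}
```

In Hadoop proper, each phase runs in parallel across many nodes: mappers process HDFS blocks locally, the framework shuffles intermediate pairs over the network, and reducers write final results back to HDFS.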