
Book, English, 421 pages, format (W × H): 178 mm x 254 mm, weight: 8279 g

Vohra

Practical Hadoop Ecosystem

A Definitive Guide to Hadoop-Related Frameworks and Tools
1st edition, 2016
ISBN: 978-1-4842-2198-3
Publisher: Apress



Learn how to use the Apache Hadoop projects, including MapReduce, HDFS, Apache Hive, Apache HBase, Apache Kafka, Apache Mahout, and Apache Solr. From setting up the environment to running sample applications, each chapter in this book is a practical tutorial on using an Apache Hadoop ecosystem project.

While several books on Apache Hadoop are available, most cover only the main projects, MapReduce and HDFS, and none discusses the other Apache Hadoop ecosystem projects and how they all work together as a cohesive big data development platform.


What You Will Learn:
  • Set up the environment in Linux for Hadoop projects using the Cloudera Hadoop Distribution (CDH 5)
  • Run a MapReduce job
  • Store data with Apache Hive and Apache HBase (see the Hive JDBC sketch after this list)
  • Index data in HDFS with Apache Solr
  • Develop a Kafka messaging system (see the producer sketch after this list)
  • Stream logs to HDFS with Apache Flume
  • Transfer data from a MySQL database to Hive, HDFS, and HBase with Sqoop
  • Create a Hive table over Apache Solr
  • Develop a Mahout user-based recommender system (see the recommender sketch after this list)
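
The book works with Hive through its CLI; as a hedged illustration of the same kind of operation from Java, here is a minimal sketch against the standard HiveServer2 JDBC interface. The connection URL, the empty credentials, and the table name wlslog are illustrative assumptions, not values taken from the book.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // Register the HiveServer2 JDBC driver (requires hive-jdbc on the classpath).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Hypothetical connection URL: HiveServer2 on localhost, default database.
            String url = "jdbc:hive2://localhost:10000/default";
            try (Connection con = DriverManager.getConnection(url, "", "");
                 Statement stmt = con.createStatement()) {
                // Create a simple managed table (hypothetical name) and read it back.
                stmt.execute("CREATE TABLE IF NOT EXISTS wlslog (msg STRING)");
                try (ResultSet rs = stmt.executeQuery("SELECT msg FROM wlslog")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }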
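For the Kafka messaging system, a minimal Java producer sketch using the standard Kafka client API; the broker address localhost:9092 and the topic name test are assumed defaults, not the book's configuration.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            // Minimal producer configuration: broker address plus key/value serializers.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Send one message to the (hypothetical) "test" topic.
                producer.send(new ProducerRecord<>("test", "key", "hello from the producer"));
            }
        }
    }

The consuming side can be exercised with the kafka-console-consumer tool that ships with Kafka, subscribed to the same topic.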
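And for the user-based recommender, a minimal sketch against Mahout's Taste API; the classes are Mahout's own, while the ratings file name, the neighborhood size of 10, and the user ID are illustrative assumptions.

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public class UserRecommender {
        public static void main(String[] args) throws Exception {
            // Load userID,itemID,rating triples from a CSV file (hypothetical name).
            DataModel model = new FileDataModel(new File("ratings.csv"));
            // Pearson correlation between users' rating vectors.
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            // Form each user's neighborhood from the 10 most similar users.
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, similarity, neighborhood);
            // Top 3 recommendations for user 1.
            List<RecommendedItem> items = recommender.recommend(1L, 3);
            for (RecommendedItem item : items) {
                System.out.println(item.getItemID() + " : " + item.getValue());
            }
        }
    }

Pearson correlation is only one of the similarity measures Taste offers; the Mahout chapter's "Choosing a User Similarity Measure" section covers the alternatives.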

Who This Book Is For:
Apache Hadoop developers. Prerequisite knowledge of Linux and some knowledge of Hadoop are required.

Target Audience


Professional/practitioner


Authors/Editors


Further Information & Material


Introduction

1. HDFS and MapReduce
  Hadoop Distributed FileSystem
  MapReduce Frameworks
  Setting the Environment
  Hadoop Cluster Modes
  Running a MapReduce Job with the MR1 Framework
  Running MR1 in Standalone Mode
  Running MR1 in Pseudo-Distributed Mode
  Running MapReduce with the YARN Framework
  Running YARN in Pseudo-Distributed Mode
  Running Hadoop Streaming

Section II Storing & Querying

2. Apache Hive
  Setting the Environment
  Configuring Hadoop
  Configuring Hive
  Starting HDFS
  Starting the Hive Server
  Starting the Hive CLI
  Creating a Database
  Using a Database
  Creating a Managed Table
  Loading Data into a Table
  Creating a Table Using LIKE
  Adding Data with INSERT INTO TABLE
  Adding Data with INSERT OVERWRITE
  Creating a Table Using AS SELECT
  Altering a Table
  Truncating a Table
  Dropping a Table
  Creating an External Table

3. Apache HBase
  Setting the Environment
  Configuring Hadoop
  Configuring HBase
  Configuring Hive
  Starting HBase
  Starting the HBase Shell
  Creating an HBase Table
  Adding Data to an HBase Table
  Listing All Tables
  Getting a Row of Data
  Scanning a Table
  Counting the Number of Rows in a Table
  Altering a Table
  Deleting a Row
  Deleting a Column
  Disabling and Enabling a Table
  Truncating a Table
  Dropping a Table
  Finding Whether a Table Exists
  Creating a Hive External Table

Section III Bulk Transferring & Streaming

4. Apache Sqoop
  Installing the MySQL Database
  Creating MySQL Database Tables
  Setting the Environment
  Configuring Hadoop
  Starting HDFS
  Configuring Hive
  Configuring HBase
  Importing into HDFS
  Exporting from HDFS
  Importing into Hive
  Importing into HBase

5. Apache Flume
  Setting the Environment
  Configuring Hadoop
  Configuring HBase
  Starting HDFS
  Configuring Flume
  Running a Flume Agent
  Configuring Flume for an HBase Sink
  Streaming a MySQL Log to an HBase Sink

Section IV Serializing

6. Apache Avro
  Setting the Environment
  Creating an Avro Schema
  Creating a Hive Managed Table
  Creating a Hive (Version Prior to 0.14) External Table Stored as Avro

7. Apache Parquet
  Setting the Environment
  Creating an Oracle Database Table
  Exporting the Oracle Database to a CSV File
  Importing the CSV File into MongoDB
  Exporting a MongoDB Document as a CSV File
  Importing a CSV File into the Oracle Database

Section V Messaging & Indexing

8. Apache Kafka
  Setting the Environment
  Starting the Kafka Server
  Creating a Topic
  Starting a Kafka Producer
  Starting a Kafka Consumer
  Producing and Consuming Messages
  Streaming Log Data to Apache Kafka with Apache Flume
    Setting the Environment
    Creating Kafka Topics
    Configuring Flume
    Running the Flume Agent
    Consuming Log Data as Kafka Messages

9. Apache Solr
  Setting the Environment
  Configuring the Solr Schema
  Starting the Solr Server
  Indexing a Document in Solr
  Deleting a Document from Solr
  Indexing a Document in Solr with the Java Client
  Searching a Document in Solr
  Creating a Hive Managed Table
  Creating a Hive External Table
  Loading Hive External Table Data
  Searching Hive Table Data Indexed in Solr

Section VI Machine Learning

10. Apache Mahout
  Setting the Environment
  Starting HDFS
  Setting the Mahout Environment
  Running a Mahout Classification Sample
  Running a Mahout Clustering Sample
  Developing a User-Based Recommender System
    The Sample Data
    Setting the Environment
    Creating a Maven Project in Eclipse
    Creating a User-Based Recommender
    Creating a Recommender Evaluator
    Running the Recommender
    Choosing a Recommender Type
    Choosing a User Similarity Measure
    Choosing a Neighborhood Type
    Choosing a Neighborhood Size for NearestNUserNeighborhood
    Choosing a Threshold for ThresholdUserNeighborhood
    Running the Evaluator
    Choosing the Split Between Training Percentage and Test Percentage


Deepak Vohra is a coder, developer, programmer, book author, and technical reviewer.


