Book, English, 300 pages, format (W × H): 178 mm × 254 mm
ISBN: 978-1-4842-1309-4
Publisher: Apress
Take a deep dive into Apache Spark and the big data ecosystem. You will acquire an understanding of the next generation of distributed systems, Apache Spark's architecture and abstractions, and the Spark ecosystem, including Spark SQL, GraphX, and MLlib. Beginning Spark provides a practical guide to using Apache Spark in real-world data processing. The author discusses and illustrates how the different parts of Spark are brought together to solve complex problems with a dataflow system.
With the rise in popularity of distributed systems like Hadoop, more and more people are working in big data processing. A growing number of companies want to build dataflow systems that can churn through huge amounts of data to gain insights for their business. Since Hadoop was a first-generation, open source distributed system, there is a need for a next-generation distributed system to take data processing to the next level. Apache Spark is the next step in that direction. Spark brings great flexibility and composability to the big data world.
Target audience
Popular/general
Authors/Editors
Subject areas
Further information & material
Table of Contents
Chapter 1: Introduction to Next-Generation Distributed Systems
Chapter Goal:
Discusses the different kinds of distributed systems and how they have evolved over the years, from Teradata to Hadoop to Spark, and how Apache Spark differs from Hadoop.
Chapter 2: Introduction to Apache Spark
The architecture and RDD abstraction of Spark. It describes how Spark distributes jobs on cluster managers such as Mesos and YARN.
Chapter 3: Getting Started with the RDD API
This chapter shows how to get started with the RDD Scala API. It opens with a practical project, retail analytics, as a running example. Comes with runnable code.
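To give a flavor of the RDD Scala API described above, here is a minimal retail-analytics sketch. It assumes a local Spark installation; the file name `sales.csv` and its (product, amount) layout are illustrative, not taken from the book.

```scala
// Sketch only: assumes Spark on the classpath and a hypothetical sales.csv
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RetailAnalytics").setMaster("local[*]")
val sc = new SparkContext(conf)

// Load raw text data as an RDD of lines
val sales = sc.textFile("sales.csv")

// Parse each line into (product, amount) pairs
val pairs = sales.map { line =>
  val fields = line.split(",")
  (fields(0), fields(1).toDouble)
}

// Total revenue per product
val revenue = pairs.reduceByKey(_ + _)
revenue.collect().foreach(println)
```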
Chapter 4: The Map/Reduce RDD API
This chapter covers Spark's Map/Reduce API, including shuffling, folding, join, and group operations. Comes with runnable code.
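The pair-RDD operations mentioned above can be sketched as follows; this assumes an existing `SparkContext` named `sc`, and the data is made up for illustration.

```scala
// Sketch only: assumes an existing SparkContext `sc`
val orders  = sc.parallelize(Seq(("alice", 10.0), ("bob", 5.0), ("alice", 7.5)))
val regions = sc.parallelize(Seq(("alice", "EU"), ("bob", "US")))

// groupByKey shuffles all values for a key to one partition
val grouped = orders.groupByKey()

// fold aggregates with a supplied zero value
val total = orders.map(_._2).fold(0.0)(_ + _)

// join combines two pair RDDs on their keys
val joined = orders.join(regions)
```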
Chapter 5: The Advanced RDD API
This chapter covers advanced APIs such as aggregate and mapPartitions, which give finer control over Spark's processing. Comes with runnable code.
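As a rough sketch of the two operations named above (again assuming an existing `SparkContext` `sc`):

```scala
// Sketch only: assumes an existing SparkContext `sc`
val nums = sc.parallelize(1 to 100, numSlices = 4)

// aggregate computes (sum, count) in one pass, with separate
// within-partition and cross-partition combine functions
val (sum, count) = nums.aggregate((0, 0))(
  (acc, n) => (acc._1 + n, acc._2 + 1),
  (a, b)   => (a._1 + b._1, a._2 + b._2)
)

// mapPartitions processes a whole partition at once, which is useful
// for amortizing per-partition setup costs
val perPartitionMax = nums.mapPartitions(iter => Iterator(iter.max))
</imports>
```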
Chapter 6: Spark Caching
In-memory processing is one of the most important parts of Apache Spark. This chapter explains how Spark implements its cache and how to use caching to speed up the execution of your Spark programs. Comes with runnable code.
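A minimal caching sketch, assuming an existing `SparkContext` `sc`; the input path `access.log` is hypothetical.

```scala
// Sketch only: assumes an existing SparkContext `sc`
import org.apache.spark.storage.StorageLevel

val logs = sc.textFile("access.log")            // hypothetical input path
val errors = logs.filter(_.contains("ERROR"))

errors.cache()                                  // MEMORY_ONLY by default
// or pick an explicit storage level:
// errors.persist(StorageLevel.MEMORY_AND_DISK)

val total    = errors.count()                   // first action materializes the cache
val timeouts = errors.filter(_.contains("timeout")).count()  // reuses cached data
```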
Chapter 7: Integrating with Hadoop
Spark integrates beautifully with Hadoop. This chapter explains how Spark integrates with HDFS and YARN. Comes with runnable code.
Chapter 8: Introduction to Spark Streaming
Spark Streaming is a real-time system built on top of Spark. It lets developers use the same Spark API for real-time systems.
Chapter 9: Anatomy of an RDD
This chapter takes a deeper dive into how the different kinds of RDDs are built. A deeper understanding of RDDs is necessary to exploit the Spark abstraction to its fullest.
Chapter 10: Spark SQL, SQL on Spark
This chapter covers using the SQL query language in Spark to process structured data. Comes with examples.
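A short sketch of querying structured data with SQL in Spark, using the Spark 2.x `SparkSession` API; the file, table, and column names are illustrative.

```scala
// Sketch only: assumes Spark 2.x on the classpath and a hypothetical sales.csv
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SqlExample")
  .master("local[*]")
  .getOrCreate()

val sales = spark.read.option("header", "true").csv("sales.csv")
sales.createOrReplaceTempView("sales")

val topProducts = spark.sql(
  "SELECT product, SUM(amount) AS revenue FROM sales " +
  "GROUP BY product ORDER BY revenue DESC")
topProducts.show()
```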
Chapter 11: GraphX, Graph Processing in Spark
Graph processing is an important part of any distributed system. This chapter shows how graph processing is done with GraphX, the graph processing library on Apache Spark.
Chapter 12: MLlib, Machine Learning in Spark
With the advancement of AI, machine learning is becoming more and more important. This chapter shows how to use MLlib, Spark's machine learning library, for recommendation and prediction.
Chapter 13: How It All Comes Together
One of Spark's strengths is how the different parts of the ecosystem come together to solve problems. This chapter shows how you can mix Scala, SQL, and machine learning in one program to solve a complex problem.
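As a hedged illustration of mixing these pieces in one program, the sketch below combines a SQL aggregation with an MLlib model fit. It assumes an existing `SparkSession` named `spark`; the data layout, column names, and the choice of linear regression are all hypothetical, not the book's own example.

```scala
// Sketch only: assumes an existing SparkSession `spark` and a hypothetical sales.csv
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("sales.csv")
df.createOrReplaceTempView("sales")

// SQL step: aggregate per-product features
val features = spark.sql(
  "SELECT product, AVG(amount) AS avgAmount, COUNT(*) AS cnt " +
  "FROM sales GROUP BY product")

// MLlib step: assemble a feature vector and fit a simple model
val assembled = new VectorAssembler()
  .setInputCols(Array("cnt"))
  .setOutputCol("features")
  .transform(features)

val model = new LinearRegression()
  .setLabelCol("avgAmount")
  .fit(assembled)
```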




