A Problem-Solution Approach with PySpark2
Book, English, 265 pages, format (W × H): 155 mm × 235 mm, weight: 4453 g
ISBN: 978-1-4842-3140-1
Publisher: Apress
PySpark Recipes covers Hadoop and its shortcomings, then presents the architecture of Spark, PySpark, and the resilient distributed dataset (RDD). You will learn to apply RDDs to solve day-to-day big data problems. Introductions to Python and NumPy are included to make it easy for new learners of PySpark to understand and adopt the model.
What You Will Learn
- Understand the advanced features of PySpark2 and SparkSQL
- Optimize your code
- Program SparkSQL with Python
- Use Spark Streaming and Spark MLlib with Python
- Perform graph analysis with GraphFrames
Target Audience
Professional/practitioner
Authors/Editors
Subject Areas
- Mathematics | Computer Science > IT | Computer Science > Data / Databases > Data Mining
- Mathematics | Computer Science > IT | Computer Science > Programming | Software Development > Programming and Scripting Languages
- Mathematics | Computer Science > IT | Computer Science > Programming | Software Development > Programming: Methods and General Topics
Further Information & Material
- Chapter 1: The Era of Big Data, Hadoop, and Other Big Data Processing Frameworks
- Chapter 2: Installation
- Chapter 3: Introduction to Python and NumPy
- Chapter 4: Spark Architecture and Resilient Distributed Dataset
- Chapter 5: The Power of Pairs: Paired RDD
- Chapter 6: IO in PySpark
- Chapter 7: Optimizing PySpark and PySpark Streaming
- Chapter 8: PySparkSQL
- Chapter 9: PySpark MLlib and Linear Regression