Eagar | Data Engineering with AWS | E-Book | www2.sack.de

E-book, English, 636 pages

Eagar Data Engineering with AWS

Acquire the skills to design and build AWS-based data transformation pipelines like a pro
1st edition, 2023
ISBN: 978-1-80461-313-9
Publisher: Packt Publishing
Format: EPUB
Copy protection: none




This book, authored by a Senior Data Architect with 25 years of experience, helps you gain expertise in the AWS ecosystem for data engineering. This revised edition updates every chapter to cover the latest AWS services and features, provides a refreshed view on data governance, and introduces a new section on building modern data platforms. You will learn how to implement a data mesh, work with open-table formats such as Apache Iceberg, and apply DataOps practices for automation and observability.

You will begin by exploring core concepts and essential AWS tools used by data engineers, along with modern data management approaches. You will then design and build data pipelines, review raw data sources, transform data, and understand how it is consumed by various stakeholders. The book also covers data governance, populating data marts and warehouses, and how a data lakehouse fits into the architecture. You will explore AWS tools for analysis, SQL queries, visualizations, and learn how AI and machine learning generate insights from data. Later chapters cover transactional data lakes, data meshes, and building a complete AWS data platform.

By the end, you will be able to confidently implement data engineering pipelines on AWS.



Preface


We live in a world where the amount of data being generated is constantly increasing. While a few decades ago, an organization may have had a single database that could store everything they needed to track, today most organizations have tens, hundreds, or even thousands of databases, along with data warehouses, and perhaps a data lake. And these data stores are being fed from an increasing number of data sources (transaction data, web server log files, IoT and other sensors, and social media, to name just a few).

It is no surprise, then, that we hear more and more companies talk about being data-driven in their decision making. But to be truly data-driven, an organization needs to master managing, and drawing insights from, these ever-increasing quantities and types of data. To enable this, organizations need to employ people with specialized data skills.

Doing a search on LinkedIn for jobs related to data returns nearly 800,000 results (and that is just for the United States!). The job titles include roles such as data engineer, data scientist, and data architect.

This revised edition of the book includes updates to all chapters, covering new features and services from AWS, as well as three brand-new chapters. In these new chapters, we cover topics such as building transactional data lakes (using open table formats such as Apache Iceberg), implementing a data mesh approach on AWS, and using a DataOps approach to building a modern data platform.

While this book will not magically turn you into a data engineer, it has been designed to accelerate your journey toward data engineering on AWS. By the end of this book, you will not only have learned some of the core concepts around data engineering, but you will also have a good understanding of the wide variety of tools available in AWS for working with data. You will also have been through numerous hands-on exercises, and thus gained practical experience with things such as ingesting streaming data, transforming and optimizing data, building visualizations, and even drawing insights from data using AI.

Who this book is for


This book has been designed for two groups of people: first, those looking to start a career in data engineering who want to learn core data engineering concepts. This book introduces many different aspects of data engineering, providing a comprehensive high-level understanding of, and practical hands-on experience with, different focus areas of data engineering.

Second, this book is for people who already have an established career focused on data but who are new to the cloud, and to AWS in particular. For these readers, this book provides a clear understanding of many of the different AWS services for working with data, along with hands-on experience with a variety of those services.

What this book covers


Each of the chapters in this book takes the approach of introducing important concepts or key AWS services, and then providing a hands-on exercise related to the topic of the chapter:

Chapter 1 reviews the challenges of ever-increasing dataset volumes, and the role of the data engineer in working with data in the cloud.

Chapter 2 introduces foundational concepts and technologies related to big data processing.

Chapter 3 provides an introduction to a wide range of AWS services that are used for ingesting, processing, and consuming data, and orchestrating pipelines.

Chapter 4 covers the all-important topics of keeping data secure, ensuring good data governance, and the importance of cataloging your data.

Chapter 5 provides an approach for whiteboarding the high-level design of a data engineering pipeline.

Chapter 6 looks at the variety of data sources that we may need to ingest from, and examines AWS services for ingesting both batch and streaming data.
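As an illustration of the batch/streaming distinction discussed in that chapter, the sketch below (not from the book; all names are hypothetical) contrasts a batch job, which sees the full dataset up front, with a streaming consumer that buffers arriving records into micro-batches, similar in spirit to how Amazon Kinesis Data Firehose buffers records before writing them to Amazon S3:

```python
# Minimal sketch (illustrative only): batch vs. streaming ingestion.
from typing import Iterable, Iterator


def batch_ingest(records: list) -> list:
    """Batch: the complete dataset is available at once; filter invalid rows."""
    return [r for r in records if r.get("valid", True)]


def stream_ingest(source: Iterable, buffer_size: int = 3) -> Iterator[list]:
    """Streaming: records arrive one at a time and are flushed in micro-batches."""
    buffer = []
    for record in source:
        buffer.append(record)
        if len(buffer) >= buffer_size:
            yield buffer
            buffer = []
    if buffer:  # flush any remaining records at end of stream
        yield buffer


events = [{"id": i} for i in range(7)]
batches = list(stream_ingest(iter(events), buffer_size=3))
print([len(b) for b in batches])  # micro-batches of sizes 3, 3, 1
```

In a real pipeline the flush step would write to a durable sink (for example, an S3 object per micro-batch) rather than yield in memory.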

Chapter 7 covers common transformations for optimizing datasets and for applying business logic.
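One common dataset optimization of the kind that chapter covers is partitioning data by date using Hive-style key=value prefixes, so that query engines can prune partitions and scan less data. A minimal sketch (not from the book; the bucket and prefix names are hypothetical):

```python
# Minimal sketch (illustrative only): derive a Hive-style partitioned
# S3 key prefix from an event timestamp, so downstream query engines
# can skip partitions that fall outside a query's date range.
from datetime import datetime


def partition_prefix(base_prefix: str, event_time: str) -> str:
    """Return a year=/month=/day= partition prefix for an ISO timestamp."""
    ts = datetime.fromisoformat(event_time)
    return f"{base_prefix}/year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/"


print(partition_prefix("s3://my-data-lake/cleaned/sales", "2023-06-15T10:30:00"))
# s3://my-data-lake/cleaned/sales/year=2023/month=06/day=15/
```

Combined with a columnar format such as Parquet, this layout is a typical target for the transformations described above.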

Chapter 8 is about better understanding the different types of data consumers that a data engineer may work to prepare data for.

Chapter 9 focuses on the use of data warehouses as a data mart and looks at moving data between a data lake and data warehouse. This chapter also does a deep dive into Amazon Redshift, a cloud-based data warehouse.

Chapter 10 looks at how various data engineering tasks and transformations can be put together in a data pipeline, and how these can be run and managed with pipeline orchestration tools such as AWS Step Functions.
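Step Functions pipelines of the kind mentioned above are defined in the Amazon States Language, a JSON dialect. The sketch below (not from the book; the job name, topic, and account ARNs are placeholders) builds a minimal two-step definition: run a Glue job, then publish a notification:

```python
# Minimal sketch (illustrative only): a two-state Amazon States Language
# definition built as a Python dict and serialized to JSON.
import json

definition = {
    "Comment": "A minimal data pipeline: transform, then notify.",
    "StartAt": "TransformData",
    "States": {
        "TransformData": {
            "Type": "Task",
            # .sync makes Step Functions wait for the Glue job to finish
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "clean-sales-data"},
            "Next": "NotifySuccess",
        },
        "NotifySuccess": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-events",
                "Message": "Pipeline completed.",
            },
            "End": True,
        },
    },
}

print(json.dumps(definition, indent=2))
```

The serialized JSON is what you would pass when creating the state machine; error handling (Retry/Catch blocks) is omitted here for brevity.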

Chapter 11 does a deeper dive into the Amazon Athena service, which can be used to run SQL queries directly on data in the data lake, and beyond.
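Since Athena accepts standard SQL, the kind of aggregation you would run against data lake tables can be sketched with any SQL engine. Below, an in-memory SQLite database stands in for Athena purely for illustration (not from the book; the table and column names are hypothetical):

```python
# Minimal sketch (illustrative only): an Athena-style aggregation query,
# run here against in-memory SQLite as a local stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("us-east", 120.0), ("us-east", 80.0), ("eu-west", 50.0)],
)

# With Athena, the same query would scan files in S3 instead of local rows.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
).fetchall()

print(rows)  # [('us-east', 200.0), ('eu-west', 50.0)]
```

The point is that the SQL itself carries over; what changes with Athena is the storage layer and the serverless execution model.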

Chapter 12 discusses the importance of being able to craft visualizations of data, and how the Amazon QuickSight service enables this.

Chapter 13 reviews how AI and ML are increasingly important for gaining new value from data, and introduces some of the AWS services for both ML and AI.

Chapter 14 looks at new table formats (including Apache Iceberg, Apache Hudi, and Delta Lake) that bring traditional data warehousing type features to data lakes.

Chapter 15 discusses a recent trend, referred to as a data mesh, that provides a new way to approach analytical data management and data sharing within an organization.

Chapter 16 introduces important concepts, such as DataOps, which provides automation and observability when building a modern data platform.

Chapter 17 concludes the book by looking at the bigger picture of data analytics, including real-world examples of data pipelines, and a review of emerging trends in the industry.

To get the most out of this book


Basic knowledge of computer systems and concepts, and of how these are used within large organizations, is a helpful prerequisite for this book. However, no data engineering-specific skills or knowledge are required. Familiarity with cloud computing fundamentals and core AWS services will also make it easier to follow along, especially with the hands-on exercises, but detailed step-by-step instructions are included for each task.

Note:

If you are using the digital version of this book, we advise you to access the code from the book’s GitHub repository (a link is available in the next section), rather than copying and pasting from the PDF or electronic version. Doing so will help you avoid any potential formatting errors when copying and pasting code.

Download the example code files


The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Data-Engineering-with-AWS-2nd-edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images


We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781804614426.

Conventions used


There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Include a clause.”

A block of code is set as follows:

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “In addition, you can use Spark SQL to process data using standard SQL.”

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch


Feedback from our readers is always welcome.

...


