DP-203T00 Data Engineering on Microsoft Azure
Module 1: Introduction to Data Engineering on Azure
– What is Data Engineering?
– Important Data Engineering Concepts
– Data Engineering in Microsoft Azure
Module 2: Introduction to Azure Data Lake Storage Gen2
– Understanding Azure Data Lake Storage Gen2
– Enabling Azure Data Lake Storage Gen2 in Azure Storage
– Comparing Azure Data Lake Storage Gen2 to Azure Blob storage
– Understanding the stages for processing big data
– Using Azure Data Lake Storage Gen2 in data analytics workloads
Module 3: Introduction to Azure Synapse Analytics
– What is Azure Synapse Analytics?
– How Azure Synapse Analytics works
– When to use Azure Synapse Analytics
Module 4: Use Azure Synapse serverless SQL pool to query files in a data lake
– Understanding Azure Synapse serverless SQL pool capabilities and use cases
– Querying files using a serverless SQL pool
– Creating external database objects
Module 5: Use Azure Synapse serverless SQL pools to transform data in a data lake
– Transforming data files with the CREATE EXTERNAL TABLE AS SELECT statement
– Encapsulating data transformations in a stored procedure
– Including a data transformation stored procedure in a pipeline
Module 6: Create a Lake Database in Azure Synapse Analytics
– Understanding lake database concepts
– Exploring database templates
– Creating a lake database
– Using a lake database
Module 7: Analyze data with Apache Spark in Azure Synapse Analytics
– Getting to know Apache Spark
– Using Spark in Azure Synapse Analytics
– Analyzing data with Spark
– Visualizing data with Spark
Module 8: Transform data with Spark in Azure Synapse Analytics
– Modifying and saving dataframes
– Partitioning data files
– Transforming data with SQL
Module 9: Use Delta Lake in Azure Synapse Analytics
– Understanding Delta Lake
– Creating Delta Lake tables
– Creating catalog tables
– Using Delta Lake with streaming data
– Using Delta Lake in a SQL pool
Module 10: Analyze data in a relational data warehouse
– Designing a data warehouse schema
– Creating data warehouse tables
– Loading data warehouse tables
– Querying a data warehouse
Module 11: Load data into a relational data warehouse
– Loading staging tables
– Loading dimension tables
– Loading time dimension tables
– Loading slowly changing dimensions
– Loading fact tables
– Performing post-load optimization
Module 12: Build a data pipeline in Azure Synapse Analytics
– Understanding pipelines in Azure Synapse Analytics
– Creating a pipeline in Azure Synapse Studio
– Defining data flows
– Running a pipeline
Module 13: Use Spark Notebooks in an Azure Synapse Pipeline
– Understanding Synapse notebooks and pipelines
– Using a Synapse notebook activity in a pipeline
– Using parameters in a notebook