Azure Databricks
FEE
35,000
25,000/- +GST
JUN-15
09:00 - 01:00
Online
Weekends

Azure Databricks

Become an expert in Azure Databricks

Azure Databricks is a powerful cloud-based platform that integrates with Microsoft Azure to provide a unified analytics service. The course covers key concepts such as setting up and managing clusters, working with Databricks notebooks, and performing ETL operations using Apache Spark. Learners will gain hands-on experience with data processing, real-time analytics, and advanced analytics. The course is designed for data engineers, data scientists, and analysts looking to leverage the capabilities of Azure Databricks for big data and AI projects.

Check your eligibility to take this course

Why Azure Databricks Is So Important

Industry growth

$16 billion by 2025

The Indian big data analytics market is expected to reach $16 billion by 2025, growing at a CAGR of 26%, as per a NASSCOM report. This growth is driven by increased adoption of data-driven decision-making in businesses.

Cloud services market

CAGR of 24.1%

According to IDC, the public cloud services market in India is projected to reach $10.8 billion by 2025, growing at a CAGR of 24.1%. Azure is one of the leading cloud service providers in India, driving demand for Azure-related skills.

The AI market in India

Grow at 20.2%

The AI market in India is expected to grow at a CAGR of 20.2% to reach $7.8 billion by 2025, according to NASSCOM. Skills in AI and machine learning, integrated with platforms like Azure Databricks, are highly sought after in the Indian job market.

2000

Currently, there are over 2,000 job openings in India for roles requiring Azure Databricks skills.

source: Indeed.in

  • Databricks Certification
    • Data Analyst Associate
    • Data Engineer Associate

Course Curriculum

MODULE 1 (PySpark and Python)

  • Hadoop Vs Spark
  • MapReduce limitations
  • Spark History
  • Spark Architecture
  • Spark and Hadoop Advantages
  • Benefits of Spark + Hadoop
  • Introduction to Spark Eco-system
  • Persistence in Spark
  • HDFS data from Spark

  • Spark Cluster Architecture
  • Spark Master & Worker node services
  • Spark DAG
  • Spark Executor

  • Understanding RDD
  • Loading data into RDD
  • Scala RDD, Paired RDD, Double RDD & general RDD functions
  • Implementing HadoopRDD, Filtered RDD, Joined RDD
  • Transformations, Actions and Shared Variables
  • Spark Operations on YARN
  • Sequence File Processing
  • Lazy Evaluation

  • What are Transformations?
  • Different transformations with examples, e.g. map, filter, flatMap
  • What are Actions?
  • Code example on transformations and actions

  • What are Pair RDDs?
  • Code example on transformations and actions on Pair RDDs
  • For each
  • Types of joins with examples

  • Setting up PySpark on EC2
  • Starting PySpark with YARN
  • Starting PySpark with the Standalone resource manager
  • Running a Python script using spark-submit
  • Spark Shell
  • Basic operations on Shell
  • Spark Context and Spark Properties

  • Understanding various Spark resource managers
  • Spark Dynamic and Static Allocation
  • Spark on YARN
  • Spark Standalone
  • Spark Executor memory
  • Spark memory storage levels

  • Introduction to Spark SQL
  • Introduction to DataFrames
  • DataFrame operations such as groupBy, join, filter, select, distinct, min and max
  • Mathematical operations on DataFrames
  • Querying Files as Tables
  • Text File Format
  • Schemas
  • Overview of Structured Spark Types: Columns and Rows
  • Logical / Physical Planning
  • Columns and Expressions
  • Records and Rows
  • DataFrame Transformations
  • Creating DataFrames
  • select and selectExpr
  • Adding Columns
  • Renaming Columns
  • Changing Column Types
  • Filtering Rows
  • Getting Unique Rows
  • Limit
  • Repartition and Coalesce
  • Joins

  • Introduction to Spark Streaming
  • Implementing Spark Streaming using Python Script

  • Structured Streaming Basics
  • Core Concepts
  • Transformations and Actions
  • Input Sources
  • Sinks
  • Output Modes
  • Triggers
  • Structured Streaming in Action
  • Transformations on Streams
  • Input and Output

  • What is Apache Kafka?
  • Kafka features and terminology
  • High-level Kafka Architecture
  • Real-life Kafka Case Studies
  • Kafka components - Broker, Producer, Consumer, Topics, Partitions
  • Different versions of Kafka
  • Installation of Kafka
  • Integration of Kafka and PySpark

  • Working with Strings
  • Lists and Tuples
  • Working with Dictionaries
  • Functions

  • Errors and exception handling
  • Overview of Standard Library
  • Object Oriented Python Programming

MODULE 2 (Azure Databricks)

  • Introduction to Azure Databricks
  • Azure Databricks user interface
  • Azure Databricks Architecture overview

  • Introduction to Databricks Clusters
  • Cluster types
  • Cluster configuration
  • Creating a Cluster
  • Pricing
  • Cluster pool and policy
  • Databricks Jobs
  • Creating, Submitting and Running Jobs

  • Notebook introduction
  • Magic commands
  • Databricks utilities
  • Introducing DBFS commands
  • Running PySpark code in Notebooks

  • Introduction to Data Lake
  • Promise of Data Lake
  • Evolution of Data Lake
  • Challenges of Data Lake
  • Data Engineering goals
  • Introduction to Delta Lake
  • Data Lifecycle
  • The Delta Lake
  • Bronze/Silver and Gold Tables
  • Using Delta Lake
  • Introducing the Parquet format
  • Challenges of Parquet format
  • How does Delta Lake solve the Parquet challenge?
  • How does Delta Lake work?

  • Schema Drift and Evolution
  • Managing Delta Tables
  • Z-ordering of Delta Tables
  • Auto-optimize of Delta Tables
  • Time travel

LIVE Projects

Real-Time Streaming Analytics:

Project

Develop a real-time streaming analytics application to process and analyze live data feeds from IoT devices.

Tasks

  • Set up a Databricks cluster
  • Use Spark Structured Streaming to ingest data from sources like Azure Event Hubs or Kafka
  • Perform real-time transformations and aggregations
  • Visualize the results in real time using Power BI or Databricks dashboards
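
The aggregation task above can be tried out in plain Python before moving to a cluster. The sketch below only illustrates tumbling-window averaging, the kind of aggregation a streaming job commonly performs; it does not use the Spark API. The `Reading` schema, the `tumbling_window_average` helper, the device ids and the 60-second window are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Reading:
    """One simulated IoT event (hypothetical schema, not from the course)."""
    device_id: str
    timestamp: int      # event time in seconds
    temperature: float


def tumbling_window_average(
    readings: Iterable[Reading], window_seconds: int = 60
) -> Dict[Tuple[str, int], float]:
    """Average temperature per device per fixed, non-overlapping window.

    Keys are (device_id, window_start); values are the mean temperature.
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for r in readings:
        # Align the event time to the start of the window it falls in.
        window_start = r.timestamp - (r.timestamp % window_seconds)
        key = (r.device_id, window_start)
        sums[key] += r.temperature
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Assumed sample events; a real job would read these from a stream.
events = [
    Reading("sensor-a", 0, 20.0),
    Reading("sensor-a", 30, 22.0),
    Reading("sensor-a", 65, 30.0),
    Reading("sensor-b", 10, 18.0),
]
averages = tumbling_window_average(events)
for (device, start), avg in sorted(averages.items()):
    print(device, start, avg)
```

In an actual project the same grouping would be expressed as a windowed aggregation in Spark Structured Streaming over an Event Hubs or Kafka source, with the cluster handling state and late data rather than an in-memory dictionary.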

    What You Will Learn

    Cluster Management

    How to set up, configure, and manage Databricks clusters to efficiently process big data workloads.

    Databricks Notebooks

    How to create and use notebooks for interactive data analysis, visualization, and collaboration using Python and SQL.

    Data Engineering with Apache Spark

    Techniques for performing ETL (Extract, Transform, Load) operations using Apache Spark, including data ingestion, transformation, and storage.

    Real-Time Data Processing

    Utilizing Databricks for real-time analytics, streaming data processing, and handling large-scale data sets efficiently.

    Integrating with Azure Services

    How to connect Databricks with other Azure services such as Azure Data Lake Storage and Azure SQL Database.

    Advanced Analytics

    Understanding data lake architecture concepts using Delta Lake.

    Certifications

    After this course, you can pursue the following certifications:

    Databricks Certified - Data Analyst Associate
    Databricks Certified - Data Engineer Associate
    Databricks Certified - Associate Developer for Apache Spark
    Databricks Certified - Data Engineer Professional

    Key Highlights

    Trainer with 20+ years of industry expertise
    Hands-on Practical Training
    Program materials
    Recognized course completion certificate
    Recordings available online post session
    Totalskill Sigma Pvt. Ltd. 2023, All Rights Reserved
    Designed & Powered by Skill Sigma