Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Quick overview of Python and Scala

Foundational Concepts (Theory):

  • System Architecture
  • Resilient Distributed Datasets (RDD)
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Hands-on Workshop: Mastering Basics in the Databricks Environment:

  • RDD API Exercises
  • Fundamental Action and Transformation Functions
  • PairRDDs
  • Join Operations
  • Caching Strategies
  • DataFrame API Exercises
  • Spark SQL
  • DataFrame Operations: select, filter, group, sort
  • User-Defined Functions (UDF)
  • Exploring the DataFrame API
  • Streaming Capabilities

Hands-on Workshop: Deployment in the AWS Environment:

  • AWS Glue Fundamentals
  • Comparing AWS EMR and AWS Glue
  • Sample Jobs in Both Environments
  • Evaluating Advantages and Disadvantages

Additional Content:

  • Introduction to Apache Airflow for Orchestration

Requirements

  • Programming proficiency (preferably in Python and Scala)
  • Basic knowledge of SQL

Duration: 21 Hours
