About the Course

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.


Learn from industry experts with live instructor-led training.

Projects & Lab

Apply the skills you learn to solve real-world problems.


Highlight your new skills on your resume or LinkedIn.

1:1 Mentoring

Get guidance from industry leaders and professionals.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.


Compatible with the Hortonworks Certified Developer (HDPCD) certification



17 Sept 2018
Online Instructor Based Training
30 days
371 36,000


Mon to Fri (4 weeks)
10 AM - 12 PM
30 days
514 46,000

Learning Path


About the Course

This course is part of the Specialization Course in Big Data with Hadoop.

What is Scala?

Why Scala for Spark?

Scala in other frameworks

Introduction to Scala REPL

Basic Scala operations

Variable Types in Scala

Control Structures in Scala

Foreach loop, Functions and Procedures

Collections in Scala: Array

ArrayBuffer, Map, Tuples, Lists, and more

Class in Scala

Getters and Setters

Custom Getters and Setters

Properties with only Getters

Auxiliary Constructor and Primary Constructor


Extending a Class

Overriding Methods

Traits as Interfaces and Layered Traits


Higher Order Functions

Anonymous Functions, and more

What is Big Data?

Big Data Customer Scenarios

Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case

How Hadoop Solves the Big Data Problem

What is Hadoop?

Hadoop’s Key Characteristics

Hadoop Ecosystem and HDFS

Hadoop Core Components

Rack Awareness and Block Replication

HDFS Read/Write Mechanism

YARN and Its Advantage

Hadoop Cluster and Its Architecture

Hadoop: Different Cluster Modes

Data Loading using Sqoop

Big Data Analytics with Batch & Real-Time Processing

Why Is Spark Needed?

What is Spark?

How Does Spark Differ from Its Competitors?

Spark at eBay

Spark’s Place in Hadoop Ecosystem

Spark Components & Its Architecture

Running Programs on Scala IDE & Spark Shell

Spark Web UI

Configuring Spark Properties

Challenges in Existing Computing Methods

Probable Solution & How RDD Solves the Problem

What is an RDD? Its Functions, Transformations & Actions

Data Loading and Saving Through RDDs

Key-Value Pair RDDs and Other Pair RDDs

RDD Lineage

RDD Persistence

WordCount Program Using RDD Concepts

RDD Partitioning & How It Helps Achieve Parallelization

Need for Spark SQL

What is Spark SQL?

Spark SQL Architecture

SQL Context in Spark SQL

Data Frames & Datasets

Interoperating with RDDs

JSON and Parquet File Formats

Loading Data through Different Sources

What is Machine Learning?

Where is Machine Learning Used?

Different Types of Machine Learning Techniques

Face Detection: USE CASE

Understanding MLlib

Features of Spark MLlib and MLlib Tools

Various ML algorithms supported by Spark MLlib

K-Means Clustering & How It Works with MLlib

Analysis on US Election Data: K-Means Spark MLlib USE CASE

Need for Kafka

What is Kafka?

Core Concepts of Kafka

Kafka Architecture

Where is Kafka Used?

Understanding the Components of Kafka Cluster

Configuring Kafka Cluster

Producer and Consumer

Need for Apache Flume

What is Apache Flume?

Basic Flume Architecture

Flume Sources

Flume Sinks

Flume Channels

Flume Configuration

Integrating Apache Flume and Apache Kafka

Drawbacks in Existing Computing Methods

Why Is Streaming Necessary?

What is Spark Streaming?

Spark Streaming Features

Spark Streaming Workflow

How Uber Uses Streaming Data

Streaming Context & DStreams

Transformations on DStreams

WordCount Program using Spark Streaming

Windowed Operators and Why They Are Useful

Important Windowed Operators

Slice, Window and ReduceByWindow Operators

Stateful Operators

Perform Twitter Sentiment Analysis Using Spark Streaming



Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr


1. The certificate we award is proof that you have taken a big leap in the Big Data domain.

2. Our Specialization is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.

3. Differentiate yourself: The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.

4. Share your achievement: Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.
 Course Certificate Sample

Course Creators

Created by a team of industry and academic experts with 20+ years of rich R&D experience.


3 reviews
(4.9 out of 5)


In online training, you will get:

  • Access to live instructor-led training as per your enrolled batch
  • Instruction from industry experts over online meeting tools such as Zoom
  • 24x7 support from the trainers

In classroom training, you will get:

  • Intensive one-to-one classroom training by experts with real-world experience, as per your enrolled batch
  • Instruction from industry experts with 20+ years of R&D experience
  • 24x7 support from the trainers

Top industry experts with 20+ years of rich R&D experience who mentor students across the world.

Soft copy of the course material will be mailed to you.

In online instructor-led training, a team of experts will train you alongside a group of fellow course learners for 25+ hours over online conferencing software such as Zoom & Webminar. Online classes are held every day from Monday to Friday.

At the end of the course, you will work on a real-time project. Once you have completed the project and it has been reviewed by an expert, you will be awarded a certificate that you can share on LinkedIn.

Enrollment in the course includes 30 days of free access to the labs, depending on the date of enrollment. Access can be extended with permission.

Yes, you can renew your subscription at any time. Choose your desired lab plan and make the payment to renew your subscription.

Email our dynamic and ever-active director at director@vaidehisoftware.com.

Have more questions? Please contact us at director@vaidehisoftware.com