About The Course

The Post Graduate Diploma programme in APACHE HADOOP & BIG DATA ENGINEERING is an intensive, six-month, job-oriented course. It is aimed at engineers, IT professionals and other participants with a mathematical background who wish to start a career in big data as a data scientist. The course grooms students to work with current technologies and prepares them to keep pace with changing technology and the requirements of the growing IT industry. The curriculum has been designed with a view to emerging trends in big data engineering as well as the current and future human resource requirements of the IT industry. The syllabus, courseware, teaching methodology and course delivery all draw on the research and development background of VAIDEHI SOFTWARE TECHNOLOGIES. The depth of the course is unique in the industry, covering a wide spectrum of IT industry requirements.

Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. Hadoop is an Apache top-level project built and used by a global community of contributors and users. It is licensed under the Apache License 2.0.

Hadoop was created by Doug Cutting and Mike Cafarella in 2005. It was originally developed to support distribution for the Nutch search engine project. Doug, who was working at Yahoo! at the time and is now Chief Architect of Cloudera, named the project after his son's toy elephant. Cutting's son was 2 years old at the time and just beginning to talk. He called his beloved stuffed yellow elephant "Hadoop".

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Instructor-led Training

Learn from industry experts with live instructor-led training

Projects & Lab

Apply the skills you learn to solve real-world problems.

Certificate

Highlight your new skills on your resume or LinkedIn.

1:1 Mentoring

Get guidance from industry leaders and professionals.

Best-in-class Support

24×7 support and forum access to answer all your queries throughout your learning journey.

Certifications

Compatible with Hortonworks Certified Developer (HDPCD)

Enrollment

ONLINE INSTRUCTOR-LED TRAINING

Start date : 17 Sept 2018
Mode : Online Instructor Based Training
Duration : 6 Months
Price : 50,000/ 714 70,000

CLASSROOM-BASED INSTRUCTOR-LED TRAINING

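Start date : 23 Aug 2018
Schedule : Mon to Fri (24 weeks)
Timing : 10 AM - 12 PM
Duration : 6 Months
Price : 80,000/ 1142 1,20,000

Start date : 17 Sept 2018
Schedule : Mon to Fri (24 weeks)
Timing : 10 AM - 12 PM
Duration : 6 Months
Price : 80,000/ 1142 1,20,000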



Course Learning Outcomes

After completing the course, students will be able to:

01

model and implement efficient big data solutions for various application areas using appropriately selected algorithms and data structures.

02

analyze methods and algorithms, to compare and evaluate them with respect to time and space requirements, and make appropriate design choices when solving real-world problems.

03

motivate and explain trade-offs in big data processing technique design and analysis in written and oral form.

04

explain Big Data fundamentals, including the evolution of Big Data, its characteristics and the challenges it introduces.

05

apply non-relational databases, the techniques for storing and processing large volumes of structured and unstructured data, as well as streaming data.

06

apply the novel architectures and platforms introduced for Big data, in particular Hadoop and MapReduce.

Learning Path


  • Introduction to Apache Hadoop
    • High Availability
    • Scaling
    • Advantages and Challenges
  • Introduction to Big Data
    • What is Big data
    • Big Data Opportunities and Challenges
    • Characteristics of Big data
  • Introduction to HDFS
    • Hadoop Distributed File System
    • Comparing Hadoop & SQL
    • Industries using Hadoop
    • Data Locality
    • Hadoop Architecture
    • Using the Hadoop single node image (Clone)
  • Hadoop Distributed File System (HDFS)
    • HDFS Design & Concepts
    • Blocks, Name nodes and Data nodes
    • HDFS High-Availability and HDFS Federation
    • Hadoop DFS: The Command-Line Interface
    • Basic File System Operations
    • Anatomy of File Read and File Write
    • Block Placement Policy and Modes
    • Configuration Files in Detail
    • Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
    • How to add a new Data Node and decommission a Data Node dynamically (without stopping the cluster)
    • FSCK Utility (Block Report)
    • How to override default configuration at the system level and programming level
    • HDFS Federation
    • ZOOKEEPER Leader Election Algorithm
    • Exercise and small use case on HDFS
  • Map Reduce
    • Map Reduce Functional Programming Basics
    • Map and Reduce Basics
    • How Map Reduce Works
    • Anatomy of a Map Reduce Job Run
    • Legacy Architecture: Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
    • Job Completion, Failures
    • Shuffling and Sorting
    • Splits, Record reader, Partition, Types of partitions & Combiner
    • Optimization Techniques: Speculative Execution, JVM Reuse and Number of Slots
    • Types of Schedulers and Counters
    • Comparisons between Old and New API at code and Architecture Level
    • Getting the data from RDBMS into HDFS using Custom data types
    • Distributed Cache and Hadoop Streaming (Python, Ruby and R)
    • YARN
    • Sequence Files and Map Files
    • Enabling Compression Codecs
    • Map side Join with distributed Cache
    • Types of I/O Formats: MultipleOutputs, NLineInputFormat
    • Handling small files using CombineFileInputFormat
  • Map Reduce Programming – Java Programming
    • Hands-on “Word Count” in Map Reduce in standalone and pseudo-distributed modes
    • Sorting files using Hadoop Configuration API discussion
    • Emulating “grep” for searching inside a file in Hadoop
    • DBInputFormat
    • Job Dependency API discussion
    • Input Format API discussion, Split API discussion
    • Custom Data type creation in Hadoop
  • NOSQL
    • ACID in RDBMS and BASE in NoSQL
    • CAP Theorem and Types of Consistency
    • Types of NoSQL Databases in detail
    • Columnar Databases in Detail (HBASE and CASSANDRA)
    • TTL, Bloom Filters and Compensation
  • HBase
    • HBase Installation, Concepts
    • HBase Data Model and Comparison between RDBMS and NOSQL
    • Master & Region Servers
    • HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture
    • Catalog Tables
    • Block Cache and sharding
    • SPLITS
    • DATA Modeling (Sequential, Salted, Promoted and Random Keys)
    • Java APIs and REST Interface
    • Client-Side Buffering and processing 1 million records using it
    • HBase Counters
    • Enabling Replication and HBase RAW Scans
    • HBase Filters
    • Bulk Loading and Coprocessors (Endpoints and Observers with programs)
    • Real-world use case consisting of HDFS, MR and HBase
  • Hive
    • Hive Installation, Introduction and Architecture
    • Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
    • Metastore, HiveQL
    • OLTP vs. OLAP
    • Working with Tables
    • Primitive data types and complex data types
    • Working with Partitions
    • User Defined Functions
    • Hive Bucketed Tables and Sampling
    • External partitioned tables, Map the data to the partition in the table, Writing the output of one query to another table, Multiple inserts
    • Dynamic Partition
    • Differences between ORDER BY, DISTRIBUTE BY and SORT BY
    • Bucketing and Sorted Bucketing with Dynamic partition
    • RC File
    • INDEXES and VIEWS
    • MAPSIDE JOINS
    • Compression on hive tables and Migrating Hive tables
    • Dynamic substitution in Hive and different ways of running Hive
    • How to enable Update in HIVE
    • Log Analysis on Hive
    • Access HBASE tables using Hive
    • Hands on Exercises
  • Pig
    • Pig Installation
    • Execution Types
    • Grunt Shell
    • Pig Latin
    • Data Processing
    • Schema on read
    • Primitive data types and complex data types
    • Tuple schema, BAG Schema and MAP Schema
    • Loading and Storing
    • Filtering, Grouping and Joining
    • Debugging commands (Illustrate and Explain)
    • Validations, Type casting in Pig
    • Working with Functions
    • User Defined Functions
    • Types of JOINS in pig and Replicated Join in detail
    • SPLITS and Multiquery execution
    • Error Handling, FLATTEN and ORDER BY
    • Parameter Substitution
    • Nested For Each
    • User Defined Functions, Dynamic Invokers and Macros
    • How to access HBASE using PIG, Load and Write JSON DATA using PIG
    • Piggy Bank
    • Hands on Exercises
  • SQOOP
    • Sqoop Installation
    • Import Data (full table, subset only, target directory, protecting the password, file formats other than CSV, compression, controlling parallelism, importing all tables)
    • Incremental Import (importing only new data, last imported data, storing the password in the Metastore, sharing the Metastore between Sqoop clients)
    • Free Form Query Import
    • Export data to RDBMS, Hive and HBase
    • Hands on Exercises
  • HCatalog
    • HCatalog Installation
    • Introduction to HCatalog
    • HCatalog with Pig, Hive and MR
    • Hands on Exercises
  • Flume
    • Flume Installation
    • Introduction to Flume
    • Flume Agents: Sources, Channels and Sinks
    • Log user information from a Java program into HDFS using Log4j with Avro Source and Tail Source
    • Log user information from a Java program into HBase using Log4j with Avro Source and Tail Source
    • Flume Commands
    • Flume use case: stream data from Twitter into HDFS and HBase, then analyze it using Hive and Pig
  • More Ecosystems
    • HUE (Hortonworks and Cloudera)
  • Oozie
    • Workflow (Start, Action, End, Kill, Join and Fork), Schedulers, Coordinators and Bundles, showing how to schedule Sqoop, Hive, MR and Pig jobs
    • Real-world use case that finds the top websites used by users of certain ages, scheduled to run every hour
    • ZooKeeper
    • HBASE Integration with HIVE and PIG
    • Phoenix
    • Proof of concept (POC)
  • APACHE SPARK
    • Spark Overview
    • Linking with Spark, Initializing Spark
    • Using the Shell
    • Resilient Distributed Datasets (RDDs)
    • Parallelized Collections
    • External Datasets
    • RDD Operations
    • Basics, Passing Functions to Spark
    • Working with Key-Value Pairs
    • Transformations
    • Actions
    • RDD Persistence
    • Which Storage Level to Choose?
    • Removing Data
    • Shared Variables
    • Broadcast Variables
    • Accumulators
    • Deploying to a Cluster
    • Unit Testing
    • Migrating from pre-1.0 Versions of Spark
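
The "Map and Reduce Basics" and "Shuffling and Sorting" topics in the learning path can be previewed with a short sketch. The example below is a single-process illustration in plain Python, not Hadoop code and not taken from the course material: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are hypothetical, and a real job would run these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle step: group values by key, then sort the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce step: sum the list of counts collected for each word."""
    return {key: sum(values) for key, values in grouped}


def word_count(lines):
    """Chain the three phases, mimicking the data flow of a word-count job."""
    return reduce_phase(shuffle_and_sort(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    sample = ["big data is big", "hadoop stores big data"]
    for word, count in word_count(sample).items():
        print(word, count)
```

Keeping the three phases as separate functions mirrors how the framework separates mapper output, the shuffle, and reducer input, which is the part of the model that the "Word Count" hands-on exercise builds on.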


Projects

  • Big Data Applications for the Healthcare Industry with Apache Sqoop and Apache Solr

Certificate

Earn your certificate

Our specialization is exhaustive, and the certificate we award is proof that you have taken a big leap in the Big Data domain.


Differentiate yourself

The knowledge you have gained from working on projects, videos, quizzes, hands-on assessments and case studies gives you a competitive edge.


Share your achievement

Highlight your new skills on your resume, LinkedIn, Facebook and Twitter. Tell your friends and colleagues about it.



Trainers

  • Created by a team of industry and academic experts with 20+ years of R&D experience


Eligibility Criteria

  • Any graduate with a mathematical background, or a graduate in Engineering or equivalent (e.g. BE / BTech / 4-year BSc / AMIE, etc.), in Computer Science / IT / Electronics / Electrical / Mechanical / Civil / BCA / MCA / MSc / MBA or related areas.
  • Post Graduate in Engineering Sciences (e.g. MSc in Computer Science, IT, Electronics, etc.)
  • Graduate in any discipline of Engineering or equivalent Sciences (e.g. MSc in Computer Science, IT, Electronics, etc.)
  • MCA/MCM
  • Post Graduate in Physics/ Computational Sciences/ Mathematics or allied areas.
  • Post Graduate in Management with graduation degree in Science/ IT/ Computers
  • The candidates must have secured a minimum of 50% marks in their qualifying examination.


Course Fee Structure



ONLINE TRAINING FEE for PG Diploma courses

Price : Rs 50,000/- (including tax) / 714

Duration : 6 Months Mon - Fri 1 Hr

CLASSROOM TRAINING FEE for PG Diploma courses

Price : Rs 80,000/- (including tax) / 1142

Duration : 6 Months Mon - Fri 1 Hr


Financial Aid


Selected students can contact the Admissions Office for assistance in applying for loans after receiving the offer of admission. Our education loan lending partners include HDFC, Axis Bank, Tata Capital, Capital First and many more.


Placement Assistance


Upon successful completion of the PGDP course, participants who are serious about their career and who clear the IT-company-standard certification exam at our campus are offered 100% placement assistance from our placement team. Vaidehi Software will use its HR corporate network to help candidates in the program make the transition to a career in the IT industry. For all qualifying candidates, placement assistance will be extended until they are placed, even after completion of the program.

Note :-

  • Only candidates who pass the respective IT standard certification exam will be eligible for outsourcing to a client location or for placement assistance.
  • Placement depends strictly on the candidate's dedication, effort, commitment, skills and performance in the internal tests.
  • Vaidehi Software strives to place its students by conducting rigorous placement activities, such as mock interviews and soft-skills training, from day one of the course.


Reviews

40 reviews
(4.9 out of 5)

FAQ


  • 1. What is the difference between online training and classroom training?

    In online training, you will get:

    • Access to live instructor-led training as per your enrolled batch
    • Sessions with industry experts over online meeting tools like Zoom
    • 24x7 support from the trainers

    In classroom training, you will get:

    • Intensive one-to-one classroom training by practicing industry experts as per your enrolled batch
    • Sessions with industry experts who have 20+ years of R&D experience
    • 24x7 support from the trainers

  • 2. What are the prerequisites and requirements for this course?

    No prerequisites

  • 3. Who will be the course instructors?

    Top industry experts with 20+ years of R&D experience who have mentored students across the world.

  • 4. What is the validity of course material?

    Soft copy of the course material will be mailed to you.

  • 5. How does online instructor-led training work?

    In online instructor-led training, a team of experts will train you together with a group of fellow learners for 25+ hours over online conferencing software such as Zoom and webinar tools. Online classes are held every day from Monday to Friday.

  • 6. What is the certification process?

    At the end of the course, you will work on a real-time project. Once you have completed the project (it will be reviewed by an expert), you will be awarded a certificate that you can share on LinkedIn.

  • 7. How will the practical or hands-on sessions be conducted?

    Enrollment in the course includes 30 days of free access to the labs, depending on the date of enrollment. Access can be extended with permission.

  • 8. Can I renew my lab subscription?

    Yes, you can renew your subscription at any time. Choose your desired lab plan and make the payment to renew your subscription.

  • 9. Whom should I contact directly for instant help?

    Email our dynamic and ever-active director at director@vaidehisoftware.com