Apache Hadoop

Big Data and Hadoop Ecosystem Introduction

  • What is Big Data?
  • Limitations of Big Data
  • Why Hadoop?
  • Problems with Traditional Systems
  • Core Hadoop Components
  • Introduction to Hadoop Ecosystem

HDFS: Hadoop Distributed File System

  • HDFS Features & Design Goals
  • HDFS Operation Principle
  • Data Locality, Rack Awareness
  • Writing and Reading Files
  • NameNode Memory Considerations
  • Secondary NameNode – FSImage & EditLog
  • DataNode – Heartbeats & Block Reports

Getting Started with Hadoop & HDFS Lab1

  • Setting up VM Hadoop Environment
  • Using the NameNode WebUI
  • Using the Hadoop FileSystem Shell
  • Hadoop FileSystem Shell Commands
  • HDFS Federation & HA
  • HDFS 1.0 and 2.0

MapReduce v1 — Lab2

  • What is MapReduce?
  • MapReduce Concepts
  • JobTracker and TaskTracker
  • Hadoop MapReduce Example
  • Steps of Hadoop MapReduce
  • MapReduce Framework
  • Basics of MapReduce Programming
  • Using the MapReduce Web UI

HBase — Lab3

  • HBase Introduction & History
  • Who Uses HBase & When to Use It
  • HBase Data Model & Column Families
  • HBase Components
  • Row Distribution between Region Servers
  • HBase Master

Planning the Hadoop Cluster & Cloudera Manager 

  • General Planning Considerations
  • Choosing the Right Hardware
  • Network Considerations
  • Configuring Nodes
  • Cloudera Manager Features
  • Cloudera Enterprise Pictorial View

Deploying a Multi-Node Hadoop Cluster — Lab4

  • Deployment Types
  • Planning a 3-Node Hadoop Cluster
  • Installing Cloudera Manager
  • Installing a Multi-Node Hadoop Cluster Using Cloudera Manager
  • Hadoop Configuration in a Cluster Environment
  • Specifying the Hadoop Configuration
  • Performing Basic Administration Tasks Using Cloudera Manager

Data In and Out of Hadoop & Mock Interview1

  • Introduction to Flume and Sqoop
  • Ingesting Data from External Sources with Flume
  • Ingesting Data from Relational Databases with Sqoop
  • Flume Model & Scalability
  • Mock Interview1

Advanced Pig — Lab6

  • Introduction to Pig
  • Comparison between Pig and SQL
  • Installing and Configuring Pig
  • Running Pig — Hands On
  • Pig Latin
  • User Defined Functions (UDFs)

Advanced Hive — Lab7

  • What is Hive?
  • Serialization/Deserialization
  • Hive File Formats & Data Model
  • System Architecture and Components
  • Hive Query Language
  • Hive: Installation, Running and Programming
  • Differences between Hive and Pig

Advanced MapReduce & YARN

  • Writing a MapReduce Program in Java
  • Writing a MapReduce Program Using Streaming
  • Custom Data Types
  • Input & Output Formats
  • Combiners and Partitioners
  • Introduction to YARN
  • YARN Architecture

Hadoop Ecosystem 

  • Introduction to ZooKeeper (Challenges Faced in Distributed Applications; ZooKeeper: Goals and Uses; ZooKeeper: Entities, Data Model, Services)
  • Introduction to Chukwa (Chukwa Architecture, Chukwa Agent)
  • Introduction to Apache Oozie (Apache Oozie Workflow)
  • Introduction to Mahout
  • Introduction to Apache Cassandra (Why Apache Cassandra?)
  • Introduction to Hue
  • Introduction to Impala

Managing & Scheduling Jobs, Cluster Maintenance & Logging 

  • Managing Running Jobs
  • Scheduling Hadoop Jobs
  • Configuring the Fair Scheduler
  • Checking HDFS Status
  • Adding and Removing Cluster Nodes
  • Rebalancing the Cluster
  • Monitoring Hadoop Clusters
  • Troubleshooting Common Hadoop Cluster Issues

Hadoop Project & Mock Interview2

  • Project Overview – Web Analytics
  • Classification, Clustering and Collaborative Filtering
  • Get Hands-On with Live Data
  • Solution Code Discussion
  • Introduction to the Hadoop Community
  • Next Steps: Hadoop Certification Path
  • Mock Interview2