BIG DATA SCIENCE

About The Program
Big data science is one of the programs to watch this decade: surveys consistently rank it among the best IT fields for scope of growth and lucrative packages. Every individual generates data, and in the world of IoT even trees generate data. Sounds amazing? Well, yes, and all that data needs to be processed and turned into an asset that can provide answers: answers to business problems, answers for scientific research, answers to what to do next. This program gives you complete exposure to the latest big data tools for storing, processing and analyzing large amounts of data.
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” – Jim Barksdale
Program Description
Big Data Science covers the concepts of Big Data with Apache Spark & Scala for data processing, and both R and Python for analytics, using modeling techniques for cluster analysis, time series analysis, market basket analysis and regression. Machine learning algorithms are also part of the course, keeping in mind the industry requirement to bring artificial intelligence into analytics. These skills apply across domains such as finance, marketing, e-commerce, banking, insurance, aviation and even gaming.
Program Preview
SESSION 1

  • Big Data & Hadoop Introduction
    • Understand what big data is.
    • Limitations of existing systems.
    • Hadoop ecosystem.
    • Understanding Hadoop 2.x components.
    • Performing read and write operations (a Python sketch follows this list).
    • Rack awareness.
    • Installation of Hadoop in a virtual machine.
  • Hadoop Architecture & Hadoop Distributed File System
    • Hadoop Architecture.
    • Horizontal scaling.
    • Moving code to the data rather than data over the network.
    • High availability.
    • Scalability: multiple NameNodes.
    • HDFS Commands.
    • Hadoop configuration files.
    • Passwordless SSH.
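
To make the read-and-write bullet concrete, here is a minimal sketch of HDFS file I/O from Python using the third-party hdfs (hdfscli) package over WebHDFS; the NameNode address, user and paths are assumptions to adapt to your cluster.

```python
# A minimal HDFS read/write sketch via WebHDFS (third-party `hdfs` package).
from hdfs import InsecureClient

# Hypothetical NameNode web address and user; adjust for your cluster.
client = InsecureClient('http://localhost:9870', user='hadoop')

# Write a small text file into HDFS.
with client.write('/user/hadoop/demo.txt', encoding='utf-8', overwrite=True) as writer:
    writer.write('hello hdfs\n')

# Read it back.
with client.read('/user/hadoop/demo.txt', encoding='utf-8') as reader:
    print(reader.read())
```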

SESSION 2

  • Hadoop MapReduce Framework
    • How MapReduce differs from the traditional approach.
    • Hadoop 2.x MapReduce architecture and components.
    • Understand the processing layer, i.e. YARN.
    • MapReduce concepts.
    • Run a basic MapReduce program (a Python streaming sketch follows this list).
    • Understanding Input Splits
    • MapReduce job submission flow.
    • Performance improvement using combiners.
    • Partitioners.
    • MapReduce as a whole
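
To show what a basic MapReduce run looks like in code, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are plain Python filters over stdin; the argv-based dispatch is an illustrative convenience, and in practice the two functions would be submitted as separate scripts with the hadoop-streaming jar.

```python
# Word count in the Hadoop Streaming style: mapper and reducer read stdin.
import sys

def mapper():
    # Emit "word<TAB>1" for every word in the input.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Streaming sorts by key between phases, so counts per word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, _, value = line.rstrip("\n").partition("\t")
        if word == current:
            count += int(value)
        else:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # Run as `python wordcount.py map` or `python wordcount.py reduce`.
    mapper() if sys.argv[1:] == ["map"] else reducer()
```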

SESSION 3

  • MapReduce Advanced
    • Understanding counters
    • Map Side Join
    • Reduce side Join
    • MRUnit for testing MapReduce code.
    • Custom input formats.
    • Sequence file format

SESSION 4

  • PIG
    • How PIG came into the picture.
    • Where PIG is a good fit.
    • Where PIG should not be used.
    • Conceptual data flow.
    • Different modes of PIG execution (local and MapReduce).
    • Data models in PIG.
    • PIG relational operators.
    • UDFs in PIG: customized functions in Java.
    • Describe, explain and illustrate.
    • Demo

SESSION 5

  • Hive
    • Why and how HIVE came into the picture.
    • How HIVE differs from PIG.
    • HIVE architecture and components.
    • Where HIVE should and should not be used.
    • Data types in HIVE.
    • Perform basic HIVE operations (a Python sketch follows this list).
    • Joins in Hive
    • Create UDF for Hive
    • Dynamic Partitioning
    • Performance Tuning
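
As a sketch of driving basic HIVE operations from Python, the following uses the third-party PyHive package; the host, port, user and the sales table are assumptions for illustration.

```python
# Querying Hive from Python via PyHive (HiveServer2 must be running).
from pyhive import hive

conn = hive.connect(host='localhost', port=10000, username='hadoop')  # hypothetical
cursor = conn.cursor()

# A simple aggregation over a hypothetical `sales` table.
cursor.execute('SELECT category, COUNT(*) FROM sales GROUP BY category')
for row in cursor.fetchall():
    print(row)
```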

SESSION 6

  • HBase
    • Understand NoSQL Database
    • Understand CAP theorem.
    • Comparison of RDBMS and HBASE
    • HBASE Architecture.
    • How updates are implemented on top of HDFS.
    • Data model and physical storage in HBASE.
    • Execute basic HBASE commands (a Python sketch follows this list).
    • Data loading techniques in HBASE.
    • Understanding Zookeeper
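
For the basic-commands bullet, here is a minimal sketch of HBase puts, gets and scans from Python via the third-party happybase package, which talks to HBase through the Thrift gateway; the host, table and column family are assumptions.

```python
# Basic HBase operations from Python via happybase (Thrift gateway required).
import happybase

connection = happybase.Connection('localhost')  # hypothetical Thrift host
table = connection.table('users')               # hypothetical table

# Put a row, then read it back; HBase stores everything as bytes.
table.put(b'row1', {b'info:name': b'Asha', b'info:city': b'Gurgaon'})
print(table.row(b'row1'))

# Scan rows sharing a key prefix.
for key, data in table.scan(row_prefix=b'row'):
    print(key, data)
```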

SESSION 7

  • Flume, Sqoop, OOZIE
    • Implement Flume & Sqoop
    • Understand Oozie.
    • Schedule jobs in Oozie.
    • Oozie workflow.

SESSION 8

  • Scala
    • Introduction to Scala
    • Scala in other Frameworks
    • Scala REPL
    • Basic Scala Operations
    • Functions and Procedures
    • Collections
    • Control Structures
    • OOP and Functional Programming in Scala

SESSION 9

  • Scala Continued
    • Classes in Scala
    • Getters and Setters
    • Constructors and Singletons
    • Companion Objects
    • Inheritance in Scala
    • Traits and Layered Traits
    • Functional Programming in Scala

SESSION 10

  • Spark
    • Spark Introduction
    • Implement Spark operations on Spark Shell
    • Understand Spark and its Ecosystem
    • Spark Common operations (a PySpark sketch follows this list)
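
A minimal PySpark sketch of the common operations typically tried first on the Spark shell, assuming a local Spark installation.

```python
# Common RDD operations: transformations are lazy, actions trigger execution.
from pyspark import SparkContext

sc = SparkContext('local[*]', 'intro')

nums = sc.parallelize([1, 2, 3, 4, 5])
squares = nums.map(lambda x: x * x)           # transformation (lazy)
evens = squares.filter(lambda x: x % 2 == 0)  # transformation (lazy)
print(evens.collect())                        # action: [4, 16]
print(nums.reduce(lambda a, b: a + b))        # action: 15

sc.stop()
```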

SESSION 11

  • Playing with RDD
    • Learn how to work with RDDs in Spark
    • Understand the role of Spark RDD
  • Spark Streaming & Spark SQL
    • Understand Spark SQL Architecture (a PySpark SQL sketch follows this list)
    • Learn Spark Streaming API
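
A short PySpark SQL sketch of the architecture in action: a DataFrame registered as a temporary view and queried with SQL; the column and view names are illustrative.

```python
# Spark SQL: build a DataFrame, expose it as a view, query it with SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('sql-demo').getOrCreate()

df = spark.createDataFrame(
    [('Asha', 'ecommerce', 120), ('Ravi', 'banking', 80)],
    ['name', 'domain', 'orders'],
)
df.createOrReplaceTempView('customers')
spark.sql('SELECT domain, SUM(orders) AS total '
          'FROM customers GROUP BY domain').show()

spark.stop()
```
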
SESSION 12

  • Getting Started with Python
    • Introduction to Python
    • Python Basics
      • Variables
      • Conditional Statements
      • Loops
  • Data Structures (a short Python sketch follows this list)
    • List
      • Introduction
      • Accessing List & Working with Lists
      • Operations
      • Functions & Methods
    • Dictionaries
      • Introduction
      • Accessing Values
      • Working with Dictionaries
      • Properties
      • Functions
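
A short, runnable sketch of the list and dictionary operations listed above.

```python
# Lists: ordered, mutable sequences.
primes = [2, 3, 5, 7]
primes.append(11)             # grow the list
print(primes[0], primes[-1])  # indexing: first and last element
print(primes[1:3])            # slicing: [3, 5]

# Dictionaries: key-value mappings.
ages = {'asha': 29, 'ravi': 34}
ages['meena'] = 41              # insert a new key
for name, age in ages.items():  # iterate over pairs
    print(name, age)
print(ages.get('unknown', 0))   # safe lookup with a default
```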

SESSION 13

  • Data Handling & String
    • Reading Data into Memory
    • Working with Strings
    • Catching exceptions to deal with bad data
    • Writing the data back again
  • Python & Pandas
    • Using Pandas, the Python data analysis library
    • Series & Data Frame
    • Grouping, Aggregating & Applying
    • Merging & Joining (a pandas sketch follows this list)
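
A minimal pandas sketch tying the session together: defensive reading with exception handling, then grouping, aggregating and merging; the file and column names are illustrative.

```python
import pandas as pd

# Catch the exception so bad or missing input doesn't crash the pipeline.
try:
    orders = pd.read_csv('orders.csv')  # hypothetical file
except FileNotFoundError:
    # Fall back to a tiny in-memory frame so the rest of the sketch runs.
    orders = pd.DataFrame({'cust': ['a', 'b', 'a'], 'amount': [10, 20, 5]})

customers = pd.DataFrame({'cust': ['a', 'b'], 'city': ['Delhi', 'Pune']})

totals = orders.groupby('cust', as_index=False)['amount'].sum()  # group & aggregate
report = totals.merge(customers, on='cust', how='left')          # join on the key
print(report)
```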

SESSION 14

  • Programming with Spark
    • Spark Transformation
    • Spark Action
    • Python Spark Programming Examples
  • Spark SQL
    • Spark SQL Overview
    • Python – Spark SQL Examples
  • Spark Streaming
    • Streaming with Apache Spark
    • Python Spark Streaming examples (a DStream sketch follows this list)
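
A minimal Spark Streaming (DStream) word-count sketch; it assumes text arriving on a local socket, for example fed by `nc -lk 9999`.

```python
# Streaming word count over 5-second micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext('local[2]', 'stream-demo')
ssc = StreamingContext(sc, 5)  # 5-second batch interval

lines = ssc.socketTextStream('localhost', 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```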

SESSION 15

  • Data Analytics using R
    • What is Data Analytics
    • Who uses R and how.
    • What is R
    • Why use R
    • R products
    • Get Started with R
  • Introduction to R Programming
    • Different Data Types in R and when to use which one
    • Function in R
    • Various subsetting methods.
    • Summarizing the data using str(), class(), nrow(), ncol() and length()
    • Use functions like head() and tail() for inspecting data
    • Take part in a class activity to summarize the data.

SESSION 16

  • Data Manipulation in R
    • Know the Various steps involved in Data Cleaning
    • Functions used for data inspection
    • Tackling the problems faced during data cleaning
    • How and when to use functions like grep, grepl, sub, gsub, regexpr, gregexpr, strsplit
    • How to coerce the data
    • The apply family of functions.
  • Data Import Technique in R
    • Import data from spreadsheets and text files into R
    • Install packages used for data import
    • Connect to an RDBMS from R using ODBC and run basic SQL queries in R
    • Perform basic web scraping.

SESSION 17

  • Data Exploration in R
    • What is Data Exploration
    • Data exploration using summary(), mean(), var(), sd(), unique()
    • Using the Hmisc package and its summarize and aggregate functions
    • Learning correlation with the cor() function and visualizing it using corrgram
    • Visualizing data using plot and its different flavours
    • Boxplots
    • The dist() function
  • Data Mining
    • Clustering Technique
    • Introduction to data mining
    • Understand machine learning
    • Supervised and unsupervised machine learning algorithms
    • K-means clustering
    • Association rule mining and sentiment analysis
    • Understanding association rule mining
    • Understanding sentiment analysis

SESSION 18

  • Introduction to Business Analytics
    • Relevance in industry and need of the hour
    • Types of analytics – Marketing, Risk, Operations, etc
    • Future of analytics and critical requirement
  • Fundamentals of Statistics
    • Basic statistics: descriptive and summary
    • Inferential statistics
    • Statistical tests
  • Data Prep & Reduction Techniques
    • Need for data preparation
    • Outlier treatment
    • Flat-liners treatment
    • Missing values treatment (a pandas sketch follows this list)
    • Factor Analysis
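
A minimal data-preparation sketch in pandas showing missing-value treatment and IQR-based outlier capping; the column and cut-offs are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'income': [30, 32, 29, np.nan, 31, 400]})

# Missing-value treatment: impute with the median.
df['income'] = df['income'].fillna(df['income'].median())

# Outlier treatment: cap values outside 1.5 * IQR.
q1, q3 = df['income'].quantile([0.25, 0.75])
iqr = q3 - q1
df['income'] = df['income'].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(df)
```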

SESSION 19

  • Basic Analytics
    • Statistics basics: introduction to data analytics and statistical techniques
    • Types of Variables, measures of central tendency and dispersion
    • Variable Distributions and Probability Distributions
    • Normal Distribution and Properties
    • Central Limit Theorem and Application
    • Hypothesis testing: null/alternative hypothesis formulation
    • One Sample, two sample (Paired and Independent) T/Z Test
    • P Value Interpretation
    • Analysis of Variance (ANOVA)
    • Chi Square Test
    • Non Parametric Tests (Kruskal-Wallis, Mann-Whitney, KS)
    • Correlation
  • Customer Segmentation
    • Basic Clustering
    • Deciles analysis
    • Cluster analysis (K-means and Hierarchical)
    • Cluster evaluation and profiling
    • Interpretation of results (a Python t-test and k-means sketch follows this list)
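
A short Python sketch of two techniques from this session, a two-sample t-test (SciPy) and k-means segmentation (scikit-learn), on synthetic data.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two-sample t-test: do two campaign groups differ in mean spend?
group_a = rng.normal(100, 15, 200)
group_b = rng.normal(105, 15, 200)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> reject H0

# K-means: segment customers on two features (spend, visits).
X = np.column_stack([rng.normal(100, 20, 300), rng.poisson(5, 300)])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # cluster sizes, a starting point for profiling
```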

SESSION 20

  • Regression Modeling
    • Basics of Regression Analysis
    • Linear regression
    • Logistic regression
    • Interpretation of results
    • Multivariate Regression modeling (a scikit-learn sketch follows this list)
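
A minimal scikit-learn sketch of linear and logistic regression on synthetic data with three predictors.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))  # three predictors

# Linear regression on a continuous target: y = 2*x0 - x1 + noise.
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 500)
lin = LinearRegression().fit(X, y)
print(lin.coef_, lin.intercept_)  # roughly [2, -1, 0] and 0

# Logistic regression on a binary outcome derived from the same data.
labels = (y > 0).astype(int)
log_reg = LogisticRegression().fit(X, labels)
print(log_reg.predict_proba(X[:2]))  # class probabilities for two rows
```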

SESSION 21

  • Predictive Modeling & Forecasting
    • Time Series Analysis (an ARIMA sketch follows this session's list)
    • Cross-sell and Up-sell opportunities and modeling
    • Churn prediction models and management
  • Credit Risk Modeling
    • Credit risk scoring model using logistic regression
    • Credit risk scoring model using CHAID
    • Credit risk score, its interpretation and implementation
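
A minimal time-series forecasting sketch using the ARIMA model from statsmodels on a synthetic monthly series; the (1, 1, 1) order is illustrative, not tuned.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
idx = pd.date_range('2020-01-01', periods=48, freq='MS')  # monthly series
y = pd.Series(100 + np.arange(48) * 1.5 + rng.normal(0, 3, 48), index=idx)

model = ARIMA(y, order=(1, 1, 1)).fit()  # fit on four years of history
print(model.forecast(steps=6))           # forecast the next six months
```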

SESSION 22

  • Real Time Data Science
    • Data Science use cases
    • Spark Data Science
  • Data Mining
    • Decision Trees & Random Forest
    • Understand what a decision tree is
    • Algorithms for decision trees
    • Greedy approach: entropy and information gain (a worked sketch follows this list).
    • A perfect decision tree
    • Understand the concept of random forest
    • How random forests work
    • Features of random forest
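
A worked sketch of entropy and information gain, the greedy splitting criterion named above, in pure Python with synthetic labels.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

parent = ['yes'] * 9 + ['no'] * 5  # 9/5 split, H is about 0.940
left = ['yes'] * 6 + ['no'] * 2    # candidate split: left child
right = ['yes'] * 3 + ['no'] * 3   # candidate split: right child

weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - weighted
print(f"information gain = {gain:.3f}")  # higher gain means a better split
```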

SESSION 23

  • Machine Learning with Spark
    • Spark Machine Learning
    • Spark Use Case: Linear Regression (a PySpark MLlib sketch follows this list)
    • Decision Trees
    • Spark Use Case : Decision Trees Classification
    • Principal Component Analysis
    • Random Forests Classification
    • Python Use Case : Random Forests & PCA
    • Spark Machine Learning Algorithm
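
A minimal Spark MLlib sketch of the linear regression use case using the DataFrame-based pyspark.ml API; the feature columns and data are illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName('mllib-demo').getOrCreate()

df = spark.createDataFrame(
    [(1.0, 2.0, 5.1), (2.0, 1.0, 4.9), (3.0, 4.0, 11.2), (4.0, 3.0, 10.8)],
    ['x1', 'x2', 'y'],
)

# MLlib expects the predictors packed into a single vector column.
assembler = VectorAssembler(inputCols=['x1', 'x2'], outputCol='features')
train = assembler.transform(df)

model = LinearRegression(featuresCol='features', labelCol='y').fit(train)
print(model.coefficients, model.intercept)

spark.stop()
```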

SESSION 24

  • Project Description
    • E-commerce
    • Healthcare
    • Telecom

Program Highlights
Instructor-led Session
Instructor-led training consists of live, interactive sessions delivered by industry professionals with many years of experience across domains. The training takes a practical approach, with regular assessments, and trainers guide you toward the right way of approaching each problem.
Real-life Case Study
Case studies are live projects in which every candidate is given data from domains such as banking, healthcare, e-commerce and retail to analyze and report on. By working through these case studies, every candidate gains practical experience of how analytics is done for decision making.
Assignments
For every session our experts have designed assignments that reinforce the conceptual understanding of each topic. Most of these assignments are prepared with interview questions and techniques in mind, so they also help candidates during placements.
Lifetime Access
Every session is recorded, and the recordings are shared with candidates for life, so they can refer back whenever doubts arise or a session needs repeating. The recordings are high-quality MP4 files shared with candidates via email.
24 X 7 Expert Support
Apart from the training itself, our support team is available 24×7 to answer your queries about the training, and industry experts help resolve the problems you raise in any of the analytical tools. Our client support team is ready to serve you at any time of the day.
Certification
Palin-certified data science trainees are working in many MNCs; the certification is acknowledged by industry experts and appreciated globally. This three-month certification program can launch the career in business analytics you may dream of. You earn the Palin certificate by clearing rounds of assessments.
FAQs
Data science has been called the sexiest job of the 21st century, and now is the time: every sector is short of data scientists. There are opportunities in many domains, such as e-commerce, healthcare, banking, social media and search engines, with profiles for freshers as well as experienced professionals, right up the hierarchy.
Palin Analytics focuses on industry-specific course content, and our material is designed by industry experts with a minimum of eight years' experience in relevant profiles. We provide PPTs for each session, along with assignments and reference code, so you come prepared to every session and maximize your chances of understanding the concepts.
We believe in training delivered only by working industry professionals with ample experience in relevant profiles, so they can pass on theory along with current industry practice. This makes it easier for you to gain hands-on experience of the profile, not just knowledge of the tools.
You can register directly through our website using the registration link, or submit your fee by cheque or in cash at the Palin Gurgaon head office. We also accept fees through Paytm and PayZapp.
Data science is made up of three constituents: programming, statistics and business. If you are comfortable with any two, it will be easy to adapt to the third. Apart from this, you need a bachelor's degree, a master's is an added advantage, and you are good to go. Best of luck!
This training program is completely lab-based and practical. It is divided into hours specified for each tool, and you will complete a case study as your final project so that you get hands-on experience in big data analytics. By working on industry-specific data, you can credibly present yourself as a big data expert.
We provide both physical classroom and online training. Both are live and instructor-led; there are no pre-recorded sessions. For professionals short on time, and for those far from Gurgaon who prefer not to travel, the online sessions are a savior: they learn from home and receive a recording of every session as backup, so they don't even need to maintain notes and can access previous classes anytime, anywhere.
Freshers are getting plenty of opportunities in data science these days, as they offer scalability and the out-of-the-box thinking that most organizations need. As a fresher you can expect a minimum of 4-6 LPA, though for the right candidate there is no bar.
We work as both a consultancy and a training organization. Our alumni are working across sectors in major analytics firms, and you can be their successors by following the path they took. We provide a platform where you can learn and prove your worth to organizations.
As a trainee you will complete case studies in three different domains at the end of your training; these domains may include healthcare, e-commerce, banking and finance. The case studies give you industry exposure and prepare you for the challenges ahead in analytics.