Analytics using R

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

Duration: 16 Sessions, 3 Hrs

R is an integrated suite of software facilities for data manipulation, calculation and graphical display.

Highlights
  • An effective data handling and storage facility
  • A suite of operators for calculations on arrays, in particular matrices.
  • A large, coherent, integrated collection of intermediate tools for data analysis.
  • A well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
  • Graphical facilities for data analysis and display either on-screen or on hardcopy

Overview

R is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Getting Started with R

R

  • Definition of R
  • History of R
  • Why use R?
  • Installing, starting and stopping R
  • File operations and file formats
  • R interface: writing code and text editors
  • Basic R syntax
  • Library

Data types

  • Character
  • Factor
  • Integer
  • Numeric (floating point)
  • Date and time
  • Input/Output/Print
  • Matrix: dimension, design, cbind(), rbind()
  • Objects, Vectors, Lists

Files

  • reading files
  • symbols and assignment
  • Importing data from multiple sources/formats like .csv, .txt, .xlsx, SAS and SPSS files
  • Exporting data to multiple formats
  • Handling dataframes: filtering, sorting, merging
  • plyr package for easy data manipulation
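
As a rough illustration of the file and data frame topics above, the following sketch uses only base R. The data values, column names and the temporary file are made up for the example. Reading .xlsx, SAS or SPSS files needs add-on packages and is not shown here.

```r
# Hypothetical data frame (values invented for illustration)
scores <- data.frame(id = 1:5, score = c(82, 67, 91, 74, 88))

# Export to CSV and import it again, using a temporary file
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
back <- read.csv(path)

# Filtering: keep rows with a score above 80
high <- back[back$score > 80, ]

# Sorting: highest score first
sorted <- back[order(-back$score), ]

# Merging: inner join with a second (hypothetical) table on "id"
groups <- data.frame(id = c(1, 2, 3, 6), group = c("A", "B", "A", "B"))
merged <- merge(back, groups, by = "id")
```

By default merge() performs an inner join, so ids present in only one of the two tables are dropped; all = TRUE would keep them.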

Loops

  • sequences
  • simple loops (iteration)
  • For
  • R While loop
  • R Break & Next
  • R Repeat loop

R Functions

  • Commonly used built in functions
  • Function return value
  • Environment & Scope
  • Recursive function
  • Infix operator
  • Switch function
  • Grouping functions (sapply, lapply, apply, tapply, vapply, mapply, aggregate)
  • Writing user defined functions
  • Installing packages
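
A few of the function topics listed above can be sketched in base R as follows. The function names fact, %+% and describe are invented for this example; mtcars is a dataset that ships with R.

```r
# Recursive user-defined function: factorial
fact <- function(n) if (n <= 1) 1 else n * fact(n - 1)

# User-defined infix operator (the name %+% is hypothetical)
`%+%` <- function(a, b) paste(a, b)

# switch() with a default branch as the last unnamed argument
describe <- function(type) {
  switch(type,
         mean = "average",
         median = "middle value",
         "unknown")
}

# Grouping functions
squares_list <- lapply(1:3, function(i) i^2)              # returns a list
squares_vec  <- sapply(1:3, function(i) i^2)              # simplifies to a vector
squares_safe <- vapply(1:3, function(i) i^2, numeric(1))  # checks the result type
pair_sums    <- mapply(function(a, b) a + b, 1:3, 4:6)    # iterates over two inputs
row_sums     <- apply(matrix(1:6, nrow = 2), 1, sum)      # over matrix rows
mpg_by_cyl   <- tapply(mtcars$mpg, mtcars$cyl, mean)      # by group, returns a vector
mpg_table    <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)  # by group, data frame
```

vapply() is often preferred over sapply() inside functions because it fails early when a result does not have the declared type and length.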

Data Manipulation

  • data structures
  • R Vector (Numeric vector, special values, numeric summaries)
  • R List
  • R Matrix
  • R Data frame
  • R Factor
  • subsetting
  • assigning to subsets

R Object & Class

  • S3 Class
  • S4 Class
  • Reference Class
  • R Inheritance
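
The three class systems named above might be compared with a sketch like this one. All class, generic and field names (animal, dog, speak, Person, Student, greet, Account) are invented for illustration.

```r
library(methods)  # S4 and Reference Classes live in the methods package

# S3: the class is just an attribute; inheritance follows the class vector
speak <- function(x, ...) UseMethod("speak")
speak.animal <- function(x, ...) paste(x$name, "makes a sound")
speak.dog <- function(x, ...) paste(NextMethod(), "(woof)")
rex <- structure(list(name = "Rex"), class = c("dog", "animal"))
s3_result <- speak(rex)

# S4: formal class definitions with typed slots; "contains" gives inheritance
setClass("Person", representation(name = "character"))
setClass("Student", contains = "Person", representation(school = "character"))
setGeneric("greet", function(obj) standardGeneric("greet"))
setMethod("greet", "Person", function(obj) paste("Hello,", obj@name))
s4_result <- greet(new("Student", name = "Ann", school = "Example School"))

# Reference Class: mutable objects whose methods modify fields in place
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x
      invisible(.self)
    }
  )
)
acct <- Account$new(balance = 10)
acct$deposit(5)
```

Here the Student object reuses the Person method through inheritance, and the dog method extends the animal method through NextMethod().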

R Graphs & Charts

  • R Programming Bar Plot
  • R Programming Histogram
  • R Programming Pie Chart
  • R Box Plot
  • R Strip Chart
  • R Programming Line Chart

R Advanced Topics

  • R Programming Plot Functions
  • R Programming Sub Plot
  • R Programming Saving Plot
  • R Programming Color
  • R Programming 3D Plot

Statistical Analysis Using R

Introduction to Analytics & Types of Analytics

  • Evolution of Analytics
  • Definition of Analytics
  • Scope of analytics in different industries
  • Descriptive Analysis
  • Predictive Analysis
  • Prescriptive Analysis

Parametric tests

  • Z test
  • T Test
  • Two Independent Sample T Test

The One-Sample T-Test in R

  • A manual computation
    1. A data vector
    2. The functions: mean(), sd(), (pqrd)norm(), i.e. pnorm(), qnorm(), rnorm(), dnorm()
    3. Finding confidence intervals
    4. Finding p-values
    5. Issues with data
      • Using data stored in data frames (attach()/detach(), with())
      • Missing values
      • Cleaning up data
  • EDA graphs
    1. histogram()
    2. boxplot()
    3. densityplot() and qqnorm()
  • The t.test() function
  • P-values
  • Confidence intervals
  • The power of a t test
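
The manual computation and the t.test() function listed above should agree. The sketch below checks this on a made-up data vector with an assumed null mean of 5. It uses pt() and qt() from the t distribution, the small-sample counterparts of pnorm() and qnorm().

```r
# Hypothetical sample and null-hypothesis mean (illustration only)
x   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 4.7, 5.9, 5.2)
mu0 <- 5
n   <- length(x)

# Manual computation
xbar    <- mean(x)
s       <- sd(x)
se      <- s / sqrt(n)                                  # standard error of the mean
t_stat  <- (xbar - mu0) / se                            # test statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)             # two-sided p-value
ci      <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se # 95% confidence interval

# The same test with the built-in function
res <- t.test(x, mu = mu0, conf.level = 0.95)

# Power to detect an assumed true shift of 0.5 with this n and sd
pw <- power.t.test(n = n, delta = 0.5, sd = s,
                   sig.level = 0.05, type = "one.sample")
```

Missing values would need to be handled first, for example with mean(x, na.rm = TRUE) or by dropping them with x[!is.na(x)].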

The Two-Sample T-Tests, the Chi-Square GOF test in R

  • GUIs
    1. Rcmdr
    2. PMG
  • Tests with two data vectors x, and y
    1. Two independent samples, no equal variance assumption
    2. Two independent samples, assuming equal variance
    3. Matched samples
    4. Data stored using a factor to label one of two groups: x ~ f
    5. Boxplots for displaying more than two samples
    6. The chisq.test() function
      • Goodness of fit (R square and adjusted R Square)
      • Test of homogeneity or independence
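
The variants listed above map onto arguments of t.test() and chisq.test(). The following sketch uses simulated data and invented counts, so the numbers themselves carry no meaning.

```r
set.seed(1)
x <- rnorm(20, mean = 10, sd = 2)  # simulated sample 1
y <- rnorm(20, mean = 11, sd = 2)  # simulated sample 2

welch  <- t.test(x, y)                    # no equal variance assumption (default)
pooled <- t.test(x, y, var.equal = TRUE)  # assuming equal variance
paired <- t.test(x, y, paired = TRUE)     # matched samples

# Same data stored in one column, with a factor labelling the two groups
d <- data.frame(value = c(x, y),
                f = factor(rep(c("a", "b"), each = 20)))
by_factor <- t.test(value ~ f, data = d)

# Chi-square goodness of fit: are six categories equally likely?
counts <- c(18, 22, 20, 25, 15, 20)       # hypothetical observed counts
gof <- chisq.test(counts, p = rep(1/6, 6))

# Chi-square test of independence on a hypothetical 2 x 2 table
tab <- matrix(c(20, 15, 30, 35), nrow = 2)
ind <- chisq.test(tab)
```

For a 2 x 2 table chisq.test() applies Yates' continuity correction by default; correct = FALSE turns it off.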

Concept of Analysis of variance

  • Types of Anova
  • One Way Anova
  • Two Way Anova

Association between Variables

  • Chi square Test for Independence
  • Formulate an analysis plan
  • Analyze sample data
  • Interpret result
  • Scatter plot: interpretation of scatter plots
  • Correlation among variables
  • Types of Correlation
  • Partial Correlation

The Simple Linear Regression Model in R

  • The basics of the Wilkinson-Rogers notation: y ~ x
  • y ~ x linear regression
  • Scatterplots with regression lines
  • Reading the output of lm()
  • Confidence intervals for beta_0, beta_1
  • Tests on beta_0, beta_1
  • Identifying points in a plot
  • Diagnostic plots
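
The regression topics above can be tried on the cars dataset that ships with R (stopping distance against speed). The dataset choice is an assumption made for this sketch and is not part of the course material.

```r
# Fit dist = beta_0 + beta_1 * speed using the formula notation y ~ x
fit <- lm(dist ~ speed, data = cars)

# Coefficient table: estimates, standard errors, t tests on beta_0 and beta_1
coef_table <- summary(fit)$coefficients

# 95% confidence intervals for beta_0 and beta_1
ci <- confint(fit, level = 0.95)

# Scatterplot with the fitted regression line
plot(dist ~ speed, data = cars)
abline(fit)
# In an interactive session, identify(cars$speed, cars$dist) labels clicked points

# The four standard diagnostic plots in one window
par(mfrow = c(2, 2))
plot(fit)
```

The column "Pr(>|t|)" of the coefficient table holds the p-values for the tests that each coefficient is zero.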

Bootstrapping in R, Permutation Tests

  • An introduction to bootstrapping
  • The sample() function
  • A bootstrap sample
  • Forming several bootstrap samples
  • Aside: for loops vs. matrices and speed
    1. Using the bootstrap
    2. An introduction to permutation tests
    3. A permutation test simulation
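
A minimal sketch of the bootstrap and permutation ideas above, using made-up data and base R only. The number of resamples (2000) is an arbitrary choice for the example.

```r
set.seed(42)
x <- c(12.1, 9.8, 11.4, 10.9, 13.2, 10.1, 11.8, 12.5)  # hypothetical sample
B <- 2000                                              # number of resamples

# One bootstrap sample: draw n values from x with replacement
one_boot <- sample(x, replace = TRUE)

# Several bootstrap samples with a for loop
boot_means_loop <- numeric(B)
for (b in seq_len(B)) {
  boot_means_loop[b] <- mean(sample(x, replace = TRUE))
}

# The same idea with a matrix: one resample per row, no explicit loop
boot_mat <- matrix(sample(x, length(x) * B, replace = TRUE), nrow = B)
boot_means_mat <- rowMeans(boot_mat)

# Using the bootstrap: percentile 95% interval for the mean
boot_ci <- quantile(boot_means_mat, c(0.025, 0.975))

# Permutation test for a difference in means between two hypothetical groups
g1 <- c(12.1, 9.8, 11.4, 10.9)
g2 <- c(13.2, 12.7, 11.8, 12.5)
obs_diff <- mean(g1) - mean(g2)
pooled <- c(g1, g2)
perm_diffs <- replicate(B, {
  idx <- sample(length(pooled), length(g1))  # random relabelling of the groups
  mean(pooled[idx]) - mean(pooled[-idx])
})
p_perm <- mean(abs(perm_diffs) >= abs(obs_diff))  # two-sided p-value
```

The matrix version is usually faster than the loop because rowMeans() is vectorised, at the cost of holding all resamples in memory at once.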

Cluster Analysis / Segmentation Analysis

Approaches to Cluster Analysis

  • Agglomerative Method
  • Divisive Method

Non-Hierarchical Method: K-means clustering
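
As a sketch of the agglomerative and K-means approaches, the code below clusters the built-in iris measurements. The dataset, the number of clusters (3) and the linkage method are assumptions chosen for illustration. Divisive clustering is available through diana() in the cluster package and is not shown.

```r
set.seed(123)
X <- scale(iris[, 1:4])  # standardise the four numeric columns

# Agglomerative hierarchical clustering: start from single points and merge
hc <- hclust(dist(X), method = "average")
hc_groups <- cutree(hc, k = 3)  # cut the tree into 3 clusters

# K-means: non-hierarchical, the number of clusters is fixed in advance
km <- kmeans(X, centers = 3, nstart = 25)

# Compare the K-means clusters with the known species labels
comparison <- table(km$cluster, iris$Species)
```

nstart = 25 reruns K-means from 25 random starts and keeps the best solution, which makes the result less dependent on the initial centres.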

Multiple / Linear Regression

  • Simple Linear regression
  • Method of Least Squares
  • Multiple linear regression with R
  • Simple examples, dummy explanatory variables, interpreting regression coefficients; finding a parsimonious model

Generalized Linear Models With R

  • Logistic regression with R
  • The need for a different model when the response variable is binary, the logistic transform and fitting the model to some simple examples, deviance residuals
  • Multiple regression and logistic regression as special cases of the generalized linear model
  • The Poisson model for count data.
  • The problem of overdispersion
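
The models above can all be fitted with glm(). This sketch uses the built-in mtcars and warpbreaks datasets, chosen here only for convenience.

```r
# Logistic regression: binary response (am: 0 = automatic, 1 = manual)
logit_fit <- glm(am ~ wt, family = binomial, data = mtcars)
dev_res <- residuals(logit_fit, type = "deviance")  # deviance residuals

# Linear regression as a special case: gaussian family, identity link
lm_fit    <- lm(mpg ~ wt, data = mtcars)
gauss_fit <- glm(mpg ~ wt, family = gaussian, data = mtcars)

# Poisson model for count data
pois_fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Overdispersion check: Pearson chi-square divided by residual df.
# Values well above 1 suggest more variation than the Poisson model allows.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

# One common remedy: quasi-Poisson, same coefficients, rescaled standard errors
quasi_fit <- glm(breaks ~ wool + tension, family = quasipoisson, data = warpbreaks)
```

The squared deviance residuals sum to the residual deviance reported by summary(logit_fit).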

Characterizing Time Series and the Forecasting Goal; Evaluating Predictive Accuracy and Data Partitioning

  • Concept of trend, cyclical, seasonal & random components
  • Visualizing time series
  • Time series components
  • Forecasting vs. explanation
  • Performance evaluation
  • Naive forecasts
  • Different Approaches of Time Series
    1. Stepwise Auto Regression
    2. Exponential smoothing
    3. Winters (Holt-Winters) method
  • Random walk model
  • Unit Root problem
  • Correlogram
  • AR Process (auto regressive)
  • MA Process (moving average)
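
Several of the time series topics above can be sketched on the built-in AirPassengers series. The train/test split point, the helper rmse() and the model orders are assumptions made for this example.

```r
y <- AirPassengers  # monthly airline passengers, 1949 to 1960

# Time series components: trend, seasonal and random parts
parts <- decompose(y, type = "multiplicative")

# Data partitioning: fit on 1949-1958, evaluate on 1959-1960
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))
h <- length(test)

# Naive forecast: repeat the last observed value
naive_fc <- rep(as.numeric(tail(train, 1)), h)

# Simple exponential smoothing (no trend, no seasonal term).
# Leaving beta and gamma at their defaults would fit the seasonal
# Holt-Winters method instead.
ses <- HoltWinters(train, beta = FALSE, gamma = FALSE)
ses_fc <- as.numeric(predict(ses, n.ahead = h))

# Performance evaluation with root mean squared error (hypothetical helper)
rmse <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
rmse_naive <- rmse(as.numeric(test), naive_fc)
rmse_ses   <- rmse(as.numeric(test), ses_fc)

# Correlogram of the differenced log series
corr <- acf(diff(log(train)), plot = FALSE)

# AR(1) and MA(1) models on the differenced log series
ar_fit <- arima(log(train), order = c(1, 1, 0))
ma_fit <- arima(log(train), order = c(0, 1, 1))
```

Differencing (the middle 1 in order) is one way to handle a unit root such as the one in a random walk.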

Analysing Longitudinal Data Using R

  • Examples of longitudinal data
  • Simple graphics for longitudinal data and simple inference using the summary measure approach
  • The ‘long form’ of longitudinal data
  • Mixed-effects models for longitudinal data

Generalized Estimating Equations

  • Modeling the correlational structure of the repeated measurements
  • The generalized estimating equation approach for non-normal response variables in longitudinal data
  • The dropout problem

Our Faculty:

  • Experience in the data science industry across domains such as banking, retail and healthcare
  • More than 8 years of industry experience.
  • Masters in Analytics
  • More than 5 years of teaching experience.