Using R

Overview

R is designed around a true computer language, and it allows users to add functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Getting Started with R

R

  • Definition of R
  • History of R
  • Why use R?
  • Installing, starting, and stopping R
  • File operations and file formats
  • The R interface: writing code and text editors
  • Basic R syntax
  • Library

Data types

  • Character
  • Factor
  • Integer
  • Numeric (double)
  • Date and time
  • Input/Output/Print
  • Matrix: dimension, design, cbind(), rbind()
  • Objects, Vectors, Lists

Files

  • reading files
  • symbols and assignment
  • Importing data from multiple sources/formats like .csv, .txt, .xlsx, SAS and SPSS files
  • Exporting data to multiple formats
  • Handling dataframes: filtering, sorting, merging
  • The plyr package for easy data manipulation

Loops

  • sequences
  • simple loops (iteration)
  • For
  • R While loop
  • R Break & Next
  • R Repeat loop

R Functions

  • Commonly used built in functions
  • Function return value
  • Environment & Scope
  • Recursive function
  • Infix operator
  • Switch function
  • Grouping functions (sapply, lapply, apply, tapply, vapply, mapply, aggregate)
  • Writing user defined functions
  • Installing packages
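
As a brief illustration of several topics in this list, the sketch below defines a user-defined function with a return value, a custom infix operator, and then applies some of the grouping functions to the built-in mtcars data set. The names cv and %+% are hypothetical helpers invented for this example; they are not part of base R.

```r
# A user-defined function: coefficient of variation (hypothetical helper name).
# The value of the last evaluated expression is the return value.
cv <- function(x, na.rm = TRUE) {
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}

# A custom infix operator (hypothetical): string concatenation.
`%+%` <- function(a, b) paste0(a, b)
greeting <- "Hello, " %+% "R"

# Grouping functions applied to the built-in mtcars data.
vars       <- mtcars[, c("mpg", "hp", "wt")]
col_means  <- sapply(vars, mean)                 # named numeric vector
col_cv     <- vapply(vars, cv, numeric(1))       # like sapply, with a type check
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)            # mean mpg per group
agg        <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean) # same, as a data frame
```

vapply() is often preferred over sapply() in functions because it fails early if the result is not of the declared type and length.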

Data Manipulation

  • data structures
  • R Vector (numeric vector, special values, numeric summaries)
  • R List
  • R Matrix
  • R Data frame
  • R Factor
  • subsetting
  • assigning to subsets

R Object & Class

  • S3 class
  • S4 Class
  • Reference Class
  • R Inheritance

R Graphs & Charts

  • R programming Bar Plot
  • R Programming Histogram
  • R Programming Pie Chart
  • R Box Plot
  • R Strip Chart
  • R programming Line Chart

R Advanced Topics

  • R Programming Plot functions
  • R programming Sub Plot
  • R programming Saving Plot
  • R programming Color
  • R programming 3D Plot

Statistical Analysis Using R

Introduction to Analytics & Types of Analytics

  • Evolution of Analytics
  • Definition of Analytics
  • Scope of analytics in different industries
  • Descriptive Analysis
  • Predictive Analysis
  • Prescriptive Analysis

Parametric test

  • Z test
  • T Test
  • Two Independent Sample T Test

The One-Sample T-Test in R

  • A manual computation
    • A data vector
    • The functions: mean(), sd(), (pqrd)norm()
    • Finding confidence intervals
    • Finding p-values
    • Issues with data
      • Using data stored in data frames (attach()/detach(), with())
      • Missing values
      • Cleaning up data
  • EDA graphs
    • histogram()
    • boxplot()
    • densityplot() and qqnorm()

  • The t.test() function
  • P-values
  • Confidence intervals
  • The power of a t test

The Two-Sample T-Tests, the Chi-Square GOF test in R

  • GUIs
    • Rcmdr
    • PMG
  • Tests with two data vectors x, and y
    • Two independent samples, no equal variance assumption
    • Two independent samples, assuming equal variance
    • Matched samples
    • Data stored using a factor to label one of two groups: x ~ f
    • Boxplots for displaying more than two samples
    • The chisq.test() function
      • Goodness of fit (R square and adjusted R Square)
      • Test of homogeneity or independence
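
The variants listed above map onto a few arguments of t.test() and chisq.test(). The sketch below uses simulated samples and made-up counts; all data values and object names are assumptions for illustration only.

```r
# Two hypothetical samples (simulated, fixed seed).
set.seed(2)
x <- rnorm(15, mean = 5.0)
y <- rnorm(15, mean = 5.8)

welch  <- t.test(x, y)                    # no equal variance assumption (default)
pooled <- t.test(x, y, var.equal = TRUE)  # assuming equal variance
paired <- t.test(x, y, paired = TRUE)     # matched samples

# The same data stored with a factor labelling the two groups
d <- data.frame(value = c(x, y),
                f     = factor(rep(c("x", "y"), each = 15)))
by_factor <- t.test(value ~ f, data = d)  # formula interface, Welch by default

# Chi-square goodness of fit: do the counts match equal proportions?
counts <- c(18, 22, 20, 40)               # hypothetical observed counts
gof    <- chisq.test(counts, p = rep(0.25, 4))

# Chi-square test of independence on a hypothetical 2 x 2 table
tab <- matrix(c(12, 8, 5, 15), nrow = 2)
ind <- chisq.test(tab)
```

The formula call value ~ f gives the same statistic as t.test(x, y) because the factor levels are in the same order as the two vectors.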

Concept of Analysis of variance

  • Types of Anova
  • One Way Anova
  • Two Way Anova

Association between Variables

  • Chi square Test for Independence
  • Formulate an analysis plan
  • Analyze sample data
  • Interpret result
  • Scatter plot: interpretation of scatter plots
  • Correlation among variables
  • Type of Correlation
  • Partial Correlation

The Simple Linear Regression Model in R

  • The basics of the Wilkinson-Rogers notation: y ~ x
    • y ~ x: linear regression
  • Scatterplots with regression lines
  • Reading the output of lm()
  • Confidence intervals for beta_0, beta_1
  • Tests on beta_0, beta_1
  • Identifying points in a plot
  • Diagnostic plots
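
A minimal sketch of these steps, using the built-in cars data set (stopping distance against speed) as an assumed example; the course's own data may differ.

```r
# Fit the simple linear regression dist = beta_0 + beta_1 * speed + error
fit <- lm(dist ~ speed, data = cars)

# Coefficient table: estimates, standard errors, t tests on beta_0 and beta_1
coefs <- summary(fit)$coefficients

# 95% confidence intervals for beta_0 and beta_1
ci <- confint(fit, level = 0.95)

# Scatterplot with the fitted regression line
plot(dist ~ speed, data = cars)
abline(fit)

# The four standard diagnostic plots in one 2 x 2 panel
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

In an interactive session, identify(cars$speed, cars$dist) can be called after the scatterplot to label points by clicking on them.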

Bootstrapping in R, Permutation Tests

  • An introduction to bootstrapping
  • The sample() function
  • A bootstrap sample
  • Forming several bootstrap samples
  • Aside: for loops vs. matrices and speed
  • Using the bootstrap
  • An introduction to permutation tests
  • A permutation test simulation
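
The following sketch shows one way these ideas fit together: bootstrap samples drawn with sample(), first in a for loop and then in a single matrix operation, followed by a small permutation test simulation. The data are simulated, and the number of resamples B and all object names are assumptions for illustration.

```r
set.seed(3)
x <- rexp(30, rate = 1 / 5)   # hypothetical skewed sample
B <- 2000                     # number of resamples (assumed)

# Several bootstrap samples with a for loop
boot_means <- numeric(B)
for (b in seq_len(B)) {
  boot_means[b] <- mean(sample(x, replace = TRUE))  # one bootstrap sample
}

# The same idea with a matrix: each row holds one bootstrap sample
idx            <- sample(length(x), length(x) * B, replace = TRUE)
boot_means_mat <- rowMeans(matrix(x[idx], nrow = B))

# Using the bootstrap: percentile 95% interval for the mean
boot_ci <- quantile(boot_means, c(0.025, 0.975))

# Permutation test simulation for a difference in means
g1     <- rnorm(12, mean = 0)
g2     <- rnorm(12, mean = 1)
obs    <- mean(g2) - mean(g1)
pooled <- c(g1, g2)
perm   <- replicate(B, {
  s <- sample(pooled)               # shuffle the group labels
  mean(s[13:24]) - mean(s[1:12])
})
p_perm <- mean(abs(perm) >= abs(obs))   # two-sided permutation p-value
```

The matrix version avoids the explicit loop and is usually faster, at the cost of holding all resampled values in memory at once.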

Cluster Analysis / Segmentation Analysis

Approaches to Cluster Analysis

  • Agglomerative Method
  • Divisive Method

Non-Hierarchical Method: K-means clustering

Multiple / Linear Regression

  • Simple Linear regression
  • Method of Least Squares
  • Multiple linear regression with R
  • Simple examples, dummy explanatory variables, interpreting regression coefficients; finding a parsimonious model

Generalized Linear Models With R

  • Logistic regression with R
  • The need for a different model when the response variable is binary, the logistic transform and fitting the model to some simple examples, deviance residuals
  • Multiple regression and logistic regression as special cases of the generalized linear model
  • The Poisson model for count data.
  • The problem of overdispersion
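
A short sketch of these models with glm(), using the built-in mtcars and warpbreaks data sets as assumed examples. The choice of variables is for illustration only.

```r
# Logistic regression: binary response (am = 1 for manual transmission)
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
dev_res   <- residuals(logit_fit, type = "deviance")   # deviance residuals

# Ordinary regression as a special case of the generalized linear model
g_fit <- glm(mpg ~ wt, data = mtcars, family = gaussian)
l_fit <- lm(mpg ~ wt, data = mtcars)   # same coefficients as g_fit

# Poisson model for count data
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Checking for overdispersion: Pearson chi-square / residual degrees of freedom.
# Values well above 1 suggest the Poisson variance assumption is too tight.
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) /
  df.residual(pois_fit)

# One common remedy: a quasi-Poisson fit, which estimates the dispersion
quasi_fit <- glm(breaks ~ wool + tension, data = warpbreaks,
                 family = quasipoisson)
```

The quasi-Poisson fit keeps the same coefficient estimates as the Poisson fit but scales the standard errors by the estimated dispersion.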

Characterizing Time Series and the Forecasting Goal; Evaluating Predictive Accuracy and Data Partitioning

  • Concepts of trend, cyclical, seasonal, and random components
  • Visualizing time series
  • Time series components
  • Forecasting vs. explanation
  • Performance evaluation
  • Naive forecasts
  • Different Approaches of Time Series
    • Stepwise Auto Regression
    • Exponential smoothing
    • Winters method
  • Random walk model
  • Unit Root problem
  • Correlogram
  • AR Process (auto regressive)
  • MA Process (moving average)

Analysing Longitudinal Data Using R

  • Examples of longitudinal data
  • Simple graphics for longitudinal data and simple inference using the summary measure approach
  • The ‘long form’ of longitudinal data
  • Mixed-effects models for longitudinal data

Generalized Estimating Equations

  • Modeling the correlational structure of the repeated measurements
  • The generalized estimating equation approach for non-normal response variables in longitudinal data
  • The dropout problem

Our Faculty:

  • Experience in Analytics.
  • More than 5 years of industry experience.
  • Analytical Skills.
  • Experience with different domains