SAS Analytics




Statistical Analysis Using SAS

Introduction to Analytics & Types of Analytics

  • Evolution of Analytics
  • Definition of Analytics
  • Scope of analytics in different industries
  • Descriptive Analysis
  • Predictive Analysis
  • Prescriptive Analysis

Parametric Tests

  • Z-test
  • T-test
  • Two independent samples t-test

The One-Sample T-Test in SAS

  • A manual computation
    • A data vector
    • The functions: mean(), sd(), (pqrd)norm()
    • Finding confidence intervals
    • Finding p-values
    • Issues with data
      • Using data stored in data frames (attach()/detach(), with())
      • Missing values
      • Cleaning up data
  • EDA graphs
    • histogram()
    • boxplot()
    • densityplot() and qqnorm()
  • The t.test() function
  • P-values
  • Confidence intervals
  • The power of a t test
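
The manual computation listed above can be sketched as follows. This is a minimal illustration in Python rather than SAS, using only the standard library. The data vector and the hypothesized mean `mu0` are made-up values, and the interval and p-value use the normal distribution (the analogue of qnorm() and pnorm()), so they approximate rather than reproduce a t-based result.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical data vector (illustrative values, not course data).
x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
mu0 = 5.0  # hypothesized mean under H0 (an assumption for this sketch)

n = len(x)
xbar = mean(x)
s = stdev(x)            # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)   # standard error of the mean

# The statistic has the same form for a z test and a t test;
# only the reference distribution differs.
stat = (xbar - mu0) / se

# Normal-approximation 95% confidence interval and two-sided p-value.
z = NormalDist()
crit = z.inv_cdf(0.975)
ci = (xbar - crit * se, xbar + crit * se)
p_value = 2 * (1 - z.cdf(abs(stat)))

print(f"mean={xbar:.3f} sd={s:.3f} statistic={stat:.3f}")
print(f"95% CI=({ci[0]:.3f}, {ci[1]:.3f}) p={p_value:.4f}")
```

For a sample this small, a t reference distribution with n - 1 degrees of freedom would give a wider interval than the normal approximation shown here.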

The Two-Sample T-Tests, the Chi-Square GOF test in SAS

  • GUIs
    • Rcmdr
    • PMG
  • Tests with two data vectors x, and y
    • Two independent samples, no equal-variance assumption
    • Two independent samples, assuming equal variance
    • Matched samples
    • Data stored using a factor to label one of two groups: x ~ f
    • Boxplots for displaying more than two samples
    • The chisq.test() function
      • Goodness of fit (R square and adjusted R Square)
      • Test of homogeneity or independence
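
The three two-sample cases listed above (unequal variances, equal variances, matched samples) differ only in how the standard error is formed. The sketch below shows this in Python rather than SAS. The function names and the data are hypothetical, and only the test statistics are computed, not p-values.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """t statistic assuming equal variances (pooled variance estimate)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def welch_t(x, y):
    """t statistic and Welch-Satterthwaite df, no equal-variance assumption."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

def paired_t(x, y):
    """t statistic for matched samples, computed on the pairwise differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / math.sqrt(variance(d) / len(d))

# Hypothetical measurements for two groups of equal size.
x = [12.1, 11.8, 12.6, 12.9, 12.3]
y = [11.2, 11.9, 11.5, 11.0, 11.7]

t_welch, df_welch = welch_t(x, y)
print("pooled:", pooled_t(x, y))
print("Welch: ", t_welch, "df:", df_welch)
print("paired:", paired_t(x, y))
```

When both samples have the same size, the pooled and Welch statistics coincide and only the degrees of freedom differ.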

Concept of Analysis of Variance

  • Types of ANOVA
  • One-Way ANOVA
  • Two-Way ANOVA
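
As an illustration of the one-way case, the F statistic compares variation between group means with variation within groups. The Python sketch below (not SAS) uses a hypothetical function name and made-up groups, and it returns the statistic and its degrees of freedom only.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f_stat, df1, df2)
```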

Association between Variables

  • Chi-square test for independence
  • Formulate an analysis plan
  • Analyze sample data
  • Interpret result
  • Scatter plot: interpretation of scatter plots
  • Correlation among variables
  • Types of correlation
  • Partial Correlation

The Simple Linear Regression Model in SAS

  • The basics of the Wilkinson-Rogers notation: y ~ x
    • y ~ x: linear regression
  • Scatterplots with regression lines
  • Reading the output of lm()
  • Confidence intervals for beta_0, beta_1
  • Tests on beta_0, beta_1
  • Identifying points in a plot
  • Diagnostic plots
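
The model y ~ x is fitted by least squares, which has a closed form for one predictor. The Python sketch below (not SAS) estimates beta_0 and beta_1 and computes residuals, the raw material for diagnostic plots. The function names and data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept
    return b0, b1

def residuals(x, y, b0, b1):
    """Observed minus fitted values."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [0, 1, 2, 3]
y = [1, 3, 5, 7]
b0, b1 = fit_line(x, y)
print(b0, b1, residuals(x, y, b0, b1))
```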

Bootstrapping in SAS, Permutation Tests

  • An introduction to bootstrapping
  • The sample() function
  • A bootstrap sample
  • Forming several bootstrap samples
  • Aside: for loops vs. matrices and speed
  • Using the bootstrap
  • An introduction to permutation tests
  • A permutation test simulation
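
Both ideas in this unit rest on resampling: the bootstrap samples with replacement from one data vector, and a permutation test reshuffles group labels. The Python sketch below (not SAS) shows a percentile bootstrap interval for the mean and a two-sample permutation test. The function names, the data, and the seed are assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_means(x, n_boot, rng):
    """Means of n_boot bootstrap samples drawn with replacement from x."""
    n = len(x)
    return [mean(rng.choices(x, k=n)) for _ in range(n_boot)]

def percentile_ci(values, level=0.95):
    """Percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lo, hi

def permutation_p_value(x, y, n_perm, rng):
    """Two-sided p-value for a difference in means, by shuffling labels."""
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (count + 1) / (n_perm + 1)

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical samples.
x = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
y = [4.8, 5.0, 4.7, 5.2, 4.9, 5.1, 4.6, 5.0]

boot = bootstrap_means(x, 500, rng)
ci_low, ci_high = percentile_ci(boot)
p_perm = permutation_p_value(x, y, 500, rng)
print(ci_low, ci_high, p_perm)
```

The list comprehension in `bootstrap_means` plays the role of the loop in the aside above; a matrix-based version would draw all resamples at once.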

Cluster Analysis / Segmentation Analysis

Approaches to Cluster Analysis

  • Agglomerative Method
  • Divisive Method

Non-Hierarchical Method: K-Means Clustering
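
K-means alternates between assigning each observation to its nearest center and moving each center to the mean of its cluster. The Python sketch below (not SAS) does this for one-dimensional data only. The function name, the starting centers, and the data are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, centers, n_iter=20):
    """Basic k-means on one-dimensional data; returns (centers, clusters)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(n_iter):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters

# Hypothetical data with two well-separated groups.
values = [1, 2, 3, 10, 11, 12]
final_centers, final_clusters = kmeans_1d(values, centers=[0, 5])
print(final_centers, final_clusters)
```

The result depends on the starting centers, which is why the method is usually run from several starting points.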

Multiple/ Linear Regression

  • Simple Linear regression
  • Method of Least Squares
  • Multiple linear regression with SAS
  • Simple examples, dummy explanatory variables, interpreting regression coefficients; finding a parsimonious model

Generalized Linear Models With SAS

  • Logistic regression with SAS
  • The need for a different model when the response variable is binary, the logistic transform and fitting the model to some simple examples, deviance residuals
  • Multiple regression and logistic regression as special cases of the generalized linear model
  • The Poisson model for count data
  • The problem of overdispersion
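
The logistic transform mentioned above maps a probability in (0, 1) to the whole real line, so a linear predictor can be used with a binary response. A minimal Python sketch (not SAS) follows. The function names and the coefficient values are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logistic transform: maps a linear predictor to a probability."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients for a model logit(p) = b0 + b1 * x.
b0, b1 = -1.0, 0.5
for x in (0, 2, 4):
    print(x, inv_logit(b0 + b1 * x))
```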

Characterizing Time Series and the Forecasting Goal; Evaluating Predictive Accuracy and Data Partitioning

  • Concepts of trend, cyclical, seasonal, and random components
  • Visualizing time series
  • Time series components
  • Forecasting vs. explanation
  • Performance evaluation
  • Naive forecasts
  • Different Approaches of Time Series
    • Stepwise autoregression
    • Exponential smoothing
    • Winters method
  • Random walk model
  • Unit Root problem
  • Correlogram
  • AR Process (auto regressive)
  • MA Process (moving average)
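
Naive forecasts and performance evaluation can be illustrated together: a forecasting method is worth using only if it beats the naive forecast on the chosen accuracy measure. The Python sketch below (not SAS) compares one-step-ahead naive forecasts with simple exponential smoothing by mean absolute error. The function names, the series, and the smoothing weight `alpha` are assumptions.

```python
def naive_forecasts(series):
    """One-step-ahead naive forecasts: the forecast for time t is the value at t - 1."""
    return series[:-1]

def exp_smooth_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return forecasts

def mae(actual, forecast):
    """Mean absolute error between paired actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical series; forecasts are compared against values from t = 1 onward.
series = [20.0, 22.0, 21.0, 23.0, 24.0, 23.5, 25.0, 26.0]
actual = series[1:]

print("naive MAE:    ", mae(actual, naive_forecasts(series)))
print("smoothing MAE:", mae(actual, exp_smooth_forecasts(series, alpha=0.5)))
```

With `alpha` set to 1 the smoothing forecasts reduce to the naive forecasts, which is a quick check that the two functions line up.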

Analysing Longitudinal Data Using SAS

  • Examples of longitudinal data
  • Simple graphics for longitudinal data and simple inference using the summary measure approach
  • The ‘long form’ of longitudinal data
  • Mixed-effects models for longitudinal data

Generalized Estimating Equations

  • Modeling the correlational structure of the repeated measurements
  • The generalized estimating equation approach for non-normal response variables in longitudinal data
  • The dropout problem