Big Data Analytics

Spark Predictive Analytics on Big Data

Why Spark? Explain Spark and Hadoop Distributed File System 

  • What is Spark
  • Comparison with Hadoop
  • Components of Spark

Spark Components, Common Spark Algorithms-Iterative Algorithms, Graph Analysis, Machine Learning 

  • Apache Spark – Introduction, Consistency, Availability, Partition Tolerance
  • Spark Unified Stack
  • Spark Components
  • Comparison with Hadoop – Scalding example, Mahout, Storm, graph

Running Spark on a Cluster, Writing Spark Applications using Python, Java, Scala

  • Explain Python example
  • Show how to install Spark
  • Explain driver program
  • Explain SparkContext with example
  • Define weakly typed variable
  • Combine Scala and Java seamlessly
  • Explain concurrency and distribution
  • Explain what a trait is
  • Explain higher-order functions with example
  • Define FIFO scheduler
  • Advantages of Spark
  • Example of Lambda using Spark
  • Explain MapReduce with example
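
The lambda, higher-order function, and MapReduce topics above can be illustrated with a minimal sketch in plain Python. It does not use Spark itself; the sample lines and the names `merge` and `ranked` are assumptions made up for illustration. The map phase emits (word, 1) pairs and the reduce phase sums them per word, which is the same shape a Spark word count follows.

```python
from functools import reduce

# Hypothetical input; in Spark this would come from a text file or RDD.
lines = ["spark makes big data simple", "big data needs spark"]

# Map phase: split each line into words and emit (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: fold the pairs into a dictionary of per-word totals.
def merge(counts, pair):
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts

word_counts = reduce(merge, pairs, {})

# sorted() is a higher-order function: it takes a lambda as its sort key.
# Order by count (descending), then alphabetically.
ranked = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(ranked)
```

Here `reduce` and `sorted` both accept a function as an argument, which is what makes them higher-order functions.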

RDD and its operations

  • Difference between RISC and CISC
  • Define Apache Mesos
  • Cartesian product between two RDDs
  • Define count
  • Define Filter
  • Define Fold
  • Define API Operations
  • Define Factors
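
As a rough illustration of the count, filter, fold, and Cartesian product operations listed above, the sketch below uses plain Python collections as stand-ins for RDDs. This is an analogy only: the lists `a` and `b` are made-up sample data, and real RDD operations run distributed across partitions (for example, Spark applies the zero value of `fold` once per partition as well as when merging results).

```python
from functools import reduce
from itertools import product

# Hypothetical sample data standing in for two RDDs.
a = [1, 2, 3, 4]
b = ["x", "y"]

# Analogue of rdd.count(): number of elements.
count = len(a)

# Analogue of rdd.filter(f): keep elements for which f is true.
evens = list(filter(lambda n: n % 2 == 0, a))

# Analogue of rdd.fold(0, op): combine elements starting from a zero value.
total = reduce(lambda acc, n: acc + n, a, 0)

# Analogue of rdd.cartesian(other): every pairing of an element of a with one of b.
pairs = list(product(a, b))

print(count, evens, total, len(pairs))
```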

Spark, Hadoop, and the Enterprise Data Centre, Common Spark Algorithms 

  • How a Hadoop cluster differs from Spark
  • Define writing data
  • Explain sequence file and its usefulness
  • Define protocol buffers
  • Define text file, CSV, Object Files and File System
  • Define sparse matrices
  • Explain RDD and Compression
  • Explain data stores and its usefulness

Spark Streaming

  • Define Elasticsearch
  • Explain Streaming and its usefulness
  • Apache BookKeeper
  • Define DStream
  • Define MapReduce word count
  • Explain Parquet
  • Scala ORM
  • Define MLlib
  • Explain multigraphs in GraphX and their usefulness
  • Define property graph

Persistence in Spark

  • Persistence
  • Motivation
  • Example
  • Transformation
  • Scala and Python
  • Examples – K-means
  • Latent Dirichlet Allocation (LDA)

Broadcast and accumulator

  • Motivation
  • Broadcast Variables
  • Example: Join
  • Alternative if one table is small
  • Better version with broadcast
  • How to create a Broadcast
  • Accumulators motivation
  • Example: Join
  • Accumulator Rules
  • Custom accumulators
  • Another common use
  • Creating an accumulator using spark context object

Spark SQL and RDD

  • Introduction
  • Spark SQL main capabilities
  • Spark SQL usage diagram
  • Spark SQL
  • Important topics in Spark SQL – DataFrames
  • Twitter language analysis

Operations/Accumulators/Traits 

  • How Parallelism Takes Place
  • The Master Parameter
  • Join Operations Example
  • Accumulators
  • Traits

Scheduling/Partitioning

  • Task Scheduling/Distribution
  • Scheduling Across Applications
  • Static Partitioning
  • Dynamic Sharing
  • Scheduling Within An Application
  • Fair Scheduling
  • High Availability Of Spark Master
  • Standby Masters With ZooKeeper
  • Single Node Recovery With Local File System
  • Higher-Order Functions

Capacity Planning in Spark

  • Practicals: Creating Maps, Transformations
  • Capacity planning in Spark
  • Concurrency in Java
  • Concurrency in Scala

Log Analysis

  • Array Buffers
  • Compact Buffer
  • Protocol Buffer
  • Log Analysis With Spark
  • First Log Analyzers In Spark

Introduction to Scala

Scala Overview  

Pattern Matching

  • Advantages of Scala
  • REPL (Read-Eval-Print Loop)
  • Language Features
  • Type Inference
  • Higher-order functions
  • Option
  • Pattern Matching
  • Collections
  • Currying
  • Traits
  • Application Space

Executing the Scala code

  • Uses of the Scala interpreter
  • Example of static object timer in Scala
  • Testing of String equality in Scala
  • Implicit classes in Scala with examples
  • Recursion in Scala
  • Currying in Scala with examples
  • Classes in Scala

Classes concept in Scala

  • Constructor
  • Constructor overloading
  • Properties
  • Abstract classes
  • Type hierarchy in Scala
  • Object equality
  • Val and var methods

Case classes and pattern matching

  • Sealed traits
  • Case classes
  • Constant pattern in case classes
  • Wildcard pattern
  • Variable pattern
  • Constructor pattern
  • Tuple pattern

Concepts of traits with example

  • Java equivalents
  • Advantages of traits
  • Avoiding boilerplate code
  • Linearization of traits
  • Modelling a real-world example

Scala-Java Interoperability

  • How traits are implemented in Scala and Java
  • How extending multiple traits is handled

Scala collections 

  • Classification of Scala collections
  • Iterable
  • Iterator and Iterable
  • List sequence example in Scala

Mutable collections vs. Immutable collections

  • Array in Scala
  • List in Scala
  • Difference between List and ListBuffer
  • ArrayBuffer
  • Queue in Scala
  • Dequeue in Scala
  • Mutable queue in Scala
  • Stacks in Scala
  • Sets and maps in Scala
  • Tuples

Use Case bobsrockets package

  • Different import types
  • Selective imports
  • Testing-Assertions
  • ScalaTest test case – ScalaTest FunSuite
  • JUnit test in Scala
  • Interface for JUnit via JUnit3Suite in ScalaTest
  • SBT
  • Directory structure for packaging a Scala application