© 2015 IBM Corporation. UNIT 2: BigData Analytics with Spark and Spark Platforms. Shelly Garion, IBM Research -- Haifa.

2 Outline
• Map/Reduce
• Scala
• Spark Core API
• Transformations and Actions
• Spark Platforms:
  – MLlib (machine learning)
  – GraphX (graph processing)
  – SQL
  – Streaming
• What’s new?

3 How to Analyze BigData?

4 Basic Example: Word Count (Spark & Python)
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
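The slide's code is not part of this transcript. As a minimal sketch of the word-count pattern it names, the following plain-Python version (no Spark dependency) walks through the same three steps that a PySpark job would normally express with `flatMap`, `map` and `reduceByKey` on an RDD. The function name `word_count` and the sample lines are illustrative assumptions, not taken from the original slide.

```python
from collections import defaultdict


def word_count(lines):
    """Count words using the three classic steps of the word-count job."""
    # Step 1 ("flatMap"): split every line into words and flatten the result.
    words = [word for line in lines for word in line.split()]
    # Step 2 ("map"): pair every word with the count 1.
    pairs = [(word, 1) for word in words]
    # Step 3 ("reduceByKey"): sum the counts of identical words.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical input standing in for the lines of a text file.
lines = ["to be or not to be", "that is the question"]
counts = word_count(lines)
print(counts)
```

On a cluster the per-key summation in step 3 runs in parallel across partitions, which is why the merge function has to be associative.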

5 Basic Example: Word Count (Spark & Scala)
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

6 Scala
• Spark was originally written in Scala
  – Java and Python APIs were added later
• Scala: a high-level language for the JVM
  – Object-oriented
  – Functional programming
  – Favors immutability
  – Inspired by criticism of the shortcomings of Java
• Static types
  – Comparable in speed to Java
  – Type inference saves us from having to write explicit types most of the time
• Interoperates with Java
  – Can use any Java class
  – Can be called from Java code

7 Scala vs. Java
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

8 Spark
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

9 Spark & Scala: Creating RDD
Slide note: "or SoftLayer object store"
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

10 Spark & Scala: Basic Transformations
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

11 Spark & Scala: Basic Actions
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
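In Spark, transformations such as `map` and `filter` only describe a new RDD, and nothing is computed until an action such as `collect` or `count` is called. The slide code is not reproduced in this transcript, so the sketch below imitates that behaviour with Python's built-in lazy iterators rather than with Spark itself. The helper `square` and the `evaluated` log are hypothetical names introduced only to make the evaluation order visible.

```python
evaluated = []  # log of inputs that have actually been processed


def square(x):
    evaluated.append(x)  # record the moment the function really runs
    return x * x


nums = [1, 2, 3, 4]

# "Transformations": map and filter return lazy iterators, nothing runs yet.
squares = map(square, nums)
even_squares = filter(lambda x: x % 2 == 0, squares)
ran_before_action = len(evaluated)

# "Action": materialising the result forces the whole pipeline to execute.
result = list(even_squares)
ran_after_action = len(evaluated)

print(ran_before_action, ran_after_action, result)
```

Deferring the work this way is what lets an engine see the whole chain of transformations before deciding how to execute it.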

12 Spark & Scala: Key-Value Operations
Source: Holden Karau, Making interactive BigData applications fast and easy, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
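Key-value operations work on collections of `(key, value)` pairs. As a rough single-machine illustration of what operations named `reduceByKey`, `groupByKey` and `join` compute, here is a plain-Python sketch. The function names mirror the Spark operations, but the implementations, the sample data and the variable names are assumptions for illustration only and say nothing about how Spark distributes the work.

```python
from collections import defaultdict


def reduce_by_key(pairs, func):
    """Merge all values that share a key with a binary function."""
    out = {}
    for key, value in pairs:
        out[key] = func(out[key], value) if key in out else value
    return out


def group_by_key(pairs):
    """Collect all values that share a key into a list."""
    out = defaultdict(list)
    for key, value in pairs:
        out[key].append(value)
    return dict(out)


def join(left, right):
    """Inner join: pair up values from both sides that share a key."""
    grouped = group_by_key(right)
    return [(k, (lv, rv)) for k, lv in left for rv in grouped.get(k, [])]


# Hypothetical data: page visits and page titles keyed by URL.
visits = [("index.html", "1.2.3.4"), ("about.html", "3.4.5.6"),
          ("index.html", "1.3.3.1")]
page_names = [("index.html", "Home"), ("about.html", "About")]

hits = reduce_by_key([(url, 1) for url, _ in visits], lambda a, b: a + b)
joined = join(visits, page_names)
print(hits)
print(joined)
```

Note that `reduce_by_key` never holds more than one value per key, whereas `group_by_key` keeps every value, which is the usual reason to prefer reducing over grouping for aggregations.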

13 Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

14 Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

15 Example: Spark Core API
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

16 Example: Spark Core API
Better implementation:
Source: Aaron Davidson, A deeper understanding of Spark internals, Spark Summit July 2014, https://spark-summit.org/2014/

17 Example: PageRank
How to implement the PageRank algorithm using Map/Reduce?
Source: Hossein Falaki, Numerical Computing with Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/
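One common way to answer the question on this slide is to treat each iteration as a map step, in which every page sends an equal share of its rank to the pages it links to, followed by a reduce step, in which every page sums the shares it received. The plain-Python sketch below follows that scheme. It is not the code from the slide, and the damping factor of 0.85, the iteration count and the three-page graph are assumptions chosen for illustration.

```python
def pagerank(links, iterations=10, damping=0.85):
    """Iterative PageRank over an adjacency dict {page: [linked pages]}.

    Assumes every page appears as a key and has at least one outgoing link.
    """
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        # "Map": each page contributes rank / out-degree to every neighbour.
        contribs = {page: 0.0 for page in links}
        for page, neighbours in links.items():
            share = ranks[page] / len(neighbours)
            for neighbour in neighbours:
                contribs[neighbour] += share
        # "Reduce": combine the summed contributions into the new rank.
        ranks = {page: (1 - damping) + damping * total
                 for page, total in contribs.items()}
    return ranks


# Hypothetical three-page web graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(ranks)
```

Because the link structure is reused unchanged in every iteration, this is the kind of workload where keeping data in memory between iterations pays off.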

18 Spark Platform
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

19 Spark Platform: GraphX
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

20 Spark Platform: GraphX Example: PageRank
PageRank is implemented using Pregel graph processing.

21 Spark Platform: MLlib
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

22 Spark Platform: MLlib Example: K-Means Clustering
Goal: Segment tweets into clusters by geolocation using Spark MLlib K-means clustering.
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/

23 Spark Platform: MLlib Example: K-Means Clustering
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/

24 Spark Platform: MLlib Example: K-Means Clustering
Source: https://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoi-diagram-using-spark-and-mllib/
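The clustering code on these slides is not included in the transcript. To show what K-means does with geolocated points, here is a small plain-Python sketch of the standard assign-then-recompute loop (Lloyd's algorithm). It does not use MLlib, the coordinates and starting centers are made up, and it treats latitude and longitude as flat coordinates, which is only a rough approximation of real distances.

```python
def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: (point[0] - centers[i][0]) ** 2
        + (point[1] - centers[i][1]) ** 2,
    )


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Update step: move each center to the mean of its points.
        # An empty cluster keeps its previous center.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers


# Hypothetical (latitude, longitude) pairs standing in for tweet locations.
points = [(40.7, -74.0), (40.8, -73.9), (34.0, -118.2), (34.1, -118.3)]
centers = kmeans(points, centers=[(40.0, -75.0), (35.0, -118.0)])
print(centers)
```

Passing the initial centers explicitly keeps the sketch deterministic. Library implementations usually choose them by random or seeded initialisation instead.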

25 Spark Platform: Streaming
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

26 Spark Platform: Streaming Example

27 Spark Platform: SQL
Source: Patrick Wendell, Big Data Processing, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

28 Spark Platform: SQL & MLlib Example
Code comment on slide: // SVM using Stochastic Gradient Descent
Source: Xiangrui Meng, MLlib: scalable machine learning on Spark, Spark Workshop April 2014, http://stanford.edu/~rezab/sparkworkshop/

29 What’s new in 2015?
• SparkR (R interface)
• DataFrame API, via Spark SQL
• Spark ML, with support for pipelines
Source: Matei Zaharia, New directions for Spark in 2015, Spark Summit East March 2015, https://spark-summit.org/east-2015/
