
1 Raju Subba Open Source Project: Apache Spark

2 Introduction
Apache Spark is an open-source big data analytics engine. Spark provides APIs in Scala, Java, and Python, and also supports other languages such as R. It runs in memory and uses disk when needed. The project reports that it runs up to 100x faster in memory and 10x faster on disk than Hadoop MapReduce. It supports batch, interactive, and iterative analytics. It can run on clusters managed by Hadoop YARN or Apache Mesos, or in standalone mode. It integrates well with the Hadoop ecosystem and with data sources such as HDFS, Amazon S3, Hive, HBase, and Cassandra.

3 Timeline
2007: Dryad paper published by Microsoft.
2009: Started at UC Berkeley as a class project to build a cluster management framework that supports different kinds of cluster computing systems.
2010: Spark became open source.
2013: Donated to the Apache Software Foundation as Apache Spark.
2015: Spark version 1.4 released.

4 Why use Apache Spark?
Speed: Runs programs very fast.
Ease of use: Write applications quickly in Java, Scala, Python, or R.
Generality: Combine SQL, streaming, and complex analytics.
Runs everywhere: Spark runs on Hadoop, on Mesos, standalone, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3.

5 Components of Spark
Spark SQL: A Spark module for structured data processing.
Spark Streaming: Makes it easy to build scalable, fault-tolerant streaming applications.
MLlib: A machine learning library that provides algorithms designed to scale out on a cluster for classification, regression, clustering, and collaborative filtering.
GraphX: A library for manipulating graphs and performing graph-parallel operations.

6 Who Uses Spark

7 Any questions or comments?


