1 Spark System xuejilong@gmai.com

2 Background Matei Zaharia
 [June 2010. HotCloud 2010] Spark: Cluster Computing with Working Sets
 [April 2012. NSDI 2012. Best Paper Award] Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
 [June 2012. HotCloud 2012] Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters (Streaming Spark)
 [Nov 2013. SOSP 2013] Discretized Streams: Fault-Tolerant Streaming Computation at Scale

3 Resilient Distributed Datasets A Fault-Tolerant Abstraction for In-Memory Cluster Computing

4 Motivation MapReduce greatly simplified “big data” analysis on large, unreliable clusters, but it is not well suited to some applications:  iterative computation (multiple stages)  interactive ad-hoc queries This led to specialized frameworks, e.g., Pregel for graph processing

5 Motivation Complex apps and interactive queries both need one thing that MapReduce lacks: efficient primitives for data sharing. In MapReduce, the only way to share data across jobs is stable storage, which is slow!

6 Examples [Diagram: an iterative job writes to and re-reads HDFS between every iteration; interactive queries each re-read the input from HDFS to produce their results.] Slow due to replication and disk I/O, but necessary for fault tolerance.

7 Goal: In-Memory Data Sharing [Diagram: after one-time processing of the input, iterations and queries share data in memory instead of going through HDFS.] Design a distributed memory abstraction that is both fault-tolerant and efficient.

8 Resilient Distributed Datasets (RDDs) Restricted form of distributed shared memory »Immutable, partitioned collections of records »Can only be built through coarse-grained deterministic transformations (map, filter, join, …) Efficient fault recovery using lineage »Log one operation to apply to many elements »Recompute lost partitions on failure

9 RDD Recovery [Diagram: the in-memory sharing picture from slide 7 revisited: after one-time processing, iterations and queries run against RDDs that can be recovered if lost.]

10 Generality of RDDs Despite their restrictions, RDDs can express surprisingly many parallel algorithms Unify many current programming models  Data flow models: MapReduce, Dryad, SQL, …  Specialized models for iterative apps: BSP (Pregel), iterative MapReduce (Haloop), bulk incremental, …

11 Tradeoff Space [Chart: granularity of updates (fine to coarse) vs write throughput (from network bandwidth up to memory bandwidth). Fine-grained systems (K-V stores, databases, RAMCloud) are best for transactional workloads; coarse-grained systems (HDFS, RDDs) are best for batch workloads.]

12 Spark Programming Interface DryadLINQ-like API in the Scala language Provides:  Resilient distributed datasets (RDDs)  Operations on RDDs:  transformations (build new RDDs)  actions (compute and output results)  Control of each RDD’s partitioning (layout across nodes) and persistence (storage in RAM, on disk, etc)
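As a small illustration of these controls (a sketch only; the input path placeholder, the key layout, the partition count, and the storage level are arbitrary choices for the example, not from the slides):

import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.storage.StorageLevel

def partitioningAndPersistence(sc: SparkContext): Unit = {
  // Transformations build new RDDs lazily; nothing runs yet.
  val pairs = sc.textFile("hdfs://...")
    .map(line => (line.split('\t')(0), line))   // key each record by its first field

  // Control layout across nodes: hash-partition by key into 16 partitions.
  val partitioned = pairs.partitionBy(new HashPartitioner(16))

  // Control persistence: keep in RAM, spilling to disk if partitions do not fit.
  partitioned.persist(StorageLevel.MEMORY_AND_DISK)

  // Actions compute and output results, materializing the RDD.
  println(partitioned.count())
}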

13 Example: Log Mining Load error messages from a log into memory, then interactively search for various patterns.
lines = spark.textFile("hdfs://...") (base RDD)
errors = lines.filter(_.startsWith("ERROR")) (transformed RDD)
messages = errors.map(_.split('\t')(2))
messages.persist()
messages.filter(_.contains("foo")).count (action)
messages.filter(_.contains("bar")).count
[Diagram: the master ships tasks to workers, which build cached message partitions (Msgs. 1-3) from HDFS blocks 1-3 and return results.]
Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data). Result: scaled to 1 TB of data in 5-7 sec (vs 170 sec for on-disk data).
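A self-contained version of the same job, as a minimal sketch (the local master setting, app name, and driver boilerplate are assumptions for illustration; the slide's spark handle corresponds to a SparkContext, and the elided HDFS path is left as in the slide):

import org.apache.spark.{SparkConf, SparkContext}

object LogMining {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogMining").setMaster("local[*]"))

    val lines    = sc.textFile("hdfs://...")               // base RDD
    val errors   = lines.filter(_.startsWith("ERROR"))     // transformed RDD (lazy)
    val messages = errors.map(_.split('\t')(2))            // keep the third tab-separated field
    messages.persist()                                     // cache in memory for reuse

    // Actions trigger computation; later queries reuse the cached messages.
    println(messages.filter(_.contains("foo")).count())
    println(messages.filter(_.contains("bar")).count())

    sc.stop()
  }
}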

14 Fault Recovery RDDs track the graph of transformations that built them (their lineage) to rebuild lost data. E.g.: messages = textFile(...).filter(_.contains("error")).map(_.split('\t')(2)) [Diagram: lineage chain HadoopRDD (path = hdfs://…) -> FilteredRDD (func = _.contains(...)) -> MappedRDD (func = _.split(…)).]
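In practice the recorded lineage can be inspected directly; a one-line sketch continuing the log-mining example above (toDebugString is a standard RDD method; its exact output format varies by Spark version):

// Prints the chain of RDDs (Hadoop -> Filtered -> Mapped) that Spark would
// replay to rebuild a lost partition of `messages`.
println(messages.toDebugString)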

15 Fault Recovery Results [Chart: per-iteration running times, with the iteration where the failure happens marked.]

16 Example: PageRank The links and ranks datasets are repeatedly joined, so the Links RDD is reused across iterations, and the two can be co-partitioned (e.g. hash both on URL) to avoid shuffles. Problem: the loop generates a lot of intermediate RDDs. [Diagram: Links (url, neighbors) joined with Ranks 0 (url, rank) to produce Contribs 0, reduced into Ranks 1, then joined again to produce Contribs 1 and Ranks 2, and so on.]
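A compact sketch of this loop in the RDD API (the input path and format, the partition count, the fixed damping constants, and the iteration count are assumptions for illustration; co-partitioning Links with the ranks is the point of the example):

import org.apache.spark.{HashPartitioner, SparkContext}

def pageRank(sc: SparkContext, iterations: Int): Unit = {
  val partitioner = new HashPartitioner(8)

  // Assumed input format: one "url neighborUrl" pair per line.
  val links = sc.textFile("hdfs://.../links.txt")
    .map { line => val p = line.split("\\s+"); (p(0), p(1)) }
    .groupByKey(partitioner)   // co-partition by URL ...
    .persist()                 // ... and keep in memory, since it is reused every iteration

  var ranks = links.mapValues(_ => 1.0)   // mapValues preserves the partitioning

  for (_ <- 1 to iterations) {
    val contribs = links.join(ranks).flatMap {
      case (_, (neighbors, rank)) => neighbors.map(dest => (dest, rank / neighbors.size))
    }
    // Reduce contributions back onto the same partitioner, so the next join
    // with Links again avoids shuffling the links side.
    ranks = contribs.reduceByKey(partitioner, _ + _).mapValues(0.15 + 0.85 * _)
  }

  ranks.collect().foreach { case (url, rank) => println(s"$url\t$rank") }
}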

17 PageRank Performance

18 Implementation Runs on Mesos [NSDI 11] to share clusters with Hadoop. Can read from any Hadoop input source (HDFS, S3, …). [Diagram: Spark, Hadoop, and MPI frameworks running side by side on Mesos-managed nodes.] No changes to the Scala language or compiler.

19 Programming Models Implemented on Spark RDDs can express many existing parallel models »MapReduce, DryadLINQ »Pregel graph processing [200 LOC] »Iterative MapReduce [200 LOC] »SQL: Hive on Spark (Shark) »Enables apps to efficiently intermix these models All are based on coarse-grained operations

20 Behavior with Insufficient RAM

21 Scalability [Charts: scaling results for Logistic Regression and K-Means.]

22 Conclusion RDDs offer a simple and efficient programming model for a broad range of applications Leverage the coarse-grained nature of many parallel algorithms for low-overhead recovery

23 Discretized Streams Fault-Tolerant Streaming Computation at Scale

24 Motivation Many important applications need to process large data streams arriving in real time:  User activity statistics (e.g. Facebook's Puma)  Spam detection  Traffic estimation  Network intrusion detection Target: large-scale apps that must run on tens to hundreds of nodes with O(1 sec) latency

25 Challenge To run at large scale, the system has to be both:  Fault-tolerant: recover quickly from failures and stragglers  Cost-efficient: do not require significant hardware beyond that needed for basic processing Existing streaming systems don't have both properties

26 Traditional Streaming Systems “Record-at-a-time” processing model:  Each node has mutable state  For each record, update state & send new records [Diagram: input records are pushed through nodes 1-3, each of which holds mutable state.]

27 Fault Tolerance in These Systems Fault tolerance via replication or upstream backup. [Diagrams: (left) each of nodes 1-3 is mirrored by a synchronized replica node 1'-3'; (right) nodes 1-3 share a single standby node.] Replication: fast recovery, but 2x hardware cost. Upstream backup: only needs one standby, but is slow to recover.

28 Observation Batch processing models for clusters (e.g. MapReduce) provide fault tolerance efficiently  Divide job into deterministic tasks  Rerun failed/slow tasks in parallel on other nodes Idea: run a streaming computation as a series of very small, deterministic batches  Same recovery schemes at much smaller timescale  Work to make batch size as small as possible
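As a toy illustration of the idea (not the D-Streams implementation itself, just plain Spark; the Event type, the 1-second interval, and the per-URL count are assumptions for the example):

import org.apache.spark.SparkContext

case class Event(timeMs: Long, url: String)

// Run one deterministic batch job over a single interval's worth of records.
def processInterval(sc: SparkContext, events: Seq[Event]): Map[String, Long] =
  sc.parallelize(events)
    .map(ev => (ev.url, 1L))
    .reduceByKey(_ + _)        // ordinary batch operation; rerun on failure
    .collect().toMap

// Discretize the stream into 1-second micro-batches and process each in turn.
def run(sc: SparkContext, stream: Seq[Event]): Unit =
  stream.groupBy(_.timeMs / 1000).toSeq.sortBy(_._1).foreach {
    case (second, batch) => println(s"t=$second: " + processInterval(sc, batch))
  }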

29 Discretized Stream [Diagram: at t = 1, t = 2, … the system pulls the input into an immutable dataset (stored reliably), applies a batch operation, and produces immutable datasets (output or state) stored in memory without replication.]

30 Parallel Recovery If a node fails or straggles, recompute its dataset partitions in parallel on other nodes. [Diagram: a map from the input dataset to the output dataset, with lost output partitions recomputed in parallel.] Faster recovery than upstream backup, without the cost of replication.

31 Speed up Batch Jobs  Prototype built on Spark  Processes 2 GB/s (20M records/s) of data on 50 nodes at sub-second latency. Max throughput within a given latency bound (1 or 2 s).

32 Programming Model A discretized stream (D-stream) is a sequence of immutable, partitioned datasets  Specifically, resilient distributed datasets (RDDs), the storage abstraction in Spark Deterministic transformation operators produce new streams

33 API LINQ-like language-integrated API in Scala:
pageViews = readStream("...", "1s")
ones = pageViews.map(ev => (ev.url, 1))
counts = ones.runningReduce(_ + _)
[Diagram: at t = 1, t = 2, … the pageViews, ones, and counts streams, each a sequence of partitioned RDDs, connected by map and reduce operators.]
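For comparison, a minimal sketch of the same computation in the DStream API that later shipped as Spark Streaming (the socket source, host and port, one-URL-per-line input, and checkpoint directory are assumptions for illustration; updateStateByKey plays the role of the slide's runningReduce):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PageViewCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PageViewCounts").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1))   // 1-second micro-batches
    ssc.checkpoint("checkpoint")                          // required for stateful operators

    // Assumed source: one URL per line arriving on a local socket.
    val pageViews = ssc.socketTextStream("localhost", 9999)
    val ones      = pageViews.map(url => (url, 1))
    // Running count per URL across all intervals seen so far.
    val counts = ones.updateStateByKey[Int] { (values, state) =>
      Some(state.getOrElse(0) + values.sum)
    }

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}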

34 Evaluation  Compare with Storm

35 Evaluation  Failure recovery

36 Evaluation  Speculative Execution

37 Conclusion  D-Streams forgo traditional streaming wisdom by batching data in small timesteps  This enables an efficient, new parallel recovery scheme  Users can seamlessly intermix streaming, batch and interactive queries

