Presenter: 李旺龙. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Matei Zaharia, UC Berkeley AMPLab. SOSP 2013, ACM Symposium on Operating Systems Principles.


1 Presenter: 李旺龙
Discretized Streams: Fault-Tolerant Streaming Computation at Scale
Matei Zaharia, UC Berkeley AMPLab
SOSP 2013, ACM Symposium on Operating Systems Principles
Streaming

2 BREAD PPT DESIGN
Contents: 1 Introduction, 2 Background, 3 Implementation, 4 Experiment, 5 Conclusion
Database and Knowledge Engineering Laboratory (数据库与知识工程实验室), www.dbke.sinaapp.com

3 Introduction
Much of "big data" is received in real time and is most valuable at its time of arrival.
• A social network may wish to detect trending conversation topics within minutes.
• An e-commerce website may wish to model which users visit a new page.
• A service operator may wish to monitor program logs to detect failures within seconds.

4 Introduction
To enable these low-latency processing applications, there is a need for streaming computation models that scale transparently to large clusters.
Most distributed streaming systems, including Storm, TimeStream, MapReduce Online, and streaming databases, are based on a continuous operator model, in which long-running, stateful operators receive each record, update internal state, and send new records.

5 Introduction

6 Introduction
Major problems: faults and stragglers
Continuous operator systems perform recovery through two approaches:
• Replication, where there are two copies of each node. This costs 2× the hardware.
• Upstream backup, where nodes buffer sent messages and replay them to a new copy of a failed node. This takes a long time to recover.

7 Introduction
Major problems: faults and stragglers
Neither approach handles stragglers:
• Replication: the synchronization protocols that coordinate the replicas mean that one slow node slows down the whole system.
• Upstream backup: a straggler has to be treated as a failure, which triggers the costly recovery step.

8 Introduction
This paper presents a new stream processing model, discretized streams (D-Streams), that overcomes these challenges.
Instead of managing long-lived operators, the idea in D-Streams is to structure a streaming computation as a series of stateless, deterministic batch computations on small time intervals.
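
The idea of cutting a stream into small intervals and running the same deterministic batch job on each one can be sketched with plain Scala collections. This is only an illustration under stated assumptions, not Spark Streaming code: `MicroBatch`, `Record`, `discretize`, `wordCount`, and `run` are hypothetical names, and the whole input is assumed to be available as a sequence of timestamped records.

```scala
// Minimal sketch of the D-Stream idea with plain Scala collections (no Spark).
// All names here are hypothetical and chosen for illustration only.
object MicroBatch {
  // A record with an arrival time in milliseconds and a text payload.
  final case class Record(timeMs: Long, text: String)

  // Group records into consecutive intervals of length intervalMs.
  // Key k holds the records with timeMs in [k * intervalMs, (k + 1) * intervalMs).
  def discretize(records: Seq[Record], intervalMs: Long): Map[Long, Seq[Record]] =
    records.groupBy(r => r.timeMs / intervalMs)

  // A stateless, deterministic batch job: word counts for one interval.
  def wordCount(batch: Seq[Record]): Map[String, Int] =
    batch.flatMap(_.text.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // Run the same batch job independently on every interval.
  def run(records: Seq[Record], intervalMs: Long): Map[Long, Map[String, Int]] =
    discretize(records, intervalMs).map { case (k, batch) => (k, wordCount(batch)) }
}
```

Because each interval's result depends only on that interval's input, any interval can be recomputed on its own after a failure.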

9 Introduction

10 Introduction
Challenge 1: keeping latency low
We use a data structure called Resilient Distributed Datasets (RDDs), which keeps data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it.
Challenge 2: recovering quickly from faults and stragglers
Parallel recovery: when a node fails, each node in the cluster works to recompute part of the lost node's RDDs, resulting in significantly faster recovery than upstream backup without the cost of replication.
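
Lineage-based recovery can be illustrated with a small sketch in plain Scala. It is not how Spark implements RDDs: `Lineage`, `Dataset`, `Source`, and `Mapped` are hypothetical names, a single `map` step stands in for the full set of operators, and a lost partition is modelled by clearing an in-memory cache.

```scala
// Sketch of lineage-based recovery with plain Scala (no Spark).
// Dataset, Source and Mapped are hypothetical names, not Spark classes.
object Lineage {
  sealed trait Dataset {
    // In-memory copy of the data; None models a partition lost to a failure.
    var cached: Option[Vector[Int]] = None
    // Rebuild the data from the lineage, ignoring the cache.
    def recompute(): Vector[Int]
    // Return the cached data, rebuilding it from lineage if it was lost.
    def get(): Vector[Int] = cached.getOrElse {
      val data = recompute()
      cached = Some(data)
      data
    }
  }

  // Reliably stored input: the root of a lineage graph.
  final class Source(input: Vector[Int]) extends Dataset {
    def recompute(): Vector[Int] = input
  }

  // A dataset defined only by its parent and a deterministic function.
  final class Mapped(parent: Dataset, f: Int => Int) extends Dataset {
    def recompute(): Vector[Int] = parent.get().map(f)
  }
}
```

Clearing `cached` on a `Mapped` dataset and calling `get()` again returns the same values, because the function is deterministic and the input is still available.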

11 Introduction
We have implemented D-Streams in a system called Spark Streaming, based on the Spark engine.
The system can process over 60 million records/second on 100 nodes at sub-second latency, and can recover from faults and stragglers in sub-second time.

12 Introduction
Spark Streaming's per-node throughput is comparable to commercial streaming databases, while offering linear scalability to 100 nodes, and is 2–5× faster than the open source Storm and S4 systems, while offering fault recovery guarantees that they lack.
Because D-Streams use the same processing model and data structures (RDDs) as batch jobs, a powerful advantage of our model is that streaming queries can seamlessly be combined with batch and interactive computation.

13 Contents: 1 Introduction, 2 Background, 3 Implementation, 4 Experiment, 5 Conclusion

14 Background
Review: Spark
Doug Cutting, the creator of Hadoop, says "the use of MapReduce engine for Big Data projects will decline, replaced by Apache Spark".

15 Background
Review: Spark, a fast and general engine for large-scale data processing
The Spark stack:
• Libraries: Spark SQL (relational operators), MLlib (machine learning), GraphX (graph processing), Spark Streaming (real-time)
• Spark runtime
• Cluster managers: YARN, Mesos, AWS
• Data sources: HDFS, S3, Cassandra, …

16 Background
Review: Spark
Resilient distributed datasets (RDDs) enable efficient data reuse in a broad range of applications. They:
• are fault-tolerant
• are parallel data structures
• can be explicitly persisted in memory
• let users control their partitioning
• offer a rich set of operators
(Figure: the records jack, arthur, and tom before and after hash partitioning.)

17 Background
Review: Spark
Join example: (1, 10), (2, 11) joined with (1, jack), (2, tom) gives (1, (10, jack)), (2, (11, tom)).
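
The join on this slide can be reproduced with plain Scala collections. The sketch below is an assumption-laden illustration rather than Spark's `join`: `JoinExample.join` is a hypothetical helper that uses a simple nested loop, which is fine for a handful of records but is not how a distributed join would be executed.

```scala
// Inner join of two keyed collections with plain Scala (no Spark).
// JoinExample.join is a hypothetical helper written for this illustration.
object JoinExample {
  // Pair every left value with every right value that has the same key.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k1, a) <- left
      (k2, b) <- right
      if k1 == k2
    } yield (k1, (a, b))
}
```

With the slide's data, `JoinExample.join(Seq(1 -> 10, 2 -> 11), Seq(1 -> "jack", 2 -> "tom"))` pairs key 1 with (10, jack) and key 2 with (11, tom).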

18 Background
Review: MapReduce
In the past: "everyone is a product manager". These past two years: "everyone is a big data expert". In two more years: "everyone is a film director".

19 Background
Review: MapReduce
School motto analysis with MapReduce
Pipeline: data → split → block → map → (key, value) → shuffle/partition → (key, value_list) → reduce → (key, value)
Input records, two per block:
• Block 1: 0 自强 弘毅 求是 拓新 (Wuhan University); 1 明德 厚学 求是 创新 (Huazhong University of Science and Technology)
• Block 2: 0 求实 创新 进取 团结 (Dalian University of Technology); 1 严紧 求实 团结 创新 (Tongji University)
Map output: 自强 1, 弘毅 1, 求是 1, 拓新 1, 明德 1, 厚学 1, 求是 1, 创新 1, 求实 1, 创新 1, 进取 1, 团结 1, 严紧 1, 求实 1, 团结 1, 创新 1
Shuffle output: 自强 [1], 求是 [1, 1], 明德 [1], 弘毅 [1], 严紧 [1], …, 创新 [1, 1, 1]
Reduce output: 自强 1, 求是 2, 明德 1, 弘毅 1, 严紧 1, …, 创新 3
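
The three phases of the motto count can be written out explicitly with plain Scala collections. This is a single-machine sketch, not Hadoop or Spark code: `MottoCount`, `mapPhase`, `shufflePhase`, and `reducePhase` are hypothetical names, and the four mottos from the slide are used as input.

```scala
// Map, shuffle and reduce phases of a word count, in plain Scala (no Hadoop or Spark).
// All names are hypothetical and chosen for illustration only.
object MottoCount {
  // Map phase: emit (word, 1) for every word of every line.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1))

  // Shuffle phase: group the emitted values by key.
  def shufflePhase(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2)) }

  // Reduce phase: sum the value list of each key.
  def reducePhase(groups: Map[String, Seq[Int]]): Map[String, Int] =
    groups.map { case (word, ones) => (word, ones.sum) }

  // The four school mottos from the slide, one per line.
  val mottos: Seq[String] = Seq(
    "自强 弘毅 求是 拓新",
    "明德 厚学 求是 创新",
    "求实 创新 进取 团结",
    "严紧 求实 团结 创新")

  val counts: Map[String, Int] = reducePhase(shufflePhase(mapPhase(mottos)))
}
```

`MottoCount.counts` maps 创新 to 3 and 求是 to 2, matching the reduce output shown on the slide.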

20 Background
Review: MapReduce
School motto analysis with MapReduce, written for Spark
Pipeline: data → split → block → map → (key, value) → shuffle/partition → (key, value_list) → reduce → (key, value)
// spark is assumed to be an existing SparkContext
val file = spark.textFile("src/main/resources/abc")
val counts = file.flatMap(line => line.split(" "))  // split each line into words
  .map(word => (word, 1))                           // map: emit (word, 1)
  .reduceByKey((a, b) => a + b)                     // shuffle and reduce: sum per word
counts.saveAsTextFile("src/main/resources/out")

21 Background
Our work targets applications that need to run on tens to hundreds of machines and tolerate a latency of several seconds. Some examples are:
• Site activity statistics
• Cluster monitoring
• Spam detection
For these applications, we believe that the 0.5–2 second latency of D-Streams is adequate, as it is well below the timescale of the trends monitored. We purposely do not target applications with latency needs below a few hundred milliseconds, such as high-frequency trading.

22 Contents: 1 Introduction, 2 Background, 3 Implementation, 4 Experiment, 5 Conclusion

23 Implementation
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Word count over text lines received on a socket, in 3-second batches
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(3))
val lines = ssc.socketTextStream("203.195.218.212", 10000)
val words = lines.flatMap(line => line.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey((a, b) => a + b)
wordCounts.print()
ssc.start()             // start receiving and processing data
ssc.awaitTermination()  // block until the computation is stopped

24 Implementation

25 Implementation
Windowing
The window operation groups all the records from a sliding window of past time intervals into one RDD.
words.window("5s") yields a D-Stream of RDDs containing the words in intervals [0, 5), [1, 6), [2, 7), …
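
The grouping performed by a sliding window can be sketched with plain Scala collections. The code below is an illustration, not the Spark Streaming `window` operator: `Windowing.window` is a hypothetical helper, each inner sequence stands for the records of one interval, and the window size is given as a number of intervals rather than as a duration.

```scala
// Sliding windows over per-interval batches, in plain Scala (no Spark).
// Windowing.window is a hypothetical helper written for this illustration.
object Windowing {
  // batches(i) holds the records of interval [i, i + 1).
  // Returns one merged collection per window of `size` consecutive intervals:
  // [0, size), [1, size + 1), [2, size + 2), ...
  def window[A](batches: Seq[Seq[A]], size: Int): Seq[Seq[A]] =
    batches.sliding(size).map(_.flatten).toSeq
}
```

With one-second intervals, `size = 5` corresponds to the slide's windows [0, 5), [1, 6), [2, 7), and so on.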

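26 Implementation
Incremental aggregation
// With only an associative merge function, each window is recomputed from its intervals
pairs.reduceByWindow("5s", (a, b) => a + b)
// With an additional inverse function, each window is updated incrementally
pairs.reduceByWindow("5s", (a, b) => a + b, (a, b) => a - b)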
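
The difference between the two forms of windowed aggregation can be shown on per-interval counts with plain Scala. This is a sketch under stated assumptions, not the `reduceByWindow` implementation: `IncrementalWindow`, `naive`, and `incremental` are hypothetical names, the aggregate is an integer sum, and the window size is a number of intervals.

```scala
// Windowed sums computed from scratch and incrementally, in plain Scala (no Spark).
// All names are hypothetical and chosen for illustration only.
object IncrementalWindow {
  // Recompute every window sum from scratch: needs only the merge function (+).
  def naive(perInterval: Vector[Int], size: Int): Vector[Int] =
    perInterval.sliding(size).map(_.sum).toVector

  // Update each window from the previous one: add the interval that enters
  // the window and subtract the interval that leaves it (the inverse function).
  def incremental(perInterval: Vector[Int], size: Int): Vector[Int] = {
    require(size >= 1 && perInterval.length >= size)
    val first = perInterval.take(size).sum
    (size until perInterval.length).scanLeft(first) { (prev, i) =>
      prev + perInterval(i) - perInterval(i - size)
    }.toVector
  }
}
```

Both functions return the same sums; the incremental form does a constant amount of work per new interval instead of touching every interval in the window.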

27 Implementation
State tracking
Example question: how many sessions have a bitrate above X?
One could count the active sessions from a stream of (ClientID, Event) pairs:
sessions = events.track(
  (key, ev) => 1,                          // initialize function
  (key, st, ev) => ev == Exit ? null : 1,  // update function
  "30s")                                   // timeout
counts = sessions.count()                  // a stream of ints
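
The session-counting logic can be sketched in plain Scala as a fold over batches of events. It is not the `track` operator itself: `SessionTracking`, `Event`, `Play`, `update`, and `counts` are hypothetical names (only `Exit` comes from the slide), and the 30-second timeout is left out to keep the example short.

```scala
// Counting active sessions from (clientId, event) batches, in plain Scala (no Spark).
// All names except Exit are hypothetical; the timeout from the slide is not modelled.
object SessionTracking {
  sealed trait Event
  case object Play extends Event
  case object Exit extends Event

  // Apply one batch of events to the session state.
  // A client is active (state 1) after any non-Exit event and is dropped on Exit.
  def update(state: Map[String, Int], batch: Seq[(String, Event)]): Map[String, Int] =
    batch.foldLeft(state) {
      case (st, (client, Exit)) => st - client
      case (st, (client, _))    => st + (client -> 1)
    }

  // Number of active sessions after each batch.
  def counts(batches: Seq[Seq[(String, Event)]]): Seq[Int] =
    batches.scanLeft(Map.empty[String, Int])(update).tail.map(_.size)
}
```

Each batch produces a new state from the previous state and the new events, which mirrors how a stateful D-Stream is a sequence of state datasets, one per interval.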

28 Implementation
Unification with batch and interactive processing
Spark Streaming provides several powerful features to unify streaming and batch processing:
• D-Streams can be combined with static RDDs computed using a standard Spark job.
• Users can run a D-Stream program on previous historical data using a "batch mode".
• Users can run ad-hoc queries on D-Streams interactively by attaching a Scala console to their Spark Streaming program and running arbitrary Spark operations on the RDDs there, for example:
counts.slice("21:00", "21:05").topK(10)

29 Implementation

30 Implementation
• The master tracks the D-Stream lineage graph and schedules tasks to compute new RDD partitions.
• Worker nodes receive data, store the partitions of input and computed RDDs, and execute tasks.
• A client library is used to send data into the system.

31 Implementation
New data is replicated across two worker nodes before an acknowledgement is sent to the client library, because D-Streams require input data to be stored reliably to recompute results. If a worker fails, the client library sends unacknowledged data to another worker.

32 Implementation
Spark Streaming relies on Spark's existing batch scheduler within each timestep, and performs many of the optimizations found in existing batch systems:
• It pipelines operators that can be grouped into a single task, such as a map followed by another map.
• It places tasks based on data locality.
• It controls the partitioning of RDDs to avoid shuffling data across the network.

33 Implementation
Optimizations for stream processing
• Network communication: asynchronous I/O.
• Timestep pipelining: tasks from the next timestep are submitted before the current one has finished.
• Task scheduling: the size of scheduler messages was tuned so that jobs with more tasks can be launched quickly.
• Storage layer: because RDDs are immutable, they can be checkpointed over the network without blocking computations on them and slowing jobs.
• Lineage cutoff: lineage is forgotten after an RDD has been checkpointed.
• Master recovery: needed because streaming applications run 24/7.

34 Implementation
Memory management
• Each node's block store manages RDD partitions in an LRU fashion.
• The user can set a maximum history timeout, after which the system will simply forget old blocks without doing disk I/O.
• The memory required by Spark Streaming is not onerous, because the state within a computation is typically much smaller than the input data.

35 Implementation
Parallel recovery
• The system periodically checkpoints some of the state RDDs by asynchronously replicating them to other worker nodes.
• When a node fails, the system detects all missing RDD partitions and launches tasks to recompute them from the last checkpoint.
• Many tasks can be launched at the same time to compute different RDD partitions, allowing the whole cluster to partake in recovery.

36 Implementation
Parallel recovery

37 Implementation
Parallel recovery
(Figure labels: amount of work to recover, recovery time at full load, new data.)

38 Implementation
Straggler mitigation
D-Streams also let us mitigate stragglers like batch systems do, by running speculative backup copies of slow tasks. Such speculation would be difficult in a continuous operator system, as it would require launching a new copy of a node, populating its state, and overtaking the slow copy.
Whenever a task runs more than 1.4× longer than the median task in its job stage, we mark it as slow. More refined algorithms could also be used.
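
The 1.4× median rule can be written as a short function in plain Scala. This is a sketch of the threshold test only, not the Spark scheduler's speculation code: `Stragglers`, `median`, and `slowTasks` are hypothetical names, and task running times are assumed to be available as a sequence of numbers for one job stage.

```scala
// Marking slow tasks with a median-based threshold, in plain Scala (no Spark).
// All names are hypothetical and chosen for illustration only.
object Stragglers {
  // Median of a non-empty collection of running times.
  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // Indices of tasks that have run more than `factor` times the stage median.
  def slowTasks(runtimes: Seq[Double], factor: Double = 1.4): Seq[Int] = {
    val threshold = factor * median(runtimes)
    runtimes.zipWithIndex.collect { case (t, i) if t > threshold => i }
  }
}
```

A scheduler following this rule would launch a speculative backup copy for each returned index and keep whichever copy finishes first.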

39 Implementation
Master recovery
• The state of the computation is written reliably at the start of each timestep.
• When the old master fails, workers connect to a new master and report their RDD partitions to it.
• D-Stream metadata is stored in HDFS: the graph, function objects, checkpoint time, and updated RDDs.
• A 100-node cluster resumes work in 12 seconds.

40 Contents: 1 Introduction, 2 Background, 3 Implementation, 4 Experiment, 5 Conclusion

41 Experiment
• Amazon EC2 m1.xlarge nodes, each with 4 cores and 15 GB RAM
• 1 s latency target: 500 ms input intervals
• 2 s latency target: 1 s input intervals
• 100-byte input records

42 Experiment
Spark Streaming's per-node throughput is 640,000 records/s for Grep and 250,000 records/s for TopKCount on 4-core nodes. Reported figures for other systems:
• Oracle CEP: 1 million records/s on 16 cores
• StreamBase: 245,000 records/s on 8 cores
• Esper: 500,000 records/s on 4 cores
While there is no reason to expect D-Streams to be slower or faster per node, the key advantage is that Spark Streaming scales nearly linearly to 100 nodes.

43 Experiment
S4 was limited in the number of records/second it could process per node, which made it almost 10× slower than Spark and Storm. Storm is still adversely affected by smaller record sizes.

44 Experiment
• 1-second batches with input data residing in HDFS
• Input rate: 20 MB/s/node for WordCount, 80 MB/s/node for Grep
• Checkpoint interval of 10 seconds
• 20 four-core nodes

45 Experiment

46 Experiment
Doubling the number of nodes cuts the recovery time in half.

47 Experiment
We tried slowing down one of the nodes instead of killing it, by launching a 60-thread process that overloaded the CPU.

48 Contents: 1 Introduction, 2 Background, 3 Implementation, 4 Experiment, 5 Conclusion

49 Conclusion
We have proposed D-Streams, a new model for distributed streaming computation that enables fast recovery from both faults and stragglers without the overhead of replication. D-Streams:
• forgo conventional streaming wisdom by batching data into small timesteps
• support a wide range of operators and can attain high per-node throughput, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery
• compose seamlessly with batch and interactive queries

50 Work progress
• Internship work: mobile QQ quality data processing, Spark survey
• Thesis work: stream data mining

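51 Internship work
• Monitoring more than ten quality metrics for mobile QQ, covering about 50 events
• Events: sending and receiving pictures, messages, and files, logging in, switching pages, and so on
• Picture types: group pictures, discussion-group pictures, user-to-user pictures, and so on
• Picture-receive logs per day: 1.3 billion records from iPhone, 6 billion from Android
• About 80,000 records/s, 80 MB/s, 7 TB/day
• Tools: Java + Python + Hive + Pig + PostgreSQL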
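
The rates on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the daily totals are spread evenly over 24 hours and that "M" and "T" mean decimal megabytes and terabytes; `LogVolume` and its fields are hypothetical names.

```scala
// Cross-check of the log volume figures, assuming an even load over the day
// and decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes).
object LogVolume {
  val secondsPerDay: Long = 24L * 60 * 60                 // 86,400 seconds
  val recordsPerDay: Double = 1.3e9 + 6.0e9               // iPhone + Android
  val recordsPerSecond: Double = recordsPerDay / secondsPerDay
  val bytesPerDay: Double = 80e6 * secondsPerDay          // 80 MB/s sustained
  val terabytesPerDay: Double = bytesPerDay / 1e12
}
```

Under these assumptions the daily totals correspond to roughly 84,000 records/s and about 6.9 TB/day, which is consistent with the rounded figures of 80,000 records/s and 7 TB/day.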

52 Internship work
Data sample:
2014080108, 中国, 浙江省, 中国电信,unknown,unknown,10.157.89.36 2014-08-01 07:59:59.949,INFO,0S200MNJT807V3GE,5.0.0.146,beacon,1.8.0,H30-U10;Android 4.2.2,level 17,122.242.114.5,122.242.114.5,wifi,actGroupPicSmallDownV1,true,4397,66,A2=000000000000000&A1=1085779492&A4=000000000000000&param_NetworkInfo=2&A3=000000000000000_00:0c:e7:30:13:cf&A6=20:08:ed:07:c6:8d&param_step=1_1_1_0_65;2_-1_0_0_0;3_-1_0_0_0&serverip=61.151.234.34&A7=7638540fc92e0c2e&param_groupPolicy=1&param_uuid={2115FC55-2DA4-4A73-3570-FA89969A3C17}.jpg&param_uinType=1&A67=com.tencent.mobileqq:MSF&QQ=&A28=122.242.114.5&A27=4397&param_FailCode=0&A26=66&A25=true&A23=2017&param_DownMode=1&param_ProductVersion=537039093&param_NetworkOperator=中国移动&param_SsoServerIp=14.17.42.23:8080&param_runStatus=0&A19=wifi&param_grpUin=213478033&param_GatewayrIp=122.242.114.5&param_Server=61.151.234.34,2014-08-01 07:59:15,2014-08-01 07:59:59,,1085779492,Android,4.2.2,1085779492,0,000000000000000_00:0c:e7:30:13:cf,0,,20:08:ed:07:c6:8d,7638540fc92e0c2e,,,,,,,,,,,,wifi,,,,2017,,true,66,4397,122.242.114.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,com.tencent.mobileqq:MSF,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,20140801080

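53 Internship work
Mobile QQ quality data processing flow
(Diagram labels: 灯塔; 灯塔 database tables; hourly summary table for all metrics; hourly-to-daily summary table; picture-receive hourly table; picture-receive daily table; picture-send hourly table; picture-send daily table; …; stages: load, compute, export; systems: TDW, HDFS, PG.)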

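54 Internship work
Moving mobile QQ quality data processing from SQL to Pig
(Diagram labels, SQL flow: 灯塔; 灯塔 database tables; hourly summary table for all metrics; hourly-to-daily summary table; picture-receive hourly table; picture-receive daily table; picture-send hourly table; picture-send daily table; …; stages: load, compute, export; systems: TDW, HDFS, PG.)
(Diagram labels, Pig flow: picture-receive statistics; picture-send statistics; …; stages: load, transfer; systems: TDW, Pig, HDFS.)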

55 Internship work
After moving from SQL to Pig, the daily cost of mobile QQ quality data processing fell from over 3,000 to about 1,000.

56 Internship work
Results of moving mobile QQ quality data processing from SQL to Pig: cause analysis
• SQL parses the same field repeatedly: 0_1_0_12;1_1_0_372;2_1_1_2245 => col1 to col15
• Pig defines a single UDF
• PS: Hive also supports UDFs, but TDW SQL does not
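
The benefit of a UDF here is that the step string is parsed once into all of its columns instead of once per extracted column. The sketch below shows such a single-pass parser in plain Scala; it is not the Pig UDF from the internship. `StepParser.parseSteps` is a hypothetical name, and the field layout (";" between steps, "_" between integer fields) is an assumption read off the sample strings.

```scala
// Single-pass parsing of a step string, in plain Scala.
// StepParser.parseSteps is hypothetical; the field layout is assumed from the samples.
object StepParser {
  // ';' separates steps and '_' separates the integer fields of a step.
  // The result lists every field in order, ready to be mapped to columns.
  def parseSteps(raw: String): Vector[Int] =
    raw.split(";").toVector.flatMap(_.trim.split("_").map(_.toInt))
}
```

Applied to the `param_step` value from the data sample, `1_1_1_0_65;2_-1_0_0_0;3_-1_0_0_0`, the function returns 15 integers, one per column.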

57 Internship work
Current state of Spark
• Written in Scala; supports Python, Scala, and Java
• Components: Spark SQL, Spark Streaming, MLlib, GraphX (shown alongside Hive, Storm, Mahout, Giraph)

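58 Internship work
Spark vs. Hadoop: not just faster
• A theoretical innovation from academia for industry: RDD vs. MapReduce
• Supports not only MapReduce but also other paradigms such as Pregel
• Makes full use of memory and supports DAGs, with less serialization, I/O, and network traffic
• Partitioning is controllable when data is loaded
• Multi-level in-memory persistence is controllable; supports interactive queries
• Lineage-based fault tolerance; pipeline-like support
• Clear speed advantage, but high memory consumption
• Supports SQL, streaming data, offline data, graph data, machine learning, and more
• Once basic Spark usage is learned, the other frameworks are easy to pick up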

59 Internship work
The future of Spark (Spark Summit 2014, San Francisco, June 30 - July 2, 2014)
Spark SQL:
• Optimization: code generation, faster joins, and so on
• Language extension: SQL92 will be supported
• Better integration

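60 Internship work
The future of Spark (Spark Summit 2014, San Francisco, June 30 - July 2, 2014)
• MLlib: the number of supported algorithms will double from 15 to about 30, covering descriptive statistics such as sampling, correlation, estimation, and hypothesis testing, as well as machine learning algorithms such as NMF, Sparse SVD, and LDA
• SparkR will launch and be integrated with MLlib
• Streaming will support more data sources
• GraphX: optimization and a stable API
• Features contributed by industry
• Stop using MapReduce and move to Spark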

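61 Internship work
Spark meetups in China: a single spark can start a prairie fire (星火燎原)
• August 9, 2014, Beijing (1st): Intel, AsiaInfo, Databricks
• August 31, 2014, Hangzhou: Huawei, Alibaba
• September 6, 2014, Beijing (2nd): traintracks.io, Microsoft, JD.com
• September 21, 2014, Shenzhen: Huawei, Tencent
• October 26, 2014, Beijing (3rd): Intel, Alibaba, Microsoft, Meituan, NJU
On August 9, the first Spark-User Beijing Meetup was held at the AsiaInfo headquarters R&D center building. It attracted 121 participants from 32 companies, universities, and financial institutions, including Baidu, Sina, JD.com, Tibco, Wandoujia, Douban, Weibo, Xiaomi, Huawei, iQiyi, Meituan, 58, 海星, Sogou, CBSI, 神舟泰岳, Datang Telecom, Talking Data, 安达佳, 中航信, Tsinghua University, Beijing University of Posts and Telecommunications, and the banking sector.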

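62 Thesis work
Progress:
• Read several doctoral dissertations on stream data mining and gained a basic understanding of its challenges and common solutions
• Read introductory material on Storm, a relatively mature stream processing system from industry
• Read the documentation of MOA (Massive Online Analysis), a stream data mining tool, and tested some of its examples
Plan:
• Study stream data mining algorithms in more depth
• Complete the experiments and writing for the short paper
• Settle on a specific topic for the thesis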

63 Thank You!

