Sub-millisecond Stateful Stream Querying over

Slides:



Advertisements
Similar presentations
Wyatt Lloyd * Michael J. Freedman * Michael Kaminsky David G. Andersen * Princeton, Intel Labs, CMU Dont Settle for Eventual : Scalable Causal Consistency.
Advertisements

SLA-Oriented Resource Provisioning for Cloud Computing
Alejandro Llaves Javier D. Fernández Oscar Corcho Ontology Engineering Group Universidad Politécnica de Madrid Madrid, Spain OrdRing.
Store RDF Triples In A Scalable Way Liu Long & Liu Chunqiu.
Spark: Cluster Computing with Working Sets
Benchmarking Parallel Code. Benchmarking2 What are the performance characteristics of a parallel code? What should be measured?
HadoopDB An Architectural Hybrid of Map Reduce and DBMS Technologies for Analytical Workloads Presented By: Wen Zhang and Shawn Holbrook.
Fast Paths in Concurrent Programs Wen Xu, Princeton University Sanjeev Kumar, Intel Labs. Kai Li, Princeton University.
Trishul Chilimbi, Yutaka Suzue, Johnson Apacible,
A Stratified Approach for Supporting High Throughput Event Processing Applications July 2009 Geetika T. LakshmananYuri G. RabinovichOpher Etzion IBM T.
1 Improving Hash Join Performance through Prefetching _________________________________________________By SHIMIN CHEN Intel Research Pittsburgh ANASTASSIA.
Squirrel: A decentralized peer- to-peer web cache Paul Burstein 10/27/2003.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
PARALLEL DBMS VS MAP REDUCE “MapReduce and parallel DBMSs: friends or foes?” Stonebraker, Daniel Abadi, David J Dewitt et al.
Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.
Presented By HaeJoon Lee Yanyan Shen, Beng Chin Ooi, Bogdan Marius Tudor National University of Singapore Wei Lu Renmin University Cang Chen Zhejiang University.
VLDB2012 Hoang Tam Vo #1, Sheng Wang #2, Divyakant Agrawal †3, Gang Chen §4, Beng Chin Ooi #5 #National University of Singapore, †University of California,
DANIEL J. ABADI, ADAM MARCUS, SAMUEL R. MADDEN, AND KATE HOLLENBACH THE VLDB JOURNAL. SW-Store: a vertically partitioned DBMS for Semantic Web data.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, Bhavani Thuraisingham University.
Daniel J. Abadi · Adam Marcus · Samuel R. Madden ·Kate Hollenbach Presenter: Vishnu Prathish Date: Oct 1 st 2013 CS 848 – Information Integration on the.
Indexing HDFS Data in PDW: Splitting the data from the index VLDB2014 WSIC、Microsoft Calvin
1 Public DAFS Storage for High Performance Computing using MPI-I/O: Design and Experience Arkady Kanevsky & Peter Corbett Network Appliance Vijay Velusamy.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
An Efficient Threading Model to Boost Server Performance Anupam Chanda.
State Machine Replication State Machine Replication through transparent distributed protocols State Machine Replication through a shared log.
Spark on Entropy : A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud Huankai Chen PhD Student at University of Kent.
Resilient Distributed Datasets A Fault-Tolerant Abstraction for In-Memory Cluster Computing Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave,
Chenning Xie+, Rong Chen+, Haibing Guan*, Binyu Zang+ and Haibo Chen+
Network Requirements for Resource Disaggregation
R-Storm: Resource Aware Scheduling in Storm
Presented by: Omar Alqahtani Fall 2016
Hathi: Durable Transactions for Memory using Flash
Auburn University COMP8330/7330/7336 Advanced Parallel and Distributed Computing Parallel Hardware Dr. Xiao Qin Auburn.
Problem: Internet diagnostics and forensics
In quest of the operational database for real-time environmental monitoring and early warning systems Bartosz Baliś, Marian Bubak, Daniel Harezlak, Piotr.
Organizations Are Embracing New Opportunities
Institute of Parallel and Distributed Systems (IPADS)
Smart Building Solution
Introduction to Distributed Platforms
Hydra: Leveraging Functional Slicing for Efficient Distributed SDN Controllers Yiyang Chang, Ashkan Rezaei, Balajee Vamanan, Jahangir Hasan, Sanjay Rao.
International Conference on Data Engineering (ICDE 2016)
Curator: Self-Managing Storage for Enterprise Clusters
Distributed Network Traffic Feature Extraction for a Real-time IDS
Chilimbi, et al. (2014) Microsoft Research
Open Source distributed document DB for an enterprise
Pytheas: Enabling Data-Driven Quality of Experience Optimization Using Group-Based Exploration-Exploitation Junchen Jiang (CMU) Shijie Sun (Tsinghua Univ.)
Spark Presentation.
Scaling SQL with different approaches
Smart Building Solution
Apache Spark Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing Aditya Waghaye October 3, 2016 CS848 – University.
Optimized Rewriter Rules for Efficient Querying of JSON Data
Benchmarking Modern Distributed Stream Processing Systems
COS 518: Advanced Computer Systems Lecture 11 Michael Freedman
Be Fast, Cheap and in Control
StreamApprox Approximate Stream Analytics in Apache Flink
JIAXIN SHI, YOUYANG YAO RONG CHEN, HAIBO CHEN
Replication-based Fault-tolerance for Large-scale Graph Processing
Siyuan Wang, Chang Lou, Rong Chen, Haibo Chen
KISS-Tree: Smart Latch-Free In-Memory Indexing on Modern Architectures
Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform
CLUSTER COMPUTING.
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing Zaharia, et al (2012)
CARLA Buenos Aires, Argentina - Sept , 2017
Indiana University, Bloomington
Declarative Transfer Learning from Deep CNNs at Scale
with Raul Castro Fernandez* Matteo Migliavacca+ and Peter Pietzuch*
Performance And Scalability In Oracle9i And SQL Server 2000
COS 518: Advanced Computer Systems Lecture 12 Michael Freedman
A Closer Look at NFV Execution Models
Presentation transcript:

Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data Yunhao Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems (IPADS) Shanghai Jiao Tong University

Stream Query is Important Multiple data sources are continuously generating streaming data in high velocity

Stateful Stream Query Streaming Data: high velocity Stored Data: large & evolving A stateful stream query needs to integrate streaming data with stored data Real-time User Activity Durable Social Graph

Example Dataset for Stateful Stream Query IPADS System @Cornell Stored Data <Rong, creates, Feed>. 12:30 <Feed, hash_tag, SOSP>. 12:30 <Yunhao, likes, Feed>. 12:31 <Haibo, likes, Feed>. 12:40 Streaming Data

Connectivity Property of Data Linked data represents information as entities and relations between the entities Stored Data member_of Haibo IPADS Rong Yunhao Cornell Feed 12:30 Rong #SOSP# Feed create hash tag like like Like 12:31 Yunhao 12:40 Haibo Streaming Data

Example Continuous Query In the last 30 minutes, which IPADS members created feeds that are liked by other IPADS members? time Registered by user Triggered by system Canceled by user … member_of ?Y IPADS ?X ?Feed create like

Example Continuous Query In the last 30 minutes, which IPADS members created feeds that are liked by other IPADS members? Feed 12:30 Like 12:40 Rong create Haibo like Streaming Data Yunhao 12:31 Rong Feed Haibo Stored Data member_of Rong Haibo IPADS

Workload Characteristics Stateful queries integrate Stored and Streaming Data Stored data evolves by absorbing streaming data Connectivity property workload特性总结

Conventional Approach Continuous Query Streaming Data Stream Processing System Graph Store System One-shot Query Stored Data

Stream Processing System Composite Design Composite Design Example Apache Storm Stream Processing System Wukong OSDI’16 Graph Store System

Composite Design Observations Composite Design: high latency low throughput 1. Cross-system Cost ~40% execution time wasted due to data transformation and transmission 2. Inefficient Query Plan Semantic gap between the two systems impair query optimization 3. Limited Scalability Stream processing systems dedicate all resources to the improve performance of a single job

Design Overview Wukong+S Integrated Design uses a novel integrated design for stateful stream query over fast-evolving linked data Integrated Design manages streaming data and stored data in a single system Eliminate cross-system cost Global semantics for query optimization Better scalability by sharing data between the queries

Design Outline Implementing integrated design is not trivial Decisions for efficient integrated design: Hybrid Store: efficiently handle streaming data and fast-evolving stored data Stream Index: fast path to access streaming data in a certain time interval Consistent Data View: through decentralized vector timestamps and bounded snapshot scalarization

Wukong+S Architecture Continuous Query One-shot Engine Store Serve queries Hold data partition Data is partitioned and stored on multiple servers

How to gracefully integrate streaming data and stored data? Hybrid Store How to gracefully integrate streaming data and stored data? Strawman: using different graph stores according to “where from”, namely streaming and stored data Hybrid Store: using different graph stores according to “how to use”, namely timeless and timed data

Explicit Separation of Streaming Data Timed Streaming Data user-defined predicate Continuous Query Timeless One-shot Query Stored Data

Benefit of Hybrid Store No interference between timeless data and timed data Design data stores separately and optimize for different operation pattern: Timeless Data: continuous persistent store Timed Data: time-based transient store

Hybrid Store Continuous Persistent Store Timed Data Continuously absorb the timeless portion of streams Goal: support stateful continuous query and up-to-date one-shot query Timed Data

Hybrid Store Time-based Transient Store Timed Data Timed data will only be accessed by relevant continuous queries in a time interval Goal: support fast garbage collection (GC) for the timed portion of streams Timed Data

Consistent Data Snapshot How to provide consistent view over dynamic data with memory efficiency? Streaming data contains order information Early output from a stream source should always be visible before later output No order relation across data sources

Decentralized Vector Timestamp (VTS) 8:00 8:01 8:02 4 5 11 12 Time Continuous Query Source0: Source1: Stable_VTS 4 11 5 12 Local_VTS Server0 Server1 Data is partitioned and stored on multiple servers

Snapshot Scalarization One-shot Query 4 11 12 5 Server0 Key [4,11] [4,12] √ [4,10] 2 × SN: Snapshot Number VTS: Vector Timestamp SN=2:[4,10] SN=3:[5,12] SN-VTS Plan Visible snapshot Encoded in Key √ [5,12] 3 SN=4:[7,14]

Benefit of Snapshot Scalarization Memory Efficiency Injection Speed bound number of visible snapshot decouple data sources from underlying store stream query scenario Staleness of Stored Data control staleness by SN_VTS Plan

Other Designs of Wukong+S Stream index & locality-aware partitioning Data-driven query trigger One-shot query execution Fault tolerance Leveraging RDMA

Evaluation Baseline: 6 state-of-the-art systems CSPARQL-engine, Heron+Wukong, Storm+Wukong Spark Streaming, Spark Structured Streaming, Wukong/ext Platforms: a rack-scale 8-machine cluster Each: two 12-core Intel Xeon, 128GB DRAM, w/ RDMA Mellanox 56Gbps InfiniBand NIC, 40Gbps IB Switch Benchmarks: LSBench: Social Networking Benchmark w/ 3.75B initial stored data & 5 streams totally 134K tuple/second stream CityBench: Smart City Benchmark w/ 11 real-world data streams

Single Query Latency Outperform: state-of-the-art systems Wukong+S: sub-millisecond 13.7X speedup vs. Storm+Wukong 3 orders of magnitude speedup vs. Spark Streaming 219 527 712 346 2215 1422 49.03 40.77 31.14 latency (msec) 0.10 0.08 0.11 3.50 0.23 1.64 2.62 1.78 1.68

Single Query Latency Unavoidable reason for high latency Cross-system Cost for Storm+Wukong Joining large stored data (3.75B) for Spark Streaming 219 527 712 346 2215 1422 49.03 40.77 31.14 latency (msec) 0.10 0.08 0.11 3.50 0.23 1.64 2.62 1.78 1.68

Throughput of Mixed Workloads Wukong+S: ~1M queries/second on 8 nodes Mixture of 3 queries: 1.08M queries/second Add complex queries: 802K queries/second Mixture of LSBench 1-6 #machine (kilo query/second) throughput (kilo query/second) throughput #machine Mixture of LSBench 1-3

Other Evaluations Influence of different stream rate Data insertion latency Performance of one-shot queries Memory consumption Fault-tolerance overhead

Conclusion Existing systems cannot satisfy demands of stateful stream query over fast-evolving linked data : A distributed stream querying engine adopting a novel integrated desgin for stateful stream queries over fast-evolving linked data Wukong+S Achieving sub-millisecond latency and throughput exceeding one million queries per second http://ipads.se.sjtu.edu.cn/projects/wukong

Questions Thanks Wukong+S Institute of Parallel and http://ipads.se.sjtu.edu.cn/projects/wukong Institute of Parallel and Distributed Systems

Backup: LSBench w/o RDMA