1 StreamScope, S-Store
Akshun Gupta, Karthik Bala

2 What is Stream Processing?
“Stream processing is designed to analyze and act on real-time streaming data, using ‘continuous queries’.” (InfoQ, Stream Processing)
Difference from batch processing: the “ability to process potentially infinite input events continuously with delays in seconds and minutes, rather than processing a static data set in hours and days.” (StreamScope paper)

3 Applications of Stream Processing
Twitter uses stream processing to surface trending tweets
Algorithmic trading / high-frequency trading
Surveillance using sensors
Real-time analytics
And many more!

4 Stream Processing: Challenges
Continuous, potentially infinite amounts of data
Need to deal with failures and planned maintenance
Latency sensitive
Need for high throughput
All of this makes streaming applications hard to develop, debug, and deploy!

5 StreamScope: Continuous Reliable Distributed Processing of Big Data Streams (Microsoft Research)
Presented by Akshun Gupta

6 StreamScope - General Information
Paper came out of Microsoft Research
Deployed on a shared 20,000-server production cluster at Microsoft
Runs Microsoft’s core online advertising service; created to handle business-critical applications, so it is expected to give strong guarantees

7 Motivation
Want to design a streaming computation engine that can:
Execute each event exactly once, despite server failures and message loss
Handle large amounts of load
Scale well
Travel back in time (replay past events)
Continue operating during maintenance
Make distributed streaming programming easy

8 Key Contributions
StreamS shows that a streaming computation engine does not need to unnaturally convert streaming computation into a series of mini-batch jobs (e.g., as Apache Spark does)
Introduces the rVertex and rStream abstractions to simplify creating, debugging, and understanding streaming computation engines
A proven system: deployed in production, running business-critical applications while coping with failures and variations

9 StreamS Abstractions - DAG
Execution of a program is modeled as a DAG
Each vertex performs local computation and has InStreams and OutStreams
Each stream is modeled as an infinite sequence of events, each with a continuously incremented sequence number
STREAMS determines the degree of parallelism for each stage based on data rate and computation cost
The execution of a vertex is tracked through a series of snapshots; each snapshot is a triplet containing the current sequence numbers of its input streams, the current sequence numbers of its output streams, and its current state (a minimal sketch follows)
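
Since the snapshot triplet is the core bookkeeping unit, here is a minimal Python sketch of it (the class and field names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class Snapshot:
    """A vertex's progress: where it is in its inputs and outputs, plus state."""
    input_seqs: Dict[str, int]   # current sequence number per input stream
    output_seqs: Dict[str, int]  # current sequence number per output stream
    state: Any                   # the vertex's local computation state
```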

10 StreamS Abstractions - rStream
An abstraction that decouples upstream and downstream vertices, with failure-recovery mechanisms
Maintains a sequence of events and their sequence numbers
Provides the API calls Write, Read, and GarbageCollect
Maintains the following properties (toy sketch below):
Uniqueness: a unique value for each sequence number
Validity: if a Read happens for seq, a Write for seq is guaranteed to have happened
Reliability: for any Write(seq, e), Read(seq) will return e
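
A toy in-memory rendering of the rStream API and its three properties (illustrative only; the real rStream persists and replicates events across the cluster):

```python
class RStream:
    """Toy in-memory rStream; the real system stores events reliably."""

    def __init__(self):
        self._events = {}   # seq -> event

    def write(self, seq, event):
        # Uniqueness: the first Write for a seq wins. A restarted upstream
        # vertex re-executing deterministically writes the same value again,
        # so duplicates are harmless.
        self._events.setdefault(seq, event)

    def read(self, seq):
        # Validity/reliability: Read(seq) only succeeds after some
        # Write(seq, e), and then always returns that same e.
        return self._events[seq]

    def garbage_collect(self, seq):
        # All events below seq are no longer needed by any snapshot.
        for s in [s for s in self._events if s < seq]:
            del self._events[s]
```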

11 StreamS Abstractions - rVertex
A vertex can save its state with snapshots
If a vertex fails, it can be restarted with Load(s), where s is a saved snapshot
rVertex guarantees determinism: running Execute() on the same snapshot will produce the same result
Determinism ensures correctness, but requires user-defined functions to behave deterministically (sketch below)
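
A sketch of the rVertex contract for a vertex with one input and one output, reusing the Snapshot and RStream toys above (Load and Execute mirror the calls named on the slide; everything else is an assumption for illustration):

```python
class RVertex:
    """Toy rVertex with one input and one output."""

    def __init__(self, user_fn, instream, outstream, snapshot):
        self.user_fn = user_fn      # user code; must itself be deterministic
        self.instream = instream    # an RStream to read from
        self.outstream = outstream  # an RStream to write to
        self.snap = snapshot        # a Snapshot triplet

    def load(self, snapshot):
        # Load(s): restart from a saved snapshot after a failure.
        self.snap = snapshot

    def execute(self):
        # Execute(): one deterministic step. The same snapshot always yields
        # the same outputs and successor snapshot, which is what makes
        # replay- and replication-based recovery safe.
        in_seq = self.snap.input_seqs["in"]
        event = self.instream.read(in_seq)
        new_state, out_event = self.user_fn(self.snap.state, event)
        out_seq = self.snap.output_seqs["out"]
        self.outstream.write(out_seq, out_event)
        self.snap = Snapshot({"in": in_seq + 1}, {"out": out_seq + 1}, new_state)
```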

12 Architecture
The program is compiled into a streaming DAG:
(1) the program is first converted into a logical plan (DAG) of STREAMS runtime operators, which include temporal joins, window aggregates, and user-defined functions;
(2) the STREAMS optimizer then evaluates various plans, choosing the one with the lowest estimated cost based on available resources, data statistics such as the incoming rate, and an internal cost model;
(3) a physical plan (DAG) is finally created by mapping each logical vertex to an appropriate number of physical vertices for parallel execution and scaling, with code generated for each vertex to be deployed in a cluster and process input events continuously at runtime (a toy sketch of this step follows).
Each application has a job manager that is responsible for:
(1) scheduling vertices to machines and establishing channels (edges) of the DAG among them;
(2) monitoring progress and tracking snapshots;
(3) providing fault tolerance by detecting failures/stragglers and initiating recovery actions.
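
A toy version of step (3), expanding a logical DAG into a physical one; it assumes fully connected channels between adjacent stages, which the real planner need not do:

```python
def to_physical(logical_dag, parallelism):
    """Expand each logical vertex into parallelism[v] physical vertices,
    connecting every instance to every instance of its downstream stages."""
    physical = {}
    for v, downstreams in logical_dag.items():
        for i in range(parallelism[v]):
            physical[f"{v}#{i}"] = [
                f"{d}#{j}" for d in downstreams for j in range(parallelism[d])
            ]
    return physical

# Example: a 2-way ingest stage feeding a 3-way aggregation stage.
print(to_physical({"ingest": ["agg"], "agg": []}, {"ingest": 2, "agg": 3}))
```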

13 Failure Recovery Strategies
Checkpoint-based recovery
Not performant when vertices hold large internal state
Replay-based recovery
Rebuilds state by re-processing the most recent window (e.g., 5 minutes)
The deterministic-execution property comes in handy
Might have to reload a large window, but don’t have to checkpoint as frequently
Replication-based recovery
Multiple instances of the same vertex run at the same time
Determinism ensures the outputs of different machines running the same vertex are the same
Overhead of extra resources
(The two non-replication strategies are sketched below)
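
The two non-replication strategies, sketched against the RVertex toy above (checkpoint_store and the step count are hypothetical stand-ins):

```python
def recover_by_checkpoint(vertex, checkpoint_store):
    # Resume from the most recent checkpoint. Restart is fast, but saving
    # large state frequently is expensive while the vertex is healthy.
    vertex.load(checkpoint_store.latest())

def recover_by_replay(vertex, base_snapshot, steps):
    # Restart from an older, cheaper snapshot (e.g. the start of the current
    # 5-minute window) and rebuild state by re-executing. Determinism
    # guarantees the rebuilt state matches what was lost.
    vertex.load(base_snapshot)
    for _ in range(steps):
        vertex.execute()
```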

14 Evaluation
Application: detecting fraudulent clicks in online transactions
3,220 vertices
9.5 TB of events processed
180 TB of I/O
21.3 TB aggregate memory usage
7-day evaluation period

15 Evaluation - Failure Impact on Latency*
A: Failed machines had high in-memory state → Latency increased for small number of failures B: Large number of failures but vertices did not have high in-memory state C: Unscheduled mass outage of machines → significant increase in latency D: scheduled maintenance → graceful transition and no significant increase in latency *End-to-end latency

16 Evaluation - Scalability
X axis: degree of parallelism
Y axis: maximum throughput sustained under a 1-second latency bound

17 Comparing Failure Recovery Strategies
No effect on latency when using the replication strategy
Longer latency delay for replay, because the state in a checkpoint is more condensed than the raw events to be replayed (common case)
In production, about 25% of vertices use replay-based recovery; the others use checkpointing

18 Comments
The paper does not compare StreamS with other streaming systems such as Spark, Storm, etc.
No outlook is given on whether the system will be offered as PaaS, or on any plan to open-source it.
The restriction to deterministic applications is significant.

19 Key Takeaways
Introduction of the rStream and rVertex abstractions
A new way to design streaming systems: decoupling upstream and downstream vertices
Valuable engineering advice
Good comparison between failover strategies: checkpointing, replay-based, and replication-based
Proven system under production load: business-critical application, 20k+ nodes, robust scaling

20 S-Store
Presented by Karthik Bala

21 Streaming Meets Transaction Processing
Streaming: handles large amounts of data, but...
Transaction processing: ACID guarantees, but...
Challenge: build a streaming system that provides transactionally consistent shared mutable state

22 Guarantees
Transactions are stored procedures with input parameters
“Recall that it is the data that is sent to the query in streaming systems, in contrast to the standard DBMS model of sending the query to the data”
OLTP transaction: can access public tables; “pull-based”
Streaming transaction: can access public tables, windows, and streams; “push-based”

23 Contributions
Start with a traditional OLTP database system (H-Store) and add streaming transactions:
streams and windows represented as time-varying state
triggers to enable push-based processing over such state
a streaming scheduler that ensures correct transaction ordering
a variant of H-Store’s recovery scheme that ensures exactly-once processing for streams

24 Transaction Execution
Notation: s = stream, b = atomic batch, w = window, T = transaction
Atomic batches must be processed as individual units
Windows can be time-based or tuple-based
Border vs. interior transactions: border transactions ingest streams from outside the dataflow graph; for interior transactions, the output of one transaction is the input of the next

25 Transaction Execution
ACID: wait until T commits to make its writes public
Which orderings are valid? For an ordering to be correct (toy checker below):
it must follow the topological ordering of the dataflow graph (relaxed if the graph admits multiple topological orderings)
all batches must be processed in order, and atomic batches must be processed as individual units
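
A toy checker for these two rules (illustrative; it fixes one topological order and ignores the hybrid and nested cases introduced on the next slide):

```python
def is_valid_schedule(schedule, topo_index):
    """schedule: list of (txn, batch_id) pairs in execution order.
    topo_index: txn -> position in one topological order of the dataflow DAG."""
    last_batch = {}   # txn -> last batch id it has executed
    max_topo = {}     # batch_id -> highest topo index executed so far
    for txn, batch in schedule:
        if last_batch.get(txn, -1) >= batch:
            return False   # each transaction must consume its batches in order
        last_batch[txn] = batch
        if max_topo.get(batch, -1) > topo_index[txn]:
            return False   # within a batch, respect the dataflow order
        max_topo[batch] = topo_index[txn]
    return True

# Two-transaction pipeline T1 -> T2, two batches:
topo = {"T1": 0, "T2": 1}
print(is_valid_schedule([("T1", 1), ("T2", 1), ("T1", 2), ("T2", 2)], topo))  # True
print(is_valid_schedule([("T2", 1), ("T1", 1)], topo))                        # False
```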

26 Hybrid Schedules, Nested Transactions
Any OLTP transaction can interleave between any pair of streaming transactions (in a valid TE schedule)
Nested transactions: two or more transactions that execute as a single block
No other transaction can interleave between nested transactions

27 H-Store Architecture
Commit log and checkpointing layers

28 S-Store Extensions
Streams: time-varying H-Store tables; persistent and recoverable
Triggers: attached to tables, activate when tuples are added; PE (partition engine) and EE (execution engine) triggers (conceptual sketch below)
Window tables
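
A minimal sketch of push-based processing via triggers (conceptual only; real S-Store triggers live inside H-Store's engine, not in Python): inserting into a stream table fires the attached downstream work instead of waiting for a client to poll.

```python
class StreamTable:
    """Toy stream-as-table: inserting a tuple fires attached triggers."""

    def __init__(self):
        self.rows = []
        self.triggers = []        # callables fired on insert

    def attach_trigger(self, fn):
        self.triggers.append(fn)

    def insert(self, tup):
        self.rows.append(tup)     # streams are persistent, recoverable tables
        for fire in self.triggers:
            fire(tup)             # push-based: no client round trip

# Chain two stages push-style: an insert into s1 drives work on s2.
s1, s2 = StreamTable(), StreamTable()
s1.attach_trigger(lambda t: s2.insert(t * 2))
s1.insert(21)
print(s2.rows)  # [42]
```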

29 Fault Tolerance Goal: Exactly once processing
Even if a failure happens, state must be as if transaction T occurred exactly once! Weak recovery: correct but nondeterministic results

30 Recovery
Strong recovery
Replay H-Store’s commit log from the latest snapshot, with PE triggers disabled (why? because the log already holds every committed transaction, including interior ones; re-firing triggers during replay would execute them twice)
Weak recovery
Apply the snapshot
Start at the inputs of the dataflow graph (which are cached)
Leave PE triggers as-is: interior transactions that were not logged need to be re-executed
Finally, replay the log
Both modes are sketched below
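
A sketch contrasting the two modes, with hypothetical db and log helpers (the real system replays H-Store's command log); the weak variant assumes only border transactions were logged, per the slide:

```python
def strong_recovery(db, snapshot, log):
    db.apply(snapshot)
    db.disable_pe_triggers()   # the log already holds every committed txn,
                               # including interior ones; re-firing triggers
                               # would execute them a second time
    for txn in log.since(snapshot):
        db.execute(txn)        # exact re-execution of the logged history

def weak_recovery(db, snapshot, log):
    db.apply(snapshot)         # PE triggers stay enabled
    for txn in log.since(snapshot):
        db.execute(txn)        # replaying the logged (border) transactions
                               # re-fires triggers, which re-derive the
                               # interior transactions that were never logged
```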

31 Performance and Evaluation
H-Store alone: fast but incorrect! Making H-Store correct makes it slow
Esper and Storm (streaming systems) do better, but are bottlenecked by database access: a full round-trip wait, no push semantics, and only a single transaction request at a time

32 Performance and Evaluation (2)
EE triggers: bottleneck is the round-trip time
PE triggers: bottleneck in H-Store is serialization (it can only do one at a time), plus more round trips (all the way to the client!)

33 Performance and Evaluation (3)
SP = stored procedure

34 Key Takeaways
Ordering
Push-based processing (triggers!)
Weak vs. strong recovery
ACID guarantees

35 Discussion
S-Store: what about more than one node?!
Are S-Store’s evaluation methods okay?
StreamScope: the implementation of per-vertex failure strategies is not given in the paper
No details on how the optimizer works: how does it know the cost of running the application before deploying?
Job Manager fault tolerance is not discussed in the paper; if it is not replicated, it is a single point of failure
Lack of custom DAG creation, probably because they optimized for their own workloads and applications

