Consistent Streaming Through Time: A Vision for Event Stream Processing by Jonathan Goldstein (speaker), Roger Barga, Mohamed Ali, and Mingsheng Hong Microsoft.

1 Consistent Streaming Through Time: A Vision for Event Stream Processing by Jonathan Goldstein (speaker), Roger Barga, Mohamed Ali, and Mingsheng Hong Microsoft Research

2 Are StreamSQL semantics OK?
- Suppose we want to monitor the bandwidth of a device:
  - We create an input stream which has one field: bytes sent
  - We create an output stream which computes a windowed sum
- What are the StreamSQL semantics when the system gets overloaded (strange question to ask)?
  - Either events must be dropped, or they must be queued at the receiver or sender for later processing
  - Since window semantics are based on system time (StreamSQL server time), if the device has constant bandwidth, apparent bandwidth will decrease!
  - In StreamSQL, the user has no reasonable way of knowing!
- Conclusion: Something is deeply wrong with the use of time in StreamSQL query semantics!
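The effect described on this slide can be reproduced with a small simulation. The sketch below is not StreamSQL; it assumes a tumbling 10-second window and a hypothetical overload model in which an event sent at time t is only processed at time 2t. The field names `app_time`, `sys_time`, and `bytes` are illustrative.

```python
from collections import defaultdict

def windowed_sum(events, time_field, window=10):
    """Sum bytes per tumbling window, bucketing by the chosen timestamp field."""
    sums = defaultdict(int)
    for ev in events:
        sums[ev[time_field] // window] += ev["bytes"]
    return dict(sums)

# The device sends 100 bytes every second for 20 seconds (constant bandwidth).
# Assumed overload model: the event sent at time t is only processed at 2 * t.
events = [{"app_time": t, "sys_time": 2 * t, "bytes": 100} for t in range(20)]

by_app_time = windowed_sum(events, "app_time")  # windows keyed by device time
by_sys_time = windowed_sum(events, "sys_time")  # windows keyed by server time

print(by_app_time)
print(by_sys_time)
```

Bucketed by application time, every window sums to 1000 bytes. Bucketed by system time, the same events are spread over twice as many windows of 500 bytes each, so the device appears to have half its real bandwidth.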

3 What’s in the paper?
- Laundry list of CEDR features either unsupported or poorly supported in existing streaming systems (read the paper)
  - Some of these features come from event processing
  - Some come from specific scenarios which we believe to be important
- These features are described formally through a query language description

4 What’s in the talk (and the paper)?
- Formal definitions of CEDR streams and operator semantics
  - Provides a clear and intuitive framework for discussing subtle semantic issues
- Formalization of materialized view update semantics in standing queries, and a discussion of why they are inadequate in isolation
- Definition of a non-view update compliant operator which can express a very wide range of seemingly disparate streaming features
  - A myriad of window types, the separation of inserts and deletes, etc.
- A theoretical discussion of both the expression and the correct handling of data delivered out of order and of data retraction
- Different formal notions of correctness lead to different consistency levels and associated performance tradeoffs

5 What is a stream and a standing query?
- A stream is a (possibly infinite) collection of events, where each event contains:
  - A payload (P)
  - A key which uniquely identifies the event (K)
  - An interval of application time for which the payload is valid, [Vs, Ve)
  - A time at which it arrives at a listener (C, for CEDR time)
- A standing query is an operator graph, where each operator takes 0 or more input streams and produces 0 or more output streams
K   Vs  Ve  C   P
K1  1   5   1   …
K2  2   3   3   …
Acknowledgement: This is inspired by and built on Rick Snodgrass’s temporal work
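As a rough illustration of the event layout on this slide, the following sketch models one event as a Python dataclass. The class name `Event`, its field names, and the helper `valid_at` are assumptions made for this example; the sample rows reuse the values from the slide's table, with the elided payload written as "...".

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Event:
    key: str       # K: uniquely identifies the event
    vs: float      # Vs: valid start time (application time, inclusive)
    ve: float      # Ve: valid end time (application time, exclusive)
    c: float       # C: CEDR time, when the event arrived at the listener
    payload: Any   # P

    def valid_at(self, t: float) -> bool:
        """True if the payload is valid at application time t, i.e. t is in [Vs, Ve)."""
        return self.vs <= t < self.ve

stream = [Event("K1", 1, 5, 1, "..."), Event("K2", 2, 3, 3, "...")]
valid_at_2 = [e.key for e in stream if e.valid_at(2)]
valid_at_3 = [e.key for e in stream if e.valid_at(3)]
```

Because the valid interval is half open, K2 is valid at time 2 but no longer at time 3.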

6 What properties do operators have?
- All operators should be well behaved:
  - Definition 6: A CEDR operator O is well behaved iff for all (combinations of) inputs to O which are logically equivalent to infinity, O’s outputs are also logically equivalent to infinity
- Any well behaved operator, when given 2 identical sets of input streams, except for CEDR time, should produce identical sets of output streams, except for CEDR time
- Query semantics are independent of CEDR time
Example streams, identical except for CEDR time (C):
K   Vs  Ve  C   P
K1  1   5   1   …
K2  2   3   3   …

K   Vs  Ve  C   P
K1  1   5   3   …
K2  2   3   1   …

7 What properties do operators have?
- Some operators are also view update compliant:
  - Definition 11: A unary CEDR operator O is view update compliant iff for all R, S s.t. *(R) and *(S) are identical, *(O(R)) and *(O(S)) are also identical
- If we interpret the stream as describing a changing relation where each row’s lifetime is specified by valid time, then:
  - A view update compliant operator produces snapshot identical output for snapshot identical input
Example streams with identical snapshots:
K   Vs  Ve  C   P
K1  1   5   1   P1

K   Vs  Ve  C   P
K1  1   2   2   P1
K2  2   5   3   P1
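The idea of snapshot identical streams can be checked mechanically. The sketch below uses the two example streams from this slide (CEDR time omitted, since it does not affect snapshots) and compares snapshots at every interval endpoint. The helper names, the tuple layout `(key, vs, ve, payload)`, and the two sample operators are assumptions for illustration, not CEDR's implementation.

```python
from collections import Counter

def snapshot(stream, t):
    """Multiset of payloads valid at application time t (keys are ignored)."""
    return Counter(p for (_k, vs, ve, p) in stream if vs <= t < ve)

def snapshot_identical(r, s):
    """Snapshots only change at interval endpoints, so comparing there is enough."""
    points = {t for (_k, vs, ve, _p) in r + s for t in (vs, ve)}
    return all(snapshot(r, t) == snapshot(s, t) for t in points)

def select_p1(stream):
    """A relational selection: keeps events whose payload is P1."""
    return [e for e in stream if e[3] == "P1"]

def clip_to_unit_lifetime(stream):
    """A lifetime-altering operator: every event becomes valid for one time unit."""
    return [(k, vs, vs + 1, p) for (k, vs, _ve, p) in stream]

R = [("K1", 1, 5, "P1")]
S = [("K1", 1, 2, "P1"), ("K2", 2, 5, "P1")]

same_input = snapshot_identical(R, S)
same_after_select = snapshot_identical(select_p1(R), select_p1(S))
same_after_clip = snapshot_identical(clip_to_unit_lifetime(R), clip_to_unit_lifetime(S))
```

Selection keeps the two streams snapshot identical, which is what view update compliance requires. Clipping lifetimes does not: after clipping, S still has P1 valid at time 2 while R does not.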

8 What are our operators?
- We may now happily use all our favorite relational operators:
  - Definition 9: Join $\bowtie_{\theta(P_1,P_2)}(S_1, S_2)$:
    $\bowtie_{\theta(P_1,P_2)}(S_1, S_2) = \{(V_s, V_e, e_1.\mathit{Payload} \text{ concatenated with } e_2.\mathit{Payload}) \mid e_1 \in E(S_1),\ e_2 \in E(S_2),\ V_s = \max\{e_1.V_s, e_2.V_s\},\ V_e = \min\{e_1.V_e, e_2.V_e\},\ V_s < V_e,\ \theta(e_1.\mathit{Payload}, e_2.\mathit{Payload})\}$
- These operators’ output streams describe the changing contents of a materialized view computed over the changing input relation(s) described by the input streams
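Definition 9 translates almost directly into code. The following is a minimal nested-loop sketch, assuming events are `(vs, ve, payload)` tuples with tuple payloads so that concatenation is `+`; the function name `temporal_join` and the sample data are illustrative only.

```python
def temporal_join(s1, s2, theta):
    """Join two streams: output lifetime is the intersection of the input lifetimes."""
    out = []
    for (vs1, ve1, p1) in s1:
        for (vs2, ve2, p2) in s2:
            vs, ve = max(vs1, vs2), min(ve1, ve2)
            # Keep the pair only if the lifetimes overlap and the predicate holds.
            if vs < ve and theta(p1, p2):
                out.append((vs, ve, p1 + p2))
    return out

s1 = [(1, 5, ("a", 1))]
s2 = [(3, 8, ("b", 1)), (6, 9, ("c", 1)), (2, 4, ("d", 2))]

# Hypothetical predicate: the second payload fields must match.
joined = temporal_join(s1, s2, lambda p, q: p[1] == q[1])
```

Only the first event of `s2` joins: the second does not overlap in valid time, and the third overlaps but fails the predicate.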

9 Non-view update compliant operators
- Moving window: all output valid end times are set to their valid start times plus the window size
- Insert separation (CQL): all output valid end times are set to infinity
- The semantics of these operations plus many more can be easily captured using AlterLifetime:
  - Definition 12: AlterLifetime $\Pi_{f_{V_s}, f_{\Delta}}(S)$:
    $\Pi_{f_{V_s}, f_{\Delta}}(S) = \{(|f_{V_s}(e)|,\ |f_{V_s}(e)| + |f_{\Delta}(e)|,\ e.\mathit{Payload}) \mid e \in E(S)\}$
  - Allows the lifetime of input events to be recomputed
  - It is not view update compliant, but it is well behaved
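A small sketch may help show how one operator covers both cases above. It assumes events are `(vs, ve, payload)` tuples and that `f_vs` and `f_delta` are ordinary Python functions of an event; the name `alter_lifetime` and the window size of 10 are illustrative choices.

```python
import math

def alter_lifetime(stream, f_vs, f_delta):
    """Recompute each event's lifetime as [|f_vs(e)|, |f_vs(e)| + |f_delta(e)|)."""
    out = []
    for e in stream:
        start = abs(f_vs(e))
        out.append((start, start + abs(f_delta(e)), e[2]))
    return out

stream = [(1, 5, "a"), (2, 3, "b")]

# Moving window of size 10: valid end = valid start + window size.
moving_window = alter_lifetime(stream, lambda e: e[0], lambda e: 10)

# Insert separation: valid end times are set to infinity.
inserts_only = alter_lifetime(stream, lambda e: e[0], lambda e: math.inf)
```

In both cases the original valid end time is discarded, which is why the operator cannot be view update compliant.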

10 But is this implementable?
- Avg(P): the usual average operator in materialized view update compliant form
- But how could CEDR know it needed to wait for K2 (to produce output) when it saw K1?
- It couldn’t have without waiting indefinitely or without some external guarantee
Input:
K   Vs  Ve  C   P
K1  (values missing in transcript)
K2  1   5   3   5
Correct output:
K   Vs  Ve  C   P
K1  -∞  1   …   ?
K2  1   2   …   5
K3  2   5   …   10
K4  5   6   …   15
K5  6   ∞   …   ?
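The problem can be made concrete with a snapshot-style average. The sketch below assumes events are `(vs, ve, payload)` tuples. The first input event, valid over [2, 6) with payload 15, is a hypothetical value chosen so that the result agrees with the averages 5, 10, and 15 in the output table; the second event is the K2 row from the input table.

```python
def snapshot_avg(events):
    """Average of the payloads valid in each segment between interval endpoints."""
    points = sorted({t for vs, ve, _ in events for t in (vs, ve)})
    out = []
    for lo, hi in zip(points, points[1:]):
        vals = [p for vs, ve, p in events if vs <= lo and hi <= ve]
        if vals:  # segments with no valid event produce no average
            out.append((lo, hi, sum(vals) / len(vals)))
    return out

first = (2, 6, 15)   # assumed values for the event that arrives first
second = (1, 5, 5)   # K2 from the input table, arriving later

after_first = snapshot_avg([first])
after_both = snapshot_avg([first, second])
```

After the first event alone, the operator would report an average of 15 over [2, 6). Once the second event arrives, that answer is wrong over [2, 5), and a new result over [1, 2) is needed, which an operator that has already emitted output cannot fix without retractions.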

11 But is this implementable?
- We need the ability to retract previously output results in the stream:
K    Vs  Ve  C   P
K1   1   5   …   1
-K1  1   2   …   1
K2   2   7   …   2
is logically equivalent to:
K    Vs  Ve  C   P
K1   1   2   …   1
K2   2   7   …   2
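One way to read the two tables is that a retraction shortens the lifetime of an earlier event, and two streams are logically equivalent when their net effect is the same. The sketch below encodes that reading under stated assumptions: records are `(kind, key, vs, ve, payload)` tuples, a retraction always arrives after the insert it refers to, and the function name `net_effect` is illustrative.

```python
def net_effect(records):
    """Fold inserts and retractions into the final lifetime of each key."""
    live = {}
    for kind, key, vs, ve, payload in records:
        if kind == "insert":
            live[key] = (vs, ve, payload)
        else:
            # Retraction: shrink the valid end time of the earlier insert to ve.
            old_vs, _old_ve, old_payload = live[key]
            if ve <= old_vs:
                del live[key]  # lifetime shrunk to nothing: fully retracted
            else:
                live[key] = (old_vs, ve, old_payload)
    return live

with_retraction = [
    ("insert", "K1", 1, 5, 1),
    ("retract", "K1", 1, 2, 1),
    ("insert", "K2", 2, 7, 2),
]
without_retraction = [
    ("insert", "K1", 1, 2, 1),
    ("insert", "K2", 2, 7, 2),
]

equivalent = net_effect(with_retraction) == net_effect(without_retraction)
```

Both streams reduce to K1 valid over [1, 2) and K2 valid over [2, 7).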

12 But is this implementable?
- Our real definition of well behavedness:
  - Any well behaved operator, when given logically equivalent sets of input streams, produces logically equivalent sets of output streams
- Avg may now fully retract incorrect previous output and issue new correct output for the appropriate time period
- We can denote operator semantics in a very clean manner even in a system with arbitrarily out of order data
- The use of retractions to handle out of order data induces a spectrum of formally defined consistency levels for operators
- These levels expose interesting tradeoffs between various aspects of performance and correctness (much more in the paper)

13 Imperfections in Event Streaming
How do current systems cope:
- Wait until we’re sure we have all data that affects our results up to a point in time (high consistency)
  - High latency
  - Requires application and network guarantee
  - Requires high memory
  - Absolutely correct answers
  - Useful for standing queries that result in some expensive form of corrective or examination action:
    - A human must examine something because some aggregation (avg) or negation based alert tripped
- Provide an answer quickly as of the current time, but ignore late arriving data (low consistency)
  - Low latency
  - No application or network guarantee required
  - Low memory
  - Sacrifices answer correctness
  - Useful in applications which are unable to provide guarantees about data arrival timeliness and where exact answers aren’t required:
    - E.g. aggregations in internet scale monitoring

14 Imperfections in Event Streaming
With retractions:
- Compute our output early in an optimistic fashion and retract later if necessary (middle consistency)
  - Low latency
  - Doesn’t require application and network guarantees
  - High memory requirements: equal to the high consistency case if we have guarantees
  - May produce more output
  - Useful in situations where we don’t want to block, but where we want eventual correctness:
    - Stock ticker data example: we want to compute real time info about stock data, but compensate when a correction is issued
    - Shared expressions between two queries, one running at the high level of consistency and one at the low

15 Infinite Spectrum of Consistency Levels
[Figure: consistency levels plotted against blocking (B) and memory (M). Blocking axis: "quick & optimistic" to "slow & cautious". Memory axis: "small & less correct" to "big & more correct". Labeled points: strong consistency, middle consistency, weak consistency.]
B = How long (at most) does the query block
M = How long (at most) is the query required to remember data
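The two parameters can be read as thresholds on how late an event may arrive. The sketch below is one possible interpretation, not the paper's formal definition: an event whose delay is within B is absorbed by blocking, one within M is handled by retracting and correcting earlier output, and anything later is ignored. The function name `handle`, the outcome labels, and the (B, M) settings assigned to each level are assumptions.

```python
import math

def handle(delay, B, M):
    """Classify a late event by its delay, given blocking time B and memory time M."""
    if delay <= B:
        return "in order"   # the query was still blocking, so no output is affected
    if delay <= M:
        return "retract"    # state is still remembered: retract and re-issue output
    return "dropped"        # state already forgotten: the late event is ignored

# Assumed settings for the three named levels.
levels = {
    "strong": (math.inf, math.inf),  # block as long as needed
    "middle": (0, math.inf),         # never block, always able to correct
    "weak": (0, 0),                  # never block, never correct
}

late_by_7 = {name: handle(7, B, M) for name, (B, M) in levels.items()}
```

Choosing finite values with 0 < B <= M gives the intermediate points of the spectrum.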

