

1 IncApprox: The marriage of incremental and approximate computing. Pramod Bhatotia, Dhanya Krishnan, Do Le Quoc, Christof Fetzer, Rodrigo Rodrigues* (TU Dresden & *IST Lisbon)

2 Data analytics systems turn raw data into information.

3 Big data systems: massive scale, low latency, high throughput.

4 To strike a balance: there is a tension between low latency and high throughput, and “novel” computing paradigms aim to strike a balance between the two.

5 How do these computing paradigms make this trade-off? Observation: they compute over a subset of data items instead of the entire dataset, so the computation takes less time and fewer resources!

6 Two such computing paradigms: incremental computing (Inc) and approximate computing (Approx).

7 Incremental computation. Common workflow: rerun the same application over an evolving input, where a small change to the input should produce an incrementally updated output. Incremental updates: reuse the memoized parts of the computation that are unaffected by the changed input.
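
As a rough illustration of memoization-based reuse (not IncApprox's actual mechanism, which builds on self-adjusting computation), the sketch below caches per-chunk results keyed by a content hash, so rerunning over an evolving input recomputes only the chunks that changed. The class `IncrementalWordCount`, the helper `count_words`, and word counting as the workload are hypothetical choices for illustration.

```python
import hashlib
from collections import Counter


def count_words(chunk: str) -> Counter:
    """Hypothetical per-chunk computation: count the words in one chunk."""
    return Counter(chunk.split())


class IncrementalWordCount:
    """Reuses memoized per-chunk results across runs (illustrative sketch)."""

    def __init__(self) -> None:
        # Memo table: content hash of a chunk -> that chunk's word counts.
        self.memo: dict[str, Counter] = {}
        # Number of chunks actually recomputed (memo misses) in the latest run.
        self.recomputed = 0

    def run(self, chunks: list[str]) -> Counter:
        self.recomputed = 0
        total: Counter = Counter()
        for chunk in chunks:
            key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if key not in self.memo:  # new or changed chunk: compute it once
                self.memo[key] = count_words(chunk)
                self.recomputed += 1
            total.update(self.memo[key])  # merge this chunk's (memoized) counts
        return total


wc = IncrementalWordCount()
wc.run(["a b", "b c", "c d"])           # first run computes all three chunks
result = wc.run(["a b", "b c", "c e"])  # only the modified last chunk is recomputed
```

The memo table here grows without bound; a real system would also have to evict results for data that has left the input.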

8 Approximate computation. Common use case: an approximate output is good enough! The application computes only over the parts of the input selected by representative sampling and produces an approximate output.
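
A minimal sketch of sampling-based approximation, assuming the query is a sum over numeric items: compute over a uniform random sample and scale the partial result by N/n. The function name `approximate_sum` and its parameters are hypothetical, and IncApprox itself uses stratified sampling rather than the plain uniform sample shown here.

```python
import random


def approximate_sum(data: list[float], fraction: float, seed: int = 0) -> float:
    """Estimates sum(data) from a uniform random sample of about fraction * N items.

    Scaling the sample sum by N / n gives an unbiased estimate of the true sum.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not data:
        return 0.0
    n = max(1, int(len(data) * fraction))         # sample size (at least one item)
    sample = random.Random(seed).sample(data, n)  # drawn without replacement
    return sum(sample) * len(data) / n


values = [float(i) for i in range(1000)]
estimate = approximate_sum(values, fraction=0.1)  # touches only 100 of 1000 items
```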

9 Basic idea. Both paradigms compute over a subset of data items: incremental computation over the items affected by the changed input, and approximate computation over the items selected by input sampling. IncApprox combines them through biased sampling: select input items for which we already have memoized results from previous runs.
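
The hypothetical `biased_sample` function below shows the idea in its simplest form: it fills the sample budget first with items that already have memoized results and then tops it up uniformly at random from the remaining items. This is an assumption-laden simplification, not the paper's algorithm; preferring reused items this aggressively can skew the sample, which a careful design has to account for.

```python
import random


def biased_sample(window: list[str], memoized: set[str], k: int, seed: int = 0) -> list[str]:
    """Selects up to k items from window, preferring items with memoized results.

    Hypothetical simplification: memoized items are drawn first, and any
    remaining budget is filled uniformly at random from the other items.
    """
    rng = random.Random(seed)
    reusable = [x for x in window if x in memoized]
    fresh = [x for x in window if x not in memoized]
    k = max(0, min(k, len(window)))
    n_reused = min(k, len(reusable))
    chosen = rng.sample(reusable, n_reused)    # items whose results can be reused
    chosen += rng.sample(fresh, k - n_reused)  # new items that must be computed
    return chosen


window = ["a", "b", "c", "d", "e", "f"]
picked = biased_sample(window, memoized={"b", "d"}, k=4)
```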

10 Outline: Motivation, Design, Evaluation

11 Overview of IncApprox. IncApprox combines incremental and approximate computing: it takes an input data stream, a streaming query, and a query budget (latency or resource constraints), and produces an approximate output. The query budget provides an adaptive execution interface to systematically tune the trade-off between latency and throughput!
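
One way to turn a latency budget into a sample size, assuming a simple linear cost model in which processing latency grows with the number of sampled items. The function `sample_size_for_budget`, the linear model, and the overhead and per-item cost parameters are all hypothetical; the slides do not specify how IncApprox performs this mapping.

```python
def sample_size_for_budget(window_size: int, latency_budget_ms: float,
                           cost_per_item_ms: float, overhead_ms: float = 0.0) -> int:
    """Largest sample size whose estimated latency fits within the budget.

    Hypothetical linear cost model:
        latency ≈ overhead_ms + n * cost_per_item_ms
    The result is capped at the window size, since no more items exist.
    """
    if cost_per_item_ms <= 0:
        raise ValueError("cost_per_item_ms must be positive")
    usable_ms = latency_budget_ms - overhead_ms
    if usable_ms <= 0:
        return 0
    return min(window_size, int(usable_ms / cost_per_item_ms))


# A 500 ms budget with 100 ms fixed overhead and 2 ms per item allows 200 items.
n = sample_size_for_budget(window_size=10_000, latency_budget_ms=500,
                           cost_per_item_ms=2, overhead_ms=100)
```

A tighter budget yields a smaller sample and a faster but less accurate answer; a looser one moves toward processing the whole window.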

12 Computation model: “batched stream processing”. For each sliding computation window over the input data stream, run a data-parallel job (map tasks M feeding reduce tasks R) to produce the output.
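
A small, framework-free sketch of the batched stream processing model: micro-batches are grouped into sliding computation windows, and a map/reduce word count runs over each window. The function names and the window and slide parameters (both measured in micro-batches) are assumptions for illustration; in IncApprox this role is played by Apache Spark Streaming jobs.

```python
from collections import Counter
from typing import Iterable, Iterator


def sliding_windows(batches: list[list[str]], window: int, slide: int) -> Iterator[list[str]]:
    """Yields the items of each computation window.

    A window spans `window` consecutive micro-batches and advances by `slide`
    micro-batches, so consecutive windows overlap when slide < window.
    """
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    for start in range(0, len(batches) - window + 1, slide):
        yield [item for batch in batches[start:start + window] for item in batch]


def word_count_job(items: Iterable[str]) -> Counter:
    """Data-parallel job per window: map lines to (word, 1), reduce by word."""
    mapped = [(word, 1) for line in items for word in line.split()]  # map phase
    counts: Counter = Counter()
    for word, one in mapped:  # reduce phase: sum the values for each key
        counts[word] += one
    return counts


batches = [["a b"], ["b c"], ["c d"], ["d e"]]
outputs = [word_count_job(w) for w in sliding_windows(batches, window=2, slide=1)]
```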

13 High-level approach. Starting from the computation input window: Step #1, stratified sampling; Step #2, biased sampling; Step #3, run the job incrementally to produce the approximate output.

14 #1: Stratified sampling (Step #1 of the high-level approach)

15 #1: Why stratified sampling? The input stream reaches the stream processing system through a stream aggregator (Kafka) and is composed of sub-streams S1, S2, …, Sn. Sub-streams carry disparate events with different distributions and different arrival rates, so sampling needs a proportional allocation of data items across all sub-streams.

16 #1: Stratified sampling in IncApprox. For each computation window over the sub-streams S1, S2, …, Sn arriving from the stream aggregator (Kafka), IncApprox derives the sample size from the query budget and applies stratified reservoir sampling (see the paper for details).
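
A sketch of stratified sampling with a per-stratum reservoir, assuming each item is tagged with its sub-stream id and the sample size is allocated to sub-streams in proportion to their share of the window (with at least one slot per sub-stream). The per-stratum step is the classic Algorithm R reservoir sampler; the function names are hypothetical, and the paper's stratified reservoir sampling may differ in detail.

```python
import random
from collections import defaultdict
from collections.abc import Hashable
from typing import TypeVar

T = TypeVar("T")


def reservoir_sample(items: list[T], k: int, rng: random.Random) -> list[T]:
    """Algorithm R: a uniform random sample of k items in a single pass."""
    reservoir: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item  # keep item with probability k / (i + 1)
    return reservoir


def stratified_reservoir_sample(window: list[tuple[Hashable, T]], sample_size: int,
                                seed: int = 0) -> dict[Hashable, list[T]]:
    """Samples each sub-stream separately, sized in proportion to its share.

    `window` holds (substream_id, item) pairs for one computation window.
    Every sub-stream present in the window receives at least one slot.
    """
    rng = random.Random(seed)
    strata: dict[Hashable, list[T]] = defaultdict(list)
    for substream, item in window:
        strata[substream].append(item)
    total = len(window)
    sample: dict[Hashable, list[T]] = {}
    for substream, items in strata.items():
        k = max(1, sample_size * len(items) // total)  # proportional allocation
        sample[substream] = reservoir_sample(items, min(k, len(items)), rng)
    return sample


window = ([("S1", i) for i in range(600)] + [("S2", i) for i in range(300)]
          + [("S3", i) for i in range(100)])
sample = stratified_reservoir_sample(window, sample_size=100)
sizes = {s: len(items) for s, items in sample.items()}  # {'S1': 60, 'S2': 30, 'S3': 10}
```

For clarity this version buffers a whole window before sampling, and the at-least-one rule can push the total slightly above `sample_size` when many sub-streams are tiny; that rule is what keeps rare sub-streams from disappearing from the sample.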

17 #2: Biased sampling (Step #2 of the high-level approach)

18 #2: Why biased sampling? In the input data stream, the window at T1 and the window at T2 overlap. Successive overlapping computation windows provide an opportunity to reuse results.
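
To quantify the reuse opportunity, the sketch below assumes count-based sliding windows of `size` items that advance by `slide` items, and computes the fraction of each window already covered by the previous one. The helper names are hypothetical; with `size=100` and `slide=10`, 90% of each window's items were also in the previous window.

```python
def window_positions(t: int, size: int, slide: int) -> set[int]:
    """Stream positions covered by the t-th count-based sliding window."""
    start = t * slide
    return set(range(start, start + size))


def reusable_fraction(size: int, slide: int) -> float:
    """Share of window t+1 that was already in window t (results reusable)."""
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    previous = window_positions(0, size, slide)
    current = window_positions(1, size, slide)
    return len(previous & current) / size
```

The same value follows directly from max(0, size - slide) / size; the set version just makes the overlap explicit.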

19 #2: Biased sampling in IncApprox. Across overlapping windows (T1, T2) with fluctuating arrival rates, IncApprox adapts the budget / sample size and applies biased sampling (see the paper for details).

20 #3: Run job incrementally (Step #3 of the high-level approach)

21 #3: Why an incremental run? Each computation window contains both old and new data items. To reuse results, one could design and implement “dynamic algorithms” for each application; instead, we need an automatic and efficient mechanism to incrementally update the output.

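22 #3: Incremental run in IncApprox. IncApprox uses self-adjusting computation (see the paper for details): the job over a window is recorded as a dependence graph of map (M) and reduce (R) tasks, and a change in a data item is handled by change propagation, which re-executes only the affected tasks.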
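
A toy version of change propagation over a map/reduce dependence graph, assuming word count as the job: each partition has one map task, each word has one reduce task, and the reduce task for a word depends on every map task that emits it. Updating one partition reruns its map task and only the reduce tasks for words that appeared before or after the change. This is a hypothetical simplification of self-adjusting computation, not IncApprox's implementation.

```python
from collections import Counter


class SelfAdjustingWordCount:
    """Toy dependence graph: one map task per partition, one reduce task per word.

    The reduce task for a word depends on every map task that emits that word,
    so a change to one partition only invalidates its own map task and the
    reduce tasks for words it emitted before or after the change.
    """

    def __init__(self, partitions: list[str]) -> None:
        self.map_out = [Counter(p.split()) for p in partitions]  # run all map tasks
        self.output: dict[str, int] = {}
        for word in set().union(*self.map_out):                  # run all reduce tasks
            self._reduce(word)

    def _reduce(self, word: str) -> None:
        # Reduce task for one key: combine that word's memoized map outputs.
        total = sum(m[word] for m in self.map_out)
        if total:
            self.output[word] = total
        else:
            self.output.pop(word, None)  # the word no longer occurs anywhere

    def update(self, index: int, new_text: str) -> int:
        """Changes one partition, propagates the change, returns reduces rerun."""
        old = self.map_out[index]
        self.map_out[index] = Counter(new_text.split())          # rerun one map task
        affected = set(old) | set(self.map_out[index])
        for word in affected:                                    # rerun affected reduces
            self._reduce(word)
        return len(affected)


job = SelfAdjustingWordCount(["a b", "b c", "c d"])
rerun = job.update(2, "c e")  # reruns the reduce tasks for "c", "d", "e" only
```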

23 Outline: Motivation, Design, Evaluation

24 Evaluation: performance gains of IncApprox. Implementation: Apache Spark Streaming. Platform: a 24-node distributed computing cluster. Applications: (1) Twitter stream analytics, (2) network monitoring. See the paper for more results!

25 Performance gains (higher is better): 2X over native Spark Streaming, and 1.4X over the individual Inc and Approx modules.

26 Summary: IncApprox is a data analytics system for incremental approximate computing. Transparent: targets existing applications without any code changes. Practical: supports adaptive execution based on the query budget. Efficient: employs a mix of the Inc and Approx computing paradigms.

27 IncApprox: Transparent + Practical + Efficient. IncApprox also provides error estimation: approximate output = output ± error estimate. See the paper for details! Thank you! www.mpi-sws.org/~bhatotia
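
As one textbook way to produce an "output ± error estimate" pair (the slides do not say which estimator IncApprox uses), the sketch below estimates a population sum from a simple random sample drawn without replacement and reports a normal-approximation error bound with a finite-population correction. The function name `sum_with_error` and the default `z = 1.96` (about 95% confidence) are assumptions.

```python
import math
import statistics


def sum_with_error(sample: list[float], population_size: int,
                   z: float = 1.96) -> tuple[float, float]:
    """Estimates a population sum from a simple random sample (no replacement).

    Returns (estimate, error) for reporting `estimate ± error`; z = 1.96 gives
    roughly 95% confidence under a normal (central limit) approximation.
    """
    n = len(sample)
    if n < 2 or population_size < n:
        raise ValueError("need 2 <= len(sample) <= population_size")
    mean = statistics.fmean(sample)
    stdev = statistics.stdev(sample)  # sample standard deviation
    # Finite-population correction: the error shrinks to 0 when n == N.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    estimate = population_size * mean
    error = z * population_size * stdev / math.sqrt(n) * fpc
    return estimate, error


est, err = sum_with_error([1.0, 3.0], population_size=1001)
```

With a stratified sample, the same calculation would typically be done per stratum and the variances combined; that extension is omitted here.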

