Naiad Timely Dataflow System Presented by Leeor Peled Advanced Topics in Computer Architecture April 2014.

1 Naiad Timely Dataflow System Presented by Leeor Peled Advanced Topics in Computer Architecture April 2014

2 Greece, sometime B.C.
- Naiads (Ναϊάδες) were a type of nymph (female spirit) who presided over fountains, wells, springs, streams, brooks and other bodies of fresh water.
- Dryads (Δρυάδες) are tree nymphs, or female tree spirits.
- There’s one hiding over here!

3 Dryad (EuroSys’07)

4 Distributed dataflow frameworks
- Batch processing: MapReduce, Dryad, Spark
  - Deterministic, stateless, synchronous
- Stream processing: Storm, MillWheel, TimeStream
  - Asynchronous
- Graph processing: Pregel, GraphLab, Giraph

5 Graph descriptive language

6 Timely Dataflow: from 10k ft
- "Naiad" (SOSP'13)
- Framework no longer requires a DAG
- Nodes are stateful, and extended with epoch-based timestamps
  - Allows simple loop contexts
  - Can preserve datasets
- No global coordination on critical path (some lazy "GC")
- Choice between immediate responsiveness and aggregated syncs
- Better suited to implement graph algorithms over big-data streaming input
- …and it's open sourced!
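To make "epoch-based timestamps" and "loop contexts" concrete, here is a minimal Python sketch (not Naiad's C# API). It assumes the scheme described in the Naiad paper: a timestamp is an input epoch plus one counter per enclosing loop, and timestamps at the same loop depth compare by epoch and then lexicographically by counters. The names `Timestamp`, `ingress`, `egress`, `feedback` and `less_or_equal` are illustrative choices for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timestamp:
    epoch: int                      # which input batch the record belongs to
    counters: Tuple[int, ...] = ()  # one iteration counter per enclosing loop


def ingress(t: Timestamp) -> Timestamp:
    """Entering a loop context adds a fresh counter starting at 0."""
    return Timestamp(t.epoch, t.counters + (0,))


def egress(t: Timestamp) -> Timestamp:
    """Leaving a loop context drops the innermost counter."""
    return Timestamp(t.epoch, t.counters[:-1])


def feedback(t: Timestamp) -> Timestamp:
    """Going around the loop once more increments the innermost counter."""
    return Timestamp(t.epoch, t.counters[:-1] + (t.counters[-1] + 1,))


def less_or_equal(a: Timestamp, b: Timestamp) -> bool:
    """Order for timestamps at the same loop depth (tuples compare lexicographically)."""
    return a.epoch <= b.epoch and a.counters <= b.counters
```

A record from epoch 3 that enters a loop, iterates twice and leaves ends up with its original timestamp, which is what lets a loop body run without any special handling downstream.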

7 Processing example

8 For comparison.. Map/Reduce model

9 Timeliness in depth

10 Timeliness (2)

11 Timeliness (3)
- Message passing API:
  - Callbacks:
    - OnRecv(Edge, Message, Timestamp)
    - OnNotify(Timestamp)
  - Calls:
    - SendBy(Edge, Message, Timestamp)
    - NotifyAt(Timestamp)
- Restrictions / guarantees:
  - Messages are queued and may be reordered
  - OnNotify is guaranteed to occur after all OnRecv calls (to the same vertex) with times at or before it. Works like a barrier.
  - Calls must advance time monotonically

12 Example code

    class DistinctCount<S,T> : Vertex<T>
    {
        // Per-timestamp table of how often each message was seen.
        Dictionary<T, Dictionary<S,int>> counts;

        // Low latency processing (sends upon receive)
        void OnRecv(Edge e, S msg, T time)
        {
            if (!counts.ContainsKey(time)) {
                counts[time] = new Dictionary<S,int>();
                this.NotifyAt(time);              // ask to be told when 'time' is complete
            }
            if (!counts[time].ContainsKey(msg)) {
                counts[time][msg] = 0;
                this.SendBy(output1, msg, time);  // first sighting: emit immediately
            }
            counts[time][msg]++;
        }

        // Synchronization point
        void OnNotify(T time)
        {
            foreach (var pair in counts[time])
                this.SendBy(output2, pair, time); // final counts for 'time'
            counts.Remove(time);
        }
    }
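The behaviour of this vertex can be tried out without Naiad using the Python sketch below. The `DistinctCount` class mirrors the slide's logic; `ToyRuntime` is a hypothetical single-threaded stand-in invented for this example, not Naiad's scheduler. It simply delivers every message first and then fires notifications in time order, which is one trivial way to honour the rule that a notification comes after all receives for that time.

```python
from collections import defaultdict


class ToyRuntime:
    """Hypothetical stand-in for the runtime: records sends, defers notifications."""

    def __init__(self):
        self.outputs = defaultdict(list)  # edge name -> list of (time, message)
        self.pending = set()              # times for which a notification was requested

    def send_by(self, edge, msg, time):
        self.outputs[edge].append((time, msg))

    def notify_at(self, time):
        self.pending.add(time)

    def run(self, vertex, messages):
        # Deliver all messages, then notify in increasing time order, so each
        # notification follows every on_recv with the same or an earlier time.
        for msg, time in messages:
            vertex.on_recv(msg, time)
        for time in sorted(self.pending):
            vertex.on_notify(time)
        self.pending.clear()


class DistinctCount:
    """Python rendering of the slide's vertex logic."""

    def __init__(self, runtime):
        self.runtime = runtime
        self.counts = {}  # time -> {message -> count}

    def on_recv(self, msg, time):
        if time not in self.counts:
            self.counts[time] = {}
            self.runtime.notify_at(time)
        if msg not in self.counts[time]:
            self.counts[time][msg] = 0
            self.runtime.send_by("output1", msg, time)  # low-latency path
        self.counts[time][msg] += 1

    def on_notify(self, time):
        for pair in self.counts[time].items():
            self.runtime.send_by("output2", pair, time)  # synchronized path
        del self.counts[time]
```

Feeding it `[("a", 0), ("b", 0), ("a", 0), ("a", 1)]` sends each distinct message on `output1` as soon as it is first seen, while the per-time counts only appear on `output2` once the notification for that time fires.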

13 Framework implementation
- Single thread:
  - Scheduler keeps track of events
  - Pointstamp = {Location, Timestamp}
  - Initialized per epoch and per input vertex
  - For each active (pending) pointstamp, maintain:
    - Occurrence count (number of events associated with it)
    - Precursor count (number of "could-result-in" events)
  - "Frontier": set of pointstamps with precursor count = 0. OK to notify.
- Distributed: update protocol to "agree" on global state epoch
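The bookkeeping on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Naiad's implementation: `ProgressTracker` is a made-up name, and the `could_result_in` relation below is a toy that assumes a linear pipeline with integer stage numbers and integer times, where an event can only lead to events at the same or a later stage and the same or a later time.

```python
from collections import namedtuple

Pointstamp = namedtuple("Pointstamp", ["location", "time"])


def could_result_in(p, q):
    """Toy relation (assumption): linear pipeline, stages and times only move forward."""
    return p != q and p.location <= q.location and p.time <= q.time


class ProgressTracker:
    def __init__(self):
        self.occurrence = {}  # active pointstamp -> number of pending events
        self.precursor = {}   # active pointstamp -> number of active precursors

    def add(self, p):
        """Record one more pending event at pointstamp p."""
        if p not in self.occurrence:
            # p becomes active: count who can reach it, and tell those it can reach.
            self.precursor[p] = sum(
                1 for q in self.occurrence if could_result_in(q, p)
            )
            for q in self.occurrence:
                if could_result_in(p, q):
                    self.precursor[q] += 1
            self.occurrence[p] = 0
        self.occurrence[p] += 1

    def remove(self, p):
        """Retire one pending event at pointstamp p."""
        self.occurrence[p] -= 1
        if self.occurrence[p] == 0:
            # p is no longer active: release everything it was holding back.
            del self.occurrence[p]
            del self.precursor[p]
            for q in self.occurrence:
                if could_result_in(p, q):
                    self.precursor[q] -= 1

    def frontier(self):
        """Active pointstamps with no active precursor; safe to notify."""
        return {p for p, n in self.precursor.items() if n == 0}
```

With a pending message at stage 0 and a requested notification at stage 1 for the same time, only the message is in the frontier; the notification enters the frontier once the message's occurrence count drops to zero.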

14 Logical graph deployment
- With parallel physical nodes, the graph is cloned
- Each node passes messages locally by default
- Framework supports partitioning functions for routing
  - Allows hash/key partitioning, map/reduce logic or group-by operations
- "Could-result-in" relations are calculated over the logical graph (less effective, but simplifies scaling)
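As a rough picture of what a partitioning function does, the Python sketch below routes records to workers by hashing a key, so that all records sharing a key reach the same worker (the property group-by and reduce steps rely on). The helper names `partition` and `route` and the use of CRC32 as the hash are assumptions for this illustration, not part of Naiad.

```python
import zlib
from collections import defaultdict


def partition(key, num_workers):
    """Map a key to a worker index; CRC32 keeps the result stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % num_workers


def route(records, key_fn, num_workers):
    """Group records by destination worker, using key_fn to extract the routing key."""
    buckets = defaultdict(list)
    for record in records:
        buckets[partition(key_fn(record), num_workers)].append(record)
    return dict(buckets)
```

For example, `route(edges, lambda e: e[0], 4)` would send every edge with the same source vertex to the same one of four workers.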

15 Optimizations
- Checkpointing (per vertex) for fault tolerance
- Disable Windows TCP delays (Nagle's algorithm)
- Reduce backoff times in concurrency conflicts
- Reduce GC frequency

16 Results: micro benchmarks

17 Results: Real-world apps

18 Discussion…
- Focus on in-memory workloads
- Lock contention and producer/consumer issues
- Performance comparison done against external results: fishy…
- Auto mapping from logical to physical

