Spark Debugger Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica UC BERKELEY.


1 Spark Debugger Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica UC BERKELEY

2 Motivation Debugging distributed programs is hard. Debuggers for general distributed systems incur high overhead. The Spark model enables debugging with almost zero overhead.

3 Spark Programming Model Example: Find how many Wikipedia articles match a search term. Pipeline: HDFS file → map(_.split('\t')(3)) → articles → filter(_.contains("Berkeley")) → matches → count() → 10,000. The datasets are Resilient Distributed Datasets (RDDs), built with deterministic transformations.
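The pipeline on this slide can be imitated without a cluster. The following is a minimal sketch, not the authors' code: it assumes an in-memory Seq[String] in place of the HDFS-backed RDD, and the names ArticleSearch and countMatches are hypothetical. It applies the same map and filter steps, then counts the matches.

```scala
object ArticleSearch {
  // Assumption: each input line has at least four tab-separated fields,
  // and the fourth field (index 3) holds the article text.
  def countMatches(lines: Seq[String], term: String): Int = {
    // map step: keep only the fourth tab-separated field of each line
    val articles = lines.map(_.split('\t')(3))
    // filter step: keep the articles that contain the search term
    val matches = articles.filter(_.contains(term))
    // count step: on a Seq this is size; on an RDD it would be count()
    matches.size
  }
}
```

Because both steps are pure functions of their input, running countMatches again on the same lines gives the same answer, which is the property the debugger relies on.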

4 Debugging a Spark Program Debug the individual transformations instead of the whole system: rerun tasks, recompute RDDs. Debugging a distributed program is now as easy as debugging a single-threaded one. Also applies to MapReduce and Dryad.

5 Approach As a Spark program runs, workers report key events back to the master, which logs them. Reported events: performance stats, exceptions, RDD checksums. (Diagram: workers, master, event log.)
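One way to picture the event log is as an append-only list of typed events held by the master. The sketch below is illustrative only: the type names (Event, TaskStats, TaskException, RddChecksum, Master) and their fields are assumptions, not the debugger's actual classes, and the workers are represented by plain method calls rather than network messages.

```scala
import scala.collection.mutable.ArrayBuffer

object EventLogSketch {
  // Hypothetical event types, one per category named on the slide.
  sealed trait Event
  final case class TaskStats(taskId: Int, runtimeMs: Long) extends Event
  final case class TaskException(taskId: Int, message: String) extends Event
  final case class RddChecksum(rddId: Int, partition: Int, checksum: Long) extends Event

  // Stand-in for the master: appends each reported event in arrival order.
  final class Master {
    private val log = ArrayBuffer.empty[Event]

    // Called by a worker to report an event.
    def report(event: Event): Unit = synchronized { log += event }

    // Immutable snapshot of the log, suitable for later replay.
    def eventLog: Seq[Event] = synchronized { log.toList }
  }
}
```

Keeping the log append-only and ordered is what would let a later debugging session step through the same events in the same sequence.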

6 Approach Later, the user can re-execute from the event log to debug in a controlled environment. (Diagram: workers, master, debugger, event log.)

7 Detecting Nondeterministic Transformations Re-running a nondeterministic transformation may yield different results. We can use RDD checksums to detect nondeterminism and alert the user.
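The checksum idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the debugger's implementation: it uses CRC32 over each element's string form, treats a Seq as one partition, and compares two back-to-back runs instead of a logged checksum against a replayed one. The names ChecksumCheck, checksum, and looksDeterministic are hypothetical.

```scala
import java.util.zip.CRC32

object ChecksumCheck {
  // CRC32 over the UTF-8 string form of each element, in order.
  // A zero byte separates elements so that Seq("ab", "c") and
  // Seq("a", "bc") feed different byte streams to the checksum.
  def checksum[T](partition: Seq[T]): Long = {
    val crc = new CRC32
    partition.foreach { x =>
      crc.update(x.toString.getBytes("UTF-8"))
      crc.update(0)
    }
    crc.getValue
  }

  // Applies f to the same input twice and compares the checksums.
  // A mismatch proves nondeterminism; a match does not prove determinism.
  def looksDeterministic[A, B](input: Seq[A])(f: A => B): Boolean =
    checksum(input.map(f)) == checksum(input.map(f))
}
```

A pure function such as `_ * 2` passes this check, while a function that reads mutable state or a random source will usually fail it, at which point the user can be alerted.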

8 Demo Example app: PageRank on Wikipedia dataset

9 Performance Event logging introduces minimal overhead

10 Future Plans Culprit determination, GC monitoring, memory monitoring.

11 Ankur Dave ankurd@eecs.berkeley.edu http://ankurdave.com The Spark debugger is in development at https://github.com/mesos/spark, branch event-log. Try Spark at http://spark-project.org!

