Maintaining SP Relationships Efficiently, on-the-fly Jeremy Fineman.

Slides:



Advertisements
Similar presentations
Wait-Free Linked-Lists Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank Technion, Israel Presented by Shahar Timnat 469-+
Advertisements

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
I NTRODUCTION TO G RAPH DRAWING Fall 2010 Battista, G. D., Eades, P., Tamassia, R., and Tollis, I. G Graph Drawing: Algorithms for the Visualization.
COL 106 Shweta Agrawal and Amit Kumar
Scalable and Precise Dynamic Datarace Detection for Structured Parallelism Raghavan RamanJisheng ZhaoVivek Sarkar Rice University June 13, 2012 Martin.
CS 484. Discrete Optimization Problems A discrete optimization problem can be expressed as (S, f) S is the set of all feasible solutions f is the cost.
Data Structures: A Pseudocode Approach with C 1 Chapter 5 Contd... Objectives Explain the design, use, and operation of a linear list Implement a linear.
More Graphs COL 106 Slides from Naveen. Some Terminology for Graph Search A vertex is white if it is undiscovered A vertex is gray if it has been discovered.
U NIVERSITY OF M ASSACHUSETTS, A MHERST – Department of Computer Science The Implementation of the Cilk-5 Multithreaded Language (Frigo, Leiserson, and.
Concurrent Search Structure Algorithms Dennis Shasha.
Lecture 11 CSS314 Parallel Computing
CILK: An Efficient Multithreaded Runtime System. People n Project at MIT & now at UT Austin –Bobby Blumofe (now UT Austin, Akamai) –Chris Joerg –Brad.
Nested Parallelism in Transactional Memory Kunal Agrawal, Jeremy T. Fineman and Jim Sukha MIT.
15-853Page : Algorithms in the Real World Suffix Trees.
296.3: Algorithms in the Real World
Binary Heaps CSE 373 Data Structures Lecture 11. 2/5/03Binary Heaps - Lecture 112 Readings Reading ›Sections
10/31/20111 Relativistic Red-Black Trees Philip Howard 10/31/2011
Transaction Processing: Concurrency and Serializability 10/4/05.
Canonical Producer CP API User Code CP Servlet Files CreateTable, Port, Protocol, Security, SQL Support, Multiple Query Support Security Insert Query Port.
Chapter 4: Transaction Management
Applied Algorithmics - week2
3/3/ Alperovich Alexander. Motivation  Find a maximal flow over a directed graph  Source and sink vertices are given 3/3/
Review of Graphs A graph is composed of edges E and vertices V that link the nodes together. A graph G is often denoted G=(V,E) where V is the set of vertices.
More Trees COL 106 Amit Kumar and Shweta Agrawal Most slides courtesy : Douglas Wilhelm Harder, MMath, UWaterloo
Juan Mendivelso.  Serial Algorithms: Suitable for running on an uniprocessor computer in which only one instruction executes at a time.  Parallel Algorithms:
Multithreaded Algorithms Andreas Klappenecker. Motivation We have discussed serial algorithms that are suitable for running on a uniprocessor computer.
The HDF Group Multi-threading in HDF5: Paths Forward Current implementation - Future directions May 30-31, 2012HDF5 Workshop at PSI 1.
Heaps, Heapsort, Priority Queues. Sorting So Far Heap: Data structure and associated algorithms, Not garbage collection context.
Tree (new ADT) Terminology:  A tree is a collection of elements (nodes)  Each node may have 0 or more successors (called children)  How many does a.
Scheduler Activations: Effective Kernel Support for the User-level Management of Parallelism Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska,
CSIT 402 Data Structures II
Chapter 13 B Advanced Implementations of Tables – Balanced BSTs.
November 15, 2007 A Java Implementation of a Lock- Free Concurrent Priority Queue Bart Verzijlenberg.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
U NIVERSITY OF M ASSACHUSETTS, A MHERST – Department of Computer Science Performance of Work Stealing in Multiprogrammed Environments Matthew Hertz Department.
2-3 Tree. Slide 2 Outline  Balanced Search Trees 2-3 Trees Trees.
Data Structure & Algorithm II.  In a multiuser computer system, multiple users submit jobs to run on a single processor.  We assume that the time required.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Week 8 - Wednesday.  What did we talk about last time?  Level order traversal  BST delete  2-3 trees.
Executing Parallel Programs with Potential Bottlenecks Efficiently Yoshihiro Oyama Kenjiro Taura Akinori Yonezawa {oyama, tau,
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
Parallelism without Concurrency Charles E. Leiserson MIT.
AVL Trees and Heaps. AVL Trees So far balancing the tree was done globally Basically every node was involved in the balance operation Tree balancing can.
Scheduling Multithreaded Computations By Work-Stealing Robert D. Blumofe The University of Texas, Austin Charles E. Leiserson, MIT Laboratory for Computer.
1 Review of report "LSDX: A New Labeling Scheme for Dynamically Updating XML Data"
CGS 3763 Operating Systems Concepts Spring 2013 Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 11: :30 AM.
BINARY TREES Objectives Define trees as data structures Define the terms associated with trees Discuss tree traversal algorithms Discuss a binary.
1 Binary Search Trees  Average case and worst case Big O for –insertion –deletion –access  Balance is important. Unbalanced trees give worse than log.
PC-Trees vs. PQ-Trees. 2 Table of contents Review of PQ-trees –Template operations Introducing PC-trees The PC-tree algorithm –Terminal nodes –Splitting.
Uses some of the slides for chapters 3 and 5 accompanying “Introduction to Parallel Computing”, Addison Wesley, 2003.
Concurrency and Performance Based on slides by Henri Casanova.
Red-Black Trees an alternative to AVL trees. Balanced Binary Search Trees A Binary Search Tree (BST) of N nodes is balanced if height is in O(log N) A.
Mergesort example: Merge as we return from recursive calls Merge Divide 1 element 829.
Kendo: Efficient Deterministic Multithreading in Software M. Olszewski, J. Ansel, S. Amarasinghe MIT to be presented in ASPLOS 2009 slides by Evangelos.
List Order Maintenance
Memory Consistency Models
Memory Consistency Models
Ivy Eva Wu.
Topological Sort (an application of DFS)
CS200: Algorithms Analysis
April 30th – Scheduling / parallel
Order maintenance problem
Md. Abul Kashem, Chowdhury Sharif Hasan, and Anupam Bhattacharjee
Transactions with Nested Parallelism
Topological Sort (an application of DFS)
Atlas: An Infrastructure for Global Computing
CMSC 341 Splay Trees.
Order maintenance problem
Presentation transcript:

Maintaining SP Relationships Efficiently, on-the-fly Jeremy Fineman

The Problem Fork-Join (e.g. Cilk) programs have threads that operate either in series or logically parallel. Want to query relationship between two threads as the program runs. For example, Nondeterminator uses relationship between two threads as basis for determinacy race.

Parse Tree Represent SP-DAG as a parse tree S nodes show series relationships P nodes are parallel relationships ac b d S P d a S cb

Least Common Ancestor SP-Bags uses LCA lookup. LCA of b and d is an S-node –So b and d are in series Cost is  (v,v) per query (in Nondeterminator) S P d a S c b

Two Complementary Walks At S-node, always walk left then right At P-node, can go left then right, or right then left S P d a S cb

Two Complementary Walks Produce two orders of threads: –a b c d –a c b d Notice b || c, and orders differ between b and c. S P d a S cb

Two Complementary Walks Claim: If e 1 precedes e 2 in one walk, and e 2 precedes e 1 in the other, then e 1 || e 2. S P d a S cb

Maintaining both orders in a single tree walk Walk of tree represents execution of program. –Can execute program twice, but execution could be nondeterministic. –Instead, maintain both thread orderings on-the- fly, in a single pass.

Order Maintenance Problem We need a data structure which supports the following operations: –Insert(X,Y): Place Y after X in the list. –Precedes(X,Y): Does X come before Y?

Naïve Order Maintenance Structure Naïve Implementation is just a linked list

Naïve Order Maintenance Insert Insert(X,Y) does standard linked list insert X Y

Naïve Order Maintenance Insert Insert(X,Y) does standard linked list insert X Y

Naïve Order Maintenance Insert Insert(X,Y) does standard linked list insert X Y

Naïve Order Maintenance Query Precedes(X,Z) looks forward in list. XZ

Naïve Order Maintenance Query Precedes(X,Z) looks forward in list. X Z

Naïve Order Maintenance Query Precedes(X,Z) looks forward in list. XZ

Naïve Order Maintenance Query Precedes(X,Z) looks forward in list. X Z

The algorithm Recall, we are thinking in terms of parse tree. Maintain two order structures. When executing node x: –Insert children of x after x in the lists. –Ordering of children depends on whether x is an S or P node.

Example S1S1 P d a S2S2 cb Order 1: Order 2:

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P b c c b

Example S1S1 P d a S2S2 c b Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P b c c b

Example S1S1 P d a S2S2 c b Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P b c c b

Example S1S1 P d a S2S2 cb Order 1: Order 2: S1S1 S1S1 S2S2 S2S2 d d a a P P b c c b

Analysis Correctness does not depend on execution –Any valid serial or parallel execution produces correct results. Inserts after x in orders only happen when x executes. Only one processor will ever insert after x. Running time depends on implementation of order maintenance data structure.

Serial Running Time Current Nondeterminator does serial execution. Can have O(T 1 ) queries and inserts. Naïve implementation is –O(n) time for query of n-element list. –O(1) time for insert. –Total time is very poor: O(T 1 2 )

Use Dietz and Sleator Order Maintenance Structure Essentially a linked list with labels. Queries are O(1): just compare the labels. Inserts are O(1) amortized cost. –On some inserts, need to perform relabel. O(T 1 ) operations only takes O(T 1 ) time. –Gives us linear time Nondeterminator. Better than SP-bags algorithm.

Parallel Problem Dietz and Sleator relabels on inserts –Does not work concurrently. Lock entire structure on insert? –Query is still O(1). –Single relabel can cost O(T 1 ) operations. Critical path increases to O(T 1 ) Running time is O(T 1 /p + T 1 ).

Parallel Problem Solution Leverage the Cilk scheduler: –There are only O(pT  ) steals There is no contention on subcomputations done by single processor between steals. –We do not need to lock every insert.

Parallel Problem Solution Top level is global ordering of subcomputations. Bottom level is local ordering of subcomputation performed on single processor. On a steal, insert O(1) nodes into global structure. Global Structure Size: O(pT  )

Parallel Running Time An insert into a local order structure is still O(1). An insert into the global structure involves locking the entire global structure. –May need to wait for p processors to insert serially. –Amortized cost is O(p) per insert. –Only O(pT  ) inserts into global structure. Total work and waiting time is O(T 1 + p 2 T  ) –Running time is O(T 1 /p + pT  )

Questions?