Efficiency of Alignment-based algorithms
B.F. van Dongen
Laziness! (Gu)estimation! Implementation effort?

Presentation transcript:


Introduction: Alignments
- Alignments are used for conformance checking
- Alignments are computed over a trace and a model:
  - A trace is a (partial) order of activities
  - A model is a labeled Petri net or a labeled Process Tree, labeled with activities
- An alignment explains exactly where deviations occur:
  - A synchronous move means that an activity is in the log and a corresponding transition was enabled in the model
  - A log move means that no corresponding activity is found in the model
  - A model move means that no corresponding activity appeared in the log
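The three move types can be illustrated with a small Python sketch. It is only an illustration: the names `move_kind` and `alignment_cost`, the use of `None` for a missing step, and the unit costs for log and model moves are assumptions, not something the slides prescribe.

```python
from typing import List, Optional, Tuple

# A move pairs a log step with a model step; None marks "no step" on that side.
Move = Tuple[Optional[str], Optional[str]]


def move_kind(move: Move) -> str:
    """Classify a move as 'sync', 'log' or 'model'."""
    log_step, model_step = move
    if log_step is not None and model_step is not None:
        if log_step != model_step:
            raise ValueError("a synchronous move needs matching labels")
        return "sync"
    if log_step is not None:
        return "log"      # activity in the log, nothing matching in the model
    if model_step is not None:
        return "model"    # step in the model, nothing matching in the log
    raise ValueError("a move needs at least one step")


def alignment_cost(alignment: List[Move], log_cost: int = 1, model_cost: int = 1) -> int:
    """Sum the assumed costs: synchronous moves are free, deviations are not."""
    costs = {"sync": 0, "log": log_cost, "model": model_cost}
    return sum(costs[move_kind(m)] for m in alignment)


# Hypothetical alignment: A matches, B is only in the log, C is only in the model.
example = [("A", "A"), ("B", None), (None, "C")]
```

With these assumed costs, `alignment_cost(example)` counts one log move and one model move.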

Implementations: Estimator versions
Petri nets:
- Dijkstra (estimator 0)
- Naive (estimator Parikh)
- ILP (estimator using ILP)
Process Trees:
- Naive (estimator Parikh)
- LP (estimator using LP)
- Hybrid ILP (ILP & LP)
- (I)LP with OR constraints

Complexity with Own Collections
    Initialize PriorityQueue q
    While (head(q) is not target t)
        VisitedNode n = head(q)
        add n to the considered nodes
        For each edge in the graph from node(n) to m
            If m was considered before, continue
            If m is in the queue with lower cost, continue
            If m is in the queue with higher cost, update and reposition it
            If m is new, compute an estimate for the remaining distance to t
            v = new VisitedNode(m)
            set n as the predecessor of v
            add v to the priority queue
    Return head(q)
Cost annotations on the slide, in order of appearance: O(log(n)+k/n), O(1), O(log(n)+k/n), Expensive?, O(log(n)), O(log(n)+k/n)
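The search loop can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: `a_star`, `edges`, `is_target` and `estimate` are hypothetical names, and because Python's `heapq` has no reposition operation, stale queue entries are skipped on poll instead of being updated in place. The sketch also assumes the estimate is consistent, so a considered node never needs to be reopened.

```python
import heapq


def a_star(start, is_target, edges, estimate):
    """Return (cost, path) of a cheapest path to a target node, or None."""
    best = {start: 0}          # cheapest known cost to reach each node
    pred = {start: None}       # predecessor on that cheapest path
    considered = set()
    counter = 0                # tie-breaker so nodes themselves are never compared
    queue = [(estimate(start), counter, start)]
    while queue:
        _, _, n = heapq.heappop(queue)
        if n in considered:
            continue           # stale entry left behind by a cheaper re-insert
        if is_target(n):
            path = []
            node = n
            while node is not None:
                path.append(node)
                node = pred[node]
            return best[n], path[::-1]
        considered.add(n)
        for m, cost in edges(n):
            if m in considered:
                continue
            new_cost = best[n] + cost
            if m in best and best[m] <= new_cost:
                continue       # already queued with a cost that is no worse
            best[m] = new_cost
            pred[m] = n
            counter += 1
            heapq.heappush(queue, (new_cost + estimate(m), counter, m))
    return None


# Hypothetical toy graph: node -> list of (successor, edge cost).
toy_graph = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("t", 5)],
    "b": [("t", 1)],
    "t": [],
}
result = a_star("s", lambda n: n == "t", lambda n: toy_graph[n], lambda n: 0)
```

Passing an estimate that always returns 0, as in the usage line, reduces the search to plain Dijkstra.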

A Closer Look: Estimation
- Stupid estimation: Trivial
- Smarter estimation: Trivial
- LP-based estimation: Polynomial
  - Minimize c.x where A.x (= or ≤) r
  - c.x >= c.x'
- ILP-based estimation: Exponential
Necessary to compute fast
Virtually no difference in practice

Cached Estimates
    Initialize PriorityQueue q
    While (head(q) is not target t)
        VisitedNode n = head(q)
        add n to the considered nodes
        For each edge in the graph from node(n) to m
            If m was considered before, continue
            If m is in the queue with lower cost, continue
            If m is in the queue with higher cost, update and reposition it
            If m is new, compute an estimate for the remaining distance to t
            v = new VisitedNode(m)
            set n as the predecessor of v
            add v to the priority queue
    Return head(q)

Cached Estimates
    Initialize PriorityQueue q
    While (head(q) is not target t)
        VisitedNode n = head(q)
        If the estimate for node(n) is not exact yet
            Compute the exact estimate for the remaining distance to t
            and add n to the priority queue
            continue
        add n to the considered nodes
        For each edge in the graph from node(n) to m
            If m was considered before, continue
            If m is in the queue with lower cost, continue
            If m is in the queue with higher cost, update and reposition it
            If m is new, compute a fast lower bound for the remaining distance to t
            v = new VisitedNode(m)
            set n as the predecessor of v
            add v to the priority queue
    Return head(q)
Cost annotations on the slide, in order of appearance: O(log(n)+k/n), O(1), Expensive?
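The lazy variant can be sketched in Python as below. Again this is a sketch under assumptions rather than the author's code: `a_star_lazy`, `cheap_bound` and `exact_estimate` are hypothetical names, stale entries are skipped on poll because `heapq` cannot reposition, and the cheap bound is assumed never to exceed the exact estimate (which in turn is assumed consistent). The `stats` dictionary is added only to make the trade-off between estimates and queue operations visible.

```python
import heapq


def a_star_lazy(start, is_target, edges, cheap_bound, exact_estimate):
    """Return (cost, stats); the expensive estimate is computed only on poll."""
    best = {start: 0}
    considered = set()
    stats = {"exact_estimates": 0, "polls": 0}
    counter = 0
    # Entry layout: (priority, tie-breaker, cost so far, node, estimate is exact?)
    queue = [(cheap_bound(start), counter, 0, start, False)]
    while queue:
        _, _, g, n, is_exact = heapq.heappop(queue)
        stats["polls"] += 1
        if n in considered or g > best[n]:
            continue                      # stale entry
        if is_target(n):
            return g, stats
        if not is_exact:
            # First poll: pay for the exact estimate now and requeue the node.
            stats["exact_estimates"] += 1
            counter += 1
            heapq.heappush(queue, (g + exact_estimate(n), counter, g, n, True))
            continue
        considered.add(n)
        for m, cost in edges(n):
            new_g = g + cost
            if m in considered or new_g >= best.get(m, float("inf")):
                continue
            best[m] = new_g
            counter += 1
            # New nodes only get the fast lower bound.
            heapq.heappush(queue, (new_g + cheap_bound(m), counter, new_g, m, False))
    return None, stats


# Hypothetical toy graph and estimates (the exact values are true remaining distances).
lazy_graph = {"s": [("a", 1), ("b", 4)], "a": [("b", 1), ("t", 5)], "b": [("t", 1)], "t": []}
remaining = {"s": 3, "a": 2, "b": 1, "t": 0}
lazy_cost, lazy_stats = a_star_lazy(
    "s", lambda n: n == "t", lambda n: lazy_graph[n], lambda n: 0, lambda n: remaining[n]
)
```

In this toy run the target is never given an exact estimate, whereas an eager search would compute one for every node it queues.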

Cached Estimates
- The number of computed estimates decreases
  - Worst case is equal
- The number of queue operations increases
  - Worst case is double (queue, poll, requeue, poll)
- Storage mechanism has to allow for overwriting of estimates
  - Loss of space

State space reduction
- Example:
  - Trace:
  - Statespace of the model: 9 states, 13 transitions

Alignment search space
Find the shortest path from the top-left to the bottom-right
45 states, 121 transitions
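As a rough illustration of this search space, the sketch below pairs each trace position with each model state and runs Dijkstra (estimator 0) from the start corner to the end corner. The model here is a hypothetical sequential transition system, not the one on the slide, and the function name `align` and the unit costs for log and model moves are assumptions.

```python
import heapq


def align(trace, model, initial, final):
    """Cheapest cost from (0, initial) to (len(trace), final), or None."""
    start, target = (0, initial), (len(trace), final)
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, (i, s) = heapq.heappop(queue)
        if d > dist[(i, s)]:
            continue                                    # stale entry
        if (i, s) == target:
            return d
        successors = []
        if i < len(trace):
            successors.append(((i + 1, s), 1))          # log move
        for label, nxt in model.get(s, []):
            successors.append(((i, nxt), 1))            # model move
            if i < len(trace) and trace[i] == label:
                successors.append(((i + 1, nxt), 0))    # synchronous move
        for node, cost in successors:
            if d + cost < dist.get(node, float("inf")):
                dist[node] = d + cost
                heapq.heappush(queue, (d + cost, node))
    return None


# Hypothetical model: p0 -A-> p1 -B-> p2 -C-> p3.
toy_model = {"p0": [("A", "p1")], "p1": [("B", "p2")], "p2": [("C", "p3")]}
skipped_b = align(["A", "C"], toy_model, "p0", "p3")
```

The search space has at most (trace length + 1) times (number of model states) nodes, which is why shrinking the model's statespace pays off directly.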

Alignment search space
- The alignment for trace :

Implicit Execution
- In a Process Tree, many nodes can be executed implicitly
6 states, 9 transitions

Implicit execution – Statespace reduction states, transitions

Visited Statespace – Estimators
- Trivial estimator: Remaining trace length
  states, transitions
- (I)LP Based Estimator
  states, transitions

Stubborn Sets
- Stubborn sets can be used to eliminate parallelism while maintaining reachability of deadlocks
- Basically, you sequentialize as much of the parallelism as possible
5 states, 5 transitions

Stubborn Sets for Alignments
- Model Moves can be completely sequentialized
  - Allow for all possible Model Moves after a Synchronous Move and after a Log Move
  - Otherwise, allow only mutually dependent moves, and independent moves "larger" than the last Model Move
- Special care has to be taken with loops
  - Allow moves from within the last started subtree only
- Moves are dependent iff:
  - Their direct parent is an XOR node
  - Their direct parent is a LOOP node
  - One move is an OR-termination of the other node's parent

Performance comparison

Hybrid ILP with Stubborn Sets (fitness 1)

Hybrid ILP with Stubborn Sets (fitness 0.9)

Hybrid ILP with Stubborn Sets (fitness 0.8)

Bottlenecks
- Estimation is no longer the (only) bottleneck
  - LP computations are done in native code
  - Data transfer between Java and native code is expensive
  - LP solvers are optimized for few, very large problems, while we have very many very small problems (20,000 per second)
- Queue operations become expensive
  - Worst case doubling queue operations
  - For large models, queues get large
- Considered node set operations become expensive
  - Large statespaces imply many considered states

Future work
- Approximation of alignments:
  - Fixed-size queues allow for quick queue operations, but uncertain results
- Efficient estimation:
  - Estimation is exponential in the size of the model
  - Doing multiple estimates at once, or
  - reusing previous estimates
- Modularization:
  - Exploit the block structure even more
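One way to picture the fixed-size queue idea is the sketch below. It is purely illustrative and not taken from the slides: the class name `BoundedQueue`, its methods, and the policy of discarding the worst entry on overflow are all assumptions. It also shows where the uncertainty comes from, since a discarded entry might have been part of the optimal path.

```python
import bisect


class BoundedQueue:
    """Priority queue that keeps at most `capacity` entries (assumed policy: drop the worst)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.dropped = 0           # how many entries were discarded on overflow
        self._items = []           # kept sorted by (priority, insertion order)
        self._counter = 0

    def push(self, priority, item):
        self._counter += 1
        bisect.insort(self._items, (priority, self._counter, item))
        if len(self._items) > self.capacity:
            self._items.pop()      # discard the entry with the highest priority value
            self.dropped += 1

    def pop(self):
        priority, _, item = self._items.pop(0)
        return priority, item

    def __len__(self):
        return len(self._items)


# Hypothetical usage: five pushes into a queue of capacity three.
bounded = BoundedQueue(3)
for prio, name in [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c")]:
    bounded.push(prio, name)
first = bounded.pop()
```

Because the list never grows beyond the capacity, every operation costs at most a fixed amount of work, at the price of a search that may no longer return an optimal alignment.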