Self-Adjusting Data Structures

Self-adjusting Structures
Consider the following AVL tree, listed level by level: 44; 17, 78; 32, 50, 88; 48, 62. Written as node(left, right), with . for an empty child, it is 44(17(., 32), 78(50(48, 62), 88)).
Suppose we want to search for the following sequence of elements: 48, 48, 48, 48, 50, 50, 50, 50, 50. Is this a good structure for that sequence?

Self-adjusting Structures
So far we have seen:
– BST (binary search tree): worst-case running time per operation O(N); worst-case average running time per operation O(N) (think about inserting a sorted list of items).
– AVL tree: worst-case running time per operation O(log N); worst-case average running time per operation O(log N); does not adapt to skewed access distributions.

Self-adjusting Structures
The structure is updated after each operation. Consider a binary search tree: if a sequence of insertions leaves an element in a leaf at depth Θ(n), then a sequence of m searches for that element costs O(mn) in total. The remedy is to use a self-adjusting structure.

Self-adjusting lists
– Move to Front (MF): whenever an element is accessed, it is moved to the front of the list.
– Transposition: whenever an element x is accessed, x is swapped with its predecessor.
– Frequency Counter (FC): the list is always kept sorted by decreasing access frequency.
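The three heuristics can be sketched on a plain Python list. This is a minimal illustration, not a reference implementation: the function names are made up for the example, the cost of an access is assumed to be the 1-based position of the element, and ties in the frequency counter are assumed to leave the existing order unchanged.

```python
def access_move_to_front(items, x):
    """Access x, move it to the front, return the access cost."""
    pos = items.index(x)
    items.insert(0, items.pop(pos))
    return pos + 1


def access_transpose(items, x):
    """Access x, swap it with its predecessor (if any), return the access cost."""
    pos = items.index(x)
    if pos > 0:
        items[pos - 1], items[pos] = items[pos], items[pos - 1]
    return pos + 1


def access_frequency_count(items, counts, x):
    """Access x, bump its counter, keep the list sorted by decreasing count."""
    pos = items.index(x)
    cost = pos + 1
    counts[x] = counts.get(x, 0) + 1
    # Move x forward past every element with a strictly smaller count.
    while pos > 0 and counts.get(items[pos - 1], 0) < counts[x]:
        items[pos - 1], items[pos] = items[pos], items[pos - 1]
        pos -= 1
    return cost
```

For example, accessing c in the list a, b, c, d costs 3 under all three rules, but leaves c, a, b, d under Move to Front and a, c, b, d under Transposition.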

Self-adjusting Structures
Admissible algorithms
– A method is called admissible if, at the i-th access, it moves the accessed element $e_i \ge 0$ positions toward the front.
– This class of algorithms includes all the methods presented above.

Analysis of Move to Front
Theorem. Let H be an admissible method and let s be a sequence of m accesses. Then $\mathrm{Cost}_{MF}(s) \le 2\,\mathrm{Cost}_H(s) - m$, where $\mathrm{Cost}_{MF}(s)$ and $\mathrm{Cost}_H(s)$ are, respectively, the costs of MF and H to process the request sequence s.

Analysis of Move to Front
Proof. We use the potential function method. $D_i$ is the number of inversions of the list maintained by MF relative to the list of algorithm H after the i-th access.
List of H: a b c f e d
List of MF: b a f d e c
Inversions: (b,a), (f,c), (d,e), (d,c), (e,c), so $D_i = 5$.
The amortized cost of the i-th access is $\hat{c}_i = c_i + D_i - D_{i-1}$, where $c_i$ is the actual cost paid by MF.
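The inversion count used as the potential can be checked mechanically. The sketch below (the function name `inversions` is only illustrative) lists every pair that appears in one order in MF's list and in the opposite order in H's list. On the two example lists of this slide it reports five pairs, including (e, c).

```python
from itertools import combinations


def inversions(mf_list, h_list):
    """Pairs (u, v) with u before v in mf_list but v before u in h_list."""
    pos_h = {item: i for i, item in enumerate(h_list)}
    # combinations() yields each pair in the order it appears in mf_list.
    return [(u, v) for u, v in combinations(mf_list, 2) if pos_h[u] > pos_h[v]]


pairs = inversions(list("bafdec"), list("abcfed"))
print(pairs)
```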

Analysis of Move to Front
Proof (continued). Element x is accessed by MF.
– x is the k-th element of H's list.
– x is the j-th element of MF's list.
– p is the number of elements that precede x in MF's list and follow x in H's list.
When MF moves x to the first position, $j - 1 - p$ inversions are created and p are destroyed. When H then moves x forward by $e_i$ positions, $e_i$ inversions are destroyed and none is created.

Analysis of Move to Front
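The remaining calculation can be sketched as follows. It is a reconstruction under stated assumptions rather than the original slide content: accessing the element in position j of MF's list costs j, accessing position k of H's list costs k, the forward moves of H are free, both lists start out identical, and $D_i$ denotes the inversion count after the i-th access.

```latex
\begin{align*}
\hat{c}_i &= j + \bigl[(j - 1 - p) - p - e_i\bigr] = 2(j - p) - 1 - e_i \\
          &\le 2k - 1 - e_i \le 2k - 1
          && \text{since } j - 1 - p \le k - 1 \\
\mathrm{Cost}_{MF}(s) &= \sum_{i=1}^{m} \hat{c}_i - D_m + D_0
          \le 2\,\mathrm{Cost}_H(s) - m
          && \text{since } D_0 = 0,\ D_m \ge 0
\end{align*}
```

The inequality $j - 1 - p \le k - 1$ holds because the $j - 1 - p$ elements that precede x in both lists are among the $k - 1$ elements that precede x in H's list.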

Bad Sequences for Transposition and Frequency Counter
– Transposition: insert n elements $a_1, \dots, a_n$ and then access $a_n$ n times. Transposition pays $\Theta(n^2)$ while MF pays $O(n)$.
– Frequency Counter: start from the list $(a_n, \dots, a_1)$, then access $a_n$ n times, $a_{n-1}$ $(n-1)$ times, and so on. FC never reorganizes the list. FC pays $\Theta(n^3)$ while MF pays $O(n^2)$.
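The Transposition example can be replayed numerically. The sketch below assumes an access costs the 1-based position of the element; `total_cost`, `move_to_front`, and `transpose` are illustrative names, not part of the original slides.

```python
def move_to_front(items, pos):
    items.insert(0, items.pop(pos))


def transpose(items, pos):
    if pos > 0:
        items[pos - 1], items[pos] = items[pos], items[pos - 1]


def total_cost(n, reorganize):
    """Cost of accessing the last element a_n of a_1..a_n exactly n times."""
    items = list(range(1, n + 1))   # a_n is the value n, stored last
    cost = 0
    for _ in range(n):
        pos = items.index(n)
        cost += pos + 1             # cost = 1-based position
        reorganize(items, pos)
    return cost


print(total_cost(100, transpose))      # n + (n-1) + ... + 1, quadratic in n
print(total_cost(100, move_to_front))  # n for the first access, then 1 each
```

Under Transposition the element advances one position per access, so the costs are n, n-1, ..., 1. Under Move to Front only the first access is expensive.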

Self-adjusting Structures
Splay Trees (Tarjan and Sleator, 1985)
– A binary search tree in which every accessed node is brought to the root.
– Adapts to the access probability distribution.

Splay trees: Basic Idea
Try to make the worst-case situation occur less frequently. In a binary search tree, the worst case can occur on every operation (for example, after inserting a sorted list of items). In a splay tree, when a worst-case situation occurs for an operation, the tree is restructured (during or after the operation) so that subsequent operations do not hit the same worst case again.

Splay trees: Basic Idea
After a node is accessed, it is pushed to the root by a series of AVL-tree-like operations (rotations). In most applications, a node that has just been accessed is likely to be accessed again in the near future (principle of locality).

Splay trees: Basic Idea
Pushing the accessed node to the root has several effects:
– If the same node is accessed again soon, those accesses are much cheaper.
– The push-to-root operation may leave the tree more balanced than before.
– Accesses to other nodes can also become cheaper.

A first attempt
A simple idea: when a node k is accessed, push it toward the root with the following algorithm. At each step on the path from k to the root, do a single rotation between node k's parent and node k itself.

A first attempt (example; trees written as node(left, right))
– Accessing node k1 along the access path k5, k4, k3, k2, k1: k5(k4(k3(k2(A, k1(B, C)), D), E), F)
– After rotation between k2 and k1: k5(k4(k3(k1(k2(A, B), C), D), E), F)
– After rotation between k3 and k1: k5(k4(k1(k2(A, B), k3(C, D)), E), F)
– After rotation between k4 and k1: k5(k1(k2(A, B), k4(k3(C, D), E)), F)
– After rotation between k5 and k1: k1(k2(A, B), k5(k4(k3(C, D), E), F))
k1 is now the root, but k3 is nearly as deep as k1 was. An access to k3 will in turn push some other node nearly as deep as k3 is. So this method does not work.

Splaying
The method pushes the accessed node to the root.
– The pushing operation also balances the tree somewhat,
– so that further operations on the new tree are less costly than they would have been on the original tree.
A deep tree gets splayed: it becomes less deep and wider.

Splaying - algorithm
Assume we access a node. We splay along the path from the accessed node to the root. At every splay step:
– We selectively rotate the tree.
– Which rotation is chosen depends on the structure of the tree around the node being rotated.

Implementing Splay(x, S)
Repeat the following until x is the root.
– ZIG: if x has a parent but no grandparent, rotate(x).
– ZIG-ZIG: if x has a parent y and a grandparent, and x and y are both left children or both right children, rotate(y) and then rotate(x).
– ZIG-ZAG: if x has a parent y and a grandparent, and one of x, y is a left child while the other is a right child, rotate(x) twice.
Here rotate(v) is the single rotation that lifts v above its parent.

Implementing Splay(x, S): the cases in pictures (trees written as node(left, right))
– ZIG(x), x is the left child of the root y: y(x(A, B), C) becomes x(A, y(B, C)).
– ZAG(x), the mirror image, x is the right child of the root y: y(A, x(B, C)) becomes x(y(A, B), C).
– ZIG-ZIG, x and y are both left children: z(y(x(A, B), C), D) becomes x(A, y(B, z(C, D))).
– ZIG-ZAG, x is a left child and y is a right child: z(A, y(x(B, C), D)) becomes x(z(A, B), y(C, D)).
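The three cases can be sketched in Python with parent pointers. This is a minimal bottom-up version written for illustration, not the authors' code: `Node`, `rotate`, `splay`, `left_path`, and `shape` are names assumed here, and `shape` prints a tree as key(left, right) with . for an empty child.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate(x):
    """Single rotation that lifts x above its parent, keeping the BST order."""
    y, g = x.parent, x.parent.parent
    if y.left is x:                       # x is a left child: rotate right
        y.left, x.right = x.right, y
        if y.left is not None:
            y.left.parent = y
    else:                                 # x is a right child: rotate left
        y.right, x.left = x.left, y
        if y.right is not None:
            y.right.parent = y
    y.parent, x.parent = x, g
    if g is not None:                     # re-attach x under the old grandparent
        if g.left is y:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Bring x to the root with ZIG, ZIG-ZIG and ZIG-ZAG steps; return x."""
    while x.parent is not None:
        y, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                     # ZIG
        elif (g.left is y) == (y.left is x):
            rotate(y)                     # ZIG-ZIG: parent first,
            rotate(x)                     # then x
        else:
            rotate(x)                     # ZIG-ZAG: x twice
            rotate(x)
    return x


def left_path(keys):
    """Degenerate tree keys[0] -> keys[1] -> ... linked through left children."""
    nodes = [Node(k) for k in keys]
    for parent, child in zip(nodes, nodes[1:]):
        parent.left, child.parent = child, parent
    return {n.key: n for n in nodes}


def shape(node):
    """Render a subtree as key(left, right), with '.' for an empty child."""
    if node is None:
        return "."
    if node.left is None and node.right is None:
        return str(node.key)
    return f"{node.key}({shape(node.left)}, {shape(node.right)})"


nodes = left_path(range(10, 0, -1))       # the path 10, 9, ..., 1
print(shape(splay(nodes[1])))
print(shape(splay(nodes[2])))
```

Rotating the parent first in the ZIG-ZIG case is the only difference from the naive rotate-to-root idea, and it is what roughly halves the depth of the nodes on the access path.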

Splay Example
Apply Splay(1, S) to the tree S that is a single left path 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 (trees written as node(left, right), with . for an empty child):
– ZIG-ZIG at 1, 2, 3: the path 10, 9, ..., 4 now ends in 1(., 2(., 3)).
– ZIG-ZIG at 1, 4, 5: the path 10, ..., 6 ends in 1(., 4(2(., 3), 5)).
– ZIG-ZIG at 1, 6, 7: the path 10, 9, 8 ends in 1(., 6(4(2(., 3), 5), 7)).
– ZIG-ZIG at 1, 8, 9: 10 has left child 1(., 8(6(4(2(., 3), 5), 7), 9)).
– ZIG at 1, 10: the result is 1(., 10(8(6(4(2(., 3), 5), 7), 9), .)).
Apply Splay(2, S) to the resulting tree: the result is 2(1, 8(4(3, 6(5, 7)), 10(9, .))).

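Splay Tree Analysis
Definitions.
– Let S(x) denote the subtree of S rooted at x.
– |S| = number of nodes in tree S.
– $\mu(S)$ = rank of S = $\lfloor \log_2 |S| \rfloor$.
– $\mu(x) = \mu(S(x))$.
Example, on the tree 2(1, 8(4(3, 6(5, 7)), 10(9, .))) written as node(left, right): |S| = 10, $\mu(2) = 3$, $\mu(8) = 3$, $\mu(4) = 2$, $\mu(6) = 1$, $\mu(5) = 0$. The subtree S(8) has 8 nodes.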

Splay Tree Analysis
Define the potential function:
– Associate a positive weight $w(v)$ with each node v.
– $W(v) = \sum_{y} w(y)$, where y ranges over the nodes of the subtree rooted at v.
– $\mathrm{rank}(v) = \log W(v)$.
– The potential of the tree is $\Phi = \sum_{v} \mathrm{rank}(v)$.
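As a small illustration of this potential, the sketch below computes the total weight W and the sum of ranks for a tree given as nested (key, left, right) tuples. The tuple representation, the function name `potential`, the default unit weights, and the base-2 logarithm are assumptions made for the example.

```python
import math


def potential(tree, weight=lambda key: 1.0):
    """Return (W, phi): total weight of the tree and the sum of log2 W(v)."""
    if tree is None:
        return 0.0, 0.0
    key, left, right = tree
    w_left, phi_left = potential(left, weight)
    w_right, phi_right = potential(right, weight)
    total = weight(key) + w_left + w_right      # W(v) for this node
    return total, phi_left + phi_right + math.log2(total)


path = (3, (2, (1, None, None), None), None)        # degenerate 3-node path
balanced = (2, (1, None, None), (3, None, None))    # balanced 3-node tree
print(potential(path)[1], potential(balanced)[1])
```

With unit weights the path has the larger potential, which matches the intuition that deep, unbalanced trees store "energy" that later splay operations can spend.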

Upper bound for the amortized time of a complete splay operation
To estimate the time of a splay operation we count rotations.
Lemma: the amortized time of a complete splay operation on a node x in a tree with root r is at most $1 + 3[\mathrm{rank}(r) - \mathrm{rank}(x)]$, where rank(x) is the rank of x before the splay and rank(r) is the rank of the root, which is also the rank of x after the splay.

Upper bound for the amortized time of a complete splay operation
Proof. The amortized cost a of the splay is $a = t + \Phi_{\text{after}} - \Phi_{\text{before}}$, where t is the number of rotations executed during the splay. Splitting the splay into its k steps gives $a = o_1 + o_2 + \dots + o_k$, where $o_i$ is the amortized cost of the i-th step (zig, zig-zig, or zig-zag). With $\Phi_i$ the potential after the i-th step, $\mathrm{rank}_i$ the rank after the i-th step, and $t_i$ the number of rotations in the i-th step, $o_i = t_i + \Phi_i - \Phi_{i-1}$.

Splay Tree Analysis
Operations:
– Case 1: zig (zag)
– Case 2: zig-zig (zag-zag)
– Case 3: zig-zag (zag-zig)

Splay Tree Analysis – Case 1: only one rotation (zig)
x is a child of the root r. Without loss of generality x is the left child: ZIG(x) turns r(x(A, B), C) into x(A, r(B, C)), written as node(left, right). After the operation only rank(x) and rank(r) change.

Splay Tree Analysis – Case 1 (continued)
Since the potential is the sum of all ranks, $\Phi_i - \Phi_{i-1} = \mathrm{rank}_i(r) + \mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(r) - \mathrm{rank}_{i-1}(x)$, and $t_i = 1$ (the time of one rotation). The amortized complexity is therefore
$o_i = 1 + \mathrm{rank}_i(r) + \mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(r) - \mathrm{rank}_{i-1}(x)$.
From the picture of ZIG(x), $\mathrm{rank}_{i-1}(r) \ge \mathrm{rank}_i(r)$ and $\mathrm{rank}_i(x) \ge \mathrm{rank}_{i-1}(x)$. Dropping the non-positive term $\mathrm{rank}_i(r) - \mathrm{rank}_{i-1}(r)$ gives
$o_i \le 1 + \mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(x) \le 1 + 3[\mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(x)]$.

Splay Tree Analysis – Case 2: Zig-Zig
ZIG-ZIG turns z(y(x(A, B), C), D) into x(A, y(B, z(C, D))), written as node(left, right). Two rotations are performed and only the ranks of x, y, and z change, so
$o_i = 2 + \mathrm{rank}_i(x) + \mathrm{rank}_i(y) + \mathrm{rank}_i(z) - \mathrm{rank}_{i-1}(x) - \mathrm{rank}_{i-1}(y) - \mathrm{rank}_{i-1}(z)$.
Since $\mathrm{rank}_{i-1}(z) = \mathrm{rank}_i(x)$, these two terms cancel:
$o_i = 2 + \mathrm{rank}_i(y) + \mathrm{rank}_i(z) - \mathrm{rank}_{i-1}(x) - \mathrm{rank}_{i-1}(y)$.
Using $\mathrm{rank}_i(x) \ge \mathrm{rank}_i(y)$ and $\mathrm{rank}_{i-1}(y) \ge \mathrm{rank}_{i-1}(x)$:
$o_i \le 2 + \mathrm{rank}_i(x) + \mathrm{rank}_i(z) - 2\,\mathrm{rank}_{i-1}(x)$.

Splay Tree Analysis
We must show that $2 + \mathrm{rank}_i(x) + \mathrm{rank}_i(z) - 2\,\mathrm{rank}_{i-1}(x) \le 3(\mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(x))$, that is, $\mathrm{rank}_{i-1}(x) + \mathrm{rank}_i(z) - 2\,\mathrm{rank}_i(x) \le -2$. So we study the behavior of the function $\mathrm{rank}_{i-1}(x) + \mathrm{rank}_i(z) - 2\,\mathrm{rank}_i(x)$.

Splay Tree Analysis
We have $\mathrm{rank}_{i-1}(x) + \mathrm{rank}_i(z) - 2\,\mathrm{rank}_i(x) = \log\frac{W_{i-1}(x)}{W_i(x)} + \log\frac{W_i(z)}{W_i(x)}$, with logarithms in base 2. Examining the tree, we see that $\frac{W_{i-1}(x)}{W_i(x)} + \frac{W_i(z)}{W_i(x)} \le 1$. Therefore the sum of the two logarithms is at most the maximum of $\log a + \log b$ subject to $a + b \le 1$ with $a, b > 0$. It follows from the concavity of the log function that the maximum is attained at $a = b = 1/2$. Therefore the maximum equals $-2$, which is the bound we need.
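The optimization step can also be verified directly, without appealing to concavity. The short derivation below uses the arithmetic-geometric mean inequality and base-2 logarithms; a and b stand for the two weight ratios of this slide.

```latex
\begin{align*}
a, b > 0,\ a + b \le 1
  \;\Longrightarrow\; ab \le \left(\frac{a + b}{2}\right)^{2} \le \frac{1}{4}
  \;\Longrightarrow\; \log_2 a + \log_2 b = \log_2 (ab) \le \log_2 \frac{1}{4} = -2 .
\end{align*}
```

Equality holds exactly when $a = b = 1/2$.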

Splay Tree Analysis – Case 2: Zig-Zig
Hence, for a zig-zig step, $o_i \le 3[\mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(x)]$.

Splay Tree Analysis – Case 3: Zig-Zag
The analysis is similar to Case 2 and gives $o_i \le 3[\mathrm{rank}_i(x) - \mathrm{rank}_{i-1}(x)]$.

Splay Tree Analysis
Putting the three cases together and telescoping (at most one step is a zig, which contributes the additive 1): $a = o_1 + o_2 + \dots + o_k \le 3[\mathrm{rank}(r) - \mathrm{rank}(x)] + 1$.

Splay Tree Analysis For proving different types of results we must set the weights accordingly

Splay Tree Analysis
Theorem. The cost of m accesses is $O(m \log n + n \log n)$, where n is the number of items in the tree.
Proof. Give every node weight 1/n. Then the amortized cost of an access is at most $3 \log n + 1$, and the potential changes by at most $n \log n$ over the whole sequence. Summing over all accesses, the cost is at most $3m \log n + n \log n + m$.

Static Optimality Theorem
Theorem: let q(i) be the number of accesses to item i and let m be the total number of accesses. If every item is accessed at least once, then the total cost is $O\bigl(m + \sum_{i=1}^{n} q(i) \log \frac{m}{q(i)}\bigr)$.

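Static Optimality Theorem
Proof. Assign a weight of q(i)/m to item i. Then $\mathrm{rank}(r) = 0$ and $\mathrm{rank}(i) \ge \log(q(i)/m)$. Thus $3[\mathrm{rank}(r) - \mathrm{rank}(i)] + 1 \le 3 \log(m/q(i)) + 1$. In addition, $|\Phi_0 - \Phi_m| \le \sum_{i=1}^{n} \log(m/q(i))$. Thus the overall cost is $O\bigl(m + \sum_{i=1}^{n} q(i) \log \frac{m}{q(i)}\bigr)$.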
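To get a feel for the quantity in this bound, the sketch below evaluates $m + \sum_i q(i) \log_2(m/q(i))$ for a given access sequence. It is only an illustration: the function name is made up, the constant factors hidden by the O-notation are ignored, and items that never appear in the sequence are not counted (the theorem assumes every item is accessed at least once).

```python
import math
from collections import Counter


def static_optimality_bound(accesses):
    """Return m + sum over accessed items of q(i) * log2(m / q(i))."""
    counts = Counter(accesses)          # q(i) for every accessed item
    m = len(accesses)
    return m + sum(q * math.log2(m / q) for q in counts.values())


uniform = list(range(8))                # 8 items, each accessed once
skewed = [48] * 8                       # a single item accessed 8 times
print(static_optimality_bound(uniform), static_optimality_bound(skewed))
```

A uniform sequence pays the full log factor per access, while a sequence concentrated on one item pays only a constant per access, which is the adaptivity that a fixed balanced tree lacks.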

Static Optimality Theorem
Theorem: the cost of an optimal static binary search tree on the same access sequence is $\Omega\bigl(m + \sum_{i=1}^{n} q(i) \log \frac{m}{q(i)}\bigr)$.

Static Finger Theorem
Theorem: let 1, ..., n be the items in the splay tree and let the sequence of accesses be $i_1, \dots, i_m$. If f is a fixed item, the total access time is $O\bigl(n \log n + m + \sum_{j=1}^{m} \log(|i_j - f| + 1)\bigr)$.

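Static Finger Theorem
Proof. Assign weight $1/(|i - f| + 1)^2$ to item i. The weights sum to O(1), so $\mathrm{rank}(r) = O(1)$, and $\mathrm{rank}(i_j) \ge -2\log(|i_j - f| + 1)$, so the amortized cost of the j-th access is $O(\log(|i_j - f| + 1)) + 1$. Since the weight of every item is at least $1/n^2$, $|\Phi_0 - \Phi_m| = O(n \log n)$.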

Working Set Theorem
Theorem: let 1, ..., n be the items in the splay tree and let the sequence of accesses be 1, ..., m. Let i(j) be the item accessed at the j-th access and let t(j) be the number of distinct items accessed since the previous access to i(j). Then the total access time is $O\bigl(n \log n + m + \sum_{j=1}^{m} \log(t(j) + 1)\bigr)$.

Dynamic Optimality Conjecture
Conjecture: consider any sequence of successful accesses on an n-node search tree. Let A be any algorithm that carries out each access by traversing the path from the root to the node containing the accessed item, at a cost of one plus the depth of the node containing the item, and that between accesses performs an arbitrary number of rotations anywhere in the tree, at a cost of one per rotation. Then the total time to perform all the accesses by splaying is no more than O(n) plus a constant times the time required by algorithm A.

Dynamic Optimality Conjecture: best attempt
Tango trees achieve an O(log log n) competitive ratio.
Reference: E. Demaine, D. Harmon, J. Iacono, and M. Patrascu. "Dynamic optimality - almost." In Foundations of Computer Science (FOCS), 2004.

Insertion and Deletion
Most of the theorems above still hold when insertions and deletions are allowed.

Paris Kanellakis Theory and Practice Award, 1999
Splay Tree Data Structure: Daniel D.K. Sleator and Robert E. Tarjan
Citation: For their invention of the widely-used "Splay Tree" data structure.

Amortized running time Definition: For a series of M consecutive operations: – If the total running time is O(M*f(N)), we say that the amortized running time (per operation) is O(f(N)). Using this definition: – A splay tree has O(logN) amortized cost (running time) per operation.

Amortized running time
– Ordinary complexity: determines the worst-case cost of an operation, examining each operation individually.
– Amortized complexity: analyses the average cost per operation over a worst-case sequence of operations.

Amortized Analysis: Physics Approach
The approach can be seen as an analogy with the concept of potential energy. A potential function maps any configuration E of the structure to a real number $\Phi(E)$, called the potential of E. The potential can be used to bound the cost of operations still to be performed.

Amortized cost of an operation
$a = t + \Phi(E') - \Phi(E)$, where t is the real time of the operation, E' is the configuration of the structure after the operation, and E is the configuration before the operation.

Amortized cost of a sequence of m operations
Applying $a = t + \Phi(E') - \Phi(E)$ to the i-th operation gives $a_i = t_i + \Phi_i - \Phi_{i-1}$, so
$\sum_{i=1}^{m} t_i = \sum_{i=1}^{m} (a_i - \Phi_i + \Phi_{i-1}) = \Phi_0 - \Phi_m + \sum_{i=1}^{m} a_i$
by telescoping. The total real time does not depend on the intermediate potentials.

Amortized cost of a sequence of operations
$\sum_{i=1}^{m} t_i = \sum_{i=1}^{m} (a_i - \Phi_i + \Phi_{i-1})$. If the final potential is greater than or equal to the initial one, $\Phi_m \ge \Phi_0$, then $\sum_{i=1}^{m} t_i \le \sum_{i=1}^{m} a_i$, so the amortized complexity can be used as an upper bound on the total real time.

Implementing Splay(x, S): ZIG-ZIG as two single rotations (trees written as node(left, right))
z(y(x(A, B), C), D) becomes y(x(A, B), z(C, D)) after rotating y above z, and then x(A, y(B, z(C, D))) after rotating x above y.