CSE373: Data Structures & Algorithms Lecture 12: Amortized Analysis and Memory Locality Lauren Milne Summer 2015

Announcements: Midterms are graded; pick them up at the end of class. Homework 3 is due on Wednesday at 11pm. The TA session tomorrow starts at 11 AM (not 10:50 AM).

Amortized Analysis. In amortized analysis, the time required to perform a sequence of data structure operations is averaged over all the operations performed. It is typically used to show that the average cost of an operation is small over a sequence of operations, even though a single operation can cost a lot.

Amortized Analysis. Recall our plain-old stack implemented as an array that doubles its size if it runs out of room. –Can we claim push is O(1) time if resizing is O(n) time? –We can’t, but we can claim it is an O(1) amortized operation.

Amortized Complexity. We get an upper bound T(n) on the total time of a sequence of n operations. The average time per operation is then T(n)/n, which is also the amortized time per operation. If a sequence of n operations takes O(n·f(n)) time, we say the amortized runtime is O(f(n)). –If n operations take O(n) time, what is the amortized time per operation? O(1) per operation. –If n operations take O(n³) time, what is the amortized time per operation? O(n²) per operation. The worst-case time for an operation can be larger than f(n), but the amortized guarantee ensures the average time per operation for any sequence is O(f(n)).
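Written out in the notation above (a minimal sketch of the arithmetic, in LaTeX):

    % Amortized time is total time divided by the number of operations.
    \[
      \text{amortized time per operation} = \frac{T(n)}{n}, \qquad
      T(n) = O\big(n \cdot f(n)\big) \;\Rightarrow\; \frac{T(n)}{n} = O\big(f(n)\big).
    \]
    % Examples from the bullets: T(n) = O(n)   gives O(n)/n   = O(1)   per operation;
    %                            T(n) = O(n^3) gives O(n^3)/n = O(n^2) per operation.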

“Building Up Credit”. You can think of preceding “cheap” operations as building up “credit” that can be used to “pay for” later “expensive” operations. Because any sequence of operations must stay under the bound, enough “cheap” operations must come first. –Otherwise a prefix of the sequence would violate the bound.

Example #1: Resizing stack. A stack implemented with an array where we double the size of the array if it becomes full. Claim: any sequence of push / pop / isEmpty operations is amortized O(1) per operation. We need to show that any sequence of M operations takes O(M) time. –Recall the non-resizing work is O(M) (i.e., M · O(1)). –The resizing work is proportional to the total number of element copies we do for the resizing. –So it suffices to show that after M operations we have done < 2M total element copies (so the average number of copies per operation is bounded by a constant). A sketch of such a stack's push follows below.
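A minimal sketch of push for a doubling stack (the class name, field names, and INITIAL_SIZE constant are illustrative assumptions, not the course's starter code):

    // Array-backed stack of ints that doubles its array when full.
    class ResizingStack {
        private static final int INITIAL_SIZE = 16;   // assumed starting capacity
        private int[] data = new int[INITIAL_SIZE];
        private int size = 0;                          // number of elements stored

        void push(int x) {
            if (size == data.length) {                 // full: resize, the O(n) step
                int[] bigger = new int[2 * data.length];
                System.arraycopy(data, 0, bigger, 0, size);  // copies 'size' elements
                data = bigger;
            }
            data[size++] = x;                          // the O(1) non-resizing work
        }

        int pop() { return data[--size]; }             // no shrinking in this sketch
        boolean isEmpty() { return size == 0; }
    }

The resizing branch runs rarely; the next slides count exactly how much copying it can do in total.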

Amount of copying. Claim: after M operations, we have done < 2M total element copies. Let n be the size of the array after M operations. –Then we have done a total of n/2 + n/4 + n/8 + … + INITIAL_SIZE < n element copies. –We must have done at least enough push operations to cause resizing up to size n, so M ≥ n/2. –Therefore 2M ≥ n > number of element copies.
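The same counting argument as a worked formula (LaTeX, using the slide's variables):

    % n = array size after M operations. Each resize up to size n copied n/2, n/4, ... elements.
    \[
      \text{total copies} = \frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \dots + \text{INITIAL\_SIZE} < n
    \]
    % Reaching size n required at least n/2 pushes, so M >= n/2, i.e. n <= 2M. Therefore
    \[
      \text{total copies} < n \le 2M .
    \]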

Other approaches. If the array grows by a constant amount (say 1000), operations are not amortized O(1). –After O(M) operations, you may have done Θ(M²) copies (see the count sketched below). If the array shrinks when it is 1/2 empty, operations are not amortized O(1). –Terrible case: pop once and shrink, push once and grow, pop once and shrink, … If the array shrinks when it is 3/4 empty, operations are amortized O(1). –The proof is more complicated, but the basic idea remains: by the time an expensive operation occurs, many cheap ones have occurred.
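For the constant-growth case, a rough count shows why it fails (a sketch assuming the array grows by a fixed amount k per resize, e.g. k = 1000):

    % After M pushes there are about M/k resizes, and the i-th resize copies about i*k elements:
    \[
      \text{total copies} \approx \sum_{i=1}^{M/k} i\,k
        = k \cdot \frac{(M/k)\,(M/k + 1)}{2}
        = \Theta\!\left(\frac{M^2}{k}\right)
        = \Theta(M^2) \quad \text{for constant } k .
    \]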

Example #2: Queue with two stacks. A clever and simple queue implementation using only two stacks:

    class Queue<E> {
        Stack<E> in  = new Stack<E>();
        Stack<E> out = new Stack<E>();

        void enqueue(E x) { in.push(x); }

        E dequeue() {
            if (out.isEmpty()) {
                while (!in.isEmpty()) {
                    out.push(in.pop());
                }
            }
            return out.pop();
        }
    }

After enqueue A, B, C: in holds C, B, A (top to bottom) and out is empty.

Example #2, continued (same code as above). dequeue: out is empty, so everything is moved from in to out and A is popped and returned; afterwards in is empty and out holds B, C (top to bottom).

Example #2, continued. enqueue D, E: both are pushed onto in; now in holds E, D (top to bottom) and out still holds B, C.

Example #2, continued. dequeue twice: out is not empty, so B and then C are popped directly; now out is empty and in still holds E, D.

Example #2, continued. dequeue again: out is empty, so E and D are moved from in to out and D is popped and returned; afterwards in is empty and out holds E.
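A runnable version of this trace, assuming java.util.Stack as the underlying stack (the slides' own Stack class is not shown, so that substitution is an assumption):

    import java.util.Stack;

    public class QueueDemo {
        // Two-stack queue exactly as on the slides.
        static class Queue<E> {
            Stack<E> in  = new Stack<>();
            Stack<E> out = new Stack<>();
            void enqueue(E x) { in.push(x); }
            E dequeue() {
                if (out.isEmpty()) {                      // reverse 'in' into 'out' only when needed
                    while (!in.isEmpty()) out.push(in.pop());
                }
                return out.pop();
            }
        }

        public static void main(String[] args) {
            Queue<String> q = new Queue<>();
            q.enqueue("A"); q.enqueue("B"); q.enqueue("C");  // in: C B A (top to bottom), out: empty
            System.out.println(q.dequeue());                 // A  (everything moves to out)
            q.enqueue("D"); q.enqueue("E");                  // in: E D, out: B C
            System.out.println(q.dequeue());                 // B
            System.out.println(q.dequeue());                 // C
            System.out.println(q.dequeue());                 // D  (in is reversed again)
        }
    }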

Analysis. dequeue is not O(1) worst-case, because out might be empty while in has lots of items. But if the stack operations are (amortized) O(1), then any sequence of queue operations is amortized O(1). –The total amount of work done per element is 1 push onto in, 1 pop off of in, 1 push onto out, and 1 pop off of out. –When you reverse n elements, there were n earlier O(1) enqueue operations to average with.

When is Amortized Analysis Useful? When the average per operation is all we care about (i.e., the sum over all operations), an amortized bound is perfectly fine. If we need every individual operation to finish quickly (e.g., in a web server), amortized bounds may be too weak.

Not always so simple. Proofs for amortized bounds can be much more complicated. Example: splay trees are dictionaries with amortized O(log n) operations. –See Chapter 4.5 if curious. For more complicated examples, the proofs need much more sophisticated invariants and “potential functions” to describe how earlier cheap operations build up “energy” or “money” to “pay for” later expensive operations. –See Chapter 11 if curious. But complicated proofs have nothing to do with the code (which may be easy!).

Switching gears… Memory hierarchy/locality

Why do we need to know about the memory hierarchy and locality? One of the assumptions that Big-O analysis makes is that all operations take the same amount of time. Is this really true?
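One informal way to see the answer for yourself (a sketch added here, not from the original slides; exact timings are machine-dependent): do the same number of array additions with a cache-friendly and a cache-hostile access pattern and time them.

    public class StrideDemo {
        public static void main(String[] args) {
            final int N = 1 << 24;                  // 16M ints (64 MB), larger than typical caches
            int[] a = new int[N];
            long sum = 0, start;

            start = System.nanoTime();
            for (int i = 0; i < N; i++) sum += a[i];              // unit stride: good spatial locality
            System.out.println("sequential: " + (System.nanoTime() - start) / 1e6 + " ms");

            final int STRIDE = 4096;                // jump 16 KB at a time: most accesses miss
            start = System.nanoTime();
            for (int s = 0; s < STRIDE; s++)
                for (int i = s; i < N; i += STRIDE) sum += a[i];  // same additions, poor locality
            System.out.println("strided:    " + (System.nanoTime() - start) / 1e6 + " ms");

            System.out.println(sum);                // keep sum live so the loops are not optimized away
        }
    }

Both loops perform exactly N additions, yet on most machines the strided version is noticeably slower; Big-O alone cannot explain the gap.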

Definitions. A cycle (for our purposes) is the time it takes to execute a single simple instruction (e.g., adding two registers together). Memory latency is the time it takes to access memory.

The memory hierarchy (typical sizes and access times):
–CPU registers: ~1 ns per instruction
–Cache (SRAM): 8 KB - 4 MB, 2-10 ns
–Main memory (DRAM): 2-10 GB, tens of ns
–Disk: many GB, a few milliseconds (5-10 million ns)

What does this mean? It is much faster to do 5 million arithmetic ops, or 2500 L2 cache accesses, or 400 main memory accesses, than to do 1 disk access. Why are computers built this way? –Physical realities (speed of light, closeness to the CPU). –Cost (price per byte of different storage technologies). –Under the right circumstances, this kind of hierarchy can simulate storage with the access time of the highest (fastest) level and the size of the lowest (largest) level.

Processor-Memory Performance Gap (figure)

What can be done? Goal: reduce the number of accesses to the slower levels of the hierarchy.

So, what can we do? The hardware automatically moves data from main memory into the caches for you, replacing items already there. –Algorithms are much faster if the “data fits in cache” (it often does). Disk accesses are done by software (e.g., asking the operating system to open a file, or a database to access some records). So most code “just runs,” but sometimes it is worth designing algorithms and data structures with knowledge of the memory hierarchy. –To do this, we need to understand locality.

Locality. Temporal Locality (locality in time) –If an item (a location in memory) is referenced, that same location will tend to be referenced again soon. Spatial Locality (locality in space) –If an item is referenced, items whose addresses are close by tend to be referenced soon.

How does data move up the hierarchy? Moving data up the hierarchy is slow because of latency (think of the distance to travel). –Since we’re making the trip anyway, we might as well carpool: get a block of data in the same time we could get a byte. –The hardware sends nearby memory because it’s easy and because it is likely to be asked for soon (think fields/arrays); this exploits spatial locality. –Once a value is in cache, we may as well keep it around for a while; having been accessed once, a value is more likely to be accessed again in the near future (as opposed to some random other value); this exploits temporal locality.

Cache Facts. Definitions: –Cache hit: the address requested is in the cache. –Cache miss: the address requested is NOT in the cache. –Block or page size: the number of contiguous bytes moved from disk to memory. –Cache line size: the number of contiguous bytes moved from memory to cache.

Examples. Consider first repeated use of the same scalar, then a walk through an array:

    x = a + 6       // miss: first access to a
    y = a + 5       // hit
    z = 8 * a       // hit            (temporal locality: the same location is reused)

    x = a[0] + 6    // miss: loads the cache line containing a[0], a[1], ...
    y = a[1] + 5    // hit
    z = 8 * a[2]    // hit            (spatial locality: nearby locations are used)

Locality and Data Structures. Which has (at least the potential for) better spatial locality when traversing the elements: arrays or linked lists? –Arrays: the elements sit contiguously, so a traversal only misses on the first item in each cache line; the remaining items in that line are hits. –Linked lists: the nodes are scattered, so a traversal misses on every item (unless more than one node happens to land in the same cache line).
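A sketch contrasting the two traversals in code (java.util.LinkedList stands in for a hand-built linked list; the measured gap varies by machine):

    import java.util.LinkedList;
    import java.util.List;

    public class LocalityDemo {
        public static void main(String[] args) {
            final int N = 1_000_000;

            int[] array = new int[N];                 // contiguous: neighbors share cache lines
            List<Integer> list = new LinkedList<>();  // nodes scattered across the heap
            for (int i = 0; i < N; i++) { array[i] = i; list.add(i); }

            long sum = 0, start;

            start = System.nanoTime();
            for (int i = 0; i < N; i++) sum += array[i];   // mostly hits after each line is loaded
            System.out.println("array traversal: " + (System.nanoTime() - start) / 1e6 + " ms");

            start = System.nanoTime();
            for (int x : list) sum += x;                   // pointer chasing: likely a miss per node
            System.out.println("list traversal:  " + (System.nanoTime() - start) / 1e6 + " ms");

            System.out.println(sum);                       // keep sum live
        }
    }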