Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Current Trends in Algorithms, Complexity Theory,

Slides:



Advertisements
Similar presentations
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
Advertisements

D. Tam, R. Azimi, L. Soares, M. Stumm, University of Toronto Appeared in ASPLOS XIV (2009) Reading Group by Theo 1.
Virtual Memory Chapter 18 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S. Dandamudi.
Lecture 34: Chapter 5 Today’s topic –Virtual Memories 1.
CSIE30300 Computer Architecture Unit 10: Virtual Memory Hsin-Chou Chi [Adapted from material by and
Virtual Memory Hardware Support
Hierarchy-conscious Data Structures for String Analysis Carlo Fantozzi PhD Student (XVI ciclo) Bioinformatics Course - June 25, 2002.
Caching IV Andreas Klappenecker CPSC321 Computer Architecture.
External-Memory and Cache-Oblivious Algorithms: Theory and Experiments Gerth Stølting Brodal Oberwolfach Workshop on ”Algorithm Engineering”, Oberwolfach,
S.1 Review: The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of.
1  1998 Morgan Kaufmann Publishers Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
Review CPSC 321 Andreas Klappenecker Announcements Tuesday, November 30, midterm exam.
Recap. The Memory Hierarchy Increasing distance from the processor in access time L1$ L2$ Main Memory Secondary Memory Processor (Relative) size of the.
1 Chapter Seven Large and Fast: Exploiting Memory Hierarchy.
The Memory Hierarchy II CPSC 321 Andreas Klappenecker.
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
Virtual Memory and Paging J. Nelson Amaral. Large Data Sets Size of address space: – 32-bit machines: 2 32 = 4 GB – 64-bit machines: 2 64 = a huge number.
Cache Oblivious Search Trees via Binary Trees of Small Height
Memory: Virtual MemoryCSCE430/830 Memory Hierarchy: Virtual Memory CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng Zhu.
1  2004 Morgan Kaufmann Publishers Chapter Seven.
1 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value is stored as a charge.
Virtual Memory Topics Virtual Memory Access Page Table, TLB Programming for locality Memory Mountain Revisited.
Mem. Hier. CSE 471 Aut 011 Evolution in Memory Management Techniques In early days, single program run on the whole machine –Used all the memory available.
1 CSE SUNY New Paltz Chapter Seven Exploiting Memory Hierarchy.
Systems I Locality and Caching
Lecture 11: DMBS Internals
CSE431 L22 TLBs.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 22. Virtual Memory Hardware Support Mary Jane Irwin (
Lecture 19: Virtual Memory
1  2004 Morgan Kaufmann Publishers Multilevel cache Used to reduce miss penalty to main memory First level designed –to reduce hit time –to be of small.
The Memory Hierarchy 21/05/2009Lecture 32_CA&O_Engr Umbreen Sabir.
IT253: Computer Organization
Memory and cache CPU Memory I/O. CEG 320/52010: Memory and cache2 The Memory Hierarchy Registers Primary cache Secondary cache Main memory Magnetic disk.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
1 Virtual Memory Main memory can act as a cache for the secondary storage (disk) Advantages: –illusion of having more physical memory –program relocation.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Computer Organization CS224 Fall 2012 Lessons 45 & 46.
Improving Cache Performance Four categories of optimisation: –Reduce miss rate –Reduce miss penalty –Reduce miss rate or miss penalty using parallelism.
Multilevel Caches Microprocessors are getting faster and including a small high speed cache on the same chip.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
CS.305 Computer Architecture Memory: Virtual Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly made available.
15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms – Matrix multiplication – Distribution sort – Static searching.
1  1998 Morgan Kaufmann Publishers Chapter Seven.
1  2004 Morgan Kaufmann Publishers Locality A principle that makes having a memory hierarchy a good idea If an item is referenced, temporal locality:
1 Adapted from UC Berkeley CS252 S01 Lecture 18: Reducing Cache Hit Time and Main Memory Design Virtucal Cache, pipelined cache, cache summary, main memory.
Virtual Memory Review Goal: give illusion of a large memory Allow many processes to share single memory Strategy Break physical memory up into blocks (pages)
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
1 Chapter Seven. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: –value.
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a “cache” for secondary (disk) storage – Managed jointly.
High Performance Computing1 High Performance Computing (CS 680) Lecture 2a: Overview of High Performance Processors * Jeremy R. Johnson *This lecture was.
CS203 – Advanced Computer Architecture Virtual Memory.
Memory Hierarchies [FLPR12] Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache- Oblivious Algorithms. ACM Transactions on Algorithms,
Memory Management memory hierarchy programs exhibit locality of reference - non-uniform reference patterns temporal locality - a program that references.
CS161 – Design and Architecture of Computer
ECE232: Hardware Organization and Design
Memory COMPUTER ARCHITECTURE
CS161 – Design and Architecture of Computer
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Algorithm Engineering (2016, Q3, 5 ECTS)
Chapter 8 Digital Design and Computer Architecture: ARM® Edition
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Evolution in Memory Management Techniques
Memory and cache CPU Memory I/O.
Advanced Topics in Data Management
External-Memory and Cache-Oblivious Algorithms: Theory and Experiments
Algorithm Engineering (2017, Q3, 5 ECTS)
Virtual Memory Overcoming main memory size limitation
Computer Architecture
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Overview Problem Solution CPU vs Memory performance imbalance
Presentation transcript:

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Current Trends in Algorithms, Complexity Theory, and Cryptography, Tsinghua University, Beijing, China, May 22-27, 2009 Cache-Oblivious Algorithms A Unified Approach to Hierarchical Memory Algorithms Gerth Stølting Brodal Aarhus University

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Overview  Computer ≠ Unit Cost RAM  Overview of Computer Hardware  A Trivial Program  Hierarchical Memory Models  Basic Algorithmic Results for Hierarchical Memory  The influence of other Chip Technologies Theory Practice

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Computer ≠ Unit Cost RAM

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms From Algorithmic Idea to Program Execution Divide and conquer? … if ( t1[t1index] <= t2[t2index] ) a[index] = t1[t1index++]; else a[index] = t2[t2index++]; … for each x in m up to middle add x to left for each x in m after middle add x to right … Pseudo Code (Java) Code Idea Microcode Virtual Memory/ TLB L1, L2,… cache Pipelining Branch Prediction (Java) Byte Code + Virtual machine Assembler Machine Code Compiler Program Execution

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Computer Hardware

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Computer Hardware

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Inside a PC

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Motherboard

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Intel Pentium (1993)

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Pentium ® 4 Processor Microarchitecture Intel Technology Journal, 2001

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Inside a Harddisk

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Customizing a Dell 650 Processor speed 2.4 – 3.2 GHz L3 cache size 0.5 – 2 MB Memory 1/4 – 4 GB Hard Disk 36 GB– 146 GB – RPM CD/DVD 8 – 48x L2 cache size 256 – 512 KB L2 cache line size 128 Bytes L1 cache line size 64 Bytes L1 cache size 16 KB

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Memory Access Times LatencyRelative to CPU Register0.5 ns1 L1 cache0.5 ns1-2 L2 cache3 ns2-7 DRAM150 ns TLB500+ ns Disk10 ms10 7 Increasing

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program for (i=0; i+d<n; i+=d) A[i]=i+d; A[i]=0; for (i=0, j=0; j<8*1024*1024; j++) i = A[i];

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program — d=1 RAM : n ≈ 2 25 = 128 MB

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program — d=1 L1 : n ≈ 2 12 = 16 KB L2 : n ≈ 2 16 = 256 KB

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program — n=2 24

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms A Trivial Program Experiments were performed on a DELL 8000, PentiumIII, 850MHz, 128MB RAM, running Linux 2.4.2, and using gcc version 2.96 with optimization -O3 L1 instruction and data caches  4-way set associative, 32-byte line size  16KB instruction cache and 16KB write-back data cache L2 level cache  8-way set associative, 32-byte line size  256KB — hardward specification www. Intel.com

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Hierarchical Memory Models

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Algorithmic Problem  Modern hardware is not uniform — many different parameters – Number of memory levels – Cache sizes – Cache line/disk block sizes – Cache associativity – Cache replacement strategy – CPU/BUS/memory speed  Programs should ideally run for many different parameters – by knowing many of the parameters at runtime, or – by knowing few essential parameters, or – ignoring the memory hierarchies  Programs are executed on unpredictable configurations – Generic portable and scalable software libraries – Code downloaded from the Internet, e.g. Java applets – Dynamic environments, e.g. multiple processes Practice

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Hierarchical Memory – Abstract View CPUL1L2 A R M Increasing access time and space L3Disk

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Hierarchical Memory Models Limited success because to complicated Increasing access time and space — many parameters CPUL1L2 A R M L3Disk M1M1 M2M2 M3M3 M4M4 B1B1 B2B2 B3B3 B4B4

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms External Memory Model — two parameters  Measure number of block transfers between two memory levels  Bottleneck in many computations  Very successful (simplicity) Limitations  Parameters B and M must be known  Does not handle multiple memory levels  Does not handle dynamic M CPU M e m o r y I/O c a c h e M B Aggarwal and Vitter 1988

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Ideal Cache Model — no parameters !?  Program with only one memory  Analyze in the I/O model for  Optimal off-line cache replacement  strategy arbitrary B and M Advantages  Optimal on arbitrary level → optimal on all levels  Portability, B and M not hard-wired into algorithm  Dynamic changing parameters CPU M e m o r y B M I/O c a c h e Frigo, Leiserson, Prokop, Ramachandran 1999

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Justification of the Ideal-Cache Model Optimal replacement LRU + 2 × cache size → at most 2 × cache misses Sleator and Tarjan, 1985 Corollary T M,B (N) = O(T 2M,B (N)) ) #cache misses using LRU is O(T M,B (N)) Two memory levels Optimal cache-oblivious algorithm satisfying T M,B (N) = O(T 2M,B (N)) → optimal #cache misses on each level of a multilevel LRU cache Fully associativity cache Simulation of LRU  Direct mapped cache  Explicit memory management  Dictionary (2-universal hash functions) of cache lines in memory  Expected O(1) access time to a cache line in memory Frigo, Leiserson, Prokop, Ramachandran 1999

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Basic External-Memory and Cache-Oblivious Results

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Scanning sum = 0 for i = 1 to N do sum = sum + A[i] sum = 0 for i = 1 to N do sum = sum + A[i] N B A O(N/B) I/Os Corollary External/Cache-oblivious selection requires O(N/B) I/Os Hoare 1961 / Blum et al. 1973

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms External-Memory Merging Merging k sequences with N elements requires O(N/B) IOs provided k ≤ M/B - 1

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms External-Memory Sorting θ(M/B)-way MergeSort achieves optimal O(Sort(N)=O(N/B·log M/B (N/B)) I/Os Aggarwal and Vitter 1988 M M Partition into runs Sort each run Merge pass I Merge pass II... Run 1Run 2Run N/M Sorted N Sorted ouput Unsorted input

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Cache-Oblivious Merging  k-way merging (lazy) using binary merging with buffers  tall cache assumption M ≥ B 2  O(N/B·log M/B k) IOs B1B1... M1M1 M top Frigo, Leiserson, Prokop, Ramachandran 1999 M k B k

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Cache-Oblivious Sorting – FunnelSort  Divide input in N 1/3 segments of size N 2/3  Recursively FunnelSort each segment  Merge sorted segments by an N 1/3 -merger  Theorem Provided M ≥ B 2 performs optimal O(Sort(N)) I/Os Frigo, Leiserson, Prokop and Ramachandran 1999

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Sorting (Disk) Brodal, Fagerberg, Vinther 2004

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Sorting (RAM) Brodal, Fagerberg, Vinther 2004

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms External Memory Search Trees  B-trees Bayer and McCreight 1972  Searches and updates use O(log B N) I/Os degree O(B) Search/update path

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Cache-Oblivious Search Trees  Recursive memory layout (van Emde Boas) Prokop BkBk A B1B1 h h/2 AB1B1 BkBk Binary treeSearches use O(log B N) I/Os  Dynamization (several papers) – ”reduction to static layout”

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Cache-Oblivious Search Trees Brodal, Jacob, Fagerberg 1999

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Matrix Multiplication Frigo, Leiserson, Prokop and Ramachandran 1999 Iterative: Recursive: O ( N 3 ) I/Os I/Os Average time taken to multiply two N x N matrics divided by N 3 N

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms  The ideal cache model helps designing competitive and robust algorithms  Basic techniques: Scanning, recursion/divide-and- conquer, recursive memory layout, sorting  Many other problems studied: Permuting, FFT, Matrix transposition, Priority queues, Graph algorithms, Computational geometry...  Overhead involved in being cache-oblivious can be small enough for the nice theoretical properties to transfer into practical advantages Cache-Oblivious Summary

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms The Influence of other Chip Technologies why some experiments do not turn out as expected

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Translation Lookaside Buffer (TLB)  translate virtual addresses into physical addresses  small table (full associative)  TLB miss requires lookup to the page table Size: 8 - 4,096 entries Hit time: clock cycle Miss penalty: clock cycles wikipedia.org

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms TLB and Radix Sort Rahman and Raman 1999 Time for one permutation phase

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Cache Associativity Execution times for scanning k sequences of total length N =2 24 in round-robin fashion (SUN-Sparc Ultra, direct mapped cache) Sanders 1999Cache associativityTLB misses wikipedia.org

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms TLB and Radix Sort Rahman and Raman 1999 TLB optimized

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Prefetching vs. Caching All Pairs Shortest Paths (APSP) Organize data so that the CPU can prefetch the data → computation (can) dominate cache effects Pan, Cherng, Dick, Ladner 2007 Prefetching disabled Prefetching enabled

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Branch Prediction vs. Caching QuickSort Select the pivot biased to achieve subproblems of size α and 1-α + reduces # branch mispredictions - increases # instructions and # cache faults Kaligosi and Sanders 2006

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Branch Prediction vs. Caching Skewed Search Trees Subtrees have size α and 1-α + reduces # branch mispredictions - increases # instructions and # cache faults Brodal and Moruz α α

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Another Trivial Program for(i=0;i<n;i++) A[i]=rand()% 1000; for(i=0;i<n;i++) if(A[i]>threshold) g++; else s++; for(i=0;i<n;i++) A[i]=rand()% 1000; for(i=0;i<n;i++) if(A[i]>threshold) g++; else s++; — the influence of branch predictions n A

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Branch Mispredictions for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++; for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++; Worst-case #mispredictions for thresholds

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Running Time for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++; for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++;  Prefetching disabled → sec  Prefetching enabled → 0.15 – 0.5 sec Prefetching disabled Prefetching enabled

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms L2 Cache Misses for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++; for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++;  Prefetching disabled → cache misses  Prefetching enabled → cache misses Prefetching disabled Prefetching enabled

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms L1 Cache Misses for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++; for(i=0;i<n;i++) a[i]=rand()% 1000; for(i=0;i<n;i++) if(a[i]>threshold) g++; else s++;  Prefetching disabled → 4 – 16 x 10 6 cache misses  Prefetching enabled → 2.5 – 5.5 x 10 6 cache misses Prefetching disabled Prefetching enabled

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Summary  Be conscious about the presence of memory hierarcies when designing algorithms  Experimental results not often quite as expected due to neglected hardware features in algorithm design  External memory model and cache-oblivious model adequate models for capturing disk/cache buttlenecks

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms  non-uniform memory  parallel disks  parallel/distributed algorithms  graph algorithms  computational geometry  string algorithms  and a lot more... What did I not talk about...

Gerth Stølting Brodal Cache-Oblivious Algorithms - A Unified Approach to Hierarchical Memory Algorithms Overview  Computer ≠ Unit Cost RAM  Overview of Computer Hardware  A Trivial Program  Hierarchical Memory Models  Basic Algorithmic Results for Hierarchical Memory  The influence of other Chip Technologies Theory Practice