Presentation is loading. Please wait.

Presentation is loading. Please wait.

I/O-Algorithms Lars Arge January 31, 2005. Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite.

Similar presentations


Presentation on theme: "I/O-Algorithms Lars Arge January 31, 2005. Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite."— Presentation transcript:

1 I/O-Algorithms Lars Arge January 31, 2005

2 Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite memory –Uniform access cost RAMRAM

3 Lars Arge I/O-algorithms 3 Hierarchical Memory Modern machines have complicated memory hierarchy –Levels get larger and slower further away from CPU –Levels have different associativity and replacement strategies –Large access time amortized using block transfer between levels Bottleneck often transfers between largest memory levels in use L1L1 L2L2 RAMRAM

4 Lars Arge I/O-algorithms 4 I/O-Bottleneck I/O is often bottleneck when handling massive datasets –Disk access is 10 6 times slower than main memory access –Large transfer block size (typically 8-16 Kbytes) Important to obtain “locality of reference” –Need to store and access data to take advantage of blocks

5 Lars Arge I/O-algorithms 5 Massive Data Massive datasets are being collected everywhere Storage management software is billion-$ industry Examples: Phone: AT&T 20TB phone call database, wireless tracking Consumer: WalMart 70TB database, buying patterns WEB: Web crawl of 200M pages and 2000M links, Akamai stores 7 billion clicks per day Geography: NASA satellites generate 1.2TB per day

6 Lars Arge I/O-algorithms 6 Example: Grid Terrain Data Appalachian Mountains (800km x 800km) 500MB at 100m resolution 5.5GB at 30m resolution – NASA SRTM mission acquired 30m data for 80% of the earth land mass 50GB at 10m resolution (some of US available from USGS) 5TB at 1m resolution

7 Lars Arge I/O-algorithms 7 I/O-Model Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory K = # output size in searching problem We often assume that M>B 2 I/O: Movement of block between memory and disk D P M Block I/O

8 Lars Arge I/O-algorithms 8 List Ranking Trivial internal memory algorithm takes O(N) time –and causes O(N) page faults in external memory O(N/B) is the number of I/Os we need to read N element –Difference between N and N/B is extremely important in practice Can we develop O(N/B) algorithm? –Answer is NO CBADEGHF B=M/B=2

9 Lars Arge I/O-algorithms 9 Fundamental Bounds [AV88] Internal External Scanning: N Sorting: N log N Permuting –List rank Searching: Note: –Permuting not linear –Permuting and sorting bounds are equal in all practical cases –B factor VERY important: –Cannot sort optimally with search tree

10 Lars Arge I/O-algorithms 10 Sorting Merge sort: –Create N/M memory sized sorted runs –Merge runs together M/B at a time  phases using I/Os each

11 Lars Arge I/O-algorithms 11 Distribution Sort

12 Lars Arge I/O-algorithms 12 Finding partition elements

13 Lars Arge I/O-algorithms 13


Download ppt "I/O-Algorithms Lars Arge January 31, 2005. Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite."

Similar presentations


Ads by Google