CSE 373, Autumn 2001 (Oct 29, 2001): External Storage

1 External Storage
For large data sets, the computer will have to access the disk. A disk access can take 200,000 times longer than a machine instruction. The RAM model does not account for disk I/O.
Memory: 128 MB, fast, expensive.
Disk: 60 GB, slow, cheap.

2 Disks, continued
The difference between memory speed and disk speed is increasing.
Example: State of Florida driving records (256 bytes each), 10,000,000 items, 6 disk accesses per second on a time-sharing system.
Unbalanced binary search tree: possibly 10,000,000 accesses.
BST: on average 32 accesses (about 5 sec.).
AVL tree: worst case 1.44 log n; typical case log n, about 25 accesses (about 4 sec.).
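
The access counts above can be checked with a few lines of arithmetic. This is only a sketch: N and the 6 accesses per second come from the slide, while the 2 ln N figure for the average depth of a randomly built BST is a standard textbook result that the slide does not state explicitly.

```python
import math

N = 10_000_000           # records in the Florida example
ACCESSES_PER_SECOND = 6  # disk accesses per second in the example


def seconds(accesses: float) -> float:
    """Time spent on disk if every tree level costs one disk access."""
    return accesses / ACCESSES_PER_SECOND


log2_n = math.log2(N)      # depth of a perfectly balanced binary tree
avg_bst = 2 * math.log(N)  # assumed: average depth of a random BST is about 2 ln N
avl_worst = 1.44 * log2_n  # AVL worst-case bound quoted on the slide

for name, accesses in [
    ("degenerate BST (worst case)", N),
    ("BST (average)", avg_bst),
    ("AVL (worst case)", avl_worst),
    ("balanced tree (log2 N)", log2_n),
]:
    print(f"{name:28s} {accesses:12.1f} accesses {seconds(accesses):12.1f} s")
```

The degenerate case is the one that matters: at 6 accesses per second, 10,000,000 accesses take roughly 19 days.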

3 Disk accesses
Goal: reduce the number of disk accesses. We are willing to do more complicated computations in memory in order to save disk time.
Idea: increase the branching factor of the tree so that the height decreases.
Definition: an M-ary search tree allows up to M children per node.

4 B-Trees
1. All the data items are stored at the leaves.
2. The non-leaf nodes store up to M-1 keys. The ith key represents the smallest key in subtree i+1.
3. The root is either a leaf or has between 2 and M children.
4. All non-leaf nodes (except the root) have between ⌈M/2⌉ and M children.
5. All leaves are at the same depth and have between ⌈L/2⌉ and L data items.
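
Rule 2 is what makes lookup work: because key i is the smallest key in subtree i+1, the child to follow is the one whose index equals the number of keys less than or equal to the search key. The sketch below illustrates only that descent. The class names `Leaf` and `Internal` and the function `find` are hypothetical, keys are assumed to be integers, and insertion and node splitting are left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    items: List[int]  # sorted data items (rule 1: data lives only in leaves)


@dataclass
class Internal:
    keys: List[int]  # keys[i] is the smallest key in children[i + 1] (rule 2)
    children: List[Union["Internal", Leaf]]


def find(root: Union[Internal, Leaf], key: int) -> Tuple[bool, int]:
    """Return (found, nodes_visited); each node visited models one disk access."""
    node = root
    visited = 1
    while isinstance(node, Internal):
        # Number of keys <= key gives the index of the subtree to descend into.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    return key in node.items, visited


# A tiny tree with M = 3 and L = 2, built by hand for illustration.
root = Internal(keys=[8, 12], children=[Leaf([1, 4]), Leaf([8, 11]), Leaf([12, 15])])
print(find(root, 11))
print(find(root, 5))
```

If each node occupies one disk block, the second component of the result is the number of block reads for that lookup.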

5 B-Trees: Choices
Choose M and L based on the size of the keys K and on the size of the record R. Suppose a disk block is of size B (bytes).
Choose M so that a non-leaf node (M-1 keys plus M child pointers of 4 bytes each) fits into one block: B ≥ (M-1) · K + M · 4
Choose L so that a leaf node fits into one block: B ≥ L · R
Accesses: log₂ N vs. log_⌈M/2⌉ N
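
As a worked instance of these two inequalities, the sketch below solves for the largest M and L that fit. The block size of 8192 bytes and the key size of 32 bytes are assumptions chosen for illustration; the record size of 256 bytes and N = 10,000,000 are taken from the driving-records example. The function names are hypothetical.

```python
import math

POINTER_BYTES = 4  # pointer size used in the inequality for M


def max_m(block: int, key: int) -> int:
    """Largest M with (M - 1) * key + M * POINTER_BYTES <= block."""
    return (block + key) // (key + POINTER_BYTES)


def max_l(block: int, record: int) -> int:
    """Largest L with L * record <= block."""
    return block // record


B = 8192        # assumed block size in bytes
K = 32          # assumed key size in bytes
R = 256         # record size from the driving-records example
N = 10_000_000  # number of records from the same example

M = max_m(B, K)
L = max_l(B, R)
binary_levels = math.log2(N)
btree_levels = math.log(N, math.ceil(M / 2))

print(f"M = {M}, L = {L}")
print(f"log2 N = {binary_levels:.1f}, log base ceil(M/2) of N = {btree_levels:.1f}")
```

Under these assumptions a lookup touches a handful of blocks instead of more than twenty.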

6 Hash Tables
Constant-time accesses! A hash table is an array of some fixed size; the size is usually a prime number.
General idea: a hash function h(K) maps each key in the key space (e.g., strings) to a cell index in the range 0 … TableSize - 1 of the hash table.

7 Desirable Properties
We want a hash function to:
1. be simple/fast to compute,
2. map different keys to different cells (impossible; why?),
3. have keys distributed evenly among cells.
Idea: If #1 and #3 are true and the hash table is not very full, then it should be fast to do a find.

8 Example
key space = integers, h(K) = K mod 10
cell  key
0
1     41
2
3
4     34
5
6
7     7
8     18
9
We lose all ordering information: findMin, findMax, inorder traversal, printing items in sorted order.
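
The example can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names (`build`, `find`, `find_min`). It assumes no two keys collide, which holds for the four keys shown, and it makes the loss of ordering concrete: a find inspects one cell, while findMin has to scan the whole table.

```python
from typing import List, Optional

TABLE_SIZE = 10


def h(key: int) -> int:
    return key % TABLE_SIZE


def build(keys: List[int]) -> List[Optional[int]]:
    """Place each key in cell h(key); this sketch does not handle collisions."""
    table: List[Optional[int]] = [None] * TABLE_SIZE
    for key in keys:
        if table[h(key)] is not None:
            raise ValueError(f"collision in cell {h(key)}")
        table[h(key)] = key
    return table


def find(table: List[Optional[int]], key: int) -> bool:
    # One cell to inspect, regardless of how many keys are stored.
    return table[h(key)] == key


def find_min(table: List[Optional[int]]) -> int:
    # No ordering information survives hashing, so every cell must be scanned.
    return min(key for key in table if key is not None)


table = build([41, 34, 7, 18])
print(table)
print(find(table, 34), find(table, 44), find_min(table))
```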

9 Example 2
key space = strings, s = s₀ s₁ s₂ … s_(k-1)
h(s) = s₀ mod TableSize (BAD HASH FUNCTION)
h(s) = (…) mod TableSize (BETTER HASH FUNCTION)
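
The weakness of hashing on the first character alone is easy to demonstrate: every string that starts with the same letter lands in the same cell. The transcript does not preserve the expression of the better function, so the polynomial hash below is an assumption, a commonly used alternative that mixes in every character, and not necessarily the formula from the slide. The table size of 11 and the multiplier 37 are also illustrative choices, and strings are assumed to be non-empty.

```python
TABLE_SIZE = 11  # assumed: a small prime table size


def h_first_char(s: str) -> int:
    """The bad function from the slide: only the first character matters."""
    return ord(s[0]) % TABLE_SIZE


def h_poly(s: str, base: int = 37) -> int:
    """Assumed alternative: polynomial hash over all characters (Horner's rule)."""
    value = 0
    for ch in s:
        value = (value * base + ord(ch)) % TABLE_SIZE
    return value


words = ["stack", "stash", "steal", "stone"]
print([h_first_char(w) for w in words])  # all equal: every word starts with "s"
print([h_poly(w) for w in words])
```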

10 Collision Resolution
Separate chaining: All keys that map to the same hash value are kept in a list.
Example (cells 0 to 9; the placement is consistent with h(K) = K mod 10):
cell  keys
0     10
1
2     22, 12, 42
3
4
5
6
7     107
8
9
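
Separate chaining can be sketched with one Python list per cell. The class name `ChainedHashTable` and its methods are hypothetical, and the hash function K mod 10 is an assumption carried over from the integer example; the keys inserted are 10, 107, 22, 12 and 42.

```python
from typing import List


class ChainedHashTable:
    """Hash table for integer keys where each cell holds a list (chain) of keys."""

    def __init__(self, size: int = 10) -> None:
        self.size = size
        self.cells: List[List[int]] = [[] for _ in range(size)]

    def _h(self, key: int) -> int:
        return key % self.size

    def insert(self, key: int) -> None:
        chain = self.cells[self._h(key)]
        if key not in chain:  # ignore duplicates
            chain.append(key)

    def find(self, key: int) -> bool:
        # Only the one chain that the key hashes to is searched.
        return key in self.cells[self._h(key)]

    def remove(self, key: int) -> None:
        chain = self.cells[self._h(key)]
        if key in chain:
            chain.remove(key)


t = ChainedHashTable(10)
for k in [10, 107, 22, 12, 42]:
    t.insert(k)
print(t.cells)
```

Keys 22, 12 and 42 all hash to cell 2 and share one chain, so a find for any of them walks at most three entries.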

