CPSC 461 Final Review I Hessam Zakerzadeh Dina Said.

1 CPSC 461 Final Review I Hessam Zakerzadeh Dina Said

2 9.1) What is the most important difference between a disk and a tape?

3 Tapes are sequential devices that do not support direct access to a desired page. We must essentially step through all pages in order. Disks support direct access to a desired page.

4 Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

5 Linear Hashing: No directory. More flexibility with respect to the timing of bucket splits. Worse performance than Extendible Hashing if the data is skewed. Uses a family of hash functions h_0, h_1, … such that h_i(v) = h(v) mod (2^i * N), where N is the initial number of buckets. If N is a power of 2 (N = 2^d0), then apply h and look at the last d_i bits, where d_i = d0 + i.
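The hash family on this slide can be illustrated with a minimal sketch. It assumes the base hash h is the identity on non-negative integers and takes N = 4 (so d0 = 2); both are choices made for the illustration, and the helper names are hypothetical.

```python
def h(v: int) -> int:
    """Base hash function. Assumption: the identity on non-negative integers."""
    return v


def h_i(v: int, i: int, n: int) -> int:
    """i-th function of the family: h_i(v) = h(v) mod (2^i * N)."""
    return h(v) % ((2 ** i) * n)


def last_bits(v: int, d: int) -> int:
    """Value of the last d bits of h(v)."""
    return h(v) & ((1 << d) - 1)


N = 4    # initial number of buckets (assumed), a power of 2
D0 = 2   # N = 2^D0

for key in (44, 9, 43):
    for i in (0, 1):
        # When N is a power of 2, h_i looks at the last d_i = d0 + i bits.
        assert h_i(key, i, N) == last_bits(key, D0 + i)
        print(f"h_{i}({key}) = {h_i(key, i, N)}")
```

Each step from h_i to h_i+1 doubles the range, which is what lets a split send entries either to the original bucket or to its split image.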

6 Inserting a Data Entry in LH: Find the bucket by applying h_Level / h_Level+1. If the bucket to insert into is full: add an overflow page and insert the data entry; (maybe) split the Next bucket and increment Next. Else simply insert the data entry into the bucket.

7 Bucket Split: A split can be triggered by the addition of a new overflow page, or by conditions such as space utilization. Whenever a split is triggered, the Next bucket is split, and hash function h_Level+1 redistributes entries between this bucket (say bucket number b) and its split image; the split image is therefore bucket number b + N_Level, where N_Level = 2^Level * N is the number of buckets at the start of the round. Then Next ← Next + 1.
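The insert and split rules can be sketched as a small in-memory model. This is an illustration under assumptions, not the textbook's implementation: keys are integers hashed by the identity, a page holds 4 entries, each bucket is one flat list (entries beyond the page capacity stand for overflow pages), and a split is triggered only when a new overflow page is added. The class and method names are hypothetical.

```python
class LinearHashFile:
    def __init__(self, n: int = 4, capacity: int = 4):
        self.n0 = n                # N: initial number of buckets
        self.capacity = capacity   # entries per page (assumed)
        self.level = 0
        self.next = 0
        self.buckets = [[] for _ in range(n)]

    def n_level(self) -> int:
        """Number of buckets at the start of the current round."""
        return self.n0 * (2 ** self.level)

    def bucket_of(self, key: int) -> int:
        b = key % self.n_level()              # h_Level
        if b < self.next:                     # bucket already split this round
            b = key % (2 * self.n_level())    # h_Level+1
        return b

    def insert(self, key: int) -> None:
        bucket = self.buckets[self.bucket_of(key)]
        # All pages of the bucket are full, so a new overflow page is needed.
        new_overflow_page = (len(bucket) >= self.capacity
                             and len(bucket) % self.capacity == 0)
        bucket.append(key)
        if new_overflow_page:
            self.split()

    def split(self) -> None:
        n_level = self.n_level()
        b = self.next
        self.buckets.append([])               # split image: bucket b + N_Level
        keep, move = [], []
        for key in self.buckets[b]:
            # h_Level+1 sends each entry to b or to b + N_Level.
            (keep if key % (2 * n_level) == b else move).append(key)
        self.buckets[b] = keep
        self.buckets[b + n_level] = move
        self.next += 1
        if self.next == n_level:              # end of the round
            self.level += 1
            self.next = 0


f = LinearHashFile(n=4, capacity=4)
for key in (32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11):
    f.insert(key)
f.insert(43)   # bucket 11 is full: overflow page added, bucket 00 is split
print(f.level, f.next, f.buckets)
```

Loading the fourteen entries from the example and then inserting 43 leaves Level=0 and Next=1, with 32 in bucket 0 and 44 and 36 in the split image, bucket 4. Note that the bucket that overflows (bucket 3) is not the one that is split.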

8 Example: Insert 44 (101100), 9 (01001). Level=0, Next=0, N=4. [Figure: the file after the inserts, with four primary pages labelled by their h1 and h0 bit strings (this labelling is for illustration only). Bucket 000/00: 32*, 44*, 36*. Bucket 001/01: 9*, 25*, 5*. Bucket 010/10: 14*, 18*, 10*, 30*. Bucket 011/11: 31*, 35*, 7*, 11*.]

9 Example: Insert 43 (101011). Before: Level=0, Next=0, N=4. [Figure: primary pages 000/00: 32*, 44*, 36*; 001/01: 9*, 25*, 5*; 010/10: 14*, 18*, 10*, 30*; 011/11: 31*, 35*, 7*, 11*.] After: Level=0, Next=1. [Figure: primary pages 000/00: 32*; 001/01: 9*, 25*, 5*; 010/10: 14*, 18*, 10*, 30*; 011/11: 31*, 35*, 7*, 11*, with overflow page 43*; 100/00 (split image of bucket 00): 44*, 36*. The h1/h0 labels are for illustration only.]

10 Example: End of a Round. Insert 50 (110010). Before: Level=0, Next=3. [Figure: primary pages 000/00: 32*; 001/01: 9*, 25*; 010/10: 66*, 18*, 10*, 34*; 011/11: 31*, 35*, 7*, 11*, with overflow page 43*; 100/00: 44*, 36*; 101/01: 5*, 37*, 29*; 110/10: 14*, 30*, 22*.] After: Level=1, Next=0. [Figure: primary pages 000/00: 32*; 001/01: 9*, 25*; 010/10: 66*, 18*, 10*, 34*, with overflow page 50*; 011/11: 43*, 35*, 11*; 100/00: 44*, 36*; 101/01: 5*, 37*, 29*; 110/10: 14*, 30*, 22*; 111/11: 31*, 7*.]

11 Exercise 11.4 Answer the following questions about Linear Hashing: 1. How does Linear Hashing provide an average-case search cost of only slightly more than one disk I/O, given that overflow buckets are part of its data structure?

12 If we start with an index that has B buckets, all B buckets are split in order, one after the other, during a round. A hash function is expected to distribute the search key values uniformly over all the buckets. A split can also be triggered by conditions such as space utilization, which reduces the length of the overflow chains. Therefore, the overflow chain of a bucket is not expected to be longer than one page.

13 Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value?

14 Exercise 11.4 Answer the following questions about Linear Hashing: Does Linear Hashing guarantee at most one disk access to retrieve a record with a given key value? No. Overflow chains are part of the structure, so no such guarantee is provided.

15 Exercise 11.4 Answer the following questions about Linear Hashing: If a Linear Hashing index using Alternative (1) for data entries contains N records, with P records per page and an average storage utilization of 80 percent, what is the worst-case cost for an equality search? Under what conditions would this cost be the actual search cost?

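16 With 80 percent storage utilization, each page holds 0.8 * P records on average. If all keys map to the same bucket, that bucket has N / (0.8 * P) pages (its primary page plus its overflow chain), and an equality search may have to read all of them. This is the worst case.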
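As a quick numeric check of the N / (0.8 * P) bound, here is a small sketch. The function name and the sample values N = 1000, P = 10 are made up for illustration; the utilization is passed as an integer percentage so the ceiling is computed exactly.

```python
def worst_case_search_pages(n_records: int, records_per_page: int,
                            utilization_percent: int = 80) -> int:
    """Pages in one bucket if every record hashes to it: ceil(N / (u * P))."""
    denominator = utilization_percent * records_per_page
    # Integer ceiling division of (N * 100) by (u% * P).
    return (n_records * 100 + denominator - 1) // denominator


# Hypothetical values: 1000 records, 10 records per page, 80% utilization.
print(worst_case_search_pages(1000, 10))
```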

17 Exercise 11.4 Answer the following questions about Linear Hashing: If the hash function distributes data entries over the space of bucket numbers in a very skewed (non-uniform) way, what can you say about the space utilization in data pages?

18 Space utilization = number of occupied pages / total number of pages. If the data is skewed, in the extreme all records are mapped to the same bucket. Suppose that we have m main pages and all records are mapped to bucket 0. Each additional overflow page will cause a split. Suppose we added n overflow pages to bucket 0 → we added n buckets. Occupied pages (the primary page of bucket 0 plus its n overflow pages) = n + 1. Total number of pages = m + n + n. Space utilization = (n + 1) / (m + 2n) < 50% when m > 2 → very bad.
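The ratio (n + 1) / (m + 2n) can be tabulated for a few values of n. This is only a sketch of the slide's own formula; m = 4 main pages is an assumed value and the function name is hypothetical.

```python
def skewed_utilization(m: int, n: int) -> float:
    """Utilization when every record lands in bucket 0.

    m: number of main (primary) pages at the start
    n: number of overflow pages added to bucket 0 (each one triggers a split)
    """
    occupied_pages = n + 1      # bucket 0's primary page plus n overflow pages
    total_pages = m + 2 * n     # m primary + n overflow + n split images
    return occupied_pages / total_pages


M = 4  # assumed number of main pages
for n in (1, 10, 100, 1000):
    print(n, round(skewed_utilization(M, n), 4))
```

The ratio approaches 50% from below as n grows but never reaches it when m > 2, which matches the conclusion on the slide.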

19 Exercise 13.4

20 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. The page is 4K. For Pass 0: number of runs = ceil(10*10^6 / 320) = 31250. Read cost per run = 10 + 5 + 1*320 ms. Write cost per run = 10 + 5 + 1*320 ms. Total I/O cost = No. of runs * (read cost + write cost) = 31250 * 2 * (15 + 320) ms → cost of Pass 0.
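The Pass 0 arithmetic can be checked with a short script. It uses only the figures given on the slide; the variable and function names are illustrative.

```python
SEEK_MS = 10               # average seek time
ROTATION_MS = 5            # average rotational delay
TRANSFER_MS_PER_PAGE = 1   # transfer time per page

FILE_PAGES = 10 * 10**6
BUFFER_PAGES = 320


def block_io_ms(pages_per_block: int) -> int:
    """Cost of one sequential read or write of a block of pages."""
    return SEEK_MS + ROTATION_MS + TRANSFER_MS_PER_PAGE * pages_per_block


# Each run fills the whole buffer, so it is read and written as one block.
runs = -(-FILE_PAGES // BUFFER_PAGES)                 # integer ceiling
pass0_ms = runs * 2 * block_io_ms(BUFFER_PAGES)       # read + write per run

print(runs, pass0_ms)
```

With these numbers the script gives 31250 runs and a Pass 0 cost of 20,937,500 ms.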

21 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. The page is 4K. Total cost for subsequent merges = No. of passes * (read cost + write cost). No. of passes = ceil(log_k 31250) = ceil(ln 31250 / ln k), where k is the number of ways in the merge. Read/write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block). No. of blocks = ceil(10*10^6 / No. of pages per block).

22 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges. Total cost for subsequent merges = No. of passes * (read cost + write cost). No. of passes = ceil(ln 31250 / ln No. of ways) = ceil(ln 31250 / ln 256) = 2. Read cost = 10*10^6 * (10 + 5 + 1) = 16*10^7 ms, because each input buffer holds a single page, so every page is read with its own seek and rotational delay.

23 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges. No. of blocks = ceil(10*10^6 / No. of pages per block) = ceil(10*10^6 / 64) = 156250. Write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15 + 64) ms.

24 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. b) Create 256 ‘input’ buffers of 1 page each, create an ‘output’ buffer of 64 pages, and do 256-way merges. Total cost for subsequent merges = No. of passes * (read cost + write cost) = 2 * (16*10^7 + 156250 * (15 + 64)) ms.
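The numbers for case b) can be reproduced with a short script. It is a sketch built from the slide's figures; the names are illustrative, and the number of passes is found by repeatedly merging the run count rather than with logarithms, to avoid floating-point rounding.

```python
FILE_PAGES = 10 * 10**6
INITIAL_RUNS = 31250        # runs produced by Pass 0
POSITIONING_MS = 10 + 5     # average seek + average rotational delay
TRANSFER_MS_PER_PAGE = 1


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def merge_passes(runs: int, ways: int) -> int:
    """Number of k-way merge passes needed to reduce `runs` to a single run."""
    passes = 0
    while runs > 1:
        runs = ceil_div(runs, ways)
        passes += 1
    return passes


def pass_io_ms(pages_per_block: int) -> int:
    """Cost of moving the whole file once in blocks of the given size."""
    blocks = ceil_div(FILE_PAGES, pages_per_block)
    return blocks * (POSITIONING_MS + TRANSFER_MS_PER_PAGE * pages_per_block)


passes_b = merge_passes(INITIAL_RUNS, 256)
read_ms = pass_io_ms(1)     # 256 one-page input buffers: page-at-a-time reads
write_ms = pass_io_ms(64)   # one 64-page output buffer
total_b_ms = passes_b * (read_ms + write_ms)

print(passes_b, read_ms, write_ms, total_b_ms)
```

Here the one-page reads dominate: 160,000,000 ms per pass for reading against 12,343,750 ms for writing, giving 344,687,500 ms over the two merge passes.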

25 File: 10*10^6 pages. Buffer: 320 pages. Average seek time 10 ms, average rotational delay 5 ms, transfer time 1 ms per page. e) Create four ‘input’ buffers of 64 pages each, create an ‘output’ buffer of 64 pages, and do four-way merges. Total cost for subsequent merges = No. of passes * (read cost + write cost). No. of passes = ceil(ln 31250 / ln No. of ways) = ceil(ln 31250 / ln 4) = 8. No. of blocks = ceil(10*10^6 / No. of pages per block) = ceil(10*10^6 / 64) = 156250. Read/write cost = No. of blocks * (10 + 5 + 1 * No. of pages per block) = 156250 * (15 + 64) ms. Total cost = 8 * (2 * 156250 * (15 + 64)) ms.
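For case e) reads and writes both use 64-page blocks, so the cost per pass is symmetric. The sketch below evaluates the slide's expression; the names are illustrative.

```python
FILE_PAGES = 10 * 10**6
INITIAL_RUNS = 31250
WAYS = 4
PAGES_PER_BLOCK = 64
BLOCK_IO_MS = 10 + 5 + 1 * PAGES_PER_BLOCK   # seek + rotation + transfer

# Count four-way merge passes by shrinking the run count until one run is left.
passes_e = 0
runs = INITIAL_RUNS
while runs > 1:
    runs = -(-runs // WAYS)   # integer ceiling
    passes_e += 1

blocks = -(-FILE_PAGES // PAGES_PER_BLOCK)
total_e_ms = passes_e * (2 * blocks * BLOCK_IO_MS)   # read + write per pass

print(passes_e, blocks, total_e_ms)
```

The four-way merge needs 8 passes, but because every I/O moves 64 pages per seek the total comes to 197,500,000 ms.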

