Introduction and File Structures Database System Implementation CSE 507 Some slides adapted from R. Elmasri and S. Navathe, Fundamentals of Database Systems,


1 Introduction and File Structures
Database System Implementation, CSE 507
Some slides adapted from R. Elmasri and S. Navathe, Fundamentals of Database Systems, Sixth Edition, Pearson, and from Silberschatz, Korth and Sudarshan, Database System Concepts, 6th Edition.

2 Linear Hashing

3
- Linear Hashing avoids a directory by using overflow pages and by choosing the bucket to split in round-robin fashion.
- Splitting proceeds in “rounds.”
- A round ends when all the initial buckets (N_R) for round R have been split.
- At any stage during a round, buckets 0 to Next-1 have been split.
- The new buckets resulting from splits are placed after the N_R initial buckets.
Some material adapted from Prof. J. Haritsa, IISc Bangalore.

4 Linear Hashing (LH)

5 Linear Hashing (LH)

6 Snapshot of a LH file [figure: the buckets at the beginning of round R]

7 Snapshot of a LH file [figure]

8 Snapshot of a LH file [figure: pointer (n) to the bucket to be split]

9 Snapshot of a LH file [figure: when this bucket is split, a new bucket is created at the end]

10 Snapshot of a LH file [figure: the buckets at the beginning of round R]

11 Snapshot of a LH file [figure: the buckets at the beginning of round R]

12 Snapshot of a LH file [figure regions: buckets already split, addressed by h_{i+1}(K); buckets to be split; new buckets, filled according to h_{i+1}(K); the buckets at the beginning of round R are marked]

13 Snapshot of a LH file [figure regions: buckets to be split, addressed by h_i(K); bucket to be split next; new buckets, filled according to h_{i+1}(K); the buckets at the beginning of round R are marked]

14 Searching Algorithm
    Step 1: bucketaddr ← h_i(key)
    Step 2: If bucketaddr < Nexttosplit, then bucketaddr ← h_{i+1}(key)

15 Search Algorithm for a Key k
- Check whether h_i(k) falls in the red region (the buckets already split).
[figure regions: buckets already split, addressed by h_{i+1}(K); buckets to be split, addressed by h_i(K); new buckets, filled according to h_{i+1}(K)]

16 Search Algorithm for a Key k
- If yes, then use h_{i+1}(k). Why?
[figure: same regions as on slide 15]

17 Search Algorithm for a Key k
- Else h_i(k) gives the correct bucket. Why?
[figure: same regions as on slide 15]
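
The search rule on slides 14 to 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: keys are integers, the file starts with a single bucket, and h_i(key) = key mod 2^i (the form the later LH* examples use). The names `h` and `lh_address` are hypothetical, not taken from the slides.

```python
def h(level: int, key: int) -> int:
    """Assumed hash family: h_level(key) = key mod 2**level."""
    return key % (2 ** level)


def lh_address(key: int, i: int, next_to_split: int) -> int:
    """Bucket address of key in a linear hash file at level i."""
    addr = h(i, key)                 # Step 1
    if addr < next_to_split:         # Step 2: bucket was already split this round
        # Its records are now spread over addr and addr + 2**i,
        # and h_{i+1} tells the two apart.
        addr = h(i + 1, key)
    return addr


# File with i = 4 and Next = 7, i.e. buckets 0..22.
print(lh_address(20, i=4, next_to_split=7))  # 20 mod 16 = 4 < 7, so 20 mod 32 = 20
print(lh_address(15, i=4, next_to_split=7))  # 15 mod 16 = 15 >= 7, stays 15
```

This also suggests an answer to the "Why?" on the slides: a bucket below Next has already handed part of its records to a new bucket, so only h_{i+1} can say which of the two holds the key; a bucket at or above Next still holds everything h_i sends to it.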

18 Insert Algorithm for Linear Hashing
- Find the bucket by applying h_i / h_{i+1}.
- If the bucket to insert into is full:
  - Add an overflow page and insert the data entry.
  - Split the Next bucket and increment the Next pointer (uncontrolled split).

19 Snapshot of a LH file [figure: an insertion into one bucket caused an overflow; the Next bucket to be split has all of its records re-hashed using h_{i+1}]
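
A minimal in-memory sketch of the insert algorithm with uncontrolled splits follows. Everything beyond the slide's steps is an assumption made for illustration: integer keys, h_i(key) = key mod 2^i, a single initial bucket, a page capacity of 2 entries, and overflow pages modeled simply as list entries beyond the capacity. The class name `LinearHashFile` is hypothetical.

```python
class LinearHashFile:
    """Toy linear hash file (illustrative assumptions, see lead-in)."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity   # entries per primary page (assumed)
        self.i = 0                 # level of the current round
        self.next = 0              # Next: bucket to be split next
        self.buckets = [[]]        # one list per bucket

    def address(self, key: int) -> int:
        addr = key % (2 ** self.i)
        if addr < self.next:       # bucket already split in this round
            addr = key % (2 ** (self.i + 1))
        return addr

    def insert(self, key: int) -> None:
        bucket = self.buckets[self.address(key)]
        was_full = len(bucket) >= self.capacity
        bucket.append(key)         # entries past capacity stand for an overflow page
        if was_full:
            self._split()          # uncontrolled: every overflow triggers a split

    def _split(self) -> None:
        old = self.buckets[self.next]
        modulus = 2 ** (self.i + 1)
        # The new bucket gets address next + 2**i, which is the end of the file.
        self.buckets.append([k for k in old if k % modulus != self.next])
        self.buckets[self.next] = [k for k in old if k % modulus == self.next]
        self.next += 1
        if self.next == 2 ** self.i:   # all initial buckets split: the round ends
            self.i += 1
            self.next = 0


f = LinearHashFile()
for key in [32, 216, 251, 153, 10, 6]:
    f.insert(key)
print(f.i, f.next, f.buckets)   # 1 1 [[32, 216], [251, 153], [10, 6]]
```

Note that the bucket that is split is always the Next bucket, which need not be the bucket that overflowed; in this small run the two happen to coincide.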

20 Some Comments
- Since buckets are split round-robin, long overflow chains don’t develop!
- Doubling of the directory in Extendible Hashing is similar; there, the switching of hash functions is implicit in how the number of bits examined is increased.
- Splits can be controlled using the load factor.

21 LH* --- Linear Hashing in a Distributed Setting
Litwin et al., “LH* -- A Scalable, Distributed Data Structure,” ACM Transactions on Database Systems, 21(4), 480--525.

22 LH* --- Linear Hashing in a Distributed Setting
- Setting:
  - Several client sites share a file F.
  - The clients insert records given keys.
  - F is stored on server sites.
  - Clients and servers are whole machines that are nodes of a network.
  - Each server provides storage space for objects of F, called a bucket.
  - A server can send records to other servers.
- LH* can accommodate any number of clients and servers.

23 LH* --- Linear Hashing in a Distributed Setting
- LH* meets the following criteria:
  - The file expands to new servers gracefully, and only when the servers already in use are efficiently loaded.
  - There is no master site that the record address computations must go through.
  - The file access and maintenance primitives (e.g., search, insertion and split) never require atomic updates to multiple clients.

24 Key Features of LH*
- The file can grow to practically any size, with the load factor staying about constant.
- Insertion usually requires one message, and three in the worst case.
- Retrieval usually requires two messages, and four in the worst case.
- Supports parallel operations.
- Works with or without a specialized split coordinator site.
- In the basic version (these slides), splitting is serialized by the split coordinator.
- There are several other variants, e.g., parallel splits and autonomous splitting.

25 Snapshot of LH* [figure: servers with their bucket levels: Server 0 (J = 10), Server 1 (J = 10), Server 80 (J = 9), Server 591 (J = 10), Server 583 (J = 10); a Next Split (N) pointer; client images: Client 1 (N’ = 5, I’ = 6), Client 2 (N’ = 0, I’ = 2), Client M (N’ = 31, I’ = 9)]

26 Addressing in LH*
- Records of an LH* file are manipulated by the clients.
- LH is based on the assumption that we know the correct N and I.
- In a distributed setting with multiple clients, this is only possible if we have a master site (inefficient).
- LH* does not require all the clients to have a consistent view of N and I.
- Each client has its own view of N (N’) and I (I’).

27 Addressing in LH*
- Step 1: Client address calculation.
- Step 2: Server address calculation.

28 Addressing in LH* --- Algorithm at Client
- Algorithm at Client (A1), using the client’s own N’ and I’:
    Step 1: bucketaddr ← h_{I’}(key)
    Step 2: If bucketaddr < N’, then bucketaddr ← h_{I’+1}(key)

29 Addressing in LH*
- Step 1: Client address calculation.
  - The client computes the address using its N’ and I’ (both initialized to 0).
  - It sends the request to the server concerned.
  - The client image is updated in case of an addressing error.
  - The global N and I are not known to the client; its image approaches them gradually through these updates.
- Step 2: Server address calculation.

30 Addressing in LH*
- Step 1: Client address calculation.
- Step 2: Server address calculation.
  - A server receiving a key first verifies whether it should be the recipient.
  - If not, the server recalculates the address and forwards the message.
  - This forwarding can take place at most 2 times.

31 Example on Client side Addressing [figure: actual file with N = 7 and I = 4, bucket markers 0, 6, 7, 15, 16, 22 and levels J = 5 and J = 4; client image with N’ = 3 and I’ = 3, bucket markers 0, 2, 3, 7, 8, 10 and levels J = 4 and J = 3]
Insert Key = 7

32 Example on Client side Addressing [figure: same actual file and client image as on slide 31]
Insert Key = 15

33 Example on Client side Addressing [figure: same actual file; client image with N’ = 4 and I’ = 3, bucket markers 0, 3, 4, 7, 8, 10 and levels J = 4 and J = 3]
Insert Key = 20
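
The three client-side examples can be replayed with a short sketch of algorithm A1. It assumes integer keys and h_i(key) = key mod 2^i; the function name `client_address` is hypothetical. Running the same function with the actual N and I gives the address the key really belongs to, so the two can be compared.

```python
def client_address(key: int, n_img: int, i_img: int) -> int:
    """Algorithm A1: address computed from an image (N', I')."""
    addr = key % (2 ** i_img)              # Step 1
    if addr < n_img:                       # Step 2
        addr = key % (2 ** (i_img + 1))
    return addr


actual = (7, 4)   # the actual file: N = 7, I = 4
examples = [(7, (3, 3)), (15, (3, 3)), (20, (4, 3))]   # (key, client image)
for key, image in examples:
    guessed = client_address(key, *image)
    correct = client_address(key, *actual)
    print(key, guessed, correct)
# 7 7 7     -> the client is right
# 15 7 15   -> addressing error, server 7 must forward
# 20 4 20   -> addressing error, server 4 must forward
```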

34 Addressing in LH* --- Algorithm at Server
- Each bucket (server) in LH* retains its level J (J = I or J = I + 1).
- The value of N (the next bucket to be split) is not known to the servers.
- A server with bucket address A recalculates the key’s address A’:
    Step 1: A’ ← h_J(key)
    Step 2: If A’ != A:
                A’’ ← h_{J-1}(key)
                If A’’ > A and A’’ < A’, then A’ ← A’’
                Forward the message to A’

35 Some things to Remember About Addressing
Item 1: h_{J+1}(key) >= h_J(key)
Item 2: At any instant, an LH* file can only have buckets at level I or I+1.

36 Example on Server side Addressing [figure: actual file with N = 7 and I = 4, bucket markers 0, 6, 7, 15, 16, 22 and levels J = 5 and J = 4]
Insert Key = 15; client address 7; actual address 15. The client image was N’ = 3 and I’ = 3.
Server Side Algorithm (at server 7):
    Step 1: A’ = 15 mod 2^4 = 15 (J = 4 for server 7)
    Step 2: A’ != A (15 != 7), so A’’ = 15 mod 2^3 = 7
            The condition is not satisfied (A’’ is not > A; they are equal).
            Message forwarded to server 15 (correct address).

37 Example on Server side Addressing [figure: actual file with N = 1 and I = 1; bucket 0 (J = 2) holds 32, 216; bucket 1 (J = 1) holds 251, 153; bucket 2 (J = 2) holds 10, 6]
Insert Key = 7; client address 0; actual address 1. Client at N’ = 0 and I’ = 0.

38 Example on Server side Addressing [figure: same file as on slide 37]
Insert Key = 7; client address 0; actual address 1. Client at N’ = 0 and I’ = 0.
Server Side Algorithm (at server 0):
    Step 1: A’ = 7 mod 2^2 = 3 (J = 2 for server 0)
    Step 2: A’ != A (3 != 0), so A’’ = 7 mod 2 = 1
            The condition is satisfied (A’’ > A and A’ > A’’).
            Message forwarded to server 1 (correct address).
Server 3 does not exist! The check on A’’ prevents requests from going to invalid servers.

39 Example on Server side Addressing [figure: actual file with N = 0 and I = 2, all buckets at J = 2; bucket 0 holds 216, 12; bucket 1 holds 145, 321; bucket 2 holds 10, 6; bucket 3 holds 251, 215]
Insert Key = 7; client address 0; actual address 3. Client at N’ = 0 and I’ = 0.

40 Example on Server side Addressing [figure: same file as on slide 39]
Insert Key = 7; client address 0; actual address 3. Client at N’ = 0 and I’ = 0.
Server Side Algorithm (at server 0):
    Step 1: A’ = 7 mod 2^2 = 3 (J = 2 for server 0)
    Step 2: A’ != A (3 != 0), so A’’ = 7 mod 2 = 1
            The condition is satisfied (A’’ > A and A’ > A’’).
            Message forwarded to server 1 (not the correct address).
The same check can send a message to a more conservative place than necessary.

41 Example on Server side Addressing [figure: same file as on slide 39]
Insert Key = 7; client address 0; actual address 3; forwarded from server 0. Client at N’ = 0 and I’ = 0.
Server Side Algorithm (at server 1):
    Step 1: A’ = 7 mod 2^2 = 3 (J = 2 for server 1)
    Step 2: A’ != A (3 != 1), so A’’ = 7 mod 2 = 1
            The condition is not satisfied (A’’ is not > A; they are equal).
            Message forwarded to server 3 (correct address).
But this detour can happen only once.

42 Example on Server side Addressing [figure: actual file with N = 7 and I = 4, bucket markers 0, 6, 7, 15, 16, 22 and levels J = 5 and J = 4]
Insert Key = 20; client address 0; actual address 20. Client image N’ = 0 and I’ = 0.
Server Side Algorithm (at server 0):
    Step 1: A’ = 20 mod 2^5 = 20 (J = 5 for server 0)
    Step 2: A’ != A (20 != 0), so A’’ = 20 mod 2^4 = 4
            The condition is satisfied (A’’ > A and A’ > A’’).
            Message forwarded to server 4 (incorrect address).

43 Example on Server side Addressing [figure: same file as on slide 42]
Insert Key = 20; client address 0; actual address 20; forwarded from server 0. Client image N’ = 0 and I’ = 0.
Server Side Algorithm (at server 4):
    Step 1: A’ = 20 mod 2^5 = 20 (J = 5 for server 4)
    Step 2: A’ != A (20 != 4), so A’’ = 20 mod 2^4 = 4
            The condition is not satisfied (A’’ == A).
            Message forwarded to server 20 (correct address).
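
The server-side examples on slides 36 to 43 can be traced with the sketch below. It assumes integer keys and h_J(key) = key mod 2^J. The helper `bucket_level` is a hypothetical addition used only to simulate the servers: it derives each bucket's level from the global N and I (buckets below N and buckets at or beyond 2^I are at level I + 1, all others at level I), which real servers do not need because each one stores its own J.

```python
def bucket_level(addr: int, n: int, i: int) -> int:
    """Simulation helper (assumed): level J of bucket addr for global N, I."""
    return i + 1 if addr < n or addr >= 2 ** i else i


def server_step(key: int, a: int, j: int) -> int:
    """Server algorithm at bucket a with level j.

    Returns a if the key belongs here, otherwise the address to forward to.
    """
    a1 = key % (2 ** j)                  # Step 1: A'
    if a1 != a:                          # Step 2
        a2 = key % (2 ** (j - 1))        # A''
        if a < a2 < a1:
            a1 = a2
    return a1


def route(key: int, first: int, n: int, i: int) -> list:
    """Servers visited by a request that the client sent to `first`."""
    path = [first]
    while True:
        a = path[-1]
        nxt = server_step(key, a, bucket_level(a, n, i))
        if nxt == a:
            return path
        path.append(nxt)


print(route(15, first=7, n=7, i=4))   # [7, 15]
print(route(7, first=0, n=1, i=1))    # [0, 1]
print(route(7, first=0, n=0, i=2))    # [0, 1, 3]
print(route(20, first=0, n=7, i=4))   # [0, 4, 20]
```

In each trace the request is forwarded at most twice, matching the bound stated on slide 30.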

44 Client Image Adjustment
- The client updates its N’ and I’ whenever it encounters an addressing error.
- A is the address to which the client sent its key.
- J is the level of server A (J is returned in the image adjustment message).
- The adjusted image is certainly not exact, but it gets closer with each error.
    Step 1: I’ ← J - 1; N’ ← A + 1
    Step 2: If N’ >= 2^I’, then N’ ← 0 and I’ ← I’ + 1

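45 Client Image Adjustment Example [figure: client image with N’ = 3 and I’ = 3, bucket markers 0, 2, 3, 7, 8, 10 and levels J = 4 and J = 3; adjusted image with bucket markers 0 and 15 at level J = 4]
Insert Key = 15; client address 7; actual address 15; server 7 was at level 4.
Adjusted image: N’ = 0 and I’ = 4.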
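
The two adjustment steps are small enough to check by running them. The sketch below follows the steps exactly as stated on slide 44; the function name `adjust_image` is hypothetical, and the second call (server 4 at level 5, for key 20) is an extra illustration built from the earlier client-side example rather than something shown on the slides.

```python
def adjust_image(a: int, j: int) -> tuple:
    """New client image (N', I') after an addressing error.

    a is the address the client sent the key to, j the level of that server.
    """
    i_img = j - 1            # Step 1
    n_img = a + 1
    if n_img >= 2 ** i_img:  # Step 2: the image wraps into the next round
        n_img = 0
        i_img += 1
    return n_img, i_img


print(adjust_image(a=7, j=4))   # (0, 4), the adjusted image on slide 45
print(adjust_image(a=4, j=5))   # (5, 4), closer to the actual N = 7, I = 4
```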

46 Splitting in LH* (Uncontrolled)

47 Splitting in LH*
- The splitting coordinator computes the new values of I and NexttoSplit.
- Server n (with bucket level J), on receiving a message to split:
    Step 1: Creates bucket n + 2^J with level J+1
    Step 2: Splits bucket n by applying h_{J+1}
    Step 3: Updates J ← J + 1
    Step 4: Commits the split to the splitting coordinator
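
Steps 1 to 3 at the splitting server can be sketched as a pure function. The assumptions are integer keys, h_J(key) = key mod 2^J, and a bucket represented as a plain list; the name `split_bucket` is hypothetical, and the messaging with the new server and the split coordinator (Step 4) is left out.

```python
def split_bucket(n: int, j: int, records: list) -> tuple:
    """Split bucket n of level j.

    Returns (kept, moved, new_address, new_level); both buckets end up
    at new_level.
    """
    new_address = n + 2 ** j                      # Step 1: the new bucket
    modulus = 2 ** (j + 1)
    kept = [k for k in records if k % modulus == n]    # Step 2: rehash with h_{J+1}
    moved = [k for k in records if k % modulus != n]
    return kept, moved, new_address, j + 1        # Step 3: J <- J + 1


kept, moved, new_address, new_level = split_bucket(1, 1, [251, 153, 7])
print(kept, moved, new_address, new_level)   # [153] [251, 7] 3 2
```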

48 Introduction to Buffering in Databases

49 Buffer Managers
Buffer manager: a module in a database that intelligently shuffles data between main memory and disk. It is transparent to higher levels of DBMS operation.

50 Buffer Managers
- Data must be in RAM for the DBMS to operate on it!
- A table of (frame#, pageid) pairs is maintained.
[figure: page requests from higher levels arrive at the buffer pool in main memory; disk pages are read from the DB on disk into free frames and written back (READ/WRITE, INPUT/OUTPUT); the choice of frame is dictated by the replacement policy]

51 When a bucket/page is requested
- If the requested page/bucket is in the buffer pool:
  - No need to go back to the disk!
- If not, choose a frame to replace.
  - If there is a free frame, use it!
  - Terminology: we pin a page (meaning it is in use).
  - If there is no free frame, we need to choose a page to remove!
  - What would be a good strategy? --- Replacement policy

52 A simple strategy
- A page is dirty if its contents have been changed since it was last written to disk.
- The buffer manager keeps a dirty bit.
- Say we choose to evict P.
- If P is dirty, we write it to disk.
- What if no page is dirty?
- Or multiple pages are dirty?
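
The request path from the last two slides (hit, free frame, eviction with a dirty-bit check, pinning) can be sketched as below. This is an illustration under assumptions, not a real DBMS interface: the disk is a plain dictionary, each frame is a small list of contents, pin count and dirty bit, and the replacement policy is a deliberately naive placeholder (the first unpinned page). The class and method names are hypothetical.

```python
class BufferPool:
    """Toy buffer pool (illustrative assumptions, see lead-in)."""

    def __init__(self, num_frames: int, disk: dict):
        self.num_frames = num_frames
        self.disk = disk            # page_id -> contents, stands in for the disk
        self.frames = {}            # page_id -> [contents, pin_count, dirty]
        self.disk_writes = 0

    def request(self, page_id):
        """Return the page contents and pin the page."""
        if page_id not in self.frames:                 # miss
            if len(self.frames) >= self.num_frames:    # no free frame
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
        frame = self.frames[page_id]
        frame[1] += 1                                  # pin: the page is in use
        return frame[0]

    def release(self, page_id, new_contents=None):
        """Unpin the page; passing new contents marks it dirty."""
        frame = self.frames[page_id]
        if new_contents is not None:
            frame[0], frame[2] = new_contents, True
        frame[1] -= 1

    def _evict(self):
        # Placeholder policy: the first unpinned page in insertion order.
        victim = next((p for p, f in self.frames.items() if f[1] == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        contents, _, dirty = self.frames.pop(victim)
        if dirty:                                      # write back only if changed
            self.disk[victim] = contents
            self.disk_writes += 1


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
pool.request(1)
pool.release(1, new_contents="A")   # page 1 is now dirty
pool.request(2)
pool.release(2)
pool.request(3)                     # evicts page 1 and writes it back
print(disk[1], pool.disk_writes)    # A 1
```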

53 Review of Some strategies from OS: LRU
- Order pages by time of last access.
- Always replace the least recently accessed page.
Before:    P5, P2, P8, P4, P1, P9, P6, P3, P7
Access P6
After:     P6, P5, P2, P8, P4, P1, P9, P3, P7
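
The reordering in the example can be reproduced with a list kept in most-recently-used-first order. The function name `lru_access` and the capacity parameter are assumptions for illustration; the access to P0 is an extra step, not on the slide, that shows which page gets evicted when the pool is full.

```python
def lru_access(order: list, page: str, capacity: int) -> list:
    """Update an MRU-first list of buffered pages for one access."""
    if page in order:
        order.remove(page)       # hit: the page moves to the front
    elif len(order) >= capacity:
        order.pop()              # miss on a full pool: evict the last (LRU) page
    order.insert(0, page)
    return order


order = ["P5", "P2", "P8", "P4", "P1", "P9", "P6", "P3", "P7"]
print(lru_access(order, "P6", capacity=9))
# ['P6', 'P5', 'P2', 'P8', 'P4', 'P1', 'P9', 'P3', 'P7']
print(lru_access(order, "P0", capacity=9))
# ['P0', 'P6', 'P5', 'P2', 'P8', 'P4', 'P1', 'P9', 'P3']  (P7 was evicted)
```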

54 Some strategies from OS: Clock algorithm
- Instead, we maintain a “last used clock.”
- Think of buckets ordered 1…N around a clock.
- “The hand” sweeps around.
- Buckets keep a “ref bit” set to 1 or 0.
- Whenever a bucket is fetched in, its “ref bit” is set to 1.
- Similarly, it is set to 1 whenever the bucket is referenced.
- The buffer manager’s “hand” looks for the first 0 for replacement.
- Whenever the hand passes a 1, it sets it to 0.
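
A compact sketch of the clock sweep described above. The class name `ClockBuffer` and the choice to advance the hand past a freshly filled frame are assumptions; pinning and dirty bits are left out to keep the sweep itself visible.

```python
class ClockBuffer:
    """Toy clock replacement over a fixed number of frames."""

    def __init__(self, num_frames: int):
        self.pages = [None] * num_frames   # page held by each frame
        self.ref = [0] * num_frames        # ref bit of each frame
        self.hand = 0

    def access(self, page) -> bool:
        """Reference a page; returns True on a hit, False on a miss."""
        if page in self.pages:
            self.ref[self.pages.index(page)] = 1   # referenced: set ref bit
            return True
        while self.ref[self.hand] == 1:            # passing a 1 clears it
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page               # first 0 found: replace here
        self.ref[self.hand] = 1                    # fetched in: set ref bit
        self.hand = (self.hand + 1) % len(self.pages)
        return False


clock = ClockBuffer(3)
for page in ["A", "B", "C", "A", "D"]:
    clock.access(page)
print(clock.pages, clock.ref)   # ['D', 'B', 'C'] [1, 0, 0]
```

When D arrives, every ref bit is 1, so the hand clears all three on its first lap and replaces A on the second; the recent hit on A does not save it here because the bits are only one bit of history.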

55 Some strategies from OS: MRU algorithm
- Most Recently Used.
- Are you kidding me? Why would you ever want to use this?
Hint: Consider scanning a relation that has 1 million buckets when we have only 1000 buffer pages…

56 Consider a database operation: Nested Join
How would the LRU and Clock algorithms perform on this nested join algorithm?
    For each record in Relation R
        For each record in Relation S
            Test the join condition specified
        End For
    End For
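
One way to explore the question is to count buffer misses for the inner relation, which is rescanned once per outer record. The sketch below is a toy simulation under assumed numbers (S has 5 pages, the pool has 3 frames, there are 4 outer records); the function name `count_misses` is hypothetical and only the LRU and MRU policies are modeled.

```python
def count_misses(accesses, capacity: int, policy: str) -> int:
    """Count page misses; policy is "LRU" or "MRU".

    pool lists buffered pages from least to most recently used.
    """
    pool, misses = [], 0
    for page in accesses:
        if page in pool:
            pool.remove(page)                  # hit: re-appended below as most recent
        else:
            misses += 1
            if len(pool) >= capacity:
                pool.pop(0 if policy == "LRU" else -1)   # evict LRU or MRU page
        pool.append(page)
    return misses


inner_scan = list(range(5))     # the 5 pages of S (assumed size)
accesses = inner_scan * 4       # S is scanned once per outer record
print(count_misses(accesses, capacity=3, policy="LRU"))   # 20
print(count_misses(accesses, capacity=3, policy="MRU"))   # 12
```

Under LRU every access to S misses, because each page is evicted just before the scan comes back to it; MRU keeps part of S resident across scans, which is the point of the hint on the MRU slide.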

57 Database buffer managers can be much smarter than these! We will cover some popular buffer managers after discussing query processing algorithms.

