
1 Sections 13.1 – 13.3 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin SECONDARY STORAGE MANAGEMENT

2 Presentation Outline  13.1 The Memory Hierarchy  13.1.1 The Memory Hierarchy  13.1.2 Transfer of Data Between Levels  13.1.3 Volatile and Nonvolatile Storage  13.1.4 Virtual Memory  13.2 Disks  13.2.1 Mechanics of Disks  13.2.2 The Disk Controller  13.2.3 Disk Access Characteristics

3 Presentation Outline (con’t)  13.3 Accelerating Access to Secondary Storage  13.3.1 The I/O Model of Computation  13.3.2 Organizing Data by Cylinders  13.3.3 Using Multiple Disks  13.3.4 Mirroring Disks  13.3.5 Disk Scheduling and the Elevator Algorithm  13.3.6 Prefetching and Large-Scale Buffering

4 13.1.1 Memory Hierarchy A computer system contains several components for data storage, with different data capacities available The cost per byte to store data also varies The devices with the smallest capacity offer the fastest speed, at the highest cost per bit

5 Memory Hierarchy Diagram (fastest and smallest level first):
Cache
Main Memory (programs, DBMS main-memory buffers)
Disk (used as virtual memory; file system)
Tertiary Storage

6 13.1.1 Memory Hierarchy Cache – Lowest level of the hierarchy – Data items are copies of certain locations of main memory – Sometimes, values in cache are changed and corresponding changes to main memory are delayed – Machine looks for instructions as well as data for those instructions in the cache – Holds limited amount of data

7 13.1.1 Memory Hierarchy (con’t) In a single-processor computer, there is no need to update the data in main memory immediately With multiple processors, data is updated to main memory immediately; this policy is called write-through

8 Main Memory Everything happens in the computer (instruction execution, data manipulation) by working on information that is resident in main memory Main memories are random access: one can obtain any byte in the same amount of time

9 Secondary storage Used to store data and programs when they are not being processed More permanent than main memory, as data and programs are retained when the power is turned off E.g. magnetic (hard) disks

10 Tertiary Storage Holds data volumes in terabytes Used for databases much larger than what can be stored on disk

11 13.1.2 Transfer of Data Between Levels Data moves between adjacent levels of the hierarchy At the secondary or tertiary levels, accessing the desired data or finding the desired place to store the data takes a lot of time The disk is organized into blocks Entire blocks are moved to or from a main-memory region called a buffer

12 13.1.2 Transfer of Data Between Levels (cont’d) A key technique for speeding up database operations is to arrange the data so that when one item on a block is needed, it is likely that other data on the same block will be needed at the same time The same idea applies to the other hierarchy levels

13 13.1.3 Volatile and Nonvolatile Storage A volatile device forgets what data is stored on it when the power goes off A nonvolatile device holds its data even when the device is turned off All secondary and tertiary storage devices are nonvolatile, while main memory is volatile

14 13.1.4 Virtual Memory Typical software executes in virtual memory An address space is typically 32-bit, i.e., 2^32 bytes or 4 GB Transfer between memory and disk is in terms of blocks

15 13.2.1 Mechanics of Disks – Use of secondary storage is one of the important characteristics of a DBMS – A disk drive consists of 2 moving pieces: 1. the disk assembly and 2. the head assembly – The disk assembly consists of 1 or more platters – Platters rotate around a central spindle – Bits are stored on the upper and lower surfaces of the platters

16 13.2.1 Mechanics of Disks (con’t) The disk is organized into tracks The tracks that are at a fixed radius from the center, across all surfaces, form one cylinder Tracks are organized into sectors Sectors are segments of the circle, separated by gaps

17

18 13.2.2 The Disk Controller One or more disks are controlled by disk controllers Disk controllers are capable of – Controlling the mechanical actuator that moves the head assembly – Selecting the sector from among all those in the cylinder at which the heads are positioned – Transferring bits between the desired sector and main memory – Possibly buffering an entire track

19 13.2.3 Disk Access Characteristics Accessing (reading/writing) a block requires 3 steps – The disk controller positions the head assembly at the cylinder containing the track on which the block is located. This is the seek time – The disk controller waits while the first sector of the block moves under the head. This is the rotational latency – All the sectors and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This is the transfer time
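A minimal sketch of this three-part cost model (my own illustration, not from the slides; the drive parameters are assumptions, loosely echoing the Megatron numbers quoted on later slides):

def block_access_ms(seek_ms, rotation_ms, sectors_per_track, sectors_per_block):
    # Rotational latency: on average, half a rotation passes before
    # the first sector of the block reaches the head.
    rotational_latency = rotation_ms / 2
    # Transfer time: the fraction of one rotation occupied by the block's sectors.
    transfer = rotation_ms * sectors_per_block / sectors_per_track
    return seek_ms + rotational_latency + transfer

# Hypothetical parameters: 6.46 ms average seek, 8.33 ms per rotation.
print(block_access_ms(6.46, 8.33, sectors_per_track=256, sectors_per_block=4))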

20 13.3 Accelerating Access to Secondary Storage  Several approaches for more-efficiently accessing data in secondary storage:  Place blocks that are together in the same cylinder.  Divide the data among multiple disks.  Mirror disks.  Use disk-scheduling algorithms.  Prefetch blocks into main memory.  Scheduling Latency – added delay in accessing data caused by a disk scheduling algorithm.  Throughput – the number of disk accesses per second that the system can accommodate.

21 13.3.1 The I/O Model of Computation  The number of block accesses (disk I/O’s) is a good approximation of an algorithm's running time.  This should be minimized.  Ex 13.3: You want an index on R that identifies the block on which the desired tuple appears, but not where on the block it resides.  For the Megatron 747 (M747) example, it takes 11 ms to read a 16K block.  A standard microprocessor can execute millions of instructions in 11 ms, making any delay in searching the block for the desired tuple negligible.

22 13.3.2 Organizing Data by Cylinders  If we read all blocks on a single track or cylinder consecutively, then we can neglect all but the first seek time and first rotational latency.  Ex 13.4: We request 1024 blocks of the M747.  If the data is randomly distributed, the average latency is 10.76 ms by Ex 13.2, making the total latency about 11 s.  If all blocks are stored consecutively on 1 cylinder: 6.46 ms (1 average seek) + 8.33 ms (time per rotation) × 16 (# rotations) ≈ 139 ms
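A quick check of this arithmetic, using only the numbers quoted above (illustrative sketch):

# Random placement: every block pays the full average latency.
print(1024 * 10.76)          # ≈ 11018 ms, i.e. about 11 s

# One cylinder: one seek, then 16 full rotations to sweep the cylinder.
print(6.46 + 8.33 * 16)      # ≈ 139.7 ms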

23 13.3.3 Using Multiple Disks  If we have n disks, read/write performance will increase by a factor of n.  Striping – distributing a relation R across multiple disks following this pattern:  Data on disk 1: R1, R1+n, R1+2n, …  Data on disk 2: R2, R2+n, R2+2n, … …  Data on disk n: Rn, Rn+n, Rn+2n, …  Ex 13.5: We request 1024 blocks with n = 4: 6.46 ms (1 average seek) + 8.33 ms (time per rotation) × (16/4) (# rotations) ≈ 39.8 ms

24 13.3.4 Mirroring Disks  Mirroring disks – having 2 or more disks hold identical copies of the data.  Benefit 1: If n disks are mirrors of each other, the system can survive the crash of up to n-1 disks.  Benefit 2: If we have n disks, read performance increases by a factor of n.  Performance increases further by having the controller select, for each read, the disk whose head is closest to the desired data block.

25 13.3.5 Disk Scheduling and the Elevator Algorithm  The disk controller runs this algorithm to select which of several pending requests to process first.  Pseudocode:

requests[]   // array of all unprocessed data requests
upon receiving a new data request:
    requests[].add(new request)
while (requests[] is not empty):
    move head to next location
    if (head location is at data in requests[]):
        retrieve data
        remove data from requests[]
    if (head reaches end):
        reverse head direction
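A runnable rendering of the sweep (my own illustration; it assumes all requests are known up front, whereas the example on the next slide has requests arriving over time):

def elevator(start, cylinders):
    # Serve pending requests in cylinder order, sweeping one way, then reversing.
    pending, pos, direction, order = sorted(set(cylinders)), start, +1, []
    while pending:
        ahead = [c for c in pending if (c - pos) * direction >= 0]
        if not ahead:
            direction = -direction        # end of sweep: reverse the head
            continue
        pos = min(ahead, key=lambda c: abs(c - pos))
        order.append(pos)
        pending.remove(pos)
    return order

print(elevator(0, [8000, 24000, 56000, 64000, 40000, 16000]))
# -> [8000, 16000, 24000, 40000, 56000, 64000]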

26 13.3.5 Disk Scheduling and the Elevator Algorithm (con’t) Example: cylinders hold data at 8000, 16000, 24000, 32000, 40000, 48000, 56000, 64000. Events as the head sweeps:

time 0: head at starting point; requests pending at 8000, 24000, 56000
time 4.3: get data at 8000
time 10: request arrives for data at 16000
time 13.6: get data at 24000
time 20: request arrives for data at 64000
time 26.9: get data at 56000
time 30: request arrives for data at 40000
time 34.2: get data at 64000 (head reverses)
time 45.5: get data at 40000
time 56.8: get data at 16000

Resulting schedule (data .. completion time): 8000..4.3, 24000..13.6, 56000..26.9, 64000..34.2, 40000..45.5, 16000..56.8

27 13.3.5 Disk Scheduling and the Elevator Algorithm (con’t) Comparison of completion times (data .. time):

Elevator algorithm: 8000..4.3, 24000..13.6, 56000..26.9, 64000..34.2, 40000..45.5, 16000..56.8
FIFO algorithm: 8000..4.3, 24000..13.6, 56000..26.9, 16000..42.2, 64000..59.5, 40000..70.8

28 13.3.6 Prefetching and Large-Scale Buffering  If at the application level, we can predict the order blocks will be requested, we can load them into main memory before they are needed.

29 Data Representation Recovery from Disk Crashes – 13.4 Presented By: Deepti Bhardwaj Roll No. 223_103 SJSU ID: 006521307

30 Contents 13.4.5 Recovery from Disk Crashes 13.4.6 Mirroring as a Redundancy Technique 13.4.7 Parity Blocks 13.4.8 An Improvement: RAID 5 13.4.9 Coping With Multiple Disk Crashes

31 Recovery from Disk Crashes: Ways to recover data  The most serious mode of failure for disks is a “head crash”, where data is permanently destroyed.  To reduce the risk of data loss by disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.

32 Recovery from Disk Crashes: Ways to recover data (continued)  Each of the schemes starts with one or more disks that hold the data and adds one or more disks that hold information completely determined by the contents of the data disks; these added disks are called redundant disks.

33 Mirroring as a Redundancy Technique  The mirroring scheme is referred to as RAID level 1 protection against data loss.  In this scheme we mirror each disk.  One of the disks is called the data disk and the other the redundant disk.  In this case, the only way data can be lost is if there is a second disk crash while the first crash is being repaired.

34 Parity Blocks  The RAID level 4 scheme uses only one redundant disk, no matter how many data disks there are.  In the redundant disk, the ith block consists of parity checks for the ith blocks of all the data disks.  That is, the jth bits of the ith blocks of all the disks, data and redundant together, must have an even number of 1’s; the redundant disk’s bit is chosen to make this condition true.

35 Parity Blocks – Reading disks Reading a block of a data disk is the same as reading a block from any disk. Alternatively, we could read the corresponding block from each of the other disks and compute the block we want as their modulo-2 sum. disk 2: 10101010 disk 3: 00111000 disk 4: 01100010 If we take the modulo-2 sum of the bits in each column, we get disk 1: 11110000

36 Parity Blocks – Writing When we write a new block of a data disk, we need to change that block of the redundant disk as well. One approach is to read all the data disks, compute the modulo-2 sum, and write it to the redundant disk. But this approach requires n-1 reads of the other data blocks, a write of the data block, and a write of the redundant disk's block. Total = n+1 disk I/Os

37 Parity Blocks – Writing (continued) A better approach requires only four disk I/Os: 1. Read the old value of the data block being changed. 2. Read the corresponding block of the redundant disk. 3. Write the new data block. 4. Recalculate and write the block of the redundant disk.
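The recalculation in step 4 is a pure XOR (modulo-2) update; a tiny sketch of the rule (my illustration, with Python ints standing in for blocks):

# Flipping the bits that changed in the data block flips exactly
# the same bits in the parity block.
old_data, new_data = 0b10101010, 0b11000011
old_parity = 0b11110000
new_parity = old_parity ^ old_data ^ new_data
print(format(new_parity, '08b'))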

38 Parity Blocks – Failure Recovery If any one data disk crashes, we can compute the modulo-2 sum of the surviving disks to recover it. Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like: disk 1: 11110000 disk 2: ???????? disk 3: 00111000 disk 4: 01100010 If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is: 10101010
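The same example as code (illustrative; XOR of all surviving blocks reconstructs the lost one):

survivors = [0b11110000, 0b00111000, 0b01100010]   # disks 1, 3, and 4
missing = 0
for block in survivors:
    missing ^= block
print(format(missing, '08b'))   # -> 10101010, the lost block of disk 2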

39 An Improvement: RAID 5 RAID 4 is effective in preserving data unless there are two simultaneous disk crashes. Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk. However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.

40 An Improvement: RAID 5 (continued) For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1. For example, n = 3, so there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4. The disk numbered 1 is redundant for blocks numbered 1, 5, 9, and so on; disk 2 is redundant for blocks 2, 6, 10, …; and disk 3 is redundant for 3, 7, 11, ….
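A one-line check of that mapping rule (my illustration):

# Disk j holds the redundant copy of cylinder i exactly when j = i mod (n+1).
def redundant_disk(cylinder, n=3):
    return cylinder % (n + 1)

print([redundant_disk(i) for i in range(9)])   # -> [0, 1, 2, 3, 0, 1, 2, 3, 0]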

41 Coping With Multiple Disk Crashes Error-correcting-code theory, in particular the Hamming code, leads to RAID level 6, under which two simultaneous crashes are correctable. For example, with 4 data disks and 3 redundant disks:  The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.  The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.  The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4

42 Coping With Multiple Disk Crashes – Reading/Writing We may read data from any data disk normally. To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks that have a 1 in the row (of the Hamming-code matrix) in which the written disk also has a 1.

43 UPDATES Stable storage: used to recover from the disk failure known as media decay, in which, if we overwrite a file, the new data is not read back correctly. RAID 5 need: a shortcoming of RAID level 4 is that it suffers from a bottleneck (every update to a data disk must also read and write the one redundant disk).

44 13.6 Representing Block and Record Addresses Ramya Karri CS257 Section 2 ID: 206

45 Introduction Address of a block and record – In main memory: the address of the block is the virtual-memory address of its first byte; the address of a record within the block is the virtual-memory address of the first byte of the record – In secondary memory: a sequence of bytes describes the location of the block in the overall system: the device ID for the disk, the cylinder number, etc.

46 Addresses in Client-Server Systems The addresses in an address space are represented in two ways – Physical addresses: byte strings that determine the place within the secondary storage system where the record can be found – Logical addresses: arbitrary strings of bytes of some fixed length Physical address bits are used to indicate: – The host to which the storage is attached – The identifier for the disk – The number of the cylinder – The number of the track – The offset of the beginning of the record

47 Addresses in Client-Server Systems (contd.) The map table relates logical addresses to physical addresses; each entry pairs one logical address with its physical address.

48 Logical and Structured Addresses Purpose of a logical address? It gives more flexibility when we – Move the record around within the block – Move the record to another block It also gives us an option of deciding what to do when a record is deleted [Block figure: header and offset table at one end, unused space in the middle, records 4, 3, 2, 1 at the other end]

49 Pointer Swizzling Having pointers is common in object-relational database systems It is important to learn about the management of pointers Every data item (block, record, etc.) has two addresses: – database address: the address on disk – memory address, if the item is in virtual memory

50 Pointer Swizzling (Contd…) Translation table: maps database addresses to memory addresses; each entry pairs a database address (Dbaddr) with a memory address (Mem-addr) All addressable items in the database have entries in the map table, while only those items currently in memory are mentioned in the translation table

51 Pointer Swizzling (Contd…) A pointer consists of the following two fields – A bit indicating the type of address – The database or memory address – Example 13.17 [Figure: Block 1 copied from disk to memory; one of its pointers is swizzled to a memory address, the other remains unswizzled, pointing to Block 2 on disk]

52 Example 13.7 Block 1 has a record with pointers to a second record on the same block and to a record on another block If Block 1 is copied to memory – The first pointer, which points within Block 1, can be swizzled so it points directly to the memory address of the target record – Since Block 2 is not in memory, we cannot swizzle the second pointer

53 Pointer Swizzling (Contd…) Three types of swizzling – Automatic swizzling As soon as a block is brought into memory, swizzle all relevant pointers. – Swizzling on demand Only swizzle a pointer if and when it is actually followed. – No swizzling Pointers are not swizzled; items are accessed using the database address.
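A minimal sketch of swizzling on demand (entirely illustrative; the Pointer class, the dict-based translation table, and the read_block callback are assumptions, not from the text):

translation = {}   # database (disk) address -> in-memory object

class Pointer:
    def __init__(self, db_addr):
        self.db_addr = db_addr
        self.mem_ref = None              # set when the pointer is swizzled

    def follow(self, read_block):
        if self.mem_ref is None:         # swizzle on first use
            if self.db_addr not in translation:
                translation[self.db_addr] = read_block(self.db_addr)
            self.mem_ref = translation[self.db_addr]
        return self.mem_ref

p = Pointer(db_addr=42)
print(p.follow(lambda addr: {"block": addr}))   # fake read_block for the demo

Unswizzling on eviction would walk the translation table in the other direction, restoring database addresses.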

54 Programmer Control of Swizzling Unswizzling – When a block is moved from memory back to disk, all pointers must go back to database (disk) addresses – Use translation table again – Important to have an efficient data structure for the translation table

55 Pinned Records and Blocks A block in memory is said to be pinned if it cannot be written back to disk safely. If block B1 has a swizzled pointer to an item in block B2, then B2 is pinned – To unpin a block, we must unswizzle any pointers to it – Keep in the translation table the places in memory holding swizzled pointers to that item – Unswizzle those pointers (use the translation table to replace the memory addresses with database (disk) addresses)

56 CS 255: Database System Principles slides: Variable length data and record By:- Arunesh Joshi( 107) Id:-006538558 Cs257_107_ch13_13.7

57

58 Agenda Records With Variable-Length Fields Records With Repeating Fields Variable-Format Records Records That Do Not Fit in a Block BLOBs Column Stores

59 Records With Variable-Length Fields A simple but effective scheme is to put all fixed-length fields ahead of the variable-length fields. We then place in the record header: 1. The length of the record. 2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
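A sketch of this layout with Python's struct module (illustrative only; the particular fields, a one-byte gender and ten-byte birthdate plus variable-length name and address, are invented for the example):

import struct

def encode_record(gender, birthdate, name, address):
    fixed = struct.pack("1s10s", gender.encode(), birthdate.encode())
    name_b, addr_b = name.encode(), address.encode()
    # Header: total record length plus the offset of the SECOND
    # variable-length field; the first needs no pointer because it
    # starts right after the fixed-length part.
    header_size = struct.calcsize("HH")
    addr_offset = header_size + len(fixed) + len(name_b)
    header = struct.pack("HH", addr_offset + len(addr_b), addr_offset)
    return header + fixed + name_b + addr_b

rec = encode_record("F", "1960-01-01", "Hepburn", "Hollywood CA")
print(len(rec), struct.unpack_from("HH", rec))   # -> 34 (34, 22)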

60

61 Records With Repeating Fields A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. We can locate all the occurrences of the field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for the field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually, we reach the offset of the field following F, whereupon we stop.

62

63 An alternative representation is to keep the record of fixed length, and put the variable length portion - be it fields of variable length or fields that repeat an indefinite number of times - on a separate block. In the record itself we keep: 1. Pointers to the place where each repeating field begins, and 2. Either how many repetitions there are, or where the repetitions end.

64 Storing variable-length fields separately from the record

65 Variable-Format Records The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of: 1. Information about the role of this field, such as: (a) The attribute or field name, (b) The type of the field, if it is not apparent from the field name and some readily available schema information, and (c) The length of the field, if it is not apparent from the type. 2. The value of the field.

66 There are at least two reasons why tagged fields make sense. 1. Information-integration applications – Sometimes, a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie-star information may have come from several sources, one of which records birthdates, some give addresses, others not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know. 2. Records with a very flexible schema – If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.

67 A record with tagged fields

68 Records That Do Not Fit in a Block These large values have a variable length, but even if the length is fixed for all values of the type, we need to use some special techniques to represent these values. In this section we shall consider a technique called “spanned records" that can be used to manage records that are larger than blocks. Spanned records also are useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.

69 If records can be spanned, then every record and record fragment requires some extra header information: 1. Each record or fragment header must contain a bit telling whether or not it is a fragment. 2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record. 3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments. Storing spanned records across blocks

70 BLOBs BLOB = Binary Large OBject. BLOBs can be images, movies, audio files, and other very large values. Storing BLOBs – Stored in several blocks. – Preferable to store them consecutively on a cylinder or across multiple disks for efficient retrieval. Retrieving BLOBs – A client retrieving a 2-hour movie may not want it all at the same time. – Retrieving a specific part of the large value requires an index structure to make it efficient. (Example: an index by seconds on a movie BLOB.)

71 Column Stores An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.

72 Consider this relation

73 RECORD MODIFICATION AKSHAY SHENOY CLASS ID: 108 Topic 13.8 Professor: T.Y. Lin

74 INTRODUCTION What is a record? A record is a single, implicitly structured data item in a database table. A record is also called a tuple. What is record modification? We say records are modified when a data-manipulation operation is performed on them.

75 STRUCTURE OF A RECORD RECORD STRUCTURE FOR A PERSON TABLE CREATE TABLE PERSON ( NAME CHAR(30), ADDRESS CHAR(256), GENDER CHAR(1), BIRTHDATE CHAR(10));

76 TYPES OF RECORDS FIXED LENGTH RECORDS CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL, PHONE_NO INT(10) NOT NULL); VARIABLE LENGTH RECORDS CREATE TABLE SJSUSTUDENT(STUDENT_ID INT(9) NOT NULL, NAME CHAR(100),ADDRESS CHAR(100),PHONE_NO INT(10) NOT NULL);

77 RECORD MODIFICATION Modification of Record  Insert  Update  Delete Issues even with Fixed Length Records More Issues with Variable Length Records

78 STRUCTURE OF A BLOCK & RECORDS Several records are grouped and stored together in memory in blocks STRUCTURE OF BLOCK

79 BLOCKS & RECORDS If records need not be in any particular order, then just find a block with enough empty space We keep track of all records/tuples in a relation/table using index structures and file-organization concepts

80 Inserting New Records If records are not required to be in a particular order, just find an empty block and place the record in the block, e.g., heap files What if the records are to be kept in a particular order (e.g., sorted by primary key)? Locate the appropriate block, check whether space is available in the block, and if so, place the record in the block.

81 INSERTING NEW RECORDS We may have to slide the records in the block to place the record at the appropriate place in the block, and suitably edit the block header.

82 What If The Block Is Full? We need to keep the record in a particular block, but the block is full. How do we deal with it? We find room outside the block. There are 2 approaches to finding room for the record: I. Find space on a nearby block II. Create an overflow block

83 APPROACHES TO FINDING ROOM FOR RECORD FIND SPACE ON NEARBY BLOCK  BLOCK B1 HAS NO SPACE  IF SPACE AVAILABLE ON BLOCK B2 MOVE RECORDS OF B1 TO B2.  IF THERE ARE EXTERNAL POINTERS TO RECORDS OF B1 MOVED TO B2 LEAVE FORWARDING ADDRESS IN OFFSET TABLE OF B1

84 APPROACHES TO FINDING ROOM FOR RECORD CREATE OVERFLOW BLOCK  EACH BLOCK B HAS IN ITS HEADER A POINTER TO AN OVERFLOW BLOCK WHERE ADDITIONAL RECORDS OF B CAN BE PLACED.

85 DELETION Try to reclaim the space made available by the deletion of a record If an offset table is used for storing information about the records of the block, then rearrange/slide the remaining records If sliding of records is not possible, then maintain a SPACE-AVAILABLE LIST to keep track of the space available in the block

86 TOMBSTONE What about pointers to deleted records? A tombstone is placed in place of each deleted record A tombstone is a bit placed at the first byte of a deleted record to indicate that the record was deleted (0 – not deleted, 1 – deleted) A tombstone is permanent; it must remain as long as there may be pointers to the deleted record

87 UPDATING RECORDS For fixed-length records, there is no effect on the storage system For variable-length records: If the length increases, "slide the records", as for an insertion If the length decreases, update the space-available list and recover the space/eliminate the overflow blocks, as for a deletion

88 Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id 213

89 Agenda Query Processor and major parts of Query processor Physical-Query-Plan Operators Scanning Tables Basic approaches to locate the tuples of a relation R Sorting While Scanning Tables Computation Model for Physical Operator I/O Cost for Scan Operators Iterators

90 What is a Query Processor A group of components of a DBMS that converts user queries and data-modification commands into a sequence of database operations It also executes those operations It must supply detail regarding how the query is to be executed

91 Major parts of Query processor Query Execution:  The algorithms that manipulate the data of the database.  Focus on the operations of extended relational algebra.

92 Outline of Query Compilation Query compilation Parsing: a parse tree for the query is constructed Query rewrite: the parse tree is converted to an initial query plan and transformed into a logical query plan (comparatively little time)  Physical plan generation: the logical query plan is converted into a physical query plan by selecting algorithms and an order of execution for these operators.

93 Physical-Query-Plan Operators Physical operators are implementations of the operators of relational algebra. They can also be implementations of non-relational-algebra operators like "scan", which scans tables, that is, brings each tuple of some relation into main memory.

94 Scanning Tables One of the basic things we can do in a physical query plan is to read the entire contents of a relation R. A variation of this operator involves a simple predicate: read only those tuples of the relation R that satisfy the predicate.

95 Scanning Tables Basic approaches to locate the tuples of a relation R  Table-scan The relation R is stored in secondary memory with its tuples arranged in blocks It is possible to get the blocks one by one  Index-scan If there is an index on any attribute of relation R, we can use this index to get all the tuples of R

96 Sorting While Scanning Tables There are a number of reasons to sort a relation  The query could include an ORDER BY clause, requiring that a relation be sorted.  Algorithms to implement relational algebra operations may require one or both arguments to be sorted relations.  The physical-query-plan operator sort-scan takes a relation R and the attributes on which the sort is to be made, and produces R in that sorted order

97 Computation Model for Physical Operators Physical-plan operators should be selected wisely; this is essential for a good query processor. The "cost" of each operator is estimated by the number of disk I/O’s for the operation. The total cost of an operation depends on the size of the answer and includes the final write-back cost in the total cost of the query.

98 Parameters for Measuring Costs Parameters that affect the performance of a query  Buffer-space availability in main memory at the time of execution of the query  The size of the input and the size of the output generated  The size of a block on disk and the size in main memory also affect the performance

99 Parameters for Measuring Costs B: the number of blocks needed to hold all tuples of relation R  Also denoted B(R) T: the number of tuples in relation R  Also denoted T(R) V: the number of distinct values that appear in a column of relation R  V(R, a) is the number of distinct values of column a in relation R

100 I/O Cost for Scan Operators If relation R is clustered, then the number of disk I/O’s for the table-scan operator is approximately B disk I/O’s If relation R is not clustered, then the number of required disk I/O’s is generally much higher An index on relation R occupies many fewer than B(R) blocks That means a scan of the entire relation R, which takes at least B disk I/O’s, will require more I/O’s than examining the entire index

101 Iterators for Implementation of Physical Operators Many physical operators can be implemented as an Iterator. Three methods forming the iterator for an operation are: 1. Open( ):  This method starts the process of getting tuples  It initializes any data structures needed to perform the operation

102 Iterators for Implementation of Physical Operators 2. GetNext( ):  Returns the next tuple in the result  If there are no more tuples to return, GetNext returns a special value NotFound 3. Close( ):  Ends the iteration after all tuples  It calls Close on any arguments of the operator

103 One Pass Algorithm Presented By: Pradhyuman raol ID : 114 Instructor: Dr T.Y. LIN

104 Agenda  One Pass algorithm  Tuple-at-a-Time  Unary Operators  Binary Operations

105 One-Pass Algorithms  One Pass Algorithm: Some methods involve reading the data only once from disk. They work only when at least one of the arguments of the operation fits in main memory.

106 Tuple-at-a-Time  We read the blocks of R one at a time into an input buffer, perform the operation on each tuple, and move the selected tuples or the projected tuples to the output buffer.  Examples: selection & projection

107 Tuple-at-a-Time Diagram [Figure: R flows one block at a time through an input buffer, a unary operator, and an output buffer]

108 Unary Operators The unary operations that apply to relations as a whole, rather than to one tuple at a time. Duplicate elimination δ(R): check whether each arriving tuple is already there or not. M = main-memory buffers available; B(δ(R)) = size of the relation δ(R). Assumption: B(δ(R)) <= M

109 Unary Operators Grouping: a grouping operation gives us zero or more grouping attributes and presumably one or more accumulated values for each aggregation: MIN or MAX, COUNT, SUM, AVERAGE

110 Binary Operations  Set Union  Set Intersection  Set Difference  Bag Intersection  Bag Difference  Product  Natural join

111 Nested Loops Joins Book Section of chapter 15.3 Submitted to : Prof. Dr. T.Y. LIN Submitted by: Saurabh Vishal

112 Topic to be covered Tuple-Based Nested-Loop Join An Iterator for Tuple-Based Nested-Loop Join A Block-Based Nested-Loop Join Algorithm Analysis of Nested-Loop Join

113 15.3.1 Tuple-Based Nested-Loop Join The simplest variation of nested-loop join has loops that range over individual tuples of the relations involved. In this algorithm, which we call tuple-based nested-loop join, we compute the join R ⋈ S as follows:

114 Continued

FOR each tuple s in S DO
    FOR each tuple r in R DO
        IF r and s join to make a tuple t THEN
            output t;

– If we are careless about how we buffer the blocks of relations R and S, then this algorithm could require as many as T(R)T(S) disk I/O’s. However, there are many situations where this algorithm can be modified to have much lower cost.

115 Continued One case is when we can use an index on the join attribute or attributes of R to find the tuples of R that match a given tuple of S, without having to read the entire relation R. The second improvement looks much more carefully at the way tuples of R and S are divided among blocks, and uses as much of the memory as it can to reduce the number of disk I/O's as we go through the inner loop. We shall consider this block-based version of nested-loop join.

116 15.3.2 An Iterator for Tuple-Based Nested-Loop Join

Open() {
    R.Open();
    S.Open();
    s := S.GetNext();
}

GetNext() {
    REPEAT {
        r := R.GetNext();
        IF (r = NotFound) { /* R is exhausted for the current s */
            R.Close();
            s := S.GetNext();
            IF (s = NotFound) RETURN NotFound; /* both R and S are exhausted */
            R.Open();
            r := R.GetNext();
        }
    } UNTIL (r and s join);
    RETURN the join of r and s;
}

Close() {
    R.Close();
    S.Close();
}

118 15.3.3 A Block-Based Nested-Loop Join Algorithm We can improve the nested-loop join of R ⋈ S by: 1. Organizing access to both argument relations by blocks. 2. Using as much main memory as we can to store tuples belonging to the relation S, the relation of the outer loop.

119 The nested-loop join algorithm

FOR each chunk of M-1 blocks of S DO BEGIN
    read these blocks into main-memory buffers;
    organize their tuples into a search structure whose
        search key is the common attributes of R and S;
    FOR each block b of R DO BEGIN
        read b into main memory;
        FOR each tuple t of b DO BEGIN
            find the tuples of S in main memory that join with t;
            output the join of t with each of these tuples;
        END;
    END;
END;
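A compact Python rendering of the same idea (illustrative; relations are modeled as lists of blocks, each block a list of tuples joined on the shared component):

from collections import defaultdict

def block_nested_loop_join(S_blocks, R_blocks, M):
    out = []
    chunk = M - 1                                  # buffers reserved for S
    for i in range(0, len(S_blocks), chunk):
        # Build the in-memory search structure on the join key.
        table = defaultdict(list)
        for block in S_blocks[i:i + chunk]:
            for y, z in block:
                table[y].append(z)
        for block in R_blocks:                     # inner loop: one block of R at a time
            for x, y in block:
                for z in table[y]:
                    out.append((x, y, z))
    return out

S = [[(1, 'a'), (2, 'b')], [(3, 'c')]]
R = [[('p', 1), ('q', 3)]]
print(block_nested_loop_join(S, R, M=2))   # -> [('p', 1, 'a'), ('q', 3, 'c')]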

120 15.3.4 Analysis of Nested-Loop Join Assuming S is the smaller relation, the number of chunks, i.e., iterations of the outer loop, is B(S)/(M - 1). At each iteration, we read M - 1 blocks of S and B(R) blocks of R. The number of disk I/O's is thus (B(S)/(M-1)) × (M-1+B(R)), or B(S) + B(S)B(R)/(M-1)

121 Continued Assuming all of M, B(S), and B(R) are large, but M is the smallest of these, an approximation to the above formula is B(S)B(R)/M. That is, cost is proportional to the product of the sizes of the two relations, divided by the amount of available main memory.

122 Example B(R) = 1000, B(S) = 500, M = 101 – Important aside: 101 buffer blocks is not as unrealistic as it sounds. There may be many queries at the same time, competing for main-memory buffers. The outer loop iterates 5 times At each iteration we read M-1 (i.e., 100) blocks of S and all of R (i.e., 1000 blocks). Total time: 5*(100 + 1000) = 5500 I/O’s Question: What if we reversed the roles of R and S? We would iterate 10 times, and in each we would read 100+500 blocks, for a total of 6000 I/O’s. Compare with a one-pass join, if it could be done! We would need 1500 disk I/O’s if B(S) ≤ M-1

123 Continued……. 1. The cost of the nested-loop join is not much greater than the cost of a one-pass join, which is 1500 disk I/O's for this example. In fact, if B(S) ≤ M - 1, the nested-loop join becomes identical to the one-pass join algorithm of Section 15.2.3 2. Nested-loop join is generally not the most efficient join algorithm.

124 Two-Pass Algorithms Based on Sorting SECTION 15.4 Rupinder Singh

125 Two-Pass Algorithms Based on Sorting Two-pass Algorithms: where data from the operand relations is read into main memory, processed in some way, written out to disk again, and then reread from disk to complete the operation

126 Basic idea Step 1: Read M blocks of R into main memory. Step 2: Sort these M blocks in main memory, using an efficient main-memory sorting algorithm, so we expect that the time to sort will not exceed the disk I/O time for step (1). Step 3: Write the sorted list into M blocks of disk.

127 Duplicate Elimination Using Sorting δ(R) First we sort the tuples of R into sorted sublists Then we use the available main memory to hold one block from each sorted sublist Then we repeatedly copy one tuple to the output and ignore all tuples identical to it The total cost of this algorithm is 3B(R) This algorithm requires only √B(R) blocks of main memory, rather than the B(R) blocks of the one-pass algorithm.

128 Example Suppose that tuples are integers, and only two tuples fit on a block. Also, M = 3 and the relation R consists of 17 tuples: 2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3 After the first pass, the sorted sublists are:

R1: 1, 2, 2, 2, 2, 5
R2: 2, 3, 4, 4, 4, 5
R3: 1, 1, 2, 3, 5

129 Example Second pass, initial state (in memory / waiting on disk):

R1: 1, 2 / 2, 2, 2, 5
R2: 2, 3 / 4, 4, 4, 5
R3: 1, 1 / 2, 3, 5

After processing tuple 1 (output: 1):

R1: 2 / 2, 2, 2, 5
R2: 2, 3 / 4, 4, 4, 5
R3: 2, 3 / 5

Continue the same process with the next tuple.
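A sketch of this two-pass δ(R), with heapq.merge standing in for the per-sublist buffers (illustrative; a real system manages disk blocks explicitly):

import heapq

def two_pass_distinct(tuples, run_size):
    # Pass 1: cut R into memory-sized runs and sort each
    # (these would be written back to disk as sorted sublists).
    runs = [sorted(tuples[i:i + run_size])
            for i in range(0, len(tuples), run_size)]
    # Pass 2: merge the runs, emitting each value once.
    out, last = [], object()
    for t in heapq.merge(*runs):
        if t != last:
            out.append(t)
            last = t
    return out

R = [2,5,2,1,2,2,4,5,4,3,4,2,1,5,2,1,3]
print(two_pass_distinct(R, 6))   # -> [1, 2, 3, 4, 5]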

130 Grouping and Aggregation Using Sorting γ(R) The two-pass algorithm for grouping and aggregation is quite similar to the previous algorithm. Step 1: Read the tuples of R into memory, M blocks at a time. Sort each group of M blocks, using the grouping attributes of L as the sort key. Write each sorted sublist to disk. Step 2: Use one main-memory buffer for each sublist, and initially load the first block of each sublist into its buffer. Step 3: Repeatedly find the least value of the sort key (grouping attributes) present among the first available tuples in the buffers. This algorithm takes 3B(R) disk I/O's, and will work as long as B(R) < M².

131 A Sort-Based Union Algorithm For bag-union, the one-pass algorithm is used. For set-union: – Step 1: Repeatedly bring M blocks of R into main memory, sort their tuples, and write the resulting sorted sublist back to disk. – Step 2: Do the same for S, to create sorted sublists for relation S. – Step 3: Use one main-memory buffer for each sublist of R and S. Initialize each with the first block from the corresponding sublist. – Step 4: Repeatedly find the first remaining tuple t among all the buffers. Copy t to the output, and remove from the buffers all copies of t (if R and S are sets, there should be at most two copies). This algorithm takes 3(B(R)+B(S)) disk I/O's, and will work as long as B(R)+B(S) < M².

132 Sort-Based Intersection and Difference For both the set version and the bag version, the algorithm is the same as that of set-union, except for the way we handle the copies of a tuple t at the fronts of the sorted sublists. For set intersection, output t if it appears in both R and S. For bag intersection, output t the minimum of the number of times it appears in R and in S. For set difference, R−S, output t if and only if it appears in R but not in S. For bag difference, R−S, output t the number of times it appears in R minus the number of times it appears in S.

133 A Simple Sort-Based Join Algorithm When taking a join, the number of tuples from the two relations that share a common value of the join attribute(s), and therefore need to be in main memory simultaneously, can exceed what fits in memory. To avoid facing this situation, we can try to reduce main-memory use for other aspects of the algorithm, and thus make available a large number of buffers to hold the tuples with a given join-attribute value.

134 A Simple Sort-Based Join Algorithm Given relations R(X, Y) and S(Y, Z) to join, and given M blocks of main memory for buffers: Step 1: Sort R and S, using a two-phase, multiway merge sort, with Y as the sort key. Step 2: Merge the sorted R and S. The following steps are done repeatedly: – Find the least value y of the join attributes Y that is currently at the front of the blocks for R and S. – If y does not appear at the front of the other relation, then remove the tuple(s) with sort key y. – Otherwise, identify all the tuples from both relations having sort key y. – Output all the tuples that can be formed by joining tuples from R and S with the common Y-value y. – If either relation has no more unconsidered tuples in main memory, reload the buffer for that relation.

135 A Simple Sort-Based Join Algorithm The simple sort-join uses 5(B(R) + B(S)) disk I/O's. It requires B(R) ≤ M² and B(S) ≤ M² to work.

136 A More Efficient Sort-Based Join If we do not have to worry about very large numbers of tuples with a common value for the join attribute(s), then we can save two disk I/O's per block by combining the second phase of the sorts with the join itself. To compute R(X, Y) ⋈ S(Y, Z) using M main-memory buffers: – Create sorted sublists of size M, using Y as the sort key, for both R and S. – Bring the first block of each sublist into a buffer. – Repeatedly find the least Y-value y among the first available tuples of all the sublists. Identify all the tuples of both relations that have Y-value y. Output the join of all tuples from R with all tuples from S that share this common Y-value.

137 A More Efficient Sort-Based Join The number of disk I/O’s is 3(B(R) + B(S)) It requires B(R) + B(S) ≤ M² to work

138 Summary of Sort-Based Algorithms

Operator               Approximate M required      Disk I/O
γ, δ                   √B                          3B
∪, ∩, −                √(B(R) + B(S))              3(B(R) + B(S))
⋈ (simple)             √(max(B(R), B(S)))          5(B(R) + B(S))
⋈ (more efficient)     √(B(R) + B(S))              3(B(R) + B(S))

139 Concurrency Control by Timestamps Prepared By: Ronak Shah Professor :Dr. T. Y Lin ID: 116

140 Concurrency Concurrency is a property of systems in which several computations are executing and overlapping in time, and interacting with each other. Timestamp A timestamp is a sequence of characters denoting the date or time at which a certain event occurred. Examples of timestamps: 20-MAR-09 04.55.14.000000 PM 06/20/2003 16:55:14:000000

141 Timestamping We assign a timestamp to each transaction. A timestamp is usually presented in a consistent format, allowing easy comparison of two different records and tracking of progress over time; the practice of recording timestamps in a consistent manner along with the actual data is called timestamping.

142 Timestamps To use timestamping as a concurrency-control method, the scheduler needs to assign to each transaction T a unique number, its timestamp TS(T). There are two approaches to generating timestamps: 1. Using the system clock 2. Having the scheduler maintain a counter. Each time a transaction starts, the counter is incremented by 1 and the new value becomes the timestamp of the transaction.

143 Whichever method we use to generate timestamps, the scheduler must maintain a table of currently active transactions and their timestamps.  To use timestamps as a concurrency-control method, we need to associate with each database element X two timestamps and an additional bit: RT(X) – the read time of X. WT(X) – the write time of X. C(X) – the commit bit of X, which is true if and only if the most recent transaction to write X has already committed. The purpose of this bit is to avoid a situation of "dirty read".

144 Physically Unrealizable Behaviors Read too late Transaction T tries to read too late

145 Write too late Transaction T tries to write too late

146 Problem with dirty data T could perform a dirty read if it is reads X

147 A write is cancelled because of a write with a later timestamp, but the writer then aborts

148 Rules for timestamp-based scheduling 1. Granting the request 2. Aborting T (if T would violate physical reality) and restarting T with a new timestamp (rollback) 3. Delaying T and later deciding whether to abort T or to grant the request These are the scheduler's responses to T's request for Read(X)/Write(X)

149 Rules Request rT(X): 1. If TS(T) >= WT(X), the read is physically realizable I. If C(X) is true, grant the request. If TS(T) > RT(X), set RT(X) := TS(T); otherwise do not change RT(X) II. If C(X) is false, delay T until C(X) becomes true or the transaction that wrote X aborts 2. If TS(T) < WT(X), the read is physically unrealizable. Rollback: abort T and restart it with a new, larger timestamp

150 Request wT(X): 1. If TS(T) >= RT(X) and TS(T) >= WT(X), the write is physically realizable and must be performed: a. Write the new value for X b. Set WT(X) := TS(T), and c. Set C(X) := false 2. If TS(T) >= RT(X), but TS(T) < WT(X), then the write is physically realizable, but there is already a later value in X. If C(X) is true, then ignore the write by T. If C(X) is false, delay T 3. If TS(T) < RT(X), then the write is physically unrealizable
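A condensed sketch of both rules in code form (my illustration; a real scheduler would also implement the delay and restart machinery):

from dataclasses import dataclass

@dataclass
class Elem:
    rt: int = 0       # RT(X): read time
    wt: int = 0       # WT(X): write time
    c: bool = True    # C(X): commit bit

def read(ts, x):
    if ts >= x.wt:                    # physically realizable
        if not x.c:
            return "DELAY"            # wait for the writer to commit or abort
        x.rt = max(x.rt, ts)
        return "GRANT"
    return "ROLLBACK"                 # read too late

def write(ts, x):
    if ts >= x.rt and ts >= x.wt:     # realizable: perform the write
        x.wt, x.c = ts, False
        return "GRANT"
    if ts >= x.rt:                    # a later write already exists
        return "IGNORE" if x.c else "DELAY"
    return "ROLLBACK"                 # write too late

x = Elem()
print(write(5, x), read(3, x))        # -> GRANT ROLLBACK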

151 Timestamps vs Locks

Timestamps: superior if most transactions are read-only, or if it is rare that concurrent transactions read or write the same element; in high-conflict situations, rollback will be frequent, introducing more delays than a locking system.
Locks: superior in high-conflict situations; however, they frequently delay transactions as they wait for locks.

152 Query Processing (Two-Pass Algorithms) Priyadarshini.S 006493851 Cs 257_117_ch 15_15.5

153 Two-Pass Algorithms Using Hashing General idea: – Hash the tuples using an appropriate hash key – For the common operations, there is a way to choose the hash key so that all tuples that need to be considered together have the same hash value – Do the operation working on one bucket at a time

154 Partitioning by Hashing

initialize M-1 buckets with M-1 empty buffers;
FOR each block b of relation R DO BEGIN
    read block b into the Mth buffer;
    FOR each tuple t in b DO BEGIN
        IF the buffer for bucket h(t) is full THEN BEGIN
            copy the buffer to disk;
            initialize a new empty block in that buffer;
        END;
        copy t to the buffer for bucket h(t);
    END;
END;
FOR each bucket DO
    IF the buffer for this bucket is not empty THEN
        write the buffer to disk;
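The same partitioning step in Python (illustrative; disk blocks are abstracted away, so tuples go straight into bucket lists):

def partition(R, M, h):
    buckets = [[] for _ in range(M - 1)]     # M-1 one-block output buffers
    for t in R:                              # read R one tuple at a time
        buckets[h(t) % (M - 1)].append(t)    # a full buffer would be flushed to disk
    return buckets

print(partition(range(10), M=4, h=hash))     # three buckets, tuples spread by hash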

155 Duplicate Elimination Using Hashing Hash the relation R to M-1 buckets, R1, R2, …, RM-1 Note: all copies of the same tuple will hash to the same bucket! Do duplicate elimination on each bucket Ri independently, using the one-pass algorithm Return the union of the individual bucket results

156 Analysis of Duplicate Elimination Using Hashing Number of disk I/O's: 3*B(R) In order for this to work, we need: – the hash function h to evenly distribute the tuples among the buckets – each bucket Ri to fit in main memory (to allow the one-pass algorithm) – i.e., B(R) ≤ M²

157 Grouping Using Hashing Hash all the tuples of relation R to M-1 buckets, using a hash function that depends only on the grouping attributes Note: all tuples in the same group end up in the same bucket! Use the one-pass algorithm to process each bucket independently Uses 3*B(R) disk I/O's, requires B(R) ≤ M²

158 Union, Intersection and Difference Using Hashing Use the same hash function for both relations! Hash R to M-1 buckets R1, R2, …, RM-1 Hash S to M-1 buckets S1, S2, …, SM-1 Do the one-pass {set union, set intersection, bag intersection, set difference, bag difference} algorithm on Ri and Si, for all i 3*(B(R) + B(S)) disk I/O's; min(B(R), B(S)) ≤ M²

159 Join Using Hashing Use the same hash function for both relations; the hash function should depend only on the join attributes Hash R to M-1 buckets R1, R2, …, RM-1 Hash S to M-1 buckets S1, S2, …, SM-1 Do a one-pass join of Ri and Si, for all i 3*(B(R) + B(S)) disk I/O's; min(B(R), B(S)) ≤ M²
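Combining the partition step with a one-pass join per bucket pair (illustrative, self-contained sketch; tuples are joined on their first component):

def hash_join(R, S, n_buckets, key=lambda t: t[0]):
    # Partition both relations with the SAME hash function on the join key.
    def partition(rel):
        buckets = [[] for _ in range(n_buckets)]
        for t in rel:
            buckets[hash(key(t)) % n_buckets].append(t)
        return buckets
    out = []
    for Ri, Si in zip(partition(R), partition(S)):
        index = {}
        for s in Si:                       # one-pass join on each bucket pair
            index.setdefault(key(s), []).append(s)
        for r in Ri:
            out.extend((r, s) for s in index.get(key(r), []))
    return out

print(hash_join([(1, 'a'), (2, 'b')], [(1, 'x'), (3, 'y')], n_buckets=3))
# -> [((1, 'a'), (1, 'x'))]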

160 Comparison of Sort-Based and Hash-Based For binary operations, hash-based only limits the size of the smaller relation, not the sum Sort-based can produce output in sorted order, which can be helpful Hash-based depends on the buckets being of equal size Sort-based algorithms can experience reduced rotational latency or seek time

161 UPDATES R ∪ S: we hash both R and S to M-1 buckets R ∩ S: we hash both R and S to 2(M-1) buckets R−S: we hash both R and S to 2(M-1) buckets Requires 3(B(R)+B(S)) disk I/O's. The two-pass hash-based algorithm requires min(B(R), B(S)) ≤ M²

162 Index-Based Algorithms Chapter 15 Section 15.6 Presented by Fan Yang CS 257 Class ID218

163 Clustering and Nonclustering Indexes Clustered relation: tuples are packed into roughly as few blocks as can possibly hold those tuples Clustering indexes: indexes on an attribute such that all the tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them

164 Clustering and Nonclustering Indexes A relation that isn’t clustered cannot have a clustering index A clustered relation can have nonclustering indexes

165 Index-Based Selection For a selection σC(R), suppose C is of the form a=v, where a is an attribute For a clustering index on R.a: the number of disk I/O’s will be about B(R)/V(R,a)

166 Index-Based Selection The actual number may be higher: 1. the index is not kept entirely in main memory 2. the tuples may spread over more blocks than the minimum 3. blocks may not be packed as tightly as possible

167 Example B(R)=1000, T(R)=20,000. Number of I/O’s required: 1. clustered relation, no index: 1000 2. not clustered, no index: 20,000 3. clustering index with V(R,a)=100: 10 4. nonclustering index with V(R,a)=10: 2,000
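The four cases reduce to two formulas, B(R)/V(R,a) with a clustering index and T(R)/V(R,a) with a nonclustering one; a tiny check (illustration):

B, T = 1000, 20000
print(B)           # clustered relation, no index: B(R)
print(T)           # not clustered, no index: T(R)
print(B // 100)    # clustering index, V(R,a) = 100: B/V = 10
print(T // 10)     # nonclustering index, V(R,a) = 10: T/V = 2000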

168 Joining by Using an Index Natural join R(X, Y) ⋈ S(Y, Z) Number of I/O's to read R: clustered: B(R); not clustered: T(R) Number of I/O's to fetch the matching tuples of S, summed over all tuples t of R: clustered: T(R)B(S)/V(S,Y); not clustered: T(R)T(S)/V(S,Y)

169 Example R(X,Y): 1000 blocks; S(Y,Z): 500 blocks Assume 10 tuples in each block, so T(R)=10,000 and T(S)=5000 V(S,Y)=100 If R is clustered, and there is a clustering index on Y for S: the number of I/O's for R is 1000 the number of I/O's for S is 10,000*500/100 = 50,000

170 Joins Using a Sorted Index Natural join R(X, Y) ⋈ S(Y, Z) with an index on Y for either R or S Extreme case: zig-zag join Example: relations R(X,Y) and S(Y,Z) with an index on Y for both relations search keys (Y-values) for R: 1,3,4,4,5,6 search keys (Y-values) for S: 2,2,4,6,7,8

171 UPDATES Clustered index on a: cost B(R)/V(R,a) – If the index on R.a is clustering, then the number of disk I/O's to retrieve the set σa=v(R) will average B(R)/V(R,a). The actual number may be somewhat higher. Unclustered index on a: cost T(R)/V(R,a) – If the index on R.a is nonclustering, each tuple we retrieve will be on a different block, and we must access T(R)/V(R,a) tuples. Thus, T(R)/V(R,a) is an estimate of the number of disk I/O's we need.

172 CS 257 Database Systems Principles Assignment 1 Instructor: Student: Dr. T. Y. Lin Rajan Vyas (119)

173 Modes of Information Integration

174 Information Integration Information integration allows databases or other distributed information sources to work together. The three most common approaches:  Federated databases  Data warehousing  Mediators

175 Federated Database System Sources are independent, but one source can call on others to supply information. One-to-one connections are made between all pairs of databases. [Figure: DB1, DB2, DB3, and DB4, pairwise connected]

176 Dealer 1: NeededCars(model, color, autoTrans) Dealer 2: Autos(serial, model, color); Options(serial, option) Translation from Dealer 1 to Dealer 2:

for (each tuple (:m, :c, :a) in NeededCars) {
    if (:a = TRUE) { /* automatic transmission wanted */
        SELECT serial
        FROM Autos, Options
        WHERE Autos.serial = Options.serial AND
              Options.option = 'autoTrans' AND
              Autos.model = :m AND Autos.color = :c;
    } else { /* automatic transmission not wanted */
        SELECT serial
        FROM Autos
        WHERE Autos.model = :m AND Autos.color = :c AND
              NOT EXISTS (SELECT * FROM Options
                          WHERE serial = Autos.serial
                          AND option = 'autoTrans');
    }
}

Dealer 3: Cars(serialNo, model, color, autoTrans, ...)

177 Data Warehouse

178 Copies of data from several sources are stored in a single database. [Figure: Source 1 and Source 2 feed Extractor 1 and Extractor 2, whose output a combiner loads into the warehouse; user queries go to the warehouse, which returns results]

179 Dealer 1: Cars(serialNo, model, color, autoTrans, cdPlayer, ...) Dealer 2: Autos(serial, model, color); Options(serial, option) Warehouse: AutosWhse(serialNo, model, color, autoTrans, dealer)

180 Mediators

181 A mediator is a software component that supports a virtual database. It stores no data of its own. [Figure: user queries go to the mediator, which forwards them through wrappers to Source 1 and Source 2 and combines the results]

182 Extractor for translating Dealer-2 data to the warehouse:

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serial, model, color, 'yes', 'dealer2'
    FROM Autos, Options
    WHERE Autos.serial = Options.serial AND option = 'autoTrans';

INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
    SELECT serial, model, color, 'no', 'dealer2'
    FROM Autos
    WHERE NOT EXISTS (SELECT * FROM Options
                      WHERE serial = Autos.serial
                      AND option = 'autoTrans');

183 Algorithms Using More Than Two Passes

184 Introduction Relations of arbitrary size can be processed by using as many passes as necessary.  Multipass Sort-Based Algorithms  Multipass Hash-Based Algorithms

185 Multipass Sort-Based Algorithms These allow us to sort a relation, however large it may be. Suppose M main-memory buffers are available to sort a relation R; then do: Basis: If R fits in M blocks (i.e., B(R) <= M),  Read R into main memory  Use a main-memory sorting algorithm  Write the sorted relation to disk.

186 Multipass Sort-Based Algorithms Induction: If R does not fit into main memory,  Partition R into M groups (R1, R2, …, RM)  Recursively sort Ri for each i = 1, 2, …, M  Merge the M sorted sublists When merging the M sorted sublists:  For duplicate elimination, output one copy of each distinct tuple and skip over copies  For grouping, sort on the grouping attributes and combine the tuples with a given value of these grouping attributes in an appropriate manner

187 Performance of Multipass Sort-Based Algorithms s – size of the relation operated upon M – number of buffers k – number of passes s(M,k) – the maximum size of a relation that can be sorted using M buffers in k passes To compute s(M,k): Basis: If k = 1, one pass is allowed, so s(M,1) = M

188 Performance of Multipass Sort-Based Algorithms Induction: If k > 1,  Partition R into M pieces, each of which must be sortable in k-1 passes  If B(R) = s(M,k), then s(M,k)/M, the size of each of the M pieces of R, cannot exceed s(M,k-1); i.e., s(M,k) = M s(M,k-1)  Expanding: s(M,k) = M s(M,k-1) = M² s(M,k-2) = … = M^(k-1) s(M,1)  But s(M,1) = M, so s(M,k) = M^k Minimum number of buffers:  Using k passes, we can sort a relation R if B(R) <= M^k  i.e., if we want to sort R in k passes, the minimum number of buffers we can use is M = (B(R))^(1/k)  Each pass of a sorting algorithm reads data from the disk and writes it out again  A k-pass sorting algorithm therefore requires 2kB(R) disk I/Os.
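A quick sanity check of the closed form (illustration):

def max_sortable(M, k):
    # s(M, k) = M**k blocks with M buffers and k passes
    return M ** k

def passes_needed(B, M):
    # smallest k with B <= M**k
    k, capacity = 1, M
    while capacity < B:
        k, capacity = k + 1, capacity * M
    return k

print(max_sortable(101, 2))        # -> 10201 blocks in two passes
print(passes_needed(10000, 101))   # -> 2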

189 Multipass Hash-Based Algorithms A recursive approach to using hashing for operations on large relations:  Hash the relation into M-1 buckets, where M is the number of available memory buffers  Apply the operation to each bucket individually, for a unary operation  Apply the operation to each pair of corresponding buckets, for a binary operation Recursive approach: Basis: Case 1: Unary operation:  If the relation fits in M buffers, read it into memory  Perform the operation Case 2: Binary operation:  If either of the relations fits into M-1 buffers, read it into main memory  Read the second relation, one block at a time, into the Mth buffer

190 Induction: If no relation fits in main memory,  Hash each relation into M-1 buckets  Recursively perform the operation on each bucket or corresponding pair of buckets  Accumulate the output from each bucket or pair Multipass Hash-Based Algorithms

191 Performance of Multipass Hash-Based Algorithms Assumption: the tuples divide as evenly as possible among the buckets. Practically, there will be some unevenness in the tuple distribution. Case 1: Unary operation: Let u(M,k) be the number of blocks in the largest relation that a k-pass hashing algorithm can handle. If R fits into M buffers, u(M,1) = M If R does not fit into M buffers,  Divide R into M-1 buckets of equal size in the first pass  The buckets for the next pass must be small enough that they can be handled in k-1 passes, i.e., buckets are of size u(M,k-1)  Since R is divided into M-1 buckets, we must have u(M,k) = (M-1) u(M,k-1) => u(M,k) = M(M-1)^(k-1), or approximately M^k when M is large.

192 Performance of Multipass Hash-Based Algorithms Case 2: Binary operation: Let j(M,k) be the upper bound on the size of the smaller of the two relations R and S. Basis: j(M,1) = M-1; i.e., if we use the one-pass algorithm to join, then either R or S must fit in M-1 blocks. Induction:  In the first pass, divide each relation into M-1 buckets, so each bucket is 1/(M-1) of its relation's size  We must be able to join each pair of the corresponding buckets in k-1 passes, so j(M,k) = (M-1) j(M,k-1)  Hence j(M,k) = (M-1)^k, or approximately M^k when M is large.

193 The Query Compiler 16.1 Parsing and Preprocessing Meghna Jain(205) Dr. T. Y. Lin

194 Presentation Outline 16.1 Parsing and Preprocessing 16.1.1 Syntax Analysis and Parse Tree 16.1.2 A Grammar for Simple Subset of SQL 16.1.3 The Preprocessor 16.1.4 Processing Queries Involving Views

195 Query compilation is divided into three steps. 1. Parsing: parse the SQL query into a parse tree. 2. Logical query plan: transform the parse tree into an expression tree of relational algebra. 3. Physical query plan: transform the logical query plan into a physical query plan, which specifies the operations performed, the order of operations, the algorithm used for each operation, and the way in which stored data is obtained and passed from one operation to another.

196 From a query to a logical query plan: Query -> Parser -> Preprocessor -> Logical query-plan generator -> Query rewriter -> Preferred logical query plan.

197 Syntax Analysis and Parse Tree: The parser takes the SQL query and converts it to a parse tree. Nodes of the parse tree: 1. Atoms: lexical elements such as keywords, constants, parentheses, operators, and other schema elements. 2. Syntactic categories: names for families of query subparts that play a similar role, such as queries and conditions.

198 Grammar for a Simple Subset of SQL
<Query> ::= <SFW>
<Query> ::= ( <Query> )
<SFW> ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
<SelList> ::= <Attribute>, <SelList>
<SelList> ::= <Attribute>
<FromList> ::= <Relation>, <FromList>
<FromList> ::= <Relation>
<Condition> ::= <Condition> AND <Condition>
<Condition> ::= <Tuple> IN <Query>
<Condition> ::= <Attribute> = <Attribute>
<Condition> ::= <Attribute> LIKE <Pattern>
<Tuple> ::= <Attribute>
Atoms are constants; <syntactic categories> are variables; ::= means "can be expressed/defined as".

199 Query and Parse Tree StarsIn(title,year,starName) MovieStar(name,address,gender,birthdate)‏ Query: Give titles of movies that have at least one star born in 1960 SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960%' );

200 [Figure: parse tree for the query above]

201 An equivalent query: SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960' ;

202 Parse Tree [Figure: parse tree for the equivalent query, with SELECT title, FROM the list StarsIn, MovieStar, and WHERE the AND of starName = name and birthdate LIKE '%1960']

203 The Preprocessor. Functions of the preprocessor: If a relation used in the query is a virtual view, then each use of this relation in the from-list must be replaced by a parse tree that describes the view. It is also responsible for semantic checking: 1. Check relation uses: every relation mentioned in the FROM-clause must be a relation or a view in the current schema. 2. Check and resolve attribute uses: every attribute mentioned in the SELECT or WHERE clause must be an attribute of some relation in the current scope. 3. Check types: all attributes must be of a type appropriate to their uses.

204 StarsIn(title,year,starName) MovieStar(name,address,gender,birthdate)‏ Query: Give titles of movies that have at least one star born in 1960 SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960%' );

205 Preprocessing Queries Involving Views: When an operand in a query is a virtual view, the preprocessor needs to replace the operand by a piece of parse tree that represents how the view is constructed from base tables. Base table: Movies(title, year, length, genre, studioName, producerC#). View definition: CREATE VIEW ParamountMovies AS SELECT title, year FROM Movies WHERE studioName = 'Paramount'; Example query based on the view: SELECT title FROM ParamountMovies WHERE year = 1979;
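After the substitution, the query behaves as if it had been written directly over the base table; in the relational-algebra notation used later in this section, the expanded plan is roughly π title (σ year=1979 (π title,year (σ studioName='Paramount' (Movies)))).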

206 Query Compiler [Figure 16.16 & Figure 16.18] CS 257 [section 1] Aakanksha Pendse. Roll number: 127

207 Problem Statement: To verify the claim made in Figure 16.18 using Figure 16.16.

208 What is Figure 16.16? The tree in this diagram represents the following query: "find movies with movie stars born in 1960". The database tables and their attributes are: StarsIn(movieTitle, movieYear, starName) and MovieStar(name, address, gender, birthdate). SQL query: SELECT movieTitle FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960' );

209 This diagram (Figure 16.16) is derived from Figure 16.14 by applying the rule that handles two-argument selection with a condition involving IN. In this query the subquery is uncorrelated.

210 Figure 16.14

211 Figure 16.16 : applying the rule for IN condition

212 Rule

213 What is Figure 16.18? The tree in this diagram represents the following query: "Find the movies where the average age of the stars was at most 40 when the movie was made." SQL query: SELECT DISTINCT m1.movieTitle, m1.movieYear FROM StarsIn m1 WHERE m1.movieYear - 40 <= (SELECT AVG(birthdate) FROM StarsIn m2, MovieStar s WHERE m2.starName = s.name AND m1.movieTitle = m2.movieTitle AND m1.movieYear = m2.movieYear); This is a correlated subquery.

214 Rules for translating a correlated subquery to relational algebra. Correlated subqueries contain unknown values defined outside themselves, so they cannot be translated in isolation. Such subqueries are translated so that they produce a relation in which certain extra attributes appear; these attributes must later be compared with the externally defined attributes. Conditions that relate attributes from the subquery to attributes outside are then applied to this relation, and the extra attributes that are no longer necessary can be projected out. With this strategy, care must be taken not to form duplicate tuples in the result.

215 Steps in the formation of the tree in Figure 16.18: The tree is formed by parsing the query and partially translating it to relational algebra. The WHERE-clause of the subquery is split into two, and part of it is used to convert the product of relations to an equijoin. The aliases m1, m2, and s become nodes of the tree.

216 Figure 16.18 : partially transformed parse tree

217 16.2 ALGEBRAIC LAWS FOR IMPROVING QUERY PLANS Ramya Karri ID: 206

218 Optimizing the Logical Query Plan The translation rules converting a parse tree to a logical query tree do not always produce the best logical query tree. It is often possible to optimize the logical query tree by applying relational algebra laws to convert the original tree into a more efficient logical query tree. Optimizing a logical query tree using relational algebra laws is called heuristic optimization

219 Relational Algebra Laws These laws often involve the properties of: – commutativity - operator can be applied to operands independent of order. E.g. A + B = B + A - The “+” operator is commutative. – associativity - operator is independent of operand grouping. E.g. A + (B + C) = (A + B) + C - The “+” operator is associative.

220 Associative and Commutative Operators: The relational algebra operators cross-product (×), join (⋈), union (∪), and intersection (∩) are all associative and commutative. Commutative: R × S = S × R; R ⋈ S = S ⋈ R; R ∪ S = S ∪ R; R ∩ S = S ∩ R. Associative: (R × S) × T = R × (S × T); (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T); (R ∪ S) ∪ T = R ∪ (S ∪ T); (R ∩ S) ∩ T = R ∩ (S ∩ T).

221 Laws Involving Selection: Complex selections involving AND or OR can be broken into two or more selections (splitting laws): σ C1 AND C2 (R) = σ C1 (σ C2 (R)); σ C1 OR C2 (R) = σ C1 (R) ∪S σ C2 (R), where ∪S denotes set union. Example: R = {a,a,b,b,b,c}; p1 is satisfied by a and b, p2 by b and c. σ p1∨p2 (R) = {a,a,b,b,b,c}; σ p1 (R) = {a,a,b,b,b}; σ p2 (R) = {b,b,b,c}; σ p1 (R) ∪ σ p2 (R) = {a,a,b,b,b,c}. (A check of this example in code follows.)
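The OR case can be checked concretely on the example bag; a small Python sketch (names illustrative), where Counter's | operator provides the max-of-multiplicities union that the set-style ∪S requires:

from collections import Counter

R = ['a', 'a', 'b', 'b', 'b', 'c']      # the bag from the example
p1 = lambda t: t in ('a', 'b')           # p1 satisfied by a and b
p2 = lambda t: t in ('b', 'c')           # p2 satisfied by b and c

def select(pred, bag):
    # sigma_pred(bag): keep every copy of each tuple satisfying pred
    return [t for t in bag if pred(t)]

lhs = Counter(select(lambda t: p1(t) or p2(t), R))
rhs = Counter(select(p1, R)) | Counter(select(p2, R))   # max, not sum
print(lhs == rhs)    # True: both sides are {a: 2, b: 3, c: 1}

A plain bag union (Counter addition) would count the b's twice, which is why the OR splitting law needs the set-style union.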

222 Laws Involving Selection (Contd..) Selection is pushed through both arguments for union: σ C (R  S) = σ C (R)  σ C (S) Selection is pushed to the first argument and optionally the second for difference: σ C (R - S) = σ C (R) - S σ C (R - S) = σ C (R) - σ C (S)

223 Laws Involving Selection (Contd..) All other operators require selection to be pushed to only one of the arguments. For joins, we may not be able to push the selection to both arguments if an argument does not have the attributes the selection requires. σ C (R × S) = σ C (R) × S; σ C (R ∩ S) = σ C (R) ∩ S; σ C (R ⋈ S) = σ C (R) ⋈ S; σ C (R ⋈ D S) = σ C (R) ⋈ D S.

224 Laws Involving Selection (Contd..) Example: Consider relations R(a,b) and S(b,c) and the expression σ (a=1 OR a=3) AND b<c (R ⋈ S) = σ a=1 OR a=3 (σ b<c (R ⋈ S)) = σ a=1 OR a=3 (R ⋈ σ b<c (S)) = σ a=1 OR a=3 (R) ⋈ σ b<c (S).

225 Laws Involving Projection Like selections, it is also possible to push projections down the logical query tree. However, the performance gained is less than selections because projections just reduce the number of attributes instead of reducing the number of tuples.

226 Laws Involving Projection: Laws for pushing projections through joins: π L (R × S) = π L (π M (R) × π N (S)); π L (R ⋈ S) = π L (π M (R) ⋈ π N (S)); π L (R ⋈ D S) = π L (π M (R) ⋈ D π N (S)), where M and N are the attributes of L coming from R and from S respectively, together with the join attributes.

227 Laws Involving Projection: Laws for pushing projections through set operations. Projection can be performed entirely before a bag union: π L (R ∪B S) = π L (R) ∪B π L (S). Projection can be pushed below selection as long as we also keep all attributes needed for the selection (M = L ∪ attr(C)): π L (σ C (R)) = π L (σ C (π M (R))).

228 Laws Involving Join We have previously seen these important rules about joins: 1. Joins are commutative and associative. 2. Selection can be distributed into joins. 3. Projection can be distributed into joins.

229 Laws Involving Duplicate Elimination: The duplicate-elimination operator (δ) can be pushed through many operators, but not all. For example, if R has two copies of tuple t and S has one copy of t, then δ(R ∪B S) has one copy of t, whereas δ(R) ∪B δ(S) has two copies of t; so δ cannot be pushed through bag union. (A small demonstration in code follows.)
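The failure is easy to reproduce; a short Python sketch with Counter as the bag (illustrative, not from the slides):

from collections import Counter

def delta(bag):
    # duplicate elimination: one copy of each distinct tuple
    return Counter(set(bag.elements()))

R = Counter({'t': 2})            # R holds two copies of tuple t
S = Counter({'t': 1})            # S holds one copy of t

print(delta(R + S))              # delta(R U_B S)        -> Counter({'t': 1})
print(delta(R) + delta(S))       # delta(R) U_B delta(S) -> Counter({'t': 2})

Since the two sides differ, δ cannot be pushed through bag union.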

230 Laws Involving Duplicate Elimination: Laws for pushing the duplicate-elimination operator (δ): δ(R × S) = δ(R) × δ(S); δ(R ⋈ S) = δ(R) ⋈ δ(S); δ(R ⋈ D S) = δ(R) ⋈ D δ(S); δ(σ C (R)) = σ C (δ(R)).

231 Laws Involving Duplicate Elimination The duplicate elimination operator (δ) can also be pushed through bag intersection, but not across union, difference, or projection in general. δ (R ∩ S) = δ (R) ∩ δ (S)

232 Laws Involving Grouping: The laws for the grouping operator (γ) depend on the aggregate operators used. There is one general rule, however: grouping subsumes duplicate elimination, δ(γL(R)) = γL(R), because γ always produces a relation with no duplicates. Whether a δ can instead be pushed below the γ depends on the aggregates: MIN and MAX are unaffected by duplicates, while SUM, COUNT, and AVG generally are affected.

233 Query Compiler. By: Payal Gupta, Roll No: 106 (225). Professor: Tsau Young Lin

234 Pushing Selections: That is, replacing the left side of one of the rules by its right side. In pushing selections we first push a selection as far up the tree as it will go, and then push the selections down all possible branches.

235 Let's take an example: StarsIn(title, year, starName), Movie(title, year, length, inColor, studioName, producerC#). Define the view MoviesOf1996 by: CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996;

236 "Which stars worked for which studios in 1996?" can be expressed by the SQL query: SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;

237 [Figure: π starName,studioName (σ year=1996 (Movie) ⋈ StarsIn), the logical query plan constructed from the definitions of the query and the view]

238 [Figure: improving the query plan by moving selections up and down the tree; the σ year=1996 is pushed to StarsIn as well, giving π starName,studioName (σ year=1996 (Movie) ⋈ σ year=1996 (StarsIn))]

239 Laws Involving Projection: "Pushing" projections really involves introducing a new projection somewhere below an existing projection. A projection keeps the number of tuples the same and only reduces the length of tuples. To describe the transformations of extended projection, consider a term E -> x on the list for a projection, where E is an attribute or an expression involving attributes and constants, and x is an output attribute.

240 Example: Let R(a, b, c) and S(c, d, e) be two relations. Consider the expression π a+e->x, b->y (R ⋈ S). The input attributes of the projection are a, b, and e, and c is the only join attribute. We may apply the law for pushing projections below joins to get the equivalent expression π a+e->x, b->y (π a,b,c (R) ⋈ π c,e (S)). Since π a,b,c (R) is just R, we may eliminate that projection, getting a third equivalent expression: π a+e->x, b->y (R ⋈ π c,e (S)).

241 In addition, we can perform a projection entirely before a bag union. That is: π L (R ∪B S) = π L (R) ∪B π L (S).

242 Laws About Joins and Products: Laws that follow directly from the definition of the join: R ⋈ C S = σ C (R × S); R ⋈ S = π L (σ C (R × S)), where C is the condition that equates each pair of attributes from R and S with the same name, and L is a list that includes one attribute from each equated pair and all the other attributes of R and S. In practice, we identify a product followed by a selection as a join of some kind.

243 Laws Involving Duplicate Elimination: The operator δ, which eliminates duplicates from a bag, can be pushed through many but not all operators. In general, moving a δ down the tree reduces the size of intermediate relations and may therefore be beneficial. Moreover, sometimes we can move δ to a position where it can be eliminated altogether, because it is applied to a relation that is known not to possess duplicates.

244 δ (R)=R if R has no duplicates. Important cases of such a relation R include: a) A stored relation with a declared primary key, and b) A relation that is the result of a γ operation, since grouping creates a relation with no duplicates.

245 Several laws that "push" δ through other operators: δ(R × S) = δ(R) × δ(S); δ(R ⋈ S) = δ(R) ⋈ δ(S); δ(R ⋈ C S) = δ(R) ⋈ C δ(S); δ(σ C (R)) = σ C (δ(R)). We can also move the δ to either or both of the arguments of an intersection: δ(R ∩B S) = δ(R) ∩B S = R ∩B δ(S) = δ(R) ∩B δ(S).

246 Laws Involving Grouping and Aggregation When we consider the operator γ, we find that the applicability of many transformations depends on the details of the aggregate operators used. Thus we cannot state laws in the generality that we used for the other operators. One exception is that a γ absorbs a δ. Precisely: δ(γ L (R))=γ L (R)

247 Let us call an operator γL duplicate-impervious if the only aggregations in L are MIN and/or MAX. Then γL(R) = γL(δ(R)) provided γL is duplicate-impervious.

248 Example: Suppose we have the relations MovieStar(name, addr, gender, birthdate) and StarsIn(movieTitle, movieYear, starName), and we want to know, for each year, the birthdate of the youngest star to appear in a movie that year. We can express this query as: SELECT movieYear, MAX(birthdate) FROM MovieStar, StarsIn WHERE name = starName GROUP BY movieYear;

249 [Figure: initial logical query plan for the query, γ movieYear, MAX(birthdate) over σ name=starName over MovieStar × StarsIn]

250 Some transformations that we can apply to this plan: 1. Combine the selection and product into an equijoin. 2. Generate a δ below the γ, since the γ is duplicate-impervious. 3. Generate a π between the γ and the introduced δ to project onto movieYear and birthdate, the only attributes relevant to the γ.

251 [Figure: another query plan, γ movieYear, MAX(birthdate) over π movieYear, birthdate over δ over MovieStar ⋈ name=starName StarsIn]

252 [Figure: a third query plan for the example, with the δ pushed below the join and projections π birthdate,name (MovieStar) and π movieYear,starName (StarsIn) introduced beneath it]

253 CS 255: Database System Principles slides: From Parse Trees to Logical Query Plans By:- Arunesh Joshi Id:-006538558

254 Agenda: Conversion to Relational Algebra. Removing Subqueries From Conditions. Improving the Logical Query Plan. Grouping Associative/Commutative Operators.

255 Parsing Goal is to convert a text string containing a query into a parse tree data structure: – leaves form the text string (broken into lexical elements) – internal nodes are syntactic categories Uses standard algorithmic techniques from compilers – given a grammar for the language (e.g., SQL), process the string and build the tree

256 Example: SQL query SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE ‘%1960’ ); (Find the movies with stars born in 1960) Assume we have a simplified grammar for SQL.

257 Example: Parse Tree [Figure: parse tree for the query, with the outer SELECT-FROM-WHERE over title, StarsIn, and the condition starName IN ( subquery ), and a nested SELECT-FROM-WHERE over name, MovieStar, and birthdate LIKE '%1960']

258 The Preprocessor It replaces each reference to a view with a parse (sub)-tree that describes the view (i.e., a query) It does semantic checking: – are relations and views mentioned in the schema? – are attributes mentioned in the current scope? – are attribute types correct?

259 Convert Parse Tree to Relational Algebra The complete algorithm depends on specific grammar, which determines forms of the parse trees Here is a flavor of the approach

260 Conversion: Suppose there are no subqueries. SELECT att-list FROM rel-list WHERE cond is converted into π att-list (σ cond (× (rel-list))).

261 SELECT movieTitle FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960'; [Figure: the corresponding parse tree]

262 Equivalent Algebraic Expression Tree: π movieTitle (σ starName=name AND birthdate LIKE '%1960' (StarsIn × MovieStar))

263 Handling Subqueries: Recall the (equivalent) query: SELECT title FROM StarsIn WHERE starName IN ( SELECT name FROM MovieStar WHERE birthdate LIKE '%1960' ); Use an intermediate format called two-argument selection.

264 Example: Two-Argument Selection [Figure: π title over a two-argument σ whose left child is StarsIn and whose right child is the condition starName IN π name (σ birthdate LIKE '%1960' (MovieStar))]

265 Converting Two-Argument Selection: To continue the conversion, we need rules for replacing a two-argument selection with a relational algebra expression. Different rules apply depending on the nature of the subquery. Here is an example for the IN operator and an uncorrelated subquery (the subquery computes a relation independent of the tuple being tested).

266 Rule for IN: the two-argument selection σ(R, t IN S) is replaced by σ C (R × δ(S)), where C is the condition that equates the attributes in t with the corresponding attributes in S (the δ removes duplicates from S so that no extra copies of R's tuples are produced).

267 Example: Logical Query Plan: π title (σ starName=name (StarsIn × δ(π name (σ birthdate LIKE '%1960' (MovieStar)))))

268 What if the Subquery is Correlated? An example is when the subquery refers to the current tuple of the outer scope that is being tested. This is more complicated to deal with, since the subquery cannot be translated in isolation: we need to incorporate external attributes in the translation. Some details are in the textbook.

269 Improving the Logical Query Plan There are numerous algebraic laws concerning relational algebra operations By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently Next we'll survey some of these laws

270 Example: Improved Logical Query Plan: π title (StarsIn ⋈ starName=name π name (σ birthdate LIKE '%1960' (MovieStar)))

271 Associative and Commutative Operations: product, natural join, set and bag union, set and bag intersection. Associative: (A op B) op C = A op (B op C). Commutative: A op B = B op A.

272 Laws Involving Selection Selections usually reduce the size of the relation Usually good to do selections early, i.e., "push them down the tree" Also can be helpful to break up a complex selection into parts

273 Selection Splitting: σ C1 AND C2 (R) = σ C1 (σ C2 (R)); σ C1 OR C2 (R) = σ C1 (R) ∪set σ C2 (R) if R is a set; σ C1 (σ C2 (R)) = σ C2 (σ C1 (R)).

274 Selection and Binary Operators: Must push selection to both arguments for union: σ C (R ∪ S) = σ C (R) ∪ σ C (S). Must push to the first argument, optionally the second, for difference: σ C (R - S) = σ C (R) - S, or σ C (R - S) = σ C (R) - σ C (S). Push to at least one argument having all attributes mentioned in C for product, natural join, theta join, and intersection; e.g., σ C (R × S) = σ C (R) × S, if R has all the attributes in C.

275 Pushing Selection Up the Tree Suppose we have relations – StarsIn(title,year,starName) – Movie(title,year,len,inColor,studioName) and a view – CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996; and the query – SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;

276 The Straightforward Tree: π starName,studioName (σ year=1996 (Movie) ⋈ StarsIn). Remember the rule σ C (R ⋈ S) = σ C (R) ⋈ S?

277 The Improved Logical Query Plan: push the selection up the tree to get π starName,studioName (σ year=1996 (Movie ⋈ StarsIn)), then push it down both branches to get π starName,studioName (σ year=1996 (Movie) ⋈ σ year=1996 (StarsIn)).

278 Grouping Assoc/Comm Operators: Group adjacent joins, adjacent unions, and adjacent intersections together as siblings in the tree. This sets up the logical query plan for future optimization when the physical query plan is constructed: determining the best order for doing a sequence of joins (or unions or intersections). [Figure: a cascade of binary unions over A, B, C, D, E, F regrouped into one multiway union]

279 16.4 Estimating the Cost of Operations. Project Guide: Dr. T. Y. Lin. Prepared by: Akshay Shenoy, Computer Science Dept, San Jose State University

280 Introduction Possible Physical Plan Estimating Sizes of Intermediate Relations Estimating the Size of a Projection Estimating the Size of a Selection Estimating the Size of a Join Natural Joins With Multiple Join Attributes Joins of Many Relations Estimating Sizes of Other Operations

281 Physical Plan: For each physical plan, select: an order and grouping for associative-and-commutative operations like joins and unions; an algorithm for each operator in the logical plan (e.g., whether a nested-loop join or hash join is to be used); additional operators needed for the physical plan that were not present explicitly in the logical plan (e.g., scanning, sorting); and the way in which arguments are passed from one operator to the next.

282 Estimating Sizes of Intermediate Relations: Rules for estimating the number of tuples in an intermediate relation should: 1. give accurate estimates; 2. be easy to compute; 3. be logically consistent. The objective of estimation is to select the best physical plan with the least cost.

283 Estimating the Size of a Projection The projection is different from the other operators, in that the size of the result is computable. Since a projection produces a result tuple for every argument tuple, the only change in the output size is the change in the lengths of the tuples.

284 Estimating the Size of a Selection (1): Let S = σ A=c (R), where A is an attribute of R and c is a constant. Then we recommend as an estimate: T(S) = T(R)/V(R,A). The rule above surely holds if all values of attribute A occur equally often in the database.

285 Estimating the Size of a Selection (2): If the condition is an inequality, say S = σ A<c (R), then our estimate for T(S) is T(S) = T(R)/3. If the condition is a "not equal" comparison, S = σ A<>c (R), we may use T(S) = T(R)(V(R,A) - 1)/V(R,A) as an estimate. When the selection condition C is the AND of several equalities and inequalities, we can treat the selection as a cascade of simple selections, each of which checks for one of the conditions.

286 Estimating the Size of a Selection (3): A less simple, but possibly more accurate, estimate for S = σ C1 OR C2 (R) is to assume the two conditions are independent. If R has n tuples, m1 of which satisfy C1 and m2 of which satisfy C2, we estimate T(S) = n(1 - (1 - m1/n)(1 - m2/n)). In explanation, 1 - m1/n is the fraction of tuples that do not satisfy C1, and 1 - m2/n is the fraction that do not satisfy C2. The product of these numbers is the fraction of R's tuples that are not in S, and 1 minus this product is the fraction that are in S. (These rules are sketched in code below.)
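These rules are mechanical enough to script; a sketch under the stated assumptions, taking the statistics T(R) and V(R,A) as given and the condition as a pre-classified tag rather than parsed SQL:

def estimate_selection(T_R, kind, V=None, m1=None, m2=None):
    if kind == 'eq':      # A = c
        return T_R / V
    if kind == 'lt':      # A < c (any range inequality)
        return T_R / 3
    if kind == 'neq':     # A <> c
        return T_R * (V - 1) / V
    if kind == 'or':      # C1 OR C2; m1, m2 tuples satisfy C1, C2
        return T_R * (1 - (1 - m1 / T_R) * (1 - m2 / T_R))
    raise ValueError(kind)

# An AND of conditions is treated as a cascade of simple selections:
est = estimate_selection(10000, 'eq', V=50)   # A = c: 200.0 tuples
est = estimate_selection(est, 'lt')           # then B < c: ~66.7 tuples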

287 Estimating the Size of a Join: Two simplifying assumptions. 1. Containment of value sets: if R and S are two relations with attribute Y and V(R,Y) <= V(S,Y), then every Y-value of R will be a Y-value of S. 2. Preservation of value sets: joining a relation R with another relation S, where attribute A is in R and not in S, preserves all distinct values of A: V(S ⋈ R, A) = V(R, A). Under these assumptions, we estimate T(R ⋈ S) = T(R)T(S)/max(V(R,Y), V(S,Y)).

288 Natural Joins With Multiple Join Attributes: Of the T(R)T(S) pairs of tuples from R and S, the expected number of pairs that match in both y1 and y2 is T(R)T(S)/(max(V(R,y1), V(S,y1)) · max(V(R,y2), V(S,y2))). In general, the following rule estimates the size of a natural join for any number of shared attributes: the estimate of the size of R ⋈ S is computed by multiplying T(R) by T(S) and dividing by the largest of V(R,y) and V(S,y) for each attribute y that is common to R and S. (A sketch of this rule in code follows.)
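The general rule fits in a few lines; a sketch where the V-values are supplied as dicts per relation (names illustrative):

def estimate_join(T_R, T_S, V_R, V_S, shared):
    # V_R, V_S: attribute -> V(rel, attribute); shared: common attributes
    est = T_R * T_S
    for y in shared:
        est /= max(V_R[y], V_S[y])
    return est

# R(a,b) join S(b,c) with T(R)=1000, T(S)=2000, V(R,b)=20, V(S,b)=50:
print(estimate_join(1000, 2000, {'b': 20}, {'b': 50}, ['b']))   # 40000.0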

289 Joins of Many Relations (1): Rule for estimating the size of any join: start with the product of the number of tuples in each relation; then, for each attribute A appearing at least twice, divide by all but the least of the V(R,A)'s. We can also estimate the number of values that will remain for attribute A after the join: by the preservation-of-value-sets assumption, it is the least of these V(R,A)'s.

290 Joins of Many Relations(2) Based on the two assumptions-containment and preservation of value sets: No matter how we group and order the terms in a natural join of n relations, the estimation of rules, applied to each join individually, yield the same estimate for the size of the result. Moreover, this estimate is the same that we get if we apply the rule for the join of all n relations as a whole.

291 Estimating Sizes for Other Operations: Union: the average of the sum and the larger of the two sizes. Intersection: approach 1, take the average of the extremes, which are 0 and the size of the smaller relation, giving half the smaller; approach 2, treat intersection as an extreme case of the natural join and use the formula T(R ⋈ S) = T(R)T(S)/max(V(R,Y), V(S,Y)).

292 Estimating Sizes for Other Operations: Difference: T(R) - (1/2)T(S). Duplicate elimination: take the smaller of (1/2)T(R) and the product of all the V(R,a)'s over the attributes a of R. Grouping and aggregation: upper-bound the number of groups by a product of V(R,A)'s, where A ranges over only the grouping attributes of L; an estimate is the smaller of (1/2)T(R) and this product.

293 16.5 Introduction to Cost-based plan selection Amith KC Student Id: 109

294 Whether selecting a logical query plan or constructing a physical query plan from a logical plan, the query optimizer needs to estimate the cost of evaluating certain expressions. We shall assume that the "cost" of evaluating an expression is approximated well by the number of disk I/O's performed.

295 The number of disk I/O's, in turn, is influenced by: 1. The particular logical operators chosen to implement the query, a matter decided when we choose the logical query plan. 2. The sizes of intermediate results (whose estimation we discussed in Section 16.4). 3. The physical operators used to implement logical operators, e.g., the choice of a one-pass or two-pass join, or the choice to sort or not sort a given relation. 4. The ordering of similar operations, especially joins. 5. The method of passing arguments from one physical operator to the next.

296 Obtaining Estimates for Size Parameter The formulas of Section 16.4 were predicated on knowing certain important parameters, especially T(R), the number of tuples in a relation R, and V(R, a), the number of different values in the column of relation R for attribute a. A modern DBMS generally allows the user or administrator explicitly to request the gathering of statistics, such as T(R) and V(R, a). These statistics are then used in subsequent query optimizations to estimate the cost of operations. By scanning an entire relation R, it is straightforward to count the number of tuples T(R) and also to discover the number of different values V(R, a) for each attribute a. The number of blocks in which R can fit, B(R), can be estimated either by counting the actual number of blocks used (if R is clustered), or by dividing T(R) by the number of tuples per block

297 Computation of Statistics Periodic re-computation of statistics is the norm in most DBMS's, for several reasons. – First, statistics tend not to change radically in a short time. – Second, even somewhat inaccurate statistics are useful as long as they are applied consistently to all the plans. – Third, the alternative of keeping statistics up-to-date can make the statistics themselves into a "hot-spot" in the database; because statistics are read frequently, we prefer not to update them frequently too.

298 The recomputation of statistics might be triggered automatically after some period of time, or after some number of updates. However, a database administrator noticing that poor-performing query plans are being selected by the query optimizer on a regular basis might request recomputation of statistics in an attempt to rectify the problem. Computing statistics for an entire relation R can be very expensive, particularly if we compute V(R,a) for each attribute a in the relation. One common approach is to compute approximate statistics by sampling only a fraction of the data; for example, suppose we want to sample a small fraction of the tuples to obtain an estimate for V(R,a).

299 Heuristics for Reducing the Cost of Logical Query Plans: One important use of cost estimates for queries or subqueries is in the application of heuristic transformations of the query. We have already observed how certain heuristics applied independent of cost estimates can be expected almost certainly to improve the cost of a logical query plan. However, there are other points in the query optimization process where estimating the cost both before and after a transformation allows us to apply the transformation where it appears to reduce cost, and to avoid it otherwise. In particular, when the preferred logical query plan is being generated, we may consider a number of optional transformations and the costs before and after.

300 Because we are estimating the cost of a logical query plan, before making decisions about the physical operators that will implement the operators of relational algebra, our cost estimate cannot be based on disk I/Os. Rather, we estimate the sizes of all intermediate results using the techniques of Section 16.4, and their sum is our heuristic estimate for the cost of the entire logical plan. For example:

301 Consider the initial logical query plan, as shown below.

302 Let the statistics for the relations R and S be as follows. To generate a final logical query plan, we shall insist that the selection be pushed down as far as possible. However, we are not sure whether it makes sense to push the δ below the join or not. Thus, we generate the two query plans shown on the next slide; they differ in whether we have chosen to eliminate duplicates before or after the join.

303 [Figure: the two candidate logical query plans, (a) with δ below the join and (b) with δ above the join]

304 We know how to estimate the size of the result of the selections, we divide T(R) by V(R, a) = 50. We also know how to estimate the size of the joins; we multiply the sizes of the arguments and divide by max(V(R, b), V(S, b)), which is 200.

305 Approaches to Enumerating Physical Plans Let us consider the use of cost estimates in the conversion of a logical query plan to a physical query plan. The baseline approach, called exhaustive, is to consider all combinations of choices (for each of issues like order of joins, physical implementation of operators, and so on). Each possible physical plan is assigned an estimated cost, and the one with the smallest cost is selected.

306 There are two broad approaches to exploring the space of possible physical plans: Top-down: we work down the tree of the logical query plan from the root. Bottom-up: for each sub-expression of the logical-query-plan tree, we compute the costs of all possible ways to compute that sub-expression. The possibilities and costs for a sub-expression E are computed by considering the options for the sub-expressions of E, and combining them in all possible ways with implementations for the root operator of E.

307 Branch-and-Bound Plan Enumeration This approach, often used in practice, begins by using heuristics to find a good physical plan for the entire logical query plan. Let the cost of this plan be C. Then as we consider other plans for sub-queries, we can eliminate any plan for a sub-query that has a cost greater than C, since that plan for the sub-query could not possibly participate in a plan for the complete query that is better than what we already know. Likewise, if we construct a plan for the complete query that has cost less than C, we replace C by the cost of this better plan in subsequent exploration of the space of physical query plans.

308 Hill Climbing This approach, in which we really search for a “valley” in the space of physical plans and their costs; starts with a heuristically selected physical plan. We can then make small changes to the plan, e.g., replacing one method for an operator by another, or reordering joins by using the associative and/or commutative laws, to find "nearby" plans that have lower cost. When we find a plan such that no small modification yields a plan of lower cost, we make that plan our chosen physical query plan.

309 Dynamic Programming: In this variation of the general bottom-up strategy, we keep for each sub-expression only the plan of least cost. As we work up the tree, we consider possible implementations of each node, assuming the best plan for each sub-expression is also used.

310 Selinger-Style Optimization: This approach improves upon the dynamic-programming approach by keeping for each sub-expression not only the plan of least cost, but certain other plans that have higher cost, yet produce a result that is sorted in an order that may be useful higher up in the expression tree. Examples of such interesting orders are when the result of the sub-expression is sorted on one of: the attribute(s) specified in a sort (τ) operator at the root; the grouping attribute(s) of a later group-by (γ) operator; or the join attribute(s) of a later join.

311 Choosing an Order for Joins Chapter 16.6 by: Chiu Luk ID: 210

312 Introduction: This section focuses on a critical problem in cost-based optimization: selecting an order for the natural join of three or more relations. Compared to other binary operations, joins take more time and therefore need effective optimization techniques.

313 Introduction

314 Significance of Left and Right Join Arguments The argument relations in joins determine the cost of the join The left argument of the join is – Called the build relation – Assumed to be smaller – Stored in main-memory

315 Significance of Left and Right Join Arguments The right argument of the join is – Called the probe relation – Read a block at a time – Its tuples are matched with those of build relation The join algorithms which distinguish between the arguments are: – One-pass join – Nested-loop join – Index join

316 Join Trees Order of arguments is important for joining two relations Left argument, since stored in main-memory, should be smaller With two relations only two choices of join tree With more than two relations, there are n! ways to order the arguments and therefore n! join trees, where n is the no. of relations


318 Join Trees: The total number of tree shapes T(n) for n relations is given by the recurrence T(n) = sum over i from 1 to n-1 of T(i)T(n-i), so T(1) = 1, T(2) = 1, T(3) = 2, T(4) = 5, and so on.

319 Left-Deep Join Trees Consider 4 relations. Different ways to join them are as follows

320 In fig (a) all the right children are leaves. This is a left-deep tree In fig (c) all the left children are leaves. This is a right-deep tree Fig (b) is a bushy tree Considering left-deep trees is advantageous for deciding join orders

321 Join order: Join order selection for A1 ⋈ A2 ⋈ ... ⋈ An over left-deep join trees via dynamic programming, where the best plan is computed for each subset of the relations: Best plan(A1,...,An) = the minimum-cost plan among Best plan(A2,...,An) ⋈ A1; Best plan(A1,A3,...,An) ⋈ A2; ...; Best plan(A1,...,An-1) ⋈ An.

322 Dynamic Programming to Select a Join Order and Grouping: Three choices for picking an order for the join of many relations: consider all of the relations; consider a subset; use a heuristic to pick one. Dynamic programming is used either to consider all orders or a subset: construct a table of costs based on relation sizes, remembering only the minimum entry, which is all that is required to proceed. (A sketch follows.)
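A sketch of the dynamic program in Python, restricted to left-deep trees. Two assumptions of this sketch: join-result sizes come from a caller-supplied size oracle (a real optimizer derives them from the T and V statistics of Section 16.4), and the cost of a plan is the sum of the sizes of the results it produces along the way:

from itertools import combinations

def best_join_order(rels, size):
    # rels: relation names; size(frozenset) -> estimated result size
    best = {frozenset([r]): (0, (r,)) for r in rels}   # subset -> (cost, order)
    for k in range(2, len(rels) + 1):
        for subset in (frozenset(c) for c in combinations(rels, k)):
            candidates = []
            for r in subset:                  # left-deep: r is joined last
                rest = subset - {r}
                cost, order = best[rest]
                # pay for producing the result of joining 'rest'
                candidates.append((cost + size(rest), order + (r,)))
            best[subset] = min(candidates)    # keep only the minimum entry
    return best[frozenset(rels)]

sizes = {frozenset('R'): 1000, frozenset('S'): 2000, frozenset('T'): 10,
         frozenset('RS'): 5000, frozenset('RT'): 100, frozenset('ST'): 200,
         frozenset('RST'): 500}
print(best_join_order('RST', lambda s: sizes[s]))   # (110, ('T', 'R', 'S'))

With these made-up sizes the program joins the small relation T first, which is the behavior the table-of-costs method is designed to find.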

323 Dynamic Programming to Select a Join Order and Grouping

324

325

326

327 A Greedy Algorithm for Selecting a Join Order: It is expensive to use an exhaustive method like dynamic programming; a better approach is to use a join-order heuristic for query optimization. A greedy algorithm is an example: make one decision at a time about the order of joins, and never backtrack on a decision once made.

328 UPDATES Reducing Cost by Heuristics: Applies for logical query plan Estimate cost before and after a transformation Only choose/apply transformation when cost estimations show beneficial JOIN TREES: Order of arguments is important for joining two relations Left argument, since stored in main-memory, should be smaller

329 Completing the Physical-Query-Plan and Chapter 16 Summary (16.7-16.8) CS257 Spring 2009 Professor Tsau Lin Student: Suntorn Sae-Eung Donavon Norwood

330 Outline 16.7 Completing the Physical-Query-Plan I. Choosing a Selection Method II. Choosing a Join Method III. Pipelining Versus Materialization IV. Pipelining Unary Operations V. Pipelining Binary Operations VI. Notation for Physical Query Plan VII. Ordering the Physical Operations 16.8 Summary of Chapter 16

331 Before completing the physical query plan, a query has previously been: parsed and preprocessed (16.1); converted to logical query plans (16.3); had the costs of its operations estimated (16.4); had costs determined by cost-based plan selection (16.5); and had the costs of join operations weighed by choosing an order for joins (16.6).

332 16.7 Completing the Physical-Query-Plan: Three topics related to turning a logical plan into a complete physical plan: 1. Choosing physical implementations such as selection and join methods. 2. Decisions regarding intermediate results (materialized or pipelined). 3. Notation for physical-query-plan operators.

333 I. Choosing a Selection Method (A): Algorithms for each selection operator: 1. Can we use an existing index on an attribute? If yes, index-scan; otherwise, table-scan. 2. After retrieving all tuples satisfying the indexed condition in (1), filter them with the remaining selection conditions.

334 Choosing a Selection Method (A) (cont.): Recall that the cost of a query is the number of disk I/O's. How costs for various plans for a σ C (R) operation are estimated: 1. Cost of the table-scan algorithm: a) B(R) if R is clustered; b) T(R) if R is not clustered. 2. Cost of a plan picking an equality term (e.g., a = 10) with index-scan: a) B(R)/V(R,a) for a clustering index; b) T(R)/V(R,a) for a nonclustering index. 3. Cost of a plan picking an inequality term (e.g., b < 20) with index-scan: a) B(R)/3 for a clustering index; b) T(R)/3 for a nonclustering index.

335 Example: Selection: σ x=1 AND y=2 AND z<5 (R), where the parameters of R(x, y, z) are: T(R) = 5000, B(R) = 200, V(R,x) = 100, and V(R,y) = 500. Relation R is clustered; x and y have nonclustering indexes, and only the index on z is clustering.

336 Example (cont.): Selection options: 1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered. 2. Use the index on x = 1, then filter on y, z. Cost is 50, since T(R)/V(R,x) = 5000/100 = 50 tuples and the index is not clustering. 3. Use the index on y = 2, then filter on x, z. Cost is 10, since T(R)/V(R,y) = 5000/500 = 10 tuples using a nonclustering index. 4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R)/3 = 67.

337 Example (cont.): Costs: option 1 = 200, option 2 = 50, option 3 = 10, option 4 = 67. The lowest cost is option 3. Therefore, the preferred physical plan 1. retrieves all tuples with y = 2, and 2. then filters for the remaining two conditions (on x and z). (This comparison is scripted below.)
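The comparison in this example can be scripted; a sketch that scores each available option under the cost formulas above (the option lists are assumptions about which indexes exist):

def selection_costs(B, T, eq_opts, ineq_opts):
    # eq_opts / ineq_opts: (label, V(R,attr) or None, clustered?) per
    # indexed equality / inequality term of the condition
    costs = {'table-scan': B}                 # R assumed clustered
    for label, V, clustered in eq_opts:
        costs['index ' + label] = B / V if clustered else T / V
    for label, _, clustered in ineq_opts:
        costs['index ' + label] = B / 3 if clustered else T / 3
    return min(costs.items(), key=lambda kv: kv[1]), costs

best, all_costs = selection_costs(
    200, 5000,
    eq_opts=[('x=1', 100, False), ('y=2', 500, False)],
    ineq_opts=[('z<5', None, True)])
print(best)          # ('index y=2', 10.0), i.e., option 3 above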

338 II. Choosing a Join Method: Determine the costs associated with each join algorithm: 1. A one-pass join, or a nested-loop join, when enough buffers can be devoted to the join. 2. A sort-join is preferred when the attributes are pre-sorted or when two or more joins are on the same attribute, such as (R(a,b) ⋈ S(a,c)) ⋈ T(a,d), where sorting R and S on a will produce the result of R ⋈ S sorted on a, to be used directly in the next join.

339 Choosing a Join Method (cont.): 3. An index-join for a join with a high chance of using an index created on the join attribute, such as R(a,b) ⋈ S(b,c). 4. A hash join is the best choice for unsorted, non-indexed relations that need a multipass join.

340 III. Pipelining Versus Materialization: Materialization (the naive way): store the (intermediate) result of each operation on disk. Pipelining (the more efficient way): interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them, with (intermediate) results stored in buffers implemented in main memory.

341 IV. Pipelining Unary Operations: Unary, tuple-at-a-time (or full-relation) operations such as selection and projection are the best candidates for pipelining. [Figure: R flows through an input buffer, a unary operation, and an output buffer into the next unary operation, using the M-1 available buffers]

342 Pipelining Unary Operations (cont.): Pipelined unary operations are implemented by iterators.

343 V. Pipelining Binary Operations: Binary operations: ∪, ∩, -, ⋈, ×. The results of binary operations can also be pipelined: use one buffer to pass the result to its consumer, one block at a time. The extended example that follows shows the tradeoffs and opportunities.

344 Example: Consider a physical query plan for the expression (R(w,x) ⋈ S(x,y)) ⋈ U(y,z). Assumptions: R occupies 5,000 blocks; S and U each occupy 10,000 blocks; the intermediate result R ⋈ S occupies k blocks for some k; both joins will be implemented as hash joins, either one-pass or two-pass depending on k; and 101 buffers are available.

345 Example (cont.): First consider the join R ⋈ S; neither relation fits in the buffers. We need a two-pass hash join: partition R into 100 buckets (the maximum possible), each of 50 blocks. The second pass of the hash join uses 51 buffers, leaving the remaining 50 buffers for joining the result of R ⋈ S with U.

346 Example (cont.): Case 1: suppose k <= 49, so the result of R ⋈ S occupies at most 49 blocks. Steps: 1. Pipeline R ⋈ S into 49 buffers. 2. Organize them for lookup as a hash table. 3. Use the one remaining buffer to read each block of U in turn. 4. Execute the second join as a one-pass join.

347 Example (cont.): The total number of I/O's is 55,000: 45,000 for the two-pass hash join of R and S, and 10,000 to read U for the one-pass hash join of (R ⋈ S) ⋈ U.

348 Example (cont.): Case 2: suppose k > 49 but k < 5,000. We can still pipeline, but we need another strategy, in which the intermediate result is joined with U in a 50-bucket, two-pass hash join. Steps: 1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each. 2. Perform the two-pass hash join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U. 3. Finally, join R ⋈ S with U bucket by bucket.

349 Example (cont.): The number of disk I/O's is: 20,000 to read U and write its tuples into buckets; 45,000 for the two-pass hash join R ⋈ S; k to write out the buckets of R ⋈ S; and k + 10,000 to read the buckets of R ⋈ S and U in the final join. The total cost is 75,000 + 2k.

350 Example (cont.): Comparing the I/O counts of case 1 and case 2: for k <= 49 (case 1) the disk I/O count is 55,000; for 50 <= k < 5,000 (case 2): k = 50 gives 75,000 + 2·50 = 75,100; k = 51 gives 75,102; k = 52 gives 75,104. Notice that the I/O count jumps discontinuously as k increases from 49 to 50.

351 Example (cont.): Case 3: k > 5,000. We cannot perform the two-pass join in the 50 available buffers if the result of R ⋈ S is pipelined. Steps: 1. Compute R ⋈ S using a two-pass join and store the result on disk. 2. Join the result of (1) with U, using a two-pass join.

352 Example (cont.): The number of disk I/O's is: 45,000 for the two-pass hash join of R and S; k to store R ⋈ S on disk; and 30,000 + 3k for the two-pass join of R ⋈ S with U. The total cost is 75,000 + 4k.

353 Example (cont.): In summary, the cost of the physical plan as a function of the size k of R ⋈ S: k <= 49: 55,000 I/O's; 50 <= k <= 5,000: 75,000 + 2k; k > 5,000: 75,000 + 4k. (These cases are collected in code below.)
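Collecting the three cases into one function of k, under this example's fixed parameters (101 buffers, B(R) = 5,000, B(S) = B(U) = 10,000):

def total_io(k):
    # total disk I/Os as a function of k = B(R join S)
    if k <= 49:
        return 55000             # case 1: one-pass second join, pipelined
    if k <= 5000:
        return 75000 + 2 * k     # case 2: 50-bucket two-pass join, pipelined
    return 75000 + 4 * k         # case 3: materialize R join S

for k in (49, 50, 5000, 5001):
    print(k, total_io(k))        # note the jump from 55,000 to 75,100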

354 VI. Notation for Physical Query Plans: Several types of operators: 1. Operators for leaves. 2. (Physical) operators for selection. 3. (Physical) sort operators. 4. Other relational-algebra operations. In practice, each DBMS uses its own internal notation for physical query plans.

355 Notation for Physical Query Plans (cont.): 1. Operators for leaves: a leaf operand is replaced in the LQP tree by TableScan(R) (read all blocks), SortScan(R, L) (read in order according to L), IndexScan(R, C) (scan the index on attribute A for a condition C of the form Aθc), or IndexScan(R, A) (scan the index on attribute R.A; this behaves like TableScan but is more efficient if R is not clustered).

356 Notation for Physical Query Plans (cont.): 2. (Physical) operators for selection: the logical operator σ C (R) is often combined with access methods. If σ C (R) is replaced by Filter(C), and there is no index on R or on an attribute in condition C, use TableScan(R) or SortScan(R, L) to access R. If condition C is of the form Aθc AND D for some condition D, and there is an index on R.A, then we may use the operator IndexScan(R, Aθc) to access R and use Filter(D) in place of the selection σ C (R).

357 Notation for Physical Query Plans (cont.): 3. (Physical) sort operators: sorting can occur at any point in the physical plan, using the notation SortScan(R, L). It is common to use an explicit operator Sort(L) to sort a relation that is not stored. Sorting can be applied at the top of the physical-query-plan tree if the result needs to be sorted for an ORDER BY clause (τ).

358 Notation for Physical Query Plans (cont.): 4. Other relational-algebra operations: descriptive text and signs elaborating the operation performed (e.g., join or grouping); the necessary parameters (e.g., the theta-join condition or the list of elements in a grouping); a general strategy for the algorithm (sort-based, hash-based, or index-based); a decision about the number of passes to be used (one-pass, two-pass, or multipass); and the anticipated number of buffers the operation will require.

359 Notation for Physical Query Plans (cont.): Example of a physical query plan: the plan of Example 16.36 for the case k > 5,000, with TableScan leaves, two-pass hash joins, materialization (shown as a double line), and a Store operator.

360 Notation for Physical Query Plans (cont.): Another example: the physical query plan of Example 16.36 for the case k <= 49, with two TableScan leaves, a two-pass hash join, pipelining, different buffer needs, and a Store operator.

361 Notation for Physical Query Plans (cont.): The physical query plan of Example 16.35: use the index on the condition y = 2 first, and filter with the remaining conditions later on.

362 VII. Ordering of Physical Operations: The physical query plan is represented as a tree structure with an implied order of operations. Still, the order of evaluation of interior nodes may not always be clear: iterators are used in a pipelined manner, and the overlapped execution of various nodes can make a strict ordering meaningless.

363 Ordering of Physical Operations (cont.): Three rules summarize the ordering of events in a PQP tree: 1. Break the tree into subtrees at each edge that represents materialization; execute one subtree at a time. 2. Order the execution of the subtrees bottom-up and left-to-right. 3. All nodes of each subtree are executed simultaneously.

364 Summary of Chapter 16 In this part of the presentation I will talk about the main topics of Chapter 16.

365 COMPILATION OF QUERIES: Compilation means turning a query into a physical query plan that can be implemented by the query engine. Steps of query compilation: parsing; semantic checking; selection of the preferred logical query plan; generating the best physical plan.

366 THE PARSER: The first step of SQL query processing. Generates a parse tree. Nodes in the parse tree correspond to SQL constructs. Similar to the compiler of a programming language.

367 VIEW EXPANSION: A very critical part of query compilation. Expands the view references in the query tree to the actual view definition. Provides opportunities for query optimization.

368 SEMANTIC CHECKING: Checks the semantics of an SQL query by examining the parse tree. Checks: attributes, relation names, and types. Resolves attribute references.

369 CONVERSION TO A LOGICAL QUERY PLAN: Converts a semantically checked parse tree to an algebraic expression. The conversion is straightforward, but subqueries need special handling. The two-argument selection approach can be used.

370 ALGEBRAIC TRANSFORMATION: There are many different ways to transform a logical query plan using algebraic transformations. The laws used for this transformation: commutative and associative laws; laws involving selection; pushing selections; laws involving projection; laws about joins and products; laws involving duplicate elimination; laws involving grouping and aggregation.

371 ESTIMATING SIZES OF RELATIONS: True running time is taken into consideration when selecting the best logical plan. The two factors that affect the estimates most: the sizes of relations (number of tuples) and the number of distinct values for each attribute of each relation. Histograms are used by some systems.

372 COST-BASED OPTIMIZING: The best physical query plan is the least costly plan. Factors that decide the cost of a query plan: the order and grouping of operations like joins, unions, and intersections; the join method used (e.g., nested-loop or hash join); scanning and sorting operations; and storing intermediate results.

373 PLAN ENUMERATION STRATEGIES: Common approaches for searching the space of physical plans: dynamic programming (tabulating the best plan for each sub-expression); Selinger-style programming (also keeping plans whose results have useful sort orders); greedy approaches (making a series of locally optimal decisions); branch-and-bound (starting with a heuristically found plan and pruning plans for sub-queries that cost more).

374 LEFT-DEEP JOIN TREES: Left-deep join trees are binary trees with a single spine down the left edge and with leaves as right children. This strategy reduces the number of plans to be considered for the best physical plan: restrict the search to left-deep join trees when picking a grouping and order for the join of several relations.

375 PHYSICAL PLANS FOR SELECTION: Break a selection into an index-scan of the relation, followed by a filter operation. The filter then examines the tuples retrieved by the index-scan and allows only those to pass that meet the remaining portions of the selection condition.

376 PIPELINING VERSUS MATERIALIZING: The flow of data between operators can be controlled to implement pipelining, in which an operator consumes the result of another operator directly, passed through main memory. Alternatively, intermediate results can be removed from main memory and stored on disk to save space for other operators; this technique is materialization. Both pipelining and materialization should be considered by the physical-query-plan generator.

377 UPDATES Pipelining Versus Materialization Materialization (naïve way) store (intermediate) result of each operations on disk Pipelining (more efficient way) Interleave the execution of several operations, the tuples produced by one operation are passed directly to the operations that used it store (intermediate) result of each operations on buffer, which is implemented on main memory

378 CONCURRENCY CONTROL 18.1 Serial and Serializable Schedule By: Nitin Mathur Id: 110 CS: 257

379 What Is Concurrency Control & Who Controls It? The process of assuring that transactions preserve consistency when executing simultaneously is called concurrency control. This consistency is taken care of by the scheduler.

380 Flow: How a Transaction Is Executed [Figure: transactions issue READ and WRITE actions against the buffers, with INPUT and OUTPUT moving blocks between the disk and the buffers]

381 [Figure: the transaction manager sends read/write requests to the scheduler, which issues reads and writes to the buffers]

382 Correctness Principle: The principle that if a transaction executes in isolation, starting from a correct (consistent) database state, then it ends in a correct database state. Does the system really follow the correctness principle all the time?

383 Basic Example Schedule T1 READ (A,t) t := t+100 WRITE (A,t) READ (B,t) t := t+100 WRITE (B,t) T2 READ (A,s) s := s*2 WRITE (A,s) READ (B,s) s := s*2 WRITE (B,s) A=B=50 To be consistent the final state should be A=B

384 Serial Schedule (T1, T2): Starting from A = B = 50, T1 runs first: READ(A,t); t := t+100; WRITE(A,t) makes A = 150, and the same steps on B make B = 150. Then T2 doubles both: A = 300, B = 300. The net effect is A := 2*(A+100), and likewise for B; the final state is consistent (A = B).

385 Does the order really matter? (T2, T1): Starting from A = B = 50, T2 doubles both first (A = B = 100), then T1 adds 100 to both (A = B = 200). The final state is consistent, but different from (T1, T2): the final state of a database is not independent of the order of transactions.

386 Serializable Schedule: T1 and T2 interleave: T1 updates A (A = 150), T2 updates A (A = 300), T1 updates B (B = 150), T2 updates B (B = 300). The final state matches the serial order (T1, T2), so this schedule is serializable but not serial.

387 Non-Serializable Schedule: T1 updates A (A = 150), T2 updates A (A = 300), then T2 updates B before T1 does (B = 100), and finally T1 updates B (B = 200). The net effect is A := 2*(A+100) but B := 2*B + 100, so A = 300 and B = 200: the final state is inconsistent, and the schedule is equivalent to no serial order.

388 A Serializable Schedule (due to the details of the transactions): If T2 multiplies by 1 instead of 2, the same interleaving gives A := 1*(A+100) and B := 1*B + 100, i.e., A = B = 150. Regardless of the consistent initial state, the final state is consistent, so the schedule is serializable only because of the detailed behavior of the transactions.

389 Notation for Transactions: 1. Action: an expression of the form ri(X) or wi(X), meaning that transaction Ti reads or writes, respectively, the database element X. 2. Transaction: a transaction Ti is a sequence of actions with subscript i. 3. Schedule: a schedule S of a set of transactions T is a sequence of actions in which, for each transaction Ti in T, the actions of Ti appear in the same order as in the definition of Ti itself.

390 Notational Example T1 READ (A,t) t := t+100 WRITE (A,t) READ (B,t) t := t+100 WRITE (B,t) T2 READ (A,s) s := s*2 WRITE (A,s) READ (B,s) s := s*2 WRITE (B,s) Notation: T1 : r1(A); w1(A); r1(B); w1(B) T2 : r2(A); w2(A); r2(B); w2(B)

391 Concurrency Control 18.1 – 18.2 Chiu Luk CS257 Database Systems Principles Spring 2009

392 Concurrency Control Concurrency control in database management systems (DBMS) ensures that database transactions are performed concurrently without the concurrency violating the data integrity of a database. Executed transactions should follow the ACID rules. The DBMS must guarantee that only serializable (unless Serializability is intentionally relaxed), recoverable schedules are generated. It also guarantees that no effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related database.

393 Transaction ACID rules: Atomicity: either the effects of all or none of its operations remain when a transaction is completed; in other words, to the outside world the transaction appears to be indivisible, atomic. Consistency: every transaction must leave the database in a consistent state. Isolation: transactions cannot interfere with each other; providing isolation is the main goal of concurrency control. Durability: successful transactions must persist through crashes.

394 Serial and Serializable Schedules: In the field of databases, a schedule is a list of actions (i.e., reading, writing, aborting, committing) from a set of transactions. In this example, schedule D consists of 3 transactions T1, T2, T3; the schedule describes the actions of the transactions as seen by the DBMS. T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is an example of a serial schedule, because the actions of the 3 transactions are not interleaved.

395 Serial and Serializable Schedules A schedule that is equivalent to a serial schedule has the serializability property. In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end, E gives the same result as D.

396 Serial Schedule: T1 precedes T2. T1: Read(A); A := A+100; Write(A); Read(B); B := B+100; Write(B). T2: Read(A); A := A*2; Write(A); Read(B); B := B*2; Write(B). Starting from A = B = 25: after T1, A = B = 125; after T2, A = B = 250.

397 Serial Schedule: T2 precedes T1. Starting from A = B = 25: after T2, A = B = 50; after T1, A = B = 150.

398 A serializable, but not serial, schedule: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B). Starting from A = B = 25: A becomes 125 then 250, and B becomes 125 then 250, matching the serial order (T1, T2).

399 A nonserializable schedule: T1 updates A (125), T2 updates A (250) and then B (50), and finally T1 updates B (150). The final state A = 250, B = 150 matches no serial order.

400 A schedule that is serializable only because of the detailed behavior of the transactions: if T2' multiplies by 1 instead of 2, then regardless of the consistent initial state the final state is consistent; starting from A = B = 25, both end at 125.

401 Non-Conflicting Actions Two actions are non-conflicting if whenever they occur consecutively in a schedule, swapping them does not affect the final state produced by the schedule. Otherwise, they are conflicting.

402 Conflicting Actions: General Rules Two actions of the same transaction conflict: – r1(A) w1(B) Two actions over the same database element conflict, if one of them is a write – r1(A) w2(A) – w1(A) w2(A)

403 Conflict actions Two or more actions are said to be in conflict if: – The actions belong to different transactions. – At least one of the actions is a write operation. – The actions access the same object (read or write). The following set of actions is conflicting: – T1:R(X), T2:W(X), T3:W(X) While the following sets of actions are not: – T1:R(X), T2:R(X), T3:R(X) – T1:R(X), T2:W(Y), T3:R(X)

404 Conflict Serializable We may take any schedule and make as many nonconflicting swaps as we wish, with the goal of turning the schedule into a serial schedule. If we can do so, then the original schedule is serializable, because its effect on the database state remains the same as we perform each of the nonconflicting swaps.

405 Conflict Serializable A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules. An equivalent definition: a schedule is conflict-serializable if and only if there exists an acyclic precedence (serializability) graph for the schedule.

406 Conflict equivalent / conflict-serializable Let Ai and Aj be consecutive non-conflicting actions that belong to different transactions. We can swap Ai and Aj without changing the result. Two schedules are conflict-equivalent if they can be turned one into the other by a sequence of non-conflicting swaps of adjacent actions. We call a schedule conflict-serializable if it is conflict-equivalent to a serial schedule.

407 conflict-serializable T1 T2 R(A) W(A) R(A) R(B) W(A) W(B) R(B) W(B)

408 conflict-serializable T1 T2 R(A) W(A) R(B) R(A) W(A) W(B) R(B) W(B)

409 conflict-serializable T1 T2 R(A) W(A) R(A) R(B) W(B) W(A) R(B) W(B)

410 conflict-serializable T1 T2 R(A) W(A) R(A) W(B) R(B) W(A) R(B) W(B) Serial Schedule

411 Conflict-Serializability (section 18.2 of Concurrency Control) - Amith KC Student ID – 006498310

412 Overview One of the sufficient conditions to assure that a schedule is serializable is “conflict-serializability”. Idea of Conflicts. – Conflicting and non-conflicting actions. – Conflict-equivalent and conflict-serializable schedules. Precedence Graphs – Definition. – Test for conflict-serializability.

413 Conflicts Definition – a conflict is a pair of consecutive actions in a schedule such that, if their order is interchanged, then the behavior of at least one of the transactions involved can change. Non-conflicting actions: Let Ti and Tj be two different transactions (i ≠ j); then: – ri(X); rj(Y) is never a conflict, even if X = Y. – ri(X); wj(Y) is not a conflict provided X ≠ Y. – wi(X); rj(Y) is not a conflict provided X ≠ Y. – Similarly, wi(X); wj(Y) is also not a conflict, provided X ≠ Y.

414 continued… Three situations of conflicting actions (where we may not swap their order): – Two actions of the same transaction, e.g., ri(X); wi(Y). – Two writes of the same database element by different transactions, e.g., wi(X); wj(X). – A read and a write of the same database element by different transactions, e.g., ri(X); wj(X). To summarize, any two actions of different transactions may be swapped unless: – They involve the same database element, and – At least one of them is a write.

415 Converting a conflict-serializable schedule to a serial schedule
S: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)
r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B)
r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B)
r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B)
r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)

416 continued… Conflict-equivalent schedules: – Two schedules are called conflict-equivalent if they can be turned one into the other by a sequence of non-conflicting swaps of adjacent actions. Conflict-serializable schedule: – A schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.

417 Precedence Graphs Conflicting pairs of actions (of a schedule S) put constraints on the order of transactions in the hypothetical, conflict-equivalent serial schedule. For a schedule S involving transactions T1 and T2 (among other transactions), we say that T1 takes precedence over T2 (written T1 <S T2) if there are actions A1 of T1 and A2 of T2 such that: – A1 is ahead of A2 in S, – Both A1 and A2 involve the same database element, and – At least one of them is a write operation.

418 continued… The precedences mentioned in the previous slide can be depicted in a “precedence graph”. The nodes in this graph are the transactions of the schedule S. Example of a precedence graph: – Consider a schedule S involving three transactions T1, T2 and T3: S: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B) The precedence graph (Figure 1) has nodes T1, T2, T3 with edges T2 → T3 (from the actions on A) and T1 → T2 (from the actions on B).

419 Test for conflict-serializability Construct the precedence graph for S and check whether it has any cycles. – If yes, then S is not conflict-serializable. – Else, it is a conflict-serializable schedule. Example of a cyclic precedence graph: – Consider the schedule S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B) The precedence graph (Figure 2) has nodes T1, T2, T3 with edges T2 → T3 (from A) and both T1 → T2 and T2 → T1 (from B), so it contains a cycle.
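As an illustration of this test, here is a minimal Python sketch that derives the precedence edges from a schedule and looks for a cycle by brute-force reachability; the (transaction, operation, element) encoding of actions is an assumption made for this sketch.

def precedence_edges(schedule):
    edges = set()
    for i, (ti, opi, ei) in enumerate(schedule):
        for tj, opj, ej in schedule[i+1:]:
            if ti != tj and ei == ej and "w" in (opi, opj):
                edges.add((ti, tj))        # Ti must precede Tj serially
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, goal, seen=()):
        return any(n == goal or (n not in seen and reachable(n, goal, seen + (n,)))
                   for n in graph.get(start, ()))
    return any(reachable(u, u) for u, _ in edges)

# S1 from the slide: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B)
S1 = [(2,"r","A"),(1,"r","B"),(2,"w","A"),(2,"r","B"),
      (3,"r","A"),(1,"w","B"),(3,"w","A"),(2,"w","B")]
print(precedence_edges(S1))                # {(2, 3), (1, 2), (2, 1)}
print(has_cycle(precedence_edges(S1)))     # True -> not conflict-serializable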

420 continued… Observing the actions on A in the previous example (Figure 2), we find that T2 <S1 T3. But when we observe B, we get both T1 <S1 T2 and T2 <S1 T1. Thus the graph has a cycle between T1 and T2, and we can conclude that S1 is not conflict-serializable.

421 Why the Precedence-Graph test works A cycle in the graph puts too many constraints on the order of transactions in a hypothetical conflict-equivalent serial schedule. If there is a cycle involving n transactions T1 → T2 → … → Tn → T1: – In the hypothetical serial order, the actions of T1 must precede those of T2, which precede those of T3, and so on up to Tn. – But the actions of Tn are also required to precede those of T1. – So, if there is a cycle in the graph, we can conclude that the schedule is not conflict-serializable.

422 Concurrency Control By Donavon Norwood Ankit Patel Aniket Mulye

423 INTRODUCTION Enforcing serializability by locks – Locks – Locking scheduler – Two phase locking Locking systems with several lock modes – Shared and exclusive locks – Compatibility matrices – Upgrading/updating locks – Incrementing locks

424 Locks Locking works as follows: – A transaction requests a lock. – The scheduler checks the lock table. – The scheduler generates a serializable schedule of actions.

425 Consistency of transactions Actions and locks must relate to each other: – A transaction can read or write an element only if it holds a lock on it and has not yet released the lock. – Unlocking every locked element is compulsory. Legality of schedules – No two transactions can acquire a lock on the same element without the prior holder releasing it first.

426 Locking scheduler Grants a lock request only if the resulting schedule is legal. The lock table stores the information about the locks currently held on the elements.

427 The locking scheduler (contd.) A legal schedule of consistent transactions can, unfortunately, still fail to be serializable.

428 Locking schedule (contd.) The locking scheduler delays requests that would result in an illegal schedule.

429 Two-phase locking Guarantees that a legal schedule of consistent transactions is conflict-serializable. All lock requests precede all unlock requests. The growing phase: – Obtain locks; no unlocks allowed. The shrinking phase: – Release locks; no new locks allowed.
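A minimal sketch, assuming a simple action encoding, of how one might check that a transaction's action sequence obeys 2PL — every lock request must precede every unlock:

def is_two_phase(actions):
    """actions: sequence of ("lock", X) / ("unlock", X) / other steps."""
    unlocking = False
    for op, _ in actions:
        if op == "unlock":
            unlocking = True               # shrinking phase has begun
        elif op == "lock" and unlocking:
            return False                   # a lock after an unlock violates 2PL
    return True

print(is_two_phase([("lock","A"),("lock","B"),("unlock","A"),("unlock","B")]))  # True
print(is_two_phase([("lock","A"),("unlock","A"),("lock","B"),("unlock","B")]))  # False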

430 Working of Two-Phase Locking Assures serializability. Two stricter protocols for 2PL: – Strict two-phase locking: a transaction holds all its exclusive locks until commit/abort. – Rigorous two-phase locking: a transaction holds all its locks until commit/abort. If some transaction Ti does not follow 2PL, it is possible to find another transaction Tj (which does follow 2PL) and a schedule S of Ti and Tj that is not conflict-serializable.

431 Failure of 2PL 2PL fails to provide protection against deadlocks.

432 Locking Systems with Several Lock Modes Locking Scheme – Shared/Read Lock (for reading) – Exclusive/Write Lock (for writing) Compatibility Matrices Upgrading Locks Update Locks Increment Locks

433 Shared & Exclusive Locks Consistency of transactions – Cannot write without an exclusive lock – Cannot read without holding some lock This works on two principles: – A read action can proceed only under a shared or an exclusive lock – A write action can proceed only under an exclusive lock All locks need to be unlocked before commit

434 Shared and exclusive locks (cont.) Two-phase locking of transactions – All locking must precede all unlocking Legality of schedules – An element may be locked exclusively by one transaction or by several in shared mode, but not both.

435 Compatibility Matrices Has a row and column for each lock mode. – Rows correspond to a lock held on an element by another transaction – Columns correspond to the mode of lock requested – Example:
Requested:   S     X
Held S:      YES   NO
Held X:      NO    NO

436 Upgrading Locks Suppose a transaction wants to read as well as write: – It acquires a shared lock on the element – Performs its calculations on the element – And when it is ready to write, it is granted an exclusive lock Transactions that cannot predict their read/write needs in advance can use upgrading locks.

437 Upgrading locks (cont.) Indiscriminate use of upgrading produces deadlocks. Example: both transactions want to upgrade on the same element.

438 Update locks Solve the deadlock that occurs with the upgrading method. A transaction holding an update lock can read but cannot write. An update lock can later be converted to an exclusive lock. An update lock can only be granted on an element that holds at most shared locks.

439 Update locks (cont.) An update lock behaves like a shared lock when you are requesting it and like an exclusive lock once you hold it. Compatibility matrix:
Requested:   S     X     U
Held S:      YES   NO    YES
Held X:      NO    NO    NO
Held U:      NO    NO    NO

440 Increment Locks Used for incrementing and decrementing stored values. E.g., transferring money from one bank account to another, or ticket-selling transactions in which the number of seats is decremented after each sale.

441 Increment lock (cont.) An increment lock does not enable read or write actions on the element. Any number of transactions can hold an increment lock on an element at the same time. Shared and exclusive locks cannot be granted while an increment lock is held on the element.
Requested:   S     X     I
Held S:      YES   NO    NO
Held X:      NO    NO    NO
Held I:      NO    NO    YES

442 Concurrency Control: 18.4 Locking Systems with Several Lock Modes CS257 Spring/2009 Professor: Tsau Lin Student: Suntorn Sae-Eung ID: 212

443 18.4 Locking Systems with Several Lock Modes In 18.3, a transaction must lock a database element X whether it reads or writes X – yet there is no reason why several transactions could not read X at the same time, as long as none writes X. So we introduce a locking scheme with: – Shared/Read Lock (for reading) – Exclusive/Write Lock (for writing)

444 18.4.1 Shared & Exclusive Locks Transaction consistency – Cannot write without an exclusive lock – Cannot read without holding some lock – A lock for writing is considered “stronger” than one for reading This works on two principles: 1. A read action can proceed only under a shared or an exclusive lock 2. A write action can proceed only under an exclusive lock All locks need to be unlocked before commit

445 18.4.1 Shared & Exclusive Locks (cont.) Two-phase locking (2PL) of transactions Ti Notation: sli(X) – Ti requests a shared lock on DB element X; xli(X) – Ti requests an exclusive lock on DB element X; ui(X) – Ti relinquishes whatever lock it holds on X. Lock → Read/Write → Unlock

446 18.4.1 Shared & Exclusive Locks (cont.) Legality of schedules – An element may be locked exclusively by one writing transaction, or in shared mode by several reading transactions, but not both.

447 18.4.2 Compatibility Matrices A convenient way to describe lock-management policies – Rows correspond to a lock held on an element by another transaction – Columns correspond to the mode of lock requested – Example:
Requested:   S     X
Held S:      YES   NO
Held X:      NO    NO
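A compatibility matrix maps directly onto code. The following hedged Python sketch (names are illustrative, not from the book) stores the matrix as a nested dict and applies the test a scheduler would run before granting a request:

COMPAT = {                 # rows: lock held; columns: lock requested
    "S": {"S": True,  "X": False},
    "X": {"S": False, "X": False},
}

def can_grant(requested, locks_held):
    """Grant only if the request is compatible with every lock held
    on the element by other transactions."""
    return all(COMPAT[held][requested] for held in locks_held)

print(can_grant("S", ["S", "S"]))   # True: shared locks coexist
print(can_grant("X", ["S"]))        # False: must wait for the reader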

448 18.4.3 Upgrading Locks A transaction T taking a shared lock is friendly toward other transactions. When T wants to read and then write a new value of X: 1. T takes a shared lock on X. 2. T performs its operations on X (which may take a long time). 3. When T is ready to write the new value, it “upgrades” its shared lock to an exclusive lock on X.

449 18.4.3 Upgrading Locks (cont.) Observe the example: T1 cannot take an exclusive lock on B until all other locks on B are released. Once B is released, T1 retries and succeeds.

450 18.4.3 Upgrading Locks (cont.) Upgrading can easily cause a deadlock: if both transactions want to upgrade on the same element, both will wait forever.

451 18.4.4 Update locks A third lock mode that resolves the deadlock problem. The rules are: – Only an update lock can be upgraded to a write (exclusive) lock later. – An update lock may be granted on X while X holds only shared locks. – Once an update lock is held on X, it prevents any additional locks from being granted, and is later changed to a write (exclusive) lock. Notation: uli(X)

452 18.4.4 Update locks (cont.) Example

453 18.4.4 Update locks (cont.) Compatibility matrix (asymmetric):
Requested:   S     X     U
Held S:      YES   NO    YES
Held X:      NO    NO    NO
Held U:      NO    NO    NO

454 18.4.5 Increment Locks A useful lock for transactions that increase or decrease a value, e.g., money transfers between two bank accounts. If two transactions T1 and T2 add constants to the same database element X: – it doesn't matter which goes first, but no reads of X are allowed in between the two. See the following exhibits.

455 18.4.5 Increment Locks (cont.) T1: INC(A,2), T2: INC(A,10). Case 1: A=5, T2 makes it 15, then T1 makes it 17. Case 2: A=5, T1 makes it 7, then T2 makes it 17. Either order yields A=17.

456 18.4.5 Increment Locks (cont.) What if the actions interleave without increment locks? Both transactions read A=5 before either writes; T1 writes 7, then T2 writes 15 (overwriting T1's update). The final value is 15 (or 7), not the expected 17: A != 17.

457 18.4.5 Increment Locks (cont.) INC(A, c) – an increment action writing on database element A, which is an atomic execution consisting of: 1. READ(A,t); 2. t := t+c; 3. WRITE(A,t); Notation: – ili(X) – action of Ti requesting an increment lock on X – inci(X) – action of Ti incrementing X by some constant; we don't care about the value of the constant.
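A minimal Python sketch of the A=17 vs. A=15/7 cases above — increments commute when each READ/add/WRITE runs as one atomic unit, but not when the read-modify-write sequences interleave:

def inc(db, elem, c):              # atomic: READ, add, WRITE as one unit
    db[elem] = db[elem] + c

db = {"A": 5}
inc(db, "A", 2); inc(db, "A", 10)  # T1: INC(A,2), T2: INC(A,10)
print(db["A"])                     # 17, in either order

# Interleaved read-modify-write without increment locks: both
# transactions read A=5 before either writes, so one update is lost.
db = {"A": 5}
t1 = db["A"]; t2 = db["A"]         # both READ(A) -> 5
db["A"] = t1 + 2                   # T1 writes 7
db["A"] = t2 + 10                  # T2 writes 15; T1's update is lost
print(db["A"])                     # 15, not 17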

458 18.4.5 Increment Locks (cont.) Example

459 18.4.5 Increment Locks (cont.) Compatibility matrix:
Requested:   S     X     I
Held S:      YES   NO    NO
Held X:      NO    NO    NO
Held I:      NO    NO    YES

460 Concurrency Control Chapter 18 Section 18.5 Presented by Khadke, Suvarna CS 257 (Section II) Id 213

461 Overview Assume knowledge of: locks, two-phase locking, and the lock modes shared, exclusive, and update. A simple scheduler architecture is based on the following principles: – Insert lock actions into the stream of reads, writes, and other actions – Release locks when the transaction manager tells it that the transaction will commit or abort

462 Scheduler That Inserts Lock Actions into the transaction's request stream

463 Scheduler That Inserts Lock Actions If a transaction is delayed, waiting for a lock, the scheduler performs the following actions. Part I: takes the stream of requests generated by the transaction and inserts appropriate lock actions ahead of each database operation (read, write, or update). Part II: takes each action (a lock or database operation) from Part I and executes it. – It determines the transaction T that the action belongs to and the status of T (delayed or not). If T is not delayed, then: 1. A database-access action is transmitted to the database and executed.

464 Scheduler That Inserts Lock Actions 2. If a lock action is received by Part II, it checks the lock table to see whether the lock can be granted: (i) if granted, the lock table is modified to include the granted lock; (ii) if not granted, the lock table is updated with the requested lock and Part II delays transaction T. 3. When T commits or aborts, Part I is notified by the transaction manager and releases all of T's locks. If any transactions are waiting for those locks, Part I notifies Part II. 4. Part II, when notified about the release of a lock on some DB element, determines the next transaction T' that can be granted the lock and allows it to continue.

465 The Lock Table A relation that associates database elements with locking information about that element. Implemented as a hash table using database elements as the hash key. Its size is proportional to the number of locked elements only, not to the size of the entire database. (Figure: DB element A points to the lock information for A.)

466 Lock Table Entries Structure of the information found in a lock table entry: 1. Group mode: – S: only shared locks are held – X: one exclusive lock and no other locks – U: one update lock and one or more shared locks 2. Wait: whether at least one transaction is waiting for a lock on A 3. A list of the transactions that currently hold locks on A or are waiting for a lock on A

467 Handling Lock Requests Suppose transaction T requests a lock on A. If there is no lock table entry for A, then there are no locks on A, so create the entry and grant the lock request. If the lock table entry for A exists, use the group mode to guide the decision about the lock request.

468 Handling Lock Requests If the group mode is U (update) or X (exclusive), no other lock can be granted:  – Deny the lock request by T – Place an entry on the list saying T requests a lock – Set Wait? = ‘yes’ If the group mode is S (shared), another shared or update lock can be granted:  – Grant the request for an S or U lock – Create an entry for T on the list with Wait? = ‘no’ – Change the group mode to U if the new lock is an update lock

469 Handling Unlock Requests Now suppose transaction T unlocks A:  – Delete T's entry on the list for A – If T's lock mode is not the same as the group mode, there is no need to change the group mode – Otherwise, check the entire list to determine the new group mode: if T held S, the new group mode is S or there are no locks left; if T held U, the new group mode is S or there are no locks left; if T held X, no locks remain.

470 Handling Unlock Requests If the value of Wait? is “yes”, we need to grant one or more locks, using one of the following approaches:  – First-come-first-served: grant the lock to the longest-waiting request; no starvation (no transaction waits forever for a lock).  – Priority to shared locks: grant all waiting S locks, then one U lock; grant an X lock only if nothing else is waiting.  – Priority to upgrading: if there is a U lock waiting to upgrade to an X lock, grant that first.
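The following Python sketch ties the last few slides together. It is a deliberately simplified model (no update locks, pairwise compatibility instead of a single stored group mode, first-come-first-served wakeups); the class and method names are assumptions for this illustration:

COMPAT = {"S": {"S": True, "X": False}, "X": {"S": False, "X": False}}

class LockTable:
    def __init__(self):
        self.entries = {}              # element -> list of [txn, mode, waiting]

    def request(self, txn, mode, elem):
        entry = self.entries.setdefault(elem, [])
        holders = [m for _, m, w in entry if not w]
        ok = all(COMPAT[h][mode] for h in holders)
        entry.append([txn, mode, not ok])
        return ok                      # False: the transaction is delayed

    def unlock(self, txn, elem):
        entry = self.entries[elem]
        entry[:] = [e for e in entry if e[0] != txn]   # delete T's entry
        woken = []
        for e in entry:                # FIFO: wake waiters that now fit
            holders = [m for _, m, w in entry if not w]
            if e[2] and all(COMPAT[h][e[1]] for h in holders):
                e[2] = False
                woken.append(e[0])
        return woken

lt = LockTable()
print(lt.request(1, "S", "A"))   # True
print(lt.request(2, "X", "A"))   # False: T2 waits
print(lt.unlock(1, "A"))         # [2]: T2's lock is now granted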

471 Concurrency Control Managing Hierarchies of Database Elements (18.6) Presented by Ronak Shah (214) March 9, 2009

472 Managing Hierarchies of Database Elements Two problems that arise with locks when there is a tree structure to the data are: When the tree structure is a hierarchy of lockable elements – Determine how locks are granted for both large elements (relations) and smaller elements (blocks containing tuples or individual tuples) When the data itself is organized as a tree (B-tree indexes) – This will be discussed in the next section

473 Locks with Multiple Granularity A database element can be a relation, block or a tuple Different systems use different database elements to determine the size of the lock Thus some may require small database elements such as tuples or blocks and others may require large elements such as relations

474 Example of Multiple Granularity Locks Consider a database for a bank – Choosing relations as database elements means we would have one lock for an entire relation – If we were dealing with a relation having account balances, this kind of lock would be very inflexible and thus provide very little concurrency – Why? Because balance transactions require exclusive locks and this would mean only one transaction occurs for one account at any time – But as each account is independent of others we could perform transactions on different accounts simultaneously

475 …(contd.) – Thus it makes sense to have block element for the lock so that two accounts on different blocks can be updated simultaneously Another example is that of a document – With similar arguments as above, we see that it is better to have large element (a complete document) as the lock in this case

476 Warning (Intention) Locks These are required to manage locks at different granularities. – In the bank example, if a shared lock is obtained on the relation while there are exclusive locks on individual tuples, unserializable behavior can occur. The rules for managing locks on a hierarchy of database elements constitute the warning protocol.

477 Database Elements Organized in Hierarchy

478 Rules of Warning Protocol These involve both ordinary (S and X) and warning (IS and IX) locks. The rules are: – Begin at the root of the hierarchy – Request the S/X lock if we are at the desired element – If the desired element is further down the hierarchy, place a warning lock (IS if S, IX if X) – When the warning lock is granted, proceed to the child node and repeat the above steps until the desired node is reached

479 Compatibility Matrix for Shared, Exclusive and Intention Locks
Requested:   IS    IX    S     X
Held IS:     Yes   Yes   Yes   No
Held IX:     Yes   Yes   No    No
Held S:      Yes   No    Yes   No
Held X:      No    No    No    No
The above matrix applies only to locks held by other transactions.

480 Group Modes of Intention Locks A transaction can hold both S and IX locks on an element at the same time (to read the entire element and then modify sub-elements). This combination can be considered another lock mode, SIX, which has the restrictions of both locks, i.e., it is compatible with nothing except IS. SIX then serves as the group mode.

481 Example Consider a transaction T1 as follows: – SELECT * FROM table WHERE attribute1 = ‘abc’ – Here, an IS lock is first acquired on the entire relation; then, moving to the individual tuples with attribute1 = ‘abc’, an S lock is acquired on each of them. Consider another transaction T2: – UPDATE table SET attribute2 = ‘def’ WHERE attribute1 = ‘ghi’ – Here, it requires an IX lock on the relation, and since T1's IS lock is compatible, IX is granted.

482 – On reaching the desired tuple (‘ghi’), as there is no lock on it, T2 gets an X lock too. – If T2 were updating the same tuple as T1, it would have to wait until T1 released its S lock.

483 Phantoms and Handling Insertions Correctly This arises when transactions create new sub elements of lockable elements Since we can lock only existing elements the new elements fail to be locked The problem created in this situation is explained in the following example

484 Example Consider a transaction T3: – SELECT SUM(length) FROM table WHERE attribute1 = ‘abc’ – This calculates the total length of all tuples having attribute1 = ‘abc’. – Thus, T3 acquires IS on the relation and S on the targeted tuples. – Now, if another transaction T4 inserts a new tuple having attribute1 = ‘abc’, the result of T3 becomes incorrect.

485 Example (…contd.) This alone is not a concurrency problem, since the serial order (T3, T4) is maintained. But if both T3 and T4 also write an element X, it can lead to unserializable behavior: – r3(t1); r3(t2); w4(t3); w4(X); w3(L); w3(X) – r3 and w3 are read and write actions by T3, w4 are write actions by T4, and L is the total length calculated by T3 (t1 + t2). – At the end, the result of T3 is the sum of the lengths of t1 and t2, and X has the value written by T3. – This is not right: if the value of X is taken to be that written by T3, then for the schedule to be serializable, the sum of the lengths of t1, t2 and t3 should have been computed.

486 Example (…contd.) – Else, if the sum is the total length of t1 and t2 only, then for the schedule to be serializable, X should have the value written by T4. This problem arises because the relation has a phantom tuple (the newly inserted tuple), which should have been locked but wasn't, since it didn't exist at the time the locks were taken. The occurrence of phantoms can be avoided if all insertion and deletion transactions are treated as write operations on the whole relation.

487 Presentation Topic 18.7 of Book Tree Protocol Submitted to: Prof. Dr. T.Y.LIN Submitted By :Saurabh Vishal

488 Topics Covered in This Presentation Motivation for Tree-Based Locking Rules for Access to Tree-Structured Data Why the Tree Protocol Works

489 Introduction Tree structures are formed by the link pattern of the elements themselves. The database elements are disjoint pieces of data, but the only way to get to a node is through its parent. B-trees are the best example of this sort of data. Knowing that we must traverse a particular path to an element gives us some important freedom to manage locks differently from two-phase locking approaches.

490 Tree-Based Locking Consider a B-tree index in a system that treats individual nodes (i.e., blocks) as lockable database elements. The node is the right level of granularity. We use a standard set of lock modes — shared, exclusive, and update — and two-phase locking.

491 Rules for Accessing Tree-Structured Data The tree protocol places few restrictions on locks. We assume that there is only one kind of lock, that transactions are consistent, and that schedules must be legal. Locks are granted only when they do not conflict with locks already at a node, but there is no two-phase locking requirement on transactions.

492 The Tree Protocol A transaction's first lock may be at any node of the tree. Subsequent locks may only be acquired if the transaction currently has a lock on the parent node. Nodes may be unlocked at any time. A transaction may not relock a node on which it has released a lock, even if it still holds a lock on the node's parent.
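These four rules are easy to check mechanically. A minimal Python sketch (the action encoding and the child-to-parent map are assumptions for this illustration):

def obeys_tree_protocol(actions, parent):
    held, released, first = set(), set(), True
    for op, node in actions:
        if op == "lock":
            if node in released:
                return False          # rule 4: may not relock
            if not first and parent.get(node) not in held:
                return False          # rule 2: must hold the parent's lock
            held.add(node); first = False
        else:                         # "unlock": allowed at any time (rule 3)
            held.discard(node); released.add(node)
    return True

parent = {"B": "A", "C": "A", "D": "B", "E": "B"}
print(obeys_tree_protocol(
    [("lock","A"),("lock","B"),("unlock","A"),("lock","D"),
     ("unlock","B"),("unlock","D")], parent))             # True
print(obeys_tree_protocol(
    [("lock","B"),("unlock","B"),("lock","B")], parent))  # False: relock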

493 A tree structure of Lockable elements

494 Three transactions following the tree protocol

495 Why the Tree Protocol Works The tree protocol forces a serial order on the transactions involved in a schedule: Ti <S Tj if, in schedule S, transactions Ti and Tj lock a node in common and Ti locks the node first.

496 Example If the precedence graph drawn from the precedence relations defined above has no cycles, then we claim that any topological order of the transactions is an equivalent serial schedule. For example, either (T1, T2, T3) or (T3, T1, T2) is an equivalent serial schedule. The reason such a serial order exists is that all the common nodes are touched in the same order as they are originally scheduled.

497 If two transactions lock several elements in common, then those elements are all locked in the same order, as the following example illustrates.

498 Precedence graph derived from Schedule

499 Example 4: Path of elements locked by two transactions

500 Continued… Now consider an arbitrary set of transactions T1, T2, …, Tn that obey the tree protocol and lock some of the nodes of a tree according to a schedule S. Order them by who locks the root first; transactions that lock nodes in common do so in the same order. If Ti locks the root before Tj, then Ti locks every node it has in common with Tj before Tj does. That is, Ti <S Tj.

501 Two-Pass Algorithms Based on Sorting Prepared By: Ronak Shah ID: 116

502 Introduction In two-pass algorithms, data from the operand relations is read into main memory, processed in some way, written out to disk again, and then reread from disk to complete the operation. In this section, we consider sorting as a tool for implementing relational operations. The basic idea is as follows: if we have a large relation R, where B(R) is larger than M, the number of memory buffers we have available, then we can repeatedly:

503 1. Read M blocks of R into main memory. 2. Sort these M blocks in main memory, using an efficient main-memory algorithm. 3. Write the sorted list to M blocks of disk; we refer to the contents of these blocks as one of the sorted sublists of R.
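A minimal Python sketch of this first pass, where a “relation” is just a list of tuples and M is measured in tuples rather than blocks (an assumption for this illustration):

def make_sorted_runs(tuples, m, key=None):
    runs = []
    for i in range(0, len(tuples), m):          # read M blocks of R
        chunk = sorted(tuples[i:i+m], key=key)  # sort in main memory
        runs.append(chunk)                      # write the run back to disk
    return runs

print(make_sorted_runs([9, 1, 7, 3, 8, 2, 6], 3))
# [[1, 7, 9], [2, 3, 8], [6]]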

504 Duplicate elimination using sorting To perform the δ(R) operation in two passes, we sort the tuples of R into sublists. Then we use the available memory to hold one block from each sorted sublist, and repeatedly copy the least remaining tuple to the output, ignoring all tuples identical to it.
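A sketch of this second pass in Python, using heapq.merge to stand in for the buffer-per-sublist merge (the in-memory lists stand in for blocks on disk):

import heapq

def distinct_from_runs(runs):
    out, last = [], object()       # sentinel unequal to any tuple
    for t in heapq.merge(*runs):   # repeatedly pick the least front tuple
        if t != last:
            out.append(t)          # copy to output once
            last = t               # ...and skip identical copies
    return out

runs = [[1, 1, 4, 7], [1, 2, 4], [2, 7]]
print(distinct_from_runs(runs))    # [1, 2, 4, 7]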

505 The number of disk I/O's performed by this algorithm: 1) B(R) to read each block of R when creating the sorted sublists. 2) B(R) to write each of the sorted sublists to disk. 3) B(R) to read each block from the sublists at the appropriate time. So the total cost of this algorithm is 3B(R).

506 Grouping and aggregation using sorting Read the tuples of R into memory, M blocks at a time. Sort each group of M blocks, using the grouping attributes of L as the sort key. Write each sorted sublist to disk. Use one main-memory buffer for each sublist, and initially load the first block of each sublist into its buffer. Repeatedly find the least value of the sort key present among the first available tuples in the buffers. As for the δ algorithm, this two-pass algorithm for γ takes 3B(R) disk I/O's and works as long as B(R) <= M^2.

507 A sort-based union algorithm When bag union is wanted, the one-pass algorithm — in which we simply copy both relations — works regardless of the size of the arguments, so there is no need to consider a two-pass algorithm for bag union. The one-pass algorithm for set union only works when at least one relation is smaller than the available main memory, so we should consider a two-pass algorithm for set union. To compute R ∪S S, we do the following steps: 1. Repeatedly bring M blocks of R into main memory, sort their tuples, and write the resulting sorted sublists back to disk. 2. Do the same for S, to create sorted sublists for relation S.

508 3. Use one main-memory buffer for each sublist of R and S. Initialize each with the first block from the corresponding sublist. 4. Repeatedly find the first remaining tuple t among all the buffers. Copy t to the output, and remove from the buffers all copies of t.

509 A simple sort-based join algorithm Given relations R(X,Y) and S(Y,Z) to join, and given M blocks of main memory for buffers: 1. Sort R, using a two-phase multiway merge sort, with Y as the sort key. 2. Sort S similarly. 3. Merge the sorted R and S. Generally we use only two buffers, one for the current block of R and the other for the current block of S. The following steps are done repeatedly: a. Find the least value y of the join attribute Y that is currently at the front of the blocks for R and S. b. If y does not appear at the front of the other relation, then remove the tuples with sort key y.

510 c. Otherwise, identify all the tuples from both relations having sort key y. d. Output all the tuples that can be formed by joining tuples from R and S with the common Y-value y. e. If either relation has no more unconsidered tuples in main memory, reload the buffer for that relation. The simple sort-join uses 5(B(R) + B(S)) disk I/O's. It requires B(R) <= M^2 and B(S) <= M^2 to work.
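A hedged Python sketch of the merge step (step 3, covering sub-steps a–d), assuming both inputs are already sorted on Y and fit in memory for the purposes of the illustration:

def merge_join(r, s):       # r: [(x, y)] sorted on y; s: [(y, z)] sorted on y
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        yr, ys = r[i][1], s[j][0]
        if yr < ys:
            i += 1                       # step b: yr is missing from S
        elif ys < yr:
            j += 1                       # step b: ys is missing from R
        else:                            # steps c, d: same key on both sides
            y = yr
            i2 = i
            while i2 < len(r) and r[i2][1] == y: i2 += 1
            j2 = j
            while j2 < len(s) and s[j2][0] == y: j2 += 1
            out += [(x, y, z) for x, _ in r[i:i2] for _, z in s[j:j2]]
            i, j = i2, j2
    return out

R = [("a", 1), ("b", 2), ("c", 2)]
S = [(2, "p"), (2, "q"), (3, "r")]
print(merge_join(R, S))
# [('b', 2, 'p'), ('b', 2, 'q'), ('c', 2, 'p'), ('c', 2, 'q')]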

511 Summary of sort-based algorithms Main memory and disk I/O requirements for sort based algorithms

512 UPDATE Multiversion schemes keep old versions of data items to increase concurrency. Each successful write results in the creation of a new version of the data item written. Timestamps are used to label versions. When a read(X) operation is issued, select an appropriate version of X based on the timestamp of the transaction, and return the value of the selected version.

513 Concurrency Control by Validation(18.9) By Swathi Vegesna 217

514 At a Glance  Introduction  Validation based scheduling  Validation based Scheduler  Expected exceptions  Validation rules  Example  Comparisons  Summary

515 Introduction What is optimistic concurrency control? It assumes that no unserializable behavior will occur. Both timestamp-based scheduling and validation-based scheduling allow a transaction T to access data without locks.

516 Validation-based scheduling  The scheduler keeps a record of what the active transactions are doing.  Each transaction executes in 3 phases: 1. Read – reads the elements in its read set RS() and computes results in its local address space. 2. Validate – the scheduler compares the transaction's read and write sets with those of other transactions. 3. Write – writes the elements in its write set WS().

517 Validation based Scheduler  Contains an assumed serial order of transactions.  Maintains three sets: – START( ): set of T’s started but not completed validation. – VAL( ): set of T’s validated but not finished the writing phase. – FIN( ): set of T’s that have finished.

518 Expected exceptions 1. Suppose there is a transaction U such that: U is in VAL or FIN, that is, U has validated; FIN(U) > START(T), that is, U did not finish before T started; and RS(T) ∩ WS(U) ≠ φ — let the intersection contain database element X. 2. Suppose there is a transaction U such that: U is in VAL, i.e., U has successfully validated; FIN(U) > VAL(T), i.e., U did not finish before T entered its validation phase; and WS(T) ∩ WS(U) ≠ φ — let X be in both write sets.

519 Validation rules  Check that RS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T started, i.e., FIN(U) > START(T).  Check that WS(T) ∩ WS(U) = φ for any previously validated U that did not finish before T was validated, i.e., FIN(U) > VAL(T).
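A minimal Python sketch of these two checks; the dict layout of a transaction record (read/write sets plus START/VAL/FIN timestamps) is an assumption made for this illustration:

def validate(T, others):
    """T, U: dicts with RS, WS (sets) and START, VAL, FIN timestamps."""
    for U in others:
        if U["FIN"] > T["START"] and T["RS"] & U["WS"]:
            return False     # rule 1: U may have written after T read
        if U["FIN"] > T["VAL"] and T["WS"] & U["WS"]:
            return False     # rule 2: T might overwrite U out of order
    return True

U = {"RS": {"B"}, "WS": {"D"}, "START": 1, "VAL": 2, "FIN": 4}
T = {"RS": {"A", "B"}, "WS": {"A", "C"}, "START": 2, "VAL": 3, "FIN": 6}
print(validate(T, [U]))      # True: RS(T)∩WS(U) and WS(T)∩WS(U) are empty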

520 Example

521 Solution Validation of U: nothing to check. Validation of T: WS(U) ∩ RS(T) = {D} ∩ {A,B} = φ; WS(U) ∩ WS(T) = {D} ∩ {A,C} = φ. Validation of V: RS(V) ∩ WS(T) = {B} ∩ {A,C} = φ; WS(V) ∩ WS(T) = {D,E} ∩ {A,C} = φ; RS(V) ∩ WS(U) = {B} ∩ {D} = φ. Validation of W: RS(W) ∩ WS(T) = {A,D} ∩ {A,C} = {A}; RS(W) ∩ WS(V) = {A,D} ∩ {D,E} = {D}; WS(W) ∩ WS(V) = {A,C} ∩ {D,E} = φ. (W is not validated.)

522 Comparison of concurrency-control mechanisms
Locks — Storage: space in the lock table is proportional to the number of database elements locked. Delays: delays transactions but avoids rollbacks.
Timestamps — Storage: space is needed for read and write times of every database element, whether or not it is currently accessed. Delays: does not delay transactions, but causes them to roll back unless interference is low.
Validation — Storage: space is used for timestamps and read/write sets of each currently active transaction, plus a few more transactions that finished after some currently active transaction began. Delays: does not delay transactions, but causes them to roll back unless interference is low.

523 UPDATE – VALIDATION Optimistic concurrency control assumes that conflicts between transactions are rare. The scheduler maintains a record of the active transactions. It does not require locking; conflicts are checked for just before commit.

524 21.1 Introduction to Information Integration CS257 Fan Yang

525 Need for Information Integration Ideally, all the data in the world could be put in a single database (the ideal database system). In the real world this is impossible for a single database: databases are created independently, and it is hard to design a database to support all future uses.

526 University Database Registrar: to record student and grade Bursar: to record tuition payments by students Human Resources Department: to record employees Other department….

527 Inconvenient Record grades only for students who have paid tuition. Want to swim in the SJSU aquatic center for free during summer vacation? None of these tasks can be accomplished using any single one of the separate databases. Solution: act as if there were one integrated database.

528 How to integrate Start over and build one database that replaces all the legacy databases, rewriting all the applications; the result is painful. Or build a layer of abstraction (middleware) on top of all the legacy databases; this layer is often defined by a collection of classes. BUT…

529 Heterogeneity Problem What is the heterogeneity problem? Aardvark Automobile Co. has 1000 dealers, and thus 1000 databases. To find a model at another dealer, can we simply use this command: SELECT * FROM CARS WHERE MODEL='A6';

530 Types of Heterogeneity Communication Heterogeneity Query-Language Heterogeneity Schema Heterogeneity Data-Type Differences Value Heterogeneity Semantic Heterogeneity

531 Chapter 21.2 Modes of Information Integration ID: 219 Name: Qun Yu Class: CS257 219 Spring 2009 Instructor: Dr. T.Y.Lin

532 Content Index 21.2 Modes of Information Integration 21.2.1 Federated Database Systems 21.2.2 Data Warehouses 21.2.3 Mediators

533 Federations  The simplest architecture for integrating several DBs  One to one connections between all pairs of DBs  n DBs talk to each other, n(n-1) wrappers are needed  Good when communications between DBs are limited

534 Wrapper A wrapper is a software component that translates incoming queries and outgoing answers. As a result, it allows information sources to conform to some shared schema.

535 Federations Diagram (Figure: DB1, DB2, DB3 and DB4, each pair connected by two wrappers.) A federated collection of 4 DBs needs 12 components to translate queries from one to another.

536 Example Car dealers want to share their inventory. Each dealer queries the other's DB to find the needed car. Dealer-1's DB relation: NeededCars(model, color, autoTrans) Dealer-2's DB relations: Autos(serial, model, color); Options(serial, option)

537 Example… Dealer 1 queries Dealer 2 for needed cars:
For(each tuple (:m,:c,:a) in NeededCars){
  if(:a = TRUE){ /* automatic transmission wanted */
    SELECT serial FROM Autos, Options
    WHERE Autos.serial = Options.serial AND Options.option = 'autoTrans'
      AND Autos.model = :m AND Autos.color = :c;
  }
  else{ /* automatic transmission not wanted */
    SELECT serial FROM Autos
    WHERE Autos.model = :m AND Autos.color = :c
      AND NOT EXISTS (SELECT * FROM Options
                      WHERE serial = Autos.serial AND option = 'autoTrans');
  }
}

538 Data Warehouse  Sources are translated from their local schema to a global schema and copied to a central DB.  User transparent: user uses Data Warehouse just like an ordinary DB  User is not allowed to update Data Warehouse

539 Warehouse Diagram (Figure: user queries and results flow to and from the Warehouse; a Combiner feeds the warehouse with data produced by Extractors over Source 1 and Source 2.)

540 Example Construct a data warehouse from the source DBs of two car dealers: Dealer-1's schema: Cars(serialNo, model, color, autoTrans, cdPlayer, …) Dealer-2's schema: Autos(serial, model, color); Options(serial, option) Warehouse's schema: AutosWhse(serialNo, model, color, autoTrans, dealer) Extractor — query to extract data from Dealer 1:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars;

541 Example Extractor — queries to extract data from Dealer 2:
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'yes', 'dealer2'
FROM Autos, Options
WHERE Autos.serial = Options.serial AND option = 'autoTrans';
INSERT INTO AutosWhse(serialNo, model, color, autoTrans, dealer)
SELECT serial, model, color, 'no', 'dealer2'
FROM Autos
WHERE NOT EXISTS (SELECT * FROM Options
                  WHERE serial = Autos.serial AND option = 'autoTrans');

542 Construct Data Warehouse There are mainly 3 ways of constructing the data in the warehouse: 1) Periodically reconstruct the warehouse from the current data in the sources, once a night or at even longer intervals. Advantages: simple algorithms. Disadvantages: 1) the warehouse needs to be shut down; 2) data can become out of date.

543 Construct Data Warehouse 2) Update the warehouse periodically based on the changes (e.g., each night) in the sources. Advantages: involves smaller amounts of data (important when the warehouse is large and needs to be modified in a short period). Disadvantages: 1) the process of calculating changes to the warehouse is complex; 2) data can become out of date.

544 Construct Data Warehouse 3) Change the warehouse immediately, in response to each change or a small set of changes at one or more of the sources. Advantages: data won't become out of date. Disadvantages: requires too much communication, and is therefore generally too expensive (practical only for warehouses whose underlying sources change slowly).

545 Mediators  Virtual warehouse, which supports a virtual view or a collection of views, that integrates several sources.  Mediator doesn’t store any data.  Mediators’ tasks: 1)receive user’s query, 2)send queries to wrappers, 3)combine results from wrappers, 4)send the final result to user.

546 A Mediator Diagram (Figure: a user query goes to the Mediator, which sends queries to the Wrappers of Source 1 and Source 2 and combines their results.)

547 Example With the same data sources as in the data warehouse example, the mediator integrates the two dealers' sources into a view with schema: AutoMed(serialNo, model, color, autoTrans, dealer) Suppose the user poses the query: SELECT serialNo, model FROM AutoMed WHERE color = 'red';

548 Example In this simple case, the mediator forwards the same query to each of the two wrappers. Wrapper 1: Cars(serialNo, model, color, autoTrans, cdPlayer, …) SELECT serialNo, model FROM Cars WHERE color = 'red'; Wrapper 2: Autos(serial, model, color); Options(serial, option) SELECT serial, model FROM Autos WHERE color = 'red'; The mediator interprets serial as serialNo, and then returns the union of these sets of data to the user.

549 Example There may be different options for how the mediator forwards the user query. For example, the user asks whether there is a car of a specific model and color (e.g., a blue Gobi). The mediator decides whether the 2nd query is needed based on the result of the 1st: if dealer 1 has the specific car, the mediator doesn't have to query dealer 2.

550 Chapter 21 Information Integration 21.3 Wrappers in Mediator-Based Systems Presented by: Kai Zhu Professor: Dr. T.Y. Lin Class ID: 220

551 Intro Templates for Query patterns Wrapper Generator Filter

552 Wrappers in Mediator-Based Systems These are more complicated than the wrappers in most data warehouse systems. They must be able to accept a variety of queries from the mediator and translate them into the terms of the source, then communicate the result back to the mediator.

553 How to design a wrapper? Classify the possible queries that the mediator can ask into templates, which are queries with parameters that represent constants.

554 Templates for Query Patterns: Use the notation T => S to express the idea that the template T is turned by the wrapper into the source query S.

555 Example 1 Dealer 1: Cars(serialNo, model, color, autoTrans, navi, …) For use by a mediator with schema AutosMed(serialNo, model, color, autoTrans, dealer)

556 If we denote the code representing the color by the parameter $c, then the template will be:
SELECT * FROM AutosMed WHERE color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars WHERE color = '$c';
(Template T => Source query S)
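As a small illustration of how a wrapper's driver might instantiate such a template, the following hedged Python sketch fills the $-parameter into the stored source query with string substitution (Python's string.Template conveniently uses the same $c notation as the slides):

from string import Template

source_query = Template(
    "SELECT serialNo, model, color, autoTrans, 'dealer1' "
    "FROM Cars WHERE color='$c';")

print(source_query.substitute(c="blue"))
# SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color='blue';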

557 There will be 2^n templates in total if we have the option of specifying n attributes.

558 Wrapper Generators The wrapper generator creates a table that holds the various query patterns contained in the templates, together with the source queries associated with each.

559 A driver is used in each wrapper; the tasks of the driver are to: accept a query from the mediator; search the table for a template that matches the query; send the source query to the source, again using a “plug-in” communication mechanism; and have the wrapper process the response.

560 Filter A wrapper can include a filter to support more queries.

561 Example 2 A wrapper could be designed with a more complicated template whose queries specify both model and color:
SELECT * FROM AutosMed WHERE model = '$m' AND color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1'
FROM Cars WHERE model = '$m' AND color = '$c';

562 Now suppose the only template we have is the color template, but the wrapper is asked by the mediator to find blue cars of the Gobi model.

563 Solution: 1. Use the template with $c = 'blue' to find all blue cars and store them in a temporary relation: TemAutos(serialNo, model, color, autoTrans, dealer). 2. The wrapper then returns to the mediator the desired set of automobiles by executing the local query: SELECT * FROM TemAutos WHERE model = 'Gobi';

564 Sections 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION

565 Presentation Outline  21.4 Capability Based Optimization  21.4.1The Problem of Limited Source Capabilities  21.4.2 A notation for Describing Source Capabilities  21.4.3 Capability-Based Query-Plan Selection  21.4.4 Adding Cost-Based Optimization  21.5 Optimizing Mediator Queries  21.5.1 Simplified Adornment Notation  21.5.2 Obtaining Answers for Subgoals  21.5.3 The Chain Algorithm  21.5.4 Incorporating Union Views at the Mediator

566 21.4 Capability Based Optimization Introduction – A typical DBMS estimates the cost of each query plan and picks what it believes to be the best. – A mediator, by contrast, has no knowledge of how long its sources will take to answer. – So optimization of mediator queries cannot rely on cost measures alone to select a query plan. – Optimization by a mediator follows capability-based optimization.

567 21.4.1 The Problem of Limited Source Capabilities Many sources have only Web Based interfaces Web sources usually allow querying through a query form E.g. Amazon.com interface allows us to query about books in many different ways. But we cannot ask questions that are too general – E.g. Select * from books;

568 21.4.1 The Problem of Limited Source Capabilities (con’t) Reasons why a source may limit the ways in which queries can be asked – Earliest database did not use relational DBMS that supports SQL queries – Indexes on large database may make certain queries feasible, while others are too expensive to execute – Security reasons E.g. Medical database may answer queries about averages, but won’t disclose details of a particular patient's information

569 21.4.2 A Notation for Describing Source Capabilities  For relational data, the legal forms of queries are described by adornments  Adornments – Sequences of codes that represent the requirements for the attributes of the relation, in their standard order  f(free) – attribute can be specified or not  b(bound) – must specify a value for an attribute but any value is allowed  u(unspecified) – not permitted to specify a value for a attribute

570 21.4.2 A notation for Describing Source Capabilities….(cont’d)  c[S](choice from set S) means that a value must be specified and value must be from finite set S.  o[S](optional from set S) means either do not specify a value or we specify a value from finite set S  A prime (f’) specifies that an attribute is not a part of the output of the query  A capabilities specification is a set of adornments  A query must match one of the adornments in its capabilities specification

571 21.4.2 A notation for Describing Source Capabilities….(cont’d)  E.g. Dealer 1 is a source of data in the form: Cars (serialNo, model, color, autoTrans, navi) The adornment for this query form is b’uuuu

572 21.4.3 Capability-Based Query-Plan Selection Given a query at the mediator, a capability based query optimizer first considers what queries it can ask at the sources to help answer the query The process is repeated until: – Enough queries are asked at the sources to resolve all the conditions of the mediator query and therefore query is answered. Such a plan is called feasible. – We can construct no more valid forms of source queries, yet still cannot answer the mediator query. It has been an impossible query.

573 21.4.3 Capability-Based Query-Plan Selection (cont'd) The simplest form of mediator query for which we need to apply the above strategy is a join of relations. E.g., we have sources for dealer 2: – Autos(serial, model, color) – Options(serial, option) Suppose that ubf is the sole adornment for Autos, and Options has two adornments, bu and uc[autoTrans, navi]. The query: find the serial numbers and colors of Gobi models with a navigation system.

574 21.4.4 Adding Cost-Based Optimization Mediator’s Query optimizer is not done when the capabilities of the sources are examined Having found feasible plans, it must choose among them Making an intelligent, cost based query optimization requires that the mediator knows a great deal about the costs of queries involved Sources are independent of the mediator, so it is difficult to estimate the cost

575 21.5 Optimizing Mediator Queries Chain algorithm – a greedy algorithm that finds a way to answer the query by sending a sequence of requests to its sources. – It will always find a solution, assuming at least one solution exists. – The solution may not be optimal.

576 21.5.1 Simplified Adornment Notation A query at the mediator is limited to b (bound) and f (free) adornments. We use the following convention for describing adornments: – name adornments (attributes) – where: name is the name of the relation the number of adornments = the number of attributes

577 21.5.2 Obtaining Answers for Subgoals Rules for subgoals and sources: – Suppose we have a subgoal R^{x1 x2 … xn}(a1, a2, …, an), and the source adornment for R is y1 y2 … yn: If yi is b or c[S], then xi = b. If xi = f, then yi is not output-restricted. – The adornment on the subgoal matches the adornment at the source if: whenever yi is f, u, or o[S], xi is either b or f.

578 21.5.3 The Chain Algorithm Maintains 2 types of information: – An adornment for each subgoal. – A relation X that is the join of the relations for all the subgoals that have been resolved. Initially, the adornment for a subgoal is b iff the mediator query provides a constant binding for the corresponding argument of that subgoal. Initially, X is a relation over no attributes, containing just an empty tuple.

579 21.5.3 The Chain Algorithm (con't)  First, initialize the adornments of the subgoals and X.  Then repeatedly select a subgoal that can be resolved. Let R^α(a1, a2, …, an) be the subgoal: 1. Wherever α has a b, the corresponding argument of R is either a constant or a variable in the schema of X.  Project X onto those of its variables that appear in R.

580 21.5.3 The Chain Algorithm (con't) 2. For each tuple t in the projection of X, issue a query to the source as follows (β is the source adornment): – If a component of β is b, then the corresponding component of α is b, and we can use the corresponding component of t in the source query. – If a component of β is c[S], and the corresponding component of t is in S, then the corresponding component of α is b, and we can use the corresponding component of t in the source query. – If a component of β is f, and the corresponding component of α is b, provide a constant value for the source query.

581 21.5.3 The Chain Algorithm (con't) – If a component of β is u, then provide no binding for this component in the source query. – If a component of β is o[S], and the corresponding component of α is f, then treat it as if it were f. – If a component of β is o[S], and the corresponding component of α is b, then treat it as if it were c[S]. 3. Every variable among a1, a2, …, an is now bound. For each remaining unresolved subgoal, change its adornment so any position holding one of these variables is b.

582 21.5.3 The Chain Algorithm (con't) 4. Replace X with X ⋈ π_S(R^α), where S is the set of all the variables among a1, a2, …, an. 5. Project out of X all components that correspond to variables that do not appear in the head or in any unresolved subgoal. If every subgoal is resolved, then X is the answer. If some subgoal remains unresolved, the algorithm fails.

583 21.5.3 The Chain Algorithm Example Mediator query: – Q: Answer(c) ← R^bf(1,a) AND S^ff(a,b) AND T^ff(b,c)
Relation:    R            S              T
Adornment:   bf           c'[2,3,5]f     bu
Data:        R(w,x):      S(x,y):        T(y,z):
             (1,2)        (2,4)          (4,6)
             (1,3)        (3,5)          (5,7)
             (1,4)                       (5,8)

584 21.5.3 The Chain Algorithm Example (con't) Initially, the adornments on the subgoals are the same as in Q, and X contains only an empty tuple. – S and T cannot be resolved, because their ff adornments do not match the sources', which each require a b or a c. – R(1,a) can be resolved, because its adornment is matched by the source's adornment. So send R(w,x) with w = 1 to get the table shown on the previous slide.

585 21.5.3 The Chain Algorithm Example (con't) Project the subgoal's relation onto its second component, since only the second component of R(1,a) is a variable. This projection, a = {2, 3, 4}, is joined with X, so X now equals this relation. Change the adornment on S from ff to bf.

586 21.5.3 The Chain Algorithm Example (con't) Now we resolve S^bf(a,b): – Project X onto a. – Search S for tuples whose a-value matches an a-value in X, giving {(2,4), (3,5)}. Join this relation with X, and remove a, because it appears neither in the head nor in any unresolved subgoal, leaving b = {4, 5}.

587 21.5.3 The Chain Algorithm Example (con't) Now we resolve T^bf(b,c), obtaining the tuples {(4,6), (5,7), (5,8)}. Join this relation with X and project onto the c attribute to get the relation for the head. The solution is {(6), (7), (8)}.
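The whole chain of source requests for this example can be sketched in a few lines of Python. The source_* functions are hypothetical stand-ins for the adorned sources (each can only be called with the bindings its adornment allows); the sets R, S, T hold the data from the table above:

R = {(1,2),(1,3),(1,4)}; S = {(2,4),(3,5)}; T = {(4,6),(5,7),(5,8)}

def source_R(w):                 # adornment bf: w must be bound
    return {x for (w2, x) in R if w2 == w}
def source_S(x):                 # adornment c'[2,3,5]f: x bound, from {2,3,5}
    assert x in {2, 3, 5}
    return {y for (x2, y) in S if x2 == x}
def source_T(y):                 # adornment bu: y bound, z cannot be specified
    return {z for (y2, z) in T if y2 == y}

X = source_R(1)                                             # resolve R(1,a): {2, 3, 4}
X = {y for a in X if a in {2, 3, 5} for y in source_S(a)}   # resolve S^bf: {4, 5}
answer = {z for b in X for z in source_T(b)}                # resolve T^bf
print(sorted(answer))                                       # [6, 7, 8]

Note how the binding a = 4 simply contributes nothing, since 4 is not in the choice set of S's c[2,3,5] adornment.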

588 21.5.4 Incorporating Union Views at the Mediator This implementation of the Chain Algorithm does not consider that several sources can contribute tuples to a relation. If specific sources have tuples to contribute that other sources may not have, it adds complexity. To resolve this, we can consult all sources, or make best efforts to return all the answers.

589 21.5.4 Incorporating Union Views at the Mediator (con’t) Consulting All Sources – We can only resolve a subgoal when each source for its relation has an adornment matched by the current adornment of the subgoal. – Less practical because it makes queries harder to answer and impossible if any source is down. Best Efforts – We need only 1 source with a matching adornment to resolve a subgoal. – Need to modify chain algorithm to revisit each subgoal when that subgoal has new bound requirements.

590 Local-as-View Mediators Priya Gangaraju(Class Id:203)

591 Local-as-View Mediators In a LAV mediator, the global predicates defined are not views of the source data. Instead, for each source, expressions are defined involving the global predicates that describe the tuples the source is able to produce. Queries are answered at the mediator by discovering all possible ways to construct the query using the views provided by the sources.

592 Motivation for LAV Mediators Sometimes the relationship between what the mediator should provide and what the sources provide is more subtle. For example, consider the predicate Par(c, p), meaning that p is a parent of c, which represents the set of all child-parent facts that could ever exist. The sources will provide information about whatever child-parent facts they know.

593 Motivation (contd.) There can be sources that provide child-grandparent facts but no child-parent facts at all. Such a source could never be used to answer a child-parent query under GAV mediation. LAV mediators allow us to say that a certain source provides grandparent facts, and they help discover how and when to use that source in a given query.

594 Terminology for LAV Mediation. The queries at the mediator and the queries that describe the source will be single Datalog rules. A query that is a single Datalog rule is often called a conjunctive query. The global predicates of the LAV mediator are used as the subgoals of mediator queries. There are conjunctive queries that define views.

595 Contd.. Their heads each have a unique view predicate that is the name of a view. Each view definition has a body consisting of global predicates and is associated with a particular source. It is assumed that each view can be constructed with an all-free adornment.

596 Example Consider the global predicate Par(c, p), meaning that p is a parent of c. One source produces parent facts. Its view is defined by the conjunctive query: V1(c, p) ← Par(c, p) Another source produces some grandparent facts. Its conjunctive query is: V2(c, g) ← Par(c, p) AND Par(p, g)

597 Example (contd.) The query at the mediator asks for the great-grandparent facts that can be obtained from the sources. The mediator query is: Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z) One solution uses the parent predicate (V1) directly, three times: Q(w, z) ← V1(w, x) AND V1(x, y) AND V1(y, z)

598 Example (contd.) Other solutions use V1 (parent facts) together with V2 (grandparent facts): Q(w, z) ← V1(w, x) AND V2(x, z) or Q(w, z) ← V2(w, y) AND V1(y, z)

599 Expanding Solutions Consider a query Q and a solution S whose body's subgoals are views, where each view V is defined by a conjunctive query with V as the head. The body of V's conjunctive query can be substituted for any subgoal of S that uses the predicate V, yielding a body consisting only of global predicates.

600 Expansion Algorithm A solution S has a subgoal V(a1, a2, …, an), where the ai's can be any variables or constants. The view V is of the form: V(b1, b2, …, bn) ← B where B represents the entire body. V(a1, a2, …, an) can be replaced in solution S by a version of the body B whose variables have possibly been altered.

601 Expansion Algorithm (contd.)  The rules for altering the variables of B are: 1. First identify the local variables of B — variables that appear in the body but not in the head. 2. If any local variable of B appears in S, replace it by a distinct new variable that appears nowhere in the rule for V or in S. 3. In the body B, replace each bi by ai, for i = 1, 2, …, n.

602 Example Consider the view definitions: V1(c, p) ← Par(c, p) V2(c, g) ← Par(c, p) AND Par(p, g) One of the proposed solutions S is: Q(w, z) ← V1(w, x) AND V2(x, z) The first subgoal, with predicate V1, can be expanded as Par(w, x), as V1 has no local variables.

603 Example (contd.) The V2 subgoal has a local variable p, which appears neither in S nor as a local variable in another substitution, so p can be left as it is. Only x and z need to be substituted for the variables c and g. The solution S now becomes: Q(w, z) ← Par(w, x) AND Par(x, p) AND Par(p, z)

604 Containment of Conjunctive Queries  A containment mapping from Q to E is a function τ from the variables of Q to the variables and constants of E, such that: 1. If x is the ith argument of the head of Q, then τ(x) is the ith argument of the head of E. 2. If P(x1, x2, …, xn) is a subgoal of Q, then P(τ(x1), τ(x2), …, τ(xn)) is a subgoal of E. We also add to τ the rule that τ(c) = c for any constant c.
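As a sketch of how this test could be mechanized, the following brute-force search reuses the hypothetical (predicate, args) atom representation assumed earlier; it is worst-case exponential, so it is purely illustrative rather than an optimized implementation.

```python
# Brute-force search for a containment mapping tau from Q to E.
def containment_mapping(q_head, q_body, e_head, e_body):
    """Return a dict mapping Q's variables to E's terms, or None."""
    tau = {}

    def unify(q_args, e_args):
        """Extend tau so each Q-term maps to the paired E-term; return
        the newly bound variables, or None on conflict."""
        added = []
        for qa, ea in zip(q_args, e_args):
            if not isinstance(qa, str):      # constants: tau(c) = c
                if qa != ea:
                    return None
            elif qa in tau:
                if tau[qa] != ea:
                    return None
            else:
                tau[qa] = ea
                added.append(qa)
        return added

    def undo(added):
        for v in added:
            del tau[v]

    def search(i):
        """Map Q's i-th subgoal onto some subgoal of E, then recurse."""
        if i == len(q_body):
            return True
        pred, q_args = q_body[i]
        for e_pred, e_args in e_body:
            if e_pred != pred:
                continue
            added = unify(q_args, e_args)
            if added is None:
                continue
            if search(i + 1):
                return True
            undo(added)
        return False

    if unify(q_head, e_head) is None:
        return None
    return tau if search(0) else None

# For the example on the next slide: Q1 = H(x,y) <- A(x,z) AND B(z,y),
# Q2 = H(a,b) <- A(a,c) AND B(d,b) AND A(a,d):
tau = containment_mapping(("x", "y"),
                          [("A", ("x", "z")), ("B", ("z", "y"))],
                          ("a", "b"),
                          [("A", ("a", "c")), ("B", ("d", "b")), ("A", ("a", "d"))])
print(tau)  # {'x': 'a', 'y': 'b', 'z': 'd'}
```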

605 Example. Consider two conjunctive queries: Q1: H(x, y) ← A(x, z) AND B(z, y) Q2: H(a, b) ← A(a, c) AND B(d, b) AND A(a, d) When we apply the substitution τ(x) = a, τ(y) = b, τ(z) = d, the head of Q1 becomes H(a, b), which is the head of Q2. So there is a containment mapping from Q1 to Q2.

606 Example contd.. The first subgoal of Q1 becomes A(a, d), which is the third subgoal of Q2. The second subgoal of Q1 becomes the second subgoal of Q2. There is also a containment mapping from Q2 to Q1, so the two conjunctive queries are equivalent.

607 Why the Containment-Mapping Test Works  Suppose there is a containment mapping τ from Q1 to Q2.  When Q2 is applied to the database, we look for substitutions σ for all the variables of Q2.  The substitution for the head becomes a tuple t that is returned by Q2.  If we compose τ with σ, we have a mapping from the variables of Q1 to tuples of the database that produces the same tuple t for the head of Q1.

608 Finding Solutions to a Mediator Query  There can be an infinite number of solutions built from the views, using any number of subgoals and variables.  The LMSS Theorem limits the search; it states that if a query Q has n subgoals, then any answer produced by any solution is also produced by a solution that has at most n subgoals.  Also, if the conjunctive query that defines a view V has in its body a predicate P that does not appear in the body of the mediator query, then we need not consider any solution that uses V.

609 Example. Recall the query Q1: Q(w, z) ← Par(w, x) AND Par(x, y) AND Par(y, z) This query has three subgoals, so we do not have to look at solutions with more than three subgoals.
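Putting the pieces together under the same assumptions as the earlier sketches: a hypothetical solver would enumerate candidates with at most n view subgoals (the enumeration itself is elided here), expand each with the expand_subgoal sketch, and keep those whose expansion E admits a containment mapping from Q, so that E ⊆ Q.

```python
# Check one candidate solution against Q, reusing expand_subgoal, V1, V2,
# and containment_mapping from the sketches above.
q_head = ("w", "z")
q_body = [("Par", ("w", "x")), ("Par", ("x", "y")), ("Par", ("y", "z"))]

# Candidate: Q(w,z) <- V1(w,x) AND V2(x,z), expanded to global predicates.
E = expand_subgoal(("w", "x"), *V1) + expand_subgoal(("x", "z"), *V2)

# The candidate is a valid solution exactly when there is a containment
# mapping from Q to its expansion E (i.e., E is contained in Q).
print(containment_mapping(q_head, q_body, ("w", "z"), E) is not None)  # True
```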

610 Why the LMSS Theorem Holds Suppose we have a query Q with n subgoals and there is a solution S with more than n subgoals. The expansion E of S must be contained in the query Q, which means that there is a containment mapping from Q to E. We remove from S all subgoals whose expansion was not the target of one of Q's subgoals under the containment mapping.

611 Contd.. We would then have a new conjunctive query S' with at most n subgoals. If E' is the expansion of S', then E' ⊆ Q, so S' is also a solution. Moreover, S ⊆ S', since the identity mapping serves as a containment mapping from S' to S. Thus S need not be among the solutions to query Q: any answer it produces, S' produces.

612 INFORMATION INTEGRATION Presenter: Namrata Buddhadev (104_224_21.6.1-21.6.7) Professor: Dr T Y Lin

613 Index 21.6 Local-as-View Mediators 21.6.1 Motivation for LAV Mediators 21.6.2 Terminology for LAV Mediators 21.6.3 Expanding Solutions 21.6.4 Containment of Conjunctive Queries 21.6.5 Why the Containment-Mapping Test Works 21.6.6 Finding Solutions to a Mediator Query 21.6.7 Why the LMSS Theorem Holds

614 Local-as-View Mediators GAV: Global-as-view mediators are like views; the data does not exist physically, but pieces of it are constructed by the mediator by asking queries of the sources. LAV: Local-as-view mediators define the global predicates at the mediator, but we do not define these predicates as views of the source data. Instead, for each source we give expressions involving the global predicates that describe the tuples the source is able to produce, and queries are answered at the mediator by discovering all possible ways to construct the query using the views provided by the sources.

615 Motivation for LAV Mediators LAV mediators help us to discover how and when to use a source in a given query. Example: Par(c,p) -> a GAV view of Par(c,p) gives information about children and parents but not about grandparents; a LAV description over Par(c,p) helps us get child-parent and even grandparent information.

616 Terminology for LAV Mediation LAV mediation is expressed in a form of logic that serves as the language for defining views. Datalog is used, and it remains common to both mediator queries and source descriptions; such a query is known as a conjunctive query. The global predicates of the LAV mediator are the subgoals of mediator queries. Conjunctive queries define the views: each has a unique view predicate as its head, has a body of global predicates, and is associated with a particular source.

617 Example: Par(c,p) -> global predicate; views defined by conjunctive queries: 1. V1(c,p) <- Par(c,p) 2. Another source produces: V2(c,g) <- Par(c,p) AND Par(p,g) The query at the mediator asks for great-grandparent facts: Q(w,z) <- Par(w,x) AND Par(x,y) AND Par(y,z) Solutions in terms of the views: 1. Q(w,z) <- V1(w,x) AND V2(x,z) 2. Or Q(w,z) <- V2(w,y) AND V1(y,z)

618 Expanding Solutions Given a query Q and a solution S with subgoals V(a1,a2,..,an) [the ai's can repeat], and a view V(b1,b2,..,bn) <- B (B is the entire body; the bi's are distinct), we can replace V(a1,..,an) in solution S by a version of body B that has the subgoals of B with variables possibly altered. Rules: 1. Find the local variables of B, those that appear in the body but not in the head; a local variable may be renamed as long as the new name does not appear elsewhere in the conjunctive query.

619 2. If any local variable of B also appears in S, or has been used in another substitution, replace it by a distinct new variable that appears nowhere in V or in S. 3. In the body B, replace each bi by ai, for i=1,2,..,n. Example: V(a,b,c,d) <- E(a,b,x,y) AND F(x,y,c,d). Here x and y are local to V, so rename x,y -> e,f, giving V(a,b,c,d) <- E(a,b,e,f) AND F(e,f,c,d). For the subgoal V(x,y,1,x), substitute a -> x, b -> y, c -> 1, d -> x; its expansion has the two subgoals E(x,y,e,f) and F(e,f,1,x).

620 Containment of Conjunctive Queries Let the conjunctive query S be a solution to the mediator query Q, and let E be the expansion of S. For S to produce only answers that Q produces, we need E ⊆ Q. A containment mapping from Q to E is a function Γ from the variables of Q to the variables and constants of E such that: 1. If x is the ith argument of the head of Q, then Γ(x) is the ith argument of the head of E. 2. Add to Γ the rule that Γ(c) = c for any constant c. 3. If P(x1,x2,..,xn) is a subgoal of Q, then P(Γ(x1), Γ(x2),.., Γ(xn)) is a subgoal of E.

621 Example: Queries: P1: H(x,y) <- A(x,z) AND A(z,y) P2: H(a,b) <- A(a,c) AND A(c,d) AND A(d,b) Consider Γ(x)=a and Γ(y)=b. 1. Γ(z) must be c, since the first subgoal A(x,z) of P1 can only map to A(a,c) of P2. 2. But Γ(z) must also be d, since with Γ(y)=b the subgoal A(z,y) of P1 can only become A(d,b) in P2. These requirements conflict, so no containment mapping from P1 to P2 exists.

622 Complexity of the Containment-Mapping Test: It is NP-complete to decide whether there is a containment mapping from one conjunctive query to another. The importance of containment mappings is expressed by the theorem: if Q1 and Q2 are conjunctive queries, then Q2 ⊆ Q1 if and only if there is a containment mapping from Q1 to Q2.

623 Why the Containment-Mapping Test Works: Two questions: 1. If there is a containment mapping, why must there be a containment of conjunctive queries? 2. If there is containment, why must there be a containment mapping?

624 Finding Solutions to a Mediator Query Given a query Q and a solution S, the expansion E of S must be contained in Q. "If a query Q has n subgoals, then any answer produced by any solution is also produced by a solution that has at most n subgoals." This is known as the LMSS Theorem.

625 Example: Q1: Q(w,z) <- Par(w,x) AND Par(x,y) AND Par(y,z) S1: Q(w,z) <- V1(w,x) AND V2(x,z) S2: Q(w,z) <- V1(w,x) AND V2(x,z) AND V1(t,u) AND V2(u,v) By the LMSS Theorem the extra subgoals of S2 are unnecessary. Its expansion is E2: Q(w,z) <- Par(w,x) AND Par(x,p) AND Par(p,z) AND Par(t,u) AND Par(u,q) AND Par(q,v) and E2 ⊆ E1, using the containment mapping that sends each variable of E1 to the same variable in E2.

626 Why the LMSS Theorem Holds Given a query Q with n subgoals and a solution S with more than n subgoals, the expansion E of S must be contained in query Q. Let S' be the solution obtained by removing from S all subgoals that are not targets of Q's subgoals under the containment mapping. If E' is the expansion of S', then E' ⊆ Q, so S' is also a solution; and S ⊆ S' by the identity mapping. Thus S need not be among the solutions to query Q.

627 Information Integration Entity Resolution – 21.7 Presented By: Deepti Bhardwaj Roll No: 223_103

628 Contents 21.7 Entity Resolution  21.7.1 Deciding Whether Records Represent a Common Entity  21.7.2 Merging Similar Records  21.7.3 Useful Properties of Similarity and Merge Functions  21.7.4 The R-Swoosh Algorithm for ICAR Records  21.7.5 Other Approaches to Entity Resolution

629 Introduction Determining whether two records or tuples do or do not represent the same person, organization, place or other entity is called ENTITY RESOLUTION.

630 Deciding Whether Records Represent a Common Entity Two records represent the same individual if the two records have similar values for each of the fields associated with those records. It is not sufficient to require that the values of corresponding fields be identical, for the following reasons: 1. Misspellings 2. Variant Names 3. Misunderstanding of Names

631 Continue: Deciding Whether Records Represent a Common Entity 4. Evolution of Values 5. Abbreviations Thus, when deciding whether two records represent the same entity, we need to look carefully at the kinds of discrepancies that occur and use a test that measures the similarity of records.

632 Deciding Whether Records Represent a Common Entity - Edit Distance The first approach to measuring the similarity of records is edit distance. Values that are strings can be compared by counting the number of insertions and deletions of characters it takes to turn one string into another. The records are then deemed to represent the same entity if their edit distance is below a given threshold.
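As an illustration, an insert/delete edit distance can be computed by dynamic programming; the sketch below counts only insertions and deletions, matching the definition just given, and the threshold is a hypothetical choice.

```python
# Insert/delete edit distance by dynamic programming (a sketch).
def edit_distance(s, t):
    """Number of character insertions and deletions to turn s into t."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances from "" to t[:j]
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1]   # characters match, no edit needed
            else:
                cur[j] = min(prev[j] + 1,     # delete s[i-1]
                             cur[j - 1] + 1)  # insert t[j-1]
        prev = cur
    return prev[n]

THRESHOLD = 3  # hypothetical threshold
print(edit_distance("Smythe", "Smith") <= THRESHOLD)  # True: likely variants
```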

633 Deciding Whether Records Represent a Common Entity - Normalization We can normalize records by replacing certain substrings with others. For instance, we can use a table of abbreviations and replace each abbreviation by what it normally stands for. Once normalized, we can use the edit distance to measure the difference between the normalized values in the fields.
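A minimal normalization sketch follows; the abbreviation table is hypothetical, and the final comparison reuses the edit_distance sketch above.

```python
# Normalize by expanding abbreviations, then compare normalized values.
ABBREVIATIONS = {"St.": "Street", "Ave.": "Avenue", "Rd.": "Road"}  # hypothetical table

def normalize(value):
    for abbr, full in ABBREVIATIONS.items():
        value = value.replace(abbr, full)
    return " ".join(value.lower().split())  # also collapse case and spacing

print(edit_distance(normalize("456 Maple St."), normalize("456 maple street")))  # 0
```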

634 Merging Similar Records Merging means replacing two records that are similar enough by one single record that contains the information of both. There are many possible merge rules: 1. Set any field in which the records disagree to the empty string. 2. (i) Merge by taking the union of the values in each field. (ii) Declare two records similar if at least two of the three fields have a nonempty intersection.

635 Continue: Merging Similar Records
Name | Address | Phone
1. Susan | 123 Oak St. | 818-555-1234
2. Susan | 456 Maple St. | 818-555-1234
3. Susan | 456 Maple St. | 213-555-5678
After merging:
(1-2-3) Susan | {123 Oak St., 456 Maple St.} | {818-555-1234, 213-555-5678}
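Below is a sketch of merge rule 2 and its companion two-of-three similarity test, holding each field as a set of alternative values; the field names follow the Susan example and the function names are illustrative.

```python
# Union-style merge and the two-of-three similarity rule.
FIELDS = ("name", "address", "phone")

def merge(r, s):
    """Merge two records by taking the union of the values in each field."""
    return {f: r.get(f, set()) | s.get(f, set()) for f in FIELDS}

def similar(r, s):
    """Similar iff at least two of the three fields intersect nonemptily."""
    return sum(bool(r.get(f, set()) & s.get(f, set())) for f in FIELDS) >= 2

r1 = {"name": {"Susan"}, "address": {"123 Oak St."}, "phone": {"818-555-1234"}}
r2 = {"name": {"Susan"}, "address": {"456 Maple St."}, "phone": {"818-555-1234"}}
r3 = {"name": {"Susan"}, "address": {"456 Maple St."}, "phone": {"213-555-5678"}}

m = merge(merge(r1, r2), r3)
print(m["address"])  # {'123 Oak St.', '456 Maple St.'}
print(m["phone"])    # {'818-555-1234', '213-555-5678'}
```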

636 Useful Properties of Similarity and Merge Functions The following properties say that the merge operation is a semilattice: 1. Idempotence: the merge of a record with itself should surely be that record. 2. Commutativity: if we merge two records, the order in which we list them should not matter. 3. Associativity: the order in which we group records for a merger should not matter.

637 Continue: Useful Properties of Similarity and Merge Functions There are other properties that we expect the similarity relationship to have: Idempotence for similarity: a record is always similar to itself. Commutativity of similarity: in deciding whether two records are similar, it does not matter in which order we list them. Representability: if r is similar to some other record s, but s is instead merged with some other record t, then r remains similar to the merger of s and t and can be merged with that record.
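A quick sanity check that the union merge sketched above satisfies the semilattice properties on the sample records:

```python
# Set union is idempotent, commutative, and associative, so the union
# merge inherits these properties on records over the same fields.
assert merge(r1, r1) == r1                                   # idempotence
assert merge(r1, r2) == merge(r2, r1)                        # commutativity
assert merge(merge(r1, r2), r3) == merge(r1, merge(r2, r3))  # associativity
```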

638 R-Swoosh Algorithm for ICAR Records
Input: a set of records I, a similarity function, and a merge function.
Output: a set of merged records O.
Method:
O := emptyset;
WHILE I is not empty DO BEGIN
  let r be any record in I;
  find, if possible, some record s in O that is similar to r;
  IF no such record s exists THEN
    move r from I to O
  ELSE BEGIN
    delete r from I;
    delete s from O;
    add the merger of r and s to I;
  END;
END;
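The following is a direct Python rendering of the pseudocode above, reusing the hypothetical similar and merge sketches from the merging example; it assumes the ICAR properties hold, so the order in which records are processed does not affect the result.

```python
# R-Swoosh: repeatedly merge until no pair of output records is similar.
def r_swoosh(records, similar, merge):
    I = list(records)  # records not yet processed
    O = []             # resolved records, pairwise non-similar
    while I:
        r = I.pop()
        s = next((s for s in O if similar(r, s)), None)
        if s is None:
            O.append(r)            # r matches nothing resolved so far
        else:
            O.remove(s)            # the merger must be re-processed, since
            I.append(merge(r, s))  # it may now be similar to other records
    return O

print(len(r_swoosh([r1, r2, r3], similar, merge)))  # 1: all three Susans merge
```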

639 Other Approaches to Entity Resolution The other approaches to entity resolution are: – Non-ICAR Datasets – Clustering – Partitioning

640 Other Approaches to Entity Resolution - Non-ICAR Datasets Non-ICAR datasets: we can define a dominance relation r <= s that means record s contains all the information contained in record r. If so, then we can eliminate record r from further consideration.

641 Other Approaches to Entity Resolution - Clustering Clustering: sometimes we group the records into clusters such that members of a cluster are in some sense similar to each other and members of different clusters are not similar.

642 Other Approaches to Entity Resolution - Partitioning Partitioning: we can group the records, perhaps several times, into groups that are likely to contain similar records, and look only within each group for pairs of similar records.

