Database Management 6. course: OS and DBMS


1 Database Management 6. course

2 OS and DBMS (Diagram labels: DBMS, DB, OS, DBA, USER, DDL, DML, wishes, rules)

3 Steps of a query 1. SQL query 2. Permission in the schema? 3. Permission in the subschema? 4. I/O operation 5. Search 6. Import 7. Notification 8. User workspace 9. User notification

4 Data storage: disks and files

5 Mass storage device (disc, drive) I/O – READ: disc → memory (RAM) – WRITE: memory → disc – Time consuming

6 Why not store everything in RAM? Cost of 1 GB: RAM 10 € ↔ HDD 0.5 € RAM is volatile Typical way of storage: – Data currently in use is in memory – Secondary storage is on HDD (local server, cloud) – Tertiary storage

7 Storage on disks Unit: disc block Speed depends on location!

8 Components

9 Reading a block Access time of a block: – Seek time – Rotational delay – Transfer time: 1ms/4KB I/O optimization: reducing seek time and rotational delay
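The access time formula on this slide can be illustrated with a small calculation. This is only a sketch: the function name and the 8 ms seek and 4 ms rotational delay figures are assumed for illustration; only the 1 ms per 4 KB transfer rate comes from the slide.

```python
def block_access_time_ms(seek_ms, rotational_delay_ms, block_kb,
                         transfer_ms_per_4kb=1.0):
    """Access time = seek time + rotational delay + transfer time."""
    transfer_ms = (block_kb / 4) * transfer_ms_per_4kb
    return seek_ms + rotational_delay_ms + transfer_ms

# Random read of one 4 KB block (seek and rotation values are assumptions).
random_read = block_access_time_ms(seek_ms=8.0, rotational_delay_ms=4.0, block_kb=4)

# Next block on the same track: no seek, no rotational delay.
sequential_read = block_access_time_ms(seek_ms=0.0, rotational_delay_ms=0.0, block_kb=4)
```

With these assumed numbers the transfer itself is a small part of a random read, which is why I/O optimization targets seek time and rotational delay.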

10 Order of data Frequently used blocks close to each other – Same block – Same track, same cylinder – Adjacent cylinder Reading is sequential Multiple block reading saves time

11 Way of storage - RAID Redundant Array of Inexpensive/Independent Disks Connecting disks logically, storing data redundantly Aims: – Minimizing data loss, increasing reliability – Increasing capacity by using more smaller/cheaper disks – Increasing data access performance – Increasing flexibility (disks can be replaced during usage)

12 Two main techniques Data striping – Data is partitioned (striping unit) – Partitions are distributed on several disks Redundancy – Reconstruction of data
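Data striping can be sketched as a mapping from a logical block number to a disk and a position on that disk. The round-robin layout and the name `locate_block` are assumptions for illustration; real arrays may use other striping units and layouts.

```python
def locate_block(logical_block, num_disks):
    """Round-robin striping: return (disk index, block index on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

# Six consecutive logical blocks spread over three disks.
layout = [locate_block(b, num_disks=3) for b in range(6)]
```

Consecutive blocks land on different disks, so a multi-block read can be served by several disks in parallel.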

13 Level 0 Non redundant If one of the disks fails, data is lost Parallel reading/writing Performance depends on the worst disk

14 Level 1 Mirrored Data can be reconstructed Parallel reading, increased speed Parallel writing, normal speed Performance depends on the worst disk Does not use data striping

15 Level 2 Data striping (unit=1 bit), error-correcting codes ECC: redundant bits calculated from data bits (compress) Not used any more

16 Level 3 Bit-Interleaved Parity Striping unit = 1 bit One check disk with parity information Parity alone cannot identify the failed disk (the disk controller detects it) The failed disk's data can be recovered Can process only one I/O at a time
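How a single parity disk allows recovery can be shown with XOR. The sketch below works on whole byte blocks rather than single bits, and the disk contents are made-up sample values; the function name `parity` is hypothetical.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data disks (sample contents) and one check disk.
data_disks = [b"\x0f\xa0", b"\x33\x11", b"\xf0\x5c"]
check_disk = parity(data_disks)

# Disk 1 fails: XOR the surviving data disks with the parity to rebuild it.
survivors = [data_disks[0], data_disks[2], check_disk]
rebuilt = parity(survivors)
```

The rebuild needs to know which disk failed; the parity only supplies the missing content.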

17 Level 4 Block-Interleaved Parity Like RAID 3, but striping unit = one disk block Supports multiple users Parity disk update can be a bottleneck In case of disk failure, reading speed is reduced

18 Level 5 Block-Interleaved Distributed Parity Rotating parity Parallel read and write Similar to RAID 3 and 4 depending on the size of the striping unit If a disk fails, it has to be replaced immediately

19 RAID 5 Capacity= min_capacity*(no of disks-1) Reading speed=min_speed*(no of disks-1)
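The two formulas on this slide can be evaluated directly. This is a minimal sketch; the function names and the sample disk sizes are assumptions, and the read speed formula is taken from the slide as an idealized estimate.

```python
def raid5_capacity(disk_capacities_gb):
    """Usable capacity = smallest disk * (number of disks - 1)."""
    if len(disk_capacities_gb) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return min(disk_capacities_gb) * (len(disk_capacities_gb) - 1)

def raid5_read_speed(disk_speeds_mb_s):
    """Idealized read speed = slowest disk * (number of disks - 1)."""
    if len(disk_speeds_mb_s) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return min(disk_speeds_mb_s) * (len(disk_speeds_mb_s) - 1)

# Four disks, one of them larger: the extra space on the big disk is unused.
capacity_gb = raid5_capacity([500, 500, 1000, 500])
```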

20 Level 6 Motivation: high probability of a further disk failure during recovery 2 check disks Can recover from up to two disk failures Read and write speed is equal to RAID 5

21 RAID 0+1 and RAID 10 (Figures: RAID 0+1 layout; RAID 10 layout)

22 Disk space and buffering

23 Disk space management The lowest level of DBMS manages the space Unit of data: page Size of page=size of disk block Higher levels can – Allocate and delete pages – Write / read pages Allows higher levels of DBMS to think of the data as a collection of pages

24 Keeping track of free blocks Maintain a list of free blocks with pointer to the first free block OR Maintain a bitmap with one bit for each block: block is used or not
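The bitmap alternative can be sketched as follows. For readability the sketch stores one Python boolean per block instead of packed bits, and the class and method names are hypothetical.

```python
class FreeBlockBitmap:
    """One flag per disk block: True = used, False = free."""

    def __init__(self, num_blocks):
        self.used = [False] * num_blocks

    def allocate(self):
        # Scan for the first free block and mark it as used.
        for block_no, in_use in enumerate(self.used):
            if not in_use:
                self.used[block_no] = True
                return block_no
        raise RuntimeError("no free block")

    def free(self, block_no):
        self.used[block_no] = False

bitmap = FreeBlockBitmap(3)
first = bitmap.allocate()
second = bitmap.allocate()
bitmap.free(first)
reused = bitmap.allocate()
```

Compared with a linked list of free blocks, the bitmap makes it easy to look for runs of adjacent free blocks.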

25 Using OS to manage disk space Possible, not common Not portable: different file systems On 32-bit systems the largest file size is 4 GB OS files cannot span disk devices

26 Buffer manager Data has to be brought into memory (RAM) before it can be used (frame, page) pairs are stored in a table (Diagram: page requests arrive at the BUFFER POOL in memory, which holds disc pages and free frames; the DB is on disc) If a requested page is not in the pool and the pool is full, the buffer manager's replacement policy controls which existing page is replaced.

27 When a request comes… If the page is not in the buffer: – Choose a frame to replace, increase its pin count – If the dirty bit of the replacement frame is on, write its content to the disk – Read the requested page into the replacement frame Return the address of the frame to the requestor If the next requested page can be predicted, multiple pages can be read in advance (pre-fetching)

28 Buffer management The requestor has to unpin the page when it is done with it Mark if the content of the page is modified – With the dirty bit The page in the buffer can be requested multiple times by processes/transactions – Pin_count: page can be replaced if and only if pin_count=0 Concurrency handling and rollback handling can influence the replacement policy
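The request handling and the pin count and dirty bit rules can be put together in a small sketch. Everything here is an assumption for illustration: the class names, the dictionary standing in for the disk, and the placeholder victim choice (first unpinned frame) used instead of a real replacement policy.

```python
class Frame:
    def __init__(self):
        self.page_id = None
        self.data = None
        self.pin_count = 0
        self.dirty = False

class BufferPool:
    def __init__(self, num_frames, disk):
        self.frames = [Frame() for _ in range(num_frames)]
        self.disk = disk          # dict page_id -> content, stands in for the disk
        self.page_table = {}      # page_id -> frame index

    def pin(self, page_id):
        if page_id not in self.page_table:
            idx = self._choose_victim()
            frame = self.frames[idx]
            if frame.page_id is not None:
                if frame.dirty:   # write back a modified page before reuse
                    self.disk[frame.page_id] = frame.data
                del self.page_table[frame.page_id]
            frame.page_id, frame.data, frame.dirty = page_id, self.disk[page_id], False
            self.page_table[page_id] = idx
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty=False):
        frame = self.frames[self.page_table[page_id]]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _choose_victim(self):
        # Placeholder policy: first frame with pin_count == 0.
        for idx, frame in enumerate(self.frames):
            if frame.pin_count == 0:
                return idx
        raise RuntimeError("all frames are pinned")

disk = {"A": "a", "B": "b"}
pool = BufferPool(num_frames=1, disk=disk)
frame = pool.pin("A")
frame.data = "a2"                 # modify the page in the buffer
pool.unpin("A", dirty=True)
pool.pin("B")                     # evicts A, which is written back first
```

Page B stays pinned at the end of the example, so a further request for A would fail until B is unpinned.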

29 Buffer replacement policies Least-recently-used (LRU): tracks which page was used and when (costly) Clock replacement – The current frame is stored – Advances to the next frame until one is found with pin count=0 and referenced bit off (not used) – After the last frame, jumps to the first (like a circle)
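Clock replacement can be sketched as a loop over the frames. This is one common formulation, assumed here: an unpinned frame whose referenced bit is on gets the bit cleared and is skipped once. The function name and the dictionary layout of a frame are hypothetical.

```python
def clock_choose_victim(frames, hand):
    """Return (victim index, new hand position).

    frames: list of dicts with keys 'pin_count' and 'referenced'.
    hand:   index of the current frame.
    """
    n = len(frames)
    # Two sweeps are enough: the first may only clear referenced bits.
    for _ in range(2 * n):
        frame = frames[hand]
        if frame["pin_count"] == 0:
            if frame["referenced"]:
                frame["referenced"] = False   # give it a second chance
            else:
                return hand, (hand + 1) % n
        hand = (hand + 1) % n                 # wrap around like a clock
    raise RuntimeError("all frames are pinned")

frames = [
    {"pin_count": 1, "referenced": False},    # pinned: never a candidate
    {"pin_count": 0, "referenced": True},     # recently used: skipped once
    {"pin_count": 0, "referenced": False},    # chosen
]
victim, hand = clock_choose_victim(frames, hand=0)
```

The approach approximates LRU with one bit per frame instead of a full usage history.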

30 Files and indexes

31 Records in files DBMS handles records and files Files: collection of pages containing records They must support – DML (insert, update, delete) – Read records (identified by record id – rid) – Read all the records (that satisfy some conditions)

32 Unordered (heap) files Simplest file structure DBMS must register – pages in the file – free space in the page – records in the page

33 Heap file as a linked list Every page contains two pointers (Figure: a header page points to two linked lists of data pages: pages with free space and full pages)

34 Disadvantages – If records have variable length, nearly every page is in the list of pages with free space – To insert a record, we must examine several pages before finding enough space

35 Directory-based heap file Maintain a directory of pages DBMS stores the address of the first page of each heap file Directory = collection of pages One entry per data page, with a counter for the amount of free space (Figure: header page, DIRECTORY, data pages 1 to N)

36 Index Read the records sequentially Search for a specific rid Records with specific conditions on their attributes (e.g. all CLERKs) Value-based queries

37 Example, library 1. Locate books of Asimov 2. Search for Foundation

38 Indexed file: Give a search key for the entries (records in files), calculate the index of this key, look for it Goal: speed up search E.g. if I am looking for employees of a given age, I can build an index which might contain (age, rid) pairs The pages of the index files are organized based on the indexes to find the result quickly (access methods)
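The age example can be sketched with a sorted list of (age, rid) pairs searched by binary search. The employee data, the names, and the use of a sorted list in place of a real access method are all assumptions; ages are assumed to be integers.

```python
import bisect

# rid -> record; a rid is (page number, slot number). Sample data only.
employees = {
    (1, 0): {"name": "Smith", "age": 31},
    (1, 1): {"name": "Jones", "age": 45},
    (2, 0): {"name": "Blake", "age": 31},
}

# The index: (search key, rid) pairs kept in key order.
age_index = sorted((rec["age"], rid) for rid, rec in employees.items())

def rids_with_age(index, age):
    """Binary search for the run of entries whose key equals age."""
    lo = bisect.bisect_left(index, (age,))
    hi = bisect.bisect_left(index, (age + 1,))
    return [rid for _, rid in index[lo:hi]]

matches = rids_with_age(age_index, 31)
```

Only the matching rids are returned; the records themselves are then fetched by rid instead of scanning the whole file.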

39 Access methods B trees B+ trees Hash-based structures Discussed in detail later

40 Page formats Data as a collection of records Page ~ collection of slots, each slot contains a record Record identification: – (page number, slot number) = rid – Number every record and store its location in a table

41 Fixed-length records All records have the same length Insertion: locate empty slot, place there Main issue: – Keep track of empty slots – Locate all records on a page

42 Deletion alternatives – first option Store records in the first N slots without gaps If a record is deleted, the last record is moved to the gap Advantage: finding a location is easy (just offset calculation) The empty slots remain together at the end of the page Disadvantage: if the moved record is referenced externally, those references break because its rid changes
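The first deletion option can be sketched with a fixed number of slots and a record counter. The class name and the string records are hypothetical.

```python
class PackedPage:
    """Fixed-length records kept in the first `count` slots without gaps."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.count = 0

    def insert(self, record):
        if self.count == len(self.slots):
            raise RuntimeError("page is full")
        self.slots[self.count] = record    # first empty slot is always at `count`
        self.count += 1
        return self.count - 1              # slot number of the new record

    def delete(self, slot_no):
        if not 0 <= slot_no < self.count:
            raise IndexError("no record in this slot")
        last = self.count - 1
        self.slots[slot_no] = self.slots[last]   # move the last record into the gap
        self.slots[last] = None
        self.count -= 1

page = PackedPage(capacity=4)
for name in ["a", "b", "c"]:
    page.insert(name)
page.delete(0)    # "c" moves from slot 2 to slot 0, so its rid changes
```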

43 Second option Using an array of bits, one bit/slot If a record is deleted, its bit turns off Summary: Every page contains additional file-level info

44 Variable-length records To insert a new record, a free space is needed that is large enough but not too big (do not waste space) If a record is deleted, move the others to fill the hole Most flexible organization: directory of slots for each page

45 Directory of slots Offset (pointer) and length of the records are stored Deletion: set offset to -1 Records can be moved since rid=(page number,slot number[position in the directory]) does not change Only the record offset changes

46 The offset of the free space is stored When a new record is inserted and there is not enough contiguous space, records are moved If a record is deleted, the slot numbers of the remaining records cannot be changed due to external references If a record is inserted, it should be given an unused slot number
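The directory of slots can be sketched as a page with a byte area and a list of (offset, length) entries. The sketch is simplified under stated assumptions: the directory itself is not charged against the page size, and the class and method names are hypothetical.

```python
class SlottedPage:
    def __init__(self, size):
        self.data = bytearray(size)
        self.free_offset = 0      # start of the free space
        self.slots = []           # slot number -> (offset, length); offset -1 = deleted

    def insert(self, record):
        if self.free_offset + len(record) > len(self.data):
            self.compact()        # move records together to reclaim holes
        if self.free_offset + len(record) > len(self.data):
            raise RuntimeError("not enough space on page")
        offset = self.free_offset
        self.data[offset:offset + len(record)] = record
        self.free_offset += len(record)
        entry = (offset, len(record))
        for slot_no, (off, _) in enumerate(self.slots):
            if off == -1:         # reuse a slot number left by a deletion
                self.slots[slot_no] = entry
                return slot_no
        self.slots.append(entry)
        return len(self.slots) - 1

    def delete(self, slot_no):
        self.slots[slot_no] = (-1, 0)

    def read(self, slot_no):
        offset, length = self.slots[slot_no]
        if offset == -1:
            raise KeyError("slot is empty")
        return bytes(self.data[offset:offset + length])

    def compact(self):
        packed, pos = bytearray(len(self.data)), 0
        for slot_no, (offset, length) in enumerate(self.slots):
            if offset != -1:
                packed[pos:pos + length] = self.data[offset:offset + length]
                self.slots[slot_no] = (pos, length)   # only the offset changes
                pos += length
        self.data, self.free_offset = packed, pos

page = SlottedPage(size=10)
a = page.insert(b"aaaa")
b = page.insert(b"bbbb")
page.delete(a)
c = page.insert(b"cccc")   # forces compaction; b keeps its slot number
```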


48 Record formats Number of fields and field types are stored in the system catalog

49 Fixed-length records Each field has fixed length (uniform for every record) From the offset of the record, the offset of each field can be calculated easily: Fields F1, F2, F3, F4 with lengths L1, L2, L3, L4 start at base address B; address of F3 = B+L1+L2
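The address calculation for fixed-length fields is a running sum of the field lengths. In this sketch the function name, the base address, and the field lengths are sample values chosen for illustration.

```python
from itertools import accumulate

def field_addresses(base, lengths):
    """Address of field i = base + sum of the lengths of the fields before it."""
    offsets = [0] + list(accumulate(lengths))[:-1]
    return [base + off for off in offsets]

# B = 1000, L1..L4 = 4, 20, 8, 2 (sample values)
addresses = field_addresses(1000, [4, 20, 8, 2])
```

The third entry equals B+L1+L2, matching the formula on the slide. In a real system the lengths would come from the system catalog.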

50 Variable-length records Variable length fields (e.g. varchar2) Two formats: – Separators are used: scan of the record is needed for reading the fields – Array of integer offsets at the beginning of the record to store the relative position of the fields and the end of the record:

51 The offset of the end of the record is stored Disadvantage – Storage overhead Advantages – Direct access to the fields – NULL: start of the field=end of the field
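The second format, an array of offsets in front of the field data, can be sketched as follows. The function names are hypothetical, offsets are kept as a Python list rather than packed integers, and an empty field is indistinguishable from NULL in this simplified version.

```python
def encode_record(fields):
    """fields: list of bytes or None (NULL).

    Returns (offsets, body). offsets has len(fields) + 1 entries:
    the start of each field plus the end of the record.
    """
    offsets, body = [0], b""
    for field in fields:
        body += field or b""
        offsets.append(len(body))
    return offsets, body

def get_field(offsets, body, i):
    """Direct access to field i without scanning the record."""
    start, end = offsets[i], offsets[i + 1]
    return None if start == end else body[start:end]   # NULL: start == end

offsets, body = encode_record([b"Ann", None, b"clerk"])
job = get_field(offsets, body, 2)
```

The extra integers are the storage overhead mentioned on the slide; in exchange no separator scan is needed.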

52 Issues When a field is inserted, the other fields are moved When a field is modified, the other fields are moved – The modified record may no longer fit on its page – It is moved to another page and a forwarding address is left on the original page When a record is too big for one page – Break the record into smaller records – Chain them

53 Thank you for your attention!

