CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky.

CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky

Review Disk management Basic disk (physical) unit: page Basic DB unit: record (row of a table) How to connect page with record? File! 6/4/2016Chen Qian @ University of Kentucky2

6/4/2016Chen Qian @ University of Kentucky3 A row in a table, when located on disks, is called A record Two types of record: Fixed-length Variable-length

6/4/2016Chen Qian @ University of Kentucky4 In an abstract sense, a file is A set of “records” on a disk In reality, a file is A set of disk pages Each record lives on A page Physical Record ID (RID) A tuple of

Files Higher levels of DBMS operate on records, and files of records. FILE: A collection of pages, containing a collection of records. Record = row in a table Must support: insert/delete/modify record fetch a particular record (specified using record id) scan all records (possibly with some conditions on the records to be retrieved) 6/4/20165Chen Qian @ University of Kentucky

6/4/2016Chen Qian @ University of Kentucky6 Record layout Record = row in a table Variable-format records Rare in DBMS—table schema dictates the format Relevant for semi-structured data such as XML Focus on fixed-format records With fixed-length fields only, or With possible variable-length fields

Record Formats: Fixed Length All field lengths and offsets are constant Computed from schema, stored in the system catalog Finding i’th field done via arithmetic. Base address (B) L1L2 L3L4 F1F2 F3F4 Address = B+L1+L2 6/4/2016Chen Qian @ University of Kentucky7

6/4/2016Chen Qian @ University of Kentucky8 Fixed-length fields Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT); 142 04 Bart (padded with space) 24 102.3 2836 Watch out for alignment May need to pad; reorder columns if that helps What about NULL ? Add a bitmap at the beginning of the record

Record Formats: Variable Length Two alternative formats (# fields is fixed): * Second offers direct access to i’th field, efficient storage of nulls ; extra but small directory overhead. $$ $$ Fields Delimited by Special Symbols F1 F2 F3 F4 Array of Field Offsets 6/4/2016Chen Qian @ University of Kentucky9

Trade-off Compared to the 1 st formats, the offset based format has: Pros: Fast access to a particular field; efficient storage of nulls Cons: Extra space 6/4/2016Chen Qian @ University of Kentucky10

6/4/2016Chen Qian @ University of Kentucky11 Large Object (LOB) fields Example: CREATE TABLE Student(SID INT, name CHAR(20), age INT, GPA FLOAT, picture BLOB(32000)); Student records get “de-clustered” Bad because most queries do not involve picture Decomposition (automatically done by DBMS and transparent to the user) Student(SID, name, age, GPA) StudentPicture(SID, picture)

6/4/2016Chen Qian @ University of Kentucky12 Page layout How do you organize records in a page? Fixed length records Variable length records NSM (N-ary Storage Model) is used in most commercial DBMS

Page Formats: Fixed Length Records * Record id =. In first alternative, moving records for free space management changes rid; may not be acceptable. Slot 1 Slot 2 Slot N... NM1 0 M... 3 2 1 PACKED UNPACKED, BITMAP Slot 1 Slot 2 Slot N Free Space Slot M 11 number of records number of slots 6/4/2016Chen Qian @ University of Kentucky13

Trade-off Compared to the packed formats, the unpacked format has: Pros: No need to move records into vacated slots after delete. No need to change the slot number. Cons: To insert a record, one needs to find an empty slot, by scanning the bitmap. 6/4/2016Chen Qian @ University of Kentucky14

6/4/2016Chen Qian @ University of Kentucky15 142 Bart 10 2.3123 Milhouse 10 3.1 456 Ralph 8 2.3 857 Lisa 8 4.3 Variable-Length Records Store records from the beginning of each page Use a directory at the end of each page To locate records and manage free space Necessary for variable-length records Why store data and directory at two different ends? Both can grow easily

6/4/2016Chen Qian @ University of Kentucky16 Options Reorganize after every update/delete to avoid fragmentation (gaps between records) Need to rewrite half of the block on average What if records are fixed-length? Reorganize after delete Only need to move one record Need a pointer to the beginning of free space Do not reorganize after update Need a bitmap indicating which slots are in use

System Catalogs For each relation: name, file location, file structure (e.g., Heap file) attribute name and type, for each attribute index name, for each index integrity constraints For each index: structure (e.g., B+ tree) and search key fields For each view: view name and definition Plus statistics, authorization, buffer pool size, etc. Catalogs are themselves stored as relations ! 6/4/2016Chen Qian @ University of Kentucky17

Indexes A Heap file allows us to retrieve records: by specifying the rid, or by scanning all records sequentially Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., Find all students in the “CS” department Find all students with a gpa > 3 Indexes are file structures that enable us to answer such value-based queries efficiently. 6/4/2016Chen Qian @ University of Kentucky18

6/4/2016Chen Qian @ University of Kentucky19 Basics Given a value, locate the record(s) with this value SELECT * FROM R WHERE A = value ; SELECT * FROM R, S WHERE R. A = S. B ; Other search criteria, e.g. Range search SELECT * FROM R WHERE A > value ; Keyword search database indexing Search

6/4/2016Chen Qian @ University of Kentucky20 Dense and sparse indexes Dense: one index entry for each search key value Sparse: one index entry for each block Records must be clustered according to the search key

6/4/2016Chen Qian @ University of Kentucky21 Dense versus sparse indexes Index size Sparse index is smaller Requirement on records Records must be clustered for sparse index Lookup Sparse index is smaller and may fit in memory Dense index can directly tell if a record exists Update Easier for sparse index

6/4/2016Chen Qian @ University of Kentucky22 Primary and secondary indexes Primary index Created for the primary key of a table Records are usually clustered according to the primary key Can be sparse Secondary index Usually dense SQL PRIMARY KEY declaration automatically creates a primary index, UNIQUE key automatically creates a secondary index Additional secondary index can be created on non-key attribute(s) CREATE INDEX StudentGPAIndex ON Student(GPA) ;

Tree-Structured Indexes: Introduction Tree-structured indexing techniques support both range selections and equality selections. ISAM =Indexed Sequential Access Method static structure; early index technology. B + tree: dynamic, adjusts gracefully under inserts and deletes. 6/4/2016Chen Qian @ University of Kentucky23

Motivation for Index ``Find all students with gpa > 3.0’’ If data file is sorted, do binary search Cost of binary search in a database can be quite high, Why? Simple idea: Create an `index’ file. Can do binary search on (smaller) index file! Page 1 Page 2 Page N Page 3 Data File k2 kN k1 Index File P 0 K 1 P 1 K 2 P 2 K m P m index entry 6/4/2016Chen Qian @ University of Kentucky24

6/4/2016Chen Qian @ University of Kentucky25 ISAM What if an index is still too big? Put a another (sparse) index on top of that!  ISAM (Index Sequential Access Method), more or less 100, 200, …, 901 100, 123, …, 192901, …, 996 … Index blocks 200, … 100, 108, 119, 121 123, 129, … 901, 907, … 996, 997, … ……… Data blocks 192, 197, … 200, 202, … Example: look up 197

How many steps? For a balanced tree of N pages, the num of search steps is O(logN) Less than a constant times logN 6/4/2016Chen Qian @ University of Kentucky26

6/4/2016Chen Qian @ University of Kentucky27 Updates with ISAM Overflow chains and empty data blocks degrade performance Worst case: most records go into one long chain Example: insert 107 107 Overflow block Example: delete 129 100, 200, …, 901 100, 123, …, 192901, …, 996 … Index blocks 200, … 100, 108, 119, 121 123, 129, … 901, 907, … 996, 997, … ……… Data blocks 192, 197, … 200, 202, …

How many steps? For a chain of N pages, the worst-case num of search steps is N Much larger than O(logN)! 6/4/2016Chen Qian @ University of Kentucky28

6/4/2016Chen Qian @ University of Kentucky29 A Note of Caution ISAM is an old-fashioned idea B+-trees are usually better, as we’ll see But, ISAM is a good place to start to understand the idea of indexing Upshot Don’t brag about being an ISAM expert on your resume Do understand how they work, and tradeoffs with B + - trees

Summary (Contd.) DBMS vs. OS File Support DBMS needs features not found in many OSes, e.g., forcing a page to disk, controlling the order of page writes to disk, files spanning disks, ability to control pre-fetching and page replacement policy based on predictable access patterns, etc. Variable length record format with field offset directory offers support for direct access to ith field and null values. 6/4/2016Chen Qian @ University of Kentucky30

CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky.

Similar presentations

Presentation on theme: "CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky.

Similar presentations

Presentation on theme: "CS 405G: Introduction to Database Systems 21 Storage Chen Qian University of Kentucky."— Presentation transcript:

Similar presentations

About project

Feedback