CPSC-310 Database Systems

Professor Jianer Chen
Room 315C HRBB
Lecture #17

Index
The most common operation on relations is searching a relation for the set of tuples that have specific values on certain attributes.
An index on attribute(s) of a relation is a data structure for the relation that supports efficient searching (and other operations) based on those attributes.
SQL has a way to tell the DBMS to build an index on certain attributes of a relation:
CREATE INDEX <indexname> ON <relation(attributes)>
-- In many cases, the DBMS builds an index on the primary key of a relation without being asked by users.
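As a concrete illustration, here is a minimal sketch using SQLite through Python's built-in `sqlite3` module (the notes do not name a specific DBMS; SQLite is our choice). The `Student` table, its columns, and the index name `StudentName` are hypothetical. The sketch creates an index with the `CREATE INDEX` syntax above, lists the indexes SQLite keeps for the table (including the one it builds automatically for a non-`INTEGER` primary key), and asks the query planner whether a search on `name` uses the index.

```python
import sqlite3

# In-memory database; table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (uin TEXT PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(f"U{i:05d}", f"name{i % 500}", 2.0 + (i % 20) / 10) for i in range(10_000)],
)

# CREATE INDEX <indexname> ON <relation(attributes)>
conn.execute("CREATE INDEX StudentName ON Student(name)")

# Indexes on Student: the explicit one, plus the automatic index SQLite
# creates for a PRIMARY KEY that is not an INTEGER PRIMARY KEY.
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Student'"
)]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE name = ?", ("name42",)
)]

print(indexes)
print(plan)
```

The exact wording of the plan text varies between SQLite versions, but it should mention a search using `StudentName` rather than a scan of the whole table.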

How Does an Index Work?
This depends on:
* How is a relation stored (on disk)?
* How does a computer access data on disks?
* How do we measure the efficiency of an algorithm that accesses disk data?

Outline of Course
* Representing things by tables: E-R model (Ch. 4)
* Good table structures: Functional Dependencies (Ch. 3)
* Basic operations on relations: Relational Algebra (Chs. 2+5)
* Storage management (Chs )
* SQL languages in DDL/DML (Ch. 6)
* Query processing (Chs )
* More on SQL (Chs. 7-9)
* Transaction processing (Chs )

How Does Disk Store Data?

[Figure: typical computer architecture: the CPU and main memory connected by a bus to a disk controller, which manages the secondary-storage disks.]

[Figure: a typical disk.]
Terms: Platter, Head, Cylinder, Track, Sector, Gap

A typical disk:
* 5 platters (thus 10 surfaces)
* a surface has 20,000 tracks
* a track has 500 sectors (about a million bytes)
* a sector has several thousand bytes
Thus a disk has a capacity of 100’s of GB.
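The capacity estimate is just a product of the figures above. In this sketch the sector size of 2,048 bytes is an assumption standing in for "several thousand bytes", chosen so that a track holds about a million bytes as stated.

```python
# Figures from the notes, except SECTOR_BYTES, which is assumed.
PLATTERS = 5
SURFACES = 2 * PLATTERS          # each platter has two recording surfaces
TRACKS_PER_SURFACE = 20_000
SECTORS_PER_TRACK = 500
SECTOR_BYTES = 2_048             # assumed value for "several thousand bytes"

track_bytes = SECTORS_PER_TRACK * SECTOR_BYTES
capacity = SURFACES * TRACKS_PER_SURFACE * track_bytes

print(f"track: {track_bytes:,} bytes")     # track: 1,024,000 bytes
print(f"disk:  {capacity / 1e9:.1f} GB")   # disk:  204.8 GB
```

About 200 GB, in line with "100’s of GB"; a larger assumed sector size scales the total proportionally.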

How Does a Computer Access a Disk?

* A sector (or several sectors) makes up a “basic storage unit” (a block, typically 16KB) that the computer reads/writes as a whole.
* The CPU specifies a “disk address”: disk-id/cylinder#/surface#/sector#
* The disk controller:
  * moves the head to the cylinder,
  * waits until the sector comes under the head,
  * transfers the data in the sector to main memory.
  This is mechanical movement, and very slow (~40 ms).
* The CPU operates on the data in main memory.
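The three controller steps correspond to what are usually called seek time, rotational latency, and transfer time. The back-of-the-envelope sketch below adds them up for one 16KB block; the drive parameters (9 ms average seek, 7,200 RPM, 100 MB/s transfer rate) are assumptions for illustration, not figures from the notes, and the average rotational latency is taken as half a revolution.

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_bytes_per_s):
    """Estimated time to read one block: seek + rotational latency + transfer."""
    revolution_ms = 60_000 / rpm                  # one full revolution
    rotational_ms = revolution_ms / 2             # on average, wait half a turn
    transfer_ms = block_bytes / transfer_bytes_per_s * 1_000
    return seek_ms + rotational_ms + transfer_ms

# All parameters below are assumed, not taken from the notes.
t = access_time_ms(seek_ms=9.0, rpm=7_200, block_bytes=16 * 1024,
                   transfer_bytes_per_s=100e6)
print(f"{t:.2f} ms")   # 13.33 ms
```

The mechanical parts (about 9 + 4.2 ms) dominate, while moving the 16KB itself takes a fraction of a millisecond. This is why reading a whole block costs about the same as reading one tuple.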

How to Measure an Algorithm on Disk?

* Disks: slow (read/write: 1~40 ms), with large capacity (100’s of GBs)
* Main memory: fast (read/write: ns), with smaller capacity (GBs)
* Disks are about 10^5~10^6 times slower than main memory.

I/O Model of Computation
Dominance of I/O cost: if a block needs to be moved between disk and main memory, the time taken by the read/write is much larger than the time likely to be spent manipulating that data in main memory.
Thus, the number of disk block reads/writes is a good approximation of the cost of the entire computation.
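The I/O model can be made concrete with a toy simulation in which only block transfers are counted. This is a minimal sketch: the names `CountingDisk`, `pack`, and `scan_count` are hypothetical, tuples are modeled as integer keys only, and the sizes are the ones used elsewhere in these notes (16KB blocks, 50-byte tuples, a relation of 1M tuples).

```python
class CountingDisk:
    """Toy disk for the I/O model: only block transfers are counted."""

    def __init__(self, blocks):
        self._blocks = blocks      # each block is a list of tuples (here: keys)
        self.reads = 0

    def read_block(self, i):
        self.reads += 1            # one disk I/O
        return self._blocks[i]     # processing in main memory is treated as free


def pack(tuples, per_block):
    """Store tuples in consecutive blocks of at most per_block tuples."""
    return [tuples[i:i + per_block] for i in range(0, len(tuples), per_block)]


def scan_count(disk, n_blocks, predicate):
    """Full scan: reads every block exactly once and counts matching tuples."""
    return sum(1 for b in range(n_blocks)
               for t in disk.read_block(b) if predicate(t))


BLOCK_BYTES = 16 * 1024                       # block size from the notes
TUPLE_BYTES = 50                              # tuple size from the notes
per_block = BLOCK_BYTES // TUPLE_BYTES        # 327 tuples fit in one block

blocks = pack(list(range(1_000_000)), per_block)   # 1M tuples, keys only
disk = CountingDisk(blocks)
matches = scan_count(disk, len(blocks), lambda k: k % 1000 == 0)
print(per_block, len(blocks), disk.reads, matches)   # 327 3059 3059 1000
```

A full scan of R costs 3,059 block reads no matter how many tuples match; that count, not the million in-memory comparisons, is what the I/O model measures.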

Thus, keeping a relation sorted on disk and searching it with binary search may not be a good idea:
* for a relation R of 1M tuples, binary search takes about 20 disk reads (since 2^20 ≈ 1M);
* it is also difficult to insert into and delete from a sorted relation;
* it is also very wasteful that each read brings in a whole block (16KB) only to examine a single tuple (probably 50 bytes).
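To see where "about 20 disk reads" comes from, the sketch below counts block reads for two variants of binary search on a sorted relation of 1M tuples stored in 16KB blocks (327 tuples of 50 bytes each). The function names and the even-number keys are hypothetical. The tuple-level version probes individual tuple positions and, with no buffering, pays one read per probe (at most 20); the block-level version uses the first and last key of each block it reads, so it needs at most about log2(3,059) ≈ 12 reads.

```python
class CountingDisk:
    """Toy disk: counts block reads (the I/O model's cost measure)."""

    def __init__(self, blocks):
        self._blocks = blocks
        self.reads = 0

    def read_block(self, i):
        self.reads += 1
        return self._blocks[i]


def tuple_binary_search(disk, n, per_block, key):
    """Binary search over tuple positions 0..n-1. Each probe fetches the block
    holding that position, with no buffering: one disk read per probe."""
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        value = disk.read_block(mid // per_block)[mid % per_block]
        if value == key:
            return True
        if value < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


def block_binary_search(disk, n_blocks, key):
    """Binary search over whole blocks: one read per probe; the block's
    first and last keys decide which half to keep."""
    lo, hi = 0, n_blocks - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        block = disk.read_block(mid)
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block      # searching inside the block is free
    return False


N = 1_000_000
PER_BLOCK = (16 * 1024) // 50                # 327 tuples of 50 bytes per 16KB block
keys = list(range(0, 2 * N, 2))              # sorted relation: 1M even keys
blocks = [keys[i:i + PER_BLOCK] for i in range(0, N, PER_BLOCK)]

d_tuple, d_block = CountingDisk(blocks), CountingDisk(blocks)
found_tuple = tuple_binary_search(d_tuple, N, PER_BLOCK, 777_776)
found_block = block_binary_search(d_block, len(blocks), 777_776)
print(found_tuple, d_tuple.reads)   # at most 20 reads
print(found_block, d_block.reads)   # at most 12 reads
```

Even the block-level version still costs about a dozen random disk accesses per search, and neither version makes insertion or deletion any easier.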

B+Trees (Notes #7)
* Support fast search
* Support range search
* Support dynamic changes
* Could be either dense or sparse
  * dense: pointers to all records
  * sparse: one pointer per block
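A minimal sketch of the dense/sparse distinction, with tiny in-memory lists standing in for disk blocks (all names are hypothetical). The dense index has one entry per record. The sparse index has one entry per block, so it only works when the data blocks are sorted on the search key: a lookup must find the block whose first key is the largest one not exceeding the search key.

```python
from bisect import bisect_left, bisect_right


def build_dense(blocks):
    """Dense index: one (key, (block#, slot)) entry per record, sorted by key."""
    return sorted((key, (b, s)) for b, block in enumerate(blocks)
                  for s, key in enumerate(block))


def build_sparse(blocks):
    """Sparse index: one (first key, block#) entry per block.
    Assumes the data blocks are sorted on the search key."""
    return [(block[0], b) for b, block in enumerate(blocks)]


def dense_lookup(dense, key):
    keys = [k for k, _ in dense]
    i = bisect_left(keys, key)
    return dense[i][1] if i < len(keys) and keys[i] == key else None


def sparse_lookup(sparse, blocks, key):
    firsts = [k for k, _ in sparse]
    i = bisect_right(firsts, key) - 1    # last block whose first key <= key
    if i < 0:
        return None
    b = sparse[i][1]
    block = blocks[b]                    # the one data-block read
    return (b, block.index(key)) if key in block else None


blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
dense, sparse = build_dense(blocks), build_sparse(blocks)
print(len(dense), len(sparse))                                        # 9 3
print(dense_lookup(dense, 50), sparse_lookup(sparse, blocks, 50))     # (1, 1) (1, 1)
print(sparse_lookup(sparse, blocks, 55))                              # None
```

The sparse index is smaller (3 entries instead of 9 here), which matters when the index itself lives on disk. In exchange, a dense index can tell whether a key exists without reading the data block.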

A B+tree node of order n:
p0 | k1 | p1 | k2 | p2 | …… | kn | pn
where the ph are pointers (disk addresses) and the kh are search keys (values of the attributes in the index).
How big is n?
Basically, we want each B+tree node to fit in a disk block, so that a node can be read/written with a single disk I/O. Typically, n ~
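How large n can be follows from requiring n keys and n + 1 pointers to fit in one block: n · key_bytes + (n + 1) · ptr_bytes ≤ block_bytes. The sketch below solves that inequality for the 16KB block used earlier. The 8-byte key and pointer sizes are assumptions for illustration, not figures from the notes, so the result shows the calculation rather than a definitive value. `levels_needed` gives a best-case height assuming every node is completely full.

```python
def max_order(block_bytes, key_bytes, ptr_bytes):
    """Largest n with n * key_bytes + (n + 1) * ptr_bytes <= block_bytes."""
    return (block_bytes - ptr_bytes) // (key_bytes + ptr_bytes)


def levels_needed(n_records, fanout):
    """Levels of full nodes (each with `fanout` pointers) needed to reach
    n_records entries; a best case, since real nodes are rarely full."""
    levels, reach = 1, fanout
    while reach < n_records:
        levels += 1
        reach *= fanout
    return levels


n = max_order(16 * 1024, key_bytes=8, ptr_bytes=8)   # assumed 8-byte keys and pointers
print(n)                                  # 1023
print(levels_needed(1_000_000, n + 1))    # 2
print(levels_needed(10**9, n + 1))        # 3
```

Under these assumptions a node has about a thousand children, so a handful of node reads reach any of a million tuples, compared with about 20 reads for binary search.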

B+Tree Example, order n = 3
* root: [100]
* internal nodes: [30] and [120 150 180]
* leaves under [30]: [3 5 11], [30 35]
* leaves under [120 150 180]: [100 101 110], [120 130], [150 156 179], [180 200]
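Search on this example tree can be sketched as follows. This is an illustrative in-memory model, not a disk-based implementation. The `Leaf`/`Internal` classes and function names are hypothetical. The leaves are assumed to be chained left to right by sibling pointers (the usual B+tree arrangement that makes range search efficient), and each node visited counts as one disk read.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next_leaf: Optional["Leaf"] = None       # sibling pointer, used by range search


@dataclass
class Internal:
    keys: List[int]                          # k1 .. km
    children: List[Union["Internal", Leaf]]  # p0 .. pm


def find_leaf(node, key, trace):
    """Descend from the root, recording each node visited (one I/O per node)."""
    while True:
        trace.append(node.keys)
        if isinstance(node, Leaf):
            return node
        # key < k1 goes to p0; k_i <= key < k_(i+1) goes to p_i
        node = node.children[bisect_right(node.keys, key)]


def search(root, key):
    trace = []
    leaf = find_leaf(root, key, trace)
    return key in leaf.keys, len(trace)


def range_search(root, lo, hi):
    trace = []
    leaf = find_leaf(root, lo, trace)
    out = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out, len(trace)
            if k >= lo:
                out.append(k)
        leaf = leaf.next_leaf
        if leaf is not None:
            trace.append(leaf.keys)          # reading the next leaf is another I/O
    return out, len(trace)


# The order-3 example tree from the slide.
leaves = [Leaf([3, 5, 11]), Leaf([30, 35]), Leaf([100, 101, 110]),
          Leaf([120, 130]), Leaf([150, 156, 179]), Leaf([180, 200])]
for a, b in zip(leaves, leaves[1:]):
    a.next_leaf = b
root = Internal([100], [Internal([30], leaves[:2]),
                        Internal([120, 150, 180], leaves[2:])])

print(search(root, 101))            # (True, 3)
print(search(root, 102))            # (False, 3)
print(range_search(root, 30, 120))  # ([30, 35, 100, 101, 110, 120], 5)
```

A point search reads one node per level (3 here). A range search descends once and then follows sibling pointers, reading only the leaves that overlap the range.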
