CPSC-310 Database Systems

1 CPSC-310 Database Systems
Professor Jianer Chen Room 315C HRBB Lecture #17

2 Index

7 Index
The most common operation on relations is searching a relation for the set of tuples with specific values on certain attributes.
An index on attribute(s) of a relation is a data structure for the relation that supports efficient searching (and other operations) based on those attributes.
SQL has a way to tell the DBMS to build an index on certain attributes of a relation:
CREATE INDEX <indexname> ON <relation>(<attributes>)
-- In many cases, the DBMS builds an index on the primary key of a relation without being asked to by the user.
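
As a hedged illustration of the CREATE INDEX statement, the sketch below uses Python's built-in sqlite3 module. The Movies table, its columns, and the index name YearIdx are hypothetical and chosen only for this example; other DBMSs may behave differently in the details.

```python
import sqlite3

# In-memory database; the Movies table and the YearIdx index are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (title TEXT PRIMARY KEY, year INTEGER, length INTEGER)")
conn.executemany(
    "INSERT INTO Movies VALUES (?, ?, ?)",
    [("Star Wars", 1977, 124), ("Gone With the Wind", 1939, 231), ("Wayne's World", 1992, 95)],
)

# Ask the DBMS to build an index on the attribute year.
conn.execute("CREATE INDEX YearIdx ON Movies(year)")

# List the indexes SQLite now keeps for Movies (column 1 of each row is the index name).
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Movies')")]
print(index_names)

# A search on the indexed attribute.
titles_1977 = [row[0] for row in conn.execute("SELECT title FROM Movies WHERE year = 1977")]
print(titles_1977)
```

In SQLite, the list of indexes should also contain an automatically created index for the TEXT primary key, which matches the remark that the DBMS often indexes the primary key on its own.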

11 How Does an Index Work?
This depends on:
How is a relation stored (on disk)?
How does a computer access data on disk?
How do we measure the efficiency of an algorithm that accesses disk data?

12 Outline of Course
Representing things by tables: E-R model (Ch. 4)
Good table structures: Functional Dependencies (Ch. 3)
Basic operations on relations: Relational Algebra (Chs. 2+5)
Storage management (Chs )
SQL languages in DDL/DML (Ch. 6)
Query processing (Chs )
More on SQL (Chs. 7-9)
Transaction processing (Chs )

15 How Does Disk Store Data?
[Figure: typical computer architecture. Labels: CPU, bus, main memory, disk controller, secondary storage (disks).]

16 How Does Disk Store Data?
[Figure: a typical disk. Labels: platter, head, cylinder, track, sector, gap.]
Terms: Platter, Head, Cylinder, Track, Sector, Gap

17 How Does Disk Store Data?
A typical disk:
5 platters (thus 10 surfaces)
A surface has 20,000 tracks
A track has 500 sectors (on the order of a million bytes)
A sector has several thousand bytes
Thus a disk has a capacity of hundreds of GB.
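
The capacity estimate can be checked with a few lines of arithmetic. This is a sketch only: the slide says a sector has "several thousand bytes", and the value 4096 used below is an assumption made for the calculation.

```python
# Figures from the slide, except bytes_per_sector, which is an assumed value
# ("several thousand bytes" is taken here as 4096).
platters = 5
surfaces = 2 * platters          # two surfaces per platter
tracks_per_surface = 20_000
sectors_per_track = 500
bytes_per_sector = 4096          # assumption

bytes_per_track = sectors_per_track * bytes_per_sector
capacity_bytes = surfaces * tracks_per_surface * bytes_per_track

print(f"bytes per track: {bytes_per_track:,}")
print(f"capacity: {capacity_bytes / 10**9:.1f} GB")
```

Under this assumption a track holds about two million bytes and the disk about 400 GB, which is consistent with "hundreds of GB".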

24 How Does Computer Access Disk?
A sector (or several sectors) makes a "basic storage unit" (a block of typically 16KB) that is read/written as a whole by the computer.
The CPU specifies a "disk address": disk-id/cylinder#/surface#/sector#
Disk controller:
* moves the head to the cylinder,
* waits until the sector comes under the head,
* transfers data in the sector to main memory
The CPU operates on the data in main memory.
Disk access involves mechanical movement and is very slow (~ 40 ms).
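
The three steps performed by the disk controller suggest a simple cost model: seek time, plus rotational latency, plus transfer time. The sketch below is illustrative only. The function name block_access_ms and all parameter values (9 ms seek, 7200 rpm, 4096-byte sectors) are assumptions that do not come from the slides.

```python
def block_access_ms(seek_ms, rpm, block_bytes, track_bytes):
    """Estimate the time to read one block: seek + rotational latency + transfer."""
    rotation_ms = 60_000 / rpm                              # time for one full rotation
    latency_ms = rotation_ms / 2                            # on average, wait half a rotation
    transfer_ms = rotation_ms * block_bytes / track_bytes   # share of a rotation spent over the block
    return seek_ms + latency_ms + transfer_ms

# All values below are assumed for illustration.
estimate = block_access_ms(seek_ms=9.0, rpm=7200, block_bytes=16 * 1024, track_bytes=500 * 4096)
print(f"{estimate:.2f} ms")
```

With these assumed values the estimate is on the order of ten milliseconds, and almost all of it comes from the two mechanical steps rather than from the transfer itself.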

28 How To Measure an Algorithm on Disk?
Disks: slow (read/write: 1~40 ms), with large capacity (100's of GB)
Main memory: fast (read/write: ns), with smaller capacity (GBs)
Disks are about 10^5~10^6 times slower than main memory.

30 I/O Model of Computation
Dominance of I/O cost: if a block needs to be moved between disk and main memory, then the time taken to perform the read/write is much larger than the time likely to be used to manipulate that data in main memory.
Thus the number of disk block reads/writes is a good approximation to the cost of the entire computation.

34 I/O Model of Computation
Thus, keeping a relation sorted on disk and searching it with binary search may not be a good idea:
For a relation R of 1M tuples, binary search takes about 20 disk reads.
It is also difficult to insert into and delete from a sorted relation.
It is also a big waste: each disk read brings in a whole block (16KB) only to look at a single tuple (probably 50 bytes).
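
The two numbers behind this argument can be reproduced directly. The sketch counts one disk read per binary-search probe, as the slide's estimate does; in practice the last few probes may fall in the same block, so this is an upper-bound style estimate rather than an exact count.

```python
import math

tuples = 1_000_000        # relation R from the slide
block_bytes = 16 * 1024   # 16KB block
tuple_bytes = 50          # "probably 50 bytes"

# Binary search probes about log2(N) positions; counting one disk read per probe.
probes = math.ceil(math.log2(tuples))

# Fraction of each block that a single probe actually uses.
useful_fraction = tuple_bytes / block_bytes

print(probes)
print(f"{useful_fraction:.2%}")
```

Each probe uses well under one percent of the block it reads, which is the waste the slide points out.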

40 B+Trees
Support fast search
Support range search
Support dynamic changes
Could be either dense or sparse
* dense: pointers to all records
* sparse: one pointer per block
Notes #7

44 B+Trees
A B+tree node of order n:
p0 k1 p1 k2 p2 …… kn pn
where the p_h are pointers (disk addresses) and the k_h are search-keys (values of the attributes in the index).
How big is n?
Basically we want each B+tree node to fit in a disk block so that a B+tree node can be read/written by a single disk I/O. Typically, n ~
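
To get a feel for how large n can be, the sketch below computes the largest order that fits in one block and the number of levels a tree of full nodes would need. The key size (4 bytes) and pointer size (8 bytes) are assumptions made for illustration, and the helper names max_order and levels_needed are hypothetical. The result is not the value given on the slide, which is cut off in this transcript.

```python
def max_order(block_bytes, key_bytes, pointer_bytes):
    """Largest n such that n keys and n + 1 pointers fit in one block."""
    return (block_bytes - pointer_bytes) // (key_bytes + pointer_bytes)

def levels_needed(num_keys, n):
    """Fewest levels of full order-n nodes whose leaves can hold num_keys keys."""
    levels, capacity = 1, n          # a single leaf holds up to n keys
    while capacity < num_keys:
        levels += 1
        capacity *= n + 1            # each extra level multiplies capacity by the fan-out n + 1
    return levels

# Assumed sizes: 16KB block (from the slides), 4-byte keys and 8-byte pointers (assumptions).
n = max_order(block_bytes=16 * 1024, key_bytes=4, pointer_bytes=8)
print(n, levels_needed(1_000_000, n))
```

Under these assumptions a node holds more than a thousand keys, so a search over a million keys touches only a couple of nodes, compared with about 20 reads for binary search on a sorted file. Real trees keep nodes only partly full, so actual heights can be somewhat larger.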

45 B+Tree Example (order n = 3)
Root: 100
Internal nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
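
A minimal sketch of how a search and a range search could proceed on this example tree. The tree is written out by hand, with the grouping of leaf keys inferred from the separator keys in the internal nodes. The helper names find_leaf, search, and range_search are hypothetical, and a Python list of leaves stands in for the next-leaf pointers of a real B+tree.

```python
from bisect import bisect_right

# An internal node is a tuple (keys, children); a leaf is a plain list of keys.
leaves = [[3, 5, 11], [30, 35], [100, 101, 110], [120, 130], [150, 156, 179], [180, 200]]
left = ([30], leaves[0:2])
right = ([120, 150, 180], leaves[2:6])
root = ([100], [left, right])

def find_leaf(key):
    """Descend from the root to the leaf that could hold key; count nodes visited."""
    node, nodes_read = root, 1
    while isinstance(node, tuple):
        keys, children = node
        node = children[bisect_right(keys, key)]   # keys >= k_i lie to the right of k_i
        nodes_read += 1
    return node, nodes_read

def search(key):
    """Return (found, nodes_read); each node visited stands for one block read."""
    leaf, nodes_read = find_leaf(key)
    return key in leaf, nodes_read

def range_search(low, high):
    """Find the leaf for low, then walk the leaves rightwards until a key exceeds high."""
    start, _ = find_leaf(low)
    result = []
    for leaf in leaves[leaves.index(start):]:      # stands in for following next-leaf pointers
        for k in leaf:
            if k > high:
                return result
            if k >= low:
                result.append(k)
    return result

print(search(130), search(31), range_search(35, 125))
```

Every lookup visits exactly three nodes here, one per level, whether or not the key is present.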

