Oracle SQL*Loader

Chapter 16-17 File Structures, Hashing, Indexing, and Physical Database Design

Oracle SQL*Loader http://www.oracle.com/technetwork/database/enterprise-edition/sql-loader-overview-095816.html

Storage
Primary storage (main memory): can be operated on directly by the computer CPU; small, fast
Secondary storage (http://en.wikipedia.org/wiki/Hard_disk): cannot be operated on directly by the computer CPU; magnetic disks, optical disks, tapes, etc.; larger capacity, inexpensive, slower than main memory

Table 16.1 Types of Storage with Capacity, Access Time, Max Bandwidth (Transfer Speed), and Commodity Cost

Table 16.2 Specifications of Typical High-End Enterprise Disks from Seagate (a) Seagate Enterprise Performance 10 K HDD - 1200 GB

Storage capacity units
Kilobyte: 1000 bytes
Megabyte: 1 million bytes
Gigabyte (Gbyte): 1 billion bytes
Terabyte: 1000 gigabytes

Memory Hierarchies and Storage Devices
Primary storage
Cache (static RAM): most expensive and fastest; used by the CPU to speed up the execution of programs http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=cache
Main memory (dynamic RAM): work area for the CPU

Secondary storage (mass storage): CD-ROM, tapes, disks
Main memory database: the entire database is stored in main memory

Figure 16.1 (a) A single-sided disk with read/write hardware. (b) A disk pack with read/write hardware.

Figure 16.2 Different sector organizations on disk. (a) Sectors subtending a fixed angle. (b) Sectors maintaining a uniform recording density.

Tracks
A track is the part of a disk that passes under one read/write head while the head is stationary. The number of tracks on a disk surface therefore corresponds to the number of different radial positions of the head(s). The collection of all tracks on all surfaces at a given radial position is known as a cylinder, and each track is divided into sectors.

Cylinder
The set of tracks on a multi-headed disk that may be accessed without head movement. That is, the collection of disk tracks that are the same distance from the spindle about which the disks rotate.

Sector
A sector lies within a continuous range of rotational angle of the disk.

Data transfer between main memory and disks (in blocks)
Hardware address of a block: surface number, track number, block number
Time required: seek time, rotational delay time (latency), block transfer time
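The three time components listed above add up to the cost of reading one randomly placed block. The following is a minimal sketch of that sum; the drive parameters (4 ms seek, 10,000 rpm, 4096-byte blocks, 200 MB/s) are hypothetical values chosen for illustration, not figures from Table 16.2, and the function name is an assumption.

```python
def average_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Estimate the time to read one random block, in milliseconds."""
    rotation_ms = 60_000 / rpm               # time for one full revolution
    rotational_delay_ms = rotation_ms / 2    # on average, wait half a revolution
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive parameters (assumptions, not taken from the slides).
t = average_access_time_ms(seek_ms=4.0, rpm=10_000,
                           block_bytes=4096, transfer_mb_per_s=200)
print(f"average access time: {t:.3f} ms")
```

With numbers like these, seek time and rotational delay dominate and the block transfer itself is comparatively small, which is why file organizations try to keep related records in the same or adjacent blocks.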

Figure 16.3 Interleaved concurrency versus parallel execution.

Figure 16.4 Use of two buffers, A and B, for reading from disk.

Figure 16.5 Three record storage formats. (a) A fixed-length record with six fields and size of 71 bytes. (b) A record with two variable-length fields and three fixed-length fields. (c) A variable-field record with three types of separator characters.
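A fixed-length record such as the one in Figure 16.5(a) can be sketched with Python's struct module. The field names and widths below (30 + 9 + 4 + 4 + 20 + 4 = 71 bytes) are an assumed layout for illustration; the slide gives only the field count and the total size.

```python
import struct

# Assumed layout: name(30) ssn(9) salary(int32) job_code(int32)
# department(20) hire_date(int32, stored as YYYYMMDD). "<" disables padding.
RECORD = struct.Struct("<30s9sii20si")


def pack_employee(name, ssn, salary, job_code, department, hire_date):
    """Encode one record; short strings are padded with zero bytes."""
    return RECORD.pack(name.encode("ascii"), ssn.encode("ascii"),
                       salary, job_code, department.encode("ascii"), hire_date)


def read_record(file_bytes, i):
    """Fetch record i; its offset is simply i * record size."""
    name, ssn, salary, job_code, dept, hire_date = RECORD.unpack_from(
        file_bytes, i * RECORD.size)

    def text(raw):
        return raw.rstrip(b"\x00").decode("ascii")

    return text(name), text(ssn), salary, job_code, text(dept), hire_date


data = (pack_employee("Smith, John", "123456789", 30000, 5, "Research", 20090115)
        + pack_employee("Wong, James", "987654321", 40000, 7, "Administration", 20111201))
print(RECORD.size, read_record(data, 1))
```

Because every record has the same size, locating the i-th record needs only arithmetic, which is the main advantage of the fixed-length format over the variable-length formats in (b) and (c).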

Figure 16.6 Types of record organization. (a) Unspanned. (b) Spanned.
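The difference between the two organizations in Figure 16.6 shows up in how many blocks a file needs. The sketch below uses the usual symbols B (block size in bytes), R (record size in bytes), and r (number of records); these symbols and the sample values are assumptions added here, and the spanned estimate ignores the small per-block pointer overhead.

```python
def blocking_factor(B, R):
    """Records that fit whole in one block (unspanned organization)."""
    return B // R


def blocks_unspanned(r, B, R):
    """Blocks needed when no record may cross a block boundary."""
    bfr = blocking_factor(B, R)
    return -(-r // bfr)            # integer ceiling of r / bfr


def blocks_spanned(r, B, R):
    """Blocks needed when records may continue into the next block
    (pointer overhead ignored in this estimate)."""
    return -(-(r * R) // B)        # integer ceiling of r * R / B


# Hypothetical file: 30,000 records of 71 bytes in 512-byte blocks.
print(blocking_factor(512, 71))          # records per block
print(blocks_unspanned(30_000, 512, 71))
print(blocks_spanned(30_000, 512, 71))
```

The unspanned layout wastes B - bfr * R bytes in every block, so the spanned layout needs fewer blocks at the cost of records that may take two block reads to fetch.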

Figure 16.7 Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the ordering key field.

Table 16.3 Average Access Times for a File of b Blocks under Basic File Organizations

File organization
Heap file (unordered file): places new records in no particular order at the end of the file
Sorted file (sequential file): keeps the records ordered by the value of a particular field
Hashed file: uses a hash function applied to a field (the hash key) to determine a record's placement on disk
B-trees, B+-trees: use a tree structure

Hashing techniques
Static hashing: the hash address space is fixed
Extendible hashing
Linear hashing

Hashing algorithm

Hash Table (Wikipedia) http://en.wikipedia.org/wiki/Hash_table

Figure 16.8 Internal hashing data structures. (a) Array of M positions for use in internal hashing. (b) Collision resolution by chaining records.
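Collision resolution by chaining, as in Figure 16.8(b), can be sketched as an array of M slots where each slot holds a list of the records that hash to it. The class name and the hash function h(K) = K mod M are assumptions for illustration.

```python
class ChainedHashTable:
    """Internal hashing with M slots; colliding records share a chain."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one (initially empty) chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function: K mod M

    def insert(self, key, record):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, record)
                return
        chain.append((key, record))

    def search(self, key):
        for k, record in self.slots[self._h(key)]:
            if k == key:
                return record
        return None


table = ChainedHashTable(7)
table.insert(10, "record A")
table.insert(17, "record B")   # 10 and 17 both hash to slot 3 and are chained
print(table.search(17), table.slots[3])
```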

Figure 16.9 Matching bucket numbers to disk block addresses.

Figure 16.10 Handling overflow for buckets by chaining.

Figure 16.11 Structure of the extendible hashing scheme.
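The scheme in Figure 16.11 keeps a directory of 2^d bucket pointers (d is the global depth) and a local depth per bucket; a full bucket is split, and the directory doubles only when the bucket's local depth equals the global depth. The sketch below is a simplified in-memory version under stated assumptions: it indexes the directory with the low-order bits of Python's built-in hash() rather than the leading bits, uses integer keys, and all class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth          # local depth
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1, bucket_capacity), Bucket(1, bucket_capacity)]

    def _index(self, key):
        # Low-order global_depth bits of the hash value select the entry.
        return hash(key) & ((1 << self.global_depth) - 1)

    def search(self, key):
        return self.directory[self._index(key)].items.get(key)

    def insert(self, key, value):
        # Assumes distinct keys differ in some hash bit, so splitting terminates.
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory: entry i and entry i + 2^d share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth, self.capacity)
        new_bit = 1 << (bucket.depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & new_bit):
                self.directory[i] = sibling
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():       # redistribute between the two buckets
            self.directory[self._index(k)].items[k] = v


eh = ExtendibleHash(bucket_capacity=2)
for k in range(20):
    eh.insert(k, f"rec{k}")
print(eh.global_depth, len(eh.directory), eh.search(13))
```

Only the overflowing bucket is reorganized on a split; the other buckets are untouched, which is what lets the file grow without rehashing everything.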

Figure 16.12 Structure of the dynamic hashing scheme.

Figure 16.13 Striping of data across multiple disks. (a) Bit-level striping across four disks. (b) Block-level striping across four disks.

Figure 16.14 Some popular levels of RAID. (a) RAID level 1: Mirroring of data on two disks. (b) RAID level 5: Striping of data with distributed parity across four disks.
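Two ideas from Figures 16.13 and 16.14 can be sketched in a few lines: block-level striping maps logical block i to a disk in round-robin order, and a parity block computed with XOR lets any single lost block in a stripe be rebuilt. The function names and sample blocks are assumptions, and the sketch does not model how RAID 5 rotates the parity block from stripe to stripe.

```python
from functools import reduce


def locate(block_no, n_disks):
    """Block-level striping: return (disk, stripe) for a logical block."""
    return block_no % n_disks, block_no // n_disks


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on four disks: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1] and rebuild it from the rest.
recovered = xor_blocks([data[0], data[2], parity])
print(locate(5, 4), recovered)
```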

A search tree of order p is a tree such that each node contains at most p - 1 search values and p pointers in the order $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$, where $q \le p$; each $P_i$ is a pointer to a child node (or a null pointer); and each $K_i$ is a search value from some ordered set of values.

B-tree of order p
1. Each internal node in the B-tree is of the form $\langle P_1, \langle K_1, Pr_1 \rangle, P_2, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_q \rangle$ where $q \le p$. Each $P_i$ is a tree pointer, that is, a pointer to another node in the B-tree. Each $Pr_i$ is a data pointer, that is, a pointer to the record whose search key field value is equal to $K_i$ (or to the data file block containing that record).
2. Within each node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. For all search key field values X in the subtree pointed at by $P_i$ (the ith subtree, see Figure 06.10a), we have: $K_{i-1} < X < K_i$ for $1 < i < q$; $X < K_i$ for $i = 1$; and $K_{i-1} < X$ for $i = q$.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least $\lceil p/2 \rceil$ tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
6. A node with q tree pointers, $q \le p$, has q - 1 search key field values (and hence has q - 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers $P_i$ are null.
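The ordering rule for subtrees is what makes searching a B-tree work: within a node, either the value matches some K_i and its data pointer is returned, or the search continues in the one subtree whose range contains the value. A minimal sketch, assuming an in-memory node class with hypothetical names and strings standing in for data pointers:

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, data_ptrs, children=None):
        self.keys = keys              # K_1 < K_2 < ... < K_{q-1}
        self.data_ptrs = data_ptrs    # Pr_i for each K_i
        self.children = children      # P_1 .. P_q, or None for a leaf


def btree_search(node, x):
    """Return the data pointer for search value x, or None if absent."""
    while node is not None:
        i = bisect_left(node.keys, x)             # number of keys smaller than x
        if i < len(node.keys) and node.keys[i] == x:
            return node.data_ptrs[i]
        # Otherwise x can only be in subtree i, whose values lie between
        # keys[i-1] and keys[i]; a leaf has no subtrees to follow.
        node = node.children[i] if node.children else None
    return None


leaves = [BTreeNode([1, 3], ["r1", "r3"]),
          BTreeNode([6, 7], ["r6", "r7"]),
          BTreeNode([9, 12], ["r9", "r12"])]
root = BTreeNode([5, 8], ["r5", "r8"], leaves)
print(btree_search(root, 7), btree_search(root, 5), btree_search(root, 4))
```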

EXAMPLE 5: Suppose that the search field of Example 4 is a nonordering key field, and we construct a B-tree on this field. Assume that each node of the B-tree is 69 percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approximately 16 pointers and, hence, 15 search key field values. The average fan-out fo = 16. We can start at the root and see how many values and pointers can exist, on the average, at each subsequent level:
Root: 1 node, 15 entries, 16 pointers
Level 1: 16 nodes, 240 entries, 256 pointers
Level 2: 256 nodes, 3840 entries, 4096 pointers
Level 3: 4096 nodes, 61,440 entries, 65,536 pointers
Level 4: 65,536 nodes, 983,040 entries
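The per-level figures in Example 5 follow from one rule: each node holds fo - 1 entries and fo pointers, and the pointers of one level are the nodes of the next. A short sketch that reproduces the arithmetic (the function name is an assumption):

```python
def btree_levels(order_p, fill, levels):
    """Average nodes, entries, and pointers per level of a B-tree."""
    fanout = round(order_p * fill)        # 23 * 0.69 = 15.87, about 16
    rows = []
    nodes = 1                             # the root
    for level in range(levels):
        entries = nodes * (fanout - 1)
        pointers = nodes * fanout
        rows.append((level, nodes, entries, pointers))
        nodes = pointers                  # each pointer leads to one child node
    return rows


for level, nodes, entries, pointers in btree_levels(23, 0.69, 5):
    label = "Root" if level == 0 else f"Level {level}"
    print(f"{label}: {nodes:,} nodes, {entries:,} entries, {pointers:,} pointers")
```

Summing the entries column gives the approximate number of records a tree of that height can index.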

B+ Trees
The structure of the internal nodes of a B+-tree of order p is as follows:
1. Each internal node is of the form $\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$ where $q \le p$ and each $P_i$ is a tree pointer.
2. Within each internal node, $K_1 < K_2 < \ldots < K_{q-1}$.
3. For all search field values X in the subtree pointed at by $P_i$, we have $K_{i-1} < X \le K_i$ for $1 < i < q$; $X \le K_i$ for $i = 1$; and $K_{i-1} < X$ for $i = q$.
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least $\lceil p/2 \rceil$ tree pointers. The root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, $q \le p$, has q - 1 search field values.

The structure of the leaf nodes of a B+-tree of order p (Figure 14.11b) is as follows:
1. Each leaf node is of the form $\langle \langle K_1, Pr_1 \rangle, \langle K_2, Pr_2 \rangle, \ldots, \langle K_{q-1}, Pr_{q-1} \rangle, P_{next} \rangle$ where $q \le p$, each $Pr_i$ is a data pointer, and $P_{next}$ points to the next leaf node of the B+-tree.
2. Within each leaf node, $K_1 < K_2 < \ldots < K_{q-1}$, $q \le p$.
3. Each $Pr_i$ is a data pointer that points to the record whose search field value is $K_i$ or to a file block containing the record (or to a block of record pointers that point to records whose search field value is $K_i$ if the search field is not a key).
4. Each leaf node has at least $\lceil p/2 \rceil$ values.
5. All leaf nodes are at the same level.
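Because every leaf carries a pointer to the next leaf, a range query can walk the leaf level in key order once the first relevant leaf has been found. The sketch below assumes the starting leaf is already known (in a full tree it would be reached by descending the internal nodes); the class and function names are hypothetical.

```python
class Leaf:
    def __init__(self, keys, ptrs, next_leaf=None):
        self.keys = keys          # K_1 < K_2 < ... within the leaf
        self.ptrs = ptrs          # data pointer Pr_i for each K_i
        self.next = next_leaf     # Pnext: the following leaf, or None


def range_scan(leaf, low, high):
    """Collect (key, pointer) pairs with low <= key <= high, in key order."""
    result = []
    while leaf is not None:
        for key, ptr in zip(leaf.keys, leaf.ptrs):
            if key > high:        # keys only grow from here on, so stop
                return result
            if key >= low:
                result.append((key, ptr))
        leaf = leaf.next          # follow Pnext to the neighbouring leaf
    return result


third = Leaf([9, 12], ["r9", "r12"])
second = Leaf([6, 7, 8], ["r6", "r7", "r8"], third)
first = Leaf([1, 3, 5], ["r1", "r3", "r5"], second)
print(range_scan(first, 5, 9))
```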

EXAMPLE 7: Suppose that we construct a B+-tree on the field of Example 6. To calculate the approximate number of entries of the B+-tree, we assume that each node is 69 percent full. On the average, each internal node will have 34 * 0.69 or approximately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold 0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have the following average number of entries at each level:
Root: 1 node, 22 entries, 23 pointers
Level 1: 23 nodes, 506 entries, 529 pointers
Level 2: 529 nodes, 11,638 entries, 12,167 pointers
Level 3: 12,167 nodes, 267,674 entries, 279,841 pointers
Level 4 (leaf): 279,841 nodes, 5,876,661 entries
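In a B+-tree only the leaf level holds data record pointers, so the capacity of the tree is the number of leaves times the average leaf occupancy. A short sketch of that calculation, using the fill factor and node orders from Example 7; the function name and the internal_levels parameter are assumptions.

```python
def bplus_leaf_capacity(p, p_leaf, fill, internal_levels):
    """Return (leaf nodes, data record pointers) for a B+-tree with the
    given number of internal levels above the leaf level."""
    fanout = round(p * fill)             # 34 * 0.69 = 23.46, about 23
    per_leaf = round(p_leaf * fill)      # 31 * 0.69 = 21.39, about 21
    leaves = fanout ** internal_levels   # pointers leaving the last internal level
    return leaves, leaves * per_leaf


leaves, records = bplus_leaf_capacity(p=34, p_leaf=31, fill=0.69, internal_levels=4)
print(f"{leaves:,} leaf nodes, {records:,} data record pointers")
```

Comparing this with the B-tree of Example 5 shows the effect of the larger fan-out: internal B+-tree nodes carry no data pointers, so more tree pointers fit in each node.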