CPSC 231 Managing Files of Records (D.H.)

Presentation transcript:

CPSC 231 Managing Files of Records (D.H.) 1 Learning Objectives: the concept of a key (primary and secondary keys); sequential versus direct access; relative record numbers (RRN); use of templates for I/O operations; abstract data models; tags; extensibility; portability.

CPSC 231 Managing Files of Records (D.H.) 2 Record Access - Record Keys File structures concentrate on records. Records are retrieved, written, modified, deleted, etc. To perform an operation on a record we first need to identify it, and for that we need a record key.

CPSC 231 Managing Files of Records (D.H.) 3 Key A record key is an expression derived from one or more of the fields within a record that can be used to identify that record. The fields used to build the key are sometimes called the key fields. Key-based access provides a way of performing content-based retrieval of records, rather than retrieval based merely on a record's position.

CPSC 231 Managing Files of Records (D.H.) 4 Desired properties of primary keys
Canonical (conforming to specific rules)
–e.g. the key has to consist of uppercase letters with no blanks
Unique (each record has a distinct key)
–e.g. student I.D. number + student name
–unique canonical keys are called primary keys
Primary keys should be dataless
–it is easier to ensure that a dataless key is unique
–a primary key should be unchanging
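The canonical-form rule above can be sketched in a few lines of C++. This is only an illustration under the slide's example rule (uppercase letters, no blanks); the function name makeCanonicalKey is hypothetical and not part of the course code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: build a canonical key by dropping whitespace
// and converting every remaining character to uppercase.
std::string makeCanonicalKey(const std::string& raw) {
    std::string key;
    for (unsigned char ch : raw) {
        if (std::isspace(ch)) continue;                      // rule: no blanks
        key.push_back(static_cast<char>(std::toupper(ch)));  // rule: uppercase
    }
    return key;
}
```

With this rule, "Alan Smith", "alan smith" and " ALAN  SMITH " all map to the same key, ALANSMITH, so a search does not depend on how the user typed the name.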

CPSC 231 Managing Files of Records (D.H.) 5 Secondary keys Secondary keys do not have to be unique and can contain data.
–E.g. the city field in a record that holds a name and an address.

CPSC 231 Managing Files of Records (D.H.) 6 Sequential Search Sequential search for a desired record requires that the file records be retrieved serially until the record that matches a desired key is found. Sequential search is based on reading each record from a file and comparing its key with the key of the record that we are looking for.

CPSC 231 Managing Files of Records (D.H.) 7 Sequential search performance If each read operation requires one disk access, then sequential search can be very inefficient. E.g. suppose that we have a file with one thousand records and we are looking for the record whose key is Alan Smith. An average successful search needs about 500 read operations. In general, about n/2 read calls are needed on average if the file has n records (and all n if the record is not in the file).

CPSC 231 Managing Files of Records (D.H.) 8 Blocking of records To improve performance, records are blocked together on the disk and read together, so that one disk access brings in several records. Blocking can improve the sequential search time considerably because it reduces the number of seeks. Note that the average search time is still proportional to the number of records in the file (O(n)).
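The effect of blocking can be estimated with a small calculation. The sketch below assumes that every record is equally likely to be the target and that all blocks are full; the function name averageBlockReads is hypothetical.

```cpp
#include <cassert>

// Average number of disk reads for a successful sequential search when
// recordsPerBlock records are fetched by each read.
// Assumes a uniformly distributed target and full blocks.
double averageBlockReads(long recordCount, long recordsPerBlock) {
    assert(recordCount > 0 && recordsPerBlock > 0);
    long blocks = (recordCount + recordsPerBlock - 1) / recordsPerBlock;  // ceiling
    return (blocks + 1) / 2.0;   // mean of 1, 2, ..., blocks
}
```

For 1000 records read one at a time this gives 500.5 reads (the "about n/2" of the previous slide); with 10 records per block it gives 50.5. The cost drops by the blocking factor but still grows linearly with the file size.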

CPSC 231 Managing Files of Records (D.H.) 9 When sequential search is good. Sequential search is a reasonable choice for: text files in which you are searching for some pattern; files with few records; files that are hardly ever searched (archive files); files that are searched on a secondary key when many matches are expected.

CPSC 231 Managing Files of Records (D.H.) 10 Unix Tools for Sequential processing
cat - prints a text file sequentially to the console.
–E.g. % cat myfile
wc - reads a text file sequentially and counts the lines, words and bytes in it.
–E.g. % wc myfile
grep - searches sequentially through a file for a pattern.
–E.g. % grep text myfile

CPSC 231 Managing Files of Records (D.H.) 11 Direct Access A radical alternative to sequential search is direct access. Direct access is a file access mode that involves jumping to the exact location of a record in the file. The search time required to perform a read via direct access is constant and it does not depend on the number of records in the file (O(1)).

CPSC 231 Managing Files of Records (D.H.) 12 Direct Access C++ Example
int IOBuffer::DRead(istream & stream, int recref)
// read the specified record
// recref is the record reference (or address, or offset)
{
    stream.seekg(recref, ios::beg);            // jump to the record
    if (stream.tellg() != recref) return -1;   // the seek failed
    return Read(stream);                       // read the record at that position
}

CPSC 231 Managing Files of Records (D.H.) 13 RRN RRN = Relative Record Number. If a file is a collection of records, then the RRN of a record is its position relative to the beginning of the file.
–E.g. RRN = 0 for the first record, RRN = 1 for the second record, etc.

CPSC 231 Managing Files of Records (D.H.) 14 RRN usage What was an RRN in our first assignment? We can support direct access with RRNs if the file structure uses fixed size records. How?

CPSC 231 Managing Files of Records (D.H.) 15 Record Structure and Length In designing a fixed size record structure one may choose: –fixed length fields or –variable size fields. The fixed length fields approach is simple but it tends to waste more disk space. The variable size fields approach is more complicated but its usage of disk space is better.

CPSC 231 Managing Files of Records (D.H.) 16 Record size One may ensure that a record never spans multiple sectors by selecting a record size that is a power of two no larger than the sector size. Because sector sizes are themselves powers of two, an integral number of records then fits in one sector. E.g. sixteen 32-byte records fill a 512-byte sector exactly.

CPSC 231 Managing Files of Records (D.H.) 17 Header records A header record is a record placed at the beginning of a file that is used to store information about the file contents and the file organization.
–E.g. the header record contents can be three two-byte values: the size of the header, the number of records, and the size of each record.

CPSC 231 Managing Files of Records (D.H.) 18 Header records - cont. Additionally, the following information can be kept in header records:
–the date and the time of the last update
–the date and the time of the last access
–protection information
Header records usually have a different structure than data records.

CPSC 231 Managing Files of Records (D.H.) 19 C++ Templates use for file I/O. A template class can support direct read and write of records. The template parameter RecType must support the following methods:
int Pack (BufferType &);     // pack record
int Unpack (BufferType &);   // unpack record

CPSC 231 Managing Files of Records (D.H.) 20 Example of file I/O using templates
template <class RecType>
class RecordFile : public BufferFile
{public:
    int Read (RecType & record, int recaddr);
    int Write (const RecType & record, int recaddr);
    RecordFile (IOBuffer & buffer) : BufferFile (buffer) {}
};

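CPSC 231 Managing Files of Records (D.H.) 21 Template method: Read
template <class RecType>
int RecordFile<RecType>::Read (RecType & record, int recaddr)
{
    int readAddr, result;
    readAddr = BufferFile::Read(recaddr);   // read the record into the buffer
    if (readAddr == -1) return -1;          // read failed (-1 signals failure, as in DRead)
    result = record.Unpack(buffer);         // unpack the buffer into the record
    if (!result) return -1;                 // unpack failed
    return readAddr;                        // address of the record that was read
}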

CPSC 231 Managing Files of Records (D.H.) 22 RecordFile template The RecordFile template is a pattern that can be used for different classes of records. When a template class is supplied with values for its parameters, it becomes a real class.
–E.g. RecordFile<Person> PersonFile (Buffer);

CPSC 231 Managing Files of Records (D.H.) 23 File Organization vs File Access File organization refers to record and field organization, e.g. variable or fixed size records. File access refers to sequential access or direct access. Both file organization and file access need to be considered when designing an efficient file structure for a given application.

CPSC 231 Managing Files of Records (D.H.) 24 Abstract Data Models Abstract data models refer to an application-oriented view of data (as opposed to a media-oriented view). Abstract data models allow for dealing with information that cannot be easily represented as a sequence of records, e.g. images and sounds.

CPSC 231 Managing Files of Records (D.H.) 25 Metadata Metadata is data in the file that is not the primary data but describes the primary data in the file. A common place to store metadata is in a file header. Typically, the community of users of a particular kind of data agrees on a standard format for holding metadata.

CPSC 231 Managing Files of Records (D.H.) 26 Metadata Example FITS (Flexible Image Transport System) is a standard for holding metadata in images, endorsed by the International Astronomical Union (IAU). A FITS header record is 2880 bytes long and holds information about images generated by telescopes, such as: the date/time and place the picture was taken, galactic longitude and latitude, telescope type, the number of pixels per row and the number of rows, etc.

CPSC 231 Managing Files of Records (D.H.) 27 Tagged File Format Tags are keywords used in connection with file structures to identify various data objects. Tagged files are used to store objects with different data types. Index tables and tags are used to hold information about data objects and to distinguish different types of objects. See Fig. 5.9, p. 181 of the text, for an example of a tagged file.

CPSC 231 Managing Files of Records (D.H.) 28 Examples of tags
header / header record
text / text data
image / image data
exec / executable program
video / video data
sound / sound data
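A sketch of how an index table of tags might be used to locate objects in a tagged file. The entry layout (tag, offset, length) and the function names are assumptions for illustration; the figure in the text may organize the table differently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical index-table entry: what an object is and where it lives.
struct IndexEntry {
    std::string tag;   // e.g. "header", "text", "image"
    long offset;       // byte offset of the object within the file
    long length;       // length of the object in bytes
};

// Return the first entry with the given tag, or nullptr if there is none.
const IndexEntry* findByTag(const std::vector<IndexEntry>& index,
                            const std::string& tag) {
    for (const IndexEntry& entry : index)
        if (entry.tag == tag) return &entry;
    return nullptr;
}

// Extract the bytes of one tagged object from the file contents.
std::string readObject(const std::string& fileBytes, const IndexEntry& entry) {
    return fileBytes.substr(static_cast<std::size_t>(entry.offset),
                            static_cast<std::size_t>(entry.length));
}
```

A reader that does not understand a particular tag can skip that object by using its offset and length, which is what lets objects of different types coexist in one file.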

CPSC 231 Managing Files of Records (D.H.) 29 Extensibility The tag approach allows for easy extensibility of file systems. Once the software is built to manipulate different types of objects, it is easy to add new object types. For each new object type one needs to define a tag, an index entry, and the methods for reading and writing it.

CPSC 231 Managing Files of Records (D.H.) 30 Portability Portability of a file system refers to its ability to be used on different hardware platforms, under various operating systems, and by different applications.

CPSC 231 Managing Files of Records (D.H.) 31 Portability Issues
Differences among operating systems
–E.g. MS-DOS text files may mark end-of-file with a CTRL/Z character, whereas Unix files contain no end-of-file character (CTRL/D only signals end of input at a terminal)
Differences among languages
–E.g. Pascal supports only fixed size records for non-text files, but C++ supports both fixed size and variable size records
Differences in hardware architectures
–E.g. a PC stores the low order byte followed by the high order byte (little-endian), but a Sun does it the other way around (big-endian)

CPSC 231 Managing Files of Records (D.H.) 32 Achieving Portability Standard formats for data storage and encoding are used to achieve portability. A standard physical record format is a data format that is independent of the hardware, the language and the operating system. E.g. FITS is a good example of a standard physical format.

CPSC 231 Managing Files of Records (D.H.) 33 Standard Data Encoding Standards for text and number encoding:
–ASCII or EBCDIC for text
–IEEE Standard formats and XDR formats for binary representation of numbers
IEEE Standard formats specify formats for 32-bit, 64-bit and 128-bit floating point numbers and for 8-bit, 16-bit, and 32-bit integers. XDR (External Data Representation) specifies a machine-independent encoding for data; each machine provides routines that convert between its internal representation and the XDR encoding when writing to a file and when reading from it.
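The byte-order problem and its fix can be illustrated with a small sketch. It writes a 32-bit unsigned integer high-order byte first, which is the order XDR uses for integers, regardless of how the host machine stores it. The function names encodeU32 and decodeU32 are hypothetical, and this is not the XDR library API.

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit unsigned integer high-order byte first (big-endian).
// Shifts work on the value, not on memory, so the result is the same
// on little-endian and big-endian hosts.
std::array<unsigned char, 4> encodeU32(std::uint32_t v) {
    return {{static_cast<unsigned char>((v >> 24) & 0xFF),
             static_cast<unsigned char>((v >> 16) & 0xFF),
             static_cast<unsigned char>((v >> 8) & 0xFF),
             static_cast<unsigned char>(v & 0xFF)}};
}

// Decode four bytes produced by encodeU32 back into a host integer.
std::uint32_t decodeU32(const std::array<unsigned char, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
           static_cast<std::uint32_t>(b[3]);
}
```

A file written with encodeU32 on a PC can be read with decodeU32 on a big-endian machine and yield the same number, because the on-disk byte order is fixed by the format rather than by the hardware.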