CNG 3511 CNG 351 Introduction to Data Management and File Structures Müslim Bozyiğit (Prof. Dr.) Department of Computer Engineering METU.

Slides:



Advertisements
Similar presentations
Indexing Large Data COMP # 22
Advertisements

Database management system (DBMS)  a DBMS allows users and other software to store and retrieve data in a structured way  controls the organization,
Chapter 6: Memory Management
Lecture 13 Page 1 CS 111 Online File Systems: Introduction CS 111 On-Line MS Program Operating Systems Peter Reiher.
Comp 335 File Structures Indexes. The Search for Information When searching for information, the information desired is usually associated with a key.
Managing data Resources: An information system provides users with timely, accurate, and relevant information. The information is stored in computer files.
1 Storing Data: Disks and Files Yanlei Diao UMass Amherst Feb 15, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Disk Storage, Basic File Structures, and Hashing by Pinar Senkul resources: mostly froom.
Introduction to Databases Transparencies
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
Chapter 12 File Management Systems
Data Storage Technology
Introduction to Databases and Database Languages
School of Engineering and Computer Science Victoria University of Wellington Copyright: Xiaoying Gao, Peter Andreae, VUW Indexing Large Data COMP
CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU.
File Organization Techniques
Lecture 11: DMBS Internals
Chapter 10 Storage and File Structure Yonsei University 2 nd Semester, 2013 Sanghyun Park.
1 Chapter 12 File Management Systems. 2 Systems Architecture Chapter 12.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Overview of Computing. Computer Science What is computer science? The systematic study of computing systems and computation. Contains theories for understanding.
1 Introduction to Database Systems. 2 Database and Database System / A database is a shared collection of logically related data designed to meet the.
Databases and Database Management Systems
File Processing - Introduction MVNC1 File Processing BASIC CONCEPTS & TERMINOLOGY.
March 16 & 21, Csci 2111: Data and File Structures Week 9, Lectures 1 & 2 Indexed Sequential File Access and Prefix B+ Trees.
File System Interface. File Concept Access Methods Directory Structure File-System Mounting File Sharing (skip)‏ File Protection.
External data structures
Database Management Systems,Shri Prasad Sawant. 1 Storing Data: Disks and Files Unit 1 Mr.Prasad Sawant.
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
External Storage Primary Storage : Main Memory (RAM). Secondary Storage: Peripheral Devices –Disk Drives –Tape Drives Secondary storage is CHEAP. Secondary.
IDA / ADIT Databasteknik Databaser och bioinformatik Data structures and Indexing (I) Fang Wei-Kleiner.
2.1 Operating System Concepts Chapter 2: Computer-System Structures Computer System Operation Storage Structure Storage Hierarchy Hardware Protection General.
CS246 Data & File Structures Lecture 1 Introduction to File Systems Instructor: Li Ma Office: NBC 126 Phone: (713)
CS4432: Database Systems II Record Representation 1.
File Organization Lecture 1
MEMORY ORGANIZTION & ADDRESSING Presented by: Bshara Choufany.
Chapter Ten. Storage Categories Storage medium is required to store information/data Primary memory can be accessed by the CPU directly Fast, expensive.
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
CPSC 404, Laks V.S. Lakshmanan1 External Sorting Chapter 13: Ramakrishnan & Gherke and Chapter 2.3: Garcia-Molina et al.
CPSC 231 D.H.1 Learning Objectives Understanding of disk versus RAM performance gap. Understanding definition, design goals and design problems of file.
Lecture 3 Page 1 CS 111 Online Disk Drives An especially important and complex form of I/O device Still the primary method of providing stable storage.
CIS/SUSL1 Fundamentals of DBMS S.V. Priyan Head/Department of Computing & Information Systems.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
Data Structures and Algorithms Dr. Tehseen Zia Assistant Professor Dept. Computer Science and IT University of Sargodha Lecture 1.
1 TOPIC 6 DATABASE 6.1 Introduction to Database 6.2 Basic Concept of Database 6.3 Database Object DATABASE.
BBM371 Data Managment Assoc. Prof. Dr. Ebru Akçapınar Sezer
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
FILE ORGANIZATION.
A memory is just like a human brain. It is used to store data and instructions. Computer memory is the storage space in computer where data is to be processed.
1 CSC103: Introduction to Computer and Programming Lecture No 27.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
DMBS Internals I. What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the database concurrently.
CENG 3511 CENG 351 Introduction to Data Management and File Structures Nihan Kesim Çiçekli Department of Computer Engineering METU.
1 CENG 351 CENG 351 Introduction to Data Management and File Structures Department of Computer Engineering METU.
1 Lecture 16: Data Storage Wednesday, November 6, 2006.
( ) 1 Chapter # 8 How Data is stored DATABASE.
CENG 351 Introduction to Data Management and File Structures
File Organization and Processing
Chapter 2: Computer-System Structures
Lecture 16: Data Storage Wednesday, November 6, 2006.
Operating System Structure
Lecture 11: DMBS Internals
Disk Storage, Basic File Structures, and Buffer Management
Managing data Resources:
RDBMS Chapter 4.
Chapter 2: Operating-System Structures
Introduction to Operating Systems
Introduction to Data Structure
CENG 351 Introduction to Data Management and File Structures
Chapter 2: Operating-System Structures
Presentation transcript:

CNG 3511 CNG 351 Introduction to Data Management and File Structures Müslim Bozyiğit (Prof. Dr.) Department of Computer Engineering METU

CNG 3512 Introduction to Data Management and File management

CNG 3513 Motivation  Most computers are used for data processing, as a big growth area in the “information age”  Data processing from a computer science perspective: –Storage of data –Organization of data –Access to data –Processing of data

CNG 3514 Data Structures vs File Structures Both involve: –Representation of Data + –Operations for accessing data Difference: –Data structures: deal with data in the main memory –File structures: deal with the data in the secondary storage

CNG 3515 Hardware Operating System DBMS File system Application Where do File Structures fit in Computer Science?

CNG 3516 Computer Architecture Main Memory (RAM) Secondary Storage data transfer data is manipulated here data is stored here - type: Semiconductors - Properties: Fast, expensive, volatile, small - type: disks, tapes - properties: Slow,cheap, stable, large

CNG 3517 Main Memory-MM fast small volatile, i.e. data is lost during power failures. Secondary Storage-SS big (because it is cheap) stable (non-volatile) i.e. data is not lost during power failures slow (10,000 times slower than MM)

CNG 3518 How fast is the main memory? Typical time for getting info from: Main memory: ~10 nanosec = 10 x sec Hard disks: ~10 milisec = 10 x sec An analogy keeping same time proportion as above: seconds versus weeks

CNG 3519 Goal of the file structures What is performance –Time Minimize the number of trips to the SS in order to get desired information Group related information so that we are likely to get everything we need with fewer trip to the SS. –Memory Balance the memory size and the time How to improve performance –Use the right file structure Understand the advantages disadvantages of alternative methods

CNG Metrics used to measure efficiency and effectiveness of a File structure-1 simplicity, reliability, time complexities, space complexities, scalability, programmability, and maintainability.  Note that the domains of the efficiency and effectiveness concerns rely on time and space complexity more than any other factor.

CNG Metrics used to measure efficiency and effectiveness of a File structure-2 The file structures involve two domains: hardware and software. –Hardware primarily involves the physical characteristics of the storage medium. –Software involves the data structures used to manipulate the files and methods or algorithms to deal with these structures. The physical characteristics of the hardware together with data structures and the algorithms are used to predict the efficiency of file operations.

CNG File operations search for a particular data in a file, add a certain data item, remove a certain item, order the data items according to a certain criterion, merge of files, creation of new files from existing file(s). finally create, open, and close operations which have implications in the operating system.

CNG File structures versus DBMS According to Alan Tharp, “file structures is used to process data in physical level, DBMS is used to manage data in a logical level” According to Raghu Ramakrishnan, “DBMS is a piece of software designed to make data maintenance easier, safer, and more reliable”. Thus, file processing is a pre-requisite to DBMSs.  Note that small applications may not be able to justify the overhead incurred by the DBMSs.

CNG Physical Files and Logical Files physical file: a collection of bytes stored on a disk or tape logical file: an “interface" that allows the application programs to access the physical file on the SS The operating system is responsible for associating a logical file in a program to a physical file in a SS. Writing to or reading from a file in a program is done through the operating system.

CNG Files The physical file has a name, for instance myfile.txt The logical file has a logical name (a varibale) inside the program. In C : FILE * outfile ; In C++: fstream outfile;

CNG Basic File Processing Operations Opening Closing Reading Writing Seeking

CNG Opening Files Opening Files: – links a logical file to a physical file. In C: FILE * outfile; outfile = fopen(“myfile.txt”, “w”); In C++: fstream outfile; outfile.open(“myfile.txt”, ios::out);

CNG Closing Files Cuts the link between the physical and logical files. After closing a file, the logical name is free to be associated to another physical file. Closing a file used for output guarantees everything has been written to the physical file. (When the file is closed the leftover from the buffers in the MM is flushed to the file on the SS.) In C : fclose(outfile); In C++ : outfile.close();

CNG Reading Read data from a file and place it in a variable inside the program. In C: char c; FILE * infile; infile = fopen(“myfile.txt”,”r”); fread(&c, 1, 1, infile); In C++: char c; fstream infile; infile.open(“myfile.txt”,ios::in); infile >> c;

CNG Writing Write data from a variable inside the program into the file. In C: char c; FILE * outfile; outfile = fopen(“mynew.txt”,”w”); fwrite(&c, 1, 1, outfile); In C++: char c; fstream outfile; outfile.open(“mynew.txt”,ios::out); outfile << c;

CNG Seeking Used for direct access; an item can be accessed by specifying its position in the file. In C: fseek(infile,0, 0); // moves to the beginning fseek(infile, 0, 2); // moves to the end fseek(infile,-10, 1); //moves 10 bytes from //current position In C++: infile.seekg(0,ios::beg); infile.seekg(0,ios::end); infile.seekg(-10,ios::cur);

CNG File Systems Data is not scattered on disk. Instead, it is organized into files. Files are organized into records. Records are organized into fields.

CNG Example A student file may be a collection of student records, one record for each student Each student record may have several fields, such as –Name –Address –Student number –Gender –Age –GPA Typically, each record in a file has the same fields.

CNG Properties of Files 1)Persistence: Data written into a file persists after the program stops, so the data can be used later. 2)Shareability: Data stored in files can be shared by many programs and users simultaneously. 3)Size: Data files can be very large. Typically, they cannot fit into MM.