CSC 370 – Database Systems Introduction
What’s a database?
In essence, a database is nothing more than a collection of information that exists over a long period of time. Databases are empowered by a body of knowledge and technology embodied in specialized software called a database management system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of data efficiently and for allowing the data to persist safely over long periods of time. DBMSs are among the most complex types of software available.
The database [management] system
A DBMS:
1. Allows users to create new databases and specify their schema (the logical structure of the data), using a data-definition language.
2. Enables users to query and modify the data, using a query language and a data-manipulation language.
3. Supports intelligent storage of very large amounts of data.
–Protects data from accidents or improper use. Example: we can require the DBMS to reject the insertion of two different employees with the same SIN.
–Allows efficient access to the data for queries and modifications. Example: indexes over specified fields.
4. Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can accidentally corrupt the data.
5. Recovers from software failures and crashes.
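The SIN example in item 3 can be sketched with Python's built-in sqlite3 module, used here only as a convenient stand-in DBMS. The table, column, and index names are hypothetical. The point is that the schema, written in the data-definition language, declares a UNIQUE constraint, and the DBMS itself, not application code, rejects the duplicate.

```python
import sqlite3

# In-memory database; table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")

# Data-definition language: declare the schema, including a UNIQUE constraint on SIN.
conn.execute("CREATE TABLE Employees (code INTEGER PRIMARY KEY, name TEXT, SIN TEXT UNIQUE)")

# An index over a specified field, to speed up lookups by name.
conn.execute("CREATE INDEX idx_employees_name ON Employees(name)")

# Data-manipulation language: insert a row.
conn.execute("INSERT INTO Employees VALUES (1, 'Smith', '123456789')")

# A second employee with the same SIN is rejected by the DBMS.
try:
    conn.execute("INSERT INTO Employees VALUES (2, 'Jones', '123456789')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Query language: only the first row was stored.
row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```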
Early database systems and file systems
The first commercial database systems evolved from file systems. File systems allow the storage of large amounts of data (though not safely), but they do not provide a query language for the data in files. People had to write programs to extract even the most elementary information from a set of files.
Example: suppose we have stored records with the fields (code, name, dept_code) in a file called Employees, and records with the fields (dept_code, dept_name) in another file called Departments. Now suppose that, given an employee, for instance one named “Smith”, we want to find which department he works in.
Early database systems and file systems (Cont.)
In the absence of a query language, we have to write a program that will:
1. Open the file Employees.
2. Declare a variable of the same type as the records stored in the file.
3. Scan the file: until the end of the file is reached, assign the current record to the above variable.
4. If the value of the name field is “Smith”, get the value of the dept_code field. Suppose it is “10000”.
5. Search in a similar way for a record with “10000” in the dept_code field in the Departments file.
6. Print the dept_name once the matching dept_code has been found.
This is a very painful procedure even for the simplest queries. Compare it to the short and elegant SQL query:
SELECT dept_name FROM Employees, Departments WHERE Employees.name = 'Smith' AND Employees.dept_code = Departments.dept_code;
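The six-step procedure and the one-line SQL query can be compared side by side in a small Python sketch. The file contents are hypothetical CSV text standing in for the Employees and Departments files (a real record file would typically be binary), and sqlite3 is used only as a stand-in DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical file contents standing in for the Employees and Departments files.
EMPLOYEES_FILE = "code,name,dept_code\n1,Smith,10000\n2,Jones,20000\n"
DEPARTMENTS_FILE = "dept_code,dept_name\n10000,Sales\n20000,Research\n"

def find_department_by_scanning(name):
    """Hand-written program: scan both files record by record."""
    dept_code = None
    # Steps 1-4: open Employees, scan to the end, look for the given name.
    for record in csv.DictReader(io.StringIO(EMPLOYEES_FILE)):
        if record["name"] == name:
            dept_code = record["dept_code"]
            break
    if dept_code is None:
        return None
    # Step 5: scan Departments for the matching dept_code.
    for record in csv.DictReader(io.StringIO(DEPARTMENTS_FILE)):
        if record["dept_code"] == dept_code:
            return record["dept_name"]  # Step 6: report the department name.
    return None

def find_department_with_sql(name):
    """The same question as one declarative SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (code INTEGER, name TEXT, dept_code TEXT)")
    conn.execute("CREATE TABLE Departments (dept_code TEXT, dept_name TEXT)")
    # Load the same hypothetical data; named placeholders are filled from each dict row.
    conn.executemany("INSERT INTO Employees VALUES (:code, :name, :dept_code)",
                     csv.DictReader(io.StringIO(EMPLOYEES_FILE)))
    conn.executemany("INSERT INTO Departments VALUES (:dept_code, :dept_name)",
                     csv.DictReader(io.StringIO(DEPARTMENTS_FILE)))
    row = conn.execute(
        "SELECT dept_name FROM Employees, Departments "
        "WHERE Employees.name = ? AND Employees.dept_code = Departments.dept_code",
        (name,),
    ).fetchone()
    conn.close()
    return row[0] if row else None

print(find_department_by_scanning("Smith"), find_department_with_sql("Smith"))  # Sales Sales
```

Every new question needs another scanning function, whereas the SQL version only needs a different query string.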
The first important applications of DBMSs
The first important applications were those in which the data was composed of many small items and many queries or modifications were made:
–Airline reservation systems
–Banking systems
–Corporate records
Airline Reservation Systems
Here the items of data include:
–Reservations by a single customer on a single flight, including such information as the assigned seat…
–Flight information: the airports they fly from and to, their departure and arrival times…
–Ticket information: prices, requirements, and availability.
Typical queries ask for:
–Flights leaving around a certain time from one given city to another, what seats are available, and at what prices.
Typical data modifications include:
–Making a reservation on a flight for a customer, assigning a seat, etc.
Many agents will be accessing parts of the data at any given time. The DBMS must allow concurrent access while preventing problems such as two agents assigning the same seat simultaneously. The DBMS should also protect against loss of records if the system suddenly fails.
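One way to keep two agents from assigning the same seat is to make the check (“is the seat still free?”) and the assignment a single atomic statement. This sketch assumes a hypothetical Seats table and flight number, and uses Python's sqlite3 as a stand-in DBMS. The two calls run one after the other, so it illustrates the atomic check-and-assign rather than truly concurrent execution, which real systems handle with locking or similar concurrency-control mechanisms.

```python
import sqlite3

# Hypothetical schema: one row per seat; customer stays NULL while the seat is free.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seats (flight TEXT, seat TEXT, customer TEXT, "
             "PRIMARY KEY (flight, seat))")
conn.execute("INSERT INTO Seats VALUES ('AC123', '14C', NULL)")
conn.commit()

def assign_seat(conn, flight, seat, customer):
    """Claim the seat only if it is still free; returns True on success."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE Seats SET customer = ? "
            "WHERE flight = ? AND seat = ? AND customer IS NULL",
            (customer, flight, seat),
        )
    # rowcount is 1 if this call claimed the seat, 0 if it was already taken.
    return cur.rowcount == 1

first = assign_seat(conn, "AC123", "14C", "Alice")   # agent 1
second = assign_seat(conn, "AC123", "14C", "Bob")    # agent 2 tries the same seat
print(first, second)  # True False
```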
Banking Systems
Data items include:
–Customers, their names, addresses, etc.
–Accounts and their balances
–Loans and their balances
–Connections between customers and their accounts and loans.
Typical queries are those for account and loan balances. Typical modifications are those representing a payment from, or a deposit to, an account.
In banking systems, failures cannot be tolerated.
–E.g., once the money has been ejected from an ATM, the bank must record the debit, even if the power immediately fails.
–On the other hand, it is not permissible for the bank to record the debit and then fail to deliver the money because the power fails.
–The proper way to handle this operation is far from obvious and is one of the significant achievements in DBMS architecture.
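The all-or-nothing behavior described for banking is what DBMSs provide through transactions. This sketch, using Python's sqlite3 as a stand-in, assumes hypothetical Accounts and Withdrawals tables and simulates a power failure with an exception: because both statements run in one transaction, the interrupted debit is rolled back. It covers only the database side; coordinating the database with the physical cash dispenser is a harder problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE Withdrawals (accountNo INTEGER, amount INTEGER)")
conn.execute("INSERT INTO Accounts VALUES (67890, 500)")
conn.commit()

class PowerFailure(Exception):
    """Hypothetical stand-in for a crash in the middle of the operation."""

def withdraw(conn, account_no, amount, fail_midway=False):
    # Both statements belong to one transaction: either both take effect or neither does.
    with conn:
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?",
                     (amount, account_no))
        if fail_midway:
            raise PowerFailure("simulated failure before the withdrawal is recorded")
        conn.execute("INSERT INTO Withdrawals VALUES (?, ?)", (account_no, amount))

try:
    withdraw(conn, 67890, 100, fail_midway=True)
except PowerFailure:
    pass  # the connection's context manager has already rolled back the debit

balance_after_failure = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890").fetchone()[0]
withdraw(conn, 67890, 100)
balance_after_success = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890").fetchone()[0]
print(balance_after_failure, balance_after_success)  # 500 400
```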
Early DBMSs (1960s)
These systems encouraged the user to view the data much as it was stored. The chief models were the hierarchical and network models. The main characteristic of these models was the ability to easily jump, or navigate, from one object to another through pointers.
–E.g., from an employee to his or her department.
However, these models did not provide a high-level query language for the data.
–So one still had to write programs to query the data.
They also did not allow online schema modifications.
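Pointer-based navigation can be sketched with ordinary object references in Python. The Department and Employee classes and the hire and employees_named helpers are hypothetical illustrations, not the interface of any hierarchical or network DBMS: following a stored pointer is easy, but a question the pointers were not designed for needs a hand-written traversal.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record types in a navigational model: related records are linked by
# direct references (pointers). eq=False keeps identity comparison, which avoids
# infinite recursion through the two-way links.
@dataclass(eq=False)
class Department:
    name: str
    employees: List["Employee"] = field(default_factory=list, repr=False)

@dataclass(eq=False)
class Employee:
    name: str
    department: Optional[Department] = None

def hire(emp: Employee, dept: Department) -> None:
    """Link the two records in both directions."""
    emp.department = dept
    dept.employees.append(emp)

sales = Department("Sales")
smith = Employee("Smith")
hire(smith, sales)

# Easy: follow the pointer from an employee to his or her department.
print(smith.department.name)  # Sales

def employees_named(departments: List[Department], name: str) -> List[Employee]:
    """An unanticipated question: walk every department's employee list by hand."""
    return [e for d in departments for e in d.employees if e.name == name]
```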
Relational databases
Codd (1970): a database system should present the user with a view of the data organized as tables (also called relations). Behind the scenes there could be a complex data structure that allows rapid responses to a variety of queries, but the user would not be concerned with the storage structure. Queries could be expressed in a very high-level language, which greatly increases the efficiency of database programmers. This high-level query language for relational databases is called Structured Query Language (SQL).
Example of a Relational DB
Relations = tables. The columns are “headed” by attribute names. A relation Accounts might be:
accountNo | balance | type
… | … | savings
… | … | checking
… | … | …
Below the attributes are the rows, or tuples.
Suppose we want to know the balance of account 67890. We could ask this query in SQL as in (1). As another example, we can ask for the savings accounts with negative balances, as in (2).
(1) SELECT balance FROM Accounts WHERE accountNo = 67890;
(2) SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0;
To evaluate such a query, we examine all the tuples of the relation Accounts named in the FROM-clause, pick out those tuples that satisfy the criterion in the WHERE-clause, and produce as the answer the attributes of those tuples indicated in the SELECT-clause.
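Queries (1) and (2) can be run against a small Accounts relation with Python's sqlite3 module. The rows are hypothetical, since the slide's table values are not given. The last step repeats query (2) as an explicit loop that mirrors the FROM / WHERE / SELECT reading described above.

```python
import sqlite3

# Hypothetical rows: (accountNo, balance, type).
ROWS = [
    (12345, 1000.0, "savings"),
    (67890, 1200.0, "checking"),
    (24680, -50.0, "savings"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?)", ROWS)

# Query (1): the balance of account 67890.
q1 = conn.execute("SELECT balance FROM Accounts WHERE accountNo = 67890").fetchall()

# Query (2): savings accounts with negative balances.
q2 = conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0"
).fetchall()

# Query (2) spelled out: examine every tuple (FROM), keep those meeting
# the condition (WHERE), and output the chosen attribute (SELECT).
q2_by_hand = [(acct,) for (acct, bal, typ) in ROWS if typ == "savings" and bal < 0]

print(q1, q2, q2_by_hand)  # [(1200.0,)] [(24680,)] [(24680,)]
```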
Architecture of a DBMS
The “cylindrical” component of the architecture diagram (the data storage) contains not only data but also metadata, i.e., information about the structure of the data. If the DBMS is relational, the metadata includes:
–names of relations,
–names of the attributes of those relations, and
–data types for those attributes (e.g., integer or character string).
A database maintains indexes for the data.
–Indexes are part of the stored data.
–The description of which attributes have indexes is part of the metadata.
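SQLite, accessed here through Python's sqlite3 module, is one concrete example of a DBMS that stores metadata alongside the data: its sqlite_master catalog lists relations and indexes, and PRAGMA table_info and PRAGMA index_info describe attributes and indexed columns. The Accounts table and the index name are hypothetical; other DBMSs expose similar information through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.execute("CREATE INDEX idx_accounts_no ON Accounts(accountNo)")

# Names of relations, read from SQLite's catalog table.
relations = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Attribute names and declared types; table_info rows are
# (cid, name, type, notnull, dflt_value, pk).
attributes = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(Accounts)")]

# Which attributes are indexed is also metadata; index_info rows are (seqno, cid, name).
indexed_columns = [r[2] for r in conn.execute("PRAGMA index_info(idx_accounts_no)")]

print(relations, attributes, indexed_columns)
```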
A few words about indexes
A database index is similar to a book index. A book index associates words with the numbers of the pages where they appear. A database index associates values of some field(s) of the objects with the physical addresses of the corresponding objects on disk. The two main properties of an index are: (a) it is sorted, and (b) its size is much smaller than that of the record set being indexed. Hence, searching in an index is much faster than searching in the corresponding record set.
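The two properties (sorted, and small) are what make binary search over an index possible. This sketch models a hypothetical record file as a Python list, where a record's position stands in for its physical disk address, and builds a sorted index of (key, address) pairs that is searched with the standard bisect module.

```python
import bisect

# Hypothetical "disk": a list of records; list position plays the role of a physical address.
records = [
    ("67890", "checking", 1200.0),
    ("12345", "savings", 1000.0),
    ("24680", "savings", -50.0),
]

# The index: (key, address) pairs sorted by key; it holds only keys and addresses.
index = sorted((rec[0], addr) for addr, rec in enumerate(records))
keys = [k for k, _ in index]

def lookup(account_no):
    """Binary-search the sorted index, then fetch the record at the stored address."""
    i = bisect.bisect_left(keys, account_no)
    if i < len(keys) and keys[i] == account_no:
        return records[index[i][1]]
    return None

print(lookup("24680"))  # ('24680', 'savings', -50.0)
```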
Storage Manager
The job of the storage manager is to:
–obtain data from the data storage, and
–modify the data in the data storage when requested.
The storage manager has two components:
–The file manager handles files. It keeps track of the location of files and obtains block(s) of a file on request from the buffer manager.
–The buffer manager handles main memory. It obtains and returns blocks of data from/to the file manager and stores blocks temporarily in main-memory pages.
1 block = 1 page = 4,000 to 16,000 bytes. A block is the smallest unit of data that is read from or written to disk.
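The division of labor between the file manager and the buffer manager can be sketched in Python. Both classes are hypothetical, the 4,096-byte block size is an assumed value within the range above, and the least-recently-used (LRU) replacement policy is an assumption, since the slide does not say how the buffer manager chooses which page to evict. The disk_reads counter shows that repeated requests for a buffered block never reach the disk.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block/page size in bytes

class FileManager:
    """Hypothetical file manager: returns whole blocks of a file and counts disk reads."""
    def __init__(self, data: bytes):
        self.data = data
        self.disk_reads = 0

    def read_block(self, block_no: int) -> bytes:
        self.disk_reads += 1
        start = block_no * BLOCK_SIZE
        return self.data[start:start + BLOCK_SIZE]

class BufferManager:
    """Hypothetical buffer manager: keeps recently used blocks in main-memory pages (LRU)."""
    def __init__(self, file_manager: FileManager, capacity: int):
        self.fm = file_manager
        self.capacity = capacity
        self.pages = OrderedDict()  # block_no -> block bytes, oldest first

    def get_block(self, block_no: int) -> bytes:
        if block_no in self.pages:
            self.pages.move_to_end(block_no)  # mark as most recently used
            return self.pages[block_no]
        block = self.fm.read_block(block_no)  # miss: ask the file manager
        self.pages[block_no] = block
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return block

fm = FileManager(bytes(BLOCK_SIZE * 4))  # a 4-block file of zero bytes
bm = BufferManager(fm, capacity=2)
for block_no in [0, 1, 0, 0, 2, 1]:
    bm.get_block(block_no)
print(fm.disk_reads)  # 4
```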
Query Processor
The query processor handles queries and modifications to the data. It finds the best way to carry out a requested operation and issues commands to the storage manager that will carry them out.
E.g., a bank has a database with two relations: Customers (name, SIN, address) and Accounts (accountNo, balance, SIN).
Query: “Find the balances of all accounts of which Sally is the owner.”
SELECT Accounts.balance FROM Customers, Accounts WHERE Customers.SIN = Accounts.SIN AND Customers.name = 'Sally';
Query Processor (Cont.)
What this query logically says is:
1. Form the Cartesian product R of the tables specified in the FROM-clause.
2. Choose from R the tuples satisfying the condition in the WHERE-clause.
3. Produce as the answer only the values of the attributes in the SELECT-clause.
If the query were answered literally in this way, performance would be terrible, because the Cartesian product is usually enormous. Suppose instead that we have:
–an index on the name attribute of Customers, and
–an index on the SIN attribute of Accounts.
Then the query processor will cleverly create a plan that inexpensively:
–retrieves the tuple for “Sally” and gets her SIN, and
–retrieves the account tuples for this SIN.
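The gap between the literal reading and the index-based plan can be sketched in Python with hypothetical contents for Customers and Accounts. Dictionaries play the role of the two indexes (real DBMSs typically use structures such as B-trees), and each plan reports how many tuples it touched: 9 pairs for the Cartesian product versus 3 tuples through the indexes on this tiny data set, a gap that grows quickly with the sizes of the relations.

```python
from itertools import product

# Hypothetical contents of the two relations.
customers = [  # (name, SIN, address)
    ("Sally", "111", "12 Oak St"),
    ("Tom", "222", "3 Elm St"),
    ("Ann", "333", "9 Pine St"),
]
accounts = [  # (accountNo, balance, SIN)
    (12345, 1000.0, "111"),
    (67890, 250.0, "222"),
    (24680, 75.0, "111"),
]

def naive_plan(name):
    """Literal reading: Cartesian product, then filter, then project."""
    pairs = list(product(customers, accounts))              # step 1: R = Customers x Accounts
    matching = [(c, a) for c, a in pairs
                if c[1] == a[2] and c[0] == name]          # step 2: WHERE
    return sorted(a[1] for _, a in matching), len(pairs)    # step 3: SELECT balance

# Hypothetical indexes: attribute value -> list of tuples with that value.
name_index = {}
for c in customers:
    name_index.setdefault(c[0], []).append(c)
sin_index = {}
for a in accounts:
    sin_index.setdefault(a[2], []).append(a)

def indexed_plan(name):
    """Use the name index to find the SIN, then the SIN index to find the accounts."""
    balances, touched = [], 0
    for c in name_index.get(name, []):
        touched += 1
        for a in sin_index.get(c[1], []):
            touched += 1
            balances.append(a[1])
    return sorted(balances), touched

print(naive_plan("Sally"))    # ([75.0, 1000.0], 9)
print(indexed_plan("Sally"))  # ([75.0, 1000.0], 3)
```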
Transaction Manager
The transaction manager is responsible for the integrity of the system. It must ensure that:
–several queries running simultaneously do not interfere with each other, and
–the system will not lose data even if there is a power failure.
The transaction manager interacts with:
–the query processor, because it may need to delay certain query operations to avoid conflicts;
–the storage manager, because schemes for protecting data involve storing a log of changes to the data.
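The role of the change log can be illustrated with a toy undo log. The UndoLogStore class is hypothetical and keeps everything in memory; a real DBMS would force log records to stable storage before the changed data, and many systems use redo or undo/redo logging rather than this undo-only scheme. Each update first records the old value; after a crash, recovery restores the old values written by any transaction that never committed.

```python
class UndoLogStore:
    """Hypothetical toy store: every update is logged (with its old value) before it is applied."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []  # entries: ("UPDATE", txn, key, old_value) or ("COMMIT", txn)

    def write(self, txn, key, value):
        # Updates existing items only: log the old value first, then modify the data.
        self.log.append(("UPDATE", txn, key, self.data[key]))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("COMMIT", txn))

    def recover(self):
        """Scan the log backwards, undoing updates of transactions with no COMMIT record."""
        committed = {entry[1] for entry in self.log if entry[0] == "COMMIT"}
        for entry in reversed(self.log):
            if entry[0] == "UPDATE" and entry[1] not in committed:
                _, _, key, old_value = entry
                self.data[key] = old_value

store = UndoLogStore({"A": 500, "B": 100})
store.write("T1", "A", 400)
store.write("T1", "B", 200)
store.commit("T1")
store.write("T2", "A", 0)   # simulated crash: T2 never commits
store.recover()
print(store.data)  # {'A': 400, 'B': 200}
```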
Database Studies
Design of databases.
–What kinds of information go into the database?
–How is the information structured?
–How do data items connect?
Database programming.
–How does one express queries on the database?
–How does one use other capabilities of a DBMS, such as transactions or constraints, in an application?
–How is database programming combined with conventional programming?
Database system implementation.
–How does one build a DBMS, including such matters as query processing, transaction processing, and organizing storage for efficient access?