What’s a database?
- In essence, a database is nothing more than a collection of information that exists over a long period of time.
- Databases are empowered by a body of knowledge and technology embodied in specialized software called a database management system, or DBMS.
- A DBMS is a powerful tool for creating and managing large amounts of data efficiently and allowing it to persist safely over long periods of time.
- DBMSs are among the most complex types of software available.
The database [management] system
- Allows users to create new databases and specify their schema (the logical structure of the data), using a data-definition language.
- Enables users to query and modify the data, using a query language and a data-manipulation language.
- Supports intelligent storage of very large amounts of data.
- Protects data from accidents and improper use.
  Example: We can require the DBMS to reject the insertion of two different employees with the same SIN.
- Allows efficient access to the data for queries and modifications.
  Example: Indexes over specified fields.
- Controls access to data from many users at once (concurrency), without allowing “bad” interactions that can corrupt the data accidentally.
- Recovers from software failures and crashes.
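The SIN example above can be sketched with a uniqueness constraint. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in DBMS; the table layout, column names, and sample values are assumptions made for the example, not part of the original slides.

```python
import sqlite3

# In-memory database; the Employees schema here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (sin TEXT UNIQUE, name TEXT)")
conn.execute("INSERT INTO Employees VALUES ('123456789', 'Smith')")

def try_insert(sin, name):
    """Return True if the row was accepted, False if the DBMS rejected it."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?)", (sin, name))
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the UNIQUE constraint on sin is violated.
        return False

accepted_new = try_insert("987654321", "Jones")        # different SIN
accepted_duplicate = try_insert("123456789", "Brown")  # same SIN as Smith
row_count = conn.execute("SELECT COUNT(*) FROM Employees").fetchone()[0]
```

The point of the sketch is that the application never checks for duplicates itself: the rule is declared once in the schema and the DBMS enforces it on every insertion.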
Early database systems and file systems
- The first commercial database systems evolved from file systems.
- File systems allow the storage of large amounts of data (though not safely).
- But file systems do not provide a query language for the data in files.
- People had to write programs to extract even the most elementary information from a set of files.
Example: Suppose a file called Employees stores records with the fields
  (code, name, dept_code)
and another file called Departments stores records with the fields
  (dept_code, dept_name)
Suppose now that, given an employee, for instance one named “Smith”, we want to find which department he works for.
Early database systems and file systems (Cont.)
In the absence of a query language we have to write a program which will:
- open the file Employees
- declare a variable of the same type as the records stored in the file
- scan the file: while the end of the file has not been reached, assign the current record to the variable above.
- If the value of the name field is “Smith”, get the value of the dept_code field. Suppose it is “10000”.
- Search in a similar way for a record with “10000” in the dept_code field of the Departments file.
- Print the dept_name when the matching dept_code is found.
A very painful procedure even for the simplest queries.
Compare it to the short and elegant SQL query
  SELECT dept_name
  FROM Employees, Departments
  WHERE Employees.name = 'Smith' AND Employees.dept_code = Departments.dept_code;
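To make the contrast concrete, the hand-written procedure might look roughly like the sketch below. It assumes the two files are comma-separated text; the file contents, the department names, and the function name find_department are invented for illustration, and in-memory strings stand in for files on disk.

```python
import csv
import io

# Hypothetical file contents: (code, name, dept_code) and (dept_code, dept_name).
EMPLOYEES_FILE = "1,Smith,10000\n2,Jones,20000\n"
DEPARTMENTS_FILE = "10000,Sales\n20000,Research\n"

def find_department(employee_name):
    """Scan both files record by record, as a program without a query language must."""
    dept_code = None
    # First scan: find the employee and remember the dept_code.
    for code, name, emp_dept_code in csv.reader(io.StringIO(EMPLOYEES_FILE)):
        if name == employee_name:
            dept_code = emp_dept_code
            break
    if dept_code is None:
        return None  # no such employee
    # Second scan: find the department record with that dept_code.
    for code, dept_name in csv.reader(io.StringIO(DEPARTMENTS_FILE)):
        if code == dept_code:
            return dept_name
    return None  # employee refers to an unknown department

smith_department = find_department("Smith")
```

Even this small sketch has to deal with record layout, scanning order, and missing records, all of which the SQL query leaves to the DBMS.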
The first important applications of DBMSs
These were applications in which
- the data was composed of many small items, and
- many queries or modifications were made.
Examples:
- Airline reservation systems
- Banking systems
- Corporate records
Airline Reservation Systems
Here the items of data include:
- Reservations by a single customer on a single flight, including such information as the assigned seat…
- Flight information: the airports they fly from and to, their departure and arrival times…
- Ticket information: prices, requirements, and availability.
Typical queries ask for:
- Flights leaving around a certain time from one given city to another, what seats are available, and at what prices.
Typical data modifications include:
- Making a reservation on a flight for a customer, assigning a seat, etc.
Many agents will be accessing parts of the data at any given time.
- The DBMS must allow concurrent access while preventing problems such as two agents assigning the same seat simultaneously.
- Also, the DBMS should protect against loss of records if the system suddenly fails.
Banking Systems
Data items include:
- Customers, their names, addresses, etc.
- Accounts and their balances
- Loans and their balances
- Connections between customers and their accounts and loans.
Typical queries are those for account and loan balances.
Typical modifications are those representing a payment from or deposit to an account.
In banking systems failures cannot be tolerated.
- E.g., once the money has been ejected from an ATM, the bank must record the debit, even if the power immediately fails.
- On the other hand, it is not permissible for the bank to record the debit and then fail to deliver the money because the power fails.
- The proper way to handle this operation is far from obvious and is one of the significant achievements in DBMS architecture.
Early DBMSs (1960s)
- They encouraged the user to view the data much as it was stored.
- The chief models were the hierarchical and network models.
- The main characteristic of these models was the ability to jump, or navigate, easily from one object to another through pointers.
  E.g., from an employee to his department.
- However, these models didn’t provide a high-level query language for the data.
- So one still had to write programs to query the data.
- Also, they didn’t allow on-line schema modifications.
Relational databases
- Codd (1970): A database system should present the user with a view of data organized as tables (also called relations).
- Behind the scenes there could be a complex data structure that allows rapid response to a variety of queries.
- But the user would not be concerned with the storage structure.
- Queries could be expressed in a very high-level language, which greatly increases the efficiency of database programmers.
- This high-level query language for relational databases is called Structured Query Language (SQL).
Example of a Relational DB
- Relations = Tables.
- The columns are “headed” by attribute names. Below the attributes are the rows, or tuples.
- A relation Accounts might be:
    accountNo   balance   type
    12345                 savings
    67890                 checking
    …
- Suppose we want to know the balance of account “67890”. We could ask this query in SQL as in (1).
  (1) SELECT balance
      FROM Accounts
      WHERE accountNo = 67890;
- As another example, we ask for the savings accounts with negative balances (2).
  (2) SELECT accountNo
      FROM Accounts
      WHERE type = 'savings' AND balance < 0;
- To answer such a query, we:
  - Examine all the tuples of the relation Accounts named in the FROM clause.
  - Pick out those tuples that satisfy the criterion in the WHERE clause.
  - Produce as an answer certain attributes of those tuples, as indicated in the SELECT clause.
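Queries (1) and (2) can be tried out with Python's built-in sqlite3 module, as in the sketch below. The balance values are made up for the example (the slide does not give them), and one of them is deliberately negative so that query (2) returns a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
# Sample tuples; the balances are invented for illustration.
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [(12345, -50.0, "savings"), (67890, 200.0, "checking")],
)

# Query (1): the balance of account 67890.
balance_67890 = conn.execute(
    "SELECT balance FROM Accounts WHERE accountNo = 67890"
).fetchall()

# Query (2): savings accounts with negative balances.
overdrawn_savings = conn.execute(
    "SELECT accountNo FROM Accounts WHERE type = 'savings' AND balance < 0"
).fetchall()
```

Each result is a list of tuples, one per qualifying row, containing only the attributes named in the SELECT clause.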
Architecture of a DBMS
- The “cylindrical” component contains not only data but also metadata, i.e., information about the structure of the data.
- If the DBMS is relational, the metadata includes:
  - names of relations,
  - names of attributes of those relations, and
  - data types for those attributes (e.g., integer or character string).
- A database maintains indexes for the data.
  - Indexes are part of the stored data.
  - The description of which attributes have indexes is part of the metadata.
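As one concrete illustration of metadata, SQLite keeps a catalog table named sqlite_master and offers PRAGMA table_info for column details. The sketch below reads relation names, attribute names with their types, and index names back from that catalog. Other systems expose their metadata differently; the Accounts schema and the index name idx_accounts_no are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER, balance REAL, type TEXT)")
conn.execute("CREATE INDEX idx_accounts_no ON Accounts (accountNo)")

# Names of relations, read from SQLite's catalog.
relations = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Attribute names and declared data types (columns 1 and 2 of table_info).
attributes = [(row[1], row[2]) for row in conn.execute(
    "PRAGMA table_info(Accounts)")]

# Which indexes exist on the relation is itself metadata.
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Accounts'")]
```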
A few words about indexes
- Similar to a book index.
  - A book index associates words with the page numbers where they appear.
  - A database index associates values of some object field(s) with the physical addresses of the corresponding objects on disk.
- An index has two main properties:
  - it is sorted, and
  - its size is much smaller than the record set being indexed.
- Hence, searching in an index is much faster than searching in the corresponding record set.
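The two properties can be seen in a toy sketch: a sorted list of (key, position) pairs searched by binary search. Here a position in a Python list stands in for a physical disk address, and the records themselves are invented; real DBMS indexes are typically more elaborate structures such as B-trees.

```python
import bisect

# Unsorted "record set"; list positions stand in for disk addresses.
records = [
    {"accountNo": 67890, "type": "checking"},
    {"accountNo": 12345, "type": "savings"},
    {"accountNo": 24680, "type": "savings"},
]

# The index holds only (key, position) pairs, sorted by key,
# so it is smaller than the records it points to.
index = sorted((rec["accountNo"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]

def lookup(account_no):
    """Binary-search the index, then follow the stored position to the record."""
    i = bisect.bisect_left(keys, account_no)
    if i < len(keys) and keys[i] == account_no:
        return records[index[i][1]]
    return None

found = lookup(24680)
missing = lookup(99999)
```

Because the index is sorted, a lookup inspects on the order of log n entries instead of scanning all n records.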
Storage Manager
The job of the storage manager is to
- obtain data from the data storage, and
- modify the data in the data storage when requested.
The storage manager has two components:
- The file manager handles files.
  - Keeps track of the location of files.
  - Obtains block(s) of a file on request from the buffer manager.
- The buffer manager handles main memory.
  - Obtains and returns blocks of data from/to the file manager.
  - Stores blocks temporarily in main memory pages.
  - 1 block = 1 page = 4,000 to 16,000 bytes. This is the smallest unit of data that is read from or written to disk.
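The division of labour between the two components can be sketched as follows. This is a toy model, not how any particular DBMS is built: the class and method names are hypothetical, a dictionary stands in for the disk, and the least-recently-used eviction policy is an assumption, since the slides do not say how pages are chosen for replacement.

```python
from collections import OrderedDict

class FileManager:
    """Knows where blocks live; a dict stands in for the disk here."""
    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0  # counts simulated disk reads

    def read_block(self, block_id):
        self.reads += 1
        return self.blocks[block_id]

class BufferManager:
    """Keeps a limited number of blocks in main-memory pages."""
    def __init__(self, file_manager, capacity):
        self.file_manager = file_manager
        self.capacity = capacity
        self.pages = OrderedDict()  # block_id -> data, oldest first

    def get_block(self, block_id):
        if block_id in self.pages:
            self.pages.move_to_end(block_id)  # mark as recently used
            return self.pages[block_id]
        data = self.file_manager.read_block(block_id)
        self.pages[block_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used (assumed policy)
        return data

fm = FileManager({0: b"block-0", 1: b"block-1", 2: b"block-2"})
bm = BufferManager(fm, capacity=2)
bm.get_block(0)
bm.get_block(1)
bm.get_block(0)            # served from memory, no disk read
reads_after_hit = fm.reads
bm.get_block(2)            # buffer full: block 1 is evicted
bm.get_block(1)            # must be read from "disk" again
reads_at_end = fm.reads
```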
Query Processor
The query processor handles queries and modifications to the data.
- Finds the best way to carry out a requested operation, and
- Issues commands to the storage manager that will carry it out.
E.g., a bank has a DB with two relations:
  Customers (name, SIN, address)
  Accounts (accountNo, balance, SIN)
Query: “Find the balances of all accounts of which Sally is the owner.”
  SELECT Accounts.balance
  FROM Customers, Accounts
  WHERE Customers.SIN = Accounts.SIN AND Customers.name = 'Sally';
Query Processor (Cont.)
What this query logically says is:
- Make the Cartesian product R of the tables specified in the FROM clause.
- Choose from R the tuples satisfying the condition in the WHERE clause.
- Produce as the answer only the values of the attributes in the SELECT clause.
If we answered this query literally as stated, the performance would be terrible, because the Cartesian product is usually enormous.
Suppose we have
- an index on name of Customers, and
- an index on SIN of Accounts.
Then the query processor will cleverly create a plan which inexpensively:
- Retrieves the tuple for “Sally” and gets the SIN.
- Retrieves the account tuples for this SIN.
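The difference between the literal evaluation and the index-based plan can be sketched in plain Python. The tuples are invented sample data, the function names are hypothetical, and Python dictionaries stand in for the two indexes; a real query processor chooses among many more alternatives.

```python
from itertools import product

# Sample relations: Customers(name, SIN, address), Accounts(accountNo, balance, SIN).
customers = [("Sally", 111, "12 Elm St"), ("Bob", 222, "9 Oak Ave")]
accounts = [(12345, 100.0, 111), (67890, 250.0, 222), (13579, 75.0, 111)]

def naive_plan(name):
    """Literal reading: Cartesian product, then WHERE filter, then SELECT."""
    result = []
    for (c_name, c_sin, _addr), (_no, balance, a_sin) in product(customers, accounts):
        if c_sin == a_sin and c_name == name:
            result.append(balance)
    return result

# Dictionaries standing in for the index on Customers.name and on Accounts.SIN.
customers_by_name = {}
for row in customers:
    customers_by_name.setdefault(row[0], []).append(row)
accounts_by_sin = {}
for row in accounts:
    accounts_by_sin.setdefault(row[2], []).append(row)

def indexed_plan(name):
    """Look up the customer by name, then fetch only that SIN's accounts."""
    result = []
    for _name, sin, _addr in customers_by_name.get(name, []):
        for _no, balance, _sin in accounts_by_sin.get(sin, []):
            result.append(balance)
    return result

naive_result = naive_plan("Sally")
indexed_result = indexed_plan("Sally")
pairs_examined_naively = len(customers) * len(accounts)
```

Both plans return the same balances, but the naive plan examines every customer-account pair, while the indexed plan touches only Sally's tuple and her accounts.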
Transaction Manager
The transaction manager is responsible for the integrity of the system. It must ensure that:
- several queries running simultaneously do not interfere with each other, and
- the system will not lose data even if there is a power failure.
The transaction manager interacts with:
- the query processor, because it may need to delay certain query operations to avoid conflicts;
- the storage manager, because schemes for protecting data involve storing a log of changes to the data.
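From the application's side, this protection shows up as transactions: a group of changes either all take effect or none do. The sketch below uses Python's built-in sqlite3 module and simulates a failure between two updates with an exception. The schema, the amounts, and the transfer function are assumptions for illustration, and a raised exception is only a stand-in for a real crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (accountNo INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(12345, 100.0), (67890, 50.0)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Move money between accounts as one transaction; undo everything on failure."""
    try:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE accountNo = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE accountNo = ?", (amount, dst))
        conn.commit()    # both updates become permanent together
        return True
    except RuntimeError:
        conn.rollback()  # the debit alone is never left behind
        return False

def balances():
    return dict(conn.execute("SELECT accountNo, balance FROM Accounts"))

failed = transfer(12345, 67890, 30.0, fail_midway=True)
after_failure = balances()
succeeded = transfer(12345, 67890, 30.0)
after_success = balances()
```

After the simulated failure the balances are unchanged; after the successful call both accounts reflect the transfer.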
Database Studies
Design of databases.
- What kinds of information go into the database?
- How is the information structured?
- How do data items connect?
Database programming.
- How does one express queries on the database?
- How does one use other capabilities of a DBMS, such as transactions or constraints, in an application?
- How is database programming combined with conventional programming?
Database system implementation.
- How does one build a DBMS, including such matters as query processing, transaction processing, and organizing storage for efficient access?