Introduction to Databases


1 Introduction to Databases
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.” Lewis Carroll, Through the Looking Glass

2 Class Outline
What is data and why is it important?
What is a database and database schema?
What is a database management system?
What is a database application and what are its components?
What are the levels of database representation?
What were the limitations of the systems that led to the development of the current relational database systems?
What are various types of database systems?
What is a table, file and record?

3 When do I use a Database program?

4 Principles of Information Resource Management
Organizational resources flow into and out of the organization.
Two major types of organizational resources: physical resources and conceptual resources (data and information).
As the scale of an organization grows, it becomes increasingly difficult to manage by observation, so reliance on conceptual resources increases.
Conceptual resources can be managed just like physical resources or assets (e.g., employees, $$, equipment, widgets, etc.).
Management of data and information means getting it before it’s needed, protecting it, assuring quality, and getting rid of it when no longer required.
Management of data and information can be achieved only through organizational commitment.
Adapted from McFadden, F.R. & Hoffer, J.A. (1994). Modern Database Management. Redwood City, CA: Benjamin/Cummings Publishing (p. 6).

5 Information is a major organizational resource
Data (isolated facts): Jane bought 30 in July; Jane bought 20 in Aug; John bought 50 in July; John bought 10 in Aug.
Information (organized data), obtained by processing the data: the average for July is 40; the average for Aug is 15.
Knowledge: sales have dropped between July and August.
Action: survey customers; invest in advertising; cut costs; expand product line.
At present, computers are generally used to process data into information; humans must collect data, process information into knowledge, and act upon it. Corollary: poor data leads to errors that lead to bad decisions. If managers have good (accurate, timely) information, they are more likely to make sound, timely decisions that will have a positive impact on their business. It is difficult to assess the monetary value of information, in contrast to tangible assets such as facilities and personnel, whose value can be appraised with some precision. Survival of the fittest: companies making sensible decisions are more likely to survive and adapt to a changing environment than those making nonsensical decisions.
What is data? What is information? At the simplest level, data is the letters, numbers, and words that describe our world. Data tends to be language- and culture-dependent. No one has effectively developed a means of using data that is meaningful outside of language or context. The closest would be pictures, audio, and video, but these also have language and context issues. The key is to view data as information. Information is useful knowledge (who, what, where, when, how, and why) described by data.
What are databases? "A place to store data" is nice but too simplistic; there should be a good reason to store it. Databases are most often driven by a corporate need to store AND access information. Need = profit, often driven by productivity, sales, efficiency, etc. Databases should be seen as places to store and access information critical to the operation and function of an organization. Storage is critical: data can be large and cumbersome and does not lend itself to being managed easily. Access is just as critical. In theory anyone must be able to access the data; in reality many depend on a few to do it (most often using applications created by a few).
Hardware has always driven our ability to implement and use databases. Look at history: scrolls, paper, and books (more reliable than we give them credit for); punch cards (easy to drop); tape (linear and slow by today's standards); giga- and terabyte storage, networks, the Internet.
What is a data warehouse, besides a popular buzzword? It typically implies a corporate repository of information that furthers a corporate function (e.g., delivery of information and/or goods). It also implies the ability to track history and determine trends, which depends on the ability to understand and analyze the data. There is a great deal of hardware, software, and marketing hype about data warehousing.
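The step from data to information on this slide can be reproduced in a few lines. The following is only an illustrative sketch in Python: the purchase figures come from the slide, while the function name and the rule used for the "knowledge" step are assumptions made for the example.

```python
from collections import defaultdict

# Data: isolated facts from the slide (customer, month, units bought).
purchases = [
    ("Jane", "July", 30),
    ("Jane", "Aug", 20),
    ("John", "July", 50),
    ("John", "Aug", 10),
]


def average_per_month(rows):
    """Organize raw purchase facts into information: average units per month."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _customer, month, units in rows:
        totals[month] += units
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}


# Information: organized data.
averages = average_per_month(purchases)

# Knowledge: an interpretation of the information (here a simple comparison).
sales_dropped = averages["Aug"] < averages["July"]

print(averages, sales_dropped)
```

Deciding what to do about the drop (the "action" step) is left to people, as the slide notes.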

6 What is a Database?
An organized collection of related information or data stored on a computer disk for easy, efficient use.
It generally contains static information (e.g., the products we sell) as well as dynamic information (e.g., purchases), called transactions, which represent events.
[Diagram labels: data, information]

7 What is a Database Management System (DBMS)?
“A set of programs used to define, administer, and process the database and its applications conveniently and efficiently.”
A program (or collection of programs) that enables users to create the database. The DBMS manages the storage and retrieval of data and provides the user with functionality to guarantee that the data will be logically organized and consistently applied.
Oracle, Ingres, Focus, Paradox, Revelation, MDBS, Helix, etc. are RDBMSs.
The DBMS is the ‘heart’ of the database system.
[Diagram: database, DBMS (e.g., Oracle, dBase, Access, Paradox), database application, user]

8 What is a Database Application?
A computer program that performs a specific task of practical value in a business situation.
An interface that allows the user to enter and manipulate data; the user can request abstract views of the data.
Created by database designers and developers using a DBMS program or a programming language.
[Diagram: database application, DBMS]

9 Major Components of a Database Application
1. Form - data entry
2. Report - summarizes and prints
3. Query - asks questions of data (queries can be saved)
4. Menu - organizes components
5. Program - used to automate a database
Application programs are written in a language that is specific to the DBMS or in a standard language that interfaces with the DBMS through a predefined program interface.
Access allows the creation of applications without any other products and stores this information with the database. The Access DBMS contains the design tools subsystem to create forms, queries, and reports. Access also supports VB, a version of the Basic programming language.

10 Features of a DBMS
[Diagram: developer, design tools subsystem, application program, run-time subsystem, users, DBMS engine, database]
Design tools subsystem: table creation tool, form creation tool, query creation tool, report creation tool, procedural language compiler.
Run-time subsystem: form processor, query processor, report writer, procedural language run time.
DBMS engine: intermediary between the design tools and run-time subsystems and the data. It translates requests for data into OS commands and controls access (concurrency, integrity, security) through transactions, locking, backup, and recovery.
Database: user data, metadata, indexes, application metadata.
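As one illustration of the engine's role in controlling integrity through transactions, here is a minimal sketch using Python's built-in `sqlite3` module. The `account` table, its balances, and the `transfer` function are hypothetical and not part of the slides; SQLite merely stands in for whatever DBMS is in use.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)         # succeeds
rejected = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, rejected, balances)
```

The application only states the two updates; keeping them together, and undoing the first when the second fails, is the engine's job.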

11 Types of Database Systems
Centralized (single site):
microcomputer (desktop)
legacy mainframe/minicomputer (1 CPU)
client/server architecture (>1 CPU)
Distributed (>1 site, requires a network): not widely adopted yet due to many problems.
Two types of distributed system: homogeneous (one type of DBMS) and heterogeneous (>1 type of DBMS).
Our focus: centralized, microcomputer databases.
In the middle to late 80s, end users began to connect their microcomputers using local area networks (LANs), which led to the development of multi-user database applications on LANs. LAN-based multi-user architecture is different from mainframe databases. With a mainframe, only 1 CPU is involved, but in a LAN many CPUs can be involved simultaneously. This is both advantageous (greater performance) and more problematic (coordinating the CPUs), and it led to a new style of multi-user database processing called client-server database architecture. A simpler but less robust alternative is file-sharing architecture, which is acceptable for small groups; larger groups require client-server processing.
Distributed databases: all of the organization's data are spread over many computers (micros, LAN servers, and mainframes) that communicate with one another. The goal is to make it appear to each user that she is the only user and to provide the same consistency, accuracy, and timeliness as if no one else were using the system. This has been researched for more than 25 years but is not yet feasible: there are problems of security and control with hundreds of concurrent users, and coordinating and synchronizing processing can be difficult. If one user downloads and updates part of the database, how does the system prevent another user from using the version on the mainframe in the meantime? MS is developing MS Transaction Server (MTS).

12 Three levels of Database Representation
External level: each user group has its own view of the database; the database is accessed from here.
Conceptual level: database design; a logical, abstract description of data elements and their relationships.
Internal level: physical implementation (access methods, index construction, data structures); the database exists in reality only here.
The distinction between the logical and physical representation of data was officially recognized in 1978, when the ANSI/SPARC committee proposed a three-level architecture as a general framework for database systems.
Conceptual design involves analysis of users’ information needs and definition of the data items needed to meet them. The result of conceptual design is the conceptual schema, a single, logical description of all data elements and their relationships.
The external level consists of the user views of the database; each definable user group will have its own view. Each of these views gives a user-oriented description of the data elements and relationships of which the view is composed, and it can be derived directly from the conceptual schema. The collection of all such views forms the external level.
The internal level provides the physical view of the database: the disk drives, physical addresses, indexes, pointers, etc. It determines which physical devices will contain the data, what access methods will be used to retrieve and update the data, and what measures will be taken to maintain or improve database performance.
The implementation of the three levels requires the DBMS to “map”, or translate, from one level to another.
The primary focus of the lectures of this course is the conceptual level, because the creation of a database begins with its design; the focus of the laboratories is the external level, using an RDBMS, which manages the internal level.
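The relationship between the conceptual and external levels can be sketched with SQL views. This is an illustrative example only, using Python's built-in `sqlite3` module; the `employee` table, its rows, and the two view names are hypothetical, and SQLite manages the internal level (file layout, pages) on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one logical description of the data (hypothetical table).
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Sales", 52000), (2, "Raj", "IT", 61000)],
)

# External level: each user group gets its own view, derived from the same schema.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employee")

directory_rows = conn.execute(
    "SELECT * FROM staff_directory ORDER BY name"
).fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY emp_id").fetchall()

print(directory_rows)
print(payroll_rows)
```

Neither view stores data of its own; both are mapped by the DBMS onto the single underlying table.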

13 Focus of this course
Lectures: conceptual design of databases: determining their purpose, developing a model, identifying the tables that are required, designing normalized tables, and identifying their relationships to one another.
Laboratories: implement a database at the external level: create databases (tables) and database applications (queries, forms, reports, programs) using a typical microcomputer relational database management system, MS Access 97.

14 The Database System Environment
Hardware (physical devices): computer, peripherals, network devices. Networks are important for airline reservation systems, automatic teller machines, etc.
Software:
DBMS (manages the database), e.g., Oracle, DB2 (IBM)
operating system software (manages hardware and software): DOS, Windows, and OS/2 for micros; Unix and VMS for minicomputers; MVS for IBM mainframes
application programs (let users access and manipulate the database and generate reports)
People:
system administrators (manage general operations)
database designers (architects of the database structure); designers are very important because analysts and programmers cannot create a good program on a poor foundation
database administrators (ensure the database is functioning)
systems analysts and programmers (design and implement the database)
end users (use application programs)
Procedures: rules of the company governing the design and use of the database; they ensure that there is an organized way to monitor and audit the data and the information generated from it.
Data (you are here): the facts stored in the database; how this data is to be organized is the designer’s job.

15 In the beginning…(in the 1950s)
…There were no databases, just file (or data processing) systems. File systems directly access files of data.
File systems were typically organized by function (use).
The first data management systems performed clerical tasks (transactional processing) such as order entry processing, payroll, and work scheduling.
E.g., files for patients (file folder analogy), with each record for a single patient, and another file for appointment/billing information:
Patient record: Name: Jane Doe; Address: 123 Easy St.; City: London; Phone:
Appointment record: Date: Sept 14, 1955; Time: 2:00 p.m.; Patient: Jane Doe, OHIP:

16 Limitations of Data File Systems
[Diagram: the customer processing application accesses the customer file; the order processing application accesses the order file.]
File systems worked adequately when data collection needs were relatively small. Problems arose as data files, information needs, and reporting requirements grew in complexity, due to:
Extensive programming: use of third-generation languages (e.g., COBOL, FORTRAN), in which the programmer must specify what is to be done as well as how it is to be done. Every file reference in a program requires the programmer to use complex coding to match the data characteristics and to define the precise access paths to the various file and system components. As systems become more complex, the access paths become difficult to manage and produce system malfunctions.
Program/data dependence: if the customer record is modified to expand the zip code field from 5 to 9 digits, all programs using that customer record must be modified, even if they don’t use the zip code.
Program dependency: the format of a file processed by a COBOL program is different from the format of a file processed by a VB or C program, so the files can’t be combined or compared.
Structural dependence: all data access programs are subject to change when the file structure changes.
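The zip code example can be made concrete with a small sketch. The fixed-width layout below (name 10 characters, zip 5, balance 6) and the function name are hypothetical, chosen only to show how a program that never reads the zip code still breaks when that field grows.

```python
# Hypothetical fixed-width customer record: name (10 chars), zip (5), balance (6).
def read_balance_v1(record):
    """Program written against the original layout; it never uses the zip code."""
    return int(record[15:21])


old_record = "Jane Doe".ljust(10) + "12345" + "000250"

# The zip field grows from 5 to 9 characters, shifting every later field.
new_record = "Jane Doe".ljust(10) + "123456789" + "000250"

old_balance = read_balance_v1(old_record)  # reads the balance field
new_balance = read_balance_v1(new_record)  # reads part of the zip code instead

print(old_balance, new_balance)
```

The program runs without error on the new layout but silently returns a wrong value, which is why every program touching the file had to be revisited after a structural change.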

17 Limitations of Data File Systems
Poor mechanisms for sharing data across the organization: files are often incompatible with one another (separate, isolated data).
Data redundancy: duplicate information in two or more files. This is very serious, because it puts the credibility of the stored data in question.
Data inconsistency: arises from mis-keying, updates, deletes, and invalid data.
Program/data dependence: if the file structure changed, ALL programs using the file had to be modified, which is time-consuming.
Lack of flexibility: could not do ad hoc queries or reports; a separate program was required for every report or query.
Poor security: difficult to program and therefore often omitted.
Difficulty of representing data from the users’ perspective:
data elements with multiple meanings: e.g., “account” has different meanings depending upon whether the context is a loan or savings
different terms for the same element: e.g., an insurance company uses “policy” and “case” interchangeably
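A minimal sketch of how redundancy turns into inconsistency, using plain Python structures to stand in for two separate files. The record contents, field names, and function names are hypothetical.

```python
# Two separate "files", each keeping its own copy of the customer's phone number.
customer_file = {"C1": {"name": "Jane Doe", "phone": "555-0100"}}
order_file = [{"order": 1, "cust": "C1", "phone": "555-0100"}]


def update_phone_in_customer_file(cust_id, phone):
    """An update program that only knows about the customer file."""
    customer_file[cust_id]["phone"] = phone


def inconsistent_orders():
    """Return the orders whose stored phone disagrees with the customer file."""
    return [
        o["order"]
        for o in order_file
        if o["phone"] != customer_file[o["cust"]]["phone"]
    ]


before = inconsistent_orders()
update_phone_in_customer_file("C1", "555-0199")
after = inconsistent_orders()

print(before, after)
```

Once the two copies disagree, nothing in the files themselves says which one is correct.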

18 Historical Roots of Database Systems
[Diagram: the customer processing, order processing, and employee processing applications all access the data through the DBMS.]
DBMSs were developed to overcome the limitations of file systems, initially on mainframe computers in the late 60s and early 70s. A typical early DBMS cost $100,000 (many are still in use); compare with Access now at $130! There has been a 2000-fold increase in the number of organizations using DBMS products in North America.
The first general databases were created for the General Electric Company (GEC): the Integrated Data Store (IDS), designed to run on GEC machines. B.F. Goodrich ported IDS to IBM machines, and it became dominant until the 1980s.
As PCs gained popularity (1980s), single-user, personal databases developed; at present, most database technology is used in workgroups.
File-processing systems directly access files of stored data. In contrast, database-processing programs call the DBMS to access the stored data. This difference is significant because it makes application programming easier: application programmers do not have to be concerned with the ways in which data are physically stored.
In a database system, all the application data is stored in a single facility called the database. An application program can ask the DBMS to access customer data or sales data or both. The application programmer specifies only how the data are to be combined, and the DBMS performs the necessary operations to do it.
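The last point, that the programmer states how the data are to be combined and the DBMS does the work, can be sketched as follows. This is an illustrative example using Python's built-in `sqlite3` module; the `customer` and `sale` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id),
        amount  INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Jane'), (2, 'John');
    INSERT INTO sale VALUES (10, 1, 30), (11, 1, 20), (12, 2, 50);
    """
)

# The program says WHAT to combine (customers with their sales);
# the DBMS decides HOW to locate and match the stored rows.
totals = conn.execute(
    """
    SELECT c.name, SUM(s.amount)
    FROM customer AS c
    JOIN sale AS s ON s.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(totals)
```

Nothing in the query mentions file formats, record offsets, or access paths.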

19 Better Definition of a Database
A collection of users’ data, organized logically and managed by a unifying set of principles, procedures, and functionalities, which help guarantee the consistent application and interpretation of that data.
(a) An organized collection of related information or data stored on a computer disk for easy, efficient use; represented in tabular format.

20 Better Definition of a Database (cont'd)
(b) A database is self-describing (metadata, system catalogues, or data dictionary).
A database contains a description of its own structure (e.g., the names of all the tables, and the names and types of data in each column of all the tables).
This is similar to a library, which is a self-describing collection of books: in addition to books, a library contains a card catalog describing them. In the same way, the data dictionary (which is part of the database, just as the card catalog is part of the library) describes the data contained in the database.
This is important, first, because we don’t need to maintain external documentation of the file and record formats (as is done in file-processing systems). Second, if we change the structure of the data in the database (such as adding new data items to an existing record), we enter only that change in the data dictionary. Few if any programs will need to be changed; in most cases, only those programs that process the altered data items must be changed.
Kroenke, D.M., Database Processing: Fundamentals, Design & Implementation, Prentice Hall, 1998
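A database's description of its own structure can be queried like any other data. The sketch below uses Python's built-in `sqlite3` module as one example of a system catalogue; the `book` table is hypothetical, and other DBMS products expose their catalogues under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")

# SQLite keeps its catalogue in a table called sqlite_master.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info lists each column; fields 1 and 2 are its name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(book)")]

print(table_names)
print(columns)
```

No external documentation was consulted: the table and column descriptions came from the database itself.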

21 Better Definition of a Database (cont'd)
(c) Indexes are stored with the database.
Accessing data from a source table for sorting and searching is time-consuming without a “pointer” system; indexes provide one, improving the performance and accessibility of the database.
The “overhead cost” of indexing is that each time data is updated, the indexes on that data must also be updated; therefore, reserve indexes for cases in which they are needed.
Note the difference between the "design" and "implementation" definitions of a key. A design key identifies a unique row; an implementation key is used to construct indexes for increasing the access and sorting speed of data.
(d) Application metadata: stores the structure and format of application components; not all DBMSs support this feature.
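The effect of an index on how a search is carried out can be observed by asking the DBMS for its query plan. This is a sketch using Python's built-in `sqlite3` module; the `customer` table, its generated rows, and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"name{i}", "London") for i in range(1000)],
)


def plan(sql):
    """Return SQLite's description of how it would execute a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[3] for row in rows)  # the fourth field holds the detail text


query = "SELECT * FROM customer WHERE name = 'name500'"

plan_before = plan(query)  # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_customer_name ON customer (name)")
plan_after = plan(query)   # with the index, SQLite searches via idx_customer_name

print(plan_before)
print(plan_after)
```

The trade-off from the slide still applies: every later insert or update to `name` now has to maintain the index as well.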

22 Evolution of Database Models
Hierarchical and Network: still in use in many older (1970s) legacy systems; very few new databases; referred to as “navigational systems”.
Relational: the vast majority of databases currently use this model; therefore, our course’s focus is here.
Semantic, Object-Oriented, and Object-Relational: very few new databases are being created using object-oriented models (not many ODBMSs are available for businesses to implement this model).
Modeling the world around us is an inherently human activity, whether it is building ships in bottles, crafting dolls and their houses, or drawing a map. The process of creating a model is an attempt to capture the essence of things both concrete and abstract, to make order of the chaos inherent in the world around us. It is no different for those of us who work within the abstract world of computer systems: in order to understand and control a system's size and complexity, we must reduce it to a model that we can fit our brains around.
OOP (object-oriented programming) began to be used in the late 80s. Object structures are considerably more complex than other structures and difficult to store in existing relational DBMS products, so a new category of DBMS is evolving. OOP is difficult to use, and applications are very expensive to develop. Organizations are unwilling to bear the cost and risk required to convert the millions or billions of bytes of data already organized in relational databases. Most object-oriented DBMSs were developed to support engineering applications, and they do not have features and functions that are appropriate or readily adaptable to business information applications. OODBMSs are likely to occupy a niche in commercial information systems applications.

23 The Relational Database Model
[Diagram: tables Agents, Clients, Entertainers, Instruments, Engagements, Entertainer styles]
Data is represented by tables (like spreadsheets).
Tables are NOT linked with physical pointers.
Unlike earlier systems, all three types of relationships (one-to-one, one-to-many, many-to-many) can be represented.
The model accommodates the design of larger databases that involve complex relationships and intricate manipulations.
An agent represents a number of clients and entertainers. Furthermore, clients and entertainers are associated with each other through the Engagements table, since a client will hire any number of entertainers and an entertainer will perform for any number of clients. An entertainer may play one or more musical instruments, which is reflected in the Entertainer styles table.
Why are relational databases so important? They can effectively model the actual organization and how information (data) is used. They are based on the idea that people and organizations rely on relationships to conduct work efficiently. Information is also often easy to represent by its relationships to other information. Information can be described by its structure (entities, tables), which can be decomposed (attributes, fields).
Relational model: based on the use of relations, tuples, and attributes to capture and manage data. Three characteristics: 1) cells (fields) must be single-valued; 2) all values in a column must be of the same type of "thing"; 3) no two rows (records) can be the same, and the order of the rows is irrelevant.
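The many-to-many relationship between clients and entertainers, carried by the Engagements table, can be sketched as follows. This uses Python's built-in `sqlite3` module; the table names follow the slide, but the column names, sample rows, and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE client (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE entertainer (ent_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Each engagement row links one client to one entertainer by value,
    -- not by a physical pointer.
    CREATE TABLE engagement (
        client_id  INTEGER NOT NULL REFERENCES client(client_id),
        ent_id     INTEGER NOT NULL REFERENCES entertainer(ent_id),
        event_date TEXT NOT NULL,
        PRIMARY KEY (client_id, ent_id, event_date)
    );
    INSERT INTO client VALUES (1, 'Acme Hotel'), (2, 'City Fair');
    INSERT INTO entertainer VALUES (1, 'Trio'), (2, 'Soloist');
    INSERT INTO engagement VALUES
        (1, 1, '2024-05-01'), (1, 2, '2024-05-02'), (2, 1, '2024-06-01');
    """
)


def entertainers_for(client_name):
    """Names of the entertainers a client has hired, via the engagement table."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.name
        FROM client AS c
        JOIN engagement AS g ON g.client_id = c.client_id
        JOIN entertainer AS e ON e.ent_id = g.ent_id
        WHERE c.name = ?
        ORDER BY e.name
        """,
        (client_name,),
    )
    return [row[0] for row in rows]


def clients_for(entertainer_name):
    """Names of the clients an entertainer has performed for."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM entertainer AS e
        JOIN engagement AS g ON g.ent_id = e.ent_id
        JOIN client AS c ON c.client_id = g.client_id
        WHERE e.name = ?
        ORDER BY c.name
        """,
        (entertainer_name,),
    )
    return [row[0] for row in rows]


print(entertainers_for("Acme Hotel"))
print(clients_for("Trio"))
```

The same three tables answer the question in both directions, which is what lets the relational model represent many-to-many relationships directly.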

24 Evaluation of the Relational database model
Advantages:
mechanisms for minimizing data redundancy and inconsistency
logical database design is separated from physical aspects
relatively program-data independent
management of data for access, manipulation, and security
flexible mechanisms for generating reports and queries
program development and maintenance costs are reduced
data can be accessed in a multiplicity of ways within and amongst organizations
Disadvantages:
the #1 problem still is ease of use: many untrained people create and use databases without considering their design, and usually incorporate many errors
Paradox, MDBS, Helix, and dBase were developed for microcomputers; Oracle, Focus, and Ingres were ported down to PCs.

25 Comparison of Database models
File systems: data dependence; structural dependence; demands upon the programmer.
Hierarchical and network DBMS: data independence; structural dependence; demands upon the programmer.
Relational DBMS: data independence; structural independence; demands upon the computer.

26 Table
Users view their data in two-dimensional tables.
table = file = relation

27 Field
The fields within records contain data.
Data within a field must be of the same data type.
Each field within a table must have a unique name.
Order of fields is unimportant.
column = field = attribute

28 Record
A record is a group of related fields of information about a single instance of one object or event in a database.
Tables consist of zero, one, or more records.
Order of rows is unimportant.
row = record = tuple

29 Database Schema
A database schema defines the database’s structure: tables, relationships, domains, and constraint rules.
Tables:
BOOK (ISBN, Title, AuthID, PubID, Price)
PUBLISHER (PubID, PubName, PubPhone)
AUTHOR (AuthID, AuthName, AuthPhone)
Relationships:
Each book is published by one and only one publisher.
Each publisher publishes one or more books.
Domains (the set of values allowed in a column): physical description (e.g., the set of integers 0 < x < 99999).
Constraints (business rules): price cannot be less than zero; the author phone field cannot be left blank.
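The schema on this slide can be written down as table definitions, with the relationships and business rules stated as constraints. The following is one possible sketch using Python's built-in `sqlite3` module; the column types, sample rows, and the `try_insert` helper are assumptions, since the slide gives only table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    CREATE TABLE publisher (
        PubID    INTEGER PRIMARY KEY,
        PubName  TEXT NOT NULL,
        PubPhone TEXT
    );
    CREATE TABLE author (
        AuthID    INTEGER PRIMARY KEY,
        AuthName  TEXT NOT NULL,
        -- Business rule: author phone cannot be left blank.
        AuthPhone TEXT NOT NULL CHECK (AuthPhone <> '')
    );
    CREATE TABLE book (
        ISBN   TEXT PRIMARY KEY,
        Title  TEXT NOT NULL,
        AuthID INTEGER NOT NULL REFERENCES author(AuthID),
        -- Each book is published by one and only one publisher.
        PubID  INTEGER NOT NULL REFERENCES publisher(PubID),
        -- Business rule: price cannot be less than zero.
        Price  REAL NOT NULL CHECK (Price >= 0)
    );
    INSERT INTO publisher VALUES (1, 'Example Press', '555-0100');
    INSERT INTO author VALUES (1, 'A. Writer', '555-0101');
    """
)


def try_insert(sql, params):
    """Attempt an insert; return False if the DBMS rejects it as a rule violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


book_sql = "INSERT INTO book VALUES (?, ?, ?, ?, ?)"
valid_book = try_insert(book_sql, ("isbn-1", "Databases", 1, 1, 39.95))
negative_price = try_insert(book_sql, ("isbn-2", "Free Money", 1, 1, -5.0))
unknown_publisher = try_insert(book_sql, ("isbn-3", "Orphan", 1, 99, 10.0))
blank_phone = try_insert("INSERT INTO author VALUES (?, ?, ?)", (2, "No Phone", ""))

print(valid_book, negative_price, unknown_publisher, blank_phone)
```

Because the rules live in the schema, every application that uses the database is held to them without having to re-implement the checks.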

