1 Introduction to Databases
“When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.”
Lewis Carroll, Through the Looking Glass
2 Class Outline
What is data and why is it important?
What is a database and a database schema?
What is a database management system?
What is a database application and what are its components?
What are the levels of database representation?
What were the limitations of the systems that led to the development of the current relational database systems?
What are the various types of database systems?
What are a table, a file, and a record?
4 Principles of Information Resource Management
Organizational resources flow into and out of the organization.
There are two major types of organizational resources: physical resources and conceptual resources (data and information).
As the organization grows in scale, it becomes increasingly difficult to manage by direct observation, so managers must rely on conceptual resources.
Conceptual resources can be managed just like physical resources or assets (e.g., employees, money, equipment, widgets).
Managing data and information means getting it before it is needed, protecting it, assuring its quality, and getting rid of it when it is no longer required.
Management of data and information can be achieved only through organizational commitment.
Adapted from McFadden, F.R. & Hoffer, J.A. (1994). Modern Database Management. Redwood City, CA: Benjamin/Cummings Publishing (p. 6).
5 Information is a major organizational resource
Diagram (processing turns each level into the next):
Data (isolated facts): Jane bought 30 in July; Jane bought 20 in Aug; John bought 50 in July; John bought 10 in Aug.
Information (organized data): the average in July is 40; the average in Aug is 15.
Knowledge: sales have dropped between July and August.
Action: survey customers; invest in advertising; cut costs; expand the product line.
At present, computers are generally used to process data into information; humans must collect the data, process information into knowledge, and act upon it.
Corollary: poor data leads to errors that lead to bad decisions. If managers have good (accurate, timely) information, they are more likely to make sound, timely decisions that have a positive impact on their business.
It is difficult to assess the monetary value of information, compared with tangible assets such as facilities and personnel, whose value can be appraised with some precision.
Survival of the fittest: companies that make sensible decisions are more likely to survive and adapt to a changing environment than those that make nonsensical decisions.
What is data? What is information?
At the simplest level, data is the letters, numbers, and words that describe our world.
Data tends to be language- and culture-dependent. No one has effectively developed a means of using data that is meaningful outside of a language or context. The closest are pictures, audio, and video, but these also have language and context issues.
The key is to view data as information. Information is useful knowledge (who, what, where, when, how, and why) described by data.
What are databases?
A place to store data: nice, but too simplistic; there should be a good reason. Databases are most often driven by a corporate need to store AND access information. The need is profit, often driven by productivity, sales, efficiency, etc.
Databases should be seen as places to store and access information critical to the operation and function of an organization.
Storage is critical: data can be large and cumbersome and does not lend itself to being managed easily. Access is just as critical. In theory, anyone should be able to access the data; in reality, many depend on a few people to do it (most often using applications created by those few).
Hardware has always driven our ability to implement and use databases. Look at the history: scrolls, paper, and books (more reliable than we give them credit for); punch cards (easy to drop); tape (linear and slow by today's standards); gigabyte and terabyte storage, networks, the Internet.
What is a data warehouse, besides a popular buzzword?
It typically implies a corporate repository of information that furthers a corporate function (e.g., delivery of information and/or goods). It also implies the ability to track history and determine trends, which depends on the ability to understand and analyze the data. There is a great deal of hardware, software, and marketing hype about data warehousing.
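The data-to-information step in the diagram is the part the slide assigns to computers. A minimal Python sketch of it, assuming the four purchase facts are held as (customer, quantity, month) tuples (the names `purchases` and `average_by_month` are hypothetical), organizes the isolated facts into per-month averages:

```python
from collections import defaultdict

# Data (isolated facts) from the slide, as hypothetical (customer, quantity, month) tuples.
purchases = [
    ("Jane", 30, "July"),
    ("Jane", 20, "Aug"),
    ("John", 50, "July"),
    ("John", 10, "Aug"),
]

def average_by_month(rows):
    """Organize raw purchase facts into information: the mean quantity per month."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _customer, quantity, month in rows:
        totals[month] += quantity
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in totals}

info = average_by_month(purchases)
print(info)  # {'July': 40.0, 'Aug': 15.0}
```

Reading the drop from 40 to 15 as "sales have dropped", and choosing an action, remains the human part of the chain.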
6 What is a Database?
An organized collection of related information or data, stored on a computer disk for easy, efficient use.
It generally contains static information (e.g., the products we sell) as well as dynamic information (e.g., purchases), called transactions, which represent events.
7 What is a Database Management System (DBMS)?
“A set of programs used to define, administer, and process the database and its applications conveniently and efficiently.”
A program (or collection of programs) that enables users to create the database. The DBMS manages the storage and retrieval of data, and provides the user with functionality that guarantees the data will be logically organized and consistently applied.
Oracle, Ingres, Focus, Paradox, Revelation, MDBS, Helix, etc. are RDBMSs.
The DBMS is the ‘heart’ of the database system.
Diagram: user <-> Database Application <-> DBMS (e.g., Oracle, dBase, Access, Paradox) <-> Database
8 What is a Database Application?
A computer program that performs a specific task of practical value in a business situation.
An interface that allows the user to enter and manipulate data; the user can request abstract views of the data.
Created by database designers and developers using a DBMS program or a programming language.
Diagram: Database application <-> DBMS
9 Major Components of a Database Application
1. Form: data entry
2. Report: summarizes and prints
3. Query: asks questions of the data (queries can be saved)
4. Menu: organizes the components
5. Program: used to automate a database
Application programs are written in a language that is specific to the DBMS, or in a standard language that interfaces with the DBMS through a predefined program interface.
The Access DBMS contains a design tools subsystem for creating forms, queries, and reports, and it also includes Visual Basic for Applications (VBA), a version of the Basic programming language.
Access allows the creation of applications without any other products, and it stores this application information with the database.
10 Features of a DBMS
Diagram (actors: DBMS developer, users, application programs):
Design Tools Subsystem: Table Creation Tool, Form Creation Tool, Query Creation Tool, Report Creation Tool, Procedural Language Compiler
Run Time Subsystem: Form Processor, Query Processor, Report Writer, Procedural Language Run Time
DBMS Engine: the intermediary between the design tools and run-time subsystems and the data
- Translates requests for data into OS commands
- Controls access (concurrency, integrity, security): transactions, locking, backup, recovery
Database: user data, metadata, indexes, application metadata
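The engine's transaction control can be illustrated with any modern DBMS. A minimal sketch using Python's built-in `sqlite3` module (a stand-in, not Access; the `account` table and `transfer` function are hypothetical) shows an all-or-nothing update: when the second statement violates an integrity rule, the engine rolls back the first one too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK is an integrity rule enforced by the engine.
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK rule; the earlier credit was rolled back as well.
        return False

print(transfer(conn, 1, 2, 30))   # True
print(transfer(conn, 2, 1, 500))  # False
print(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())  # [(1, 70), (2, 80)]
```

Crediting before debiting is deliberate here: it makes the rollback visible, since without it account 1 would keep the 500.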
11 Types of Database Systems
Centralized (single site):
- microcomputer (desktop)
- legacy mainframe/minicomputer (1 CPU)
- client/server architecture (more than 1 CPU)
Distributed:
- more than one site; requires a network
- not widely adopted yet, due to many problems
Our focus: centralized, microcomputer databases.
In the middle to late 80s, end users began to connect their microcomputers using local area networks (LANs), which led to the development of multi-user database applications on LANs.
LAN-based multi-user architecture is different from mainframe databases. With a mainframe only one CPU is involved, but on a LAN many CPUs can be involved simultaneously. This is both advantageous (greater performance) and more problematic (coordinating the CPUs), and it led to a new style of multi-user database processing called client-server database architecture.
A simpler but less robust approach is file-sharing architecture: it is acceptable for small groups, but larger groups require client-server processing.
Distributed databases: all of the organization's data are spread over many computers (micros, LAN servers, and mainframes) that communicate with one another. The goal is to make it appear to each user that she is the only user, and to provide the same consistency, accuracy, and timeliness as if no one else were using the system.
- This has been researched for more than 25 years, but is not yet feasible.
- Security and control are problems with hundreds of concurrent users.
- Coordinating and synchronizing processing can be difficult: if one user downloads and updates part of the database, how does the system prevent another user from using the version on the mainframe in the meantime? Microsoft is developing Microsoft Transaction Server (MTS).
Two types: homogeneous (one type of DBMS) and heterogeneous (more than one type of DBMS).
12 Three Levels of Database Representation
External level: each user group has its own view of the database; the database is accessed from here.
Conceptual level: database design; a logical, abstract description of data elements and their relationships.
Internal level: physical implementation (access methods, index construction, data structures); the database exists in reality only here.
The distinction between the logical and physical representation of data was officially recognized in 1978, when the ANSI/SPARC committee proposed a general three-level architecture framework for database systems.
Conceptual design involves analyzing users' information needs and defining the data items needed to meet them. The result of conceptual design is the conceptual schema, a single, logical description of all data elements and their relationships.
The external level consists of the user views of the database; each definable user group has its own view. Each view gives a user-oriented description of the data elements and relationships of which the view is composed, and it can be derived directly from the conceptual schema. The collection of all such views makes up the external level.
The internal level provides the physical view of the database: the disk drives, physical addresses, indexes, pointers, etc.; which physical devices will contain the data; what access methods will be used to retrieve and update the data; and what measures will be taken to maintain or improve database performance.
Implementing the three levels requires the DBMS to “map” (translate) from one level to another.
The primary focus of this course's lectures is the conceptual level, because the creation of a database begins with its design; the focus of the laboratories is the external level, using an RDBMS, which manages the internal level.
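The three levels can be sketched in SQL, here through Python's `sqlite3` as a stand-in RDBMS. In this minimal sketch (the `employee` table, `sales_staff` view, and index name are all hypothetical), the table definition plays the role of the conceptual schema, a view gives one user group its external view, and an index is an internal-level access structure that queries never mention by name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one logical description of the data (hypothetical table).
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Sales", 52000), (2, "Raj", "IT", 61000), (3, "Lee", "Sales", 48000)],
)

# External level: a view for a hypothetical Sales user group; salary is not exposed.
conn.execute(
    "CREATE VIEW sales_staff AS SELECT emp_id, name FROM employee WHERE dept = 'Sales'"
)

# Internal level: an index is a physical access path; users' queries do not refer to it.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

rows = conn.execute("SELECT * FROM sales_staff ORDER BY emp_id").fetchall()
print(rows)  # [(1, 'Ann'), (3, 'Lee')]
```

The DBMS does the "mapping": the query against the view is translated into a query against the table, which may in turn use the index.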
13 Focus of this course
Lectures: conceptual design of databases (determining their purpose, developing a model, identifying the tables that are required, designing normalized tables, and identifying their relationships to one another).
Laboratories: implementing a database at the external level (creating databases (tables) and database applications (queries, forms, reports, programs)) using a typical microcomputer relational database management system, MS Access 97.
14 The Database System Environment
Hardware: physical devices (computer, peripherals, network devices). Networks are important for airline reservation systems, automatic teller machines, etc.
Software:
- DBMS (manages the database), e.g., Oracle, DB2 (IBM)
- operating system software (manages hardware and software): DOS, Windows, and OS/2 for micros; Unix and VMS for minicomputers; MVS for IBM mainframes
- application programs (let users access and manipulate the database and generate reports)
People:
- system administrators (manage general operations)
- database designers (architects of the database structure); designers are very important because analysts and programmers cannot create a good program on a poor foundation
- database administrators (ensure the database is functioning)
- systems analysts and programmers (design and implement the database)
- end users (use application programs)
Procedures: the company's rules governing the design and use of the data; they ensure there is an organized way to monitor and audit the data and the information generated from it.
Data: the facts stored in the database; how this data is to be organized is the designer's job.
15 In the beginning… (in the 1950s)
…there were no databases, just file (or data processing) systems.
File systems were typically organized by function (use).
The first data management systems performed clerical tasks (transaction processing) such as order entry processing, payroll, and work scheduling.
File systems directly access files of data.
e.g., a file for patients (file folder analogy), with each record for a single patient, and another file for appointment/billing information:
Patient record: Name: Jane Doe; Address: 123 Easy St.; City: London; Phone:
Appointment record: Date: Sept 14, 1955; Time: 2:00 p.m.; Patient: Jane Doe; OHIP:
16 Limitations of Data File Systems
Diagram: Customer processing application -> Customer file; Order processing application -> Order file
File systems worked adequately if data collection needs were relatively small. Problems arose as data files, information needs, and reporting requirements grew in complexity, due to:
- Extensive programming: third-generation languages (e.g., COBOL, FORTRAN) require the programmer to specify what is to be done as well as how it is to be done.
- Program/data dependence: if the customer record is modified to expand the ZIP code field from 5 to 9 digits, all programs using that customer record must be modified, even if they don't use the ZIP code.
- Program dependency: the format of a file processed by a COBOL program differs from the format of one processed by a VB or C program, so the files can't be combined or compared.
- Every file reference in a program requires the programmer to use complex coding to match the data characteristics and to define the precise access paths to the various files and system components. As systems become more complex, the access paths become difficult to manage and produce system malfunctions.
- Structural dependence: all data access programs are subject to change when the file structure changes.
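The ZIP code example of program/data dependence can be made concrete. In this minimal Python sketch (the fixed-width layouts, offsets, and field names are hypothetical, and Python stands in for COBOL), each program hard-codes column offsets. Widening ZIP from 5 to 9 digits shifts every later field, so a program that only reads the phone number silently gets wrong data until it is changed:

```python
# Hypothetical fixed-width customer record layouts: field -> (start, end) columns.
OLD_LAYOUT = {"name": (0, 20), "zip": (20, 25), "phone": (25, 35)}   # 5-digit ZIP
NEW_LAYOUT = {"name": (0, 20), "zip": (20, 29), "phone": (29, 39)}   # 9-digit ZIP

def parse(record, layout):
    """Slice a fixed-width record using hard-coded column offsets."""
    return {field: record[start:end].strip() for field, (start, end) in layout.items()}

# A record written after the file structure changed.
new_record = "Jane Doe".ljust(20) + "123456789" + "5551234567"

# A phone-listing program that never uses ZIP but still has the old offsets:
print(parse(new_record, OLD_LAYOUT)["phone"])  # 6789555123 (not a phone number)
print(parse(new_record, NEW_LAYOUT)["phone"])  # 5551234567
```

A DBMS avoids this because programs ask for fields by name and the physical layout is recorded once, in the database's own description of its structure.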
17 Limitations of Data File Systems
Poor mechanisms for sharing data across the organization: files are often incompatible with one another (separate, isolated data).
Data redundancy: duplicate information in two or more files.
Program/data dependence: if the file structure changed, ALL programs using the file had to be modified, which was time-consuming.
Lack of flexibility: ad hoc queries or reports were not possible; every report or query required a separate program.
Poor security: it was difficult to program, and therefore often omitted.
Difficulty of representing data from the users' perspective.
Data redundancy is very serious: it puts the credibility of the stored data in question.
Data inconsistency results from mis-keying, updates and deletions applied to only some copies, and invalid data.
One name for different data elements: e.g., “account” has different meanings depending upon whether the context is a loan or savings.
Different terms for the same element: e.g., an insurance company uses “policy” and “case” interchangeably.
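How redundancy turns into inconsistency can be shown with two hypothetical "files" kept by separate programs. In this minimal Python sketch (the dictionaries, field names, and `find_inconsistencies` helper are all illustrative), each file holds its own copy of a customer's current address, and only one program applies the change of address:

```python
# Hypothetical files kept by two separate programs; each stores its own copy of the address.
customer_file = {"C001": {"name": "Jane Doe", "address": "123 Easy St."}}
billing_file = [{"invoice_no": 501, "cust_id": "C001", "cust_address": "123 Easy St."}]

# Jane moves. Only the customer-processing program updates its file.
customer_file["C001"]["address"] = "9 New Rd."

def find_inconsistencies(customers, invoices):
    """List invoices whose duplicated address disagrees with the customer file."""
    return [
        inv["invoice_no"]
        for inv in invoices
        if inv["cust_address"] != customers[inv["cust_id"]]["address"]
    ]

print(find_inconsistencies(customer_file, billing_file))  # [501]
```

In a database, the address would be stored once in a customer table and the billing data would refer to it by `cust_id`, so there would be no second copy to fall out of date.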
18 Historical Roots of Database Systems
Diagram: Customer processing application, Order processing application, and Employee processing application all access the data through one DBMS.
Database systems were developed to overcome the limitations of file systems, initially on mainframe computers in the late 60s and early 70s. A typical early DBMS cost $100,000 (compare with Access now at $130!); many are still in use.
The first general databases were created for General Electric (GE): the Integrated Data Store (IDS), designed to run on GE machines. B.F. Goodrich ported IDS to IBM machines, and it remained dominant until the 1980s.
As PCs gained popularity (1980s), single-user, personal databases developed; at present, most database technology is used in workgroups. There has been a 2000-fold increase in the number of organizations using DBMS products in North America.
File-processing systems directly access files of stored data. In contrast, database-processing programs call the DBMS to access the stored data. This difference is significant because it makes application programming easier: application programmers do not have to be concerned with the ways in which data are physically stored.
In a database system, all the application data is stored in a single facility called the database. An application program can ask the DBMS to access customer data, sales data, or both. The application programmer specifies only how the data are to be combined, and the DBMS performs the necessary operations to do it.
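The last point, that the programmer specifies only how data are combined, is what a SQL query does. A minimal sketch using Python's `sqlite3` as a stand-in DBMS (the `customer` and `orders` tables and their rows are hypothetical) asks for customer and order data together; nothing in the program says where or how the rows are physically stored:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (order_no INTEGER PRIMARY KEY,
                         cust_id INTEGER REFERENCES customer (cust_id),
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Jane'), (2, 'John');
    INSERT INTO orders VALUES (101, 1, 30.0), (102, 2, 50.0), (103, 1, 20.0);
""")

# The program states WHAT to combine; the DBMS decides HOW to read the stored data.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c JOIN orders AS o ON o.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Jane', 50.0), ('John', 50.0)]
```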
19 Better Definition of a Database
A collection of users' data, organized logically and managed by a unifying set of principles, procedures, and functionalities, which help guarantee the consistent application and interpretation of that data.
(a) An organized collection of related information or data, stored on a computer disk for easy, efficient use and represented in tabular format.
20 Better Definition of a Database (cont'd)
(b) A database is self-describing (metadata, system catalogs, or data dictionary): it contains a description of its own structure (e.g., the names of all the tables, and the names and types of data in each column of every table).
This is similar to a library, a self-describing collection of books. In addition to books, a library contains a card catalog describing them. In the same way, the data dictionary (which is part of the database, just as the card catalog is part of the library) describes the data contained in the database.
This is important because, first, we don't need to maintain external documentation of the file and record formats (as is done in file-processing systems).
Second, if we change the structure of the data in the database (such as adding new data items to an existing record), we enter only that change in the data dictionary. Few, if any, programs will need to be changed; in most cases, only those programs that process the altered data items must be changed.
Kroenke, D.M., Database Processing: Fundamentals, Design & Implementation, Prentice Hall, 1998.
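Most RDBMSs let you query the catalog like ordinary data. A minimal sketch using SQLite through Python's `sqlite3` (SQLite's catalog is the `sqlite_master` table plus `PRAGMA table_info`; the `publisher` table is hypothetical) reads the database's description of itself, then shows that a structural change is recorded there automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publisher (pub_id INTEGER PRIMARY KEY, pub_name TEXT NOT NULL)")

# SQLite stores its catalog in sqlite_master, inside the database itself.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['publisher']

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(name, col_type)
           for _cid, name, col_type, *_rest in conn.execute("PRAGMA table_info(publisher)")]
print(columns)  # [('pub_id', 'INTEGER'), ('pub_name', 'TEXT')]

# A structural change updates the catalog; there is no external documentation to edit.
conn.execute("ALTER TABLE publisher ADD COLUMN pub_phone TEXT")
print([row[1] for row in conn.execute("PRAGMA table_info(publisher)")])
# ['pub_id', 'pub_name', 'pub_phone']
```

Other systems expose the same idea under different names (for example, the standard `INFORMATION_SCHEMA` views).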
21 Better Definition of a Database (cont'd)
(c) Indexes are stored with the database.
Sorting and searching data in a source table is time-consuming without a “pointer” system; indexes improve the performance and accessibility of the database.
The “overhead cost” of indexing is that each time the data is updated, all indexes must also be updated; therefore, reserve indexes for the cases in which they are needed.
Note the difference between the "design" and "implementation" definitions of a key. A design key identifies a unique row; an implementation key is used to construct indexes that increase the access and sorting speed of data.
(d) Application metadata: stores the structure and format of application components; not all DBMSs support this feature.
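The effect of an index shows up in the DBMS's query plan. A minimal sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` (the `book` table, its generated rows, and `idx_book_title` are hypothetical; the exact plan wording varies between SQLite versions) compares the same search before and after an index exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(f"978-{i:09d}", f"Title {i}", i % 50 + 0.99) for i in range(1000)],
)

def plan(sql):
    """Return SQLite's plan text (the last column of each EXPLAIN QUERY PLAN row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM book WHERE title = 'Title 42'"
print(plan(query))  # a full SCAN of book: every row is examined

conn.execute("CREATE INDEX idx_book_title ON book (title)")
print(plan(query))  # a SEARCH of book using idx_book_title

# The overhead: from now on, every INSERT, or UPDATE of title, must also maintain the index.
```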
22 Evolution of Database Models
Models: Hierarchical, Network, Relational, Semantic, Object-Relational, Object-Oriented.
Hierarchical and network: still in use in many older (1970s) legacy systems, but very few new databases use them; they are referred to as “navigational systems”.
Relational: the vast majority of databases currently use this model; therefore, our course's focus is here.
Object-Oriented: very few new databases are being created using object-oriented programming (there are not many ODBMSs for businesses to implement this model).
Modeling the world around us is an inherently human activity, whether it is building ships in bottles, crafting dolls and their houses, or drawing a map. The process of creating a model is an attempt to capture the essence of things both concrete and abstract, to make order of the chaos inherent in the world around us. It is no different for those of us who work within the abstract world of computer systems: in order to understand and control a system's size and complexity, we must reduce it to a model that we can fit our brains around.
OOP (object-oriented programming) began to be used in the late 80s. Objects are considerably more complex than other structures and difficult to store in existing relational DBMS products, so a new category of DBMS is evolving.
OODBMSs are difficult to use, and applications are very expensive to develop. Organizations are unwilling to bear the cost and risk required to convert the millions or billions of bytes of data already organized in relational databases.
Most OODBMSs were developed to support engineering applications, and they do not have features and functions that are appropriate or readily adaptable to business information applications.
OODBMSs are likely to occupy a niche in commercial information systems applications.
23 The Relational Database Model
Diagram: tables Agents, Clients, Entertainers, Instruments, Engagements, Entertainer styles.
Data is represented by tables (like spreadsheets).
Tables are NOT linked with physical pointers.
Unlike earlier systems, all three types of relationships (one-to-one, one-to-many, many-to-many) can be represented.
The model accommodates the design of larger databases that involve complex relationships and intricate manipulations.
An agent represents a number of clients and entertainers. Clients and entertainers are associated with each other through the Engagements table, since a client will hire any number of entertainers and an entertainer will perform for any number of clients. An entertainer may play one or more musical instruments, which is reflected in the Entertainer styles table.
Why are relational databases so important?
- They can effectively model the actual organization and how information (data) is used.
- They are based on the idea that people and organizations rely on relationships to conduct work efficiently.
- Information is often easy to represent by its relationships to other information.
- Information can be described by its structure (entities, tables), which can be decomposed (attributes, fields).
Relational model: based on the use of relations, tuples, and attributes to capture and manage data. Three characteristics: 1) cells (fields) must be single-valued; 2) all values in a column must be of the same kind; 3) no two rows (records) can be the same, and the order of the rows is irrelevant.
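The Engagements table is what turns the many-to-many relationship between clients and entertainers into two one-to-many relationships. A minimal sketch in SQL via Python's `sqlite3` (column names and sample rows are hypothetical, and the Instruments and Entertainer styles tables are omitted) follows the relationships by matching key values rather than physical pointers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents       (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE clients      (client_id INTEGER PRIMARY KEY, client_name TEXT,
                               agent_id INTEGER REFERENCES agents (agent_id));
    CREATE TABLE entertainers (entertainer_id INTEGER PRIMARY KEY, stage_name TEXT,
                               agent_id INTEGER REFERENCES agents (agent_id));
    -- Engagements resolves the many-to-many link between clients and entertainers.
    CREATE TABLE engagements  (engagement_id INTEGER PRIMARY KEY,
                               client_id INTEGER REFERENCES clients (client_id),
                               entertainer_id INTEGER REFERENCES entertainers (entertainer_id),
                               start_date TEXT);
    INSERT INTO agents VALUES (1, 'Ana');
    INSERT INTO clients VALUES (10, 'Acme Corp', 1), (11, 'Birch Hotel', 1);
    INSERT INTO entertainers VALUES (20, 'The Keys', 1), (21, 'Solo Sam', 1);
    INSERT INTO engagements VALUES (100, 10, 20, '2024-05-01'),
                                   (101, 10, 21, '2024-05-08'),
                                   (102, 11, 20, '2024-06-02');
""")

# Relationships are followed by matching key values, not physical pointers.
rows = conn.execute("""
    SELECT c.client_name, e.stage_name
    FROM engagements AS g
    JOIN clients AS c ON c.client_id = g.client_id
    JOIN entertainers AS e ON e.entertainer_id = g.entertainer_id
    ORDER BY c.client_name, e.stage_name
""").fetchall()
print(rows)
# [('Acme Corp', 'Solo Sam'), ('Acme Corp', 'The Keys'), ('Birch Hotel', 'The Keys')]
```

Acme Corp hires two entertainers and The Keys play for two clients: both sides of the many-to-many relationship appear in one table.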
24 Evaluation of the Relational Database Model
Advantages:
- mechanisms for minimizing data redundancy and inconsistency
- logical database design is separated from physical aspects
- relatively program-data independent
- management of data for access, manipulation, and security
- flexible mechanisms for generating reports and queries
- reduced program development and maintenance costs
- data can be accessed in a multiplicity of ways within and among organizations
Disadvantages:
- ease of use: many untrained people create and use databases without considering their design, and these databases usually incorporate many errors. This is still the #1 problem.
Paradox, MDBS, Helix, dBase: developed for microcomputers.
Oracle, Focus, and Ingres: ported down to PCs.
25 Comparison of Database Models
Model                      | Data              | Structure               | Demands upon
File systems               | data dependence   | structural dependence   | programmer
Hierarchical, Network DBMS | data independence | structural dependence   | programmer
Relational DBMS            | data independence | structural independence | computer
26 Table
Users view their data in two-dimensional tables.
table = file = relation
27 Field
The fields within records contain data. Data within a field must be of the same data type. Each field within a table must have a unique name. The order of fields is unimportant.
column = field = attribute
28 Record
A record is a group of related fields of information about a single instance of one object or event in a database.
Tables consist of zero, one, or more records.
The order of rows is unimportant.
row = record = tuple
29 Database Schema
A database schema defines the database's structure: its tables, relationships, domains, and constraint rules.
Tables:
BOOK (ISBN, Title, AuthID, PubID, Price)
PUBLISHER (PubID, PubName, PubPhone)
AUTHOR (AuthID, AuthName, AuthPhone)
Relationships:
Each book is published by one and only one publisher.
Each publisher publishes one or more books.
Domains (the set of values in a column):
Physical description (e.g., the set of integers 0 < x < 99999)
Constraints (business rules):
Price cannot be less than zero; the author phone field cannot be left blank.
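Most of this schema can be declared directly in SQL so that the DBMS enforces it. A minimal sketch via Python's `sqlite3`, with some assumptions: the 0 < x < 99999 domain is applied to `PubID` and `AuthID` for illustration, the sample rows and `try_insert` helper are hypothetical, and "each publisher publishes one or more books" is not enforced, because a plain foreign key cannot require a minimum count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE publisher (
        PubID    INTEGER PRIMARY KEY CHECK (PubID > 0 AND PubID < 99999),  -- assumed domain
        PubName  TEXT NOT NULL,
        PubPhone TEXT
    );
    CREATE TABLE author (
        AuthID    INTEGER PRIMARY KEY CHECK (AuthID > 0 AND AuthID < 99999),
        AuthName  TEXT NOT NULL,
        AuthPhone TEXT NOT NULL               -- rule: author phone cannot be left blank
    );
    CREATE TABLE book (
        ISBN   TEXT PRIMARY KEY,
        Title  TEXT NOT NULL,
        AuthID INTEGER REFERENCES author (AuthID),
        PubID  INTEGER NOT NULL REFERENCES publisher (PubID),  -- one and only one publisher
        Price  REAL CHECK (Price >= 0)                         -- rule: price not below zero
    );
    INSERT INTO publisher VALUES (1, 'Example Press', '555-0100');
    INSERT INTO author VALUES (7, 'A. Writer', '555-0199');
""")

def try_insert(sql, params):
    """Return None if the DBMS accepts the row, else the constraint error message."""
    try:
        with conn:
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

add_book = "INSERT INTO book VALUES (?, ?, ?, ?, ?)"
print(try_insert(add_book, ("0-00-000001-1", "Databases", 7, 1, 59.95)))  # None (accepted)
print(try_insert(add_book, ("0-00-000002-X", "Bad Price", 7, 1, -5.0)))    # CHECK violation
print(try_insert(add_book, ("0-00-000003-8", "Orphan", 7, 99, 10.0)))       # FOREIGN KEY violation
print(try_insert("INSERT INTO author VALUES (?, ?, ?)", (8, "B. Author", None)))  # NOT NULL violation
```

The exact wording of each error message depends on the SQLite version; the point is that the rules live in the schema, not in every application program.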