Presentation on theme: "Introduction to Databases"— Presentation transcript:
1. Introduction to Databases
- Data organisation
- Definition
- Data modelling
- SQL
- DBMS functions
2. Basics of data organisation: data hierarchy (four categories)
- Field = represents a single data item
- Record = a related set of fields describing one instance of an entity
- File / Table = a set of related records, one for each instance (occurrence) in the set
- Database = a collection of related files
3. Example of data structure
- Fields: Name, First name, Telephone
- Records: Sampras, Pete; Healy, Margaret; Clinton, Bill; Henry, Thierry
- File / Table: the set of these records
- This table + other files => complete data structure = DB
4. Database: definition
"A collection of interrelated data stored together with controlled redundancy, to serve one or more applications in an optimal fashion; the data is stored so that it is independent of the application programs which use it; a common and controlled approach is used in adding new data and in modifying existing data within the database."
5. Definition: a closer look
- A collection of interrelated data stored together
- with controlled redundancy
- to serve one or more applications in an optimal fashion
- the data is stored so that it is independent of the application programs which use it
- a common and controlled approach is used in adding new data and in modifying existing data within the database
6. Advantages of databases
- Data are independent from applications: stored centrally
- Data repository is accessible to any new program
- Data are not duplicated in different locations
- Programmers do not have to write extensive descriptions of the files
- These save enough money and time to offset the extra costs of setting up and maintaining DBs
7. Disadvantages of DBs
- Data are more accessible, so more easily abused
- Large DBs require expensive hardware and software
- Specialised / scarce personnel are required to develop and maintain large DBs
- People / business units may object to "their" data being widely available in a DB
8. Characteristics of DBs
- High concurrency (high performance under load)
- Multi-user (read does not interfere with write)
- Data consistency: changes to data don't affect running queries + no phantom data changes
- High degree of recoverability (pull the plug test)
9. ACID test
- Atomicity: all or nothing
- Consistency: preserve consistency of the database
- Isolation: transactions are independent
- Durability: once committed, data is preserved
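The "all or nothing" property can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The account table, the names and the amounts are hypothetical, invented for this illustration only.

```python
import sqlite3

# Hypothetical in-memory database; table and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates are committed
failed = transfer(conn, "A", "B", 500)  # CHECK fails, so both updates are undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

The second transfer credits B before the debit on A violates the CHECK constraint; the rollback removes the credit as well, so the database never shows half a transfer.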
10. Database Management System (DBMS)
- Program that makes it possible to:
  - create
  - use
  - maintain a database
- It provides an interface / translation mechanism between the logical organisation of the data stored in the DB and the physical organisation of the data
11. Using a database
- Two main functions of the DBMS:
  - Query language: for people who are not programmers (greatest advantage of DBs)
  - Data manipulation language: for programmers who want to modify the links between data elements within the DB
- Also, host language: the language used by programmers to develop the rest of the application, e.g. Visual Basic for Applications (VBA) / Oracle Developer/2000
12. Different types of DBs
- Creating the DB = specifying the links between data items
- Different types of relationships can be specified, i.e. different logical views
- They correspond to the main types of DBMS:
  - Hierarchical DBs
  - Network DBs
  - Relational DBs
  - Object-oriented DBs
13. Hierarchical DBs
- Data items are related as "parent" and "child" in a tree-like structure
- "Parent" means the data item is higher in the tree than the "child" and connected to it
- One "parent" can have more than one "child", but one "child" can only have one "parent"
- Most common platform = IBM's Information Management System (IMS)
14. Example
- [Hierarchy diagram; nodes: Customers, Payments, Orders, Currency, Items, Unit of packaging, Substitution product]
- Very fast retrieval
15. Undesirable side effects
- Insertion of a record:
  - a dependent record cannot be added without a parent
  - e.g. units of packaging cannot be added without linkage to an existing item
- Deletion of a record:
  - deletion of a parent deletes all its children
  - deleting an existing item will delete its replacement items
- Impossible to have two parents = trouble
16. Network DBs
- Same as parents and children in a hierarchical DB, but children can have more than one parent
- It is also possible to link items upwards to other items' parents
- Practically, it means that the DBMS is more flexible for data retrieval
17. Example
- [Network diagram; nodes: Suppliers, Customers, Payments, Orders, Currency, Items, Unit of packaging, Substitution product]
18. Relational DBs
- Data items are stored in tables
- Specific fields in tables are related to fields in other tables (join)
- Infinite number of possible viewpoints on the data (queries)
- Highly flexible DB, but can be slow for complex searches
- Oracle, Sybase, Ingres, Access, Paradox for Windows...
19. Describing relationships
- Attempt at modelling the business elements (entities) and their relationships (links)
- Can be based on users' descriptions of the business processes
- Specifies dependencies between the data items
- Coded in an Entity-Relationship Diagram (ERD)
20. Types of relationships
- One-to-one: one instance of one data item corresponds to one instance of another
- One-to-many: one instance corresponds to many instances
- Many-to-many: many instances correspond to many instances
- Also, some relationships may be:
  - compulsory
  - optional
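As a rough sketch of how these relationship types end up in a relational schema, the following uses Python's built-in sqlite3 module. The rep, customer, orders, item and order_line tables and all their columns are hypothetical, chosen only to resemble the entities used elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE rep      (rep_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- one-to-many, optional: a customer may have no rep (rep_no can be NULL)
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       rep_no INTEGER REFERENCES rep(rep_no));
-- one-to-many, compulsory: every order must belong to a customer (NOT NULL)
CREATE TABLE orders   (order_no INTEGER PRIMARY KEY,
                       cust_no INTEGER NOT NULL REFERENCES customer(cust_no));
CREATE TABLE item     (item_no INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- many-to-many between orders and items, resolved by a linking table
CREATE TABLE order_line (order_no INTEGER NOT NULL REFERENCES orders(order_no),
                         item_no  INTEGER NOT NULL REFERENCES item(item_no),
                         PRIMARY KEY (order_no, item_no));
INSERT INTO rep VALUES (7, 'Murphy');
INSERT INTO customer VALUES (1217, 'Robson', 7), (1218, 'Jones', NULL);
INSERT INTO orders VALUES (110, 1217), (111, 1218);
INSERT INTO item VALUES (1, 'Bolt'), (2, 'Nut');
INSERT INTO order_line VALUES (110, 1), (110, 2), (111, 1);
""")

# The compulsory relationship rejects an order that has no customer.
try:
    conn.execute("INSERT INTO orders VALUES (112, NULL)")
    orphan_order_rejected = False
except sqlite3.IntegrityError:
    orphan_order_rejected = True

# Many-to-many: one item can appear on several orders.
orders_containing_item_1 = [r[0] for r in conn.execute(
    "SELECT order_no FROM order_line WHERE item_no = 1 ORDER BY order_no")]
print(orphan_order_rejected, orders_containing_item_1)
```

A nullable foreign key is one common way to express an optional relationship and NOT NULL a compulsory one; other designs are possible.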
21. Example: student registering system
- What are the entities?
- What type of relationship do they have?
- Draw the diagram
23. Next step: creating the data structure
- Few rules, a lot of experience
- Can get quite complex (paramount for the speed of the DB)
- Tables must be normalised, i.e. redundancy is limited to the strict minimum by an algorithm
- In practice, normalisation is not always the best choice
24. Data structure diagrams
- Describe the underlying structure of the DB: the complete logical structure
- Data items are stored in tables linked by pointers:
  - attribute pointers: data fields in one table that will link it to another (common information)
  - logical pointers: specific links that exist between tables
- Tables have a key
- If an attribute seems to belong to a relationship rather than to an entity, it may mean an associative entity must be added
28. Some test questions
- Is it a bird, is it a plane?
- Is it an entity or an attribute?
29. Normalisation
- Process of simplifying the relationships amongst data items as much as possible (see example provided in the handout)
- Through an iterative process, the structure of the data is refined to 1NF, 2NF, 3NF etc.
- Reasons for normalisation:
  - to simplify retrieval (speed of response)
  - to simplify maintenance (updates, deletions, insertions)
  - to reduce the need to restructure the data for each new application
30. First Normal Form
- Design the record structure so that each record looks the same (same length, no repeating groups)
- Repetition within a record means one relation was missed = create a new relation
- Elements of repeating groups are stored as a separate entity, in a separate table
- Normalised records have a fixed length and an expanded primary key
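A small sketch of the first-normal-form step in plain Python: a repeating group of items inside each order record is moved to its own relation with an expanded key. The order numbers, customer names and item codes are made up for illustration.

```python
# Hypothetical unnormalised order records: each holds a repeating group of items.
orders_unnormalised = [
    {"order_no": 110, "customer": "Jones", "items": [("A1", 2), ("B7", 1)]},
    {"order_no": 111, "customer": "Barry", "items": [("A1", 5)]},
]


def to_first_normal_form(orders):
    """Split the repeating group into a separate relation."""
    order_rows, order_line_rows = [], []
    for order in orders:
        # Fixed-length record, one per order.
        order_rows.append((order["order_no"], order["customer"]))
        for item_no, quantity in order["items"]:
            # Expanded primary key: (order_no, item_no).
            order_line_rows.append((order["order_no"], item_no, quantity))
    return order_rows, order_line_rows


order_rows, order_line_rows = to_first_normal_form(orders_unnormalised)
print(order_rows)
print(order_line_rows)
```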
31. Second Normal Form
- Record must be in first normal form first
- Each item in the record must be fully dependent on the key for identification
- Functional dependency means a data item's value is uniquely associated with another's
- Only one-to-one relationships between elements in the same file
- Otherwise, split into more tables
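To make "fully dependent on the key" concrete, here is a minimal Python sketch. In the hypothetical rows below the key is (order_no, item_no); the quantity depends on the whole key, but the description depends on item_no alone, so it is split out into its own table.

```python
# Hypothetical 1NF rows: (order_no, item_no, description, quantity).
order_lines_1nf = [
    (110, "A1", "Widget", 2),
    (110, "B7", "Gasket", 1),
    (111, "A1", "Widget", 5),
]


def to_second_normal_form(rows):
    """Remove the partial dependency of description on item_no."""
    # quantity depends on the full key (order_no, item_no): it stays.
    order_line = [(order_no, item_no, qty) for order_no, item_no, _, qty in rows]
    # description depends only on item_no: one row per item.
    item = sorted({(item_no, desc) for _, item_no, desc, _ in rows})
    return order_line, item


order_line, item = to_second_normal_form(order_lines_1nf)
print(order_line)
print(item)
```

After the split, the description "Widget" is stored once instead of once per order line.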
32. Third Normal Form
- To remove transitive dependencies
- These arise when one item is dependent on an item which is itself dependent on the key in the file
- The relationship is split to avoid data being lost inadvertently
- This will give greater flexibility for the design of the application + eliminate deletion problems
- In practice, 3NF is not used all the time: speed of retrieval can be affected
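A transitive dependency can be sketched in Python as follows. In these hypothetical rows the rep's name depends on rep_no, which in turn depends on the key cust_no; splitting the relation means a rep's details survive even if all of that rep's customers are deleted.

```python
# Hypothetical rows: (cust_no, cust_name, rep_no, rep_name).
customers_2nf = [
    (1217, "Robson", 7, "Murphy"),
    (1218, "Jones", 7, "Murphy"),
    (1219, "Barry", 9, "Kelly"),
]


def to_third_normal_form(rows):
    """Split out the attribute that depends on a non-key item."""
    # The customer table keeps only the link (rep_no) to the rep.
    customer = [(cust_no, name, rep_no) for cust_no, name, rep_no, _ in rows]
    # The rep table stores each rep name exactly once.
    rep = sorted({(rep_no, rep_name) for _, _, rep_no, rep_name in rows})
    return customer, rep


customer, rep = to_third_normal_form(customers_2nf)
print(customer)
print(rep)
```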
33. Beyond data modelling
- Model must be normalised: for what purpose?
- Outcome is a set of tables = logical design
- Then, the design can be warped until it meets the realistic constraints of the system
- E.g. what business problem are we trying to solve? See handout [Riccardi p. 113, 127]
- Update anomalies
- Each item should appear only once
- + you ask many good questions
34. Realistic constraints
- Users cannot cope with too many tables
- Too much development required in hiding a complex data structure
- Too much administration
- Optimisation is impossible with too many tables
- Actually: RDBs can be quite slow!
35. Key practical questions
- What are the most important tasks that the DB MUST accomplish efficiently?
- How must the DB be rigged physically to address these?
- What coding practices will keep the coding clean and simple?
- What additional demands arise from the need for resilience and security?
36. Analysis: three levels of schema
- External schemas: External Schema 1, External Schema 2, External Schema …
- Logical schema: tables
- Internal schema: disk array
37. 4-way trade-off
- Security
- Performance
- Ease of use
- Clarity of code
38. Key decisions
- Oracle offers many different ways to do things:
  - Indexes
  - Backups
  - …
- Good analysis is not only about knowing these, but about understanding whether they are appropriate
- Failure to think it through => unworkable model
- In particular, predicting performance must be done properly
- OK on the technical side, tricky on the business side
39. Design optimisation
- Sources of problems:
  - Network traffic
  - Excess CPU usage
  - But physical I/O is the greatest threat (as distinct from logical I/O)
- Disks are still the slowest in the loop
- Solution: minimise or re-schedule access
- Also try to minimise the impact of Q4, resilience and security (e.g. mirroring, internal consistency checks…)
40. Creating links between the tables
- Use common fields to join tables / queries
- Very easy when data is properly normalised
- Gives total flexibility in terms of data retrieval
- Main strength of RDBs (SQL)
41. Structured Query Language
- Used for defining and manipulating data in relational DBs
- Aimed at:
  - reducing training costs
  - increasing productivity
  - improving application portability
  - increasing application longevity
  - reducing dependency on single vendors
  - enabling cross-system communication
- In practice, SQL dialects can differ a bit from one vendor to another
42. Querying RDBs with SQL
- Use a form of pseudo-English to retrieve data in a view (which looks like a table)
- Syntax is based on a number of "clauses":
  - Select: specifies what data elements will be included in the view
  - From: lists the tables involved
  - Where: specifies conditions to filter the data
    - specific values sought
    - links between tables
43. Example with one table
- Find the name and address of customer number 1217
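The query shown on the original slide is not preserved in this transcript. One possible answer, as a sketch run through Python's built-in sqlite3 module, is given below; the customer table, its column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute(
    "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1217, "Robson", "12 Main Street"), (1218, "Jones", "3 Hill Road")],
)

# SELECT names the columns, FROM the table, WHERE the specific value sought.
row = conn.execute(
    "SELECT name, address FROM customer WHERE cust_no = 1217"
).fetchone()
print(row)  # ('Robson', '12 Main Street')
```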
44. Example with a range
- Find the items which are priced between £50 and £15000
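A possible form of this range query, sketched with Python's built-in sqlite3 module. The item table, its columns and the prices are hypothetical; note that BETWEEN includes both end values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only (prices in pounds).
conn.execute(
    "CREATE TABLE item (item_no INTEGER PRIMARY KEY, description TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "Bolt", 5), (2, "Bicycle", 50), (3, "Car", 15000), (4, "Yacht", 90000)],
)

# Equivalent to: price >= 50 AND price <= 15000
in_range = conn.execute(
    "SELECT description, price FROM item "
    "WHERE price BETWEEN 50 AND 15000 ORDER BY price"
).fetchall()
print(in_range)  # [('Bicycle', 50), ('Car', 15000)]
```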
45. Example with two tables
- Find the rep name of all customers
46. Example with two tables
- Same query (find the rep name), for customer Robson only
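Both two-table examples can be sketched with a join expressed in the WHERE clause, matching the "links between tables" described earlier. The snippet uses Python's built-in sqlite3 module; the rep and customer tables, the rep_no linking field and the sample names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
conn.executescript("""
CREATE TABLE rep      (rep_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, rep_no INTEGER);
INSERT INTO rep VALUES (7, 'Murphy'), (9, 'Kelly');
INSERT INTO customer VALUES (1217, 'Robson', 7), (1218, 'Jones', 9), (1219, 'Barry', 7);
""")

# Rep name of all customers: the common field rep_no links the two tables.
all_customers = conn.execute(
    "SELECT customer.name, rep.name FROM customer, rep "
    "WHERE customer.rep_no = rep.rep_no ORDER BY customer.name"
).fetchall()

# Same query with an extra condition for one specific customer.
robson_only = conn.execute(
    "SELECT rep.name FROM customer, rep "
    "WHERE customer.rep_no = rep.rep_no AND customer.name = 'Robson'"
).fetchall()
print(all_customers)
print(robson_only)
```

The same join can also be written with explicit JOIN ... ON syntax; the WHERE form is used here because it mirrors the clauses listed on the earlier slide.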
47. Use of a search condition: nested queries
- Find the name and address of the customer who ordered order #110
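A nested query for this exercise might look like the sketch below, run through Python's built-in sqlite3 module. The inner query finds the customer number on the order and the outer query looks up that customer. Table names, column names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
conn.executescript("""
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE orders   (order_no INTEGER PRIMARY KEY, cust_no INTEGER);
INSERT INTO customer VALUES (1217, 'Robson', '12 Main Street'),
                            (1218, 'Jones', '3 Hill Road');
INSERT INTO orders VALUES (110, 1218), (111, 1217);
""")

# The sub-query in brackets is evaluated first; its result feeds the outer WHERE.
who_ordered_110 = conn.execute(
    "SELECT name, address FROM customer "
    "WHERE cust_no IN (SELECT cust_no FROM orders WHERE order_no = 110)"
).fetchall()
print(who_ordered_110)  # [('Jones', '3 Hill Road')]
```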
48. Additional syntax
- Add computation in the "select" statement:
  - select SUM(price)
  - select AVG(price), MAX, MIN, COUNT
- Simplify comparisons with a BETWEEN clause and a LIKE clause (wildcards * and ? in Access; % and _ in standard SQL)
- Add a sorting instruction after the where clause:
  - ORDER BY name (alphabetical)
  - ORDER BY price (ascending)
- Provide aggregate information by grouping data:
  - GROUP BY customer
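The extra syntax can be tried out with a short sketch using Python's built-in sqlite3 module. The orders table and its contents are hypothetical. SQLite follows standard SQL in using % and _ as LIKE wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, for illustration only.
conn.execute(
    "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Jones", 100), (2, "Barry", 250), (3, "Jones", 300), (4, "Barry", 50)],
)

# Computation in the SELECT list.
totals = conn.execute(
    "SELECT SUM(price), AVG(price), MAX(price), MIN(price), COUNT(*) FROM orders"
).fetchone()

# BETWEEN (inclusive range) and LIKE (pattern match).
mid_priced = conn.execute(
    "SELECT order_no FROM orders WHERE price BETWEEN 100 AND 250 ORDER BY order_no"
).fetchall()
j_names = conn.execute(
    "SELECT DISTINCT customer FROM orders WHERE customer LIKE 'J%'"
).fetchall()

# Sorting (ascending by default), then aggregate information per group.
by_price = conn.execute("SELECT order_no FROM orders ORDER BY price").fetchall()
per_customer = conn.execute(
    "SELECT customer, SUM(price) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals, mid_priced, j_names, by_price, per_customer)
```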
49. Find the contents (item# and description) of order 110
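One way to answer this, assuming order contents are held in a linking table between orders and items, is sketched below with Python's built-in sqlite3 module. The order_line and item tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
conn.executescript("""
CREATE TABLE item       (item_no INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE order_line (order_no INTEGER, item_no INTEGER,
                         PRIMARY KEY (order_no, item_no));
INSERT INTO item VALUES (1, 'Bolt'), (2, 'Nut'), (3, 'Washer');
INSERT INTO order_line VALUES (110, 1), (110, 3), (111, 2);
""")

# Link the two tables on item_no, then filter on the order sought.
contents_of_110 = conn.execute(
    "SELECT item.item_no, item.description FROM order_line, item "
    "WHERE order_line.item_no = item.item_no AND order_line.order_no = 110 "
    "ORDER BY item.item_no"
).fetchall()
print(contents_of_110)  # [(1, 'Bolt'), (3, 'Washer')]
```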
50. Two exercises
- Find the average price of the cars for sale
- Find the average price of all orders taken so far by customer "Jones"
51. Find how much cash customer "Barry" has generated in total
52. Find the average price of all orders taken so far
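The order-related exercises on the last three slides combine a join with an aggregate function. A possible sketch, using Python's built-in sqlite3 module, follows; the customer and orders tables, their columns and the prices are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, for illustration only.
conn.executescript("""
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders   (order_no INTEGER PRIMARY KEY, cust_no INTEGER, price INTEGER);
INSERT INTO customer VALUES (1, 'Jones'), (2, 'Barry');
INSERT INTO orders VALUES (110, 1, 200), (111, 2, 150), (112, 1, 400), (113, 2, 350);
""")

# Same join for both per-customer questions; only the aggregate changes.
per_customer_sql = (
    "SELECT {agg}(orders.price) FROM orders, customer "
    "WHERE orders.cust_no = customer.cust_no AND customer.name = ?"
)

jones_average = conn.execute(per_customer_sql.format(agg="AVG"), ("Jones",)).fetchone()[0]
barry_total = conn.execute(per_customer_sql.format(agg="SUM"), ("Barry",)).fetchone()[0]
overall_average = conn.execute("SELECT AVG(price) FROM orders").fetchone()[0]
print(jones_average, barry_total, overall_average)  # 300.0 500 275.0
```

The customer name is passed as a bound parameter (the ? placeholder) rather than pasted into the SQL text, which avoids quoting problems.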