Presentation on theme: "1 Complex Data and XML. 2 Week 10 Review Trees and the places where tree structures are used in computing and knowledge representation Nomenclature in."— Presentation transcript:
1 Complex Data and XML
3 Tree and questions Look at the tree in the diagram Answer the following questions: –Name the root of the tree –Name the leaves of the tree –Using path notation in Windows or Unix What is the location of J relative to the root? What is the location of the root if you are at J? What is the location of K if you are at J?
4 A L D K NM J GB RC
5 The rationale for XML Complex data –StudentsOnlineStudentsOnline –Modules and Courses Export and import – self-documenting data –FlickrFlickr –TransXChangeTransXChange XML-based tools –XFormsXForms
6 Year 1 Oracle example Students on year 1 Database module modelled a much bigger data model concerning modules, students, staff and courses. Oracle SQL DDL was reverse-engineered to QSEE, then the code regenerated to MySQL, installed and executed. Query interface using PHP
7 Foreign key identifies Weak entity
8 Lessons Problems with data types and data formats ER model as Conceptual Model Foreign keys in QSEE Many of the entities are weak
9 Data types Data types differ between the two databases – QSEE uses a more general set of datatypes and then outputs DDL dependant on the target. –Data typing is a big problem in databases Data types too small for the data stored, so data might be lost Date formats differ in MySQL, 08-dec in Oracle
10 Conceptual ER model No foreign keys in the ER model. –foreign keys are the realisation (implementation) of the relationships defined in the ER Model in a RDBMS Case tool can then –removes redundancy - there is no point in defining both the relationship AND the foreign keys [one thing in one place] –it emphasises the conceptual nature of the model, which may be implemented in other ways –the data type of the primary key and all foreign keys must be the same - but you may need to change it so better to change it in one place and regenerate the schema –all the necessary integrity constraints are also generated
11 Foreign Keys in QSEE The bar across a relationship indicates that the foreign key is to be part of the primary key. –As well as creating a foreign key, the primary key constraint will be generated appropriately. A dotted boundary indicates that the whole primary key is made up of foreign keys Any entity in which a foreign key is part of the primary key is 'weak' - it depends for its existence on another entity. Foreign key naming –default - fk1_, fk2_ prefix –My policy is to remove the prefix globally set the prefix on specific relationships where required (manager, leader)
12 Foreign keys in RDBMS In MySQL, action depends on which file model is being used –ISAM (Index Sequential access Method) does not enforce foreign key integrity –INNODB does Loading data is a problem –Need to load data in the right order to prevent foreign key errors – roughly parent before child –Not possible if any cycles, so RDBMS allows it to be turned off for the loading –Special programs for bulk loading from files such as CSV– easier than using INSERT statements
13 Questions about the model What foreign keys would be added to the Course table in an RDBMS? To the Teaches table? Explain the optionality on the mgr relationship between staff and department? Is it on the right end? Which two tables would be impossible to put in order to ensure foreign key constraints are not violated?
14 Non-first normal form First normal form specifies that all fields must be atomic – single valued Look at the qualification table. The bar indicates that it is dependant on staff. If repeated fields were allowed in tables then we could simply add: qualifications : set Varchar to the staff table We would need to be able to search for individual qualifications Tags in the photo album site are a similar problem. Which other table could be removed if non-first normal form tables were allowed? MySQL has sets and operators to add, remove and search (but these all add complexity)
15 Harder cases What additional structure would be required to get rid of the nextOfKin table? How could we get rid of the teaches and undertakes tables? How would this data be handled in an Object-oriented model? If all the weak entities were removed, how much would this simplify the model?
16 NextOfKin –Requires a sub-record of name, phone and relationship which can be repeated Teaches –Decide which side this data belongs to and add a set of sub-records Which side would you place the data from teaches and undertakes?
17 Complex data Real data is much more complex than this ISIS the student record system has over 200 tables, many them like the 5 weak tables in the simpler model – so the same ratio might be assumed to apply Consider an actual module specificationmodule specification Note the number of repeated sub-records? How many tables would be required to model this data?
18 Communication Applications which stand alone and cannot interchange data with other applications are called silos An RBMS must export data as files of INSERT statements or as a comma separated variable file In both cases, you cant make any sense of the data without knowing the Database schema INSERT INTO teaches VALUES (11, 'UFCEMQ-20-1', 5); INSERT INTO teaches VALUES (11, 'UFEEHV-20-1', 5); INSERT INTO teaches VALUES (13, 'UFQEGC-20-1', 6); INSERT INTO teaches VALUES (8, 'UFCEKN-20-1', 10); 1, 'David', 'Johnson', '109 Mount View Road, Kingswood, Bristol BS15 8UB', ' ', '3A31', 'Male', 32000, 'Professor', 'dpjohnso', 'Applied Sciences' 2, 'Carrie', 'Ford', '15 Heron Way, Chipping Sodbury, Bristol BS37 6NT', ' ', '1C29', 'Female', 29000, 'Senior Lecturer', 'caford', 'Bristol Business School'
19 'BUWE B80 GF78', 'PgDIT', '3 Years', 13, 'CIS' 'BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law' 'BUWE B80 B230', 'Pharmaceutical Sciences', '4 Years', 1, 'Applied Sciences' 'BUWE B80 G500', 'Business Decision Analysis', '4 Years', 9, 'Bristol Business School' 'BUWE B80 X300', 'Education Studies', '3 Years', 5, 'Education' 'BUWE B80 G451', 'Multimedia Computing', '4 Years', 10, 'Computing, Engineering and Mathematical Sciences' 'BUWE B80 L300', 'Sociology', '3 Years', 6, 'Humanities, Languages and Social Sciences' CREATE TABLE course( cCodeVARCHAR(30) NOT NULL, titleVARCHAR(50), durationVARCHAR(20), deptNameVARCHAR(50) NOT NULL, leader_staffNoNUMERIC(4,0) NOT NULL, -- Specify the PRIMARY KEY constraint for table "course". -- This indicates which attribute(s) uniquely identify each row of data. CONSTRAINTpk_course PRIMARY KEY (cCode) ) TYPE=INNODB;
20 XML XML is tagged data following some simple rules on the formation and relationship between tags – well-formed XML documents are trees and thus support non-first normal form data The tags in the document provide –some- information about the meaning of the data The tags and the common tree structure allow general tools to do much of the processing without a separate schema.
21 David Johnson 109 Mount View Road, Kingswood, Bristol BS15 8UB A31 Male Professor dpjohnso Applied Sciences PhD Staff as XML Optional processing instruction Root node element Element with children attribute value
22 Processes required for XML Store XML (native XML database) Query XML (XQuery) Transform XML into other XML, RDF or text (XQuery, XSLT) Transform structured text to XML (?) Define deeper semantics for an XML file (XSchema, RelaxNG) User Interface (XForms)
23 Applications of XML XHTML – HTML as well-formed XML which conforms to standard SVG – Structured Vector Graphics kml – GoogleEarth overlay files XForms – UI definition Flickr API responses RSS news syndication ….
24 Well-formed XML Well-formed – absolute property of an XML document – must obey rules for well-formedness Check by –Opening in Word 2003 –Submitting to an online validator –Check using an XML tool XML Spy Stylus Studio Oxygen Check out the rules in the Wikipedia entry
25 Well-formed XML Every XML document must be well-formed and must therefore adhere to the following rules (among others): 1.Every start-tag must have a matching end tag. 2.Elements may nest but must not overlap. Anna Coffey - Anna Coffey - × 3.There must be exactly one root element. 4.Attribute values must be quoted. 5.An element must not be quoted. 6.Comments and processing instructions may not appear inside tags. 7.No unescaped < or & signs may occur in the character data of an element.
26 Well-formed XML documents Element names are case sensitive -,, & are four different element types. No white spaces in element name - not allowed; OK. Element names cannot start with the letters XML or xml – reserved terms. Element names must start with a letter or a underscore. Element names cannot start with a number but numbers may be embedded within an element name - not allowed; is OK. Attribute names are constrained by the above rules for element names. Entity references are used to substitute specific characters. There are five predefined entities built into XML: EntityCharNotes &&Do not use inside processing instructions <
27 Spot the mistakes Davd Johnson 109 Mount View Road, Kingswood, Bristol BS15 8UB A Profesor dpjohnso Applied Sciences PhD
28 Valid Validity is relative - valid with respect to a specified schema which defines: –The sequence and nesting of elements –The allowable attributes of an element –Whether elements are optional or repeated –The type of data in an element or attribute Different languages for expressing these rules –DTD – Document Type Definition –XML Schema (xsd) – itself an XML vocabulary –Relax NG –Schematron – permissive Document can be linked to a specific schema –Too inflexible
29 Tutorial Review the exercises from this lecture Take the module specification and work out how many tables would be required to hold this data Take one of the TransXChange files and try to develop a data model.TransXChange –Use a browser to view the file, dont try to print it out. –Use find to follow references. Later –Read the wikipedia article on XMLXML –Work through the w3.schools tutorialw3.schools –XML dialects have developed in many fields of knowledge. Investigate the XML dialects which have been created in one particular area of interest e.g. Multimedia, Family History
30 Next Week Normalisation Exercise Processing Language Prize for the best Photo Album as judged by you and us.