Lecture 2.01 Data Modeling I: XML & XSLT Marc Dumontier Blueprint Initiative Samuel Lunenfeld Research Institute Mount Sinai Hospital Toronto, ON.

Lecture 2.02 Data Modeling I: XML & XSLT Data Modeling Concepts Extensible Markup Language (XML) Extensible Stylesheet Language Transformations (XSLT)

Lecture 2.03 Data Modeling Concepts Data modeling involves considering how to represent data objects within a system, both logically and physically. Tools are processes which act on data. Many tools in the bioinformatics field are chained in a pipeline to produce meaningful conclusions. Example: given a list of GIs, retrieve the Gene Ontology terms and generate an XML document, which is then transformed with XSLT into different presentations such as PDF and HTML.
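The middle step of the pipeline example (GIs in, XML document out) could look roughly like the sketch below. The lookup table, the GI numbers, and the `GO-term` element name are all invented for illustration; a real tool would query an annotation source instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for a real GI -> Gene Ontology query.
# The GI numbers and their term assignments are invented for illustration.
GO_TERMS_BY_GI = {
    "1001": ["GO:0005515", "GO:0005634"],
    "1002": ["GO:0003677"],
}

def build_annotation_document(gis):
    """Return an element tree listing the GO terms found for each GI."""
    root = ET.Element("Protein-list")
    for gi in gis:
        protein = ET.SubElement(root, "Protein")
        ET.SubElement(protein, "GI").text = gi
        for term in GO_TERMS_BY_GI.get(gi, []):
            ET.SubElement(protein, "GO-term").text = term
    return root

document = build_annotation_document(["1001", "1002"])
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The resulting string is the kind of intermediate document that a later XSLT step would turn into HTML or, via XSL-FO, PDF.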

Lecture 2.04 File Formats The format of your input data will be the largest factor in the design and architecture of your tool. Complex data must be distributed in a format that is based on a specification or model which sets the constraints and organizational structure. A good file format is self-documenting. A well-specified format supports both the parsing of structured data and the validation of data.

Lecture 2.05 File Formats (ASN.1)

Lecture 2.06 File Formats (GO Flat File)

Lecture 2.07 File Formats (XML)

Lecture 2.09 Extensible Markup Language XML is a framework for defining markup languages. Based on Standard Generalized Markup Language (SGML). XML makes data portable. Human readable. Unicode character encoding (every parser must support UTF-8 and UTF-16). XML Schema/Document Type Definition (DTD). XML Namespaces. Programmable interfaces: SAX/DOM. XPath.

Lecture 2.010 Extensible Markup Language

Lecture 2.011 Well Formed A document contains exactly one root element. A document contains one or more elements. Every start tag must have a corresponding end tag. Tags must be properly nested. Attribute values must be enclosed in quotes. Tag names must be valid XML names.
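One way to check these rules in practice is to hand the text to a parser and see whether it is rejected. A minimal sketch using Python's standard library; the sample documents are made up for this example:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One made-up document per rule on the slide.
examples = {
    "ok": '<Protein-list><Protein id="1">text</Protein></Protein-list>',
    "two root elements": "<Protein/><Protein/>",
    "missing end tag": "<Protein-list><Protein></Protein-list>",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<Protein id=1/>",
    "invalid name": "<1Protein/>",
}
results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(label, ok)
```

Only the first document is accepted; each of the others breaks exactly one of the rules listed above.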

Lecture 2.012 Well Formed

Lecture 2.017 Definitions Tag: The words between < and > are XML tags. Element: The information from the start of a start tag to the end of an end tag, e.g. an element whose content is the text "Bioinformatics is fun". Attributes: name/value pairs which appear inside start tags. PCDATA: Parsed Character Data, e.g. the text "This is some PCDATA" inside an element.

Lecture 2.018 Elements Naming rules –Start with a letter or underscore –After the first character, digits are allowed as well as “.” and “-” –Names can’t contain spaces –Names shouldn’t contain “:” (reserved for XML Namespaces) –Names can’t start with the letters XML (any case)

Lecture 2.019 White Space White space includes the space character, tabs, and new lines (carriage return and line feed). The parser does not strip white space (unlike HTML), so extraneous white space is passed on to the application.
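A small sketch of what "no white space stripping" means for an application, using Python's `xml.etree.ElementTree`. The GI value is a placeholder:

```python
import xml.etree.ElementTree as ET

text = "<Protein>\n  <GI> 12345 </GI>\n</Protein>"
protein = ET.fromstring(text)
gi = protein.find("GI")

# The line break and indentation before <GI> survive as character data.
print(repr(protein.text))
# So do the spaces around the value and the line break after </GI>.
print(repr(gi.text))
print(repr(gi.tail))
# Removing extraneous white space is left to the application.
clean_gi = gi.text.strip()
print(clean_gi)
```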

Lecture 2.020 Comments Comments are nodes too!

Lecture 2.021 Processing Instructions Processing Instructions are used to provide information to an application. These are not a necessary part of an XML document, but the XML processor is required to pass them to an application. (can contain attributes)

Lecture 2.022 XML Namespaces An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element and attribute names. Namespaces avoid element name collisions and allow vocabularies to be combined in a single XML document. * URI: Uniform Resource Identifier

Lecture 2.023 XML Namespaces Define a namespace prefix and point it to a URI
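As an illustration, the sketch below parses a document that mixes two vocabularies, each of which defines its own `title` element. Both namespace URIs and the element content are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Both URIs are placeholders; a namespace URI only has to be unique,
# it does not have to point at a real resource.
text = (
    '<report xmlns:prot="http://example.org/protein"'
    ' xmlns:lit="http://example.org/literature">'
    "<prot:title>Hypothetical kinase</prot:title>"
    "<lit:title>A hypothetical article</lit:title>"
    "</report>"
)
root = ET.fromstring(text)

# ElementTree expands each prefix to its URI, so the two <title>
# elements end up with different names and cannot collide.
tags = [child.tag for child in root]
print(tags)

# Lookups bind a prefix of our own choosing to the same URI.
ns = {"p": "http://example.org/protein"}
protein_title = root.findtext("p:title", namespaces=ns)
print(protein_title)
```

The prefix is only a local shorthand: `p` in the lookup and `prot` in the document refer to the same namespace because they map to the same URI.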

Lecture 2.024 Entity References Can be thought of as a variable. Used to represent special characters in text nodes. –&lt; less-than sign (<) –&gt; greater-than sign (>) –&amp; ampersand (&) –&quot; quote (double) –&apos; quote (single) Can even define your own!
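A short sketch of the predefined entity references at work, using Python's standard library. The sample strings are made up:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "GI < 100 & taxid > 0"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# The parser turns the references back into the original characters.
note = ET.fromstring("<note>" + escaped + "</note>")
print(note.text)

# The quote entities are resolved by the parser in the same way.
quoted = ET.fromstring("<q>&quot;x&apos;</q>").text
print(quoted)

# unescape() reverses escape() without parsing a document.
tag_text = unescape("&lt;GI&gt;")
print(tag_text)
```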

Lecture 2.025 Character References Character references allow authors to enter any Unicode character by its code position. This is mostly useful for specifying international characters, as well as non-visible characters. &#160; means code position 160 (non-breaking space). The first 256 Unicode positions are the same as ISO-8859-1 (ISO Latin 1). You can use hexadecimal by prefixing the position with an ‘x’, e.g. &#xA0;.
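A brief sketch showing decimal and hexadecimal character references being resolved by a parser. The sample text is invented for this example:

```python
import xml.etree.ElementTree as ET

# &#160; is decimal, &#xB0; and &#x3B1; are hexadecimal references.
element = ET.fromstring("<s>10&#160;kDa, 37&#xB0;C, &#x3B1;-helix</s>")
text = element.text
print(text)

# The parser delivers the actual characters at those code positions.
print(ord(text[2]), hex(ord(text[10])))

# Positions below 256 coincide with ISO-8859-1 (Latin-1) byte values.
latin1_char = bytes([160]).decode("latin-1")
print(latin1_char == chr(160))
```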

Lecture 2.026 XML Authoring Software XMLSpy (http://www.altova.com)

Lecture 2.027 Software

Lecture 2.028 Extensible Markup Language Questions?

Lecture 2.030 Extensible Stylesheet Language http://www.w3.org/Style/XSL/ XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts: –XSL Transformations (XSLT) –XML Path Language (XPath) –XSL Formatting Objects (XSL-FO)

Lecture 2.031 XSLT [Diagram: XSL branches into XSLT, XPath, and XSL-FO]

Lecture 2.032 XSLT A language for transforming XML. Allows for the extraction, simplification, and reorganization of XML documents without writing programs which use SAX or DOM. Transformations are defined by stylesheets. Transforms one XML document into another XML document, or into other types of documents such as RTF, PDF (using XSL-FO), flat file, etc.

Lecture 2.033 XSLT Processors Transformations are generally run in 3 ways –Standalone XSLT processor –A client program such as IE/Netscape –Server-side such as a Java servlet, JSP page, or a Perl CGI. Xalan Java is an implementation of the W3C recommendations for XSLT and XPath. http://xml.apache.org java -jar xalan.jar -IN test.xml -XSL test.xsl

Lecture 2.034 Trees and Nodes In XSLT, an XML document is treated as a tree data structure which consists of nodes. There are different types of nodes such as elements, attributes, comments, processing instructions, namespaces, and text. Root node vs. root element.
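The node types and the root node/root element distinction can be inspected with a DOM parser. This is a sketch using Python's `xml.dom.minidom`; the DOM's document node plays the role of the XPath root node, although the two data models are not identical. The stylesheet file name `proteins.xsl` and the sample content are placeholders.

```python
from xml.dom import Node, minidom

text = (
    '<?xml-stylesheet type="text/xsl" href="proteins.xsl"?>'
    "<!-- a comment outside the root element -->"
    "<Protein-list><Protein>"
    '<Taxon taxid="9606">Homo sapiens</Taxon>'
    "</Protein></Protein-list>"
)
doc = minidom.parseString(text)

# The document node sits above everything, including the comment and
# the processing instruction; the root element is just one of its children.
top_level = [child.nodeType for child in doc.childNodes]
print(doc.nodeType == Node.DOCUMENT_NODE)
print(top_level)
print(doc.documentElement.tagName)

# Attributes and text are nodes as well.
taxon = doc.getElementsByTagName("Taxon")[0]
attribute_node = taxon.getAttributeNode("taxid")
text_node = taxon.firstChild
print(attribute_node.nodeType == Node.ATTRIBUTE_NODE, attribute_node.value)
print(text_node.nodeType == Node.TEXT_NODE, text_node.data)
```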

Lecture 2.035 Trees and Nodes

Lecture 2.036 Creating a stylesheet A stylesheet is an XML document whose root element is “stylesheet” from the XSL namespace. A stylesheet contains zero or more template elements. Each template element is a rule which specifies a transformation. The “apply-templates” element processes all of the children of the current node, including text nodes.

Lecture 2.037 Creating a stylesheet

Lecture 2.041 Templates Accessing node values is done through the use of the <xsl:value-of> element. To insert spaces and output literals, you can use <xsl:text>. <xsl:copy> is for shallow copy; <xsl:copy-of> is for deep copy.

Lecture 2.042 Match Patterns Match patterns are a subset of the XPath language. Match patterns have 3 parts: –Pattern Axis –Node test –Predicate Can be used in <xsl:template>, <xsl:key>, and <xsl:number>. “/” matches the root node. “*” matches all element nodes.

Lecture 2.043 Match Patterns “Protein” matches all <Protein> elements. “Protein/GI” matches all <GI> elements that are children of a <Protein> element. “//Protein” matches all <Protein> elements at any depth. “.” matches the current node.

Lecture 2.044 Match Patterns – Pattern Axes A subset of XPath. There are 2 available in match patterns (XPath supports 13) –child –attribute Syntax of a step in a match pattern is: axis::node test [predicate] Default axis is the child axis Shortcut for attribute axis is “@”

Lecture 2.045 Match Patterns – Pattern Axes

Lecture 2.046 Match Patterns – Node Test Use the name of the node or the wildcard “*” to select element nodes; a node test can also select by node type. comment() – comment node node() – any type of node processing-instruction() – processing instruction text() – text node

Lecture 2.047 Match Patterns – Node Test

Lecture 2.048 Match Patterns – Node Test Found an element:

Lecture 2.049 Match Patterns – Node Test Found an element: Protein-list Found an element: Protein Found an element: Accession Found an element: GI Found an element: Taxon Found an element: Protein Found an element: Accession Found an element: GI Found an element: Taxon
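Python's standard library has no XSLT processor, so the sketch below is only a Python analogue of a template that matches “*” and then applies templates to the children. It reproduces the output listed on the slide. The element names come from that output; the values inside the elements are placeholders.

```python
import xml.etree.ElementTree as ET

# Element names follow the slide output; the values are placeholders.
text = """<Protein-list>
  <Protein><Accession>NP_000001</Accession><GI>1001</GI><Taxon taxid="10090"/></Protein>
  <Protein><Accession>NP_002077</Accession><GI>1002</GI><Taxon taxid="9606"/></Protein>
</Protein-list>"""
root = ET.fromstring(text)

# iter() walks the root element and all descendants in document order,
# which is the order in which a "*" template would fire.
lines = ["Found an element: " + element.tag for element in root.iter()]
print("\n".join(lines))
```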

Lecture 2.050 Match Patterns – Predicates Predicates contain XPath expressions, enclosed in the [ ] operator. A predicate tests whether a condition is true. –Test the value of a text node or attribute –Test the existence of a child element –Test the position of a node in the node tree. Conditions can be combined with the boolean operators ‘and’ and ‘or’; alternative patterns are joined with the union operator “|”.

Lecture 2.051 Match Patterns – Predicates The Last Protein is

Lecture 2.052 Match Patterns – Predicates The Last Protein is NP_002077
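The positional predicate behind this output can be tried with the limited XPath subset in Python's `xml.etree.ElementTree`. This is a sketch, not the stylesheet from the slide: the first accession and both GI values are placeholders, and only NP_002077 is taken from the output above.

```python
import xml.etree.ElementTree as ET

text = """<Protein-list>
  <Protein><Accession>NP_000001</Accession><GI>1001</GI></Protein>
  <Protein><Accession>NP_002077</Accession><GI>1002</GI></Protein>
</Protein-list>"""
root = ET.fromstring(text)

# [last()] keeps only the final Protein child of the context element.
last_accession = root.findtext("Protein[last()]/Accession")
message = "The Last Protein is " + last_accession
print(message)

# A bare number is shorthand for a position test; counting starts at 1.
first_gi = root.findtext("Protein[1]/GI")
print(first_gi)
```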

Lecture 2.053 Conditional Processing Create a test using <xsl:if> and make processing conditional on that test. There is no “else” in XSLT. To make an if/else statement, use <xsl:choose>, <xsl:when>, and <xsl:otherwise>.

Lecture 2.054 Conditional Processing

Lecture 2.056 Conditional Processing To loop through a node set, use <xsl:for-each>. The expression in the select attribute must evaluate to a node-set.

Lecture 2.058 XPath XML Path Language is a language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. The language mainly consists of location paths and expressions Specification available at http://www.w3.org/TR/xpath

Lecture 2.059 XPath - Datatypes There are four data types. –Node Sets –Booleans –Numbers –Strings

Lecture 2.060 XPath – Node Sets Can be zero, one, or more nodes. XPath expressions that return node sets are called location paths. Functions which operate on Node Sets are –count(node-set) –last() –local-name(node-set) –name(node-set) –position()

Lecture 2.061 XPath – Node Sets

Lecture 2.062 XPath - Numbers Numbers are stored in double-precision floating-point format. The following operators can be used with numbers: +, -, *, div, mod Functions which operate on Numbers are: –ceiling() –floor() –round() –sum()

Lecture 2.063 XPath - Numbers

Lecture 2.064 XPath - Strings String support is not extensive in XSLT. Functions which operate on Strings are: –concat(string string1, string string2, …) –contains(string string1, string string2) –starts-with(string string1,string string2) –string-length(string string1) –substring(string string1, number offset, number length) …

Lecture 2.065 XPath - Strings

Lecture 2.066 XPath - Booleans Evaluate to either true or false. Numbers are false if equal to zero (or NaN). Strings are false if empty. Node Sets are false if empty. Comparison operators are =, !=, <, <=, >, >=. Boolean operators are ‘and’ and ‘or’; negation is the function not().

Lecture 2.067 XPath - Booleans

Lecture 2.068 Creating XPath Location Paths A Location Path is a sequence of Location Steps. Each location step is separated by either ‘/’ or ‘//’. To start from the root node, start the Location Path with ‘/’ (absolute path). Otherwise, the Location Path starts from the context node (relative path). A Location Step is made up of an axis, node test, and zero or more predicates.

Lecture 2.069 Creating XPath Location Paths Examples –Protein-list –Protein-list/*/Accession –/Protein-list/Protein/Taxon[@taxid = 9606] –/Protein-list/Protein[position() = 1]/GI –//GI –.. –Protein[Accession] –Protein-list//Taxon[@taxid != 0]
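Several of these paths can be tried with `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch the paths are rewritten relative to the root element, because ElementTree has no absolute paths on elements, and the attribute value is quoted, because ElementTree compares attributes as strings. All element values are placeholders.

```python
import xml.etree.ElementTree as ET

text = """<Protein-list>
  <Protein>
    <Accession>NP_000001</Accession><GI>1001</GI>
    <Taxon taxid="10090">Mus musculus</Taxon>
  </Protein>
  <Protein>
    <Accession>NP_002077</Accession><GI>1002</GI>
    <Taxon taxid="9606">Homo sapiens</Taxon>
  </Protein>
</Protein-list>"""
root = ET.fromstring(text)  # context node: the Protein-list element

def texts(path):
    """Return the text of every element selected by path, in document order."""
    return [element.text for element in root.findall(path)]

accessions = texts("*/Accession")                   # Protein-list/*/Accession
human = texts("Protein/Taxon[@taxid='9606']")       # attribute value test
first_gi = texts("Protein[1]/GI")                   # position() = 1
all_gis = texts(".//GI")                            # any depth below the context
with_accession = root.findall("Protein[Accession]") # child existence test

print(accessions, human, first_gi, all_gis, len(with_accession))
```

A full XPath 1.0 engine, such as the one inside an XSLT processor, accepts the slide's paths as written, including the numeric comparison `@taxid = 9606`.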

Lecture 2.070 XPath Location Steps: Axes ancestor – the parent of the context node, its parent, and so on up to the root. ancestor-or-self. attribute – attributes of the context node. child – children of the context node. descendant – all children, grandchildren, etc. of the context node. descendant-or-self. following – every node in the document which comes after the context node (document order), excluding its descendants.

Lecture 2.071 XPath Location Steps: Axes following-sibling – all siblings of the context node that follow it (siblings share the same parent). namespace – namespace nodes of the context node. parent – parent of the context node. preceding – all nodes that come before the context node (document order), excluding its ancestors. preceding-sibling – all siblings of the context node that come before it (document order). self – context node.

Lecture 2.072 XPath Location Steps: Axes Examples –child::GI –descendant::Accession –/child::Protein-list/child::Protein/child::Taxon/attribute::taxid –ancestor::node()

Lecture 2.073 XPath Location Steps: Node Tests Use the names of nodes or wildcard ‘*’. Can select based on type of node –comment() –node() –processing-instruction() –text()

Lecture 2.074 XPath Location Steps: Node Tests

Lecture 2.075 XPath Location Steps: Predicates A condition which evaluates to true or false. Shortcut: [3] = [position() = 3] Examples –Test for the existence of nodes. –Test for the value of strings in attributes or text nodes. –Test for the position of that node.

Lecture 2.076 Named Templates A template can be given a name, and then explicitly called using <xsl:call-template>. Parameters can be passed to templates this way, much like functions are called in programming languages. Pass parameters using <xsl:with-param>. Declare parameters in templates using <xsl:param>.

Lecture 2.077 Named Templates

Lecture 2.078 Variables Variables are used to store values. Any XPath data type can be stored. The value cannot be changed once set. To access a variable from XPath, prefix the variable name with “$”. To create a variable, use <xsl:variable>. Can be used as a top-level element (global scope) or inside a template body (local scope).

Lecture 2.079 Variables

Lecture 2.080 XSLT Questions?
