1 CSE 5095 Semantic Web Technologies and their usages in BMI Jing Liu CSE5095: Biomedical Informatics Spring 2011 firstname.lastname@example.org Computer Science & Engineering Department University of Connecticut Storrs, CT 06269
2 CSE 5095 Outline Semantic Web Overview. Semantic Web Technologies: RDF, RDFS, OWL, SPARQL, SWRL (RIF). Biomedical Informatics with Semantic Web: Translational Research with Semantic Web, Semantic PHRs, Knowledge-Driven Querying of Biomedical Data.
3 CSE 5095 Semantic Web Overview Example: why the Semantic Web? Search articles written by “Tim Berners-Lee” Returns millions of results, most of which will cite or refer to him
4 CSE 5095 Semantic Web Overview What is Semantic Web? Essentially, the Semantic Web is a web of data. It is about two things: It's about common formats for integration and combination of data drawn from diverse sources. It is also about language for recording how the data relates to real world objects. A set of technologies that supports identifying, representing, and reasoning across a wide range of data.
5 CSE 5095 Semantic Web Overview Impact on areas: Information management and discovery tools Digital Libraries Support for interaction between virtual communities and collaborations E-learning methods and tools
6 CSE 5095 Semantic Web Stack Identifiers: URI; Character Set: UNICODE; Syntax: XML; Data Exchange: RDF; Taxonomies: RDFS; Ontologies: OWL; Rules: SWRL/RIF; Querying: SPARQL; Unifying Logic; Proof; Trust; Cryptography; User Interface and Applications
7 CSE 5095 Semantic Web technologies - RDF RDF stands for Resource Description Framework. RDF is a language for representing: information about resources in the World Wide Web. metadata about Web resources. Resources are things that can be identified on the Web, even when they cannot be directly retrieved on the Web. RDF can be processed and exchanged by applications. The W3C Recommendations for RDF are at: http://www.w3.org/RDF/
8 CSE 5095 Identification and description in RDF RDF identifies resources using URIs It may be a URL, but not always Anything that can be named via a URI is a resource Resources are described in terms of simple properties and property values. A property is a resource that has a name. - e.g. Author, Title, Mailbox A property value is the value of the property. - e.g. mailto:email@example.com is the value of mailbox property - A property value can be another resource
9 CSE 5095 RDF statements RDF is intended to provide a simple way to make statements about Web resources Each statement, which is also called “triple”, consists of three parts: Subject: the thing the statement describes. Predicate: a specific property of the thing the statement describes. Object: the thing the statement says is the value of this property. Statements can be represented in RDF Graph: illustrates RDF’s conceptual model RDF/XML: an XML syntax for writing down and exchanging RDF graphs Notation 3
10 CSE 5095 RDF Example (1) “there is a Person identified by http://www.w3.org/People/EM/contact#me, whose name is Eric Miller, whose email address is firstname.lastname@example.org, and whose title is Dr."
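The sentence above decomposes into four triples that share one subject. A minimal sketch in plain Python (no RDF library), assuming the property URIs come from the W3C PIM contact vocabulary, which the slide does not name, and using the mailbox address exactly as it appears in this transcript:

```python
# Each RDF statement is a (subject, predicate, object) triple.
CONTACT = "http://www.w3.org/2000/10/swap/pim/contact#"  # assumed vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ME = "http://www.w3.org/People/EM/contact#me"

triples = [
    (ME, RDF_TYPE, CONTACT + "Person"),
    (ME, CONTACT + "fullName", "Eric Miller"),
    (ME, CONTACT + "mailbox", "mailto:firstname.lastname@example.org"),
    (ME, CONTACT + "personalTitle", "Dr."),
]

def describe(subject, graph):
    """Return {predicate: object} for every triple about `subject`."""
    return {p: o for s, p, o in graph if s == subject}

properties = describe(ME, triples)
print(properties[CONTACT + "fullName"])
```

Note that the mailbox value is itself a URI, so the object of a triple can be another resource rather than a literal.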
11 CSE 5095 RDF Example (2) RDF/XML describing Eric Miller: [RDF/XML listing not preserved in the transcript; its surviving literal values are "Eric Miller" and "Dr."]
12 CSE 5095 Semantic Web technologies - RDFS RDFS stands for RDF Schema. An extension of RDF. RDF Schema provides a higher level of abstraction than RDF. Describes classes Describes properties Describes relationships between classes and properties It allows resources to be defined as instances of one or more classes. Classes can be organized in a hierarchical fashion. RDFS provides important semantic capabilities that are used by enhanced semantic languages like DAML, OIL and OWL.
13 CSE 5095 RDFS: Describing Classes To say that ex:MotorVehicle is a class, write: ex:MotorVehicle rdf:type rdfs:Class. To create an instance of ex:MotorVehicle, write: exthings:companyCar rdf:type ex:MotorVehicle. Convention: class names start with an uppercase letter; property and instance names start with a lowercase letter. A resource may be an instance of more than one class. Figure: a Motor Vehicle class.
14 CSE 5095 RDFS: Defining Subclasses We might want to represent various specialized kinds of motor vehicle: ex:Van rdf:type rdfs:Class. ex:Truck rdf:type rdfs:Class. These statements only describe the individual classes. If we want to indicate their special relationship to class ex:MotorVehicle: ex:Van rdfs:subClassOf ex:MotorVehicle. ex:Truck rdfs:subClassOf ex:MotorVehicle.
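What rdfs:subClassOf buys is inference: an instance of a subclass is also an instance of every superclass. A minimal sketch of that entailment over plain tuples, assuming prefixed names as opaque strings; ex:MiniVan and exthings:deliveryVan are hypothetical additions for illustration:

```python
def infer_types(triples):
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                new_triple = (instance, "rdf:type", sup)
                if cls == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred

graph = {
    ("ex:MotorVehicle", "rdf:type", "rdfs:Class"),
    ("ex:Van", "rdf:type", "rdfs:Class"),
    ("ex:Van", "rdfs:subClassOf", "ex:MotorVehicle"),
    ("ex:MiniVan", "rdfs:subClassOf", "ex:Van"),          # hypothetical extra level
    ("exthings:deliveryVan", "rdf:type", "ex:MiniVan"),   # hypothetical instance
}
closure = infer_types(graph)
print(("exthings:deliveryVan", "rdf:type", "ex:MotorVehicle") in closure)
```

The loop repeats until a full pass adds nothing, so chains of any depth are followed.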
18 CSE 5095 An Instance of ex:MotorVehicle Two methods to create an instance of ex:MotorVehicle: [RDF/XML listings not preserved in the transcript; the surviving fragment declares xmlns:ex="http://example.org/schemas/vehicles" …]
19 CSE 5095 RDFS: Describing Properties All properties in RDF are described as instances of class rdf:Property. e.g. exterms:weightInKg rdf:type rdf:Property. rdfs:range is used to indicate that the values of a particular property are instances of a designated class. e.g. ex:Person rdf:type rdfs:Class. ex:author rdf:type rdf:Property. ex:author rdfs:range ex:Person. rdfs:domain is used to indicate that a particular property applies to a designated class. e.g. ex:Book rdf:type rdfs:Class. ex:author rdf:type rdf:Property. ex:author rdfs:domain ex:Book. rdfs:subPropertyOf is used to define a property hierarchy.
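In RDFS, domain and range are not validation constraints; they license inferences about the subject and object of any statement that uses the property. A minimal sketch, assuming at most one domain and one range per property (a simplification) and a hypothetical data triple:

```python
def infer_domain_range(triples):
    """Add rdf:type triples implied by rdfs:domain and rdfs:range."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    inferred = set(triples)
    for s, p, o in triples:
        if p in domains:   # the subject belongs to the property's domain class
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:    # the object belongs to the property's range class
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred

schema_and_data = [
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
    ("exthings:primer", "ex:author", "exthings:alice"),  # hypothetical data
]
result = infer_domain_range(schema_and_data)
print(sorted(t for t in result if t[1] == "rdf:type"))
```

From the single ex:author statement, the sketch concludes that exthings:primer is an ex:Book and exthings:alice is an ex:Person.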
20 CSE 5095 Limitations of RDFS No standard for expressing primitive data types such as integer, etc. All data types in RDF/RDFS are treated as strings. No standard for expressing relations of properties (unique, transitive, inverse etc.) No standard for expressing whether enumerations are closed. No standard to express equivalence, disjointness etc. among properties.
21 CSE 5095 Ontologies RDFS is useful, but does not meet all possible requirements. Complex applications may have more requirements: characterization of properties; identification of objects with different URIs; disjointness or equivalence of classes; constructing classes, not only naming them; letting a program reason about some terms; and more…
22 CSE 5095 Semantic Web technologies - OWL OWL stands for Web Ontology Language. The current version of OWL, also referred to as “OWL 2”, was published in 2009. OWL is used to represent rich and complex knowledge about things, groups of things, and relations between things. The three sublanguages of the original OWL (OWL 1): OWL Lite, OWL DL, OWL Full. In addition to RDF/RDFS tags, it also allows us to express equivalence, identity, difference, inverse, and transitivity.
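Two of the property characteristics listed above, inverse and transitivity, can be illustrated with a small fixed-point computation. This is a sketch of the entailments only, not an OWL reasoner; ex:partOf, ex:hasPart, and the anatomy facts are hypothetical:

```python
def apply_property_axioms(facts, transitive=(), inverse_of=None):
    """Close `facts` under transitive-property and inverse-property axioms."""
    inverse = dict(inverse_of or {})
    # inverseOf is symmetric: if p is the inverse of q, q is the inverse of p.
    inverse.update({q: p for p, q in (inverse_of or {}).items()})
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:      # nothing new was derived
            return closed
        closed |= new

facts = {
    ("ex:finger", "ex:partOf", "ex:hand"),
    ("ex:hand", "ex:partOf", "ex:arm"),
}
closed_facts = apply_property_axioms(
    facts,
    transitive={"ex:partOf"},
    inverse_of={"ex:partOf": "ex:hasPart"},
)
print(("ex:arm", "ex:hasPart", "ex:finger") in closed_facts)
```

RDFS alone cannot express either axiom, which is the gap OWL fills.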
23 CSE 5095 3 Dialects in OWL OWL Full: an extension of RDF; allows for classes as instances and modification of the RDF and OWL vocabularies. OWL DL: the part of OWL Full that fits in the Description Logic framework; known to have decidable reasoning. OWL Lite: a subset of OWL DL; easier for frame-based tools to transition to; easier reasoning. Diagram: Lite ⊂ DL ⊂ Full.
24 CSE 5095 Two Syntaxes for OWL RDF/XML documents OWL is part of the Semantic Web OWL can be an extension of RDF RDF applications can parse OWL Abstract syntax easier to read and write manually corresponds more closely to Description Logics and Frames
25 CSE 5095 How is OWL Used Build an ontology Create the ontology Name classes and provide information about them Name properties and provide information about them State facts about a domain Provide information about individuals Reason about ontologies and facts Determine consequences of what was built and stated
26 CSE 5095 Creating Ontologies Information in OWL is generally in an ontology Ontology- “a branch of metaphysics concerned with the nature and relations of being” An ontology determines what is of interest in a domain and how information about it is structured An OWL ontology is just a collection of information, generally mostly information about classes and properties Ontology([name]...) Ontologies can include (import) information from other ontologies
27 CSE 5095 OWL Components What is an Instance? An instance is an object. It corresponds to a description logic individual. What is a Class? e.g., person, pet, car a collection of individuals (object, things,... ) a way of describing part of the world an object in the world (OWL Full) What is a Property? e.g., has father, has pet a collection of relationships between individuals (and data) a way of describing relationships between individuals
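28 CSE 5095 Example Ontology Class(pp:old+lady complete intersectionOf(pp:elderly pp:female pp:person)) Class(pp:old+lady partial intersectionOf( restriction(pp:has_pet allValuesFrom(pp:cat)) restriction(pp:has_pet someValuesFrom(pp:animal)))) The first axiom defines an old lady as exactly an elderly female person; the second requires that all her pets are cats and that she has at least one pet animal. Together they represent: “Every old lady must have a pet cat.”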
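The two axioms can be read as a membership test plus two conditions on pets. The sketch below checks them against a small, hypothetical set of individuals under a closed-world simplification: it treats the listed facts as complete. An OWL reasoner works under the open-world assumption and would instead infer missing facts (for example, that an old lady's pet is a cat) or report an inconsistency only if classes are declared disjoint.

```python
OLD_LADY_PARTS = {"pp:elderly", "pp:female", "pp:person"}

def is_old_lady(individual, types):
    """'complete' axiom: old lady == elderly AND female AND person."""
    return OLD_LADY_PARTS <= types.get(individual, set())

def satisfies_pet_restrictions(individual, types, has_pet):
    """'partial' axiom: every pet is a cat, and at least one pet is an animal."""
    pets = has_pet.get(individual, [])
    only_cats = all("pp:cat" in types.get(pet, set()) for pet in pets)
    some_animal = any("pp:animal" in types.get(pet, set()) for pet in pets)
    return only_cats and some_animal

# Hypothetical individuals, for illustration only.
types = {
    "pp:minnie": {"pp:elderly", "pp:female", "pp:person"},
    "pp:mabel": {"pp:elderly", "pp:female", "pp:person"},
    "pp:walt": {"pp:elderly", "pp:male", "pp:person"},
    "pp:tom": {"pp:cat", "pp:animal"},
    "pp:rex": {"pp:dog", "pp:animal"},
}
has_pet = {"pp:minnie": ["pp:tom"], "pp:mabel": ["pp:rex"]}

report = {
    name: satisfies_pet_restrictions(name, types, has_pet)
    for name in types
    if is_old_lady(name, types)
}
print(report)
```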
29 CSE 5095 Semantic Web technologies - SPARQL SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language. A protocol: A way of communication between parties that run SPARQL queries. Defining a way of invoking the service. Bindings of a transport protocol for that goal. A standard RDF Query Language (QL): A standard query language for expressing queries against the RDF data model. Data access language. Graph patterns. More powerful than XML queries in some respects.
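The core of a SPARQL query is a basic graph pattern: a list of triple patterns whose variables must bind consistently across all of them. The sketch below evaluates such a pattern over tuples in plain Python. It illustrates the matching idea only and is not a SPARQL engine; the data and the ex: names are hypothetical. The pattern corresponds roughly to `SELECT ?doc WHERE { ?doc ex:author ?who . ?who ex:name "Tim Berners-Lee" }`.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                 # variable
            if result.setdefault(term, value) != value:
                return None                      # conflicts with earlier binding
        elif term != value:                      # constant must match exactly
            return None
    return result

def query(patterns, graph):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

graph = [
    ("ex:paper1", "ex:author", "ex:timbl"),
    ("ex:paper2", "ex:author", "ex:alice"),
    ("ex:paper2", "ex:cites", "ex:paper1"),
    ("ex:timbl", "ex:name", "Tim Berners-Lee"),
    ("ex:alice", "ex:name", "Alice"),
]
patterns = [
    ("?doc", "ex:author", "?who"),
    ("?who", "ex:name", "Tim Berners-Lee"),
]
results = query(patterns, graph)
print(results)
```

Unlike a keyword search, the query returns only the paper authored by the named person, not the paper that merely cites it.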
30 CSE 5095 Semantic Web technologies - SWRL SWRL stands for Semantic Web Rule Language. SWRL is intended to be the rule language of the Semantic Web. SWRL includes a high-level abstract syntax for Horn-like rules. All rules are expressed in terms of OWL concepts (classes, properties, individuals). A proposal to combine ontologies and rules: Ontologies: OWL-DL Rules: RuleML Can work with reasoners.
31 CSE 5095 SWRL Human Readable Syntax In the SWRL syntax, a rule has the form: antecedent ⇒ consequent Both antecedent and consequent are conjunctions of atoms written a1 ∧ ... ∧ an. Variables are indicated using the standard convention of prefixing them with a question mark. Built-in relations that are functional can be written in functional notation. For example: parent(?x,?y) ∧ brother(?y,?z) ⇒ uncle(?x,?z) ?x = op:numeric-add(3,?z)
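The uncle rule can be read as a join on the shared variable ?y. A minimal sketch of one forward-chaining step in plain Python, assuming facts are stored as (predicate, arg1, arg2) tuples and reading parent(?x,?y) as "x has parent y"; the individuals are hypothetical:

```python
def apply_uncle_rule(facts):
    """parent(?x,?y) AND brother(?y,?z) => uncle(?x,?z)"""
    derived = set()
    for pred1, x, y in facts:
        if pred1 != "parent":
            continue
        for pred2, y2, z in facts:
            # Both atoms must agree on the value bound to ?y.
            if pred2 == "brother" and y2 == y:
                derived.add(("uncle", x, z))
    return derived

facts = {
    ("parent", "ann", "bob"),     # ann has parent bob
    ("brother", "bob", "carl"),   # bob has brother carl
    ("parent", "dan", "eve"),     # eve has no recorded brother
}
new_facts = apply_uncle_rule(facts)
print(new_facts)
```

A rule engine repeats this kind of step for all rules until no new facts appear.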
32 CSE 5095 SWRLTab A development environment for working with SWRL rules in Protégé-OWL. It supports the editing and execution of SWRL rules. Extension mechanisms to work with third-party rule engines. Mechanisms for users to define built-in method libraries. Supports querying of ontologies.
35 CSE 5095 Translational Research with Semantic Web Biomedical researchers and health care practitioners work together to exchange ideas, information, and knowledge across organizational, governance, socio-cultural, political, and national boundaries. A significant barrier to translational research is the lack of uniformly structured data across related biomedical domains. In applying research to cure and prevent diseases, an integrated understanding across subspecialties becomes essential.
36 CSE 5095 How can the Semantic Web help BMI? (1) The global scope of identifiers decreases the complexities caused by the proliferation of local identifiers. The Semantic Web technologies simplify the management and comprehension of relationships among the data. RDFS and OWL offer some relief to the burden of understanding data schemas. In a well-designed ontology, the structure itself can help guide users towards its correct use.
37 CSE 5095 How can the Semantic Web help BMI? (2) RDFS and OWL are flexible, extendable, and decentralized. They support hierarchical relationships. Data built upon ontologies will be easier to link together than those that use ad-hoc solutions. Ability to do inference, classification, and consistency checking which will help avoid inappropriate diagnosis and treatment.
38 CSE 5095 HCLSIG Health Care and Life Sciences Interest Group (HCLSIG) o HCLSIG was set up within the framework of the World Wide Web Consortium. http://www.w3.org/wiki/HCLSIG o The mission of the Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) is to develop, advocate for, and support the use of Semantic Web technologies for biological science, translational medicine and health care. o Document use cases to aid individuals in understanding the business and technical benefits of using Semantic Web technologies.
39 CSE 5095 Task Forces and their goals BioRDF: Converting a number of life sciences data sources into RDF and OWL. Ontologies: Facilitating creation, evaluation, and maintenance of core vocabularies and ontologies. Drug safety and efficacy: Identifying and addressing challenges. Detecting, examining, and classifying signals of potential drug side effects and adverse reactions. Data security and integrity. Facilitating electronic submissions. Adaptable clinical pathways and protocols (ACPP): Representing guidelines and protocols, and reasoning over them. Scientific publishing: Collecting publications, applying natural language processing to scientific text, developing tools.
40 CSE 5095 Semantic PHRs Current standard structures for personal health records (PHRs): the XML schema of ASTM's CCR (Continuity of Care Record); HL7’s CCD (Continuity of Care Document). Disadvantages: XML-based PHRs are document-centric, whereas health care data usage is often data-centric. Computation capabilities are not provided. Semantic PHRs were developed to address these disadvantages.
41 CSE 5095 Semantic PHRs Develop a personal health record ontology to describe the concepts of the domain in which PHRs take place. To transform the XML schema to an OWL ontology: Complex elements are transformed to OWL classes. Simple elements are transformed to OWL data properties. Element-attribute relationships are transformed to OWL data properties. The relationships between elements are transformed to class-to-class relationships.
43 CSE 5095 Semantic PHRs In data storage, instance ontologies are represented by RDF elements. XSLT (Extensible Stylesheet Language Transformations) is used to transform an XML document to RDF. Then we can query PHRs with query languages developed for RDF, e.g. SPARQL.
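The XML-to-RDF step can be sketched as a recursive walk over the document. The example below uses Python's xml.etree.ElementTree in place of XSLT, purely for illustration. The element names, the phr: prefix, the "has" + tag naming of relationships, and the use of an id attribute as the resource name are all assumptions, not part of CCR or CCD.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical PHR fragment; not a real CCR/CCD document.
PHR_XML = """
<Patient id="p1">
  <Name>Jane Doe</Name>
  <Medication id="m1" status="active">
    <Product>Aspirin</Product>
    <Dose>81 mg</Dose>
  </Medication>
</Patient>
"""

def element_to_triples(elem, triples, counter):
    """Emit triples for a complex element and return its subject name."""
    subject = "phr:" + elem.get("id", f"node{next(counter)}")
    triples.append((subject, "rdf:type", "phr:" + elem.tag))   # element -> class
    for name, value in elem.attrib.items():
        if name != "id":                                       # attribute -> data property
            triples.append((subject, "phr:" + name, value))
    for child in elem:
        if len(child) == 0 and not child.attrib:               # simple element -> data property
            triples.append((subject, "phr:" + child.tag, (child.text or "").strip()))
        else:                                                  # nested element -> relationship
            child_subject = element_to_triples(child, triples, counter)
            triples.append((subject, "phr:has" + child.tag, child_subject))
    return subject

triples = []
element_to_triples(ET.fromstring(PHR_XML), triples, count(1))
products = [o for s, p, o in triples if p == "phr:Product"]
print(products)
```

Once the record is a set of triples, a question such as "which products is this patient taking?" becomes a pattern match over the data rather than a document traversal.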
44 CSE 5095 Semantic PHRs The PHR instance ontology organizes PHR instances according to the ontology.
45 CSE 5095 Semantic PHRs Summary Transforming an XML document into RDF/XML elements.
46 CSE 5095 Knowledge-Driven Querying of Biomedical Data O'Connor et al. presented an end-to-end knowledge-based system based on Semantic Web Technologies. Martin J. O'Connor, Ravi D. Shankar, Samson W. Tu, Csongor Nyulas, Dave Parrish, Mark A. Musen, Amar K. Das: “Using Semantic Web Technologies for Knowledge-Driven Querying of Biomedical Data”, AIME 2007: 267-276
47 CSE 5095 Knowledge-Driven Querying of Biomedical Data Background: o Biomedical applications have significant knowledge and information management requirements. o Very few current systems emphasize the knowledge requirements for day-to-day activities. o Schema design of these systems often reflects the operational requirements. o Inconsistencies between knowledge-level concepts in system design and the corresponding operational data collected in a deployed system need to be overcome.
48 CSE 5095 Knowledge-Driven Querying of Biomedical Data Limitations in Technologies: o OWL provides limited deductive reasoning capabilities. o Using RDF to store data at back end in biomedical systems is still not practical. o Separation of knowledge and data, which creates a semantic gap.
49 CSE 5095 Knowledge-Driven Querying of Biomedical Data Solution: (1) Specify the mapping of rows in a relational table to triples in an RDF model, which will then be mapped to OWL classes, properties, and individuals. A tool written in Protégé-OWL accomplished this task. (2) Develop mapping software that works with a query engine to allow queries written in SWRL to use data retrieved from a relational database.
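Step (1) can be illustrated with a generic row-to-triple mapping: each row becomes a resource, the table becomes its class, and each non-key column becomes a property. This is a sketch of the general idea using Python's sqlite3 module, not the Protégé-OWL tool from the paper; the table, columns, and http://example.org/ base URI are hypothetical.

```python
import sqlite3

def rows_to_triples(conn, table, key_column, base="http://example.org/"):
    """Map each row to an rdf:type triple plus one triple per non-null column."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{record[key_column]}"       # row -> resource
        triples.append((subject, "rdf:type", f"{base}{table}"))  # table -> class
        for column, value in record.items():
            if column != key_column and value is not None:    # column -> property
                triples.append((subject, f"{base}{table}#{column}", value))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit (id INTEGER PRIMARY KEY, patient TEXT, systolic INTEGER)"
)
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [(1, "p1", 150), (2, "p2", None)],
)
triples = rows_to_triples(conn, "visit", "id")
for triple in triples:
    print(triple)
```

NULL values produce no triple, which matches RDF's treatment of missing data as simply absent.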
50 CSE 5095 Knowledge-Driven Querying of Biomedical Data Optimization techniques to improve the performance: o Adding built-in annotation ontology. o Re-writing SWRL queries. o Rule base level optimizations. o Standard database optimization techniques.
51 CSE 5095 Semantic Web Technologies Thank you!