Enterprise Solutions for the Semantic Web


1 Enterprise Solutions for the Semantic Web
Ralph Hodgson, TopQuadrant
Susie Stephens, Oracle
SICOP 4th Semantic Interoperability for e-Government Conference, February 9-10, 2006, Mitre, McLean, VA
Give an overview of the RDF Data Model, and then describe some use cases in the life sciences.

2 Agenda
Introduction
Semantic Web Technology Overview
Architecture of the Oracle RDF Data Model
Life Sciences Use Cases and Demos
Wrap Up

3 Us is

4 ? You
Figure: a curve plotting Knowledge/Experience against Commitment to Semantic Technology, with the stages Skepticism, Curiosity, Enthusiasm, Advocacy and Adoption.

5 Adoption of Semantic Technology
Figure: a curve of Knowledge/Experience against Commitment to ST, with the stages Skepticism, Curiosity (2002), Enthusiasm (2003), Advocacy (2005) and Adoption, and the current state marked.
Annotations on the curve:
Increase in attendance at trainings and more evidence of coverage at conferences
People are now asking “How” questions as opposed to “Why” and “What”.
Positive experiences of the power of RDF/OWL
Confidence in ability to implement and scale

6 Applications are getting smarter

7 What is Semantic Technology?
“Semantic technology (software) allows the meaning of and associations between information to be known and processed at execution time. For a semantic technology to be truly at work within a system, there must be a knowledge model of some part of the world (an active ontology) that is used by one or more applications at execution time.” TopQuadrant

8 Evolution of the WEB
Figure: a timeline (1995, 2000, 2005, +) of four stages of the Web.
Static: HTML; CGI, Perl, ...; hand crafted by people for people; advertisement, information, 1 large newspaper; set of mind = “browse”; killer app: Browser.
Dynamic: + RDBMS; JSP, ASP, Java, …; a newspaper becomes a catalog; set of mind = “retrieve/update”; generated applying specific templates, used by people; killer apps: Search, Content Mgmt, Web Application Servers.
Transactional: + XML; J2EE, .NET, …; a catalog becomes a transaction platform; set of mind = “interact”; generated by applications based on fixed schemas, used by applications and people; killer apps: Portals, Process Integration, Web Services.
Semantic: + RDF, OWL; platforms connect; set of mind = “interoperate”; generated by applications based on models, used by applications, devices and people; killer apps (?): Advisors, Personal Agents, IP Apps, Cognitive Engines.
Other labels: Encoding, Paradigm, Creation, Marketing, Sales, Service, Integration.

9 The Semantic Continuum
Figure: an interpretation continuum running from DATA (relatively unstructured, random; human interpreted) to KNOWLEDGE (very structured, logical; computer interpreted).
Metadata along the continuum: Simple Metadata: XML; Richer Metadata: RDF/S; Very Rich Metadata: OWL.
Capabilities along the continuum: Info retrieval, Web search, Text summarization, Content extraction, Topic maps, Reasoning services, Ontology Induction.
Stages:
Display raw documents; all interpretation done by humans
Find and correlate patterns in raw docs; display matches only
Store and connect patterns via conceptual model (i.e., an ontology); link to docs to aid retrieval
Automatically acquire concepts; evolve ontologies into domain theories; link to institution repositories (e.g., MII)
Automatically span domain theories and institution repositories; inter-operate with fully interpreting computer
Moving to the right depends on increasing automated semantic interpretation.
Adapted from: Leo Obrst, “Ontologies and the Semantic Web: An Overview”, Mitre, June 2005

10 The Semantic Stack

11 The Quadrants of Meaning
Figure: a grid with axes Informal to Formal and Human to Machine, and four regions labeled Textual Descriptions, Syntactical Consensus, Semantic Descriptions and Semantic Executable Models.
Items placed on the grid: Modal Logics, FOL, DL, OWL-DLP, OWL-DL, OWL-Lite, RDFS, CG, Topic Maps, Thesaurus, Taxonomy, Terminology Management, Rules, UML, MDA, XML, ER, Code, PDF, HTML.

12 The Semantic Stack - Demystified
+ Proof + Trust OWL Reasoning CD Rules RDFS A Is-a CD Classes RDF Relationships A B hasTrack XML Structures

13 Mapping Capability Cases
Figure: capability cases placed on a grid with axes Informal to Formal and Human to Machine.
Capability cases shown: Product Design Assistant, Semantic Web Services Composer, Expert Locator, Ontology Driven Information Retriever, Context-Aware Retriever, Semantic Data Integrator, Concept-Based Search, Semantic Multi-Faceted Search, Semantic Workplace, Semantic Data Registry, Semantic Portal, Semantic Web Server, Generative Documentation, News Aggregator, Recommender, Application Integrator.

14 Early Adopters: A Quick Look at 7 Capability Cases
Semantic Data Integrator: Consulting Services Company Data Quality. An international services company wanted to see side-by-side information from its American and European divisions. Different divisions had their own definitions of key business indicators such as utilization rates. The system uses technology from Unicorn Solutions.
Semantic Data Integrator: FAA Air Passenger Threat Analyzer. The system allows security personnel to assess passenger threats. Based on an ontology and the Semagix Freedom engine, the system interfaces with diverse information sources and extracts relevant information in near real-time, unifying the data against the model.
Expert Locator: Boeing’s Expert Locator. Boeing has a large workforce of experts, making it hard to find the right person. This web-based system returns details on potentially appropriate experts. The Boeing technical thesaurus was harnessed to create expert profiles.
Product Design Assistant: Semantic Testcar Configurator. A major European car manufacturer uses semantic technologies provided by Ontoprise to represent complex design knowledge in electronic form. Knowledge is integrated from different sources, across which the system draws logical conclusions.
Rights Mediator: RightsCom Policy Engine. Using OWL and semantic technology from Network Inference, RightsCom has built an integrated solution for rights management in the media and entertainment industry.
Semantic Content Registry: European Environment Agency ReportNet. The Semantic Content Registry gets its information from multiple Data Repositories by harvesting them for metadata (pull) or through notifications after upload events (push). The registry uses RDF to keep track of the deliveries of data sets.
Concept-based Search: Siemens Self-Service for Industrial Equipment. Simatic is a self-service Web application for Siemens Industrial Control Products. The system uses a model-based CBR engine called Orenge from Empolis.
At least three distinct ways for semantic technologies to provide value

15 Ontologies are like and unlike other IT models
Like databases, ontologies are used by applications at run time (queried and reasoned over). Unlike databases, relationships are first-class constructs.
Like object models, ontologies describe classes and attributes (properties). Unlike object models, ontologies are set-based.
Like business rules, they encode rules. Unlike business rules, ontologies organize rules using axioms.
Like XML schemas, they are native to the web (and are in fact serialized in XML). Unlike XML schemas, ontologies are graphs, not trees, and are used for reasoning.

16 This is an Ontology

17 These are Ontologies

18 Think Triples
Subject, predicate, Object
Diagram labels: Conference, hasTrack, Session, hasStartTime, xsd:time
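
A minimal sketch of the subject-predicate-object idea, in Python. The pairing of the diagram labels into two statements, the `ex:` prefix and the start time value are assumptions made for illustration; the slide itself only lists the labels.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements assembled from the slide's labels.
triples = [
    Triple("ex:Conference", "ex:hasTrack", "ex:Session"),
    Triple("ex:Session", "ex:hasStartTime", '"09:00:00"^^xsd:time'),
]


def objects_of(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]
```

Because every fact has the same three-part shape, following a relationship is a matter of filtering on subject and predicate, whatever the domain.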

19 Semantic Technology 101
Classes are Sets
Sets can have Sub-Sets
Relationships are Properties
Properties are expressed as “Subject-Property-Object” Triples
Properties can have qualifiers
The “From-End” of the Property is the Domain and the “To-End” is the Range
Classes can specify restrictions on property ranges
Domains, Ranges and Restrictions can be Set Expressions
Class Membership is based on Properties
Diagram labels: EA, Activities, Capabilities, Services, CAP 1, CAP 2, CAP 3, CAP 4
Restriction types: allValuesFrom, someValuesFrom, hasValue, minCardinality, maxCardinality, cardinality
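
The idea that class membership follows from properties can be sketched with plain Python sets. This is a closed-world toy, not OWL reasoning (which is open-world); the individuals, the `supportsActivity` property and the `activities` set are hypothetical names chosen to echo the slide's diagram labels.

```python
# Hypothetical data: individual -> property -> set of values.
facts = {
    "cap1": {"supportsActivity": {"act1"}},
    "cap2": {"supportsActivity": {"act1", "svc9"}},
    "cap3": {},
}

# A class is just a set of individuals.
activities = {"act1", "act2"}


def some_values_from(prop, cls):
    """Individuals with at least one value of prop inside cls."""
    return {i for i, props in facts.items() if props.get(prop, set()) & cls}


def all_values_from(prop, cls):
    """Individuals whose values of prop all lie inside cls.

    An individual with no values at all qualifies vacuously.
    """
    return {i for i, props in facts.items() if props.get(prop, set()) <= cls}
```

Here `some_values_from("supportsActivity", activities)` yields `cap1` and `cap2`, while `all_values_from` yields `cap1` and `cap3`, which shows how the two restriction types carve out different sets from the same data.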

20 What can you do with OWL?
Represent and Aggregate Knowledge
Make Inferences and Discover New Knowledge
Make more informed Decisions
Supply Context-Based Information
Integrate Disparate Databases
Make Recommenders

21 Semantic technology is about putting Ontologies to work
So, what is an ontology?
It is a run time model of information
Defined using constructs for:
Concepts: classes
Relationships: properties (object and data)
Rules: axioms and constraints
Instances of concepts: individuals (data)
Semantic web ontologies are defined using W3C standards: RDF/S and OWL

22 F.I.P.D.A. Decision Flow
FIND: Capability and Services Directory; Context-aware retrieval
INTERPRET: Compliance Checker; Dependency Discoverer; Capability-Centric Communities of Practice
PREDICT: Impact Analyzer; What-If Analyzer
DECIDE: Tradeoff Analyzer; Signoff Coordinator
ACT: Interest-Based Information Provider; Capability Configurator

23 Enterprise Architecture – a Semantic Sweet-spot

24 Federal Enterprise Architecture
Performance Reference Model (PRM): Government-wide Performance Measures & Outcomes; Line of Business-Specific Performance Measures & Outcomes
Business Reference Model (BRM): Lines of Business; Agencies, Customers, Partners
Service Component Reference Model (SRM): Service Layers, Service Types; Components, Access and Delivery Channels
Technical Reference Model (TRM): Service Component Interfaces, Interoperability; Technologies, Recommendations
Data Reference Model (DRM): Business-focused data standardization; Cross-Agency Information exchanges
Side labels: Business-Driven Approach (Citizen-Centered Focus); Component-Based Architectures

25 Example of a Registry: Showing DOD extensions to FEA
Agency-specific extensions shown “green” Hot links to TRM areas

26 Using Ontologies, FEA-RMO delivers “Line of Sight”
srm:accessedThrough srm: runsOn fea: Mission fea: intentOf srm: allignedWith srm: Component trm: Technology fea: ValuePoint fea: Agency srm: develops fea: hasIntent prm: hasPerformance prm: Performance prm:measuredBy prm: OperationalizedMeasurementIndicator fea: SubFunction fea: IT Initiative fea:undertakes brm: allignedWith prm: providesValue prm: recivesValue fea: Customer fea: Process rdfs:subClassOf rdfs:subPropertyOf Other relationships

27 Architecture of the Oracle RDF Data Model

28 Why Specialized Triple Stores?
Relational database technology does not provide a natural storage model for RDF, as it is designed for managing data that is in a relational representation rather than a graph.
A relational table has a very different structure from a graph.
Some triple stores are just a long skinny table, but that isn’t very efficient.
It is complex to map applications from a graph model to a relational backend.

29 Why Oracle Supports RDF
Oracle supports open standards, and RDF and OWL became W3C standards in 2004
Life Sciences customers requested the functionality
Semantic Web provides important advances for data integration and search
Already had graph capability with Network Data Model

30 RDF Data Model
RDF data stored in a directed, logical network
Subjects and objects mapped to nodes
Predicates mapped to links that have subject start nodes and object end nodes
Links represent complete RDF triples
RDF Triples: {S1, P1, O1} {S1, P2, O2} {S2, P2, O2}
Figure: a network with nodes S1, S2, O1, O2 connected by links P1 and P2.
In Oracle Database 10g Release 2, a new object type has been developed for storing RDF data. This functionality builds on the Oracle Spatial Network Data Model (NDM), which is the Oracle solution for managing graphs within the database. The RDF Data Model stores RDF data using a directed, logical network.
In the RDF Data Model, subjects and objects are stored in a system nodes table and predicates in a system links table. Each link must have a start node and an end node. For RDF storage, the start node of a link is the subject of a statement, and the end node of a link is the object of a statement. A link therefore represents a complete RDF triple.
A key feature of Oracle’s RDF storage is that subject and object nodes are stored only once, regardless of the number of times they participate in triples. Subject and object nodes are reused if they already exist in the database. A new link, however, is always created whenever a new triple is inserted. When a triple is deleted from the database, the corresponding link is directly removed. However, the nodes attached to this link are not removed if there is at least one other link connected to them.
There is one universe for all RDF data stored in the database. All RDF triples are parsed and stored in the system as entries in tables under the MDSYS schema. An RDF triple (subject, predicate, object) is treated as one database object. A single RDF document that contains multiple triples will, therefore, result in many database objects.
The possible node types are blank nodes, Uniform Resource Identifiers (URIs), plain literals, and typed literals. The Oracle Database has a type named URIType that is used to store instances of any URI, and is used to store the names of the nodes and links in the RDF network.
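
The node-reuse and link-deletion behaviour described in the notes can be sketched in a few lines of Python. This is an in-memory illustration only, not Oracle's implementation; the class name `RdfNetwork` and its methods are hypothetical.

```python
class RdfNetwork:
    """Toy directed network: values become nodes, triples become links."""

    def __init__(self):
        self.nodes = {}   # value -> node_id (each value stored once)
        self.links = {}   # link_id -> (start_node_id, predicate, end_node_id)
        self._next_node = 1
        self._next_link = 1

    def _node(self, value):
        # Reuse the node if the value already exists, else create it.
        if value not in self.nodes:
            self.nodes[value] = self._next_node
            self._next_node += 1
        return self.nodes[value]

    def insert(self, subject, predicate, obj):
        # A new link is always created for a newly inserted triple.
        link_id = self._next_link
        self._next_link += 1
        self.links[link_id] = (self._node(subject), predicate, self._node(obj))
        return link_id

    def delete(self, link_id):
        # Remove the link, then drop only nodes no other link still uses.
        start, _, end = self.links.pop(link_id)
        still_used = {n for (a, _, b) in self.links.values() for n in (a, b)}
        for value, node_id in list(self.nodes.items()):
            if node_id in (start, end) and node_id not in still_used:
                del self.nodes[value]
```

Loading the three triples from the slide produces four nodes and three links; deleting {S1, P1, O1} afterwards removes node O1 but keeps S1, because S1 still participates in {S1, P2, O2}.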

31 RDF Data Model
RDF_VALUE$: VALUE_ID, VALUE_NAME, VALUE_TYPE, LITERAL_TYPE, LANGUAGE_TYPE, LONG_VALUE
RDF_LINK$: LINK_ID, START_NODE_ID, END_NODE_ID, CANON_END_NODE_ID, LINK_COST_COLUMN, P_VALUE_ID, MODEL_ID
RDF_NODE$: NODE_ID
RDF_MODEL$: OWNER, MODEL_ID, MODEL_NAME, TABLE_NAME, COLUMN_NAME
RDF_BLANK_NODE$: NODE_VALUE, NODE_ID, ORIG_NAME, MODEL_ID
RDF_MODEL$ is the system level table that is created to store information on all of the RDF models in the database.
RDF_VALUE$ is the table that stores the text values. Each text value is stored only once, and a unique VALUE_ID is generated for the text entry. Uniform Resource Identifiers (URIs), blank nodes, plain literals and typed literals are all possible VALUE_TYPE entries.
RDF_NODE$ is the table that stores the VALUE_ID for text values that participate in the subjects or objects of statements.
RDF_LINK$ is the table that stores the triples for all of the RDF models in the database. Selecting all of the links for a specified MODEL_ID returns the RDF network for that particular model.
Blank nodes are used to represent unknown objects, and when the relationship between a subject node and an object node is n-ary. New blank nodes are automatically generated whenever blank nodes are encountered in triples. However, it is possible to re-use blank nodes, for example, when inserting data into containers or collections. The RDF_BLANK_NODE$ table stores the original names of blank nodes that are to be reused when encountered in triples.

32 Reification
Resource generated from unique LINK_ID to represent reified statement
Resource can then be used in subject or object
A reification of a statement in RDF is a description of the statement using an RDF statement. To represent a reified statement in the RDF Data Model, a resource is generated using the triple’s LINK_ID (RDF_T_ID). This resource can then be used as the subject or object of a statement.
To process a reification statement, a triple is first entered with the reified statement’s resource as subject, rdf:type as property and rdf:Statement as object. A triple is then entered for each assertion about the reified statement. Each reified statement will have only one rdf:type -> rdf:Statement associated with it, regardless of the number of assertions made using this resource.
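
The reification steps in the notes can be sketched as follows. This is a toy in Python, not Oracle's API: the `urn:example:reified:` naming scheme, the function names and the sample data are all assumptions made for illustration.

```python
from itertools import count

_link_ids = count(1)
triples = []  # rows of (link_id, subject, predicate, object)


def add_triple(subject, predicate, obj):
    """Store a triple and return its unique link id."""
    link_id = next(_link_ids)
    triples.append((link_id, subject, predicate, obj))
    return link_id


def reified_resource(link_id):
    # Hypothetical naming scheme: derive the resource from the link id.
    return f"urn:example:reified:{link_id}"


def assert_about(link_id, predicate, obj):
    """Make an assertion about the statement stored under link_id."""
    resource = reified_resource(link_id)
    type_triple = (resource, "rdf:type", "rdf:Statement")
    # Enter the rdf:type -> rdf:Statement triple only once per statement.
    if not any(row[1:] == type_triple for row in triples):
        add_triple(*type_triple)
    add_triple(resource, predicate, obj)
    return resource
```

Making two assertions about the same statement adds two assertion triples but only one `rdf:type` triple, matching the "only one rdf:type -> rdf:Statement" rule in the notes.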

33 Containers and Collections
Containers and collections are handled similarly in the RDF Data Model. Each container or collection will have an rdf:type -> rdf:container_name/collection_name associated with it. The LINK_TYPE for container or collection members is RDF_MEMBER. Collections have an additional constraint: no new entries can be added to the list.

34 RDF Triple Implementation
SDO_RDF_TRIPLE (subject VARCHAR2(2000), property VARCHAR2(2000), object VARCHAR2(2000));
SDO_RDF_TRIPLE_S (RDF_T_ID NUMBER, RDF_M_ID NUMBER, RDF_S_ID NUMBER, RDF_P_ID NUMBER, RDF_O_ID NUMBER, ...
Two new datatypes are defined for RDF-modeled data. The SDO_RDF_TRIPLE type is defined to serve as the triple representation of RDF data. The SDO_RDF_TRIPLE_S type is defined to store persistent data in the database. The GET_RDF_TRIPLE() function can be used to return an SDO_RDF_TRIPLE type. Oracle has also developed a Java loader for getting data into the RDF Data Model.
CREATE TABLE jobs (triple SDO_RDF_TRIPLE_S);
SELECT j.triple.GET_RDF_TRIPLE() FROM jobs j;

35 Rules and Rulebases
A rule is an object that can be applied to draw inferences from RDF Data:
An IF side pattern for the antecedents
An optional filter condition that further restricts the subgraphs matched by the IF side pattern
A THEN side pattern for the consequents
A rulebase is an object that contains rules. RDF and RDFS rulebases are provided
Each RDF rulebase consists of a set of rules. Each rule is identified by a name, and consists of an ‘IF’ side pattern for the antecedents, an optional filter condition that further restricts the subgraphs, and a ‘THEN’ side pattern for the consequents.
A rule, when applied to an RDF model, may yield additional triples. An RDF model augmented with a rulebase is equivalent to the original set of triples plus the triples inferred by applying the rulebase to the model. Rules in a rulebase may be applied to the rulebase itself to generate additional triples.
Oracle supplies both an RDF rulebase that implements the RDF entailment rules, and an RDF Schema (RDFS) rulebase that implements the RDFS entailment rules. Both rulebases are automatically created when RDF support is added to the database. It is also possible to create a user-defined rulebase for additional specialized inferencing capabilities.
For each rulebase, a system table is created to hold rules in the rulebase, along with a system view of the rulebase. The view is used to insert, delete and modify rules in the rulebase.
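
To make the IF / filter / THEN structure concrete, here is a small forward-chaining sketch in Python (3.8 or later). It is not how Oracle evaluates rulebases; the function names, the `ex:` data and the representation of a rule as a tuple are assumptions for illustration. The sample rule mirrors the RDFS subclass entailment rule.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):            # variable
            if new.setdefault(term, value) != value:
                return None
        elif term != value:                 # constant must match exactly
            return None
    return new


def apply_rule(triples, if_patterns, then_pattern, filter_fn=lambda b: True):
    """Return the triples produced by one rule over the given triples."""
    bindings = [{}]
    for pattern in if_patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return {tuple(b.get(term, term) for term in then_pattern)
            for b in bindings if filter_fn(b)}


def closure(triples, rules):
    """Apply all rules repeatedly until no new triples appear."""
    triples = set(triples)
    while True:
        new = set().union(*(apply_rule(triples, *r) for r in rules)) - triples
        if not new:
            return triples
        triples |= new


# Hypothetical model and a rule shaped like RDFS subclass entailment.
facts = {
    ("ex:Keynote", "rdfs:subClassOf", "ex:Session"),
    ("ex:Session", "rdfs:subClassOf", "ex:Event"),
    ("ex:talk1", "rdf:type", "ex:Keynote"),
}
subclass_rule = (
    [("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "?d")],
    ("?x", "rdf:type", "?d"),
)
inferred = closure(facts, [subclass_rule]) - facts
```

The model "augmented with a rulebase" is then `facts | inferred`: the original triples plus everything the rules add.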

36 Rule Index
Rules index contains pre-computed triples that can be inferred from applying rulebases to models
If a query refers to a rulebase, then a rule index must exist for the rulebase-model combination
Flexible model for updating the rules index
A rules index is an object containing pre-computed triples that can be inferred from applying a specified set of rulebases to a specified set of models. If a graph query refers to any rulebases, a rule index must exist for each rulebase-model combination in the query.
When a rule index is created, a view is also created of the RDF triples associated with the index in the schema. This view is visible only to the owner of the rules index and to users with suitable privileges. Information about all rule indexes is maintained in the rule index information view.

37 RDF_MATCH
The RDF_MATCH table function allows a graph query to be embedded in a SQL query
Searches for an arbitrary pattern against the RDF data, including inferencing, based on RDF, RDFS, and user-defined rules
Automatically resolves multiple representations of the same point in value space
Use of the SDO_RDF_MATCH table function allows a graph query to be embedded in a SQL query. It has the ability to search for an arbitrary pattern against the RDF data, including inferencing, based on RDF, RDFS, and user-defined rules. It can automatically resolve multiple representations of the same point in value space. The SDO_RDF_MATCH function has been designed to meet most of the requirements identified by W3C in SPARQL for graph querying.
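
"Multiple representations of the same point in value space" means, for example, that the typed literals "01" and "1.0" denote the same number. A rough Python sketch of the idea follows; the canonicalization policy, the datatype list and the function name are assumptions for illustration and are not Oracle's actual canonical forms.

```python
from decimal import Decimal

# Assumed set of numeric datatypes treated as one value space in this toy.
NUMERIC = {"xsd:integer", "xsd:int", "xsd:decimal"}


def canonical(lexical, datatype):
    """Map a typed literal to one canonical (lexical form, datatype) pair."""
    if datatype in NUMERIC:
        value = Decimal(lexical)
        # normalize() drops trailing zeros; "f" formatting avoids exponents.
        return (format(value.normalize(), "f"), "xsd:decimal")
    return (lexical, datatype)
```

If a store compares literals by their canonical form rather than their raw text, a query for the value 1 also finds triples whose object was entered as "01" or "1.0".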

38 Enterprise Functionality: Scalability, High Availability
Data Loads Protein data Chemistry Genome data High-speed interconnect

39 Enterprise Functionality: Security
LDAP User Management Selective Encryption  Virtual Private Database Single Sign-On

40 Enterprise Functionality: Performance
Image Source: VLDB 2005

41 Enterprise Functionality: Performance
Image Source: VLDB 2005

42 Data Integration
SQL / RDBMS: Concise, efficient transactions. Transaction metadata is embedded or implicit in the application or database schema.
XQuery / XML: Transaction across organizational boundaries. XML wraps the metadata about the transaction around the data.
SPARQL / RDF: Information sharing with ultimate flexibility. Enables semantics as well as syntax to be embedded in documents.
SQL/RDBMS, XQuery/XML and SPARQL/RDF offer three different ways to query and manage information. Why do we need three different methods? They are designed to serve different, complementary purposes. By using each of these in different situations, a user can optimize the quality and efficiency of information querying and management.
A relational database and SQL are best where concise, efficient transactions are needed. Typically, this occurs within an enterprise application such as an ERP, CRM or SCM application. In these applications the user is interacting with the data through a tightly constrained set of forms provided by the application. Given the tightly controlled environment, the application (and the underlying RDBMS) needs a minimal amount of input (e.g. a string, a number, a date) to execute properly. This is because all the metadata about the transaction is embedded or implicit in the application or database schema itself. The benefits of SQL/RDBMS are the low overhead required to execute a transaction and, therefore, the performance and scalability with a known level of quality of service that can be achieved.
However, when executing a transaction across organizational boundaries, the environment is much less tightly controlled. A supplier or customer may use a different application and a different database schema for the same type of transaction. In addition, systems may constantly vary in a large population of organizations that are sharing information. In that case, SQL is at least very difficult to use if not utterly inadequate.
For this environment, XQuery/XML combined with Web services is more appropriate. XML documents can be used to execute transactions (e.g. purchase order) just as with SQL, except that XML wraps the metadata about the transaction around the data itself. When an XML document is sent from one organization to another, an agreed upon schema can be used to decode the metadata about the transaction. This is feasible when you have a well-structured federation of organizations as, for example, in a supply chain. Many industries are adopting standard XML schemas for their industry to define business documents such as purchase order, resume, prescription, etc. These can be used with standardized industry Web services to build sophisticated inter-business processes. XQuery/XML is not as efficient as SQL/RDBMS but offers much richer transactions and more flexibility for information sharing across applications.
But even XQuery/XML requires some agreement among parties as to the format of documents. Users must know ahead of time how, approximately, the information will be used. In many cases, it is impossible to know who will be looking for information and how they may choose to use it. SPARQL/RDF is designed for information sharing with ultimate flexibility. By encoding the relationships between data, RDF enables semantics as well as syntax to be embedded in documents. Users can apply arbitrary ontologies to the data and semantics to discover information that may not have even been anticipated by the original data provider. Users with little or no technical knowledge of where the data is located or how it is structured can also formulate queries. This can be particularly powerful for applications on enterprise grids. The disadvantage of SPARQL/RDF is that it is difficult to guarantee the completeness and accuracy of query results. Also, these queries cannot be performed as efficiently or with the scalability of XQuery, let alone SQL.
Each of the different information management models has distinct strengths. An important question is how all three can be used together on a single set of data to meet the needs of all users while ensuring a single consistent source of truth.  Only Oracle Database offers the ability to use all three information management models on a single set of data within a database. Oracle supports relational, XML and RDF schemas within a single database. Each can be used on the same data in different ways.

43 Life Sciences Use Cases and Demos
OK, so here are some use cases

44 Case Study 1: Identification of Clinical Trial Candidates
Natural Language Rule
To start with, I want to show you an example of an RDF_MATCH query that takes advantage of the rules capability. The rule-based drug development scenario demonstrates the use of a rule in the critical task of identifying suitable patients for a clinical trial study. The identification of patients for trials has proven very challenging to the life sciences industry. With the move towards personalized medicine, it is becoming increasingly important that physicians are able to select appropriate patients for trials, in order that effective and safe new drugs can be released. In this example, male patients over 40 with high-grade prostatic intraepithelial neoplasia (HGPIN) were selected for screening for chemoprevention of prostate cancer. The natural language representation of the rule is shown in Figure 1.

45 Case Study 1: Identification of Clinical Trial Candidates
Oracle Rule This slide shows the encoding of the rule.

46 Case Study 1: Identification of Clinical Trial Candidates
RDF Inference
This slide shows the structure of an RDF_MATCH query.
The ‘query’ attribute is a string literal with one or more triple patterns, usually containing variables. A triple pattern is a triple of atoms enclosed in parentheses. Each atom can be a variable, a qualified name that is expanded based on the default namespace and the value of the alias parameter, or a full URI. In addition, the third atom can be a numeric literal, a plain literal, a language-tagged plain literal, or a typed literal.
The ‘models’ attribute identifies the RDF model or models to use.
The ‘rulebases’ attribute identifies one or more rulebases whose rules are to be applied to the query. The ‘models’ and ‘rulebases’ together constitute the RDF data to be queried.
The ‘aliases’ attribute identifies one or more namespaces, in addition to the default namespaces, to be used for expansion of qualified names in the query pattern.
The ‘filter’ attribute identifies any additional selection criteria. If this attribute is not null, it should be a string in the form of a WHERE clause without the WHERE keyword.
The SDO_RDF_MATCH table function returns an object of type ANYDATASET, with elements that depend upon the input variables.
This is then what the RDF_MATCH query looks like. It enhances the rule language with negation, and takes advantage of the proven scalability of the Oracle platform.

47 Case Study 2: Bioinformatics Data Integration and Navigation
The following use case is provided to demonstrate the application of the Oracle RDF Data Model and Seamark Navigator to data search and browsing within drug discovery. To provide support for the RDF data exploration through Seamark Navigator, twelve publicly available bioinformatics data sets were identified. These collectively contained a wide range of biologically relevant data. Each data set was manually examined to identify a cross reference that would be needed in order to map between the different data sources. The goal was to create a concept map that linked all of the biological entities to one another, enabling users to easily jump to information of interest between the different data sets. Several additional data sets were required in order to achieve all of the desired mappings, for example, ec2go (http://www.geneontology.org/GO.indices), and gene2go (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz). The interconnectivity of the chosen data sets is shown in Figure 3.
URIs were assigned to the biological entities within the data sets. The Life Sciences Identifier (LSID) standard, which has been developed for assigning Uniform Resource Name (URN) identifiers to biological entities [9], was used in data sets that support this standard. In other instances, data set proprietary unique identifiers were used to generate URN identifiers. Examples of URNs for each of the data sets are available in Table 3, along with information regarding the size of data set used. In total, the use case data generated 316,296 triples, and this data is available for download.
Enzymes, GO, IntAct, NCBI Taxonomy, and UniProt were already available in RDF/XML, so these data sets were simply downloaded from the Web. For the data that was only available in a flat file format, the XML DB feature within the Oracle Database was used to convert data in a tab-delimited format into XML.
The data were either loaded into the database for the transformation, or accessed externally through Java Database Connectivity (JDBC). Extensible Stylesheet Language Transformations (XSLT) were then applied to all of the data sets in XML to convert them into RDF/XML. An example of the XSLT used for converting OMIM from XML to RDF is provided in Figure 4. Data were converted from RDF/XML to NTriple using the Jena Semantic Web application development tool [http://jena.sourceforge.net/]. Data were converted to NTriple in order to take advantage of the Java tool from Oracle for loading data into the Oracle RDF Data Model (http://www.oracle.com/technology/tech/semantic_technologies/sample_code).
Once the RDF data were loaded into the Oracle RDF Data Model, rules were used to link the data sets together. If different data sets co-referenced the same URN, then the data regarding that particular entity was collapsed. In this use case, all data mappings were manually examined to ensure correctness. If the use case were to be deployed in a production environment, it would be possible to write a program that could automatically perform the mappings between the data sets.
Oracle Data Mining (ODM) was used to identify top biomarker genes for a subset of patients with diffuse large B-cell lymphoma that do not respond to chemotherapy. The raw gene expression measurements from an Affymetrix scanner were loaded into Oracle Database 10g using the SQL*Loader utility. The Minimum Description Length algorithm for determining attribute importance in ODM identified 88 values with positive influence on the outcome. The RDF infrastructure was used to identify biologically interesting information relating to the 6 probesets with the highest importance values.

48 Case Study 2: Bioinformatics Data Integration and Navigation
As Seamark Navigator is a faceted browser, it was necessary to select the specific facets for each data set. In order to ensure an intuitive user experience, attention was paid to which facets were retrieved at each stage during the browsing process. The initial Web page interface was designed to assist the user in identifying the data that they would be interested in interrogating (Figure 5). Subsequent Web pages were designed to help guide the user to relevant information of interest, while using filters to minimize the size of the RDF search space (Figure 6). The interface was designed to enable users to either retrieve data about a single biological entity, or to retrieve data that applied to a group of entities. Once the facets had been determined, Seamark was used to generate a faceted browsing interface.
The top six probesets from the gene expression analysis were entered into Seamark Navigator in order to retrieve related gene and protein information (Table 2). It was discovered that one of the probesets that was ranked highly by the Minimum Description Length algorithm was for the gene Protein Kinase C Beta. This was of interest because Protein Kinase C is known to be a critical protein messenger in the transfer of growth signals for B-cells and B-cell lymphomas [37].
It was of interest to discover whether the probesets would display clustering within Gene Ontology (GO). By selecting all 6 probesets of interest simultaneously for further drill down, it was determined that they corresponded to 43 GO terms. It was further revealed that there was a clustering within the GO molecular function classes of receptor activity, receptor binding, hydrolase activity, and transferase activity. The results of the clustering were examined for statistical significance using the BiNGO plugin [24] for Cytoscape [36]. Seamark was able to achieve sub-second response times for all queries undertaken.

49 Case Study 3: Drug Safety Determination
As compounds move along the drug discovery and development pipeline, companies make decisions at several points as to whether to pursue the compound further. To pass through a decision gate, a compound must meet a series of criteria that a particular functional area predetermines. More effective decision-making regarding a drug’s safety profile would use all known information regarding the compound, target, and patient group. OWL inferencing (for term disambiguation and reconciliation for integrating data, and for complex definitions and classification), complemented where appropriate by rules-based inferencing, could help guide decisions on continuing to pursue a compound or withdrawing a drug from the market. The goal is to minimize the high rate of adverse events and medical errors.

50 Case Study 3: Drug Safety Determination
IF compound has >90% structural similarity to a failed compound
AND compound binds to target with more than 5 SNPs
AND therapeutic index is low
AND histology indicates >5% incidence of liver necrosis in rats
AND ALT reading is >2x above normal in phase I
AND therapeutic dose is >30 mg in phase II
AND >80% of patients with Cytochrome P450 2DE report skin rash in phase III
THEN consider immediately stopping trials for those patients
Figure 4 shows an example of a rule that uses data from many functional groups to help guide a physician on the best course of action when a patient in a clinical trial reports a skin rash. The rule shows many characteristics relating to the compound, its interaction with a target, and related toxicity data. No single condition in the rule would make the compound appear ideal, yet none is individually strong enough to warrant halting the trial. However, seeing all conditions in parallel might necessitate a stronger reaction. We could use the Oracle RDF_MATCH capability to run the rule in Figure 4 against data in the RDF Data Model, provided all data were unified prior to running the query.
Cerebra Server, when used with the Oracle RDF Data Model, would link multiple ontologies and disambiguate queries over federated information, acting as a Mediation Service. For example, when asked for all “rashes,” the Mediation Service would use OWL Restriction Classes and inference to interpret that, in a local vocabulary, a rash is an inflammation of the skin. The Mediation Service would then retrieve the appropriate data regardless of whether the original modeler deemed the feature “dermatitis” or “rashes.”
Conceptually, this process uses three ontology layers. The first layer is the data ontology, providing an interface between the database schema and the domain ontology. This layer isn’t required if data is stored in the Oracle RDF Data Model. The second layer is the domain ontology, describing life sciences and healthcare domains.
In our example, a subset of the SNOMED (Systematized Nomenclature of Medicine) terminology could represent diagnoses and procedures, while a separate ontology represents portions of LOINC (Logical Observation Identifiers, Names, and Codes) to describe laboratory findings. Gene Ontology could represent genetic information, and an in-house ontology could represent the chemical structure information. With these two layers in place, it becomes possible to query data through the concepts in the ontologies instead of the traditional instance-matching approach, allowing queries to be more easily specialized and generalized. The third layer, the application ontology, describes a more specialized ontology based on either a particular application or a third-party world view. We can construct new concepts from, or map to, concepts in the domain ontology. We can then dynamically reclassify instances in the database on the basis of the axioms in the application ontology.
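The drug-safety rule on this slide is a conjunction of conditions drawn from different functional areas, and that structure can be sketched directly. The slide proposes running the rule with Oracle RDF_MATCH; the sketch below is only a plain Python stand-in, not that API. The field names are hypothetical, the thresholds are taken from the rule text, and reading "ALT > 2x above normal" as a ratio greater than 2.0 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CompoundEvidence:
    # Hypothetical field names; fractions are expressed in the range 0.0 to 1.0.
    similarity_to_failed_compound: float
    target_snp_count: int
    therapeutic_index_low: bool
    rat_liver_necrosis_incidence: float
    alt_multiple_of_normal: float        # phase I ALT reading divided by normal
    therapeutic_dose_mg: float           # phase II
    rash_fraction_in_cyp_subgroup: float  # phase III


def rule_conditions(e):
    """Evaluate each clause of the rule separately so the outcome can be explained."""
    return {
        "similarity to failed compound > 90%": e.similarity_to_failed_compound > 0.90,
        "target has more than 5 SNPs": e.target_snp_count > 5,
        "therapeutic index is low": e.therapeutic_index_low,
        "liver necrosis in rats > 5%": e.rat_liver_necrosis_incidence > 0.05,
        "phase I ALT > 2x normal": e.alt_multiple_of_normal > 2.0,
        "phase II therapeutic dose > 30 mg": e.therapeutic_dose_mg > 30.0,
        "phase III rash in > 80% of subgroup": e.rash_fraction_in_cyp_subgroup > 0.80,
    }


def should_consider_stopping(e):
    """The rule fires only when every clause holds at the same time."""
    return all(rule_conditions(e).values())


if __name__ == "__main__":
    evidence = CompoundEvidence(0.93, 7, True, 0.08, 2.5, 40.0, 0.85)
    for name, met in rule_conditions(evidence).items():
        print(f"{'met    ' if met else 'not met'}  {name}")
    print("Consider stopping:", should_consider_stopping(evidence))
```

Keeping the clauses in a named dictionary makes it easy to show a reviewer which individual findings contributed, which matters here because no single condition is decisive on its own.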

51 Demos BioDASH Family Tree

52 SAPPHIRE Project
Situation-Aware Prevention of Public Health Incidents using Reasoning Engines. Systematic and continuous collection, analysis, interpretation, and dissemination of diagnostic and pre-diagnostic data for use in timely and sensitive detection of public health incidents (bioterrorism or natural) to reduce morbidity and mortality by better response planning and coordination. The Center for Biosecurity and Public Health Informatics Research, University of Texas, Houston

53 SAPPHIRE Capability Cases
The Center for Biosecurity and Public Health Informatics Research

54 SAPPHIRE Ontology Architecture
The Center for Biosecurity and Public Health Informatics Research

55 SAPPHIRE Tools
The Center for Biosecurity and Public Health Informatics Research

56 Best Practices for Ontology Engineering

57 Technology Adoption of Ontology Engineering
Modeling Guidance Methodology for Ontology Engineering Techniques Ontology Modeling Patterns Tools

58 Ontology Modeling Guidance: Techniques (in frequently-asked order)
How to build an Ontology
How to develop an Ontology Architecture
How to integrate Databases
Federated Search
Concept-Based Search
How to annotate information resources
Entity and Concept Extraction Techniques
Working with XML Schemas
Working with UML
How to use an Upper Ontology

59 Ontology Modeling Methodology
Diagram labels: Problem Modeling, CATWOE Solution Envisioning Checkland’s Soft Systems Methodology Boundary Objects - Coordination & Negotiation of Meaning in Organizations Stakeholders Forces, Barriers, Challenges, Results, Capability Cases & Capability Architecture LIBRA Boundary Object re-contextualization Technique Insights on Software Reuse ODM Stakeholder Analysis, CV, Boundary Criteria, Context as Settings Uschold & King Competency Questions TopSAIL™ Grüninger & Fox Agent Model, Task Model, Roles Ontology Architecture, Ontology Patterns, Knowledge Maps + Agile Methods TOVE Modeling Patterns
Boundary object is a term coined by Star and Griesemer (1989) to describe objects that serve as an interface between boundaries of domain knowledge. Within communities of practice theory, boundary objects are defined as artifacts, documents, terms, concepts, and other forms of reification around which communities of practice can organize their interconnections. ... They enable coordination, but they can do so without actually creating a bridge between the perspectives and the meanings of various constituencies. (Wenger 1998, p. 107)
Star [13] and Star and Griesemer [14] initiated the discussion of boundary objects. For them boundary objects are: “…objects which both inhabit several intersecting social worlds and satisfy the informational requirements of each of them... They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable, a means of translation.” ([14], p. 393)
The work has traveled in several directions since. Bowker and Star [5] recently have focused on the role of boundary objects in translation, specifically how boundary objects assist in classification and how they calcify into standards. Other studies have focused on the role boundary objects play in the micro-negotiations involved in developing shared understanding.
Henderson’s work with design engineers [7] centered on how engineers use diagrams, drawings, and blueprints as points of negotiation. She focused specifically on the changes, both positive and negative, occurring as the CAD revolution shifted these artifacts from paper to digital form. Bechky [3] also attended to the role that drawings play in negotiations among engineers. However, she focused on drawings that explicitly span social world boundaries (e.g. moving from design to manufacturing). More to the point of this paper, other researchers have examined what is inscribed on the boundary objects in the processes of negotiation, and the meanings behind those inscriptions. Berg and Bowker [4] detail how patient records in hospitals act as boundary objects “producing” the patient for physicians, technicians, and nursing staff via the mappings between the individual and their surrogate representation in the record. Mambrey and Robinson [10], in the GMD’s POLITeam project, looked at boundary objects and their inscriptions, primarily those of workflow. In their study of a German ministry, inscriptions detailing workflow allowed groups to understand the relative meanings for an artifact. They also noted that boundary objects could be compound: Folders circulated with enclosed papers and documents. Ackerman and Halverson [1] reported on a personnel hotline, detailing the information flows within telephone calls and the construction of the answers. In all of these, as Star points out, boundary objects were necessarily decontextualized on one side of the boundary, and reconstructed on the other. The reconstruction of the boundary object, for example a personnel record, was found to be critical to reusing information in organizations. Several other streams of research in CSCW are of importance in this work. 
Boundary objects allow an ability to represent multiple perspectives of a single information artifact, interpret the negotiations that govern its creation and evolution, and map the intersection of social worlds onto aspects of the artifact itself. In this function, boundary objects are similar in their negotiation affordances to coordination mechanisms [11]. As well, Bannon and Bødker [2] point out the importance of what they call punctuation in informational artifacts, moments when informational artifacts cease to be dynamic and changing, and instead crystallize.
References
Star, S. L., and Griesemer, J. R. “Institutional Ecology, 'Translations' and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology,” Social Studies of Science (19), 1989.
Wenger, E. Communities of Practice: Learning, Meaning and Identity, Cambridge, England: Cambridge University Press, 1998.
1. Ackerman, M. S. & Halverson, C. Considering an Organization’s Memory, Proceedings of CSCW, 1998, 39-48.
2. Bannon, L. & Bødker, S. Constructing Common Information Spaces, Proceedings of E-CSCW, 1997.
3. Bechky, B. A. Crossing Occupational Boundaries, Ph.D. dissertation, Stanford University, 1998.
4. Berg, M. & Bowker, G. The Multiple Bodies of the Medical Record, Sociological Quarterly (38:3), 1997.
5. Bowker, G. & Star, S. L. Sorting Things Out. Cambridge, MA: MIT Press, 1999.
6. Garfinkel, H. Studies in Ethnomethodology. New York: Polity, 1967.
7. Henderson, K. On Line and on Paper. Cambridge, MA: MIT Press, 1998.
8. Lutters, W. G. Supporting Reuse: IT and the Role of Archival Boundary Objects in Collaborative Problem Solving, Ph.D. dissertation, University of California, Irvine, 2001.
10. Mambrey, P. & Robinson, M. Understanding the Role of Documents in a Hierarchical Flow of Work, Proceedings of GROUP, 1997.
11. Schmidt, K. & Simone, C. Coordination Mechanisms, CSCW (5:2), 1996.
12. Schmidt, K. & Bannon, L. Taking CSCW Seriously: Supporting Articulation Work, CSCW (1:1), 1992, 7-40.
13. Star, S. L. The Structure of Ill-Structured Solutions, in Gasser, L. & Huhns, M. (eds), Distributed Artificial Intelligence, Volume II. Morgan Kaufmann, 1989.
14. Star, S. L. & Griesemer, J. Institutional Ecology, ‘Translations’ and Boundary Objects, Social Studies of Science (19:3), 1989.
15. Strauss, A. & Corbin, J. Basics of Qualitative Research. Newbury Park, CA: Sage Publications, 1990.
16. Suchman, L. Office Procedure as Practical Action, ACM Transactions on Office Information Systems (1:4), 1983.
17. Weick, Karl E. & Roberts, K. Collective Mind in Organizations, Administrative Science Quarterly (38:3), 1993.
18. Wenger, E. Communities of Practice. Cambridge: Cambridge University Press, 1998.
Diagram labels (continued): CommonKADS Workproduct Concept MOKA Methontology 1990 1994 1998 2006 IBM 2002

60 An Ontology Architecture is crucial: Some dependencies in the EA Ontologies
Enterprise Architecture Capability Cases Enterprise Capability ECM CAPCASE Standards EPM Enterprise Process ORGS Product HR Technology Industry Time TQEC Enterprise Core Ontology TQC TopQuadrant Core Ontology DC Dublin Core

61 Ontology Architecture Requirements Specification (OARS)
Ontology of Ontologies Stakeholders Systems Competency Questions Capability Questions Architecture Dependencies Ontology Reuse

62 Ontology Modeling: Top 10 Guidelines
Standardize: modelling patterns, concept and property names and namespaces - provide human-readable names with rdfs:label
Keep ontologies small and modular - evolve an ontology architecture
Assimilate enterprise knowledge, for example, internal lists, vocabularies, taxonomies
Be clear on the role of each ontology: specification versus knowledge discovery
Agile re-factoring using ontology re-factoring patterns
Use domain and range with care
Analyze, Synthesize, Evaluate: iterate with stakeholders using blueprints
Validate models using competency questions
Test often using sample data
Be careful with open and closed world reasoning when using restrictions, and avoid the ‘allValuesFrom’ restriction with ‘Equivalent Classes’
Model for reuse – separate instances from classes
Notes:
1) One of the best practices could be: keep ontologies small and modular. This is indeed a best practice, but one issue with it is that once you start importing modular ontologies, you can no longer change the names of their classes and properties if another ontology uses them. This used to be a problem, but Holger has tool support for this now.
2) Another best practice is to re-factor as you progress, so that the models do not get out of hand and stay small and modular. How do you do it without the ability to move concepts? Well, you do not need to worry now: we've got it.
3) Another issue is that you need to find a way to present your work to others and explain it. This is where diagrams come in handy (one more opportunity to bring up a tool).
4) Distributing the work is a best practice. Modular ontologies are one aspect; another aspect is that your business users could contribute by giving you controlled vocabularies and other knowledge, such as standard geographies, standard document statuses, etc. This is an opportunity to talk about the division between 'schema' ontologies and datasets expressed in RDF/OWL, and to bring up TopBraid Collaboration Server.
And you can talk about setting up an ontology architecture ontology for managing the models, the competency questions, etc.
One more best practice (I think): if you find yourself putting multiple classes in the domains, ranges, or at the end of restrictions (for example, Cl1, Cl2 and Cl3 are all in the domain of p1, or a restriction says allValuesFrom Cl1, Cl2, Cl3), consider defining a parent class for Cl1, Cl2 and Cl3.
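The advice to "use domain and range with care" follows from how RDFS treats domains: a domain is not a constraint that rejects data but a rule that infers types, and several domains on one property all apply at once. The sketch below hand-rolls that one RDFS entailment rule over plain tuples to show the effect. The `ex:` names are made up for illustration, and this is not how any particular reasoner is implemented.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types_from_domain(triples):
    """Apply the RDFS domain rule: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


# Two classes listed as domains of the same property (hypothetical names).
MULTI_DOMAIN = [
    ("ex:hasRoom", RDFS_DOMAIN, "ex:Hotel"),
    ("ex:hasRoom", RDFS_DOMAIN, "ex:BedAndBreakfast"),
    ("ex:seaView", "ex:hasRoom", "ex:room1"),
]

# The alternative suggested in the notes: one shared parent class as the domain.
PARENT_DOMAIN = [
    ("ex:hasRoom", RDFS_DOMAIN, "ex:Lodging"),
    ("ex:seaView", "ex:hasRoom", "ex:room1"),
]

if __name__ == "__main__":
    print(sorted(infer_types_from_domain(MULTI_DOMAIN)))
    print(sorted(infer_types_from_domain(PARENT_DOMAIN)))
```

With two domains, the instance is inferred to be both a hotel and a bed and breakfast, which is rarely what the modeler intended. With a single parent class as the domain, the only inference is the weaker and safer one.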

63 Ontology Modeling Guidance: Patterns
Trust My Classifier n-ary relations Class Bridge Class-Instance Mirror Transitive Parent Property Abstraction

64 Ontology Design Pattern: Model Bridge - Example
Travel Services US travel model Airline Hotel subClassOf Lodging British travel model B&B Hotel

65 Tooling Protégé SWOOP Semantic Works TopBraid Studio

66 TopBraid Studio – 1
Class Tree; Properties; List of ontologies used in each project (the user can switch between them)

67 TopBraid Studio – 2 Form Editor Tracking changes

68 TopBraid Studio - 3 Switching the view from a Form to RDF Source
Multiple serializations are available Edits can be made directly Instance Window

69 TopBraid Studio - 4

70 Wrap Up

71 Further Information - Oracle

72 Further Information - TopQuadrant
Irene Polikoff and Robert Coyne, “Towards Executable Enterprise Models: Ontology and Semantic Web Meet Enterprise Architecture”, Journal of Enterprise Architecture, Fawcette Publications, August 2005
Dean Allemang, Irene Polikoff, Ralph Hodgson, “Enterprise Architecture Reference Modeling in OWL/RDF”, ISWC, International Semantic Web Conference, Ireland, 2005
TopQuadrant White Paper on FEA-RMO, 2/21/2005
FEA Ontology Models: FEA - BRM2PRM - PRM - BRM - SRM - TRM - Merged Ontology -

73 Books on Semantic Technology - 1
Johan Hjelm, “Creating the Semantic Web with RDF”, John Wiley, 2001
Dieter Fensel, “Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce”, Springer Verlag, 2001
John Davies, Dieter Fensel & Frank van Harmelen, “Towards the Semantic WEB – Ontology Driven Knowledge Management”, John Wiley, 2002
Dieter Fensel, Wolfgang Wahlster, Henry Lieberman, James Hendler (Eds.), “Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential”, MIT Press, 2002
Michael C. Daconta, Leo J. Obrst, Kevin T. Smith, “The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management”, John Wiley, 2003
Vladimir Geroimenko (Editor), Chaomei Chen (Editor), “Visualizing the Semantic Web”, Springer-Verlag, 2003
M. Klein and B. Omelayenko (eds.), “Knowledge Transformation for the Semantic Web”, Vol. 95, Frontiers in Artificial Intelligence and Applications, IOS Press, 2003
Shelley Powers, “Practical RDF”, O’Reilly, 2003

74 Books on Semantic Technology - 2
Lee W. Lacy, “OWL: Representing Information Using the Web Ontology Language”, Trafford Publishing, 2005
Thomas B. Passin, “Explorer's Guide to the Semantic Web”, ISBN , June 2004
Jeff Pollock and Ralph Hodgson, “Adaptive Information: Improving Business Through Semantic Interoperability, Grid Computing, and Enterprise Integration”, John Wiley, September 2004
Munindar P. Singh, Michael N. Huhns, “Service-Oriented Computing: Semantics, Processes, Agents”, John Wiley & Sons, 2005
Irene Polikoff et al, “Capability Cases – A Solution Envisioning Approach”, Addison-Wesley, 2005
Grigoris Antoniou and Frank van Harmelen, “A Semantic Web Primer”, The MIT Press, April 2004

75

76 Integration and Aggregation of Data
This is a screen snapshot of BioDASH, which is a demo that I was involved in building with a number of people in the life sciences community. You can download the software to run BioDASH from the W3C’s Web site. The demo was built to show how it becomes easier to integrate and aggregate data using RDF. On the left hand panel, you can see the red GSK beta protein, and some of the chemical compounds that it interacts with. If you click on any of these entities you get to see more information. On the right, you can see the wnt pathway. Image Source: BioDASH

77 Integration and Aggregation of Data
If you drag and drop the protein from the left pane to the right pane, you can aggregate the two pieces of information. No common schema was needed to do this, as the various components are using RDF triples with unique identifiers. Image Source: BioDASH
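The aggregation described above works because merging RDF graphs is just a union of triples: anything that shares an identifier lines up without a common schema. The sketch below shows this with plain Python sets. It is not BioDASH code; the `ex:` identifiers and predicate names are made up for illustration.

```python
# Left pane: a compound and the protein it interacts with (hypothetical names).
COMPOUND_GRAPH = {
    ("ex:compoundA", "ex:interactsWith", "ex:protein/gsk-beta"),
    ("ex:protein/gsk-beta", "ex:label", "GSK beta protein"),
}

# Right pane: a pathway and the proteins taking part in it.
PATHWAY_GRAPH = {
    ("ex:protein/gsk-beta", "ex:participatesIn", "ex:pathway/wnt"),
    ("ex:pathway/wnt", "ex:label", "wnt pathway"),
}


def merge_graphs(*graphs):
    """Merging RDF graphs is a set union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def pathways_for_compound(graph, compound):
    """Follow compound -> protein -> pathway across the merged graph."""
    proteins = {o for s, p, o in graph if s == compound and p == "ex:interactsWith"}
    return sorted(o for s, p, o in graph
                  if s in proteins and p == "ex:participatesIn")


if __name__ == "__main__":
    merged = merge_graphs(COMPOUND_GRAPH, PATHWAY_GRAPH)
    print(pathways_for_compound(merged, "ex:compoundA"))
```

Neither source graph can answer the compound-to-pathway question by itself; the answer only appears after the union, because both graphs use the same identifier for the protein.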

