1
ONTOLOGY-DRIVEN DISCOVERY OF SCIENTIFIC COMPUTATIONAL ENTITIES
Pearl Brazier
Department of Computer Science
University of Texas-Pan American
November 2, 2010
2
Outline
- Motivation
- Research Goals and Objectives
- Significance of Contribution
- Background Information and Context
- Research Efforts
  - GEO-SEED Architecture
  - Scientific Computational Entity Discovery Ontology
  - RDF Repository
  - Usability and Performance Studies
- Conclusions and Future Work
3
Motivation: Geosciences Web Services
- The Web contains many scientific resources
  - Scientific data (shared datasets, experimental results)
  - Geosciences web services metadata
- Resources are currently shared via
  - publication
  - human contact
  - web portals
- Metadata annotations are needed to
  - assist collaboration
  - allow machine processing
4
Research Goal
To investigate an ontology-driven discovery approach that can be distributed on the Web and that can support the elicitation, documentation, and registration of computational entities and other resources.
5
BACKGROUND INFORMATION AND CONTEXT
6
Cyberinfrastructure/e-science
- Supports building new types of scientific and engineering knowledge environments and organizations
- Supports modern in-silico experiments that can lead to important scientific discoveries through scientific data repositories, semantic mediation services, and scientific workflows
- Describes computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets
7
Web Technologies-1
- Web 2.0 Technologies
  - Include social networks and wiki technologies
  - Used by humans
- Semantic Web
  - Allows machines to understand the meaning of information on the Web
  - Used by machines and automated agents
  - Relies on core standards such as RDF, SPARQL, and OWL
- Wine Ontology
  - Sample ontology used in the OWL specification documents
  - http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#
8
Web Technologies-2
- Ontologies
  - Capture concepts and the relationships among them
  - Provide a standard vocabulary and classifications
- Web Services Metadata
  - WSDL
  - WSDL-S
  - OWL-S and SWSO
9
RESEARCH OBJECTIVES AND ACTIVITIES
10
Objective 1
Define an ontology for scientific computational entities that supports the development of a repository and a system that can retrieve computational entities.
Activities:
- Define use cases for the ontology.
- Determine the essential elements of an ontology that documents the features and relationships used to identify computational entities and distinguish one from another.
11
Objective 2
Define an architecture that supports an ontology-driven approach.
Activities:
- Investigate efficient approaches for storing information.
- Investigate the relationships of registration, annotation, and knowledge extraction.
12
Objective 3
Evaluate the usability of a system based on the ontology-driven discovery approach.
Activities:
- Design and implement a prototype system based on the ontology-driven discovery approach.
- Conduct a usability study of the prototype system with computer scientists and geoscientists (novices and experts).
13
Objective 4
Evaluate the performance of a system based on the ontology-driven approach.
Activities:
- Design the schema and implement a relational RDF repository that supports efficient storage and querying of documented scientific computational entities based on the ontology-driven discovery approach.
- Run a simulation to analyze the performance of the system.
14
Research Contributions and Significance
- Designed a new Scientific Computational Entity Discovery Ontology
  - More comprehensive and domain-specific than existing discovery ontologies, enabling scientists to share their computational entities more easily
- Created a novel design for organizing the RDF data
  - Uses SPARQL queries for the RDF representation
  - Supports more efficient query evaluation
- Developed the GEO-SEED wiki using Web 2.0 and Semantic Web technologies
  - Supports discovery and sharing of scientific computational entities
15
RESEARCH EFFORTS: GEO-SEED ARCHITECTURE AND ONTOLOGY
16
17
GEO-SEED ARCHITECTURE
18
GEO-SEED Scientific Computational Entity Discovery Ontology
19
Computational Entity Profiles
Profile Name | Purpose
GeneralProfile | Basic identifying information
QoSProfile | Quality-of-service characteristics
InvocationProfile | Details needed to execute an entity
DeploymentProfile | Details needed to download and deploy an entity from the Web
ImplementationProfile | Information related to the implementation of a computational entity
GeoscienceProfile | Information obtained with a domain-specific ontology
20
General Profile Descriptors
- Name
- Authors
- Contact Information
- Contributors
- Description
- URIs
- Unique Identifier
- Version
- Release Date
- Languages
- Subject
- Type
- Cost
- License
- Support
21
QoS Profile Descriptors
- Trust
- Reliability
- Availability
- Processing Time
- Requests per Second
- Custom Metric
- Security
- Known Failures
- Overall Rating
- User Reviews
22
Deployment Profile Descriptors
- URLs
- Hardware
- Software Architecture
- Installation
- Operating Systems
- Activation and Deactivation
- Software Dependencies
23
Invocation Profile Descriptors
- Processes and Operations
- Types
- Parameters and Messages
- Preconditions
- Effects
- Bindings
- WSDL Document
- OWL-S Document
24
Implementation Descriptors
- Source Repositories
- Algorithms
- Software Development Environment (SDE)
- Software Components
- Languages and Paradigms
- Documentation
25
Geoscience Descriptors
- Ontologies
- Domain-Ontology Dependent Annotations
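The profile and descriptor slides above can be read as a small data model. The following is a minimal sketch, not the GEO-SEED implementation: the class names follow the profile names on the slides, but the field selection, the `profile_id` attribute, and the `profile_triples` helper are assumptions made for illustration. It shows how one entity's profiles could be flattened into subject-predicate-object triples of the kind stored in the RDF repository.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class GeneralProfile:
    # Only a few of the General Profile descriptors are modeled here.
    profile_id: str
    name: Optional[str] = None
    author: Optional[str] = None
    subject: Optional[str] = None


@dataclass
class QoSProfile:
    # Only a few of the QoS Profile descriptors are modeled here.
    profile_id: str
    trust: Optional[str] = None
    availability: Optional[str] = None
    overallRating: Optional[str] = None


def profile_triples(entity_id: str, profile) -> List[Triple]:
    """Hypothetical helper: link an entity to a profile and list its descriptors."""
    triples = [
        (entity_id, "describedBy", profile.profile_id),
        (profile.profile_id, "rdf:type", ":" + type(profile).__name__),
    ]
    for f in fields(profile):
        value = getattr(profile, f.name)
        # Skip the identifier itself and any descriptor left unset.
        if f.name != "profile_id" and value is not None:
            triples.append((profile.profile_id, f.name, value))
    return triples


gp = GeneralProfile(profile_id=":GP1", author="Pearl Brazier", subject=":Gridding")
qos = QoSProfile(profile_id=":QoSP1", trust="5", availability="0.9", overallRating="4")

triples = [(":WS1", "rdf:type", ":WebService")]
triples += profile_triples(":WS1", gp) + profile_triples(":WS1", qos)
for t in triples:
    print(t)
```

Leaving descriptors optional mirrors the idea that a contributor may fill in only part of a profile; unset descriptors simply produce no triple.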
26
Scientific Computational Entity Discovery Ontology
27
RESEARCH EFFORTS: RDF REPOSITORY
28
Schema Mapping Strategies
Five approaches to generate database schemas:
- Schema-Oblivious
- Schema-Aware
- Data-Driven
- User-Customizable
- Hybrid
29
Schema-Oblivious (Triple Table)
Extracted RDF triples (literal values): "Pearl Brazier", "5", "0.9", "4"
Table Triple:
s | p | o
:WS1 | rdf:type | :WebService
:WS1 | describedBy | :GP1
:WS1 | describedBy | :QoSP1
:GP1 | rdf:type | :GeneralProfile
:QoSP1 | rdf:type | :QoSProfile
:GP1 | subject | :Gridding
:GP1 | author | Pearl Brazier
:QoSP1 | trust | 5
:QoSP1 | availability | 0.9
:QoSP1 | overallRating | 4
30
Schema-Aware (Property Table)
Extracted RDF triples (literal values): "Pearl Brazier", "5", "0.9", "4"
Table Property_type:
s | o
:WS1 | :WebService
:GP1 | :GeneralProfile
:QoSP1 | :QoSProfile
Table Property_author:
s | o
:GP1 | Pearl Brazier
Table Property_describedBy:
s | o
:WS1 | :GP1
:WS1 | :QoSP1
31
User-Customizable (Profile Tables)
Extracted RDF triples (literal values): "Pearl Brazier", "5", "0.9", "4"
Table GeneralProfile:
s | p | o
:GP1 | rdf:type | :GeneralProfile
:GP1 | subject | :Gridding
:GP1 | author | Pearl Brazier
Table QoSProfile:
s | p | o
:QoSP1 | rdf:type | :QoSProfile
:QoSP1 | trust | 5
:QoSP1 | availability | 0.9
:QoSP1 | overallRating | 4
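The three layouts above differ only in how the same triples are partitioned into tables. A minimal sketch of that partitioning follows, using in-memory Python dictionaries in place of relational tables. The function names and the rule that non-profile subjects fall back to per-property tables are assumptions for illustration; the slides do not spell out where the :WS1 triples are stored in the profile-table design.

```python
from collections import defaultdict

# The ten example triples from the schema-mapping slides.
TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", "describedBy", ":GP1"),
    (":WS1", "describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", "subject", ":Gridding"),
    (":GP1", "author", "Pearl Brazier"),
    (":QoSP1", "trust", "5"),
    (":QoSP1", "availability", "0.9"),
    (":QoSP1", "overallRating", "4"),
]

# Types that get their own profile table (assumed configuration).
PROFILE_TYPES = {":GeneralProfile", ":QoSProfile"}


def property_tables(triples):
    """Schema-aware layout: one (s, o) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def profile_tables(triples):
    """User-customizable layout: one (s, p, o) table per profile type.

    Triples whose subject is not a profile instance fall back to
    per-predicate (s, o) tables; this fallback is an assumption.
    """
    type_of = {s: o for s, p, o in triples if p == "rdf:type"}
    tables = defaultdict(list)
    for s, p, o in triples:
        if type_of.get(s) in PROFILE_TYPES:
            tables[type_of[s].lstrip(":")].append((s, p, o))
        else:
            tables[p].append((s, o))
    return dict(tables)


by_property = property_tables(TRIPLES)   # schema-aware
by_profile = profile_tables(TRIPLES)     # user-customizable
print(sorted(by_property))
print({name: len(rows) for name, rows in by_profile.items()})
```

The schema-oblivious layout needs no function at all: it is the `TRIPLES` list stored as a single three-column table.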
32
SPARQL Query
Retrieves the quality-of-service descriptors of a Web service :WS1:
  SELECT ?profile ?pre ?obj
  WHERE {
    :WS1 :describedBy ?profile .
    ?profile rdf:type :QoSProfile .
    ?profile ?pre ?obj .
  }
33
Query Complexity Comparison
- Triple Table: Two Joins
  Triple ⋈ Triple ⋈ Triple
  Note: Tables can get large
- Property Table: Two Joins
  describedBy ⋈ type ⋈ (trust ⋃ reliability ⋃ availability ⋃ ⋯ ⋃ userReview)
  Note: Union result is not indexed
34
- Property Table, alternatively: Many Joins
  (describedBy ⋈ type ⋈ trust) ⋃ (describedBy ⋈ type ⋈ reliability) ⋃ (describedBy ⋈ type ⋈ availability) ⋃ ⋯ ⋃ (describedBy ⋈ type ⋈ userReview)
  Note: Indexed, but re-computes (describedBy ⋈ type) many times
- Profile Table: One Join
  describedBy ⋈ QoSProfile
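To make the join counts concrete, the sketch below evaluates the QoS query for :WS1 against a triple table (two self-joins) and against a profile-table layout (one join) using Python's built-in sqlite3 module. The SQL statements, the table names `triple`, `describedBy`, and `QoSProfile`, and the use of SQLite are assumptions for illustration; they are not the SPARQL-to-SQL translations from the dissertation.

```python
import sqlite3

TRIPLES = [
    (":WS1", "rdf:type", ":WebService"),
    (":WS1", "describedBy", ":GP1"),
    (":WS1", "describedBy", ":QoSP1"),
    (":GP1", "rdf:type", ":GeneralProfile"),
    (":QoSP1", "rdf:type", ":QoSProfile"),
    (":GP1", "subject", ":Gridding"),
    (":GP1", "author", "Pearl Brazier"),
    (":QoSP1", "trust", "5"),
    (":QoSP1", "availability", "0.9"),
    (":QoSP1", "overallRating", "4"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triple (s TEXT, p TEXT, o TEXT);
CREATE TABLE describedBy (s TEXT, o TEXT);
CREATE TABLE QoSProfile (s TEXT, p TEXT, o TEXT);
""")

# Schema-oblivious layout: everything in one table.
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", TRIPLES)

# Profile-table layout: a describedBy table plus one table per profile type.
qos_subjects = {s for s, p, o in TRIPLES if p == "rdf:type" and o == ":QoSProfile"}
conn.executemany(
    "INSERT INTO describedBy VALUES (?, ?)",
    [(s, o) for s, p, o in TRIPLES if p == "describedBy"],
)
conn.executemany(
    "INSERT INTO QoSProfile VALUES (?, ?, ?)",
    [t for t in TRIPLES if t[0] in qos_subjects],
)

# Triple table: two self-joins (describedBy, type check, descriptors).
triple_sql = """
SELECT t1.o AS profile, t3.p AS pre, t3.o AS obj
FROM triple t1
JOIN triple t2 ON t2.s = t1.o
JOIN triple t3 ON t3.s = t1.o
WHERE t1.s = ':WS1' AND t1.p = 'describedBy'
  AND t2.p = 'rdf:type' AND t2.o = ':QoSProfile'
ORDER BY pre
"""

# Profile table: the type check is implied by the table, so one join suffices.
profile_sql = """
SELECT d.o AS profile, q.p AS pre, q.o AS obj
FROM describedBy d
JOIN QoSProfile q ON q.s = d.o
WHERE d.s = ':WS1'
ORDER BY pre
"""

triple_rows = conn.execute(triple_sql).fetchall()
profile_rows = conn.execute(profile_sql).fetchall()
for row in profile_rows:
    print(row)
```

Both queries return the same rows; the profile-table version avoids the type-check join because membership in the QoSProfile table already encodes the type.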
35
Empirical Comparison of the Three Approaches
- Created a GEO-SEED dataset that describes 10,000 web services
- Defined six common queries using SPARQL
- Ran queries on a PC with a 3.00 GHz Intel Core 2 CPU, 4 GB RAM, and 750 GB disk space running
- Evaluated the execution time
36
Performance Test Queries
1. Find web services that implement a computational entity with the name “gridding”
2. Find web services, along with their user reviews and overall quality-of-service ratings, that implement a computational entity “gridding”
3. Find web services that implement a computational entity with the name “gridding” and that have trust ≥ 4 and availability ≥ 0.8 ratings
4. Retrieve a general profile of a particular web service
5. Retrieve a quality-of-service profile of a particular web service
6. Retrieve quality-of-service profiles of two web services
37
Performance Study Results
38
RESEARCH EFFORTS: GEO-SEED WIKI PROTOTYPE
39
Overview
- GEO-SEED consists of two components: a wiki and an RDF repository
- The wiki serves as a collaborative environment for knowledge sharing of geosciences web services
  - Provides an interface for human interaction
- The RDF repository serves as a metadata database readily accessible by machines and automated agents
40
CONCLUSIONS
41
Conclusions
- The GEO-SEED architecture supports a new-generation Web portal metadata repository for sharing and discovery of scientific computational entities in geosciences
- The ontology-driven profiles approach supports usability for
  - Humans
  - Machines
- The unique user-customizable profile table design for storing the RDF data allows efficient queries of large metadata collections
42
Future Work
- Explore user-guided metadata extraction algorithms for the wiki
- Explore coupling GEO-SEED with an existing scientific workflow management system (SWFMS)
- Extend the project to support annotation and discovery of scientific workflows and datasets in geosciences
- Refine the prototype to address user interface issues
43
Thank You! Questions?
Photo captions: Summer 2010, Spring 2010, Cactus from El Paso 2005
44
Presentations
- Abraham, John, Brazier, Pearl, Chebotko, Artem, Navarro, Jaime, and Piazza, Anthony, "Distributed Storage and Querying Techniques for a Semantic Web of Scientific Workflow Provenance", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010. Acceptance rate: 18%.
- Brazier, Pearl, Chebotko, Artem, Gonzalez, Eric, Kashlev, Andrey, and Piazza, Anthony, "Supporting Geosciences Web Services Metadata Management and Discovery", in Proc. of the 7th IEEE International Conference on Services Computing (SCC'10), Miami, Florida, USA, July 5-10, 2010.
- Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., Piazza, Anthony, and Salayandia, Leonardo (2009), "Web 2.0 and Semantic Web Portal for Annotation and Discovery of Web Services in Geosciences", presented and published in 2009 International Conference on Semantic Web and Web Services (SWWS 2009), Las Vegas, Nevada, July 13-16, CSREA Press, USA.
- Brazier, Pearl, Chebotko, Artem, Gates, Ann Q., and Salayandia, Leonardo (2009), "GEO-SEED: A Metadata Repository for Geosciences Web Service Discovery", presented and published in IEEE 2009 Third International Workshop on Scientific Workflows (SWF 2009), Los Angeles, CA, July 6-10.
August, 2010 UTEP Computer Science Dissertation Defense
45
August 12, 2010 UTEP Dissertation Defense 45
46
RESEARCH EFFORTS: USABILITY STUDY
47
Usability Study Overview
- 31 invitations sent to Geology faculty and students and Computer Science faculty and students (17 responses)
- Steps in the study:
  - Register
  - Log in
  - Submit a computational entity
  - Search for a computational entity
  - Add a user rating for an entity
  - Complete a survey rating the experience
48
Overall: GEO-SEED would be a useful tool for sharing
49
Other Usability Study Results
50
Descriptive Statistical Analysis
- BINOM-DIST
  - Grouped responses into two groups: Strongly Disagree + Disagree, and Agree + Strongly Agree
  - Checked whether p-values were below 0.05
- t Test
  - Used four groups: Strongly Disagree, Disagree, Agree, Strongly Agree
  - Checked whether p-values were below 0.05
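The two-group binomial analysis above can be sketched as follows. This is an illustration under stated assumptions, not the study's actual procedure: the slides do not give the null proportion, the sidedness of the test, or the per-question response counts, so the null proportion of 0.5, the one-sided upper tail, and the 17 sample responses below are all hypothetical. The function `binom_upper_tail` is a helper defined here; it uses `math.comb`, which requires Python 3.8 or later.

```python
from math import comb

# Responses counted as agreement after collapsing the four-point scale.
AGREE = {"Agree", "Strongly Agree"}


def binom_upper_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(successes, n + 1)
    )


# Hypothetical responses from 17 participants to one survey statement.
responses = (
    ["Strongly Agree"] * 5
    + ["Agree"] * 8
    + ["Disagree"] * 3
    + ["Strongly Disagree"] * 1
)

agree_count = sum(r in AGREE for r in responses)
p_value = binom_upper_tail(agree_count, len(responses))

# A statement is treated as supported when the p-value falls below 0.05.
supported = p_value < 0.05
print(agree_count, len(responses), round(p_value, 4), supported)
```

Under these assumptions, a small p-value means that the observed share of agreeing respondents would be unlikely if agreement and disagreement were equally probable.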
51
Statistical Results of BINOM-DIST and t test (Meeting Goal)
Register, log in, getting started, change, help | BINOM DIST p value | t test p value | t test mean | Assessment
experienced no difficulty registering as a GEO-SEED user | 0.02 | | 1.07 | Support
experienced no difficulty logging in to GEO-SEED | 0.04 | 0.004 | 1.13 | Support
clear how to get started to register[submit] my information in GEO-SEED | 0.15 | 0.44 | 0.08 | Not Support
able to add to or change existing information in GEO-SEED | 0.39 | 0.23 | 0.36 | Not Support
easy to learn to use the system | 0.40 | 0.24 | 0.31 | Not Support
adequate information provided how to use the system | 0.50 | 0.63 | 0.37 | Not Support
53
Rate features of GEO-SEED Wiki Helpfulness | BINOM DIST p value | t test p value | t test mean | Assessment
successfully searched | 0.004 | 0.006 | 1.27 | Support
able to understand the fields | 0.21 | 0.13 | 0.50 | Not Support
able to find help when needed | 0.13 | 0.24 | 0.40 | Not Support
sharing processes a Geologist uses | 0.02 | 0.01 | 1.00 | Support
sharing existing datasets information | 0.02 | 0.01 | 1.00 | Support
sharing geology application software | 0.02 | 0.01 | 1.00 | Support
locating web portals a Geologist might use | 0.05 | | 0.83 | Support
helpful for Geology students | 0.03 | 0.02 | 0.93 | Support
55
Statistical Results of BINOM-DIST and t test (Meeting Goal)
Degree of difficulty entering new information in GEO-SEED profiles | BINOM DIST p value | t test p value | t test mean | Assessment
General Information | 0.29 | 0.21 | 0.23 | Not Support
Invocation Information | 0.39 | 0.29 | 0.17 | Not Support
Deployment Information | 0.19 | 0.13 | 0.33 | Not Support
Quality of Service Information | 0.61 | 0.50 | 0.00 | Not Support
Implementation Information | 0.61 | 0.50 | 0.00 | Not Support
Domain Knowledge Information | 0.50 | 0.40 | 0.08 | Not Support
56
Data set for Performance Study (GeneralProfile & QoSProfile)
"gridding", "name of a web service", "author1", "contact1", "contributor1", "description1"
57
Dr. Pearl W. Brazier
August 12, 2010
Photo captions: Summer 2010, Spring 2010, Cactus from El Paso 2005
58
"url1", "identifier1", "version1", "releaseDate1", "language1", "cost1", "license1", "support1", "5", "1.0", "0.8", "250 ms", "40", "0", "unknown", "5", "User wrote something
59
Sample Data Sets
SPARQL to SQL translations for Q1-Q6
Available in Appendix E of the dissertation
60
Ontology Descriptors Sources
- Semantic Markup for Web Services (OWL-S)
- Semantic Web Services Ontology (SWSO)
- Web Services Description Language (WSDL)
- Web Service Semantics (WSDL-S)
- Additional Web services descriptors
61
Succulent Plant Ontology