Semantic Research Grid. Open Grid Forum Web 2.0 Workshop, OGF21, Seattle, Washington, October 15, 2007. Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu, Ahmet E. Topcu.

Semantic Research Grid
Open Grid Forum Web 2.0 Workshop, OGF21, Seattle, Washington, October 15, 2007
Geoffrey Fox, Aurel Cami, Ahmet Fatih Mustacoglu, Ahmet E. Topcu
Community Grids Laboratory, Indiana University, Bloomington, IN

[Figure: Semantic Scholars Grid mashup diagram. Labels: Existing User Interface; Existing Document based Tools (Google Scholar, Manuscript Central, Science.gov, Windows Live Academic Search, Citeseer, CMT Conference Management); Web service Wrappers; New Document-enhanced Research Tools; Integration/Enhancement User Interface; Community Tools; Generic Document Tools; MyResearch Database; Bibliographic Database; Export: RSS, Bibtex, Endnote etc.; CiteULike, Connotea, Del.icio.us, Bibsonomy, Biolicious; PubChem, PubMed; Traditional Grid Cyberinfrastructure; MySpace; Web 2.0; MASHUP]

Delicious Semantic Web/Grid
- Purchased by Yahoo for ~$30M (Nature)
- Associate metadata with bookmarks specified by URLs and DOIs (Digital Object Identifiers)
- Users add comments and keywords (called tags)
- Users are linked together into groups (communities)
- Information such as title and authors is extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.)
- Bibtex-like additional information in CiteULike
- This is perhaps the de facto Semantic Web, remarkable for its simplicity

Example: Parallel Computing Collection selected on Cell Tag
- So far there is no clear “winner” in the tagging space
- Maybe CiteULike, with its different metadata, is better
- How do I preserve my investment?

General Document Semantic Analysis
- Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata
  - Title, author and institution of documents
  - Citations with their own metadata, allowing one to match to other documents
- These capabilities are sure to become more powerful and to be extended
  - Give a “Citation Index” in real time
  - Tell you all authors of all papers that cite a paper that cites you etc. (note that it is a small world, so don’t go too far in link analysis)
  - Tell you all citations of all papers in a workshop
  - Help a journal editor by suggesting referees based on document analysis, or by doing a “plagiarism” analysis that scores comparison with other Internet documents

Possible challenges
- Use of Web 2.0 tools in science (and business) is very promising, but adoption is currently small
- Which of the many tools will be popular with your colleagues?
- What happens if the tool you chose is not adopted or, worse, just disappears in an industry “shake-up”?
- How best to integrate web-tagged documents with Word and Latex citations?
- Need to tag URIs, e.g. database entries, not just URLs (did for journal control system)
- Is the current security model sufficient?
- Can we link the virtual organization of the tagging system with that of other Cyberinfrastructure/Web 2.0 subsystems?

Roughly what we are doing
- We are NOT building a new tagging or search system
- We are building tools that integrate and add value to existing systems
- We built a mashup linking to del.icio.us, CiteULike and Connotea, allowing exchange of tags between sites and between local repositories
- Repositories also link to local sources (PubsOnline), Google Scholar (GS) and Windows Live Academic (WLA)
  - GS provides the number of cited publications; WLA provides the Digital Object Identifier (DOI)
- We implement a rather more powerful access control mechanism
- We build heuristic tools to mine “web lists” for citations
- We have an “event” based architecture (consistency model) allowing change actions to be preserved and selectively changed
  - Supports integrating different, inconsistent views of a given document and its updates on different tagging systems

[Screenshots: del.icio.us tags; download of del.icio.us tags to the local system]

Semantic Research Grid (SRG) Architecture

Key Concepts of System Architecture
- Digital Entity (DE): a digital collection of metadata for a citation
- Event: a time-stamped action on a digital entity. Our event-based model consists of:
  - Major Events: insertion or deletion of a digital entity
  - Minor Events: modifications to an existing digital entity
- Dataset: collection of major and minor events
- Service-based Framework (SOAP over HTTP)
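The definitions above map naturally onto a small event log. The following is a minimal sketch, not the SRG implementation: the class name `Event`, the function `replay`, and the field names are hypothetical, and SOAP transport is left out entirely. It shows how a dataset of time-stamped major and minor events can be replayed to rebuild a digital entity, and how stopping the replay early gives a rollback.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional

MAJOR_KINDS = {"insert", "delete"}  # major events: a DE is inserted or deleted


@dataclass(frozen=True)
class Event:
    """A time-stamped action on one digital entity (names are illustrative)."""
    entity_id: str
    kind: str                 # "insert", "delete" (major) or "modify" (minor)
    changes: Dict[str, str]   # metadata fields carried by the event
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    @property
    def is_major(self) -> bool:
        return self.kind in MAJOR_KINDS


def replay(events: List[Event], upto: Optional[int] = None) -> Optional[Dict[str, str]]:
    """Rebuild a DE's metadata from the first `upto` events (all if None)."""
    state: Optional[Dict[str, str]] = None
    for event in events[:upto]:
        if event.kind == "insert":
            state = dict(event.changes)
        elif event.kind == "delete":
            state = None
        elif event.kind == "modify" and state is not None:
            state.update(event.changes)
    return state


# A dataset: the collection of major and minor events for one DE.
dataset = [
    Event("de-1", "insert", {"title": "Semantic Research Grid", "year": "2007"}),
    Event("de-1", "modify", {"tags": "grid,web2.0"}),
    Event("de-1", "modify", {"title": "Semantic Research Grid (SRG)"}),
]

current = replay(dataset)             # state after every event
rolled_back = replay(dataset, upto=2)  # rollback: ignore the last event
```

Because the entity is derived from its events rather than stored as a single mutable record, history and rollback come for free in this sketch; a real system would also need persistence and per-user views.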

Example Subsystem
- Transfer (download/upload)
- Modify Digital Entity (DE)
- Share DE with other users
- Add/get more info on a DE
- History (as a set of events) of a DE, and rollback
[Diagram labels: CiteULike, Delicious, Connotea; Research Database (three instances); Core Web Services]

SRG System Modules I
- Digital Entity (DE) Management Service
  - Manual DE entry into the system
  - DE history
  - DE versioning and flexible choices (rollback)
  - Editing and more info tools for a DE (Update Model)
- Session and Event Management Services
  - Event and dataset management
  - DE view options
  - User credentials (username/password), cookie-based
- Annotation Tools Service
  - Transfer Service
  - Download Service
  - Upload Service
  - Extract DE and tags from web lists

SRG System Modules II
- Search Tools Services
  - Google Scholar/Windows Live Academic
  - Google Scholar Advanced
  - Local Database Search: via integrated PubsOnline Tool from Indiana University
  - My Research Database
  - My Research Database Advanced
- Authentication and Authorization Services
  - Login and Logout service
  - DE access rights management
  - Database access rights management
  - Administrative tools
- Other Services
  - User Registration
  - Username and password recovery
  - User’s Profile Management
  - DE metadata view options

Technical Issues
- Event-based model
  - Manipulating data and metadata
- How to build the event-based model?
  - Major and Minor events
  - Datasets (collection of minor events)
- How to apply the event-based model?
  - How to apply modifications to a record (Digital Entity)?
    - Keep them in the user’s session and let the user apply them
    - Or apply them automatically to a DE
  - How to merge metadata fields of Event and Digital Entity?
    - Identification of metadata fields as dynamic or static fields
- How to apply the service-based framework as a wrapper?
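The slide leaves open how static and dynamic fields behave during a merge. As one possible reading, and purely as an assumption, the sketch below treats static fields as fixed once set and lets dynamic fields take the value carried by the event. The names `STATIC_FIELDS` and `merge_event` and the choice of which fields count as static are all hypothetical.

```python
# Assumption: these fields are "static" (never overwritten once present);
# every other field is "dynamic" and follows the latest event.
STATIC_FIELDS = frozenset({"title", "authors", "year"})


def merge_event(entity: dict, changes: dict, static_fields=STATIC_FIELDS) -> dict:
    """Return a new DE with an event's fields merged in; the input is not mutated."""
    merged = dict(entity)
    for name, value in changes.items():
        if name in static_fields and name in merged:
            continue  # keep the existing static value
        merged[name] = value
    return merged


entity = {"title": "SRG", "year": "2007", "tags": "grid"}
merged = merge_event(entity, {"title": "srg", "tags": "grid,web2.0", "cited_by": "12"})
```

Whether the merge runs automatically or waits in the user's session for confirmation, the other question on the slide, is independent of this function: the caller decides when to invoke it.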

Some Recent Features of SRG
- Hybrid Consistency Framework implementation
  - Data-centric strict consistency model
  - Implements a primary-copy based consistency protocol
  - Pull-based: time-based consistency approach. Communicates with Annotation Tools to collect updates periodically
  - Push-based: updates are distributed to Annotation Tools as soon as they occur on the primary copy
- Periodic Search Tools implementation
  - Search, compare and apply the updates made to a Digital Entity (DE) in the system
- Unique (128-bit) UUID assignment for each Digital Entity
- User Tags view in the system
  - Displays all tags belonging to a user
  - Allows easy update or more-info requests on a Digital Entity by tags
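The push and pull paths described above can be illustrated with an in-memory toy. This is a sketch under stated assumptions, not the SRG code: `AnnotationTool` and `PrimaryCopy` are hypothetical stand-ins, the real annotation sites are reached through web services rather than method calls, and `pull` would be driven by a timer instead of being called by hand. Only `uuid.uuid4()` is a real library call; it yields a random 128-bit identifier.

```python
import uuid


class AnnotationTool:
    """Stand-in for an external tagging site holding a replica (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.records = {}   # entity id -> metadata as seen by this tool
        self.pending = []   # edits made on the tool, not yet seen by the primary

    def receive(self, entity_id, metadata):
        self.records[entity_id] = dict(metadata)

    def edit(self, entity_id, changes):
        self.records.setdefault(entity_id, {}).update(changes)
        self.pending.append((entity_id, dict(changes)))

    def drain(self):
        updates, self.pending = self.pending, []
        return updates


class PrimaryCopy:
    """Holds the authoritative copy of every digital entity."""

    def __init__(self, tools):
        self.tools = tools
        self.entities = {}

    def create(self, metadata):
        entity_id = str(uuid.uuid4())  # 128-bit identifier for the new DE
        self.entities[entity_id] = dict(metadata)
        self.push(entity_id)
        return entity_id

    def update(self, entity_id, changes):
        self.entities[entity_id].update(changes)
        self.push(entity_id)  # push-based: propagate immediately

    def push(self, entity_id):
        for tool in self.tools:
            tool.receive(entity_id, self.entities[entity_id])

    def pull(self):
        """Pull-based: collect tool-side edits; intended to run periodically."""
        for tool in self.tools:
            for entity_id, changes in tool.drain():
                if entity_id in self.entities:
                    self.update(entity_id, changes)


delicious = AnnotationTool("del.icio.us")
citeulike = AnnotationTool("CiteULike")
primary = PrimaryCopy([delicious, citeulike])

de_id = primary.create({"title": "Semantic Research Grid"})
citeulike.edit(de_id, {"tags": "grid"})  # change made on one tool only
primary.pull()                           # primary learns of it, then pushes it out
```

After the pull, the edit made on one tool has reached the primary copy and, through the push that follows every primary update, the other tool as well.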

Hybrid Consistency Framework for Semantic Research Grid

Tool Updating Database from Web Page

Metadata Collection from CGL web pages
The aim is to:
- Eliminate duplicate data entry on different web platforms
- Build richer metadata in SRG, using Digital Entities collected from web pages as a base
- Share new Digital Entities with other tools and users in SRG
- Push newly collected Digital Entities to other communities using Web 2.0 features

Methodology for Collection
- Collect:
  - Digital Entities in Community Grid Publication web pages
- Analyze:
  - Using a heuristic methodology to extract metadata fields of the Digital Entities for CGL publications
- Build:
  - RSS objects using collected Digital Entities
  - New tags using collected Digital Entities
- Compare:
  - Collected Digital Entities from CGL web pages with the existing Digital Entities in SRG. If they are:
    - different: store new Digital Entities in SRG storage
    - same: option to update tags and other fields
- Share:
  - New Digital Entities with other tools using SRG
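The Compare step can be sketched as follows. The slides do not say how two Digital Entities are judged to be the same, so this example assumes, purely for illustration, that they match when their titles agree after lower-casing and stripping punctuation. The function names `normalize_title` and `reconcile` are hypothetical.

```python
def normalize_title(title: str) -> str:
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def reconcile(collected, existing):
    """Split collected DEs into new ones and (existing, collected) matches.

    Assumption: title equality after normalization means "same" entity.
    """
    index = {normalize_title(e["title"]): e for e in existing}
    new_entities, matches = [], []
    for entity in collected:
        key = normalize_title(entity["title"])
        if key in index:
            matches.append((index[key], entity))  # same: offer tag/field update
        else:
            new_entities.append(entity)           # different: store in SRG
            index[key] = entity                   # avoid storing it twice
    return new_entities, matches


existing = [{"title": "Semantic Research Grid", "tags": ["grid"]}]
collected = [
    {"title": "semantic  research grid.", "tags": ["web2.0"]},
    {"title": "Web 2.0 and Grids", "tags": []},
]
new_entities, matches = reconcile(collected, existing)
```

Here the first collected entry is treated as a match for the stored one, so the user could be offered its tags, while the second is reported as new.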

Security Model
- Security in Web 2.0 can be limited
- We implement a simple but more powerful security model around local tools that wrap Web 2.0 systems
- We used an access-control matrix model to provide security for our information system
  - Supports multiple groups and multiple users for each object
- Similar to the UNIX file system
  - The Unix RWX bits correspond to the Read, Write and Execute operations for each file and directory
  - In SRG, a DE (Digital Entity) corresponds to the file element and a folder corresponds to the directory element
  - For each DE and folder, three types of access rights are defined in the system: Read, Write and Delete
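An access-control matrix with the three rights named above can be sketched in a few lines. This is an illustration only: the class `AccessMatrix`, its methods, and the convention of prefixing group subjects with `group:` are assumptions, not SRG's interface. The sketch grants a user the union of the rights held directly and through any group.

```python
from enum import Flag, auto


class Right(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()


class AccessMatrix:
    """Rows are subjects (users or groups); columns are DEs or folders."""

    def __init__(self):
        self._cells = {}    # (subject, object_id) -> Right
        self._groups = {}   # group name -> set of user names

    def add_to_group(self, group, user):
        self._groups.setdefault(group, set()).add(user)

    def grant(self, subject, object_id, rights):
        key = (subject, object_id)
        self._cells[key] = self._cells.get(key, Right.NONE) | rights

    def allowed(self, user, object_id, right):
        # Effective rights: the user's own cell plus the cells of their groups.
        subjects = [user] + [g for g, members in self._groups.items() if user in members]
        effective = Right.NONE
        for subject in subjects:
            effective |= self._cells.get((subject, object_id), Right.NONE)
        return right in effective


matrix = AccessMatrix()
matrix.add_to_group("group:cgl", "alice")
matrix.grant("group:cgl", "de-1", Right.READ)   # every group member may read
matrix.grant("alice", "de-1", Right.WRITE)      # alice may also write
```

Unlike the fixed owner/group/other triple of Unix permissions, a matrix keyed by subject allows any number of users and groups per object, which is the property the slide highlights.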

Security Model II
- We have a security model that supports levels of authorization
  - Roles are defined as Super Administrator (SA), Group Administrator (GA) and User (U)
  - The system allows more than one SA. An existing SA can add other SAs to the system
  - An SA can assign any U to become a GA, and can remove a GA from a group. Each group should have at least one GA
  - A GA can add/remove a U from the group
- User profile
  - Share user profile between Web 2.0 sites
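The role rules above can be written down as checks on each operation. The sketch below is hypothetical throughout (class `Roles`, exception `RoleError`, and all method names are invented for illustration), and it adds one assumption the slide does not state: a GA must be demoted before being removed from a group.

```python
class RoleError(Exception):
    """Raised when an actor lacks the role an operation requires."""


class Roles:
    def __init__(self, first_super_admin):
        self.super_admins = {first_super_admin}
        self.group_admins = {}   # group -> set of GA user names
        self.members = {}        # group -> set of all member user names

    @staticmethod
    def _require(condition, message):
        if not condition:
            raise RoleError(message)

    def add_super_admin(self, actor, user):
        self._require(actor in self.super_admins, "only an SA can add an SA")
        self.super_admins.add(user)

    def make_group_admin(self, actor, group, user):
        self._require(actor in self.super_admins, "only an SA can assign a GA")
        self.group_admins.setdefault(group, set()).add(user)
        self.members.setdefault(group, set()).add(user)

    def remove_group_admin(self, actor, group, user):
        self._require(actor in self.super_admins, "only an SA can remove a GA")
        admins = self.group_admins.get(group, set())
        self._require(admins != {user}, "each group needs at least one GA")
        admins.discard(user)

    def add_user(self, actor, group, user):
        self._require(actor in self.group_admins.get(group, set()),
                      "only a GA of the group can add users")
        self.members[group].add(user)

    def remove_user(self, actor, group, user):
        admins = self.group_admins.get(group, set())
        self._require(actor in admins, "only a GA of the group can remove users")
        # Assumption: a GA has to be demoted by an SA before removal.
        self._require(user not in admins, "demote a GA before removing them")
        self.members[group].discard(user)


roles = Roles("alice")                         # alice is the first SA
roles.add_super_admin("alice", "bob")          # an SA adds another SA
roles.make_group_admin("bob", "cgl", "carol")  # an SA promotes a user to GA
roles.add_user("carol", "cgl", "dave")         # the GA adds a user to the group
```

Removing the only GA of a group raises `RoleError`, which is how this sketch enforces the "at least one GA" rule.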

Current Usage of Semantic Research Grid Project
- We have used and tested the Semantic Research Grid (SRG) prototype on published scientific research publications in the Community Grids Lab (CGL) at Indiana University
- In CGL, 20 students, post-docs and faculty members are testing it
- They are using the prototype to collect publications, upload and download them, and share them with other users

Summary
- Integration
  - We have successfully integrated the Google Scholar and Windows Live Academic search tools and the CiteULike, Delicious and Connotea annotation tools into a system that allows dynamic publication
- Flexibility and Extensibility
  - We provide flexibility by allowing integration of different tools that have common metadata
  - Easy to add to and extend the service mechanism
- Management and Consistency Scheme of Digital Entities
  - Allows the manipulation of a digital entity
  - Applies an event-based model based on the concepts of major events, minor events and datasets
  - Provides a rollback feature to:
    - Support a history tool for a DE
    - Merge and change the content of a digital entity
- A service-based framework for using existing annotation tools through web services
- Prototype project web site:

Domain Specific Semantic Document Analysis
- It is natural to develop core document services such as those used in Citeseer/Google Scholar, but applied to “your” documents of interest that may not have been processed yet
  - As just submitted to a conference, perhaps
  - These tools can help form useful lists, such as authors of all cited or submitted papers to a journal
- OSCAR3 (from Peter Murray-Rust’s group at Cambridge) augments the application-independent “core” metadata (title, authors, institutions, citations) with a list of all chemical terms
  - This tool is a service that can be applied to “your” document or to a set of documents harvested in some fashion
- Luis Rocha has developed related ideas for Biology
- Other fields have natural application-specific metadata, and OSCAR-like tools can be developed for them
- This is another Semantic Scholar Grid tool

OSCAR3 Chemistry Document Analysis
- It detects “magic” chemical strings in text and then:
  - Stores them as metadata associated with the document
  - Queries ChemInformatics repositories to tell you lots of information about identified compounds
  - Tells you which other documents have this compound

Initial Results from OSCAR on PubMed
- We have a small sample (100) of full-text Chemistry papers selected at random from 15 years of PubMed, with over 5 million abstracts
- OSCAR3 generates 4.17 compound names per abstract and 36.7 compound names per full text
  - 555,007 PubMed abstracts of 2005-2006 (part) used for abstracts (on Big Red)
- Illustrates how much knowledge journal publishers are hiding from us

CICC Chemical Informatics Cyberinfrastructure Collaboratory
[Workflow diagram labels: PubMed Database; OSCAR Text Analysis; POV-Ray Parallel Rendering; Initial 3D Structure Calculation; Toxicity Filtering; Cluster Grouping; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; IU’s Varuna Database; NIH PubChem Database; PubChem Database; MOAD Database]
- Product databases are wrapped with Web service interfaces and are suitable for inclusion in Taverna workflows
- Integrating document (OSCAR) and conventional services on the IU Big Red Supercomputer