Tom Oinn


In general a grid system is, or should be: “A collection of resources able to act collaboratively in pursuit of an overall objective.” A life science grid is therefore: “A collection of resources able to act collaboratively to solve a problem in the life science domain.”

• Massive diversity of:
  ◦ Information classes
  ◦ Services
  ◦ Data
  ◦ Problems
• Relatively small data sizes
• Relatively small computational load
• The challenge is complexity and heterogeneity
• Much scientific work is exploratory
  ◦ The environment must be flexible and easy to reconfigure
  ◦ The environment must provide facilities for provenance capture

• Existing diverse services
  ◦ Web-based, SOAP services, custom protocols such as BioMoby, etc.
• Existing data resources
  ◦ Relational, unstructured flat file, XML
  ◦ May or may not be exposed through some kind of service interface, e.g. SRS, BioMart
• Existing user communities
  ◦ Large, well-funded service and research projects with substantial IT support
  ◦ Small groups with no IT support and little funding, but interesting problems

• Experts in their domain
• Little or no experience with distributed computing
  ◦ Most bioinformaticians are not computer scientists
  ◦ Generally not supported by dedicated CS groups
• Need to allow these users to make use of their existing expertise but remove concerns such as:
  ◦ Parallelism
  ◦ Distributed programming
  ◦ Fault recovery
  ◦ Job dispatch and submission
  ◦ Provenance capture
  ◦ Logging and auditing

“A collection of existing legacy and novel tools and databases, exposed through a variety of technologies, able to act collaboratively to solve a problem posed by an ‘IT naïve’ user in the life science domain, across the public internet, with little or no technological support and as inexpensively as possible.” Users typically have no control over the services (which are provided by third parties), so we create a client-side integration platform. It should be accessible to an unsupported PhD student with standard networking, a three-year-old PC and no dedicated IT support.

A ‘super client’ to a variety of disparate services on both the intranet and the internet

• Project homepage:
• myGrid project page:
• OMII-UK home:
• Alberto’s Taverna + EBI mini tutorial:

• Taverna is:
  ◦ A workflow language based on a dataflow model
  ◦ A graphical editing environment for that language
  ◦ An invocation system to run instances of that language on data supplied by a user of the system
• When you download it, you get all of this rolled into a single piece of desktop software
• The enactor can be run independently of the GUI
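To make “a workflow language based on a dataflow model” with a separate enactor a little more concrete, here is a minimal Python sketch. It is not Taverna’s workflow language or engine: the `Processor` class and `enact` function are hypothetical names for illustration. A workflow is a set of processors wired by data links, and the enactor simply runs each processor once the values it depends on exist.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Processor:
    """Hypothetical workflow step: a function plus the data links feeding its parameters."""
    name: str
    func: Callable[..., object]
    inputs: Dict[str, str]  # parameter name -> upstream processor name or workflow input name


def enact(processors: List[Processor], workflow_inputs: Dict[str, object]) -> Dict[str, object]:
    """Run every processor in dataflow order (assumes processor names differ from input names)."""
    values: Dict[str, object] = dict(workflow_inputs)
    by_name = {p.name: p for p in processors}
    # Only links between processors constrain the order; workflow inputs are already available.
    graph = {p.name: {src for src in p.inputs.values() if src in by_name} for p in processors}
    for name in TopologicalSorter(graph).static_order():
        proc = by_name[name]
        kwargs = {param: values[src] for param, src in proc.inputs.items()}
        values[name] = proc.func(**kwargs)
    return values


# A two-step workflow: upper-case a sequence, then measure its length.
workflow = [
    Processor("length", lambda s: len(s), {"s": "upper"}),
    Processor("upper", lambda seq: seq.upper(), {"seq": "sequence"}),
]
result = enact(workflow, {"sequence": "acgt"})
```

The processors are listed out of order on purpose: the enactor derives the execution order from the data links alone, which is the essential property of a dataflow model.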

• By default, Taverna can interoperate with the following:
  ◦ SOAP-based web services
  ◦ BioMart data warehouses
  ◦ Soaplab-wrapped command-line tools
  ◦ BioMoby services and object constructors
  ◦ Inline interpreted scripting (Java-based)
• Other service classes can be added through an extension point (but you probably don’t need to)
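The extension point for new service classes can be pictured as a registry that plugins add themselves to. The sketch below is a generic plugin-registry pattern in Python, not Taverna’s actual (Java) extension mechanism; `register_service_class`, `make_invoker` and the toy `ConstantInvoker` plugin are all hypothetical.

```python
from typing import Callable, Dict, Protocol


class ServiceInvoker(Protocol):
    def invoke(self, inputs: Dict[str, object]) -> Dict[str, object]: ...


# Hypothetical extension point: service-class name -> factory building an invoker from config.
_SERVICE_CLASSES: Dict[str, Callable[[dict], ServiceInvoker]] = {}


def register_service_class(kind: str) -> Callable:
    """Decorator a third-party plugin would use to contribute a new service class."""
    def decorator(factory):
        _SERVICE_CLASSES[kind] = factory
        return factory
    return decorator


def make_invoker(kind: str, config: dict) -> ServiceInvoker:
    """Look up the plugin for a service class and build an invoker from its configuration."""
    factory = _SERVICE_CLASSES.get(kind)
    if factory is None:
        raise ValueError(f"no plugin registered for service class {kind!r}")
    return factory(config)


@register_service_class("constant")
class ConstantInvoker:
    """Toy plugin: ignores its inputs and returns a configured value."""

    def __init__(self, config: dict):
        self.value = config["value"]

    def invoke(self, inputs: Dict[str, object]) -> Dict[str, object]:
        return {"value": self.value}
```

The core engine only ever calls `make_invoker`, so a new service class can be contributed without modifying the engine itself.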

(Figure labels: Document builders → Service invocation (creates job) → Polling loop (check status, fail if not ready) → Get results)
• Add the service to the services list by pointing Taverna at a Web Services Description Language (WSDL) document online
• Taverna inspects the WSDL and extracts its operations
• Add operations to the workflow; right-click to automatically add document builders and splitters for doc/literal style services
• Use a nested workflow to define the polling logic: the sub-workflow fails, waits and retries if the data is not ready
*SOAP is the Simple Object Access Protocol.
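The submit / poll / get-results pattern above, with a nested step that fails, waits and retries, can be sketched in Python as follows. `FakeJobService` is a hypothetical local stand-in for a SOAP service with `submit`, `status` and `results` operations; a real service would be called through its WSDL-described operations instead, and the retry count and delay are illustrative.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class NotReadyError(Exception):
    """Raised by the polling step when the remote job has not finished yet."""


def retry(step: Callable[[], T], max_attempts: int = 5, delay_s: float = 0.0) -> T:
    """Run step, waiting and retrying while it reports that data is not ready."""
    for attempt in range(max_attempts):
        try:
            return step()
        except NotReadyError:
            if attempt == max_attempts - 1:
                raise  # give up: propagate the failure to the enclosing workflow
            time.sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")


class FakeJobService:
    """Hypothetical stand-in for a SOAP service with submit/status/results operations."""

    def __init__(self, ready_after: int):
        self._ready_after = ready_after  # number of status polls before the job reports DONE
        self._polls = 0
        self._document: Optional[dict] = None

    def submit(self, document: dict) -> str:
        self._document = document
        return "job-1"

    def status(self, job_id: str) -> str:
        self._polls += 1
        return "DONE" if self._polls >= self._ready_after else "RUNNING"

    def results(self, job_id: str) -> dict:
        return {"job": job_id, "input": self._document}


def run_job(service: FakeJobService, document: dict, max_attempts: int = 10) -> dict:
    job_id = service.submit(document)          # service invocation (creates job)

    def poll() -> None:
        if service.status(job_id) != "DONE":   # check status, fail if not ready
            raise NotReadyError(job_id)

    retry(poll, max_attempts=max_attempts)     # nested "sub-workflow" with retries
    return service.results(job_id)             # get results
```

Keeping the retry logic in its own step mirrors the slide’s design: the polling sub-workflow is allowed to fail, and only the wrapper decides whether to wait and try again.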

(Figure labels: Soaplab server in services list; individual tool within category; Soaplab services support rich descriptive metadata)
Soaplab services are added to the services palette by pointing Taverna at the root of the Soaplab installation. Individual services within that server are grouped and displayed by category. The services support polling and provide links to their metadata directly within Taverna.

• BioMoby provides semantic descriptions of services
• Taverna can use these to assist with service composition at design time
• All of this is provided by the Moby team: Taverna’s extension architecture allows third-party developers to contribute in a loosely coupled way
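One way semantic service descriptions can assist composition at design time is by suggesting which services can consume the output of the current step. The sketch below assumes each service declares an input and an output data type drawn from an is-a hierarchy. The type names, service names and matching rule are illustrative assumptions, not the real Moby ontology or its API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical is-a hierarchy of data types (child -> parent); not the real Moby ontology.
IS_A: Dict[str, Optional[str]] = {
    "Object": None,
    "GenericSequence": "Object",
    "DNASequence": "GenericSequence",
    "ProteinSequence": "GenericSequence",
}


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    input_type: str
    output_type: str


def is_a(type_name: str, ancestor: str) -> bool:
    """True if type_name equals ancestor or inherits from it."""
    current: Optional[str] = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A[current]
    return False


def suggest_next(output_type: str, registry: List[ServiceDescription]) -> List[str]:
    """Services that accept output_type, i.e. whose input type is the same or more general."""
    return [s.name for s in registry if is_a(output_type, s.input_type)]


registry = [
    ServiceDescription("translate", "DNASequence", "ProteinSequence"),
    ServiceDescription("sequence_length", "GenericSequence", "Object"),
    ServiceDescription("blastp", "ProteinSequence", "Object"),
]
print(suggest_next("DNASequence", registry))  # ['translate', 'sequence_length']
```

Because `sequence_length` accepts any `GenericSequence`, it is offered for DNA input even though its description never mentions DNA; that inference is what the semantic descriptions add over plain name matching.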

• Service discovery
  ◦ Free-text search over ‘known’ services
  ◦ Semantic search over a service repository; this relies on manual service annotation and submission of those annotations to the repository
• Provenance tracking
  ◦ Lineage tracking of result data
  ◦ Automatic semantic annotation of data from service annotations
  ◦ Possible because the workflow engine creates a ‘managed environment’ with an overview of all data movement
• Result visualization
  ◦ Common renderers in the base distribution include 3D structure, image and graph rendering
• Extensibility
  ◦ New service classes
  ◦ New renderer types
  ◦ New UI elements
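Lineage tracking is feasible because the engine sees every value that moves between steps. A minimal sketch of the idea, assuming the engine calls a hypothetical `record` method each time a processor produces an output; the `ProvenanceStore` class and the value identifiers are illustrative, not Taverna’s provenance model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ProvenanceStore:
    """Records, for every value the engine produces, which processor made it and from what."""
    derived_from: Dict[str, List[str]] = field(default_factory=dict)
    produced_by: Dict[str, str] = field(default_factory=dict)

    def record(self, output_id: str, processor: str, input_ids: List[str]) -> None:
        self.produced_by[output_id] = processor
        self.derived_from[output_id] = list(input_ids)

    def lineage(self, value_id: str) -> Set[str]:
        """All upstream value ids that value_id was (transitively) derived from."""
        seen: Set[str] = set()
        stack = list(self.derived_from.get(value_id, []))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.derived_from.get(current, []))
        return seen


store = ProvenanceStore()
store.record("protein", "translate", ["dna"])
store.record("hits", "blastp", ["protein", "db_version"])
print(sorted(store.lineage("hits")))  # ['db_version', 'dna', 'protein']
```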

• Funded through the Open Middleware Infrastructure Institute (OMII-UK) as part of the myGrid project run by Carole Goble
• Four years old, with funding secured through 2008 and beyond
• Development team at Manchester and Hinxton, UK
• Wide group of ‘friends and allies’ across the world, particularly within UK e-Science
• Implemented in Java, released under the LGPL licence

OMII-UK (Virtual Institute): top-level support, overall strategy, coordinated builds
• myGrid (group), Manchester & EBI: Taverna, myExperiment…
• OGSA-DAI (group), Edinburgh: OGSA-DQP
• OMII Stack (group), Southampton: OMII server-side stack, WS-Security etc., GridSAM job submission tool …
The groups provide areas of expertise, project management and per-project support; each group delivers software products.

• Science varies widely in scale, both in space (CPU cycles required, storage, number of services, etc.) and in time (duration of collaborations, stability of VO membership)
• Current grid infrastructure is focused on projects with large spatial and temporal scale
• Does this existing work map well to scientific problems with different characteristics, especially different temporal characteristics?
• What about security…?

• A workflow can access multiple resources
• These resources can have arbitrary security constraints
• It is likely that a given workflow requires more than one principal to be available in order to complete
• How can we make multiple security agents available to the workflow engine in a principled fashion?
  ◦ Define the basic unit of a virtual experiment, or fast virtual organization, to map directly to a peer group within a peer-to-peer framework
  ◦ The peer group contains a workflow instance along with any resources required to enact that instance, including arbitrarily many security agents, data stores, metadata stores, etc.
  ◦ Services accessed by the workflow may (and usually will) exist outside the peer group

(Figure: a Peer Group (Virtual Experiment) containing a workflow instance, a data manager and two security agents, one for User A and one for User B, each holding a set of credentials and a policy engine with its policy. Outside the peer group: external tools, data and services, e.g. Grid service, Web service, REST service, ...)
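The peer-group arrangement in the figure can be sketched as follows: each security agent holds one user’s credentials plus a policy, and the workflow asks the group, not any single user, for a credential for a given resource. Everything here is a hypothetical illustration (class names, resource names, the first-agent-that-agrees rule, and credentials modelled as plain strings); it is not a real security API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Policy: (workflow_id, resource) -> may this credential be used here?
Policy = Callable[[str, str], bool]


@dataclass
class SecurityAgent:
    """One user's credentials plus the policy that decides when they may be used."""
    owner: str
    credentials: Dict[str, str]  # resource -> opaque credential token
    policy: Policy

    def credential_for(self, workflow_id: str, resource: str) -> Optional[str]:
        if resource in self.credentials and self.policy(workflow_id, resource):
            return self.credentials[resource]
        return None


@dataclass
class PeerGroup:
    """A virtual experiment: one workflow instance plus the security agents it can call on."""
    workflow_id: str
    agents: List[SecurityAgent] = field(default_factory=list)

    def credential_for(self, resource: str) -> str:
        # Ask each agent in turn; the first one whose policy permits the access answers.
        for agent in self.agents:
            token = agent.credential_for(self.workflow_id, resource)
            if token is not None:
                return token
        raise PermissionError(f"no security agent in the group authorises {resource!r}")


alice = SecurityAgent("alice", {"uniprot-mirror": "alice-token"}, lambda wf, res: True)
bob = SecurityAgent("bob", {"hpc-cluster": "bob-token"}, lambda wf, res: wf == "ve-42")
group = PeerGroup("ve-42", [alice, bob])
```

In this sketch the workflow in `ve-42` needs both principals to complete: Alice’s agent supplies access to the mirror, while Bob’s agent releases the cluster credential only to this particular experiment.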

• A Virtual Experiment (VE) is created by constructing a new peer group within the P2P framework
• Resources such as workflow engines, data managers and security agents exist as factory services
• Each factory can construct a limited version of itself:
  ◦ Workflow engines with specific workflow definitions loaded
  ◦ Data managers with specific levels of storage space
  ◦ Security agents with policies restricting full use of their credentials
• These limited proxy objects connect to the peer group
  ◦ This is a secured operation, but as no delegation is involved, existing security mechanisms are adequate to get this far
• Factories may be on the intranet or internet (most likely for workflow services) or on the user’s workstation, PDA or cellphone (for security agents)
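The “factory constructs a limited version of itself” idea can be sketched for two of the resource types listed above. The classes below are hypothetical illustrations: a data-manager factory that hands out fixed storage allowances from its own capacity, and a security-agent factory whose proxies can never use more credentials than the full agent holds.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LimitedDataManager:
    """Proxy to a data store with a fixed storage allowance."""
    quota_bytes: int


class DataManagerFactory:
    """Hypothetical factory: carves limited data managers out of its own capacity."""

    def __init__(self, capacity_bytes: int):
        self.remaining_bytes = capacity_bytes

    def create_limited(self, quota_bytes: int) -> LimitedDataManager:
        if quota_bytes > self.remaining_bytes:
            raise ValueError("requested quota exceeds remaining capacity")
        self.remaining_bytes -= quota_bytes
        return LimitedDataManager(quota_bytes)


@dataclass(frozen=True)
class LimitedSecurityAgent:
    """Proxy that may use only a subset of its owner's credentials."""
    owner: str
    usable_resources: FrozenSet[str]


class SecurityAgentFactory:
    """Hypothetical factory on the user's own device, holding the full credential set."""

    def __init__(self, owner: str, resources: FrozenSet[str]):
        self.owner = owner
        self.resources = resources

    def create_limited(self, allowed: FrozenSet[str]) -> LimitedSecurityAgent:
        # The proxy never gets more than the full agent holds, whatever is requested.
        return LimitedSecurityAgent(self.owner, frozenset(allowed) & self.resources)
```

Only the restricted proxy joins the peer group; the full credential set stays with the factory, which is why no delegation is needed at this stage.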

• A VE becomes collaborative when more than one user can access the objects within the peer group
• A VE uses collaborative security when more than one user inserts a security agent into the peer group
• The peer group structure also allows multiple views on the same VE, as objects can exist in more than one peer group
  ◦ For example, you could split the workflow instance into a monitoring and steering component, give some users access to a peer group containing both, and give others access to one containing only the monitoring part
• The peer group has a unique identity, which can be used to discover it or register it with any available registry service
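The two definitions above (collaborative VE, collaborative security) and the “multiple views” example can be expressed as a small data model. This is a hypothetical sketch: objects are modelled as plain names, and the `VirtualExperiment` class is illustrative rather than part of any P2P framework.

```python
import uuid
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VirtualExperiment:
    """A peer group: who can reach it, what objects it holds, who supplied security agents."""
    members: Set[str] = field(default_factory=set)
    objects: Set[str] = field(default_factory=set)
    agent_owners: Set[str] = field(default_factory=set)
    group_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identity for registries

    def is_collaborative(self) -> bool:
        return len(self.members) > 1

    def uses_collaborative_security(self) -> bool:
        return len(self.agent_owners) > 1


# Two views on the same workflow instance: the "monitor" object lives in both groups.
steering_view = VirtualExperiment(
    members={"alice", "bob"}, objects={"monitor", "steer"}, agent_owners={"alice", "bob"}
)
monitoring_view = VirtualExperiment(
    members={"carol"}, objects={"monitor"}, agent_owners={"alice"}
)
```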

• Taverna2 is under development, with delivery expected by the end of 2007
• It is a rewrite of Taverna to support, amongst other things:
  ◦ Integration with grid technologies through a set of new extensibility points
  ◦ Transient VO management (short-lived virtual organizations, with lifetimes from 20 seconds upwards!)
  ◦ A more sophisticated computational model
  ◦ Massive scalability, pipelining of nested token streams, a single-threaded execution model and a transparent reference-passing architecture
  ◦ Monitoring and steering of running processes with arbitrary granularity through an extension point
• Implement extensions to interface to your grid
  ◦ Get a free, well-supported rich-client portal for non-expert users
  ◦ Access otherwise out-of-reach user communities
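“Pipelining of nested token streams” means that downstream steps can start on the first item of a (possibly nested) list before upstream steps have processed the whole list. The sketch below shows that behaviour with Python generators, each token carrying its index path within the nested input. It illustrates the concept only and is not Taverna2’s implementation; `tokens`, `stage` and the example steps are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Token = Tuple[Tuple[int, ...], Any]  # (index path within the nested input, value)


def tokens(value: Any, path: Tuple[int, ...] = ()) -> Iterator[Token]:
    """Emit the leaf values of a nested list one at a time, tagged with their index path."""
    if isinstance(value, list):
        for i, item in enumerate(value):
            yield from tokens(item, path + (i,))
    else:
        yield path, value


def stage(func: Callable[[Any], Any], stream: Iterable[Token]) -> Iterator[Token]:
    """Apply func to each token as soon as it arrives, preserving its index path."""
    for path, value in stream:
        yield path, func(value)


events: List[Tuple[str, Any]] = []  # records the order in which steps actually run


def upper(v: str) -> str:
    events.append(("upper", v))
    return v.upper()


def length(v: str) -> int:
    events.append(("length", v))
    return len(v)


pipeline = stage(length, stage(upper, tokens([["ac", "g"], ["tttt"]])))
result = dict(pipeline)  # {(0, 0): 2, (0, 1): 1, (1, 0): 4}
```

After running, `events` alternates between the two steps (`upper "ac"`, `length "AC"`, `upper "g"`, ...), showing that each token flows through the whole pipeline before the next one is read, while the index paths let the nested structure of the result be rebuilt.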

• If you have a grid with resources that our community could use:
  ◦ Talk to us, tell us about it
  ◦ Write a plugin for its resource broker, data system or security model
• If you have a scientific community that wants to access such resources:
  ◦ Again, please let us know
  ◦ We can provide on-site training
• We are always interested in new application areas for our work
• I can be contacted at  or, for more general discussion, please join the mailing lists linked from

Please see http:// for the most up-to-date list.