e-Science e-Business e-Government and their Technologies Introduction

Bryan Carpenter, Geoffrey Fox, Marlon Pierce
Pervasive Technology Laboratories, Indiana University, Bloomington IN 47404
January 12, 2004
dbcarpen@indiana.edu, gcf@indiana.edu, mpierce@cs.indiana.edu
http://www.grid2004.org/spring2004

Class Structure
Grading is based on a mixture of homework and a single final project: homework counts for 70% and the final project for 30% of the grade. There is NO midterm or final exam. Up to 2 students can collaborate on the final project. Homework will mainly be programming-based, but there may be reports, either in the final project or in one or two homework assignments. Reports will use the Internet, the book and the “Gap Analysis”.

What are we doing?
This is a semester-long course on Grids (viewed as technologies and infrastructure) and their applications, mainly to science but also to business and government. We will assume a basic knowledge of the Java language and then interweave 6 topic areas; the first four cover technologies that students will use:
1) Advanced Java: including networking, JavaServer Pages and perhaps servlets
2) XML: specification, tools, linkage to Java
3) Web Services: basic ideas, WSDL, Axis and Tomcat
4) Grid Systems: GT3/CoG Kit, Gateway, XSOAP, portlets
5) Advanced Technology Discussions: CORBA as history, OGSA-DAI, security, Semantic Grid, workflow
6) Applications: bioinformatics, particle physics, engineering, crises, computing-on-demand Grid, earth science

Course Topics 1 and 2: Background/Core
Advanced Java Programming: We will assume basic Java programming proficiency. We will cover Java client/server, three-tiered and network programming. Ancillary but interesting Java topics to be covered include Apache Ant, XMLBeans, and the Java Message Service.
XML and XML Schema: We will provide introductory material, which is necessary to understand Web Service standards. Examples include RDF (Semantic Web) and SOAP (Web Services).
XML Tools: XML databases (Xindice, Sleepycat); search with XPath and XQuery.

Course Topics 3 and 4: Web and Grid Services
Overview material: Grid and Web Service architectures.
Basic Web Service standards: WSDL and SOAP structure and definitions.
Building services in Java: Apache Axis.
Advanced Web Services, emerging capabilities: WS-ReliableMessaging, WS-Security, WS-Transaction.
Computational Grids: Globus Toolkit 2; the Java CoG Kit for Globus programming.
Grids meet Web Services: Open Grid Services Architecture/Infrastructure; implementations such as GSX from Indiana University.
The Semantic Grid: information models for describing resources; RDF, DAML+OIL, and OWL.

Grid Computing: Making the Global Infrastructure a Reality
Based on work done in preparing the book edited with Fran Berman and Anthony J.G. Hey: ISBN 0-470-85319-0, hardcover, 1080 pages, published March 2003. http://www.grid2002.org

Other
See the webcast in an Oracle technology series: http://webevents.broadcast.com/techtarget/Oracle/100303/index.asp?loc=10
See also the “Gap Analysis”: http://grids.ucs.indiana.edu/ptliupages/publications/GapAnalysis30June03v2.pdf. We can send you nicely printed versions of it. Its end has a good collection of references, and it gives both a general survey of current Grids and specific examples from the UK. An appendix with more details is at http://grids.ucs.indiana.edu/ptliupages/publications/Appendix30June03.pdf
See also GlobusWorld http://www.globusworld.org/ and the Grid Forum http://www.gridforum.org

e-moreorlessanything and the Grid
e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world; the growing use of outsourcing is one example.
e-Science is the similar vision for scientific research, with international participation in large accelerators, satellites or distributed gene analyses.
The Grid integrates the best of the Web, traditional enterprise software, high performance computing and peer-to-peer systems to provide the information technology e-infrastructure for e-moreorlessanything.
A deluge of data of unprecedented and inevitable size must be managed and understood. People, computers, data and instruments must be linked. On-demand assignment of experts, computers, networks and storage resources must be supported.

So what is a Grid?
“Supporting human decision making with a network of at least four large computers, perhaps six or eight small computers, and a great assortment of disc files and magnetic tape units - not to mention remote consoles and teletype stations - all churning away.” (Licklider 1960)
Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations.
Infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications.
Realizing the thirty-year dream of science fiction writers who have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity.

What is a High Performance Computer?
We might wish to consider three classes of multi-node computers:
1) Classic MPP, with microsecond latency and scalable internode bandwidth (tcomm/tcalc ~ 10 or so)
2) Classic cluster, which can vary from configurations like 1) to 3) but typically has millisecond latency and modest bandwidth
3) Classic Grid, or distributed systems of computers around the network, with inter-node communication latencies of 100’s of milliseconds but possibly good bandwidth
All have the same peak CPU performance, but synchronization costs increase as one goes from 1) to 3). The cost of the system (dollars per gigaflop) decreases by a factor of 2 at each step from 1) to 2) to 3).
One should NOT use a classic MPP if class 2) or 3) suffices, unless some security or data issue dominates over cost-performance. One should not use a Grid as a true parallel computer; it can link parallel computers together for convenient access etc.
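To make the latency argument concrete, here is a minimal Python sketch under an assumed bulk-synchronous model (our simplification, not from the slide): each step computes for t_calc seconds and then waits one inter-node latency. The latencies (1 µs, 1 ms, 0.1 s) follow the three classes above, and the relative costs simply encode the factor-of-2 step in dollars per gigaflop.

```python
# Assumed bulk-synchronous model: every step computes for t_calc seconds,
# then pays one inter-node latency to synchronize. Numbers are illustrative.
LATENCY_S = {
    "MPP": 1e-6,      # class 1: microsecond latency
    "Cluster": 1e-3,  # class 2: millisecond latency
    "Grid": 0.1,      # class 3: hundreds of milliseconds (0.1 s as a lower bound)
}
# Dollars per gigaflop relative to the Grid, halving at each step 1) -> 2) -> 3)
RELATIVE_COST = {"MPP": 4.0, "Cluster": 2.0, "Grid": 1.0}


def efficiency(t_calc: float, latency: float) -> float:
    """Fraction of wall-clock time spent computing rather than synchronizing."""
    return t_calc / (t_calc + latency)


def cost_per_useful_work(system: str, t_calc: float) -> float:
    """Relative hardware cost divided by the fraction of time doing useful work."""
    return RELATIVE_COST[system] / efficiency(t_calc, LATENCY_S[system])


def cheapest(t_calc: float) -> str:
    """System class with the lowest cost per unit of useful computation."""
    return min(LATENCY_S, key=lambda name: cost_per_useful_work(name, t_calc))


if __name__ == "__main__":
    for t_calc in (1e-4, 1e-2, 10.0):
        print(f"t_calc={t_calc:g}s -> cheapest: {cheapest(t_calc)}")
```

With these assumptions, fine-grained synchronization (small t_calc) favours the MPP, medium grain favours the cluster, and only coarse-grained work makes the cheaper Grid worthwhile, which matches the advice not to use a Grid as a true parallel computer.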

e-Science
e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it. This is a major UK program. e-Science reflects the growing importance of international laboratories, satellites and sensors and their integrated analysis by distributed teams. CyberInfrastructure is the analogous US initiative. Grid technology supports e-Science and CyberInfrastructure; it is software (middleware) built on top of networks.

Global Terabit Research Network
The Grid software and resources run on top of high performance global networks.

USA Network

Terabit Networks
Network performance will increase faster than Moore’s law, partly because optical fiber has almost unlimited bandwidth and partly because there are many old networks to be replaced.
Home: dial-up (56 kbit/sec) → DSL/cable modem (2 megabits/sec) → FTTP (Fiber To The Premises, at gigabit performance).
2006 goal of the Global Terabit Research Network: International : National Backbone : Organization : Optical Desktop : Copper Desktop bandwidths in the ratio 1000 : 1000 : 100 : 10 : 1 gigabit/sec.
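The jump from dial-up to fiber is easiest to appreciate as transfer time. This small Python sketch assumes ideal links (no protocol overhead or latency) and a hypothetical 1 gigabyte file; the rates are the ones quoted above.

```python
# Nominal link rates quoted on the slide, in bits per second
RATES_BIT_PER_S = {
    "dial-up (56 kbit/s)": 56e3,
    "DSL/cable modem (2 Mbit/s)": 2e6,
    "FTTP (1 Gbit/s)": 1e9,
}


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: payload bits divided by link rate (no overhead assumed)."""
    return size_bytes * 8 / rate_bits_per_s


if __name__ == "__main__":
    one_gigabyte = 1e9  # hypothetical file size
    for name, rate in RATES_BIT_PER_S.items():
        hours = transfer_seconds(one_gigabyte, rate) / 3600
        print(f"{name}: {hours:.3f} hours")
```

Under these assumptions a gigabyte takes roughly 40 hours over dial-up, a little over an hour over DSL, and 8 seconds over gigabit fiber.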

e-Business and (Virtual) Organizations
Enterprise Grid supports the information system for an organization; includes “university computer center”, “(digital) library”, sales, marketing, manufacturing …
Outsourcing Grid links different parts of an enterprise together (Gridsourcing): manufacturing plants with designers; animators with electronic game or film designers and producers; coaches with aspiring players (e-NCAA or e-NFL etc.)
Customer Grid links businesses and their customers, as in many web sites such as amazon.com
e-Multimedia can use secure peer-to-peer Grids to link creators, distributors and consumers of digital music, games and films while respecting rights
Distance education Grid links a teacher at one place, students all over the place, mentors and graders; shared curriculum, homework, live classes …

e-Defense and e-Crisis
Grids support Command and Control and provide Global Situational Awareness: link commanders and frontline troops to each other and to archival and real-time data; link to what-if simulations.
Dynamic heterogeneous wired and wireless networks. Security and fault tolerance are essential.
System of Systems; Grid of Grids: the command and information infrastructure of each ship is a Grid; each fleet is linked together by a Grid; the President is informed by and informs the national defense Grid. Grids must be heterogeneous and federated.
Crisis Management and Response is enabled by a Grid linking sensors, disaster managers, and first responders with decision support.

Classes of Computing Grid Applications
Running “pleasingly parallel jobs” as in United Devices and Entropia (Desktop Grid) “cycle stealing systems”. These can be managed (“inside” the enterprise, as in Condor) or more informal (as in SETI@Home).
Computing-on-demand in industry, where the jobs spawned are perhaps very large (SAP, Oracle …).
Support for distributed file systems, as in Legion (Avaki) and Globus, with a (web-enhanced) UNIX programming paradigm. Particle physics will run some 30,000 simultaneous jobs this way.
Pipelined applications linking data/instruments, compute and visualization.
Seamless access, where Grid portals allow one to choose one of multiple resources with a common interface.
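A pleasingly parallel job is just many independent work units whose results are merged at the end. The Python sketch below farms out hypothetical work units (counting primes in sub-ranges) with the standard-library concurrent.futures. It uses threads on one machine only to illustrate the pattern; a real cycle-stealing system such as Condor or SETI@Home would ship each unit to a different computer.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo: int, hi: int) -> int:
    """One independent work unit: count primes n with lo <= n < hi (trial division)."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def farm(total: int, chunk: int, workers: int = 4) -> int:
    """Split [0, total) into independent units, run them in parallel, merge results."""
    units = [(lo, min(lo + chunk, total)) for lo in range(0, total, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Units never communicate with each other; only their results are summed.
        return sum(pool.map(lambda unit: count_primes(*unit), units))


if __name__ == "__main__":
    print(farm(10_000, chunk=1_000))
```

Because the units share nothing, the chunk size can be tuned freely to match how long each donated machine is expected to stay available.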

Utility Computing
An important business application of Grids is utility computing: namely, supporting a pool of computers to be assigned as needed to take up extra demand. The pool is shared between multiple applications. One can say this application is common in academia, where different simulations share resources, while in industry we have web servers, financial modeling, data-mining, and simulation in response to a crisis like a forest fire or earthquake. The architecture is a “farm of Grid services” connected to the Internet, not a cluster of computers connected to each other.

Resources-on-demand
Computing-on-demand uses a dynamically assigned (shared) pool of resources to support excess demand in a flexible, cost-effective fashion.
Static assignment with redundancy: Program A on Computer 1 … Program Z on Computer 26, plus a second set, Program A on Computer 27 … Program Z on Computer 52, as spares.
Dynamic on-demand assignment: Programs A … Z share Pool Computer 1 … Pool Computer N, with N < 52.
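The arithmetic behind this comparison can be sketched in Python. Static assignment with redundancy gives each of the 26 programs (A to Z) its own computer plus a spare, so 52 computers; a shared pool only has to cover the worst total demand at any moment. The staggered demand pattern below (each program briefly needing one extra computer, at a different time from the others) is a hypothetical assumption chosen to show the effect.

```python
def static_computers(n_programs: int, copies_per_program: int = 2) -> int:
    """Static assignment with redundancy: a primary plus a spare for every program."""
    return n_programs * copies_per_program


def pooled_computers(demand: list[list[int]]) -> int:
    """Shared pool sized for the worst time step; demand[t][p] is program p's need at step t."""
    return max(sum(step) for step in demand)


if __name__ == "__main__":
    n = 26  # programs A .. Z
    # Hypothetical load: every program needs 1 computer, and at step t program t needs 2
    demand = [[2 if p == t else 1 for p in range(n)] for t in range(n)]
    print("static:", static_computers(n))  # 52
    print("pooled:", pooled_computers(demand))
```

With these assumptions the pool needs 27 computers instead of 52. If the peaks coincide, the saving shrinks, so pool sizing depends on how correlated the applications’ demands are.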

Some Important Styles of Grids
Computational Grids were the origin of the concepts and link computers across the globe; high latency stops this from being used as a parallel machine.
Knowledge and Information Grids link sensors and information repositories, as in Virtual Observatories or bioinformatics (more detail on the next slide).
Education Grids link teachers, learners and parents as a VO with learning tools, distance lectures etc.
e-Science Grids link multidisciplinary researchers across laboratories and universities.
Community Grids focus on Grids involving large numbers of peers rather than on linking major resources; they link Grid and peer-to-peer network concepts.
Semantic Grid links the Grid and AI communities with the Semantic Web (ontology/meta-data enriched resources) and agent concepts.

Information/Knowledge Grids
Distributed data sources (10’s to 1000’s of them): instruments, file systems, curated databases …
Data deluge: 1 (now) to 100’s of petabytes/year (2012); Moore’s law for sensors.
Possible filters assigned dynamically (on-demand): run an image processing algorithm on a telescope image; run a gene sequencing algorithm on compiled data.
Needs a decision support front end with “what-if” simulations.
Metadata (provenance) is critical to annotate data.
Integrate across experiments, as in multi-wavelength astronomy.
The data deluge comes from pixels/year available.

2.4 Petabytes Today

SERVOGrid for e-Geoscience
[Figure: repositories and federated databases; sensor nets with streaming data; loosely coupled filters; closely coupled compute nodes; analysis and visualization; discovery services]
SERVOGrid (Solid Earth Research Virtual Observatory) will link Australia, Japan, USA ……

SERVOGrid Requirements
Seamless access to data repositories and large scale computers.
Integration of multiple data sources, including sensors, databases and file systems, with the analysis system, including filtered OGSA-DAI (Grid database access).
Rich meta-data generation and access, with a SERVOGrid-specific schema extending OpenGIS (geography as a Web Service) standards and using the Semantic Grid.
Portals with a component model for user interfaces and web control of all capabilities.
Collaboration to support world-wide work.
Basic Grid tools: workflow and notification.

DAME: Distributed Aircraft Maintenance Environment (Rolls Royce and UK e-Science Program)
In-flight data: ~ gigabyte per aircraft per engine per transatlantic flight; ~5000 engines.
[Figure: aircraft; global network such as SITA; ground station; airline; Engine Health (Data) Center; maintenance centre; Internet, e-mail, pager]

NASA Aerospace Engineering Grid
It takes a distributed virtual organization to design, simulate and build a complex system like an aircraft.

Virtual Observatory Astronomy Grid
Integrate experiments. [Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible + X-ray; Galaxy Density Map]

e-Chemistry Laboratory
[Figure: experiments-on-demand; Grid-enabled output streams; Grid resources]

CERN LHC Data Analysis Grid

Typical Grid Architecture
Each blob is a computer program!
[Figure: user services; portal services; application service; system services; “core” Grid; middleware; database; raw (HPC) resources]

Sources of Grid Technology
Grids support distributed collaboratories or virtual organizations, integrating concepts from:
the Web; agents; distributed objects (CORBA, Java/Jini, COM); Globus, Legion, Condor, NetSolve, Ninf and other high performance computing activities; peer-to-peer networks.
The Web and P2P networks are perhaps the most important for “Information Grids”, and Globus for “Compute Grids”.

The Essence of Grid Technology?
We will start from the Web view and assert that the basic paradigm is meta-data rich Web Services communicating via messages.
These have some basic support from a runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3). These are the distributed equivalent of operating system functions, as in the UNIX shell, and are called the hosting environment or platform.
The W3C standard WSDL defines an IDL (interface definition language) for Web Services.
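Because WSDL is plain XML, any XML toolkit can read the interface it defines. The Python sketch below uses the standard-library xml.etree.ElementTree to list the operations of each portType in a WSDL 1.1 document. The JobPortType service, its operations and the urn:example:jobs namespace are hypothetical, and the messages, bindings and service sections a real WSDL file needs are omitted.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL 1.1 fragment (messages, bindings, service omitted)
SAMPLE_WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:jobs" targetNamespace="urn:example:jobs">
  <wsdl:portType name="JobPortType">
    <wsdl:operation name="submitJob">
      <wsdl:input message="tns:SubmitJobRequest"/>
      <wsdl:output message="tns:SubmitJobResponse"/>
    </wsdl:operation>
    <wsdl:operation name="getJobStatus">
      <wsdl:input message="tns:GetJobStatusRequest"/>
      <wsdl:output message="tns:GetJobStatusResponse"/>
    </wsdl:operation>
  </wsdl:portType>
</wsdl:definitions>"""


def list_operations(wsdl_text: str) -> dict[str, list[str]]:
    """Map each portType name to the names of the operations it declares."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        ops = [op.get("name") for op in port_type.findall("wsdl:operation", WSDL_NS)]
        result[port_type.get("name")] = ops
    return result


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))
```

This is the same information a toolkit such as Axis reads when it generates client stubs, though real tools also follow the bindings to learn how each operation is carried on the wire.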

Meta-data
Meta-data is usually thought of as “data about data”. The Semantic Web, at its simplest, can be considered as adding meta-data to web pages. For example, a hospital web page has meta-data telling you its location, phone number and specialties, which can be used to automate Google-style searches and allow planning of disease/accident treatment from the web.
The modern trend (Semantic Grid) is meta-data about Web Services, e.g. specifying details of the interface and usage, such as that a bioinformatics service is free or that its input bandwidth is limited.
Provenance (history and ownership) of data is very important.
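The hospital example can be sketched as meta-data records queried by property rather than by page text. In this Python illustration the record fields (location, phone, specialties) come from the slide, but the hospitals, their values and the find() helper are hypothetical; a Semantic Web system would express the same facts in RDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HospitalMetadata:
    """Meta-data describing a (hypothetical) hospital web page."""
    name: str
    location: str
    phone: str
    specialties: frozenset[str]


CATALOG = [
    HospitalMetadata("General Hospital", "Bloomington", "555-0100", frozenset({"trauma", "cardiology"})),
    HospitalMetadata("Children's Clinic", "Indianapolis", "555-0101", frozenset({"pediatrics"})),
    HospitalMetadata("County Medical", "Bloomington", "555-0102", frozenset({"burns", "trauma"})),
]


def find(catalog: list[HospitalMetadata], location: str, specialty: str) -> list[str]:
    """Answer 'which hospitals in <location> treat <specialty>?' from meta-data alone."""
    return [h.name for h in catalog if h.location == location and specialty in h.specialties]


if __name__ == "__main__":
    print(find(CATALOG, "Bloomington", "trauma"))
```

The query never looks at page content, only at the structured description, which is what makes it automatable.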

A typical Web Service
In principle, services can be in any language (Fortran .. Java .. Perl .. Python), and the interfaces can be method calls, Java RMI messages, CGI web invocations, or totally compiled away (inlining). The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python.
[Figure: Web Services with WSDL interfaces: portal, security, payment, credit card, catalog, warehouse, shipping control]
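The simplest case, an XML (SOAP) message, can be sketched with Python’s standard library. This builds and parses a SOAP 1.1 envelope for a hypothetical placeOrder request to the catalog service in the figure; the urn:example:catalog namespace and the element names are illustrative assumptions, not a published interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:catalog"  # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("cat", APP_NS)


def build_order_request(item: str, quantity: int) -> bytes:
    """Client side: wrap a hypothetical placeOrder call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    order = ET.SubElement(body, f"{{{APP_NS}}}placeOrder")
    ET.SubElement(order, f"{{{APP_NS}}}item").text = item
    ET.SubElement(order, f"{{{APP_NS}}}quantity").text = str(quantity)
    return ET.tostring(envelope, encoding="utf-8")


def parse_order_request(message: bytes) -> tuple[str, int]:
    """Service side: pull the call parameters back out of the envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    order = body.find(f"{{{APP_NS}}}placeOrder")
    item = order.findtext(f"{{{APP_NS}}}item")
    quantity = int(order.findtext(f"{{{APP_NS}}}quantity"))
    return item, quantity


if __name__ == "__main__":
    msg = build_order_request("grid-textbook", 2)
    print(msg.decode("utf-8"))
    print(parse_order_request(msg))
```

Neither side needs to know what language the other is written in; they agree only on the XML structure, which is exactly what WSDL describes.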

Services and Distributed Objects
A web service is a computer program running on either the local or a remote machine, with a set of well-defined interfaces (ports) specified in XML (WSDL).
Web Services (WS) have many similarities with Distributed Object (DO) technology, but there are some (important) technical and religious points (not easy to distinguish). CORBA, Java and COM are typical DO technologies. Agents are typically SOA (Service Oriented Architecture).
Both involve distributed entities, but Web Services are more loosely coupled:
WS interact with messages; DO with RPC (Remote Procedure Call).
DO have “factories”; WS manage instances internally, and interaction-specific state is not exposed and hence need not be managed.
DO have explicit state (stateful services); WS use context in the messages to link interactions (stateful interactions).
Claim: DOs do NOT scale; WS build on experience (with CORBA) and do scale.
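The contrast between explicit object state and context carried in messages can be sketched in a few lines of Python. Both styles below are hypothetical illustrations (a shopping cart is our example, not one from the slide): the distributed-object style keeps state inside an instance created by a factory, while the Web Service style handler keeps nothing between calls and relies on the context echoed back in each message.

```python
from dataclasses import dataclass, field


class CartObject:
    """Distributed-object style: state lives in a server-side instance."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def add(self, item: str) -> None:
        self.items.append(item)


def cart_factory() -> CartObject:
    """DO style: clients ask a factory for an instance whose lifetime must be managed."""
    return CartObject()


@dataclass
class Message:
    action: str
    payload: dict = field(default_factory=dict)
    context: dict = field(default_factory=dict)  # interaction state travels here


def cart_service(msg: Message) -> Message:
    """WS style: a stateless handler; everything it needs arrives in msg.context."""
    items = list(msg.context.get("items", []))
    if msg.action == "add":
        items.append(msg.payload["item"])
        return Message("ok", context={"items": items})
    if msg.action == "checkout":
        return Message("receipt", payload={"items": items})
    return Message("fault", payload={"reason": f"unknown action {msg.action!r}"})


if __name__ == "__main__":
    reply = cart_service(Message("add", {"item": "book"}))
    reply = cart_service(Message("add", {"item": "pen"}, reply.context))
    print(cart_service(Message("checkout", context=reply.context)).payload)
```

Since the handler holds no per-client state, any replica of it can serve the next message, which is one reason the message-based style is claimed to scale better.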

Details of Web Service Protocol Stack
UDDI finds where programs are; remote (distributed) programs are just Web Services (not clearly a great success).
WSFL links programs together (under revision as BPEL).
WSDL defines the interface (methods, parameters, data formats).
SOAP defines the structure of the message, including serialization of information.
HTTP is the negotiation/transport protocol.
TCP/IP is layers 3-4 of OSI.
The physical network is layer 1 of OSI.
Stack, top to bottom: UDDI or WSIL; WSFL; WSDL; SOAP or RMI; HTTP or SMTP or IIOP or RMTP; TCP/IP; physical network.
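To see how the layers nest, the Python sketch below wraps an already-built SOAP envelope in an HTTP POST request (the SOAP-over-HTTP binding); the operating system’s TCP/IP stack would then carry these bytes. The host, path and SOAPAction value are hypothetical, and the request is only assembled, not sent.

```python
def soap_over_http(host: str, path: str, soap_action: str, envelope: bytes) -> bytes:
    """Frame a SOAP 1.1 envelope as an HTTP/1.1 POST request (headers, blank line, body)."""
    header_lines = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml; charset=utf-8",  # SOAP 1.1 media type
        f"Content-Length: {len(envelope)}",
        f'SOAPAction: "{soap_action}"',
    ]
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + envelope


if __name__ == "__main__":
    envelope = (
        b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        b"<soap:Body/></soap:Envelope>"
    )
    request = soap_over_http("grid.example.org", "/services/JobService",
                             "urn:example:submitJob", envelope)
    print(request.decode("utf-8"))
```

Swapping the transport (SMTP, IIOP, ...) would change only this framing function; the SOAP envelope inside stays the same, which is the point of the layering.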

Classic Grid Architecture
[Figure: clients (users and devices); middle tier of brokers and service providers (NetSolve, security, collaboration, computing, composition, content access); resources and databases]
The middle tier becomes Web Services.

Grid Services for the Education Process
“Learning Object” XML standards already exist.
Registration
Performance (grading)
Authoring of curriculum
Online laboratories for real and virtual instruments
Homework submission
Quizzes of various types (multiple choice, random parameters)
Assessment data access and analysis
Synchronous delivery of curricula, including audio/video conferencing and other synchronous collaborative tools as Web Services
Scheduling of courses and mentoring sessions
Asynchronous access, data-mining and knowledge discovery
Learning Plan agents to guide students and teachers

Grid Learning Model
Education and research Grids share some services, both for content and “process”; for example, collaboration services are largely identical.
Research will use much larger simulation engines to get high resolution results. Maybe a researcher uses a CAVE to visualize, and education a Macintosh.
But both can share data services, run through different filters to select for precision (research) or pedagogical value (education). Education has a “digital textbook” front end to the resources of the research Grid.
Both use the same workflow technologies to link services together.

SERVOGrid for e-Education
[Figure: repositories and federated databases; field trip data; sensors with streaming data; loosely coupled filters; coarse grain simulations; analysis and visualization; discovery services]

Some Observations
“Traditional” Grids manage and share asynchronous resources in a rather centralized fashion.
Peer-to-peer networks are “just like” Grids, with different implementations of message-based services like registration and look-up.
Collaboration systems like WebEx/Placeware (application sharing) or Polycom (audio/video conferencing) can be viewed as Grids.
Computers are fast and getting faster, so one can afford many strategies that used to be unrealistic, including rich, usually XML-based, messaging.
Web Services interact with messages. Will everything (including applications like PowerPoint) be a Web Service?
Grids, P2P networks and collaborative environments are (will be) managed, message-linked Web Services.

Peer to Peer Grid
A democratic organization
[Figure: peers connected through event/message brokers, with user-facing and service-facing Web Service interfaces; databases]
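The event/message brokers in this picture decouple publishers from subscribers: a peer publishes to a topic without knowing who is listening. Here is a minimal in-process Python sketch of that publish-subscribe pattern; a real Grid broker would add network transport, persistence and security, and the topic name used is hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Broker:
    """Minimal publish-subscribe broker: routes events by topic name."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver event to every subscriber of topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    received: list[dict] = []
    broker.subscribe("sensors/seismic", received.append)
    broker.subscribe("sensors/seismic", lambda e: print("peer 2 got", e))
    broker.publish("sensors/seismic", {"station": "hypothetical-01", "magnitude": 4.2})
    print(len(received))
```

Publishers and subscribers only share a topic name, so peers can join or leave without the others being reconfigured.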

System and Application Services?
There are generic Grid system services: security, collaboration, persistent storage, universal access. OGSA (Open Grid Services Architecture) is implementing these as extended Web Services.
An application Web Service is a capability used either by another service or by a user. It has input and output ports; data comes from sensors or other services.
Consider satellite-based sensor operations as a Web Service: satellite management (with a web front end); each tracking station is a service; image processing is a pipeline of filters, which can be grouped into different services; data storage is an important system service.
Big services are built hierarchically from “basic” services.
Portals are the user (web browser) interfaces to Web Services.
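An image-processing pipeline of filters maps naturally onto chained generators, where each stage has one input port and one output port. In this hypothetical Python sketch each frame is a list of pixel values; the stages (calibrate, threshold, count) and their parameters are illustrative, and in a Grid each stage, or group of stages, could be deployed as its own service.

```python
from typing import Iterable, Iterator

Frame = list[float]


def calibrate(frames: Iterable[Frame], gain: float) -> Iterator[Frame]:
    """Filter 1: scale raw detector values by a calibration gain."""
    for frame in frames:
        yield [pixel * gain for pixel in frame]


def threshold(frames: Iterable[Frame], cutoff: float) -> Iterator[Frame]:
    """Filter 2: zero out pixels below the cutoff (noise suppression)."""
    for frame in frames:
        yield [pixel if pixel >= cutoff else 0.0 for pixel in frame]


def count_bright(frames: Iterable[Frame]) -> Iterator[int]:
    """Filter 3: reduce each frame to the number of surviving pixels."""
    for frame in frames:
        yield sum(1 for pixel in frame if pixel > 0)


if __name__ == "__main__":
    raw = [[1, 2, 3], [4, 0, 5]]  # two tiny hypothetical frames
    # The output port of each filter feeds the input port of the next
    print(list(count_bright(threshold(calibrate(raw, gain=2.0), cutoff=5.0))))
```

Frames stream through one at a time, so a stage can start work before the whole data set has arrived, much as a tracking station streams data to downstream services.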

Satellite Science Grid Environment

What is Happening?
Grid ideas are being developed in (at least) three communities: Web Services (W3C, OASIS) and the Grid Forum (high performance computing, e-Science).
Service standards are being debated.
Grid operational infrastructure is being deployed.
Grid architecture and core software are being developed.
Particular system services are being developed “centrally”; OGSA is the framework for this.
Lots of fields are setting domain-specific standards and building domain-specific services.
There is a lot of hype.
Grids are viewed differently in different areas: largely “computing-on-demand” in industry (IBM, Oracle, HP, Sun); largely distributed collaboratories in academia.

OGSA, OGSI and Hosting Environments
Start with Web Services in a hosting environment.
Add OGSI to get a Grid service and a component model.
Add OGSA to get an interoperable Grid, “correcting” differences in the base platform and adding key functionalities.
[Figure: network; hosting environment for WS; OGSI on Web Services; broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.); domain-specific services; labels: OGSA environment, possibly OGSA, not OGSA, “given to us from on high”]

Technical Activities of Note
Look at different styles of Grids, such as autonomic (robust, reliable, resilient).
New Grid architectures are hard due to the investment required.
Critical services, such as:
Security: build message-based, not connection-based
Notification: event services
Metadata: use the Semantic Web, provenance
Databases and repositories: instruments, sensors
Computing: submit job, scheduling, distributed file systems
Visualization, computational steering
Fabric and service management
Network performance
Program the Grid: workflow
Access the Grid: portals, Grid Computing Environments

Issues and Types of Grid Services
1) Lightweight P2P Federation and Interoperability
2) Core Infrastructure and Hosting Environment: service management, component model, service wrapper/invocation, messaging
3) Security Services: certificate authority, authentication, authorization, policy
4) Workflow Services and Programming Model: enactment engines (runtime), languages and programming, compiler, composition/development
5) Notification Services
6) Metadata and Information Services: basic including registry, semantically rich services and meta-data, information aggregation (events), provenance
7) Information Grid Services: OGSA-DAI/DAIT, integration with compute resources, P2P and database models
8) Compute/File Grid Services: job submission, job planning, scheduling, management, access to remote files, storage and computers, replica (cache) management, virtual data, parallel computing
9) Other services, including Grid shell, accounting, fabric management, visualization, data-mining and computational steering, collaboration
10) Portals and Problem Solving Environments
11) Network Services: performance, reservation, operations

Data Technology Components of (Services in) a Computing Grid
[Figure: a remote Grid service reached through numbered components: 1: Job Management Service (Grid Service interface to user or program client); 1: Plan Execution; 2: Schedule and control execution; 3: Access to remote computers; 4: Job submittal; 5: Data transfer; 6: File and storage access; 7: Cache data replicas; 8: Virtual data; 9: Grid MPI; 10: Job status]

Approach
Build on e-Science methodology and Grid technology.
[Figure: Application WS approach: typical codes wrapped as WS, linking to the user and to other WS (data sources)]
Science applications with multi-scale models, scalable parallelism and data assimilation as key issues; data-driven models for earthquakes, climate, environment …
Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”.
XML specification of models, computational steering and scale is supported at the “Web Service” level, as we don’t need “high performance” here. This allows use of Semantic Grid technology.

Grid Computing Environments
[Figure: user services; portal services; application service with application metadata and the actual application; system services; “core” Grid; middleware; database; raw (HPC) resources]

Why we can dream of using HTTP and that slow stuff
We have at least three tiers in the computing environment: the client (user portal); the “middle tier” (web servers/brokers); the back end (databases, files, computers etc.).
In Grid programming, we use HTTP (and used to use CORBA and Java RMI) in the middle tier ONLY to manipulate a proxy for the real job. The proxy holds metadata, so control communication in the middle tier only uses metadata. The “real” (data transfer) high performance communication happens in the back end.
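The idea that the middle tier manipulates only a small metadata proxy, never the job’s data, can be sketched in Python. The JobProxy fields, the state names and the allowed transitions below are hypothetical, and JSON stands in for whatever encoding (e.g. XML in an HTTP body) the middle tier actually uses. The point is that the control message is well under a kilobyte of metadata, while any bulk output moves separately in the back end.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical job life cycle: which status changes the middle tier may request
ALLOWED = {
    "PENDING": {"RUNNING", "CANCELLED"},
    "RUNNING": {"DONE", "FAILED", "CANCELLED"},
}


@dataclass
class JobProxy:
    """Middle-tier stand-in for a back-end job: metadata only, no bulk data."""
    job_id: str
    executable: str
    host: str
    status: str = "PENDING"
    output_location: str = ""  # where the back end writes results (not fetched here)


def update_status(proxy: JobProxy, new_status: str) -> None:
    """Control operation: validate and record a status change on the proxy."""
    if new_status not in ALLOWED.get(proxy.status, set()):
        raise ValueError(f"cannot go from {proxy.status} to {new_status}")
    proxy.status = new_status


def control_message(proxy: JobProxy) -> str:
    """What actually crosses the middle tier: serialized metadata."""
    return json.dumps(asdict(proxy))


if __name__ == "__main__":
    job = JobProxy("job-42", "/opt/sim/quake_model", "hpc.example.edu",
                   output_location="gsiftp://hpc.example.edu/scratch/job-42/")
    update_status(job, "RUNNING")
    msg = control_message(job)
    print(msg, len(msg), "bytes")
```

Because only these small messages cross the middle tier, its latency and encoding overhead hardly matter; the high performance path is reserved for the back end.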

Virtualization
The Grid could, and sometimes does, virtualize various concepts; it should do more.
Location: URI (Uniform Resource Identifier) virtualizes URL (WS-Addressing goes further).
Replica management (caching) virtualizes file location, generalized by the GriPhyN virtual data concept.
Protocol: message transport and WSDL bindings virtualize the transport protocol as a QoS request.
P2P or publish-subscribe messaging virtualizes the matching of source and destination services.
Semantic Grid virtualizes knowledge as a meta-data query.
Brokering virtualizes resource allocation.
Virtualization implies all references can be indirect, and it needs powerful mapping (look-up) services: metadata.
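Indirect references need a look-up service. The Python sketch below models a tiny replica catalog: clients name data by a logical name, and the catalog maps it to physical copies, choosing one by preferred protocol as a simple stand-in for a QoS request. The catalog contents, the lfn:// naming scheme and the host names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical replica catalog: logical name -> physical replicas
REPLICA_CATALOG = {
    "lfn://servogrid/quakes-2003.dat": [
        "http://mirror.site-b.example/data/quakes-2003.dat",
        "gsiftp://storage.site-a.example/archive/quakes-2003.dat",
    ],
}


def resolve(logical_name: str, preferred_schemes: tuple[str, ...] = ("gsiftp", "http")) -> str:
    """Map a logical name to one physical replica, honouring protocol preference order."""
    replicas = REPLICA_CATALOG.get(logical_name)
    if not replicas:
        raise KeyError(f"no replicas registered for {logical_name}")
    for scheme in preferred_schemes:
        for url in replicas:
            if urlparse(url).scheme == scheme:
                return url
    return replicas[0]  # fall back to any registered copy


if __name__ == "__main__":
    print(resolve("lfn://servogrid/quakes-2003.dat"))
    print(resolve("lfn://servogrid/quakes-2003.dat", preferred_schemes=("http",)))
```

Clients keep using the logical name while copies are added, moved or removed, which is exactly the indirection the slide describes.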

Integration of Data and Filters
One has the OGSA-DAI data repository interface combined with the WSDL of the (Perl, Fortran, Python …) filter. The user only sees WSDL, not data syntax.
There are some non-trivial issues as to where the filtering compute power is; Microsoft says put the filter next to the data.
[Figure: DB with OGSA-DAI interface; filter; WSDL of filter]
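Where the filter runs is largely a question of how many bytes cross the network. This back-of-envelope Python sketch assumes a hypothetical 1 terabyte dataset, a filter that keeps 1% of it and a 1 Gbit/s wide-area link, and compares shipping the raw data to a remote filter with running the filter next to the database.

```python
def wan_seconds(bytes_moved: float, link_bits_per_s: float) -> float:
    """Ideal wide-area transfer time (no protocol overhead assumed)."""
    return bytes_moved * 8 / link_bits_per_s


def compare_placements(dataset_bytes: float, selectivity: float, link_bits_per_s: float) -> dict:
    """Seconds on the wire for the two filter placements."""
    remote_filter = dataset_bytes                 # ship everything, filter elsewhere
    filter_at_data = dataset_bytes * selectivity  # filter in place, ship only results
    return {
        "filter remote": wan_seconds(remote_filter, link_bits_per_s),
        "filter next to data": wan_seconds(filter_at_data, link_bits_per_s),
    }


if __name__ == "__main__":
    print(compare_placements(dataset_bytes=1e12, selectivity=0.01, link_bits_per_s=1e9))
```

Under these assumptions the remote filter needs about 8000 seconds of transfer against about 80 seconds, which is why placing the filter next to the data is attractive when it discards most of its input; the trade-off is that the data site must then supply the compute power.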

SERVOGrid Complexity Computing Environment
[Figure: users; CCE control portal aggregation; middle tier with XML interfaces; XML meta-data service; application services 1, 2 and 3; complexity simulation service; visualization service; parallel simulation service; database service; compute service; sensor service; database]

SERVOGrid (Complexity) Computing Model
[Figure: data filters and OGSA-DAI Grid services; Grid data assimilation; HPC simulation; analysis, control and visualization; other Grid and Web Services; distributed filters massage data for simulation]
This type of Grid integrates with parallel computing: multiple HPC facilities, but only one is used at a time; many simultaneous data sources and sinks.

Two-level Programming I
The paradigm implicitly assumes a two-level programming model.
We make a service (the same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies: a C++, Java or Fortran Monte Carlo module; data streaming from a sensor or satellite; specialized (JDBC) database access.
Such services accept and produce data from users, files and databases.
The Grid is built by coordinating such services, assuming we have solved the problem of programming the service.
[Figure: Service; Data]

Two-level Programming II
The Grid is discussing the composition of distributed services with the runtime interfaces to the Grid, as opposed to UNIX pipes/data streams.
This is familiar from the use of the UNIX shell, Perl or Python scripts to produce real applications from core programs. Such interpretive environments are the single-processor analog of Grid programming.
Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately.
[Figure: Service1; Service2; Service3; Service4]
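The composition level can be sketched as a small workflow engine. In this Python example the four services are hypothetical stand-in functions, and the wiring (Service1 feeding Service2 and Service3, which both feed Service4) is our assumption about the figure. The standard-library graphlib (Python 3.9+) orders the services so that each runs only after its inputs exist, much as a shell script orders programs.

```python
from graphlib import TopologicalSorter


# Hypothetical services: each takes a dict of upstream results keyed by service name
def service1(inputs: dict) -> list[int]:
    return [3, 1, 2]  # e.g. readings from a data source


def service2(inputs: dict) -> list[int]:
    return sorted(inputs["service1"])


def service3(inputs: dict) -> int:
    return sum(inputs["service1"])


def service4(inputs: dict) -> dict:
    return {"sorted": inputs["service2"], "total": inputs["service3"]}


SERVICES = {"service1": service1, "service2": service2, "service3": service3, "service4": service4}
# Workflow: node -> set of services whose output it consumes
WORKFLOW = {
    "service1": set(),
    "service2": {"service1"},
    "service3": {"service1"},
    "service4": {"service2", "service3"},
}


def run(workflow: dict, services: dict) -> dict:
    """Enact the workflow: call each service once its predecessors have produced output."""
    results = {}
    for name in TopologicalSorter(workflow).static_order():
        upstream = {dep: results[dep] for dep in workflow[name]}
        results[name] = services[name](upstream)
    return results


if __name__ == "__main__":
    print(run(WORKFLOW, SERVICES)["service4"])  # {'sorted': [1, 2, 3], 'total': 6}
```

The two levels stay separate here: each service is ordinary code, while the workflow is just data describing how they connect.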

Conclusions
Grids are inevitable and pervasive.
We can expect Web Services and Grids to merge, with a common set of general principles but different implementations with different scaling and functionality trade-offs.
e-Science will grow in importance as science grows as an international “team sport”; this affects scientists and organizations.
Enough is known that one can start today.
We will be flooded with data, information and purported knowledge.
One should be learning about Grids, understanding relevant Web and Grid standards, and developing new domain-specific standards.
Note that many existing (standards) efforts assume client-server and not a brokered service model; these will need to change!