San Diego Supercomputer Center, Grid Physics Network (GriPhyN), University of Florida

DGL: The Assembly Language for Grid Computing

Arun swaran Jagatheesan (arun@sdsc.edu)
San Diego Supercomputer Center (SDSC), University of California, San Diego

13th IEEE International Symposium on High-Performance Distributed Computing (HPDC)
June 4-6, 2004, Honolulu, Hawaii, USA
Acknowledgements
- Participants: Jonathan Weinberg, Allen Ding, Dipti Borkar, Lucas Gilbert (BIRN)
- Alumni: Erik Vandekieft, Reena Mathew
- Well-wishers: Reagan Moore and the SDSC SRB team, Marcio Faerman (SCEC)
- You!
Talk Outline
- Data grids in production for scientific pipelines
- Problem: gridflow description and querying
- Gridflow description requirements
- Relevance to P2P gridflow management
- Gridflow data model and components
- Summary
Data Grid Management Systems
- Southern California Earthquake Center (SCEC)
- NASA Data Grids
- NIH Biomedical Informatics Research Network (BIRN)
- National Science Digital Library
- Scripps Institution of Oceanography

All these data grids have pipelines for their scientific or business processes.
Data handling pipeline in SCEC (data information pipeline)
- Metadata derivation
- Ingest metadata
- Ingest data
- Determine the analysis pipeline
- Initiate automated analysis
- Organize result data into distributed data grid collections
- Use the optimal set of resources for each task, on demand
- The pipeline can be triggered by input at the data source or by a data request from a user
- All gridflow activities are stored for data-flow provenance
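The pipeline above can be sketched as an ordered list of stages that share a state and append to a provenance log as they run. This is a minimal illustration under stated assumptions, not SCEC's implementation. The stage functions, the `ProvenanceLog` class, the trigger labels and the "size > 4" analysis rule are all hypothetical placeholders for real data grid service calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, object]

@dataclass
class ProvenanceLog:
    """Append-only record of gridflow activities (for data-flow provenance)."""
    entries: List[str] = field(default_factory=list)

    def record(self, trigger: str, step: str) -> None:
        self.entries.append(f"[{trigger}] {step}")

# Hypothetical stand-ins for the pipeline stages; a real system would
# call data grid services here instead of building a dictionary.
def derive_metadata(s: State) -> State:
    return {**s, "metadata": {"size": len(s["raw"])}}

def ingest_metadata(s: State) -> State:
    return {**s, "catalog": dict(s["metadata"])}

def ingest_data(s: State) -> State:
    return {**s, "stored": True}

def choose_analysis(s: State) -> State:
    # Placeholder rule: pick an analysis from the derived metadata.
    return {**s, "analysis": "large" if s["metadata"]["size"] > 4 else "small"}

def run_analysis(s: State) -> State:
    return {**s, "result": f"{s['analysis']}-result"}

def organize_results(s: State) -> State:
    return {**s, "collection": f"/results/{s['analysis']}"}

PIPELINE: List[Callable[[State], State]] = [
    derive_metadata, ingest_metadata, ingest_data,
    choose_analysis, run_analysis, organize_results,
]

def run_pipeline(raw: str, trigger: str, log: ProvenanceLog) -> State:
    """Run every stage in order; `trigger` is e.g. 'data-source' or 'user-request'."""
    state: State = {"raw": raw}
    for stage in PIPELINE:
        state = stage(state)
        log.record(trigger, stage.__name__)  # every activity is logged
    return state

log = ProvenanceLog()
out = run_pipeline("seismogram", "data-source", log)
print(out["collection"])  # /results/large
print(len(log.entries))   # 6
```

The same `run_pipeline` entry point serves both triggers named on the slide; only the label recorded in the provenance log differs.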
Data Discovery
[Diagram: digital entities, metadata, services and state]
- New data updates the relationships among data in collections
- Services are invoked to analyze the new relationships
- DGMS applications are notified of state updates
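The three-step chain on this slide (new data, then analysis services, then application notification) resembles an observer pattern. The sketch below assumes a toy in-memory `Collection` with a symmetric relationship map. The class, its fields and the callback signatures are hypothetical and are not part of any DGMS API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

class Collection:
    """Toy data grid collection that tracks relationships and notifies subscribers."""

    def __init__(self) -> None:
        # entity -> set of related entities (hypothetical relationship model)
        self.relations: Dict[str, Set[str]] = defaultdict(set)
        self.services: List[Callable[[str, Set[str]], str]] = []
        self.listeners: List[Callable[[str], None]] = []

    def add_entity(self, name: str, related_to: List[str]) -> None:
        # 1. New data updates relationships among data in the collection.
        for other in related_to:
            self.relations[name].add(other)
            self.relations[other].add(name)
        # 2. Services are invoked to analyze the new relationships.
        for service in self.services:
            state = service(name, self.relations[name])
            # 3. Applications are notified of the resulting state update.
            for notify in self.listeners:
                notify(state)

c = Collection()
c.services.append(lambda e, rel: f"{e} linked to {len(rel)} entities")
received: List[str] = []
c.listeners.append(received.append)
c.add_entity("quake-42", ["station-A", "station-B"])
print(received)  # ['quake-42 linked to 2 entities']
```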
What do they want?
"We know the business (scientific) process."
"Cyberinfrastructure is all we care about (why bother about atoms or DNA?)"
What do they want?
Use DGL to describe your process logic, with abstract references to data grid infrastructure dependencies.
Gridflows
- A grid workflow (gridflow) is the automation of an execution pipeline in which data or tasks are processed through multiple autonomous grid resources according to a set of procedural rules.
- Gridflows are executed on resources that are obtained dynamically through the confluence of one or more autonomous administrative domains (peers).
Gridflow Language and CS Domains
- Compiler design: variable scope definition, recursive grammar, execution stack management
- Data modeling: schema definitions for gridflow patterns
- Grid computing: data grid data types, virtual organization, basic operations, …
- Other concepts and standards: rules, W3C XQuery, GGF JSDL?
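The compiler-design items (variable scopes and execution stack management) can be illustrated with a stack of scope frames. In this sketch, entering a nested flow pushes a frame, leaving it pops the frame, and lookups search from the innermost frame outward, so inner declarations shadow outer ones. The `ScopeStack` class is a hypothetical illustration and does not reproduce DGL's actual scoping rules.

```python
from typing import Any, Dict, List

class ScopeStack:
    """Execution stack of variable scopes for nested flows (a sketch)."""

    def __init__(self) -> None:
        self._frames: List[Dict[str, Any]] = [{}]  # outermost (global) scope

    def push(self) -> None:
        self._frames.append({})  # entering a nested flow

    def pop(self) -> None:
        if len(self._frames) == 1:
            raise RuntimeError("cannot pop the global scope")
        self._frames.pop()  # leaving the flow discards its local variables

    def declare(self, name: str, value: Any) -> None:
        self._frames[-1][name] = value

    def lookup(self, name: str) -> Any:
        # Innermost frame first, so inner declarations shadow outer ones.
        for frame in reversed(self._frames):
            if name in frame:
                return frame[name]
        raise KeyError(name)

s = ScopeStack()
s.declare("collection", "/home/scec")
s.push()
s.declare("collection", "/tmp/scratch")
print(s.lookup("collection"))  # /tmp/scratch
s.pop()
print(s.lookup("collection"))  # /home/scec
```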
Gridflow Language Requirements
- High-level abstract descriptions: abstract description of cyberinfrastructure dependencies
- Simple yet flexible: flexible enough to describe complex requirements (no brute force)
- Gridflow dependency patterns: based on execution structure (parallel, sequential, fork-new) and data semantics (milestones, for-each, switch-case)
- Asynchronous execution: for long-running requests
- Querying: using the existing standard, XQuery
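Four of the dependency patterns listed above (sequential, parallel, for-each, switch-case) can be sketched as small combinators over callables. The function names and signatures below are hypothetical. The parallel pattern is approximated with Python's standard `ThreadPoolExecutor`, and milestones and fork-new are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Task = Callable[[], object]

def sequential(tasks: List[Task]) -> List[object]:
    # Each task starts only after the previous one has finished.
    return [task() for task in tasks]

def parallel(tasks: List[Task]) -> List[object]:
    # Tasks run concurrently; results come back in submission order.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]

def for_each(items: List[object], body: Callable[[object], object]) -> List[object]:
    # Data-driven pattern: one iteration per item (e.g. per file in a collection).
    return [body(item) for item in items]

def switch_case(value: object, cases: Dict[object, Task], default: Task) -> object:
    # Choose a branch from a runtime value (e.g. a metadata attribute).
    return cases.get(value, default)()

print(sequential([lambda: 1, lambda: 2]))      # [1, 2]
print(parallel([lambda: "a", lambda: "b"]))    # ['a', 'b']
print(for_each(["f1", "f2"], str.upper))       # ['F1', 'F2']
print(switch_case("tiff", {"tiff": lambda: "image-pipeline"}, lambda: "generic"))
```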
Gridflow Language Requirements (continued)
- Process metadata and annotations: runtime definition, update and querying of metadata
- Runtime management of gridflows: stop a gridflow at run time
- Partitioning: a facility in the language to divide a gridflow request into multiple requests
- Import descriptions: refer to other gridflows in execution
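Two of these requirements, partitioning a request and stopping a gridflow at run time, can be sketched together. The code assumes that the steps being partitioned are independent of one another, so any contiguous split is valid. `partition`, `run` and the use of `threading.Event` as the stop signal are illustrative choices, not DGL constructs.

```python
import threading
from typing import Callable, List

def partition(steps: List[str], parts: int) -> List[List[str]]:
    """Split independent steps into `parts` contiguous sub-requests of near-equal size."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    size, extra = divmod(len(steps), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(steps[start:end])
        start = end
    return chunks

def run(steps: List[str], do_step: Callable[[str], None], stop: threading.Event) -> int:
    """Execute steps until done or until `stop` is set; return the number completed."""
    done = 0
    for step in steps:
        if stop.is_set():  # runtime management: honour a stop request between steps
            break
        do_step(step)
        done += 1
    return done

print(partition(["s1", "s2", "s3", "s4", "s5"], 2))  # [['s1', 's2', 's3'], ['s4', 's5']]
```

In this sketch, each sub-request returned by `partition` could be handed to a different peer. A stop request takes effect only between steps, not in the middle of a step.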
Data Grid Language (DGL)
- DGL is just a language specification.
- It can be used in any commercial or academic data grid software.
- DGL describes gridflows and their dependencies.
Gridflow Process I
[Diagram: end user using DGBuilder → gridflow description in Data Grid Language]
Gridflow Process II
[Diagram: abstract gridflow in Data Grid Language → Planner → concrete gridflow in Data Grid Language]
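The planner step, which turns an abstract gridflow into a concrete one, can be sketched as resolving each logical target to a physical resource. The sketch assumes a replica catalog and a per-resource load figure, and applies a "least-loaded replica" policy. The catalog, the `srb://` locations and the policy are all hypothetical, and the presentation does not specify how its planner chooses resources.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Step:
    operation: str
    target: str  # a logical name (abstract) or a physical location (concrete)

def plan(abstract: List[Step], catalog: Dict[str, List[str]],
         load: Dict[str, int]) -> List[Step]:
    """Map each logical target to its least-loaded replica (hypothetical policy)."""
    concrete = []
    for step in abstract:
        replicas = catalog.get(step.target)
        if not replicas:
            raise LookupError(f"no resource for {step.target!r}")
        best = min(replicas, key=lambda r: load.get(r, 0))
        concrete.append(Step(step.operation, best))
    return concrete

catalog = {"scec/velocity-model": ["srb://sdsc/vm", "srb://usc/vm"]}
load = {"srb://sdsc/vm": 7, "srb://usc/vm": 2}
print(plan([Step("replicate", "scec/velocity-model")], catalog, load))
```

Because both the input and the output are lists of `Step`, this sketch mirrors the slide's point that abstract and concrete gridflows are expressed in the same language.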
Gridflow Process III
[Diagram: concrete gridflow in Data Grid Language, Gridflow Processor, Gridflow P2P network]
DGL Structure (data model)
[Diagram: Runnable, Pre-Process, Post-Process, ECA rule-based definitions, Metadata, Flow Logic, Structure]
- Structure: parallel, sequential, etc.
- Recursive definition of runnables as either a data operation or an executable process (job)
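The recursive data model can be sketched with Python dataclasses. A runnable is either a `Step` (a data operation or a job) or a `Flow` that contains further runnables, carries metadata, and has pre- and post-process rules. Here the rules are simplified to condition/action pairs that fire when a flow is entered or left. All class and field names are hypothetical, and the walker visits "parallel" children in order rather than concurrently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

@dataclass
class Rule:
    """Condition/action pair fired on entering (pre) or leaving (post) a flow."""
    condition: Callable[[Dict[str, object]], bool]
    action: Callable[[Dict[str, object]], None]

@dataclass
class Step:
    name: str
    kind: str  # "operation" (data grid operation) or "job" (executable process)

@dataclass
class Flow:
    name: str
    structure: str  # "sequential" or "parallel"
    children: List["Runnable"]
    metadata: Dict[str, object] = field(default_factory=dict)
    pre: List[Rule] = field(default_factory=list)
    post: List[Rule] = field(default_factory=list)

Runnable = Union[Flow, Step]

def walk(node: Runnable, env: Dict[str, object], trace: List[str]) -> None:
    """Recursively visit a runnable tree, firing pre/post rules around each flow."""
    if isinstance(node, Step):
        trace.append(f"{node.kind}:{node.name}")
        return
    scope = {**env, **node.metadata}  # flow metadata is visible to its children
    for rule in node.pre:
        if rule.condition(scope):
            rule.action(scope)
    for child in node.children:  # a real engine would honour node.structure
        walk(child, scope, trace)
    for rule in node.post:
        if rule.condition(scope):
            rule.action(scope)

trace: List[str] = []
notify = Rule(lambda s: s.get("notify") is True, lambda s: trace.append("post:notify"))
flow = Flow("ingest", "sequential",
            [Step("copy", "operation"), Flow("analyze", "parallel", [Step("fft", "job")])],
            metadata={"notify": True}, post=[notify])
walk(flow, {}, trace)
print(trace)  # ['operation:copy', 'job:fft', 'post:notify']
```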
Operations in DGL
- Execute process (DAG, Java, WSDL, etc.): very generic
- Data grid operations:
  - Copy directories/files
  - Change permissions (chmod)
  - Create directory/file/archive
  - Delete directory/file/archive
  - Ingest/download a URL or any data source
  - Replicate, rename, list
  - SeekNWrite, SeekNRead
  - Ingest and query any type of metadata
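A subset of these data grid operations can be sketched as a dispatch table from operation names to handlers. This example uses a local directory as a stand-in for grid storage. The operation names and the `execute` function are hypothetical, and only standard-library calls (`shutil`, `os`, `pathlib`) are used. SeekNWrite and SeekNRead are modeled as positioned writes and reads within an existing file.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict

def seek_n_write(path: str, offset: int, data: bytes) -> None:
    # Overwrite bytes starting at `offset` without rewriting the whole file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

def seek_n_read(path: str, offset: int, length: int) -> bytes:
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Hypothetical mapping from operation names to local-filesystem stand-ins;
# a real data grid would route these calls to remote storage resources.
OPERATIONS: Dict[str, Callable] = {
    "copy": shutil.copy,
    "chmod": os.chmod,
    "create_dir": lambda p: Path(p).mkdir(parents=True, exist_ok=True),
    "create_file": lambda p: Path(p).touch(),
    "delete_file": lambda p: Path(p).unlink(),
    "rename": os.rename,
    "list": lambda p: sorted(os.listdir(p)),
    "seek_n_write": seek_n_write,
    "seek_n_read": seek_n_read,
}

def execute(op: str, *args):
    if op not in OPERATIONS:
        raise ValueError(f"unsupported operation: {op}")
    return OPERATIONS[op](*args)

root = tempfile.mkdtemp()
execute("create_dir", f"{root}/coll")
path = f"{root}/coll/a.dat"
Path(path).write_bytes(b"hello world")
execute("seek_n_write", path, 6, b"WORLD")
print(execute("seek_n_read", path, 0, 11))  # b'hello WORLD'
execute("rename", path, f"{root}/coll/b.dat")
print(execute("list", f"{root}/coll"))      # ['b.dat']
```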
Components of DGL
- A DGL document is either a request or a response.
- Data grid request: either a flow (an aggregation of operations) or a status query
- Data grid response: either a flow acknowledgement or a status response
- Requests can be synchronous or asynchronous, which leaves flexibility for any type of implementation
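This request/response exchange can be sketched as follows. In the asynchronous case, a flow request returns an acknowledgement carrying a request id immediately, and a later status query for that id returns a status response. A synchronous request simply blocks and returns the flow's result. The `GridflowServer` class, its methods and the three state strings are hypothetical. The background execution uses Python's standard `ThreadPoolExecutor`.

```python
import itertools
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class FlowAck:          # response to an asynchronous flow request
    request_id: int

@dataclass
class StatusResponse:   # response to a status query
    request_id: int
    state: str          # "running", "completed" or "failed"

class GridflowServer:
    """Sketch of the DGL request/response exchange; all names are hypothetical."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._ids = itertools.count(1)
        self._jobs: Dict[int, Future] = {}
        self._lock = threading.Lock()

    def submit_flow(self, flow: Callable[[], object], synchronous: bool = False):
        if synchronous:
            return flow()  # caller blocks until the whole flow finishes
        with self._lock:
            rid = next(self._ids)
            self._jobs[rid] = self._pool.submit(flow)
        return FlowAck(rid)  # returned at once, suited to long-running flows

    def query_status(self, request_id: int) -> StatusResponse:
        fut = self._jobs[request_id]
        if not fut.done():
            state = "running"
        elif fut.exception() is not None:
            state = "failed"
        else:
            state = "completed"
        return StatusResponse(request_id, state)

    def result(self, request_id: int) -> object:
        return self._jobs[request_id].result()  # blocks until the flow is done

server = GridflowServer()
print(server.submit_flow(lambda: 42, synchronous=True))  # 42
ack = server.submit_flow(lambda: sum(range(10)))
print(ack)                                # FlowAck(request_id=1)
print(server.result(ack.request_id))      # 45
print(server.query_status(ack.request_id).state)  # completed
```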
Summary
- A standard description language is needed
- Requirements of the language
- Data Grid Language (DGL): recursive definition of flows and steps; metadata or variable scopes; rules; can be partitioned (subdivided)
- Components of Data Grid Language
- Next step: talk to scheduling and heuristics people
Got ideas or suggestions?
Contact: SDSC Matrix project, arun@sdsc.edu
Google keyword: SDSC Gridflow