San Diego Supercomputer Center SDSC Storage Resource Broker A Data Storage Language for the Requirements of Rebels and Misfits Arun Jagatheesan San Diego.

Presentation transcript:

San Diego Supercomputer Center, SDSC Storage Resource Broker. A Data Storage Language for the Requirements of Rebels and Misfits (or: a talk on Data Grids and DGL). Arun Jagatheesan, San Diego Supercomputer Center, University of California, San Diego. HPTS Workshop, Asilomar, California, September 2005.

2 Talk Outline (“Next Hype in Grids”): My belief system before we begin; Meet my friends, the rebels and misfits; File systems, databases, data grids; Mapping physical data to logical view; Mapping physical data and storage to logical view; SRB statistics; Mapping physical data, storage and processes to logical view; Data Grid Language; Conclusion (what now = work and sacrifices; what next = vision). He has 44 slides and 20 minutes. No infotainment slides either. Boring!

3 Disclaimer and Warning. My own opinion or thoughts: Arun says so… (can be wrong?). Based on my current knowledge and understanding: as of September 2005, my current knowledge and level of understanding (can change?). My belief system: I believe in Data Grids for inter/intra/multi-organizational unstructured data management (biased?). My belief might not be in sync with your belief, but it can coexist with your favorite technology.

4 Meet my friends: Rebels and Misfits. Esoteric requirements from “high-end” users: to keep them alive, they need more… more of everything. These requirements are not broadly felt or required in industry, and they push the existing technology to its limits. From the existing technology’s perspective, these folks are nuts! The existing technology was not designed for these requirements, so my friends become rebels or misfits from the existing technology’s perspective.

6 Mapping physical data to logical view: hierarchical view, independent of network, disk, sector, track, fragments. Rule: storage abstraction (hide storage resources).

7 Mapping physical data to logical view: relational view (assume it’s a database), independent of network, disk, sector, track, fragments. Thanks to the rebels and misfits in the airline industry who wanted transactional capabilities.

9 NIH BIRN SRB Data Grid: Biomedical Informatics Research Network. Access and analyze biomedical image data. Data resources are distributed throughout the country, at medical schools and research centers across the US. Stable, high-performance, grid-based environment. Coordinate data sharing, federate collections, support data mining and analysis.

10 Mapping distributed data & storage to logical view: 25 universities or research hospitals, multiple heterogeneous storage resources.

11 Approach we have taken in Data Grids. The logical schema (view) is independent of the physical schema, just like in databases or even file systems. Physical resources are provided in the form of logical resources in the logical view; this is very different from databases (though it may be similar to tablespaces). A database is used for the mapping: data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources. Used for digital libraries, persistent archives and data grids.

12 The “Grid” Vision

13 Data Grid Resource Providers: Grid Resource Providers (GRP) provide content and/or storage. (Figure labels: GRP, /txt3.txt, GRP.)

14 Data Grid Administrative Domain: an administrative domain with one or more Grid Resource Providers; it could include their data centers. (Figure labels: GRP, /txt3.txt, GRP, Research Lab.)

15 Data Grid Administrative domains. (Figure labels: Storage-R-Us Resource Providers, data + storage (50); Research Lab, data + storage (40); University, data + storage (10); files /…/text1.txt, /…//text2.txt, /txt3.txt; GRP, GRP.)

16 Data Grid: Logical view of data & resources. Physical view (figure labels): Storage-R-Us Resource Providers, data + storage (50); Research Lab, data + storage (40); University, data + storage (10); files /…/text1.txt, /…//text2.txt, /txt3.txt; GRP, GRP. Logical namespace (need not be the same as the physical view of resources): /home/arun.sdsc/exp1, containing /home/arun.sdsc/exp1/text1.txt, /home/arun.sdsc/exp1/text2.txt and /home/arun.sdsc/exp1/text3.txt; data + storage (100).
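The logical-namespace idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SRB's actual catalog or API: the `Replica` and `LogicalNamespace` classes, the domain and resource names, and all physical paths are hypothetical. Only the logical paths under /home/arun.sdsc/exp1 come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file. All values used below are made up."""
    domain: str          # administrative domain that owns the storage
    resource: str        # logical resource name exposed by that domain
    physical_path: str   # where the bytes actually live


class LogicalNamespace:
    """Toy catalog mapping logical paths to physical replicas."""

    def __init__(self):
        self._catalog = {}

    def register(self, logical_path, replica):
        # A logical path may have several replicas in different domains.
        self._catalog.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        """Return the physical replicas behind one logical path."""
        return list(self._catalog[logical_path])

    def list_collection(self, collection):
        """List the logical paths directly inside a collection."""
        prefix = collection.rstrip("/") + "/"
        return sorted(
            path for path in self._catalog
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        )


# Three files held by three different (hypothetical) domains appear
# side by side in one logical collection.
ns = LogicalNamespace()
exp1 = "/home/arun.sdsc/exp1"
ns.register(exp1 + "/text1.txt", Replica("storage-r-us", "tape-pool", "/vol7/a91.dat"))
ns.register(exp1 + "/text2.txt", Replica("research-lab", "disk-pool", "/scratch/run42/out.txt"))
ns.register(exp1 + "/text3.txt", Replica("university", "nfs-share", "/export/data/t3.txt"))
```

A user browsing `ns.list_collection(exp1)` sees one directory of three files and never needs to know which domain, resource or physical path holds each one; `ns.resolve(...)` is the lookup the broker would perform on the user's behalf.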

17 BIRN: Inter-organizational Data

18 SDSC SRB User Community (Major US): BaBar, Stanford Linear Accelerator Center (SLAC) California Digital Library (CDL) Center for Integrated Space Weather Modeling (CISM) CVC, Visualization Portal LDC Data Storage NIH Bio Informatics Research Network (BIRN) NSF Southern California Earthquake Center (SCEC) National Archives and Records Administration (NARA) National Aeronautics and Space Administration Centers (NASA) National Virtual Observatory (NVO) Npackage, NSF Middleware Initiative (NMI) National Science Digital Library (NSDL) National Optical Astronomy Observatory (NOAO) ROADNet Purdue University SCCOOS, USA Scientific Rich Media Archive Salk Institute Strand Map Service, USA UC Berkeley Library UCSD Library University of Houston Persistent Archives Test bed University of Wisconsin, Madison WebBase, Stanford University Yale University Library

19 SDSC SRB User Community: Academia Sinica, Taiwan Australian National University Bio-Lab, University of Genoa, Italy Council for the Central Laboratory of the Research Councils (CCLRC), UK CC-IN2P3, France Distributed Framework, Singapore Distributed Aircraft Maintenance Environment (DAME), UK eMinerals Project, UK eScience, Belfast Center Fraunhofer ITWM, Germany High Energy Accelerator Organization, KEK, Japan K* Grid Computing, Korea KEK Computing Center, Japan Lyon, France NorGrid, Norway Nanyang Data Grid, Singapore NCHC, Taiwan Queensland University of Technology (QUT), Australia Rutherford Appleton Laboratory (RAL), UK T-Systems, Germany UK eScience Project, UK UniGrid, Poland UMK, Poland Virtual Laboratory for eScience, Netherlands

Total data brokered by SDSC SRB (figure; chart labels that survived extraction: 358 TB, 682 TB)

22 Mapping distributed data, storage and processes to logical view

23 Long-run Processes in Data Grid: Data Grid ILM, Data Grid Triggers, Data Gridflows.

24 Data Grid (Enterprise Utility). Physical resources are managed by autonomous administrative domains of the same enterprise (ABCZ.com). (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party.)

25 Data Grid (Enterprise Utility). Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department. (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2.)

26 Data Grid (Enterprise Utility). (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2, Project 3, Project 4.)

27 Data Grid ILM

28 Change is Constant. Changes in access patterns: based on the number of users accessing the data and the domains that want to access it. Data value: the value of a data set (collection?) for a particular domain, based on its business model and its users’ access patterns. Each domain will have a different value based on its users and its role in a data grid.

29 “Data Value” based on users. When more users access a project’s data, its data value increases, so move that data to a faster storage type. (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2, Project 3, Project 4.)

30 “Data Value” based on domain. When more users from the same domain access the data, the data value for that particular data in that particular domain increases, so replicate the data to resources in that domain. (The converse is also true.) (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2, Project 3, Project 4.)

31 “Data Value” based on role. The 3rd-party data center has no users who use the data, but it is interested in having a replica of any data (or deleted data) for long-term preservation. (Figure labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2, Project 3, Project 4.)

32 Data Grid ILM. ILM = Information Lifecycle Management (sales jargon): dynamic re-orientation of data placement and data retention policies (rules), based on the “business value of data” and storage cost. HSM = Hierarchical Storage Management, based on “data freshness”; ILM goes one step further. Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules.
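The three “data value” rules from the preceding slides (more users, more users from one domain, and the archival role of a third party) can be combined into one small placement planner. The sketch below is purely illustrative: the function name `plan_placement`, the thresholds, the action tuples and the domain names are assumptions made for this example and are not part of SRB or DGL.

```python
from collections import Counter


def plan_placement(access_log, home_domain, hot_threshold=3,
                   domain_threshold=2, archive_domain=None):
    """Suggest ILM actions for one data set.

    access_log is a list of (user, domain) pairs, one per access.
    The thresholds are arbitrary illustration values.
    """
    distinct = set(access_log)                 # count each user once per domain
    users = {user for user, _ in distinct}
    users_per_domain = Counter(domain for _, domain in distinct)

    actions = []
    # Rule 1: many users overall -> higher value -> faster storage.
    if len(users) >= hot_threshold:
        actions.append(("move", "faster-storage"))
    # Rule 2: many users in a remote domain -> replicate into that domain.
    for domain in sorted(users_per_domain):
        if domain != home_domain and users_per_domain[domain] >= domain_threshold:
            actions.append(("replicate", domain))
    # Rule 3: an archival domain wants a copy regardless of usage.
    if archive_domain is not None:
        actions.append(("replicate", archive_domain))
    return actions


log = [("ann", "us"), ("bob", "asia"), ("chen", "asia"), ("bob", "asia")]
plan = plan_placement(log, home_domain="us", archive_domain="third-party")
```

In a real data grid each autonomous domain would evaluate its own version of these rules, which is exactly why the slide calls the problem tricky: the thresholds and even the rules themselves differ per domain.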

33 Data Grid Triggers. Similar to triggers in databases; based on ECA (Event, Condition, Action) concepts. Example: Event = insert new file in collection (“/ourProject/data”); Condition = (color = “blue” && galaxy = “Andromeda”); Action = Run(selectiveDataReplicator.dgl).
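The Event-Condition-Action example on this slide can be mimicked with a minimal in-memory registry. This is a hedged sketch, not how SRB implements triggers: `Trigger`, `TriggerRegistry` and the `fire` method are hypothetical names, and the action here only records which DGL document would be run instead of executing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Trigger:
    event: str                                   # e.g. "insert"
    collection: str                              # collection being watched
    condition: Callable[[Dict[str, str]], bool]  # test on file metadata
    action: Callable[[str], None]                # what to do with the file


class TriggerRegistry:
    def __init__(self):
        self._triggers = []

    def add(self, trigger):
        self._triggers.append(trigger)

    def fire(self, event, path, metadata):
        """Run every matching trigger; return how many actions ran."""
        fired = 0
        for t in self._triggers:
            in_collection = path.startswith(t.collection.rstrip("/") + "/")
            if t.event == event and in_collection and t.condition(metadata):
                t.action(path)
                fired += 1
        return fired


launched = []  # stands in for actually running a DGL flow
registry = TriggerRegistry()
registry.add(Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=lambda path: launched.append(("selectiveDataReplicator.dgl", path)),
))

hits = registry.fire("insert", "/ourProject/data/img1.fits",
                     {"color": "blue", "galaxy": "Andromeda"})
misses = registry.fire("insert", "/ourProject/data/img2.fits",
                       {"color": "red", "galaxy": "Andromeda"})
```

Only the first insert satisfies the condition, so only one replication request is recorded. The file names and metadata values are invented for the example.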

35 Data Grid Language Requirement. Data Grid ILM process: the long-run process that has to be run is described in DGL. Data Grid Triggers: the action part of the ECA (Event-Condition-Action) logic. Data Gridflows: step-by-step execution of a long-run process on the Data Grid. Analogous to SQL in relational databases: long-run procedures are stored and executed in the Data Grid itself. Captures the “Infrastructure Execution Logic”.

36 DGL Request: annotations about the Data Grid Request; can be either a Flow or a Status Query.

37 DGL Requests (2 types). Data Grid Flow: an XML structure that describes the execution logic, associated procedural rules and DGL variables; it can be a synchronous or an asynchronous flow. Status Query: an XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity; status queries can be made for both synchronous and asynchronous flows.

38 Flow: scoped variables that can control the flow; logic used by the sub-members; sub-members that are the real execution statements.
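The three parts of a flow (scoped variables, logic, sub-members) and the idea of querying status at any level can be modelled roughly as follows. This is a Python analogy only, not the DGL XML schema: the `Flow` and `Step` classes, the `"sequential"` logic name and the `query` method are assumptions introduced for illustration, and only sequential execution is sketched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Step:
    """A leaf member: a real execution statement."""
    name: str
    run: Callable[[Dict[str, object]], None]
    status: str = "pending"


@dataclass
class Flow:
    name: str
    variables: Dict[str, object] = field(default_factory=dict)
    logic: str = "sequential"  # the only logic this sketch supports
    members: List[Union["Flow", Step]] = field(default_factory=list)
    status: str = "pending"

    def execute(self, inherited=None):
        if self.logic != "sequential":
            raise NotImplementedError(self.logic)
        # Inner flows see outer variables; their own variables shadow them.
        scope = {**(inherited or {}), **self.variables}
        self.status = "running"
        for member in self.members:
            if isinstance(member, Flow):
                member.execute(scope)
            else:
                member.run(scope)
                member.status = "done"
        self.status = "done"

    def query(self, path):
        """Status of this flow, or of a nested member named by path."""
        if not path:
            return self.status
        for member in self.members:
            if member.name == path[0]:
                if isinstance(member, Flow):
                    return member.query(path[1:])
                return member.status
        raise KeyError(path[0])


log = []
ingest = Flow("ingest", variables={"collection": "/ourProject/data"}, members=[
    Step("checksum", lambda s: log.append(("checksum", s["collection"]))),
    Flow("replicate", variables={"target": "archive"}, members=[
        Step("copy", lambda s: log.append(("copy", s["collection"], s["target"]))),
    ]),
])

before = ingest.query(["replicate", "copy"])
ingest.execute()
after = ingest.query(["replicate", "copy"])
```

The nested `replicate` flow reads `collection` from its parent's scope and adds its own `target`, and the same `query` call works on the whole flow, a sub-flow or a single step, which mirrors the "any granular level" wording of the Status Query slide.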

39 Flow Logic (How a flow executes)

40 DGL Response: responses can be synchronous or asynchronous.

42 Conclusion. Data Grids are for real: they manage inter/intra/multi-organizational unstructured data (files, streams, …). Data Grids extend database concepts and internally use a database. A language like the Data Grid Language described here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS). Reference: paper in the VLDB Workshop on Data Management in Grids.

43 We are SDSC SRB. Arun is here! (Shameless self-promotion.) Not in picture: many students.

44 Additional Thanks (ignorance is bliss). My advisor: “You have already graduated and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?” Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is the color ‘Gray’, and his web page is very cheesy, with music in the background. I guess he does not do much computer science; he works with astronomers.”

45 Contact Info: Arun Jagatheesan