San Diego Supercomputer Center www.iRODS.org Self-organizing Smart Namespaces : Next Generation Data Grid Systems Arun Jagatheesan iRODS.org.

Presentation transcript:

Slide 1: Self-organizing Smart Namespaces: Next Generation Data Grid Systems
Arun Jagatheesan, San Diego Supercomputer Center, iRODS.org

Slide 2: Content Outline
- State of the art
  - Where we stand
  - Concepts
- What is next, new, hot and exciting?
  - Yesterday’s research: now
  - Today’s research: future?
- What could be done from OGF, SNIA, IETF?
  - Standard for distributed data management
  - Risks, rewards

Slide 3: State of the art: where we are now
(Shameless self-promotion or fact!)
- Estimated 2 petabytes of data brokerage
- Multiple agencies: DoD, NARA, NSF, NIH, …
- Multiple countries: US, UK, Japan, France, …, Antarctica
- Spun off a private company
- … We don’t live in the past anyway …

Slide 4: Concepts and Lessons
(Current understanding, looking back)
- Don’t hide distributed computing
  - Let users “enjoy” a distributed namespace rather than cheat them with a “location opaque” namespace (unlike traditional file systems)
  - Human-readable, or enjoyable (no URLs, UUIDs, etc.)
- Logical mappings to physical heterogeneities
  - Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids
  - Hide everything behind logical, human-friendly names
- Keep it simple and scalable (it’s the data model and design)
  - No layer on top of another layer. A finished product, not Lego blocks.
- Hybrid approach: neither too much P2P nor too much centralization. Just the right level of distributed computing, with some TLC for users
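The "logical mappings to physical heterogeneities" idea can be illustrated with a minimal Python sketch. This is not SRB or iRODS code: the class names, resource names, and paths below are all hypothetical, chosen only to show one human-readable logical name resolving to several physical replicas while the distribution stays visible to the user.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a data object (all names here are hypothetical)."""
    resource: str       # logical resource name, e.g. "sdsc-disk"
    physical_path: str  # location as the storage system itself sees it


@dataclass
class LogicalNamespace:
    """Maps human-readable logical names to one or more physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # A logical name may point at several replicas on different resources.
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> list:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(logical_name, []))

    def resources_for(self, logical_name: str) -> set:
        # The distribution is not hidden: users can ask where the data lives.
        return {replica.resource for replica in self.resolve(logical_name)}


ns = LogicalNamespace()
name = "/grid/home/astro/survey/image-001.fits"
ns.register(name, Replica("sdsc-disk", "/vault1/a9/f3"))
ns.register(name, Replica("uk-tape", "/archive/0007/1123"))
```

Here `ns.resources_for(name)` reports both resource names, so the user works with one readable name and can still see that the data is distributed.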

Slide 5: Content Outline
- State of the art
  - Where we stand
  - Concepts
- What is next, new, hot and exciting?
  - A use case: LSST
  - Yesterday’s research: now
  - Today’s research: future?
- What could be done from OGF, SNIA, IETF?
  - Standard for distributed data management
  - Risks, rewards

Slide 6: Motivational Use Case
- LSST = Large Synoptic Survey Telescope
- 150+ petabytes
- Multiple countries, multiple data centers
- Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …)
- Multiple heterogeneous hardware

Slide 7: Yesterday’s research
- Data grid workflows and policies
- Some concepts prototyped in SRB Matrix
- Event, Condition, Action (ECA) based “data grid flows”
  - If, for, for-each, if-else, switch-case
- Server-side workflows on data grids
- Use a separate language, the Data Grid Language, to capture the recipe of a workflow and execute it as an action
- Let the flow be with you (a Flow data type was introduced)
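To illustrate a workflow "recipe" that is captured as data and then executed, here is a small Python sketch of an interpreter for nested flow steps. It is not the Data Grid Language or SRB Matrix; the step kinds (`sequence`, `if_else`, `for_each`, `action`), the dictionary layout, and the action names are assumptions made for this example.

```python
def run_flow(step, context, actions):
    """Execute one flow step (a nested dict) against a context dict."""
    kind = step["kind"]
    if kind == "sequence":
        for child in step["steps"]:
            run_flow(child, context, actions)
    elif kind == "if_else":
        taken = step["then"] if step["condition"](context) else step.get("otherwise")
        if taken is not None:
            run_flow(taken, context, actions)
    elif kind == "for_each":
        for item in context[step["items"]]:
            # Bind the loop variable in a child context so iterations stay isolated.
            run_flow(step["body"], {**context, step["var"]: item}, actions)
    elif kind == "action":
        actions[step["name"]](context)
    else:
        raise ValueError(f"unknown step kind: {kind}")


log = []
actions = {  # hypothetical server-side services
    "replicate": lambda ctx: log.append(("replicate", ctx["file"]["name"])),
    "checksum": lambda ctx: log.append(("checksum", ctx["file"]["name"])),
}
flow = {
    "kind": "for_each", "items": "files", "var": "file",
    "body": {
        "kind": "if_else",
        "condition": lambda ctx: ctx["file"]["size"] > 100,
        "then": {"kind": "action", "name": "replicate"},
        "otherwise": {"kind": "action", "name": "checksum"},
    },
}
files = [{"name": "a.dat", "size": 500}, {"name": "b.dat", "size": 10}]
run_flow(flow, {"files": files}, actions)
```

After the call, `log` records one action per file in order. Conditions are plain Python callables here for brevity; a real recipe language would express them declaratively so the flow can be shipped to a server.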

Slide 8: Today’s research = future
- Now = lessons learnt + yesterday’s research
- Allow the logical namespace to reflect the local namespace (local file system logically mounted on the global namespace)
- Allow users to define their own policies and workflows (services, rules)
- iRODS.org: open source platform, the world’s first open source Data Grid Management System (DGMS)
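One way to picture a local file system "logically mounted" on a global namespace is a mount table resolved by longest matching prefix. The Python sketch below works under that assumption; it does not describe how iRODS implements mounting, and the host names and paths are invented.

```python
import posixpath


class MountTable:
    """Maps prefixes of a global logical namespace to local file system roots."""

    def __init__(self):
        self._mounts = {}  # logical prefix -> (host, local root)

    def mount(self, logical_prefix, host, local_root):
        root = local_root.rstrip("/") or "/"
        self._mounts[logical_prefix.rstrip("/")] = (host, root)

    def resolve(self, logical_path):
        # Longest matching prefix wins, so nested mounts behave as expected.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                host, root = self._mounts[prefix]
                rest = logical_path[len(prefix):].lstrip("/")
                local_path = posixpath.join(root, rest) if rest else root
                return (host, local_path)
        raise KeyError(f"no mount covers {logical_path}")


mt = MountTable()
mt.mount("/grid/lsst", "node2.example.org", "/export/lsst")
mt.mount("/grid/lsst/raw", "node1.example.org", "/data/raw")
```

With these two mounts, a path under `/grid/lsst/raw` resolves to `node1.example.org`, while every other path under `/grid/lsst` resolves to `node2.example.org`.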

Slide 9: iRODS.org
- It’s all about the namespace and how users or applications interact with it
- What if we made this namespace “smart”?
- ECA rules + machine learning or bootstrapped learning
  - Event: anything, as simple as a file upload
  - Condition: based on system or user metadata
  - Action: any system-defined or user-defined service
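The Event, Condition, Action triple maps naturally onto a small rule engine. The following Python sketch is only an illustration of that pattern, not the iRODS rule engine or its rule syntax; `Rule`, `RuleEngine`, the `upload` event name, and the metadata keys are hypothetical, and the machine-learning aspect mentioned on the slide is not covered.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    event: str                          # e.g. "upload"
    condition: Callable[[dict], bool]   # evaluated on system or user metadata
    action: Callable[[dict], None]      # any system- or user-defined service


class RuleEngine:
    def __init__(self):
        self._rules = []

    def add_rule(self, rule):
        self._rules.append(rule)

    def fire(self, event, metadata):
        """Run every rule registered for this event whose condition holds."""
        fired = 0
        for rule in self._rules:
            if rule.event == event and rule.condition(metadata):
                rule.action(metadata)
                fired += 1
        return fired


replicated = []
engine = RuleEngine()
engine.add_rule(Rule(
    event="upload",
    condition=lambda md: md.get("project") == "LSST",
    action=lambda md: replicated.append(md["name"]),
))
engine.fire("upload", {"name": "/grid/lsst/img-1.fits", "project": "LSST"})
engine.fire("upload", {"name": "/grid/other/notes.txt", "project": "misc"})
```

Only the first upload satisfies the condition, so only its name ends up in `replicated`.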

Slide 10: iRODS
- Namespace #1 (data): human-readable data names mapped to data (or virtual data)
- Namespace #2 (resource): human-readable resource names mapped to storage resources (allows distributed computing)
- Namespace #3 (policies): human-readable policy namespace describing how data needs to be managed
- Again, everything can be accessed and controlled by end users (not just system admins)
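As a rough sketch of how the three namespaces relate, the Python class below keeps one registry per namespace and lets a data name refer to a resource name and to policy names. The class, its fields, and the sample names are assumptions for illustration only and do not mirror the iRODS catalog schema.

```python
class DataGrid:
    """Three human-readable namespaces plus the bindings between them."""

    def __init__(self):
        self.data = {}       # namespace #1: data name -> resource name
        self.resources = {}  # namespace #2: resource name -> physical endpoint
        self.policies = {}   # namespace #3: policy name -> how data is managed
        self.bindings = {}   # data name -> list of policy names

    def add_resource(self, name, endpoint):
        self.resources[name] = endpoint

    def add_policy(self, name, description):
        self.policies[name] = description

    def add_data(self, name, resource, policy_names=()):
        # Refuse dangling references so every name resolves in its namespace.
        if resource not in self.resources:
            raise KeyError(f"unknown resource: {resource}")
        for policy in policy_names:
            if policy not in self.policies:
                raise KeyError(f"unknown policy: {policy}")
        self.data[name] = resource
        self.bindings[name] = list(policy_names)

    def describe(self, name):
        resource = self.data[name]
        return {
            "data": name,
            "resource": resource,
            "endpoint": self.resources[resource],
            "policies": {p: self.policies[p] for p in self.bindings[name]},
        }


grid = DataGrid()
grid.add_resource("sdsc-disk", "disk01.example.org:/vault")
grid.add_policy("keep-two-copies", "replicate to a second resource on upload")
grid.add_data("/grid/lsst/img-1.fits", "sdsc-disk", ["keep-two-copies"])
info = grid.describe("/grid/lsst/img-1.fits")
```

Because all three registries are plain named entries, an end user can inspect or change a policy by name in the same way as a data object or resource.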

Slide 11: Content Outline
- State of the art
  - Where we stand
  - Concepts
- What is next, new, hot and exciting?
  - A use case: LSST
  - Yesterday’s research: now
  - Today’s research: future?
- What could be done from OGF, SNIA, IETF?
  - Standard for distributed data management
  - Risks, rewards

Slide 12: OGF, SNIA and iRODS.org
- Collaborative data management
  - FAN / data grid? Either way, it is still distributed data management
  - It still needs a simple API as a standard
- Data grid namespace on XAM resources
  - Standardize a simple API (Java, C/C++) to provide data grid concepts on top of existing SNIA XAM or products
- Open source data grid software
  - Involve engineers from different participating member organizations
- Multi-institutional participation
  - Multiple countries, multiple companies, academic and commercial participants
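To make "a simple standard API" concrete, here is a hypothetical sketch of what a minimal data grid client interface could look like, with an in-memory stand-in for testing. The slide names Java and C/C++ as target languages; Python is used here only for brevity. None of the method names come from an OGF or SNIA specification, and the sketch does not model XAM.

```python
from abc import ABC, abstractmethod


class DataGridClient(ABC):
    """Hypothetical minimal API: logical names, resources, and metadata."""

    @abstractmethod
    def put(self, logical_name: str, data: bytes, resource: str) -> None: ...

    @abstractmethod
    def get(self, logical_name: str) -> bytes: ...

    @abstractmethod
    def set_metadata(self, logical_name: str, key: str, value: str) -> None: ...

    @abstractmethod
    def query(self, key: str, value: str) -> list: ...


class InMemoryGrid(DataGridClient):
    """Stand-in backend; a real one would sit on top of storage products."""

    def __init__(self):
        self._objects = {}   # logical name -> (resource name, content)
        self._metadata = {}  # logical name -> {key: value}

    def put(self, logical_name, data, resource):
        self._objects[logical_name] = (resource, bytes(data))
        self._metadata.setdefault(logical_name, {})

    def get(self, logical_name):
        return self._objects[logical_name][1]

    def set_metadata(self, logical_name, key, value):
        if logical_name not in self._objects:
            raise KeyError(logical_name)
        self._metadata[logical_name][key] = value

    def query(self, key, value):
        # Discovery by metadata rather than by physical location.
        return sorted(n for n, md in self._metadata.items() if md.get(key) == value)


client = InMemoryGrid()
client.put("/grid/lsst/img-1.fits", b"pixels", resource="sdsc-disk")
client.set_metadata("/grid/lsst/img-1.fits", "instrument", "camera-A")
```

Keeping the interface this small is the design point: any vendor or open source backend that implements the four calls could be swapped in behind the same client code.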

Slide 13: Enthusiasm is contagious