San Diego Supercomputer Center SDSC Storage Resource Broker A Data Storage Language for the Requirements of Rebels and Misfits Arun Jagatheesan San Diego.


1 San Diego Supercomputer Center, SDSC Storage Resource Broker. A Data Storage Language for the Requirements of Rebels and Misfits. Arun Jagatheesan, San Diego Supercomputer Center, University of California, San Diego. HPTS Workshop, Asilomar, California, 25-28 September 2005. Or: A talk on Data Grids and DGL.

2 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision. He has 44 slides and 20 minutes. No infotainment slides either – Boring!

3 Disclaimer and Warning. My own opinion or thoughts: Arun says so… (can be wrong?). Based on my current knowledge and understanding: as of September 2005 (can change?). My belief system: I believe in Data Grids for Inter/Intra/Multi-Organizational Unstructured Data Management (biased?). My belief might not be in sync with your belief, but it can co-exist with your favorite technology.

4 Meet my friends – Rebels and Misfits. Esoteric requirements from “high-end” users. To keep them alive, they need more… more of everything. Requirements not broadly felt or required in industry. They push the existing technology to its limits. From the existing technology’s perspective… these folks are nuts! The existing technology was not designed for these requirements. My friends become rebels or misfits from the existing technology’s perspective.

5 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision.

6 Mapping physical data to logical view. Hierarchical view, independent of network, disk, sector, track, fragments. Rule: Storage Abstraction – hide storage resources.

7 Mapping physical data to logical view. Relational view (assume it’s a database), independent of network, disk, sector, track, fragments. Thanks to rebels and misfits in the airline industry who wanted transactional capabilities.

8 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision.

9 NIH BIRN SRB Data Grid. Biomedical Informatics Research Network. Access and analyze biomedical image data. Data resources distributed throughout the country: medical schools and research centers across the US. Stable, high-performance, grid-based environment. Coordinate data sharing. Federate collections. Support data mining and analysis.

10 Mapping distributed data & storage to logical view. 25 universities or research hospitals, multiple heterogeneous storage resources.

11 Approach we have taken in Data Grids. Logical schema (view) is independent of physical schema, just like databases or even file systems. Physical resources are provided in the form of logical resources in the logical view; this is very different from databases (maybe similar to tablespaces). A database is used for the mapping: data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources. Used for digital libraries, persistent archives and data grids.

12 The “Grid” Vision

13 Data Grid Resource Providers. Grid Resource Providers (GRP) providing content and/or storage. [Diagram labels: GRP, /txt3.txt, GRP]

14 Data Grid Administrative Domain. Administrative domain with one or more Grid Resource Providers; could include their data centers. [Diagram labels: GRP, /txt3.txt, GRP, Research Lab]

15 Data Grid Administrative domains. [Diagram labels: /…/text1.txt, /…//text2.txt, GRP, /txt3.txt, GRP. Storage-R-Us Resource Providers: data + storage (50). Research lab: data + storage (40). University: data + storage (10).]

16 Data Grid: Logical view of data & resources. [Diagram labels: /…/text1.txt, /…//text2.txt, GRP, /txt3.txt, GRP. Storage-R-Us Resource Providers: data + storage (50). Research Lab: data + storage (40). University: data + storage (10).] Logical namespace: /home/arun.sdsc/exp1, containing /home/arun.sdsc/exp1/text1.txt, /home/arun.sdsc/exp1/text2.txt and /home/arun.sdsc/exp1/text3.txt; data + storage (100). The logical namespace need not be the same as the physical view of resources.
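The logical-namespace idea on this slide can be sketched in a few lines of Python. This is only an illustration, not SRB's catalog or API: the class names, domain names, resource names and physical paths below are all hypothetical, and the physical paths are made up rather than recovered from the slide's elided ones. The point it shows is that one logical collection can list files whose bytes live in different administrative domains.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    domain: str         # administrative domain that owns the storage (hypothetical names)
    resource: str       # logical storage resource name (hypothetical)
    physical_path: str  # where the bytes actually live (made up for this sketch)


@dataclass
class LogicalNamespace:
    # Catalog: logical path -> list of physical replicas.
    catalog: dict = field(default_factory=dict)

    def register(self, logical_path, replica):
        """Record one more physical copy behind a logical name."""
        self.catalog.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        """Return all physical replicas behind one logical name."""
        return list(self.catalog.get(logical_path, []))

    def list_collection(self, collection):
        """List logical names under a collection, wherever they are stored."""
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self.catalog if p.startswith(prefix))


ns = LogicalNamespace()
ns.register("/home/arun.sdsc/exp1/text1.txt",
            Replica("university", "univ-disk", "/vol7/a91f.dat"))
ns.register("/home/arun.sdsc/exp1/text2.txt",
            Replica("research-lab", "lab-tape", "/archive/0042/b17c.dat"))
ns.register("/home/arun.sdsc/exp1/text3.txt",
            Replica("storage-r-us", "sru-disk", "/pool3/txt3.txt"))
ns.register("/home/arun.sdsc/exp1/text3.txt",
            Replica("research-lab", "lab-disk", "/scratch/copy-of-txt3.txt"))

for path in ns.list_collection("/home/arun.sdsc/exp1"):
    print(path, "->", [r.domain for r in ns.resolve(path)])
```

A user only ever sees the `/home/arun.sdsc/exp1/...` names; which domain serves a given read is a decision the catalog lookup makes possible but does not dictate.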

17 BIRN: Inter-organizational Data

18 SDSC SRB User Community (Major US). BaBar, Stanford Linear Accelerator Center (SLAC); California Digital Library (CDL); Center for Integrated Space Weather Modeling (CISM); CVC, Visualization Portal; LDC Data Storage; NIH Bio Informatics Research Network (BIRN); NSF Southern California Earthquake Center (SCEC); National Archives and Records Administration (NARA); National Aeronautics and Space Administration Centers (NASA); National Virtual Observatory (NVO); Npackage, NSF Middleware Initiative (NMI); National Science Digital Library (NSDL); National Optical Astronomy Observatory (NOAO); ROADNet; Purdue University; SCCOOS, USA; Scientific Rich Media Archive; Salk Institute; Strand Map Service, USA; UC Berkeley Library; UCSD Library; University of Houston; Persistent Archives Test bed; University of Wisconsin, Madison; WebBase, Stanford University; Yale University Library.

19 SDSC SRB User Community. Academia Sinica, Taiwan; Australian National University; Bio-Lab, University of Genoa, Italy; Council for the Central Laboratory of the Research Councils (CCLRC), UK; CC-IN2P3, France; Distributed Framework, Singapore; Distributed Aircraft Maintenance Environment (DAME), UK; eMinerals Project, UK; eScience, Belfast Center; Fraunhofer ITWM, Germany; High Energy Accelerator Organization, KEK, Japan; K* Grid Computing, Korea; KEK Computing Center, Japan; Lyon, France; NorGrid, Norway; Nanyang Data Grid, Singapore; NCHC, Taiwan; Queensland University of Technology (QUT), Australia; Rutherford Appleton Laboratory (RAL), UK; T-Systems, Germany; UK eScience Project, UK; UniGrid, Poland; UMK, Poland; Virtual Laboratory for eScience, Netherlands.

20 Total data brokered by SDSC SRB. [Chart labels: 324 TB, 358 TB, 682 TB]

21 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision.

22 Mapping distributed data, storage and processes to logical view

23 Long-run Processes in Data Grid. Data Grid ILM. Data Grid Triggers. Data Gridflows.

24 Data Grid (Enterprise Utility). [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party] Physical resources managed by autonomous administrative domains of the same enterprise (ABCZ.com).

25 Data Grid (Enterprise Utility). [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project 1, Project 2] Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department.

26 Data Grid (Enterprise Utility). [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4]

27 Data Grid ILM

28 Change is Constant. Changes in access patterns: based on the number of users accessing the data and the domains that want to access it. Data value: the value of a data set (collections?) for a particular domain, based on its business model and its users’ access patterns. Each domain will have a different value based on its users and its role in a data grid.

29 “Data Value” based on users. [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4] When more users access a project’s data, its data value increases; move that data to a faster storage type.

30 “Data Value” based on domain. [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4] When more users from the same domain access the data, the value of that data in that particular domain increases, so replicate the data to resources in that domain. (The converse is also true.)

31 “Data Value” based on role. [Diagram labels: ABCZ.com US, ABCZ.com Asia, Data center, IT Department US, IT Department Asia, 3rd Party, Project1, Project2, Project3, Project4] The 3rd party data center has no users who use the data, but is interested in having a replica of any data (or deleted data) for long-term preservation.

32 Data Grid ILM. ILM = Information Lifecycle Management (sales jargon). Dynamic re-orientation of data placement and data retention policies (rules), based on “business value of data” and storage cost. HSM = Hierarchical Storage Management, based on “data freshness”; ILM goes one step further. Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules.
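The three “data value” rules from slides 29 to 31 (more users, more users from one domain, an archival role) can be written down as a small placement policy. The sketch below is a guess at the shape of such a policy, not anything from SRB or DGL: the thresholds, the action names and the domain names are all hypothetical, since the talk gives no numbers.

```python
from collections import Counter

# Hypothetical thresholds; the talk does not specify any.
HOT_USER_COUNT = 3        # distinct users before data counts as "hot"
DOMAIN_REPLICA_COUNT = 2  # distinct users in one domain before replicating there


def plan_actions(accesses, replica_domains, archive_domain="third-party"):
    """Suggest placement actions for one data set.

    accesses: list of (user, domain) pairs, one per access.
    replica_domains: set of domains that already hold a replica.
    Returns a list of action tuples (names are illustrative only).
    """
    actions = []

    # Rule 1 (slide 29): more users -> higher value -> faster storage.
    users = {user for user, _ in accesses}
    if len(users) >= HOT_USER_COUNT:
        actions.append(("move_to_faster_storage",))

    # Rule 2 (slide 30): many users in one domain -> replicate into that domain.
    per_domain = Counter(domain for _, domain in set(accesses))  # distinct users
    for domain, count in sorted(per_domain.items()):
        if count >= DOMAIN_REPLICA_COUNT and domain not in replica_domains:
            actions.append(("replicate_to", domain))

    # Rule 3 (slide 31): the archival domain wants a replica of everything.
    if archive_domain not in replica_domains:
        actions.append(("replicate_to", archive_domain))

    return actions


accesses = [("ann", "us"), ("bob", "us"), ("chen", "asia"), ("ann", "us")]
print(plan_actions(accesses, replica_domains={"asia"}))
```

In a real data grid each autonomous domain would evaluate its own version of these rules with its own thresholds, which is the difficulty the slide points at.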

33 Data Grid Triggers. Similar to triggers in databases. Based on ECA concepts: Event, Condition, Action. Example: Event = Insert new file in collection (“/ourProject/data”); Condition = (color = “blue” && galaxy = “Andromeda”); Action = Run (selectiveDataReplicator.dgl).
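The Event-Condition-Action example on this slide maps naturally onto a small registry of triggers. The following is a minimal sketch under stated assumptions: `Trigger`, `TriggerRegistry` and the file names are hypothetical, metadata is modeled as a plain dict, and the action only records that `selectiveDataReplicator.dgl` would be launched rather than running anything.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    event: str                            # e.g. "insert"
    collection: str                       # collection the trigger watches
    condition: Callable[[dict], bool]     # predicate over the file's metadata
    action: Callable[[str, dict], None]   # what to do when the condition holds


class TriggerRegistry:
    def __init__(self):
        self.triggers = []

    def add(self, trigger):
        self.triggers.append(trigger)

    def fire(self, event, logical_path, metadata):
        """Run every trigger whose event, collection and condition all match.

        Returns the number of actions that ran.
        """
        fired = 0
        for t in self.triggers:
            in_collection = logical_path.startswith(t.collection.rstrip("/") + "/")
            if t.event == event and in_collection and t.condition(metadata):
                t.action(logical_path, metadata)
                fired += 1
        return fired


launched = []  # stands in for actually running selectiveDataReplicator.dgl

registry = TriggerRegistry()
registry.add(Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda md: md.get("color") == "blue" and md.get("galaxy") == "Andromeda",
    action=lambda path, md: launched.append(("selectiveDataReplicator.dgl", path)),
))

# A blue Andromeda image matches; a red one does not.
registry.fire("insert", "/ourProject/data/img001.fits",
              {"color": "blue", "galaxy": "Andromeda"})
registry.fire("insert", "/ourProject/data/img002.fits",
              {"color": "red", "galaxy": "Andromeda"})
print(launched)
```

Keeping the condition as a separate predicate mirrors the ECA split: the event says when to look, the condition says whether to act, and the action is the long-run process itself.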

34 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision.

35 Data Grid Language Requirement. Data Grid ILM process: the long-run process that has to be run is described in DGL. Data Grid Triggers: the action part of the ECA (Event-Condition-Action) logic. Data Gridflows: step-by-step execution of a long-run process on the Data Grid. Analogy: SQL in relational databases. Long-run procedures are stored and executed in the Data Grid itself. Captures the “Infrastructure Execution Logic”.

36 DGL Request. Annotations about the Data Grid Request. Can be either a Flow or a Status Query.

37 DGL Requests (2 types). Data Grid Flow: an XML structure that describes the execution logic, associated procedural rules and DGL variables; can be a synchronous or asynchronous flow. Status Query: an XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity; status queries can be made for both synchronous and asynchronous flows.

38 Flow. Scoped variables that can control the flow. Logic used by the sub-members. Sub-members that are the real execution statements.
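The three parts of a flow named here (scoped variables, logic, sub-members), together with status queries at any granularity, can be modeled as a small tree. This is an in-memory Python sketch, not DGL's XML schema: the class names, the `"sequential"` logic label, the status strings and the flow names are all assumptions made for illustration, and only sequential execution is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """A leaf sub-member: one real execution statement."""
    name: str
    status: str = "pending"

    def run(self, variables):
        # A real step would act on the data grid; here it only records completion.
        self.status = "done"


@dataclass
class Flow:
    name: str
    variables: dict = field(default_factory=dict)  # scoped to this flow and below
    logic: str = "sequential"                      # hypothetical label
    members: list = field(default_factory=list)    # Steps or nested Flows
    status: str = "pending"

    def run(self, inherited=None):
        if self.logic != "sequential":
            raise NotImplementedError(self.logic)
        scope = {**(inherited or {}), **self.variables}  # inner scope shadows outer
        self.status = "running"
        for member in self.members:
            member.run(scope)
        self.status = "done"

    def query(self, path):
        """Status query: path is a list of names from this flow down to a sub-member."""
        if not path:
            return self.status
        for member in self.members:
            if member.name == path[0]:
                if isinstance(member, Flow):
                    return member.query(path[1:])
                if len(path) == 1:
                    return member.status
        raise KeyError("/".join(path))


ingest = Flow("ingest", members=[Step("checksum"), Step("replicate")])
top = Flow("nightly", variables={"collection": "/ourProject/data"},
           members=[ingest, Step("notify")])

before = top.query(["ingest", "replicate"])  # status of one step inside a sub-flow
top.run()
after = top.query(["ingest", "replicate"])
print(before, "->", after)
```

Because a sub-flow is just another member, the same `query` call works for the whole gridflow, a sub-flow, or a single step.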

39 Flow Logic (How a flow executes)

40 DGL-Response. Responses can be synchronous or asynchronous.

41 Talk Outline. “Next Hype in Grids”. My belief system before we begin. Meet my friends – Rebels and Misfits. File Systems, Databases, Datagrids. Mapping physical data to logical view. Mapping physical data and storage to logical view. SRB Statistics. Mapping physical data, storage and processes to logical view. Data Grid Language. Conclusion. What Now = work and sacrifices; What Next = Vision.

42 Conclusion. Data Grids are for real: they manage Inter/Intra/Multi-organizational unstructured data (files, streams, …). Data Grids extend database concepts and internally use a database. A language like the Data Grid Language described here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS). Reference: Paper in VLDB Workshop on Data Management in Grids.

43 We are SDSC SRB. Arun is here! (Shameless self-promotion) Not in picture: many students.

44 Additional Thanks (Ignorance is bliss). My Advisor: “You already graduated, and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?” Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is the color “Gray”, and his web page is very cheesy, with music in the background. I guess he does not do much computer science – he works with astronomers.”

45 Contact Info. Arun Jagatheesan, arun@sdsc.edu, or srb@sdsc.edu. http://www.sdsc.edu/srb/

