The Condor Data Access Framework GridFTP / NeST Day 31 July 2001 Douglas Thain.


1 The Condor Data Access Framework GridFTP / NeST Day 31 July 2001 Douglas Thain

2 The Condor Data Access Framework Philosophy Components Organization: Communities Resource Discovery with ClassAds Example Applications Ongoing Work

3 Philosophy Goal: location-independent execution of jobs with large I/O needs. Build moderately-sized mechanisms that can be quickly deployed to existing problems. With experience, explore general-purpose policies and larger systems. Priorities: Reliability and Correctness; Throughput (PB/year); …; Performance (MB/sec).

4 Where does Globus fit in? We expect that the Globus protocols will be the lingua franca of the grid. Condor is committed to speaking the right language in order to participate. Like any integration effort, there are some impedance-matching problems in both protocols and APIs. None are insurmountable.

5 Components: NeST - Network Storage Appliance; ReqEx - Scheduled Data Mover; Kangaroo - Opportunistic Data Mover; Bypass - Adapts Apps to Grid; ClassAds - Express Relationships; Others?

6 [Diagram labels: NeST, MSS, NeST, FTPD] ReqEx: Schedules I/O according to declarations. Kangaroo: Performs I/O as apps request and conditions permit. Bypass: Adapts ordinary I/O operations into grid protocols. ClassAds: Express relationships and restrictions between participants.

7 ReqEx: Scheduled Data Mover [Diagram labels: FTPD, NeST] Begin with a list of jobs and data needs. Reserve space, move inputs, submit jobs, move outputs.
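
The four ReqEx phases on this slide can be sketched as a simple plan builder. This is only an illustration of the ordering (reserve, stage in, submit, stage out), not ReqEx's actual interface; the function name, the step names, and the job fields (`name`, `inputs`, `outputs`, `space_mb`) are all assumptions made for the example.

```python
def build_reqex_plan(jobs):
    """Return an ordered list of (action, argument) steps for a batch of jobs.

    Each job is a dict with hypothetical keys: 'name', 'inputs' and 'outputs'
    (lists of file names), and 'space_mb' (storage the job needs).
    """
    plan = []
    # 1. Reserve enough space for the whole batch before moving anything.
    plan.append(("reserve_space_mb", sum(job["space_mb"] for job in jobs)))
    # 2. Stage every input file onto the storage appliance.
    for job in jobs:
        for name in job["inputs"]:
            plan.append(("move_input", name))
    # 3. Submit the jobs once their inputs are in place.
    for job in jobs:
        plan.append(("submit_job", job["name"]))
    # 4. Move outputs back after the jobs have run.
    for job in jobs:
        for name in job["outputs"]:
            plan.append(("move_output", name))
    return plan


jobs = [
    {"name": "sim1", "inputs": ["calib.dat"], "outputs": ["sim1.out"], "space_mb": 200},
    {"name": "sim2", "inputs": ["calib.dat"], "outputs": ["sim2.out"], "space_mb": 300},
]
for step in build_reqex_plan(jobs):
    print(step)
```

Because the whole job list is known up front, the plan can be computed before any data moves, which is what distinguishes a scheduled mover from an opportunistic one.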

8 Kangaroo: Opportunistic Data Mover [Diagram labels: FTPD, NeST] Move outputs back: during execution; as conditions permit; fine-grained; hop-by-hop. Move inputs: on demand; should cache.
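
The "hop-by-hop, as conditions permit" idea can be sketched with a chain of spooling hops: a write returns as soon as the block is buffered locally, and each hop forwards its backlog only while its outgoing link is usable. This is a toy model under stated assumptions, not Kangaroo's implementation; the `Hop` class and its `write`, `pump`, and `link_up` names are hypothetical.

```python
from collections import deque


class Hop:
    """One store-and-forward stage between an application and its storage."""

    def __init__(self, next_hop=None):
        self.next_hop = next_hop   # None means this hop is the final destination
        self.spool = deque()       # blocks waiting to be forwarded
        self.link_up = True        # whether the link to the next hop is usable
        self.delivered = []        # blocks that reached the destination

    def write(self, block):
        # The writer never waits on the network: the block is only spooled.
        self.spool.append(block)

    def pump(self):
        # Forward the backlog while conditions permit; otherwise keep spooling.
        while self.link_up and self.spool:
            block = self.spool.popleft()
            if self.next_hop is None:
                self.delivered.append(block)
            else:
                self.next_hop.write(block)


destination = Hop()
first_hop = Hop(next_hop=destination)

first_hop.link_up = False          # simulate a network outage
first_hop.write("block-0")         # the application still makes progress
first_hop.pump()
destination.pump()
print(destination.delivered)       # nothing has arrived yet

first_hop.link_up = True           # the network recovers
first_hop.pump()
destination.pump()
print(destination.delivered)       # the spooled block has now arrived
```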

9 Bypass [Diagram label: NeST] Bypass: Creates interposition agents that re-route system calls to other code. Pluggable File System (PFS): An agent built with Bypass. Presents grid protocols as filesystems. Example: vi /ftp/coral.cs.wisc.edu/etc/hosts
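
The example path on this slide suggests a naming scheme in which the first path component selects the protocol and the second selects the host. The sketch below shows that interpretation only; the slide gives a single example, so the general rule, the function name `pfs_path_to_url`, and the error handling are assumptions rather than PFS's documented behavior.

```python
def pfs_path_to_url(path):
    """Translate a path like /ftp/host/dir/file into protocol://host/dir/file."""
    parts = path.split("/", 3)  # e.g. ['', 'ftp', 'host', 'dir/file']
    if len(parts) < 3 or parts[0] != "" or not parts[1] or not parts[2]:
        raise ValueError("expected /<protocol>/<host>/<path>, got %r" % path)
    protocol, host = parts[1], parts[2]
    # Everything after the host is the path on the remote server.
    remote_path = "/" + parts[3] if len(parts) > 3 else "/"
    return "%s://%s%s" % (protocol, host, remote_path)


print(pfs_path_to_url("/ftp/coral.cs.wisc.edu/etc/hosts"))
```

An interposition agent could apply a translation of this kind inside its replacement for `open`, so an unmodified program such as vi sees an ordinary file name.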

10 Organizing Structure: I/O Communities A community is simply a storage appliance shared by a number of CPUs. Traditional community: distributed file system. Ordinary users want to restructure communities according to application and load. So, communities for grid computing should be easy to set up, reconfigure, and tear down. NeST + Bypass makes this easy -- use the protocol appropriate for the situation.

11 I/O Communities Short-haul I/O Long-haul I/O GridFTP Chirp

12 What Discovery System? Device Discovery: Where is my disk? Where can I place my output now? Replica Discovery: If X is not on my disk, where can I find it? If I fetch X, where should I put it so that others can find it?

13 Everything Together AgentJob Device Discovery Replica Discovery CPU Discovery Execution Site NeST Remote Storage Short-Haul Long-Haul

14 Resource Discovery with ClassAds “Classic” ClassAds describe the properties and requirements of two parties looking for each other. When expressing I/O communities, there are three parties to a match: jobs, machines, and storage. By extending the language slightly, we allow jobs to refer to the properties of the attached storage: Requirements = NearestStorage.HasCMSData

15 Classic ClassAds Machine Job Ad Machine Ad match

16 References in ClassAds Machine NeST Job Ad Machine Ad Storage Ad match Refers to NearestStorage. Knows where NearestStorage is.

17 ClassAd Example

Job Ad:
    Type = "Job"
    Cmd = "cmsim.exe"
    Owner = "thain"
    Requirements = (OpSys==LINUX) && (NearestStorage.HasCMS)

Machine Ad:
    Type = "Machine"
    Name = "vulture"
    OpSys = "Linux"
    Requirements = (Owner=="thain")
    NearestStorage = (Type=="Storage") && (Name=="turkey")

Storage Ad:
    Type = "Storage"
    Name = "turkey"
    HasCMS = True
    CMSPath = "/cms"
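
The three-party match in this example can be sketched in plain Python. The code models each ad as a dictionary and each Requirements expression as a predicate; it is not the ClassAd language or any Condor API. The helper names (`resolve_nearest_storage`, `three_way_match`) are hypothetical, and the case-insensitive comparison of OpSys is an assumption made so that "Linux" satisfies the job's test against LINUX.

```python
job_ad = {
    "Type": "Job",
    "Cmd": "cmsim.exe",
    "Owner": "thain",
    # The job refers to the machine's OpSys and to the attached storage's HasCMS.
    "Requirements": lambda machine, storage: (
        machine["OpSys"].lower() == "linux" and storage.get("HasCMS", False)
    ),
}

machine_ad = {
    "Type": "Machine",
    "Name": "vulture",
    "OpSys": "Linux",
    "Requirements": lambda job: job["Owner"] == "thain",
    # A predicate that identifies which storage ad is "nearest" to this machine.
    "NearestStorage": lambda ad: ad["Type"] == "Storage" and ad["Name"] == "turkey",
}

storage_ad = {"Type": "Storage", "Name": "turkey", "HasCMS": True, "CMSPath": "/cms"}


def resolve_nearest_storage(machine, storage_ads):
    """Return the first storage ad selected by the machine's NearestStorage, or None."""
    for ad in storage_ads:
        if machine["NearestStorage"](ad):
            return ad
    return None


def three_way_match(job, machine, storage_ads):
    """Return (machine name, storage name) if all three parties agree, else None."""
    storage = resolve_nearest_storage(machine, storage_ads)
    if storage is None:
        return None
    if bool(job["Requirements"](machine, storage)) and machine["Requirements"](job):
        return (machine["Name"], storage["Name"])
    return None


print(three_way_match(job_ad, machine_ad, [storage_ad]))
```

The job never names "turkey" directly: it reaches the storage ad only through the machine's NearestStorage reference, which is the point of the language extension.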

18 Notes on ClassAds Every match is a hint: participants must verify in the claiming phase. Storage: If the dataset is missing, abort the process and roll back. The reference feature is new in Condor 6.3. It is a variation on ‘gang-matching’ as described by Raman et al.
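
Treating a match as a hint means re-checking the advertised facts at claim time and undoing any partial work if they no longer hold. The sketch below illustrates that pattern only; the step names, the `claim_match` function, and the use of a set of paths to stand in for the storage appliance's real contents are assumptions, not Condor's claiming protocol.

```python
def claim_match(storage_ad, actual_paths, undo_log):
    """Verify a matched storage ad before use; undo completed steps on failure.

    storage_ad   -- the ad that produced the match (may be stale)
    actual_paths -- paths actually present on the storage appliance right now
    undo_log     -- list that receives one entry per step rolled back
    """
    steps_done = []
    try:
        steps_done.append("reserve_cpu")  # hypothetical first claiming step
        # The ad said the dataset was present; check instead of trusting it.
        if storage_ad["CMSPath"] not in actual_paths:
            raise LookupError("dataset missing: %s" % storage_ad["CMSPath"])
        steps_done.append("attach_storage")
        return True
    except LookupError:
        # Roll back in reverse order of completion.
        while steps_done:
            undo_log.append("undo_" + steps_done.pop())
        return False


log = []
print(claim_match({"CMSPath": "/cms"}, {"/cms"}, log), log)
print(claim_match({"CMSPath": "/cms"}, set(), log), log)
```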

19 Example Applications I/O Communities: Applied to CMS simulation codes running at INFN and UW. Unmodified apps retrieve calibration data from the nearest NeST. Kangaroo: Applied to Gaussian codes running at NCSA. Users get progressive output when possible, but network failures don’t stop output. Same idea applied to CMS reconstruction at INFN. (Older work called Grid Console.) ReqEx: In testing mode on CMS reconstruction at UW.

20 Ongoing Work Move jobs to data or vice versa? We can easily build communities for a particular application. Can we build software that works reasonably well in any situation? Select staging or remote I/O? Depends on number of jobs, storage capacity, network capacity, etc. Integration with replica management. Is the App->NeST channel collection-aware?

21 Upcoming Publications Thain, Basney, Chang, Livny, “The Kangaroo Approach to Data Movement on the Grid”, HPDC 10. Thain, Bent, Livny, Arpaci-Dusseau, Arpaci-Dusseau, “Gathering at the Well: Creating Communities for Grid I/O”, Supercomputing 2001.
