TG Users’ Data Transfer Needs
Sergiu January 2007
SDSC NCAR TACC UC/ANL NCSA ORNL PU IU PSC
Agents provocateurs
I will kick off a discussion based on two use cases. Other user support folks in the audience will then chime in. This should give you a good sampling of user requirements, since all users basically want the same things and don’t really care how it’s done.
GASOLINE
N-body cosmology software system by T. Quinn et al., U. Washington
Phase 1: Generate initial conditions (ICs)
- On Cobalt at NCSA:
  - Job 1: generate 3 “force files”, O(1-100 GB)
  - Interactive GridFTP transfer of force files to the PSC archive
  - Job 2: from force files to IC file, O(100 MB - 100 GB)
  - GridFTP transfer of IC file to PSC BigBen
Phase 2: Simulation
- On BigBen at PSC: run for several weeks per IC.
  - “Small cases”: output data files O(100 MB), total for run O(100 GB)
  - “Large cases”: output data files O(100 GB), total for run O(2 TB)
- Archive at PSC (in future maybe also elsewhere on TG)
GASOLINE cont’d
Phase 3: Data analysis
- “Small” files (~100 MB each) go directly to UW
- “Large” files (~100 GB each) go to Cobalt at NCSA
- Use GridFTP; each transfer needs to complete within ~hours
Phase 4: Visualization using SALSA
- Done at UW for “small” files
- For “large” files, TCS or NCSA DTF (XT3 port underway)
- Needs a distributed-memory machine that can handle dynamic linking and whose compute nodes have socket connectivity to a machine that in turn can communicate with the client application at UW.
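As a rough guide to what “complete within ~hours” implies for these file sizes, here is a back-of-the-envelope sketch of the sustained rate a single transfer would need. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, retries and queueing; the time windows are illustrative, not stated requirements, and `required_mbit_per_s` is a hypothetical helper.

```python
def required_mbit_per_s(size_gb: float, window_hours: float) -> float:
    """Sustained rate (Mbit/s) needed to move size_gb (decimal GB) in window_hours.

    Ignores protocol overhead, retries and queueing, so this is a lower bound.
    """
    bits = size_gb * 1e9 * 8          # decimal GB -> bits
    seconds = window_hours * 3600.0   # hours -> seconds
    return bits / seconds / 1e6       # bits/s -> Mbit/s


# Illustrative windows only; the requirement above just says "~hours".
for size_gb, hours in [(0.1, 1.0), (100.0, 1.0), (100.0, 4.0)]:
    rate = required_mbit_per_s(size_gb, hours)
    print(f"{size_gb:>6} GB in {hours} h -> {rate:8.1f} Mbit/s")
```

For a ~100 GB “large” file, a one-hour window works out to roughly 222 Mbit/s sustained and a four-hour window to about 56 Mbit/s; the ~100 MB “small” files are negligible by comparison.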
GASOLINE Wish List
Reliability
- “We have a very low success rate in being able to initiate a GridFTP file transfer. Usually there is some authentication issue that appears either during "grid-proxy-init" or during the invocation of the GridFTP client itself.”
Batch-initiated file transfers (BIFT)
- Enable spawning a GridFTP file transfer at the end of a compute job. E.g., when an IC file is generated on Cobalt, spawn a GridFTP transfer to the PSC archiver. When a simulation job completes on BigBen, spawn a GridFTP transfer of the new data files to the reduction or visualization system.
Wish list cont’d
More on BIFT:
- “Whether this is done via an explicit command-line transfer or a WAN-FS is not important – we just need near 100% reliability, and transfer to occur even several days after the original PBS job was submitted. The limited lifetime of proxies usually makes this difficult.
- “Please note that we do not want to do the file transfer as part of the actual compute job. We want to be able to dump the data during the simulation to the fastest local file system that is available. When the compute job completes, it should spawn a separate job (for which we are not billed 2048 SUs per hour) that does the file transfer.”
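One way to get this “separate job” behaviour from an ordinary batch system is a job dependency: submit the transfer as its own small job that the scheduler releases only after the compute job exits successfully. The Python sketch below assumes a PBS/Torque-style `qsub` that accepts `-W depend=afterok:<jobid>` and `-v` for passing variables; the script name `transfer.pbs`, the `SRC`/`DST` variable names, the job ID and the URLs are all hypothetical. It does not solve the proxy-lifetime problem: the transfer job still needs a valid credential whenever it actually starts, which may be days later.

```python
import shlex


def transfer_job_argv(compute_job_id: str, src_url: str, dst_url: str,
                      script: str = "transfer.pbs") -> list[str]:
    """Build the qsub argv for a separate transfer job that the scheduler
    starts only after the compute job exits successfully ('afterok').

    Assumptions (hypothetical): transfer.pbs runs
        globus-url-copy "$SRC" "$DST"
    and neither URL contains a comma (commas separate -v entries).
    """
    return [
        "qsub",
        "-W", f"depend=afterok:{compute_job_id}",  # wait for a clean exit
        "-v", f"SRC={src_url},DST={dst_url}",      # exported into the job
        script,
    ]


if __name__ == "__main__":
    # Hypothetical job ID and URLs, for illustration only.
    argv = transfer_job_argv(
        "12345.bigben",
        "file:///scratch/run42/out.0100",
        "gsiftp://archive.example.org/home/user/run42/out.0100",
    )
    print(shlex.join(argv))
    # To actually submit: subprocess.run(argv, check=True)
```

Because the transfer runs as its own job, it can request far fewer resources than the compute job and so need not be charged at the compute job's rate; what it should request is site-specific and is left out here.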
One more thing…
“One thing I would highly recommend for command-line file transfers is that the command line be somewhat shorter than globus-url-copy requires. Every researcher that I talk to about globus-url-copy does not use it because too much typing is involved. tgcp is supposed to address this problem, but I have found it to be extremely unreliable at this point. Usually its "translation" from tgcp format to globus-url-copy format is incorrect and its call to globus-url-copy does not work. The best idea, I think, would be if someone could make a GridFTP client that was invoked using exactly the same conventions as "scp." Researchers would not have to learn anything new to be able to use it.”
– Jeff Gardner, PSC, Gasoline co-author & team member.
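A minimal sketch of the scp-style front end described in the quote: it maps scp-like arguments onto the URLs `globus-url-copy` takes (`gsiftp://host/path` for remote files, `file:///path` for local ones) and prints the resulting command. The `gcp` name and its helper functions are hypothetical, and nothing here reflects how tgcp actually works. Only absolute remote paths and explicit destination file names are handled; `user@` prefixes, directory destinations, URL-escaping and recursive copies are not.

```python
import os
import shlex
import sys


def to_url(arg: str) -> str:
    """Map an scp-style argument to a URL that globus-url-copy understands.

    'host:/abs/path' -> 'gsiftp://host/abs/path'
    'local/path'     -> 'file:///<absolute local path>'
    As with scp, an argument counts as remote when a ':' appears before
    any '/'. Only absolute remote paths are supported in this sketch.
    """
    colon, slash = arg.find(":"), arg.find("/")
    if colon > 0 and (slash == -1 or colon < slash):
        host, path = arg[:colon], arg[colon + 1:]
        if not path.startswith("/"):
            raise ValueError(f"remote path must be absolute: {arg!r}")
        return f"gsiftp://{host}{path}"
    return "file://" + os.path.abspath(arg)


def gcp_argv(src: str, dst: str) -> list[str]:
    """argv for a hypothetical 'gcp SRC DST' wrapper around globus-url-copy."""
    return ["globus-url-copy", to_url(src), to_url(dst)]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(shlex.join(gcp_argv(sys.argv[1], sys.argv[2])))
    else:
        print("usage: gcp HOST:/abs/path LOCAL_FILE   (or the reverse)")
```

For example, `gcp bigben.example.org:/scratch/run42/out.0100 /home/me/out.0100` (hypothetical host and paths) would print `globus-url-copy gsiftp://bigben.example.org/scratch/run42/out.0100 file:///home/me/out.0100`. A real tool would exec that command (or call a GridFTP library) instead of printing it.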
How to get MyCluster users to transfer files via TG?
“Most users of MyCluster actually only use it to submit to a single machine. No matter how much I show them MyCluster’s amazing ability to easily run on multiple systems (which you can accomplish by adding just a single line to a configuration file), they have no interest in using this ability since they do not want to deal with file transfers between TeraGrid sites. Although they seem to understand that the automatic file transfers are actually quite simple to set up in MyCluster, they still resist the idea. Thus, the obstacle seems simply to be the concept of having their files automatically transferred to and from a location. They are not used to having this capability, so they avoid it. On the other hand, several have asked about shared file systems, and a few have indicated that if there were indeed some shared file system between multiple TeraGrid sites, that would make the concept of distributing their runs across sites more palatable. Most users have experience with NFS. Since they can directly map this onto something they already use, they seem much more comfortable with the idea.”
– Jeff Gardner, Ed Walker’s partner in MyCluster testing & deployment.
My Summary – Start of Debate
- In the TG system, there are two kinds of users: ftp’ers and NFSniks.
- Both need reliability first and foremost.
- Both care about performance, in the sense that files must be where they are needed when they are needed; the “when” can vary from minutes to ~1 day.
  - But ftp’ers tend to care more (that’s why they’ve learned ftp/scp in the first place).
- User interface and scripting support are essential for tool adoption.
- Persistent WAN-FS hosting of community code as well as working datasets would enhance many user groups’ productivity.
Looking ahead
- TB-months of disk storage for working datasets is increasingly important as TG fosters complex WAN workflows.
- TG WIDE and OPEN will also place increasing importance on read-only access to Archives and Repositories.
- Live instruments and Repositories are to be integrated with computational workflows => complex data workflows, cf. Tier 0-2 for LHC experiments (OSG).
- Track 1 and Track 2 systems will be, in essence, large instruments…