
1 (e)Science-Driven, Production-Quality, Distributed Grid and Cloud Data Infrastructure for the Transformative, Disruptive, Revolutionary, Next-Generation TeraGrid (now with free ponies)
Data Architecture Progress Report
December 11, 2008
Chris Jordan

2 Goals for the Data Architecture
Improve the experience of working with data in the TeraGrid for the user community: reliability, ease of use, performance
Integrate data management into the user workflow
Balance performance goals against usability
Avoid overdependence on data location
Support the most common use cases as transparently as possible; “move data in, run job, move data out” is the basic pattern (a sketch follows below)
Organize, search, and retrieve data from large “collections”
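Below is a minimal sketch of that move-in/run/move-out pattern. The hostname, paths, and application name are hypothetical, and scp stands in for whatever transfer mechanism a given resource exposes; this illustrates the usage pattern, not a prescribed TeraGrid workflow.

    # Minimal sketch of the "move data in, run job, move data out" pattern.
    # Hostname, paths, and application name are hypothetical placeholders.
    import subprocess

    ARCHIVE = "archive.site.example.edu"  # hypothetical remote storage resource

    # Move data in.
    subprocess.check_call(
        ["scp", ARCHIVE + ":/data/username/input.dat", "input.dat"])

    # Run the job (placeholder application on the compute resource).
    subprocess.check_call(["./my_simulation", "input.dat", "output.dat"])

    # Move data out.
    subprocess.check_call(
        ["scp", "output.dat", ARCHIVE + ":/data/username/output.dat"])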

3 Some Realities
Cannot address the issue of available storage
Limited opportunity to improve data transfer performance at the high end
Cannot introduce drastic changes to TG infrastructure at this stage of the project
Remain dependent on the availability of technology and resources for wide-area file systems

4 Areas of Effort
Simplify command-line data movement
Extend the reach of WAN file systems
Develop a unified data replication and management infrastructure
Extend and unify user portal interfaces to data
Integrate data into scheduling and workflows
Provide common access mechanisms to diverse, distributed data resources

5 Extending Wide-Area File Systems
A “wide-area” file system is available on multiple resources; a “global” file system is available on all TeraGrid resources
Indiana and SDSC each have a WAN-FS in production now
PSC has promising technology for distributed storage and Kerberos integration, but it needs testing to understand best management practices
Point of emphasis: going production

6 Data Capacitor-WAN (DC-WAN)
IU has this in production on BigRed and PSC Pople
Can be mounted on any cluster running Lustre 1.4 or Lustre 1.6 (a hypothetical mount sketch follows below)
Ready for testing and a move to production
Sites and resources committed:
–TACC Lonestar, Ranger, Spur
–NCSA Abe, possibly Cobalt and/or Mercury
–LONI Queen Bee (testing, possible production)
–Purdue Steele?
–This presentation is an excellent opportunity to add your site to this list.
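For concreteness, a Lustre 1.6-style client mount might look like the sketch below. The MGS node name and file system name are hypothetical stand-ins for values the DC-WAN operators would supply, and the command requires the Lustre client modules and root privileges.

    # Hypothetical DC-WAN client mount (Lustre 1.6-style syntax).
    import subprocess

    MGS_NID = "mgs.dc.example.edu@tcp0"  # hypothetical management server NID
    FSNAME = "dcwan"                     # hypothetical Lustre file system name
    MOUNTPOINT = "/mnt/dc-wan"           # local mount point on the cluster

    # Equivalent to: mount -t lustre mgs.dc.example.edu@tcp0:/dcwan /mnt/dc-wan
    subprocess.check_call(
        ["mount", "-t", "lustre", "%s:/%s" % (MGS_NID, FSNAME), MOUNTPOINT])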

7 PSC “Josephine-WAN”
Two major new design features:
–Kerberos-based identity mapping (illustrated in the sketch below)
–Distributed data and metadata
Kerberos is likely to work well out of the box
Distributed data/“storage pools” will need careful configuration and management
The technology is working well, but it needs to be actively investigated and tested in various configurations
Want to work on integration with the TG User Portal
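The sketch below illustrates the identity-mapping idea only; it is not PSC's implementation, and the realms and mapping rules are hypothetical. The point is that a cross-site file system must translate a Kerberos principal into a site-local account before authorizing access.

    # Conceptual sketch of Kerberos-based identity mapping (hypothetical rules).

    SITE_MAP = {
        # Kerberos realm -> rule for deriving the local account name
        "TERAGRID.ORG": lambda user: user,       # same username everywhere
        "PSC.EDU": lambda user: "psc_" + user,   # hypothetical site prefix
    }

    def local_account(principal):
        """Map 'user@REALM' to a local account name, or None if unmapped."""
        user, _, realm = principal.partition("@")
        rule = SITE_MAP.get(realm)
        return rule(user) if rule else None

    print(local_account("cjordan@TERAGRID.ORG"))  # -> cjordan
    print(local_account("cjordan@PSC.EDU"))       # -> psc_cjordan
    print(local_account("nobody@EVIL.EXAMPLE"))   # -> None (deny access)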

8 Getting to Global
No single file system technology will be compatible with, or feasible to deploy on, every system
Hybrid solutions will be required
The TGUP helps, but …
Need to understand the limit on simultaneous mounts, and …
Once production DC-WAN reaches that technical limit, look at technologies to extend the FS:
–pNFS
–FUSE/SSHFS (sketched below)
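As one example of the hybrid approach, SSHFS can stitch a remote file system onto a node that cannot run a native WAN-FS client. The host and paths below are hypothetical, and this is an illustration of the technique rather than a recommended TeraGrid configuration; it assumes the sshfs package is installed.

    # Hypothetical SSHFS mount to extend file system reach over SSH/FUSE.
    import subprocess

    REMOTE = "login.site.example.edu:/gpfs-wan/username"  # hypothetical remote path
    MOUNTPOINT = "/home/username/wanfs"                   # hypothetical local dir

    # Equivalent to: sshfs login.site.example.edu:/gpfs-wan/username ~/wanfs
    subprocess.check_call(["sshfs", REMOTE, MOUNTPOINT])

    # ...use MOUNTPOINT like a local directory, then unmount with:
    #   fusermount -u /home/username/wanfs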

9 Command-line tools
Many users are still oriented towards shell access
GridFTP is complicated to use via globus-url-copy:
–Long URLs and many, often inconsistent, options (compare the sketch below)
SSH/SCP is almost universally available and familiar to users, but has limited usefulness for data transfer in its current configuration
Simple changes to SSH/SCP configuration:
–Support SCP-based access to data mover nodes
–Support simpler addressing of data resources
–Provide resource-specific “default” configurations
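The contrast below is illustrative: the hostnames and paths are hypothetical, and the globus-url-copy tuning flags (-p for parallel streams, -tcp-bs for TCP buffer size) are shown only as examples of the options users must discover on their own.

    # GridFTP transfer: long URLs plus per-transfer tuning options.
    import subprocess

    subprocess.check_call([
        "globus-url-copy", "-p", "8", "-tcp-bs", "8388608",
        "file:///scratch/username/input.dat",
        "gsiftp://gridftp.site.example.edu:2811/gpfs-wan/username/input.dat",
    ])

    # The SCP form users already know; the changes proposed above would point
    # this at a data mover node with sensible defaults preconfigured.
    subprocess.check_call([
        "scp", "/scratch/username/input.dat",
        "datamover.site.example.edu:/gpfs-wan/username/input.dat",
    ])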

10 Unified Data Management
Management of both data and metadata, distributed across storage resources
Multiple sites support data collections using SRB, iRODS, databases, web services, etc.
This diversity is good in the abstract, but confusing to new users
Extend the current iRODS-based data management infrastructure to additional sites (a usage sketch follows below)
Expand REDDNET “cloud storage” availability
Integrate access to as many collections as possible through the User Portal
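As a usage sketch, the iRODS icommands pair data movement with searchable metadata. The zone, collection, file names, and metadata attribute below are hypothetical, and the example assumes the user has already authenticated with iinit.

    # Hypothetical data-plus-metadata workflow using the iRODS icommands.
    import subprocess

    LOCAL_FILE = "run0042/output.h5"           # hypothetical simulation output
    COLLECTION = "/tgZone/home/username/runs"  # hypothetical iRODS collection

    # Store the file into the distributed collection (data management).
    subprocess.check_call(["iput", LOCAL_FILE, COLLECTION])

    # Attach searchable metadata (attribute, value) to the stored object.
    subprocess.check_call([
        "imeta", "add", "-d", COLLECTION + "/output.h5",
        "reynolds_number", "1.2e6",
    ])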

11 Interfaces to Data
SSH and “ls” are not effective interfaces to large, complex datasets
Portal and gateway interfaces to data have proven useful and popular, but:
–They may not be able to access all resources
–They may require significant gateway developer effort
Extend the user portal to support WAN file systems and distributed data management
Possibly expose the user portal and other APIs to ease development of gateways?

12 Integrating Data into Workflows
Almost all tasks run on the TeraGrid require some data management and multiple storage resources
Users should be able to include these steps as part of a job or workflow submission
Port DMOVER to additional schedulers and deploy it across the TeraGrid (a generic dependency sketch follows below):
–Working on BigBen, ready for Kraken
–Working on SGE and LSF
Evaluate PetaShare and other “data scheduling” systems (Stork?)
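The sketch below shows the generic pattern such tools automate: stage-in, compute, and stage-out expressed as chained scheduler jobs. The script names are hypothetical, and the -W depend=afterok syntax is standard PBS/Torque, shown here generically rather than as DMOVER's actual interface.

    # Generic stage-in / compute / stage-out chain via PBS job dependencies.
    import subprocess

    def qsub(script, after=None):
        """Submit a PBS job, optionally after a prior job; return its job id."""
        cmd = ["qsub"]
        if after:
            cmd += ["-W", "depend=afterok:" + after]
        cmd.append(script)
        return subprocess.check_output(cmd).decode().strip()

    stage_in = qsub("stage_in.pbs")           # move data in
    compute = qsub("compute.pbs", stage_in)   # run once the data has arrived
    qsub("stage_out.pbs", compute)            # move data out after success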

13 Gratuitous end slide #42
The Data-WG has many attendees, but few participants. We need:
–More sites committed to deploying DC-WAN in production
–More sites committed to testing “Josephine-WAN”
–More sites contributing to the Data Collections infrastructure
–Help porting DMOVER and testing PetaShare and REDDNET
–Users and projects to exercise the infrastructure
–(Select one or more)
If not you, who? If not now, when?

