
1 Why GridFTP?
- Performance
  - Parallel TCP streams, optimal TCP buffer sizes
  - Non-TCP protocols such as UDT
  - An order of magnitude greater throughput
- Cluster-to-cluster data movement
  - Another order of magnitude
- Support for reliable and restartable transfers
- Multiple security options
  - Anonymous, password, SSH, GSI
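The "optimal TCP buffer" point rests on a standard rule of thumb: to keep a path full, a TCP sender needs roughly one bandwidth-delay product (BDP) of data in flight. The sketch below computes the BDP and, assuming the load is split evenly, the buffer each of several parallel streams would need. The link speed, round-trip time, and even split are illustrative assumptions, not GridFTP's own tuning logic.

```python
# Minimal sketch: size TCP buffers from the bandwidth-delay product (BDP).
# Assumptions: illustrative link values, and the BDP split evenly across
# parallel streams (real tools may tune each stream differently).

def bandwidth_delay_product(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bytes that must be in flight to keep the path full."""
    return int(bandwidth_bits_per_s * rtt_s / 8)


def per_stream_buffer(bandwidth_bits_per_s: float, rtt_s: float, streams: int) -> int:
    """Buffer each of `streams` parallel TCP connections needs so that,
    together, they cover the whole BDP (rounded up)."""
    if streams < 1:
        raise ValueError("need at least one stream")
    bdp = bandwidth_delay_product(bandwidth_bits_per_s, rtt_s)
    return -(-bdp // streams)  # ceiling division on integers


if __name__ == "__main__":
    # Hypothetical 1 Gbit/s path with a 100 ms round-trip time.
    bw, rtt = 1e9, 0.1
    print(bandwidth_delay_product(bw, rtt))  # 12500000 bytes (12.5 MB)
    print(per_stream_buffer(bw, rtt, 4))     # 3125000 bytes per stream
```

With a default buffer far below 12.5 MB, a single stream on such a path stalls waiting for acknowledgements; parallel streams are one way to cover the BDP when per-socket buffers cannot be raised.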

2 GridFTP Data Transfers for the Advanced Photon Source
“One Australian user left nearly 1TB of data on our systems that we had been struggling to transfer via standard FTP for several weeks. The typical data rate using standard FTP was ~200 KB/s. Using GridFTP we are now moving data at 6 MB/s—quite a significant boost in performance!”
  (Brian Tieman, Advanced Photon Source)
30x speedup across 9688 miles
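The quoted rates explain both the "30x" figure and why the transfer had dragged on for weeks. The worked arithmetic below assumes decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes) and rounds "nearly 1TB" up to exactly 10^12 bytes; it ignores protocol overhead and interruptions.

```python
# Worked arithmetic for the quoted rates (assumes decimal units:
# 1 KB = 1e3 bytes, 1 MB = 1e6 bytes, 1 TB = 1e12 bytes).

DATA_BYTES = 1e12     # "nearly 1TB", rounded up to 1 TB for the estimate
FTP_RATE = 200e3      # ~200 KB/s with standard FTP
GRIDFTP_RATE = 6e6    # 6 MB/s with GridFTP

speedup = GRIDFTP_RATE / FTP_RATE
ftp_days = DATA_BYTES / FTP_RATE / 86400          # seconds -> days
gridftp_hours = DATA_BYTES / GRIDFTP_RATE / 3600  # seconds -> hours

print(f"speedup: {speedup:.0f}x")             # speedup: 30x
print(f"standard FTP: {ftp_days:.1f} days")   # standard FTP: 57.9 days
print(f"GridFTP: {gridftp_hours:.1f} hours")  # GridFTP: 46.3 hours
```

At the standard FTP rate, 1 TB needs close to two months of uninterrupted transfer, while at the GridFTP rate it fits in about two days.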

3 Cluster-to-Cluster transfers
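In a cluster-to-cluster transfer, GridFTP stripes a single transfer across several data-mover nodes at each end, so no single host's NIC or disk limits the rate. The sketch below assumes a simple round-robin block layout with a fixed block size to show how one file could be divided among nodes; the layout and parameter names are illustrative assumptions, not GridFTP's actual implementation.

```python
# Minimal sketch of round-robin block striping across data-mover nodes.
# Assumptions (hypothetical): fixed block size, nodes numbered 0..nodes-1.
# This illustrates the idea only; it is not GridFTP's wire protocol.

def stripe_blocks(file_size: int, block_size: int,
                  nodes: int) -> dict[int, list[tuple[int, int]]]:
    """Map each node index to the (offset, length) blocks it sends."""
    if block_size <= 0 or nodes <= 0:
        raise ValueError("block_size and nodes must be positive")
    plan: dict[int, list[tuple[int, int]]] = {n: [] for n in range(nodes)}
    for i, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        plan[i % nodes].append((offset, length))      # deal blocks round-robin
    return plan


if __name__ == "__main__":
    # A 10-byte file in 3-byte blocks over 2 nodes.
    for node, blocks in stripe_blocks(10, 3, 2).items():
        print(node, blocks)
```

Because each node owns disjoint byte ranges, the nodes can send in parallel and the receiver can write each block at its offset independently.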

4 Users
- The HEP community is basing its entire tiered data-movement infrastructure for the LHC Computing Grid on GridFTP
- The Southern California Earthquake Center (SCEC), the European Space Agency, and the Disaster Recovery Center in Japan move large volumes of data using GridFTP
- On average, more than 2 million data transfers are performed with GridFTP every day

5 A join activity
- This is equivalent to running: SELECT id, x, y FROM tableOne, tableTwo WHERE tableOne.id = tableTwo.myID;
- where tableOne and tableTwo are in two different databases
Activity workflow (figure):
- Run SQL query: SELECT id, x FROM tableOne ORDER BY id
- Run SQL query: SELECT myID, y FROM tableTwo ORDER BY myID
- Tuple merge join, with joinColumn1: id and joinColumn2: myID
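The workflow works because both queries return rows sorted on their join columns, so the two streams can be combined with a sort-merge join in a single pass. The sketch below shows that general technique on plain Python tuples; it is not OGSA-DAI's tuple merge join code, and for simplicity it reads both inputs into lists rather than streaming them.

```python
# Minimal sort-merge join over two inputs already sorted on their join
# columns (as the two ORDER BY queries guarantee). Illustrative only.
from typing import Iterable, Iterator


def merge_join(left: Iterable[tuple], right: Iterable[tuple],
               left_key: int, right_key: int) -> Iterator[tuple]:
    """Yield left_row + right_row for every pair whose join columns match.
    Both inputs must be sorted ascending on their join column."""
    left_rows, right_rows = list(left), list(right)
    i = j = 0
    while i < len(left_rows) and j < len(right_rows):
        lk, rk = left_rows[i][left_key], right_rows[j][right_key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then pair it with
            # every left row carrying the same key (handles duplicates).
            j_end = j
            while j_end < len(right_rows) and right_rows[j_end][right_key] == lk:
                j_end += 1
            while i < len(left_rows) and left_rows[i][left_key] == lk:
                for r in right_rows[j:j_end]:
                    yield left_rows[i] + r
                i += 1
            j = j_end


if __name__ == "__main__":
    table_one = [(1, "a"), (2, "b"), (4, "d")]              # (id, x)
    table_two = [(2, "B"), (3, "C"), (4, "D"), (4, "E")]    # (myID, y)
    for row in merge_join(table_one, table_two, 0, 0):
        print(row[0], row[1], row[3])  # project to (id, x, y)
```

Each joined row is (id, x, myID, y); projecting columns 0, 1, and 3 gives the (id, x, y) result of the equivalent SQL join.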

6 OGSA-DAI SQL views
- A layer above the database that implements views
- Lets you define views for databases to which you don’t have write access
- Parses the query
- Maps the view to an SQL query over the actual database
- For example, if DrPatient were defined as
  - SELECT p.id, p.name, p.age, p.sex FROM Patient p, Doctor d WHERE p.DrID = d.ID AND d.dn = $DN$;
  - $DN$ can be replaced by the client’s distinguished name (DN), taken from their certificate using the GT4 security components
  - Doctors can then view only their own patients
- Factors in the client’s security credentials
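The DrPatient example depends on substituting the caller's DN into the view before the query runs. The sketch below emulates that with SQLite, binding the DN as a query parameter instead of pasting it into the SQL text, so a crafted DN cannot alter the query. The in-memory database, table contents, and DN strings are hypothetical, and the sketch assumes the DN has already been extracted from an authenticated certificate; it does not reproduce OGSA-DAI's view machinery.

```python
# Sketch: emulate the DrPatient view by binding the caller's DN as a query
# parameter. Assumptions (hypothetical): in-memory SQLite, made-up rows,
# and a DN supplied by an already-authenticated caller.
import sqlite3

DRPATIENT_VIEW = (
    "SELECT p.id, p.name, p.age, p.sex "
    "FROM Patient p, Doctor d "
    "WHERE p.DrID = d.ID AND d.dn = ?"  # ? stands in for $DN$
)


def dr_patient(conn: sqlite3.Connection, client_dn: str) -> list[tuple]:
    """Return only the patients belonging to the doctor with this DN."""
    return conn.execute(DRPATIENT_VIEW, (client_dn,)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with two doctors and three patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Doctor (ID INTEGER PRIMARY KEY, dn TEXT);
        CREATE TABLE Patient (id INTEGER PRIMARY KEY, name TEXT,
                              age INTEGER, sex TEXT, DrID INTEGER);
        INSERT INTO Doctor VALUES (1, '/O=Grid/CN=Dr Alice'),
                                  (2, '/O=Grid/CN=Dr Bob');
        INSERT INTO Patient VALUES (10, 'Pat A', 40, 'F', 1),
                                   (11, 'Pat B', 52, 'M', 2),
                                   (12, 'Pat C', 33, 'M', 1);
    """)
    return conn


if __name__ == "__main__":
    print(sorted(dr_patient(demo_db(), "/O=Grid/CN=Dr Alice")))
```

Parameter binding is a design choice here: plain textual replacement of $DN$ would also work for well-formed DNs, but binding keeps the DN strictly as data.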

7 Objectives for Data Replication
- Improve durability: safeguard against data loss due to disk failure
- Improve availability: safeguard against data inaccessibility due to network partition
- Improve performance: safeguard against performance bottlenecks due to resource overload
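The durability objective can be quantified under a simplifying assumption: if each replica is lost independently with the same probability p over some period, all N replicas are lost with probability p^N. The sketch below computes this; the independence assumption is ours, and correlated failures (a shared rack, power feed, or network partition) make real losses more likely than the formula suggests.

```python
# Sketch: how replica count affects durability, assuming each replica is
# lost independently with the same probability p over a given period.
# Correlated failures break this assumption and raise the real risk.

def loss_probability(p: float, replicas: int) -> float:
    """Probability that every one of `replicas` independent copies is lost."""
    if not 0.0 <= p <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p <= 1 and replicas >= 1")
    return p ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, loss_probability(0.01, n))
```

Each extra replica multiplies the loss probability by p, which is why a small N already gives a large durability gain when failures really are independent.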

8 Data Placement Services: Motivation
- Scientific applications often perform complex computational analyses that consume and produce large data sets
  - Computational and storage resources are distributed in the wide area
- The placement of data onto storage systems can have a significant impact on
  - the performance of applications
  - the reliability and availability of data sets
- We want to identify data placement policies that distribute data sets so that they can be
  - staged into or out of computations efficiently
  - replicated to improve performance and reliability

9 Replication occurs when…
- Replica placement
  - I want replica X at sites A, B, and C
  - I want N replicas of each file
  - I want replicas near my compute clusters
- Replica repair
  - Triggered by replica failure: a replica is lost or corrupted
  - But it can be hard to tell the difference between permanent and temporary failure!
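One common way to handle the permanent-versus-temporary ambiguity is a grace period: a replica that has been unreachable for longer than some threshold is presumed permanently lost and is replaced, while shorter outages are ignored. The sketch below assumes each replica records a last-seen timestamp; the `Replica` type, `grace_s` parameter, and site names are hypothetical.

```python
# Sketch of a replica-repair check using a grace period. Assumptions
# (hypothetical): each replica reports a last-seen timestamp, and one
# unreachable for longer than `grace_s` seconds is presumed permanently
# failed. Shorter outages count as temporary, so no copy is made for them.
from dataclasses import dataclass


@dataclass
class Replica:
    site: str
    last_seen: float  # time of last successful check, in seconds


def replicas_to_create(replicas: list[Replica], target: int,
                       now: float, grace_s: float) -> int:
    """Number of new replicas needed to restore `target` presumed-live copies."""
    presumed_alive = [r for r in replicas if now - r.last_seen <= grace_s]
    return max(0, target - len(presumed_alive))


if __name__ == "__main__":
    reps = [Replica("A", 9990.0), Replica("B", 9500.0), Replica("C", 5000.0)]
    # Site C has been silent for 5000 s, longer than the 1-hour grace period.
    print(replicas_to_create(reps, target=3, now=10000.0, grace_s=3600.0))
```

The grace period is a trade-off: too short and transient outages trigger needless copies; too long and data stays under-replicated after a real failure.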

10 Examples of Placement Policies
- Random: make N copies placed randomly on different sites
- Topology-aware: one on my server, one on the same rack, one on another rack
- Publish/Subscribe: query-based replication requests to push or pull data to make new replicas
- Tree-based dissemination: push replicas toward the “leaf” nodes (or access points) of the tree
- Pervasive: exploit locality of reference by creating replicas at any site where they are accessed
- QoS Aware: place replicas at sites in order to optimize Quality-of-Service (QoS) criteria
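The first two policies are simple enough to sketch directly. The code below assumes hosts are described by (host, rack) pairs and that enough hosts and racks exist to satisfy each request; the function names and data layout are hypothetical, chosen only to make the policies concrete.

```python
# Sketch of two placement policies from the slide. Assumptions
# (hypothetical): hosts are (host_name, rack_name) pairs, and there are
# enough hosts/racks to satisfy each request.
import random


def random_placement(sites: list[str], n: int, rng: random.Random) -> list[str]:
    """Random: N copies on N distinct, randomly chosen sites."""
    return rng.sample(sites, n)


def topology_aware(local: tuple[str, str], hosts: list[tuple[str, str]],
                   rng: random.Random) -> list[str]:
    """Topology-aware: one copy on my server, one on another host in the
    same rack, one on a host in a different rack."""
    my_host, my_rack = local
    same_rack = [h for h, r in hosts if r == my_rack and h != my_host]
    other_rack = [h for h, r in hosts if r != my_rack]
    if not same_rack or not other_rack:
        raise ValueError("need another host on my rack and a host on another rack")
    return [my_host, rng.choice(same_rack), rng.choice(other_rack)]


if __name__ == "__main__":
    rng = random.Random(42)
    print(random_placement(["s1", "s2", "s3", "s4"], 3, rng))
    hosts = [("h1", "r1"), ("h2", "r1"), ("h3", "r2"), ("h4", "r2")]
    print(topology_aware(("h1", "r1"), hosts, rng))
```

The topology-aware rule balances cost and safety: the same-rack copy is cheap to write, while the other-rack copy survives the loss of an entire rack.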

11 Other Uses
- GridFTP can be embedded in applications for high-performance data streaming
- GridFTP can be used with SSH-style public keys instead of certificates
- RFT (Reliable File Transfer) can provide a Web services interface to GridFTP
- RFT is used by GRAM for file staging
- OGSA-DAI can be used to implement a metadata service
- And many more…
OSGCC 2008, Globus Primer: An Introduction to Globus Software, slide 11

