Connecting arbitrary data sources to the grid Shunde Zhang Australian Research Collaboration Service (ARCS) eResearch SA School of Computer Science, University.

1 Connecting arbitrary data sources to the grid Shunde Zhang Australian Research Collaboration Service (ARCS) eResearch SA School of Computer Science, University of Adelaide

2 Background Australian Research Collaboration Service A successor of APAC Services –HPC –Data –Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai

3 ARCS Data Fabric

4 ARCS Data Fabric (cont.) A national service Provided to all Australian researchers Based on iRODS

5 The Problem Interoperability with The Grid –The Grid: Globus, gLite, Condor, etc. –Data sources GridFTP-compatible: dCache Non-GridFTP-compatible: iRODS, SRB Possible solutions –Manual copy (or do it in a PBS script) –Copy queue

6 The Problem (cont.) Movement of massive data –Both ends use the same software (speak the same protocol) –Different systems are used (speak different protocols) –Efficiency Possible solutions –Transfer via an intermediate point

7 A solution - old fashioned AWS Import/Export for Amazon S3 –Ship the hard disks by courier

8 Our Solution - GridFTP De facto standard –Compatible with the Grid, and many grid clients Efficiency –Parallel transfer –Data channel reuse –Large file transfer - in small blocks Compatible with many file transfer services –Monitoring –Scheduling

9 An overview of GridFTP protocol Based on FTP with extensions Third-party transfer –Intermediate point not needed Security - GSI Extended block mode –Parallel transfer –Striped transfer –Partial transfer Reliable and restartable TCP and UDP
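The extended block mode listed above is what makes parallel and partial transfers possible: every data block travels with its own header, so blocks can arrive on any data channel in any order. The following is a minimal sketch of that header, assuming the layout from the GridFTP protocol specification (a 1-byte descriptor followed by a 64-bit byte count and a 64-bit offset, in network byte order). The class name `ModeEHeader` and its methods are illustrative only and are not taken from Griffin.

```java
import java.nio.ByteBuffer;

public class ModeEHeader {
    // Descriptor flag bits (assumed from the GridFTP extended block mode definition).
    static final int EOD = 0x08; // no more data will follow on this data channel
    static final int EOF = 0x40; // sender announces the end of the file
    static final int HEADER_LENGTH = 17; // 1 + 8 + 8 bytes

    final int descriptor; // flag bits
    final long count;     // number of payload bytes that follow the header
    final long offset;    // position of the payload within the file

    ModeEHeader(int descriptor, long count, long offset) {
        this.descriptor = descriptor;
        this.count = count;
        this.offset = offset;
    }

    // Serialise the header; ByteBuffer is big-endian (network order) by default.
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH);
        buf.put((byte) descriptor).putLong(count).putLong(offset);
        return buf.array();
    }

    // Parse a 17-byte header received from a data channel.
    static ModeEHeader decode(byte[] raw) {
        if (raw.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("header needs " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int descriptor = buf.get() & 0xFF;
        long count = buf.getLong();
        long offset = buf.getLong();
        return new ModeEHeader(descriptor, count, offset);
    }

    boolean isEndOfData() {
        return (descriptor & EOD) != 0;
    }

    public static void main(String[] args) {
        // A 64 KiB block that belongs 1 MiB into the file.
        ModeEHeader sent = new ModeEHeader(0, 65536L, 1048576L);
        ModeEHeader received = decode(sent.encode());
        System.out.println(received.count + " bytes at offset " + received.offset);
    }
}
```

Because each block names its own offset, a receiver can write it straight to the right position without waiting for earlier blocks, which is why the framework later asks data sources for random access.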

10 The Architecture GridFTP interface Generic File System Framework Data Source Plugin Data Source

11 Generic File System Framework FileSystem FileSystemConnection FileObject RandomAccessFileObject creates

12 FileSystem interface public String getSeparator(); public void init() throws IOException; public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws FtpConfigException, IOException; public void exit();
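To make the plugin lifecycle concrete, here is a minimal sketch of a data source plugin that follows the `FileSystem` interface above. It is not Griffin's code: the interfaces are trimmed copies declared locally so the example compiles on its own, `FtpConfigException` is left out because it is Griffin's own class, and `DemoFileSystem` with its "anonymous" fallback for a missing credential is a hypothetical stand-in.

```java
import java.io.IOException;
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;

public class FileSystemSketch {
    // Trimmed copies of the slide interfaces (only the methods this sketch uses).
    interface FileSystemConnection {
        String getUser();
        boolean isConnected();
        void close() throws IOException;
    }

    interface FileSystem {
        String getSeparator();
        void init() throws IOException;
        FileSystemConnection createFileSystemConnection(GSSCredential credential) throws IOException;
        void exit();
    }

    // Hypothetical plugin: one instance per server, one connection per client login.
    static class DemoFileSystem implements FileSystem {
        private boolean initialised;

        public String getSeparator() { return "/"; }

        public void init() { initialised = true; } // a real plugin would load its configuration here

        public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws IOException {
            if (!initialised) {
                throw new IOException("init() has not been called");
            }
            // Assumption: the user is derived from the credential's name (the certificate DN).
            String name = "anonymous";
            if (credential != null) {
                try {
                    name = credential.getName().toString();
                } catch (GSSException e) {
                    throw new IOException("cannot read credential name", e);
                }
            }
            final String user = name;
            return new FileSystemConnection() {
                private boolean open = true;
                public String getUser() { return user; }
                public boolean isConnected() { return open; }
                public void close() { open = false; }
            };
        }

        public void exit() { initialised = false; }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = new DemoFileSystem();
        fs.init();                                   // once, at server start-up
        FileSystemConnection connection = fs.createFileSystemConnection(null);
        System.out.println(connection.getUser() + " connected: " + connection.isConnected());
        connection.close();                          // when the client session ends
        fs.exit();                                   // once, at server shutdown
    }
}
```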

13 FileSystemConnection interface public FileObject getFileObject(String path); public String getHomeDir(); public String getUser(); public void close() throws IOException; public boolean isConnected(); public long getFreeSpace(String path);

14 FileObject interface public String getName(); public String getPath(); public boolean exists(); public boolean isFile(); public boolean isDirectory(); public int getPermission(); public String getCanonicalPath() throws IOException; public FileObject[] listFiles(); public long length(); public long lastModified(); public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException; public boolean delete(); public FileObject getParent(); public boolean mkdir(); public boolean renameTo(FileObject file); public boolean setLastModified(long t);
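As an illustration of how an adaptor might satisfy `FileObject`, the sketch below maps a virtual path such as `/data/a.txt` onto a directory of the local disk using `java.io.File`. It covers only a subset of the methods on the slide, the class names are hypothetical, and it makes no attempt to block `..` in paths, which a production adaptor would have to do.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LocalFileObjectSketch {
    // Subset of the slide's FileObject interface, declared locally for a self-contained example.
    interface FileObject {
        String getName();
        String getPath();
        boolean exists();
        boolean isFile();
        boolean isDirectory();
        FileObject[] listFiles();
        long length();
        boolean delete();
        FileObject getParent();
        boolean mkdir();
    }

    static class LocalFileObject implements FileObject {
        private final File root;   // directory exposed to clients
        private final String path; // virtual path as seen by clients, always starts with "/"

        LocalFileObject(File root, String path) {
            this.root = root;
            this.path = path.isEmpty() ? "/" : path;
        }

        // Resolve the virtual path against the exposed root directory.
        private File real() { return new File(root, path); }

        public String getName() { return "/".equals(path) ? "/" : real().getName(); }
        public String getPath() { return path; }
        public boolean exists() { return real().exists(); }
        public boolean isFile() { return real().isFile(); }
        public boolean isDirectory() { return real().isDirectory(); }
        public long length() { return real().length(); }
        public boolean delete() { return real().delete(); }
        public boolean mkdir() { return real().mkdir(); }

        public FileObject[] listFiles() {
            String[] names = real().list();
            if (names == null) {
                return new FileObject[0]; // not a directory, or not readable
            }
            FileObject[] children = new FileObject[names.length];
            for (int i = 0; i < names.length; i++) {
                String child = path.endsWith("/") ? path + names[i] : path + "/" + names[i];
                children[i] = new LocalFileObject(root, child);
            }
            return children;
        }

        public FileObject getParent() {
            if ("/".equals(path)) {
                return null; // the virtual root has no parent
            }
            int cut = path.lastIndexOf('/');
            return new LocalFileObject(root, cut <= 0 ? "/" : path.substring(0, cut));
        }
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("griffin-sketch").toFile();
        FileObject dir = new LocalFileObject(root, "/results");
        System.out.println("created: " + dir.mkdir());
        for (FileObject child : new LocalFileObject(root, "/").listFiles()) {
            System.out.println(child.getPath());
        }
    }
}
```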

15 RandomAccessFileObject interface public void seek(long offset) throws IOException; public int read() throws IOException; public int read(byte[] b) throws IOException; public int read(byte[] b, int off, int len) throws IOException; public void close() throws IOException; public String readLine() throws IOException; public void write(int b) throws IOException; public void write(byte[] b) throws IOException; public void write(byte[] b, int off, int len) throws IOException; public long length() throws IOException;
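The `seek` method is what lets a server store blocks in whatever order they arrive. The sketch below backs a subset of `RandomAccessFileObject` with `java.io.RandomAccessFile` and writes two blocks out of order. The class names are hypothetical, and treating the `type` string as a `RandomAccessFile` mode such as `"r"` or `"rw"` is an assumption, since the slides do not define it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class RandomAccessSketch {
    // Subset of the slide's RandomAccessFileObject interface, declared locally.
    interface RandomAccessFileObject {
        void seek(long offset) throws IOException;
        int read(byte[] b) throws IOException;
        void write(byte[] b, int off, int len) throws IOException;
        long length() throws IOException;
        void close() throws IOException;
    }

    // Thin adaptor: every call is delegated to java.io.RandomAccessFile.
    static class LocalRandomAccess implements RandomAccessFileObject {
        private final RandomAccessFile file;

        LocalRandomAccess(File target, String type) throws IOException {
            this.file = new RandomAccessFile(target, type); // assumption: type is "r" or "rw"
        }

        public void seek(long offset) throws IOException { file.seek(offset); }
        public int read(byte[] b) throws IOException { return file.read(b); }
        public void write(byte[] b, int off, int len) throws IOException { file.write(b, off, len); }
        public long length() throws IOException { return file.length(); }
        public void close() throws IOException { file.close(); }
    }

    // Store the second block before the first one, as can happen with parallel data channels.
    static String writeOutOfOrder(File target) throws IOException {
        byte[] first = "hello ".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "world".getBytes(StandardCharsets.US_ASCII);
        RandomAccessFileObject out = new LocalRandomAccess(target, "rw");
        try {
            out.seek(first.length);               // the block for offset 6 arrives first
            out.write(second, 0, second.length);
            out.seek(0);                          // the block for offset 0 arrives later
            out.write(first, 0, first.length);
        } finally {
            out.close();
        }
        return new String(Files.readAllBytes(target.toPath()), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("griffin-sketch", ".bin");
        target.deleteOnExit();
        System.out.println(writeOutOfOrder(target));
    }
}
```

A data source without native random access (an object store, for example) would need to buffer or reassemble blocks inside its adaptor to offer the same interface.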

16 The Implementation - Griffin GridFTP interface Generic file system framework GridFTP client Grid job submission system Data transfer service Adaptor for iRODS Adaptor for local file system Other adaptors iRODS Local File System Other data source Griffin

17 Features GridFTP protocol version 1 Java-based – Spring framework – OS-independent Lightweight, stand-alone, self-contained – No need to install Globus Toolkit Two plugins included – iRODS plugin – Local file system plugin Open source (Apache 2 & GPL)

18 Parallel transfer with Griffin Client Griffin Data Source WAN LAN/localhost
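One way to picture a parallel transfer is as a plan that cuts the file into fixed-size blocks and hands them to several data streams. The sketch below uses plain round-robin assignment purely for illustration; the slides do not say how Griffin or its clients schedule blocks, and the names `ParallelBlocks`, `Block` and `plan` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelBlocks {
    // One block of the file and the data stream chosen to carry it.
    static final class Block {
        final long offset;
        final long count;
        final int stream;

        Block(long offset, long count, int stream) {
            this.offset = offset;
            this.count = count;
            this.stream = stream;
        }

        @Override
        public String toString() {
            return "stream " + stream + ": " + count + " bytes at offset " + offset;
        }
    }

    // Cut the file into blocks of at most blockSize bytes and assign them round-robin.
    static List<Block> plan(long fileLength, int blockSize, int streams) {
        if (blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("blockSize and streams must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        int next = 0;
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            long count = Math.min(blockSize, fileLength - offset); // the last block may be short
            blocks.add(new Block(offset, count, next));
            next = (next + 1) % streams;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A 10-byte "file" in 4-byte blocks over 2 streams, small enough to read at a glance.
        for (Block block : plan(10, 4, 2)) {
            System.out.println(block);
        }
    }
}
```

Since every block carries its own offset, the receiving side does not need to know which stream delivered it.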

19 Authentication GSI –iRODS plugin User mapping –local file system plugin –XML file Maps GSI authentication (certificate DN) to internal user management system
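The user mapping for the local file system plugin can be pictured as a lookup table from certificate DN to local account, loaded from XML. The slides do not show the file format, so the `<user dn="..." name="..."/>` schema, the sample DN and the class name below are all hypothetical; the sketch only shows the general shape of such a lookup using the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DnMappingSketch {
    // Parse a mapping document in the hypothetical schema into a DN -> local user table.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList users = doc.getElementsByTagName("user");
        Map<String, String> mapping = new HashMap<>();
        for (int i = 0; i < users.getLength(); i++) {
            Element user = (Element) users.item(i);
            mapping.put(user.getAttribute("dn"), user.getAttribute("name"));
        }
        return mapping;
    }

    public static void main(String[] args) throws Exception {
        // Made-up example content; a real deployment would read this from a file.
        String xml =
                "<users>"
              + "<user dn=\"/C=AU/O=Example/CN=Jane Researcher\" name=\"jane\"/>"
              + "</users>";
        Map<String, String> mapping = parse(xml);
        String dn = "/C=AU/O=Example/CN=Jane Researcher"; // DN taken from the client's certificate
        String localUser = mapping.get(dn);
        System.out.println(localUser == null ? "access denied" : "mapped to " + localUser);
    }
}
```

A DN that is missing from the table yields `null`, which a server would presumably treat as a failed login.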

20 Use case Integration of the Grid and Data Fabric –iRODS plugin for Data Fabric –Third-party transfer to cluster (Globus GridFTP) Tested with –Globus.org –Globus-url-copy (5.0 and 4.x) –Globus GridFTP GUI

21 Performance Evaluation Server: two quad-core Xeon 3.16GHz CPUs, 16GB memory Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory Network: 1Gbps LAN WAN: two 10Gbps links Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB –iCommands –Globus-url-copy

22 Evaluation Set up - Griffin vs iCommands Client iRODS Local File System Griffin Jargon Adaptor globus-url-copy iCommands

23 Evaluation Result Chart - Griffin vs iCommands

24 Evaluation Set up - Griffin vs Globus GridFTP Client Globus GridFTP server Local File System Griffin Local FS Adaptor globus-url-copy

25 Evaluation Result Chart - Griffin vs Globus GridFTP

26 Related work Client library –SAGA/jSAGA –Commons-vfs Data transfer service –Stork –PAFTP Globus –XIO –DSI

27 Griffin vs. Globus GridFTP
Griffin | Globus GridFTP
Java | C
OS-independent | *nix
Simple, standalone | Complex

28 Conclusion A generic solution to connect arbitrary data sources to the grid –Data in/out of the grid –Data transfer between different data sources Java-based implementation –Standalone, lightweight –Pluggable –Does not depend on Globus

29 Future work Currently working on a plugin for MongoDB Java NIO UDP Striped transfer

30 MongoDB plugin MongoDB –NoSQL database –Stores JSON-style documents –GridFS component Stores files Plugin for Griffin –Read/write files via GridFS

31 Acknowledgements Funded by ARCS

32 Current Status ARCS production service Used to transfer data in/out of ARCS Data Fabric Website –https://projects.arcs.org.au/trac/griffin

33 Thank you! Questions/Comments?

