Big Data transfer over computer networks Initial Sergey Khoruzhnikov Vladimir Grudinin Oleg Sadov Andrey Shevel Anatoly Oreshkin Elena Korytko Alexander.

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.
Appropriateness of Transport Mechanisms in Data Grid Middleware Rajkumar Kettimuthu 1,3, Sanjay Hegde 1,2, William Allcock 1, John Bresnahan 1 1 Mathematics.
Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki
Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.
Esma Yildirim Department of Computer Engineering Fatih University Istanbul, Turkey DATACLOUD 2013.
GridFTP: File Transfer Protocol in Grid Computing Networks
Secure Off Site Backup at CERN Katrine Aam Svendsen.
RDMA ENABLED WEB SERVER Rajat Sharma. Objective  To implement a Web Server serving HTTP client requests through RDMA replacing the traditional TCP/IP.
Virtual Network Servers. What is a Server? 1. A software application that provides a specific one or more services to other computers  Example: Apache.
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
GridFTP Guy Warner, NeSC Training.
1 The SpaceWire Internet Tunnel and the Advantages It Provides For Spacecraft Integration Stuart Mills, Steve Parkes Space Technology Centre University.
Part Three: Data Management 3: Data Management A: Data Management — The Problem B: Moving Data on the Grid FTP, SCP GridFTP, UberFTP globus-URL-copy.
AliEn uses bbFTP for the file transfers. Every FTD runs a server, and all the others FTD can connect and authenticate to it using certificates. bbFTP implements.
IRODS performance test and SRB system at KEK Yoshimi KEK Building data grids with iRODS 27 May 2008.
Globus Striped GridFTP Framework and Server Raj Kettimuthu, ANL and U. Chicago.
Globus GridFTP: What’s New in 2007 Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
Slide 1 DESIGN, IMPLEMENTATION, AND PERFORMANCE ANALYSIS OF THE ISCSI PROTOCOL FOR SCSI OVER TCP/IP By Anshul Chadda (Trebia Networks)-Speaker Ashish Palekar.
2nd April 2001Tim Adye1 Bulk Data Transfer Tools Tim Adye BaBar / Rutherford Appleton Laboratory UK HEP System Managers’ Meeting 2 nd April 2001.
Why GridFTP? l Performance u Parallel TCP streams, optimal TCP buffer u Non TCP protocol such as UDT u Order of magnitude greater l Cluster-to-cluster.
Secure, Collaborative, Web Service enabled and Bittorrent Inspired High-speed Scientific Data Transfer Framework.
Topaz : A GridFTP extension to Firefox M. Taufer, R. Zamudio, D. Catarino, K. Bhatia, B. Stearn University of Texas at El Paso San Diego Supercomputer.
Moving Large Amounts of Data Rob Schuler University of Southern California.
File and Object Replication in Data Grids Chin-Yi Tsai.
Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008 John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and.
Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
Data transfer over the wide area network with a large round trip time H. Matsunaga, T. Isobe, T. Mashimo, H. Sakamoto, I. Ueda International Center for.
UDT as an Alternative Transport Protocol for GridFTP Raj Kettimuthu Argonne National Laboratory The University of Chicago.
Remote Direct Memory Access (RDMA) over IP PFLDNet 2003, Geneva Stephen Bailey, Sandburst Corp., Allyn Romanow, Cisco Systems,
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Communicating Security Assertions over the GridFTP Control Channel Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Frank Siebenlist 1,2 and Ian Foster 1,2,3 1.
DYNES Storage Infrastructure Artur Barczyk California Institute of Technology LHCOPN Meeting Geneva, October 07, 2010.
Towards Exascale File I/O Yutaka Ishikawa University of Tokyo, Japan 2009/05/21.
Optimisation of Grid Enabled Storage at Small Sites Jamie K. Ferguson University of Glasgow – Jamie K. Ferguson – University.
LEGS: A WSRF Service to Estimate Latency between Arbitrary Hosts on the Internet R.Vijayprasanth 1, R. Kavithaa 2,3 and Raj Kettimuthu 2,3 1 Coimbatore.
Practical Distributed Authorization for GARA Andy Adamson and Olga Kornievskaia Center for Information Technology Integration University of Michigan, USA.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Hepix LAL April 2001 An alternative to ftp : bbftp Gilles Farrache In2p3 Computing Center
Introduction to Grids By: Fetahi Z. Wuhib [CSD2004-Team19]
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE Site Architecture Resource Center Deployment Considerations MIMOS EGEE Tutorial.
IHEP(Beijing LCG2) Site Report Fazhi.Qi, Gang Chen Computing Center,IHEP.
BNL Service Challenge 3 Status Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.
Globus and PlanetLab Resource Management Solutions Compared M. Ripeanu, M. Bowman, J. Chase, I. Foster, M. Milenkovic Presented by Dionysis Logothetis.
The Globus eXtensible Input/Output System (XIO): A protocol independent IO system for the Grid Bill Allcock, John Bresnahan, Raj Kettimuthu and Joe Link.
ALCF Argonne Leadership Computing Facility GridFTP Roadmap Bill Allcock (on behalf of the GridFTP team) Argonne National Laboratory.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
ANSE: Advanced Network Services for Experiments Institutes: –Caltech (PI: H. Newman, Co-PI: A. Barczyk) –University of Michigan (Co-PI: S. McKee) –Vanderbilt.
File Transfer And Access (FTP, TFTP, NFS). Remote File Access, Transfer and Storage Networks For different goals variety of approaches to remote file.
1.3 ON ENHANCING GridFTP AND GPFS PERFORMANCES A. Cavalli, C. Ciocca, L. dell’Agnello, T. Ferrari, D. Gregori, B. Martelli, A. Prosperini, P. Ricci, E.
DMLite GridFTP frontend Andrey Kiryanov IT/SDC 13/12/2013.
GridFTP Guy Warner, NeSC Training Team.
03/09/2007http://pcalimonitor.cern.ch/1 Monitoring in ALICE Costin Grigoras 03/09/2007 WLCG Meeting, CHEP.
Virtual Machine Movement and Hyper-V Replica
BNL dCache Status and Plan CHEP07: September 2-7, 2007 Zhenping (Jane) Liu for the BNL RACF Storage Group.
A Sneak Peak of What’s New in Globus GridFTP John Bresnahan Michael Link Raj Kettimuthu (Presenting) Argonne National Laboratory and The University of.
BDTS and Its Evaluation on IGTMD link C. Chen, S. Soudan, M. Pasin, B. Chen, D. Divakaran, P. Primet CC-IN2P3, LIP ENS-Lyon
Seminar On Rain Technology
A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.
Activities and Perspectives at Armenian Grid site The 6th International Conference "Distributed Computing and Grid- technologies in Science and Education"
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Computing Clusters, Grids and Clouds Globus data service
Clouds of JINR, University of Sofia and INRNE Join Together
Recap: introduction to e-science
Study course: “Computing clusters, grids and clouds” Andrey Y. Shevel
Grid Canada Testbed using HEP applications
ExaO: Software Defined Data Distribution for Exascale Sciences
File Transfer Issues with TCP Acceleration with FileCatalyst
Beyond FTP & hard drives: Accelerating LAN file transfers
STATEL an easy way to transfer data
Presentation transcript:

Big Data transfer over computer networks Initial Sergey Khoruzhnikov Vladimir Grudinin Oleg Sadov Andrey Shevel Anatoly Oreshkin Elena Korytko Alexander Shkrebets Vladimir Titov Oleg Lazo Arsen Kairkanov (presenter) 2 July 2014sdn.ifmo.ru GRID 20141/17 Faculty of Infocommunication Technologies ITMO University St.Petersburg, Russian Federation

Outlook Sources of the Big Data. Ecosystem of the Big Data. Technology of the Big Data transfer. Our recently started research. 2 July 2014sdn.ifmo.ru GRID 20142/17

Network traffic growth 2 July 2014sdn.ifmo.ru GRID 20143/17

Scientific sources of Big Data Scientific experimental installations – - Large Synoptic Survey Telescope 15 TB per night (may be 10 PB/year) – - Square Kilometre Array PB/year – — CERN ~ 20PB/year (FAIR ~ same) – - International Thermonuclear Experimental Reactor ~ 1 PB/year – - CTA - The Cherenkov Telescope Array ~ 20 PB/year 2 July 2014sdn.ifmo.ru GRID 20144/17

3 Vs of Big Data 2 July 2014sdn.ifmo.ru GRID 20145/17

Big Data ecosystem 2 July 2014sdn.ifmo.ru GRID 20146/17 Big Data Provider (Organizations) Big Data Consumer (organizations, end users) Big Data distributed Framework Provider (Clusters, Storages, Networks, etc) Big Data Management (data transfer, every day management, security, etc)

Peculiarity of the Big Data transfer 2 July 2014sdn.ifmo.ru GRID 20147/17 Big Data transfer might consume many hours or days. The situation in channel might be changed: RTT, % of lost network packages, data link bandwidth). Finally, it might occured the interruption (hours?, days?) in operation of data link. Obviously it is useful to have access to two or more independent data links.

Technology peculiarities with Big Data transfer 2 July 2014sdn.ifmo.ru GRID 20148/17 Still main protocols — stack of TCP/IP. Number of network parameters in Linux (around ½ thousand). /proc -bash-4.1$ /sbin/sysctl -a | grep "^net\." | wc -l Important parameters: e.g. size of block, size of TCP Window, etc. Main method to decrease the transfer time (even over one data link) is using the multi-stream data transfer.

Testing on the first stage (program tools) 2 July 2014sdn.ifmo.ru GRID 20149/17 BBCP - GridFTP - BBFTP - FDT - ● FTS3 - Also technology components to watch the data links status, e.g. perfSONAR.

Ideas to compare the data transfer tools Availability. API. Performance. Reliability. Operation tracking. Ability to predict the time to transfer the data on the base of existing tracking records. Required resources: memory, CPU time, etc. Others. 2 July 2014sdn.ifmo.ru GRID /17

Research topic at ITMO University: the transfer of Big Data 2 July 2014sdn.ifmo.ru GRID /17 In laboratory of network technologies at ITMO University the new research «Big Data transfer over Internet» has been formed. It is planned to implement the special testbed (100 TB of disk storage + server 96 GB of main memory under OS RedHat/ScientificLinux on each side). Comparative study of the existing tools of the data transfer (testing and measurements). To use the testbed as instrument to compare various tools (tracking for the measurements + results). Extended automatic tracking information about measurements is under development.

Process of the Data transfer 2 July 2014sdn.ifmo.ru GRID /17

Planned measurements Local and long distant sites with existing data links (not only most advanced links). The idea is to use more than one data link in parallel. Recently we obtained some experience with Software Defined Networks (SDN) approach (protocol Openflow) and now we plan to use it in the Big Data transfer. 2 July 2014sdn.ifmo.ru GRID /17

What was done until now 2 July 2014sdn.ifmo.ru GRID /17 There were deployed ● Two servers HP DL380p Gen8 E5-2609, Intel(R) Xeon(R) CPU 64 GB under Scientific Linux 6.5. ● Six HP G-PoE yl (OpenFlow 1.0) ● Pica8 P-3920 (OpenFlow 1.2) ● Openstack Havana with appropriate set of Virtual Machines to test a number of mentioned utilities. ● PerfSonar ● Scripts for testing infocom/BigData

Main goals Combining the developed contemporary components and methods with ideas, developments, experience to achieve maximum speed for Big Data transfer on existing links. To create the testbed which would be used as place where researchers might compare theirs (new) tools for data transfer with earlier recorded measurement results. To sugggest the collaboration with … (suggestions?) To invite students from … (suggestions?) 2 July 2014sdn.ifmo.ru GRID /17

Partners (ideas exchange) 2 July 2014sdn.ifmo.ru GRID /17 –Laboratory of Information Technology (LIT) Joint Institute for Nuclear Research (JINR.ru) – The Application Research Center for Computer Networks at Moscow University – We are starting to collaborate with GENI The work is supported by the Saint-Petersburg University of Information Technology, Mechanics & Optics (ITMO University

Questions? 2 July 2014sdn.ifmo.ru GRID /17

OF Switches 2 July 2014sdn.ifmo.ru GRID /17

bbcp TCP Window size; number of TCP streams; I/O buffer size; compression on the fly; multi-directory copy; resuming failed copy; authentication with ssh; using pipes, where source or/and destination might be pipe; special option to transfer small files; and many other options dealing with many practical details. 2 July 2014sdn.ifmo.ru GRID /17

bbftp 2 July 2014sdn.ifmo.ru GRID /17 ◦encoded user name and password at connection; ▪SSH and Grid Certificate authentication modules; ▪multi-stream transfer; ▪big windows as defined in RFC1323; ▪on-the-fly data compression; ▪automatic retry ▪customizable time-outs; ▪transfer simulation; ▪AFS authentication integration.

gridFTP 2 July 2014sdn.ifmo.ru GRID /17 ◦two security flavors: Globus GSI and SSH; ◦the file with host aliases: each next data transfer stream will use next host aliases (useful for computer cluster); ◦pipes; ◦special debugging mode to find bottleneck in data transfer; ◦backend module name for source and destination sites; ◦number of parallel data transfer streams; ◦buffer size; ◦restart failed operations and number of restarts.

Other utilities 2 July 2014sdn.ifmo.ru GRID /17 Xdd – utility developed to optimize data transfer and I/O processes for storage systems. fdp – Java utility for multi-stream data transfer; FTS3 UDT RDMA MP TCP