Big Data: Movement, Crunching, and Sharing Guy Almes, Academy for Advanced Telecommunications 13 February 2015.

Overarching theme
Understanding the interplay among data movement, crunching, and sharing is key.

This is a persistent theme
In the mid-1980s, NSF launched two closely related programs:
- The NSF Supercomputer Centers brought HPC and the emergent field of computational science to the mainstream of NSF-funded research.
- The NSFnet program, needed to connect science users to those supercomputers, resulted in connecting all our research universities to the Internet. File transfer of huge (e.g., one megabyte!) files was a key issue.
Thus, A&M connected to NSFnet in August 1987.

An ongoing theme
The Internet “outgrew” the narrow mission of connecting universities to supercomputers. But, in its broader missions, it often neglects the big-data needs of university researchers. Thus, having spawned the commercial Internet in the early 1990s, the universities created Internet2 in 1996. Again, this brought a dramatic improvement in our ability to move huge (e.g., one gigabyte) files. Note the TeraGrid network as a false step.

To the present
First, note A&M’s innovation in the ScienceDMZ, so that key data-intensive resources, e.g., the GridFTP servers of the Brazos high-throughput cluster, have direct access to the wide area (LEARN, Internet2, ESnet, etc.). Recently, that wide-area infrastructure has been upgraded to 100 Gb/s. We’ll look at these in turn.

ScienceDMZ
You can achieve high-speed wide-area flows only if packet loss is vanishingly small and the MTU is not small. This fails if you try to extend these flows into the general-purpose campus LAN. Beginning in 2009, we designed the Data Intensive Network to place key resources adjacent to the wide-area network. This idea, called the “ScienceDMZ” and popularized by ESnet, is now widely adopted across the country. If both the source and destination of a high-speed wide-area flow are on ScienceDMZs, very good performance can be achieved. Example: GridFTP servers supporting flows to/from the 240 TByte file system of the Brazos high-throughput cluster.
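The sensitivity to packet loss and MTU can be illustrated with the well-known Mathis et al. steady-state TCP model, throughput ≈ (MSS/RTT)·(C/√p). The RTT, loss rates, and MTUs below are illustrative assumptions, not measurements from the talk:

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Mathis et al. steady-state TCP throughput estimate, in bits/s."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Illustrative values: a 50 ms wide-area RTT, loss rates of 1e-3 vs 1e-6,
# and MSS for standard 1500-byte vs jumbo 9000-byte MTU (MSS ~ MTU - 40).
for mss in (1460, 8960):
    for p in (1e-3, 1e-6):
        gbps = mathis_throughput_bps(mss, 0.050, p) / 1e9
        print(f"MSS={mss} B, loss={p:g}: ~{gbps:.3f} Gb/s")
```

Cutting loss from 1e-3 to 1e-6 improves the estimate by a factor of √1000 ≈ 32, and jumbo frames add another factor of ~6, which is why flows confined to a clean ScienceDMZ path so dramatically outperform those crossing a lossy campus LAN.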

100 Gb/s Upgrade
The Internet2 backbone is built around 100-Gb/s circuits (with up to 80 such circuits/lambdas per fiber). With a combination of NSF and local funding, LEARN is evolving to 100 Gb/s:
- Now: 100 Gb/s College Station to Houston
- Now: 100 Gb/s Houston to the Internet2 backbone at Greenspoint
- Now: 100 Gb/s Austin to Dallas
- Now: 100 Gb/s Dallas to the Internet2 backbone at Kansas City
- Future: 100 Gb/s College Station to Dallas, and Austin to San Antonio to Houston
This would then result in a consistent 100-Gb/s wide-area infrastructure.

Sum of the current good situation
ScienceDMZ and the emerging 100-Gb/s infrastructure permit very good end-to-end performance to resources on the ScienceDMZ. Software tools such as GridFTP and Globus Online, and discipline-specific tools such as PhEDEx, permit wide-area flows in excess of 1 TByte/hour to be sustained to/from other high-end sites. Emerging “Advanced Layer-2 Services”, based on software-defined networking techniques, may be very important.
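For scale, the 1 TByte/hour figure converts to link terms with a quick calculation (assuming decimal terabytes):

```python
# Convert the sustained rate quoted above into link terms.
TBYTE = 1e12  # bytes in a (decimal) terabyte
rate_gbps = TBYTE * 8 / 3600 / 1e9
print(f"1 TByte/hour ≈ {rate_gbps:.2f} Gb/s sustained")
```

That is, a sustained wide-area flow of roughly 2.2 Gb/s, well beyond what a general-purpose campus path typically delivers, yet a small fraction of a 100-Gb/s circuit.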

Crunching
Several key computing resources are already on A&M’s ScienceDMZ:
- The parallel file system of Brazos
- Similarly for Eos
- Similarly for Ada, a new very large x86 cluster
- Emerging: the Power7 cluster and (eventually?) the BlueGene cluster
Data-moving servers are attached to the parallel file systems of these resources. And, using tools such as Globus Online, large data flows can be achieved to and from the computing resources of NSF/XSEDE and the DoE.

Sharing
Things are more primitive here. One can only point to:
- A few discipline-specific examples, e.g., the PhEDEx system of the Large Hadron Collider’s CMS collaboration
- Some key tools: InCommon/Shibboleth provide federated identity/authentication, and Globus Online provides some support for controlled sharing
But, generally, this situation does not match our need to share data among key scientific collaborations.

An important work in progress
Controlled high-performance sharing of data is key to effective scientific collaboration in the big-data world.