ExaO: Software Defined Data Distribution for Exascale Sciences

Presentation transcript:

ExaO: Software Defined Data Distribution for Exascale Sciences California Institute of Technology, European Organization for Nuclear Research (CERN), Yale University

The Compact Muon Solenoid Computing Model
Large raw datasets from the LHC originate at the Tier-0 site.
- RAW data: tens of PB per year
- RECO and AOD: several times the RAW volume, depending on analysis requirements
- RECO and AOD datasets are distributed to Tier-1 sites
- RECO, AOD, and simulation datasets are transferred among Tier-1 to Tier-3 sites for analysis

PhEDEx: CMS Data Transfer Service
Automatic data management and movement in CMS:
- Number of files: > 63,000,000
- Average file size: > 2.6 GB
- Total volume: > 160 PB
- Datasets: hundreds to thousands of files each
- Dataset transfer requests: 400-500 per day
- Multiple transfer patterns: one-to-one, one-to-many, many-to-many
Users submit requests through a web interface; central data movement decisions are based on historical statistics.

PhEDEx Architecture
Built on top of traditional multi-domain networks, with inflexible, static dataset-level scheduling.
Limitations:
- Inflexible infrastructure
- No real-time, global network view
- End-to-end data flow orchestration is infeasible
- Traditional client/server mode selects the source for all files at once
- Destination sites' potential as file providers is completely ignored
- No network resource allocation scheme
Performance consequences: low concurrency, low link utilization, long transfer delays.

Example: Distributing Dataset X to All the Sites in PhEDEx
Dataset X: 3,000 files of 50 GB each. Only site 1 is considered a potential source.
PhEDEx scheduling: site 1 sends all 3,000 files to each destination, i.e., (file K, site 1, site X) where K = 1, 2, ..., 3000 and X = 2, 3, ..., 7.
- Low concurrency: only the uplink of site 1 is utilized, and it becomes the bottleneck.
- Low link utilization: 1/7 = 14.29%.
- Flows compete for network resources: with TCP, the fair share of each site-to-site flow converges to 100/6 = 16.7 Gbps.
CMS needs a more efficient, flexible data transfer service.
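The numbers above are easy to verify. The minimal Python sketch below models the slide's scenario; the uniform 100 Gbps uplink per site is implied by the 100/6 fair-share figure, and the idealized TCP fair-share model is an assumption of this sketch.

```python
# Back-of-the-envelope model of the PhEDEx scenario above: one source
# (site 1) pushes the whole dataset to 6 destinations. The uniform
# 100 Gbps uplink per site is implied by the slide's 100/6 figure; the
# idealized TCP fair-share model is an assumption of this sketch.

NUM_SITES = 7                 # 1 source + 6 destinations
UPLINK_GBPS = 100.0           # per-site uplink capacity
FILES, FILE_GB = 3000, 50

dataset_gb = FILES * FILE_GB                    # 150,000 GB = 150 TB
fair_share = UPLINK_GBPS / (NUM_SITES - 1)      # TCP fair share per flow
utilization = 1 / NUM_SITES                     # only 1 of 7 uplinks is busy

# Each destination receives the full dataset at the fair-share rate.
transfer_hours = dataset_gb * 8 / fair_share / 3600

print(f"per-flow rate:          {fair_share:.1f} Gbps")   # 16.7 Gbps
print(f"total link utilization: {utilization:.2%}")       # 14.29%
print(f"time per destination:   {transfer_hours:.1f} h")  # ~20 h
```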

Design Challenges for a Flexible, Efficient Data Transfer Service
- Provision of a global, real-time, inter-domain network view
- Flexible, dynamic scheduling with high transfer concurrency
- Efficient orchestration of end-to-end data flows with high network resource utilization
Solution: ExaO, a software defined data transfer orchestrator.

ExaO: Software Defined Data Transfer Orchestrator
PhEDEx: no real-time, global network view.
ExaO: Application-Layer Traffic Optimization (ALTO):
- Collects complete network state in each domain (OpenDaylight)
- Computes real-time routing information in each domain (ALTO-SPCE)
- Computes a global, on-demand, minimal, equivalent abstract routing state (ALTO-RSA)
PhEDEx: dataset-level scheduling; destination sites cannot become candidate sources until they receive the whole dataset; low concurrency.
ExaO: Scheduler:
- Centralized, dynamic, network-aware, file-level scheduling
- Leverages destination sites as candidate sources as soon as they receive files (see the sketch below)
- High concurrency
PhEDEx: no network resource allocation scheme; data flows compete for network resources; low utilization.
ExaO: Scheduler and Transfer Execution Nodes (TEN):
- Global, dynamic rate allocation among data flows (Scheduler)
- End-host rate limiting to enforce the allocation (TEN)
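To make the file-level scheduling idea concrete, here is a minimal, illustrative sketch: any site that already holds a file is a candidate source for it, and a crude load counter spreads transfers across candidates. This is a toy placeholder, not ExaO's actual network-aware optimization; all names are invented for illustration.

```python
# Illustrative sketch of file-level scheduling in which every site that
# already holds a file becomes a candidate source for it. A crude load
# counter stands in for ExaO's real network-aware optimization.

from collections import defaultdict

def schedule(files, sites, holdings):
    """Assign a source to each pending (file, destination) pair.

    holdings: dict mapping site -> set of files currently stored there.
    Returns a list of (source, destination, file) transfer tasks.
    """
    tasks = []
    load = defaultdict(int)  # per-site count of assigned uploads
    for f in files:
        for dst in sites:
            if f in holdings[dst]:
                continue  # destination already holds the file
            candidates = [s for s in sites if f in holdings[s]]
            if not candidates:
                continue  # no replica of this file exists yet
            src = min(candidates, key=load.__getitem__)  # least-loaded source
            tasks.append((src, dst, f))
            load[src] += 1
    return tasks

# As files land, destinations join the candidate-source pool:
holdings = {"site1": {"f1", "f2"}, "site2": set(), "site3": set()}
print(schedule(["f1", "f2"], ["site1", "site2", "site3"], holdings))
# After site2 receives f1, add it to holdings["site2"] and reschedule;
# site2 can then serve f1 to site3, raising concurrency.
```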

Components of ExaO
- RESTful API: lets users submit and manage transfer requests through different interfaces
- ALTO: collects on-demand, real-time, minimal abstract routing information; enforces resource allocation in the network
- ExaO Scheduler: centralized, efficient, file-level scheduling and network resource allocation
- Transfer Execution Nodes (TEN): enforce scheduling and rate allocation decisions at end hosts
- FDT: efficient data transfer tool on end hosts
- MonALISA: distributed monitoring infrastructure for real-time monitoring of each transfer
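As a concrete illustration of the RESTful entry point, the snippet below submits a dataset transfer request. The endpoint URL and JSON fields are assumptions made for this sketch; the slides do not specify ExaO's actual API.

```python
# Hypothetical submission of a dataset transfer request to the ExaO
# RESTful interface. The endpoint path and JSON fields are illustrative
# assumptions; the actual ExaO API is not specified on these slides.

import json
import urllib.request

request_body = {
    "dataset": "DatasetX",
    "source": "site1.example.org",
    "destinations": ["site2.example.org", "site3.example.org"],
}

req = urllib.request.Request(
    "http://exao-scheduler.example.org/requests",  # assumed URL
    data=json.dumps(request_body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # e.g., a request ID used to track status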

ExaO Workflow
1. Users submit, track, and manage dataset transfer requests.
2. The Scheduler queries NEW requests via the ExaO RESTful interface.
3. The Scheduler queries ALTO controllers for routing state.
4. Each ALTO controller collects complete network state and computes on-demand, minimal, abstract routing state in response to the query.
5. The Scheduler makes centralized, dynamic, file-level scheduling and global network resource allocation decisions.
6. Transfer Execution Nodes (TEN) query the updated scheduling and allocation decisions.
7. TENs instruct efficient data transfer tools on the end hosts (e.g., FDT) to enforce the scheduling and allocation decisions.
8. MonALISA sensors monitor data transfer status and send updates back.
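A minimal sketch of steps 6-7 from a TEN's perspective, assuming a scheduler URL and a JSON decision format invented here for illustration. FDT ships as a runnable jar with client options; verify the exact flags against the FDT release you deploy.

```python
# Sketch of the TEN side of steps 6-7: poll the scheduler for updated
# decisions, then launch FDT with a rate cap. The scheduler URL and the
# JSON decision format are assumptions made for this sketch; check the
# FDT documentation for the exact client flags in your release.

import json
import subprocess
import urllib.request

SCHEDULER = "http://exao-scheduler.example.org"  # assumed URL

def fetch_decisions(ten_id):
    """Assumed response: a list of {"file", "dest", "rate_mbps"} objects."""
    with urllib.request.urlopen(f"{SCHEDULER}/decisions/{ten_id}") as resp:
        return json.load(resp)

def enforce(decision):
    """Run one FDT client transfer capped at the allocated rate."""
    rate_bytes = int(decision["rate_mbps"] * 1e6 / 8)  # Mbps -> bytes/s
    subprocess.run([
        "java", "-jar", "fdt.jar",
        "-c", decision["dest"],     # connect to the destination's FDT server
        "-limit", str(rate_bytes),  # cap the transfer rate
        "-d", "/data/incoming",     # destination directory (assumed)
        decision["file"],
    ], check=True)

for d in fetch_decisions("ten-site1"):
    enforce(d)
```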

Practical Concerns
- Minimally invasive changes on end host groups
- Real-time, dynamic resource allocation in the presence of other network traffic
- Not CMS- or HEP-specific, hence supports any data intensive science
Dataset distribution to N destinations:
- Maximal link utilization in the testbed
- N times faster than dataset-level scheduling

Example: Distributing Dataset X to All the Sites in ExaO
Dataset X: 3,000 files of 50 GB each. Site 1 is the only source at the beginning, but each site can become a file provider once it receives files.
- Site 1 sends 3000/6 = 500 unique files to each destination site, limiting the rate of each (site 1, site X) flow to 100/6 = 16.7 Gbps. Uplink utilization: 100%.
- Site X (X = 2, 3, ..., 7) becomes a source for the other destination sites after receiving a unique file from site 1, and sends each received file to all 5 other destination sites at (100 - 16.7)/5 = 16.7 Gbps each. Uplink utilization: 83.3%.
- Total link utilization: 6/7 = 85.71%, six times that of PhEDEx and the theoretical maximum.
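The same back-of-the-envelope model as before, extended to ExaO's scheme, reproduces the slide's figures (again assuming uniform 100 Gbps uplinks):

```python
# Back-of-the-envelope model of ExaO's scheme: site 1 splits the
# dataset into 6 disjoint shares, and each destination re-serves its
# share to the other 5 sites. Uniform 100 Gbps uplinks are assumed,
# as in the PhEDEx example.

SITES = 7
UPLINK = 100.0  # Gbps per site

src_per_dest = UPLINK / (SITES - 1)                   # site 1 -> each destination
fwd_per_peer = (UPLINK - src_per_dest) / (SITES - 2)  # destination -> each peer

print(f"site 1 -> each destination: {src_per_dest:.1f} Gbps")          # 16.7
print(f"destination -> each peer:   {fwd_per_peer:.1f} Gbps")          # 16.7
print(f"site 1 uplink utilization:  {6 * src_per_dest / UPLINK:.0%}")  # 100%
print(f"destination uplink util.:   {(UPLINK - src_per_dest) / UPLINK:.1%}")  # 83.3%

total = (UPLINK + 6 * (UPLINK - src_per_dest)) / (SITES * UPLINK)
print(f"total link utilization:     {total:.2%}")  # 85.71%, 6x PhEDEx
```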

SC16 Demo: Dataset Distribution Among Different Host Groups
- Dataset X: Host 1 -> Hosts 2, 3
- Dataset Y: Host 4 -> Hosts 5, 6, 7, 8
- Initially, Hosts 1 and 4 are the sources; by the end, Hosts 2, 3, 5, 6, 7, and 8 act as sources as well.

Results
[Throughput plot: total rate and per-host rates (Host 1; Hosts 2, 3) over time, with the point at which the request was submitted marked.]

What Else? "Super SDN Programming for Supercomputing"
Introduces a series of tools to ...
- Presentations at 5:30pm Wednesday and 11:30am Thursday
- Demo: implementing a Science DMZ traffic control application, running on a 6-switch network, lasting all 3 days

Thank You! For more information, please contact us: supersdnprogramming@gmail.com