connect communicate collaborate. LHCONE – Linking Tier 1 & Tier 2 Sites: Background and Requirements. Richard Hughes-Jones, DANTE (Delivery of Advanced Network Technology to Europe). LHCONE Planning Meeting, RENATER, Paris, 5 April 2011.
Introduction: Describe some of the changes in the computing model of the LHC experiments. Demonstrate the importance and usage of the network. Show the relation between LHCONE and LHCOPN. Bring together and present the user requirements for future LHC physics analysis. Provide the information to facilitate the presentations on the Architecture and the Implementation of LHCONE.
A Little History: Requirements paper from K. Bos (ATLAS) and I. Fisk (CMS) in autumn. The experiments had devised new compute and data models for LHC data evaluation, basically assuming a high-speed network connecting the T2s worldwide. Ideas and proposals were discussed at a workshop held at CERN in Jan, which gave input from the networking community. An "LHCONE Architecture" doc was finalised in Lyon in Feb. Here K. Bos proposed to start with a prototype based on the commonly agreed architecture. K. Bos and I. Fisk produced a "Use Case" note with a list of sites for the prototype. In Rome in late Feb 2011, some NRENs and DANTE formed ideas for the "LHCONE prototype planning" doc.
LHC: Changing Data Models (1): The LHC computing model based on MONARC has served well for more than 10 years. ATLAS is strictly hierarchical; CMS less so. The successful operation of the LHC accelerator and the start of data analysis brought a re-evaluation of the computing and data models. Flatter hierarchy: any site might in the future pull data from any other site hosting it. Figure label: LHCOPN. (Artur Barczyk)
LHC: Changing Data Models (2): Data caching: a bit like web caching. Analysis sites will pull datasets from other sites on demand, including from Tier 2s in other regions, then make them available for others. Possible strategic pre-placement of datasets: datasets are put close to the physicists studying that data or to suitable CPU power. Use of continental replicas. Remote data access: jobs execute locally, using data cached at a remote site in quasi-real time. Traffic patterns are changing: more direct inter-country data transfers.
ATLAS Data Transfers: Between all Tier levels. Average: ~2.3 GB/s (daily average). Peak: ~7 GB/s (daily average). Data available on site within a few hours. Figure labels: 70 Gbit/s on LHCOPN; ATLAS reprocessing. (Daniele Bonacorsi)
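The ATLAS rates above are quoted in GB/s, while link capacities elsewhere in the talk are quoted in Gbit/s. A minimal conversion sketch, assuming decimal units (1 GB = 10^9 bytes) and 8 bits per byte; the function name is illustrative and not from the slides:

```python
def gbyte_to_gbit_per_s(rate_gbyte_per_s: float) -> float:
    """Convert a rate in GB/s to Gbit/s (assumes 8 bits per byte, decimal prefixes)."""
    return rate_gbyte_per_s * 8.0


# Figures taken from the slide: ~2.3 GB/s daily average, ~7 GB/s daily-average peak.
average_gbit = gbyte_to_gbit_per_s(2.3)
peak_gbit = gbyte_to_gbit_per_s(7.0)

print(f"average: {average_gbit:.1f} Gbit/s")
print(f"peak:    {peak_gbit:.1f} Gbit/s")
```

On these assumptions the daily average corresponds to roughly 18 Gbit/s and the peak to roughly 56 Gbit/s, summed over all Tier levels.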
Data Flow EU – US ATLAS Tier 2s: Example is from US Tier 2 sites. Exponential rise in April and May, after LHC start. Changed data distribution model at the end of June: caching ESD and DESD. Much slower rise since July, even as luminosity grows rapidly. (Kors Bos)
LHC: Evolving Traffic Patterns: One example of data coming from the US: 4 Gbit/s for ~1.5 days (11 Jan 11). Figure labels: Transatlantic link; GÉANT Backbone; NREN Access Link. Not an isolated case. Often made up of many data flows. Users are getting good at running GridFTP.
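To put the example above in terms of data volume, the sketch below estimates how much data a flow of 4 Gbit/s sustained for 1.5 days would carry. It assumes a constant rate over the whole period and decimal units (1 TB = 10^12 bytes); the function name is illustrative:

```python
SECONDS_PER_DAY = 86_400


def volume_tbyte(rate_gbit_per_s: float, days: float) -> float:
    """Data volume in TB moved by a constant rate (Gbit/s) held for `days` days."""
    bits = rate_gbit_per_s * 1e9 * days * SECONDS_PER_DAY
    return bits / 8 / 1e12  # bits -> bytes -> TB


# Figures from the slide: 4 Gbit/s for ~1.5 days.
print(f"{volume_tbyte(4.0, 1.5):.1f} TB")
```

Under these assumptions a single such episode moves on the order of 65 TB across the transatlantic path.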
Data Transfers over RENATER: Peak rates are a substantial fraction of 10 Gbit/s, often for hours. Several LHC involved. Demand is variable depending on user work. (Francois-Xavier Andreu)
Data Transfers over DFN: Peak rates saturate one of the 10 Gigabit links between DFN and GÉANT. Demand is variable depending on user work. Figure: two different weeks of traffic from GÉANT to Aachen. (Christian Grimm)
Data Transfers from GARR - CNAF: T0-T1 + T1-T1 + T1-T2. Peak rates Gigabit/s. Traffic shows diurnal demand and is variable depending on user work. Sustained growth over the last year. (Marco Marletta)
CMS Data Transfers: Data Placement for Physics Analysis. Once data is on the WLCG, it must be made accessible to analysis applications. The largest fraction of analysis computing at the LHC is at the Tier 2s. New flexibility reduces latency for end users. Figure labels: T1-T2 dominates; T2-T2 emerges. (Daniele Bonacorsi)
Data Transfer Performance: Site or Network? Test from NorthGrid to the GÉANT PoP in London. UDP throughput from the SE: 990 Mbit/s, with 75% packet loss. Data is transmitted by the SE at 3.8 Gbit/s over 4 x 1 Gigabit interfaces. TCP transmits in bursts at 3.8 Gbit/s; packet loss and retries mean low throughput. 1 Gbit bottleneck at the receiver: classic packet loss from a bottleneck. Even more data with end-hosts fixed.
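The UDP figures above are consistent with a simple bottleneck model: when a sender offers more traffic than the narrowest point can carry, the excess is dropped. A minimal sketch of that arithmetic, assuming a steady offered load and no buffering effects (the function name and the model are illustrative, not taken from the slides):

```python
def expected_loss_fraction(offered_gbit_per_s: float, delivered_gbit_per_s: float) -> float:
    """Fraction of traffic lost when only `delivered` of the `offered` rate gets through."""
    if offered_gbit_per_s <= 0:
        raise ValueError("offered rate must be positive")
    if delivered_gbit_per_s >= offered_gbit_per_s:
        return 0.0  # nothing is dropped when the path carries everything offered
    return 1.0 - delivered_gbit_per_s / offered_gbit_per_s


# Figures from the slide: 3.8 Gbit/s sent by the SE, 990 Mbit/s received.
loss = expected_loss_fraction(3.8, 0.99)
print(f"expected loss: {loss:.0%}")
```

With the slide's numbers the model gives a loss of about 74%, close to the 75% reported, which supports the reading that the 1 Gbit receiver, not the wide-area network, was the limiting factor.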
LHCOPN linking Tier 0 to Tier 1s; LHCONE for Tier 1s and Tier 2s: Figure labels: LHCONE; other regions; T2s in a country; LHCONE prototype in Europe. T1s are connected, but not LHCOPN.
Requirements for LHCONE: LHCOPN provides the infrastructure to move data T0-T1 and T1-T1. New infrastructure is required to improve T1-T2 and T2-T2 transfers: analysis is mainly done at Tier 2s, so data is required from any T1 or any T2. T2-T2 is very important. Work done at a Tier 2: simulations and physics analysis (50:50). Network bandwidth needs of a T2 include: re-processing efforts, 400 TByte refresh in a week = 5 Gbit/s; data bursts from user analysis, 25 TByte in a day = 2.5 Gbit/s; feeding a 1000-core farm with LHC events, ~1 Gbit/s. Note this implies timely delivery of data, not just average rates! Access link available bandwidth by Tier 2 size: large 10 Gbit/s; medium 5 Gbit/s; small 1 Gbit/s.
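The bandwidth figures above follow from dividing a data volume by the time allowed to deliver it. A minimal sketch of that calculation, assuming decimal units (1 TByte = 10^12 bytes), a constant transfer rate, and no protocol overhead; the function name is illustrative:

```python
SECONDS_PER_DAY = 86_400


def required_gbit_per_s(volume_tbyte: float, days: float) -> float:
    """Sustained rate in Gbit/s needed to move `volume_tbyte` TByte within `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    bits = volume_tbyte * 1e12 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e9


# Cases from the slide.
reprocessing = required_gbit_per_s(400, 7)  # 400 TByte refresh in a week
user_burst = required_gbit_per_s(25, 1)     # 25 TByte in a day

print(f"re-processing refresh: {reprocessing:.2f} Gbit/s")
print(f"user analysis burst:   {user_burst:.2f} Gbit/s")
```

Under these assumptions the results are about 5.3 Gbit/s and 2.3 Gbit/s, in line with the rounded 5 Gbit/s and 2.5 Gbit/s quoted on the slide. These are sustained rates over the whole delivery window, which is why the slide stresses timely delivery rather than long-term averages.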
Requirements for LHCONE: Sites are free to choose the way they wish to connect. Flexibility and extensibility are required: T2s change; the analysis usage pattern is more chaotic; dynamic networks are of interest. World-wide connectivity is required for LHC sites. There is concern about LHC traffic swamping other disciplines. Monitoring and fault-finding support should be built in. A cost-effective solution is required, which may influence the architecture. No isolation of sites must occur. No interruption of data taking or physics analysis. A prototype is needed.
Requirements: Fitting in with LHC 2011 data taking. Machine development and technical stops provide pauses in the data taking. This does not mean there is plenty of time. The LHCONE prototype might grow in phases.
ANY QUESTIONS?