GRID: Computing Without Borders
Kajari Mazumdar, Department of High Energy Physics, Tata Institute of Fundamental Research, Mumbai.
Soft-computing workshop, University of Mumbai, December 1, 2009.
Disclaimer: I am only a physicist whose research field drives and uses cutting-edge technology; I have mostly borrowed slides from various sources.
Plan of talk:
- Grid concept in simple terms
- Requirements of today's scientific community
- Evolution of the Grid
- LHC Computing Grid
- TIFR grid computing centre
- DAE contributions
- Outlook
Grid computing in simple words:
The Grid is a utility or infrastructure for huge, complex computations, in which remote resources are accessible over the internet from a desktop, laptop, or mobile phone. It is similar to the power grid, where the user does not have to worry about the source of the power. Imagine millions of computers, owned by individuals and institutes in countries across the world, connected to form a single, huge supercomputer! This technology, developed over only the last decade, is already in use by:
- High energy physicists, to analyze the data soon to be produced by the LHC experiment, in which Indian scientists are taking part.
- Earth scientists, to monitor ozone-layer activity (dealing daily with a data volume equivalent to about 150 CDs).
It is the natural evolution of the internet.
From Web to Grid computing:
1. Share more than information: data, computing power, and applications in dynamic, multi-institutional, virtual organizations (Ian Foster: The Anatomy of the Grid).
2. Efficient use of resources at many institutes; people from many institutions working to solve a common problem (a virtual organisation).
3. Join local communities.
4. Interactions with the underlying layers must be transparent and seamless to the user.
Challenges in scientific computations:
- Share data among thousands of scientists with multiple interests.
- Link major and minor computer centres.
- Ensure all data are accessible anywhere, anytime.
- Grow rapidly, yet remain reliable for more than a decade.
- Cope with the different management policies of different centres.
- Ensure data security.
- Be up and running routinely, with the health of the facility checked 24x7 — a huge workforce is invisibly at work.
Ever-increasing demand:
A PC of the early 2000s is as fast as a supercomputer of the 1990s. Still, for many applications it is not adequate, so users continue to buy new machines! The storage available in a PC today could not have been imagined in the 1990s — storage capacity doubles every 12 months or so. Recent years have seen mammoth scientific projects whose data volumes run to several petabytes per year; to work on petabyte-scale data with a colleague, even just across a campus, we need ultrafast networks. Even though CPU power, disk storage, and communication speed continue to increase, computing resources fail to satisfy users' demands, and they remain difficult to use.
Clusters: primary IT infrastructure.
Clusters replace traditional computing platforms and can be configured according to need: network load distribution and load balancing, high availability, high-performance/computation-intensive work, ...
Issues in building clusters:
- Scalability of the interconnection network.
- Scalability of software components (libraries, applications, ...).
- Auto-installation, cluster management, trouble-shooting, ...
- Space management (desktop/rack-mounted): layout of nodes, noise, cable layout, cooling, ...
- Power management.
- Centralized infrastructure-management software.
- Performance / price / power consumption.
The cost of ownership is not very low!
Peer-to-peer (P2P) computing:
Computing based on the idea of sharing distributed resources with one another, with or without the support of a server. There are many under-utilised resources: even with powerful PCs, real utilisation today is < 10%. Large organizations have thousands of PCs, with more arriving every day — utilise them in cycle-stealing mode! The total deliverable power is more than a few Mflops, and the total available free disk space exceeds 100 terabytes. The latency and bandwidth of a LAN environment are mostly quite adequate for P2P computing. Space is not a problem either: the PCs stay wherever they are!
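The cycle-stealing argument above can be made concrete with a back-of-envelope estimate. All figures in this sketch (PC count, per-PC speed) are illustrative assumptions, not numbers from the talk:

```python
# Back-of-envelope estimate of compute left idle on desktop PCs,
# recoverable by "cycle stealing". The inputs are illustrative.

def idle_capacity_gflops(n_pcs, gflops_per_pc, avg_utilisation):
    """Aggregate unused compute across the organisation, in GFLOPS."""
    return n_pcs * gflops_per_pc * (1.0 - avg_utilisation)

# e.g. 2000 PCs of ~10 GFLOPS each, at under 10% average utilisation
spare = idle_capacity_gflops(2000, 10.0, 0.10)
print(spare)  # 18000.0 -> ~18 TFLOPS sitting idle
```

Even with conservative inputs, an organisation-wide desktop pool leaves a supercomputer's worth of cycles unused — which is exactly what P2P and grid schemes set out to harvest.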
Internet computing:
Today you cannot simply run your jobs "on the internet", but internet computing using idle PCs is becoming an important platform (SETI@home, Napster, ...). The WWW is a promising candidate for the core of a wide-area distributed computing environment: efficient client/server models and protocols; transparent networking; navigation and GUIs with multimedia access and dissemination for data visualization; mechanisms for distributed computing (CGI, Java). With improving price/performance and open-source, free software and web services, it is becoming easy to develop loosely coupled distributed applications.
LHC and GRID computing:
A pathologist uses a microscope to examine blood cells of size about one thousandth of a millimetre, i.e., 10^-6 m. High energy probes the structure of fundamental matter at far smaller scales, and the LHC will collide very, very high energy protons for this purpose. Mammoth, very complex detectors (length 30 m, diameter 20 m) are the technical eyes through which several thousand scientists probe the smallest length scales.
Complexity of the LHC experiments:
When two very high energy protons collide at the LHC, the picture in the detector will mostly look like this: very crowded. About 10 million electrical signals have to be recorded in a tiny fraction of a second, repeatedly, for a long time (about 10 years). Using computers, a digital image is created for each such instance; the image size is about 2 MB on average, but varies considerably. Most of these pictures are not interesting — good things are always rare!
In an LHC experiment, the scientist's task is to look for an instance with patterns of this type among 10 thousand billion (10^13) crowded pictures — the picture that contains a clue about our universe. Such a job is like searching for a needle in a million haystacks, or looking for one particular person among a thousand times today's world population (6 billion; India's population is 1.2 billion). A single computing system will never scale up to this challenge. The concept of GRID computing developed from such requirements.
In hard numbers:
The LHC will collide 600-800 million protons on protons per second for several years. Only 1 in 20 thousand collisions has an important tale to tell, but we do not know which one, so we have to search through all of them — a huge task! The experiments will record about 15 PB (10^15 bytes) of data a year, and the analysis requires ~100,000 computers to produce results in a reasonable time. GRID computing is essential.
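A rough consistency check of these numbers, using illustrative values (2 MB per recorded event, ~300 events written per second, and ~10^7 seconds of accelerator running per year — all assumed figures, not quoted from the talk):

```python
# Rough check of the LHC data-volume scale, with assumed inputs.
MB, PB = 10**6, 10**15

event_size = 2 * MB        # one digital "picture" of a collision
rate_hz = 300              # events recorded to storage per second
seconds_per_year = 10**7   # typical accelerator running time per year

raw_bytes = event_size * rate_hz * seconds_per_year
print(raw_bytes / PB)      # 6.0 -> ~6 PB of raw data alone
```

Raw data alone already comes to several petabytes; with reconstructed and simulated copies added, the total reaches the ~15 PB/year scale quoted above.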
The way CMS uses the GRID (WLCG). In total: 1 Tier-0 at CERN (Geneva), 7 Tier-1s on 3 continents, ~50 Tier-2s on 4 continents.
- Tier-0: prompt reconstruction; archival of a copy of RAW and first RECO data; calibration streams (to the CAF); data distribution. Rate from the CMS detector: 450 MB/s (300 Hz).
- Tier-1s (7): re-reconstruction; skimming; second archival of RAW; served copy of RECO; archival of simulation; data distribution. Rate from Tier-0: 30-300 MB/s (aggregate 800 MB/s); ~50k jobs/day.
- Tier-2s (~50): primary resources for physics analysis and detector studies by users; MC simulation. Rates to/from Tier-1s: 50-500 MB/s down, 10-20 MB/s up; ~150k jobs/day.
- Tier-3s: ~100 MB/s.
P. Kreuzer - GRID Computing - Mumbai
Tiered/layered structure connecting computers across the globe:
- Tier 0: the experimental site and the CERN computer centre, Geneva.
- Tier 1: national centres (France, Italy, Germany, USA, Asia (Taiwan), ...).
- Tier 2: regional groups (in India: TIFR, i.e. T2_IN_TIFR; also China, Korea, Pakistan, ...).
- Tier 3: different universities and institutes (e.g. Delhi U., Panjab U.).
- Individual scientists' PCs.
A useful model for particle-physics experiments, though not necessarily for others.
Hardware at the TIFR site, T2_IN_TIFR:
- Storage: 350 TB.
- 300 worker nodes.
- Internet bandwidth: 1 Gbps, to be upgraded in the near future.
The grid facility has been functional at TIFR for almost a year, and the CMS collaboration at the LHC, CERN has been using its computing resources. About 50 users/scientists at present, still growing. Another, similar Tier-2 centre in Kolkata serves a different LHC experiment. Note that continuous monitoring is essential; we are managing with 5 engineers, not all of them full time.
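The bandwidth figure matters because a Tier-2 lives off bulk transfers. A sketch of how long moving the ~37 TB received from the Tier-1s would take over such a link; the 50% usable-throughput factor is an assumption for illustration:

```python
# How long to pull ~37 TB over a 1 Gbps link? Illustrative sketch.
TB = 10**12

data_bytes = 37 * TB       # roughly the transfers from the 7 Tier-1s
link_bps = 10**9           # 1 gigabit per second
efficiency = 0.5           # assumed usable fraction of the link

days = data_bytes * 8 / (link_bps * efficiency) / 86400
print(round(days, 1))      # 6.9 -> about a week of continuous transfer
```

About a week of saturated transfer for one year's incoming data is workable, but it shows why the planned bandwidth upgrade is needed as data rates grow.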
Networking, GRID middleware, sites.
GRID middleware services: storage elements; computing element; workload management system; local file catalog; information system; virtual organisation management service; inter-operability between GRIDs (EGEE, OSG, NorduGrid, ...).
Site specificities, e.g. the storage and batch systems at the CMS Tier-1s:
- FNAL: dCache/Enstore; Condor
- RAL: Castor; Torque/Maui
- CCIN2P3: dCache/HPSS; BQS
- PIC: dCache/Enstore; Torque/Maui
- ASGC: Castor; Torque/Maui
- INFN: Castor + StoRM; LSF
- FZK: dCache/Chimera + TSM; PBSPro
Data transfers from/to TIFR:
- TIFR ↔ T1 transfer quality (last 3 months): improving; aim for stability.
- TIFR → T1 production transfers (last year): modest but ready to grow! TIFR → ASGC: 8 TB of MC data (custodial storage).
- T1 → TIFR: 37 TB in total from the 7 T1s.
Statistics and plots: site summary table, site history, site ranking.
CMS software deployment:
Deployment of CMS SW to 90% of sites within a few hours. Basic strategy: use RPM (with apt-get) in the CMS SW area, on EGEE sites.
CMS centres and computing shifts:
CMS Centre at CERN: monitoring, computing operations, analysis. CMS Experiment Control Room. CMS Remote Operations Centre at Fermilab. CMS runs computing shifts 24/7 and encourages remote shifts; the main task is to monitor, and to alarm CMS sites and computing experts.
Going back: the World Wide Web — information sharing.
Invented at CERN by Tim Berners-Lee (around 1990). Agreed protocols: HTTP, HTML, URLs. Anyone can access information and post their own. It quickly crossed over into public use. [Plot: number of internet hosts (millions) vs. year.]