The Data Deluge and the Grid. Steve Lloyd, Professor of Experimental Particle Physics. Inaugural Lecture, 24 November 2004.


Steve Lloyd, Inaugural Lecture - 24 November 2004. Slide 1: The Data Deluge and the Grid. Steve Lloyd, Professor of Experimental Particle Physics. Inaugural Lecture.

Slide 2: Outline. What is Data? Where it comes from: e-Science. The CERN LHC and Experiments. What is the Grid? GridPP. Challenges ahead.

Slide 3: What is Data? Anything that can be expressed as numbers: Raw Information → Numbers → Binary Digits. Pictures: store the amount of Red, Green and Blue. Sound: store the loudness at each time. Lots of Pictures + Sound = DVD Video. Electrical Signals: store the voltage or current. Text: every character has a numerical code.
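The chain Raw Information → Numbers → Binary Digits can be sketched in a few lines of Python. This is an illustration only: Unicode code points stand in for the slide's "numerical code", and the pixel value is made up.

```python
# Text: every character has a numerical code (Unicode code points used here).
text = "Grid"
codes = [ord(ch) for ch in text]            # one number per character
bits = [format(c, "08b") for c in codes]    # each number as 8 binary digits

# Picture: one pixel as amounts of Red, Green and Blue (0-255 each).
# The colour below is an arbitrary example, not taken from the lecture.
pixel = (255, 128, 0)
pixel_bits = "".join(format(v, "08b") for v in pixel)

print(codes)
print(bits)
print(pixel_bits)
```

Sound and electrical signals follow the same pattern: a list of sampled numbers, each written out as binary digits.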

Slide 4: Digital Data. Numbers are stored as binary digits.
1 Bit = 0 or 1: can store yes/no or on/off.
1 Byte = 8 bits: can store numbers from 0 to 255 (enough for a character a-z, A-Z, *£$@<...).
25 = 0×128 + 0×64 + 0×32 + 1×16 + 1×8 + 0×4 + 0×2 + 1×1 = 00011001
1 kiloByte = ~1,000 Bytes (typical Word document ~30 kB)
1 MegaByte = ~1,000,000 Bytes (a floppy disk ~1.4 MB, a CD ~700 MB)
1 GigaByte = ~1,000,000,000 Bytes (typical PC hard drive 20-120 GB)
1 TeraByte = ~1,000,000,000,000 Bytes
1 PetaByte = ~1,000,000,000,000,000 Bytes (~1.4 million CDs)
1 ExaByte = ~1,000,000,000,000,000,000 Bytes
Scale examples: World Annual Book Production, World Annual Information Production.
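The binary expansion of 25 and the "~1.4 million CDs per PetaByte" figure can be checked with a short sketch. The function name `to_bits` is ours, and the decimal unit sizes follow the slide's approximate values.

```python
def to_bits(n: int) -> str:
    """Write a number from 0 to 255 as the 8 binary digits of one byte."""
    if not 0 <= n <= 255:
        raise ValueError("one byte holds only 0..255")
    # Test each power of two from 128 down to 1, as on the slide.
    return "".join("1" if n & (1 << p) else "0" for p in range(7, -1, -1))

PETABYTE = 10**15          # ~1,000,000,000,000,000 Bytes
CD_BYTES = 700 * 10**6     # a CD holds ~700 MB

cds_per_petabyte = PETABYTE / CD_BYTES

print(to_bits(25))
print(round(cds_per_petabyte / 1e6, 1), "million CDs per PetaByte")
```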

Slide 5: Data Analysis. What is done with data? Nothing. Read it. Listen to it. Watch it. Analyse it. Computer Program ("Job"): Read A; Read B; C = A + B; Print C. Inputs 2 and 3 give output 5. Calculate how proteins fold. Calculate what the weather is going to do.
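The slide's four-step "Job" maps directly onto a small program. The sketch below is one way to write it in Python; the function name `run_job` and the use of an in-memory text stream for the input data are assumptions made for illustration.

```python
import io

def run_job(stream) -> int:
    """Run the slide's job on a stream holding one number per line."""
    a = int(stream.readline())   # Read A
    b = int(stream.readline())   # Read B
    c = a + b                    # C = A + B
    return c

# Input data 2 and 3, as on the slide.
result = run_job(io.StringIO("2\n3\n"))
print(result)                    # Print C
```

Real analysis jobs have the same shape: read data, compute, write a result. Only the size of the input and the cost of the computation change.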

Slide 6: e-Science. In the UK this sort of activity has become known as "e-Science". "e-Science will change the dynamic of the way Science is undertaken." "Science increasingly done through distributed global collaborations enabled by the internet using very large data collections, terascale computing resources and high performance visualisation." Dr John Taylor, Director General of Research Councils: "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it."

Slide 7: Astronomy. Crab Nebula: Optical, Radio, Infra-red, X-ray. Jet in M87: HST optical, Gemini mid-IR, VLA radio, Chandra X-ray. Virtual Observatories.

Slide 8: Earth Observation. 1 TB/day. Images: Ozone map, Ottawa, Trafalgar Square.

Slide 9: Species 2000. To enumerate all ~1.7 million known species of plants, animals, fungi and microbes on Earth for studies of biodiversity. A federation of initially 18 taxonomic databases, eventually ~200 databases. From protozoa to platypus to primates.

Slide 10: Bioinformatics.

Slide 11: Healthcare. Dynamic Brain Atlas. Breast Screening. Scanning. Remote Consultancy.

Slide 12: Collaborative Engineering. Unitary Plan Wind Tunnel. Real-time collection. Multi-source Data Analysis. Archival storage.

Slide 13: Digital Curation. Digitization of almost anything, to create Digital Libraries and Museums.

Slide 14: The CERN LHC. The world’s most powerful particle accelerator - 2007. 4 Large Experiments.

Slide 15: ATLAS Detector. 7,000 tonnes. 42 m long, 22 m wide, 22 m high (about the height of a 5 storey building). 2,000 Physicists, 150 Institutes, 34 Countries.

Slide 16: ATLAS Pit.

Slide 17: The Higgs. Primary objective of the LHC: What is the origin of Mass? Is it the Higgs Particle? Massless Particle: travels at the speed of light. Low Mass Particle: travels slower. High Mass Particle: travels slower still.

Slide 18: LHC Data Challenge. Starting from this event… we are looking for this “signature”. Selectivity: 1 in 10^13. Like looking for 1 person in a thousand world populations, or for a needle in 20 million haystacks! ~100,000,000 electronic channels. 800,000,000 proton-proton interactions per second. 0.0002 Higgs per second. 10 PBytes of data a year (10 Million GBytes = 14 Million CDs).
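The rates on this slide can be cross-checked with simple arithmetic. The sketch below uses only the slide's numbers plus the ~700 MB CD capacity quoted earlier in the lecture; the variable names are ours. The raw ratio of interactions to Higgs comes out at about 4 × 10^12, the same order of magnitude as the quoted selectivity of 1 in 10^13.

```python
interactions_per_second = 800_000_000   # proton-proton interactions per second
higgs_per_second = 0.0002               # Higgs per second

# How many interactions occur, on average, for each Higgs produced.
interactions_per_higgs = interactions_per_second / higgs_per_second

# Average waiting time between Higgs particles, in seconds.
seconds_per_higgs = 1 / higgs_per_second

# 10 PBytes of data a year, expressed in GBytes and in 700 MB CDs.
bytes_per_year = 10 * 10**15
gbytes_per_year = bytes_per_year / 10**9
cds_per_year = bytes_per_year / (700 * 10**6)

print(f"{interactions_per_higgs:.1e} interactions per Higgs")
print(f"{seconds_per_higgs:.0f} s between Higgs on average")
print(f"{gbytes_per_year / 1e6:.0f} million GBytes, {cds_per_year / 1e6:.1f} million CDs")
```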

Slide 19: LHC Computing Requirements. CPU Power (Reconstruction, Simulation, User Analysis etc): 50,000 of today's PCs. 'Tape' Storage: 20 PetaBytes (= 20 M GBytes). Disk Storage: 2.5 PetaBytes (= 2.5 M GBytes). Distributed Computing Solution: "The Grid".

Slide 20: The Grid. Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities." 'Grid' means different things to different people. All agree it's a funding opportunity!

Slide 21: Electricity Grid. Analogy with the Electricity Power Grid: Power Stations, Distribution Infrastructure, 'Standard Interface'.

Slide 22: Computing Grid. Computing and Data Centres. Fibre Optics of the Internet.

Slide 23: What is the Grid? Single PC: PROGRAMS (Word/Excel, Email/Web, Your Program, Games); OPERATING SYSTEM; Disks, CPU etc. Grid: Your Program; MIDDLEWARE; User Interface Machine, Resource Broker, Information Service, CPU Clusters, Disk Cluster. Middleware is the Operating System of a distributed computing system.
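To make the roles on this slide concrete, here is a toy sketch of a Resource Broker that consults an Information Service to place a job on a CPU cluster. It is not GridPP or LCG middleware: every class, field and cluster name below is hypothetical, and the "most free CPUs" matching rule is an assumption chosen for brevity.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    free_cpus: int

@dataclass
class Job:
    name: str
    cpus_needed: int

class InformationService:
    """Toy stand-in: knows the current state of every cluster."""
    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}

    def query(self):
        return list(self.clusters.values())

class ResourceBroker:
    """Toy stand-in: picks a cluster for each submitted job."""
    def __init__(self, info: InformationService):
        self.info = info

    def submit(self, job: Job):
        # Keep only the clusters that can run the job right now.
        candidates = [c for c in self.info.query()
                      if c.free_cpus >= job.cpus_needed]
        if not candidates:
            return None                      # nothing suitable is free
        best = max(candidates, key=lambda c: c.free_cpus)
        best.free_cpus -= job.cpus_needed    # reserve the CPUs
        return best.name

# The user interface machine would hand the job to the broker.
info = InformationService([Cluster("cluster-a", 4), Cluster("cluster-b", 16)])
broker = ResourceBroker(info)
placement = broker.submit(Job("your-program", 8))
print(placement)
```

The user never names a cluster, just as a program on a single PC never names a CPU. That is the sense in which middleware plays the part of an operating system.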

Slide 24: What is the Grid? From this: To this:

Slide 25: SETI@home. A distributed computing project, not really a Grid project: you pull the data from them rather than they submit the job to you. Arecibo telescope in Puerto Rico. Users: 5,240,038. Results received: 1,632,106,991. Years of CPU Time: 2,121,057. Extraterrestrials found: 0.

Slide 26: Entropia. Uses idle cycles on Home PCs for profit and non-profit projects. FightAIDS@Home: 60,000 Machines, 1,400 years of CPU time. Rebranding!

Slide 27: GridPP. 19 UK Universities, CCLRC (RAL & Daresbury) and CERN. Funded by the Particle Physics and Astronomy Research Council (PPARC). GridPP1: 2001-2004, £17m, "From Web to Grid". GridPP2: 2004-2007, £15m, "From Prototype to Production".

Slide 28: International Collaboration. EU DataGrid (EDG) 2001-2004: Middleware Development Project. US and other Grid projects: Interoperability. LHC Computing Grid (LCG): Grid Deployment Project for LHC. EU Enabling Grids for e-Science in Europe (EGEE) 2004-2006: Grid Deployment Project for all disciplines.

Slide 29: Application Development. ATLAS, LHCb, CMS, BaBar (SLAC), SAMGrid (FermiLab), QCDGrid.

Slide 30: Middleware Development. Configuration Management. Storage Interfaces. Network Monitoring. Security. Information Services. Grid Data Management.

Slide 31: Tier Structure. 'Tier-0': where the data comes from. 'Tier-1': major centres in large countries. 'Tier-2': smaller centres in large countries or smaller countries. Tier structure not necessarily appropriate for all disciplines. Diagram labels: UK Tier-1 US Tier-1 Italy Tier-1 Germany Tier-1 France Tier-1 Spain Tier-2 Poland Tier-2... Tier-2... Tier-1 UK Tier-2 CERN Tier-0.

Slide 32: UK Tier-1/A Centre. High quality data services. National and International Role. UK focus for International Grid development. 700 Dual CPU. 80 TB Disk. 60 TB Tape (Capacity 1 PB). Grid Operations Centre.

Slide 33: UK Tier-2 Centres. ScotGrid: Durham, Edinburgh, Glasgow. NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield. SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD, Warwick. London: Brunel, Imperial, QMUL, RHUL, UCL. Mostly funded by HEFCE.

Slide 34: The Grid at QM. The Queen Mary e-Science High Throughput Cluster. 174 PCs (348 CPUs). 40 TByte Disk Storage. Part of the London Tier-2 Centre.

Slide 35: The LCG Grid. 89 Sites. 9,056 CPUs. 3 PBytes Disk.

Slide 36: Grid Snapshot.

Slide 37: Challenges. Scaling to full size: ~10,000 → 100,000 CPUs. Stability, Robustness etc. Security (Hackers Paradise!). Sharing resources (in RAE environment!). International Collaboration. Continued funding beyond start of LHC! Height comparison: (Ex-)Concorde (15 km); CD stack with 1 year LHC data (~20 km); We are here (1 km).
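The height of the CD stack can be estimated from the lecture's own figures: 10 PBytes of LHC data a year and ~700 MB per CD. The sketch below additionally assumes a bare disc thickness of 1.2 mm, which is not stated in the lecture. It gives roughly 17 km, in line with the slide's rounded ~20 km; any packaging would make the stack taller.

```python
CD_CAPACITY_BYTES = 700 * 10**6    # a CD holds ~700 MB
CD_THICKNESS_M = 1.2e-3            # assumed: bare disc, no case
LHC_BYTES_PER_YEAR = 10 * 10**15   # 10 PBytes of data a year

cds_needed = LHC_BYTES_PER_YEAR / CD_CAPACITY_BYTES
stack_height_km = cds_needed * CD_THICKNESS_M / 1000

# The scaling challenge: from ~10,000 CPUs to 100,000 CPUs.
cpu_scale_factor = 100_000 / 10_000

print(f"{cds_needed / 1e6:.1f} million CDs, stack about {stack_height_km:.0f} km high")
print(f"CPU count must grow by a factor of {cpu_scale_factor:.0f}")
```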

Slide 38: Further Info. http://www.gridpp.ac.uk

