1 Applications and the Grid EDG Tutorial @ CERN 19.03.2003 Ingo Augustin CERN DataGrid HEP Applications

2 Introduction
You have heard much about WHAT the Grid is, but not much about WHY the Grid is, will be, or should be. The rationale behind the Grid*:
- Size: the Large Hadron Collider experiments
- Geographical distribution: the MONARC computing model
- Complexity: Earth Observation applications
- User community: biomedical applications
*) I am a physicist! All mistakes in the EO and Bio applications are due to my ignorance.

3 Electrical Power Grid Metaphor
- Power on demand: the user is unaware of the actual provider
- Resilience: re-routing, redundancy
- Simple interface: the wall socket
- Standardised protocols: 230 V, 50 Hz

4 LHC Experiments

5 More Complex Events: < 2000 vs. > 2007

6 Typical HEP Software Scheme (ATLAS detector as the example)
Generate Events -> Simulate Events -> Reconstruct Events -> Analyze Events -> Physics
- Build Simulation Geometry (from the detector description) provides the simulation geometry
- Build Reconstruction Geometry (from the detector description, detector alignment, detector calibration and reconstruction parameters) provides the reconstruction geometry
- Data products along the chain: raw data -> ESD -> AOD
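
A minimal sketch of this chain as staged functions, assuming hypothetical function names and toy data structures (the real experiment frameworks are, of course, vastly larger):

```python
# Minimal sketch of the HEP processing chain; all names and data structures
# are hypothetical placeholders, not the real experiment software.

def generate(n_events):
    """Event generation: produce simulated collisions (here just dicts)."""
    return [{"id": i} for i in range(n_events)]

def simulate(event, sim_geometry):
    """Detector simulation: propagate the event through the geometry."""
    event["hits"] = []            # raw-data-like detector response
    return event

def reconstruct(event, reco_geometry, alignment, calibration, params):
    """Reconstruction: turn hits into physics objects (ESD/AOD-like summaries)."""
    event["tracks"] = []
    return event

def analyze(events):
    """Physics analysis over the reconstructed events."""
    return {"events_analysed": len(events)}

sim_geo, reco_geo = "simulation-geometry", "reconstruction-geometry"
events = [reconstruct(simulate(e, sim_geo), reco_geo, "align", "calib", "params")
          for e in generate(100)]
print(analyze(events))
```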

7 Characteristics of HEP Computing
- Event independence: data from each collision is processed independently; a mass of independent problems with no information exchange
- Massive data storage: modest event size (1-25 MB), but the total is very large - petabytes for each experiment
- Mostly read-only: data is never changed after recording to tertiary storage, but it is read often (cf. magnetic tape as an archive medium)
- Modest floating-point needs: HEP computations involve decision making rather than calculation; computational requirements are quoted in SPECint95 seconds
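
Because each event is independent, the workload is trivially parallel. A minimal sketch of farming out such an event loop, using only the standard library and a hypothetical per-event function:

```python
# Sketch of event-level parallelism: each collision event is processed
# independently, so a simple worker pool scales with no inter-event
# communication. 'reconstruct_event' is a hypothetical stand-in.
from multiprocessing import Pool

def reconstruct_event(event_id):
    # Placeholder for the real per-event reconstruction (CPU-heavy,
    # decision-making code rather than floating-point arithmetic).
    return {"event": event_id, "status": "reconstructed"}

if __name__ == "__main__":
    event_ids = range(10_000)
    with Pool(processes=8) as pool:
        results = pool.map(reconstruct_event, event_ids)
    print(len(results), "events processed independently")
```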

8 Typical Layout of a Computing Farm (up to several hundred nodes)
Application servers, disk servers, tape servers and local network servers, with a connection to the external network.

9 The Constraints (taken from the LHC Computing Review, CERN/LHCC/2001-004)
Needed during a year of LHC operations:
- Tape: 29,400 TB (in today's units: 60 STK silos)
- Disk: 9,600 TB (in today's units: 160,000 60 GB disks)
- CPU: 6.2 * 10^6 SI95 (in today's units: 150,000 800 MHz CPUs)
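
A quick arithmetic check of the "today's units" conversion implied by these numbers; the per-unit capacities below are derived from the slide's totals, not stated independently (except for the 60 GB disks):

```python
# Sanity check of the slide's unit conversions; per-unit figures are derived
# from the quoted totals, they are not independent data.
tape_tb, disk_tb, cpu_si95 = 29_400, 9_600, 6.2e6

print(tape_tb / 60)              # ~490 TB per STK silo
print(disk_tb * 1000 / 160_000)  # ~60 GB per disk, matching the slide
print(cpu_si95 / 150_000)        # ~41 SI95 per 800 MHz CPU
```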

10 World-Wide Collaboration -> distributed computing & storage capacity
LHC: > 5,000 physicists, > 270 institutes, > 60 countries

11 World-Wide Computing
Two problems:
- Funding: will funding bodies place all their investment at CERN? No.
- Geography: does a geographically distributed model better serve the needs of the world-wide distributed community? Maybe - if it is reliable and easy to use.
We need to provide physicists with the best possible access to LHC data irrespective of location.

12 Present LHC Computing Model (les.robertson@cern.ch)
A tiered hierarchy: CERN at the centre; Tier 1 regional centres (e.g. Karlsruhe, FermiLab and Brookhaven in the USA, UK, France, Italy, ...); Tier 2 centres at labs and universities (Lab a, Uni a, ...); physics department and desktop resources at the edge.

13 Regional Center

14 The Dungeon
- Pain (administration): money and manpower (reduction by ~30% before the start of LHC); commodity hardware (PC & network &...)
- Torture (users & history): anarchic user community; legacy software & structures; evolution instead of projects
- Execution (deadline): 2006/7 start-up of LHC

15 Earth Observation (WP9)
- Global Ozone (GOME) satellite data processing and validation by KNMI, IPSL and ESA
- The DataGrid testbed provides a collaborative processing environment for 3 geographically distributed EO sites (Holland, France, Italy)

16 ENVISAT
- 3,500 MEuro programme cost
- Launched on February 28, 2002
- 10 instruments on board
- 200 Mbps data rate to ground
- 400 TB of data archived per year
- ~100 "standard" products
- 10+ dedicated facilities in Europe
- ~700 approved science user projects

17 Earth Observation
Two different GOME processing techniques will be investigated:
- OPERA (Holland): tightly coupled, using MPI
- NOPREGO (Italy): loosely coupled, using neural networks
The results are checked by VALIDATION (France): satellite observations are compared against ground-based LIDAR measurements coincident in area and time.

18 GOME Ozone Data Processing Model
- Level-1 data (raw satellite measurements) are analysed to retrieve actual physical quantities: Level-2 data
- Level-2 data provide measurements of ozone within a vertical column of atmosphere at a given lat/lon location above the Earth's surface
- Coincident data consist of Level-2 data co-registered with LIDAR data (ground-based observations) and compared using statistical methods
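
A minimal sketch of the coincidence step described above, assuming hypothetical record formats (lat, lon, time, ozone value) and illustrative distance/time thresholds; the real validation applies proper statistical methods to the matched pairs:

```python
# Sketch of Level-2 / LIDAR coincidence matching. Record format
# (lat_deg, lon_deg, time_hours, ozone_value) and thresholds are illustrative.
from math import radians, cos, sin, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def coincidences(level2, lidar, max_km=300.0, max_hours=6.0):
    """Pair each satellite profile with ground observations close in area and time."""
    pairs = []
    for s in level2:
        for g in lidar:
            if (haversine_km(s[0], s[1], g[0], g[1]) <= max_km
                    and abs(s[2] - g[2]) <= max_hours):
                pairs.append((s, g))
    return pairs

# Example: one satellite ozone profile and one nearby LIDAR measurement.
print(coincidences([(48.8, 2.3, 12.0, 310.0)], [(48.7, 2.2, 14.5, 305.0)]))
```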

19 The EO Data Challenge: processing and validation of 1 year of GOME data
- ESA / KNMI: processing of raw satellite data from the GOME instrument (Level 1) to ozone profiles (Level 2) with OPERA and NNO, on DataGrid
- IPSL: validation of the GOME ozone profiles against ground-based (LIDAR) measurements
- Visualization of the results

20 EO Use-Case File Numbers (1 year of GOME data, part of a 5-year global dataset)
Data                       Number of files to be processed and replicated    File size
Level 1 (satellite data)   4,724                                             15 MB
Level 2 (NNO)              9,448,000                                         10 kB
Level 2 (OPERA)            9,448,000                                         12 kB
Coincident (validation)    12                                                2.5 MB
Total: 18,900,736 files, 267 GB
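
A quick check that the per-level figures reproduce the quoted totals; binary prefixes (1 MB = 1024 kB) appear to be how the 267 GB figure was computed:

```python
# Reproduce the slide's totals from the per-level figures.
levels = [                    # (number of files, size per file in kB)
    (4_724, 15 * 1024),       # Level 1, 15 MB each
    (9_448_000, 10),          # Level 2 (NNO), 10 kB each
    (9_448_000, 12),          # Level 2 (OPERA), 12 kB each
    (12, 2.5 * 1024),         # Coincident, 2.5 MB each
]
n_files = sum(n for n, _ in levels)
total_gb = sum(n * size for n, size in levels) / 1024 / 1024
print(n_files, round(total_gb))   # 18900736 files, ~267 GB
```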

21 GOME Processing Steps (1-2)
- Step 1: transfer the Level-1 data to a Grid Storage Element
- Step 2: register the Level-1 data (data and metadata) with the Replica Manager, and replicate to other Storage Elements if necessary
(Diagram: the user at the User Interface, the Replica Manager with its Replica Catalog, and Computing/Storage Elements (CE/SE) at sites B-H holding copies of the input data.)
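
A conceptual sketch of the bookkeeping behind Step 2, with hypothetical class, method and site names; the real EDG Replica Manager and Replica Catalog are distributed services, here reduced to an in-memory mapping from logical file names to physical replicas:

```python
# Conceptual sketch only: an in-memory stand-in for the Replica Catalog,
# mapping a logical file name (LFN) to its physical file names (PFNs).
# The real EDG services do this across distributed Storage Elements.
class ReplicaCatalog:
    def __init__(self):
        self.entries = {}                       # LFN -> list of PFNs

    def register(self, lfn, pfn):
        """Step 2: record a file that was copied to a Storage Element."""
        self.entries.setdefault(lfn, []).append(pfn)

    def replicate(self, lfn, target_se):
        """Record an additional replica on another Storage Element."""
        pfn = f"gsiftp://{target_se}/gome/{lfn}"   # hypothetical PFN layout
        self.entries[lfn].append(pfn)
        return pfn

    def lookup(self, lfn):
        return self.entries.get(lfn, [])

catalog = ReplicaCatalog()
catalog.register("gome-level1-orbit-0001",
                 "gsiftp://se.site-b.example/gome/gome-level1-orbit-0001")
catalog.replicate("gome-level1-orbit-0001", "se.site-c.example")
print(catalog.lookup("gome-level1-orbit-0001"))
```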

22 GOME Processing Steps (3-4)
- Step 3: submit jobs (a JDL script naming the executable) to process the Level-1 data and produce Level-2 data
- Step 4: transfer the Level-2 data products to the Storage Element
(Diagram: the user submits the job from the User Interface; the Resource Broker checks the certificate against the Certificate Authorities, searches the Information Index and the Replica Catalog to resolve logical file names (LFN) to physical file names (PFN), dispatches the job to a CE/SE site, and the user requests the status and retrieves the result.)
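
A minimal sketch of what the JDL script mentioned above could contain, held in a Python string purely for illustration; the executable, file names and logical file name are hypothetical, and only the general EDG JDL attribute style is assumed:

```python
# Hypothetical JDL for one GOME Level-1 -> Level-2 job, kept in a Python
# string for illustration; it would be written to a .jdl file and submitted
# from the EDG User Interface (e.g. with the edg-job-submit command).
gome_job_jdl = """
Executable         = "process_gome.sh";
Arguments          = "gome-level1-orbit-0001";
StdOutput          = "std.out";
StdError           = "std.err";
InputSandbox       = {"process_gome.sh"};
OutputSandbox      = {"std.out", "std.err"};
InputData          = {"lfn:gome-level1-orbit-0001"};
DataAccessProtocol = {"file", "gridftp"};
"""

with open("gome_job.jdl", "w") as f:
    f.write(gome_job_jdl)
```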

23 GOME Processing Steps (5-6)
- Step 5: produce the Level-2 / LIDAR coincident data and perform the validation
- Step 6: visualize the results

24 Genomics and Bioinformatics (WP10)

25 Challenges for a Biomedical Grid
- The biomedical community has NO strong centre of gravity in Europe: no equivalent of CERN (high-energy physics) or ESA (Earth observation)
- Many high-level laboratories of comparable size and influence, without a practical activity backbone (EMB-net, national centres, ...), leading to: little awareness of common needs, few common standards, small common long-term investment
- The biomedical community is very large (tens of thousands of potential users)
- The biomedical community is often distant from computer-science issues

26 Biomedical Requirements
- Large user community (thousands of users) -> anonymous/group login
- Data management -> data updates and data versioning; large volume management (a hospital can accumulate TBs of images in a year)
- Security -> disk / network encryption
- Limited response time -> fast queues
- High-priority jobs -> privileged users
- Interactivity -> communication between user interface and computation
- Parallelization -> MPI site-wide / grid-wide
- Thousands of images operated on by tens of algorithms
- Pipeline processing -> pipeline description language / scheduling (see the sketch after this list)
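
A minimal sketch of the kind of pipeline processing meant here, with hypothetical stage names; a real system would read the pipeline from a description language and schedule it across the grid rather than hard-code the stages:

```python
# Minimal pipeline sketch: a fixed list of stages applied to each image in
# turn. Stage names are hypothetical placeholders for real image-processing
# algorithms; the image strings stand in for medical image files.
def denoise(image):
    return image + ":denoised"

def segment(image):
    return image + ":segmented"

def register_to_atlas(image):
    return image + ":registered"

PIPELINE = [denoise, segment, register_to_atlas]

def run_pipeline(image):
    for stage in PIPELINE:
        image = stage(image)
    return image

images = [f"scan_{i:05d}" for i in range(3)]
print([run_pipeline(img) for img in images])
```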

27 The Grid Impact on Data Handling
- DataGrid will allow mirroring of databases: an alternative to the current costly replication mechanism
- This allows web portals on the grid to access updated databases
(Diagram: a biomedical Replica Catalog mirroring TrEMBL (EBI) and SwissProt (Geneva).)

28 Web Portals for Biologists
- The biologist enters sequences through a web interface
- Pipelined execution of bio-informatics algorithms: genomics comparative analysis (thousands of files of ~GB; genome comparison takes days of CPU, scaling roughly as n^2), phylogenetics, 2D/3D molecular structure of proteins, ...
- The algorithms are currently executed on a local cluster: big labs have big clusters, but there is growing pressure on resources - the Grid will help
- More and more biologists compare larger and larger sequences (whole genomes) to more and more genomes, with fancier and fancier algorithms!

29 The Visual DataGrid BLAST
- A graphical interface to enter query sequences and select the reference database
- A script to execute the BLAST algorithm on the grid (a sketch of the splitting idea follows below)
- A graphical interface to analyze the results
- Accessible from the web portal genius.ct.infn.it
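
A minimal sketch of the splitting step such a script needs: the query FASTA file is cut into chunks so that each chunk can be shipped to the grid as an independent BLAST job. File names and chunk size are hypothetical, and the BLAST invocation and job submission are left as a comment:

```python
# Sketch: split a multi-sequence FASTA query file into chunks, one grid job
# per chunk. File names and chunk size are hypothetical; a tiny stand-in
# query file is created inline so the sketch runs end to end.
def read_fasta(path):
    """Yield (header, sequence) records from a FASTA file."""
    header, seq = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def split_queries(path, chunk_size=2):
    """Group query sequences into chunks of at most chunk_size."""
    records = list(read_fasta(path))
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

with open("queries.fasta", "w") as f:
    f.write(">seq1\nATGCGT\n>seq2\nGGCATT\n>seq3\nTTAGCC\n")

for i, chunk in enumerate(split_queries("queries.fasta")):
    with open(f"chunk_{i:03d}.fasta", "w") as out:
        for header, seq in chunk:
            out.write(f"{header}\n{seq}\n")
    # Each chunk_XXX.fasta would then be submitted as its own grid job that
    # runs BLAST against the selected reference database.
```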

30 Summary of the Added Value Provided by the Grid for BioMed Applications
- Data mining on genomics databases (exponential growth)
- Indexing of medical databases (TB per hospital per year)
- Collaborative framework for large-scale experiments (e.g. epidemiological studies)
- Parallel processing for database analysis and complex 3D modelling

31 Conclusions
- Grid or Grid-like systems are clearly needed
- EDG is a start that has to be followed up
- EDG is nowhere near being the "real thing"
- The key focus at present is resilience and scalability

32 References
Some interesting web sites and documents:
- LHC Computing Review: http://lhc-computing-review-public.web.cern.ch/lhc-computing-review-public/Public/Report_final.PDF
- LCG: http://lcg.web.cern.ch/LCG
  - Model for regional centres: http://lcg.web.cern.ch/LCG/SC2/RTAG6
  - HEPCAL Grid use cases: http://lcg.web.cern.ch/LCG/SC2/RTAG4
- GEANT (European research networks): http://www.dante.net/geant/
- POOL: http://lcgapp.cern.ch/project/persist/
- WP8: http://datagrid-wp8.web.cern.ch/DataGrid-WP8/
  - Requirements: http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332409
- WP9: http://styx.srin.esa.it/grid
  - Requirements: http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332411
- WP10: http://marianne.in2p3.fr/datagrid/wp10/
  - http://www.healthgrid.org
  - http://www.creatis.insa-lyon.fr/MEDIGRID/
  - Requirements: http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=332412

33 The End (finally)

