Ischia, Italy, July 2006
Session 9, Wednesday 12th July
Malcolm Atkinson
Plan for talk
  Reminder of distributed system realities
  Motivation for Distributed Data Management
    – Example applications
  Data lifecycle
    – Generation, Storage, Use & Update, Archiving & Deletion/De-allocation
  Data movement isn’t free
    – But it is needed in many forms
  OGSA Data Architecture
  Examples of Data Services in Grid systems
  Summary & Conclusions
Principles of Distributed Computing
  Issues you can’t avoid
    – Lack of Complete Knowledge (LOCK)
    – Latency
    – Heterogeneity
    – Autonomy
    – Unreliability
    – Change
  A challenging goal
    – balance technical feasibility
    – against virtual homogeneity, stability and reliability
  Appropriate balance between usability and productivity
    – while remaining affordable, manageable and maintainable
  This is still NOT easy
Reminder of Engineering Trade-offs
  Challenge                        Goal
  Heterogeneity & Variety          Simple operational model
  Complex platform behaviour       Simple application model
  Partial failures                 Simple user model
  Partial failures + large tasks   Minimal resource wastage
  Autonomy – owner’s rights        Stability & uniformity
  Independent provision            Simple resource access
  Scale, costs & latency           Good performance
  Vulnerable to misuse             Dependable protection
  Diverse & Evolving               Flexible & agile
  Valuable assets                  IPR & assets well protected
    – Reputation, equipment, teams, data, algorithms, working practices
Compound Causes of Data Growth
  Faster devices
  Cheaper devices
  Higher-resolution
    – all ~ Moore’s law
  Increased processor throughput
    – more derived data
  Cheaper & higher-volume storage
  Remote data more accessible
    – Public policy to make research data available
    – Bandwidth increases
    – Latency doesn’t get less though
  The product of these effects grows faster than Moore’s law
Diverse Data Sources
  Output from Modelling and Simulation
    – Increasingly sophisticated and detailed models
    – Longer model runs
    – More model runs
  Data from Observation
    – Arrays of deployed instruments
        Oceans, biosphere, atmosphere, geophysics, space, …
        Exploration systems
        Engineering & built-environment monitoring
        People monitoring – health, safety, security, finance, epidemiology, …
    – Systematic data collection
        Space-based earth observation
        Extensive sky surveys
        Large experiments – ITER, LHC, …
        Extensive automated laboratories – crystallography, biochemistry, medicine, …
  Powered by Grids
Diverse Data Sources 2
  Commercial and Industrial data
    – Customer tracking
    – Production and product tracking
    – Digital entertainment media
    – Financial tracking & transactions
  Governmental & Socio-economic data
    – Census, surveys, enquiries, legal
    – Spatio-temporal socio-economic & historic data
  Derived data
    – Analysis, calibration & summaries generate more data
    – Computer-based composition
        e.g. automated annotation of sequences
  Powered by Grids
Immense potential
  Increasing size of data collections
    – Allow smaller scale phenomena
    – Rarer phenomena
    – To be investigated / detected
  Increasing scope of data collections
    – Allow larger scale phenomena to be investigated
  Diverse data collections
    – Allow discovery by combining data from multiple sources
        E.g. the earthquake fault in China (see Highlights)
  Business intelligence from data
    – A crucial competitive advantage
  Growth in number of data collections
    – Generates a combinatorial expansion of the opportunities
Exploiting that potential
  There are knowledge nuggets in the data
  But the data are in many places
  Mining them is hard
    – Finding, extracting & fetching relevant data
    – Processing the sheer volumes of data
    – Using more sophisticated matching
        Transformations to deal with data collection systems
        Transformations to remove “known” phenomena, hiding new phenomena
        Combinatorial space searches
        Delicate matching criteria
        Sophisticated statistics
Exploiting that potential 2
  Requires large amounts of data management
    – Acquisition, storage, cataloguing, movement, archiving, discard
  Requires large amounts of computation
  Requires the usual AAA controls
    – and sometimes privacy mechanisms
  Requires provenance trails & metadata records
    – E.g. for attribution & legal requirements
    – To handle re-computations efficiently
  This combination requires grids
  Data needs more
Interpretational Challenges
  Finding & Accessing data
    – Variety of mechanisms & policies
  Interpreting data
    – Variety of forms, representations, value systems & ontologies
  Independent provision & ownership
    – Autonomous changes in availability, form, policy, …
    – Regional variations in legal requirements
  Processing data
    – Understanding how it may be related
    – Devising models that expose the relationships
  Presenting results
    – Humans need either
        Derived small volumes of statistics
        Visualisations
  Requires insight & creativity
Motivation
  Entering an age of data
    – Data Explosion
    – CERN: LHC will generate 1GB/s = 10PB/y
    – VLBA (NRAO) generates 1GB/s today
    – Pixar generates 100 TB/Movie
    – Storage getting cheaper
  Data stored in many different ways
    – Data resources
    – Relational databases
    – XML databases
    – Flat files
  Need ways to facilitate
    – Data discovery
    – Data access
    – Data integration
  Empower e-Business and e-Science
    – The Grid is a vehicle for achieving this
Composing Observations in Astronomy
  Data and images courtesy Alex Szalay, Johns Hopkins
  No. & sizes of data sets as of mid-2002, grouped by wavelength
  12 waveband coverage of large areas of the sky
  Total about 200 TB data
  Doubling every 12 months
  Largest catalogues near 1B objects
Biomedical data – making connections
  acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg
  Slide provided by Carole Goble: University of Manchester
Database Growth
  PDB: 33,367 protein structures
  EMBL DB: 111,416,302,701 nucleotides
  Slide provided by Richard Baldock: MRC HGU Edinburgh
China Workshops December 2005
GODIVA Data Portal
  Grid for Ocean Diagnostics, Interactive Visualisation and Analysis
  Daily Met Office Marine Forecasts and gridded research datasets
  National Centre for Ocean Forecasting
  ~3Tb climate model datastore via Web Services
  Interactive Visualisations inc. Movies
  ~30 accesses a day worldwide
  Other GODIVA software produces 3D/4D Visualisations reading data remotely via Web Services
  Online Movies
GODIVA Visualisations
  Unstructured Meshes
  Grid Rotation/Interpolation
  GeoSpatial Databases v. Files (Postgres, IBM, Oracle)
  Perspective 3D Visualisation
  Google maps viewer
NERC Data Grid
  The DataGrid focuses on federation of NERC Data Centres
  Grid for data discovery, delivery and use across sites
  Data can be stored in many different ways (flat files, databases, …)
  Strong focus on Metadata and Ontologies
  Clear separation between discovery and use of data
  Prototype focussing on Atmospheric and Oceanographic data
Distributed Aircraft Maintenance Environment: Leeds, Oxford, Sheffield & York, Jim Austin
  Global in-flight engine diagnostics
  [Diagram labels: in-flight data; airline maintenance centre; ground station; global network, e.g. SITA; internet, , pager; DS&S Engine Health Center; data centre]
  100,000 aircraft × 0.5 GB/flight × 4 flights/day = 200 TB/day
  Now BROADEN
  Significant in getting Boeing 787 engine contract
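As a quick check of the volume quoted on this slide, the three figures multiply out to the stated total. This is only a sketch: decimal units (1 TB = 1000 GB) are assumed, and the function name is illustrative rather than taken from the project.

```python
# Figures quoted on the slide; decimal units assumed (1 TB = 1000 GB).
AIRCRAFT = 100_000
GB_PER_FLIGHT = 0.5
FLIGHTS_PER_DAY = 4

def daily_volume_tb(aircraft, gb_per_flight, flights_per_day):
    """Total in-flight data generated per day, in terabytes."""
    return aircraft * gb_per_flight * flights_per_day / 1000

print(daily_volume_tb(AIRCRAFT, GB_PER_FLIGHT, FLIGHTS_PER_DAY))  # 200.0
```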
FireGrid Technologies
  Asif Usmani
  Maps, models, scenarios
  Super-real-time simulation (HPC)
  KBS and Planning
  Emergency Responders
  1000s of sensors & gateway processing
  Grid
Terabyte → Petabyte
                       Terabyte                  Petabyte
  RAM time to move     15 minutes                2 months
  1GB WAN move time    10 hours ($1000)          14 months ($1 million)
  Disk cost            7 disks = $5000 (SCSI)    6800 disk units + 32 racks = $7 million
  Disk power           100 Watts                 100 Kilowatts
  Disk weight          5.6 Kg                    33 Tonnes
  Disk footprint       Inside machine            60 m²
  Approximately correct in May 2003
  Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR
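The WAN row of the table can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 1000 TB), an average month of 730 hours, and that move time scales linearly with volume. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # average month; an assumption for the conversion

def implied_throughput_mb_s(bytes_moved, hours):
    """Effective throughput (MB/s) implied by moving bytes_moved in the given hours."""
    return bytes_moved / (hours * 3600) / 1e6

def scaled_move_time_months(hours_per_tb, terabytes):
    """Move time in months if the time scales linearly with volume."""
    return hours_per_tb * terabytes / HOURS_PER_MONTH

TB = 1e12
print(round(implied_throughput_mb_s(TB, 10), 1))    # 27.8
print(round(scaled_move_time_months(10, 1000), 1))  # 13.7
```

Ten hours per terabyte therefore corresponds to roughly 28 MB/s sustained, and a petabyte at the same rate takes about 14 months, which matches the table.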
Mohammed & Mountains
  Petabytes of Data cannot be moved (often)
    – It stays where it is produced or curated
        Hospitals, observatories, European Bioinformatics Institute, …
    – A few caches and a small proportion cached
    – Sometimes replicated
  Diverse data collections
    – Discovery depends on insights
    – Unpredictable or unexpected use of data
    – Remote users
    – Composition from multiple sites
Move computation to the data
  Assumption: code size << data size
    – Minimise data transport
  Provision combined storage & compute resources
  Develop the database philosophy for this?
    – Queries are programs safe to run near data
  Develop the storage architecture for this?
    – Selection computation hosted close to storage
  Develop experiment, sensor & simulation architectures
    – That take code to select and digest data as an output control
    – That automatically attach the provenance & metadata
  Data Cutter a step in this direction
    – Sub-setting and aggregation of datasets using filters executed close to data
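A minimal sketch of the trade-off, assuming a hypothetical `DataSite` class that stands in for a site holding a large dataset. It compares the bytes moved when every record is fetched against the bytes moved when the selection runs at the site. Pickled size is used only as a rough proxy for network cost, and a real service would also need to sandbox the submitted code.

```python
import pickle

class DataSite:
    """Hypothetical stand-in for a site that holds a large dataset."""
    def __init__(self, records):
        self.records = records

    def fetch_all(self):
        # Strategy 1: ship every record to the client, which filters locally.
        return list(self.records)

    def run_near_data(self, predicate):
        # Strategy 2: run the client's selection code at the site
        # and ship only the matching records.
        return [r for r in self.records if predicate(r)]

def wire_size(obj):
    """Rough proxy for bytes moved: size of the pickled payload."""
    return len(pickle.dumps(obj))

site = DataSite([{"id": i, "magnitude": i % 100} for i in range(100_000)])
is_rare = lambda r: r["magnitude"] >= 99

moved_data = wire_size(site.fetch_all())
moved_result = wire_size(site.run_near_data(is_rare))
print(moved_result < moved_data)  # True
```

Here the selection keeps 1 record in 100, so the result shipped back is a small fraction of the full dataset.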
Reduce data movement: Caching Strategies
  Caching
    – Based on coherence in demand for data
    – The same applications, individuals or groups
    – Request the same or similar data repeatedly
    – Until their focus of interest moves
  Save data locally
    – To save re-fetching
Reduce data movement: Caching Strategies
  Challenges
    – Choosing the right amount to store locally
    – Balance storage costs v re-fetch costs
    – Detect stale data
        an update on the original has made local data invalid
    – Detect a scan much larger than cache
        A cache would then generate extra costs for no return
    – But much research data is static or has regular update patterns
        Easier to cache
    – Some researchers accept bounded staleness
        E.g. 1 day, 1 week
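Two of these challenges, bounded staleness and scans larger than the cache, can be sketched in a few lines. The class below is a hypothetical illustration, not any particular grid cache. It treats an entry as usable until it is older than `max_age`, passes through any item larger than the whole cache without storing it, and evicts the oldest entries when space runs out.

```python
import time

class BoundedStalenessCache:
    """Hypothetical cache: entries expire after max_age seconds, and
    items larger than the cache are passed through uncached."""

    def __init__(self, fetch, capacity_bytes, max_age, clock=time.monotonic):
        self.fetch = fetch              # function: key -> bytes (remote read)
        self.capacity = capacity_bytes
        self.max_age = max_age
        self.clock = clock
        self.entries = {}               # key -> (data, time stored)
        self.used = 0

    def get(self, key):
        hit = self.entries.get(key)
        if hit and self.clock() - hit[1] <= self.max_age:
            return hit[0]               # fresh enough: no data movement
        data = self.fetch(key)          # miss or stale: re-fetch
        if hit:
            self.used -= len(hit[0])
            del self.entries[key]
        if len(data) > self.capacity:
            return data                 # larger than the cache: bypass it
        while self.used + len(data) > self.capacity:
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            self.used -= len(self.entries.pop(oldest)[0])
        self.entries[key] = (data, self.clock())
        self.used += len(data)
        return data

calls = []
def remote_read(key):
    calls.append(key)
    return b"x" * 10

now = [0.0]                             # fake clock so the example is deterministic
cache = BoundedStalenessCache(remote_read, capacity_bytes=25, max_age=86400,
                              clock=lambda: now[0])
cache.get("a")
cache.get("a")                          # second call is served locally
now[0] += 2 * 86400                     # two days later the entry is stale
cache.get("a")
print(len(calls))  # 2
```

The one-day `max_age` mirrors the "1 day" staleness bound mentioned above. Choosing the bound and the capacity remains an application decision.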
Reduce data movement: Caching Strategies
  Who handles those challenges?
    – Automated services
    – Application developers
        Exploiting general application-specific properties
        Requires infrastructure & application knowledge
        May be out-of-date when used
    – End-users
        Exploiting particular run properties
        But requires infrastructure and application knowledge
        Hard to develop
        Liable to be out of date
    – Adaptive controllers
        Learning on coherent workloads
Reduce data movement: Replication Strategies
  Replication for
    – Reliability & Performance
    – Increase the chance of being, or computing, near the data
  Challenges
    – Logical to physical map maintenance and use
    – Deciding what to replicate
        Which data, which subsets/supersets of data
    – Propagating updates
    – Recycling storage space used for replicas
    – Physically making the copies
        More data movement
  Similar decision options
Reduce delays from data movement
  Pre-load required data near execution host
  Pre-load required code near data
  Challenges
    – Storage availability in right place
    – Licence availability
    – Clean up after execution
    – Clean up after failures
Streaming
  What is streaming?
    – C.f. Unix pipes & video / audio delivery
  Despatch and delivery of data in increments
    – Continuous or long stream of blocks
    – Data generator writes to stream when data ready
        May block if stream capacity reached
    – Data consumer reads from stream when needs data
        Will block when stream empty
    – Stream management system
        May organise intermediate buffer sizes and storage
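The blocking behaviour described above can be sketched with a bounded queue from the Python standard library. The generator blocks when the buffer is full and the consumer blocks when it is empty. The buffer size, block names and end-of-stream sentinel are illustrative choices, not part of any grid streaming system.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream

def generator(stream, n_blocks):
    for i in range(n_blocks):
        stream.put(f"block-{i}")   # blocks if the stream is at capacity
    stream.put(END)

def consumer(stream, received):
    while True:
        block = stream.get()       # blocks while the stream is empty
        if block is END:
            break
        received.append(block)

stream = queue.Queue(maxsize=4)    # buffer size a stream manager might choose
received = []
producer_thread = threading.Thread(target=generator, args=(stream, 20))
consumer_thread = threading.Thread(target=consumer, args=(stream, received))
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()
print(len(received))  # 20
```

Only four blocks are ever buffered at once, so the 20-block stream never needs to fit in the intermediate store.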
Streaming 2
  Why stream
    – The consumer can start as soon as some data is ready
        Overlap execution times, e.g. on different processors
        Pipelined execution
    – The cost of intermediary storage may be avoided
        Read and write times to secondary memory
        Local RAM (and processor cache) consumption
    – The scale of data processed can exceed store capacity
    – The stream may flow from continuous recorders
    – The stream may flow to continuous enactors
    – Enables computational steering
        Stream of visualisable data showing computation or experiment progress
Streaming 3
  Why not stream
    – For small data transfers
        Stream administration may outweigh simple transfer time
    – Operation may require all of the data
        E.g. sort, aggregate calculation (count, sum, average, standard deviation)
        Can stream in or out in some cases – but not if both source & destination require all of the data
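The aggregate case can be illustrated with a one-pass calculation. The input is consumed as a stream, but no result can be emitted until the last value has arrived. This sketch uses Welford's method for the running mean and variance, which is a choice made here for illustration and is not named in the slides.

```python
import math

def streaming_stats(values):
    """Count, mean and sample standard deviation in one pass (Welford's method)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:               # values may be any iterator, e.g. a stream
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std

count, mean, std = streaming_stats(iter([2, 4, 4, 4, 5, 5, 7, 9]))
print(count, round(mean, 6), round(std, 3))  # 8 5.0 2.138
```

The function holds only three numbers in memory, so the input can exceed store capacity. A sort, by contrast, must hold or spill all of its input before it can produce its first output.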
OGSA Capabilities
  Security
    – Cross-organizational users
    – Trust nobody
    – Authorized access only
  Information Services
    – Registry
    – Notification
    – Logging/auditing
  Execution Management
    – Job description & submission
    – Scheduling
    – Resource provisioning
  Data Services
    – Common access facilities
    – Efficient & reliable transport
    – Replication services
  Self-Management
    – Self-configuration
    – Self-optimization
    – Self-healing
  Resource Management
    – Discovery
    – Monitoring
    – Control
  OGSA
  OGSA “profiles”
  Web services foundation
  Hiro Kishimoto: Keynote GGF17
Data Services
  The basic problem
    – Manage, transfer and access distributed data services and resources
  Issues
    [Diagram labels: Find; Describe; Access; Data; Formats; Protocols; Use cases; Move/Copy/Replicate; Metadata; Manage; Common access]
  Data resources shown
    [Diagram labels: Derived data; Catalog; Sensor; Data stream; Text file; Relational database]
  Hiro Kishimoto: Keynote GGF17
Basic Data Services
  [Diagram labels: Data Resources; Managed Storage; Transfer Protocols; Storage Management; Data Management; Other Data Services; Transfer; Registries; Non-OGSA client APIs & other services; Service interface; Resource interface]
  Hiro Kishimoto: Keynote GGF17
Data Services
  Data Service 1, Data Service 2, …, Data Service n
  Composite Data Services
    – Replication
    – Cache
    – Federation
  Hiro Kishimoto: Keynote GGF17
Data Services
  [Diagram labels: Name Service; Data Service 1; Data Service 2; File System Data Services; File Service]
Basic Data Interfaces
  Storage Management
    – e.g. Storage Resource Management (SRM)
  Data Access
    – ByteIO
    – Data Access & Integration (DAI)
  Data Transfer
    – Data Movement Interface Specification (DMIS)
    – Protocols (e.g. GridFTP)
  Replica management
  Metadata catalog
  Cache management
  Hiro Kishimoto: Keynote GGF17
Storage Resource Manager (SRM)
  De facto & written standard in physics, …
  Collaborative effort
    – CERN, FNAL, JLAB, LBNL and RAL
  Essential bulk file storage
    – (pre) allocation of storage
        abstraction over storage systems
    – File delivery / registration / access
    – Data movement interfaces
        E.g. GridFTP
  Rich function set
    – Space management, permissions, directory, data transfer & discovery
Storage Resource Broker (SRB)
  SDSC developed
  Widely used
    – Archival document storage
    – Scientific data: bio-sciences, medicine, geo-sciences, …
  Manages
    – Storage resource allocation
        abstraction over storage systems
    – File storage
    – Collections of files
    – Metadata describing files, collections, etc.
    – Data transfer services
OMII Data Management Services
  FTP
    – File Transfer Service
  OGSA-DAI
    – Access to structured data
Condor Data Management
  Stork
    – Manages File Transfers
    – May manage reservations
  Nest
    – Manages Data Storage
    – C.f. GridFTP with reservations
        Over multiple protocols
Globus Tools and Services for Data Management
  GridFTP
    – A secure, robust, efficient data transfer protocol
  The Reliable File Transfer Service (RFT)
    – Web services-based, stores state about transfers
  The Data Access and Integration Service (DAIS)
    – Service for access to data resources, particularly relational and XML databases
  The Replica Location Service (RLS)
    – Distributed registry that records locations of data copies
  The Data Replication Service
    – Web services-based, combines data replication and registration functionality
  Slides from Ann Chervenak
A Replica Location Service
  A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
    – RLS maintains mappings between logical identifiers and target names
    – Must perform and scale well: support hundreds of millions of objects, hundreds of clients
  E.g., LIGO (Laser Interferometer Gravitational Wave Observatory) Project
    – RLS servers at 10 sites
    – Maintain associations between 6 million logical file names & 40 million physical file locations
  Slides from Ann Chervenak
RLS Features
  Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
  Replica Location Index (RLI) nodes aggregate information about one or more LRCs
  LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
  Optional compression of state updates reduces communication, CPU and storage overheads
  [Diagram: Replica Location Indexes (RLI) above Local Replica Catalogs (LRC)]
  Slides from Ann Chervenak
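The division of labour between LRCs and RLIs can be sketched as follows. Everything here is hypothetical: the class names, the timeout-based expiry and the example names are assumptions made for illustration and are not the Globus RLS API. The sketch shows an LRC holding the authoritative logical-to-target mappings, while the RLI only remembers which sites recently reported a logical name and forgets them if no refresh arrives.

```python
import time

class LocalReplicaCatalog:
    """Hypothetical LRC: authoritative logical-to-target mappings for one site."""
    def __init__(self, site):
        self.site = site
        self.mappings = {}                  # logical name -> set of target names

    def add(self, logical, target):
        self.mappings.setdefault(logical, set()).add(target)

    def lookup(self, logical):
        return self.mappings.get(logical, set())

class ReplicaLocationIndex:
    """Hypothetical RLI: remembers which LRCs reported each logical name."""
    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout              # assumed soft-state lifetime, seconds
        self.clock = clock
        self.seen = {}                      # (logical, site) -> time of last update

    def soft_state_update(self, lrc):
        now = self.clock()
        for logical in lrc.mappings:
            self.seen[(logical, lrc.site)] = now

    def sites_for(self, logical):
        now = self.clock()
        return sorted(site for (name, site), t in self.seen.items()
                      if name == logical and now - t <= self.timeout)

now = [0.0]                                 # fake clock for a deterministic example
lrc = LocalReplicaCatalog("site-a")
lrc.add("lfn://run42", "gsiftp://site-a.example.org/data/run42")
rli = ReplicaLocationIndex(timeout=60, clock=lambda: now[0])
rli.soft_state_update(lrc)
print(rli.sites_for("lfn://run42"))  # ['site-a']
now[0] += 120                               # no refresh arrived: the entry expires
print(rli.sites_for("lfn://run42"))  # []
```

A client would ask the RLI which sites to contact and then query each site's LRC for the actual target names, so a stale index costs one wasted lookup rather than a wrong answer.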
Components of RLS Implementation
  Common server implementation for LRC and RLI
  Front-End Server
    – Multi-threaded
    – Written in C
    – Supports GSI Authentication using X.509 certificates
  Back-end Server
    – MySQL, PostgreSQL and Oracle Relational Database
  Client APIs: C, Java, Python
  Client Command line tool
  Slides from Ann Chervenak
RLS in Production Use: LIGO
  Laser Interferometer Gravitational Wave Observatory
  Currently use RLS servers at 10 sites
    – Contain mappings from 6 million logical files to over 40 million physical replicas
  Used in customized data management system: the LIGO Lightweight Data Replicator System (LDR)
    – Includes RLS, GridFTP, custom metadata catalog, tools for storage management and data validation
  Slides from Ann Chervenak
RLS in Production Use: ESG
  Earth System Grid: Climate modeling data (CCSM, PCM, IPCC)
  RLS at 4 sites
  Data management coordinated by ESG portal
  Datasets stored at NCAR
    – TB in total files
    – 1230 portal users
  IPCC Data at LLNL
    – TB in 59,300 files
    – 400 registered users
    – Data downloaded: TB in 263,800 files
    – Avg. 300GB downloaded/day
    – 200+ research papers being written
  Slides from Ann Chervenak
gLite Data Management
  FTS
    – File Transfer Service
  LFC
    – Logical file catalogue
  Replication Service
    – Accessed through LFC
  AMGA
    – Metadata services
Enabling Grids for E-sciencE INFSO-RI, nd EGEE Review, CERN - gLite Middleware Status
Data Management Services
  FiReMan catalog
    – Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements
    – Oracle and MySQL versions available
    – Secure services, using VOMS groups, ACL support for DNs
    – Full set of Command Line tools
    – Simple API for C/C++ wrapping a lot of the complexity for easy usage
    – Attribute support
    – Symbolic link support
    – Exposing ServiceIndex and DLI (for matchmaking)
    – Separate catalog available as a keystore for data encryption (‘Hydra’)
    – Deployed on the Pre-Production Service and DILIGENT testbed
  gLite I/O
    – Posix-like access to Grid files
    – Castor, dCache and DPM support
    – Added a remove method to be able to delete files
    – Changed the configuration to match all other CLI configuration to service-discovery
    – Improved error reporting
    – Has been used for the BioMedical Demo
        Encryption and DICOM SRM
    – Deployed on the Pre-Production Service and the DILIGENT testbed
  AMGA MetaData Catalog
    – NA4 contribution
        Result of JRA1 & NA4 prototyping together with PTF assessment
        Used by the LHCb experiment
        Has been used for the BioMedical Demo
File Transfer Service
  Reliable file transfer
  Full scalable implementation
    – Java Web Service front-end, C++ Agents, Oracle or MySQL database support
    – Support for Channel, Site and VO management
    – Interfaces for management and statistics monitoring
    – Gsiftp, SRM and SRM-copy support
  Has been in use by the Service Challenges for the last 5 months
    – Evolved together with the Service Challenges Team
    – Daily meetings
  FTS evolved over summer to include
    – Support for MySQL and Oracle
    – Multi-VO support
    – GridFTP and SRM copy support
    – MyProxy server as a CLI argument
    – Many small changes/optimizations revealed by SC3 usage
  FTS workshop with LHC experiments on November 16
    – Issues, Feedback and short term plans
Summary: Take home message
  Distributed data management
    – A motivation for grids
    – Grids will not work without it
  Principal Requirements and functions
    – Abstracted storage management
        Data lifetime
        Bulk data: files & collections of files
        Creation, cataloguing, description, protection, movement, access & update, deletion/de-allocation
    – Varieties of data movement
    – Replication & caching