
1 Querying Large Physics Data Sets Over an Information Grid
How can we ensure that we have adequate information available over Grids to resolve our physics queries?
CHEP’2001, Beijing, China. 3rd-6th September 2001.

2 CRISTAL Project Team: Contents
Data, Information & Knowledge Grids
Distributed queries and workflow management
Sub-query tracking and analysis processing
Multi-layered systems and information abstraction
Mapping description-driven workflow services onto Grids
Conclusions, future activities.

3 Layered Grid Technologies
[Figure: Data Grid, Information Grid and Knowledge Grid layers; labels: Data Control, Data Abstraction]
• Warehousing, distributed databases, streaming, near-line storage, large objects, access mechanisms, data staging…
• Metadata, middleware, intelligent retrieval, information modelling, warehousing, workflow…
• Data mining, visualisation, simulation, problem solving methods/environments…
• Software component reuse, design reuse

4 Computing Research Challenges
[Figure: Data Grid, Information Grid and Knowledge Grid layers; labels: Data Control, Data Abstraction]
• Metadata, ontologies, controlled vocabularies…
• Semi-structured data, info modelling and knowledge representation…
• Data transformation & tracking…
• Multi-context and evolving data…
• Automated integration…
• Intelligent Agents, resource discovery…
• Knowledge discovery, prediction, machine learning…
Support creative discovery of knowledge/information via fast, ubiquitous, universal, and homogeneous access to heterogeneous assets.

5 Metadata Requirements
Metadata is used to:
– integrate data
– inform the analysis process, e.g. navigation / location
– allow system flexibility and evolution
– catalogue collections of data
– provide mechanisms for access / security control
More than just summary data: metadata is active and changes with system usage.
Need to trace how data (and metadata) evolves.
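
The tracing requirement above can be illustrated with a minimal Python sketch. It is not taken from the CRISTAL system: the class names, the `reason` field and the sample identifiers are all hypothetical. It only shows one way to keep metadata "active" by appending a new version on every change instead of overwriting, so that the evolution of any attribute can be replayed later.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class MetadataVersion:
    """One immutable snapshot of an item's metadata."""
    number: int
    attributes: Dict[str, Any]
    reason: str  # why the metadata changed (hypothetical field)


@dataclass
class TracedMetadata:
    """Metadata that keeps every earlier version instead of overwriting it."""
    item_id: str
    history: List[MetadataVersion] = field(default_factory=list)

    def update(self, reason: str, **changes: Any) -> MetadataVersion:
        # Start from the latest attributes so unchanged keys carry forward.
        current = dict(self.history[-1].attributes) if self.history else {}
        current.update(changes)
        version = MetadataVersion(len(self.history) + 1, current, reason)
        self.history.append(version)
        return version

    def latest(self) -> MetadataVersion:
        return self.history[-1]

    def trace(self, key: str) -> List[Tuple[int, Any, str]]:
        """Return (version number, value, reason) for each change to `key`."""
        changes: List[Tuple[int, Any, str]] = []
        previous: Any = object()  # sentinel that equals no stored value
        for v in self.history:
            if key in v.attributes and v.attributes[key] != previous:
                previous = v.attributes[key]
                changes.append((v.number, previous, v.reason))
        return changes


# Hypothetical usage: a calibration set that is replicated, then validated.
calib = TracedMetadata("calibration-set-A")
calib.update("initial registration", location="CERN", status="raw")
calib.update("replicated to regional centre", location="FNAL")
calib.update("validated", status="validated")
```

Calling `calib.trace("location")` lists only the versions in which the location actually changed, together with the recorded reason for each change.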

6 HEP Query Analysis
Very large amount of data to be collected (petabytes)
Physicist must have access to all the detector data
– Raw data, events, runs, calibration data, simulation data
System must be flexible, configurable, scalable
Physicist needs to access multiple sources of (multiple versions of) data and algorithms
Physicist must be able to carry out his own analysis on his own workstation
Traceability of data and queries is therefore crucial.

7 Multi-system Solution
[Figure: FNAL, DESY, IN2P3 and CERN Regional Centres, Experiment/CERN and Analysis Workstations]
Data Locations, Routes
Shared data & meta-data
Analysis-specific data
Synchronisation required
– Data synchronisation
– Complex versioning
– Data replication
Large network bandwidth required
Data redundancy

8 A Single Logical System
Data management and physics algorithms are separately located and their workflows managed
Algorithm is provided by the physicist and executed in the regional centres where the data reside
Multiple versions of the same algorithm can coexist
Data location must be transparent to the physicist
Local query implies distributed processing
Query result is returned immediately when the algorithm has previously been executed

9 Single Logical System Solution
[Figure: FNAL, DESY, IN2P3 and CERN Regional Centres, Experiment/CERN and Analysis Workstations; labels: Local Query1 (×4), Local Query2 (×4), Query Result 1, Query Result 2]
Data Locations, Routes
Shared data & meta-data
Analysis-specific data
Knowledge is stored alongside data
Active (meta-)objects manage various versions of data and algorithms
Smaller network bandwidth required

10 Query/analysis processing
0: Physicist develops and registers algorithm
1: Physicist submits query locally
2: Query Handler decomposes the query and locates the data
3: If the algorithm has previously been executed, results are returned immediately
4: Algorithms are executed where their data reside
5: Results are returned to the Query Handler for presentation to, and further analysis by, the physicist
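
Steps 0 to 5 can be sketched in Python as follows. This is an illustrative sketch, not the CRISTAL implementation: `RegionalCentre`, `QueryHandler`, the dataset names and the sample algorithm are all assumptions, and a real system would dispatch sub-queries over the network rather than call local objects. The point is the control flow: decompose per dataset, reuse a stored result when one exists, otherwise run the algorithm where the data lives.

```python
from typing import Callable, Dict, List, Tuple

Algorithm = Callable[[List[float]], float]


class RegionalCentre:
    """Holds data locally and runs algorithms next to it (step 4)."""

    def __init__(self, name: str, datasets: Dict[str, List[float]]):
        self.name = name
        self.datasets = datasets
        self.executions = 0  # counts real runs, to make result reuse visible

    def execute(self, algorithm: Algorithm, dataset: str) -> float:
        self.executions += 1
        return algorithm(self.datasets[dataset])


class QueryHandler:
    def __init__(self, centres: List[RegionalCentre]):
        # Catalogue of which centre holds which dataset (used in step 2).
        self.catalogue = {d: c for c in centres for d in c.datasets}
        self.algorithms: Dict[Tuple[str, int], Algorithm] = {}
        self.results: Dict[Tuple[str, int, str], float] = {}

    def register(self, name: str, version: int, fn: Algorithm) -> None:
        """Step 0: the physicist registers one version of an algorithm."""
        self.algorithms[(name, version)] = fn

    def query(self, name: str, version: int, datasets: List[str]) -> Dict[str, float]:
        """Steps 1-5: one sub-query per dataset, reusing stored results."""
        answer: Dict[str, float] = {}
        for dataset in datasets:                       # step 2: decompose
            key = (name, version, dataset)
            if key not in self.results:                # step 3: reuse if known
                centre = self.catalogue[dataset]       # step 2: locate data
                fn = self.algorithms[(name, version)]
                self.results[key] = centre.execute(fn, dataset)  # step 4
            answer[dataset] = self.results[key]
        return answer                                  # step 5


# Hypothetical data and algorithm, for illustration only.
fnal = RegionalCentre("FNAL", {"run-1": [1.0, 2.0, 3.0]})
desy = RegionalCentre("DESY", {"run-2": [10.0, 20.0]})
handler = QueryHandler([fnal, desy])
handler.register("mean_energy", 1, lambda xs: sum(xs) / len(xs))

first = handler.query("mean_energy", 1, ["run-1", "run-2"])
second = handler.query("mean_energy", 1, ["run-1", "run-2"])
```

The second call is answered entirely from stored results, so each centre's `executions` counter stays at 1. Keying results by algorithm version is what lets several versions of the same algorithm coexist without their results being confused.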

11 How does this map onto Grids?
Applications: Chemistry, Biology, Cosmology, High Energy Physics, Environment
Application Toolkits: Distributed Computing Toolkit, Data-Intensive Applications Toolkit, Collaborative Applications Toolkit, Remote Visualization Applications Toolkit, Problem Solving Applications Toolkit, Remote Instrumentation Applications Toolkit
Middleware Services: resource-independent and application-independent services
– authentication, authorization, resource location, resource allocation, events, accounting, remote data access, information, policy, fault detection
Grid Fabric: resource-specific implementations of basic services
– e.g. transport protocols, name servers, differentiated services, CPU schedulers, public key infrastructure, site accounting, directory service, OS bypass
[Figure annotation: Data and query tracking region]

12 EU Data Grid Project
[Figure annotation: Data and query tracking region]

13 Grids Anatomy
[Figure: Grid layers (Application, Collective, Resource, Connectivity, Fabric) shown alongside Internet protocol layers (Application, Transport, Internet, Link)]

14 Fabric & Connectivity
Fabric Layer:
Provides the resources on which shared access takes place
Implements local resource-specific operations.
Resources should implement enquiry mechanisms & resource management mechanisms.
Connectivity:
Authentication protocols required for Grid-specific network transactions, providing:
– Single sign-on
– Delegation
– Integration with various local security solutions
– User-based trust relationships

15 Resource & Collective Services
Resource:
Provides protocols for initiation, monitoring and control of operations on shared resources.
– Information Protocols, Management Protocols
Collective Services:
Contains APIs which are global in nature and capture interactions across collections of resources.
Implementation is based on Resource Layer protocols.
Implements a wide variety of sharing behaviours
– Directory services, co-allocation, scheduling, brokering services etc.
MetaData & traceability are essential features of Collective Services.

16 The Role of Scientific WFM
Manage and control GRID resource complexity
– By describing tasks, steps and activities
Examples:
– Application Management
    Application resource requirements descriptions
    Application tasks descriptors
– Data Traceability
    Describe algorithm/application execution steps
    Describe and manage data sets and versions
“Handling data on a grid should be part of a workflow” (GGF Grid Computing Environments working group)
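
As a minimal Python sketch of the idea above (hypothetical names throughout, and not the CRISTAL workflow engine), a workflow can be held as a list of step descriptions, each carrying its own resource requirements, while the executor records which data version each step consumed and produced. The `requirements` keys and the two sample steps are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class StepDescription:
    """Describes one workflow step as data rather than hard-wired code."""
    name: str
    action: Callable[[List[float]], List[float]]
    requirements: Dict[str, int]  # e.g. {"cpus": 2}; hypothetical keys


@dataclass
class TraceEntry:
    """Records which data version a step read and which it wrote."""
    step: str
    input_version: int
    output_version: int


def run_workflow(
    steps: List[StepDescription], data: List[float]
) -> Tuple[Dict[int, List[float]], List[TraceEntry]]:
    """Run steps in order, keeping every intermediate data version."""
    versions: Dict[int, List[float]] = {0: data}
    trace: List[TraceEntry] = []
    for step in steps:
        current = len(versions) - 1
        versions[current + 1] = step.action(versions[current])
        trace.append(TraceEntry(step.name, current, current + 1))
    return versions, trace


# Hypothetical two-step analysis: scale the values, then apply a cut.
workflow = [
    StepDescription("calibrate", lambda xs: [x * 2.0 for x in xs], {"cpus": 1}),
    StepDescription("select", lambda xs: [x for x in xs if x > 9.0], {"cpus": 2}),
]
versions, trace = run_workflow(workflow, [4.0, 5.0, 10.0])
```

Because the input is kept as version 0 and each step writes a new version, the trace can answer "which step produced this data set, and from what" without re-running anything.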

17 Current Grid Workflow Activities
Global Grid Forum, Grid Computing Environments Working Group
– Mississippi Computational Portal: Web/XML based
– GALE: end-to-end automation of analyst’s workflow, also Web/XML based
Workflow Management for Cosmology Collaboratory
– Lawrence Berkeley National Laboratory (Stewart Loken, CHEP 2001 paper 10-036, see proceedings)
Other projects?

18 MetaData & OMG UML Model
[Figure labels: Is-an-Instance of, data abstraction]

19 A Description-Driven Model

20 Conclusions on MetaData
Meta-objects:
– Provide flexibility & reusability of definitions
– Handle complexity in large-scale systems
– Allow co-existence of multiple versions of data
– Minimise effects of system evolution
– Provide ‘hooks’ for interoperability
– Can be queried for data navigation
--> the basis of Grids workflow collective services.
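
A small Python sketch can make the meta-object idea concrete. It is an assumption-laden illustration, not the CRISTAL model: `Description`, `Instance`, `Registry` and the sample "Event" fields are invented here. Each instance points at the description version it was created from (the "is-an-instance-of" link), so old and new versions coexist and the descriptions themselves can be queried to navigate the data.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Description:
    """A meta-object: states which fields its instances must carry."""
    name: str
    version: int
    fields: Tuple[str, ...]


@dataclass
class Instance:
    description: Description  # the "is-an-instance-of" link
    values: Dict[str, Any]


class Registry:
    def __init__(self) -> None:
        self.descriptions: Dict[Tuple[str, int], Description] = {}
        self.instances: List[Instance] = []

    def describe(self, name: str, version: int, *fields: str) -> Description:
        """Register a new description version; earlier versions remain valid."""
        d = Description(name, version, tuple(fields))
        self.descriptions[(name, version)] = d
        return d

    def create(self, d: Description, **values: Any) -> Instance:
        """Create an instance, checked against its own description version."""
        missing = set(d.fields) - set(values)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        inst = Instance(d, values)
        self.instances.append(inst)
        return inst

    def find(self, name: str, version: Optional[int] = None) -> List[Instance]:
        """Navigate the data through descriptions rather than raw items."""
        return [
            i for i in self.instances
            if i.description.name == name
            and (version is None or i.description.version == version)
        ]


# Hypothetical example: version 2 of "Event" adds a calibration field.
reg = Registry()
v1 = reg.describe("Event", 1, "run", "energy")
v2 = reg.describe("Event", 2, "run", "energy", "calibration")
reg.create(v1, run=7, energy=41.5)
reg.create(v2, run=8, energy=39.0, calibration="c-2")
```

Adding version 2 does not invalidate the instance created under version 1, which is the sense in which a description-driven design limits the effects of system evolution.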

21 Future Activities
Members of the GGF’s Grid Computing Environment working group - workflow interest group
OMG members (invited to OMG Grids Workshop, Boston)
E-science UK-sponsored CERN Fellowship to work alongside CMS physicists on workflow management
E-science funding for UK computer scientists to work with the EU DataGrid project
E-science Generic Middleware open call
Continue to develop the CRISTAL workflow engine.

22 Reference papers
F. Estrella, Z. Kovacs, J-M Le Goff & R. McClatchey, “Model and Information Abstraction for Description-Driven Systems”. Accepted paper at this conference (ID 8-053).
F. Estrella, “Objects, Patterns and Descriptions in Data Management”, PhD Thesis, University of the West of England, Bristol, England, December 2000.
J. Draskic et al., “Using a Meta-Model as the Basis for Enterprise-Wide Data Navigation”. Proc. of the 3rd IEEE Meta-Data Conference, Bethesda, Maryland, USA, April 1999. Available at http://computer.org/conferen/proceed/meta/1999/
J-M Le Goff et al., “Design Patterns for Description-Driven Systems”. CHEP 2001 Computer Physics Communications, in print and CMS NOTE 1999_045.
N. Baker et al., “Component-Based Approach to Scientific Workflow Management”. ACAT’2000 conference, FermiLab, October 2000 and CMS NOTE 2001_024.

