The Quantum Chromodynamics Grid James Perry, Andrew Jackson, Matthew Egbert, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.


1 The Quantum Chromodynamics Grid James Perry, Andrew Jackson, Matthew Egbert, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh

2 Overview The data grid The metadata catalogue and browser Conclusions References

3 Overview Aim – To implement a 'QCDGrid' to become a production environment for UKQCD, a collaboration of UK scientists carrying out Quantum Chromodynamics (QCD) simulations The Grid – This multi-terabyte storage system will support distributed data management across four UK sites: Edinburgh, Glasgow, Liverpool and Swansea Funding – QCDGrid is part of the GridPP project, a PPARC-funded initiative

4 Why build a QCD Grid? QCD simulations currently generate terabytes to petabytes of data –Especially once UKQCD's purpose-built HPC system, QCDOC, comes online –Post-processing is highly diverse and distributed –Involves multinational collaborations The challenge is to store and access this data – A secure, reliable and expandable distributed storage system is required Initially, the QCDGrid project aims to address this issue – Develop a multi-terabyte storage system, supporting distributed data management across different UK sites

5 The QCDGrid Stage 1: Implement a multi-site data storage Grid –Globus toolkit for basic grid operations e.g. data transfer, security –Globus replica catalogue to maintain a directory of files on the Grid –Intend to use EDG software in the future e.g. for file replication Stage 2: Develop structured data which describes the characteristics of the raw data (metadata) –Develop an XML schema for lattice QCD calculations –Implement a metadata catalogue –Develop a metadata catalogue browser

6 The QCDGrid Structure

7 Basic DataGrid Requirements The data grid must distribute data across the four sites Robustly –Each file must be replicated at two or more sites Efficiently –Where possible, files should be stored close to where they are needed most often Transparently –End users should not need to be concerned with how the data grid is implemented
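
The robustness requirement above can be expressed as a simple check over a replica catalogue. The following is a minimal sketch, not QCDGrid code: the catalogue is assumed to be an in-memory mapping from logical file name to the set of sites holding a copy, and the file and site names are made up for illustration.

```python
from typing import Dict, List, Set

# The slide requires each file to exist at two or more sites.
MIN_COPIES = 2


def under_replicated(catalogue: Dict[str, Set[str]],
                     min_copies: int = MIN_COPIES) -> List[str]:
    """Return the logical file names held at fewer than min_copies sites."""
    return sorted(name for name, sites in catalogue.items()
                  if len(sites) < min_copies)


# Hypothetical catalogue contents, for illustration only.
example_catalogue = {
    "config_0001.dat": {"edinburgh", "liverpool"},
    "config_0002.dat": {"glasgow"},
}
print(under_replicated(example_catalogue))
```

Any file reported by such a check would be a candidate for another copy at a different site.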

8 DataGrid Implementation Hardware –Storage elements are PCs –Data stored in RAID arrays, which are cheap and offer built-in redundancy Software –Red Hat Linux 7.2 OS –Globus Toolkit 2.0 used for low-level grid services –European DataGrid software intended to be used in next phase for data replication/job submission –Custom-written QCDGrid software builds on Globus to implement QCDGrid client tools and control thread

9 Data Grid Structure

10 Simple Use Case – Adding a File The user issues a put command The software chooses a suitable storage element and copies the file to its NEW directory On its next scan, the control thread finds the new file and moves it to its actual home, registering it with the replica catalogue On its next scan, the control thread finds there is only one copy of the file and makes another one at a suitable site, registering it with the replica catalogue
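
The put sequence above can be sketched as a small in-memory simulation. This is an illustrative model under stated assumptions, not the QCDGrid implementation: `ToyGrid`, its method names, and the "first spare site alphabetically" placement rule are all hypothetical, and the choice of storage element is left to the caller.

```python
from typing import Dict, Iterable, Set


class ToyGrid:
    """In-memory stand-in for storage elements plus a replica catalogue."""

    def __init__(self, sites: Iterable[str]) -> None:
        # One NEW directory per storage element (hypothetical layout).
        self.new: Dict[str, Set[str]] = {site: set() for site in sites}
        # Logical file name -> sites holding a registered copy.
        self.catalogue: Dict[str, Set[str]] = {}

    def put(self, name: str, site: str) -> None:
        """Copy a file into the NEW directory of the chosen storage element."""
        self.new[site].add(name)

    def scan(self) -> None:
        """One pass of the control thread."""
        # Replicate files that were already registered with a single copy.
        # Doing this first means a file registered in this pass is only
        # replicated on the following pass, as in the slide.
        for sites in self.catalogue.values():
            if len(sites) == 1:
                spare = sorted(set(self.new) - sites)
                if spare:
                    sites.add(spare[0])  # placeholder placement policy
        # Move files out of NEW and register them with the catalogue.
        for site, pending in self.new.items():
            for name in sorted(pending):
                self.catalogue.setdefault(name, set()).add(site)
            pending.clear()


grid = ToyGrid(["edinburgh", "liverpool"])
grid.put("config_0001.dat", "edinburgh")
grid.scan()   # file is registered at edinburgh
grid.scan()   # second copy is made at liverpool
print(sorted(grid.catalogue["config_0001.dat"]))
```

Separating registration from replication keeps each scan simple: the catalogue is the only state the second step needs to inspect.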

11 Simple Use Case – Getting a File The user issues a get command on a client machine The software looks up the replica catalogue to find the nearest copy of the file The file is transferred from that copy If the transfer fails, the software looks up the replica catalogue again to find the next nearest copy, and tries to transfer that instead
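
The retrieval logic above amounts to "try the nearest replica, then fall back to the next nearest". A minimal sketch follows, assuming a dictionary-based catalogue, a hypothetical per-site `distance` table, and a caller-supplied `transfer` function that returns `True` on success; none of these names come from the QCDGrid software.

```python
from typing import Callable, Dict, Optional, Set


def get_file(name: str,
             catalogue: Dict[str, Set[str]],
             distance: Dict[str, int],
             transfer: Callable[[str, str], bool]) -> Optional[str]:
    """Fetch a file from the nearest working replica.

    Returns the site the file was fetched from, or None if every
    replica failed or the file is not in the catalogue.
    """
    tried: Set[str] = set()
    while True:
        # Look the file up again on every attempt, skipping failed sites.
        candidates = catalogue.get(name, set()) - tried
        if not candidates:
            return None
        # Nearest first; the site name breaks ties deterministically.
        site = min(candidates, key=lambda s: (distance[s], s))
        if transfer(site, name):
            return site
        tried.add(site)


# Illustration: the nearest copy (edinburgh) is unreachable.
demo_catalogue = {"config_0001.dat": {"edinburgh", "liverpool"}}
demo_distance = {"edinburgh": 0, "liverpool": 5}
print(get_file("config_0001.dat", demo_catalogue, demo_distance,
               lambda site, name: site != "edinburgh"))
```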

12 Fault Tolerance Probably the most important requirement of QCDGrid Central control thread –Constantly monitors nodes to make sure they are still working If a node fails without warning –A message is sent to the system administrator –Control thread begins to replicate the files that were on the node elsewhere Nodes can be temporarily disabled if they have to be shut down or rebooted –Prevents the grid moving data around unnecessarily A secondary node constantly monitors the central node –Backs up the replica catalogue and configuration files –Grid can still be accessed (albeit read-only) if the central node goes down
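
The node-failure handling above can be sketched as a planning function: unexpected failures produce an alert and re-replication tasks, while nodes that were deliberately disabled are ignored. This is a hedged illustration only; the function name, the data structures, and the alert text are assumptions, and real monitoring and copying are out of scope.

```python
from typing import Dict, List, Set, Tuple


def plan_recovery(catalogue: Dict[str, Set[str]],
                  alive: Dict[str, bool],
                  disabled: Set[str]
                  ) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Return (alerts, tasks) for one monitoring pass.

    Each task is (file name, source site, target site).
    """
    # A node that is down but was disabled on purpose is not a failure.
    failed = sorted(site for site, up in alive.items()
                    if not up and site not in disabled)
    alerts = [f"node {site} is not responding" for site in failed]
    healthy = {site for site, up in alive.items() if up}

    tasks: List[Tuple[str, str, str]] = []
    for name in sorted(catalogue):
        sites = catalogue[name]
        if not sites & set(failed):
            continue  # no copy of this file was on a failed node
        sources = sorted(sites & healthy)   # surviving copies
        targets = sorted(healthy - sites)   # healthy nodes without a copy
        if sources and targets:
            tasks.append((name, sources[0], targets[0]))
    return alerts, tasks


alerts, tasks = plan_recovery(
    {"a.dat": {"edinburgh", "liverpool"}, "b.dat": {"glasgow", "swansea"}},
    alive={"edinburgh": True, "liverpool": False,
           "glasgow": True, "swansea": False},
    disabled={"swansea"},  # shut down for maintenance, so no data is moved
)
print(alerts, tasks)
```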

13 Current Progress Data grid software has been implemented and is undergoing testing A four-node test grid has been set up across two of the sites (Edinburgh and Liverpool) A web-based status monitor exists, allowing users to check the state of the data grid

14 Metadata Storing metadata which describes the actual data –This allows users to see what is on the grid and find what they want more easily Data described by XML metadata files –A schema is being developed for the QCD metadata The XML files are stored centrally in an XML database, the QCDGrid metadata catalogue –Using Apache Xindice The XML files will also be submitted to the data grid itself –Ensures there is a backup copy of the metadata –Metadata catalogue can be reconstructed from the data grid in the event that it is lost

15 Implementation of Metadata Data submitted to the grid must be accompanied by a valid metadata file This can be enforced by checking it against the schema A submission tool (graphical or command line) takes care of sending the data and metadata to the right places The Xindice XML database is accessed as a grid service The API for this is being developed by the OGSA DAI project A graphical metadata browser will allow easy access to data stored on the grid, based on meaningful characteristics
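
The rule that data must arrive with a valid metadata file can be sketched as a gate in the submission tool. The example below is a simplified stand-in: it checks well-formedness and the presence of a few required elements using only the Python standard library, rather than validating against the actual QCD schema. The element names `collaboration` and `lattice_size` are hypothetical and are not taken from the QCDGrid schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical required elements; the real schema defines its own.
REQUIRED_ELEMENTS = ("collaboration", "lattice_size")


def metadata_problems(xml_text: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    return [f"missing <{tag}>" for tag in REQUIRED_ELEMENTS
            if root.find(tag) is None]


def accept_submission(xml_text: str) -> bool:
    """Accept the data file only if its metadata passes the check."""
    return not metadata_problems(xml_text)


good = ("<qcd><collaboration>UKQCD</collaboration>"
        "<lattice_size>16</lattice_size></qcd>")
bad = "<qcd><collaboration>UKQCD</collaboration></qcd>"
print(accept_submission(good), metadata_problems(bad))
```

A production tool would replace `metadata_problems` with full validation against the published schema.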

16 Current Progress XML schema development is well advanced –Prototype available Metadata browser applet exists –May require modification due to changes in APIs used Metadata catalogue –OGSA DAI project are providing grid service software to QCDGrid

17 Conclusions Aim – To implement a 'QCDGrid' to become a production environment for UKQCD Developed a prototype distributed data grid – Adding real data to the grid this month Developed a prototype XML schema and browser Utilising the OGSA DAI grid service software for the XML metadata catalogue

18 References QCDGrid –Software mailing list: –Project information –Or see: /grid/qcdgrid/ –Example Schema, see: –http://www.ph.ed.ac.uk/ukqcd/community/the_grid/xml_schema/xml_schema.html GridPP –http://www.gridpp.ac.uk
