Grid Activity at CCS
Toshiyuki Amagasa
Center for Computational Sciences, University of Tsukuba



About Myself
- Name
  - Toshiyuki Amagasa
- Affiliation:
  - Division of Computational Informatics, Center for Computational Sciences
  - Department of Computer Science, Graduate School of Systems and Information Engineering
- Area of research
  - Data engineering
  - Database system
- Recent topics
  - XML databases
    - Parallel XML query processing
    - OLAP analysis for XML
    - Web information extraction for XML
  - Databases in scientific applications
    - Faceted navigation for QCDml
    - Meteorological database

ILDG-JP Members
- Prof. Mitsuhisa Sato (Director, CCS)
- Prof. Tomoteru Yoshie (CCS)
- Prof. Osamu Tatebe (CCS)
- Dr. Naoya Ukita (CCS)
- Prof. Toshiyuki Amagasa (CCS)

Talk Outline
- Current Status of ILDG
  - A Brief History of JLDG
  - An Overview of JLDG
- Development of a New ILDG Client
  - Faceted Navigation of QCDml
- Conclusions and Future Work

Current Status of JLDG

A Brief History of JLDG (1/3)
- Hepnet-J/sc (SINET GbE private network)
  - Widely-distributed file system
  - Network backbone: Super SINET VPN
  - Institutes / Universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
- Objective and Implementation
  - Data sharing among institutes / universities, in which administrative policies are not homogeneous, while attaining security
  - Mirroring among FSs attached to SCs with administrative
[Diagram labels: CP-PACS SX-5 KEK SR8000 SX-5 Hepnet-J/sc File Server]

A Brief History of JLDG (2/3)
- Problems
  - Growing cost of managing data location
    - A dataset may be distributed across several disks.
    - It is hard for users to remember the location of data and mirrors.
  - No concept of users and user groups
    - Hard to support multiple research groups.
- Necessary functionalities
  - A flat data sharing system that has no space limit (or can be extended at any time)
  - User and user group management across several organizations
- Japan Lattice Data Grid (JLDG)
  - Project launched in November 2005
  - Operation started in March

A Brief History of JLDG (3/3)
- JLDG v1 started operation in May 2008
  - Available datasets
    - CP-PACS Nf=2 QCD configuration: 8,000 files, 1.5 TBytes
    - CP-PACS/JLQCD Nf=2+1 QCD configuration: 21,000 files, 6 TBytes
    - PACS-CS Nf= x64 lattice QCD configuration: 2,600 files, 3 TBytes
- JLDG v2 started operation in December 2009
  - Storing and sharing research data generated in daily research activities
  - Data sharing within a research group

An Overview of JLDG
- A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics
  - Sharing simulation data computed by SCs over several months to several years.
- Data files are distributed. Replicas are created if necessary.
  - Users do not need to know file locations. Files can be accessed very quickly if the site has replicas.
- Storage space can be added incrementally during operation.
[Diagram labels: Kyoto, Kanazawa, Tsukuba, KEK, Osaka, Hiroshima; SINET3 Network; Gfarm file system; ILDG]

Software Components
- Globus Toolkit V4 (ANL)
  - GSI authentication, proxy user certificate creation
  - GridFTP server / client
- VOMS (EDG)
  - VO management
- Naregi-CA (Naregi)
  - User / host certificate creation
- Gfarm file system (U. of Tsukuba), datafarm.apgrid.org
  - Widely-distributed file system
- Uberftp (NCSA)
  - Interactive GridFTP client

Gfarm Distributed File System
- An open-source distributed file system
- A global namespace to unify storage systems
- Scalable I/O performance exploiting data access locality
- Automated replica selection for fault tolerance and load balancing
[Diagram labels: Gfarm File System; global namespace: /gfarm, ggf, jp, aist, gtrc, file1, file2, file3, file4; mapping; replica creation: file1, file2]

System Configuration at each Site
- Current configuration (90 TB)
  - 6.4 TB x 3 (KEK, Kyoto, Osaka, Hiroshima, Kanazawa)
  - 71 TB (Tsukuba)
  - Note: Available space varies depending on replica status.
- Access by GridFTP client
[Diagram labels: SINET3; 6.4 TB; 6.4 TB disk (500 GB x 16 RAID6), Dualcore 2.33GHz Xeon x 2, 4 GB memory; GridFTP Server; gfsd; Client; Super Computer; GridFTP Client; LAN]

Adding File Servers
- File servers can be added at each site if necessary.
  - Increases the total disk space
  - Adding GridFTP servers balances the load.
[Diagram labels: SINET3; 6.4 TB; GridFTP Server; gfsd; Client; Super Computer; GridFTP Client; LAN]

Summary
- JLDG
  - A brief history
  - An overview
- Used as an infrastructure for daily research activity
  - Hands-on meeting on 27 Jan. 2009: successfully held with 19 attendees

Development of a New ILDG Client

Int’l Lattice Data Grid (ILDG)
- A data grid for sharing Lattice QCD configurations
- File formats in ILDG
  - Configuration binary
    - LIME (Lattice QCD Interchange Message Encapsulation)
  - Metadata (QCDml)
    - ensemble XML
    - configuration XML
- LFN (Logical File Name)
  - Identifier for configuration binary
[Diagram labels: ensemble XML; configuration XML; configuration (binary); LFN; markovChainURI]

QCDml Ensemble XML
(XML example; markup not preserved, element text only)
mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C
CP-PACS
RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)
B1800
Phys.Rev. D65 (2002) (hep-lat/ ), Erratum-ibid. D67 (2003)
add
T.Yoshie
Center for Computational Sciences, University of Tsukuba

Typical Use Case of ILDG
Find desired data by MDC (metadata catalogue) -> Find nearby site by FC (file catalogue) -> Access to the site -> Data transfer
Identifiers: LFN (Logical File Name), SURL (Site URL), TURL (Transfer URL)
VOMS Authentication

Difficulties in Finding Desired Configuration
- Directly use a query language (XQuery / XPath)
  - A simple example:
      /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4]
        and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  - Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  - Hard to get the whole picture of available data.
- Hierarchical list
  - Easy to use.
  - Needs a huge screen to show the entire list.
  - Still difficult to get the whole picture of the data.
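
To make the intent of the XPath example concrete, here is a minimal Python sketch of the same filter using only the standard library. The sample document, its namespace URI, and the helper names `local_name` and `matches` are assumptions made for illustration; real QCDml ensemble files have a richer structure, and the faceted system described in this talk does not necessarily work this way.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Drop a leading '{namespace}' from a tag, like XPath local-name()."""
    return tag.rsplit("}", 1)[-1]

def matches(markov_chain, min_beta=4.0, collaboration="CSSM"):
    """True if some descendant named 'beta' has a numeric value above
    min_beta AND some descendant named 'collaboration' has the given text.
    (Text is whitespace-stripped here, which plain XPath text() = '...' is not.)"""
    beta_ok = False
    collab_ok = False
    for el in markov_chain.iter():
        # Skip the context node itself and non-element nodes.
        if el is markov_chain or not isinstance(el.tag, str):
            continue
        name = local_name(el.tag)
        text = (el.text or "").strip()
        if name == "beta":
            try:
                beta_ok = beta_ok or float(text) > min_beta
            except ValueError:
                pass  # non-numeric text never satisfies the comparison
        elif name == "collaboration":
            collab_ok = collab_ok or text == collaboration
    return beta_ok and collab_ok

# Hypothetical, heavily simplified stand-in for an ensemble document.
SAMPLE = (
    '<markovChain xmlns="urn:example:not-real-qcdml">'
    "<management><collaboration>CSSM</collaboration></management>"
    "<physics><action><gluon><beta>8.5</beta></gluon></action></physics>"
    "</markovChain>"
)

root = ET.fromstring(SAMPLE)
print(matches(root))
```

Even in this form the user still has to know which element names to ask for, which is the usability problem the slide points out.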

Basic Idea
- Apply a faceted-navigation interface to browse QCDml ensemble XML data.

Faceted-Navigation
- What is “faceted-navigation”?
  - A scheme for browsing objects with attributes.
  - Successfully used in some applications, such as Apple iTunes.
- Procedure
  - A user selects a value in a facet
    - To select a set of objects of interest
  - The system updates the list of objects, list of facets, and respective values
  - (Repeat)
- Example
  - The Flamenco Search
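
The procedure above can be sketched in a few lines of Python. This is an illustrative toy, not the system presented in the talk: the records in `ENSEMBLES` are made up, and the function names `facet_counts` and `select` are hypothetical.

```python
from collections import Counter

FACETS = ["collaboration", "nf", "size"]

# Toy records; the values are invented for illustration only.
ENSEMBLES = [
    {"collaboration": "CSSM", "nf": "2", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "nf": "0", "size": "8/8/8/16"},
    {"collaboration": "CP-PACS", "nf": "2", "size": "12/12/12/24"},
]

def facet_counts(objects, facets):
    """For each facet, count how many of the objects carry each value."""
    return {f: Counter(o[f] for o in objects if f in o) for f in facets}

def select(objects, facet, value):
    """Keep only the objects whose `facet` equals `value`."""
    return [o for o in objects if o.get(facet) == value]

# Step 1: the user picks a value in one facet.
current = select(ENSEMBLES, "collaboration", "CSSM")
# Step 2: the system recomputes the remaining values and their populations.
counts = facet_counts(current, FACETS)
print(counts["nf"])
# Step 3 (repeat): narrow further using another facet.
current = select(current, "nf", "2")
print(len(current))
```

Recomputing the counts after every selection is what lets the interface show only values that still lead to a non-empty result.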

The Flamenco Search

Faceted-Navigation
- Good features
  - Users are free to choose a facet
    - cf. hierarchical list
  - Gives a big picture of the dataset
    - Available values along with their population
  - Effective
    - Busch’s Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.
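
One way to read the numbers in Busch's Law, assuming each object carries exactly one value per facet (an assumption made here for illustration, not something the slide states): four facets with ten values each give

```latex
\[
\underbrace{10 \times 10 \times 10 \times 10}_{4\ \text{facets}} = 10^{4} = 10{,}000
\]
```

distinct value combinations, so in the idealized case where objects are spread evenly over those combinations, four selections narrow 10,000 objects down to about one.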

Technical Challenges
- How to define facets?
- How to extract values according to the facets?
- How to achieve quick response from the database to improve the user experience?

Choosing the Facets
- Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
- Selected elements from QCDml ensemble XML
  - Regional grid
  - Collaboration
  - Project name
  - Number of flavors
  - Time
  - Parameters
  - Lattice size
  - Gluon action
    - Parameters
  - Quark action
    - Parameters

Extracting Values from a Facet (1/3)
- Extract text values
  - Collaboration
  - Project name
- Need substring extraction
  - Date
Collaboration values: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
Project name values: 2+1 DWF 2+1 Dynamical AsqTAD Baryon Resonances Dynamical FLIC Studies Electromagnetic Form Factors FLIC Overlap Studies Flux Tube Test Gluon Propagator Long_aqstad_run Pentaquark Volume Dependence …
Date value: T21:39:33+09:00
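
As a hedged illustration of the substring extraction mentioned for the date facet, the sketch below pulls a four-digit year out of an ISO 8601-style timestamp. The function name `year_facet` is hypothetical, and the date portion of the sample timestamp is invented, since only the time-of-day part survives on the slide.

```python
def year_facet(timestamp):
    """Return the leading four-digit year of a timestamp string, or None
    if the string does not start with four ASCII digits."""
    head = timestamp.strip()[:4]
    if len(head) == 4 and head.isascii() and head.isdigit():
        return head
    return None

# The "2007-03-01" part is made up for this example.
print(year_facet("2007-03-01T21:39:33+09:00"))
```

Reducing a full timestamp to a coarse value such as the year keeps the number of distinct facet values small enough to browse.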

Extracting Values from a Facet (2/3)
- Need text value generation
  - Lattice size
    - e.g. 12 / 12 / 12 /
    - Source values: X 12 Y 12 Z 12 T 24 …
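
A minimal sketch of this kind of text value generation, assuming the four extents arrive as the whitespace-separated run shown on the slide and that the target is the slash-separated form (such as 12/12/12/24) seen in the database rows later in the talk. Both helper names are hypothetical.

```python
def parse_extents(text):
    """Turn 'X 12 Y 12 Z 12 T 24' into {'X': 12, 'Y': 12, 'Z': 12, 'T': 24}."""
    tokens = text.split()
    # Pair each axis label with the number that follows it.
    return {axis: int(n) for axis, n in zip(tokens[0::2], tokens[1::2])}

def lattice_size_label(extents):
    """Join the X, Y, Z, T extents into one display string."""
    return "/".join(str(extents[axis]) for axis in ("X", "Y", "Z", "T"))

extents = parse_extents("X 12 Y 12 Z 12 T 24")
print(lattice_size_label(extents))
```

Collapsing four separate elements into one string means the lattice size can be offered as a single facet instead of four.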

Extracting Values from a Facet (3/3)
- Gluon action / Quark action
  - An element name itself represents a value
  - Extract the element name as the value of a facet
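
The idea of using an element name as the facet value can be sketched as follows. The wrapper element `gluon` and the function name `action_facet` are assumptions for illustration; the XML snippet is a simplified stand-in, not a real QCDml fragment.

```python
import xml.etree.ElementTree as ET

def action_facet(parent):
    """Return the local name of the first child element of `parent`,
    or None if it has no child elements."""
    for child in parent:
        if isinstance(child.tag, str):  # skip comments / processing instructions
            return child.tag.rsplit("}", 1)[-1]  # drop any '{namespace}' prefix
    return None

# Hypothetical, simplified structure: the action type is the child's name.
SAMPLE = "<gluon><DBW2GluonAction><beta>8.5</beta></DBW2GluonAction></gluon>"

print(action_facet(ET.fromstring(SAMPLE)))
```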

QCDml Faceted Navigation I/F System Configuration
[Diagram labels: ILDG (JLDG, CSSM, LDG, UKQCD, USQCD); Downloading Ensemble XML; QCDml Ensemble (ILDG) & Configuration (JLDG); XML DB (eXist); Facet extraction (XQuery); Facet Database; RDBMS (MySQL); Facet Navigation System (PHP + SQL + XQuery); Web Server (Apache)]

Database Design (1/2)
- Use RDBMS for quick response
- Use fixed relational schema for extensibility

*************************** 1. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
value: cssm
*************************** 2. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
value: CSSM
*************************** 3. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
value: Dynamical FLIC Studies
*************************** 4. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
value: 2007
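
A sketch of how such a fixed (uri, property, value) table could be queried. It uses Python's built-in `sqlite3` as a stand-in for the MySQL database named in the talk; the table name `facet`, the index, and both helper functions are assumptions for illustration, while the four rows are the ones shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
conn.execute("CREATE INDEX facet_pv ON facet (property, value)")

URI = "mc://cssm/su3b08500k1300s12t24DBW2FLIC"
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        (URI, "rgrid", "cssm"),
        (URI, "collaboration", "CSSM"),
        (URI, "projectName", "Dynamical FLIC Studies"),
        (URI, "date", "2007"),
    ],
)

def value_counts(conn, prop):
    """Values of one facet with the number of ensembles carrying each."""
    sql = ("SELECT value, COUNT(DISTINCT uri) FROM facet "
           "WHERE property = ? GROUP BY value ORDER BY value")
    return conn.execute(sql, (prop,)).fetchall()

def uris_matching(conn, selections):
    """URIs that carry every (property, value) pair in `selections`."""
    if not selections:
        rows = conn.execute("SELECT DISTINCT uri FROM facet ORDER BY uri")
        return [r[0] for r in rows]
    clauses = " OR ".join("(property = ? AND value = ?)" for _ in selections)
    params = [x for pair in selections.items() for x in pair]
    sql = (f"SELECT uri FROM facet WHERE {clauses} "
           "GROUP BY uri HAVING COUNT(DISTINCT property) = ? ORDER BY uri")
    return [r[0] for r in conn.execute(sql, params + [len(selections)])]

print(value_counts(conn, "collaboration"))
print(uris_matching(conn, {"collaboration": "CSSM", "date": "2007"}))
```

With this layout, introducing a new facet only means inserting rows with a new `property` value, with no change to the table definition, which is one plausible reading of the "extensibility" point on the slide.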

Database Design (2/2)
- Store preformatted text for improving rendering performance

*************************** 1. row ***************************
collaboration: CSSM
size: 12/12/12/24
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
nf: 2
gact: DBW2GluonAction (beta=8.5)
qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
size: 8/8/8/16
uri: mc://cssm/su3b09836s8t16DBW2
nf:
gact: DBW2GluonAction (beta=9.836)
qact:
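
The preformatted `gact` and `qact` strings above could be produced once, at load time, by something like the following sketch, so that the web page only has to print stored text. The function name `format_action` and its argument layout are assumptions for illustration, not the talk's implementation.

```python
def format_action(name, params):
    """Render an action as 'Name (k1=v1/k2=v2)'.
    Returns '' when there is no action and just the name when there
    are no parameters."""
    if not name:
        return ""
    if not params:
        return name
    joined = "/".join(f"{key}={value}" for key, value in params)
    return f"{name} ({joined})"

print(format_action("DBW2GluonAction", [("beta", "8.5")]))
print(format_action("fatLinkIrrelevantCloverQuarkAction",
                    [("nf", "2"), ("kappa", "0.1300")]))
```

Parameter values are passed as strings so that formatting such as the trailing zeros in `0.1300` is preserved exactly.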

A Screenshot of the System

Conclusion and Future Work
- Conclusion
  - Current Status of ILDG
  - Development of a New ILDG Client
- Future work
  - Exploring more opportunities to apply data engineering techniques in various e-Science fields.
    - Data mining
    - Data integration
    - …

Thank you very much for your kind attention.
Questions should be addressed to