Grid Activity at CCS. Toshiyuki Amagasa, Center for Computational Sciences, University of Tsukuba


1 Grid Activity at CCS. Toshiyuki Amagasa, Center for Computational Sciences, University of Tsukuba

2 About Myself
- Name: Toshiyuki Amagasa
- Affiliation:
  - Division of Computational Informatics, Center for Computational Sciences
  - Department of Computer Science, Graduate School of Systems and Information Engineering
- Area of research
  - Data engineering
  - Database systems
- Recent topics
  - XML databases
    - Parallel XML query processing
    - OLAP analysis for XML
    - Web information extraction for XML
  - Databases in scientific applications
    - Faceted navigation for QCDml
    - Meteorological database

3 ILDG-JP Members
- Prof. Mitsuhisa Sato (Director, CCS)
- Prof. Tomoteru Yoshie (CCS)
- Prof. Osamu Tatebe (CCS)
- Dr. Naoya Ukita (CCS)
- Prof. Toshiyuki Amagasa (CCS)

4 Talk Outline
- Current Status of ILDG
- A Brief History of JLDG
- An Overview of JLDG
- Development of a New ILDG Client
- Faceted Navigation of QCDml
- Conclusions and Future Work

5 Current Status of JLDG

6 A Brief History of JLDG (1/3)
- Hepnet-J/sc, 2002- (SINET GbE private network)
  - Widely distributed file system
  - Network backbone: Super SINET VPN
  - Institutes / universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U.
- Objective and implementation
  - Secure data sharing among institutes and universities whose administrative policies are not homogeneous
  - Mirroring among file servers (FSs) attached to supercomputers (SCs) with administrative
[Figure: Hepnet-J/sc file servers at CCP @ Tsukuba (CP-PACS), RCNP @ Osaka (SX-5), CRC @ KEK (SR8000), and YITP @ Kyoto (SX-5)]

7 A Brief History of JLDG (2/3)
- Problems
  - Growing cost of managing data locations
    - A dataset may be distributed over several disks.
    - It is hard for users to remember the locations of data and mirrors.
  - No concept of users and user groups
    - Hard to support multiple research groups.
- Necessary functionalities
  - A flat data sharing system that has no space limit (or can be extended at any time)
  - User and user-group management across several organizations
- Japan Lattice Data Grid (JLDG)
  - Project launched in November 2005
  - Operation started in March 2007

8 A Brief History of JLDG (3/3)
- JLDG v1 started operation in May 2008
  - Available datasets
    - CP-PACS Nf=2 QCD configuration: 8,000 files, 1.5 TBytes
    - CP-PACS/JLQCD Nf=2+1 QCD configuration: 21,000 files, 6 TBytes
    - PACS-CS Nf=2+1 32^3 x 64 lattice QCD configuration: 2,600 files, 3 TBytes
- JLDG v2 started operation in December 2009
  - Storing and sharing research data generated in daily research activities
  - Data sharing within a research group

9 An Overview of JLDG
- A widely distributed file system with 100 TB-scale storage for domestic researchers in particle physics
- Shares simulation data computed on SCs over several months to several years.
- Data files are distributed, and replicas are created if necessary.
- Users do not need to know file locations. Files can be accessed very quickly if the site has replicas.
- Storage space can be added incrementally during operation.
[Figure: Gfarm file system over the SINET3 network connecting Tsukuba, KEK, Kyoto, Osaka, Hiroshima, and Kanazawa, with a link to ILDG; www.jldg.org]

10 Software Components
- Globus Toolkit V4 (ANL), www.globus.org
  - GSI authentication, proxy user certificate creation
  - GridFTP server / client
- VOMS (EDG)
  - VO management
- Naregi-CA (Naregi), www.naregi.org
  - User / host certificate creation
- Gfarm file system (U. of Tsukuba), datafarm.apgrid.org
  - Widely distributed file system
- Uberftp (NCSA), http://dims.ncsa.uiuc.edu/set/uberftp/
  - Interactive GridFTP client

11 Gfarm Distributed File System
- An open-source distributed file system
- A global namespace to unify storage systems
- Scalable I/O performance exploiting data access locality
- Automated replica selection for fault tolerance and load balancing
[Figure: the Gfarm file system maps a global namespace rooted at /gfarm onto files (file1 to file4) held on different storage nodes, with replica creation]
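The idea of one logical path backed by several replicas can be pictured with a small Python sketch. This is not Gfarm's API: the class `ReplicaCatalog`, the paths, the host names, and the selection rule (prefer a replica at the client's own site, otherwise take the least-loaded host) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalog:
    """Toy model: one logical path maps to replicas on several hosts."""
    replicas: dict = field(default_factory=dict)  # logical path -> set of hosts
    load: dict = field(default_factory=dict)      # host -> transfers handed out

    def add_replica(self, path, host):
        self.replicas.setdefault(path, set()).add(host)
        self.load.setdefault(host, 0)

    def select(self, path, client_site):
        """Prefer a replica at the client's site, else the least-loaded host."""
        hosts = self.replicas[path]
        local = sorted(h for h in hosts if h.startswith(client_site + ":"))
        candidates = local or sorted(hosts)
        chosen = min(candidates, key=lambda h: self.load[h])
        self.load[chosen] += 1
        return chosen


# Hypothetical layout: file1 has two replicas, file2 has one.
catalog = ReplicaCatalog()
catalog.add_replica("/gfarm/demo/file1", "tsukuba:fs1")
catalog.add_replica("/gfarm/demo/file1", "kek:fs1")
catalog.add_replica("/gfarm/demo/file2", "kek:fs1")
```

Calling `catalog.select("/gfarm/demo/file1", "kek")` returns the KEK replica, while a client at a site without a replica is sent to whichever host has served fewer requests so far.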

12 System Configuration at each Site
- Current configuration (90 TB)
  - 6.4 TB x 3 (KEK, Kyoto, Osaka, Hiroshima, Kanazawa)
  - 71 TB (Tsukuba)
  - Note: Available space varies depending on replica status.
- Access by GridFTP client
[Figure: at each site, supercomputers and GridFTP clients on the LAN reach a GridFTP server running gfsd with a 6.4 TB disk (500 GB x 16, RAID6; dual-core 2.33 GHz Xeon x 2; 4 GB memory); sites are connected via SINET3]

13 Adding File Servers
- File servers can be added at each site if necessary.
  - Increases the total disk space
  - Additional GridFTP servers balance the load.
[Figure: a second GridFTP server with gfsd added on the site LAN next to the existing 6.4 TB server; supercomputers and GridFTP clients on the LAN, other sites via SINET3]

14 Summary
- JLDG
  - A brief history
  - An overview
- Used as an infrastructure for daily research activity
- Hands-on meeting on 27 Jan. 2009
  - Successfully held with 19 attendees

15 Development of a New ILDG Client

16 Int'l Lattice Data Grid (ILDG)
- A data grid for sharing lattice QCD configurations
- File formats in ILDG
  - Configuration binary
    - LIME (Lattice QCD Interchange Message Encapsulation)
  - Metadata (QCDml)
    - Ensemble XML
    - Configuration XML
  - LFN (Logical File Name)
    - Identifier for a configuration binary
[Figure: one ensemble XML linked via markovChainURI to several configuration XML documents; each configuration XML is linked to its configuration binary via an LFN]

17 QCDml Ensemble XML: mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600 1 CP-PACS RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action) B1800 Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901 1 add T.Yoshie Center fof Computational Sciences, University of Tsukuba

18 Typical Use Case of ILDG
- Steps: find desired data by MDC, find a nearby site by FC, access the site, transfer the data
- Identifiers: LFN (Logical File Name), SURL (Site URL), TURL (Transfer URL)
- VOMS authentication
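The chain from a metadata hit to a transferable URL can be sketched in Python as follows. Everything here is a stand-in: the dictionaries `MDC` and `FC`, the example URIs and host names, and the functions `find_lfns`, `nearest_surl`, and `to_turl` are hypothetical, and VOMS authentication is not modelled. Real ILDG catalogues are remote services, not in-memory tables.

```python
# Hypothetical in-memory stand-ins for the metadata catalogue (MDC)
# and the file catalogue (FC).
MDC = {  # markovChainURI -> LFNs of its configurations
    "mc://example/ensembleA": ["lfn://example/ensembleA/conf-000100"],
}
FC = {  # LFN -> SURLs, one per site holding a copy
    "lfn://example/ensembleA/conf-000100": [
        "srm://site-a.example.org/data/conf-000100",
        "srm://site-b.example.org/data/conf-000100",
    ],
}


def find_lfns(markov_chain_uri):
    """Step 1: the metadata search yields logical file names."""
    return MDC[markov_chain_uri]


def nearest_surl(lfn, preferred_host):
    """Step 2: pick the copy at the preferred site, else the first one listed."""
    surls = FC[lfn]
    for surl in surls:
        if preferred_host in surl:
            return surl
    return surls[0]


def to_turl(surl, protocol="gsiftp"):
    """Steps 3 and 4: a real site negotiates the TURL; here only the scheme is swapped."""
    _, rest = surl.split("://", 1)
    return f"{protocol}://{rest}"
```

For example, `to_turl(nearest_surl(find_lfns("mc://example/ensembleA")[0], "site-b.example.org"))` walks LFN, SURL, and TURL in order.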

19 Difficulties in Finding Desired Configuration
- Directly use a query language (XQuery / XPath)
  - A simple example:
    /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']]
  - Knowledge of XML, QCDml, and XQuery (XPath) is needed.
  - Hard to get the whole picture of available data.
- Hierarchical list
  - Easy to use.
  - Needs a huge screen to show the entire list.
  - Still difficult to get the whole picture of the data.
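To make the XPath example concrete, here is a rough Python equivalent using only the standard library. The sample document is a simplified, made-up fragment (its namespace and nesting are assumptions, not the real QCDml schema), and `matches` is a hypothetical helper that mirrors the two conditions of the query: some `beta` element greater than a threshold and some `collaboration` element with a given text.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up ensemble document; not the real QCDml layout.
SAMPLE = """
<markovChain xmlns="urn:example:qcdml-sketch">
  <management><collaboration>CSSM</collaboration></management>
  <physics><action><gluon><beta>8.5</beta></gluon></action></physics>
</markovChain>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def matches(root, collaboration, min_beta):
    """True if root is a markovChain with beta > min_beta and the given collaboration."""
    if local_name(root.tag) != "markovChain":
        return False
    has_collaboration = False
    has_beta = False
    for element in root.iter():
        name = local_name(element.tag)
        text = (element.text or "").strip()
        if name == "collaboration" and text == collaboration:
            has_collaboration = True
        elif name == "beta":
            try:
                has_beta = has_beta or float(text) > min_beta
            except ValueError:
                pass  # non-numeric beta text never satisfies the comparison
    return has_collaboration and has_beta


root = ET.fromstring(SAMPLE)
```

Even in this form the user must know element names such as `beta` and `collaboration` in advance, which is the difficulty the slide points out.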

20 Basic Idea
- Apply a faceted-navigation interface to browse QCDml ensemble XML data.

21 Faceted-Navigation
- What is "faceted-navigation"?
  - A scheme for browsing objects with attributes.
  - Successfully used in some applications, such as Apple iTunes.
- Procedure
  - A user selects a value in a facet to select a set of objects of interest.
  - The system updates the list of objects, the list of facets, and their respective values.
  - (Repeat)
- Example
  - The Flamenco Search http://flamenco.berkeley.edu/
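The select-and-update loop can be sketched in a few lines of Python. The records in `ENSEMBLES` are invented for illustration (the facet names and some values echo those shown elsewhere in the talk), and `refine` and `facet_counts` are hypothetical helper names, not part of the system described here.

```python
from collections import Counter

# Invented ensemble records, one dict of facet -> value per object.
ENSEMBLES = [
    {"collaboration": "CSSM", "date": "2007", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "date": "2007", "size": "8/8/8/16"},
    {"collaboration": "CP-PACS", "date": "2005", "size": "12/12/12/24"},
]


def refine(objects, selections):
    """Keep the objects that match every selected facet value."""
    return [o for o in objects
            if all(o.get(facet) == value for facet, value in selections.items())]


def facet_counts(objects, facets):
    """For each remaining facet, count how many objects carry each value."""
    return {facet: Counter(o[facet] for o in objects if facet in o)
            for facet in facets}


# One round of navigation: the user clicks collaboration = CSSM.
selected = refine(ENSEMBLES, {"collaboration": "CSSM"})
counts = facet_counts(selected, ["date", "size"])
```

After the click, `selected` holds the two CSSM records and `counts` gives the values (with populations) to display for the remaining facets; the next click repeats the same two calls with one more entry in the selection.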

22 The Flamenco Search http://flamenco.berkeley.edu/

23 The Flamenco Search http://flamenco.berkeley.edu/

24 The Flamenco Search http://flamenco.berkeley.edu/

25 Faceted-Navigation
- Good features
  - Users have the freedom to choose a facet
    - cf. hierarchical list
  - Gives a big picture of the dataset
    - Available values along with their population
  - Effective
    - Busch's Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.
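One way to read the quoted figure, assuming every object takes exactly one value in each facet and objects are spread evenly over the combinations: with k facets of v values each, the number of distinct selections is

```latex
v^{k} = 10^{4} = 10{,}000
```

so four clicks, one per facet, narrow 10,000 objects down to about one.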

26 Technical Challenges
- How to define facets?
- How to extract values according to the facets?
- How to achieve quick response from the database to improve the user experience?

27 Choosing the Facets
- Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe.
- Selected elements from QCDml ensemble XML
  - Regional grid
  - Collaboration
  - Project name
  - Number of flavors
  - Time
  - Parameters
  - Lattice size
  - Gluon action
  - Parameters
  - Quark action
  - Parameters

28 Extracting Values from a Facet (1/3)
- Extract text values
  - Collaboration: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, …
  - Project name: 2+1 DWF 2+1 Dynamical AsqTAD Baryon Resonances Dynamical FLIC Studies Electromagnetic Form Factors FLIC Overlap Studies Flux Tube Test Gluon Propagator Long_aqstad_run Pentaquark Volume Dependence …
- Need substring extraction
  - Date: 2007-02-26T21:39:33+09:00 (facet values: 2000, 2005, 2006, 2007, 2008)
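The substring extraction for the date facet amounts to taking the year from the timestamp. A minimal Python sketch, where the function name `year_facet` is hypothetical and the input is assumed to start with a four-digit year as in the example on the slide:

```python
def year_facet(timestamp):
    """Return the leading four-digit year of a timestamp as a facet value."""
    year = timestamp[:4]
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"no leading four-digit year in {timestamp!r}")
    return year
```

With the timestamp shown above, `year_facet("2007-02-26T21:39:33+09:00")` gives `"2007"`.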

29 Extracting Values from a Facet (2/3)
- Need text value generation
  - Lattice size, e.g. X 12, Y 12, Z 12, T 24, … becomes 12 / 12 / 12 / 24
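Generating one text value from the four per-axis lengths could look like the following Python sketch. The XML fragment is an assumption (the markup is not visible in the transcript, so the element names `size`, `elem`, `name`, and `length` are illustrative), and `lattice_size_facet` is a hypothetical helper. The compact `12/12/12/24` form is used because that is how the value appears in the database rows later in the talk.

```python
import xml.etree.ElementTree as ET

# Assumed fragment; element names are illustrative only.
SIZE_XML = """
<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>
"""


def lattice_size_facet(size_element, order=("X", "Y", "Z", "T")):
    """Join the per-axis lengths into one facet value such as '12/12/12/24'."""
    lengths = {e.findtext("name"): e.findtext("length")
               for e in size_element.findall("elem")}
    return "/".join(lengths[axis] for axis in order)


size_value = lattice_size_facet(ET.fromstring(SIZE_XML))
```

Fixing the axis order in `order` keeps the generated string stable even if the elements appear in a different order in the source document.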

30 Extracting Values from a Facet (3/3)
- Gluon action / Quark action
  - An element name itself represents a value
  - Extract the element name as a value of a facet
- Slide example (truncated): http://www.jldg.org/JLDG/... www.lqcd.org/ildg/pla...
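Using an element name as the facet value can be sketched in Python as below. The fragment is an assumed, simplified layout (only the two action names, `DBW2GluonAction` and `fatLinkIrrelevantCloverQuarkAction`, come from the talk; the wrapper elements and namespace are made up), and `action_facet` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed layout: the first child of <gluon> / <quark> names the action.
ACTION_XML = """
<action xmlns="urn:example:qcdml-sketch">
  <gluon><DBW2GluonAction><beta>8.5</beta></DBW2GluonAction></gluon>
  <quark><fatLinkIrrelevantCloverQuarkAction/></quark>
</action>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def action_facet(action_element, kind):
    """Return the name of the first element under <kind>, or None if absent."""
    for child in action_element:
        if local_name(child.tag) == kind:
            first = next(iter(child), None)
            return local_name(first.tag) if first is not None else None
    return None


action_root = ET.fromstring(ACTION_XML)
```

Here the value is read from the tag rather than from the text content, so no per-action lookup table is needed when a new action type appears.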

31 System Configuration
[Figure: components of the QCDml faceted navigation I/F]
- Data: QCDml Ensemble (ILDG) & Configuration (JLDG)
- Sources within ILDG: JLDG, CSSM, LDG, UKQCD, USQCD (downloading ensemble XML)
- XML DB (eXist)
- Facet extraction (XQuery)
- Facet database: RDBMS (MySQL)
- Facet Navigation System (PHP + SQL + XQuery)
- Web server (Apache)

32 Database Design (1/2)
- Use RDBMS for quick response
- Use fixed relational schema for extensibility
*************************** 1. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
value: cssm
*************************** 2. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
value: CSSM
*************************** 3. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
value: Dynamical FLIC Studies
*************************** 4. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
value: 2007
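The fixed (uri, property, value) layout can be tried out with a short Python sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL; the table name `facet`, the helper `facet_values`, and the query itself are assumptions, while the four rows are the ones shown above.

```python
import sqlite3

URI = "mc://cssm/su3b08500k1300s12t24DBW2FLIC"

conn = sqlite3.connect(":memory:")
# One fixed three-column table: a new facet is a new property value, not a new column.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        (URI, "rgrid", "cssm"),
        (URI, "collaboration", "CSSM"),
        (URI, "projectName", "Dynamical FLIC Studies"),
        (URI, "date", "2007"),
    ],
)


def facet_values(conn, prop, selected_prop, selected_value):
    """Count values of `prop` among ensembles where selected_prop = selected_value."""
    sql = """
        SELECT f.value, COUNT(DISTINCT f.uri)
        FROM facet AS f
        JOIN facet AS s ON s.uri = f.uri
        WHERE f.property = ? AND s.property = ? AND s.value = ?
        GROUP BY f.value
        ORDER BY f.value
    """
    return conn.execute(sql, (prop, selected_prop, selected_value)).fetchall()
```

For instance, `facet_values(conn, "date", "collaboration", "CSSM")` lists the years available once the user has selected the CSSM collaboration, together with the number of ensembles for each year.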

33 Database Design (2/2)
- Store preformatted text for improving rendering performance
*************************** 1. row ***************************
collaboration: CSSM
size: 12/12/12/24
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
nf: 2
gact: DBW2GluonAction (beta=8.5)
qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
size: 8/8/8/16
uri: mc://cssm/su3b09836s8t16DBW2
nf:
gact: DBW2GluonAction (beta=9.836)
qact:

34 A Screenshot of the System

35 Conclusion and Future Work
- Conclusion
  - Current status of ILDG
  - Development of a new ILDG client
- Future work
  - Exploring more opportunities to apply data engineering techniques in various e-Science fields
    - Data mining
    - Data integration
    - …

36 Thank you very much for your kind attention. Questions should be addressed to amagasa@cs.tsukuba.ac.jp

