Dr Chris Maynard Application Consultant, EPCC +44 131 650 5077 Muttering about metadata Report from the Metadata work group.

Presentation transcript:

Muttering about metadata
Report from the Metadata work group
Review of QCDml
Dr Chris Maynard, Application Consultant, EPCC

metadata
Meta- (Greek): among, with, after, from
Data (Latin): information
Literally: data about data
Data
  – Gauge configuration
  – Ensemble of gauge configurations
Metadata (MD)
  – How was data created
  – Format, machine, code, algorithm, physics
10-12/03/2009 XML at light

Why do we need metadata?
Extreme example: no metadata
  – Cfgs have random string names with no directory structure for different ensembles
  – Impossible to use
Organise files
  – Into directories for ensembles
  – Give cfgs names with Markov chain position
Construct a scheme for the metadata
  – Rules for describing the data

A basic scheme
Use “meaningful” filenames
  – Metadata is encoded into the names of the files and directories
  – Can have some structure or hierarchy
  – But not completely flexible
  – Example with three fields. What ordering?
  – gauge-action/volume/quark-mass
  – gauge-action/quark-mass/volume
  – What happens for 2+1 flavours?
  – Not extensible
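
The ordering problem can be made concrete with a small sketch. The field names, values and helper functions below are hypothetical, not taken from any real UKQCD layout. The sketch only illustrates that a path-based scheme bakes in one field order, so a reader assuming a different order silently gets the wrong metadata, and an extra field (e.g. a second quark mass for 2+1 flavours) does not fit at all.

```python
from pathlib import PurePosixPath

# Hypothetical field order agreed for one path-based scheme.
ORDER_A = ("gauge_action", "volume", "quark_mass")
# A second group picks a different order for the same three fields.
ORDER_B = ("gauge_action", "quark_mass", "volume")


def encode(metadata: dict, order: tuple) -> str:
    """Encode metadata into a directory path using a fixed field order."""
    return str(PurePosixPath(*(metadata[field] for field in order)))


def decode(path: str, order: tuple) -> dict:
    """Recover metadata from a path; only valid if `order` matches the writer's."""
    parts = PurePosixPath(path).parts
    if len(parts) != len(order):
        raise ValueError(f"expected {len(order)} fields, got {len(parts)}")
    return dict(zip(order, parts))


# Illustrative values only.
md = {"gauge_action": "wilson", "volume": "16x32", "quark_mass": "0.1350"}
path = encode(md, ORDER_A)
round_trip = decode(path, ORDER_A)  # recovers md
misread = decode(path, ORDER_B)     # volume and quark mass are swapped
```

Nothing in the path itself says which ordering was used, so the misreading goes undetected; a path with a fourth component is rejected outright by `decode`.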

A broken scheme
D52C202K3500U010010_LL3450X_FL3400X_CMesonT00T31
Old UKQCD meson correlator filename
  – What does X stand for?
  – Wilson, Rotated, Clover
  – Many different clover. Scheme broken
  – X means none of the above!
D52C202K3500U010010_LL3450X_FL3400X_CMesonT00T31
Dynamical, c_SW = 2.02 ≈ NP determined – no information

A better scheme
Recreate data from MD
  – This is a very important requirement
  – Know what the data is – data provenance
Combination of IO parameters and code
  – Implicit assumptions are recorded! :^)
  – Version n cannot read version m :^(
  – Code X cannot process MD from Code Y :^(
Can we construct a general scheme?
  – Recreate data from MD?

Extensible schemes
LQCD metadata is hierarchical
  – Rich structure
  – Metadata scheme has to reflect this
  – Extensible
New metadata requires a new scheme
  – In extensible scheme
  – Old scheme is included in new one
  – Old metadata fits in new scheme
  – No need to refactor existing documents

Markup language
Combines text and information about text
  – Presentational
  – text format (e.g. this slide), WYSIWYG
  – Procedural
  – presentation of text, not WYSIWYG
  – TeX, PostScript
  – Descriptive or semantic
  – Labels fragments of text
  – No presentation or other interpretation mandated
  – SGML, XML, VML
HTML has both procedural and descriptive elements

XML
eXtensible Markup Language
  – XML is for structuring data
  – XML looks a bit like HTML
  – XML is text, but isn't meant to be read
  – XML is verbose by design
  – XML is a family of technologies
  – XML is license-free, platform-independent and well-supported – happiness
XML Web standard

XML II
Semantic, eXtensible Markup Language
XML was designed to carry data, not to display data
  – Cf. HTML, designed for displaying data
  – Incompatible applications can exchange data wrapped in XML
XML is just plain text
User defined tags allow structure to be developed
  – Lattice QCD metadata is structured
XML does not DO anything
  – You need an application for this
XML schema
  – Defines a set of rules for the XML document

Well formed XML
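
The slide's own example did not survive in the transcript. As a stand-in, here is a minimal sketch of what "well formed" means in practice, using Python's standard `xml.etree.ElementTree` parser. The element names (`ensemble`, `name`, `length`) are invented for illustration and are not QCDml.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical documents: one root element, every tag closed, tags properly nested.
good = "<ensemble><name>example</name><length>16</length></ensemble>"
bad_nesting = "<ensemble><name>example</ensemble></name>"  # tags overlap
bad_unclosed = "<ensemble><name>example</name>"            # root never closed

results = [is_well_formed(doc) for doc in (good, bad_nesting, bad_unclosed)]
```

Well-formedness is purely syntactic: the parser accepts `good` without knowing what `length` means. Deciding whether 16 is a sensible value is the job of a schema and of the application.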

XML schema
What is XML schema?
  – Collection of rules for XML documents
  – Other schema languages: DTD, Relax NG, Schematron
  – An XML schema is itself an XML document
Why do we need an XML schema?
  – Computers can read and understand XML IDs
  – 16
  – Meaning of length is context dependent
  – Applications can know types, parse and process XML data
  – Could just be an XSLT style sheet to transform XML into HTML and render a web page, e.g. LDG web-client

A word of caution
XML is not magic
  – XML is not a solution
  – It is a useful tool
  – Not the only tool
  – We still have to use the tool
Ideally produce metadata from code
  – What metadata?
  – What is standard/useful/implicit?
  – Application has to do something with it

Metacrap
Not all aspects/connotations of metadata are good
Metacrap: Putting the torch to seven straw-men of the meta-utopia
Amusing and valid critique of some MD ideas
  – But not all are relevant to this project

International lattice data grid (ILDG)
Lattice data is very expensive to generate
Many groups now make data available
  – MILC, UKQCD, RBC, JLQCD, Adelaide group
  – Many others share data
ILDG is representative of whole lattice community
  – MD working group (CMM is UK rep)
  – Middleware working group (MGB, RO [epcc])

Introducing QCDml
XML schemata for gauge configuration MD
Developed and maintained by MDWG
  – Design by committee – always a good idea
Basic concept
  – MD describing an ensemble
  – MD describing a configuration belonging to an ensemble

QCDml Ensemble

Physics

Fermion action inheritance

Example QCDml Ensemble
1 Name of schema (URI)
2 Using W3.org XML schema (URI)
3 Schema location for
  – a) named schema (URI)
  – b) Location of schema (URL)
4 Name of Ensemble (URI)
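
The annotated document itself is missing from the transcript, so the following is only a sketch of where the four numbered items typically sit in an XML instance document. The `xsi` namespace URI is the standard W3C one; the schema name, schema URL, ensemble URI and the element names `markovChain` and `markovChainURI` are placeholders assumed for illustration, not copied from the real QCDml schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"            # standard W3C namespace
SCHEMA_NS = "http://example.org/QCDml/ensemble"              # hypothetical schema name (URI)
SCHEMA_URL = "http://example.org/schemas/QCDmlEnsemble.xsd"  # hypothetical schema location (URL)

# 1: default namespace, 2: xsi namespace, 3: schemaLocation pair, 4: ensemble name.
doc = f"""<markovChain xmlns="{SCHEMA_NS}"
             xmlns:xsi="{XSI}"
             xsi:schemaLocation="{SCHEMA_NS} {SCHEMA_URL}">
  <markovChainURI>mc://example/hypothetical-ensemble</markovChainURI>
</markovChain>"""

root = ET.fromstring(doc)

# ElementTree stores tags as '{namespace}local', so the schema name is in the root tag.
schema_name = root.tag[1:].split("}", 1)[0]
# schemaLocation holds whitespace-separated pairs: namespace URI, then schema URL.
named_schema, schema_location = root.get(f"{{{XSI}}}schemaLocation").split()
ensemble_name = root.find(f"{{{SCHEMA_NS}}}markovChainURI").text
```

The schema name is an identifier (URI) and need not resolve to anything; only the second half of the `schemaLocation` pair is a location (URL) that a validating tool would fetch.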

Example quark action

QCDml Config

Markov Step

Name hierarchy
Unique name for ensemble in ILDG namespace
  – Ensemble
  – Config
  – Replica catalogue
  – Actual file instances
  – Multiple copies
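
One way to picture the hierarchy is as a pair of lookups: ensemble name to configuration names, then configuration name to physical copies. This is only a sketch under assumed names. The class, the `mc://` and `lfn://` strings and the storage URLs are all hypothetical and do not describe the actual ILDG middleware interface.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy model: ensemble -> configs -> replicas (hypothetical structure)."""
    configs: dict = field(default_factory=dict)   # ensemble name -> list of config names
    replicas: dict = field(default_factory=dict)  # config name -> list of file URLs

    def add_config(self, ensemble: str, config: str) -> None:
        self.configs.setdefault(ensemble, []).append(config)

    def add_replica(self, config: str, url: str) -> None:
        self.replicas.setdefault(config, []).append(url)

    def locate(self, ensemble: str) -> dict:
        """Map every config of an ensemble to its known physical copies."""
        return {c: list(self.replicas.get(c, [])) for c in self.configs.get(ensemble, [])}


cat = Catalogue()
cat.add_config("mc://example/ens-A", "lfn://example/ens-A/cfg-0010")
cat.add_replica("lfn://example/ens-A/cfg-0010", "gsiftp://site1.example.org/data/cfg-0010")
cat.add_replica("lfn://example/ens-A/cfg-0010", "gsiftp://site2.example.org/data/cfg-0010")
copies = cat.locate("mc://example/ens-A")
```

The metadata refers only to the unique names; which physical copy is fetched is decided at the replica level.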

Algorithm
General scheme too complex
Algorithmic MD can belong to ensemble or config
Either name-value pairs
Or import another schema
  – Lives in separate namespace

Namespaces
Allow another namespace to be imported here
Application processing QCDml can ignore this namespace
Can include all metadata into XML IDs
Local applications can be alg schema aware, but ignore non-local ones

Alg Namespace example
Imported namespace has its own schema
Imported schema is not in default namespace, so has a prefix
All elements belonging to this namespace use this prefix
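
The slide's example is not in the transcript, so here is a minimal sketch of the idea with made-up names. Both namespace URIs, the `alg` prefix and every element name are assumptions for illustration. It shows a generic application keeping only the default-namespace elements, while a "local" application also reads the prefixed ones.

```python
import xml.etree.ElementTree as ET

QCDML = "http://example.org/QCDml/config"   # hypothetical default namespace
ALG = "http://example.org/local/algorithm"  # hypothetical imported namespace

doc = f"""<gaugeConfiguration xmlns="{QCDML}" xmlns:alg="{ALG}">
  <update>7</update>
  <algorithm>
    <alg:integrator>leapfrog</alg:integrator>
    <alg:stepSize>0.01</alg:stepSize>
  </algorithm>
</gaugeConfiguration>"""


def split_tag(tag: str) -> tuple:
    """Split ElementTree's '{namespace}local' tag into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


def elements_in(root: ET.Element, namespace: str) -> list:
    """Local names of all elements belonging to `namespace`, in document order."""
    return [split_tag(el.tag)[1] for el in root.iter() if split_tag(el.tag)[0] == namespace]


root = ET.fromstring(doc)
generic_view = elements_in(root, QCDML)  # what a namespace-unaware consumer keeps
local_view = elements_in(root, ALG)      # extra detail for an alg-aware application
```

The prefix itself carries no meaning; elements are matched on the namespace URI it is bound to, which is why `split_tag` works on URIs rather than on the string `alg`.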

Data format
ILDG specified format
  – All gauge configurations in the same format
Based on NERSC data layout
SciDAC LIME records
UKQCD perspective
  – CPS cannot do this :^(
  – Chroma can :^)
  – qdp++ tools exist for conversion etc

How to generate QCDml
Ensemble MD requires a human
  – Use schema aware tool (cmm uses XMLspy)
  – Take existing XML ID and hack
  – Not that hard, only once per ensemble
Config XML
  – Post-processing on QCDOC
  – Example of DWF data

DWF data
CPS on QCDOC writes
  – Data in NERSC format
  – VML files containing MD
  – Parameters of objects
  – Effective check-pointing
  – Data stored with meaningful path- and filenames
  – Includes binary and source code version
Satisfies important constraint
  – Recreate data from Metadata :^)

From one scheme to another
Series of scripts and utilities can do conversion
  – qdp++
  – cmm
  – /host/cmaynard/tools/scripts
Utilities compiled for qcdocx
  – Conversion and submit to grid

Scripts
Data conversion and XML chunks are built by scripts
makeQCDmlConfig.sh glues XML together
dataSet.sh contains dataset specifications
  – Plus sundries

Chris Maynard ILDG 13 December
Problems with XML
Lattice QCD (meta)data is really mathematics
  – XML is not really ideal for storing this data
QCDml has defined common names for etc
  – Even WilsonAction has more than one common usage
  – Kappa versus mass
Algorithm metadata is too complex for common names
  – Not really defined in the metadata
  – Unstructured parameter values included
This is OK because
  – an ensemble is defined by the action
  – not the algorithm used to generate it
Extending to propagators and correlators is hard for the same reason as defining the algorithm

We need an application
XML does not DO anything
For it to be useful we need to do something with it!
  – What do we want to do with it?
  – Is QCDml good for this purpose?
  – QCDml design focused on searching the metadata catalogue
  – This was probably a good idea!
XPath used to query XML databases
  – Basic tools/APIs exist for constructing queries
  – Cf. UKQCD DiGS GUI browser, LDG web-client and JLDG faceted navigation application
Metadata capture
  – How do we create XML IDs?
  – Does any application actually write QCDml?
  – UKQCD does post-processing
Data provenance
  – Does QCDml provide this?
Hard work
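
To give a flavour of catalogue searching with XPath, here is a minimal sketch using the XPath subset built into Python's `xml.etree.ElementTree`. The catalogue layout, element names and values are invented for illustration. A real ILDG metadata catalogue is an XML database queried through its own service, not an in-memory tree like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical flattened catalogue; real QCDml documents are far more nested.
catalogue = ET.fromstring("""<catalogue>
  <ensemble name="ens-A"><action>wilson</action><beta>5.2</beta></ensemble>
  <ensemble name="ens-B"><action>wilson</action><beta>5.29</beta></ensemble>
  <ensemble name="ens-C"><action>dwf</action><beta>2.13</beta></ensemble>
</catalogue>""")


def ensemble_names(xpath: str) -> list:
    """Names of all ensembles selected by an ElementTree-style XPath expression."""
    return [e.get("name") for e in catalogue.findall(xpath)]


# [child='text'] keeps elements whose named child has exactly that text.
wilson = ensemble_names("./ensemble[action='wilson']")
# Predicates can be chained to narrow the search.
wilson_52 = ensemble_names("./ensemble[action='wilson'][beta='5.2']")
```

The predicates compare text, not numbers, so `beta='5.2'` would not match a document that stores `5.20`. That is a small instance of the point that what the metadata means still has to be agreed on outside the XML.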

What next?
QCDml seems to work OK
  – How much is it being used?
We don’t have many applications that DO something with it
CMM’s questions for ILDG
  – What do we want to do with metadata?
  – Do we have the right sort of metadata for this?
  – What tools or applications do we need?
Someone then has to build them: if we don’t ask, we don’t get!
Can we review QCDml usage to define what tools we need?

Work flow tools
Graphical tools for linking work together
  – components could be …
  – Machine job submission tasks
  – The actual MC code
  – data logistics
Now used by many areas of science
  – particle physics experiments
  – chemistry
Examples
  – Unicore
  – “my experiment”

Topics for Discussion
Technical
  – tools
  – Metadata capture
  – Data conversion
  – Use of QCDml
  – Data curation
  – Data provenance
Sociological
  – Encouraging other groups to join
  – ILDG paper
  – Funding

Finally