METS - API application programming interface

1 METS - API application programming interface
METS Implementors Meeting, May 8th, 2007
Markus Enders, SUB Göttingen; Jens Ludwig, SUB Göttingen

2 Why? Necessity of an API: There are already lots of programming interfaces to parse / create XML data, e.g. DOM or SAX, so what is an API good for?

3 Why? METS has a complex data model:
The most common instantiation of METS is its XML form, but an API should be based on the data model and (theoretically) be independent of its XML representation. The complex data model consists of different objects which are related to each other. This data model can take other forms as well, e.g. related database models. An API could provide access to METS data stored in databases for better / faster processing. This matters especially for bigger METS files: files with a large amount of technical metadata for preservation, for example, may easily be 1 or 2 MB in size.

4 Why? The API should focus on METS elements and their attributes and relationships. The API should support the creation of METS as well. Working with DOM can easily produce invalid METS files: DOM works at the XML level, so to create a METS element you need to create several different kinds of nodes and set their relationships manually. Creating invalid data (e.g. elements in the wrong order) should not be possible: 100% valid METS data.
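The DOM problem can be illustrated with the JDK's own DOM API. The sketch below (class and method names are hypothetical, not part of METSbeans) appends <structMap> before <fileSec> inside <mets:mets>. The METS schema requires the opposite order, yet DOM raises no error.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomOrderDemo {
    static final String METS_NS = "http://www.loc.gov/METS/";

    /** Builds a <mets:mets> element whose children violate the METS schema order. */
    static Document buildMisorderedMets() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element mets = doc.createElementNS(METS_NS, "mets:mets");
            doc.appendChild(mets);
            // The METS schema requires <fileSec> before <structMap>;
            // DOM accepts the children in any order without complaint.
            mets.appendChild(doc.createElementNS(METS_NS, "mets:structMap"));
            mets.appendChild(doc.createElementNS(METS_NS, "mets:fileSec"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Local names of the element children of the root element, in document order. */
    static List<String> childOrder(Document doc) {
        List<String> names = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getLocalName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childOrder(buildMisorderedMets())); // [structMap, fileSec]
    }
}
```

A schema-derived API avoids this class of error by only offering methods that produce the children in the order the schema allows.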

5 Why? Multi-Tier Applications:
The API connects the application with the serialization level: the API as a framework for METS creation / parsing.

6 Why?
Diagram: Application, METS API, and the storage layer below it (XML, Repository, Database).

7 Implementation Issues:
Maintenance: changes in the METS schema must be reflected by the API; the API should be easy to maintain. Programming language: more than one language should be supported; several options are possible. Multi-level access (granularity of access): the API should give access to the lowest METS level, i.e. the elements and attributes, but also needs to provide higher-level functionality (e.g. adding metadata to a file or a structural element).

8 Implementation Issues:
Maintenance: derive the classes from the XML schema; e.g. Apache XMLBeans or Sun JAXB generate Java classes from an XML schema.

9 Implementation Issues:
Programming language: options include the PHP-Java bridge for PHP and the Inline::Java module for Perl.

10 Implementation Issues:
Granularity of access: access to single elements / attributes, plus a higher level for more widespread functionality.

11 Implementation Issues:
An Apache XMLBeans-based API for Java creates an interface for each schema object and an implementation to read / write this object to XML; other implementations (e.g. for a repository) are possible. The XMLBeans implementation only allows access to XML-based data (XML files). XMLBeans is an Apache project; the API integrates easily with other XMLBeans-based APIs, and if these APIs are not available, they can be generated from an XML schema. As the API is built on top of the XML schema, which represents the METS data model, implementations to access METS data in databases etc. are possible as well. A DOM tree can be created at any time, e.g. if XML data without a schema needs to be stored.

12 Implementation Issues:
Level one: METSbeans (an XMLBeans-based API for Java) allow access to single METS elements, attributes and their relationships. Level two: more complex functions based on the METSbeans.

13 METSbeans: every type from the schema becomes one class
Classes are generated automatically from the XML schema. The generation process is very easy; it only needs an XML schema. Additional APIs can be generated and integrated for any XML-schema-based data format (e.g. MODS, PREMIS etc.).

14 METSbeans internal architecture:
For every type in the XML schema an appropriate Java interface exists, and every interface is implemented during the automatic generation process. Additional implementations of an interface are possible, which gives high flexibility to access METS data outside a file system.

15 METSbeans internal architecture:
Example of how the <div> element is mapped to Java: <xsd:complexType name="divType"> becomes the interface DivType and the class DivTypeImpl. divType becomes DivType because of Java coding conventions: class names always begin with a capital letter.
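The interface / implementation split can be sketched in plain Java. This is a hand-written simplification, not the code XMLBeans generates: the real DivType has many more methods, and the generated DivTypeImpl stores its values in an XML store. MapBackedDivType is a hypothetical second implementation, standing in for the additional implementations (e.g. database access) mentioned on slide 14.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for the interface generated from <xsd:complexType name="divType">. */
interface DivType {
    String getTYPE();
    void setTYPE(String type);
}

/** Simplified implementation holding the attribute in a field (the generated class writes XML). */
class DivTypeImpl implements DivType {
    private String type;

    public String getTYPE() { return type; }
    public void setTYPE(String type) { this.type = type; }
}

/** Hypothetical alternative implementation, e.g. backed by a database row (here: a Map). */
class MapBackedDivType implements DivType {
    private final Map<String, String> row = new HashMap<>();

    public String getTYPE() { return row.get("TYPE"); }
    public void setTYPE(String type) { row.put("TYPE", type); }
}

class DivTypeDemo {
    /** Application code depends only on the interface, not on the storage behind it. */
    static String describe(DivType div) {
        return "div of type " + div.getTYPE();
    }

    public static void main(String[] args) {
        DivType a = new DivTypeImpl();
        DivType b = new MapBackedDivType();
        a.setTYPE("Monograph");
        b.setTYPE("Monograph");
        System.out.println(describe(a).equals(describe(b))); // true
    }
}
```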

16 METSbeans internal architecture:
XMLBeans has a set of native data types: XmlObject, XmlString, XmlShort, XmlTime etc. These native data types are good for building interfaces and for combining different APIs (e.g. the METS API with a MODS API; see the examples later on).

17 METSbeans internal architecture:
MetsDocument, the topmost class, represents the whole document; no other object can be created without it. An instance can be created by parsing a file or by using a factory class to create a new document.

18 METSbeans snippet: MetsDocument
Example, factory class:
    MetsDocument mets = MetsDocument.Factory.newInstance();
Example, parsing a file:
    File f = new File(filename);
    XmlObject xml;
    try {
        xml = XmlObject.Factory.parse(f);
    } catch (XmlException | IOException e) { // parse(File) throws both
        e.printStackTrace();
        return false;
    }
    MetsDocument metsDoc = (MetsDocument) xml;

19 METSbeans DivType: methods for accessing the <mptr> element
getMptrArray(), getMptrArray(int i), sizeOfMptrArray(), setMptrArray(Mptr[] mptrArray), setMptrArray(int i, Mptr mptr), insertNewMptr(int i), addNewMptr(), removeMptr(int i)
insertNewMptr and addNewMptr create a new Mptr object.

20 METSbeans DivType: methods for accessing <div> element
Very similar to the <mptr> element: getDivArray(), getDivArray(int i), sizeOfDivArray(), setDivArray(DivType[] divArray), setDivArray(int i, DivType div), insertNewDiv(int i), addNewDiv(), removeDiv(int i)
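The accessor family above can be imitated with a plain list to show what each method does. SimpleDiv is a hypothetical class, not generated code: it is backed by an ArrayList, whereas the XMLBeans methods operate on the underlying XML store and copy values when setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical imitation of the accessors generated for a repeating child element (<div> in <div>). */
class SimpleDiv {
    private final List<SimpleDiv> divs = new ArrayList<>();
    private String type;

    String getTYPE() { return type; }
    void setTYPE(String type) { this.type = type; }

    /** All child divs, as a new array. */
    SimpleDiv[] getDivArray() { return divs.toArray(new SimpleDiv[0]); }
    /** The i-th child div. */
    SimpleDiv getDivArray(int i) { return divs.get(i); }
    int sizeOfDivArray() { return divs.size(); }

    /** Replaces all child divs. */
    void setDivArray(SimpleDiv[] divArray) {
        divs.clear();
        divs.addAll(Arrays.asList(divArray));
    }

    /** Replaces the i-th child div. */
    void setDivArray(int i, SimpleDiv div) { divs.set(i, div); }

    /** Creates an empty child div at position i and returns it. */
    SimpleDiv insertNewDiv(int i) {
        SimpleDiv d = new SimpleDiv();
        divs.add(i, d);
        return d;
    }

    /** Creates an empty child div at the end and returns it. */
    SimpleDiv addNewDiv() {
        SimpleDiv d = new SimpleDiv();
        divs.add(d);
        return d;
    }

    void removeDiv(int i) { divs.remove(i); }
}
```

For example, calling addNewDiv().setTYPE("Chapter") and then insertNewDiv(0).setTYPE("TitlePage") leaves the title page as the first of two children.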

21 METSbeans DivType: very similar methods for handling file pointers (<fptr> elements)

22 METSbeans DivType: methods to set attributes (ID attribute)
getID(); isSetID(); setID(String id); unsetID(); xsetID(org.apache.xmlbeans.XmlID id); xgetID();
The example shows the ID attribute; methods for other attributes like ORDER are very similar. xsetID and xgetID work with native XMLBeans data types instead of native Java data types.
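The get / isSet / set / unset pattern for an optional attribute can be sketched as follows. OptionalIdDiv is hypothetical and keeps the value in a field; a null field stands for "attribute not present in the XML".

```java
import java.util.Objects;

/** Hypothetical imitation of the accessors generated for an optional attribute (here: ID). */
class OptionalIdDiv {
    private String id; // null means the attribute is absent

    /** Returns the attribute value, or null if the attribute is absent. */
    String getID() { return id; }

    /** True if the attribute is present, even when its value is empty. */
    boolean isSetID() { return id != null; }

    /** Sets the attribute; use unsetID() to remove it instead of passing null. */
    void setID(String id) { this.id = Objects.requireNonNull(id, "use unsetID() to remove"); }

    /** Removes the attribute entirely. */
    void unsetID() { id = null; }
}
```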

23 METSbeans snippet: create a new <div> element
First we create a document, then a <mets> element. addNewStructMap creates an empty (new) <structMap> element, addNewDiv creates an empty (new) <div> element, and setTYPE sets the TYPE attribute of the <div> element. Finally we create a new <div> element as a child and set its type.
    MetsDocument mets = MetsDocument.Factory.newInstance();
    MetsType myMets = mets.addNewMets();
    StructMapType sm = myMets.addNewStructMap();
    DivType div = sm.addNewDiv();
    div.setTYPE("Monograph");
    DivType firstchild = div.addNewDiv();
    firstchild.setTYPE("TitlePage");
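For comparison, here is a sketch of the same structure built with the JDK's standard DOM API (not part of the slides; the class name is hypothetical). Every element, its namespace and prefix, each attribute and each parent / child link has to be handled by hand, and nothing checks the result against the METS schema.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomStructMapDemo {
    static final String METS_NS = "http://www.loc.gov/METS/";

    /** Builds mets/structMap/div[TYPE=Monograph]/div[TYPE=TitlePage] with plain DOM. */
    static Document build() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            // Each element needs the METS namespace and a prefix, given explicitly...
            Element mets = doc.createElementNS(METS_NS, "mets:mets");
            Element structMap = doc.createElementNS(METS_NS, "mets:structMap");
            Element div = doc.createElementNS(METS_NS, "mets:div");
            Element child = doc.createElementNS(METS_NS, "mets:div");
            // ...attributes are plain strings...
            div.setAttribute("TYPE", "Monograph");
            child.setAttribute("TYPE", "TitlePage");
            // ...and the tree is wired together one link at a time.
            doc.appendChild(mets);
            mets.appendChild(structMap);
            structMap.appendChild(div);
            div.appendChild(child);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```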

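24 METSbeans snippet: saving a METS document
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.xmlbeans.XmlOptions;

    Map<String, String> suggestedPrefixes = new HashMap<>();
    suggestedPrefixes.put("http://www.loc.gov/METS/", "mets");      // METS namespace
    suggestedPrefixes.put("http://www.w3.org/1999/xlink", "xlink"); // XLink namespace
    XmlOptions opts = new XmlOptions();
    opts.setSaveSuggestedPrefixes(suggestedPrefixes);
    File outputFile = new File(filename);
    mets.save(outputFile, opts); // throws IOException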

25 METSbeans MdSecType
MdSecType represents the METS elements <dmdSec>, <techMD>, <digiprovMD>, <rightsMD> and <sourceMD>, but not <amdSec>: <amdSec> is of type AmdSecType, as it may contain different objects. An MdSecType may contain only MdRef and MdWrap objects.

26 METSbeans snippet: create an MdSecType object
First we create a <dmdSec> element for the MetsType instance and add the ID attribute to it. Then we add the <mdWrap> and <xmlData> elements. Finally we add an object to <xmlData>; this object can only be an XmlObject (native XMLBeans type), which can also be an XmlString.
    MetsDocument mets = MetsDocument.Factory.newInstance();
    MetsType myMets = mets.addNewMets();
    MdSecType dmdSec = myMets.addNewDmdSec();
    dmdSec.setID("DMDID01");
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xmldata = mdwrap.addNewXmlData();
    xmldata.set(modsObject); // any XmlObject, e.g. an XmlString

27 METSbeans snippet: create an MdSecType object (String / Document)
String:
    XmlString xs = XmlString.Factory.newValue("<mydata/>");
    xmldata.set(xs);
XmlString is derived from XmlObject. We create an XmlString object using a factory class; it must contain at least well-formed XML. This method can be used if no schema is available for the metadata. If a schema is available and is fairly complex, it is recommended to use a separate XMLBeans-based API to generate (or parse) this metadata. This approach is shown in the second example: a ModsDocument is created and added to the <xmlData> element (as an XmlObject).
Document:
    ModsDocument modsObject = ModsDocument.Factory.newInstance();
    ModsType myMods = modsObject.addNewMods();
    IdentifierType identifier = myMods.addNewIdentifier();
    ....
    xmldata.set(modsObject);

28 METSbeans: parse METS data. The API provides several parse methods:
parse(java.lang.String xmlAsString), parse(java.io.File file), parse(java.net.URL u), parse(java.io.InputStream is), parse(org.w3c.dom.Node node)
They parse a string, a file, a URL, a stream or even a node from a DOM tree. If the parsed data is not well-formed XML, or (with MetsDocument.Factory.parse) is not a METS document, an XmlException is thrown; full schema validity has to be checked separately with validate().
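The distinction between parsing and validating also exists in the JDK's own XML APIs, which makes it easy to demonstrate without XMLBeans. The sketch below uses a toy schema (an assumption, not the METS schema) in which <mets> holds an optional <fileSec> followed by a <structMap>; a document with the two children swapped parses fine but fails validation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ParseVersusValidate {
    /** Toy schema (not METS): <mets> holds an optional <fileSec> followed by a <structMap>. */
    static final String TOY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='mets'><xs:complexType><xs:sequence>"
        + "<xs:element name='fileSec' minOccurs='0'/>"
        + "<xs:element name='structMap'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    /** Parsing only checks well-formedness. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Validation against a schema is a separate step. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(TOY_XSD)));
            // Without an ErrorHandler the Validator throws on the first schema error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, "<mets><structMap/><fileSec/></mets>" is well-formed but not valid, while "<mets><fileSec/><structMap/></mets>" is both.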

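29 METSbeans snippet: parse METS data
Create a File object, parse it, catch the exceptions and do a type cast at the end:
    import java.io.File;
    import java.io.IOException;
    import org.apache.xmlbeans.XmlException;
    import org.apache.xmlbeans.XmlObject;

    File f = new File(filename);
    XmlObject xml;
    try {
        xml = XmlObject.Factory.parse(f);
    } catch (XmlException | IOException e) {
        // leave the method here; otherwise xml would be unassigned below
        e.printStackTrace();
        return null;
    }
    // throws ClassCastException if the file is not a METS document
    MetsDocument metsDoc = (MetsDocument) xml;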

30 METSbeans snippet: get a DivType
Get the <mets> element from the MetsDocument, create an array of all <structMap> elements and iterate over this array. In the loop we get the value of the TYPE attribute and compare it with "LOGICAL"; if it matches, we get the (one and only) uppermost <div> and return its type.
    MetsDocument metsDoc = (MetsDocument) xml;
    MetsType mets = metsDoc.getMets();
    StructMapType[] structs = mets.getStructMapArray();
    for (int i = 0; i < structs.length; i++) {
        StructMapType struct = structs[i];
        String structtype = struct.getTYPE();
        if ((structtype != null) && structtype.equals("LOGICAL")) {
            DivType div = struct.getDiv();
            String divtype = div.getTYPE();
            return divtype;
        }
    }
    return null; // no LOGICAL structMap found
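For comparison with the METSbeans version, a sketch of the same lookup on a plain DOM tree (class and method names are hypothetical). Namespace handling and the search for the first <div> child, which METSbeans provide through getStructMapArray() and getDiv(), have to be written out explicitly.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class LogicalDivLookup {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS / getLocalName
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse METS data", e);
        }
    }

    /** TYPE of the top <div> in the first structMap with TYPE="LOGICAL", or null. */
    static String topLogicalDivType(Document doc) {
        NodeList maps = doc.getElementsByTagNameNS(METS_NS, "structMap");
        for (int i = 0; i < maps.getLength(); i++) {
            Element map = (Element) maps.item(i);
            if (!"LOGICAL".equals(map.getAttribute("TYPE"))) {
                continue;
            }
            // The uppermost <div> is the single element child of <structMap>.
            for (Node n = map.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && METS_NS.equals(n.getNamespaceURI())
                        && "div".equals(n.getLocalName())) {
                    return ((Element) n).getAttribute("TYPE");
                }
            }
        }
        return null;
    }
}
```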

31 METSbeans
Advantages: easy to create and parse valid METS data (much easier than working with DOM trees); easy to combine with other XML data; quite fast compared to DOM.
Drawback: as it is based on XMLBeans, it is only available for Java; PHP and Perl need the PHP-Java bridge or the Inline::Java module.

32 Helper-class functions: need for additional high-level functions:
Though the METSbeans allow access to every single METS element, it is still a complex task to do simple things, e.g. adding metadata to a <div>. To add metadata to a <div> we must: create a <dmdSec>; create the appropriate <mdWrap> and <xmlData> elements; create an ID and store it in the ID attribute of the <dmdSec> element; add this ID as an IDREF to the <div> element (DMDID attribute). A helper class is needed which sits on top of the METSbeans.
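The four steps can be sketched with the JDK DOM API, as a rough stand-in for what such a helper would do with METSbeans. The class, its method names and the "DMD" plus number ID scheme are hypothetical. The sketch also inserts the <dmdSec> at a position the METS schema allows (after <metsHdr> and any existing <dmdSec>), which is one more detail a helper has to take care of.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DmdSecHelper {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns prefix + n for the smallest n >= 1 not yet used as an ID attribute. */
    static String unusedId(Document doc, String prefix) {
        Set<String> used = new HashSet<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            used.add(((Element) all.item(i)).getAttribute("ID"));
        }
        int n = 1;
        while (used.contains(prefix + n)) {
            n++;
        }
        return prefix + n;
    }

    /** Wraps metadata in dmdSec/mdWrap/xmlData, links it from div via DMDID, returns the new ID. */
    static String attachDmdSec(Document doc, Element div, Element metadata, String mdType) {
        String id = unusedId(doc, "DMD");
        Element dmdSec = doc.createElementNS(METS_NS, "mets:dmdSec");
        dmdSec.setAttribute("ID", id);
        Element mdWrap = doc.createElementNS(METS_NS, "mets:mdWrap");
        mdWrap.setAttribute("MDTYPE", mdType); // MDTYPE is required on <mdWrap>
        Element xmlData = doc.createElementNS(METS_NS, "mets:xmlData");
        xmlData.appendChild(doc.importNode(metadata, true)); // copy the metadata into this document
        mdWrap.appendChild(xmlData);
        dmdSec.appendChild(mdWrap);

        // <dmdSec> must follow <metsHdr> and earlier <dmdSec>s and precede everything else.
        Element mets = doc.getDocumentElement();
        Node ref = mets.getFirstChild();
        while (ref != null && (ref.getNodeType() != Node.ELEMENT_NODE
                || "metsHdr".equals(ref.getLocalName()) || "dmdSec".equals(ref.getLocalName()))) {
            ref = ref.getNextSibling();
        }
        mets.insertBefore(dmdSec, ref); // a null ref appends at the end

        // DMDID is an IDREFS attribute: a space-separated list of IDs.
        String old = div.getAttribute("DMDID");
        div.setAttribute("DMDID", old.isEmpty() ? id : old + " " + id);
        return id;
    }
}
```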

33 Helper-class Functions:
The following examples come from experience working with METSbeans (and are based on METSbeans). There is no official implementation; this is just an excerpt of functions which a level-two API could provide.

34 Helper-class functions: create a <dmdSec> for common METS objects:
These functions would create a <dmdSec> for a structural entity or a file. An XmlObject can be an XmlString containing XML data or an XMLBeans document (e.g. a ModsDocument).
createDMDSec(XmlObject inMetadata, DivType inDiv)
createDMDSec(XmlObject inMetadata, FileType inFile)
...

35 Helper-class Functions:
Create administrative metadata for common METS objects: the function creates e.g. a <techMD> or a <sourceMD> section, according to the type parameter. inAmdSec specifies the <amdSec> to use; if it is null, a new <amdSec> is created. The function might create an ID attribute and add its value to the <div> element specified by inDiv. The data of inMetadata is added to this section.
createMDSectionInAMDSec(XmlObject inMetadata, String type, DivType inDiv, AmdSecType inAmdSec)
...

36 Helper-class Functions:
Functions to retrieve particular metadata sections by ID or TYPE: they retrieve the appropriate MdSecType object by ID or TYPE (according to its attribute values). While parsing a METS file, these functions allow easy access to the metadata (for <div> or <file> elements).
getMDSecTypeByID(String inID)
getMDSecTypeByType(String inType)
...
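A DOM-based sketch of such lookups (hypothetical names; the slides do not show an implementation). The by-ID lookup searches all five elements of type mdSecType. The slide does not say which attribute TYPE refers to, so getMdSecsByKind interprets it as the section kind (e.g. "techMD"); that interpretation is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MdSecLookup {
    static final String METS_NS = "http://www.loc.gov/METS/";

    /** METS elements whose type is mdSecType. */
    static final Set<String> MD_SECTIONS = new HashSet<>(Arrays.asList(
        "dmdSec", "techMD", "rightsMD", "sourceMD", "digiprovMD"));

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** The metadata section with the given ID attribute, or null. */
    static Element getMdSecById(Document doc, String id) {
        NodeList all = doc.getElementsByTagNameNS(METS_NS, "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (MD_SECTIONS.contains(e.getLocalName()) && id.equals(e.getAttribute("ID"))) {
                return e;
            }
        }
        return null;
    }

    /** All metadata sections of one kind, e.g. "techMD" (assumed meaning of TYPE). */
    static List<Element> getMdSecsByKind(Document doc, String kind) {
        List<Element> out = new ArrayList<>();
        NodeList all = doc.getElementsByTagNameNS(METS_NS, kind);
        for (int i = 0; i < all.getLength(); i++) {
            out.add((Element) all.item(i));
        }
        return out;
    }
}
```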

37 Helper-class Functions:
Functions to get related files (of a <div> element): getAllFilesForDivType follows the file pointers of a <div> and returns all FileType objects (as an array); getAllFilesForFileGroup returns all files of a file group and its underlying file groups (recursively).
getAllFilesForDivType(DivType inDiv)
getAllFilesForFileGroup(FileGrpType inGrp)
...
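Both functions can be sketched on a DOM tree (hypothetical DOM analogues of the proposed helpers, not the authors' code). The file-group version recurses into nested <fileGrp> elements; the <div> version resolves the FILEID of each direct <fptr> child. Nested <file> elements and FILEIDs inside <area>, <seq> or <par> are deliberately ignored to keep the sketch short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FileLookup {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** All <file> elements of a <fileGrp>, descending into nested <fileGrp>s. */
    static List<Element> getAllFilesForFileGroup(Element fileGrp) {
        List<Element> files = new ArrayList<>();
        collect(fileGrp, files);
        return files;
    }

    private static void collect(Element grp, List<Element> files) {
        for (Node n = grp.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !METS_NS.equals(n.getNamespaceURI())) {
                continue;
            }
            if ("file".equals(n.getLocalName())) {
                files.add((Element) n);
            } else if ("fileGrp".equals(n.getLocalName())) {
                collect((Element) n, files); // recurse into the nested group
            }
        }
    }

    /** Files referenced by the FILEID of the direct <fptr> children of a <div>. */
    static List<Element> getAllFilesForDiv(Document doc, Element div) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagNameNS(METS_NS, "file");
        for (int i = 0; i < all.getLength(); i++) {
            Element f = (Element) all.item(i);
            byId.put(f.getAttribute("ID"), f);
        }
        List<Element> files = new ArrayList<>();
        for (Node n = div.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "fptr".equals(n.getLocalName())) {
                Element target = byId.get(((Element) n).getAttribute("FILEID"));
                if (target != null) {
                    files.add(target);
                }
            }
        }
        return files;
    }
}
```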

38 Extension schema
Integration of extension schemas: export METSbeans objects as a DOM tree, or create beans for the extension schemas as well: PREMIS, MODS, MIX beans.

39 Extension schema Example: create MODS data
    MdSecType dmdSec = mets.addNewDmdSec();
    dmdSec.setID(dmdid_string);
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xml = mdwrap.addNewXmlData();
    ModsDocument mods = ModsDocument.Factory.newInstance();
    ModsType myMods = mods.addNewMods();
    xml.set(mods);

40 Extension schema Example: create <premis:object> data
    MdSecType.MdWrap mdwrap = dmdSec.addNewMdWrap();
    MdSecType.MdWrap.XmlData xml = mdwrap.addNewXmlData();
    ObjectDocument objdoc = ObjectDocument.Factory.newInstance();
    ObjectDocument.Object premis_object = objdoc.addNewObject();
    xml.set(objdoc);

41 Extension schema Example: parse MODS data
    MdSecType dmdSec;
    ....
    MdSecType.MdWrap mdw = dmdSec.getMdWrap();
    MdSecType.MdWrap.XmlData xml_data = mdw.getXmlData();
    String result = xml_data.xmlText();
    ModsDocument mods = ModsDocument.Factory.parse(result); // throws XmlException

42 Problems?! Quality of the API
The API depends on the XML schema, so the quality of the API depends on the quality of the schema. There are types such as MetsType for <mets>, DivType for <div>, MdSecType for <dmdSec>, ..., but there is no type for the METS header <metsHdr>, as it is defined inline.

43 Problems?! Integration of extension schemas
This is problematic if an extension schema does not have a top-level element; parsing is especially difficult:
    String result = xml_data.xmlText();
    ModsDocument mods = ModsDocument.Factory.parse(result);
result must always contain a complete XML document with a single root element, which is not the case e.g. for simple Dublin Core.
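The underlying issue is that a sequence of sibling elements is not a well-formed XML document. The sketch below shows this with the JDK parser on two simple Dublin Core elements; the class, the helper and the "wrapper" element name are hypothetical, and wrapping the fragment in a container element is just one possible workaround, not one proposed on the slides.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FragmentParsing {
    /** True if xml parses as a complete document (exactly one root element). */
    static boolean parsesAsDocument(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // report fatal errors as exceptions only
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical workaround: wrap the fragment in a container element before parsing. */
    static String wrap(String fragment) {
        return "<wrapper>" + fragment + "</wrapper>";
    }
}
```

Two sibling dc:* elements fail to parse on their own but parse once wrapped; with XMLBeans the wrapper would then have to be stripped again or covered by a schema.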

44 How to continue: work with METSbeans
Anyone can generate METSbeans themselves (see Apache XMLBeans); they are also downloadable from the GDZ website. We will provide a primer as (incomplete) documentation for METSbeans; a primer is necessary, as no Javadoc is available.

45 How to continue Identify necessary functions for helper-class
Over time we will identify additional methods which might be useful and should be integrated into the helper class.

46 Application Layer can be built on top of METSbeans
Profile-specific implementations can be built on top of METSbeans and provide an API to the underlying document / content model.

47 Application Layer: a METS API can be built on top of METSbeans
Diagram (layers, top to bottom): Applications; API for content model; helper class; METS API; XML serialization.

