FITS: The File Information Tool Set


1 FITS: The File Information Tool Set

2 Background
FITS is part of the second-generation Harvard University Library Digital Repository Service (DRS2), which supports content models and METS/PREMIS object descriptors.
Developed Fall 2008.
First public release Spring 2009.

3 Why?
Needed an automatic way to identify and extract metadata for a wide range of file types.
No single file analysis tool satisfied our needs.

4 Design Goals
Act as a wrapper around other open source tools.
Extensible.
Needs to be a standalone command line tool and also provide an API.
Allow priority setting for tools.
Open source.

5 The Tools
Current tools: Jhove 1.5, Exiftool, National Library of New Zealand Metadata Extractor (NLNZ), DROID, FFIdent, File Utility.
3 Categories:
File Identification (all of them)
Metadata Extraction (Jhove, Exiftool, NLNZ)
Format Validation (Jhove)

6 Process

7 Features
Conflict management.
Value normalization: “inches” vs “2”.
Tool prioritization.
Format tree for understanding more specific format identities: PDF/A is a more specific version of PDF.
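The features on this slide can be illustrated with a small sketch. This is not the FITS consolidator; every name below (`TOOL_PRIORITY`, `NORMALIZE`, `FORMAT_PARENT`, `consolidate`, `most_specific`) is hypothetical, the status labels `AGREED` and `CONFLICT` are made up for illustration, and treating "2" as a numeric resolution-unit code meaning inches is an assumption.

```python
# Hypothetical sketch of the ideas on this slide; not the FITS API.

# Tools in priority order: the earliest tool listed wins a disagreement.
TOOL_PRIORITY = ["Jhove", "Exiftool", "NLNZ"]

# Value normalization table (assumed): raw tool value -> canonical value.
NORMALIZE = {"2": "inches"}

# Format tree: more specific format -> its more general parent.
FORMAT_PARENT = {"PDF/A": "PDF"}


def normalize(value):
    """Map a raw tool value to its canonical form, if one is known."""
    return NORMALIZE.get(value, value)


def consolidate(reports):
    """Merge one field reported by several tools.

    reports maps tool name -> raw value. Returns (value, tool, status),
    or None if no known tool reported the field.
    """
    ranked = [(t, normalize(reports[t])) for t in TOOL_PRIORITY if t in reports]
    if not ranked:
        return None
    best_tool, best_value = ranked[0]
    if len(ranked) == 1:
        status = "SINGLE_RESULT"
    elif len({value for _, value in ranked}) == 1:
        status = "AGREED"      # illustrative label
    else:
        status = "CONFLICT"    # illustrative label
    return best_value, best_tool, status


def is_same_or_more_specific(fmt, other):
    """True if fmt equals other or descends from it in the format tree."""
    while fmt is not None:
        if fmt == other:
            return True
        fmt = FORMAT_PARENT.get(fmt)
    return False


def most_specific(formats):
    """Pick the most specific format among compatible identifications."""
    best = formats[0]
    for fmt in formats[1:]:
        if is_same_or_more_specific(fmt, best):
            best = fmt
    return best
```

With this sketch, a report of "2" from one tool and "inches" from another normalizes to the same value and is not treated as a conflict, and `most_specific(["PDF", "PDF/A"])` prefers the PDF/A identity.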

8 Example Output
<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
      ...
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
    <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
  <metadata>
    <image>
      <height toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">1024</height>
    </image>
  </metadata>
</fits>
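As a rough illustration of how a consumer might read output shaped like this example, the sketch below parses a trimmed copy of it with Python's standard `xml.etree.ElementTree`. The `summarize` helper and the `SAMPLE` string are hypothetical, and the sketch assumes the output carries no XML namespace, as in the slide; real output may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example output (the elided "..." is left out).
SAMPLE = """<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>"""


def summarize(xml_text):
    """Pull the identity, size, and validity flags out of the output."""
    root = ET.fromstring(xml_text)
    identity = root.find("identification/identity")  # first identity block
    return {
        "format": identity.get("format"),
        "mimetype": identity.get("mimetype"),
        "identified_by": [t.get("toolname") for t in identity.findall("tool")],
        "size": int(root.findtext("fileinfo/size")),
        "well_formed": root.findtext("filestatus/well-formed") == "true",
        "valid": root.findtext("filestatus/valid") == "true",
    }


summary = summarize(SAMPLE)
```

Each value element also carries `toolname`, `toolversion`, and `status` attributes, so the same approach can report which tool supplied a given value.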

9 Configuration
All settings are in the fits.xml config file:
Enable/disable tools (available in the API too)
Prevent tools from processing files with specific file extensions
Set tool priority
Add new tools
Use your own consolidator code
Report or ignore conflicts
Options to display original tool output

10 Sample Configuration File
<fits_configuration>
  <!-- Order of the tools determines preference -->
  <tools>
    <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
    <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
    <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
  </tools>
  <output>
    <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
    <display-tool-output>true</display-tool-output>
    <report-conflicts>true</report-conflicts>
    <validate-tool-output>false</validate-tool-output>
    <internal-output-schema>xml/fits_output.xsd</internal-output-schema>
    <external-output-schema>
    <fits-xml-namespace>
  </output>
  <!-- file name of the droid signature file to use in tools/droid/ -->
  <droid_sigfile>DROID_SignatureFile_V35.xml</droid_sigfile>
</fits_configuration>
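To make the two rules in this configuration concrete (tool order sets preference, and `exclude-exts` keeps a tool away from certain extensions), here is a minimal sketch that reads a shortened copy of the `<tools>` section. The helpers `load_tools` and `tools_for` are hypothetical and do not reflect how FITS itself loads fits.xml; case-insensitive extension matching is also an assumption.

```python
import os
import xml.etree.ElementTree as ET

# Shortened copy of the <tools> section from the sample configuration.
CONFIG = """<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
</fits_configuration>"""


def load_tools(xml_text):
    """Return [(class_name, excluded_extensions)] in document order.

    Document order is kept because the order of the tools determines
    preference.
    """
    root = ET.fromstring(xml_text)
    tools = []
    for element in root.findall("tools/tool"):
        raw = element.get("exclude-exts", "")
        excluded = {ext.strip().lower() for ext in raw.split(",") if ext.strip()}
        tools.append((element.get("class"), excluded))
    return tools


def tools_for(filename, tools):
    """List the short names of tools allowed to process filename, in priority order."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return [name.rsplit(".", 1)[-1] for name, excluded in tools
            if extension not in excluded]


TOOLS = load_tools(CONFIG)
```

For example, `tools_for("notes.txt", TOOLS)` skips Exiftool because `txt` is in its exclusion list, while a tool with no `exclude-exts` attribute is offered every file.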

11 Some Limitations...
Speed.
Technical metadata is only returned if the tool that reported it is in the first <identity> block.
FITS considers a successful identification to be a combination of the format name and mime type.
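One way to picture the last two limitations is the sketch below. It is an interpretation, not FITS code: `group_identities` and `reportable_metadata` are hypothetical names, and skipping a report that lacks either the format name or the mime type is an assumption about what "successful identification" implies.

```python
def group_identities(reports):
    """Group tool reports into identity blocks keyed by (format, mimetype).

    reports is a list of (tool, format, mimetype) tuples in tool-priority
    order. Blocks keep the order in which their key was first seen, so the
    first block belongs to the highest-priority tool that identified the file.
    """
    groups = {}
    for tool, fmt, mime in reports:
        if not fmt or not mime:
            continue  # assumed: an identity needs both format and mime type
        groups.setdefault((fmt, mime), []).append(tool)
    return list(groups.items())


def reportable_metadata(groups, metadata):
    """Keep only metadata reported by tools in the first identity block.

    metadata is a list of (tool, field, value) tuples.
    """
    if not groups:
        return []
    first_block_tools = set(groups[0][1])
    return [item for item in metadata if item[0] in first_block_tools]


reports = [
    ("Jhove", "Graphics Interchange Format", "image/gif"),
    ("Exiftool", "GIF", "image/gif"),
    ("DROID", "Graphics Interchange Format", "image/gif"),
]
groups = group_identities(reports)
kept = reportable_metadata(groups, [
    ("Jhove", "height", "1024"),
    ("Exiftool", "width", "768"),
])
```

In this made-up input, Exiftool names the format differently, so it lands in a second identity block and its metadata is dropped, which is the behaviour the slide lists as a limitation.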

12 Future Plans
More tools:
Apache Tika (text document formats)
Jhove 2
Aduna Aperture (text, documents, formats)
Mediainfo (audio and video formats)
Better audio and video format support as we add object support for them to DRS2.

13 Wrap Up
http://fits.googlecode.com
http://ots-schemas.googlecode.com
Java library for reading and writing METS (limited support), MODS, PREMIS, MIX, TextMD, DocumentMD, and soon AES audio metadata.
More information on DRS2: ments.html

