eXtensible Characterisation Languages (XCL), Manfred Thaller (University at Cologne), DPP meeting, Glasgow, Nov. 23rd 2006.


1 eXtensible Characterisation Languages (XCL)
Manfred Thaller (University at Cologne)
DPP meeting, Glasgow, Nov. 23rd 2006

2 Vision:

3 Vision:

4 Vision:

5 Vision:

6 Vision:

7 Questions …
1. Is all information contained within oldFormat also contained within newFormat?

8 Questions …
1. Is all information contained within oldFormat also contained within newFormat?
2. Is all information within oldFormat that is relevant for the usage of the information also contained within newFormat?

9 Questions …
1. Is all information contained within oldFormat also contained within newFormat?
2. Is all information within oldFormat that is relevant for the usage of the information also contained within newFormat?
3. Is the conversion process a(oldFormat, newFormat) better than b(oldFormat, newFormat), i.e. does it preserve more of the information contained within oldFormat?

10 Building Block I: XCEL
A language that allows a program to read "any file specification", based on an "eXtensible Characterisation Extraction Language".
Formulate the human-readable specifications of TIFF, RTF, WAV … in a language that a general-purpose program can read.
General enough that any existing format specification can be expressed in it (LaTeX, MAX, VRML …).

11 XCEL – Structuring Elements
range, item, subitem, item, symbol, property

12 XCEL – Structuring Elements
Byte offsets: 1000, 1248
Truly binary files: most sound and image formats
Binary addressable files: PDF, Max

13 XCEL – Structuring Elements
Procedures: p(begin, trigger), q(trigger, filter, implication)
Encoded / markup files: RTF, TeX, SVG, VRML …

14 XCEL – Structuring Elements
Procedures: p(current_Position, ”). q(“ ”,pair(“ ”,” ”), implyBy(“ ”))
Encoded / markup files: RTF, TeX, SVG, VRML …

15 Building Block II: XCDL
A language that allows a program to describe "any file content", using an "eXtensible Characterisation Definition Language".
Formulate the content of any file in an abstract language that captures the complete information contained in it.
General enough that any existing content can be expressed in it.

16 XCDL: Basic Architecture
1. Sequences of bytes
2. With properties applicable to subsequences

17 XCDL: Basic Architecture
Ashes to Ashes once more
{\rtf1\ansi\ansicpg1252\deff0\deflang1031{\fonttbl{\f0\fswiss\fcharset0 Arial;}}\viewkind4\uc1\pard\f0\fs20 \b Ashes\b0 to \b Ashes\b0 once \b more\b0.\par}

18 XCDL: Basic Architecture
Ashes to Ashes once more.
boldFace: Ashes, more
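
The two-part architecture above (a byte sequence, plus properties that apply to subsequences of it) can be sketched roughly as follows. This is only an illustration in Python, not XCDL syntax; the class names `Property` and `Characterisation` and the helper `mark` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Property:
    """A named property attached to the byte range [start, end)."""
    name: str
    start: int
    end: int


@dataclass
class Characterisation:
    """Hypothetical container: raw bytes plus properties over subsequences."""
    data: bytes
    properties: List[Property] = field(default_factory=list)

    def mark(self, name: str, needle: bytes, from_pos: int = 0) -> int:
        """Attach `name` to the next occurrence of `needle`; return its end offset."""
        start = self.data.index(needle, from_pos)
        end = start + len(needle)
        self.properties.append(Property(name, start, end))
        return end

    def spans(self, name: str) -> List[bytes]:
        """Return the byte subsequences that carry the property `name`."""
        return [self.data[p.start:p.end] for p in self.properties if p.name == name]


# The normalised text of the RTF example, with the three bold runs marked.
doc = Characterisation(b"Ashes to Ashes once more.")
pos = doc.mark("boldFace", b"Ashes")
pos = doc.mark("boldFace", b"Ashes", pos)
doc.mark("boldFace", b"more", pos)
bold_runs = doc.spans("boldFace")
```

Keeping the properties outside the byte sequence means the same text could be produced from RTF or any other format and still be compared run by run.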

19 XCDL: Basic Architecture
Assumption 1: A file format is a set of rules which formalize all knowledge needed to process the binary information contained within a distinct and complete block of binary information, traditionally called a file.

20 XCDL: Basic Architecture
Assumption 2: The extensible characterisation extraction language is designed to be able to express all such rules within a given file format. The extensible characterisation definition language is designed to be able to describe all the information contained within a file, the format of which is described by a valid XCEL description.

21 XCDL: Basic Architecture
Assumption 3: A specific XCEL description is not required to express all the rules within a specific file format. An XCDL derived from such a partial XCEL will, therefore, potentially also contain only part of the information of a file encoded in that format. Even when the XCEL describes a format completely, an extractor is not required to extract all characteristics of a file. Some characteristics are only important for processing: the compression method is no longer important once decompression has succeeded.

22 Building Block III: Metrics

23 Building Block III: Metrics
Starting in month 13. However …

24 Metrics: Basic Assumptions
Currently a bottom-up approach: observe characteristics occurring within files … and build name libraries from them.
{"color depth", "# of planes"} => colorDepth
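
A name library of this kind can be pictured as a mapping from the names observed in files to one canonical name. The sketch below is only an assumption about how such a library might be held in memory; `NAME_LIBRARY`, `build_lookup` and `canonical_name` are hypothetical names, and the case-insensitive matching is an added convenience, not something the slide specifies.

```python
# Canonical characteristic name -> names observed in format specifications.
# The single entry is the example from the slide.
NAME_LIBRARY = {
    "colorDepth": {"color depth", "# of planes"},
}


def build_lookup(library):
    """Invert the library so each observed name points to its canonical name."""
    lookup = {}
    for canonical, observed_names in library.items():
        for name in observed_names:
            lookup[name.strip().lower()] = canonical
    return lookup


def canonical_name(observed, lookup):
    """Return the canonical name for an observed name, or None if unknown."""
    return lookup.get(observed.strip().lower())


lookup = build_lookup(NAME_LIBRARY)
print(canonical_name("Color Depth", lookup))
print(canonical_name("# of planes", lookup))
```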

25 Metrics: Basic Assumptions
Later a parallel top-down approach: create a file characteristics ontology … and link it to the name libraries.
"width" in image file != "width" in text file.

26 Metrics: Example I
Percentage of bytes in a binary stream which are preserved within a range of +/- 5 of the original value. (Images: the difference would scarcely be observable on screen.)
E.g. relevant when a colorspace appropriate for printing is transformed into a colorspace optimized for screen.
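
A minimal sketch of this metric, assuming both streams are already decoded to equal-length byte sequences that can be compared position by position. The function name `preserved_within` and the sample values are made up for illustration.

```python
def preserved_within(original: bytes, migrated: bytes, tolerance: int = 5) -> float:
    """Percentage of byte positions whose value differs by at most `tolerance`."""
    if len(original) != len(migrated):
        raise ValueError("streams must have the same length to be compared")
    if not original:
        return 100.0  # nothing to lose in an empty stream
    kept = sum(1 for a, b in zip(original, migrated) if abs(a - b) <= tolerance)
    return 100.0 * kept / len(original)


# Made-up sample data: the differences are 2, 6, 0 and 5, so 3 of 4 bytes qualify.
original = bytes([10, 20, 30, 40])
migrated = bytes([12, 26, 30, 35])
score = preserved_within(original, migrated)
```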

27 Metrics: Example II
Degree to which the font applied recreates the original typesetting characteristics. (Texts: a derived metric from a comparison of font metrics.)

28 Metrics: Problem
The problem is not so much the individual metrics as the summation rules.
An image migration step preserves 98 % of the image bytes within +/- 1 %. It also preserves 5 of 20 (= 25 %) boolean properties (creator, scanning equipment …).
Quality of the migration: (0.98 + 0.25) / 2 = 0.615?

29 Metrics: Problem
Possible solution: weight engineering metrics by "arbitrary" weights derived from PP.
An image migration step preserves 98 % of the image bytes within +/- 1 %. It also preserves 5 of 20 (= 25 %) boolean properties (creator, scanning equipment …).
Quality of the migration: (0.98*w1 + 0.25*w2) / 2 =
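
The summation rule can be tried out with a small sketch. It follows the form of the slide's formula, dividing by the number of metrics; the function name `migration_quality` and the weights 1.5 and 0.5 are purely hypothetical and are not derived from any preservation plan.

```python
import math


def migration_quality(scores, weights=None):
    """Combine per-metric scores as sum(score * weight) / number of metrics."""
    if weights is None:
        weights = [1.0] * len(scores)  # unweighted case
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per score")
    return sum(s * w for s, w in zip(scores, weights)) / len(scores)


scores = [0.98, 0.25]  # byte preservation, boolean properties

unweighted = migration_quality(scores)            # (0.98 + 0.25) / 2
weighted = migration_quality(scores, [1.5, 0.5])  # hypothetical weights

print(round(unweighted, 4), round(weighted, 4))
```

With weights that sum to the number of metrics, as here, the result stays between 0 and 1; dividing by the sum of the weights instead would give an ordinary weighted mean.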

31 Thank you!

