
Slide 1: Document Examiner Feature Extraction: Thinned vs Skeletonised Images
Vladimir Pervouchine and Graham Leedham
Forensics and Security Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore

Slide 2: Outline
- Forensic handwriting examination
- The need for accurate stroke extraction
- Thinning based method
- Vector skeletonisation method
- Feature extraction
  - From thinned images
  - From vector skeletons
- Writer classification method
- Results
- Conclusions

Slide 3: Forensic handwriting examination
Figure: Variation of the word “the” written by 8 different writers. Source: Harrison, 1981.

Slide 4: Forensic handwriting examination
Figure: Variation of the letters “G” and “R” written by 15 different writers. Source: Harrison, 1981.

Slide 5: Forensic handwriting examination
Figure: Example of variation in letter formation styles in 10 letters from 9 different writers. Source: Harrison, 1981.

Slide 6: Current methods used by forensic document examiners
- Examination primarily involves manual extraction and comparison of various global and local visible features.
- Examiners usually compare a “Questioned Document” against a set of “Known Documents”.
- The objective is to determine whether the “Questioned Document” was, or was not, written by a particular individual.
- The “Questioned Document” may be in disguised handwriting.

Slide 7: Forgery / Disguise / Alteration
(i) Is the writing GENUINE? (the author is who he claims to be)
(ii) Is the writing FORGED? (the author is not who he claims to be and is attempting to assert the writing is the same as someone else’s), or
(iii) Is the writing DISGUISED? (the author wishes to deny doing the writing at a later date), or
(iv) Is the writing ALTERED? (has someone modified or altered the original document?)

Slide 8: Extraction of handwritten strokes from images
- Forensic document examiners analyse the pen tip trajectory.
- The trajectory is not readily available from grayscale handwriting images.
- To mimic the extraction of document examiner features, it is necessary to approximate the pen trajectory.
- We need to preserve the individual information in character shapes.
- Many algorithms have been proposed for a similar problem in offline handwriting recognition, but they do not need to preserve the individual traits of characters.

Slide 9: Thinning based stroke approximation
- Matlab Image Processing Toolbox thinning (Zhang and Suen thinning algorithm) is used for the first approximation.
- Post-processing is applied to
  - remove extra branches
  - remove spurious loops
  - remove small connected components
- Feature extraction attempts to overcome the remaining artifacts.
Flowchart (from the slide diagram): Original image, Binarisation, Thinning, Remove small connected components, Find junction points, Find end points, Correct spurious loops, Prune short branches; the diagram loops back while changes are made.
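As an illustration of the "find junction points" and "find end points" steps, here is a minimal Python sketch. It is not the authors' Matlab code: the crossing-number criterion (counting 0-to-1 transitions around a pixel's 8-neighbour ring) and the names `crossing_number` and `end_and_junction_points` are assumptions chosen for the example, and the input is assumed to be a one-pixel-wide skeleton stored as a list of 0/1 rows.

```python
# Offsets of the 8 neighbours in clockwise order, starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def pixel(img, r, c):
    """Return 1 for a foreground pixel, 0 for background or out of bounds."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return 1 if img[r][c] else 0
    return 0


def crossing_number(img, r, c):
    """Count 0 -> 1 transitions while walking once around the neighbour ring."""
    ring = [pixel(img, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def end_and_junction_points(img):
    """Classify skeleton pixels: 1 branch leaving = end point, 3+ = junction."""
    ends, junctions = [], []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value:
                continue
            n = crossing_number(img, r, c)
            if n == 1:
                ends.append((r, c))
            elif n >= 3:
                junctions.append((r, c))
    return ends, junctions


# A tiny hand-made "T"-shaped skeleton (hypothetical test data).
skeleton = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
ends, junctions = end_and_junction_points(skeleton)
print(ends)       # [(0, 0), (0, 4), (3, 2)]
print(junctions)  # [(0, 2)]
```

The crossing number is used here instead of a plain neighbour count because diagonal neighbours next to a junction would otherwise be reported as extra junctions. Branches shorter than a chosen threshold between an end point and a junction could then be pruned, repeating until nothing changes.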

Slide 10: Thinning based stroke approximation
Figure panels: 1. Original image; 2. Binarised image; 3. Thinned image; 4. Corrected image.

Slide 11: Vector skeletonisation method
- 1st stage: vectorisation. Spline-approximated skeletal branches are formed.
- 2nd stage: the minimum-cost configuration of branch interconnections is found. Branches are grouped into strokes.
  - For each retraced segment of a stroke, restoration of the hidden loop is attempted.
- 3rd stage: near-junction and loop spline knots are adjusted to make the strokes smoother.
Flowchart (from the slide diagram): Original image, Vectorisation, Binary encoding of junction point configuration, GA optimisation to find the configuration with the lowest cost, Adjustment of loop and near-junction knots.

Slide 12: Vector skeletonisation method
Figure panels: 1. Original image; 2. Skeletal branches; 3. Strokes with retraced segments and loops; 4. Adjusted skeleton.

Slide 13: Feature extraction: list of features
Features extracted from both raster and vector skeletons
1. Height
2. Width
3. Height to width ratio
4. Distance HC
5. Distance TC
6. Distance TH
7. Angle between TH and TC
8. Slant of stem of t
9. Slant of stem of h
10. Position of t-bar
11. Connected/disconnected t and h
12. Average stroke width
13. Average pseudo-pressure
14. Standard deviation of average pseudo-pressure
Features extracted from vector skeleton only
15. Standard deviation of stroke width
16. Number of strokes
17. Number of loops and retraced branches
18. Straightness of t-stem
19. Straightness of t-bar
20. Straightness of h-stem
21. Presence of loop at top of t-stem
22. Presence of loop at top of h-stem
23. Maximum curvature of h-knee
24. Average curvature of h-knee
25. Relative size (diameter) of h-knee
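The slides do not say how features 1 to 3 are measured. One plausible reading, sketched below in Python, takes them from the bounding box of the foreground pixels in the binarised image. The function name `bounding_box_features` and the pixel-count convention are assumptions made for this example.

```python
def bounding_box_features(binary):
    """Return (height, width, height/width) of the foreground bounding box.

    `binary` is a list of rows of 0/1 values; sizes are in pixels.
    This bounding-box definition is an assumption, not taken from the slides.
    """
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        raise ValueError("image contains no foreground pixels")
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return height, width, height / width


# Hypothetical 5x6 binarised sample: ink occupies rows 1-3 and columns 1-4.
sample = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
height, width, ratio = bounding_box_features(sample)
print(height, width, ratio)  # 3 4 0.75
```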

Slide 14: Feature extraction
- The position of t-bar feature is binary: 1 if the t-bar crosses the stem, and 0 if it touches the stem, is separated from it, or is missing.
- The size of the h-knee is measured parallel to a horizontal line.
- Pseudo-pressure is measured as the gray level normalised to 1.
- Straightness is measured as the ratio of the stroke length to the distance between its ends.
Figure labels: h-knee, t-stem, h-stem, t-bar.
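The straightness and pseudo-pressure definitions above can be sketched in Python as follows. Several details are assumptions made for the example: the stroke is represented as a polyline of (x, y) points, gray levels are taken to be 8-bit (0 to 255), and the population standard deviation is used for the spread. The slides do not state whether dark ink maps to 0 or to 1, so the gray level is only rescaled, not inverted.

```python
import math
from statistics import mean, pstdev


def straightness(points):
    """Ratio of stroke (polyline) length to the distance between its ends.

    Equals 1.0 for a perfectly straight stroke and grows as the stroke bends.
    Returns infinity for a closed stroke whose ends coincide.
    """
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return math.inf if chord == 0 else length / chord


def pseudo_pressure(gray_levels, max_level=255):
    """Normalise gray levels to the range 0..1 (8-bit input is an assumption)."""
    return [g / max_level for g in gray_levels]


straight_stroke = [(0, 0), (3, 4)]
bent_stroke = [(0, 0), (3, 0), (3, 4)]       # length 7, end-to-end distance 5
print(straightness(straight_stroke))          # 1.0
print(straightness(bent_stroke))              # 1.4

# Hypothetical gray levels sampled along a skeleton.
pressures = pseudo_pressure([0, 255, 51])
avg_pressure = mean(pressures)
std_pressure = pstdev(pressures)
```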

Slide 15: Writer classification scheme
- A constructive ANN with spherical threshold units (DistAl) was used as the classifier.
- 100 samples of the grapheme “th” drawn from 20 different writers.
- The 5-fold cross-validation method is used to evaluate classification accuracy.
- Three experiments:
  - Original feature set (features 1-14), features extracted using raster skeleton
  - Original feature set, features extracted using vector skeleton
  - Extended feature set (features 1-25), features extracted from vector skeleton
- Additionally, the accuracy of feature extraction was measured.
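A minimal Python sketch of 5-fold cross-validation follows. It does not implement DistAl: a nearest-centroid classifier is swapped in purely as a stand-in so that the example runs, and the data are synthetic. All function names (`k_fold_indices`, `cross_validate`, `fit_centroids`, `predict_centroid`) are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, return overall accuracy."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def fit_centroids(xs, ys):
    """Stand-in classifier (NOT DistAl): store the mean vector of each class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, n = sums.get(y, ([0.0] * len(x), 0))
        sums[y] = ([t + v for t, v in zip(total, x)], n + 1)
    return {y: [t / n for t in total] for y, (total, n) in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest in squared distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


# Synthetic, well-separated one-dimensional data for two hypothetical writers.
samples = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
accuracy = cross_validate(samples, labels, fit_centroids, predict_centroid, k=5)
print(accuracy)  # 1.0
```

In the experiments described on the slide, `samples` would hold the 14- or 25-element feature vectors and `labels` the writer identities.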

Slide 16: Results: accuracy of feature extraction
- The extraction software analysed the shape to detect the various parts of the character.
- The analysis was performed step by step.
- At each step some feature was extracted.
- If at least one feature was not extracted or was extracted incorrectly, the sample was counted as a “failure”.
Flowchart (from the slide diagram): Input: original image, binarised image, skeleton; Height, width, height to width ratio; Analysis of branches originating from top end points; Stem features; Search for t-bar; …; Feature vector.

Method    Accuracy, %
Raster    87
Vector    94
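The failure rule above (a sample fails if any single feature is missing or wrong) can be expressed as a short Python sketch. The record layout, a dict per sample mapping feature names to `True` (correct), `False` (incorrect) or `None` (not extracted), and the function name `extraction_accuracy` are assumptions made for this example; the data are invented.

```python
def extraction_accuracy(samples):
    """Percentage of samples for which every feature was extracted correctly.

    Each sample is a dict: feature name -> True (correct), False (incorrect)
    or None (not extracted). One bad feature makes the whole sample a failure.
    """
    successes = sum(
        1 for sample in samples
        if all(status is True for status in sample.values())
    )
    return 100.0 * successes / len(samples)


# Invented outcomes for four samples and three features.
outcomes = [
    {"height": True, "t-bar position": True, "h-stem slant": True},
    {"height": True, "t-bar position": None, "h-stem slant": True},   # missing
    {"height": True, "t-bar position": True, "h-stem slant": False},  # wrong
    {"height": True, "t-bar position": True, "h-stem slant": True},
]
print(extraction_accuracy(outcomes))  # 50.0
```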

Slide 17: Results: accuracy of writer classification

Method                                    Writer classification accuracy, %
Original feature set + raster skeleton    73
Original feature set + vector skeleton    87
Extended feature set + vector skeleton    98

Conclusions
- Use of the vector skeleton results in fewer feature extraction failures.
- Use of the vector skeleton produces higher writer classification accuracy even on the same feature set; this indicates that feature values are measured more accurately.
- Vector skeletonisation enables extraction of more structural features, which, in turn, increases writer classification accuracy.

