Presentation on theme: "Document Examiner Feature Extraction: Thinned vs Skeletonised Images"— Presentation transcript:
1 Document Examiner Feature Extraction: Thinned vs Skeletonised Images
Vladimir Pervouchine and Graham Leedham
Forensics and Security Laboratory
School of Computer Engineering
Nanyang Technological University
Singapore
2 Outline
Forensic handwriting examination
The need for accurate stroke extraction
Thinning based method
Vector skeletonisation method
Feature extraction
  From thinned images
  From vector skeletons
Writer classification method
Results
Conclusions
3 Forensic handwriting examination
Variation of the word “the” written by 8 different writers. Source: Harrison, 1981
4 Forensic handwriting examination
Variation of the letters “G” and “R” written by 15 different writers. Source: Harrison, 1981
5 Forensic handwriting examination
Example of variation in letter formation styles in 10 letters from 9 different writers. Source: Harrison, 1981
6 Current Methods used by Forensic Document Examiners
Current methods primarily involve manual extraction and comparison of various global and local visible features.
Examiners usually perform a comparison test between a “Questioned Document” and a set of “Known Documents”.
The objective is to determine whether the “Questioned Document” was, or was not, written by a particular individual.
The “Questioned Document” may be in disguised handwriting.
7 Forgery / Disguise / Alteration
Is the writing GENUINE? (the author is who he claims to be)
Is the writing FORGED? (the author is not who he claims to be and is attempting to assert the writing is the same as someone else’s)
or is the writing DISGUISED? (the author wishes to deny doing the writing at a later date)
or is the writing ALTERED? (has someone modified or altered the original document?)
8 Extraction of handwritten strokes from images
Forensic document examiners analyse the pen tip trajectory
The trajectory is not readily available from the grayscale handwriting images
To mimic extraction of document examiner features it is necessary to approximate the pen trajectory
We need to preserve individual information in character shapes
Many algorithms have been proposed for a similar problem in offline handwriting recognition, but they do not need to preserve the individual traits of characters

Need for accurate stroke representation:
Many features used by forensic document examiners to describe character shapes are extracted from the pen tip trajectory. To extract the same or similar features automatically, it is necessary to extract the trajectory from grayscale images of handwriting.
Since these features are used to distinguish between different writers, the approximation of the pen tip trajectory should preserve the individual traits of the trajectory.
Many algorithms for approximation of handwritten strokes have been designed for handwriting recognition. There, preservation of the individual traits of characters was a drawback rather than an advantage (handwriting recognition is easier when characters written by different people look as similar as possible).
By contrast, when extracting features to distinguish writers, the letters that the characters represent are assumed to be known, and feature extraction focuses on differences in the shapes of the same characters written by different people.
Hence, approximation of handwritten strokes should differ from that used in handwriting recognition. The more accurately we approximate the original pen trajectory, the more accurately we can measure individual features from it.
9 Thinning based stroke approximation
Matlab Image Processing Toolbox thinning (Zhang and Suen thinning algorithm) is used for the first approximation
Post-processing is applied to
  remove extra branches
  remove spurious loops
  remove small connected components
Feature extraction attempts to overcome remaining artifacts
Flowchart steps, in the order shown on the slide:
  Original image
  Binarisation
  Thinning
  Remove small connected components
  Find junction points
  Find end points
  Correct spurious loops
  (loop condition: while changes are made)
  Prune short branches
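The "find end points" and "find junction points" steps can be illustrated with the crossing number of each skeleton pixel, that is, the number of background-to-foreground transitions met when walking once around its 8 neighbours. The sketch below is in Python with NumPy and is not the authors' Matlab implementation; the function names and the thresholds (1 for an end point, 3 or more for a junction) are assumptions chosen for illustration, and the input is assumed to be a one-pixel-wide binary skeleton.

```python
import numpy as np

# Offsets of the 8 neighbours in clockwise order, starting at north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel):
    """Count background-to-foreground transitions around every pixel."""
    h, w = skel.shape
    padded = np.pad(skel.astype(bool), 1)  # zero border, so edges are safe
    ring = [padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in RING]
    cn = np.zeros((h, w), dtype=int)
    for k in range(8):
        # A transition occurs where neighbour k is off and the next one is on.
        cn += (~ring[k]) & ring[(k + 1) % 8]
    return cn


def end_and_junction_points(skel):
    """Return boolean masks of end points and junction points of a skeleton."""
    skel = skel.astype(bool)
    cn = crossing_number(skel)
    ends = skel & (cn == 1)        # exactly one branch leaves the pixel
    junctions = skel & (cn >= 3)   # three or more branches meet here
    return ends, junctions


if __name__ == "__main__":
    # A plus-shaped skeleton: four end points and one junction at the centre.
    demo = np.zeros((5, 5), dtype=bool)
    demo[2, :] = True
    demo[:, 2] = True
    ends, junctions = end_and_junction_points(demo)
    print(np.argwhere(ends).tolist())
    print(np.argwhere(junctions).tolist())
```

Counting transitions instead of counting neighbours avoids flagging the pixels adjacent to a junction as junctions themselves, which matters when short branches are pruned back to the nearest junction.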
10 Thinning based stroke approximation
Figure panels:
  1. Original image
  2. Binarised image
  3. Thinned image
  4. Corrected image
11 Vector skeletonisation method
1st stage: vectorisation. Spline-approximated skeletal branches are formed
2nd stage: the minimum cost configuration of branch interconnections is found. Branches are grouped into strokes
  For each retraced segment of a stroke, restoration of the hidden loop is attempted
3rd stage: near-junction and loop spline knots are adjusted to make strokes smoother
Flowchart steps:
  Original image
  Vectorisation
  Binary encoding of junction points configuration
  GA optimisation to find configuration with lowest cost
  Adjustment of loop and near-junction knots
12 Vector skeletonisation method
Figure panels:
  1. Original image
  2. Skeletal branches
  3. Strokes with retraced segments and loops
  4. Adjusted skeleton
Red arrows show segments that changed after adjustment (left to right: two branches, retraced branch, hidden loop).
Thick lines were put manually on the skeleton to make it more visible.
13 Feature extraction: list of features
Features extracted from both raster and vector skeletons
  1. Height
  2. Width
  3. Height to width ratio
  4. Distance HC
  5. Distance TC
  6. Distance TH
  7. Angle between TH and TC
  8. Slant of stem of t
  9. Slant of stem of h
  10. Position of t-bar
  11. Connected/disconnected t and h
  12. Average stroke width
  13. Average pseudo-pressure
  14. Standard deviation of average pseudo-pressure
Features extracted from vector skeleton only
  15. Standard deviation of stroke width
  16. Number of strokes
  17. Number of loops and retraced branches
  18. Straightness of t-stem
  19. Straightness of t-bar
  20. Straightness of h-stem
  21. Presence of loop at top of t-stem
  22. Presence of loop at top of h-stem
  23. Maximum curvature of h-knee
  24. Average curvature of h-knee
  25. Relative size (diameter) of h-knee
14 Feature extraction
The position of t-bar feature is binary: 1 if the t-bar crosses the stem, and 0 if it touches the stem, is separated from it, or is missing
The size of the h-knee is measured parallel to a horizontal line
Pseudo-pressure is measured as the gray level normalised to 1
Straightness is measured as the ratio of the stroke length to the distance between its ends
Figure labels: h-knee, t-bar, t-stem, h-stem
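The straightness definition above (stroke length divided by the distance between the stroke's ends) can be sketched for a stroke sampled as a polyline. This is a minimal illustration only: the function name, the list-of-points representation, and the handling of closed strokes are assumptions, since the slides describe spline-based strokes without giving the exact computation.

```python
import math


def straightness(points):
    """Ratio of polyline length to the distance between its two ends.

    A perfectly straight stroke gives 1.0; more curved strokes give
    larger values. `points` is a sequence of (x, y) pairs sampled
    along the stroke (an assumed representation).
    """
    if len(points) < 2:
        raise ValueError("a stroke needs at least two points")
    # Sum of the lengths of consecutive segments.
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    # Straight-line distance between the first and last point.
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        return math.inf  # closed stroke: the ratio is undefined, report infinity
    return length / chord


if __name__ == "__main__":
    print(straightness([(0, 0), (3, 4)]))          # straight segment
    print(straightness([(0, 0), (3, 0), (3, 4)]))  # stroke with one corner
```

With this definition the value never drops below 1, so it can be compared directly across strokes of different sizes.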
15 Writer classification scheme
A constructive ANN with spherical threshold units (DistAl) was used as the classifier
100 samples of grapheme “th” drawn from 20 different writers
The 5-fold cross-validation method is used to evaluate classification accuracy
Three experiments:
  Original feature set (features 1-14), features extracted using raster skeleton
  Original feature set, features extracted using vector skeleton
  Extended feature set (features 1-25), features extracted from vector skeleton
Additionally, accuracy of feature extraction was measured
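The 5-fold cross-validation protocol can be sketched as follows. The sketch covers only the splitting and averaging; it does not implement DistAl. The `train_and_score` callback is a hypothetical placeholder for training any classifier on the training indices and returning its accuracy on the held-out indices, and the shuffling seed is an arbitrary assumption.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train, test


def cross_validate(n_samples, train_and_score, k=5, seed=0):
    """Average the score returned by `train_and_score` over the k folds.

    `train_and_score(train, test)` is a hypothetical callback: it should
    fit a classifier on the `train` indices and return accuracy on `test`.
    """
    scores = [train_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Dummy scorer that only reports the held-out fraction of the data.
    print(cross_validate(100, lambda train, test: len(test) / 100))
```

Every sample is held out exactly once, so the averaged score uses all the data for testing without ever testing on a sample the classifier was trained on.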
16 Results: accuracy of feature extraction
Extraction software performed analysis of shape to detect various parts of the character
Analysis was performed step by step
At each step some feature was extracted
If at least one feature was not extracted or was extracted incorrectly, the sample was counted as a “failure”
Flowchart steps:
  Input: original image, binarised image, skeleton
  Height, width, height to width ratio
  Analysis of branches originating from top end points
  Stem features
  Search for t-bar
  …
  Output: feature vector

Method    Accuracy, %
Raster    87
Vector    94
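The failure rule above (a sample fails if any single feature is missing or wrong) can be written down directly. This is an illustrative sketch only: the per-feature boolean flags are an assumed representation of the manual check, and the example data are made up to show the arithmetic, not taken from the experiments.

```python
def extraction_accuracy(samples):
    """Percentage of samples for which every feature was extracted correctly.

    `samples` is a list with one entry per handwriting sample; each entry
    is a list of booleans, one per feature (True = extracted correctly).
    """
    if not samples:
        raise ValueError("no samples given")
    # A sample is a success only if all of its feature flags are True.
    successes = sum(1 for flags in samples if all(flags))
    return 100.0 * successes / len(samples)


if __name__ == "__main__":
    # Made-up flags for four samples with three features each.
    flags = [
        [True, True, True],
        [True, False, True],   # one wrong feature makes the sample a failure
        [True, True, True],
        [True, True, True],
    ]
    print(extraction_accuracy(flags))
```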
17 Results: accuracy of writer classification

Method                                   Writer classification accuracy, %
Original feature set + raster skeleton   73
Original feature set + vector skeleton   87
Extended feature set + vector skeleton   98

Conclusions
Use of the vector skeleton results in fewer feature extraction failures
Use of the vector skeleton produces higher writer classification accuracy even on the same feature set; this indicates that feature values are measured more accurately
Vector skeletonisation enables extraction of more structural features, which, in turn, increases writer classification accuracy
Advantages in green, drawbacks in red.