
1 IIIT Hyderabad

2 Handwriting
• Graphical representation of thoughts using predefined symbols; still used frequently (e.g., note taking)
• An acquired skill, built through years of habituation and practice
• A complex generation process: a neuromuscular, perceptual-motor task (the hand contains some 27 bones and 40 muscles)

3 Handwriting Identification
• Handwritten documents have an associated identity (their writer)
• Handwriting identification: the study of the writership of documents, by comparison with reference handwritten documents

4 Individuality (example)

5 Recognition vs. Identification
• Handwriting recognition: automatically understand the underlying text of the document; used to design automated handwritten-document reading systems; suppresses variation due to the writer or handwriting style
• Handwriting identification: determine the writer of the document; enhances the variation due to different handwriting styles

6 Problem Statement
• Writer identification: identify the writer of a questioned document, given a pool of writers
• Writer verification: verify whether a claimed identity is correct, given a database of writers
• Forensic document analysis: verify whether two given documents were written by the same person

7 Identification (1:N matching)
[Figure: a questioned document ("Who wrote this document?") is compared against every writer in the reference database; the matching scores (35, 50, 65) are compared and the result is Writer 3.]

8 Verification (1:1 matching)
[Figure: a questioned document with the claim "Mayank: I wrote this document!" is compared only with Mayank's reference data (the database also holds Sachin and Amit); the comparator accepts the claim if Distance < Threshold.]
• Threshold: decided from the within-writer and between-writer distance distributions of the training documents
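A minimal Python sketch of the two matching modes, for illustration only. The names `identify`, `verify`, and `choose_threshold` are hypothetical, and choosing the threshold that misclassifies the fewest training pairs is one simple assumed rule for deriving it from the within-writer and between-writer distance distributions; the slides do not say which rule was used.

```python
from typing import Dict, List


def identify(scores: Dict[str, float]) -> str:
    """1:N matching: return the reference writer with the highest matching score."""
    return max(scores, key=scores.get)


def verify(distance: float, threshold: float) -> bool:
    """1:1 matching: accept the claimed identity only if the distance is below the threshold."""
    return distance < threshold


def choose_threshold(within: List[float], between: List[float]) -> float:
    """Pick a threshold from training distances (assumed rule: fewest misclassified pairs).

    Within-writer distances should fall below the threshold and
    between-writer distances should not; every observed distance is tried.
    """
    def errors(t: float) -> int:
        return sum(d >= t for d in within) + sum(d < t for d in between)

    candidates = sorted(set(within) | set(between))
    return min(candidates, key=errors)
```

Assuming the three scores in the identification figure belong to writers 1-3 in order, `identify({"Writer 1": 35, "Writer 2": 50, "Writer 3": 65})` returns `"Writer 3"`, matching the slide.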

9 Individuality Features
• Sub-character and character level: shape and size; choice of allograph
• Word level: connections and character spacing; aspect ratio
• Line level: slant and slope; word spacing
• Paragraph and page level: indentation and arrangement of text; uniformity of margins
[Figure: character-level and word-level individuality of writers W1 and W2]

10 Line and Paragraph Level
[Figure: Writer 1 vs. Writer 2, illustrating slant and slope of lines, parallelism of lines, word spacing (number of words in a line), uniformity of margins, and overall texture]

11 Challenges
• High within-writer variation: handwriting depends on mood, and no two pieces of handwriting by the same individual are identical
• Low between-writer variation: handwriting must remain readable, so the degree of variation is low

12 Online vs. Offline
• Offline: a matrix of integers (an image); only shape and size information is available; temporal information about how each stroke was drawn is lost
• Online: a sequence of x-y coordinates with pen-up/pen-down events; shape and size information is available, and so is the order of points and strokes
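To make the contrast concrete, here is a small sketch assuming a hypothetical layout in which an online sample is a list of strokes, each stroke being the (x, y) points recorded between a pen-down and the next pen-up. Rasterizing it into an integer matrix keeps shape and size but discards the order of points and strokes.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Stroke = List[Point]  # hypothetical: points recorded between one pen-down and the next pen-up


def to_offline(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Rasterize online strokes into a binary image (1 = ink, 0 = background).

    Only the sampled points are marked (no line drawing between them), to keep
    the sketch short. Point order, stroke order, and pen-up events are discarded.
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke:
            if 0 <= x < width and 0 <= y < height:
                image[y][x] = 1
    return image
```

Drawing the same diagonal forwards or backwards yields identical matrices, so the offline form cannot tell the two writing orders apart.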

13 Data Collection and Annotation
• A major hurdle: collection is a sequential process, online handwriting requires special devices, people are reluctant to write, and standard databases are not available
• Online handwriting collection devices are not accurate
• Automatic segmentation and annotation is itself a research problem
• Data collected: 600 pages from around 50 writers in various scripts

14 State of the Art
• Identification is done by handwriting experts, mostly manually; state-of-the-art automatic systems are not available
• Experts use context-dependent information such as the origin, type, and condition of the documents, which is difficult to model mathematically

15 Theme
• Identify consistent features automatically, in order to discriminate between writers
• Make the discriminating features usable while preserving their discrimination

16 Major Contributions
• Text-independent writer identification: designing a codebook of writers; automatically identifying and extracting discriminating features
• Text-dependent writer verification: writer-specific text generation; robust to forgery
• Forensic document examination: repudiation detection in handwritten documents

17 Text-independent writer identification

18 Text-independent?
• The underlying text is not known and the data is not annotated; only a sequence of strokes with their x-y coordinate values is given
• Challenges: extract consistent curves (features) from documents; compare similar features between two documents; design a codebook for individual writers

19 Consistency…

20 Codebook of a Writer
[Figure: six different clusters extracted from Devanagari script]

21 Theoretical Background
• Handwriting modeling studies: a stroke is the combination of different forces; handwriting curves become consistent due to habituation
• The relative positions of the velocity extrema of strokes are constant for the same writer (empirical result)
[Figure: a stroke from Devanagari script and its velocity profile]
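A sketch of how velocity extrema might be located on an online stroke, assuming points are sampled at a constant rate so that the spacing between samples is proportional to pen speed. The helper names are hypothetical; cutting the stroke between consecutive velocity minima follows the shape-curve definition given on slide 82.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_speeds(stroke: List[Point]) -> List[float]:
    """Pen speed at each point, assuming a constant sampling rate.

    Uses a central difference inside the stroke and a one-sided
    difference at the two ends.
    """
    n = len(stroke)
    if n < 2:
        return [0.0] * n
    speeds = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        speeds.append(math.dist(stroke[lo], stroke[hi]) / (hi - lo))
    return speeds


def velocity_extrema(v: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of strict local minima and maxima of a speed profile."""
    inner = range(1, len(v) - 1)
    minima = [i for i in inner if v[i] < v[i - 1] and v[i] < v[i + 1]]
    maxima = [i for i in inner if v[i] > v[i - 1] and v[i] > v[i + 1]]
    return minima, maxima


def shape_curves(stroke: List[Point]) -> List[List[Point]]:
    """Pieces of the stroke between consecutive velocity minima (both ends included)."""
    minima, _ = velocity_extrema(point_speeds(stroke))
    return [stroke[a:b + 1] for a, b in zip(minima, minima[1:])]
```

Because only the positions of the extrema are used, not the absolute speed, the segmentation is unaffected by a writer simply writing faster or slower overall.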

22 Summarized Framework
Questioned document → cluster the curves into different clusters → soft classification by one classifier per cluster (NN 1, NN 2, NN 3, …, NN n) → combined result → writer classification
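One plausible reading of this framework, sketched in Python: each curve of the questioned document is assigned to its nearest cluster centroid, that cluster's classifier returns soft per-writer scores, and the scores are summed over all curves. The centroid-distance assignment, the score-summing rule, and all names are assumptions; the slide only shows per-cluster classifiers (NN 1 … NN n) whose soft outputs are combined.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
SoftClassifier = Callable[[Vector], Dict[str, float]]  # curve -> {writer: score}


def nearest_cluster(curve: Vector, centroids: List[Vector]) -> int:
    """Index of the cluster centroid closest to the curve descriptor."""
    return min(range(len(centroids)), key=lambda k: math.dist(curve, centroids[k]))


def classify_document(curves: List[Vector], centroids: List[Vector],
                      classifiers: List[SoftClassifier]) -> str:
    """Route each curve to its cluster's classifier and sum the soft scores per writer."""
    totals: Dict[str, float] = defaultdict(float)
    for curve in curves:
        k = nearest_cluster(curve, centroids)
        for writer, score in classifiers[k](curve).items():
            totals[writer] += score
    return max(totals, key=totals.get)
```

Summing soft scores lets many weakly informative curves outvote a few confident but wrong ones, which suits short test documents.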

23 Results
• Experimented with Roman, Hindi, Cyrillic, Arabic, and Hebrew scripts
• Training data: approx. 300-400 curves for Roman; approx. 700-800 curves for the others
• Test data: 100 curves for Roman; 200-300 curves for the others
• Tables and graphs are on the next slides

24 Varying the Number of Curves
• Accuracy increases with the number of curves
• Over 85% accuracy is reached with 200 curves (10-12 words)
[Figure: accuracy with 12 words]

25 Script vs. Accuracy
• About 10 writers for each script
• For most scripts, top-2 accuracy is nearly 100%; the exception is Chinese
• Errors come from confusion between pairs of writers

26 Related Work
• Line-level features: word spacing; lower and upper profiles; fractal and wavelet features; loops and blobs
• Paragraph-level features:
  - Image processing: grey-scale histogram; run-length coding; fractal image compression
  - Texture features: Gabor filters; wavelets; contourlets; GGD; grey-scale covariance matrix
  - Online features: pen pressure, velocity, azimuth; velocity of the barycenter
  - Codebook generation using directional features
• Our approach: codebook design using sub-character features; a script-independent framework; online handwriting data; identification with less data; automatic identification of consistent and discriminating features

27 Result Comparison
• Schomaker et al. [28]: combination of directional, texture, and image-processing features; identification accuracy of 87% with 900 writers; verification equal error rate of 3%-8%; test data size: 1 page of handwritten data
• Our approach [5]: shape-based features; identification accuracy of ~85% with 15 writers; test data size: 12 words (1 line)

28 Analysis
• Shape- and size-based primitives obtain reasonable accuracy with a simple algorithm
• Chinese script: most strokes are straight line segments, so features based on inter-stroke relations could be used
• To increase accuracy: more robust clustering and classification algorithms; fusion with higher-level (line and paragraph) primitives

29 Text-dependent writer verification

30 Problem Statement
• Text-independent systems need a large amount of data
• A text-dependent framework gives higher accuracy with a small amount of data
• Problems with text-dependent systems: forgery (the fixed text is known in advance); the authentication text is not known (usually random text is used)

31 Signature vs. Text-dependent
• Signature vs. text-dependent handwriting: signature variations are unlimited, since a signature need not be readable, and the writer consciously tries to write the same signature each time
• Challenges: within-writer and between-writer variation must be discriminated; a discriminating distance measure has to be found

32 System Specification
• Empirical finding: the discriminating power of primitives (sub-characters, characters, words, etc.) varies across individuals
• System specifications: writer-specific text, for higher accuracy with a limited amount of text; text that varies across authentication sessions, for robustness to forgery

33 Boosting?
• A classifier-combination method: combines weak classifiers to produce an accurate learning algorithm
• A greedy algorithm: at each stage, selects a weak classifier based on the previously selected classifiers
• Maintains a distribution of weights over the training samples
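A compact sketch of discrete AdaBoost, the textbook boosting algorithm, assuming the weak classifiers' ±1 outputs on the training samples are precomputed. The slides do not name the boosting variant, so treat this as an illustration of the idea rather than the system's implementation; `adaboost` and `strong_classify` are hypothetical names.

```python
import math
from typing import List, Tuple


def adaboost(predictions: List[List[int]], labels: List[int],
             stages: int) -> List[Tuple[int, float]]:
    """Greedy discrete AdaBoost over a fixed pool of weak classifiers.

    predictions[j][i] is the +1/-1 output of weak classifier j on sample i;
    labels[i] is +1 (positive sample) or -1 (negative sample).
    Returns the chosen (classifier index, classifier weight) pair of each stage.
    """
    m = len(labels)
    weights = [1.0 / m] * m  # uniform initial distribution over training samples
    chosen = []
    for _ in range(stages):
        # Weighted error of every weak classifier under the current distribution.
        errors = [sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                  for preds in predictions]
        j = min(range(len(predictions)), key=errors.__getitem__)
        eps = min(max(errors[j], 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        chosen.append((j, alpha))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, predictions[j])]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


def strong_classify(chosen: List[Tuple[int, float]], sample_predictions: List[int]) -> int:
    """Weighted vote; sample_predictions[j] is weak classifier j's output on one sample."""
    score = sum(alpha * sample_predictions[j] for j, alpha in chosen)
    return 1 if score >= 0 else -1
```

Because misclassified samples gain weight after each stage, the next stage favors a weak classifier that handles them; this is the sense in which each selection depends on the previously selected classifiers.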

34 Framework
• Verification as a 2-class problem: positive samples vs. negative samples
• Given: a set of writers and primitives, and a table of their discriminating power
• Randomness is included at each stage, proportional to the discriminating power of the classifier: the more discriminating a classifier, the more likely it is to be accepted

35 Text Generation Process
[Figure: a bag of primitives and a list of writers W1-W6; at each step a primitive is selected or not, a threshold is fixed, and writers are rejected]
• Randomness is included in the selection process
• The selected threshold is biased towards accepting the writer, for lower false rejection rates
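A sketch of the random selection step, assuming each writer has a table mapping primitives to a non-negative discriminating power (as in the table on slide 34) and that primitives are drawn without replacement with probability proportional to that power. `generate_text` and the example primitives are hypothetical.

```python
import random
from typing import Dict, List, Optional


def generate_text(power: Dict[str, float], k: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Draw up to k distinct primitives for one writer.

    Each draw picks a remaining primitive with probability proportional to its
    discriminating power; primitives with zero power are never chosen.
    """
    rng = rng or random.Random()
    pool = {p: w for p, w in power.items() if w > 0}
    chosen = []
    while pool and len(chosen) < k:
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen
```

Calling it again with a different random state usually yields a different text for the same writer, which is what lets the authentication text vary across sessions while still favoring that writer's most discriminating primitives.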

36 Effect of Boosting
[Figure: probability vs. distance for within-writer and between-writer distances (X1), as the number of boosting stages increases]

37 Dynamic Time Warping
[Figure: naïve alignment of re-sampled series vs. DTW alignment]
• Time-series alignment using a dynamic-programming approach
• Feature vectors of different lengths can be compared
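A minimal dynamic-programming DTW in Python, for illustration. It aligns two sequences of possibly different lengths and returns the cost of the cheapest monotonic alignment; the local `cost` function (absolute difference by default) is an assumption, and for 2-D stroke points `math.dist` is a natural choice.

```python
import math
from typing import Callable, Sequence


def dtw_distance(a: Sequence, b: Sequence,
                 cost: Callable[[object, object], float] = lambda x, y: abs(x - y)) -> float:
    """Cost of the cheapest monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    # D[i][j]: best cost of aligning the first i items of a with the first j items of b.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(D[i - 1][j],      # b[j-1] is matched to several items of a
                       D[i][j - 1],      # a[i-1] is matched to several items of b
                       D[i - 1][j - 1])  # one-to-one match
            D[i][j] = cost(a[i - 1], b[j - 1]) + step
    return D[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0, because the repeated 2 is absorbed by the warping, whereas a naïve point-by-point comparison would need the sequences to be resampled to equal length first.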

38 Stroke Comparison
• Dynamic time warping: strokes are aligned using dynamic programming
• Directional features: a stroke is represented using 12 bins of curvature direction, where the curvature angle is the difference between adjacent tangent directions
[Figure: a stroke encoded as the bin sequence 112334300001, with bins spanning 0-360°]
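A sketch of the directional encoding, assuming the 12 bins split the 0-360° range of curvature angles into equal 30° sectors and that the curvature angle is measured counter-clockwise (in a y-up coordinate system) between consecutive segment directions. The function name and the sign convention are assumptions.

```python
import math
from typing import List, Tuple


def curvature_bins(points: List[Tuple[float, float]], n_bins: int = 12) -> List[int]:
    """Quantize the change in tangent direction along a stroke into n_bins sectors."""
    # Tangent direction of each segment, in degrees.
    tangents = [math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))
                for p, q in zip(points, points[1:])]
    # Curvature angle: difference between adjacent tangent directions, mapped to [0, 360).
    angles = [(t2 - t1) % 360.0 for t1, t2 in zip(tangents, tangents[1:])]
    width = 360.0 / n_bins
    # min() guards against a float modulo that rounds up to exactly 360.0.
    return [min(int(a // width), n_bins - 1) for a in angles]
```

A straight segment stays in bin 0, a 90° left turn falls in bin 3, and a 90° right turn (270°) in bin 9; the resulting bin sequences can then be compared with DTW.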

39 Results
• Experimented with English script (20 writers) and Hindi script (10 writers)
• DTW and directional feature extraction methods are used
• Each user wrote about 10-12 words; 3-fold cross-validation is used

40 Performance Measures
• False acceptance rate (FAR): percentage of impostors (forgers) who are accepted; should be low for forensic applications, where security is the major concern
• False rejection rate (FRR): percentage of genuine users who are rejected; should be low for civilian applications, where usability is the major concern
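A small sketch of how both rates could be computed from distances, assuming a claim is accepted when its distance is below the threshold (as on slide 8). `far_frr` is a hypothetical name.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) in percent for a distance threshold.

    A claim is accepted when its distance is below the threshold, so
    FAR counts accepted impostors and FRR counts rejected genuine users.
    """
    far = 100.0 * sum(d < threshold for d in impostor) / len(impostor)
    frr = 100.0 * sum(d >= threshold for d in genuine) / len(genuine)
    return far, frr
```

With genuine distances [1, 2, 3, 4] and impostor distances [3.5, 5, 6, 7], a threshold of 4 gives FAR = FRR = 25%, while lowering it to 3 gives FAR = 0% and FRR = 50%: tightening the threshold trades usability for security.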

41 False Accept Rate (Directional Features)

42 False Reject Rate (Directional Features)

43 False Accept Rate (DTW)

44 False Reject Rate (DTW)

45 Definitions
• Threshold-1: controls the range of variation within a writer; decided based on positive samples
• Threshold-2: the confidence required before rejecting other writers (negative samples); a lower threshold-2 means higher confidence

46 Effect of thresholds (DTW and Hindi script)

47 Effect of thresholds (DTW and Hindi script)

48 No. of word comparisons (DTW and Hindi script)

49 Effect of thresholds (directional features and Hindi script)

50 Effect of thresholds (directional features and Hindi script)

51 Effect of thresholds (directional features and English script)

52 Effect of thresholds (directional features and English script)

53 No. of word comparisons (directional features and Hindi script)

54 No. of word comparisons (directional features and English script)

55 Number of writers vs. accuracy (English script)

56 Number of writers vs. accuracy (Hindi script)

57 Analysis and Summary
• Writer-specific text generation framework
• Automatic text generation
• Automatic threshold generation
• The text is varied, making the system robust to forgery

58 Related Work
• Character-level features: GSC features; structural features; directional features
• Word-level features: word model recognition (WMR); shape curvature; shape context; morphological features
• Feature selection: static feature selection; PCA-based discriminating power
• Our approach: writer-specific text generation; boosting-based framework; text variation; higher accuracy with a limited amount of data

59 Comparison
• Srihari et al. [17]: shape context, shape curvature, GSC, and WMR features; performance of 42%, 22%, 62%, and 28% respectively (1000 writers); test data size: 10 words
• Our approach: directional features; performance of 95% (20 writers); test data size: 5 words

60 Repudiation detection in handwritten documents

61 Traditional Writer Identification vs. QDE (Questioned Document Examination)
• Traditional identification assumes natural handwriting
• Biometric terms: repudiation (negative biometrics) vs. forgery (positive biometrics)
• The quantity and quality of the available data differ
• Cost factor involved: the examiner serves as an expert witness in legal verdicts

62 Repudiation
• "The rejection or renunciation of a duty or obligation (as under a contract)" (Merriam-Webster's Dictionary of Law)
• Handwriting repudiation: a writer deliberately alters his natural handwriting to avoid detection, in order to deny involvement in the case

63 Repudiation (1:1 matching)
[Figure: a questioned document and a reference document (no database) are passed to a comparator that calculates the distance between them; hypothesis testing decides whether the distance is significant, i.e., whether the documents were written by the same writer or by different writers]

64 Goal: verify whether the given documents were written by the same person or by different people, without assuming natural handwriting

65 Hard problem?
[Figure: normal handwriting vs. repudiated handwriting]

66 Challenges
• Within-writer variation becomes high
• Between-writer variation becomes comparatively low
• Learning cannot be done, as training data is not available

67 Ray of Hope
• One cannot exclude from one's own writing those discriminating elements of which one is not aware
• The maximum and minimum velocity points remain the same regardless of absolute velocity
• Words overlap significantly at the sub-character level

68 Framework
• Compute a statistically significant score between two documents
• Utilize the online information that is available
• Make no assumptions about the distribution of the data, since such assumptions may lead to erroneous conclusions

69 Assumptions
• The questioned and reference documents either overlap significantly at the word level or contain the same text
• The reference document is collected in online mode

70 System Framework
[Figure: system framework with word segmentation, word comparison, and hypothesis testing stages]

71 Hypothesis Testing
• Calculates the significance of the distance between two distributions
• Following the Neyman-Pearson paradigm:
  - H0: the documents were written by the same writer (null hypothesis)
  - H1: the documents were written by different writers (alternative hypothesis)
• The intra-document word distances and the inter-document word distances are the two distributions compared, to determine whether they were generated from the same population

72 Distribution Comparison
• KL divergence test (makes assumptions about the nature of the distribution)
• Kolmogorov-Smirnov test (makes no such assumptions)
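A sketch of the distribution comparison with a two-sample Kolmogorov-Smirnov statistic in pure Python. Assumptions: word distances come from some comparison function `dist` (for example DTW), intra-document distances are pooled from both documents, and H0 is rejected using the standard large-sample critical value c(α)·sqrt((n + m)/(n·m)), with c = 1.358 for α = 0.05. All names are hypothetical; SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import math
from bisect import bisect_right
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple


def intra_inter_distances(doc_a: Sequence, doc_b: Sequence,
                          dist: Callable) -> Tuple[List[float], List[float]]:
    """Word distances within each document (pooled) and across the two documents."""
    intra = [dist(x, y) for doc in (doc_a, doc_b) for x, y in combinations(doc, 2)]
    inter = [dist(x, y) for x, y in product(doc_a, doc_b)]
    return intra, inter


def ks_statistic(s1: List[float], s2: List[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(s1), sorted(s2)
    # The empirical CDFs only jump at observed values, so checking those suffices.
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)


def reject_same_writer(intra: List[float], inter: List[float],
                       c_alpha: float = 1.358) -> bool:
    """Reject H0 (same writer) if D exceeds the large-sample critical value.

    c_alpha = 1.358 corresponds to a 5% significance level.
    """
    n, m = len(intra), len(inter)
    return ks_statistic(intra, inter) > c_alpha * math.sqrt((n + m) / (n * m))
```

The KS test only compares empirical CDFs, which is why it needs no assumption about the shape of the distance distributions.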

73 Results
• Data was collected from 23 different users in English
• For each user, 3 pages of normal data and 3 pages of repudiated data were collected
• Preprocessing: words are segmented using a semi-automatic word-segmentation toolkit

74 Results
[Figure: intra-document vs. inter-document distance distributions]

75 ROC Curve
• Genuine rejection of 82% at genuine acceptance of 100%

76 Analysis of Results
• A semi-automatic system, used as an aid to the expert
• The null hypothesis is never accepted without expert intervention
[Figure: the scale used by forensic experts, running from 1 (similar) to 0 (different): strong probability (of identification), probable, indications, no conclusion, indications did not, probably did not, strong probability did not]

77 Conclusion and Future Work
• A learning-based framework to learn similarity in spite of discrimination between documents
• Can we tell whether the writer is trying to repudiate?
• A framework that can learn more features and give an independent score for each feature

78 Conclusions
• Proposed algorithms for the automatic identification and extraction of discriminating features for online handwriting
• Proposed a framework for writer-specific text generation and text variation for text-dependent systems
• Introduced the problem of repudiation and proposed a hypothesis-testing-based framework for it

79 Publications
• Sachin Gupta and Anoop M. Namboodiri, "Repudiation Detection in Handwritten Documents," Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007.
• Anoop M. Namboodiri and Sachin Gupta, "Text Independent Writer Identification from Online Handwriting," International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), La Baule, Centre de Congress Atlantia, France, October 23-26, 2006.
• Sachin Gupta and Anoop M. Namboodiri, "Text Dependent Writer Verification using Boosting," submitted to the International Conference on Frontiers in Handwriting Recognition (ICFHR'08), Montreal, Canada.
• Sachin Gupta and Anoop M. Namboodiri, "Text Dependent Writer Verification," planned for IEEE Transactions on Information Forensics and Security, 2008.

80 Future Work
• Fusion of online and offline features for higher accuracy
• Can we automatically detect a person's intention to repudiate or forge, based on a single document?
• More robust algorithms for feature extraction, different from standard feature-selection approaches

81 Thank you
gupta.sachin25@gmail.com

82 Proposed Framework
Online text document → curve extraction → representation → characteristic-curve extraction → writer identification
• Critical points: the minimum and maximum velocity points
• Shape curve: the curve between any two consecutive minimum velocity points
[Figure: a stroke from Devanagari script with its critical points numbered 1-8, and the velocity profile of the stroke]
• Representation: incident angle [1], curvature [2-4], size [5-8]
• Consistent primitives: repeating curves
• Extraction: unsupervised learning algorithms; the experimental setup uses K-means
[Figure: six different clusters extracted from Devanagari script]
• Notation: S_j denotes the j-th primitive, C_k the k-th cluster, and W_i the i-th writer, together with a measure of the discriminability of the k-th cluster for the i-th writer
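The unsupervised extraction step can be sketched with plain K-means (Lloyd's algorithm) over fixed-length curve descriptors; reading the bracketed indices above as an 8-dimensional incident-angle, curvature, and size vector is an assumption. Initial centroids are random input curves, and `kmeans` and `nearest` are hypothetical names; the figure shows k = 6 for Devanagari.

```python
import math
import random
from typing import List, Tuple

Vector = Tuple[float, ...]


def nearest(p: Vector, centroids: List[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 50,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Cluster curve descriptors into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input curves
    for _ in range(iters):
        # Assignment step: each curve goes to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return centroids, [nearest(p, centroids) for p in points]
```

Each resulting cluster is a candidate codebook entry; curves that repeat in a writer's handwriting fall into tight clusters, which is what makes them usable as consistent primitives.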

83 Number of Writers vs. Accuracy
[Figure: accuracy vs. number of writers, results for Devanagari script]
• Accuracy depends on the individuality of the specific writer

84 Proposed Framework (example)

85 Framework (Authentication)

86 Writer-Specific Text Generation
• Given: a set of primitives whose discriminating power varies for different pairs of writers
• Aim: select the optimal set of weights for the primitives, to discriminate a specific writer from the others
• Dynamic feature selection, whereas static feature selection achieves a single optimum

87 Writer-Specific Text Generation
• Text variation requires features that are robust to forgery
• Handwriting can have different optima: different combinations of handwriting can provide the desired results

88 Boosting Algorithm
• Given: a set of training samples (X) with their underlying labels (Y), and a set of weak hypotheses (h)
• Initialize the weight distribution (D) over the training samples
• Select the weak hypothesis h_j such that … (m: total number of training samples; t: boosting stage)

89 Boosting
• Update the weights of the training samples
• Final hypothesis: a combination of the selected weak hypotheses, each scaled by the weight of its classifier (t: boosting stage; T: total number of boosting stages)
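Assuming the deck follows standard discrete AdaBoost (Freund and Schapire), which the transcript does not confirm, the boosting equations would read as follows, with labels y_i ∈ {-1, +1}, weak hypotheses h_j, distribution D_t, classifier weight α_t, and m, t, T as defined on slides 88-89. Z_t is a normalizer that keeps D_{t+1} a distribution.

```latex
\begin{aligned}
D_1(i) &= \frac{1}{m}, \qquad i = 1, \dots, m \\
\epsilon_t(h_j) &= \sum_{i \,:\, h_j(x_i) \neq y_i} D_t(i), \qquad
  h_t = \arg\min_{h_j} \epsilon_t(h_j), \qquad \epsilon_t = \epsilon_t(h_t) \\
\alpha_t &= \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} \\
D_{t+1}(i) &= \frac{D_t(i)\, \exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \qquad
  Z_t = \sum_{i=1}^{m} D_t(i)\, \exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big) \\
H(x) &= \operatorname{sign}\Big( \sum_{t=1}^{T} \alpha_t\, h_t(x) \Big)
\end{aligned}
```

Misclassified samples (y_i h_t(x_i) = -1) are multiplied by e^{α_t} > 1 whenever ε_t < 1/2, so the next stage concentrates on them.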

90 Discriminating Power of primitives

91 Text Generation Process
[Figure: probability vs. distance at successive stages; as writers are rejected, the remaining set shrinks from X1-X6 to X1, X3, X4, X6 and then to X1, X4]
• Randomness is included at each stage: each classifier might be rejected, based on its discriminating power
• The threshold is biased towards accepting the writer; thresholds are writer-specific
• Rejection at any stage also rejects the claim
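A sketch of this verification cascade. Assumptions: each primitive's discriminating power, divided by the largest one, gives the probability that its stage is used; each primitive has a writer-specific distance threshold; and `distance` compares the claimant's sample of a primitive with the reference. All names are hypothetical.

```python
import random
from typing import Callable, Dict, Optional


def verify_claim(power: Dict[str, float], distance: Callable[[str], float],
                 threshold: Dict[str, float],
                 rng: Optional[random.Random] = None) -> bool:
    """Run the stages in order; a single failed stage rejects the whole claim."""
    rng = rng or random.Random()
    top = max(power.values()) or 1.0  # guard against an all-zero power table
    for primitive, p in power.items():
        # Randomly skip this stage; more discriminating primitives are used more often
        # (the most discriminating one is always used, since random() < 1).
        if rng.random() >= p / top:
            continue
        # Writer-specific threshold: reject as soon as one stage looks too different.
        if distance(primitive) >= threshold[primitive]:
            return False
    return True
```

Requiring every used stage to pass keeps false acceptance low, while the thresholds themselves can be biased towards acceptance to keep false rejection low, as the slide describes.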

92 Why Is Repudiation a Hard Problem?
[Figure: normal handwriting of normal writers 1 and 2 vs. repudiated handwriting of repudiated writers 1 and 2, with the caption "I am confused"]

93 Word Comparison
[Figure: word comparison using sub-character information and DTW matching]

