IIIT Hyderabad

Handwriting
- Graphical representation of thoughts
  - Uses predefined symbols
  - Still used frequently (e.g., note taking)
- An acquired skill
  - Years of habituation and practice
- Complex generation process
  - Neuromuscular perceptual-motor task
  - The hand contains some 27 bones and 40 muscles

Handwriting Identification
- Handwritten documents have an associated identity
- Handwriting identification
  - Study of the writership of documents
  - Comparison with reference handwritten documents

Individuality (example)

Recognition vs. Identification
- Handwriting recognition
  - Automatically understand the underlying text in the document
  - Design of automated systems for reading handwritten documents
  - Suppress variation due to writer or handwriting style
- Handwriting identification
  - Determine the writer of the document
  - Enhance the variation due to different handwriting styles

Problem Statement
- Writer identification
  - Identify the writer of a questioned document
  - Given: a pool of writers
- Writer verification
  - Verify whether the claimed identity is correct
  - Given: a database of writers
- Forensic document analysis
  - Verify whether two given documents were written by the same person

Identification (1:N matching)
- Question: who wrote this document?
- The questioned document is compared with the writers in the reference database
- Each comparison yields a matching score (35, 50 and 65 in the example)
- Result in the example: Writer 3

Verification (1:1 matching)
- Claim: "Mayank: I wrote this document!"
- Writers in the example reference database: Mayank, Sachin, Amit
- Comparator: accept (yes) if distance < threshold, otherwise reject (no)
- Threshold: decided from the within-writer and between-writer distance distributions of the training documents
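
The slide says only that the threshold comes from the within-writer and between-writer distance distributions. A minimal sketch of one way to do that follows, assuming the threshold is chosen to minimise the total number of training errors; the function names and the selection rule are illustrative assumptions, not the authors' method.

```python
def choose_threshold(within, between):
    """Pick the candidate threshold with the fewest training errors.

    within  -- distances between documents of the same writer
    between -- distances between documents of different writers
    Assumption: the error count is false rejects plus false accepts.
    """
    best_t, best_err = None, None
    for t in sorted(set(within) | set(between)):
        false_reject = sum(d >= t for d in within)   # genuine pairs rejected
        false_accept = sum(d < t for d in between)   # impostor pairs accepted
        err = false_reject + false_accept
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def verify(distance, threshold):
    """Accept the claimed identity when the distance is below the threshold."""
    return distance < threshold


threshold = choose_threshold([0.2, 0.3, 0.4], [0.7, 0.8, 0.9])
print(threshold, verify(0.35, threshold), verify(0.75, threshold))
```

Weighting the two error types differently would bias the threshold towards security or towards usability.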

Individuality Features
- Sub-character and character level
  - Shape and size
  - Choice of allograph
- Word level
  - Connections and character spacing
  - Aspect ratio
- Line level
  - Slant and slope
  - Word spacing
- Paragraph and page level
  - Indentation and arrangement of text
  - Uniformity of margins
[Figures: character-level and word-level individuality for writers W1 and W2]

Line and Paragraph Level
- Slant and slope of lines
- Parallelism of lines
- Word spacing (number of words in a line)
- Uniformity of margins
- Overall texture
[Figure: samples from Writer-1 and Writer-2]

Challenges
- High within-writer variation
  - Handwriting depends on the writer's mood
  - No two pieces of handwriting by any individual are the same
- Low between-writer variation
  - Handwriting must be readable
  - The degree of variation is low

Online vs. Offline
- Offline
  - Matrix of integers
  - Only shape and size information is available
  - Temporal information about how a stroke is drawn is lost
- Online
  - Sequence of x-y coordinates and pen-up/pen-down events
  - Shape and size information is available
  - Sequencing of points and strokes is available

Data Collection and Annotation
- Major hurdle
  - Sequential process: devices are needed for online handwriting
  - People are reluctant to write
  - Standard databases are not available
- Online handwriting collection devices are not accurate
- Automatic segmentation and annotation
  - A research problem
- Data collection
  - 600 pages of data from around 50 writers in various scripts

State of the Art
- Done by handwriting experts
  - Mostly manual
  - State-of-the-art systems are not available
- Uses context-dependent information such as the origin, type and condition of the documents
  - Difficult to model mathematically

Theme
- Identifying consistent features automatically
  - To discriminate between writers
- Usability of discriminating features
  - Preserve discrimination

Major Contributions
- Text-independent writer identification
  - Designing a codebook of writers
  - Automatically identifying and extracting discriminating features
- Text-dependent writer verification
  - Writer-specific text generation
  - Robust to forgery
- Forensic document examination
  - Repudiation detection in handwritten documents

Text-independent writer identification

Text-independent?
- The underlying text is not known
  - Data is not annotated
  - Given: sequence of strokes and x-y coordinate values
- Challenges of the text-independent setting
  - Extract consistent curves (features) from documents
  - Compare similar features between two documents
  - Design a codebook of individual writers

Consistency…

Codebook of a writer
[Figure: six different clusters extracted from Devanagari script]

Theoretical background
- Handwriting modeling studies
  - A stroke is the combination of different forces
  - Handwriting curves become consistent due to habituation
- Relative velocity points of strokes are constant for the same writer (empirical results)
[Figures: a stroke from Devanagari script and the velocity profile of that stroke]

Summarized framework
- Questioned document
- Cluster into different clusters
- Classifier: soft classification per cluster (NN 1, NN 2, NN 3, …, NN n)
- Combined result: classify writers (writer classification)
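
The slide does not state how the per-cluster soft outputs are merged into the combined result. The sketch below shows one plausible rule, a weighted sum of per-curve writer scores followed by a ranking; the function name, the data layout and the optional per-writer, per-cluster weights are all assumptions made for illustration.

```python
def combine_soft_votes(votes, weights=None):
    """Rank writers by the weighted sum of soft scores over all curves.

    votes   -- list of (cluster_id, {writer: score}) pairs, one per curve
    weights -- optional {(writer, cluster_id): weight}; missing keys count as 1.0
    Both structures are hypothetical; the slide gives no formula.
    """
    weights = weights or {}
    totals = {}
    for cluster_id, scores in votes:
        for writer, score in scores.items():
            w = weights.get((writer, cluster_id), 1.0)
            totals[writer] = totals.get(writer, 0.0) + w * score
    # Best match first, so the first k entries give the top-k candidates.
    return sorted(totals, key=totals.get, reverse=True)


votes = [
    (0, {"W1": 0.6, "W2": 0.4}),
    (1, {"W1": 0.3, "W2": 0.7}),
    (2, {"W1": 0.2, "W2": 0.8}),
]
print(combine_soft_votes(votes))
```

Returning a full ranking rather than a single label makes it easy to report top-1 and top-2 accuracy.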

Results
- Experimented with Roman, Hindi, Cyrillic, Arabic and Hebrew scripts
- Training data
  - Approx. 300-400 curves for Roman
  - Approx. 700-800 curves for the others
- Test data
  - 100 curves for Roman
  - 200-300 curves for the others
- Tables and graphs are on the following slides

Varying Number of Curves
- Accuracy increases with the number of curves
- >85% accuracy is reached with 200 curves (10-12 words)
[Figure: accuracy with 12 words]

Script vs. Accuracy
- ~10 writers for all scripts
- For most scripts, top-2 accuracy is nearly 100%; Chinese is the exception
- Confusion between pairs of writers

Related work
- Line-level features
  - Word spacing
  - Lower and upper profile
  - Fractal and wavelet features
  - Loops and blobs
- Paragraph-level features
  - Image processing: grey-scale histogram, run-length coding, fractal image compression
  - Texture features: Gabor filter, wavelet, contourlet GGD, grey-scale covariance matrix
  - Online features: pen pressure, velocity, azimuth, velocity of barycenter
  - Codebook generation: using directional features
- Our approach
  - Codebook design using sub-character features
  - Script-independent framework
  - Online handwriting data
  - Identification with a smaller amount of data
  - Automatic identification of consistent and discriminating features

Result comparison
- Schomaker et al. [28]
  - Combination of directional, texture and image-processing features
  - Identification: accuracy of 87% with 900 writers
  - Verification: equal error rate of 3%-8%
  - Test data size: 1 page of handwritten data
- Our approach [5]
  - Shape-based features
  - Identification: accuracy of ~85% with 15 writers
  - Test data size: 12 words (1 line)

Analysis
- Shape- and size-based primitives
  - Reasonable accuracy is obtained with a simple algorithm
- Chinese script
  - Most of the strokes are straight line segments
  - Features based on inter-stroke relations can be used
- To increase accuracy
  - Robust clustering and classification algorithms
  - Fusion with higher-level primitives such as line and paragraph features

Text-dependent writer verification

Problem Statement
- Text-independent systems
  - Large amount of data needed
- Text-dependent framework
  - Higher accuracy
  - Small amount of data needed
- Problems (text-dependent systems)
  - Forgery (the fixed text is known in advance)
  - Authentication text not known (usually random text is used)

Signature vs. Text-dependent
- Signature and text-dependent handwriting
  - Variations are unlimited; a signature need not be readable
  - The writer consciously tries to write the same signature
- Challenges
  - Within-writer and between-writer variation have to be discriminated
  - A discriminating distance measure has to be found

System Specification
- Empirical finding
  - The discriminating power of primitives varies across individuals
  - Primitives: sub-characters, characters, words, etc.
- System specifications
  - Writer-specific text
    - For higher accuracy
    - With a limited amount of text
  - Varying text across multiple authentications
    - Robust to forgery

Boosting?
- Classifier combination method
  - Combines weak classifiers to generate an accurate learning algorithm
  - Greedy algorithm
- Selects a weak classifier at each stage based on the previously selected classifiers
- Maintains a distribution of weights over the training samples

Framework
- Verification as a 2-class problem
  - Positive samples vs. negative samples
- Given
  - A set of writers and primitives
  - A table of discriminating power
- Randomness is included at each stage
  - Proportional to the discriminating power of the classifier
  - More discriminating: more likely to be accepted
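
The randomised selection described above can be sketched as weighted sampling without replacement. This is only an illustration under stated assumptions: the function name, the primitive names and the power values are hypothetical, and all discriminating-power values are assumed to be positive.

```python
import random


def select_primitives(power, count, rng=None):
    """Draw up to `count` distinct primitives, each pick made with
    probability proportional to its discriminating power.

    power -- {primitive: discriminating power}, values assumed > 0
    rng   -- optional random.Random instance for reproducible draws
    """
    rng = rng or random.Random()
    pool = dict(power)          # copy so the caller's table is untouched
    chosen = []
    while pool and len(chosen) < count:
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]          # sample without replacement
    return chosen


table = {"word_a": 0.9, "word_b": 0.5, "word_c": 0.1}  # made-up values
print(select_primitives(table, 2, random.Random(0)))
```

Because each draw is random, repeated authentication sessions tend to use different text, while highly discriminating primitives still appear most often.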

Text Generation Process
- Inputs: bag of primitives; list of writers (W1, W2, W3, W4, W5, W6)
- Randomness is included in the selection process ("Select it or not?")
- The selected threshold is biased towards accepting the writer
  - For lower false rejection rates
- Fix the threshold and reject writers
[Figure: text generation flow, with accuracy]

Effect of Boosting
[Figure: probability against distance for X1, showing within-writer and between-writer distances; number of boosting stages]

Dynamic Time Warping
- Time-series alignment
- Dynamic programming approach
- Feature vectors of different lengths can be compared
[Figure: naïve alignment, re-sampled series and DTW alignment]
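
As an illustration of the dynamic-programming alignment, here is a minimal sketch of the textbook DTW recurrence for two sequences of possibly different lengths. The function name and the default absolute-difference point distance are assumptions; the slides do not specify the local distance or any windowing constraint.

```python
def dtw_distance(a, b, dist=lambda p, q: abs(p - q)):
    """Cost of the best monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # repeated sample is absorbed
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

For pen trajectories, `dist` could be replaced by a distance between per-point feature vectors.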

Stroke Comparison
- Dynamic time warping
  - Alignment of strokes is done using dynamic programming
- Directional features
  - Stroke representation: 12 bins of curvature directions
  - Curvature angle: difference between adjacent tangent directions
[Figure: example representation 112334300001, with an axis from 0 to 360]
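
One possible reading of the directional feature is a histogram of turning angles with 12 bins of 30 degrees each. The sketch below follows that reading; the function name, the use of a histogram and the bin layout over 0 to 360 degrees are assumptions, since the slide gives only the bin count and the definition of the curvature angle.

```python
import math


def curvature_histogram(points, bins=12):
    """Histogram of turning angles along a stroke.

    points -- list of (x, y) samples in drawing order
    The turning angle is the difference between adjacent tangent
    directions, wrapped into [0, 360) degrees.
    """
    tangents = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    hist = [0] * bins
    width = 360.0 / bins
    for t0, t1 in zip(tangents, tangents[1:]):
        angle = math.degrees(t1 - t0) % 360.0
        hist[int(angle // width) % bins] += 1  # % bins guards the 360.0 edge
    return hist


# A stroke that turns left by 45 degrees, then right by 45 degrees.
print(curvature_histogram([(0, 0), (1, 0), (2, 1), (3, 1)]))
```

Two strokes can then be compared by any histogram distance, or the per-point angle sequence can be fed to DTW instead.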

Results
- Experimented with English script (20 writers) and Hindi script (10 writers)
- DTW and directional feature extraction methods are used
- Each user wrote about 10-12 words
- 3-fold cross-validation is used

Performance measures
- False acceptance rate
  - Percentage of forgers who are accepted
  - Should be lower for forensic applications
  - Security is the major concern
- False rejection rate
  - Percentage of genuine users who are rejected
  - Should be lower for civilian applications
  - Usability is the major concern
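
The two rates can be computed directly from comparison distances. In this minimal sketch a claim is accepted when its distance is below the threshold; the function name and the sample distances are made up for illustration.

```python
def error_rates(genuine, forged, threshold):
    """Return (false acceptance rate, false rejection rate) in percent.

    genuine -- distances for claims made by the true writer
    forged  -- distances for claims made by other writers
    """
    false_accept = sum(d < threshold for d in forged)
    false_reject = sum(d >= threshold for d in genuine)
    far = 100.0 * false_accept / len(forged)
    frr = 100.0 * false_reject / len(genuine)
    return far, frr


far, frr = error_rates([0.1, 0.2, 0.6, 0.3], [0.4, 0.7, 0.8, 0.9], 0.5)
print(far, frr)
```

Lowering the threshold trades false acceptances for false rejections, which is why the preferred operating point differs between forensic and civilian use.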

False Accept Rate (Directional Features)

False Reject Rate (Directional Features)

False Accept Rate (DTW)

False Reject Rate (DTW)

Definition
- Threshold-1
  - Controls the range of variation within writers
  - Decided based on positive samples
- Threshold-2
  - Confidence required before rejecting other writers (negative samples)
  - Lower threshold-2 means higher confidence

Effect of thresholds (DTW and Hindi script)

No. of word comparisons (DTW and Hindi script)

Effect of thresholds (directional features and Hindi script)

Effect of thresholds (directional features and English script)

No. of word comparisons (directional features and Hindi script)

No. of word comparisons (directional features and English script)

Number of writers vs. accuracy (English script)

Number of writers vs. accuracy (Hindi script)

Analysis and Summary
- Writer-specific text generation framework
- Automatic text generation
- Automatic threshold generation
- Text is varied
  - Robust to forgery

Related work
- Features
  - Character level: GSC features, structural features, directional features
  - Word level: word model recognition, shape curvature, shape context, morphological features
- Feature selection
  - Static feature selection
  - PCA-based discriminating power
- Our approach
  - Writer-specific text generation
  - Boosting-based framework
  - Text variation
  - Higher accuracy with a limited amount of data

Comparison
- Srihari et al. [17]
  - Shape context, shape curvature, GSC features, WMR features
  - Performance: 42%, 22%, 62% and 28% respectively (1000 writers)
  - Test data size: 10 words
- Our approach
  - Directional features
  - Performance: 95% (20 writers)
  - Test data size: 5 words

Repudiation Detection in Handwritten Documents

Traditional writer identification vs. QDE (questioned document examination)
- Assumption of natural handwriting
- Biometrics terms
  - Repudiation (negative biometrics)
  - Forgery (positive biometrics)
- Quantity and quality of data available
- Cost factor involved
  - Used as expert witness in legal verdicts

Repudiation
- "The rejection or renunciation of a duty or obligation (as under a contract)" (Merriam-Webster's Dictionary of Law)
- Handwriting repudiation
  - A writer deliberately alters his or her natural handwriting to avoid detection
  - To deny involvement in the case

Repudiation (1:1 matching)
- Inputs: questioned document and reference document (no database)
- Comparator: calculate the distance between the two documents
- Hypothesis testing: is the distance significant?
- Decision: written by the same writer, or by different writers?

Verify whether the given documents were written by the same person or by different persons, without assuming natural handwriting

Why is repudiation a hard problem?
[Figure: normal handwriting and repudiated handwriting]

Challenges
- Within-writer variation becomes high
- Between-writer variation becomes low in comparison
- Learning cannot be done, as data is not available

Ray of Hope
- One cannot exclude from one's own writing those discriminating elements of which one is not aware
- Maximum and minimum velocity points remain the same in spite of changes in absolute velocity
- Words have significant overlap at the sub-character level

Framework
- Statistically significant score between two documents
- Utilize online information where it is available
- No assumptions about the distribution of the data
  - Such assumptions may lead to erroneous conclusions

Assumptions
- The questioned and reference documents either have significant overlap or are the same at the word level
- The reference document is collected in online mode

System Framework
- Hypothesis testing
- Word segmentation
- Word comparison

Hypothesis Testing
- Calculates the significance of the distance between two distributions
- Follows the Neyman-Pearson paradigm
  - H0: documents written by the same writer (null hypothesis)
  - H1: documents written by different writers (alternative hypothesis)
- Intra-document word distances and inter-document word distances are the two distributions to be compared
- The distributions are compared to find out whether they are generated from the same population

Distribution Comparison
- KL divergence test (makes assumptions about the nature of the distribution)
- Kolmogorov-Smirnov test (makes no assumptions)
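
A minimal sketch of the two-sample Kolmogorov-Smirnov comparison, applied to lists of intra-document and inter-document word distances. The function names are hypothetical, and the decision rule uses the usual large-sample critical value, which is an assumption about how significance would be assessed rather than something the slides state.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_rejects_null(sample_a, sample_b, alpha=0.05):
    """True when H0 (same population) is rejected at level alpha,
    using the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(sample_a), len(sample_b)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical


intra = [0.10 + 0.01 * i for i in range(20)]  # made-up intra-document distances
inter = [0.60 + 0.01 * i for i in range(20)]  # made-up inter-document distances
print(ks_statistic(intra, inter), ks_rejects_null(intra, inter))
```

Rejecting H0 here would point towards different writers; a non-rejection is only an absence of evidence and, as the deck notes elsewhere, is left to the expert.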

Results
- Data collected from 23 different users in English
- For each user, 3 pages of normal data and 3 pages of repudiated data are collected
- Preprocessing
  - Words are segmented using a semi-automatic word-segmentation toolkit

Results
[Figure: intra-document distance and inter-document distance]

ROC Curve
- Genuine rejection of 82% at genuine acceptance of 100%

Analysis of Results
- Semi-automatic system
- Used as an aid to the expert
- The null hypothesis is never accepted without expert intervention
- Scale used by forensic experts, from 1 (similar) to 0 (different)
  - strong probability of identification
  - probable
  - indications
  - no conclusion
  - indications did not
  - probably did not
  - strong probability did not

Conclusion and Future work
- A learning-based framework to learn similarity in spite of discrimination between documents
- Can we tell whether a writer is trying to repudiate?
- A framework that can learn more features and give an independent score for each feature

Conclusions
- Proposed algorithms for automatic identification and extraction of discriminating features for online handwriting
- Proposed a framework for writer-specific text generation and text variation for text-dependent systems
- Introduced the problem of repudiation and proposed a hypothesis-testing-based framework for it

Publications
- Sachin Gupta and Anoop M. Namboodiri, "Repudiation Detection in Handwritten Documents," Proc. of the 2nd International Conference on Biometrics (ICB'07), pp. 356-365, Seoul, Korea, 27-29 August 2007.
- Anoop M. Namboodiri and Sachin Gupta, "Text Independent Writer Identification from Online Handwriting," International Workshop on Frontiers in Handwriting Recognition (IWFHR'06), La Baule, Centre de Congress Atlantia, France, 23-26 October 2006.
- Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification using Boosting," submitted to the International Conference on Frontiers in Handwriting Recognition (ICFHR'08), Montreal, Canada.
- Sachin Gupta and Anoop M. Namboodiri, "Text dependent Writer Verification," planned for IEEE Transactions on Information Forensics and Security, 2008.

Future work
- Fusion of online and offline features for higher accuracy
- Can we automatically detect a person's intention to repudiate or forge?
  - Based on a single document
- More robust algorithms for feature extraction
  - Different from standard feature selection approaches

THANKING YOU
gupta.sachin25@gmail.com

Proposed framework
- Stages: online text document, curve extraction, representation, characteristic curve extraction, writer identification
- Curve extraction
  - Critical points: minimum and maximum velocity points
  - Shape curve: curve between any two consecutive minimum velocity points
- Representation
  - Incident angle [1]
  - Curvature [2-4]
  - Size [5-8]
- Characteristic curve extraction
  - Consistent primitive: repeating curves
  - Extraction: unsupervised learning algorithms
  - Experimental setup: K-means
- Writer identification
  - S_j is the j-th primitive, C_k is the k-th cluster and W_i is the i-th writer
  - The slide also defines the discriminability of the k-th cluster for the i-th writer [symbol and equation not recovered]
[Figures: a stroke from Devanagari script with labels 1 to 8 and the velocity profile of that stroke; six different clusters extracted from Devanagari script]
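
The curve extraction step can be sketched as follows: compute pen speed between consecutive samples, locate the local speed minima, and cut the stroke between consecutive minima. The function names, the (x, y, t) sample layout and the choice of cutting at the sample that starts the slowest segment are assumptions; timestamps are assumed to be strictly increasing.

```python
import math


def speeds(points):
    """Pen speed on each segment; points are (x, y, t) tuples."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:])
    ]


def local_minima(values):
    """Indices whose value is below the left neighbour and not above the right."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] <= values[i + 1]
    ]


def shape_curves(points):
    """Pieces of the stroke lying between consecutive minimum-velocity points."""
    cuts = local_minima(speeds(points))
    return [points[a:b + 1] for a, b in zip(cuts, cuts[1:])]


# Synthetic stroke along the x axis whose speed goes 3, 1, 3, 1, 3.
stroke = [(0, 0, 0), (3, 0, 1), (4, 0, 2), (7, 0, 3), (8, 0, 4), (11, 0, 5)]
print(shape_curves(stroke))
```

Real pen data is noisy, so the speed profile would normally be smoothed before the minima are located.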

Number of Writers vs. Accuracy
- Results for Devanagari script
- Accuracy depends on the individuality of the specific writer
[Figure: accuracy against number of writers]

Proposed Framework (example)

Framework (Authentication)

Writer-specific Text Generation
- Given
  - A set of primitives
  - Varying discriminating power for different pairs of writers
- Aim
  - To select the optimal set of weights for primitives
  - To discriminate a specific writer from others
- Dynamic feature selection
  - Static feature selection achieves a single optimum

Writer-specific Text Generation
- Text variation requires features robust to forgery
- Handwriting can have different optima
  - Different combinations of handwriting can provide the desired results

Boosting Algorithm
- Given
  - A set of training samples (X) and underlying labels (Y)
  - A set of weak hypotheses (h)
- Initialize the weight distribution (D) over the training samples [equation not recovered]
- Select the weak hypothesis h_j such that [equation not recovered]
  - m: total number of training samples
  - t: boosting stage

Boosting
- Update weights [equation not recovered]
- Final hypothesis [equation not recovered]
  - [symbol not recovered]: weight of the classifier
  - t: boosting stage
  - T: total number of boosting stages
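
The equations on these two slides did not survive extraction. For orientation, the sketch below implements standard discrete AdaBoost with labels in {-1, +1}: uniform initial weights, selection of the weak hypothesis with the lowest weighted error, a classifier weight of half the log-odds of that error, and multiplicative re-weighting of the samples. It is the textbook algorithm and may differ in detail from the variant on the slides; all function and variable names are assumptions.

```python
import math


def adaboost(samples, labels, weak_classifiers, stages):
    """Return a list of (classifier weight, weak classifier) pairs."""
    m = len(samples)
    weights = [1.0 / m] * m                    # uniform initial distribution
    ensemble = []
    for _ in range(stages):
        best, best_err = None, None
        for h in weak_classifiers:             # greedy choice per stage
            err = sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)
            if best_err is None or err < best_err:
                best, best_err = h, err
        if best_err >= 0.5:                    # no better than chance: stop
            break
        eps = max(best_err, 1e-10)             # avoid log of zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, best))
        # Raise the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted vote of the selected weak classifiers."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


stumps = [lambda x: 1 if x < 2 else -1, lambda x: 1]
model = adaboost([0, 1, 2, 3], [1, 1, -1, -1], stumps, stages=2)
print([predict(model, x) for x in [0, 1, 2, 3]])
```

In the verification setting of this deck, each weak classifier would correspond to a distance test on one primitive, with positive samples from the claimed writer and negative samples from the others.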

Discriminating Power of Primitives

Text Generation Process
- Randomness is included at each stage
  - Each classifier might be rejected, based on its discriminating power ("Select or not?")
- The threshold is biased towards accepting the writer
  - Writer-specific thresholds
- Rejection at any stage also rejects the claim
[Figure: three probability-distance plots with a calculated threshold and rejected writers; the labels shown shrink from X1-X6 to X1, X3, X4, X6 and then to X1, X4]

Why is repudiation a hard problem?
[Figure: normal and repudiated handwriting samples from writer 1 and writer 2, captioned "I am confused"]

Word Comparison
- Sub-character information
- DTW matching
