Segmentation of Touching Characters in Devnagari & Bangla Scripts Using Fuzzy MultiFactorial Analysis Presented By: Sanjeev Maharjan St. Xavier’s College.


Contents  Background  Introducing Devnagari & Bangla Scripts  Segmentation in Devnagari & Bangla  Touching Characters in OCR  Fuzzy Multifactorial Analysis (Solving Touching Character Problem)  Result  Conclusion

BACKGROUND  Document Analysis System  OCR for Document Analysis  How OCR works?

BACKGROUND (Document Analysis System)  Facilitates transfer of information on a paper document to the computer systems without intensive manual keying

BACKGROUND (OCR for Document Analysis System)  Recognizes the well-shaped and well-spaced characters from the scanned paper document  Converts those recognized characters to machine-editable format

BACKGROUND (How OCR Works)  Take a text image as input  Segmentation of text: segment the text into lines, the lines into words, and the words into characters  Character recognition: extract distinct features of each character, then map the character to a predefined character set

INTRODUCING ‘Devnagari’ & ‘Bangla’ SCRIPTS  Writing direction: left to right  50 basic characters in both scripts  Vowels take modified shapes (allographs)  Consonants combine to form compound characters  More than 250 characters  Characters of a word are joined by a ‘headline’ (‘Shirorekha’ or ‘Dikka’)

INTRODUCING ‘Devnagari’ & ‘Bangla’ SCRIPTS FIGURE: Three parts in Devnagari Script (a) & Bangla Script (b)

SEGMENTATION IN ‘Devnagari’ & ‘Bangla’ SCRIPTS  The headline is detected in the text by the row-wise sum of black pixels  Text is split into lines at the position between two consecutive headlines where the projection profile height is least  Words are segmented using the vertical projection profile  Removing the headline segments words into individual characters
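The projection-profile steps on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes a binary word image stored as a list of rows (1 = black pixel), a headline that is exactly one pixel row thick, and characters separated by at least one empty column once the headline is removed. The function names are hypothetical.

```python
def row_profile(img):
    """Row-wise count of black pixels (1 = black, 0 = white)."""
    return [sum(row) for row in img]


def column_profile(img):
    """Column-wise count of black pixels."""
    return [sum(col) for col in zip(*img)]


def headline_row(img):
    """Index of the row with the largest black-pixel count.

    Assumption: the headline is the single densest row.
    """
    profile = row_profile(img)
    return max(range(len(profile)), key=profile.__getitem__)


def nonzero_runs(profile):
    """(start, end) pairs of consecutive non-zero entries, end exclusive."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(word_img):
    """Drop the headline row, then split the word at empty columns."""
    h = headline_row(word_img)
    body = [row for i, row in enumerate(word_img) if i != h]
    return nonzero_runs(column_profile(body))


# Toy word: row 0 is the headline, two characters hang below it.
word = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
character_spans = segment_characters(word)
```

Touching characters are exactly the case where this sketch fails: no empty column exists between them, so they come back as a single span.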

SEGMENTATION IN ‘Devnagari’ & ‘Bangla’ SCRIPTS

TOUCHING CHARACTERS IN OCR  Efficiency of OCR relies on segmentation error rate  Segmentation is based on connectivity analysis  Invalid Touching Characters degrade segmentation efficiency  Touching characters are more frequent in the Devnagari & Bangla Scripts

TOUCHING CHARACTERS IN OCR (Research Observation)  Mostly consist of two characters  Are not valid characters  Have a larger aspect ratio than an isolated character  Vertical thickness of the black blob at the touching position is small  At most touching positions a single black run is encountered  Touch mostly at the middle of the middle zone  Uncommon stroke patterns are generated at touching points

FUZZY MULTIFACTORIAL ANALYSIS  Wang defined the concept of factor spaces (1982)  He defined ‘factor’ as a primary term with possible states & characteristics  E.g.: if the factor is length, then 1 m, 10 m etc. are its states, and long, short are its characteristics

FUZZY MULTIFACTORIAL ANALYSIS  H.X. Li & V.C. Yen discussed 4 kinds of factors:  Measurable factors (like time)  Nominal factors (like religion)  Degree/fuzzy factors (degree of similarity)  Switch/Boolean factors (0/1)  Multiple fuzzy factors are analyzed to identify & segment the touching characters

FUZZY MULTIFACTORIAL ANALYSIS (Identifying Touching Characters)  Factors considered: dissimilarity factor ($F_{md}$) and aspect ratio ($F_{ar}$)  $F_{md} = 1 - d_{off}/d$, where $d_{off}$ = minimum similarity distance for a target character against a set of stored prototypes and $d$ = the offset distance used by character classifiers  $F_{ar} = e^{a}/(1 + e^{a})$, where $a = w/h$ and $w$, $h$ = width and height of the minimum upright bounding box of the character  Multifactorial function: $M_{id} = \frac{1}{2}(F_{md} + F_{ar})$
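As a rough sketch, the two identification factors and their average translate directly into code. The function names are hypothetical, the distances are taken as plain floats supplied by some classifier, and no decision threshold is shown because the slides do not give one.

```python
import math


def dissimilarity_factor(d_off, d):
    """F_md = 1 - d_off / d, with d_off and d as defined on the slide."""
    return 1.0 - d_off / d


def aspect_ratio_factor(width, height):
    """F_ar = e^a / (1 + e^a), where a = width / height.

    This is the logistic function of the aspect ratio, so wider
    components score closer to 1.
    """
    a = width / height
    return math.exp(a) / (1.0 + math.exp(a))


def touching_score(d_off, d, width, height):
    """M_id = (F_md + F_ar) / 2."""
    return 0.5 * (dissimilarity_factor(d_off, d)
                  + aspect_ratio_factor(width, height))


# Illustrative (made-up) inputs: a wide component versus a narrow one.
wide = touching_score(d_off=2.0, d=8.0, width=60, height=30)
narrow = touching_score(d_off=2.0, d=8.0, width=20, height=30)
```

With the classifier distances held fixed, the wider component receives the higher score, which matches the observation that touching characters have a larger aspect ratio than isolated ones.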

FUZZY MULTIFACTORIAL ANALYSIS (Finding Cut Positions)  Factors considered:  $F_{ic}$ (inverse crossing count) $= c^{-1}$, where $c$ = vertical crossing count for a pixel column  $F_{mt}$ (measure of blob thickness) $= 1 - t/T$, where $t$ = number of black pixels found in one column scan and $T$ = height of the character's middle zone  $F_{dm}$ (degree of middleness) $= \min(l_1, l_2)/\max(l_1, l_2)$
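The three per-column factors can be sketched as below. Several points are assumptions rather than something the slides state: each column is a list of 0/1 pixels already cropped to the middle zone (so its length is T), the crossing count is taken to be the number of black runs in the column, and l1 and l2 are taken to be the white gaps above and below the black pixels. All function names are hypothetical.

```python
def crossing_count(column):
    """Number of black runs met while scanning the column top to bottom.

    Assumption: this is what the slide calls the vertical crossing count.
    """
    count, previous = 0, 0
    for pixel in column:
        if pixel and not previous:
            count += 1
        previous = pixel
    return count


def inverse_crossing_count(column):
    """F_ic = 1 / c; defined here as 0 for an all-white column."""
    c = crossing_count(column)
    return 1.0 / c if c else 0.0


def blob_thickness_factor(column):
    """F_mt = 1 - t / T, with T taken as the column length."""
    return 1.0 - sum(column) / len(column)


def middleness_factor(column):
    """F_dm = min(l1, l2) / max(l1, l2).

    Assumption: l1 and l2 are the white gaps above and below the black
    pixels. Returns 0 when the ratio is undefined.
    """
    black = [i for i, pixel in enumerate(column) if pixel]
    if not black:
        return 0.0
    l1 = black[0]
    l2 = len(column) - 1 - black[-1]
    if max(l1, l2) == 0:
        return 0.0
    return min(l1, l2) / max(l1, l2)


# A thin blob centred in the middle zone scores high on all three factors.
centred = [0, 0, 1, 1, 0, 0]
# Two separate runs, one touching the top, score lower.
off_centre = [1, 0, 1, 0, 0, 0]
```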

FUZZY MULTIFACTORIAL ANALYSIS (Finding Cut Positions)  Factors considered:  $F_{up}$ (upper stroke pattern)  $F_{low}$ (lower stroke pattern)

FUZZY MULTIFACTORIAL ANALYSIS (Finding Cut Positions)  For m pixel columns, all five factors are evaluated, forming an m×5 evaluation matrix  Multifactorial function
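The slide's formula for the cut-position multifactorial function did not survive in this transcript, so the sketch below substitutes a plain average of the five factors, by analogy with the averaged function used for identifying touching characters. That choice is an assumption, not the presenter's formula. Each row of the matrix holds the five factor values for one pixel column; the function name is hypothetical.

```python
def rank_cut_columns(evaluation_matrix):
    """Return column indices ordered from best to worst candidate cut.

    evaluation_matrix: one row per pixel column, five factor values
    per row. Assumption: the factors are aggregated by a simple mean.
    """
    scores = [sum(row) / len(row) for row in evaluation_matrix]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Made-up factor values for three pixel columns.
matrix = [
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [1.0, 0.9, 1.0, 0.8, 0.8],
    [0.5, 0.5, 0.5, 0.5, 0.5],
]
ranking = rank_cut_columns(matrix)
```

The ranked list is the input expected by a confirmation step that tries the highest-scoring column first.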

FUZZY MULTIFACTORIAL ANALYSIS (Confirming Cut Column)
1. List the optimal cut positions identified.
2. Take the cut position with the highest multifactorial evaluation value.
3. Segment the touching character at that position, giving two characters, say p1 & p2.
4. Send p1 and p2 to the character classifier.
5. If both p1 & p2 are recognized, the cut position is confirmed.
   Else if only p1 is recognized, take the cut position with the next-highest evaluation value and repeat from step 3 to segment p2.
   Else if only p2 is recognized, take the cut position with the next-highest evaluation value and repeat from step 3 to segment p1.
   Else take the cut position with the next-highest evaluation value and repeat from step 3.
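The confirmation loop can be sketched recursively. This is an illustrative reading of the steps, under these assumptions: a piece of the image is represented only by its (start, end) column span, cut positions are absolute column indices already sorted by descending evaluation value, and `classify` is a caller-supplied function returning a label or `None`. The labels and spans in the example are placeholders. One addition beyond the listed steps: if the unrecognized piece cannot be resolved with the remaining cuts, the sketch falls back to trying the next cut on the current piece.

```python
def segment_touching(span, ranked_cuts, classify):
    """Return the list of recognized labels, or None if no cut works."""
    start, end = span
    for k, cut in enumerate(ranked_cuts):
        if not start < cut < end:
            continue  # this cut lies outside the current piece
        left, right = (start, cut), (cut, end)
        left_label, right_label = classify(left), classify(right)
        remaining = ranked_cuts[k + 1:]
        if left_label is not None and right_label is not None:
            return [left_label, right_label]  # cut confirmed
        if left_label is not None:
            # Only p1 recognized: keep cutting p2 with the next-best cuts.
            tail = segment_touching(right, remaining, classify)
            if tail is not None:
                return [left_label] + tail
        elif right_label is not None:
            # Only p2 recognized: keep cutting p1 with the next-best cuts.
            head = segment_touching(left, remaining, classify)
            if head is not None:
                return head + [right_label]
    return None


# Stand-in classifier: recognizes only these exact spans.
known = {(0, 10): "Ka", (10, 20): "La", (20, 30): "Ma"}
labels = segment_touching((0, 30), [20, 14, 10], known.get)
```

In this run the first cut at column 20 isolates one recognized character on the right, the cut at 14 is rejected because neither side is recognized, and the cut at 10 resolves the rest.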

FUZZY MULTIFACTORIAL ANALYSIS (Confirming Cut Column) FIGURE: Worked example. Panels: touching character; after identification of cut positions; separation using right-most cut position; separating again. Character classifier outputs shown: ‘Ka’, ‘La’, ‘Ma’

RESULTS (By using Fuzzy Multifactorial Analysis)  Identifying the touching characters is still a problem  Segmentation accuracy is 98.92% and 98.47% for Devnagari and Bangla scripts respectively  System throughput ($T$) is calculated as $T = C/t$, where $C$ is the total number of characters properly recognized by the OCR and $t$ is the total time elapsed for the operation  System efficiency ($E$) is calculated as $E = (N_v \times 100)/N_t$, where $N_v$ is the number of valid cut columns and $N_t$ is the total number of cut columns checked to find the valid cuts
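The two metrics are simple ratios; a small sketch makes the arithmetic concrete. The function names are hypothetical and the numbers are made up for illustration only, not results from the presentation.

```python
def throughput(recognized_chars, elapsed_time):
    """T = C / t: correctly recognized characters per unit of time."""
    return recognized_chars / elapsed_time


def efficiency(valid_cuts, checked_cuts):
    """E = (Nv * 100) / Nt: percentage of checked cut columns that were valid."""
    return valid_cuts * 100 / checked_cuts


# Made-up example: 500 characters in 10 time units, 45 valid cuts out of 60.
example_throughput = throughput(500, 10)
example_efficiency = efficiency(45, 60)
```

A higher efficiency means the ranking puts valid cut columns near the top, so fewer classifier calls are spent on rejected cuts.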

CONCLUSION  Touching characters are currently one of the major problems for Nepali OCR  Use of Fuzzy Multifactorial Analysis would certainly contribute to minimising the joining errors in Nepali OCR

Resource Material  Paper: “Segmentation of Touching Characters in Printed Devnagari and Bangla Scripts Using Fuzzy Multifactorial Analysis” by Utpal Garain and Bidyut B. Chaudhuri
