Optical Character Recognition. Qurat-ul-Ain (Ainie) Akram, Sarmad Hussain, Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan.

Slide 1: Optical Character Recognition
Qurat-ul-Ain (Ainie) Akram, Sarmad Hussain
Center for Language Engineering, Al-Khawarizmi Institute of Computer Science, University of Engineering and Technology, Lahore, Pakistan
Lecture 8

Slide 2: Syllable String Creation Using a Lookup Table (ISSALE 2014)
Syllable String | Main Body ID | Diacritics1_ID | …
تا | 500 | 2
و | 501 |
پتھر | 200212
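A lookup table of this kind maps a recognized main body ID plus its diacritic IDs to a syllable string. The sketch below is an illustration only: it assumes the table is keyed by `(main_body_id, tuple_of_diacritic_ids)`, uses only the two example rows that read unambiguously from the slide (تا as main body 500 with diacritic 2, و as main body 501 with no diacritic), and the names `SYLLABLE_TABLE` and `create_syllable_string` are hypothetical.

```python
# Hypothetical lookup table keyed by (main body ID, diacritic IDs in column order).
# Only the two example rows from the slide are included; a full table would list
# every main body / diacritic combination in the data set.
SYLLABLE_TABLE = {
    (500, (2,)): "تا",
    (501, ()): "و",
}


def create_syllable_string(main_body_id, diacritic_ids=(), unknown="?"):
    """Return the syllable for a main body and its diacritics, or `unknown`
    when the combination is not in the table."""
    return SYLLABLE_TABLE.get((main_body_id, tuple(diacritic_ids)), unknown)


print(create_syllable_string(500, [2]))  # تا
print(create_syllable_string(501))       # و
```

Returning a placeholder for unknown combinations, rather than raising, keeps one unrecognized syllable from aborting the whole output line.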

Slide 3: Project Presentation
1. Front Page
– Optical Character Recognition (in English)
– Optical Character Recognition (in your language)
– Document image
– Output of OCR (recognized syllable strings)
– Syllable string recognition accuracy (correctly recognized syllables / total syllables × 100)
– Group members' names
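The accuracy figure asked for on the front page (and in the results tables that follow) is a plain percentage. A minimal helper, with a hypothetical name and hypothetical example counts:

```python
def recognition_accuracy(correct: int, total: int) -> float:
    """Percentage accuracy: correctly recognized items / total items * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total * 100.0


# Hypothetical counts: 45 of 50 syllables recognized correctly
print(f"{recognition_accuracy(45, 50):.1f}%")  # 90.0%
```

The same function serves for lines, syllables, main bodies, and diacritics; only the counts change.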

Slide 4: 1. Preprocessing
– Line Segmentation
   Samples of line segmentation
   Line segmentation accuracy results
   Samples of incorrect line segmentation
– Syllable/Ligature Segmentation
   Samples of syllable/ligature segmentation
   Syllable/ligature segmentation accuracy results
   Samples of incorrect syllable/ligature segmentation

Total Lines | Correctly Segmented Lines | Incorrectly Segmented Lines | % Accuracy
Total Syllables | Correctly Segmented Syllables | Incorrectly Segmented Syllables | % Accuracy
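The slides do not say which segmentation method to use. One common approach, sketched here as an assumption, is to split lines with a horizontal projection profile (blank rows separate lines) and then take 8-connected components of ink within a line as candidate ligatures (main bodies) and diacritics. The image is assumed to be already binarized into a list of rows with 1 = ink; the function names are hypothetical.

```python
from collections import deque


def segment_lines(img):
    """Split a binary page image into text lines using the horizontal
    projection profile. Returns (top, bottom) row ranges, bottom exclusive."""
    lines, start = [], None
    for y, row in enumerate(img):
        if any(row):                 # row contains ink
            if start is None:
                start = y            # a new line begins
        elif start is not None:      # first blank row after a line
            lines.append((start, y))
            start = None
    if start is not None:            # line touching the bottom edge
        lines.append((start, len(img)))
    return lines


def connected_components(img):
    """Group 8-connected ink pixels; each group is a candidate ligature
    main body or diacritic. Returns a list of (row, col) pixel lists."""
    h = len(img)
    w = len(img[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:             # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
print(segment_lines(page))              # [(1, 2), (3, 4)]
print(len(connected_components(page)))  # 3
```

On real Urdu Nastaliq pages, lines often overlap vertically, so a pure projection profile may merge lines; counting such failures is what the "incorrect line segmentation" samples are for.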

Slide 5: Pre-processing
– Main body and diacritics disambiguation

Total Main Bodies | Correctly Classified as Main Bodies | % Accuracy
Total Diacritics | Correctly Classified as Diacritics | % Accuracy
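The slide does not specify how main bodies and diacritics are told apart. A simple heuristic, used here only as an illustrative assumption, is size: dots and other marks are much smaller than main bodies. Components are assumed to be lists of pixels (for example, from a connected-component pass), and `ratio=0.25` is a hypothetical threshold, not a value from the slides. The second function computes the per-class accuracy asked for in the table.

```python
def classify_components(components, ratio=0.25):
    """Label each component 'main_body' or 'diacritic' by pixel count.
    ASSUMPTION: diacritics are much smaller than main bodies; `ratio` is a
    hypothetical fraction of the largest component's size."""
    largest = max(len(c) for c in components)
    return ["main_body" if len(c) >= ratio * largest else "diacritic"
            for c in components]


def class_accuracy(predicted, truth, label):
    """Percentage of ground-truth `label` items that were classified as `label`."""
    total = sum(1 for t in truth if t == label)
    correct = sum(1 for p, t in zip(predicted, truth) if p == t == label)
    return correct / total * 100 if total else 0.0


# Hypothetical components of 40, 3, 25 and 2 pixels
comps = [
    [(0, i) for i in range(40)],
    [(5, 5), (5, 6), (6, 5)],
    [(2, i) for i in range(25)],
    [(7, 7), (7, 8)],
]
pred = classify_components(comps)
print(pred)  # ['main_body', 'diacritic', 'main_body', 'diacritic']
truth = ["main_body", "diacritic", "main_body", "main_body"]
print(round(class_accuracy(pred, truth, "main_body"), 1))  # 66.7
print(class_accuracy(pred, truth, "diacritic"))            # 100.0
```

Size alone misclassifies small main bodies (an isolated short letter can be as small as a mark), so position relative to the baseline is often added as a second cue.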

Slide 6: Classification and Recognition
– Data Description
   15 Main Body Types (DataSet-1)
     – Training Data (35 Tokens)
     – Testing Data (15 Tokens)
     – Image samples
   Document Images (DataSet-2)
     – Testing Data
        » X tokens of Y main body types
        » X tokens of Y diacritic types
        » Image sample

Main Body Type | Total Tokens in Document Images | Total Unique Syllables in Document Images
500 | 15 | 4
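For DataSet-1, each main body type is divided into 35 training tokens and 15 testing tokens; the per-type reading follows the "Test data (15 Tokens)" column in the results tables. A minimal sketch of such a split, assuming samples are grouped by type in a dictionary of file names (the function name and file names are hypothetical):

```python
import random


def split_per_type(samples_by_type, n_train=35, n_test=15, seed=0):
    """Shuffle each main body type's samples and split them into n_train
    training and n_test testing tokens. Returns two lists of (sample, type)."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for mb_type, samples in samples_by_type.items():
        if len(samples) < n_train + n_test:
            raise ValueError(
                f"type {mb_type}: need {n_train + n_test} samples, got {len(samples)}")
        shuffled = list(samples)
        rng.shuffle(shuffled)
        train += [(s, mb_type) for s in shuffled[:n_train]]
        test += [(s, mb_type) for s in shuffled[n_train:n_train + n_test]]
    return train, test


# Hypothetical file names: 15 main body types x 50 images each
data = {t: [f"MB{t}_{i}.png" for i in range(50)] for t in range(1, 16)}
train, test = split_per_type(data)
print(len(train), len(test))  # 525 225
```

Splitting within each type, rather than over the pooled data, guarantees every type has exactly 15 test tokens, which is what the per-class results tables assume.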

Slide 7: Classification and Recognition Results
– Recognition results on DataSet-1 using Decision Trees
   Main body recognition accuracy
   Diacritics recognition accuracy
– Recognition results on DataSet-1 using Tesseract
   Main body recognition accuracy
   Diacritics recognition accuracy

Class Type | Total Samples | Test Data (15 Tokens) | Correctly Recognized | % Accuracy
Class Type | Total Samples | Test Data (15 Tokens) | Correctly Recognized | % Accuracy
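The results tables list, per class, the number of samples, the number recognized correctly, and the percentage. Given parallel lists of true and predicted labels, from a decision tree or from Tesseract output mapped back to class IDs, the rows can be built as below. The function name and example labels are hypothetical.

```python
from collections import Counter


def per_class_results(truth, predicted):
    """Rows of (class type, total samples, correctly recognized, % accuracy)
    from parallel lists of true and predicted labels."""
    totals = Counter(truth)
    correct = Counter(t for t, p in zip(truth, predicted) if t == p)
    return [(c, totals[c], correct[c], correct[c] / totals[c] * 100)
            for c in sorted(totals)]


# Hypothetical labels for two main body classes
truth = [500, 500, 501, 501]
pred = [500, 501, 501, 501]
for row in per_class_results(truth, pred):
    print("%s | %d | %d | %.1f" % row)
# 500 | 2 | 1 | 50.0
# 501 | 2 | 2 | 100.0
```

Keeping the table builder independent of the classifier makes the Decision Tree and Tesseract rows directly comparable.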

Slide 8: Classification and Recognition Results
– Recognition results on DataSet-2 using Decision Trees
   Main body recognition accuracy
   Diacritics recognition accuracy
OR
– Recognition results on DataSet-2 using Tesseract
   Main body recognition accuracy
   Diacritics recognition accuracy

Class Type | Total Samples | Correctly Recognized | % Accuracy
Class Type | Total Samples | Correctly Recognized | % Accuracy

Slide 9: Post-processing
– Syllable String Creation
– Syllable String Recognition Accuracy

Syllable String | Main Body ID | Diacritics1_ID | …
تا | 500 | 2
و | 501 |

Syllable Type | Total Samples | Correctly Recognized | % Accuracy

Slide 10: Output of OCR
Input Document Image | OCR Output

Slide 11: Deliverables to Submit
1. Presentation slides
2. Complete OCR code
   1. Line segmentation
   2. Syllable segmentation
   3. Recognition of diacritics and main bodies
   4. Syllable string creation using a lookup table
   5. Output.txt file generation
3. DataSet-1
4. DataSet-2
5. Tesseract traineddata file
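For the Output.txt deliverable, the main practical point is to write the file as UTF-8 so the Urdu text survives. The layout below is an assumption, since the slides do not fix one: recognized syllable strings separated by spaces, one text line per output line. The function names are hypothetical.

```python
from pathlib import Path


def format_output(lines):
    """Join recognized syllable strings: syllables separated by spaces,
    one line of the document per output line (ASSUMED layout)."""
    return "\n".join(" ".join(line) for line in lines) + "\n"


def write_output(lines, path="Output.txt"):
    """Write the OCR result as UTF-8 so Urdu characters are preserved."""
    Path(path).write_text(format_output(lines), encoding="utf-8")


recognized = [["تا", "و"], ["و"]]
print(format_output(recognized), end="")
# تا و
# و
```

Calling `write_output(recognized)` creates Output.txt in the current working directory.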

Slide 12: Good Luck

Slide 13: Document Image Creation
Syllable_of_MB1_Samples_1 Syllable_of_MB2_Samples_1 Syllable_of_MB3_Samples_1 Syllable_of_MB4_Samples_1 Syllable_of_MB5_Samples_1 … Syllable_of_MB15_Samples_1
Syllable_of_MB1_Samples_2 Syllable_of_MB2_Samples_2 Syllable_of_MB3_Samples_2 Syllable_of_MB4_Samples_2 Syllable_of_MB5_Samples_2 … Syllable_of_MB15_Samples_2
Syllable_of_MB1_Samples_3 Syllable_of_MB2_Samples_3 Syllable_of_MB3_Samples_3 Syllable_of_MB4_Samples_3 Syllable_of_MB5_Samples_3 … Syllable_of_MB15_Samples_3
Syllable_of_MB1_Samples_4 Syllable_of_MB2_Samples_4 Syllable_of_MB3_Samples_4 Syllable_of_MB4_Samples_4 Syllable_of_MB5_Samples_4 … Syllable_of_MB15_Samples_4
…
Syllable_of_MB1_Samples_15 Syllable_of_MB2_Samples_15 Syllable_of_MB3_Samples_15 Syllable_of_MB4_Samples_15 Syllable_of_MB5_Samples_15 … Syllable_of_MB15_Samples_15

Syllable = MB + Diacritics, or Syllable = MB
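The document image is laid out so that row k holds sample k of every main body type. The sketch below only generates that ordering of sample names; pasting the actual images onto a page is not shown. It assumes each row lists MB1 through MB15 exactly once (the repeated MB2 entry in the original slide text is treated as a typo), and `document_layout` is a hypothetical name.

```python
def document_layout(n_main_bodies=15, n_samples=15):
    """Return syllable sample names row by row: row k holds sample k of
    main body types 1..n_main_bodies."""
    return [
        [f"Syllable_of_MB{mb}_Samples_{k}" for mb in range(1, n_main_bodies + 1)]
        for k in range(1, n_samples + 1)
    ]


layout = document_layout()
print(layout[0][:2])   # ['Syllable_of_MB1_Samples_1', 'Syllable_of_MB2_Samples_1']
print(layout[-1][-1])  # Syllable_of_MB15_Samples_15
```

Interleaving types within each row, instead of grouping all samples of one type together, means every text line in the test document contains every main body type.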

Slide 14: Examples of Document Images
