
1 Image Text & Audio hacks

2 Introduction: Image processing is one of the fastest-growing technologies in computer science. It is a method of converting an image into digital form and performing operations on it, in order to obtain an enhanced image or to extract useful information from it. Image processing is usually done for: 1. Visualization: observing objects that are not visible. 2. Image sharpening and restoration: creating a better image. 3. Image retrieval: searching for an image of interest. 4. Measurement of pattern: measuring various objects in an image. 5. Image recognition: distinguishing the objects in an image. In this hackfest, using the OpenCV library in C++ and the Tesseract OCR engine, we aim to build the following three devices:

3 A Friend of the Blind: India has the world's largest blind population. More than 1.43 million people are visually impaired. Only 5% of blind people receive any kind of education. Braille books, Braille printers and scanners are not readily available (they are expensive and non-portable). The device helps these people read an ordinary textbook. It has a right-angle guide that helps a blind person move their finger along a straight line, and it gives audio feedback to help them move on to the next line.

4 A Teaching Assistant: India ranks 185 among countries by average literacy rate. Around 70% of people in India still live in villages, where quality education is often unavailable. The device is especially helpful in these cases. It acts as a teaching assistant, enabling a child to learn more interactively and efficiently through visualization.

5 A Tourist Companion: Imagine yourself in a foreign country where you do not know the spoken language! You could easily land yourself in serious trouble. No problem: this device comes to your rescue. It translates the text in the image into the user's chosen language, enabling them to understand the foreign text.

6 List of Libraries and APIs Used: OpenCV (C++): preprocessing the image before feeding it into Tesseract. Tesseract: converting text in images to machine-readable text. PyEnchant: dictionary. gTTS: Google Text-to-Speech (online mode), output in MP3 format. pico2wave: text-to-speech (offline mode). Pygame: playing the MP3 file.

7 Steps Followed to Read the Word Pointed at by the Finger: (1) Find the centroid of the finger part and save it. (2) Crop the part of the image above the centroid. (3) Invert the image (swap black and white). (4) Apply a morphological dilation to the inverted image. (5) Draw bounding rectangles and find the centroids of all contours. (6) Crop out the contour whose x-coordinate range contains the centroid of the finger part.
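The last step above can be sketched as follows, assuming the dilated word contours have already been reduced to bounding boxes (as `cv::boundingRect` would produce) and the finger centroid is known. The `Box` struct, the function name and the tie-break rule for several matching boxes are hypothetical; this is an illustration, not the team's code.

```cpp
#include <optional>
#include <vector>

// Bounding box of one dilated word contour; (x, y) is the top-left corner.
struct Box { int x, y, w, h; };

// Returns the box whose horizontal range [x, x + w) contains fingerX and that lies
// entirely above the fingertip (bottom edge <= fingerY). HYPOTHETICAL tie-break: if
// several boxes qualify (words on different lines), the one closest above the finger wins.
std::optional<Box> wordAtFinger(const std::vector<Box>& boxes, int fingerX, int fingerY) {
    std::optional<Box> best;
    for (const Box& b : boxes) {
        const bool coversX = fingerX >= b.x && fingerX < b.x + b.w;
        const bool aboveFinger = b.y + b.h <= fingerY;
        if (!coversX || !aboveFinger) continue;
        if (!best || b.y + b.h > best->y + best->h) best = b;
    }
    return best;
}
```

If no box contains the finger's x-coordinate (for example, the finger rests between two words), the function returns an empty optional and the frame can simply be skipped.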

8 Identify the Finger Part: Given an RGB colour range, set only the pixels inside that range to white and all the rest to black. Find the contours. Find the centroid of the largest contour.
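A standard-library sketch of this step is shown below. It builds the colour mask (what `cv::inRange` does in OpenCV) and then, as a substitution for contour tracing with `cv::findContours`, labels 4-connected white blobs by flood fill and takes the centroid (first moments divided by area) of the blob with the most pixels. The struct names and the colour bounds in the test are assumptions.

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

struct RGB { std::uint8_t r, g, b; };
struct Image { int width, height; std::vector<RGB> px; };  // pixel (x, y) at y * width + x

// Binary mask: true where every channel lies inside [lo, hi] (inclusive), false elsewhere.
std::vector<bool> colourMask(const Image& img, RGB lo, RGB hi) {
    std::vector<bool> mask(img.px.size());
    for (std::size_t i = 0; i < img.px.size(); ++i) {
        const RGB& p = img.px[i];
        mask[i] = p.r >= lo.r && p.r <= hi.r && p.g >= lo.g && p.g <= hi.g &&
                  p.b >= lo.b && p.b <= hi.b;
    }
    return mask;
}

// Centroid (x, y) of the largest 4-connected white blob, or nullopt if the mask is empty.
std::optional<std::pair<double, double>> largestBlobCentroid(const std::vector<bool>& mask,
                                                             int width, int height) {
    std::vector<bool> seen(mask.size(), false);
    std::size_t bestCount = 0;
    double bestSx = 0, bestSy = 0;
    for (int start = 0; start < width * height; ++start) {
        if (!mask[start] || seen[start]) continue;
        // Breadth-first flood fill of one blob, summing coordinates (moments m10, m01).
        std::queue<int> q;
        q.push(start);
        seen[start] = true;
        std::size_t count = 0;
        double sx = 0, sy = 0;
        while (!q.empty()) {
            const int i = q.front();
            q.pop();
            const int x = i % width, y = i / width;
            ++count; sx += x; sy += y;
            const std::array<std::pair<int, int>, 4> nbrs{{{x + 1, y}, {x - 1, y}, {x, y + 1}, {x, y - 1}}};
            for (auto [nx, ny] : nbrs) {
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                const int j = ny * width + nx;
                if (mask[j] && !seen[j]) { seen[j] = true; q.push(j); }
            }
        }
        if (count > bestCount) { bestCount = count; bestSx = sx; bestSy = sy; }
    }
    if (bestCount == 0) return std::nullopt;
    return std::make_pair(bestSx / bestCount, bestSy / bestCount);
}
```

Keeping only the largest blob is what makes stray pixels of a similar colour elsewhere in the frame harmless.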

9 Morphological Operations on Contours: Dilation, Erosion, Contour Properties
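To make the two operations concrete, here is a minimal binary dilation and erosion with a square (2r+1)x(2r+1) structuring element, roughly what `cv::dilate` and `cv::erode` do with a rectangular kernel. The `Binary` alias and the function names are illustrative; pixels outside the image are simply ignored.

```cpp
#include <vector>

// Binary image, row-major: 1 = white (foreground), 0 = black.
using Binary = std::vector<std::vector<int>>;

// grow == true  -> dilation: output is 1 if ANY pixel under the kernel is 1 (blobs grow and merge).
// grow == false -> erosion:  output is 1 only if ALL pixels under the kernel are 1 (blobs shrink).
Binary morph(const Binary& img, int r, bool grow) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    Binary out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool any = false, all = true;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;  // skip outside pixels
                    any = any || img[yy][xx] != 0;
                    all = all && img[yy][xx] != 0;
                }
            out[y][x] = grow ? any : all;
        }
    return out;
}

Binary dilate(const Binary& img, int r) { return morph(img, r, true); }
Binary erode(const Binary& img, int r) { return morph(img, r, false); }
```

On the inverted text image, dilation is what bridges the small gaps between letters so that each word becomes one contour; erosion afterwards (a "closing") restores the outer size while keeping the gaps filled.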

10 Frame Captured by the Camera

11 Finger Part Cropped out

12 Invert Image

13 Find Contours

14 Identify the word we are pointing at

15 Adaptive thresholding
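The slide names the technique without details, so the sketch below assumes the mean variant (comparable to `cv::adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` and `THRESH_BINARY`): each pixel is compared with the mean of its own neighbourhood minus a constant `c`, which copes with uneven lighting on a page better than one global threshold. Border pixels average only the neighbours inside the image, a simplification; `Gray` and the function name are illustrative.

```cpp
#include <cstdint>
#include <vector>

struct Gray { int width, height; std::vector<std::uint8_t> px; };  // row-major grayscale

// A pixel becomes 255 if it is brighter than (mean of its (2r+1)x(2r+1) neighbourhood) - c,
// else 0. Neighbours that fall outside the image are skipped.
Gray adaptiveMeanThreshold(const Gray& src, int r, int c) {
    Gray dst{src.width, src.height, std::vector<std::uint8_t>(src.px.size(), 0)};
    for (int y = 0; y < src.height; ++y)
        for (int x = 0; x < src.width; ++x) {
            long sum = 0;
            int n = 0;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int xx = x + dx, yy = y + dy;
                    if (xx < 0 || yy < 0 || xx >= src.width || yy >= src.height) continue;
                    sum += src.px[yy * src.width + xx];
                    ++n;
                }
            const double threshold = static_cast<double>(sum) / n - c;
            dst.px[y * src.width + x] = src.px[y * src.width + x] > threshold ? 255 : 0;
        }
    return dst;
}
```

For the row `{100, 100, 40, 200, 200, 200}` with `r = 1` and `c = 5`, only the dark "ink" pixel (40) becomes 0, even though a single global threshold of, say, 150 would also have turned the dimly lit background pixels (100) black.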

16 Crop Out the Word Pointed at by the Red-Coloured Finger Part

17

18 The cropped-out word is not very clear, and Tesseract is not sensitive enough to identify characters that are unclear. Problem solved by smoothing the edges.

19 Smoothing by Averaging and Gaussian Filters
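A small sketch of both filters: an averaging (box) kernel gives every neighbour equal weight, while a Gaussian kernel weights neighbours by exp(-d²/2σ²), so nearby pixels count more and edges are softened less harshly. In OpenCV these correspond to `cv::blur` and `cv::GaussianBlur`; the kernel size and σ the team used are not stated, so the values in the test are assumptions, and edge pixels are replicated at the border.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Kernel = std::vector<std::vector<double>>;
using Plane = std::vector<std::vector<double>>;  // grayscale image, row-major

// k x k averaging kernel: every weight is 1 / (k * k).
Kernel boxKernel(int k) { return Kernel(k, std::vector<double>(k, 1.0 / (k * k))); }

// k x k Gaussian kernel with standard deviation sigma, normalised so the weights sum to 1.
Kernel gaussianKernel(int k, double sigma) {
    Kernel g(k, std::vector<double>(k));
    const int c = k / 2;
    double sum = 0;
    for (int y = 0; y < k; ++y)
        for (int x = 0; x < k; ++x) {
            const double d2 = (x - c) * (x - c) + (y - c) * (y - c);
            g[y][x] = std::exp(-d2 / (2 * sigma * sigma));
            sum += g[y][x];
        }
    for (auto& row : g)
        for (double& v : row) v /= sum;
    return g;
}

// Applies a square, odd-sized, symmetric kernel; out-of-range pixels replicate the nearest edge.
Plane filter2D(const Plane& img, const Kernel& ker) {
    const int h = static_cast<int>(img.size()), w = static_cast<int>(img[0].size());
    const int k = static_cast<int>(ker.size()), c = k / 2;
    Plane out(h, std::vector<double>(w, 0.0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx) {
                    const int yy = std::clamp(y + ky - c, 0, h - 1);
                    const int xx = std::clamp(x + kx - c, 0, w - 1);
                    out[y][x] += ker[ky][kx] * img[yy][xx];
                }
    return out;
}
```

For a hard edge `{0, 0, 255, 255}`, the 3x3 box filter turns the second pixel into 85: the jagged step becomes a ramp, which is the smoothing effect the slide relies on before OCR.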

20 What if the line of words in the image is bent? Is Tesseract good enough to identify characters even if the line of text is bent? Answer: No.

21 Solution to the Line-Bending Problem: Draw an ellipse around the contour of the last line. Once the ellipse is drawn, the angle between its major axis and the x-axis gives the angle of tilt, and the complete image can then be rotated by that angle.
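A sketch of the tilt estimate. Instead of a least-squares ellipse fit such as `cv::fitEllipse`, it uses the closely related moment-based ellipse: the major axis of the ellipse with the same second central moments as the line's pixels lies at θ = ½·atan2(2μ₁₁, μ₂₀ − μ₀₂). This is a swapped-in technique, not necessarily what the team ran. The rotation is applied to point coordinates here; for a whole image, OpenCV's `cv::getRotationMatrix2D` plus `cv::warpAffine` would do the equivalent. `Pt` and both function names are illustrative, and `tiltAngle` assumes a non-empty point set.

```cpp
#include <cmath>
#include <vector>

struct Pt { double x, y; };

// Major-axis angle (radians) of the ellipse with the same second moments as pts,
// measured from the x-axis in image coordinates (y pointing down). pts must be non-empty.
double tiltAngle(const std::vector<Pt>& pts) {
    double mx = 0, my = 0;
    for (const Pt& p : pts) { mx += p.x; my += p.y; }
    mx /= pts.size();
    my /= pts.size();
    double mu20 = 0, mu02 = 0, mu11 = 0;  // central second moments
    for (const Pt& p : pts) {
        const double dx = p.x - mx, dy = p.y - my;
        mu20 += dx * dx;
        mu02 += dy * dy;
        mu11 += dx * dy;
    }
    return 0.5 * std::atan2(2 * mu11, mu20 - mu02);
}

// Rotates every point by -angle about (cx, cy), undoing the measured tilt.
std::vector<Pt> rotateBack(const std::vector<Pt>& pts, double angle, double cx, double cy) {
    const double c = std::cos(-angle), s = std::sin(-angle);
    std::vector<Pt> out;
    out.reserve(pts.size());
    for (const Pt& p : pts) {
        const double dx = p.x - cx, dy = p.y - cy;
        out.push_back({cx + c * dx - s * dy, cy + s * dx + c * dy});
    }
    return out;
}
```

Pixels lying along the diagonal y = x give θ = π/4 (45°), and rotating them back about their centre makes the measured angle essentially zero. Note that this corrects a straight but tilted line; a genuinely curved line would need more than one global rotation.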

22

23 How to Find the Font Size in an Image? We need to find the font size because the dilation operation is chosen according to it. This is done by first inverting the normal image, treating each character as a separate contour, drawing a rectangle around each contour and finding the height of each contour.
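Once the character rectangles are known, their heights still have to be reduced to one number. The sketch below assumes the median height is used (the slide does not say), since it is robust to noise specks and merged characters, and then maps it to a dilation width with a HYPOTHETICAL factor of one third that would need tuning. `Rect` mirrors what `cv::boundingRect` returns; the names are illustrative.

```cpp
#include <algorithm>
#include <vector>

struct Rect { int x, y, w, h; };  // bounding rectangle of one character contour

// Estimated font height in pixels: the (upper) median of the rectangle heights.
// Returns 0 when there are no rectangles.
int estimateFontHeight(std::vector<Rect> boxes) {
    if (boxes.empty()) return 0;
    auto mid = boxes.begin() + boxes.size() / 2;
    std::nth_element(boxes.begin(), mid, boxes.end(),
                     [](const Rect& a, const Rect& b) { return a.h < b.h; });
    return mid->h;
}

// HYPOTHETICAL rule: dilation width proportional to the font height (factor 1/3 is a
// placeholder), aiming to merge the letters of a word without merging neighbouring words.
int dilationWidthFor(int fontHeight) { return std::max(1, fontHeight / 3); }
```

With heights {20, 22, 21, 3, 60}, the 3-pixel speck and the 60-pixel merged blob do not affect the estimate, which comes out as 21 pixels.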

24 Overall View

25 We slide our finger continuously and the words are read out dynamically! Any problems faced? (1) Blurred images should be discarded. Solution: a Laplacian edge detector. (2) Once a word has been read out, that image should not be processed again. Solution: the Scale-Invariant Feature Transform (SIFT), to tell whether the present image is the same as the previously processed image. (3) Two parallel threads: one for processing images, the other for taking input images from the camera.

26 Laplacian Edge Detector: If the edges are crisp, the maximum value of the filtered matrix is 255; otherwise it is somewhere around 100.
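The blur test on this slide can be sketched as "take the maximum Laplacian response and compare it with a cut-off". The sketch assumes the 4-neighbour kernel [0 1 0; 1 −4 1; 0 1 0] (the one `cv::Laplacian` uses with `ksize = 1`) and takes the absolute response clipped to the 8-bit range; the cut-off of 200, chosen between the ~100 and 255 values reported on the slide, is a HYPOTHETICAL default.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Gray { int width, height; std::vector<std::uint8_t> px; };  // row-major grayscale

// Maximum absolute 4-neighbour Laplacian response over interior pixels, clipped to [0, 255].
int maxLaplacian(const Gray& g) {
    auto at = [&](int x, int y) { return static_cast<int>(g.px[y * g.width + x]); };
    int best = 0;
    for (int y = 1; y + 1 < g.height; ++y)
        for (int x = 1; x + 1 < g.width; ++x) {
            const int lap = at(x - 1, y) + at(x + 1, y) + at(x, y - 1) + at(x, y + 1) - 4 * at(x, y);
            best = std::max(best, std::min(255, std::abs(lap)));
        }
    return best;
}

// HYPOTHETICAL cut-off between "blurred" (~100) and "crisp" (255) frames.
bool isSharp(const Gray& g, int threshold = 200) { return maxLaplacian(g) >= threshold; }
```

A hard black-to-white step saturates the response at 255, while a gentle ramp across the same columns gives a response close to 0, so the blurred frame is rejected before OCR is attempted.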

27 Scale-Invariant Feature Transform

28 Problem: Incomplete (premature) images. Solution: using two parallel threads (programs).
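One way to realise the two-thread design is a single-slot mailbox: the camera thread publishes a frame only after it is fully captured, and the processing thread always takes the newest complete frame, so it never sees a half-written image and stale frames are dropped. This is a sketch under those assumptions with illustrative names (`Frame`, `LatestFrameSlot`, `runDemo`); the slides do not describe how the team synchronised their threads.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

struct Frame { int id; std::vector<unsigned char> pixels; };  // id = capture order

// Single-slot mailbox shared by the camera thread and the processing thread.
class LatestFrameSlot {
public:
    // Called by the camera thread with a fully captured frame; overwrites any unprocessed one.
    void publish(Frame f) {
        { std::lock_guard<std::mutex> lock(m_); frame_ = std::move(f); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until a frame is available; returns nullopt once closed and empty.
    std::optional<Frame> take() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return frame_.has_value() || closed_; });
        std::optional<Frame> out = std::move(frame_);
        frame_.reset();
        return out;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::optional<Frame> frame_;
    bool closed_ = false;
};

// Demo: a "camera" thread publishes numbered frames while this thread consumes them.
std::vector<int> runDemo(int frames) {
    LatestFrameSlot slot;
    std::thread camera([&] {
        for (int i = 0; i < frames; ++i) slot.publish(Frame{i, std::vector<unsigned char>(16, 7)});
        slot.close();
    });
    std::vector<int> processed;
    while (auto f = slot.take()) processed.push_back(f->id);  // stand-in for OCR + speech
    camera.join();
    return processed;
}
```

The processed ids are always increasing and end with the last captured frame, but how many intermediate frames are skipped depends on timing, which is the intended behaviour when processing is slower than capture.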

29 Removal of Noise: Noise Part

30 Removal of Noise
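The slides do not say which noise filter was used. As one common option, shown here as an assumption rather than the team's method, a 3x3 median filter (like `cv::medianBlur` with `ksize = 3`) replaces each pixel by the median of its neighbourhood, which removes isolated specks while keeping text strokes sharper than averaging would. `Gray` and the function name are illustrative; edge pixels are replicated.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

struct Gray { int width, height; std::vector<std::uint8_t> px; };  // row-major grayscale

// Each output pixel is the median of the 3x3 window around it (edges replicated).
Gray median3x3(const Gray& src) {
    Gray dst = src;
    for (int y = 0; y < src.height; ++y)
        for (int x = 0; x < src.width; ++x) {
            std::array<std::uint8_t, 9> win{};
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int xx = std::clamp(x + dx, 0, src.width - 1);
                    const int yy = std::clamp(y + dy, 0, src.height - 1);
                    win[n++] = src.px[yy * src.width + xx];
                }
            std::nth_element(win.begin(), win.begin() + 4, win.end());  // 5th smallest = median
            dst.px[y * src.width + x] = win[4];
        }
    return dst;
}
```

A single black pixel on a white page disappears completely, whereas a three-pixel-wide black stroke survives, because inside the stroke most of each window is still black.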

31 Conclusions: What we built is a working prototype for reading text in a book, extracting text from images of product wrappers and reading it out, translating the words into the required language using Google APIs (helpful for tourists), and displaying on screen an image of the word that a child points at (a teaching assistant for interactive learning).

32 After the hackathon, we aim to make the following key improvements: the present algorithm needs to be faster, it should include machine learning algorithms, and the device should be portable.

