By: Hossein and Hadi Shayesteh Supervisor: Mr. James Connan.


1 By: Hossein and Hadi Shayesteh Supervisor: Mr. James Connan

2 Introduction
Phone Reader
- Converts an image to text using OCR
- Reads the text aloud using TTS
Potential users
- Blind and illiterate people
- People dealing with a foreign language

3 HLD / LLD of OCR
Preprocessing
  +Image[] : int
  -dotRemove(img[]:int) : int
  -binarization(img[]:int) : int
  +thresholding(img[]:int) : int
Segmentation
  -Width : int
  -Height : int
  +Image[] : int
  -SegmentImg(width:int, height:int, Img[]:int) : int
  +Diff(Img[]:int) : int
Feature_Extraction
  -ImgSeg[] : int
  -Pixel_ID(ImgSeg[]:int) : int
  -Loci(ImgSeg[][]:int) : int
  +Feature_Vector(ImgSeg[][]:int) : int
Classification
  -FV[] : int
  +Classification(Cnum:int) : int
  +Feature_Vector(class[]:int) : int
  -Binary_Mask(FV[]:int, CFV[][]:int)
  -Distance(FV[]:int, CFV[][]:int) : int
  -Classifier(distance:int) : int
Association multiplicities shown on the diagram: 1, 1.., 0..1

4 HLD / LLD of OCR
Feature Extraction
- Assigning PixelNum to each class
- Calculating LociNum using the Characteristic Loci approach
- Creating a Feature Vector for each segment
Classification
- Creating 38 classes
- Creating a Feature Vector for each class
- Applying a Binary Mask
- Calculating the Euclidean Distance
- Classifying the input
Object examples
  Image : Feature_Extraction
    ImgSeg[5] = 200
  Image : Classification
    Fv[256] = { 4,18,……,9 }
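
The classification steps on this slide (binary mask, Euclidean distance, pick a class) can be illustrated with a minimal sketch. It assumes the binary mask selects which feature positions take part in the distance, d(x, c) = sqrt(sum over i of m_i * (x_i - c_i)^2), and that the input is assigned to the class with the smallest distance; the slides do not spell out either detail. The class name `MaskedNearestClass`, the method names and the three-element vectors are hypothetical and chosen only for illustration.

```java
final class MaskedNearestClass {

    /** Euclidean distance restricted to positions where mask[i] is true (assumed mask semantics). */
    static double maskedDistance(int[] fv, int[] classFv, boolean[] mask) {
        long sum = 0;
        for (int i = 0; i < fv.length; i++) {
            if (mask[i]) {
                long d = fv[i] - classFv[i];
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the class feature vector closest to fv, or -1 if there are no classes. */
    static int classify(int[] fv, int[][] classFvs, boolean[] mask) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < classFvs.length; c++) {
            double d = maskedDistance(fv, classFvs[c], mask);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] classes = { {4, 18, 9}, {0, 2, 30} };   // toy class feature vectors
        int[] input = {5, 17, 30};                      // toy input feature vector
        boolean[] mask = {true, true, false};           // ignore the third feature
        System.out.println("Chosen class index: " + classify(input, classes, mask));
    }
}
```

In the design on the slides there would be 38 class vectors of length 256 instead of the toy data used here.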

5 HLD / LLD for TTS
Tokenization
  -Token : string
  -InputText : string
  -InputReader : string
  -getTokenizer( )
  -setInputText(java.lang.string.textToTokenize)
  -setInputReader(java.io.Reader reader)
  +CMUdiphoneVoice
UnitSelector
  +UnitDataBaseName : string
  -getFeartures( )
  +getString(DataBaseName)
UnitConcatenator
  +utterance : string
  +ProcessUtterance(Utterance utterance)
  +java.lang.string to string( )
Prosody
  -Rate[] : int
  -Pitch[] : int
  -PitchRange[] : int
  -SetRate( ) : int
  -Setpitch( ) : int
  -setPitchRange( ) : int
  -setupfreatureSet( )
Play Audio
  +begin : int
  +end : int
  +time : int
  +format : int
  +SetAudioFormat(new AudioFormat(8000,16,1,false,true))
  +set.Audioplayer(voicePLAYER)
Voice Manager
  Attributes
  +Boolean contains(java.lang.string.voicename)
  +static VoiceManager getInstance( )
  +voice getvoice(java.lang.string.voicename)
  +voice[] getVoice(DataBase)

6 Low Level Design / TTS
Word Pronunciation (UnitConcatenator)
- Accepts the text
- Checks the database for the word's pronunciation
- Falls back to "letter to sound rules" if the word is not in the database
- Outputs a sequence of phonemes
- Passes the pronunciation to the prosody stage
Play Audio
- Engine receives the phonemes
- Loads the digital audio from a database
- Applies pitch, time, and volume changes
- Sends the audio to the sound card
Object examples
  Text : UnitConcatenator
    Utterance: "Hello"
  Text : PlayAudio
    SetAudioFormat(new AudioFormat(8000,16,1,false,true))
    Begin = 20; Time = 30; End = 50;
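
The word pronunciation steps above (database lookup first, letter-to-sound rules as the fallback) can be sketched as follows. This is only an illustration under stated assumptions: the one-entry lexicon, the phoneme symbols and the letter-per-phoneme fallback are placeholders, and the class `PronunciationSketch` is hypothetical, not part of the TTS engine described on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class PronunciationSketch {

    // Hypothetical toy lexicon; a real engine would load this from its pronunciation database.
    static final Map<String, List<String>> LEXICON =
            Map.of("hello", List.of("HH", "AH", "L", "OW"));

    /** Placeholder for letter-to-sound rules: emits one symbol per letter. */
    static List<String> letterToSound(String word) {
        List<String> phonemes = new ArrayList<>();
        for (char ch : word.toCharArray()) {
            if (Character.isLetter(ch)) {
                phonemes.add(String.valueOf(Character.toUpperCase(ch)));
            }
        }
        return phonemes;
    }

    /** Looks the word up in the lexicon and falls back to the rules when it is missing. */
    static List<String> pronounce(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        List<String> known = LEXICON.get(key);
        return known != null ? known : letterToSound(key);
    }

    public static void main(String[] args) {
        System.out.println(pronounce("Hello")); // found in the toy lexicon
        System.out.println(pronounce("cat"));   // not in the lexicon, uses the fallback
    }
}
```

The resulting phoneme sequence is what would be handed on to the prosody stage.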

7 Overall Software Architecture
Packages
- OCR Package
- TTS Package
Main Method
- Captures text image
- Invokes OCR package
- Sends extracted text to TTS package
- Reads out the text
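
The main-method flow above could be wired together roughly as below. The interfaces `OcrEngine` and `TtsEngine`, their method signatures and the stub implementations are assumptions made for this sketch; the slides only name an OCR package and a TTS package.

```java
final class PhoneReaderSketch {

    /** Hypothetical facade over the OCR package. */
    interface OcrEngine {
        String recognize(int[] image, int width, int height);
    }

    /** Hypothetical facade over the TTS package. */
    interface TtsEngine {
        void speak(String text);
    }

    /** Runs OCR on the captured image, sends the text to TTS, and returns the recognized text. */
    static String readAloud(int[] image, int width, int height, OcrEngine ocr, TtsEngine tts) {
        String text = ocr.recognize(image, width, height);
        tts.speak(text);
        return text;
    }

    public static void main(String[] args) {
        // Stub engines stand in for the real packages.
        OcrEngine stubOcr = (img, w, h) -> "Hello";
        TtsEngine consoleTts = t -> System.out.println("Speaking: " + t);
        readAloud(new int[4], 2, 2, stubOcr, consoleTts);
    }
}
```

Keeping the two packages behind small interfaces like these would let the same main method run against the emulator builds and the mobile builds.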

8 Project Plan
Term 3 / Term 4
- Implementing OCR and TTS engines in an emulator environment
- Integrating OCR and TTS engines
- Porting the complete package to the mobile platform
- Testing the final package

9 Project Demo
Functionality Demo
- Demonstrating project functionality using a win32 application
User Interface Demo
- Demonstrating the project user interface using a mobile emulator

10 Question and Answer

