By: Hossein and Hadi Shayesteh
Supervisor: Mr. James Connan

Introduction

Phone Reader
- Converting an image to text using OCR
- Reading the text aloud using TTS

Potential Users
- Intended for blind and illiterate people
- People dealing with a foreign language

HLD / LLD of OCR

Preprocessing
  +Image[] : int
  -dotRemove(img[] : int) : int
  -binarization(img[] : int) : int
  +thresholding(img[] : int) : int

Segmentation
  -Width : int
  -Height : int
  +Image[] : int
  -SegmentImg(width : int, height : int, Img[] : int) : int
  +Diff(Img[] : int) : int

Feature_Extraction
  -ImgSeg[] : int
  -Pixel_ID(ImgSeg[] : int) : int
  -Loci(ImgSeg[][] : int) : int
  +Feature_Vector(ImgSeg[][] : int) : int

Classification
  -FV[] : int
  +Classification(Cnum : int) : int
  +Feature_Vector(class[] : int) : int
  -Binary_Mask(FV[] : int, CFV[][] : int)
  -Distance(FV[] : int, CFV[][] : int) : int
  -Classifier(distance : int) : int
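The thresholding and binarization operations listed in the Preprocessing class are not described further on the slide. As a rough sketch only, assuming a grayscale image stored as a flat int array (0 to 255) and a mean-gray-level threshold, they might look like the following. The class name PreprocessingSketch and the mean-based rule are assumptions for illustration, not the authors' implementation.

```java
// Hypothetical sketch of two Preprocessing operations; not the authors' code.
final class PreprocessingSketch {

    /** Picks a global threshold as the mean gray level (an assumed, simple rule). */
    static int thresholding(int[] img) {
        if (img.length == 0) {
            throw new IllegalArgumentException("empty image");
        }
        long sum = 0;
        for (int v : img) {
            sum += v;
        }
        return (int) (sum / img.length);
    }

    /** Maps each pixel to 1 (ink: darker than the threshold) or 0 (background). */
    static int[] binarization(int[] img, int threshold) {
        int[] out = new int[img.length];
        for (int i = 0; i < img.length; i++) {
            out[i] = img[i] < threshold ? 1 : 0;
        }
        return out;
    }
}
```

A fixed mean threshold is only one option; uneven lighting in camera images would likely call for something adaptive.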

HLD / LLD of OCR

Feature Extraction
- Assigning PixelNum to each class
- Calculating LociNum using the Characteristic Loci approach
- Creating a Feature Vector for each segment

Classification
- Creating 38 classes
- Creating a Feature Vector for each class
- Applying a Binary Mask
- Calculating the Euclidean Distance
- Classifying the input

Image : Feature_Extraction
  ImgSeg[5] = 200

Image : Classification
  Fv[256] = { 4, 18, ..., 9 }
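The feature extraction and classification steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: each background pixel gets a code from the number of stroke crossings seen upward, rightward, downward, and leftward, with each count capped at 3 (an assumed cap, chosen because it yields 4^4 = 256 codes, matching the Fv[256] length on the slide); the binary mask is assumed to select which feature components take part in the Euclidean distance. The class and method names are hypothetical.

```java
// Hypothetical sketch: characteristic loci features plus masked nearest-class matching.
final class LociSketch {
    static final int MAX_CROSSINGS = 3; // assumed cap: 4 possible values per direction
    static final int CODES = 256;       // 4 directions, 4 values each: 4^4

    /** Counts stroke crossings seen from (r, c) along direction (dr, dc), capped. */
    static int crossings(int[][] img, int r, int c, int dr, int dc) {
        int count = 0;
        boolean inStroke = false;
        int y = r + dr;
        int x = c + dc;
        while (y >= 0 && y < img.length && x >= 0 && x < img[0].length) {
            boolean ink = img[y][x] == 1;
            if (ink && !inStroke) {
                count++; // entered a new stroke
            }
            inStroke = ink;
            y += dr;
            x += dc;
        }
        return Math.min(count, MAX_CROSSINGS);
    }

    /** Histogram of loci codes over all background (0) pixels of a binary segment. */
    static int[] featureVector(int[][] img) {
        int[] fv = new int[CODES];
        for (int r = 0; r < img.length; r++) {
            for (int c = 0; c < img[0].length; c++) {
                if (img[r][c] != 0) {
                    continue; // only background pixels receive a code
                }
                int up = crossings(img, r, c, -1, 0);
                int right = crossings(img, r, c, 0, 1);
                int down = crossings(img, r, c, 1, 0);
                int left = crossings(img, r, c, 0, -1);
                fv[((up * 4 + right) * 4 + down) * 4 + left]++;
            }
        }
        return fv;
    }

    /** Euclidean distance over the components where mask[i] is non-zero. */
    static double distance(int[] fv, int[] cfv, int[] mask) {
        double sum = 0;
        for (int i = 0; i < fv.length; i++) {
            if (mask[i] != 0) {
                double d = fv[i] - cfv[i];
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the class feature vector closest to fv. */
    static int classify(int[] fv, int[][] classFvs, int[] mask) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < classFvs.length; k++) {
            double d = distance(fv, classFvs[k], mask);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }
}
```

With 38 classes, classFvs would hold 38 reference vectors of length 256, and classify would return the index of the nearest one.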

HLD / LLD for TTS

Tokenization
  -Token : string
  -InputText : string
  -InputReader : string
  -getTokenizer()
  -setInputText(java.lang.String textToTokenize)
  -setInputReader(java.io.Reader reader)
  +CMUdiphoneVoice

UnitSelector
  +UnitDataBaseName : string
  -getFeatures()
  +getString(DataBaseName)

UnitConcatenator
  +utterance : string
  +ProcessUtterance(Utterance utterance)
  +toString() : java.lang.String

Prosody
  -Rate[] : int
  -Pitch[] : int
  -PitchRange[] : int
  -SetRate() : int
  -Setpitch() : int
  -setPitchRange() : int
  -setupFeatureSet()

Play Audio
  +begin : int
  +end : int
  +time : int
  +format : int
  +SetAudioFormat(new AudioFormat(8000,16,1,false,true))
  +set.Audioplayer(voicePLAYER)

Voice Manager Attributes
  +Boolean contains(java.lang.String voicename)
  +static VoiceManager getInstance()
  +voice getvoice(java.lang.String voicename)
  +voice[] getVoice(DataBase)
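The AudioFormat arguments in the Play Audio class match the five-argument constructor of javax.sound.sampled.AudioFormat: sample rate, sample size in bits, channels, signed, big-endian. A small sketch of what those values imply, assuming that is the class meant, is shown below. The wrapper class PlayAudioFormatSketch and the helper bytesForMillis are hypothetical names introduced here.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper around the format values shown on the slide.
final class PlayAudioFormatSketch {

    /** 8000 Hz, 16-bit, mono, unsigned, big-endian: the arguments as given on the slide. */
    static AudioFormat slideFormat() {
        return new AudioFormat(8000f, 16, 1, false, true);
    }

    /** Number of audio bytes needed for a given duration in this format. */
    static long bytesForMillis(AudioFormat format, long millis) {
        long frames = (long) (format.getFrameRate() * millis / 1000.0);
        return frames * format.getFrameSize();
    }
}
```

At these settings one frame is two bytes, so one second of speech occupies 16000 bytes.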

Low level Design / TTS

Word Pronunciation (UnitConcatenator)
- Accepts the text
- Checks the database for the word pronunciation
- Reverts to "letter to sound rules" if the word doesn't exist
- Outputs a sequence of phonemes
- Passes the pronunciation to the prosody stage

Play Audio
- Engine receives the phoneme
- Loads the digital audio from a database
- Does some pitch, time, and volume changes
- Sends it out to the sound card

Text : UnitConcatenator
  Utterance: "Hello"

Text : PlayAudio
  SetAudioFormat(new AudioFormat(8000,16,1,false,true))
  Begin = 20; Time = 30; End = 50;
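The word pronunciation step (look the word up, fall back to letter-to-sound rules when it is missing) can be sketched as below. This is a minimal illustration, not the engine's actual logic: the class PronunciationSketch is hypothetical, the lexicon is a plain in-memory map, and the fallback simply emits one placeholder symbol per letter, whereas real letter-to-sound rules are context dependent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of lexicon lookup with a letter-to-sound fallback.
final class PronunciationSketch {
    private final Map<String, List<String>> lexicon;

    PronunciationSketch(Map<String, List<String>> lexicon) {
        this.lexicon = lexicon;
    }

    /** Returns the stored phoneme sequence, or a rule-based guess if the word is unknown. */
    List<String> pronounce(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        List<String> known = lexicon.get(key);
        return known != null ? known : letterToSound(key);
    }

    /** Placeholder rule set: one symbol per letter (an assumption for illustration). */
    static List<String> letterToSound(String word) {
        List<String> phones = new ArrayList<>();
        for (char ch : word.toCharArray()) {
            if (Character.isLetter(ch)) {
                phones.add(String.valueOf(Character.toUpperCase(ch)));
            }
        }
        return phones;
    }
}
```

The returned list is what would be handed on to the prosody stage.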

Overall Software Architecture

Packages
- OCR Package
- TTS Package

Main Method
- Captures text image
- Invokes OCR package
- Sends extracted text to TTS package
- Reads out the text
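The main method's flow can be sketched as a thin coordinator over the two packages. The interfaces OcrEngine and TtsEngine and the class PhoneReaderSketch are hypothetical stand-ins for the OCR and TTS packages; the slide does not give their actual signatures.

```java
// Hypothetical stand-in for the OCR package: image in, text out.
interface OcrEngine {
    String recognize(int[][] image);
}

// Hypothetical stand-in for the TTS package: text in, speech out.
interface TtsEngine {
    void speak(String text);
}

// Coordinates the steps listed for the main method.
final class PhoneReaderSketch {
    private final OcrEngine ocr;
    private final TtsEngine tts;

    PhoneReaderSketch(OcrEngine ocr, TtsEngine tts) {
        this.ocr = ocr;
        this.tts = tts;
    }

    /** Runs OCR on the captured image, passes the text to TTS, and returns the text. */
    String read(int[][] capturedImage) {
        String text = ocr.recognize(capturedImage);
        tts.speak(text);
        return text;
    }
}
```

Keeping the two engines behind interfaces would also let the win32 build and the mobile build share the same coordinator.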

Project Plan

Term 3 and Term 4
- Implementing OCR and TTS engines in emulator environment
- Integrating OCR and TTS engines
- Porting the complete package to the mobile platform
- Testing the final package

Project Demo

Functionality Demo
- Demonstrating project functionality using a win32 application

User Interface Demo
- Demonstrating the project user interface using a mobile emulator

Question and Answer