Korea Maritime and Ocean University NLP Jung Tae LEE


1 Korea Maritime and Ocean University NLP Jung Tae LEE inverse90@nate.com

2

3 1. Introduction of NETtalk
NETtalk
 One of the methods for converting text to speech (TTS).
 An automated learning procedure for a parallel network of deterministic processing units.
 The conventional approach converts text by applying phonological rules and handles exceptions with a look-up table.
 After training, NETtalk achieves good performance and generalizes to novel words.

4 Characteristics of TTS in English
 English is among the most difficult languages to read aloud.
 Speech sounds have exceptions that are often context-sensitive.
- EX) The "a" in almost all words ending in "ave", such as "brave" and "gave", is a long vowel, but not in "have"; and some words vary in pronunciation with their syntactic role.
 This is the problem with the conventional approach.

5 DECtalk: a commercial product
 Two approaches to converting text to phonemes:
1. DECtalk first looks a word up in a pronunciation dictionary of common words; if it is not found there, a set of phonological rules is applied. (Novel words handled only by the rules are often not pronounced correctly.)
2. An alternative approach is based on massively parallel network models. Knowledge in these models is distributed over many processing units, and decisions are made by exchanging information between the processing units.
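As a rough illustration of the dictionary-then-rules strategy above, here is a minimal sketch in Python. The dictionary entries and the rule function are hypothetical placeholders for illustration, not DECtalk's actual data or rules.

    # Hypothetical sketch of the two-stage strategy described above: look the
    # word up in a pronunciation dictionary first, and fall back to
    # phonological rules only when the word is unknown.
    PRONUNCIATION_DICT = {
        "brave": "b r ey v",
        "gave":  "g ey v",
        "have":  "hh ae v",   # the context-sensitive exception mentioned above
    }

    def letter_to_sound_rules(word):
        """Placeholder for a rule-based letter-to-phoneme converter."""
        # A real system would apply ordered, context-sensitive rewrite rules here.
        return " ".join(word)  # crude stand-in: one symbol per letter

    def text_to_phonemes(word):
        # Stage 1: dictionary lookup for common (and irregular) words.
        if word in PRONUNCIATION_DICT:
            return PRONUNCIATION_DICT[word]
        # Stage 2: rule-based conversion for novel words.
        return letter_to_sound_rules(word)

    print(text_to_phonemes("have"))   # found in the dictionary
    print(text_to_phonemes("mave"))   # novel word -> rules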

6 In this paper:
 A network learning algorithm with three layers of units.
 NETtalk can be trained on any dialect of any language.
 Demonstrates that a relatively small network can capture most of the significant regularities in English pronunciation, as well as absorb many of the irregularities.

7 2. Network Architecture
Processing Unit
 The network is composed of processing units that non-linearly transform their summed, continuous-valued inputs.
 The connection strength, or weight, linking one unit to another can be a positive or negative real value.

8 Processing Unit
 The output of the ith unit is determined by first summing all of its inputs, E_i = sum_j w_ij * s_j, and then transforming this sum with a sigmoid non-linearity, s_i = 1 / (1 + e^(-E_i)).

9 Processing Unit
 Each weight w_ij can take a positive or negative value, representing either an excitatory or an inhibitory influence of the first unit on the output of the second unit.
 NETtalk is hierarchically arranged into three layers of units: an input layer, a hidden layer, and an output layer.
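A minimal sketch of a single unit's output, assuming the standard logistic squashing function (the transcript only states that summed inputs are transformed non-linearly); the weights, inputs, and bias handling below are illustrative.

    import numpy as np

    def unit_output(weights, inputs, bias=0.0):
        """Output of one NETtalk-style unit: a sigmoid of the weighted input sum.

        Weights may be positive (excitatory) or negative (inhibitory).
        """
        total = np.dot(weights, inputs) + bias          # summed input E_i
        return 1.0 / (1.0 + np.exp(-total))             # logistic non-linearity s_i

    # Example: three incoming connections with mixed excitatory/inhibitory weights.
    w = np.array([0.8, -1.2, 0.3])
    x = np.array([1.0, 0.5, 1.0])
    print(unit_output(w, x))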

10 Representations of Letters and Phonemes
 There are seven groups of units in the input layer.
- Each input group encodes one letter of the input text.
- Seven letters are presented to the input units at any one time.
 There is one group of units in each of the other two layers.
- The desired output of the network is the correct phoneme, or contrastive speech sound, associated with the center, or fourth, letter.
 The letters on either side of the center letter provide a partial context for this decision (see the window sketch below).
- The text is stepped through the window letter by letter.
 At each step, the network computes a phoneme, and after each word the weights are adjusted according to how closely the computed pronunciation matches the correct one.
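A small sketch of how the seven-letter window could be stepped through the text, with the phoneme predicted for the centre (fourth) letter; the padding symbol (a space standing in for the word boundary) and the helper name are assumptions for illustration.

    def seven_letter_windows(text, pad=" "):
        """Step a seven-letter window through the text, one letter at a time."""
        padded = pad * 3 + text + pad * 3
        for i in range(len(text)):
            window = padded[i:i + 7]
            yield window, window[3]   # (context window, centre letter)

    for window, centre in seven_letter_windows("phone"):
        print(repr(window), "->", centre)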

11 Representations of Letters and Phonemes
 The letters are represented by one unit per letter of the alphabet, plus an additional 3 units to encode punctuation and word boundaries.
 The phonemes are represented in terms of 23 articulatory features, such as point of articulation, voicing, vowel height, and so on.
 Three additional units encode stress and syllable boundaries.
 The goal of the learning algorithm is to adjust the weights between the units in the network in order to make the hidden units good feature detectors.
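A sketch of the local input encoding described above, assuming 26 letter units plus 3 punctuation/boundary units per group (29 units x 7 groups); the exact symbol inventory below is an assumption for illustration.

    import numpy as np

    LETTERS = "abcdefghijklmnopqrstuvwxyz"
    EXTRAS = [" ", ".", ","]             # word-boundary / punctuation units (assumed)
    SYMBOLS = list(LETTERS) + EXTRAS     # 29 symbols per input group

    def encode_window(window):
        """Encode a seven-character window as a 7 x 29 = 203-unit input vector."""
        vec = np.zeros(len(window) * len(SYMBOLS))
        for group, ch in enumerate(window):
            vec[group * len(SYMBOLS) + SYMBOLS.index(ch)] = 1.0
        return vec

    print(encode_window("  phone").sum())   # seven active units, one per group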

12 Learning Algorithm
 Two texts were used to train the network:
- Phonetic transcriptions from informal, continuous speech of a child.
- A 20,012-word corpus from a dictionary.
 A subset of 1,000 words was chosen from this dictionary, taken from the Brown corpus of the most common words in English.
 Letters and phonemes were aligned like this: "phone" - /f-on-/.

13 Learning Algorithm
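Since the transcript does not reproduce the update equations, here is a hedged sketch of standard back-propagation for a three-layer network of sigmoid units, in the spirit of the learning procedure named above. The layer sizes, learning rate, and squared-error loss are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, target, W1, W2, lr=0.5):
        # Forward pass: input -> hidden -> output.
        h = sigmoid(W1 @ x)
        y = sigmoid(W2 @ h)
        # Backward pass: propagate the output error to both weight matrices.
        delta_out = (y - target) * y * (1 - y)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        W2 -= lr * np.outer(delta_out, h)
        W1 -= lr * np.outer(delta_hid, x)
        return 0.5 * np.sum((y - target) ** 2)

    # Toy sizes loosely following the architecture above: 203 input units
    # (7 x 29), an assumed 80 hidden units, and 26 output units.
    W1 = rng.normal(scale=0.1, size=(80, 203))
    W2 = rng.normal(scale=0.1, size=(26, 80))
    x = rng.random(203)
    t = rng.random(26)
    for _ in range(5):
        print(train_step(x, t, W1, W2))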

14

15 3. Performance
 Two measures of performance were computed.
 Best guess
- The phoneme whose feature vector makes the smallest angle with the output vector.
 Perfect match
- The value of each articulatory feature is within a margin of 0.1 of its correct value.
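A sketch of the two measures above, using randomly generated stand-in phoneme feature codes; in the real evaluation each phoneme has a fixed articulatory-feature code.

    import numpy as np

    rng = np.random.default_rng(1)
    # Stand-in binary feature codes for a few phonemes (illustrative only).
    PHONEME_CODES = {p: rng.random(26).round() for p in ["p", "b", "f", "v", "o"]}

    def best_guess(output):
        """Phoneme whose feature vector makes the smallest angle with the output."""
        def angle(v):
            cos = np.dot(output, v) / (np.linalg.norm(output) * np.linalg.norm(v))
            return np.arccos(np.clip(cos, -1.0, 1.0))
        return min(PHONEME_CODES, key=lambda p: angle(PHONEME_CODES[p]))

    def perfect_match(output, target, margin=0.1):
        """True if every feature is within the 0.1 margin of its correct value."""
        return bool(np.all(np.abs(output - target) <= margin))

    out = PHONEME_CODES["f"] + rng.normal(scale=0.05, size=26)
    print(best_guess(out), perfect_match(out, PHONEME_CODES["f"]))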

16 Continuous Informal Speech
 After training on 50,000 words, perfect matches reached 55%.

17 Continuous Informal Speech
 Examples of raw output from the simulator (rows: stresses, text, phonemes) after 200 words of training, after 1 pass through the corpus, and after 25 passes. (Cont'd)

18 Continuous Informal Speech
 Graphical summary of the weights between the letter units and some of the hidden units: negative weights are inhibitory, positive weights are excitatory.

19 Continuous Informal Speech
 Damage to the network and recovery from damage.

20 Dictionary
 Used the 1,000 most common words in English.
 (Figure: hard vs. soft pronunciations.)

21 4. Summary
 Seven groups of nodes in the input layer; strings of seven letters were presented to the input layer at any one time.
 The text was stepped through the window on a letter-by-letter basis.
 Trained with the standard back-propagation algorithm.

22 Korea Maritime and Ocean University NLP Jung Tae LEE inverse90@nate.com

