1 Latent Semantic Mapping (LSA) CSIE Year 4 阮鶴鳴, CSIE Year 4 李運寰 "A multi-span language modeling framework for large vocabulary speech recognition," J. R. Bellegarda, 1998

2 LSM features Let M be an inventory of M individual units, and let N be a collection of N compositions. Define a mapping between (M, N) and a continuous vector space L, in which every unit of M and every composition of N is represented by a vector.

3 Feature extraction c_ij: the number of times unit m_i occurs in composition n_j. n_j: the total number of units present in composition n_j. Each cell of the M x N co-occurrence matrix W holds the weighted, length-normalized count w_ij = (1 − ε_i) c_ij / n_j, where ε_i is the normalized entropy of unit m_i across the corpus.
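Slide 3 only names the raw ingredients, so here is a minimal sketch of the feature extraction, assuming NumPy and the entropy-weighted cell value w_ij = (1 − ε_i) c_ij / n_j described above; the helper name lsm_feature_matrix is illustrative, not from the paper.

    import numpy as np

    def lsm_feature_matrix(compositions, units):
        """Build the M x N co-occurrence matrix W of LSA-style features.

        compositions: list of unit sequences (the N compositions)
        units:        ordered inventory of the M distinct units
        """
        index = {u: i for i, u in enumerate(units)}
        M, N = len(units), len(compositions)

        C = np.zeros((M, N))                 # c_ij: times unit m_i occurs in composition n_j
        for j, comp in enumerate(compositions):
            for u in comp:
                C[index[u], j] += 1.0

        n_j = C.sum(axis=0)                  # total number of units in composition n_j
        t_i = C.sum(axis=1, keepdims=True)   # total count of unit m_i over the whole corpus
        t_i[t_i == 0] = 1.0                  # guard against units that never occur

        # normalized entropy eps_i of unit m_i across compositions
        p = C / t_i
        safe_p = np.where(p > 0, p, 1.0)     # log(1) = 0, so empty cells contribute nothing
        eps = -(p * np.log(safe_p)).sum(axis=1) / np.log(N)

        # w_ij = (1 - eps_i) * c_ij / n_j  (entropy-weighted, length-normalized count)
        return (1.0 - eps)[:, None] * C / n_j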

4 Singular Value Decomposition (SVD) W ≈ U S V^T. U: left singular vectors, the eigenvectors of W W^T. S: diagonal matrix of singular values, the (eigenvalues of W W^T)^(1/2). V: right singular vectors, the eigenvectors of W^T W.
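A small sketch of this step using NumPy's SVD; the helper name lsm_space and the retained order R are illustrative. The self-test at the end simply confirms the eigen-relations stated on the slide.

    import numpy as np

    def lsm_space(W, R):
        """Rank-R SVD of W: W ~ U_R @ diag(s_R) @ V_R.T"""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :R], s[:R], Vt[:R, :].T   # columns of U / V are the singular vectors

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.random((6, 4))
        U, s, V = lsm_space(W, R=3)
        # singular values are the square roots of the eigenvalues of W W^T
        eigvals = np.sort(np.linalg.eigvalsh(W @ W.T))[::-1][:3]
        assert np.allclose(s, np.sqrt(eigvals))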

5 LSM framework extension

6 Improved Topic Separability Under LSA

7 Inter-topic: LSA does not appreciably affect the average distance between inter-topic pairs. Intra-topic: LSA dramatically reduces the average distance between intra-topic pairs. Conclusion: this is likely because LSA exploits the similarity between units and compositions, pulling semantically related items together in the space.
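The slide does not spell out how the distances are measured, so the following is only a hedged illustration: average angles (in radians, as on the next slide) between composition vectors, computed in the raw count space and in the LSA space. The separability function name and the topic_of labels are assumptions for the example.

    import numpy as np
    from itertools import combinations

    def mean_angle(vectors, pairs):
        """Average angle (radians) between the listed pairs of composition vectors."""
        angles = []
        for a, b in pairs:
            cos = vectors[a] @ vectors[b] / (np.linalg.norm(vectors[a]) * np.linalg.norm(vectors[b]))
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
        return float(np.mean(angles))

    def separability(W, V, s, topic_of):
        """Compare inter-/intra-topic spread before and after the LSA projection."""
        raw = W.T                              # composition j as its raw count column
        lsa = V * s                            # composition j as v_j S (R-dimensional)
        N = W.shape[1]
        intra = [(a, b) for a, b in combinations(range(N), 2) if topic_of[a] == topic_of[b]]
        inter = [(a, b) for a, b in combinations(range(N), 2) if topic_of[a] != topic_of[b]]
        return {"intra_raw": mean_angle(raw, intra), "intra_lsa": mean_angle(lsa, intra),
                "inter_raw": mean_angle(raw, inter), "inter_lsa": mean_angle(lsa, inter)}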

8 Improved Topic Separability Under LSA - Something Correlative LSI (Latent Semantic Indexing) also uses SVD to span the LSI space. The example above comes from an experiment in a paper on LSI: 2000 terms, 20 topics, 100 terms per topic in the prime set, documents drawn 95% from the prime set and 5% from the others; distances are measured in radians.

9 Improved Topic Separability Under LSA - Something Correlative

10 LSM applications Mapping (M, N) to the same vector space L using the SVD formalism. In LSA: M → individual words/units, N → a collection of meaningful compositions (documents).

11 Example Four sentences: 1) "what is the time." 2) "what is the day." 3) "what time is the meeting." 4) "cancel the meeting." Unit set M: what, is, the, time, day, meeting, cancel.
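Running the slide's toy data through the sketches above (lsm_feature_matrix and lsm_space are the illustrative helpers defined earlier, not part of the original slides) would look roughly like this:

    sentences = [
        "what is the time".split(),
        "what is the day".split(),
        "what time is the meeting".split(),
        "cancel the meeting".split(),
    ]
    units = ["what", "is", "the", "time", "day", "meeting", "cancel"]

    W = lsm_feature_matrix(sentences, units)   # 7 x 4 unit-by-composition matrix
    U, s, V = lsm_space(W, R=2)                # 2-dimensional LSM space
    print(W.shape, U.shape, V.shape)           # (7, 4) (7, 2) (4, 2)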

12 LSM additional application -- junk e-mail filtering Separate legitimate from junk e-mail. (M, N): M → an inventory of words or symbols, N → a collection of legitimate and junk e-mail messages.
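A hypothetical sketch of the filtering step: a new message is folded into a previously trained LSM space and compared with the legitimate and junk centroids by cosine similarity. The fold-in formula v = d^T U S^-1 is the standard LSI fold-in; the centroid-based decision rule and all names here are assumptions, not the exact classifier from the slides.

    import numpy as np

    def fold_in(counts, U, s):
        """Map a new composition (raw unit-count vector of length M) into the R-dim space."""
        return counts @ U / s                  # pseudo-composition v = d^T U S^-1

    def classify(counts, U, s, legit_centroid, junk_centroid):
        """Nearest-centroid decision in the LSM space (centroids from training messages)."""
        v = fold_in(counts, U, s)
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return "legitimate" if cos(v, legit_centroid) >= cos(v, junk_centroid) else "junk"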

13 LSM additional application -- pronunciation modeling Assigning phonemic transcriptions to graphemic words (grapheme-to-phoneme conversion, GPC). (M, N): M → an inventory of letter n-tuples, N → a collection of words. What happens when an out-of-vocabulary word appears? Map it into the space; it is then straightforward to gather the corresponding set of pronunciations from nearby in-vocabulary words in the existing dictionary.

14 Example: pronunciation modeling Three words (composition set N): "rough", "though", "through". Unit set M (letter 3-tuples, with '~' marking word boundaries): tho, hou, oug, ugh, gh~, ~th, rou, thr, hro, ~ro.
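The unit inventory on this slide can be reproduced mechanically; a small sketch, assuming '~' marks the word boundary:

    def letter_ntuples(word, n=3, boundary="~"):
        """All overlapping letter n-tuples of a word padded with boundary markers."""
        padded = boundary + word + boundary
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    words = ["rough", "though", "through"]
    units = sorted({t for w in words for t in letter_ntuples(w)})
    # letter_ntuples("rough") -> ['~ro', 'rou', 'oug', 'ugh', 'gh~']

The union of 3-tuples over "rough", "though", and "through" yields exactly the ten units listed on the slide.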

15 LSM additional application -- TTS unit selection (M, N): M → an inventory of centered pitch periods from the boundary region, N → a collection of time slices.

16 LSM additional application -- TTS unit selection

19 TTS: NLP + DSP. What are MLDS and FSS?

20 LSM additional application -- TTS unit selection NLP: Accurate phonetic transcription can only be achieved provided the part-of-speech category of some words is available and the dependency relationships between successive words are known. Natural prosody relies heavily on syntax. It obviously also has a lot to do with semantics and pragmatics, but since very little data is currently available on the generative aspects of this dependence, TTS systems merely concentrate on syntax. Yet few of them are actually provided with full disambiguation and structuration capabilities.

21 The inherent LSM tradeoffs/drawbacks: descriptive power vs. mathematical tractability (e.g., the choice of the dimension R of the space); updating data vs. computation (e.g., SVD recomputation); narrow training data and polysemy.
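To make the update/computation tradeoff concrete, here is a hedged sketch contrasting a cheap approximate fold-in of one new composition with an exact rank-R SVD recomputation; both function names are illustrative.

    import numpy as np

    def add_composition_fold_in(counts, U, s):
        """Cheap approximate update: one matrix-vector product, the space (U, s) is untouched."""
        return counts @ U / s

    def add_composition_recompute(W, counts, R):
        """Exact update: append the new column and redo the rank-R SVD of the enlarged matrix."""
        W_new = np.hstack([W, counts[:, None]])
        U, sv, Vt = np.linalg.svd(W_new, full_matrices=False)
        return U[:, :R], sv[:R], Vt[:R, :].T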

22 Word clustering example Cluster 1: antique, art, artist, collector, drawing, painter, Picasso, ... Cluster 2: attorney, court, criminal, judge, jury, rule, witness, ... Polysemy caveat: compare "drawing a conclusion" and "breaking a rule" with the senses clustered above.
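A hypothetical sketch of how clusters like these could be produced: represent each unit by its scaled left singular vector u_i S and run a plain k-means over those vectors. The LSA literature typically uses a cosine-style distance between the u_i S vectors; Euclidean k-means is a simplification for this sketch.

    import numpy as np

    def cluster_units(U, s, k, iters=50, seed=0):
        """Naive k-means over the scaled unit vectors u_i S; returns a cluster label per unit."""
        X = U * s                              # unit m_i represented as u_i S
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # assign each unit to the nearest center, then recompute the centers
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return labels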

23 References C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent semantic indexing: A probabilistic analysis," in Proc. 17th ACM Symp. on Principles of Database Systems, 1998. T. Dutoit, "High-quality text-to-speech synthesis: An overview," Faculté Polytechnique de Mons, TCTS Lab.

