1 Latent Semantic Mapping (LSA) CSIE Year 4 阮鶴鳴, CSIE Year 4 李運寰 "A multi-span language modeling framework for large vocabulary speech recognition," J. R. Bellegarda, 1998

2 LSM features Let M be an inventory of M individual units, and let N be a collection of N compositions. Define a mapping between (M, N) and a continuous vector space L, in which every unit of M and every composition of N is represented by a vector.

3 Feature extraction c_ij: the number of times unit m_i occurs in composition n_j. n_j: the total number of units present in composition n_j. Each cell of the M x N co-occurrence matrix W holds the weighted, length-normalized count w_ij = (1 − ε_i) c_ij / n_j, where ε_i is the normalized entropy of unit m_i across the corpus.
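Slide 3 only names the raw ingredients, so here is a minimal sketch of the feature extraction, assuming NumPy and the entropy-weighted cell value w_ij = (1 − ε_i) c_ij / n_j described above; the helper name lsm_feature_matrix is illustrative, not from the paper.

    import numpy as np

    def lsm_feature_matrix(compositions, units):
        """Build the M x N co-occurrence matrix W of LSA-style features.

        compositions: list of unit sequences (the N compositions)
        units:        ordered inventory of the M distinct units
        """
        index = {u: i for i, u in enumerate(units)}
        M, N = len(units), len(compositions)

        C = np.zeros((M, N))                 # c_ij: times unit m_i occurs in composition n_j
        for j, comp in enumerate(compositions):
            for u in comp:
                C[index[u], j] += 1.0

        n_j = C.sum(axis=0)                  # total number of units in composition n_j
        t_i = C.sum(axis=1, keepdims=True)   # total count of unit m_i over the whole corpus
        t_i[t_i == 0] = 1.0                  # guard against units that never occur

        # normalized entropy eps_i of unit m_i across compositions
        p = C / t_i
        safe_p = np.where(p > 0, p, 1.0)     # log(1) = 0, so empty cells contribute nothing
        eps = -(p * np.log(safe_p)).sum(axis=1) / np.log(N)

        # w_ij = (1 - eps_i) * c_ij / n_j  (entropy-weighted, length-normalized count)
        return (1.0 - eps)[:, None] * C / n_j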

4 Singular Value Decomposition (SVD) W ≈ U S V^T. U: left singular vectors, the eigenvectors of W W^T. S: diagonal matrix of singular values, the (eigenvalues of W W^T)^(1/2). V: right singular vectors, the eigenvectors of W^T W.
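A small sketch of this step using NumPy's SVD; the helper name lsm_space and the retained order R are illustrative. The self-test at the end simply confirms the eigen-relations stated on the slide.

    import numpy as np

    def lsm_space(W, R):
        """Rank-R SVD of W: W ~ U_R @ diag(s_R) @ V_R.T"""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :R], s[:R], Vt[:R, :].T   # columns of U / V are the singular vectors

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        W = rng.random((6, 4))
        U, s, V = lsm_space(W, R=3)
        # singular values are the square roots of the eigenvalues of W W^T
        eigvals = np.sort(np.linalg.eigvalsh(W @ W.T))[::-1][:3]
        assert np.allclose(s, np.sqrt(eigvals))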

5 LSM framework extension

6 Improved Topic Separability Under LSA

7 Inter-topic: LSA does not appreciably affect the average distance between inter-topic pairs. Intra-topic: LSA dramatically reduces the average distance between intra-topic pairs. Conclusion: this is likely because LSA exploits the similarity between units and compositions, pulling semantically related items together in the space.
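The slide does not spell out how the distances are measured, so the following is only a hedged illustration: average angles (in radians, as on the next slide) between composition vectors, computed in the raw count space and in the LSA space. The separability function name and the topic_of labels are assumptions for the example.

    import numpy as np
    from itertools import combinations

    def mean_angle(vectors, pairs):
        """Average angle (radians) between the listed pairs of composition vectors."""
        angles = []
        for a, b in pairs:
            cos = vectors[a] @ vectors[b] / (np.linalg.norm(vectors[a]) * np.linalg.norm(vectors[b]))
            angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
        return float(np.mean(angles))

    def separability(W, V, s, topic_of):
        """Compare inter-/intra-topic spread before and after the LSA projection."""
        raw = W.T                              # composition j as its raw count column
        lsa = V * s                            # composition j as v_j S (R-dimensional)
        N = W.shape[1]
        intra = [(a, b) for a, b in combinations(range(N), 2) if topic_of[a] == topic_of[b]]
        inter = [(a, b) for a, b in combinations(range(N), 2) if topic_of[a] != topic_of[b]]
        return {"intra_raw": mean_angle(raw, intra), "intra_lsa": mean_angle(lsa, intra),
                "inter_raw": mean_angle(raw, inter), "inter_lsa": mean_angle(lsa, inter)}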

8 Improved Topic Separability Under LSA - Something Correlative LSI (Latent Semantic Indexing) also uses SVD to span the LSI space. The example above comes from an experiment in a paper on LSI: 2000 terms, 20 topics, 100 terms per topic in the prime set, documents drawn 95% from the prime set and 5% from the others; distances are measured in radians.

9 Improved Topic Separability Under LSA - Something Correlative

10 LSM applications Mapping (M, N) to the same vector space L using the SVD formalism. In LSA: M → individual words/units, N → a collection of meaningful compositions (documents).

11 Example Four sentences: 1) "what is the time." 2) "what is the day." 3) "what time is the meeting." 4) "cancel the meeting." Unit set M: what, is, the, time, day, meeting, cancel.
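Running the slide's toy data through the sketches above (lsm_feature_matrix and lsm_space are the illustrative helpers defined earlier, not part of the original slides) would look roughly like this:

    sentences = [
        "what is the time".split(),
        "what is the day".split(),
        "what time is the meeting".split(),
        "cancel the meeting".split(),
    ]
    units = ["what", "is", "the", "time", "day", "meeting", "cancel"]

    W = lsm_feature_matrix(sentences, units)   # 7 x 4 unit-by-composition matrix
    U, s, V = lsm_space(W, R=2)                # 2-dimensional LSM space
    print(W.shape, U.shape, V.shape)           # (7, 4) (7, 2) (4, 2)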

12 LSM additional application -- junk e-mail filtering Separate legitimate from junk e-mail. (M, N): M → an inventory of words or symbols, N → a collection of legitimate and junk e-mail messages.
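A hypothetical sketch of the filtering step: a new message is folded into a previously trained LSM space and compared with the legitimate and junk centroids by cosine similarity. The fold-in formula v = d^T U S^-1 is the standard LSI fold-in; the centroid-based decision rule and all names here are assumptions, not the exact classifier from the slides.

    import numpy as np

    def fold_in(counts, U, s):
        """Map a new composition (raw unit-count vector of length M) into the R-dim space."""
        return counts @ U / s                  # pseudo-composition v = d^T U S^-1

    def classify(counts, U, s, legit_centroid, junk_centroid):
        """Nearest-centroid decision in the LSM space (centroids from training messages)."""
        v = fold_in(counts, U, s)
        def cos(a, b):
            return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return "legitimate" if cos(v, legit_centroid) >= cos(v, junk_centroid) else "junk"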

13 LSM additional application -- pronunciation modeling Assigning phonemic transcriptions to graphemic words (grapheme-to-phoneme conversion, GPC). (M, N): M → an inventory of letter n-tuples, N → a collection of words. What happens when an out-of-vocabulary word appears? Map it into the space; it is then straightforward to gather the corresponding set of pronunciations from nearby in-vocabulary words in the existing dictionary.

14 Example: pronunciation modeling Three words (composition set N): "rough", "though", "through". Unit set M (letter 3-tuples, with '~' marking word boundaries): tho, hou, oug, ugh, gh~, ~th, rou, thr, hro, ~ro.
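The unit inventory on this slide can be reproduced mechanically; a small sketch, assuming '~' marks the word boundary:

    def letter_ntuples(word, n=3, boundary="~"):
        """All overlapping letter n-tuples of a word padded with boundary markers."""
        padded = boundary + word + boundary
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    words = ["rough", "though", "through"]
    units = sorted({t for w in words for t in letter_ntuples(w)})
    # letter_ntuples("rough") -> ['~ro', 'rou', 'oug', 'ugh', 'gh~']

The union of 3-tuples over "rough", "though", and "through" yields exactly the ten units listed on the slide.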

15 LSM additional application -- TTS unit selection (M, N): M → an inventory of centered pitch periods from the boundary region, N → a collection of time slices.

16 LSM additional application -- TTS unit selection

19 TTS: NLP + DSP. What are MLDS and FSS?

20 LSM additional application -- TTS unit selection NLP: Accurate phonetic transcription can only be achieved provided the part-of-speech category of some words is available and the dependency relationships between successive words are known. Natural prosody relies heavily on syntax. It obviously also has a lot to do with semantics and pragmatics, but since very little data is currently available on the generative aspects of this dependence, TTS systems merely concentrate on syntax. Yet few of them are actually provided with full disambiguation and structuration capabilities.

21 The inherent LSM tradeoffs/drawbacks: descriptive power vs. mathematical tractability (e.g., the choice of the dimension R of the space); updating data vs. computation (e.g., SVD recomputation); narrow training data and polysemy.
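To make the update/computation tradeoff concrete, here is a hedged sketch contrasting a cheap approximate fold-in of one new composition with an exact rank-R SVD recomputation; both function names are illustrative.

    import numpy as np

    def add_composition_fold_in(counts, U, s):
        """Cheap approximate update: one matrix-vector product, the space (U, s) is untouched."""
        return counts @ U / s

    def add_composition_recompute(W, counts, R):
        """Exact update: append the new column and redo the rank-R SVD of the enlarged matrix."""
        W_new = np.hstack([W, counts[:, None]])
        U, sv, Vt = np.linalg.svd(W_new, full_matrices=False)
        return U[:, :R], sv[:R], Vt[:R, :].T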

22 Word clustering example Cluster 1: antique, art, artist, collector, drawing, painter, Picasso, ... Cluster 2: attorney, court, criminal, judge, jury, rule, witness, ... Polysemy caveat: compare "drawing a conclusion" and "breaking a rule" with the senses clustered above.
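A hypothetical sketch of how clusters like these could be produced: represent each unit by its scaled left singular vector u_i S and run a plain k-means over those vectors. The LSA literature typically uses a cosine-style distance between the u_i S vectors; Euclidean k-means is a simplification for this sketch.

    import numpy as np

    def cluster_units(U, s, k, iters=50, seed=0):
        """Naive k-means over the scaled unit vectors u_i S; returns a cluster label per unit."""
        X = U * s                              # unit m_i represented as u_i S
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            # assign each unit to the nearest center, then recompute the centers
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return labels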

23 References C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala, "Latent semantic indexing: A probabilistic analysis," in Proc. 17th ACM Symp. on Principles of Database Systems, 1998. T. Dutoit, "High-quality text-to-speech synthesis: An overview," Faculté Polytechnique de Mons, TCTS Lab.

