1 Page 1 SenDiS
Sectoral Operational Programme "Increase of Economic Competitiveness"
"Investments for your future"
Project co-financed by the European Regional Development Fund
General Word Sense Disambiguation System applied to Romanian and English Languages - SenDiS
Andrei Mincă - aminca@softwin.ro
SenDiS – WSD model, components, algorithms, methods & results

2 WSD model

3 System components

4 WSD phases
- Order Lexicon Network (OLN)
- Build Meaning Semantic Signatures (BMSS)
- Compare Meaning Semantic Signatures (CMSS)
- Compute WSD Variants (CwsdV)

5 OLN Algorithms
- Input: unordered lexicon network
- Lexicon network optimizations considering:
    - number of edges
    - loops or strongly connected components
    - number of roots and leaves
    - number of levels (in the case of leveling the LN)
- Output: ordered lexicon network
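
A minimal sketch of how some of the quantities an OLN algorithm considers might be counted, assuming the lexicon network is given as a list of directed (source, target) pairs. The function name `network_stats` and the toy edges are illustrative assumptions, not part of SenDiS; strongly connected components and levels are left out for brevity.

```python
def network_stats(edges):
    """Count edges, self-loops, roots and leaves of a directed graph.

    edges: list of (source, target) pairs (an assumed representation).
    A root has no incoming edge and a leaf has no outgoing edge;
    self-loops are counted separately and ignored for both tests.
    """
    nodes = set()
    has_incoming = set()
    has_outgoing = set()
    loops = 0
    for src, dst in edges:
        nodes.update((src, dst))
        if src == dst:
            loops += 1
            continue
        has_outgoing.add(src)
        has_incoming.add(dst)
    return {
        "edges": len(edges),
        "loops": loops,
        "roots": sorted(nodes - has_incoming),
        "leaves": sorted(nodes - has_outgoing),
    }


# Toy network: "entity" defines "animal", which defines "dog" and "cat".
toy_edges = [("entity", "animal"), ("animal", "dog"),
             ("animal", "cat"), ("dog", "dog")]
print(network_stats(toy_edges))
```

Counts like these could then be compared before and after an ordering pass to see what the optimization changed.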

6 BMSS Algorithms
- Input:
    - a lexicon network (not necessarily ordered)
    - a meaning (ID)
- Builds a semantic interpretation for the specified meaning over the lexicon network:
    - spanning trees
    - sets of nodes
    - sequences of edges
    - or combinations of the above
- Output: a semantic interpretation (signature) for the meaning
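
One of the signature forms named on this slide is a set of nodes. A hedged sketch of that case, assuming the network is a dict that maps each node to its neighbours and that the signature is every node reachable from the meaning within a few edges. The names `build_signature` and `max_depth` and the sample IDs are hypothetical, not taken from SenDiS.

```python
from collections import deque


def build_signature(network, meaning_id, max_depth=2):
    """Return {node: depth} for nodes reachable from meaning_id.

    network: dict mapping a node to an iterable of neighbour nodes
             (an assumed representation of the lexicon network).
    max_depth: maximum number of edges to follow (an assumed cut-off).
    """
    signature = {meaning_id: 0}
    queue = deque([meaning_id])
    while queue:
        node = queue.popleft()
        depth = signature[node]
        if depth == max_depth:
            continue  # do not expand beyond the cut-off
        for neighbour in network.get(node, ()):
            if neighbour not in signature:
                signature[neighbour] = depth + 1
                queue.append(neighbour)
    return signature


toy_network = {
    "bank#1": ["institution", "money"],
    "institution": ["organization"],
    "organization": ["group"],
}
print(build_signature(toy_network, "bank#1"))
```

Keeping the depth alongside each node leaves room for weighting closer nodes more heavily when signatures are compared.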

7 CMSS Algorithms
- Input: two or more semantic signatures
- The comparison depends on the nature of the semantic signatures
- Output: degrees of similarity
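
For signatures that are sets of nodes, one common comparison is the Jaccard index. The sketch below is an illustrative stand-in under that assumption, not necessarily the measure SenDiS uses; the function name `jaccard` and the sample signatures are hypothetical.

```python
def jaccard(signature_a, signature_b):
    """Degree of similarity in [0, 1] between two node-set signatures."""
    a, b = set(signature_a), set(signature_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty signatures
    return len(a & b) / len(a | b)


financial_bank = {"money", "institution", "loan"}
river_bank = {"water", "river", "slope"}
context_word = {"money", "loan", "interest"}

print(jaccard(financial_bank, context_word))
print(jaccard(river_bank, context_word))
```

Signatures built as trees or edge sequences would need a different comparison, which is why the slide ties the comparison to the nature of the signature.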

8 CwsdV Algorithms
- Input: a matrix with degrees of similarity between the senses of the context words
- Output: one or several WSD variants with the highest cost
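
A minimal sketch of this step, assuming the cost of a variant is the sum of the pairwise similarities between the senses it picks, and that every combination of one sense per word is enumerated. Both assumptions are illustrative; the slide does not say how SenDiS defines the cost or searches the variants. The similarity matrix is represented here as a sparse dict, and `best_variants` is a hypothetical name.

```python
from itertools import combinations, product


def best_variants(senses_per_word, similarity):
    """Return (best_cost, variants) over all one-sense-per-word choices.

    senses_per_word: list of lists of sense IDs, one list per context word.
    similarity: dict mapping frozenset({sense_a, sense_b}) to a degree of
                similarity; missing pairs count as 0.0.
    """
    best_cost, best = float("-inf"), []
    for variant in product(*senses_per_word):
        cost = sum(similarity.get(frozenset(pair), 0.0)
                   for pair in combinations(variant, 2))
        if cost > best_cost:
            best_cost, best = cost, [variant]
        elif cost == best_cost:
            best.append(variant)  # keep ties: several variants may win
    return best_cost, best


toy_similarity = {
    frozenset({"bank#1", "money#1"}): 0.5,
    frozenset({"bank#2", "money#1"}): 0.1,
}
print(best_variants([["bank#1", "bank#2"], ["money#1"]], toy_similarity))
```

Exhaustive enumeration grows exponentially with the number of context words, so it is only practical for small windows.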

9 WSD methods
- Input:
    - text
    - list of meanings
    - lexicon network
- Computing:
    - tokenization of text
    - annotation of text tokens with meaning interpretations
    - selecting a window-text for WSD
    - other context filters or topologies
    - build meaning semantic signatures for each word-sense
    - compare meaning semantic signatures and fill the matrix
    - compute best WSD variants
- Output: one or more WSD variants with one or more meaning interpretations for each text token
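
The first three computing steps (tokenization, annotation, window selection) can be sketched as follows. This is a simplified illustration that assumes whitespace tokenization, a dict as the list of meanings, and a symmetric window around a target token; the names `annotate` and `select_window` and the sample inventory are hypothetical.

```python
def annotate(text, inventory):
    """Tokenize text and attach the candidate meanings of each token.

    inventory: dict mapping a lower-cased token to its list of sense IDs
               (an assumed stand-in for the list of meanings).
    """
    tokens = text.lower().split()
    return [(token, inventory.get(token, [])) for token in tokens]


def select_window(annotated, center, radius):
    """Return the tokens within `radius` positions of index `center`."""
    start = max(0, center - radius)
    return annotated[start:center + radius + 1]


toy_inventory = {"bank": ["bank#1", "bank#2"], "money": ["money#1"]}
annotated = annotate("The bank lends money", toy_inventory)
print(select_window(annotated, center=1, radius=1))
```

Tokens without any meaning in the inventory keep an empty list, so a later step can skip them when the similarity matrix is filled.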

10 General WSD requirements
- tokenization
- part-of-speech tagging
- lemmatization
- sense interpretations
- chunking
- parsing

11 Testing WSD
- Performance indicators:
    - P (precision): P = noCorrectlyDisambiguated_TargetWords / noDisambiguated_TargetWords
    - R (recall): R = noCorrectlyDisambiguated_TargetWords / noTargetWords
    - F-measure: F = 2 * P * R / (P + R)
- State-of-the-art results (F-measure):
    - lexical sample task: coarse-grained ~90%, fine-grained ~73%
    - all-words task: coarse-grained ~83%, fine-grained ~65%
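
The three indicators can be computed directly from the counts named in the formulas. A small sketch; the function name `wsd_scores` and the sample counts are made up for illustration, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def wsd_scores(correct, disambiguated, targets):
    """Return (precision, recall, f_measure) from raw counts.

    correct:       target words disambiguated correctly
    disambiguated: target words the system attempted
    targets:       all target words in the test corpus
    """
    precision = correct / disambiguated if disambiguated else 0.0
    recall = correct / targets if targets else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical run: 100 target words, 80 attempted, 60 correct.
print(wsd_scores(correct=60, disambiguated=80, targets=100))
```

When a system attempts every target word, the two denominators coincide and precision, recall and F-measure are all equal.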

12 Testing SenDiS
A test configuration for SenDiS consists of:
    - a meaning inventory
    - a lexicon network
    - an OLN algorithm
    - a BMSS algorithm
    - a CMSS algorithm
    - a CwsdV algorithm
    - a WSD method
    - a corpus test
Number of possible test configurations: nMIs x nLNs x nOLNs x nBMSSs x nCMSSs x nCwsdVs x nWSDMs x nCorpusTests
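
The size of the configuration space is a plain product over the eight component kinds, and the configurations themselves form a Cartesian product. A sketch under assumed inputs: the lexicon network and corpus names come from the result slides, while every other entry (such as `bmss_1`) and both function names are hypothetical placeholders.

```python
from itertools import product
from math import prod

# One list of available choices per component kind.
components = {
    "meaning_inventory": ["MI_a"],                      # hypothetical
    "lexicon_network": ["WN_ex", "LLR_99%", "LLE_2%"],
    "oln_algorithm": ["oln_1"],                         # hypothetical
    "bmss_algorithm": ["bmss_1", "bmss_2"],             # hypothetical
    "cmss_algorithm": ["cmss_1", "cmss_2"],             # hypothetical
    "cwsdv_algorithm": ["cwsdv_1"],                     # hypothetical
    "wsd_method": ["method_1"],                         # hypothetical
    "corpus_test": ["Senseval 2", "Senseval 3", "Semcor"],
}


def count_configurations(components):
    """nMIs x nLNs x ... x nCorpusTests."""
    return prod(len(options) for options in components.values())


def iter_configurations(components):
    """Yield each test configuration as a dict of chosen components."""
    names = list(components)
    for combo in product(*components.values()):
        yield dict(zip(names, combo))


print(count_configurations(components))
print(next(iter_configurations(components)))
```

Because the count multiplies, adding a single alternative algorithm to one component kind scales the whole test matrix.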

13 Results

Senseval 2
No. Texts  LexNet  P       R       F-measure   Time (h)  Observations (no POS tagging)
224        WN_ex   0.2891  0.2176  0.24597644  0.4       meaning interpretations only for recognized lemmas
225        WN_ex   0.3119  0.2902  0.29973205  0.4       20% coverage for GRAALAN Inflection Form Entries (IFEs)
225        WN_ex   0.3913          0.39127589  0.36      20% IFEs + corpus target words lemmas tags

Senseval 3
No. Texts  LexNet  P       R       F-measure   Time (h)  Observations (no POS tagging)
254        WN_ex   0.2313  0.1595  0.18507712  0.1       no IFEs
265        WN_ex   0.2185  0.2088  0.21305191  0.4       20% IFEs
256        WN_ex   0.2845          0.28447832  0.33      20% IFEs + corpus target words lemmas tags

Semcor
No. Texts  LexNet  P       R       F-measure   Time (h)  Observations (no POS tagging)
33,855     WN_ex   0.1961  0.1838  0.1888880450          20% IFEs
33,866     WN_ex   0.2515          0.251471546           20% IFEs + corpus target words lemmas tags

Where Time (h) is blank, the transcript runs the time digits into the F-measure value; those digits are kept as extracted.

14 Tagged glosses as a Test Corpus

WN_ex
No. Texts  LexNet   P         R         F-measure   Time (h)  Observations (no POS tagging)
206,941    WN_ex    0.712066  0.712057  0.71206139            only corpus target words lemmas tags
158,378    WN_ex    0.3387    0.3332    0.3354820690          20% IFEs
158,667    WN_ex    0.4577    0.4198    0.4341296790          20% IFEs + corpus target words lemmas tags

LLR_99%
No. Texts  LexNet   P         R         F-measure   Time (h)  Observations (no POS tagging)
106,899    LLR_99%  0.4848    0.2892    0.3447658289          no IFEs
110,596    LLR_99%  0.562     0.5608    0.56132905262         100% IFEs
110,635    LLR_99%  0.6641    0.6505    0.65627624246         100% IFEs + corpus target words lemmas tags

LLE_2%
No. Texts  LexNet   P         R         F-measure   Time (h)  Observations (no POS tagging)
2,927      LLE_2%   0.6466    0.5835    0.6080107   1.4       no IFEs
3,125      LLE_2%   0.7633    0.7625    0.76283814            53% IFEs
3,071      LLE_2%   0.8594              0.85937579  1.5       53% IFEs + corpus target words lemmas tags

Where Time (h) is blank, the transcript runs the time digits into the F-measure value; those digits are kept as extracted.
