Multiword Expressions Presented by: Bhuban Seth (09305005)Somya Gupta (10305011)Advait Mohan Raut (09305923)Victor Chakraborty (09305903) Under the guidance.


1 Multiword Expressions. Presented by: Bhuban Seth (09305005), Somya Gupta (10305011), Advait Mohan Raut (09305923), Victor Chakraborty (09305903). Under the guidance of: Prof. Pushpak Bhattacharyya.

2 Contents: Introduction; Motivation; Linguistic Levels; Types of MWEs; Approaches to Identify MWEs; Limitations; Conclusion; References

3 Introduction: “Put the sweater on”; “Put the sweater on the table”; “Put the light on”

4 Introduction: “Put the sweater on”; “Put the sweater on the table”; “Put the light on”. MWEs are roughly defined as idiosyncratic interpretations that cross word boundaries (or spaces).

5 Examples: (1) His grandfather kicked the bucket. (2) This job is a piece of cake. (3) Put the sweater on. (4) He is the dark horse of the match. Google translations of the above sentences, each rendered word for word so that the idiomatic meaning is lost: (1) अपने दादा बाल्टी लात मारी (2) इस काम के केक का एक टुकड़ा है (3) स्वेटर पर रखो (4) वह मैच के अंधेरे घोड़ा है

6 Motivation: The number of multiword expressions is “of the same order of magnitude as the number of single words” (Jackendoff 1997). 41% of the entries in WordNet 1.7 are multiword (Fellbaum 1999). MWE resolution is needed in: Machine Translation (the Google Translate output is an example of poor performance); Information Retrieval; Tagging, Parsing, Question Answering Systems, and WSD.

7 Linguistic Levels. Lexicology: in short, ad hoc. Morphology and Syntax: put on weight, put the sweater on. Semantics: spill the beans. Pragmatics: kick the bucket, kick the bucket filled with water.

8 How to Handle These? Variation in Flexibility; Syntactic Idiomaticity

9 Types (Sag et al., 2002)

10 Types - Examples
Type: Example
Fixed: In short, Ad hoc, Palo Alto, Alta Vista
Compound Nominals: Congressman, Car park, Part of speech
Proper Names: Deccan Chargers, Delhi Daredevils
Non-Decomposable Idioms: Kick the bucket
Decomposable Idioms: Spill the beans, Let the cat out
Verb Particle Constructions: Take off, Put on
Light Verb Constructions: Give a demo, Take a shower
Institutionalized Phrases: Black and white, Traffic light, Telephone booth

11 Approaches

12 Knowledge-Based Approach. 1) Words with spaces (fixed expressions): a stemmer may be used to detect MWEs, but it fails because it reduces every word to its stem and so cannot tell which words are allowed to inflect: “kicks the bucket” → MWE; “kick the buckets” → not an MWE. Princeton WordNet: flaw. 2) Circumscribed constructions: consecutive nouns are most probably an MWE. 3) Inflectional head (semi-fixed expressions), e.g. part of speech → parts of speech.
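The contrast between stemming every word and inflecting only the head can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the lexicon `MWE_LEXICON`, the lemma table `LEMMAS`, and both function names are hypothetical, and a real system would use a morphological analyser instead of a hand-written table.

```python
# Hypothetical toy lexicon: each MWE (tuple of base forms) maps to the index
# of the single word that is allowed to inflect (its inflectional head).
MWE_LEXICON = {
    ("kick", "the", "bucket"): 0,
    ("part", "of", "speech"): 0,
}

# Hypothetical lemma table standing in for a real stemmer or lemmatiser.
LEMMAS = {
    "kicks": "kick", "kicked": "kick", "kicking": "kick",
    "buckets": "bucket", "parts": "part",
}


def lemma(word):
    """Return the base form of a word, or the word itself if unknown."""
    return LEMMAS.get(word, word)


def naive_stem_match(tokens):
    """Words-with-spaces lookup after stemming EVERY token (over-accepts)."""
    return tuple(lemma(t.lower()) for t in tokens) in MWE_LEXICON


def head_inflection_match(tokens):
    """Accept a token sequence only if the head alone differs by inflection."""
    tokens = tuple(t.lower() for t in tokens)
    for entry, head in MWE_LEXICON.items():
        if len(entry) != len(tokens):
            continue
        normalised = tuple(
            lemma(tok) if i == head else tok for i, tok in enumerate(tokens)
        )
        if normalised == entry:
            return True
    return False


for phrase in ["kicks the bucket", "kick the buckets", "parts of speech"]:
    words = phrase.split()
    print(phrase, naive_stem_match(words), head_inflection_match(words))
```

On this toy data the naive lookup wrongly accepts "kick the buckets", while the head-only check rejects it and still accepts "kicks the bucket" and "parts of speech".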

13 Statistical Approaches: Co-occurrence properties; Substitutability; Distributional similarity; Semantic similarity

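14 Co-occurrence Properties. Example: “black and white”. Scan a corpus and estimate the probabilities of bigrams and trigrams. P(X|Y) = P(YX)/P(Y), where P(YX) is the probability of the word sequence ‘YX’. If P(X|Y) is high, there is a chance that the word sequence ‘YX’ is an MWE. Demerit: frequent but ordinary sequences also score high, e.g. “I am” → not an MWE.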
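A minimal sketch of this estimate for bigrams, using plain relative frequencies. The function name and the toy corpus are invented for illustration; a real experiment would use a large corpus such as the ones listed under MWE Resources.

```python
from collections import Counter


def conditional_bigram_probs(tokens):
    """Estimate P(x | y) = count(y x) / count(y) for every adjacent pair (y, x)."""
    # Count y only in positions where a following word exists.
    first_word_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (y, x): count / first_word_counts[y]
        for (y, x), count in bigram_counts.items()
    }


# Toy corpus, invented for illustration only.
corpus = "i am here and i am sure the film is black and white".split()
probs = conditional_bigram_probs(corpus)

for pair in [("i", "am"), ("black", "and"), ("and", "white")]:
    print(pair, probs[pair])
```

In this toy corpus "i am" receives the maximum score although it is not an MWE, which is exactly the demerit noted on the slide.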

15 Pointwise Mutual Information (PMI). PMI(X, Y) = log [ P(X, Y) / (P(X) · P(Y)) ]. The PMI of a word pair (X, Y) is a measure of the strength of their collocation. Other methods, such as Student's t-test and Pearson's chi-square test, can also be used. Demerit: systematic co-occurrence must be distinguished from chance co-occurrence.
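The PMI formula can be sketched for adjacent word pairs as below. Several choices here are assumptions rather than part of the slides: the logarithm is taken to base 2, P(X, Y) is estimated from adjacent-bigram relative frequency, and the function name and toy corpus are invented.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return {(x, y): PMI(x, y)} for every adjacent word pair in tokens."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (x, y), count in bigram_counts.items():
        p_xy = count / n_bigrams
        p_x = unigram_counts[x] / n_unigrams
        p_y = unigram_counts[y] / n_unigrams
        # Base-2 logarithm is an assumption; the slide only says "log".
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus, invented for illustration only.
corpus = "new york is big and new york is old and the city is big".split()
scores = pmi_scores(corpus)

for pair in [("new", "york"), ("is", "big"), ("the", "city")]:
    print(pair, round(scores[pair], 3))
```

On this toy data "the city", which occurs only once, outscores "new york". PMI tends to overrate rare pairs, which is one reason chance co-occurrence has to be separated from systematic co-occurrence.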

16 Pearson’s Chi-Square Test. Based on the assumption that word frequency counts are approximately normally distributed, which can be a limitation. Null hypothesis: the words are independent of each other. The higher the value of the chi-square statistic, the stronger the association between the words. Demerit: for small data collections, the assumptions of normality and of a chi-square distribution do not hold; hence a large corpus is required.
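For a single word pair the test is usually computed on a 2x2 contingency table of bigram counts. The sketch below uses the standard closed form for a 2x2 table; the function name, argument names, and example counts are invented for illustration.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2x2 table of bigram counts.

    o11: bigrams (w1, w2)        o12: bigrams (w1, not w2)
    o21: bigrams (not w1, w2)    o22: bigrams (not w1, not w2)
    """
    n = o11 + o12 + o21 + o22
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denominator == 0:
        raise ValueError("a row or column total is zero; the statistic is undefined")
    return n * (o11 * o22 - o12 * o21) ** 2 / denominator


# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level 0.05.
CRITICAL_VALUE_05 = 3.841

# Invented counts for illustration only.
independent = chi_square_2x2(10, 20, 30, 60)  # w2 is equally likely after w1 or not
associated = chi_square_2x2(50, 0, 0, 50)     # w1 and w2 always occur together

print(independent, independent > CRITICAL_VALUE_05)
print(associated, associated > CRITICAL_VALUE_05)
```

The null hypothesis of independence is rejected when the statistic exceeds the critical value. With small expected counts in any cell the approximation becomes unreliable, which matches the demerit on the slide.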

17 Substitutability. The ability to replace parts of a lexical item with alternatives. Depending on the task and the approach, the alternatives can be similar or opposite words. In most cases the phrase obtained after substitution is no longer an MWE. Substitutability can therefore be used to filter out probable non-MWEs. Src: Kim, 2008
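One way to turn this idea into a filter is sketched below, under stated assumptions: if a candidate is still attested in the corpus after one of its words is swapped for an alternative, it is treated as freely substitutable and dropped. The table `SUBSTITUTES`, the function names, the threshold, and the counts are all hypothetical, and this is not the method of Kim (2008).

```python
from collections import Counter

# Hypothetical table of alternatives; a real system might draw these
# from a thesaurus or from WordNet.
SUBSTITUTES = {
    "strong": ["powerful"],
    "tea": ["coffee"],
    "red": ["blue"],
    "car": ["van"],
}


def substitution_variants(pair):
    """Generate variants of a two-word candidate by swapping one word at a time."""
    first, second = pair
    variants = [(alt, second) for alt in SUBSTITUTES.get(first, [])]
    variants += [(first, alt) for alt in SUBSTITUTES.get(second, [])]
    return variants


def looks_substitutable(pair, bigram_counts, min_count=1):
    """True if any substituted variant is attested at least min_count times."""
    return any(
        bigram_counts[variant] >= min_count
        for variant in substitution_variants(pair)
    )


# Invented bigram counts for illustration only.
bigram_counts = Counter({
    ("strong", "tea"): 5,
    ("red", "car"): 4,
    ("blue", "car"): 3,
    ("red", "van"): 2,
})

candidates = [("strong", "tea"), ("red", "car")]
kept = [c for c in candidates if not looks_substitutable(c, bigram_counts)]
print(kept)
```

Here "red car" is dropped because "blue car" and "red van" are attested, while "strong tea" is kept because neither "powerful tea" nor "strong coffee" appears in the toy counts.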

18 Distributional Similarity. A method for estimating semantic similarity from context. When two words are similar, the words that appear in their contexts are also similar. Src: Kim, 2008
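A common way to make this concrete is to represent each word by counts of the words around it and compare those count vectors with cosine similarity. The sketch below assumes a symmetric window of two words and raw counts; the function names and sentences are invented, and Kim (2008) may use a different weighting.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy sentences, invented for illustration only.
sentences = [
    "she drinks hot tea".split(),
    "she drinks hot coffee".split(),
    "he parks the car".split(),
]
vectors = context_vectors(sentences)

print(cosine(vectors["tea"], vectors["coffee"]))
print(cosine(vectors["tea"], vectors["car"]))
```

In this toy data "tea" and "coffee" share identical contexts and score close to 1, while "tea" and "car" share none and score 0.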

19 Semantic Similarity. Similar noun compounds (NCs) are likely to share the same semantic relation. Src: Kim, 2008

20 Method Src: Kim, 2008

21 MWE Resources. Corpora: British National Corpus (BNC), Brown Corpus. Lexical resources: WordNet; Moby’s Thesaurus, which contains 30K root words and 2.5M synonyms and related words. Tools: WordNet::Similarity, which gives a measure of semantic similarity between two given words.

22 Limitations of Current Approaches. Many NLP approaches treat MWEs according to the words-with-spaces method. Many approaches get commonly attested MWE usages right, sometimes using “ad hoc” methods, e.g. preprocessing. However, most approaches handle variation badly, fail to generalize, and result in NLP systems that are difficult to maintain and extend.

23 Conclusion. MWEs have been classified into lexicalized phrases (fixed, semi-fixed, and syntactically flexible) and institutionalized phrases. MWE analysis is as important in NLP as any other area, such as MT or WSD. A hybrid approach is most probably the best method so far for extracting MWEs from a corpus.

24 References
Kim, S. N. (2008). Statistical Modeling of Multiword Expressions.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics.
Calzolari, N., et al. (2002). Towards best practice for multiword expressions in computational lexicons. In Proceedings of the 3rd International Conference on Language Resources and Evaluation.

25 Thank You Questions???

