Presentation on theme: "RECOGNISING NOMINALISATIONS"— Presentation transcript:
1 RECOGNISING NOMINALISATIONS Supervisors: Dr. Alex Lascarides, Dr. Mirella Lapata. (Andrew) Yuk On KONG, University of Edinburgh
2 DEFINITION “Nominalisation refers to the process of forming a noun from some other word-class (e.g. red+ness) or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter... from She answered the letter). The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude)...” (Crystal 1997)
3 Only nominalisations in the first sense that are derived from verbs are considered here, e.g. "statement" from "state". Problem: given a word, is it a noun, and is it derived from a verb or not? Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e. noun-forming suffixes are attached to verb bases).
4 EXCLUSIONS
- Nominals, e.g. the poor, the wounded
- Nominalisations NOT derived from verbs, e.g. redness
- -ing forms, e.g. the making of the movie
- Antidisestablish-ment-arian-ism
11 WHEN TO USE WHICH SUFFIX
- -tion/-sion
- -er/-or: debate → debater; talk → talker; collect → collector; conduct → conductor
12 IRREGULAR NOMINALISATION
- choose → choice
- succeed → success; succession; successor
- decide → decision
- sell → sale
13 PSEUDO-NOMINALISATION
- mote (noun; a very small piece of dust) → motion??
- depart → departure; department???
- apart → apartment????
14 WHY BOTHER? The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks:
- machine translation
- information retrieval
- automatic learning of machine-readable dictionaries
- grammar induction
15 HOW? Nominalisation is a productive morphological phenomenon: can all acceptable nominalised forms be listed? What about new words?
16 Techniques NOT focusing on nominalisations:
- build rules
- machine-learning approaches to induce morphological structures using large corpora
- knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001)
17 SCHONE AND JURAFSKY (2001) Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants.
- Induced semantics: Latent Semantic Analysis (LSA)
- Induced orthographic info
- Induced syntactic info
- Transitive information
- Affix frequencies
18 GOAL OF THIS STUDY The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.
19 EXPERIMENT 1 (baseline)
- identify nouns using the tags in the corpus
- identify potential nominalisations from the list of nouns with a list of nominalisation suffixes
- find the corresponding potential verb for each by identifying the verb (from among verbs as tagged) that shares with it the greatest number of letters in sequence
- accept a pair of nominalisation and verb if the percentage of letters matched is > 50%, and discard any other
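The baseline steps above can be sketched roughly as follows. This is only an illustration under several assumptions that the slides do not settle: the suffix list `NOMINAL_SUFFIXES` is hypothetical, the tags are simplified to `"NOUN"` and `"VERB"` rather than the corpus's own tag set, "letters in sequence" is read as the longest contiguous run of shared letters, and the percentage matched is taken relative to the length of the noun.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list; the slides do not enumerate the suffixes used.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "ure", "er", "or")


def longest_shared_run(a, b):
    """Length of the longest contiguous letter sequence common to a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def baseline_pairs(tagged_words, threshold=0.5):
    """Pair suffixed nouns with the tagged verb sharing the longest letter run.

    tagged_words is an iterable of (word, tag) tuples with simplified tags
    "NOUN" and "VERB" (an assumption; a real corpus would need tag mapping).
    """
    nouns = sorted({w for w, tag in tagged_words if tag == "NOUN"})
    verbs = sorted({w for w, tag in tagged_words if tag == "VERB"})
    pairs = {}
    if not verbs:
        return pairs
    for noun in nouns:
        # Keep only nouns that end in one of the candidate suffixes.
        if not noun.endswith(NOMINAL_SUFFIXES):
            continue
        best = max(verbs, key=lambda v: longest_shared_run(noun, v))
        # Assumption: the match percentage is measured against the noun length.
        if longest_shared_run(noun, best) / len(noun) > threshold:
            pairs[noun] = best
    return pairs


sample = [
    ("statement", "NOUN"), ("state", "VERB"),
    ("collector", "NOUN"), ("collect", "VERB"),
    ("motion", "NOUN"), ("table", "NOUN"), ("talk", "VERB"),
]
print(baseline_pairs(sample))
```

On this toy sample, "motion" is a suffix candidate but shares at most one letter in sequence with any verb, so it is discarded, while "table" never becomes a candidate because it carries none of the listed suffixes.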
20 EXPERIMENT 2 Using a decision tree to build a model. Possible features include:
- letter similarity between verbs and nouns
- suffix frequency
- verb frequency
- verb semantics
- subject of noun
- subject of verb
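As a hedged illustration of how three of the listed features might be computed for a candidate pair before being passed to a decision-tree learner, consider the sketch below. The function names, the suffix list, and the frequency tables are all hypothetical. The similarity measure is `difflib.SequenceMatcher.ratio()`, swapped in here for convenience and not necessarily the measure intended in the slides, and suffix frequency is assumed to mean the summed frequency of nouns ending in the same suffix. The semantic and subject features are omitted because they would require parsed data.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("tion", "sion", "ment", "er", "or")


def suffix_of(noun):
    """Return the longest listed suffix that the noun ends with, or None."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if noun.endswith(suffix):
            return suffix
    return None


def pair_features(noun, verb, noun_freq, verb_freq):
    """Build a small feature dictionary for one (noun, verb) candidate pair."""
    suffix = suffix_of(noun)
    # Assumption: suffix frequency = total count of nouns with this suffix.
    suffix_frequency = (
        sum(c for n, c in noun_freq.items() if n.endswith(suffix)) if suffix else 0
    )
    return {
        "letter_similarity": SequenceMatcher(None, noun, verb, autojunk=False).ratio(),
        "suffix_frequency": suffix_frequency,
        "verb_frequency": verb_freq.get(verb, 0),
    }


# Toy frequency tables standing in for corpus counts.
noun_freq = Counter({"statement": 3, "movement": 2, "collector": 1})
verb_freq = Counter({"state": 4, "move": 5})
features = pair_features("statement", "state", noun_freq, verb_freq)
print(features)
```

Each candidate pair yields one such dictionary; a list of these, together with gold labels, is the kind of input a decision-tree learner would typically expect.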
21 EVALUATION Experiments will be based on the British National Corpus (BNC). The nominalisations obtained will be evaluated against the CELEX morphological lexicon and manually annotated data. Measures: precision, recall and F-score.
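For reference, the three measures can be computed over sets of (nominalisation, verb) pairs as in the minimal sketch below. It assumes the balanced F-score (F1), which the slides do not specify, and the function name and the example pairs are invented for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted (noun, verb) pairs against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: two of three predictions are correct, two gold pairs missed.
predicted = {("statement", "state"), ("collector", "collect"), ("apartment", "apart")}
gold = {("statement", "state"), ("collector", "collect"),
        ("decision", "decide"), ("sale", "sell")}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```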
22 BRITISH NATIONAL CORPUS
- Over 100 million words
- Corpus of modern English
- Both spoken (10%) and written (90%)
- Each word is automatically tagged by the CLAWS stochastic POS tagger
- 65 different tags
- Encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)
24 CELEX
- English, Dutch and German
- Annotated by humans using lemmata from two dictionaries of English
- 52,446 lemmata and 160,594 wordforms
- Orthographic, phonological, morphological, syntactic and frequency information
- Morphological structure, e.g. ((celebrate),(ion))