RECOGNISING NOMINALISATIONS Supervisors: Dr. Alex Lascarides Dr. Mirella Lapata (Andrew) Yuk On KONG University of Edinburgh
DEFINITION “Nominalisation refers to the process of forming a noun from some other word-class (e.g. red+ness) or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter ... from She answered the letter). The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude) ...” (Crystal 1997)
Only nominalisations in the first sense of the definition, and only those derived from verbs, are considered here, e.g. "statement" from "state". Problem: given a word, is it a noun, and is it derived from a verb or not? Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e. noun-forming suffixes are attached to verb bases).
EXCLUSIONS
–Nominals, e.g. the poor, the wounded
–Nominalisations NOT derived from verbs, e.g. redness
–The -ing form, e.g. the making of the movie
–Antidisestablish-ment-arian-ism
PSEUDO-NOMINALISATION
–mote → motion?? (mote: noun; a very small piece of dust)
–depart → departure; department???
–apart → apartment????
WHY BOTHER? The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks:
–machine translation
–information retrieval
–automatic learning of machine-readable dictionaries
–grammar induction
HOW? Nominalisation is a productive morphological phenomenon. Could we simply list all acceptable nominalised forms? What about new words?
TECHNIQUES NOT FOCUSING ON NOMINALISATIONS
–build rules
–machine-learning approaches that induce morphological structure from large corpora
–knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001)
SCHONE AND JURAFSKY (2001) Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants, using:
–Induced semantics: Latent Semantic Analysis (LSA)
–Induced orthographic info
–Induced syntactic info
–Transitive information
–Affix frequencies
GOAL OF THIS STUDY The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.
EXPERIMENT 1 (baseline)
–Identify nouns using the POS tags in the corpus.
–Identify potential nominalisations in the list of nouns using a list of nominalisation suffixes.
–For each potential nominalisation, find the corresponding potential verb: the verb (from among the words tagged as verbs) that shares with it the greatest number of letters in sequence.
–Accept a nominalisation-verb pair if more than 50% of the letters match; discard all others.
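The baseline steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the suffix list is hypothetical, "letters in sequence" is read as the longest common contiguous run of letters, and the percentage matched is taken relative to the noun's length (the slide does not say which word it is measured against). The function names `shared_run` and `baseline_pairs` are invented for the example.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("ment", "ation", "ion", "ance", "ence", "al", "ure")

def shared_run(noun, verb):
    """Length of the longest run of letters the two words share in sequence."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    return matcher.find_longest_match(0, len(noun), 0, len(verb)).size

def baseline_pairs(nouns, verbs, threshold=0.5):
    """Map each accepted nominalisation candidate to its best-matching verb."""
    pairs = {}
    for noun in nouns:
        # Step 2: keep only nouns ending in a known nominalisation suffix.
        if not noun.endswith(SUFFIXES):
            continue
        # Step 3: pick the verb sharing the longest run of letters.
        best = max(verbs, key=lambda v: shared_run(noun, v), default=None)
        if best is None:
            continue
        # Step 4: accept only if the matched proportion exceeds the threshold
        # (assumption: proportion is measured against the noun's length).
        if shared_run(noun, best) / len(noun) > threshold:
            pairs[noun] = best
    return pairs

nouns = ["statement", "departure", "motion", "table"]  # words tagged as nouns
verbs = ["state", "depart", "move"]                    # words tagged as verbs
print(baseline_pairs(nouns, verbs))
```

Under these assumptions a pseudo-nominalisation such as "department" would also be paired with "depart", since the six shared letters are 60% of the noun, which is the kind of error a purely orthographic baseline cannot avoid.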
EXPERIMENT 2 Use a decision tree to build a model. Possible features include:
–letter similarity between verbs and nouns
–suffix frequency
–verb frequency
–verb semantics
–subject of noun
–subject of verb
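As a rough illustration of how the first three features might be assembled for one candidate pair before being passed to a decision-tree learner, consider the sketch below. The suffix list, the function names, the frequency tables and the exact feature definitions are all assumptions made for the example; the semantic and subject features are left out because the slide does not say how they would be computed.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("ment", "ation", "ion", "ure", "al")

def suffix_of(noun):
    """Return the longest listed suffix the noun ends with, or None."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if noun.endswith(suffix):
            return suffix
    return None

def pair_features(noun, verb, noun_counts, verb_counts):
    """Build a feature dictionary for one (noun, verb) candidate pair."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    run = matcher.find_longest_match(0, len(noun), 0, len(verb)).size
    suffix = suffix_of(noun)
    # Assumption: suffix frequency = total corpus count of nouns with this suffix.
    suffix_freq = sum(
        count for word, count in noun_counts.items()
        if suffix is not None and word.endswith(suffix)
    )
    return {
        "letter_similarity": run / len(noun),
        "suffix_frequency": suffix_freq,
        "verb_frequency": verb_counts[verb],
    }

# Toy counts standing in for corpus frequencies.
noun_counts = Counter({"statement": 3, "movement": 2, "departure": 1})
verb_counts = Counter({"state": 4})
print(pair_features("statement", "state", noun_counts, verb_counts))
```

Each such dictionary would be one training instance, labelled as a true or false nominalisation-verb pair.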
EVALUATION Experiments will be based on the BNC. The nominalisations obtained will be evaluated against the CELEX morphological lexicon and manually annotated data, using precision, recall and F-score.
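The three measures can be computed over sets of (nominalisation, verb) pairs as in the sketch below. It assumes the balanced F-score (F1), which the slide does not specify, and the function name and toy data are invented for the example.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted (noun, verb) pairs against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # Assumption: balanced F-score (harmonic mean of precision and recall).
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy data: one correct pair, one false positive, one missed pair.
predicted = {("statement", "state"), ("department", "depart")}
gold = {("statement", "state"), ("departure", "depart")}
print(precision_recall_f1(predicted, gold))
```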
BRITISH NATIONAL CORPUS
–Over 100 million words
–Corpus of modern English
–Both spoken (10%) and written (90%)
–Each word is automatically tagged by the CLAWS stochastic POS tagger
–65 different tags
–Encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)
Shopping including collection of prescriptions Daysitting and nightsitting
CELEX
–English, Dutch and German
–Annotated by humans using lemmata from two dictionaries of English
–52,446 lemmata and 160,594 wordforms
–Orthographic, phonological, morphological, syntactic and frequency information
–Morphological structure, e.g. ((celebrate),(ion))