RECOGNISING NOMINALISATIONS

Supervisors: Dr. Alex Lascarides, Dr. Mirella Lapata
(Andrew) Yuk On KONG
University of Edinburgh

2 DEFINITION
“Nominalisation refers to the process of forming a noun from some other word-class (e.g. red+ness) or (in classical transformational grammar especially) the derivation of a noun phrase from an underlying clause (e.g. Her answering of the letter ... from She answered the letter). The term is also used in the classification of relative clauses (e.g. What concerns me is her attitude) ...” (Crystal 1997)

3
Only nominalisations in the first sense of the definition, and only those derived from verbs, are considered here, e.g. "statement" from "state".
Problem: given a word that is a noun, is it derived from a verb or not?
Nominalisations derived from verbs are very productive in English and are usually created by suffixation (i.e., noun-forming suffixes are attached to verb bases).

4 EXCLUSIONS
Nominals, e.g. the poor, the wounded
Nominalisations NOT from verbs, e.g. redness
-ing forms, e.g. the making of the movie
Antidisestablish-ment-arian-ism

5 REGULAR?
nominalise → nominalisation
interpret → interpretation
interrupt → interruption
associate → association
delete → deletion
break → breakage
leak → leakage

6
confine → confinement
refine → refinement (but define → definition)
submit → submission
admit → admission (but also admittance)
remit → remission; remittance; remit

7 VERB=NOUN
debate → debate (not debation); debater
pay → pay
love → love
boss → boss
stand → stand
purchase → purchase
lie → lie (“tell a lie”) (cf. lie down)

8 VERB=NOUN (except stress)
transfer → transfer
transport → transport
import → import
rebel → rebel; (rebellion)

9 1 VERB, >1 NOUNS
collect → collection; collector
interpret → interpretation; interpreter
cover → cover; coverage
conduct → conduction; conductor
depend → dependant/dependent; dependence; dependency

10 SEMANTICS
conduct → conduction (conduct electricity/heat)
conduct → conduct (behave/organise)

-tion/-sion, -er/-or
debate → debater
talk → talker
collect → collector
conduct → conductor

choose → choice
succeed → success; succession; successor
decide → decision
sell → sale

mote?? → motion (mote: noun; a very small piece of dust)
depart → departure; department???
apart → apartment????

14 WHY BOTHER?
The identification of nominalisations and their associated verbs (e.g. "statement" and "state") is important for a number of NLP tasks:
machine translation
information retrieval
automatic learning of machine-readable dictionaries
grammar induction

15 HOW?
Nominalisation is a productive morphological phenomenon:
List all acceptable nominalised forms?
New words?

16 Techniques NOT focusing on nominalisations
Build rules
Machine-learning approaches to induce morphological structures using large corpora
Knowledge-free induction of inflectional morphologies (Schone and Jurafsky 2001)

17 SCHONE AND JURAFSKY (2001)
Schone and Jurafsky (2001) worked on acquiring cognates and morphological variants.
Induced semantics: Latent Semantic Analysis (LSA)
Induced orthographic information
Induced syntactic information
Transitive information
Affix frequencies

18 GOAL OF THIS STUDY
The principal goal of this project is to develop a system which can recognise nominalisations, together with the verbs from which they are derived.

19 EXPERIMENT 1 (baseline)
Identify nouns using the tags in the corpus.
Identify potential nominalisations in the list of nouns using a list of nominalisation suffixes.
For each potential nominalisation, find the corresponding potential verb: the verb (from among the words tagged as verbs) that shares with it the greatest number of letters in sequence.
Accept a nominalisation-verb pair if the percentage of letters matched is > 50%; discard all others.
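The baseline steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the suffix list is hypothetical (the slides do not enumerate it), "letters in sequence" is read as the longest contiguous shared substring, and the matched percentage is taken relative to the length of the noun.

```python
from difflib import SequenceMatcher

# Hypothetical suffix list; the slides do not say which suffixes were used.
SUFFIXES = ("tion", "sion", "ment", "ance", "ence", "age", "er", "or")


def longest_shared_run(a, b):
    """Length of the longest contiguous letter sequence shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def baseline_pairs(nouns, verbs, threshold=0.5):
    """Pair each suffixed noun with its best-matching verb.

    Assumption: the match ratio is shared letters / noun length.
    """
    pairs = {}
    for noun in nouns:
        if not noun.endswith(SUFFIXES):
            continue  # not a potential nominalisation
        if not verbs:
            continue
        # Ties are resolved in favour of the first verb in the list.
        best = max(verbs, key=lambda v: longest_shared_run(noun, v))
        ratio = longest_shared_run(noun, best) / len(noun)
        if ratio > threshold:
            pairs[noun] = best
    return pairs


nouns = ["statement", "collection", "table", "prescription"]
verbs = ["state", "collect", "include", "sit"]
result = baseline_pairs(nouns, verbs)
```

With these toy lists, `result` pairs "statement" with "state" and "collection" with "collect". "table" carries none of the suffixes, and "prescription" shares only single letters with each verb, so it falls below the threshold. Under this reading "decision"/"decide" would also be rejected (4 of 8 letters is not above 50%), which shows how sensitive the baseline is to the choice of denominator.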

20 EXPERIMENT 2
Using a decision tree to build a model
Possible features include:
-letter similarity between verbs and nouns
-suffix frequency
-verb frequency
-verb semantics
-subject of noun
-subject of verb
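As a rough illustration of how the first three features might be computed for a candidate noun-verb pair, here is a minimal sketch. The suffix list, the function names and the definition of each feature are assumptions made for the example; the slides do not specify them, and the semantic and subject features (which would need a parser or a semantic resource) are left out. The resulting dictionaries could then be passed to any decision-tree learner.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical suffix list, for illustration only.
SUFFIXES = ("tion", "sion", "ment", "age", "er", "or")


def suffix_of(noun):
    """Return the first listed suffix the noun ends with, or None."""
    for suffix in SUFFIXES:
        if noun.endswith(suffix):
            return suffix
    return None


def letter_similarity(noun, verb):
    """Longest shared contiguous run of letters, relative to noun length."""
    matcher = SequenceMatcher(None, noun, verb, autojunk=False)
    size = matcher.find_longest_match(0, len(noun), 0, len(verb)).size
    return size / len(noun)


def pair_features(noun, verb, noun_tokens, verb_tokens):
    """Feature dictionary for one candidate (noun, verb) pair.

    noun_tokens / verb_tokens are the nouns and verbs seen in the corpus;
    frequencies are simple counts over those lists (an assumption).
    """
    suffix_counts = Counter(suffix_of(n) for n in noun_tokens)
    verb_counts = Counter(verb_tokens)
    return {
        "letter_similarity": letter_similarity(noun, verb),
        "suffix_frequency": suffix_counts[suffix_of(noun)],
        "verb_frequency": verb_counts[verb],
    }


noun_tokens = ["collection", "deletion", "statement", "collector"]
verb_tokens = ["collect", "collect", "state"]
features = pair_features("collection", "collect", noun_tokens, verb_tokens)
```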

21 EVALUATION
Experiments will be based on the BNC.
The nominalisations obtained will be evaluated against the CELEX morphological lexicon and manually annotated data.
Measures: precision, recall and F-score
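The evaluation measures can be computed over sets of (nominalisation, verb) pairs, as in the sketch below. It assumes that "F-score" means the balanced F1 measure and that a predicted pair counts as correct only if the identical pair appears in the gold standard; the example pairs are made up for illustration and are not results from the project.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and balanced F-score over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data, not experimental results.
predicted = {("statement", "state"), ("collection", "collect"), ("motion", "mote")}
gold = {
    ("statement", "state"),
    ("collection", "collect"),
    ("decision", "decide"),
    ("sale", "sell"),
}
p, r, f = precision_recall_f(predicted, gold)
```

Here two of the three predicted pairs are in the gold set, so precision is 2/3, recall is 2/4 and the F-score is 4/7.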

BNC (British National Corpus)
Over 100 million words
Corpus of modern English
Both spoken (10%) and written (90%)
Each word is automatically tagged by the CLAWS stochastic POS tagger
65 different tags
Encoded using SGML to represent POS tags and a variety of other structural properties of texts (e.g. headings, paragraphs, lists, etc.)

23
<item>
<s n=086> <w NN1-VVG>Shopping <w PRP>including <w NN1>collection <w PRF>of <w NN2>prescriptions
</item>
<s n=087> <w VVG>Daysitting <w CJC>and <w VVG>nightsitting
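A minimal sketch of how nouns and verbs might be pulled out of markup like the sample above. It assumes that noun tags start with NN and lexical-verb tags with VV, and that for an ambiguity tag such as NN1-VVG the first tag is the one to use; the regular expression only handles the simple `<w TAG>word` pattern shown here, not the full corpus encoding.

```python
import re

# Matches one word element of the form <w TAG>word
WORD_RE = re.compile(r"<w\s+([A-Z0-9-]+)>([^<]*)")


def tagged_words(sgml):
    """Return (lowercased word, tag) pairs found in the SGML text."""
    return [(word.strip().lower(), tag) for tag, word in WORD_RE.findall(sgml)]


def nouns_and_verbs(pairs):
    """Split tagged words into nouns and verbs by tag prefix.

    Assumption: for ambiguity tags (e.g. NN1-VVG) only the first tag counts.
    """
    nouns, verbs = [], []
    for word, tag in pairs:
        first_tag = tag.split("-")[0]
        if first_tag.startswith("NN"):
            nouns.append(word)
        elif first_tag.startswith("VV"):
            verbs.append(word)
    return nouns, verbs


sample = (
    "<item> <s n=086> <w NN1-VVG>Shopping <w PRP>including "
    "<w NN1>collection <w PRF>of <w NN2>prescriptions </item> "
    "<s n=087> <w VVG>Daysitting <w CJC>and <w VVG>nightsitting"
)
nouns, verbs = nouns_and_verbs(tagged_words(sample))
```

On the sample, this yields the nouns "shopping", "collection" and "prescriptions" and the verbs "daysitting" and "nightsitting".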

24 CELEX
English, Dutch and German
Annotated by humans using lemmata from two dictionaries of English
52,446 lemmata and 160,594 wordforms
Orthographic, phonological, morphological, syntactic and frequency information
Morphological structure, e.g. ((celebrate),(ion))

25 MILESTONES
6/2002 Experiment 1 (baseline)
7/2002 Experiment 2
8/2002 Write-up
9/2002 Finalise report

