# HIT Summer School 2008, K.v.Deemter Vagueness: a problem for AI Kees van Deemter University of Aberdeen Scotland, UK.

HIT Summer School 2008, K.v.Deemter Vagueness: a problem for AI Kees van Deemter University of Aberdeen Scotland, UK

HIT Summer School 2008, K.v.Deemter Overview
1. Meaning: the received view
2. Vague expressions in NL
3. Why vagueness? a. vague objects b. vague classes c. vague properties
4. Vagueness and context

HIT Summer School 2008, K.v.Deemter Overview (ctd.)
5. Models of vagueness: a. Classical models b. Partial Logic c. Context-based models d. Fuzzy Logic e. Probabilistic models
6. Generating vague descriptions (Natural Language Generation)
7. Project: advertising real estate

HIT Summer School 2008, K.v.Deemter The plan The lectures will give you a broad overview of some of the key concepts and issues to do with vagueness –Lots of theory (very briefly of course) –Some simple/amusing examples The project will let you apply some of these concepts practically The project gives you a lot of freedom: you can do it in your own way!

HIT Summer School 2008, K.v.Deemter Not Exactly: In Praise of Vagueness Oxford University Press To appear, 2009

HIT Summer School 2008, K.v.Deemter 1. Meaning: the received view Linguistic Theory (e.g., Kripke, Montague) : in a given world, each constituent denotes a set-theoretic object. For example, Domain = {a,b,c,d,e,f,g,h,i,j,k} [[cat]] = {a,b,f,g,h,i,j} [[dog]] = {c,d,e,k} There are no unclear cases. Everything is crisp. (Nothing is vague.) Sentences are always true or false.

HIT Summer School 2008, K.v.Deemter 1. Meaning: the received view Linguistic Theory (e.g., Kripke, Montague): in a given world, each constituent denotes a set-theoretic object. Example: domain of animals {a,...,k} in a village.
- [[black]] = {a,b,c,d,e}
- [[white]] = {f,g,h,i,j,k}
- [[cat]] = {a,b,f,g,h,i,j}
- [[dog]] = {c,d,e,k}
- [[healthy]] = {d,e,f}
- [[scruffy]] = {a,b,h,i}
- [[black cat]] = {a,b,c,d,e} ∩ {a,b,f,g,h,i,j} = {a,b}
- [[All black cats are scruffy]] = ({a,b} ⊆ {a,b,h,i}) = 1
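
The crisp picture on this slide can be tried out directly with sets. The following is only a small sketch: the extensions are copied from the slide, while the variable and function names are illustrative choices, not part of the lectures.

```python
# Extensions copied from the slide; names are illustrative.
black = set("abcde")
cat = set("abfghij")
scruffy = set("abhi")

# Adjective-noun combination modelled as set intersection.
black_cat = black & cat


def all_are(restrictor, scope):
    """'All A are B' is true iff [[A]] is a subset of [[B]]."""
    return restrictor <= scope


all_black_cats_scruffy = all_are(black_cat, scruffy)
print(sorted(black_cat), all_black_cats_scruffy)
```

Every question gets a yes/no answer here, which is exactly the property that vague words put under pressure.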

HIT Summer School 2008, K.v.Deemter Complications This was only a rough sketch... –One needs syntax to know how to combine the meanings of constituents –The example neglects intensionality: [[fake cats]] = [[fake]] ∩ [[cat]] ?? These and other issues are tackled in the tradition started by Richard Montague –(Not the focus of these lectures)

HIT Summer School 2008, K.v.Deemter Complications There's more to communication. –Why was the sentence uttered? To warn a child not to approach a particular cat? –The same information could be expressed differently: ``Cats are always scruffy``, ``Don't touch that cat``. What motivates the choice? This is a key question when you try to let a computer generate sentences: Natural Language Generation (NLG)

HIT Summer School 2008, K.v.Deemter Modern Computational Semantics... is still broadly in line with this ``crisp`` picture. Statistical methods do not really change matters. Example: recent work on logical entailment –uses stats to learn entailment relations and test theories by comparing with human judgments –Entailment itself is typically assumed to be crisp. But what if the extension of a word is vague? –clearly scruffy: a,b,h,i –clearly not scruffy: f –unclear cases: d,g,j: maybe scruffy? a little scruffy? Does `x is scruffy` entail `x is not healthy`? (Perhaps just a bit?) A subtler semantic theory is needed, which takes vagueness into account

HIT Summer School 2008, K.v.Deemter 2. Vague expressions in NL Vagueness in every linguistic category. E.g., adjectives: large, small, young, old,... adverbs: quickly, slowly, well, badly,... determiners/articles: many, few,... nouns: girl, boy, castle, fortune, poison (?)... verbs: run, improve (significantly), cause (?)... prepositions: (right) after, (right) above,... intensifiers: very, quite, somewhat, a little

HIT Summer School 2008, K.v.Deemter In some categories, vagueness is normal: Adjectives: hard to find crisp ones Adverbs: hard to find crisp ones British National Corpus (BNC): 7 of the 10 most frequent adjectives are vague; the others are... borderline cases of adjectives ( `last`, `other`, `different` ):

HIT Summer School 2008, K.v.Deemter British National Corpus (BNC): the 10 most frequent adjectives
- last: 140,063 occurrences (Adj?)
- other: 135,185 (Adj?)
- new: 115,523
- good: 100,652
- old: 66,999
- great: 64,369
- high: 52,703
- small: 51,626
- different: 48,373 (Adj?)
- large: 47,185

HIT Summer School 2008, K.v.Deemter Non-lexical vagueness Generics: ``Cats are scruffy``. All cats?? ``Dutchmen are tall``. All Dutchmen?? Temporal succession: ``He came, he saw, he conquered``. In quick succession! Irony: ``George is not the brightest spark``. George is probably quite stupid. Aside: Same mechanisms in Chinese? Many possible linguistic research projects!

HIT Summer School 2008, K.v.Deemter 3. Why do we use vagueness? 1.Vague objects Example: Old Number One. (A court case made famous by a recent article by the philosopher Graeme Forbes.)

HIT Summer School 2008, K.v.Deemter Old Number One [Details omitted here, but see the following animation]

HIT Summer School 2008, K.v.Deemter Other objects are vague too... Rowing boats are repaired by replacing planks. When no plank is the same, is it the same boat? Think of a book or PhD thesis. How is it written? Doesn't it change all the time? What if it's translated? What if it's translated badly? How many languages are spoken in China? Every linguist gives a different answer. Are you the same person as 3 weeks after conception? After you've lost your memory? The cat is losing hair. Is it still part of the cat?

HIT Summer School 2008, K.v.Deemter Why vagueness? 2. Vague classes Biology example made famous by Richard Dawkins: the ring species called ensatina. Standard principle: a and b belong to the same species iff a and b can produce fertile offspring. It follows that species overlap, and that there are many more of them than is usually assumed!

HIT Summer School 2008, K.v.Deemter The ensatina salamander [Details omitted here]

HIT Summer School 2008, K.v.Deemter Why vagueness? 3. Vague predicates Why do we use words like `large/small`, `dark/light`, `red/pink/...`, `tall/short`? Question: would you believe me if I told you that Dutch has no vague word meaning `tall`, but only the word `lang` which means `height > 185cm`? For example, `Kees is lang` means height(Kees) > 185cm

HIT Summer School 2008, K.v.Deemter Why vagueness? `Kees is taller than 185cm` The threshold of 185cm makes a crisp distinction between A: height > 185cm and B: height ≤ 185cm.

HIT Summer School 2008, K.v.Deemter Why vagueness? `Kees is taller than 185cm` Some elements of A and B are too close to be distinguished by anyone! Example: x = 185.001cm is in A, y = 184.999cm is in B.

HIT Summer School 2008, K.v.Deemter Vagueness can have other reasons, including: You may not have an objective measurement scale (e.g., `John is nice`). You may not share a scale with your audience (Celsius/Fahrenheit). You may want to add your own interpretation (e.g., `Blood pressure is too high`). You're passing on words verbatim (e.g., I tell you what I heard in the weather forecast). Many of these reasons are relevant for NLP (including NLG)!

HIT Summer School 2008, K.v.Deemter 4. Vagueness and context

HIT Summer School 2008, K.v.Deemter 4. Vagueness and Context Vague expressions can often not be interpreted without context. We know (roughly) how to apply `tall` to a person. Roughly: Tall(x) ⇔ Height(x) >> 170cm. If we used the same standards for buildings as for people, then no building would ever fail to be tall! Context-dependence allows us to associate one word with very different standards. Very efficient! (Cf. Barwise & Perry, Situations and Attitudes)

HIT Summer School 2008, K.v.Deemter Variant of old example [[animal]] = {a,...,k} [[black]] = {a,b,c,d,e} [[cat]] = {a,b,f,g,h,i,j} [[elephant]] = {c,d,e,k} [[black cat]] = {a,b,c,d,e} ∩ {a,b,f,g,h,i,j} = {a,b} [[small elephant]] = ? [[small]] = ? Any answer implies that `x is a small elephant` ⇒ `x is a small animal`
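
One way to see why no single set works for [[small]] is to make `small` depend on a comparison class. The sketch below is only an illustration: the weights are invented, and using the class median as the cutoff is an assumption made for this example, not a claim from the lectures.

```python
from statistics import median

# Hypothetical weights in kg (invented for illustration only).
weight_kg = {"a": 3, "b": 4, "f": 5, "g": 4, "h": 3, "i": 6, "j": 5,
             "c": 2500, "d": 5000, "e": 5500, "k": 6000}

animal = set(weight_kg)
elephant = set("cdek")


def small(comparison_class):
    """Members lighter than the median of their own comparison class (assumed cutoff)."""
    cutoff = median(weight_kg[x] for x in comparison_class)
    return {x for x in comparison_class if weight_kg[x] < cutoff}


small_elephants = small(elephant)
small_animals = small(animal)
print(sorted(small_elephants), sorted(small_animals))
```

With these made-up numbers, no small elephant is a small animal, so the unwanted inference from `small elephant` to `small animal` is blocked.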

HIT Summer School 2008, K.v.Deemter Other types of vagueness are also context dependent [[many]] = how many? [[(increase) rapidly]] = how rapidly? Temporal succession: compare the time lags between the events reported: 1. ``John entered the room. [..] He looked around.`` (less than a second) 2. ``Caesar came, [..] he saw, [..] he conquered`` (weeks or months) 3. ``The ice retreated. [..] Plants started growing. [..] New species appeared.`` (many years)

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness

HIT Summer School 2008, K.v.Deemter 5. Models of Vagueness A problem that we would like our models to shed light on: The sorites paradox. Oldest known version (Euboulides): 1 stone makes no heap; if x stones don't make a heap then x+1 stones don't make a heap. So, no finite heaps exist (consisting of stones only). Henceforth: `~` = `indistinguishable`, `<<` = `observably smaller than`

HIT Summer School 2008, K.v.Deemter Sorites paradox (version 2) Premises: Short(150.000cm), ¬Short(200.000cm). 150.000cm ~ 150.001cm, therefore Short(150.001cm). 150.001cm ~ 150.002cm, therefore Short(150.002cm). ... therefore Short(200.000cm)

HIT Summer School 2008, K.v.Deemter Sorites paradox We derive a contradiction Of course there is something wrong here But what exactly is the error? One bit that we would like to ``honour``: Principle of Tolerance: things that resemble each other so closely that people cannot notice the difference should not be assigned (very) different semantic values

HIT Summer School 2008, K.v.Deemter 5. Models of Vagueness 1.Partial Logic. A small modification of Classical Logic. Three truth values: True, False, Undecided (in the gap between True and False).

HIT Summer School 2008, K.v.Deemter Partial Logic Partial Logic makes some formulas true, others false, yet others undecided. In the partial model shown on the slide (Tall above 185cm, Not tall below 165cm, a gap in between), if Height(John)=170cm then [[Tall(John)]] = undecided

HIT Summer School 2008, K.v.Deemter Some connectives defined
- [[p ∨ q]] = True iff [[p]]=True or [[q]]=True
- [[p ∨ q]] = False iff [[p]]=[[q]]=False
- [[p ∨ q]] = Undecided otherwise
- [[¬p]] = True iff [[p]] = False
- [[¬p]] = False iff [[p]] = True
- [[¬p]] = Undecided otherwise

HIT Summer School 2008, K.v.Deemter Consider... [[Tall(John) ∨ ¬Tall(John)]] (an instance of the Law of Excluded Middle) Is this true, false, or undecided?

HIT Summer School 2008, K.v.Deemter Consider... [[Tall(John) ∨ ¬Tall(John)]] [[Tall(John)]] = undecided, therefore [[Tall(John) ∨ ¬Tall(John)]] = undecided
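
The three-valued connectives and the excluded-middle example can be sketched as follows. This is a minimal illustration, assuming that `None` stands for Undecided and that the gap runs from 165cm to 185cm inclusive; the function names are hypothetical.

```python
def k_not(p):
    """Three-valued negation; None stands for Undecided."""
    return None if p is None else (not p)


def k_or(p, q):
    """Three-valued disjunction, following the slide's truth definition."""
    if p is True or q is True:
        return True
    if p is False and q is False:
        return False
    return None


def tall(height_cm, low=165.0, high=185.0):
    """Partial predicate: True above `high`, False below `low`, else Undecided."""
    if height_cm > high:
        return True
    if height_cm < low:
        return False
    return None


john = tall(170.0)
excluded_middle = k_or(john, k_not(john))
print(john, excluded_middle)
```

For John at 170cm both the atom and the instance of the Law of Excluded Middle come out Undecided, as the slide states.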

HIT Summer School 2008, K.v.Deemter Repair by means of supervaluations Suppose I am uncertain about something (e.g., the exact threshold for ``Tall``) Suppose p is true regardless of how my uncertainty is resolved... Then I can conclude that p

HIT Summer School 2008, K.v.Deemter Consider [[Tall(John) ∨ ¬Tall(John)]] The yes/no threshold for ``Tall`` can be anywhere between 165 and 185cm. Wherever the threshold is, there are only two possibilities: 1. [[Tall(John)]] = True. In this case [[Tall(John) ∨ ¬Tall(John)]] = True. 2. [[Tall(John)]] = False. In this case [[¬Tall(John)]] = True. Therefore [[Tall(John) ∨ ¬Tall(John)]] = True. The formula must therefore be True.
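
The supervaluation argument can be sketched by evaluating a formula under every admissible crisp threshold. Sampling the interval 165cm to 185cm in 0.5cm steps is an assumption made to keep the example finite; all names are illustrative.

```python
# Candidate thresholds: 165.0, 165.5, ..., 185.0 (a finite sample, by assumption).
THRESHOLDS = [165.0 + 0.5 * i for i in range(41)]


def crisp_tall(height_cm, threshold):
    """Classical 'tall' once a precise threshold has been chosen."""
    return height_cm > threshold


def supervaluate(formula, thresholds=THRESHOLDS):
    """True/False if all precisifications agree, otherwise None (undecided)."""
    values = {formula(t) for t in thresholds}
    if values == {True}:
        return True
    if values == {False}:
        return False
    return None


def john_is_tall(threshold):
    return crisp_tall(170.0, threshold)


def john_excluded_middle(threshold):
    return john_is_tall(threshold) or not john_is_tall(threshold)


print(supervaluate(john_is_tall), supervaluate(john_excluded_middle))
```

The atom stays undecided, while the excluded-middle formula is true on every precisification and therefore supertrue.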

HIT Summer School 2008, K.v.Deemter Partial Logic + supervaluations Supervaluations enable Partial Logic to be ``almost Classical`` in its behaviour. How good is this as a model of vagueness? Like the Classical model that put the threshold at 185cm, the partial model makes a distinction that people could never make:

HIT Summer School 2008, K.v.Deemter Partial Logic This time, we even have 2 such artificial boundaries: one at 165cm (164.999cm is Not tall, 165.001cm is in the gap) and one at 185cm (184.999cm is in the gap, 185.001cm is Tall). This still contradicts the Principle of Tolerance

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness 2. Context-based models (Version by N.Goodman, M.Dummett, F.Veltman, R.Muskens) Suppose x~y, but h << x and h~y. The observer can deduce that y<x, so y and x become distinguishable. Short(y) no longer implies Short(x). [Diagram: h = 184.000, y = 184.999 (in B), x = 185.001 (in A)]

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness h is called a help element. Sorites introduces more and more things into the argument that can serve as help elements.

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness New Principle of Tolerance: things that resemble each other so closely that people cannot notice the difference, even when taking help elements into account, should not be assigned different semantic values. Theorem: this new tolerance principle does not lead to sorites.

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness Problem: New Tolerance Principle assumes that `~` is crisp

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness If you ask a subject to compare x, y and h 1000 times, you get different responses, e.g. 1. x~y, h<<x, h~y. 2. x~y, h~x, h~y. 3. y<<x, h<<x, h<<y. 4. y<<x, h<<x, h~y. 5. x<<y, h<<x, h~y. 6. x~y, h<<x, h~y. 7. ... (etc)...

HIT Summer School 2008, K.v.Deemter 5. Models of vagueness If you ask a subject to compare x,y and h 1000 times, you get different responses The idea of a fixed just noticeable difference is a simplification This is a serious problem for context-based approaches. Other context-based models exist, e.g. H.Kamp 1981, but these are further removed from classical logic

HIT Summer School 2008, K.v.Deemter Models of vagueness Book: agent-based model which takes the probabilistic nature of `~` into account. Inspired by J. Halpern's work. Model revolves around Margin of Error (MoE): x~y ⇔def |MeasuredHeight(x) - MeasuredHeight(y)| ≤ MoE. Clearly(Short(x)) ⇔def MeasuredHeight(x) ≤ (threshold_short - MoE). Tolerance: Clearly(Short(x)) & x~y → Short(y)

HIT Summer School 2008, K.v.Deemter Models of vagueness With MoE=5cm: ClearlyShort(150.000cm), ClearlyShort(150.001cm), ..., ClearlyShort(160.000cm). 160.000cm ~ 165.000cm, therefore Short(165.000cm). But ¬ClearlyShort(165.000cm), because the individual measured as 165.000cm might be as tall as 170.000cm.
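
The margin-of-error reasoning can be sketched in a few lines. The value MoE = 5cm comes from the slide; the threshold of 165cm for `short` is an assumption inferred from the slide's example (it is not stated explicitly), and the function names are hypothetical.

```python
MOE_CM = 5.0                 # margin of error, as on the slide
THRESHOLD_SHORT_CM = 165.0   # assumed threshold for 'short'


def indistinguishable(x_cm, y_cm, moe=MOE_CM):
    """x ~ y: the measured heights differ by at most the margin of error."""
    return abs(x_cm - y_cm) <= moe


def clearly_short(x_cm, threshold=THRESHOLD_SHORT_CM, moe=MOE_CM):
    """Short even if the measurement is off by a full margin of error."""
    return x_cm <= threshold - moe


def short_by_tolerance(y_cm, witnesses):
    """Tolerance: Clearly(Short(x)) & x ~ y licenses Short(y)."""
    return any(clearly_short(x) and indistinguishable(x, y_cm) for x in witnesses)


print(clearly_short(160.0), short_by_tolerance(165.0, [160.0]),
      clearly_short(165.0), short_by_tolerance(170.0, [165.0]))
```

Because tolerance needs a clearly short witness, Short(165.000cm) cannot be used to derive Short(170.000cm), so the sorites chain stops after one step.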

HIT Summer School 2008, K.v.Deemter Models of vagueness Intuition behind this agent-based model: We are aware of our own MoE This allows us to distinguish between things we know and things that we are not so sure about This suggests a solution to sorites that distinguishes between the two. But: –Still the same problems with Principle of Tolerance –Is MoE really crisp?? Perhaps we need a non-crisp model!

HIT Summer School 2008, K.v.Deemter 5. Models of Vagueness 3. Fuzzy Logic. A much more drastic deviation from Classical Logic.

HIT Summer School 2008, K.v.Deemter Given that `crisp` boundaries are a disadvantage of 3-valued logic, how about using real numbers as values? Best known version is fuzzy logic (Zadeh 1975): [φ] ∈ [0,1], where [φ] ≤ [ψ] means that ψ is at least as true as φ. That's all (for atomic formulas). Fuzzy logic does not say how truth values may be obtained

HIT Summer School 2008, K.v.Deemter Truth definition (fuzzy logic) Everything is done truthfunctionally:
1. Negation: [¬φ] = 1 - [φ]
2. Disjunction: [φ ∨ ψ] = max([φ], [ψ])
3. Conjunction: [φ ∧ ψ] = min([φ], [ψ])
4. Implication [φ → ψ]. Various options, e.g.: (a) [φ → ψ] = [¬φ ∨ ψ] (b) If [φ] ≤ [ψ] then [φ → ψ] = 1, otherwise [φ → ψ] = 1 - ([φ] - [ψ])

HIT Summer School 2008, K.v.Deemter Using option (a) [small(i) → small(i+1)] = [¬small(i) ∨ small(i+1)] = max([¬small(i)], [small(i+1)]) = max(1-[small(i)], [small(i+1)]). For i=150, this is [small(151)]. For i=151, this is [small(152)] … Presumably, these are all close to 1. For some i, [small(i)] = 1/2+ε and [small(i+1)] = 1/2-ε, at which point we have [small(i) → small(i+1)] = max(1-(1/2+ε), 1/2-ε) = 1/2-ε. Some very different values, some of which are quite low!

HIT Summer School 2008, K.v.Deemter Using option (b) (b) If [φ] ≤ [ψ] then [φ → ψ] = 1, otherwise [φ → ψ] = 1 - ([φ] - [ψ]). Assume that [small(i)] - [small(i+1)] = ε, for some small constant ε. Then [small(i) → small(i+1)] = 1 - ([small(i)] - [small(i+1)]) = 1 - ε, regardless of i. This is better! Let's look at the paradox again

HIT Summer School 2008, K.v.Deemter Conclusions get increasingly low values: small(0) [some high value, let's say 1], small(1) [1-ε], small(2) [1-2ε], small(3) [1-3ε], etc. This is great! But other drawbacks remain. For example, for most i, we have [small(i) ∨ ¬small(i)] < 1
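
The behaviour of the two implications on the sorites chain can be checked numerically. This sketch assumes ε = 0.01 and the linear membership function [small(i)] = 1 - iε (clipped at 0); both are illustrative choices, as are the function names.

```python
EPS = 0.01  # assumed step between neighbouring truth values


def f_not(a):
    return 1.0 - a


def f_or(a, b):
    return max(a, b)


def implies_a(a, b):
    """Option (a): [phi -> psi] = [not phi or psi]."""
    return f_or(f_not(a), b)


def implies_b(a, b):
    """Option (b): 1 if [phi] <= [psi], else 1 - ([phi] - [psi])."""
    return 1.0 if a <= b else 1.0 - (a - b)


def small(i):
    """Assumed membership function: 1, 1 - EPS, 1 - 2*EPS, ..."""
    return max(0.0, 1.0 - i * EPS)


for i in (0, 49):
    print(i, implies_a(small(i), small(i + 1)), implies_b(small(i), small(i + 1)))
print("excluded middle at i=30:", f_or(small(30), f_not(small(30))))
```

Option (a) drops to about 1/2 in the middle of the chain, option (b) stays at 1 - ε throughout, and the excluded-middle value at i = 30 is only about 0.7.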

HIT Summer School 2008, K.v.Deemter The problem with truthfunctionality Truthfunctionality implies: If [q] = [¬p] then [p ∨ ¬p] = [p ∨ q]. The truth definition of `∨` cannot take into account whether the disjuncts are related

HIT Summer School 2008, K.v.Deemter Other forms of the same problem Let's take a vote: Which one is the big expensive car? a. [car(x)]=1 [big(x)]=0.5 [expensive(x)]=0.5 b. [car(y)]=1 [big(y)]=0.5 [expensive(y)]=1

HIT Summer School 2008, K.v.Deemter Other forms of the same problem a. [car(x)]=1 [big(x)]=0.5 [expensive(x)]=0.5 b. [car(y)]=1 [big(y)]=0.5 [expensive(y)]=1 Fuzzy Logic says, counter-intuitively, that [car(x) ∧ big(x) ∧ exp(x)]=0.5 and [car(y) ∧ big(y) ∧ exp(y)]=0.5

HIT Summer School 2008, K.v.Deemter To address these problems, Fuzzy Logic allows different kinds of conjunction, for example Conjunction 1: [φ ∧ ψ] = min([φ], [ψ]) Conjunction 2: [φ ∧ ψ] = [φ] * [ψ]
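
The difference between the two conjunctions shows up on the car example from the previous slides. A small sketch, with the truth values copied from the slides and hypothetical function names:

```python
import math

# Truth values for (car, big, expensive), copied from the slides.
car_x = (1.0, 0.5, 0.5)
car_y = (1.0, 0.5, 1.0)


def and_min(*values):
    """Conjunction 1: the minimum of the conjuncts."""
    return min(values)


def and_product(*values):
    """Conjunction 2: the product of the conjuncts."""
    return math.prod(values)


print(and_min(*car_x), and_min(*car_y))
print(and_product(*car_x), and_product(*car_y))
```

Under min both cars score 0.5, whereas under the product car y (0.5) is ranked above car x (0.25), which matches the intuition that y is the better candidate for `the big expensive car`.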

HIT Summer School 2008, K.v.Deemter Pros and cons of using fuzzy logic Fuzzy logic captures the intuition that sorites leads to increasingly doubtful conclusions. By using many truth values, it becomes difficult to determine the truth values of atoms. Different values must sometimes be attributed to [small(x)] and [small(y)] where x and y are indistinguishable. If not, the sorites paradox returns! Imagine the line-up of people again: a_1 ~ a_2 ~ … ~ a_4001. Is this not at odds with the Tolerance Principle?

HIT Summer School 2008, K.v.Deemter Pros and cons of using fuzzy logic (Ctd) Definition of logical operators proves to be a bit arbitrary Logical principles tend to be sacrificed (e.g. excluded middle) Fuzzy Logic not popular as a solution for the sorites paradox But lots of maths and engineering (e.g., fuzzy control)

HIT Summer School 2008, K.v.Deemter Fuzzy Expert Systems Essentially, the same problems plague fuzzy engineering: each fuzzy system defines the connectives in its own, somewhat ad hoc way. Accrual is often an additional problem. Fuzzy Logic is not really one logic, but a toolkit for making logics. An example:

HIT Summer School 2008, K.v.Deemter Fuzzy Expert Systems R1: (female(x) ∧ smoking(x)) → at_risk(x) R2: hypertension(x) → at_risk(x) What kind of conjunction should we use? What if both rules fire; does this increase the value [at_risk(x)]?

HIT Summer School 2008, K.v.Deemter 5. Models of Vagueness 4. Probabilistic Logic. An alternative to Fuzzy Logic: -- based on probability -- conditional probabilities are not truth functional -- the result is a much more classical and principled logic.

HIT Summer School 2008, K.v.Deemter Probabilistic Logic [details omitted] See various papers by D.Edgington See also my book Not Exactly: in Praise of Vagueness (2009).

HIT Summer School 2008, K.v.Deemter 6. Generating Vague Descriptions My own work in this area: generation of phrases like –the large dog –the largest two of the cheapest dogs –etc. from a purely numerical database. Van Deemter 2006, ``Generating Referring Expressions that Involve Gradable Properties``. Computational Linguistics 32 (2).

HIT Summer School 2008, K.v.Deemter 7. Project: vagueness in real estate NLP research at the University of Aberdeen focusses on Natural Language Generation. Project: explore vagueness from an NLG viewpoint. Concretely, I'd like you to study the use of vague expressions in adverts for real estate. Next slides: a computational project: specify, construct and document an NLG program that generates adverts. You can use a computing platform and programming language of your own choosing: whatever works best for you!

HIT Summer School 2008, K.v.Deemter Hints for a computational project Input: a self-made database containing information on real estate (flats, houses, etc.): the size of each room/garden, using numbers (e.g., the widest extensions of the kitchen may be 2.00m by 3.23m), and something about their quality (e.g., the kitchen may be modern, well-equipped, with nice sea views). Use your imagination!

HIT Summer School 2008, K.v.Deemter Hints for a computational project Output: (Parts of) an advert, containing sentences that the estate agent might use. For example: We are pleased to offer this magnificent 5-bedroom apartment, which overlooks the sea. It has a spacious, well-equipped kitchen, whose widest dimensions are 4.30m by 5.13m. The living room is... (This is only an example of the kind of text a program might generate given one input. Your own program might generate much simpler text.) Don't worry about the user interface (e.g. how the user selects an apartment). Keep it simple.

HIT Summer School 2008, K.v.Deemter Hints for a computational project Things to bear in mind: context-dependence: large for a bathroom is not the same as large for a living room. Perspective: you're the estate agent. Are you honest (a small kitchen) or just trying to sell (a cosy kitchen)? Whatever you do, please document it by writing a short report, written in clear English. What is not documented does not exist!
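
As a starting point for the project, the two hints above (context-dependence and perspective) might be combined as in the sketch below. Everything in it is an assumption for illustration: the per-room area thresholds are invented, and the word choices `spacious`, `small` and `cosy` are just one possible lexicon.

```python
# Hypothetical thresholds (in square metres) from which a room counts as large.
LARGE_FROM_M2 = {"bathroom": 8.0, "kitchen": 15.0, "living room": 30.0}


def describe(room, area_m2, honest=True):
    """Pick a vague adjective relative to the room type and the agent's perspective."""
    if area_m2 >= LARGE_FROM_M2[room]:
        adjective = "spacious"
    else:
        adjective = "small" if honest else "cosy"
    return f"a {adjective} {room}"


print(describe("bathroom", 10.0))
print(describe("living room", 10.0))
print(describe("kitchen", 10.0, honest=False))
```

The same 10 square metres is described as spacious for a bathroom but small for a living room, and the `honest` flag switches between the neutral and the sales-oriented wording.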

HIT Summer School 2008, K.v.Deemter Questions: Can a house be small if all its rooms are large? Can a house be modern if all its rooms are old-fashioned? What's your answer to the Principle of Tolerance? Which of the formal models of vagueness did you find most useful (if any)?

HIT Summer School 2008, K.v.Deemter Your report should contain An explanation of the kinds of inputs that your program accepts An explanation of the mechanisms that you use for modelling the facts An explanation of your generation algorithm Examples of output texts (along with the inputs from which they were generated) A brief discussion of the pros and cons of your approach. What would you do differently the next time? What would you do next (if you had the time)? A clear installation manual for your program.

HIT Summer School 2008, K.v.Deemter The previous slides assume a project that is mainly computational. A more linguistic alternative: –Study a corpus of real-estate adverts, and report on your findings. Example texts can be found on the web, e.g. the ASPC real-estate web site in Aberdeen: http://www.aspc.co.uk/ (Also relevant for computational project)

HIT Summer School 2008, K.v.Deemter Ideal: something in between (a mix?) of these two types of projects

HIT Summer School 2008, K.v.Deemter Concluding Interested in working on vagueness, or NLG, or both? Contact me: k.vdeemter@abdn.ac.uk Kees van Deemter University of Aberdeen Scotland, UK

HIT Summer School 2008, K.v.Deemter Appendix: vagueness in computational semantics

HIT Summer School 2008, K.v.Deemter Survey of 2007 International Workshop on Computational Semantics (IWCS-7) Semantics used in 45 non-invited papers:
- DRT or UDRT: 4
- Hole Semantics: 2
- MRS: 1
- Description Logic: 2
- Default Logic: 1
- Modal and Temporal Logic: 3
- Lambda Calculus: 3
- Simple atomic formulas: 6
- Informal categories (like Agent/Patient, or Event/Action): 18, of which modelling gradable dimensions (e.g., word similarity): 3
- Partial Logic: 1
- Prototype Theory: 1

HIT Summer School 2008, K.v.Deemter Computational Semantics Everything written in blue was overtly Boolean. The bits in red: –2 papers model gradability –3 papers identify gradable phenomena (word similarity, document similarity, language change and learning)

HIT Summer School 2008, K.v.Deemter End of Appendix
