Published by Eric Hoffman. Modified over 3 years ago.

1
Kees van Deemter (SSE, Jan '10)
Vagueness Facilitates Search
Kees van Deemter, Computing Science, University of Aberdeen

2
Who I am
I...
studied logic and philosophy of language
–University of Amsterdam (PhD)
–Stanford University (postdoc)
worked for Philips Electronics on HCI and Language Technology
have worked on Natural Language Generation since 1994
–University of Brighton (ITRI)
–University of Aberdeen

3
Sources for this talk
(KvD 2009a) What Game Theory Can Do For NLG: the case of vague language. In Proc. 12th Eur. Workshop on Natural Language Generation.
(KvD 2009b) Utility and Language Generation: The Case of Vagueness. J. Philosophical Logic 38/6.
(KvD 2009c) Vagueness Facilitates Search. In Proc. of the Amsterdam Colloquium, Dec. 2009.

4
An expression is vague (V) iff it has borderline cases or degrees
e.g. large, small, fast, slow, many, few, ...
Not just words, e.g.,
–"he came, he saw, he conquered"
–generic statements
Common in all human languages
8 out of top 10 adjectives in BNC
Dominant among the first words we learn

5
Two big problems with vagueness
1. The semantic problem: How to model the meaning of V expressions?
–Classical models: 2-valued
–Partial models: 3-valued
–Degree models: many-valued (e.g. Fuzzy Logic, Probabilistic Logic)
No agreement on how to answer this question (e.g., Keefe & Smith 1997)

6
2. The pragmatic problem: Why is language vague?
Vague expressions seem a bit unclear
Is it ever a good idea to be V?
Suppose you built an electronic information provider; would you ever want it to offer you V information?

7
Barton Lipman
–chapter in A. Rubinstein, Economics and Language (2000)
–working paper "Why is Language Vague" (2006)
Lipman: "Why have we tolerated an apparent worldwide several-thousand year efficiency loss?"
That's today's topic

8
The scenario of Lipman (2000, 2006)
Airport scenario: I describe Mr X to you, so that you can pick up X from the airport.
All I know is X's height; heights are uniformly distributed across people on [0,1].
If you identify X right away, you get payoff 1; if you don't, you get payoff -1

11
What description would work best?
State X's height precisely
If each of us knows X's exact height then the probability of confusion is close to 0.
If only one property is allowed: Say "the tall person" if height(X) > 1/2, else say "the short person".
No boundary cases, so this is not vague!
Theorem: under standard game-theory assumptions (Crawford/Sobel), vague communication can never be optimal

13
One type of answer to Lipman: conflict between S and H
Aragones and Neeman (2000): ambiguity can add to the speaker's utility U_S (politicians example)
–De Jaegher (2003) argued that Aragones & Neeman's solution won't work for vagueness
–(KvD 2009a) adapted Aragones & Neeman for vagueness
But what if language is used "honestly"? (i.e., U_S = U_H)
An illustrative application: Natural Language Generation

14
Natural Language Generation (NLG) is an area of AI with many practical applications (e.g. Reiter and Dale 2000)
An NLG program translates input data to linguistic output
Essentially the problem of choosing the best linguistic Form for a given Content
What does this choice depend on?

15
Example: Roadgritting (Turner et al. 2009)

17
Example: Roadgritting (e.g., Turner et al. 2009)
Compare
1. "Roads above 500m are icy"
2. "Roads in the Highlands are icy"
Decision-theoretic perspective:
1. 100 false positives, 2 false negatives
2. 10 false positives, 10 false negatives
Suppose each false positive has a utility of -0.1 and each false negative has a utility of -2

18
Example: Roadgritting (e.g., Turner et al. 2009)
Suppose each false positive has a utility of -0.1 and each false negative has a utility of -2
Then
1: 100 false pos, 2 false neg = 100(-0.1) + 2(-2) = -14
2: 10 false pos, 10 false neg = 10(-0.1) + 10(-2) = -21
So summary 1 is preferred over summary 2.
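
The comparison above can be reproduced with a few lines of Python. This is only a sketch: the error counts and per-error utilities are the ones on the slide, while the function name `summary_utility` and the dictionary layout are illustrative choices, not part of the talk.

```python
# Hypothetical helper: total utility of a summary given its error counts.
# Default utilities (-0.1 per false positive, -2 per false negative) are from the slide.
def summary_utility(false_pos, false_neg, u_fp=-0.1, u_fn=-2.0):
    return false_pos * u_fp + false_neg * u_fn


# (false positives, false negatives) for each candidate summary, as on the slide.
summaries = {
    "Roads above 500m are icy": (100, 2),
    "Roads in the Highlands are icy": (10, 10),
}

scores = {text: summary_utility(fp, fn) for text, (fp, fn) in summaries.items()}
best = max(scores, key=scores.get)  # the summary with the highest (least negative) utility

for text, score in scores.items():
    print(f"{text}: {score:.1f}")
print("Preferred:", best)
```

Changing `u_fp` or `u_fn` shows how the preferred summary depends on the relative cost of the two kinds of error.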

19
Our question: When (if ever) is vague communication more useful than crisp communication?
The question is not: "Can vague communication be of some use?"
The question is: "When is vague communication more useful than crisp communication?"
–Aside: Is vagueness useful at all?

20
Gary Marcus: The haphazard construction of the human mind

21
Two related questions
1. When/why should a person express information vaguely?
2. When/why should an NLG system express information vaguely?

22
1. Vicissitudes of measurement
[Figure: two houses, labelled 11m and 12m]

23
1. Vicissitudes of measurement [a]
Example: One house of 11m height and one house of 12m height
1. "the house that's 12m tall needs to be demolished"
2. "the tall house needs to be demolished"
Comparison is easier and more reliable than measurement, so prefer utterance 2
–Measurable as likelihood of incorrect action

24
1. Vicissitudes of measurement [a]
Example: One house of 11m height and one house of 12m height
1. "the house that's 12m tall needs to be demolished"
2. "the tall house needs to be demolished"
Comparison is easier and more reliable than measurement, so prefer utterance 2
[But arguably, this utterance is not vague. Its vagueness is merely "local"]

25
Apparent vagueness is frequent
"the tall house": the tallest house
"Physical exercise is good for young and old": regardless of age
"Bad for bacteria, good for gums": gums improve as a result of bacterial death
"Fast-flowing rivers are deep": the faster the deeper (positive correlation between variables)

26
1. Vicissitudes of measurement [b]
Numbers can suggest spurious precision
Weather prediction: "It will be 23.75 degrees Celsius"
–Margin of error may be as much as 2 degrees
"It will be mild" does not have this problem
[But why not say: "approx. 24 degrees"?]

27
2. Production/interpretation effort
Effort needs to be commensurate with utility.
In many cases, more precision adds little benefit.
E.g., feasibility of an outing does not depend on whether it's 20C or 30C.
"Mild" takes fewer syllables than "twenty three point seven five".
–Time pressure affects speakers' choices (e.g. Horton & Keysar 1996)
–Vague words tend to be short (Krifka 2002)
Context dependence adds to efficiency

28
3. Evaluation payoff
Example: The doctor says
–Utterance 1: "Your blood pressure is 153/92."
–Utterance 2: "Your blood pressure is high."
U2 offers less detail than U1
But U2 also offers evaluation of your condition (cf. Veltman 2000)

30
Example: The doctor says
–Utterance 1: "Your blood pressure is 153/92."
–Utterance 2: "Your blood pressure is high."
U2 offers less detail than U1
But U2 also offers evaluation of your condition (cf. Veltman 2000)
–A link with actions (cut down on salt, etc.)
–Especially useful if metric is difficult
–Measurable as likelihood of incorrect action

31
Empirical evidence
B. Zikmund-Fisher et al. (2007)
E. Peters et al. (2009)
Experiments showing that evaluative categories (i.e., labels) affect readers' decisions

32
Empirical evidence
E. Peters et al. (2009): Hospital ratings based on numerical factors:
(1) survival percentages (!),
(2) percentages of recommended treatment, and
(3) patient satisfaction
Evaluation judgment: "How attractive is this hospital to you?"

33
E. Peters et al. (2009)
When numerical information was accompanied by labels ("fair", "good", "excellent"), a greater proportion of variance in evaluation judgments could be explained by the numeric factors
–Without labels, the most important information (i.e., factor 1) was not used at all
–Without labels, less numerate subjects were influenced by mood ("I feel good/bad/happy/upset")

34
Practical implications
People are bad at interpreting numbers
Mood etc. tend to take their place
Evaluative categories (i.e., labels like "good") matter

35
[But why should labels be vague?]
[Why does English not have a (brief) expression that says "Your blood pressure is 150/90 and too high"?]
Compare: "You are obese" means "Your BMI is above 30 and this is dangerous."

36
4. Future contingencies
Indecent Displays Control Act (1981) forbids public display of indecent matter
–indecent at the time the law has been parameterised (Waismann 1968, Hart 1994, Lipman 2006)
Obama/Volcker: Not too much risk should be concentrated into one bank (Jan. 2010)
–opening bid in a policy war

37
5. Lack of a good metric
–Mathematics: How difficult is a proof? ("As the reader may easily verify")
–Multidimensional measurements: What's the size of a house?
–Aesthetics: How beautiful is a sunset?

39
End of survey... on previous answers to the question why language is vague

40
Remainder of this talk
Explore tentative new answer: V can "oil the wheels" of communication
Starting point: it's almost inconceivable that all speakers arrive at exactly the same concepts

41
Causes of semantic mismatches
Perception varies per individual
–Hilbert 1987 on colour terms: density of pigment on lens & retina; sensitivity of photoreceptors
Cultural issues
–Reiter et al. 2005 on temporal expressions. Example: "evening": Are the times of dinner and sunset relevant?
R. Parikh (1994) recognised that mismatches exist …

42
Parikh
proposed a utility-oriented perspective on meaning
–Utility as reduction in search effort
showed that communication doesn't always break down when words are understood (slightly) differently by different speakers
Consider the expression "blue book"

44
[Venn diagram: Blue books (Bob) and Blue books (Ann), with region counts 25, 75, 225, 675]
Ann: "Bring the blue book on topology"
Bob: Search [[blue]]_Bob, then, if necessary, all other books (only 10% probability!)
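
A rough sketch of the search-effort arithmetic behind this slide follows. The four counts come from the diagram, but which count belongs to which region is an assumption, chosen so that it agrees with the slide's "only 10% probability" remark (25 of Ann's 250 blue books fall outside Bob's blue books). It is also assumed that the wanted book is equally likely to be any of Ann's blue books and that Bob searches on average half of the region that contains it.

```python
# Region counts from the slide's diagram; the assignment to regions is an assumption.
ANN_ONLY, BOB_ONLY, BOTH, NEITHER = 25, 75, 225, 675

total = ANN_ONLY + BOB_ONLY + BOTH + NEITHER  # all of Ann's books
ann_blue = ANN_ONLY + BOTH                    # books Ann calls blue
bob_blue = BOB_ONLY + BOTH                    # books Bob calls blue

# Probability that the wanted book (blue for Ann) is also blue for Bob, or not.
p_hit = BOTH / ann_blue
p_miss = ANN_ONLY / ann_blue

# Bob searches his blue books first, then, if necessary, all other books.
effort_hit = bob_blue / 2
effort_miss = bob_blue + (total - bob_blue) / 2

expected_with_hint = p_hit * effort_hit + p_miss * effort_miss
expected_without_hint = total / 2  # plain search through all books

print(f"miss probability: {p_miss:.0%}")
print(f"expected effort with 'blue': {expected_with_hint:.0f}")
print(f"expected effort without it:  {expected_without_hint:.0f}")
```

Under these assumptions the hint cuts the expected effort from 500 books to 200, even though Ann and Bob disagree about which books are blue.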

45
What Parikh did not do: show utility of V
–Ann and Bob used crisp concepts!
This talk: "tall" instead of "blue".
2-dimensional [[tall_1]] ⊆ [[tall_2]] or [[tall_2]] ⊆ [[tall_1]]
Focus on V

46
The story of the stolen diamond
"A diamond has been stolen from the Emperor and (…) the thief must have been one of the Emperor's 1000 eunuchs. A witness sees a suspicious character sneaking away. He tries to catch him but fails, getting fatally injured (...). The scoundrel escapes. (…) The witness reports 'The thief is tall', then gives up the ghost. How can the Emperor capitalize on these momentous last words?" (book, to appear)

47
The problem with dichotomies
Suppose the Emperor uses a dichotomy, e.g. Model A: [[tall]]_Emperor = [[>180cm]] (e.g., 500 people)
What if [[tall]]_Witness = [[>175cm]]?
If thief ∈ [[tall]]_Witness - [[tall]]_Emperor then
–Predicted search effort: 500 + ½(500) = 750
–Without the witness's utterance: ½(1000) = 500
The witness's utterance misled the Emperor
[Figure: height scale for Model A, marked 180cm, thief, 175cm]

48
Model A uses a crisp dichotomy between [[tall]]_A and [[¬tall]]_A
Contrast this with a partial model B, which has a truth value gap [[?tall?]]_B

49
[Figure: Model A (2-valued) and Model B (3-valued) side by side, with regions tall_A, tall_B, ?tall?_B and ¬tall_B; height marks at 180cm and 165cm]

50
How does the Emperor classify the thief?
s(X) =def expected search time given model X
Three types of situations
Type 1: thief ∈ [[tall]]_A = [[tall]]_B
In this case, s(A) = s(B)

51
How does the Emperor classify the thief?
Type 2: thief ∈ [[?tall?]]_B
model A: search all of [[tall]]_A in vain, then (on average) half of [[¬tall]]_A
model B: search all of [[tall]]_B in vain, then (on average) half of [[?tall?]]_B
B searches ½(card([[¬tall]]_B)) less!
So, s(B) < s(A)

52
How does the Emperor classify the thief?
Type 3: thief ∈ [[¬tall]]_B
model A: search all of [[tall]]_A in vain, then (on average) half of [[¬tall]]_A
model B: search all of [[tall]]_B in vain, then all of [[?tall?]]_B, then (on average) half of [[¬tall]]_B
Now A searches ½(card([[?tall?]]_B)) less.
So, s(A) < s(B)
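
The three situation types can be checked numerically with a small sketch. The region sizes below are hypothetical (the slides give none), and the names `search_time`, `MODEL_A`, `MODEL_B` and `compare` are illustrative. As on the slides, every region before the thief's is searched in full, and on average half of the thief's own region is searched.

```python
def search_time(ordered_regions, thief_region):
    """Expected number of people searched before the thief is found."""
    searched = 0
    for name, size in ordered_regions:
        if name == thief_region:
            return searched + size / 2  # on average half of the thief's region
        searched += size                # this region was searched in vain
    raise ValueError(f"unknown region: {thief_region}")


# Hypothetical cardinalities (not from the slides) of the three regions of model B.
TALL, BORDERLINE, NOT_TALL = 300, 200, 500

# Model A merges B's borderline and not-tall regions into one not-tall region.
MODEL_A = [("tall", TALL), ("not_tall", BORDERLINE + NOT_TALL)]
MODEL_B = [("tall", TALL), ("borderline", BORDERLINE), ("not_tall", NOT_TALL)]


def compare(region_in_b):
    """Return (s(A), s(B)) for a thief located in the given region of model B."""
    region_in_a = "tall" if region_in_b == "tall" else "not_tall"
    return search_time(MODEL_A, region_in_a), search_time(MODEL_B, region_in_b)


for label, region in [("Type 1", "tall"), ("Type 2", "borderline"), ("Type 3", "not_tall")]:
    s_a, s_b = compare(region)
    print(f"{label}: s(A) = {s_a}, s(B) = {s_b}")
```

With these sizes, model B saves half of its not-tall region in Type 2 and model A saves half of the borderline region in Type 3, matching the differences stated on the slides.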

53
Normally model B wins
Type 2 is more probable than Type 3
Let S = ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(tall(x)) > p(tall(y)))
S is highly plausible! (Taller individuals are more likely to be called tall)

54
Normally model B wins
S: ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(tall(x)) > p(tall(y)))
S implies ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(thief(x)) > p(thief(y)))
It pays to search [[?tall?]]_B before [[¬tall]]_B
A priori, s(B) < s(A)

55
Why does model B win?
3-valued models beat 2-valued models because they distinguish more finely
Therefore, many-valued models should be even better!
–All models that allow degrees or ranking (including e.g. Kennedy 2001)

56
Consider degree model C
v(tall(x)) ∈ [0,1] (e.g., Fuzzy or Probabilistic Logic)
∀x∀y (v(tall(x)) > v(tall(y)) → p(tall(x)) > p(tall(y)))
[Figure: height(x) against the probability of x being called tall]

57
Consider degree model C
Analogous to the Partial model, C allows the Emperor to
–rank the eunuchs, and
–start searching the tallest ones, etc.
Assuming >3 differences in height (i.e., more than 3 differences in v(tall(x))), this is even quicker than B: s(C) < s(B)

58
An example of model C
v(tall(a1)) = v(tall(a2)) = 0.9
v(tall(b1)) = v(tall(b2)) = 0.7
v(tall(c1)) = v(tall(c2)) = 0.5
v(tall(d1)) = v(tall(d2)) = 0.3
v(tall(e1)) = v(tall(e2)) = 0.1
Search {a1,a2} first, then {b1,b2}, etc. (5 levels)
This is quicker than a partial model (3 levels), which is quicker than a classical model (2 levels)
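
The claim that five levels beat three, and three beat two, can be illustrated with a short sketch. Two things here are assumptions rather than part of the slide: the probability that an individual is the thief is taken to be proportional to their degree v(tall(x)), and the thresholds that carve the degree scale into a classical or a partial model are hypothetical.

```python
# Degrees v(tall(x)) from the slide, one entry per individual.
DEGREES = {"a1": 0.9, "a2": 0.9, "b1": 0.7, "b2": 0.7, "c1": 0.5,
           "c2": 0.5, "d1": 0.3, "d2": 0.3, "e1": 0.1, "e2": 0.1}


def expected_search(thresholds):
    """Expected number of people searched when levels are searched from tallest down.

    thresholds: cut points on the degree scale; an individual's level is the
    number of thresholds that lie above their degree (level 0 is searched first).
    Assumption: p(x is the thief) is proportional to v(tall(x)).
    """
    total_weight = sum(DEGREES.values())
    levels = [[] for _ in range(len(thresholds) + 1)]
    for degree in DEGREES.values():
        index = sum(degree < t for t in thresholds)
        levels[index].append(degree)

    searched, expected = 0, 0.0
    for level in levels:
        p_level = sum(level) / total_weight          # chance the thief is in this level
        expected += p_level * (searched + len(level) / 2)
        searched += len(level)                       # whole level searched before moving on
    return expected


s_classical = expected_search([0.5])                 # 2 levels (hypothetical threshold)
s_partial = expected_search([0.7, 0.5])              # 3 levels (hypothetical thresholds)
s_degree = expected_search([0.9, 0.7, 0.5, 0.3])     # 5 levels, as on the slide

print(f"classical: {s_classical:.2f}, partial: {s_partial:.2f}, degree: {s_degree:.2f}")
```

Under these assumptions the expected search lengths come out in the order stated on the slide, with the degree model shortest; other threshold choices or probability assumptions may change the size of the gaps.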

60
This analysis suggests …
... that it helps the Emperor to understand "tall" as having borderline cases or degrees
But borderline cases and degrees are the hallmark of V
It appears to follow that V has benefits for search

61
This analysis also suggests …
... that degree models offer a better understanding of V than partial models
–Compare the first big problem with V
Radical interpretation: V concepts don't serve to narrow down the search space, but to suggest an ordering of it

65
Objections …
Objection 1: Smart search would have been equally possible based on a dichotomous (i.e., classical) model.
The idea: you can use a classical model, yet understand that other speakers use other classical models. Start searching individuals who are tall on most models.
Response: Reasoning about different classical models is not a classical logic but a Partial Logic with supervaluations. It presupposes that "tall" is understood as vague.

68
Objections …
Objection 2: Truth value gaps (or degrees) do not imply vagueness.
The idea: Truth value gaps that are crisp fail to model V. (Similarly for degrees.) Higher-order V needs to be modelled.
Response: few formal accounts of V pass this test

71
Objections …
Objection 3: Why was the witness not more precise?
The idea: The witness should have said "the thief was 185cm tall", giving an estimate
Response: (1) Why are such estimates vague ("approximately 185cm") rather than crisp ("185cm +/- 0.5cm")? (2) This can be answered along the lines proposed.

74
Objections …
Objection 4: This contradicts Lipman's theorem
The idea: Lipman (2006) proved that every V predicate can be replaced by a crisp one that has a utility at least as high.
Response: Lipman's assumptions don't apply:
(1) The theorem models V through a probability distribution.
(2) The theorem assumes that the hearer knows what crisp model the speaker uses (e.g. >185cm).

75
Lipman's theorem assumes that the hearer knows what crisp model the speaker uses
Our starting point: in continuous domains, perfect alignment between speakers/hearers would be a miracle (pace epistemicist approaches to V!)

76
Summing up
Why information reduction is useful is well understood
Why this should involve borderline cases is less clear. (Lipman's question)
Several tentative answers have been published. (Survey)
A new answer, based on mismatches in perception and benefits for search
–complements earlier answers

77
Not Exactly: In Praise of Vagueness. Oxford University Press, Jan. 2010
Part 1: Vagueness in science and daily life.
Part 2: Linguistic and logical models of vagueness.
Part 3: Working models of V in Artificial Intelligence.

78
Background on the book: http://www.csd.abdn.ac.uk/~kvdeemte/NotExactly
Acknowledgments relating to this talk: Ehud Reiter, Advaith Siddharthan

79
Appendix
Extra slides

80
Normally model B wins
Let S abbreviate: p(t ∈ [[?tall?]]_B | tall(t)) > p(t ∈ [[¬tall]]_B | tall(t))
For example, S is plausible if card([[?tall?]]_B) = card([[¬tall]]_B)

81
Example (draft)
Let
[[tall]] = {a,b,c,d}, p(thief ∈ [[tall]]) = 1/2
[[?tall?]] = {e,f}, p(thief ∈ [[?tall?]]) = 1/4
[[¬tall]] = {g,h,i,j}, p(thief ∈ [[¬tall]]) = 1/4
then
s( ) = 1/2*(2) + 1/4*(4+1) + 1/4*(6+2) = 4.25
s( ) = 1/2*(2) + 1/4*(4+2) + 1/4*(8+1) = 4.75
s( ) = 1/4*(2) + 1/4*(4+1) + 1/2*(6+2) = 5.75
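
The three sums can be reproduced with a short sketch. The arguments of s( ) did not survive in this transcript, so the search orders below are inferred from the arithmetic of each sum and should be read as an assumption; the function name `expected_search` is illustrative.

```python
def expected_search(ordered_groups):
    """Expected number of people searched for a given search order.

    ordered_groups: (size, probability that the thief is in the group) pairs.
    Earlier groups are searched in full; on average half of the thief's group is searched.
    """
    searched, expected = 0, 0.0
    for size, prob in ordered_groups:
        expected += prob * (searched + size / 2)
        searched += size
    return expected


# Group sizes and probabilities from the slide.
TALL = (4, 1 / 2)        # {a,b,c,d}
BORDERLINE = (2, 1 / 4)  # {e,f}
NOT_TALL = (4, 1 / 4)    # {g,h,i,j}

# Search orders inferred from the three sums (an assumption).
s_first = expected_search([TALL, BORDERLINE, NOT_TALL])
s_second = expected_search([TALL, NOT_TALL, BORDERLINE])
s_third = expected_search([NOT_TALL, BORDERLINE, TALL])

print(s_first, s_second, s_third)
```

On this reading, searching the borderline group before the not-tall group gives the lowest of the three expected search lengths.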

82
Consider degree model C
In C: v(tall(x)) ∈ [0,1] (e.g., Fuzzy or Probabilistic Logic)
S = ∀a∀b: a > b → p(tall(x) | v(tall(x)) = a) > p(tall(x) | v(tall(x)) = b)
[Figure: height(x) against the probability of x being called tall]

83
rich = earn more than 10^6 EUR
(1) ∀x∀y: (x ∈ South & y ∈ North → p(rich(x)) > p(rich(y)))
(2) ∀x: (rich(x) oil(x))
Therefore
(3) ∀x∀y: (x ∈ South & y ∈ North → p(oil(x)) > p(oil(y)))
