Presented by Rani Qumsiyeh & Andrew Zitzelberger.


1 Presented by Rani Qumsiyeh & Andrew Zitzelberger

2  Common approaches  Collocation analysis: Producing anonymous relations without a label.  Syntactic Dependencies: The dependencies between verbs and arguments.  Hearst’s approach: Matching lexico-syntactic patterns.

3  Definition (collocation): A pair of words that occur together more often than expected by chance within a certain boundary.  Can be detected with Student’s t-test or the χ² test.  Examples of such techniques are presented in the related work section.
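A minimal sketch of both tests for a single word pair, assuming unigram and bigram counts are already available. The function names and the counts in the usage note are illustrative and do not come from the presentation.

```python
import math

def t_score(c12, c1, c2, n):
    """Student's t statistic for the bigram (w1, w2).

    c12: count of the bigram, c1 / c2: counts of w1 / w2, n: total bigrams.
    """
    observed = c12 / n                 # sample mean of the bigram indicator
    expected = (c1 / n) * (c2 / n)     # mean under independence
    # For rare events the Bernoulli variance p(1 - p) is approximately p.
    return (observed - expected) / math.sqrt(observed / n)

def chi_square(c12, c1, c2, n):
    """Pearson's chi-square statistic for the 2x2 contingency table."""
    o11 = c12                  # w1 followed by w2
    o12 = c1 - c12             # w1 followed by something else
    o21 = c2 - c12             # something else followed by w2
    o22 = n - c1 - c2 + c12    # neither
    num = n * (o11 * o22 - o12 * o21) ** 2
    den = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return num / den
```

For example, with the made-up counts `c12=8, c1=15828, c2=4675, n=14307668`, the χ² statistic stays below 3.841 (the critical value for one degree of freedom at the 0.05 level), so the pair would not be accepted as a collocation.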

4  “A person works for some employer”  Relation: work-for  Concepts: person, employer  The acquisition of selectional restrictions  Detecting verbs denoting the same ontological relation.  Hierarchical ordering of relations.  Discussed later in detail.

5  Used to discover very specific relations such as part-of, cause, purpose.  Charniak employed part-of-speech tagging to detect such patterns.  Other approaches to detect causation and purpose relations are discussed later.

6  Learning Attributes relying on the syntactic relation between a noun and its modifying adjectives.  Learning Relations on the basis of verbs and their arguments.  Matching lexico-syntactic patterns and aims at learning qualia structures for nouns.

7  Attributes are defined as relations with a datatype as range.  Attributes are typically expressed in texts using the preposition of, the verb have or genitive constructs:  the color of the car  every car has a color  the car's color  Peter bought a new car. Its color [...]

8

9  attitude adjectives, expressing the opinion of the speaker, such as in 'good house'  temporal adjectives, such as the 'former president' or the 'occasional visitor'  membership adjectives, such as the 'alleged criminal' or a 'fake cowboy'  event-related adjectives, such as 'abusive speech', in which either the agent of the speech or the event itself is abusive

10  Find the corresponding description for the adjective by looking up its corresponding attribute in WordNet.  Consider only those adjectives which do have such an attribute relation.  This increases the probability that the adjective being considered denotes the value of some attribute, quality or property.

11  Tokenize and part-of-speech tag the corpus using TreeTagger.  Match to the following two expressions and extract adjective/noun pairs:  (\w+{DET})? (\w+{NN})+ is{VBZ} \w+{JJ}  (\w+{DET})? \w+{JJ} (\w+{NN})+  Cond (n, a) := f(n, a)/f(n)
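The two expressions and the Cond measure can be sketched roughly as follows. This assumes the tagger output has been rewritten as `word{TAG}` tokens separated by single spaces; that encoding, the function names, and the choice of the last noun as head are assumptions made for illustration, not the authors' implementation. Here f(n) is taken to be the noun's frequency in the tagged corpus.

```python
import re
from collections import Counter

# Python versions of the two slide patterns over "word{TAG}" tokens.
PREDICATIVE = re.compile(r"(?:\w+\{DET\} )?((?:\w+\{NN\} )+)is\{VBZ\} (\w+)\{JJ\}")
ATTRIBUTIVE = re.compile(r"(?:\w+\{DET\} )?(\w+)\{JJ\}((?: \w+\{NN\})+)")
NOUN = re.compile(r"(\w+)\{NN\}")

def head_noun(noun_span):
    # Assumption: the last noun of a noun sequence is its head.
    return NOUN.findall(noun_span)[-1]

def extract_pairs(tagged_sentence):
    """Return (noun, adjective) pairs matched by either pattern."""
    pairs = []
    for m in PREDICATIVE.finditer(tagged_sentence):
        pairs.append((head_noun(m.group(1)), m.group(2)))
    for m in ATTRIBUTIVE.finditer(tagged_sentence):
        pairs.append((head_noun(m.group(2)), m.group(1)))
    return pairs

def conditional_scores(corpus, threshold=0.01):
    """Cond(n, a) = f(n, a) / f(n), keeping pairs at or above the threshold."""
    pair_freq, noun_freq = Counter(), Counter()
    for sentence in corpus:
        pair_freq.update(extract_pairs(sentence))
        noun_freq.update(NOUN.findall(sentence))
    scores = {}
    for (noun, adj), f in pair_freq.items():
        cond = f / noun_freq[noun]
        if cond >= threshold:
            scores[(noun, adj)] = cond
    return scores

# Toy corpus, invented for this example.
corpus = [
    "the{DET} car{NN} is{VBZ} fast{JJ}",
    "a{DET} new{JJ} car{NN}",
    "the{DET} car{NN} stops{VBZ}",
]
scores = conditional_scores(corpus)
```

In this toy corpus "car" occurs three times and each adjective once with it, so both pairs receive the same score.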

12  Tourism Corpus  Threshold = 0.01  Car

13  For each of the adjectives we look up the corresponding attribute in WordNet  age is one of {new, old}  value is one of {black}  numerousness/numerosity/multiplicity is one of {many}  otherness/distinctness/separateness is one of {other}  speed/swiftness/fastness is one of {fast}  size is one of {small, little, big}

14  Evaluate every domain concept according to (i) its attributes and (ii) their corresponding ranges, assigning each a rating from '0' to '3' ▪ '3' means that the attribute or its range is totally reasonable and correct. ▪ '0' means that the attribute or the range does not make any sense.

15

16  A new approach that not only lists relations but finds the general relation.  work-for (man, department), work-for (employee, institute), work-for (woman, store)  Generalized: work-for (person, organization)

17  Conditional probability.  Pointwise mutual information (PMI).  A measure based on the χ²-test.  Evaluate by applying their approach to the Genia corpus using the Genia ontology

18  Extract verb frames using Steven Abney's chunker.  Extract tuples NP-V-NP and NP-V-P-NP.  Construct binary relations from tuples.  Use the lemmatized verb V as corresponding relation label  Use the head of the NP phrases as concepts.
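The tuple-to-relation step might look like the sketch below. It assumes the chunker output is already available as token lists; the toy lemma table, the rightmost-token head rule, and joining verb and preposition with a hyphen (as in work-for) are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional

class Relation(NamedTuple):
    label: str            # lemmatized verb, optionally with preposition
    domain_concept: str   # head of the first NP
    range_concept: str    # head of the second NP

# Hypothetical lemma table; a real system would call a lemmatizer here.
LEMMAS = {"works": "work", "worked": "work", "activates": "activate"}

def head(np_tokens: List[str]) -> str:
    # Assumption: the head of an English NP chunk is its rightmost token.
    return np_tokens[-1].lower()

def to_relation(subj: List[str], verb: str, obj: List[str],
                prep: Optional[str] = None) -> Relation:
    """Build a binary relation from an NP-V-NP or NP-V-P-NP tuple."""
    lemma = LEMMAS.get(verb.lower(), verb.lower())
    label = f"{lemma}-{prep.lower()}" if prep else lemma
    return Relation(label, head(subj), head(obj))

rel = to_relation(["the", "man"], "works", ["the", "department"], prep="for")
```

Here `rel` is the relation work-for with domain "man" and range "department".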

19

20  protein_molecule: 5  Protein_family_or_group: 10  amino-acid: 10

21  Take into account the frequency of occurrence.  Choose the highest one

22  Penalize concepts c which occur too frequently.  P(amino-acid) = 0.27, P(protein) = 0.14
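A rough sketch of how the conditional probability and PMI measures can disagree. The two priors are the ones quoted on the slide; the per-concept counts, the function names, and taking f(v) as the sum of the slot counts are assumptions made for illustration.

```python
import math

def conditional(f_cv, f_v):
    """P(c | v): share of the verb's argument slot filled by concept c."""
    return f_cv / f_v

def pmi(f_cv, f_v, p_c):
    """Pointwise mutual information: log2(P(c | v) / P(c))."""
    return math.log2(conditional(f_cv, f_v) / p_c)

def best_concept(slot_counts, priors, measure="pmi"):
    """Pick the highest-scoring concept for one argument slot of a verb."""
    f_v = sum(slot_counts.values())  # assumption: f(v) is the slot total

    def score(concept):
        if measure == "conditional":
            return conditional(slot_counts[concept], f_v)
        return pmi(slot_counts[concept], f_v, priors[concept])

    return max(slot_counts, key=score)

slot_counts = {"amino_acid": 10, "protein": 10}   # hypothetical counts
priors = {"amino_acid": 0.27, "protein": 0.14}    # P(c) values from the slide
```

With equal counts the conditional probability cannot separate the two concepts, while PMI prefers "protein" because its prior is lower, which is the penalty on frequent concepts described above.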

23  Compares contingencies between two variables (tests whether the two variables are statistically independent or not)  We can generalize c to ci if the χ²-test reveals the verb v and c to be statistically dependent  Level of significance = 0.05

24  The Genia corpus contains 18,546 sentences with 509,487 words and 51,170 verbs.  Of 100 extracted relations, 15 were regarded as inappropriate by a biologist evaluator.  The remaining 85 were evaluated using:  Direct matches for domain and range (DM)  Average distance in terms of number of edges between correct and predicted concept (AD)  A symmetric variant of the Learning Accuracy (LA)

25

26

27  Nature of Objects  Aristotle  Material cause (made of)  Agentive cause (movement, creation, change)  Formal cause (form, type)  Final cause (purpose, intention, aim)

28 Generative Lexicon framework [Pustejovsky, 1991] Qualia Structures Constitutive (components) Agentive (created) Formal (hypernym) Telic (function) Knife

29  Human  Subjective decisions  Web  Linguistic errors  Ranking errors  Commercial Bias  Erroneous information  Lexical Ambiguity

30

31  Pattern library tuples (p, c)  p is pattern  c is clue (c:string -> string)  Given a term t and a clue c  c(t) is sent to the search engine  π(x) refers to plural forms of x
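One way to picture a clue as a string-to-string function is sketched below. The template syntax, the example clue, and the naive pluralizer standing in for π(x) are all hypothetical and are not taken from the authors' pattern library.

```python
from typing import Callable

def pluralize(term: str) -> str:
    """Naive stand-in for π(x); irregular plurals are not handled."""
    if term.endswith(("s", "x", "ch", "sh")):
        return term + "es"
    if term.endswith("y") and len(term) > 1 and term[-2] not in "aeiou":
        return term[:-1] + "ies"
    return term + "s"

def make_clue(template: str) -> Callable[[str], str]:
    """Turn a template into a clue c: term -> search-engine query string."""
    def clue(term: str) -> str:
        return template.format(t=term, plural=pluralize(term))
    return clue

# Hypothetical clue that instantiates a Hearst-style "such as" query.
such_as = make_clue("such as {plural}")
query = such_as("computer")
```

The resulting `query` string is what would be sent to the search engine for the term "computer".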

32

33

34

35  Amount words:  variety, bundle, majority, thousands, millions, hundreds, number, numbers, set, sets, series, range  Example:  “A conversation is made up of a series of observable interpersonal exchanges.” ▪ Constitutive role = exchange

36 PURP := \w+{VB} NP | NP | be{VB} \w+{VBD}

37  No good patterns  X is made by Y  X is produced by Y  Instead:  Agentive_verbs = {build, produce, make, write, plant, elect, create, cook, construct, design}

38  e = element  t = term

39  Lexical elements: knife, beer, book, computer  Abstract Noun: conversation  Specific multi-term words:  Natural language processing  Data mining

40  Students score  0 = incorrect  1 = not totally wrong  2 = still acceptable  3 = totally correct

41

42

43 Reasoning: Formal and constitutive patterns are more ambiguous.

44

45

46  Maedche and Staab, 2000  Find relations using association rules  A transaction is defined as words occurring together in a syntactic dependency  Calculate support and confidence  Precision = 11%, Recall = 13%
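Support and confidence for a rule X → Y follow the standard association-rule definitions; a minimal sketch, assuming each transaction is a set of co-occurring concepts. The toy transactions are invented for this example.

```python
from typing import FrozenSet, Iterable, List, Set

def _count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    # Number of transactions that contain every item in the itemset.
    return sum(1 for t in transactions if itemset <= t)

def support(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of all transactions that contain both X and Y."""
    return _count(x | y, transactions) / len(transactions)

def confidence(x: Set[str], y: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing X that also contain Y."""
    return _count(x | y, transactions) / _count(x, transactions)

# Toy transactions: concept pairs linked by a syntactic dependency.
transactions = [
    {"hotel", "room"},
    {"hotel", "room"},
    {"hotel", "city"},
    {"city", "museum"},
]
```

For the rule hotel → room, two of the four transactions contain both concepts and three contain "hotel", so the rule has support 2/4 and confidence 2/3.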

47  Kavalec and Svatek, 2005  Added ‘above expectation’ heuristic ▪ Measure association between verb and pair of concepts

48  Gamallo et al., 2002  Map syntactic dependencies to semantic relations  1) shallow parser + heuristics to derive syntactic dependencies  2) cluster based on syntactic positions  Problems ▪ Mapping is under specified ▪ Largely domain dependent

49  Ciaramita et al., 2005  Statistical dependency parser to extract: ▪ SUBJECT-VERB-DIRECT_OBJECT ▪ SUBJECT-VERB-INDIRECT_OBJECT  χ² test: keep those occurring significantly more often than by chance  83% of learned relations are correct  53.1% of generalized relations are correct

50  Heyer et al., 2001  Calculate 2nd-order collocations  Use set of defined rules to reason

51  Ogata and Collier, 2004  HEARST patterns for extraction  Use heuristic reasoning rules

52  Yamaguchi, 2001  Word space algorithm using a 4-word window  Cos(angle) measure for similarity ▪ If similarity > threshold, a relationship is assumed  Precision = 59.89% for legal corpus
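A small sketch of the word-space idea: build co-occurrence vectors from a context window and compare them by cosine. Reading the window as four tokens on each side, using raw counts as weights, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def context_vectors(tokens: List[str], window: int = 4) -> Dict[str, Counter]:
    """Count, for each word, the words seen within `window` tokens of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def related(u: Counter, v: Counter, threshold: float) -> bool:
    # A relationship is proposed when similarity exceeds the threshold.
    return cosine(u, v) > threshold

vectors = context_vectors("the court issued a ruling".split())
```

Two words whose context vectors point in nearly the same direction score close to 1, and words with no shared context score 0.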

53  Poesio and Almuhareb, 2005  Classify attributes into one of six categories: ▪ Quality, part, related-object, activity, related-agent, non-attribute  Classifier was trained using: ▪ Morphological information, clustering results, search engine results, and heuristics  Better results from combining related-object and part  F-measure = 53.8% for the non-attribute class, and between 81% and 95% for the other classes

54  Claveau et al., 2003  Inductive Logic Programming Approach  Doesn’t distinguish between different qualia roles

55  Learning relations from non-verbal structures  Gold standard of qualia structures  Deriving a reasoning calculus

56  Strengths  Explained (their) methods in detail  Weaknesses  Required a lot of NLP background knowledge  Short summaries of others’ work

