
Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.


1 Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers

2 Presentation Layout
- Introduction to Research
- Methods Used
- Overview of GermaNet
- Overview of SPPC
- Details of their Approach
- Results
- Conclusion

3 Goal
Automatic acquisition of domain-relevant terms and their relations.
How?
- Single-word terms: TFIDF classification
- Domain-relevant relations: use lexico-syntactic patterns:
  - Existing ontologies
  - Collocation methods

4 Input
- No seed words
- No syntactic patterns
- Just a collection of classified documents

5 Methods Used
Builds on other systems:
- GermaNet (they built an Ontology Inference Machine to search GermaNet), for accessing semantic relations
- SPPC (Shallow Processing Production Center), for linguistic annotation

6 Accessing Semantic Relations: GermaNet
- Developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen, Germany
- A lexical-semantic net
- German nouns, verbs, and adjectives are grouped semantically by an underlying lexical concept (like a thesaurus); these groups are called synsets
- Synsets are connected by semantic relations
- Lexical relations include synonymy, antonymy, and "pertains to"
- Conceptual relations include hyponymy ('is-a'), meronymy ('has-a'), entailment, and cause
- Based on the technology of WordNet (Princeton)

7 Accessing Semantic Relations: WordNet

8 Accessing Semantic Relations: WordNet

9 Accessing Semantic Relations: WordNet

10 Accessing Semantic Relations: Inference Machine
Allows GermaNet’s relations to be searched by other applications. Provides 3 different functions:
- Retrieval of relations assigned to words. Example: “Find all synonyms for the word bar” → rod, saloon, …
- Retrieval of relations between words. Example: “Find relations between Internet-Service-Provider and Company” → hyponym (so an ISP is a company)
- Navigation in the GermaNet graph
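The three functions can be illustrated with a minimal Python sketch over a toy relation graph. Everything here is an assumption made for illustration: the triples, the function names, and the extra "Organization" node are hypothetical and are not GermaNet data or the authors' Inference Machine API.

```python
from collections import deque

# Hypothetical toy graph as (source, relation, target) triples; not real GermaNet data.
TRIPLES = [
    ("bar", "synonym", "rod"),
    ("bar", "synonym", "saloon"),
    ("Internet-Service-Provider", "hyponym", "Company"),
    ("Company", "hyponym", "Organization"),  # invented node, for navigation only
]

def related(word, relation):
    """Function 1: all targets linked to `word` by `relation`."""
    return [t for s, r, t in TRIPLES if s == word and r == relation]

def relations_between(a, b):
    """Function 2: all relation labels that link `a` directly to `b`."""
    return [r for s, r, t in TRIPLES if s == a and t == b]

def path(start, goal):
    """Function 3: breadth-first navigation; returns the node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        current = queue.popleft()
        if current[-1] == goal:
            return current
        for s, _, t in TRIPLES:
            if s == current[-1] and t not in seen:
                seen.add(t)
                queue.append(current + [t])
    return None

print(related("bar", "synonym"))
print(relations_between("Internet-Service-Provider", "Company"))
print(path("Internet-Service-Provider", "Organization"))
```

A flat list of triples keeps the sketch short; a real lexical-semantic net would index relations by synset rather than scanning every triple.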

11 Linguistic Annotation: SPPC
SPPC (Shallow Processing Production Center): a robust German NLP system that uses cascaded, optimized weighted finite-state devices.
SPPC parts:
- Tokenizer
- Lexical Processor
- Part-of-Speech Filtering
- Named-entity Finder
- Chunk Recognizer

12 Their Extraction Engine
Three main components:
1. TFIDF-based single-word term classifier
2. Lexico-syntactic pattern finder
   1. Learns patterns based on known relations
   2. Learns patterns based on term collocation methods
3. Relation extractor

13 Their Extraction Engine: Single-word term extraction (KFIDF)
1. Extract single-word terms
2. Learn multi-word terms & identify syntactic patterns
3. Learn patterns from known relations
4. Extract related terms using found lexico-syntactic patterns

14 Discovering Domain-Relevant Terms
Apply a TFIDF measure: KFIDF
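The slide names KFIDF but does not give its formula. The Python sketch below assumes a category-level TFIDF variant of the form docs(w, cat) * log(n * |cats| / cats(w) + 1), where docs(w, cat) is the number of documents in category cat that contain w, cats(w) is the number of categories in which w occurs, and n is a smoothing factor. That form, the function name, and the toy corpus are assumptions for illustration and are not taken from the slides.

```python
import math

def kfidf(word, category, docs_by_category, n=1.0):
    """Assumed KFIDF form: docs(w, cat) * log(n * |cats| / cats(w) + 1).

    docs_by_category maps a category name to a list of documents,
    each document being a set of tokens.
    """
    docs_w_cat = sum(1 for doc in docs_by_category[category] if word in doc)
    cats_w = sum(
        1 for docs in docs_by_category.values()
        if any(word in doc for doc in docs)
    )
    if cats_w == 0:
        return 0.0  # word never occurs in the collection
    return docs_w_cat * math.log(n * len(docs_by_category) / cats_w + 1)

# Hypothetical classified collection with two categories.
corpus = {
    "drugs": [{"cocaine", "police"}, {"cocaine", "lsd"}],
    "finance": [{"bank", "police"}],
}

# "cocaine" is frequent in one category only, so it outranks "police",
# which is spread across both categories.
print(kfidf("cocaine", "drugs", corpus))
print(kfidf("police", "drugs", corpus))
```

Under this assumed form, a word scores high when it appears in many documents of one category and in few categories overall, which is what makes it a candidate domain-relevant term.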

15 Their Extraction Engine: Collocation learner

16 Learning Term Collocations
Examples: man-eating shark, dead serious, depend on, blue-collar
Measures:
- Mutual Information (probabilities)
  - Occurrence of one word predicts the occurrence of another
  - Not practical for sparse data
- Log-Likelihood Measures (contingency tables)
  - Tells how much more likely the occurrence of one pair is than another
- T-test
  - Accept or reject the null hypothesis (terms are independent)
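To make the three measures concrete, here is a minimal Python sketch that computes each from raw bigram counts. It uses the textbook formulations (pointwise mutual information, the t-score with the variance approximated by the observed mean, and the log-likelihood ratio G² over a 2x2 contingency table). The slides do not say which exact variants the authors used, so treat these as assumptions; the counts in the usage lines are made up.

```python
import math

def pmi(c12, c1, c2, n):
    """Pointwise mutual information: log2(P(w1,w2) / (P(w1) * P(w2))). Assumes c12 > 0."""
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))

def t_score(c12, c1, c2, n):
    """t-test against the independence hypothesis; variance ~ observed mean. Assumes c12 > 0."""
    observed = c12 / n
    expected = (c1 / n) * (c2 / n)
    return (observed - expected) / math.sqrt(observed / n)

def log_likelihood(c12, c1, c2, n):
    """Log-likelihood ratio G^2 over the 2x2 contingency table of (w1, w2)."""
    observed = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    expected = [rows[i] * cols[j] / n for i in range(2) for j in range(2)]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical counts: w1 and w2 each occur 100 times in 1000 bigrams.
# Co-occurring 50 times signals association; 10 times is what independence predicts.
for c12 in (50, 10):
    print(c12, pmi(c12, 100, 100, 1000), t_score(c12, 100, 100, 1000),
          log_likelihood(c12, 100, 100, 1000))
```

All three scores are near zero when the pair co-occurs exactly as often as independence predicts and grow as the pair becomes more strongly associated.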

17 Their Extraction Engine: Relation Extractor

18 Learning Relations with Lexico-syntactic Patterns
Example of a lexico-syntactic pattern finding relations:
Pattern: “or other”
Sentence: Bruises, wounds, or other injuries are common.
Hyponym relations: (Bruises, Injuries), (Wounds, Injuries)
-------------------------------------------------------------------------------
Pattern: “as well as”
Sentence: Cocaine as well as Hashish, and LSD…
Near synonyms? Now we can match LSD to the Drug domain.
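The “or other” example can be reproduced with a small Python sketch. Note the swap: this is a hand-written regular expression over raw text, whereas the approach summarized here learns its patterns from annotated text. The pattern, the function name, and the single-word-term restriction are assumptions made for illustration.

```python
import re

# Matches "X, Y, or other Z": a comma-separated list of single-word terms,
# the cue phrase "or other", then the superordinate term.
OR_OTHER = re.compile(r"((?:\w+,\s*)*\w+),?\s+or other\s+(\w+)", re.IGNORECASE)

def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'or other' pattern."""
    pairs = []
    for match in OR_OTHER.finditer(sentence):
        hypernym = match.group(2)
        for term in re.split(r",\s*", match.group(1)):
            pairs.append((term, hypernym))
    return pairs

print(hyponym_pairs("Bruises, wounds, or other injuries are common."))
# [('Bruises', 'injuries'), ('wounds', 'injuries')]
```

The regex only handles single-word terms; multi-word terms would need chunk boundaries of the kind a shallow parser such as SPPC provides.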

19 Learning Relations with Lexico-syntactic Patterns
Processing flow:
1. Extracted terms are looked up in GermaNet (semantic relationships).
2. This yields terms with semantic relations (synonymy, hyponymy, meronymy).
3. Semantically similar fragments are put into Finkelstein-Landau and Morin’s algorithm to cluster patterns.
4. The clustering yields domain-independent patterns and domain-specific patterns.
5. The term relation extractor applies the newly extracted lexico-syntactic patterns.
6. With near synonyms: search GermaNet to find common hyponyms, then assign the newly found hyponymous relation to the term not encoded in GermaNet.
7. Output: a list of related terms with possible hyponymous relations.
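The near-synonym step can be sketched in Python as follows. The sketch reads the slide's "common hyponyms" as the concept that the already-known terms are both hyponyms of; that reading is an assumption, as are the function name and the toy lookup table, which stands in for GermaNet.

```python
def infer_hypernyms(coordinated_terms, known_hypernyms):
    """Propose superordinate concepts for terms missing from the lexicon.

    coordinated_terms: terms joined by a near-synonym pattern such as "as well as".
    known_hypernyms: hypothetical stand-in for GermaNet, mapping a term to the
    set of concepts it is a hyponym of.
    """
    encoded = [t for t in coordinated_terms if t in known_hypernyms]
    if not encoded:
        return {}
    # Concepts shared by every term that the lexicon already knows.
    common = set.intersection(*(known_hypernyms[t] for t in encoded))
    return {
        t: common
        for t in coordinated_terms
        if t not in known_hypernyms and common
    }

# Toy lexicon: LSD is not encoded, but its coordinated neighbours are.
lexicon = {"Cocaine": {"Drug", "Substance"}, "Hashish": {"Drug"}}
print(infer_hypernyms(["Cocaine", "Hashish", "LSD"], lexicon))
# {'LSD': {'Drug'}}
```

The proposals are only candidates ("possible hyponymous relations" in the slide's wording), since coordination in a sentence does not guarantee a shared superordinate concept.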

20 Results
- There’s a correlation between corpus size and precision
- LogLike delivers the best results compared to Mutual Information and T-test
- Noun-verb collocations were most prominent and had the best results
- In the Drug domain, N-V = 56% precision and N-N = 41% precision

21 Conclusion
- KFIDF proves promising for single-word term extraction
- Statistical measures are suitable for free-word-order languages like German
- Extracting term relations is useful for real-world IE

22 My Evaluation
+ Uses well-known existing systems
+ Seemingly no human interaction
+ Domain adaptive (robust)
- Precision does not seem to be too impressive, and recall? I’d like to see more results
We see from the past few papers that automatic ontology generation approaches combine multiple strategies (statistics, existing ontologies) and have a cyclic, machine-learning nature.

