Ontology based Information Extraction


1 Ontology based Information Extraction
Jin Mao, Postdoc, School of Information, University of Arizona. Oct. 9th, 2015

2 Outline
Definition
Common Architectures
Information Extraction Methods
Ontology Construction/Enhancement
Performance Evaluation

3 Information Extraction
The process of obtaining pertinent information (facts) from documents.
Example: The forest area in India extended to about 75 million hectares, which in terms of geographical area is approximately 22 percent of the total land.
What is the relationship between forest area and geographical area?

4 Ontology Based Information Extraction (OBIE)
Terminology
Ontology Based Information Extraction (Wimalasuriya and Dou, 2010)
Ontology-driven Information Extraction (Yildiz and Miksch, 2007)
  The same as Ontology Based Information Extraction
  Whether the ontology part is within the system (Yildiz and Miksch, 2007)

5 Ontology Based Information Extraction (OBIE)
Key Characteristics
Process unstructured or semi-structured natural language text
Present the output using ontologies
  Ontology as input (Li and Bontcheva, 2007), released
Use an IE process guided by an ontology
  No new IE method; an existing one is oriented to identify the components of an ontology (classes, properties and instances)
  Extractors belong to an ontology? Linguistic rules

6 Ontology Based Information Extraction (OBIE)
Why
An ontology helps to clarify a domain’s semantics, e.g., concepts and their relationships
To alleviate a wide variety of natural language ambiguities

7 Ontology Based Information Extraction (OBIE)
Applications
Business Intelligence (BI) in e-business
Social media (Twitter)
Metadata generation for digital resources
……

8 Common Architectures
Major Challenges
Information Extraction: identify instances from the ontology in the text.
  Classes, Instances, Mentions, Properties, Property Values
Free texts in natural language.
  Example 1: Classical fried egg Mycoplasma-type colonies were not observed on 1% agar medium.
  Example 2: The cells are not motile, are not lysed in 1% SDS (wt/vol), and stain Gram positively.

9 Common Architectures
Major Challenges
Ontology Enhancement / Updating
  Upgrade the ontology with new instances to better cover the knowledge in a domain
  Not in the common architecture.

10 Common Architectures General Architecture

11 Common Architectures
First Step: define the semantic elements to be extracted
An example (Muller et al., 2004)
  Concept (C): named entities for the parts of the human body, such as heart, lung, kidney…
  Name of Disease (N): words or phrases that are disease names.
  Description (D): any words or phrases that describe Concepts, that is, that relate semantically to Concepts.
  Pair of Concept and Description (P): all possible combinations of Concepts and Descriptions. The combinations carry the full meaning of the relationships between C and D.

12 Information Extraction Methods
Linguistic rules
Using regular expressions/patterns, e.g., (watched|seen) <NP> (<NP>: Part-of-Speech tag)
Implemented using finite-state transducers, which consist of a series of finite-state automata
Automatically generate regular rules: “[Ii]nteract(s|ed|ing)?” matches “interact,” “interacts,” “interacted,” “interacting,” “Interact,” “Interacts,” “Interacted,” and “Interacting.”
Simple, with surprisingly good results
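The regular-expression rule on this slide can be tried out directly. The sketch below is only an illustration in Python: the word-boundary anchors (\b), the function name and the sample sentence are assumptions added here, not part of the original slide.

```python
import re

# Pattern from the slide: optional capital "I", the stem "nteract",
# then an optional suffix. The \b anchors are an added assumption so that
# longer words such as "interaction" are not partially matched.
INTERACT = re.compile(r"\b[Ii]nteract(s|ed|ing)?\b")


def find_interact_forms(text):
    """Return every surface form of "interact" found in the text, in order."""
    return [match.group(0) for match in INTERACT.finditer(text)]


if __name__ == "__main__":
    sample = "Protein A interacts with protein B. Interacting partners were found."
    print(find_interact_forms(sample))
```

A single compact rule covers all eight surface forms listed on the slide, which is part of why hand-written patterns remain attractive despite their simplicity.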

13 Information Extraction Methods
Linguistic rules: automatically mine extraction rules from text
A dictionary inductive learning algorithm (Vargas-Vera et al., 2001)
Finding the longest common subsequence (Romano et al., 2006)
Relational learning (Califf and Mooney, 1999), a bottom-up learning approach
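To make the longest common subsequence idea concrete, here is a minimal sketch of the standard dynamic-programming algorithm applied to two tokenized sentences. It is not the procedure of Romano et al. (2006); the function name and the two example sentences are hypothetical. The shared token sequence is the kind of skeleton from which a pattern could then be generalized.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of the token lists a and b."""
    rows, cols = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the tokens.
    result, i, j = [], 0, 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


if __name__ == "__main__":
    # Hypothetical sentences in the style of medical narrative reports.
    s1 = "the patient denies any chest pain".split()
    s2 = "the patient denies having fever".split()
    print(longest_common_subsequence(s1, s2))
```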

14 Information Extraction Methods
Gazetteer Lists
Recognize individual words or phrases
Widely used in named-entity recognition
  E.g., to recognize states of the US or countries of the world
Conditions:
  Specify exactly what is being identified by the gazetteer.
  Specify where the information for the gazetteer lists was obtained from.
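A gazetteer lookup can be sketched in a few lines. The example below assumes a deliberately tiny, hypothetical list of US states and a simple longest-match-first strategy; the names US_STATES and find_gazetteer_mentions are illustrative only. A real gazetteer would be complete and would document its source, as the conditions above require.

```python
import re

# Hypothetical, deliberately tiny gazetteer of US states (lower-cased).
US_STATES = {"arizona", "new york", "new mexico", "texas"}


def find_gazetteer_mentions(text, gazetteer):
    """Return (start_token, end_token, phrase) for each gazetteer entry found.

    Token offsets are half-open: tokens[start_token:end_token] is the match.
    """
    tokens = re.findall(r"\w+", text.lower())
    max_len = max(len(entry.split()) for entry in gazetteer)
    mentions, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, so that a multi-word entry
        # such as "new mexico" is preferred over any shorter match.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in gazetteer:
                mentions.append((i, i + n, phrase))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return mentions


if __name__ == "__main__":
    print(find_gazetteer_mentions("She moved from New York to Arizona.", US_STATES))
```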

15 Information Extraction Methods
Classification Techniques
Linguistic features such as POS tags, capitalization information and individual words
Treat parts of IE as classification problems:
  whether a word token is the start/end of an entity (Li et al., 2004)
  identifying different components of an ontology, such as instances (Li and Bontcheva, 2007) and property values (Wu and Weld, 2007)
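The sketch below illustrates how extraction can be framed as token classification: each token gets a feature dictionary and a pair of start/end labels, which any standard classifier could then be trained on. It is an illustration only, not the implementation of Li et al. (2004); the function names, the feature set, and the hand-assigned POS tags in the example are assumptions.

```python
def token_features(tokens, pos_tags, i):
    """Build features for deciding whether tokens[i] starts or ends an entity."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "pos": pos_tags[i],
        "is_capitalized": word[:1].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "<END>",
    }


def boundary_labels(n_tokens, entity_spans):
    """Return per-token (is_start, is_end) labels from half-open [start, end) spans."""
    starts = {start for start, _ in entity_spans}
    ends = {end - 1 for _, end in entity_spans}
    return [(i in starts, i in ends) for i in range(n_tokens)]


if __name__ == "__main__":
    # Hypothetical sentence with hand-assigned POS tags and entity spans.
    tokens = ["Jin", "Mao", "works", "in", "Arizona"]
    pos_tags = ["NNP", "NNP", "VBZ", "IN", "NNP"]
    spans = [(0, 2), (4, 5)]
    labels = boundary_labels(len(tokens), spans)
    for i, label in enumerate(labels):
        print(token_features(tokens, pos_tags, i), label)
```

Two binary decisions per token (start and end) are enough to recover entity spans, which is what lets an ordinary classifier stand in for a dedicated sequence model.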

16 Information Extraction Methods
Syntax/Shallow NLP
A semantically annotated parse tree for the text, built as a part of the IE process
Linguistic extraction rules with partial parse trees (Todirascu et al., 2002).

17 Ontology Construction
Either consider the ontology as an input to the system,
or construct an ontology as a part of the OBIE process

18 Ontology Enhancement
Update the ontology by adding new classes and properties through the IE process, NOT instances and their property values.
Such systems include the implementations by Maedche et al. (2003) and Dung and Kameyama (2007).
Fuzzy Relationship Rule:
  Define rules according to the relationships among semantic elements.
  Generate a suggestion list for the domain experts to extract real semantic elements.

19 Performance Evaluation
Measure the accuracy of identifying instances and property values.
Most IE systems face a trade-off between improving precision and recall.
When β² < 1 in the F-measure, precision (P) is more important than recall.
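The formula behind the β remark did not survive in this transcript. The sketch below assumes the standard weighted F-measure, F = (1 + β²)·P·R / (β²·P + R), which matches the slide's statement that β² < 1 favours precision. The function name and the example values are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (standard F-measure).

    beta < 1 weights precision more heavily, beta > 1 weights recall more.
    """
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # avoid division by zero when there is nothing to combine
    return (1 + b2) * precision * recall / denominator


if __name__ == "__main__":
    p, r = 0.8, 0.5  # hypothetical precision and recall
    for beta in (0.5, 1.0, 2.0):
        print(beta, round(f_measure(p, r, beta), 3))
```

With precision higher than recall, as in the example values, the score rises as β falls below 1 and drops as β grows above 1, which is the trade-off the slide describes.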

20 Performance Evaluation
Evaluation in different scales (Maynard et al., 2004)
  Each answer is categorized as correct or incorrect; however, different degrees of correctness should be allowed.
Learning Accuracy (LA): measures the closeness of the assigned class label to the correct class label, based on the hierarchy of the ontology (Cimiano et al., 2005).
Multi-dimensional evaluation beyond Precision and Recall

21 Performance Evaluation
Cost-based metrics (Maynard et al., 2004)
  A cost would typically be associated with a miss and with a false alarm (spurious answer)
  Augmented precision (AP)
  Augmented recall (AR)

22 Potentials
Automatically processing the information contained in natural language text
Creating semantic contents for the Semantic Web
  automatic metadata generation
  semantic annotation
Improving the quality of ontologies

23 ACKNOWLEDGEMENT
Most of the materials are adapted from:
Wimalasuriya, D. C., & Dou, D. (2010). Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science.
Other References (part):
Muhammad, A., & Dey, L. (2005). Biological Ontology enhancement with Fuzzy Relation: A Text Mining Framework. In International Conference on Web Intelligence WI (Vol. 5).
Romano, R., Rokach, L., & Maimon, O. (2006). Automatic discovery of regular expression patterns representing negated findings in medical narrative reports. In Proceedings of the 6th International Workshop on Next Generation Information Technologies and Systems. Springer, Berlin.
Muller, H. M., Kenny, E. E., & Sternberg, P. W. (2004). Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol, 2(11), e309.
Dung, T. Q., & Kameyama, W. (2007). Ontology-based information extraction and information retrieval in health care domain. In Data Warehousing and Knowledge Discovery (pp ). Springer Berlin Heidelberg.

24 Thank you!

