Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Sciences, 152020 Pereslavl-Zalessky, Russia.


1 Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Sciences, 152020 Pereslavl-Zalessky, Russia

2 INEX: Tools for Information Extraction. Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Sciences, 152020 Pereslavl-Zalessky, Russia. +7 48535 98065, inex@epk.botik.ru

3 Information extraction. Objective: extract meaningful information of a pre-specified type from (typically large amounts of) text for further analytical purposes. Output: data structures of a pre-specified format (filled scenario templates).
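As an illustration of what a "data structure of a pre-specified format" might look like, here is a minimal Python sketch of a scenario template. The class name `RentalTemplate` and its slot names are assumptions made for this example only; the slides do not list the actual slots used by INEX.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RentalTemplate:
    """Hypothetical scenario template; the slot names are illustrative only."""
    location: Optional[str] = None
    rooms: Optional[int] = None
    monthly_price: Optional[float] = None

    def filled_slots(self) -> dict:
        """Return only the slots that extraction managed to fill."""
        return {name: value for name, value in asdict(self).items()
                if value is not None}


# A partially filled template: the price was not found in the text.
template = RentalTemplate(location="city centre", rooms=2)
print(template.filled_slots())
```

Empty slots are kept as `None` so that later stages can tell "not found" apart from a real value.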

4 Examples Sports report:,,,, … Database on rental accommodation opportunities:,,, …

5 Possible IE application scenarios: inference of new information (knowledge acquisition); query formulation and answering in human-computer systems; automatic generation of abstracts and summaries; visualization of document content; etc.

6 The 'Newsmaking' task (person or organization) (original, cited, a reference to another newsmaker)

7 IE system architecture

8 Tokenisation & sentence segmentation. Tokenisation: identification of words, punctuation marks, delimiters, and special characters. Sentence segmentation: recognition of sentence boundaries.
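The two steps on this slide can be sketched with regular expressions. This is a deliberately naive illustration, not the INEX implementation: the function names `tokenize` and `split_sentences` are assumptions, and the sentence splitter will wrongly break after abbreviations such as "Dr.".

```python
import re

# A token is either a run of word characters or a single
# non-space, non-word character (punctuation, delimiter, symbol).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and a capital."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [part for part in parts if part]


print(tokenize("Hello, world!"))
print(split_sentences("It rained. We stayed in."))
```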

9 Morphological analysis: maps every word form of the input text to its canonical form(s); recognizes the word's morphological properties. Results are typically ambiguous.
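The ambiguity mentioned here can be shown with a toy lexicon lookup. The lexicon contents, the `Analysis` tuple, and the `analyse` function are all assumptions for illustration; a real analyser would rely on a full morphological dictionary or rule set.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str      # canonical form
    pos: str        # part of speech
    features: str   # other morphological properties


# Toy lexicon (hypothetical): one word form may have several analyses.
LEXICON = {
    "saw": [Analysis("see", "VERB", "past"),
            Analysis("saw", "NOUN", "singular")],
    "dogs": [Analysis("dog", "NOUN", "plural")],
}


def analyse(word_form):
    """Return every known analysis of a word form; unknown words get a stub."""
    default = [Analysis(word_form, "UNKNOWN", "")]
    return LEXICON.get(word_form.lower(), default)


for analysis in analyse("saw"):
    print(analysis)
```

The form "saw" yields two analyses, which is why a disambiguation step is needed later in the pipeline.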

10 Filtering: reduces the text subjected to further processing to its potentially relevant portions.
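One simple way to filter is to keep only sentences that contain a trigger word. The sketch below assumes this keyword-based approach and a hypothetical trigger list; the slides do not say how INEX decides relevance.

```python
# Hypothetical trigger words for a "who said what" scenario.
TRIGGERS = {"said", "stated", "announced"}


def filter_sentences(sentences, triggers=TRIGGERS):
    """Keep only the sentences that contain at least one trigger word."""
    kept = []
    for sentence in sentences:
        # Lower-case each word and strip surrounding punctuation.
        words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
        if words & triggers:
            kept.append(sentence)
    return kept


sentences = [
    "The minister announced a new programme.",
    "The weather was fine.",
]
print(filter_sentences(sentences))
```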

11 Disambiguation: performed either as a side effect of other processes (e.g., microsyntactic analysis) or as a stand-alone stage.

12 Microsyntactic analysis: identifies noun phrases (NPs); identifies some regularly formed constructions (numbers, dates, personal proper names).
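Regularly formed constructions such as dates lend themselves to pattern matching. The sketch below recognizes only one assumed date format ("12 May 2003", English month names); the formats actually handled by INEX are not given in the slides.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Day, month name, four-digit year, e.g. "12 May 2003".
DATE_RE = re.compile(r"\b(\d{1,2})\s+(" + MONTHS + r")\s+(\d{4})\b")


def find_dates(text):
    """Return (day, month, year) tuples for every date found in the text."""
    return [(int(day), month, int(year))
            for day, month, year in DATE_RE.findall(text)]


print(find_dates("The match on 12 May 2003 ended 2-1."))
```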

13 Macrosyntactic analysis: identifies clause boundaries; constructs the clause hierarchy within a sentence.

14 Named entity recognizer: identifies proper names; assigns semantic features to certain items.
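A common baseline for this step is a gazetteer lookup that prefers the longest match. The sketch below assumes that approach; the gazetteer entries, the labels, and the `tag_entities` function are illustrative and not taken from INEX.

```python
# Hypothetical gazetteer: token sequences mapped to a semantic label.
GAZETTEER = {
    ("Program", "Systems", "Institute"): "ORGANIZATION",
    ("Russia",): "LOCATION",
}


def tag_entities(tokens):
    """Scan left to right, taking the longest gazetteer match at each position."""
    entities = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(key), GAZETTEER[key]))
                i += n          # skip past the matched entity
                break
        else:
            i += 1              # no entity starts at this token
    return entities


tokens = "The Program Systems Institute is in Russia".split()
print(tag_entities(tokens))
```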

15 Information extraction rules: a domain knowledge representation formalism (scenario templates); a set of patterns to identify template elements in a text (covering the many possible ways to talk about the target event elements).

16 An IE pattern includes: a set of rules that define how to find the pattern in a text; a set of constraints that textual elements must satisfy to fit into a particular slot of the target
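The split between rules (how to find the pattern) and constraints (what may fill a slot) can be sketched as follows. The "X said that Y" rule, the slot names `newsmaker` and `statement`, and the constraint thresholds are assumptions for this example; the slides do not show the INEX pattern language.

```python
import re

# Rule: locate a hypothetical "<Name> said that <statement>." construction.
RULE = re.compile(
    r"(?P<newsmaker>[A-Z]\w+(?: [A-Z]\w+)*) said that (?P<statement>[^.]+)\."
)

# Constraints: conditions a matched text span must meet to fill its slot.
CONSTRAINTS = {
    "newsmaker": lambda span: len(span.split()) <= 4,
    "statement": lambda span: len(span.split()) >= 2,
}


def apply_pattern(sentence):
    """Return the filled slots, or None if the rule or a constraint fails."""
    match = RULE.search(sentence)
    if match is None:
        return None
    slots = match.groupdict()
    if all(check(slots[name]) for name, check in CONSTRAINTS.items()):
        return slots
    return None


print(apply_pattern("Anna Petrova said that the project is finished."))
```

A practical system would hold many such patterns per slot, since the same event can be worded in many ways.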

17 Coreference resolver: recognizes different occurrences of the same entity in a text.
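A very rough approximation of this step for personal names is to group a mention with an earlier entity when one name is contained in the other (e.g. "Petrova" and "Anna Petrova"). This heuristic and the function `resolve_coreference` are assumptions for illustration; it ignores pronouns and descriptions, which a real resolver must also handle.

```python
def resolve_coreference(mentions):
    """Group name mentions whose words overlap with an earlier, longer name."""
    entities = []  # each entity is a list of mentions, in order of appearance
    for mention in mentions:
        words = set(mention.split())
        for group in entities:
            longest = set(max(group, key=len).split())
            # Same entity if one name's words are contained in the other's.
            if words <= longest or longest <= words:
                group.append(mention)
                break
        else:
            entities.append([mention])
    return entities


mentions = ["Anna Petrova", "Petrova", "Ivan Sidorov", "Anna Petrova"]
print(resolve_coreference(mentions))
```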

18 Merging partial results: merges partially filled templates to produce a final, maximally filled template.
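Merging can be sketched as combining dictionaries slot by slot. The version below assumes templates are plain dicts with `None` for empty slots and that the first non-empty value wins; how INEX resolves conflicting values is not stated in the slides.

```python
def merge_templates(partials):
    """Combine partial templates; the first non-empty value for a slot wins."""
    merged = {}
    for partial in partials:
        for slot, value in partial.items():
            if value is not None and merged.get(slot) is None:
                merged[slot] = value
    return merged


partials = [
    {"newsmaker": "Anna Petrova", "statement": None},
    {"newsmaker": None, "statement": "the project is finished"},
]
print(merge_templates(partials))
```

Slots that remain empty in every partial template are simply absent from the merged result.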

