Published by Franklin Howard. Modified over 9 years ago.
1
Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Science, 152020 Pereslavl-Zalessky, Russia
2
INEX: Tools for Information Extraction
Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Science
152020 Pereslavl-Zalessky, Russia
+7 48535 98065
inex@epk.botik.ru
3
Information extraction
Objective: extract meaningful information of a pre-specified type from (typically large amounts of) text for further analytical purposes
Output: data structures of a pre-specified format (filled scenario templates)
4
Examples
Sports report: …
Database on rental accommodation opportunities: …
5
Possible IE application scenarios:
inference of new information (knowledge acquisition)
query formulation and answering in human-computer systems
automatic generation of abstracts and summaries
visualization of document content, etc.
6
The 'Newsmaking' task
(person or organization)
(original, cited, a reference to another newsmaker)
7
IE system architecture
8
Tokenisation & sentence segmentation
Tokenisation: identification of words, punctuation marks, delimiters, special characters
Sentence segmentation: recognizing sentence boundaries
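As a rough illustration of these two steps (a minimal sketch, not the INEX implementation), both can be approximated with regular expressions. The function names and the splitting heuristic are assumptions made for this example; real systems need extra handling for abbreviations such as "Dr." and for non-Latin scripts.

```python
import re

# A token is a word (optionally with internal hyphens or apostrophes)
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

def tokenize(text):
    """Split text into words, punctuation marks, and special characters."""
    return TOKEN_RE.findall(text)

def split_sentences(text):
    """Naive heuristic: a sentence ends at '.', '!' or '?' when followed by
    whitespace and an uppercase Latin letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]
```

Calling `split_sentences` first and then `tokenize` on each sentence gives the per-sentence token lists that later stages would consume.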
9
Morphological analysis
maps every word-form of the input text to its canonical form(s)
recognizes the word's morphological properties
Results are typically ambiguous.
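The ambiguity mentioned above can be seen in a toy dictionary lookup. This is only a sketch: the lexicon, its entries, and the `analyze` function are hypothetical, and a real analyzer would rely on a full morphological dictionary or rule set.

```python
# Hypothetical toy lexicon: word-form -> list of
# (canonical form, part of speech, morphological properties).
LEXICON = {
    "saw": [
        ("see", "VERB", {"tense": "past"}),
        ("saw", "NOUN", {"number": "singular"}),
    ],
    "dogs": [("dog", "NOUN", {"number": "plural"})],
}

def analyze(word_form):
    """Return every known analysis of a word-form.

    Unknown words get a single placeholder analysis so that later
    stages always receive a non-empty list.
    """
    key = word_form.lower()
    return LEXICON.get(key, [(key, "UNKNOWN", {})])
```

Here `analyze("saw")` returns two competing analyses (a verb and a noun), which is the kind of ambiguity a later disambiguation step has to resolve.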
10
Filtering
reduces the text passed on for further processing to its potentially relevant portions
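One simple way to filter, shown here only as an assumed approach (the slides do not say how INEX does it), is to keep the sentences that contain at least one trigger word for the target scenario. The function name and the trigger list are hypothetical.

```python
def filter_sentences(sentences, trigger_words):
    """Keep only sentences that contain at least one trigger word.

    Matching is case-insensitive and ignores punctuation attached
    to a word.
    """
    triggers = {w.lower() for w in trigger_words}
    kept = []
    for sentence in sentences:
        words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
        if words & triggers:
            kept.append(sentence)
    return kept
```

For a statement-extraction scenario, for instance, trigger words such as "said" or "announced" would discard sentences that cannot contain a reported statement.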
11
Disambiguation
performed either as a side effect of other processes (e.g., microsyntactic analysis)
or as a stand-alone stage
12
Microsyntactic analysis
identifies noun phrases (NP)
identifies some regularly formed constructions (numbers, dates, personal proper names)
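Regularly formed constructions such as numbers and dates lend themselves to pattern matching. The sketch below assumes two surface formats chosen purely for illustration (dates like 12.05.2003 and plain decimal numbers); it does not cover noun phrases or personal names, and it is not taken from INEX.

```python
import re

# Assumed surface formats, for illustration only.
DATE_RE = re.compile(r"\b\d{1,2}[./]\d{1,2}[./]\d{4}\b")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")

def find_constructions(text):
    """Return (label, matched text, start offset) triples in text order.

    Numbers that start inside an already recognized date are skipped,
    so each stretch of text receives at most one label.
    """
    date_matches = list(DATE_RE.finditer(text))
    found = [("DATE", m.group(), m.start()) for m in date_matches]
    date_spans = [(m.start(), m.end()) for m in date_matches]
    for m in NUMBER_RE.finditer(text):
        if not any(start <= m.start() < end for start, end in date_spans):
            found.append(("NUMBER", m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])
```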
13
Macrosyntactic analysis
identifies clause boundaries
constructs clause hierarchy within a sentence
14
Named entity recognizer
identifies proper names
assigns semantic features to certain items
15
Information extraction rules
a domain knowledge representation formalism (scenario templates)
a set of patterns to identify template elements in a text (covering the many possible ways to talk about the target event elements)
16
An IE pattern includes:
a set of rules that define how to retrieve this pattern in a text
a set of constraints imposed on textual elements to fit into a particular slot of the target
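The two parts of a pattern, a retrieval rule and slot constraints, can be sketched as follows. Everything here is hypothetical: the slot names `newsmaker` and `statement`, the regular expression, and the constraints are assumptions made for illustration and are not the INEX rule formalism.

```python
import re

# Hypothetical pattern for a 'Newsmaking'-style template.
PATTERN = {
    # Rule: how to find the pattern in a sentence.
    "rule": re.compile(
        r"(?P<newsmaker>[A-Z][\w.]*(?: [A-Z][\w.]*)*) "
        r"(?:said|stated) that (?P<statement>[^.]+)"
    ),
    # Constraints: what a text fragment must satisfy to fill a slot.
    "constraints": {
        "newsmaker": lambda s: len(s.split()) <= 4,
        "statement": lambda s: len(s.strip()) > 0,
    },
}

def apply_pattern(pattern, sentence):
    """Return a dict of filled slots, or None if the rule does not match
    or any slot constraint fails."""
    match = pattern["rule"].search(sentence)
    if match is None:
        return None
    slots = match.groupdict()
    for slot, check in pattern["constraints"].items():
        if not check(slots[slot]):
            return None
    return slots
```

A realistic pattern set would contain many such rules per template, one for each common way of phrasing the target event.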
17
Coreference Resolver
recognizes different occurrences of the same entity in a text
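A very crude heuristic for name coreference, given here only as an assumed example, treats two mentions as the same entity when the words of one are contained in the other (so "Petrov" joins "Ivan Petrov"). The function and the sample names are hypothetical; pronouns and descriptive references need far richer methods.

```python
def resolve_coreference(mentions):
    """Group mentions that plausibly refer to the same entity.

    A mention joins the first existing group whose first member shares
    all words with it (one word set contains the other); otherwise it
    starts a new group.
    """
    groups = []
    for mention in mentions:
        words = set(mention.lower().split())
        for group in groups:
            rep_words = set(group[0].lower().split())
            if words <= rep_words or rep_words <= words:
                group.append(mention)
                break
        else:
            groups.append([mention])
    return groups
```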
18
Merging partial results
merging partially filled templates to produce a final, maximally filled template
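Merging can be sketched as a slot-by-slot union that refuses to combine templates with conflicting values. This is a minimal sketch under assumptions: templates are represented as dictionaries with `None` for empty slots, and the slot names in the usage note are hypothetical.

```python
def merge_templates(a, b):
    """Merge two partially filled templates.

    Each template is a dict mapping slot name -> value, with None for
    an empty slot. Returns the merged dict, or None when both templates
    fill the same slot with different values.
    """
    merged = {}
    for slot in a.keys() | b.keys():
        value_a, value_b = a.get(slot), b.get(slot)
        if value_a is not None and value_b is not None and value_a != value_b:
            return None  # conflicting fillers: do not merge
        merged[slot] = value_a if value_a is not None else value_b
    return merged
```

For example, a template that has only `newsmaker` filled and another that has only `statement` filled merge into one template with both slots filled, provided a coreference step has established that they describe the same event.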