CLiNG - May 24 2002
Overview of Research
- Computational Terminology
- Knowledge extraction from Text
- Study of causal relation
- Corpus building
- Uncertainty
- Computer Assisted Language Learning (CALL)
- Interdisciplinary project on French Second Language
- Text understanding
- From speech to sentence
SeRT - a tool for knowledge extraction from text
Caroline Barrière
School of Information Technology and Engineering
University of Ottawa
Ottawa, Ontario, Canada
firstname.lastname@example.org
A few questions...
- Why knowledge extraction from text? For building a Knowledge Base...
- What's a Knowledge Base? It depends on who defines it...
  - From a terminological standpoint: a static repository of domain-specific knowledge, giving the important concepts and their relations.
- What kind of relations? Hyperonymy (is-a), meronymy (part-of), synonymy, function, definition, causality
- Why start from text? What are the alternatives?
Semantic Relations in Text (SeRT)
- Goal: Starting from a corpus of texts on a specific domain, capture and store the important concepts (terms) of that domain, as well as their relations.
- Hypotheses
  - definitions can be derived from text analysis
  - text is used as both language and meta-language
  - paradigmatic relations can be found in texts by pattern search
  - present knowledge representation formalisms allow the representation of this information
Example of a pattern search for hyperonymy (Corpus on Composting)
SeRT - Features
- parallel search of terms and relations
  - term extraction
  - search for surface patterns leading to semantic relations
- focus on user interaction (nothing fully automatic)
  - term selection and validation
  - user definition of surface patterns corresponding to semantic relations
  - user selection of the concepts (tuple) involved in the semantic relation
- raw text used (no preprocessing necessary)
- easy access to KB: save and retrieval
- to be used in "bootstrapping" mode
Term extraction
- Usage of a stop list: a, able, about, above, according, accordingly, across, actually …
- appropriate method for English (but maybe not for French)
  - satellite link - liaison par satellite
  - laser printer - imprimante au laser
  - communication network - réseau de communication
- no syntactic analysis
- different from:
  - Daille 1994: linguistic patterns (French)
  - Bourigault 1994: morpho-syntactic markers (French)
- lemmatization: 'moving quickly' -> 'mov[ing] quick[ly]' -> 'mov* quick*'
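The stop-list approach on this slide can be sketched in a few lines of Python. This is only an illustration, not SeRT's actual code: the slide shows just the start of the stop list, so the extra stop words, the `max_len` cut-off, the suffix list, and the function names `candidate_terms` and `to_wildcard` are all assumptions. Maximal runs of non-stop words are kept as candidate terms, and a crude suffix truncation imitates the 'mov* quick*' step.

```python
import re
from collections import Counter

# Illustrative stop list: the slide lists only "a, able, about, above, ...";
# the remaining entries are assumptions added so the example works.
STOP_WORDS = {"a", "able", "about", "above", "according", "accordingly",
              "across", "actually", "the", "is", "of", "to", "and", "by", "in"}

# Hypothetical suffixes for the crude truncation step.
SUFFIXES = ("ing", "ly", "s")

def candidate_terms(text, max_len=3):
    """Count maximal runs of non-stop words; each run is a candidate term."""
    counts = Counter()
    # Punctuation ends a candidate, just as a stop word does.
    for segment in re.split(r"[^\w\s'-]+", text.lower()):
        run = []
        for token in segment.split() + [None]:  # None flushes the last run
            if token is not None and token not in STOP_WORDS:
                run.append(token)
                continue
            if 0 < len(run) <= max_len:
                counts[" ".join(run)] += 1
            run = []
    return counts

def to_wildcard(term):
    """Strip one known suffix per word and append '*', e.g. for later searching."""
    words = []
    for word in term.split():
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[:-len(suffix)]
                break
        words.append(word + "*")
    return " ".join(words)
```

Because the French equivalents on the slide ("liaison par satellite", "réseau de communication") contain function words inside the term, a stop list would split them apart, which is one way to read the slide's caveat about French.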
Search for patterns indicating semantic relations
- pre-encoded patterns (earlier work - Barrière 1997)
  - find list from all other authors
- pattern search has multiple possibilities:
  - string matching
  - lemmatized token matching
  - part of speech matching
    - inclusion of a dictionary look-up (derived from Collins + morphological rules added)
- possibility of searching for a pattern around 1 term
  - usually what Computational Terminologists want to do
- display limited or enlarged context
Example of search patterns
Hyperonymy
- such as (string matching)
- and other *|n (string + POS)
- includ* *|n (lemmatized string + POS)
- *|n is a *|a of [~part] (negative filter)
- *|y organic materi* [mostly, especially, specifically] (positive filter) + (search with specific term)
Synonymy
- known as (string matching)
- also called (string matching)
Meronymy
- contains *|n (string + POS)
- is a *|a part of (string + POS)
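A minimal sketch of the string-level part of this pattern search, using Python regular expressions. It covers only plain strings and the trailing `*` wildcard. The POS constraints (`*|n`, `*|a`) and the positive/negative filters are left out because they would need a tagger and dictionary that the slides do not describe. The names `PATTERNS`, `compile_pattern` and `find_relations`, and the default context width, are assumptions for illustration.

```python
import re

# String-level patterns from the slide, with POS parts dropped (assumption:
# "includ* *|n" is reduced to "includ*", "contains *|n" to "contains").
PATTERNS = {
    "hyperonymy": ["such as", "and other", "includ*"],
    "synonymy": ["known as", "also called"],
    "meronymy": ["contains"],
}

def compile_pattern(pattern):
    """Build a regex from a surface pattern; a trailing '*' matches any word ending."""
    parts = []
    for word in pattern.split():
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            parts.append(re.escape(word))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

def find_relations(sentences, context=40):
    """Return (relation, pattern, left context, right context) for every match."""
    hits = []
    for sentence in sentences:
        for relation, patterns in PATTERNS.items():
            for pattern in patterns:
                for match in compile_pattern(pattern).finditer(sentence):
                    left = sentence[max(0, match.start() - context):match.start()].strip()
                    right = sentence[match.end():match.end() + context].strip()
                    hits.append((relation, pattern, left, right))
    return hits
```

The `context` parameter loosely mirrors the "limited or enlarged context" display option: the user would read the left and right context and pick the terms of the tuple by hand.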
Information storage in the TKB
- transfer of info found at previous step
  - user selects the terms (concepts) around the pattern
  - semantic relation / pattern / tuple are stored in the TKB
- an uncertainty factor can also be added to the tuple
  - research on the causal relation has led us to realize the necessity of this information
  - applies to different relations
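As a rough illustration of what storing a relation / pattern / tuple record with an optional uncertainty factor might look like, here is a sketch using Python's built-in `sqlite3` module. The table name, column names, and helper functions are hypothetical. The slides say only that a relational database is used, not what its schema is.

```python
import sqlite3

def open_tkb(path=":memory:"):
    """Open (or create) a toy TKB; the schema is an assumption, not SeRT's."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS relation_tuple (
               relation  TEXT NOT NULL,   -- e.g. hyperonymy, meronymy
               pattern   TEXT NOT NULL,   -- surface pattern that was matched
               term1     TEXT NOT NULL,
               term2     TEXT NOT NULL,
               certainty REAL             -- optional uncertainty factor
           )"""
    )
    return conn

def store_tuple(conn, relation, pattern, term1, term2, certainty=None):
    """Store one user-validated tuple together with the pattern that produced it."""
    conn.execute(
        "INSERT INTO relation_tuple VALUES (?, ?, ?, ?, ?)",
        (relation, pattern, term1, term2, certainty),
    )
    conn.commit()

def tuples_for(conn, term):
    """Retrieve every stored tuple in which the term takes part."""
    cursor = conn.execute(
        "SELECT relation, pattern, term1, term2, certainty FROM relation_tuple "
        "WHERE term1 = ? OR term2 = ? ORDER BY rowid",
        (term, term),
    )
    return cursor.fetchall()
```

Keeping the pattern alongside the tuple means a stored relation can always be traced back to the surface form that suggested it.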
Semantic relation extraction
Results - semantic relations
- Exploration of a few patterns
  - contain? (meronymy)
  - such as, and other (hyperonymy)
Could we infer is-a relations and extend the type hierarchy?
SeRT use
- Parallel mode
  - searching on patterns can suggest terms to be explored
  - searching on terms can suggest patterns around them
- Bootstrapping mode for relations
  - start with one pattern: enhance
  - the tuple compost/soil, once found, is used to find other patterns
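The bootstrapping step can be sketched as follows: given a tuple that is already known (here compost/soil), collect the words found between the two terms in other sentences as candidate patterns. This is a simplified illustration. The function name, the `max_words` limit, and the restriction to the first term preceding the second are assumptions, and in SeRT the user would validate the candidates rather than accept them automatically.

```python
import re
from collections import Counter

def candidate_patterns(sentences, term1, term2, max_words=4):
    """Count the word sequences that link term1 to term2 within a sentence."""
    counts = Counter()
    # Only the order "term1 ... term2" is considered in this sketch.
    linker = re.compile(
        r"\b" + re.escape(term1) + r"\b\s+(.+?)\s+\b" + re.escape(term2) + r"\b",
        re.IGNORECASE,
    )
    for sentence in sentences:
        for match in linker.finditer(sentence):
            between = match.group(1).lower()
            # Long spans are unlikely to be reusable surface patterns.
            if len(between.split()) <= max_words:
                counts[between] += 1
    return counts
```

For example, calling `candidate_patterns(corpus_sentences, "compost", "soil")` on a list of sentences would return the linking expressions by frequency, and the more frequent ones could then be tried as new search patterns to find further tuples.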
Future work
Short term (tool itself)
- Add list of predefined relations & patterns
- Add flexibility in pattern search
  - toward a mix of semantic and syntactic search
- Construction of a graphical representation of the semantic network built
Future work
Long term (tool + theoretical background)
- Work on compound nouns
  - much implicit information that could be made explicit in the KB
- Work on representational scheme
  - the relational database is too limiting
  - causal relation requires a different type of representation
    - contexts for expressing the relation (possibly nested)
    - uncertainty factors
    - inferencing
- Explore pattern search in French
- Batch mode extraction (no user)
  - automatic selection of terms around patterns
  - after certain terms and patterns have been identified
  - need an integration of confidence levels on patterns