Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007). Learning for Semantic Parsing. Advisor: Hsin-Hsi Chen. Reporter: Chi-Hsin Yu. Date: 2007.08.02. Raymond J. Mooney, Department of Computer Sciences, University of Texas at Austin.
Outline: Introduction; Sample Applications and their MRLs; Systems for Learning Semantic Parsers; Experimental Evaluation; Future Research.
Introduction. Semantic parsing: the task of mapping a natural language sentence into a complete, formal meaning representation (MR) or logical form. Meaning representation language (MRL): a formal, unambiguous language that allows for automated inference and processing, such as first-order predicate logic.
Introduction (cont.). The MRL in this paper is “executable”: it can be used directly by another program to perform some task, such as answering questions from a database or controlling the actions of a real or simulated robot. The goal of these systems is to induce an efficient and accurate semantic parser that can map novel sentences into this MRL. Training corpus: sentences annotated as (NL, MR) pairs. Extra training input may also be supplied, such as syntactic parse trees or semantically annotated parse trees.
Sample Applications and their MRLs. Database query language: a sample database on U.S. geography, queried with a logical query language based on Prolog. Coaching language for robotic soccer: developed for the RoboCup Coach Competition; a formal language called CLANG, in which tactics and behaviors are expressed in terms of if-then rules.
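To make “executable” concrete, here is a minimal Python sketch that pairs one question with a toy meaning representation and runs it against a tiny fact table. The nested-tuple MR format, the predicate names (`answer`, `capital`, `state`), and the `CAPITALS` table are illustrative assumptions; they are not the actual Prolog-based geography query language.

```python
# Toy illustration only: this MR format and fact table are assumptions,
# not the real geography query language described in the talk.
CAPITALS = {"texas": "austin", "ohio": "columbus"}


def evaluate(mr):
    """Recursively execute a nested-tuple MR against the toy fact table."""
    op, *args = mr
    if op == "state":        # constant: a state name
        return args[0]
    if op == "capital":      # look up the capital of the evaluated argument
        return CAPITALS[evaluate(args[0])]
    if op == "answer":       # top-level wrapper: return the inner result
        return evaluate(args[0])
    raise ValueError(f"unknown predicate: {op}")


# One (NL, MR) training pair in the toy format.
training_pair = (
    "What is the capital of Texas?",
    ("answer", ("capital", ("state", "texas"))),
)

print(evaluate(training_pair[1]))
```

A semantic parser would be trained on many such pairs so that it can produce the second element from the first for unseen questions.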
Systems for Learning Semantic Parsers. Three approaches to learning statistical semantic parsers: SCISSOR (CoNLL-2005, COLING-ACL-06) adds detailed semantics to a statistical syntactic parser; WASP (HLT/NAACL-06) adapts statistical machine translation methods to map from NL to MRL; KRISP (COLING-ACL-06) uses an SVM with a subsequence kernel specialized for text learning.
Systems for Learning Semantic Parsers – SCISSOR. SCISSOR (Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations) learns a statistical parser that generates a semantically augmented parse tree (SAPT). Training data: (NL, SAPT, MR) triples. Process: (1) an enhanced version of Collins' parser (head-driven model 2) is trained to produce SAPTs; (2) a recursive procedure compositionally constructs the MR for each node in the SAPT, given the MRs of its children.
Systems for Learning Semantic Parsers – SCISSOR (cont.). [Figure labels: ball owner (type concept); predicate concept]
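The recursive composition step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not SCISSOR's actual procedure: the `Node` class, the rule that a child carrying the parent's predicate is treated as the head, the example tree for "our player 2 has the ball", and the output syntax are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A simplified SAPT node (hypothetical structure for illustration)."""
    pred: Optional[str] = None    # semantic label: a predicate name, or None for "null"
    value: Optional[str] = None   # constant contributed by a leaf, e.g. "our" or "2"
    children: List["Node"] = field(default_factory=list)


def compose(node):
    """Build the MR of a node from the MRs of its children.

    Returns a constant string, a (predicate, args) tuple, or None.
    """
    if not node.children:                       # leaf
        if node.value is not None:
            return node.value                   # a constant
        if node.pred is not None:
            return (node.pred, [])              # a predicate with no arguments yet
        return None                             # semantically empty word
    args = []
    for child in node.children:
        mr = compose(child)
        if mr is None:
            continue
        if isinstance(mr, tuple) and mr[0] == node.pred:
            args.extend(mr[1])                  # head child: inherit its arguments
        else:
            args.append(mr)                     # other children become arguments
    if node.pred is None:
        return args[0] if len(args) == 1 else None
    return (node.pred, args)


def render(mr):
    """Print an MR in a simplified predicate(arg, ...) syntax."""
    if isinstance(mr, str):
        return mr
    pred, args = mr
    return f"{pred}({', '.join(render(a) for a in args)})"


# Hypothetical SAPT for "our player 2 has the ball".
np_player = Node(pred="player", children=[
    Node(value="our"),                  # "our"
    Node(pred="player"),                # "player"
    Node(value="2"),                    # "2"
])
vp = Node(pred="bowner", children=[
    Node(pred="bowner"),                # "has"
    Node(children=[Node(), Node()]),    # "the ball" (null semantics)
])
sapt = Node(pred="bowner", children=[np_player, vp])

print(render(compose(sapt)))  # bowner(player(our, 2))
```

The point of the sketch is only that each node's meaning is a function of its children's meanings, so a correct SAPT determines the MR.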
Systems for Learning Semantic Parsers – WASP. WASP (Word Alignment-based Semantic Parsing) uses statistical machine translation (SMT) techniques, developed for parallel corpora, to translate from NL to MRL. Process: (1) an SMT word alignment system, GIZA++, is used to produce an N-to-1 alignment between the words in the NL sentence and a sequence of MRL productions; (2) a synchronous CFG (SCFG) produces complete MRs by combining these NL substrings and their translations.
Systems for Learning Semantic Parsers – WASP (cont.)
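The idea of a synchronous grammar, in which each rule rewrites an NL pattern and an MRL template together, can be sketched in Python as follows. The rules, non-terminal names, and MRL syntax are hypothetical, and the sketch is deterministic: it leaves out the word alignment step and the statistical model that WASP uses to choose among derivations.

```python
import re

# Hypothetical synchronous rules: (non-terminal, NL pattern, MRL template).
# Upper-case tokens are non-terminals shared by both sides of a rule.
RULES = [
    ("QUERY", "what is CITY", "answer(CITY)"),
    ("CITY", "the capital of STATE", "capital(STATE)"),
    ("STATE", "texas", "stateid(texas)"),
    ("STATE", "ohio", "stateid(ohio)"),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}


def translate(symbol, text):
    """Derive `text` from `symbol` and return the paired MRL string, or None.

    Limitation of this sketch: with several non-terminals in one rule,
    only the regex engine's first way of splitting the text is tried.
    """
    for lhs, nl, mrl in RULES:
        if lhs != symbol:
            continue
        tokens = nl.split()
        slots = [tok for tok in tokens if tok in NONTERMINALS]
        pattern = " ".join(
            "(.+)" if tok in NONTERMINALS else re.escape(tok) for tok in tokens
        )
        match = re.fullmatch(pattern, text)
        if match is None:
            continue
        output = mrl
        for slot, substring in zip(slots, match.groups()):
            sub_mrl = translate(slot, substring)
            if sub_mrl is None:
                output = None
                break
            # Fill the same non-terminal on the MRL side of the rule.
            output = output.replace(slot, sub_mrl, 1)
        if output is not None:
            return output
    return None


print(translate("QUERY", "what is the capital of texas"))
# answer(capital(stateid(texas)))
```

Because both sides of each rule share their non-terminals, parsing the NL side yields the MRL side with no separate generation step.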
Systems for Learning Semantic Parsers – KRISP. KRISP (Kernel-based Robust Interpretation for Semantic Parsing) uses SVMs with string kernels to build semantic parsers. Process: (1) learns classifiers mapping a word or phrase to a particular concept in the MRL; (2) more precisely, learns one classifier per MRL production, mapping NL substrings to that production; (3) each classifier estimates the probability of its production covering different substrings of the sentence.
Systems for Learning Semantic Parsers – KRISP (cont.)
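A string kernel measures how similar two word sequences are by the subsequences they share. The Python sketch below counts matching word subsequences with dynamic programming. It is a simplified variant chosen for illustration: there is no gap penalty or normalization, and the function name `subsequence_kernel` is an assumption; the kernel actually used in the system may differ in these respects.

```python
def subsequence_kernel(s, t):
    """Count pairs of matching non-empty word subsequences of s and t.

    C[i][j] holds the count for the prefixes s[:i] and t[:j].
    Gaps are allowed and, in this simplified variant, not penalized.
    """
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pairs that do not use both s[i-1] and t[j-1] (inclusion-exclusion).
            C[i][j] = C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Pairs ending in this match: extend any earlier pair, or start fresh.
                C[i][j] += C[i - 1][j - 1] + 1
    return C[n][m]


a = "which rivers run through texas".split()
b = "the rivers that run through ohio".split()
print(subsequence_kernel(a, b))  # 7
```

The two example sentences share the words "rivers", "run", and "through" in the same order, so every non-empty subset of those three words is a common subsequence, giving 7. A kernel value like this can be supplied to an SVM in place of an explicit feature vector.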
Experimental Evaluation (1). Two corpora of NL sentences paired with MRs. CLANG: 300 pieces of coaching advice; average NL sentence length 22.52 words. GEOQUERY: 250 questions manually translated into logical form; average NL sentence length 6.87 words.
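Experimental Evaluation (2). Evaluation: 10-fold cross-validation. Recall: % of test sentences for which a correct MR was produced. Precision: % of complete MRs produced that were correct. Correctness criteria: for CLANG, an exact match with the reference MR, allowing reordering of arguments; for GEOQUERY, retrieving the same answer from the database as the reference query.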
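The two metrics can be computed as in the short Python sketch below. It assumes that recall divides the number of correct MRs by all test sentences, while precision divides it by the sentences for which the parser produced a complete MR. The outcome labels and the example counts are made up for illustration and are not results from the paper.

```python
def precision_recall(outcomes):
    """Compute (precision, recall) from per-sentence outcomes.

    Each outcome is "correct", "wrong", or None (no complete MR produced).
    """
    total = len(outcomes)
    completed = sum(1 for o in outcomes if o is not None)
    correct = sum(1 for o in outcomes if o == "correct")
    precision = correct / completed if completed else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# Made-up example: 10 test sentences, 8 complete MRs, 6 of them correct.
example = ["correct"] * 6 + ["wrong"] * 2 + [None] * 2
precision, recall = precision_recall(example)
print(precision, recall)  # 0.75 0.6
```

A parser that declines to answer when unsure can therefore trade recall for precision.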
Future Research (1). SCISSOR: more accurate, but requires additional human annotation in the form of SAPTs; future work: SAPTs constructed automatically. Domain and corpus: move from limited domains to open domain. This requires constructing a large annotated corpus of (NL, MR) pairs; the OntoNotes corpus is currently being assembled.
Future Research (2). Another way to obtain the requisite supervision: allow ordinary users themselves to provide the necessary feedback. Sentence-meaning pairs could also be constructed automatically, by inferring the meaning of a sentence from the context in which it was uttered.
Future Research (3). Symbol Grounding Problem (SGP): Harnad, S. (1990). Extended from the Chinese Room Argument (Searle, 1980), a challenge to the Turing Test. The Dictionary-Go-Round (1): Suppose you had to learn Chinese as a second language and the only source of information you had was a Chinese/Chinese dictionary. The Dictionary-Go-Round (2), the SGP: Suppose you had to learn Chinese as a first language and the only source of information you had was a Chinese/Chinese dictionary! Clearly, a deep understanding of most natural language requires capturing the connection between the abstract concepts underlying words and phrases and their embodiment in the physical world.