Computational Language
Andrew Hippisley

Computational Language
- Computational language and AI
- Language engineering: applied computational language
- Case study: spell checkers

Computational language & AI
- Artificial Intelligence: “the simulation on computer of distinctly human mental functions” (Wilks 1993)

Computational language & AI: why language engineering?
- Language is integral to intelligent systems
- Artificial Intelligence
- Turing Test
- ELIZA
- Expert systems: natural language interface, natural language database

Computational language & AI: methods shared across systems
- Finite State Transition Networks (FSTN)
- Logic
- Formal rules
- Probability
- Data: you know it!

Applied computational language: history of the field
- Machine Translation: 1960, 1966, post 1966
- Database access
- Text interpretation
- Information retrieval
- Text categorisation

Language engineering
- Information overload
- Need a way of automatically processing text documents
- Information extraction

Language engineering: information extraction
- GIDA: system for automatically monitoring financial market sentiment

GIDA

Language engineering
- Information overload
- Need a way of automatically processing text documents
- Information extraction
- Summarisation

Automatic summarisation (courtesy of Paulo Fernandes de Oliveira, PhD)
- Get information source
- Extract some content from it
- Present the most important part to the user

Lexical Cohesion: links
Example text title: U.S. stocks hold some gains. Collected from Reuters’ website on 20 March 2002.
- Sentence 23: J&J's stock added 83 cents to $
- Sentence 26: Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said.
- Sentence 42: Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30.
- Sentence 15: "For the stock market this move was so deeply discounted that I don't think it will have a major impact".

Lexical Cohesion: bonds
Example text title: U.S. stocks hold some gains. Collected from Reuters’ website on 20 March 2002.
- Sentence 17: In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.
- Sentence 19: In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company.

Language engineering
- Information overload
- Need a way of automatically processing text documents
- Information extraction
- Summarisation
- Translation
- Retrieve only relevant documents
- Voice processing

Language engineering: two main approaches
- Symbolic
- Stochastic

Case study: spell checkers

Spelling dictionaries: aim
Given a sequence of symbols:
1. identify misspelled strings
2. generate a list of possible ‘candidate’ correct strings
3. select the most probable candidate from the list

Spelling dictionaries: implementation
- Probabilistic framework
- Bayesian rule
- Noisy channel model

Spelling dictionaries: types of spelling error
- Actual word errors
  /piece/ instead of /peace/
  /there/ instead of /their/
- Non-word errors
  /graffe/ instead of /giraffe/
- Of all errors in typewritten texts, 80% are non-word errors

Spelling dictionaries: non-word errors
- Cognitive errors
  /seperate/ instead of /separate/
  A phonetically equivalent sequence of symbols has been substituted due to lack of knowledge about spelling conventions

Spelling dictionaries: non-word errors
- Cognitive errors
- Typographic (‘typo’) errors
  Influenced by the keyboard, e.g. substitution of /w/ for /e/ due to its adjacency on the keyboard
  /thw/ instead of /the/
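The keyboard-adjacency idea can be illustrated with a small sketch. It assumes a standard QWERTY layout and, as a simplification, treats only left and right neighbours on the same row as adjacent; the function names are hypothetical and not part of the original slides.

```python
# Assumption: standard QWERTY letter rows; only same-row neighbours count as adjacent.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def same_row_neighbours(key):
    """Return the keys immediately left and right of `key` on its row."""
    for row in QWERTY_ROWS:
        i = row.find(key)
        if i != -1:
            return set(row[max(i - 1, 0):i] + row[i + 1:i + 2])
    return set()


def is_adjacent_substitution(typo, intended):
    """True if `typo` differs from `intended` by one letter typed on a neighbouring key."""
    if len(typo) != len(intended):
        return False
    diffs = [(t, c) for t, c in zip(typo, intended) if t != c]
    return len(diffs) == 1 and diffs[0][0] in same_row_neighbours(diffs[0][1])
```

For the slide's example, `is_adjacent_substitution("thw", "the")` holds because /w/ sits next to /e/ on the top row.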

Spelling dictionaries: non-word errors and the noisy channel model
- The actual word has been passed through a noisy communication channel
- This has distorted the word, thereby changing it in some way
- The misspelled word is the distorted version of the actual word
- Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted

Spelling dictionaries: non-word errors and the noisy channel model
What are the possible distortions?
- insertion
- deletion
- substitution
- transposition
All of these are viewed as transformations that take place in the noisy channel
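The four transformations above can be sketched in Python as a function that produces every string one edit away from a given word. This is a minimal illustration, assuming a lowercase English alphabet; the name `single_edits` is hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only


def single_edits(word):
    """Return every string exactly one transformation away from `word`."""
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    insertions = {left + ch + right for left, right in splits for ch in ALPHABET}
    deletions = {left + right[1:] for left, right in splits if right}
    substitutions = {
        left + ch + right[1:]
        for left, right in splits if right
        for ch in ALPHABET if ch != right[0]
    }
    transpositions = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits if len(right) > 1
    }
    # Swapping two identical letters reproduces the word itself, so discard it.
    return (insertions | deletions | substitutions | transpositions) - {word}
```

Because each transformation has an inverse (an insertion undoes a deletion, and so on), applying single edits to the typo reaches every word that a single distortion could have turned into it.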

Spelling dictionaries: implementing the spelling identification and correction algorithm
- STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled
- STAGE 2: generate a list of candidates
  Apply any single transformation to the typo string
  Filter the list by checking against a dictionary
- STAGE 3: assign probability values to each candidate in the list
- STAGE 4: select the best candidate
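Stages 1 and 2 could look roughly like the following sketch. The word list is a tiny hypothetical stand-in for a real dictionary, and the function names are illustrative assumptions rather than anything given in the slides.

```python
import re
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase English letters only

# Hypothetical toy list of legal strings; a real checker would load a full dictionary.
LEGAL_WORDS = {"a", "we", "saw", "the", "gaffe", "graft", "giraffe", "giraffes"}


def single_edits(word):
    """Every string one insertion, deletion, substitution or transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = {l + ch + r for l, r in splits for ch in ALPHABET}          # insertion
    edits |= {l + r[1:] for l, r in splits if r}                        # deletion
    edits |= {l + ch + r[1:] for l, r in splits if r for ch in ALPHABET}  # substitution
    edits |= {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}   # transposition
    return edits - {word}


def misspelled(text, legal=LEGAL_WORDS):
    """STAGE 1: return the strings in `text` that are not in the legal list."""
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in legal]


def candidates(typo, legal=LEGAL_WORDS):
    """STAGE 2: apply single transformations, then filter against the dictionary."""
    return sorted(single_edits(typo) & legal)
```

With this toy list, `misspelled("We saw thw graffe")` flags `thw` and `graffe`, and `candidates("graffe")` keeps only the dictionary words one edit away.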

Spelling dictionaries: STAGE 3
- Prior probability: given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
  P(c) = count(c) / N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in that corpus
- Likelihood: given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
  P(t | c), the probability of typo t given candidate c, calculated using a corpus of errors, or transformations
- Bayesian rule: get the product of the prior probability and the likelihood, P(c) × P(t | c)
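The scoring step can be sketched as follows. All counts and likelihood values here are invented purely for illustration; in practice the prior would come from a real corpus and the likelihood from a corpus of observed errors. The function names are hypothetical.

```python
# Hypothetical figures, invented for illustration only.
CORPUS_SIZE = 1_000_000                      # N: number of words in the corpus
WORD_COUNTS = {"giraffe": 20, "gaffe": 5}    # count(c) for each candidate
LIKELIHOOD = {                               # P(t | c) from an assumed error corpus
    ("graffe", "giraffe"): 0.001,
    ("graffe", "gaffe"): 0.0001,
}


def prior(candidate):
    """P(c) = count(c) / N."""
    return WORD_COUNTS.get(candidate, 0) / CORPUS_SIZE


def score(typo, candidate):
    """P(c) * P(t | c): the product of prior probability and likelihood."""
    return prior(candidate) * LIKELIHOOD.get((typo, candidate), 0.0)


def best_candidate(typo, candidate_list):
    """Select the candidate with the highest score."""
    return max(candidate_list, key=lambda c: score(typo, c))
```

Under these made-up numbers, `best_candidate("graffe", ["gaffe", "giraffe"])` picks `giraffe`, since both its prior and its likelihood are higher.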

Spelling dictionaries: non-word errors
Implementing the spelling identification and correction algorithm:
- STAGE 1: identify misspelled words
- STAGE 2: generate list of candidates
- STAGE 3a: rank candidates for probability
- STAGE 3b: select best candidate
Implement: noisy channel model, Bayesian rule

Next week: finite state machines and regular expressions