Spelling Checkers Daniel Jurafsky and James H. Martin, Prentice Hall, 2000.

Dealing with Spelling Errors
- spell check on modern word processors
- optical character recognition
- on-line handwriting recognition
- isolated-word error detection and correction: correcting spelling errors that result in non-words (e.g. graffe for giraffe)
- context-dependent error detection and correction: using context to detect and correct spelling errors even if they accidentally result in another English word. Such errors may be typographical (e.g. three for there) or cognitive (e.g. piece for peace).

Minimum edit method of spelling error correction
Damerau (1964) found that 80% of spelling errors in a sample of human keypunched texts were single-error misspellings, each consisting of exactly one of the following:
- insertion: mistyping the as ther
- deletion: mistyping the as th
- substitution: mistyping the as thw
- transposition: mistyping the as hte
This suggests the minimum edit method of spelling error correction. The minimum edit distance is the least number of insertions, deletions and substitutions required to transform one word into another.
Exercise: Given a dictionary consisting of scarf, scare, scene and scent, what is the most likely correct spelling of sene?
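
The minimum edit distance defined above can be computed with dynamic programming. The following is a minimal sketch, assuming unit cost for each insertion, deletion and substitution and no separate transposition operation; the function names edit_distance and rank_candidates are illustrative and do not come from the slides.

```python
def edit_distance(source, target):
    """Least number of insertions, deletions and substitutions
    (each assumed to cost 1) needed to turn source into target."""
    # prev[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i]  # turning i source characters into "" takes i deletions
        for j, t_ch in enumerate(target, start=1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(
                prev[j] + 1,         # delete s_ch
                curr[j - 1] + 1,     # insert t_ch
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def rank_candidates(word, dictionary):
    """Return (candidate, distance) pairs, closest candidate first."""
    scored = [(entry, edit_distance(word, entry)) for entry in dictionary]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for candidate, distance in rank_candidates(
            "sene", ["scarf", "scare", "scene", "scent"]):
        print(candidate, distance)
```

Under these unit costs, scene is a single insertion away from sene, so it ranks first among the four dictionary words in the exercise.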

Dice’s similarity coefficient
Another method: Dice’s Similarity Coefficient (DSC) with bigrams:
DSC = 2 * matches / (bigrams_in_A + bigrams_in_B)
Example, with A = sene and B = scene:
A = se - en - ne
B = sc - ce - en - ne
Matches = 2, bigrams in A = 3, bigrams in B = 4
Dice = (2 * 2) / (3 + 4) = 4/7 ~ 0.57
DSC = 1 if A and B are identical, 0 if A and B have no bigrams in common.
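
The bigram calculation above can be sketched in a few lines. This is an illustrative version, assuming character bigrams are counted as a multiset (so a repeated bigram counts once per occurrence) and that words too short to contain a bigram score 0; the names char_bigrams and dice_coefficient are hypothetical.

```python
from collections import Counter


def char_bigrams(word):
    """Multiset of adjacent character pairs, e.g. sene -> se, en, ne."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def dice_coefficient(a, b):
    """DSC = 2 * matches / (bigrams_in_A + bigrams_in_B)."""
    bigrams_a = char_bigrams(a)
    bigrams_b = char_bigrams(b)
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    if total == 0:
        # Assumption: with no bigrams on either side, report 0.
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    matches = sum((bigrams_a & bigrams_b).values())
    return 2 * matches / total


if __name__ == "__main__":
    print(dice_coefficient("sene", "scene"))
```

For sene and scene the function returns 4/7, matching the hand calculation.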

Word Prediction and N-Grams
I’m going to make a telephone…
Word prediction is an essential subtask of speech recognition, augmentative communication for the disabled, context-sensitive spelling error detection, inputting Chinese characters, etc.
Some attested real-word spelling errors (Kukich, 1992):
- They are leaving in about fifteen minuets.
- The study was conducted be John Black.
- The design an construction of the system will take more than a year.
- Hopefully, all with continue smoothly in my absence.
- He is trying to fine out.
An N-gram language model uses the previous N-1 words to predict the next one. A bigram model is called a first-order Markov model.
A fragment of a bigram grammar from the Berkeley Restaurant Project, a speech-based restaurant consultant: see p 199, and note the formula at the bottom.
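
A bigram predictor of the kind described above can be sketched as follows. This is a minimal illustration, assuming relative-frequency estimates (the count of a word pair divided by the count of its first word as a context) with no smoothing; the three training sentences are made up for the example and are not taken from the Berkeley Restaurant Project, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Map each word to a Counter of the words seen directly after it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev_word, next_word in zip(words, words[1:]):
            following[prev_word][next_word] += 1
    return following


def bigram_prob(following, prev_word, next_word):
    """Relative-frequency estimate of P(next_word | prev_word)."""
    counts = following.get(prev_word, Counter())
    total = sum(counts.values())
    return counts[next_word] / total if total else 0.0


def predict_next(following, prev_word):
    """Most frequent follower of prev_word, or None if it was never seen."""
    counts = following.get(prev_word)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus = [
        "i am going to make a telephone call",
        "i want to make a telephone call",
        "she will make a telephone booking",
    ]
    model = train_bigrams(corpus)
    print(predict_next(model, "telephone"))
    print(bigram_prob(model, "telephone", "call"))
```

Because only the single preceding word is used as context, this corresponds to the N = 2 case; a larger N would key the table on the previous N-1 words instead.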