TCN Spell Checker Team AZP: Mark Biddlecom, Joshua Correa, Jatinder Singh, Zianeh Kemeh-Gama, Eric Engquist.

Team AZP
- Team descendant of previous project groups
- Primary roles by member:
  - Joshua Correa – Project Lead, TCN Liaison
  - Eric Engquist – Materials and Metrics Manager
  - Mark Biddlecom – Resource and Process Manager
  - Zianeh Kemeh-Gama – Schedule Manager
  - Jatinder Singh – Research Lead
  - Dr. Ludi – Faculty Advisor
- Website:

TCN
- Software development and staffing company based here in Rochester, NY
- Developer of web-based search and knowledge management programs
  - KnowledgeTrac
    - Customizable multilingual web search tool
    - Standalone spider
  - TecTrac, AppTrac, AuditTrac, HelpTrac, TestTrac
    - Document and database search and management tools

Document Collaboration Tool
- Online repository for management documents
  - Meeting minutes
  - Metrics
  - Research links
  - Presentations and diagrams
  - Tasks and issues for each team member
  - Notifications of changes
- Custom developed for this project

Spell Checker
- Should compensate for mistyped search terms
  - Match misspelled words with correct spelling: “atourney” → attorney
  - Match misspelled words with correct results: “atourney” → legal services, lawyers
- Meant to make searches more useful for average web search users
  1) Takes in search terms from user
  2) Checks spelling/matches with known search terms
  3) Returns suggestions to search engine

Spell Checker Requirements
Functional Requirements:
- Look up search terms in a dictionary
- Suggest replacements for misspelled terms (closest match)
- Add new terms to dictionary
- Process phrases (as opposed to single words)
- Support multiple dictionaries

Spell Checker Requirements
Non-functional Requirements:
- Object-oriented design, to be implemented as a web service with VB.NET
- Adaptability
  - Must support ability to work with different data stores
  - Must support the addition of new components
- Performance
  - Analysis of a search string cannot take longer than one second

Spell Check Process
- Load configuration
- Load dictionaries (from cache or rebuild)
- Apply rules
  - Parse search string
  - Apply algorithm to each term
  - Short-circuit if enough results have been found
- Return result set of suggestions
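
The rule-application step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's VB.NET design: the names `Rule`, `whitespace_parser`, `exact_match`, `close_match`, `check_spelling`, and the `enough` threshold are all hypothetical, and the standard-library `difflib.get_close_matches` stands in for the project's own algorithms.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Rule:
    """Hypothetical rule: a parser, an algorithm, and the dictionary it searches."""
    parser: Callable[[str], List[str]]
    algorithm: Callable[[str, Set[str]], List[str]]
    dictionary: Set[str]


def whitespace_parser(search_string: str) -> List[str]:
    # Split the search string into lower-case terms.
    return search_string.lower().split()


def exact_match(term: str, words: Set[str]) -> List[str]:
    # Cheapest algorithm: the term is already a known word.
    return [term] if term in words else []


def close_match(term: str, words: Set[str]) -> List[str]:
    # Stand-in for a string-similarity algorithm (stdlib difflib).
    return difflib.get_close_matches(term, sorted(words), n=3, cutoff=0.6)


def check_spelling(search_string: str, rules: List[Rule], enough: int = 3) -> List[str]:
    results: List[str] = []
    for rule in rules:
        for term in rule.parser(search_string):
            for suggestion in rule.algorithm(term, rule.dictionary):
                if suggestion not in results:
                    results.append(suggestion)
        if len(results) >= enough:
            break  # short-circuit: skip the remaining, costlier rules
    return results


WORDS = {"attorney", "lawyer", "office"}
RULES = [
    Rule(whitespace_parser, exact_match, WORDS),
    Rule(whitespace_parser, close_match, WORDS),
]
print(check_spelling("atourney office", RULES, enough=2))
```

Ordering the rules from cheapest to most expensive is what makes the short-circuit useful: the similarity rule only runs when exact lookup has not produced enough suggestions.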

Configuration
- Application configuration file
  - Provides application-level settings (e.g., maximum memory usage, maximum processor time for search)
  - Points to search configuration file
- Search configuration file
  - Allows control over how memory is used vs. algorithm performance
  - Defines dictionaries and methodologies
  - Methodologies include rules

Loaders
- Load a set of words for use in dictionaries
- Used to create root dictionaries ( in the configuration file)
- Word sets returned by loaders are not cached, but instead used to create algorithm dictionaries

Formatters
- Provide a dictionary specialized for use with a specific algorithm
- Created by tags in the configuration file
- Dictionaries created by formatters are cached for use between application sessions

Parsers
- Split a search string up into a number of terms
- For a given rule, the algorithm is applied to each term supplied by the parser

Data Flow

Algorithms – String Similarity
- Calculates number of operations to go from one word to another
  - Insertion, deletion, substitution
- Few operations → good suggestion
- Extra features
  - Swapping operation
  - Operation weighting
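
A minimal Python sketch of such a distance, assuming the swap is limited to adjacent characters (the "optimal string alignment" variant) and that each operation has its own weight. The function name `edit_distance` and the weight parameters are illustrative, not taken from the project.

```python
def edit_distance(a, b, insert=1, delete=1, substitute=1, swap=1):
    """Weighted edit distance from a to b with adjacent-character swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * delete          # delete every character of a
    for j in range(1, cols):
        d[0][j] = j * insert          # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else substitute
            d[i][j] = min(
                d[i - 1][j] + delete,
                d[i][j - 1] + insert,
                d[i - 1][j - 1] + cost,
            )
            # Swapping operation: the last two characters are transposed.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + swap)
    return d[-1][-1]


print(edit_distance("atourney", "attorney"))  # 2
print(edit_distance("teh", "the"))            # 1 (one swap)
```

Raising a weight makes that kind of error count as less likely; for example, `swap=2` makes a transposition cost as much as two single-character edits.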

Algorithms – String Similarity
- Complexity of O(s1*s2)
  - s1, s2: lengths of the strings being compared
- Can be improved to O(s1*k)
  - k is the edit distance

        w  o  r  D
  W  1  0  1  2  3
  A  2  1  2  3  4
  r  3  2  2  2  2
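
One common way to reach the O(s1*k) bound is to fill only the cells within k of the main diagonal and give up once the distance must exceed k. The Python sketch below assumes unit costs and treats k as a caller-supplied maximum distance; the name `bounded_distance` is hypothetical. For readability it allocates full-width rows, so only the inner loop (at most 2k+1 cells per row) is banded.

```python
def bounded_distance(a, b, k):
    """Return the edit distance between a and b if it is <= k, else None."""
    if abs(len(a) - len(b)) > k:
        return None                      # length gap alone exceeds k
    inf = k + 1                          # any value above k means "too far"
    prev = [j if j <= k else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if i <= k:
            cur[0] = i
        row_min = cur[0]
        # Only cells with |i - j| <= k can hold a distance <= k.
        for j in range(max(1, i - k), min(len(b), i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost, inf)
            row_min = min(row_min, cur[j])
        if row_min >= inf:
            return None                  # every path already costs more than k
        prev = cur
    return prev[-1] if prev[-1] <= k else None


print(bounded_distance("atourney", "attorney", 2))  # 2
print(bounded_distance("atourney", "attorney", 1))  # None
```

With a small k this lets a spell checker reject most dictionary words quickly, since suggestions further than a couple of edits away are rarely useful.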

Algorithms – Phonetic
- Several rules used to parse English words into a sequence of phonetic sounds
  - Example: Phonetic → pntk
- Parse dictionary, parse search term
- String similarity comparison
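
The three steps can be illustrated with a toy encoder in Python. The letter rules here are assumptions chosen only so that "Phonetic" maps to "pntk" as in the example; they are not the rule set used in the project, and the names `phonetic_key`, `edit_distance`, and `phonetic_suggestions` are hypothetical.

```python
# Hypothetical letter rules, picked so that "Phonetic" encodes to "pntk".
DROPPED = set("aeiouyhw")                   # vowels and weak consonants
REPLACED = {"c": "k", "q": "k", "z": "s"}   # letters that often sound alike


def phonetic_key(word):
    key = []
    previous = ""
    for letter in word.lower():
        if not letter.isalpha() or letter == previous:
            continue                        # skip non-letters and repeated letters
        previous = letter
        if letter in DROPPED:
            continue
        key.append(REPLACED.get(letter, letter))
    return "".join(key)


def edit_distance(a, b):
    # Plain unit-cost Levenshtein distance, kept to two rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def phonetic_suggestions(term, dictionary, max_distance=1):
    # Encode the term and every dictionary word, then compare the encodings.
    term_key = phonetic_key(term)
    scored = [(edit_distance(term_key, phonetic_key(w)), w) for w in dictionary]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


print(phonetic_key("Phonetic"))  # pntk
print(phonetic_suggestions("atourney", ["lawyer", "attorney", "office"]))
```

In this sketch "atourney" and "attorney" both encode to "trn", so the misspelling matches at phonetic distance 0 even though the raw strings differ by two edits.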

Deliverable Schedule
Iteration 1: February 1st, 2005
- Complete system design for system iterations 1-3
- Instructions for installation and integration with TCN client software
- Research
  - Analysis of historic search strings and business names from TCN
  - Dictionaries (common words)
  - Word search algorithms
- Basic system implementation
- Database integration
- Testing

Deliverable Schedule
Iteration 2: February 18th, 2005
- Suggest replacements for words not in the dictionary
- Addition of a new search algorithm to provide more intelligent searches
  - Closest Match
- Using multiple dictionaries
- Unit testing for all written code

Deliverable Schedule
Iteration 3: March 21st, 2005
- Phonetic matching
- Dynamically add words/phrases to the dictionary
- Support phrase searching
- Addition of further search algorithms
- GUI configuration tool
- Algorithm optimization

Metrics
- Schedule/estimation accuracy
  - Estimation accuracy (hours per task)
  - Slippage percentages
- Defect statistics and analysis
  - Severity and complexity of defects
  - Defect source tracking
  - Average age of defects

Age of Known Defects

Severity of Defects

Complexity of Defects

Sources of Defects

Research References
- “Approximate String Matching” by Ricardo Baeza-Yates, University of Chile
- “A Guided Tour to Approximate String Matching” by Gonzalo Navarro, University of Chile, 2001
- “An Extension of Ukkonen’s Enhanced Dynamic Programming ASM Algorithm” by Hal Berghel (University of Arkansas) and David Roach (Acxiom Corp.), 1996