A Model for Learning Words by Crawling the Web
Jeff Thomson, Sygys.com
Rex Gantenbein, University of Wyoming
CAINE November 2009



2 Overview
Goal: create an autonomous language learning system
– Use Web crawler technology
– Extract meaning from paragraphs and sentences to create language understanding
Major issues
– Irregularity of natural language constructions
– Understanding paragraphs and sentences
– Determining meaning of new words

3 Handling irregularities
Most major parts of a language (English, anyway) can be generalized
– Exceptions require preprocessing to fit them into generalizable categories
– Example: inflectional endings on verbs
    bat      is
    bats     am
    batting  are
    batted   was

4 Handling irregularities
Idiomatic phrases require understanding of the entire phrase in a colloquial context
– “Go jump in the lake” vs. “Go cook yourself an egg”
Pronoun resolution
– “Three boys each bought a pizza. They ate them in the park.”

5 Extracting understanding
Paragraph understanding
– Matching paragraph structure to common forms
– Finding the nucleus of the paragraph’s meaning
Sentence understanding
– Matching sentence structure to common forms
– Determining the meaning of the words in the sentence

6 Our approach
Exception-first processing
– Preprocessing to handle irregularities
Linguistic classifications based on tree structure
    Clause
        Finite
            Imperative
            Indicative
                Interrogative
                Declarative
        Non-finite
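A tree-structured classification like the one on this slide could be held in a nested mapping. The sketch below is only an illustration: it assumes the diagram reads Clause → Finite / Non-finite, Finite → Imperative / Indicative, and Indicative → Interrogative / Declarative, and the names `CLAUSE_TREE` and `path_to` are hypothetical, not taken from the authors' system.

```python
# Hypothetical encoding of the clause classification tree as nested dicts.
# The shape of the tree is an assumption read off the slide's diagram labels.
CLAUSE_TREE = {
    "Clause": {
        "Finite": {
            "Imperative": {},
            "Indicative": {"Interrogative": {}, "Declarative": {}},
        },
        "Non-finite": {},
    }
}


def path_to(label, tree=CLAUSE_TREE, prefix=()):
    """Return the tuple of labels from the root down to `label`, or None."""
    for name, children in tree.items():
        current = prefix + (name,)
        if name == label:
            return current
        found = path_to(label, children, current)
        if found is not None:
            return found
    return None
```

Calling `path_to("Declarative")` walks the tree depth-first and returns the chain of categories a declarative clause belongs to, which is the kind of lookup a classifier built on this hierarchy would need.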

7 Our approach
Parser (incorporated into Web crawler) to determine structure
– Some structures are disregarded when keywords are already classified
Word classification
– Type, gender, number
– Unknown words are analyzed according to rules using placement in sentence and surrounding classified words
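The idea of classifying an unknown word from its placement and its already-classified neighbors can be sketched as a small rule lookup. Everything here is an assumption made for illustration: the lexicon, the tag names, the rule table, and the function names are hypothetical, and the slides do not say what rules the authors actually use.

```python
# Hypothetical lexicon of already-classified words (word -> type).
LEXICON = {
    "the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
    "chased": "VERB", "quickly": "ADV", "big": "ADJ",
}

# Illustrative placement rules: (previous tag, next tag) -> guessed tag.
# A next tag of None acts as a wildcard. Rules are tried in order.
RULES = [
    (("DET", "NOUN"), "ADJ"),   # the ___ dog
    (("DET", "VERB"), "NOUN"),  # the ___ chased
    (("DET", None), "NOUN"),    # the ___ <anything else>
    (("NOUN", "DET"), "VERB"),  # dog ___ the
]


def guess(prev_tag, next_tag):
    """Guess a tag for an unknown word from the tags on either side."""
    for (prev_rule, next_rule), tag in RULES:
        if prev_rule == prev_tag and (next_rule is None or next_rule == next_tag):
            return tag
    return "UNKNOWN"


def classify_unknown(words):
    """Tag each word, falling back to neighbor-based rules for unknown words."""
    tags = [LEXICON.get(w.lower()) for w in words]
    result = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag is None:
            prev_tag = tags[i - 1] if i > 0 else None
            next_tag = tags[i + 1] if i + 1 < len(tags) else None
            tag = guess(prev_tag, next_tag)
        result.append((word, tag))
    return result
```

For example, in "the wug chased a cat" the unknown word "wug" sits between a determiner and a verb, so the second rule tags it as a noun. A real system would need many more rules and some way to resolve runs of several unknown words.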

8 Our approach
Keyword recognition
– Use “word chains” (sequences of words) with application of linguistic knowledge
Word-level understanding
– Reduce words to root form to process them as keywords
– Reduce irregular forms using an exception database created at preprocessing
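One simple reading of "word chains" is contiguous word sequences of a fixed length. The sketch below takes that reading as an assumption and counts which chains recur across sentences; the function names and the frequency threshold are hypothetical, and the linguistic knowledge the slide mentions is not modeled.

```python
from collections import Counter


def word_chains(tokens, length):
    """Return every contiguous sequence of `length` tokens."""
    return [tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)]


def frequent_chains(sentences, length=2, min_count=2):
    """Count chains across sentences and keep those seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        counts.update(word_chains(sentence.lower().split(), length))
    return {chain: n for chain, n in counts.items() if n >= min_count}
```

Run over "go jump in the lake" and "he said go jump in the river" with a length of 3, this keeps only the chains the two sentences share, such as ("go", "jump", "in"), which hints at how a recurring phrase might be picked out as a candidate keyword or idiom.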

9 System model
Exception database
– Separates generalizable and exception verbs
– Processes word endings
– Scans exception database for exceptions
– Processes “normal” words according to rules

10 System model
Categorization generator
– Separates generalizable and exception words
– Processes word endings
– Scans exception database for exceptions and processes these first
– Processes “normal” words according to rules
Sentence parser with disregard capacity
Paragraph understanding rules
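The exception-first order described here (check the exception database, then apply general ending rules) might look like the following. This is a rough sketch under stated assumptions: the `EXCEPTIONS` table, the suffix list, and the consonant-doubling rule are all hypothetical and far cruder than a real morphological analyzer.

```python
# Hypothetical exception database: irregular form -> root form.
EXCEPTIONS = {"is": "be", "am": "be", "are": "be", "was": "be", "were": "be", "went": "go"}

# Illustrative regular endings, tried in this order.
SUFFIXES = ("ing", "ed", "s")


def root_form(word):
    """Reduce a word to a root: exceptions first, then regular ending rules."""
    w = word.lower()
    # Step 1: exceptions are handled before any general rule runs.
    if w in EXCEPTIONS:
        return EXCEPTIONS[w]
    # Step 2: "normal" words are processed by stripping a known ending.
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            stem = w[:-len(suffix)]
            # Undo consonant doubling (batt -> bat). This crude rule
            # mishandles words such as "falling"; it is illustration only.
            if (suffix != "s" and len(stem) >= 3
                    and stem[-1] == stem[-2] and stem[-1] not in "aeiou"):
                stem = stem[:-1]
            return stem
    return w
```

Checking exceptions first matters: without the lookup, the ending rule would strip the "s" from "was" and produce a meaningless stem, whereas "bats", "batting", and "batted" all reduce to "bat" through the regular path.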

11 System model
Web crawler searches for source material
– Processes the material and enhances its own rules and exceptions
– Eventually will learn enough to understand most material in a given language
Future work
– Implement a pilot version of this system
– Determine how to control for a “given” language
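The crawl-and-learn loop can be illustrated without any network access. The sketch below assumes a breadth-first crawl over a tiny in-memory stand-in for the Web, and "learning" is reduced to recording words not yet known; `PAGES`, `crawl`, and `max_pages` are hypothetical names, and the slides describe the pilot system as future work, so none of this reflects an actual implementation.

```python
from collections import deque

# Hypothetical in-memory "web": page id -> (page text, outgoing links).
PAGES = {
    "a": ("the dog bats the ball", ["b", "c"]),
    "b": ("the wug bats", ["a"]),
    "c": ("a blick", []),
}


def crawl(start, known_words, max_pages=10):
    """Visit pages breadth-first and collect words not seen before."""
    known = set(known_words)
    visited = []
    new_words = []
    queue = deque([start])
    queued = {start}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        text, links = PAGES[url]
        visited.append(url)
        # "Learning" here just means recording each unfamiliar word once.
        for word in text.split():
            if word not in known:
                known.add(word)
                new_words.append(word)
        # Follow links that exist and have not been queued already.
        for link in links:
            if link in PAGES and link not in queued:
                queued.add(link)
                queue.append(link)
    return visited, new_words
```

Starting from page "a" with a small known vocabulary, the loop visits each page once and returns the unfamiliar words in the order they were met; in the model on the slides, those words would then be passed to the classification and exception rules.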

12 Questions?

