Final Project: English Preposition Usage Checker
J.-S. Roger Jang (張智星), MIR Lab, CSIE Dept., National Taiwan University

1 Final Project: English Preposition Usage Checker
J.-S. Roger Jang (張智星)
jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan University

2 English Preposition Usage Checker
Goal: to suggest the correct usage of English prepositions based on the Google Web 1T dataset.
20 prepositions under consideration: of, to, in, for, with, on, at, by, from, up, about, than, after, before, down, between, under, since, without, near.
General procedure:
- Generate a candidate set of extended queries based on the given query.
- Compute the frequency of each element in the candidate set.
- List the top 10 candidates (with nonzero frequencies) in descending order of frequency.
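The three-step procedure above can be sketched as a small driver function. This is a minimal sketch, not the course's reference solution: `suggest`, `generate`, and `frequency` are hypothetical names, candidate generation and the corpus lookup are passed in as functions, and ties are broken alphabetically following the rule stated with the example output.

```python
from typing import Callable, Iterable, List, Tuple


def suggest(query: str,
            generate: Callable[[str], Iterable[str]],
            frequency: Callable[[str], int],
            k: int = 10) -> List[Tuple[str, int]]:
    """Return at most k (candidate, frequency) pairs for a query.

    `generate` and `frequency` are hypothetical hooks standing in for the
    candidate-set rules and the Web 1T n-gram lookup.
    """
    # Step 1: candidate set of extended queries, always including the query itself.
    candidates = set(generate(query))
    candidates.add(query)

    # Step 2: frequency of each candidate; drop those never seen in the corpus.
    scored = [(c, frequency(c)) for c in candidates]
    scored = [(c, f) for c, f in scored if f > 0]

    # Step 3: descending frequency, ties broken alphabetically; keep the top k.
    scored.sort(key=lambda cf: (-cf[1], cf[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy counts copied from the example output; "angry on me" is a made-up zero entry.
    counts = {"angry with me": 60929, "angry at me": 24354}
    result = suggest("angry at me",
                     generate=lambda q: ["angry with me", "angry on me"],
                     frequency=lambda c: counts.get(c, 0))
    print(result)  # [('angry with me', 60929), ('angry at me', 24354)]
```

Passing the generator and the lookup as parameters keeps the ranking logic independent of how the corpus is stored, so either part can be optimized separately for the efficiency score.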

3 Corpus
Google Web 1T dataset
- Freely available from LDC
- Introduction: Natural Language Corpus Data by Peter Norvig
- Applications: Linggle by Jason Chang
- Our version: a slim version of about 300 MB, already placed under /tmp2/dsa2016_project/ on linux1 to linux13, containing:
  - 2-grams (bigrams)
  - 3-grams (trigrams)
  - 4-grams (fourgrams)
  - 5-grams (fivegrams)
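One simple way to hold the 2- to 5-grams in memory is a dictionary keyed by the space-joined n-gram. The sketch below assumes each line of the slim corpus keeps the layout of the original Web 1T files, `w1 w2 ... wn<TAB>count`; the actual files under /tmp2/dsa2016_project/ may be organized differently, and `load_ngrams` and `ngram_frequency` are hypothetical helper names.

```python
from typing import Dict, Iterable


def load_ngrams(lines: Iterable[str]) -> Dict[str, int]:
    """Build an n-gram -> count table from lines shaped like 'w1 ... wn<TAB>count'.

    Assumption: the slim corpus keeps the original Web 1T layout (tab before count).
    """
    table: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        gram, _, count = line.rpartition("\t")
        # Summing means a repeated n-gram accumulates instead of overwriting.
        table[gram] = table.get(gram, 0) + int(count)
    return table


def ngram_frequency(table: Dict[str, int], candidate: str) -> int:
    """Look up a candidate query; only 2- to 5-word queries exist in the corpus."""
    words = candidate.split()
    if not 2 <= len(words) <= 5:
        return 0  # Assumption: queries outside the 2..5 range count as unseen.
    return table.get(" ".join(words), 0)


if __name__ == "__main__":
    sample = ["angry with me\t60929\n", "angry at me\t24354\n"]
    table = load_ngrams(sample)
    print(ngram_frequency(table, "angry with me"))  # 60929
    print(ngram_frequency(table, "angry on me"))    # 0
```

For the real files one would pass an open file object, e.g. `load_ngrams(open(path, encoding="utf-8"))` with `path` standing for one of the corpus files. A plain Python dict over hundreds of megabytes of n-grams uses considerably more memory than the files themselves, so a more compact structure may pay off for the efficiency part of the grade.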

4 Candidate Set
Two rules for generating the candidate set:
- If there are no prepositions in the query:
  - Define $\mathrm{EDIT}_a$ based on insertion only.
  - Candidate set = $\mathrm{EDIT}_a(\mathrm{EDIT}_a(\text{query})) \cup \mathrm{EDIT}_a(\text{query}) \cup \{\text{query}\}$
- Otherwise:
  - Find the preposition sequences in the query.
  - Define $\mathrm{EDIT}_b$ based on insertion, deletion, and substitution.
  - Find the $\mathrm{EDIT}_b$ set of each preposition sequence.
  - Candidate set = Cartesian product of all $\mathrm{EDIT}_b$ sets, together with the original non-preposition words.
- Remember to add the original input query to the candidate set of extended queries.
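The two rules can be sketched in Python as follows. This is one reading of the slide, not the official specification: EDIT_a inserts one of the 20 prepositions at any word boundary, including the start and end of the query (the example output contains "to have difficulty finding"); EDIT_b returns every sequence exactly one insertion, deletion, or substitution away from a preposition sequence, without the unchanged sequence itself, which is why the original query is added back at the end. All function names are hypothetical.

```python
from itertools import product
from typing import List, Set, Tuple

# The 20 prepositions considered by the checker.
PREPOSITIONS = ("of", "to", "in", "for", "with", "on", "at", "by", "from", "up",
                "about", "than", "after", "before", "down", "between", "under",
                "since", "without", "near")
PREP_SET = set(PREPOSITIONS)

Words = Tuple[str, ...]


def edit_a(words: Words) -> Set[Words]:
    """EDIT_a (insertion only): insert one preposition at any position."""
    return {words[:i] + (p,) + words[i:]
            for i in range(len(words) + 1) for p in PREPOSITIONS}


def edit_b(seq: Words) -> Set[Words]:
    """EDIT_b: sequences one insertion, deletion, or substitution away from seq."""
    out = {seq[:i] + (p,) + seq[i:]                       # insertion
           for i in range(len(seq) + 1) for p in PREPOSITIONS}
    for i in range(len(seq)):
        out.add(seq[:i] + seq[i + 1:])                     # deletion
        out |= {seq[:i] + (p,) + seq[i + 1:]               # substitution
                for p in PREPOSITIONS if p != seq[i]}
    return out


def split_runs(words: Words) -> List[Tuple[bool, Words]]:
    """Split a query into maximal runs, flagging the preposition sequences."""
    runs: List[Tuple[bool, Words]] = []
    for w in words:
        is_prep = w in PREP_SET
        if runs and runs[-1][0] == is_prep:
            runs[-1] = (is_prep, runs[-1][1] + (w,))
        else:
            runs.append((is_prep, (w,)))
    return runs


def candidate_set(query: str) -> Set[str]:
    words = tuple(query.split())
    runs = split_runs(words)
    if not any(is_prep for is_prep, _ in runs):
        # Rule 1: EDIT_a(EDIT_a(query)) | EDIT_a(query) | {query}.
        once = edit_a(words)
        cands = set(once)
        for c in once:
            cands |= edit_a(c)
    else:
        # Rule 2: Cartesian product of the EDIT_b sets; non-preposition runs stay fixed.
        choices = [edit_b(run) if is_prep else [run] for is_prep, run in runs]
        cands = {sum(combo, ()) for combo in product(*choices)}
    cands.add(words)  # always include the original query, as the slide reminds us
    return {" ".join(c) for c in cands}


if __name__ == "__main__":
    print("like to listen to music" in candidate_set("like listen music"))  # True
    print("log in check" in candidate_set("log in to check with"))          # True
```

For "like listen music" (no prepositions) the two insertions of rule 1 reach "like to listen to music", while "log in to check with" has two preposition sequences, "in to" and "with", whose EDIT_b sets are combined by the Cartesian product. Whether EDIT_b should also keep each sequence unchanged (so that one sequence can change while another stays put) is not specified on the slide; this sketch follows the literal wording.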

5 Examples
- like listen music
- log in to check with

6 Example Input and Output Files
Example input:
have difficulty finding
angry at me
pleased at me
worry for cancer
Example output:
query: have difficulty finding
output: 3
have difficulty finding	30918
have difficulty in finding	4636
to have difficulty finding	1174
query: angry at me
output: 2
angry with me	60929
angry at me	24354
query: pleased at me
output: 3
pleased me	40402
pleased with me	10015
pleased for me	1067
query: worry for cancer
output: 1
worry about cancer	2120
Each output lists at most 10 entries, ordered by frequency (descending) first, then alphabetically.
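The output layout above can be produced by a small formatter. This sketch assumes each entry is written as the candidate, a tab, and its frequency (the exact separator is lost in this transcript), that the input file holds one query per line, and that the (candidate, frequency) pairs arrive already ranked by frequency and then alphabetically; `format_result` and `format_all` are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def format_result(query: str, ranked: List[Tuple[str, int]]) -> str:
    """Render one query block: header, entry count, then 'candidate<TAB>freq' lines."""
    ranked = ranked[:10]  # at most 10 entries per query
    lines = [f"query: {query}", f"output: {len(ranked)}"]
    lines += [f"{cand}\t{freq}" for cand, freq in ranked]
    return "\n".join(lines)


def format_all(input_lines: Iterable[str],
               rank: Callable[[str], List[Tuple[str, int]]]) -> str:
    """Process an input file's lines, one query per line, skipping blank lines."""
    blocks = []
    for line in input_lines:
        query = line.strip()
        if query:
            # Assumption: a query with no nonzero candidates prints "output: 0".
            blocks.append(format_result(query, rank(query)))
    return "".join(block + "\n" for block in blocks)


if __name__ == "__main__":
    precomputed = {"worry for cancer": [("worry about cancer", 2120)]}
    print(format_all(["worry for cancer\n"], lambda q: precomputed.get(q, [])), end="")
```

Run on the last example query, this prints the same three lines as the example output above.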

7 Scoring
Percentage of scoring:
- 75%: correctness of your program
- 25%: efficiency of your program
Formula: $0.75\,C + 25\,(1 - \mathrm{rankRatio})\,C/100$
- $C$: score based on the correctness of the answers
- rankRatio: zero-based speed rank divided by (number of students - 1), i.e. $\mathrm{rankRatio} = r/(N-1)$, where $r$ is the zero-based rank of your program's speed and $N$ is the number of students
The formula will not be changed unless something unexpected compromises its fairness.
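The scoring formula can be checked with a few lines of arithmetic. In this sketch, `project_score` is a hypothetical name, rank 0 is taken to mean the fastest program, and the single-student case (where $N-1=0$) is handled by an assumption of ours, since the slide does not cover it.

```python
def project_score(c: float, speed_rank: int, num_students: int) -> float:
    """Final score = 0.75*C + 25*(1 - rankRatio)*C/100.

    speed_rank is zero-based (0 = fastest); rankRatio = speed_rank / (num_students - 1).
    """
    if num_students < 2:
        rank_ratio = 0.0  # Assumption: a lone submission counts as the fastest.
    else:
        rank_ratio = speed_rank / (num_students - 1)
    return 0.75 * c + 25 * (1 - rank_ratio) * c / 100


if __name__ == "__main__":
    # Five students, correctness score C = 80.
    print(project_score(80, 0, 5))  # 80.0 (fastest)
    print(project_score(80, 2, 5))  # 70.0 (middle)
    print(project_score(80, 4, 5))  # 60.0 (slowest)
```

Because $25\,(1-\mathrm{rankRatio})\,C/100 = 0.25\,(1-\mathrm{rankRatio})\,C$, the efficiency term scales with correctness: an incorrect program gains nothing from being fast, and the slowest program still keeps 75% of $C$.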

8 Future Work
Other tasks that rely on the Google Web 1T dataset:
- Text normalization
- Smoothing
- Grammar/usage check for a given sentence
- Verb and noun collocations
- Scoring of English articles
- Real-time computer-assisted composition
Demos: Linggle, Writeahead

