Natural Language Processing Course Project: Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University


1 Natural Language Processing Course Project: Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University zhaohai@cs.sjtu.edu.cn

2 Goals Develop an English grammatical error checker. –Only consider tense errors for verbs.

3 Examples (the number before each sentence is the position of the erroneous word) 2 I plays football yesterday. 2 I drink tea last week. 2 Mary visits the factory last month. 2 I finished reading the novel by nine o'clock last night. 2 We has learned over two thousand English words by the end of last term. 3 They had plant six hundred trees by the end of last Wednesday.

4 Data Format The input file has one sentence per line, like the following: –I likes this bicycle. Your program should accept such a test input file and output its results as follows, with numbers indicating which words have errors (-1 means no error): –2 I likes bicycle as I was a boy. –2 7 He follow the great idea that have made a great success. –-1 I enjoy the dinner. All submitted systems should accept arguments on the command line: –Your_program_test.input output.test
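The output format above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `format_result`, `parse_result`, and `check_sentence` are hypothetical, word positions are assumed to be 1-based counts of whitespace-separated tokens, and the two command-line arguments are assumed to be the input path followed by the output path.

```python
import sys

def format_result(error_positions, sentence):
    """Build one output line: error positions (or -1 if none), then the sentence."""
    if error_positions:
        marks = " ".join(str(p) for p in sorted(error_positions))
    else:
        marks = "-1"
    return f"{marks} {sentence}"

def parse_result(line):
    """Split an output line back into (set of positions, sentence).

    Leading integer tokens are read as positions, so a sentence that itself
    starts with a number would be ambiguous in this format.
    """
    tokens = line.split()
    positions = set()
    i = 0
    while i < len(tokens) and tokens[i].lstrip("-").isdigit():
        value = int(tokens[i])
        if value != -1:
            positions.add(value)
        i += 1
    return positions, " ".join(tokens[i:])

def check_sentence(sentence):
    """Hypothetical placeholder for the actual checker; it marks nothing."""
    return set()

# Assumed invocation: <program> <input file> <output file>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], encoding="utf-8") as src, \
         open(sys.argv[2], "w", encoding="utf-8") as dst:
        for raw in src:
            sentence = raw.strip()
            if sentence:
                dst.write(format_result(check_sentence(sentence), sentence) + "\n")
```

`parse_result` is included because the same parsing is needed later to read both the golden file and the system output when scoring.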

5 Evaluation Metric: Definition By comparing the golden test data with your system outputs, our evaluation program will compute an F-score for your outputs: F = 2RP/(R+P), where R = number of correctly marked words / number of problematic words in the golden set, and P = number of correctly marked words / number of marked words in the output.
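A minimal sketch of a scoring function for this metric is given below. It is not the official evaluation program: the name `f_score` is hypothetical, each sentence is assumed to be represented by the set of its marked word positions, and returning 0 when a denominator is empty is an assumption the slides do not address.

```python
def f_score(gold, predicted):
    """Return (precision, recall, F) over parallel lists of position sets.

    gold[i] and predicted[i] are the sets of marked word positions for
    sentence i; an empty set means "no error" (the -1 case).
    """
    correct = sum(len(g & p) for g, p in zip(gold, predicted))
    n_gold = sum(len(g) for g in gold)       # problematic words in the golden set
    n_pred = sum(len(p) for p in predicted)  # words marked by the system
    recall = correct / n_gold if n_gold else 0.0        # assumption: 0 if undefined
    precision = correct / n_pred if n_pred else 0.0     # assumption: 0 if undefined
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * recall * precision / (recall + precision)

# Three sentences: two of the three golden marks are found, one mark is spurious.
gold = [{2}, {2, 7}, set()]
pred = [{2}, {2}, {3}]
p, r, f = f_score(gold, pred)
```

In this example R = 2/3 and P = 2/3, so F = 2/3.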

6 Schedule You have five weeks to build your system. The test dataset will be released 24 hours before the submission deadline for your system outputs.

7 Submission Four parts are required for the submission (please package all your files and then upload): –The complete source code of your system, and at least one executable file for a specific OS. –Document 1: your code infrastructure, compiling options, environment, and running settings. –Document 2: the principles of your system, including which classifier, features, and decoding algorithm you opt for. –If available: models that you train from the provided corpus and your system outputs for the given test data.

8 Groups and Scoring Grouping –1 member per team, 100%

9 Groups and Scoring The team with the highest F-score will receive a score of 100 and the team with the lowest will receive 60; other teams will receive scores interpolated between these two. Plus –Document quality. You may adopt any open-source toolkit in your system. It has no impact on your system's score, but we must see a footnote stating where the toolkit is from. Compiling errors, incomplete documents, or an incorrect data format may cause score loss.

10 Attention We will compare all system outputs; an exact match between teams will result in ZERO points for all teams involved. A system that fails to reproduce the outputs included in its submission package will receive ZERO points.

11 Tips The system is expected to be rule-based. Write your own scoring program.

12 Techniques When building your checker, you may need the part of speech of each word to design your rules. POS tagging toolkits are available online. Consider using them! If you adopt an existing toolkit, you must provide the necessary information in your document to let us know.
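To illustrate how POS tags can drive such rules, here is a small sketch with two toy rules. Everything in it is an assumption made for illustration: the input is taken to be already tagged with Penn Treebank tags (one pair per whitespace-separated word, punctuation stripped), the function name `check_tense` is hypothetical, and the word lists are deliberately tiny. It handles only some of the example sentences and will produce false positives on real text.

```python
# Hypothetical, deliberately small lexicons for illustration only.
PAST_MARKERS = {"yesterday", "ago", "last"}
PERFECT_AUX = {"has", "have", "had"}
PRESENT_TAGS = {"VBP", "VBZ"}  # Penn Treebank: present-tense verb forms
BASE_TAGS = {"VB", "VBP"}      # base form (sometimes tagged VBP by taggers)

def check_tense(tagged):
    """Return the 1-based positions of words flagged by two toy rules.

    tagged: list of (word, tag) pairs, one per word of the sentence.
    Rule 1: a present-tense verb in a sentence containing a past-time marker.
    Rule 2: a base-form verb directly after has/have/had.
    """
    words = [w.lower() for w, _ in tagged]
    has_past_marker = any(w in PAST_MARKERS for w in words)
    errors = set()
    for i, (_, tag) in enumerate(tagged):
        position = i + 1
        if has_past_marker and tag in PRESENT_TAGS:
            errors.add(position)
        if i > 0 and words[i - 1] in PERFECT_AUX and tag in BASE_TAGS:
            errors.add(position)
    return errors

example_1 = [("I", "PRP"), ("plays", "VBZ"), ("football", "NN"), ("yesterday", "NN")]
example_2 = [("They", "PRP"), ("had", "VBD"), ("plant", "VB"), ("six", "CD"),
             ("hundred", "CD"), ("trees", "NNS"), ("by", "IN"), ("the", "DT"),
             ("end", "NN"), ("of", "IN"), ("last", "JJ"), ("Wednesday", "NNP")]
```

With these inputs, `check_tense(example_1)` flags word 2 and `check_tense(example_2)` flags word 3, matching the marks in the examples. A case such as "I finished reading the novel by nine o'clock last night." is not covered and would need a further rule.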

13 Techniques: building your own POS tagger Machine learning model –HMM, or –Maximum entropy Markov model Decoding algorithm –Viterbi Reference –http://www.aclweb.org/anthology/I/I08/I08-4011.pdf –For the best performance, two-pass decoding was adopted in the above paper. However, you may consider one-pass decoding only, for better efficiency. Tips: there are many open-source POS taggers online; consider revising them and integrating them into your system.
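As a rough illustration of one-pass Viterbi decoding for a first-order HMM tagger, consider the sketch below. It is not the method of the referenced paper: the function name `viterbi`, the probability floor used in place of proper smoothing, and all numbers in the toy model are invented for illustration. A real tagger would estimate the probabilities from a tagged corpus.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence under a first-order HMM.

    Probabilities missing from the tables are replaced by `floor`
    (an assumption standing in for real smoothing); log-space avoids underflow.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((s, best[i - 1][s] + lp(trans_p[s].get(t, 0))) for s in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model with invented probabilities.
TAGS = ["PRP", "VBZ", "NN"]
START = {"PRP": 0.8, "NN": 0.2}
TRANS = {"PRP": {"VBZ": 0.9, "NN": 0.1},
         "VBZ": {"NN": 0.8, "PRP": 0.2},
         "NN": {"NN": 0.5, "VBZ": 0.5}}
EMIT = {"PRP": {"i": 1.0},
        "VBZ": {"plays": 0.9, "visits": 0.1},
        "NN": {"football": 0.5, "plays": 0.1, "yesterday": 0.4}}

tagged_path = viterbi(["i", "plays", "football"], TAGS, START, TRANS, EMIT)
```

Under this toy model, "plays" is tagged as a verb rather than a noun because the PRP-to-VBZ transition and the VBZ emission both outweigh the noun alternative.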

14 CoNLL 2013 shared task Survey paper: –http://www.comp.nus.edu.sg/~nlp/conll13st/CoNLLST01.pdf Proceedings –http://wing.comp.nus.edu.sg/~antho/signll.html Note that this project requires a rule-based system rather than a supervised learning system like those in the CoNLL 2013 shared task.

