Natural Language Processing Course Project: Zhao Hai 赵海 Department of Computer Science and Engineering Shanghai Jiao Tong University



2 Goals
Develop an English grammatical error checker.
– Only consider tense errors for verbs.

3 Examples
2 I plays football yesterday.
2 I drink tea last week.
2 Mary visits the factory last month.
2 I finished reading the novel by nine o'clock last night.
2 We has learned over two thousand English words by the end of last term.
3 They had plant six hundred trees by the end of last Wednesday.

4 Data Format
The input file has the following format (one sentence per line):
– I likes this bicycle.
Your program should accept a test input file in the above format and write its results as follows, where the leading numbers indicate which words have errors (-1 means no error):
– 2 I likes bicycle as I was a boy.
– 2 7 He follow the great idea that have made a great success.
– -1 I enjoy the dinner.
All submitted systems should accept command-line arguments:
– Your_program test.input output.test
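The input and output formats on this slide can be sketched as follows. This is a minimal illustration, not a reference implementation from the course: the names `format_result`, `run`, and the `find_errors` callback are hypothetical, and word positions are assumed to be 1-based and whitespace-delimited, which is consistent with the examples on the slide.

```python
import sys
from typing import Callable, List


def format_result(error_positions: List[int], sentence: str) -> str:
    """Prefix a sentence with the 1-based positions of its erroneous words (-1 if none)."""
    if not error_positions:
        return f"-1 {sentence}"
    marks = " ".join(str(p) for p in sorted(set(error_positions)))
    return f"{marks} {sentence}"


def run(input_path: str, output_path: str,
        find_errors: Callable[[List[str]], List[int]]) -> None:
    """Read one sentence per line and write one marked sentence per line."""
    with open(input_path, encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            sentence = line.strip()
            if not sentence:
                continue  # assumption: blank input lines are ignored
            words = sentence.split()
            dst.write(format_result(find_errors(words), sentence) + "\n")


if __name__ == "__main__":
    # Hypothetical placeholder checker that never reports an error;
    # a real system would plug its rules in here.
    if len(sys.argv) == 3:
        run(sys.argv[1], sys.argv[2], lambda words: [])
```

Keeping the formatting separate from the checker makes it easy to swap in different rule sets without touching the file handling.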

5 Evaluation Metric: Definition
Our evaluation program compares your system outputs against the golden test data and computes an F-score to score your outputs:
F = 2RP / (R + P)
R = number of correctly marked words / number of problematic words in the golden set
P = number of correctly marked words / number of marked words in the output
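A scoring program along these lines might look like the sketch below. It is not the official evaluation program: the function names are hypothetical, counts are assumed to be pooled over the whole test set, and the leading integer tokens of each line are assumed to be the labels (so a sentence that itself begins with a number would be misread).

```python
from typing import Iterable, Set, Tuple


def parse_marked_line(line: str) -> Set[int]:
    """Return the marked word positions at the start of an output line."""
    positions: Set[int] = set()
    for token in line.split():
        # Leading integers are labels; the first non-integer token starts the sentence.
        try:
            value = int(token)
        except ValueError:
            break
        if value != -1:  # -1 means "no error"
            positions.add(value)
    return positions


def f_score(gold_lines: Iterable[str],
            system_lines: Iterable[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) with F = 2RP / (R + P)."""
    correct = marked = problematic = 0
    for gold, system in zip(gold_lines, system_lines):
        gold_marks = parse_marked_line(gold)
        system_marks = parse_marked_line(system)
        correct += len(gold_marks & system_marks)
        problematic += len(gold_marks)
        marked += len(system_marks)
    recall = correct / problematic if problematic else 0.0
    precision = correct / marked if marked else 0.0
    total = recall + precision
    f = 2 * recall * precision / total if total else 0.0
    return precision, recall, f
```

For instance, if the golden set marks three words and a system marks three words of which two are right, both P and R are 2/3, and so is F.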

6 Schedule
You have five weeks to build your system.
The test dataset will be released 24 hours before the submission deadline for your system outputs.

7 Submission
Four parts are required for the submission (please package all your files and then upload):
– The complete source code of your system, and at least one executable file for a specific OS.
– Document 1: your code infrastructure, compiling options, environment, and running settings.
– Document 2: the principles of your system, including which classifier, features, and decoding algorithm you chose.
– If available: the models that you trained from the provided corpus, and your system outputs for the given test data.

8 Groups and Scoring
Grouping
– One member per team, 100%

9 Groups and Scoring
The team with the highest F-score will receive a score of 100 and the team with the lowest will receive 60; other teams will receive scores based on an interpolation strategy between these two scores.
Plus
– Document quality
You may adopt any open-source toolkit in your system. This has no impact on your system's score, but we must see a footnote stating where the toolkit is from.
Compilation errors, incomplete documents, or an incorrect data format may cause score loss.

10 Attention
We will compare all system outputs; if outputs match exactly, all teams whose outputs match will receive ZERO points.
A system that fails to reproduce the output included in its submission package will receive ZERO points.

11 Tips
The system is expected to be rule-based.
Write your own scoring program.

12 Techniques
When building your checker, you may need part-of-speech (POS) tags for words in order to design your rules.
POS tagging toolkits are available online. Consider using them!
If you adopt an existing toolkit, you must provide the necessary information in the document to let us know.
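To illustrate how POS tags can drive tense rules, here is a toy sketch with two hand-written rules. Everything in it is an assumption made for illustration: the tags follow the Penn Treebank convention (VBZ, VBP, VB, VBN) and are supplied by hand rather than by a real tagger, the word lists are tiny, and the function names are hypothetical. It handles only some of the example sentences in this deck.

```python
from typing import List, Set, Tuple

# Tiny illustrative lexicons; a real system would need far broader coverage.
PAST_TIME_WORDS = {"yesterday", "ago"}
TIME_NOUNS = {"week", "month", "year", "night", "term", "wednesday"}
PERFECT_AUXILIARIES = {"has", "have", "had"}
PRESENT_TAGS = {"VBZ", "VBP"}              # finite present-tense verb forms
NON_PARTICIPLE_TAGS = {"VB", "VBP", "VBZ"}  # forms that cannot follow has/have/had


def normalize(word: str) -> str:
    """Lowercase a whitespace token and strip surrounding punctuation."""
    return word.lower().strip(".,!?;:")


def has_past_time_marker(words: List[str]) -> bool:
    """True if the sentence contains 'yesterday', 'ago', or 'last <time noun>'."""
    lowered = [normalize(w) for w in words]
    if any(w in PAST_TIME_WORDS for w in lowered):
        return True
    return any(a == "last" and b in TIME_NOUNS
               for a, b in zip(lowered, lowered[1:]))


def check_tense(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return sorted 1-based positions of words flagged by the two toy rules."""
    errors: Set[int] = set()
    words = [word for word, _ in tagged]
    # Rule 1: a present-tense verb in a sentence with a past-time expression.
    if has_past_time_marker(words):
        for position, (_, tag) in enumerate(tagged, start=1):
            if tag in PRESENT_TAGS:
                errors.add(position)
    # Rule 2: a base or present form directly after has/have/had.
    for i in range(1, len(tagged)):
        previous = normalize(tagged[i - 1][0])
        if previous in PERFECT_AUXILIARIES and tagged[i][1] in NON_PARTICIPLE_TAGS:
            errors.add(i + 1)
    return sorted(errors)
```

Tokens are kept as whitespace-delimited words (punctuation attached) so that the returned positions line up with the positions expected in the output file.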

13 Techniques: building your own POS tagger
Machine learning model
– HMM, or
– Maximum entropy Markov model
Decoding algorithm
– Viterbi
Reference
–
– For the best performance, two-pass decoding was adopted in the above paper. However, you may consider one-pass decoding only, for better efficiency.
Tips: there are many open-source POS taggers online; consider revising them and integrating them into your system.
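As a rough illustration of one-pass Viterbi decoding for a first-order HMM tagger, consider the sketch below. The tag set and all probabilities are invented for the example rather than estimated from any corpus, the function and constant names are hypothetical, and the unseen-event floor is a crude stand-in for proper smoothing.

```python
import math
from typing import Dict, List

# Toy model with made-up probabilities (assumption, for illustration only).
TAGS = ["PRP", "VBP", "NN"]
START = {"PRP": 0.8, "VBP": 0.05, "NN": 0.15}
TRANS = {
    "PRP": {"PRP": 0.1, "VBP": 0.8, "NN": 0.1},
    "VBP": {"PRP": 0.2, "VBP": 0.1, "NN": 0.7},
    "NN": {"PRP": 0.3, "VBP": 0.4, "NN": 0.3},
}
EMIT = {
    "PRP": {"i": 0.9},
    "VBP": {"drink": 0.6, "plant": 0.2},
    "NN": {"tea": 0.5, "drink": 0.1, "plant": 0.2},
}


def viterbi(words: List[str], tags: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            floor: float = 1e-12) -> List[str]:
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []

    def log_p(p: float) -> float:
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[i][t]: best log-probability of any tag path ending in tag t at word i.
    best = [{t: log_p(start_p.get(t, 0.0)) + log_p(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_p(trans_p[p].get(t, 0.0)))
            best[i][t] = (best[i - 1][prev] + log_p(trans_p[prev].get(t, 0.0))
                          + log_p(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


print(viterbi(["i", "drink", "tea"], TAGS, START, TRANS, EMIT))
```

Working in log space keeps long sentences from underflowing; the running time is proportional to the sentence length times the square of the number of tags.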

14 CoNLL 2013 shared task
Survey paper:
– NLLST01.pdf
Proceeding
–
Note: this project requires a rule-based system rather than a supervised learning system like those in the CoNLL 2013 shared task.