NICK PENDAR AND ELENA COTOS IOWA STATE UNIVERSITY THE 3RD WORKSHOP ON INNOVATIVE USE OF NLP FOR BUILDING EDUCATIONAL APPLICATIONS JUNE 19, 2008 Automatic Identification of Discourse Moves in Scientific Article Introductions

Outline
- Background and motivation
- Discourse move identification
  - Data and annotation scheme
  - Feature selection
  - Sentence representation
  - Classifier
  - Evaluation
  - Inter-annotator agreement
- Further work

Automated evaluation: Background
- Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL)
- Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)
- Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)
- Text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences
- AntMover (Anthony and Lashkia, 2003)

Automated evaluation: CALI Motivation
- Wide range of possibilities for high-quality evaluation and feedback (Criterion; Burstein, Chodorow, & Leacock, 2004)
- Potential in formative assessment, but the effects of intelligent formative feedback are not fully investigated
- Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy
- "the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published" (Hyland and Hyland, 2006, p. 109)

Automated evaluation: EAP Motivation
- EAP pedagogical approaches (Cortes, 2006; Levis & Levis-Muller, 2003; Vann & Myers, 2001) fail to provide NNSs with sufficient academic writing practice and remedial guidance
- Problem of disciplinarity
- An NLP-based academic discourse evaluation application could address this gap
- Such an application has not yet been developed

Automated evaluation: Research Motivation
Long-term research goals:
- design and implementation of IADE (Intelligent Academic Discourse Evaluator)
- analysis of IADE effectiveness for formative assessment purposes

- Evaluates students' research article introductions in terms of moves/steps (Swales, 1990, 2004)
- Draws from
  - SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long & Robinson, 1998; Mackey, Gass, & McDonough, 2000; Swain, 1993) and Systemic Functional Linguistics (Halliday, 1985; Martin, 1992)
  - Skill Acquisition Theory of learning (DeKeyser, 2007)
- Is informed by empirical research on the provision of feedback
- Is informed by Evidence-Centered Design principles (Mislevy et al., 2006)

Discourse Move Identification
- Approached as a classification problem (similar to Burstein et al., 2003): given a sentence and a finite set of moves and steps, what move/step does the sentence signify?
- ISUAW corpus: 1,623 articles; 1,322,089 words; average length of articles words
- Stratified sampling of 401 introduction sections representative of 20 academic disciplines
- Sub-corpus: 267,029 words; average length words; 11,149 sentences
- Manual annotation

Discourse Move Identification
- Annotation scheme (Swales, 1990, 2004)

Discourse Move Identification
- Multiple layers of annotation for cases when the same sentence signified more than one move or more than one step

Feature Selection
- Features that reliably indicate a move/step
- Text-categorization approach (see Sebastiani, 2002)
- Each sentence treated as a data item to be classified and represented as an n-dimensional vector in Euclidean space
- The task of the learning algorithm is to find a function F : S → M that maps the sentences in the corpus S to classes in M = {m1, m2, m3}
- Identification of moves, not yet steps

Feature Selection
- Extraction of word unigrams, bigrams, and trigrams from the annotated corpus
- Preprocessing:
  - All tokens stemmed using the NLTK port of the Porter stemmer (Porter, 1980)
  - All numbers in the texts replaced by the string _number_
  - The tokens inside each bigram and trigram alphabetized
  - All n-grams with a frequency of less than five excluded
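The preprocessing pipeline above can be sketched in Python. This is a dependency-free illustration, not the original code: real stemming (the paper used the NLTK Porter stemmer) is replaced here by simple lowercasing, and the tokenizer is a plain whitespace split.

```python
from collections import Counter

def preprocess_tokens(tokens):
    # Stand-in normalization: lowercase each token and replace numeric
    # tokens with the placeholder _number_, as described on the slide.
    # (A real stemmer is omitted to keep the sketch dependency-free.)
    out = []
    for tok in tokens:
        tok = tok.lower()
        if tok.replace('.', '', 1).isdigit():
            tok = '_number_'
        out.append(tok)
    return out

def ngrams(tokens, n):
    # Alphabetize the tokens inside each bigram/trigram, per the slide's
    # preprocessing, so that word order does not split the counts.
    grams = []
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if n > 1:
            gram = sorted(gram)
        grams.append(tuple(gram))
    return grams

def extract_features(sentences, min_freq=5):
    # Count uni-, bi-, and trigrams over the corpus and drop n-grams
    # occurring fewer than min_freq times (the paper used five).
    counts = Counter()
    for sent in sentences:
        toks = preprocess_tokens(sent.split())
        for n in (1, 2, 3):
            counts.update(ngrams(toks, n))
    return {g: c for g, c in counts.items() if c >= min_freq}
```

For example, `extract_features(["previous studies show"] * 5 + ["we report 42 results"])` keeps the repeated uni-, bi-, and trigrams but discards the singleton tokens, including the `_number_` placeholder produced from `42`.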

Feature Selection
- Odds ratio used to rank candidate n-grams
- Conditional probabilities calculated as maximum likelihood estimates
- N-grams with maximum odds ratios selected as features
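A minimal sketch of odds-ratio feature ranking, using the standard definition OR(t, m) = [P(t|m)(1 − P(t|¬m))] / [(1 − P(t|m)) P(t|¬m)] with maximum-likelihood probability estimates. The `eps` clamp that guards against zero or one probabilities is a common convention, not something stated on the slide.

```python
def odds_ratio(term_in_move, move_size, term_in_rest, rest_size, eps=1e-9):
    # Maximum-likelihood estimates of P(term | move) and P(term | other
    # moves); eps keeps the ratio finite when a term occurs in every
    # sentence of a class, or in none.
    p = term_in_move / move_size
    q = term_in_rest / rest_size
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return (p * (1 - q)) / ((1 - p) * q)

def top_k_features(stats, k):
    # stats maps each n-gram to (count_in_move, move_size,
    # count_elsewhere, rest_size); keep the k highest-odds-ratio n-grams.
    ranked = sorted(stats, key=lambda g: odds_ratio(*stats[g]), reverse=True)
    return ranked[:k]
```

For instance, a term appearing in 8 of 10 sentences of one move but only 2 of 10 elsewhere gets OR = (0.8 × 0.8) / (0.2 × 0.2) = 16, ranking it well above a term distributed evenly across moves.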

Sentence Representation
- Each sentence represented as a vector
- Presence or absence of terms in sentences recorded as Boolean values (0 for absence of the corresponding term, 1 for presence)
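The Boolean encoding is straightforward; this sketch uses unigram features for brevity, though the paper's feature sets also include bigrams and trigrams.

```python
def to_boolean_vector(sentence_tokens, feature_terms):
    # feature_terms: the ordered list of selected features; each vector
    # dimension records presence (1) or absence (0) of that term.
    present = set(sentence_tokens)
    return [1 if term in present else 0 for term in feature_terms]
```

For example, `to_boolean_vector(['this', 'paper', 'reports'], ['paper', 'study', 'reports'])` yields `[1, 0, 1]`.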

Classifier
- Support Vector Machines (SVMs) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)
- Five-fold cross-validation
- Machine learning environment: RapidMiner (Mierswa et al., 2006)
  - RBF kernel parameters found through a search over different settings on the feature set with 3,000 unigrams
- Parameters not necessarily the best; exhaustive searches will be performed on the other feature sets
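The setup can be approximated with scikit-learn, substituted here for the RapidMiner environment the authors actually used; the `gamma` and `C` values below are placeholders, not the parameters found in the paper's search.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_move_classifier(X, y, gamma=0.1, C=1.0):
    # RBF-kernel SVM evaluated with five-fold cross-validation,
    # mirroring the slide's experimental setup.
    clf = SVC(kernel='rbf', gamma=gamma, C=C)
    return cross_val_score(clf, X, y, cv=5).mean()
```

With Boolean sentence vectors as `X` and move labels as `y`, the returned value is the mean cross-validated accuracy.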

Evaluation
- Five-fold cross-validation was performed on 14 different feature sets

Evaluation
- Accuracy: the proportion of classifications that agreed with the manually assigned labels

Evaluation
- Precision: the proportion of items assigned to a given category that actually belonged to it
- Recall: the proportion of items actually belonging to a category that were labeled correctly
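These two definitions translate directly into per-move counts; a small stand-alone sketch:

```python
def precision_recall(gold, predicted, label):
    # Precision: fraction of sentences assigned to `label` that truly
    # belong to it. Recall: fraction of sentences truly in `label` that
    # were assigned to it. Both match the slide's definitions.
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    assigned = sum(1 for p in predicted if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = tp / assigned if assigned else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```

For example, with gold labels `['M1', 'M1', 'M2', 'M2']` and predictions `['M1', 'M2', 'M2', 'M2']`, Move 2 gets precision 2/3 and recall 1.0.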

Evaluation
- Trigram models yield the best precision
- Unigram models yield the best recall

Evaluation
- Error analysis reveals that Move 2 is the most difficult to identify: it tends to be misclassified as Move 1
  - Use the relative position of the sentence in the text to disambiguate the move involved
  - Check what percentage of Move 2 sentences identified as Move 1 by the system were also labeled Move 1 by the annotator
- Extracted features are not discipline-dependent

This just in…
- Built a model with the top 3,000 unigrams and top 3,000 trigrams
  - Precision: 91.14%
  - Recall: 82.98%
  - Kappa: 87.57

Inter-annotator agreement
- Second annotations on a sample of files across all 20 disciplines (487 sentences)
- k (inter-annotator agreement) = (P(A) - P(E)) / (1 - P(E))
  - P(A): observed probability of agreement
  - P(E): expected probability of agreement
- Average k over the three moves
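The agreement statistic described here is Cohen's kappa; a minimal sketch computing it from two annotators' label sequences:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    # kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed
    # agreement and P(E) the agreement expected by chance given each
    # annotator's label distribution.
    n = len(ann1)
    p_a = sum(1 for a, b in zip(ann1, ann2) if a == b) / n
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum(c1[lab] * c2[lab] for lab in set(ann1) | set(ann2)) / (n * n)
    return (p_a - p_e) / (1 - p_e)
```

For instance, annotations `['M1', 'M1', 'M2', 'M2']` versus `['M1', 'M1', 'M2', 'M1']` agree on 3 of 4 sentences (P(A) = 0.75) with chance agreement P(E) = 0.5, giving kappa = 0.5.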

Further work on IADE
- Ongoing experiments to improve accuracy: trying different kernel parameters to find optimal models
- More annotation
- Inter-annotator agreement (3 annotators)
- Identification of steps
- Development of intelligent feedback
- Web interface design

Further research with IADE
- Evaluation of IADE effectiveness (Chapelle, 2001):
  - Learning potential
  - Learner fit
  - Meaning focus
  - Authenticity
  - Impact
  - Practicality
- Process/product research direction: interaction between use and outcome (Warschauer & Ware, 2006)
- Target for evaluation: "what is taught through technology" (Chapelle, 2007, p. 30)

THANK YOU! Questions? Suggestions?