Predicting Text Quality for Scientific Articles Annie Louis University of Pennsylvania Advisor: Ani Nenkova.



Text quality: Well-written nature
• Quality of content and writing in the text
• Useful to know text quality in different settings
• E.g., search
  - Lots of relevant results
  - Further rank by content and writing quality

Problem definition
• "Define text quality factors that are (1) generic (applicable to most texts) and (2) domain-specific (unique to writing about science), and develop automatic methods to quantify them."
• Two types of science writing
  1. Conference and journal publications
  2. Science journalism

Application settings
• Generic text quality
  1. Evaluation of system-generated summaries
• Domain-specific text quality
  2. Writing feedback
  3. Science news recommendation

Previous work in text quality prediction
• Focus on generic indicators of text quality
  - Word familiarity, sentence length, syntax, discourse
• Machine-produced text
  - Summarization, machine translation
• Human-written text
  - Predicting grade level of an article
  - Automatic essay scoring

Thesis contributions
1. Make a distinction between generic and domain-specific text quality aspects
2. Define new domain-specific aspects in the genre of writing about science
3. Demonstrate the use of these measures in representative applications

Overview
1. Generic text quality factors and summary evaluation
2. Predicting quality for science articles and applications

I. Generic text quality - Applied to Automatic Summary Evaluation

Automatic Summary Evaluation
• Facilitates system development
• Lots of summaries with human ratings available
  - From large-scale summarization evaluations
• Goal: find automatic metrics that correlate with human judgements of quality
  1. Content quality: what is said in the summary?
  2. Linguistic quality: how is it conveyed?

1. Content evaluation of summaries [Louis, Nenkova, 2009]
• Input-summary similarity ~ summary content quality
• Best way to measure similarity: Jensen-Shannon divergence
• JSD: how much two probability distributions differ
  - Word distributions: 'input' I, 'summary' S
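The divergence named on this slide can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes whitespace tokenization, unsmoothed relative word frequencies, and base-2 logarithms (so the score lies between 0 and 1), none of which the slide specifies. The function names are illustrative.

```python
import math
from collections import Counter


def word_distribution(text, vocab):
    """Relative frequency of each vocabulary word in a non-empty text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: counts[w] / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p[w] == 0 contribute nothing."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(input_text, summary_text):
    """Jensen-Shannon divergence between input (I) and summary (S) word distributions."""
    vocab = set(input_text.lower().split()) | set(summary_text.lower().split())
    i_dist = word_distribution(input_text, vocab)
    s_dist = word_distribution(summary_text, vocab)
    # The mean distribution is non-zero wherever either distribution is non-zero,
    # so both KL terms are always finite.
    mean = {w: 0.5 * (i_dist[w] + s_dist[w]) for w in vocab}
    return 0.5 * kl_divergence(i_dist, mean) + 0.5 * kl_divergence(s_dist, mean)
```

A lower divergence means the summary's word distribution is closer to the input's, which under the slide's assumption indicates better content.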

Performance of the automatic content evaluation method
• When systems are ranked by JS divergence scores, the ranking correlates highly with human-assigned ranks: 0.88
• Among the best systems for evaluating news summaries
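A system-level rank correlation of this kind could be computed roughly as below. The slide does not name the correlation statistic, so Spearman's rho without tie handling is an assumption here, and the function names are illustrative.

```python
def ranks(scores, reverse=False):
    """Assign ranks 1..n to scores (ascending by default); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=reverse)
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same systems."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy values for four hypothetical systems, not results from the thesis.
avg_jsd = [0.21, 0.35, 0.28, 0.40]      # lower divergence = better
human_scores = [4.1, 2.9, 3.0, 2.2]     # higher rating = better
rho = spearman(ranks(avg_jsd), ranks(human_scores, reverse=True))
```

Systems are ranked in ascending order of divergence and descending order of human score so that rank 1 means "best" in both lists.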

2. Linguistic quality evaluation for summaries [Pitler, Louis, Nenkova, 2010]
• Consider numerous aspects
  - Syntax, referring expressions, discourse connectives
1. Language models: familiarity of words
  - A huge table of words and their probabilities in a large corpus of general text
  - Use these probabilities to predict familiarity of new texts
2. Syntax: sentence complexity
  - Parse tree depth
  - Length of phrases
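The language-model feature above amounts to a lookup table of word probabilities. A minimal sketch, assuming a unigram model with add-one smoothing (the slide does not say which model order or smoothing was used) and illustrative function names:

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Build the word-count table from a background corpus of general text."""
    return Counter(corpus_tokens), len(corpus_tokens)


def avg_log_prob(tokens, counts, total):
    """Average log2 probability per word; higher means more familiar vocabulary.

    Add-one smoothing (an assumption) reserves one extra slot for unseen words.
    """
    vocab_size = len(counts) + 1
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return log_prob / len(tokens)
```

Texts made of words that are frequent in the background corpus receive a higher average log probability than texts full of unseen words.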

Performance of evaluation method
3. Word coherence: flow between sentences
  - Learn conditional probabilities P(w2 | w1), where w1 and w2 occur in subsequent sentences, from a large corpus
  - Use them to compute the likelihood of a new sentence sequence
• The method is 80% accurate for ranking systems, evaluated on news summaries
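The word-coherence feature can be sketched as follows. This is an illustration under stated assumptions: every word in a sentence is paired with every word in the next sentence, probabilities use add-one smoothing, and the score is the average log probability over word pairs. The slide specifies none of these details, and the function names are illustrative.

```python
import math
from collections import defaultdict


def train_pair_model(documents):
    """Count (w1, w2) pairs where w1 is in a sentence and w2 in the next one.

    `documents` is a list of documents, each a list of tokenized sentences.
    """
    pair_counts = defaultdict(int)
    w1_counts = defaultdict(int)
    vocab = set()
    for sentences in documents:
        for sentence in sentences:
            vocab.update(sentence)
        for prev, nxt in zip(sentences, sentences[1:]):
            for w1 in prev:
                for w2 in nxt:
                    pair_counts[(w1, w2)] += 1
                    w1_counts[w1] += 1
    return pair_counts, w1_counts, max(len(vocab), 1)


def coherence_score(sentences, model):
    """Average log P(w2 | w1) over word pairs in adjacent sentences."""
    pair_counts, w1_counts, vocab_size = model
    total, n_pairs = 0.0, 0
    for prev, nxt in zip(sentences, sentences[1:]):
        for w1 in prev:
            for w2 in nxt:
                prob = (pair_counts.get((w1, w2), 0) + 1) / (
                    w1_counts.get(w1, 0) + vocab_size
                )
                total += math.log(prob)
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

A sentence sequence whose adjacent-sentence word pairs were seen often in training scores higher than the same sentences in an unseen order.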

Why domain-specific factors?
• Generic factors matter for most texts and give us useful applications
• What are other, domain-specific factors?
• They might aid in developing other interesting applications

II. Predicting quality of science articles - Publications and science news

Science writing has distinctive characteristics
• Its function differs from that of informational texts
• Academic writing in several genres involves properly motivating the problem and approach
• Science journalists should create interest in a research study among lay readers

Academic and Science News writing
• … We identified 43 features … from the text and that could help determine the semantic similarity of two short text units. [Hatzivassiloglou et al., 2001]
• A computer is fed pairs of text samples that it is told are equivalent -- two translations of the same sentence from Madame Bovary, say. The computer then derives its own set of rules for recognizing matches. [Technology Review, 2005]

My hypotheses
• Academic writing
  1. Subjectivity: opinion, evaluation, argumentation
  2. Rhetorical zones: role of a sentence in the article
• Science journalism
  1. Visual nature: aids in explaining difficult concepts
  2. Surprisal: presents the unexpected, thereby creating interest

First challenge: Defining text quality
• Academic writing
  - Citations
  - Annotations: are not highly correlated with citations
• Science journalism
  - New York Times articles from Best American Science Writing books
  - Negative examples are sampled from the NYT corpus, on similar topics during the same time period

Annotation for academic writing
• Abstract, introduction, related work, conclusion
• Focus annotations using a set of questions
• Introduction
  - Why is this problem important?
  - Has the problem been addressed before?
  - Is the proposed solution motivated and explained?
• Pairwise: Article A vs. Article B
  - More reliable than ratings on a scale (1-5)

Text quality factors for writing about science
• Academic writing
  - Subjectivity
  - Rhetorical zones
• Science news
  - Surprisal
  - Visual quality

Subjectivity: Academic writing
• Opinions make an article interesting!
  - "Conventional methods to solve this problem are complex and time-consuming."

Automatic identification of subjective expressions
1. Annotate subjective expressions: clause level
2. Create a dictionary of positive/negative words in academic writing using unsupervised methods
3. Classify a clause as subjective or not, depending on polar words and other features
  - E.g., context: subjective expressions often occur near causal relations and near statements that describe the technique/approach

Rhetorical zones: Academic writing
• Defined for each sentence: function of the sentence in the article
• Previous work in this area has devised annotation schemes and shown good performance on automatic zone prediction
  - Used for information extraction and summarization
• Aim … Background … Own work … Comparison

Rhetorical zones and text quality
• Hypothesis: well-written and poorly written articles would have different distributions and sequences of rhetorical zones
• Approach
  - Identify zones
  - Compute features related to sizes of zones and likelihood under a transition model of good articles
• Sequences in good articles: aim, motivation, example, prior work, comparison
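The two feature types in the approach above (zone sizes and likelihood under a transition model) can be sketched as follows. This assumes a first-order Markov chain over zone labels with add-one smoothing, trained on zone sequences from well-written articles. The slide does not specify the model, the zone inventory is taken from the slide's diagram, and the function names are illustrative.

```python
import math
from collections import Counter

# Zone labels as they appear on the slide; the full inventory is an assumption.
ZONES = ["aim", "motivation", "example", "prior work", "comparison"]
START = "<start>"


def train_transitions(good_sequences):
    """Count zone-to-zone transitions in sequences from well-written articles."""
    counts = Counter()
    for seq in good_sequences:
        for prev, nxt in zip([START] + seq, seq):
            counts[(prev, nxt)] += 1
    return counts


def transition_prob(counts, prev, nxt, zones=ZONES):
    """P(nxt | prev) with add-one smoothing over the zone inventory."""
    row_total = sum(counts[(prev, z)] for z in zones)
    return (counts[(prev, nxt)] + 1) / (row_total + len(zones))


def avg_log_likelihood(seq, counts, zones=ZONES):
    """Per-transition log-likelihood of a non-empty zone sequence."""
    steps = list(zip([START] + seq, seq))
    return sum(math.log(transition_prob(counts, a, b, zones)) for a, b in steps) / len(steps)


def zone_proportions(seq, zones=ZONES):
    """Share of sentences in each zone (the 'sizes of zones' features)."""
    counts = Counter(seq)
    return {z: counts[z] / len(seq) for z in zones}
```

An article whose zone sequence resembles those seen in good articles gets a higher average log-likelihood. That value and the zone proportions could then be fed to a classifier as features.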

A simple authoring tool for academic writing
• Highlighting-based feedback
• Mark zone transitions that are less preferable
• Low levels of subjectivity

Surprisal: Science news
• "Sara Lewis is fluent in firefly."
• Syntactic, lexical, topic correlates of surprise
  - Surprisal under language model
  - Parse probability
  - Verb-argument compatibility
  - Order of verbs
  - Rare topics in news

Visual quality: Science news
• Large corpus of tags associated with images
  - Visual words: lake, mountain, tree, clouds …
• Visual words and article quality
  - Concentration of visual words
  - Position in the article (lead, beginning of paragraphs)
  - Variety in visual topics (tags from different pictures)
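The three visual-word features listed above might be computed along these lines. The sketch assumes a hypothetical dictionary mapping each visual word to a topic label (standing in for the image-tag corpus) and treats the first paragraph as the lead. Both choices and all names are illustrative, not taken from the thesis.

```python
def visual_features(paragraphs, visual_topics):
    """Compute simple visual-word features for an article.

    `paragraphs` is a list of tokenized paragraphs; `visual_topics` maps a
    lowercase visual word to a topic label (a hypothetical resource).
    """
    tokens = [t.lower() for paragraph in paragraphs for t in paragraph]
    visual = [t for t in tokens if t in visual_topics]
    lead = paragraphs[0] if paragraphs else []
    lead_visual = [t.lower() for t in lead if t.lower() in visual_topics]
    return {
        # Concentration: share of all tokens that are visual words.
        "concentration": len(visual) / len(tokens) if tokens else 0.0,
        # Position: share of visual words that appear in the lead paragraph.
        "lead_share": len(lead_visual) / len(visual) if visual else 0.0,
        # Variety: number of distinct visual topics mentioned.
        "topic_variety": len({visual_topics[t] for t in visual}),
    }
```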

Article recommendation for science news
• People who like reading science news
• Ask for a preferred topic and show matching articles
  - Ranking 1: based on relevance to keyword
  - Ranking 2: incorporate visual and surprisal scores with relevance
• Evaluate how often ranking 2 is preferred
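The comparison of the two rankings could be set up roughly as below. The slide does not say how the scores are combined, so the weighted sum and its weights are assumptions, as are the field names and the judgment labels.

```python
def rank_by_relevance(articles):
    """Ranking 1: order articles by keyword relevance only."""
    return sorted(articles, key=lambda a: a["relevance"], reverse=True)


def rank_with_quality(articles, w_rel=0.6, w_vis=0.2, w_sur=0.2):
    """Ranking 2: relevance combined with visual and surprisal scores.

    The linear combination and its weights are illustrative assumptions.
    """
    def score(a):
        return w_rel * a["relevance"] + w_vis * a["visual"] + w_sur * a["surprisal"]

    return sorted(articles, key=score, reverse=True)


def preference_rate(judgments, label="ranking2"):
    """Fraction of user judgments that preferred the given ranking."""
    return sum(1 for j in judgments if j == label) / len(judgments)
```

With all scores scaled to the same range, an article that is slightly less relevant but more visual and surprising can move ahead of a more relevant one in ranking 2.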

Summary
• In this thesis, I develop text quality metrics which are
  - Generic: summary evaluation
  - Domain-specific: focused on scientific writing
  - Evaluated in relevant application settings
• Challenges
  - Defining text quality
  - Technical approach
  - Designing feedback in the authoring support tool

Thank you!