Measuring Linguistic Complexity Kristopher Kyle 3-5-2015.

Presentation transcript:

Measuring Linguistic Complexity Kristopher Kyle

Who is this guy?
- Interested in:
  - L2 Writing Quality/Development
  - Assessment
  - Natural Language Processing
  - Productive Vocabulary
  - Productive Syntax

Outline of Workshop
- Why measure linguistic complexity?
- How can linguistic complexity measures be conceptualized?
- How do we actually measure linguistic complexity?
- Hands-on workshop I: Measuring syntactic complexity
- Hands-on workshop II: From raw data to findings (if time)

Why measure linguistic complexity?
- In the 1970s, SLA researchers (e.g., Larsen-Freeman, 1978) wanted to measure language development
- Larsen-Freeman proposed three constructs of development:
  - complexity
  - accuracy
  - fluency
- The general hypothesis (with regard to complexity) has been: as language learners develop, their language will become more complex.
- How complexity should be measured has been the subject of much debate (e.g., Bulté & Housen, 2012)

How can linguistic complexity measures be conceptualized?
- Wolfe-Quintero et al. (1998) provide a compendium of CAF (complexity, accuracy, fluency) measures up to the late 1990s
- Lexical complexity:
  - a variety of general and part-of-speech-specific type/token ratio counts
- Syntactic complexity:
  - a variety of clause, sentence, and T-unit measures that focus on clausal complexity
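As a rough illustration of the type/token ratio idea mentioned above, here is a minimal sketch in Python. The tokenizer is a deliberately naive assumption (lowercased runs of letters and apostrophes), and the function name `type_token_ratio` is made up for this example; published indices usually rely on more careful tokenization and often on length-corrected variants.

```python
import re


def type_token_ratio(text):
    """Return (number of unique word forms) / (number of word tokens).

    Assumption: a word is any run of letters or apostrophes, lowercased.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "I eat pizza because I like pizza."
# 5 types (i, eat, pizza, because, like) over 7 tokens
print(round(type_token_ratio(sample), 3))  # 0.714
```

A part-of-speech-specific version would apply the same ratio to only the tokens carrying a given tag (for example, verbs), which requires a tagger and is not shown here.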

How can linguistic complexity measures be conceptualized?
- Most syntactic complexity indices are ratio scores: (Structure A)/(Structure B).
- The denominator (Structure B) is one of:
  - clause: a main verb and its dependents (I eat pizza.)
  - T-unit: an independent clause and any attached dependent clauses (I eat pizza because it is delicious.)
  - sentence: a string of words that starts with a capital letter and ends with sentence-ending punctuation (I think you know what a sentence is.)

How can linguistic complexity measures be conceptualized?
- The numerator (Structure A) has included many structures:
  - clauses
  - dependent clauses
  - adverbial clauses
  - T-units
  - complex T-units
  - coordinate phrases
  - complex nominals
  - verb phrases
  - passives
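To make the (Structure A)/(Structure B) idea concrete, the following sketch computes three ratio indices from a dictionary of structure counts. The function name, the dictionary keys, and the choice of these three ratios are assumptions made for illustration; the counts themselves would come from hand coding or from an automatic tool.

```python
def complexity_ratios(counts):
    """Compute a few (Structure A)/(Structure B) ratios from raw counts."""

    def ratio(numerator, denominator):
        # Guard against empty texts, where the denominator count is zero.
        if not counts[denominator]:
            return 0.0
        return counts[numerator] / counts[denominator]

    return {
        "clauses_per_t_unit": ratio("clauses", "t_units"),
        "dependent_clauses_per_clause": ratio("dependent_clauses", "clauses"),
        "t_units_per_sentence": ratio("t_units", "sentences"),
    }


# Hand counts for "I eat pizza because it is delicious.":
# one sentence, one T-unit, two clauses, one of them dependent.
counts = {"sentences": 1, "t_units": 1, "clauses": 2, "dependent_clauses": 1}
print(complexity_ratios(counts))
```

For the example sentence this gives 2.0 clauses per T-unit, 0.5 dependent clauses per clause, and 1.0 T-units per sentence.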

How can linguistic complexity measures be conceptualized?
- Length-of-unit measures have also been prominent (e.g., Ortega, 2003; Lu, 2011).
  - Mean length of clause (MLC)
  - Mean length of T-unit (MLTU)
  - Mean length of sentence (MLS)
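The length-of-unit measures are word counts divided by unit counts. The sketch below works through them for the example sentence "I eat pizza because it is delicious." using hand counts; the helper name `mean_length` is an assumption for this illustration.

```python
def mean_length(word_count, unit_count):
    """Return words per unit, or 0.0 when there are no units."""
    return word_count / unit_count if unit_count else 0.0


# Hand counts for "I eat pizza because it is delicious."
words, clauses, t_units, sentences = 7, 2, 1, 1

print("MLC: ", mean_length(words, clauses))    # 3.5
print("MLTU:", mean_length(words, t_units))    # 7.0
print("MLS: ", mean_length(words, sentences))  # 7.0
```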

How can linguistic complexity measures be conceptualized?
- The rise of phrasal complexity:
  - Biber, Gray, and Poonpon (2011) suggested that clausal subordination (i.e., what most syntactic complexity indices measure) is NOT a prominent feature of academic writing
  - Informal speech includes many dependent clauses, but academic writing includes many dependent phrases (especially noun phrases).

How can linguistic complexity measures be conceptualized?
- Some important issues:
  - Definition of measures
    - What counts as a clause?
  - Prominence of broad indices
    - What does MLC really tell us about development?
  - Often only a limited range of measures is used.

How do we actually measure linguistic complexity?
- To measure linguistic complexity, we have two options.
  - Option #1: Count features by hand
  - Option #2: Count features using a computer

How do we actually measure linguistic complexity?
- Advantages of Option 1:
  - Researcher has full control over how syntactic complexity is measured.
  - Human counts may be more accurate
- Disadvantages of Option 1:
  - Expensive!
  - Intra-rater reliability
  - Inter-rater reliability – who is qualified?

How do we actually measure linguistic complexity?
- Advantages of Option 2:
  - Very cheap
  - Reliable (same results every time)
  - Usually accurate
    - Biber (e.g., 2004) and Lu (2010, 2011) report accuracies above 90%
  - Can analyze a broad range of indices at once.
- Disadvantages of Option 2:
  - Researcher has less control (is at the mercy of available programs)
  - Some data is not well-suited to automatic analysis
  - Some linguistic features cannot be reliably captured

Hands-on workshop I: Measuring syntactic complexity
- Go to [link missing] and download the “short_samples.zip” file.
- Without talking with your neighbor(s), fill in the included Excel sheet for examples 1-5.
- What were your answers?
- Any issues with example 5?
- Now do the same for example 6…

Hands-on workshop I: Measuring syntactic complexity
- Tool for the Automatic Analysis of Syntactic Complexity (TAASC)
  - Prototype!!!
  - Includes indices created by Xiaofei Lu (Syntactic Complexity Analyzer; Lu, 2011)
  - Also includes some replications of the Biber Tagger

Hands-on workshop I: Measuring syntactic complexity
- How TAASC works:
  - Reads file
  - Splits file into sentences
  - Parses each sentence
    - uses Stanford Parser
  - Uses regular expressions (a way to search for patterns) to identify particular structures in the parse tree.
    - uses Stanford Tregex (regular expressions for parse trees)
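The first two steps (read a file, split it into sentences) can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the splitting rule (break after ., !, or ? followed by whitespace) and the function names are invented for this example and are not TAASC's actual code. The parsing step is left out because it depends on an external parser.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def read_sentences(path):
    """Read a plain-text file and return its sentences as a list."""
    with open(path, encoding="utf-8") as handle:
        return split_sentences(handle.read())


print(split_sentences("I eat pizza. It is delicious! Do you agree?"))
# ['I eat pizza.', 'It is delicious!', 'Do you agree?']
```

A rule this simple will mis-split abbreviations such as "e.g." or "Dr.", which is one reason dedicated sentence splitters exist.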

Hands-on workshop I: Measuring syntactic complexity
- Now, let’s check to see if your computer is set up correctly.
  - First, search for Terminal (Mac) or Command Prompt (Windows)
  - Then type: java -version
  - Then type: python
- Go to [link missing] and download the appropriate version of TAASC (Windows or Mac).
- Extract it to your Desktop
- Copy the example files to the “to_process_2” folder
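If Python is already running, the same setup check can be scripted. The sketch below is an optional convenience, not part of TAASC: the helper name `tool_version` is made up, and it calls `python --version` (an assumption) rather than opening the interactive interpreter as the slide does.

```python
import shutil
import subprocess


def tool_version(command, flag):
    """Return the version text a command prints, or None if it is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, flag], capture_output=True, text=True)
    # `java -version` writes to stderr rather than stdout, so check both.
    return (result.stdout or result.stderr).strip()


for command, flag in [("java", "-version"), ("python", "--version")]:
    print(command, "->", tool_version(command, flag) or "not found")
```

A "not found" line means the program is either not installed or not on the system PATH.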

Hands-on workshop I: Measuring syntactic complexity
- Now, in Terminal/Command Prompt type:
  - cd [location of TAASC folder] (then press “return”)
  - python [name of the appropriate TAASC program] (“return”)
- Your results should now be in a file called “results.csv”
- If you want to examine the accuracy of the parse trees, look in the folder “parsed_files” using Tregex
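Once “results.csv” exists, it can be opened in a spreadsheet or read with Python's standard `csv` module. The sketch below assumes only that the file has a header row; the column names are not documented in these slides, so the code simply reports whatever it finds. The function name `load_results` is hypothetical.

```python
import csv


def load_results(path):
    """Read a CSV file with a header row into a list of row dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Example use, run from inside the TAASC folder after processing:
# rows = load_results("results.csv")
# print(len(rows), "rows")
# print("columns:", list(rows[0].keys()) if rows else "none")
```

All values come back as strings, so numeric indices need `float(...)` before any arithmetic.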

Hands-on workshop I: Measuring syntactic complexity
- Some simple patterns:
  - VP
  - VP<S
- Some important patterns:
  - clause: S|SINV|SQ <<# MD|VBP|VBZ|VBD
  - T-unit: S|SBARQ|SINV|SQ > ROOT | [$-- S|SBARQ|SINV|SQ !>> SBAR|VP]
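To show what the two simple patterns match, here is a small pure-Python sketch that reads a bracketed (Penn Treebank style) parse and counts VP nodes and VP nodes that immediately dominate an S node (the Tregex relation `VP < S`). It is not Tregex and does not handle the more complex clause and T-unit patterns; the function names and the example parse are assumptions made for illustration.

```python
import re


def parse_tree(bracketed):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    position = 0

    def read_node():
        nonlocal position
        position += 1                      # skip "("
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())
            else:
                children.append(tokens[position])  # a word (leaf)
                position += 1
        position += 1                      # skip ")"
        return (label, children)

    return read_node()


def walk(node):
    """Yield every labelled node in the tree, top down."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from walk(child)


def count_label(tree, label):
    """Count nodes with the given label (like the pattern `VP`)."""
    return sum(1 for node in walk(tree) if node[0] == label)


def count_parent_child(tree, parent, child):
    """Count `parent` nodes with a `child` node directly below (like `VP < S`)."""
    return sum(
        1
        for node in walk(tree)
        if node[0] == parent
        and any(isinstance(c, tuple) and c[0] == child for c in node[1])
    )


# Assumed parse of "I want to eat pizza."
tree = parse_tree(
    "(ROOT (S (NP (PRP I)) (VP (VBP want) "
    "(S (VP (TO to) (VP (VB eat) (NP (NN pizza)))))) (. .)))"
)
print(count_label(tree, "VP"))               # 3
print(count_parent_child(tree, "VP", "S"))   # 1
```

Here the three VPs are headed by "want", "to", and "eat", and only the first has an S (the infinitival complement) as a direct child.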

Hands-on workshop II: From raw data to findings
- Go to [link missing] and download the “Workshop_Data.zip” file.
  - 58 participants, three timed essays over 1 year.
  - IEP Levels 3-4 (Intermediate/Advanced)
- Now let’s analyze some data!
- NOTE: We didn’t get to this in class…