Measuring Linguistic Complexity Kristopher Kyle 3-5-2015.


1 Measuring Linguistic Complexity Kristopher Kyle 3-5-2015

2 Who is this guy?  Interested in:  L2 Writing Quality/Development  Assessment  Natural Language Processing  Productive Vocabulary  Productive Syntax

3 Outline of Workshop  Why measure linguistic complexity?  How can linguistic complexity measures be conceptualized?  How do we actually measure linguistic complexity?  Hands-on workshop I: Measuring syntactic complexity  Hands-on workshop II: From raw data to findings (if time)

4 Why measure linguistic complexity?  In the 1970s, SLA researchers (e.g., Larsen-Freeman, 1978) wanted to measure language development  Larsen-Freeman proposed three constructs of development:  complexity  accuracy  fluency  The general hypothesis (with regard to complexity) has been: As language learners develop, their language will become more complex.  How complexity should be measured has been the subject of much debate (e.g., Bulté & Housen, 2012)

5 How can linguistic complexity measures be conceptualized?  Wolfe-Quintero et al. (1998) provide a compendium of CAF measures up until the late 1990s  Lexical Complexity:  a variety of general and part-of-speech-specific type/token ratio counts  Syntactic Complexity:  a variety of clause, sentence, and T-unit measures that focus on clausal complexity.
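
The general type/token ratio mentioned above can be sketched in a few lines of Python. This is only an illustration: the tokenizer (lowercase, letters and apostrophes only) and the function name `type_token_ratio` are assumptions made here, not part of the workshop materials.

```python
import re

def type_token_ratio(text):
    """Return the number of unique word types divided by the number of word tokens."""
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# 7 tokens, 6 types ("pizza" occurs twice)
sample = "I eat pizza because pizza is delicious."
print(round(type_token_ratio(sample), 3))
```

A part-of-speech-specific variant would apply the same ratio to only the tokens with a given tag, which requires a tagger and is not shown here.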

6 How can linguistic complexity measures be conceptualized?  Most syntactic complexity indices are ratio scores: (Structure A)/(Structure B).  The denominator (Structure B) is one of:  clause: a main verb and its dependents (I eat pizza.)  T-unit: an independent clause and any attached dependent clauses (I eat pizza because it is delicious.)  sentence: A string of words that starts with a capital letter and ends with sentence-ending punctuation (I think you know what a sentence is.)

7 How can linguistic complexity measures be conceptualized?  The numerator (Structure A) has included many structures:  clauses  dependent clauses  adverbial clauses  T-units  complex T-units  coordinate phrases  complex nominals  verb phrases  passives

8 How can linguistic complexity measures be conceptualized?  Length of unit measures have also been prominent (e.g., Ortega, 2003; Lu, 2011).  Mean length of clause (MLC)  Mean length of T-unit (MLTU)  Mean length of sentence (MLS)
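
Once the structures have been counted, the ratio and length-of-unit indices are simple divisions. The sketch below uses hand counts for the T-unit example from slide 6 ("I eat pizza because it is delicious."); the dictionary keys and the `ratio` helper are hypothetical names chosen for illustration.

```python
def ratio(numerator_count, denominator_count):
    """(Structure A) / (Structure B), returning 0.0 when the denominator is empty."""
    if denominator_count == 0:
        return 0.0
    return numerator_count / denominator_count

# Hand counts (an assumption of this sketch) for:
# "I eat pizza because it is delicious."
counts = {
    "words": 7,
    "clauses": 2,            # "I eat pizza" + "because it is delicious"
    "dependent_clauses": 1,  # "because it is delicious"
    "t_units": 1,
    "sentences": 1,
}

indices = {
    "MLC": ratio(counts["words"], counts["clauses"]),      # mean length of clause
    "MLTU": ratio(counts["words"], counts["t_units"]),     # mean length of T-unit
    "MLS": ratio(counts["words"], counts["sentences"]),    # mean length of sentence
    "clauses_per_t_unit": ratio(counts["clauses"], counts["t_units"]),
    "dependent_clauses_per_clause": ratio(counts["dependent_clauses"], counts["clauses"]),
}

for name, value in indices.items():
    print(name, value)
```

The hard part is producing the counts, not the arithmetic, which is why the definition of each structure matters so much.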

9 How can linguistic complexity measures be conceptualized?  The rise of phrasal complexity:  Biber, Poonpon, and Grey (2011) suggested that clausal subordination (i.e., what most syntactic complexity indices measure) is NOT a prominent feature of academic writing  Informal speech includes many dependent clauses, but academic writing includes many dependent phrases (especially noun phrases).

10 How can linguistic complexity measures be conceptualized?  Some important issues:  Definition of measures  What counts as a clause?  Prominence of broad indices  What does MLC really tell us about development?  Often only a limited range of measures is used.

11 How do we actually measure linguistic complexity?  To measure linguistic complexity, we have two options.  Option #1: Count features by hand  Option #2: Count features using a computer

12 How do we actually measure linguistic complexity?  Advantages of Option 1:  Researcher has full control over how syntactic complexity is measured.  Human counts may be more accurate  Disadvantages of Option 1:  Expensive!  Intra-rater reliability  Inter-rater reliability – who is qualified?

13 How do we actually measure linguistic complexity?  Advantages of Option 2:  Very cheap  Reliable (same results every time)  Usually accurate  Biber (e.g., 2004) and Lu (2010, 2011) report accuracies above 90%  Can analyze a broad range of indices at once.  Disadvantages of Option 2:  Researcher has less control (is at the mercy of available programs)  Some data is not well-suited to automatic analysis  Some linguistic features cannot be reliably captured

14 Hands-on workshop I: Measuring syntactic complexity  Go to www.kristopherkyle.com/workshop/ and download the “short_samples.zip” file.  Without talking with your neighbor(s), fill in the included Excel sheet for examples 1-5.  What were your answers?  Any issues with example 5?  Now do the same for example 6…

15 Hands-on workshop I: Measuring syntactic complexity  Tool for the Automatic Analysis of Syntactic Complexity (TAASC)  Prototype!!!  Includes indices created by Xiaofei Lu (Syntactic Complexity Analyzer; Lu, 2011)  Also includes some replications of the Biber Tagger

16 Hands-on workshop I: Measuring syntactic complexity  How TAASC works:  Reads file  Splits file into sentences  Parses each sentence  uses Stanford Parser  Uses regular expressions (a way to search for patterns) to identify particular structures in the parse tree.  uses Stanford Tregex (regular expressions for parse trees)

17 Hands-on workshop I: Measuring syntactic complexity  Now, let's check to see if your computer is set up correctly.  First, search for Terminal (Mac) or Command Prompt (Windows)  Then type: java -version  Then type: python  Go to www.kristopherkyle.com/workshop/ and download the appropriate version of TAASC (Windows or Mac).  Extract it to your Desktop  Copy the example files to the “to_process_2” folder

18 Hands-on workshop I: Measuring syntactic complexity  Now, in Terminal/Command Prompt type:  cd [location of TAASC folder] (then press “return”)  python [name of the appropriate TAASC program] (“return”)  Your results should now be in a file called “results.csv”  If you want to examine the accuracy of the parse trees, look in the folder “parsed_files” using Tregex
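
To take a first look at the output, “results.csv” can be read with Python's standard `csv` module. This is a minimal sketch that assumes the file is comma-separated with a header row; the slides do not list the column names, so the code only reports whatever columns it finds. `load_results` is a hypothetical helper, not part of TAASC.

```python
import csv
import os

def load_results(path="results.csv"):
    """Return (column_names, rows) from a CSV file that has a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)              # each row is a dict keyed by column name
        columns = reader.fieldnames or []
    return columns, rows

# Only runs if a results file is present in the current folder.
if os.path.exists("results.csv"):
    columns, rows = load_results("results.csv")
    print("Columns:", columns)
    print("Number of texts analyzed:", len(rows))
```

All values come back as strings, so convert them with `float()` before doing any statistics.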

19 Hands-on workshop I: Measuring syntactic complexity  Some simple patterns:  VP  VP<S  Some important patterns:  clause: S|SINV|SQ <<# MD|VBP|VBZ|VBD  T-unit: S|SBARQ|SINV|SQ > ROOT | [$-- S|SBARQ|SINV|SQ !>> SBAR|VP]
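
To make the two simple patterns concrete, here is a pure-Python stand-in for what they ask of a parse tree: `VP` matches every VP node, and `VP<S` matches every VP that immediately dominates an S. This is not Tregex and not TAASC's code; the bracketed tree is hand-written in the Penn Treebank style for illustration, and all function names are hypothetical.

```python
import re

def parse_tree(bracketed):
    """Turn a bracketed parse string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                           # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                           # skip ")"
        return (label, children)

    return read_node()

def subtrees(node):
    """Yield the node itself and every labeled node below it."""
    yield node
    for child in node[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def count_label(tree, label):
    """Rough analogue of the pattern `VP`: count nodes with this label."""
    return sum(1 for node in subtrees(tree) if node[0] == label)

def count_immediately_dominates(tree, parent, child):
    """Rough analogue of `VP<S`: count parent nodes with at least one such direct child."""
    return sum(
        1 for node in subtrees(tree)
        if node[0] == parent
        and any(isinstance(c, tuple) and c[0] == child for c in node[1])
    )

# Hand-written parse (an assumption) of "I want to eat pizza."
example = ("(ROOT (S (NP (PRP I)) (VP (VBP want) (S (VP (TO to) "
           "(VP (VB eat) (NP (NN pizza)))))) (. .)))")
tree = parse_tree(example)
print(count_label(tree, "VP"))                       # three VP nodes
print(count_immediately_dominates(tree, "VP", "S"))  # one VP directly above an S
```

The clause and T-unit patterns combine the same kind of relations (headed by, child of, sister of, not dominated by) with alternations such as `S|SINV|SQ`.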

20 Hands-on workshop II: From raw data to findings  Go to www.kristopherkyle.com/workshop/ and download the “Workshop_Data.zip” file.  58 participants, three timed essays over 1 year.  IEP Levels 3-4 (Intermediate/Advanced)  Now let’s analyze some data!  NOTE: We didn’t get to this in class…

