
1 Machine Learning in Practice Lecture 13 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

2 Plan for the Day Announcements  Questions?  The next two quizzes and the next assignment will be prep for the Mid-Term More about Text  More Linguistic Tools  Basic Feature Extraction from Text

3 More Linguistic Tools

4 Syntactic Categories Note: some words can fit under more than one category!!!! Noun: Harry, boy, wheat, policy, this Verb: arrive, discuss, melt, hear, remain Adjective: good, tall, silent, old, expensive Preposition: to, in, on, near, at, by Adverb: silently, slowly, quietly, quickly, now Determiner: the, a, an, this Auxiliary: will, can, may, must, be, have Conjunction: and, or, but Intensifier: too, so, very, almost, more, quite

5 Part of Speech Tagging 1. CC Coordinating conjunction 2. CD Cardinal number 3. DT Determiner 4. EX Existential there 5. FW Foreign word 6. IN Preposition/subordinating conjunction 7. JJ Adjective 8. JJR Adjective, comparative 9. JJS Adjective, superlative 10. LS List item marker 11. MD Modal 12. NN Noun, singular or mass 13. NNS Noun, plural 14. NNP Proper noun, singular 15. NNPS Proper noun, plural 16. PDT Predeterminer 17. POS Possessive ending 18. PRP Personal pronoun 19. PP$ Possessive pronoun 20. RB Adverb 21. RBR Adverb, comparative 22. RBS Adverb, superlative http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html

6 Part of Speech Tagging 23. RP Particle 24. SYM Symbol 25. TO to 26. UH Interjection 27. VB Verb, base form 28. VBD Verb, past tense 29. VBG Verb, gerund/present participle 30. VBN Verb, past participle 31. VBP Verb, non-3rd ps. sing. present 32. VBZ Verb, 3rd ps. sing. present 33. WDT wh-determiner 34. WP wh-pronoun 35. WP$ Possessive wh-pronoun 36. WRB wh-adverb http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html

7 Examples from the homework

8 Questions: How are POS tags assigned? What happens if a word can have more than one tag?
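One common answer to both questions is frequency plus context: a simple unigram tagger assigns each word the tag it most often carries in a tagged training corpus, while real taggers also consult the surrounding words. Below is a minimal sketch of the unigram case; the tiny tagged corpus is a made-up illustration, not real Treebank data. Note how "can" is ambiguous (modal vs. noun) and the tagger resolves it by picking the more frequent tag:

```python
from collections import Counter, defaultdict

# Hypothetical training data: (word, tag) pairs using Penn Treebank tags.
tagged_corpus = [
    ("the", "DT"), ("cat", "NN"), ("can", "MD"), ("run", "VB"),
    ("a", "DT"), ("can", "NN"), ("can", "MD"), ("hear", "VB"),
]

# Count how often each word carries each tag.
tag_counts = defaultdict(Counter)
for word, tag in tagged_corpus:
    tag_counts[word][tag] += 1

def tag(word):
    """Return the most frequent tag seen for this word; default to NN."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NN"
```

For example, `tag("can")` returns "MD" because the modal reading outnumbers the noun reading in this toy corpus; a contextual tagger would instead condition on neighboring tags.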

9 Parsers Do a structural analysis on sentences Can be rule based or statistical Can represent surface syntax or deep syntax

10 X’ Structure (X-bar schema, flattened from a tree diagram) A maximal projection X’’ consists of a specifier and an X’; an X’ consists of an optional pre-head or post-head modifier and another X’, bottoming out in the head X. Example: “The black cat in the hat” is a complete phrase, sometimes called “a maximal projection”

11 Movement: deep versus surface syntax The cat ate the food. What did the cat eat e? The food that the cat ate e. The cat that e ate the food. The food that e was eaten e by the cat. Note: Deep syntax important for semantic interpretation. Most off-the-shelf parsers do surface syntax.

12 From Structure to Meaning Semantics is about what sentences mean We use cues from syntax to help us derive/construct what sentences mean  Remember that we can use cues from one level of linguistic analysis to help predict what is going on at a higher level The meaning of a sentence is made up of the meaning of its parts and the manner in which those parts are arranged syntactically  The man ate the hamburger.  But not: The hamburger ate the man. But not all meaning is compositional  “kicked the bucket”

13 Thematic Roles Can you tell how to use syntax to identify thematic roles in these sentences? Agent: who is doing the action Theme: what the action is done to Recipient: who benefits from the action Source: where the theme started Destination: where the theme ended up Tool: what the agent used to do the action to the theme Manner: how the agent behaved while doing the action 1.The man chased the intruder. 2.The intruder was chased by the man. 3.Aaron carefully wrote a letter to Marilyn. 4.Marilyn received the letter. 5.John moved the package from the table to the sofa. 6.The governor entertained the guests in the parlor.

14 Verb Alternations Verb classes are associated with alternative arrangements of arguments I loaded hay on the truck.  I loaded the truck with hay.  But not: I put the truck with hay. I gave a cake to my mom.  I gave my mom a cake.  But not: I donated my mom a cake. I baked brownies for my mom.  I baked my mom brownies.  But not: I prepared my mom brownies.

15 Overview of Basic Feature Extraction from Text

16 http://www.cs.cmu.edu/~cprose/TagHelper.html

17 TagHelper Customizations Feature Space Design  If you want to find questions, you don’t need to do a complete syntactic analysis  Look for question marks  Look for wh-terms that occur immediately before an auxiliary verb  Look for topics likely to be indicative of questions
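The two shallow cues above can be sketched in a few lines. This is a hypothetical heuristic, not TagHelper's actual implementation; the wh-word and auxiliary lists are illustrative and incomplete:

```python
import re

WH_WORDS = {"what", "where", "when", "why", "who", "which", "how"}
AUX_VERBS = {"is", "are", "was", "were", "do", "does", "did",
             "will", "can", "may", "must", "have", "has"}

def looks_like_question(text):
    """Flag a question via a question mark, or a wh-word immediately
    followed by an auxiliary verb (e.g. 'what did ...')."""
    if "?" in text:
        return True
    tokens = re.findall(r"[a-z']+", text.lower())
    for first, second in zip(tokens, tokens[1:]):
        if first in WH_WORDS and second in AUX_VERBS:
            return True
    return False
```

Because it skips full parsing, this misses indirect questions ("I wonder whether...") but catches the common cases cheaply, which is the point of shallow feature design.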

18 TagHelper Customizations Feature Space Design  Punctuation can be a “stand in” for mood “you think the answer is 9?” “you think the answer is 9.”  Bigrams capture simple lexical patterns “common denominator” versus “common multiple”  POS bigrams capture stylistic information “the answer which is …” vs “which is the answer”  Line length can be a proxy for explanation depth
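Both kinds of bigram features on this slide come from the same operation: pairing adjacent items in a sequence. A minimal sketch (the underscore-joined feature naming is an assumption, not TagHelper's exact format):

```python
def bigrams(tokens):
    """Return adjacent-pair features from a sequence of tokens.
    Works equally for words and for POS tags."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
```

For example, `bigrams(["common", "denominator"])` yields the lexical feature "common_denominator", while running the same function over a POS-tag sequence like `["WDT", "VBZ"]` yields the stylistic feature "WDT_VBZ".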

19 TagHelper Customizations Feature Space Design  Contains non-stop word can be a predictor of whether a conversational contribution is contentful “ok sure” versus “the common denominator”  Remove stop words removes some distracting features  Stemming allows some generalization Multiple, multiply, multiplication  Removing rare features is a cheap form of feature selection Features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space
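The rare-feature filter mentioned above is simple to express: count every feature across the corpus and drop the ones below a threshold. This is a generic sketch of the idea, not TagHelper's internal code, and the threshold of 3 is an arbitrary illustration:

```python
from collections import Counter

def filter_rare_features(documents, min_count=3):
    """Drop features occurring fewer than min_count times corpus-wide.
    Features seen only once or twice are unlikely to generalize, so
    removing them is a cheap form of feature selection."""
    counts = Counter(tok for doc in documents for tok in doc)
    vocab = {tok for tok, c in counts.items() if c >= min_count}
    return [[tok for tok in doc if tok in vocab] for doc in documents]
```

The same pre-tokenized documents could first be stemmed (mapping "multiple", "multiply", "multiplication" to one form), which both generalizes features and raises their counts past the rarity threshold.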

20 Why create new features by hand? Rules  For simple rules, it might be easier and faster to write the rules by hand instead of learning them from examples Features  More likely to capture meaningful generalizations  Build in knowledge so you can get by with less training data

21 Rule Language ANY() is used to create lists  COLOR = ANY(red,yellow,green,blue,purple)  FOOD = ANY(cake,pizza,hamburger,steak,bread) ALL() is used to capture contingencies  ALL(cake,presents) More complex rules  ALL(COLOR,FOOD) * Note that you may wish to use part-of-speech tags in your rules!

22 What can you do with this rule language? You may want to generalize across sets of related words  Color = {red,yellow,orange,green,blue}  Food = {cake,pizza,hamburger,steak,bread} You may want to detect contingencies  The text must mention both cake and presents in order to count as a birthday party You may want to combine these  The text must include a Color and a Food
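The semantics of ANY and ALL described on the last two slides can be sketched as a tiny interpreter over bag-of-words documents. The function names and closure-based representation here are assumptions for illustration, not TagHelper's API:

```python
def any_rule(*terms):
    """ANY(...): fires if at least one listed term appears in the text."""
    return lambda tokens: any(t in tokens for t in terms)

def all_rule(*rules):
    """ALL(...): fires only if every argument matches; arguments may be
    literal words or other rules, so rules compose."""
    checks = [r if callable(r) else (lambda toks, w=r: w in toks)
              for r in rules]
    return lambda tokens: all(c(tokens) for c in checks)

# The examples from the slides:
COLOR = any_rule("red", "yellow", "orange", "green", "blue", "purple")
FOOD = any_rule("cake", "pizza", "hamburger", "steak", "bread")
birthday = all_rule("cake", "presents")       # contingency
colored_food = all_rule(COLOR, FOOD)          # composed rule
```

Running `birthday("we had blue cake and presents".split())` returns True, since both required words appear; the same token list also satisfies `colored_food`.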

23 Examples from the homework

24 Note that “Professor NNP NNP” is a pattern that matches against a named entity.

25 For the assignment… You will create some rules  ALL(Professor,NNP_NNP)  Don’t worry that you can’t enforce contiguity This will give you experience in building patterns that can be used as features for classification Question: For “African Union envoy”, is the whole expression a named entity or just “African Union”?

26 Named Entity Extraction This is sort of what you’re doing, although our rule language has less power.

27 Advanced Feature Editing

28 * For small datasets, first deselect Remove rare features.

29 Advanced Feature Editing * Next, Click on Adv Feature Editing

30 Advanced Feature Editing * Now you may begin creating your own features.

31 Types of Basic Features Primitive features include unigrams, bigrams, and POS bigrams

32 Types of Basic Features The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists  You can choose to remove stopwords or not  You can choose whether or not to strip endings off words with stemming  You can choose how frequently a feature must appear in your data in order for it to show up in your lists

33 Types of Basic Features * Now let’s look at how to create new features.

34 Creating New Features * You can use the feature editor to create new features.

35 Creating New Features * First click on ANY

36 Creating New Features * Then click ALL

37 Creating New Features * Now fill in ‘tell’ and ‘me’

38 Creating New Features * Now fill in the rest of the pattern from the POS Bigram list

39 Creating New Features * Now change the name

40 Creating New Features * Click to add to feature list

41 Using the Display Option

42–44 (screenshots only)

45 Viewing Created Features

46–47 (screenshots only)

48 Take Home Message We learned about Part-of-Speech taggers and parsers We talked about syntax and what level of structure can be approximated with part of speech tags We then looked at the types of features that can be extracted using TagHelper tools We talked about simple rule representation technology for creating new features for classification

