Carolyn Penstein Rosé, Language Technologies Institute, Human-Computer Interaction Institute, School of Computer Science. With funding from the National Science Foundation and the Office of Naval Research.


1 LightSIDE
Carolyn Penstein Rosé
Language Technologies Institute, Human-Computer Interaction Institute, School of Computer Science
With funding from the National Science Foundation and the Office of Naval Research

4 Click here to load a file

5 Select Heteroglossia as the predicted category

6 Make sure the text field is selected as the field to extract text features from

7
- Punctuation can be a “stand-in” for mood: “you think the answer is 9?” versus “you think the answer is 9.”
- Bigrams capture simple lexical patterns: “common denominator” versus “common multiple”
- Trigrams (just like bigrams, but with three words next to each other): “Carnegie Mellon University”
- POS bigrams capture syntactic or stylistic information: “the answer which is …” versus “which is the answer”
- Line length can be a proxy for explanation depth
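
A minimal sketch of the lexical features above in plain Python; it is not LightSIDE's implementation. The tokenizer, the feature-name format, and measuring line length in tokens are assumptions. POS bigrams are left out because they would need a part-of-speech tagger.

```python
import re

# Assumed tokenizer: runs of word characters, or single punctuation marks,
# so "?" and "." become features of their own.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens, n):
    """All runs of n adjacent tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=3):
    """Binary unigram/bigram/trigram features plus a line-length feature."""
    tokens = tokenize(text)
    features = {}
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            features[f"{n}gram={gram}"] = 1
    # Line length in tokens, as a rough proxy for explanation depth.
    features["line_length"] = len(tokens)
    return features


question = extract_features("you think the answer is 9?")
statement = extract_features("you think the answer is 9.")
print("1gram=?" in question, "1gram=?" in statement)  # True False
print(question["line_length"])  # 7
```

Because punctuation is kept as a token, the only difference between the two sentences (“?” versus “.”) shows up as distinct unigram and bigram features.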

8
- A “contains non-stop word” feature can predict whether a conversational contribution is contentful: “ok sure” versus “the common denominator”
- Removing stop words eliminates some distracting features
- Stemming allows some generalization: multiple, multiply, multiplication
- Removing rare features is a cheap form of feature selection
  - Features that occur only once or twice in the corpus won’t generalize, so including them in the vector space is a waste of time
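
The sketch below illustrates these options. The stop-word list is a tiny hypothetical one (it includes “ok” and “sure” so that the slide's example comes out as non-contentful), `toy_stem` is a crude suffix stripper rather than a real stemmer such as Porter's, and the cutoff of three instances for rare features is an assumed threshold.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "i", "you", "it", "ok", "sure"}

# Crude suffix list for illustration only; longer suffixes are tried first.
SUFFIXES = ("ication", "ation", "ing", "ed", "es", "s", "y", "e")


def contains_non_stop_word(tokens):
    """True if any token is outside the stop-word list (a cue for contentful turns)."""
    return any(tok not in STOP_WORDS for tok in tokens)


def remove_stop_words(tokens):
    return [tok for tok in tokens if tok not in STOP_WORDS]


def toy_stem(word, min_stem=4):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def prune_rare(feature_dicts, min_count=3):
    """Drop features that occur in fewer than min_count instances."""
    counts = Counter()
    for feats in feature_dicts:
        counts.update(feats.keys())
    keep = {name for name, c in counts.items() if c >= min_count}
    return [{k: v for k, v in feats.items() if k in keep} for feats in feature_dicts]


print(contains_non_stop_word(["ok", "sure"]))  # False
print(contains_non_stop_word(["the", "common", "denominator"]))  # True
print([toy_stem(w) for w in ["multiple", "multiply", "multiplication"]])
# ['multipl', 'multipl', 'multipl']
```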

9
- Think like a computer!
- Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful
- Look for approximations
  - If you want to find questions, you don’t need to do a complete syntactic analysis
  - Look for question marks
  - Look for wh-terms that occur immediately before an auxiliary verb
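
A sketch of the two question approximations as surface features, with no parsing involved. The wh-word and auxiliary lists are assumed, incomplete examples.

```python
import re

# Assumed, incomplete word lists for illustration.
WH_WORDS = ["what", "why", "how", "where", "when", "who", "which"]
AUXILIARIES = ["is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "has", "have"]

# A wh-word immediately followed (after whitespace) by an auxiliary verb.
WH_AUX_RE = re.compile(
    r"\b(?:" + "|".join(WH_WORDS) + r")\s+(?:" + "|".join(AUXILIARIES) + r")\b",
    re.IGNORECASE,
)


def question_features(text):
    """Cheap surface cues for questions."""
    return {
        "has_question_mark": "?" in text,
        "wh_before_aux": WH_AUX_RE.search(text) is not None,
    }


print(question_features("you think the answer is 9?"))
# {'has_question_mark': True, 'wh_before_aux': False}
print(question_features("what is the common denominator"))
# {'has_question_mark': False, 'wh_before_aux': True}
```

The wh-before-auxiliary cue also fires on the relative clause “the answer which is …”, which is the kind of error an approximation makes; other features, such as the POS bigrams listed on slide 7, may help offset it.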

10 Click to extract text features

11 Select Logistic Regression as the learner
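
For intuition about what this learner does, here is a minimal logistic regression trained by stochastic gradient descent over sparse feature dictionaries. It is a generic textbook sketch, not LightSIDE's implementation. The learning rate, number of epochs, and L2 strength are arbitrary assumptions, and the L2 penalty is applied only to features active in the current example, a common simplification.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(examples, epochs=50, lr=0.1, l2=0.01, seed=0):
    """examples: list of (feature_dict, label) pairs with label 0 or 1."""
    weights, bias = {}, 0.0
    data = list(examples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            z = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
            error = sigmoid(z) - label  # derivative of log loss with respect to z
            bias -= lr * error
            for f, v in feats.items():
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (error * v + l2 * w)
    return weights, bias


def predict_proba(model, feats):
    """Probability of label 1 under the trained model."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) * v for f, v in feats.items()))


# Hypothetical toy data: label 1 = contentful contribution.
train = [
    ({"common": 1, "denominator": 1}, 1),
    ({"common": 1, "multiple": 1}, 1),
    ({"ok": 1, "sure": 1}, 0),
    ({"ok": 1}, 0),
]
model = train_logistic_regression(train)
print(predict_proba(model, {"common": 1, "denominator": 1}))
print(predict_proba(model, {"ok": 1}))
```

Each learned weight is a per-feature vote for or against the positive class, which is why, as slide 9 notes, the learner favors predictive features over meaningful ones.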

12 Evaluate the result by cross-validation over sessions
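
Cross-validation over sessions can be sketched, assuming one fold per session, as leave-one-session-out: every contribution from a session lands in the same test fold, so the estimate reflects performance on sessions the model has never seen. The majority-class learner is a hypothetical placeholder for any train/predict pair, such as a logistic regression.

```python
from collections import Counter, defaultdict


def session_folds(session_ids):
    """Yield (session, train_indices, test_indices), one fold per session."""
    by_session = defaultdict(list)
    for i, sid in enumerate(session_ids):
        by_session[sid].append(i)
    for sid in sorted(by_session):
        test_idx = by_session[sid]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(session_ids)) if i not in held_out]
        yield sid, train_idx, test_idx


def cross_validate_by_session(xs, ys, session_ids, train_fn, predict_fn):
    """Accuracy pooled over all held-out sessions."""
    correct = 0
    for _, train_idx, test_idx in session_folds(session_ids):
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(predict_fn(model, xs[i]) == ys[i] for i in test_idx)
    return correct / len(ys)


# Placeholder learner (hypothetical): always predicts the training majority label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def predict_majority(model, x):
    return model


texts = ["ok", "sure", "what is 9?", "ok", "common denominator"]
labels = ["no", "no", "yes", "no", "no"]
sessions = ["s1", "s1", "s2", "s2", "s3"]
print(cross_validate_by_session(texts, labels, sessions, train_majority, predict_majority))  # 0.8
```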

13 Run the experiment

15
- A sequence of 1 to 6 categories
- May include GAPs
  - A GAP can cover any single symbol
  - A GAP+ may cover any number of symbols
- Must not begin or end with a GAP
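
A sketch of a matcher for patterns with these constraints. Several details are assumptions rather than statements from the slide: the length limit counts GAP elements, a GAP matches exactly one symbol, a GAP+ matches one or more symbols, a pattern may match anywhere in the sequence, and tokens are mapped to categories through a hypothetical lookup table (unmapped tokens stand for themselves).

```python
GAPS = {"GAP", "GAP+"}
MAX_LEN = 6

# Hypothetical category lookup; tokens without an entry stand for themselves.
CATEGORIES = {"9": "NUM", "12": "NUM"}


def to_symbols(tokens):
    return [CATEGORIES.get(tok, tok) for tok in tokens]


def is_valid_pattern(pattern):
    """1 to 6 elements (GAPs counted, by assumption), not starting or ending with a GAP."""
    return (1 <= len(pattern) <= MAX_LEN
            and pattern[0] not in GAPS and pattern[-1] not in GAPS)


def _match_from(pattern, symbols, p, s):
    """Can pattern[p:] match a contiguous run of symbols starting at index s?"""
    if p == len(pattern):
        return True
    elem = pattern[p]
    if elem == "GAP":  # exactly one symbol, whatever it is
        return s < len(symbols) and _match_from(pattern, symbols, p + 1, s + 1)
    if elem == "GAP+":  # one or more symbols (assumption)
        return any(_match_from(pattern, symbols, p + 1, s + k)
                   for k in range(1, len(symbols) - s + 1))
    return (s < len(symbols) and symbols[s] == elem
            and _match_from(pattern, symbols, p + 1, s + 1))


def pattern_matches(pattern, tokens):
    """True if the pattern matches somewhere in the token sequence."""
    if not is_valid_pattern(pattern):
        raise ValueError(f"invalid pattern: {pattern}")
    symbols = to_symbols(tokens)
    return any(_match_from(pattern, symbols, 0, s) for s in range(len(symbols)))


tokens = "you think the answer is 9".split()
print(pattern_matches(["you", "GAP+", "answer"], tokens))  # True
print(pattern_matches(["you", "GAP", "answer"], tokens))   # False
print(pattern_matches(["answer", "GAP", "NUM"], tokens))   # True
```

Requiring patterns to start and end with a concrete category keeps a GAP from trivially absorbing the edges of a match.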

19
- Identify large error cells
- Make comparisons
  - Ask yourself how the misclassified instance is similar to the instances that were correctly classified with the same class it received (vertical comparison)
  - Ask how it is different from the instances of the class it should have been classified as (horizontal comparison)
(Figure: confusion matrix for the Positive and Negative classes)
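
The comparisons can be made concrete with a confusion matrix keyed by (gold, predicted) pairs. Treating gold labels as rows and predicted labels as columns is an assumption about orientation; under it, the vertical comparison set is the correctly classified instances in the error's column, and the horizontal set is the correctly classified instances in its row. The labels and predictions below are hypothetical.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Counts keyed by (gold_label, predicted_label)."""
    return Counter(zip(gold, predicted))


def largest_error_cells(gold, predicted, top=3):
    """Off-diagonal cells sorted by size, largest first."""
    cells = confusion_matrix(gold, predicted)
    errors = [(cell, n) for cell, n in cells.items() if cell[0] != cell[1]]
    return sorted(errors, key=lambda item: item[1], reverse=True)[:top]


def comparison_sets(gold, predicted, gold_label, predicted_label):
    """Indices for one error cell and its two comparison groups.

    vertical:   correctly classified instances of predicted_label (same column)
    horizontal: correctly classified instances of gold_label (same row)
    """
    pairs = list(zip(gold, predicted))
    errors = [i for i, p in enumerate(pairs) if p == (gold_label, predicted_label)]
    vertical = [i for i, p in enumerate(pairs) if p == (predicted_label, predicted_label)]
    horizontal = [i for i, p in enumerate(pairs) if p == (gold_label, gold_label)]
    return errors, vertical, horizontal


gold = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]
pred = ["Positive", "Negative", "Negative", "Positive", "Negative", "Negative"]
print(largest_error_cells(gold, pred, top=1))  # [(('Positive', 'Negative'), 2)]
print(comparison_sets(gold, pred, "Positive", "Negative"))  # ([1, 4], [2, 5], [0])
```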

21-24 Error Analysis on Development Set

27
- Positive: “is interesting”, “an interesting scene”
- Negative: “would have been more interesting”, “potentially interesting”, etc.
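
The word “interesting” appears in both classes, so the unigram alone cannot separate them; the words before it are what differ. A small keyword-in-context helper, run here on the slide's phrases treated as tiny documents (an illustrative assumption), makes that visible. Contexts such as “more” and “potentially” are the kind of signal bigram features could pick up.

```python
def left_contexts(texts, labels, target, width=2):
    """Group the (up to) `width` words preceding each occurrence of target by label."""
    contexts = {}
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                left = " ".join(tokens[max(0, i - width):i])
                contexts.setdefault(label, []).append(left)
    return contexts


phrases = ["is interesting", "an interesting scene",
           "would have been more interesting", "potentially interesting"]
classes = ["Positive", "Positive", "Negative", "Negative"]
print(left_contexts(phrases, classes, "interesting"))
# {'Positive': ['is', 'an'], 'Negative': ['been more', 'potentially']}
```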

33 * Note that in this case we get no benefit from feature selection over the original feature space.

34 (Figure: feature space divided into General, Domain A, and Domain B sections)
Why is this nonlinear? It represents the interaction between each feature and the Domain variable. Now that the feature space represents the nonlinearity, the algorithm that trains the weights can be linear.
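
One way to build such a feature space, assumed here rather than taken from the slides, is the augmentation trick from Daumé III's “Frustratingly Easy Domain Adaptation” (2007): each feature gets a shared General copy plus a copy tagged with the instance's domain. A linear model then learns a shared weight and a per-domain adjustment, so a feature's effective weight in Domain B is the sum of its General and Domain B weights, which is the feature-by-domain interaction. The feature names and weights below are hypothetical.

```python
def augment(features, domain):
    """Copy each feature into a shared General version and a domain-specific version."""
    augmented = {}
    for name, value in features.items():
        augmented[f"General:{name}"] = value
        augmented[f"{domain}:{name}"] = value
    return augmented


def linear_score(weights, features):
    """Plain linear model: weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical learned weights: "interesting" is positive in general,
# but Domain B adds a negative adjustment.
weights = {"General:interesting": 1.0, "DomainB:interesting": -1.5}
x = {"interesting": 1}
print(linear_score(weights, augment(x, "DomainA")))  # 1.0
print(linear_score(weights, augment(x, "DomainB")))  # -0.5
```

The scoring function stays linear; only the feature space changes, which is the point made on the slide.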

35-43 Healthcare Bill Dataset