Moving Ahead: Creative Feature Extraction and Error Analysis Techniques Carolyn Penstein Rosé Carnegie Mellon University Funded through the Pittsburgh.


1 Moving Ahead: Creative Feature Extraction and Error Analysis Techniques Carolyn Penstein Rosé Carnegie Mellon University Funded through the Pittsburgh Science of Learning Center and The Office of Naval Research, Cognitive and Neural Sciences Division

2 Outline
• New Feature Creation
• Error Analysis

3 New Feature Creation

4 Why create new features?
• You may want to generalize across sets of related words
  • Color = {red, yellow, orange, green, blue}
  • Food = {cake, pizza, hamburger, steak, bread}
• You may want to detect contingencies
  • The text must mention both cake and presents in order to count as a birthday party
• You may want to combine these
  • The text must include a color and a food

5 Why create new features by hand?
• More likely to capture meaningful generalizations
• Build in knowledge so you can get by with less training data

6 Rule Language
• ANY() is used to create lists
  • COLOR = ANY(red,yellow,green,blue,purple)
  • FOOD = ANY(cake,pizza,hamburger,steak,bread)
• ALL() is used to capture contingencies
  • ALL(cake,presents)
• More complex rules
  • ALL(COLOR,FOOD)
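The semantics of ANY() and ALL() can be sketched in a few lines of Python. This is only an illustration of how such rules might be evaluated, not TagHelper's implementation: the `tokens` helper, the `_match` function, and the decision to treat a text as a set of lowercase words are all assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase the text and return its words as a set (assumed tokenization)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def _match(part, toks):
    """A part is either a plain word or another rule (a callable)."""
    return part(toks) if callable(part) else part in toks

def ANY(*parts):
    """Rule that matches when at least one part matches."""
    return lambda toks: any(_match(p, toks) for p in parts)

def ALL(*parts):
    """Rule that matches only when every part matches."""
    return lambda toks: all(_match(p, toks) for p in parts)

# The rules from the slide, written with the helpers above.
COLOR = ANY("red", "yellow", "green", "blue", "purple")
FOOD = ANY("cake", "pizza", "hamburger", "steak", "bread")
BIRTHDAY = ALL("cake", "presents")
COLOR_AND_FOOD = ALL(COLOR, FOOD)

print(COLOR_AND_FOOD(tokens("I ate a green pizza")))   # color and food both present
print(BIRTHDAY(tokens("We had cake but no gifts")))    # "presents" is missing
```

Because rules are ordinary callables here, a list such as COLOR can be nested inside ALL() the same way the slide nests it.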

7 Group Project: Make a rule that will match against questions but not statements
Question: Tell me what your favorite color is.
Statement: I tell you my favorite color is blue.
Question: Where do you live?
Statement: I live where my family lives.
Question: Which kinds of baked goods do you prefer?
Statement: I prefer to eat wheat bread.
Question: Which courses should I take?
Statement: You should take my applied machine learning course.
Question: Tell me when you get up in the morning.
Statement: I get up early.

8 Possible Rule
• ANY(ALL(tell,me),BOL_WDT,BOL_WRB)
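One way to check this rule against the ten group-project examples is the Python sketch below. It assumes that BOL_WDT and BOL_WRB mean "the line begins with a wh-determiner" and "the line begins with a wh-adverb", and it replaces a real part-of-speech tagger with two tiny hand-written word lists (`WDT_WORDS`, `WRB_WORDS`), which are hypothetical and far from complete.

```python
import re

# Crude stand-ins for a POS tagger (assumption, not a real tag lexicon).
WDT_WORDS = {"which", "whichever"}
WRB_WORDS = {"where", "when", "why", "how"}

def words(text):
    """Lowercase word tokens in their original order."""
    return re.findall(r"[a-z']+", text.lower())

def is_question(text):
    """Approximates ANY(ALL(tell,me), BOL_WDT, BOL_WRB)."""
    toks = words(text)
    if not toks:
        return False
    tell_me = "tell" in toks and "me" in toks   # ALL(tell,me)
    bol_wdt = toks[0] in WDT_WORDS              # line starts with a wh-determiner
    bol_wrb = toks[0] in WRB_WORDS              # line starts with a wh-adverb
    return tell_me or bol_wdt or bol_wrb

EXAMPLES = [
    ("Question", "Tell me what your favorite color is."),
    ("Statement", "I tell you my favorite color is blue."),
    ("Question", "Where do you live?"),
    ("Statement", "I live where my family lives."),
    ("Question", "Which kinds of baked goods do you prefer"),
    ("Statement", "I prefer to eat wheat bread."),
    ("Question", "Which courses should I take?"),
    ("Statement", "You should take my applied machine learning course."),
    ("Question", "Tell me when you get up in the morning."),
    ("Statement", "I get up early."),
]

for label, text in EXAMPLES:
    print(label, is_question(text), text)
```

Note that "I tell you my favorite color is blue" is rejected because it contains "tell" but not "me", and "I live where my family lives" is rejected because "where" is not at the beginning of the line.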

9 Advanced Feature Editing * Click here

10 Types of Basic Features
• Primitive features include unigrams, bigrams, and POS bigrams

11 Types of Basic Features
• The Options change which primitive features show up in the Unigram, Bigram, and POS bigram lists
  • You can choose to remove stopwords or not
  • You can choose whether or not to strip endings off words with stemming
  • You can choose how frequently a feature must appear in your data in order for it to show up in your lists
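The effect of these three options on a unigram list can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the stopword list is a toy one, `crude_stem` is a naive suffix stripper rather than the stemmer TagHelper uses, and "frequency" is interpreted as the number of documents a word appears in.

```python
import re
from collections import Counter

# Toy stopword list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "i", "my"}

def crude_stem(word):
    """Strip a few common endings; a naive stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def unigram_features(docs, remove_stopwords=False, stem=False, min_count=1):
    """Return the sorted unigrams that appear in at least min_count documents."""
    counts = Counter()
    for doc in docs:
        toks = re.findall(r"[a-z']+", doc.lower())
        if remove_stopwords:
            toks = [t for t in toks if t not in STOPWORDS]
        if stem:
            toks = [crude_stem(t) for t in toks]
        counts.update(set(toks))  # count each word once per document
    return sorted(w for w, c in counts.items() if c >= min_count)

docs = ["I like cakes", "The cake is good", "I prefer bread"]
print(unigram_features(docs))
print(unigram_features(docs, remove_stopwords=True, stem=True, min_count=2))
```

With stemming on, "cakes" and "cake" collapse into one feature, which is what lets it clear the frequency threshold of 2.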

12 Types of Basic Features * Now let’s look at how to create new features.

13 Creating New Features
* The feature editor allows you to create new feature definitions
* Click on + to add your new feature

14 Examining a New Feature
Right-click on a feature to examine where it matches in your data

15 Examining a New Feature

16 Error Analysis

17 Create an Error Analysis File

18 Use TagHelper to Code Uncoded File
The output file contains the codes TagHelper assigned. What you want to do now is remove the prediction column and insert the correct answers next to the TagHelper-assigned answers.
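If you would rather script this step than edit the spreadsheet by hand, the pairing of assigned codes with correct answers might look like the Python sketch below. It is a rough illustration only: it works on CSV text rather than .xls files, and the column names `Text`, `Code`, and `Answer` are hypothetical, so adjust them to whatever your files actually contain.

```python
import csv
import io

def build_error_analysis_rows(coded_csv, answers_csv,
                              text_col="Text", code_col="Code",
                              answer_col="Answer"):
    """Pair each assigned code with the correct answer for the same row.

    Both inputs are CSV text with a header row; the rows are assumed to be
    in the same order in both files.
    """
    coded = list(csv.DictReader(io.StringIO(coded_csv)))
    answers = list(csv.DictReader(io.StringIO(answers_csv)))
    if len(coded) != len(answers):
        raise ValueError("both files must have the same number of rows")
    return [
        {"Answer": a[answer_col], "Predicted": c[code_col], "Text": c[text_col]}
        for c, a in zip(coded, answers)
    ]

coded = "Text,Code\nwhere do you live,Q\nI get up early,Q\n"
answers = "Answer\nQ\nS\n"
for row in build_error_analysis_rows(coded, answers):
    print(row)
```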

19 Load Error Analysis File

20

21 Error Analysis Strategies
• Look for large error cells in the confusion matrix
• Locate the examples that correspond to that cell
• What features do those examples share?
• How are they different from the examples that were classified correctly?
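The first three strategies can be sketched in Python as follows. This is an illustrative outline rather than what TagHelper does internally; the function names and the tiny data set are made up for the example, and "shared features" is simplified to words that occur in every example of a cell.

```python
from collections import Counter

def confusion_cells(gold, predicted):
    """Count how often each (correct label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))

def largest_error_cell(gold, predicted):
    """Return the off-diagonal cell with the highest count, or None."""
    errors = {cell: n for cell, n in confusion_cells(gold, predicted).items()
              if cell[0] != cell[1]}
    return max(errors, key=errors.get) if errors else None

def examples_in_cell(texts, gold, predicted, cell):
    """Locate the examples that fall into one confusion-matrix cell."""
    return [t for t, g, p in zip(texts, gold, predicted) if (g, p) == cell]

def shared_words(examples):
    """Words that appear in every one of the given examples."""
    sets = [set(e.lower().split()) for e in examples]
    return set.intersection(*sets) if sets else set()

# Made-up data: Q = question, S = statement.
texts = ["where do you live", "tell me your name", "tell me your age",
         "I get up early", "where I live is far"]
gold = ["Q", "Q", "Q", "S", "S"]
pred = ["Q", "S", "S", "S", "Q"]

cell = largest_error_cell(gold, pred)
mistakes = examples_in_cell(texts, gold, pred, cell)
print(cell, mistakes, shared_words(mistakes))
```

Comparing `shared_words` for an error cell with the same set computed over correctly classified examples of that class is one simple way to approach the fourth question.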

22 Group Project
• Load in the NewsGroupTrain.xls data set
• What is the best performance you can get by playing with the standard TagHelper tools feature options?
• Train a model using the best settings and then use it to assign codes to NewsGroupTest.xls
• Copy in the Answer column from NewsGroupAnswers.xls
• Now do an error analysis to determine why frequent mistakes are being made
• How could you do better?

23

