Feature Engineering Studio September 23, 2013. Welcome to Mucking Around Day.


1 Feature Engineering Studio September 23, 2013

2 Welcome to Mucking Around Day

3 Sort into pairs Partner with the person next to you One group of 3 is allowed

4 Sort into pairs Do we have a group of 3? One of the 3 will work with me

5 Sort into pairs Go over your reports together – A maximum of 5 minutes apiece

6 5 minutes for first person

7 5 minutes for second person

8 Re-assemble into one big group

9 Who here found something really cool while mucking around? Show us, tell us

10 Who here found a histogram with a normal distribution? Show us, tell us

11 Who here found a histogram with a hypermode? Show us, tell us

12 Who here found a histogram with a flat distribution? Show us, tell us

13 Who here found a histogram with a skewed distribution? Show us, tell us

14 Who here found a histogram with a bimodal distribution? Show us, tell us

15 Who here found a histogram with something else interesting? Show us, tell us

16 Who here found something surprising with their min, max, average, stdev?

17 Categorical variables Who here found something curious, weird, or interesting in the distribution of their categorical variables?

18 Who here hasn’t spoken yet? (and analyzed data) Tell us something interesting you found in your data

19 Who here played with pivot tables? What did you learn?

20 My turn to play with pivot tables Who wants to volunteer their data? (I might request a 2nd or 3rd data set, depending on how the 1st one goes)

21 Who here played with vlookup? What did you learn?

22 My turn to play with vlookup Using the same volunteered data set(s)

23 Other cool things you can create with a few simple formulas (plus demos!)

24 Identifying specific cases of interest

25 Did event of interest ever occur for student?
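Outside Excel, the "did it ever occur" feature can be sketched in a few lines of Python. This is only an illustration: the log layout, the student IDs, and the `ever_occurred` helper are assumptions made up for the example, not something shown in the slides.

```python
# Hypothetical action log: (student_id, action) pairs in time order.
log = [
    ("s1", "attempt"),
    ("s1", "help"),
    ("s2", "attempt"),
    ("s2", "attempt"),
    ("s3", "help"),
]


def ever_occurred(log, event):
    """Map each student to True if `event` appears anywhere in their rows."""
    flags = {}
    for student, action in log:
        flags[student] = flags.get(student, False) or (action == event)
    return flags


ever_help = ever_occurred(log, "help")
```

The result is one value per student, which can then be joined back onto every row for that student.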

26 Counts-so-far (and total value for student)
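A counts-so-far feature might look like the following Python sketch. It assumes a time-ordered log of (student, action) pairs and counts the current row as part of "so far"; both choices, and the `counts_so_far` name, are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical action log: (student_id, action) pairs in time order.
log = [
    ("s1", "attempt"),
    ("s1", "help"),
    ("s2", "attempt"),
    ("s2", "attempt"),
    ("s3", "help"),
]


def counts_so_far(log, event):
    """Return (per-row running count, per-student total) for `event`."""
    running = defaultdict(int)
    per_row = []
    for student, action in log:
        if action == event:
            running[student] += 1
        # Reading running[student] also registers students with zero events.
        per_row.append(running[student])
    return per_row, dict(running)


help_so_far, help_total = counts_so_far(log, "help")
```

The per-row list is the "so far" feature and the dictionary is the total value for each student.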

27 Counts-last-N-actions
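One possible way to compute a counts-last-N-actions feature is a fixed-length window per student. The sketch below assumes the window includes the current action and that early rows simply use however many actions exist so far; the data and the `counts_last_n` helper are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical action log: (student_id, action) pairs in time order.
log = [
    ("s1", "error"),
    ("s2", "error"),
    ("s1", "error"),
    ("s1", "correct"),
    ("s2", "correct"),
    ("s1", "error"),
]


def counts_last_n(log, event, n):
    """For each row, count `event` among that student's last n actions."""
    windows = defaultdict(lambda: deque(maxlen=n))
    per_row = []
    for student, action in log:
        # A deque with maxlen drops the oldest entry automatically.
        windows[student].append(action == event)
        per_row.append(sum(windows[student]))
    return per_row


errors_last_2 = counts_last_n(log, "error", 2)
```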

28 First attempts
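A first-attempt flag can be sketched as follows, assuming "first attempt" means the first row for a given student on a given problem. That definition, the sample rows, and the `first_attempt_flags` name are assumptions for this example.

```python
# Hypothetical log: (student_id, problem_id) pairs in time order.
log = [
    ("s1", "p1"),
    ("s1", "p1"),
    ("s2", "p1"),
    ("s1", "p2"),
    ("s2", "p1"),
]


def first_attempt_flags(log):
    """Flag each row that is the student's first row on that problem."""
    seen = set()
    flags = []
    for student, problem in log:
        key = (student, problem)
        flags.append(key not in seen)
        seen.add(key)
    return flags


is_first = first_attempt_flags(log)
```

Filtering on this flag lets later features, such as percent correct, be computed on first attempts only.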

29 Ratios between events of interest
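A ratio feature could be sketched like this in Python, here as help requests per attempt for each student. The event names, the data, and the choice to return `None` when the denominator is zero are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical action log: (student_id, action) pairs.
log = [
    ("s1", "attempt"),
    ("s1", "help"),
    ("s1", "attempt"),
    ("s2", "attempt"),
    ("s3", "help"),
]


def event_ratio(log, numerator, denominator):
    """Per student: count of `numerator` events over count of `denominator` events."""
    num = Counter(s for s, a in log if a == numerator)
    den = Counter(s for s, a in log if a == denominator)
    students = {s for s, _ in log}
    # None marks an undefined ratio instead of raising ZeroDivisionError.
    return {s: (num[s] / den[s] if den[s] else None) for s in students}


help_per_attempt = event_ratio(log, "help", "attempt")
```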

30 How many students had 3 (or 4, 5, 2,…) of an event
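Counting how many students had exactly k occurrences of an event amounts to a histogram over per-student counts. The sketch below is one way to do it; the data and the `students_per_count` helper are hypothetical.

```python
from collections import Counter

# Hypothetical action log: (student_id, action) pairs.
log = [
    ("s1", "help"),
    ("s1", "help"),
    ("s2", "help"),
    ("s3", "attempt"),
    ("s4", "help"),
    ("s4", "help"),
]


def students_per_count(log, event):
    """Map k -> number of students with exactly k occurrences of `event`."""
    per_student = Counter()
    for student, action in log:
        # Adding a bool adds 1 or 0, so students with no events stay at 0.
        per_student[student] += (action == event)
    return Counter(per_student.values())


help_histogram = students_per_count(log, "help")
```

Here two students asked for help twice, one asked once, and one never asked.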

31 Times-so-far
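Times-so-far can be read as a running total of time per student. The following sketch assumes each row carries a duration in seconds and that the current row counts toward the total; the layout and the `time_so_far` name are assumptions.

```python
from collections import defaultdict

# Hypothetical log: (student_id, seconds spent on this action) in time order.
log = [
    ("s1", 4.0),
    ("s2", 10.0),
    ("s1", 6.0),
    ("s2", 2.5),
]


def time_so_far(log):
    """For each row, the student's cumulative time including this action."""
    running = defaultdict(float)
    per_row = []
    for student, seconds in log:
        running[student] += seconds
        per_row.append(running[student])
    return per_row


cumulative_seconds = time_so_far(log)
```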

32 Cutoff-based features
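A cutoff-based feature turns a continuous value into a yes/no flag. In the sketch below the 3-second threshold is an arbitrary assumption chosen for the example, not a value from the slides, and the data is made up.

```python
# Hypothetical response times in seconds, one per action.
times = [1.2, 8.0, 2.9, 3.0, 45.0]

# Assumed cutoff for illustration; in practice it would be chosen from the data.
FAST_CUTOFF_SECONDS = 3.0


def below_cutoff(values, cutoff):
    """Flag each value that is strictly below the cutoff."""
    return [v < cutoff for v in values]


is_fast = below_cutoff(times, FAST_CUTOFF_SECONDS)
```

Looking at the histogram of the raw values first is a reasonable way to pick a cutoff that separates meaningful groups.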

33 Unitized actions (such as unitized time)
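One common reading of "unitized time" is time expressed in standard deviations from the mean for the same problem step, so that slow steps and fast steps become comparable. The sketch below follows that reading, which is an assumption about what the slide intends; the data, the use of population standard deviation, and the `unitize` helper are also assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical log: (step_id, seconds) pairs.
log = [
    ("step1", 2.0),
    ("step1", 4.0),
    ("step1", 6.0),
    ("step2", 10.0),
    ("step2", 10.0),
]


def unitize(log):
    """Express each time as SDs above or below the mean for its step."""
    by_step = defaultdict(list)
    for step, seconds in log:
        by_step[step].append(seconds)
    stats = {step: (mean(v), pstdev(v)) for step, v in by_step.items()}
    unitized = []
    for step, seconds in log:
        mu, sd = stats[step]
        # With no variation in a step, fall back to 0.0 rather than divide by zero.
        unitized.append((seconds - mu) / sd if sd > 0 else 0.0)
    return unitized


z_times = unitize(log)
```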

34 Last 3 or 5 unitized

35 Comparing earlier behaviors to later behaviors through caching

36 Counts-if

37 Percentages of action type

38 Percentages of time spent per action/location/KC/etc.
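The percentage features can be sketched in Python as a share of each student's total. The example below computes percentage of time per action type; the log layout, the action names, and the `percent_time_by_action` helper are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical log: (student_id, action_type, seconds).
log = [
    ("s1", "read", 30.0),
    ("s1", "solve", 60.0),
    ("s1", "help", 10.0),
    ("s2", "solve", 20.0),
    ("s2", "solve", 20.0),
]


def percent_time_by_action(log):
    """Per student: percentage of total time spent on each action type."""
    totals = defaultdict(float)
    by_action = defaultdict(lambda: defaultdict(float))
    for student, action, seconds in log:
        totals[student] += seconds
        by_action[student][action] += seconds
    return {
        student: {a: 100.0 * t / totals[student] for a, t in actions.items()}
        for student, actions in by_action.items()
    }


percent_time = percent_time_by_action(log)
```

Replacing each duration with 1 gives the percentage of actions of each type instead of the percentage of time.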

39 Questions? Comments?

40 Other cool ideas?

41 Assignment 3 Feature Engineering 1 “Bring Me a Rock”
Get your data set
Open it in Excel
Create as many features as you feel inspired to create
  – Features should be created with the goal of predicting your ground truth variable
  – At least 12 separate features that are not just variations on a theme (e.g. “time for last 3 actions” and “time for last 4 actions” are variations on a theme, but “time for last 3 actions” and “total time between help requests and next action” are two separate features)
For each feature, write a 1-3 sentence “just so story” for why it might work
Test how good each feature is

42 Testing Feature Goodness
For this assignment, there are a bunch of ways to test feature goodness
Single-feature prediction models in a data mining or stats package, giving correlation or kappa (special session this Wednesday)
Compute correlation in Excel (want to see?)
  – You can do this with binary variables too, although it’s not really optimal
Compute t-test in Excel (want to see?)
Compute kappa in Excel (if you don’t know how, it is easier to do in RapidMiner)
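For readers who want to check their Excel numbers, the two statistics named above can be sketched in plain Python. This is a minimal illustration using the standard Pearson correlation and Cohen's kappa formulas; the function names and sample data are assumptions, and neither function guards against degenerate input (a constant column, or chance agreement of exactly 1).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def cohen_kappa(pred, truth):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    n = len(pred)
    labels = set(pred) | set(truth)
    observed = sum(p == t for p, t in zip(pred, truth)) / n
    expected = sum((pred.count(l) / n) * (truth.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)


# Hypothetical feature values and ground truth labels.
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the kappa example, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.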

43 Were you right? Which of your “just so stories” seem to be correct? Did any of your features correlate in the opposite direction from what you expected?

44 Assignment 3
Write a brief report for me
Email me an Excel sheet with your features
You don’t need to prepare a presentation
But be ready to discuss your features in class

45 Next Classes
9/25 Special Session
  – Using RapidMiner to Produce Prediction Models
  – Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool)
  – Statistical significance tests using linear regression don’t count…
9/30 Advanced Feature Distillation in Excel
  – Assignment 3 due
  – Online Equation Solver Tutorials should be in your INBOX

46 Upcoming Classes
10/2 Special session on prediction models
  – Come to this if you don’t know why student-level cross-validation is important, or if you don’t know what J48 is
10/7 Advanced Feature Distillation in Google Refine
10/9 Special session? TBD.

