Teaching Analytics with Case Studies: Finding Love in a Classification Tree Ruth Hummel, PhD JMP Academic Ambassador.

1 Teaching Analytics with Case Studies: Finding Love in a Classification Tree
Ruth Hummel, PhD JMP Academic Ambassador

2 Stats about Online Dating

3 Dating Apps, OKCupid, and Ethics
large-public-dataset-of-dating-site-users

4 Experimental Interventions
Other Demographics Response

5 Modeling for Change
Explanatory: Explain the relationship between variables. Interpret coefficients, e.g.:
“For women, after age 21 each additional year of age corresponded to a 3.4% decrease in the rate of finding a romantic partner during that year.”
“Seeing suggested matches with the same education level corresponded to a 4.1% increase in the rate of finding a romantic partner during that year.”
Predictive: Find a model that performs well on predicting similar data. E.g., score Online Dating users according to their demographic, personality, and preference information in order to predict the likelihood of their success in finding a romantic partner.
Prescriptive: Change something in order to achieve the gains your model suggests. Since the Education Level Matching resulted in higher success, implement this for all users (or for users in certain target groups where the success rate for this is especially high).

6 Explore the Data…

7

8

9 Analysis Plan
Univariate Exploratory Data Analysis (and Data Cleaning, if needed)
Bivariate Exploratory Data Analysis
Explanatory or Predictive Models? Let’s look at:
Multiple Logistic Regression, main effects and interactions
Classification Tree
…If we had a continuous response, we would look at:
Multiple Linear Regression, main effects and interactions
Regression Tree

10 *Importance of Holding Out Validation Data
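
To make the holdout idea concrete, here is a minimal sketch in Python (the presentation itself works in JMP). The 60/20/20 proportions, the seed, and the function name `split_rows` are assumptions chosen for illustration, not values from the slides.

```python
import random

def split_rows(n_rows, train_frac=0.6, valid_frac=0.2, seed=1):
    """Randomly assign row indices to training, validation, and test sets.

    The fractions are illustrative assumptions; whatever is left over
    after training and validation goes to the test set.
    """
    indices = list(range(n_rows))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = round(n_rows * train_frac)
    n_valid = round(n_rows * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_rows(1000)
print(len(train), len(valid), len(test))
```

The model is fit on the training rows only; the validation rows are used to compare candidate models, and the test rows are kept aside for a final check.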

11 Logistic Regression

12

13

14

15 Interaction: Among users shown matches with their own education level, highly educated users have a LOWER chance of finding love than medium-educated users. This pattern does not hold for the random education intervention group.

16 Interaction: Among users shown matches with their own education level, highly educated users have a LOWER chance of finding love than medium-educated users. This pattern does not hold for the random education intervention group.
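
A small sketch of how an interaction term produces this kind of pattern in a logistic regression. All coefficient values below are invented for illustration and are not the fitted values from the presentation; `prob_success` is a hypothetical helper name.

```python
import math

def logistic(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, invented for illustration only.
INTERCEPT = -1.0
B_HIGH_EDU = 0.2      # high vs. medium education (main effect)
B_MATCHED = 0.4       # same-education matching vs. random (main effect)
B_INTERACTION = -0.5  # extra shift when high education AND matched

def prob_success(high_edu, matched):
    """Predicted probability of finding a partner for 0/1 indicator inputs."""
    z = (INTERCEPT
         + B_HIGH_EDU * high_edu
         + B_MATCHED * matched
         + B_INTERACTION * high_edu * matched)
    return logistic(z)

for matched in (0, 1):
    group = "same-education matches" if matched else "random education"
    for high_edu in (0, 1):
        level = "high" if high_edu else "medium"
        print(f"{group:>22}, {level:>6} education: "
              f"{prob_success(high_edu, matched):.3f}")
```

With these made-up numbers, high education raises the predicted probability in the random group but lowers it in the same-education group, because the negative interaction term outweighs the positive main effect of education.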

17 Classification Tree

18

19 Classification Tree (Partition)

20 First Split

21 Second Split

22 Third Split

23 Lots more splits…

24 After 14 Splits
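
As a rough sketch of what a single split does, the code below searches one numeric predictor for the threshold that best separates "Yes" from "No". It uses Gini impurity as a stand-in criterion, which is not necessarily the criterion JMP's Partition platform uses, and the ages and outcomes are made-up toy data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_yes = sum(labels) / n
    return 2 * p_yes * (1 - p_yes)

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best rule `value < threshold`."""
    best = None
    # Skip the smallest value: splitting there would leave the left side empty.
    for threshold in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Toy data, invented for illustration: age and whether a partner was found.
ages = [22, 25, 27, 30, 34, 38, 41, 45]
found_partner = [1, 1, 1, 1, 0, 0, 1, 0]

threshold, impurity = best_split(ages, found_partner)
print(f"split at age < {threshold}, weighted Gini = {impurity:.4f}")
```

A tree repeats this search over every predictor, takes the best split, and then recurses within each resulting branch, which is how the partition grows to many splits.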

25

26

27

28 Partition – Profiler

29

30

31 Change Classification Cutoff of the Probability?

32

33 In the “Probability of Yes” distribution, what probability corresponds to the 80th percentile (i.e., what probability cutoff would let us classify the most likely 20% as “Yes” – even if they aren’t actually very likely to be “Yes”)?
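
One way to answer this question outside JMP is sketched below. The predicted probabilities are made-up toy values and `cutoff_for_top_fraction` is a hypothetical helper, not something from the presentation.

```python
def cutoff_for_top_fraction(probs, top_fraction=0.20):
    """Return the smallest predicted probability among the top `top_fraction` of rows."""
    ranked = sorted(probs, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))  # number of rows to flag as "Yes"
    return ranked[k - 1]

# Toy "Probability of Yes" values, invented for illustration.
probs = [0.02, 0.05, 0.07, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.30]

cutoff = cutoff_for_top_fraction(probs)
predicted_yes = [p >= cutoff for p in probs]
print(cutoff, sum(predicted_yes))
```

In this toy data the cutoff falls well below 0.5, which illustrates the point of the slide: the top 20% are the most likely "Yes" cases even though none of them is individually very likely. If several rows tie at the cutoff, slightly more than 20% will be classified as "Yes".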

34

35 True Positives False Positives
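
The trade-off between true positives and false positives can be sketched by counting outcomes at two different cutoffs. The probabilities, outcomes, and the cutoffs 0.5 and 0.2 are toy values assumed for illustration, not results from the presentation.

```python
def confusion_counts(probs, actual, cutoff):
    """Count TP, FP, FN, TN when rows with probability >= cutoff are classified "Yes"."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, y in zip(probs, actual):
        predicted_yes = p >= cutoff
        if predicted_yes and y == 1:
            counts["TP"] += 1
        elif predicted_yes and y == 0:
            counts["FP"] += 1
        elif not predicted_yes and y == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Toy predicted probabilities and actual outcomes (1 = found a partner).
probs = [0.05, 0.10, 0.15, 0.22, 0.25, 0.30, 0.35, 0.55]
actual = [0, 0, 0, 1, 0, 1, 0, 1]

for cutoff in (0.5, 0.2):
    print(cutoff, confusion_counts(probs, actual, cutoff))
```

Lowering the cutoff captures more of the actual "Yes" cases (more true positives) at the cost of flagging more "No" cases as "Yes" (more false positives).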

36 Back to Logistic Regression to change the classification cutoff…

37 False Positives True Positives

38 Bootstrap Forest

39

40

41 Boosted Tree

42

43

44 https://gizmodo.com/the-future-of-online-dating-is-unsexy-and-brutally-effe-1819781116

