1 Hypothesis Testing

2 Coke vs. Pepsi
Hypothesis: tweets reflect market share (people tweet about each brand as much as they drink it)
Market share:
– 67% vs. 33%
From tweets:
– 71% vs. 29%
Did this happen by chance? Or do people tend to talk about Coke more than they drink it?

3 A simpler hypothesis test
Claim: I can distinguish Coke and Pepsi just by tasting.
How would you verify my claim?

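4 It's like a court judgment
If you want to prove something, you assume the opposite and look for evidence that contradicts it.
In court, you want to prove a defendant guilty, so you start by assuming he/she is innocent.
In statistics, this starting assumption is called the null hypothesis; here it is that I cannot tell Coke and Pepsi apart.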

5 You conducted an experiment…
And got an outcome:
– 62 out of 100 correct
Assuming I cannot distinguish them and was just guessing at random, is this result possible?
Of course it is possible: if I'm lucky, I could even get 100 out of 100.
But is the result surprising?
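How surprising is 62 out of 100? Under the guessing assumption, each of the 100 answers is an independent 50/50 event. That means the score follows a binomial distribution, so the chance of scoring 62 or more can also be computed exactly by summing its terms. This is a minimal sketch added for comparison with the simulations; `tail_probability` is an illustrative name, not something from the slides.

```python
from math import comb


def tail_probability(n_trials: int, threshold: int, p: float = 0.5) -> float:
    """P(X >= threshold) for X ~ Binomial(n_trials, p), summed term by term.

    comb(n_trials, k) counts the ways to get exactly k correct answers;
    p**k * (1 - p)**(n_trials - k) is the probability of any one such way.
    """
    return sum(
        comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(threshold, n_trials + 1)
    )


# Chance that a pure guesser gets 62 or more of 100 tastings right.
p_value = tail_probability(100, 62)
print(f"P(score >= 62 | guessing) = {p_value:.4f}")
```

The result comes out at roughly 1%: a pure guesser reaches 62 about once in a hundred games. This is the value the simulated percentages in the following slides should approach.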

6 How do we define surprising-ness?
Let's play the random guessing game one million times.
If it turns out that in only 4 of those 1 million games someone manages to score 62 or more, then you would have to be super duper lucky to do that: only 0.0004% of games (a probability of 4 in 1,000,000 = 0.000004) reach that score.
Put the other way round, 99.9996% of guessing games score below 62, so it would be very surprising to get 62 in one game just by luck.
Thus I am probably able to distinguish Coke and Pepsi to some extent.
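Playing the guessing game one million times is quick on a computer. This is a small sketch using only Python's standard `random` module. To keep a million games fast, it assumes a shortcut: `random.getrandbits(100)` produces 100 independent fair bits at once, and counting the 1 bits is equivalent to counting heads in 100 coin flips. `play_one_game`, `N_GAMES` and `N_CUPS` are illustrative names.

```python
import random

random.seed(42)  # fixed seed so reruns give the same numbers

N_GAMES = 1_000_000  # how many times we play the random guessing game
N_CUPS = 100         # tastings per game


def play_one_game() -> int:
    """Guess N_CUPS times at random and return the number of correct guesses.

    getrandbits(N_CUPS) yields N_CUPS independent fair bits at once; counting
    the 1 bits is the same as counting heads in N_CUPS coin flips.
    """
    return bin(random.getrandbits(N_CUPS)).count("1")


# Count the games where pure luck reaches 62 or more.
lucky_games = sum(1 for _ in range(N_GAMES) if play_one_game() >= 62)
fraction = lucky_games / N_GAMES
print(f"{lucky_games} of {N_GAMES:,} games scored 62 or more ({fraction:.4%})")
```

Real random guessing reaches 62 far more often than the "4 in a million" used as an example on the slide. The printed fraction should be close to 1%.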

7 But we can't play this game that many times… Or can we?
Open Excel.
In cell B1, type =RAND()
Can you make B1 say 0 if the random number is less than 0.5, and 1 otherwise?
You just flipped a coin in Excel!
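One possible Excel answer is to type =IF(RAND()<0.5,0,1) into B1. Below is the same coin flip as a minimal Python sketch using the standard `random` module. `flip_coin` is an illustrative name, not part of any library.

```python
import random


def flip_coin() -> int:
    """Mirror the Excel cell: draw a uniform number in [0, 1), return 0 if it is below 0.5, else 1."""
    r = random.random()          # like =RAND()
    return 0 if r < 0.5 else 1   # like =IF(RAND()<0.5,0,1)


flips = [flip_coin() for _ in range(10)]
print(flips)
```

Each call is a fresh flip, just as Excel redraws RAND() whenever the sheet recalculates.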

8 Random Guessing Game in Excel
Flip the coin 100 times in the same column (cells B1 through B100).
Find out how many heads (1s) you got, in cell B101.
We've just played the random guessing game once. Can you do it 10 times?
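In Excel, one way to fill B101 is =SUM(B1:B100). This sketch treats 1 as heads, following the previous slide. It is the Python counterpart of that column: one game is 100 flips followed by a count of the heads, and the game is then played 10 times. `play_guessing_game` is a hypothetical helper name.

```python
import random


def play_guessing_game(n_flips: int = 100) -> int:
    """One game: flip a fair coin n_flips times and count the heads (1s).

    This mirrors filling B1:B100 with coin flips and summing them in B101.
    """
    flips = [0 if random.random() < 0.5 else 1 for _ in range(n_flips)]
    return sum(flips)


# Play the game 10 times, like copying the Excel column 10 times.
scores = [play_guessing_game() for _ in range(10)]
print(scores)
```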

9 Histogram
We want to find out how many times we scored 62 or higher.
It's also interesting to look at how the scores are distributed, i.e. which scores are more likely.
A chart of how often each score occurs is called a histogram.
Let's create one by hand.
Then in Excel.
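A histogram "by hand" is just a tally of how often each score occurs. This minimal sketch builds that tally with `collections.Counter` and draws one `*` per game. It also counts how many games reached 62 or higher. The helper names are illustrative.

```python
import random
from collections import Counter


def play_guessing_game(n_flips: int = 100) -> int:
    """Number of heads in n_flips fair coin flips."""
    return sum(random.random() >= 0.5 for _ in range(n_flips))


def text_histogram(scores: list[int]) -> str:
    """One row per observed score, one '*' per game that got that score."""
    counts = Counter(scores)
    return "\n".join(f"{score:3d} | {'*' * counts[score]}" for score in sorted(counts))


random.seed(1)
scores = [play_guessing_game() for _ in range(10)]
print(text_histogram(scores))
print("Games with 62 or higher:", sum(s >= 62 for s in scores))
```

With only 10 games most scores appear once, so the tally looks ragged. That is the point of the next slide.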

10 Now do it 50 times! (or more… doesn't have to be exact)
Does the histogram look better?
What about 500 times? Look at the histogram again.
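To compare 50 games with 500 games fairly, it helps to scale each bar by the fraction of games rather than the raw count, and to group neighbouring scores into bins. This sketch makes both choices, which are assumptions rather than something the slides prescribe: a bin width of 2 and a bar scale of 200 stars per 100% of games. `histogram_rows` is a hypothetical helper.

```python
import random
from collections import Counter


def play_guessing_game(n_flips: int = 100) -> int:
    """Number of heads in n_flips fair coin flips."""
    return sum(random.random() >= 0.5 for _ in range(n_flips))


def histogram_rows(scores: list[int], bin_width: int = 2, bar_scale: int = 200) -> list[str]:
    """Text histogram with scores grouped into bins of width bin_width.

    Bar length is proportional to the *fraction* of games in each bin, so
    histograms built from 50 and from 500 games can be compared directly.
    """
    counts = Counter(s // bin_width * bin_width for s in scores)
    total = len(scores)
    rows = []
    for start in range(min(counts), max(counts) + 1, bin_width):
        share = counts[start] / total  # Counter returns 0 for empty bins
        rows.append(f"{start:3d}-{start + bin_width - 1:<3d} | {'*' * round(share * bar_scale)}")
    return rows


random.seed(7)
for n_games in (50, 500):
    print(f"--- {n_games} games ---")
    print("\n".join(histogram_rows([play_guessing_game() for _ in range(n_games)])))
```

With more games the bars should settle into a smoother, roughly symmetric hump around 50.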

11 How probable is a score of 62 or higher?
You can calculate it from the histogram.
Let's play the game in Python as many times as we want! Here are the steps:
– Flip a coin 100 times and record the number of heads (I'll show you how to flip coins in Python).
– Do it 1,000 times. Record all the scores (numbers of heads).
– Find out how many of them are 62 or higher. What's the percentage?
– Now calculate this percentage for 2,000, 5,000, 10,000 and 50,000 games. What about a score of 57 or higher? 54? 50?
– Aha, maybe you want to write a function…
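The steps above can be sketched as two small functions: one plays a game, and one turns a list of scores into the percentage at or above a threshold. To save time, this version plays each batch of games once and reuses the scores for all four thresholds. That design choice is mine, not the slides'. The function names are illustrative.

```python
import random


def play_guessing_game(n_flips: int = 100) -> int:
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(random.random() >= 0.5 for _ in range(n_flips))


def percent_at_least(scores: list[int], threshold: int) -> float:
    """Percentage of scores that are >= threshold."""
    return 100 * sum(score >= threshold for score in scores) / len(scores)


random.seed(2024)
for n_games in (1_000, 2_000, 5_000, 10_000, 50_000):
    scores = [play_guessing_game() for _ in range(n_games)]
    row = "  ".join(f">={t}: {percent_at_least(scores, t):6.2f}%" for t in (62, 57, 54, 50))
    print(f"{n_games:>6,} games | {row}")
```

As the number of games grows, the percentages for each threshold should change less and less from row to row.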

12 Back to Coke vs. Pepsi
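The same recipe applies to the tweets. Assume each tweet independently mentions Coke with probability 0.67 (the market-share hypothesis). Then simulate many samples and see how often Coke gets at least the observed 71%. The slides do not say how many tweets were counted, so the tweet counts below are HYPOTHETICAL and only illustrate how the answer depends on sample size. `simulated_p_value` is an illustrative name.

```python
import random


def simulated_p_value(null_share: float, observed_coke: int, n_tweets: int,
                      n_simulations: int = 2_000) -> float:
    """Fraction of simulated samples with at least observed_coke Coke tweets.

    Null hypothesis: each of n_tweets tweets independently mentions Coke with
    probability null_share (tweets mirror market share), otherwise Pepsi.
    """
    at_least_as_extreme = 0
    for _ in range(n_simulations):
        coke = sum(random.random() < null_share for _ in range(n_tweets))
        if coke >= observed_coke:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_simulations


random.seed(12)
# HYPOTHETICAL tweet counts: the slides give only the 71% / 29% split.
for n_tweets in (100, 1_000):
    observed = round(0.71 * n_tweets)  # 71% of tweets mention Coke
    p = simulated_p_value(0.67, observed, n_tweets)
    print(f"{n_tweets:>5} tweets, {observed} about Coke: p ~ {p:.3f}")
```

You should see a much smaller p for the larger sample. With only a few tweets, 71% vs. 29% could easily happen by chance. With many tweets it would be surprising.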
