Predictive modeling competitions


1 Predictive modeling competitions
making data science a sport
Anthony Goldbloom, CEO, Kaggle
Photo by mikebaird,

2 Motivation, Why compete?, How it works, R on Kaggle, The Heritage Health Prize

3 Global competitions
Predicting HIV viral load (chart): state of the art 70%; 1½ weeks 70.8%; competition closes 77%.
Competitions involve participants from all over the world competing to produce the best models. One of our first competitions helped improve the state of the art in HIV modelling by 10 per cent. The scientific literature, like an in-house modelling effort, evolves slowly: somebody tries something, somebody else tweaks that approach, and so on. Opening up a problem to a wide audience leads to rapid improvements.
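The "10 per cent" figure can be checked against the numbers on the slide. The sketch below assumes the improvement is measured relative to the 70% state-of-the-art baseline, which the slide does not state explicitly; the function name is illustrative.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Improvement of `new` over `baseline`, as a fraction of `baseline`."""
    return (new - baseline) / baseline


# Figures read off the slide (per cent). Treating them as scores on the
# same scale is an assumption.
state_of_the_art = 70.0
at_one_and_a_half_weeks = 70.8
at_competition_close = 77.0

print(f"{relative_improvement(state_of_the_art, at_one_and_a_half_weeks):.1%}")
print(f"{relative_improvement(state_of_the_art, at_competition_close):.1%}")
```

Under that assumption, moving from 70% to 77% is a gain of 7 points, or one tenth of the baseline.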

4 Crowdsourcing
Mismatch between those with data and those with the skills to analyse it
It is almost never the case that any single organization has access to the advanced machine learning and statistical techniques that would allow it to extract maximum value from its data. Meanwhile, data scientists crave real-world data to develop and refine their techniques. Crowdsourcing corrects this mismatch by offering companies a cost-effective way to harness the ‘cognitive surplus’ of the world's best data scientists.

5 Countless approaches. Hard to know which will work
There are countless models that can be applied to solve any one predictive analytics problem. It is impossible to know at the outset which technique will be most effective.

6 Additional slides Not MIT, not SAS … UoL?

7 Tourism Forecasting Competition
Chart: forecast error (MASE) over time, marked at: existing model, Aug 9, 2 weeks later, 1 month later, competition end.
Improvements are very rapid at first, then the rate of change slows down.
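For readers unfamiliar with the metric on the chart, MASE (mean absolute scaled error) divides the forecast's mean absolute error by the in-sample mean absolute error of a naive forecast. The sketch below uses the standard definition; the series values are made up for illustration and are not the competition's data.

```python
from typing import Sequence


def mase(actual: Sequence[float], forecast: Sequence[float],
         train: Sequence[float], m: int = 1) -> float:
    """Mean absolute scaled error.

    The scale is the in-sample MAE of the naive forecast that repeats the
    value observed m periods earlier (m=1 for non-seasonal data).
    """
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Hypothetical numbers, for illustration only.
history = [10.0, 12.0, 14.0, 13.0, 15.0]
observed = [16.0, 18.0]
predicted = [15.0, 17.5]
print(round(mase(observed, predicted, history), 3))
```

A value below 1 means the forecast's average error is smaller than that of the naive benchmark on the training series.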

8 Chess Ratings Competition
Chart: error rate (RMSE) over time, marked at: existing model (Elo), Aug 4, 1 month later, 2 months later, today.
Elo is the algorithm used to power Mark Zuckerberg’s Facemash. For those who have seen The Social Network, it is the algorithm that Eduardo Saverin wrote on Mark’s window.
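As a rough illustration of the baseline on this slide, the sketch below computes the standard Elo expected score and scores predictions with RMSE. The ratings and game results are invented, and the competition's exact evaluation procedure is not given in the transcript, so treat the scoring step as an assumption.

```python
import math
from typing import Sequence


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predictions and outcomes."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical games: (rating of A, rating of B, result for A),
# where the result is 1 for a win, 0.5 for a draw and 0 for a loss.
games = [(1500, 1500, 0.5), (1900, 1500, 1.0), (1600, 1700, 0.0)]
predictions = [elo_expected_score(a, b) for a, b, _ in games]
results = [result for _, _, result in games]
print(round(rmse(predictions, results), 3))
```

Equal ratings give an expected score of 0.5, and a 400-point advantage gives roughly 0.91.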

9 Our User Base
From many different (maths-related) disciplines

10 Users apply different techniques
Neural networks, logistic regression, support vector machine, decision trees, ensemble methods, AdaBoost, Bayesian networks, genetic algorithms, random forest, Monte Carlo methods, principal component analysis, Kalman filter, evolutionary fuzzy modeling.
Users have the option to tell us their favourite techniques.

11 Motivation, Why compete?, How it works, R on Kaggle, The Heritage Health Prize

12 More fun than Sudoku
Why Participants Compete
1. Clean, real-world data
2. Professional reputation and experience
3. Interactions with experts in related fields
4. Prizes
Participants compete for four reasons: access to real-world data (which is delivered on a silver platter); the chance to benchmark their techniques and enhance their professional reputations (winners are the rockstars on Kaggle); the opportunity to interact with experts in related fields (whom they might otherwise not get to meet); and prizes.

13 Motivation, Why compete?, How it works, R on Kaggle, The Heritage Health Prize

14 Competitions are judged based on predictive accuracy
How do you know who to choose? Compare techniques on a uniform dataset with a uniform evaluation algorithm.
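The idea of a uniform dataset with a uniform evaluation algorithm can be sketched as a tiny leaderboard: every submission is scored against the same held-out answers with the same metric, then ranked. The team names, predictions and choice of mean absolute error are all hypothetical; this is not Kaggle's actual scoring code.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def leaderboard(submissions: Dict[str, Sequence[float]],
                truth: Sequence[float],
                metric: Callable[[Sequence[float], Sequence[float]], float],
                ) -> List[Tuple[str, float]]:
    """Score every submission with the same metric on the same answers.

    Lower scores rank first, which suits error metrics.
    """
    scores = {name: metric(preds, truth) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out answers and entries.
truth = [1.0, 0.0, 1.0, 1.0]
entries = {
    "team_b": [0.5, 0.5, 0.5, 0.5],
    "team_a": [0.9, 0.1, 0.8, 0.7],
}
for rank, (name, score) in enumerate(leaderboard(entries, truth, mean_absolute_error), 1):
    print(rank, name, round(score, 3))
```

Because the data and the metric are fixed, the ranking depends only on the predictions, not on who submitted them or which technique produced them.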

15 Competitions are judged on objective criteria
Competition Mechanics
The essence of a "predicting the past" competition is deriving insights from data that is already available, to facilitate better decisions in the future.

16 Motivation, Why compete?, How it works, R on Kaggle, The Heritage Health Prize

17 R on Kaggle

18 R on Kaggle among academics

19 R on Kaggle among Americans

20 Who Uses R and How
Of the 11 competitions for which we know what people used, 6 used R, 3 Python, 1 SAS and 1 MATLAB.

21 Motivation, Why compete?, How it works, R on Kaggle, The Heritage Health Prize

25 Mmm… how do I put this into R?

26 Some SQL Magic

27 Gives us a flat record
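The SQL shown on the slides is not part of this transcript. As a hedged sketch of the general idea, the snippet below flattens a one-to-many relational layout into one row per entity with a join and aggregates, using Python's built-in sqlite3 module. The table names, column names and rows are hypothetical and are not the competition's actual schema.

```python
import sqlite3


def build_example_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members (member_id INTEGER PRIMARY KEY, age_group TEXT);
        CREATE TABLE claims (member_id INTEGER, specialty TEXT, length_of_stay INTEGER);
        INSERT INTO members VALUES (1, '40-49');
        INSERT INTO members VALUES (2, '70-79');
        INSERT INTO claims VALUES (1, 'Surgery', 2);
        INSERT INTO claims VALUES (1, 'Internal', 0);
        INSERT INTO claims VALUES (2, 'Emergency', 1);
    """)
    return conn


def flatten(conn: sqlite3.Connection) -> list:
    """Return one flat record per member, aggregating that member's claims."""
    query = """
        SELECT m.member_id,
               m.age_group,
               COUNT(c.member_id)                AS n_claims,
               COALESCE(SUM(c.length_of_stay), 0) AS total_days
        FROM members AS m
        LEFT JOIN claims AS c ON c.member_id = m.member_id
        GROUP BY m.member_id, m.age_group
        ORDER BY m.member_id
    """
    return conn.execute(query).fetchall()


for row in flatten(build_example_db()):
    print(row)
```

Each output row has a fixed set of columns, which is the shape that modelling tools such as R data frames expect as input.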

28 Voila, an entry!

29 What could the world’s best analysts find in your data?
phone Photo by gidzy,

