Predictive modeling competitions


1 Predictive modeling competitions
making data science a sport Anthony Goldbloom CEO, Kaggle Photo by mikebaird,

2 Global competitions
Predicting HIV viral load: state of the art 70%; 1½ weeks in, 70.8%; competition close, 77%. Competitions involve participants from all over the world competing to produce the best models. One of our first competitions helped improve the state of the art in HIV modelling by 10 per cent. The scientific literature, like an in-house modelling effort, evolves slowly: somebody tries something, somebody tweaks that approach, and so on. Opening up a problem to a wide audience leads to rapid improvements.

3 Diverse experts solving diverse problems
Grant Application Forecasting Chess Ratings HIV Research Stock Price Prediction Travel Time Prediction Edmund & Adrian London & USA Dr. Derek Gatherer UK Felipe Maia Uppsala University Ivan Russian Federation Philipp Emanuel Widmann Heidelberg, DE Dr. Christopher Hefele, New York Robert Warsaw Chih-Li Sung & Roy Tseng Penghu & Taipei Gzegorz Swiszcz Gera Cole Harris Texas Giuseppe Ragusa Rome Jure Zbontar Ljubljana Claudio Perlich USA Chris DuBois Portland Edmund & Adrian London & USA John Blatz Baltimore Jason Trigg Pennsylvania Chris Raimondi Baltimore Rajstennaj Barrabas USA Jason Trigg Pennsylvania Uri Blass Tel-Aviv Lee Baker Las Cruces, NM Nan Zhou Pittsburgh Jeremy Howard Australia Thomas Mahony Canberra Glen Maher Canberra Emir Delic Australia

4 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

5 “I keep saying the sexy job in the next ten years will be statisticians.”
Hal Varian Google Chief Economist 2009

6 Crowdsourcing
There is a mismatch between those with data and those with the skills to analyse it. It is almost never the case that a single organization has access to the advanced machine learning and statistical techniques that would allow it to extract maximum value from its data. Meanwhile, data scientists crave real-world data to develop and refine their techniques. Crowdsourcing corrects this mismatch by offering companies a cost-effective way to harness the ‘cognitive surplus’ of the world's best data scientists.

7 Countless possible approaches to any data prediction problem
Which to choose? There are countless models that can be applied to solve any one predictive analytics problem. It is impossible to know at the outset which technique will be most effective.

8 18 year old beating his professors
There are countless models that can be applied to solve any one predictive analytics problem. It is impossible to know at the outset which technique will be most effective.

9 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

10 Tourism Forecasting Competition
Chart: forecast error (MASE) over time (Aug 9, 2 weeks later, 1 month later, competition end) against the existing model. Improvements are very rapid at first, then the rate of change slows down.
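The slide reports forecast error as MASE (mean absolute scaled error). As a minimal sketch of how such a score can be computed, the function below divides the forecast's mean absolute error by the in-sample error of a naive "repeat the last value" forecast. The function name, the non-seasonal scaling and the sample numbers are assumptions made for illustration; the competition's exact scoring rules are not given in the slides.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error (non-seasonal variant, assumed here).

    train    -- historical series used to scale the error
    actual   -- observed values over the forecast horizon
    forecast -- predicted values over the same horizon
    """
    if len(train) < 2:
        raise ValueError("need at least two training points")
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")

    # In-sample error of the naive forecast: each value predicted by its predecessor.
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    if naive_mae == 0:
        raise ValueError("training series is constant; MASE is undefined")

    # Out-of-sample mean absolute error of the submitted forecast.
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Illustrative numbers only (not competition data).
score = mase(train=[10, 12, 14, 16], actual=[18, 20], forecast=[17, 22])
```

A score below 1 means the forecast beats the naive baseline on average, which is what makes the measure comparable across series of different scales.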

11 Chess Ratings Competition
Chart: error rate (RMSE) over time (Aug 4, 1 month later, 2 months later, today) against the existing model (Elo). Elo is the algorithm used to power Mark Zuckerberg’s Facemash. For those who have seen The Social Network, it was the algorithm that Eduardo Saverin wrote on Mark’s window.
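For readers unfamiliar with the baseline on this slide, here is a minimal sketch of the standard Elo formulas together with an RMSE helper of the kind used to score predictions. The K-factor of 32 and the function names are assumptions chosen for illustration; the competition's actual evaluation procedure is not described in the slides.

```python
import math


def elo_expected(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both players' new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the update step size (32 is a common choice, assumed here).
    """
    expected_a = elo_expected(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and equal length")
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
    )


# Two equally rated players: each is expected to score 0.5.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

A rating system that predicts game outcomes more accurately than `elo_expected` would show up as a lower RMSE on held-out games.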

12 Our User Base
From many different (maths-related) disciplines

13 Users apply different techniques
Neural networks, logistic regression, support vector machines, decision trees, ensemble methods, AdaBoost, Bayesian networks, genetic algorithms, random forests, Monte Carlo methods, principal component analysis, Kalman filters, evolutionary fuzzy modeling. Users have the option to tell us their favourite techniques.

14 Benchmarking
We’re talking to a bank in Australia at the moment. They are receiving criticism for their credit scores on a particular product – they want to know whether the

15 Case study: VicRoads has an algorithm that they use to forecast travel time on Melbourne freeways (taking into account time, weather, accidents etc.). Their current model is inaccurate and of little use. They want to do better (or at least find out whether it’s possible to do better).

16 NASA tried, now it’s our turn
NASA’s leading experts have tried for years to find galaxies that have been gravitationally lensed. They haven’t satisfactorily solved the problem. Now it’s our turn.

17 Ideal for complex problems
Example: a real estate data provider that wants to predict which houses in a particular suburb will go up for sale in any three-month period.

18 ~25% Successful grant applications
Case study: Melbourne University. Outcomes of a competition to predict the success of grant applications:
- Better identify likely successes, to avoid wasting resources on hopeless applications
- Identify and communicate the characteristics of a successful application to future applicants

19 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

20 More fun than Sudoku
Why participants compete. Participants compete for four reasons:
1. Clean, real-world data: access to real-world data, delivered on a silver platter.
2. Professional reputation and experience: they benchmark their techniques and enhance their professional reputations (winners are the rockstars on Kaggle).
3. Interactions with experts in related fields, whom they might otherwise not get to meet.
4. Prizes.

21 User base
Many are academics who want access to real-world data and problems.

22 User base

23 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

24 1. Upload 2. Submit 3. Evaluate & Exchange

25 Use the wizard to post a competition

26 Participants make their entries

27 Competitions are judged based on predictive accuracy
How do you know who to choose? Compare techniques on a uniform dataset with a uniform evaluation algorithm.
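The idea of comparing techniques on a uniform dataset with a uniform evaluation algorithm can be sketched as a small leaderboard: every submission is scored against the same held-out answers with the same metric, then ranked. The team names, the choice of RMSE as the metric and the data below are hypothetical; this is not Kaggle's actual scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error; lower is better."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def leaderboard(submissions, solution, metric=rmse):
    """Score every submission against the same held-out solution.

    submissions -- dict mapping a team name to its list of predictions
    solution    -- the hidden true values, identical for all teams
    Returns a list of (team, score) pairs sorted from best to worst.
    """
    if not solution:
        raise ValueError("solution must not be empty")
    scores = []
    for team, predictions in submissions.items():
        if len(predictions) != len(solution):
            raise ValueError(f"{team}: wrong number of predictions")
        scores.append((team, metric(predictions, solution)))
    # Lower error ranks higher; ties keep submission order.
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical entries scored on the same three held-out values.
solution = [1.0, 2.0, 3.0]
submissions = {
    "team_a": [1.0, 2.0, 4.0],
    "team_b": [1.0, 2.0, 3.0],
    "team_c": [0.0, 2.0, 5.0],
}
board = leaderboard(submissions, solution)
```

Because the data and the metric are fixed, differences in rank reflect differences in the models rather than in how each participant chose to measure success.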

28 Competitions are judged on objective criteria
Competition mechanics: the essence of a ‘predicting the past’ competition is deriving insights from data that is already available, to facilitate better decisions in the future.

29 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

30 $3 million prize An upcoming competition, powered by Kaggle
De-identified dataset containing medical records of 100,000 Americans.

31 Probability of going to hospital in the next year
Diagram: Diabetes & High Cholesterol & Hypertension & Unfilled Prescriptions → probability of going to hospital in the next year

32 Projected 100,000 registrations
Netflix Prize (2006 – 2009): $1 million prize, 50,000 registrations. Heritage Health Prize (2011): $3 million prize, projected 100,000 registrations.

33 Motivation Why host a competition? Why compete? How it works Heritage Health Prize Questions

34 Predict Grant Applications
Tourism Forecasting (Part 2) IJCNN Social Network Challenge Chess Ratings – Elo vs. the Rest of the World

35 Jeff Moser Jeremy Howard Nicholas Gruen Anthony Goldbloom

36 What could the world’s best analysts find in your data?
phone Photo by gidzy,

