Predicting the winner of the Cy Young Award. Advisor: Dr. 黃三益. Team members: 尹川, 陳隆賢, 陳偉聖.


1 Predicting the winner of the Cy Young Award
Advisor: Dr. 黃三益
Team members: 尹川, 陳隆賢, 陳偉聖

2 Introduction
- Baseball in Taiwan: CPBL (Chinese Professional Baseball League)
- Baseball in the USA: MLB (Major League Baseball)
- Cy Young Award, given since 1956
  - Voted on by the Baseball Writers' Association of America
  - Decided by weighted scores
  - Each league has one winner per year (a single award covered both leagues until 1966).

3 Measurements
There are no definite rules for judging the award. Nevertheless, many measurements can be used to judge whether a pitcher is good:
- Wins
- ERA (earned run average)
- WHIP (walks plus hits per inning pitched)
- G/F, etc.
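As a minimal sketch of how two of these measurements are computed, the standard definitions are ERA = 9 x earned runs / innings pitched and WHIP = (walks + hits) / innings pitched. The function names and the sample season line are illustrative assumptions, not data from the study.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings_pitched


def whip(walks, hits, innings_pitched):
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched


# Hypothetical season line, not a real pitcher.
innings = 200.0
print(round(era(60, innings), 2))
print(round(whip(50, 150, innings), 2))
```

Lower values are better for both statistics, while more wins are better, so a model has to learn a different direction for each attribute.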

4 Aim of the study
- To analyze the historical statistics of pitchers.
- To build a predictive model.
- To predict future Cy Young Award winners.

5 Data mining procedure
The ten steps of the data mining methodology

6 Step 1: Translate the Problem
Directed data mining problem
- Target variable: Cy Young Award
- Classification
- Decision tree
Purposes
- Gambling games
- Predictive activities

7 Step 2: Select Appropriate Data
- MLB statistics only (1871~2006)
  - Cy Young Award era: 1956~2006, 21456 records in total
- List of Cy Young Award winners
- "Time" factor
  - 1999 is the dividing year, because some statistical items first appear then.
- Variables: remove the items that are not representative of a pitcher.

8 Step 3: Get to Know the Data
- All of the data we used come from the official MLB site.
- The data have been publicly available for many years.
- The quality of the data is very good.
- Some attributes have values only since 1999.
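One way to find the attributes that only have values in recent seasons is to scan the records for blanks. The sketch below assumes records are dictionaries and that a blank is stored as None or an empty string; the attribute name NewStat and all values are hypothetical.

```python
def blank_attributes(records, attributes):
    """Return the attributes that are blank in at least one record."""
    return [
        name
        for name in attributes
        if any(rec.get(name) in (None, "") for rec in records)
    ]


# Hypothetical records; "NewStat" stands in for an item recorded only since 1999.
records = [
    {"Year": 1985, "W": 20, "ERA": 2.48, "NewStat": None},
    {"Year": 1999, "W": 23, "ERA": 2.07, "NewStat": 1.1},
    {"Year": 2004, "W": 21, "ERA": 2.61, "NewStat": 0.9},
]
print(blank_attributes(records, ["W", "ERA", "NewStat"]))
```

The attributes returned here are the candidates either to drop or to keep only for the 1999~2006 subset.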

9 Step 4: Create a Model Set
- We divide the data into training data and testing data.
- We do not create a balanced sample.
- The MLB records are not seasonal data.
- We will pick the data from 1999 onward.
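Because the sample is not balanced, winners are a very small share of the model set. The sketch below shows how small, using the 21456 records stated on slide 7 and the 92 winner records implied by the WINNER row (58 + 34) of the 1956~2006 confusion matrix on slide 19. The function name is an illustrative assumption.

```python
TOTAL_RECORDS = 21456  # pitcher records, 1956~2006 (slide 7)
WINNER_RECORDS = 92    # 58 + 34, WINNER row of the 1956~2006 confusion matrix


def majority_baseline_accuracy(total, positives):
    """Accuracy of a classifier that labels every record NONWINNER."""
    return (total - positives) / total


winner_share = WINNER_RECORDS / TOTAL_RECORDS
baseline = majority_baseline_accuracy(TOTAL_RECORDS, WINNER_RECORDS)
print(f"winner share: {winner_share:.4f}")
print(f"always-NONWINNER accuracy: {baseline:.4f}")
```

With fewer than half a percent of records labeled WINNER, overall accuracy says little about how well winners are found, which is why the confusion matrices matter more than a single accuracy figure.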

10 Step 5: Fix Problems with the Data
- The data are taken from the official MLB site.
- No missing values
- Single source

11 Step 6: Transform Data to Bring Information to the Surface
- There are no combinations of attributes.
- We delete some attributes.
- We add an attribute, Year.
- We add an attribute, CyYoungAward_Winner, for classification.
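The two added attributes can be sketched as follows. Only the attribute names Year and CyYoungAward_Winner and the class labels WINNER and NONWINNER come from the slides; the record layout, the Player key, and the sample names are assumptions for illustration.

```python
def label_records(records, season, winners):
    """Copy each record, adding Year and CyYoungAward_Winner."""
    labeled = []
    for rec in records:
        row = dict(rec)  # leave the input record untouched
        row["Year"] = season
        is_winner = rec["Player"] in winners
        row["CyYoungAward_Winner"] = "WINNER" if is_winner else "NONWINNER"
        labeled.append(row)
    return labeled


# Hypothetical season data and winner list.
season_records = [
    {"Player": "Pitcher A", "W": 21, "ERA": 2.50},
    {"Player": "Pitcher B", "W": 10, "ERA": 4.20},
]
for row in label_records(season_records, 2006, {"Pitcher A"}):
    print(row["Player"], row["Year"], row["CyYoungAward_Winner"])
```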

12 Step 7: Build Models
- Tools used
- Weka crash problem
- Blank attributes
- Build model
- Handling blank attributes

13 Tools Used

14 Weka Crash Problem
- Raw data
  - 21456 data instances
  - 42 attributes
- Weka crashed during model construction.
- Solution: give Weka more memory.
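Weka runs on the Java virtual machine, so giving it more memory usually means raising the JVM's maximum heap size with the -Xmx option at launch. The slides do not say which setting was used; the sketch below only builds such a command line, and the heap size and jar path are assumptions.

```python
def weka_command(heap_mb, jar_path="weka.jar"):
    """Build a command line that starts Weka with a larger maximum heap."""
    return ["java", f"-Xmx{heap_mb}m", "-jar", jar_path]


# Hypothetical heap size; pick a value that fits the machine's RAM.
print(" ".join(weka_command(1024)))
```

The resulting list can be passed to subprocess.run, or the joined string can be typed into a shell.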

15 Blank Attributes

16 Build Model
- MLB 1956~2006, with blank attributes: ADTree
- MLB 1956~2006, without blank attributes: ADTree
- MLB 1999~2006: ADTree
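An ADTree (alternating decision tree) scores an instance by summing the values of every prediction node it reaches, and the sign of the sum gives the class. The sketch below shows that scoring rule only, not Weka's learning procedure. The tree, its thresholds, its values, and the choice that a positive score means WINNER are all hypothetical and are not the model built in the study.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Prediction:
    value: float
    splitters: List["Splitter"] = field(default_factory=list)


@dataclass
class Splitter:
    test: Callable[[Dict[str, float]], bool]
    if_true: Prediction
    if_false: Prediction


def score(node, x):
    """Sum the prediction values on every path the instance reaches."""
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.test(x) else splitter.if_false
        total += score(branch, x)
    return total


# Hypothetical tree: thresholds and values are made up for illustration.
era_split = Splitter(lambda x: x["ERA"] < 3.0, Prediction(1.0), Prediction(-0.5))
tree = Prediction(-2.0, [
    Splitter(lambda x: x["W"] >= 18, Prediction(1.5, [era_split]), Prediction(-1.0)),
    Splitter(lambda x: x["WHIP"] < 1.1, Prediction(0.5), Prediction(-0.5)),
])

ace = {"W": 21, "ERA": 2.5, "WHIP": 1.0}
average = {"W": 10, "ERA": 4.2, "WHIP": 1.35}
for pitcher in (ace, average):
    s = score(tree, pitcher)
    print(s, "WINNER" if s > 0 else "NONWINNER")
```

Unlike an ordinary decision tree, every splitter under a reached prediction node is evaluated, so several attributes contribute to one score.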

17 Handling Blank Attributes

18 1956~2006, with blank attributes, ADTree

19 1956~2006, with blank attributes, ADTree
=== Confusion Matrix ===
NONWINNER  WINNER  <-- classified as
    21343      21  NONWINNER
       58      34  WINNER

20 1956~2006, without blank attributes, ADTree

21 1956~2006, without blank attributes, ADTree
=== Confusion Matrix ===
NONWINNER  WINNER  <-- classified as
    21350      14  NONWINNER
       62      30  WINNER

22 1999~2006, ADTree

23 1999~2006, ADTree
=== Confusion Matrix ===
NONWINNER  WINNER  <-- classified as
     5090       3  NONWINNER
       13       3  WINNER

24 Step 8: Assess Models (1/2)
Not good enough for gambling

1956~2006, without blank attributes
=== Confusion Matrix ===
NONWINNER  WINNER  <-- classified as
    21350      14  NONWINNER
       62      30  WINNER

1999~2006
=== Confusion Matrix ===
NONWINNER  WINNER  <-- classified as
     5090       3  NONWINNER
       13       3  WINNER
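Precision and recall for the WINNER class make the "not good enough" verdict concrete. The sketch below reads the counts from the two matrices on this slide, assuming Weka's usual layout in which rows are actual classes and columns are predicted classes. The function name is an illustrative assumption.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for the positive (WINNER) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# (TP, FP, FN): WINNER/WINNER cell, NONWINNER row's WINNER cell,
# and WINNER row's NONWINNER cell of each matrix.
models = {
    "1956~2006, without blank attributes": (30, 14, 62),
    "1999~2006": (3, 3, 13),
}
for name, (tp, fp, fn) in models.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.3f}, recall={r:.3f}")
```

The 1956~2006 model finds about a third of the actual winners (30 of 92) and is right about two times in three when it predicts WINNER (30 of 44). The 1999~2006 model finds 3 of 16 winners and is right in 3 of its 6 WINNER predictions.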

25 Step 8: Assess Models (2/2)
Some attributes are more important.
Number of Appearances of Attributes in Different Models
Attributes: W, BB, WPCT, OBA, WHIP, K/9, ERA, GF
1956~2006 ADTree: 2311
1956~2006 Without Blank Attributes ADTree: 211111
1999~2006 ADTree: 21111
1956~2006 Without Blank Attributes J48: 3211

26 Step 9: Deploy Models
- Implement a computer program with the built model.
- Use it to predict the Cy Young Award winner more easily.
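One possible shape for such a program is sketched below. Since each league has one winner per season, it picks the pitcher with the highest model score in each league rather than classifying pitchers one by one. This selection rule, the record layout, and the scores are assumptions and are not described in the slides.

```python
def predict_winners(candidates):
    """Return {league: player} for the highest-scoring pitcher per league."""
    best = {}
    for cand in candidates:
        current = best.get(cand["league"])
        if current is None or cand["score"] > current["score"]:
            best[cand["league"]] = cand
    return {league: cand["player"] for league, cand in best.items()}


# Hypothetical model scores for one season.
candidates = [
    {"player": "Pitcher A", "league": "AL", "score": 1.0},
    {"player": "Pitcher B", "league": "AL", "score": -3.5},
    {"player": "Pitcher C", "league": "NL", "score": 0.4},
    {"player": "Pitcher D", "league": "NL", "score": 0.9},
]
print(predict_winners(candidates))
```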

27 Step 10: Assess Results
- Compare the predicted winner directly with the final Cy Young Award winner.
- The goal is not "business" but "interest".
  - Assessment relies on personal judgment.
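The direct comparison can be expressed as a hit rate over (year, league) pairs. This is a minimal sketch; the dictionary layout and the placeholder names are assumptions, not results from the study.

```python
def hit_rate(predicted, actual):
    """Share of (year, league) awards where the prediction matched."""
    keys = predicted.keys() & actual.keys()
    if not keys:
        raise ValueError("no awards in common to compare")
    hits = sum(predicted[k] == actual[k] for k in keys)
    return hits / len(keys)


# Hypothetical predictions and outcomes.
predicted = {(2007, "AL"): "Pitcher A", (2007, "NL"): "Pitcher D"}
actual = {(2007, "AL"): "Pitcher A", (2007, "NL"): "Pitcher C"}
print(hit_rate(predicted, actual))
```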

28 Conclusions
- We used classification techniques to build the prediction model.
- The accuracy of the built model is not high.
- There are some factors that we did not consider.
- The model cannot be used where substantial benefits are at stake.
- Just for fun

