We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byJarrett Wheadon
Modified about 1 year ago
Data Analysis of Tennis Matches Fatih Çalışır
1.ATP World Tour 250 ATP 250 Brisbane ATP 250 Sydney... 2.ATP World Tour 500 ATP 500 Memphis ATP 500 Dubai Domain of the Data 4 Types of Tennis Tournaments
3.ATP World Tour 1000 ATP 1000 Paris ATP 1000 Shanghai... 4.Grand Slams Australian Open Roland Garros Wimbeldon US Open Domain of the Data
Men’s Single Year 2010 11 ATP 500 Tournament 9 ATP 1000 Tournament 4 Grand Slams Domain of the Data
Source of Data Internet Official Websites of the Players ATP( Association of Tennis Professionals ) Homa Page 2010 Result Archive
Data Construction From different tables Each table from different website Combining easily
Data Construction Players Table
Data Construction Tournament Results Table
Data Construction Tournament Info Table
Data Construction Final Data Table 29 features 1453 instances
Aim of the Project Classification Finding weights for attributes
Missing Values Players’ Height Players’ Weight Players’ BMI Players’ Date of being Professional
Missing Values Players’ Height Consider players with same weight Take the average Players’ Weight Consider players with same height Take the average
Missing Values Players’ Height and Weight If both of them are missing Remove the row Players’ Date of beign Professional Consider players with same age Take the average
Data Understanding Min,Max,Median,Average values for numeric attributes
Data Understanding Occurrence table for categorical and numeric attributes
Data Understanding Histogram for numeric attributes
Data Understanding Box Plot for main characteristics of numerical attributes
Data Understanding Scatter Plot to relate two attributes
Feature Selection Linear Correlation
Feature Selection Backward Elemination Naive Bayes for Ranking
Feature Selection 28 attributes reduced to 19 attributes Atrributes are meaningful
Weight of Attributes RIMARC to find weights
Classification KNIME Decision Tree – C4.5 Gain Ratio Qualitiy Meauser
Classification 1017 instances for training 436 instances for testing 842 positive instances 611 negative instances Training and test data is randomly selected
Classification Decision Tree
Classification Confusion Matrix
Classification Accuracy Statistics
Classification Naive Bayes Classifier Confusion Matrix
Classification Accuracy Statistics
Classification C4.5 vs Naive Bayes Decision Tree (C4.5)Naive Bayes
Classification C4.5 vs Naive Bayes Decision Tree (C4.5) Naive Bayes
Probabilistic Generative Models Rong Jin. Probabilistic Generative Model Classify instance x into one of K classes Class prior Density function for class.
Analysis of World Cup Finals. Outline Project Understanding – World Cup History Data Understanding – How to collect the data Data Manipulation – Data.
1.1 example these are prices for Internet service packages find the mean, median and mode determine what type of data this is create a suitable frequency.
Roger Federer VS Rafael Nadal Federer Nadal Edgar Ilves Project CS 110 Prof. Scaramastra April 28, 2014.
Florida International University COP 4770 Introduction of Weka.
SCATTER PLOTS AND LINES OF BEST FIT SECTION 1.6. An effective way to see a relationship in data is to display the information as a __________________.
Using Appropriate Displays. The rabbit’s max speed is easier to find on the ________. The number of animals with 15 mph max speed or less was easier to.
Loan Default Model Saed Sayad 1www.ismartsoft.com.
Introduction to Data Analysis Data Measurement Measurement of the data is the first step in the process that ultimately guides the final analysis. Consideration.
Unit 3 Section : Regression Regression – statistical method used to describe the nature of the relationship between variables. Positive.
Data Mining: Classification & Predication Hosam Al-Samarraie, PhD. Centre for Instructional Technology & Multimedia Universiti Sains Malaysia.
Naïve Bayes Data import – Delimited, Fixed, SAS, SPSS, OBDC Variable creation & transformation Recode variables Factor variables Missing.
1 Data Mining: Data Lecture Notes for Chapter 2. 2 What is Data? l Collection of data objects and their attributes l An attribute is a property or characteristic.
1 Predicting the winner of C.Y. award 指導教授：黃三益博士 組員： 尹川 陳隆賢 陳偉聖.
Chapter 1 Data Presentation Statistics and Data Measurement Levels Summarizing Data Symmetry and Skewness.
Subgroup Discovery Finding Local Patterns in Data.
CSCI N207: Data Analysis Using Spreadsheets Copyright ©2005 Department of Computer & Information Science Univariate Data Analysis.
A Brief Introduction and Issues on the Classification Problem Jin Mao Postdoc, School of Information, University of Arizona Sept 18, 2015.
Introduction to Probability and Bayesian Decision Making Soo-Hyung Kim Department of Computer Science Chonnam National University.
FPP 7-9 Exploratory Data Analysis: Two Variables 1.
Some Key Questions about you Data Damian Gordon Brendan Tierney Brian Mac Namee.
Chapter 7: Transformations. Attribute Selection Adding irrelevant attributes confuses learning algorithms---so avoid such attributes Both divide-and-conquer.
Chapter 3 – Data Exploration and Dimension Reduction © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach,
Chapter 5 Data mining : A Closer Look. Data Warehouse and Data Mining Chapter 5 2 Chapter Objectives Determine an appropriate data mining strategy for.
Classification and Prediction: Ensemble Methods Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Machine Learning Lecture 26: Ensemble Methods. Today Ensemble Methods. –Classifier Fusion “late fusion” vs. “early fusion” –Cross-validation as Ensemble.
1 Classification with Decision Trees I Instructor: Qiang Yang Hong Kong University of Science and Technology Thanks: Eibe Frank and Jiawei.
Statistical Process Control. A process can be described as a transformation of set of inputs into desired outputs. Inputs PROCESSOutputs What is a process?
1 A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data Jinwook Seo, Ben Shneiderman University of Maryland Hyun Young Song.
By Andrew Finley. Research Question Is it possible to predict a football player’s professional based on collegiate performance? That is, is it possible.
Generative Models Rong Jin. Statistical Inference Training ExamplesLearning a Statistical Model Prediction p(x; ) Female: Gaussian distribution N(
Descriptive statistics Petter Mostad Goal: Reduce data amount, keep ”information” Two uses: Data exploration: What you do for yourself when.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives Understand when linear regression is an appropriate.
PROCESSING, ANALYSIS & INTERPRETATION OF DATA. MEANING Data preparation to arrive at a meaningful research analysis. Intermediary b/w data collection.
Collaborative Filtering in iCAMP Max Welling Professor of Computer Science & Statistics.
In Stat-I, we described data by three different ways. Qualitative vs Quantitative Discrete vs Continuous Measurement Scales Describing Data Types.
ROC Curves. True positives and False positives True positive rate is TP = P correctly classified / P False positive rate is FP = N incorrectly classified.
1 Introduction to Data Mining with XLMiner Business Intelligence.
Introduction to Statistical Modeling and Machine Learning Lecture 8 Spoken Language Processing Prof. Andrew Rosenberg.
Two-Way Frequency Tables SUMMARIZE CATEGORICAL DATA.
Unit 4: Describing Data After 8 long weeks, we have finally finished Unit 3: Linear & Exponential Functions. Now on to Unit 4 which will last 3 weeks (yes.
Machine Learning in Practice Lecture 7 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
MACHINE LEARNING 6. Multivariate Methods 1. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Motivating Example Loan.
Introduction to Machine Learning Multivariate Methods 姓名 : 李政軒.
Data Mining: A Closer Look Chapter Data Mining Strategies (p35) Moh!
6.7 Scatter Plots. 6.7 – Scatter Plots Goals / “I can…” Write an equation for a trend line and use it to make predictions Write the equation for a.
PROCESSING OF DATA The collected data in research is processed and analyzed to come to some conclusions or to verify the hypothesis made. Processing of.
Combining multiple learners Usman Roshan. Bagging Randomly sample training data Determine classifier C i on sampled data Goto step 1 and repeat m times.
Project Final Presentation – Dec. 6, 2012 CS 5604 : Information Storage and Retrieval Instructor: Prof. Edward Fox GTA : Tarek Kanan ProjArabic Team Ahmed.
© 2017 SlidePlayer.com Inc. All rights reserved.