
1 Data Mining Joyeeta Dutta-Moscato July 10, 2013

2 Wherever we have large amounts of data, we need systems capable of learning information from the data:
– predictions in medicine
– text and web page classification
– speech recognition
Learning the underlying patterns is useful to:
– predict the presence of a disease for future patients
– describe the dependencies between diseases and symptoms
Data Mining focuses on the discovery of (previously) unknown properties in data, using techniques from Machine Learning.

3 Data: 4 attributes / features (outlook, temperature, humidity, windy), each with a small set of values. 3 × 3 × 2 × 2 = 36 possible combinations; 14 combinations are present in this example.

4 Data → Prediction. A set of rules to predict whether we will get to play could look like this (a decision list):
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
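A decision list like this maps directly onto ordinary conditional code. A minimal Python sketch (the function name and the string encoding of attribute values are illustrative assumptions):

```python
def predict_play(outlook, humidity, windy):
    """Decision list from the slide: rules are tried in order,
    and the first rule that matches determines the prediction."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # default rule: none of the above

print(predict_play("sunny", "high", False))    # -> no
print(predict_play("overcast", "high", True))  # -> yes
```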

5 Decision Tree Learning. F: ⟨Outlook, Temperature, Humidity, Windy⟩ → Play Tennis? The goal is to create a model that predicts the value of a target variable based on several input variables.

6 Decision Tree Learning: Problem Setting
Set of possible instances X; each instance x in X is a feature vector x = ⟨x₁, x₂, … xₙ⟩
Unknown target function f: X → Y, where Y is discrete valued
Set of function hypotheses H = { h | h : X → Y }; each hypothesis h is a decision tree
Input: training examples {⟨x(i), y(i)⟩} of the unknown target function f
Output: hypothesis h ∈ H that best approximates the target function f
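To make this setting concrete, here is a minimal sketch of learning a decision tree hypothesis h with scikit-learn (the library choice and the toy rows are assumptions, not from the slides); categorical attributes are one-hot encoded so the tree can split on them:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# A few rows in the style of the weather example (illustrative, not the full table).
data = pd.DataFrame({
    "outlook":  ["sunny", "sunny", "overcast", "rainy", "rainy"],
    "humidity": ["high", "normal", "high", "high", "normal"],
    "windy":    [False, False, True, True, False],
    "play":     ["no", "yes", "yes", "no", "yes"],
})

X = pd.get_dummies(data[["outlook", "humidity", "windy"]])  # feature vectors x
y = data["play"]                                            # discrete target y

h = DecisionTreeClassifier(random_state=0).fit(X, y)        # hypothesis h in H

# Classify a new, unseen instance.
x_new = pd.get_dummies(pd.DataFrame([{"outlook": "overcast",
                                      "humidity": "normal",
                                      "windy": False}]))
x_new = x_new.reindex(columns=X.columns, fill_value=0)
print(h.predict(x_new))  # expected: ['yes']
```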

7 Supervised Learning. Given a set of training examples of the form {(x₁, y₁), … (xₙ, yₙ)}, a learning algorithm seeks a function g: X → Y, where X is the input space and Y is the output space.
Example:
– Classify the universe of music into 'like' and 'dislike' for one person
– Training set: a list of songs that the person heard and marked as 'like' or 'dislike'
– Task: infer a function of features (of these songs) to predict which other songs the person will like

8 Supervised Learning. Given a model family, we are interested in finding the best model parameters, such that the misfit (measured by an error function) between the data and the model is minimized. An optimal scenario will allow the algorithm to correctly determine the class labels for unseen instances.

9 Supervised Learning. Considerations: the learning algorithm must generalize from the training data to unseen situations in a "reasonable" way:
– Avoid overfitting
– Bias-variance tradeoff
– Number of training examples versus model complexity

10 Supervised Learning. Common methods of supervised learning:
Regression: X discrete or continuous → Y continuous. Examples:
– debt, equity, orders, sales → stock price
– age, height, weight, race, VKORC1 genotype, CYP2C9 genotype → warfarin dose
Classification: X discrete or continuous → Y discrete. Examples:
– family history, history of head trauma, age, gender, race, APOE status → Alzheimer's disease
– arrangement of pixels in a handwritten digit → "3"

11 Supervised Learning: Linear Regression. Fitting the model to the data. Objective: minimize the mean square error.
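A minimal numpy sketch of this objective, fitting a line by least squares (the synthetic data and coefficients are assumptions for illustration):

```python
import numpy as np

# Synthetic data: y is roughly linear in x, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Fit y ~ b0 + b1*x by least squares, which minimizes the mean square error.
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = b0 + b1 * x
mse = np.mean((y - y_hat) ** 2)
print(f"intercept={b0:.2f}, slope={b1:.2f}, MSE={mse:.3f}")
```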

12 Regression. Does a mean square error of 0 (i.e. no difference between prediction and target) mean this is the best model? → Overfitting. The real test of the 'best model' is its performance on data it has not been trained on.
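One way to see this is with a held-out split; a sketch (scikit-learn's train_test_split and synthetic data are assumptions) comparing a straight line to a high-degree polynomial:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: y is roughly linear in x, plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 3.0 * x + rng.normal(scale=0.3, size=x.size)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (1, 10):
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)
    mse_train = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_test = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree:2d}: train MSE = {mse_train:.3f}, test MSE = {mse_test:.3f}")

# The degree-10 fit drives training error toward 0 but typically does worse
# than the line on the held-out points: overfitting.
```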

13 Regression What does this mean about the relationship between x and y?

14 Classification. Linear classifier: hard threshold. Logistic regression: soft threshold; uses the logistic function, whose output lies between 0 and 1.
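For reference, the logistic (sigmoid) function behind the soft threshold is

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad 0 < \sigma(z) < 1,
```

so its output can be read as a probability: large positive z maps near 1, large negative z near 0, and z = 0 maps to exactly 0.5.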

15 Other common methods in Supervised Learning:
– Support Vector Machines
– Artificial Neural Networks (can also be unsupervised)
– k-nearest neighbor
– Graphical models, Bayesian models
More sophisticated algorithms are needed for data that are not linearly separable, as the sketch below illustrates.
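To illustrate the point about linear separability, a small scikit-learn sketch (library and toy data are assumptions): the XOR pattern cannot be split by any straight line, but an SVM with a non-linear (RBF) kernel separates it:

```python
import numpy as np
from sklearn.svm import SVC

# XOR: the classic example of data that is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # cannot reach 1.0 on XOR
print("RBF kernel accuracy:   ", rbf.score(X, y))     # 1.0
```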

16 Unsupervised Learning. Learn relationships among the inputs x₁, … xₙ; no y is given.
Clustering: group inputs based on some measure of similarity. A common "first pass" exploratory data mining technique.

17 Hierarchical Clustering. A method of cluster analysis which aims to partition the data into groups that are "close" to each other according to some distance metric.
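A minimal SciPy sketch (library choice and toy points are assumptions): build the hierarchy with a linkage method, then cut it into a chosen number of groups:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points forming two loose groups.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

Z = linkage(X, method="average", metric="euclidean")  # build the hierarchy
labels = fcluster(Z, t=2, criterion="maxclust")       # cut into 2 clusters
print(labels)  # e.g. [1 1 1 2 2 2]
```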

18 k-means Clustering A method of cluster analysis which aims to partition the data into k clusters in which each observation belongs to the cluster with the nearest mean.
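A matching k-means sketch (scikit-learn and synthetic blobs are assumptions); each point ends up in the cluster whose mean is nearest:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two synthetic blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)              # one mean per cluster, near (0,0) and (5,5)
print(km.labels_[:3], km.labels_[-3:])  # cluster membership of each point
```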

19 Acknowledgments
Shyam Visweswaran, Dept. of Biomedical Informatics
Tom Mitchell, Dept. of Machine Learning, CMU
Witten, I. H., Frank, E., & Hall, M. A., "Data Mining: Practical Machine Learning Tools and Techniques"

