
1 Data Mining CS 341, Spring 2007 Lecture 4: Data Mining Techniques (I)

2 Review: Information Retrieval
Information Retrieval
–Similarity measures
–Evaluation metrics: precision and recall
Question Answering
Web Search Engines
–An application of IR
–Related to web mining

3 Data Mining Techniques Outline
Goal: provide an overview of basic data mining techniques
Statistical
–Point Estimation
–Models Based on Summarization
–Bayes Theorem
–Hypothesis Testing
–Regression and Correlation
Similarity Measures

4 Point Estimation
Point estimate: an estimate of a population parameter.
May be made by calculating the parameter for a sample.
May be used to predict values for missing data.
Ex:
–R contains 100 employees
–99 have salary information
–The mean salary of these is $50,000
–Use $50,000 as the value of the remaining employee's salary. Is this a good idea?
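A minimal sketch of this imputation idea in Python; the salary list below is a made-up stand-in for the 99 known values:

    # Point estimate of the mean from the available sample, used to
    # impute the one missing value (illustrative stand-in data).
    salaries = [48000, 51000, 52000, 49000]   # stand-in for the 99 known salaries
    estimate = sum(salaries) / len(salaries)  # point estimate of the population mean
    missing_salary = estimate                 # predicted value for the missing record
    print(missing_salary)                     # 50000.0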

5 Estimation Error
Bias: the difference between the expected value of the estimator and the actual value:
Bias = E[θ̂] - θ
Mean Squared Error (MSE): the expected value of the squared difference between the estimate and the actual value:
MSE(θ̂) = E[(θ̂ - θ)²]
Root Mean Squared Error (RMSE): the square root of the MSE.
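A small illustration, computing empirical bias, MSE, and RMSE for a handful of made-up estimates of a known actual value:

    from math import sqrt

    actual = 10.0
    estimates = [9.5, 10.2, 10.4, 9.8]        # made-up repeated estimates

    bias = sum(estimates) / len(estimates) - actual
    mse = sum((e - actual) ** 2 for e in estimates) / len(estimates)
    rmse = sqrt(mse)
    print(bias, mse, rmse)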

6 Jackknife Estimate
Jackknife estimate: an estimate of a parameter obtained by systematically omitting one value (or group of values) from the set of observed values.
Named after the jackknife, a "handy and useful tool".
Used to reduce bias.
Property: the jackknife estimator lowers the bias from order 1/n to order 1/n².

7 Jackknife Estimate
Definition:
–Divide the sample of size n into g groups of size m each, so n = mg (often m = 1 and g = n).
–Compute the estimate θ̂_j by ignoring the jth group.
–θ̂_(·) is the average of the θ̂_j.
–The jackknife estimator is θ̂_Q = g·θ̂ - (g-1)·θ̂_(·), where θ̂ is the usual estimator for the parameter θ computed from the full sample.

8 Jackknife Estimator: Example 1
Estimate of the mean for X = {x1, x2, x3}; n = 3, g = 3, m = 1; θ̂ = x̄ = (x1 + x2 + x3)/3
θ̂_1 = (x2 + x3)/2, θ̂_2 = (x1 + x3)/2, θ̂_3 = (x1 + x2)/2
θ̂_(·) = (θ̂_1 + θ̂_2 + θ̂_3)/3 = (x1 + x2 + x3)/3
θ̂_Q = g·θ̂ - (g-1)·θ̂_(·) = 3x̄ - 2·θ̂_(·) = (x1 + x2 + x3)/3
In this case, the jackknife estimator is the same as the usual estimator.

9 Jackknife Estimator: Example 2
Estimate of the variance for X = {1, 4, 4}; n = 3, g = 3, m = 1; θ̂ = σ̂²
σ̂² = ((1-3)² + (4-3)² + (4-3)²)/3 = 2
θ̂_1 = ((4-4)² + (4-4)²)/2 = 0, θ̂_2 = ((1-2.5)² + (4-2.5)²)/2 = 2.25, θ̂_3 = 2.25
θ̂_(·) = (θ̂_1 + θ̂_2 + θ̂_3)/3 = 1.5
θ̂_Q = g·θ̂ - (g-1)·θ̂_(·) = 3(2) - 2(1.5) = 3
In this case, the jackknife estimator is different from the usual estimator.
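A leave-one-out jackknife sketch (m = 1, g = n) that reproduces Example 2:

    def biased_variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    def jackknife(estimator, xs):
        g = len(xs)                      # g = n when m = 1
        theta_hat = estimator(xs)        # estimate from the full sample
        # Leave-one-out estimates: theta_j with the jth value omitted
        theta_j = [estimator(xs[:j] + xs[j+1:]) for j in range(g)]
        theta_dot = sum(theta_j) / g     # their average
        return g * theta_hat - (g - 1) * theta_dot

    print(jackknife(biased_variance, [1, 4, 4]))   # 3.0, the unbiased s^2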

10 Jackknife Estimator: Example 2 (cont'd)
In general, applying the jackknife technique to the biased variance estimator
σ̂² = Σ(x_i - x̄)² / n
yields the jackknife estimator
s² = Σ(x_i - x̄)² / (n - 1)
which is known to be unbiased for σ².

11 Maximum Likelihood Estimate (MLE)
Obtain parameter estimates that maximize the probability that the sample data occurs under the specific model.
Likelihood function: the joint probability of observing the sample data, obtained by multiplying the individual probabilities:
L(θ | x1, …, xn) = ∏ f(x_i | θ)
Maximize L.

12 MLE Example
Coin tossed five times: {H, H, H, H, T}
Assuming a fair coin with H and T equally likely, the likelihood of this sequence is:
L(0.5) = (0.5)⁴(0.5) = 0.03125
However, if the probability of an H is 0.8, then:
L(0.8) = (0.8)⁴(0.2) = 0.08192

13 MLE Example (cont'd)
General likelihood formula for h heads and t tails: L(p) = p^h (1-p)^t, here p⁴(1-p).
The estimate for p is then 4/5 = 0.8.
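A sketch of this computation; the grid search stands in for the analytic maximization:

    tosses = "HHHHT"
    heads = tosses.count("H")
    tails = tosses.count("T")

    def likelihood(p):
        # Bernoulli likelihood L(p) = p^heads * (1-p)^tails
        return p ** heads * (1 - p) ** tails

    print(likelihood(0.5))   # 0.03125 (fair coin)
    print(likelihood(0.8))   # 0.08192 (p = 0.8 is more likely)

    # Grid search for the maximum-likelihood estimate of p
    grid = [i / 1000 for i in range(1001)]
    p_hat = max(grid, key=likelihood)
    print(p_hat)             # 0.8 = 4/5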

14 Expectation-Maximization (EM)
Solves estimation problems with incomplete data.
Obtain initial estimates for the parameters.
Iteratively use the estimates to fill in the missing data and re-estimate; continue until convergence.

15 EM Example

16 EM Algorithm
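A minimal sketch of the EM idea for a simple case: estimating a mean when some values are missing. The data and the single-parameter mean-imputation model are assumptions chosen for illustration.

    def em_mean(data, tol=1e-6, max_iter=100):
        observed = [x for x in data if x is not None]
        n, k = len(data), len(data) - len(observed)
        mean = 0.0                          # arbitrary initial estimate
        for _ in range(max_iter):
            # E-step: impute each missing value with the current estimate
            total = sum(observed) + k * mean
            # M-step: re-estimate the mean over the completed data
            new_mean = total / n
            if abs(new_mean - mean) < tol:
                break
            mean = new_mean
        return mean

    print(em_mean([1.0, 5.0, 10.0, 4.0, None, None]))   # converges to 5.0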

17 Models Based on Summarization
Basic concepts provide an abstraction and summarization of the data as a whole.
–Statistical concepts: mean, variance, median, mode, etc.
Visualization: display the structure of the data graphically.
–Line graphs, pie charts, histograms, scatter plots, hierarchical graphs
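A short sketch of these summary statistics using Python's standard library; the data values are made up:

    import statistics

    data = [3, 7, 7, 2, 9, 5, 7, 4]
    print(statistics.mean(data))      # mean
    print(statistics.median(data))    # median
    print(statistics.mode(data))      # mode
    print(statistics.variance(data))  # sample variance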

18 Scatter Diagram

19 Bayes Theorem
Posterior probability: P(h1|xi), the probability of hypothesis h1 given that data value xi has been observed.
Prior probability: P(h1), the probability of hypothesis h1 before any data is seen.
Bayes theorem: P(hj|xi) = P(xi|hj) P(hj) / P(xi)
Assigns probabilities to hypotheses given a data value.

20 Bayes Theorem Example
Credit authorizations (hypotheses): h1 = authorize purchase, h2 = authorize after further identification, h3 = do not authorize, h4 = do not authorize but contact police
Assign twelve data values for all combinations of credit and income.
From the training data: P(h1) = 60%; P(h2) = 20%; P(h3) = 10%; P(h4) = 10%.

21 Bayes Example (cont'd)
Training Data:

22 Bayes Example (cont'd)
Calculate P(xi|hj) and P(xi).
Ex: P(x7|h1) = 2/6; P(x4|h1) = 1/6; P(x2|h1) = 2/6; P(x8|h1) = 1/6; P(xi|h1) = 0 for all other xi.
Predict the class for x4:
–Calculate P(hj|x4) for all hj.
–Place x4 in the class with the largest value.
–Ex:
»P(h1|x4) = (P(x4|h1) P(h1)) / P(x4) = (1/6)(0.6)/0.1 = 1.
»x4 is in class h1.
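A sketch of this calculation; the zero likelihoods for h2 through h4 are an assumption for illustration, since only the h1 likelihoods are given above:

    # Priors from the slide; P(x4|hj) and P(x4) = 0.1 as described above.
    priors = {"h1": 0.6, "h2": 0.2, "h3": 0.1, "h4": 0.1}
    likelihood_x4 = {"h1": 1/6, "h2": 0.0, "h3": 0.0, "h4": 0.0}  # assumed zeros
    p_x4 = 0.1

    # Bayes theorem: P(hj|x4) = P(x4|hj) * P(hj) / P(x4)
    posterior = {h: likelihood_x4[h] * priors[h] / p_x4 for h in priors}
    best = max(posterior, key=posterior.get)
    print(posterior)                        # {'h1': 1.0, 'h2': 0.0, ...}
    print("Predicted class for x4:", best)  # h1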

23 Hypothesis Testing
Find a model to explain behavior by creating and then testing a hypothesis about the data.
The exact opposite of the usual DM approach.
H0 – null hypothesis; the hypothesis to be tested.
H1 – alternative hypothesis.

24 Chi-Square Test
One technique for performing hypothesis testing.
Used to test the association between two observed variables and to determine whether a set of observed values is statistically different from the expected values.
The chi-squared statistic is defined as:
χ² = Σ (O - E)² / E
O – observed value
E – expected value based on the hypothesis.

25 Chi-Square Test
Given the average scores of five schools, determine whether the difference is statistically significant.
Ex:
–O = {50, 93, 67, 78, 87}
–E = 75
–χ² = 15.55, and therefore significant
Examine a chi-squared significance table:
–With 4 degrees of freedom and a significance level of 95%, the critical value is 9.488. Thus the variation between the schools' scores and the expected value cannot be attributed to pure chance.
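The slide's statistic can be checked directly; a minimal sketch:

    observed = [50, 93, 67, 78, 87]
    expected = 75

    # Chi-squared statistic: sum of (O - E)^2 / E over the observations
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    print(round(chi2, 2))   # 15.55, above the 9.488 critical value (df = 4, 95%)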

26 Regression
Predict future values based on past values.
Fit a set of points to a curve.
Linear regression assumes a linear relationship exists:
y = c0 + c1 x1 + … + cn xn
–n input variables (called regressors or predictors)
–One output variable (called the response)
–n+1 constants, chosen during the modeling process to match the input examples

27 Linear Regression with one input value
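A sketch of least-squares fitting for the one-input case, y = c0 + c1·x, with made-up points:

    def fit_line(xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        # Slope: covariance of x and y divided by the variance of x
        c1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))
        c0 = mean_y - c1 * mean_x      # intercept from the means
        return c0, c1

    c0, c1 = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    print(c0, c1)   # roughly c0 = 0.05, c1 = 1.99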

28 Correlation
Examine the degree to which the values of two variables behave similarly.
Correlation coefficient r:
–1 = perfect correlation
–-1 = perfect but opposite correlation
–0 = no correlation

29 Correlation (cont'd)
r = Σ(xi - X̄)(yi - Ȳ) / √(Σ(xi - X̄)² Σ(yi - Ȳ)²)
where X̄ and Ȳ are the means of X and Y respectively.
Suppose X = (1,3,5,7,9) and Y = (9,7,5,3,1): r = ?
Suppose X = (1,3,5,7,9) and Y = (2,4,6,8,10): r = ?
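A sketch that computes r for the two pairs above, answering the "r = ?" questions:

    from math import sqrt

    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    print(pearson_r([1, 3, 5, 7, 9], [9, 7, 5, 3, 1]))    # -1.0: perfect opposite
    print(pearson_r([1, 3, 5, 7, 9], [2, 4, 6, 8, 10]))   #  1.0: perfect correlation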

30 Similarity Measures
Determine the similarity between two objects.
Similarity characteristics:
Alternatively, distance measures quantify how unlike or dissimilar objects are.

31 Similarity Measures
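Two commonly used similarity measures, cosine and Jaccard similarity, as a sketch of the kind of measure this slide covers:

    from math import sqrt

    def cosine_similarity(a, b):
        # Dot product of the vectors divided by the product of their norms
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

    def jaccard_similarity(a, b):
        # Size of the intersection over the size of the union
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    print(cosine_similarity([1, 0, 2], [2, 1, 2]))                  # about 0.894
    print(jaccard_similarity({"data", "mining"}, {"data", "web"}))  # 1/3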

32 Distance Measures
Measure the dissimilarity between objects.
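A sketch of two standard distance measures, Euclidean and Manhattan distance:

    from math import sqrt

    def euclidean(a, b):
        # Square root of the sum of squared coordinate differences
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def manhattan(a, b):
        # Sum of absolute coordinate differences
        return sum(abs(x - y) for x, y in zip(a, b))

    print(euclidean([1, 2], [4, 6]))   # 5.0
    print(manhattan([1, 2], [4, 6]))   # 7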

33 Next Lecture
Data Mining Techniques (II)
–Decision trees, neural networks, and genetic algorithms
Reading assignment: Chapter 3

