1 Statistical Techniques Chapter 10
2 10.1 Linear Regression Analysis: Simple Linear Regression
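A simple linear regression line can be fit with the usual least-squares formulas. The sketch below uses made-up (x, y) pairs for illustration, not the district office building data from the later slide:

```python
import numpy as np

# Illustrative data only (assumed for this sketch, not from the textbook).
x = np.array([10.0, 14.0, 16.0, 20.0, 24.0])
y = np.array([142.0, 168.0, 175.0, 210.0, 242.0])

# Least-squares estimates: b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar
x_bar, y_bar = x.mean(), y.mean()
b1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
b0 = y_bar - b1 * x_bar

def predict(v):
    # The fitted regression equation: y-hat = b0 + b1 * x
    return b0 + b1 * v
```

A useful sanity check is that the fitted line always passes through the point of means (x-bar, y-bar).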
3 Multiple Linear Regression with Excel
4 A Regression Equation for the District Office Building Data
6 Regression Trees
8 10.2 Logistic Regression: Transforming the Linear Regression Model
Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance.
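The transformation uses the logistic (sigmoid) function to map the output of a linear model into a probability. A minimal sketch for a single feature; the coefficients w and b are illustrative values, not fitted ones:

```python
import math

def sigmoid(z):
    # The logistic function squashes any real value into (0, 1),
    # so the output can be read as a conditional probability.
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(x, w=1.2, b=-0.5):
    # P(class = 1 | x) for a one-feature model; w and b are
    # assumed example coefficients, not estimates from data.
    return sigmoid(w * x + b)
```

Because the sigmoid is monotonic, larger values of the linear combination always yield larger class probabilities.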
9 The Logistic Regression Model
10 Logistic Regression: An Example
11 10.3 Bayes Classifier
12 Bayes Classifier: An Example
13 The Instance to Be Classified
Magazine Promotion = Yes
Watch Promotion = Yes
Life Insurance Promotion = No
Credit Card Insurance = No
Sex = ?
14 Computing The Probability For Sex = Male
15 Conditional Probabilities for Sex = Male
P(magazine promotion = yes | sex = male) = 4/6
P(watch promotion = yes | sex = male) = 2/6
P(life insurance promotion = no | sex = male) = 4/6
P(credit card insurance = no | sex = male) = 4/6
P(E | sex = male) = (4/6)(2/6)(4/6)(4/6) = 8/81
16 The Probability for Sex = Male Given Evidence E
P(sex = male | E) ≈ 0.0593 / P(E)
The Probability for Sex = Female Given Evidence E
P(sex = female | E) ≈ 0.0281 / P(E)
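The male branch of the calculation can be checked mechanically. The prior P(sex = male) = 6/10 is an assumption consistent with the six-male denominators on the previous slide; P(E) cancels when the two unnormalized posteriors are compared:

```python
from fractions import Fraction

# Conditional probabilities for sex = male, taken from the slide.
cond_male = [Fraction(4, 6), Fraction(2, 6), Fraction(4, 6), Fraction(4, 6)]

p_e_given_male = Fraction(1)
for p in cond_male:
    p_e_given_male *= p          # naive Bayes: multiply the conditionals

# Numerator of Bayes' rule; dividing by P(E) is unnecessary for comparison.
# The 6/10 prior is assumed, as noted above.
score_male = p_e_given_male * Fraction(6, 10)   # ≈ 0.0593
```

Since 0.0593 > 0.0281, the classifier predicts sex = male for this instance, matching the slide's conclusion.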
17 Zero-Valued Attribute Counts
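A zero attribute count is a problem because one zero conditional probability zeroes out the entire product. The usual remedy is additive (Laplace) smoothing; the form below is a common choice and an assumption here, since the slide gives only the title:

```python
def smoothed_prob(count, total, n_values, k=1.0):
    # Add k virtual occurrences of each attribute value, so an attribute
    # value never seen with a class still gets a small nonzero
    # conditional probability.
    return (count + k) / (total + k * n_values)
```

With count = 0, total = 6, and a two-valued attribute, the estimate becomes 1/8 instead of 0, so a single unseen value no longer forces the whole product to zero.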
18 Missing Data
With the Bayes classifier, missing data items are simply ignored.
19 Numeric Data
f(x) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
where
e = the exponential function
μ = the class mean for the given numeric attribute
σ = the class standard deviation for the attribute
x = the attribute value
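The density above translates directly into code; the Bayes classifier evaluates it at the attribute value x using the class mean and standard deviation:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal probability density function, as given on the slide.
    coef = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
```

The density peaks at the class mean, so attribute values close to a class's mean contribute larger factors to that class's score.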
21 10.4 Clustering Algorithms: Agglomerative Clustering
1. Place each instance into a separate partition.
2. Until all instances are part of a single cluster:
   a. Determine the two most similar clusters.
   b. Merge the chosen clusters into a single cluster.
3. Choose a clustering formed by one of the step 2 iterations as the final result.
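The steps above can be sketched for one-dimensional data. Single-linkage distance is an assumed choice of similarity, and the loop stops at a target cluster count rather than merging all the way to one cluster (in effect combining steps 2 and 3):

```python
def agglomerate(points, target_clusters):
    # Step 1: each instance starts in its own cluster.
    clusters = [[p] for p in points]
    # Step 2: repeatedly merge the two closest clusters.
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]   # merge the chosen pair
        del clusters[j]
    return clusters
```

For instance, the points 1, 2, 10, 11 merge into the two natural groups {1, 2} and {10, 11}.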
22 Agglomerative Clustering: An Example
24 Choosing a Final Clustering
– Compare the average within-cluster similarity to the overall similarity.
– Compare the similarity within each cluster to the similarity between clusters.
– Examine the rule sets generated by each saved clustering.
25 Conceptual Clustering
1. Create a cluster with the first instance as its only member.
2. For each remaining instance, take one of two actions at each tree level:
   a. Place the new instance into an existing cluster.
   b. Create a new concept cluster having the new instance as its only member.
26 Data for Conceptual Clustering
28 COBWEB (Fisher 1987)
– Heuristic measure of partition quality
– Category utility
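The slide names category utility without stating it; the standard form from Fisher (1987), written out here for reference, rewards partitions whose clusters make attribute values more predictable than they are overall:

```latex
CU(C_1, \dots, C_k) = \frac{1}{k} \sum_{l=1}^{k} P(C_l)
\left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_l)^2
     - \sum_i \sum_j P(A_i = V_{ij})^2 \right]
```

Here the inner sums run over every attribute A_i and every value V_ij it can take; dividing by the number of clusters k penalizes partitions with many small clusters.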
29 Expectation Maximization
1. Similar to the K-Means procedure.
2. Makes use of the finite Gaussian mixtures model.
3. The mixture model assigns each individual data instance a probability of belonging to each cluster.
30 The K-Means Algorithm (Section 3.3)
1. Choose a value for K, the total number of clusters.
2. Randomly choose K points as cluster centers.
3. Assign the remaining instances to their closest cluster center.
4. Calculate a new center for each cluster.
5. Repeat steps 3 and 4 until the cluster centers do not change.
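For one-dimensional data the five steps look like this; the seeded random generator (an assumed detail) makes the random choice of initial centers in step 2 repeatable:

```python
import random

def k_means(points, k, rng=random.Random(0)):
    # Step 2: randomly choose k points as the initial cluster centers.
    centers = rng.sample(points, k)
    while True:
        # Step 3: assign each instance to its closest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Step 4: recompute each center as the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 5: stop once the centers no longer change.
        if new_centers == centers:
            return centers, clusters
        centers = new_centers
```

On the well-separated points 1, 2, 9, 10 with K = 2, the procedure settles on centers 1.5 and 9.5 regardless of which two points are drawn first.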
32 Expectation Maximization
1. Guess initial values for the parameters.
2. Until a termination criterion is achieved:
   a. Use the probability density function for normal distributions to compute the cluster probability for each instance (the E step).
   b. Use the probability scores assigned to each instance in step 2(a) to re-estimate the parameters (the M step).
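A sketch of the loop for a two-component, one-dimensional Gaussian mixture; the initial parameter guesses and the fixed iteration count used as the termination criterion are assumptions of this sketch:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal density, as on the numeric-data slide.
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

def em_two_gaussians(data, mu=(-1.0, 1.0), sigma=(1.0, 1.0), n_iter=50):
    mu, sigma = list(mu), list(sigma)
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: cluster probability of each instance under the current model.
        resp = []
        for x in data:
            p = [weight[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(2)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M step: re-estimate the parameters from the probability scores.
        for j in range(2):
            rj = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / rj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / rj
            sigma[j] = math.sqrt(max(var, 1e-6))  # floor avoids collapse
            weight[j] = rj / len(data)
    return mu, sigma, weight
```

On data drawn near −2 and +2 the two component means converge to roughly those values, and the mixture weights always sum to one.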
34 10.5 Heuristics or Statistics?
– Inductive problem-solving methods
– Query and visualization techniques
– Machine learning techniques
– Statistical techniques
35 Query and Visualization Techniques
Query tools and OLAP tools
– Unable to find hidden patterns
Visualization tools
– Decision trees, bar and pie charts, histograms, maps, surface plot diagrams
– Applied after a data mining process to help us understand what has been discovered
36 Machine Learning and Statistical Techniques
1. Statistical techniques typically assume an underlying distribution for the data, whereas machine learning techniques do not.
2. Machine learning techniques tend to have a human flavor.
3. Machine learning techniques are better able to deal with missing and noisy data.
4. Most machine learning techniques are able to explain their behavior.
5. Statistical techniques tend to perform poorly with large-sized data.