1 Statistical Techniques Chapter 10
2 10.1 Linear Regression Analysis: Simple Linear Regression
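A simple linear regression line can be fit with the usual least-squares formulas. The sketch below uses made-up (x, y) pairs for illustration, not the district office building data from the later slide:

```python
import numpy as np

# Illustrative data only (assumed for this sketch, not from the textbook).
x = np.array([10.0, 14.0, 16.0, 20.0, 24.0])
y = np.array([142.0, 168.0, 175.0, 210.0, 242.0])

# Least-squares estimates: b1 = cov(x, y) / var(x), b0 = ybar - b1 * xbar
x_bar, y_bar = x.mean(), y.mean()
b1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
b0 = y_bar - b1 * x_bar

def predict(v):
    # The fitted regression equation: y-hat = b0 + b1 * x
    return b0 + b1 * v
```

A useful sanity check is that the fitted line always passes through the point of means (x-bar, y-bar).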
3 Multiple Linear Regression with Excel
4 A Regression Equation for the District Office Building Data
6 Regression Trees
8 10.2 Logistic Regression: Transforming the Linear Regression Model
Logistic regression is a nonlinear regression technique that associates a conditional probability with each data instance.
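The transformation uses the logistic (sigmoid) function to map the output of a linear model into a probability. A minimal sketch for a single feature; the coefficients w and b are illustrative values, not fitted ones:

```python
import math

def sigmoid(z):
    # The logistic function squashes any real value into (0, 1),
    # so the output can be read as a conditional probability.
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(x, w=1.2, b=-0.5):
    # P(class = 1 | x) for a one-feature model; w and b are
    # assumed example coefficients, not estimates from data.
    return sigmoid(w * x + b)
```

Because the sigmoid is monotonic, larger values of the linear combination always yield larger class probabilities.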
9 The Logistic Regression Model
10 Logistic Regression: An Example
11 10.3 Bayes Classifier
12 Bayes Classifier: An Example
13 The Instance to Be Classified
Magazine Promotion = Yes
Watch Promotion = Yes
Life Insurance Promotion = No
Credit Card Insurance = No
Sex = ?
14 Computing The Probability For Sex = Male
15 Conditional Probabilities for Sex = Male
P(magazine promotion = yes | sex = male) = 4/6
P(watch promotion = yes | sex = male) = 2/6
P(life insurance promotion = no | sex = male) = 4/6
P(credit card insurance = no | sex = male) = 4/6
P(E | sex = male) = (4/6)(2/6)(4/6)(4/6) = 8/81
16 The Probability for Sex = Male Given Evidence E
P(sex = male | E) ≈ 0.0593 / P(E)
The Probability for Sex = Female Given Evidence E
P(sex = female | E) ≈ 0.0281 / P(E)
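The male branch of the calculation can be checked mechanically. The prior P(sex = male) = 6/10 is an assumption consistent with the six-male denominators on the previous slide; P(E) cancels when the two unnormalized posteriors are compared:

```python
from fractions import Fraction

# Conditional probabilities for sex = male, taken from the slide.
cond_male = [Fraction(4, 6), Fraction(2, 6), Fraction(4, 6), Fraction(4, 6)]

p_e_given_male = Fraction(1)
for p in cond_male:
    p_e_given_male *= p          # naive Bayes: multiply the conditionals

# Numerator of Bayes' rule; dividing by P(E) is unnecessary for comparison.
# The 6/10 prior is assumed, as noted above.
score_male = p_e_given_male * Fraction(6, 10)   # ≈ 0.0593
```

Since 0.0593 > 0.0281, the classifier predicts sex = male for this instance, matching the slide's conclusion.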
17 Zero-Valued Attribute Counts
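A zero attribute count is a problem because one zero conditional probability zeroes out the entire product. The usual remedy is additive (Laplace) smoothing; the form below is a common choice and an assumption here, since the slide gives only the title:

```python
def smoothed_prob(count, total, n_values, k=1.0):
    # Add k virtual occurrences of each attribute value, so an attribute
    # value never seen with a class still gets a small nonzero
    # conditional probability.
    return (count + k) / (total + k * n_values)
```

With count = 0, total = 6, and a two-valued attribute, the estimate becomes 1/8 instead of 0, so a single unseen value no longer forces the whole product to zero.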
18 Missing Data
With the Bayes classifier, missing data items are simply ignored.
19 Numeric Data
f(x) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
where
e = the exponential function
μ = the class mean for the given numeric attribute
σ = the class standard deviation for the attribute
x = the attribute value
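The density above translates directly into code; the Bayes classifier evaluates it at the attribute value x using the class mean and standard deviation:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal probability density function, as given on the slide.
    coef = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
```

The density peaks at the class mean, so attribute values close to a class's mean contribute larger factors to that class's score.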
21 10.4 Clustering Algorithms: Agglomerative Clustering
1. Place each instance into a separate partition.
2. Until all instances are part of a single cluster:
   a. Determine the two most similar clusters.
   b. Merge the chosen clusters into a single cluster.
3. Choose a clustering formed by one of the step 2 iterations as the final result.
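The steps above can be sketched for one-dimensional data. Single-linkage distance is an assumed choice of similarity, and the loop stops at a target cluster count rather than merging all the way to one cluster (in effect combining steps 2 and 3):

```python
def agglomerate(points, target_clusters):
    # Step 1: each instance starts in its own cluster.
    clusters = [[p] for p in points]
    # Step 2: repeatedly merge the two closest clusters.
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]   # merge the chosen pair
        del clusters[j]
    return clusters
```

For instance, the points 1, 2, 10, 11 merge into the two natural groups {1, 2} and {10, 11}.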
22 Agglomerative Clustering: An Example
24 Choosing a Final Clustering
– Compare the average within-cluster similarity to the overall similarity.
– Compare the similarity within each cluster to the similarity between clusters.
– Examine the rule sets generated by each saved clustering.
25 Conceptual Clustering
1. Create a cluster with the first instance as its only member.
2. For each remaining instance, take one of two actions at each tree level:
   a. Place the new instance into an existing cluster.
   b. Create a new concept cluster having the new instance as its only member.
26 Data for Conceptual Clustering
28 COBWEB (Fisher 1987)
– Heuristic measure of partition quality
– Category utility
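The slide names category utility without stating it; the standard form from Fisher (1987), written out here for reference, rewards partitions whose clusters make attribute values more predictable than they are overall:

```latex
CU(C_1, \dots, C_k) = \frac{1}{k} \sum_{l=1}^{k} P(C_l)
\left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_l)^2
     - \sum_i \sum_j P(A_i = V_{ij})^2 \right]
```

Here the inner sums run over every attribute A_i and every value V_ij it can take; dividing by the number of clusters k penalizes partitions with many small clusters.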
29 Expectation Maximization
1. Similar to the K-Means procedure.
2. Makes use of the finite Gaussian mixtures model.
3. The mixture model assigns each individual data instance a probability of belonging to each cluster.
30 The K-Means Algorithm (Section 3.3)
1. Choose a value for K, the total number of clusters.
2. Randomly choose K points as cluster centers.
3. Assign the remaining instances to their closest cluster center.
4. Calculate a new center for each cluster.
5. Repeat steps 3 and 4 until the cluster centers do not change.
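For one-dimensional data the five steps look like this; the seeded random generator (an assumed detail) makes the random choice of initial centers in step 2 repeatable:

```python
import random

def k_means(points, k, rng=random.Random(0)):
    # Step 2: randomly choose k points as the initial cluster centers.
    centers = rng.sample(points, k)
    while True:
        # Step 3: assign each instance to its closest cluster center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Step 4: recompute each center as the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 5: stop once the centers no longer change.
        if new_centers == centers:
            return centers, clusters
        centers = new_centers
```

On the well-separated points 1, 2, 9, 10 with K = 2, the procedure settles on centers 1.5 and 9.5 regardless of which two points are drawn first.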
32 Expectation Maximization
1. Guess initial values for the parameters.
2. Until a termination criterion is achieved:
   a. Use the probability density function for normal distributions to compute the cluster probability for each instance (the E step).
   b. Use the probability scores assigned to each instance in step 2(a) to re-estimate the parameters (the M step).
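A sketch of the loop for a two-component, one-dimensional Gaussian mixture; the initial parameter guesses and the fixed iteration count used as the termination criterion are assumptions of this sketch:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal density, as on the numeric-data slide.
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

def em_two_gaussians(data, mu=(-1.0, 1.0), sigma=(1.0, 1.0), n_iter=50):
    mu, sigma = list(mu), list(sigma)
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E step: cluster probability of each instance under the current model.
        resp = []
        for x in data:
            p = [weight[j] * gaussian_pdf(x, mu[j], sigma[j]) for j in range(2)]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M step: re-estimate the parameters from the probability scores.
        for j in range(2):
            rj = sum(r[j] for r in resp)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / rj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / rj
            sigma[j] = math.sqrt(max(var, 1e-6))  # floor avoids collapse
            weight[j] = rj / len(data)
    return mu, sigma, weight
```

On data drawn near −2 and +2 the two component means converge to roughly those values, and the mixture weights always sum to one.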
34 10.5 Heuristics or Statistics?
– Inductive problem-solving methods
– Query and visualization techniques
– Machine learning techniques
– Statistical techniques
35 Query and Visualization Techniques
Query tools and OLAP tools
– Unable to find hidden patterns
Visualization tools
– Decision trees, bar and pie charts, histograms, maps, surface plot diagrams
– Applied after a data mining process to help us understand what has been discovered
36 Machine Learning and Statistical Techniques
1. Statistical techniques typically assume an underlying distribution for the data, whereas machine learning techniques do not.
2. Machine learning techniques tend to have a human flavor.
3. Machine learning techniques are better able to deal with missing and noisy data.
4. Most machine learning techniques are able to explain their behavior.
5. Statistical techniques tend to perform poorly with large-sized data.