1 DATA REDUCTION (Lecture # 03)
Dr. Tahseen Ahmed Jilani, Assistant Professor
Member IEEE-CIS, IFSA, IRSS
Department of Computer Science, University of Karachi

2 Chapter Objectives
Identify the differences among dimensionality-reduction techniques based on features, cases, and reduction of values.
Understand the advantages of performing dimensionality reduction as a preprocessing step, before the data-mining techniques are applied.
Understand the basic principles of feature-selection and feature-composition tasks using the corresponding statistical methods.
Apply principal component analysis and entropy-based techniques, and compare them.

3 Data Reduction
The standard data-preprocessing steps are sufficient for moderate data sets.
For really large data sets, an intermediate, additional step - data reduction - should be performed before the data-mining techniques are applied.
Although large data sets have the potential for better mining results, there is no guarantee that they will yield better knowledge than small data sets.
For large databases (data sets), it is possible that huge data carry little information (knowledge).

4 Types of Data Reduction
The three basic operations in a data-reduction process are: delete a column, delete a row, and reduce the number of values in a column.
These operations attempt to preserve the character of the original data by deleting only data that are nonessential.
There are other operations that reduce dimensions, but the new data are unrecognizable when compared with the original data set; these operations are mentioned only briefly here because they are highly application-dependent.

5 Why Data Reduction
Computing time: simpler data lead to shorter computing times.
Predictive/descriptive accuracy: this measures how well the data are summarized and generalized into the model. We generally expect that by using only relevant features a data-mining algorithm can learn not only faster but also with higher accuracy. Irrelevant data may mislead the learning process and the final model, while redundant data may complicate the task of learning and cause unexpected data-mining results.
Representation: if the simplicity of the representation improves, a relatively small decrease in accuracy may be tolerable. A balanced view between accuracy and simplicity is necessary, and dimensionality reduction is one of the mechanisms for achieving this balance.

6 Dimension Reduction
The main question is whether some of the prepared and preprocessed data can be discarded without sacrificing the quality of results (principle of parsimony).
Can the prepared data be reviewed and a subset found in a reasonable amount of time and space?
If the complexity of the algorithms for data reduction increases exponentially, then there is little to gain in reducing the dimensions of big data.

7 Dimensions of Large Data Sets
The choice of data representation, and the selection, reduction, or transformation of features, is probably the most important issue that determines the quality of a data-mining solution.
A large number of features can make the available data samples relatively insufficient for mining. In practice, the number of features can run to several hundred.
If we have only a few hundred samples for analysis, dimensionality reduction is required in order for any reliable model to be mined or to be of any practical use.
On the other hand, data overload caused by high dimensionality can make some data-mining algorithms inapplicable, and the only solution is again a reduction of the data dimensions.

8 Main Objectives in Data Reduction
The three basic operations in a data-reduction process are:
–Delete a column (principal component analysis)
–Delete a row (profile analysis, self-organization analysis, classification, and clustering)
–Reduce the number of values in a column (smooth a feature).
These operations attempt to preserve the character of the original data by deleting only data that are nonessential.

9 ENTROPY MEASURE FOR RANKING FEATURES
A method for unsupervised feature selection or ranking based on an entropy measure is a relatively simple technique, but with a large number of features its complexity increases significantly.
The basic assumption is that all samples are given as vectors of feature values, without any classification of the output samples.
The approach is based on the observation that removing an irrelevant feature, a redundant feature, or both from a set may not change the basic characteristics of the data set.
The idea is to remove as many features as possible while still maintaining the level of distinction between the samples in the data set as if no features had been removed.

10 ENTROPY MEASURE FOR RANKING FEATURES
Algorithm
The algorithm is based on a similarity measure S that is in inverse proportion to the distance D between two n-dimensional samples.
The distance measure D is small for close samples (close to zero) and large for distinct pairs (close to one). When the features are numeric, the similarity measure S of two samples can be defined as
S_ij = e^(-α · D_ij)
where D_ij is the distance between samples x_i and x_j, and α is a parameter mathematically expressed as
α = -ln(0.5) / D̄

11 ENTROPY MEASURE FOR RANKING FEATURES (Continue)
D̄ is the average distance among samples in the data set; hence, α is determined by the data. However, in a successfully implemented practical application, a constant value of α = 0.5 was used. A normalized Euclidean distance measure is used to calculate the distance D_ij between two samples x_i and x_j:
D_ij = [ Σ_{k=1..n} ( (x_ik - x_jk) / (max_k - min_k) )² ]^(1/2)
where n is the number of dimensions and max_k and min_k are the maximum and minimum values used for normalization of the k-th dimension.
Not all features are numeric. The similarity for nominal variables is measured directly using the Hamming distance:
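As a sketch of the numeric case, the Python fragment below (hypothetical function and variable names, assuming the α = -ln(0.5) / D̄ convention stated above) computes the normalized Euclidean distances and the resulting similarity matrix for a small data matrix:

```python
import numpy as np

def numeric_similarity(X, alpha=None):
    """Similarity matrix S_ij = exp(-alpha * D_ij) for numeric samples.

    X is an (N, n) array; each dimension is normalized by (max_k - min_k),
    as on the slide. If alpha is None it is set to -ln(0.5) / mean distance,
    so that an 'average' pair of samples has similarity 0.5.
    """
    X = np.asarray(X, dtype=float)
    rng = X.max(axis=0) - X.min(axis=0)
    rng[rng == 0] = 1.0                      # avoid division by zero for constant features
    Z = X / rng                              # scale each dimension to unit range
    diff = Z[:, None, :] - Z[None, :, :]     # pairwise differences
    D = np.sqrt((diff ** 2).sum(axis=2))     # normalized Euclidean distances D_ij
    if alpha is None:
        iu = np.triu_indices(len(X), k=1)
        alpha = -np.log(0.5) / D[iu].mean()  # data-driven alpha
    return np.exp(-alpha * D)

# Example usage on a toy 4-sample, 3-feature data set
S = numeric_similarity([[1.0, 2.0, 3.0],
                        [1.2, 2.1, 2.9],
                        [5.0, 8.0, 1.0],
                        [4.8, 7.5, 1.2]])
print(np.round(S, 2))
```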

12 ENTROPY MEASURE FOR RANKING FEATURES (Continue)
S_ij = ( Σ_{k=1..n} |x_ik = x_jk| ) / n
where |x_ik = x_jk| is 1 if x_ik = x_jk and 0 otherwise, and the total number of variables is equal to n.
For mixed data, we can discretize numeric values (binning) and transform numeric features into nominal features before we apply this similarity measure.
Figure 3.1 is an example of a simple data set with three categorical features; the corresponding similarities are given in Table 3.1.

13 ENTROPY MEASURE FOR RANKING FEATURES (Continue)
Features similarity measure for nominal data (Figure 3.1 data and Table 3.1 similarities S):

Sample  F1  F2  F3  |  R1    R2    R3    R4    R5
R1      A   X   1   |  1     0/3   0/3   2/3   0/3
R2      B   Y   2   |  --    1     2/3   1/3   0/3
R3      C   Y   2   |  --    --    1     0/3   1/3
R4      B   X   1   |  --    --    --    1     0/3
R5      C   Z   3   |  --    --    --    --    1
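A small sketch (assuming Table 3.1 uses the matching-attributes similarity defined on the previous slide) that reproduces these similarity values:

```python
from itertools import combinations

# Samples R1..R5 with categorical features F1, F2, F3 (Figure 3.1)
samples = {
    "R1": ("A", "X", 1),
    "R2": ("B", "Y", 2),
    "R3": ("C", "Y", 2),
    "R4": ("B", "X", 1),
    "R5": ("C", "Z", 3),
}

def nominal_similarity(x, y):
    """Fraction of attributes on which the two samples agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

for (ri, x), (rj, y) in combinations(samples.items(), 2):
    print(f"S({ri},{rj}) = {nominal_similarity(x, y):.2f}")
```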

14 ENTROPY MEASURE FOR RANKING FEATURES (Continue)
The distribution of all similarities (distances) for a given data set is a characteristic of the organization and order of data in an n-dimensional space.
This organization may be more or less ordered. Changes in the level of order in a data set are the main criterion for inclusion or exclusion of a feature from the feature set; these changes may be measured by entropy.
From information theory, we know that entropy is a global measure: it is lower for ordered configurations and higher for disordered configurations.
The proposed technique compares the entropy measure for a given data set before and after removal of a feature.

15 Entropy Function
If the two measures are close, then the reduced set of features will satisfactorily approximate the original set. For a data set of N samples, the entropy measure is
E = - Σ_{i=1..N-1} Σ_{j=i+1..N} [ S_ij · log S_ij + (1 - S_ij) · log(1 - S_ij) ]
where S_ij is the similarity between samples x_i and x_j. This measure is computed in each iteration as a basis for deciding the ranking of features. We rank features by gradually removing the feature that is least important for maintaining the order in the configurations of data. The steps of the algorithm are based on sequential backward ranking, and they have been successfully tested on several real-world applications.
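A minimal sketch of this entropy measure (assuming natural logarithms and clipping the similarities to keep log(0) out of the sum):

```python
import numpy as np

def dataset_entropy(S):
    """E = -sum over pairs i<j of [S_ij*log(S_ij) + (1 - S_ij)*log(1 - S_ij)]."""
    iu = np.triu_indices(S.shape[0], k=1)      # use each unordered pair once
    s = np.clip(S[iu], 1e-12, 1 - 1e-12)       # guard against log(0)
    return -np.sum(s * np.log(s) + (1 - s) * np.log(1 - s))
```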

16 Entropy Function: Algorithm
1. Start with the initial full set of features F.
2. For each feature f, remove f from F and obtain a subset F_f. Find the difference between the entropy for F and the entropy for each F_f. Let f_k be the feature for which this difference is minimum.
3. Update the set of features: F = F - {f_k}, where - is the set-difference operation. In our example, if the difference (E_F - E_{F-F1}) is minimum, then the reduced set of features is {F2, F3}, and F1 goes to the bottom of the ranked list.
4. Repeat steps 2-3 until there is only one feature in F.
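Putting the pieces together, a sketch of the sequential backward ranking loop (reusing the hypothetical helpers numeric_similarity and dataset_entropy from the fragments above) might look like this:

```python
import numpy as np

def rank_features(X):
    """Return feature indices ordered from least to most important.

    In each pass, the feature whose removal changes the data-set entropy
    the least is dropped and appended to the ranking (bottom first).
    """
    remaining = list(range(X.shape[1]))
    ranking = []
    while len(remaining) > 1:
        E_full = dataset_entropy(numeric_similarity(X[:, remaining]))
        diffs = []
        for f in remaining:
            subset = [c for c in remaining if c != f]
            E_sub = dataset_entropy(numeric_similarity(X[:, subset]))
            diffs.append(abs(E_full - E_sub))
        f_k = remaining[int(np.argmin(diffs))]   # least important feature this pass
        ranking.append(f_k)
        remaining.remove(f_k)
    ranking.extend(remaining)                    # last surviving feature is most important
    return ranking

# Example usage on a random 20-sample, 5-feature matrix
X = np.random.default_rng(0).normal(size=(20, 5))
print(rank_features(X))
```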

17 Entropy Function: Algorithm (Continue)
The ranking process may be stopped in any iteration and may be transformed into a process of feature selection by using an additional criterion: the difference between the entropy for F and the entropy for F_f should be less than an approved threshold value in order to remove feature f_k from set F.
Computational complexity is the basic disadvantage of this algorithm; a parallel implementation could overcome the problems of working sequentially with large data sets and a large number of features.

18 Principal Component Analysis
Principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are:
Data reduction
Interpretation
Although p components are required to describe the complete variability of the system, often much of this variability can be accounted for by a small number k of the principal components. If so, there is (almost) as much information in the k components as there is in the original p variables. The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to a data set consisting of n measurements on k principal components.

19 Principal Component Analysis (Continue)
An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result.
Analyses of principal components also provide intermediate steps in much larger investigations. For example, principal components may be inputs to a multiple regression model, a cluster analysis, or a factor analysis.

20 Principal Component Analysis (Continue)
Algebraically, principal components are particular linear combinations of the p random variables X1, X2, ..., Xp.
Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with X1, X2, ..., Xp as the coordinate axes. The new axes represent the directions of maximum variability and provide a simpler and more parsimonious description of the covariance structure.
The principal components depend solely on the covariance matrix Σ (or the correlation matrix ρ) of X1, X2, ..., Xp.

21 Principal Component Analysis (Continue)
An important characteristic is that principal components do not require the assumption of a multivariate normal distribution. However, if the data do follow a multivariate normal distribution, interpretations in terms of constant-density contours can be made, and inference can be carried out using the sample principal components.
Let the random vector X' = [X1, X2, ..., Xp] have the covariance matrix Σ with eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0.
Consider the linear combinations
Y1 = a1'X = a11·X1 + a12·X2 + ... + a1p·Xp
Y2 = a2'X = a21·X1 + a22·X2 + ... + a2p·Xp
...
Yp = ap'X = ap1·X1 + ap2·X2 + ... + app·Xp

22 Principal Component Analysis (Continue)
Then, we can obtain
Var(Yi) = ai' Σ ai,  i = 1, 2, ..., p
Cov(Yi, Yk) = ai' Σ ak,  i, k = 1, 2, ..., p
The principal components are those uncorrelated linear combinations Y1, Y2, ..., Yp whose variances are as large as possible.
The first principal component is the linear combination with maximum variance; that is, it maximizes Var(Y1) = a1' Σ a1.
It is clear that Var(Y1) = a1' Σ a1 can be increased by multiplying a1 by some constant. To eliminate this indeterminacy, it is convenient to restrict attention to coefficient vectors of unit length; we therefore require a1'a1 = 1.

23 Principal Component Analysis (Continue)
Important Results
First principal component = linear combination a1'X that maximizes Var(a1'X) subject to a1'a1 = 1.
Second principal component = linear combination a2'X that maximizes Var(a2'X) subject to a2'a2 = 1 and Cov(a1'X, a2'X) = 0.
At the i-th step, the i-th principal component = linear combination ai'X that maximizes Var(ai'X) subject to ai'ai = 1 and Cov(ai'X, ak'X) = 0 for k < i.

24 Principal Component Analysis (Continue)
Proportion of total population variance due to the k-th principal component = λk / (λ1 + λ2 + ... + λp), for k = 1, 2, ..., p.
If Y1 = e1'X, Y2 = e2'X, ..., Yp = ep'X are the principal components obtained from the covariance matrix Σ, then
ρ(Yi, Xk) = e_ki · √λi / √σ_kk
are the correlation coefficients between the components Yi and the variables Xk, where e_ki is the k-th entry of the eigenvector e_i. Here (λ1, e1), (λ2, e2), ..., (λp, ep) are the eigenvalue-eigenvector pairs of Σ.

25 Principal Component Analysis (Continue)
Example
Suppose the random variables X1, X2, X3 have the covariance matrix
Σ = [ 1  -2   0
     -2   5   0
      0   0   2 ]
It may be verified that the eigenvalue-eigenvector pairs are
λ1 = 5.83, e1' = [0.383, -0.924, 0]
λ2 = 2.00, e2' = [0, 0, 1]
λ3 = 0.17, e3' = [0.924, 0.383, 0]
Therefore, the principal components become
Y1 = 0.383·X1 - 0.924·X2,  Y2 = X3,  Y3 = 0.924·X1 + 0.383·X2

26 Principal Component Analysis (Continue)
The variable X3 is itself one of the principal components, because it is uncorrelated with the other two variables. This implies Var(Y2) = Var(X3) = λ2 = 2.
Furthermore, Var(Y1) = 5.83 = λ1, Cov(Y1, Y2) = 0, and λ1 + λ2 + λ3 = σ11 + σ22 + σ33 = 8.

27 Principal Component Analysis (Continue)
Therefore, the first two principal components account for 98% of the total variance: (λ1 + λ2) / (λ1 + λ2 + λ3) = (5.83 + 2.00) / 8 ≈ 0.98. In this case the components Y1 and Y2 could replace the original three variables with little loss of information.
The correlations of the original variables with the principal components are
ρ(Y1, X1) = e11·√λ1 / √σ11 ≈ 0.925,  ρ(Y1, X2) = e21·√λ1 / √σ22 ≈ -0.998,  ρ(Y2, X3) = 1.
As Y3 is neglected, there is no need to calculate its correlations.
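Assuming the covariance matrix shown above (which matches the classic example in the Johnson text cited in the references), a quick numpy check of the eigen-decomposition, variance proportions, and variable-component correlations might look like this:

```python
import numpy as np

# Covariance matrix from the example (assumed; see the cited Johnson text)
Sigma = np.array([[ 1., -2., 0.],
                  [-2.,  5., 0.],
                  [ 0.,  0., 2.]])

lam, E = np.linalg.eigh(Sigma)          # eigenvalues ascending, columns are eigenvectors
order = np.argsort(lam)[::-1]           # reorder from largest to smallest
lam, E = lam[order], E[:, order]

print("eigenvalues:", np.round(lam, 2))                  # approx. [5.83, 2.00, 0.17]
print("proportion of variance:", np.round(lam / lam.sum(), 3))
print("first two components explain:",
      round((lam[0] + lam[1]) / lam.sum(), 3))           # approx. 0.98

# Correlations rho(Y_i, X_k) = e_ki * sqrt(lambda_i) / sqrt(sigma_kk)
sd = np.sqrt(np.diag(Sigma))
rho = E * np.sqrt(lam) / sd[:, None]
print("correlations of X_k with components:\n", np.round(rho, 3))
```

Note that eigh may flip the sign of an eigenvector; the correlations then flip sign together, which does not change the interpretation.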

28 Principal Component Analysis (Continue)
The Number of Principal Components
There is always a question of how many components to retain. There is no definitive answer to this question. Things to consider include:
The amount of total sample variance explained
The relative sizes of the eigenvalues (that is, the variances of the sample components)
The subject-matter interpretation of the components.

29 Principal Component Analysis (Continue)
Scree Plot
A useful visual aid for determining an appropriate number of principal components is a scree plot. With the eigenvalues ordered from largest to smallest, a scree plot is a plot of λi versus i, that is, the magnitude of an eigenvalue versus its number; the appropriate number of components is suggested by the bend (elbow) in the scree plot.
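A sketch of a scree plot with matplotlib, assuming the eigenvalues are already available in descending order (the values below are those of the covariance example above):

```python
import matplotlib.pyplot as plt

eigenvalues = [5.83, 2.00, 0.17]            # e.g., from the covariance example

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xticks(range(1, len(eigenvalues) + 1))
plt.xlabel("component number i")
plt.ylabel("eigenvalue")
plt.title("Scree plot: look for the bend (elbow)")
plt.show()
```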

30 SPSS FACTOR ANALYSIS OF CUSTOMER.SAV

31 SPSS FACTOR ANALYSIS OF CUSTOMER.SAV

32 SPSS FACTOR ANALYSIS OF CUSTOMER.SAV

33 Factor Analysis
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables.
Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables.
Factor analysis can also be used to generate hypotheses regarding causal mechanisms or to screen variables for subsequent analysis (for example, to identify collinearity prior to performing a linear regression analysis).

34 Factor Analysis (Continue)
The factor analysis procedure offers a high degree of flexibility:
–Seven methods of factor extraction are available.
–Five methods of rotation are available, including direct oblimin and promax for non-orthogonal rotations.
–Three methods of computing factor scores are available, and scores can be saved as variables for further analysis.
The essential purpose of factor analysis is to describe, if possible, the covariance (correlation) relationships among many variables in terms of a few underlying, but unobserved, random quantities called factors.

35 Factor Analysis (Continue)
Factor analysis can be considered an extension of principal component analysis. Both can be viewed as attempts to approximate the covariance matrix. However, the approximation based on the factor analysis model is more elaborate.
The primary question in factor analysis is whether the data are consistent with a prescribed structure.
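Outside SPSS, a comparable extraction can be sketched with scikit-learn's FactorAnalysis (a maximum-likelihood style extraction; the SPSS options listed above, such as oblimin or promax rotation, are not all available there). The data below are synthetic and only illustrate the workflow:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data matrix: 200 samples, 6 observed variables driven by 2 latent factors
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 2))                       # latent common factors
L = rng.normal(size=(2, 6))                         # true loadings
X = F @ L + 0.3 * rng.normal(size=(200, 6))         # observed variables + specific noise

fa = FactorAnalysis(n_components=2)
fa.fit(X)

print("estimated loadings (2 factors x 6 variables):\n", np.round(fa.components_, 2))
print("specific (noise) variances:", np.round(fa.noise_variance_, 2))
scores = fa.transform(X)                            # factor scores, usable as new variables
print("factor scores shape:", scores.shape)
```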

36 The Orthogonal Factor Model
The observed random vector X, with p components, has mean μ and covariance matrix Σ.
The factor model postulates that X is linearly dependent upon a few unobserved random variables F1, F2, ..., Fm, called common factors, and p additional sources of variation ε1, ε2, ..., εp, called errors or, sometimes, specific factors (which include measurement errors).

37 The Orthogonal Factor Model (Continue)
In particular, the factor analysis model is
X1 - μ1 = l11·F1 + l12·F2 + ... + l1m·Fm + ε1
X2 - μ2 = l21·F1 + l22·F2 + ... + l2m·Fm + ε2
...
Xp - μp = lp1·F1 + lp2·F2 + ... + lpm·Fm + εp
or, in matrix notation,
X - μ = L·F + ε
The coefficient l_ij is called the loading of the i-th variable on the j-th factor, so the matrix L is the matrix of factor loadings.

38 The Orthogonal Factor Model (Continue)
Note that the i-th specific factor εi is associated only with the i-th response Xi.
Here the p deviations X1 - μ1, X2 - μ2, ..., Xp - μp of the given data are expressed in terms of p + m random variables F1, ..., Fm, ε1, ..., εp, which are unobservable.
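For completeness, the usual orthogonal-factor-model assumptions and the covariance structure they imply can be written as follows (standard notation, stated here as a summary rather than taken from the slides):

```latex
\begin{aligned}
X_i - \mu_i &= \ell_{i1} F_1 + \ell_{i2} F_2 + \dots + \ell_{im} F_m + \varepsilon_i,
\qquad i = 1, \dots, p \\
X - \mu &= L\,F + \varepsilon \\
E(F) &= 0, \quad \operatorname{Cov}(F) = I, \quad
E(\varepsilon) = 0, \quad \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)}, \quad
\operatorname{Cov}(F, \varepsilon) = 0 \\
\Rightarrow\ \Sigma &= \operatorname{Cov}(X) = L L' + \Psi
\end{aligned}
```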

39 VALUES REDUCTION (BINNING)
A reduction in the number of discrete values for a given feature is based on the second set of techniques in the data-reduction phase: the feature-discretization techniques.
The task is to discretize the values of continuous features into a small number of intervals, where each interval is mapped to a discrete symbol.
The benefits of these techniques are a simplified data description and easy-to-understand data and final data-mining results. Also, more data-mining techniques are applicable with discrete feature values. An "old-fashioned" discretization is made manually, based on our a-priori knowledge about the feature.

40 VALUES REDUCTION (BINNING)
Example: binning the values of an age feature.
Given the continuous/measurable nature of this feature at the beginning of a data-mining process, an age (between 0 and 150 years) may be classified into categorical segments: child, adolescent, adult, middle age, and elderly, where the cut-off points are subjectively defined. Two main questions exist about this reduction process:
–What are the cut-off points?
–How does one select the representatives of the intervals?
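A small sketch of this manual, a-priori style of discretization (the cut-off points below are illustrative, not the "right" answer to the questions above):

```python
import pandas as pd

ages = pd.Series([3, 14, 19, 27, 35, 44, 52, 61, 70, 83])

# Subjectively chosen cut-off points; each interval is mapped to a discrete symbol
bins = [0, 12, 19, 40, 60, 150]
labels = ["child", "adolescent", "adult", "middle age", "elderly"]

age_group = pd.cut(ages, bins=bins, labels=labels, right=True)
print(pd.DataFrame({"age": ages, "group": age_group}))
```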

41 VALUES REDUCTION (BINNING)
Note: A reduction in feature values usually is not harmful for real-world data-mining applications, and it leads to a major decrease in computational complexity.

42 BINNING (Continue)

43 BINNING (Continue)

44 Feature Discretization: The ChiMerge Technique
ChiMerge is an automated discretization algorithm that analyzes the quality of multiple intervals for a given feature by using the χ2 statistic.
The algorithm determines the similarity between the distributions of data in two adjacent intervals based on the output classification of the samples.
If the null hypothesis (that the class distribution is independent of the interval) cannot be rejected, the two consecutive intervals are merged to form a single larger interval; the intervals are assumed to be non-overlapping.

45 CHI-MERGE Technique (Continue)
A contingency table for 2 × 2 categorical data:

            Class 1   Class 2   Sum
Interval-1  A11       A12       R1
Interval-2  A21       A22       R2
Sum         C1        C2        N

46 CHI-MERGE Technique (Continue)
χ2 = Σ_{i=1..2} Σ_{j=1..k} (A_ij - E_ij)² / E_ij
where k is the number of classes, A_ij is the number of samples in the i-th interval belonging to the j-th class, E_ij = (R_i · C_j) / N is the expected frequency of A_ij, R_i is the number of samples in the i-th interval, C_j is the number of samples in the j-th class, and N is the total number of samples in the two intervals. The number of degrees of freedom is k - 1.
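A sketch of one merging pass of ChiMerge (simplified; a full implementation repeats this until every adjacent pair of intervals exceeds the χ2 threshold). The interval counts, class labels, and significance level below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def chi2_statistic(row1, row2):
    """Chi-square statistic for two adjacent intervals (a 2 x k contingency table)."""
    table = np.array([row1, row2], dtype=float)
    R = table.sum(axis=1, keepdims=True)       # interval totals R_i
    C = table.sum(axis=0, keepdims=True)       # class totals C_j
    E = R * C / table.sum()                    # expected counts E_ij = R_i * C_j / N
    E[E == 0] = 0.1                            # common small-sample correction
    return ((table - E) ** 2 / E).sum()

# Class counts per interval for a feature discretized into 4 intervals, 2 classes
intervals = [[8, 2], [7, 3], [1, 9], [2, 8]]

threshold = chi2.ppf(0.95, df=1)               # df = (number of classes - 1) for two intervals
stats = [chi2_statistic(intervals[i], intervals[i + 1]) for i in range(len(intervals) - 1)]
print("adjacent chi-square values:", np.round(stats, 2))

i = int(np.argmin(stats))
if stats[i] < threshold:                       # similar class distributions -> merge
    merged = list(np.add(intervals[i], intervals[i + 1]))
    intervals[i:i + 2] = [merged]
print("intervals after one merge:", intervals)
```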

47 VALUES REDUCTION (BINNING)

48 References
Mehmed Kantardzic, "Data Mining: Concepts, Models, Methods, and Algorithms", John Wiley & Sons, 2003.
Johnson, "Applied Multivariate Analysis", Pearson Education, Low Price Edition, 2005.

49 Thank You

