1 COMP527: Data Mining
Attribute Selection
M. Sulaiman Khan
Dept. of Computer Science, University of Liverpool, 2009

2 COMP527: Data Mining
Introduction to the Course
Introduction to Data Mining
Introduction to Text Mining
General Data Mining Issues
Data Warehousing
Classification: Challenges, Basics
Classification: Rules
Classification: Trees
Classification: Trees 2
Classification: Bayes
Classification: Neural Networks
Classification: SVM
Classification: Evaluation
Classification: Evaluation 2
Regression, Prediction
Input Preprocessing
Attribute Selection
Association Rule Mining
ARM: A Priori and Data Structures
ARM: Improvements
ARM: Advanced Techniques
Clustering: Challenges, Basics
Clustering: Improvements
Clustering: Advanced Algorithms
Hybrid Approaches
Graph Mining, Web Mining
Text Mining: Challenges, Basics
Text Mining: Text-as-Data
Text Mining: Text-as-Language
Revision for Exam

3 Today's Topics
Sampling
Dimensionality Reduction
Genetic Algorithms
Compression
Principal Component Analysis

4 Instance Selection
Before getting to the data mining, we may want to remove instances or select only a portion of the complete data set to work with. Why? Perhaps our algorithms don't scale well to the amount of data we have.
Partitioning: Split the database into sections and work with each in turn. Often not appropriate unless the algorithm is designed to do it.
Sampling: Select a random subset of the data and use that, in the hope that it is representative.

5 Sampling
Simple Random Sample without Replacement: Draw n instances, with the same probability of drawing each instance. Once an instance is drawn, it is removed from the pool.
Simple Random Sample with Replacement: As above, but the instance is returned to the pool so that it can be drawn again.
Advantages: Cost is proportional to the number of items in the sample, not to the entire data set. However, it's random and simple: we might randomly get a very non-representative sample.
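A minimal sketch of the two schemes using only Python's standard library; the `instances` list and the sample size are illustrative, not from the slides:

```python
import random

def sample_without_replacement(data, n):
    # Each instance can be drawn at most once; cost grows with n, not len(data).
    return random.sample(data, n)

def sample_with_replacement(data, n):
    # Each draw is independent, so the same instance may appear several times.
    return [random.choice(data) for _ in range(n)]

instances = list(range(1000))              # stand-in for a large data set
print(sample_without_replacement(instances, 5))
print(sample_with_replacement(instances, 5))
```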

6 Sampling
Sample from Clusters: Cluster the data first into k clusters, then draw a random sample from each cluster, with or without replacement. If the clustering algorithm performs well, the sample is likely to be much more representative. However, clustering is expensive, possibly more expensive than using the entire data set.
Stratified Sample: Group the instances according to some attribute value (e.g. class) and draw a random sample from each layer. Could be considered naive clustering.
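As a rough illustration of a stratified sample in plain Python: group the instances by their class label and sample each stratum separately (all names here are hypothetical):

```python
import random
from collections import defaultdict

def stratified_sample(instances, class_of, fraction):
    # Group instances by class label (the strata), then draw a random
    # sample of the requested fraction from each group without replacement.
    strata = defaultdict(list)
    for inst in instances:
        strata[class_of(inst)].append(inst)
    sample = []
    for members in strata.values():
        k = max(1, int(len(members) * fraction))
        sample.extend(random.sample(members, k))
    return sample

# Example: instances are (features, label) pairs; keep 10% of each class.
data = [((i, i * 2), "pos" if i % 3 == 0 else "neg") for i in range(300)]
subset = stratified_sample(data, class_of=lambda inst: inst[1], fraction=0.1)
```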

7 Outliers / Noise
It may be tempting to remove all of the noisy data so that the system can learn a cleaner model for classification. However, if the data to be classified is also noisy, a classifier trained on clean data will perform worse than one trained with noise similar to that in the test set. Successful classifiers have some tolerance for noise in the training set, whereas if the noise is removed, they may overfit.

8 Dimensionality Reduction
Results can often be improved by reducing the number of attributes (dimensions) in an instance. In particular, removing redundant, noisy, or otherwise unhelpful attributes can improve both speed and accuracy. But we need to know which attributes to keep. Two approaches:
Filter: Remove attributes first by looking at the data.
Wrapper: A learning algorithm is used to determine importance, e.g. the accuracy of the resulting classifier determines the 'goodness' of each attribute.

9 Dimensionality Reduction
Either way, we need a way to search through the combinations of attributes to find the best set. We can't examine all of the combinations, so we need a strategy.
Stepwise Forward Selection: Find the best attribute and add it (see the sketch below).
Stepwise Backward Elimination: Find the worst attribute and remove it.
Genetic Algorithms: Use a 'survival of the fittest' approach with random cross-breeding.
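A sketch of stepwise forward selection in the wrapper style, assuming some `evaluate(attrs)` function that estimates classifier accuracy on a given attribute subset; it is a placeholder, not a real library call. Backward elimination is the mirror image, starting from the full set and dropping attributes.

```python
def forward_selection(all_attributes, evaluate):
    """Greedily add the attribute that most improves the evaluation score,
    stopping at a (local) maximum where no addition helps."""
    selected, best_score = [], float("-inf")
    remaining = list(all_attributes)
    while remaining:
        scored = [(evaluate(selected + [a]), a) for a in remaining]
        score, best_attr = max(scored, key=lambda pair: pair[0])
        if score <= best_score:          # no improvement: stop (hill climbing)
            break
        selected.append(best_attr)
        remaining.remove(best_attr)
        best_score = score
    return selected
```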

10 Dimensionality Reduction
The problem is now evaluating the usefulness of each attribute, and deciding when to stop adding or removing attributes. Some possibilities:
Entropy or another similar function (Filter; sketched below).
Evaluate the results of a classifier built using the current set plus/minus each attribute in turn, and add/remove the best/worst (Wrapper). Very computationally expensive, but can be combined with sampling.
Stopping is a hill-climbing exercise again: stop at a (local) maximum point, e.g. where adding/removing attributes doesn't improve the performance.
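For the filter route, one common entropy-style score is information gain: the reduction in class entropy after splitting on the attribute. A minimal sketch for nominal attributes (illustrative only):

```python
from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr_index):
    # Entropy of the class column minus the weighted entropy of each
    # partition induced by the attribute's values.
    base = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return base - remainder

# Rank attributes by gain and keep the top k: a filter, no classifier needed.
```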

11 Genetic Algorithms
Main idea: Simulate the evolutionary process in nature, whereby the fit survive and inter-breed with others, with occasional mutation. Select the fittest individuals to reproduce, let them bear offspring, and iterate.
e.g.: Initial Population -> Evaluate -> Select -> Cross-over -> Mutate -> (repeat)

12 Genetic Algorithms
How does this help? We want to find the best set of attributes by examining sub-optimal sets:
1. Start with a set of random attribute subsets.
2. Evaluate each by looking at classifier accuracy.
3. Select the best sets.
4. Select attributes from those sets and cross them over with attributes from the other fittest sets.
5. Allow for some random mutations.
6. Go to 2, until the termination conditions are met.
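A compact sketch of that loop, encoding each attribute subset as a bit mask; `fitness` stands for classifier accuracy on the subset and is a placeholder, not a real library call:

```python
import random

def genetic_attribute_selection(n_attrs, fitness, pop_size=20, generations=30,
                                mutation_rate=0.02):
    # 1. random initial population of attribute subsets (bit masks)
    population = [[random.randint(0, 1) for _ in range(n_attrs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2./3. evaluate and keep the fittest half
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # 4. cross over two fit parents at a random cut point
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_attrs)
            child = a[:cut] + b[cut:]
            # 5. occasional random mutation flips a bit
            children.append([1 - g if random.random() < mutation_rate else g
                             for g in child])
        population = survivors + children
    return max(population, key=fitness)      # 6. best subset found
```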

13 Relief
Select a random instance from the data.
Locate its nearest neighbour from its own class and from the opposite class (for a 2-class dataset).
Compare each attribute of the instance to these neighbours and update an overall relevance score based on its similarity to the neighbour of its own class and dissimilarity to the neighbour not of its class.
Repeat a given number of times, rank the attributes by their 'relevance' score, and cut at a given threshold.
ReliefF is designed to work with more than two classes.
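A minimal numpy sketch of the two-class Relief update, using Manhattan distance to find the nearest hit and miss; `data` is a numeric matrix with one row per instance, `labels` holds two class values, and all names are illustrative:

```python
import numpy as np

def relief(data, labels, n_iterations=100, rng=None):
    """Estimate a relevance weight per attribute (two-class Relief)."""
    if rng is None:
        rng = np.random.default_rng()
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    n, d = data.shape
    weights = np.zeros(d)
    for _ in range(n_iterations):
        i = rng.integers(n)
        x, y = data[i], labels[i]
        dists = np.abs(data - x).sum(axis=1)       # Manhattan distance to x
        dists[i] = np.inf                          # never pick x itself
        same = labels == y
        hit = np.argmin(np.where(same, dists, np.inf))     # nearest same-class
        miss = np.argmin(np.where(~same, dists, np.inf))   # nearest other-class
        # Penalise attributes that differ from the hit, reward those
        # that differ from the miss.
        weights += np.abs(x - data[miss]) - np.abs(x - data[hit])
    return weights / n_iterations    # rank attributes and cut at a threshold
```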

14 Other Ideas
Learn a decision tree, and then only use the attributes that appear in the tree after pruning. This has no effect on building another tree, but the selected attributes can be given to a different algorithm (see the sketch below).
Use the minimal subset of attributes that allows unique identification of each instance. (Not always possible, and can easily overfit.)
Cluster attributes as instances and remove outliers (if there are a LOT of attributes), i.e. clustering vertically rather than horizontally.
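A sketch of the decision-tree idea, assuming scikit-learn is available: fit a depth-limited tree (a crude stand-in for pruning) and keep only the attributes that actually appear at internal nodes.

```python
from sklearn.tree import DecisionTreeClassifier

def tree_selected_attributes(X, y, max_depth=4):
    # Limiting the depth acts as a simple form of pruning.
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    used = {f for f in tree.tree_.feature if f >= 0}   # negative values mark leaves
    return sorted(used)   # column indices to hand to another algorithm
```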

15 Other Searching Methods
Race Search: Rather than checking the accuracy many times for many different sets of attributes, we can run a race in which the candidates that lag behind are dropped.
Schemata Search: A series of races to determine whether each attribute should be dropped. Alternatively, generate an initial order (e.g. through entropy) and then race with these initial weights.

16 Compression
We could use a lossy compression technique to remove attributes that are not considered important enough to keep. Consider a 100% JPEG and an 85% JPEG: they look very similar, but some unnecessary information is lost, which is the sort of thing we want to do with attributes.
Techniques:
DWT: Discrete Wavelet Transform
DFT: Discrete Fourier Transform
PCA: Principal Component Analysis

17 Discrete Wavelet Transform
Still not going over the exact details of signal processing :)
The DWT transforms a vector of attribute values into a different vector of 'wavelet coefficients', of the same length as the original. This transformed vector can be truncated at a certain threshold: any values below the threshold are set to 0. The remaining data is then an approximation of the original, in the transformed space. We can reverse the transformation to return to the original attributes, minus what was lost in the truncation.
There are many different DWTs, grouped into families (e.g. Haar and Daubechies).
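A rough single-level Haar transform in numpy to show the truncation idea; real uses would apply a multi-level transform from a wavelet library, so treat this purely as a sketch (it assumes an even number of values):

```python
import numpy as np

def haar_threshold(values, threshold):
    # One level of the Haar DWT: pairwise averages (approximation) and
    # pairwise differences (detail), both scaled by 1/sqrt(2).
    v = np.asarray(values, dtype=float)
    approx = (v[0::2] + v[1::2]) / np.sqrt(2)
    detail = (v[0::2] - v[1::2]) / np.sqrt(2)
    detail[np.abs(detail) < threshold] = 0.0     # truncate small coefficients
    # Invert the transform to return to the original space, minus what was lost.
    out = np.empty_like(v)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

x = np.array([4.0, 4.1, 8.0, 7.9, 1.0, 1.2, 6.0, 6.1])
print(haar_threshold(x, threshold=0.5))   # close to x, with small detail removed
```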

18 Discrete Fourier Transform
Another signal processing technique, this time using sines and cosines. Fourier's theory is that any signal can be generated by adding up the correct sine waves. The Fourier Transform is an equation to calculate the frequency, amplitude and phase of each sine wave needed. The Discrete Fourier Transform is the same thing using sums instead of integrals.
However, DWTs are more efficient: the DFT needs more space for the same quality of approximation (and hence more attributes).
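The same truncation idea with the DFT, using numpy's FFT routines: keep only the largest coefficients by magnitude and invert the transform (parameter values are illustrative):

```python
import numpy as np

def dft_approximate(values, keep_fraction=0.25):
    # Transform to the frequency domain, zero out all but the largest
    # coefficients (by magnitude), and transform back.
    coeffs = np.fft.rfft(values)
    k = max(1, int(len(coeffs) * keep_fraction))
    cutoff = np.sort(np.abs(coeffs))[-k]
    coeffs[np.abs(coeffs) < cutoff] = 0
    return np.fft.irfft(coeffs, n=len(values))

signal = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.1 * np.random.randn(64)
approx = dft_approximate(signal)        # smooth approximation of the noisy signal
```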

19 Principal Component Analysis
Not a signal processing technique! Idea: Just because the dataset has particular dimensional axes doesn't mean those are the best axes to use. Find the best axes, in order, and drop the least important ones.
[Figure: dataset plotted on the regular axes vs. the same dataset on the revised axes]

20 Principal Component Analysis
Place the first axis in the direction of the greatest variance. Then continue to place axes in order of variance, such that each is orthogonal to all the other axes. Not too complicated for a computer program to do (but too complicated to explain how it does it, especially in N dimensions at once!). The variance along each axis can be graphed...
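A sketch of PCA in plain numpy: centre the data, take the eigenvectors of the covariance matrix as the new axes (ordered by variance), and project onto the first few. The per-axis variance fractions are what the graph on the next slide summarises.

```python
import numpy as np

def pca(X, n_components):
    # Centre each attribute, then diagonalise the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()          # fraction of total variance per axis
    projected = Xc @ eigvecs[:, :n_components]   # data expressed on the new axes
    return projected, explained

X = np.random.randn(200, 5) @ np.random.randn(5, 5)   # correlated toy data
Z, explained = pca(X, n_components=3)
print(explained[:3].sum())   # how much variance the first 3 axes capture
```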

21 Principal Component Analysis
[Variance graph] The first 3 axes make up 84% of the variance!

22 Further Reading
Benchmarking Attribute Selection Techniques
Witten, 7.3
Han, Chapter 2

