Linear Clustering Algorithm BY Horne Ken & Khan Farhana & Padubidri Shweta.

Presentation transcript:

1 Linear Clustering Algorithm BY Horne Ken & Khan Farhana & Padubidri Shweta

2 Overview: Introduction, Data Preprocessing, Data Mining, Data Visualization, Experiment, Conclusion

3 Responsibility: Data Preprocessing: Farhana & Ken; Data Mining: Ken; Data Visualization: Shweta

4 Overview: A Linear Clustering Algorithm. Applications: 1. Feature selection – choose features based on information gain; 2. Discretization – partition based on data set characteristics

5 Data Preprocessing: DataFerrett (Federated Electronic Research, Review, Extraction & Tabulation Tool). Install the software or use the web version: http://www.thedataweb.org/what_ferrett.html

6 Data Preprocessing Steps: Extracted data from the CPS (Current Population Survey). Before preprocessing: 43 features; years 2007-2008; 115,000 rows per month over 50 states. After preprocessing: 23 features. Normalization.
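The slide lists normalization as a preprocessing step but does not say which method was used. As a minimal sketch only, assuming min-max scaling to the range [0, 1] (an assumption, not the presenters' stated method), a single numeric column could be normalized as follows. The function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] using min-max scaling.

    Assumption: min-max scaling is used here for illustration only;
    the slides do not name the normalization method.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))
```

Scaling each feature to a common range keeps one wide-ranging attribute from dominating any distance computed across attributes.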

7 Data Mining Algorithm: Choose an ordinal attribute (X). Order data points based on the attribute. List potential partition points (between successive values of X). For each potential partition point P: calculate distance of data points where X P. Results: can partition data points; order data points by information gain.
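The steps on this slide can be sketched roughly as follows. This is not the presenters' Java implementation: the slide does not spell out the comparison around P or the distance measure, so the sketch assumes each candidate P splits the data into X < P and X > P and scores the split by information gain against a class label. The function names, the use of midpoints as partition points, and the class-label input are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def candidate_partition_points(values):
    """Midpoints between successive distinct values of the attribute X."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def rank_partition_points(values, labels):
    """Score every candidate point P by information gain, best first.

    Assumption: information gain stands in for the unspecified
    per-partition measure on the slide.
    """
    base = entropy(labels)
    n = len(values)
    scored = []
    for p in candidate_partition_points(values):
        left = [y for x, y in zip(values, labels) if x < p]
        right = [y for x, y in zip(values, labels) if x > p]
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        scored.append((p, base - remainder))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = ["a", "a", "b", "b"]
    for point, gain in rank_partition_points(x, y):
        print(point, round(gain, 4))
```

Because each candidate is a midpoint between two distinct values, no data point ever equals P, so the two sides always cover the whole data set. Keeping the top-ranked points gives a discretization of X, and comparing the best gain across attributes gives a simple feature ranking.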

8 Data Mining Test dataset

9 Data Mining Test dataset 2

10 Experimental Setup. Environment: 1. DataFerrett: data preprocessing; 2. Java platform: implementation of the data mining algorithm; 3. Data visualization: (a) Google App Engine Datastore API with Python, JavaScript and the Django framework; (b) Google Chart API. Hardware: Windows XP laptop, Core2 2.16 GHz, 2.00 GB RAM (that hurt)

11 Visualization Demo: link to the website: http://householdstructure-project.appspot.com/

12 Conclusions: Preliminary results are encouraging; discretization was successful. Lessons learnt and future work: compare with other methods on well-known datasets; evaluate performance in feature selection; OPTIMIZE; don't pick a novel dataset and a novel algorithm at the same time.

13 Thank you. Questions?

