Why the Information Explosion Can Be Bad for Data Mining, and How Data Fusion Provides a Way Out
Written By: Putten, Kok, Gupta
Presented By: Ernesto.


1 Why the Information Explosion Can Be Bad for Data Mining, and How Data Fusion Provides a Way Out
Written By: Putten, Kok, Gupta
Presented By: Ernesto Ochandio
DSCI 5240
November Dec 7, 2005

2 Problem Definition
Exponential growth in data capture leads to data fragmentation.
– POS customer tracking
– Corporate Data Warehouse
– Advanced Analytics
Increased popularity of personalized messages.
Prohibitive attitudinal data costs.

3 Data Fusion Overview
Data Fusion is the combination of information from different sources.
Also known as: Micro Data Set Merging, Statistical Record Linkage, and Multi-Source Imputation.
Example:
– Demographic and psychographic data aggregated at geographical level.
– Same characteristics for people in the same region.
Motivation:
– Algorithms can create generalized fusions providing richer data sets for use in applications or future data mining projects.
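The geographical example above can be sketched in a few lines of Python. This is only an illustration: the region codes, field names, and values are hypothetical and do not come from the presentation. Every customer in a region receives the same regional profile, which is exactly the limitation the example points out.

```python
# Hypothetical regional profiles (aggregated demographic/psychographic data).
region_profiles = {
    "north": {"median_income": 42000, "lifestyle_segment": "urban"},
    "south": {"median_income": 35000, "lifestyle_segment": "rural"},
}

# Hypothetical customer records that only carry a region code.
customers = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
]

def fuse_by_region(customers, region_profiles):
    """Attach the regional profile to each customer record.

    Customers from an unknown region are returned unchanged.
    """
    fused = []
    for customer in customers:
        profile = region_profiles.get(customer["region"], {})
        fused.append({**customer, **profile})
    return fused

fused_customers = fuse_by_region(customers, region_profiles)
```

Customers 1 and 3 end up with identical fused values because they share a region; fusion at the level of individual records is meant to avoid that coarseness.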

4 Data Fusion Terminology
Recipient, Donor, Fused Variables, Common Variables, Critical Common Variables
Diagram: Recipient + Donor = Fused Dataset (Common Variables, Fused Variables)

5 Data Fusion Algorithm
Find best Donor elements that match the Recipient element.
Ensure Critical Variable exact match.
Limit Donor element usage.
Use averages from the Donor set to estimate the Fused variables for the Recipient set.
Diagram: Recipient + Donor = Fused Dataset
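The four steps above can be sketched as a small nearest-neighbour routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide does not specify a distance measure, so Euclidean distance over the common variables is assumed, and the function name `fuse`, the parameters `k` and `max_uses`, and all sample data are hypothetical.

```python
import math

def fuse(recipients, donors, common, critical, fused_vars, k=2, max_uses=2):
    """Estimate fused variables for each recipient from its closest donors.

    Assumptions (not from the slides): Euclidean distance on the common
    variables, at most k donors per recipient, and a hard cap of max_uses
    on how often one donor may be selected.
    """
    uses = [0] * len(donors)
    result = []
    for r in recipients:
        # Critical variables must match exactly; over-used donors are skipped.
        candidates = [
            i for i, d in enumerate(donors)
            if uses[i] < max_uses and all(d[c] == r[c] for c in critical)
        ]
        # Rank remaining donors by distance on the common variables.
        candidates.sort(key=lambda i: math.dist(
            [r[v] for v in common], [donors[i][v] for v in common]))
        chosen = candidates[:k]
        for i in chosen:
            uses[i] += 1
        # Average the chosen donors' values; None if no donor qualified.
        fused = dict(r)
        for v in fused_vars:
            fused[v] = (sum(donors[i][v] for i in chosen) / len(chosen)
                        if chosen else None)
        result.append(fused)
    return result

# Hypothetical survey respondents (donors) and customers (recipients).
donors = [
    {"gender": "F", "age": 30, "income": 40, "brand_affinity": 0.8},
    {"gender": "F", "age": 34, "income": 44, "brand_affinity": 0.6},
    {"gender": "M", "age": 31, "income": 41, "brand_affinity": 0.2},
    {"gender": "F", "age": 60, "income": 20, "brand_affinity": 0.1},
]
recipients = [{"gender": "F", "age": 32, "income": 42}]

fused = fuse(recipients, donors, common=["age", "income"],
             critical=["gender"], fused_vars=["brand_affinity"])
```

In this sample the male donor is excluded by the critical-variable check even though he is close on age and income, and the recipient's `brand_affinity` is the average of the two nearest female donors. In practice the common variables would usually be scaled before computing distances so that no single variable dominates.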

6 Conclusion
Data Fusion increases the value of Data Mining by enriching the Recipient set with estimated variables, creating more data to mine while reducing data collection costs.
The matching step seeks the best possible Donor matches without over-representing any element of the Donor set.

