There is often information hidden in the data that is not readily evident. Human analysts may take weeks to discover useful information. Much of the data is never analyzed at all.
[Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts.]
Data is collected and stored at enormous speeds (GB/hour). Traditional techniques are infeasible for raw data. Data mining may help scientists.
– Classification
– Regression
– Collaborative filtering
– Clustering
– Association rules
– Deviation detection
[Figure: a classifier whose decision rules (Salary > 5 L, Prof. = Exec) are applied to new applicants' data.]
Many approaches: statistics, decision trees, neural networks, ...
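As a rough illustration of applying decision rules to new applicants, the sketch below hard-codes the two conditions from the slide. How the rules combine, the "good"/"bad" labels, the `Applicant` fields, and reading "5 L" as a salary of 5 lakh are all assumptions made for this example; in practice the rules would be learned from labelled old data rather than written by hand.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    salary_lakh: float  # assumed unit: "5 L" read as 5 lakh
    profession: str


def classify(applicant: Applicant) -> str:
    """Apply two hand-written decision rules (hypothetical combination)."""
    if applicant.salary_lakh > 5:        # rule: Salary > 5 L
        return "good"
    if applicant.profession == "Exec":   # rule: Prof. = Exec
        return "good"
    return "bad"


# Hypothetical new applicants, invented for the example.
new_applicants = [
    Applicant(7.0, "Clerk"),
    Applicant(3.0, "Exec"),
    Applicant(2.5, "Clerk"),
]
labels = [classify(a) for a in new_applicants]
print(labels)
```

A decision tree learner would produce rules of this shape automatically; the hand-written version only shows how such rules are applied once they exist.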
Unsupervised learning: used when old data with class labels is not available, e.g. when introducing a new product.
Given a set T of groups of items.
Example: set of item sets purchased:
– Milk, Cereal, Rice
– Tea, Rice, Bread
– Chips, Bread, Cheese
– ......
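To make the item-set example concrete, here is a minimal sketch that counts how often item sets occur in T and derives the usual support and confidence measures for a rule. The grouping of the purchased items into three baskets is an assumption about how the slide's table was laid out, and the helper names (`itemset_counts`, `support`, `confidence`) are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed reading of the slide's example baskets.
transactions = [
    {"Milk", "Cereal", "Rice"},
    {"Tea", "Rice", "Bread"},
    {"Chips", "Bread", "Cheese"},
]


def itemset_counts(baskets, max_size=2):
    """Count every item set of up to max_size items across all baskets."""
    counts = Counter()
    for basket in baskets:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(basket), k):
                counts[frozenset(combo)] += 1
    return counts


def support(itemset, counts, n):
    """Fraction of baskets that contain every item in itemset."""
    return counts[frozenset(itemset)] / n


def confidence(antecedent, consequent, counts):
    """Estimate P(consequent | antecedent) from the basket counts."""
    a = frozenset(antecedent)
    if counts[a] == 0:
        return 0.0
    return counts[a | frozenset(consequent)] / counts[a]


counts = itemset_counts(transactions)
n = len(transactions)
print(support({"Rice"}, counts, n))
print(confidence({"Rice"}, {"Bread"}, counts))
```

Enumerating every combination is only workable for tiny examples like this one; algorithms such as Apriori prune item sets whose subsets are already below the support threshold.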
The use of data, particularly data about people, for data mining has serious ethical implications. When applied to people, data mining can be used to discriminate.
Data mining (or simple analysis) on people may come up with a profile that would raise controversial issues of:
– Discrimination
– Privacy
– Security
Examples:
– Should males between 18 and 35 from countries that produced terrorists be singled out for search before a flight?
– Can people be denied a mortgage based on age, sex, or race?
– Women live longer. Should they pay less for life insurance?