Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra Akerkar American University of Armenia and SIBER, India.
Data mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that potentially contain valuable bits of information hidden in them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the data mining software market alone is expected to exceed $10 billion. Data mining refers to a family of techniques used to detect interesting nuggets of relationships or knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of many databases, their size, and the high complexity of many techniques present interesting computational challenges.
Data Preparation
Database Theory
SQL
Data Transformation
http://www.ecn.purdue.edu/KDDCUP/data/
Classification
Find a rule, a formula, or a black-box classifier for organizing data into classes.
– Classify clients requesting loans into categories based on the likelihood of repayment
– Classify customers into Big or Moderate Spenders based on what they buy
– Classify customers as loyal, semi-loyal, or infrequent based on the products they buy
The classifier is developed from the data in the training set.
The reliability of the classifier is evaluated using the test set.
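The training-set and test-set roles described above can be sketched with a very small example. This is only an illustration, assuming a single hypothetical feature (total amount spent) and invented data; the threshold rule below stands in for "a rule" and is not a method taken from the slides.

```python
# Hypothetical data: (total_spent, label). All values are invented for illustration.
train = [(20, "Moderate"), (35, "Moderate"), (50, "Moderate"),
         (120, "Big"), (150, "Big"), (200, "Big")]
test = [(30, "Moderate"), (90, "Big"), (180, "Big"), (45, "Moderate")]


def classify(value, threshold):
    """Rule: customers spending more than the threshold are Big Spenders."""
    return "Big" if value > threshold else "Moderate"


def fit_threshold(rows):
    """Develop the rule from the training set: choose the midpoint between
    adjacent spending values that misclassifies the fewest training rows."""
    values = sorted(v for v, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum(classify(v, t) != label for v, label in rows)

    return min(candidates, key=errors)


def accuracy(rows, threshold):
    """Evaluate reliability: fraction of rows the rule labels correctly."""
    return sum(classify(v, threshold) == label for v, label in rows) / len(rows)


threshold = fit_threshold(train)          # learned from the training set only
test_accuracy = accuracy(test, threshold)  # measured on unseen test data
print(threshold, test_accuracy)
```

The point of the split is that `test_accuracy` is computed on rows the rule never saw while it was being fitted.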
Classification
ID3 Algorithm
– Numerical Illustration
– Application to a Small E-commerce Dataset
C4.5 for Experimentation
Other approaches
– Neural Networks
– Fuzzy Classification
– Rough Set Theory
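As a rough numerical illustration of the core ID3 step, the sketch below computes entropy and information gain and picks the attribute to split on. The attribute names (`returning`, `coupon`, `buys`) and the four records are hypothetical, not the dataset used in the presentation, and the sketch covers only attribute selection, not the full recursive tree construction.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical e-commerce records, invented for illustration.
rows = [
    {"returning": "yes", "coupon": "yes", "buys": "yes"},
    {"returning": "yes", "coupon": "no", "buys": "yes"},
    {"returning": "no", "coupon": "yes", "buys": "no"},
    {"returning": "no", "coupon": "no", "buys": "no"},
]

# ID3 splits on the attribute with the highest information gain.
best = max(["returning", "coupon"],
           key=lambda a: information_gain(rows, a, "buys"))
print(best)
```

In this toy data `returning` separates buyers from non-buyers perfectly, while `coupon` tells us nothing, so ID3 would place `returning` at the root.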
Association
Market basket analysis
– determine which things go together
Transactions might reveal that
– customers who buy bananas also buy candles
– cheese and pickled onions frequently occur together in a shopping cart
Information can be used for
– arranging a physical shop or structuring the Web site
– targeted advertising campaigns
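"Which things go together" is usually quantified with support and confidence. The sketch below assumes four invented shopping carts built from the items named above; the numbers carry no empirical meaning.

```python
# Hypothetical shopping carts, invented for illustration.
transactions = [
    {"banana", "candles", "cheese"},
    {"banana", "candles"},
    {"cheese", "pickled onions"},
    {"banana", "cheese", "pickled onions"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


rule_support = support({"banana", "candles"}, transactions)
rule_confidence = confidence({"banana"}, {"candles"}, transactions)
print(rule_support, rule_confidence)
```

A rule such as "banana implies candles" is typically considered interesting only when both its support and its confidence exceed analyst-chosen thresholds.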
Association
Apriori Algorithm
Demonstration for an E-commerce Application
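A minimal sketch of the level-wise Apriori search for frequent itemsets follows. The product names, the five sessions, and the 0.5 minimum support are assumptions made for illustration; this is not the demonstration from the presentation, and it stops at frequent itemsets without generating rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting `min_support`."""
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the support threshold.
        result = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical e-commerce sessions, invented for illustration.
sessions = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "bag"},
    {"laptop", "mouse", "bag"},
]
itemsets = apriori(sessions, min_support=0.5)
for items, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), s)
```

The prune step relies on the property that gives the algorithm its name: an itemset can be frequent only if all of its subsets are frequent, so most candidates are discarded before the data is scanned again.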
Clustering
Breaks a large database into different subgroups or clusters
Unlike classification, there are no predefined classes
Records are grouped into clusters on the basis of their similarity to each other
The data miners determine whether the clusters offer any useful insight
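The slide does not name a clustering algorithm. As one familiar example, the sketch below uses k-means on invented two-dimensional points, with the starting centroids fixed by hand so the result is reproducible; both the data and the choice of k-means are assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical points (e.g. visits per month, items per order), invented.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

No class labels are supplied anywhere in this code; whether the two resulting groups mean anything (for example, occasional versus heavy shoppers) is left to the analyst, as the slide notes.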
Other topics in Web Content Mining
Search Engines
– How to prepare for and set up a search engine
– Types and listings of search engines (freeware, remote hosting services, commercial)
Multimedia Information Retrieval