Research paper: Web Mining Research: A Survey. SIGKDD Explorations, June 2000, Volume 2, Issue 1. Authors: R. Kosala and H. Blockeel.

1 Research paper: Web Mining Research: A Survey. SIGKDD Explorations, June 2000, Volume 2, Issue 1. Authors: R. Kosala and H. Blockeel

2
- Introduction
- Web Mining
- Web Content Mining
- Web Structure Mining
- Web Usage Mining
- Conclusion

3
- The World Wide Web is a popular and interactive medium for disseminating information
- Information users may encounter four problems:
  1. Finding relevant information
     a. low precision
     b. low recall
  2. Creating new knowledge out of the information available on the web (a data-triggered process)
  3. Personalizing information: people differ in the content and presentation they prefer
  4. Learning about consumers or individual users: mass customization or even personalization
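Problem 1 can be made concrete with the usual IR definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch follows; the function name and the page identifiers are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: documents returned by the search tool
    relevant:  documents the user actually wanted
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 pages returned, 2 of them relevant,
# while 5 relevant pages exist on the web.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p3", "p7", "p8", "p9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Low precision means the user wades through irrelevant results; low recall means relevant pages are never shown.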

4
- Definition: web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from web data
- Four subtasks:
  - Resource finding: retrieving intended web documents
  - Information selection and pre-processing: selecting and pre-processing specific information
  - Generalization: discovering general patterns
  - Analysis: validation and/or interpretation of mined patterns

5
- Web Mining and Information Retrieval
  Definition: IR is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible.
  Goal: indexing and searching for useful documents
- Web Mining and Information Extraction
  IE has the goal of transforming a collection of documents into information that is more readily digested and analyzed.
- Compare IR and IE
  a. aims
  b. fields

6
- Web Mining and the Agent Paradigm
  Web mining is often viewed from, or implemented within, an agent paradigm:
  1. User interface agents
  2. Distributed agents
  3. Mobile agents
  Two approaches used to develop intelligent agents:
  1. Content-based approach
  2. Collaborative approach

7
- Definition: web content mining is the discovery of useful information from web page contents, data, and documents
- Several types of data: text, image, audio, video, hyperlinks
- Types of data structure:
  1. Unstructured: free text
  2. Semi-structured: HTML
  3. More structured: data in tables or database-generated HTML pages

8
- IR view:
  Unstructured documents
  a. Bag of words to represent unstructured documents
  b. Features: Boolean, frequency-based
  c. Variations in feature selection
  d. Features can be reduced using different feature selection techniques
  Semi-structured documents
  a. Uses richer representations for features
  b. Uses common data mining methods
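The bag-of-words representation with Boolean and frequency-based features can be sketched as below. This is an illustration only: the tokenizer (lowercase alphanumeric runs) and the two sample documents are assumptions, not choices taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return (vocab, boolean_vectors, frequency_vectors) for a list of texts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    boolean, freq = [], []
    for doc in docs:
        counts = Counter(tokenize(doc))
        freq.append([counts[t] for t in vocab])                 # term frequency
        boolean.append([1 if counts[t] else 0 for t in vocab])  # presence/absence
    return vocab, boolean, freq


docs = ["web mining mines the web", "usage mining uses logs"]
vocab, boolean, freq = bag_of_words(docs)
print(vocab)       # ['logs', 'mines', 'mining', 'the', 'usage', 'uses', 'web']
print(boolean[0])  # [0, 1, 1, 1, 0, 0, 1]
print(freq[0])     # [0, 1, 1, 1, 0, 0, 2]
```

Word order is discarded; only which words occur (Boolean) or how often (frequency) is kept. Dropping stop words such as "the" would be one simple way to reduce the feature set.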

9
- DB view: tries to infer the structure of a web site or to transform a web site into a database
- Methods:
  a. Finding the schema of web documents
  b. Building a web warehouse
  c. Building a web knowledge base
  d. Building a virtual database

10
- Web structure mining is interested in the structure of the hyperlinks within the web
- Inspired by the study of social networks and citation analysis
- Discovers specific types of pages based on their incoming and outgoing links
- Applications:
  a. discovering micro-communities in the web
  b. measuring the completeness of a web site
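As a rough illustration of classifying pages by incoming and outgoing links, the sketch below counts in-degree and out-degree over a small hypothetical link graph. Plain degree counting is a simplification assumed here, not the method surveyed in the paper; the page names are made up.

```python
from collections import defaultdict


def link_degrees(links):
    """Map each page to (incoming link count, outgoing link count).

    links: iterable of (source_page, target_page) hyperlinks
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in links:
        out_deg[src] += 1
        in_deg[dst] += 1
    pages = sorted(set(in_deg) | set(out_deg))
    return {p: (in_deg.get(p, 0), out_deg.get(p, 0)) for p in pages}


# Hypothetical site with five pages.
links = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "e"), ("d", "a"), ("d", "b")]
degrees = link_degrees(links)

most_linked_to = max(degrees, key=lambda p: degrees[p][0])  # many incoming links
most_linking = max(degrees, key=lambda p: degrees[p][1])    # many outgoing links
print(degrees["c"], most_linked_to, most_linking)  # (3, 1) c d
```

A page with many incoming links (here "c") is widely cited, while a page with many outgoing links (here "d") acts more like a directory, which mirrors the citation-analysis intuition on the slide.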

11
- Web usage mining tries to predict user behavior from interaction with the web
- Wide range of data
- Two commonly used approaches:
  a. Map the usage data of the web server into relational tables before an adapted data mining technique is applied
  b. Use the log data directly by applying special pre-processing techniques
- Problems:
  a. Distinguishing among unique users, server sessions, and episodes in the presence of caching and proxy servers
  b. Usage mining often relies on some background or domain knowledge
- Applications
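One common pre-processing step for log data is grouping requests into sessions. The sketch below assumes a simplified record of (user key, timestamp in seconds, URL) and a 30-minute inactivity timeout; both are assumptions for illustration and are not specified on the slide.

```python
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed heuristic


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp_seconds, url) records into sessions."""
    sessions = []
    last_seen = {}  # user_key -> (time of last request, index of open session)
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            # First request from this user, or the gap was too long: new session.
            sessions.append({"user": user, "start": ts, "pages": []})
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx]["pages"].append(url)
        last_seen[user] = (ts, idx)
    return sessions


# Hypothetical log records, keyed by client IP address.
records = [
    ("10.0.0.1", 0, "/index"),
    ("10.0.0.2", 10, "/index"),
    ("10.0.0.1", 60, "/products"),
    ("10.0.0.1", 4000, "/index"),
]
sessions = sessionize(records)
print(len(sessions))         # 3
print(sessions[0]["pages"])  # ['/index', '/products']
```

Using the IP address as the user key is exactly where problem (a) bites: several users behind one proxy collapse into one key, and cached pages never reach the server log at all. Each resulting session is a flat record, so it could be loaded into a relational table as in approach (a).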

12
- Survey of research in the area of web mining
- Three web mining categories: content, structure, and usage mining
- Connection between the web mining categories and the related agent paradigm
