Web Mining Research: A Survey. Authors: Raymond Kosala and Hendrik Blockeel. ACM SIGKDD, July 2000. Computer Science Department University.

Presentation transcript:

Web Mining Research: A Survey
Authors: Raymond Kosala and Hendrik Blockeel
ACM SIGKDD, July 2000
Computer Science Department, University of Vermont
Revised and presented by Onur Demircan

Outline
• Introduction
• Web Mining
• Web Content Mining
• Web Structure Mining
• Web Usage Mining
• Conclusion & Exam Questions

Introduction
• With the huge amount of information available online, the World Wide Web is a fertile area for data mining research.
• The WWW is a popular and interactive medium for circulating information today.
• The Web is huge, diverse, and dynamic, which raises scalability, multimedia data, and temporal issues, respectively.

Four Problems
• Finding relevant information
  - Low precision and unindexed information
• Creating new knowledge out of the information available on the Web
  - A data-triggered process
• Personalizing the information
  - Personal preferences in the content and presentation of the information
• Learning about the consumers
  - What does the customer want to do?

Other Approaches
Web mining is NOT the only approach:
• Database approach (DB)
• Information retrieval (IR)
• Natural language processing (NLP)
• Machine learning
• Web document community

Direct vs. Indirect Web Mining
• Web mining techniques can be used to solve the information overload problems:
• Directly
  - Address the problem with web mining techniques
  - E.g. a newsgroup agent that classifies whether a news item is relevant
• Indirectly
  - Used as part of a bigger application that addresses the problems
  - E.g. used to create index terms for a web search service

The Research
• Converging research from database, information retrieval, and artificial intelligence (specifically NLP and machine learning)
• The survey attempts to organize the research in a structured way, from the machine learning point of view

Web Mining: Definition
• “Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from the Web data.”
• Can be viewed as four subtasks

Web Mining: Subtasks
• Resource finding
  - Retrieving the intended web documents
• Information selection and pre-processing
  - Selecting and pre-processing specific information from the retrieved documents
  - A kind of transformation of the original data retrieved in the IR process
• Generalization
  - Discovering general patterns within and across web sites
• Analysis
  - Validation and/or interpretation of the mined patterns

Web Mining and Information Retrieval
• Information retrieval (IR) is the automatic retrieval of all relevant documents while at the same time retrieving as few of the non-relevant documents as possible
• Goal: indexing text and searching for useful documents in a collection
• Research in IR: modeling, document classification and categorization, user interfaces, data visualization, filtering, etc.
• Web document classification, which is a Web mining task, could be part of an IR system (e.g. indexing for a search engine)
  - Viewed in this respect, Web mining is part of the (Web) IR process
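The two halves of the IR definition on this slide correspond to the standard measures recall (retrieve all relevant documents) and precision (retrieve few non-relevant ones). A minimal sketch of how they might be computed for a single query follows; the function name and the document ids are hypothetical and not taken from the survey.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of the documents that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision, the problem named on the "Four Problems" slide, corresponds to a small first value here.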

Web Mining and Information Extraction
• Information extraction (IE): transforming a collection of documents into information that is more easily understood and analyzed
• Building IE systems manually for the general Web is not feasible
  - Most IE systems focus on specific Web sites or content to extract

Compare IR and IE
• IR aims to select relevant documents
  - IE aims to extract the relevant facts from given documents
• IR views the text in a document just as a bag of unordered words
  - IE is interested in the structure or representation of a document

Web Mining and the Agent Paradigm
• Web mining is often viewed from, or implemented within, an agent paradigm
  - Web mining has a close relationship with intelligent agents
• User interface agents
  - Information retrieval agents, information filtering agents, and personal assistant agents
• Distributed agents
  - Concerned with problem solving by a group of agents
  - Distributed agents for knowledge discovery or data mining
• Mobile agents

Web Mining and the Agent Paradigm (contd.)
• Two frequently used approaches for developing intelligent agents:
• Content-based approach
  - The system searches for items that match the user's preferences, based on an analysis of the item content
• Collaborative approach
  - The system tries to find users with similar interests in order to give recommendations
  - Analyzes the user profiles and sessions or transactions
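The collaborative approach can be illustrated with a small sketch: find the user whose profile is most similar to the target user, then suggest items that neighbour has rated and the target has not seen. Cosine similarity over rating dictionaries is an assumption chosen for illustration, not a method prescribed by the survey, and the user names and pages are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse profiles (item -> rating)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(target, profiles):
    """Return (nearest user, items that user rated which `target` has not)."""
    others = {name: p for name, p in profiles.items() if name != target}
    nearest = max(others, key=lambda name: cosine(profiles[target], others[name]))
    unseen = set(others[nearest]) - set(profiles[target])
    return nearest, sorted(unseen)


# Hypothetical user profiles, for illustration only
profiles = {
    "alice": {"page_a": 5, "page_b": 3},
    "bob": {"page_a": 4, "page_b": 3, "page_c": 5},
    "carol": {"page_d": 5},
}
print(recommend("alice", profiles))
```

A content-based agent would instead compare the target user's preferences against features of the items themselves, with no reference to other users.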

Agents Based on Filtering Technology (figure slide; image not included in the transcript)

Web Mining Categories
• Web Content Mining
  - Discovering useful information from web page contents/data/documents
• Web Structure Mining
  - Discovering the model underlying the link structures (topology) on the Web
  - E.g. discovering authorities and hubs
• Web Usage Mining
  - Extraction of interesting knowledge from logging information produced by web servers
  - Usage data from logs, user profiles, user sessions, cookies, user queries, bookmarks, mouse clicks and scrolls, etc.

Web Mining Categories (figure slide; image not included in the transcript)

Web Content Data Structure
• Web content consists of several types of data
  - Text, image, audio, video, hyperlinks
• Unstructured: free text
• Semi-structured: HTML
• More structured: data in tables or database-generated HTML pages
Note: much of the Web content data is unstructured text data.

Web Content Mining: IR View
• Unstructured documents
  - A bag of words is used to represent unstructured documents
    - Takes a single word as a feature
    - Ignores the sequence in which words occur
  - Features can be
    - Boolean: a word either occurs or does not occur in a document
    - Frequency based: the frequency of the word in a document
  - Variations of the feature selection include removing case, punctuation, infrequent words, and stop words
  - Features can be reduced using different feature selection techniques: information gain, mutual information, cross entropy
  - Stemming reduces words to their morphological roots
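A minimal sketch of the bag-of-words representation described on this slide, producing both Boolean and frequency-based feature vectors after removing case, punctuation, and stop words. The stop-word list and the sample documents are illustrative assumptions; stemming and the feature selection measures are left out.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lower-case the text, drop punctuation and digits, remove stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def bag_of_words(docs):
    """Return (vocabulary, frequency vectors, Boolean vectors) for the documents."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    freq, boolean = [], []
    for d in docs:
        counts = Counter(tokenize(d))  # word order is discarded here
        freq.append([counts[w] for w in vocab])
        boolean.append([1 if counts[w] else 0 for w in vocab])
    return vocab, freq, boolean


docs = ["The Web is huge.", "Mining the Web, mining the logs."]
vocab, freq, boolean = bag_of_words(docs)
print(vocab)
print(freq)
print(boolean)
```

Each document becomes a vector over the shared vocabulary, which is the form most classifiers and clustering methods expect.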

Web Content Mining: IR View
• Semi-structured documents
  - Use richer representations for features, thanks to the additional structural information in the hypertext document (typically HTML and hyperlinks)
  - Use common data mining methods (whereas unstructured documents might use more text mining methods)
• Applications: hypertext classification or categorization and clustering, learning relations between web documents, learning extraction patterns or rules, and finding patterns in semi-structured data

Web Content Mining: DB View
• Database techniques on the Web relate to the problems of managing and querying the information on the Web
• The DB view tries to infer the structure of a Web site, or to transform a Web site into a database
  - Better information management
  - Better querying on the Web
• Can be achieved by:
  - Finding the schema of Web documents
  - Building a Web warehouse
  - Building a Web knowledge base
  - Building a virtual database

Data Warehouse
• A data warehouse maintains a copy of information from the source transaction systems.
• This architectural complexity provides the opportunity to congregate data from multiple sources into a single database, so that a single query engine can be used to present the data.

Web Content Mining: DB View
• The DB view mainly uses the Object Exchange Model (OEM)
  - Represents semi-structured data by a labeled graph
  - The data in the OEM is viewed as a graph, with objects as the vertices and labels on the edges
  - Each object is identified by an object identifier (oid), and its value is either atomic or complex
• The process typically starts with the manual selection of Web sites for Web content mining
• Main applications:
  - Finding frequent substructures in semi-structured data
  - Creating a multi-layered database

What Is the Object Exchange Model (OEM)?
• An OEM data graph is a rooted, labeled, directed graph
• Its edge labels map to strings
• Only its leaf nodes have labels that map to data values
• There is no ordering of the edges leaving a node
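The OEM description can be made concrete with a small sketch: objects keyed by oid, string labels on the edges, atomic values only at the leaves, and a helper that follows a sequence of edge labels from the root. The class name, the `follow` helper, the oids, and the sample data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OEMObject:
    oid: str
    value: object = None                       # atomic value, set only on leaf objects
    edges: list = field(default_factory=list)  # (label, child oid) pairs; order carries no meaning


def follow(graph, oid, path):
    """Return the atomic values reached from `oid` by following the edge labels in `path`."""
    frontier = [oid]
    for label in path:
        frontier = [child
                    for o in frontier
                    for (lbl, child) in graph[o].edges
                    if lbl == label]
    # Keep only leaves, and sort because edge order is not significant in OEM
    return sorted(graph[o].value for o in frontier if not graph[o].edges)


# Hypothetical data graph rooted at "&1"
graph = {
    "&1": OEMObject("&1", edges=[("paper", "&2"), ("paper", "&3")]),
    "&2": OEMObject("&2", edges=[("title", "&4"), ("author", "&5")]),
    "&3": OEMObject("&3", edges=[("title", "&6")]),
    "&4": OEMObject("&4", value="Paper B"),
    "&5": OEMObject("&5", value="Author X"),
    "&6": OEMObject("&6", value="Paper A"),
}
print(follow(graph, "&1", ["paper", "title"]))
```

Note that the two "paper" objects do not share the same set of outgoing labels, which is the kind of irregularity that makes the data semi-structured.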

OEM Example (figure slide; image not included in the transcript)

Web Structure Mining
• Interested in the structure of the hyperlinks within the Web
• Inspired by the study of social networks and citation analysis
  - Can discover specific types of pages (such as hubs, authorities, etc.) based on the incoming and outgoing links
• Applications: discovering micro-communities in the Web, measuring the “completeness” of a Web site
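The slide names hubs and authorities but no algorithm. One well-known way to score them is Kleinberg's HITS iteration, sketched below as an illustration: a page's authority score sums the hub scores of the pages linking to it, and its hub score sums the authority scores of the pages it links to. The link graph, page names, and iteration count are hypothetical.

```python
import math


def hits(links, iterations=20):
    """links: dict mapping each page to the list of pages it links to.

    Returns (hub scores, authority scores), each L2-normalized.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum the authority scores of the pages linked to
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize so the scores do not grow without bound
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph: two pages that link out, three that are linked to
links = {"portal": ["a", "b", "c"], "blog": ["a", "b"], "a": [], "b": [], "c": []}
hub, auth = hits(links)
print(max(hub, key=hub.get), sorted(auth, key=auth.get, reverse=True))
```

In this toy graph "portal" should come out as the strongest hub, and "a" and "b" should outrank "c" as authorities because both linking pages point to them.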

NETWORK GRAPH (figure slide; image not included in the transcript)

NETWORK GRAPH
• Stefan Decker (along with Rudi Studer and Raphael Volz) plays the role of a local bridge between the Karlsruhe group and other parts of the core.

Web Usage Mining
• Tries to predict user behavior from interaction with the Web
• Wide range of data (logs)
  - Web client data
  - Proxy server data
  - Web server data
• Two common approaches
  - Map the usage data of the Web server into relational tables before applying adapted data mining techniques
  - Use the log data directly by utilizing special pre-processing techniques
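The first approach, mapping server usage data into relational tables, might look like the following sketch. It assumes the log is in the Common Log Format and loads it into an in-memory SQLite table; the table name, column names, and sample log lines are hypothetical.

```python
import re
import sqlite3

# Common Log Format: host ident user [time] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def load_log(lines):
    """Parse log lines into a relational table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits (host TEXT, time TEXT, method TEXT, url TEXT, status INTEGER)"
    )
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m:  # lines that do not match the assumed format are skipped
            conn.execute(
                "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
                (m["host"], m["time"], m["method"], m["url"], int(m["status"])),
            )
    return conn


# Hypothetical log lines, for illustration only
sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /papers.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [10/Oct/2000:14:02:10 -0700] "GET /index.html HTTP/1.0" 404 0',
]
conn = load_log(sample)
rows = conn.execute(
    "SELECT url, COUNT(*) FROM hits GROUP BY url ORDER BY url"
).fetchall()
print(rows)
```

Once the data is in tables, ordinary SQL aggregation or an adapted data mining technique can be run over it.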

Web Usage Mining
• Typical problems:
  - Distinguishing among unique users, server sessions, episodes, etc. in the presence of caching and proxy servers
• Usage mining often uses some background or domain knowledge
  - E.g. site topology, Web content, etc.
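One common heuristic for separating server sessions is to start a new session whenever the same user key has been idle for longer than a timeout. The sketch below assumes a 30-minute timeout and a user key built from the host and user agent; both are illustrative assumptions rather than anything specified on the slide, and the requests are hypothetical.

```python
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group requests into sessions.

    requests: iterable of (user_key, timestamp, url)
    Returns a list of (user_key, [urls]) in order of session start.
    """
    sessions = []
    last_seen = {}  # user_key -> (time of last request, index into sessions)
    for user, ts, url in sorted(requests, key=lambda r: r[1]):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            sessions.append((user, [url]))  # idle too long: open a new session
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx][1].append(url)    # continue the current session
        last_seen[user] = (ts, idx)
    return sessions


# Hypothetical requests, for illustration only
day = datetime(2000, 10, 10)
requests = [
    ("10.0.0.1|Mozilla", day.replace(hour=13, minute=0), "/a"),
    ("10.0.0.2|Opera", day.replace(hour=13, minute=5), "/a"),
    ("10.0.0.1|Mozilla", day.replace(hour=13, minute=10), "/b"),
    ("10.0.0.1|Mozilla", day.replace(hour=14, minute=0), "/c"),
]
for user, urls in sessionize(requests):
    print(user, urls)
```

The heuristic is only approximate: a proxy can make many users share one key, and cached pages never reach the server log, which is why the slide lists this as a typical problem.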

Web Usage Mining
• Applications fall into two main categories:
  - Learning a user profile (personalized): Web users would be interested in techniques that learn their needs and preferences automatically
  - Learning user navigation patterns (impersonalized): information providers would be interested in techniques that improve the effectiveness of their Web site

Conclusions
• Surveyed the research in the area of Web mining
• Suggested three Web mining categories
  - Content, structure, and usage mining
  - Then situated some of the research with respect to these categories
• Explored the connection between the Web mining categories and the related agent paradigm

Exam Question #1
• Question: Outline the main characteristics of Web information.
• Answer: Web information is huge, diverse, and dynamic.

Exam Question #2
• Question: Define Web mining.
• Answer: Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from the Web data.

Exam Question #3
• Question: What are the three main areas of interest for Web mining?
• Answer: (1) Web content (2) Web structure (3) Web usage

Thank you!