
1 Mining real world data: Web data

2 World Wide Web
Hypertext documents
– Text
– Links
Web
– billions of documents
– authored by millions of diverse people
– edited by no one in particular
– distributed over millions of computers, connected by a variety of media

3 Structured vs. Web data mining
Traditional data mining
– data is structured and relational
– well-defined tables, columns, rows, keys, and constraints
Web data
– readily available data rich in features and patterns
– spontaneous formation and evolution of topic-induced graph clusters and hyperlink-induced communities

4 History of Hypertext
Citation
– a form of hyperlinking
Ramayana, Mahabharata, Talmud
– branching, non-linear discourse, nested commentary
Dictionary, encyclopedia
– self-contained networks of textual nodes
– joined by referential links

5 Three Broad Categories of Web Mining
Web content mining
– Applies data-mining techniques to the content of Web documents
Web structure mining
– Operates on the Web's hyperlink structure
Web usage mining
– Analyzes user interaction with a Web server
– Includes logs, database transactions, …
– Raises privacy concerns
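As a small illustration of Web usage mining, the sketch below counts successful page requests in server log lines. It assumes the logs follow the Common Log Format; the sample lines, the `LOG_PATTERN` regex, and the `count_page_hits` helper are invented for illustration and do not come from the slides.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical sample data, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1024',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]


def count_page_hits(lines):
    """Count requests per path, keeping only responses with status 200."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    for path, count in count_page_hits(SAMPLE_LOG).most_common():
        print(path, count)
```

Note that the host field is parsed but deliberately left out of the result: per-visitor analysis is where the privacy concerns mentioned on the slide begin.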

6 Web Content and Structure Mining
Web as a Database
Document Classification
Hubs and Authorities
Clever: Ranking by Content
Identifying Web Communities
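The slide names hubs and authorities without describing the computation. A minimal sketch of the usual hub/authority iteration (Kleinberg's HITS) might look like the following; the toy link graph, the `hits` function name, and the fixed iteration count are assumptions made for illustration, not details taken from the slides.

```python
import math


def hits(links, iterations=50):
    """Iteratively score pages as hubs and authorities.

    links: dict mapping each page to the list of pages it links to.
    Returns (hub_scores, authority_scores) as dicts.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for target in targets:
                auth[target] += hub[src]
        # A page's hub score is the sum of authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy graph: "a" and "b" link out, "c" and "d" are linked to.
TOY_LINKS = {"a": ["c", "d"], "b": ["c"], "c": [], "d": []}

if __name__ == "__main__":
    hub_scores, auth_scores = hits(TOY_LINKS)
    print("best hub:", max(hub_scores, key=hub_scores.get))
    print("best authority:", max(auth_scores, key=auth_scores.get))
```

In this toy graph, page "c" receives links from both "a" and "b", so it ends up as the strongest authority, while "a", which links to both targets, ends up as the strongest hub.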

7 Web as a Database
Placing a layer of abstraction containing some semantic information on top of the semistructured Web
Query the Web as a database
– Topic, author, creation date, and so on
WebLog and WebSQL
Recent work: Semantic Web
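To make the "query the Web as a database" idea concrete, the sketch below loads a few page descriptions into an in-memory SQLite table and filters them by topic and creation date. This uses plain SQL through Python's `sqlite3` module, not WebLog or WebSQL syntax; the `pages` table, its columns, the sample rows, and the helper names are all hypothetical.

```python
import sqlite3

# Hypothetical page metadata: (url, topic, author, creation date).
SAMPLE_PAGES = [
    ("http://example.org/a", "data mining", "alice", "2003-01-15"),
    ("http://example.org/b", "data mining", "bob", "2002-06-01"),
    ("http://example.org/c", "hypertext", "alice", "2003-03-20"),
]


def build_page_index(rows):
    """Create an in-memory table holding the semantic layer over the pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT, topic TEXT, author TEXT, created TEXT)"
    )
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", rows)
    return conn


def pages_by_topic(conn, topic, since):
    """Return URLs on a topic created on or after an ISO date string."""
    cursor = conn.execute(
        "SELECT url FROM pages WHERE topic = ? AND created >= ? ORDER BY created",
        (topic, since),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    index = build_page_index(SAMPLE_PAGES)
    print(pages_by_topic(index, "data mining", "2003-01-01"))
```

Storing dates as ISO 8601 strings keeps the comparison simple, since such strings sort in chronological order.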

8 Document Classification
Roots
– Machine learning
– Pattern recognition
– Text analysis
Topic aggregation
Google News
– http://news.google.com

9 Semantic Web Mining
Semantic Web
– Next-generation Web
– Semantically rich language
Web Ontology Language
– More complex than Web-as-database
– Fits Web mining
– More and more benefits

