WEB MINING by NINI P SURESH, PROJECT CO-ORDINATOR: Kavitha Murugeshan

1 WEB MINING by NINI P SURESH, PROJECT CO-ORDINATOR: Kavitha Murugeshan

2 OUTLINE
- Introduction
- Data mining vs. Web mining
- Web mining subtasks
- Challenges
- Taxonomy
- Web content mining
- Web structure mining
- Web usage mining
- Applications

3 INTRODUCTION
Users increasingly need automated tools to find, extract, filter and evaluate the information and resources they want. Search engines, by contrast, aim only to discover resources on the Web.

4 INTRODUCTION
Why Web mining is needed:
- Narrow search scope
- Low precision

5 INTRODUCTION
Other approaches:
- Database (DB) approach
- Information retrieval (IR)
- Natural language processing (NLP)
- Web document community

6 WEB MINING DEFINITION
Web mining is the overall process of discovering potentially useful and previously unknown information or knowledge from Web data.

7 DATA MINING VS. WEB MINING
- Data mining: extraction of useful patterns from data sources such as databases, texts, the Web, images, etc.
- Web mining: extraction of relevant information hidden in Web-related data, such as hypertext documents on the Web.

8 WEB MINING SUBTASKS
- Resource finding
- Information selection & preprocessing
- Generalization
- Analysis

9 CHALLENGES
- Searching for relevant information on the Web
- Creating knowledge
- Personalizing information
- Learning patterns
- Uniformity & standardisation

10 CHALLENGES
- Redundant information
- Noisy Web
- Monitoring changes
- Sites that provide services
- Privacy

11 TAXONOMY
Web Mining
- Web Content Mining
  - Web Text Mining
  - Web Multimedia Mining
- Web Structure Mining
  - Link Mining
  - URL Mining
  - Internal Structure Mining
- Web Usage Mining
  - General Access Pattern Tracking
  - Personalized Usage Tracking

12 WEB CONTENT MINING
- Discovers useful information in Web content and analyses that content
- An automatic process that goes beyond keyword extraction
- Includes approaches that restructure document content
- Mining strategies fall into two groups: the agent-based approach and the database approach

13 WEB CONTENT MINING
Agent-based approach:
- Intelligent search agents
- Information filtering/categorization
- Personalized Web agents

14 WEB CONTENT MINING
Database approach:
- Multilevel databases
- Web query systems
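The database approach treats Web pages as semi-structured data that can be reorganized into structured records and then queried. The sketch below shows one way to do this with Python's standard `html.parser` module, which can restructure a single page into such a record. It assumes the fields worth keeping are a page's title, headings and links; that is an illustrative choice, not one the slides prescribe. `PageRecordParser` and `extract_record` are hypothetical names.

```python
from html.parser import HTMLParser


class PageRecordParser(HTMLParser):
    """Collects the title, headings and outgoing links of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag in ("title", "h1", "h2", "h3"):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return  # ignore body text outside the captured tags
        if self._current == "title":
            self.title += text
        else:
            self.headings.append(text)


def extract_record(html):
    """Turn semi-structured HTML into a flat, database-style record."""
    parser = PageRecordParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "headings": parser.headings, "links": parser.links}


page = """<html><head><title>Web Mining</title></head>
<body><h1>Taxonomy</h1><p>See <a href="/content">content</a> and
<a href="/usage">usage</a> mining.</p><h2>Structure</h2></body></html>"""
print(extract_record(page))
```

A Web query system could then store many such records in a table and answer questions like "which pages link to /usage?" with ordinary database queries.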

15 WEB STRUCTURE MINING
- Discovers structural information from the Web
- Web graph: Web pages are nodes and hyperlinks are edges
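The sketch below builds a minimal Web graph from a hypothetical toy crawl, given as a dictionary that maps each page to the pages it links to. It then counts each page's outdegree (links out) and indegree (links in), which are the basic quantities that link-analysis algorithms build on. As a simplifying assumption, duplicate links between the same two pages are collapsed into one edge.

```python
# Hypothetical toy crawl: each page maps to the pages its hyperlinks point to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def build_web_graph(links):
    """Return (nodes, edges) of the directed Web graph: pages are nodes, hyperlinks are edges."""
    nodes = set(links)
    edges = set()  # a set, so repeated links between two pages count once
    for src, targets in links.items():
        for dst in targets:
            nodes.add(dst)
            edges.add((src, dst))
    return nodes, edges


def degrees(nodes, edges):
    """Count outgoing (outdegree) and incoming (indegree) hyperlinks per page."""
    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return out_deg, in_deg


nodes, edges = build_web_graph(links)
out_deg, in_deg = degrees(nodes, edges)
print(sorted(in_deg.items()))  # c.html receives the most incoming links
```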

16 WEB STRUCTURE MINING
Two algorithms for link analysis:
- PageRank
- HITS

17 WEB STRUCTURE MINING
PageRank:
- A metric for ranking hypertext documents
- A page's rank depends on the ranks of the pages pointing to it
- Computed by an iterative process

18 WEB STRUCTURE MINING
PageRank notation:
- n: number of nodes in the graph
- Outdegree(q): number of hyperlinks on page q
- d: damping factor
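With the notation above, one common statement of PageRank is PR(p) = (1 - d)/n + d * sum of PR(q)/Outdegree(q), where the sum runs over the pages q that link to page p. Some texts swap the roles of d and 1 - d. This minimal power-iteration sketch therefore makes three assumptions of its own rather than taking them from the slides: that parameterization, the value d = 0.85, and the convention that pages without out-links share their rank evenly.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Iterative PageRank over a dict mapping each page to the pages it links to.

    Assumed update rule: PR(p) = (1 - d)/n + d * sum(PR(q) / Outdegree(q))
    over the pages q that link to p. Pages with no out-links (dangling pages)
    spread their rank evenly over all n pages.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    out = {p: set(links.get(p, ())) for p in nodes}  # de-duplicated out-links
    rank = {p: 1.0 / n for p in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in nodes if not out[p])
        new = {p: (1 - d) / n + d * dangling / n for p in nodes}
        for q in nodes:
            for p in out[q]:
                new[p] += d * rank[q] / len(out[q])  # q passes rank to each page it links to
        converged = sum(abs(new[p] - rank[p]) for p in nodes) < tol
        rank = new
        if converged:
            break
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
print({p: round(r, 3) for p, r in sorted(ranks.items())})  # page "c" ranks highest
```

The ranks always sum to 1, so they can be read as the probability that a random surfer is on each page.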

19 WEB STRUCTURE MINING
HITS:
- An iterative algorithm
- Identifies topic hubs and authorities
- Input: search results returned by a traditional text-indexing technique

20 WEB STRUCTURE MINING
- Assigns each page a hub weight based on the authority of the pages it points to, and an authority weight based on the hubs that point to it
- Outputs the pages with the largest hub and authority weights
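The sketch below shows a minimal version of the HITS iteration. A small hypothetical link graph stands in for the set of pages retrieved by a text search. Each round does three things:

- sets a page's authority weight to the sum of the hub weights of the pages pointing to it;
- sets its hub weight to the sum of the authority weights of the pages it points to;
- normalizes both (here with the Euclidean norm).

For simplicity, a fixed iteration count replaces a convergence test.

```python
import math


def hits(links, iterations=50):
    """Hub and authority weights for a small link graph (HITS iteration).

    links maps each page to the pages it points to; in HITS this graph would
    come from the results of a text search.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    out = {p: set(links.get(p, ())) for p in nodes}
    hub = {p: 1.0 for p in nodes}
    auth = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages that point to the page.
        auth = {p: sum(hub[q] for q in nodes if p in out[q]) for p in nodes}
        # Hub weight: sum of authority weights of the pages the page points to.
        hub = {p: sum(auth[t] for t in out[p]) for p in nodes}
        # Normalize so the weights stay bounded (Euclidean norm).
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


web = {"h1": ["x", "y"], "h2": ["x"], "h3": ["x"]}
hub, auth = hits(web)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # top hub, top authority
```

The pages with the largest weights are the algorithm's output. For example, `sorted(auth, key=auth.get, reverse=True)[:10]` would list the top ten authorities.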

21 WEB USAGE MINING
- Extracts information from server logs
- Discovers users' access patterns for Web pages
- Decomposed into 3 subtasks (preprocessing, pattern discovery, pattern analysis):
  Raw logs & site files → Preprocessing → User session file → Mining algorithms → Rules, patterns & statistics → Pattern analysis → Interesting rules, patterns & statistics

22 WEB USAGE MINING
Preprocessing:
- Data cleaning
- User identification
- User session identification
- Access path supplement
- Transaction identification
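The first three preprocessing steps can be sketched as shown below, under stated assumptions:

- The raw log is in Common Log Format.
- Each client host (IP address) is one user. This is a crude heuristic that ignores proxies and shared addresses.
- A new session starts after 30 minutes of inactivity. This is a widely used threshold, not one the slides specify.

Path completion ("access path supplement") and transaction identification are not shown. `clean` and `sessionize` are hypothetical names.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # embedded resources
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def clean(lines):
    """Data cleaning: keep successful GET requests for pages; drop images, styles, scripts."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # malformed line
        host, stamp, method, url, status = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue
        if url.lower().endswith(IGNORED_SUFFIXES):
            continue
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        yield host, when, url


def sessionize(records):
    """User identification (one user per host) and session identification (split on long gaps)."""
    sessions = {}  # host -> list of sessions, each a list of URLs in visit order
    last_seen = {}
    for host, when, url in sorted(records, key=lambda r: (r[0], r[1])):
        if host not in last_seen or when - last_seen[host] > SESSION_TIMEOUT:
            sessions.setdefault(host, []).append([])  # start a new session
        sessions[host][-1].append(url)
        last_seen[host] = when
    return sessions


log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 90',
    '10.0.0.1 - - [01/Jan/2024:10:05:00 +0000] "GET /products.html HTTP/1.1" 200 800',
    '10.0.0.2 - - [01/Jan/2024:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:07:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.1 - - [01/Jan/2024:11:00:00 +0000] "GET /contact.html HTTP/1.1" 200 300',
]
sessions = sessionize(clean(log))
print(sessions)
```

In this sample, the image request and the failed request are cleaned away. The 55-minute gap before `/contact.html` splits host 10.0.0.1 into two sessions.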

23 WEB USAGE MINING
Pattern discovery:
- Statistical analysis
- Association rules
- Clustering analysis

24 WEB USAGE MINING
- Classification analysis
- Sequential patterns
- Dependency modeling
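Association rules are the easiest of these techniques to show in a few lines of code. The sketch below is a simplified, pairs-only version: it treats each user session as a set of pages. It reports rules of the form "visited A, so also visited B" whose support and confidence clear given thresholds. The threshold values and the example sessions are assumptions, and `page_pair_rules` is a hypothetical name. Full algorithms such as Apriori handle page sets of any size.

```python
from collections import Counter
from itertools import combinations


def page_pair_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Mine rules 'visited A -> also visited B' from sessions (lists of pages).

    support(A -> B)    = fraction of sessions containing both A and B
    confidence(A -> B) = sessions with A and B / sessions with A
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        pages = set(session)  # a repeated visit within one session counts once
        item_count.update(pages)
        pair_count.update(combinations(sorted(pages), 2))
    rules = []
    for pair, count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be worth reporting
        for a, b in (pair, pair[::-1]):  # try both rule directions
            confidence = count / item_count[a]
            if confidence >= min_confidence:
                rules.append((a, b, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/contact"],
    ["/products", "/cart"],
]
for a, b, support, confidence in page_pair_rules(sessions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```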

25 WEB USAGE MINING
Pattern analysis:
- Eliminates irrelevant rules or patterns
- Extracts interesting patterns
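Pattern analysis filters the output of pattern discovery. The minimal sketch below uses two assumed criteria. It keeps only rules with lift above 1, meaning the pages co-occur more often than they would if visits were independent; lift is one common interestingness measure among several. It also drops rules that mention pages the analyst chooses to ignore, such as a home page that nearly every session contains. `Rule` and `interesting` are hypothetical names, and the example figures are made up for illustration.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support_ab: float  # fraction of sessions containing both pages
    support_a: float   # fraction of sessions containing the antecedent
    support_b: float   # fraction of sessions containing the consequent

    @property
    def confidence(self):
        return self.support_ab / self.support_a

    @property
    def lift(self):
        # lift > 1: the pages co-occur more often than if visits were independent
        return self.support_ab / (self.support_a * self.support_b)


def interesting(rules, min_lift=1.0, ignore_pages=()):
    """Drop uninformative rules and rules about ignored pages; rank the rest by lift."""
    kept = []
    for r in rules:
        if r.antecedent in ignore_pages or r.consequent in ignore_pages:
            continue  # e.g. a home page that almost every session contains
        if r.lift <= min_lift:
            continue  # no stronger than chance co-occurrence
        kept.append(r)
    return sorted(kept, key=lambda r: r.lift, reverse=True)


rules = [
    Rule("/cart", "/products", 0.5, 0.5, 0.75),
    Rule("/index", "/products", 0.5, 0.75, 0.75),
    Rule("/faq", "/contact", 0.2, 0.25, 0.3),
]
for r in interesting(rules, ignore_pages={"/index"}):
    print(f"{r.antecedent} -> {r.consequent}: lift={r.lift:.2f}")
```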

26 APPLICATIONS
- Personalized services
- Improving website design
- System improvement
- Predicting trends
- Carrying out intelligent business

27 PROS
- High trade volumes
- Classifying threats & fighting terrorism
- Establishing better customer relationships
- Increasing profitability

28 CONS
- Invasion of privacy
- Discrimination based on controversial attributes

29 CONCLUSION
- A rapidly growing area
- A promising area for future research

31 WEB MINING
Thank You

