Information Overload on the Internet: The Web Mining Techniques Approach
UNIVERSITI UTARA MALAYSIA, COLLEGE OF ARTS AND SCIENCES
RESEARCH METHODOLOGY (SZRZ6014)

Prepared by: Ahmed Ghazi Hameed (812517)
Prepared for: Dr. Farzana binti Kabir Ahmad

Overview
- Introduction
- Background to the Study
- Problem Statement
- Research Questions
- Research Objective
- Significance of the Research
- Literature Review
- The Architecture of the Web Log Mining
- Related Work
- Methodology
- Summary
- References

Introduction
The Internet has presented new opportunities to the world. As a result, it has become more popular than ever and is now a necessity for everyone. The information on the Web is growing faster than ever before. This growth is rapid and significant, and it has become a challenge for users because of information overload: over time, users are eventually drowned in information. The growth has been driven by the World Wide Web, which provides a powerful platform on which information can be stored, disseminated, and retrieved. The same platform can also be used to mine useful knowledge.

Background to the Study
The Internet contains a wide variety of data, stored in a very large repository. This repository also holds a large amount of hidden knowledge, which can be discovered through data mining. The approaches commonly used in database research for information retrieval are intelligent computing and computational intelligence.

Problem Statement
Information overload raises the challenge of how to find relevant information. To find particular information on the Internet, a user can use a search engine, employ a search assistant, or browse Web documents directly. The problem is as follows: a user typically types several keywords as a query into the search engine, and the search engine returns a large number of pages ranked by their relevance to the query, leaving the user to sift through them.
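The slides do not say how a search engine ranks the returned pages. Purely as an illustration of keyword-based ranking, the sketch below scores each page with a simple smoothed TF-IDF sum over the query terms. The function names, the smoothing formula, and the sample pages are assumptions for this example, not part of the original research. Note that even this toy ranker returns every page in ranked order, which is exactly what leaves the user to sift through results.

```python
import math
from collections import Counter


def tokenize(text):
    # Lower-case and split on whitespace; a real engine would also strip punctuation and stem.
    return text.lower().split()


def rank_pages(query, pages):
    """Rank pages (url -> text) by the summed TF-IDF weight of the query terms they contain."""
    n = len(pages)
    docs = {url: Counter(tokenize(text)) for url, text in pages.items()}
    # Document frequency: number of pages that contain each term.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    scores = {}
    for url, counts in docs.items():
        total = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / total
                idf = math.log((1 + n) / (1 + df[term])) + 1  # smoothed IDF (assumed variant)
                score += tf * idf
        scores[url] = score
    # Highest score first; ties broken by URL so the order is stable.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, with three hypothetical pages, `rank_pages("web mining", pages)` puts the page where the query terms are densest first and the unrelated page last.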

Research Questions
The following research questions will be answered to achieve the objectives of this research:
1- In what ways can the latent semantic factor space be revealed and discovered?
2- How can Web pages be grouped based on their usage-oriented similarities?
3- Why is it important to predict Web users' task preference distributions for Web recommendations?

Research Objective
The main objective is to develop ways of finding the needed information on the Internet accurately despite information overload. The specific objectives are:
1- To discover the latent semantic factor space and Web user preferences in Web search using the Probabilistic Latent Semantic Analysis (PLSA) model.
2- To group Web pages based on their usage-oriented similarities.
3- To predict Web users' task preferences and generate Web recommendations, using the discovered usage-pattern knowledge.
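Objective 3 links a user's task preferences to recommendations. One common way to do this with a latent model such as PLSA is to score each page by P(page | user) = sum over tasks z of P(z | user) * P(page | z). The sketch below assumes both distributions are already available (in the proposed work they would be estimated from usage data); the function name, the task labels, and all numbers are hypothetical.

```python
def recommend(task_pref, page_given_task, visited, top_n=2):
    """Score unvisited pages by P(page | user) = sum_z P(z | user) * P(page | z).

    task_pref:       {task: P(task | user)}                  (hypothetical input)
    page_given_task: {task: {page: P(page | task)}}          (hypothetical input)
    """
    scores = {}
    for task, p_task in task_pref.items():
        for page, p_page in page_given_task[task].items():
            if page in visited:
                continue  # do not recommend pages the user has already seen
            scores[page] = scores.get(page, 0.0) + p_task * p_page
    # Highest expected relevance first; ties broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
```

With a user who is 70% "shopping" and 30% "support", a page that is moderately likely under both tasks (here `/faq`) can outrank a page that only one task favours.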

Significance of the Research
The research will focus on ways to help Web users find exactly the information they need during Web information retrieval. This will be achieved by improving the performance of retrieval systems in Web applications and Web presentation, through employing and developing Web data mining paradigms. In addition, capturing Web users' interests and navigation patterns will give a better understanding of how users navigate the Web.

Literature Review
The previous chapter introduced the topic and the aims and objectives of this research. This chapter begins by exploring the characteristics of Web data. It then explains the concept of searching the Web and the issue of information overload. Next, the architecture of Web data mining is discussed to give an understanding of Web data, and both Web mining techniques and web capturing are evaluated. Data on the Web has unique features compared with the data held in conventional database management systems; in particular, it is huge in terms of size.

The Architecture of the Web Log Mining
A probabilistic inference approach is used to mine Web usage data for grouping Web pages and profiling users. This approach reveals the implicit associations between Web users and the pages they visit. At the same time, it can capture a latent task space, which corresponds to the users' navigation modes and the functionality of the Web site.
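PLSA, which the methodology names as one of the latent techniques, is one concrete form of this probabilistic inference. The sketch below fits the PLSA aspect model P(s, p) = sum_z P(z) P(s|z) P(p|z) to a session-by-page count matrix with the EM algorithm, where each latent factor z plays the role of a hidden task. It is a minimal pure-Python illustration under assumed inputs (a small dense count matrix, random initialisation); the function and argument names are hypothetical.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit the PLSA aspect model P(s, p) = sum_z P(z) P(s|z) P(p|z) with EM.

    counts[s][p] is how often session s requested page p (hypothetical input).
    Returns (p_z, p_s_given_z, p_p_given_z, log_likelihood_history).
    """
    rng = random.Random(seed)
    n_s, n_p = len(counts), len(counts[0])

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Start from uniform P(z) and random (strictly positive) P(s|z), P(p|z).
    p_z = [1.0 / n_topics] * n_topics
    p_s_z = [normalize([rng.random() + 1e-3 for _ in range(n_s)]) for _ in range(n_topics)]
    p_p_z = [normalize([rng.random() + 1e-3 for _ in range(n_p)]) for _ in range(n_topics)]

    history = []
    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        new_s = [[0.0] * n_s for _ in range(n_topics)]
        new_p = [[0.0] * n_p for _ in range(n_topics)]
        new_z = [0.0] * n_topics
        log_lik = 0.0
        for s in range(n_s):
            for p in range(n_p):
                n = counts[s][p]
                if n == 0:
                    continue
                # E-step: posterior P(z | s, p) is proportional to P(z) P(s|z) P(p|z).
                joint = [p_z[z] * p_s_z[z][s] * p_p_z[z][p] for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    w = n * joint[z] / total
                    new_s[z][s] += w
                    new_p[z][p] += w
                    new_z[z] += w
        history.append(log_lik)  # log-likelihood of the parameters before this update
        # M-step: re-estimate every distribution from the expected counts.
        p_s_z = [normalize(row) for row in new_s]
        p_p_z = [normalize(row) for row in new_p]
        p_z = normalize(new_z)
    return p_z, p_s_z, p_p_z, history
```

By Bayes' rule, P(z | s) is proportional to P(z) P(s | z), so the fitted model also yields each session's distribution over latent tasks. EM guarantees that the recorded log-likelihood never decreases, which is a useful sanity check on any implementation.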

Related Work
Two types of clustering are performed on usage data in Web usage mining: (i) Web page clustering and (ii) Web transaction clustering (8). Web page clustering has been applied in various ways; for example, it has been used successfully in adaptive Web sites. One such application is PageGather, a Web page clustering algorithm (46, 81). PageGather synthesizes index pages that did not previously exist by sorting Web pages into groups.
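The idea behind PageGather-style page clustering can be illustrated as follows: pages that are frequently visited in the same session are linked in a similarity graph, and each connected group becomes a candidate index page. This is a simplified sketch in that spirit, not the published PageGather implementation; the similarity measure min(P(a|b), P(b|a)), the threshold, the function name, and the sample sessions are assumptions.

```python
from collections import Counter
from itertools import combinations


def pagegather_clusters(sessions, threshold=0.5, min_size=2):
    """Simplified PageGather-style clustering of pages from visit sessions.

    Similarity of pages a and b is min(P(a|b), P(b|a)), estimated from how often
    they co-occur in the same session. Pages linked by a similarity at or above
    `threshold` are grouped into connected components (candidate index pages).
    """
    visits = Counter()  # number of sessions that contain each page
    pairs = Counter()   # number of sessions that contain each pair of pages
    for session in sessions:
        pages = sorted(set(session))
        visits.update(pages)
        pairs.update(combinations(pages, 2))

    # Build the similarity graph as an adjacency list.
    graph = {page: set() for page in visits}
    for (a, b), both in pairs.items():
        sim = min(both / visits[a], both / visits[b])
        if sim >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    # Collect connected components with an iterative depth-first search.
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters
```

In the sample sessions used for testing, `/tents` and `/stoves` are always visited together and `/boots` usually appears with `/socks`, so those pairs form two clusters, while the rarely co-visited links fall below the threshold.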

Methodology
The methodology will focus on discovering Web usage patterns by applying Web usage mining. The discovered usage knowledge will then be used to present Web users with personalized Web content, which is a form of Web recommendation. A mathematical framework is needed to analyse Web user behaviour; this framework will be referred to as the usage data analysis model. The model will help to categorize the observations in the Web log files according to how they co-occur, and it will capture the relationships between Web pages and users. The mathematical model will be based on a matrix representation of the usage data schema.
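A matrix-based usage model usually starts by grouping raw log records into sessions and counting page hits per session. The sketch below shows one way to build such a session-by-page matrix. The record format (user, timestamp in seconds, page) and the 30-minute inactivity timeout are assumptions chosen for illustration, and the function name is hypothetical.

```python
from collections import Counter


def build_usage_matrix(log, timeout=1800):
    """Turn (user, timestamp_seconds, page) records into a session-by-page count matrix.

    A new session starts when a user has been idle for more than `timeout` seconds
    (assumed heuristic). Returns (pages, matrix) where matrix[i][j] counts the hits
    on pages[j] during session i.
    """
    sessions = []   # one Counter of page hits per session
    last_seen = {}  # user -> timestamp of that user's previous request
    current = {}    # user -> index of that user's open session in `sessions`
    for user, ts, page in sorted(log, key=lambda r: (r[0], r[1])):
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions.append(Counter())
            current[user] = len(sessions) - 1
        sessions[current[user]][page] += 1
        last_seen[user] = ts
    pages = sorted({page for _, _, page in log})
    matrix = [[session[p] for p in pages] for session in sessions]
    return pages, matrix
```

Each row of the result is one user session and each column one page, which is the kind of matrix that latent techniques such as LSI, PLSA, or LDA take as input.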

Methodology
After the data model is created, algorithms will be developed to detect mutual associations between Web pages and users, uncovering the access patterns hidden in the Web log data of user sessions. Three latent analytical techniques based on statistical models will be used: traditional Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Analysis (PLSA), and the Latent Dirichlet Allocation (LDA) model. These techniques will reveal the mutual relationships between Web objects, such as Web sites and user sessions, and will uncover Web page categories and usage patterns from the Web log files.
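Of the three techniques, traditional LSI is the simplest to show: it keeps only the largest singular values and vectors of the session-by-page matrix, so sessions and pages that share access patterns end up close together in a low-dimensional latent space. The sketch below approximates the top-k singular triplets with power iteration and deflation so that it needs no third-party packages; in practice a library routine such as `numpy.linalg.svd` would normally be used instead. The function name and the toy matrix are hypothetical.

```python
import math


def top_singular_triplets(matrix, k, n_iter=200):
    """Approximate the k largest singular triplets (sigma, u, v) of `matrix`
    (rows = sessions, columns = pages) by power iteration with deflation."""
    a = [[float(x) for x in row] for row in matrix]  # copy; deflated in place below
    n_rows, n_cols = len(a), len(a[0])
    triplets = []
    for _ in range(k):
        v = [1.0] * n_cols  # starting vector for the power iteration
        for _ in range(n_iter):
            # One step of power iteration on A^T A: u = A v, then v = A^T u.
            u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
            v = [sum(a[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0:
                break
            v = [x / norm for x in v]
        u = [sum(a[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0:
            break  # remaining matrix is zero: no further latent factors
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: subtract this rank-1 component before finding the next factor.
        for i in range(n_rows):
            for j in range(n_cols):
                a[i][j] -= sigma * u[i] * v[j]
    return triplets
```

For a toy matrix with two disjoint groups of sessions and pages, each singular triplet picks out one group: the right singular vector v has large entries only for that group's pages, which is how LSI exposes page categories.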

Summary
This chapter discussed the methodology for discovering Web usage patterns by applying Web usage mining. It established the need for a mathematical framework and described the data analysis model that will help categorize co-occurring observations in the Web log files. Finally, it introduced the three latent analytical techniques, based on statistical models, that will be used.

References
Agarwal, R., C. Aggarwal, and V. Prasad, A Tree Projection Algorithm for Generation of Frequent Itemsets. Journal of Parallel and Distributed Computing, (3): p
Agrawal, R. and R. Srikant, Mining Sequential Patterns. In Proceedings of the International Conference on Data Engineering (ICDE), 1995, p. 3-14, Taipei, Taiwan: IEEE Computer Society Press.
Asano, Y., et al., Finding Neighbor Communities in the Web Using Inter-site Graph. In Proc. of the 14th International Conference on Database and Expert Systems Applications (DEXA'03), 2003, p , Prague, Czech Republic.
Baeza-Yates, R. and B. Ribeiro-Neto, Modern Information Retrieval. 1999: Addison Wesley, ACM Press.
Borodin, A., et al., Finding Authorities and Hubs from Hyperlink Structures on the World Wide Web. In Proceedings of the 10th International World Wide Web Conference, 2001, p , Hong Kong, China.
Brin, S. and L. Page, The PageRank Citation Ranking: Bringing Order to the Web.
Büchner, A.G. and M.D. Mulvenna, Discovering Internet Marketing Intelligence through Online Analytical Web Usage Mining. SIGMOD Record, (4): p
Chakrabarti, S., Data Mining for Hypertext: A Tutorial Survey. ACM SIGKDD Explorations Newsletter, (2): p

Thank you very much.