
1 Intelligent Database Systems Lab, National Yunlin University of Science and Technology (國立雲林科技大學). Advisor: Dr. Hsu. Graduate student: Chien-Shing Chen. Paper: "A Methodology to Find Web Site Keywords" by Juan D. Velasquez, Richard Weber, and Hiroshi Yasuda, Proceedings of the 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, IEEE 2004.

2 Outline: Motivation; Objective; Introduction; Related Work (WUM, WCM, TFIDF, cosine similarity, SOM); Experiments; Conclusions; Personal Opinion; Review. N.Y.U.S.T. I.M.

3 Motivation: In many cases, what makes the difference between the success and failure of an e-business is the ability of its web site to attract and retain visitors.

4 Objective: We propose a method to determine the set of the most important words in a web site from the visitor's point of view. This is done by combining usage information with web page content, arriving at a set of keywords determined implicitly by the site's visitors.

5 Introduction: We use web page content, especially free text, together with patterns from web usage as input for clustering visitor sessions; the approach therefore draws on both web usage mining and web content mining. A clustering algorithm finds groups of similar visitor sessions. By analyzing the pages that belong to each cluster found, we can determine the most important words for each cluster and, consequently, for each type of visitor.

6 Related Work: Web mining techniques are categorized into three subareas: Web Structure Mining (WSM), Web Content Mining (WCM), and Web Usage Mining (WUM). In this paper, we propose a combination of WCM and WUM techniques.

7 Web Content Mining: The goal is to find useful information in web page content. Tool: TFIDF, where R is the number of words and Q the number of documents.

8 Web Usage Mining: The goal is pattern discovery using different kinds of data mining techniques, such as statistics, association rules, clustering, and classification.

9 Combining WUM and WCM: Applying WUM, we can understand visitor browsing behavior, but we cannot discover which content is interesting to the visitor. A similarity measure has been suggested that allows the behavior of different visitors to be compared through an analysis of their preferences.

10 Web page processing: HTML tag removal, stop-word removal, and word stemming.
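A minimal Python sketch of these three preprocessing steps, for illustration only. The stop-word list is a tiny hypothetical sample, and the suffix stripper is a crude stand-in for a real stemmer such as Porter's algorithm; neither comes from the paper. A real pipeline would also record special words (title, italics) before the tags are discarded.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real one has hundreds of entries).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to"}

# Crude suffix stripping: a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_html(html):
    """Remove <script>/<style> blocks with their contents, then every other tag."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    return re.sub(r"<[^>]+>", " ", html)


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(html):
    """HTML page -> list of lower-case, stop-word-free, stemmed tokens."""
    text = strip_html(html).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("<title>The Visitors</title><p>Keywords of the site</p>")` yields the tokens `visitor`, `keyword`, and `site`.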

11 TFIDF: Let R be the number of different words in a web site and Q be the number of its pages. Building on the traditional method, we propose a variation that incorporates the influence of special words, i.e., words with different levels of importance for a visitor, for example words in italic font, referrer words, or words associated with the page title.
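One way to sketch such a variation in Python, assuming a weighting of the form tf_ij × (1 + sw_i / T) × log(Q / n_i), where sw_i counts how often word i appears as a special word, T is the total number of words in the site, and n_i is the number of pages containing word i. This formula and the function name `weighted_tfidf` are assumptions for illustration, since the slide's exact definition is not reproduced in this transcript. With no special words, the weighting reduces to the standard tf × idf.

```python
import math
from collections import Counter


def weighted_tfidf(pages, special_tokens):
    """Return (vocab, weights), with weights[j][i] the weight of vocab[i] on page j.

    pages: token lists, one per page (so Q = len(pages), R = len(vocab)).
    special_tokens: token lists, one per page, holding the words that appear in a
    special position there (title, italics, ...).
    Assumed weighting: tf_ij * (1 + sw_i / T) * log(Q / n_i).
    """
    vocab = sorted({w for page in pages for w in page})
    Q = len(pages)
    T = sum(len(page) for page in pages)                      # total words in the site
    n = Counter(w for page in pages for w in set(page))       # pages containing each word
    sw = Counter(w for toks in special_tokens for w in toks)  # special occurrences per word
    weights = []
    for page in pages:
        tf = Counter(page)
        weights.append([tf[w] * (1 + sw[w] / T) * math.log(Q / n[w]) for w in vocab])
    return vocab, weights
```

A word that occurs on every page gets weight zero through the log(Q / n_i) factor, whatever its special-word boost.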

12 Definition 1

13 Definition 2: From the visitor behavior vector, we want to select the most important pages, assuming that a page's importance is correlated with the relative time spent on it.
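Assuming, as the slide suggests, that a visitor behavior vector is a sequence of (page, time spent) pairs, selecting the most important pages can be sketched as follows. The helper name `important_pages` and the parameter `iota` (number of pages to keep) are hypothetical; repeat visits to a page are summed, and times are normalized to fractions of the session total.

```python
from collections import defaultdict


def important_pages(session, iota):
    """Return the iota pages with the largest share of session time.

    session: visitor behavior vector as (page_id, seconds) pairs, in visit order.
    Result: (page_id, relative_time) pairs sorted by decreasing time
    (ties broken by page_id so the output is deterministic).
    """
    time_per_page = defaultdict(float)
    for page, seconds in session:
        time_per_page[page] += seconds  # sum repeat visits to the same page
    total = sum(time_per_page.values())
    if total <= 0:
        return []
    ranked = sorted(time_per_page.items(), key=lambda pt: (-pt[1], pt[0]))
    return [(page, t / total) for page, t in ranked[:iota]]
```

For a session that spends 80 of 100 seconds on a products page (over two visits) and 10 seconds on the home page, `important_pages(session, 2)` returns the products page with a share of 0.8 followed by the home page with 0.1.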

14 Definition 3

15 The first element indicates the visitor's interest in the pages visited; the second element is the distance between pages.
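A sketch of a session similarity built from these two elements, assuming each session has already been reduced to an equal-length list of (page, relative time) pairs. The interest term compares the time shares of the k-th pages of both sessions as min(t_a, t_b) / max(t_a, t_b). The page term uses cosine similarity of the pages' word-weight vectors (the outline lists cosine), so it measures closeness rather than distance. How the slides combine the two terms is not recoverable from this transcript; averaging their products over the pages is an assumption, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def session_similarity(a, b, page_vectors):
    """Similarity of two important-page vectors a and b.

    a, b: equal-length lists of (page_id, relative_time), most important first.
    page_vectors: maps page_id -> word-weight vector (e.g. TF-IDF).
    Assumed form: mean over k of interest_k * content_k.
    """
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty vectors of equal length")
    total = 0.0
    for (page_a, t_a), (page_b, t_b) in zip(a, b):
        # Interest term: 1.0 when both visitors spent the same share of time.
        interest = min(t_a, t_b) / max(t_a, t_b) if max(t_a, t_b) > 0 else 0.0
        # Content term: how close the two pages are in word space.
        content = cosine(page_vectors[page_a], page_vectors[page_b])
        total += interest * content
    return total / len(a)
```

Two identical sessions score 1.0. Sessions whose k-th pages share no words score 0.0, however similar their timing.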

16 Clustering visitor sessions: We use a clustering algorithm to find groups of similar visitor sessions. Based on this information, we determine the most important words for each cluster.
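The outline names SOM (self-organizing map) as the clustering method. The Python sketch below deliberately swaps in a simpler k-medoids loop, because it needs only a pairwise similarity function and can therefore group sessions with a measure such as the one described on slide 15. It is not the SOM used in the paper. The function name `k_medoids` is hypothetical, and using the first k items as initial medoids keeps the result deterministic.

```python
def k_medoids(items, k, similarity, max_iter=20):
    """Cluster items around k medoids using only a pairwise similarity.

    Deterministic start: the first k items are the initial medoids.
    Returns a list of clusters, each a list of item indices.
    """
    if not 0 < k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: each item joins its most similar medoid
        # (a medoid always stays in its own cluster).
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            best = i if i in clusters else max(medoids, key=lambda m: similarity(item, items[m]))
            clusters[best].append(i)
        # Update step: the new medoid is the member most similar to the rest of its cluster.
        new_medoids = [
            max(members, key=lambda c: sum(similarity(items[c], items[o]) for o in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return list(clusters.values())
```

With sessions as items, `similarity` could be a session similarity with the page vectors bound in, e.g. `lambda x, y: session_similarity(x, y, page_vectors)`.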

17 Identifying web site keywords: We propose a method to determine the most important keywords and their importance in each cluster. A measure based on the geometric mean is used to calculate the importance of each word relative to each cluster.
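A sketch of the geometric-mean idea in Python, under stated assumptions: each cluster is associated with a set of page indices, each word has a weight per page (e.g. a TF-IDF weight), and a word's importance for the cluster is the geometric mean of its weights over those pages. The function name `cluster_keywords` and the `top_n` parameter are hypothetical. Working with logarithms avoids overflow in the product, and a word absent from any of the cluster's pages scores zero, as a plain geometric mean would.

```python
import math


def cluster_keywords(vocab, weights, cluster_pages, top_n=5):
    """Rank words for one cluster by the geometric mean of their page weights.

    vocab: list of words; weights[j][i]: weight of vocab[i] on page j.
    cluster_pages: indices of the pages associated with the cluster.
    Returns up to top_n (word, score) pairs, highest score first.
    """
    if not cluster_pages:
        return []
    scored = []
    for i, word in enumerate(vocab):
        values = [weights[j][i] for j in cluster_pages]
        if min(values) <= 0:
            score = 0.0  # absent from some page, so the product (and mean) is zero
        else:
            # exp(mean(log v)) equals (prod v) ** (1 / len(v)) but cannot overflow.
            score = math.exp(sum(math.log(v) for v in values) / len(values))
        scored.append((score, word))
    scored.sort(key=lambda sw: (-sw[0], sw[1]))
    return [(word, score) for score, word in scored[:top_n]]
```

The geometric mean favors words that are consistently weighted across the cluster's pages over words that are heavy on one page and nearly absent elsewhere.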

18 Experimental

19 Experimental

20 Experimental

21 Experimental

22 Experimental

23 Experimental

24 Conclusions: We proposed a way to find the most important pages for the visitor, assuming that the time spent on each page is proportional to the visitor's interest. The method finds the most important pages visited and the time spent on each of them, ordered by time. The similarity measure introduced can be very useful for increasing knowledge about visitor preferences on the web and for identifying keywords that attract and retain visitors.

25 Personal Opinion

26 Review: WUM, WCM, TFIDF, important pages, vector, clustering, identifying the important keywords.

