Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Fuzzy integration of structure adaptive SOMs for web content.

1 Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Fuzzy integration of structure adaptive SOMs for web content mining Advisor : Dr. Hsu Graduate : Chih-Ling Wang Authors : Kyung-Joong Kim, Sung-Bae Cho 2003 IEEE.

2 Outline: Motivation, Objective, Introduction, Feature selection, SASOM, Fuzzy integral, Experimental results, Conclusions, Personal Opinion

3 Motivation: Because the exponentially growing web contains gigabytes of web documents, users have difficulty finding an appropriate web site.

4 Objective: We need an ensemble of classifiers that estimates a user’s preference from web content the user has labeled as “like” or “dislike.”

5 Introduction: Web mining can be divided into three components according to the source of information used to discover knowledge: web content mining, web usage mining, and web structure mining. This paper focuses on web content mining, creating a user profile from HTML documents and the user’s preference record.

6 Introduction (cont.): The paper adopts an ensemble of SASOMs to estimate the user profile; each SASOM is trained independently on a different feature set. Three feature ranking methods are used: information gain, TFIDF, and odds ratio.

7 Introduction (cont.): The fuzzy integral is a combination method that aggregates evidence from multiple sources using a fuzzy measure and the user’s subjective evaluation of each classifier’s relevance.

8 Introduction (cont.)

9 Feature selection ─ TFIDF: TFIDF is a general method frequently used in text retrieval. It is the product of term frequency and inverse document frequency. TFIDF does not use class information to calculate the importance of features.
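A minimal Python sketch of the TFIDF weighting described on this slide. It assumes raw term counts for term frequency and log(N/df) for inverse document frequency, which is one common variant; the slide does not say which variant the paper uses. The function name `tfidf` and the toy documents are illustrative only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Toy corpus (made up for illustration, not the Syskill & Webert data).
docs = [["goat", "farm", "goat"], ["band", "music"], ["goat", "music"]]
weights = tfidf(docs)
```

Note that no class label appears anywhere in the computation, which matches the slide's remark that TFIDF ignores class information.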

10 Feature selection ─ Information gain: $E(W,S) = I(S) - \left[ P(W=\text{present})\, I(S_{w=\text{present}}) + P(W=\text{absent})\, I(S_{w=\text{absent}}) \right]$, where S is a set of pages and E(W,S) is the expected information gain of term W on the document set S.
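The expected information gain can be sketched in Python as follows. The slide does not define I(S); this sketch assumes it is the Shannon entropy of the class labels in S, the usual choice for this formula. The names `entropy` and `expected_information_gain` and the toy data are illustrative.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels; assumed form of I(S)."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def expected_information_gain(term, docs, labels):
    """E(W,S) = I(S) - [P(W=present) I(S_present) + P(W=absent) I(S_absent)]."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(docs)
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder


# Toy data: each document is a set of terms, with a "hot"/"cold" label.
docs = [{"goat", "farm"}, {"goat", "milk"}, {"band", "farm"}, {"band", "tour"}]
labels = ["hot", "hot", "cold", "cold"]
gain_goat = expected_information_gain("goat", docs, labels)
gain_farm = expected_information_gain("farm", docs, labels)
```

In this toy data "goat" separates the two classes perfectly, so it receives the maximum gain for a balanced binary problem, while "farm" appears equally in both classes and receives none.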

11 Feature selection ─ Odds ratio: Odds ratio is used when the goal is to make a good prediction for one of the class values. C_1 and C_2 are the class labels of a binary classification problem, n is the number of examples, and X_i is a probability variable, such as the probability that term W is in a text and the class label of the text is C_i.
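The slide's own formula is not preserved, so the sketch below uses the commonly cited log odds ratio, log[P(W|C_1)(1 - P(W|C_2)) / ((1 - P(W|C_1)) P(W|C_2))], as an assumption. Clipping probabilities to [1/n², 1 - 1/n²] is likewise an assumed convention for avoiding division by zero, and may differ from the paper's treatment. The function name and toy data are illustrative.

```python
import math


def odds_ratio(term, docs, labels, target):
    """Log odds ratio of `term` for class `target` versus all other documents.

    Assumes both the target class and the remaining documents are non-empty.
    """
    n = len(docs)
    eps = 1.0 / n ** 2  # assumed clipping bound to keep the ratio finite

    def clipped_prob(in_target):
        group = [d for d, y in zip(docs, labels) if (y == target) == in_target]
        p = sum(term in d for d in group) / len(group)
        return min(max(p, eps), 1.0 - eps)

    p1 = clipped_prob(True)   # estimate of P(W | C_1)
    p2 = clipped_prob(False)  # estimate of P(W | C_2)
    return math.log(p1 * (1.0 - p2) / ((1.0 - p1) * p2))


docs = [{"goat", "farm"}, {"goat", "milk"}, {"band", "farm"}, {"band", "tour"}]
labels = ["hot", "hot", "cold", "cold"]
score_goat = odds_ratio("goat", docs, labels, target="hot")
score_farm = odds_ratio("farm", docs, labels, target="hot")
```

The score is large and positive only for terms that indicate the target class, which reflects the slide's point that odds ratio favors features for one specific class.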

12 Feature selection (cont.): TFIDF does not consider the class values of documents when calculating the relevance of features, whereas information gain uses the class labels of documents. Odds ratio also uses class labels, but it finds features useful for classifying only one specific class.

13 SASOM: SOM is a neural network model that has the property of preserving topology on its map and is frequently used to visualize high-dimensional data in a low-dimensional space.

14 SASOM (cont.): The basic SOM fixes the structure of the map and performs poorly in classification because a single node can hold data with different class labels. When a node holds data with different class labels, SASOM divides the node into a submap of 4 nodes.

15 SASOM (cont.): The basic procedure for SASOM is: (1) Start with a basic SOM (in our case, a 4x4 map in which each node is fully connected to all input nodes). (2) Train the current network with Kohonen’s algorithm. (3) Calibrate the network using known I/O patterns to determine (a) which nodes should be replaced with a submap of several nodes (in our case, a 2x2 map), and (b) which nodes should be deleted. (4) Unless every node represents a unique class, go to step 2.
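Step (3) of the procedure above can be sketched in Python as a small calibration function. Marking a node for splitting when it attracts samples of more than one class follows the slides; marking a node for deletion when no labeled sample maps to it is an assumption, since the transcript does not state the deletion criterion. The function name `calibrate` and its arguments are hypothetical.

```python
from collections import defaultdict


def calibrate(node_ids, assignments, labels):
    """Decide which nodes to split and which to delete.

    assignments[i] is the id of the best-matching node for labeled sample i.
    Returns (to_split, to_delete) as sorted lists of node ids.
    """
    classes_at = defaultdict(set)
    for node, label in zip(assignments, labels):
        classes_at[node].add(label)
    # A node holding more than one class would be replaced by a 2x2 submap.
    to_split = sorted(n for n in node_ids if len(classes_at.get(n, ())) > 1)
    # Assumed rule: a node that attracts no labeled sample is deleted.
    to_delete = sorted(n for n in node_ids if len(classes_at.get(n, ())) == 0)
    return to_split, to_delete


# Four nodes, four labeled samples (toy example).
to_split, to_delete = calibrate(
    node_ids=[0, 1, 2, 3],
    assignments=[0, 0, 1, 3],
    labels=["hot", "cold", "hot", "cold"],
)
# The loop in step (4) would stop once to_split is empty.
```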

16 SASOM (cont.): The basic learning algorithm of SOM is as follows (the formula is not preserved in this transcript).
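Because the slide's formula is missing, the sketch below shows the standard Kohonen update, w_i ← w_i + α · h_ci · (x − w_i), where c is the best-matching node. The Gaussian neighborhood function and the parameter names `alpha` and `sigma` are assumptions; the paper may use a different neighborhood or learning-rate schedule.

```python
import math


def som_step(weights, x, alpha, sigma):
    """Apply one Kohonen update in place and return the best-matching node.

    weights maps grid positions (row, col) to weight vectors (lists of floats).
    """
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Best-matching unit: the node whose weight vector is closest to x.
    bmu = min(weights, key=lambda pos: sqdist(weights[pos], x))
    for pos, w in weights.items():
        grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
        h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # assumed Gaussian neighborhood
        weights[pos] = [wi + alpha * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# Two-node toy map with 2-dimensional inputs.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
bmu = som_step(weights, x=[1.0, 0.8], alpha=0.5, sigma=1.0)
```

The best-matching node moves halfway toward the input, while its neighbor moves by a smaller amount scaled by the neighborhood function.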

17 SASOM (cont.): A node representing more than one class is replaced with several nodes. The weights of the child nodes of a parent node are determined by a formula (not preserved in this transcript) in which N_c is the set of neighborhood nodes of the child and S is the number of nodes in N_c plus 2.

18 SASOM (cont.)

19 SASOM (cont.)

20 Fuzzy integral: The fuzzy integral takes into account the importance of each classifier, measured subjectively. The final decision integrates the evidence each classifier provides for each class with the importance of the classifiers as subjectively defined by users.

21 Fuzzy integral (cont.): A fuzzy measure assigns a real value between 0 and 1 to each subset of X. g_λ-fuzzy measures satisfy an additional property (the formula is not preserved in this transcript).
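As a hedged illustration, the sketch below uses the standard Sugeno g_λ construction: for disjoint A and B, g(A ∪ B) = g(A) + g(B) + λ g(A) g(B) with λ > −1, and λ is the non-zero root of ∏(1 + λ g_i) = 1 + λ. It then combines classifier outputs with the Sugeno form of the fuzzy integral. Whether the paper uses exactly this form is an assumption, and the function names are illustrative.

```python
def solve_lambda(densities):
    """Find lambda > -1 with prod(1 + lambda * g_i) = 1 + lambda by bisection.

    densities are the user-assigned importances g_i, each strictly in (0, 1).
    """
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities, each in (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # the measure is already additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, 0.0  # f > 0 left of the root, f < 0 right of it
        for _ in range(200):
            mid = (lo + hi) / 2.0
            if f(mid) > 0.0:
                lo = mid
            else:
                hi = mid
    else:
        lo, hi = 0.0, 1.0   # f < 0 left of the root, f > 0 right of it
        while f(hi) <= 0.0:
            hi *= 2.0
        for _ in range(200):
            mid = (lo + hi) / 2.0
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
    return (lo + hi) / 2.0


def sugeno_integral(evidence, densities):
    """Combine per-classifier evidence for one class (Sugeno fuzzy integral)."""
    lam = solve_lambda(densities)
    g_acc, best = 0.0, 0.0
    # Visit classifiers in order of decreasing evidence.
    for h, g in sorted(zip(evidence, densities), reverse=True):
        g_acc = g + g_acc + lam * g * g_acc  # measure of the set seen so far
        best = max(best, min(h, g_acc))
    return best


# Three classifiers (for example, one per feature-ranking method); toy numbers.
score = sugeno_integral(evidence=[0.9, 0.6, 0.3], densities=[0.2, 0.3, 0.5])
```

In an ensemble, this would be evaluated once per class and the class with the largest integral chosen as the final decision.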

22 Fuzzy integral (cont.)

23 Experimental results: The Syskill & Webert data cover four topics, “Bands,” “Biomedical,” “Goats,” and “Sheep,” of which we use the “Goats” and “Bands” data. The “Goats” data contain 70 HTML documents and the “Bands” data contain 61 HTML documents. Each document has the class label “hot” or “cold.”

24 Experimental results (cont.)

25 Experimental results (cont.): Bands

26 Experimental results (cont.): Goats

27 Experimental results (cont.)

28 Experimental results (cont.): Bands

29 Experimental results (cont.): Goats

30 Conclusions: The fuzzy integral provides a method for subjectively measuring the importance of classifiers. SASOM can classify documents with high performance, and its map can be visualized to understand its internal mechanism. The proposed method can be applied effectively to web content mining for predicting a user’s preference as a user profile.

31 Personal Opinion: We can combine this paper with ViSOM. One thing that needs more attention is categorical data, because the paper uses only numerical data.