1 A Fuzzy Logic Framework for Web Page Filtering Authors : Vrettos, S. and Stafylopatis, A. Source : Neural Network Applications in Electrical Engineering,

Presentation transcript:

1 A Fuzzy Logic Framework for Web Page Filtering Authors: Vrettos, S. and Stafylopatis, A. Source: Neural Network Applications in Electrical Engineering, NEUREL ' th Seminar on, Sept. 2002, Page(s): Presented by: Chung-Hsun Hsieh Members: Chung-Hsun Hsieh & Wen-Lin Lee Date: 2003/12/23

2 Outline: Introduction; Textual Retrieval through Rocchio’s Algorithm; Fuzzy Logic; Framework; Conclusion

3 Introduction This paper proposes a framework that uses fuzzy logic to combine available text classifiers in a user-friendly, common-sense manner. Each classifier is treated as a membership function that gives the membership degree of a page in a class. The user writes a logical rule combining the available classes, e.g. (class1 AND class2) OR class3.

4 Textual Retrieval through Rocchio’s Algorithm(1/4) When textual objects such as text documents or HTML pages are retrieved, they are usually represented as vectors in the vector space model (VSM). A document collection of d documents and t terms is represented as a t × d term-by-document matrix A. The columns of A are the document vectors, and the rows of A are the term vectors.
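
The term-by-document matrix described on this slide can be sketched in a few lines of Python. This is only an illustration: the toy documents, the whitespace tokenizer, and the function name `term_document_matrix` are assumptions and do not come from the paper.

```python
# Toy corpus (assumed for illustration; not from the paper).
docs = [
    "fuzzy logic web filtering",
    "web page retrieval",
    "fuzzy sets and fuzzy logic",
]

def term_document_matrix(docs):
    """Return (terms, A), where A[i][j] is the count of term i in document j."""
    # Naive whitespace tokenization; real systems also stem and drop stop words.
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    # One row per term, one column per document: a t x d matrix.
    A = [[toks.count(term) for toks in tokenized] for term in terms]
    return terms, A

terms, A = term_document_matrix(docs)
t, d = len(A), len(A[0])
# Column j is the document vector of document j; row i is the term vector of term i.
doc_vector_0 = [row[0] for row in A]
```

Here `A` has one row for each distinct term and one column for each of the three documents, matching the t × d layout on the slide.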

5 Textual Retrieval through Rocchio’s Algorithm(2/4) The elements of the matrix A are often weighted by a two-component transformation, $a_{ij} = g_i \, l_{ij}$, where $g_i$ is the global weight of the i-th term in the collection and $l_{ij}$ is the local weight of the i-th term in the j-th document. The element $a_{ij}$ sits in the row of term i and the column of document j.
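
A two-component weighting can be sketched as below. The slide does not say which local and global weights the authors use, so this sketch assumes a common choice: raw term frequency as the local weight and inverse document frequency, log(d / df), as the global weight. The function name `weight_matrix` and the sample counts are hypothetical.

```python
import math

def weight_matrix(counts):
    """Weight a t x d count matrix as (global weight of term i) * (local weight in doc j).

    Assumed scheme, not confirmed by the slides:
      local weight  = raw term frequency
      global weight = log(d / df), where df is the number of documents containing the term
    """
    d = len(counts[0])
    weighted = []
    for row in counts:
        df = sum(1 for c in row if c > 0)
        g = math.log(d / df) if df else 0.0  # global weight of this term
        weighted.append([g * c for c in row])  # local weight is the count c
    return weighted

# Rows are terms, columns are documents (sample values assumed).
counts = [
    [1, 0, 2],
    [1, 1, 0],
    [1, 1, 1],
]
W = weight_matrix(counts)
```

Under this assumed scheme, a term that occurs in every document (the last row) receives a global weight of zero and so contributes nothing to any document vector.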

6 Textual Retrieval through Rocchio’s Algorithm(3/4) A query is represented as a vector $q$ in the same vector space. A document is retrieved when it contains one or more terms of the query vector. The retrieved documents are then sorted according to the cosine similarity between each document vector $a_j$ and the query vector: $\cos\theta_j = \frac{a_j^{T} q}{\|a_j\|_2 \, \|q\|_2}$ for $j = 1, \dots, d$, where the Euclidean vector norm is defined as $\|x\|_2 = \sqrt{x^{T} x}$ for a vector $x$.
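
The retrieval and ranking step on this slide can be sketched as follows. The helper names (`norm`, `cosine`, `rank`) and the sample vectors are assumptions made for illustration.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of numbers."""
    return math.sqrt(sum(v * v for v in x))

def cosine(a, q):
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    na, nq = norm(a), norm(q)
    if na == 0 or nq == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, q)) / (na * nq)

def rank(doc_vectors, query):
    """Keep documents sharing at least one term with the query, sorted by cosine."""
    scored = [
        (j, cosine(a, query))
        for j, a in enumerate(doc_vectors)
        if any(x > 0 and y > 0 for x, y in zip(a, query))
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Each inner list is one document vector over three terms (sample values assumed).
doc_vectors = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
query = [1, 1, 0]
ranking = rank(doc_vectors, query)
```

In this sample, the third document shares no term with the query and is therefore not retrieved at all, while the first document ranks above the second.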

7 Textual Retrieval through Rocchio’s Algorithm(4/4) Rocchio’s algorithm learns a model for every category by combining document vectors into a prototype vector, which may be the sum or the average of the documents that belong to the category. For the average: $\vec{c} = \frac{1}{N_C} \sum_{\vec{d} \in C} \vec{d}$, where $N_C$ is the number of documents that belong to category set $C$, $\vec{d}$ is a document vector, and $\vec{c}$ is the prototype vector.
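
A minimal sketch of the prototype computation, covering both the sum and the average variants mentioned on the slide. The function name `prototype` and the sample vectors are assumptions.

```python
def prototype(doc_vectors, average=True):
    """Combine the document vectors of one category into a prototype vector.

    With average=True the result is the centroid (sum divided by the number
    of documents in the category); with average=False it is the plain sum.
    """
    n = len(doc_vectors)
    summed = [sum(column) for column in zip(*doc_vectors)]
    return [s / n for s in summed] if average else summed

# Two documents belonging to the same category (sample values assumed).
category_docs = [[2, 0, 1], [4, 2, 1]]
proto_avg = prototype(category_docs)
proto_sum = prototype(category_docs, average=False)
```

Both variants point in the same direction, so they give the same cosine similarity to any page; they differ only in length.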

8 Fuzzy Logic(1/3) Let X be a space of objects and x be an element of X. A classical set A is defined as a collection of elements x ∈ X, such that each x either belongs or does not belong to the set A. We can represent a classical set A by a set of ordered pairs (x,0) or (x,1), which indicate that x ∉ A or x ∈ A, respectively.

9 Fuzzy Logic(2/3) A fuzzy set is defined as a set of elements that may belong to the set with a membership degree between 0 and 1. A fuzzy set A in X is defined as a set of ordered pairs A = {(x, μA(x)) : x ∈ X}, where μA(x) is called the membership function (MF) of the fuzzy set A.

10 Fuzzy Logic(3/3) The union and intersection of two fuzzy sets A and B, and the complement of A, are again fuzzy sets, denoted C = A ∪ B (or C = A OR B), C = A ∩ B (or C = A AND B), and ¬A, whose MFs are related to those of A and B by:
μC(x) = max(μA(x), μB(x)) = μA(x) ∨ μB(x) ……(1)
μC(x) = min(μA(x), μB(x)) = μA(x) ∧ μB(x) ……(2)
μ¬A(x) = 1 − μA(x) ……(3)
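
Equations (1), (2), and (3) translate directly into code. The function names `fuzzy_or`, `fuzzy_and`, and `fuzzy_not` and the sample membership degrees are assumptions for illustration.

```python
def fuzzy_or(mu_a, mu_b):
    """Union, eq. (1): the larger of the two membership degrees."""
    return max(mu_a, mu_b)

def fuzzy_and(mu_a, mu_b):
    """Intersection, eq. (2): the smaller of the two membership degrees."""
    return min(mu_a, mu_b)

def fuzzy_not(mu_a):
    """Complement, eq. (3): one minus the membership degree."""
    return 1.0 - mu_a

# Membership degrees of one element x in fuzzy sets A and B (sample values assumed).
mu_a, mu_b = 0.7, 0.4
union = fuzzy_or(mu_a, mu_b)
intersection = fuzzy_and(mu_a, mu_b)
complement = fuzzy_not(mu_a)
```

When the degrees are restricted to 0 and 1, these operators reduce to the ordinary Boolean OR, AND, and NOT.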

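11 Framework(1/2) If $\vec{c}$ is the prototype vector of the topic and $\vec{d}$ is a web page, then the membership function of the topic is defined to be the cosine similarity $\mu_C(\vec{d}) = \frac{\vec{c} \cdot \vec{d}}{\|\vec{c}\|_2 \, \|\vec{d}\|_2}$. We are able to use $\mu_C$ as a membership function because $0 \le \mu_C \le 1$, which holds since both $\vec{c}$ and $\vec{d}$ are non-negative vectors.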
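
The bound on the membership function can be spelled out in two steps. This derivation assumes the membership function is the cosine similarity between the prototype vector, written here as $\vec{c}$, and the page vector, written here as $\vec{d}$, with both vectors nonzero and all components non-negative.

```latex
% Lower bound: every product of non-negative components is non-negative.
\vec{c} \cdot \vec{d} = \sum_{i=1}^{t} c_i \, d_i \;\ge\; 0
\quad\Longrightarrow\quad \mu_C(\vec{d}) \ge 0

% Upper bound: the Cauchy-Schwarz inequality.
\vec{c} \cdot \vec{d} \;\le\; \|\vec{c}\|_2 \, \|\vec{d}\|_2
\quad\Longrightarrow\quad
\mu_C(\vec{d}) = \frac{\vec{c} \cdot \vec{d}}{\|\vec{c}\|_2 \, \|\vec{d}\|_2} \le 1
```

The value 1 is reached only when the page vector is a positive multiple of the prototype, and 0 only when the two share no term with nonzero weight.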

12 Framework(2/2) Once we have related each available topic with its corresponding membership function, we are able to formulate and evaluate logical expressions such as (Topic1 AND Topic2) OR NOT (Topic3) using the fuzzy logic operators of Eqs. (1), (2), and (3).
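
Evaluating the example rule and ordering pages by the result might look like the sketch below. The page names, the membership degrees, and the function name `rule` are hypothetical; in the framework, the degrees would come from each topic's membership function.

```python
# Membership degree of each page in each topic (all values assumed).
pages = {
    "page_a": {"Topic1": 0.8, "Topic2": 0.6, "Topic3": 0.9},
    "page_b": {"Topic1": 0.2, "Topic2": 0.9, "Topic3": 0.3},
    "page_c": {"Topic1": 0.5, "Topic2": 0.5, "Topic3": 0.95},
}

def rule(mu):
    """Evaluate (Topic1 AND Topic2) OR NOT (Topic3) with min, max, and 1 - x."""
    both = min(mu["Topic1"], mu["Topic2"])   # AND, eq. (2)
    not_topic3 = 1.0 - mu["Topic3"]          # NOT, eq. (3)
    return max(both, not_topic3)             # OR, eq. (1)

# Score every page, then order pages from best to worst match.
scores = {name: rule(mu) for name, mu in pages.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sample, `page_b` comes first because its low degree in Topic3 makes the NOT branch dominate, even though it barely belongs to Topic1. Changing the rule changes the ordering without retraining any classifier.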

13 Interface

14 Conclusions This paper has presented a framework that makes it possible to use fuzzy logic in web filtering. Based on this framework, an interface for web filtering has been implemented using the directory structure of the Open Directory Project. Through the interface, the user formulates fuzzy rules over the available categories, and different rules produce different orderings of the retrieved sets.

15 ~The End~