A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation, by Dmitri G. Roussinov and Hsinchun Chen

Presentation transcript:

A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation. Dmitri G. Roussinov, Department of MIS, Karl Eller Graduate School of Management, University of Arizona, McClelland Hall 430, Tucson, Arizona 85721. Hsinchun Chen, Department of MIS, Karl Eller Graduate School of Management, University of Arizona, McClelland Hall 430, Tucson, Arizona 85721. Presenter: 李文雄

Outline
– Abstract
– Introduction
– Textual Classification: Literature Review
– A Neural Network Approach to Automatic Thesaurus Generation
– A Scaleable Self-organizing Map Algorithm for Textual Classification
– Benchmarking Experiments
– Conclusion and Discussion

Abstract This paper presents research in which we sought to develop a scaleable textual classification and categorization system based on Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation.

Introduction One of the major drawbacks of neural network computation, including the SOM algorithm, has been its computational complexity. The computational complexity of the SOM algorithm has rendered it infeasible for large-scale applications (1–10 GB, millions of documents; e.g., the entire collection of searchable Internet WWW homepages).

Textual Classification: Literature Review
– The Serial, Statistical Approach
– The Parallel, Neural Network Approach
– The Self-organizing Map Approach

Textual Classification: Literature Review
– The Self-organizing Map Approach

Self-organizing Map Kohonen based his neural network on the associative neural properties of the brain. This network contains two layers of nodes: an input layer and a mapping (output) layer in the shape of a two-dimensional grid. The input layer acts as a distribution layer. The number of nodes in the input layer is equal to the number of features or attributes associated with the input.

Self-organizing Map Each node of the mapping layer also has the same number of features as there are input nodes. Thus, the input layer and each node of the mapping layer can be represented as a vector with one component per input feature. The network is fully connected in that every mapping node is connected to every input node. The mapping nodes are initialized with random numbers.

Self-organizing Map Each actual input is compared with each node on the mapping grid. The "winning" mapping node is defined as the one with the smallest Euclidean distance between the mapping node vector and the input vector. The input thus maps to a given mapping node. The value of the mapping node vector is then adjusted to reduce the Euclidean distance. In addition, all of the neighboring nodes of the winning node are adjusted proportionally.

Self-organizing Map In this way, the multi-dimensional (in terms of features) input nodes are mapped to a two-dimensional output grid. After all of the input is processed (usually after hundreds or thousands of repeated presentations), the result should be a spatial organization of the input data into clusters of similar (neighboring) regions.

A Neural Network Approach to Automatic Thesaurus Generation
– Literature Review

A Neural Network Approach to Automatic Thesaurus Generation SSOM as a Thesaurus Tool – Kohonen's SOM is very well known as a clustering and dimension reduction tool. – Clustering can be used for categorization of input vectors. Dimension reduction can be used for visualization and for reducing information in order to ease search, storage, or other processing.

A Neural Network Approach to Automatic Thesaurus Generation SSOM as a Thesaurus Tool – In Kohonen's implementation of the SOM for categorizing Internet documents (WEBSOM), there are no automatically created categories. – Users are expected to label the map manually to produce meaningful categories. – We produce a hierarchical taxonomy of the clustered documents as well as the concepts discovered in them.

A Neural Network Approach to Automatic Thesaurus Generation SSOM as a Thesaurus Tool – We create a label for a node by assigning the term that corresponds to the largest coordinate in the representation of the node, called the winning term. Neighboring nodes having the same winning term are merged to produce regions. – Our SSOM algorithm gives a hierarchy of concepts, which is also called a thesaurus, and a hierarchy of documents.

A Neural Network Approach to Automatic Thesaurus Generation SSOM as a Thesaurus Tool – Since SSOM is a neural network technique, an assumption of statistical independence of terms is not required. As a result of self-organization, the vector space becomes quantized. Each quantum (node in the map) is represented by a spectrum of keywords, each with its own weight. We call this combination of terms a concept.
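For instance, a node's concept can be read off directly from its weight vector. A minimal sketch in Python/NumPy, where the helper name and the cutoff k are our own illustrative choices, not the paper's:

```python
import numpy as np

def node_concept(node_weights, vocabulary, k=5):
    """A node's 'concept': its k highest-weighted terms, each with its weight."""
    top = np.argsort(node_weights)[::-1][:k]
    return [(vocabulary[i], float(node_weights[i])) for i in top]
```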

A Scaleable Self-organizing Map Algorithm for Textual Classification The SOM Algorithm for Textual Classification and Time Complexity Intuition Behind the Technique Mathematical Foundation for the SSOM Algorithm – Updating Weights to Nodes – Computing Distance to All Nodes – What Is the Gain?

SOM Algorithm 1. Initialize input nodes, output nodes, and connection weights: – Use the top (most frequently occurring) N terms as the input vector and create a two-dimensional map (grid) of M output nodes (say a 20-by-10 map of 200 nodes). Initialize weights wij from N input nodes to M output nodes to small random values.
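As a concrete illustration of this step, here is a minimal sketch in Python/NumPy; the vocabulary size N, the 20-by-10 grid, and the random range are illustrative assumptions, not values prescribed by the paper:

```python
import numpy as np

N = 1000                 # input nodes: the top N most frequent terms
GRID_W, GRID_H = 20, 10  # a 20-by-10 map...
M = GRID_W * GRID_H      # ...of M = 200 output nodes

rng = np.random.default_rng(0)
# weights[j, i] is the connection weight w_ij from input node i to
# output node j, initialized to small random values
weights = rng.uniform(0.0, 0.1, size=(M, N))
```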

SOM Algorithm 2. Present each document in order: Describe each document as an input vector of N coordinates. Set a coordinate to 1 if the document has the corresponding term and to 0 if there is no such term. Each document is presented to the system several times.
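A sketch of this binary encoding, assuming documents arrive as lists of tokens and that vocabulary holds the top N terms from step 1 (both names are ours):

```python
import numpy as np

def doc_to_vector(doc_terms, vocabulary):
    """Binary input vector: coordinate i is 1 if the document contains
    vocabulary term i, and 0 otherwise."""
    present = set(doc_terms)
    return np.array([1.0 if term in present else 0.0 for term in vocabulary])

# e.g., x = doc_to_vector(["neural", "network", "som"], vocabulary)
```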

SOM Algorithm 3. Compute distance to all nodes: Compute the Euclidean distance d_j between the input vector and each output node j:

d_j = Σ_{i=0}^{N−1} (x_i(t) − w_ij(t))²

where x_i(t) is coordinate i of the input vector presented at time t and w_ij(t) is the weight from input node i to output node j. (The square root can be omitted, since it does not change which node is closest.)
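Continuing the sketch (with the same assumed weights array as in step 1), this is one vectorized expression in NumPy:

```python
import numpy as np

def distances(x, weights):
    # squared Euclidean distance from the input vector x (shape (N,))
    # to every output node's weight vector (weights has shape (M, N))
    diff = weights - x
    return np.sum(diff * diff, axis=1)
```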

SOM Algorithm 4. Select winning node j* and update weights to node j* and its neighbors: Select winning node j*, which produces the minimum d_j. Update weights to node j* and its neighbors to reduce the distances between them and the input vector x_i(t):

w_ij(t+1) = w_ij(t) + η(t) · (x_i(t) − w_ij(t))

for j = j* and each node j in its neighborhood, where η(t) is a gain term (learning rate) that decreases over time.
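A sketch of winner selection and the neighborhood update; the square neighborhood of radius 1 and the fixed gain eta are simplifications (in practice both the gain and the neighborhood radius shrink over time):

```python
import numpy as np

def grid_neighbors(j, grid_w, grid_h, radius=1):
    """All nodes within `radius` grid steps of node j, including j itself."""
    row, col = divmod(j, grid_w)
    return [r * grid_w + c
            for r in range(max(0, row - radius), min(grid_h, row + radius + 1))
            for c in range(max(0, col - radius), min(grid_w, col + radius + 1))]

def train_step(weights, x, grid_w, grid_h, eta=0.5, radius=1):
    d = np.sum((weights - x) ** 2, axis=1)    # step 3: distance to all nodes
    j_star = int(np.argmin(d))                # winner: minimum d_j
    for j in grid_neighbors(j_star, grid_w, grid_h, radius):
        weights[j] += eta * (x - weights[j])  # pull node toward the input
    return j_star
```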

SOM Algorithm 5. Label regions in map: After the network is trained through repeated presentations of all documents, assign a term to each output node by choosing the one corresponding to the largest weight (winning term). Neighboring nodes which contain the same winning terms are merged to form a concept/topic region (group).
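The labeling in step 5 is then an argmax over each node's weight vector; merging neighboring nodes that share a label into regions is a simple connected-components pass, omitted here for brevity:

```python
import numpy as np

def label_nodes(weights, vocabulary):
    # each node's label is its "winning term": the term with the largest weight
    return [vocabulary[i] for i in np.argmax(weights, axis=1)]
```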

Intuition Behind the Technique Our objective is to modify the SOM algorithm so as to be able to compute distances to all nodes (Step 3) and update the weights of nodes (Step 4) in a number of operations proportional to the number of non-zero coordinates in the input vector, denoted here as P. Since we were able to do so, we obtain an algorithm that takes O(PS) time instead of O(NS), which is thousands of times faster on tasks of practical size.
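The paper's own derivation appears in the "Mathematical Foundation" subsections below (not transcribed here); what follows is our sketch of one standard way to reach O(P) cost per node for binary inputs: store each weight vector as a scale factor times a raw vector, and cache its squared norm, so the uniform shrinkage of all untouched coordinates costs O(1).

```python
import numpy as np

class SparseSOMNode:
    """Sketch (not the paper's exact formulation): one output node whose
    weight vector is stored as w = scale * raw, with ||w||^2 cached, so
    that both the distance (step 3) and the update (step 4) for a binary
    input with P non-zero coordinates cost O(P) instead of O(N)."""

    def __init__(self, n_terms, rng):
        self.raw = rng.uniform(0.0, 0.1, size=n_terms)
        self.scale = 1.0
        self.sq_norm = float(self.raw @ self.raw)  # cached ||w||^2

    def dist_sq(self, nz_idx):
        # for binary x with P = len(nz_idx) ones:
        # ||x - w||^2 = P - 2 * sum_{i in nz} w_i + ||w||^2
        dot = self.scale * float(self.raw[nz_idx].sum())
        return len(nz_idx) - 2.0 * dot + self.sq_norm

    def update(self, nz_idx, eta):
        # w <- (1 - eta)*w + eta*x: coordinates where x is zero all shrink
        # by the common factor (1 - eta), absorbed into `scale` in O(1);
        # only the P non-zero coordinates are touched individually
        # (assumes 0 < eta < 1)
        old = self.scale * self.raw[nz_idx]   # current w values there
        new = (1.0 - eta) * old + eta         # updated w values there
        self.sq_norm = (1.0 - eta) ** 2 * self.sq_norm + float(
            np.sum(new ** 2 - ((1.0 - eta) * old) ** 2))
        self.scale *= (1.0 - eta)
        self.raw[nz_idx] = new / self.scale
        # a production version would renormalize occasionally so that
        # `scale` does not underflow after many updates
```

In this arrangement each presentation replaces the N in the naive per-node cost with P, which is the gain the O(PS) versus O(NS) comparison above refers to.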

Mathematical Foundation for the SSOM Algorithm Updating Weights to Nodes Computing Distance to All Nodes What Is the Gain?

Benchmarking Experiments

Conclusion and Discussion
