Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology: A text mining approach on automatic generation of web directories and hierarchies

Presentation transcript:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology
A text mining approach on automatic generation of web directories and hierarchies
Advisor: Dr. Hsu
Reporter: Chun Kai Chen
Authors: Hsin-Chang Yang and Chung-Hong Lee
Expert Systems with Applications

Outline
• Motivation
• Objective
• The text mining process
• Automatic generation of web directories
• Experimental Results
• Summary

Motivation
• The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts.

Objective
• In this work, we provide a corpus-based method that applies a text mining technique to a corpus of web pages to automatically create web directories and organize them into hierarchies.

[Figure: overview diagram. Panel "The text mining process": web pages, document content extraction, SOM (DCM), SOM (WCM). Panel "Automatic generation of web directories": generation of directory hierarchies (two-level hierarchy, S_i to S_{i+1}) and generation of directories, yielding web directories. Diagram labels: S1, S2, S3.]

Automatic generation of web directories
• Generation of directory hierarchies
  ─ the super-cluster generation process algorithm
• Generation of directories
  ─ identify cluster themes by examining the WCM
  ─ select the word that is the most important to a super-cluster
[Figure labels: DCM, WCM, stop criteria]

Experimental Results
• The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.

Introduction (1/3)
• Finding information is a serious problem on the web, since most users find it hard to obtain the information they need using current information retrieval strategies.
• Two kinds of strategies are now adopted by the web communities, namely searching and browsing.

Introduction (2/3)
• Since the link structures may be considered static during browsing
  ─ the selection of starting pages plays the most important role when a user tries to find his goal in minimum time
• Therefore, many commercial or academic web sites actively collect web pages and sort them into web directories
  ─ to provide users with starting points in the browsing process

Introduction (3/3)
• Most existing web directories were created manually by human specialists.
  ─ Yahoo!
• The limitation of manual creation is mainly caused by the gigantic number of web pages already produced and still being produced.

Related work
• Category hierarchy
  ─ predefined category hierarchy (Yahoo!)
  ─ automatically developing category hierarchy
• Topic identification
  ─ mutually related text excerpts
• Self-organizing map algorithm

The text mining process (1/2)
• The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies.
[Figure: the text mining process. Labels: web pages, document content extraction, SOM (DCM), SOM (WCM).]

The text mining process (2/2)
• Labeling process
  ─ Each document is associated with a neuron in the map. We record these associations to form the document cluster map (DCM).
  ─ In the DCM, each neuron is labeled by a list of documents that are considered similar and belong to the same cluster.
  ─ In the same manner, we label each word to some neuron in the map to form the word cluster map (WCM).
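The labeling step can be illustrated with a minimal sketch. It assumes an already trained map given as a list of neuron weight vectors; the function names, the toy data, and the rule for words (word k goes to the neuron with the largest k-th weight component) are assumptions made for illustration and are not taken from the slides.

```python
import math
from collections import defaultdict


def nearest_neuron(vector, weights):
    """Index of the neuron whose weight vector is closest (Euclidean distance)."""
    return min(range(len(weights)), key=lambda j: math.dist(vector, weights[j]))


def build_dcm(doc_vectors, weights):
    """Document cluster map: neuron index -> list of document indices."""
    dcm = defaultdict(list)
    for d, vec in enumerate(doc_vectors):
        dcm[nearest_neuron(vec, weights)].append(d)
    return dict(dcm)


def build_wcm(vocabulary, weights):
    """Word cluster map: neuron index -> list of words.

    Assumption: word k is labeled to the neuron with the largest k-th
    weight component.
    """
    wcm = defaultdict(list)
    for k, word in enumerate(vocabulary):
        best = max(range(len(weights)), key=lambda j: weights[j][k])
        wcm[best].append(word)
    return dict(wcm)


# Hypothetical toy data: 3 words, 2 neurons, 3 documents.
vocabulary = ["som", "web", "sport"]
weights = [[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]]
doc_vectors = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]

dcm = build_dcm(doc_vectors, weights)
wcm = build_wcm(vocabulary, weights)
print(dcm)  # documents 0 and 2 share neuron 0; document 1 sits on neuron 1
print(wcm)
```

Both maps are plain dictionaries keyed by neuron index, so documents and words that land on the same neuron can be looked up together.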

Generation of directory hierarchies (1/3)
• The two-level hierarchy generation process
  ─ the parent node is the constructed super-cluster
  ─ the child nodes are the clusters that compose the super-cluster
  ─ the process can be further applied to every super-cluster to establish the next level of the hierarchy
• The overall hierarchy
  ─ is built by applying this top-down approach iteratively
  ─ until a stop criterion is satisfied

Generation of directory hierarchies (2/3)
• To form a super-cluster
  ─ the distance between two clusters (distance between their coordinates on the two-dimensional map)
  ─ the dissimilarity between two clusters (based on the similarity of their neuron vectors)
  ─ the supporting cluster similarity: we can determine the significance of a cluster by examining the overall similarity that is contributed by its neighboring clusters
    doc(i): the number of documents associated with neuron i
    B_i: the index set of the neurons neighboring neuron i
    F: a monotonically increasing function
• The dominating clusters
  ─ have locally maximal supporting cluster similarity
  ─ each is the centroid of a super-cluster, which contains several child clusters
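The idea of a dominating cluster can be sketched as follows. The transcript does not reproduce the formula for the supporting cluster similarity, so the score used here, doc(i) times the sum over neighbors j in B_i of doc(j) / F(d(i, j)), is a placeholder assumption and not the authors' definition. It uses only the map distance d(i, j) and ignores the dissimilarity of neuron vectors. The choice F(x) = 1 + x, the neighborhood radius, and all names are likewise hypothetical.

```python
import math


def neighbors(i, coords, radius=1.0):
    """B_i: indices of neurons within `radius` of neuron i on the map."""
    return [j for j in range(len(coords))
            if j != i and math.dist(coords[i], coords[j]) <= radius]


def supporting_similarity(i, coords, doc_count, F=lambda x: 1.0 + x, radius=1.0):
    """Placeholder score: doc(i) * sum_j doc(j) / F(d(i, j)) over j in B_i.

    F must be monotonically increasing, so nearer neighbors contribute more.
    """
    total = sum(doc_count[j] / F(math.dist(coords[i], coords[j]))
                for j in neighbors(i, coords, radius))
    return doc_count[i] * total


def dominating_clusters(coords, doc_count, radius=1.0):
    """Neurons whose score is strictly larger than that of every neighbor."""
    scores = [supporting_similarity(i, coords, doc_count, radius=radius)
              for i in range(len(coords))]
    return [i for i in range(len(coords))
            if all(scores[i] > scores[j] for j in neighbors(i, coords, radius))]


# Hypothetical toy map: six neurons in a row, with document counts doc(i).
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
doc_count = [1, 5, 1, 1, 4, 1]

print(dominating_clusters(coords, doc_count))
```

In this toy map the two neurons with many documents become local maxima, and their low-count neighbors would become child clusters of the corresponding super-clusters.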

Generation of directory hierarchies (3/3)
• In Step 3 of the super-cluster generation process algorithm we set three stop criteria.
  ─ The first criterion stops finding super-clusters if there is no neuron left for selection.
  ─ The second criterion limits the number of dominating clusters, to constrain the breadth of hierarchies.
  ─ The third criterion constrains the depth of a hierarchy.
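The three stop criteria can be read as guards around a selection loop. The sketch below is a greedy loop written under stated assumptions: the steps of the algorithm are not listed in the transcript, so picking the highest-scoring remaining neuron and then discarding its neighbors is an assumed procedure, and the function name, parameters, and toy scores are hypothetical.

```python
def find_super_cluster_centres(scores, neighbors, max_clusters, depth, max_depth):
    """Greedily pick dominating clusters subject to three stop criteria.

    scores:    neuron index -> supporting cluster similarity (precomputed)
    neighbors: neuron index -> list of neighboring neuron indices
    """
    if depth > max_depth:            # criterion 3: limit the depth of the hierarchy
        return []
    remaining = set(scores)
    chosen = []
    # criterion 1: stop when no neuron is left for selection
    # criterion 2: stop when the number of dominating clusters hits the limit
    while remaining and len(chosen) < max_clusters:
        best = max(remaining, key=lambda n: (scores[n], -n))
        chosen.append(best)
        remaining.discard(best)
        remaining -= set(neighbors[best])  # assumed: neighbors cannot dominate
    return chosen


# Hypothetical toy input: six neurons in a row.
scores = {0: 2.5, 1: 5.0, 2: 3.0, 3: 2.5, 4: 4.0, 5: 2.0}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(find_super_cluster_centres(scores, neighbors, max_clusters=3, depth=1, max_depth=2))
```

With `max_clusters=3` the loop ends because no neuron is left; lowering `max_clusters` to 1 ends it through the breadth limit instead, and a `depth` above `max_depth` returns an empty list.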

[Figure: diagram labeled S1, S2, S3]

Generation of directories
• In this work, we try to identify cluster themes, i.e. directory labels, by examining the WCM.
  ─ select the word that is the most important to a super-cluster
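One way to picture the label selection is the sketch below. The transcript does not say how importance is measured, so summing a word's weight components over the member neurons of a super-cluster is an assumption, and the function name and toy data are hypothetical.

```python
def label_super_cluster(members, wcm, weights, vocabulary):
    """Pick a directory label for a super-cluster from the WCM.

    members:    neuron indices belonging to the super-cluster
    wcm:        neuron index -> list of words labeled to that neuron
    weights:    neuron weight vectors, one component per vocabulary word
    Assumption: importance = sum of the word's weight over member neurons.
    """
    index = {word: k for k, word in enumerate(vocabulary)}
    candidates = [word for n in members for word in wcm.get(n, [])]
    if not candidates:
        return None
    return max(candidates,
               key=lambda w: sum(weights[n][index[w]] for n in members))


# Hypothetical toy data: 3 words, 3 neurons.
vocabulary = ["som", "web", "sport"]
weights = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.2], [0.1, 0.2, 0.9]]
wcm = {0: ["som"], 1: ["web"], 2: ["sport"]}

print(label_super_cluster([0, 1], wcm, weights, vocabulary))
print(label_super_cluster([2], wcm, weights, vocabulary))
```

Only words already labeled to a member neuron are candidates, so the label always comes from the word clusters the super-cluster contains.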

Summary
• In this paper, we present a method to automatically generate
  ─ web directory hierarchies and identify directory labels.
• Experiments show that our method could
  ─ successfully cluster the documents into directories,
  ─ reveal the hierarchical structure among these directories,
  ─ and assign a label to each directory.
• However, a fully automatic process may not provide the best solutions for tasks that depend so heavily on human judgment.
• Thus, in our opinion, a semi-automatic process that uses the proposed method as a preprocessing stage should plausibly meet the general requirements.

Personal Opinion
• Application
  ─ text categorization, thesaurus construction, ontology learning, multilingual information retrieval
• Advantage
  ─ a fully automatic process, which can create web directory hierarchies without human intervention
• Disadvantage
  ─ may not provide the best solutions