Intelligent Database Systems Lab Presenter : YAN-SHOU SIE Authors : JEROEN DE KNIJFF, FLAVIUS FRASINCAR, FREDERIK HOGENBOOM 2012. DKE Data & Knowledge.

Presentation transcript:

Intelligent Database Systems Lab Presenter : YAN-SHOU SIE Authors : JEROEN DE KNIJFF, FLAVIUS FRASINCAR, FREDERIK HOGENBOOM DKE Data & Knowledge Engineering

Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments

Motivation: In the past, data were stored physically rather than digitally and were often structured by hand so that the desired information could be found easily. Today, data are mostly stored digitally and are usually unstructured, as in documents. Structuring documents manually is time-consuming.

Objectives: The cost of manual structuring makes it interesting to investigate ways to organize documents automatically. This could be done by automatically generating a concept taxonomy from a document corpus. In our current work, we present a framework for automatically constructing a domain taxonomy from text corpora. We call this framework Automatic Domain Taxonomy Construction from Text (ADTCT).

Methodology: ADTCT Framework

Methodology: ADTCT Framework
– Term extraction: use a part-of-speech parser
– Term filtering:
  – domain pertinence (DP)
  – lexical cohesion (LC)

Methodology: ADTCT Framework
– Term filtering (continued):
  – domain consensus (DC), defined in terms of the normalized frequency norm_freq
  – final domain score
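The formulas for these filter scores are not preserved in the transcript. The sketch below is one plausible reading, assuming the definitions commonly used in the term-extraction literature: domain pertinence as the term's frequency in the target domain relative to its highest frequency in any domain, and domain consensus as the entropy of the term's normalized frequency across the target domain's documents. The function names, the equal default weights, and the linear combination are all assumptions for illustration, not the authors' stated method.

```python
import math

def domain_pertinence(term, corpora, target):
    """Assumed DP: frequency in the target domain divided by the
    highest frequency of the term in any domain (target included)."""
    freqs = {name: sum(doc.count(term) for doc in docs)
             for name, docs in corpora.items()}
    top = max(freqs.values())
    return freqs[target] / top if top else 0.0

def domain_consensus(term, docs):
    """Assumed DC: entropy of norm_freq, the term's share of occurrences
    per document. Evenly spread terms score higher."""
    counts = [doc.count(term) for doc in docs]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c)

def domain_score(dp, dc, lc, w_dp=1/3, w_dc=1/3, w_lc=1/3):
    """Assumed final score: weighted sum of the three filters.
    The weights are hypothetical placeholders."""
    return w_dp * dp + w_dc * dc + w_lc * lc

# Hypothetical toy corpora: each document is a list of tokens.
corpora = {
    "economics": [["market", "price", "market"], ["market", "trade"]],
    "sports": [["match", "market"]],
}
dp = domain_pertinence("market", corpora, "economics")
dc = domain_consensus("market", corpora["economics"])
```

Lexical cohesion is passed in as a plain number here because it applies to multi-word terms and its formula is likewise not shown on the slide.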

Methodology: ADTCT Framework
– Concept hierarchy creation:
  – subsumption method
  – hierarchical clustering algorithm

Methodology: ADTCT Framework
– Subsumption method:
  – Concept x potentially subsumes concept y if a co-occurrence condition holds (given as a formula on the slide).
  – A score is calculated for each potential parent.

Methodology: subsumption example
– P(p|x) explained for x = ‘Technology adaptation’
– Potential parents p: Technology, Technological, Adaptation
– Threshold: t = 0.6
– Values are shown on the slide for ‘Technological’ and ‘Technology adaptation’.
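The subsumption condition itself is not preserved in the transcript. A minimal sketch follows, assuming the usual co-occurrence form: x potentially subsumes y if P(x|y) >= t and P(y|x) < t, with probabilities estimated from document co-occurrence. The document identifiers are hypothetical, and the parent scoring step mentioned on the slide is not reproduced.

```python
def cond_prob(x, y, doc_sets):
    """P(x|y): share of the documents containing y that also contain x."""
    docs_y = doc_sets[y]
    return len(doc_sets[x] & docs_y) / len(docs_y) if docs_y else 0.0

def potential_parents(y, doc_sets, t):
    """Concepts x that potentially subsume y under the assumed condition
    P(x|y) >= t and P(y|x) < t."""
    return [x for x in doc_sets
            if x != y
            and cond_prob(x, y, doc_sets) >= t
            and cond_prob(y, x, doc_sets) < t]

# Hypothetical document sets: concept -> ids of documents mentioning it.
doc_sets = {
    "technology": set(range(1, 11)),
    "adaptation": {1, 2, 11, 12},
    "technology adaptation": {1, 2},
}
parents = potential_parents("technology adaptation", doc_sets, t=0.6)
```

With these toy sets, the broad concept appears wherever the narrow one does but not the other way round, which is the asymmetry the condition is meant to capture.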

Methodology: ADTCT Framework
– Hierarchical clustering method
– Algorithm:
  1. Start with n clusters (each term is a cluster).
  2. Compute the distances between clusters.
  3. Merge the two nearest clusters into one cluster. Return to step 2 if more than one cluster remains; otherwise, the algorithm has finished.
– Distance measures:
  – document co-occurrence similarity
  – window-based similarity
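The three steps above can be sketched as a short agglomerative loop. This is an illustrative version, not the authors' implementation: the `distance` callback, the `linkage` parameter, and the toy distances are assumptions. Passing `min` gives single linkage, `max` complete linkage, and `statistics.mean` average linkage.

```python
from itertools import combinations

def agglomerative(terms, distance, linkage=min):
    """Merge the two nearest clusters until one remains.
    Returns the merges in the order they happened."""
    clusters = [frozenset([t]) for t in terms]  # step 1: one cluster per term
    merges = []

    def cluster_distance(pair):
        a, b = pair
        # step 2: aggregate the pairwise term distances between two clusters
        return linkage(distance(x, y) for x in a for y in b)

    while len(clusters) > 1:
        # step 3: find and merge the two nearest clusters
        a, b = min(combinations(clusters, 2), key=cluster_distance)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return merges

# Hypothetical symmetric distances between three terms.
toy = {frozenset("ab"): 0.1, frozenset("ac"): 0.9, frozenset("bc"): 0.8}
merges = agglomerative(["a", "b", "c"], lambda x, y: toy[frozenset((x, y))])
```

Recomputing every pairwise distance on each pass keeps the sketch short; a practical implementation would cache or update a distance matrix instead.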

Methodology: ADTCT Framework
– Document co-occurrence similarity
– Window-based similarity
  – Suppose that we have a document with four concepts: ‘Ad’, ‘Bert’, ‘Cees’, and ‘Dirk’. If the window size is 2, the following windows are created for this document: {Ad}, {Ad, Bert}, {Bert, Cees}, {Cees, Dirk}, and {Dirk}.

Methodology: Implementation
– Hierarchical clustering algorithm, example:
  – ‘System’ appears in documents {1,3,6,8} and windows {1,5,10,14,18,20,28}.
  – ‘Process’ appears in documents {1,3,6,12} and windows {1,5,12,14,18,25,30}.
– The window similarity and the document similarity are computed, and the similarities are converted to distances.
– Min, Max, Avg: Avg = 0.15
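The similarity formulas are not preserved in the transcript, so the sketch below substitutes a Jaccard overlap and a `1 - similarity` conversion purely for illustration. Both choices are assumptions, and the code does not attempt to reproduce the values shown on the slide. Only the document and window sets come from the example.

```python
def jaccard(a, b):
    """Assumed similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def to_distance(similarity):
    """Assumed conversion from a similarity in [0, 1] to a distance."""
    return 1.0 - similarity

system = {"documents": {1, 3, 6, 8},
          "windows": {1, 5, 10, 14, 18, 20, 28}}
process = {"documents": {1, 3, 6, 12},
           "windows": {1, 5, 12, 14, 18, 25, 30}}

doc_distance = to_distance(jaccard(system["documents"], process["documents"]))
window_distance = to_distance(jaccard(system["windows"], process["windows"]))
```

Any similarity bounded by 0 and 1 could be swapped in for `jaccard` without changing the clustering loop that consumes the distances.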

Methodology: ADTCT Implementation

Experiments: Experimental setup
– lexical precision
– common semantic cotopy
– local taxonomic precision
– taxonomic precision and recall
– taxonomic F-measure (TF)
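The metric formulas are missing from the transcript. As a rough sketch of two of them, assuming the standard definitions from taxonomy evaluation: lexical precision as the share of learned concepts that also occur in the reference taxonomy, and the taxonomic F-measure as the harmonic mean of taxonomic precision and recall. The function names and the toy concept sets are hypothetical.

```python
def lexical_precision(learned, reference):
    """Assumed definition: fraction of learned concepts that also
    appear in the reference taxonomy."""
    return len(learned & reference) / len(learned) if learned else 0.0

def taxonomic_f_measure(tp, tr):
    """Harmonic mean of taxonomic precision (tp) and recall (tr)."""
    return 2 * tp * tr / (tp + tr) if tp + tr else 0.0

# Hypothetical concept sets for illustration only.
learned = {"market", "price", "firm", "goal"}
reference = {"market", "price", "firm", "strategy"}
lp = lexical_precision(learned, reference)
```

The cotopy-based measures compare the ancestors and descendants of each shared concept in both taxonomies and need the hierarchy itself, so they are left out of this sketch.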

Experiments: Experimental results

Experiments
– Making the trade-off decision mathematically
  – Suppose the minimal average depth is 3 and the minimal quality is 0.60. The thresholds t = 0.20, t = 0.25, and t = 0.30 satisfy these constraints.
  – Weights: γ = 0.40 and λ = 0.60
  – Values are shown for t = 0.20, t = 0.25, and t = 0.30.
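The slide does not preserve how γ and λ enter the decision. One plausible reading, sketched below, filters the thresholds by the two minimum constraints and then ranks the survivors by a weighted sum. It is an assumption that γ weights depth and λ weights quality, and that depth is normalized by the deepest feasible taxonomy. The depth and quality numbers are hypothetical, not results from the paper.

```python
def choose_threshold(candidates, min_depth, min_quality,
                     depth_weight, quality_weight):
    """Keep thresholds meeting both minimum constraints, then pick the
    one with the highest weighted sum of normalized depth and quality."""
    feasible = {t: (depth, quality)
                for t, (depth, quality) in candidates.items()
                if depth >= min_depth and quality >= min_quality}
    if not feasible:
        raise ValueError("no threshold satisfies the constraints")
    max_depth = max(depth for depth, _ in feasible.values())

    def score(t):
        depth, quality = feasible[t]
        return depth_weight * depth / max_depth + quality_weight * quality

    return max(feasible, key=score)

# Hypothetical (average depth, quality) per threshold t.
candidates = {
    0.20: (4.0, 0.62),
    0.25: (3.5, 0.66),
    0.30: (3.0, 0.70),
    0.35: (2.5, 0.75),
}
best = choose_threshold(candidates, min_depth=3, min_quality=0.60,
                        depth_weight=0.40, quality_weight=0.60)
```

Swapping the two weights, or normalizing depth differently, can change which threshold wins, so the result depends entirely on the assumed reading.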

Conclusions: Our evaluation in the field of management and economics indicates that a trade-off between taxonomy quality and depth must be made when choosing one of these methods. The subsumption method is preferable for shallow taxonomies, whereas the hierarchical clustering algorithm is recommended for deep taxonomies.

Comments
– Advantages: automatically creates taxonomies that approach the quality of manually created taxonomies while saving time.
– Applications: clustering, classification, etc.