Intelligent Database Systems Lab. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Satoshi Oyama, Takashi Kokubo, Toru Ishida. 國立雲林科技大學 National Yunlin.

Presentation transcript:

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Satoshi Oyama, Takashi Kokubo, Toru Ishida. Domain-Specific Web Search with Keyword Spices. IEEE Transactions on Knowledge and Data Engineering, Jan. 2004 (IEEE JNL).

Outline: Motivation; Objective; Introduction; Domain-specific web search with keyword spices; Algorithm for extracting keyword spices; Experiments; Conclusions; Opinion. N.Y.U.S.T. I.M.

Motivation: Naïve queries may find many irrelevant pages. Obtaining more relevant pages depends on considerable experience and skill. Previous domain-specific search engines collect and index relevant pages in manually constructed collections, which raises cost and scalability problems.

Objective: Domain-specific search engines should return pages relevant to a certain domain and filter out irrelevant web pages.

1-1. Introduction: Domain-specific web search engines. Example: looking for a recipe. Entering only "beef" finds few recipes; entering "beef pepper" finds other recipes.

Example queries: 牛肉 (beef); 牛肉、胡椒 (beef, pepper).

1-2. Introduction (no text on slide)

1-3. Introduction (no text on slide)

1-4. Introduction: Domain-specific search engines return pages relevant to a certain domain and filter out irrelevant web pages. The approach is to download both relevant and irrelevant pages, classify them, and use a decision tree.

2-1. Domain-specific web search with keyword spices: Domain-specific web search is treated as a text classification problem. Sample web pages are collected according to an assumption about the user's input.

2-1. Domain-specific web search as text classification: D is the set of all web documents. Dt is the set of documents relevant to a certain domain.

2-1. Domain-specific web search as text classification: Take the set of all keywords in the domain, and let the hypothesis space be composed of all Boolean expressions over those keywords. Each keyword is regarded as a Boolean variable: 1 if the keyword is contained in the document, 0 otherwise. A Boolean expression of keywords can then be regarded as a function from D to {0, 1}. Diagram labels: words in domain-specific pages; output.
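As a minimal sketch of the idea on this slide, a Boolean expression of keywords can be written as a function that maps a document to 0 or 1. The representation (a list of conjunctions, each a list of keyword/presence pairs), the names `keyword_value` and `evaluate`, and the example hypothesis are all assumptions for illustration and are not taken from the paper.

```python
import re
from typing import List, Tuple

# A literal is (keyword, required_presence); a hypothesis is a list of
# conjunctions in disjunctive normal form. These type names are illustrative.
Literal = Tuple[str, bool]
Hypothesis = List[List[Literal]]


def keyword_value(document: str, keyword: str) -> int:
    """Boolean variable for a keyword: 1 if it occurs in the document, else 0."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return 1 if keyword in words else 0


def evaluate(h: Hypothesis, document: str) -> int:
    """Apply the Boolean expression h to a document, mapping it to 0 or 1."""
    return int(any(
        all(keyword_value(document, k) == int(present) for k, present in conj)
        for conj in h
    ))


# Hypothetical expression: (recipe AND NOT home) OR tablespoon
h = [[("recipe", True), ("home", False)], [("tablespoon", True)]]
print(evaluate(h, "Beef stew recipe with two cups of stock"))    # 1
print(evaluate(h, "Welcome to my home page about beef cattle"))  # 0
```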

2-1. Domain-specific web search as text classification: The goal is to find the hypothesis h that minimizes the error rate.
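The error-rate formula itself did not survive in the transcript. One common reading, assumed here, is the fraction of labeled sample documents on which h disagrees with the human label. The function name `error_rate`, the toy hypothesis, and the sample documents below are hypothetical.

```python
from typing import Callable, List, Tuple


def error_rate(h: Callable[[str], int], labeled: List[Tuple[str, int]]) -> float:
    """Fraction of labeled documents that h classifies incorrectly."""
    mistakes = sum(1 for doc, label in labeled if h(doc) != label)
    return mistakes / len(labeled)


# Toy hypothesis (an assumption): a page is relevant if it contains "recipe".
def contains_recipe(doc: str) -> int:
    return int("recipe" in doc.lower().split())


samples = [
    ("beef stew recipe", 1),
    ("beef cattle prices", 0),
    ("grilled beef with pepper", 1),       # relevant, but lacks the keyword
    ("recipe for success in business", 0), # has the keyword, but irrelevant
]
print(error_rate(contains_recipe, samples))  # 0.5
```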

2-2. Collecting sample web pages by user's input: Random sampling is difficult. Instead, assume that all candidate keywords have the same probability of occurrence. In the recipe domain, input "beef", "salmon (鮭魚)", "potato", etc. as sample keywords and download the same number of web pages for each keyword.

2-2. Collecting sample web pages by user's input (no text on slide)
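The collection step can be sketched as a loop that takes the same number of result pages for every sample keyword. The `search` callable stands in for whatever search engine interface is used. It, the stub `fake_search`, and the example URLs are hypothetical and only make the sketch runnable.

```python
from typing import Callable, Dict, List


def collect_samples(
    keywords: List[str],
    search: Callable[[str], List[str]],
    pages_per_keyword: int,
) -> Dict[str, List[str]]:
    """Collect an equal number of result pages for each sample keyword."""
    return {kw: search(kw)[:pages_per_keyword] for kw in keywords}


# Stub search function (an assumption): returns ten made-up URLs per query.
def fake_search(query: str) -> List[str]:
    return [f"http://example.invalid/{query}/{i}" for i in range(10)]


samples = collect_samples(["beef", "salmon", "potato"], fake_search, 3)
for keyword, pages in samples.items():
    print(keyword, len(pages))
```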

3-1. Identifying keyword spices: Sample pages are classified by hand into two classes, T or F. A decision tree learning algorithm is then used to discover keyword spices. In the tree, each node is an attribute (a keyword), each branch indicates the value of the attribute, and each leaf is a class. Example path: no "tablespoon", has "recipe", no "home", no "top" gives class T.

3-1. Extracting keyword spices. Diagram labels: web pages collected by the user's input keyword; classified by humans; words in domain-specific pages; output.

3-1. Identifying keyword spices (no text on slide)
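A decision tree over keyword presence can be read as a Boolean expression: every path that ends in a T leaf becomes one conjunction, and the conjunctions are joined by OR. The sketch below assumes a hand-written tree encoded as nested tuples. Only the path "no tablespoon, has recipe, no home, no top gives T" comes from the slide; the other branches and the name `tree_to_dnf` are made up for illustration.

```python
from typing import List, Tuple, Union

# A tree is either a leaf ("T" or "F") or a tuple
# (keyword, subtree_if_absent, subtree_if_present). Illustrative encoding.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Literal = Tuple[str, bool]


def tree_to_dnf(tree: Tree, path: Tuple[Literal, ...] = ()) -> List[List[Literal]]:
    """Return one conjunction of literals for every path ending in a T leaf."""
    if isinstance(tree, str):
        return [list(path)] if tree == "T" else []
    keyword, absent, present = tree
    return (
        tree_to_dnf(absent, path + ((keyword, False),))
        + tree_to_dnf(present, path + ((keyword, True),))
    )


# Hypothetical tree containing the example path from the slide.
tree = (
    "tablespoon",
    ("recipe", "F", ("home", ("top", "T", "F"), "F")),
    "T",
)
for conjunction in tree_to_dnf(tree):
    print(conjunction)
```

The result is a disjunctive normal form that can then be simplified and appended to a user's query.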

3-2. Simplifying keyword spices: The induced decision trees are very large. Search engines cannot accept queries that are too complex, and large trees also suffer from the overfitting problem.

3-2. Simplifying keyword spices: Simplify the induced Boolean expression h in two steps. 1. For each conjunction c in h, remove keywords (Boolean literals) from c to simplify it. 2. Remove conjunctions from the disjunctive normal form h to simplify it.
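The two steps can be sketched as a greedy pruning loop. The acceptance rule used here (drop a literal or a conjunction whenever the F-measure on a validation set does not decrease) is an assumption, since the transcript does not state the exact criterion. The names `matches`, `f_measure`, and `simplify`, and the toy validation data, are illustrative only.

```python
from typing import FrozenSet, List, Tuple

Literal = Tuple[str, bool]
Hypothesis = List[List[Literal]]
Labeled = List[Tuple[FrozenSet[str], bool]]  # (words in page, is relevant)


def matches(h: Hypothesis, doc: FrozenSet[str]) -> bool:
    return any(all((k in doc) == present for k, present in c) for c in h)


def f_measure(h: Hypothesis, validation: Labeled) -> float:
    tp = sum(1 for d, y in validation if y and matches(h, d))
    fp = sum(1 for d, y in validation if not y and matches(h, d))
    fn = sum(1 for d, y in validation if y and not matches(h, d))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)


def simplify(h: Hypothesis, validation: Labeled) -> Hypothesis:
    h = [list(c) for c in h]
    # Step 1: drop literals from each conjunction while F does not decrease.
    for i in range(len(h)):
        improved = True
        while improved and len(h[i]) > 1:
            improved = False
            base = f_measure(h, validation)
            for lit in list(h[i]):
                trial = h[:i] + [[l for l in h[i] if l != lit]] + h[i + 1:]
                if f_measure(trial, validation) >= base:
                    h, improved = trial, True
                    break
    # Step 2: drop whole conjunctions while F does not decrease.
    improved = True
    while improved and len(h) > 1:
        improved = False
        base = f_measure(h, validation)
        for i in range(len(h)):
            trial = h[:i] + h[i + 1:]
            if f_measure(trial, validation) >= base:
                h, improved = trial, True
                break
    return h


validation = [
    (frozenset({"beef", "recipe", "tablespoon"}), True),
    (frozenset({"salmon", "recipe", "ingredients"}), True),
    (frozenset({"beef", "home", "recipe", "cattle"}), False),
    (frozenset({"potato", "farm", "top"}), False),
    (frozenset({"beef", "cattle", "prices"}), False),
]
h = [
    [("tablespoon", False), ("recipe", True), ("home", False), ("top", False)],
    [("tablespoon", True)],
]
print(simplify(h, validation))  # [[('recipe', True), ('home', False)]]
```

On this toy data the expression shrinks from five literals in two conjunctions to two literals in one, without lowering F on the validation pages.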

3-2. Simplifying keyword spices: Precision P and recall R are defined over the validation set. F is the harmonic mean of P and R.
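The formulas on this slide are missing from the transcript. The standard definitions, which are assumed here to be the ones intended, can be written as follows. The symbol V for the validation set is introduced only for this sketch; h, D and Dt follow the earlier slides.

```latex
P = \frac{|\{d \in V : h(d) = 1 \wedge d \in D_t\}|}{|\{d \in V : h(d) = 1\}|},
\qquad
R = \frac{|\{d \in V : h(d) = 1 \wedge d \in D_t\}|}{|\{d \in V : d \in D_t\}|},
\qquad
F = \frac{2PR}{P + R}
```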

3-2. Simplifying keyword spices: Slide notes: greater contribution to F; F as a weighted harmonic mean of P and R.
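The transcript does not show which weighting the slide used. A widely used form of the weighted harmonic mean of precision and recall is the F-beta measure, sketched below as an assumption. The function name `f_beta` and the example values are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 gives more weight to recall, beta < 1 to precision,
    and beta = 1 reduces to the ordinary harmonic mean.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(0.5, 1.0, beta), 4))
```

With precision 0.5 and recall 1.0, the score moves toward the recall value as beta grows, which is one way to tune the trade-off between the two.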

4. Experiments (no text on slide)

4-1. Experiments: extracting keyword spices (no text on slide)

4-1. Experiments: extracting keyword spices (no text on slide)

4-1. Extracting keyword spices: Sample pages in the recipe domain were split randomly.

4-1. Extracting keyword spices: Keyword spices discovered for a recipe search engine.

4-1. Extracting keyword spices: Trade-off between precision and recall.

4-1. Extracting keyword spices: When …, keyword spices extracted for the domain of …

4-2. Evaluation using a general-purpose search engine (no text on slide)

4-2. Evaluation using a general-purpose search engine: Test queries in each domain.

4-2. Evaluation using a general-purpose search engine (no text on slide)

4-2. Evaluation using a general-purpose search engine: Precision values of the sample queries conjoined with the keyword "recipe". A query conjoined only with "recipe", for example "beef recipe", finds fewer relevant pages than the same query with the keyword spice.
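Conjoining a user's query with a keyword spice can be sketched as simple string composition. The `AND`/`OR`/`NOT` syntax is an assumption about the target search engine, and the spice shown is a hypothetical example, not the one reported in the paper.

```python
from typing import List, Tuple

Literal = Tuple[str, bool]


def spice_to_text(spice: List[List[Literal]]) -> str:
    """Render a DNF keyword spice as a Boolean query fragment."""
    def literal(keyword: str, present: bool) -> str:
        return keyword if present else f"NOT {keyword}"

    return " OR ".join(
        "(" + " AND ".join(literal(k, p) for k, p in conj) + ")"
        for conj in spice
    )


def build_query(user_query: str, spice: List[List[Literal]]) -> str:
    """Conjoin the user's keyword with the domain's keyword spice."""
    return f"{user_query} AND ({spice_to_text(spice)})"


# Hypothetical spice for the recipe domain.
spice = [[("recipe", True), ("home", False)], [("tablespoon", True)]]
print(build_query("beef", spice))
# beef AND ((recipe AND NOT home) OR (tablespoon))
```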

4-3. Comparison to the filtering model (no text on slide)

4-3. Comparison to the filtering model: Precision values of the sample queries in the filtering model.

4-3. Comparison to the filtering model: Numbers of relevant pages returned by the …

4-3. Comparison to the filtering model: For example, for "shrimp" the filtering model must download 5 pages to obtain one result, and so is quite inefficient.
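The inefficiency follows from a simple relation: if only a fraction p of the downloaded pages pass the filter, about 1/p pages must be downloaded for each result. The sketch below assumes that 5 downloads per result corresponds to p = 0.2; the function name is hypothetical.

```python
def downloads_per_result(precision: float) -> float:
    """Expected number of pages downloaded per relevant page kept."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return 1.0 / precision


# One relevant page per 5 downloads corresponds to a precision of 0.2.
print(downloads_per_result(0.2))
```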

5. Future work: Training examples are classified by humans, which is costly.

5. Future work: 1. Using a web directory as a source of training examples. Web directories such as Yahoo and Open Directory can be used, but the bias must be estimated.

5. Future work: 2. Learning classifiers from partially labeled data. A proposed algorithm augments a small set of labeled examples with a huge set of unlabeled ones.

6. Conclusion: Keyword spices are effective, but they carry a human labeling cost.

Opinion: The method depends heavily on human effort. It assumes that all candidate keywords have the same probability of occurrence ……

Opinion: Pr(TL)? Pr(TL')?

Opinion: Posterior Probability Rule X. Assume all candidate keywords have the same probability of occurrence.


Keyword Spices Modified

Information Retrieval

Machine Learning (cluster, classify)

Content Web Mining

Dictionary which can represent a distance between words