1 LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora Chien-Chung Huang Shui-Lung Chuang Lee-Feng Chien Presented by: Vu LONG.

Slides:



Advertisements
Similar presentations
Data Mining and the Web Susan Dumais Microsoft Research KDD97 Panel - Aug 17, 1997.
Advertisements

Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen SIMS, UC Berkeley Susan Dumais Adaptive Systems & Interactions Microsoft.
ELPUB 2006 June Bansko Bulgaria1 Automated Building of OAI Compliant Repository from Legacy Collection Kurt Maly Department of Computer.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Jianwei Lu1 Information Extraction from Event Announcements Student: Jianwei Lu ( ) Supervisor: Robert Dale.
Information Retrieval in Practice
Search Engines and Information Retrieval
Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection Panagiotis G. Ipeirotis Luis Gravano Computer Science Department Columbia.
Creating Concept Hierarchies in a Customer Self-Help System Bob Wall CS /29/05.
Interfaces for Selecting and Understanding Collections.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.
Reference Collections: Task Characteristics. TREC Collection Text REtrieval Conference (TREC) –sponsored by NIST and DARPA (1992-?) Comparing approaches.
MANISHA VERMA, VASUDEVA VARMA PATENT SEARCH USING IPC CLASSIFICATION VECTORS.
1 Information Retrieval and Web Search Introduction.
Text Classification: An Implementation Project Prerak Sanghvi Computer Science and Engineering Department State University of New York at Buffalo.
Overview of Search Engines
1/16 Final project: Web Page Classification By: Xiaodong Wang Yanhua Wang Haitang Wang University of Cincinnati.
Improving web image search results using query-relative classifiers Josip Krapacy Moray Allanyy Jakob Verbeeky Fr´ed´eric Jurieyy.
Search Engines and Information Retrieval Chapter 1.
©2008 Srikanth Kallurkar, Quantum Leap Innovations, Inc. All rights reserved. Apollo – Automated Content Management System Srikanth Kallurkar Quantum Leap.
Bringing Order to the Web: Automatically Categorizing Search Results Hao Chen, CS Division, UC Berkeley Susan Dumais, Microsoft Research ACM:CHI April.
Searching the Web Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
Ihr Logo Chapter 7 Web Content Mining DSCI 4520/5240 Dr. Nick Evangelopoulos Xxxxxxxx.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
1 A Unified Relevance Model for Opinion Retrieval (CIKM 09’) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
Features and Algorithms Paper by: XIAOGUANG QI and BRIAN D. DAVISON Presentation by: Jason Bender.
Similar Document Search and Recommendation Vidhya Govindaraju, Krishnan Ramanathan HP Labs, Bangalore, India JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Publication Spider Wang Xuan 07/14/2006. What is publication spider Gathering publication pages Using focused crawling With the help of Search Engine.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Research Paper Computer Information Technology. Research Paper There seems to confusion over when the paper is due. The paper was due 4/6/11. I must have.
1 Web-Page Summarization Using Clickthrough Data* JianTao Sun, Yuchang Lu Dept. of Computer Science TsingHua University Beijing , China Dou Shen,
Personalized Course Navigation Based on Grey Relational Analysis Han-Ming Lee, Chi-Chun Huang, Tzu- Ting Kao (Dept. of Computer Science and Information.
How Useful are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings Stefan Siersdorfer, Sergiu Chelaru, Wolfgang Nejdl, Jose San.
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
Algorithmic Detection of Semantic Similarity WWW 2005.
The World Wide Web: Information Resource. Hock, Randolph. The Extreme Searcher’s Internet Handbook. 2 nd ed. CyberAge Books: Medford. (2007). Internet.
Augmenting Focused Crawling using Search Engine Queries Wang Xuan 10th Nov 2006.
A Practical Web-based Approach to Generating Topic Hierarchy for Text Segments CIKM2004 Speaker : Yao-Min Huang Date : 2005/03/10.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Reference Collections: Collection Characteristics.
UWMS Data Mining Workshop Content Analysis: Automated Summarizing Prof. Marti Hearst SIMS 202, Lecture 16.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
Post-Ranking query suggestion by diversifying search Chao Wang.
26/01/20161Gianluca Demartini Ranking Categories for Faceted Search Gianluca Demartini L3S Research Seminars Hannover, 09 June 2006.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič.
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Contextual Text Cube Model and Aggregation Operator for Text OLAP
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Usefulness of Quality Click- through Data for Training Craig Macdonald, ladh Ounis Department of Computing Science University of Glasgow, Scotland, UK.
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Search Engine Architecture
Using the Web to Interactively Learn to Find Objects
Magnet & /facet Zheng Liang
Panagiotis G. Ipeirotis Luis Gravano
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Semi-Automatic Data-Driven Ontology Construction System
Using Link Information to Enhance Web Page Classification
Introduction to Search Engines
Presentation transcript:

1 LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora Chien-Chung Huang Shui-Lung Chuang Lee-Feng Chien Presented by: Vu LONG

2 Outline 1. Introduction 2. LiveClassifier 3. Evaluation 4. Contribution 5. Future work

3 Introduction Uses Web search-result pages as the corpus source Exploits the structure information in the topic hierarchy to train the classifier Creates key terms to amend the insufficiency of the topic hierarchy

4 LiveClassifier (Demo version) Classify documents Computer Science Classifier is chosen There are three created classifiers (topics): Computer Science, Europe, Scientists based on Yahoo! directory

5 LiveClassifier Classify documents Pseudo class

6 LiveClassifier Users can self create their classifiers

7 LiveClassifier Feature Extractor - Interacts with Search Engine and extracts highly-ranked search snippets as effective feature source - Outputs feature vectors to describe both topic classes and text objects

8 LiveClassifier Hier-Concept-Query-Formulation - Formulate query through the topic hierarchy

9 LiveClassifier Text Classifier

10 Evaluation Overall performance evaluation

11 Evaluation Granularity & Diversity - Classifying text objects into different levels of the topic hierarchy got roughly the same results.

12 Evaluation Thematic Metadata for Textual Data

13 Evaluation Paper Title Classification - Collect data from 4 CS conferences in Classify them into 36 second-level CS classes

14 Contribution Finds the ways to collect and organize corpora effectively Creates key terms to amend the insufficiency of the topic hierarchy Classifies text objects automatically without a pre-labeled training set Cooperates with Web information services and other systems easily Helps to create more refined data (thematic metadata) for textual data

15 Future work Optimize the classifier based by focusing on the training stage rather than only on organizing corpora Improve responding time Find appropriate pseudo classes