Mapping Between Taxonomies Elena Eneva 27 Sep 2001 Advanced IR Seminar.

Taxonomies
- Formal systems for the orderly classification of knowledge, each designed for a specific purpose
- A change of purpose means a change of taxonomy
- Businesses often need and keep their information in several structures
- It is important to be able to map between taxonomies automatically

Useful Mappings
- Companies organizing information in various ways (e.g. one taxonomy for marketing, another for product development)
- Personal online bookmark classification
- Search engines (e.g. Google, Yahoo)
- EU Committee for Standardization: “detailed overview of the existing taxonomies officially used in the EU, in order to derive general concepts such as: information organisation, properties, multilinguality, keywords, etc. and, last but not least, the mapping between.”

Approach
[Diagram, built up over several slides: two taxonomies, "By country" (German, French) and "By industry" (Textile, Automobile); example documents a, b, c appear first with the "By industry" taxonomy alone and then with both taxonomies.]

Learning Algorithms
- 2 separate learners for the documents
  - Old doc category -> new doc category
  - Doc contents -> new category
  - Weighted average based on confidence
  - Final result determined by a decision tree
- One combined learner: uses both the old category and the contents as features
- Use the unlabeled data for bootstrapping (e.g. top 1%)
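The two-learner setup with a confidence-weighted average can be sketched roughly as follows. This is a minimal illustration and not the method used in the talk: the category learner is a plain co-occurrence count, the content learner is a tiny naive Bayes written from scratch, and "confidence" is assumed to be each learner's own top probability. All function names and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict
import math

def train_category_learner(pairs):
    """Learner 1: count how often each old category maps to each new category."""
    counts = defaultdict(Counter)
    for old_cat, new_cat in pairs:
        counts[old_cat][new_cat] += 1
    return counts

def predict_from_category(counts, old_cat, new_cats):
    seen = counts.get(old_cat, Counter())
    total = sum(seen.values())
    if total == 0:  # unseen old category: fall back to a uniform guess
        return {c: 1.0 / len(new_cats) for c in new_cats}
    return {c: seen[c] / total for c in new_cats}

def train_content_learner(docs):
    """Learner 2: multinomial naive Bayes over (tokens, new_cat) pairs."""
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(), set()
    for tokens, new_cat in docs:
        class_counts[new_cat] += 1
        word_counts[new_cat].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict_from_content(model, tokens, new_cats):
    word_counts, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    log_scores = {}
    for c in new_cats:
        total_words = sum(word_counts[c].values())
        # Add-one smoothing for both the class prior and the word likelihoods.
        score = math.log((class_counts[c] + 1) / (n_docs + len(new_cats)))
        for t in tokens:
            score += math.log((word_counts[c][t] + 1) / (total_words + len(vocab)))
        log_scores[c] = score
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def combine(p_a, p_b):
    """Weighted average; each learner is weighted by its top probability (assumed confidence)."""
    w_a, w_b = max(p_a.values()), max(p_b.values())
    return {c: (w_a * p_a[c] + w_b * p_b[c]) / (w_a + w_b) for c in p_a}

# Toy data: old taxonomy is "by industry", new taxonomy is "by country".
train = [
    ("Textile", "lyon silk weaving mill".split(), "French"),
    ("Textile", "paris fashion fabric house".split(), "French"),
    ("Automobile", "stuttgart engine plant".split(), "German"),
    ("Automobile", "munich sedan factory".split(), "German"),
    ("Automobile", "paris compact car maker".split(), "French"),
]
new_cats = ["German", "French"]
cat_model = train_category_learner([(old, new) for old, _, new in train])
content_model = train_content_learner([(tokens, new) for _, tokens, new in train])

# A document filed under "Automobile" in the old taxonomy.
p_cat = predict_from_category(cat_model, "Automobile", new_cats)
p_content = predict_from_content(content_model, "paris car fabric".split(), new_cats)
combined = combine(p_cat, p_content)
print(combined)
```

On this toy data the category learner alone favors German, the content learner favors French, and the weighted average follows the more confident content learner. The decision-tree combiner and the bootstrapping step from the slide are not shown here.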

Learners
- Decision Tree (C4.5)
- Naïve Bayes Classifier (Rainbow)
- Support Vector Machine (SVM-Light)
- KNN (from Yiming)

Datasets
Two classification schemes:
- Reuters 2001
  - Topics
  - Industry categories
- Hoovers-255 and Hoovers-28
  - 28 industry categories
  - 255 industry categories
- Web pages from Google and Yahoo

Related Literature
- A. Doan, P. Domingos, and A. Halevy. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach. Proceedings of the ACM SIGMOD Conf. on Management of Data (SIGMOD-2001).
- A. Doan, P. Domingos, and A. Levy. Learning Source Descriptions for Data Integration. Proceedings of the Third International Workshop on the Web and Databases (WebDB-2000), pages 81-86, Dallas, TX: ACM SIGMOD.
- A. Doan, P. Domingos, and A. Levy. Learning Mappings between Data Schemas. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

Questions and Ideas
- Other possible datasets?
- Other learners?
- Other papers?

The end.