Information Organization: Overview

IO: What
What is Information Organization?
  Systematic arrangement of items:
  - group similar items together
  - assign meaning to groups
  - determine relationships between groups
  - assign items to groups
[Figure: Grouping 1, Grouping 2, Grouping 3. The same items grouped by size (Big/Small), shape (Square/Circle), and color (Blue/Red), each with nested subgroups.]
Search Engine
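
The figure's point, that one set of items can be grouped in more than one way, can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the item records and the helper names `group_by` and `nested_group` are hypothetical, and each item is assumed to carry explicit size, shape, and color attributes.

```python
from collections import defaultdict

# Hypothetical items carrying the figure's three attributes.
items = [
    {"id": 1, "size": "Big", "shape": "Square", "color": "Blue"},
    {"id": 2, "size": "Small", "shape": "Circle", "color": "Red"},
    {"id": 3, "size": "Big", "shape": "Circle", "color": "Red"},
    {"id": 4, "size": "Small", "shape": "Square", "color": "Blue"},
]

def group_by(items, attribute):
    """Assign each item id to the group named by its value for `attribute`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[attribute]].append(item["id"])
    return dict(groups)

def nested_group(items, first, second):
    """Group by one attribute, then subgroup each group by a second attribute."""
    return {
        key: group_by([i for i in items if i[first] == key], second)
        for key in sorted({i[first] for i in items})
    }

for attribute in ("size", "shape", "color"):
    print(attribute, group_by(items, attribute))
print(nested_group(items, "color", "size"))
```

Each call produces a different but equally valid organization of the same items; which one is most useful depends on how the items will later be searched for.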

IO: Why
Why organize information? Why do we put certain things in certain places?
  Closet:
  - seasonal groups
  - pants vs. shirts
  - color groups
  - favorite vs. non-favorite
  To find things more easily → Information Retrieval (IR)
[Figure: a taxonomy of food. Good: sweet taste, smells like milk. Bad: too hot, hard to chew.]
  To make sense of the world → Knowledge Discovery (KD)

IO: How
How do we organize information?
  General approach:
  - anticipate how an item will be searched for (e.g., by subject, date, author)
  - look for common features among items
  - determine what an item is about
  Classification:
  - identify or create classes
  - assign items to classes
  Clustering:
  - group similar items together
What to do when the information to organize is massive?
  10,000 books; 100,000 journal papers; 1,000,000 web pages

Machine Learning: Introduction
What is Machine Learning?
  - A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
  - Learning is any change in a system that allows it to perform better the second time on repetition of the same task, or on a task drawn from the same population (H. Simon, 1983).
How can systems improve?
  - By acquiring new knowledge: new facts, new skills
  - By adapting their behavior: solving problems more accurately, solving problems more efficiently

Machine Learning: Introduction
  Which is different? Which are similar?
  How is learning possible? Because there are regularities in the world.

ML: Classification vs. Clustering
Classification
  - Task: learn to assign instances to predefined classes
  - Supervised learning: the data has to specify what we are trying to learn (the classes)
  - Requires training data: predefined classes and classified items
Clustering
  - Task: learn a classification from the data; no predefined classification is required
  - Unsupervised learning: the data doesn't specify what we are trying to learn (the clusters)
  - Clustering algorithms divide a data set into natural groups (clusters); items in the same cluster are similar to each other and share certain properties
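
The supervised/unsupervised contrast can be sketched in Python on 2-D points. The slides name no specific algorithms; nearest-centroid classification and k-means clustering are swapped in here as simple representatives of each side, and all function names and data are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Supervised: the training data names the classes up front.
def train_nearest_centroid(labeled):
    """labeled: list of (point, class_label). Returns {label: centroid}."""
    by_label = {}
    for point, label in labeled:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(point, model):
    labels = list(model)
    return labels[nearest(point, [model[l] for l in labels])]

# Unsupervised: k-means finds k groups in unlabeled points.
def k_means(points, k, iterations=10):
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [nearest(p, centers) for p in points], centers

labeled = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
           ((8.0, 8.0), "big"), ((9.0, 8.5), "big")]
model = train_nearest_centroid(labeled)
print(classify((2.0, 1.5), model))  # → small

points = [p for p, _ in labeled]  # same data with the labels removed
labels, centers = k_means(points, k=2)
print(labels)  # → [0, 0, 1, 1]
```

The classifier can only answer with the class names it was trained on, while k-means returns bare cluster numbers; assigning meaning to those groups remains a separate step.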

IO for IR: Clustering
Document clustering
  - Cluster hypothesis: documents with similar contents tend to be relevant to the same query
  - Rank clusters by query-cluster similarity
  - Cluster documents based on vector similarity
  - Post-retrieval clustering (e.g., Scatter/Gather)
Keyword clustering
  - Automatic thesaurus construction
  - Query expansion
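
Ranking clusters by query-cluster similarity can be sketched with term-frequency vectors and cosine similarity. This sketch assumes the documents have already been clustered (the cluster assignment below is hypothetical), represents each cluster by the centroid of its documents' vectors, and uses raw term counts with a lowercase whitespace split; practical systems usually add tf-idf weighting, stemming, and stop-word removal.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Average of several term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def rank_clusters(query, clusters):
    """clusters: {name: [document text, ...]}. Returns names, most similar first."""
    q = tf_vector(query)
    scores = {name: cosine(q, centroid([tf_vector(d) for d in docs]))
              for name, docs in clusters.items()}
    return sorted(scores, key=scores.get, reverse=True)

clusters = {
    "sports": ["football match score", "tennis match result"],
    "cooking": ["pasta recipe sauce", "bread recipe oven"],
}
print(rank_clusters("match score", clusters))  # → ['sports', 'cooking']
```

Under the cluster hypothesis, the documents in the top-ranked cluster are good candidates to return even if some of them share few terms with the query.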

IO for IR: Classification
Document categorization
  - classify documents into manually defined categories
  - supports hierarchical browsing and query expansion via relevance feedback
Document indexing
  - assign keywords to documents
  - automatic indexing with a controlled vocabulary; metadata generation
Document filtering
  - e.g., news delivery, email spam filtering
Query classification
  - collection selection
  - algorithm selection
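
Document filtering such as spam filtering is a two-class categorization task. The slides do not prescribe a method; as one common choice, the sketch below swaps in a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The training examples and the function names `train_nb` and `predict_nb` are hypothetical, and tokenization is a plain lowercase whitespace split.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns class priors, per-class word counts, vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(docs) for c, n in class_docs.items()}
    return priors, word_counts, vocab

def predict_nb(text, model):
    """Return the label with the highest log posterior score."""
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids log(0) for words unseen in this class.
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_nb(training)
print(predict_nb("money offer now", model))  # → spam
print(predict_nb("meeting tomorrow", model))  # → ham
```

The same train/predict pattern applies to the other classification uses listed on the slide, such as routing news items or assigning documents to categories, given labeled examples for each class.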