Information Organization: Overview

IO: What
  What is Information Organization?
    Systematic arrangement of items
      - group similar items together
      - assign meaning to groups
      - determine relationships between groups
      - assign items to groups
  [Figure: the same items arranged in three alternative groupings (Grouping 1, Grouping 2, Grouping 3) using the attributes Big/Small, Square/Circle, and Blue/Red]
  Search Engine
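
The idea that the same items can be arranged in more than one way can be sketched in a few lines of Python. This is only an illustration: the attribute names (`size`, `shape`, `color`), the helper `group_by`, and the choice of attribute orders are assumptions made for the example, not something the slide specifies.

```python
from collections import defaultdict
from itertools import product

# Eight hypothetical items, one per combination of the three attributes
# named on the slide (Big/Small, Square/Circle, Blue/Red).
items = [
    {"size": size, "shape": shape, "color": color}
    for size, shape, color in product(
        ["Big", "Small"], ["Square", "Circle"], ["Blue", "Red"]
    )
]


def group_by(items, keys):
    """Arrange items into a nested hierarchy, one level per attribute in keys."""
    if not keys:
        return list(items)
    groups = defaultdict(list)
    for item in items:
        groups[item[keys[0]]].append(item)
    return {value: group_by(members, keys[1:]) for value, members in groups.items()}


# Two of the possible arrangements; the attribute order is an assumption.
grouping_1 = group_by(items, ["size", "shape", "color"])
grouping_2 = group_by(items, ["color", "size", "shape"])

print(list(grouping_1))  # top-level groups when organizing by size first
print(list(grouping_2))  # top-level groups when organizing by color first
```

The items never change; only the order in which attributes are applied does, which is why one collection can support several equally valid organizations.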

IO: Why
  Why organize information? Why do we put certain things in certain places?
    Closet
      - Seasonal groups
      - Pants vs. shirts
      - Color groups
      - Favorite vs. non-favorite
    To find things more easily → Information Retrieval (IR)
    Taxonomy: Food
      - Good: sweet taste, smell like milk
      - Bad: too hot, hard to chew
    To make sense of the world → Knowledge Discovery (KD)

IO: How
  How do we organize information?
    General approach
      - anticipate how an item will be searched for, e.g. by subject, date, author
      - look for common features among items
      - determine what an item is about
    Classification
      - identification/creation of classes
      - assignment of items to classes
    Clustering
      - group similar items together
  What to do when the information to organize is massive?
    - 10,000 books
    - 100,000 journal papers
    - 1,000,000 web pages

Machine Learning: Introduction
  What is Machine Learning?
    - A computer program learns if it improves its performance at some task through experience (T. Mitchell, 1997).
    - Any change in a system that allows it to perform better the second time on repetition of the same task or on a task drawn from the same population (H. Simon, 1983).
  How can systems improve?
    By acquiring new knowledge
      - acquiring new facts
      - acquiring new skills
    By adapting their behavior
      - solving problems more accurately
      - solving problems more efficiently

Machine Learning: Introduction
  Which is different? Which are similar?
  How is learning possible?
    - Because there are regularities in the world.

ML: Classification vs. Clustering
  Classification
    - Task is to learn to assign instances to predefined classes
    - Supervised learning: the data has to specify what we are trying to learn (the classes)
    - Requires training data: predefined classes and classified items
  Clustering
    - Task is to learn a classification from the data
    - No predefined classification is required
    - Unsupervised learning: the data doesn't specify what we are trying to learn (the clusters)
    - Clustering algorithms divide a data set into natural groups (clusters)
    - Items in the same cluster are similar to each other and share certain properties
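
The contrast between the two tasks can be made concrete with a small sketch. The slide names no particular algorithm, so the choices here are assumptions: a nearest-centroid classifier stands in for supervised learning, a basic k-means loop stands in for unsupervised learning, and the points, labels, and function names are invented for illustration. `math.dist` requires Python 3.8 or later.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(point, centers):
    """Key of the center (in a dict of centers) closest to point."""
    return min(centers, key=lambda key: dist(point, centers[key]))


def train_classifier(labeled):
    """Supervised: the training data names the classes we want to learn."""
    by_class = {}
    for point, label in labeled:
        by_class.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_class.items()}


def kmeans(points, k, iterations=10):
    """Unsupervised: no labels; groups emerge from the data alone."""
    centers = {i: points[i] for i in range(k)}  # naive init: first k points
    for _ in range(iterations):
        clusters = {i: [] for i in range(k)}
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = {i: centroid(pts) if pts else centers[i]
                   for i, pts in clusters.items()}
    return clusters


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.9, 1.1), (4.8, 5.1)]
labels = ["small", "large", "small", "large", "small", "large"]

model = train_classifier(list(zip(points, labels)))
predicted = nearest((4.5, 4.7), model)  # assign a new instance to a predefined class

clusters = kmeans(points, k=2)          # discover groups without any labels
print(predicted, clusters)
```

The classifier returns one of the class names it was given, while k-means returns anonymous groups (numbered 0 and 1) that someone still has to interpret.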

IO for IR: Clustering
  Document Clustering
    Cluster Hypothesis
      - Documents having similar contents tend to be relevant to the same query
      - Rank clusters by query-cluster similarity
      - Cluster documents based on vector similarity
    Post-retrieval clustering
      - Scatter-Gather
  Keyword Clustering
    - Automatic thesaurus construction
    - Query expansion
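
Clustering documents by vector similarity and then ranking clusters against a query might look roughly like the sketch below. It is a minimal illustration under several assumptions not stated on the slide: raw term counts as document vectors, cosine similarity as the similarity measure, a single-pass clustering scheme with an arbitrary threshold of 0.3, and four made-up documents.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    """Bag-of-words term-count vector (whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Add each document to its most similar cluster, or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc ids]}
    for doc_id, text in docs.items():
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(doc_id)
            # Summed counts serve as the centroid; cosine ignores the scale.
            best["centroid"] = best["centroid"] + vec
        else:
            clusters.append({"centroid": vec, "members": [doc_id]})
    return clusters


def rank_clusters(query, clusters):
    """Order clusters by query-cluster similarity, most similar first."""
    q = vectorize(query)
    return sorted(clusters, key=lambda c: cosine(q, c["centroid"]), reverse=True)


docs = {
    "d1": "jaguar car engine speed",
    "d2": "jaguar cat jungle habitat",
    "d3": "car engine repair speed",
    "d4": "jungle cat habitat prey",
}

clusters = single_pass_cluster(docs)
ranked = rank_clusters("jungle cat", clusters)
print([c["members"] for c in ranked])
```

Following the cluster hypothesis, a system built this way would present the members of the top-ranked cluster together, on the expectation that documents with similar contents are relevant to the same query.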

IO for IR: Classification
  Document Categorization
    - classify documents into manually defined categories
    - supports hierarchical browsing, query expansion via relevance feedback
  Document Indexing
    - assign keywords to documents
    - automatic indexing with controlled vocabulary, metadata generation
  Document Filtering
    - e.g. news delivery, email spam filtering
  Query Classification
    - collection selection
    - algorithm selection
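
As one possible illustration of document filtering, the sketch below trains a tiny spam filter. The slide does not say which classifier to use; multinomial Naive Bayes with add-one smoothing is assumed here simply because it is a common baseline for text. The class name, training messages, and labels are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.term_counts = defaultdict(Counter)  # term counts per class
        self.vocabulary = set()

    def train(self, text, label):
        tokens = text.lower().split()
        self.class_docs[label] += 1
        self.term_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each token.
            score = log(n_docs / total_docs)
            for token in tokens:
                score += log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesTextClassifier()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap prize offer click now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("project report attached for review", "ham")

print(spam_filter.classify("claim your cash prize"))
print(spam_filter.classify("monday project meeting"))
```

The same train-then-classify pattern would apply to document categorization or query classification; only the labels and the training data change.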