Web Services and Application of Multi-Agent Paradigm for DL Yueyu Fu & Javed Mostafa School of Library and Information Science Indiana University, Bloomington.

Presentation transcript:

Web Services and Application of Multi-Agent Paradigm for DL
Yueyu Fu & Javed Mostafa
School of Library and Information Science
Indiana University, Bloomington
2005

Outline
– Background
– Centralized vs. Distributed Classification
– Multi-agent Classification
– Discussion

Background
– Information overload
  – MEDLINE database contains over 12 million records dating back to the mid-1960s.
  – Google claims that it can search more than 8 billion web pages, which is only a small fraction of the whole web.
– Information organization
  – Document classification
– Document classification is an important operational problem in digital library research.

What is document classification?
Document classification: the process of assigning natural language texts to predefined categories.
– a news article about a basketball game: sports
– a patent document about computer chips: technology
– a news article about the war in Iraq: politics/economics

Why is classification important?
[Diagram labels: Categorization, Classification, Retrieval, NLP, Clustering, Filtering, Extraction, Routing]

Document Classification
– Human/machine
  – Manual classification
  – Automatic classification
– Organization structure
  – Centralized classification
  – Distributed classification

Manual Classification
– Traditional approach: manual classification
– Principal schemes:
  – Dewey Decimal Classification
  – Universal Decimal Classification
  – Library of Congress Classification
– Con: relies heavily on domain experts and human judgment

Automatic Classification
– Alternative approach: automatic classification
– Automatically classifies texts based on a set of pre-classified documents, using machine learning techniques
– Classifiers are built in a centralized and monolithic manner
– Pro: automated, efficient, and consistent

Centralized Classification

Distributed Classification

Centralized vs. Distributed Classification
– Centralized approach
  – Classifies the documents independently using a centralized, monolithic classification program
– Distributed approach
  – Allows multiple classification programs to work together to classify the documents in a distributed computing environment

Centralized Classification: disadvantages
– Limited by its knowledge
– Limited by its computing power
– Performance bottleneck
– Single point of failure

Distributed Classification: advantages
– More knowledge
– More computing resources
– Reliable: avoids a single point of failure
– Scalable: handles large data sets

Multi-Agent Paradigm
– Evolved from distributed artificial intelligence in the late 1980s
– A multi-agent system (MAS) is “a loosely coupled network of problem solvers that work together to solve problems that are beyond their individual capabilities.” (Durfee & Montgomery, 1989)

MAS: characteristics
– Composed of multiple autonomous components, called agents
– Each agent has incomplete capabilities to solve a problem
– No global system control
– Data is decentralized
– Computation is asynchronous

MAS: advantages
– Distributes computational resources and capabilities across a network of interconnected agents
– Avoids the “single point of failure” problem
– Provides a modular, scalable architecture
– Offers solutions to problems that can naturally be regarded as a society of autonomous interacting components (agents)
– Offers solutions that efficiently use information sources that are spatially distributed
– Offers solutions in situations where expertise is distributed
– Enhances overall system performance

Multi-Agent Collaboration and Classification of Information (MACCI)
[Diagram: Agent-1, Agent-2, Agent-3, Agent-4, Agent-5, Agent-6, and an Admin Agent]

MACCI - Experiment
– Data set
  – RCV1-v2: 800,000 manually categorized newswire stories from Reuters, Ltd.
– Classification method
  – Cosine similarity
– Effectiveness measure
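To make the classification method concrete, here is a minimal Python sketch of cosine-similarity classification. It is an illustration under stated assumptions, not the MACCI implementation: the whitespace tokenizer, the raw term-frequency weighting, the keyword "profiles" standing in for categories, and the 0.2 threshold are all hypothetical choices that the slides do not specify.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles, threshold=0.2):
    """Return the best-matching category, or None if no score reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    vec = term_vector(text)
    best_label, best_score = None, 0.0
    for label, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Hypothetical category profiles; a real system would build them from
# pre-classified training documents.
profiles = {
    "sports": term_vector("basketball game team score player season"),
    "technology": term_vector("computer chip patent processor circuit"),
}

print(classify("the team won the basketball game", profiles))  # sports
print(classify("weather forecast for tomorrow", profiles))     # None
```

Returning None when no profile is similar enough models the "fails to classify" case that the collaboration strategies build on.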

MACCI - Results

Agent Collaboration
– Agents collaborate to help each other.
– Agent communication and interaction are controlled by agent collaboration strategies.
– Collaboration strategies
  – Random strategy
  – Good-Neighbor strategy

Random Strategy
1. An agent (A) randomly asks another agent (B) for help when it fails to classify a document.
2. Agent B then tries to classify the document and reports the result to agent A.
   i. If agent B classifies the document successfully, the task is finished.
   ii. If agent B fails to classify it, agent A repeats these steps with other agents until the document has been classified or all the other agents in the environment have been asked.
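The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the MACCI code: the `Agent` class, its keyword-matching `classify` method, and the function name `classify_with_random_help` are hypothetical, and message passing between agents is reduced to direct method calls.

```python
import random


class Agent:
    """Hypothetical classification agent that recognizes a fixed set of keywords."""

    def __init__(self, name, label, keywords):
        self.name = name
        self.label = label
        self.keywords = set(keywords)

    def classify(self, document):
        """Return this agent's label if the document contains a known keyword, else None."""
        words = set(document.lower().split())
        return self.label if words & self.keywords else None


def classify_with_random_help(agent, others, document, rng=random):
    """Random strategy: on failure, ask the other agents in random order
    until one succeeds or every agent has been asked."""
    result = agent.classify(document)
    if result is not None:
        return result, agent.name
    candidates = list(others)
    rng.shuffle(candidates)  # random order, each helper asked at most once
    for helper in candidates:
        result = helper.classify(document)
        if result is not None:
            return result, helper.name
    return None, None  # nobody in the environment could classify the document


agents = [
    Agent("A", "sports", ["basketball"]),
    Agent("B", "technology", ["chip"]),
    Agent("C", "politics", ["election"]),
]
print(classify_with_random_help(agents[0], agents[1:], "new chip design announced"))
```

Shuffling the helpers once and walking the list gives the same behavior as repeatedly picking an agent that has not yet been asked.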

Good-Neighbor Strategy
1. The administration agent distributes a document from the document pool to a randomly chosen classification agent. The process continues until the document pool is empty.
2. If an agent successfully classifies a document sent from the administration agent, it sends the document to all the agents in its success list for further potential classification. The help degree is set to 1.
3. If an agent fails to classify a document sent from the administration agent, it sends the document to the four top-level parent agents in its failure list for help. The help degree is set to 1.
4. If an agent successfully classifies a document sent from another classification agent and the help degree is smaller than 4, it sends the document to the agents that represent its child classes in its success list. The help degree is incremented by 1.
5. If an agent fails to classify a document sent from another classification agent, it takes no action.
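The five rules above can be sketched as a queue-driven simulation. This is a rough Python illustration under several assumptions, not the authors' implementation: the `Agent` class, the keyword-based `classify` test, and the `child_list` attribute (standing in for "the agents that represent its child classes in its success list") are hypothetical, and how the success and failure lists are built is not covered by the slides.

```python
import random
from collections import deque

MAX_HELP_DEGREE = 4  # rule 4: forward only while the help degree is below 4


class Agent:
    """Hypothetical classification agent for one class."""

    def __init__(self, name, keywords):
        self.name = name
        self.keywords = set(keywords)
        self.success_list = []  # agents to notify after a success (rule 2)
        self.failure_list = []  # top-level parent agents to ask after a failure (rule 3)
        self.child_list = []    # assumed subset of success_list for child classes (rule 4)

    def classify(self, document):
        """True if the document contains one of this agent's keywords."""
        return bool(set(document.lower().split()) & self.keywords)


def good_neighbor(pool, agents, rng=random):
    """Return {document: set of agent names that classified it}."""
    labels = {doc: set() for doc in pool}
    queue = deque()
    # Rule 1: the admin agent hands each document to a randomly chosen agent.
    for doc in pool:
        queue.append((rng.choice(agents), doc, True, 0))
    while queue:
        agent, doc, from_admin, degree = queue.popleft()
        ok = agent.classify(doc)
        if ok:
            labels[doc].add(agent.name)
        if from_admin:
            # Rules 2 and 3: forward with the help degree set to 1.
            targets = agent.success_list if ok else agent.failure_list[:4]
            queue.extend((t, doc, False, 1) for t in targets)
        elif ok and degree < MAX_HELP_DEGREE:
            # Rule 4: forward to child-class agents, incrementing the help degree.
            queue.extend((t, doc, False, degree + 1) for t in agent.child_list)
        # Rule 5: a failure on a forwarded document triggers no action.
    return labels


sports = Agent("sports", ["game"])
basketball = Agent("basketball", ["basketball"])
technology = Agent("technology", ["chip"])
sports.success_list = [basketball]
sports.child_list = [basketball]
sports.failure_list = [technology]
basketball.success_list = [sports]
basketball.failure_list = [sports, technology]
technology.failure_list = [sports]

result = good_neighbor(["basketball game tonight"], [sports, basketball, technology])
print(sorted(result["basketball game tonight"]))  # ['basketball', 'sports']
```

Because every forward either starts at help degree 1 or increments it, and forwarding stops at 4, the simulation terminates even when the lists contain cycles.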

Conclusion
– The multi-agent approach can achieve the same level of effectiveness for document classification as centralized approaches.
– A high level of effectiveness can be achieved by adopting carefully designed agent collaboration strategies.