1 Intelligent Internet Search
Veljko Milutinović, Laslo Kraus, Jelena Mirković, Nela Tomča, Saša Slijepčević, Suzana Cvetićanin, Ljiljana Nešić, Mladen Mrkić, Vladan Obradović, Igor Čakulev
Department of Computer Engineering, School of Electrical Engineering, University of Belgrade
POB 35-54, 11120 Belgrade, Serbia, Yugoslavia
vm@etf.bg.ac.yu

2 Problem statement
The number of Internet presentations and Web servers grows exponentially
The variety of presentations grows, too
Search and retrieval of documents gets harder
Existing tools do not give satisfactory results

3 Existing solutions
Keyword search and document indexing (e.g. AltaVista):
+ search is exhaustive
- too many keywords result in too few documents found, and vice versa
- it requires a large database of indexed documents
Following links (e.g. spiders):
+ fast, no indexing and no database
- it searches only a limited number of documents
+ possibility of changing the input parameters during the search
- poor evaluation function

4 Our solution
Design of intelligent agents for Internet search
Two basic approaches:
1. Simulated annealing - inherently serial
2. Genetic algorithms - inherently parallel
Character of the search:
1. Local search - following only the links of the input documents - Best First Search Algorithm
2. Global search - following the links of the input documents and occasionally mutating them - Genetic Algorithm
Spider implementation:
1. Static
2. Mobile

5 Our research
Essence: creating a set of packages for experimenting in the domain of intelligent Internet search
All written in Sun Java - JDK 1.1
Lego approach - stand-alone applications that are easily interfaced with one another
Code and executable version available at http://galeb.etf.bg.ac.yu/~ebi
Further research in the mobile domain

6 Best First Search Algorithm
1. Input Set: select the initial WWW presentation or a set thereof
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set)
3. Measure the fitness value for each document in the CC Set
4. Select the best one for the Output Set and add the documents linked to it into the CC Set
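
The best-first loop can be sketched in Python as follows. This is a minimal sketch, not the Agent package's code: the toy link graph `LINKS`, the keyword sets `KEYWORDS`, the `wanted` stopping parameter, and the use of Jaccard similarity as the fitness measure are all assumptions made for illustration. The slides do not define the fitness function or say when the loop ends.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed fitness measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_first_search(links, keywords, input_set, wanted):
    """Return up to `wanted` documents most similar to the input set."""
    # Keywords that describe the input documents as a whole.
    target = set().union(*(keywords[u] for u in input_set))
    # The CC Set starts with everything linked from the input documents.
    cc = {v for u in input_set for v in links.get(u, [])} - set(input_set)
    seen = set(input_set) | cc
    output = []
    while cc and len(output) < wanted:
        # Measure fitness for each document in the CC Set, take the best.
        best = max(sorted(cc), key=lambda u: jaccard(keywords[u], target))
        cc.remove(best)
        output.append(best)
        # Add the documents linked from the chosen one to the CC Set.
        for v in links.get(best, []):
            if v not in seen:
                seen.add(v)
                cc.add(v)
    return output


# Hypothetical toy web: page -> outgoing links, page -> keywords.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
KEYWORDS = {
    "a": {"genetic", "search"},
    "b": {"genetic", "search", "agent"},
    "c": {"cooking"},
    "d": {"genetic", "agent"},
}
result = best_first_search(LINKS, KEYWORDS, ["a"], wanted=2)
print(result)
```

Because only links of already selected documents enter the CC Set, this search stays local: a relevant page that nothing links to is never examined.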

7 Basic Genetic Algorithm
1. Initialize the population - randomly pick a set of possible solutions
2. Select individuals for the mating pool - measure the fitness value and pick the best ones
3. Perform crossover - create new individuals using genetic material from parents in the mating pool
4. Perform mutation - randomly create new individuals, completely unrelated to those in the mating pool
5. Insert offspring into the population
6. Are the stopping criteria satisfied (the desired number of solutions is found, or the specified time for the search has elapsed)? No: go to step 2. Yes: the end.
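
The six steps can be illustrated with a small Python sketch on a toy problem. Everything specific here is an assumption made for illustration: individuals are bit lists, the fitness `one_max` counts 1-bits, the stopping criteria are a target fitness or a generation budget, and the population is trimmed back to a fixed size after offspring are inserted.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1-bits (an assumption for illustration)."""
    return sum(bits)


def genetic_algorithm(fitness, target, length=16, pop_size=20, pool_size=8,
                      max_generations=200, seed=0):
    rng = random.Random(seed)

    def new_individual():
        return [rng.randint(0, 1) for _ in range(length)]

    # 1. Initialize the population randomly.
    population = [new_individual() for _ in range(pop_size)]
    for _ in range(max_generations):
        # 6. Stop when a good enough solution exists (or the budget is spent).
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) >= target:
            break
        # 2. Select the fittest individuals for the mating pool.
        pool = population[:pool_size]
        offspring = []
        for _ in range(pop_size // 2):
            # 3. Crossover: splice genetic material from two parents.
            mother, father = rng.sample(pool, 2)
            cut = rng.randrange(1, length)
            offspring.append(mother[:cut] + father[cut:])
        # 4. Mutation: a brand-new individual unrelated to the mating pool.
        offspring.append(new_individual())
        # 5. Insert offspring, then trim to a fixed size (assumption).
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:pop_size]
    return max(population, key=fitness)


best = genetic_algorithm(one_max, target=16)
print(one_max(best), best)
```

Keeping the fittest individuals when trimming means the best fitness in the population never decreases from one generation to the next.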

8 Genetic Algorithm applied to Internet Search
1. Input Set: select the initial WWW presentation or a set thereof
2. Extract all URLs and fetch the corresponding WWW presentations; they are inserted into the Current Configuration Set (CC Set)
3. Measure the fitness value for each document in the CC Set
4. Select the best one for the Output Set and add the documents linked to it into the CC Set
5. Mutate - e.g. by inserting documents from the database of URLs
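
A minimal Python sketch of this global search follows, assuming precomputed fitness scores and a per-round mutation probability. The names `genetic_search`, `SCORES`, `url_db`, `wanted`, and `mutation_rate` are hypothetical, and the slides do not say how often mutation is applied.

```python
import random


def genetic_search(links, fitness, input_set, url_db, wanted,
                   mutation_rate=0.3, seed=0):
    """Best-first selection plus occasional mutation from a URL database."""
    rng = random.Random(seed)
    cc = {v for u in input_set for v in links.get(u, [])} - set(input_set)
    seen = set(input_set) | cc
    output = []
    while cc and len(output) < wanted:
        # Select the fittest document in the CC Set for the Output Set.
        best = max(sorted(cc), key=fitness)
        cc.remove(best)
        output.append(best)
        # Add the documents linked from it to the CC Set.
        for v in links.get(best, []):
            if v not in seen:
                seen.add(v)
                cc.add(v)
        # Mutate: occasionally inject an unseen document from the database.
        candidates = [u for u in url_db if u not in seen]
        if candidates and rng.random() < mutation_rate:
            pick = rng.choice(candidates)
            seen.add(pick)
            cc.add(pick)
    return output


# Hypothetical toy data: "z" is relevant but not linked from anywhere.
LINKS = {"a": ["b"], "b": []}
SCORES = {"b": 0.2, "z": 0.9}
with_mutation = genetic_search(LINKS, SCORES.get, ["a"], ["z"], wanted=2,
                               mutation_rate=1.0)
local_only = genetic_search(LINKS, SCORES.get, ["a"], ["z"], wanted=2,
                            mutation_rate=0.0)
print(with_mutation, local_only)
```

With mutation switched off, the search degenerates to the local best-first search and can never reach the unlinked page `z`.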

9 Mutation operator
Generational - generate a new URL
DB based - pick an existing URL from a database
Semantic - use some logical reasoning to direct the search

10 Package #1 - Spider
Spider - off-line browser
Author: Saša Slijepčević, sascha@galeb.etf.bg.ac.yu
Fetches all linked documents up to the specified depth and stores them on the local disk in a structure suitable for off-line browsing
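
Depth-limited fetching can be sketched as a breadth-first traversal. This Python sketch is illustrative only: `crawl`, `fetch_links`, and the in-memory `SITE` graph are hypothetical stand-ins, and the real Spider also downloads each page and writes it to disk, which is omitted here.

```python
from collections import deque


def crawl(start, fetch_links, max_depth):
    """Breadth-first crawl; returns {url: depth} for every page reached."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if depth_of[url] == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in fetch_links(url):
            if link not in depth_of:  # skip pages that were already visited
                depth_of[link] = depth_of[url] + 1
                queue.append(link)
    return depth_of


# Hypothetical site used instead of real HTTP requests.
SITE = {"index": ["a", "b"], "a": ["c"], "b": ["index"], "c": ["d"], "d": []}
pages = crawl("index", lambda url: SITE.get(url, []), max_depth=2)
print(pages)
```

Recording the depth of each page the first time it is reached keeps the crawl from looping on cyclic links such as `b` pointing back to `index`.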

11 Package #2 - Agent
Agent - program for the Best First Search Algorithm
Author: Nela Tomča, nela@galeb.etf.bg.ac.yu
Starts from the input set of URLs and, by following the links in the input documents, finds the documents most similar to them

12 Package #3 - Generator
Generator - program for generating a database of topic-sorted URLs
Authors: Mladen Mrkić, mladen@galeb.etf.bg.ac.yu; Vladan Obradović, OV32691D@kiklop.etf.bg.ac.yu
It fills the existing database with URLs obtained from www.yahoo.com as a result of a query submitted by the user, under the specified category
(Diagram: yahoo, Database)

13 Package #4 - Pathfinder
Pathfinder - program for discovering all servers with the same suffix as the one submitted by the user
Author: Igor Čakulev, igor@galeb.etf.bg.ac.yu
Example: for galeb.etf.bg.ac.yu it gives orao.etf.bg.ac.yu; zmaj.etf.bg.ac.yu; buef31.etf.bg.ac.yu; kiklop.etf.bg.ac.yu...
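
The suffix rule in the example can be sketched in Python. The slides do not say how Pathfinder discovers candidate servers, so this sketch only filters a given list of host names; `domain_suffix`, `servers_with_same_suffix`, and the `KNOWN_HOSTS` list are hypothetical.

```python
def domain_suffix(host):
    """Drop the first label: 'galeb.etf.bg.ac.yu' -> 'etf.bg.ac.yu'."""
    _, _, suffix = host.lower().partition(".")
    return suffix


def servers_with_same_suffix(host, candidates):
    """Return the candidates that share the domain suffix of `host`."""
    suffix = domain_suffix(host)
    if not suffix:
        return []  # a bare host name has no suffix to match
    return [c for c in candidates
            if c.lower() != host.lower() and domain_suffix(c) == suffix]


# Hypothetical candidate list; real discovery is outside this sketch.
KNOWN_HOSTS = [
    "orao.etf.bg.ac.yu",
    "www.yahoo.com",
    "galeb.etf.bg.ac.yu",
    "kiklop.etf.bg.ac.yu",
]
found = servers_with_same_suffix("galeb.etf.bg.ac.yu", KNOWN_HOSTS)
print(found)
```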

14 Package #5 - Tropical
Tropical - program for performing genetic algorithm search with database mutation
Author: Jelena Mirković, sunshine@galeb.etf.bg.ac.yu
Repeating the Hong Kong experiment: Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
(Diagram: Database)

17 Packages in progress - Space
Space - program for performing genetic algorithm search with database mutation and occasional spatial locality mutation
(Diagram: Database)

18 Packages in progress - Time
Time - program for performing genetic algorithm search with database mutation and occasional temporal locality mutation
(Diagram: Topic Database, Time Database)

19 Current System

20 The Vision

21 Newly open problems
Too many linked documents imply high network traffic
Disk space consumed increases exponentially with the number of linked documents, while only a small percentage of them is found to be useful
The program is unable to learn
Future directions
Implementation in the mobile domain: autonomous agents that transport themselves to the host computer and examine documents there, transferring only the best ones to the home computer, so that network traffic and disk usage decrease
Intelligent agents that remain active in the background, able to learn and adapt to the user's needs

22 References
Goldberg, D., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts, USA, 1989.
Milojičić, S., Musliner, D., Schroeder-Preikschat, W., "Agents: Mobility and Communication", Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1998.
Mueller, J. P., The Design of Intelligent Agents: A Layered Approach, Springer-Verlag, Germany, 1997.
Chen, H., Chung, Y., Ramsey, M., Yang, C., Ma, P., Yen, J., "Intelligent Spider for Internet Searching", Proceedings of the Thirtieth Annual Hawaii International Conference on System Sciences, Maui, Hawaii, USA, January 1997.
Kraus, L., Milutinovic, V., "Technical Report on a New Genetic Algorithm for Internet Search Based on Principles of Spatial and Temporal Locality", Proceedings of the SinfoN '97, Zlatibor, Serbia, Yugoslavia, November 1997.
Tomca, N., A Flexible Tool for Jaccard Score Evaluation, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, November 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.
Slijepcevic, S., A Programmable Agent for Internet Retrieval, B.Sc. Thesis, University of Belgrade, Belgrade, Serbia, Yugoslavia, October 1997. Award paper at SinfoN-97, Zlatibor, Serbia, Yugoslavia, October 1997.

