Internet Resources Discovery (IRD) Concrete Learning Agents
T.Sharon-A.Frank


1 Internet Resources Discovery (IRD) Concrete Learning Agents

2 Concrete Learning Agents
Ahoy! - homepage finder
– Finds the homepage of any person by name and organization.
ShopBot - robot for comparison shopping
– Finds where the user can buy a product in any pre-learned domain.
ILA - Internet Learning Agent
– Learns to understand the content of semi-structured pages in terms of internal concepts.

3 Ahoy! Homepage Finder
Personal homepages are a relatively new resource to be located on the Web.
Search engines don’t do a good job of finding personal homepages because they are hard to define/locate.
Ahoy! does it much better.
Ahoy! implements a new search method: DRS - Dynamic Reference Sifting.

4 Dynamic Reference Sifting (DRS)
How to improve recall and precision?
The DRS architecture is proposed as a way to provide high recall and precision in an automatic page-finding system.
DRS Components:
– Candidate References Source
– Cross Filter
– Heuristic-based filter
– Buckets
– URL generator
– URL pattern extractor

5 DRS Components (1)
Candidate References Source
– Comprehensive web indexes, like AltaVista.
– E-mail services, like Whowhere, Bigfoot, Iaf.
Cross Filter
– Filters candidates based on some orthogonal references source, like e-mail address directories.
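One way to picture the cross filter is as a check of each candidate URL against e-mail addresses found for the same person. The sketch below is only an illustration under that assumption: the matching rule (the URL host equals the e-mail domain or is a subdomain of it) and the names `email_domains` and `cross_filter` are hypothetical, not taken from Ahoy!.

```python
from urllib.parse import urlparse


def email_domains(addresses):
    """Collect the lowercased domain part of each e-mail address."""
    return {a.rsplit("@", 1)[1].lower() for a in addresses if "@" in a}


def cross_filter(candidate_urls, addresses):
    """Keep candidates whose host equals, or is a subdomain of, a known e-mail domain.

    Assumption: this matching rule is illustrative, not the rule Ahoy! uses.
    """
    domains = email_domains(addresses)
    kept = []
    for url in candidate_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in domains):
            kept.append(url)
    return kept


candidates = [
    "http://www.cs.example.edu/~jdoe/",
    "http://www.other.example.com/jdoe.html",
]
print(cross_filter(candidates, ["jdoe@cs.example.edu"]))
```

Only the first candidate survives here, because its host ends with the domain of the e-mail address.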

6 DRS Components (2)
Heuristic-based filter
– Filters candidates using domain-specific knowledge and heuristics.
– For homepages: looks for the words “homepage”, “my homepage”, “personal page”, etc.
– For names: uses a nicknames database and templates like “Sharon, Taly”, etc.
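The two heuristics on this slide can be sketched as simple string checks. This is a minimal illustration, not Ahoy!'s filter: the tiny `NICKNAMES` table, the two name templates, and the rule that both a phrase and a name must appear are all assumptions made for the example.

```python
# Phrases taken from the slide.
HOMEPAGE_PHRASES = ("homepage", "my homepage", "personal page")

# Hypothetical stand-in for the nicknames database mentioned on the slide.
NICKNAMES = {"robert": ["bob", "rob"], "william": ["bill", "will"]}


def name_variants(first, last):
    """Build lowercase 'first last' and 'last, first' templates, including nicknames."""
    first, last = first.lower(), last.lower()
    variants = set()
    for f in [first] + NICKNAMES.get(first, []):
        variants.add(f"{f} {last}")
        variants.add(f"{last}, {f}")
    return variants


def passes_heuristics(text, first, last):
    """Assumed rule: the text must contain a homepage phrase and a name variant."""
    lowered = text.lower()
    has_phrase = any(p in lowered for p in HOMEPAGE_PHRASES)
    has_name = any(v in lowered for v in name_variants(first, last))
    return has_phrase and has_name


print(passes_heuristics("Sharon, Taly - my homepage", "Taly", "Sharon"))    # True
print(passes_heuristics("Bob Smith's personal page", "Robert", "Smith"))    # True
print(passes_heuristics("Smith Hardware Store", "Robert", "Smith"))         # False
```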

7 DRS Components (3)
Buckets
– Ranks and labels the candidates into buckets of matches and near misses.
URL generator
– Tries to synthesize new candidate URLs if everything else fails.
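As a rough sketch of bucketing, candidates can be labeled by how many of the query attributes they satisfied. The criteria below (name and institution both matched means a match, exactly one matched means a near miss) and the `Candidate` type are assumptions for illustration; the slides do not state the actual ranking rules.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    name_matched: bool
    institution_matched: bool


def bucket(candidates):
    """Label candidates as matches or near misses; everything else is dropped."""
    buckets = {"match": [], "near miss": []}
    for c in candidates:
        if c.name_matched and c.institution_matched:
            buckets["match"].append(c.url)
        elif c.name_matched or c.institution_matched:
            buckets["near miss"].append(c.url)
    return buckets


result = bucket([
    Candidate("http://www.example.edu/~jdoe/", True, True),
    Candidate("http://www.example.com/jdoe.html", True, False),
    Candidate("http://www.example.org/news.html", False, False),
])
print(result)
```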

8 Example: URL Generator

9 DRS Components (4)
URL pattern extractor
– Extracts patterns from successful queries, to be used in the URL generator.
– For each successful hit, saves: name, institution, URL.
– Learns institutions' server names and homepage paths.
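The link between the pattern extractor and the URL generator can be sketched as follows. This is a simplified illustration under stated assumptions: a pattern is taken to be the server name plus a path template with the username replaced by a `{user}` placeholder, and the function names are hypothetical.

```python
from urllib.parse import urlparse


def extract_pattern(username, url):
    """From a successful hit, return (server, path template with a {user} placeholder)."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.replace(username, "{user}")


def generate_urls(username, server, templates):
    """Synthesize candidate URLs for another user on the same server."""
    return [f"http://{server}{t.format(user=username)}" for t in templates]


# Learn from one successful hit, then reuse the pattern for a different user.
server, template = extract_pattern("jdoe", "http://www.cs.example.edu/~jdoe/index.html")
print(server, template)
print(generate_urls("rroe", server, [template]))
```

With this input the learned pattern is `www.cs.example.edu` and `/~{user}/index.html`, and the generated candidate is `http://www.cs.example.edu/~rroe/index.html`.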

10 Ahoy! Flow
User inputs target name and institution
– E-mail services provide user names
– MetaCrawler provides raw references
– Institutional DB provides server names
Raw references filtered and bucketed
Success?
– YES: References returned; URL patterns extracted and stored
– NO: URLs generated using server name, username, stored URL patterns
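The control flow can be sketched as one function with the individual steps passed in. This reading of the flow is an assumption: references are returned and patterns stored on success, URLs are generated on failure, and generated URLs are filtered like any other reference. All four callables are hypothetical stand-ins, not Ahoy! components.

```python
def ahoy_flow(name, institution, fetch_references, filter_and_bucket,
              generate_urls, store_patterns):
    """One pass through the flow; every callable is a hypothetical stand-in."""
    buckets = filter_and_bucket(fetch_references(name, institution))
    if not buckets["match"]:
        # NO branch: synthesize URLs, then filter them as well (assumption).
        buckets = filter_and_bucket(generate_urls(name, institution))
    if buckets["match"]:
        # YES branch: remember what worked for later URL generation.
        store_patterns(name, institution, buckets["match"])
    return buckets


# Usage with trivial stubs: the search finds nothing, the generator finds one URL.
stored = []
result = ahoy_flow(
    "Taly Sharon", "Example University",
    fetch_references=lambda n, i: [],
    filter_and_bucket=lambda refs: {"match": list(refs), "near miss": []},
    generate_urls=lambda n, i: ["http://www.example.edu/~taly/"],
    store_patterns=lambda n, i, urls: stored.append((n, i, urls)),
)
print(result["match"], stored)
```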

11 Ahoy! Search Example

12 Ahoy! Example: Success

13 Ahoy! Example Details

14 Search Engines Results

15 Ahoy! Evaluation
[Charts: Recall, Precision]
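For reference, the two measures can be computed with the standard information-retrieval definitions: precision is the fraction of returned references that are relevant, and recall is the fraction of relevant references that were returned. The sketch below uses made-up inputs and does not reproduce any figures from the evaluation.

```python
def precision_recall(returned, relevant):
    """Standard definitions; empty inputs yield 0.0 rather than dividing by zero."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four references returned, two of them among the three relevant ones.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision, recall)
```

Here precision is 2/4 and recall is 2/3.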

16 ILA - Internet Learning Agent
Translation problem: how to interpret the source response in terms of the agent's internal concepts?
Search engines can’t understand the information contained in the returned source response.
ILA, as a learning agent, parses the response and uses heuristics to learn its format and data fields.
ILA uses learning by comparison.
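Learning by comparison can be illustrated with a toy example: the agent already knows some facts about a few people, queries a source about them, and compares the response fields with what it knows to guess which field carries which concept. The sketch below is a simplified illustration of that idea, not ILA's actual algorithm; the substring test and the function names are assumptions.

```python
def hypotheses(known, fields):
    """All (attribute, field index) pairs where the known value appears in the field."""
    return {
        (attr, i)
        for attr, value in known.items()
        for i, field in enumerate(fields)
        if value.lower() in field.lower()
    }


def learn_mapping(examples):
    """Keep only the hypotheses that hold for every (known facts, response) example."""
    consistent = None
    for known, fields in examples:
        h = hypotheses(known, fields)
        consistent = h if consistent is None else consistent & h
    return sorted(consistent or [])


examples = [
    ({"login": "jdoe", "full_name": "Jane Doe"},
     ["jdoe", "x", "Jane Doe", "/home/jdoe"]),
    ({"login": "rroe", "full_name": "Richard Roe"},
     ["rroe", "x", "Richard Roe", "/u/grad/richard"]),
]
print(learn_mapping(examples))  # [('full_name', 2), ('login', 0)]
```

The first example alone would also suggest that field 3 holds the login; comparing against the second example removes that hypothesis.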

17 /etc/passwd - Sample
daemon:*:1:1:Mr Background:/:/dev/null
sys:*:2:2::/:/bin/true
bin:*:3:3::/bin:/bin/true
gibuy:bncKACcgNpmFA:49:3:,,,,:/u/opers/gibuy:/bin/tcsh
ariel:zNdAzJUj2G6vs:105:100:Ariel J. Frank,CS 019,035318407,03749454,:/u/opers/ariel:/bin/tcsh
taly:pxEi5OQD/4N3E:1991:180:Sharon Taly:/u/grad/taly:/bin/tcsh
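Each line of a passwd file has seven colon-separated fields: login name, password, user ID, group ID, comment (GECOS), home directory, and shell. A minimal parsing sketch follows; the `PasswdEntry` type and `parse_passwd_line` function are names chosen for this example.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    login: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line):
    """Split one passwd line into its seven fields; raises ValueError if malformed."""
    login, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(login, password, int(uid), int(gid), gecos, home, shell)


entry = parse_passwd_line("daemon:*:1:1:Mr Background:/:/dev/null")
print(entry.login, entry.uid, entry.gecos, entry.shell)
```

This is the kind of semi-structured record whose field meanings a learning agent would have to discover rather than be told.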

