
Enhancing Internet Search Engines to Achieve Concept-Based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum ’99, May 5-6, 1999.


1 Enhancing Internet Search Engines to Achieve Concept-Based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum ’99, May 5-6, 1999

2 Agenda Information on the Internet. Boolean Retrieval Model and the Internet. Concept-Based Retrieval (RUBRIC / CS3). CS3 and Boolean Search Engines. Future Work.

3 Information on the Internet Large volume. Rapid growth rate. Wide variations in quality and type.

4 Boolean Retrieval Model and the Internet Most Internet search engines are based on the Boolean Retrieval Model. The Boolean Retrieval Model is relatively easy to implement. Limitations: – Inability to assign weights to query or document terms. – Inability to rank retrieved documents. – Naïve users have difficulty using Boolean queries.
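To make the first two limitations concrete, here is a minimal sketch of Boolean retrieval over a toy inverted index. The documents, the `build_index` and `postings` helpers, and the query are hypothetical illustrations, not taken from any engine named in this talk. A document either matches or it does not, so the result is an unordered set with no term weights and no ranking.

```python
# Minimal sketch: Boolean retrieval over a toy inverted index.
# The documents and query are hypothetical examples, not data from the talk.
from typing import Dict, Set

docs: Dict[int, str] = {
    1: "hydrogeology of shallow aquifers",
    2: "dnapl migration in groundwater",
    3: "colloid transport models",
}

def build_index(collection: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased term to the set of document ids containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

index = build_index(docs)

def postings(term: str) -> Set[int]:
    return index.get(term.lower(), set())

# (hydrogeology OR dnapl) AND NOT colloid, as set union and difference.
result = (postings("hydrogeology") | postings("dnapl")) - postings("colloid")

# Every match is equally "relevant"; sorting by id is the only order available.
print(sorted(result))  # [1, 2]
```

Nothing in the result says whether document 1 or document 2 is the better answer; that missing signal is what the concept-based approach below sets out to supply.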

5 Concept-Based Retrieval Addresses the shortcomings of the Boolean Retrieval Model. Search requests are specified in terms of concepts structured as rule-base trees.

6 Development of Rule-Base Trees (General) Top-down refinement strategy. Support for AND / OR relationships. Support for user-defined weights.
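The three properties above (top-down refinement, AND/OR relationships, user-defined weights) suggest a simple tree data structure. The following is a minimal sketch under stated assumptions: the `Node`/`Edge` classes, the `refine` and `outline` helpers, the node labels, and every weight are hypothetical, chosen only to mirror the example Boolean request on slide 21. It is not the RUBRIC or CS3 representation.

```python
# Minimal sketch of a rule-base tree: internal nodes are AND/OR concepts,
# leaves are terms, and each edge carries a user-defined weight.
# Class names, labels, and weights are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    op: Optional[str] = None  # "AND", "OR", or None for a leaf term
    children: List["Edge"] = field(default_factory=list)

    def refine(self, child: "Node", weight: float = 1.0) -> "Node":
        """Top-down refinement: attach a sub-concept or term under this concept."""
        self.children.append(Edge(weight, child))
        return child

@dataclass
class Edge:
    weight: float  # user-defined weight, assumed to lie in [0, 1]
    child: Node

def outline(node: Node, depth: int = 0, weight: float = 1.0) -> List[str]:
    """Render the tree as indented lines, one per node."""
    tag = f" [{node.op}]" if node.op else ""
    lines = [f"{'  ' * depth}{node.label}{tag} (w={weight})"]
    for edge in node.children:
        lines += outline(edge.child, depth + 1, edge.weight)
    return lines

# Refine a topic step by step, from the general concept down to terms.
root = Node("hydrogeology", "OR")
root.refine(Node("hydrogeology"), 1.0)
root.refine(Node("dnapl"), 0.8)
colloids = root.refine(Node("colloid transport", "AND"), 0.6)
colloids.refine(Node("colloid*"), 1.0)
colloids.refine(Node("environmental transport"), 0.9)

print("\n".join(outline(root)))
```

Printing the outline gives six lines, with the AND concept `colloid transport` indented under the OR root and its two terms indented one level further.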

7

8 Development of Rule-Base Trees (CS3) Concept-Set Structuring System (CS3). CS3 supports the creation, storage, and modification of user-defined concepts. Post-processing of the results of sub-queries. CS3 user interface.

9 CS3 User Interface

10 Evaluation of Rule-Base Trees (RUBRIC) Run-time, bottom-up analysis. Propagation of weight values (MIN / MAX). Disadvantage of run-time analysis.
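A minimal sketch of run-time, bottom-up evaluation with MIN/MAX propagation follows. The exact RUBRIC scoring rules are not given on the slide, so this assumes one common reading: a leaf scores 1.0 if its term occurs in the document and 0.0 otherwise, an edge multiplies its child's value by the edge weight, AND takes the minimum, and OR takes the maximum. The tuple encoding of the tree, the `evaluate` function, and all weights are hypothetical. Note that the whole tree is re-evaluated for every document at search time.

```python
# Minimal sketch of RUBRIC-style run-time, bottom-up evaluation.
# Assumptions (not taken from the slides): leaf = 1.0 if the term occurs,
# else 0.0; an edge multiplies its child's value by the edge weight;
# AND takes the MIN and OR takes the MAX of the weighted child values.
from typing import List, Set, Tuple, Union

# A tree is a term (str) or (op, [(weight, subtree), ...]); hypothetical encoding.
Tree = Union[str, Tuple[str, List[Tuple[float, "Tree"]]]]

def evaluate(tree: Tree, doc_terms: Set[str]) -> float:
    """Score one document against a rule-base tree."""
    if isinstance(tree, str):
        # Wildcards such as "colloid*" are matched literally in this sketch.
        return 1.0 if tree in doc_terms else 0.0
    op, edges = tree
    scores = [w * evaluate(child, doc_terms) for w, child in edges]
    return min(scores) if op == "AND" else max(scores)

hydrogeology: Tree = ("OR", [
    (1.0, "hydrogeology"),
    (0.8, "dnapl"),
    (0.6, ("AND", [(1.0, "colloid*"), (0.9, "environmental transport")])),
])

print(round(evaluate(hydrogeology, {"dnapl", "groundwater"}), 2))                # 0.8
print(round(evaluate(hydrogeology, {"colloid*", "environmental transport"}), 2))  # 0.54
```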

11

12 Evaluation of Rule-Base Trees (CS3) Static, bottom-up analysis. Construct Minimal Term Set (MTS). Propagation of terms. CS3 user interface.

13 MTS (Minimal Term Set) An MTS for a topic is a set of terms such that if every term in the set appears in a document, the document receives an RSV (retrieval status value) greater than 0; otherwise, its RSV is 0. A topic can have more than one MTS. A user can choose among these MTSs to perform a search suited to his or her needs.
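The static, bottom-up "propagation of terms" from slide 12 can be sketched as follows. This is one plausible construction, not necessarily the CS3 algorithm: an OR node passes up each child's MTSs unchanged, an AND node merges one MTS from every child, and supersets are dropped so that only minimal sets remain. Attaching a weight to each MTS (using the same MIN/MAX-with-edge-weights rule as a run-time evaluation) is an assumption, as are the `minimal_term_sets` name, the tuple tree encoding, and the weights.

```python
# Minimal sketch: build Minimal Term Sets (MTSs) by static, bottom-up
# propagation of terms through an AND/OR rule-base tree.
# Assumptions: OR passes up each child's MTSs; AND merges one MTS from every
# child; an MTS's weight follows the same MIN/MAX-with-edge-weights rule as a
# run-time evaluation would. The tree encoding and weights are hypothetical.
from itertools import product
from typing import FrozenSet, List, Tuple

MTS = Tuple[FrozenSet[str], float]  # (term set, weight)

def minimal_term_sets(tree) -> List[MTS]:
    if isinstance(tree, str):  # a leaf term is its own MTS
        return [(frozenset([tree]), 1.0)]
    op, edges = tree
    child_sets = [[(terms, w * v) for terms, v in minimal_term_sets(child)]
                  for w, child in edges]
    if op == "OR":  # any single child's MTS is enough
        candidates = [mts for sets in child_sets for mts in sets]
    else:  # AND: every child must be satisfied, so combine one MTS per child
        candidates = [(frozenset().union(*(t for t, _ in combo)),
                       min(v for _, v in combo))
                      for combo in product(*child_sets)]
    # Keep only minimal sets: drop any term set that strictly contains another.
    return [(t, v) for t, v in candidates
            if not any(other < t for other, _ in candidates)]

hydrogeology = ("OR", [
    (1.0, "hydrogeology"),
    (0.8, "dnapl"),
    (0.6, ("AND", [(1.0, "colloid*"), (0.9, "environmental transport")])),
])

for terms, weight in minimal_term_sets(hydrogeology):
    print(sorted(terms), round(weight, 2))
# ['hydrogeology'] 1.0
# ['dnapl'] 0.8
# ['colloid*', 'environmental transport'] 0.54
```

Because this runs once per rule base rather than once per document, the term sets can be computed ahead of time and offered to the user as alternative searches. In this hypothetical tree, the three MTSs line up with the three disjuncts of the Boolean request on slide 21.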

14

15

16

17

18 Concept-Based Retrieval and Boolean Search Engines CS3 is designed to interface with existing Boolean search engines, such as the U.S. Department of Energy’s “Information Bridge” search engine and the U.S. Department of Transportation’s “National Transportation Library” search engine.

19 System Architecture Components: client (Java applet); CORBA; server (Java); CGI; server (Java/C++); JDBC; Oracle; DOE InfoBridge … etc.

20 Information Bridge and CS3 Search request: Boolean vs. concept. Output: non-ranked vs. ranked. Calculation of RSV: – Given a document D and a set S of MTS expressions satisfied by D, the RSV of D equals the sum of all the weights in S plus the maximum weight in S.
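The RSV rule on this slide, RSV(D) = Σ_{s∈S} w(s) + max_{s∈S} w(s), is simple enough to show directly. In this minimal sketch the MTS weights, the documents, and the `rsv` helper are hypothetical; "satisfied" is taken to mean that every term of the MTS occurs in the document, following the MTS definition on slide 13. Documents with RSV 0 are left out, and the rest come back ranked, which is the "Non-Ranked vs. Ranked" difference above.

```python
# Minimal sketch of the RSV rule on this slide:
#   RSV(D) = sum of the weights in S + max weight in S,
# where S is the set of MTS expressions that document D satisfies.
# The MTS weights and documents below are hypothetical.
from typing import Dict, FrozenSet, List, Set, Tuple

mts_list: List[Tuple[FrozenSet[str], float]] = [
    (frozenset({"hydrogeology"}), 1.0),
    (frozenset({"dnapl"}), 0.8),
    (frozenset({"colloid*", "environmental transport"}), 0.54),
]

def rsv(doc_terms: Set[str]) -> float:
    # An MTS is satisfied when all of its terms occur in the document.
    satisfied = [w for terms, w in mts_list if terms <= doc_terms]
    if not satisfied:
        return 0.0
    return sum(satisfied) + max(satisfied)

docs: Dict[str, Set[str]] = {
    "d1": {"hydrogeology", "dnapl"},
    "d2": {"colloid*", "environmental transport"},
    "d3": {"dnapl", "groundwater"},
    "d4": {"groundwater"},
}

# Ranked output: documents with RSV > 0, highest first.
scores = {d: rsv(terms) for d, terms in docs.items()}
ranked = sorted((d for d in scores if scores[d] > 0), key=scores.get, reverse=True)
print([(d, round(scores[d], 2)) for d in ranked])
# [('d1', 2.8), ('d3', 1.6), ('d2', 1.08)]
```

A document that satisfies a single MTS of weight w scores 2w, so the max term rewards the strongest individual match on top of the accumulated evidence from all satisfied MTSs.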

21 Information Bridge and CS3 (Example) Boolean search request (“Environmental Science Network” form): – (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)). Concept (CS3): – “Hydrogeology”. – Rule-base tree.
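One way to relate the two requests on this slide is through MTSs. The sketch below assumes, hypothetically, that the “Hydrogeology” rule base reduces to the three MTSs listed (the actual tree appears only as an image on slide 22). Each MTS becomes a Boolean sub-query with its terms ANDed, and ORing the sub-queries reproduces the Boolean request above. The `to_boolean` helper is hypothetical and is not an Information Bridge API.

```python
# Minimal sketch: express a concept's MTSs as Boolean sub-queries, one per MTS
# (terms ANDed), and OR them into a single request for a Boolean engine.
# The quoting style imitates the slide; the helper name is hypothetical.
from typing import FrozenSet, List

def to_boolean(terms: FrozenSet[str]) -> str:
    quoted = [f'"{t}"' for t in sorted(terms)]
    return quoted[0] if len(quoted) == 1 else "(" + " AND ".join(quoted) + ")"

mts_list: List[FrozenSet[str]] = [
    frozenset({"Hydrogeology"}),
    frozenset({"Dnapl"}),
    frozenset({"Colloid*", "Environmental Transport"}),
]

sub_queries = [to_boolean(terms) for terms in mts_list]
combined = "(" + " OR ".join(sub_queries) + ")"
print(combined)
# ("Hydrogeology" OR "Dnapl" OR ("Colloid*" AND "Environmental Transport"))
```

Sending each sub-query separately, rather than only the combined request, would let a client know which MTSs each returned document satisfied, which is the input the RSV calculation on slide 20 needs; this is in the spirit of the post-processing of sub-query results listed for CS3.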

22 CS3 Hydrogeology Rule Base

23 CS3 search results

24 Current and Future Work Conduct experiments to evaluate effectiveness (future). Investigate alternative methods to compute RSVs [KADR00, KDR01*]. Learn edge weights through relevance feedback [KR00]. Thesaurus-based rule-base generation [KLR00].

25 Relevant URLs www.cacs.usl.edu/~linc-projects/cs3/ [LJRT99*]; Raghavan home page; Publications since 1991

