The Development of a search engine & Comparison according to algorithms 20032017 Sung-soo Kim The final report.


1 The Development of a search engine & Comparison according to algorithms 20032017 Sung-soo Kim The final report

2 Contents
- Topic
- Development environment
- Procedure
- Retrieval system design
- Comparing performance
- Conclusion
- Future work
- Reference

3 Topic
Design an information retrieval system and use it to compare the performance of retrieval methods: vector modeling, boolean retrieval, and natural-language queries.

4 Development environment
- OS: Red Hat Linux
- System: Pentium 2.4 GHz, Windows XP
- Language: C, compiled with gcc
- Interface: runs from the console command line

5 Procedure
- Extract the position of the text information from the raw files.
- Extract the keywords (index terms) from the text.
- Build the index file.
- Gather and sort the index files.
- Look up index information.
- Boolean retrieval.
- Natural-language retrieval using the vector model.
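The indexing steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the C used for the original system, and it is not the author's implementation: the function names `extract_keywords` and `build_index`, the tokenization rule, and the sample documents are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(text):
    """Lowercase the text and split it into alphanumeric keywords (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build a sorted inverted index: term -> {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        # Count how often each keyword occurs in this document.
        for term, freq in Counter(extract_keywords(text)).items():
            postings[term][doc_id] = freq
    # Gather the per-document entries and sort them by term.
    return {term: postings[term] for term in sorted(postings)}


# Hypothetical sample documents, used only for illustration.
docs = {
    "d1": "Boolean retrieval matches query terms exactly.",
    "d2": "Vector retrieval ranks documents by similarity to the query.",
}
index = build_index(docs)
print(index["retrieval"])  # {'d1': 1, 'd2': 1}
print(index["boolean"])    # {'d1': 1}
```

Both retrieval methods in the last two steps can then work from the same structure: boolean retrieval only needs to know which documents contain a term, while vector retrieval also uses the stored frequencies.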

6 Retrieval system design (1)

7 Retrieval system design (2)

8 Comparing performance (1)
$\mathrm{SIM}(D_i, D_j) = \sum_{k=1}^{t} W_{ik} W_{jk}$, where the weights $W_{ik}$ are simple frequency counts.
The problem with this simple measure is that it is not normalized to account for differences in document length.
- This can be corrected by dividing each frequency count by the length of the document.
- It can also be corrected by dividing each frequency count by the maximum frequency count for the document.
Additional normalization is often performed to force all similarity values into the range between 0 and 1.
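The effect of the two corrections can be illustrated with a small Python sketch. It assumes that the simple measure is the inner product of term-frequency vectors, that "length of the document" means its total term count, and that the additional normalization to the range 0 to 1 is the cosine measure; the function names and sample vectors are hypothetical.

```python
import math


def inner_product(a, b):
    """Unnormalized similarity: sum of products of matching term weights."""
    return sum(w * b.get(term, 0) for term, w in a.items())


def normalize_by_length(freqs):
    """Divide each count by the document length (assumed: total term count)."""
    total = sum(freqs.values())
    return {term: f / total for term, f in freqs.items()}


def normalize_by_max(freqs):
    """Divide each count by the largest count in the document."""
    peak = max(freqs.values())
    return {term: f / peak for term, f in freqs.items()}


def cosine(a, b):
    """Inner product divided by both vector norms; 0..1 for non-negative weights."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


query = {"search": 1, "engine": 1}
short_doc = {"search": 1, "engine": 1}
long_doc = {"search": 10, "engine": 10}  # same content, ten times longer

# Raw counts favor the longer document even though the content is the same.
print(inner_product(query, short_doc), inner_product(query, long_doc))  # 2 20

# After either normalization the two documents score the same.
print(inner_product(query, normalize_by_length(short_doc)),
      inner_product(query, normalize_by_length(long_doc)))  # 1.0 1.0
print(inner_product(query, normalize_by_max(short_doc)),
      inner_product(query, normalize_by_max(long_doc)))  # 2.0 2.0
```

Note that dividing by the maximum frequency removes the length bias but does not by itself bound the score by 1, which is why a further normalization such as `cosine` is often applied.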

9 Comparing performance (2)

10 Comparing performance (3)

11 Comparing performance (4)
However, we used the following different equations:
- Similarity: SIM(Di,Dj) =
- Weighted value for an index term in a document:
- Weighted value for the query:

12 Running the system (indexing)

13 Running the system (boolean)

14 Running the system (natural_query)

15 Conclusion
Boolean:
- Easy for the user to compose and for the computer to process.
- Cannot rank documents by similarity.
- Finds only documents that exactly match the user's query.
Vector:
- Calculates the similarity between the query and each document's index terms.
- Can retrieve documents that satisfy a similarity threshold defined by the user.
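The contrast drawn in this conclusion can be made concrete with a short Python sketch. It is an illustration under assumptions, not the system described in the slides: the boolean query is treated as an AND of all terms, the vector score is cosine similarity over raw term frequencies, and the documents, function names, and thresholds are hypothetical.

```python
import math

# Hypothetical term-frequency vectors for three documents.
docs = {
    "d1": {"search": 2, "engine": 1},
    "d2": {"search": 1, "index": 3},
    "d3": {"semantic": 2, "web": 2},
}


def boolean_and(docs, terms):
    """Return the documents that contain every query term; no ranking."""
    return sorted(d for d, freqs in docs.items()
                  if all(t in freqs for t in terms))


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def vector_search(docs, query, threshold):
    """Rank documents by similarity and keep those at or above the threshold."""
    scored = sorted(((cosine(query, f), d) for d, f in docs.items()),
                    reverse=True)
    return [d for score, d in scored if score >= threshold]


query = {"search": 1, "engine": 1}
print(boolean_and(docs, query))         # ['d1']
print(vector_search(docs, query, 0.2))  # ['d1', 'd2']
```

Here the boolean query misses `d2` because it lacks "engine", while the vector search still returns it, ranked below `d1`, as long as the user-defined threshold is low enough.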

16 Future work
- Both boolean and natural_query retrieval have limits, because they are based on structural concepts (string matching).
- More recent approaches are semantic rather than structural, as in the so-called Semantic Web.

17 Reference
- Lee, J.H. (1995), Combining Multiple Evidence from Different Properties of Weighting Schemes, ACM SIGIR Conference on Research and Development in Information Retrieval.
- Harman, D. (1993), Overview of the 1st Text Retrieval Conference, Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
- http://blue.skhu.ac.kr/~mckim/Lecture/IR/Note/hwork.html

