Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights Kwangcheol Shin, Sang-Yong Han School of CSE, Chung-Ang Univ. Seoul, Korea NLDB.


1 Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights Kwangcheol Shin, Sang-Yong Han School of CSE, Chung-Ang Univ. Seoul, Korea NLDB 2004

2 Contents: MEDLINE and MeSH; Vector Space Model; Modulating MeSH Term Weights; Experimental Results; Conclusion

4 MEDLINE and MeSH
MEDLINE is the premier bibliographic database of the National Library of Medicine (NLM).
Medical Subject Headings (MeSH) is the authority list of controlled vocabulary terms used for subject analysis of biomedical literature at NLM.
Expert annotators at the NLM assign MeSH keywords to each MEDLINE document for effective retrieval.
Manual annotation with MeSH terms is a distinctive feature of MEDLINE.
MEDLINE is supplied with its own Boolean model-based search engine.

6 Vector Space Model
In the VSM, documents are represented as vectors whose coordinates are proportional to the number of occurrences of each term.
The similarity between two vectors is measured using the cosine measure.
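The cosine measure mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming raw term counts as coordinates and whitespace tokenization; the helper names `term_vector` and `cosine` are hypothetical and are not taken from the presentation.

```python
import math
from collections import Counter


def term_vector(text):
    """Map each lowercase token to its number of occurrences."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors (dict-like)."""
    # Dot product over the terms of u; terms absent from v contribute 0.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Toy document and query (illustrative text, not from the CF collection).
doc = term_vector("cystic fibrosis mucus calcium mucus")
query = term_vector("calcium mucus")
score = cosine(doc, query)
```

Because the measure divides by both vector lengths, a long document is not favored merely for repeating terms more often.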

7 Suggested Method
We show that applying a Vector Space Model-based search engine to MEDLINE data gives much better results than the Boolean-based one.
More importantly, balancing the weights of the manually assigned MeSH keywords and the text words further improves the quality of the results.

9 Modulating MeSH Term Weights
MEDLINE documents contain MeSH keywords.
Our idea is to increase the weights of MeSH terms in each document's vector.
MJ BONE-DISEASES-DEVELOPMENTAL: co. CYSTIC-FIBROSIS: co. DWARFISM: co.
MN CASE-REPORT. CHILD. FEMALE. HUMAN. SYNDROME.
AB Taussig et al reported a case of a 6-year-old boy with the Russell variant of the Silver-Russell syndrome concomitant with cystic fibrosis. We would like to describe another patient who...

10 Modulating MeSH Term Weights
We use the following procedure:
1. Assign the weights w_ij as in the vector space model.
2. Use a formula to increase the weights of the MeSH terms, where ρ is a parameter regulating the sensitivity of the formula to the MeSH terms.
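The two-step procedure can be illustrated with a short sketch. The slide's actual formula is not preserved in this transcript, so the code below assumes a simple multiplicative boost, w_ij * (1 + ρ), followed by renormalization. That boost is a hypothetical stand-in chosen only to show the role of ρ, and it should not be read as the authors' formula. The function name `boost_mesh_weights` and the toy weights are likewise assumptions.

```python
import math


def boost_mesh_weights(weights, mesh_terms, rho):
    """Return a new document vector with MeSH term weights raised.

    ASSUMPTION: the multiplicative boost w * (1 + rho) stands in for the
    formula on the slide, which this transcript does not preserve.
    """
    boosted = {
        term: w * (1.0 + rho) if term in mesh_terms else w
        for term, w in weights.items()
    }
    # Renormalize to unit length so the boost shifts relative weight
    # toward MeSH terms instead of inflating the whole vector.
    norm = math.sqrt(sum(w * w for w in boosted.values()))
    if norm == 0.0:
        return boosted
    return {term: w / norm for term, w in boosted.items()}


# Step 1 (assumed already done): ordinary vector space model weights.
doc_weights = {"cystic-fibrosis": 0.5, "dwarfism": 0.5, "patient": 0.5, "boy": 0.5}
mesh = {"cystic-fibrosis", "dwarfism"}

# Step 2: rho = 0 leaves the vector unchanged; larger rho favors MeSH terms.
plain = boost_mesh_weights(doc_weights, mesh, rho=0.0)
boosted = boost_mesh_weights(doc_weights, mesh, rho=1.0)
```

In this sketch, ρ = 0 reproduces the standard vector space model, while a very large ρ makes the text words nearly irrelevant.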

13 Experimental Results - Test Collection
We experimented with the well-known Cystic Fibrosis (CF) reference collection, which is a subset of MEDLINE.
It has 1,239 medical data records and is supplied with 100 queries, each with its relevant documents provided.
QU What are the effects of calcium on the physical properties of mucus from CF patients?
RD 139 1222 151 2211 166 0001 311 0001 370 1010 392 0001 439 0001 440 0011 441 2122 454 0100 461 1121 502 0002 503 1000 505 0001
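The RD field of the sample query can be read programmatically. The sketch below assumes that its tokens alternate between a record number and a four-digit string of relevance scores, one digit per judge. The slide does not state this layout, so treat it as an assumption. The function name `parse_relevant_docs` is hypothetical.

```python
def parse_relevant_docs(rd_field):
    """Split an RD field into (record_number, [scores]) pairs.

    ASSUMPTION: tokens alternate between a record number and a string of
    single-digit relevance scores, one digit per judge.
    """
    tokens = rd_field.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens in the RD field")
    pairs = []
    for record, scores in zip(tokens[0::2], tokens[1::2]):
        pairs.append((int(record), [int(digit) for digit in scores]))
    return pairs


# RD field of the sample query shown on the slide.
rd = ("139 1222 151 2211 166 0001 311 0001 370 1010 392 0001 439 0001 "
      "440 0011 441 2122 454 0100 461 1121 502 0002 503 1000 505 0001")
relevant = parse_relevant_docs(rd)
relevant_ids = [record for record, _ in relevant]
```

A retrieval run for this query could then be scored by checking how many of the top-ranked records appear in `relevant_ids`.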

14 Experimental Results

15 Results: Vector Space Model

16 Results: Boolean vs. Vector

18 Conclusions
The vector space model gives better results than the Boolean model-based system.
Increasing the weights of MeSH terms, as compared with the standard vector space model, improves retrieval accuracy.
The optimal weights are balanced: both MeSH terms and text terms are taken into account.
We get as much as 2.4 times better results than the system currently provided with MEDLINE.

19 Thank you!

