Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research William C. Bartling, D.D.S. NIDCR/NLM Fellow in Dental Informatics Center.


1 Datamining MEDLINE for Topics and Trends in Dental and Craniofacial Research
William C. Bartling, D.D.S., NIDCR/NLM Fellow in Dental Informatics, Center for Biomedical Informatics, University of Pittsburgh
Titus K. L. Schleyer, D.M.D., Ph.D., Director, Center for Dental Informatics, University of Pittsburgh School of Dental Medicine

2 Overview
Goals of project
Retrieving the entire corpus of dental and craniofacial research literature from MEDLINE
Determining the characteristics of a dental research article
Machine learning to extract articles from any body of literature
Methods to categorize dental research literature to study temporal trends
Summary

3 Goals of project
To use computerized methods to determine topics and trends in dental and craniofacial research since 1966.
Determining the structure of such research can help to identify those research areas emerging and those waning.
Identify research funding opportunities?

4 Retrieving the dental literature
MEDLINE chosen as the database
MeSH tree searched manually for dental and craniofacial terms
Many MeSH terms were found in unusual locations in the hierarchy.
Decision to keep or discard term
Search limited to:
–English language
–Journal article
–Abstract present
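The three search limits above can be sketched as a simple record filter. This is only an illustration: the record layout and field names (`language`, `publication_types`, `abstract`) are hypothetical and are not taken from the slides or from any particular MEDLINE export format.

```python
# Hypothetical record layout; the field names are assumptions for illustration.
def passes_limits(record):
    """Return True if a record meets the three limits listed on the slide."""
    return (
        record.get("language") == "eng"                              # English language
        and "Journal Article" in record.get("publication_types", [])  # journal article
        and bool(record.get("abstract", "").strip())                 # abstract present
    )

records = [
    {"pmid": "1", "language": "eng", "publication_types": ["Journal Article"], "abstract": "Caries study."},
    {"pmid": "2", "language": "ger", "publication_types": ["Journal Article"], "abstract": "Text."},
    {"pmid": "3", "language": "eng", "publication_types": ["Letter"], "abstract": "Text."},
    {"pmid": "4", "language": "eng", "publication_types": ["Journal Article"], "abstract": ""},
]
kept = [r["pmid"] for r in records if passes_limits(r)]
```

Only the first toy record satisfies all three limits; the others each fail exactly one of them.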

5 Results of search
~450,000 English language articles in:
–DENTISTRY
–STOMATOGNATHIC SYSTEM (not PHARYNX)
–STOMATOGNATHIC DISEASES (not PHARYNGEAL DISEASES)
~61,000 articles indexed with dental MeSH terms not in above set
~134,000 articles remaining after limiting to journal articles containing abstracts

6 What is a dental research article?
Currently at this phase of project
1000 abstracts randomly chosen, 5 groups of 200 each
15 expert judges
3 judges assigned to each group
Judges categorize each article as:
–Dental or craniofacial research
–Dental or craniofacial, non-research
–Non-dental
–Not sure
Web interface for judging: PHP with MySQL
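The slides do not say how the three judgments per abstract are combined into one label. One possible approach, shown here purely as an assumption, is a strict majority vote that leaves an abstract unlabeled when the judges disagree completely. The function name `consensus` and the short label strings are hypothetical.

```python
from collections import Counter

def consensus(labels):
    """Return the label chosen by a strict majority of judges, or None if there is none.

    Majority voting is an assumption for illustration, not the method
    described in the presentation.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

agreed = consensus(["research", "research", "non-dental"])
split = consensus(["research", "non-research", "not sure"])
```

With three judges, two matching labels are enough for a consensus; a three-way split yields `None` and would need separate handling.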

7

8

9 Differentiation of article categories
Acceptable reliability in each group (κ > 0.70)
Use results of each category to develop training set
Identify Patient Sets (IPS) software
–Developed by Dr. Greg Cooper at University of Pittsburgh CBMI
–Natural language processing used to find patient records of a certain type from free text documents, e.g. hospital admission records
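The slide reports a reliability threshold of 0.70 but does not name the statistic. For three judges assigning nominal categories, one common choice is Fleiss' kappa; the sketch below assumes that statistic and is not confirmed by the presentation. The function name and the input layout (one list of labels per abstract) are hypothetical.

```python
def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items each rated by the same number of raters.

    ratings: list of per-item label lists, e.g. [["a", "a", "b"], ...]
    Raises ZeroDivisionError if every rating falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who put item i in category j
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement for each item, then averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected agreement from the overall category proportions
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

perfect = fleiss_kappa([["a", "a", "a"], ["b", "b", "b"]], ["a", "b"])
```

A value computed this way for each group of 200 abstracts could then be compared against the 0.70 threshold.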

10 IPS creates a document vector for each document or set of documents
[Figure: Document i linked to Word 1 (p 1), Word 2 (p 2), Word 3 (p 3), ..., Word n (p n)]
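A document vector of this general shape can be sketched as a mapping from each word to a value p. The slide does not define p; the sketch below assumes it is the word's relative frequency in the document, which may differ from what IPS actually computes. The tokenizer and function name are also assumptions.

```python
import re
from collections import Counter

def document_vector(text):
    """Map each word to its relative frequency in the text.

    Treating p_i as relative frequency is an assumption; the slide
    only shows that each word carries some value p_i.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

vec = document_vector("Dental caries and dental plaque")
```

In this toy example the word "dental" accounts for two of the five tokens, and the values in the vector sum to one.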

11 IDENTIFY PATIENT SETS (IPS)
Uses machine learning technique of “text classification”
All articles fed into the program
–Select fields (title, abstract, MeSH terms)
Training set:
–2/3 of validated “dental research” articles
Add remaining 1/3 to original set, less the training set
Calculate success of retrieval using model created from training set
Adjust IPS and iterate, or train with more or fewer documents until successful
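The 2/3 training split and the retrieval check can be sketched as follows. The slide does not say how "success of retrieval" is measured; precision and recall are assumed here as a typical choice. Both function names are hypothetical, and the classifier itself (IPS) is not modeled.

```python
import random

def split_two_thirds(items, seed=0):
    """Shuffle items and return (training 2/3, held-out 1/3)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the known relevant set.

    Using these two measures is an assumption, not stated on the slide.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

train, held_out = split_two_thirds(range(9))
precision, recall = precision_recall([1, 2, 3, 4], [2, 3, 5])
```

Here `held_out` plays the role of the validated articles mixed back into the full set, and recall measures how many of them the trained model finds again.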

12 Determining trends and topics in dental and craniofacial research
Entire set of dental research articles used
Knowledge visualization and bibliometric methods
Based on the assumption that articles in a given field are similar to one another (Hearst & Pedersen, 1996)
Similar articles and topics tend to cluster together

13 Bibliometric examples from other fields
Co-word analysis
–Software engineering (Coulter, Monarch, and Konda, 1998)
Co-descriptor analysis
–Information science (McCain, 1995)
Co-author analysis
–Information retrieval literature (Ding et al., 1999)
Co-citation analysis
–Medical informatics literature (Morris & McCain, 1998)

14 Visual methods to categorize literature
Co-occurrence vectors or weights
–Weights based on co-occurrence of terms
Multidimensional scaling
–Display of points in two or three dimensions
–Points closer together on matrix when articles are more similar
Clustering
–Groups of points in close proximity to each other are bounded to provide an intellectual grouping
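The first step above, co-occurrence weights, can be sketched with plain counting, together with a cosine similarity that could feed a later scaling or clustering step. This is a minimal illustration under assumptions: the term lists are invented, the function names are hypothetical, and multidimensional scaling and clustering themselves are not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cooccurrence(articles):
    """Count how often each unordered pair of terms appears in the same article."""
    pairs = Counter()
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Invented toy data: each article is a list of index terms.
articles = [
    ["Dental Caries", "Fluorides"],
    ["Dental Caries", "Fluorides", "Child"],
    ["Periodontitis", "Child"],
]
pairs = cooccurrence(articles)
```

A dissimilarity such as `1 - cosine(u, v)` between term or article vectors is one possible input for multidimensional scaling, so that more similar items land closer together.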

15 Medical Informatics Structure

16 How do we cluster dental research?
Entire text of abstracts
MeSH terms only
–Major headings
–Subheadings
–All MeSH headings
Journal titles
Combinations of the above

17 Once clustering is done:
Cluster dental research within certain time periods (5 years)
Determine quantities of articles published for each cluster within each time period
Cluster including only journals with a given impact factor threshold
Study changes over time of different categories of research
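Counting articles per cluster within 5-year periods can be sketched as a simple binning step. The 1966 start year comes from the project goals; aligning the bins on that year, the cluster labels, and the function names are assumptions for illustration.

```python
from collections import Counter

def period_start(year, first_year=1966, width=5):
    """Return the first year of the fixed-width period containing `year`."""
    return first_year + ((year - first_year) // width) * width

def counts_by_period(articles):
    """Count articles per (period start, cluster); articles are (year, cluster) pairs."""
    return Counter((period_start(year), cluster) for year, cluster in articles)

# Invented toy data: publication year and a hypothetical cluster label.
counts = counts_by_period([
    (1966, "caries"),
    (1969, "caries"),
    (1972, "caries"),
    (1968, "implants"),
])
```

Comparing the counts for one cluster across successive periods gives a first view of whether that research area is growing or shrinking.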

18 Summary
A comprehensive content analysis of the dental and craniofacial research literature has not been done.
Computerized methods can help to retrieve and categorize this literature.
Study of trends in dental research can help researchers to identify relevance of current studies and possibly reveal future research opportunities.

19 Many thanks to the following:
Amy Gregg, MLIS, Dental Reference Librarian, Falk Library for the Health Sciences, University of Pittsburgh
Shyam Visweswaran, MD, NLM Fellow in Intelligent Systems, Center for Biomedical Informatics, University of Pittsburgh
All of my expert raters!
This research is supported with a training grant from the National Institute of Dental and Craniofacial Research and the National Library of Medicine.

