Codifying Semantic Information in Medical Questions Using Lexical Sources
Paul E. Pancoast, Arthur B. Smith, Chi-Ren Shyu


2 Research Purpose
- To find a method for classifying medical questions that are asked by clinicians
- Hypothesis – simply indexing by keywords isn't enough to distinguish questions with different meanings but similar wording, or to group questions with similar meanings but different words.

3 Definitions
- Semantic information – the meaning of the words
- Syntactic information – the parts of speech of the words (word type, sentence part)
- Medical question – a question asked by a clinician
- Lexical sources – sources of words and vocabularies
- UMLS – Unified Medical Language System

4 UMLS
- Ambitious project of the National Library of Medicine (NLM), begun in 1986
- Helps researchers retrieve and integrate electronic biomedical information from a variety of sources
- Links over 100 controlled vocabularies
- Assigns unique identifiers to medical concepts and strings
- Maps the hierarchical relationships between medical concepts

5 Why Bother? (Why classify medical questions?)
- Clinicians have questions when treating patients
- Researchers have gathered collections of these questions
- No good method exists to classify the questions, which leaves open:
  - How many times has a particular question been asked?
  - Which questions should receive priority for evidence-based answers?

6 Examples
- What is the best way to treat acute pharyngitis?
- How should I approach a patient with a sore throat?
- What should I do with a patient with diabetes and insulin resistance?
- What should I do with a patient with diabetes who is resistant to taking insulin?
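The keyword-indexing hypothesis can be made concrete with these two pairs of questions. The Python sketch below is our illustration, not part of the study: it scores keyword overlap with the Jaccard coefficient after removing a small stop list. The stop list and the function names `keywords` and `jaccard` are hypothetical choices made for this example.

```python
import re

# Hypothetical stop list for this illustration; the slides do not define one.
STOP_WORDS = frozenset({
    "what", "is", "the", "to", "how", "should", "i", "a", "with",
    "do", "and", "who",
})


def keywords(question: str) -> set[str]:
    """Lower-cased word tokens of a question, minus the stop list."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(q1: str, q2: str) -> float:
    """Keyword overlap of two questions: |shared| / |combined|, 0.0 to 1.0."""
    a, b = keywords(q1), keywords(q2)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Similar meaning, different words: no keywords in common (score 0.0).
SAME_MEANING = (
    "What is the best way to treat acute pharyngitis?",
    "How should I approach a patient with a sore throat?",
)
# Different meaning, similar words: 3 of 6 keywords shared (score 0.5).
DIFFERENT_MEANING = (
    "What should I do with a patient with diabetes and insulin resistance?",
    "What should I do with a patient with diabetes who is resistant to taking insulin?",
)
```

Under this measure the two diabetes questions look more alike than the two sore-throat questions, which is the opposite of their clinical meaning. Stemming "resistance" and "resistant" to a common root would push the diabetes pair's overlap even higher.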

7 Methods: Source Questions
- American researcher – observed clinicians at work
- British researchers – questions sent in by clinicians, answered by researchers
- Australian researchers – questions sent in by clinicians, answered by researchers
- 4,083 questions in total

8 Methods: Source Vocabulary
- MRCON – a table from the UMLS Metathesaurus
  - Lists each medical concept by its Concept Unique Identifier (CUI), together with every string associated with that concept
  - Unique: the string maps to exactly 1 concept
  - Ambiguous: the string maps to 2 or more concepts
    - e.g. COLD – ambient temperature, viral respiratory infection, chronic obstructive lung disease
  - 2,247,454 strings associated with concepts
- Non-medical lexicon – from Roget's Thesaurus
  - Query objects (why, when, how), identifiers (I, you, he), modifiers (soon, frequently)
  - 749 terms in this lexicon
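The unique/ambiguous split can be sketched as grouping string-concept links by string. This Python sketch is an illustration only: it takes already-extracted `(cui, string)` pairs rather than reading the MRCON file itself (whose exact column layout depends on the UMLS release), it assumes strings are compared case-insensitively, and the identifiers in `ROWS` are hypothetical placeholders, not real UMLS CUIs.

```python
from collections import defaultdict


def index_strings(rows):
    """Map each case-folded string to the set of CUIs it is linked to.

    rows: iterable of (cui, string) pairs, one per string-concept link.
    Case folding is an assumption made for this sketch.
    """
    concepts = defaultdict(set)
    for cui, text in rows:
        concepts[text.lower()].add(cui)
    return dict(concepts)


def split_unique_ambiguous(concepts):
    """Unique strings name exactly one concept; ambiguous ones name two or more."""
    unique = {s for s, cuis in concepts.items() if len(cuis) == 1}
    ambiguous = {s for s, cuis in concepts.items() if len(cuis) >= 2}
    return unique, ambiguous


# Hypothetical rows: the identifiers are placeholders, not real UMLS CUIs.
ROWS = [
    ("C_AMBIENT_TEMP", "Cold"),
    ("C_VIRAL_URI", "cold"),
    ("C_COPD", "COLD"),
    ("C_COPD", "chronic obstructive lung disease"),
    ("C_VIRAL_URI", "common cold"),
]
```

With these rows, "cold" links to three concepts and lands in the ambiguous set, mirroring the COLD example on the slide.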

9 String Matching
- Parsing program (written in C)
- Separates each question into 3-word, 2-word and 1-word windows
- Matches each window against MRCON and our lexicon
- Generates a report of:
  - Total number of words parsed
  - Number of matches from the unique, ambiguous and non-medical lists
  - Strings that didn't match any of the lists
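The windowing step can be sketched as follows. This is a Python illustration, not the authors' C program, and several details are assumptions the slide does not state: windows are tried longest first (3, then 2, then 1 word), a matched window consumes its words, the lists are checked in the order unique, ambiguous, non-medical, and tokens are lower-cased alphanumeric runs (keeping decimals such as 5.3). The small lexicons in the usage example are hypothetical.

```python
import re


def match_question(question, unique, ambiguous, non_medical, max_window=3):
    """Greedy longest-window-first matching of a question against three lists.

    Returns (report, unmatched): report counts total words and the words
    matched from each list; unmatched holds 1-word windows no list contained.
    """
    words = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", question.lower())
    report = {"words": len(words), "unique": 0, "ambiguous": 0, "non_medical": 0}
    unmatched = []
    i = 0
    while i < len(words):
        # Try the 3-word window first, then 2 words, then 1 word.
        for size in range(min(max_window, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in unique:
                list_name = "unique"
            elif phrase in ambiguous:
                list_name = "ambiguous"
            elif phrase in non_medical:
                list_name = "non_medical"
            else:
                continue
            report[list_name] += size  # count matched words, not strings
            i += size
            break
        else:
            # No window starting at this word matched any list.
            unmatched.append(words[i])
            i += 1
    return report, unmatched
```

For example, with hypothetical lists `unique={"sore throat", "patient"}` and `non_medical={"how", "should", "i", "a", "with"}`, the question "How should I approach a patient with a sore throat?" yields 10 words: 3 unique, 6 non-medical, and "approach" unmatched.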

10 Results
- String – the individual word or words that matched
- Hits – how often the string was found
- Words – total number of matching words (some strings contain more than one word)

| List            | Strings | Hits   | Words  | % of words |
|-----------------|---------|--------|--------|------------|
| MRCON unique    | 4,534   | 24,844 | 30,186 | 42.3%      |
| MRCON ambiguous | 574     | 9,256  | 9,769  | 13.7%      |
| Non-medical     | 208     | 16,768 | 17,783 | 24.9%      |
| Unmatched       | 2,321   |        | 13,624 | 19.1%      |
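The percentage column can be checked against the Words column. This small Python sketch sums the four Words counts to get the total number of words parsed (71,362; the slide does not state this total, it is derived here) and recomputes each list's share. It assumes the unmatched figure of 13,624 is a word count, which is what the 19.1% implies.

```python
# Words column of the results table (unmatched row read as a word count).
WORDS = {
    "MRCON unique": 30_186,
    "MRCON ambiguous": 9_769,
    "Non-medical": 17_783,
    "Unmatched": 13_624,
}


def word_shares(counts):
    """Total words parsed and each list's share of them, in percent (1 d.p.)."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total, 1) for name, n in counts.items()}


total, shares = word_shares(WORDS)
```

The recomputed shares match the table (42.3, 13.7, 24.9 and 19.1%). Adding the two MRCON rows gives 39,955 of 71,362 words, about 56.0%, which is the MRCON match rate quoted in the Discussion.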

11 Results
- 100 strings occurred 7,850 times, or 57.6% of the total matches
- 712 strings had 3 or more hits, accounting for 85% of all hits
- Our focus was on strings that didn't match one of the source vocabularies
  - 19.1% of words didn't match
  - Hypothesis – additional terms not found in MRCON will be important for indexing
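Both summary figures on this slide (the share of hits from the 100 most frequent strings, and the number and share of strings with 3 or more hits) can be computed from per-string hit counts. The Python sketch below shows one way to do it; the function name `concentration` and the toy counts are hypothetical, and on the study's real counts the slide's analysis would correspond to `top_n=100, min_hits=3`.

```python
from collections import Counter


def concentration(hits, top_n, min_hits):
    """Summarize how concentrated hits are among strings.

    hits: Counter mapping each string to how often it was found.
    Returns (share of all hits from the top_n strings,
             number of strings with at least min_hits hits,
             share of all hits from those strings).
    """
    total = sum(hits.values())
    top_share = sum(n for _, n in hits.most_common(top_n)) / total
    frequent = [s for s, n in hits.items() if n >= min_hits]
    frequent_share = sum(hits[s] for s in frequent) / total
    return top_share, len(frequent), frequent_share


# Hypothetical toy counts, not the study's data.
TOY_HITS = Counter({"is": 10, "should": 6, "taking": 3, "dose": 1})
```

On the toy counts, the top 2 strings carry 16 of 20 hits (0.8), and 3 strings have at least 3 hits, carrying 19 of 20 hits (0.95).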

12 Results: Unmatched words with 2+ occurrences

| Word type    | Unique words | Total number | Percent |
|--------------|--------------|--------------|---------|
| Verb         | 261          | 3,676        | 31.7%   |
| Noun         | 186          | 2,356        | 20.3%   |
| Preposition  | 9            | 2,544        | 21.9%   |
| Adj/Adv/Conj | 103          | 1,095        | 9.5%    |
| Mix*         | 72           | 810          | 7.0%    |
| Pronoun      | 10           | 614          | 5.3%    |
| Integer      | 70           | 502          | 4.3%    |

* Can be more than one word type, depending on the context: attacks, step and process can all be nouns or verbs.

13 Discussion
- MRCON was selected because of its low rate of ambiguous string-CUI combinations
  - 89% unique string matches
  - 11% ambiguous string matches
- Other tables have greater word coverage, but more ambiguity for each word
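The 89% / 11% split is not derived on the slide. It appears to be consistent with the Strings column of the results table (4,534 unique and 574 ambiguous strings); this is our inference, since the same ratio over the Words column would give about 76% unique:

```latex
\frac{4534}{4534 + 574} = \frac{4534}{5108} \approx 0.888 \;(\approx 89\%),
\qquad
\frac{574}{5108} \approx 0.112 \;(\approx 11\%)
```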

14 Discussion
- Our word-matching results were similar to those of other researchers
  - Cimino matched 43% of words with Meta-1 (we had 56% MRCON matches). Computers & Biomedical Research. Aug 1992;25(4):366-373.
  - Hersh matched 60% of words to a medical terminology & names dictionary (we had 79% combined lexicon matches). Proceedings/AMIA Annual Fall Symposium. 1997.

15 Discussion
- Stop words (prepositions, conjunctions, pronouns) are commonly removed by most normalization tools
- But they provide valuable contextual information:
  - Blood FOR an HIV-positive patient vs. Blood FROM an HIV-positive patient
  - Aspirin AND warfarin vs. Aspirin OR warfarin
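The loss of context is easy to reproduce. In this Python sketch, a normalizer that drops a small stop list (a hypothetical list chosen for illustration, not that of any particular tool) maps each contrasting pair from the slide to the same token sequence. Keeping the stop words preserves the distinction.

```python
import re

# Hypothetical stop list of the kind normalization tools remove.
STOP_WORDS = frozenset({"a", "an", "the", "for", "from", "and", "or"})


def normalize(text, drop_stop_words=True):
    """Lower-case and tokenize (keeping hyphenated words); optionally drop stop words."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tuple(tokens)
```

For example, `normalize("Blood FOR an HIV-positive patient")` and `normalize("Blood FROM an HIV-positive patient")` both return `("blood", "hiv-positive", "patient")`, so an index built on them cannot tell a transfusion question from a specimen question.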

16 Discussion
- Integers
  - 186 distinct integers or integer-word combinations
  - Occurred 647 times
  - Provide additional modification of concepts
    - Hyperkalemia – 5.3 mEq/L & 8.7 mEq/L
    - Both are hyperkalemia, but the evaluation and management are markedly different
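One way to keep this information is to extract each number together with the unit that follows it, so the values can later be attached to the concept they modify. The Python sketch below is an illustration under a simplifying assumption: any word directly after a number is treated as its unit. The pattern `QUANTITY` and the function `quantities` are hypothetical names.

```python
import re

# Assumed pattern: a number, optionally followed by a unit-like token
# such as "mEq/L" or "mg". Any word right after a number is taken as its unit.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+(?:/[A-Za-z]+)?)?")


def quantities(question):
    """Return (value, unit) pairs; unit is None when no word follows the number."""
    return [(float(value), unit or None) for value, unit in QUANTITY.findall(question)]
```

For example, `quantities("Hyperkalemia of 5.3 mEq/L versus 8.7 mEq/L")` returns `[(5.3, "mEq/L"), (8.7, "mEq/L")]`, so two questions about hyperkalemia at these levels would no longer index identically.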

17 Discussion
- Verbs – the largest category of unmatched words
  - Include action and relation concepts
  - The non-medical lexicon contained some: treats, attends, increases, lessens, reduce, follows, starts, can, should, is, equal, improve
- Verb tense changes the meaning of a question
  - In a patient TAKING antibiotics
  - In a patient who TOOK antibiotics

18 Discussion
- Verbs may be conceptually related to medical concepts
  - Diagnose => Diagnosis
  - Treat => Treatment
  - Evaluate => Evaluation
  - Prescribe => Prescription
- In these cases the verb (relationship) is not equivalent to the noun (concept)

19 Summary
- We developed an application to:
  - Parse individual words from collections of medical questions
  - Match the words (phrases) with lexical sources codified by the UMLS
- Our results were better than those of previous investigators (for percentage of matched words)
- We still have some work to do…

20 Related Experiments
- We attempted to cluster questions by sequences of semantic types
- Initial attempts mostly clustered common phrases such as "How should I" and "What is the"
- We may repeat this method after discarding 'stop phrases'
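A simple way to find 'stop phrases' is to treat any n-word phrase that appears in a large share of questions as one and remove it before clustering. This Python sketch works on raw words rather than the semantic-type sequences used in the experiment, and the document-frequency cutoff (`min_share`), the phrase length `n=3`, and the sample questions are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def find_stop_phrases(questions, n=3, min_share=0.5):
    """n-word phrases that occur in at least min_share of the questions."""
    doc_freq = Counter()
    for q in questions:
        tokens = tokenize(q)
        # A set, so each phrase counts at most once per question.
        doc_freq.update({" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})
    cutoff = min_share * len(questions)
    return {p for p, df in doc_freq.items() if df >= cutoff}


def strip_stop_phrases(question, stop_phrases, n=3):
    """Drop every token covered by an occurrence of a stop phrase."""
    tokens = tokenize(question)
    keep = [True] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if " ".join(tokens[i:i + n]) in stop_phrases:
            keep[i:i + n] = [False] * n
    return [t for t, k in zip(tokens, keep) if k]


QUESTIONS = [  # hypothetical sample, not from the study's collection
    "How should I treat acute pharyngitis?",
    "How should I approach a sore throat?",
    "What is the best dose of warfarin?",
    "How should I manage insulin resistance?",
]
```

On this sample only "how should i" reaches the cutoff (3 of 4 questions), and stripping it from the first question leaves `["treat", "acute", "pharyngitis"]` for clustering.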

21 Future Work
- The Family Practice Inquiries Network (FPIN) has 200 questions with MeSH terms manually assigned by librarians
- We will look at these question-term groups for clustering purposes (with the hypothesis that they will not form distinct clusters)

22 Future Work
- I will work with researchers at NLM to apply MetaMap to medical questions
  - Extract triplets (Medical Concept-Allowable Relation-Medical Concept) from questions, e.g. Drug-treats-Disease
  - Insert the triplets into a vector-space model and look for clusters
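A bag-of-triplets vector space can be sketched in a few lines. In this Python illustration each distinct (concept, relation, concept) triplet is one dimension, and questions are compared by cosine similarity. The triplets are hypothetical and written by hand; they do not come from MetaMap, and the clustering method is left open, as it is on the slide.

```python
import math
from collections import Counter

Triplet = tuple[str, str, str]  # (concept, relation, concept)


def vectorize(triplets: list[Triplet]) -> Counter:
    """Bag-of-triplets vector: each distinct triplet is one dimension."""
    return Counter(triplets)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical triplets written by hand for illustration.
Q1 = vectorize([("Aspirin", "treats", "Fever")])
Q2 = vectorize([("Aspirin", "treats", "Fever"), ("Aspirin", "interacts_with", "Warfarin")])
Q3 = vectorize([("Warfarin", "causes", "Bleeding")])
```

Here Q1 and Q2 share one triplet (similarity 1/√2, about 0.71), while Q1 and Q3 share none (0.0). A standard clustering algorithm could then group questions by this similarity.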

23 Thank you! ???

