Medical WordNet A Proposal Christiane Fellbaum Princeton University and Berlin-Brandenburg Academy of Sciences.


1 Medical WordNet A Proposal Christiane Fellbaum Princeton University and Berlin-Brandenburg Academy of Sciences

2 The Challenge Bridge communication gap between lay persons and health care providers

3 Health Care Providers (HCP or “Experts”) --Physicians --Nurses --Therapists --on-line medical information systems

4 Non-Experts patients family members benefit administrators lawyers

5 Modes of communication Live Interaction with Patients Virtual Interaction --On-line medical information

6 Experts, lay persons speak different “dialects”

7 Characteristics of HCP language Ignorance/uncertainty as to non-experts’ lexical and conceptual knowledge Same word is used with different meanings by the two populations (word-concept mismatch) HCP use technical terms HCP substitute synonyms from different levels

8 Characteristics of non-expert language Idiosyncratic, “unregulated” --mix of technical and folk terms --taxonomies are less elaborate, shallower (fewer intermediate levels of categorial distinctions) --lay concepts are fuzzy (e.g., flu) --lay concepts have no (clear) equivalents in medicine (“Kreislaufprobleme”: “circulatory problems”)

9 Expert vs. non-expert language in dialogue interaction HCP introduce new concepts for which the lay person is unprepared --go from symptoms to diagnosis, treatments, etc. Lay questions are frequently “yes/no” Expert replies are usually not “yes/no” Often no opportunities for “repair”

10 Additional problem with on-line information systems Trivial linguistic features can have potentially significant consequences

11 Example: MEDLINEplus different results depending on query: tremor vs. intentional tremor tremble vs. trembling Linguistic (morphological) differences in the query result in semantically different answers

12 (our) solution Make the HCP “bilingual” Enable “translation” between consumer health information systems and laymen

13 Problems on three levels Lexical Conceptual Propositional (facts, beliefs, hypotheses,...)

14 Some ground rules for the next 45 mins Nothing hinges on “concept” Propose synset: {concept, universal, idea, type...} “Truth” applies only to propositions, not entities WordNet has “unicorn”, “Mickey Mouse”, etc.

15 A Linguist’s view Concepts/universals are expressed by lexemes (words) Words are embedded in contexts and partially derive their meanings from contexts Truth of propositions depends partially on their lexical make-up

16 Goals Document medical knowledge that can be understood by average adult health care consumer in the U.S. Make existing tools accessible for non-experts

17 Plan of Attack Create lexical database of medical terminology modeled on WordNet, with WN’s potential for NLP Lexical (word) information is complemented with definitional sentences, one for experts, one for laymen Sentences provide meaningful contexts for terms 2 Sentential subcorpora: Facts and Beliefs

18 Some background: WordNet Large lexical database for English Semantic network? yes Thesaurus? yes BUT unlike in Roget’s, WN’s relations are labeled Ontology? who knows?

19 WordNet Constructed entirely by hand Semantic network of 115,000 synonym sets (“synsets”) Example synset: {chest, thorax, torso,# body_part,@ (the part of the body below the neck and above the belly; “the victim had a knife stuck in his chest”)}

20 WordNet synsets One or more “cognitively synonymous” lexemes Definition (“gloss”) Example sentence Meronymy, hyponymy relate noun synsets Result: semantic network

21 WordNet synsets Where did the makers of WN get their synonyms, meronyms, etc. from? Mid-1980s: no corpora were available Association norms Some psycholinguistic testing (sorting experiments) Assumption: speakers’ use of words reflects conceptual organization

22 WordNet WordNet’s value for computational linguistics, Natural Language Processing Synonyms, related synsets allow searches for semantically related nodes --E.g., query expansion Information retrieval, Q-A systems, data mining,... Inferencing
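To make the query-expansion idea concrete, here is a minimal sketch in Python. It assumes a toy lexicon held in plain dictionaries; the data, the `expand_query` name, and the `include_hyponyms` flag are all hypothetical and are not WordNet's actual interface.

```python
# Toy lexicon (illustrative data only, not read from WordNet).
SYNSETS = [
    {"chest", "thorax"},          # members of one synset are synonyms
]
HYPONYMS = {
    "virus": {"HIV"},             # HIV is a kind of virus
}

def expand_query(terms, include_hyponyms=False):
    """Return the query terms plus their synonyms (and optionally hyponyms)."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        # Add every word that shares a synset with the term.
        for synset in SYNSETS:
            if term in synset:
                expanded |= synset
        # Optionally follow the hyponymy relation one step down.
        if include_hyponyms:
            expanded |= HYPONYMS.get(term, set())
    return sorted(expanded)

print(expand_query(["chest", "pain"]))
print(expand_query(["virus"], include_hyponyms=True))
```

A search for "chest pain" would then also retrieve documents that only mention "thorax".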

23 Two problems: Synonymy and Polysemy WordNet maps lexemes (words) and concepts (meanings) Words are labels for concepts that speakers find salient --Identification of the same concept labelled with different words (synonymy); e.g. chest, thorax --Disambiguation of polysemous words weak patient vs. weak solution

24 Synonymy and Polysemy Synonymy: membership in the same synset Polysemy: number of synsets of which a given string is a member
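These two definitions can be sketched directly as set operations. The synset identifiers and the second member of each "weak" synset below are made up for illustration; only chest/thorax and the two senses of "weak" come from the slides.

```python
# Toy synsets keyed by hypothetical identifiers.
SYNSETS = {
    "chest.1": {"chest", "thorax"},
    "weak.1": {"weak", "feeble"},    # as in "weak patient" (assumed member)
    "weak.2": {"weak", "dilute"},    # as in "weak solution" (assumed member)
}

def are_synonyms(word_a, word_b):
    """Synonymy: both words are members of at least one common synset."""
    return any(word_a in members and word_b in members
               for members in SYNSETS.values())

def polysemy(word):
    """Polysemy: the number of synsets the string is a member of."""
    return sum(1 for members in SYNSETS.values() if word in members)

print(are_synonyms("chest", "thorax"))
print(polysemy("weak"))
```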

25 WordNet In addition, related words and concepts can be found via the relations among entire synsets Hyponymy/hyperonymy (super-/subordination) HIV is a kind of virus One kind of virus is HIV Meronymy/holonymy (part-whole) occipital bone is part of cranium cranium has an occipital bone
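As a rough sketch of how such labeled relations might be stored, the Python class below keeps each arc together with its inverse, so that both "HIV is a kind of virus" and "one kind of virus is HIV" can be read off. The `SynsetGraph` class and the "microorganism" node are assumptions for illustration, not part of WordNet.

```python
from collections import defaultdict

# Each relation label is stored together with its inverse label.
INVERSE = {"hypernym": "hyponym", "holonym": "meronym"}

class SynsetGraph:
    def __init__(self):
        # (source, label) -> set of target nodes
        self.arcs = defaultdict(set)

    def add(self, source, label, target):
        """Record the arc and its inverse in one step."""
        self.arcs[(source, label)].add(target)
        self.arcs[(target, INVERSE[label])].add(source)

    def related(self, source, label):
        return set(self.arcs.get((source, label), ()))

    def ancestors(self, source):
        """Follow hypernym arcs transitively."""
        seen, stack = set(), [source]
        while stack:
            for parent in self.arcs.get((stack.pop(), "hypernym"), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

graph = SynsetGraph()
graph.add("HIV", "hypernym", "virus")
graph.add("virus", "hypernym", "microorganism")   # assumed extra level
graph.add("occipital bone", "holonym", "cranium")

print(graph.related("virus", "hyponym"))
print(graph.related("cranium", "meronym"))
print(graph.ancestors("HIV"))
```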

26 WordNet Different kinds of hyponymy Types vs. Instances Kingdom is a type of country Monaco is an instance of a kingdom

27 Lexical semantics in WordNet The meaning of a word results from its place in the semantic network

28 WordNet for medical/bioinformatics? Synonymy, polysemy are problems here, too is WN’s way of mapping words and meanings useful?

29 WordNet for medical/bioinformatics? WN was compiled by non-experts Medical coverage is sparse and arbitrary

30 WordNet’s medical coverage contains both expert and folk terms (indistinguishable) contains archaic terms like unction no type vs. role (symptom) distinction e.g., tumors are abnormal but not: some tumors are malignant No links among entities, properties, processes, states domain labels (medicine, drugs,..) are assigned incompletely and inconsistently (no good domain ontology)

31 Create lexical database of medical terminology modelled on WordNet (MedWN) Info in MedWN can be accessed automatically Retain WN’s features to make it usable for NLP

32 Steps to take Review, validate, augment WN’s present medical coverage Ensure sufficiently high scientific level so that MedWN can work in tandem with existing terminology banks, ontologies,...

33 Create subcorpora of sentences MedicalFactNet --sentences rated as correct by medical experts --sentences express “true” beliefs about medical phenomena --intelligible to non-experts

34 Subcorpora of sentences MedicalBeliefNet --sentences rated highly for assent by lay persons --representative fraction of true and false beliefs about medical phenomena

35 Constraints on subcorpora Complete, grammatical English sentences No anaphora (it, then, this): context-free generic sentences Statements embed terms in typical, informative contexts

36 Sources for subcorpora --sentences generated via WordNet’s relations --WordNet’s definitions of medical terms --sentences from online medical services

37 Sentences from on-line information sources --fact sheets --NIAID Health Information Publications --UK NetDoctor’s Diseases Encyclopedia

38 Example NetDoctor text: Hay fever, otherwise known as seasonal allergenic rhinitis, is an allergic reaction to airborne substances such as pollen.... Created sentences: Hay fever is an allergy. Hay fever is an allergic reaction Hay fever is a reaction to pollen...

39 Second source of sentences Derive propositions from WordNet: Express labeled arcs as propositions e.g. if x is a hyponym of y => “x is a type of y”; meronymy => “x is a part of y”
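A minimal sketch of this derivation step: each labeled arc is passed through a sentence template. The two templates are the ones quoted on the slide; the function name and the example arcs are illustrative, and the output deliberately leaves out articles, which a real generator would need to add.

```python
# Sentence templates for two relation labels, as worded on the slide.
TEMPLATES = {
    "hyponym": "{x} is a type of {y}",
    "meronym": "{x} is a part of {y}",
}

def arc_to_sentence(x, relation, y):
    """Turn one labeled arc (x, relation, y) into a candidate sentence."""
    body = TEMPLATES[relation].format(x=x, y=y)
    # Capitalize the first character only, so acronyms such as HIV survive.
    return body[0].upper() + body[1:] + "."

# Example arcs taken from the relations discussed earlier in the talk.
ARCS = [
    ("HIV", "hyponym", "virus"),
    ("occipital bone", "meronym", "cranium"),
]

sentences = [arc_to_sentence(*arc) for arc in ARCS]
for sentence in sentences:
    print(sentence)
```

Sentences generated this way are only candidates; they still have to pass human validation.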

40 Validation Derived sentences are judged by humans Likert Scale 1-5 Participants assign a score for U (understanding) to all sentences Sentences judged to be understandable are scored further for B (belief) by lay persons C (correctness) by experts

41 Validation Statements receiving a B-score of 4 or higher => MedicalBeliefNet Statements receiving a C-score of 4 or higher => MedicalFactNet
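The routing rule can be sketched as follows. The slides give the threshold of 4 for B and C; averaging over raters and the cut-off of 4 for U (understanding) are assumptions made here only to get a runnable example.

```python
from statistics import mean

THRESHOLD = 4  # B- and C-score cut-off stated on the slide

def route(u_scores, b_scores, c_scores, u_threshold=4):
    """Return the subcorpora a sentence is admitted to.

    Scores are Likert ratings from 1 to 5. Taking the mean over raters
    and using u_threshold=4 are assumptions, not part of the proposal.
    """
    # Only sentences judged understandable are scored further.
    if mean(u_scores) < u_threshold:
        return set()
    corpora = set()
    if mean(b_scores) >= THRESHOLD:      # belief, rated by lay persons
        corpora.add("MedicalBeliefNet")
    if mean(c_scores) >= THRESHOLD:      # correctness, rated by experts
        corpora.add("MedicalFactNet")
    return corpora

# "Hay fever is an allergy." with made-up ratings:
print(route(u_scores=[5, 4], b_scores=[4, 5], c_scores=[5, 5]))
```

A sentence can land in both subcorpora (a widely held true belief), in MedicalBeliefNet only (a widely held false belief), or in MedicalFactNet only (a fact that lay persons do not assent to).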

42 Side effects (beneficial) of corpus Basis for new NLP applications in the medical domain Basis for exploring individual and group differences wrt medical knowledge, vocabulary, reasoning, decision-making Use in medical training

43 Future work Scale up coverage Add relations among events (states, activities) as expressed by verbs Current work: explore “function/purpose” relation among verbs (analogous to roles among entities expressed by nouns) e.g., to run is to exercise (defeasible) to run is to move (not defeasible)

44 Future work Add relations and modalities (causality, conditionals,..) --these are more or less explicit in WordNet Crosslingual MedWN? Bootstrap from existing multilingual wordnets?

