Semantic Interpretation of Medical Text Barbara Rosario, SIMS Steve Tu, UC Berkeley Advisor: Marti Hearst, SIMS.


1 Semantic Interpretation of Medical Text Barbara Rosario, SIMS Steve Tu, UC Berkeley Advisor: Marti Hearst, SIMS

2 Semantic Interpretation of Medical Text More accurate representation of the content of the input text Enhance text with information (concept, relationships) drawn from a medical knowledge source Determine semantic meaning of the words (and bigger constructs) and the relationships between them.

3 Combine Statistical and Symbolic Methods Use of knowledge bases, semantic hierarchies, medical knowledge, rules Use of statistical methods and machine learning techniques

4 Statistical methods Disambiguation Detection of semantic patterns Classification of semantically related constructs Degrees (weights, probabilities)

5 First Experiment: Noun Compounds and MeSH Interpretation of noun compounds is crucially semantic Noun compounds extracted from a collection of titles and abstracts of medical journals found in Medline MeSH (Medical Subject Headings) concepts for the labels

6 Processing pipeline
Input: Medline Text File
Preprocessing
Tagger
Noun Compound Extraction
Semantic Labeling (with MeSH)
Output: Semantically Labeled Noun Compounds
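
The noun compound extraction step can be sketched roughly as follows. This is only an illustration, not the authors' extractor: it assumes the tagger emits Penn Treebank-style tags (`NN`, `NNS`), and the function name `extract_noun_compounds` is hypothetical.

```python
from typing import List, Tuple

# Assumption: Penn Treebank-style noun tags; the slides do not name the tag set.
NOUN_TAGS = {"NN", "NNS"}

def extract_noun_compounds(tagged: List[Tuple[str, str]],
                           min_len: int = 2) -> List[List[str]]:
    """Return maximal runs of consecutive nouns with at least min_len words."""
    compounds, run = [], []
    # A sentinel token with a non-noun tag flushes the final run.
    for word, tag in tagged + [("", "END")]:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if len(run) >= min_len:
                compounds.append(run)
            run = []
    return compounds

tagged_sentence = [
    ("Migraine", "NN"), ("headache", "NN"), ("recurrence", "NN"),
    ("was", "VBD"), ("reduced", "VBN"), ("in", "IN"), ("patients", "NNS"),
]
print(extract_noun_compounds(tagged_sentence))
```

Single nouns such as "patients" are dropped because a compound needs at least two nouns; raising `min_len` to 3 would keep only the three-noun compounds used in the later slides.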

7 MeSH Tree Structures (main)
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

8 MeSH Tree Structures (node A expanded)
1. Anatomy [A]
    Body Regions [A01] +
    Musculoskeletal System [A02] +
    Digestive System [A03] +
    Respiratory System [A04] +
    Urogenital System [A05] +
    Endocrine System [A06] +
    Cardiovascular System [A07] +
    Nervous System [A08] +
    Sense Organs [A09] +
    Tissues [A10] +
    Cells [A11] +
    Fluids and Secretions [A12] +
    Animal Structures [A13] +
    Stomatognathic System [A14] +
    Hemic and Immune Systems [A15] +
    Embryonic Structures [A16] +
Body Regions [A01]
    Abdomen [A01.047]
        Groin [A01.047.365]
        Inguinal Canal [A01.047.412]
        Peritoneum [A01.047.596] +
        Retroperitoneal Space [A01.047.681]
        Umbilicus [A01.047.849]
    Axilla [A01.133]
    Back [A01.176] +
    Breast [A01.236] +
    Buttocks [A01.258]
    Extremities [A01.378] +
    Head [A01.456] +
    Neck [A01.598]
    Pelvis [A01.673] +
    Perineum [A01.719]
    Skin [A01.835] +
    Thorax [A01.911] +
    Viscera [A01.960]

9 Mapping Nouns to MeSH Concepts
Ex: migraine headache recurrence
    migraine: C10.228.140.546.800.525, C10.228.140.300.800.542, C14.907.253.937.542
    headache: C23.888.592.612.441, C10.597.617.470, C23.888.646.487
    recurrence: C23.550.291.937
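
A minimal sketch of this lookup, using only the codes shown on this slide. The dictionary `MESH_CODES` and the helper names are hypothetical, and taking the first listed code per noun is an assumption made for illustration; the slides do not say how one code is chosen when a noun has several.

```python
# Candidate MeSH tree numbers per noun, copied from the slide.
MESH_CODES = {
    "migraine": ["C10.228.140.546.800.525",
                 "C10.228.140.300.800.542",
                 "C14.907.253.937.542"],
    "headache": ["C23.888.592.612.441",
                 "C10.597.617.470",
                 "C23.888.646.487"],
    "recurrence": ["C23.550.291.937"],
}

def map_compound(compound):
    """Pair each noun of the compound with all of its candidate codes."""
    return [(noun, MESH_CODES.get(noun, [])) for noun in compound.split()]

def first_codes(compound):
    """Pick the first listed code per noun (illustrative assumption only)."""
    return [codes[0] if codes else None for _, codes in map_compound(compound)]

print(first_codes("migraine headache recurrence"))
```

Nouns missing from the table map to `None`, which leaves room for the non-medical words raised under future work.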

10 More Noun Compounds
migraine headache recurrence: C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
blood plasma perfusion: A12.207.152 A15.145.693 E05.680
migraine headache pain: C10.228.140.546.800.525 C23.888.592.612.441 G11.561.796.444
brain stem neurons: A08.186.211 E05.595.402.541.250 A08.663
rat liver mitochondria: B02.649.865.635.560 A03.620 A11.368.702.564
plasma arginine vasopressin: A15.145.693 D12.125.095.104 D06.472.734.692.781
rat thyroid cells: B02.649.865.635.560 A06.407.900 A11
growth hormone secretion: G07.553.481 D27.505.440.472 A12.200
blood urea nitrogen: A12.207.152 D02.948 D01.362.625
breast cancer cells: A01.236 C04 A11
cancer cell lines: C04 A11 G05.331.599.110.708.330.800.400

11 Attachment and Semantic Interpretation Attachment classification “acute migraine treatment” [[N N] N] (LA) “intra-nasal migraine treatment” [N [N N]] (RA) To bootstrap semantic interpretation Decision tree (Quinlan )

12 Levels of Descriptions
migraine headache recurrence (LA): C10.228.140.546.800.525 C23.888.592.612.441 C23.550.291.937
Feature vector
    Only Tree: C, C, C
    Level 1: C, 10, C, 23, C, 23
    Level 2: C, 10.228, C, 23.888, C, 23.550
    Level 3: C, 10.228.140, C, 23.888.592, C, 23.550.291
    Level 4: C, 10.228.140.546, C, 23.888.592.612, C, 23.550.291.937
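
The truncation of MeSH codes to a given level can be sketched as below. The function names `describe` and `feature_vector` are hypothetical; level 0 stands for the "Only Tree" row, and codes shorter than the requested level are simply kept whole (an assumption, since the slide does not show that case).

```python
def describe(code, level):
    """Split a MeSH tree number into its tree letter and a truncated position."""
    tree, rest = code[0], code[1:]
    if level == 0:  # "Only Tree": keep just the tree letter
        return [tree]
    parts = rest.split(".")
    return [tree, ".".join(parts[:level])]

def feature_vector(codes, level):
    """Concatenate the per-noun descriptions into one flat feature vector."""
    return [feature for code in codes for feature in describe(code, level)]

codes = ["C10.228.140.546.800.525", "C23.888.592.612.441", "C23.550.291.937"]
for level in range(5):
    print(level, feature_vector(codes, level))
```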

13 Decision Tree Classification
Columns: Training before pruning, Training after pruning, Testing before pruning, Testing after pruning
    Only Tree: 15.8%, 16.4%, 17.3%
    Level 1: 11.2%, 11.8%, 15.4%
    Level 2: 7.9%, 8.6%, 21.2%, 17.3%
    Level 3: 7.9%, 10.5%, 26.9%, 17.3%
    Level 4: 8.6%, 9.9%, 25.0%, 19.2%

14 Expressiveness of Decision Trees
first noun tree = B: ra (33.0/3.7)
first noun tree = E: ra (2.0/1.6)
first noun tree = F: la (0.0)
first noun tree = G: la (4.0/0.3)
first noun tree = A:
|   second noun tree = B: la (0.0)
|   second noun tree = D: la (4.0/0.3)
|   second noun tree = E: la (10.0/0.4)
|   second noun tree = F: la (0.0)
|   second noun tree = G: la (6.0/1.6)
|   second noun tree = A:
|   |   first tree position <= 4 : ra (7.0/1.6)
|   |   first tree position > 4 : la (36.0/5.8)
|   second noun tree = C:
|   |   third noun tree = A: ra (9.0/0.3)
|   |   third noun tree = B: la (0.0)
|   |   third noun tree = D: la (1.0/0.3)
|   |   third noun tree = E: la (5.0/0.3)
|   |   third noun tree = F: la (0.0)
|   |   third noun tree = G: ra (2.0/1.6)
|   |   third noun tree = C:
|   |   |   third tree position <= 21 : ra (5.0/2.6)
|   |   |   third tree position > 21 : la (5.0/0.3)
first noun tree = C: …..
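
The visible part of this tree reads directly as nested rules. The sketch below hand-codes only the branches shown on the slide. It assumes that "tree position" means the first numeric component after the tree letter (for example 1 for A01.236), which the slides do not state. The elided branch for first noun tree = C, and any tree letter not shown, return `None`.

```python
def tree(code):
    """Top-level MeSH tree letter, e.g. 'A' for 'A01.236'."""
    return code[0]

def position(code):
    """Assumed meaning of 'tree position': first numeric component."""
    return int(code[1:].split(".")[0])

def classify(c1, c2, c3):
    """Return 'la', 'ra', or None when the slide does not show the branch."""
    t1, t2, t3 = tree(c1), tree(c2), tree(c3)
    if t1 in ("B", "E"):
        return "ra"
    if t1 in ("F", "G"):
        return "la"
    if t1 == "A":
        if t2 in ("B", "D", "E", "F", "G"):
            return "la"
        if t2 == "A":
            return "ra" if position(c1) <= 4 else "la"
        if t2 == "C":
            if t3 in ("A", "G"):
                return "ra"
            if t3 in ("B", "D", "E", "F"):
                return "la"
            if t3 == "C":
                return "ra" if position(c3) <= 21 else "la"
    return None  # branch elided on the slide

print(classify("A01.236", "C04", "A11"))  # breast cancer cells
print(classify("A01.236", "C04", "E02"))  # breast cancer treatment
```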

15

16 Semantic Interpretation Use decision tree paths for the detection of clusters of noun compounds with the same semantic interpretation

17 Ex: ACA:
breast cancer cells: A01.236 C04 A11 ra
bladder cancer cells: A05.810.161 C04 A11 ra
colon carcinoma cells: A03.492.411.495 C04.557.470 A11 ra
prostate tumor cells: A10.336.707 C04 A11 ra
prostate cancer tissue: A10.336.707 C04 A10 ra
lung cancer cells: A04.411 C04 A11 ra
colon cancer cells: A03.492.411.495.356 C04 A11 ra
brain tumor tissue: A08.186.211 C04 A10 ra
colon cancer tissues: A03.492.411.495.356 C04 A10 ra
bladder tumor cells: A05.810.161 C04 A11 ra
Interpretation: noun3 exhibits noun2 in noun1

18 Ex: ACE:
muscle disease diagnosis: A10.690 C23.550.288 E01 la
breast cancer prognosis: A01.236 C04 E01.789 la
breast cancer treatment: A01.236 C04 E02 la
hip fracture treatment: A01.378.592 C21.866.405 E02 la
cell cancer treatment: A11 C04 E02 la
brain tumor treatment: A08.186.211 C04 E02 la
colon adenocarcinoma xenograft: A03.492.411.495.356 C04.557.470.200.025 E04.936.764
colon carcinoma xenograft: A03.492.411.495.356 C04.557.470.200 E04.936.764
colon carcinoma xenografts: A03.492.411.495.356 C04.557.470.200 E04.936.764
neck cancer xenografts: A01.598 C04 E04.936.764
Interpretation:
    1: noun3 diagnoses noun2 in noun1
    2: noun3 treats noun2 in noun1
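
The ACA and ACE groupings can be approximated by clustering compounds on the sequence of top-level tree letters of their three nouns. This is a simplification: the slides cluster by decision-tree path, which for these two examples happens to coincide with the tree letters. The helper names are hypothetical and only a few compounds from the slides are included.

```python
from collections import defaultdict

def tree_pattern(codes):
    """Concatenate the top-level tree letters, e.g. 'ACA'."""
    return "".join(code[0] for code in codes)

def cluster(compounds):
    """Group compound names by their tree-letter pattern."""
    clusters = defaultdict(list)
    for name, codes in compounds:
        clusters[tree_pattern(codes)].append(name)
    return dict(clusters)

compounds = [
    ("breast cancer cells", ["A01.236", "C04", "A11"]),
    ("lung cancer cells", ["A04.411", "C04", "A11"]),
    ("brain tumor tissue", ["A08.186.211", "C04", "A10"]),
    ("breast cancer treatment", ["A01.236", "C04", "E02"]),
    ("hip fracture treatment", ["A01.378.592", "C21.866.405", "E02"]),
]
for pattern, names in cluster(compounds).items():
    print(pattern, names)
```

Splitting the ACE cluster further, for instance on the E01 versus E02 subtree, would be one way to separate the "diagnoses" and "treats" interpretations listed on the slide.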

19 From MeSH to UMLS Unified Medical Language System, a project at the U.S. National Library of Medicine 3 UMLS Knowledge Sources: Metathesaurus, Semantic Network, SPECIALIST lexicon and programs

20 Metathesaurus Most extensive of UMLS sources 730,000 concepts representing more than 1,500,000 strings in over 60 vocabularies and classifications Organized by concept or meaning. In essence, its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts. Relationships in the Metathesaurus come from the sources themselves or are created by the Metathesaurus editors.

21 Semantic Network Consistent categorization of all concepts represented in the UMLS Metathesaurus and the important relationships between them. Every concept has been assigned a semantic type. The semantic types (134) are the nodes in the Network, and the relationships between them are the links (54) High level semantic structure

22 "Biologic Function" Hierarchy

23 Noun Compounds, again Very preliminary studies… Can we use the information in the Semantic Net for the semantic interpretation of noun compounds? Are semantic types and relationships good descriptors? Are they useful for disambiguation and classification?

24 Mapping of Noun Compounds
NC: peptide CRF receptor antagonists
C0030956|C0010132|C0597357|C0243076|
Amino Acid, Peptide, or Protein|Hormone|Receptor|Pharmacologic Substance|
A1.4.1.2.1.7|A1.4.1.1.3.2|A1.4.1.1.3.6|A1.4.1.1.1|
rel_12.1 (Amino Acid, Peptide, or Protein, Hormone) = interacts_with: A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.2
rel_13.1 (Amino Acid, Peptide, or Protein, Receptor) = interacts_with: A1.4.1.2.1.7 R3.1.5 A1.4.1.1.3.6
rel_14.1 (Amino Acid, Peptide, or Protein, Pharmacologic Substance) = interacts_with: A1.4.1.2.1.7 R3.1.5 A1.4.1.1.1
rel_23.1 (Hormone, Receptor) = interacts_with: A1.4.1.1.3.2 R3.1.5 A1.4.1.1.3.6
rel_24.1 (Hormone, Pharmacologic Substance) = interacts_with: A1.4.1.1.3.2 R3.1.5 A1.4.1.1.1
rel_34.1 (Receptor, Pharmacologic Substance) = interacts_with: A1.4.1.1.3.6 R3.1.5 A1.4.1.1.1

25 Mapping of Noun Compounds
NC: day hospital treatment
C0439228|C0019994|C0039798,C0087111|
Temporal Concept|Health Care Related Organization|Functional Concept;Therapeutic or Preventive Procedure|
A2.1.1|A2.7.1|A2.1.4;B1.3.1.3|
rel_12.1 (Temporal Concept, Health Care Related Organization) = NOT found in SemNet
rel_13.1 (Temporal Concept, Functional Concept) = NOT found in SemNet
rel_13.2 (Temporal Concept, Therapeutic or Preventive Procedure) = NOT found in SemNet
rel_23.1 (Health Care Related Organization, Functional Concept) = NOT found in SemNet
rel_23.2 (Health Care Related Organization, Therapeutic or Preventive Procedure) = location_of: R2.1

26 Mapping of Noun Compounds
NC: brain serotonin metabolism
C0006104|C0036751|C0025519,C0025520|
Body Part, Organ, or Organ Component|Neuroreactive Substance or Biogenic Amine|Organism Function;Functional Concept|
A1.2.3.1|A1.4.1.1.3.1|B2.2.1.1.1;A2.1.4|
rel_12.1 (Body Part, Organ, or Organ Component, Neuroreactive Substance or Biogenic Amine) = produces R3.2.1
rel_13.1 (Body Part, Organ, or Organ Component, Organism Function) = location_of R2.1
rel_13.2 (Body Part, Organ, or Organ Component, Functional Concept) = NOT found in SemNet
rel_23.1 (Neuroreactive Substance or Biogenic Amine, Organism Function) = disrupts R3.1.3
rel_23.2 (Neuroreactive Substance or Biogenic Amine, Functional Concept) = NOT found in SemNet
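
The rel_ij.k listings in these mapping slides can be reproduced with a pairwise lookup over the candidate semantic types of each noun. The sketch below is an illustration under stated assumptions: the `SEMNET` table is a hypothetical stand-in holding only relations that appear on the slides (it does not query the real UMLS Semantic Network), and a missing pair is reported as `None` rather than "NOT found in SemNet".

```python
from itertools import combinations, product

# Tiny stand-in for the Semantic Network: (type1, type2) -> relation.
# Only pairs shown on the slides are included.
SEMNET = {
    ("Body Part, Organ, or Organ Component",
     "Neuroreactive Substance or Biogenic Amine"): "produces",
    ("Body Part, Organ, or Organ Component", "Organism Function"): "location_of",
    ("Neuroreactive Substance or Biogenic Amine", "Organism Function"): "disrupts",
    ("Health Care Related Organization",
     "Therapeutic or Preventive Procedure"): "location_of",
}

def relations(types_per_noun):
    """Look up every ordered noun pair (i < j) and every type combination k."""
    found = {}
    numbered = enumerate(types_per_noun, start=1)
    for (i, types_i), (j, types_j) in combinations(numbered, 2):
        for k, pair in enumerate(product(types_i, types_j), start=1):
            found[f"rel_{i}{j}.{k}"] = SEMNET.get(pair)
    return found

# brain serotonin metabolism: the third noun has two candidate types.
brain_serotonin_metabolism = [
    ["Body Part, Organ, or Organ Component"],
    ["Neuroreactive Substance or Biogenic Amine"],
    ["Organism Function", "Functional Concept"],
]
for label, relation in relations(brain_serotonin_metabolism).items():
    print(label, relation)
```

Here the candidate "Functional Concept" takes part in no relation while "Organism Function" does, which is the kind of evidence that could be used to prefer one semantic type over another for an ambiguous noun.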

27 Mapping Words - Semantic Types, Semantic Relationships
Semantic types correctly assigned (on 246 nc, 738 nouns): 59%
Semantic types disambiguated by the relationships:
    Doesn’t disambiguate: 42.7%
    Disambiguates wrong: 17.3%
    Disambiguates correctly: 40%

28 (Some of) Future Work Explore UMLS sources in more depth What forms the best basis for automatic semantic interpretation of noun phrases? Semantic types? Metathesaurus concepts (and what parts of them)? Just MeSH concepts? Machine Learning algorithms to help choose a good representation of medical terms

29 Future Work Machine learning algorithms for classification Can we (and how) generalize patterns found for noun compounds to other syntactic structures? How can we best formally represent semantics? How can we combine symbolic rules with statistical methods? How can we deal with non-medical words? Can the system help us disambiguate them? Should we use other ontologies (e.g., WordNet)?

