Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data: Introduction


1 Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data: Introduction

2 Student Participation
– Class presentation based on course syllabus
  – supplementary material will be made available
– Regular attendance in class and participation in discussion
– Report at the end of the semester (based on class presentation)

3 Format of Class Presentation
– Begin with biological motivations: from Mount textbook
– Overview of solutions
  – flowchart given in Mount textbook is very useful
– On-line demonstrations where possible
  – Entrez, BLAST, FASTA etc.
– In-depth discussion of state-of-the-art algorithms to solve problem
– Discussion

4 Course Outline
– Searching Biomedical Text and Literature
  – one aspect of Medical Informatics
  – IR techniques for biomedical domain
  – lexicons, ontologies
  – statistical language modeling
  – information extraction and text mining
  – TREC-10 genomics track
– Bioinformatics
  – biological data resources
  – alignment of pairs of sequences (dynamic programming)
  – DB sequence similarity (BLAST, FASTA etc.)
  – phylogenetic trees, probabilistic approaches
  – gene prediction
  – RNA structure, stochastic context-free grammars
  – protein classification
  – genome analysis, gene expression

5 Medical Informatics: Applications
– Electronic medical records
– Decision support systems
– Information (knowledge) retrieval systems
– Imaging and telemedicine systems
– Support of medical education
– Patient and public health information systems
– Bioinformatics applications

6 Properties of Scientific Information
– Growth
  – de Solla Price: doubling time of number of papers is 15 years
  – Pao: a single paper in 1660 would have doubled to 2.3 million in 1977, close to 2.2 million indexed
  – MEDLINE adds 300,000 references per year
– Scientific information becomes obsolete
  – e.g., cholesterol and heart disease, antibiotics
– Obsolescence varies by field
  – Half of citations in physics <5 yrs old, chemistry <8
– Practical implications
  – IR systems and libraries
– Long lead time for acceptance into textbooks
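The growth figures on this slide can be checked with a short calculation. The sketch below assumes pure exponential growth with a fixed 15-year doubling time, starting from a single paper in 1660; the function name `papers_after` is illustrative and not part of the course material.

```python
def papers_after(start_year, end_year, doubling_years=15.0, initial=1.0):
    """Size of a literature that doubles every `doubling_years` years."""
    elapsed = end_year - start_year
    # Each doubling period multiplies the count by 2.
    return initial * 2 ** (elapsed / doubling_years)


# One paper in 1660, doubling every 15 years, evaluated in 1977.
estimate = papers_after(1660, 1977)
print(f"{estimate:,.0f} papers")
```

With 317 years elapsed there are about 21.1 doublings, so the estimate comes out at roughly 2.3 million, in line with the figure attributed to Pao.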

7 Properties of Scientific Information (contd.)
– Fragmentation
  – One scientific paper reports one experiment
  – Scientists aim to publish in diverse locations
  – Academic promotion rules give incentive for many publications
    – Study results may be broken into pieces and published in different places
– Linkage
  – Bibliometrics: a field concerned with linkage of information via citations
  – Citations in scientific papers are important
    – Demonstrate awareness of background information and prior work
    – Substantiate claims

8 Some Challenging Problems
– Ad-hoc document retrieval (TREC)
  – queries consist of gene names, with the specific task being to find all articles that describe some aspect of the function of the gene
  – need to look at both MEDLINE data and GeneRIF data
– Find evidence that suggests that certain diseases cause certain diseases
  – analysis of MEDLINE documents
  – more of a text mining problem
– Information extraction
  – named entity tagging of disease names, symptoms, genes, proteins, chemical compounds, etc.
  – supports both browsing as well as text mining (discovery)
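As a rough illustration of the named entity tagging item, the sketch below tags text by exact lookup in a small lexicon. This is only one simple approach, not necessarily the method covered in the course; the `LEXICON` entries and the `tag_entities` function are hypothetical and chosen for the example.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated resource.
LEXICON = {
    "BRCA1": "GENE",
    "p53": "PROTEIN",
    "breast cancer": "DISEASE",
    "fever": "SYMPTOM",
}


def tag_entities(text, lexicon=LEXICON):
    """Return (start, end, surface, label) tuples for lexicon matches."""
    matches = []
    for term, label in lexicon.items():
        # Whole-word, case-insensitive match of the lexicon term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0), label))
    return sorted(matches)  # ordered by position in the text


for start, end, surface, label in tag_entities(
    "Mutations in BRCA1 are linked to breast cancer."
):
    print(start, end, surface, label)
```

Pure lookup misses spelling variants and ambiguous names, which is one reason the outline pairs lexicons with statistical methods.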

9 Information Needs
– Classification of information needs
– Models of clinician thinking
– Clinician information needs
– How are needs being met?

10 Classification of Information Needs
– Information sources approached for two reasons
  – Need to locate a particular item
  – Need to obtain information on a subject
– Subject needs fall into three categories
  – Help in solving a problem or making a decision
  – Background information on a topic
  – Keeping up with information on a subject
– Information needs can also be classified by amount of information needed
  – Need for a single fact
  – Need for one or more documents but less than complete literature on a topic
  – Need for a comprehensive review of literature

11 Models of Clinician Thinking
– Hypothetico-deductive approach
  – Hypothesis formation begins as information is obtained; the hypothesis is modified as new information supports or refutes it
– Illness scripts
  – Experienced clinicians develop “scripts” of illness and follow a pathway to arrive at a diagnosis
Information Needs for Clinicians
– Williamson et al.
  – 20-50% of physicians unaware of recent major advances in their specialties
– Two unmet needs for every three patients
  – Factors associated with pursuit of a need: urgency, belief in answerability, belief that the answer would help with others

12 How Are Clinician Needs Being Met?
– Most common sources
  – Colleagues
  – Tertiary sources: textbooks, review articles
– Not with primary literature
  – Takes too much time
  – Requires too much expertise
– Not with computers (Hersh and Hickam)
  – When available, used only 1-6 times per month

