Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data Introduction.

Student Participation
Class presentation based on course syllabus
  – supplementary material will be made available
Regular attendance in class and participation in discussion
Report at the end of the semester (based on class presentation)

Format of Class Presentation
Begin with biological motivations: from Mount textbook
Overview of solutions
  – flowchart given in Mount textbook is very useful
On-line demonstrations where possible
  – Entrez, BLAST, FASTA etc.
In-depth discussion of state-of-the-art algorithms to solve problem
Discussion

Course Outline
Searching Biomedical Text and Literature
  – one aspect of Medical Informatics
  – IR techniques for biomedical domain
  – lexicons, ontologies
  – statistical language modeling
  – information extraction and text mining
  – TREC-10 genomics track
Bioinformatics
  – biological data resources
  – alignment of pairs of sequences (dynamic programming)
  – DB sequence similarity (BLAST, FASTA etc.)
  – phylogenetic trees, probabilistic approaches
  – gene prediction
  – RNA structure, stochastic context-free grammars
  – protein classification
  – genome analysis, gene expression

Medical Informatics: Applications
  – Electronic medical records
  – Decision support systems
  – Information (knowledge) retrieval systems
  – Imaging and telemedicine systems
  – Support of medical education
  – Patient and public health information systems
  – Bioinformatics applications

Properties of Scientific Information
Growth
  – de Solla Price: doubling time of number of papers is 15 years
  – Pao: a single paper in 1660 would have doubled to 2.3 million in 1977, close to 2.2 million indexed
  – MEDLINE adds 300,000 references per year
Scientific information becomes obsolete
  – e.g., cholesterol and heart disease, antibiotics
Obsolescence varies by field
  – Half of citations in physics <5 yrs old, chemistry <8
Practical implications
  – IR systems and libraries
Long lead time for acceptance into textbooks
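
The 2.3 million figure attributed to Pao can be checked from the 15-year doubling time by simple arithmetic. The following is a minimal sketch, assuming pure exponential growth from a single paper in 1660 with a constant doubling time; the function names are illustrative and not part of the course material.

```python
def doublings(start_year, end_year, doubling_time_years):
    """Number of doubling periods that fit between two years."""
    return (end_year - start_year) / doubling_time_years


def projected_count(initial, start_year, end_year, doubling_time_years):
    """Size of a collection that doubles every `doubling_time_years`."""
    return initial * 2 ** doublings(start_year, end_year, doubling_time_years)


# Assumption: one paper in 1660, doubling every 15 years until 1977.
n = doublings(1660, 1977, 15)
papers = projected_count(1, 1660, 1977, 15)

print(f"doublings between 1660 and 1977: {n:.2f}")
print(f"projected papers in 1977: {papers:,.0f}")  # roughly 2.3 million
```

About 21 doublings separate the two years, and 2 raised to that power gives a value close to the 2.3 million quoted on the slide.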

Properties of Scientific Information (contd.)
Fragmentation
  – One scientific paper reports one experiment
  – Scientists aim to publish in diverse locations
  – Academic promotion rules give incentive for many publications
      – Study results may be broken into pieces and published in different places
Linkage
  – Bibliometrics: a field concerned with linkage of information via citations
  – Citations in scientific papers are important
      – Demonstrate awareness of background information and prior work
      – Substantiate claims

Some Challenging Problems
Ad-hoc document retrieval (TREC)
  – queries consist of gene names, with the specific task being to find all articles that describe some aspect of the function of the gene
  – need to look at both MEDLINE data and GeneRIF data
Find evidence that suggests that certain diseases cause certain other diseases
  – analysis of MEDLINE documents
  – more of a text mining problem
Information extraction
  – named entity tagging of disease names, symptoms, genes, proteins, chemical compounds, etc.
  – supports both browsing and text mining (discovery)
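
The ad-hoc retrieval task, where the query is a gene name and the goal is to rank articles that mention the gene, can be illustrated with a toy example. This is only a sketch under stated assumptions: the three documents are made-up sentences rather than MEDLINE or GeneRIF records, the scoring is a plain count of query-term occurrences rather than the method used by any TREC system, and all names (`DOCS`, `tokenize`, `search`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy collection; real experiments would use MEDLINE abstracts.
DOCS = {
    "doc1": "BRCA1 is involved in DNA repair.",
    "doc2": "Patients were screened for hypertension and diabetes.",
    "doc3": "BRCA1 interacts with RAD51, and loss of BRCA1 impairs DNA repair.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(gene_names, docs):
    """Rank documents by how often any of the gene names occurs in them."""
    terms = {name.lower() for name in gene_names}
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


results = search(["BRCA1"], DOCS)
print(results)
```

Passing a list of names leaves room for gene synonyms in the query; exact token matching of this kind says nothing about whether an article actually describes the gene's function, which is what makes the task challenging.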

Information Needs
Classification of information needs
Models of clinician thinking
Clinician information needs
How are needs being met?

Classification of Information Needs
Information sources approached for two reasons
  – Need to locate a particular item
  – Need to obtain information on a subject
Subject needs fall into three categories
  – Help in solving a problem or making a decision
  – Background information on a topic
  – Keeping up with information on a subject
Information needs can also be classified by amount of information needed
  – Need for a single fact
  – Need for one or more documents but less than complete literature on a topic
  – Need for a comprehensive review of literature

Models of Clinician Thinking
Hypothetico-deductive approach
  – Hypothesis formation begins as information is obtained, and the hypothesis is modified as new information supports or refutes it
Illness scripts
  – Experienced clinicians develop “scripts” of illness and follow a pathway to arrive at a diagnosis

Information Needs for Clinicians
Williamson et al.
  – 20-50% of physicians unaware of recent major advances in their specialties
Two unmet needs for every three patients
  – Factors associated with pursuit of need
      – urgency
      – belief in answerability
      – belief that the answer would help with other patients

How Are Clinician Needs Being Met?
Most common sources
  – Colleagues
  – Tertiary sources: textbooks, review articles
Not with primary literature
  – Takes too much time
  – Requires too much expertise
Not with computers (Hersh and Hickam)
  – When available, used only 1-6 times per month