Literature Mining Tools for Analysis of Genomic Data Ramin Homayouni, Ph.D. Associate Professor of Biology Director of Bioinformatics UTHSC BINF April.


1 Literature Mining Tools for Analysis of Genomic Data
Ramin Homayouni, Ph.D.
Associate Professor of Biology, Director of Bioinformatics
UTHSC BINF, April 24, 2008

2 Gene Expression Profiling
Alizadeh et al. (2000) Nature 403:503.
Now what?

3 Useful Links for Functional Analysis
Databases:
– GO: http://www.geneontology.org/
– MeSH: http://www.nlm.nih.gov/mesh/meshhome.html
– MEDLINE: http://www.ncbi.nlm.nih.gov/entrez/
– GEO: http://www.ncbi.nlm.nih.gov/projects/geo/
Programs:
– GOTM (GO): http://genereg.ornl.gov/gotm/
– PubGene (MEDLINE): http://www.pubgene.org/
– Chilibot (MEDLINE): http://www.chilibot.net/
– Arrowsmith (MEDLINE): http://arrowsmith.psych.uic.edu/
– PubMatrix (MEDLINE): http://pubmatrix.grc.nia.nih.gov/
– TXTGate (MEDLINE): http://www.esat.kuleuven.ac.be/txtgate/
– iHOP (MEDLINE): http://www.ihop-net.org/UniPub/iHOP/
– STRING (MEDLINE): http://string.embl.de/

4 Gene Ontology Consortium (http://www.geneontology.org/)
• A controlled vocabulary applied to genes in a variety of organisms; updated every 30 minutes!
• Established in 1998 as a collaboration between FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD)
• Three main classifications:
– Molecular Function (7385 terms)
– Biological Process (8822 terms)
– Cellular Component (1430 terms)

5 Gene Ontology Consortium (http://www.geneontology.org/)

6 GO Tree Machine (GOTM) from WebGestalt
Bing Zhang & Jay Snoddy, Vanderbilt University
Zhang et al., BMC Bioinformatics. 2004 Feb 18;5(1):16.
http://genereg.ornl.gov/gotm/

7 GO Tree Machine Demo GOTM http://bioinfo.vanderbilt.edu/webgestalt/

8 GO Tree Machine -- Example
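GOTM highlights GO categories that are over-represented in a user's gene list relative to a reference set. The slides do not show the statistic; a one-sided hypergeometric test is a common choice for this kind of enrichment analysis and is assumed in the minimal sketch below. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set (e.g., all genes on the array)
    K: reference genes annotated to the GO category
    n: genes in the user's list (e.g., one expression cluster)
    k: genes in the user's list annotated to the category
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this division


# Hypothetical counts: 10,000 array genes, 200 in the category,
# a 100-gene cluster with 8 category members (2 expected by chance).
p = hypergeom_upper_tail(k=8, N=10_000, K=200, n=100)
print(f"enrichment p-value: {p:.2e}")
```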

9 Problems with Gene Ontology, or any other manual indexing approach
• The vocabulary is limited
• The vocabulary is general
• Not comprehensive, and therefore biased toward well-studied genes
• Human error: ~66% consistency between professional indexers!
[Figure: example genes EGFR, ERBB2, TRP53, TGFB1, DAB1, RELN, LRP8, VLDLR]
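The slide does not say how the ~66% figure was measured. One widely used measure of inter-indexer consistency is Hooper's, C / (A + B − C), where A and B are the numbers of terms each indexer assigned and C is the number they share. A minimal sketch, with made-up MeSH-style headings:

```python
def hooper_consistency(terms_a: set, terms_b: set) -> float:
    """Hooper's consistency C / (A + B - C) between two indexers' term sets."""
    common = len(terms_a & terms_b)
    union_size = len(terms_a) + len(terms_b) - common
    return common / union_size if union_size else 1.0


# Hypothetical headings assigned to the same abstract by two indexers.
indexer_1 = {"Cerebral Cortex", "Neurons", "Cell Movement", "Mice", "Nerve Tissue Proteins"}
indexer_2 = {"Cerebral Cortex", "Neurons", "Cell Movement", "Mice, Neurologic Mutants", "Signal Transduction"}
print(f"consistency: {hooper_consistency(indexer_1, indexer_2):.2f}")  # 3 shared of 7 distinct -> 0.43
```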

10 Products of the National Library of Medicine (NLM) & National Center for Biotechnology Information (NCBI)
• Databases: GenBank, UniGene, LocusLink (Gene), MEDLINE, OMIM
• Services: HealthSTAR, Health Services Research Projects in Progress, HSTAT
• Vocabulary: Medical Subject Headings (MeSH), NLM Classification, Unified Medical Language System (UMLS)

11 MEDLINE
• MEDLINE is the premier bibliographic database for biomedicine, supported by the National Library of Medicine
• MEDLINE contains approximately 18 million references, most of which have abstracts
• MEDLINE covers over 4800 journals in over 30 languages
• MEDLINE citations date back to 1966
• Free abstracts!

12 Defining Functional Relationships between Genes
• Direct relationship: gene relationships already known (e.g., A-B or B-C)
– Term co-occurrence
– Gene symbol: PubGene (Jenssen et al., Nature Genetics 2001 28:21)
– Gene names (synonyms and aliases), biochemical
• Indirect relationship: gene relationships unknown (e.g., A-C)
[Figure: genes A, B and C, with B linked to both A and C]
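Direct relationships by co-occurrence, and candidate indirect A-C relationships through a shared partner B, can be sketched as below. This is an illustrative sketch, not PubGene's or Arrowsmith's implementation; the PMIDs and gene mentions are hypothetical, and a real system must first recognize gene symbols and their synonyms in the text.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(doc_genes):
    """Count, for each gene pair, the abstracts that mention both genes."""
    counts = defaultdict(int)
    for genes in doc_genes.values():
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def indirect_links(counts, min_count=1):
    """Find A-C pairs with no co-occurrence that share at least one partner B."""
    neighbors = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            neighbors[a].add(b)
            neighbors[b].add(a)
    links = {}
    for a, c in combinations(sorted(neighbors), 2):
        if c in neighbors[a]:
            continue  # direct relationship: already known
        shared = neighbors[a] & neighbors[c]
        if shared:
            links[(a, c)] = sorted(shared)  # candidate B genes linking A and C
    return links


# Hypothetical abstracts (PMID -> genes recognized in the text).
docs = {
    "pmid_1": ["Reln", "Dab1"],
    "pmid_2": ["Dab1", "Fyn"],
    "pmid_3": ["Reln", "Vldlr", "Dab1"],
    "pmid_4": ["App", "Apbb1"],
}
counts = cooccurrence_counts(docs)
print(counts[("Dab1", "Reln")])  # 2
print(indirect_links(counts))  # {('Fyn', 'Reln'): ['Dab1'], ('Fyn', 'Vldlr'): ['Dab1']}
```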

13 Reelin Signaling Pathway
[Figure: pathway diagram with Reelin, VLDLR, ApoER2, ApoE, Dab1, fyn, p35, Cdk5, APP, pTau and amyloid plaques]

14 Gene Document Test Set
[Figure: test-set genes grouped under Miscellaneous, Reeler and Alzheimer Disease]
Genes shown: Trp53, Fos, Nras, Rasa1, Rab1, Src, Notch1, Dll1, Jag1, Robo1, Ptch, Smo, Reln, Dab1, VLDLR, Lrp8, APP, Aplp2, Aplp1, Psen1, Psen2, Lrp1, Mapt, Apoe, A2m, Apbb1, Apba1, Cdk5, Cdk5r, Cdk5r2

15 PubGene Query: Dab1 (http://www.pubgene.org/; Mouse, Human)
PubMed query "Dab1 AND Reln": 16 results
PubMed query "Dab1 AND reelin": 152 results!
Jenssen et al., Nat Genet. 2001 May;28(1):21-8.
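The gap between the symbol query and the name query shows why synonym handling matters. One way to build a synonym-aware PubMed count query is through NCBI's E-utilities esearch endpoint with `rettype=count`. The sketch below only constructs the URL (fetching it needs network access); the helper names and the synonym lists are illustrative assumptions, not a curated gene resource.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint; rettype=count asks only for the hit count.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def synonym_clause(names):
    """OR together a gene's symbol and names; multi-word names are quoted as phrases."""
    terms = [f'"{name}"' if " " in name else name for name in names]
    return "(" + " OR ".join(terms) + ")"


def pubmed_count_url(gene_a_names, gene_b_names):
    """Build an esearch URL counting records that mention both genes under any listed name."""
    term = f"{synonym_clause(gene_a_names)} AND {synonym_clause(gene_b_names)}"
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term, "rettype": "count"})


# Illustrative synonym lists (assumptions, not taken from a curated gene database).
dab1_names = ["Dab1", "disabled 1"]
reln_names = ["Reln", "reelin"]
print(synonym_clause(dab1_names) + " AND " + synonym_clause(reln_names))
# (Dab1 OR "disabled 1") AND (Reln OR reelin)
print(pubmed_count_url(dab1_names, reln_names))
```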

16 iHOP Query: Dab1 http://www.ihop-net.org/

17 iHOP Query: Dab1; Sentence Structure http://www.ihop-net.org/

18 iHOP Query: Dab1; Network building http://www.ihop-net.org/

19 PubMatrix Demo iHOP (Information Hyperlink over Proteins) http://www.ihop-net.org/UniPub/iHOP/

20 Chilibot (http://www.chilibot.net/)
• Extracts term-term relationships from MEDLINE abstracts.
• Distinguishes interactive (e.g., stimulation or inhibition) from non-interactive (e.g., homology, co-existence) relationships.
• Color-codes gene expression values when data are provided.
• Automatically suggests new hypotheses based on the literature.
Chen and Sharp (2004) BMC Bioinformatics 5(1):147.
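Chilibot relies on linguistic analysis of the sentences it retrieves; the toy classifier below only illustrates the interactive versus non-interactive distinction using hypothetical cue-word lists, and is not Chilibot's algorithm.

```python
import re

# Hypothetical cue words; Chilibot's real rules are more elaborate.
STIMULATION_CUES = {"activate", "activates", "induce", "induces", "stimulate", "stimulates", "enhances"}
INHIBITION_CUES = {"inhibit", "inhibits", "suppress", "suppresses", "blocks", "reduces"}
HOMOLOGY_CUES = {"homolog", "homologous", "homology", "paralog", "ortholog"}


def classify_sentence(sentence, gene_a, gene_b):
    """Label the relationship between two genes mentioned in one sentence."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    if gene_a.lower() not in words or gene_b.lower() not in words:
        return None  # the pair does not co-occur in this sentence
    if words & STIMULATION_CUES:
        return ("interactive", "stimulation")
    if words & INHIBITION_CUES:
        return ("interactive", "inhibition")
    if words & HOMOLOGY_CUES:
        return ("non-interactive", "homology")
    return ("non-interactive", "co-existence")


print(classify_sentence("Reelin induces tyrosine phosphorylation of Dab1.", "reelin", "Dab1"))
print(classify_sentence("Dab1 is homologous to Dab2.", "Dab1", "Dab2"))
print(classify_sentence("Reelin and Dab1 are expressed in the cortex.", "reelin", "Dab1"))
```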

21 Chilibot Demo Chilibot http://www.chilibot.net/

22 STRING at EMBL

23 PubMatrix Demo STRING http://string.embl.de/

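24 Vector Space Model: Latent Semantic Indexing
[Figure: term-by-gene matrix A, with rows for terms W1, W2, W3, …, Wx and columns for gene documents G1, G2, …, Gx, and a query vector shown in a space with axes w1, w2, w3]
Each entry of A is a weighted term frequency, $a_{ij} = l_{ij}\, g_i$, where $l_{ij}$ is the local weight of term $i$ in gene document $j$ and $g_i$ is the global weight of term $i$ across the collection.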
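A minimal NumPy sketch of the LSI pipeline, assuming log-entropy weighting for the local and global weights in a_ij = l_ij g_i (a common choice in LSI; the slide does not name the weights), a rank-k SVD, and cosine similarity between a keyword query and each gene document in the reduced space. The term counts are invented for illustration.

```python
import numpy as np


def log_entropy(counts):
    """Weight a term-by-document count matrix as a_ij = l_ij * g_i (log-entropy).

    l_ij = log2(1 + f_ij)
    g_i  = 1 + sum_j p_ij * log2(p_ij) / log2(n_docs), with p_ij = f_ij / sum_j f_ij
    Assumes at least two documents (columns).
    """
    f = np.asarray(counts, dtype=float)
    n_docs = f.shape[1]
    row_totals = f.sum(axis=1, keepdims=True)
    p = np.divide(f, row_totals, out=np.zeros_like(f), where=row_totals > 0)
    plogp = p * np.log2(np.where(p > 0, p, 1.0))  # treats 0 * log(0) as 0
    g = 1.0 + plogp.sum(axis=1) / np.log2(n_docs)
    return np.log2(1.0 + f) * g[:, None], g


def cosine(x, y):
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def lsi_scores(counts, query_counts, k):
    """Cosine similarity of a keyword query to each gene document in a rank-k LSI space."""
    a, g = log_entropy(counts)
    u, _, _ = np.linalg.svd(a, full_matrices=False)
    u_k = u[:, :k]
    docs = a.T @ u_k  # row j equals (V_k S_k)[j]: gene document j in LSI space
    q = (np.log2(1.0 + np.asarray(query_counts, dtype=float)) * g) @ u_k
    return [cosine(q, d) for d in docs]


# Hypothetical term counts; columns are gene documents Reln, Dab1, App, Psen1.
genes = ["Reln", "Dab1", "App", "Psen1"]
counts = np.array([
    [4, 3, 0, 0],  # neuronal
    [5, 4, 0, 1],  # migration
    [0, 0, 6, 4],  # amyloid
    [0, 0, 3, 5],  # plaque
    [1, 3, 1, 0],  # tyrosine
])
query = [1, 1, 0, 0, 0]  # "neuronal migration"
for gene, score in zip(genes, lsi_scores(counts, query, k=2)):
    print(f"{gene}: {score:.2f}")
```

Projecting both documents and the query with the same U_k keeps them in one coordinate system, so the sign ambiguity of SVD vectors does not affect the cosine scores.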

25 50-Gene Document Collection
[Figure]

26 Hierarchical Tree
[Figure: gene tree with branches labeled Development, Cancer, Alzheimer, Development]
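A tree like this can be built by agglomerative clustering of gene vectors, for example their LSI coordinates. The slide does not state the linkage or distance used; the pure-Python sketch below assumes average linkage on cosine distance, with hypothetical 2-D coordinates.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def average_linkage(vectors):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    vectors: dict mapping gene -> coordinate vector (e.g., LSI coordinates).
    Returns the merges in order as (cluster_x, cluster_y, distance) tuples.
    """
    pair_dist = {frozenset(p): cosine_distance(vectors[p[0]], vectors[p[1]])
                 for p in combinations(vectors, 2)}

    def cluster_distance(x, y):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(pair_dist[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))

    clusters = [(gene,) for gene in vectors]
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((x, y, cluster_distance(x, y)))
        clusters.remove(x)
        clusters.remove(y)
        clusters.append(x + y)
    return merges


# Hypothetical 2-D coordinates for four genes.
genes = {"Reln": (0.9, 0.1), "Dab1": (0.8, 0.2), "App": (0.1, 0.9), "Psen1": (0.25, 0.8)}
for x, y, d in average_linkage(genes):
    print(f"merge {x} + {y} at distance {d:.3f}")
```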

27 Unrooted Tree (Graph)

28 Semantic Gene Organizer © User Interface

29 GeneIndexer Software www.computablegenomix.com

