The BIG Goal “The greatest challenge, however, is analytical. … Deeper biological insight is likely to emerge from examining datasets with scores of samples.”

1 The BIG Goal: “The greatest challenge, however, is analytical. … Deeper biological insight is likely to emerge from examining datasets with scores of samples.” (Eric Lander, “Array of hope”, Nat. Genet. 21 (suppl.), pp. 3-4, 1999.) Bio-informatics: providing methodologies for elucidating biological knowledge from biological data.

2-6 Central Paradigm of Bio-informatics: Genetic Information → Molecular Structure → Biochemical Function → Symptoms

7 Computer Science Tools are Crucial http://www.sanger.ac.uk/PostGenomics/S_pombe/presentations/EMBOCopenhagenWebsite.pdf

8 New bio-technologies create huge amounts of data. It is impossible to analyze such data by manual inspection. Novel mathematical, statistical, algorithmic and computational tools are necessary!

9 Automated Sequencing http://cbms.st-and.ac.uk/academics/ryan/Teaching/SB&Bioinf/lecture1.htm

10 What is Bio-Informatics? A field of science in which biology, computer science and information technology merge into a single discipline. Computers and software tools are used to collect, analyze and interpret biological information at the molecular level. Goal: to enable the discovery of new biological insights and to create a global perspective for biologists.

11 Disciplines: Development of new algorithms and statistical methods to assess relationships among members of large data sets. Analysis and interpretation of various types of data. Development and implementation of tools to efficiently access and manage different types of information.

12 Why Use Bio-Informatics? The explosive growth in the amount of biological information necessitates the use of computers for cataloging and retrieving data (> 3 billion bp in the human genome; > 30,000 genes by the estimates of the time, since revised to about 20,000 protein-coding genes). The human genome project. Automated sequencing. GenBank has over 16 billion bases and is doubling every year!

13 New Types of Biological Data: microarrays (gene expression); multi-level maps (genetic; physical: sequence, annotation); networks of protein-protein interactions; cross-species relationships (homologous genes); chromosome organization. http://www.the-scientist.com/yr2002/apr/research 020415.html

14 Why Bio-Informatics? (cont.) A more global view of experimental design (from the “one scientist = one gene/protein/disease” paradigm to whole-organism consideration). Data mining: functional/structural information is important for studying the molecular basis of diseases, diagnostics, drug development (personalized medicine), evolutionary patterns, etc.

15 Why Bio-Informatics? (cont.) http://www.library.csi.cuny.edu/~davis/Bioinfo_326/lectures/lect14/lect_14.html

16 Future of Genomic Research. Principal milestones in data mining and genome analysis: the Sanger method for sequencing, invented in 1977 (Nobel Prize 1980); the polymerase chain reaction (PCR), invented in 1983 (Nobel Prize 1993). http://www.usgenomics.com/technology/index.shtml

17 The next step: locate all the genes and understand their function. This will probably take another 15-20 years!

18 Disease Genes Discovered

20 The job of biologists is changing… One can efficiently find information using databases and software on the web. Question: How likely are you to use a free bio-informatics library of accessible software? http://www.cryst.bbk.ac.uk/classlib/BBSRC_poster/potential.html

21 Molecular Biology Analysis Software Tools - Freely Available on the Web. - Highlights

22 Broad Classification of Biological Databases http://www.mrc-lmb.cam.ac.uk/genomes/madanm/pres/biodb.htm

23 ENTREZ - PubMed NCBI

24 http://www3.ncbi.nlm.nih.gov/Entrez/index.html
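Entrez searches can also be scripted. A minimal sketch, assuming NCBI's E-utilities `esearch` endpoint with `rettype=count` and JSON output; the helper names `esearch_url` and `pubmed_count` are hypothetical, and counts returned today will differ from the October 2002 figures on the next slide.

```python
import json
import time
import urllib.parse
import urllib.request

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pubmed") -> str:
    """Build an esearch URL that asks only for the number of matching records."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def pubmed_count(term: str) -> int:
    """Return the number of PubMed records matching `term` (requires network access)."""
    with urllib.request.urlopen(esearch_url(term), timeout=30) as resp:
        data = json.load(resp)
    # Pause so repeated calls stay under NCBI's ~3 requests/second limit without an API key.
    time.sleep(0.34)
    return int(data["esearchresult"]["count"])
```

Usage: `pubmed_count("proteome")`. Keeping URL construction separate from the network call makes the query easy to inspect or paste into a browser.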

25 Post-genomic terms (Oct. 2002): Google search hits / PubMed hits
Genome: 2.1×10^6 / 76,566
Proteome: 89,300 / 1,701
Transcriptome: 9,960 / 229
Gene function: 1.2×10^6 / 6.5×10^5
Metabolome: 1,170 / 29
Glycome: 138 / 6
From: Computational Proteomics, Mark B. Gerstein, Yale U.

26 http://cbms.st-and.ac.uk/academics/ryan/Teaching/SB&Bioinf/lecture1.htm

30 Similarity / Analogy. Examples: If it looks like an elephant and smells like an elephant, it's an elephant. If it walks like a duck and quacks like a duck, it's a duck. http://cbms.st-and.ac.uk/academics/ryan/Teaching/molbiol/Bioinf_files/v3_document.htm

31 Similarity Search in Databanks: Find sequences similar to a working draft. As databanks grow, homologies become harder to detect and search quality is reduced. Alignment tools: BLAST & FASTA (time-saving heuristics, i.e. approximations).
Pairwise alignment:

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
          Length = 369

 Score = 272 bits (137), Expect = 4e-71
 Identities = 258/297 (86%), Gaps = 1/297 (0%)
 Strand = Plus / Plus

Query: 17   aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
            |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
Sbjct: 1    aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77   agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
            |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60   agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137  gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
             |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120  tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197  atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
            ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180  atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257  gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
            || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240  gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
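BLAST and FASTA are heuristics that approximate the exact local-alignment score computed by the Smith-Waterman dynamic-programming algorithm. A minimal Smith-Waterman sketch for DNA, assuming an illustrative scoring scheme (+1 match, -1 mismatch, -2 per gap position) that is not BLAST's actual scoring; bit scores and E-values like those in the output above need additional statistics not computed here.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below 0 (start afresh instead).
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("TTACGTAA", "GGACGTCC")` finds the shared core `ACGT` with score 4. The full table costs time proportional to len(a) × len(b), which is why database searches rely on heuristics instead.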

32 Multiple Sequence Alignment Multiple alignment: find protein families and functional domains.
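In a multiple alignment, conserved regions show up as runs of columns where all (or most) sequences share the same residue. A minimal sketch, assuming the sequences are already aligned (equal length, '-' for gaps); the function names, threshold and toy sequences are hypothetical, and practical domain detection uses richer models such as profile HMMs.

```python
from collections import Counter


def column_profile(alignment: list[str]) -> list[tuple[str, float]]:
    """Most common residue in each column and the fraction of sequences carrying it."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(alignment)))
    return profile


def conserved_blocks(alignment: list[str], threshold: float = 1.0,
                     min_len: int = 3) -> list[tuple[int, int]]:
    """Half-open (start, end) column ranges of at least min_len conserved, ungapped columns."""
    blocks, start = [], None
    # A gap-column sentinel at the end closes any block still open.
    for i, (residue, freq) in enumerate(column_profile(alignment) + [("-", 0.0)]):
        if residue != "-" and freq >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


toy = ["MKVLAG-W", "MKILAGQW", "MRVLAG-W"]  # made-up aligned protein fragments
consensus = "".join(residue for residue, _ in column_profile(toy))
```

Here `consensus` is `"MKVLAG-W"` and `conserved_blocks(toy)` reports the fully conserved run `LAG` as `[(3, 6)]`; lowering `threshold` would tolerate occasional substitutions.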

33 Structure-Function Relationships: sequence → structure → function

34 Protein Structure (domains)

35 Phylogeny. Evolution: a process in which small changes occur within species over time. These changes can be monitored today using molecular techniques. The Tree of Life: a classical, basic-science problem since Darwin's 1859 “On the Origin of Species”.
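The simplest molecular measure of change between two aligned sequences is the proportion p of differing sites. The Jukes-Cantor (JC69) model corrects p for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). A minimal sketch, assuming aligned DNA sequences of equal length and skipping gapped columns; the function names are hypothetical.

```python
import math


def p_distance(s1: str, s2: str) -> float:
    """Fraction of aligned, ungapped sites at which the two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for x, y in sites if x != y) / len(sites)


def jukes_cantor(s1: str, s2: str) -> float:
    """JC69-corrected distance (expected substitutions per site)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For one difference in ten sites, p = 0.1 and d ≈ 0.107; the correction grows as sequences diverge, which matters when looking far back in the tree.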

36 Tree of Life: Searching Protein Sequence Databases - How far can we see back? Timeline: origin of the universe (?); formation of the solar system; first self-replicating systems; prokaryotes/eukaryotes; plants/animals; invertebrates/vertebrates; mammalian radiation.

37 The Human Genome Project (HGP): Write down all of human DNA on a single CD (“completed” 2001). Identify all genes, their location and function (far from completion).
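One way to make the "single CD" figure concrete, assuming each base is stored in 2 bits (an assumption; the slide does not say how the DNA is encoded). A minimal sketch; `pack`, `unpack` and `packed_megabytes` are hypothetical helpers, and ambiguity codes such as N are not handled.

```python
# 2-bit code for each nucleotide (A, C, G, T only).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack DNA into 2 bits per base, 4 bases per byte, first base in the high bits."""
    seq = seq.upper()
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a final partial chunk
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Inverse of pack(); n_bases drops the padding in the last byte."""
    seq = [BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0)]
    return "".join(seq[:n_bases])


def packed_megabytes(n_bases: int) -> float:
    """Storage in megabytes (10**6 bytes) at 2 bits per base."""
    return -(-n_bases // 4) / 1e6  # ceiling division: 4 bases per byte
```

At 2 bits per base, 3 billion bases need 750 MB, roughly the capacity of one CD (about 650-700 MB); stored as one byte per letter, the same sequence would need about 3 GB.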

38 Example of a Gene-Localization Bio-Tool: FISH.

39 FISH - Fluorescence In-Situ Hybridization. Fluorescently labeled probes hybridize to specific chromosomal locations. Example application: low-resolution localization of a gene.

40 Sequencing Genes & Gene Assembly Automated sequencing

41 Gene Finding: Only about 1-2% of the human genome codes for proteins. Genes are scattered among large non-coding DNA regions. Repeats, pseudogenes, introns and vector contamination make genes hard to find.

42 Gene Finding (cont.): Find special gene patterns: translation start and stop sites (open reading frames, ORFs); transcription-factor binding sites and promoters; intron splice sites; etc.
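The simplest of these patterns, an open reading frame, can be scanned for directly. A minimal sketch, assuming the standard genetic code (start ATG; stops TAA, TAG, TGA) and scanning only the three forward frames; the reverse strand would be handled by running the same scan on the reverse complement. `find_orfs` and its `min_codons` default are hypothetical; real gene finders combine such signals with splice-site, promoter and statistical models.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG...stop open reading frames in the three forward frames.

    Returns (frame, start, end) tuples with 0-based, end-exclusive coordinates;
    the stop codon is included. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # the stop closes it; look for the next ATG
    return orfs
```

For instance, `find_orfs("CCATGAAATTTTAGCC", min_codons=2)` reports one ORF in frame 2, `ATGAAATTTTAG`. Short random stretches contain many tiny ORFs, which is why a length cutoff is used.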

44 Microarrays (“DNA Chips”): A new biotechnology breakthrough: measure the RNA expression levels of thousands of genes in one experiment.
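One common way to turn two-color (two-channel) array readings into expression values, assuming the test sample is labeled red and the reference green: take log2 of the red/green intensity ratio per spot, so +1 means twofold up and -1 twofold down, then center the values so the median spot is 0. A minimal sketch; `log2_ratios` is hypothetical, and background subtraction and more careful (e.g. intensity-dependent) normalization are omitted.

```python
import math
import statistics


def log2_ratios(red: list[float], green: list[float]) -> list[float]:
    """Median-centered log2(red/green) ratio for each spot."""
    if len(red) != len(green):
        raise ValueError("need one red and one green intensity per spot")
    raw = [math.log2(r / g) for r, g in zip(red, green)]  # intensities must be > 0
    center = statistics.median(raw)  # simple global normalization: median spot -> 0
    return [x - center for x in raw]
```

The log scale makes up- and down-regulation symmetric around 0, which also suits the clustering of expression profiles discussed below.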

45 The Idea Behind Microarrays

46 Clustering Analysis of Gene Expression Data: DNA chips and personalized medicine (leading-edge, future technologies).
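The slide does not specify a clustering method; hierarchical average-linkage clustering with a correlation-based distance is one common choice for expression data, since it groups genes whose profiles rise and fall together regardless of absolute level. A minimal sketch, assuming no gene has a constant profile; the function names and gene labels are hypothetical.

```python
import math
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[list[str]]:
    """Agglomerative clustering with distance 1 - r, stopped at n_clusters."""
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[(a, b)] = dist[(b, a)] = 1 - pearson(profiles[a], profiles[b])

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = statistics.fmean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

With profiles g1 = [1, 2, 3, 4], g2 = [2, 4, 6, 8], g3 = [4, 3, 2, 1] and g4 = [8, 6, 4, 2], two clusters come out as {g1, g2} and {g3, g4}: g2 has twice g1's levels but the same shape. This naive version recomputes all cluster distances at each step, which is fine for a sketch but slow for thousands of genes.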

47 Pharmacogenomics: Use DNA information to measure and predict the reaction to drugs. Personalized medicine. Faster clinical trials (selected populations). Fewer drug side effects.

48 Protein and Other Arrays: Sequencing the human genome => a finite problem. Studying the proteome => endless possible variations, dynamic. Protein arrays. Future fields of study: Proteins + Genomics = Proteomics; Lipids + Genomics = Lipomics; Sugars + Genomics = Glycomics.

49 Understanding Mechanisms of Disease (figure labels: EC number, compound)

50-51 Putting it all together: Bio-Informatics. SEQUENCES & LITERATURE; SEQUENCE ALIGNMENT; ORTHOLOG GENES (Taxonomy); CONSERVED DOMAINS; CODING REGIONS; 3-D STRUCTURE; GENE FAMILIES; MUTATIONS & POLYMORPHISM; GENOME MAPS; CELLULAR LOCATION; SIGNAL PEPTIDE; GENE EXPRESSION, GENE FUNCTION, DRUG & PERSONAL THERAPY.

