Introduction to Bioinformatics Lecturer: Dr. Yael Mandel-Gutfreund Teaching Assistants: Oleg Rokhlenko, Ydo Wexler

Presentation transcript:

2 What is Bioinformatics?

3 Course Objectives To introduce the bioinformatics discipline. To familiarize students with the major biological questions that can be addressed with bioinformatics tools. To introduce the major tools used for sequence and structure analysis and explain in general terms how they work (including their limitations).

4 Course Structure and Requirements 1. Class structure: each class (except the first one) will be divided into two parts: a lecture (in the lecture room) and a training lab (in the computer lab)*. For the training lab the class will be divided into two groups. Each group will meet every second week, starting from the second week. Work in the training labs will be done in pairs. Lab assignments will be submitted at the end of each lab. Preparing for the lab: a tutorial with self-study exercises and their answers will be posted on the web a week before the lab. 2. A final home exam

5 Grading 30% lab assignments, 70% final exam

6 Literature list Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly. Lesk, A. M. Introduction to Bioinformatics. Oxford University Press. Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press. Advanced reading: Jones, N. C. & Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004

7 Course syllabus

8 What is Bioinformatics?

9 “The field of science in which biology, computer science, and information technology merge to form a single discipline” Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned. What is Bioinformatics?

10 from purely lab-based science to an information science Bioinformatics Bio = Informatics

11 Central Paradigm in Molecular Biology Gene (DNA) → mRNA → Protein 21st century: Genome → Transcriptome → Proteome

12 Genome Chromosomal DNA of an organism Coding and non-coding DNA Genome size and number of genes do not necessarily determine organism complexity

13 Transcriptome Complete collection of all possible mRNAs (including splice variants) of an organism. Regions of an organism’s genome that get transcribed into messenger RNA. Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.

14 Proteome The complete collection of proteins that can be produced by an organism. Can be studied either as static (sum of all proteins possible) or dynamic (all proteins found at a specific time point) entity

15 From DNA to Genome Watson and Crick DNA model First protein sequence First protein structure

16 First human genome draft First bacterial genome (Haemophilus influenzae) Yeast genome

17 The Human Genome Project Initiated in 1986 Completed in 2003 Project goals were to identify all the genes in human DNA; determine the sequences of the 3 billion chemical base pairs that make up human DNA; store this information in databases; improve tools for data analysis and develop new tools; and address the ethical, legal, and social issues that may arise from the project.

18 Human Genome Project USA Department of Energy announces project International Human Genome Organization founded Low resolution linkage map published Celera Genomics founded First working drafts published Project successfully completed

19 The Human Genome Project Initiated in 1986 Completed in 2003 How did we do?? identify all the genes in human DNA ☺ ☺ determine the sequences of the 3 billion chemical base pairs that make up human DNA ☺ ☺ ☺ store this information in databases ☺ ☺ ☺ improve tools for data analysis and develop new tools ☺ ☺ ☺ address the ethical, legal, and social issues that may arise from the project ☺

20 What makes us human? CHIMP GENOME Chimpanzees are similar to humans in so many ways: they are socially complex, sensitive and communicative, and yet indisputably on the animal side of the man/beast divide. Scientists have now sequenced the genetic code of our closest living relative, showing the striking concordances and divergences between the two species, and perhaps holding up a mirror to our own humanity.

21 How human are chimps? Perhaps not surprising! Comparison of the full drafts of the human and chimp genomes revealed that they differ by only 1.23%
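A figure such as 1.23% can be read as the share of aligned positions at which two sequences carry different bases. The sketch below illustrates that calculation only; it assumes the two sequences are already aligned to equal length, skips gap positions, and uses made-up toy fragments rather than real human or chimp sequence. The function name `percent_difference` is hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps; only substitutions are counted
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Toy, made-up aligned fragments (NOT real genome data): one mismatch in 20 positions.
toy_a = "ATTGCGTCGATCGCACGCAC"
toy_b = "ATTGCGTCGATCGCATGCAC"
print(f"{percent_difference(toy_a, toy_b):.2f}% of compared positions differ")
```

Counting only substitutions is a simplification; insertions and deletions would need a separate treatment.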

eukaryotes 24 bacteria 240 archaea 39 Complete Genomes

23 The “post-genomics” era Goal: to understand the functional networks of a living cell Annotation Comparative genomics Structural genomics Functional genomics What’s Next?

24 Annotation Open reading frames Functional sites Structure, function

25 CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT TGAAAAACGTA

26 CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT TGA AAAACGTA TF binding site promoter Ribosome binding Site ORF=Open Reading Frame CDS=Coding Sequence Transcription Start Site
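Locating open reading frames is one of the annotation steps named in these slides. The following is a minimal sketch, not the method used by any particular annotation tool: it scans the three forward reading frames for an ATG followed in frame by a stop codon. It assumes the standard stop codons and ignores the reverse strand. The fragment is the tail of the sequence shown on the slide; the function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for each ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


# Tail of the slide's sequence, starting at the spaced-out ORF.
fragment = "ATGCAATATGGACAATTGGTTTCTTCTCTGAATTGAAAAACGTA"
orfs = find_orfs(fragment)
longest = max(orfs, key=lambda orf: len(orf[2]))
print(longest)
```

On this fragment the scan also reports a shorter ORF in a second reading frame, which is why real annotation pipelines combine ORF length with other signals such as promoters and ribosome binding sites.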

27 Comparative genomics Whole-genome comparison Drawing conclusions about regulatory networks

28 Chimps and Us

29 Comparative genomics Comparing ORFs Identifying orthologs Drawing conclusions about structure and function Whole-genome comparison Drawing conclusions about regulatory networks

30 Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.

31 Functional genomics Genome-wide profiling of: mRNA levels Protein levels Co-expression of genes and/or proteins

32 Understanding the function of genes and other parts of the genome

33 Functional genomics Genome-wide profiling of: mRNA levels Protein levels Co-expression of genes and/or proteins Identifying protein-protein interaction Networks of interactions

34 A network of interactions can be built for all proteins in an organism. Example: a large network of 8184 interactions among 4140 S. cerevisiae proteins
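One simple way to represent such a network is an undirected graph stored as an adjacency map, built from a list of pairwise interactions. The sketch below assumes that representation; the protein names and interaction pairs are invented placeholders, not yeast data, and `build_network` is a hypothetical helper.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        network[a].add(b)
        network[b].add(a)
    return network


# Hypothetical toy interaction list (placeholder names, not real proteins).
toy_interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
network = build_network(toy_interactions)

# Degree = number of distinct interaction partners of each protein.
degrees = {protein: len(partners) for protein, partners in network.items()}
hub = max(degrees, key=degrees.get)
print(degrees, "most connected:", hub)
```

Using sets means a pair listed twice, or in both orders, is counted once.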

35 Structural genomics Assign structure to all proteins encoded in a genome

36 Protein Structure

37 Resources and Databases The different types of data are collected in databases –Sequence databases –Structural databases –Databases of Experimental Results All databases are connected

38 Database Types Sequence databases: General: GenBank, EMBL, PIR, Swiss-Prot; Special: TF binding sites, promoters, genomes. Structure databases: General: PDB; Special: specific protein families, folds. Databases of experimental results: co-expressed genes, protein-protein interactions, etc.

39 Sequence databases Gene database Genome database SNPs database Disease related mutation database

40 What can we learn about a Gene

41 mRNA, full length, EST

42 EST Expressed Sequence Tags Partial copies of mRNA found within a particular cell Can be used to identify genic regions; splicing patterns of genes; etc

43 Different transcripts can be related to the same gene!

44 Gene database Gives insight into gene function Alternative splicing of genes –Alternative patterns of exons included to create the gene product EST

45 Genome Databases Data organized by species Clones assembled into contiguous pieces (“contigs”) or whole chromosomes Information on non-coding regions Relativity

46 Genome Browsers Annotation adds value to sequence Easy “walk” through the genome Comparative genomics

47 Genome Browsers Ensembl Genome Browser; UCSC Genome Browser; WormBase; AceDB; Comprehensive Microbial Resource; FlyBase

48 beta globin

50 RefSeq Set of mRNA sequences curated at NCBI Many experimentally validated Some partially validated via ESTs Some computationally predicted

56 SNP database Single Nucleotide Polymorphisms (SNPs) A single-base difference at one position between individuals of the same species Play an important role in differentiation and disease

57 Sickle Cell Anemia Caused by a single nucleotide substitution (A to T) that replaces glutamic acid with valine in hemoglobin Image source:

58 Healthy Individual >gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GG A GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi| |ref|NP_ | beta globin [Homo sapiens] MVHLTP E EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH

59 Diseased Individual >gi| |ref|NM_ | Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GG T GAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi| |ref|NP_ | beta globin [Homo sapiens] MVHLTP V EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
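The effect of the single-base change on the protein can be checked by translating the start of the coding sequence from both records. This is a minimal sketch under stated assumptions: it uses the standard genetic code, takes only the first nine codons of the HBB coding region (beginning at the ATG in the mRNA above), and places the A to T substitution in the codon that produces the P-E-E to P-V-E change seen in the two protein sequences. The names `translate` and `point_differences` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def point_differences(seq_a: str, seq_b: str):
    """List (0-based position, base_a, base_b) where two equal-length sequences differ."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# First nine codons of the HBB coding sequence, starting at the ATG.
healthy_cds = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Same fragment with the A-to-T substitution (GAG -> GTG); position is an assumption.
sickle_cds = "ATGGTGCATCTGACTCCTGTGGAGAAG"

print(point_differences(healthy_cds, sickle_cds))
print(translate(healthy_cds))
print(translate(sickle_cds))
```

The two translated peptides differ at a single residue, glutamic acid (E) versus valine (V), matching the protein records on the slides.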

60 Disease Databases Genes are involved in disease Many diseases are well studied Description of diseases and what is known about them is stored OMIM - Online Mendelian Inheritance in Man

62 Structure Databases 3-dimensional structures of proteins, nucleic acids, molecular complexes etc 3-d data is available due to techniques such as NMR and X-Ray crystallography

65 Databases of Experimental Results Data such as experimental microarray images- expression data Clustering information Metabolic pathways, protein-protein interaction data

66 Literature Databases PubMed MEDLINE publication database –Over 17,000 journals –15 million citations since 1950 Service of the National Library of Medicine

67 Putting it All Together Each database contains specific information Like biological systems themselves, these databases are interrelated

68 GENOMIC DATA: GenBank, DDBJ, EMBL; ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR; PROTEIN: PIR, SWISS-PROT; STRUCTURE: PDB, MMDB, SCOP; LITERATURE: PubMed; PATHWAY: KEGG, COG; DISEASE: LocusLink, OMIM, OMIA; GENES: RefSeq, AllGenes, GDB; SNPs: dbSNP; ESTs: dbEST, UniGene; MOTIFS: BLOCKS, Pfam, Prosite; GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

69 Entrez – NCBI Engine Entrez is the integrated, text-based search and retrieval system used at NCBI for the major databases, including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, and others.
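Entrez can also be queried programmatically through NCBI's E-utilities web interface, which the slides do not cover. The sketch below only builds an ESearch request URL and does not send it. It assumes the public `esearch.fcgi` endpoint with its `db`, `term`, `retmax` and `retmode` parameters; the search term is an illustrative choice and `build_esearch_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public E-utilities ESearch endpoint (not mentioned in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch URL for a text query against one database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query: PubMed records with the phrase in the title.
url = build_esearch_url("pubmed", "sickle cell anemia[Title]", retmax=5)
print(url)

# Decode the query string again to confirm what would be sent.
query = parse_qs(urlsplit(url).query)
print(query)
```

Keeping URL construction separate from the network call makes the query easy to inspect before it is submitted.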

70 Entrez – NCBI Engine

71 General Bioinformatic Webpages –USA National Center for Biotechnology Information: –European Bioinformatics Institute: –ExPASy Molecular Biology Server: –Israeli National Node: inn.org.il