
1 Introduction to Bioinformatics 234525-236523 Lecturer: Dr. Yael Mandel-Gutfreund Teaching Assistants: Martin Akerman, Sivan Bercovici Course website: http://webcourse.cs.technion.ac.il/234525

2 2 What is Bioinformatics?

3 3 Course Objectives: To introduce the bioinformatics discipline. To familiarize students with the major biological questions that can be addressed by bioinformatics tools. To introduce the major tools used for sequence and structure analysis and explain in general how they work (including their limitations).

4 4 Course Structure and Requirements 1. Class structure: (a) 2-hour lecture; (b) 1-hour tutorial. 2. Homework: homework projects will be given every second week and done in pairs; all five (5/5) homework projects must be submitted. 3. A final project will be conducted and submitted in pairs.

5 5 Grading: 30% homework assignments, 70% final project

6 6 Literature list
Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced reading: Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

7 7 What is Bioinformatics?

8 8 What is Bioinformatics? “The field of science in which biology, computer science, and information technology merge to form a single discipline.” Ultimate goal: to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned.

9 9 Bioinformatics: from a purely lab-based science to an information science. Bio = Informatics

10 10 Central Paradigm in Molecular Biology: Gene (DNA) → mRNA → Protein. 21st century: Genome → Transcriptome → Proteome

11 11 Genome: the chromosomal DNA of an organism, comprising coding and non-coding DNA. Genome size and number of genes do not necessarily determine organism complexity.

12 12 Transcriptome Complete collection of all possible mRNAs (including splice variants) of an organism. Regions of an organism’s genome that get transcribed into messenger RNA. Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.

13 13 Proteome: the complete collection of proteins that can be produced by an organism. It can be studied as either a static entity (the sum of all possible proteins) or a dynamic entity (all proteins found at a specific time point).

14 14 From DNA to Genome. Timeline, 1955-1985: Watson and Crick DNA model; first protein sequence; first protein structure.

15 15 Timeline, 1990-2000: first bacterial genome (Haemophilus influenzae); yeast genome; first human genome draft.

16 16 Complete Genomes
             2008   2007
Total         706    456
Eukaryotes     78     43
Bacteria      578    383
Archaea        50     29

17 17 How human are chimps? Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%. Perhaps not surprising!

18 18 What’s next? The “post-genomics” era. Goal: to understand the living cell. Annotation; comparative genomics; structural genomics; functional genomics.

19 19 Annotation
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT.................... TGAAAAACGTA

20 20 Annotation: identify the genes within a given DNA sequence; identify the sites that regulate each gene; predict the function.

21 21 Annotated sequence
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT............................................... TGA AAAACGTA
Labeled features: TF binding site; promoter; transcription start site; ribosome binding site; ORF (open reading frame); CDS (coding sequence).
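
One of the labeled features, the open reading frame, can be located with a very small scan. The following is a minimal sketch, not the method used by any particular annotation tool: it assumes the standard stop codons, searches only the three forward-strand frames, and uses a made-up toy sequence (the slide's own sequence is elided in the middle, so it is not used here). The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is inclusive and end is exclusive (the stop codon is included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (an assumption for illustration, not real data).
toy = "GGATGCAATATGGACAATTGTGAAAACG"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

Real gene finders also consider the reverse strand, regulatory signals such as promoters and ribosome binding sites, and statistical models of coding sequence; this sketch covers only the ORF definition itself.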

22 22 Comparative genomics
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
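
A simple way to quantify such a comparison is to count identical columns in the alignment. The sketch below assumes the three rows are already aligned as on the slide (with the spacing around the gap characters removed) and skips any column that contains a gap; other conventions, such as counting gaps as mismatches, are equally common. The function name `aligned_identity` is hypothetical.

```python
def aligned_identity(a, b, gap="-"):
    """Return (matches, compared) over the columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # gap columns are skipped in this convention
        compared += 1
        if x == y:
            matches += 1
    return matches, compared


human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

for name, other in (("chimp", chimp), ("mouse", mouse)):
    matches, compared = aligned_identity(human, other)
    print(f"human vs {name}: {matches}/{compared} = {matches / compared:.1%}")
```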

23 23 Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse. Example: conservation of the IGFALS gene (insulin-like growth factor-binding protein, acid-labile subunit) between human and mouse.

24 24 Functional genomics

25 25 Understanding the function of genes and other parts of the genome

26 26 A network of interactions can be built for all proteins in an organism. Example: a large network of 8184 interactions among 4140 S. cerevisiae proteins.
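
Such a network is commonly stored as an undirected graph. The following minimal sketch builds an adjacency map from a list of pairwise interactions and reports each protein's number of partners; the protein names P1 to P4 and the interaction list are invented for illustration and are not taken from the yeast data set.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


# Hypothetical interaction pairs, for illustration only.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]

network = build_network(pairs)
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)  # protein with the most interaction partners

print(degree)
print("most connected:", hub)
```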

27 27 Structural genomics

28 28 Assigning the structures of all proteins. Diagram labels: protein-ligand complexes; functional sites; fold; evolutionary relationship; shape and electrostatics; active sites; protein complexes; biologic processes.

29 29 Resources and Databases: the different types of data are collected in databases: sequence databases, structural databases, and databases of experimental results. All databases are connected.

30 30 Sequence databases: gene databases; genome databases; SNP databases; disease-related mutation databases.

31 31 Gene databases: give information on gene function. Alternative splicing of genes: alternative patterns of exons included to create the gene product. ESTs (expressed sequence tags).

32 32 Genome Databases: data organized by species; clones assembled into contiguous pieces (‘contigs’) or whole chromosomes; information on non-coding regions; relativity.

33 33 Genome Browsers: annotation adds value to sequence; easy “walk” through the genome; comparative genomics.

34 34 Genome Browsers
UCSC Genome Browser: http://genome.ucsc.edu/
Ensembl Genome Browser: http://www.ensembl.org
WormBase: http://www.wormbase.org/
AceDB: http://www.acedb.org/
Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
FlyBase: http://flybase.bio.indiana.edu/

35 35 SNP database: Single Nucleotide Polymorphisms (SNPs) are single-base differences at a single position between two individuals of the same species. They play an important role in differentiation and disease.

36 36 Sickle Cell Anemia: due to a single-base substitution, swapping an A for a T, which causes valine to be incorporated instead of glutamic acid in hemoglobin. Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

37 37 Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTG A
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTP E EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH

38 38 Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTG T
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTP V EKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
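
The effect of the single-base change on the protein can be traced with a short translation sketch. It uses only the first nine codons of the HBB coding sequence (starting at the ATG start codon of the mRNA above) and a deliberately partial codon table that covers just those codons. It assumes the substitution is the A to T change in the seventh codon (GAG to GTG), and the helper name `translate` is hypothetical.

```python
# Partial codon table: only the codons that occur in this short example.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L",
    "ACT": "T", "CCT": "P", "GAG": "E", "AAG": "K",
}


def translate(cds):
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# First nine codons of the HBB coding sequence.
normal_cds = "ATGGTGCATCTGACTCCTGAGGAGAAG"

# Assumed position of the substitution: 0-based index 19, the middle base
# of the seventh codon (GAG becomes GTG).
pos = 19
sickle_cds = normal_cds[:pos] + "T" + normal_cds[pos + 1:]

print(translate(normal_cds))
print(translate(sickle_cds))
```

The two printed peptides differ at a single residue, glutamic acid (E) versus valine (V), which is the difference between the two protein sequences shown on the slides.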

39 39 Disease Databases: genes are involved in disease; many diseases are well studied; descriptions of diseases and what is known about them are stored.

40 40 Structure Databases: 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc. 3-D data is available thanks to techniques such as NMR and X-ray crystallography.

42 42 Databases of Experimental Results: data such as experimental microarray images (expression data); proteomic data; metabolic pathways, protein-protein interaction data, regulatory networks, etc.

43 43 Literature Databases. PubMed: MEDLINE publication database with over 17,000 journals and 15 million citations since 1950. A service of the National Library of Medicine. http://www.ncbi.nlm.nih.gov/PubMed

44 44 Putting it all Together: each database contains specific information. Like biological systems themselves, these databases are interrelated.

45 45 Databases by category
GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, unigene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

