Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2002.08 Introduction to Bioinformatics.

1 Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2002.08 Introduction to Bioinformatics

2 The Swiss Institute of Bioinformatics
- Collaborative structure Lausanne - Geneva
  - Groups at ISREC, Ludwig Institute, CHUV, Unil, HUG, UniGe, and recently UniBas
- Several roles: research, services, teaching
  - DEA (master's degree) in Bioinformatics: 1 year full time
  - EMBnet courses: 2 x 1 week per year, to be extended
  - Undergraduate courses at the Universities of Geneva, Fribourg and Lausanne

3 Projects at SIB
- Databases
  - SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL
  - TrEST, TrGEN (predicted proteins), tromer (transcriptome)
- Software
  - Melanie, Deep View, proteomics tools, ESTScan, pftools, Java applets
- Services
  - Web servers ExPASy, EMBnet
  - Teaching and helpdesk
- Research
  - Mostly sequence and expression analysis, 3D structure, and proteomics

4 EMBnet organisation
- Founded as a European network in 1988, now spread world-wide
- 29 country nodes, 9 special nodes
- Role
  - Training, education
  - Software development (EMBOSS, SRS)
  - Computing resources (databases, websites, services)
  - Helpdesk and technical support
  - Publications

5 Swiss node http://www.ch.embnet.org

6 Other important sites
- ExPASy - Expert Protein Analysis System: www.expasy.org
- EBI - European Bioinformatics Institute: www.ebi.ac.uk
- NCBI - National Center for Biotechnology Information: www.ncbi.nlm.nih.gov
- Sanger - The Sanger Institute: www.sanger.ac.uk

7 Bioinformatics: definition
- Every application of computer science to biology
  - Sequence analysis, image analysis, sample management, population modelling, …
- Analysis of data coming from large-scale biological projects
  - Genomes, transcriptomes, proteomes, metabolomes, etc.

8 The new biology
- Traditional biology
  - Small team working on a specialized topic
  - Well-defined experiments to answer precise questions
- New « high-throughput » biology
  - Large international teams using cutting-edge technology that defines the project
  - Results are released raw to the scientific community, without any underlying hypothesis

9 Examples of « high-throughput » biology
- Complete genome sequencing
- Large-scale sampling of the transcriptome (EST)
- Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)
- Large-scale sampling of the proteome
- Protein-protein interaction analysis: large-scale 2-hybrid (yeast, worm)
- Large-scale 3D structure production (yeast)
- Metabolism modelling
- Simulations
- Biodiversity

10 Role of bioinformatics
- Control and management of the data
- Analysis of primary data, e.g.
  - Base calling from chromatograms
  - Mass spectra analysis
  - DNA microarray image analysis
- Statistics
- Database storage and access
- Results analysis in a biological context

11 First information: a sequence?
- Nucleotide
  - RNA (or cDNA)
  - Genomic (intron-exon)
  - Complete or incomplete?
    - mRNA with 5’ and 3’ UTR regions
    - Entire chromosome
- Protein
  - Pre/Pro or functional protein?
  - Function prediction
  - Post-translational modifications?
  - Holy Grail: 3D structure?

12 Genomes in numbers
- Sizes:
  - virus: 10^3 to 10^5 nt
  - bacteria: 10^5 to 10^7 nt
  - yeast: 1.35 x 10^7 nt
  - mammals: 10^8 to 10^10 nt
  - plants: 10^10 to 10^11 nt
- Gene number:
  - virus: 3 to 100
  - bacteria: ~1000
  - yeast: ~7000
  - mammals: ~30’000
  - plants: 30’000-50’000?

13 Sequencing projects
- « Small » genomes (<10^7 nt): bacteria, viruses
  - Many already sequenced (industry excluded)
  - More than 90 microbial genomes already in the public domain
  - More to come! (one new every two weeks…)
- « Large » genomes (10^7 to 10^10 nt): eukaryotes
  - 12 finished (S. cerevisiae, S. pombe, E. cuniculi, C. elegans, D. melanogaster, A. gambiae, D. rerio, F. rubripes, A. thaliana, O. sativa, M. musculus, Homo sapiens)
  - Many more to come: rat, pig, cow, maize (and other plants), insects, fishes, many pathogenic parasites (Plasmodium…)
- EST sequencing
  - Partial mRNA sequences
  - ~12 x 10^6 sequences in the public domain

14 Human genome
- Size: 3 x 10^9 nt for a haploid genome
- Highly repetitive sequences 25%, moderately repetitive sequences 25-30%
- Size of a gene: from 900 to >2’000’000 bases (introns included)
- Proportion of the genome coding for proteins: 5-7%
- Number of chromosomes (haploid set): 22 autosomes and 1 sex chromosome
- Size of a chromosome: 5 x 10^7 to 5 x 10^8 bases
[Figure: schematic of a chromosome, labelled: centromere, telomere, exons of a gene, regulatory elements, repetitive sequences, locus control region]
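
The numbers on the "Genomes in numbers" and "Human genome" slides can be combined into a rough back-of-the-envelope estimate of how sparsely genes are spread along a genome. The sketch below uses only figures quoted on the slides (yeast: 1.35 x 10^7 nt and ~7000 genes; human: 3 x 10^9 nt, with the mammalian figure of ~30’000 genes used as a stand-in for human, and a 5-7% coding fraction). The function name `nt_per_gene` is illustrative and not part of the course material.

```python
def nt_per_gene(genome_size_nt: float, gene_count: int) -> float:
    """Average number of nucleotides of genome per gene (size / gene count)."""
    return genome_size_nt / gene_count

# Figures quoted on the slides.
yeast_spacing = nt_per_gene(1.35e7, 7_000)
human_spacing = nt_per_gene(3e9, 30_000)

# Span of protein-coding sequence implied by the 5-7% figure.
coding_low = 3e9 * 0.05
coding_high = 3e9 * 0.07

print(f"yeast: about {yeast_spacing:,.0f} nt of genome per gene")
print(f"human: about {human_spacing:,.0f} nt of genome per gene")
print(f"human coding sequence: {coding_low:.2e} to {coding_high:.2e} nt")
```

On these figures the human genome carries roughly fifty times more sequence per gene than yeast, which is one reason why locating genes in a mammalian genomic sequence is much harder.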

15 How to sequence the human genome?
- Consortium « international » approach:
  - Generate genetic maps (meiotic recombination) and pseudogenetic maps (chromosome hybrids) for indicator sequences
  - Generate a physical map based on large clones (BAC or PAC)
  - Sequence enough large clones to cover the genome
- « Commercial » approach (Celera):
  - Generate random libraries of fixed-length genomic clones (2 kb and 10 kb)
  - Sequence both ends of enough clones to obtain a 10x coverage
  - Use computer techniques to reconstitute the chromosomal sequences, check with the public project physical map
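
To give a feel for what "10x coverage" means in the shotgun approach, coverage is usually defined as c = N x L / G, where N is the number of reads, L the read length and G the genome size. The sketch below is a minimal illustration under stated assumptions: the genome size (3 x 10^9 nt) comes from the slides, but the read length of 500 nt is an assumed value chosen for the example, and the Poisson estimate of uncovered bases (the Lander-Waterman model) is a standard idealisation that ignores repeats and cloning bias. Function names are illustrative.

```python
import math

def reads_needed(genome_size: int, read_length: int, coverage: float) -> int:
    """Smallest N such that N * read_length / genome_size >= coverage."""
    return math.ceil(coverage * genome_size / read_length)

def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases not covered by any read."""
    return math.exp(-coverage)

GENOME_SIZE = 3_000_000_000  # haploid human genome, from the slides
READ_LENGTH = 500            # assumed read length in nt (not from the slides)

n_reads = reads_needed(GENOME_SIZE, READ_LENGTH, coverage=10)
n_clones = n_reads // 2      # both ends of each clone are sequenced

print(f"reads needed:  {n_reads:,}")
print(f"clones needed: {n_clones:,}")
print(f"expected uncovered fraction: {expected_uncovered_fraction(10):.2e}")
```

Under these assumptions tens of millions of reads are required, which is why the assembly step relies entirely on computer techniques.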

16 Sequencing progression

17 Interpretation of the human draft
- Still many gaps and unordered small pieces (except for chr 6, 7, 20, 21, 22, Y)
- Even a genomic sequence does not tell you where the genes are encoded. The genome is far from being « decoded »
- One must combine genome and transcriptome to get a better idea

18 The transcriptome
- The set of all functional RNAs (tRNA, rRNA, mRNA, etc.) that can potentially be transcribed from the genome
- The documentation of the localization (cell type) and conditions under which these RNAs are expressed
- The documentation of the biological function(s) of each RNA species

19 Public draft transcriptome
- Information about the expression specificity and the function of mRNAs
  - « full » cDNA sequences of known function
  - « full » cDNA sequences, but « anonymous » (e.g. KIAA or DKFZ collections)
  - EST sequences
    - cDNA libraries derived from many different tissues
    - Rapid random sequencing of the ends of all clones
  - ORESTES sequences
- Growing set of expression data (microarrays, SAGE, etc.)
- Increasing evidence for multiple alternative splicing and polyadenylation

20 Example mapping of ESTs and mRNAs
[Figure: genome mapping with tracks labelled ESTs, mRNAs, Computer prediction]

21 The proteome
- Set of proteins present in a particular cell type under particular conditions
- Set of proteins potentially expressed from the genome
- Information about the specific expression and function of the proteins

22 Information on the proteome
- Separation of a complex mixture of proteins
  - 2D PAGE (IEF + SDS PAGE)
  - Capillary chromatography
- Individual characterisation of proteins
  - Tryptic peptide signature (MS)
  - Sequencing by chemistry or MS/MS
- All post-translational modifications (PTMs)!
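
The "tryptic peptide signature" can be illustrated in silico: trypsin cleaves on the C-terminal side of lysine (K) and arginine (R), and the list of resulting peptide masses is what a mass spectrometer measures. The sketch below is a simplified model, not the method of any particular tool: it assumes the common rule that cleavage is blocked when the next residue is proline, ignores missed cleavages and PTMs, and uses standard monoisotopic residue masses rounded to five decimals. The function names and the test sequence are made up for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water per peptide (free N- and C-termini)

def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides

def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS

# Made-up sequence, for illustration only.
for pep in tryptic_digest("AKRPGKMRW"):
    print(f"{pep:6s} {peptide_mass(pep):10.4f}")
```

Matching a measured list of such masses against the digests predicted for every entry in a protein database is the idea behind peptide mass fingerprinting; PTMs shift the masses, which is one reason they complicate identification.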

23 Three-dimensional structures
- Methods to determine structures
  - X-ray crystallography
  - NMR
- Data format
  - Atom coordinates (except H) in a Cartesian space
- Databases
  - For proteins and nucleic acids (RCSB PDB)
  - Independent databases for sugars and small organic molecules
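
As an illustration of "atom coordinates in a Cartesian space", the sketch below reads the x, y, z fields from ATOM records in the fixed-width PDB text format (coordinates occupy columns 31-38, 39-46 and 47-54, in ångströms) and computes the distance between two atoms. The two records are made-up examples, not taken from a real entry, and `parse_atom_record` is an illustrative helper that extracts only a handful of fields.

```python
import math

def parse_atom_record(line: str) -> dict:
    """Extract a few fixed-width fields from a PDB ATOM/HETATM line."""
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "xyz": (
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ),
    }

# Made-up ATOM records (illustrative coordinates only).
records = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
]

atoms = [parse_atom_record(rec) for rec in records]
distance = math.dist(atoms[0]["xyz"], atoms[1]["xyz"])  # Euclidean distance
print(f"{atoms[0]['name']}-{atoms[1]['name']} distance: {distance:.2f} angstroms")
```

Everything a viewer displays (bonds, secondary structure, surfaces) is derived from such coordinate lists.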

24 Visualisation of the structures
- Secondary structure elements
  - Alpha helices, beta sheets, other
- Software
  - Various representations (atoms, bonds, secondary…)
  - Wide choice of commercial and free software (e.g., DeepView)

25 Sequence information, and so what?
- How to store and organise?
  - Databases (next lecture)
- How to access, search, compare?
  - Pairwise alignments, BLAST (tomorrow)
  - EST clustering, Multiple Alignments (Wednesday)
  - Patterns, PSI-BLAST, Profiles and HMMs (Thursday)
  - Gene prediction (Thursday)
- Your problems? Friday

