Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2003.01 Introduction to Bioinformatics.



SIB and EMBnet: bioinformatics resources for biomedical scientists

The Swiss Institute of Bioinformatics
- Founded in March 1998
- Collaborative structure: Lausanne, Geneva, Basel
- Groups at ISREC, Ludwig Institute, Unil, HUG, UniGe, recently UniBas and soon EPFL
- Several roles: teaching, services, research
- Currently: ~130 employees

Projects at SIB
- Databases
  - SWISS-PROT, PROSITE, EPD, World-2DPAGE, SWISS-MODEL
  - TrEST, TrGEN (predicted proteins), tromer (transcriptome)
- Software
  - Melanie, DeepView, proteomic tools, ESTScan, pftools, Java applets
- Services
  - Web servers: ExPASy, EMBnet
  - Teaching and helpdesk
- Research
  - Mostly sequence and expression analysis, 3D structure, and proteomics


Teaching
- DEA (master's degree) in Bioinformatics: 1 year full time, first diploma common to UniGe and Unil
- EMBnet courses: 2 x 1 week per year in Lausanne, to be extended to Basel
- Undergraduate courses at the Universities of Geneva, Fribourg and Lausanne
- Other courses at CHUV and EPFL
- Courses in other countries: Colombia, Cambodia, Peru, …

Research
- New algorithms (faster alignments…)
- New technology (GRID or cluster computing)
- New tools (protein analysis, microarrays, confocal microscopy)
- New databases (microarrays, transcriptome, proteome)
- Collaborations with lab researchers!

Three levels of services
- Simple web access to software and databases
  - Easy to use for basic, occasional research with few sequences
  - Potentially insecure
- Command-line access with a local Unix account
  - More powerful (automation) and secure
  - Requires an understanding of Unix and frequent practice
- Collaboration with SIB
  - Access to experts in the field (help desk)
  - For projects requiring extensive programming or special hardware resources

SIB’s important sites
- Home
- ExPASy - Expert Protein Analysis System
- Hits database and tools: hits.isb-sib.ch
- EMBnet Switzerland
- Geneva Bioinformatics

SIB home

Expert Protein Analysis System

Swiss node

EMBnet
- Organisation
  - European in 1988, now spread worldwide
  - 32 country nodes, 8 special nodes
- Role
  - Training, education (EMBER)
  - Software development (EMBOSS, SRS)
  - Computing resources (databases, websites, services)
  - Helpdesk and technical support
  - Publications (EMBnet.news, Briefings in Bioinformatics)
- Access: each node's address includes xx, the country code (e.g., ch for Switzerland)

EMBnet home

European Molecular Biology Open Software Suite (EMBOSS)
- Free, open source (for most Unix platforms)
- GCG successor (compatible with the GCG file format)
- More than 200 programs
- Easy to install locally, but with no interface; requires local databases
- Unix command line only
- Interfaces
  - Jemboss, www2gcg, w2h, wemboss … (with account)
  - Pise, EMBOSS-GUI (no account)
- Access: [...]

Other important sites
- ExPASy - Expert Protein Analysis System
- EBI - European Bioinformatics Institute
- NCBI - National Center for Biotechnology Information
- Sanger - The Sanger Institute

Bioinformatics: definition
- Every application of computer science to biology
  - Sequence analysis, image analysis, sample management, population modelling, …
- Analysis of data coming from large-scale biological projects
  - Genomes, transcriptomes, proteomes, metabolomes, etc.

The new biology
- Traditional biology
  - Small team working on a specialized topic
  - Well-defined experiment to answer precise questions
- New « high-throughput » biology
  - Large international teams using cutting-edge technology defining the project
  - Results are given raw to the scientific community without any underlying hypothesis

Examples of « high-throughput »
- Complete genome sequencing
- Large-scale sampling of the transcriptome (EST)
- Simultaneous expression analysis of thousands of genes (DNA microarrays, SAGE)
- Large-scale sampling of the proteome
- Protein-protein interaction analysis: large-scale 2-hybrid (yeast, worm)
- Large-scale 3D structure production (yeast)
- Metabolism modelling
- Simulations
- Biodiversity

Role of bioinformatics
- Control and management of the data
- Analysis of primary data, e.g.
  - Base calling from chromatograms
  - Mass spectra analysis
  - DNA microarray image analysis
- Statistics
- Database storage and access
- Results analysis in a biological context

First information: a sequence?
- Nucleotide
  - RNA (or cDNA)
  - Genomic (intron-exon)
  - Complete or incomplete?
    - mRNA with 5’ and 3’ UTR regions
    - Entire chromosome
- Protein
  - Pre/Pro or functional protein?
  - Function prediction
  - Post-translational modifications?
  - Holy Grail: 3D structure?

Genomes in numbers
- Sizes:
  - virus: 10^3 to 10^5 nt
  - bacteria: 10^5 to 10^7 nt
  - yeast: 1.35 x 10^7 nt
  - mammals: 10^8 to [...] nt
  - plants: [...] to [...] nt
- Gene number:
  - virus: 3 to 100
  - bacteria: ~1000
  - yeast: ~7000
  - mammals: ~30’000
  - plants: 30’000-50’000?

Sequencing projects
- « small » genomes (<10^7 nt): bacteria, viruses
  - Many already sequenced (industry excluded)
  - More than 100 microbial genomes already in the public domain
  - More to come! (one new every two weeks…)
- « large » genomes ([...]): eukaryotes
  - 15 finished (S. cerevisiae, S. pombe, E. cuniculi, G. theta, C. elegans, D. melanogaster, A. gambiae, P. falciparum, P. yoelii, D. rerio, F. rubripes, A. thaliana, O. sativa (2x), M. musculus, Homo sapiens)
  - Many more to come: rat, pig, cow, maize (and other plants), insects, fishes, many pathogenic parasites (Leishmania…)
- EST sequencing
  - Partial mRNA sequences
  - ~15 x 10^6 sequences in the public domain

Human genome
- Size: 3 x 10^9 nt for a haploid genome
- Highly repetitive sequences: 25%; moderately repetitive sequences: 25-30%
- Size of a gene: from 900 to >2’000’000 bases (introns included)
- Proportion of the genome coding for proteins: 5-7%
- Number of chromosomes: 22 autosomes and 1 sex chromosome
- Size of a chromosome: 5 x 10^7 to 5 x 10^8 bases
[Figure labels: centromere, exons of a gene, telomere, regulatory elements, repetitive sequences, locus control region]
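The percentages on this slide can be turned into absolute numbers with a little arithmetic. The sketch below is only illustrative: it takes the haploid size (3 x 10^9 nt) and the 5-7% coding proportion from this slide and the rough mammalian gene count (~30’000) from the "Genomes in numbers" slide. The function and variable names are hypothetical, chosen for this example.

```python
# Back-of-the-envelope arithmetic using the figures quoted on the slides.
GENOME_SIZE_NT = 3_000_000_000      # haploid human genome, 3 x 10^9 nt
CODING_PERCENT_RANGE = (5, 7)       # proportion coding for proteins, in percent
GENE_COUNT_ESTIMATE = 30_000        # rough mammalian gene number

def coding_nucleotides(genome_size_nt, percent):
    """Number of nucleotides covered by the given percentage of the genome."""
    return genome_size_nt * percent // 100

def mean_nt_per_gene(genome_size_nt, gene_count):
    """Average stretch of genome per gene if genes were evenly spread."""
    return genome_size_nt // gene_count

coding_low, coding_high = (
    coding_nucleotides(GENOME_SIZE_NT, p) for p in CODING_PERCENT_RANGE
)
spacing = mean_nt_per_gene(GENOME_SIZE_NT, GENE_COUNT_ESTIMATE)

print(f"Coding sequence: {coding_low:,} to {coding_high:,} nt")
print(f"Average genome length per gene: {spacing:,} nt")
```

With these inputs, the coding part comes to 1.5 x 10^8 to 2.1 x 10^8 nt, and each gene has on average about 10^5 nt of genome to itself, which is consistent with gene sizes ranging from 900 to more than 2’000’000 bases.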

How to sequence the human genome?
- « International consortium » approach:
  - Generate genetic maps (meiotic recombination) and pseudogenetic maps (chromosome hybrids) for indicator sequences
  - Generate a physical map based on large clones (BAC or PAC)
  - Sequence enough large clones to cover the genome
- « commercial » approach (Celera):
  - Generate random libraries of fixed-length genomic clones (2 kb and 10 kb)
  - Sequence both ends of enough clones to obtain a 10x coverage
  - Use computer techniques to reconstitute the chromosomal sequences, and check them against the public project's physical map
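To give a feel for the scale of the shotgun approach, the sketch below estimates how many clones must be sequenced from both ends to reach a given coverage. It assumes that "10x coverage" means sequence coverage (total read bases divided by genome size) and that each end read yields about 500 nt; the read length is an assumption for illustration and is not given on the slide. The function names are hypothetical.

```python
import math

def reads_for_coverage(genome_size_nt, coverage, read_length_nt):
    """Reads needed so that total read bases = coverage x genome size."""
    return math.ceil(coverage * genome_size_nt / read_length_nt)

def clones_for_coverage(genome_size_nt, coverage, read_length_nt):
    """Clones needed when each clone is read once from each end (2 reads)."""
    reads = reads_for_coverage(genome_size_nt, coverage, read_length_nt)
    return math.ceil(reads / 2)

GENOME_SIZE_NT = 3_000_000_000   # haploid human genome, 3 x 10^9 nt
ASSUMED_READ_LENGTH_NT = 500     # assumption, not stated on the slide

reads = reads_for_coverage(GENOME_SIZE_NT, 10, ASSUMED_READ_LENGTH_NT)
clones = clones_for_coverage(GENOME_SIZE_NT, 10, ASSUMED_READ_LENGTH_NT)
print(f"{reads:,} reads from {clones:,} clones")
```

Under these assumptions the project needs tens of millions of reads, which is why the reconstruction step has to be done by computer.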

Sequencing progression

Interpretation of the human draft
- Still many gaps and unordered small pieces (except for chr 6, 7, 13, 14, 20, 21, 22, Y)
- Even a genomic sequence does not tell you where the genes are encoded. The genome is far from being « decoded »
- One must combine genome and transcriptome to get a better idea
- Last freeze: NCBI30, June 24, 2002

The transcriptome
- The set of all functional RNAs (tRNA, rRNA, mRNA etc.) that can potentially be transcribed from the genome
- The documentation of the localization (cell type) and conditions under which these RNAs are expressed
- The documentation of the biological function(s) of each RNA species

Public draft transcriptome
- Information about the expression specificity and the function of mRNAs
  - « full » cDNA sequences of known function
  - « full » cDNA sequences, but « anonymous » (e.g. KIAA or DKFZ collections)
  - EST sequences
    - cDNA libraries derived from many different tissues
    - Rapid random sequencing of the ends of all clones
  - ORESTES sequences
- Growing set of expression data (microarrays, SAGE etc.)
- Increasing evidence for multiple alternative splicing and polyadenylation

Example mapping of ESTs and mRNAs
[Figure tracks: ESTs, mRNAs, computer prediction]

The proteome
- Set of proteins present in a particular cell type under particular conditions
- Set of proteins potentially expressed from the genome
- Information about the specific expression and function of the proteins

Information on the proteome
- Separation of a complex mixture of proteins
  - 2D PAGE (IEF + SDS PAGE)
  - Capillary chromatography
- Individual characterisation of proteins
  - Tryptic peptide signature (MS)
  - Sequencing by chemistry or MS/MS
- All post-translational modifications (PTMs)!
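The "tryptic peptide signature" can be illustrated in silico. The sketch below is a minimal example, assuming the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and using standard monoisotopic residue masses. The input sequence is made up for the example, the function names are hypothetical, and real tools also handle missed cleavages and modifications, which are ignored here.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide

def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    if not protein:
        return []
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in "KR" and not is_last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])  # C-terminal peptide
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS

example = "AGKPLRMSKTR"  # hypothetical sequence, for illustration only
for pep in tryptic_digest(example):
    print(f"{pep:<8} {peptide_mass(pep):10.4f} Da")
```

The resulting list of peptide masses is the kind of fingerprint that is compared against masses predicted from database sequences to identify a protein.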

Three-dimensional structures
- Methods to determine structures
  - X-ray crystallography
  - NMR
- Data format
  - Atom coordinates (except H) in a Cartesian space
- Databases
  - For proteins and nucleic acids (PDB, maintained by the RCSB)
  - Independent databases for sugars and small organic molecules
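As an illustration of "atom coordinates in a Cartesian space", the sketch below reads x, y, z values from ATOM records and computes the distance between two atoms. It assumes the fixed-column ATOM record layout of the PDB file format (coordinates in columns 31-54); the slide itself does not describe the format. The two atoms and their coordinates are invented for the example, and `format_atom_line` is a hypothetical helper used only to build well-aligned sample records.

```python
import math

def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record with fixed columns (sample data only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"
        + " " * 4
        + f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )

def parse_atom_line(line):
    """Extract atom name, residue, chain and coordinates from an ATOM record."""
    return {
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "xyz": (
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ),
    }

def distance(atom_a, atom_b):
    """Euclidean distance between two parsed atoms, in the file's units."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])

# Two invented backbone atoms of one residue.
line_n = format_atom_line(1, " N", "MET", "A", 1, 0.0, 0.0, 0.0)
line_ca = format_atom_line(2, " CA", "MET", "A", 1, 1.458, 0.0, 0.0)

atom_n = parse_atom_line(line_n)
atom_ca = parse_atom_line(line_ca)
print(atom_n["name"], atom_ca["name"], round(distance(atom_n, atom_ca), 3))
```

Visualisation programs work from the same kind of coordinate list, deriving bonds and secondary structure elements from the atom positions.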

Visualisation of the structures
- Secondary structure elements
  - Alpha helices, beta sheets, other
- Software
  - Various representations (atoms, bonds, secondary…)
  - Wide choice of commercial and free software (e.g., DeepView)

Sequence information, and so what?
- How to store and organise?
  - Databases (next lecture)
- How to access, search, compare?
  - Pairwise alignments, dot plots (Tuesday)
  - BLAST searches in db (Tuesday)
  - EST clustering (Wednesday)
  - Multiple alignments (Wednesday)
  - Patterns, PSI-BLAST, profiles and HMMs (Thursday)
  - Gene prediction (Thursday)
  - Protein function prediction (Friday)
  - Users' problems (Friday)

Thank you