From Genome to Proteome and Biological Function

From Genome to Proteome and Biological Function: Cancer Gene Lists. Alfonso Valencia, Structural and Computational Biology Programme, Spanish National Cancer Research Centre (CNIO), Madrid. BioSapiens Workshop "From Genome to Proteome and Biological Function", Brussels, April 2008

Cancer genes

Transcriptome classification of B-cell non-Hodgkin lymphomas: Mohit Aggarwal et al., Cancer Cell 2007. CGH and microarray data in Ewing sarcomas: Ferreira et al., Oncogene, 2007 Oct 22. Epigenetics: The DNA Methylomes of Double-Stranded DNA Viruses Associated with Human Cancer. Agustin Fernandez-Fernandez, ….., Osvaldo Graña, Gonzalo Gomez-Lopez, David G. Pisano, Alfonso Valencia, ……, Manel Esteller

BioSapiens Network of Excellence. The BioSapiens-sponsored project concentrated on the protein-coding loci and in particular on the alternatively spliced products. This work is part of the BioSapiens efforts for the annotation of the human genome (www.biosapiens.info). BioSapiens Network of Excellence: €12 million shared between 26 partners in 14 different countries. “The objective of the BIOSAPIENS Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.”

_Line of action 1_: Making information about cancer genes accessible to experimental biologists. The idea is to take the gene lists provided by experimental groups, starting with the one published by Sjöblom et al. (Science. 2006 Oct 13;314(5797):268-274), and add the information and annotations provided by the different groups. Other gene lists will be added as they are published, which makes it important for the methods to work as automatically as possible. We need proposals from groups on what they can provide, and we have to avoid duplications. The information must be represented for biologists; we can use the protein DAS or the CARGO system (see http://cargo.bioinfo.cnio.es). The aim of this chapter is to publish a rich resource of annotated cancer gene lists in a format useful for biologists, and the goal is to do it by summer this year. DO IT!

A web portal to integrate customized biological information.

Available at http://cargo2.bioinfo.cnio.es CARGO is a configurable biological web portal designed as a tool to facilitate, integrate and visualize results from Internet resources, independently of their native format or access method, through the use of small agents called widgets (or BioWidgets). CARGO provides pieces of minimal, relevant and descriptive biological information. The tool is designed to be used by experimental biologists with no training in bioinformatics. Cases I, Pisano DG, Andres E, Carro A, Fernández JM, Gómez-López G, Rodriguez JM, Vera JF, Valencia A, Rojas AM. CARGO: a web portal to integrate customized biological information. PubMed 17483515.

CARGO has an iGoogle Gadget version. iGoogle Gadgets are simple HTML and JavaScript mini-applications served in iframes that can be embedded in web pages and other applications.

A widget for CARGO is described by an XML document that contains several fields providing information and documentation.
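
The XML widget format itself is not shown in this transcript. As a minimal sketch, assuming a hypothetical descriptor with `name`, `description`, `input` and `url` elements (these element names, the `{query}` placeholder and the example URL are invented for illustration and are not the real CARGO schema), a portal could read a widget definition and build the request for a user query like this:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical widget descriptor: element names and URL are illustrative
# assumptions, not the actual CARGO widget schema.
EXAMPLE_WIDGET_XML = """<widget>
  <name>Example SNP widget</name>
  <description>Shows SNPs mapped onto a protein.</description>
  <input>uniprot_accession</input>
  <url>http://widgets.example.org/snp?id={query}</url>
</widget>"""


@dataclass
class WidgetDescriptor:
    name: str
    description: str
    input_type: str
    url_template: str

    def build_url(self, query: str) -> str:
        """Insert the user's query into the widget's URL template."""
        return self.url_template.replace("{query}", query)


def parse_widget(xml_text: str) -> WidgetDescriptor:
    """Read the descriptor fields; a missing element becomes an empty string."""
    root = ET.fromstring(xml_text)

    def field(tag: str) -> str:
        return (root.findtext(tag) or "").strip()

    return WidgetDescriptor(
        name=field("name"),
        description=field("description"),
        input_type=field("input"),
        url_template=field("url"),
    )


widget = parse_widget(EXAMPLE_WIDGET_XML)
print(widget.name, widget.build_url("P04637"))
```

Keeping the request as a URL template inside the descriptor means a new widget could be registered without changing portal code; that design choice is an assumption of this sketch, not a documented CARGO feature.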

How do widgets work? (Diagram: widgets obtain PDB/sequence alignments, Ensembl requests, 3D files and SNPs through the Distributed Annotation System (DAS), FTP, and Asynchronous JavaScript and XML (AJAX).)
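
Widgets that use DAS issue plain HTTP requests and receive XML. A minimal sketch of the client side, assuming the DAS 1.x `features` command (`<server>/das/<source>/features?segment=<id>:<start>,<stop>`) and a trimmed DASGFF-style response; the server, source name and all feature values are placeholders, not a real BioSapiens service or real annotations:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def das_features_url(server: str, source: str, segment: str,
                     start: Optional[int] = None, stop: Optional[int] = None) -> str:
    """Build a DAS 1.x 'features' request for a whole segment or a range."""
    seg = segment if start is None or stop is None else f"{segment}:{start},{stop}"
    return f"{server.rstrip('/')}/das/{source}/features?segment={seg}"


# Trimmed, illustrative DASGFF response (placeholder values only).
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/example_source/features">
    <SEGMENT id="EXAMPLE1" start="1" stop="100">
      <FEATURE id="feat1" label="example_region">
        <TYPE id="region">region</TYPE>
        <METHOD id="example_method">example_method</METHOD>
        <START>10</START>
        <END>50</END>
      </FEATURE>
      <FEATURE id="feat2" label="example_site">
        <TYPE id="site">site</TYPE>
        <METHOD id="example_method">example_method</METHOD>
        <START>72</START>
        <END>72</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_das_features(xml_text: str) -> list:
    """Return one dict per FEATURE element, tagged with its SEGMENT id."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "label": feature.get("label"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


print(das_features_url("http://das.example.org/", "example_source", "P04637", 1, 100))
for f in parse_das_features(SAMPLE_RESPONSE):
    print(f["label"], f["start"], f["end"])
```

In a live widget the XML would be fetched from the built URL (for example with `urllib.request.urlopen`) rather than read from a string; the embedded sample keeps the sketch runnable offline.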

DAS Infrastructure By Henning Hermjakob

By Andreas Prlic

Search for a term (like "regulation") or a gene name ("p53"). See some gene lists related to cancer (Sjöblom et al., Science 2006; Matsuoka et al., Science 2007; etc.) and some protein lists (Cancer, Spindle).

Register new widgets, log in and manage accounts using the new "Widget Manager" web form.

Open any classified widget by clicking its name in the menu bar at the top. The global information related to the query is shown in the "Input description panel".

BioSapiens Ontology
Aim: standardise DAS feature types. A protein feature ontology was developed in close collaboration with UniProt and HUPO PSI, with three main branches:
- Positional features: terms "donated" to the existing Sequence Ontology (from the GO consortium).
- Protein modifications: adopted the existing PSI MI MOD ontology.
- Non-positional features: BioSapiens.
Delivered as De107.8. By Gabby Reeves and Henning Hermjakob

By Ildefonso Cases

BioSapiens Widgets (by Ildefonso Cases)
- MIPS: Philip Wong, CORUM, the Comprehensive Resource of Mammalian protein complexes (http://mips.gsf.de/genre/proj/corum/); Pawel Smialowski, PDB Cb-Cb 8 Å (data are calculated directly from structures of biological units)
- Univ. Roma: Alejandro Giorgetti, Tiziana Castrignano, Ildefonso Cases (CNIO), PMDB, the Protein Models Database (http://mi.caspur.it/PMDB/)
- MPI Inf.: Fidel Rodriguez, Annotation Similarity
- EBI-Thornton: David Talavera, CSA and PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/CSA/)
- EBI-Brazma: Misha Kapushesky, ArrayExpress top 5 experiments (http://www.ebi.ac.uk/microarray-as/aew/)
- Univ. Bologna: Piero Fariselli, Ildefonso Cases (CNIO), PhD-SNP, Predictor of human Deleterious Single Nucleotide Polymorphisms (http://gpcr2.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi)
- CBS: Peter Wad Sackett (service), Ildefonso Cases (CNIO), ProtMod, protein modification and transmembrane predictions (http://www.cbs.dtu.dk/services/)
- UCL: Corin Yates, Jonathan Lees, Gene3D and CATH
- ENSEMBL: Andreas Prlic
- CNIO: iHOP, text mining (Jose Manuel Rodríguez); OMIM, disease (Jose Maria Fernández); FunCut, function (Jose Manuel Rodríguez); AllDomains, domains (Ildefonso Cases); Enviro, interactions (Jaime Fernández); SNP 3D, structure and SNPs (Ildefonso Cases); Mutation Viewer, cancer mutations (Jaime Fernández); general framework (Angel Carro, Eduardo Andrés León)
- CIPF: Joaquin Tarraga, FatiGO (GO classification assignments) and IDConverter (ID translator)
- PCB: Adam Hospital, MoDEL (Molecular Dynamics Extended Library) and PMut (prediction of pathological mutations)
- BSC: Dmitry Repchevsky, 3D-Annotation (domain annotation over 3D structures)
- CNB: Natalia Jimenez, Visual Genomics (gene expression on anatomical atlases); Teresa Paramo, Gene2SNPs (SNPs in HapMap), Gene2tagSNPs (tag SNPs), Gene3GADStudies (association studies)
- UPF: Nuria Bigas, CGPROP (cancer gene properties)

The “Enviro” widget provides additional information on other interactions. Combining SNP3D and OMIM facilitates the study of the structural consequences of each variant (SNPs and/or mutations). In this case the mutation “0001,R248” is clearly part of the DNA interaction site. In a comparative study with OMIM, R249S, associated with hepatocellular carcinoma, is not related to DNA binding. Is it related to phenotypic differences? The “Functional Residues” widget reports that S249 is involved in ligand binding. The SNP-3D widget with structure 1GZH shows that it is part of the interaction interface between p53 and p53-BP, and it is also part of the interaction with the SV40 oncoprotein (structure 2H1L). By Ildefonso Cases

_Line of action 2_: Annotating, with detailed manual interpretation, genes potentially associated with cancer and the mutations already detected in them. The plan is to collaborate with the Sanger Cancer Genome Project in the analysis of their list of genes, in particular the analysis of human protein kinases in a large collection of cancers (Greenman ... Futreal and Stratton. Patterns of somatic mutation in human cancer genomes. Nature. 2007 Mar 8;446(7132):153-8). The goal is to assess the possible functional consequences of the mutations, knowing that fewer than 1/3 of them are truly related to cancer. This will need a combination of structural bioinformatics and genomics (e.g. splicing analysis, comparative genomics). The automatic results of modelling and analysis tools will not be sufficient, so we have to think about how to develop a sufficiently robust analysis framework that is also valid for other families. The interested parties will be cancer groups searching for targets and interested in the relation between cancer, genes, SNPs and mutations. For discussion.

Driver vs. passenger mutations
Two different kinds of mutations arise during the clonal expansion of a cancer cell:
- Driver mutations: mutations that confer a growth advantage on the cell in which they occur, are causally implicated in cancer development and have therefore been positively selected. By definition they are found in cancer cells.
- Passenger mutations: mutations not subject to positive selection. They were present in the cell that was the progenitor of the final clonal expansion of the cancer, are biologically neutral and do not confer a growth advantage.
(Figure: normal tissue, mutation, cancer; passenger and driver mutations.) (Greenman et al., Nature 2007) (Wood et al., Science 2007)
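
One crude way to put the "positively selected" part of the definition into numbers is to compare the observed count of non-synonymous mutations with the count expected if every mutation were a neutral passenger. The sketch below is a simplified illustration under that assumption, not the statistical model used by Greenman et al. or Wood et al.; the function names are hypothetical, and `neutral_ratio` (the non-synonymous : synonymous ratio expected under neutrality) is an input the user must estimate.

```python
def expected_nonsynonymous(synonymous_count, neutral_ratio):
    """Non-synonymous count expected if every mutation were a neutral passenger.

    neutral_ratio: assumed non-synonymous : synonymous ratio under neutrality,
    e.g. estimated from the numbers of possible non-synonymous and synonymous
    changes in the genes studied. It must be supplied by the user.
    """
    return synonymous_count * neutral_ratio


def driver_fraction_estimate(nonsyn_count, syn_count, neutral_ratio):
    """Fraction of observed non-synonymous mutations in excess of neutral expectation.

    The excess is a crude proxy for positively selected (driver) changes;
    a deficit (fewer than expected) is reported as 0.
    """
    if nonsyn_count == 0:
        return 0.0
    excess = nonsyn_count - expected_nonsynonymous(syn_count, neutral_ratio)
    return max(0.0, excess) / nonsyn_count


# Hypothetical counts, for illustration only.
print(driver_fraction_estimate(nonsyn_count=100, syn_count=20, neutral_ratio=2.5))
```

The estimate says nothing about which individual mutation is a driver; it only indicates how many non-synonymous changes exceed the neutral expectation in a set.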

Single Nucleotide Polymorphisms
A SNP is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual). Almost all common SNPs have only two alleles, so they are said to be dimorphic. Within a population, each SNP can be assigned a minor allele frequency: the fraction of chromosomes in the population that carry the less common allele. Only variants with a minor allele frequency of ≥ 1% (or 0.5%, depending on the dataset) are given the title "SNP". Allele frequencies vary between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. SNPs can be located anywhere in the genome:
- within coding sequences of genes,
- in non-coding regions of genes,
- in intergenic regions between genes.
A SNP within a coding sequence whose two forms lead to the same polypeptide sequence (because of the degeneracy of the genetic code) is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is non-synonymous. SNPs outside protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA. By Jose M. G.-Izarzugaza
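
The two quantitative definitions above, minor allele frequency and the synonymous/non-synonymous distinction, can be sketched as follows. This is a minimal illustration assuming that allele counts per site are already available and that the reference codon and the 0-based position of the change within it are known; the function names are hypothetical. The codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def minor_allele_frequency(allele_counts):
    """Fraction of counted chromosomes carrying the least common allele.

    allele_counts maps allele -> number of chromosomes carrying it.
    A monomorphic site (only one allele observed) has MAF 0.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes counted")
    if len(allele_counts) < 2:
        return 0.0
    return min(allele_counts.values()) / total


def is_snp(allele_counts, threshold=0.01):
    """Apply the MAF cut-off (1% here; some datasets use 0.5%)."""
    return minor_allele_frequency(allele_counts) >= threshold


def classify_coding_change(codon, position, alt_base):
    """Synonymous or non-synonymous for one base change at 0-based `position`.

    A change that creates or removes a stop codon counts as non-synonymous.
    """
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    return ("synonymous" if CODON_TABLE[codon] == CODON_TABLE[mutant]
            else "non-synonymous")


print(minor_allele_frequency({"A": 95, "G": 5}))
print(classify_coding_change("CTG", 2, "A"), classify_coding_change("CGG", 0, "T"))
```

CTG to CTA keeps leucine (synonymous), while CGG to TGG turns arginine into tryptophan (non-synonymous).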

Maximal distance between changes. This slide shows the distance between the two furthest-apart changes in a protein. Cancer-related and random mutations have different distributions (Kolmogorov-Smirnov test). Data: cancer-related mutations from Sjöblom et al. (2006); ten randomly generated sets of positions; SNPs downloaded from Ensembl. By David Talavera
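
A minimal sketch of this comparison, assuming per-protein lists of changed residue positions are available (the gene names and positions below are made up, not the Sjöblom et al. data): compute the maximal distance for each protein, then compare the two distributions with the two-sample Kolmogorov-Smirnov statistic D, the largest gap between their empirical cumulative distribution functions.

```python
from bisect import bisect_right


def max_distance(positions):
    """Distance between the two furthest-apart changed residues in one protein."""
    if len(positions) < 2:
        return 0
    return max(positions) - min(positions)


def ecdf(sorted_values, x):
    """Empirical CDF: fraction of values <= x (sorted_values must be sorted)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|.

    Both ECDFs are step functions that only jump at observed values,
    so checking every observed value is enough to find the maximum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Hypothetical per-protein mutation positions.
cancer = {"GENE_A": [12, 250, 610], "GENE_B": [40, 45]}
random_sets = {"GENE_A": [100, 130], "GENE_B": [5, 300, 420]}

d = ks_statistic([max_distance(p) for p in cancer.values()],
                 [max_distance(p) for p in random_sets.values()])
print(f"KS D = {d:.2f}")
```

For a p-value, `scipy.stats.ks_2samp(sample_a, sample_b)` computes the same statistic together with a significance estimate.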

Effect of mutations: effect on functional sites
Percentage of genes with at least one change at a functional site:
- Ligand-binding: 17% (cancer-related mutations), 21% (random positions)
- Metal-binding: 7%
- Nucleic acid-binding: 10% (cancer-related mutations), 11% (random positions)
- Catalytic: 0%
This slide shows that cancer-related mutations are not found more frequently at functional sites than randomly picked positions. The numbers are the percentage of genes in which at least one changed position coincides with a functional site; thus 17% (cancer-related mutations, ligand-binding) means that 17% of genes have at least one mutation affecting a residue involved in binding ligands. By David Talavera
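
The percentages above count genes, not mutations. A minimal sketch of that calculation, assuming mutated positions and functional-site annotations are available per gene as sets of residue numbers (the data are hypothetical, and using all mutated genes as the denominator is an assumption of this sketch):

```python
def percent_genes_hit(mutated, functional_sites):
    """Percentage of mutated genes with >= 1 mutation on an annotated functional residue.

    mutated: gene -> set of mutated residue positions
    functional_sites: gene -> set of residue positions annotated for one site type
    Genes with no annotation for this site type count as "not hit".
    """
    if not mutated:
        return 0.0
    hit = sum(
        1 for gene, positions in mutated.items()
        if positions & functional_sites.get(gene, set())
    )
    return 100.0 * hit / len(mutated)


# Hypothetical example: only G1 has a mutation (20) on a ligand-binding residue.
mutated = {"G1": {10, 20}, "G2": {5}, "G3": {7}, "G4": {100}}
ligand_binding = {"G1": {20, 30}, "G2": {6}}
print(percent_genes_hit(mutated, ligand_binding))
```

Running the same function on each set of randomly generated positions gives the comparison column.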

Effect of mutations: kind of substitution. Cancer mutations are not randomly distributed along the sequence; however, there is no relation with functional sites. Cancer-related mutations do not occur at extremely conserved positions, and they do not seem to be more drastic than SNPs:
- Conservative changes: 55.1% (cancer-related mutations), 55.3% (SNPs)
- Non-conservative changes: 44.9% (cancer-related mutations), 44.7% (SNPs)
This slide shows that both types of changes (deleterious and tolerated) have similar rates of conservative and non-conservative changes. Non-conservative changes have a negative BLOSUM score, whereas conservative changes have a positive or zero score. By David Talavera
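
The conservative/non-conservative rule is a simple sign test on the substitution score. A minimal sketch, assuming BLOSUM62 and including only the arginine (R) row of the matrix to keep it short, so it covers substitutions from arginine only; the substitution list is hypothetical:

```python
# Arginine (R) row of the BLOSUM62 matrix. Only substitutions *from*
# arginine are covered here; a full analysis needs all 20 rows.
BLOSUM62 = {
    "R": {"A": -1, "R": 5, "N": 0, "D": -2, "C": -3, "Q": 1, "E": 0,
          "G": -2, "H": 0, "I": -3, "L": -2, "K": 2, "M": -1, "F": -3,
          "P": -2, "S": -1, "T": -1, "W": -3, "Y": -2, "V": -3},
}


def classify_substitution(ref, alt, matrix=BLOSUM62):
    """Conservative if the BLOSUM score is >= 0, non-conservative if negative."""
    score = matrix[ref][alt]
    return "conservative" if score >= 0 else "non-conservative"


def conservative_rates(substitutions, matrix=BLOSUM62):
    """Return (% conservative, % non-conservative) for (ref, alt) pairs."""
    labels = [classify_substitution(r, a, matrix) for r, a in substitutions]
    conservative = labels.count("conservative")
    total = len(labels)
    return 100.0 * conservative / total, 100.0 * (total - conservative) / total


# Example substitutions at arginine residues (hypothetical list).
print(conservative_rates([("R", "Q"), ("R", "W"), ("R", "H"), ("R", "S")]))
```

For example, the R249S change discussed earlier scores -1 and would be counted as non-conservative. Biopython ships the complete matrix via `Bio.Align.substitution_matrices.load("BLOSUM62")`.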

Protein Kinases
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them (phosphorylation). Phosphorylation usually results in a functional change of the target protein (substrate) by changing its enzyme activity, cellular location, or association with other proteins. The chemical activity of a kinase involves removing a phosphate group from ATP and covalently attaching it to one of three amino acids that have a free hydroxyl group (serine, threonine or tyrosine). Most kinases act on both serine and threonine, others act on tyrosine, and a number (dual-specificity kinases) act on all three. The human genome contains about 520 protein kinase genes [Manning et al., 2002]. Dysregulated kinase activity is a frequent cause of disease, particularly cancer, since kinases regulate cell growth, movement and cell death. The protein kinase domain is the most commonly found domain in known cancer genes [Futreal et al., 2004]. Because protein kinases have key effects on the cell, their activity is highly regulated:
- by phosphorylation (sometimes autophosphorylation),
- by binding of activator or inhibitor proteins,
- by binding of activator or inhibitor small molecules,
- by controlling their location in the cell relative to their substrates.
Drugs that inhibit specific kinases are being developed to treat several diseases, and some are already in clinical use, including Gleevec (imatinib, leukaemia) and Iressa (gefitinib, lung cancer). By Jose M. G.-Izarzugaza

Many structures: active and inactive. Kinases undergo a large articulated motion when they turn "on" and "off". In Src, for example, the crystal structure shown on the right (PDB entry 2src) is the inactive form; the protein opens up, as shown on the left, to form the active protein. Source: Src tyrosine kinase, Protein Data Bank. By Jose M. G.-Izarzugaza

Mutation analysis workflow (shown for SNPs; very similar for mutations):
- Query family (kinases)
- Family members (from KinBase)
- Get SNPs
- Family representatives (from PDB)
- Map SNPs onto PDB structures
- Multiple structure alignment
- Feature distribution analysis
By Jose M. G.-Izarzugaza
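
The "map SNPs onto PDB structures" step needs a correspondence between sequence positions and structure residue numbers. A minimal sketch, assuming a pairwise alignment between the kinase sequence and the sequence of a representative PDB chain has already been computed (the alignment, residue numbers and function names below are hypothetical), walks the two aligned rows in parallel:

```python
def map_sequence_to_structure(aligned_seq, aligned_struct, struct_residue_numbers):
    """Map 1-based sequence positions to structure residue numbers.

    aligned_seq / aligned_struct: the two rows of a pairwise alignment ('-' = gap).
    struct_residue_numbers: residue numbers of the structure's residues, in order.
    """
    if len(aligned_seq) != len(aligned_struct):
        raise ValueError("alignment rows must have equal length")
    if len(aligned_struct.replace("-", "")) != len(struct_residue_numbers):
        raise ValueError("one residue number is needed per structure residue")
    mapping = {}
    seq_pos = 0     # 1-based position in the ungapped sequence
    struct_idx = 0  # index into struct_residue_numbers
    for s, t in zip(aligned_seq, aligned_struct):
        if s != "-":
            seq_pos += 1
        if t != "-":
            if s != "-":
                mapping[seq_pos] = struct_residue_numbers[struct_idx]
            struct_idx += 1
    return mapping


def map_variants(positions, mapping):
    """Structure residue number for each variant position, or None if unresolved."""
    return {pos: mapping.get(pos) for pos in positions}


# Hypothetical alignment: the structure lacks the first and last sequence
# residues and has an extra residue (G) relative to the sequence.
mapping = map_sequence_to_structure("MKT-AYL", "-KTGAY-", [94, 95, 96, 97, 98])
print(map_variants([1, 3, 5], mapping))
```

Residue numbers are plain integers here; real PDB numbering can include insertion codes, which this sketch ignores.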

Statistics on the protein kinase PDB retrieval:
- Total human sequences in KinBase: 620
- Sequences in KinBase that are not pseudogenes: 516
- Sequences with a known Swiss-Prot ID (assigned by BLAST): 488
- Sequences with a known Swiss-Prot ID, BLAST identity > 95%: 474
- Kinases with at least one solved protein structure (PDB): 145
- Human kinase sequences in the multiple sequence alignment: 266
- SNPs in the kinase domain: 569 in total (263 synonymous, 306 non-synonymous)
- Mutations in the kinase domain: 140 in total (73 driver, 63 passenger)
By Jose M. G.-Izarzugaza

TreeDet vs firedb TreeDet vs firedb vs conserv By David de Juan

By David de Juan

Driver vs. passenger (by Jose M. G.-Izarzugaza). Summary statistics from the figure panels:
Driver:
- Mean: 3.61, Median: 4.68, St. Dev.: 3.12, Xd: 1.72
- Mean: 6.50, Median: 6.35, St. Dev.: 4.16, Xd: 1.24
- Mean: 11.07, Median: 10.26, St. Dev.: 7.06, Xd: -0.07
Passenger:
- Mean: 4.34, Median: 4.94, St. Dev.: 2.58, Xd: -0.89
- Mean: 6.71, Median: 5.57, St. Dev.: 3.69, Xd: -0.30
- Mean: 10.26, Median: 9.94, St. Dev.: 6.32, Xd: -0.78

Next steps:
- “CARGO cancer gene list” paper to be presented tomorrow with action items (scope: Cancer Research).
- Mutation analysis is still a key challenge: create analysis pipelines for all proteins and for protein families (SNPs versus mutations, driver versus passenger mutations).