Cancer Genes lists Alfonso Valencia Structural and Computational Biology Programme Spanish National Cancer Research Centre CNIO, Madrid BioSapiens Workshop From Genome to Proteome and Biological Function Brussels April 2008
Transcriptome classification of B-cell non-Hodgkin's lymphomas. Mohit Aggarwal et al., Cancer Cell 2007. CGH and microarray data in Ewing sarcomas. Ferreira et al., Oncogene Oct 22. Epigenetics: The DNA Methylomes of Double-Stranded DNA Viruses Associated with Human Cancer. Agustin Fernandez-Fernandez 1, ….. Osvaldo Graña 2, Gonzalo Gomez-Lopez 2, David G. Pisano 2, Alfonso Valencia 2, …… Manel Esteller 1
BioSapiens Network of Excellence: 12 Million between 26 partners in 14 different countries. The objective of the BIOSAPIENS Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists. The BioSapiens-sponsored project concentrated on the protein coding loci and in particular on the alternatively spliced products. This work is part of the BioSapiens efforts for the annotation of the human genome (www.biosapiens.info).
_Line of action 1_: Making information about cancer genes accessible to experimental biologists. The idea here is to take the lists of genes provided by experimental groups, starting with the one published by Sjoblom et al. (ref: Science Oct 13;314(5797): ), and add the information/annotations provided by the different groups. Other gene lists will be added as they are published, which makes it important to have the methods working as automatically as possible. We need proposals from groups on what they can provide. We have to avoid duplication. The information has to be represented for biologists. We can use the protein DAS or CARGO system (see ). The aim in this chapter is to publish a rich resource of annotated cancer gene lists in a format useful for biologists. The goal is to do it by summer this year. DO IT!
A web portal to integrate customized biological information.
CARGO is a configurable biological web portal designed as a tool to facilitate, integrate and visualize results from Internet resources, independently of their native format or access method through the use of small agents, called widgets (or BioWidgets). CARGO provides pieces of minimal, relevant and descriptive biological information. The tool is designed to be used by experimental biologists with no training in bioinformatics. Available at Cases I, Pisano DG, Andres E, Carro A, Fernández JM, Gómez-López G, Rodriguez JM, Vera JF, Valencia A, Rojas AM. CARGO: a web portal to integrate customized biological information. PubMed
A widget for CARGO is described by an XML Document that contains several fields providing information and documentation.
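As a rough illustration of what reading such a descriptor could look like, here is a minimal Python sketch. The slide does not show the actual CARGO schema, so every element and attribute name below (`widget`, `description`, `author`, `input`, `documentation`) is a hypothetical placeholder, as is the example widget itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical widget descriptor. Element and attribute names are
# illustrative assumptions, NOT the real CARGO widget schema.
WIDGET_XML = """
<widget name="ExampleWidget">
  <description>Shows example annotations for a gene.</description>
  <author>Jane Doe</author>
  <input type="gene_symbol"/>
  <documentation>Free-text help shown to the user.</documentation>
</widget>
"""


def read_widget(xml_text):
    """Parse a descriptor and return its fields as a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "description": root.findtext("description"),
        "author": root.findtext("author"),
        "input_type": root.find("input").get("type"),
        "documentation": root.findtext("documentation"),
    }


widget = read_widget(WIDGET_XML)
print(widget["name"], "-", widget["description"])
```

Keeping the descriptor declarative means a portal could list and document widgets without executing any widget code; whether CARGO does exactly this is not stated on the slide.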
DAS Infrastructure By Henning Hermjakob
By Andreas Prlic
Search for a term (like "regulation") or gene name ("p53"). See some gene lists related to cancer (Sjoblom et al., Science, 2006; Matsuoka et al., Science, 2007; etc.) and some protein lists. Cancer Spindle
Register new widgets, login and manage accounts. New Widget Manager web form.
Open any classified widget by clicking on its name in the menu bar at the top. See the global information related to the query in the "Input description" panel.
BioSapiens Ontology
Aim: Standardise DAS feature types. Developed protein feature ontology in close collaboration with UniProt and HUPO PSI. Three main branches:
– Positional features: Donated terms to the existing Sequence Ontology from GO consortium
– Protein Modifications: Adopted the existing PSI MI MOD ontology
– Non-positional features: BioSapiens
Delivered as De107.8. By Gabby Reeves and Henning Hermjakob
By Ildefonso Cases
Biosapiens Widgets
- CIPF, Joaquin Tarraga: FatiGo (GO classification assignments); IDConverter (IDs translator)
- PCB, Adam Hospital: MoDel (Molecular Dynamics Extended Library); Pmut (prediction of pathological mutations)
- BSC, Dmitry Repchevsky: 3D-Annotation (domains annotation over 3D structures)
- CNB, Natalia Jimenez: Visual Genomics (gene expression on anatomical atlases)
- Teresa Paramo: Gene2SNPs (SNPs in HapMap); Gene2tagSNPs (tag SNPs); Gene3GADStudies (association studies)
- UPF, Nuria Bigas: CGPROP (cancer gene properties)
- MIPS, Philip Wong: Corum (the Comprehensive Resource of Mammalian protein complexes)
- Pawel Smialowski: PBD Cb-Cb 8a (data are calculated directly from structures of biological units)
- Univ Roma, Alejandro Giorgetti, Tiziana Castrignano, Ildefonso Cases (CNIO): PMDB (Protein Models database)
- MPI Inf., Fidel Rodriguez: Annotation Similarity
- EBI-Thornton, David Talavera: CSA and PDB Sum
- EBI-Brazma, Misha Kapushesky: ArrayExpress Top 5 experiments
- Uni Bologna, Piero Fariselli, Ildefonso Cases (CNIO): PhD-SNP (Predictor of human Deleterious Single Nucleotide Polymorphisms)
- CBS, Peter Wad Sakett (service), Ildefonso Cases (CNIO): ProtMod (protein modification and transmembrane predictions)
- UCL, Corin Yates, Jonathan Lees: Gene3D and Cath
- Andreas Prlic: ENSEMBL
- CNIO:
  - iHop (Jose Manuel Rodríguez): text mining
  - OMIM (Jose Maria Fernández): disease
  - FunCut (Jose Manuel Rodríguez): function
  - AllDomains (Ildefonso Cases): domains
  - Enviro (Jaime Fernández): interactions
  - SNP 3D (Ildefonso Cases): structure and SNPs
  - Mutation Viewer (Jaime Fernández): cancer mutations
  - General Framework (Angel Carro, Eduardo Andrés León)
By Ildefonso Cases
Combining SNP3D and OMIM facilitates the study of the structural consequences of each variant (SNPs and/or mutations). In this case the mutation 0001, R248, is clearly part of the DNA interaction site. For comparison, the OMIM variant R249S, associated with hepatocellular carcinoma, is not related to DNA binding. Is this related to the phenotypic differences? The Functional Residues widget reports that S249 is involved in ligand binding. The SNP-3D widget with the 1GZH structure shows that the residue is part of the interaction interface between P53 and P53-BP, and part of the interaction with the SV40 oncoprotein (2H1L structure). The Enviro widget provides additional information on other interactions. By Ildefonso Cases
_Line of action 2_: Detailed manual annotation and interpretation of genes potentially associated with cancer and of the mutations already detected. The plan here is to collaborate with the Sanger Cancer Genome Project in the analysis of their list of genes, in particular in the analysis of human protein kinases in a large collection of cancers (Greenman... Futreal and Stratton, Patterns of somatic mutation in human cancer genomes. Nature Mar 8;446(7132):153-8). The question is what the possible functional consequences of the mutations are, knowing that fewer than 1/3 of them are truly related to cancer. We will need a combination of structural bioinformatics and genomics (e.g. splicing analysis, comparative genomics). The automatic results of modelling and analysis tools will not be sufficient, and we have to think about how to develop a sufficiently robust analysis framework that is valid for other families. Interested people will be cancer groups in search of targets and interested in the relation between cancer/genes/SNPs/mutations. For Discussion
Driver vs Passenger mutations
There are two different kinds of mutations that arise as the cancer cell population expands:
– Driver mutations: mutations that confer a growth advantage on the cell in which they occur, are causally implicated in cancer development and have therefore been positively selected. They are by definition found in cancer cells.
– Passenger mutations: mutations not subject to positive selection. They were present in the cell that was the progenitor of the final clonal expansion of the cancer, are biologically neutral and do not confer a growth advantage.
Figure labels: Normal Tissue, Mutation, Cancer; Passenger, Driver. (Greenman et al, Nature 2007) (Wood et al, Science 2007)
Single Nucleotide Polymorphisms
A SNP is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual). Almost all common SNPs have only two alleles, so we say they are dimorphic. Within a population, SNPs can be assigned a minor allele frequency (the proportion of chromosomes in the population that carry the less common variant). Only variants with a minor allele frequency of at least 1% (or 0.5%, depending on the dataset) are given the title "SNP". It is important to note that there are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another. SNPs can occur anywhere in the genome:
- within coding sequences of genes,
- in non-coding regions of genes,
- in intergenic regions between genes.
A SNP within a coding sequence in which both forms lead to the same polypeptide sequence (because of the degeneracy of the genetic code) is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is non-synonymous. SNPs that are not in protein coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA. By Jose M. G.-Izarzugaza
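The two definitions on this slide (minor allele frequency, and synonymous versus non-synonymous changes) can be made concrete with a short Python sketch. It is illustrative only: the function names are made up for this example, the allele list is toy data, and the standard genetic code is assumed.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def minor_allele_frequency(alleles):
    """Fraction of sampled chromosomes carrying the second most common allele."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # monomorphic site: there is no minor allele
    return counts[1][1] / len(alleles)


def classify_coding_snp(codon, position, alt_base):
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    position is the 0-based index of the changed base within the codon.
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"


# Toy sample: 100 chromosomes, 10 of them carrying the rarer allele.
sample = ["A"] * 90 + ["G"] * 10
maf = minor_allele_frequency(sample)
print(f"MAF = {maf:.2f}, counts as SNP at 1% threshold: {maf >= 0.01}")
print(classify_coding_snp("GAA", 2, "G"))  # Glu -> Glu
print(classify_coding_snp("CGG", 1, "A"))  # Arg -> Gln
```

The threshold (1% or 0.5%) is left to the caller, since the slide notes that it depends on the dataset.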
Maximal distance between changes
– Cancer-related mutations from the paper by Sjöblom et al. (2006)
– Ten randomly generated sets of positions
– SNPs downloaded from Ensembl
By David Talavera
Effect of mutations: effect on functional sites (cancer-related mutations vs random positions)
- Ligand-binding: 17% vs 21%
- Metal-binding: 7%
- Nucleic acid-binding: 10% vs 11%
- Catalytic: 0%
By David Talavera
Effect of mutations: kind of substitution (cancer-related mutations vs SNPs)
- Conservative changes: 55.1% vs 55.3%
- Non-conservative changes: 44.9% vs 44.7%
By David Talavera
Conclusions:
- Cancer mutations are not randomly distributed along the sequence; however, there is no relation with functional sites.
- Cancer-related mutations do not occur at extremely conserved positions.
- Cancer-related mutations do not seem to be more drastic than SNPs.
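A comparison of this kind needs a rule for calling a substitution conservative. The slide does not say which rule was used, so the Python sketch below assumes a simple physico-chemical grouping of the 20 amino acids; both the grouping and the example substitutions are hypothetical.

```python
# Illustrative residue classes. This grouping is an ASSUMPTION made for
# the example; the slide does not state the criterion actually used.
RESIDUE_CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQC",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
CLASS_OF = {
    aa: name for name, members in RESIDUE_CLASSES.items() for aa in members
}


def is_conservative(ref, alt):
    """A change is called conservative if both residues share a class."""
    return CLASS_OF[ref] == CLASS_OF[alt]


def conservative_fraction(substitutions):
    """Fraction of (ref, alt) pairs that are conservative changes."""
    if not substitutions:
        return 0.0
    hits = sum(is_conservative(ref, alt) for ref, alt in substitutions)
    return hits / len(substitutions)


# Toy substitutions, not data from the study.
example = [("R", "K"), ("R", "S"), ("L", "I"), ("D", "G")]
print(f"{conservative_fraction(example):.1%} conservative")
```

Running the same function over the cancer-related mutations and over the SNP set would give two percentages comparable to those in the table, under whatever grouping is chosen.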
Protein Kinases
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them (phosphorylation). Phosphorylation usually results in a functional change of the target protein (substrate) by changing enzyme activity, cellular location, or association with other proteins. The chemical activity of a kinase involves removing a phosphate group from ATP and covalently attaching it to one of three amino acids that have a free hydroxyl group. Most kinases act on both serine and threonine, others act on tyrosine, and a number (dual specificity kinases) act on all three. The human genome contains about 520 protein kinase genes [Manning et al, 2001]. Dysregulated kinase activity is a frequent cause of disease, particularly cancer, since kinases regulate cell growth, movement and cell death. Protein Kinase is the most commonly found domain in known cancer genes [Futreal et al, 2004]. Since protein kinases have key effects on the cell, their activity is highly regulated:
- by phosphorylation (sometimes auto-phosphorylation),
- by binding of activator proteins or inhibitor proteins,
- by binding of activator/inhibitor small molecules,
- by controlling their location in the cell relative to their substrates.
Drugs which inhibit specific kinases are being developed to treat several diseases, and some are currently in clinical use, including Gleevec (imatinib, leukaemia) and Iressa (gefitinib, lung cancer). By Jose M. G.-Izarzugaza
Many Structures. Figure labels: Inactive, Active. Kinases undergo a large articulated motion when they turn on and off. Source: Src tyrosine kinase from Protein Data Bank. By Jose M. G.-Izarzugaza
Mutation analysis workflow for SNPs (very similar for mutations). Workflow steps: Query Family (Kinases); Family Members (from Kinbase); Family Representatives (from PDB); Feature Distribution Analysis; Multiple Structure Alignment; Get SNPs; Map SNPs onto PDBs. By Jose M. G.-Izarzugaza
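The "Map SNPs onto PDBs" step amounts to translating a position in the full protein sequence into a residue number in a solved structure. The Python sketch below shows one common way to do this from a gapped pairwise alignment. It is not the pipeline used in the talk: the function name, the toy sequences and the residue numbering are assumptions for illustration.

```python
def map_to_structure(aligned_seq, aligned_struct, seq_pos, struct_numbers):
    """Map a 1-based sequence position to a structure residue number.

    aligned_seq and aligned_struct are equal-length gapped strings ('-'
    marks a gap). struct_numbers lists the residue numbers of the
    structure's residues in order. Returns None when the position is not
    covered by the structure.
    """
    seq_index = 0      # residues of the full sequence seen so far
    struct_index = 0   # residues of the structure seen so far
    for seq_char, struct_char in zip(aligned_seq, aligned_struct):
        if seq_char != "-":
            seq_index += 1
        if struct_char != "-":
            struct_index += 1
        if seq_char != "-" and seq_index == seq_pos:
            if struct_char == "-":
                return None  # residue missing from the structure
            return struct_numbers[struct_index - 1]
    return None  # position lies beyond the aligned region


# Toy alignment: the structure lacks the first two residues and one loop
# residue, and its numbering starts at 10.
full_sequence = "MKTAYIAK"
structure_seq = "--TAY-AK"
residue_numbers = [10, 11, 12, 13, 14]

for snp_position in (1, 3, 6, 7):
    mapped = map_to_structure(
        full_sequence, structure_seq, snp_position, residue_numbers
    )
    print(snp_position, "->", mapped)
```

Returning `None` for uncovered positions keeps SNPs that fall outside the crystallised region from being silently assigned to a neighbouring residue.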
Statistics on the PK PDB retrieval
- Total human sequences in Kinbase: 620
- Sequences in Kinbase that are not pseudogenes: 516
- Sequences with known Swissprot ID (assigned by BLAST): 488
- Sequences with known Swissprot ID, BLAST identity > 95%: 474
- Kinases with at least one solved protein structure (PDB): 145
- Human kinase sequences in the multiple sequence alignment: 266
- Total number of SNPs (kinase domain)
  - Synonymous SNPs
  - Non-synonymous SNPs
- Total number of mutations (kinase domain)
  - Driver mutations
  - Passenger mutations
By Jose M. G.-Izarzugaza
TreeDet vs firedb vs conserv By David de Juan TreeDet vs firedb
Next
- CARGO cancer gene list paper, to be presented tomorrow with action items (scope: Cancer Research).
- Mutation analysis is still a key challenge: creation of analysis pipelines for all proteins and for protein families (SNPs versus mutations, driver versus passenger mutations).