From Genome to Proteome and Biological Function

Presentation on theme: "From Genome to Proteome and Biological Function"— Presentation transcript:

1 From Genome to Proteome and Biological Function
Cancer gene lists. Alfonso Valencia, Structural and Computational Biology Programme, Spanish National Cancer Research Centre (CNIO), Madrid. BioSapiens Workshop "From Genome to Proteome and Biological Function", Brussels, April 2008.

2 Cancer genes

3 Transcriptome classification of B-cell non-Hodgkin's lymphomas
Mohit Aggarwal et al., Cancer Cell 2007.
CGH and microarray data in Ewing sarcomas: Ferreira et al., Oncogene, Oct 22.
Epigenetics: The DNA Methylomes of Double-Stranded DNA Viruses Associated with Human Cancer. Agustin Fernandez-Fernandez1, ….. Osvaldo Graña2, Gonzalo Gomez-Lopez2, David G. Pisano2, Alfonso Valencia2, …… Manel Esteller1

4 BioSapiens Network of Excellence
The BioSapiens-sponsored project concentrated on the protein-coding loci and in particular on the alternatively spliced products. This work is part of the BioSapiens efforts for the annotation of the human genome. BioSapiens Network of Excellence: €12 million shared between 26 partners in different countries. The objective of the BioSapiens Network of Excellence is to provide a large-scale, concerted effort to annotate genome data by laboratories distributed around Europe, using both informatics tools and input from experimentalists.

5 _Line of action 1_: Making information about cancer genes accessible to experimental biologists.
The idea here is to take the lists of genes provided by experimental groups, starting with the one published by Sjoblom et al. (ref: Science Oct 13;314(5797): ), and add the information and annotations provided by the different groups. Other gene lists will be added as they are published, which makes it important to have the methods working as automatically as possible. We need proposals from groups on what they can provide. We have to avoid duplication. The information must be represented for biologists; we can use the protein DAS or the CARGO system. The aim in this chapter is to publish a rich resource of annotated cancer gene lists in a format useful for biologists, and the goal is to do it by summer this year. DO IT!

6 A web portal to integrate customized biological information.

7 Available at http://cargo2.bioinfo.cnio.es
CARGO is a configurable biological web portal designed as a tool to facilitate, integrate and visualize results from Internet resources, independently of their native format or access method, through the use of small agents called widgets (or BioWidgets). CARGO provides pieces of minimal, relevant and descriptive biological information. The tool is designed to be used by experimental biologists with no training in bioinformatics. Reference: Cases I, Pisano DG, Andres E, Carro A, Fernández JM, Gómez-López G, Rodriguez JM, Vera JF, Valencia A, Rojas AM. CARGO: a web portal to integrate customized biological information.

8

9 CARGO has an iGoogle Gadget version.
iGoogle Gadgets are simple HTML and JavaScript mini-applications served in iframes that can be embedded in web pages and other apps.

10 A widget for CARGO is described by an XML document that contains several fields providing information and documentation.
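As an illustration only, a descriptor of this kind could be read as sketched below. The element names (`widget`, `name`, `description`, `source`) and the URL are hypothetical; the slides do not give the actual CARGO schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor: these element names are assumptions,
# not the real CARGO widget schema.
WIDGET_XML = """\
<widget>
  <name>ExampleWidget</name>
  <description>Shows example annotations for a gene.</description>
  <source>http://example.org/service</source>
</widget>
"""


def read_widget(xml_text):
    """Return the descriptor's child elements as a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


widget = read_widget(WIDGET_XML)
print(widget["name"], "-", widget["description"])
```

Keeping the descriptor as a flat set of fields means a portal can list and document widgets without running them.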

11 How do widgets work?
[Diagram: widgets retrieve data (PDB/sequence alignments, Ensembl requests, 3D files, SNPs) through the Distributed Annotation System (DAS), FTP, and Asynchronous JavaScript and XML (AJAX).]
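A minimal sketch of how a widget might address a DAS server, assuming the `features?segment=` request form of the DAS 1.5 protocol. The server address and source name are placeholders, and P04637 (the UniProt accession of human p53) is used only as an example segment; no request is actually sent.

```python
from urllib.parse import quote


def das_features_url(server, source, segment, start=None, stop=None):
    """Build a DAS 'features' request URL for one segment (optionally a range)."""
    seg = quote(segment)
    if start is not None and stop is not None:
        seg += f":{start},{stop}"
    return f"{server.rstrip('/')}/das/{quote(source)}/features?segment={seg}"


# Placeholder server and source name (assumptions, not real endpoints).
url = das_features_url("http://das.example.org", "example_source", "P04637", 240, 250)
print(url)
```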

12 DAS Infrastructure By Henning Hermjakob

13 By Andreas Prlic

14 Search for a term (like "regulation") or gene name ("p53")
See some gene lists related to cancer (Sjoblom et al., Science, 2006; Matsuoka et al., Science, 2007; etc.) and some protein lists (Cancer, Spindle).

15 Register new widgets, log in and manage accounts
New “Widget Manager” web form.

16 Open any classified widget by clicking its name in the menu bar at the top.
See the global information related to the query in the “Input description panel”.

17 BioSapiens Ontology. Aim: standardise DAS feature types
Developed a protein feature ontology in close collaboration with UniProt and HUPO PSI. Three main branches:
- Positional features: “donated” terms to the existing Sequence Ontology from the GO consortium
- Protein modifications: adopted the existing PSI MI MOD ontology
- Non-positional features: BioSapiens
Delivered as De107.8. By Gabby Reeves and Henning Hermjakob

18 By Ildefonso Cases

19 By Ildefonso Cases

20 By Ildefonso Cases

21 By Ildefonso Cases

22 By Ildefonso Cases

23 By Ildefonso Cases

24 By Ildefonso Cases

25 By Ildefonso Cases

26 BioSapiens Widgets By Ildefonso Cases
MIPS: Philip Wong (Corum: the Comprehensive Resource of Mammalian protein complexes); Pawel Smialowski (PDB Cb-Cb 8a; data are calculated directly from structures of biological units)
Univ Roma: Alejandro Giorgetti, Tiziana Castrignano, Ildefonso Cases (CNIO). PMDB: Protein Models database
MPI Inf.: Fidel Rodriguez. Annotation Similarity
EBI-Thornton: David Talavera. CSA and PDBsum
EBI-Brazma: Misha Kapushesky. ArrayExpress Top 5 experiments
Uni Bologna: Piero Fariselli, Ildefonso Cases (CNIO). PhD-SNP: Predictor of human Deleterious Single Nucleotide Polymorphisms
CBS: Peter Wad Sackett (service), Ildefonso Cases (CNIO). ProtMod: Protein Modification and Transmembrane Predictions
UCL: Corin Yates, Jonathan Lees. Gene3D and CATH
ENSEMBL: Andreas Prlic
CNIO:
- iHop (Jose Manuel Rodríguez): Text Mining
- OMIM (Jose Maria Fernández): Disease
- FunCut (Jose Manuel Rodríguez): Function
- AllDomains (Ildefonso Cases): Domains
- Enviro (Jaime Fernández): Interactions
- SNP 3D (Ildefonso Cases): Structure and SNPs
- Mutation Viewer (Jaime Fernández): Cancer Mutations
- General Framework (Angel Carro, Eduardo Andrés León)
CIPF: Joaquin Tarraga. FatiGo: GO Classification Assignments; IDConverter: Ids Translator
PCB: Adam Hospital. MoDel: Molecular Dynamics Extended Library; Pmut: Prediction of pathological mutations
BSC: Dmitry Repchevsky. 3D-Annotation: Domains annotation over 3D structures
CNB: Natalia Jimenez. Visual Genomics: Gene Expression on Anatomical Atlases
Teresa Paramo. Gene2SNPs: SNPs in HapMap; Gene2tagSNPs: Tag SNPs; Gene3GADStudies: Association Studies
UPF: Nuria Bigas. CGPROP: Cancer gene properties

27 “Enviro” Widget provides additional information on other interactions.
Combining SNP3D and OMIM facilitates the study of the structural consequences of each variant (SNPs and/or mutations). In this case the mutation “0001, R248” is clearly part of the DNA interaction site. In a comparative study with OMIM, R249S, which is associated with hepatocellular carcinoma, is not related to DNA binding. Is this related to phenotypic differences? The “Functional Residues” widget reports that S249 is involved in ligand binding. The SNP-3D widget with the 1GZH structure shows that it is part of the interaction interface between P53 and P53-BP, and part of the interaction with the SV40 oncoprotein (2H1L structure). By Ildefonso Cases

28 _Line of action 2_: Annotating with detailed manual interpretation of genes potentially associated with cancer and the mutations already detected.
The plan here is to collaborate with the Sanger Cancer Genome Project in the analysis of their list of genes, in particular in the analysis of human protein kinases in a large collection of cancers (Greenman ... Futreal and Stratton. Patterns of somatic mutation in human cancer genomes. Nature Mar 8;446(7132):153-8.). The question is the possible functional consequences of the mutations, knowing that fewer than one third of them are truly related to cancer. We will need a combination of structural bioinformatics and genomics (i.e. splicing analysis, comparative genomics). The automatic results of modelling and analysis tools will not be sufficient, and we have to think about how to develop an analysis framework robust enough to be valid for other families. Interested people will be cancer groups in search of targets, interested in the relation between cancer, genes, SNPs and mutations. For discussion.

29

30 Driver Vs Passenger mutations
Two different kinds of mutations arise as cancer cells expand:
- Driver mutations: mutations that confer a growth advantage on the cell in which they occur, are causally implicated in cancer development, and have therefore been positively selected. They are by definition found in cancer cells.
- Passenger mutations: mutations not subject to positive selection. They were present in the cell that was the progenitor of the final clonal expansion of the cancer, are biologically neutral, and do not confer a growth advantage.
[Figure: mutations accumulating from normal tissue to cancer, labelled as passenger or driver.]
(Greenman et al., Nature 2007) (Wood et al., Science 2007)

31 Single Nucleotide Polymorphisms
A SNP is a DNA sequence variation occurring when a single nucleotide in the genome differs between members of a species (or between paired chromosomes in an individual). Almost all common SNPs have only two alleles, so we say they are dimorphic. Within a population, SNPs can be assigned a minor allele frequency (the proportion of chromosomes in the population that carry the less common variant). Only variants with a minor allele frequency of ≥ 1% (or 0.5%, depending on the dataset) are given the title "SNP". It is important to note that there are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another.
SNPs can be located anywhere in the genome:
- within coding sequences of genes,
- in non-coding regions of genes,
- in intergenic regions between genes.
A SNP within a coding sequence in which both forms lead to the same polypeptide sequence (because of the degeneracy of the genetic code) is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is non-synonymous. SNPs that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA. By Jose M. G.-Izarzugaza
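The two definitions on this slide can be sketched in a few lines. This is an illustration, not the presenters' code: it takes the minor allele frequency as the fraction of all sampled chromosomes carrying the less common allele, uses the standard genetic code, and the allele counts and codons are made-up examples.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def minor_allele_frequency(allele_counts):
    """Fraction of sampled chromosomes that carry the least common allele."""
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def is_synonymous(ref_codon, alt_codon):
    """True if both codons translate to the same amino acid."""
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


# Made-up example: 100 chromosomes sampled, 10 carry the T allele.
maf = minor_allele_frequency({"C": 90, "T": 10})
print(maf, maf >= 0.01)              # frequency, and whether it passes a 1% cut-off
print(is_synonymous("CGG", "CGA"))   # Arg -> Arg
print(is_synonymous("CGG", "TGG"))   # Arg -> Trp
```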

32 Maximal distance between changes
This slide shows the distance between the furthest changes. Cancer and random mutations have different distributions (Kolmogorov-Smirnov test). The datasets compared are:
- cancer-related mutations from the paper by Sjöblom et al. (2006),
- ten randomly generated sets of positions,
- SNPs downloaded from Ensembl.
By David Talavera
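A rough sketch of the comparison described here, assuming the per-gene measure is simply the span between the first and last changed position and that the two-sample Kolmogorov-Smirnov statistic is the largest gap between the two empirical cumulative distributions. The position lists are invented for illustration, and no p-value is computed.

```python
from bisect import bisect_right


def max_distance(positions):
    """Distance between the two furthest changed positions in one gene."""
    return max(positions) - min(positions)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented positions per gene, for illustration only.
cancer = [max_distance(p) for p in ([10, 14], [200, 203, 207], [55, 60])]
random_sets = [max_distance(p) for p in ([10, 310], [5, 150, 420], [30, 250])]
print(cancer, random_sets, ks_statistic(cancer, random_sets))
```

In practice a library routine such as SciPy's `ks_2samp` would also return the p-value needed to call the distributions different.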

33 Effect of mutations: effect on functional sites
Percentage of genes with at least one coinciding position (cancer-related mutations vs random positions):
- Ligand-binding: 17% vs 21%
- Metal-binding: 7%
- Nucleic acid-binding: 10% vs 11%
- Catalytic: 0%
This slide shows that cancer-related mutations are not found on functional sites more frequently than randomly picked positions. The numbers are the percentage of genes with a coincidence at any position. For example, 17% (cancer-related mutations, ligand-binding) means that 17% of genes have at least one mutation affecting a residue involved in binding ligands. By David Talavera
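The per-gene counting used for these percentages can be sketched as follows. The gene names and positions are invented; the point is only that a gene counts once, however many of its mutations fall on a functional site.

```python
def percent_genes_hit(mutations_by_gene, sites_by_gene):
    """Percentage of genes with at least one mutation on an annotated site."""
    hit = sum(
        1
        for gene, positions in mutations_by_gene.items()
        if set(positions) & set(sites_by_gene.get(gene, ()))
    )
    return 100.0 * hit / len(mutations_by_gene)


# Invented data: mutated positions and ligand-binding residues per gene.
mutations = {"geneA": [12, 48], "geneB": [7], "geneC": [101, 230], "geneD": [5]}
ligand_sites = {"geneA": [48, 49, 50], "geneC": [90], "geneD": []}
print(percent_genes_hit(mutations, ligand_sites))
```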

34 Effect of mutations: kind of substitution
Cancer mutations are not randomly distributed along the sequence; however, there is no relation with functional sites. Cancer-related mutations don’t occur at extremely conserved positions. Cancer-related mutations don’t seem to be more drastic than SNPs.
Kind of substitution (cancer-related mutations vs SNPs):
- Conservative changes: 55.1% vs 55.3%
- Non-conservative changes: 44.9% vs 44.7%
This slide shows that both types of changes (deleterious and tolerated) have similar rates of conservative and non-conservative substitutions. Non-conservative changes have a negative BLOSUM score, whereas conservative changes have a positive or zero score. By David Talavera
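The classification rule stated on this slide (negative BLOSUM score means non-conservative, zero or positive means conservative) can be sketched as below. The slide does not say which BLOSUM matrix was used; BLOSUM62 is assumed here, only a handful of its entries are included, and the list of substitutions is invented.

```python
# A few BLOSUM62 entries only (assumed matrix); a real analysis would load
# the full matrix from a library or file.
BLOSUM62_SUBSET = {
    ("R", "K"): 2,
    ("D", "E"): 2,
    ("I", "L"): 2,
    ("R", "H"): 0,
    ("R", "S"): -1,
    ("R", "W"): -3,
}


def substitution_score(ref, alt, matrix):
    """Look up a symmetric substitution score."""
    return matrix[(ref, alt)] if (ref, alt) in matrix else matrix[(alt, ref)]


def is_conservative(ref, alt, matrix):
    """Conservative if the score is zero or positive, as defined on the slide."""
    return substitution_score(ref, alt, matrix) >= 0


def percent_conservative(changes, matrix):
    """Percentage of substitutions classified as conservative."""
    kept = sum(1 for ref, alt in changes if is_conservative(ref, alt, matrix))
    return 100.0 * kept / len(changes)


# Invented substitutions, for illustration only.
changes = [("R", "K"), ("R", "W"), ("H", "R"), ("R", "S")]
print(percent_conservative(changes, BLOSUM62_SUBSET))
```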

35 Protein Kinases By Jose M. G.-Izarzugaza
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them (phosphorylation). Phosphorylation usually results in a functional change of the target protein (substrate) by changing enzyme activity, cellular location, or association with other proteins. The chemical activity of a kinase involves removing a phosphate group from ATP and covalently attaching it to one of three amino acids that have a free hydroxyl group. Most kinases act on both serine and threonine, others act on tyrosine, and a number (dual-specificity kinases) act on all three.
The human genome contains about 520 protein kinase genes [Manning et al, 2001]. Dysregulated kinase activity is a frequent cause of disease, particularly cancer, since kinases regulate cell growth, movement and cell death. The protein kinase domain is the most commonly found domain in known cancer genes [Futreal et al, 2004].
Since protein kinases have key effects on the cell, their activity is highly regulated:
- by phosphorylation (sometimes auto-phosphorylation),
- by binding of activator proteins or inhibitor proteins,
- by binding of activator or inhibitor small molecules,
- by controlling their location in the cell relative to their substrates.
Drugs that inhibit specific kinases are being developed to treat several diseases, and some are currently in clinical use, including Gleevec (imatinib, leukaemia) and Iressa (gefitinib, lung cancer). By Jose M. G.-Izarzugaza

36 Many Structures: kinases undergo a large articulated motion when they turn “on” and “off”
Src undergoes a large articulated motion when it turns "on" and "off." The crystal structure, shown on the right from PDB entry 2src, shows the inactive form. The protein opens up, as shown on the left, to form the active protein. [Figure: active form (left) and inactive form (right).] Source: Src tyrosine kinase from the Protein Data Bank. By Jose M. G.-Izarzugaza

37 Mutation analysis workflow
[Flowchart, boxes in order:]
- Query Family (Kinases)
- Family Members (from Kinbase)
- Get SNPs
- Family Representatives (from PDB)
- Map SNPs onto PDBs
- Multiple Structure Alignment
- Feature Distribution Analysis
The workflow is shown for SNPs; it is very similar for mutations. By Jose M. G.-Izarzugaza
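One step of such a workflow, placing per-protein variant positions onto shared alignment columns so that they can be compared across the family, might look like the sketch below. It is an assumption about how the mapping could be done, not the presenters' pipeline; the two short aligned sequences and the SNP positions are invented.

```python
def residue_to_column(aligned_seq, gap="-"):
    """Map 1-based residue positions to 0-based alignment columns."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            position += 1
            mapping[position] = column
    return mapping


def variants_per_column(alignment, variants):
    """Count how many variants from all family members fall in each column."""
    counts = {}
    for name, positions in variants.items():
        mapping = residue_to_column(alignment[name])
        for pos in positions:
            column = mapping[pos]
            counts[column] = counts.get(column, 0) + 1
    return counts


# Invented toy alignment and SNP positions (1-based, per sequence).
alignment = {"kinA": "MK-LV", "kinB": "M-RLV"}
snps = {"kinA": [3], "kinB": [3]}
print(variants_per_column(alignment, snps))
```

Both variants land in the same column even though the gaps differ, which is what makes a feature distribution analysis across family members possible.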

38 Statistics on the PK PDB retrieval
- Total human sequences in Kinbase: 620
- Sequences in Kinbase that are not pseudogenes: 516
- Sequences with known Swissprot ID (assigned by BLAST): 488
- Sequences with known Swissprot ID, BLAST identity > 95%: 474
- Kinases with at least one solved protein structure (PDB): 145
- Human kinase sequences in the multiple sequence alignment: 266
- Total number of SNPs (kinase domain): 569 (synonymous: 263; non-synonymous: 306)
- Total number of mutations (kinase domain): 140 (driver: 73; passenger: 63)
By Jose M. G.-Izarzugaza

39 TreeDet vs firedb TreeDet vs firedb vs conserv By David de Juan

40 By David de Juan

41 Driver Passenger By Jose M. G.-Izarzugaza
[Figure: three distributions each for driver and passenger mutations; surviving summary statistics:]
Driver:
- Mean: 3.61; Median: 4.68; St.Dev: 3.12
- Median: 6.35; St.Dev: 4.16
- Median: 10.26; St.Dev: 7.06
Passenger:
- Median: 4.94; St.Dev: 2.58
- Median: 5.57; St.Dev: 3.69
- Median: 9.94; St.Dev: 6.32

42 Next
- “CARGO cancer gene list” paper to be presented tomorrow with action items (scope: Cancer Research).
- Mutation analysis is still a key challenge: creation of analysis pipelines for all proteins and for protein families (SNPs versus mutations, driver versus passenger mutations).

