3 What is Functional Genomics? Functional genomics refers to the development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics (Hieter and Boguski 1997). Functional genomics as a means of assessing phenotype differs from more classical approaches primarily with respect to the scale and automation of biological investigations (UC Davis Genome Center).
5 Functional Genomics
- Differential gene expression
  - SAGE/MPSS (open systems)
- Identifying the function of genes
  - Functional complementation
  - RNA interference/RNA silencing
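6 Why We Need Functional Genomics
Organism | # genes | % of genes with inferred function | Completion date of genome
E. coli | 4288 | 60 | 1997
yeast | 6,600 | 40 | 1996
C. elegans | 19,000 | [not given] | 1998
Drosophila | 12-14K | 25 | 1999
Arabidopsis | 25,000 | [not given] | 2000
mouse | ~30,000? | 10-20 | 2002
human | [not given] | [not given] | [not given]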
7 What is the limitation of functional genomics? 5P
8 Questions Functional genomics will not replace the time-honored use of genetics, biochemistry, cell biology and structural studies in gaining a detailed understanding of biological mechanisms.
9 SAGE & MPSS
- Serial Analysis of Gene Expression (SAGE)
- Massively Parallel Signature Sequencing (MPSS)
- Start from mRNA (eukaryotes)
- Generate a short sequence tag (9-21 nt) for each mRNA ‘species’ in a cell
10 SAGE
- Described by Velculescu et al. (1995)
- Originally 9 bp tags, now LongSAGE 21 bp
- 10-50 tags in a clone
- Only requires a sequencer (and some time)
11 MPSS
- Proprietary technology; published 2000
- Generates 17 nt “signature sequence”
- Collects >1,000,000 signatures per sample
- Requires 2 µg of mRNA and $$
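The tag-counting idea behind these methods can be sketched in a few lines. This is a simplified illustration, not the laboratory protocol: it assumes each transcript is given as a plain string, that the tag is the stretch immediately following the 3'-most anchoring site, and that the anchor is `CATG` (the NlaIII site commonly used in SAGE). The function names, the default tag length of 10, and the toy sequences are all hypothetical.

```python
from collections import Counter


def extract_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag following the 3'-most anchor site, or None.

    Assumption: anchor and tag_len are illustrative defaults; real
    protocols differ between SAGE, LongSAGE and MPSS.
    """
    pos = transcript.rfind(anchor)  # 3'-most occurrence of the anchor
    if pos == -1:
        return None  # no anchor site, so no tag can be generated
    start = pos + len(anchor)
    tag = transcript[start:start + tag_len]
    # Discard truncated tags that run off the end of the sequence.
    return tag if len(tag) == tag_len else None


def count_tags(transcripts, anchor="CATG", tag_len=10):
    """Count how often each tag occurs across a collection of transcripts."""
    tags = (extract_tag(t, anchor, tag_len) for t in transcripts)
    return Counter(t for t in tags if t is not None)


# Hypothetical toy sequences, not real mRNAs.
transcripts = [
    "AAACATGTTTGGGCCCAAATTTGG",
    "AAACATGTTTGGGCCCAAATTTGG",
    "CCCCATGACGTACGTACGTAA",
    "GGGGGGGG",  # lacks the anchor site and is skipped
]
tag_counts = count_tags(transcripts)
```

The resulting counts serve as a rough proxy for relative expression: a tag seen twice comes from an mRNA species sampled twice.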
22 Kamath et al. 2003
- 16,757 strains = 86% of predicted ORFs
- Looked for sterility or lethality (Nonv), slow growth (Gro) or defects (Vpep)
- 1,722 strains (10.3%) had such phenotypes
23
- Genes involved in basic metabolism & cell maintenance are enriched for Nonv phenotype
- Genes involved in more complex ‘metazoan’ processes (signal transduction, transcriptional regulation) are enriched for Vpep phenotype
- Nonv phenotypes are highly underrepresented on the X chromosome
- The X chromosome is enriched for Vpep phenotypes
24 Basal functions of eukaryotes are shared:
- lethal (Nonv) genes tended to be of ancient origin
- ‘animal-specific’ genes tended to be non-lethal (Vpep)
- almost no ‘worm-specific’ genes were lethal
25 Protein-Protein Interaction
- Interactome: proteome-scale data sets of protein-protein interactions; the protein network
- Methods: yeast two-hybrid, protein microarray, gene disruption phenotype, protein subcellular localization, mRNA expression profile, immunoprecipitation/mass spectrometry
- Problem: false positives and false negatives
Although many protein-protein interaction maps on a global scale have been generated, they suffer from high error rates, as evidenced by the low overlap among the different databases.
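One simple way to quantify "low overlap" between two interaction data sets is the Jaccard index over unordered protein pairs. The sketch below is only an illustration of that measure: the two lists are hypothetical toy data, not taken from any real database, and the function names are made up for this example.

```python
def normalize(pairs):
    """Treat (A, B) and (B, A) as the same undirected interaction."""
    return {frozenset(pair) for pair in pairs}


def jaccard_overlap(pairs_a, pairs_b):
    """Shared interactions divided by all distinct interactions."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical interaction lists from two sources.
db_one = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
db_two = [("B", "A"), ("C", "E"), ("E", "F"), ("D", "C")]

overlap = jaccard_overlap(db_one, db_two)  # 2 shared pairs out of 6 distinct
```

A value near 1 would mean the two sources largely agree; values near 0 indicate that most reported interactions are unique to one source.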
26 Network Topology
- Various patterns of links connecting pairs of nodes
- Computer network or any communication network
- A given node has one or more links to others
- Network topology is determined only by the configuration of connections between nodes
- Daisy chain: linear, ring
- Centralization
- Decentralization
- Hybrid: two or more different basic networks
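Because topology depends only on which nodes are connected, an adjacency map is enough to describe it. The sketch below builds three basic topologies and compares their node degrees. Modeling "centralization" as a star with a single hub is an assumption made for this example, and all function names are hypothetical.

```python
def linear(n):
    """Daisy chain: node i is linked to i-1 and i+1 where they exist."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def ring(n):
    """Daisy chain closed into a loop (intended for n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def star(n):
    """Centralized network (assumed form): node 0 is the hub."""
    graph = {0: set(range(1, n))}
    graph.update({i: {0} for i in range(1, n)})
    return graph


def degrees(graph):
    """Sorted number of links per node; a compact fingerprint of topology."""
    return sorted(len(neighbors) for neighbors in graph.values())


linear_deg = degrees(linear(5))  # two end nodes with one link each
ring_deg = degrees(ring(5))      # every node has exactly two links
star_deg = degrees(star(5))      # one hub, four leaves
```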
27 Traveling Salesman Network (or Conference Site Map)
- Goal: find the cheapest way of visiting all of the cities and returning to your starting point
- Cities: Seattle, WA; Sioux Falls, SD; Boston, MA; Steamboat, CO; Denver, CO; San Francisco, CA; Iowa City, IA; New York, NY; Lincoln, NE; Chicago, IL; Moab, UT; Washington, DC; Wichita, KS; Lake of the Ozarks, MO; Columbia, SC; Atlanta, GA; Dallas, TX; Fort Lauderdale, FL; Anchorage, AK; Orlando, FL; Honolulu, HI
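For a handful of cities, the cheapest round trip can be found by trying every visiting order. The sketch below does exactly that. The travel costs are hypothetical numbers chosen for illustration, and the brute-force search is only practical for small inputs because the number of orders grows factorially.

```python
from itertools import permutations


def cheapest_tour(cost, start):
    """Try every visiting order and keep the cheapest closed tour."""
    others = [city for city in cost if city != start]
    best_tour, best_cost = None, float("inf")
    for order in permutations(others):
        tour = (start, *order, start)  # return to the starting point
        total = sum(cost[a][b] for a, b in zip(tour, tour[1:]))
        if total < best_cost:
            best_tour, best_cost = tour, total
    return best_tour, best_cost


# Hypothetical symmetric travel costs between four of the cities.
cost = {
    "Denver": {"Chicago": 3, "Dallas": 2, "Atlanta": 4},
    "Chicago": {"Denver": 3, "Dallas": 3, "Atlanta": 2},
    "Dallas": {"Denver": 2, "Chicago": 3, "Atlanta": 2},
    "Atlanta": {"Denver": 4, "Chicago": 2, "Dallas": 2},
}

tour, total_cost = cheapest_tour(cost, "Denver")
```

With all 21 cities on the slide, exhaustive search is out of reach (20! orders), which is why this problem is a standard example of a hard network optimization task.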
28 Protein-Protein Interaction: Extended Neighborhood
- Counting simple paths between two proteins
- In addition to the path through the immediate neighbor G, we consider four other simple paths between A and B
- The local structure beyond the shortest path through G contains valuable information
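Counting simple paths (paths that never revisit a node) between two proteins can be sketched with a depth-first search bounded by a maximum path length. The network below is a hypothetical toy example, not the figure from the slide, and the function name and `max_len` parameter are illustrative.

```python
def simple_paths(graph, start, end, max_len):
    """Yield every simple path from start to end with at most max_len edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path
            continue
        if len(path) - 1 == max_len:
            continue  # already at the length limit, do not extend
        for neighbor in graph[node]:
            if neighbor not in path:  # keep the path simple
                stack.append((neighbor, path + [neighbor]))


# Hypothetical toy network: A and B share the immediate neighbor G,
# and are also connected through C and D.
graph = {
    "A": {"G", "C"},
    "B": {"G", "D"},
    "G": {"A", "B", "C"},
    "C": {"A", "D", "G"},
    "D": {"C", "B"},
}

paths = list(simple_paths(graph, "A", "B", max_len=4))
path_lengths = sorted(len(p) - 1 for p in paths)
```

In this toy network there is one path of length 2 through G, plus three longer paths that pass through the extended neighborhood.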
29 Protein-Protein Interaction: Assigning Weight on the Hub
- As the paths between a pair of proteins are found, we assign a score to each path according to its length and the characteristics of its nodes
- Then we sum up the contribution from each path to calculate the overall score
- Longer paths should have less weight since they have less chance of interaction
- Paths traveling along nodes that have fewer neighbors should be given more weight
30 Total score
- n_p: maximum path length
- a_l: the weighting coefficient for paths of different length
- n_l: number of paths with length l
- P_li: nodes along the i-th path of length l, including the start and end nodes
- d_j: degree of the nodes
As the paths between a pair of proteins are found, we assign scores to each path, according to the path length and the characteristics of the nodes it traverses. Then we sum up the contribution from each path to calculate the overall score. Intuitively, the longer paths should have less weight since they present weaker evidence for the interaction. Similarly, traversing along the nodes that have fewer neighbors should be given more weight in general than doing so along the nodes with many neighbors. However, we assign a lower weight to the scenario on the left. This may not give the correct result all the time, as there are many real hub-like structures in the network, but we apply a conservative weighting scheme in order to protect against false positives due to experimental errors. The square root scaling is intuitively appealing, as this gives each additional degree diminishing weight, and the factor 1 was added inside the square root to lessen the difference between the nodes with degrees one and two. The total score is then [equation not preserved in this transcript], where n_p is the maximum path length to consider, a_l is the weighting coefficient for paths of different length, n_l is the number of paths with length l, and P_li contains the nodes along the i-th path of length l including the start and end nodes.
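The scoring idea can be sketched as follows. The exact formula is not preserved in this transcript, so the form used here is an assumption that merely matches the verbal description: each path contributes a length-dependent coefficient a_l times a product of 1/sqrt(1 + d_j) over the nodes on the path, and the contributions are summed. Whether the original multiplies or adds the per-node terms is unknown, and the choice a_l = 1/l is purely hypothetical.

```python
import math


def path_weight(graph, path):
    """Assumed per-path weight: product of 1/sqrt(1 + degree) over its nodes."""
    return math.prod(1.0 / math.sqrt(1 + len(graph[node])) for node in path)


def total_score(graph, paths, coeff, max_len):
    """Sum coeff(l) * path_weight over all paths with at most max_len edges."""
    score = 0.0
    for path in paths:
        length = len(path) - 1  # number of edges on the path
        if length <= max_len:
            score += coeff(length) * path_weight(graph, path)
    return score


def inverse_length(length):
    """Hypothetical coefficient a_l = 1/l, so longer paths count less."""
    return 1.0 / length


# Two hypothetical networks: G has few neighbors in one, many in the other.
sparse = {"A": {"G"}, "G": {"A", "B"}, "B": {"G"}}
hub = {"A": {"G"}, "G": {"A", "B", "X", "Y", "Z"}, "B": {"G"},
       "X": {"G"}, "Y": {"G"}, "Z": {"G"}}

path = [["A", "G", "B"]]
score_sparse = total_score(sparse, path, inverse_length, max_len=4)
score_hub = total_score(hub, path, inverse_length, max_len=4)
```

Under these assumptions the same A-G-B path scores lower when G is a hub, which reflects the conservative weighting described in the notes.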
31 Database
- Molecular Interaction (MINT) database (Zanzoni et al., 2002)
- Datasets by Gavin et al. (2002) and Ho et al. (2002)
- Database of Interacting Proteins (DIP) (Salwinski et al., 2004)
MINT focuses on experimentally verified protein interactions mined from the scientific literature by expert curators. The curated data can be analyzed in the context of the high-throughput data and viewed graphically with the 'MINT Viewer'.
The DIP database catalogs experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions. The data stored within the DIP database were curated both manually by expert curators and automatically using computational approaches that utilize the knowledge about the protein-protein interaction networks extracted from the most reliable, core subset of the DIP data. Projects related to DIP include LiveDIP, the Database of Ligand-Receptor Partners (DLRP) and JDIP.
32 Mediator Complex
- RNA polymerase II transcription process
- Bridges gene-specific transcription factors and the core RNAP II machinery
- 25 subunits
- Computational and 3D structural analysis
33 Glucose Metabolism
- Carbon and energy source
- Cells adapt their metabolism based on the available nutrients
- Regulate gene expression
- Glucose homeostasis regulates lifespan and aging in all eukaryotes
- Snf1 protein kinase complex: key component of the glucose repression and derepression pathway
34 Aging
- The ultimate causes of aging are unknown
- Multifactorial process
- Mutation accumulation and oxidation