Gene Signatures and Knowledge-Guided Gene Set Characterization Lab


1 Gene Signatures and Knowledge-Guided Gene Set Characterization Lab
Han, Sinha, Song, Weinshilboum. KnowEnG Center. PowerPoint by Charles Blatti. Signatures and Knowledge-Guided Characterization | KnowEnG Center | 2018

2 Introduction
The goals of the lab are as follows: Define a novel gene expression signature based on estrogen receptor status in TCGA samples using the integrative LINCS (iLINCS) data portal, and identify other similar known gene expression signatures. Use networks of prior knowledge to identify pathways, additional genes, and other annotations that relate to the gene set of our novel signature using SPIA, GeneMANIA, and KnowEnG’s DRaWR.

3 Step 0: Download and Extract Data Files
For viewing and manipulating the files needed for this laboratory exercise, download the following archive: ures_and_Characterization.zip Right-click and extract the contents of the archive to your course directory. We will use the files found in: [course_directory]/07_Signatures_and_Characterization/data/

4 Creating a Novel Gene Expression Signature
In this exercise, we will use the integrative iLINCS data portal to extract gene expression data from TCGA BRCA samples and build a gene signature based on the estrogen receptor status.

5 Step 1: Perturbagen and Disease Datasets
Open your web browser and go to the iLINCS data portal: This portal, curated by the LINCS Data Coordination and Integration Center, contains transcriptomic and proteomic datasets from the many LINCS affiliated projects, including the LINCS L1000 assay. It also contains several other large public datasets of perturbations to cell lines and samples of disease. We will define a custom gene signature from TCGA data and see how it relates to the library of signatures generated from the LINCS L1000 project.

6 Step 2: Select Breast Cancer Dataset
Click on “Datasets” in the options along the top. Select the “All Datasets” tab. Click the “Choose” button for TCGA datasets. Find “919 mRNA-seq breast invasive carcinoma (BRCA) samples from TCGA project” by Collins, et al. Click “Analyze”.

7 Step 3A: Creating a Novel Gene Signature
Click on “Create a Signature”. In the “Select grouping variable” dropdown, select “breast_carcinoma_estrogen_receptor_status”. In the “Select group 1” dropdown, select “Negative”. In the “Select group 2” dropdown, select “Positive”. Finally, click on the “Create Signature” button.

8 Step 3B: Our ER Status Gene Signature
When the signature is calculated, a quick summary of the number of samples from each group is presented. Next, we will look more closely at the genes involved in our signature.

9 Step 4A: Examining Gene Expression of our Signature
To get statistics about how the signature is defined, we will select “Modify the list of selected genes”. We are presented with a volcano plot showing the log fold change (x-axis) and differential expression significance (y-axis) of each gene. Thresholds on both of these criteria define the genes selected for the signature. Slide the Log2 Fold Change cutoff to the value 3 so that there are only 100 genes in our ER status signature.
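The two-threshold selection behind the volcano plot can be sketched in a few lines of Python. This is a minimal illustration, not the iLINCS implementation: the function name, the gene names, and the numeric values are all hypothetical, and the significance cutoff of 0.05 is an assumption (the lab only specifies the Log2 Fold Change cutoff of 3).

```python
def select_signature_genes(stats, min_abs_log2fc, max_p_value):
    """Return the genes that pass both volcano-plot thresholds.

    stats maps gene symbol -> (log2 fold change, p-value).
    """
    selected = [
        gene
        for gene, (log2fc, p_value) in stats.items()
        if abs(log2fc) >= min_abs_log2fc and p_value <= max_p_value
    ]
    # Most significant first, like a table sorted by p-value.
    return sorted(selected, key=lambda gene: stats[gene][1])


# Hypothetical values for illustration only (not taken from iLINCS).
toy_stats = {
    "GENE_A": (-4.2, 1e-60),
    "GENE_B": (3.5, 1e-30),
    "GENE_C": (1.1, 1e-40),  # significant, but fold change too small
    "GENE_D": (-3.8, 0.2),   # large fold change, but not significant
}
selected_genes = select_signature_genes(toy_stats, min_abs_log2fc=3, max_p_value=0.05)
print(selected_genes)
```

Raising the fold change cutoff shrinks the list, which is what sliding the cutoff to 3 does in the portal.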

10 Step 4B: Examining Gene Expression of our Signature
Click the “Table” tab to see the list of selected signature genes. Note that ESR1, estrogen receptor 1, is the most significantly differentially expressed gene, which is consistent with the immunohistochemical staining assay result that defined the positive and negative groups. Because the number of samples (868) is high, the differential expression p-values are very significant for these top signature genes. Click the “Download significant gene(s)” button to save this table as an Excel file called “genelist_tmp_volcano_*.xls” and move it to [course_directory]/07_Signatures_and_Characterization/results/. Click the “Analyze 100 significant genes” button to continue the analysis.

11 Finding Similar Signatures from Public Libraries
Here we will attempt to find signatures from large public collections that relate to the ER status signature we defined. These public signatures are defined using both basic methods as well as the Characteristic Direction method.

12 Step 5A: Finding Related L1000 Signatures
Click on the “Connected Signatures” tab in the bottom half of the screen. We will start by looking at shRNA gene knockdown signatures defined from the LINCS L1000 project, in which only 978 genes are measured. Click the checkbox next to “LINCS consensus (CGS) gene knockdown signature”. These consensus signatures are defined by combining all of the different shRNAs, with different seeds, that target the same gene and by comparing them to appropriate control experiments.

13 Step 5B: Finding Related L1000 Signatures
When the calculation is complete, view the similar signatures by clicking on “4856 of LINCS consensus (CGS) gene knockdown signature” to expand the results. The second result is CDK4, cell division protein kinase 4, an important regulator of cell cycle progression. Previous literature has shown that silencing CDK4 has a variable influence on cancer progression depending on the expression of estrogen receptor. Inhibition of CDK4 increases migration and stem-like cell activity in ER-negative breast cancer.

14 Step 5C: Finding Related L1000 Signatures
By clicking on the small graph icon in the Concordance column, we can see the correlation between our ER status signature and the LINCS signature for CDK4 knockdown.
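As a rough illustration of what a correlation between two signatures means, the sketch below computes a Pearson correlation over the genes that both signatures share. The source does not say how iLINCS computes its Concordance value, so treating it as a Pearson correlation is an assumption, and the function name and toy values are hypothetical.

```python
import math


def pearson_concordance(sig_a, sig_b):
    """Pearson correlation of two signatures over their shared genes.

    Each signature maps gene symbol -> differential expression value.
    """
    shared = sorted(set(sig_a) & set(sig_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    xs = [sig_a[gene] for gene in shared]
    ys = [sig_b[gene] for gene in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a signature is constant over the shared genes")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy signatures; GENE_D is ignored because it is not shared.
query_signature = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0}
library_signature = {"GENE_A": -2.0, "GENE_B": -4.0, "GENE_C": -6.0, "GENE_D": 5.0}
concordance = pearson_concordance(query_signature, library_signature)
print(concordance)
```

A value near +1 means the two signatures move genes in the same direction, and a value near -1 means they move genes in opposite directions.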

15 Step 6: Related ENCODE Signatures
Uncheck the checkbox for “LINCS consensus (CGS) gene knockdown signature” and click the checkbox next to “ENCODE Transcription Factor Binding Signatures”. Expand the results when computed. These signatures are defined by creating gene-level scores that integrate the distance of transcription factor ChIP peaks and the likelihood that the gene is regulated in the condition, using the TREG method (True REGulatory TF-gene interactions). Two of the top five results we recover are TF signatures of estrogen receptor binding, meaning that our ER status differential expression signature matches ER differential binding signatures. The other TFs in the top signatures also have known roles in mediating ER binding.

16 Step 7A: Finding Related Signatures Using Characteristic Direction
Click on the “Analysis Results” tab, which contains many different methods for analyzing our novel ER status gene signature. We will discuss some of these next. The public signatures so far have been defined by independently considering whether each gene is differentially expressed. The following exercise uses signatures defined with the characteristic direction method (L1000CDS²), which represents each signature with an arrow in gene expression space.

17 Step 7B: Finding Related Signatures Using Characteristic Direction
The L1000CDS² tool is a LINCS L1000 characteristic direction signature search engine where users can find matches to their input signature from 33K small molecule perturbagen signatures covering 62 cell lines and 4K small molecules. We will go directly to the L1000CDS² tool by pasting this link in our browser:

18 Step 7C: Finding Related Signatures Using Characteristic Direction
Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/data/ERstatus_top100_logDiffExp.csv. This is our ER status gene signature with the pair of columns [Name_GeneSymbol, Value_LogDiffExp] extracted from our earlier Excel download. Paste the contents of this file into the “up genes” text box on the left; its name will change to “signature”. In the Configuration panel, switch mimic to reverse small molecule signature, make sure the latest database version is selected, and uncheck the three remaining checkboxes. Click the “Search” button.
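For readers who want to inspect the two-column signature outside the browser, here is a minimal Python sketch that parses such a file. It assumes the file is comma-separated and starts with a header row naming the two columns mentioned above; the actual layout of ERstatus_top100_logDiffExp.csv is not shown in the lab, so adjust accordingly. The function names and the toy rows are hypothetical.

```python
import csv
import io


def parse_signature(csv_text):
    """Parse 'Name_GeneSymbol,Value_LogDiffExp' rows into a gene -> value dict.

    Assumes a header row with exactly these column names (an assumption).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Name_GeneSymbol"]: float(row["Value_LogDiffExp"]) for row in reader}


def split_by_direction(signature):
    """Split a signature into genes with positive and negative values."""
    positive = sorted(gene for gene, value in signature.items() if value > 0)
    negative = sorted(gene for gene, value in signature.items() if value < 0)
    return positive, negative


# Hypothetical rows for illustration only.
toy_csv = "Name_GeneSymbol,Value_LogDiffExp\nGENE_A,-4.2\nGENE_B,3.5\nGENE_C,-3.1\n"
toy_signature = parse_signature(toy_csv)
positive_genes, negative_genes = split_by_direction(toy_signature)
print(positive_genes, negative_genes)
```

To use it on the real file, read the file contents into a string first and pass that string to `parse_signature`.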

19 Step 7D: Finding Related Signatures Using Characteristic Direction
These are the top small molecule LINCS L1000 signatures that are the most opposite to our ER status signature. The idea is that if our ER status signature represents a direction in gene space, these signatures of small molecule perturbations represent the best reversal of that signature. The top two results are unnamed Broad compounds, but the JAK2 inhibitor cucurbitacin I is known to reduce mammary tumorigenesis and metastasis by inhibiting Rac1 activity, which is frequently elevated in ER-positive tumors.
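The idea of a "most opposite" direction in gene space can be illustrated with cosine similarity: two signatures pointing in opposite directions have a cosine near -1. The sketch below ranks a toy library by that score. It is a simplified stand-in for the search, not the L1000CDS² implementation, and the function names, compound names, and values are hypothetical.

```python
import math


def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two signatures (missing genes count as 0)."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[gene] * sig_b[gene] for gene in shared)
    norm_a = math.sqrt(sum(value * value for value in sig_a.values()))
    norm_b = math.sqrt(sum(value * value for value in sig_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("signature has zero length")
    return dot / (norm_a * norm_b)


def rank_reversers(query, library):
    """Sort library signatures from most opposite to most similar."""
    scores = {name: cosine_similarity(query, sig) for name, sig in library.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy data for illustration only.
toy_query = {"GENE_A": 1.0, "GENE_B": -1.0}
toy_library = {
    "compound_x": {"GENE_A": -2.0, "GENE_B": 2.0},  # opposite direction
    "compound_y": {"GENE_A": 1.0, "GENE_B": -1.0},  # same direction
    "compound_z": {"GENE_A": 1.0, "GENE_B": 1.0},   # unrelated direction
}
reversal_ranking = rank_reversers(toy_query, toy_library)
print([name for name, _ in reversal_ranking])
```

Switching the tool from mimic to reverse corresponds, in this sketch, to sorting from the most negative score instead of the most positive.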

20 Step 8: Caveats about using LINCS signatures
Only 978 genes are directly measured by the L1000 assay. Other gene values can be imputed from the Connectivity Map dataset. However, missing or imputed values can make signature analysis less reliable. Also, although tens of thousands of signatures exist, most are still missing. Tools are being developed to identify signatures by learning models on dense parts of the cube and then learning how to correctly transfer those models to sparse regions.

21 Discovering Pathways Related to Our Gene Signature
In this section, we will consider some of the characterization resources that are available for gene signatures and gene sets.

22 Step 9: Standard Gene Set Enrichment
Back on the iLINCS tab, two of the “Analysis Results” tools that are linked to are Enrichr and DAVID. DAVID is the enrichment tool used in the Regulatory Genomics lab. Both tools use standard statistical enrichment tests to examine the overlap of the 100 genes of our ER status gene signature with Gene Ontology term annotations, pathways, and other gene sets. These tools output the results in slightly different ways, so you may want to explore them in your own time.
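A standard overlap test of this kind can be sketched as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher exact test. The sketch below is a generic illustration and makes no claim about the exact statistics or corrections used by Enrichr or DAVID; the function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, set_size, query_size, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    universe_size: number of genes that could have been selected
    set_size:      number of genes annotated to the term or pathway
    query_size:    number of genes in our signature
    overlap:       number of signature genes annotated to the term
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: a 100-gene signature, a 200-gene term, 8 genes shared.
p_value = enrichment_p_value(universe_size=20000, set_size=200, query_size=100, overlap=8)
print(p_value)
```

Because many terms are tested at once, real tools also adjust these p-values for multiple testing, which this sketch leaves out.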

23 Step 10A: Pathway Network Enrichment Test
Signaling Pathway Impact Analysis (SPIA) is a method for assessing the impact of a gene set on a pathway. It combines standard enrichment p-values with network perturbation-based p-values. Click on “Pathway Analysis”. The estrogen signaling pathway is the third result related to our ER status gene signature, although the overall adjusted p-value “SPIA padj” is not significant. Our gene signature is computed to activate the pathway.
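To make the combination of the two p-values concrete, the sketch below applies the rule published for SPIA, P_G = c - c ln(c) with c = P_NDE × P_PERT, where P_NDE is the enrichment p-value and P_PERT is the perturbation p-value. Whether iLINCS reports exactly this quantity before adjustment is an assumption, and the function name and example values are hypothetical.

```python
import math


def spia_combined_p(p_nde, p_pert):
    """Combine an enrichment p-value and a perturbation p-value.

    Uses P_G = c - c * ln(c) with c = p_nde * p_pert, which is the
    distribution of the product of two independent uniform p-values.
    """
    for p in (p_nde, p_pert):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p-values must lie in [0, 1]")
    c = p_nde * p_pert
    if c == 0.0:
        return 0.0  # limit of c - c*ln(c) as c approaches 0
    return c - c * math.log(c)


# Hypothetical p-values for a single pathway.
combined_p = spia_combined_p(0.05, 0.05)
print(combined_p)
```

Two individually borderline p-values can combine into a clearly smaller one, which is why a pathway can rank highly on the combined evidence.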

24 Step 10B: Pathway Network Enrichment Test
Click on the KEGG icon in the last column for Estrogen signaling pathway. Yellow nodes are up-regulated genes in our signature, blue are down-regulated.

25 Step 11A: GeneMANIA
Return to the analysis result by clicking on “Differential Expression Signature” in the tool bar at the top. The last linked tool we will explore today from iLINCS is GeneMANIA. GeneMANIA is a network-based guilt-by-association algorithm that finds the network neighbors of an input gene set from a heterogeneous collection of interaction networks. Go to

26 Step 11A: GeneMANIA (continued)
We are going to enter the top 20 differentially expressed genes from our ER status gene signature. We will use GeneMANIA to return 20 additional network neighbor genes (not necessarily differentially expressed themselves). Then we will look at functional enrichment of this combined set of 40 genes. Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/data/ERstatus_top20.txt. These are the top 20 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download. Paste this list into the text box at the top left corner of the main page.

27 Step 11B: GeneMANIA
Click on the stacked-dots options button. This first list shows all the possible networks that GeneMANIA will consider combining for the analysis of our twenty genes. Select “Customise advanced options”. This menu shows that we are going to find at most 20 neighbors using the automatic network weighting scheme, which is based on our 20 query genes. Click the search magnifying glass.

28 Step 11C: GeneMANIA
The resulting network contains our 20 input genes (striped) and our 20 predicted network neighbors (solid). The size of each network neighbor indicates its final guilt-by-association value on the composite affinity network. You may choose between three arrangements of the graph. The stacked arrangement may be easiest for understanding the nodes. You can hover over any node to highlight its neighbors. For example, NCOA7 is also known as Estrogen Nuclear Receptor Coactivator 1, and NCOA3 is associated with Estrogen-Receptor Positive Breast Cancer. Both are connected to ESR1 (and other top 20 genes) through pathway edges, and neither is in our original 100 differentially expressed gene signature.
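The guilt-by-association idea can be sketched with a deliberately simplified score: each non-query gene is scored by the total weight of its edges to query genes in a single composite network. This is only a stand-in for illustration. GeneMANIA's actual algorithm weights multiple networks and propagates scores across the whole network, which this sketch does not do. The function name, gene names, and edge weights are hypothetical.

```python
def rank_network_neighbors(edges, query_genes, max_neighbors):
    """Rank non-query genes by summed edge weight to the query genes.

    edges is a list of (gene_a, gene_b, weight) tuples for an undirected
    composite network.
    """
    query = set(query_genes)
    scores = {}
    for gene_a, gene_b, weight in edges:
        for source, target in ((gene_a, gene_b), (gene_b, gene_a)):
            if source in query and target not in query:
                scores[target] = scores.get(target, 0.0) + weight
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_neighbors]


# Hypothetical composite network for illustration only.
toy_edges = [
    ("QUERY_1", "NEIGHBOR_A", 0.5),
    ("QUERY_2", "NEIGHBOR_A", 0.25),
    ("QUERY_2", "NEIGHBOR_B", 0.5),
    ("NEIGHBOR_B", "NEIGHBOR_C", 1.0),  # no direct link to a query gene
    ("QUERY_1", "QUERY_2", 1.0),        # query-to-query edges are ignored
]
top_neighbors = rank_network_neighbors(toy_edges, ["QUERY_1", "QUERY_2"], max_neighbors=2)
print(top_neighbors)
```

In this toy network a gene linked to two query genes outranks a gene linked to one, which mirrors why larger neighbor nodes in the display are the more strongly associated ones.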

29 Step 11D: GeneMANIA
On the right side are the selected interaction networks that were relevant to the 20 input genes, sorted by type and by weight. You can toggle the networks to display any set of edges. The highest weighted co-expression network is from breast tumors and relates the top 20 genes to each other fairly well, but does not connect them to the predicted 20.

30 Step 11E: GeneMANIA
Finally, we can perform the standard enrichment tests incorporating our predicted neighbors into our gene set. Click on the pie chart in the bottom left corner. We see most of the results relating to hormone and steroid signaling pathways and receptors.

31 Gene Set Characterization Using Discriminative Random Walks
In this final exercise, we will find terms related to the 100 top differentially expressed genes of our ER status signature using the DRaWR method that incorporates the functional annotation terms directly in the network-based algorithm.

32 Step 12A: Log In to KnowEnG Platform
Go to Select “KNOWENG PLATFORM”. Select “LOGIN OR REGISTER”. Enter your biocluster username, and enter KnowEnGCompGen2018 as the initial password. Then click “Sign in”. Agree to the Terms of Use. Change the password to something you will remember.

33 Step 12B: Launch DRaWR Analysis
The first page has links to many resources, but we will get started by clicking “Start a New Pipeline”. The KnowEnG Analysis Platform has many knowledge network-informed pipelines. You will learn about more of them in the afternoon session. For now, hover over Gene Set Characterization and click “Start Pipeline”.

34 Step 12C: Upload Data
Leave the default species “Human”. Find, open, and copy the contents of the file [course_directory]/07_Signatures_and_Characterization/data/ERstatus_top100.txt. These are the top 100 differentially expressed genes of our ER status gene signature, extracted from the Name_GeneSymbol column of our earlier Excel download. Click on the “Upload New Data” tab. Select the “Paste a Gene List” button. Give your gene list a name, e.g. “ERstatus_gene_list”. Paste the file contents into the gene list text box. Click “Done”. Click “Select” next to the name of your pasted list and you should see a checkmark. Click “Next”.

35 Step 12D: Configure Algorithm Parameters
We will choose to use a subset of 4 gene set collections available in the knowledge network: Ontologies: Gene Ontology (default); Pathways: Enrichr Pathway Membership (must add); Pathways: Reactome Pathways Curated (must add); Tissue Expression: GEO Expression Set (must add). Unclick Protein Domains: PFam Protein Domains. Click “Next”.

36 Step 12E: Configure Network Parameters
Click “Yes” for the question about using the Knowledge Network. The Knowledge Network we will use is an integrated network from the HumanNet project (“HumanNet Integrated Network”). Network size information can be found here. The amount of network smoothing controls how much importance is put on network connections instead of the original 100 genes. We will use the default of 50%. Click “Next”.

37 Step 12F: Reminder about DRaWR Algorithm
Squares are the Gene Ontology and pathway terms we selected. Query Genes are our 100 ER status signature genes. Gray edges are the HumanNet Integrated Network. We are asking the algorithm to find property squares that a random walker who is forced to restart often at the query genes will visit unusually frequently.
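The random-walk idea above can be sketched on a toy network. The code below runs a random walk with restart twice, once restarting at the query genes and once restarting at all genes as a baseline, and ranks property nodes by how much more often the query walk visits them. This is a simplified illustration under stated assumptions, not the KnowEnG implementation: the function names and the toy graph are hypothetical, edges are unweighted, and mapping the 50% network smoothing setting onto a restart probability of 0.5 is an assumption.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbor list mapping."""
    adjacency = {}
    for node_a, node_b in edges:
        adjacency.setdefault(node_a, []).append(node_b)
        adjacency.setdefault(node_b, []).append(node_a)
    return adjacency


def random_walk_with_restart(adjacency, restart_nodes, restart_prob=0.5, iterations=200):
    """Approximate the stationary visit probabilities of a restarting walker.

    Assumes every restart node appears in the graph.
    """
    restart_nodes = set(restart_nodes)
    nodes = sorted(adjacency)
    restart = {n: (1.0 / len(restart_nodes) if n in restart_nodes else 0.0) for n in nodes}
    prob = dict(restart)
    for _ in range(iterations):
        # With probability restart_prob the walker jumps back to a restart node.
        next_prob = {n: restart_prob * restart[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for node in nodes:
            share = (1.0 - restart_prob) * prob[node] / len(adjacency[node])
            for neighbor in adjacency[node]:
                next_prob[neighbor] += share
        prob = next_prob
    return prob


def rank_properties(adjacency, query_genes, all_genes, property_nodes, restart_prob=0.5):
    """Rank property nodes by query-walk minus baseline-walk probability."""
    query_walk = random_walk_with_restart(adjacency, query_genes, restart_prob)
    baseline_walk = random_walk_with_restart(adjacency, all_genes, restart_prob)
    scores = {p: query_walk[p] - baseline_walk[p] for p in property_nodes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy network: genes g1..g4 and property nodes P1, P2.
toy_adjacency = build_adjacency([
    ("g1", "P1"), ("g2", "P1"),  # g1 and g2 are annotated with P1
    ("g3", "P2"), ("g4", "P2"),  # g3 and g4 are annotated with P2
    ("g2", "g3"),                # one gene-gene network edge
])
property_ranking = rank_properties(
    toy_adjacency, ["g1", "g2"], ["g1", "g2", "g3", "g4"], ["P1", "P2"]
)
print(property_ranking)
```

Comparing against the baseline walk is what makes a property "unusually" frequent: a property that every walker visits often, regardless of the query, does not score highly.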

38 Step 12G: Launch DRaWR Job
Change the job name to “gene_set_characterization-DRaWR-HN”. Verify all the parameters are correct. Click “Submit Job”. While this is running, we are going to launch the standard Fisher exact enrichment tests with the same data sets. Click “Start New Pipeline”.

39 Step 13: Launch Standard Enrichment Tests
Hover over Gene Set Characterization and click “Start Pipeline”. Click “Select” next to the name of your pasted list and you should see a checkmark. Click “Next”. Select the same 4 collections: Ontologies: Gene Ontology (default); Pathways: Enrichr Pathway Membership (must add); Pathways: Reactome Pathways Curated (must add); Tissue Expression: GEO Expression Set (must add). Unclick Protein Domains: PFam Protein Domains. Click “Next”. Click “No” for the question about using the Knowledge Network. Click “Next”. Change the job name to “gene_set_characterization-fisher”. Verify all the parameters are correct. Click “Submit Job”.

40 Step 14A: View DRaWR Results
Click the “Go to Data Page” button. You can check the status of your jobs here. Gray arrows mean that your job is currently queued or running. A red icon means something went wrong. Otherwise, when your job is successfully finished, you should be able to click the green arrow and see the primary result files. Click on the DRaWR job “gene_set_characterization-DRaWR-HN”. Then click on the “View Results” button.

41 Step 14B: View DRaWR Results
Slide the filter slider all the way to the right. The DRaWR method picks up many GEO Expression gene sets that relate to ESR1, estrogen, and estradiol. DRaWR also ranks highly a number of pathway and Gene Ontology terms related to the extracellular matrix, which is known to have many molecules affected by estrogens and related to ER expression.

42 Step 14C: View Fisher Results
Click the “Data” link at the top of the page. Click on the Fisher job “gene_set_characterization-fisher”. Then click on the “View Results” button. Slide the filter slider all the way to the right. The Fisher method finds the same GEO Expression gene sets that relate to ESR1, estrogen, and estradiol, as well as some additional estradiol ones that DRaWR missed. It also detects many more GEO gene sets that are less obviously related. The standard enrichment method does not detect any highly significant enrichments with pathways or Gene Ontology terms. The extracellular matrix terms detected by DRaWR are strongly connected to the signature genes, but mostly through their HumanNet network neighbors and not through direct connections.

43 Main Take Home Messages
When you create your own gene signature, you can search libraries of public gene signatures that might provide you with insights relating to mechanisms (e.g. gene knockdowns and transcription factor binding) or treatments (e.g. reverse small molecule perturbagens). A gene signature, or more simply a gene set, can be analyzed in the context of a pathway, interaction, or other affinity network to provide complementary annotations to standard enrichment tests.

