Array Studio Expression Training

OmicSoft July

Outline
Suggested workflow, Studio concepts, Server concepts, hands-on exercise (Microarray), feature introduction.

Suggested workflow
Diagram: client side (Array Studio, Array Viewer) and server side (Array Server). Raw data (Xpress Data, Affymetrix CEL files, raw text/Excel files) stored in a local or shared folder is imported into Array Studio projects, also stored in a local or shared folder; raw files in a shared folder can optionally be processed on the server. Array Server provides central storage (projects, metadata, shared views, lists), search of server projects, and analysis. Projects are published to the server for sharing, and server projects can be downloaded back into Array Studio.

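Studio concepts: solution
A solution holds projects, each project holds data, and each data holds views (diagram example: Project1 contains Data1 with View 1.1 and Data2 with Views 2.1, 2.2 and 2.3; Project2 contains Data3 with View 3.1). There are two project types. Distributed project: saves all data/lists in a folder (recommended for Exon Array/SNP/CNV). Simple project: saves all data/lists in a single file (recommended for Microarray/Taqman).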

Studio concepts: L shape structure
Each -Omic dataset consists of three blocks arranged in an L shape. Measurement (Y): the data matrix, with one row per variable and one column per observation. Design (X): sample description / phenotype data, one record per observation. Annotation (A): variable annotation (i.e. probeset/gene annotation), one record per variable. Supported measurement types: Microarray data, Taqman data, Exon data, SNP data, SNP allele signal data, SNP dose/probability data, Genotype data, CNV log2 ratio data, CNV allele difference data, CNV LOH data, CNV transcript level data, Methylation data.
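A minimal Python sketch of the L-shaped layout, for illustration only: the class name `OmicData`, its fields and the sample values are hypothetical and do not reflect Array Studio's internal format. It shows why the blocks must stay aligned: filtering observations subsets the columns of Y and the rows of X together, while A is untouched.

```python
from dataclasses import dataclass


@dataclass
class OmicData:
    """Hypothetical L-shaped dataset: Y is variables x observations,
    X has one record per observation, A has one record per variable."""
    measurement: list  # Y: one list of floats per variable (row)
    design: list       # X: one dict per observation (column of Y)
    annotation: list   # A: one dict per variable (row of Y)

    def __post_init__(self):
        n_obs = len(self.design)
        if any(len(row) != n_obs for row in self.measurement):
            raise ValueError("design needs one record per column of Y")
        if len(self.annotation) != len(self.measurement):
            raise ValueError("annotation needs one record per row of Y")

    def filter_observations(self, column, value):
        """Keep observations whose design column equals value; Y columns
        and X rows are subset together so the L shape stays aligned."""
        keep = [j for j, rec in enumerate(self.design) if rec[column] == value]
        return OmicData(
            measurement=[[row[j] for j in keep] for row in self.measurement],
            design=[self.design[j] for j in keep],
            annotation=self.annotation,
        )


# Made-up values: 2 variables (probesets) x 3 observations (chips).
data = OmicData(
    measurement=[[7.1, 7.3, 9.8], [5.0, 5.2, 5.1]],
    design=[{"treatment": "control"}, {"treatment": "control"}, {"treatment": "DBP"}],
    annotation=[{"Gene Symbol": "Egr1"}, {"Gene Symbol": "Actb"}],
)
print(data.filter_observations("treatment", "control").measurement)
# [[7.1, 7.3], [5.0, 5.2]]
```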

Studio concepts: solution
A project contains data, lists and views, which can be organized into data folders and list folders. Data types: -Omic data (contains a data matrix, design and annotation) and table data. List types: variable list, observation list, row list, column list and general list.

Studio concepts: user interface
The main parts of the interface are the Data viewer, the Solution Explorer, the View Controller and the Details window.

Workflows
Array Studio has workflows for CNV, SNP, Microarray, Taqman and Exon array data.

Solution explorer
You can open multiple projects in the solution, and each project can contain multiple datasets. You can organize your data and lists in folders, and rename any data, view or folder. Many context-sensitive functions are available by right-clicking; commonly used ones are Add view, Import design, Import annotation, New folder, Copy/paste views, Export, View audit trail and View source.

Data viewer
Views are different from static graphs: they are fully interactive and customizable, and their status is stored with the project. You can open and close views at any time, and most views can be saved as PDF, EMF, PowerPoint or Excel. The viewer is tab-based, but any view can be floated: drag the tabs to split the viewer, or press F10 to float a tab. Mouse over a tab to show its project name and data name. Your active view (not the active project) determines the default selected data.

View controller
Always use the View Controller to customize your view. Task tab: view-sensitive menus to customize the view. Variable tab: filter the variables (-omic data). Observation tab: filter the observations (-omic data). Filter tab: filter the observations (table data). Legend tab: show legend information. Filter status and customized filters are saved with projects, and filters may be inherited when new data is generated!

Details window
The Details window shows the details for the selected variables or observations (depending on the context).

Studio concepts: interactivity
Array Studio is a fully interactive visualization package (a high-dimensional version of Spotfire). Interactivity concepts: filtering; selection (click, drag or lasso); hot track; broadcasting; view customization (from the Task or Legend tab); exporting. Quick demo of the interactivity concepts.

Studio concepts: Selection vs Filtering
Selection shows details on demand for a selected row/variable or column/observation, and highlights the selected items in every view of that dataset. Use the Selection menu to clear row and column selections. Filtering “filters” a dataset (and all its views), keeping only the rows or columns that meet a set of criteria.

Server concepts: what does the server do?
Feature: Description (this table shows selected features only)
Central storage: data repository for all data objects in Array Studio.
Sharing: share your data (under access control) with colleagues/clients.
Search project: access data (and all views) by filtering projects, variables and observations.
Search variable profile: given an ID (e.g. a probeset), find all information about it in all or selected projects.
Search annotation: given a symbol/text, find all IDs whose annotation contains that symbol/text, or all annotations for a given master ID.
List Analysis: given a list, find projects that show over-representation of the list among their significant results.
Search segment: use multiple criteria to search CNV segments across projects/platforms.
PValue region analysis: given a chromosome region, find all p-values from all projects and display them in a region view.
CNV region analysis: given a chromosome region, find all CNV values (log2 ratio or allele difference) from all projects and display them in a region view.

Statistical algorithms for Expression/Microarray data in Array Studio
Proc GLM: one-way ANOVA, two-way ANOVA, two-way nested ANOVA, and general linear models (fixed, mixed or random models). Survival model: proportional hazards regression (Proc tphreg). Logistic regression: proportion data logistic regression (Proc logistic). Omicsoft's implementation is independent of SAS and R. Array Command, Array Studio's command module, can run under Linux (Array Studio itself cannot). All implementations are exact (i.e. not approximations).

PART II Exercises for Microarray data analysis

The hands-on training will focus on
Usability: if you know how to use Microsoft Office, you will be able to use Array Studio. Interactivity: there will be many exercises that interact with different views. Performance: Array Studio is usually … times faster than its competitors.

Keys for Today/Keys for Success with Array Studio
The goal is not to familiarize you with every Array Studio command you will ever use. Instead, we hope to give you a good start that you can build on yourself. The key to learning Array Studio, like any complicated software, is to practice with your own data. Don't worry about “hurting” things: clicking and trying out new options can only help you learn the software better. That said, you can save your data and always return to a previously saved version (using the Save As command). If you don't see something, or can't figure something out, don't hesitate to ask. First consult the Online Help and the Frequently Asked Questions database, then ask a power user; if they cannot help or are not available, call Omicsoft Support. Web chat and remote support are also available using the Live Help button.

List of features to exercise
Linear modeling and result exploration: linear modeling (two-way ANOVA), volcano plot, Summarize Inference Report, Venn diagram. Interpretation: hierarchical clustering, molecular signatures analysis. Pattern and power: find neighbors, audit trail. Signal extraction: RMA extraction, attach design table. Raw data visualization: web details on demand, observation table view, VariableView. Quality control: PairwiseScatterView, PCA.

Launch Array Studio
Launch Array Studio now.

Workflow Window The Workflow Window can be found on the left-hand side of the screen the first time the user starts Array Studio. Workflows are used as a starting place for first-time and novice users of Array Studio. Array Studio offers workflows for Microarray, Taqman, Exon, CNV, and Genotyping analysis. Microarray workflow includes sections for Getting Started, Manage data, Preprocess, Quality Control, Statistical Inference, and Pattern Recognition. The workflows do not contain all the commands and analyses that can be run in Array Studio, but should give the user a good start.

Create a New Project
A project contains all the datasets, results, reports, views, lists, etc. in a single file (for “simple projects”, i.e. for microarray and Taqman data). It is perfectly fine to share or transfer the project file to another user, who will be able to open it immediately (Array Studio is required). When you create a new project, it exists only in memory until you save it. Now create a new project by clicking the New Project button in the Microarray workflow. Array Studio will prompt you to choose a project type; for microarray data, a “simple project” is recommended. Click the Browse button, name the project and select a save location, then click OK to continue. Note: alternatively, create a new project from File Menu | New Project, or click the New button in the toolbar.

Adding Microarray Data/Chip Normalization
Choose Add Microarray Data from the workflow, select Affymetrix .CEL files as the source, add all 24 .CEL files and click the Submit button. Array Studio provides fast RMA/GCRMA/MAS5 implementations: the results are benchmarked against R packages (maximum difference < 1e-7), thousands of chips can be processed in a few hours without memory problems, and the 24 .CEL files take about 30 seconds, depending on computer speed. Alternatively, data can be added from File Menu | Add Data | Add Microarray Data or with the Add Data button on the toolbar.
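RMA, as commonly described, combines background correction, quantile normalization and median-polish summarization. The sketch below illustrates only the quantile-normalization step in plain Python; it is not Omicsoft's implementation, the chip values are made up, and ties are broken by position rather than averaged.

```python
def quantile_normalize(chips):
    """Quantile-normalize chips given as equal-length lists of intensities.

    Each value is replaced by the mean, across chips, of the values with
    the same rank, so every chip ends up with the same distribution.
    Ties are broken by position (a simplification)."""
    n = len(chips[0])
    if any(len(chip) != n for chip in chips):
        raise ValueError("all chips need the same number of probes")
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value over all chips.
    reference = [sum(chip[k] for chip in sorted_chips) / len(chips) for k in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Two made-up chips with three probes each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 4.5]]))
# [[4.75, 1.5, 3.5], [3.5, 1.5, 4.75]]
```

After normalization each chip holds the same set of values, only in a different probe order.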

Attach design table
The .CEL files generate the Y block (signal matrix), and Array Studio automatically attaches the annotation block (A), but the design block still needs to be attached to the dataset. Array Studio prompts you to attach the Design Table when the data is imported: click Yes, choose Tab delimited file and select dbpts.design.txt. Then rename MicroArrayData to DBPTS (right-click and choose Rename). If you chose No on import, you can always attach the Design Table later by right-clicking the Design node of your dataset in the Project Explorer and choosing Import.

The Solution Explorer
Switch to the Solution Explorer using its tab at the bottom of the Workflow Window (or go to View Menu | Show Solution Explorer). The Solution Explorer organizes all the data and views in your project and lets you keep multiple projects open at a time. Imported microarray/genotyping/Taqman data is organized in the -Omic data section, and generated results are usually shown in the Table data section. Other important sections include the List section (for lists of genes, probesets, etc.), as well as QC, Table and Inference sections (not shown). In Array Studio 3.6, most sections are just “folders” and can easily be changed; the important thing to remember is that there is an -Omic section and a Table section. For each data, table, inference report, etc., the Solution Explorer also maintains its views; notice the Table view under DBPTS. Views can be closed and reopened, and all settings are retained. Try closing the DBPTS\Table view now, then reopen it by double-clicking it in the Project Explorer.

The TableView/View Controller
The TableView shows the microarray data, with columns representing chips and rows representing probesets. The View Controller, found on the right-hand side of Array Studio, is responsible for customizing all views. Switch to the Variable Tab. The Variable Tab and Observation Tab are used to filter data: the Variable Tab filters on the columns of the attached gene annotation, while the Observation Tab filters on the columns of the attached Design Table. Type ^egr1$ into the Gene Symbol filter to restrict the TableView to the gene egr1 (the filter uses regular expressions). Now switch to the Observation Tab and filter treatment to control; notice that the TableView is updated to reflect the filter. Note: right-clicking on treatment offers three types of filter (radio, checkbox and string). Clear the Observation Tab filter by clicking the (All) radio button or selecting Reset All Filters.
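The Variable Tab filter uses regular expressions, so the anchors `^` and `$` decide whether egr1 must be the whole symbol. A small Python illustration with hypothetical probeset IDs and symbols (`Egr1x` is made up to show a partial match); case-insensitive matching is an assumption made here so that `^egr1$` matches the rat symbol `Egr1`.

```python
import re


def filter_by_symbol(annotation, pattern):
    """Return probeset IDs whose gene symbol matches a regular expression.

    annotation maps probeset ID -> gene symbol. re.IGNORECASE is an
    assumption made here so that ^egr1$ matches "Egr1"."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [probe for probe, symbol in annotation.items() if regex.search(symbol)]


# Hypothetical IDs and symbols.
annotation = {"p1": "Egr1", "p2": "Egr2", "p3": "Egr1x"}
print(filter_by_symbol(annotation, "egr1"))    # ['p1', 'p3']: substring match
print(filter_by_symbol(annotation, "^egr1$"))  # ['p1']: whole symbol only
```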

Details Window
In Array Studio, all views are interactive. Selecting a column header or a row header in the TableView brings up details in the Details Window (at the bottom of the screen): the Design Table information for the selected observation (chip), or the gene annotation for the selected variable (probeset). The Details Window lets you look up information about individual probesets, chips, etc. on the fly.

Web Details
Web Details provides on-demand web information about particular variables/probesets. Right-click the selected probeset in the Details Window or the main view window to bring up a list of websites with information about that probeset. Select Entrez and one of the gene identifiers; Internet Explorer should open with the web details. Web Details also gives easy access to Array Server (via Search Variable Profile and Search Variable Data, shown later), and to GeneGo's and Ingenuity's GeneView and Gene Neighborhood functionality.

VariableView
What is the VariableView? It is a highly customizable view designed for high-dimensional data: it trellises automatically by variable, showing the profile of each variable in its own pane. Why does Omicsoft consider the VariableView the most important feature of the software? It is unique; it addresses what most biologists need, namely looking at gene profiles; it is highly optimized; and it has many special features that other views lack, e.g. confidence intervals.

VariableView
To add a new view to the DBPTS dataset, right-click the DBPTS node in the Solution Explorer, click Add View, then select VariableView from the window that appears (alternatively, choose Add View from the toolbar). Scroll through all ~16000 charts, one for each gene. This view can be customized: re-filter the Variable Tab with ^egr1$ so that only one chart is shown, then use the Task Tab of the View Controller to set Title Columns to include Gene Symbol along with the probeset, Profile column to Time, Split column to Treatment, and Transformation to Exp2. Why does the X-axis look strange? What are we looking at? The column type is wrong for time.

Column Type
The VariableView's X-axis shows time on an integer scale, but we would rather show each time point (1, 3, 6, 18 hrs) as a separate factor level. To change this, open the Design node of the dataset in the Solution Explorer and double-click its Table view. Column properties can be edited from Table Menu | Columns | Column Properties (alternatively, right-click the design column in the table view and choose Column Properties). Select the time column, then change Column Type to Factor.

VariableView
Now switch back to the VariableView; notice that the X-axis is now correct. Click the Show Summary Information button in the Task tab of the View Controller. On-the-fly p-values are shown for time (profile column), treatment (split column) and the interaction of the two factors. This should not replace a formal analysis, but it is a quick way to see whether a gene is changing significantly. Click the Change Profile Gallery button in the Task tab, choose Bar as the gallery type, then click the Show Error Bars button. Switch to the Legend tab of the View Controller to see the chart legend. Any chart can be opened in PowerPoint at any time. Reset all Variable Tab filters now.

Variable view: other features
Lasso selection: right-click and drag. Control selection: Ctrl-click to choose multiple points. F10: pop the view out (useful with multiple screens). Open in Excel. Most of these features also apply to other plots.

PairwiseScatterView
The PairwiseScatterView can be used for QC, to compare biological/technical replicates. It shows a chip-to-chip scatter plot for each pair of chips (bottom left of the view), as well as the MA plot for each comparison. Add a new PairwiseScatterView, using the same method as for the VariableView. In the Observation Tab, filter the group column to DBP.t18; the PairwiseScatterView is updated to show only the 3 chips from the DBP treatment at time point 18. Notice that one chip, 22A, appears to correlate less well with the other chips. This is the first indication that it is an outlier chip.
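On log2-scale data, the MA plot for two chips uses M = x - y and A = (x + y) / 2, and a chip that correlates poorly with its replicates is an outlier candidate. The sketch below computes both in plain Python; the chip names and values are hypothetical, and flagging the chip with the lowest mean Pearson correlation is an illustrative heuristic, not Array Studio's QC rule.

```python
import math


def ma_values(x, y):
    """(M, A) pairs for two chips on the log2 scale: M = x - y, A = (x + y) / 2."""
    return [(a - b, (a + b) / 2) for a, b in zip(x, y)]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def least_consistent_chip(chips):
    """Name of the chip with the lowest mean correlation to the other chips."""
    def mean_r(name):
        rs = [pearson(chips[name], v) for k, v in chips.items() if k != name]
        return sum(rs) / len(rs)
    return min(chips, key=mean_r)


# Made-up log2 values for three replicate chips.
chips = {
    "chipA": [2.0, 4.0, 6.0, 8.0],
    "chipB": [2.1, 4.0, 6.2, 7.9],
    "chipC": [5.0, 3.0, 7.0, 4.0],
}
print(least_consistent_chip(chips))  # chipC
```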

Principal Component Analysis (2 components)
Choose Principal Component Analysis from the Quality Control section of the microarray workflow (alternatively, choose Microarray | QC | Principal Component Analysis from the menu). Make sure that Demonstration is selected as the project and DBPTS as the data. Ensure that 2 components are generated, that group is selected for Group, and that Calculate Hotelling T2 is selected. Click Submit.

Principal Component Analysis (2 components)
A PCA plot with two components is generated, colored automatically by the Group setting; the legend is available in the Legend tab. Customize the chart using Change Symbol Properties: set Labels to All and By to chip. The updated chart indicates that chip 22A appears to be an outlier. Select chip 22A: the point turns red and the Details Window shows its information. Click Exclude Selection in the Task tab of the View Controller. This re-runs the PCA and creates a list, DBPTS.Observation23, containing the 23 “good” chips; this list will be used for further analysis.
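Hotelling T2 measures how far each observation lies from the center of the PCA score space, scaling each component by its variance: T2_i = sum over k of t_ik^2 / var_k. A plain-Python sketch of that textbook definition, assuming mean-centered scores as input; the scores are made up, no control limit is computed, and Array Studio's exact computation may differ.

```python
from statistics import variance


def hotelling_t2(scores):
    """Hotelling T2 per observation from PCA scores.

    scores[i][k] is the (mean-centered) score of observation i on
    component k; each component is scaled by its sample variance."""
    n_comp = len(scores[0])
    var = [variance([row[k] for row in scores]) for k in range(n_comp)]
    return [sum(row[k] ** 2 / var[k] for k in range(n_comp)) for row in scores]


# Made-up 2-component scores for four chips; the first lies far out on PC1.
scores = [[4.0, 0.0], [-1.0, 0.0], [-1.0, 1.0], [-2.0, -1.0]]
t2 = hotelling_t2(scores)
print(max(range(len(t2)), key=t2.__getitem__))  # 0
```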

Lists
What is a list in Array Studio? A flat list of probesets, chips, genes, etc. Lists can be reused in other projects, used to filter, and used to limit the analysis when running analysis modules. There are variable lists, observation lists, and row, column and general lists; Array Studio is smart and only shows the lists relevant to the current context.

Principal Component Analysis (3-D)
Choose Principal Component Analysis from the Quality Control section of the microarray workflow (alternatively, choose Microarray | QC | Principal Component Analysis from the menu). Make sure that Demonstration is selected as the project and DBPTS as the data. Ensure that 3 components are generated and that group is selected for Group. Click Submit.

Principal Component Analysis (3-D)
A fully interactive 3-D PCA plot is returned, with trackball, panning/zooming and selection tools for interacting with the graph. It works the same way as the 2-D plot (changing coloring, excluding selection, etc.).

Differential Expression/Two-Way ANOVA
In the workflow, select Two-Way ANOVA from the Statistical Inference section. Set Data to DBPTS. Ensure that all variables are selected, but use the list DBPTS.Observation23 for observations. The experiment has 4 time points, with a treatment and a control at each, so contrasts should compare the treatment (DBP) to control at each time point. To set up the comparisons, read from top to bottom: for each time, Compare to control creates the 4 comparisons. Other options include generating F-test p-values (time, treatment, time*treatment), generating LSMean data, appending LSMean data to the inference report, and generating estimate data. Click Submit to run the module.
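With treatment, time and their interaction in the model, each DBP vs control contrast at a given time reduces to a difference of cell means (the LSMeans) tested against the pooled within-cell variance. The sketch below shows that arithmetic for one probeset; the function name, data layout and values are hypothetical, and it stops at the t statistic because a p-value needs a t-distribution CDF (with N minus the number of cells degrees of freedom), which the Python standard library lacks.

```python
import math
from statistics import mean


def treatment_vs_control(cells, times, treated="DBP", control="control"):
    """Contrast estimate and t statistic for treated - control at each time.

    cells maps (treatment, time) -> log2 values of one probeset. In the
    full treatment*time model the LSMeans are the cell means, and the
    error variance is pooled over all cells."""
    ss_within = 0.0
    for vals in cells.values():
        m = mean(vals)
        ss_within += sum((v - m) ** 2 for v in vals)
    df_within = sum(len(vals) for vals in cells.values()) - len(cells)
    mse = ss_within / df_within
    results = {}
    for t in times:
        a, b = cells[(treated, t)], cells[(control, t)]
        estimate = mean(a) - mean(b)  # log2 fold change
        se = math.sqrt(mse * (1 / len(a) + 1 / len(b)))
        results[t] = (estimate, estimate / se)
    return results


# Made-up values: 2 time points x 2 treatments, 2 chips per cell.
cells = {
    ("DBP", 1): [9.0, 9.2], ("control", 1): [8.0, 8.2],
    ("DBP", 3): [8.5, 8.7], ("control", 3): [8.4, 8.6],
}
for t, (est, tstat) in treatment_vs_control(cells, times=[1, 3]).items():
    print(t, round(est, 3), round(tstat, 2))
```

Because the error term is pooled over all cells, an unbalanced design (such as the 23 retained chips) still uses every chip to estimate the variance.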

General Linear Model
Demonstration of the General Linear Model module. Two-way ANOVA gives equivalent results, but the General Linear Model provides much more power and flexibility.

Results of Statistical Inference
The Two-Way ANOVA generates a table called DBPTS.Tests in the Inference folder of the Tables section, with two generated views: Report and Volcano. In addition, a list was generated for each comparison using its alpha level (p-value cutoff), and a 5th list contains all the probesets significant in the Two-Way ANOVA. Note: because a multiplicity adjustment was set in the Two-Way ANOVA window, the lists are generated from the adjusted p-value column.
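The slide does not say which multiplicity adjustment is used. As one common choice, here is a plain-Python sketch of the Benjamini-Hochberg step-up adjustment (an assumption, not necessarily Array Studio's default); a per-comparison list would then be the probesets whose adjusted p-value is below alpha. Probeset IDs and p-values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four probesets in one comparison.
probes = ["p1", "p2", "p3", "p4"]
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
alpha = 0.05
print([p for p, q in zip(probes, adj) if q < alpha])  # ['p1']
```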

Volcano plots
Volcano plots give a good overview of the modeling results. Array Studio automatically sets the plot layout to fit as much information as possible on one screen; for this data a 2*2 layout is used (2 rows, 2 columns). All the plots are linked (both hot track and selection). A uniform scale could be more informative, and details on demand can be useful: select a probeset in the top right corner of the 1 DBP vs Control plot and notice that the Details Window shows its gene annotation, p-values, estimates, etc. If you do not see anything on the volcano plot, reset your filter.

Table reports
The volcano plot is one way to view the modeling results; the table view is another (so is the chromosome view). A table with everything is usually too big to explore, so filtering is essential. To view the table report, double-click the table view generated by the modeling process. Use Group By Mode to arrange the filters so that all the raw p-values are grouped together (likewise the adjusted p-values, estimates, etc.). Create a list of probesets significant in all three of these comparisons: filter 1 DBP vs. Control.RawPValue < 0.05, 3 DBP vs. Control.RawPValue < 0.05 and 6 DBP vs. Control.RawPValue < 0.05. The final number should be 78 rows. Click Add Item, then Add List From Visible Rows, and choose Probe Set ID as the List Source.
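Stacking one filter per p-value column keeps only the rows that pass all of them, and Add List From Visible Rows then records their IDs. A minimal Python analogue, assuming rows are dictionaries keyed by the column names used in the exercise; the function name and the two example rows are hypothetical.

```python
def significant_in_all(rows, pvalue_columns, alpha=0.05, id_column="Probe Set ID"):
    """IDs of rows whose p-value is below alpha in every listed column,
    i.e. the visible rows after stacking one filter per column."""
    return [row[id_column] for row in rows
            if all(row[col] < alpha for col in pvalue_columns)]


columns = [f"{t} DBP vs. Control.RawPValue" for t in (1, 3, 6)]
# Two made-up table rows.
rows = [
    {"Probe Set ID": "p1", columns[0]: 0.010, columns[1]: 0.020, columns[2]: 0.030},
    {"Probe Set ID": "p2", columns[0]: 0.010, columns[1]: 0.200, columns[2]: 0.001},
]
print(significant_in_all(rows, columns))  # ['p1']
```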

Broadcasting
What if you've filtered one dataset and want to look at the filtered results in other open tables or datasets? Options: create a list, then filter the other dataset by that list; or broadcast the results to all the other open datasets. Cross-platform broadcasting uses Array Server to map to a “master ID” and then “broadcasts” to the other platforms; use it when looking at multiple platforms (or species). Broadcast your results now using Current Filter -> Filter all Opened Views, then return to the previously created VariableView.

Venn diagram view
Generate a Venn diagram view: right-click Solution Explorer | Data | DBPTS | Views and choose Add View, then choose VennDiagramView from the list. Select three of your lists in the Solution Explorer and drag and drop them into the view. Advanced features: change the title of the plot. The Venn diagram is also interactive. Hint: to compare more than 4 lists, use the Compare Lists feature. Remember that the 3 lists were generated from adjusted p-values, so the number of probesets common to all three lists is not expected to match the filtered list created earlier from raw p-values.
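A Venn diagram partitions items by exactly which lists contain them. The plain-Python sketch below computes those exclusive regions from sets; the list names and probeset IDs are placeholders, not results from this dataset.

```python
from itertools import combinations


def venn_regions(lists):
    """Exclusive Venn regions: each item appears once, in the region keyed
    by exactly the names of the lists that contain it."""
    names = list(lists)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(lists[n] for n in combo))
            outside = set().union(*(lists[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


# Placeholder list names and probeset IDs.
lists = {"1h": {"a", "b", "c"}, "3h": {"b", "c", "d"}, "6h": {"c", "e"}}
regions = venn_regions(lists)
print(regions[("1h", "3h", "6h")])  # {'c'}
```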

Summarize Inference Report
The Summarize Inference Report counts the number of variables meeting given criteria. Go to Summarize Inference Report under Statistical Inference in the microarray workflow (alternatively, go to Microarray Menu | Inference | Summarize Inference Report). Select DBPTS.Tests, all variables and all 4 estimates. In the Options section, build the conditions: use Raw Pvalue < 0.05 in every condition, and make one condition each for FC > 2, FC > 3, FC < -2 and FC < -3. Make sure to name each condition. A table is generated giving a count for each condition/estimate; notice the interactivity of the table.
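Counting variables per condition is a filter-and-count over the inference table. This sketch assumes the estimates are log2 differences and that FC is a signed fold change (2^estimate when up, -2^(-estimate) when down); that convention, the field names and the rows are assumptions for illustration, not Array Studio's documented behavior.

```python
def signed_fold_change(log2_estimate):
    """Signed fold change (assumed convention): 1 -> 2.0, -1 -> -2.0, 0 -> 1.0."""
    if log2_estimate >= 0:
        return 2.0 ** log2_estimate
    return -(2.0 ** -log2_estimate)


def summarize(rows, conditions):
    """Count rows meeting each named condition(raw_p, fold_change)."""
    return {
        name: sum(1 for r in rows if test(r["raw_p"], signed_fold_change(r["estimate"])))
        for name, test in conditions.items()
    }


conditions = {
    "Up FC>2": lambda p, fc: p < 0.05 and fc > 2,
    "Up FC>3": lambda p, fc: p < 0.05 and fc > 3,
    "Down FC<-2": lambda p, fc: p < 0.05 and fc < -2,
    "Down FC<-3": lambda p, fc: p < 0.05 and fc < -3,
}
# Made-up rows for one estimate column (log2 estimate and raw p-value).
rows = [
    {"estimate": 1.8, "raw_p": 0.01},
    {"estimate": -1.2, "raw_p": 0.02},
    {"estimate": 2.0, "raw_p": 0.20},
    {"estimate": -1.7, "raw_p": 0.001},
]
print(summarize(rows, conditions))
# {'Up FC>2': 1, 'Up FC>3': 1, 'Down FC<-2': 2, 'Down FC<-3': 1}
```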

Hierarchical clustering
Select Hierarchical clustering from the Pattern Recognition section of the microarray workflow (alternatively, choose Microarray Menu | Pattern | Hierarchical clustering). Make sure DBPTS is the data to be analyzed. Select 18 DBP vs control.Sig379 as the working variable set and DBPTS.Observation23 as the working observation set. Check Compute variable tree and Generate classic dendrogram view, then click the Submit button.
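The slide does not state the distance or linkage Array Studio uses. Assuming 1 - Pearson correlation with average linkage (a common choice for expression profiles), agglomerative clustering can be sketched in plain Python as follows; the profiles are made up, the cubic-time loop is for illustration only, and a constant profile would cause a division by zero in the correlation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r and average
    linkage; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = {name: [name] for name in profiles}

    def dist(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(1 - pearson(profiles[i], profiles[j]) for i, j in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((a, b, dist(a, b)))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Made-up profiles: g1/g2 rise together, g3/g4 fall together.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0], "g4": [8.0, 6.0, 4.0, 2.5],
}
for a, b, d in average_linkage(profiles):
    print(a, b, round(d, 3))
```

The merge history is what a dendrogram draws: each merge becomes a branch point at height equal to its distance.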

Dendrogram
The dendrogram interacts with the heatmap table view. Try the following: adjust thumbnail width; adjust thumbnail cell sizes; fit thumbnails into the window; change color properties; select branches; select thumbnail blocks; change color bars; adjust heatmap cell sizes; specify annotation columns (select Gene Symbol).

Classic Dendrogram Not as interactive as the other view, but provides a “flat structure”. Similar options for changing colors and labels. Hint: Right-clicking in the legend allows changing of colors (applies to all views).

Molecular signatures analysis
Uses the molecular signatures database to find enriched pathways and functions. Choose Microarray Menu | Annotation | Molecular Signatures, then choose the 18 DBP vs Control list. Choose Rat as the organism, and map by the annotation column Gene Symbol. Click Submit.

Molecular signatures analysis
Returns a table of gene sets with p-values. Sort by raw or adjusted p-value, and click the links for regulated gene sets. Alternatively, use Microarray Menu | Geneset Enrichment Analysis for the “classical” version of GSEA.
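The slide does not name the enrichment statistic. Over-representation of a gene list in a gene set is commonly tested with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below as an assumption rather than Array Studio's documented method; the counts in the example are made up.

```python
from math import comb


def enrichment_pvalue(k, n_list, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: genes in the gene set,
    n_list: genes in the query list, k: list genes that are in the set."""
    total = comb(N, n_list)
    hits = sum(comb(K, i) * comb(N - K, n_list - i) for i in range(k, min(K, n_list) + 1))
    return hits / total


# Made-up counts: 4 of 20 background genes in the list, 5 in the set, 3 overlap.
print(enrichment_pvalue(k=3, n_list=4, K=5, N=20))
```

Exact integer arithmetic via `math.comb` avoids rounding problems in the tail sum; only the final division produces a float.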

Find neighbors
Select Find Neighbors from the Pattern Recognition section of the Microarray workflow (alternatively, choose Microarray | Pattern | Find Neighbors). Use DBPTS.Observation23 as the working observation set. Find neighbors for _at (if this probeset is the first selected probeset, it is filled in automatically). Change Fixed neighbor number to 20. Customize the neighbor view (reset the filter if necessary): hide X-axis labels, sort the heatmap columns, add sample color bars, change the Y-axis label to gene symbol, and add mean/median values.
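The similarity measure behind Find Neighbors is not stated on the slide. Assuming Pearson correlation across the working observations, the idea reduces to ranking all other probesets by correlation with the query profile and keeping the top n (the Fixed neighbor number); the IDs and values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def find_neighbors(query_id, profiles, n=20):
    """Top-n probesets ranked by Pearson correlation with the query profile."""
    query = profiles[query_id]
    scored = [(pid, pearson(query, prof)) for pid, prof in profiles.items() if pid != query_id]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Hypothetical probeset profiles across four chips.
profiles = {
    "q": [1.0, 2.0, 3.0, 4.0],
    "p1": [2.0, 4.0, 6.0, 8.0],
    "p2": [1.0, 2.0, 3.0, 5.0],
    "p3": [4.0, 3.0, 2.0, 1.0],
}
print([pid for pid, _ in find_neighbors("q", profiles, n=2)])  # ['p1', 'p2']
```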

GeneGo/Ingenuity
Requires access to both systems. Right-clicking a probeset provides GeneView access to Ingenuity and GeneGo. The Microarray menu provides access to GeneGo MetaCore: Upload Data (uploads data with fold changes and p-values for analysis in MetaCore). The Microarray menu also provides access to Ingenuity: Search, View Canonical Pathway, Create New Pathway and Upload Data (uploads data with fold changes and p-values for analysis in Ingenuity).

Audit trail
Launched from File Menu | Audit trail. The audit trail is owned by a project, not by a specific data entry. Source, on the other hand, is owned by a specific data entry and describes how that data was generated. OmicScript can be used to regenerate the results.

PART III Array Server features

Array Server
Array Server contains ~2000 fully analyzed projects (including p-values, fold changes, etc.) from GEO and ArrayExpress (note: soon to be 5000 projects, almost all public Affymetrix projects). Publishing BMS data to the server allows internal data to be integrated with public data. The server is useful for sharing data between colleagues using the same views as in Array Studio, and it is integrated with Array Studio through the Local Analysis and Server Analysis tabs.

Project Publishing

Search project list

Ranked variable List

List Analysis
Take a list from your project and find other projects on the server that have similar results (over-representation of that list of genes).

Interaction between Local Analysis and Server Explorer.
Right-clicking a list allows: Search Profile; Upload to Server (saves the list to the server for quick access later); Server List Analysis (demonstrated previously).

Interaction between Local Analysis and Server Explorer.
Right-clicking a probeset in a table view or the Details view allows Search Variable Profile and Search Variable Data. Right-clicking the solution in the Solution Explorer allows adding a project from the server directly to the Solution Explorer for further analysis.

Omicsoft’s Philosophy
If there is something that we do not provide in Array Studio, please ask. We are always adding new features based on customer feedback, so if there is something you'd like to see, or something you'd like to see done better, send us a message or give us a call. We always appreciate your feedback.

Resources
Help Menu | Tutorials: we highly recommend going through the Microarray Tutorial yourself. Individual analysis modules have Help buttons. Frequently Asked Questions section. Omicsoft Support is always available and willing to help you, including remote support (i.e. screen sharing) ( OMIC) or
