1
MSCL Analyst’s Toolbox
Instructors: Jennifer Barb, Zoila G. Rangel, Peter Munson
January 2008
Mathematical and Statistical Computing Laboratory
Division of Computational Bioscience
2
Course Outline
Day 1
MSCL Analyst’s Toolbox and JMP™ overview
MSCL Toolbox Concepts
JMP™ fundamentals
Lunch
Affymetrix Expression Console™: processing .cel files, exporting data
MSCL Toolbox Demo
– Data input
– Basic Analysis (Master File, Final File, data normalization, QC, PCA)
– Gene selection, statistical tests (p-values, FDR)
– Annotation
Day 2
Statistical Topics (PCA, data normalization, FDR)
MSCL Analyst’s Toolbox Demo (cont.)
– Complex Analysis (2-way ANOVA, blocked ANOVA)
– Data Visualization
3
Topics not included
Exon Array Analysis (coming soon!)
SNP chip
Resequencing analysis, ChIP-Chip, copy number
2-color or spotted cDNA array analysis
Complete JMP tutorial
JMP on Mac, Linux
JMP scripting language
Data management commands in JMP: Stack, Split, Concatenate, Sort
4
Why use JMP?
Interactive graphics facilitate data exploration and discovery of features
Powerful: > 2,00,000 rows by 100s of columns (currently, 2 GB limit)
Scripting language: object oriented, allows matrix manipulation
Connects to database servers, including NIHLIMS or local GCOS
JMP is also a general-purpose statistics package
Good technical support for JMP: (919) 677-8008 or www.jmp.com
No direct cost to individual NIH users* (centrally supported in most NIH ICs)
MSCL Analyst's Toolbox is FREE and adds tools for microarray studies
6
MSCL Analyst’s Toolbox Features
Menu driven
Automated gene annotations
Web link-out**
Highly interactive, intuitive user interface
Analysis pipeline, based on years of experience
Familiar parametric analysis, e.g. ANOVA
Exploratory Data Analysis
Adaptable to new designs and analyses (e.g. Exon chips, SNP chips)
Powerful: handles the largest Affy chips, probe-level analysis
Up to hundreds of chips at once
PC, Mac or Linux desktops
Support available through MSCL
7
MSCL Analyst’s Toolbox Capabilities
Connects to the central NIHLIMS database or local GCOS databases
Reads in Pivot Tables from Affymetrix EC™ or GCOS™
Visualizes Principal Components
Analyzes simple experiments (paired, unpaired t-tests)
Analyzes complex experiments (multiple treatments, time series, linear trends, slope changes between treatments)
Compensates for “batch” effects
Selects and annotates significant genes
Manages multiple gene lists (intersection, union, Venn diagrams)
Multivariate, Cluster, Discriminant, Neural net analysis
Uses dynamic visualization tools
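The gene-list operations named above (intersection, union, Venn diagram regions) can be illustrated with Python sets. This is only a sketch of the idea, not the Toolbox's implementation; the gene names and list names are hypothetical placeholders.

```python
# Two hypothetical gene lists, e.g. significant genes from two comparisons.
list_a = {"GENE_A", "GENE_B", "GENE_C"}
list_b = {"GENE_B", "GENE_C", "GENE_D"}

common = list_a & list_b   # intersection: genes on both lists
either = list_a | list_b   # union: genes on at least one list
only_a = list_a - list_b   # genes unique to list A
only_b = list_b - list_a   # genes unique to list B

# Counts for the three regions of a two-set Venn diagram.
venn_counts = {"A only": len(only_a), "A and B": len(common), "B only": len(only_b)}
print(venn_counts)
```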
8
How to obtain:
JMP
– http://isdp.cit.nih.gov/downloads/stats.asp
– Find your desktop support person at http://isdp.cit.nih.gov/information/contact_lookup_nih.asp
– JMP technical support: (919) 677-8008
The MSCL Analyst's Toolbox
– Download from http://affylims.cit.nih.gov
– Help offered on a collaborative basis by MSCL
– Email questions to: munson@helix.nih.gov
9
NIH Bioinformatics Cooperative
http://affylims.cit.nih.gov
10
MSCL Toolbox Data Pipeline:
Input files or fetch data
Transform and normalize
Principal Components Analysis
Create Master file, add treatment groups
Compute statistical test, get p-values
Correct for multiple comparisons or use False Discovery Rate
Compute log fold-change
Visualize results
Select relevant genes
[Figure: pipeline stages labeled files, Xform, PCA, Master, Final]
11
Data sources:
NIHLIMS database via ODBC connection
Local GCOS database via ODBC connection
GCOS pivot table
EC pivot table (NEW support for this option)
Excel spreadsheet
Text files
12
Data Input or data fetch
[Figure: data-flow diagram of the NIHLIMS database. Instruments: Fluidics Platform, Scanner. File types: .dat, .cel, .chp, .rpt files. Databases: Process DB, MSCL Publish DB, DCEG/NCI Publish DB, CCMD Publish DB. Operations: Analyze (MAS), Publish (MAS), Import (LM), Export (LM), archive (LM), delete (LM), assume ownership (LM), Import, ODBC access. Client side: client files, client workstation, DMT, Partek, GeneSpring, A-SCAN, EC™ or GCOS™, MAS5™ .txt]
13
Gene Expression Data Matrix
[Figure: expression matrix of 20,000 genes (numbered 1 to 20,000) by 16 samples (numbered 1 to 16), with gene annotations alongside the genes and sample information alongside the samples]
14
Annotations for each gene
Probe Set ID
Genbank ID
Unigene ID, Title
Entrez Gene ID
Cytogenetic map location
Physical map location
HUGO gene symbol, synonyms
Functional relevance
Associated literature references...
GO terms for molecular function, biological process or cellular component
[Figure: gene annotations table, genes 1 to 20,000]
15
Annotation Files:
Affymetrix annotations for each probe set have been downloaded and formatted for MSCL Toolbox, available at affylims.cit.nih.gov
Annotations are updated quarterly
Annotation tables may be JOINed by ProbeSetID
Annotation columns: Probe Set ID, Gene Title, Gene Symbol, UnigeneID, Transcript ID, Ensembl, Entrez Gene, Representative Public ID, First SwissProt, Genome Alignment Chromosome, Genome Alignment Start Address, Genome Alignment Stop Address, Genome Alignment Strand, Chromosomal Location
[Figure labels: FinalAnnot., Final-Annot]
16
Annotating Genes
[Figure: your data file is “JOINed” on ProbeSetID with the reformatted NetAffx annotation table]
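The JOIN on ProbeSetID can be sketched outside JMP as a keyed lookup. The example below is a minimal illustration in plain Python, not the Toolbox's own code; the probe set IDs, gene symbols, and signal values are hypothetical placeholders, and a left join (keep every data row) is an assumption about the desired behavior.

```python
# Hypothetical annotation table keyed by probe set ID (placeholder values).
annotations = {
    "PS_0001": {"Gene Symbol": "GENE_A", "Chromosomal Location": "chr1"},
    "PS_0002": {"Gene Symbol": "GENE_B", "Chromosomal Location": "chr7"},
}

# Hypothetical data rows, one per probe set.
data_rows = [
    {"ProbeSetID": "PS_0001", "SG-33NH": 812.0},
    {"ProbeSetID": "PS_0002", "SG-33NH": 95.5},
    {"ProbeSetID": "PS_0003", "SG-33NH": 40.2},  # no annotation available
]

def join_on_probe_set_id(rows, annot):
    """Left join: keep every data row and add annotation columns when present."""
    joined = []
    for row in rows:
        extra = annot.get(row["ProbeSetID"], {})
        joined.append({**row, **extra})
    return joined

annotated = join_on_probe_set_id(data_rows, annotations)
for row in annotated:
    print(row)
```

Rows without a matching annotation are kept unchanged, so no expression data is lost when the annotation table is incomplete.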
17
Information about the Sample (transposed into MasterFile)
[Figure: table of sample information, samples 1 to 16]
Information about each Sample
Clinical information (human)
Diagnosis
Demographic information
Treatment (in vivo, in vitro) in designed experiment
Tissue of origin
Cell culture, strain, passage
Sampling date/time
RNA preparation protocol
Operator/batch/lot/laboratory information
QC information (rawQ, scale factor, 3/5-actin, 3/5-GAPDH, etc.)
18
Table formats
JMP usually deals with a single Table, but…
TWO tables are needed for MSCL Analyst’s Toolbox:
1. "Master File" layout
– Each ROW represents a chip
– Columns define treatment, replicate number, etc.
2. "Final" layout
– COLUMNs correspond to chips (rows in Master File)
– Each ROW is a probe set; the unique identifier is the probe set ID
Tables are LINKED by the “Shortnames” field in the Master File
19
Linked Table Formats
Master File: one row per chip
Final File: one row per probe set
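The link between the two tables can be sketched as follows. This is an illustration in plain Python, not Toolbox code; the sample names, treatments, and values are hypothetical, and it assumes that each Final File data column is named as a prefix plus the chip's Shortname.

```python
# Master File: one row per chip (hypothetical short names and treatments).
master = [
    {"Shortnames": "33NH", "Treatment": "control", "Replicate": 1},
    {"Shortnames": "33TH", "Treatment": "treated", "Replicate": 1},
]

# Final File: one row per probe set; each data column corresponds to a chip.
final = [
    {"ProbeSetID": "PS_0001", "SG-33NH": 812.0, "SG-33TH": 1650.0},
    {"ProbeSetID": "PS_0002", "SG-33NH": 95.5, "SG-33TH": 90.1},
]

def columns_for_treatment(master_rows, treatment, prefix="SG-"):
    """Return Final File column names for the chips in one treatment group."""
    return [prefix + r["Shortnames"] for r in master_rows
            if r["Treatment"] == treatment]

def values_for_treatment(final_row, master_rows, treatment, prefix="SG-"):
    """Pull one probe set's values for every chip in a treatment group."""
    return [final_row[c] for c in columns_for_treatment(master_rows, treatment, prefix)]

print(columns_for_treatment(master, "treated"))
print(values_for_treatment(final[0], master, "treated"))
```

Keeping the experimental design in the Master File means a treatment group can be redefined without touching the much larger Final File.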
20
Naming Convention for Final File Columns (prefixes)
Data type: AD-, SG-, PA-
Data transform: L-, Lmed-, GL-, S10-
Statistical results: p-, FDR-, mean-, SFC-
Column Naming Tips:
– Avoid punctuation (hyphen, period, slash, etc.)
– Avoid spaces, use underscore “_” instead
– Shorter is better
– Toolbox utility available for trimming column names
Example column names: ITEM_NAME, SG-33NH, SG-33TH, S10-33NH, S10-33TH, PA-33NH, PA-33TH, SFC-7, SFC-11, p-slope¢2, FDR slope¢2
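The prefix convention and naming tips can be sketched as two small helpers. This is a hypothetical illustration, not the Toolbox's trimming utility; the function names and the exact cleaning rule (replace anything other than letters, digits, and underscore) are assumptions.

```python
import re

# Prefixes listed on the slide, grouped by what they describe.
PREFIX_KINDS = {
    "AD-": "data type", "SG-": "data type", "PA-": "data type",
    "L-": "data transform", "Lmed-": "data transform",
    "GL-": "data transform", "S10-": "data transform",
    "p-": "statistical result", "FDR-": "statistical result",
    "mean-": "statistical result", "SFC-": "statistical result",
}

def classify_column(name):
    """Return (kind, prefix, rest) for a prefixed column, or None otherwise."""
    for prefix, kind in PREFIX_KINDS.items():
        if name.startswith(prefix):
            return kind, prefix, name[len(prefix):]
    return None

def clean_sample_name(raw):
    """Replace spaces and punctuation with underscores, per the naming tips."""
    return re.sub(r"[^A-Za-z0-9_]+", "_", raw).strip("_")

print(classify_column("S10-33NH"))
print(clean_sample_name("33 NH.rep/1"))
```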
22
Data Transformation and Normalization
23
Log(x/median x) transform (“Lmed”)
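A minimal sketch of the Lmed transform follows. Two points are assumptions rather than statements from the slides: the median is taken over all probe sets on one chip, and the logarithm is base 10. The input values are made up.

```python
import math
import statistics

def lmed_transform(signals):
    """Return log10(x / median(x)) for each signal on a single chip."""
    if any(x <= 0 for x in signals):
        raise ValueError("signals must be positive before taking logarithms")
    med = statistics.median(signals)
    return [math.log10(x / med) for x in signals]

chip = [10.0, 100.0, 1000.0]   # hypothetical signal values for one chip
lmed = lmed_transform(chip)
print(lmed)
```

After the transform the median probe set on each chip sits at zero, which puts chips with different overall brightness on a common scale.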
25
Principal Components Analysis
[Figure: scatter plot with axes PC 1 (38%) and PC 2 (12%)]
27
Analysis Scripts
ANOVA1
T-test, unequal variance
Paired t-test
Consistency test
ANOVA1 with blocking
ANOVA2 with interaction terms (unbalanced data allowed)
ANOVA2 with blocking
Linear regression
ANCOVA with blocking (balanced data case)
ANCOVA2 with blocking (balanced data case)
Other tests are easily added (requires scripting)
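As an illustration of one test on this list, the unequal-variance t-test can be sketched with Welch's statistic and the Welch-Satterthwaite degrees of freedom. This is a plain-Python sketch, not the Toolbox script; the function name and the data are hypothetical. Converting t and df to a p-value needs the t distribution, which the Python standard library does not provide, so that step is omitted here.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb                                  # squared standard error
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

control = [1.0, 2.0, 3.0]   # hypothetical log-scale values for one probe set
treated = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(control, treated)
print(t_stat, dof)
```

In a microarray analysis the same calculation is repeated for every probe set, which is why the resulting p-values need a multiple-comparisons correction afterwards.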
29
Log(FoldChange) = “LFC”
FoldChange = treated / control
Log(FoldChange) = Log(treated / control) = Log(treated) - Log(control)
Rule of thumb for base-10 logarithms:
Log10(2-fold change) ≈ 0.3
Log10(10-fold change) = 1
Log10(0.1-fold change) = -1
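The rule of thumb can be checked numerically with a short sketch. The function name and the signal values are illustrative only.

```python
import math

def log_fold_change(treated, control):
    """Base-10 log fold-change: Log10(treated) - Log10(control)."""
    return math.log10(treated) - math.log10(control)

# Hypothetical signal values chosen to give 2-fold, 10-fold and 0.1-fold changes.
for treated, control in [(200.0, 100.0), (1000.0, 100.0), (10.0, 100.0)]:
    print(treated / control, round(log_fold_change(treated, control), 3))
```

Because the logarithm turns ratios into differences, a doubling and a halving are the same distance from zero (about +0.3 and -0.3), which makes up- and down-regulation symmetric on a plot.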
31
Volcano Plot
[Figure: volcano plot. Vertical axis: significance of change. Horizontal axis: magnitude of change, log scale. Selection regions are marked.]
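Selecting genes from the regions of a volcano plot can be sketched as a pair of thresholds. Everything specific here is an assumption: significance is plotted as -log10(p), magnitude is the base-10 log fold-change, and the cutoffs (|LFC| of at least 0.3, p of at most 0.01) and gene values are hypothetical.

```python
import math

def volcano_point(lfc, p):
    """Map one gene to plot coordinates (magnitude, significance)."""
    return lfc, -math.log10(p)

def select_regions(genes, min_abs_lfc=0.3, max_p=0.01):
    """Split genes into up- and down-regulated selection regions."""
    up, down = [], []
    for name, lfc, p in genes:
        if p > max_p:
            continue                      # not significant enough
        if lfc >= min_abs_lfc:
            up.append(name)               # upper-right region
        elif lfc <= -min_abs_lfc:
            down.append(name)             # upper-left region
    return up, down

genes = [                                 # (name, log fold-change, p-value)
    ("GENE_A", 0.5, 0.001),
    ("GENE_B", -0.6, 0.005),
    ("GENE_C", 0.05, 0.0001),             # significant but small change
    ("GENE_D", 0.8, 0.2),                 # large change but not significant
]
up, down = select_regions(genes)
print(up, down)
```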
32
Interpreting Gene Lists
[Figure: workflow from FinalAnnot. through Filter (FDR < 10%) to GeneList, then to Significant Terms; tools shown: Ingenuity™, GeneGo™]
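The FDR < 10% filter can be sketched with the Benjamini-Hochberg procedure. The slides do not say which FDR method the Toolbox uses, so Benjamini-Hochberg is an assumption here, and the gene names and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]   # hypothetical genes
pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(pvals)
gene_list = [n for n, q in zip(names, fdr) if q < 0.10]
print(fdr)
print(gene_list)
```

An FDR cutoff of 10% means that roughly one in ten genes on the resulting list is expected to be a false positive, which is a looser but often more useful criterion than controlling the chance of any false positive at all.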
33
GO-SCAN: Gene Ontology Annotations
Gene Ontology for Significant Collection of Annotations: GO-SCAN is a bioinformatics tool that selects and presents relevant Gene Ontology (GO) annotations for a gene "hit" list from an Affymetrix microarray experiment.
http://goscan.cit.nih.gov/
34
Ingenuity Pathway Analysis (Doug Joubert, NIH Library)