June 2009 1 Detecting Alternative Splicing using the Human Affymetrix Exon Array 1.0 Instructors: Jennifer Barb, Zoila Rangel, Peter Munson June 15, 2009.

1 June 2009 Detecting Alternative Splicing using the Human Affymetrix Exon Array 1.0
Instructors: Jennifer Barb, Zoila Rangel, Peter Munson
June 15, 2009
Mathematical and Statistical Computing Laboratory, Division of Computational Biosciences

2 June 2009 Background

3 June 2009 Gene structure Source: http://genome.wellcome.ac.uk/doc_WTD020755.html

4 June 2009 Alternative splicing
An estimated 40-60% of human genes are alternatively spliced (AS)
AS increases mRNA and protein diversity
~20,000 genes give rise to more than 100,000 different functioning proteins because of AS
AS events account for the disparity between the number of human genes and the number of human expressed sequences (mRNAs, i.e. transcript isoforms)

5 June 2009 Classic AS example of tissue specific splicing Source: http://genetics.hannam.ac.kr/note/Processing%20of%20hnRNAs.htm

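6 June 2009 Different types of AS events
[Figure: diagram of AS event types. Annotations: one event type accounts for 1/3 of all cases; two event types together comprise 1/4 of all cases; the last 4 event types represent a minority of all AS cases.]
Source: B.J. Blencowe. Alternative splicing: new insights from global analyses. Cell, 126: 37-47, Jul 2006.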

7 June 2009 Screening for alternative splicing using the exon array
Investigate changes in gene expression at the isoform level
Identify novel AS events and estimate their occurrence in different applications:
–Tissue types
–Disease states
–Response to treatment
–Knock out gene models
–Mammalian development
–Many more

8 June 2009 Types of high-throughput screening for AS
1. Next Generation Sequencing (not covered today) – builds on the idea of serial analysis of gene expression (SAGE)
Thorough measurement of a nucleic acid profile, generating huge numbers of short sequencing reads
   1. RNA-Seq
   2. ChIP-Seq
   3. Methyl-Seq
2. Exon Microarrays – sequence must be known prior to study
Exon-based probes interrogate known exons within a gene
Exon splice junction probes interrogate exon-exon splice junctions and test for exon skipping

9 June 2009 Detecting alternative splicing using exon microarrays
ExonHit Human GW spliceArray on Affymetrix platform
–Similar to Affy Exon array except that it has splice junction probes
Affymetrix Human Gene 1.0 ST array
–Expression array offering whole transcript coverage
–Uses a subset of probes from the Human Exon 1.0 ST array
Affymetrix Human Exon 1.0 ST array
–4 probes per exon; allows for gene expression and alternative splicing detection

10 June 2009 How is the exon chip different from 3’ IVT arrays?
[Figure: a gene and 3 different isoforms of the same gene.]
http://www.affymetrix.com/products_services/arrays/specific/hugene_1_0_st.affx

11 June 2009 The Affymetrix Human Exon 1.0 ST array
Substantially higher probe density than traditional gene expression microarrays
6.5 million probes, comprising 1.4 million probesets, targeting 1.2 million exons
Goal of array: target every known and predicted exon in the genome
Allows for genome-wide screening of AS events across multiple genes

12 June 2009 Annotation of Exon chip

13 June 2009 Affy exon chip annotations
Affy’s basic approach:
1. A variety of sources are used to construct gene annotations
2. Exon probesets map to gene annotations
3. Probesets are grouped together when they map to the same gene annotation
4. A transcript cluster (TC) closely resembles a gene
Affymetrix. Exon Probeset Annotations and Transcript Cluster Groupings. Affymetrix Whitepaper Collections, pages 1-11, 2005.

14 June 2009 Affy annotation problem
[Figure: RMA intensity vs. genomic location for one transcript cluster. The cluster contains 15 probesets and encompasses 2 genes (MYCBP and GJA10); 2 PSRs lie in a genomic region where no gene is found.]

15 June 2009 Annotation problems continued Source: UCSC Genome Browser, http://genome.ucsc.edu

16 June 2009 How often does a TC include more than one gene?
“Core” exon annotations downloaded from Affymetrix Expression Console (EC, Feb 2009)
287,329 core probesets
17,583 transcript clusters
567 transcript clusters annotated to more than one gene
629 gene symbols annotated to more than one transcript cluster
**Solution: Reannotate exon chip!
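
The two ambiguity counts on this slide (transcript clusters with several genes, gene symbols with several transcript clusters) can be reproduced from any table of cluster-to-gene pairs. The following is a minimal sketch, not the instructors' procedure: the function name `count_ambiguous`, the input layout and the toy IDs are assumptions made for illustration.

```python
from collections import defaultdict

def count_ambiguous(annotations):
    """annotations: iterable of (transcript_cluster_id, gene_symbol) pairs.

    Returns the transcript clusters annotated to more than one gene and the
    gene symbols annotated to more than one transcript cluster.
    """
    genes_per_tc = defaultdict(set)
    tcs_per_gene = defaultdict(set)
    for tc_id, gene in annotations:
        genes_per_tc[tc_id].add(gene)
        tcs_per_gene[gene].add(tc_id)
    multi_gene_tcs = sorted(tc for tc, genes in genes_per_tc.items() if len(genes) > 1)
    multi_tc_genes = sorted(g for g, tcs in tcs_per_gene.items() if len(tcs) > 1)
    return multi_gene_tcs, multi_tc_genes

# Toy pairs: the cluster IDs are made up; only the gene symbols appear in the slides.
toy_annotations = [
    ("TC1", "MYCBP"), ("TC1", "GJA10"),  # one cluster spanning two genes
    ("TC2", "MMP9"), ("TC3", "MMP9"),    # one gene split over two clusters
]
multi_gene_tcs, multi_tc_genes = count_ambiguous(toy_annotations)
print(multi_gene_tcs, multi_tc_genes)
```

On the real core annotation file, the lengths of the two returned lists would correspond to the 567 and 629 figures quoted on the slide.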

17 June 2009 Reference Sequence (RefSeq) project at the NCBI Comprehensive, non-redundant set of sequences Genomic DNA, transcript RNA and protein products Stable reference for genome annotation http://www.ncbi.nlm.nih.gov/RefSeq/

18 June 2009 Steps for exon array reannotation
1. Download RefSeq database from UCSC
2. Create continuous, non-overlapping set of exons for each gene from RefSeq transcripts
3. Map Affy probesets to RefSeq exons by genomic location
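
Steps 2 and 3 can be sketched as interval operations. This is a hedged illustration under stated assumptions, not the MSCL Toolbox implementation: all exons are assumed to lie on the same chromosome and strand, coordinates are treated as closed intervals, and a probeset is mapped to an exon only when it lies fully inside it. The function names and coordinates are hypothetical.

```python
def merge_exons(exons):
    """Merge overlapping or touching (start, end) intervals into a sorted,
    non-overlapping list (step 2)."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous merged exon: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def map_probesets(probesets, exons):
    """Map each probeset to the index of the merged exon that fully contains
    it, or None when no exon contains it (step 3)."""
    mapping = {}
    for ps_id, (ps_start, ps_end) in probesets.items():
        mapping[ps_id] = None
        for idx, (ex_start, ex_end) in enumerate(exons):
            if ex_start <= ps_start and ps_end <= ex_end:
                mapping[ps_id] = idx
                break
    return mapping

# Toy exon coordinates from two hypothetical transcripts of one gene.
transcript_a = [(100, 200), (300, 400)]
transcript_b = [(150, 250), (300, 350), (500, 600)]
gene_exons = merge_exons(transcript_a + transcript_b)

# Toy probeset coordinates (hypothetical IDs).
probesets = {"ps1": (120, 145), "ps2": (310, 335), "ps3": (450, 475)}
probeset_to_exon = map_probesets(probesets, gene_exons)
print(gene_exons)
print(probeset_to_exon)
```

Requiring full containment is a conservative choice; a partial-overlap rule would map more probesets but could assign one that straddles an exon boundary.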

19 June 2009 Analysis of exon chip

20 June 2009 Statistical software available for the analysis of exon microarrays
MSCL Toolbox
JMP Genomics
Partek Genomics Suite
Li and Wong
Bioconductor
Array Assist
***Very active area of development

21 June 2009 Mixed-effect, 3 factor ANOVA (test applied to each gene)
2 fixed effects (treatment, exon) and one random effect (sample), plus the fixed treatment-exon interaction
A_i: treatment effect (fixed); LPS or control
β_j(i): sample within treatment effect (random); replicate within treatment
C_k: exon effect (fixed); exon effect within a gene
AC_ik: treatment-exon interaction effect (fixed); exon*treatment interaction (exon*tissue in the tissue dataset), the effect for alternative splicing
ε_ijk: error term
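
The terms listed on this slide can be assembled into a single model equation. The transcript does not preserve the equation itself, so the form below is an assumption: the response symbol y_ijk (log-scale RMA intensity for treatment i, sample j, exon k) and the overall mean μ are introduced here for illustration.

```latex
y_{ijk} = \mu + A_i + \beta_{j(i)} + C_k + (AC)_{ik} + \varepsilon_{ijk}
```

Under this reading, a gene whose exons all shift together between treatments has a nonzero A_i but (AC)_ik near zero, while a gene with one exon behaving differently from the rest shows up in (AC)_ik.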

22 June 2009 ANOVA table

23 June 2009 Filtering methods
Pre-analysis – exclude probesets from the analysis
–Exclude probesets whose maximum intensity over all treatments does not reach a chosen threshold
–Exclude probesets whose range across all treatments is low
Post-analysis – filter out non-significant genes
–Apply a p-value cutoff filter
–Apply a magnitude of interaction effect filter

24 June 2009 Pre-Analysis filters
“Dead” probeset (MaxIntensity_Tissues)
–Calculate maximum over all treatments (maxIntensity)
–Plot distribution of maxIntensity
–Determine first quartile of distribution of maxIntensity
–First quartile used as threshold for “dead” probeset
“Unresponsive” probeset (Range_Tissues)
–Calculate minimum over all treatments (minIntensity)
–Calculate Range as maxIntensity - minIntensity
–Determine first quartile of distribution of Range
–First quartile used as threshold for “unresponsive” probeset
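
The two pre-analysis filters can be sketched in a few lines. This is an illustrative sketch under assumptions, not the toolbox code: the quartile is computed by linear interpolation (JMP may use a different quantile rule), a probeset is flagged when its value is strictly below the first quartile, and the function names and toy intensities are hypothetical.

```python
def first_quartile(values):
    """First quartile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = 0.25 * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def pre_analysis_flags(intensity):
    """intensity: dict probeset_id -> list of per-treatment RMA intensities.

    Returns (dead, unresponsive): sets of probeset IDs whose maxIntensity or
    Range falls below the first quartile of the respective distribution.
    """
    max_intensity = {p: max(v) for p, v in intensity.items()}
    value_range = {p: max(v) - min(v) for p, v in intensity.items()}
    dead_cut = first_quartile(max_intensity.values())
    range_cut = first_quartile(value_range.values())
    dead = {p for p in intensity if max_intensity[p] < dead_cut}
    unresponsive = {p for p in intensity if value_range[p] < range_cut}
    return dead, unresponsive

# Toy per-treatment intensities for five hypothetical probesets.
toy_intensity = {
    "ps1": [2.0, 2.5, 2.25],  # never bright: candidate "dead" probeset
    "ps2": [6.0, 9.0, 7.0],
    "ps3": [8.0, 8.25, 8.0],  # bright but flat: candidate "unresponsive" probeset
    "ps4": [5.0, 7.5, 6.0],
    "ps5": [4.0, 6.0, 5.0],
}
dead, unresponsive = pre_analysis_flags(toy_intensity)
print(dead, unresponsive)
```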

25 June 2009 Example of dead/absent probesets within a gene
[Figure: RMA intensity of each probeset across control and treated samples. Green lines mark exons/probesets with low MaxIntensity across samples; here they never rise above 3.]
Probesets whose maximum intensity across all samples never rises above a chosen threshold are not included in the analysis.

26 June 2009 Low-range probeset
[Figure: each line represents an exon's RMA intensity across treatments. Y-axis: RMA intensity. X-axis: the treatments used in the study (controls, treated samples). Green marks exons/probesets with low range across samples.]

27 June 2009 Post-analysis filters
Cutoff on the p-value of the treatment-exon interaction AC_ik: p(AC_ik) < 1e-7
Cutoff on the maximum absolute interaction effect (maxAbsInt) of AC_ik: maxAbsInt > 1 or 2
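
A rough sketch of how the two post-analysis cutoffs might be applied to one gene. The slides do not state how maxAbsInt is estimated; the code assumes the usual two-way decomposition of treatment-by-exon cell means in a balanced design, and it takes the interaction p-value as given rather than computing it. Function names, default cutoffs and toy values are illustrative.

```python
def interaction_effects(cell_means):
    """cell_means[i][k]: mean RMA intensity for treatment i and exon k.

    Returns AC[i][k] = m_ik - m_i. - m_.k + m_.. (balanced design assumed).
    """
    n_trt = len(cell_means)
    n_exon = len(cell_means[0])
    row = [sum(r) / n_exon for r in cell_means]
    col = [sum(cell_means[i][k] for i in range(n_trt)) / n_trt for k in range(n_exon)]
    grand = sum(row) / n_trt
    return [[cell_means[i][k] - row[i] - col[k] + grand for k in range(n_exon)]
            for i in range(n_trt)]

def max_abs_int(cell_means):
    """Largest absolute interaction effect over all treatment-exon cells."""
    return max(abs(e) for r in interaction_effects(cell_means) for e in r)

def passes_post_filters(p_interaction, cell_means, p_cut=1e-7, effect_cut=1.0):
    """Keep a gene only if it passes both the p-value and the effect-size cutoff."""
    return p_interaction < p_cut and max_abs_int(cell_means) > effect_cut

# Toy gene with 4 exons: the last exon drops under LPS while the others rise.
spliced = [[8.0, 8.0, 8.0, 8.0],   # control
           [9.0, 9.0, 9.0, 5.0]]   # LPS
# Toy gene whose exons all shift together: expression change, no splicing signal.
shifted = [[8.0, 8.0, 8.0, 8.0],
           [9.0, 9.0, 9.0, 9.0]]
print(max_abs_int(spliced), passes_post_filters(1e-9, spliced))
print(max_abs_int(shifted), passes_post_filters(1e-9, shifted))
```

Using both cutoffs together guards against genes that are statistically significant only because of many probesets but whose exon-level differences are small.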

28 June 2009 Volcano plot showing post-analysis filter thresholds
[Figure: volcano plot with the filter thresholds marked; annotation: 348 AS genes.]

29 June 2009 Datasets
Affymetrix tissue dataset (www.affymetrix.com)
–11 different tissue types, 3 replicates each
–Testes, breast, spleen, kidney, liver, muscle, thyroid, pancreas, heart, cerebellum and prostate
LPS dataset (data from collaborative lab at the NIH)
–THP1 cells treated with LPS (N=5)
–Untreated THP1 cells as controls (N=4)
(**THP1 cells: human acute monocytic leukemia cell line; a good biological sample for a prominent inflammatory effect)

30 June 2009 Filtering on Range of LPS data
Filtering on range of LPS data alone filters out 41,294 probesets, 25% of the data
Addition of the tissue dataset allows for probeset rescue
Choose to filter out only probesets that are “Uniformly Unresponsive” in both the current datasets and the anatomical (tissue) dataset
[Figure annotation: Filters out 25% of probesets]

31 June 2009 Range of LPS dataset vs. range of tissue dataset 41,294 probesets do not pass LPS range filter

32 June 2009 Rescuing probesets that are NOT “Uniformly Unresponsive”
17,766 Uniformly Unresponsive probesets removed
23,528 probesets rescued
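
The rescue rule described on the last three slides can be sketched as a set operation: a probeset that fails the LPS range filter is dropped only if it also fails the range filter in the tissue dataset. This is an illustration under assumptions; the function name, the strict "less than" comparison and the toy ranges are not from the slides.

```python
def split_by_rescue(lps_range, tissue_range, lps_cut, tissue_cut):
    """Split probesets that fail the LPS range filter into those that are
    uniformly unresponsive (low range in both datasets) and those rescued
    by a sufficient range in the tissue dataset."""
    failed_lps = {p for p, r in lps_range.items() if r < lps_cut}
    uniformly_unresponsive = {p for p in failed_lps if tissue_range[p] < tissue_cut}
    rescued = failed_lps - uniformly_unresponsive
    return uniformly_unresponsive, rescued

# Toy ranges for three hypothetical probesets.
lps_range = {"a": 0.1, "b": 0.2, "c": 1.5}
tissue_range = {"a": 0.2, "b": 2.0, "c": 0.1}
uniformly_unresponsive, rescued = split_by_rescue(lps_range, tissue_range, 0.5, 0.5)
print(uniformly_unresponsive, rescued)
```

The counts reported on the slides are consistent with this split: 17,766 removed plus 23,528 rescued equals the 41,294 probesets that fail the LPS range filter.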

33 June 2009 Exon analysis steps
Data Import:
1. Obtain RMA values for exon chip from EC
2. Export Affy pivot table from EC
3. Import pivot table into JMP
Formatting and annotating data in MSCL Toolbox:
1. Run ParseAffyPivot and RecodeAffyPivot scripts in MSCLtoolbox
2. Annotate exon chip using MSCLtoolbox script (RefSeq or Affy)
Pre-analysis filters, post-analysis filters and statistical analysis:
1. Decide threshold values for pre-analysis filters
2. Run ExonANOVA script in MSCLtoolbox
3. Investigate ExonLevel and GeneLevel output files
Visualization:
1. Create overlay plot of interesting AS genes
2. View interesting genes in UCSC Genome Browser

34 June 2009 Data Analysis Flow to determine AS gene list
Normalize data and import to JMP
Determine first quartile of Range and maxIntensity
Filter out “dead” and “unresponsive” probesets
Apply statistical test: 3 factor, mixed effect ANOVA
Apply p-value filter and maxAbsInt filter
Obtain list of AS genes
Validate with RT-PCR
**Annotate data (RefSeq or Affy)

35 June 2009 Run RMA analysis in EC

36 June 2009 Export pivot table with RMA values

37 June 2009 Import pivot table into JMP (text import preview)

38 June 2009 Parse Affy pivot table to create MasterFile

39 June 2009 Recode Affy pivot table to FinalTable

41 June 2009 Annotate Exon chip – choose applicable chip (RefSeq or Affy)

43 June 2009 Three-factor, nested mixed-effect ANOVA: ExonANOVAnested

44 June 2009 ExonANOVAnested continued Select probeset or exonID and geneID

45 June 2009 Output of ExonANOVA script

47 June 2009 Volcano Plot of AS genes

48 June 2009 Selection of AS genes using a volcano plot 36 genes selected

49 June 2009 Overlay plot of 1 gene on AS list: MMP9 gene
[Figure: RMA intensity vs. exon genomic start location.]

50 June 2009 Parallel Plots of AS genes

51 June 2009 Parallel plots of some AS genes

52 June 2009 MMP9 found to be alternatively spliced
[Figure: RMA intensity per exon for control and LPS samples, with the exon signaling the AS event marked.]
p < 1x10^-12 and maxAbsInt = 1.3 units

53 June 2009 Visualization of MMP9 gene

54 June 2009 How to obtain:
JMP
–http://isdp.cit.nih.gov/downloads/stats.asp
–Find your desktop support person at http://isdp.cit.nih.gov/information/contact_lookup_nih.asp
–JMP technical support: (919) 677-8008
The MSCL Analyst's Toolbox
–Download from http://affylims.cit.nih.gov
–Help offered on collaborative basis by MSCL
–Email questions to: munson@helix.nih.gov or barbj@mail.nih.gov

