Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment.


1 Biological question Differentially expressed genes Sample class prediction etc. Testing Biological verification and interpretation Microarray experiment Estimation Experimental design Image analysis Normalization Clustering Discrimination Churchill, March 15 Bult, Lecture 5 Bult, Lecture 6 Hibbs, Lectures 10 and 11 Blake, Lecture 16 and 17

2 Project Steps
– Find and download array data
– Normalize array data
– Analyze data, i.e., generate gene lists (differentially expressed genes, genes in clusters, etc.)
– Interpret gene lists using the annotations of the genes in your lists. Gene Ontology terms are available for many organisms, but not all.

3 Getting The Data
– Search GEO (or another repository) for a data set of interest.
– Download the data files, e.g., Affy .CEL files, Affy .CDF files, etc.
– Upload them to your home directory.

4 Normalize the Data Sent you all a script (2/23/2012) to RMA normalize the Ackerman array data available from my home directory

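5 library(affy)
library(makecdfenv)
# Build a CDF environment from the array's CDF file
Array.CDF = make.cdf.env("MoGene-1_0-st-v1.cdf")
# Read all .CEL files in the working directory
CELData = ReadAffy()
CELData@cdfName = "Array.CDF"
# Background-correct, normalize, and summarize with RMA
rma.CELData = rma(CELData)
# Extract the expression matrix and add the probe IDs as a column
rma.expr = exprs(rma.CELData)
rma.expr.df = data.frame(ProbeID=row.names(rma.expr), rma.expr)
write.table(rma.expr.df, "rma.expr.dat", sep="\t", row.names=FALSE, quote=FALSE)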

6 What is a library? What does the ReadAffy() function do? What are possible arguments for the ReadAffy() function? What class of R object is rma.CELData? What class of R object is rma.expr? What class of R object is rma.expr.df?

7 slotNames(CELData)
phenoData(CELData)

8 This is what rma.expr.df looks like in Excel……

9 Plotting summarized probeset intensities across the Ackerman arrays (non-normalized)
jpeg("boxplot.jpeg")
boxplot(CELData, names=CELData$sample, col="blue")
dev.off()

10 Plotting summarized probeset intensities across the Ackerman arrays (normalized)
mydata = rma.expr.df
jpeg("normal_boxplot.jpg")
# Drop the first column (ProbeID) so only the array columns are plotted
boxplot(mydata[-1], main="Normalized Intensities", xlab="Array", ylab="Intensities", col="blue")
dev.off()

11 Next time
Posted articles from Gary Churchill.
– If you only read one article, read Churchill 2004
– See also Gary’s web site: http://churchill.jax.org/software/rmaanova.shtml
– Look at Sample Data and Tutorial
After that lecture we will begin analysis of microarray data
– MAANOVA

13 [Figure: sequencing throughput (gigabases) and cost per Kb. Source: Lucinda Fulton, The Genome Center at Washington University]

14 Sequencing Technologies http://www.geospiza.com/finchtalk/uploaded_images/plates-and-slides-718301.png

15 Sequence “Space”
Roche 454 – Flow space
– Measures the pyrophosphate released when a nucleotide is added to a growing DNA chain
– Flow space describes sequence in terms of these base incorporations
– http://www.youtube.com/watch?v=bFNjxKHP8Jc
AB SOLiD – Color space
– Sequencing by DNA ligation via synthetic DNA molecules that contain two nested known bases with a fluorescent dye
– Each base sequenced twice
– http://www.youtube.com/watch?v=nlvyF8bFDwM&feature=related
Illumina/Solexa – Base space
– Single-base extensions of fluorescent-labeled nucleotides with protected 3' OH groups
– Sequencing via cycles of base addition/detection followed by deprotection of the 3' OH
– http://www.youtube.com/watch?v=77r5p8IBwJk&feature=related
GenomeTV – Next Generation Sequencing (lecture)
– http://www.youtube.com/watch?v=g0vGrNjpyA8&feature=related
http://finchtalk.geospiza.com/2008/03/color-space-flow-space-sequence-space_23.html
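The color-space idea can be made concrete with a small sketch. In di-base encoding each color describes a pair of adjacent bases, so every base contributes to two colors, and a read can only be decoded when its first base is known. The 2-bit base codes and the function names below are illustrative assumptions, not part of any vendor software; with this choice of codes, the XOR of two base codes yields one of four colors (0-3).

```python
# Illustrative di-base (color space) encoding; codes and names are assumptions.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_colors(seq):
    """Return one color (0-3) for each pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base plus the colors."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse, so the previous base and the color
        # together determine the next base.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because decoding chains from one base to the next, a single miscalled color changes every downstream base, which is one reason color-space reads are usually aligned in color space rather than translated first.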

16 “Standard” File formats
– Sequence containers: FASTA, FASTQ, BAM/SAM
– Alignments: BAM/SAM, MAF
– Annotation: BED, GFF/GTF/GFF3, WIG
– Variation: VCF, GVF

17 Tools
– Alignments: BLAST (not for NGS), BWA, Bowtie, Maq, …
– Transcriptomics: TopHat, Cufflinks, …
– Variant calling: ssahaSNP, Mosaic, …
– Counting (ChIP-Seq, etc.): FindPeaks, PeakSeq

18 FASTQ: Data Format
FASTQ
– Text based
– Encodes sequence calls and quality scores with ASCII characters
– Stores minimal information about the sequence read
– 4 lines per sequence
Line 1: begins with @, followed by the sequence identifier and an optional description
Line 2: the sequence
Line 3: begins with + and is optionally followed by the same sequence identifier and description
Line 4: encoding of the quality scores for the sequence in line 2 (one character per base)
References/Documentation
– http://maq.sourceforge.net/fastq.shtml
– Cock et al. (2009). Nuc Acids Res 38:1767-1771.
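The four-line layout described above can be sketched as a minimal parser. This is only an illustration: it assumes each record occupies exactly four lines (no wrapped sequence lines), and the record type, function name, and example read are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading "@"
    sequence: str    # line 2
    quality: str     # line 4, one ASCII character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming exactly 4 lines per record."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(cleaned), 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], seq, qual)


# Hypothetical two-read example
example = "@read1 first read\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+read2\n!!!!\n"
for record in parse_fastq(example.splitlines()):
    print(record.identifier, record.sequence, record.quality)
```

For real data, an established reader such as Biopython's `SeqIO` handles the format variants more robustly than a sketch like this.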

19 FASTQ Example
FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.
For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina stores quality scores ranging from 0-62; Sanger quality scores range from 0-93. Solexa quality scores are on a different scale and have to be converted to PHRED quality scores.
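A short sketch of these conversions, following the definitions in Cock et al.: Sanger FASTQ stores PHRED scores with an ASCII offset of 33, Illumina 1.3+ stores PHRED scores with an offset of 64, and Solexa scores map to PHRED scores via Q_PHRED = 10·log10(10^(Q_Solexa/10) + 1). The function names are illustrative assumptions.

```python
import math


def decode_phred(quality, offset=33):
    """Turn a quality string into a list of integer scores (Sanger offset by default)."""
    return [ord(ch) - offset for ch in quality]


def illumina13_to_sanger(quality):
    """Re-encode an Illumina 1.3+ quality string (offset 64) with the Sanger offset (33)."""
    return "".join(chr(ord(ch) - 64 + 33) for ch in quality)


def solexa_to_phred(q_solexa):
    """Convert a Solexa score to the PHRED scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(decode_phred("I5!"))                # [40, 20, 0]
print(illumina13_to_sanger("hT@"))        # I5!
print(round(solexa_to_phred(0), 2))       # 3.01
```

The two scales agree closely for high-quality calls and diverge only at the low end, so the Solexa conversion matters mostly for poor-quality bases.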

20 SAM (Sequence Alignment/Map)
It may not be necessary to align reads from scratch; you can instead use existing alignments in SAM format
– SAM is the output of aligners that map reads to a reference genome
– Tab delimited, with a header section and an alignment section
Header lines begin with @ (the header section is optional)
Each line of the alignment section has 11 mandatory fields
– BAM is the binary format of SAM
http://samtools.sourceforge.net/

21 Mandatory Alignment Fields (table from http://samtools.sourceforge.net/SAM1.pdf)
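As a rough sketch, the 11 mandatory fields of a SAM alignment line can be read by splitting on tabs. The field names below follow the current SAM specification (older versions of the specification call RNEXT, PNEXT, and TLEN by the names MRNM, MPOS, and ISIZE); the function name and the example alignment line are hypothetical.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line):
    """Return a dict of the mandatory fields, or None for a header line."""
    if line.startswith("@"):
        return None  # header lines carry metadata, not alignments
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("an alignment line needs 11 mandatory fields")
    record = {}
    for name, value in zip(SAM_FIELDS, columns):
        record[name] = int(value) if name in INT_FIELDS else value
    record["TAGS"] = columns[len(SAM_FIELDS):]  # optional TAG:TYPE:VALUE fields
    return record


# Hypothetical alignment line
line = "read1\t0\tchr1\t100\t60\t7M\t*\t0\t0\tGATTACA\tIIIIIII\tNM:i:0"
aln = parse_sam_line(line)
print(aln["RNAME"], aln["POS"], aln["CIGAR"])
print("unmapped:", bool(aln["FLAG"] & 0x4))  # flag bit 0x4 marks an unmapped read
```

For BAM files or large data sets, a dedicated library such as pysam or the samtools command line is the usual choice; this sketch only shows how the text columns line up.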

22 Alignment Examples: alignments in SAM format (from http://samtools.sourceforge.net/SAM1.pdf)

23 Valid BED files
chr1 86114265 86116346 nsv433165
chr2 1841774 1846089 nsv433166
chr16 2950446 2955264 nsv433167
chr17 14350387 14351933 nsv433168
chr17 32831694 32832761 nsv433169
chr17 32831694 32832761 nsv433170
chr18 61880550 61881930 nsv433171

chr1 16759829 16778548 chr1:21667704 270866 -
chr1 16763194 16784844 chr1:146691804 407277 +
chr1 16763194 16784844 chr1:144004664 408925 -
chr1 16763194 16779513 chr1:142857141 291416 -
chr1 16763194 16779513 chr1:143522082 293473 -
chr1 16763194 16778548 chr1:146844175 284555 -
chr1 16763194 16778548 chr1:147006260 284948 -
chr1 16763411 16784844 chr1:144747517 405362 +
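A minimal sketch of how a BED line breaks down, assuming the standard UCSC layout: three required columns (chrom, chromStart, chromEnd) followed by optional columns such as name, score, and strand. Start coordinates are 0-based and the end is exclusive, so a feature's length is simply end minus start. The function name and the example lines are hypothetical.

```python
def parse_bed_line(line):
    """Parse one BED line into a dict; only the first six columns are read."""
    columns = line.split()  # BED columns are separated by tabs or other whitespace
    if len(columns) < 3:
        raise ValueError("a BED line needs chrom, chromStart and chromEnd")
    start, end = int(columns[1]), int(columns[2])
    if start < 0 or end < start:
        raise ValueError("coordinates must satisfy 0 <= chromStart <= chromEnd")
    record = {"chrom": columns[0], "start": start, "end": end}
    # Optional columns are kept as strings and added only when present
    record.update(zip(["name", "score", "strand"], columns[3:6]))
    record["length"] = end - start  # valid because the interval is half-open
    return record


# Hypothetical example lines: a 6-column record and a minimal 3-column record
print(parse_bed_line("chr1\t1000\t5000\tfeature1\t0\t+"))
print(parse_bed_line("chr2 10 20"))
```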

24 Galaxy http://main.g2.bx.psu.edu/ See Tutorial 1 Build and share data and analysis workflows No programming experience required Strong and growing development and user community

25 Tools, History, Dialog/Parameter Selection

26 Tutorial Web Site http://www.ncbi.nlm.nih.gov/staff/church/GenomeAnalysis/index.shtml Tutorial 5

27 RNA Seq Workflow
Convert data to FASTQ
Upload files to Galaxy
Quality Control
– Throw out low quality sequence reads, etc.
Map reads to a reference genome
– Many algorithms available
– Trade off between speed and sensitivity
Data summarization
– Associating alignments with genome annotations
– Counts
Data Visualization
Statistical Analysis
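One simple form of the quality-control step is to drop reads whose mean quality falls below a cutoff. The sketch below assumes Sanger-encoded qualities (ASCII offset 33) and a cutoff of 20, both of which are illustrative choices rather than values from the lecture; the function names and example reads are hypothetical.

```python
def mean_quality(quality, offset=33):
    """Average PHRED score of a quality string (Sanger offset assumed)."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (identifier, sequence, quality) tuples that pass the cutoff."""
    return [read for read in reads
            if read[2] and mean_quality(read[2]) >= min_mean_quality]


# Hypothetical reads: "I" encodes PHRED 40, "#" encodes PHRED 2
reads = [
    ("good_read", "GATTACA", "IIIIIII"),
    ("poor_read", "GATTACA", "#######"),
]
kept = filter_reads(reads)
print([identifier for identifier, _, _ in kept])  # ['good_read']
```

Quality-control tools typically also trim low-quality ends and remove adapter sequence; filtering on the mean is only the simplest case.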

28 Typical RNA_Seq Project Work Flow
[Figure: Tissue Sample, Total RNA, mRNA, cDNA, Sequencing, FASTQ file, QC, TopHat, Cufflinks, Gene/Transcript/Exon Expression, Visualization, Statistical Analysis. Source: JAX Computational Sciences Service]

29 TopHat
Trapnell et al. (2009). Bioinformatics 25:1105-1111. http://tophat.cbcb.umd.edu/
Figure from: Trapnell et al. (2010). Nature Biotechnology 28:511-515.
TopHat is better suited to aligning RNA Seq data than general-purpose aligners (Maq, BWA) because it takes splicing into account during the alignment process.

30 Trapnell C et al. Bioinformatics 2009;25:1105-1111 TopHat is built on the Bowtie alignment algorithm.

31 Cufflinks
Trapnell et al. (2010). Nature Biotechnology 28:511-515. http://cufflinks.cbcb.umd.edu/
Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples

