Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine

Microarrays
Lawrence Hunter, Ph.D., Director, Computational Bioscience Program, University of Colorado School of Medicine. Larry.Hunter@uchsc.edu, http://compbio.uchsc.edu/Hunter
Tzu Lip Phang, Ph.D., Associate Professor of Bioinformatics, Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado School of Medicine. Tzu.Phang@ucdenver.edu

The Central Dogma [Figure labels: Genome, Transcriptome]

Microarrays in the Literature

Barrett, T., Wilhite, S. E., Ledoux, P., Evangelista, C., Kim, I. F., Tomashevsky, M., et al. (2012). NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Research, 41(D1)

Public Data Usages
– Preliminary data/results and hypothesis generation
– Testing algorithms
– Power analysis (sample size calculation)
– Enhancing sample size

Array technology
Basic idea: genomic material (DNA/RNA) hybridizes best to exactly complementary sequences.
Method:
– Probes are attached to a substrate at known locations
– DNA/RNA in one or more samples is fluorescently labelled
– Samples are hybridized to the probe array, excess is washed off, and fluorescence readings are taken for each position

Microarray: Primer

Array synthesis
– Photolithography for oligonucleotides
– Cost is proportional to the length of the oligo, not the number of features (genes) per chip!
– Many layers compared to computer chips

Affymetrix Probe Sets
– 11 to 16 probe pairs per probe set
– Each pair is a 25-mer perfect match (PM) probe and a 25-mer mismatch (MM) probe
Source: http://intermedin.stanford-edu/Arrays.ppt

Gene Expression
– Still the most common use for microarrays
– Aim: determine differential expression between groups of samples, e.g. disease and control
– Generate hypotheses about the mechanisms underlying the disease of interest

Basic Statistical Analysis

Experimental Design
Biological replication is essential
– Technical replication is not essential except for quality control studies
Pooling biological samples to reduce array variability
– Increases sample size without running more chips
– BUT if individual variation is important, pooling washes out the effect
Power analysis is essential

Power Analysis
How many biological replicates?
– My experience: at least 3, preferably 5, even 7
– Bioconductor: SSPA
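A rough per-gene sample size estimate can be sketched with the standard normal approximation for a two-sided, two-sample comparison. This is not the SSPA method; the function name and the inputs `delta` (smallest log2 difference worth detecting) and `sigma` (per-gene standard deviation) are hypothetical, and the sketch ignores multiple testing, which pushes the required number of replicates up.

```python
from math import ceil
from statistics import NormalDist


def replicates_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate biological replicates needed per group.

    delta: smallest log2 expression difference worth detecting (assumed input)
    sigma: per-gene standard deviation of log2 expression (assumed input)
    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)  # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return ceil(n)  # round up to whole replicates
```

For example, `replicates_per_group(1.0, 0.5)` asks how many replicates per group are needed to detect a two-fold change when the log2 standard deviation is 0.5. The normal approximation tends to be optimistic for very small groups, so treat the result as a lower bound.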

Preprocessing
Includes image analysis, normalization, and data transformation
Data normalization:
– Removes systematic errors introduced in the labeling, hybridization, and scanning procedures
– Corrects these errors while preserving biological variability / information
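As one simple illustration of normalization, the sketch below log2-transforms each array and shifts it so that all arrays share the same median. Median centering is a swapped-in example, not a method named on the slide; the function name is hypothetical and intensities are assumed to be strictly positive.

```python
from math import log2
from statistics import median


def median_center(arrays):
    """Log2-transform each array, then shift it to a common median.

    arrays: list of arrays, each a list of positive raw intensities
    Returns a list of arrays of normalized log2 intensities.
    """
    logged = [[log2(v) for v in arr] for arr in arrays]
    medians = [median(arr) for arr in logged]
    target = sum(medians) / len(medians)  # common median after shifting
    # Subtracting each array's own median removes array-wide offsets
    # (e.g. from labeling or scanning) but keeps within-array differences.
    return [[v - m + target for v in arr] for arr, m in zip(logged, medians)]
```

Because every value on an array is shifted by the same amount, differences between genes on the same array are preserved; only the array-wide offset is removed.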

Why normalization?

A different look … [Plot: technical replicate difference vs. average intensity values]

To normalize or not to …

AffyComp: Rafael Irizarry, Dept. of Biostatistics, Johns Hopkins University

Statistical Testing
Hypothesis testing: are the means of two groups different from each other?
– Fold change
– Student's t-test
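The two measures above can be sketched for a single gene as follows. The function names are hypothetical, the inputs are assumed to be log2 expression values, and the t statistic uses the classic equal-variance (pooled) form; turning it into a p-value requires the t distribution with the returned degrees of freedom, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Difference of mean log2 expression (group_a minus group_b)."""
    return mean(group_a) - mean(group_b)


def student_t(group_a, group_b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df
```

Fold change looks only at the size of the difference, while the t statistic scales that difference by the variability within groups, which is why the two can rank genes differently.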

Microarray Scatter Plot

Student's t-Test

What is Multiple Comparison Testing??!
Genes      P-value   Critical level   Reject H0
Gene 1     0.0001    <= 0.05          1
Gene 2     0.0002    <= 0.05          1
Gene 3     0.008     <= 0.05          1
Gene 4     0.009     <= 0.05          1
Gene 5     0.005     <= 0.05          1
Gene 6     0.09      <= 0.05          0
Gene 7     0.05      <= 0.05          0
Gene 8     0.09      <= 0.05          0
Gene 9     0.2       <= 0.05          0
Gene 10    0.3       <= 0.05          0
Alpha level = 0.05

With a large number of tests …
Genes       P-value   Critical level   Reject H0
Gene 1      0.0001    <= 0.05          1
Gene 2      0.0002    <= 0.05          1
Gene 3      0.008     <= 0.05          1
Gene 4      0.009     <= 0.05          1
Gene 5      0.005     <= 0.05          1
Gene 6      0.09      <= 0.05          0
…           …         …                …
Gene 999    0.2       <= 0.05          0
Gene 1000   0.3       <= 0.05          0
Alpha level = 0.05
With 1000 tests, about 50 genes are expected to be called significant by chance alone: 50 wrong genes …
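The figure of 50 follows from a one-line expectation, sketched here under the worst-case assumption that none of the 1000 genes is truly differentially expressed. The symbols m (number of tests), \alpha (per-test level) and V (number of false positives) are introduced only for this illustration.

```latex
\[
  E[V] = m \cdot \alpha = 1000 \times 0.05 = 50
\]
```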

Correction … Bonferroni
Genes       P-value   Critical level   Reject H0
Gene 1      0.0001    <= 0.00005       0
Gene 2      0.0002    <= 0.00005       0
Gene 3      0.008     <= 0.00005       0
Gene 4      0.009     <= 0.00005       0
Gene 5      0.005     <= 0.00005       0
Gene 6      0.09      <= 0.00005       0
…           …         …                …
Gene 999    0.2       <= 0.00005       0
Gene 1000   0.3       <= 0.00005       0
Alpha level = 0.05 / 1000 = 0.00005
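The Bonferroni rule in the table can be written in a few lines. This is a minimal sketch with a hypothetical function name; `n_tests` is passed separately so that a handful of p-values can be judged against the full 1000-gene experiment.

```python
def bonferroni_calls(p_values, alpha=0.05, n_tests=None):
    """Return the Bonferroni threshold and one True/False call per p-value.

    n_tests defaults to len(p_values); pass it explicitly when p_values
    is only a subset of all the tests that were run.
    """
    n_tests = len(p_values) if n_tests is None else n_tests
    threshold = alpha / n_tests  # e.g. 0.05 / 1000 = 0.00005
    return threshold, [p <= threshold for p in p_values]


# The six p-values listed in the table, judged as part of 1000 tests.
shown = [0.0001, 0.0002, 0.008, 0.009, 0.005, 0.09]
threshold, calls = bonferroni_calls(shown, alpha=0.05, n_tests=1000)
```

With 1000 tests, none of the listed genes passes, not even Gene 1 at p = 0.0001, which shows how conservative the correction is.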

Strike the balance …
Spectrum: Bonferroni (most conservative), False Discovery Rate (in between), no correction (most lenient)
The False Discovery Rate (FDR) of a set of predictions is the expected proportion of false predictions in that set.
Example: if the algorithm returns 100 genes with a false discovery rate of 0.3, then we should expect about 70 of them to be correct.
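One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure. The slides do not say which FDR method was used, so the sketch below is an assumption chosen for illustration, with a hypothetical function name.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one True/False call per p-value using the BH step-up rule."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every gene ranked at or below the cutoff is called significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# The ten p-values from the first multiple-comparison table (Gene 1 to Gene 10).
ten_genes = [0.0001, 0.0002, 0.008, 0.009, 0.005, 0.09, 0.05, 0.09, 0.2, 0.3]
bh_calls = benjamini_hochberg(ten_genes, fdr=0.05)
```

On these ten genes the procedure keeps Genes 1 to 5 and drops the rest. The threshold grows with rank, so it is stricter than no correction and more lenient than Bonferroni.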

Put them together

Result Validation
– RT-PCR: most common method
– Genes at the borderline of differential expression: their measurability is reduced by random error
– For highly differentially expressed genes, having sufficient replicates would serve as validation

Biological Interpretation
