Mathematical Statistics, Centre for Mathematical Sciences

1 Mathematical Statistics, Centre for Mathematical Sciences
cDNA Microarrays - an introduction Henrik Bengtsson Bioinformatics Group Mathematical Statistics, Centre for Mathematical Sciences Lund University

2 Outline
- The Genomic Code
- The Central Dogma of Biology
- The cDNA Microarray Technique
- Data Analysis of cDNA Microarray Data
- Statistical Problems
- Take-home message

3 The Genomic Code
22+1 chromosome pairs; ... bp
120,000 genes? 80,000 genes? 35,000 genes? Or ?

4 The Central Dogma of Biology
DNA -> (transcription) -> RNA -> (translation) -> Protein
DNA: CCTGAGCCAACTATTGATGAA
RNA: CCUGAGCCAACUAUUGAUGAA
Protein: PEPTIDE
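The example on this slide can be reproduced in a few lines. The sketch below is illustrative only: the function names are made up here, the DNA string is assumed to be the coding strand (so transcription amounts to replacing T with U), and the codon table is deliberately partial, holding only the codons that occur in the slide's sequence.

```python
# Partial codon table (assumption: only the codons used in the slide's example).
CODON_TABLE = {
    "CCU": "P", "CCA": "P",  # proline
    "GAG": "E", "GAA": "E",  # glutamate
    "ACU": "T",              # threonine
    "AUU": "I",              # isoleucine
    "GAU": "D",              # aspartate
}


def transcribe(dna: str) -> str:
    """Return the mRNA with the same sequence as the given coding strand."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate an mRNA string codon by codon; trailing partial codons are ignored."""
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    # A KeyError here means the codon is missing from the partial table above.
    return "".join(CODON_TABLE[codon] for codon in codons)


dna = "CCTGAGCCAACTATTGATGAA"
rna = transcribe(dna)
protein = translate(rna)
print(dna, "->", rna, "->", protein)
```

A full implementation would use the complete 64-codon table and handle start and stop codons, which this sketch leaves out.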

5 The cDNA Microarray Technique
High-throughput: measures the expression of many genes at the same time.
Identify genes that behave differently in different cell populations:
- tumor cells vs healthy cells
- brain cells vs liver cells
- same tissue, different organisms
Time series experiments:
- gene expression over time after treatment
...

6 Example of a cDNA Microarray

7 Overview
Slides: cDNA clones (probes) -> PCR product amplification -> purification -> printing (0.1 nl / spot)
Samples: RNA from tumor sample and reference sample -> cDNA -> hybridize
Scanning: excitation (red laser, green laser) -> emission
Analysis: overlay images and normalise -> microarray analysis

8 Creating the slides

9

10 RNA Extraction & Hybridization
RNA from tumor sample and reference sample -> cDNA -> hybridize

11 Scanning & Image Analysis

12 Data Output

13 Biological question (differentially expressed genes, sample class prediction, etc.)
-> Experimental design
-> Microarray experiment
-> 16-bit TIFF files
-> Image analysis: (Rfg, Rbg), (Gfg, Gbg)
-> Normalization: R, G
-> Estimation, Testing, Clustering, Discrimination
-> Biological verification and interpretation

14 Data Transformation
"Observed" data $\{(R,G)\}_{n=1..5184}$:
R = red channel signal
G = green channel signal
(background corrected or not)
Transformed data $\{(M,A)\}_{n=1..5184}$:
$M = \log_2(R/G)$ (ratio)
$A = \log_2\big((R \cdot G)^{1/2}\big) = \tfrac{1}{2}\log_2(R \cdot G)$ (intensity signal)
$\Leftrightarrow \quad R = (2^{2A+M})^{1/2}, \quad G = (2^{2A-M})^{1/2}$
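The transformation on this slide and its inverse can be checked numerically. This is a minimal sketch using only the standard library; the function names `to_ma` and `from_ma` are chosen here for illustration and are not taken from the talk.

```python
import math


def to_ma(r: float, g: float) -> tuple[float, float]:
    """Map positive channel signals (R, G) to (M, A)."""
    m = math.log2(r / g)          # log-ratio
    a = 0.5 * math.log2(r * g)    # average log-intensity
    return m, a


def from_ma(m: float, a: float) -> tuple[float, float]:
    """Inverse map: R = (2^(2A+M))^(1/2), G = (2^(2A-M))^(1/2)."""
    r = 2.0 ** ((2.0 * a + m) / 2.0)
    g = 2.0 ** ((2.0 * a - m) / 2.0)
    return r, g


# A spot whose red signal is four times its green signal.
m, a = to_ma(4096.0, 1024.0)
r_back, g_back = from_ma(m, a)
print(m, a, r_back, g_back)
```

Both signals must be strictly positive, so spots whose background-corrected signal is zero or negative need separate handling before the transform.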

15 Normalization Biased towards the green channel & Intensity dependent artifacts

16 Replicated measurements
Scaled print-tip normalization Median Absolute Deviation (MAD) Scaling Averaging
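The slide does not spell out how the MAD scaling is done. The sketch below shows one common form, stated here as an assumption rather than as the method used in the talk: the M values of each print-tip group are rescaled so that every group ends up with the same median absolute deviation, namely the geometric mean of the groups' original MADs. The function names and the toy data are hypothetical.

```python
import math
from statistics import median


def mad(values: list[float]) -> float:
    """Median absolute deviation from the median (no consistency constant)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def mad_scale(groups: dict[str, list[float]]) -> dict[str, list[float]]:
    """Rescale each group's M values to a common MAD.

    Assumes every group has a strictly positive MAD and that the M values
    are already location-normalized (centered near zero) within each group.
    """
    mads = {name: mad(values) for name, values in groups.items()}
    # Geometric mean of the per-group MADs is the common target scale.
    target = math.exp(sum(math.log(s) for s in mads.values()) / len(mads))
    return {
        name: [v * target / mads[name] for v in values]
        for name, values in groups.items()
    }


# Toy data: two print-tip groups with different spread.
groups = {"tip1": [-1.0, 0.0, 1.0], "tip2": [-4.0, 0.0, 4.0]}
scaled = mad_scale(groups)
print(scaled)
```

Using the median and the MAD instead of the mean and the standard deviation keeps a handful of truly differentially expressed genes from dominating the scale estimate.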

17 Identification of differentially expressed genes
Extreme in M values? Extreme in T values? ...or extreme in some other statistic?

18 List of genes that the biologist can understand and verify with other experiments
Gene: Mavg Aavg T SE ...
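The slide lists the columns Mavg, Aavg, T and SE without defining them. A plausible reading, assumed here and not confirmed by the transcript, is that each gene has several replicated (M, A) measurements, SE is the standard error of the mean of M, and T is the one-sample t-statistic Mavg / SE. The function name and the example values are illustrative.

```python
import math
from statistics import mean, stdev


def summarize_gene(m_values: list[float], a_values: list[float]) -> dict[str, float]:
    """Per-gene summary over replicates (assumed definitions, see lead-in)."""
    n = len(m_values)
    m_avg = mean(m_values)
    a_avg = mean(a_values)
    se = stdev(m_values) / math.sqrt(n)   # standard error of the mean of M
    t = m_avg / se                        # large |T|: consistently non-zero log-ratio
    return {"Mavg": m_avg, "Aavg": a_avg, "SE": se, "T": t}


# Three hypothetical replicate measurements of one gene.
row = summarize_gene([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
print(row)
```

Ranking by T rather than by Mavg alone penalizes genes whose average log-ratio is large only because of one noisy replicate, at the cost of unstable T values when SE happens to be very small.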

19 Time Course Gene Expression Profiles

20 Statistical Problems
Image analysis:
- what is foreground?
- what is background?
Quality:
- which spots can we trust?
- which slides can we trust?
Artifacts from preparing the RNA, the printing, the scanning etc.
Data cleanup.
Normalization within an experiment:
- when few genes change.
- when many genes change.
- dye-swap to minimize dye effects.
Normalization between experiments:
- location and scale effects.
What is noise and what is variability?
Which genes are actually up- and down-regulated? P-values.
Planning of experiments:
- what is the best design?
- what is an optimal sample size?
Classification:
- of samples.
- of genes.
Clustering:
- of samples.
- of genes.
Time course experiments.
Gene networks:
- identification of pathways
...

21 Total microarray articles indexed in Medline
[Chart: number of papers per year; x-axis: Year, y-axis: Number of papers (100 to 600); the last value is projected.]

22 Acknowledgments/Collaborators
Statistics Dept, UC Berkeley: Sandrine Dudoit, Terry Speed, Yee Hwa Yang
Oncology Dept, Lund University: Pär-Ola Bendahl, Åke Borg, Johan Vallon-Christersson
Ernest Gallo Research Inst., California: Monica Moore, Karen Berger
Endocrinology, Lund University, Malmö: Leif Groop, Peter Almgren
Lawrence Berkeley National Laboratory: Saira Mian, Matt Callow
Mathematical Statistics, Chalmers University: Olle Nerman, Staffan Nilsson, Dragi Anevski
CSIRO Image Analysis Group, Melbourne: Michael Buckley

23 Take-home message Bioinformatics is the future!
More educated people are needed! Statistics is fun when it is applied! Master’s thesis project? Talk to us!

24 Finding genes in DNA sequence
“This is one of the most challenging and interesting problems in computational biology at the moment. With so many genomes being sequenced so rapidly, it remains important to begin by identifying genes computationally.” – Terry Speed.

25 The Central Dogma of Biology
DNA -> (transcription) -> RNA -> (translation) -> Protein
Challenges:
- DNA: sequencing, fragment assembly, gene finding, linkage analysis etc., homology searches, annotation
- RNA: isolation, sequencing, RNA structure prediction, gene expression (microarrays etc.)
- Protein: protein structure prediction, protein folding, homology searches, functional pathways, annotation

