Microarray II. What is a microarray Microarray Experiment RT-PCR LASER DNA “Chip” High glucose Low glucose.

Presentation transcript:

1 Microarray II

2 What is a microarray

3 Microarray Experiment RT-PCR LASER DNA “Chip” High glucose Low glucose

4 Raw data: images. Red (Cy5) dot: overexpressed or up-regulated. Green (Cy3) dot: underexpressed or down-regulated. Yellow dot: equally expressed. Intensity: “absolute” level. cDNA plotted microarray

5 Levels of analysis. Level 1: Which genes are induced / repressed? Gives a good understanding of the biology. Methods: factor-2 rule, t-test. Level 2: Which genes are co-regulated? Inference of function. Methods: clustering algorithms. Level 3: Which genes regulate others? Reconstruction of networks. Methods: transcription factor binding sites.

6 Level 1. 2-fold rule: Is a gene 2-fold up (or down) regulated? Student's t-test: Is the regulation significantly different from background variation? (Needs repeated measurements.)
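The two Level 1 checks can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the replicate values are hypothetical log2(red/green) ratios, and the t statistic is a one-sample test against a mean log ratio of 0 (no change). Turning the statistic into a p-value would need the t distribution with n - 1 degrees of freedom, which is left out here.

```python
import math
import statistics

def two_fold_changed(log2_ratios):
    """2-fold rule: the mean log2(red/green) lies at or beyond +/-1."""
    return abs(statistics.mean(log2_ratios)) >= 1.0

def one_sample_t(log2_ratios, mu0=0.0):
    """Student's t statistic for H0: the mean log ratio equals mu0."""
    n = len(log2_ratios)
    mean = statistics.mean(log2_ratios)
    sd = statistics.stdev(log2_ratios)  # sample SD (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical repeated measurements of one gene (log2 ratios).
replicates = [1.2, 0.9, 1.1, 1.3]
print(two_fold_changed(replicates), one_sample_t(replicates))
```

The t-test needs at least two replicates, which is why the slide notes that repeated measurements are required.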

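7 T-test. $X \sim N(\mu, \sigma^2)$. [Figure: regions labelled "Cannot reject $H_0$" and "Reject $H_0$".] The p-value is the probability, assuming the null hypothesis $H_0$ is true, of observing a result at least as extreme as the one measured. Rejecting $H_0$ only when $p < \alpha$ limits the probability of wrongly rejecting a true $H_0$ to $\alpha$.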

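8 Multiple testing. In a microarray experiment, we perform 1 test / gene. With per-test error rate $\alpha_c$, experiment-wide error rate $\alpha_e$, and $n$ tests: Prob(correct) $= 1 - \alpha_c$; Prob(globally correct) $= (1 - \alpha_c)^n$; Prob(wrong somewhere) $= 1 - (1 - \alpha_c)^n$, so $\alpha_e = 1 - (1 - \alpha_c)^n$. For small $\alpha_e$: $\alpha_c \approx \alpha_e / n$ (Bonferroni correction).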
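A short numerical check of the multiple-testing argument, as a sketch. The names `alpha_c` (per-test error rate) and `alpha_e` (experiment-wide error rate) and the figure of 6000 genes are assumptions chosen for illustration.

```python
def familywise_error(alpha_c, n):
    """Probability of at least one false rejection in n independent tests."""
    return 1.0 - (1.0 - alpha_c) ** n

def bonferroni_threshold(alpha_e, n):
    """Per-test threshold that keeps the experiment-wide error near alpha_e."""
    return alpha_e / n

n_genes = 6000  # hypothetical array size: one test per gene

# Testing every gene at 0.05 makes a false positive almost certain.
print(familywise_error(0.05, n_genes))

# The Bonferroni-corrected threshold keeps the overall error below 0.05.
alpha_c = bonferroni_threshold(0.05, n_genes)
print(alpha_c, familywise_error(alpha_c, n_genes))
```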

9 Multiple Experiments: Time course (Chu et al). Explore changes in gene expression during a biological process. mRNA is extracted at time points 0, 0.5, 2, 5, 7, 9, and 11 hours to compare expression profiles across time points. Compensate for array variability by using the 0 time point as common reference (green channel).

10 Experiment: time course Time 0 Genes Sample annotations Gene annotations Intensity (Red) Intensity (Green)

11 Experiment: time course Time 0.5 Genes Intensity (Red) Intensity (Green) Time 0

12 Experiment: time course. [Figure: genes by time points (0, 0.5, 2, 5, 7, 9, 11 hours), each sample paired with the time 0 reference.]

13 Gene expression database Genes Gene expression levels Samples Sample annotations Gene annotations Gene expression matrix

14 Gene expression database Samples Genes Gene expression matrix Timeseries, Conditions A, B, … Mutants in genes a, b … Etc.

15 Data normalization. Ratio: (expression of gene x in experiment i) / (expression of gene x in reference). The logarithm of the ratio treats induction and repression of identical magnitude as numerically equal but with opposite sign. red/green (ratio of expression): 2 = 2x overexpressed; 0.5 = 2x underexpressed. $\log_2(\text{red}/\text{green})$ (“log ratio”): 1 = 2x overexpressed; -1 = 2x underexpressed.
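The symmetry of the log ratio is easy to verify directly. A minimal sketch, with hypothetical channel intensities:

```python
import math

def log_ratio(red, green):
    """log2 of the red/green intensity ratio for one spot."""
    return math.log2(red / green)

# Hypothetical intensities: reference (green) fixed at 1000.
print(log_ratio(2000.0, 1000.0))  # 2x overexpressed
print(log_ratio(500.0, 1000.0))   # 2x underexpressed
print(log_ratio(1000.0, 1000.0))  # unchanged
```

On the raw ratio scale the same two changes are 2 and 0.5, which sit at unequal distances from 1; on the log scale they sit at equal distances from 0.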

16 Analysis of multiple experiments. Expression of gene x in m experiments can be represented by an expression vector with m elements. Z-transformation: if $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
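Applied to one gene's expression vector, the Z-transformation might look like the sketch below. The profile values are hypothetical, and using the population standard deviation (rather than the sample one) is an assumption; the slide does not say which is intended.

```python
import statistics

def z_transform(vector):
    """Rescale an expression vector to mean 0 and standard deviation 1."""
    mu = statistics.mean(vector)
    sigma = statistics.pstdev(vector)  # population SD (assumption)
    return [(x - mu) / sigma for x in vector]

# Hypothetical log ratios of one gene across m = 5 experiments.
profile = [0.0, 0.5, 1.5, 2.0, 1.0]
z = z_transform(profile)
print(z)
```

After the transformation, genes with the same profile shape but different amplitudes become directly comparable.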

17 Clustering. Hierarchical clustering: transforms the n (genes) * m (experiments) matrix into a symmetric n * n similarity (or distance) matrix. Similarity (or distance) measures: Euclidean distance, Pearson's correlation coefficient. Eisen et al. 1998 PNAS 95:14863-14868

18 Most Common Minkowski Metrics

19 An Example. [Figure: points x and y, separated by 4 along one axis and 3 along the other.]

20 Similarity Measures: Correlation Coefficient

21 [Figure: three panels of expression level over time for Gene A and Gene B.]

22 Clustering of Genes and Conditions Unsupervised: –Hierarchical clustering –K-means clustering –Self Organizing Maps (SOMs)

23 Clustering. Hierarchical clustering: transforms the n (genes) * m (experiments) matrix into a symmetric n * n similarity (or distance) matrix. Similarity (or distance) measures: Euclidean distance, Pearson's correlation coefficient. Eisen et al. 1998 PNAS 95:14863-14868

24 Distance Measures: Minkowski Metric. Suppose two objects $x$ and $y$ both have $m$ features: $x = (x_1, x_2, \ldots, x_m)$ and $y = (y_1, y_2, \ldots, y_m)$. The Minkowski metric is defined by $d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$.

25 Most Common Minkowski Metrics. 1. $r = 2$ (Euclidean distance): $d(x, y) = \sqrt{\sum_{i=1}^{m} |x_i - y_i|^2}$. 2. $r = 1$ (Manhattan distance): $d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$. 3. $r \to +\infty$ ("sup" distance): $d(x, y) = \max_{1 \le i \le m} |x_i - y_i|$.

26 An Example. [Figure: points x and y, separated by 4 along one axis and 3 along the other.]
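The three common Minkowski metrics can be checked on a small case. The sketch below assumes the example's two points differ by 4 along one axis and 3 along the other; the actual coordinates are not recoverable from the slide, so (0, 0) and (4, 3) are stand-ins.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r between two feature vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def sup_distance(x, y):
    """Limit of the Minkowski metric as r goes to infinity."""
    return max(abs(a - b) for a, b in zip(x, y))

# Assumed coordinates for the 4-by-3 example.
x, y = (0.0, 0.0), (4.0, 3.0)
print(minkowski(x, y, 2))   # Euclidean
print(minkowski(x, y, 1))   # Manhattan
print(sup_distance(x, y))   # "sup"
```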

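27 Similarity Measures: Correlation Coefficient. $s(x, y) = \dfrac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \, \sum_{i=1}^{m} (y_i - \bar{y})^2}}$, with averages $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$.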
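A direct implementation of the correlation coefficient, as a sketch. The three gene profiles are hypothetical and chosen to show that correlation compares the shape of two profiles and ignores their amplitude.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    den = math.sqrt(sum((a - x_bar) ** 2 for a in x)
                    * sum((b - y_bar) ** 2 for b in y))
    return num / den

# Hypothetical time courses.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, 10x amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]      # mirror image of gene_a

print(pearson(gene_a, gene_b))  # close to +1
print(pearson(gene_a, gene_c))  # close to -1
```

Here `gene_a` and `gene_b` are far apart in Euclidean distance yet perfectly correlated, which is why the choice of measure matters for clustering.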

28 Similarity Measures: Correlation Coefficient. [Figure: three panels of expression level over time for Gene A and Gene B.]

29 Distance-based Clustering. Assign a distance measure between data. Find a partition such that: distance between objects within a partition (i.e. same cluster) is minimized; distance between objects from different clusters is maximized. Issues: requires defining a distance (similarity) measure in situations where it is unclear how to assign it; what relative weighting to give to one attribute vs another?; the number of possible partitions is super-exponential.
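To see how fast the number of partitions grows, one can count the ways to split n objects into exactly k non-empty clusters (the Stirling numbers of the second kind). This is an illustrative aside rather than something the slides compute.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n objects into k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Object n either joins one of the k existing clusters
    # or forms a new cluster on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

print(stirling2(4, 2))
print(stirling2(10, 3))
print(stirling2(100, 5))  # already far too many to enumerate
```

This is why clustering methods rely on greedy or iterative heuristics instead of exhaustive search.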

30 Clustering of Genes and Conditions Unsupervised: –Hierarchical clustering –K-means clustering –Self Organizing Maps (SOMs)

31 Ordered dendrograms. Hierarchical clustering. Hypothesis: guilt-by-association. Common regulation -> common function. Eisen98

32 Hierarchical Clustering Techniques. At the beginning, each object (gene) is a cluster. In each subsequent step, the two closest clusters are merged into one cluster, until there is only one cluster left.

33 Hierarchical Clustering. Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process of hierarchical clustering is this: 1. Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. 2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster. 3. Compute distances (similarities) between the new cluster and each of the old clusters. 4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size n.

34 Merge two clusters by: Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities
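The agglomerative procedure and the three merge rules can be sketched as follows. This is a naive illustration, not an efficient implementation: it recomputes all pairwise distances at every step. The four points a to d sit on a line at hypothetical positions chosen so that no ties occur.

```python
import math
from itertools import combinations

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    pairwise = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbour
        return min(pairwise)
    if linkage == "complete":    # furthest neighbour
        return max(pairwise)
    return sum(pairwise) / len(pairwise)  # "average" (UPGMA)

def hierarchical(points, linkage="single"):
    """Merge the two closest clusters until one is left; return the merges."""
    clusters = [(name,) for name in points]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1],
                                                     points, linkage))
        height = cluster_distance(a, b, points, linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
    return merges

# Hypothetical coordinates (not taken from the slides).
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0), "d": (7.0, 0.0)}
for rule in ("single", "complete", "average"):
    print(rule, hierarchical(points, rule))
```

The merge order is the same for all three rules on this data, but the merge heights differ, so the resulting dendrograms have different branch lengths.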

35 Single-Link Method. [Figure: n*n distance matrix (Euclidean distance) for items a, b, c, d.] Merge steps: (1) a,b | c | d; (2) a,b,c | d; (3) a,b,c,d

36 Complete-Link Method. [Figure: distance matrix (Euclidean distance) for items a, b, c, d.] Merge steps: (1) a,b | c | d; (2) a,b | c,d; (3) a,b,c,d

37 Compare Dendrograms. [Figure: Single-Link and Complete-Link dendrograms side by side, distance axis from 0 to 6.]

38 Serum stimulation of human fibroblasts (24h). Cholesterol biosynthesis. Cell cycle. I-E response. Signalling / angiogenesis. Wound healing.

39 Partitioning k-means clustering Self organizing maps (SOMs)

40 k-means clustering Tavazoie et al. 1999 Nature Genet. 22:281-285

41 k-Means Clustering Algorithm. 1) Select an initial partition of k clusters. 2) Assign each object to the cluster with the closest centre. 3) Compute the new centres of the clusters. 4) Repeat steps 2 and 3 until no object changes cluster.
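The four steps translate almost line for line into code. In this sketch the initial partition is given implicitly through starting centres, which is an assumption about step 1; the points and centres are hypothetical.

```python
import math

def k_means(points, initial_centres, max_iter=100):
    """Plain k-means: returns (cluster index per point, final centres)."""
    centres = list(initial_centres)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centre.
        new_assignment = [
            min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centre as the mean of its members.
        for c in range(len(centres)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centre
                centres[c] = tuple(sum(axis) / len(members)
                                   for axis in zip(*members))
    return assignment, centres

# Two well-separated hypothetical groups, k = 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = k_means(points, [(0, 0), (10, 10)])
print(labels, centres)
```

The result depends on the starting centres, so in practice the algorithm is often run several times from different initialisations.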

42

43 1. centroid

44 2. centroid 3. centroid 4. centroid 5. centroid 6. centroid k = 6

45 1. centroid 2. centroid 3. centroid 5. centroid 6. centroid k = 6

46 1. centroid 2. centroid 3. centroid 4. centroid 5. centroid 6. centroid k = 6

47 Self organizing maps Tamayo et al. 1999 PNAS 96:2907-2912

48

49 1. centroid 2. centroid 3. centroid 4. centroid 5. centroid 6. centroid k = (2,3) = 6

50 k = 6

51

52

53 Cluster Co-regulation (DeRisi et al, 1997)

54 Cluster of co-expressed genes, pattern discovery in regulatory regions. Expression profiles. Retrieve upstream regions (600 base pairs). Pattern over-represented in cluster.
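One way to score whether a pattern is over-represented in a cluster is a hypergeometric tail probability. The sketch below is an assumption for illustration and is not necessarily the statistic used in the cited work; the function names and the toy sequences are hypothetical.

```python
from math import comb

def count_genes_with_pattern(pattern, upstream_seqs):
    """Number of upstream sequences containing the pattern at least once."""
    return sum(pattern in seq for seq in upstream_seqs)

def hypergeom_tail(k, cluster_size, total_with_pattern, n_genes):
    """P(at least k of cluster_size genes carry the pattern by chance),
    when total_with_pattern of all n_genes carry it."""
    upper = min(cluster_size, total_with_pattern)
    favourable = sum(
        comb(total_with_pattern, i)
        * comb(n_genes - total_with_pattern, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_genes, cluster_size)

# Toy upstream regions (hypothetical sequences).
cluster = ["TTACGCGTT", "AAAACCCC", "ACGCGAAA"]
print(count_genes_with_pattern("ACGCG", cluster))

# Chance of seeing the pattern in both genes of a 2-gene cluster
# when 2 of 4 genes overall carry it.
print(hypergeom_tail(2, 2, 2, 4))
```

A small probability means the pattern occurs in the cluster more often than expected from its frequency across all upstream regions.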

55 Some Discovered Patterns (Vilo et al. 2001)
Pattern              Probability  Cluster  No.  Total
ACGCG                6.41E-39     96       75   1088
ACGCGT               5.23E-38     94       52   387
CCTCGACTAA           5.43E-38     27       18   23
GACGCG               7.89E-31     86       40   284
TTTCGAAACTTACAAAAAT  2.08E-29     26       14   18
TTCTTGTCAAAAAGC      2.08E-29     26       14   18
ACATACTATTGTTAAT     3.81E-28     22       13   18
GATGAGATG            5.60E-28     68       24   83
TGTTTATATTGATGGA     1.90E-27     24       13   18
GATGGATTTCTTGTCAAAA  5.04E-27     18       12   18
TATAAATAGAGC         1.51E-26     27       13   18
GATTTCTTGTCAAA       3.40E-26     20       12   18
GATGGATTTCTTG        3.40E-26     20       12   18
GGTGGCAA             4.18E-26     40       20   96
TTCTTGTCAAAAAGCA     5.10E-26     29       13   18

56 Results. Over 6000 “interesting” patterns. Many came from homologous upstream regions and were removed, leaving 1500 patterns. These patterns clustered into 62 groups, for which alignments, consensus sequences, and profiles were found. Of the 62 clusters, 48 had patterns matching the SCPD database of experimentally mapped binding sites.

57 The " GGTGGCAA " Cluster

58 Clustering and promoter elements Harmer et al. 2000 Science 290:2110-2113

59 Two sided Clustering

60 Δ-Deletion mutations. Vector. Chromosomes. Homologous recombination.

61 Transcriptional profiling of mutants. Δ-Mutants. Genes.

62 Microarray and cancer. Alizadeh et al. 2000 Nature 403:503-511

63 Diffuse large B-cell lymphoma

64 Human tumor patient and normal cells; various conditions Cluster genes across tumors Classify tumors according to genes

65

66 Regulatory pathways: KEGG

67 Regulatory pathway reconstruction Ideker et al Science 2001

68

69 Perturbations. Selected genes are deleted. RNA is extracted from Δ-strains and from WT under +/- galactose conditions. Repeated measurements enable estimation of statistical significance. Compare data with model: design new experiments. Clustering: self-organizing maps. Protein-mRNA correlations. Network correlations: protein-DNA (promoter analysis), protein-protein.

70

71 Correlation mRNA – protein levels Mass-spectrometry

72 ICAT reagent Isotope coded affinity tags

73 ICAT procedure

74

75 Mapping of gene expression changes onto interaction network Yellow: Protein-DNA Blue: Protein-protein

76 Hierarchical clustering of Δ-perturbations

77 Conclusion Significance Database (matrix), data normalization Distances HCL, SOM, k-means Two-sided clustering Promoter elements Metabolic / regulatory pathways Deletion mutants ICAT technology; MS/MS

