Published by Octavia Flynn. Modified over 9 years ago.
1
baySeq homework
HS analysis: Out of 7388 genes with data,
-- 1995 genes were DE at FDR < 1%
-- 3158 genes were DE at FDR < 5%
-- 3,582 genes had an average fold-change > 2X (1.0 in log2 space)
2,669 (63%)
BUT HS + EtOH analysis (added 2 replicates of a new condition):
-- Only 1618 genes were DE (under any of the models) at an FDR of 5%
??? Why so few, when 3158 met this cutoff when HS was analyzed alone?
baySeq paper: it is harder to call DE with "more complex" models
2
How well did baySeq do on the HS-only analysis?
3158 genes at FDR < 0.05 (10K iterations on the prior calculation)
[Figure axes: HS log2 fold-change rep1, HS log2 fold-change rep2]
3
How well did baySeq do on the HS-only analysis?
[Figure axes: HS log2 fold-change rep1, HS log2 fold-change rep2]
902 genes had FDR > 5% but fold-change > 1.5X in both replicates
-- ~50% of these: low counts
-- Many of the remaining were missed due to day-to-day variation that is not accounted for without pairing the data
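The filter described on this slide (not called DE, yet changed more than 1.5X in both replicates) can be sketched in a few lines. This is only an illustration: the record layout, the gene names and the values are hypothetical, and it assumes that "fold-change > 1.5X" means an absolute log2 fold-change above log2(1.5) in the same direction in both replicates.

```python
import math

# 1.5X expressed in log2 space (about 0.585).
MIN_LOG2_FC = math.log2(1.5)

def missed_candidates(genes, fdr_cutoff=0.05, min_log2_fc=MIN_LOG2_FC):
    """Return genes above the FDR cutoff that still change in both replicates.

    `genes` is an iterable of (name, fdr, log2_fc_rep1, log2_fc_rep2) tuples;
    this layout is an assumption made for the sketch.
    """
    missed = []
    for name, fdr, lfc1, lfc2 in genes:
        same_direction = (lfc1 > 0) == (lfc2 > 0)
        big_in_both = min(abs(lfc1), abs(lfc2)) > min_log2_fc
        if fdr > fdr_cutoff and same_direction and big_in_both:
            missed.append(name)
    return missed

# Made-up example records, not data from the homework.
example = [
    ("geneA", 0.20, 1.0, 0.9),    # not called DE, up in both replicates
    ("geneB", 0.01, 2.0, 2.1),    # called DE, so not "missed"
    ("geneC", 0.30, 0.2, 1.5),    # only one replicate passes 1.5X
    ("geneD", 0.10, -1.2, -0.8),  # not called DE, down in both replicates
]
missed = missed_candidates(example)
```

Genes returned by such a filter are the ones worth inspecting for low counts or replicate-to-replicate (day-to-day) shifts.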
4
How well did baySeq do on the HS + EtOH analysis?
1618 genes at FDR < 0.05 for at least one DE model
Models:
NDE = 1,1,1,1,1,1
DEH = 1,1,2,2,1,1
DEE = 1,1,1,1,2,2
DEHE = 1,1,2,2,2,2
DEHE2 = 1,1,2,2,3,3
5
How well did baySeq do on the HS + EtOH analysis?
But 1391 genes had FDR > 0.05 for all DE models yet showed at least a 1.5X expression change in all 4 samples.
Why weren't these identified as DE?
218 of these genes were DE when HS was analyzed ALONE.
6
Assessing sensitivity (with VLOOKUP in Excel)
There were 64 known Hsf1 targets *with data* in the file.
My run identified 38 of those at an FDR of 0.01: 38/64 = 59.4% sensitivity
45 were identified at an FDR of 0.05: 45/64 = 70% sensitivity
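The same lookup can be done outside Excel with a set intersection. The sketch below is a minimal illustration; the identifiers are placeholders standing in for the 64 known Hsf1 targets, and only the counts (38 and 45 out of 64) come from the slide.

```python
def sensitivity(known_targets, called_genes):
    """Fraction of known targets that appear among the called (DE) genes."""
    known = set(known_targets)
    if not known:
        raise ValueError("need at least one known target")
    return len(known & set(called_genes)) / len(known)

# Placeholder IDs for the 64 known targets with data (hypothetical names).
known_targets = [f"target{i}" for i in range(64)]

# Pretend 38 targets (plus unrelated genes) pass FDR 0.01, and 45 pass FDR 0.05.
called_at_001 = known_targets[:38] + ["otherA", "otherB"]
called_at_005 = known_targets[:45] + ["otherA", "otherB", "otherC"]

sens_001 = sensitivity(known_targets, called_at_001)
sens_005 = sensitivity(known_targets, called_at_005)
print(f"FDR 0.01: {sens_001:.1%}; FDR 0.05: {sens_005:.1%}")
```

Genes called DE that are not known targets do not affect sensitivity; they only matter for specificity or precision.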
7
LAST TIME:
Gene X: X_1 X_2 X_3 (one value per array)
Array 1 = x coordinate, Array 2 = y coordinate, Array 3 = z coordinate
8
LAST TIME:
4. Centroid linkage clustering
'centroid' = average vector
9
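Gene X: X_1 X_2 X_3 X_4 X_5
Gene Y: Y_1 Y_2 Y_3 Y_4 Y_5
(one value per array: Array 1, Array 2, Array 3, Array 4, Array 5)
Sometimes, want to use the weighted Pearson correlation
For example: if arrays 3, 4 and 5 are identical, those data are over-represented 3X
$$ S_{x,y} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i}{\sqrt{\sum_{j=1}^{N} X_j^2 / N}} \right) \left( \frac{Y_i}{\sqrt{\sum_{j=1}^{N} Y_j^2 / N}} \right) $$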
10
Gene X: X_1 X_2 X_3 X_4 X_5
Gene Y: Y_1 Y_2 Y_3 Y_4 Y_5
(one value per array: Array 1, Array 2, Array 3, Array 4, Array 5)
Sometimes, want to use the weighted Pearson correlation
For example: if arrays 3, 4 and 5 are identical, those data are over-represented 3X
-- can weight experiments i = 3,4,5 by w = 0.33
$$ S_{x,y} = \frac{\sum_{i=1}^{N} w_i X_i Y_i \,/\, \sum_{i=1}^{N} w_i}{\sqrt{\sum_{i=1}^{N} w_i X_i^2 \,/\, \sum_{i=1}^{N} w_i}\; \sqrt{\sum_{i=1}^{N} w_i Y_i^2 \,/\, \sum_{i=1}^{N} w_i}} $$
Where $w_i = 1 / L_i$
k = array correlation cutoff
d = Pearson distance (= 1 - Pearson correlation)
n = exponent (usually 1)
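A small sketch of the down-weighting idea: when three arrays are identical and each gets weight 1/3, the weighted similarity should equal the unweighted similarity computed with that array counted once. The function name and the example values are hypothetical, and the code assumes the similarity is normalized by the sum of the weights and uses the values as given (no mean-centering); treat both as assumptions of this sketch rather than the exact formula from the slide.

```python
import math

def weighted_similarity(x, y, w=None):
    """Correlation-style similarity of two gene vectors with per-array weights.

    With w=None every array gets weight 1, which is the unweighted case.
    Values are used as given (no mean-centering); this is an assumption.
    """
    if w is None:
        w = [1.0] * len(x)
    if not (len(x) == len(y) == len(w)) or not x:
        raise ValueError("x, y and w must be non-empty and the same length")
    total = sum(w)
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / total
    sx = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)) / total)
    sy = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)) / total)
    return sxy / (sx * sy)

# Made-up log-ratios; arrays 3, 4 and 5 are identical copies.
gene_x = [1.0, -0.5, 2.0, 2.0, 2.0]
gene_y = [0.8, 0.4, 1.5, 1.5, 1.5]
weights = [1.0, 1.0, 1 / 3, 1 / 3, 1 / 3]

unweighted = weighted_similarity(gene_x, gene_y)
weighted = weighted_similarity(gene_x, gene_y, weights)
# Reference: the same genes with the repeated array counted only once.
collapsed = weighted_similarity(gene_x[:3], gene_y[:3])
```

Here `weighted` matches `collapsed`, while `unweighted` is pulled toward the value the repeated array supports.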
11
[Figure panels: Unweighted Pearson correlation; Weighted Pearson correlation]
13
Can also cluster array experiments based on global similarity in expression (Alizadeh et al. 2000)
14
Hierarchical trees of gene expression data are analogous to phylogenetic trees
[Tree figure, leaf order: A B C D F E]
-- Distance between genes is proportional to the total branch length between genes (not the distance on the y-axis)
-- Orientation of the nodes is irrelevant … although some clustering programs try to organize nodes in some way.
15
Hierarchical trees of gene expression data are analogous to phylogenetic trees
[Tree figures, leaf orders: A B C D F E and C F E D A B]
-- Distance between genes is proportional to the total branch length between genes (not the distance on the y-axis)
-- Orientation of the nodes is irrelevant … although some clustering programs try to organize nodes in some way.
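To see why node orientation carries no information, a tree can be written as nested pairs and two of its nodes flipped. The topology below is hypothetical (the slide's actual tree is not recoverable from the text); it was chosen only because flipping two nodes turns the leaf order A B C D F E into C F E D A B.

```python
def leaf_order(tree):
    """Left-to-right order of the leaves as drawn."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)

def topology(tree):
    """Orientation-free description of the tree: children as unordered sets."""
    if isinstance(tree, str):
        return tree
    return frozenset(topology(child) for child in tree)

# Hypothetical tree: (A,B) joined with (C,(D,(F,E))).
original = (("A", "B"), ("C", ("D", ("F", "E"))))

# Same tree with the root node and the D/(F,E) node flipped.
flipped = (("C", (("F", "E"), "D")), ("A", "B"))

print("".join(leaf_order(original)))  # leaf order of the first drawing
print("".join(leaf_order(flipped)))   # leaf order after flipping two nodes
same_tree = topology(original) == topology(flipped)
```

The two drawings list the leaves differently, but `same_tree` is true: neighbors in the leaf order are not necessarily the closest genes, only the branch structure matters.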
16
Genes involved in the same cellular process are often coregulated
These genes may not have the same annotation, but they still function together and are thus co-expressed
17
M choose i = # of possible groups of size i that can be formed from the M objects
$$ \binom{M}{i} = \frac{M!}{(M-i)! \, i!} $$
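As a quick check of the formula, the factorial expression can be compared with Python's built-in `math.comb`. The helper name `n_groups` is made up for this sketch, and the example numbers are arbitrary.

```python
import math

def n_groups(m, i):
    """Number of possible groups of size i drawn from m objects: M! / ((M-i)! * i!)."""
    if not 0 <= i <= m:
        raise ValueError("need 0 <= i <= m")
    return math.factorial(m) // (math.factorial(m - i) * math.factorial(i))

# Arbitrary example: groups of 2 out of 5 objects.
pairs_from_five = n_groups(5, 2)
agrees_with_builtin = all(
    n_groups(m, i) == math.comb(m, i) for m in range(12) for i in range(m + 1)
)
```

The count grows very quickly with M, which is why the number of possible gene groups in a genome-scale data set is so large.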
18
Advantages and Disadvantages of Hierarchical clustering
Advantages:
1) Straightforward
2) Captures biological information relatively well
Disadvantages:
1) Doesn't give discrete clusters … need to define clusters with cutoffs
2) Hierarchical arrangement does not always represent data appropriately -- sometimes a hierarchy is not appropriate: genes can belong only to one cluster.
3) Get different clustering for different experiment sets
THERE IS NO ONE PERFECT CLUSTERING METHOD
19
Partitioning (or top-down) clustering method: k-means clustering
-- Randomly split the data into k groups with equal numbers of genes
-- Calculate the centroid of each group
-- Reassign each gene to the centroid to which it is most similar
-- Calculate a new centroid for each group, reassign genes, etc. … iterate until stable
20
[Figure: the k-means steps above, with the centroids labeled]
22
k-means clustering
What are the disadvantages of k-means clustering?
- Need to know how many clusters to ask for (can define this empirically)
- Genes are not organized within each cluster (can hierarchically cluster genes afterwards or use SOM analysis)
- The random starting split makes this a nondeterministic method: repeated runs can give different clusters
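The partitioning steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes squared Euclidean distance as the measure of "most similar" (the slides do not fix a metric), keeps the previous centroid if a group becomes empty, and uses made-up two-dimensional data. The `seed` parameter is there to make the random starting split reproducible.

```python
import random

def centroid(vectors):
    """Average vector of a group of genes."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance (the similarity measure assumed here)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(data, k, seed=0, max_iter=100):
    if not 1 <= k <= len(data):
        raise ValueError("need 1 <= k <= number of genes")
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)
    # Step 1: random split into k groups of (nearly) equal size.
    labels = [0] * len(data)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: centroid of each group (an emptied group keeps its old centroid).
        for c in range(k):
            members = [v for v, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
        # Step 3: reassign each gene to its most similar centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in data
        ]
        # Step 4: stop once the assignment is stable.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Made-up expression vectors forming two obvious groups.
genes = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(genes, k=2, seed=1)
```

On less clearly separated data, changing `seed` can change the final clusters, which is the nondeterminism listed among the disadvantages; running several seeds and comparing the results is a common way to gauge stability.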