1 baySeq homework HS analysis: Out of 7388 genes with data, 1995 genes were DE at FDR <1%, 3158 genes were DE at FDR <5% There were 3,582 genes with an.

1 baySeq homework
HS analysis: Out of 7388 genes with data, 1995 genes were DE at FDR <1% and 3158 genes were DE at FDR <5%. There were 3,582 genes with an average fold-change >2X (1.0 in log2 space) 2,669 (63%)
BUT HS + EtOH analysis (added 2 replicates of a new condition): Only 1618 genes were DE (under any of the models) at an FDR of 5%
??? Why so few, when 3158 met this cutoff when HS was analyzed alone?
baySeq paper: harder to call DE with “more complex” models

2 How well did baySeq do on the HS only analysis?
3158 genes at FDR <0.05 (10K iterations on prior calculation)
[Plot axes: HS log2 fold-change rep1, HS log2 fold-change rep2]

3 How well did baySeq do on the HS only analysis?
[Plot axes: HS log2 fold-change rep1, HS log2 fold-change rep2]
902 genes had FDR >5% but a fold-change >1.5X in both replicates
-- ~50% of these: low counts
-- Many of the remaining genes were missed due to day-to-day variation that is not accounted for without pairing the data
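The filter described on this slide (FDR above the cutoff, yet a fold-change >1.5X in both replicates) can be sketched as follows. This is only an illustration: the gene names and values are made up, and it assumes "fold-change >1.5X" means a change in either direction that points the same way in both replicates.

```python
import math

LOG2_FC_CUTOFF = math.log2(1.5)  # 1.5X fold-change expressed in log2 space


def missed_by_fdr(genes, fdr_cutoff=0.05, log2_cutoff=LOG2_FC_CUTOFF):
    """Return genes with FDR above the cutoff but a consistent change in both replicates.

    genes maps a gene name to (fdr, log2fc_rep1, log2fc_rep2).
    """
    missed = []
    for name, (fdr, rep1, rep2) in genes.items():
        same_direction = rep1 * rep2 > 0  # assumption: both replicates up or both down
        big_change = abs(rep1) > log2_cutoff and abs(rep2) > log2_cutoff
        if fdr > fdr_cutoff and same_direction and big_change:
            missed.append(name)
    return missed


# Hypothetical toy data, not taken from the homework results.
toy = {
    "geneA": (0.20, 1.2, 0.9),    # not called DE, but changes in both replicates
    "geneB": (0.01, 2.0, 2.1),    # called DE, so not "missed"
    "geneC": (0.30, 0.8, 0.1),    # only one replicate passes the cutoff
    "geneD": (0.15, -1.0, -0.7),  # repressed in both replicates
}
print(missed_by_fdr(toy))
```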

4 How well did baySeq do on the HS + EtOH analysis?
1618 genes at FDR <0.05 for at least one DE model
Models:
NDE = 1,1,1,1,1,1
DEH = 1,1,2,2,1,1
DEE = 1,1,1,1,2,2
DEHE = 1,1,2,2,2,2
DEHE2 = 1,1,2,2,3,3
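To make the model vectors concrete, here is a small sketch that turns each vector into the sets of samples it treats as equivalent. The sample names and their order (two controls, two HS, two EtOH) are an assumption inferred from the model names; the slide does not state them.

```python
# Assumed sample order: the slide gives only the group labels, not the sample names.
SAMPLES = ["ctrl_1", "ctrl_2", "HS_1", "HS_2", "EtOH_1", "EtOH_2"]

MODELS = {
    "NDE": (1, 1, 1, 1, 1, 1),    # no differential expression
    "DEH": (1, 1, 2, 2, 1, 1),    # HS differs from everything else
    "DEE": (1, 1, 1, 1, 2, 2),    # EtOH differs from everything else
    "DEHE": (1, 1, 2, 2, 2, 2),   # HS and EtOH differ from control, but not from each other
    "DEHE2": (1, 1, 2, 2, 3, 3),  # control, HS, and EtOH all differ
}


def sample_groups(model):
    """Map each group label in a model vector to the samples sharing that label."""
    groups = {}
    for sample, label in zip(SAMPLES, model):
        groups.setdefault(label, []).append(sample)
    return groups


for name, model in MODELS.items():
    print(name, sample_groups(model))
```

Samples that share a label are modeled as coming from the same underlying distribution, so each additional model is another hypothesis competing for the same data.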

5 How well did baySeq do on the HS + EtOH analysis?
But, 1391 genes had FDR > 0.05 for all DE models but at least a 1.5X expression change in all 4 samples
Why weren’t these identified as DE?
218 of these genes were DE when HS was analyzed ALONE.

6 Assessing sensitivity (with VLOOKUP in Excel)
There were 64 known Hsf1 targets *with data* in the file.
My run identified 38 of those at an FDR of 0.01: 38/64 → 59.4% sensitivity
45 were identified at an FDR of 0.05: 45/64 → 70% sensitivity
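The same lookup can be done outside Excel. The sketch below computes sensitivity as the fraction of known targets that appear in the list of DE calls; the gene identifiers are placeholders, and only the counts (64 known, 38 and 45 found) come from the slide.

```python
def sensitivity(known_targets, called_de):
    """Fraction of known targets that were called DE (true positives / known positives)."""
    known = set(known_targets)
    return len(known & set(called_de)) / len(known)


# Placeholder identifiers standing in for the 64 known Hsf1 targets with data.
known = [f"target_{i}" for i in range(64)]
called_fdr_01 = known[:38] + ["some_other_gene"]  # 38 known targets found at FDR 0.01
called_fdr_05 = known[:45] + ["some_other_gene"]  # 45 known targets found at FDR 0.05

print(f"FDR 0.01: {sensitivity(known, called_fdr_01):.1%}")
print(f"FDR 0.05: {sensitivity(known, called_fdr_05):.1%}")
```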

7 LAST TIME:
Gene X: X1, X2, X3 (Array 1, Array 2, Array 3) = x coordinate, y coordinate, z coordinate

8 LAST TIME:
4. Centroid linkage clustering: ‘centroid’ (average vector)
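As a reminder of what "average vector" means, a centroid can be computed by averaging the expression vectors position by position. The function name and the toy values are illustrative only.

```python
def centroid(vectors):
    """Average vector: the mean of each array (column) across the given gene vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Two hypothetical genes measured on three arrays.
gene_x = [1.0, 2.0, 3.0]
gene_y = [3.0, 4.0, 5.0]
print(centroid([gene_x, gene_y]))
```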

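9 Gene X: X1 X2 X3 X4 X5 (Array 1, Array 2, Array 3, Array 4, Array 5)
Gene Y: Y1 Y2 Y3 Y4 Y5
Pearson correlation:
$$S_{x,y} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{X_i-\bar{X}}{\sigma_X}\right)\left(\frac{Y_i-\bar{Y}}{\sigma_Y}\right), \qquad \sigma_X = \sqrt{\sum_{i=1}^{N}\frac{(X_i-\bar{X})^2}{N}}, \quad \sigma_Y = \sqrt{\sum_{i=1}^{N}\frac{(Y_i-\bar{Y})^2}{N}}$$
Sometimes, want to use the weighted Pearson correlation
For example: if these arrays are identical, the data are over-represented 3X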
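The unweighted Pearson correlation on this slide can be sketched in a few lines. This assumes the centered form (each value has the gene's mean subtracted and is divided by the population standard deviation); the toy vectors are made up.

```python
import math


def pearson(x, y):
    """Centered Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    # Average product of the standardized values across the N arrays.
    return sum(((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
               for xi, yi in zip(x, y)) / n


print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # perfectly correlated profiles
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # perfectly anti-correlated profiles
```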

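10 Gene X: X1 X2 X3 X4 X5 (Array 1, Array 2, Array 3, Array 4, Array 5)
Gene Y: Y1 Y2 Y3 Y4 Y5
Sometimes, want to use the weighted Pearson correlation
For example: if these arrays are identical, the data are over-represented 3X -- can weight experiments i = 3,4,5 by w = 0.33
$$S_{x,y} = \frac{\sum_{i=1}^{N} w_i\left(\frac{X_i-\bar{X}}{\sigma_X}\right)\left(\frac{Y_i-\bar{Y}}{\sigma_Y}\right)}{\sum_{i=1}^{N} w_i}, \qquad \sigma_X = \sqrt{\sum_{i=1}^{N}\frac{(X_i-\bar{X})^2}{N}}, \quad \sigma_Y = \sqrt{\sum_{i=1}^{N}\frac{(Y_i-\bar{Y})^2}{N}}$$
Where $w_i = 1/L_i$, with $L_i$ defined in terms of:
k = array correlation cutoff
d = Pearson distance (= 1 - Pearson correlation)
n = exponent (usually 1)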
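A rough sketch of array weighting follows. Two things here are assumptions rather than content from the slide: the local density score L_i is taken to be the sum of ((k - d)/k)^n over all arrays j (including i itself) whose distance d to array i is below the cutoff k, as in Cluster-style array weighting; and the weighted correlation uses weighted means and variances throughout, which is the common textbook definition.

```python
import math


def density_weights(dist, k, n=1):
    """w_i = 1 / L_i, with L_i summed over arrays j whose distance d(i, j) is below k (assumed form)."""
    weights = []
    for row in dist:
        local_density = sum(((k - d) / k) ** n for d in row if d < k)
        weights.append(1.0 / local_density)
    return weights


def weighted_pearson(x, y, w):
    """Pearson correlation with per-array weights applied to means, variances, and covariance."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Hypothetical distances between 5 arrays: arrays 3, 4, 5 are identical (d = 0),
# every other pair is far apart (d = 1.0).
dist = [
    [0.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
]
w = density_weights(dist, k=0.5)
gene_x = [1.0, 3.0, 2.0, 2.0, 2.0]
gene_y = [2.0, 1.0, 5.0, 5.0, 5.0]
print(w)
print(weighted_pearson(gene_x, gene_y, w))
```

With these toy distances the three identical arrays each receive a weight of one third, so together they count as a single experiment, which matches the w = 0.33 example on the slide.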

11 [Figure panels: Unweighted Pearson correlation | Weighted Pearson correlation]


13 Can also cluster array experiments based on global similarity in expression (Alizadeh et al. 2000)


15 Hierarchical trees of gene expression data are analogous to phylogenetic trees
[Tree figure: leaf order A B C D F E; the same tree redrawn with leaf order C F E D A B]
-- Distance between genes is proportional to the total branch length between genes (not the distance on the y-axis)
-- Orientation of the nodes is irrelevant … although some clustering programs try to organize nodes in some way.
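The point about node orientation can be checked with a small sketch. The tree topology below is hypothetical: it is one arrangement that happens to produce both leaf orders shown on the slide (A B C D F E and C F E D A B), not necessarily the tree in the figure.

```python
def leaf_order(tree):
    """Left-to-right leaf order of a tree written as nested (left, right) tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)


def topology(tree):
    """Orientation-free form of the tree: each node is an unordered set of its children."""
    if isinstance(tree, str):
        return tree
    return frozenset(topology(child) for child in tree)


# Hypothetical topology; tree_2 is tree_1 with the root and the D node flipped.
tree_1 = (("A", "B"), ("C", ("D", ("F", "E"))))
tree_2 = (("C", (("F", "E"), "D")), ("A", "B"))

print(leaf_order(tree_1))
print(leaf_order(tree_2))
print(topology(tree_1) == topology(tree_2))
```

The two drawings list the leaves in different orders, yet they describe the same set of nested groupings, so they are the same clustering result.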

16 Genes involved in the same cellular process are often coregulated
These genes may not have the same annotation, but still function together and are thus co-expressed

17 M choose i = # of possible groups of size i that can be drawn from M objects
$$\binom{M}{i} = \frac{M!}{(M-i)!\; i!}$$
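The factorial formula translates directly into code. The function name is illustrative; Python's standard library also provides `math.comb` (Python 3.8+), which gives the same result.

```python
import math


def n_choose(m, i):
    """Number of distinct groups of size i that can be drawn from m objects."""
    return math.factorial(m) // (math.factorial(m - i) * math.factorial(i))


print(n_choose(5, 2))                      # groups of 2 from 5 objects
print(n_choose(5, 2) == math.comb(5, 2))   # agrees with the standard-library function
```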

18 Advantages and Disadvantages of Hierarchical clustering
Advantages:
1) Straightforward
2) Captures biological information relatively well
Disadvantages:
1) Doesn’t give discrete clusters … need to define clusters with cutoffs
2) Hierarchical arrangement does not always represent data appropriately -- sometimes a hierarchy is not appropriate: genes can belong only to one cluster.
3) Get different clustering for different experiment sets
THERE IS NO ONE PERFECT CLUSTERING METHOD

19 Partitioning (or top-down) clustering method: k-means clustering
-- Randomly split the data into k groups with equal numbers of genes
-- Calculate the centroid of each group
-- Reassign each gene to the centroid to which it is most similar
-- Calculate a new centroid for each group, reassign genes, etc … iterate until stable



22 Partitioning (or top-down) clustering method: k-means clustering
What are the disadvantages of k-means clustering?
- Need to know how many clusters to ask for (can define this empirically)
- Genes are not organized within each cluster (can hierarchically cluster genes afterwards or use SOM analysis)
- The random starting split makes this a non-deterministic method: different runs can give different clusters
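The k-means steps described for this method can be sketched in plain Python. This is a minimal illustration, not a production implementation: it assumes squared Euclidean distance as the similarity measure (the slides do not say which metric is used), and the toy points are made up.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, seed=None, max_iter=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    rng = random.Random(seed)
    # Step 1: randomly split the data into k groups of (nearly) equal size.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: centroid = average vector of each group (an empty group keeps its old centroid).
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
        # Step 3: reassign every point to its most similar (closest) centroid.
        new_labels = [min(range(k), key=lambda g: sq_dist(p, centroids[g]))
                      for p in points]
        # Step 4: iterate until the assignment is stable.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(points, k=2, seed=1)
print(labels)
```

Passing a fixed `seed` makes a run repeatable; leaving it as `None` shows the run-to-run variation, most visibly in which label number each cluster receives.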
