COMBO-17 Galaxy Dataset Colin Holden COSC 4335 April 17, 2012.


1 COMBO-17 Galaxy Dataset Colin Holden COSC 4335 April 17, 2012

2 Contains data on 3,462 objects classified as galaxies in the Chandra Deep Field South, a patch of sky in the constellation Fornax. The dataset has 65 columns, ranging from luminosities in 10 different bands of the spectrum to size and brightness. However, the website notes that the vast majority of these attributes are redundant and not independent. This analysis focuses on three main attributes:
– Rmag, the total R (red band) magnitude, is a measure of the brightness of the galaxy. Magnitudes are on an inverted logarithmic scale, so a galaxy with R=21 is 100 times brighter than one with R=26.
– ApDRmag is the difference between the total and aperture magnitudes in the R band. This is a rough measure of the size of the galaxy.
– rsMAG, which is the magnitude of the vector coming from the galaxy. Roughly a vector measurement of distance.
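The R=21 versus R=26 comparison can be checked with the standard magnitude relation, in which a difference of 5 magnitudes corresponds to a factor of 100 in brightness. The sketch below is only an illustration; the function name `brightness_ratio` is made up for this example and is not part of the dataset or the original slides.

```python
def brightness_ratio(mag_faint: float, mag_bright: float) -> float:
    """Return how many times brighter the second object is than the first.

    Magnitudes are inverted and logarithmic: every 5 magnitudes is a
    factor of 100 in brightness, so the ratio is 10 ** (0.4 * difference).
    """
    return 10 ** (0.4 * (mag_faint - mag_bright))


# The slide's example: R = 26 (fainter) compared with R = 21 (brighter).
ratio = brightness_ratio(26.0, 21.0)
print(f"R=21 is about {ratio:.0f} times brighter than R=26")
```

Because the scale is inverted, the galaxy with the smaller magnitude is the brighter one.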

3
 Nr    Rmag  ApDRmag   rsMAG  e.rsMAG   UbMAG  e.UbMAG   BbMAG  e.BbMAG   VnMAG  e.VbMAG  S280MAG  e.S280MA    W420FE  e.W420FE
  6  24.995    0.935  -17.97     0.25  -17.76     0.14  -17.53     0.25  -17.76     0.25   -18.22      0.17  6.60E-04  3.85E-03
  9  25.013   -0.135  -18.43     0.55  -18.36     0.22  -17.85     0.55  -18.19     0.55   -17.97      0.54  3.24E-04  3.19E-03
 16  24.246    0.821  -20.71     0.14  -19.82     0.14  -19.89     0.14   -20.4     0.14   -19.77      0.12  1.30E-02  4.11E-03
 21  25.203    0.639  -17.89     0.31  -17.92     0.17  -17.38     0.31  -17.67     0.31   -18.12      0.28  1.19E-02  2.70E-03
 26  25.504   -1.588  -19.88     0.83  -17.76     0.42  -18.35     0.83  -19.37     0.83   -13.93     45.11  1.35E-03  3.71E-03
 29   23.74   -1.636  -19.05     1.37   -19.3     0.16  -18.08     1.37  -18.69     1.37   -19.18      0.41  3.24E-03  3.02E-03
 45  25.706    0.199  -16.39     1.94  -17.19      0.3  -16.05     1.94  -16.22     1.94   -17.81      0.39  8.98E-03  2.88E-03
 49  25.139    -0.31  -17.32     1.81  -16.95     0.44  -16.46     1.81  -17.01     1.81   -14.37     14.19  4.36E-03  4.26E-03
 50  24.699    0.268   -18.1     0.32  -17.76     0.15  -17.66     0.15  -17.86     0.32   -17.86      0.23  1.44E-02  3.84E-03
 51  24.849    0.399   -11.6     0.14  -11.73     0.16  -11.13     0.19  -11.31     0.14   -12.22      0.21  2.00E-02  2.83E-03
 60  25.309     0.03  -17.93     0.64  -17.68     0.23  -16.96     0.64  -17.58     0.64   -17.75      0.49  4.52E-03  3.27E-03
 62  24.091    0.098  -14.68     0.12  -13.84     0.17  -13.97     0.15  -14.41     0.12   -13.85      0.38  4.75E-03  3.55E-03
 64  25.219    0.316  -18.97     0.28  -18.66     0.28  -18.48     0.28  -18.75     0.28   -18.74      0.19  7.46E-03  3.26E-03
 66  26.269    0.672  -12.19     0.31  -11.03      0.6  -10.28     1.08  -11.81     0.31   -10.16      2.93  1.45E-03  5.15E-03
 71  23.596   -0.084  -19.57     0.13  -17.82     0.11  -18.18     0.11  -19.11     0.13   -17.32      0.25  2.75E-03  3.17E-03
 72  23.204   -0.026  -20.51     0.15   -20.6     0.15  -20.13     0.15  -20.29     0.15   -20.96      0.11  4.87E-02  3.31E-03
 75  25.161   -0.028  -16.47     2.16  -17.97     0.28  -16.12     2.16   -16.3     2.16   -18.66      0.34  7.01E-03  3.68E-03
 83  22.884   -0.097  -19.93     0.14  -19.91      0.1  -19.41     0.14  -19.63     0.14   -20.24      0.11  4.31E-02  3.77E-03
 84  24.346   -0.046  -18.81     0.22  -18.21     0.14   -18.1     0.14  -18.58     0.22   -18.31       0.2  1.18E-02  3.39E-03
 87  25.453    0.159  -18.28     0.42  -17.86     0.23  -17.79     0.42  -18.06     0.42   -18.42      0.34  3.58E-03  2.93E-03
 88  25.911    0.787  -18.43     0.35  -17.41     0.22  -17.56     0.35  -18.11     0.35   -17.63      0.37  7.98E-03  4.69E-03
 89  26.004    0.662  -17.53     0.66  -17.47     0.24  -16.96     0.66  -17.29     0.66   -16.98      0.59  5.15E-03  2.65E-03
 91  26.803    0.471  -19.68     0.47  -16.74     0.56  -18.13     0.47  -19.17     0.47                     7.66E-04  3.34E-03
 95  25.204   -1.157  -20.13     0.53  -18.12     0.38  -18.63     0.53  -19.63     0.53   -18.01      1.19  4.54E-04  2.87E-03
 97  25.357    0.484  -17.84     0.38  -17.73      0.2  -17.47     0.38  -17.67     0.38   -18.29      0.25  1.26E-02  4.32E-03
105  24.117    0.066  -16.75     0.42  -16.74     0.22  -16.59     0.12  -16.83     0.12   -17.08      0.22  1.80E-02  4.11E-03
107  26.108    0.807  -15.83     1.58  -16.35     0.31  -16.37     0.31  -15.54     1.58   -16.64      0.41  8.76E-03  3.39E-03
108  24.909   -0.012  -18.14     0.33  -17.17     0.18  -17.19     0.18  -17.85     0.33   -17.06      0.42  2.93E-03  3.03E-03
117  24.474     -0.1  -19.43     0.26  -19.04     0.15  -18.94     0.26  -19.22     0.26   -19.26      0.24  1.44E-02  3.19E-03

4

5 At first glance, the data appeared to have some sort of linear relationship. Started with the Pearson correlation coefficient to test for such a relationship. The calculated coefficient was about 0.6789. Significance tests on the Pearson coefficient assume the data are normally distributed, which may not be the case here, but this was just a first step, and the data seem to have a roughly linear relationship. The brightness of the galaxy seems to decrease as the size grows.
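As a rough illustration of the calculation, the sketch below computes a Pearson correlation coefficient from scratch. It assumes the two attributes being compared are Rmag and ApDRmag, which the slides imply but do not state. The sample values are only the first five rows of the table on slide 3, so the result will not reproduce the 0.6789 figure obtained from the full dataset. The function name `pearson_r` is chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# First five rows of the slide 3 table (illustrative subset only).
rmag = [24.995, 25.013, 24.246, 25.203, 25.504]
apdrmag = [0.935, -0.135, 0.821, 0.639, -1.588]

r_sample = pearson_r(rmag, apdrmag)
print(f"Pearson r on the five-row sample: {r_sample:.4f}")
```

Since magnitude is an inverted scale, the sign of the coefficient has to be read with care: a larger Rmag means a fainter galaxy.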

6

7 K-Means Clustering. Attempt to break the data set into smaller data sets. The number of clusters was chosen to be 5. Had to cap the number of iterations spent trying to improve the centroid of each cluster. Initial centroids were chosen to be the first 5 records.
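The setup described on this slide (first K records as initial centroids, plus a cap on the number of iterations) might look like the following minimal sketch. This is not the tool used for the presentation, which the slides do not name. The function `kmeans`, its `max_iter` parameter, and the toy points are all assumptions made for illustration, and the toy run uses K=2 rather than 5 to keep the data small.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples.

    Initial centroids are the first k records; max_iter caps the number
    of refinement passes. Returns (labels, centroids).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[j])  # empty cluster: keep old centroid
        if updated == centroids:
            break  # converged before reaching the iteration cap
        centroids = updated
    return labels, centroids


# Toy data with two obvious groups (hypothetical values, not COMBO-17 rows).
toy_points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
              (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

Because the starting centroids are simply the first K records, the outcome depends on the order of the rows, which is one reason different runs or settings can produce different clusters. The same function works unchanged for points with three or more coordinates.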

8

9 Hierarchical Clustering. Chose to stop at 5 clusters for comparison with the K-Means results. Proximity measured using Euclidean distance. Used Ward's method to determine cluster similarity when merging clusters. Computationally expensive.
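To make the merging rule concrete, here is a naive sketch of agglomerative clustering with Ward's criterion: at each step it merges the pair of clusters whose union gives the smallest increase in within-cluster squared Euclidean distance, and it stops at a requested number of clusters. The function name `ward_clusters` and the toy points are assumptions for illustration, not the software used in the presentation. The exhaustive pair search also shows why the method is costly on thousands of objects.

```python
def ward_clusters(points, n_clusters):
    """Naive agglomerative clustering using Ward's merge cost.

    Returns a list of clusters, each a sorted list of point indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    # Each cluster is (size, centroid, member indices).
    clusters = [(1, tuple(p), [i]) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        # Check every pair of clusters; this is the expensive part.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, ca, _ = clusters[a]
                nb, cb, _ = clusters[b]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward's cost: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, ca, ma = clusters[a]
        nb, cb, mb = clusters[b]
        merged_centroid = tuple(
            (na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)
        )
        merged = (na + nb, merged_centroid, ma + mb)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(members) for _, _, members in clusters]


# Toy data (hypothetical values): two tight pairs and one distant point.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
result = ward_clusters(toy, n_clusters=3)
print(sorted(result))
```

In this toy run the distant point ends up in a cluster of its own instead of pulling a shared centroid toward itself.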

10

11 K-Means with 3 Variables. Wanted to see what results clustering on 3 variables would yield. Same parameters as the previous K-Means run. Chose brightness, size, and distance from Earth as the 3 variables. Difficult to present graphically.

12
Observation  Class  Distance to centroid
Obs1         1      0.864
Obs2         1      0.525
Obs3         2      1.648
Obs4         1      0.551
Obs5         2      2.159
Obs6         2      1.466
Obs7         1      1.754
Obs8         1      0.916
Obs9         1      0.627
Obs10        3      2.021
Obs11        1      0.202
Obs12        3      1.244
Obs13        1      0.894
Obs14        3      2.247
Obs15        2      0.303
Obs16        2      1.193
Obs17        1      1.644
Obs18        2      1.051
Obs19        2      0.901
Obs20        1      0.230
Obs21        1      0.941
Obs22        1      1.036
Obs23        1      2.198
Obs24        2      1.741
Obs25        1      0.431
Obs26        4      1.311
Obs27        1      2.494
Obs28        1      0.434
Obs29        2      0.657
Obs30        2      1.319
Obs31        4      1.676
Obs32        4      1.355
Obs33        5      1.052
Obs34        2      0.682
Obs35        3      1.295
Obs36        1      1.597
Obs37        1      0.580
Obs38        2      0.143
Obs39        1      0.115
Obs40        3      0.649
Obs41        1      0.861
Obs42        3      3.572
Obs43        2      0.895
Obs44        2      0.967
Obs45        2      1.277
Obs46        2      2.152
Obs47        4      1.040
Obs48        4      0.238
Obs49        4      2.020
Obs50        1      0.956
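A table like this one pairs each observation with its cluster and its Euclidean distance to that cluster's centroid. The sketch below shows one way such a row could be computed for a point with three variables. The function name `assign_to_centroid` and the centroid values are hypothetical; the slides do not list the actual centroids, and the values in the table may have been computed on scaled data.

```python
import math


def assign_to_centroid(point, centroids):
    """Return (class, distance) for the centroid nearest to the point.

    Classes are numbered from 1, matching the table's Class column.
    """
    distances = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest + 1, distances[nearest]


# Hypothetical 3-variable centroids and observation (not from the dataset).
example_centroids = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
cls, dist = assign_to_centroid((0.0, 0.0, 0.0), example_centroids)
print(f"Class {cls}, distance to centroid {dist:.3f}")
```

Observations with an unusually large distance to their centroid are natural candidates to inspect as possible outliers.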

13 Conclusions. Got to see how outliers affect the two clustering algorithms, AHC versus K-Means. K-Means was more sensitive to outliers. Also got to see how versatile cluster analysis can be, with many different options, e.g. the value of K, the number of attributes to compare, etc. – The many options can also be a downside of clustering, in that one small change can yield very different results.

14 Afterthoughts. I would have done another K-Means clustering analysis after removing the outliers from my original data, to see how the clusters and their centroids differ. I would also have experimented with different values of K and looked at the results.

