Weather Mining Hayato Akatsuka
Objective Cluster regions that share a similar climate.
Input Each weather station in the United States is an input. Each station contains more than 50 parameters, e.g. Latitude, Longitude, Elevation, Minimum Temperature, Maximum Temperature, and so on.
Stations 6000 ~ Stations
Overview Input (text file) → Clustering → Output (image)
Input (text file):
Station1 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …
Station2 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …
Station3 2005/01/01 MaxTemp MinTemp Latitude Longitude Elevation …
Distance Measure Euclidean distance. If you are interested in particular parameters, adjust k accordingly.
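The formula on this slide did not survive, so the following is only a minimal sketch of one plausible reading: a weighted Euclidean distance in which k is a per-parameter weight. The function name `weighted_euclidean` and the interpretation of k as a list of weights are assumptions, not something the slide confirms.

```python
import math


def weighted_euclidean(a, b, k=None):
    """Distance between two stations' parameter vectors.

    `k` is a hypothetical list of per-parameter weights (an assumption).
    With every weight equal to 1 this is the plain Euclidean distance;
    a larger weight makes that parameter count more, and 0 ignores it.
    """
    if k is None:
        k = [1.0] * len(a)
    if not (len(a) == len(b) == len(k)):
        raise ValueError("a, b and k must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(k, a, b)))
```

Under this reading, setting the weight of a parameter to 0 drops it from the comparison, which is how a run on a single parameter such as TMIN could be expressed.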
About Clustering
Day 1 (Hierarchical Clustering)
–This is an initialization stage.
–Pick a number of clusters.
–Then perform hierarchical clustering.
Day 2 (Clustering variant)
–For each input, assign it to the nearest centroid obtained from the previous day (Day 1 in this case).
–Do not update the centroids while assigning.
–Repeat until all the inputs for Day 2 are clustered.
–Recalculate the centroids.
Day 3
–Repeat the Day 2 procedure …
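The day-by-day procedure above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the linkage rule for Day 1 is not given, so centroid linkage is assumed; the recalculated centroid is assumed to be the mean of the current day's members; and the names `hierarchical_centroids` and `cluster_day` are hypothetical.

```python
def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance (enough for nearest-neighbour comparisons)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical_centroids(points, n_clusters):
    """Day 1: naive agglomerative clustering down to n_clusters.

    Assumption: the two clusters with the closest centroids are merged
    at each step (centroid linkage). Returns the final centroids.
    """
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def cluster_day(points, centroids):
    """Day 2 onward: assign every input to the nearest previous-day centroid.

    The centroids stay fixed during assignment and are recalculated only
    after all inputs are labelled. Assumption: the new centroid is the
    mean of this day's members; an empty cluster keeps its old centroid.
    """
    labels = [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        new_centroids.append(mean(members) if members else old)
    return labels, new_centroids
```

A run over several days would call `hierarchical_centroids` once on the Day 1 data and then feed each later day's data, together with the centroids returned for the previous day, into `cluster_day`. The pairwise search in Day 1 is deliberately naive and would be slow for thousands of stations.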
Centroid Calculation For the same cluster: 2nd Day: 3rd Day: 4th Day:
Quick Animation Day 1, Day 2
Result For simplicity, only one parameter (TMIN) is used. Number of clusters = 5
Comparison Output Hardiness Zone
Conclusion Well… there is not much difference between the map obtained for January and the one for December. Simply making a map out of annual data, instead of daily data, might be better.
Reference Hardiness Map up.cfm