Data Preprocessing
Missing data points: we need to choose whether to interpolate or to leave the data sequences fragmented.
Differences in sampling frequencies: resample the data at appropriate rates. We also need to know when this is feasible.
Pipeline stages: Preprocessing, Feature Extraction, Classification, Verification, Correlation
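The two preprocessing choices can be sketched as follows. This is only an illustration under stated assumptions: the helper names `fill_short_gaps` and `downsample_mean`, the `max_gap` cutoff, and the use of `None` for a missing reading are all hypothetical and not taken from the slides. Short gaps are interpolated linearly; longer gaps are left missing, so the sequence stays fragmented there. Resampling is shown only in the direction that is always feasible, averaging a faster series down to a slower rate.

```python
from typing import List, Optional


def fill_short_gaps(values: List[Optional[float]], max_gap: int) -> List[Optional[float]]:
    """Linearly interpolate runs of None that are at most max_gap samples long.

    Longer runs, and runs touching either end of the series, stay None,
    which leaves the sequence fragmented at those places.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this run
        while i < n and out[i] is None:
            i += 1
        end = i                        # first valid index after the run, or n
        if start == 0 or end == n or (end - start) > max_gap:
            continue                   # no valid neighbours, or gap too long
        left, right = out[start - 1], out[end]
        span = end - (start - 1)
        for k in range(start, end):
            out[k] = left + (right - left) * (k - (start - 1)) / span
    return out


def downsample_mean(values: List[float], factor: int) -> List[float]:
    """Average consecutive blocks of `factor` samples; a trailing partial block is dropped."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]
```

Going the other way (upsampling a slow sensor to a fast rate) adds no information, which is one reason the feasibility question matters.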
Algorithm Development
1. Define the time domain in which a surge in oxygen concentration is unexpected. Here, we define it as the time between half an hour after sunset and half an hour before sunrise.
2. A rise generally appears as a string of mostly positive gradients between data points. Since raw data often contains jitter, we need to pass the data through a low-pass filter.
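A minimal sketch of both steps, assuming sunset and sunrise times are already known for each night. The function names and the example timestamps are hypothetical. The slides do not say how "mostly positive" is quantified, so the second helper simply reports the fraction of positive gradients.

```python
from datetime import datetime, timedelta
from typing import Sequence


def in_night_window(t: datetime, sunset: datetime, sunrise: datetime,
                    margin: timedelta = timedelta(minutes=30)) -> bool:
    """True if t lies between (sunset + margin) and (next sunrise - margin)."""
    return sunset + margin <= t <= sunrise - margin


def fraction_rising(values: Sequence[float]) -> float:
    """Fraction of consecutive differences that are strictly positive."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if not diffs:
        return 0.0
    return sum(1 for d in diffs if d > 0) / len(diffs)


# Hypothetical sunset and sunrise times for one night.
sunset = datetime(2006, 7, 1, 20, 30)
sunrise = datetime(2006, 7, 2, 5, 15)
print(in_night_window(datetime(2006, 7, 2, 0, 0), sunset, sunrise))  # True
```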
Filter Selection
We want to smooth the data (remove high-frequency noise) by passing the raw data through a low-pass filter. We don't want to smooth out the feature of interest!
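The slides do not name the filter that was chosen. As a stand-in, here is a centred moving average, one of the simplest low-pass filters. The `window` parameter is an assumption: a wider window removes more jitter but also flattens the bump, which is the trade-off described above.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centred moving average; the window shrinks near the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


print(moving_average([0, 3, 0, 3, 0], 3))  # [1.5, 1.0, 2.0, 1.0, 1.5]
```

One way to check that the feature of interest survives is to compare the bump height before and after filtering for a few known examples.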
Algorithm Design
After filtering, we can use the gradients to find the local minima in our data set. Using these minima, we segment the data set and extract features such as volume and height from the curve. Given that we're trying to detect deviation from a negative slope, we need to choose a baseline appropriately.
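Finding minima from gradients and segmenting on them might look like the sketch below. It assumes the data has already been smoothed, and it uses a strict sign change (negative gradient followed by positive gradient), so flat plateaus are not reported as minima. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_minima(values: Sequence[float]) -> List[int]:
    """Indices where the gradient changes from negative to positive."""
    minima = []
    for i in range(1, len(values) - 1):
        before = values[i] - values[i - 1]
        after = values[i + 1] - values[i]
        if before < 0 and after > 0:
            minima.append(i)
    return minima


def segments_between_minima(values: Sequence[float]) -> List[Tuple[int, int]]:
    """Pairs (start, end) of consecutive local-minimum indices."""
    m = local_minima(values)
    return list(zip(m, m[1:]))


print(segments_between_minima([5, 4, 3, 4, 5, 4, 2, 3]))  # [(2, 6)]
```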
Feature Extraction: Baseline Selection
When the next minimum is above the previous minimum, use a linear interpolation for the baseline. When the next data point is below the previous minimum, assume the general decrease has taken over again and use a horizontal line as the baseline.
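The baseline rule and the two features can be sketched for a single segment as below. Several things here are assumptions: the segment is taken to end at the next minimum, volume is approximated as the sum of the excess over the baseline at unit sample spacing, and height is the largest excess. The names `baseline` and `bump_features` are hypothetical.

```python
from typing import List, Sequence, Tuple


def baseline(values: Sequence[float], start: int, end: int) -> List[float]:
    """Baseline over values[start..end], where start and end are minimum indices.

    Linear interpolation if the end point is above the start point,
    otherwise a horizontal line at the level of the start point.
    """
    y0, y1 = values[start], values[end]
    n = end - start
    if y1 > y0:
        return [y0 + (y1 - y0) * k / n for k in range(n + 1)]
    return [y0] * (n + 1)


def bump_features(values: Sequence[float], start: int, end: int) -> Tuple[float, float]:
    """Return (volume, height) of the curve above the baseline."""
    base = baseline(values, start, end)
    excess = [max(0.0, v - b) for v, b in zip(values[start:end + 1], base)]
    return sum(excess), max(excess)


print(bump_features([1.0, 3.0, 4.0, 3.0, 2.0], 0, 4))  # (5.5, 2.5)
print(bump_features([2.0, 3.0, 4.0, 3.0, 1.0], 0, 4))  # (4.0, 2.0)
```

The first call uses the interpolated baseline (the end point is higher); the second falls back to the horizontal baseline.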
Classification
With just the volume metric, we implemented a classification system based on a fixed threshold. This does not take into account lakes with characteristically different amplitudes; a lake with typically smaller daily variation has less chance of triggering the classifier.
Bump Height vs. Daily Amplitude
Now, we take into account the height of our feature relative to the range of the data over a window of days around the feature.
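One way to express this normalisation is the ratio of bump height to the range of the surrounding window, sketched below. The function name and the example readings are hypothetical, and the slides do not state the window length, so the caller supplies the window values.

```python
from typing import Sequence


def relative_height(height: float, window_values: Sequence[float]) -> float:
    """Bump height divided by the range (max - min) of the surrounding window."""
    amplitude = max(window_values) - min(window_values)
    if amplitude <= 0:
        return 0.0  # flat window: no meaningful amplitude to compare against
    return height / amplitude


# Two hypothetical lakes: same relative bump, very different absolute size.
print(relative_height(0.5, [8.0, 10.0, 9.0]))     # 0.25
print(relative_height(0.125, [8.0, 8.5, 8.25]))   # 0.25
```

With a fixed absolute threshold the second lake would be much less likely to trigger; on the relative scale the two bumps are equivalent.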
Expert Comparison
Using expert analysis, we have a set of ideal classifications to compare against. Our experts classified each day regarding the presence of the bump as either Yes, Maybe or No. These results were compared against the algorithm's Yes or No.
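A comparison of this kind could be tallied as in the sketch below. The slides do not say how the expert's Maybe was scored against the algorithm's two-way answer, so the `maybe_as` parameter is an assumption that makes the choice explicit: count Maybe as Yes, as No, or skip those days. The labels in the example are invented for illustration.

```python
from typing import Optional, Sequence, Tuple


def agreement(expert: Sequence[str], algorithm: Sequence[str],
              maybe_as: Optional[str] = None) -> Tuple[int, int]:
    """Return (days in agreement, days compared).

    maybe_as: "Yes" or "No" to recode expert "Maybe" days, or None to skip them.
    """
    agree = 0
    total = 0
    for e, a in zip(expert, algorithm):
        if e == "Maybe":
            if maybe_as is None:
                continue
            e = maybe_as
        total += 1
        if e == a:
            agree += 1
    return agree, total


expert = ["Yes", "Maybe", "No", "Yes"]      # hypothetical expert labels
algo = ["Yes", "Yes", "No", "No"]           # hypothetical algorithm output
print(agreement(expert, algo))              # (2, 3)
print(agreement(expert, algo, "Yes"))       # (3, 4)
```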
Results
Lake            Year  Days Detected  Total Days  % Days
Crystal Bog     06    33             43          76.74
Ormajarvi       06    11             12          91.67
Sparkling Lake  04    120            168         71.43
Sparkling Lake  06    28             45          62.22
Sunapee         06    40             60          66.67
Taihu           06    28             32          87.50
Trout Bog       04    152            183         83.06
Trout Bog       05    67             94          71.28
Trout Russ      06    18             32          56.25
TOTAL                 497            669         74.29
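The percentages and totals in the results can be re-derived from the day counts, as a quick consistency check. The counts below are copied from the results above; the variable names are only for illustration.

```python
# (lake, year, days detected, total days, reported % days)
rows = [
    ("Crystal Bog", "06", 33, 43, 76.74),
    ("Ormajarvi", "06", 11, 12, 91.67),
    ("Sparkling Lake", "04", 120, 168, 71.43),
    ("Sparkling Lake", "06", 28, 45, 62.22),
    ("Sunapee", "06", 40, 60, 66.67),
    ("Taihu", "06", 28, 32, 87.50),
    ("Trout Bog", "04", 152, 183, 83.06),
    ("Trout Bog", "05", 67, 94, 71.28),
    ("Trout Russ", "06", 18, 32, 56.25),
]

# Recompute each percentage from the counts and compare with the reported value.
mismatches = [
    (lake, year) for lake, year, detected, total, reported in rows
    if round(100 * detected / total, 2) != reported
]

total_detected = sum(r[2] for r in rows)
total_days = sum(r[3] for r in rows)
overall_pct = round(100 * total_detected / total_days, 2)

print(mismatches, total_detected, total_days, overall_pct)
```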
The Future
We now have a method to objectively detect the presence of the midnight surge. Our new question is: why is there a surge on some days and not on others? To answer this question, we have to look at other types of data readings, for example water temperature, wind speed, PAR, etc.
The Future
Search for hidden correlations between other data types and our feature to formulate or validate a hypothesis.
[Figure: Sparkling Lake 04. Dissolved oxygen concentration; water temperatures by thermistor sensor depth.]
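As a starting point for such a search, one could compute a correlation coefficient between a per-day feature value (for example bump height) and a per-day summary of another reading. The slides do not specify a method; Pearson correlation is used here only as a common first check, and the function name and numbers in the example are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-day values: bump height vs. mean night wind speed.
bump_height = [0.10, 0.30, 0.20, 0.40]
wind_speed = [4.0, 2.0, 3.0, 1.0]
print(round(pearson(bump_height, wind_speed), 3))  # -1.0
```

A strong coefficient would only suggest a hypothesis to test; it would not by itself explain why the surge appears on some days.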