Identifying Severe Weather Radar Characteristics


1 Identifying Severe Weather Radar Characteristics
Ron Holmes NWS State College PA

2 Strategy
- Identify differences between the original training set and the original test set
- Prepare training and test files, with known classifications, from the original training data set
- WEKA:
  - Histograms
  - Identify attributes
  - Decision tree
  - Neural net
- Develop and train a neural net for real-time operations

3 Methods
- Identify differences between the original training set and the original test set to:
  - Eliminate potential outliers in the training set
  - Eliminate suspect or bad data due to measurement errors or misclassification in the training set
- Data changes:
  - Eliminated Lat and Lon of centroid for all instances
  - Changed all missing data to 0
- Determine outliers:
  - Computed min, max, mean, and STD for each predictor in the original training and test files
  - Eliminated 4 instances in the training file where a few predictors were over 2 STDs from the test-file mean for that predictor
- Contest based on the highest TSS on the test set
  - Objective: bring the mean and STD of each predictor in the training file as close as possible to those in the test file
  - Without the original radar data it would be difficult to correct for misclassifications in the training set
  - Train the system for the highest test results (….not necessarily reality)
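The outlier screen described above (drop training instances whose predictors sit more than 2 standard deviations from the test file's mean for that predictor) can be sketched as below. The helper name `filter_outliers` and the row-of-floats layout are assumptions for illustration, not the author's actual code.

```python
import statistics

def filter_outliers(train_rows, test_rows, n_std=2.0):
    """Keep only training rows whose every predictor lies within
    n_std standard deviations of that predictor's mean in the
    TEST file (the screening rule from the slide)."""
    n_cols = len(test_rows[0])
    means = [statistics.mean(r[i] for r in test_rows) for i in range(n_cols)]
    stds = [statistics.pstdev(r[i] for r in test_rows) for i in range(n_cols)]
    kept = []
    for row in train_rows:
        # Skip constant predictors (std == 0) when testing the threshold.
        if all(abs(row[i] - means[i]) <= n_std * stds[i]
               for i in range(n_cols) if stds[i] > 0):
            kept.append(row)
    return kept
```

A usage note: with a population standard deviation of about 1.41 for test values [1..5], a training value of 10 falls well outside the 2-STD band and is dropped, while 3 (the mean) is kept.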

4 Methods
- Created separate training and test files from the original data file: 70% for training and 30% for testing, randomly stratified
- Due to the uneven distribution of predictands, kept the same class ratio in both the training and test sets

Storm Type   Total in original file   Training file (70%)   Testing file (30%)
Non-Svr      526                      368                   158
Isold        222                      156                   66
Line         208                      146                   62
Pulse        400                      280                   120

- Resulting skill scores reflected the uneven distribution (as with the original study): higher skill in predicting Non-Svr and Pulse, lowest skill in predicting Lines
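A stratified 70/30 split like the one in the table can be sketched as follows; `stratified_split` and the fixed seed are illustrative assumptions, not the contest code.

```python
import random

def stratified_split(rows, labels, train_frac=0.7, seed=42):
    """Randomly stratified split: each class contributes the same
    train/test ratio, so class proportions match in both files."""
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    rng = random.Random(seed)
    train, test = [], []
    for lab, members in by_class.items():
        rng.shuffle(members)                      # random assignment within class
        cut = round(len(members) * train_frac)    # 70% of this class to training
        train += [(r, lab) for r in members[:cut]]
        test += [(r, lab) for r in members[cut:]]
    return train, test
```

With 70 instances of one class and 30 of another, this yields 49 + 21 training and 21 + 9 testing instances, preserving the 70/30 class ratio in both files.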

5 Methods
- Weka workbench (http://www.cs.waikato.ac.nz/ml/weka/): a variety of tools for attribute selection, decision trees, logistic regression, and neural networks
- Attribute selection:
  - Histogram analysis
  - Various algorithms to rank important predictors: Chi-Squared, Genetic Search, Gain Ratio, Support Vector Machine
- Goal: reduce the number of predictors for faster training
- Performed sensitivity experiments with the decision tree/neural net to identify which predictors maximize skill scores
- Main predictors (11 out of the original 20): Max Ref, Mean Ref, Max VIL, Aspect Ratio, MESH, LatRadius, LonRadius, LifePOSH, Shear, Rot120, Motion South
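The chi-squared ranking mentioned above scores how strongly a (discretised) predictor deviates from independence with the class label; higher scores mean more relevant attributes. The sketch below is a minimal stand-in for that idea, assuming simple equal-width binning — it is not WEKA's implementation.

```python
from collections import Counter

def chi_squared_score(values, labels, n_bins=4):
    """Chi-squared statistic of one numeric attribute (equal-width
    binned) against the class label; larger = more predictive."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard constant attributes
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    obs = Counter(zip(bins, labels))            # observed cell counts
    bin_tot = Counter(bins)                     # row (bin) marginals
    lab_tot = Counter(labels)                   # column (class) marginals
    score = 0.0
    for b in bin_tot:
        for lab in lab_tot:
            e = bin_tot[b] * lab_tot[lab] / n   # expected under independence
            o = obs.get((b, lab), 0)
            score += (o - e) ** 2 / e
    return score
```

Ranking all 20 predictors by this score and keeping the top 11 mirrors the reduction described on the slide.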

6 Good Predictors
[Four panels of good-predictor histograms, one per storm type: Non-Svr, Pulse, Isold, Line]

7 943 correctly classified: 69.5%; 413 incorrectly classified: 30.5%
[Annotated decision-tree diagram; branch notes: majority of Pulse storms; majority of Line storms identified by their shape (high-buoyancy, high-shear environment); suspected misclassification due to low MaxVIL; high-buoyancy, weak-shear environment; slow-moving Pulse storms; majority of Non-Svr storms]

8 Non-Linear Relationships by Storm Type
[Scatter plots: Max Reflectivity (x-axis) vs. Max VIL (y-axis); MESH (x-axis) vs. Max VIL (y-axis); Low-Lvl Shear (x-axis) vs. Mean Ref (y-axis)]

9 Neural Network
- Standard feed-forward/backpropagation ANN
- 2 hidden layers with an adjustable number of nodes
- Logistic sigmoid activation function; winner-takes-all classification
- Adjustable: iterations, learning rate, momentum, weight decay
- Split the original training file: 70% for training and 30% for testing
  - Instances randomly stratified and proportioned between the training and test files according to the number of instances of each predictand
  - Same method as was used for the decision tree preparation
- Inputs to the ANN were normalized according to:
  Normalized value = (value - min) / (max - min)
  where min and max are the minimum and maximum values of each predictor
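The min-max normalization formula above can be applied per predictor column as in this sketch (the helper name `minmax_normalize` is an assumption; the formula is the one on the slide):

```python
def minmax_normalize(columns):
    """Scale each predictor column to [0, 1] using
    (value - min) / (max - min) from that column."""
    out = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0   # guard: constant column maps to 0.0
        out.append([(v - lo) / span for v in col])
    return out
```

Note that min and max come from the training data; applying the same training-derived min/max to new instances keeps the network's inputs on a consistent scale.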

10 ANN Results using original file split into training and test sets

Training (TSS: 0.67)                 Testing (TSS: 0.70)
          POD   FAR   CSI                      POD   FAR   CSI
Non-Svr  0.86  0.14  0.75            Non-Svr  0.88  0.12  0.78
Isold    0.75  0.28  0.57            Isold    0.71  0.27  0.55
Line     0.56  0.25  0.46            Line     0.51  0.25  0.43
Pulse    0.75  0.31                  Pulse    0.80  0.29  0.60

Best network: 600 iterations; hidden nodes layer 1: 20; hidden nodes layer 2: 10; learning rate: 0.08; momentum: 0.3

Training confusion matrix (forecast class codes: 0 = Non-Svr, 1 = Isold, 2 = Line, 4 = Pulse):
  obs Non-Svr (0): POD 86.3%; obs Isold (1): POD 75.0%; obs Line (2): POD 56.1%; obs Pulse (4): POD 75.2%
  Accuracy: 85.8% / 71.7% / 74.2% / 68.8%; TSS: 0.67

Testing confusion matrix:
  obs Non-Svr (0): POD 88.5%; obs Isold (1): POD 71.2%; obs Line (2): POD 51.6%; obs Pulse (4): POD 80.7%
  Accuracy: 87.4% / 72.3% / 74.4% / 70.1%; TSS: 0.70

Confusion matrix from the entry on the contest test file:
  obs Non-Svr (0): POD 57.1%; obs Isold (1): POD 76.3%; obs Line (2): POD 30.5%; obs Pulse (4): POD 74.5%
  Accuracy: 94.2% / 58.5% / 31.2% / 44.3%; TSS: 0.45
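The POD, FAR, CSI, and TSS scores reported above follow the standard categorical-verification definitions; the sketch below computes them from a confusion matrix via a one-vs-rest reduction (the function names are illustrative, and the multi-category TSS shown is the standard Peirce/Hanssen-Kuipers form, not necessarily the contest's exact scoring code).

```python
def class_scores(cm, k):
    """POD, FAR, CSI for class k from a square confusion matrix
    (rows = observed class, columns = forecast class)."""
    n = len(cm)
    hits = cm[k][k]
    misses = sum(cm[k][j] for j in range(n)) - hits         # observed k, forecast other
    false_alarms = sum(cm[i][k] for i in range(n)) - hits   # forecast k, observed other
    pod = hits / (hits + misses) if hits + misses else 0.0
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else 0.0
    denom = hits + misses + false_alarms
    csi = hits / denom if denom else 0.0
    return pod, far, csi

def tss(cm):
    """Multi-category True Skill Statistic (Peirce skill score)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    acc = sum(cm[i][i] for i in range(k)) / n
    obs_marg = [sum(cm[i]) / n for i in range(k)]
    fcs_marg = [sum(cm[i][j] for i in range(k)) / n for j in range(k)]
    num = acc - sum(o * f for o, f in zip(obs_marg, fcs_marg))
    den = 1.0 - sum(o * o for o in obs_marg)
    return num / den if den else 0.0
```

For a 2-class matrix this TSS reduces to POD minus the probability of false detection, which is a quick sanity check on the implementation.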

11 ANN Results using original training file and contest test file

Training (TSS: 0.65)                 Testing (TSS: 0.65)
          POD   FAR   CSI                      POD   FAR   CSI
Non-Svr  0.86  0.14  0.75            Non-Svr  0.86  0.06  0.81
Isold    0.70  0.29  0.54            Isold    0.64  0.34  0.48
Line     0.50  0.31  0.40            Line     0.16  0.42  0.14
Pulse    0.73  0.33  0.53            Pulse    0.82  0.41  0.52

Best network: 2000 iterations; hidden nodes layer 1: 14; hidden nodes layer 2: 3; learning rate: 0.04; momentum: 0.3

Training confusion matrix (forecast class codes: 0 = Non-Svr, 1 = Isold, 2 = Line, 4 = Pulse):
  obs Non-Svr (0): POD 86.5%; obs Isold (1): POD 70.6%; obs Line (2): POD 50.1%; obs Pulse (4): POD 73.9%
  Accuracy: 85.1% / 70.2% / 68.8% / 66.1%; TSS: 0.65

Testing confusion matrix:
  obs Non-Svr (0): POD 86.5%; obs Isold (1): POD 64.9%; obs Line (2): POD 16.8%; obs Pulse (4): POD 82.8%
  Accuracy: 93.4% / 65.9% / 57.1% / 59.0%

Confusion matrix of the official contest test file:
  obs Non-Svr (0): POD 87.5%; obs Isold (1): POD 62.6%; obs Line (2): POD 43.2%; obs Pulse (4): POD 61.3%
  Accuracy: 88.8% / 61.2% / 49.4% / 57.7%; TSS: 0.58

12 Conclusions
- Improving classifications and bad/missing data in the main data set:
  - A group of meteorologists to determine storm type from the original radar data
  - Better quality control (e.g., low MaxVIL values reported for severe storms)
  - Fill in missing values
- Uneven distribution of predictands:
  - Better skill with a large number of instances (Non-Svr and Pulse)
  - Poor skill with few training instances (Line storms)
  - Provide an equal distribution of training instances for each predictand
- The decision tree was about equal in skill to the original study, and the neural net was an improvement (based on the answer key); possibly the neural net outperforms the decision tree because of the non-linear relationships
- Before entry: the neural net implementation "appeared" to show more skill than the original study on the training and test files made from the original data set; the contest entry itself was not that great
- Re-trained the NN with the original, full training file and tested it with the contest test file: improved skill over the baseline test set. TSS training: 0.65; TSS testing:

