© NERC All rights reserved Storms rare but important Balance dataset otherwise storms look like noise Features selected like Split: training set, validation.

Slides:



Advertisements
Similar presentations
Robin Hogan Ewan OConnor University of Reading, UK What is the half-life of a cloud forecast?
Advertisements

TNO orbit computation: analysing the observed population Jenni Virtanen Observatory, University of Helsinki Workshop on Transneptunian objects - Dynamical.
Coefficient Path Algorithms Karl Sjöstrand Informatics and Mathematical Modelling, DTU.
Psychology 202b Advanced Psychological Statistics, II February 1, 2011.
BA 555 Practical Business Analysis
Chapter 12 - Forecasting Forecasting is important in the business decision-making process in which a current choice or decision has future implications:
Linear Regression Models Based on Chapter 3 of Hastie, Tibshirani and Friedman Slides by David Madigan.
Data Freshman Clinic II. Overview n Populations and Samples n Presentation n Tables and Figures n Central Tendency n Variability n Confidence Intervals.
Modeling Gene Interactions in Disease CS 686 Bioinformatics.
© Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.
General Mining Issues a.j.m.m. (ton) weijters Overfitting Noise and Overfitting Quality of mined models (some figures are based on the ML-introduction.
Introduction The primary geomagnetic storm indicator is the Dst index. This index has a well established ‘recipe’ by which ground-based observations are.
1 BA 555 Practical Business Analysis Review of Statistics Confidence Interval Estimation Hypothesis Testing Linear Regression Analysis Introduction Case.
Today Evaluation Measures Accuracy Significance Testing
Evaluating Classifiers
Traffic modeling and Prediction ----Linear Models
Anomaly detection Problem motivation Machine Learning.
Creating Empirical Models Constructing a Simple Correlation and Regression-based Forecast Model Christopher Oludhe, Department of Meteorology, University.
ACCURATE TELEMONITORING OF PARKINSON’S DISEASE SYMPTOM SEVERITY USING SPEECH SIGNALS Schematic representation of the UPDRS estimation process Athanasios.
© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall7-1 Chapter 7: Forecasting.
Forecast Accuracy and Model Choice Keith Ord Georgetown University.
© Crown copyright Met Office Forecasting Icing for Aviation: Some thoughts for discussion Cyril Morcrette Presented remotely to Technical Infra-structure.
Oceanography 569 Oceanographic Data Analysis Laboratory Kathie Kelly Applied Physics Laboratory 515 Ben Hall IR Bldg class web site: faculty.washington.edu/kellyapl/classes/ocean569_.
Exploration Physics International, Inc. HAF: An Operational, Event-driven Solar Wind Forecast Model 1 Murray Dryer
The SWENET Online Archive: 10 years of a European Space Weather Community Resource H. Laurens¹, A. Glover¹², JP. Luntama¹, E.Amata³, E.Clarke ⁴, P. Beltrami.
Educational Research: Competencies for Analysis and Application, 9 th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Kp Forecast Models S. Wing 1, Y. Zhang 1, and J. R. Johnson 2 1 Applied Physics Laboratory, The Johns Hopkins University 2 Princeton Plasma Physics Laboratory,
Multiple Regression The Basics. Multiple Regression (MR) Predicting one DV from a set of predictors, the DV should be interval/ratio or at least assumed.
Heidke Skill Score (for deterministic categorical forecasts) Heidke score = Example: Suppose for OND 1997, rainfall forecasts are made for 15 stations.
Latest results in verification over Poland Katarzyna Starosta, Joanna Linkowska Institute of Meteorology and Water Management, Warsaw 9th COSMO General.
© NERC All rights reserved Chris Turbitt Observatories Manager Geomagnetism Team British Geological Survey BGS Observatory Network.
Data assimilation and forecasting the weather (!) Eugenia Kalnay and many friends University of Maryland.
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Bob Jacobsen Aug 6, 2002 From Raw Data to Physics From Raw Data to Physics: Reconstruction and Analysis Introduction Sample Analysis A Model Basic Features.
Analyzing wireless sensor network data under suppression and failure in transmission Alan E. Gelfand Institute of Statistics and Decision Sciences Duke.
Machine Learning Tutorial-2. Recall, Precision, F-measure, Accuracy Ch. 5.
Improving Space Weather Forecasts Using Coronagraph Data S.P. Plunkett 1, A. Vourlidas 1, D.R. McMullin 2, K. Battams 3, R.C. Colaninno 4 1 Naval Research.
How Good is a Model? How much information does AIC give us? –Model 1: 3124 –Model 2: 2932 –Model 3: 2968 –Model 4: 3204 –Model 5: 5436.
WCRP Extremes Workshop Sept 2010 Detecting human influence on extreme daily temperature at regional scales Photo: F. Zwiers (Long-tailed Jaeger)
Machine Learning 5. Parametric Methods.
Data analysis tools Subrata Mitra and Jason Rahman.
Brian Lukoff Stanford University October 13, 2006.
Page 1© Crown copyright 2004 The use of an intensity-scale technique for assessing operational mesoscale precipitation forecasts Marion Mittermaier and.
Learning Photographic Global Tonal Adjustment with a Database of Input / Output Image Pairs.
MACHINE LEARNING 3. Supervised Learning. Learning a Class from Examples Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Data Mining Lectures Lecture 7: Regression Padhraic Smyth, UC Irvine ICS 278: Data Mining Lecture 7: Regression Algorithms Padhraic Smyth Department of.
The CME geomagnetic forecast tool (CGFT) M. Dumbović 1, A. Devos 2, L. Rodriguez 2, B. Vršnak 1, E. Kraaikamp 2, B. Bourgoignie 2, J. Čalogović 1 1 Hvar.
Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
Bob Weigel George Mason University.  “We are still in the process of identifying characteristic behavior that identifies various modes as separate phenomena.”
Machine Learning with Spark MLlib
How to forecast solar flares?
Linear Regression CSC 600: Data Mining Class 12.
Evaluating Classifiers
Intensity-scale verification technique
CSE 4705 Artificial Intelligence
Validation of Regression Models
Michael K. Tippett1,2, Adam H. Sobel3,4 and Suzana J. Camargo4
An application of machine learning to geomagnetic index prediction: aiding human space weather forecasting Laurence Billingham, Gemma Kelly British Geological.
Introduction to Data Mining and Classification
Mitchell Kossoris, Catelyn Scholl, Zhi Zheng
Gemma Kelly, Alan Thomson
Improving Operational Geomagnetic Index Forecasting
Regression Analysis Jared Dean as quoted in Big Data, Data Mining, and Machine Learning From my experience, regression is the most dominant force in driving.
CSCI N317 Computation for Scientific Applications Unit Weka
Intro to Machine Learning
Comparison of the csEN algorithm to existing predictive methods and model reduction. Comparison of the csEN algorithm to existing predictive methods and.
Model generalization Brief summary of methods
Presentation transcript:

© NERC All rights reserved Storms rare but important Balance dataset otherwise storms look like noise Features selected like Split: training set, validation set, test set Training set scaled Some algorithms require use Principal Component Analysis to decompose Improving Operational Geomagnetic Index Forecasting Laurence Billingham Gemma Kelly British Geological Survey, West Mains Road, Edinburgh, UK 1. Introduction The interest in space weather has never been greater, with society becoming ever more reliant upon technology and infrastructure which are potentially at risk. Geomagnetic storms are potentially damaging to power-grids, communication systems and oil and gas operations. 2. Data 3. Techniques Machine Learning A branch of statistics We use regression algorithms here Data laid out as for matrix inversion (little like finding best fit line with 2D data) Many algorithms (see [2] for an excellent introduction), some are like linear regression e.g. Workflow: Training: get coefficients from Tune model parameters against validation set Test and score model with test set Predict new a p from unseen data ARIMA Auto-regressive moving average A linear regression over a windowed average of a p Only input is a p timeline Currently operational: used here as a baseline quality comparison 5.Summary and Future Work 4.Results Initial dataset with 205 samples (small set) Some models much better at identifying storms than others Large range in rms values and percentage of predictions which are close to the true value We then increased the total dataset size to 1000 samples (large set) and tested the best performing models Again range of rms values All the machine learning models out perform the ARIMA model in terms of rms, HitRate and skill (HSS) Positive results: worth pursuing for production system References [1] McPherron, Magnetospheric Dynamics, in Introduction to Space Physics, edited by Kivelson, Russell, pp , Cambridge University Press, [2] Hastie et al., The Elements of Statistical Learning Data Mining, Inference, and Prediction, Springer 2009(II) Geomagnetic indices Capture magnetic storm severity by summarising lots of data have become ubiquitous parameterisations of storm-time magnetic conditions required as inputs by a variety of models a p index captures amplitude of the disturbance in horizontal part of the field (see e.g. [1] for more detail) tracks disturbances within a 3-hour interval indicates the global level of disturbance Samples times over ~15 years of geomagnetic and solar wind data Same scaling applied to other sets Linear Regression LR + = RidgeLR + = Lasso LR + Lasso + Ridge = ElasticNet Scoping study results positive value in predictions proceed to operational system Here we only predict 1 a p interval into future Some models easily configures to predict multiple intervals Others need new train, validate, test cycles Classification not regression e.g. G1,..., G5 More useful aid to human forecaster Potentially easier computation Up-weight storm categories: balance dataset More features per sample Models converge with few training samples (see fig): models powerful enough Data mine human forecasts, coronagraph data... Science potential in ‘white-box’ models: which features give useful info? This work is powered by Python-Scikit-learn Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp , Metrics: rms: root-mean square error % within ±N: Percentage of predicted values within ±N of the observed value HitRate: how well do we predict the storms? 1 = predicted every single storm 0 = missed every storm HSS: Heidke skill score measures fractional improvement of the forecast over forecast by random chance HSS = 2 (ad – bc) / [(a+ c)(c + d) + (a + b)(b + d)] 1 = highly skilled 0 = no skill <0 = worse than random chance FAR: False alarm rate of storm prediction 0 = no false alarms 1 = all false alarms Event Forecast Storm Observed YesNo Forc Σ Yesaba + b Nocdc + d Obs Σ a + cb + da+b+c+d = n Small set Large set