
Published by Jaylon Wooten. Modified over 4 years ago

1
**Data Mining Techniques to classify inter-area oscillations**

Adamantios Marinakis, ABB Corporate Research, CH
London, 29/11/2013

2
**Presentation outline**

- Problem statement
- Data mining
- Support Vector Machines
- Evolution Strategies
- Random Forests
- Solution – Results
- Conclusion


4
**Wide-Area Monitoring System (WAMS)**

[Figure: wide-area monitoring system. A GPS satellite provides time stamps; voltage and current phasors (V, I, t) measured across the grid are sent over a communication network to a System Protection Center, for visualization of power system dynamics, stability monitoring, and stability control and blackout prevention.]

© ABB Group March 31, 2017 | Slide 4

5
**Power Damping Monitoring – PDM Principle**

- Sliding window, minutes in length
- Estimate a MIMO state-space model:

$$x(k+1) = A\,x(k) + B\,u(k) + K\,e(k)$$
$$y(k) = C\,x(k) + D\,u(k) + e(k)$$

- Carry out modal analysis
- Damping and frequency of critical modes …
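Once $A$ is identified, the damping and frequency of each mode follow from its eigenvalues. A common approach maps each discrete-time eigenvalue $z$ to continuous time via $s = \ln(z)/T_s$, then takes $f = |\operatorname{Im} s|/(2\pi)$ and $\zeta = -\operatorname{Re} s/|s|$. This sketch assumes a uniform sampling period `Ts`; the function name is hypothetical, and it is not necessarily how the PDM tool implements its modal analysis.

```python
import numpy as np

def modal_analysis(A, Ts):
    """Frequency (Hz) and damping ratio of each oscillatory mode of x(k+1) = A x(k) + ...

    Ts is the (assumed uniform) sampling period in seconds.
    """
    modes = []
    for z in np.linalg.eigvals(A):
        if z.imag <= 0:          # keep one eigenvalue per complex-conjugate pair;
            continue             # real eigenvalues are non-oscillatory
        s = np.log(z) / Ts       # discrete-time -> continuous-time eigenvalue
        freq_hz = abs(s.imag) / (2 * np.pi)
        damping_ratio = -s.real / abs(s)
        modes.append((freq_hz, damping_ratio))
    return modes
```

Critical modes would then be the ones with the lowest `damping_ratio`.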

6
**Swissgrid WAMS: collects measurements from PMUs around Europe**

7
**And then? Do something more than observing…**

What we have:

- An operator can know at any moment which oscillation modes are present in its system ⇒ the operator can know its system security status in real time (insecure if damping < some threshold value).

What would be nice to have:

- Given a candidate operating point, predict its expected oscillatory status.
- Given an observed poorly damped operating point, say what the reason for this is, and modify the operating point so that it becomes well damped (insecure → secure).

[Figure: operating point → model → security status]

8
**What is an “operating point”? At least, how we define it here**

9
**Overview of the approach: linking WAMS with SCADA data…**

[Figure: PMU measurements feed the WAMS, which provides time-stamped oscillation damping ratios (output labels). The SCADA system provides time-stamped data (generation and load dispatch, line power flows, FACTS device status, PSS status, …) used as input variables. The two sources need to be time-synchronized; they are stored in a database used to train the classifier.]
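One possible way to time-synchronize the two sources is an as-of join: each WAMS damping estimate is matched to the SCADA snapshot closest in time. The sketch below uses pandas `merge_asof`; the column names (`time`, `damping`, `intertie_flow`), the 30 s tolerance and the 0.05 damping threshold used to label a point as secure are hypothetical choices for illustration, not values from the presentation.

```python
import pandas as pd

def join_wams_scada(wams, scada, tolerance="30s", damping_threshold=0.05):
    """Attach the nearest-in-time SCADA snapshot to each WAMS damping estimate.

    Hypothetical schema: both frames have a datetime column "time";
    `wams` also has a "damping" column (damping ratio of the critical mode).
    """
    wams = wams.sort_values("time")      # merge_asof requires sorted keys
    scada = scada.sort_values("time")
    merged = pd.merge_asof(
        wams, scada, on="time",
        direction="nearest",                  # closest SCADA snapshot, before or after
        tolerance=pd.Timedelta(tolerance),    # no match if further apart than this
    )
    # Rows without a SCADA match within the tolerance cannot be used for training
    merged = merged.dropna().reset_index(drop=True)
    # Output label: "secure" if the damping ratio reaches the (assumed) threshold
    return merged.assign(secure=merged["damping"] >= damping_threshold)
```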


11
**What is data mining? Apart from a fancy term**

An interdisciplinary subfield of computer science: the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. In short, it is about analyzing the data.


13
**Support Vector Machines: a powerful classification technique**

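Main idea: find the optimal separating hyperplane ⇒ maximum margin, i.e. maximize the distance to the closest point from either class. A large margin tends to reduce the generalization error.

The hyperplane $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = 0$ is found by solving

$$\min_{\mathbf{w},b} \|\mathbf{w}\| \quad \text{subject to} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1,\ i = 1,\dots,N$$

which is a QP (equivalently, minimize $\frac{1}{2}\|\mathbf{w}\|^2$).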
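A minimal sketch of the maximum-margin idea, assuming scikit-learn is available: a linear `SVC` with a very large `C` approximates the hard-margin problem, and the distance from the hyperplane to the closest points is $1/\|\mathbf{w}\|$. The helper name `max_margin_hyperplane` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def max_margin_hyperplane(X, y, C=1e6):
    """Fit a linear SVM; a very large C approximates the hard-margin problem."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]            # normal vector of the hyperplane w^T x + b = 0
    b = clf.intercept_[0]
    # At the optimum the closest points satisfy |w^T x + b| = 1,
    # so their distance to the hyperplane is 1 / ||w||.
    distance = 1.0 / np.linalg.norm(w)
    return clf, w, b, distance

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])
clf, w, b, distance = max_margin_hyperplane(X, y)
```

On these four points the separating line is $x_1 = 1.5$, so `distance` should come out close to 1.5.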

14
**Non-separable classes**

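Separable case:

$$\min_{\mathbf{w},b} \|\mathbf{w}\| \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1,\ i = 1,\dots,N$$

Non-separable case, with slack variables $\xi_i$:

$$\min_{\mathbf{w},b,\boldsymbol{\xi}} \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0\ \ \forall i$$

where $C$ is the regularization parameter.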
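To make the role of the slack variables concrete, the hypothetical helper below evaluates the soft-margin objective, in the common form $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$, for a given $(\mathbf{w}, b)$. For fixed $(\mathbf{w}, b)$ the smallest feasible slack is $\xi_i = \max(0, 1 - y_i(\mathbf{w}^T\mathbf{x}_i + b))$, i.e. the hinge loss; $\xi_i > 1$ means the point is misclassified. This is an illustration of the objective only, not a solver.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) of the soft-margin SVM for a fixed (w, b)."""
    margins = y * (X @ w + b)                 # y_i (w^T x_i + b)
    xi = np.maximum(0.0, 1.0 - margins)       # smallest slack satisfying the constraints
    return 0.5 * w @ w + C * xi.sum(), xi
```

For example, with $\mathbf{w} = (1, 0)$, $b = 0$, the point $(0.2, 0)$ labelled $-1$ lies on the wrong side and gets slack 1.2.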

15
**And what about nonlinear patterns in the data?**

Map the data into a higher-dimensional feature space.

Is there any problem? YES! The number of features may blow up ⇒

- computing the mapping can be inefficient
- using the mapped representation can be inefficient

Is there any solution? YES!
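A quick count shows why the explicit mapping blows up: a polynomial feature map with all monomials of degree up to $d$ in $n$ input variables has $\binom{n+d}{d}$ features. The helper name is hypothetical.

```python
from math import comb

def n_poly_features(n, d):
    """Number of monomials of degree <= d in n variables (including the constant)."""
    return comb(n + d, d)
```

For instance, `n_poly_features(100, 3)` returns 176851, i.e. about 177 thousand features from only 100 inputs.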

16
**We only need $\boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$, never just $\boldsymbol{\phi}(\mathbf{x})$. Hence:**

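The “kernel trick”

The QP is solved by resorting to its dual problem:

$$\max_{\boldsymbol{\alpha}} \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j \quad \text{s.t.} \quad \sum_{i=1}^{N}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C\ \ \forall i$$

which … finally gives:

$$f(\mathbf{x}) = \sum_{i=1}^{N}\alpha_i y_i \mathbf{x}_i^T\mathbf{x} + b$$

The data enter only through the dot products $\mathbf{x}_i^T\mathbf{x}_j$, collected in the $N \times N$ kernel matrix $K$ with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$. In the feature space, $\mathbf{x}_i^T\mathbf{x}_j$ becomes $\boldsymbol{\phi}(\mathbf{x}_i)^T\boldsymbol{\phi}(\mathbf{x}_j)$, so the kernel should be a dot product in the space defined by $\boldsymbol{\phi}$.

Note: we only need $\boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$, never just $\boldsymbol{\phi}(\mathbf{x})$. Hence the kernel function: $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$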
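The dual solution can be checked numerically: scikit-learn's `SVC` exposes the support vectors (`support_vectors_`), the products $\alpha_i y_i$ for those vectors (`dual_coef_`) and $b$ (`intercept_`), so $f(\mathbf{x}) = \sum_i \alpha_i y_i k(\mathbf{x}_i, \mathbf{x}) + b$ can be rebuilt from kernel evaluations alone. Only support vectors ($\alpha_i > 0$) contribute. A sketch assuming an RBF kernel; the helper name and toy data are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def dual_decision_function(clf, X, gamma):
    """Evaluate f(x) = sum_i alpha_i y_i k(x_i, x) + b from a fitted RBF SVC."""
    K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)  # K[i, j] = k(sv_i, x_j)
    return clf.dual_coef_[0] @ K + clf.intercept_[0]      # dual_coef_ holds alpha_i * y_i

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)     # XOR-like, not linearly separable
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
```

Comparing `dual_decision_function(clf, X, 0.5)` with `clf.decision_function(X)` should show agreement up to rounding, and every $|\alpha_i y_i|$ stays within the box $[0, C]$.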

17
**Polynomial: $k(\mathbf{x},\mathbf{x}') = (1 + \mathbf{x}^T\mathbf{x}')^d$**

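Most used kernels:

- Polynomial: $k(\mathbf{x},\mathbf{x}') = (1 + \mathbf{x}^T\mathbf{x}')^d$
- Linear: special case of the polynomial kernel
- Gaussian: $k(\mathbf{x},\mathbf{x}') = e^{-\gamma\|\mathbf{x}-\mathbf{x}'\|^2}$

$d$, $\gamma$, etc. are called “kernel hyperparameters”. They have to be chosen by the user.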
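A small numerical check of the kernel trick, assuming 2-D inputs and $d = 2$: the polynomial kernel equals an ordinary dot product after the explicit 6-dimensional map $\boldsymbol{\phi}(\mathbf{x}) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1x_2)$, which never has to be built when the kernel is evaluated directly. Function names are hypothetical; the Gaussian kernel is written in the $e^{-\gamma\|\mathbf{x}-\mathbf{x}'\|^2}$ form, where a larger $\gamma$ gives a more local, more flexible kernel.

```python
import numpy as np

def poly_kernel(x, xp, d):
    """(1 + x^T x')^d"""
    return (1.0 + x @ xp) ** d

def gaussian_kernel(x, xp, gamma):
    """exp(-gamma * ||x - x'||^2)"""
    return np.exp(-gamma * np.sum((x - xp) ** 2))

def phi_poly2(x):
    """Explicit feature map whose dot product equals (1 + x^T x')^2 for 2-D x."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])
```

For $\mathbf{x} = (1, 2)$ and $\mathbf{x}' = (3, -1)$, both `poly_kernel(x, xp, 2)` and `phi_poly2(x) @ phi_poly2(xp)` give 4.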

18
**Phew, now it seems that quite some tuning is required …**

The user should choose:

- $C$
- the kernel function
- the kernel function hyperparameters

Role of the regularization parameter $C$: even more pronounced in an enlarged feature space, where perfect separation can typically be achieved.

- An overly large $C$ leads to an overfit, “too curvy” boundary.
- An overly small $C$ leads to an overly smooth boundary with a large training error.

Large $d$, $\gamma$ ⇒ the kernel function is “too flexible”: a very nonlinear boundary can be achieved.

Proper tuning is essential for good SVM performance.

19
**Automatic tuning of the SVM hyperparameters: a nonlinear, non-analytical optimization problem**

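Choose $C$, the kernel and its hyperparameters ($d$, $\gamma$, …) such that the SVM accuracy is maximized.

- Kernel choice: binary → continuous, via a mixed kernel

$$k(\mathbf{x},\mathbf{x}') = \lambda(1 + \mathbf{x}^T\mathbf{x}')^d + (1-\lambda)\,e^{-\gamma\|\mathbf{x}-\mathbf{x}'\|^2}$$

- SVM accuracy: 10-fold cross-validation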
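A sketch of the fitness an optimizer such as an ES could maximize, assuming scikit-learn: the mixed kernel is passed to `SVC` as a callable, and accuracy is the mean 10-fold cross-validation score. Keeping $\lambda \in [0, 1]$ makes the kernel a convex combination of two valid kernels, hence itself valid. The helper names, the parameter order `(C, lam, d, gamma)` and rounding $d$ to an integer (so a negative $1 + \mathbf{x}^T\mathbf{x}'$ is never raised to a fractional power) are assumptions, not taken from the presentation.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mixed_kernel(lam, d, gamma):
    """k(X, Y) = lam * (1 + X Y^T)^d + (1 - lam) * exp(-gamma * ||x - y||^2)."""
    def k(X, Y):
        poly = polynomial_kernel(X, Y, degree=d, gamma=1.0, coef0=1.0)
        rbf = rbf_kernel(X, Y, gamma=gamma)
        return lam * poly + (1.0 - lam) * rbf
    return k

def svm_fitness(params, X, y):
    """Mean 10-fold cross-validation accuracy for hypothetical params (C, lam, d, gamma)."""
    C, lam, d, gamma = params
    clf = SVC(C=C, kernel=mixed_kernel(lam, int(round(d)), gamma))
    return cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
```

Setting `lam=1` or `lam=0` recovers the pure polynomial or pure Gaussian kernel, so the kernel choice becomes one more continuous variable for the optimizer.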


21
**The basic cycle of the ES algorithm**

[Figure: basic ES cycle; individuals (∘, ×) evaluated by fitness $f$; labels: Explore, Exploit]

22
**Mutation: create an offspring out of one parent**

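$\tilde{\mathbf{y}}$ is created by mutating $\mathbf{y}$:

$$\tilde{\mathbf{y}} = \mathbf{y} + \mathbf{z}, \quad \text{with} \quad \mathbf{z} := \sigma\,\big(\mathcal{N}_1(0,1), \dots, \mathcal{N}_n(0,1)\big)$$

$\sigma$ is called the mutation strength.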

23
**Create $\lambda$ offspring out of one parent**

[Figure: $\lambda$ offspring created out of one parent]
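The mutation operator and the one-parent, $\lambda$-offspring scheme can be sketched as a $(1,\lambda)$-ES with a fixed mutation strength $\sigma$. The sphere function $f(\mathbf{y}) = \sum_i y_i^2$ is a hypothetical stand-in for the real fitness, and all names and default values are illustrative.

```python
import numpy as np

def sphere(y):
    """Hypothetical test fitness to minimize."""
    return float(np.sum(y ** 2))

def one_comma_lambda_es(f, y0, sigma=0.1, lam=10, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    parent = np.asarray(y0, dtype=float)
    for _ in range(generations):
        # Mutation: each of the lam offspring is parent + sigma * N(0, I)
        offspring = parent + sigma * rng.standard_normal((lam, parent.size))
        fitness = np.array([f(o) for o in offspring])
        # Comma selection: the best offspring replaces the parent, even if it is worse
        parent = offspring[np.argmin(fitness)]
    return parent
```

With a fixed $\sigma$ the search stalls at a distance from the optimum that scales with $\sigma$, which is what motivates adapting $\sigma$ during the run.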

24
**Self-adaptation of mutation strength 𝜎**

- Each variable $l$ has its own mutation strength $\sigma_l$
- Mutation strengths are also mutated: $\tilde{\sigma}_l = \sigma_l\, e^{\zeta_l}$, with $\zeta_l$ sampled from $\tau\,\mathcal{N}(0,1) + \tau'\,\mathcal{N}(0,1)$
- Each individual carries its mutation strengths' values: $\mathbf{y} = (y_1, \dots, y_l, \dots, \sigma_1, \dots, \sigma_l, \dots)$
- Idea: individuals with more suitable mutation strength values will survive
- Before mutating the individual's object parameters, the strategy parameters are mutated first
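A sketch of one self-adaptive mutation step. It assumes the widely used log-normal rule with one random draw shared by all variables plus one draw per variable, and the common learning-rate choices $1/\sqrt{2n}$ and $1/\sqrt{2\sqrt{n}}$. The slide does not say which of $\tau$, $\tau'$ is the shared one, so the names `tau_global` and `tau_local` are hypothetical.

```python
import numpy as np

def self_adaptive_mutation(y, sigmas, rng):
    """Mutate the strategy parameters (sigmas) first, then the object parameters y."""
    n = y.size
    tau_global = 1.0 / np.sqrt(2.0 * n)           # assumed learning rates
    tau_local = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    # 1) log-normal update keeps every sigma_l positive
    zeta = tau_global * rng.standard_normal() + tau_local * rng.standard_normal(n)
    new_sigmas = sigmas * np.exp(zeta)
    # 2) mutate y with the *new* step sizes, so good sigmas are rewarded by selection
    new_y = y + new_sigmas * rng.standard_normal(n)
    return new_y, new_sigmas
```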

25
Population $\mu > 1$

26
**Another variation operator: Recombination**

Create an offspring out of $\varrho$ parents, e.g. ($\varrho = 2$):

$$\tilde{y}_i = \frac{y_i^A + y_i^B}{2}$$

- Do recombination $\lambda$ times
- Then apply mutation to those offspring
- Parents are selected by a uniform random distribution (their fitness is NEVER taken into account)

$(\mu/\varrho \overset{+}{,} \lambda)$-ES
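Intermediate recombination of $\varrho$ parents, chosen uniformly at random without looking at their fitness, can be sketched as follows; with $\varrho = 2$ it reduces to the averaging formula $\tilde{y}_i = (y_i^A + y_i^B)/2$. The function name is hypothetical.

```python
import numpy as np

def intermediate_recombination(parents, rho, rng):
    """Average rho distinct parents picked uniformly at random (fitness-blind)."""
    idx = rng.choice(len(parents), size=rho, replace=False)
    return parents[idx].mean(axis=0)
```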

27
**Guidelines for successful self-adaptation**

- $(\mu/\varrho, \lambda)$ selection preferred over $(\mu/\varrho + \lambda)$:
  - better at leaving a local optimum
  - better at following moving optima
  - with the + strategy, a bad $\sigma$ can survive too long
- $\mu > 1$, to carry different strategies
- High selective pressure (usually $\lambda \approx 7\mu$) to generate an offspring surplus
- Mix strategy parameters (i.e. mutation strengths) by recombining them
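Putting the guidelines together, a minimal $(\mu/\varrho, \lambda)$-ES with self-adaptive, per-variable mutation strengths might look like this: $\mu = 3$ parents, $\lambda = 21 \approx 7\mu$ offspring, comma selection, and recombination of both object and strategy parameters. The sphere fitness, learning rates and defaults are assumptions for illustration; for SVM tuning the fitness would instead be derived from the cross-validation accuracy (for instance, $1 -$ accuracy when minimizing, also an assumption).

```python
import numpy as np

def sphere(y):
    """Hypothetical stand-in fitness to minimize."""
    return float(np.sum(y ** 2))

def es_comma(f, y0, sigma0=1.0, mu=3, rho=2, lam=21, generations=500, seed=0):
    """(mu/rho, lambda)-ES with self-adaptive, per-variable mutation strengths."""
    rng = np.random.default_rng(seed)
    y0 = np.asarray(y0, dtype=float)
    n = y0.size
    tau_global = 1.0 / np.sqrt(2.0 * n)            # assumed learning rates
    tau_local = 1.0 / np.sqrt(2.0 * np.sqrt(n))
    ys = np.tile(y0, (mu, 1))                      # mu > 1 parents
    sigmas = np.full((mu, n), sigma0)              # each parent carries its own sigmas
    for _ in range(generations):
        off_y = np.empty((lam, n))
        off_s = np.empty((lam, n))
        for k in range(lam):
            idx = rng.choice(mu, size=rho, replace=False)   # fitness-blind parent choice
            s = sigmas[idx].mean(axis=0)                   # recombine strategy parameters
            y = ys[idx].mean(axis=0)                       # recombine object parameters
            s = s * np.exp(tau_global * rng.standard_normal()
                           + tau_local * rng.standard_normal(n))  # mutate sigma first
            off_s[k] = s
            off_y[k] = y + s * rng.standard_normal(n)      # then mutate y with new sigma
        fitness = np.array([f(v) for v in off_y])
        best = np.argsort(fitness)[:mu]    # comma selection: old parents are discarded
        ys, sigmas = off_y[best], off_s[best]
    return ys[0], f(ys[0])
```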

28
**ES-tuned SVM classifier: coming up with the oscillation damping classifier**


30
**Random Forests: a promising alternative**

A collection of decision trees.

Basic idea of a decision tree (DT): a greedy algorithm progressively selects the cut attributes. Splitting is decided according to some node impurity measure, typically the Gini index.
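The greedy split selection can be sketched for one numerical feature, with the Gini index $1 - \sum_k p_k^2$ as impurity ($p_k$ being the fraction of class $k$ at the node): every midpoint between consecutive distinct values is tried, and the one with the lowest size-weighted child impurity wins. A full tree would repeat this over the candidate variables and recurse on each child. Function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def best_split(x, y):
    """Greedy threshold on one feature minimizing the weighted Gini of the children."""
    best_t, best_w = None, np.inf
    values = np.unique(x)
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        left, right = y[x <= t], y[x > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if w < best_w:
            best_t, best_w = t, w
    return best_t, best_w
```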

31
**Assume independence among classifiers**

Ensemble classifiers

General idea

Why do they work?

- Assume 25 classifiers, each with error rate $\varepsilon = 0.35$
- Assume independence among classifiers
- Error rate of the (majority-vote) ensemble classifier:

$$\sum_{i=13}^{25}\binom{25}{i}\varepsilon^i(1-\varepsilon)^{25-i} = 0.06$$
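The 0.06 figure can be reproduced by summing the binomial probabilities that at least 13 of the 25 independent classifiers are wrong at the same time, which is exactly when a majority vote errs. The helper name is hypothetical.

```python
from math import comb

def majority_vote_error(n, eps):
    """P(more than half of n independent classifiers err), each with error rate eps."""
    k_min = n // 2 + 1                       # 13 for n = 25
    return sum(comb(n, i) * eps ** i * (1 - eps) ** (n - i) for i in range(k_min, n + 1))
```

With $\varepsilon = 0.5$ the ensemble is no better than a single classifier (the result is exactly 0.5 for odd $n$), and for $\varepsilon > 0.5$ it is worse, so the base classifiers must beat random guessing.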

32
**Random Forests – The algorithm**

Given a training dataset $\mathcal{D} = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\}$

For $b = 1$ to $B$:

1. Draw a bootstrap sample $\mathcal{D}_b$ of size $n$ from $\mathcal{D}$ (i.e. sample $n$ times with replacement)
2. Grow a tree classifier on $\mathcal{D}_b$, where each split is computed as follows:
   - Select $m$ variables at random (from the $p$ variables)
   - Pick the best variable/split-point among the $m$
   - Split the current node into two

Output: the ensemble of $B$ trees

Variance of the ensemble average, for trees with variance $\sigma^2$ and pairwise correlation $\varrho$: $\varrho\,\sigma^2 + \frac{1-\varrho}{B}\,\sigma^2$

- Feature importance insight
- Massive parallelization potential
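The algorithm maps closely onto scikit-learn's `RandomForestClassifier`: `n_estimators` is $B$, `max_features` is $m$, `bootstrap=True` draws each $\mathcal{D}_b$, `oob_score=True` reports the out-of-bag accuracy, `feature_importances_` (impurity-based) gives the importance insight, and `n_jobs=-1` grows the independent trees in parallel. A sketch assuming scikit-learn; the helper name and defaults are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_random_forest(X, y, B=200, m="sqrt", seed=0):
    rf = RandomForestClassifier(
        n_estimators=B,      # B trees
        max_features=m,      # m variables tried at each split (decorrelates the trees)
        bootstrap=True,      # each tree is grown on a bootstrap sample of size n
        oob_score=True,      # accuracy on the samples left out of each bootstrap
        n_jobs=-1,           # trees are independent, so grow them in parallel
        random_state=seed,
    )
    return rf.fit(X, y)
```

A smaller `max_features` lowers the correlation $\varrho$ between trees, which shrinks the first term of the variance expression above.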


34
**Solution overview: linking WAMS with SCADA data…**

[Figure: PMU measurements feed the WAMS, which provides time-stamped oscillation damping ratios (output labels). The SCADA system provides time-stamped data (generation and load dispatch, line power flows, FACTS device status, PSS status, …) used as input variables, with a need for proper feature selection. Both are stored in a database used to train the classifier.]

35
**Test system: modified Nordic32, 12978 samples produced by simulations**

Generators participating most in the … Hz mode (based on participation factors from the linear model)

36
**Damping vs. intertie cut: correlated, but …**

[Figure: damping vs. intertie cut. Highlighted groups of 3580, 1643, 1271 and 4851 samples (out of 12978) correspond to different PSS being off.]

37
**ES-SVM classifier: 10-fold cross-validation accuracy**

| Input features | Mixed kernel (%) | Radial basis kernel (%) | Polynomial kernel (%) |
|---|---|---|---|
| Only intertie flow | 92.7 | 92.7 | 92.0 |
| Intertie flow & PSS status | 93.4 | 94.0 | 92.8 |
| Dispatch | 95.6 | 95.6 | 95.6 |
| Intertie flow, PSS status & synthetic features | 98.3 | 97.8 | 98.2 |
| Dispatch & PSS status | 98.6 | 97.8 | 98.3 |
| Dispatch, power flows, PSS status & synthetic features | 99.2 | 98.6 | 99.1 |

- 1% to 3% improvement compared to the initial guess
- Mixed kernel slightly better
- More features ⇒ better performance (even if redundant)

38
**Random Forest classifier: out-of-bag accuracy**

| Input features | Out-of-bag accuracy (%) |
|---|---|
| Dispatch, power flows, PSS status & synthetic features | 97.79 |
| PSS, intertie, Line 18, Line 32 | 98.54 |
| PSS, intertie, Gen63, Line 16, Line 32 | 98.53 |
| PSS, intertie, Gen63 & 6 line flows | 98.59 |

[Figure: network map marking Lines 16, 18, 32 and Gen63]

- Very efficient feature selection
- Less accurate than SVM
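Importance-based feature selection of this kind can be sketched as: fit a forest on all features, keep the top $k$ by importance, refit, and compare out-of-bag accuracies. This is one simple scheme under stated assumptions (scikit-learn's impurity-based importances, a fixed $k$, a hypothetical helper name); the presentation does not say which selection procedure produced the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_by_importance(X, y, k, B=100, seed=0):
    """Return (top-k feature indices, OOB accuracy with all features, OOB with top-k)."""
    full = RandomForestClassifier(n_estimators=B, oob_score=True,
                                  random_state=seed).fit(X, y)
    top = np.argsort(full.feature_importances_)[::-1][:k]   # most important first
    reduced = RandomForestClassifier(n_estimators=B, oob_score=True,
                                     random_state=seed).fit(X[:, top], y)
    return top, full.oob_score_, reduced.oob_score_
```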


40
**Conclusion … and challenges**

- The WAMS-SCADA link turned out to be an interesting idea, at least for the inter-area oscillations case
- SVM achieved higher accuracy: proper SVM tuning pays off
- RFs are not much worse, while allowing very efficient feature selection

Challenges…

- Check on real data
- Computational intensiveness
- Close the loop: correct the operating point based on the model

41
**Acknowledgment**

The author gratefully acknowledges the financial support from the Marie Curie FP7-IAPP project “Using real-time measurements for monitoring and management of power transmission dynamics for the smart grid” (REAL-SMART), Contract No. PIAP-GA

42
**Thank you for your attention!**

Adamantios Marinakis, ABB Corporate Research, Switzerland
Phone: Mobile:
