Exploration of electricity usage data from smart meters to investigate household composition
Topic (v): Integration and management of new data sources
Seminar on Statistical Data Collection
Geneva, Switzerland, September 2013

Overview
  Setting the scene
  The data
  Problem statement
  The methodology
  Some results
  The resources
  Team review
  CSO review
  Concluding remarks

Setting the scene: the players

The data
  Over 5000 households in pilot
  3 months baseline data (reading every 30 mins)
  Pre-trial survey using CATI
  Purpose: Consumer Behaviour Trials in 2009 and

Problem statement
  To determine household composition using smart metering data

  Category  Adults  Children
  A         3       2
  B         3       1
  C         3       0
  D         2       5
  E         2       4
  F         2       3
  G         2       2
  H         2       1
  I         2       0
  J         1       1
  K         1       0
  L         4       1
  M         4       0
  N         5       1
  O         5       0
  P         6       0

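The category key above can be held as a simple lookup table. The following is a minimal Python sketch of that table, not the project's own code. The function names are hypothetical, and the "has children" target is only an assumed example of a binomial target, since the slides do not say which one was used.

```python
# Category key from the slide: letter -> (adults, children).
HOUSEHOLD_CATEGORIES = {
    "A": (3, 2), "B": (3, 1), "C": (3, 0), "D": (2, 5),
    "E": (2, 4), "F": (2, 3), "G": (2, 2), "H": (2, 1),
    "I": (2, 0), "J": (1, 1), "K": (1, 0), "L": (4, 1),
    "M": (4, 0), "N": (5, 1), "O": (5, 0), "P": (6, 0),
}

# Reverse lookup: (adults, children) -> category letter.
CATEGORY_BY_COMPOSITION = {
    composition: letter for letter, composition in HOUSEHOLD_CATEGORIES.items()
}


def categorise(adults, children):
    """Return the category letter, or None if the composition is not in the key."""
    return CATEGORY_BY_COMPOSITION.get((adults, children))


def has_children(category):
    """Assumed binomial target (hypothetical): does the household contain children?"""
    return HOUSEHOLD_CATEGORIES[category][1] > 0
```

For example, `categorise(2, 0)` returns `"I"`, and a composition that is not in the key returns `None`.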
The methodology
  Machine learning algorithms for classifier
    – (learning and testing || generalisation)
    – Neural Networks used
    – Binomial and Multinomial classification
    – Unbalanced data
  Data reduction / dimension reduction
    – Used 21 explanatory variables as input to classifier
    – Variables normalised

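Two of the preparation steps named above, normalising the input variables and dealing with unbalanced classes, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code. The slides do not say which normalisation or balancing scheme was used, so min-max scaling and random downsampling to the smallest class are assumptions, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def min_max_normalise(rows):
    """Scale each column of `rows` (a list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def balance_by_downsampling(rows, labels, seed=0):
    """Keep the same number of examples per class (the size of the smallest class)."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = min(len(members) for members in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        for row in rng.sample(by_class[label], n):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: three households, two explanatory variables (the project used 21).
scaled = min_max_normalise([[1, 10], [3, 10], [2, 30]])
```

Downsampling discards data from the larger classes. Reweighting or oversampling the smaller classes are common alternatives.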
Some results – balanced multinomial classifier
  [“Confusion matrix” (test set): rows are actual household category, columns are predicted household category (B, C, F, G, H, I, K, M), with row totals (Σ) and % accuracy]
  Categories shown (adults, children): B (3, 1), C (3, 0), F (2, 3), G (2, 2), H (2, 1), I (2, 0), K (1, 0), M (4, 0)

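For readers unfamiliar with the layout, a confusion matrix and its per-category accuracy column can be computed as in the Python sketch below. The labels in the example are toy values invented for illustration and are not the project's results. The function names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Percentage of each actual class (row) predicted correctly; None for empty rows."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append(100.0 * row[i] / total if total else None)
    return result


def overall_accuracy(matrix):
    """Percentage of all cases that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Toy example with three of the categories (invented labels, not real results).
classes = ["I", "K", "M"]
actual = ["I", "I", "K", "K", "M", "M", "M", "I"]
predicted = ["I", "K", "K", "K", "M", "I", "M", "I"]
matrix = confusion_matrix(actual, predicted, classes)
```

Off-diagonal cells show which categories the classifier confuses with each other, which the overall accuracy figure alone hides.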
The resources
  Project team of two persons for 3 months
    – Significant amount of time spent manipulating data
  Software: R with nnet and neuralnet packages
  Hardware: Required considerable computer resources for manipulating full dataset (Stokes at ICHEC)

Team review
  Problem statement too specific: broaden to household characteristics
  Alternative approach (cluster analysis and then describe clusters)
  Other techniques – PCA or signal processing

CSO review – forward looking
  Assuming go-live: 1.5m household meters linked to statistical household register in 2019
  Existing statistical needs
    – Field force management
    – Auxiliary information
    – Sample selection / Representativity analysis
  New statistical products?
    – Energy consumption patterns by location, household etc.
    – Quality of life (time to rise, time to bed)

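As an illustration of how a "time to rise" indicator might be derived from half-hourly readings, here is a Python sketch of one simple heuristic. The slides do not describe any method, so everything here is an assumption: the night window (02:00 to 05:00), the threshold factor and the function names are all hypothetical.

```python
def estimate_rise_slot(day_readings, night_slots=range(4, 10), factor=2.0):
    """Estimate the half-hour slot in which a household gets up.

    day_readings: 48 half-hourly consumption values, slot 0 = 00:00-00:30.
    Returns the first slot after the night window whose consumption exceeds
    `factor` times the mean night-time consumption, or None if there is none.
    """
    baseline = sum(day_readings[s] for s in night_slots) / len(night_slots)
    for slot in range(max(night_slots) + 1, len(day_readings)):
        if day_readings[slot] > factor * baseline:
            return slot
    return None


def slot_to_time(slot):
    """Convert a half-hour slot index to the clock time at which it starts."""
    return f"{slot // 2:02d}:{(slot % 2) * 30:02d}"


# Toy profile: flat overnight load, higher consumption from slot 14 onwards.
profile = [0.1] * 14 + [0.5] * 34
rise = estimate_rise_slot(profile)
```

A heuristic like this would need validating against survey data before use, since timers and appliances that run overnight can shift the baseline.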
Concluding remarks
  3 V’s + V for Value
    – Is there value in SMD (smart meter data)?
  Access v Privacy
    – Legal, moral, proportionality
  Infrastructure for Big data (1.5m data points every 30 mins)
    – Outsourcing, downsampling
  New tools, skills, approaches
  Roadmap – collaboration with suitable partners
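The infrastructure point can be made concrete with some simple arithmetic: 1.5m meters reporting every 30 minutes produce 48 readings per meter per day. The Python sketch below works this through and shows one possible reading of "downsampling", namely aggregating half-hourly readings to coarser intervals. That interpretation is an assumption, since the term could equally mean sampling a subset of households, and the function names are hypothetical.

```python
METERS = 1_500_000  # household meters assumed at go-live (from the slides)


def readings_per_day(meters=METERS, interval_minutes=30):
    """Number of readings produced per day across all meters."""
    return meters * (24 * 60 // interval_minutes)


def downsample(readings, factor):
    """Sum consecutive groups of `factor` readings.

    factor=2 turns half-hourly consumption into hourly totals.
    """
    return [sum(readings[i:i + factor]) for i in range(0, len(readings), factor)]


daily = readings_per_day()   # 1.5m meters x 48 readings = 72 million per day
yearly = daily * 365         # about 26.3 billion readings per year
hourly = downsample([1, 2, 3, 4, 5, 6], 2)
```

Aggregating to hourly totals halves the volume but also blurs short events, so the right interval depends on which statistical product is being built.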