Exploration of electricity usage data from smart meters to investigate household composition

Presentation transcript:

Exploration of electricity usage data from smart meters to investigate household composition
Topic (v): Integration and management of new data sources
Seminar on Statistical Data Collection
Geneva, Switzerland, September 2013

Overview
Setting the scene
The data
Problem statement
The methodology
Some results
The resources
Team review
CSO review
Concluding remarks

Setting the scene – the players

The data
Over 5000 households in pilot
3 months baseline data (reading every 30 mins)
Pre-trial survey using CATI
Purpose: Consumer Behaviour Trials in 2009 and
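As a rough sense of scale for the baseline data, the arithmetic can be sketched in Python (used here only for illustration; the project itself used R). The figures of exactly 5000 households and 90 days are assumptions standing in for "over 5000" and "3 months", so the result is an approximate lower bound, not a number from the slides.

```python
# Half-hourly readings give 48 readings per meter per day.
READINGS_PER_DAY = 24 * 2

# Assumptions for illustration: "over 5000" households taken as 5000,
# "3 months" taken as 90 days.
HOUSEHOLDS = 5000
BASELINE_DAYS = 90


def baseline_readings(households, days, per_day=READINGS_PER_DAY):
    """Number of individual meter readings in the baseline period."""
    return households * days * per_day


if __name__ == "__main__":
    print(baseline_readings(HOUSEHOLDS, BASELINE_DAYS))
```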

Problem statement
To determine household composition using smart metering data

Category  Adults  Children
A         3       2
B         3       1
C         3       0
D         2       5
E         2       4
F         2       3
G         2       2
H         2       1
I         2       0
J         1       1
K         1       0
L         4       1
M         4       0
N         5       1
O         5       0
P         6       0
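The category scheme above is effectively a lookup from a letter code to an (adults, children) pair. A minimal Python sketch of that lookup follows; the names HOUSEHOLD_CATEGORIES and category_for are hypothetical and do not come from the presentation.

```python
# Category code -> (adults, children), transcribed from the slide's table.
HOUSEHOLD_CATEGORIES = {
    "A": (3, 2), "B": (3, 1), "C": (3, 0),
    "D": (2, 5), "E": (2, 4), "F": (2, 3), "G": (2, 2), "H": (2, 1), "I": (2, 0),
    "J": (1, 1), "K": (1, 0),
    "L": (4, 1), "M": (4, 0),
    "N": (5, 1), "O": (5, 0),
    "P": (6, 0),
}


def category_for(adults, children):
    """Return the category code for a household, or None if it is not in the scheme."""
    for code, composition in HOUSEHOLD_CATEGORIES.items():
        if composition == (adults, children):
            return code
    return None


if __name__ == "__main__":
    print(category_for(2, 0))
```

Compositions outside the 16 listed categories return None rather than being forced into the nearest class.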

The methodology
Machine learning algorithms for classifier
  – (learning and testing || generalisation)
  – Neural Networks used
  – Binomial and Multinomial classification
  – Unbalanced data
Data reduction / dimension reduction
  – Used 21 explanatory variables as input to classifier
  – Variables normalised
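Two of the preparation steps named above, normalising the explanatory variables and dealing with unbalanced classes, can be sketched as follows. The slides do not say which normalisation or balancing method was used, so z-score scaling and random undersampling are assumptions chosen for illustration, as are the function names and the toy data. The neural network itself is not reproduced here.

```python
import random
import statistics
from collections import defaultdict


def zscore_columns(rows):
    """Scale each column to mean 0 and population standard deviation 1.

    Assumption: z-score scaling; the slides only say "variables normalised".
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has standard deviation 0; divide by 1 instead.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def undersample(rows, labels, seed=0):
    """Randomly keep the same number of rows per class (size of the smallest class).

    Assumption: undersampling is one common way to balance classes; the
    presentation does not state how its balanced classifier was obtained.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        for row in rng.sample(by_label[label], smallest):
            balanced.append((row, label))
    return balanced


if __name__ == "__main__":
    # Made-up toy data: two variables, three households in "I", one in "K".
    rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]
    labels = ["I", "I", "I", "K"]
    scaled = zscore_columns(rows)
    print(undersample(scaled, labels))
```

Undersampling discards data from the larger classes; oversampling or class weights are alternatives with different trade-offs.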

Some results – balanced multinomial classifier
“Confusion matrix” on test data: actual household category (rows) against predicted household category (columns) for categories B, C, F, G, H, I, K and M, with row totals (Σ) and % accuracy. [Cell values not preserved in this transcript.]
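For readers unfamiliar with the layout, a confusion matrix with per-category accuracy can be computed as in the sketch below. The labels in the example are made up purely to exercise the code and are not the presentation's results; the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, categories):
    """Count matrix with actual categories as rows and predicted ones as columns."""
    index = {category: i for i, category in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def row_accuracy(matrix):
    """Percentage of each actual category predicted correctly (None for empty rows)."""
    accuracies = []
    for i, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(100.0 * row[i] / total if total else None)
    return accuracies


if __name__ == "__main__":
    categories = list("BCFGHIKM")  # the eight categories shown on the slide
    # Made-up labels for illustration only.
    actual = ["B", "B", "I", "I", "I", "K"]
    predicted = ["B", "C", "I", "I", "K", "K"]
    matrix = confusion_matrix(actual, predicted, categories)
    for category, row, acc in zip(categories, matrix, row_accuracy(matrix)):
        print(category, row, sum(row), acc)
```

The diagonal holds correct predictions, so each row's accuracy is its diagonal cell divided by the row total (Σ).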

The resources
Project team of two persons for 3 months
  – Significant amount of time spent manipulating data
Software: R with nnet and neuralnet packages
Hardware: Required considerable computer resources for manipulating full dataset (Stokes at ICHEC)

Team review
Problem statement too specific – broaden to household characteristics
Alternative approach (cluster analysis and then describe clusters)
Other techniques – PCA or signal processing

CSO review – forward looking
Assuming go live: 1.5m household meters linked to statistical household register in 2019
Existing statistical needs
  – Field force management
  – Auxiliary information
  – Sample selection / Representativity analysis
New statistical products?
  – Energy consumption patterns by location, household etc.
  – Quality of life (time to rise, time to bed)

Concluding remarks
3 V’s + V for Value
  – Is there value in SMD?
Access v Privacy
  – Legal, moral, proportionality
Infrastructure for Big data (1.5m data points every 30 mins)
  – Outsourcing, downsampling
New tools, skills, approaches
Roadmap – collaboration with suitable partners
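The infrastructure point can be made concrete with a small sketch: 1.5m meters reporting every 30 minutes, and one simple form of downsampling that sums consecutive readings into coarser intervals. This assumes each reading is the energy consumed in its interval (so that summing is meaningful); the slides do not specify the unit or the downsampling method, and the names used are hypothetical.

```python
METERS = 1_500_000          # from the slide: 1.5m household meters
READINGS_PER_DAY = 24 * 2   # one reading every 30 minutes

# Data points arriving per day at national scale.
POINTS_PER_DAY = METERS * READINGS_PER_DAY


def downsample(readings, factor=2):
    """Sum consecutive groups of `factor` readings, e.g. two half-hours into one hour.

    Assumption: readings are per-interval consumption, so sums are valid.
    """
    if len(readings) % factor:
        raise ValueError("number of readings must be a multiple of factor")
    return [sum(readings[i:i + factor]) for i in range(0, len(readings), factor)]


if __name__ == "__main__":
    print(POINTS_PER_DAY)
    # Made-up half-hourly values for one household.
    print(downsample([0.5, 0.25, 1.0, 1.0]))
```

Summing pairs halves the volume per meter; coarser factors reduce it further at the cost of losing within-interval detail such as short usage peaks.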