Analytics – Statistical Approaches

1 Analytics – Statistical Approaches
CSC 453, CSC-591/ECE592, Spring 2016 Rudra Dutta

2 Overview
- Extracting “insight” or “knowledge” from sensor readings is a key value-add for IoT
- The “insight” aspect comes from knowing or hypothesizing which available data may be correlated with which required information
  - (The origin of business analytics)
- The actual analysis may be more traditional: algorithmic and statistical
  - Algorithmic challenge: what IS the data?
  - Statistical challenge: how to process it?
Copyright Rudra Dutta, CSC, NCSU, Spring 2016

3 R Programming Language
- A scripting-like environment, MATLAB-like
- Openly available across platforms
- GUI through X-windows
- Easy manipulation of vectors and matrices
- Rich statistical library and tables
- Most of what follows can be done (comparatively) painlessly with R, without having to implement detailed steps
  - E.g. linear regression is a single simple call, “lm”

4 Datasets in IoT Context
- Data often has a “stochastic” nature
- Many random variation factors
  - Some are artifacts of the system itself
    - Noise
    - Environmental conditions for sensors
    - Differences between individual sensors
  - Uncertainty
  - Actual variation due to complex environmental factors
    - Often secondary, but with some effect nevertheless

5 Estimation, Forecast, Extrapolation
When we measure some quantity or quantities in the physical world, we might be interested in:
- Knowing what the “real” underlying quantity is, from the measurements
- Knowing how one quantity might be determined from other(s)
- Knowing what the quantity is when/where we have NOT measured it

6 Arithmetic Mean (Expectation)
- Simple and straightforward estimation of the underlying quantity
- Population: all the instances that might be measured (potentially infinite)
- Sample: the ones we DO measure in an experiment (necessarily finite)
  - In some cases, repeated measurements (to reduce the effect of noise)
  - In others, measurements apply to different instances (heights of people, e.g.)
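A minimal sketch of the repeated-measurement case, written in Python rather than R so that it is self-contained. The "true" value of 20.0, the noise level, and the name `sample_mean` are illustrative assumptions, not taken from the slides.

```python
import random

def sample_mean(readings):
    """Arithmetic mean of a finite sample of readings."""
    return sum(readings) / len(readings)

# Hypothetical sensor: a fixed underlying value plus Gaussian noise.
TRUE_VALUE = 20.0   # assumed for illustration
NOISE_STD = 0.5     # assumed for illustration

rng = random.Random(42)  # fixed seed so the run is repeatable
readings = [TRUE_VALUE + rng.gauss(0.0, NOISE_STD) for _ in range(1000)]

# Averaging many noisy readings gives an estimate close to the underlying value.
estimate = sample_mean(readings)
print(f"estimate from {len(readings)} readings: {estimate:.3f}")
```

With 1000 readings the standard error of the mean is roughly 0.5 / sqrt(1000), so the estimate should land within a few hundredths of the assumed underlying value.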

7 Confidence Interval
- Based on a sample, a “confidence interval” can be extracted for any degree of confidence (e.g. 95%)
- Something of a misnomer
  - “If we took more samples, and derived this interval each time, 95% of the time the actual mean would lie inside the interval”
- Calculated with the assumption of an underlying Gaussian distribution
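The repeated-sampling reading of the interval can be checked by simulation. The sketch below is in Python and makes a simplifying assumption: it uses the normal quantile (about 1.96 for 95%) where a Student's t quantile would usually be preferred for small samples. The function name, the population parameters, and the sample sizes are all hypothetical.

```python
import math
import random
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided quantile of the standard normal, e.g. ~1.96 for 95%.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * std_err, mean + z * std_err

# Draw many samples from a known Gaussian population (assumed values)
# and count how often the interval contains the actual mean.
rng = random.Random(1)
ACTUAL_MEAN, ACTUAL_STD = 10.0, 2.0
TRIALS, SAMPLE_SIZE = 2000, 50

covered = 0
for _ in range(TRIALS):
    sample = [rng.gauss(ACTUAL_MEAN, ACTUAL_STD) for _ in range(SAMPLE_SIZE)]
    low, high = mean_confidence_interval(sample)
    if low <= ACTUAL_MEAN <= high:
        covered += 1

coverage = covered / TRIALS
print(f"fraction of intervals containing the actual mean: {coverage:.3f}")
```

The printed fraction should come out close to 0.95, which is the sense in which the interval carries "95% confidence".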

8 Regression
- General-purpose model-fitting approach to collected data
- Hypothesis: there is an underlying reality (“model”) that causes an observed variable to take on values in response to a different one
  - May or may not be causal
  - Heater setting and temperature
  - Height and weight
- Approach: assuming so, the actual observed sample points must contain “error”
  - Due to noise or random variation
  - What parameters of the model would minimize the overall error?
- Linear model: one independent and one dependent variable
  - Generalized to multiple variables (matrix statement of the same)
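For the one-variable linear model, "minimize the overall error" is usually taken to mean minimizing the sum of squared errors, which has a closed-form solution. The Python sketch below assumes that reading; `fit_line` and the heater data are hypothetical, and in R the same fit would be obtained with `lm`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: heater setting (independent) vs. temperature (dependent).
heater_setting = [1, 2, 3, 4, 5, 6]
temperature = [18.2, 19.9, 22.1, 24.0, 25.8, 28.1]

slope, intercept = fit_line(heater_setting, temperature)
print(f"temperature ~ {slope:.2f} * setting + {intercept:.2f}")
```

The fitted line can then be used to estimate the dependent variable at settings that were not measured, subject to the model actually holding there.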

9 Regression – Linear Model

10 Hypothesis Testing
- Original hypothesis: the one stated by the model
- Null hypothesis: no such correlation
- Pre-determine the desired error likelihood, e.g. 5%
- Test: draw a sample, then determine the probability that this sample could have come from the null-hypothesis population
- If that probability is less than the pre-determined level, we say we can “reject the null hypothesis” at the 5% level
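One way to estimate that probability without distributional tables is a permutation test: shuffle one variable to break any real relationship, and count how often the shuffled data shows a correlation at least as strong as the observed one. This is a swapped-in illustration in Python, not necessarily the test used in the lecture; the data, function names, and trial count are assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate P(|r| at least as large as observed) under 'no correlation'."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Hypothetical sample with a strong linear relationship plus noise.
rng = random.Random(7)
xs = list(range(20))
ys = [2.0 * x + rng.gauss(0.0, 2.0) for x in xs]

ALPHA = 0.05  # pre-determined error likelihood
p_value = permutation_p_value(xs, ys)
reject_null = p_value < ALPHA
print(f"p-value ~ {p_value:.4f}, reject null at 5% level: {reject_null}")
```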

11 Hypothesis Testing

12 Time-Series

13 FSM-based Modeling

14 Smoothing

15 Clustering and Classification

16 Summary
- Statistical techniques can be useful tools in the toolkit of the IoT engineer
  - Extract meaningful sense from available data
- Many such techniques are sufficiently mature that they can simply be plugged in programmatically
- Need to understand the basic concepts behind the tools and techniques

