# School of Computing FACULTY OF ENGINEERING MJ11 (COMP1640) Modelling, Analysis & Algorithm Design Vania Dimitrova Lecture 18 Statistical Data Analysis:


School of Computing, Faculty of Engineering. MJ11 (COMP1640) Modelling, Analysis & Algorithm Design. Vania Dimitrova. Lecture 18: Statistical Data Analysis: Types of Data, Sampling Methods, Descriptive Statistics. November 2011.

In the previous lectures: Mathematical modelling (identify factors, make assumptions, formulate model, examine behaviour). Assumptions are crucial: estimated values for parameters, assumed dependencies between variables. Check the validity of assumptions, grounded on data analysis.

In this series of lectures: Analysis of data. How to collect data samples? How to make estimations of values (point/interval)? How to infer possible dependencies between variables? How to check the validity of a hypothesis?

Population: A large collection of objects or events which vary in respect of some characteristics; the whole set of measurements or counts about which we want to draw a conclusion. Characteristics of a population: height, age, reading abilities, fitness level. What is the population for each of the claims on the previous slide?

Sample: A subset of the population, a set of some of the measurements or characteristics of the population. Measures describing population characteristics are PARAMETERS; measures describing sample characteristics are STATISTICS. Statistics estimate the parameters. Systematic error: the sample is not representative of the population. Sampling error: influenced by the size of the sample and the variation in the population.

Sampling methods. Random sampling: selecting members of the population in a random order (pros & cons). Systematic sampling: selecting members of the population in a systematic order, quasi-random (pros & cons).

Sampling methods (cont.). Stratified sampling: dividing the population into homogeneous groups and selecting randomly within each group (pros & cons). Cluster sampling: when the population is too big, we may select certain clusters, e.g. UK students (pros & cons). Stage sampling: random selection of clusters.
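
The four methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `students` population, the function names and the choice of strata and clusters are hypothetical and do not come from the lecture.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: every member has the same chance of being picked."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Systematic sampling: random starting point, then every k-th member."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]

def stratified_sample(population, stratum_of, fraction, rng):
    """Stratified sampling: random selection within each homogeneous group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, round(len(group) * fraction)))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random, keep all their members."""
    clusters = {}
    for member in population:
        clusters.setdefault(cluster_of(member), []).append(member)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

# Hypothetical population: 100 students as (student_id, level) pairs,
# 75 undergraduates ("UG") and 25 postgraduates ("PG").
students = [(i, "PG" if i % 4 == 0 else "UG") for i in range(100)]
rng = random.Random(2011)  # fixed seed so the run is repeatable

random_10 = simple_random_sample(students, 10, rng)
systematic_10 = systematic_sample(students, 10, rng)
stratified_20 = stratified_sample(students, lambda s: s[1], 0.2, rng)
clustered = cluster_sample(students, lambda s: s[0] // 20, 2, rng)  # blocks of 20 ids
```

Note that the stratified sample keeps the 3:1 ratio of the two levels by construction, whereas a simple random sample of the same size only does so on average.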

Sample size. What precision do we want? Increase the size to get better precision. What is the likely variability in the population? Increase the size to account for higher variability.
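
Both effects can be seen in a small simulation. The sketch below assumes a normally distributed population with mean 50 (an arbitrary, hypothetical choice) and measures how much the sample mean varies from one sample to the next; for such a population that spread is roughly the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

def spread_of_sample_means(pop_sd, n, repeats=300, seed=1):
    """Draw `repeats` samples of size n and return the standard deviation
    of their means, i.e. an estimate of the sampling error of the mean."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(50.0, pop_sd) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

small_sample = spread_of_sample_means(pop_sd=10.0, n=25)
large_sample = spread_of_sample_means(pop_sd=10.0, n=400)   # bigger sample
more_variable = spread_of_sample_means(pop_sd=30.0, n=25)   # more varied population

print("n=25,  sd=10:", round(small_sample, 2))
print("n=400, sd=10:", round(large_sample, 2))
print("n=25,  sd=30:", round(more_variable, 2))
```

The larger sample gives a tighter estimate, and the more variable population gives a looser one for the same sample size.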

Types of data. Nominal: categories, classes. Ordinal: nominal with order. Discrete: numbers that are distinct points on a scale. Continuous: can take any value between points on a scale. GIVE EXAMPLES.

Descriptive Statistics: a general description of the sample. Mean: average score. Median: middle point on the scale of measurement; helpful for oddly shaped distributions. Example scores: 2 3 5 7 9 10 12 13 14 16 18 20 21 and 3 4 4 5 5 5 5 6 7.
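
As a quick check, the mean and median of the numbers on the slide can be computed with Python's standard `statistics` module. The sketch assumes the numbers form two separate samples (the sequence restarts after 21); the extra score of 60 in the last step is a hypothetical outlier added only to show why the median helps with oddly shaped distributions.

```python
import statistics

scores_a = [2, 3, 5, 7, 9, 10, 12, 13, 14, 16, 18, 20, 21]
scores_b = [3, 4, 4, 5, 5, 5, 5, 6, 7]

for name, scores in (("A", scores_a), ("B", scores_b)):
    print(name,
          "mean =", round(statistics.mean(scores), 2),
          "median =", statistics.median(scores))
# A mean = 11.54 median = 12
# B mean = 4.89 median = 5

# Hypothetical outlier: one extreme score pulls the mean up,
# while the median stays in the middle of the bulk of the data.
skewed = scores_b + [60]
print("mean =", statistics.mean(skewed), "median =", statistics.median(skewed))
# mean = 10.4 median = 5.0
```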

Distribution of scores: standard deviation, variance, coefficient of variation.

Example (EU-Area-Current-Accounts.xls) http://epp.eurostat.ec.europa.eu/

Normal and Skewed Distribution (figures from Wikipedia: a skewed distribution and a normal distribution).

Approximating Normal Distribution: As the sample size increases, the shape of the sampling distribution approaches the normal distribution (see also the Central Limit Theorem). http://www.statsoft.com/textbook/esc.html
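
A small simulation can make this visible. The sketch below assumes a strongly skewed population (exponential, chosen here only for illustration) and compares the skewness of the distribution of sample means for a small and a larger sample size; a skewness near zero indicates a symmetric, more normal-looking shape.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardised deviation; 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def sample_means(n, repeats=2000, seed=18):
    """Means of `repeats` samples of size n from a skewed population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

skew_small = skewness(sample_means(n=2))
skew_large = skewness(sample_means(n=50))
print("skewness of sample means, n=2: ", round(skew_small, 2))
print("skewness of sample means, n=50:", round(skew_large, 2))
```

The means of the larger samples are distributed far more symmetrically, even though the underlying population is skewed.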

Correlation between two variables: a measure of the relation between two or more variables. Correlation coefficient r: negative correlation as r approaches -1, positive correlation as r approaches 1. There are different methods to calculate r; the simplest is based on deviations from the mean.
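
One common deviation-based formula is the Pearson coefficient: the sum of the products of the paired deviations, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes this is the method meant; the function name and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r from deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases with x
falling = [10, 8, 6, 4, 2]   # decreases as x increases
mixed = [5, 1, 4, 2, 5]      # no clear pattern

print(round(correlation(x, rising), 3))   # 1.0
print(round(correlation(x, falling), 3))  # -1.0
print(round(correlation(x, mixed), 3))    # 0.087
```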

Example: Positive Correlation r=0.998

Example: Negative Correlation r=-0.99

Example: Limited (or No) Correlation r=0.179

Summary: Types of data; Population vs Sample; Sampling methods; Descriptive statistics; Normal & Skewed distribution; Correlation between variables.

References: Rees, D.G., Essential Statistics, Chapman & Hall/CRC, 2000. Cohen, L., Holliday, M., Practical Statistics for Students, Chapman, 1996. http://www.statsoft.com/textbook/esc.html
