School of Computing FACULTY OF ENGINEERING MJ11 (COMP1640) Modelling, Analysis & Algorithm Design Vania Dimitrova Lecture 18 Statistical Data Analysis: Types of Data, Sampling Methods, Descriptive Statistics November 2011
In the previous lectures Mathematical Modelling Identify Factors Make assumptions Formulate model Examine behaviour Assumptions are crucial Estimated values for parameters Assumed dependencies between variables Check validity of assumptions Grounded on data analysis
In this series of lectures Analysis of data How to collect data samples? How to make estimations of values (point/interval)? How to infer possible dependencies between variables? How to check the validity of a hypothesis?
Do you agree with these statements? Average earnings in the UK grow steadily. There are more overseas visits to the UK than UK visits abroad. House prices are dependent on average family income. Young people prefer to shop online. Advertisement improves sales figures. TV advertisement is more powerful than Radio advertisement. Women are more likely to buy computer games than men. Men are more likely to buy cosmetics products than women.
Population Large collection of objects or events which vary in respect of some characteristics The whole set of measurements or counts about which we want to draw a conclusion Characteristics of population: height, age, reading abilities, fitness level What is the population for each of the claims on the previous slide?
Sample Subset of the population: a set of some of the measurements or characteristics of the population. [Figure: population and sample] Measures describing population characteristics: PARAMETERS. Measures describing sample characteristics: STATISTICS. Statistics estimate the parameters. Systematic error: the sample is not representative of the population. Sampling error: influenced by the size of the sample and the variation in the population.
Sampling methods Random sampling selecting members of the population in a random order Pros & Cons Systematic sampling selecting members of the population in a systematic order (quasi-random) Pros & Cons
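The two methods on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture: the function names, the population of 100 numbered members, and the fixed seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Random sampling: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # n distinct members, no repeats

def systematic_sample(population, n, seed=None):
    """Systematic sampling: every k-th member after a random start."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n                   # sampling interval
    start = random.Random(seed).randrange(k)   # random start in the first interval
    return [population[start + i * k] for i in range(n)]

# Hypothetical population: 100 numbered members.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

The systematic sample is only quasi-random: once the start is chosen, every other pick is fixed, so any periodic pattern in the ordering of the population can bias the result.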
Sampling methods (cont.) Stratified sampling dividing the population into homogeneous groups and selecting randomly within each group Pros & Cons Cluster sampling when the population is too big, we may select certain clusters (e.g. UK students) Pros & Cons Stage sampling – random selection of clusters
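Stratified sampling and the random selection of clusters could look roughly as follows. This is a sketch under stated assumptions: the function names, the student records, the UG/PG strata and the city clusters are hypothetical and do not come from the lecture.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, fraction, seed=None):
    """Group members into strata, then sample randomly within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    sample = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, n))
    return sample

def random_cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical data: 40 students, 10 postgraduate (PG) and 30 undergraduate (UG).
students = [(i, "PG" if i % 4 == 0 else "UG") for i in range(40)]
print(stratified_sample(students, lambda s: s[1], 0.2, seed=3))

# Hypothetical clusters keyed by city.
clusters = {"Leeds": [1, 2, 3], "York": [4, 5], "Hull": [6]}
print(random_cluster_sample(clusters, 2, seed=3))
```

With a fraction of 0.2 the stratified sample keeps the 1:3 ratio of the two groups (2 PG and 6 UG), which a simple random sample of 8 students would not guarantee.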
Sample size What precision do we want? Increase the size to get better precision What is the likely variability in the population? Increase the size to account for higher variability
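Both questions can be made concrete with the standard textbook formulas for estimating a mean, which the slide itself does not state: the standard error is sigma / sqrt(n), and the sample size needed for a margin of error E is roughly (z * sigma / E)^2. The sketch below assumes a roughly normal sampling distribution and a 95% confidence level (z = 1.96); the sigma and margin values are hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for population std dev sigma."""
    return sigma / math.sqrt(n)

def required_sample_size(sigma, margin, z=1.96):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most `margin`."""
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical values: more variability or a tighter margin both demand a larger n.
for sigma in (10, 20):
    for margin in (2, 1):
        n = required_sample_size(sigma, margin)
        print(f"sigma = {sigma}, margin = {margin}: n >= {n}")
```

Halving the margin or doubling the variability both roughly quadruple the required sample size, because n grows with the square of sigma / margin.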
Types of data Nominal Categories, classes Ordinal Nominal with order Discrete Numbers that are distinct points on a scale Continuous Can take any values between points on a scale GIVE EXAMPLES
Descriptive Statistics General description of the sample Mean – average score Median – middle value of the ordered scores, helpful for oddly shaped distributions Example scores: 2 3 5 7 9 10 12 13 14 16 18 20 21; 3 4 4 5 5 5 5 6 7
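The mean and median of the scores on this slide can be checked with Python's standard `statistics` module. Reading the numbers as two separate data sets (13 scores, then 9 scores) is an assumption based on where the sequence restarts; the variable names are illustrative.

```python
import statistics

# Assumed split of the slide's numbers into two data sets.
scores_a = [2, 3, 5, 7, 9, 10, 12, 13, 14, 16, 18, 20, 21]
scores_b = [3, 4, 4, 5, 5, 5, 5, 6, 7]

for name, scores in (("A", scores_a), ("B", scores_b)):
    mean = statistics.mean(scores)      # sum of scores / number of scores
    median = statistics.median(scores)  # middle value of the sorted scores
    print(f"set {name}: mean = {mean:.2f}, median = {median}")
```

Under that reading, set A has a mean of about 11.54 and a median of 12, and set B has a mean of about 4.89 and a median of 5.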
Distribution of scores Standard Deviation Variance Coefficient of variation
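The three measures of spread can be written out directly from their usual definitions. The slide does not say whether the sample (n - 1) or population (n) form is intended; this sketch assumes the sample form, and the function names and data are illustrative.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std_dev(xs):
    """Standard deviation: square root of the variance, in the units of the data."""
    return math.sqrt(sample_variance(xs))

def coefficient_of_variation(xs):
    """Standard deviation relative to the mean (unitless)."""
    return sample_std_dev(xs) / (sum(xs) / len(xs))

scores = [3, 4, 4, 5, 5, 5, 5, 6, 7]  # illustrative scores
print(f"variance = {sample_variance(scores):.3f}")
print(f"std dev  = {sample_std_dev(scores):.3f}")
print(f"CV       = {coefficient_of_variation(scores):.3f}")

# Cross-check against the standard library's sample variance.
print(math.isclose(sample_variance(scores), statistics.variance(scores)))
```

The coefficient of variation is useful when comparing the spread of data sets measured on different scales, since dividing by the mean removes the units.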
Example (EU-Area-Current-Accounts.xls) http://epp.eurostat.ec.europa.eu/
Normal and Skewed Distribution [Figure: skewed distribution and normal distribution curves; source: Wikipedia]
Approximating Normal Distribution As the sample size increases, the shape of the sampling distribution of the mean approaches the normal distribution (see also Central Limit Theorem) http://www.statsoft.com/textbook/esc.html
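A small simulation can illustrate this effect. The sketch below draws repeated samples from a strongly skewed population and measures how skewed the resulting sample means are. The exponential population, the sample sizes, the number of repeats and the seed are all assumptions chosen for the illustration.

```python
import random
import statistics

def sample_means(n, repeats, rng):
    """Means of `repeats` samples of size n from a skewed (exponential) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(repeats)]

def skewness(xs):
    """Average cubed z-score: close to 0 for a symmetric, normal-like shape."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(42)
skew_by_n = {n: skewness(sample_means(n, 2000, rng)) for n in (1, 5, 50)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of the sample means = {skew:.2f}")
```

The skewness shrinks towards 0 as n grows, which is the behaviour the Central Limit Theorem predicts for the distribution of sample means.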
Correlation between two variables Measure of the relation between two or more variables Correlation coefficient r, ranging from -1 to 1 Negative correlation: r close to -1 Positive correlation: r close to 1 Different methods to calculate r Simplest: based on deviations from the mean
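The deviation-based calculation is commonly written as the Pearson coefficient: the sum of the products of the deviations, divided by the square root of the product of the summed squared deviations. The sketch below assumes that is the method meant here; the function name and the advertising and sales figures are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations from the two means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]   # deviations of x from its mean
    dy = [y - mean_y for y in ys]   # deviations of y from its mean
    covariation = sum(a * b for a, b in zip(dx, dy))
    return covariation / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical data: advertising spend and sales over five periods.
advert_spend = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 30]
print(f"r = {pearson_r(advert_spend, sales):.3f}")
```

A value of r near 1 here would only indicate that the two series move together; it does not by itself show that advertising causes the change in sales.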
Summary Types of data Population vs Sample Sampling methods Descriptive statistics Normal & Skewed distribution Correlation between variables
References Rees, D.G., Essential Statistics, Chapman & Hall/CRC, 2000. Cohen, L., Holliday, M., Practical Statistics for Students, Chapman, 1996. http://www.statsoft.com/textbook/esc.html