Summarising and presenting data www.anu.edu.au/nceph/surfstat/

Types of data
Two broad types: qualitative and quantitative.
Qualitative data arise when the observations fall into separate, distinct categories. Examples are:
– Colour of eyes: blue, green, brown, etc.
– Exam result: pass or fail
– Socio-economic status: low, middle or high
Such data are discrete.

Quantitative Data
Quantitative or numerical data arise when the observations are counts or measurements.
Discrete if the measurements are integers:
– number of people in a household
– number of cigarettes smoked per day
Continuous if the measurements can take any value (usually within some range):
– weight
– height
– time

Variables and statistics
Quantities such as sex and weight are called variables, because their values vary from one observation to another.
Numbers calculated to describe important features of the data are called statistics. For example, in a sample of residents of a town,
– the proportion of females
– the average age of unemployed persons
are statistics.

Example: Commodore data
Prices of n = 38 second-hand cars.
These are continuous data and need to be summarised.

Constructing a frequency distribution
– Calculate the range and divide it by the chosen number of intervals to get the approximate length of each interval. Usually 5 to 15 intervals are used.
– Define the interval end points so that the intervals neither overlap nor leave gaps (i.e. they are mutually exclusive and exhaustive). This ensures that every observation belongs to exactly one interval. It is usually simpler to make all intervals the same length.
– Count the number of values in each interval (the class frequency). Go through the data once only and use tally marks to help with the counting.
– Relative frequencies or percentages are usually helpful for showing the distribution of the data.
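The steps above can be sketched in R. The prices below are made up for illustration (they are not the Commodore data), and the choice of about six intervals with end points rounded to the nearest 1000 is an assumption:

```r
# Hypothetical second-hand car prices; NOT the Commodore data.
prices <- c(5200, 5900, 6100, 6700, 6900, 7400,
            7800, 8300, 8600, 9100, 9900, 11500)

# Step 1: range / chosen number of intervals gives an approximate length.
k <- 6
approx_len <- diff(range(prices)) / k   # (11500 - 5200) / 6 = 1050

# Step 2: round to convenient end points that cover all the data.
breaks <- seq(5000, 12000, by = 1000)

# Step 3: count the values in each interval. right = FALSE gives intervals
# of the form [a, b), so every observation falls in exactly one interval.
classes <- cut(prices, breaks = breaks, right = FALSE, dig.lab = 5)
freq <- table(classes)

# Step 4: add relative frequencies.
freq_table <- data.frame(
  interval  = names(freq),
  frequency = as.vector(freq),
  relative  = round(as.vector(freq) / length(prices), 3)
)
print(freq_table)
```

Rounding the end points gives seven intervals rather than six, which is the usual trade-off between an exact interval count and readable class limits.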

Frequency distribution

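Histogram
Area of rectangle = frequency (or relative frequency).
But area = length × height.
So if all intervals have the same length, L, then height = frequency / L, i.e. the height of each rectangle is proportional to its frequency.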

Features of a histogram

Mode
The mode is the value or category that occurs most frequently. If several data values occur with the same maximal frequency, they are all modes.
For example, in the grouped Commodore data, the class interval [8, ,999] is the modal interval.
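A minimal sketch of this definition in R. The helper name `modes` is hypothetical (base R has no built-in function for the statistical mode), and the eye-colour values are invented:

```r
# Return every value that attains the maximal frequency.
# Values come back as character labels, because they are the names of the table.
modes <- function(x) {
  tab <- table(x)
  names(tab)[tab == max(tab)]
}

modes(c(2, 3, 3, 4, 5, 7, 8))                          # a single mode
modes(c("blue", "brown", "blue", "green", "brown"))    # two modes
```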

Modality and Symmetry
Modality: number of peaks
– e.g. one peak: unimodal
Skewness: departure from symmetry
– positive skewness (skewed to the right)
– negative skewness (skewed to the left)

Human histogram

Human histogram explained

Process control example
Deming: 500 steel rods, ideal diameter = 1 cm.
Is the process in control? Why the gap?

MEASURES OF CENTRAL TENDENCY ("Averages")
Notation: denote the data values by $x_1, x_2, \dots, x_n$, where $n$ is the number of data points.
Mean (arithmetic mean): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (read as 'x bar')

Mean for frequency distribution
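The formula for this slide is not shown here. A common approach, given as an assumption rather than the slide's own wording, lets each class midpoint $m_j$ stand in for the values in that class, so the mean is approximately $\sum_j f_j m_j / \sum_j f_j$, where $f_j$ is the class frequency. A sketch in R with hypothetical classes:

```r
# Hypothetical grouped data: interval end points and class frequencies.
breaks <- seq(5000, 12000, by = 1000)
freq   <- c(2, 3, 2, 2, 2, 0, 1)

# The midpoint of each interval stands in for the values inside it.
mid <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Frequency-weighted mean of the midpoints.
grouped_mean <- sum(freq * mid) / sum(freq)
grouped_mean
```

The result is only an approximation to the mean of the raw data, since the individual values within each class are no longer known.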

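Median
‘Middle’ value of the data set: a number that is greater than half the data values and less than the other half.
It is the (n+1)/2-th ordered observation.
Data set: 6, 6.7, 3.8, 7, 5.8
Ordered: 3.8, 5.8, 6, 6.7, 7
Median: (5+1)/2 = 3rd ordered observation = 6
If n is even: 6, 6.7, 3.8, 7, 5.8, 9.975
Ordered: 3.8, 5.8, 6, 6.7, 7, 9.975
Median: (6+1)/2 = 3.5th ordered observation, i.e. the average of the 3rd and 4th = (6 + 6.7)/2 = 6.35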
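The (n+1)/2 rule can be checked in R on the two data sets from this slide. The helper `median_by_position` is a hypothetical name written for illustration; the built-in `median()` follows the same rule:

```r
x_odd  <- c(6, 6.7, 3.8, 7, 5.8)
x_even <- c(6, 6.7, 3.8, 7, 5.8, 9.975)

# Median by the (n + 1)/2 rule: a fractional position averages its two neighbours.
median_by_position <- function(x) {
  s <- sort(x)
  pos <- (length(s) + 1) / 2
  mean(s[c(floor(pos), ceiling(pos))])
}

median_by_position(x_odd)    # the 3rd ordered value
median_by_position(x_even)   # average of the 3rd and 4th ordered values
median(x_even)               # built-in function, same rule
```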

Quartiles and percentiles
Median: 50% below, 50% above
1st quartile ($Q_1$): 25% below, 75% above
$Q_1$: (n+1)/4-th ordered observation
$Q_3$ (3rd quartile): 3(n+1)/4-th ordered observation
Data set: 6, 6.7, 3.8, 7, 5.8
Ordered: 3.8, 5.8, 6, 6.7, 7
$Q_1$: 1.5th ordered observation = (3.8 + 5.8)/2 = 4.8
$Q_3$: 4.5th ordered observation = (6.7 + 7)/2 = 6.85
p-th percentile or quantile: p% below, (100−p)% above
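Several quantile conventions are in use, so software may not match a hand calculation. As a sketch, assuming the quartiles sit at positions (n+1)/4 and 3(n+1)/4 in the ordered data, R's `quantile()` reproduces that rule with `type = 6`, while its default (`type = 7`) uses a different position:

```r
x <- c(6, 6.7, 3.8, 7, 5.8)

# type = 6 places the p-th quantile at position p * (n + 1) in the ordered data
# and interpolates linearly when that position is fractional.
q_rule <- quantile(x, probs = c(0.25, 0.5, 0.75), type = 6)

# R's default (type = 7) uses position 1 + p * (n - 1), so it can differ.
q_default <- quantile(x, probs = c(0.25, 0.5, 0.75))

print(rbind(q_rule, q_default))
```

For small samples the two conventions can give noticeably different quartiles, so it helps to state which one was used.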

Stem and leaf plot
Finally, order the leaves.

Percentiles via stem and leaf plot
Get the median: the (n+1)/2-th ordered observation. It lies in the stem 7|, so median = (72 + 76)/2 = 74.
Get the 1st quartile: $Q_1$ = (n+1)/4-th ordered observation.
Get the 3rd quartile: $Q_3$ = 3(n+1)/4-th ordered observation.

Percentiles from a frequency distribution
What are the median and the 1st and 3rd quartiles?
Actual values are 6700, 5900 and
You lose detail in a frequency distribution.

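Comparison of Mean and Median
Data set A: 2, 3, 3, 4, 5, 7, 8 (mean = 4.57, median = 4)
Data set B: 2, 3, 3, 4, 5, 8, 20 (mean = 6.43, median = 4)
Both have n = 7 values. The median is not affected by the extreme value, but the mean is.
The median is also useful for incomplete data. E.g. consider an experiment to measure the average lifetime of a light bulb (n = 6): 200, 400, 650, 700, 900, ..
The median, (650 + 700)/2 = 675, can be found before the last bulb fails; the mean cannot.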
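The comparison can be reproduced in R. The two stand-in values for the sixth light bulb (1000 and 5000) are assumptions, chosen only to show that any lifetime above 900 gives the same median:

```r
a <- c(2, 3, 3, 4, 5, 7, 8)
b <- c(2, 3, 3, 4, 5, 8, 20)

# The extreme value 20 pulls the mean up but leaves the median alone.
summary_ab <- data.frame(
  data_set = c("A", "B"),
  mean     = c(mean(a), mean(b)),
  median   = c(median(a), median(b))
)
print(summary_ab)

# Light bulbs: assume the sixth lifetime is unknown but longer than 900.
# Two different stand-in values give the same median.
median_1 <- median(c(200, 400, 650, 700, 900, 1000))
median_2 <- median(c(200, 400, 650, 700, 900, 5000))
c(median_1, median_2)
```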

Comparing Mean, Median and Mode
– If the distribution is symmetric and unimodal, all three coincide.
– If it is symmetric but not unimodal, the mean and median still coincide.
– If the distribution is not symmetric, the median is usually a better summary than the mean.

MEASURES OF VARIABILITY
Statistics that summarise how spread out the data values are. Also called measures of dispersion.
The range = max − min (used in quality control).
The range is susceptible to extreme values.

IQR
The interquartile range is defined as IQR = $Q_3 - Q_1$.
Like the median, the IQR is less susceptible to outliers.

Five number summary
Minimum, $Q_1$, median, $Q_3$, maximum.
Boxplot (or box-and-whisker plot)
The box contains the middle 50% of the data.
If an observation lies more than 3 × IQR beyond the box, it is an outlier.
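A sketch of the five number summary and an outlier check in R, using data set B from the mean and median comparison. It reads the slide's rule as "more than k × IQR beyond the quartiles", which is an assumption about what the slide intends, and `flag_outliers` is a hypothetical helper name:

```r
b <- c(2, 3, 3, 4, 5, 8, 20)

# Quartiles by the (n + 1) position rule (type = 6 in R).
q <- quantile(b, probs = c(0.25, 0.5, 0.75), type = 6, names = FALSE)
five_number <- c(min = min(b), Q1 = q[1], median = q[2], Q3 = q[3], max = max(b))
iqr <- q[3] - q[1]

# Flag observations more than k * IQR below Q1 or above Q3.
flag_outliers <- function(x, k) {
  qs <- quantile(x, probs = c(0.25, 0.75), type = 6, names = FALSE)
  x[x < qs[1] - k * diff(qs) | x > qs[2] + k * diff(qs)]
}

five_number
flag_outliers(b, k = 3)     # the multiplier quoted on the slide
flag_outliers(b, k = 1.5)   # the multiplier boxplot() uses by default (range = 1.5)
```

The value 20 is flagged with k = 1.5 but not with k = 3, so the choice of multiplier matters.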

Boxplots are useful for comparing groups

Deviations from the mean

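Summarising deviations from mean
The deviation of each value $x_i$ from the mean is: $d_i = x_i - \bar{x}$
The mean (or sum) of the deviations is not a good summary, because it is always zero: $\sum_{i=1}^{n} d_i = 0$
Instead use a positive function such as $d_i^2$ or $|d_i|$.
Variance or mean square error: $\frac{1}{n}\sum_{i=1}^{n} d_i^2$
Mean absolute deviation: $\frac{1}{n}\sum_{i=1}^{n} |d_i|$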

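Variance and Standard Deviation
Usually n−1 instead of n is used in the denominator, giving the sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Problem: squared distances have squared units.
Taking the square root restores the original units: $s = \sqrt{s^2}$ is the sample standard deviation.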

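Example: small data set
Data set A: $\{x_i\}$ = 2, 3, 3, 4, 5, 7, 8
There are n = 7 observations and the mean is 32/7 = 4.57.
The deviations from the mean, $d_i$, are: -2.57, -1.57, -1.57, -0.57, 0.43, 2.43, 3.43
So $s^2 = \frac{1}{6}\sum d_i^2 = 29.71/6 = 4.95$ and $s = \sqrt{s^2} = 2.23$.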
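The same calculation, step by step, in R. This is a sketch for checking the arithmetic; the built-in `var()` and `sd()` use the n−1 divisor as well:

```r
a <- c(2, 3, 3, 4, 5, 7, 8)
n <- length(a)

d <- a - mean(a)            # deviations from the mean
sum_d <- sum(d)             # zero, up to rounding error
s2 <- sum(d^2) / (n - 1)    # sample variance (divisor n - 1)
s  <- sqrt(s2)              # sample standard deviation, in the units of the data
mad_mean <- mean(abs(d))    # mean absolute deviation (divisor n)

round(c(mean = mean(a), variance = s2, sd = s, mean_abs_dev = mad_mean), 2)
all.equal(s2, var(a))       # agrees with the built-in function
```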

Shortcut formulae for variance
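The formulae for this slide are not shown here. The standard algebraically equivalent forms of the sample variance, given as a reconstruction rather than the slide's exact content, avoid computing each deviation separately:

```latex
s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)
    = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
```

For the data set 2, 3, 3, 4, 5, 7, 8, $\sum x_i^2 = 176$ and $\sum x_i = 32$, so $s^2 = (176 - 32^2/7)/6 \approx 4.95$.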

Bivariate methods
We have (mostly) looked at univariate methods, but most interesting problems are bivariate (or multivariate).
– Continuous variable vs. qualitative variable: comparative boxplot
– Continuous variable vs. continuous variable: scatterplot

Presenting bivariate data
Scatterplots are useful for illustrating the relationship between continuous variables $(x_i, y_i)$, $i = 1, \dots, n$.
A scatterplot indicates the type of relationship.

Creating a scatterplot
Step 1: Create variables ht and wt
Step 2: plot(ht, wt, xlab="height", ylab="weight")
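A runnable version of the two steps, with invented heights and weights since the slide does not give the data. R needs plain double quotes around the axis labels; curly quotes from a word processor cause a syntax error:

```r
# Step 1: create the variables (hypothetical heights in cm and weights in kg).
ht <- c(152, 160, 165, 171, 178, 185)
wt <- c(50, 57, 62, 68, 77, 84)

# Step 2: draw the scatterplot with labelled axes.
plot(ht, wt, xlab = "height", ylab = "weight")
```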

Summarising a relationship
plot(temperature, ozone)
abline(lm(ozone ~ temperature, data = air))
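The slide's `air` data frame is not included here. As a stand-in, the sketch below uses R's built-in `airquality` data set, whose columns are named `Ozone` and `Temp`; this substitution is an assumption, not the slide's data:

```r
# Stand-in data: R's built-in airquality data frame (columns Ozone and Temp).
plot(airquality$Temp, airquality$Ozone,
     xlab = "temperature", ylab = "ozone")

# Least-squares line; lm() drops rows with a missing Ozone value by default.
fit <- lm(Ozone ~ Temp, data = airquality)
abline(fit)

coef(fit)   # intercept and slope of the fitted line
```

Passing the fitted model to `abline()` draws the regression line over the existing scatterplot.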

Summarising a nonlinear relationship
Use a smoother:
plot(E, NOx)
lines(supsmu(E, NOx))
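A self-contained sketch of the same idea on simulated data (the E and NOx values are not reproduced, so the curve below is invented). `supsmu()` is Friedman's super smoother from R's stats package:

```r
# Simulated data with a curved relationship (hypothetical, not the E/NOx data).
set.seed(1)
x <- seq(0.5, 1.3, length.out = 80)
y <- 4 - 25 * (x - 0.9)^2 + rnorm(80, sd = 0.3)

plot(x, y)

# supsmu() returns a list with components x and y, which lines() can draw.
smooth <- supsmu(x, y)
lines(smooth)
```

A straight line would summarise these data poorly, because y rises and then falls as x increases; the smoother follows the curve without assuming a particular functional form.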