Notes Unit 1, Chapters 2-5: Univariate Data
Statistics is the science of data. A set of data includes information about individuals. This information is organized into different categories or characteristics called variables. For example, in our class survey, each one of you is an individual represented in the data set, and we collected information about variables such as gender and height.
We are always interested in the context of the data: where did it come from, whom did we include, when was it collected, why were we interested, and what did we collect? Without context, data is meaningless.
After we understand the context, the next thing we should always do is GRAPH the data.
Graphs
Be sure to always:
* Title your graphs.
* Label your axes, including units of measure.
* Number your axes in a consistent and reasonable manner.
Categorical Data
Categorical variables record which of several groups or categories an individual belongs to. For example, gender in our class survey is a categorical variable.
Quantitative Data
Quantitative variables take numerical values for which arithmetic operations such as adding or averaging make sense. For example, height in our class survey is a quantitative variable.
The distribution of a variable tells us what values the variable takes and how often it takes each value. It describes the overall pattern of the variable's values rather than any single observation.
When describing any quantitative distribution, address:
* C – Center
* U – Unusual features
* S – Shape
* S – Spread
* & BS – Be Specific
Common Shapes of Distributions/Graphs
* Symmetric
* Skewed to the right
* Skewed to the left
* Bimodal
* Uniform
Once you have identified the shape, choose a measure of center and a measure of spread based on that shape.
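If a distribution is symmetric, we use the mean for center. Mean (the average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sum of the observations divided by the number of observations $n$.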
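As a minimal sketch, the mean formula translates directly into Python. The function name `mean` and the height values below are hypothetical, made up for illustration; Python's standard `statistics.mean` does the same job.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical example data: five heights in inches
heights = [62, 65, 66, 68, 69]
print(mean(heights))  # 66.0
```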
If the distribution is symmetric, we use the standard deviation for spread. Standard deviation: $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
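A short sketch of the standard deviation calculation, step by step: find the mean, square each deviation from it, add them up, divide by $n - 1$, and take the square root. This assumes the sample convention (dividing by $n - 1$), which is also what Python's `statistics.stdev` uses; some texts divide by $n$ for a whole population. The function name and data are hypothetical.

```python
import math

def sample_std_dev(values):
    """Sample standard deviation: sum of squared deviations divided by n - 1, then square-rooted."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n                              # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical example data: five heights in inches (mean 66)
heights = [62, 65, 66, 68, 69]
print(sample_std_dev(heights))  # sqrt(30 / 4), about 2.739
```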
Measure of Center when the distribution is not symmetric: Median – the middle value in an ordered list. If there are two values in the middle, then average them.
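The median rule (sort, take the middle value, or average the two middle values) can be sketched like this. The function name `median` is illustrative; Python's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values when n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: average the two middle values

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```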
Measure of Spread or Variability when the distribution is not symmetric
We can also examine spread by looking at the range of the middle 50% of the data. This is called the interquartile range (IQR): IQR = Q3 – Q1.
We also need to talk about the 5-number summary. The 5-number summary consists of the minimum, the first quartile Q1 (about 25% of the data lie below this value), the median, the third quartile Q3 (about 75% of the data lie below this value), and the maximum.
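A sketch of the 5-number summary and the IQR, assuming the common hand-calculation method for quartiles: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half, leaving out the overall median when n is odd. Other tools (for example Python's `statistics.quantiles`) use interpolation rules that can give slightly different quartiles. The function names and data are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + (n % 2):]     # skip the middle value when n is odd
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

# Hypothetical data set with n = 9
data = [1, 3, 4, 6, 7, 9, 12, 15, 20]
mn, q1, med, q3, mx = five_number_summary(data)
print(mn, q1, med, q3, mx)   # 1 3.5 7 13.5 20
print("IQR =", q3 - q1)      # IQR = 10.0
```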
Another Measure of Spread or Variability
Range – the difference between the maximum and the minimum observations. This is the simplest measure of spread. We typically use it as preliminary information, or when it is the only measure of spread we can calculate.
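The range needs only the two most extreme observations, as this small sketch shows. The function is named `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`, and the scores are made up. Because only the extremes matter, a single unusual value can change the range a lot.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)

# Hypothetical test scores
scores = [70, 75, 78, 80, 84]
print(data_range(scores))         # 14
print(data_range(scores + [20]))  # 64, driven entirely by the new low value
```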
Another measure of spread or variability
Variance is the average of the squared deviations of the observations from their mean (for sample data, the sum of squared deviations is divided by n – 1 rather than n). It is the standard deviation squared.
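A minimal sketch of the variance, assuming the sample convention of dividing by $n - 1$. It also checks the relationship "variance is the standard deviation squared" against Python's standard `statistics.variance` and `statistics.stdev`. The function name and data are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical data: mean 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = sample_variance(data)
print(var)                                                    # 32 / 7, about 4.571
print(math.isclose(var, statistics.variance(data)))           # True
print(math.isclose(math.sqrt(var), statistics.stdev(data)))   # True: sd = sqrt(variance)
```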
An outlier is an individual observation in data that falls outside the overall pattern of the data.
Using the IQR, we can perform a test for outliers. Outlier test (1.5 × IQR rule): any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is considered an outlier.
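The 1.5 × IQR outlier test can be sketched as follows. It assumes the median-of-halves method for Q1 and Q3 (software using a different quartile rule may draw slightly different fences). The function names and data are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def iqr_outliers(values):
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median_of_sorted(ordered[:half])              # median of the lower half
    q3 = median_of_sorted(ordered[half + (n % 2):])    # median of the upper half
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers

# Hypothetical data: Q1 = 7.5, Q3 = 11.5, IQR = 4
data = [5, 7, 8, 9, 10, 11, 12, 30]
low, high, out = iqr_outliers(data)
print("fences:", low, high)   # fences: 1.5 17.5
print("outliers:", out)       # outliers: [30]
```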
Measures that are not strongly affected by extreme values are said to be resistant. The median and IQR are more resistant than the mean and standard deviation. The standard deviation is even less resistant than the mean.
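To see resistance in action, this sketch replaces the largest value in a small hypothetical data set with an extreme one and recomputes each measure using Python's standard `statistics` module. Quartiles come from `statistics.quantiles` with its default method, which can differ slightly from hand-computed quartiles, though for these data it gives the same Q1 and Q3.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample standard deviation, IQR) for the data."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # three quartile cut points
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
        q3 - q1,
    )

typical = list(range(10, 19))          # 10, 11, ..., 18 (hypothetical data)
with_extreme = typical[:-1] + [100]    # replace the largest value with an extreme one

for label, data in [("typical", typical), ("with extreme", with_extreme)]:
    mean, median, sd, iqr = summarize(data)
    print(f"{label:>12}: mean={mean:.1f} median={median} sd={sd:.1f} IQR={iqr}")
```

With these numbers the mean moves from 14 to about 23.1 and the standard deviation from about 2.7 to about 28.9, while the median (14) and the IQR (5.0) do not change at all.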
Measures of Spread or Variability – Why?
We measure spread because it is an important description of what is happening with the data: we need to know how much variation we can expect in a data set.