AP STATISTICS Section 1.1: Displaying Distributions
Objective: To be able to represent categorical and quantitative data using a variety of graphs.
Statistics: The science and art of learning from data.
Part I: Exploratory Data Analysis
Individuals: the objects described in the data set.
   a. Experimental unit
   b. Subject (an individual who is a person)
Variable: any characteristic of an individual.
Types of Variables
1. Qualitative (categorical): any variable that records which group an individual belongs to.
2. Quantitative: a variable that takes numerical values for which it makes sense to perform a mathematical operation (for example, finding an average).
Examples:
Distribution: describes what values a variable takes and how often it takes them.
AP Graphing Points:
1. Label the title, axes, and scales.
2. Provide comments.
Types of Graphs:
1. Bar graph: used for comparing different groups.
   Used for categorical data.
   Categories go on the horizontal axis.
   Frequency goes on the vertical axis.
   Bars do not touch.
   Example: Gender in the pulse rate data
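A bar graph starts from a count of how many individuals fall in each category. The sketch below tallies a small, made-up list of genders (the notes do not include the pulse rate data set, so `genders` is hypothetical) and prints one text bar per category. The bars run sideways here only because this is plain text; on paper the categories go on the horizontal axis.

```python
from collections import Counter

# Hypothetical categorical data (not from the notes): gender of each subject.
genders = ["F", "M", "F", "F", "M", "F", "M", "M", "F", "F"]

def frequency_table(values):
    """Return {category: count}, ordered by category name."""
    counts = Counter(values)
    return dict(sorted(counts.items()))

def text_bar_graph(values):
    """Return one line per category; the bar length equals the frequency."""
    table = frequency_table(values)
    return [f"{category:<3}| {'#' * count} ({count})" for category, count in table.items()]

for line in text_bar_graph(genders):
    print(line)
```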
2. Pie chart: used to compare categories.
   Used for categorical data.
   Make a table to find the central angles.
   Example: Gender in the pulse rate data
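The table of central angles can be sketched as follows: each category's angle is its share of the data times 360 degrees. The function name `central_angles` and the sample list are assumptions made for illustration, not part of the notes.

```python
from collections import Counter

def central_angles(values):
    """Return {category: (count, relative frequency, central angle in degrees)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (count, count / total, 360 * count / total)
        for category, count in sorted(counts.items())
    }

# Hypothetical categorical data (not from the notes).
genders = ["F"] * 6 + ["M"] * 4

for category, (count, rel_freq, angle) in central_angles(genders).items():
    print(f"{category}: count={count}, relative frequency={rel_freq:.2f}, angle={angle:.1f} degrees")
```

The angles always add to 360 degrees, which is a quick check on the table.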
3. Dotplot: used for small quantitative data sets.
   Values of the variable are placed on the horizontal axis.
   Observations are represented by placing dots above the horizontal axis.
   Each dot represents one observation.
   Data may need to be rounded or truncated (e.g., 10.6, 12.3, 14.5, 8.1).
   Example: Dotplot of pulse rates
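As a rough illustration of the rounding step and the stacking of dots, the sketch below rounds or truncates the four values quoted above and draws a text dotplot of a small hypothetical list. The helper names are assumptions. Note that Python's built-in `round` sends 14.5 to 14 (it rounds halves to the even neighbor), so the sketch uses an explicit round-half-up helper instead.

```python
import math
from collections import Counter

def round_half_up(x):
    """Round to the nearest integer, with .5 always going up."""
    return math.floor(x + 0.5)

def text_dotplot(values):
    """Return text lines: dots stacked above each integer value on a horizontal axis."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    width = len(str(max(values))) + 1
    lines = []
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("." if counts[v] >= level else " ").rjust(width) for v in axis)
        lines.append(row.rstrip())
    lines.append("".join(str(v).rjust(width) for v in axis))
    return lines

raw = [10.6, 12.3, 14.5, 8.1]               # values quoted in the notes
rounded = [round_half_up(x) for x in raw]   # 11, 12, 15, 8
truncated = [int(x) for x in raw]           # 10, 12, 14, 8

# Hypothetical small data set, one dot per observation.
for line in text_dotplot([8, 10, 10, 12]):
    print(line)
```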
4. Stemplot: used for small quantitative data sets.
   Stems increase going down.
   Do not skip stems, even if a stem has no leaves.
   Leaves increase going away from the stems.
   Always include the leaf unit.
   Use no punctuation between leaves.
   Negative signs go with the stems if necessary. (Use caution with negatives.)
   Example: Stemplot of pulse rates
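The construction rules above can be sketched in a few lines. This is a minimal version that assumes non-negative integers with a leaf unit of 1 (tens as stems, ones as leaves); the pulse values are hypothetical because the notes do not list the data.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (stem = tens, leaf = ones)."""
    data = sorted(values)
    leaves = defaultdict(list)
    for v in data:
        leaves[v // 10].append(v % 10)
    lines = ["Leaf unit = 1"]
    # Walk every stem from the smallest to the largest so none are skipped.
    for stem in range(data[0] // 10, data[-1] // 10 + 1):
        lines.append(f"{stem:>3} | {''.join(str(d) for d in leaves[stem])}".rstrip())
    return lines

# Hypothetical pulse rates (beats per minute), not from the notes.
pulses = [62, 64, 68, 70, 71, 75, 75, 78, 80, 82, 88, 104]

for line in stemplot(pulses):
    print(line)
```

The empty stem 9 still appears in the output, which is what makes the gap before 104 visible.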
Split stemplot: used when the data do not have a large spread.
   Stems can only be split so that each row has the same possible number of leaves.
   Possible splits: 2 rows per stem (leaves 0-4 and 5-9) or 5 rows per stem (leaves 0-1, 2-3, 4-5, 6-7, 8-9).
   Example: Split stemplot for pulse data.
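One way to sketch the splitting rule: since a stem can hold ten different leaf digits, the number of rows per stem has to divide 10 evenly. The function below is an illustrative assumption (non-negative integers, leaf unit 1, hypothetical data), not a prescribed method.

```python
def split_stemplot(values, parts=2):
    """Return stemplot lines with each stem split into `parts` rows."""
    if parts not in (1, 2, 5, 10):
        raise ValueError("parts must divide 10 so every row can hold the same number of leaves")
    per_row = 10 // parts  # how many different leaf digits each row can hold

    def row_index(v):
        return (v // 10) * parts + (v % 10) // per_row

    data = sorted(values)
    rows = {r: [] for r in range(row_index(data[0]), row_index(data[-1]) + 1)}
    for v in data:
        rows[row_index(v)].append(v % 10)
    lines = ["Leaf unit = 1"]
    for r, leaves in rows.items():
        lines.append(f"{r // parts:>3} | {''.join(map(str, leaves))}".rstrip())
    return lines

# Hypothetical pulse rates, not from the notes.
pulses = [62, 64, 68, 70, 71, 75, 75, 78, 80, 82, 88]

for line in split_stemplot(pulses, parts=2):
    print(line)
```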
Back-to-back stemplot: used to compare the distribution of a quantitative variable between the two groups of a categorical variable with two possible outcomes.
   Example: Stemplot of pulse rates by gender.
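A back-to-back stemplot shares one column of stems, with one group's leaves growing to the left and the other's growing to the right. The sketch below assumes non-negative integers and a leaf unit of 1; the two lists are hypothetical stand-ins for the pulse rates of each gender.

```python
def back_to_back_stemplot(left, right):
    """Return lines with `left` leaves growing leftward and `right` leaves growing rightward."""
    combined = left + right
    rows = []
    for stem in range(min(combined) // 10, max(combined) // 10 + 1):
        # Left leaves are written in descending order so they increase away from the stem.
        left_leaves = "".join(str(v % 10) for v in sorted(left, reverse=True) if v // 10 == stem)
        right_leaves = "".join(str(v % 10) for v in sorted(right) if v // 10 == stem)
        rows.append((left_leaves, stem, right_leaves))
    width = max(len(left_leaves) for left_leaves, _, _ in rows)
    lines = ["Leaf unit = 1"]
    for left_leaves, stem, right_leaves in rows:
        lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}".rstrip())
    return lines

# Hypothetical pulse rates by gender, not from the notes.
females = [64, 68, 71, 75, 82]
males = [62, 70, 75, 78, 80, 88]

for line in back_to_back_stemplot(females, males):
    print(line)
```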
Describing Distributions: (3 things)
1. Center: the middle observation in an ordered data set (the median). May need to find the average of two observations.
2. Shape: describes the overall shape of the distribution.
   Bell shaped
   Skewed left
   Skewed right
   Uniform
   V-shaped
   Exponential
   Bimodal
   Outliers: observations that deviate from the overall pattern of the distribution. Outliers do not dictate the shape of the distribution.
3. Spread: max – min (the range).
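Center and spread as defined here are quick to compute. The sketch below finds the median (averaging the two middle observations when the count is even) and the range; the data are hypothetical and the function names are chosen only for this example.

```python
def median(values):
    """Middle observation of the ordered data; average of the two middle ones if n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def data_range(values):
    """Spread measured as max - min."""
    return max(values) - min(values)

# Hypothetical pulse rates, not from the notes.
pulses = [62, 64, 68, 70, 71, 75, 75, 78, 80, 82, 88, 104]

print("Center (median):", median(pulses))
print("Spread (range):", data_range(pulses))
```

In this made-up list the value 104 sits well away from the rest, so it would be reported as a possible outlier rather than used to describe the shape.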
5. Histogram: used with large quantitative data sets.
   Classes go on the horizontal axis.
   Label the horizontal axis with the lower class limits.
   Frequency (or relative frequency) goes on the vertical axis.
   Use approximately 4 to 8 bars.
   If an observation falls 2 or more bars away from the overall body of the data, it may be an outlier.
   A frequency table must be created before constructing the histogram.
   Class width = (range)/(number of classes)
   Class width represents the number of integers contained in a class.
   *** Always round the class width up to the next integer, even if the quotient is already an integer (a quotient of 8 becomes 9).
   The lower class limit of the first class is always the minimum.
Example: Create a histogram for the pulse rate data using 5 classes:
Classes | Frequency | Relative frequency | Cumulative frequency | Cumulative rel. freq.
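The pulse rate data are not reproduced in these notes, so the sketch below fills in the frequency table for a hypothetical list of integer pulse rates. It follows the class width rule stated above: divide the range by the number of classes and always round up, even when the quotient is already an integer. Function names are assumptions for illustration.

```python
def class_width(values, num_classes):
    """Range / number of classes, always rounded up (a whole-number quotient still goes up by 1)."""
    return (max(values) - min(values)) // num_classes + 1

def frequency_table(values, num_classes):
    """Return rows of (lower, upper, freq, rel freq, cumulative freq, cumulative rel freq)."""
    width = class_width(values, num_classes)
    n = len(values)
    rows = []
    cumulative = 0
    for i in range(num_classes):
        lower = min(values) + i * width   # the first lower limit is the minimum
        upper = lower + width - 1         # each class contains `width` integers
        freq = sum(lower <= v <= upper for v in values)
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative, cumulative / n))
    return rows

# Hypothetical integer pulse rates, not from the notes.
pulses = [62, 64, 68, 70, 71, 75, 75, 78, 80, 82, 88, 104]

print("Classes | Frequency | Rel. freq. | Cum. freq. | Cum. rel. freq.")
for lower, upper, freq, rel, cum, cum_rel in frequency_table(pulses, 5):
    print(f"{lower}-{upper} | {freq} | {rel:.3f} | {cum} | {cum_rel:.3f}")
```

With these values the range is 42, so the width is 42/5 = 8.4 rounded up to 9, and the classes run 62-70, 71-79, 80-88, 89-97, 98-106.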
6. Ogive: used to relate percentiles to the data and vice versa.
   The horizontal axis is labeled with the lower class limits.
   Place each point for the cumulative relative frequency above the lower class limit of the next class.
   Connect the points with segments.
   The vertical axis is labeled from 0 to 100% or from 0 to n.
   Example: Ogive for pulse rates
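The plotting rule for an ogive can be sketched as a list of points: start at 0% above the first lower class limit, then put each class's cumulative percent above the lower limit of the next class. The class limits, width, and frequencies below are hypothetical, and reading a percentile off the graph is approximated by linear interpolation along the connecting segments.

```python
def ogive_points(lower_limits, width, frequencies):
    """Return (x, cumulative percent) points for an ogive."""
    n = sum(frequencies)
    points = [(lower_limits[0], 0.0)]
    cumulative = 0
    for lower, freq in zip(lower_limits, frequencies):
        cumulative += freq
        # Plot above the lower class limit of the NEXT class.
        points.append((lower + width, 100 * cumulative / n))
    return points

def value_at_percentile(points, pct):
    """Estimate the data value at a given percentile by interpolating along the segments."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p1 > p0 and p0 <= pct <= p1:
            return x0 + (pct - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("percentile must be between 0 and 100")

# Hypothetical frequency table: classes 62-70, 71-79, 80-88, 89-97, 98-106.
points = ogive_points([62, 71, 80, 89, 98], 9, [4, 4, 3, 0, 1])
for x, pct in points:
    print(f"{x}: {pct:.1f}%")
print("Estimated 50th percentile:", value_at_percentile(points, 50))
```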
7. Timeplot: used to observe trends.
   Time goes on the horizontal axis.
   The variable goes on the vertical axis.
   Connect the points with segments.
   Example: Apple stock price over the years.