STAT 4030 – Programming in R STATISTICS MODULE: Basic Data Analysis


1 STAT 4030 – Programming in R STATISTICS MODULE: Basic Data Analysis
Jennifer Lewis Priestley, Ph.D., Kennesaw State University

2 STATISTICS MODULE
Basic Descriptive Statistics and Confidence Intervals
Basic Visualizations
  Histograms
  Pie Charts
  Bar Charts
  Scatterplots
t-Tests
  One Sample
  Paired
  Independent Two Sample
ANOVA
Chi Square and Odds
Regression Basics

3 Statistics Module: Descriptive Statistics
Center: where do we find most of the data?
Distribution, or shape: for example, a bell-shaped curve.
Variation, or dispersion: how spread out are the data? On average, how far are observations from the center?
Outliers: do we have points that are so unusual that we need to address them separately?

4 Statistics Module: Descriptive Statistics
The “center” of a data set can be described using three different measures:
Mean: the commonly known “average”
Median: the midpoint
Mode: the most frequently occurring value
Without any additional information, the “center” of the data is the value we would expect for any observation pulled at random.
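The three measures of center can be sketched in R as follows. The `scores` vector is made-up data for illustration, and the variable names are hypothetical. Base R has `mean()` and `median()`, but its `mode()` function reports an object's storage type, so the statistical mode is computed here by tabulating the values.

```r
# Hypothetical sample: exam scores made up for illustration
scores <- c(70, 85, 85, 90, 72, 85, 60, 95, 78)

center_mean   <- mean(scores)    # arithmetic average
center_median <- median(scores)  # middle value of the sorted data

# mode() in base R returns the storage type, not the statistical mode,
# so count each distinct value and keep the most frequent one(s).
counts      <- table(scores)
center_mode <- as.numeric(names(counts)[counts == max(counts)])

print(c(mean = center_mean, median = center_median))
print(center_mode)
```

If several values tie for the highest count, `center_mode` holds all of them.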

5 Statistics Module: Descriptive Statistics

6 Statistics Module: Descriptive Statistics
In a symmetric, bell-shaped distribution, we typically describe the entire distribution using only two numbers: the mean and the standard deviation. The standard deviation is roughly the average distance of the observations from their mean.
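As a minimal sketch of that idea, the following R code computes the standard deviation by hand and compares it with the built-in `sd()`. It assumes the sample version of the formula, which divides by n - 1 (this is what `sd()` uses). The data vector and variable names are made up for illustration. The mean absolute deviation is included only to show why "average distance" is a rough description rather than an exact one.

```r
# Hypothetical data, made up for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

deviations <- x - mean(x)  # signed distance of each observation from the mean

# Sample standard deviation: square root of the sum of squared
# deviations divided by (n - 1)
sd_manual  <- sqrt(sum(deviations^2) / (n - 1))
sd_builtin <- sd(x)        # base R uses the same n - 1 denominator

# The literal "average distance from the mean", for comparison
mean_abs_dev <- mean(abs(deviations))

print(c(manual = sd_manual, builtin = sd_builtin, mean_abs_dev = mean_abs_dev))
```

The two standard deviation values agree, while the mean absolute deviation is somewhat smaller because squaring gives extra weight to observations far from the mean.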

7 Statistics Module: Descriptive Statistics
The Empirical Rule
For any normal curve, approximately:
68% of the values fall within 1 standard deviation of the mean
95% of the values fall within 2 standard deviations of the mean
99.7% of the values fall within 3 standard deviations of the mean
Using this logic, what is the definition of an outlier?
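The percentages in the Empirical Rule can be checked against the normal distribution function `pnorm()`. The helper `flag_outliers()` below is a hypothetical name, and its cutoff of 3 standard deviations is an assumption: it is one common answer to the question above, not a definition given in the slides. The sample data are made up.

```r
# Share of a normal distribution within k standard deviations of the mean
k <- 1:3
coverage <- pnorm(k) - pnorm(-k)
print(round(coverage, 4))  # 0.6827 0.9545 0.9973

# Hypothetical helper: flag observations more than `cutoff` standard
# deviations from the mean (cutoff = 3 is an assumed convention)
flag_outliers <- function(x, cutoff = 3) {
  abs(x - mean(x)) > cutoff * sd(x)
}

# Made-up data: twenty ordinary values and one extreme value
x <- c(rep(10, 20), 100)
print(x[flag_outliers(x)])
```

Note that an extreme value inflates both the mean and the standard deviation, so this rule can miss outliers in small samples.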

8 Statistics Module: Descriptive Statistics
Boxplots are helpful to visualize all of these at the same time.
[Figure: boxplot labeled with Minimum, Lower Quartile, Median, Upper Quartile, Maximum, Inter-Quartile Range, Mean (+), and Outlier (*)]
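The quantities a boxplot displays can be computed directly in base R. This is a sketch using made-up data and hypothetical variable names. `boxplot.stats()` applies the usual convention of flagging points more than 1.5 times the box length beyond the box, which is an assumption about how the slide's figure defines an outlier.

```r
# Hypothetical data, made up for illustration; 45 is deliberately extreme
x <- c(12, 15, 17, 18, 19, 20, 21, 22, 24, 45)

five <- fivenum(x)  # minimum, lower hinge, median, upper hinge, maximum
iqr  <- IQR(x)      # inter-quartile range (quantile-based, may differ
                    # slightly from the distance between the hinges)

bp <- boxplot.stats(x)  # the numbers boxplot() would draw
whiskers_and_box <- bp$stats  # whisker ends, hinges, and median
outliers         <- bp$out    # points beyond the whiskers

print(five)
print(iqr)
print(outliers)
```

Calling `boxplot(x)` draws the plot itself, and `points(1, mean(x), pch = 3)` adds a "+" marker at the mean.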

9 Statistics Module: Descriptive Statistics
When developing a visual representation of a single variable, the most common tools are histograms, pie charts, bar charts, box plots, and stem-and-leaf plots. Visualizations are as much a part of the data discovery process as descriptive statistics. These visualizations will be addressed in a separate set of notes.

