STAT 4030 – Programming in R STATISTICS MODULE: Basic Data Analysis


Jennifer Lewis Priestley, Ph.D.
Kennesaw State University

STATISTICS MODULE
  Basic Descriptive Statistics and Confidence Intervals
  Basic Visualizations
    Histograms
    Pie Charts
    Bar Charts
    Scatterplots
  T-tests
    One Sample
    Paired
    Independent Two Sample
  ANOVA
  Chi Square and Odds
  Regression Basics

Statistics Module: Descriptive Statistics
Descriptive statistics answer four questions about a variable:
  Center: where do we find most of the data?
  Distribution, or shape: does the data follow a recognizable pattern, such as a bell-shaped curve?
  Variation, or dispersion: how spread out is the data? On average, how far are observations from the center?
  Outliers: do we have points that are so unusual that we need to address them separately?

Statistics Module: Descriptive Statistics
The “center” of a data set can be described using three different measures:
  Mean: the commonly known “average”
  Median: the midpoint of the sorted values
  Mode: the most frequently occurring value
Without any additional information, the “center” of the data is our best guess for the value of any observation pulled at random.
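The three measures of center can be computed in R along the following lines. This is a minimal sketch: the vector `x` is made-up illustration data, and `stat_mode()` is a hypothetical helper written for this example, since base R has no function for the statistical mode.

```r
# Made-up sample for illustration; note the single large value (54).
x <- c(16, 19, 21, 18, 18, 54, 20, 22, 23, 17)

x_mean <- mean(x)      # arithmetic average
x_median <- median(x)  # midpoint of the sorted values

# Hypothetical helper: base R's mode() reports the storage type,
# not the most frequent value, so count the values with table().
stat_mode <- function(v) {
  counts <- table(v)
  # Keep every value that reaches the highest count (ties are all returned).
  as.numeric(names(counts)[as.vector(counts) == max(counts)])
}
x_mode <- stat_mode(x)

print(c(mean = x_mean, median = x_median, mode = x_mode))
```

In this sample the mean is pulled toward the one large value while the median and mode are not, which is one reason to look at more than one measure of center.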

Statistics Module: Descriptive Statistics
In a symmetric, bell-shaped distribution we typically describe the entire distribution using only two numbers: the mean and the standard deviation.
The standard deviation is roughly the average distance of the observations from their mean.
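As an illustration of "roughly the average distance", the sketch below computes the sample standard deviation by hand and compares it with R's built-in `sd()`. It assumes the sample form of the formula, s = sqrt( sum((x_i - mean)^2) / (n - 1) ), which is what `sd()` uses; the data vector is made up for the example.

```r
# Made-up sample for illustration.
x <- c(16, 19, 21, 18, 18, 54, 20, 22, 23, 17)
n <- length(x)

# Distance of each observation from the mean (these sum to zero).
deviations <- x - mean(x)

# Sample standard deviation: square the deviations, average them
# over n - 1, then take the square root.
s_manual <- sqrt(sum(deviations^2) / (n - 1))

# Built-in equivalent; sd() also divides by n - 1.
s_builtin <- sd(x)

# For comparison: the literal average distance from the mean.
mean_abs_dev <- mean(abs(deviations))

print(c(manual = s_manual, builtin = s_builtin, mean_abs_dev = mean_abs_dev))
```

The standard deviation and the mean absolute deviation are not the same quantity, which is why the description says "roughly": squaring gives large deviations more weight.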

Statistics Module: Descriptive Statistics
The Empirical Rule
For any normal curve, approximately:
  68% of the values fall within 1 standard deviation of the mean
  95% of the values fall within 2 standard deviations of the mean
  99.7% of the values fall within 3 standard deviations of the mean
Using this logic, what is the definition of an outlier?
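The percentages in the Empirical Rule can be checked with `pnorm()`, the normal cumulative distribution function. The second half of the sketch shows one possible answer to the question above: flag observations more than some number of standard deviations from the mean. The helper `flag_outliers()`, its `cutoff` argument, and the data are assumptions made for this example, not a definition taken from the slides.

```r
# Probability that a normal value lies within k standard deviations of the mean.
k <- 1:3
within_k <- pnorm(k) - pnorm(-k)
print(round(within_k, 4))   # 0.6827 0.9545 0.9973

# Hypothetical helper: return the values whose z-score exceeds the cutoff.
flag_outliers <- function(v, cutoff = 3) {
  z <- (v - mean(v)) / sd(v)   # distance from the mean in standard deviations
  v[abs(z) > cutoff]
}

# Made-up sample for illustration.
x <- c(16, 19, 21, 18, 18, 54, 20, 22, 23, 17)

print(flag_outliers(x, cutoff = 2))  # 54 lies more than 2 standard deviations out
print(flag_outliers(x))              # nothing in this sample exceeds 3
```

The value 54 inflates both the mean and the standard deviation it is measured against, so in a sample this small it does not reach the 3 standard deviation cutoff. The choice of cutoff is a judgment call.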

Statistics Module: Descriptive Statistics
Boxplots are helpful to visualize all of these at the same time.
[Figure: boxplot labeled with Minimum, Lower Quartile, Median, Upper Quartile, Maximum, Inter-Quartile Range, Mean (+) and Outlier (*)]
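A boxplot along these lines can be sketched in base R as follows. The data are made up, and marking the mean with a `+` is added by hand with `points()` because `boxplot()` does not draw it. Note that R draws outliers as open circles by default and flags them with the 1.5 x IQR convention, which may differ from the software used for the original figure.

```r
# Made-up sample for illustration.
x <- c(16, 19, 21, 18, 18, 54, 20, 22, 23, 17)

# Numbers behind the plot: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
print(five)

# boxplot.stats() returns the whisker ends and hinges in $stats and any
# points beyond 1.5 times the box length from the box in $out.
bs <- boxplot.stats(x)
print(bs$out)

# Draw the boxplot and overlay the mean as a "+" (pch = 3).
boxplot(x, horizontal = TRUE, main = "Boxplot of x")
points(mean(x), 1, pch = 3)
```

`quantile()` and `IQR()` use a slightly different quartile definition than the hinges from `fivenum()`, so the two can disagree a little on small samples.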

Statistics Module: Descriptive Statistics
When developing a visual representation of a single variable, the most common tools are histograms, pie charts, bar charts, box plots, and stem-and-leaf plots.
Visualizations are as much a part of the data discovery process as descriptive statistics.
These visualizations will be addressed in a separate set of notes.