STAT02 - Descriptive statistics (cont.) 1 Descriptive statistics (cont.) Lecturer: Smilen Dimitrov Applied statistics for testing and evaluation – MED4.

Introduction
We previously discussed the arithmetic mean as a measure of central tendency (location) of a data sample (collection) in descriptive statistics. Here we continue with other important measures of central tendency, namely the mode and the median. We will also get acquainted with frequency tables and their graphical form, histograms, and with the range as a measure of statistical variability (dispersion or spread). Finally, we will look at how to perform these operations in R, and at a bit more about plotting.

Arithmetic mean as central tendency, range and outliers
The range is the length of the smallest interval that contains all the data.
– It is calculated by subtracting the smallest observation from the largest one.
In R, we can use the commands min and max to find the range of a data collection. We can use abline to plot straight lines, for example to mark the range and the mean on a graph.
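Here is a minimal sketch of the range, written in Python rather than the course's R. The values are made up and stand in for the raisin counts; they are not the lecture's data. The computation is simply the largest observation minus the smallest, i.e. what R's max and min return. Note that R's own range(x) returns the pair c(min(x), max(x)) rather than their difference, so the single number defined above would be diff(range(x)) in R.

```python
# Hypothetical sample values (NOT the lecture's raisin data), for illustration only.
raisins = [28, 30, 31, 29, 33, 30, 27]

def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

print(min(raisins), max(raisins), data_range(raisins))  # 27 33 6
```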

Arithmetic mean as central tendency, range and outliers
Figure: our sample data set (raisins), with quantities plotted as a bar graph (using barplot), and with the range and arithmetic mean marked.

Arithmetic mean as central tendency, range and outliers
Figure: our sample data set (raisins), with quantities plotted as a point/line plot (using plot), and with the range and arithmetic mean marked. With plot, the y axis is scaled automatically to span the range of the data.

Arithmetic mean as central tendency, range and outliers
Figure: our sample data set (raisins), with quantities plotted as a point/line plot (using plot), and with the range and arithmetic mean marked, after changing only one value so that it lies outside the original range.

Arithmetic mean as central tendency, range and outliers
Both the range and the arithmetic mean change significantly if just one value is very different from the others.
– Such a value is called an outlier: a single observation 'far away' from the rest of the data.
However, one outlier does not change the fact that the other values still lie close to the original arithmetic mean and within the original range. Therefore, we need tools for describing central tendency and variability that are less sensitive to outliers. For central tendency, we can use the mode and the median.
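The effect described above can be checked numerically. This is a small Python sketch on hypothetical data, not the lecture's: one value of a seven-element sample is replaced by an outlier, and the mean, median and range are printed before and after. The mean jumps from about 4.57 to 8 and the range from 3 to 27, while the median stays at 5.

```python
from statistics import mean, median

# Hypothetical data: a small sample, then the same sample with its largest value replaced by an outlier.
original = [3, 4, 4, 5, 5, 5, 6]
with_outlier = [3, 4, 4, 5, 5, 5, 30]

for label, data in (("original", original), ("with outlier", with_outlier)):
    data_range = max(data) - min(data)
    print(f"{label:13s} mean={mean(data):.2f} median={median(data)} range={data_range}")
```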

Mode and frequency distribution
By definition, outliers occur rarely: they are single occurrences. It is therefore useful to see which values occur most often (most frequently). This is the mode.
– The mode is the most frequent value assumed by a random variable, or occurring in a sample of a random variable.
– It applies both to probability distributions and to collections of experimental data.
– It can be unusable for real numbers, since measured real values are usually all different (each occurs only once), unless we apply histogram techniques.
– It can be applied to nominal data (for instance, the most frequent name).
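Here is a Python sketch of the mode on hypothetical examples. It counts occurrences and returns every value that reaches the highest count, because a collection can have more than one mode. It works unchanged on nominal data. It also shows why the mode is useless for unique real values: every value is returned. Python's statistics.multimode (3.8+) does the same job. In R, by contrast, the built-in mode() returns an object's storage mode, not the statistical mode.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency (ties are kept).

    Assumes a non-empty collection.
    """
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, n in counts.items() if n == top]

# Hypothetical examples.
print(modes([2, 3, 3, 5, 3, 2]))                 # [3]
print(modes(["Anna", "Bo", "Anna", "Bo", "Cy"]))  # ['Anna', 'Bo']: nominal data, a tie
print(modes([0.12, 0.57, 0.98]))                 # all values occur once, so all are "modes"
```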

Mode and frequency distribution
To see which value occurs most frequently, we must first count how many times each value in the data collection occurs. This frequency count gives the (frequency) distribution.
– Collecting and aggregating data results in a distribution. Distributions are most often presented as a histogram or as a table (frequency table), which we may then try to approximate with a mathematical function in order to draw conclusions.
– The frequency of an event $i$ is the number $n_i$ of times the event occurred in the experiment or study. These frequencies are often represented graphically in histograms.
– Absolute frequencies: the counts $n_i$ themselves are given.
– (Relative) frequencies: the counts are normalized by the total number of events, $f_i = \frac{n_i}{N}$, where $N = \sum_j n_j$.
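Here is a minimal Python sketch of absolute and relative frequencies, using hypothetical observations. Counting gives the absolute frequencies $n_i$. Dividing each by the total number of events $N$ gives the relative frequencies $f_i = n_i / N$, which sum to 1. In R, table(x) gives the absolute counts and table(x) / length(x) the relative ones.

```python
from collections import Counter

# Hypothetical observations (e.g. integer counts per box); not data from the lecture.
observations = [4, 5, 5, 6, 5, 4, 7, 5, 6, 4]

absolute = Counter(observations)                            # n_i: occurrences of each value i
N = sum(absolute.values())                                  # total number of events
relative = {value: n / N for value, n in absolute.items()}  # f_i = n_i / N

for value in sorted(absolute):
    print(value, absolute[value], relative[value])
# 4 3 0.3
# 5 4 0.4
# 6 2 0.2
# 7 1 0.1
```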

Building a histogram and frequency table (example using applets)
1. Standard collection of our data.
2. Building a point plot histogram “manually”, from the individual counts observed.
3. Building a frequency table from the point plot histogram.
4. Transition from the point plot histogram to a bar graph histogram.
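The applet steps above can be imitated in plain text. The following Python sketch uses hypothetical integer data and stacks one mark per observation for each value, which gives a text "point plot histogram". The number of marks in each row is that value's entry in the frequency table, and replacing each row of marks with a bar would give the bar graph histogram. As a design choice, values that were never observed are still listed, with an empty row, so the value axis has no gaps.

```python
from collections import Counter

# Hypothetical sample of integer observations.
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6]

def point_plot(values, mark="o"):
    """Return one text row per value from min to max, with one mark per observation."""
    counts = Counter(values)  # missing values count as 0
    return [f"{v:>3} | {mark * counts[v]}" for v in range(min(values), max(values) + 1)]

print("\n".join(point_plot(sample)))
```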

Mode and frequency distribution - histogram
A histogram is a graphical display of a frequency table (distribution), that is, of tabulated frequencies.
– It is the graphical version of a table showing what proportion of cases fall into each of several (or many) specified categories.
– The categories are usually specified as non-overlapping intervals of some variable, called bins.
– In a more general mathematical sense, a histogram is simply a mapping that counts the number of observations falling into various disjoint categories (bins). The graph of a histogram is merely one way to represent it.
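The "mapping into disjoint bins" definition can be sketched directly in Python. This also shows how binning rescues the mode for real-valued data: the most populated bin (the modal bin) plays the role of the mode. The data, the bin start and the bin width are all hypothetical choices, and a different choice can shift the counts and the modal bin.

```python
import math
from collections import Counter

def bin_counts(values, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Returns a Counter keyed by the left edge of each bin.
    """
    counts = Counter()
    for x in values:
        k = math.floor((x - start) / width)  # index of the bin that contains x
        counts[start + k * width] += 1
    return counts

# Hypothetical real-valued measurements: no value repeats, so a plain mode is useless.
measurements = [1.2, 1.7, 2.1, 2.4, 2.6, 2.9, 3.3, 4.8]
binned = bin_counts(measurements, start=1.0, width=1.0)
modal_bin = max(binned, key=binned.get)
print(sorted(binned.items()), "modal bin starts at", modal_bin)
# [(1.0, 2), (2.0, 4), (3.0, 1), (4.0, 1)] modal bin starts at 2.0
```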

Mode and frequency distribution - histogram
In R, a frequency table is obtained with the table command. For integer data, a histogram is most easily drawn by plotting the output of table using plot or barplot.

Mode and frequency distribution - histogram
R also has a dedicated command, hist, for plotting histograms.
– However, since it accepts real-valued (not only integer) numeric data, it needs some fine-tuning to graph integer data correctly.
Plotting relative frequencies is straightforward: divide the counts by the number of elements (length) in the data collection.
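One common fine-tuning for integer data is to place the bin edges halfway between integers. In R that could look like hist(x, breaks = seq(min(x) - 0.5, max(x) + 0.5, by = 1)). This is offered as a suggestion, not necessarily the adjustment used in the lecture. With such unit-width bins, hist(..., freq = FALSE) plots densities that equal the relative frequencies.

The Python sketch below shows why this matters. It imitates, as an assumption, R's default right-closed bins (a, b], with the first bin also closed on the left. With edges placed at the integers themselves, 1 and 2 end up in the same bin. Half-integer edges give each integer its own bin.

```python
def right_closed_counts(values, edges):
    """Count values per bin (edges[i], edges[i+1]]; the first bin is also closed on the left.

    Assumed to mimic the default right-closed histogram cells of R's hist().
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(counts)):
            lo, hi = edges[i], edges[i + 1]
            if lo < x <= hi or (i == 0 and x == lo):
                counts[i] += 1
                break
    return counts

data = [1, 2, 2, 3, 3, 3, 4]               # hypothetical integer data
naive_edges = [1, 2, 3, 4]                 # edges at the integers themselves
half_edges = [0.5, 1.5, 2.5, 3.5, 4.5]     # edges halfway between integers

print(right_closed_counts(data, naive_edges))  # [3, 3, 1]: 1 and 2 share the first bin
counts = right_closed_counts(data, half_edges)
print(counts)                                   # [1, 2, 3, 1]: one bin per integer
print([c / len(data) for c in counts])          # relative frequencies
```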

Median
The median is a number separating the higher half of a sample, a population, or a probability distribution from the lower half.
– At most half of the population has values less than the median, and at most half has values greater than the median.
– If both groups contain less than half of the population, then some of the population is exactly equal to the median.
To find the median step by step (for instance, in R), one should:
– sort the data collection in ascending order;
– find out whether the data collection has an odd or an even number of elements:
– if odd, return the middle element of the sorted collection;
– if even, return the mean of the two middle elements.
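The sort-then-pick-the-middle procedure above translates almost line by line into the Python sketch below (R's built-in median(x) does the same directly). The example inputs are hypothetical.

```python
def median(values):
    """Median: sort, then take the middle element (odd n) or the mean of the two middle ones (even n)."""
    if not values:
        raise ValueError("the median of an empty collection is undefined")
    ordered = sorted(values)       # sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd number of elements: the single middle element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the two middle elements

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```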

Review
Descriptive statistics:
– Measures of central tendency (location): arithmetic mean, median, mode
– Measure of statistical variability (dispersion, spread): range

Exercise for mini-module 2 – STAT02
Use the Sample Data Set of Southern Oscillations, given on
– Collect the Southern Oscillation data per month for three consecutive years in an Excel sheet.
– Choose the years based on your group number g, according to the formula: (so group 1 would choose 1955, 1956, 1957; group 2 would choose 1958, 1959, 1960; etc.)
– Multiply all oscillation data by 10 so as to work with integers.
– Hint: you could use month numbers as row names and years as column names, both in Excel and in R.
Import the data into R and, for each year, find the arithmetic mean, the median and the mode of the oscillation.
Using R, plot the oscillation for each month as a quantity, for each of the assigned years. Mark the range and the median graphically on each graph.
Using R, plot the relative frequency histogram for each of the assigned years. Mark the arithmetic mean graphically on each graph.
Delivery: deliver the collected data (in tabular format), the statistics found, and the requested graphs for the assigned years in an electronic document. You are welcome to include R code as well.
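The exercise asks for R. The Python sketch below only illustrates the per-year computations, on made-up monthly values labelled "year_A" and "year_B". These are not real Southern Oscillation data and must be replaced with the values for your assigned years. The scaling step rounds after multiplying by 10, so any floating-point representation error is removed and the values really are integers. The mode uses statistics.multimode (Python 3.8+) because a year can have more than one most frequent value.

```python
from statistics import mean, median, multimode

# Hypothetical monthly values (12 per year); NOT real Southern Oscillation data.
raw = {
    "year_A": [0.3, -0.2, 0.3, 1.1, -0.5, 0.3, 0.0, -0.2, 0.4, 0.6, -0.2, 0.1],
    "year_B": [-1.0, -0.4, 0.2, -0.4, 0.8, 0.5, -0.4, 0.9, 1.2, 0.7, 0.1, -0.3],
}

# Multiply by 10 and round to the nearest integer, as the exercise suggests.
scaled = {year: [round(v * 10) for v in values] for year, values in raw.items()}

for year, values in scaled.items():
    print(year,
          "mean:", round(mean(values), 2),
          "median:", median(values),
          "mode(s):", multimode(values),
          "range:", max(values) - min(values))
```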