Exploratory data analysis, descriptive measures and sampling or, “How to explore numbers in tables and charts”

Outline
Types of numerical data
Frequency distributions
Relationships between data
Sampling strategies

Four types of numerical data
Categorical (or nominal)
  –Categories: yes/no, male/female…
Ordinal
  –Whole numbers, but showing a rank order (e.g. surveys with frequencies of scores, as in 1-10)
Continuous variables (include fractions)
  –Interval-level measurements (e.g. °C, or dates)
    No true zero (the zero point is arbitrary); addition and subtraction are meaningful
    But ratios are not meaningful, so multiplication and division cannot be carried out
  –Ratio-level measurements (e.g. areas of land)
    Have a zero: an absolute origin
    Can be multiplied and divided

Discussion
Examples of the four kinds of numerical data
What sorts of data do geographers work with most?

Summary Source: Graphpad 2010

Descriptions of frequency for sample data
The sophistication of the machine!
  –We can put data into a programme and get ‘statistics’ out, but do we know what they really mean?
What basic frequency descriptors are there?
  –Central values (measures of central tendency)
    Mode
    Median
    Mean
  –Dispersal
    Range
    Quartiles
    Standard deviation

The mode
  –The measurement with the maximum frequency
In this instance
  –The mode is “3”

The median
  –The middle measurement when the measurements are placed in order of magnitude
In this instance
  –75 measurements
  –The middle one is the 38th measurement
  –Again “3”

The mean, or average
What is the mean?
  –What is loosely called the average
  –Mean = Σx/n
In this example
  –[(10 × 1) + (20 × 2) + (30 × 3) + (15 × 4)] / 75
  –200/75 = 2.667
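
The three central values can be checked with a short Python sketch. It assumes the frequency table implied by the mean calculation (10 ones, 20 twos, 30 threes and 15 fours); the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Frequency table implied by the worked example: value -> number of measurements
frequencies = {1: 10, 2: 20, 3: 30, 4: 15}

# Expand the table into individual measurements, in order of magnitude
measurements = [value for value, count in sorted(frequencies.items())
                for _ in range(count)]

n = len(measurements)                 # total number of measurements
sample_mode = mode(measurements)      # measurement with the maximum frequency
sample_median = median(measurements)  # middle (38th) measurement
sample_mean = mean(measurements)      # sum of all measurements divided by n

print(n, sample_mode, sample_median, round(sample_mean, 3))
# prints: 75 3 3 2.667
```

The mode and median coincide here, while the mean is pulled slightly lower by the larger number of low values.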

Measures of dispersal
Range
  –Difference between maximum and minimum (6)
Quartiles
  –Dividing a frequency distribution into quarters (25 in each in this example)
  –One quarter of the area below the first quartile (3)
  –One quarter of the area above the upper quartile (5)
  –Median in the middle (4)
  –Inter-quartile range
    Difference between quartiles (2)
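
As a rough sketch, range and quartiles can be computed in Python as follows. The data behind this slide are not in the transcript, so the example reuses the frequency table from the mean calculation (10 ones, 20 twos, 30 threes, 15 fours) as a stand-in; its results therefore differ from the figures quoted on the slide. Quartile conventions vary between packages, and this uses the default ("exclusive") method of `statistics.quantiles`.

```python
from statistics import quantiles

# Stand-in data (assumption): the frequency table from the mean example
frequencies = {1: 10, 2: 20, 3: 30, 4: 15}
measurements = [value for value, count in sorted(frequencies.items())
                for _ in range(count)]

# Range: difference between maximum and minimum
value_range = max(measurements) - min(measurements)

# Quartiles: the three cut points that divide the ordered data into quarters
q1, q2, q3 = quantiles(measurements, n=4)

# Inter-quartile range: difference between upper and lower quartiles
iqr = q3 - q1

print(value_range, q1, q2, q3, iqr)
```

Other software may interpolate differently, so it is worth stating which quartile method was used when reporting results.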

Measures of dispersal
Standard deviation of a population (square root of the variance)
  –σ = √{Σ(x − x̄)²/n}
  –Note: for a sample use n − 1 (linked to degrees of freedom)
In essence
  –Subtracting the mean (4) from each value
  –Summing the squares of these differences
    To remove the effect of sign
  –Dividing by the total number of values
    To weight it
  –And then taking the square root to scale it
  –σn = 1.581
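
The steps listed above can be followed one by one in a short Python sketch. The values are hypothetical (they are not the data behind the slide's result), chosen so that the arithmetic is easy to follow; the standard library functions `pstdev` and `stdev` are used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

# Hypothetical data (assumption), not the data set used on the slide
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)
mean_value = sum(values) / n

# Subtract the mean from each value and square the differences (removes the sign)
squared_differences = [(x - mean_value) ** 2 for x in values]

# Population form: divide by n, then take the square root
population_sd = sqrt(sum(squared_differences) / n)

# Sample form: divide by n - 1 instead (degrees of freedom)
sample_sd = sqrt(sum(squared_differences) / (n - 1))

print(population_sd, round(sample_sd, 3))
# Cross-check against the standard library
print(pstdev(values), round(stdev(values), 3))
```

The sample form is always slightly larger than the population form, and the gap narrows as n grows.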

Descriptors
Categorical/nominal
  –Central tendency: mode
Ordinal
  –Mode or median
  –Spread through inter-quartile range
Interval
  –Mode, median or mean
  –Spread with standard deviation
Ratio
  –Mode, median or mean
  –Spread with standard deviation
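
One way to keep these pairings to hand is a simple lookup table. The sketch below only restates the list above; the dictionary layout and the function name `descriptors_for` are illustrative assumptions.

```python
# Descriptors listed on the slide for each level of measurement.
# The structure and names here are hypothetical, for illustration only.
DESCRIPTORS = {
    "nominal": {"central": ("mode",), "spread": ()},
    "ordinal": {"central": ("mode", "median"),
                "spread": ("inter-quartile range",)},
    "interval": {"central": ("mode", "median", "mean"),
                 "spread": ("standard deviation",)},
    "ratio": {"central": ("mode", "median", "mean"),
              "spread": ("standard deviation",)},
}


def descriptors_for(level):
    """Return the central-tendency and spread measures listed for a level."""
    return DESCRIPTORS[level.lower()]


print(descriptors_for("Ordinal"))
```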

Summary Source: Graphpad 2010

Opportunity for discussion
There are different kinds of numbers
The measures we use to describe them therefore need to be different

Showing relationships
How are different sets of data related?
How can we best visualise these relationships?
  –Hans Rosling and Gapminder
Use of graphs and maps
Time series
Independent (x) and dependent (y) variables

Clustering techniques
Exploring ways to disaggregate data
  –Clusters within an overall distribution
What might the plot on the right represent?

Outliers being the interesting data
Again, how might we ‘explain’ the distribution on the right?
Always seek to explore differences within the data

Appropriate representations for various data
Columns/histogram
Lines
Pie charts
Scatter graphs
Doughnuts
Surfaces

Distinguishing between cause and effect
Independent and dependent variables
  –X generally independent
  –Y generally dependent
Essential to distinguish between
  –A relationship between data; and
  –Actual explanation of cause and effect
Importance for analysis
  –Independent variables explaining variation in dependent variables
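
To make the distinction concrete, the sketch below measures the strength of a relationship between x and y with a Pearson correlation coefficient. This is a technique brought in for illustration (the slides do not name it), and the data are hypothetical. A value close to 1 shows that the two variables move together; it says nothing about whether x causes y.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical data: x is treated as independent, y as dependent
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print(round(r, 3))
```

A strong coefficient like this one describes a relationship between the data; explaining cause and effect still needs an argument from outside the numbers.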

Opportunity for discussion

Samples and populations
The basic question
  –Does our sample represent the population as a whole?
If so
  –We can be confident (within specific limits) of our results
  –We can claim that they are significant
Sampling strategy is therefore of crucial importance
  –And must be discussed in detail in methodologies

Sampling strategies
Non-probability
  –Convenience
    Whoever comes along
  –Purposive
    Case study approach: a “typical” place
  –Quota
    On basis of criteria such as age, ethnicity, gender
Probability
  –Random
    Equal likelihood of anyone being selected
    Use of Random Number Tables
  –Systematic
    Selecting at regular intervals
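
The two probability strategies can be sketched in Python as below. The sampling frame of 100 numbered units, the sample size and the fixed seed are all assumptions made for illustration; a pseudo-random number generator stands in for a random number table.

```python
import random

# Hypothetical sampling frame: 100 numbered units
population = list(range(1, 101))
sample_size = 10

# Fixed seed so the example is repeatable; omit it in real use
rng = random.Random(42)

# Random sampling: every unit has an equal likelihood of being selected
simple_random_sample = rng.sample(population, sample_size)

# Systematic sampling: random starting point, then every k-th unit
interval = len(population) // sample_size
start = rng.randrange(interval)
systematic_sample = population[start::interval]

print(sorted(simple_random_sample))
print(systematic_sample)
```

Systematic sampling is easy to carry out in the field, but it can mislead if the list has a repeating pattern with the same period as the sampling interval.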

Stratified sampling
Assuming an existing pattern in the population
  –Can be used with both probability and non-probability sampling
  –Divides the population into strata (subsets) that are each then sampled using one of the other methods
    Socio-economic groups, gender…
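
A minimal sketch of stratified sampling, assuming a hypothetical population of 100 units split into two strata ("urban" and "rural") and a 10% sampling fraction applied within each stratum. Random selection is used inside each stratum here, although any of the other methods could be substituted.

```python
import random
from collections import defaultdict

# Hypothetical population: (unit id, stratum) pairs
population = ([(i, "urban") for i in range(60)]
              + [(i, "rural") for i in range(60, 100)])
sampling_fraction = 0.1

# Fixed seed so the example is repeatable; omit it in real use
rng = random.Random(0)

# Divide the population into strata (subsets)
strata = defaultdict(list)
for unit_id, stratum in population:
    strata[stratum].append(unit_id)

# Sample each stratum separately, in proportion to its size
stratified_sample = {
    stratum: rng.sample(units, round(len(units) * sampling_fraction))
    for stratum, units in strata.items()
}

for stratum, units in stratified_sample.items():
    print(stratum, len(units), sorted(units))
```

Because each stratum is sampled in proportion to its size, the sample keeps the same 60:40 split as the population.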

Our samples determine our results
Need for sufficient sample size to be significant
  –How do we know what sample size is needed?
  –Does the sample represent the population?
  –Relationship to analytical structure
Must justify our sampling strategies in our methodologies
Sampling applies equally in quantitative and qualitative research

Discussion