There will be no lecture on Friday, 9 October. The exercise session will take place this week.



Last lecture summary: mode; distribution; five-number summary, percentiles, mean; box plot, modified box plot; robust statistic – mean, median, trimmed mean; outlier

SDA girls – histogram of heights 2014 n = 48 or N = 48 bin size = 3.8

SDA girls – all previous years + actual n = 69 bin size = 3.8

MEASURES OF VARIABILITY

Setting the mood – Introduction to statistics

QUESTION Mean1 Mean2 Mode1 Mode2 Median1 Median2 – Statistics n = 1000

Range (Czech: variační rozpětí) = max - min – Statistics n = 1000

Range: the range changes when we add new data to the dataset. Always? Sometimes? Never? – Statistics n = 1000

Adding Mark Zuckerberg – Statistics n = 1000

Cut off data: interquartile range, IQR (Czech: mezikvartilové rozpětí) – Statistics n = 1000
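
To make the contrast between the range and the IQR concrete, here is a minimal Python sketch. The numbers are hypothetical and are not the n = 1000 data set shown on the slides; the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
import statistics

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: Q3 minus Q1 (default 'exclusive' quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical values, invented for illustration only.
values = [12, 45, 60, 75, 80, 95, 110, 130, 150, 200, 240, 310]
with_extreme = values + [50_000]  # add one extreme observation

print("range:", value_range(values), "->", value_range(with_extreme))
print("IQR:  ", iqr(values), "->", iqr(with_extreme))
```

In this sketch a single extreme value inflates the range enormously, while the IQR moves only a little because it ignores the lowest and highest quarters of the data.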

Interquartile range, IQR. Let's take this quiz, answer yes or no.
1. About 50% of the data fall within the IQR.
2. The IQR is affected by every value in the data set.
3. The IQR is not affected by outliers.
4. The mean is always between Q1 and Q3.
Figure: Q1 = 1, Q2, Q3 = 3; mean = 8.62, n = 13 – Statistics

Define the outlier
Sample (n = 10): $38,946, $43,420, $49,160, $50,430, $50,557, $52,580, $53,595, $54,160, $60,181, $10,000,000
What values are outliers for this data set?
1. $60,000
2. $80,000
3. $100,000
4. $200,000
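
One common way to answer this question is the 1.5 × IQR rule used for modified box plots. The slide does not state which rule it intends, so the sketch below is an assumption: it uses Python's `statistics.quantiles` with its default quartile method, and a different quartile convention can shift the fences.

```python
import statistics

# Sample from the slide (n = 10).
salaries = [38_946, 43_420, 49_160, 50_430, 50_557,
            52_580, 53_595, 54_160, 60_181, 10_000_000]

q1, _, q3 = statistics.quantiles(salaries, n=4)  # default 'exclusive' method
iqr = q3 - q1

# Assumed rule: values beyond 1.5 * IQR from the quartiles count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

# The quiz options, checked against the fences of the original sample.
candidates = [60_000, 80_000, 100_000, 200_000]
flagged = [c for c in candidates if c > upper_fence]

print("fences:", lower_fence, upper_fence)
print("outliers in sample:", outliers)
print("quiz options above the upper fence:", flagged)
```

Under these assumptions the fences are computed from the quartiles alone, so the single $10,000,000 value does not drag the cut-off upward the way it drags the mean.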

Problem with IQR: normal, bimodal, uniform – Statistics

Options for measuring variability
1. Find the average distance between all pairs of data values.
2. Find the average distance between each data value and either the max or the min.
3. Find the average distance between each data value and the mean.

Average distance from mean Sample

Average distance from mean Sample Find the average distance between each data value and the mean.

Preventing cancellation
How can we prevent the negative and positive deviations from cancelling each other out?
1. Ignore (i.e. delete) the negative sign.
2. Multiply each deviation by two.
3. Square each deviation.
4. Take the absolute value of each deviation.
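
The cancellation problem is easy to see numerically. The sketch below uses a small hypothetical sample (the sample from the slides did not survive in this transcript) and compares the plain, absolute, and squared deviations from the mean.

```python
import statistics

# Hypothetical sample, invented for illustration only.
sample = [2, 4, 4, 5, 10]
mean = statistics.fmean(sample)

# Signed deviations from the mean always sum to zero, so their average is useless.
deviations = [x - mean for x in sample]
sum_of_deviations = sum(deviations)

# Two ways to stop positive and negative deviations from cancelling.
avg_abs_deviation = sum(abs(d) for d in deviations) / len(sample)
avg_sq_deviation = sum(d * d for d in deviations) / len(sample)

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("average absolute deviation:", avg_abs_deviation)
print("average squared deviation:", avg_sq_deviation)
```

Multiplying each deviation by two would not help, because the scaled deviations still sum to zero.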

Average absolute deviation Sample avg. absolute deviation = 4.6

Average absolute deviation

Squared deviations Sample avg. square deviation = 31.2

Variance: the average squared deviation has a special name, variance (Czech: rozptyl). – Statistics
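
Written as a formula, and following the slide's wording of an average over all n values, the variance can be sketched as below. The symbols x_i, x̄ and σ² are notation assumed here, not taken from the slides. Many textbooks divide by n − 1 instead of n when estimating the variance from a sample.

```latex
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 ,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```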

Standard deviation
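
The formula image for this slide did not survive in the transcript. As a hedged sketch: the standard deviation is the square root of the variance, which brings the measure back to the original units of the data. The example uses a hypothetical sample and the divide-by-n functions from Python's `statistics` module, matching the "average squared deviation" wording above.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
sample = [2, 4, 4, 5, 10]

variance = statistics.pvariance(sample)  # average squared deviation (divides by n)
std_dev = statistics.pstdev(sample)      # square root of that variance

print("variance:", variance)
print("standard deviation:", std_dev)
print("sqrt(variance) matches:", math.isclose(std_dev, math.sqrt(variance)))
```

`statistics.variance` and `statistics.stdev` are the n − 1 counterparts, usually preferred when the data are a sample from a larger population.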

What is so great about the standard deviation? Why don't we just find the average absolute deviation? More on absolute vs. standard deviation:
1. SD is used because of tradition.
2. It is easier to work with the power of two than with the absolute value.
3. SD has a very nice interpretation in the Gaussian distribution.

Standard deviation – empirical rule

Empirical rule – well behaved distribution

Empirical rule – not-so-well behaved distribution
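
The empirical rule says that in a bell-shaped (normal) distribution roughly 68%, 95% and 99.7% of the values lie within one, two and three standard deviations of the mean. The simulation below is a sketch with invented parameters (mean 170, SD 7, and an exponential distribution as the "not-so-well behaved" case); none of these come from the slides.

```python
import random
import statistics

def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mean) <= k * sd)
    return inside / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Well-behaved case: simulated bell-shaped data (hypothetical parameters).
bell = [rng.gauss(170, 7) for _ in range(20_000)]
# Not-so-well-behaved case: strongly right-skewed data.
skewed = [rng.expovariate(1.0) for _ in range(20_000)]

for k in (1, 2, 3):
    print(f"within {k} SD: bell {share_within(bell, k):.3f}, "
          f"skewed {share_within(skewed, k):.3f}")
```

For the skewed data the one-SD share comes out well above 68%, which illustrates why the rule should only be applied to roughly bell-shaped distributions.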

Statistical inference
The goal of statistics: make rational conclusions or decisions based on the incomplete information we have in our data. This process is known as statistical inference. In inferential statistics we want to answer:
1. Is some relationship in the data due to chance? Or is it a real difference?
2. If the effect is real, can it be generalized to a larger group?

Statistical jargon Population – the group we are interested in making conclusions about. Census – a collection of data on the entire population. Sample – if we can’t conduct a census, we collect data from the sample of a population. Goal: make conclusions about that population.

Statistical jargon population (census) vs. sample parameter (population) vs. statistic (sample)

Statistical inference
A statistic is a value calculated from our observed data (sample). A parameter is a value that describes the population. We want to be able to generalize what we observe in our data to our population. In order to do this, the sample needs to be representative. How to select a representative sample? Use randomization.

Random sampling
Simple Random Sampling (SRS) – each possible sample from the population is equally likely to be selected.
Stratified sampling – a simple random sample from subgroups of the population. Subgroups: gender, age groups, …
Cluster sampling – divide the population into non-overlapping groups (clusters); the sample is a randomly chosen cluster. Example: the population is all students in an area; randomly select schools and create a sample from the students of the given school.
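
The three designs can be sketched in a few lines of Python. Everything here is hypothetical: the population of 12 schools with 50 students each, the field names, and the helper functions are invented for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 12 schools (clusters) with 50 students each.
population = [
    {"id": school * 50 + i, "school": school, "gender": rng.choice("FM")}
    for school in range(12)
    for i in range(50)
]

def simple_random_sample(pop, n):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Take a simple random sample inside each subgroup (stratum)."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Randomly choose whole clusters and keep every unit in them."""
    clusters = sorted({unit[key] for unit in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [unit for unit in pop if unit[key] in chosen]

srs = simple_random_sample(population, 40)
by_gender = stratified_sample(population, "gender", 20)
two_schools = cluster_sample(population, "school", 2)
print(len(srs), len(by_gender), len(two_schools))
```

Stratifying guarantees that every subgroup is represented, while cluster sampling is mainly a practical shortcut when visiting a few whole schools is cheaper than reaching scattered individuals.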

Simple random sampling
Sampling with replacement (WR; Czech: výběr s navrácením) – generates independent samples. Two sample values are independent if what we get on the first one doesn't affect what we get on the second.
Sampling without replacement (WOR; Czech: výběr bez navrácení) – deliberately avoids choosing any member of the population more than once. This type of sampling is not independent; however, it is more common. The error is small as long as
1. the sample is large
2. the sample size is no more than 10% of the population size
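
In Python's standard library the two schemes correspond to `random.choices` (with replacement) and `random.sample` (without replacement). The population below and the helper `meets_ten_percent_rule` are hypothetical names used only for this sketch.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population of 100 numbered units.
population = list(range(1, 101))

# With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn unit is distinct.
without_replacement = rng.sample(population, k=10)

def meets_ten_percent_rule(sample_size, population_size):
    """True if the sample is no more than 10% of the population."""
    return 10 * sample_size <= population_size

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("10% rule satisfied: ", meets_ten_percent_rule(10, len(population)))
```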

Bias
If a sample is not representative, it can introduce bias (Czech: zkreslení, odchylka) into our results. A sample is biased if it differs from the population in a systematic way.
The Literary Digest poll, 1936, U.S. presidential election:
- surveyed 10 million people (subscribers)
- 2.3 million responded
- predicted (3:2) that the Republican candidate would win
- the Democratic candidate won
What went wrong?
- only wealthy people were surveyed (selection bias)
- the survey was voluntary response (nonresponse bias): angry people or people who want a change
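
A small simulation can show how selection bias distorts an estimate even when the sample is large. All numbers below are invented for illustration (20% wealthy voters, 70% of them and 40% of everyone else supporting candidate A); they are not the 1936 figures.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

def make_voter():
    """Hypothetical voter: wealthy voters favour candidate A more often."""
    wealthy = rng.random() < 0.20
    support_probability = 0.70 if wealthy else 0.40
    return {"wealthy": wealthy, "votes_a": rng.random() < support_probability}

population = [make_voter() for _ in range(100_000)]

def share_for_a(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(v["votes_a"] for v in voters) / len(voters)

true_share = share_for_a(population)

# Representative sample: simple random sample from the whole population.
srs_estimate = share_for_a(rng.sample(population, 2_000))

# Biased sample: drawn only from wealthy voters (selection bias).
wealthy_only = [v for v in population if v["wealthy"]]
biased_estimate = share_for_a(rng.sample(wealthy_only, 2_000))

print(f"true share for A:      {true_share:.3f}")
print(f"simple random sample:  {srs_estimate:.3f}")
print(f"wealthy-only sample:   {biased_estimate:.3f}")
```

Both samples have the same size, so the gap between them comes from who was asked, not from how many were asked.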