Introduction to the Practice of Statistics

Introduction to the Practice of Statistics. Instructor: Alex Kulik. Office: C-11, room 2.03. http://www.im.pwr.wroc.pl/~kulyk/

Your grade: homework (25%), quizzes (25%), midterm 1 (25%). Total grade of 90% = bdb (very good), 70% = db (good), 50% = dst (satisfactory). Late homework and quizzes are not accepted. Class participation is required. Contact the instructor if you expect problems taking an exam.

Textbook: Introduction to the Practice of Statistics, 4th edition, by David S. Moore and George P. McCabe, available in the library of C-11. We will cover Chapters 1-12, omitting Chapter 11.

To do: Get a calculator, especially for tests. Install MS Excel at home; it will occasionally be used for homework. Regularly visit our web page for the schedule, lecture notes, assignments, solutions and tables.

Data: We use data to answer scientific questions. Data has variability. To assess the evidence data provide, we need to distinguish signal from noise.

Example: Study the effect of exercise on cholesterol levels. One group exercises and another does not. Is cholesterol reduced by exercise? Consider: people differ; other factors may have an effect; exercise may affect other factors.

What is statistics? The science of understanding data and making decisions in the face of variability and randomness: the set of methods for analyzing data and designing experiments in order to extract information and quantify its reliability.

Section 1.1 (numbering as in the textbook). Data set: individuals and variables. Individuals: objects described by a set of data (people, animals, things). Variable: a characteristic of the individuals.

Types of variables (diagram): quantitative (continuous, discrete) and categorical (ordinal, not ordinal).

Types of variables. Quantitative (numerical): continuous, e.g. height, weight, concentration; discrete, e.g. number of customers, number of flowers. Categorical (non-numerical): ordinal, e.g. choices on a survey (never, rarely, occasionally, often, always); non-ordinal, e.g. shape, race.

Example: Information on employees

Exploratory data analysis. The distribution of a variable tells us what values it takes and how often it takes them (as counts or percents). Categorical variables: visualize the distribution with a bar chart or a pie chart. Quantitative variables: visualize the distribution with a stemplot or a histogram.

Education of 25- to 34-year-olds (US)
Education | Count (millions) | Percent
Less than high school | 4.7 | 12.3
High school graduate | 11.8 | 30.7
Some college | 10.9 | 28.3
Bachelor’s degree | 8.5 | 22.1
Advanced degree | 2.5 | 6.6

Bar graph of education

Pie chart of education
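A minimal Python sketch of how the counts in the education table become the percents behind the bar graph and pie chart. The names `percents` and `text_bar_graph` are illustrative, not from any library, and the '#' bars are only a crude text stand-in for a real bar graph. Because the counts are rounded to 0.1 million, recomputed percents can differ from the table's percent column by about 0.1.

```python
# Counts in millions, copied from the education table (25- to 34-year-olds, US).
EDUCATION = {
    "Less than high school": 4.7,
    "High school graduate": 11.8,
    "Some college": 10.9,
    "Bachelor's degree": 8.5,
    "Advanced degree": 2.5,
}


def percents(counts: dict[str, float]) -> dict[str, float]:
    """Relative frequencies: each count as a percent of the total."""
    total = sum(counts.values())
    return {label: 100 * c / total for label, c in counts.items()}


def text_bar_graph(counts: dict[str, float], unit: float = 0.5) -> list[str]:
    """One line per category, with one '#' per `unit` of count (rounded)."""
    width = max(len(label) for label in counts)
    return [
        f"{label:<{width}} | {'#' * round(c / unit)} {c}"
        for label, c in counts.items()
    ]


if __name__ == "__main__":
    for label, p in percents(EDUCATION).items():
        print(f"{label}: {p:.1f}%")
    print("\n".join(text_bar_graph(EDUCATION)))
```

Since bar lengths are proportional to the counts, the bar graph looks the same whether it is drawn from counts or from percents; only the axis label changes.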

Distribution of quantitative variables. Individual observations often differ: we observe a cloud of values rather than a few values. The distribution of a quantitative variable is displayed by a histogram.

Examining distributions. Describe the overall pattern: shape (e.g. symmetric or skewed in one direction; the number of modes), center (e.g. the midpoint) and spread (e.g. the range between the smallest and the largest values). Look for outliers: individual values that do not match the overall pattern.

A glimpse at the distribution. Example: numbers of home runs that Babe Ruth hit in each of his 15 years (1920-1934) with the New York Yankees: 54 59 35 41 46 25 47 60 54 46 49 46 41 34 22. Stemplot, also called a stem-and-leaf plot: leaf = the last digit, stem = all but the last digit.

To draw a stem-and-leaf plot: a) write the stems; b) write the leaves for each stem; c) order the leaves on each stem. We can increase the number of stems by splitting each stem into two, e.g. one with leaves 0 to 4 and one with leaves 5 to 9. We can also round the numbers before making a stemplot.
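The three steps, plus the optional split stems, can be sketched in Python as follows. This is an illustrative sketch (the function name `stemplot` is ours) that assumes non-negative whole numbers, so stem = value // 10 and leaf = value % 10; decimal data would have to be rounded or rescaled first.

```python
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def stemplot(data: list[int], split: bool = False) -> list[str]:
    """Return the rows of a stem-and-leaf plot.

    stem = all but the last digit, leaf = the last digit.
    With split=True every stem is listed twice: leaves 0-4, then leaves 5-9.
    """
    values = sorted(data)  # sorting first means leaves come out ordered (step c)
    rows = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):  # step a: all stems
        leaves = [v % 10 for v in values if v // 10 == stem]   # step b: leaves
        groups = (
            [[x for x in leaves if x <= 4], [x for x in leaves if x >= 5]]
            if split
            else [leaves]
        )
        for group in groups:
            rows.append(f"{stem} | {' '.join(map(str, group))}".rstrip())
    return rows


if __name__ == "__main__":
    print("\n".join(stemplot(RUTH)))
```

For Ruth's data this prints stems 2 to 6, with the longest row, 4 | 1 1 6 6 6 7 9, showing where the data cluster. Empty stems are kept on purpose, since gaps in a stemplot are part of the picture.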

Back-to-back stemplot. Compare Babe Ruth’s home run counts with Mark McGwire’s: 9 9 22 29 32 32 33 39 39 42 49 52 58 65 70.
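A back-to-back stemplot places both data sets on one shared column of stems. The Python sketch below assumes Ruth's leaves go on the left, written in decreasing order so they grow away from the stem, and McGwire's on the right; that orientation and the function name `back_to_back` are our choices, not the slide's.

```python
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
MCGWIRE = [9, 9, 22, 29, 32, 32, 33, 39, 39, 42, 49, 52, 58, 65, 70]


def back_to_back(left: list[int], right: list[int]) -> list[tuple[str, int, str]]:
    """Rows (left leaves, stem, right leaves) sharing one column of stems.

    Stem = value // 10, leaf = value % 10 (non-negative integers assumed).
    """
    values = left + right
    rows = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        # Left leaves in decreasing order so they grow away from the stem.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem, " ".join(map(str, right_leaves))))
    return rows


if __name__ == "__main__":
    for left_txt, stem, right_txt in back_to_back(RUTH, MCGWIRE):
        print(f"{left_txt:>15} | {stem} | {right_txt}")
```

The shared stems run from 0 to 7, so McGwire's very low and very high seasons stand out against Ruth's tighter cluster in the 40s.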

Distribution at large: Histograms

Frequency table of the Hispanic data
Class | Count | Percent
0.1-5.0 | 30 | 60
5.1-10.0 | 10 | 20
10.1-15 | 4 | 8
15.1-20 | |
20.1-25 | 1 | 2
25.1-30 | |
30.1-35 | |
35.1-40 | |

Histogram of Percent of Hispanic adults

Histogram, comments: The ranges of the variable are called bins. Bins should be convenient: usually of equal width, together covering the whole range of the data. The number of bins is a matter of judgement; e.g. choose an integer close to the square root of the number of observations. A frequency histogram shows counts; a relative frequency histogram shows percents.
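A minimal Python sketch of these bin choices: `suggested_bins` applies the square-root rule of thumb, and `frequency_table` counts observations in equal-width bins and converts the counts to percents (the bar heights of a frequency and a relative frequency histogram). Both names are hypothetical. The sketch assumes half-open bins that include their left endpoint, e.g. [20, 30); the Hispanic table instead writes classes as 0.1-5.0, 5.1-10.0, and so on, which for data recorded to one decimal amounts to including the right endpoint. Babe Ruth's home run counts serve as sample data.

```python
import math

RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def suggested_bins(n: int) -> int:
    """Rule of thumb: an integer close to the square root of n."""
    return max(1, round(math.sqrt(n)))


def frequency_table(
    data: list[float], start: float, width: float, k: int
) -> list[tuple[float, float, int, float]]:
    """(left edge, right edge, count, percent) for k bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for x in data:
        i = math.floor((x - start) / width)  # index of the bin containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} is not covered by the bins")
        counts[i] += 1
    n = len(data)
    return [
        (start + i * width, start + (i + 1) * width, c, 100 * c / n)
        for i, c in enumerate(counts)
    ]


if __name__ == "__main__":
    print("suggested number of bins:", suggested_bins(len(RUTH)))
    for left, right, count, pct in frequency_table(RUTH, start=20, width=10, k=5):
        print(f"{left}-{right}: {count:2d} ({pct:.1f}%)")
```

With 15 observations the rule suggests 4 bins, but the usage line picks 5 bins of width 10 starting at 20 because whole decades are more convenient here; the square-root rule is only a starting point.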

Labelling the graph is important! The horizontal axis is for the variable. The vertical axis is for the counts/frequencies or relative frequencies/percents. Remember to label the axes precisely as in our examples.

... + 24,800 nanoseconds

Exercise: give a frequency table of Newcomb’s data, with classes 20-24.9, 25-29.9, ...

Draw a frequency histogram of Newcomb’s data, then a relative frequency histogram.

Histogram of Newcomb’s data (note left outliers)

Other plots: e.g. time plots of a time series, which may reveal hidden mechanisms. Trend: a persistent, long-term rise or fall. Seasonal variation: a pattern that repeats itself at known regular intervals of time. These are less important in this course.

Time plots. Newcomb’s data.

Section 1.2 Describing distributions with numbers: mean, median, quartiles, boxplot, standard deviation, changing the unit of measurement.

Measures of center. Mean: the arithmetic mean (average) of a data set $x_1, \dots, x_n$ is denoted by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. The mean can be easily influenced by outliers, i.e. it is not resistant.

Median: the midpoint of a distribution. Sort the data in increasing order. The median equals the (n+1)/2-th observation if n is odd, and the average of the two middle observations if n is even. The median is a resistant measure of center: outliers do not influence it much.
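The definitions of the mean and the median translate directly into Python. The sketch below is illustrative (the standard library's `statistics.mean` and `statistics.median` do the same job); it also shows resistance by replacing Ruth's 60 with a hypothetical mistyped value of 600.

```python
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def mean(data: list[float]) -> float:
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(data) / len(data)


def median(data: list[float]) -> float:
    """Midpoint of the sorted data."""
    s = sorted(data)
    n = len(s)
    mid = n // 2  # 0-based index of the (n+1)/2-th observation when n is odd
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # n even: average of the two middle values


if __name__ == "__main__":
    # Hypothetical data-entry error: 600 instead of 60.
    typo = [600 if x == 60 else x for x in RUTH]
    print(f"mean {mean(RUTH):.2f} -> {mean(typo):.2f}")  # the mean jumps
    print(f"median {median(RUTH)} -> {median(typo)}")    # the median stays at 46
```

One wrong value moves the mean from about 43.9 to about 79.9, while the median does not move at all.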

Mean vs. median. In a symmetric distribution, mean = median. In a skewed distribution, the mean lies further out in the long tail than the median. Example: the mean price of existing houses sold in 2000 was 176,200, while the median price was 139,000; the mean exceeds the median because the distribution of house prices is skewed to the right.

Measures of spread. Quartiles: Q2 (second quartile) = median. Q1 (first quartile) = median of the lower “half” of the sorted data. Q3 (third quartile) = median of the upper half of the data. The p-th percentile is a number q such that approximately p percent of the observations are smaller than q. Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles.

The interquartile range and a criterion for outliers. The interquartile range is IQR = Q3 - Q1. An observation is an outlier if it falls more than 1.5*IQR above the third quartile or more than 1.5*IQR below the first quartile. We often remove the outliers from the data.
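A Python sketch of the quartiles and the 1.5*IQR criterion. It assumes the textbook convention that, when n is odd, the median itself is left out of both halves; statistical software may use slightly different quartile rules and give slightly different values. It also assumes at least two observations. The small data set in the last usage line is hypothetical, chosen so that one value gets flagged.

```python
RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def _median_sorted(s: list[float]) -> float:
    """Median of an already sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(data: list[float]) -> tuple[float, float, float]:
    """(Q1, Q2, Q3): Q1 and Q3 are medians of the lower and upper halves."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]        # excludes the middle value when n is odd
    upper = s[(n + 1) // 2 :]
    return _median_sorted(lower), _median_sorted(s), _median_sorted(upper)


def outliers_by_iqr(data: list[float]) -> tuple[float, float, list[float]]:
    """Fences Q1 - 1.5*IQR and Q3 + 1.5*IQR, plus the observations outside them."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]


if __name__ == "__main__":
    print(quartiles(RUTH), outliers_by_iqr(RUTH))
    print(outliers_by_iqr([1, 2, 3, 4, 5, 6, 7, 100]))  # hypothetical data
```

For Ruth, Q1 = 35 and Q3 = 54, so IQR = 19 and the fences are 6.5 and 82.5; none of his seasons is an outlier.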

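Standard deviation. Deviation of the i-th observation: $x_i - \bar{x}$, where $\bar{x}$ is the mean. Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Standard deviation: $s = \sqrt{s^2}$.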
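A step-by-step Python sketch of the deviations, the variance and the standard deviation, assuming the textbook's divisor n-1 (the function name `sample_sd` is ours). The standard library's `statistics.stdev` uses the same divisor, so it serves as a cross-check.

```python
import math
import statistics

RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def sample_sd(data: list[float]) -> float:
    """Standard deviation s = sqrt(variance), variance with divisor n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    deviations = [x - xbar for x in data]  # deviation of the i-th observation
    variance = sum(d * d for d in deviations) / (n - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    s = sample_sd(RUTH)
    print(f"s = {s:.2f}")  # about 11.25 home runs
    assert math.isclose(s, statistics.stdev(RUTH))
```

The deviations always sum to zero, which is why they are squared before averaging; taking the square root at the end puts s back in the units of the data (home runs).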

Five-number summary: minimum, Q1, median, Q3, maximum. Boxplot: a visual representation of the five-number summary.
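A short Python sketch that computes the five-number summary behind a boxplot and compares Babe Ruth with Mark McGwire, using the data from the stemplot slides. It relies on `statistics.median` from the standard library and assumes the textbook quartile convention (the middle value is left out of both halves when n is odd); other software rules may give slightly different quartiles.

```python
from statistics import median

RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
MCGWIRE = [9, 9, 22, 29, 32, 32, 33, 39, 39, 42, 49, 52, 58, 65, 70]


def five_number_summary(data: list[float]) -> tuple[float, float, float, float, float]:
    """(minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    n = len(s)
    q1 = median(s[: n // 2])        # median of the lower half
    q3 = median(s[(n + 1) // 2 :])  # median of the upper half
    return s[0], q1, median(s), q3, s[-1]


if __name__ == "__main__":
    for name, data in [("Ruth", RUTH), ("McGwire", MCGWIRE)]:
        print(name, five_number_summary(data))
```

McGwire's box (Q1 to Q3) is wider and his range is larger than Ruth's, which is exactly what side-by-side boxplots of the two players would show.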

Statistics (columns: minicompact city, minicompact highway, two-seater city, two-seater highway, two-seater city and highway without the outlier):
mean: 13.4 25.8 19.2 14.1 23.4
median: 18 25 26 14.5
Q1: 16 23 13 21
Q3: 20 28 27
SD: 2.42 3.16 11.2 11.5 5.07 5.34

Boxplots

Hispanic data: the histogram...

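...and a boxplot. Modified boxplot: outliers (by the 1.5*IQR criterion) are shown as individual points, and the whiskers extend only to the smallest and largest observations that are not outliers.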

Five-number summary vs. standard deviation. s = 0 only when there is no spread (all observations are equal), and s is not resistant. The five-number summary usually describes a skewed distribution or a distribution with outliers better. The mean and standard deviation are usually used for reasonably symmetric distributions without outliers.
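To see why s is not resistant while the quartiles are, the Python sketch below replaces Babe Ruth's 60 with a hypothetical mistyped 600 and compares s with the IQR (quartiles taken as medians of the lower and upper halves, the textbook convention). It also checks that s = 0 for data with no spread.

```python
import statistics

RUTH = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]


def iqr(data: list[float]) -> float:
    """Q3 - Q1, with Q1 and Q3 the medians of the lower and upper halves."""
    s = sorted(data)
    n = len(s)
    return statistics.median(s[(n + 1) // 2 :]) - statistics.median(s[: n // 2])


if __name__ == "__main__":
    typo = [600 if x == 60 else x for x in RUTH]  # hypothetical data-entry error
    for label, data in [("original", RUTH), ("with 600", typo)]:
        print(f"{label}: s = {statistics.stdev(data):.1f}, IQR = {iqr(data)}")
    print(statistics.stdev([5, 5, 5]))  # no spread, so s = 0
```

The single bad value inflates s more than tenfold but leaves the IQR at 19, because only the maximum changes and the quartiles never look at it.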

Linear transformations: $x_{new} = a + b\,x_{old}$. Examples: $x_{miles} = 0.62\,x_{km}$, $x_{g} = 28.35\,x_{oz}$.

Linear transformations do not change the shape of a distribution. They do change the center and the spread, e.g. the weights of five pythons:
Python | 1 | 2 | 3 | 4 | 5
Weight (oz) | 1.13 | 1.02 | 1.23 | 1.06 | 1.16
Weight (g) | 32 | 29 | 35 | 30 | 33

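Effect of a linear transformation $x_{new} = a + b\,x_{old}$: $\text{mean}_{new} = a + b \cdot \text{mean}_{old}$, $\text{median}_{new} = a + b \cdot \text{median}_{old}$, $\text{sd}_{new} = |b| \cdot \text{sd}_{old}$, $\text{IQR}_{new} = |b| \cdot \text{IQR}_{old}$.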
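A Python sketch that checks these rules numerically on the python weights in ounces. It assumes unrounded grams (a = 0, b = 28.35) rather than the rounded grams in the table, and adds a second, purely hypothetical transformation with a negative slope (a = 5, b = -2) to show why |b| appears in the rules for spread. Quartiles follow the textbook convention (medians of the lower and upper halves); the helper names are ours.

```python
import math
import statistics

PYTHONS_OZ = [1.13, 1.02, 1.23, 1.06, 1.16]


def transform(data: list[float], a: float, b: float) -> list[float]:
    """x_new = a + b * x_old, applied to every observation."""
    return [a + b * x for x in data]


def summary(data: list[float]) -> dict[str, float]:
    """Mean, median, standard deviation (n - 1 divisor) and IQR."""
    s = sorted(data)
    n = len(s)
    q1 = statistics.median(s[: n // 2])
    q3 = statistics.median(s[(n + 1) // 2 :])
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),
        "iqr": q3 - q1,
    }


def rules_hold(data: list[float], a: float, b: float) -> bool:
    """Compare the summary of the transformed data with the rules above."""
    old, new = summary(data), summary(transform(data, a, b))
    expected = {
        "mean": a + b * old["mean"],
        "median": a + b * old["median"],
        "sd": abs(b) * old["sd"],
        "iqr": abs(b) * old["iqr"],
    }
    return all(math.isclose(new[k], expected[k], rel_tol=1e-9) for k in expected)


if __name__ == "__main__":
    grams = transform(PYTHONS_OZ, a=0, b=28.35)
    print([round(g, 1) for g in grams])
    print(rules_hold(PYTHONS_OZ, 0, 28.35), rules_hold(PYTHONS_OZ, 5, -2))
```

With b = -2 the order of the data is reversed, so Q1 and Q3 swap roles, yet the IQR is still multiplied by |b| = 2; a shift a moves the center but never the spread.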

Exercise: calculate the mean, median and SD of the weights of the pythons in [g] and in [oz].
Statistic | in [g] | in [oz]
Mean | |
Median | |
SD | |