Week11 STA220 Fall 2007 - Useful Information Instructor: Hadas Moshonov. Web-page:


week11 STA220 Fall 2007 - Useful Information. Instructor: Hadas Moshonov. Web-page: Office: Sidney Smith Hall Room Tel: Office hours: Friday 12 – 2 or by appointment. NOTE: The lecture notes are based on the textbook and can be downloaded from the CCNet website (lecture section L0201).

week12 The rise of statistics Statistics is the science of collecting, organizing and interpreting data. The goal of statistics is to gain understanding from data. Historically, the ideas and methods of statistics developed gradually as society became interested in collecting and using data for a variety of applications. The discipline of statistics took shape in the twentieth century when methods for producing and understanding data grew in number and sophistication.

week13 Elements of Statistics - Introduction Data are numerical facts with context and we need to understand the context if we are to make sense of the numbers. A set of data contains some information about a group of individuals. Individuals are the objects upon which we collect data. Individuals can be people, animals, plots of land and many other things. A population is a set of individuals that we are interested in studying. A variable is any characteristic of an individual. A sample is a subset of the individuals of a population.

week14 Questions to ask when planning a statistical study
Why? What purpose do the data have? Do we hope to answer some specific questions? Do we want to draw conclusions about individuals other than the ones we actually have data for?
Who? What individuals do the data describe? How many individuals appear in the data?
What? How many variables do the data contain? What are the exact definitions of these variables? In what units of measurement is each variable recorded? Weights, for example, might be recorded in pounds or in kilograms.

week15 Collecting Data Generally, data can be obtained in four different ways:
- Published source
- Designed experiment
- Survey
- Observational study

week16 Types of Variables A categorical variable places an individual into one of several groups or categories, e.g. gender, college major. A quantitative variable takes numerical values for which arithmetic operations are defined, e.g. height, weight. The distribution of a variable tells us what values it takes and how often it takes these values. Examples 1.2, 1.3 pages 5-6 in IPS.

week17 Displaying Distributions With Graphs Statistical tools and ideas help us examine data in order to describe their main features. This examination is called exploratory data analysis. Two basic strategies for exploring a data set:
- Begin by examining each variable by itself. Then move on to study the relationships among the variables.
- Begin with graphs. Then add numerical summaries of specific aspects of the data.

week18 Graphs for categorical variables The values of a categorical variable are the labels for the categories such as “male” and “female”. The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.
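As a rough sketch of this idea, the Python snippet below tabulates the count and percent for each category of a categorical variable. The function name `categorical_distribution` and the ten sample labels are made up for illustration; they are not the marital status data from the lecture.

```python
from collections import Counter

def categorical_distribution(labels):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical sample of ten individuals (not data from the lecture).
sample = ["married", "never married", "married", "divorced", "married",
          "widowed", "never married", "married", "married", "never married"]

dist = categorical_distribution(sample)
for category, (count, percent) in sorted(dist.items()):
    print(f"{category:<15}{count:>5}{percent:>8.1f}%")
```

The percents always add up to 100 because every individual falls in exactly one category.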

week19 Example: The distribution of marital status for all Americans age 18+.
Marital status    Count (millions)    Percent
Never married
Married
Widowed
Divorced

week110

week111 Measurement - example We want to compare the “size” of several statistics books. Describe three possible numerical variables that describe the “size” of a book. In what units would you measure each variable? What measuring instrument does each require? Describe a variable that is appropriate for estimating how long it would take to read the book.

week112 Describing Quantitative Data The pattern of variation of a variable is called its distribution. The distribution of a variable is best displayed graphically. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:
- Dot plot
- Stem-and-leaf plot
- Histogram

week113 Stemplots To make a stemplot:
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
Examples 1.5, 1.6 pages in IPS.
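The three steps above can be sketched in a few lines of Python. This is only an illustration for non-negative whole numbers; the function name `stemplot` and the ten scores are hypothetical and are not the lecture's data.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # step 1: split off the final digit as the leaf
        leaves[stem].append(leaf)
    lines = []
    # Step 2: stems in a column, smallest at the top (empty stems are kept).
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: leaves to the right of the stem, in increasing order.
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

# Hypothetical scores, for illustration only.
scores = [22, 35, 41, 38, 27, 60, 44, 35, 29, 52]
print("\n".join(stemplot(scores)))
```

Keeping the stems that have no leaves matters: dropping them would hide gaps in the distribution.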

week114 Example Here are the scores of a basketball player (say, player A). Make a stemplot of these data. Describe the main features of the distribution. Solution: Min. = 22, Max. = 60 … MINITAB command: Graph > Stem-and-Leaf

week115 Back-to-back stemplots When comparing two related distributions, a back-to-back stemplot with common stems is useful. Example: Here are the scores of two players A and B. Player A: Player B: Make a back-to-back stemplot of these data.
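A back-to-back stemplot can be sketched along the same lines. In the hypothetical helper below, one data set has its leaves written to the left of the common stems and the other to the right, both in increasing order out from the stem. The scores for the two players are invented for illustration and are not the values from the slide.

```python
def back_to_back_stemplot(left, right):
    """Return the lines of a back-to-back stemplot for two lists of non-negative integers."""
    all_stems = [v // 10 for v in left + right]
    rows = []
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left-hand leaves grow outward from the stem, so they read right to left.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append(("".join(map(str, left_leaves)), stem,
                     "".join(map(str, right_leaves))))
    width = max(len(l) for l, _, _ in rows)
    return [f"{l:>{width}} | {stem:>2} | {r}" for l, stem, r in rows]

# Hypothetical scores for two players, for illustration only.
player_a = [22, 35, 41, 38, 27]
player_b = [31, 45, 48, 29, 36]
print("\n".join(back_to_back_stemplot(player_a, player_b)))
```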

week116 Exercise Write down the minimum, maximum, and median of the data set summarized by the following MINITAB stemplot. Stem-and-leaf of Fuel Use N = 15 Leaf Unit = (4)

week117 Exercise Write down the minimum, maximum, and median of the data set summarized by the following MINITAB stemplot. Stem-and-leaf of weight N = 12 Leaf Unit = (4)
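Reading a stemplot back into numbers can be sketched as follows. The snippet assumes the usual reading of the leaf unit, namely that each value equals (10 × stem + leaf) × leaf unit. The stemplot rows used here are hypothetical; they are not the Fuel Use or weight displays from the exercises.

```python
from statistics import median

def values_from_stemplot(rows, leaf_unit=1.0):
    """Recover the data from (stem, leaves) rows, where leaves is a string of digits.

    Assumes each value equals (10 * stem + leaf) * leaf_unit.
    """
    return [(10 * stem + int(leaf)) * leaf_unit
            for stem, leaves in rows
            for leaf in leaves]

# Hypothetical stemplot with Leaf Unit = 1.0, for illustration only.
rows = [(2, "279"), (3, "558"), (4, "14"), (5, "2"), (6, "0")]
values = values_from_stemplot(rows, leaf_unit=1.0)

print("min    =", min(values))
print("max    =", max(values))
print("median =", median(values))
```

With an even number of observations, as here, the median is the average of the two middle values.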

week118 Examining a distribution In any graph of data, look for the overall pattern and for striking deviations from that pattern. The overall pattern of a distribution can be described by its shape, centre, and spread. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern. Some other things to look for in describing shape are:
- Does the distribution have one or several major peaks, called modes? A distribution with one major peak is called unimodal.
- Is it approximately symmetric or skewed in one direction?

week119 Exercise Describe the shape of the distribution summarized by the following stemplot. Stem-and-leaf of sta220 marks N = 42 Leaf Unit = (11)

week120 Exercise Describe the shape of the distributions summarized by the following stemplots. Stem-and-leaf of C1 N= 50 Leaf Unit = (17)

week121 Exam Question Forty students wrote a Statistics examination having a maximum of 50 marks. The mark distribution is given in the following stem-and-leaf plot: Stem Leaf
State whether the following statements are true or false.
(a) The distribution is right skewed.
(b) The median of the distribution is 31.
(c) The median of the distribution is 28.
(d) The mode of the distribution is 48.
(e) More than 18% of the students scored 45 or more on the examination.

week122 Histograms A histogram breaks the range of values of a variable into intervals and displays only the count or percent of the observations that fall into each interval. We can choose a convenient number of intervals. Histograms do not display the actual values observed, only the counts in each interval. Example: Here are some data on the number of days lost due to illness by a group of employees: 47, 1, 55, 30, 1, 3, 7, 14, 7, 66, 34, 6, 10, 5, 12, 5, 3, 9, 18, 45, 5, 8, 44, 42, 46, 6, 4, 24, 24, 34, 11, 2, 3, 13, 5, 5, 3, 4, 4, 1

week123 The main steps in constructing a histogram
1. Determine the range of the data (largest and smallest values). In our example the data range from a min. of 1 day to a max. of 66 days.
2. Decide on the number of intervals (or classes), and the width of each class (usually equal).
3. Count the number of observations in each class. These counts are called class frequencies.
4. Draw the histogram.
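The four steps can be sketched in Python using the days-lost data from the example. The class width of 10 days (classes 0-9, 10-19, and so on) is an assumption made here for illustration; the classes used in the lecture are not stated in this transcript. The function name `class_frequencies` is also hypothetical, and the "histogram" is drawn with rows of asterisks rather than bars.

```python
days_lost = [47, 1, 55, 30, 1, 3, 7, 14, 7, 66, 34, 6, 10, 5, 12, 5, 3, 9, 18, 45,
             5, 8, 44, 42, 46, 6, 4, 24, 24, 34, 11, 2, 3, 13, 5, 5, 3, 4, 4, 1]

def class_frequencies(data, width, start=0):
    """Count integer observations in equal-width classes beginning at `start`."""
    n_classes = (max(data) - start) // width + 1       # step 2: number of classes
    counts = [0] * n_classes
    for x in data:
        counts[(x - start) // width] += 1              # step 3: class frequencies
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]

low, high = min(days_lost), max(days_lost)             # step 1: range of the data
print(f"Range: {low} to {high} days")

table = class_frequencies(days_lost, width=10)         # assumed class width of 10
for lo, hi, count in table:                            # step 4: a text histogram
    print(f"{lo:>2}-{hi:<2} {'*' * count} ({count})")
```

A different class width would give a different picture of the same data, which is why step 2 is a judgment call.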

week124
Class    No. of employees (Frequency)    Cumulative frequency    Relative frequency
Total
A table with the first two columns above is called a frequency table or frequency distribution. A table with the first and third columns is called a cumulative frequency distribution.
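As an illustration of how the columns relate, the sketch below builds the frequency, cumulative frequency, and relative frequency columns for the days-lost data. The class width of 10 days is again an assumption, since the classes from the slide's table are not preserved here.

```python
from collections import Counter
from itertools import accumulate

days_lost = [47, 1, 55, 30, 1, 3, 7, 14, 7, 66, 34, 6, 10, 5, 12, 5, 3, 9, 18, 45,
             5, 8, 44, 42, 46, 6, 4, 24, 24, 34, 11, 2, 3, 13, 5, 5, 3, 4, 4, 1]

WIDTH = 10  # assumed class width: 0-9, 10-19, ...

counts = Counter(x // WIDTH for x in days_lost)
classes = range(max(counts) + 1)

freq = [counts.get(k, 0) for k in classes]         # frequency of each class
cum_freq = list(accumulate(freq))                  # running total of the frequencies
rel_freq = [f / len(days_lost) for f in freq]      # frequency divided by the total

print(f"{'Class':<8}{'Freq':>6}{'Cum.':>6}{'Rel.':>8}")
for k, f, c, r in zip(classes, freq, cum_freq, rel_freq):
    print(f"{k * WIDTH:>2}-{k * WIDTH + WIDTH - 1:<5}{f:>6}{c:>6}{r:>8.3f}")
print(f"{'Total':<8}{sum(freq):>6}{'':>6}{sum(rel_freq):>8.3f}")
```

The last cumulative frequency equals the number of observations, and the relative frequencies sum to 1.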

week125 MINITAB command: Graph > Histogram

week126 Comments The above histogram suggests that the distribution is skewed to the right. No gaps or outliers. Since this data set is not very large (40 observations) we can also use a dotplot or a stemplot to represent the data. MINITAB commands for dotplot: Graph > Character Graphs > Dotplot Some good examples are 1.7 and 1.8 on pages 14, 18 in IPS.

week127 Stem-and-leaf of days lost N = 40 Leaf Unit = 1.0 (22)

week128 Dealing with outliers We can spot outliers by looking for observations that stand apart from the overall pattern of a histogram or stemplot. Identifying outliers is a matter for judgment. Outliers are points that are clearly apart from the body of the data, not just the most extreme observations in a distribution. We should always search for an explanation for any outliers. Sometimes outliers point to errors made in recording the data. In other cases, the outlying observation may be caused by equipment failure or other unusual circumstances. Example 1.9 p18 in IPS.

week129 Time plot Whenever the data are collected over time, it is a good idea to plot the observations in time order. A time plot of a variable plots each observation against the time at which it was measured. The horizontal scale is always time and the vertical scale is the variable we measured. Connecting the data points by lines helps emphasize any changes over time. Measurements of a variable taken at regular intervals over time are called time series. Examples: monthly unemployment rate, quarterly GDP, weather records. Time plots can reveal the main feature of a time series. Example 1.10 on page 19 in IPS.

week130 Question Which type of display
- uses the actual numbers as building blocks for the display?
- gives the most flexibility for setting the class width and number of classes?
- is most convenient for very large data sets?
- is quickest to construct?
- keeps the most detail about the actual data?

week131 Review Questions
1. The purpose of a frequency distribution is to ____.
a) present scores and their frequency of occurrence
b) present data in a more meaningful way than single scores
c) provide more information than a graph
d) all of the above
e) a and b
2. Which of the following indicates the proportion of the total number of scores which occurred in each interval?
a) Relative frequency distribution
b) Cumulative frequency distribution
c) Cumulative percentage distribution
d) None of the above

week132
3. Which of the following indicates the number of scores which fell below the upper limit of each interval?
a) Relative frequency distribution
b) Cumulative frequency distribution
c) Cumulative percentage distribution
d) None of the above
4. Which of the following is not a symmetrical distribution?
a) A bell-shaped distribution
b) A J-shaped distribution
c) A U-shaped distribution
d) An inverted U-shaped distribution