1 Take a challenge with time; never let time idle away aimlessly.

2 Turning Data Into Information: types of data, summary statistics, distributions

3 What are Data? Any set of data contains information about some group of individuals. The information is organized in variables. Individuals are the objects described by a set of data; they may be animals, people, or things. A variable is any characteristic of an individual. A variable can take different values for different individuals.

4 Population/Sample/Raw Data A population is the collection of all individuals about which information is desired. A sample is a subset of a population. Raw data: information that has been collected but not yet processed. E.g. What’s the fastest you’ve ever driven a car? 100 mph. ** use this example to show individual, population, sample, and variable.

5 Example: A College’s Student Dataset The data set includes data about all currently enrolled students such as their ages, genders, heights, grades, and choices of major. Who? What individuals do the data describe? Population/sample/raw data of study? What? How many variables do the data describe? Give an example of variables.

6 Types of Variables A categorical variable places an individual into one of several groups or categories. A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense. Q. Which variables are categorical? Which are quantitative?

7 A variable is either Categorical/Qualitative (Nominal variable or Ordinal variable) or Numerical/Quantitative (Discrete variable or Continuous variable). Q: Does “average” make sense? Yes: numerical. No: categorical. Q: Is there any natural ordering among categories? Yes: ordinal. No: nominal. Q: Can all possible values be listed? Yes: discrete. No: continuous.

8 Asking the Right Questions The relationship between gender and age ** What is your age and gender? ** Reading assignment: P 16-18.

9 The Role of a Variable Response/outcome/dependent variables: the variables whose changes are of primary interest. Explanatory/predictor/independent variables: the variables that may explain how the responses change. Study Q: Do males tend to drive faster than females? 1. What are the two variables? What are their roles?

10 Two Basic Strategies to Explore Data Begin by examining each variable by itself. Then move on to study the relationship among the variables. Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

11 Summarizing Data Goal: to study or estimate the distributions of variables. The distribution of a variable tells us what values/categories it takes and how often it takes them. Displaying distributions of data with graphs. Describing distributions of data with numbers.

12 A Dataset of CSUEB Students
Gender  Height (inches)  Weight (pounds)  College
M       68.5             155              Bsns
F       61.2             99               Bsns
F       63.0             115              Sci
M       70.0             205              Sci
F       68.6             170              Arts
F       65.1             125              Bsns
M       72.4             220              Arts
M       --               188              Sci

13 Numerical Summaries for Categorical Variables Frequency/counts Relative frequency/percentage ** show gender Contingency table ** show gender vs. college
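
The summaries listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the two lists are transcribed from the CSUEB student dataset, and the variable names (`gender_counts`, `gender_rel_freq`, `contingency`) are chosen here for illustration only.

```python
from collections import Counter

# Gender and college columns transcribed from the CSUEB student dataset slide.
gender = ["M", "F", "F", "M", "F", "F", "M", "M"]
college = ["Bsns", "Bsns", "Sci", "Sci", "Arts", "Bsns", "Arts", "Sci"]

# Frequency: how many individuals fall in each category.
gender_counts = Counter(gender)

# Relative frequency: each count divided by the total number of individuals.
n = len(gender)
gender_rel_freq = {g: c / n for g, c in gender_counts.items()}

# Contingency table: one count per (gender, college) combination.
contingency = Counter(zip(gender, college))

print("Counts:", dict(gender_counts))
print("Relative frequencies:", gender_rel_freq)
for g in sorted(set(gender)):
    row = {c: contingency[(g, c)] for c in sorted(set(college))}
    print(g, row)
```

Each printed row of the contingency table gives the counts for one gender across the three colleges, which is the same information a two-variable bar graph would display.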

14 Graphs for Categorical Variables Pie charts ** good for one variable; show gender pie chart Bar graphs ** good for one or two variables; show gender then gender and college bar graphs

15 Graphs for Quantitative Variables Stem-and-leaf plots; dotplots ** to see the distribution. Good for small datasets; explicit but choppy. E.g. height Histograms ** to see the distribution. Good for all datasets. E.g. height

16 How to Make a Stemplot 1. Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit. Example: for an age of 20.5, the leaf is “5” and the remaining digits “20” form the stem.

17 How to Make a Stemplot 2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. 3. Write each leaf in the row to the right of its stem, in increasing order out from the stem. *** work on blackboard for height ***
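
The three stemplot steps can be sketched as follows for the seven recorded heights. This is an illustrative sketch, not the blackboard version: the function name `stemplot` is hypothetical, it assumes every value is recorded to exactly one decimal place, and it also prints stems that have no leaves so that gaps in the data stay visible.

```python
from collections import defaultdict

# Heights (inches) from the CSUEB dataset; the missing value is left out.
heights = [68.5, 61.2, 63.0, 70.0, 68.6, 65.1, 72.4]

def stemplot(values):
    """Return the rows of a stemplot, assuming one decimal place per value."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 68.5 becomes 685
        stem, leaf = divmod(tenths, 10)   # stem 68, leaf 5
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    rows = []
    # Step 2: stems in a column, smallest at the top.
    for stem in range(lo, hi + 1):
        # Step 3: leaves in increasing order out from the stem.
        row = "".join(str(d) for d in sorted(leaves[stem]))
        rows.append(f"{stem} | {row}")
    return rows

rows = stemplot(heights)
for r in rows:
    print(r)
```

With so few observations most stems hold a single leaf, which shows why the slides describe stemplots as explicit but choppy.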

18 How to Make a Histogram 1. Break the range of values of a variable into equal-width intervals. 2. Count the # of individuals in each interval. These counts are called frequencies and the corresponding %’s are called relative frequencies. 3. Draw the histogram: the variable on the horizontal axis and the count (or %) on the vertical axis. *** work on blackboard for height ***
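
The histogram steps can be sketched in Python as below. The interval choice (five intervals of width 3 inches starting at 60) is an assumption made for this illustration; the slides do not specify one.

```python
# Heights (inches) from the CSUEB dataset; the missing value is left out.
heights = [68.5, 61.2, 63.0, 70.0, 68.6, 65.1, 72.4]

# Step 1: equal-width intervals [60, 63), [63, 66), ..., [72, 75).
# These boundaries are an assumed choice, not taken from the slides.
start, width, n_bins = 60.0, 3.0, 5

# Step 2: count the individuals in each interval (the frequencies).
counts = [0] * n_bins
for h in heights:
    idx = int((h - start) // width)
    counts[idx] += 1

# Relative frequencies are the counts divided by the number of individuals.
rel_freq = [c / len(heights) for c in counts]

# Step 3: a text version of the histogram, one row per interval.
for i, c in enumerate(counts):
    left = start + i * width
    print(f"[{left:.0f}, {left + width:.0f})  {'*' * c}")
```

A different starting point or interval width would give different counts, so it is worth trying more than one choice before reading shape from the picture.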

19 What do We See from the Graphs? Important features we should look for: Overall pattern – Shape – Center (the location around which the data tend to cluster) – Spread (how widely the data vary) Outliers, the values that fall far outside the overall pattern (for quantitative variables only)

20 Overall Pattern—Shape How many peaks, called modes? A distribution with one major peak is called unimodal. Symmetric or skewed? – Symmetric if the large values are mirror images of small values – Skewed to the right if the right tail (large values) is much longer than the left tail (small values) – Skewed to the left if the left tail (small values) is much longer than the right tail (large values) *** Show examples on blackboard. ***

21 Numerical Summaries for Quantitative Variables To measure center (location): Mode, Mean and Median To measure spread: Range, Interquartile Range (IQR) and Standard Deviation (SD) Five-number summaries ** show height Outliers ** give a large number for the missing height Regression ** height vs. weight, discussed in chapter 5
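
As a rough illustration of the center and spread measures named above, the sketch below summarizes the seven recorded heights with Python's standard `statistics` module. Two assumptions are worth flagging: `statistics.stdev` computes the sample standard deviation (divisor n - 1), and `statistics.quantiles` is used with its default method, which is only one of several quartile conventions in use. The mode is omitted because no height value repeats.

```python
import statistics

# Heights (inches) from the CSUEB dataset; the missing value is left out.
heights = [68.5, 61.2, 63.0, 70.0, 68.6, 65.1, 72.4]

# Measures of center.
mean = statistics.mean(heights)
median = statistics.median(heights)

# Measures of spread.
data_range = max(heights) - min(heights)
q1, q2, q3 = statistics.quantiles(heights, n=4)  # default quartile method
iqr = q3 - q1
sd = statistics.stdev(heights)                   # sample SD, divisor n - 1

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(heights), q1, median, q3, max(heights))

print(f"mean={mean:.2f} median={median} range={data_range:.1f}")
print(f"IQR={iqr:.1f} SD={sd:.2f}")
print("five-number summary:", five_number)
```

Other software may report slightly different quartiles for the same data, because quartile definitions differ between textbooks and packages.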

22 More Graphs for Quantitative Variables Boxplots ** to show location and spread, and identify outliers Scatterplots ** to see the relationship between two quantitative variables: height vs. weight
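
One common convention for how a boxplot flags outliers is the 1.5 × IQR rule, sketched below. The rule itself is not stated in the slides, so treat it as an assumption. Following the earlier suggestion to give a large number for the missing height, the value 95.0 is a hypothetical stand-in, not real data.

```python
import statistics

# Recorded heights plus a hypothetical large value (95.0) in place of the
# missing height, as an illustration of how an outlier is detected.
heights = [68.5, 61.2, 63.0, 70.0, 68.6, 65.1, 72.4, 95.0]

q1, _, q3 = statistics.quantiles(heights, n=4)  # default quartile method
iqr = q3 - q1

# Fences under the assumed 1.5 x IQR convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [h for h in heights if h < lower_fence or h > upper_fence]
print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print("outliers:", outliers)
```

Only the stand-in value falls outside the fences; the seven recorded heights all lie inside them.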

23 Graphs for the Relation of Two Variables 1 categorical + 1 quantitative variable: 2 quantitative variables: 2 categorical variables:

24 (for Bell-shaped distributions only)

25 Empirical Rule (68-95-99.7 rule) If a variable X follows a normal distribution, that is, the X values of the whole population form a bell-shaped distribution, then: Mean(X) ± 1*SD(X) covers about 68% of possible X values Mean(X) ± 2*SD(X) covers about 95% of possible X values Mean(X) ± 3*SD(X) covers about 99.7% of possible X values

26 Empirical Rule (68-95-99.7 rule) If the data (from a sample) of a variable X look bell-shaped, then, with X̄ the sample mean and S the sample standard deviation: X̄ ± 1*S covers about 68% of possible X values X̄ ± 2*S covers about 95% of possible X values X̄ ± 3*S covers about 99.7% of possible X values

27 How to Use the Empirical Rule Find the range covering 68%, 95% or 99.7% of X values. Check whether X follows a normal distribution. Provide an estimate of S without messy calculation.
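
The first two uses can be sketched on the seven recorded heights: compute the sample mean plus or minus 1, 2, and 3 sample standard deviations, then count how many observations each interval actually holds. This is an illustrative check only; the names `x_bar`, `s`, and `within` are chosen here, and a sample of seven is far too small to settle whether heights are normally distributed.

```python
import statistics

# Heights (inches) from the CSUEB dataset; the missing value is left out.
heights = [68.5, 61.2, 63.0, 70.0, 68.6, 65.1, 72.4]

x_bar = statistics.mean(heights)   # sample mean
s = statistics.stdev(heights)      # sample standard deviation

within = {}
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    # Count the observations inside the interval x_bar +/- k*s.
    within[k] = sum(low <= h <= high for h in heights)
    share = within[k] / len(heights)
    print(f"k={k}: [{low:.1f}, {high:.1f}] holds "
          f"{within[k]} of {len(heights)} heights ({share:.0%})")
```

If the observed shares were far from 68%, 95%, and 99.7% in a reasonably large sample, that would be a sign the data are not bell-shaped.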
