Descriptive Statistics

Presentation on theme: "Descriptive Statistics"— Presentation transcript:

1 Descriptive Statistics
SAA 2023 COMPUTATIONAL TECHNIQUE FOR BIOSTATISTICS: Introduction & Descriptive Statistics

2 Introduction
Statistics: a technology used to describe and measure aspects of nature from samples. Statistics lets us quantify the uncertainty of these measures.

3 Introduction
Statistics is also about good scientific practice. The history of statistics has its roots in biology.

4 Sir Francis Galton
Pioneer of fingerprint identification; studied the heredity of quantitative traits; regression and correlation. Also: efficacy of prayer, attractiveness as a function of distance from London.

5 Karl Pearson
Polymath; studied genetics. Correlation coefficient, χ² test, standard deviation.

6 Sir Ronald Fisher
The Genetical Theory of Natural Selection; founder of population genetics. Analysis of variance, likelihood, P-value, randomized experiments, multiple regression, etc.

7 Statistical quotations
"There are three kinds of lies: lies, damn lies, and statistics." (Benjamin Disraeli / Mark Twain)
"It is easy to lie with statistics, but easier to lie without them." (Frederick Mosteller)

8 Goals of statistics
Estimation - infer an unknown quantity of a population using sample data
Hypothesis testing - differences among groups; relationships among variables

9 Introduction
Introduction to the basic concepts of statistics as applied to problems in biological science. Goals of the course: understand statistical concepts (population, sample, slope, significance, etc.); identify appropriate methods for your data (e.g., one-sample or two-sample, paired or independent t-test, one-way or two-way ANOVA); select the correct MINITAB procedures to analyze data; scientific reading and interpretation.

10 Biostatistics Why study Biostatistics?
Statistical methods are widely used in the biological field; examples are from the biological field, practical and useful; the focus is on application instead of mathematical derivation; it helps you evaluate a paper in an intelligent manner. Statistics - the science and art of obtaining reliable results and conclusions from data that are subject to variation. Biostatistics (Biometry) - the application of statistics to the biological sciences.

11 Biostatistics Why Computer Applications?
Statistical methods are mostly difficult and complicated (ANOVA, regression, etc.); advances in computer technology and statistical software development make the application of statistical methods much easier today than before; software such as MINITAB takes time to learn.

12 Is Biostatistics hard to study?
Factors that make it hard for some students to learn statistics: The terminology is deceptive. To understand statistics, you have to understand that the statistical meanings of terms such as significant, error and hypothesis are distinct from the ordinary uses of these words.

13 Is Biostatistics hard to study?
Statistics requires mastering abstract concepts. It is not easy to think about theoretical concepts such as populations, probability distributions, and null hypotheses. Statistics is at the interface of mathematics and science. To really grasp the concepts of statistics, you need to be able to think about it from both angles.

14 Is Biostatistics hard to study?
The derivation of many statistical tests involves difficult math. However, you can learn to use statistical tests and interpret the results even if you do not fully understand how they work. You only need to know enough about how the tools work so that you can avoid using them in inappropriate situations.

15 Is Biostatistics hard to study?
Basically, you can calculate statistical tests and interpret results even if you don’t understand how the equations were derived, as long as you know enough to use the statistical tests appropriately.

16 Questions about this course
Is this course hard? No. The concepts are easy and the procedures are clear.
Why do we spend time on theoretical material? It helps in understanding the applications.
Do we need to know all of it? You may not need all of it, but be prepared.

17 Role of statistics in Biological Science
Biological science: 1. Idea or question; 2. Collect data / make observations; 3. Describe data / observations; 4. Assess the strength of evidence for / against the hypothesis
Statistics: 1. Mathematical model / hypothesis; 2. Study design; 3. Descriptive statistics; 4. Inferential statistics

18 Contents of the course
Descriptive statistics - graph, table, mean and standard deviation
Inferential statistics - probability and distribution; hypothesis test; analysis of variance; correlation and regression analysis; other special topics

19 Basic Concept
Data - numerical facts, measurements, or observations obtained from an investigation or experiment aimed at answering a question. Statistical analyses deal with numbers.
Quantitative - usual type of measurement, such as height or weight; measurements of quantitative variables carry information about 'amount'; can calculate means, etc., and can use in calculations
Qualitative - carry information about category or classification, such as medical diagnosis, ethnic group, gender; cannot calculate means as such, but can tabulate counts or frequencies and analyze frequencies
Random - variables whose values arise by chance factors which cannot be predicted in advance, such as height or weight; race or age are 'fixed' variables, i.e., not random
Discrete - can take only certain values or can only be measured to a certain degree of accuracy; e.g., # of children that a woman has delivered, # of teeth with fillings, blood pressure (?); may be handled differently in analysis
Continuous - can take any value to any degree of precision in a certain range; height, weight, temperature (?)
Population - largest collection of values of a random variable for which we have an interest at a particular time; e.g., school children in Maryland
Sample - selected part of a population; e.g., fifth grade girls, second grade boys, etc.

20 Basic Concept Quantitative
Usual type of measurement, such as height or weight - measurements of quantitative variables carry information about 'amount' - can calculate means, etc., and can use in calculations

21 Basic Concept Qualitative
Carry information about category or classification, such as medical diagnosis, ethnic group, gender - cannot calculate means as such, but can tabulate counts or frequencies and analyze frequencies

22 Basic Concept Variable
A characteristic that can take on different values for different persons, places or things. Statistical analyses need variability; otherwise there is nothing to study. Examples: concentration of a substance, pH values obtained from atmospheric precipitation, birth weight of babies whose mothers are smokers, etc.

23 Basic Concept A variable is a characteristic measured on individuals drawn from a population under study. Data are measurements of one or more variables made on a collection of individuals.

24 Basic Concept Type of Variable
Continuous variable - between any two values of the variable, there is another possible value. Examples: height, weight, concentration.
Discrete variable - values can only be integers. Examples: number of people, number of plants, etc.

25 Basic Concept Continuous variables
Can take any value to any degree of precision in a certain range - height, weight, temperature (?)

26 Basic Concept Discrete variables:
Can take only certain values or can only be measured to a certain degree of accuracy - e.g., # of children that a woman has delivered, # of teeth with fillings, blood pressure (?) - may be handled differently in analysis

27 Basic Concept Independent Variable Dependent Variable
We try to predict or explain a response (dependent) variable from an explanatory (independent) variable.

28 Populations and samples

29 Basic Concept
Populations <-> Parameters; Samples <-> Estimates

30 Basic Concept Nomenclature
                     Population parameter   Sample statistic
Mean                 μ                      x̄
Variance             σ²                     s²
Standard deviation   σ                      s

31 Basic Concept Population
Population parameters are constants whereas estimates are random variables, changing from one random sample to the next from the same population.
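The contrast between a fixed parameter and a varying estimate can be illustrated with a short simulation. This is only a sketch: the population of plant heights, its mean and spread, the sample size of 25, and all variable names are assumptions made up for the illustration and do not come from the course material.

```python
import random
import statistics

# Hypothetical population: 10,000 plant heights (cm), generated for illustration only.
rng = random.Random(1)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# The parameter is computed once from the whole population and does not change.
mu = statistics.fmean(population)

# Each random sample of 25 individuals gives a different estimate of mu.
estimates = [statistics.fmean(rng.sample(population, 25)) for _ in range(5)]

print(f"population mean (parameter): {mu:.2f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i} mean (estimate):  {est:.2f}")
```

Running it prints one population mean and five sample means that differ from one another, which is the sense in which an estimate is a random variable.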

32 Basic Concept Population and Sample
[Diagram: Sample -> Population, Statistic -> Parameter. The population parameter predicts properties of the sample; the sample statistic is generalized to the population.]

33 Basic Concept Population
Population: a set or collection of objects we are interested in (finite or infinite). For example: plant height under warming conditions; graduates of USIM; smokers in the world. Parameter: a descriptive measure associated with a variable of an entire population, usually unknown because the whole population cannot be enumerated.

34 Basic Concept Population and Sample
Population - largest collection of values of a random variable for which we have an interest at a particular time - school children in Negeri Sembilan. Sample - selected part of a population – Form Three girls, Form Five boys, etc.

35 Basic Concept A sample of convenience is a collection of individuals that happen to be available at the time.

36 Basic Concept Sampling
The essence of statistical inference. Why sample? We cannot afford the time or money to record measurements on an entire population, and new members of the population may be entering all the time. We use statistical analysis of a sample to answer questions about a population - cancer patients, teenage boys, women after childbirth, etc.

37 Basic Concept Sampling

38 Basic Concept Bias is a systematic discrepancy between estimates and the true population characteristic.
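A small simulation can make "systematic discrepancy" concrete. The sketch below is an illustration under stated assumptions: the population of heights is invented, and the flawed scheme (sampling only from the tallest half) is a hypothetical stand-in for any sampling procedure that favours some individuals.

```python
import random
import statistics

rng = random.Random(4)
# Hypothetical population of heights (cm), for illustration only.
population = [rng.gauss(170.0, 8.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# A deliberately flawed sampling frame: only the tallest half can be selected.
tallest_half = sorted(population)[len(population) // 2:]

# Repeat each sampling scheme 200 times and average the estimates.
random_estimates = [statistics.fmean(rng.sample(population, 50)) for _ in range(200)]
biased_estimates = [statistics.fmean(rng.sample(tallest_half, 50)) for _ in range(200)]

random_offset = statistics.fmean(random_estimates) - true_mean
biased_offset = statistics.fmean(biased_estimates) - true_mean

print(f"average offset, random sampling: {random_offset:+.2f} cm")
print(f"average offset, biased sampling: {biased_offset:+.2f} cm")
```

With random sampling the estimates scatter around the true mean, so their average offset is close to zero; with the flawed frame the estimates are shifted in the same direction every time, and averaging more samples does not remove the shift.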

39 Basic Concept Sampling error - the difference between an estimate and the average value of the estimate.

40 Basic Concept Larger samples on average will have smaller sampling error.
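The effect of sample size can be checked by simulation. In this sketch the population, the two sample sizes (10 and 100), the number of repeats, and the helper name `spread_of_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2)
# Hypothetical population, for illustration only.
population = [rng.gauss(50.0, 10.0) for _ in range(20_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the sample mean over repeated random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print(f"spread of sample means, n = 10:  {spread_small:.2f}")
print(f"spread of sample means, n = 100: {spread_large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is what "smaller sampling error on average" means in practice.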

41 Basic Concept Properties of a good sample
Independent selection of individuals; random selection of individuals; sufficiently large

42 Basic Concept Sampling
So how do intervention studies fit into this? Studies select a sample of the population (e.g., cancer patients) to study the effects of a new therapy and then make inferences about how the rest of the cancer patient population would react to the new therapy.

43 Basic Concept Sample
Sample: a small number of subjects drawn from a population in order to make inferences about the population. Random sample: a sample of size n drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected. Statistic: a descriptive measure associated with a random variable of a sample.

44 Basic Concept Random
Random variables are variables whose values arise by chance factors which cannot be predicted in advance, such as height or weight. Race or age are 'fixed' variables; i.e., not random.

45 Basic Concept Random In a random sample, each member of a population has an equal and independent chance of being selected.

46 Descriptive Statistics
Graphical summaries: frequency distribution, histogram, stem and leaf plot, boxplot.
Numerical summaries: location (mean, median, mode); spread (range, variance, standard deviation); shape (skewness, kurtosis).

47 Frequency Distribution - Discrete variables
Example: Number of grass plants, Mytilus edulis, found in 800 sample quadrats (1 m²) in an ecological study of grasses:
Sampling - the essence of statistical inference. Why sample? We cannot afford the time or money to record measurements on an entire population, and new members of the population may be entering all of the time. We use statistical analysis of a sample to answer questions about a population (cancer patients, teen-age boys, women after childbirth, etc.). So how do 'intervention studies' fit into this? Studies select a sample of the population (e.g., cancer patients) to study the effects of a new therapy and then make inferences about how the rest of the cancer patient population would react to the new therapy.

48 Frequency Distribution - Discrete variables
Example: Number of grass plants, Mytilus edulis, found in 800 sample quadrats (1 m²) in an ecological study of grasses:
1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2, ……………………………………………………… 1, 2, 3, 2, 1, 1, 0, 5, 0, 0, 1, 0, 1, 0, 2, 4, 7, 2, 1, 0
How is the plant number in a quadrat distributed?

49 Frequency Distribution - Discrete variables
Table 1. The frequency, relative frequency, and cumulative frequency of sedge plants in a quadrat.
frequency - the number of times a value occurs in the data (the corresponding quantity for a population is a probability).
relative frequency - the proportion (or %) of the time that the value occurs (frequency/n).
cumulative relative frequency - the proportion (or %) of the sample that is equal to or smaller than the value (cumulative frequency/n).
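The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is made up for this sketch, and the data are just the first 19 counts visible on the earlier slide, not the full set of 800 quadrats, so the output does not reproduce Table 1.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency)."""
    n = len(values)
    counts = Counter(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        frequency = counts[value]
        cumulative += frequency
        rows.append((value, frequency, frequency / n, cumulative, cumulative / n))
    return rows


# Illustrative subset only: the first 19 counts shown on the slide,
# not the full set of 800 quadrats.
plants_per_quadrat = [1, 4, 1, 0, 0, 1, 0, 0, 2, 3, 1, 2, 3, 1, 0, 2, 0, 1, 2]

table = frequency_table(plants_per_quadrat)
for value, f, rf, cf, crf in table:
    print(f"{value:>5} {f:>5} {rf:>8.3f} {cf:>5} {crf:>8.3f}")
```

The last row's cumulative frequency equals n and its cumulative relative frequency equals 1, which is a quick check that no observation was dropped.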

50 Histogram (Bar graph) and polygon
Histogram: a graph of frequencies. It can be used to compare frequencies visually, which makes the magnitude of differences easier to assess than judging the numbers themselves. Frequency polygon: similar to a histogram. Fig. 1. Frequency distribution of plants in a quadrat.

51 Frequency Distribution - Continuous variables
Grouping of a continuous outcome. Examples: weight, height. Grouping gives a better understanding of what the data show than the individual values do. Example: fiber length of a cotton variety (n=106). Data: 27.5, 28.6, 29.4, 30.5, 31.4, 29.8, 27.6, 28.7, 27.6, ………… 31.8, 32.0, 27.8

52 Frequency Distribution - Continuous variables
Table 2. Frequency and relative frequency distribution of fiber length (mm) of a cotton variety (n=106)

53 Frequency Distribution - Continuous variables
Calculate the range: R = max(X) - min(X) = 5.13.
Set the number of intervals g and the interval width i. Some "rules" exist, but generally create 8-15 equal-sized intervals: g = 11, i = R/(g-1) ≈ 0.5.
Set the interval limits: L1 = min(X) - i/2 = 27.0, L2 = L1 + i = 27.5, …
Count the number of observations in each interval.
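As a rough sketch of the grouping recipe above (range, interval width, first limit, counting), the following Python function may help. The name `class_intervals` and the choice to assign a value that falls exactly on a limit to the upper interval are assumptions of this sketch; the data are only the 12 fiber lengths visible on the slide, so g = 5 is used instead of the slide's g = 11.

```python
def class_intervals(values, g):
    """Split values into g equal-width intervals, following the slide's recipe."""
    if g < 2:
        raise ValueError("need at least two intervals")
    smallest, largest = min(values), max(values)
    data_range = largest - smallest          # R = max(X) - min(X)
    if data_range == 0:
        raise ValueError("all values are identical")
    width = data_range / (g - 1)             # i = R / (g - 1)
    first_limit = smallest - width / 2       # L1 = min(X) - i / 2
    limits = [first_limit + k * width for k in range(g + 1)]
    counts = [0] * g
    for x in values:
        # A value on a limit is counted in the upper interval (assumption).
        k = min(int((x - first_limit) / width), g - 1)
        counts[k] += 1
    return limits, counts


# Illustrative subset: the 12 values visible on the slide, not all 106.
fiber_lengths = [27.5, 28.6, 29.4, 30.5, 31.4, 29.8, 27.6, 28.7, 27.6, 31.8, 32.0, 27.8]
limits, counts = class_intervals(fiber_lengths, g=5)
for lower, upper, count in zip(limits, limits[1:], counts):
    print(f"{lower:6.2f} - {upper:6.2f}: {count}")
```

Because the first limit sits half a width below the minimum, the last limit ends up half a width above the maximum, so every observation falls inside some interval.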

54 Histogram (Bar graph) and polygon
Fig. 2. Frequency distribution in fiber length of a cotton.

55 Histogram A histogram is a way of summarising data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides up the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that specific group, and an area proportional to the number of observations falling into that group. This means that the rectangles might be drawn of non-uniform height.

56 Histogram The histogram is only appropriate for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets (>100 observations), when stem and leaf plots become tedious to construct. A histogram can also help detect any unusual observations (outliers), or any gaps in the data set.

57 Histogram

58 Stem and Leaf Displays Another way to assess frequencies
Preserves individual measurement information, so it is not useful for large data sets.
The stem is the first digit(s) of a measurement; the leaf is its last digit.
Most useful for two-digit numbers, more cumbersome for three or more digits.
20: X
30: XXX
40: XXXX
50: XX
60: X
Stem | Leaf
2* | 1
3* | 244
4* | 2468
5* | 26
6* | 4
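A minimal sketch of how such a display could be built in Python is given below. It assumes non-negative two-digit integers, and the helper name `stem_and_leaf` is hypothetical; the data are the values read off the display above.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits).

    Assumes non-negative integers; stems with no observations are omitted.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}


measurements = [21, 32, 34, 34, 42, 44, 46, 48, 52, 56, 64]
display = stem_and_leaf(measurements)
for stem, leaves in display.items():
    print(f"{stem}* | {leaves}")
```

Reading the leaves back against their stems recovers every original value, which is the property that separates this display from a histogram.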

59 Stem and Leaf Plot A stem and leaf plot is a way of summarising a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient and easily drawn form.

60 Stem and Leaf Plot A stem and leaf plot is similar to a histogram but is usually a more informative display for relatively small data sets (<100 data points). It provides a table as well as a picture of the data and from it we can readily write down the data in order of magnitude, which is useful for many statistical procedures, e.g. in the skinfold thickness example below:

61 Stem and Leaf Plot We can compare more than one data set by the use of multiple stem and leaf plots. By using a back-to-back stem and leaf plot, we are able to compare the same characteristic in two different groups, for example, pulse rate after exercise of smokers and non-smokers.

62 Summary In practice, descriptive statistics play a major role
They are always the first 1-2 tables/figures in a paper. A statistician needs to know about each variable before deciding how to analyze the data to answer the research questions. In any analysis, 90% of the effort goes into setting up the data, and descriptive statistics are part of that 90%.

63 Descriptive Statistics - Measures of Location
A descriptive measure computed from population data is a parameter; a descriptive measure computed from sample data is a statistic.
Most common measures of location: mean, median, mode, geometric mean, harmonic mean.

64 Arithmetic mean (population)
Suppose we have N measurements of a particular variable in a population. We denote these N measurements as X1, X2, X3, …, XN, where X1 is the first measurement, X2 is the second, etc.
Definition: \mu = \frac{1}{N} \sum_{i=1}^{N} X_i
More accurately called the arithmetic mean, it is defined as the sum of the observed measurements divided by the number of observations.

65 Arithmetic mean (sample)
Sample: suppose we draw n measurements of a particular variable from a population of N measurements. The n measurements are X1, X2, X3, …, Xn, where X1 is the first measurement, X2 is the second, etc.
Definition: \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

66 Some Properties of the Arithmetic Mean
Arithmetic mean (sample) Some Properties of the Arithmetic Mean , Prove: 1. 2.

67 Median
Frequently used if there are extreme values in a distribution or if the distribution is non-normal.
Definition: the value that divides the 'ordered array' into two equal parts.
If there is an odd number of observations, the median Md is the (n+1)/2-th observation; e.g., the median of 11 observations is the 6th observation.
If there is an even number of observations, the median Md is the midpoint between the middle two observations; e.g., the median of 12 observations is the midpoint between the 6th and 7th.
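The odd/even rule for the median can be written out directly. The sketch below is one straightforward way to do it in Python; the function name `median_of` is chosen for this illustration only.

```python
def median_of(values):
    """Median by the ordered-array rule: middle value for odd n,
    midpoint of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # The (n + 1) / 2-th observation, counting from 1.
        return ordered[middle]
    # Midpoint between the n/2-th and (n/2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


eleven = list(range(1, 12))   # 11 observations: median is the 6th
twelve = list(range(1, 13))   # 12 observations: midpoint of 6th and 7th
print(median_of(eleven), median_of(twelve))
```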

68 Mode
Definition: the value that occurs most frequently in the data set.
Example: mode Mo=5
If all values are different, there is no mode. There may be more than one mode (bimodal or multimodal). Not used very frequently in practice.

69 Example: Central Location
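Suppose the ages of the 10 trees you are studying are: 34, 24, 56, 52, 21, 44, 64, 34, 42, 46.
Then the mean age of this group is: \bar{X} = 417/10 = 41.7 years.
To find the median, first order the data: 21, 24, 34, 34, 42, 44, 46, 52, 56, 64. With n = 10, the median is the midpoint between the 5th and 6th observations: Md = (42 + 44)/2 = 43 years.
The mode is 34 years, Mo = 34 (it occurred twice).
The mean is the most commonly used measure.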
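The same three measures can be checked with Python's standard `statistics` module. This is only a convenience check on the tree-age example; the course itself uses MINITAB, and the first age (34) is taken from the ordered data and from the later slide that repeats this data set.

```python
import statistics

tree_ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

mean_age = statistics.mean(tree_ages)        # sum of ages / number of trees
median_age = statistics.median(tree_ages)    # midpoint of 5th and 6th ordered values
modes = statistics.multimode(tree_ages)      # list of all most-frequent values

print(mean_age, median_age, modes)
```

`multimode` returns a list, which covers the bimodal or multimodal case without raising an error.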

70 Geometric mean
Used to calculate the mean growth rate.
Definition: the antilog of the mean of the log x_i, G = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \ln x_i\right) = \left(\prod_{i=1}^{n} x_i\right)^{1/n}
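The "antilog of the mean of the logs" definition translates almost word for word into code. The sketch below assumes strictly positive values; the function name and the sample growth values are hypothetical and are not the root-growth data from the lecture.

```python
import math


def geometric_mean(values):
    """Antilog of the mean of the natural logs; defined for positive values only."""
    if not values:
        raise ValueError("geometric mean of an empty data set is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Hypothetical daily growth values, for illustration only.
growth = [1.2, 1.5, 1.1, 1.3]
print(geometric_mean(growth))
```

For positive data the geometric mean never exceeds the arithmetic mean, which gives a simple sanity check on the result.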

71 Geometric mean Example: Root growth at 25 °C; calculate the mean growth rate (mm/d).

72 Descriptive Statistics - Measures of Dispersion
Look at these two data sets:
Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
If we calculate the mean, both sets give the same value: Set 1: 7/7 = 1; Set 2: 7/7 = 1.
How do we measure dispersion (spread, variability)?

73 Descriptive Statistics - Measures of Dispersion
Common measures: range, variance and standard deviation, coefficient of variation.
Many distributions are well described by a measure of location and a measure of dispersion.

74 Range
Range is the difference between the largest and smallest values in the data set: R = Max(Xi) - Min(Xi).
It is heavily influenced by the two most extreme values and ignores the rest of the distribution.
Set 1: 100, 30, 20, 7, -20, -30, -100; R1 = 200
Set 2: 10, 3, 2, 7, -2, -3, -10; R2 = 20

75 Variance and Standard Deviation - Population
Suppose we have N measurements of a particular variable in a population: X1, X2, X3, …, XN.
The mean is \mu = \frac{1}{N} \sum_{i=1}^{N} X_i. Because the deviations from the mean sum to zero, \sum_{i=1}^{N} (X_i - \mu) = 0, we define:
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2 as the variance (its unit is the unit of X squared);
\sigma = \sqrt{\sigma^2} as the standard deviation.

76 Variance and Standard Deviation - Sample
Suppose we have n measurements of a particular variable in a sample: X1, X2, X3, …, Xn.
The mean is \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i; we define:
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 as the mean square, or sample variance;
s = \sqrt{s^2} as the standard deviation.

77 Variance and Standard Deviation
Corrected Sum of Squares (CSS): CSS = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n} \left(\sum_{i=1}^{n} X_i\right)^2
Degrees of freedom: n-1 is used because if we know n-1 deviations, the nth deviation is known, since the deviations have to sum to zero.

78 Example
Suppose the ages of the 10 trees you are studying are: 34, 24, 56, 52, 21, 44, 64, 34, 42, 46. We calculated the mean, \bar{X} = 41.7 y. Calculate the range, variance, standard deviation and CV.
R = 64 - 21 = 43 y, s² = 1692.1/9 = 188.01 y², s = 13.71 y, CV = 13.71/41.7 = 32.9%.
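The arithmetic in this example can be retraced step by step. The sketch below uses the shortcut form of the corrected sum of squares (sum of squares minus the squared total divided by n) and the usual definition of the CV as the standard deviation relative to the mean; variable names are chosen for this illustration.

```python
import math

tree_ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]

n = len(tree_ages)
mean_age = sum(tree_ages) / n
data_range = max(tree_ages) - min(tree_ages)

# Corrected sum of squares: sum(x^2) - (sum(x))^2 / n
css = sum(x * x for x in tree_ages) - sum(tree_ages) ** 2 / n

variance = css / (n - 1)              # sample variance, n - 1 degrees of freedom
std_dev = math.sqrt(variance)
cv_percent = 100 * std_dev / mean_age

print(f"R = {data_range}, CSS = {css:.1f}, s2 = {variance:.2f}, "
      f"s = {std_dev:.2f}, CV = {cv_percent:.1f}%")
```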

79 Coefficient of Variation
Measures relative variation rather than absolute variation such as the standard deviation.
Definition of C.V.: CV = \frac{s}{\bar{X}} \times 100\%
Useful in comparing variation between two distributions.
Used particularly in comparing laboratory measures to identify those determinations with more variation.

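80 Example
Set 1: 100, 30, 20, 7, -20, -30, -100
Set 2: 10, 3, 2, 7, -2, -3, -10
Calculate \bar{X}, s², s and CV.
Set 1: \bar{X} = 1, s² = 3773.67, s = 61.43, CV = 6143%
Set 2: \bar{X} = 1, s² = 44.67, s = 6.68, CV = 668%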
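One way to carry out this exercise is with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n-1 divisor. The helper name `summarize` is hypothetical, and the CV is taken as s divided by the mean, times 100.

```python
import statistics


def summarize(values):
    """Return (mean, sample variance, sample standard deviation, CV in %)."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)   # divides by n - 1
    std_dev = statistics.stdev(values)
    cv_percent = 100 * std_dev / mean
    return mean, variance, std_dev, cv_percent


set_1 = [100, 30, 20, 7, -20, -30, -100]
set_2 = [10, 3, 2, 7, -2, -3, -10]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    mean, variance, std_dev, cv_percent = summarize(data)
    print(f"{name}: mean = {mean}, s2 = {variance:.2f}, "
          f"s = {std_dev:.2f}, CV = {cv_percent:.0f}%")
```

Both sets share the same mean, so the variance, standard deviation and CV are what distinguish them. Because the mean here is close to zero, the CV values come out very large; the CV is generally more informative for positive-valued measurements.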

81 Box Plots
A descriptive method to convey information about measures of location and dispersion; also called box-and-whisker plots.
Construction of a boxplot: the box spans the IQR; a line is drawn at the median; whiskers extend to the smallest and largest observations.
Other conventions can be used, especially to represent extreme values.

82 Box Plots
[Figure: box plots; axis label: Drug]

83 Box and Whisker Plot (or Boxplot)
A box and whisker plot is a way of summarising a set of data measured on an interval scale. It is often used in exploratory data analysis. It is a type of graph which is used to show the shape of the distribution, its central value, and variability. The picture produced consists of the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median

84 Box and Whisker Plot (or Boxplot)
A box plot (as it is often called) is especially helpful for indicating whether a distribution is skewed and whether there are any unusual observations (outliers) in the data set. Box and whisker plots are also very useful when large numbers of observations are involved and when two or more data sets are being compared.

85 Box and Whisker Plot (or Boxplot)

86 Box and Whisker Plot (or Boxplot)

87 Box and Whisker Plot (or Boxplot)

88 5-Number Summary A 5-number summary is especially useful when we have so many data that it is sufficient to present a summary of the data rather than the whole data set. It consists of 5 values: the most extreme values in the data set (maximum and minimum values), the lower and upper quartiles, and the median. A 5-number summary can be represented in a diagram known as a box and whisker plot. In cases where we have more than one data set to analyse, a 5-number summary is constructed for each, with corresponding multiple box and whisker plots.
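A 5-number summary could be computed as in the sketch below. Several quartile conventions exist and statistical packages differ; this example assumes the "inclusive" interpolation method of Python's `statistics.quantiles`, so the quartiles may differ slightly from those of other software. The function name is hypothetical and the data are the tree ages used earlier in the lecture.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


tree_ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46]
summary = five_number_summary(tree_ages)
print(summary)
```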

89 Outlier An outlier is an observation in a data set which is far removed in value from the others in the data set. It is an unusually large or an unusually small value compared to the others. An outlier might be the result of an error in measurement, in which case it will distort the interpretation of the data, having undue influence on many summary statistics, for example, the mean.

90 Outlier If an outlier is a genuine result, it is important because it might indicate an extreme of behaviour of the process under study. For this reason, all outliers must be examined carefully before embarking on any formal analysis. Outliers should not routinely be removed without further justification.

91 Interpreting a Boxplot

92 Interpreting a Boxplot
The boxplot is interpreted as follows: The box itself contains the middle 50% of the data. The upper edge (hinge) of the box indicates the 75th percentile of the data set, and the lower hinge indicates the 25th percentile. The range of the middle two quartiles is known as the inter-quartile range. The line in the box indicates the median value of the data.

93 Interpreting a Boxplot
The boxplot is interpreted as follows: If the median line within the box is not equidistant from the hinges, then the data are skewed. The ends of the vertical lines or "whiskers" indicate the minimum and maximum data values, unless outliers are present, in which case the whiskers extend at most 1.5 times the inter-quartile range beyond the hinges. The points outside the ends of the whiskers are outliers or suspected outliers.
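The 1.5 times IQR rule for whiskers and outliers can be sketched as follows. The function name `boxplot_stats`, the dictionary keys, and the "inclusive" quartile method are assumptions of this sketch, and the value 120 is a made-up observation added to the tree ages purely to produce an outlier.

```python
import statistics


def boxplot_stats(values):
    """Hinges, whisker ends and suspected outliers under the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value within the fences
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": outliers,
    }


# Tree ages from the lecture plus one hypothetical extreme value (120).
ages = [34, 24, 56, 52, 21, 44, 64, 34, 42, 46, 120]
stats = boxplot_stats(ages)
print(stats)
```

The whiskers stop at the most extreme observations that still lie within the fences, so they can be shorter than 1.5 times the IQR.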

94 Boxplot Enhancements Beyond the basic information, boxplots sometimes are enhanced to convey additional information: The mean and its confidence interval can be shown using a diamond shape in the box. The expected range of the median can be shown using notches in the box. The width of the box can be varied in proportion to the log of the sample size.

95 Advantages of Boxplots
Boxplots have the following strengths: They graphically display a variable's location and spread at a glance. They provide some indication of the data's symmetry and skewness. Unlike many other methods of data display, boxplots show outliers. By placing a boxplot for each level of a categorical variable side by side on the same graph, one can quickly compare data sets.

96 Disadvantage of Boxplots
One drawback of boxplots is that they tend to emphasize the tails of a distribution, which are the least certain points in the data set. They also hide many of the details of the distribution. Displaying a histogram in conjunction with the boxplot helps in this regard, and both are important tools for exploratory data analysis.

97 Boxplot Example 1 Check location and variation shifts. Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data. Sample plot: this box plot reveals that machine has a significant effect on energy with respect to location and possibly variation.

98 Boxplot Example 1

99 Boxplot Example 1 This box plot, comparing four machines for energy output, shows that machine has a significant effect on energy with respect to both location and variation. Machine 3 has the highest energy response (about 72.5); machine 4 has the least variable energy response with about 50% of its readings being within 1 energy unit.

100 Boxplot Example 1 These MINITAB boxplots represent lottery payoffs for winning numbers for three time periods (May 1975-March 1976, November 1976-September 1977, and December 1980-September 1981).

101 Boxplot Example 1 The median for each dataset is indicated by the black center line, and the first and third quartiles are the edges of the red area, which is known as the inter-quartile range (IQR).

102 Boxplot Example 1 The ends of the lines extending from the box mark the most extreme values that lie within 1.5 times the inter-quartile range of the upper or lower quartile. Points farther than 1.5 times the IQR from the nearest quartile are plotted individually as asterisks. These points represent potential outliers.

103 Boxplot Example 1 In this example, the three boxplots have nearly identical median values. The IQR is decreasing from one time period to the next, indicating reduced variability of payoffs in the second and third periods. In addition, the extreme values are closer to the median in the later time periods.

104 Boxplot Example 2 As shown in the figure, a line is drawn from the upper hinge to the upper adjacent value and from the lower hinge to the lower adjacent value. Every score between the inner and outer fences is indicated by an "o" whereas a score beyond the outer fences is indicated by a "*".
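The "o" and "*" marking can be sketched as follows. The slide does not give the fence positions, so this assumes the usual convention that the inner fences lie 1.5 × IQR beyond the hinges and the outer fences 3 × IQR beyond them. The function name `classify_scores` and the sample `scores` are hypothetical.

```python
import statistics

def classify_scores(data):
    """Mark scores beyond the inner fences with 'o' and beyond the outer fences with '*'."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)               # one step = 1.5 * IQR (assumed convention)
    inner = (q1 - step, q3 + step)       # inner fences: one step beyond the hinges
    outer = (q1 - 2 * step, q3 + 2 * step)  # outer fences: two steps beyond the hinges
    marks = []
    for value in sorted(data):
        if value < outer[0] or value > outer[1]:
            marks.append((value, "*"))   # beyond the outer fences
        elif value < inner[0] or value > inner[1]:
            marks.append((value, "o"))   # between the inner and outer fences
    return marks

# Made-up scores for illustration only.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 18, 25]
marks = classify_scores(scores)
print(marks)
```

Scores inside the inner fences get no mark; the whiskers end at the most extreme of those values (the adjacent values).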

105 Boxplot Example 2 It is often useful to compare data from two or more groups by viewing box plots from the groups side by side. The data from 2b are higher, more spread out, and have a positive skew. That the skew is positive can be determined by the fact that the mean is higher than the median and the upper whisker is longer than the lower whisker.
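The mean-versus-median check mentioned above can be written as a small helper. It is only a rule of thumb, not a formal test of skewness, and the function name `skew_direction` and the sample data are made up for illustration.

```python
import statistics

def skew_direction(data):
    """Rough skew indicator: compare the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positive"   # long tail toward high values
    if mean < median:
        return "negative"   # long tail toward low values
    return "none"

# Made-up data with a few large values pulling the mean upward.
sample = [1, 2, 2, 3, 3, 4, 5, 9, 14]
direction = skew_direction(sample)
print(direction)
```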

106 Boxplot Example 3 Although the medians are all roughly the same, you can see at a glance that the spread of each data set is different. The boxplot on the left shows data that appear to be distributed evenly. The median is in the middle of the rectangle, and the whiskers are about the same length. In addition, the plot contains no outside values. The median of the second plot from the left appears to be slightly off-center. The number of extreme values is a point of concern because it suggests that the data vary widely.

107 Boxplot Example 3 The third boxplot shows data that has less variation and spread than the other plots. The fourth boxplot shows data that is significantly upwardly-skewed. The median of this plot is closer to the top of the rectangle than to the bottom, and the upper whisker is longer than the bottom one. All the boxplots have approximately the same median, and the two boxplots on the left have approximately the same variation in the data.

108 Descriptive Statistics (Summary)
Graphical Summaries: Frequency distribution, Histogram, Stem and Leaf plot, Boxplot. Numerical Summaries: Location - mean, median, mode. Dispersion - range, variance, standard deviation. Shape.
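The numerical summaries listed above can be computed with Python's standard `statistics` module. The data values are made up for illustration, and the sample (n - 1) forms of variance and standard deviation are used here as an assumption about what the course intends.

```python
import statistics

# Made-up measurements for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Location
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion
data_range = max(data) - min(data)
variance = statistics.variance(data)   # sample variance, divides by n - 1
std_dev = statistics.stdev(data)       # sample standard deviation

print(mean, median, mode, data_range, variance, std_dev)
```

If the data are a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.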

109 Software Statistical software: SAS, SPSS, Stata, BMDP, MINITAB. Graphical software: Sigmaplot, Harvard Graphics, PowerPoint, Excel.
Sampling is the essence of statistical inference. Why sample? We cannot afford the time or money to record measurements on an entire population, and new members of the population may be entering all the time. We use statistical analysis of a sample to answer questions about a population (cancer patients, teenage boys, women after childbirth, etc.). So how do intervention studies fit into this? Studies select a sample of the population (e.g., cancer patients) to study the effects of a new therapy and then make inferences about how the rest of the cancer patient population would react to the new therapy.

110 Biostatistics

111 Biostatistics

