Introduction to Statistics

1 Introduction to Statistics
Prof. L Prado (MAT, OER)

2 Chapter 1: Overview
Topics: nature of data; skills needed in statistics.
The science of statistics is collecting, organizing, summarizing, and analyzing information to draw conclusions from data or answer questions.

3 Statistical Methods
Descriptive statistics
Inferential statistics: estimation and hypothesis testing

4 Overview
Key goal of statistics: learn about a large group (the population) from data collected from a smaller subgroup (the sample).
Survey: a tool for collecting data from a smaller group that is part of a larger group, in order to learn something about the larger group.
Statistics has two branches:
Descriptive statistics: collection, organization, summarization, and presentation of data.
Inferential statistics: drawing conclusions about a population by using samples (to draw conclusions = to infer).

5 Overview: Definitions
Variable: a characteristic or attribute that varies.
Data: the values collected for a variable (measurements or observations such as gender, answers, ...).
Statistics: a collection of methods for studying data.
Population: the complete collection of all subjects (individuals, scores, measurements, ...).
Sample: a subcollection of members selected from a population.
Census: a collection of data from every member of the population (e.g., the US Census).

6 Overview: Example
Poll: 1087 adults are asked whether or not they drink alcoholic beverages.
Sample: the 1087 adults. Population: all US adults (150 million).
Census: every 10 years, the Census Bureau tries to collect information from every member of the US population. This is nearly impossible and very expensive (in time and money)!
Instead, we use sample data to draw conclusions about the whole population: this is inferential statistics!

7 Parameter vs. Statistic
Parameter: a numerical measurement describing some characteristic of a population. Example: when Lincoln was elected, he received 39.82% of the votes counted (1,865,908 votes). The 39.82% is a parameter.
Statistic: a numerical measurement describing some characteristic of a sample. Example: based on a sample of 877 executives, 45% would not hire an applicant with a typographical error in the application. The 45% is a statistic.
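To make the distinction concrete, here is a small Python simulation. The population of 10,000 yes/no answers and its 45% "yes" rate are invented for illustration; the sample size 877 echoes the executive survey above. The parameter is computed from every member of the population, while the statistic comes from one simple random sample, so it changes from sample to sample.

```python
import random

# Hypothetical population: 10,000 answers coded 1 ("yes") or 0 ("no").
# The 45% "yes" rate is made up for illustration only.
random.seed(1)
population = [1] * 4500 + [0] * 5500

# Parameter: computed from every member of the population.
parameter = sum(population) / len(population)

# Statistic: computed from a simple random sample of 877 members.
sample = random.sample(population, 877)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.3f}")  # exactly 0.45 by construction
print(f"statistic = {statistic:.3f}")  # close to 0.45, but usually not equal
```

Rerunning with a different seed changes the statistic but never the parameter, which is exactly why inferential statistics is needed.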

8 Types of data
Quantitative data: numbers representing counts or measurements. Examples: number of children in a family, weights, heights, ages.
Qualitative (categorical) data: nonnumerical. Examples: gender of an athlete, zip code, blood type, states in the U.S., brands of TV.
Discrete (counted) variable vs. continuous (measured) variable: number of people in a household vs. temperatures in May.
Nominal level of measurement: names, labels, or categories with no ordering. Examples: Yes/No/Undecided responses, colors, gender, jersey numbers of players.
Ordinal level of measurement: there is an order (rank), but differences between values are meaningless or nonexistent. Examples: grades A, B, C, D, F; intensity of pain (none, mild, moderate, severe).
Interval level of measurement: ordered, and differences are meaningful, but there is no natural zero (zero does not mean "none"), so ratios are meaningless. Examples: temperature in °C or °F, year, IQ score.
Ratio level of measurement: interval level with a meaningful (natural) zero, so ratios are meaningful. Examples: weights, prices (non-negative), number of phone calls received.
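The difference between the interval and ratio levels can be checked numerically. The sketch below compares a ratio of two temperatures and a ratio of two weights before and after a change of unit; the helper names c_to_f and kg_to_lb are illustrative, not from the slides. Only the weight ratio survives the change of unit, because only weight has a natural zero.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462


# Interval level: temperature has no natural zero, so ratios depend on the unit.
ratio_c = 20 / 10                       # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)       # 68 / 50 = 1.36 in Fahrenheit

# Ratio level: weight has a natural zero, so ratios survive a change of unit.
ratio_kg = 20 / 10                      # 2.0 in kilograms
ratio_lb = kg_to_lb(20) / kg_to_lb(10)  # still 2.0 in pounds

print("temperature ratios:", ratio_c, ratio_f)
print("weight ratios:", ratio_kg, ratio_lb)
```

So "20 °C is twice as warm as 10 °C" is not a meaningful statement, while "20 kg is twice as heavy as 10 kg" is.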

9 Summary
The process of statistics is designed to collect and analyze data to reach conclusions.
Variables can be classified by their type of data:
Qualitative variables: nominal or ordinal.
Quantitative variables: discrete (values are counted) or continuous (values are measured).

10 Basic skills
Samples must be representative: "39/40 polled people vote for A", but they were sampled at A's headquarters!
Samples must not be too small: CDF published "among HS students suspended, 67% were suspended more than 3 times." Sample size: 3!
Graphs: in which one does red do better? [figure]
Percentage of a number: 6% of 1200 = 6/100 × 1200 = 72
Fraction to percentage: 3/4 = 0.75; 0.75 × 100% = 75%
Percentage to decimal: 27.3% = 27.3/100 = 0.273
Decimal to percentage: 0.852 × 100% = 85.2%
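The four conversions above are one-line formulas. Here is a minimal Python sketch of them; the function names are hypothetical helpers chosen for this illustration.

```python
def percent_of(percent, whole):
    """Return percent% of whole, e.g. 6% of 1200."""
    return percent / 100 * whole


def fraction_to_percent(numerator, denominator):
    """Convert a fraction such as 3/4 to a percentage."""
    return numerator / denominator * 100


def percent_to_decimal(percent):
    """Convert a percentage such as 27.3 (%) to a decimal."""
    return percent / 100


def decimal_to_percent(decimal):
    """Convert a decimal such as 0.852 to a percentage."""
    return decimal * 100


# round() hides tiny floating-point artifacts (a result like
# 85.19999999999999 instead of 85.2) that binary arithmetic can produce.
print(round(percent_of(6, 1200), 4))        # 72.0
print(round(fraction_to_percent(3, 4), 4))  # 75.0
print(round(percent_to_decimal(27.3), 4))   # 0.273
print(round(decimal_to_percent(0.852), 4))  # 85.2
```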

11 Basic skills 2 Calculator:

12 Statistical Study
Observational study: observe and measure characteristics without trying to modify the individuals. Examples: Gallup poll, Nielsen Media poll (TV shows).
Cross-sectional: data are observed and measured at one point in time.
Retrospective: data are collected from the past (records).
Prospective: data are collected going forward from groups (e.g., smokers and non-smokers).
Experiment: apply a treatment to individuals, then observe and measure its effects. Example: a clinical trial for Lipitor, with a treatment group (Lipitor) and a control group (placebo group).
Control: comparison, single-blinding, double-blinding, placebo, blocks.
Replication: the ability to repeat the experiment.
Randomization: data need to be collected in an appropriate (random) way; otherwise they can be completely useless!

13 Completely Randomized Design
A completely randomized design is one in which each experimental unit is assigned to a treatment completely at random.
Example: a farmer wants to test the effects of a fertilizer. We choose a set of plants to receive the treatment, and we randomly assign the plants to receive different levels of fertilizer.
This has similarities to simple random sampling.
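The fertilizer example can be sketched in a few lines of Python: shuffle the plants, then deal them out to the treatments in turn, so every plant is equally likely to receive any level. The plant labels and the three fertilizer levels are hypothetical, and equal group sizes are an added assumption.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Assign each unit to a treatment completely at random,
    keeping the group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


plants = [f"plant{i}" for i in range(1, 13)]
levels = ["none", "low", "high"]  # hypothetical fertilizer levels
assignment = completely_randomized(plants, levels, seed=42)
for plant in plants:
    print(plant, assignment[plant])
```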

14 Controlling factors
We control as many factors as we can: amount of watering, method of tilling, soil acidity.
Randomization decreases the effects of uncontrolled factors: rainfall, sunlight, temperature.

15 Randomized Block Design
A randomized block design is one in which the experimental units are first grouped, and then the treatments are assigned at random within each group. The groups are called blocks.
This design reduces confounding. Remark: when two effects cannot be distinguished from each other, this is called confounding.
This has similarities to stratified sampling.
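The only change from the completely randomized design is that the randomization happens separately inside each block. In this Python sketch the blocks (a sunny and a shady field) and the plant labels are hypothetical; because every block receives both treatments equally often, a difference between fields cannot be confused with the effect of the fertilizer.

```python
import random


def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        # Within this block, deal the shuffled units out to the treatments.
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment


# Hypothetical blocks: plants grouped by field before randomizing.
blocks = {
    "sunny field": ["s1", "s2", "s3", "s4"],
    "shady field": ["h1", "h2", "h3", "h4"],
}
result = randomized_block(blocks, ["fertilizer", "no fertilizer"], seed=7)
for unit, (block, treatment) in sorted(result.items()):
    print(block, unit, treatment)
```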

17 Summary
The planning of designed experiments is crucial to the success of the experiment.
A double-blind implementation of an experiment (neither the subjects nor the researchers know who receives which treatment) reduces changes in behavior that could bias the results.
There are different good methods for assigning treatments to experimental units: completely randomized, randomized blocks, and matched pairs (I skipped this one!).

18 Sampling Design
Simple random sample (SRS) of size n: every possible sample of n individuals has the same chance of being chosen. Note that an SRS also gives each individual an equal chance of being chosen (thus avoiding bias in the choice).
Systematic: select a starting point, then choose every kth member.
Convenience: use data that are easy to get.
Stratified: subdivide the population into at least 2 subgroups (strata) whose members share a common characteristic (homogeneous), then draw a sample from each (e.g., gender, age, animal species).
Cluster: divide the population into areas (clusters: intact groups that are representative of the population), then randomly select some of the clusters and use all members of the selected clusters (e.g., city blocks, geographic areas).
Sampling error: the difference between a sample result and the true population result; it results from chance sample fluctuations.
Nonsampling error: occurs when data are incorrectly collected, measured, recorded, or analyzed.
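The four random sampling designs above differ only in what gets chosen at random. This Python sketch implements each one with the standard random module; the population of IDs 1 to 100, the two gender strata, and the ten "city block" clusters are all hypothetical. Convenience sampling is left out on purpose, since it involves no randomization.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every subset of size n is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then every kth member."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw an SRS of the given size from each stratum (subgroup)."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(2024)
population = list(range(1, 101))  # hypothetical IDs 1..100

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 10, rng))

strata = {"female": population[:50], "male": population[50:]}
print(stratified_sample(strata, 3, rng))

clusters = {f"block {b}": population[b * 10:(b + 1) * 10] for b in range(10)}
print(cluster_sample(clusters, 2, rng))
```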

21 Summary
There are other sampling methods that are particularly useful in certain situations:
Stratified sampling, to cover the different strata.
Systematic sampling, when the frame is unknown.
Cluster sampling, to reduce the time and expense required.
Multistage sampling, for effective large-scale samples.
The choice of sampling method depends on the structure of the population and the goals of the analyst.

22 Sources of Error in Sampling
One type of error, sampling error, occurs because we use only part of the population in our study.
Samples consist of only part of the total data, but they are usually more realistic to analyze.
Because there are individuals in the population who are not in our sample, sampling errors are difficult to control.
We will study sampling errors in future chapters.
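As a preview, sampling error can be simulated. The sketch below builds a hypothetical population of 10,000 ages, then measures how far sample means fall from the population mean, averaged over 200 repeated samples of each size. The sampling error never disappears, but it tends to shrink as the sample size grows.

```python
import random
import statistics


def average_sampling_error(population, n, repetitions, rng):
    """Average absolute difference between a sample mean and the population mean."""
    mu = statistics.mean(population)
    errors = [abs(statistics.mean(rng.sample(population, n)) - mu)
              for _ in range(repetitions)]
    return statistics.mean(errors)


rng = random.Random(0)
# Hypothetical population of 10,000 ages between 18 and 90.
population = [rng.randint(18, 90) for _ in range(10_000)]

for n in (10, 100, 1000):
    print(n, round(average_sampling_error(population, n, 200, rng), 2))
```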

23 Types of Nonsampling Error
Using an incomplete frame
Nonresponse: individuals who respond have different characteristics than individuals who do not respond
Interviewer errors
Misrepresented answers
Data checks
Questionnaire design
Wording of questions
Order of questions, words, and responses

24 Nonsampling Errors
Another type of error, nonsampling error, arises from the actual survey process:
Preference is given to selecting some individuals over others.
Individual answers are not accurate (for various reasons).
Nonsampling errors can often be controlled or minimized with a well-designed survey and sampling technique.

25 The Literary Digest Poll
The Literary Digest used its polls to predict the winners of presidential elections, and its previous polls had been accurate.
In 1936, the Literary Digest predicted that Alf Landon would defeat Franklin Roosevelt in a landslide.
In the actual election, Roosevelt won in a landslide.

26 Why was the Literary Digest so far off?
The 1936 frame was not representative of the total voting population, and the sampling process was not completely random.
The frame had too large a proportion of Republicans, who generally favored Landon, and too small a proportion of Democrats, who generally favored Roosevelt.
Republicans were overrepresented and Democrats were underrepresented!
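The lesson can be reproduced with a toy simulation. All numbers below are hypothetical, not historical: an electorate in which 60% favor Roosevelt, and a frame in which only 30% do because it overrepresents Landon supporters. A modest simple random sample from the whole electorate lands near the truth, while a much larger sample drawn from the biased frame stays far off, because a bigger sample cannot fix a bad frame.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate: 60% favor Roosevelt, 40% favor Landon.
# These shares are illustrative only, not historical figures.
electorate = ["Roosevelt"] * 60_000 + ["Landon"] * 40_000


def share_for_roosevelt(sample):
    """Proportion of the sample that favors Roosevelt."""
    return sum(v == "Roosevelt" for v in sample) / len(sample)


# Unbiased: a simple random sample of 2,000 from the whole electorate.
srs = rng.sample(electorate, 2_000)

# Biased frame (hypothetical): only 30% of the listed people favor Roosevelt.
biased_frame = ["Roosevelt"] * 9_000 + ["Landon"] * 21_000
big_biased_sample = rng.sample(biased_frame, 20_000)

print("SRS estimate:", round(share_for_roosevelt(srs), 3))
print("Biased-frame estimate:", round(share_for_roosevelt(big_biased_sample), 3))
```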

