Presentation on theme: "Statistics-MAT 150 Chapter 1 Introduction to Statistics Prof. Felix Apfaltrer Office:N518 Phone: x7421."— Presentation transcript:
Statistics-MAT 150 Chapter 1 Introduction to Statistics Prof. Felix Apfaltrer firstname.lastname@example.org Office:N518 Phone: x7421
Chapter 1 Overview Nature of data Skills needed in statistics
Overview Statistics: Descriptive –Analyze the nature of data from surveys, experiments, and observations. Inferential –Draw conclusions from the analyses with respect to the population. Survey: a tool to collect data from a smaller group that is part of a larger group, in order to learn something about the larger group. Key goal of statistics: learn about a large group (population) from data from a smaller subgroup (sample).
Overview Definitions: Data: observations collected (measurements, gender, answers,…). Statistics: collection of methods to analyze data. Population: complete collection of elements (scores, measurements, subjects,…). Sample: subcollection of members selected from a population. Census: collection of data from every member of the population.
Overview 2 Example: Poll: 1087 adults are asked whether they drink alcoholic beverages or not. –Sample: 1087 adults –Population: US adults (150 million). Census: Every 10 years, the Census Bureau tries to collect information from every member of the US population. –Impossible! –Very expensive! Use sample data to draw conclusions about the whole population: inferential statistics!
Types of data Parameter: A numerical measurement describing some characteristic of the population. Lincoln elected: 39.82% of 1,865,908 votes counted. –39.82% is a parameter. Statistic: A numerical measurement describing some characteristic of the sample. Based on a sample of 877 surveyed executives, 45% would not hire an applicant with a typographical error in the application. –45% is a statistic.
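The parameter/statistic distinction can be sketched in a few lines of Python. The population below is made up for illustration (it is not the Lincoln or executive data from the slide), and the names `population`, `parameter`, and `statistic` are hypothetical.

```python
import random

# Hypothetical population: 1 = "yes", 0 = "no". Made-up data, for illustration only.
rng = random.Random(0)
population = [1] * 4000 + [0] * 6000   # 40% "yes" by construction

# Parameter: computed from every member of the population.
parameter = sum(population) / len(population)

# Statistic: computed from a sample only, so it varies from sample to sample.
sample = rng.sample(population, 500)
statistic = sum(sample) / len(sample)

print(f"parameter (population proportion): {parameter:.2%}")
print(f"statistic (sample proportion):     {statistic:.2%}")
```

The statistic will usually land near the parameter but not exactly on it; drawing a different sample gives a different statistic, while the parameter stays fixed.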
Types of data 2 Quantitative data: Numbers representing counts or measurements. Weights of supermodels. Qualitative data: Nonnumerical. Gender of an athlete. Discrete vs. continuous data: # of people in a household vs. temperatures in May. Nominal level of measurement: names, labels, categories; no ordering. Yes/No/Undecided responses, colors. Ordinal level of measurement: some order, but differences between values are meaningless or nonexistent. Course grades A, B, C, D, F. “Livability rank of a city”. Interval level of measurement: ordered, with meaningful differences, but no zero or an arbitrary one. Temperature, year. Ratio level of measurement: as interval, but with a meaningful zero. Weights, prices (non-negative).
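A small sketch of why the level of measurement matters, using made-up values. The Kelvin conversion is an added illustration (not from the slide) of what a "meaningful zero" changes; all variable names are hypothetical.

```python
# Interval level: Celsius temperatures. Differences are meaningful, ratios are not,
# because 0 degrees C is an arbitrary zero point.
c_low, c_high = 10.0, 20.0
diff_c = c_high - c_low      # a 10-degree difference: meaningful
ratio_c = c_high / c_low     # 2.0, but 20 C is not "twice as hot" as 10 C

# The same two temperatures in kelvins, a scale with a true zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
ratio_k = k_high / k_low     # roughly 1.035, nowhere near 2

# Ratio level: weights have a true zero, so ratios are meaningful.
w_a, w_b = 50.0, 100.0
ratio_w = w_b / w_a          # b really is twice as heavy as a

# Ordinal level: grades can be ranked, but arithmetic on them is not meaningful.
grade_order = ["F", "D", "C", "B", "A"]   # worst to best
grades = ["B", "A", "D", "C"]
ranked = sorted(grades, key=grade_order.index)

print(diff_c, ratio_c, round(ratio_k, 3), ratio_w, ranked)
```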
Basic skills Samples: Representative: “39/40 polled people vote for A”. Sampled in A’s headquarters! Not too small: CDF published “among HS students suspended, 67% suspended more than 3 times”. Sample size: 3! Percentage of: 6% of 1200 = 6/100 * 1200 = 72. Fraction >>> percentage: 3/4 = 0.75 >>> 0.75 * 100% = 75%. Graphs: In which one does red do better? Percentage >>> decimal: 27.3% = 27.3/100 = 0.273. Decimal >>> percentage: 0.852 >>> 0.852 * 100% = 85.2%.
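The four percentage conversions on this slide can be written as small Python helpers. The function names are hypothetical; the inputs are the numbers from the slide. Results are printed with rounding because floating-point arithmetic can be off in the last digits.

```python
def percent_of(percent, amount):
    """Return `percent` % of `amount` (a quantity, not a percentage)."""
    return percent / 100 * amount

def fraction_to_percent(numerator, denominator):
    """Convert a fraction to a percentage value."""
    return numerator / denominator * 100

def percent_to_decimal(percent):
    """Convert a percentage value to its decimal form."""
    return percent / 100

def decimal_to_percent(decimal):
    """Convert a decimal to a percentage value."""
    return decimal * 100

# The slide's examples, rounded for display.
print("6% of 1200 =", round(percent_of(6, 1200), 6))
print("3/4 =", round(fraction_to_percent(3, 4), 6), "%")
print("27.3% =", round(percent_to_decimal(27.3), 6))
print("0.852 =", round(decimal_to_percent(0.852), 6), "%")
```

Note that "6% of 1200" yields a count (72), not a percentage.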
Design Observational study: observe and measure characteristics without trying to modify subjects. Gallup poll. Cross-sectional: data observed, measured at one point in time. Retrospective: data are collected from the past (records). Prospective: data collected along the way from groups (smokers/nonsmokers). Experiment: apply treatment and observe and measure effects. Clinical trial for Lipitor. Control: blinding - placebo, double-blinding, blocks. Replication: ability to repeat experiment. Randomization: data need to be collected in an appropriate (random) way, otherwise they are completely useless! –Random sample: members of the population are selected so that each individual member has the same chance of being selected. –Simple random sample of size n: every possible sample of size n has the same chance of being chosen.
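A minimal sketch of drawing a simple random sample in Python, assuming the population is available as a list. `random.sample` selects without replacement, so every subset of size n is equally likely. The helper name `simple_random_sample` and the ID list are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every size-n subset is equally likely."""
    rng = random.Random(seed)   # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: subject IDs 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```

Passing a seed is only for reproducibility in class; leaving it as `None` gives a fresh sample each run.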
Design 2 Sampling: Systematic: select a starting point, then choose every kth member. Convenience: use easy-to-get data. Stratified: subdivide the population into at least 2 subgroups with a common characteristic (e.g. gender or age) and draw samples from each. Cluster: divide the population into areas (clusters), randomly select some clusters, and draw the sample from the selected clusters. Sampling error: the difference between a sample result and the true population result; results from chance sample fluctuations. Nonsampling error: occurs when data are incorrectly collected, measured, recorded or analyzed.
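The sampling schemes above can be sketched as follows. The population is made up, and all function and field names are hypothetical. One assumption goes beyond the slide: the cluster sketch keeps every member of the randomly chosen clusters, which is a common textbook convention.

```python
import random

def systematic_sample(population, k, rng):
    """Random starting point among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group members by stratum, then draw a random sample from each group."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters; keep all members of the chosen ones (assumption)."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def mean_age(people):
    return sum(p["age"] for p in people) / len(people)

rng = random.Random(1)
# Hypothetical population of 200 people with made-up attributes.
population = [
    {"id": i, "gender": "F" if i % 2 else "M",
     "area": f"area{i % 5}", "age": 18 + (i * 7) % 60}
    for i in range(200)
]

systematic = systematic_sample(population, 20, rng)                    # 10 members
stratified = stratified_sample(population, lambda p: p["gender"], 5, rng)  # 5 per gender

clusters = {}
for person in population:
    clusters.setdefault(person["area"], []).append(person)
clustered = cluster_sample(clusters, 2, rng)                           # 2 of the 5 areas

# Sampling error: sample result minus the true population result.
sampling_error = mean_age(systematic) - mean_age(population)
print(len(systematic), len(stratified), len(clustered), round(sampling_error, 2))
```

The sampling error here comes purely from chance in which members were drawn; a nonsampling error would be something like recording the ages incorrectly in the first place.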