# LECTURE I: Statistics

Statistics has two different meanings:
* ‘summary numbers resulting from data analysis’
* ‘procedures used to organize and analyze facts numerically’

Descriptive statistics
Used when one wants to describe the data that have been collected.
* The mean age in the class is 24.
* The median home price in Orange County is \$305,000.
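The two kinds of summary quoted above (a mean and a median) can be computed with Python's standard `statistics` module. This is only a minimal sketch: the `ages` list is made-up data chosen for illustration, not the class data from the lecture.

```python
import statistics

# Hypothetical ages for a small class (illustrative data only).
ages = [21, 22, 23, 24, 30]

mean_age = statistics.mean(ages)      # arithmetic average of the values
median_age = statistics.median(ages)  # middle value once the data are sorted

print("mean:", mean_age)
print("median:", median_age)
```

The single older student pulls the mean above the median, which is one reason both numbers are often reported together.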

Inferential statistics
Used when one wants to generalize or make inferences from the data.
* Because the mean age of this class is 24, the mean age of all classes is 24.
* Based on data from insurance companies, males under 18 are more likely to get into accidents than females under 18.

The research process
1. Specify research goals
2. Review the literature
3. Formulate hypotheses
4. Measure and record
5. Analyze the data
6. Invite scrutiny

The W’s of data
1. WHO: each observation belongs to somebody or something. If observations come from individuals, those individuals are called ‘subjects’, ‘respondents’, ‘participants’ or ‘cases’. The unit of observation does not have to be an individual, though. If observations come from inanimate objects, those objects are usually called experimental units.

Example:
* You collected data on your patients’ ages. Each observation comes from a single patient, so each patient is a ‘case’ or a ‘participant’.
* You collected data from different web sites and recorded how many links each web site has. Here each observation comes from a different web site, so each web site is called an ‘experimental unit’.

The W’s of data
2. WHAT: The characteristics recorded about each individual are called variables.
3. WHERE
4. WHEN
5. HOW

Definitions
* Population – the collection of all elements in the study.
* Census – the collection of data from every element in a population.
* Sample – a sub-collection of elements drawn from the population.
* Random sample – a sample selected in such a way that each element in the population has an equal chance of being selected.
* Sampling frame – a list of the elements in the population.

SAMPLING: the process by which you select the sample from the population. The sample has to be representative; if it is not, it is a biased sample and the results cannot be generalized to the population.
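A small sketch of why representativeness matters. The population of ages below is hypothetical, and the "convenience" rule (taking the first ten people on a sorted list) is an assumption made only to produce an obviously biased sample.

```python
import random
import statistics

# Hypothetical population: 50 people aged 18 through 67.
population = list(range(18, 68))
pop_mean = statistics.mean(population)

# Biased sample: the first 10 people on the (age-sorted) list.
convenience = population[:10]
conv_mean = statistics.mean(convenience)

# Random sample of the same size; fixed seed so the run is repeatable.
rng = random.Random(1)
random_sample = rng.sample(population, 10)
rand_mean = statistics.mean(random_sample)

print("population mean:", pop_mean)
print("biased sample mean:", conv_mean)
print("random sample mean:", rand_mean)
```

The biased sample contains only the youngest people, so its mean falls far below the population mean no matter how carefully it is computed.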

How to get a representative sample?
* Random sampling
* Systematic sampling
* Stratified sampling
* Cluster sampling
* Convenience sampling

* Simple random sample – n subjects are selected in such a way that every possible sample of size n has the same chance of being chosen.
* Stratified sample – subdivide the population into at least 2 different subpopulations (strata) whose members share the same characteristic, then draw a sample from each group.
* Systematic sample – select every k-th element in the population.
* Cluster sample – divide the population into sections/clusters, randomly select a few of those sections, and then choose all of the members from the selected sections.
* Convenience sample – use what is readily available.
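The first four sampling schemes defined above can be sketched with Python's standard `random` module. The function names, the population of numbered IDs, the odd/even strata, and the blocks of 20 used as clusters are all illustrative assumptions. The random starting point in the systematic sample is also an addition; the definition above only says "every k-th element".

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k elements, then every k-th element."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep all of their members."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical element IDs 1..100

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, 10, rng)

# Hypothetical strata: odd-numbered and even-numbered IDs.
strata = {"odd": population[0::2], "even": population[1::2]}
stratified = stratified_sample(strata, 3, rng)

# Hypothetical clusters: five consecutive blocks of 20 IDs.
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
clustered = cluster_sample(clusters, 2, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

A convenience sample needs no code: it is whatever subset happens to be at hand, which is why it carries no guarantee of being representative.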

What is a variable?
A variable is anything that can take on different values or amounts across time or across subjects.
* IQ
* Size of a classroom
* Midterm scores
* Depression
* Motivation
* Drug dosage
* SES
* Teaching experience
* Race, ethnicity

Hypothesis
A statement that describes a relationship between at least two variables; these statements are based on either research or personal knowledge.
* The majority of Americans run red lights.
* Research claims that the mean body temperature of healthy adults is not 98.6 °F.

Depending on the context of the research, a variable can be one of the following:
* Dependent variable: the variable of main interest. It is observed but not manipulated. It is the variable on which the effect of other variables is investigated.
* Independent variable: the variable whose effect on the dependent variable is investigated.
* Control variable: any variable other than the above that can affect the independent–dependent relationship.

Example: Catholics are more likely to vote for Bush.
* DEPENDENT VARIABLE: voting preference.
* INDEPENDENT VARIABLE: religion (the way that you vote depends on your religion).
* Control variables:

Context-independent classification of variables:
* Qualitative variables: also known as ‘categorical variables’. They differ in kind rather than amount. There is no unit of measurement.
* Quantitative variables: the numbers assigned represent differing quantities of a characteristic.

Which are quantitative, which are qualitative?
* Gender
* IQ scores
* Age
* Ethnicity
* Number of years of experience
* Smoking status

A quantitative variable can be:
* Discrete [e.g. number of students in a class, number of kids one has]
* Continuous [e.g. time, weight, ability, achievement, IQ]

Summary of types of variables

SCALES OF MEASUREMENT
1. Nominal Scale: The simplest scale. It provides names or labels only. The numbers assigned to the labels are completely arbitrary, so the labels cannot be put in a meaningful order. There is no magnitude of measurement. Ex: Gender is measured on a nominal scale. 1 = Male, 2 = Female does not mean Female is bigger, better or stronger.

SCALES OF MEASUREMENT
2. Ordinal Scale: Numbers assigned on an ordinal scale tell us about the ranking of each observation. Therefore they can be put in a meaningful order. Ex: How difficult is this course?
1 = not at all difficult
2 = a bit difficult
3 = extremely difficult
BE CAREFUL! THE DIFFERENCE BETWEEN THE NUMBERS IS MEANINGLESS.

SCALES OF MEASUREMENT
3. Interval Scale: Numbers assigned on an interval scale are meaningful, and the differences among the numbers are also meaningful. There is no absolute zero, therefore the ratio of the numbers is meaningless. Ex: Temperature in degrees Fahrenheit is measured on an interval scale. The difference between 30 °F and 40 °F is equal to the difference between 50 °F and 60 °F. But 60 °F is not twice as hot as 30 °F.
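One way to see why ratios are meaningless on an interval scale is to re-express the same two temperatures on scales with a different zero point. The sketch below uses the standard Fahrenheit-to-Celsius and Celsius-to-Kelvin conversions; the function names are illustrative.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with an absolute zero)."""
    return fahrenheit_to_celsius(f) + 273.15

# Differences are meaningful: both gaps are 10 degrees Fahrenheit.
gap_low = 40 - 30
gap_high = 60 - 50

# Ratios depend on where the arbitrary zero sits.
ratio_fahrenheit = 60 / 30
ratio_celsius = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
ratio_kelvin = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print("equal gaps:", gap_low == gap_high)
print("ratio in Fahrenheit:", ratio_fahrenheit)
print("ratio in Celsius:", ratio_celsius)
print("ratio in Kelvin:", ratio_kelvin)
```

The same pair of temperatures gives a ratio of 2 in Fahrenheit, a negative ratio in Celsius, and a ratio only slightly above 1 in Kelvin, so "twice as hot" is an artifact of the scale's zero point rather than a property of the temperatures.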

SCALES OF MEASUREMENT
4. Ratio Scale: The numbers assigned on a ratio scale are meaningful, the differences are meaningful and the ratios are meaningful. There is an absolute zero point. Ex: Weight is measured on a ratio scale. 40 pounds is twice as heavy as 20 pounds.

More examples
* Ice cream flavors. - NOMINAL
* The speed of five runners in a 1-mile race, as measured by the runner’s order of finish: 1 for winner, 2 for second, etc. - ORDINAL
* The height above ground level of the floors in a particular 10-story apartment building, as measured by the number of each floor, assuming that the first floor is at ground level. - INTERVAL
* The number of people going to a particular movie theater each night as a measure of the theater’s gross income from ticket sales, assuming each ticket costs \$ - RATIO
* Population of all eighth grade students in the US, with X representing the region of the country in which the student lives: 1 = northeast, 2 = north central, 3 = south, and 4 = west. - NOMINAL
* Toss a coin 100 times, with X representing the number of heads obtained for each set of 100 tosses. - RATIO

Uses and abuses of statistics
* small samples – even a large sample can be biased
* precise numbers – a statistic that is very precise is not necessarily accurate
* guesstimates – estimating how many people were at the Million Man March
* distorted percentages
* partial pictures
* deliberate distortions
* loaded questions – since we already have enough nuclear warheads to blow up the world, should more federal money be spent on the defense budget?
* misleading graphs – see text!
* pictographs – often drawn distorted
* pollster pressure – answering to favor self-image
* bad samples

WHY DID WE LEARN THIS ANYWAY?!?
Each of your variables is measured on one scale or another, and the type of scale determines what kind of operations (or calculations) you can carry out with the data that represent your variables. If you measured something on a nominal scale, for instance, you cannot take its average!
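A short sketch of the point above, using Python's standard `statistics` module. All data are made up for illustration. It pairs each scale with a summary that respects it (mode for nominal, a median that is an actual observed category for ordinal, mean for ratio) and shows that the "average" of nominal codes changes when the arbitrary codes are swapped.

```python
import statistics

# Nominal: labels only, so the most frequent category (mode) is a valid summary.
flavors = ["vanilla", "chocolate", "vanilla", "mint"]
favorite = statistics.mode(flavors)

# Ordinal: 1 = not at all difficult, 2 = a bit difficult, 3 = extremely difficult.
# median_low always returns one of the observed ranks.
difficulty = [1, 2, 2, 3, 3, 3]
typical_rank = statistics.median_low(difficulty)

# Ratio: weights in pounds, so the mean is meaningful.
weights = [20.0, 40.0, 30.0]
mean_weight = statistics.mean(weights)

# The same four people coded two ways: 1 = Male, 2 = Female, then swapped.
coding_a = [1, 2, 2, 2]
coding_b = [2, 1, 1, 1]
mean_a = statistics.mean(coding_a)
mean_b = statistics.mean(coding_b)

print(favorite, typical_rank, mean_weight)
print("mean of nominal codes:", mean_a, "vs", mean_b)
```

The two codings describe exactly the same people, yet their means differ, which is why an average of nominal codes carries no information about the data.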