Presentation on theme: "LECTURE I Statistics Two different meanings of statistics:"— Presentation transcript:
1 LECTURE I Statistics
Two different meanings of statistics:
* ‘summary numbers resulting from data analysis’
* ‘procedures used to organize and analyze facts numerically’
2 Descriptive statistics
Used when one wants to describe the data that have been collected.
* The mean age in the class is 24.
* The median home price in Orange County is $305,000.
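Both statements on this slide summarize collected data with a single number. A minimal Python sketch of how such summaries are computed, assuming a made-up list of ages (the values are hypothetical, chosen only so that the mean matches the slide's example of 24):

```python
from statistics import mean, median

# Hypothetical ages of the students in one class (illustrative data only).
ages = [19, 21, 22, 24, 24, 25, 33]

# Descriptive statistics: numbers that describe the data we actually collected.
class_mean = mean(ages)      # sum of the ages divided by the number of students
class_median = median(ages)  # middle value once the ages are sorted

print("mean age:", class_mean)
print("median age:", class_median)
```

Neither number says anything about other classes; that step would be inference, not description.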
3 Inferential statistics
Used when one wants to generalize or make inferences based on the collected data.
* Because the mean age of this class is 24, the mean age of all classes is 24.
* Based on data from insurance companies, males under 18 are more likely to get into accidents than females under 18.
4 The research process
* Specify research goals
* Review the literature
* Formulate hypotheses
* Measure and record
* Analyze the data
* Invite scrutiny
5 The W’s of data
1. WHO: Each observation belongs to somebody or something. If observations come from individuals, those individuals are called ‘subjects’, ‘respondents’, ‘participants’ or ‘cases’. The unit of observation does not have to be an individual, though. If observations come from inanimate objects, those objects are usually called ‘experimental units’.
6 Example:
* You collected data on your patients’ ages. Each observation comes from a single patient, so each patient is a ‘case’ or a ‘participant’.
* You collected data from different web-sites and recorded how many links each web-site has. Here each observation comes from a different web-site, so each web-site is called an ‘experimental unit’.
7 The W’s of data
2. WHAT: The characteristics recorded about each individual are called variables.
3. WHERE
4. WHEN
5. HOW
8 Definitions
* Population – the collection of all elements in the study.
* Census – the collection of data from every element in a population.
* Sample – a sub-collection of elements drawn from the population.
* Random sample – a sample selected in such a way that each element in the population has an equal chance of being selected.
* Sampling frame – a list of the elements in the population.
9 SAMPLING
Sampling is the process by which you select the sample from the population. The sample has to be representative; if it is not, it is a biased sample and the results will not apply (be generalized) to the population.
10 How to get a representative sample?
* Random sampling
* Systematic sampling
* Stratified sampling
* Cluster sampling
* Convenience sampling
11 Simple random sample – n subjects are selected in such a way that every possible sample of size n has the same chance of being chosen.
Stratified sample – subdivide the population into at least 2 different subpopulations (strata) whose members share the same characteristic, then draw a sample from each subpopulation.
Systematic sample – select every k-th element in the population.
Cluster sample – divide the population into sections (clusters), randomly select a few of those sections, and then choose all of the members of the selected sections.
Convenience sample – use what is readily available.
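The four probability-based designs defined on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is a hypothetical list of element IDs 0 to 99, the function names are invented for this example, and the stratum and cluster rules (odd/even, blocks of ten) are arbitrary stand-ins for a real shared characteristic or section.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible subset of size n is equally likely to be chosen.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Pick a random starting point among the first k, then take every k-th element.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Group elements by a shared characteristic, then sample within each group.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    # Randomly choose whole clusters and keep every member of the chosen ones.
    clusters = {}
    for item in population:
        clusters.setdefault(cluster_of(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for c in chosen for item in clusters[c]]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
population = list(range(100))   # hypothetical element IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(population, lambda x: x % 2, 5, rng)   # strata: even / odd
clustered = cluster_sample(population, lambda x: x // 10, 2, rng)     # clusters: blocks of 10

print(len(srs), len(systematic), len(stratified), len(clustered))
```

A convenience sample has no comparable recipe: it is whatever elements happen to be at hand, which is why it gives no protection against bias.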
12 What is a variable?
A variable is anything that can take on different values or amounts across time or across subjects.
* IQ
* size of a classroom
* midterm scores
* depression
* motivation
* drug dosage
* SES
* Teaching experience
* Race, ethnicity
13 Hypothesis
A statement that describes a relationship between at least two variables; these statements are based on either research or personal knowledge.
* The majority of Americans run red lights.
* Research claims that the mean body temperature of healthy adults is not 98.6 °F.
14 Depending on the context of the research, a variable can be one of the following:
* Dependent variable – the variable of main interest. It is observed but not manipulated. It is the variable on which the effect of other variables is investigated.
* Independent variable – the variable whose effect on the dependent variable is investigated.
* Control variable – any variable other than the above that can have an effect on the independent–dependent relationship.
15 Example: Catholics are more likely to vote for Bush.
DEPENDENT VARIABLE: voting preference.
INDEPENDENT VARIABLE: religion.
(The way that you vote depends on your religion.)
Control variables:
16 Context-independent classification of variables:
* Qualitative variables: also known as ‘categorical variables’. They differ in kind rather than amount. There is no unit of measurement.
* Quantitative variables: the numbers assigned to quantitative variables represent differing quantities of a characteristic.
17 Which are quantitative, which are qualitative?
* Gender
* IQ scores
* Age
* Ethnicity
* Number of years of experience
* Smoking status
18 A quantitative variable can be:
* Discrete [e.g. number of students in a class, number of kids one has]
* Continuous [e.g. time, weight, ability, achievement, IQ]
20 SCALES OF MEASUREMENT
1. Nominal Scale: The simplest scale. It provides names or labels only. The numbers assigned to the labels are completely arbitrary, so the labels cannot be put in a meaningful order. There is no magnitude of measurement.
Ex: Gender is measured on a nominal scale. Coding 1 = Male, 2 = Female does not mean Female is bigger, better or stronger.
21 SCALES OF MEASUREMENT
2. Ordinal Scale: Numbers assigned on an ordinal scale tell us about the ranking of each observation. Therefore they can be put in a meaningful order.
Ex: How difficult is this course?
1 = not at all difficult
2 = a bit difficult
3 = extremely difficult
BE CAREFUL! THE DIFFERENCE BETWEEN THE NUMBERS IS MEANINGLESS.
22 SCALES OF MEASUREMENT
3. Interval Scale: Numbers assigned on an interval scale are meaningful, and the differences among the numbers are also meaningful. There is no absolute zero, therefore the ratio of the numbers is meaningless.
Ex: Temperature is measured on an interval scale. The difference between 30 F and 40 F is equal to the difference between 50 F and 60 F. But 60 F is not twice as hot as 30 F.
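One way to see why ratios are meaningless on an interval scale is to express the same temperatures in a different unit. The sketch below uses the standard Fahrenheit-to-Celsius conversion; the function and variable names are chosen for this illustration only. Equal differences stay equal after conversion, but the "twice as hot" ratio does not survive.

```python
import math

def fahrenheit_to_celsius(f):
    # Standard conversion: subtract 32, then scale by 5/9.
    return (f - 32) * 5 / 9

# Differences are meaningful: equal gaps in F remain equal gaps in C.
gap_low_c = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
gap_high_c = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
print("gaps equal in Celsius:", math.isclose(gap_low_c, gap_high_c))

# Ratios are not meaningful: the ratio depends on where the scale puts zero.
ratio_f = 60 / 30
ratio_c = fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30)
print("ratio in Fahrenheit:", ratio_f)   # 2.0
print("ratio in Celsius:", ratio_c)      # about -14, because 30 F is below 0 C
```

If 60 F really were "twice as hot" as 30 F, that statement would have to hold no matter which unit was used to record the same two temperatures.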
23 SCALES OF MEASUREMENT
4. Ratio Scale: The numbers assigned on a ratio scale are meaningful, the differences are meaningful and the ratios are meaningful. There is an absolute zero point.
Example: Weight is measured on a ratio scale. 40 pounds is twice as heavy as 20 pounds.
24 More examples
* Ice cream flavors - NOMINAL
* The speed of five runners in a 1-mile race, as measured by the runners’ order of finish: 1 for the winner, 2 for second, etc. - ORDINAL
* The height above ground level of the floors in a particular 10-story apartment building, as measured by the number of each floor, assuming that the first floor is at ground level. - INTERVAL
* The number of people going to a particular movie theater each night as a measure of the theater’s gross income from ticket sales, assuming each ticket costs $ - RATIO
* Population of all eighth grade students in the US, with X representing the region of the country in which the student lives: 1 = northeast, 2 = north central, 3 = south, and 4 = west. - NOMINAL
* Toss a coin 100 times; X represents the number of heads obtained for each set of 100 tosses. - RATIO
25 Uses and abuses of statistics
* small samples – even a large sample can be biased
* precise numbers – a statistic that is very precise is not necessarily accurate
* guesstimates – estimating how many people were at the Million Man March
* distorted percentages
* partial pictures
* deliberate distortions
* loaded questions – since we already have enough nuclear warheads to blow up the world, should more federal money be spent on the defense budget?
* misleading graphs – see text!
* pictographs – often drawn distorted
* pollster pressure – answering to favor self-image
* bad samples
26 WHY DID WE LEARN THIS ANYWAY?!?
Your variables are measured on one scale or another, and the type of scale determines what kind of operations (or calculations) you can carry out with the data that represent your variables. If you measured something on a nominal scale, you cannot take its average, for instance!!!
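A short Python sketch of this point, using made-up data and Python's standard `statistics` module. The pairing of mode with nominal data, median with ordinal data, and mean with ratio data is the usual textbook convention, stated here as an assumption rather than something the slides spell out. The last part shows why averaging nominal codes fails: the "average" changes when the arbitrary labels are swapped, even though the data are the same.

```python
from statistics import mean, median, mode

# Nominal: labels only, so the most frequent label (mode) is a sensible summary.
flavors = ["vanilla", "chocolate", "vanilla", "mint"]
most_common_flavor = mode(flavors)

# Ordinal: order is meaningful, so the middle rank (median) is a sensible summary.
# Codes: 1 = not at all difficult, 2 = a bit difficult, 3 = extremely difficult.
difficulty = [1, 3, 2, 2, 3, 2, 1]
typical_difficulty = median(difficulty)

# Ratio: differences and ratios are meaningful, so the mean is a sensible summary.
weights_lb = [20, 40, 35, 25]
average_weight = mean(weights_lb)

# Averaging nominal codes: the result depends on an arbitrary coding choice.
genders = ["M", "F", "F", "M", "F"]
coding_a = {"M": 1, "F": 2}
coding_b = {"M": 2, "F": 1}   # same labels, numbers swapped
mean_a = mean(coding_a[g] for g in genders)
mean_b = mean(coding_b[g] for g in genders)

print(most_common_flavor, typical_difficulty, average_weight)
print("same data, two 'averages':", mean_a, mean_b)
```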