Thomas Songer, PhD with acknowledgment to several slides provided by M Rahbar and Moataza Mahmoud Abdel Wahab Introduction to Research Methods In the Internet.


1 Thomas Songer, PhD with acknowledgment to several slides provided by M Rahbar and Moataza Mahmoud Abdel Wahab Introduction to Research Methods In the Internet Era Data Collection Descriptive Statistics Introduction to Biostatistics

2 Key Lecture Concepts: Distinguish between different strategies for obtaining a sample from a population. Distinguish between different forms of data collection. Identify key approaches to organizing and portraying your data. Understand the measures of central tendency and variability in your data.

3 Descriptive & Inferential Statistics Descriptive Statistics deal with the enumeration, organization and graphical representation of data from a sample Inferential Statistics deal with reaching conclusions from incomplete information, that is, generalizing from the specific sample Inferential statistics use available information in a sample to draw inferences about the population from which the sample was selected Rahbar

4 Epidemiology is… The study of disease and its treatment, control, and prevention in a population of individuals. Whole populations may be examined, but… More frequently, samples of the population may be examined. Samples that are studied must be representative of the population for the results to be generalized to the total population. Torrence 1997 4

5 Hypothetical Population [Figure: a hypothetical population and three samples drawn from it (Sample 1, Sample 2, Sample 3). Is each sample representative? Y / N]

6 Sampling Approaches. Convenience Sampling: select the most accessible and available subjects in the target population. Inexpensive and less time consuming, but the sample is nearly always non-representative of the target population. Random Sampling (Simple): select subjects at random from the target population. Requires identifying everyone in the target population first. Frequently provides a representative sample.

7 Sampling Approaches. Systematic Sampling: identify everyone in the target population, and select every xth person as a subject. Stratified Sampling: identify important sub-groups in your target population, then sample from these groups randomly or by convenience. Ensures that important sub-groups are included in the sample, but the sample may not be representative. More complex sampling.
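The sampling approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 people, the `sex` field used as the stratifying variable, and all function names are hypothetical, and the random starting point in the systematic sample is a common convention rather than something the slides specify.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Simple random sampling: every subject has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, step, seed=0):
    """Systematic sampling: every `step`-th subject, from a random start (assumption)."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(population, stratum_of, n_per_stratum, seed=0):
    """Stratified sampling: group subjects into sub-groups, then sample randomly in each."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical target population: 100 people, alternating sex.
people = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(100)]

srs = simple_random_sample(people, 10)
systematic = systematic_sample(people, step=10)
stratified = stratified_sample(people, lambda p: p["sex"], n_per_stratum=5)

print([p["id"] for p in srs])
print([p["id"] for p in systematic])
print([(p["id"], p["sex"]) for p in stratified])
```

Note that all three functions need the full list of the target population up front, which is the practical cost the slides mention for random and systematic sampling.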

8 Sampling Error: The discrepancy between the true population parameter and the sample statistic. Sampling error likely exists in most studies, but can be reduced by using larger sample sizes. Sampling error shrinks roughly in proportion to 1/√n, so quadrupling the sample size about halves it. Note that larger sample sizes also require more time and expense to obtain, and that large sample sizes do not eliminate sampling error.
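A small simulation can make the 1/√n relationship concrete. The sketch below assumes a Normal population with mean 0 and standard deviation 1 (a hypothetical choice, not from the slides) and compares the spread of simulated sample means with the usual standard-error formula σ/√n.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Theoretical standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def simulated_error(n, trials=1000, mu=0.0, sigma=1.0, seed=0):
    """Spread (standard deviation) of the sample mean over many repeated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)

# Each fourfold increase in n should roughly halve the sampling error.
for n in (25, 100, 400):
    print(n, round(standard_error(1.0, n), 3), round(simulated_error(n), 3))
```

The simulated values only approximate the formula, and they never reach zero, which matches the point that larger samples reduce sampling error without eliminating it.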

9 Research Process Research question Hypothesis Identify research design Data collection Presentation of data Data analysis Interpretation of data Polgar, Thomas 9

10 Types of Data Collection: Surveys/Questionnaires (self-report, interviewer-administered, proxy). Direct medical examination. Direct measurement (e.g., blood draws). Administrative records.

11 Understanding and Presenting Data 11

12 Types of Data 1. Categorical (e.g., Sex, Marital Status, income category) 2. Continuous (e.g., Age, income, weight, height, time to achieve an outcome) 3. Discrete (e.g., Number of Children in a family) 4. Binary or Dichotomous (e.g., response to any Yes or No type of question)

13 Brain Size and IQ. What types of data do these variables represent?

Gender   FSIQ   VIQ   PIQ   Weight      Height   MRI Count
Female   133    132   124   118         64.5     816932
Male     140    150   124   (missing)   72.5     1001121
Male     139    123   150   143         73.3     1038437
Male     133    129   128   172         68.8     965353
Female   137    132   134   147         65       951545
Female   99     90    110   146         69       928799
Female   138    136   131   138         64.5     991305
Female   92     90    98    175         66       854258
Male     89     93    84    134         66.3     904858
Male     133    114   147   172         68.8     955466
Female   132    129   124   118         64.5     833868

14 Scale of Data 1. Nominal: These data do not represent an amount or quantity (e.g., Marital Status, Sex) 2. Ordinal: These data represent an ordered series of relationships (e.g., level of education) 3. Interval: These data are measured on an interval scale having equal units but an arbitrary zero point (e.g., Temperature in Fahrenheit) 4. Ratio: These data have equal units and a true zero point, so one value can be meaningfully compared with another as a ratio (e.g., weight: 100 Kg is twice 50 Kg)

15 Organizing Data and Presentation Frequency Table Frequency Histogram Relative Frequency Histogram Frequency polygon Relative Frequency polygon Bar chart Pie chart Box plot 15

16 Frequency Table Generally, the first approach to examining your data. Identifies distribution of variables overall Identifies potential outliers –Investigate outliers as possible data entry errors –Investigate a sample of others for data entry errors 16

17 Frequency Table A research study has been conducted examining the number of children in the families living in a community. The following data has been collected based on a random sample of n = 30 families from the community. 2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6 Organize this data in a Frequency Table! 17

18 Frequency table for X = No. of Children

X    Count (Frequency)   Relative Freq.
0    2                   2/30 = 0.067
1    3                   3/30 = 0.100
2    5                   5/30 = 0.167
3    5                   5/30 = 0.167
4    6                   6/30 = 0.200
5    4                   4/30 = 0.133
6    2                   2/30 = 0.067
7    2                   2/30 = 0.067
8    1                   1/30 = 0.033
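As a cross-check, the frequency table for the number of children can be tallied programmatically. This is a minimal sketch using the 30 values listed on the previous slide; the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (from the slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7,
            3, 2, 4, 1, 0, 5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

def frequency_table(values):
    """Return (value, count, relative frequency) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

table = frequency_table(children)
for x, freq, rel in table:
    print(f"{x:>2}  {freq:>2}  {freq}/{len(children)} = {rel:.3f}")
```

Unusually large or small values stand out at the ends of such a table, which is why it is a natural first place to look for data entry errors.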

19 Frequency Table. Now, construct a similar frequency table for the age of patients with heart-related problems in a clinic. The following data have been collected from a random sample of n = 30 patients who went to the emergency room of the clinic for heart-related problems. The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.

20 Grouped frequency table for patient age

Age Groups   Frequency   Relative Frequency
32-36 yr     2           2/30 = 0.067
37-41 yr     3           3/30 = 0.100
42-46 yr     3           3/30 = 0.100
47-51 yr     3           3/30 = 0.100
52-56 yr     8           8/30 = 0.267
57-61 yr     3           3/30 = 0.100
62-66 yr     4           4/30 = 0.133
67-72 yr     4           4/30 = 0.133
Total        n = 30
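For continuous data such as age, the values are first grouped into intervals and then counted. The sketch below tallies the 30 listed ages directly, using the age groups from the slide as inclusive bins; the function name `grouped_frequencies` is hypothetical.

```python
# Ages of the n = 30 sampled patients (from the slide).
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

# Age groups from the slide, as inclusive (low, high) bounds.
bins = [(32, 36), (37, 41), (42, 46), (47, 51),
        (52, 56), (57, 61), (62, 66), (67, 72)]

def grouped_frequencies(values, bins):
    """Count how many values fall in each inclusive bin; return label, count, relative freq."""
    n = len(values)
    rows = []
    for low, high in bins:
        count = sum(low <= v <= high for v in values)
        rows.append((f"{low}-{high} yr", count, count / n))
    return rows

rows = grouped_frequencies(ages, bins)
for label, count, rel in rows:
    print(f"{label:<9} {count:>2}  {rel:.3f}")
print("Total", sum(count for _, count, _ in rows))
```

Recomputing the counts from the raw values is a quick way to verify a hand-built table. Note that the last group spans six years rather than five so that the oldest patient (72) is included.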

21 Frequency Polygon Use to identify the distribution of your data 21

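22 Table 1 in a paper: Describe your study population in a frequency table. [Table template: a table title; one row per variable giving the name of the variable (units of variable), with its categories listed beneath and a Total row; columns for Frequency (n), %, and Mean (SD)]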

23 Measures of Central Tendency Where is the heart of distribution? 1. Mean 2. Median 3. Mode 23

24 Sample Mean: The arithmetic mean (or, simply, mean) is computed by summing all the observations in the sample and dividing the sum by the number of observations. For a sample of five household incomes, 6,000, 10,000, 10,000, 14,000, 50,000, the sample mean is (6,000 + 10,000 + 10,000 + 14,000 + 50,000) / 5 = 90,000 / 5 = 18,000.

25 Median In a list ranked from smallest measurement to the highest, the median is the middle value In our example of five household incomes, first we rank the measurements 6,000 10,000 10,000 14,000 50,000 Sample Median is 10,000 25

26 Mode In nominal data: The value which occurs with the greatest frequency 26
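The three measures of central tendency can be computed for the five household incomes with Python's standard `statistics` module. This is just a quick check of the example, not part of the original slides.

```python
import statistics

# The five household incomes used in the example.
incomes = [6000, 10000, 10000, 14000, 50000]

mean_income = statistics.mean(incomes)      # sum of observations / number of observations
median_income = statistics.median(incomes)  # middle value of the ranked list
mode_income = statistics.mode(incomes)      # most frequently occurring value

print("Mean:", mean_income)
print("Median:", median_income)
print("Mode:", mode_income)
```

The single large income of 50,000 pulls the mean well above the median, which illustrates why the median is often preferred for skewed data such as income.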

27 Measures of non-central locations Quartiles Quintiles Percentiles 27

28 Measures of Dispersion or Variability Range (present highest and lowest value in a distribution. The difference between these values is the range) Variance Standard deviation (the square root of the variance) 28

29 Sample Variance: s² = Σ (xᵢ − x̄)² / (n − 1), where x̄ is the sample mean and n is the number of observations. s = standard deviation (square root of the variance).

30 Calculation of Variance and Standard deviation 30

31 Mean and Standard deviation (SD)

Data: 7, 7, 7, 7, 7, 7     Mean = 7, SD = 0
Data: 7, 8, 7, 7, 7, 6     Mean = 7, SD = 0.63
Data: 3, 2, 7, 8, 13, 9    Mean = 7, SD = 4.05
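The three small data sets on this slide can be checked with a short script. The sketch assumes the sample (n − 1) form of the variance, which is what `statistics.variance` and `statistics.stdev` compute; the helper name `describe` is hypothetical.

```python
import statistics

# The three data sets from the slide; all have mean 7 but different spread.
datasets = {
    "no spread": [7, 7, 7, 7, 7, 7],
    "small spread": [7, 8, 7, 7, 7, 6],
    "large spread": [3, 2, 7, 8, 13, 9],
}

def describe(values):
    """Return mean, range, sample variance (n - 1 divisor) and standard deviation."""
    return {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": statistics.stdev(values),
    }

summaries = {name: describe(values) for name, values in datasets.items()}
for name, s in summaries.items():
    print(f"{name}: mean={s['mean']}, range={s['range']}, "
          f"variance={s['variance']:.2f}, sd={s['sd']:.2f}")
```

Identical means with very different standard deviations show why a measure of variability should be reported alongside any measure of central tendency.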

32 Empirical Rule: For a Normal distribution, approximately a) 68% of the measurements fall within one standard deviation around the mean, b) 95% of the measurements fall within two standard deviations around the mean, c) 99.7% of the measurements fall within three standard deviations around the mean.

33 Suppose the reaction time of a particular drug has a Normal distribution with a mean of 10 minutes and a standard deviation of 2 minutes. Approximately, a) 68% of the subjects taking the drug will have a reaction time between 8 and 12 minutes, b) 95% of the subjects taking the drug will have a reaction time between 6 and 14 minutes, c) 99.7% of the subjects taking the drug will have a reaction time between 4 and 16 minutes.
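The reaction-time example can be verified against the Normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

# Reaction time: Normal with mean 10 minutes and standard deviation 2 minutes.
reaction = NormalDist(mu=10, sigma=2)

def coverage(dist, k):
    """Interval mean ± k standard deviations and the probability of falling inside it."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high, prob = coverage(reaction, k)
    print(f"within {k} SD: {low:g} to {high:g} minutes, probability {prob:.4f}")
```

The exact probabilities are slightly different from the rounded 68%, 95% and 99.7% figures, which is why the rule is stated as approximate.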

