Intro to Research Methods


1 Intro to Research Methods
SPH-X590, Summer 2015. Sampling, Random Selection, Random Assignment

2 Presentation Outline Data Collection Methods: Participants & Sampling
Homogeneity & Heterogeneity; Validity; Terminology; Inference; Sample Sizes; Sampling Distribution; Study Designs; Considerations for Sampling; Experimental, Quasi & Non; Random Selection vs. Randomization; Error; Probability Samples and Non-Probability Samples

3 Data Collection Methods: Homogeneity & Sampling
Sampling is a bit philosophical in a way: the basic premise is that we are not that different from one another. Sampling asks how many individuals we need to observe in order to say something about the group to which they belong. For example, if every member of a group or a population is identical, then that population is homogeneous. The characteristics of any one individual in the population are the same as the characteristics of any other member of the population. There is little or no variation among individuals.

4 Data Collection Methods: Homogeneity & Sampling
If the humanoid population of Mars is homogeneous, how many aliens would we need to abduct from Mars in order to understand what Martians are like?

5 Data Collection Methods: Heterogeneity & Sampling
When members of a group or a population are different from one another, the population is heterogeneous: there is a wide range of characteristics and significant variation among individuals. How does this change our alien abduction scheme to understand Martians? To describe a heterogeneous population, we need to observe multiple individuals so that we capture the full range/ variety of important characteristics that may exist.

6 Data Collection Methods: Validity & Sampling
In the ideal study design for our research on Martians, we would randomly select the aliens (i.e., subjects) from the larger population of Martians (i.e., the population of interest). Randomly picking Martians supports External Validity. Then we would randomly assign the individual Martians (i.e., subjects) to a Control or an Experimental condition. Randomly assigning the Martians supports Internal Validity. Rarely are researchers able to study people (much less Martians) in an Experimental design using both random selection and random assignment.

7 Data Collection Methods: Validity & Sampling
In most scientific research, there is more emphasis on Internal Validity than on External Validity. Researchers are more concerned with whether the Independent Variable is truly “causing” a change in the Dependent Variable than with the generalizability of the effect. Replicating the study with other populations is a way to establish External Validity (i.e., Reproducibility). Random Assignment ensures, on average, that the Independent Variable is the only variable that differs systematically between the conditions being compared in the study: it controls for extraneous Confounding Variables.

8 Data Collection Methods: Sampling Terminology
Sample Element: a single case/unit selected from a population and measured; the Unit of Analysis. For example, a person, thing, specific time, etc. For example, a Martian. Sample Universe: the theoretical aggregation of all possible Sample Elements, unspecified by time and space. For example, Mars/ all Martians. Theoretical Population: the theoretical aggregation of specified elements, defined by time and space. For example, the 2020 Urban Population of Martians.

9 Data Collection Methods: Conceptual Model of Sampling
Sample Universe → Theoretical Population → Sample Population → Sample Frame → Sample → Sample Elements

10 Data Collection Methods: Sampling Terminology
Sample, Target, or Study Population: the aggregation of the population from which the sample is actually drawn. For example, Martians living in the Capital City of Mars in 2020. Sampling Frame: a specific list that closely approximates all elements in the population; the researcher selects units from this list to create the study sample. For example, the 2020 Phonebook of Martians living in the Capital City of Mars. Sample: a set of cases that is drawn from a larger pool and used to make generalizations about the population.

11 Data Collection Methods: Samples & Populations
The Theoretical Population (To what population do you want to generalize?): the 2020 Martian Population. The Study Population (What population can you access?): Martians living in the Capital City of Mars in 2020. The Sampling Frame (How are you going to access your sample?): Martians listed in the 2020 Mars Capital City Yellow Pages. The Sample (Who is in your study?): a Group of Martians.

12 Data Collection Methods & Analysis: Variation & Sampling
A sample of individuals from the population must have essentially the same variation as the population of individuals for any information from the sample to be useful. The more different individuals in a population are from one another, the greater the chance that the sample does not sufficiently describe the population. The more heterogeneous the population, the more likely that the inferences we make about the population from a given sample are wrong. The more heterogeneous the population, the larger the sample needs to be to adequately describe the population. The more observations, the more accurate the inferences.

13 Data Collection Methods & Analysis: Sampling Terminology
Parameter: any characteristic of a population that is true/ known. Known parameters come from a Census. A Census is when all members of the Study Population are included in the study. For example, % of males or % of females in the 2010 US population. For example, % of males or % of females in the 2020 Martian population. Estimate: any characteristic assessed from a sample. Estimates refer to Samples. For example, of the 100 Martians sampled for our study, 45% are male and 55% are female. Sampling Error: how far the sample estimate for a characteristic is from the population parameter, i.e., how well the observed value approximates the true/known value of the population. Sampling Error is a result of not being able to study all the members of a population, but only a sample of individuals from that population. The size of the Sampling Error indicates precision. For example, news polls often report the results of a poll followed by “+ or –” 5% or 3 points. Standard Error (SE): a measure of Sampling Error. SE is an inverse function of Sample Size (it shrinks in proportion to the square root of the Sample Size): think back to Pre-Calculus. As Sample Size increases, SE decreases: the sample is more precise. So, the smallest Standard Error (SE) gives the greatest precision: if uncertain, choose to increase the sample size.
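
The inverse relationship between Standard Error and Sample Size can be sketched in a few lines of Python. This is only an illustration: it assumes the standard error of a sample mean (standard deviation divided by the square root of n), and the standard deviation of 10 years and the function name are hypothetical, not taken from the slides.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Hypothetical standard deviation of 10 years of age (an assumption).
# Each time the sample size is quadrupled, the standard error is halved.
for n in (100, 400, 1600):
    print(n, standard_error_of_mean(10.0, n))
```

Because SE falls with the square root of n, cutting the error in half costs four times as many Martians.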

14 Data Collection Method & Analysis: Sampling & Causal Inference
[Diagram: Population → Sampling Frame → Sampling Process → The Sample. Your expectations of the data (Hypotheses) are compared with the actual observations in the data to support Causal Inference.] Sampling allows researchers to use the data to say something (make an inference), with confidence, about a whole (population) based on the study of only a few members (sample). I can infer something about all Martians based on information I collect from a small number of Martians.

15 Data Collection Methods & Analysis: Sample Sizes
Question: How large should the Sample be? Answer: It all depends. Sample Size is a matter of: How much sampling error can be tolerated? (Level of Precision.) How big or small is the Population? (Sample Size is very important with small populations.) How different are individuals from each other within the population with regard to the characteristic of interest? (Within Group Variation.) How small is the smallest subgroup within the sample for which estimates are needed? (Sample Size must be big enough to properly estimate or make inferences about the smallest subgroup in the sample.)

16 Data Collection Methods & Analysis: Populations, Samples, & Statistics
Variable: Date of Interview (DOI): MM/DD/YYYY. Responses: What is your Birthdate (DOB)? MM/DD/YYYY. Operational Definition: AGE = DOI - DOB. Statistics: the Sample's Average Age, in years. Population Parameter: the Study Population's Average Age, in years.
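
The operational definition AGE = DOI - DOB can be sketched as follows. The slide does not say how the difference is expressed, so this sketch assumes age in completed years; the function names and the example dates are hypothetical.

```python
from datetime import date, datetime


def parse_mdy(text: str) -> date:
    """Parse a date written as MM/DD/YYYY."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def age_in_years(doi: str, dob: str) -> int:
    """AGE = DOI - DOB, expressed in completed years (an assumption)."""
    interview, birth = parse_mdy(doi), parse_mdy(dob)
    if birth > interview:
        raise ValueError("birthdate is after the interview date")
    years = interview.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the interview year.
    if (interview.month, interview.day) < (birth.month, birth.day):
        years -= 1
    return years


def average_age(records: list[tuple[str, str]]) -> float:
    """Sample statistic: the mean age over (DOI, DOB) pairs."""
    ages = [age_in_years(doi, dob) for doi, dob in records]
    return sum(ages) / len(ages)


# Hypothetical responses from two sampled Martians.
print(average_age([("06/15/2015", "06/16/1990"), ("06/16/2015", "06/16/1990")]))
```

The average computed this way is a sample statistic; the corresponding population parameter would require the same calculation over every member of the Study Population.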

17 Data Collection Methods & Analysis: Sample Statistic
The Standard Error (SE) is highest for a population that has a 50:50 distribution on the characteristic/ variable of interest. There is NO SE (SE = 0) for a characteristic with a 100% distribution across all the Sample Elements. SE refers to variables: variables by definition have more than 1 value. Notation: Small letters (minuscule), whether from the English or Greek alphabet, refer to samples. s (or se) = standard error; n = sample size; p = % having a particular characteristic (1-q); q = % not having a particular characteristic (1-p). Large letters, whether from the English or Greek alphabet, refer to the population. For example, N = Population Size. Formula: $s = \sqrt{\frac{p \cdot q}{n}}$. For p = .5 and n = 100: $s = \sqrt{\frac{.5 \cdot .5}{100}} = .05$ or 5%. For p = .9 and n = 100: $s = \sqrt{\frac{.9 \cdot .1}{100}} = .03$ or 3%. Show example on overhead to illustrate the difference in calculation for 95% confidence versus 68% confidence.
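
The standard error of a proportion, and the 68% versus 95% comparison mentioned for the overhead, can be sketched as below. The multipliers are an assumption based on the normal approximation (about 1 SE for 68% confidence and about 1.96 SE for 95%); the function names are hypothetical.

```python
import math


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * q / n), with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("sample size must be positive")
    q = 1.0 - p
    return math.sqrt(p * q / n)


def margin_of_error(p: float, n: int, z: float) -> float:
    """Margin of error as z standard errors (normal approximation)."""
    return z * se_proportion(p, n)


# The two cases from the slide, both with n = 100.
print(round(se_proportion(0.5, 100), 3))
print(round(se_proportion(0.9, 100), 3))

# Roughly 68% confidence uses 1 SE; roughly 95% confidence uses 1.96 SE.
print(round(margin_of_error(0.5, 100, 1.0), 3))
print(round(margin_of_error(0.5, 100, 1.96), 3))
```

With p = 1.0 the function returns 0, which matches the point that a characteristic shared by every Sample Element has no standard error.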

18 Data Collection Methods & Analysis: Sample Sizes & Sampling Errors
How comfortable are you with how wrong or right the result of your analysis is? Would you invest your money based on an answer that could be 10% higher or lower?

19 Data Collection Methods & Analysis: The Sampling Distribution
[Figure: Martian Sample 1, Martian Sample 2, and Martian Sample 3, each with its own average age in years.] The Sampling Distribution is the distribution of a statistic across an infinite number of samples. The average ages of samples of 100 Martians, across a limitless number of samples pulled from the Study Population of Martians, have a characteristic distribution/ curve centered on the average of the averages.
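
A sampling distribution can be approximated by simulation. The sketch below is illustrative only: it uses a finite number of samples rather than an infinite one, and the population of 10,000 Martian ages (uniform between 18 and 90), the seeds, and the function name are all made-up assumptions.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, num_samples, seed=0):
    """Draw many simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical Study Population: 10,000 Martian ages (made-up values).
pop_rng = random.Random(42)
population = [pop_rng.randint(18, 90) for _ in range(10_000)]

# 500 samples of 100 Martians each.
means = sampling_distribution_of_mean(population, sample_size=100, num_samples=500)

print("population mean:      ", statistics.mean(population))
print("average of averages:  ", statistics.mean(means))
print("spread of the averages:", statistics.stdev(means))
```

The average of the sample averages lands close to the population mean, and the sample averages vary far less than individual ages do.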

20 Data Collection Methods & Analysis: Designs
Experimental Design: The researcher randomly assigns subjects to treatments/ conditions (= variables). Causal Estimation is possible. Causal Inferences are stronger. Random Sampling from the population is less important. Usually conducted in a Laboratory. Quasi-Experimental or Observational Design: e.g., survey research, polls, etc. Subjects are not randomly assigned to variables. Random Sampling is important. Selection Bias is a concern. Causal Inference is compromised.

21 Data Collection Methods & Analysis: Error
Ideally, sample statistics should be as close as possible to population parameters, but variability has many causes. Sampling Error (in probability samples): the difference between a sample statistic and its population parameter. Random Sampling allows us to estimate the typical size of the Sampling Error. Non-Sampling Error: error from other sources; it can be systematic bias and is difficult to estimate. Examples of non-sampling error include under-coverage, nonresponse, question wording/ response bias, and question order.

22 Data Collection Methods & Analysis: Natural Experiment Designs
A type of Quasi-Experimental or Observational Study (esp. surveys) in which respondents’ values on a causal variable are plausibly random. Some consider it an Experimental Design. The researcher could not or did not manipulate the Experimental variable. Examples: Powerball lottery; births in the last half of 2014; city councils headed by women; parity 3 birth after same-sex or opposite-sex births.

23 Data Collection Methods & Analysis: Random Selection or Random Assignment
Random Selection and Random Assignment are commonly confused or used interchangeably, but the terms refer to entirely different processes. Random Selection refers to how sample members are selected from the population to participate in the study. Random Selection relies on some form of Random Sampling. Random Sampling is a probability sampling method: it relies on the laws of probability to select a sample. Sample Statistics from a random sample allow for inference to/ estimation of the population parameters: the basis of statistical tests of significance. Random Assignment is a component of Experimental Design. Study participants have an equal chance of ending up in the Experimental group/ condition or the Control group/ condition through a random procedure. Random Assignment is also known as Randomization.

24 Data Collection Methods & Analysis: Probability, Probability Samples, & Representativeness
Your sample must be representative of the population in terms of the variables of interest. A sample will tend to be representative of the population from which it comes if each individual member of the population has an equal chance (Probability) of being picked. Probability Samples are more accurate than Non-Probability Samples: conscious and unconscious Sampling Bias is removed. Probability Samples allow researchers to estimate the accuracy of the sample. Probability Samples permit the estimation of population parameters.

25 Data Collection Methods & Analysis: Sampling, Samples & Probability
Sampling is the process of selecting observations (a sample) to provide an adequate description of, and robust inferences about, the population from which the sample comes. Your sample must be representative of the population. There are 2 types of Sampling: Non-Probability Sampling and Probability Sampling.

26 Probability Sampling: Simple Random Sampling (SRS)
The basic sampling method on which most others are based. Method: A sample of size ‘n’ is drawn from a population of size ‘N’ in such a way that every element in the population has the same chance of being selected. Take a number of samples to create a sampling distribution. Typically conducted “without replacement”. What are some ways of conducting an SRS? Random numbers table, drawing out of a hat, random timer, etc. Not usually the most efficient, but can be the most accurate! Time & money can become an issue. What if you only have enough time and money to draw one sample? The researcher creates a sampling frame, numbers the elements in the frame, and uses a list of random numbers to select the elements to be included in the sample (using random number tables). Talk about marble example from the book: One sample of 100 marbles may give you a good estimate of the proportion of red versus white marbles. However, this doesn’t mean that the sample reflects the actual proportion. A number of samples of 100 marbles can be taken to see what patterns of results emerge by examining a distribution of these samples. Central limit theorem: as the number of different random samples in a sampling distribution increases toward infinity, the pattern of samples in the distribution becomes more predictable and approaches a normal curve.

27 Probability Sampling: A Simple Random Sample
How To: (1) List all the subjects in a population. (2) Assign a number to each subject. (3) Pick numbers from a list of random numbers. (4) Select the subjects who correspond to the random numbers to be in the sample. Pros: Works well for people in households, or students in classes, for example. Cons: The larger the population, the more Cost and Feasibility become problematic.
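
The four How-To steps can be sketched in Python, with a pseudo-random number generator standing in for a random numbers table. The frame of 500 named Martians, the seed, and the function name are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Simple random sample of n elements from a sampling frame, without replacement."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    # Steps 1-2: list the subjects and assign each one a number (1..N).
    numbered = dict(enumerate(frame, start=1))
    # Step 3: pick n distinct random numbers between 1 and N.
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
    # Step 4: select the subjects who correspond to those numbers.
    return [numbered[number] for number in chosen_numbers]


# Hypothetical sampling frame of 500 Martians.
frame = [f"Martian-{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 25, seed=1)
print(sample[:5])
```

Every Martian in the frame has the same chance of being picked, and no Martian can be picked twice.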

28 Probability Sampling: Systematic & Stratified Sampling
Systematic Random Sample: Pick a random case from the first k cases of the sampling frame, then select every kth case after that one. For example, with k = 5, you randomly pick case 2 from the first 5 cases; then pick every 5th case afterwards (7, 12, 17, ...) until you have the sample size you need. Stratified Random Sample: Divide a population into groups (or strata), then select a simple random sample from each stratum. For example, dividing the population of Martians into different groups based on the characteristics of their eyes: No Eyes, One Eye, Two Eyes or Three Eyes. Then, randomly picking Martians from each of those groups until I have the sample size to represent the group (stratum) and the population.

29 Probability Sampling: Systematic Random Sampling
Method: Starting from a random point on a sampling frame, every kth element in the frame is selected at equal intervals (the Sampling Interval). The Sampling Interval tells the researcher how to select elements from the frame (1 in ‘k’ elements is selected) and depends on the sample size needed. Example: You have a sampling frame (list) of 10,000 people and you need a sample of 1,000 for your study. What is the sampling interval that you should use? Every 10th person listed (1 in 10 persons). In practice it usually provides results comparable to SRS, but is more efficient. Caution: Keep in mind the nature of your frame for systematic sampling to work, and beware of periodicity! Periodicity is a cyclical pattern in the ordering of the frame that coincides with the sampling interval. Example: in a study of husbands and wives where the list alternates husband, wife, you would need to make sure the interval did not always fall on males or always on females.
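
The 10,000-person example can be sketched as follows. This assumes the interval is computed as the frame size divided by the sample size (rounded down) and that the random start falls within the first k elements; the function name and seed are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample: random start within the first k cases, then every kth case."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n            # sampling interval (1 in k elements is selected)
    rng = random.Random(seed)
    start = rng.randrange(k)       # 0-based position within the first k elements
    return [frame[start + i * k] for i in range(n)]


# Frame of 10,000 numbered people and a needed sample of 1,000: interval k = 10.
frame = list(range(1, 10_001))
sample = systematic_sample(frame, 1_000, seed=3)
print(sample[:5])
```

If the frame were ordered husband, wife, husband, wife, this even interval would select only one of the two, which is the periodicity problem in miniature.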

30 Probability Sampling: Stratified Random Sampling
Method: Divide the population by certain characteristics into homogeneous subgroups (strata). Elements within each stratum are homogeneous, but are heterogeneous across strata. A simple random or a systematic sample is taken from each stratum in proportion to that stratum's share of the population. When is it appropriate? When a stratum of interest is a small percentage of a population and random processes could miss the stratum by chance. When enough is known about the population that it can be easily broken into subgroups or strata.
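
A proportionate stratified sample can be sketched as below, using the Martian eye-count strata as an example. The stratum sizes, the seed, and the function name are hypothetical, and the allocation simply rounds each stratum's share, so the total can differ slightly from n when the shares are not whole numbers.

```python
import random


def proportionate_stratified_sample(strata, n, seed=None):
    """Take a simple random sample from each stratum, sized by its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Allocation by rounding: a very small stratum could round down to 0.
        share = round(n * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample


# Hypothetical Martian population of 1,000, stratified by number of eyes.
strata = {
    "No Eyes": [f"no-eyes-{i}" for i in range(100)],
    "One Eye": [f"one-eye-{i}" for i in range(200)],
    "Two Eyes": [f"two-eyes-{i}" for i in range(300)],
    "Three Eyes": [f"three-eyes-{i}" for i in range(400)],
}
picked = proportionate_stratified_sample(strata, 100, seed=7)
print({name: len(members) for name, members in picked.items()})
```

When a stratum of interest is very small, a researcher may prefer to set a minimum per stratum instead of relying on pure proportional allocation; that variation is not shown here.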

31 Probability Sampling: Cluster & Multistage Sampling
Cluster Sampling: Divide the population into groups called clusters or primary sampling units (PSUs); take a random sample of the clusters. For example, if I wanted to study Martian school children, rather than randomly selecting from a list of all Martian school students, I could create a list of all Martian schools and randomly choose the schools to be in my sample. Multistage Sampling: Several levels of nested clusters, often including both stratified and cluster sampling techniques. For example, I may use a cluster sampling method to choose schools for my study of Martian school children, and then select a simple random sample of students within each school. Alternatively, I could randomly select Martian schools to create a sampling frame of Martian schools, then create a sampling frame of the Martian students attending those schools. I could draw a systematic or a simple random sample of the Martian students from the list of Martian students until the needed sample size is reached: schools would be selected based on the probability that a student in that school is selected.
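
The first multistage example (sample schools, then sample students within each chosen school) can be sketched as follows. The school names, enrollment of 30 students per school, seed, and function name are hypothetical, and each school is given an equal chance of selection for simplicity.

```python
import random


def two_stage_sample(schools, num_schools, students_per_school, seed=None):
    """Stage 1: randomly sample clusters (schools). Stage 2: SRS of students in each."""
    rng = random.Random(seed)
    # Stage 1: the sampling frame is the list of schools, not the list of students.
    chosen_schools = rng.sample(sorted(schools), num_schools)
    # Stage 2: a simple random sample of students within each chosen school.
    return {
        school: rng.sample(
            schools[school], min(students_per_school, len(schools[school]))
        )
        for school in chosen_schools
    }


# Hypothetical frame: 10 Martian schools with 30 students each.
schools = {
    f"School-{s}": [f"School-{s}-student-{i}" for i in range(30)]
    for s in range(1, 11)
}
result = two_stage_sample(schools, num_schools=3, students_per_school=5, seed=11)
for school, students in result.items():
    print(school, students)
```

Only a list of schools and the rosters of the three chosen schools are needed, which is the efficiency gain over listing every Martian student.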

32 Probability Sampling: Cluster Sampling
Some populations are spread out, for example, over a state or country. Elements occur in clumps: Primary Sampling Units (PSUs), for example, towns, districts, schools. Sample Elements are hard to reach and identify. Trade Accuracy for Efficiency. Convenience, Effort, Time and Resources are important considerations. You cannot assume that any one cluster is better or worse than another cluster.

33 Probability Sampling: Multistage Sampling
Used when: Researchers lack a good sampling frame for a dispersed population. The cost to reach a sample element is very high. Each cluster is internally heterogeneous and similar to all the other clusters: Between vs. Within Variability. Usually less expensive than Simple Random Sampling, but not as accurate. Each stage in Cluster Sampling introduces Sampling Error: the more stages, the greater the likelihood of error. Can combine Simple Random Sampling, Systematic Random Sampling, and Stratified Sampling with Cluster Sampling!

34 Data Collection Methods and Analysis: Random Selection or Random Assignment
Why Random Selection? Each Sample Element (i.e., individual) has an equal probability of being picked for a study: the selection process is unpredictable/ has no pattern. Reduces researcher bias. The researcher can calculate the probability of certain outcomes because of the Sampling Distribution. There are several types of probability samples. Why Random Assignment? Groups created by Random Assignment are most likely to be equivalent to one another before the treatment. Random Assignment, as a random process, allows researchers to calculate how likely it is that an observed difference between the groups is due to chance.

35 Data Collection Methods and Analysis: Randomization/ Random Assignment
A general term for the techniques for ensuring that any member of a population has an equal chance of appearing in a sample. Each participant in the study has an equal and unbiased chance of being assigned to any of the conditions being compared in the experiment. Parallel groups are equated, which controls for both known and unknown extraneous/ confounding variables. Sample statistics from randomized samples (i.e., samples formed from randomization/ random assignment) will on average have the same values as the population parameters.
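
Random Assignment of already-recruited participants to two conditions can be sketched as a shuffle followed by a split. The participant labels, the seed, and the function name are hypothetical; with an odd number of participants the control group gets the extra person.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants, then split them into two conditions."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical study sample of 20 Martians.
participants = [f"Martian-{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=5)
print(groups["experimental"])
print(groups["control"])
```

This step says nothing about how the 20 Martians were chosen in the first place: that is Random Selection, a separate process.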

36 Non-Probability Sampling:
The researcher cannot specify the probability that a given sample will be selected. Examples: Snowball Sampling or Respondent Driven Sampling. Why would you use a Non-Probability Sample? It is often inexpensive. It is great for “Hard to Reach” Populations: those that are difficult to sample, require great trust, or call for lengthy unstructured interviews. Some variables and their relationships are treated as universal, which makes the sampling method less relevant. In much Life & Medical Science research, human physiology and anatomy are assumed to be so similar that we don’t need many subjects and don’t need to worry about generalizing to the population of human beings.

