1
Appraisal, Extraction and Pooling of Quantitative Data for Reviews of Effects - from experimental, observational and descriptive studies
Welcome all participants and introduce yourself. Reiterate that whilst this Module deals with quantitative data that we may encounter to inform a range of research questions, the focus of the next two days is on using studies that provide quantitative data that can inform the effectiveness of an intervention or therapy.
2
Introduction Recap of Introductory module
Developing a question (PICO); inclusion criteria; search strategy; selecting studies for retrieval. This Module considers how to appraise, extract and synthesize evidence from experimental, observational and descriptive studies.
Point 1/Top (reduced) section of schematic: Provide a summary of the contents of the Day 1 Module - Introduction to Evidence Based Health Care and the Systematic Review of Evidence: an introduction to JBI and the systematic review process, and in particular those parts of the process which are 'generic' or 'standard' in any systematic review, independent of the evidence being sought and synthesized. We examined the process of developing an appropriate question for systematic review, one of clinical relevance, using the PICO mnemonic; how the review question dictates the inclusion and exclusion criteria for the review; and how it forms the basis for the keywords and terms used in the search strategy and for which databases/sources the search will be directed towards. The process of selecting studies for retrieval and the roles/possibilities of your secondary reviewer were also discussed. We also began incorporating all of these activities/steps into your review protocol, which you have begun to develop.
Point 2/Bottom section of schematic: It is at this first point, the critical appraisal of your retrieved studies, that the remainder of the Modules in the CSR training program deviate. In this Module, we will focus primarily on evidence of effectiveness, that is, evidence which aims to establish causal relationships between variables, or more simply cause = effect, or intervention = outcome. The randomized controlled trial (RCT) is the preferred experimental design for establishing this type of evidence. This type of evidence is often associated with the term "quantitative" because it relies on numbers to describe and make inferences about these relationships.
Over the next 2 days, we will also consider evidence that, rather than establishing causality, aims to investigate associations - that is, observational or epidemiological research. Often, for financial or ethical reasons, experimentation may be impossible or undesirable. The reality of your systematic review is that you will undoubtedly encounter literature or evidence which has used observational study designs, such as a cohort study or case control study. To be able to critically appraise this type of evidence, you will need to understand study design, so we will spend most of the morning and the time after lunch discussing aspects of study design inherent to the evidence or knowledge the primary research question is trying to establish. After that we will move on to data extraction, or "pulling out" the information and values of interest for our review question, and we will finish the Module tomorrow with the synthesis, or meta-analysis, of quantitative outcome data.
Throughout the Module, you'll have the opportunity to put what you learn into practice: critically appraising papers, extracting data, and then analyzing and synthesizing data using the JBI MAStARI online analytical software, a component of the JBI SUMARI software. At the end of the day tomorrow, you'll have some time to complete your protocols in JBI CReMS related to quantitative evidence and questions of effectiveness, and everyone will have the opportunity to present their protocols to each other.
The protocols should be a brief presentation of your title, brief background covering why your topic is important/interesting, your PICO question, and the eligibility criteria for your review presented using your PICO. This should be over 3-4 slides and last 5-7 minutes (Duration allowed may depend on the size of the class).
3
Program Overview Day 1
0900 Introductions and overview of Module 3
0930 Session 1: The Critical Appraisal of Studies
1000 Morning tea
1030 Session 2: Appraising RCTs and experimental studies (Group Work 1: critically appraising RCTs and experimental studies; report back)
1145 Session 3: Appraising observational studies
1230 Lunch
1330 Group Work 2: Critically appraising observational studies; report back
1415 Session 4: Study data and data extraction
1515 Afternoon tea
1530 Group Work 3: Data extraction; report back
1600 Session 5: Protocol development (group work: protocol development)
1700 End
Timetable for Day 1 - appraising quantitative evidence and data extraction; we will do activities for each.
4
Program Overview Day 2
0900 Overview of Day 1
0915 Session 6: Data analysis and meta-analysis
1030 Morning tea
1100 Session 7: Appraisal, extraction and synthesis using JBI MAStARI (Group Work 4: MAStARI trial; report back)
1230 Lunch
1330 Session 8: Protocol development (group work: protocol development)
1415 Session 9: Assessment (MCQ assessment)
1445 Afternoon tea
1500 Session 10: Protocol presentations
1700 End
Timetable for Day 2 - the focus will be on data synthesis and how to do meta-analysis, use of the CReMS/MAStARI software, and on completing and presenting your protocols.
5
Session 1: The Critical Appraisal of Studies
So, let's begin the next step/stage of the systematic review process: the critical appraisal of studies. We'll begin this session with a general introduction to why we need to critically appraise the evidence, the aims of critical appraisal, concepts such as validity and sources of bias, and how evidence is ranked. Following this we will go into detail on the types of evidence you may have uncovered, which you must have some knowledge of to be able to effectively appraise the quality of quantitative evidence. This process will take up most of Day 1.
6
Why Critically Appraise?
Example study flow: 1004 references identified; 172 duplicates removed, leaving 832 references whose titles/abstracts were scanned; 715 did not meet the inclusion criteria, leaving 117 studies retrieved in full text; 82 of these did not meet the criteria, leaving 35 studies for critical appraisal.
Why critically appraise? Combining the results of poor quality research may lead to biased or misleading estimates of effectiveness. Thinking back to Day 1, our results showed that we had 35 studies that met the inclusion criteria and made their way through to appraisal. Why appraise them? Simply, there is an overwhelming amount of scientific literature available, but not all of it is of high quality. All of the papers you ultimately select for inclusion in your review must go through a rigorous appraisal process by 2 members of your review team. The aim is to include only studies of a high standard and to exclude those of poor quality. Inclusion of poor quality research may lead to biased or misleading estimates of effectiveness in your review findings and in the conclusions subsequently drawn from those results!
7
The Aims of Critical Appraisal
To establish validity; to establish the risk of bias. By looking critically into the primary research studies and exploring potential sources of bias, we try to establish the 'validity' of the study being examined. When talking about the validity of a paper, we are referring to the minimisation of all forms of bias. We need to consider both internal and external validity.
8
Internal & External Validity
Internal validity: relationship between IV and EV? External validity: can the results be used locally? When discussing validity, there are 2 terms you will undoubtedly encounter: 'internal' and 'external' validity. The internal validity of a study focuses, in simple terms, on how good the study is - that is, is it free from bias and systematic error? Is the paper good evidence to suggest that what was done (the intervention, for instance) caused, or resulted in, what was measured (the outcome)? External validity, on the other hand, focuses on how the results or outcomes of the experiment can be generalized to groups or populations that did not directly participate in the study. Ensuring generalizability, or external validity, may compromise internal validity. In everyday clinical settings, factors such as patient or doctor preferences, or relationships, may influence compliance; randomization, concealment, blinding, etc. negate these factors and increase internal validity whilst decreasing external validity.
9
How internally valid is the study?
Strength & Magnitude. Strength: how internally valid is the study? Magnitude & precision: how large is the effect? A simple way to view these terms: internal validity establishes how good the study is and the extent to which the design and conduct of the study are likely to prevent systematic error. A good study with limited error adds to the strength of the association being investigated, for example a causal association inferred from a well conducted RCT. At the other end of the scale is the size of the effect being measured, or its magnitude, and also its precision. In short, whilst clearly dependent on the quality of the study, the greater and more precise the measured effect size, the greater the applicability in practice.
10
Clinical Significance and Magnitude of Effect
Pooling of homogeneous studies of effect or harm; weighing the effect against the cost/resources of change; determining the precision of the estimate. Beyond any individual study, the systematic review itself aims to further establish applicability and generalizability to clinical practice. Systematic review processes include meta-analysis, a statistical procedure that allows reviewers to combine the results of multiple independent studies that are similar enough, or homogeneous. Doing so increases the precision of the estimate of effect, allowing greater confidence in the reliability of the calculated effect size, and as such also helps to establish the external validity, or generalizability, of the results. The effect size is important for establishing clinical significance, just as much as statistical significance.
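To make the "increased precision" point concrete, here is a minimal sketch of fixed-effect inverse-variance pooling, one standard way homogeneous study results are combined. The numbers are invented, and this is the general textbook method, not the exact computation performed by MAStARI.

```python
# Minimal sketch of fixed-effect (inverse-variance) pooling: each study's
# effect estimate is weighted by the inverse of its variance, so more
# precise studies contribute more. Illustrative numbers only.

effects = [0.40, 0.55, 0.48]          # effect estimates from three studies
ses     = [0.20, 0.10, 0.15]          # their standard errors

weights = [1 / se**2 for se in ses]   # w_i = 1 / SE_i^2
pooled  = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5 # standard error of the pooled estimate

# 95% confidence interval for the pooled effect
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled effect = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that the pooled standard error is smaller than any single study's standard error: this is the gain in precision the slide refers to.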
11
Assessing the Risk of Bias
Numerous tools are available for assessing the methodological quality of clinical trials and observational studies. JBI requires the use of a specific tool for assessing risk of bias in each included study. 'High quality' research methods can still leave a study at important risk of bias (e.g. when blinding is impossible), and some markers of quality are unlikely to have direct implications for risk of bias (e.g. ethical approval, sample size calculation). So how do we go about assessing the risk of bias, or establishing the validity, of primary research literature? First of all, there are numerous tools available, such as the JBI checklists, which are a series of questions that aim to focus your critical skills on the 'methods' used in the study. Different tools, and therefore questions, are available for different study designs, and these questions focus us on the particular aspects or criteria a good study of any particular type should fulfil for valid conclusions to be drawn from its results. As mentioned in the last point on the slide, something like ethical approval, though essential for most experiments or observations, is unlikely to influence the outcome of any particular piece of research.
12
Sources of Bias: Selection, Performance, Detection, Attrition
Bias, or systematic error, may affect experimental research through a variety of avenues. We will briefly examine a handful of the most important or common sources of bias now; some of these we will revisit in more detail later when we discuss specific study designs which may be more susceptible to particular biases. These are: selection bias, performance bias, detection bias and attrition bias.
13
Selection Bias: systematic differences between participant characteristics at the start of a trial; systematic differences occur during allocation to groups; can be avoided by concealment of allocation of participants to groups. Selection bias refers to the possibility of the researchers consciously allocating participants to certain groups in a way that may favor one of the treatments. This can be avoided by randomization and concealment of participant allocation. Randomization ensures that every participant has an equal chance of being selected for either group; we will discuss this process in more detail later. You as the appraiser will have to determine how well randomization has been performed/achieved, to determine whether 'selection' bias could have influenced the results of the study in one direction or another. Randomization may not be possible in all study designs; for example, one of the acknowledged drawbacks of the case-control design is the inherent presence of selection bias. Selection bias may also be referred to as "allocation bias".
14
Allocation concealment
Type of bias | Quality assessment | Population
Selection | Allocation concealment | Treatment; Control
Here is the beginning of a table which illustrates what we just mentioned. The first column gives the type of bias, the second the method of quality assessment, and the third to whom these methods should be applied. Here you can see that allocation concealment, or concealment of the results of the randomization process, applied to both treatment and control groups, should alleviate the potential for selection (allocation) bias.
15
Performance Bias: systematic differences in the intervention of interest, or the influence of concurrent interventions; systematic differences occur during the intervention phase of a trial; can be avoided by blinding of investigators and/or participants to group. Performance bias arises when there are differences in the care received other than the treatment. It can be avoided by blinding - concealment of the treatment group from the patient and the investigator. In your appraisal, you will need to establish, if you are able, the extent to which the conditions for all participants in the study were the same, other than the differences between interventions, or between intervention and control.
16
Type of bias | Quality assessment | Population
Selection | Allocation concealment | Treatment; Control
Performance | Blinding | Exposed to intervention; Not exposed
So, in the table we can add blinding of participants and investigators as a method to ensure quality. The only differences observed should be in the intervention.
17
Detection Bias: systematic differences in how the outcome is assessed between groups; systematic differences occur at measurement points during the trial; can be avoided by blinding of the outcome assessor. Detection bias arises when there are differences in how the outcomes of the intervention are assessed for each of the participant groups, for example in the timing or methods of assessment between groups. Blinding is recognized as a means of alleviating this type of bias: if those making the assessments are unaware of the group to which people are assigned, they are more likely to deal with every subject/participant identically. Detection bias may also be referred to as "measurement bias".
18
Type of bias | Quality assessment | Population
Selection | Allocation concealment | Treatment; Control
Performance | Blinding | Exposed to intervention; Not exposed
Detection | Blinding of assessor | Whole study population
Therefore, to ensure detection bias cannot influence the results of the study you are reading, you, as the appraiser, should be looking for some report that the assessor was blinded to allocation across the whole study population.
19
Attrition Bias: systematic differences in withdrawals and exclusions between groups. Can be avoided by: accurate reporting of losses and reasons for withdrawal; use of ITT analysis. Attrition bias relates to differences in losses of subjects between groups. Losses to follow up should be reported, though this is often difficult in longitudinal studies which may last many years. As an appraiser, you will need to assess how well this has been reported in the literature you are using. Intention to treat (ITT) analysis is a strategy specific to RCTs that analyses each patient in the group to which they were originally randomly assigned. We will discuss this in more detail shortly, when we discuss issues inherent to RCTs.
20
Type of bias | Quality assessment | Population
Selection | Allocation concealment | Treatment; Control
Performance | Blinding | Exposed to intervention; Not exposed
Detection | Blinding of assessor | Whole study population
Attrition | ITT analysis; follow up | All participants enrolled
The table is now complete, and shows that, for attrition bias, follow up and ITT analysis of all participants enrolled in the study are the methods the appraiser must look for in the original literature.
21
Ranking the “Quality” of Evidence of Effectiveness
To what extent does the study design minimize bias/demonstrate validity? This is generally linked to the actual study design when ranking evidence of effectiveness. Thus, a "hierarchy" of evidence is most often used, with levels of quality equated with specific study designs. As an appraiser, you will also be required to rank the quality of your quantitative evidence. Has the researcher made all attempts to ensure validity? Ranking of evidence of effectiveness is generally linked to study design and the ability to maximize internal validity; for example, an RCT is generally ranked higher than a cohort or case-control study. To this end, "hierarchies" of quantitative evidence have been developed and can be used as tools when ranking the evidence. For example...
22
Hierarchy of Evidence-Effectiveness EXAMPLE 1
Grade I - systematic reviews of all relevant RCTs
Grade II - at least one properly designed RCT
Grade III-1 - controlled trials without randomisation
Grade III-2 - cohort or case control studies
Grade III-3 - multiple time series, or dramatic results from uncontrolled studies
Grade IV - opinions of respected authorities & descriptive studies
(NH&MRC 1995)
Hierarchy of Evidence - Effectiveness (Example 1). These are the hierarchical levels developed by the National Health & Medical Research Council in 1995. You will see that the number 1 ranked level of evidence is the systematic review of all relevant RCTs, followed at level II by at least one properly designed RCT. Level III-1 is controlled trials without randomisation; level III-2 includes cohort and case control studies; III-3 is multiple time series, or uncontrolled studies that have shown dramatic results. At this time, level IV was the opinion of respected authorities and descriptive studies. In 2001, the NH&MRC revised their hierarchy to...
23
Hierarchy of Evidence-Effectiveness EXAMPLE 2
Grade I - systematic review of all relevant RCTs
Grade II - at least one properly designed RCT
Grade III-1 - well designed pseudo-randomised controlled trials
Grade III-2 - cohort studies, case control studies, interrupted time series with a control group
Grade III-3 - comparative studies with historical control, two or more single-arm studies, or interrupted time series without a control group
Grade IV - case series
(NH&MRC 2001)
Hierarchy of Evidence - Effectiveness (Example 2). Levels I & II have remained the same. Level III-1 has now become the well-designed pseudo-randomised controlled trial. (The pseudo-randomised controlled trial refers to the fact that the researchers have made all attempts to conduct an RCT, but may have been unable to comply with a certain aspect of the methods.) Cohort studies, case control studies, and interrupted time series with a control group have been grouped into level III-2, while level III-3 groups comparative studies with historical control, two or more single-arm studies, or an interrupted time series without a control group. Level IV has replaced the opinion of respected authorities with case series, and thus opinion and text do not rank in this hierarchy.
24
JBI Levels of Evidence - Effectiveness
Level of Evidence - Effectiveness, E (1-4):
Level 1 - SR (with homogeneity) of experimental studies (e.g. RCT with concealed allocation), OR one or more large experimental studies with narrow confidence intervals
Level 2 - one or more smaller RCTs with wider confidence intervals, OR quasi-experimental studies (e.g. without randomisation)
Level 3 - 3a. cohort studies (with control group); 3b. case-controlled studies; 3c. observational studies (without control groups)
Level 4 - expert opinion, or based on physiology, bench research or consensus
At JBI we use our own levels of evidence for effectiveness. In the introductory module you met the FAME scale; effectiveness is the 'E' in FAME and only deals with quantitative evidence. Level 1 is again the systematic review of experimental studies (which also makes the point about homogeneity, to be discussed further shortly), and also includes one or more large experimental studies with narrow confidence intervals (we will discuss confidence intervals briefly in the meta-analysis section). There will often be comments from participants about what constitutes a narrow confidence interval or a large experimental study; these are interpretations that will vary. Level 2 is one or more smaller RCTs with wider (another interpretive term) confidence intervals, or quasi-experimental studies, i.e. those without randomisation. Level 3 is segmented into three subgroups: 3a is cohort studies with control groups, 3b is case-controlled studies, and 3c is observational studies without a control group. Unlike the 2001 NH&MRC levels, JBI has reinstated expert opinion as level 4 evidence of effectiveness. We believe that in the absence of all higher forms of evidence, the use of expert opinion, or evidence based on physiology, bench research or consensus, constitutes valid evidence.
25
The Critical Appraisal Process
Every review must set out to use an explicit appraisal process. Essentially: a good understanding of research design is required of appraisers, and the use of an agreed checklist is usual. To be able to use a clear, well defined appraisal process, an appraiser must have a good understanding of research design; without one, it will be difficult to answer the questions posed by appraisal checklists accurately and confidently. There are many agreed and accredited, tried and tested checklists available - these are the tools that should be used. When conducting a JBI review, an appropriate JBI appraisal checklist should be used. In the event that another appraisal tool or instrument is deemed more appropriate, this must be identified during protocol development and should be appended to the published protocol.
26
Session 2: Appraising RCTs and experimental studies
We will now take an hour to discuss the methodological requirements of experimental studies and RCTs in particular. Once you understand the methodology and methods involved, you’ll have the opportunity to use the checklists to appraise some papers yourselves.
27
RCTs and quasi (pseudo) RCTs provide the most robust form of evidence for effects, and the RCT is the ideal design for experimental studies. They focus on establishing certainty through measurable attributes, and they provide evidence related to: whether or not a causal relationship exists between a stated intervention and a specific, measurable outcome; and the direction and strength of that relationship. These characteristics are associated with the reliability and generalizability of experimental studies.
The RCT is the 'gold standard' of evidence of effectiveness. There are many different types of study design associated with the measurement of quantitative data; quantitative studies may be experimental or non-experimental (observational) in design. The ideal design for an experimental study is the randomised controlled trial (RCT). RCTs are used to determine the effect of an intervention compared to another treatment option, whether that be placebo, another treatment, or usual care (Webb et al 2005). When randomised controlled trials are designed well and appropriately performed, they provide the best evidence on the effectiveness of an intervention (Altman et al 2001); they are the most rigorous method of determining the existence of a cause-effect relationship between a treatment and an outcome (Kendall 2003). In terms of evidence generation, experimental studies - RCTs and pseudo RCTs - are the classical designs for establishing cause and effect. The nature of this evidence is to identify certainty: the methods of such designs are focused on establishing validity and minimizing the risk of bias. The effect size adds to our knowledge of the strength of the relationship between an intervention and an outcome. Thus experimental studies tell us two important things related to improving global health: whether there is a causal relationship between an intervention and an outcome, and what the strength of that relationship is. These two characteristics give us more certainty about the results of well-conducted experimental studies, in addition to the power to generalize. The 'reliability' and 'generalizability' in the last point can also be termed truthfulness and applicability respectively. Establishing causality is something observational studies cannot do, mainly because of the presence of bias and confounders due to less stringent control and manipulation of the study by the researcher. These terms will be addressed in more detail when we consider the critical appraisal of observational studies shortly.
28
Randomised Controlled Trials
Evaluate the effectiveness of a treatment/therapy/intervention; randomization is critical; properly performed RCTs reduce bias, confounding, and chance results. RCTs are often used to evaluate how effective a new treatment/therapy/intervention is for patients with a certain condition. Individuals (or other units) are randomly allocated to a treatment group. Randomization is essential, as it ensures that all treatment groups are comparable at the beginning. Confounding factors (variables) which may somehow affect the results of the study, such as age, gender, etc., will be spread evenly across groups, ensuring treatment arms are as comparable as possible prior to receiving the intervention. Properly designed and performed randomised controlled trials reduce the risk of bias, confounding, and results arising by chance. However, poorly conducted randomized controlled trials are susceptible to bias and may produce misleading information or exaggerated treatment effects (Kao 2008, Moher et al 2001, Altman et al 2001). Bias is the result of errors in the research process which lead to results deviating from the truth (Kendall 2003). There are different sources of bias that can occur in health care research; a well designed randomized controlled trial attempts to eliminate sources of bias where feasible.
29
Experimental studies Three essential elements
Randomisation (where possible); researcher-controlled manipulation of the independent variable; researcher control of the experimental situation. These are some of the defining features of an experimental study. Discuss as per slide. With point 2 you may discuss terminology: clinicians will be familiar with the terms intervention and outcome, while statisticians prefer the terms independent and dependent variable. In an experiment you manipulate the independent variable, or intervention, and measure the change in the dependent variable, or outcome. As much as possible, everything else should be controlled so that any changes in the outcome can be directly attributed to changes in the intervention. These characteristics are quite distinct from observational study designs, where there is very little control of the research conditions.
30
Other experimental studies
Quasi-experiments lack a true method of randomization to treatment groups. Quasi-experimental designs may be: without control groups; with control groups but no pre-tests; or with both control groups and pre-tests. There are varying definitions of, and guidance on, what constitutes a quasi-experiment. A common and simple definition is a study very similar to an RCT except that it lacks true random assignment of subjects to treatment groups; such studies may also be referred to as pseudo-randomized trials. An example would be assigning participants to groups by alphabetical selection, or by their seating arrangement in a class. Other definitions identify a quasi-experiment as a study that violates any of the characteristics of a true experiment: for example, it may lack control groups, lack pre-tests, or both.
31
Sampling Selecting participants from population
Inclusion/exclusion criteria; the sample should represent the population. Sampling is the process of selecting individuals/groups from the target population and including them in the trial. The target population is the population to which the results of the trial will be relevant/applicable. Inclusion/exclusion criteria are set to define a specific study group for the trial. The most important issue to consider when selecting a sample is that the sample is representative of the target population. A criticism of experimental research is often the strict and predefined sampling criteria used, which can limit the generalizability of the results. There are different methods of sampling from the population...
32
Sampling Methods: probabilistic (random), consecutive, systematic, convenience.
Papers you will read include a range of sampling techniques with varying levels of risk of bias. Probabilistic (random) sampling is the random sampling of individuals from the target population; to be truly random, all members of a population should have an equal chance of being selected. This is often difficult to achieve; however, where possible, probabilistic sampling should occur. If a large sample is required, cluster or multi-stage sampling can be used: for example, a random sample of hospitals is drawn, and then random patients from within each hospital. Consecutive sampling is the sampling of every patient who meets the inclusion criteria for the trial from the population over a designated time period. Systematic sampling occurs where samples are decided by a system, such as enrolling every third patient in the trial; this may be hazardous if the investigator has the potential to affect the order in which patients are seen, and may introduce bias as a result. Convenience sampling is common, saves time and money and is quite simple, but may not be representative of the target population. You may notice that subjects for trials are drawn from healthcare workers or patients in hospital, as opposed to those suffering the same illness out in the general community - these are samples of convenience. A sketch contrasting these techniques follows.
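As a concrete illustration of the difference between these techniques, here is a minimal sketch; the population, sample size and selection rules are invented assumptions, not drawn from any real trial.

```python
# Minimal sketch contrasting sampling methods; illustrative only.
import random

population = [f"patient_{i:03d}" for i in range(200)]
n = 20

# Probabilistic (random) sampling: every member has an equal chance.
random_sample = random.sample(population, n)

# Systematic sampling: every k-th patient in order of presentation.
k = len(population) // n
systematic_sample = population[::k][:n]

# Consecutive sampling: every eligible patient over a time period
# (here, simply the first n who present).
consecutive_sample = population[:n]

# Convenience sampling has no selection rule to encode - whoever is at
# hand - which is exactly why it risks being unrepresentative.
print(random_sample[:3], systematic_sample[:3], consecutive_sample[:3])
```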
33
Randomization. The schematic indicates the "R" in RCT - randomization of participant allocation to the experimental groups. This is an example of a parallel trial. Other trial designs exist that participants may be familiar with, for example cross-over designs or factorial designs; neither needs to be discussed in detail. Here, we have our sample of the population, and individuals are randomly assigned to either group 1 or group 2. They have the same chance of ending up in either treatment group and receiving either the intervention or the control. The intervention is performed and the results are recorded/measured. This random allocation reduces the risk of bias, in particular selection bias, and corrects for potential confounders. This schematic is also useful for demonstrating the basis of comparative effectiveness, that is, comparing the effectiveness of one intervention with another (or with a control), and is often the same schematic used to conceptualize the comparisons made in a systematic review.
34
Randomization Issues: simple methods (tossing a coin or rolling a die) may result in unequal group sizes - block randomization addresses this; confounding factors may arise through chance imbalances - stratification prior to randomization ensures that important baseline characteristics are even in both groups.
However, we know that as reviewers, decoding claims of randomization is not always easy. The principles might be widely accepted or recognized, but the process in published research can be less clear. Simple methods of randomization, such as tossing a coin or rolling a die, may result in unequal group sizes, which can affect the results of the trial, especially in smaller scale studies with smaller samples. Tossing a coin is still a random process: if it is flipped enough times, probability states that you will ultimately get an approximately 50/50 split in the size of your 2 groups (see the simulation sketch below). Block randomization has been used to address this issue: to ensure comparable sample sizes, participants are put in blocks (groups), and randomization then occurs within each block to ensure equal numbers are assigned to each treatment arm. We'll discuss this in more detail in just a moment. Another issue that may occur with small sample sizes is chance imbalances between the treatment groups in terms of certain confounding factors. This can be reduced through stratification, a technique which ensures that important baseline characteristics are even in both groups. Stratification occurs prior to randomization by placing subjects in strata (by age, sex, history, or co-morbidities, to name a few), with randomization occurring within the strata. We will also illustrate this in a moment.
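A quick simulation makes the first point concrete: simple (coin-flip) randomization only approaches a 50/50 split as the sample grows. The sample sizes here are arbitrary illustrations.

```python
# Minimal sketch: coin-flip randomization can leave small trials imbalanced,
# while large samples drift toward an even split. Illustrative only.
import random

random.seed(1)  # fixed seed so the run is reproducible
for n in (10, 1000):
    flips = [random.choice("AB") for _ in range(n)]
    print(n, "participants ->", flips.count("A"), "A vs", flips.count("B"), "B")
```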
35
Block Randomization. All possible combinations, ignoring unequal allocations:
1 AABB   2 ABAB   3 ABBA   4 BABA   5 BAAB   6 BBAA
Block randomization avoids the problem of uneven group sizes that simple randomization techniques, such as coin flipping, can lead to in smaller trials. Blocking ensures that the numbers of participants assigned to each of the comparison groups will be balanced within blocks. In the example, the numbers 1-6 are assigned the different combinations of A and B in 'blocks' of 4; when values are read from a table of random numbers, subjects are assigned according to the block associated with each number. For example, for the number 1, two people from the population sample will be assigned to group A and two people to group B. Reading a sequence such as 5, 3, 3, 2, 1 from a random number table (ignoring digits, such as 8 and 7, that have no block) generates the allocation BAAB ABBA ABBA ABAB AABB, i.e. 10 people in group A and 10 people in group B. Knowing the block size can introduce bias, however: once 3 subjects in a block are allocated, the 4th is always apparent. This allocation bias can be minimized by changing the block size as recruitment continues, which can be done randomly by computer.
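A minimal sketch of the blocking idea in code, assuming a block size of 4 and two arms A/B; this illustrates the principle rather than the procedure of any particular trial.

```python
# Minimal sketch of block randomization; illustrative only.
import random

def block_randomize(n_participants, block_size=4, arms=("A", "B")):
    """Allocate participants in balanced blocks: each block contains an
    equal number of each arm, shuffled, so group sizes never drift far apart."""
    per_arm = block_size // len(arms)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * per_arm      # e.g. ['A', 'B', 'A', 'B']
        random.shuffle(block)             # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

alloc = block_randomize(20)
print(alloc, "A:", alloc.count("A"), "B:", alloc.count("B"))  # always 10/10
```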
36
Stratified Randomization
As mentioned, another issue that may occur with small sample sizes is chance imbalances between the treatment groups in terms of certain confounding factors. This can be reduced through stratification, a technique which ensures that important baseline characteristics are even in both groups. Stratification occurs prior to randomisation by placing subjects in strata (by age, sex, history, comorbidities, etc.), with randomisation occurring within each stratum. This schematic can also illustrate cluster randomisation, where every square or circle could represent a school or hospital, for example, and all of the individuals within it are randomly assigned to the one group. This is often used for very large trials, and specialised statistical methods must then be used to account for this 'large scale' randomisation, as opposed to the individual randomisation normally encountered in smaller trials.
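The stratification idea can be sketched as follows; the strata, subjects and within-stratum allocation rule are illustrative assumptions only.

```python
# Minimal sketch of stratified randomization: subjects are grouped into
# strata first, and randomization happens within each stratum so baseline
# characteristics stay balanced across arms. Illustrative data only.
import random
from collections import defaultdict

subjects = [
    {"id": 1, "sex": "F", "age_band": "<65"},
    {"id": 2, "sex": "M", "age_band": "<65"},
    {"id": 3, "sex": "F", "age_band": "65+"},
    {"id": 4, "sex": "M", "age_band": "65+"},
    {"id": 5, "sex": "F", "age_band": "<65"},
    {"id": 6, "sex": "M", "age_band": "65+"},
]

# 1. Place subjects into strata defined by baseline characteristics.
strata = defaultdict(list)
for s in subjects:
    strata[(s["sex"], s["age_band"])].append(s)

# 2. Randomize within each stratum: shuffle, then alternate arms so each
#    stratum stays as balanced as possible.
for stratum, members in strata.items():
    random.shuffle(members)
    for i, s in enumerate(members):
        s["arm"] = "A" if i % 2 == 0 else "B"
    print(stratum, [(s["id"], s["arm"]) for s in members])
```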
37
Blinding Method to eliminate bias from human behaviour
Applies to participants, investigators, assessors, etc.; blinding of allocation; single, double and triple blinding. Blinding is a method used to eliminate bias which may result from human behaviour, some of which we discussed earlier. Blinding is therefore relevant for the individuals included in the trial and also for the investigators and assessors. When adequate blinding has not occurred, studies report a larger treatment effect on average (Day 2000). Blinding ensures that those involved in the trial, including study participants, data collectors, and the like, are unaware of which treatment group has been assigned. Blinding is most important when subjective measures are used to assess outcomes, as these are more likely to be affected by knowledge of the treatment. Sometimes blinding of an intervention is impossible; for example, it would be difficult to blind a surgeon to the intervention they were performing! Single-blind and double-blind indicate whether blinding has been applied to only one, or to both, of the participant and the investigator. Triple blinding is where the assessor or analyst analysing the results is also unaware of the treatment groups, limiting the potential for even inadvertent bias in the handling of data.
38
Blinding (Schulz, 2002). Discuss the benefits of blinding as per the table: single, double and triple blinding as you move down, covering participants, investigators and assessors in turn.
39
Intention to Treat. ITT analysis is an analysis based on the initial treatment intent, not on the treatment eventually administered. It avoids various misleading artifacts that can arise in intervention research: e.g. if people with a more serious problem tend to drop out at a higher rate, even a completely ineffective treatment may appear to provide benefits if one merely compares those who finished the treatment with those who were enrolled in it. Everyone who begins the treatment is considered part of the trial, whether they finish it or not. Intention to treat, abbreviated ITT, is a strategy for the analysis of RCTs that compares patients in the groups to which they were originally randomly assigned. This is generally interpreted as including all patients, regardless of whether they actually satisfied the entry criteria, the treatment they actually received, and subsequent withdrawal or deviation from the protocol. Effectiveness may be overestimated if ITT is not done, as explained in the example above. Outcome data must be available for all subjects for a complete ITT analysis.
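A worked toy example shows how the misleading artifact described above arises; all counts are invented for illustration.

```python
# Minimal sketch of how ITT and per-protocol (completers-only) analyses
# can disagree. Invented counts only.

randomized = {"treatment": 100, "control": 100}  # as originally assigned
completed  = {"treatment": 70,  "control": 95}   # sicker patients dropped out of treatment
events     = {"treatment": 35,  "control": 40}   # good outcomes among completers

# Per-protocol: only those who finished - drop-outs vanish from the denominator.
pp_treatment = events["treatment"] / completed["treatment"]    # 0.50
pp_control   = events["control"] / completed["control"]        # ~0.42

# ITT: everyone randomized stays in their original group's denominator
# (drop-outs conservatively counted as not having the good outcome).
itt_treatment = events["treatment"] / randomized["treatment"]  # 0.35
itt_control   = events["control"] / randomized["control"]      # 0.40

print(f"per-protocol: {pp_treatment:.2f} vs {pp_control:.2f}")   # treatment looks better
print(f"ITT:          {itt_treatment:.2f} vs {itt_control:.2f}")  # advantage disappears
```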
40
Minimizing Risk of Bias
Randomization; allocation (concealment); blinding; intention to treat (ITT) analysis. Recall that we are appraising experimental literature to determine whether a study is methodologically sound, and therefore whether we can 'believe' the results as presented and the inferences drawn from them. Bias identified within the study casts doubt on the results and calls into question the study's validity, both internal and external. We mentioned earlier how the processes listed here can be used to minimize the risk of bias arising from selection, performance and detection within a study. One method of dealing with bias we did not examine in any detail earlier was ITT analysis, which is employed in trials to minimize attrition bias, a threat to the internal validity of the study.
41
Appraising RCTs/quasi experimental studies JBI-MAStARI Instrument
Here is the critical appraisal tool, or checklist, for the appraiser to use when assessing the quality of an RCT/pseudo-randomised trial. This is a credible JBI tool included in the JBI MAStARI software package. It represents a list of questions that need to be considered when examining experimental studies. All questions are answered yes, no or unclear: yes indicates that there is a clear statement in the paper which directly answers the question; no is where the question has been directly answered in the negative; unclear is where there is no clear statement, or the information is ambiguous. Explain what each of the questions means.
Q1: Was assignment to treatment groups truly random? This refers to the methods the researchers have used to try to ensure that each study participant had an equal chance of being in either group. A bare statement such as "the participants were randomised" is inadequate, as it is not clear how they were randomised, and should be answered unclear; whereas a statement that participants were randomised to each study group using a blinded computer randomisation process is clear, and can be answered yes.
Q2: Did the participants know which group they had been allocated to, and hence their potential treatment outcomes? This could be seen as performance bias on behalf of the participant: if they are aware of their potential treatment outcomes, it may sway their response. This can often be difficult to blind, but in drug trials it is aided by the use of a placebo.
Q3: Did the people allocating participants to groups know which group each was being allocated to? This process should also be blinded in order to eliminate selection bias; if the allocator is blinded, they are unable to alter the allocation of participants to groups.
Q4: Were people who withdrew from the study for any reason mentioned, and included in the analysis? This question attempts to address the attrition bias we spoke about earlier, and may also cover the presence of an ITT analysis.
Q5: Were those who assessed the outcomes blinded to treatment allocation? This deals with detection bias: those assessing the outcomes should, wherever possible, be unaware of each participant's treatment group.
Q6: Were the groups comparable at entry to the study? As we mentioned earlier regarding selection bias, the study should not have young fit people in group A and elderly unfit people in group B - they are very different.
Q7: Did everyone in the study receive the same care or treatment, other than the named intervention that was the focus of the study? If a study is examining the effects of an anti-psychotic medication, but the intervention group also receives individual therapy where the other group does not, then the groups are not treated the same, unless both receive the therapy.
Q8: Were the outcomes measured in the same way for all groups? From the example above, the intervention group cannot use one outcome measure, X, while the control group uses measure Y.
Q9: Were the measures used to assess outcomes reliable, tested tools? If we were looking at a participant's level of consciousness, then the Glasgow Coma Scale is a recognised and reliable tool.
Q10: The final question asks whether an appropriate statistical analysis has been used. We will go over what dichotomous and continuous data are in a moment, for those of you who are unsure.
42
Assessing Study Quality as a Basis for Inclusion in a Review
Diagram: studies arranged from poor quality to high quality, with a cut-off point dividing excluded from included studies. When you have completed the critical appraisal process you will have to decide with your co-reviewer whether each study should be included or excluded. This diagram shows a cut-off point between good and poor quality. Your team will have to decide where this cut-off will be; you may decide on 6/10 or 8/10. You may also place greater importance on particular questions: for example, you may exclude any study which fails question 1, if you are not convinced the randomisation process was adequate. The inclusion and exclusion criteria of your systematic review may help decide which of these appraisal questions carry more weight.
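As an illustration of how a team's cut-off decision might be operationalised, here is a minimal sketch; the scoring rule, the 7/10 cut-off and the mandatory-question policy are assumptions for illustration, not JBI policy.

```python
# Minimal sketch of applying an agreed quality cut-off to checklist answers.

def appraise(answers, cutoff=7, mandatory=(1,)):
    """answers: dict mapping question number -> 'yes'/'no'/'unclear'.
    Include a study if it scores >= cutoff 'yes' answers and does not fail
    any question the team flagged as mandatory (e.g. Q1, randomization)."""
    score = sum(1 for a in answers.values() if a == "yes")
    fails_mandatory = any(answers.get(q) != "yes" for q in mandatory)
    return score >= cutoff and not fails_mandatory, score

study = {1: "yes", 2: "unclear", 3: "yes", 4: "yes", 5: "yes",
         6: "yes", 7: "yes", 8: "yes", 9: "no", 10: "yes"}
include, score = appraise(study)
print(f"score {score}/10 -> {'include' if include else 'exclude'}")
```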
43
Group Work 1: Working in pairs, critically appraise the two papers in your workbook. Reporting back. There will be two experimental studies (RCTs) to appraise.
44
Session 3: Appraising Observational Studies
The likelihood is that, after having conducted your search of the literature, you will have come across other types of studies besides RCTs and controlled experiments aimed at establishing cause and effect. These study designs are collectively known as observational research and are important tools in epidemiology. Observational studies represent a different way of collecting data. They are important for a number of reasons; for example, you could never ethically conduct an experiment to determine whether smoking causes lung cancer. This is the realm of observational research, where, rather than the investigator taking control of the experiment, nature takes its course and the researcher observes what occurs. It is important to make clear that when observational studies are used to inform the effectiveness of interventions and therapies, the inferences we can draw from the results are not as strong as from experimental research. Due to the lack of experimenter control, and the resulting impact of bias and confounding factors, causal associations can rarely be inferred. Observational studies are also commonly used to investigate the aetiology (or causes) of disease; in these cases, despite no experiments being conducted, if the associations are large and clear enough, such as that between smoking and lung cancer, they may be deemed causal - statisticians, researchers and clinicians may argue this point!
45
Rationale and potential of observational studies as evidence
Observational studies account for the majority of published research studies; we need to clarify what designs to include; we need appropriate critical appraisal/quality assessment tools; and there are concerns about methodological issues inherent to observational studies: confounding, biases, differences in design, and precise but spurious results. Observational research accounts for the majority of published research studies, as trials are often not feasible due to difficulty in recruiting participants and exorbitant costs. Depending on your review question, you may decide to exclude observational studies altogether, or to include only certain types of study design, such as cohort and case control, while excluding case series and case reports. As you can imagine, the questions you just answered in relation to the appraisal of RCTs are unlikely to be appropriate here, where randomization, for instance, does not occur; therefore these types of studies need, and have, their own JBI appraisal instruments. There are many potential concerns about methodological issues associated with observational studies, including the biases we have briefly encountered, the combination of studies with different designs, and confounding, which we will discuss in a moment; these factors can lead to results which are precise, but false!
46
Appraisal of Observational Studies
Critical appraisal and assessment of quality is often more difficult than for RCTs; scales/checklists developed for RCTs may not be appropriate; methods and tools are still being developed and validated; some published tools are available. The critical appraisal of observational studies can be more difficult than that of RCTs, sometimes due to the variation in methods used and the heterogeneity that may be present between studies. It is difficult to define features inherent to the methodologies of all observational studies, as can be done for RCTs. Tools such as the JBI checklists are constantly being developed and tested for applicability.
47
Confounding The apparent effect is not the true effect
There may be other factors relevant to the outcome in question; this can be an important threat to the validity of results; adjustments for confounding factors can be made (multivariate analysis); authors often look for a plausible explanation for results. Confounding is an important issue which arises in observational research: simply, the apparent effect, the one which is reported, is not in reality the true effect, because other factors relevant to the outcome are in play. It is this underlying issue, which is not present in well conducted, randomised and controlled trials, that precludes causal inferences from observational research; only associations can be inferred. Confounding can be an important threat to the validity of the results of a study, in particular those of cohort studies. Often when you read observational research literature, complex statistical corrections have been made to account for potential confounders, and you will encounter terms such as hazard models and multivariate analyses. Another difficulty in your review may arise if you intend to combine the results of studies which have adjusted for different sets of confounders, or if the studies have adjusted for 'confounders' which are not really confounders!
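To make the idea of adjustment concrete, here is a minimal sketch comparing a crude risk ratio with a stratified (Mantel-Haenszel) one; the 2x2 counts per stratum are invented so that exposure looks protective in the crude analysis but shows no effect once comparisons are made within age strata.

```python
# Minimal sketch of confounding and stratified adjustment; invented counts.

# Each stratum: (exposed_cases, exposed_total, unexposed_cases, unexposed_total)
strata = {
    "young": (10, 100, 5, 50),    # 10% risk in both arms
    "old":   (40, 100, 80, 200),  # 40% risk in both arms
}

# Crude analysis: collapse the strata and compare overall risks.
a = sum(s[0] for s in strata.values()); n1 = sum(s[1] for s in strata.values())
b = sum(s[2] for s in strata.values()); n0 = sum(s[3] for s in strata.values())
crude_rr = (a / n1) / (b / n0)        # ~0.74: exposure falsely looks protective

# Mantel-Haenszel adjusted risk ratio: weight within-stratum comparisons.
num = sum(ai * n0i / (n1i + n0i) for ai, n1i, bi, n0i in strata.values())
den = sum(bi * n1i / (n1i + n0i) for ai, n1i, bi, n0i in strata.values())
mh_rr = num / den                     # 1.00: no effect once age is adjusted for

print(f"crude RR = {crude_rr:.2f}, MH-adjusted RR = {mh_rr:.2f}")
```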
48
Bias Selection bias Follow up bias Measurement/detection bias
Selection bias: participants may differ from the population with the same condition. Follow up bias: attrition may be due to differences in outcome. Measurement/detection bias: knowledge of the outcome may influence assessment of exposure, and vice versa. Valid results are said to be unbiased. Bias either exaggerates or underestimates the 'true' effect of an intervention or exposure; as we discovered earlier, bias is a systematic deviation from the truth that confounds results, and under- or over-estimation of the true effect may occur. There are many types of bias - here are a few - and all potentially 'confound' the results of studies and any subsequent combination or meta-analysis of them. For example, selection bias is a major consideration in all observational studies, as the processes we use to alleviate it in an experimental study, randomization and allocation concealment, do not apply in observational designs. A well conducted cohort study, however, may have very good 'external' validity, as it represents the 'clinical' or 'real world' setting potentially much better than a trial subject to 'artificial' and stringent controls.
49
Observational Studies - Types
Cohort studies; case-control studies; case series/case reports; cross-sectional studies. Now we will briefly visit the main types of observational studies, all of which can contribute to evidence of the effectiveness of an intervention. Here they are ordered with the 'gold standard' of observational research, the cohort study, at the top, all the way down to the lowly case report and cross-sectional study. Cohort and case-control studies are the two most common forms of observational research. Cohort studies can be identified according to their comparison of interest, for example a cohort of physically active people versus those who do not exercise; case-control studies can be identified according to the disease or outcome of interest, for example those with lung cancer and those without. The RCT is not an appropriate study design for establishing the aetiology of a disease, or risk factors, or hypotheses related to these factors.
50
Cohort Studies Group of people who share common characteristic
Useful to determine the natural history and incidence of a disorder or exposure; two types: prospective (longitudinal) and retrospective (historic); aid in studying causal associations. A cohort is a group of people who share a common characteristic: this may be people born in the same year or month (a birth cohort), or a cohort of healthcare workers, smokers, sports people, etc. The cohort study is considered the 'gold standard' of observational epidemiology and is often used to track exposure and subsequent disease associations. Cohort studies may strongly suggest causal associations; however, an RCT is needed to establish causality. There are 2 types of cohort study: prospective, or longitudinal, which tracks outcomes forward in time, often over a very long period; and retrospective, or historic, where the exposure, latent period and development of the disease have all occurred in the past.
51
Prospective Cohort Studies
A prospective cohort study is also called a longitudinal study and is often used to study exposure-disease associations, i.e. what causes disease - which, as mentioned, cannot be done with an RCT. Outcomes are tracked forward in time. The cohort is identified before the appearance of disease, and the sample is representative of the population. Exposed = e.g. all born on the same date (a birth cohort), or taking the same drug, or given a vaccine shot. Not exposed = perhaps the general population, or a sample from the same population with little or no exposure but otherwise similar. Measures are taken through time (over years!). Doctors or patients are often used as cohorts, as they are easy to monitor. It is important to note that here the subjects are not randomly assigned. Taken from Tay & Tinmouth, 2007.
52
Prospective Cohort Studies
Advantages: longitudinal observation through time; allows investigation of rare diseases or diseases with long latency. Disadvantages: expensive; increased likelihood of attrition; a long time before useful data are seen. On this slide the advantages of cohort studies are listed above and the disadvantages below.
53
Retrospective Cohort Studies
A retrospective cohort study is also called an historic cohort study. Often the medical records of groups who are alike but differ by some characteristic (e.g. smokers and non-smokers) are compared for a particular outcome, e.g. lung cancer. All events - exposure, latent period, and outcome - have occurred in the past. In a retrospective cohort study, there is no follow up. Taken from Tay & Tinmouth, 2007.
54
Retrospective Cohort Studies
Advantages: mainly data collection; no follow up through time; cheaper and faster. This slide shows some of the advantages of a retrospective cohort study.
55
Case-Control Studies: 'cases' already have the disease/condition;
'controls' don't have the disease/condition but are otherwise matched to control confounding. Frequently used; a rapid means of studying risk factors; sometimes referred to as a retrospective study. Case-control studies are studies where 'cases' (people with a condition or disease, for instance) are specifically matched to their 'controls' by age, sex, occupation, demographics, etc.; this matching is done to control for potential confounding. Once matched, these studies look back, retrospectively, to see if any differences in behaviour or exposure can account for the presence of the disease or condition. They are often used to study, and hypothesise about, risk factors for disease.
56
Case-Control Studies. Schematic of the case-control study. As we're looking retrospectively, case-control studies are often also subject to recall bias: people with a disease or illness are often more inclined to have made associations with the presumed cause of their illness, and to remember a potential exposure or event more readily than a control, who may have had the same exposure but did not develop any illness. Biomedical Library, University of Minnesota, 2002.
57
Case-Control Study: inexpensive; little manpower required; fast.
No indication of absolute risk. The advantages of the case-control study are outlined above; furthermore, as aggregate data are determined for the case-control pairs, they allow specific analysis of associations. The main disadvantage is listed below: an exposure may increase the occurrence of a disease 5-fold, but in absolute terms this may only be a change from 1 in 5 million to 1 in 1 million - a case-control study cannot shed any light on the absolute change.
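The arithmetic behind that example, as a one-line check (the rates are the slide's illustrative figures, not real data):

```python
# Relative vs absolute risk for the example above; illustrative rates only.
baseline = 1 / 5_000_000                 # 1 in 5 million
exposed = 5 * baseline                   # 5-fold relative increase -> 1 in 1 million
print(f"exposed risk: {exposed:.1e}")                                     # 1.0e-06
print(f"absolute increase per million: {(exposed - baseline) * 1e6:.1f}")  # 0.8
```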
58
Case series/Case reports
Tracks patients given a similar treatment (prospective), or examines medical records for exposure and outcome (retrospective); a detailed report of an individual patient; may identify new diseases and adverse effects. Case series and case reports represent anecdotal evidence. They are clearly subject to selection bias, but are incredibly important for the reporting of unusual or novel conditions, which may be specific to your patient or topic of interest. Often a case report may be the only evidence you will find on a topic. Clearly there is no control or comparison involved.
59
Case series/Case reports
In this schematic, one line represents a case report; the whole set represents a case series.
60
Cross-sectional Studies
Takes a 'slice' or 'snapshot' of a target group: the frequency and characteristics of disease/variables in a population at a point in time; often uses survey research methods; also called prevalence studies. Another study design which uses no control group is the cross-sectional study, which, as the name suggests, simply takes a slice through a population of interest at a given point in time. Cross-sectional studies often rely on secondary analysis of data collected for another purpose, e.g. census data, or use surveys to collect the data of interest. The major drawback of the cross-sectional study is that you can't say which came first, the exposure of interest or the disease. Often, if an exposure-disease relationship is suggested by a cross-sectional study, it will lead to a more robust case-control study, cohort study or RCT.
61
Appraising comparable Cohort and Case-control studies JBI-MAStARI Instrument
Now, understanding some of the important aspects of observational study design and how these studies are fundamentally different from experimental studies, we’ll meet the JBI appraisal instrument for cohort/case-control studies. This is the critical appraisal tool or checklist for the appraiser to use when assessing the quality of a cohort or case-control study. It is a credible JBI tool which is in the JBI-MAStARI software package. It represents a list of questions that need to be considered when examining observational studies. All questions are answered either yes, no or unclear. Yes indicates that there is a clear statement in the paper which directly answers the question. No is where the question has been directly answered in the negative. Unclear is where there is no clear statement, or there is ambiguous information.
62
Appraising descriptive/case series studies JBI-MAStARI Instrument
Similarly, we’ll meet the JBI appraisal instrument for descriptive/case series studies. This is the critical appraisal tool or checklist for the appraiser to use when assessing the quality of a descriptive/case series study. It is a credible JBI tool which is in the JBI-MAStARI software package. It represents a list of questions that need to be considered when examining observational/descriptive studies. All questions are answered either yes, no or unclear. Yes indicates that there is a clear statement in the paper which directly answers the question. No is where the question has been directly answered in the negative. Unclear is where there is no clear statement, or there is ambiguous information.
63
Group Work 2 Working in pairs: Reporting Back
critically appraise the cohort study in your workbook critically appraise the case control study in your workbook Reporting Back Now use the two appraisal sheets to appraise the two studies you have been provided with.
64
Session 4: Study data and Data Extraction
We have now discussed how to appraise the quality of experimental and observational studies. Once you have completed the appraisal process of your retrieved articles with your secondary reviewer, it is time to turn your attention to extracting the relevant data from your included literature. Data extraction does not refer solely to collecting the relevant numbers related to your outcomes of interest, although these are extremely important values for your review; it also refers to the relevant descriptive data which will be necessary to present in your review. All of this data can be extracted at the same time with use of the appropriate forms.
65
Considerations in Data Extraction
Source - citation and contact details Eligibility - confirm eligibility for review Methods - study design, concerns about bias Participants - total number, setting, diagnostic criteria Interventions - total number of intervention groups Outcomes - outcomes and time points Results - for each outcome of interest: sample size, etc Miscellaneous - funding source, etc Data extraction forms should include detailed information on allocation methods, attrition, assessment and analysis. Information on interventions should include treatment modalities and the amount, duration, frequency and intensity of the intervention. Participant characteristics should include demographic information such as age, gender, location etc. Data extraction for outcome measures includes recording information such as the name of the instrument, the method used to obtain the data, and the validity and reliability of that method. It is also important to record the mode of measurement and the different scales used along with their grading. Statistical data is required to calculate effect measures, therefore it is important that recorded data include the number of people (usually N) assigned to the treatment and control comparison groups and all statistical tests used to test differences between the two groups. Data for continuous measures includes means and standard deviations; data for dichotomous measures includes the number of cases that experienced an event (in both treatment and control groups) and the total number of cases in each group (N). Consider blinding (removal of author information, journal names etc.) during data extraction and compute and report inter-rater reliability, if possible.
66
Quantitative Data Extraction
The data extracted for a systematic review are the results from individual studies specifically related to the review question. Difficulties related to the extraction of data include: different populations used different outcome measures different scales or measures used interventions administered differently reliability of data extraction (i.e. between reviewers) The systematic review pools together the results of two or more individual studies. We need to go through each individual study and extract the data that is applicable to our review question. Some of the difficulties that can arise when extracting data from a paper are ensuring that the data is in line with our review question, and also in comparing it with the other papers that are to be used in the systematic review. Things to be considered include different populations, different outcome measures, different scales, interventions administered differently, and the reliability of the data extraction between the reviewers.
67
Minimising Error in Data Extraction
Strategies to minimise the risk of error when extracting data from studies include: utilising a data extraction form that is developed specifically for each review pilot testing the extraction form prior to commencement of the review training and assessing data extractors having two people extract data from each study blinding extraction before conferring To overcome the difficulties associated with data extraction, and to minimise the risk of error, there are several steps that you can take. As with the critical appraisal process, you should utilise a data extraction tool that has been agreed and tested, and is based on the review criteria. This may be a form which has been specifically developed for your review to ensure those extracting the data do not miss any relevant information. You should pilot test the data extraction tool prior to using it. Each reviewer involved should be trained in the process of data extraction, and two reviewers should extract data from each paper. This is to try and eliminate transcription errors. Each reviewer should do their data extraction individually before the two reviewers confer on the answers they have sourced.
68
Data most frequently extracted
[Study flow from the example review: 1004 references identified; 172 duplicates removed, leaving 832 references whose titles/abstracts were scanned; 715 did not meet the inclusion criteria, leaving 117 studies retrieved; 82 did not meet the criteria, leaving 35 studies for critical appraisal; 9 studies were excluded, leaving 26 studies included in the review.] Data most frequently extracted After critical appraisal in this example, 26 studies have been included and will appear in the body of the review itself - with specific details related to the outcomes of interest in the results section of the review. These details will also appear alongside each study citation in the table of included studies. The most frequently extracted data, however - the data which will have the most direct bearing on the review question - is the outcome data you extract from the literature…
69
Outcome Data: Effect of Treatment or Exposure
Dichotomous Effect/no effect Present/absent Continuous Interval or ratio level data BP, HR, weight, etc In a quantitative review of effects we are looking for numerical data. All numerical data comes in two distinct types. Discrete (or attribute) data are numeric data that have a finite number of possible values, cannot be subdivided meaningfully and represent information that can be categorised into a classification. Discrete data is based on counts, typically in whole numbers and exact. An example of discrete data is a finite subset of numbers (1,2,3,4,5), corresponding to, for example, No Pain…Strong Pain. Another example might be how many students were absent on a given day. When looking at outcome data, particularly for calculating an overall estimate of effect using meta-analysis, discrete data is either found as, or treated as, dichotomous, or binary, data. For instance, if looking at an outcome of mortality, subjects are either dead or alive; considering a tumour, it is either present or absent. Continuous data can have almost any numeric value and, unlike discrete data, can be meaningfully subdivided into finer and finer increments, depending upon the precision of the measurement system. The numbers are continuous with no gaps or interruptions. Measurable quantities including length, volume, time, money, temperature, heart rate and blood pressure are examples of continuous data. A particular data item may have a minimum and a maximum value; continuous data can be any value in between.
70
What do you want to know? Is treatment X more effective than treatment Y? Is exposure to X more likely to result in an outcome or not? How many people need to receive an intervention before someone benefits or is harmed? The outcome data you look for and extract from the literature will be dependent on the information that you are looking for in relation to your review question. Are you interested in a comparison between different interventions? This data will arise from a study such as the parallel groups RCT we saw earlier, e.g. is drug A more effective at lowering blood pressure than drug B? To answer this question, data will often be presented as odds. Your question may direct you towards outcome data related to whether a treatment is more likely to produce an outcome - this may be positive or negative and will often be presented as values of risk. Or your question may be related to the number of people you need to treat (or expose) before one more person benefits from or is harmed by the treatment.
71
Risk Risk = # times something happens / # opportunities for it to happen
“Risk” of birthing a baby boy? One boy is born for every 2 opportunities: 1/2 = .5. That is: 50% probability (risk) of having a boy. One of every 100 persons treated has a side-effect: 1/100 = .01. The bulk of the discussion will focus on dichotomous data. Risk, simply put, is the probability of an event, and can be referred to as the absolute risk. This is done mainly to differentiate it from relative risk, which we will encounter shortly. It can be calculated as the number of times something happens over the total number of opportunities it potentially has to happen. Explain examples as per text on slide.
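To make the arithmetic concrete, here is a minimal Python sketch (our own illustration, not part of the JBI materials; the function name is ours) of risk as events over opportunities:

def absolute_risk(events, opportunities):
    # Risk = number of times something happens / number of opportunities for it to happen
    return events / opportunities

print(absolute_risk(1, 2))    # 'risk' of birthing a boy: 0.5
print(absolute_risk(1, 100))  # side-effect in 1 of every 100 treated: 0.01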
72
Relative Risk (RR) Ratio of risk in exposed group to risk in not exposed group (Pexposed/Punexposed) The RR of anaemia during pregnancy = the risk of developing anaemia for pregnant women divided by the risk of developing anaemia for women who are not pregnant. The RR of further stroke for patients who have had a stroke = risk of a stroke within one year post stroke divided by the risk of having a stroke in one year for a similar group of patients who have not had a stroke. The relative risk, also called the risk ratio, is the ratio of the risk of an event among an exposed population (treatment or experimental group) to the risk among the unexposed (control or placebo group). As you can see, it is different from the simple absolute risk just addressed. Explain examples as per text on slide.
73
For example A trial examined whether patients with chronic fatigue syndrome improved 6 weeks after treatment with i.m. magnesium. The group who received the magnesium were compared to a placebo group and the outcome was feeling better. In this example, a clinical trial examined whether patients with chronic fatigue syndrome improved 6 weeks after treatment with intramuscular magnesium. The group who received the magnesium were compared to a group who received a placebo and the outcome was feeling better. The risk following treatment is a/(a+c) = 12/15 = 0.80; without exposure, i.e. on the placebo injection, the risk is b/(b+d) = 3/17 = 0.18. The RR is the ratio of these two probabilities: 0.80/0.18 = 4.5 times the chances of improvement! ‘Risk’ of improvement on magnesium = 12/15 = 0.80. ‘Risk’ of improvement on placebo = 3/17 = 0.18. Relative risk (of improvement on Mg2+ therapy vs placebo) = 0.80/0.18 = 4.5. Thus patients on magnesium therapy are about 4.5 times more likely to feel better on magnesium than on placebo.
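As a rough illustration (a sketch of ours, not JBI code), the magnesium example can be reproduced in a few lines of Python:

# Magnesium trial: 12 of 15 improved on magnesium, 3 of 17 on placebo
risk_mg      = 12 / 15                  # 0.80
risk_placebo = 3 / 17                   # ~0.18

relative_risk = risk_mg / risk_placebo  # ~4.5
print(round(relative_risk, 1))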
74
Interpreting Risk What does a relative risk of 1 mean?
That there is no difference in risk in the two groups. In the magnesium example it would mean that patients are as likely to “feel better” on magnesium as on placebo If there was no difference between the groups the confidence interval would include 1 It is important to know whether relative or absolute risk is being presented as this influences the way in which it is interpreted What else can we determine from the values for risk? As we saw earlier, our chances of having a boy are 1:1 or even; similarly, if we have a relative risk of 1, there is really no difference! In the magnesium example, if the relative risk was 1, it would mean that patients are as likely to feel better on magnesium as on placebo. We can also gain further information about these point estimates by using a measure of dispersion such as a confidence interval. We will discuss these in more detail later; however, if the confidence interval includes 1, then we can conclude there is no significant increase or decrease in risk in the intervention or treatment group relative to the control group - similar to saying p > 0.05 - since at the 95% level the data are consistent with there being no added risk with the treatment.
75
Issues with RR – defining success
Treatment A: success 0.96, failure 0.04. Treatment B: success 0.99, failure 0.01. It is important to note that results can be interpreted quite differently depending on your approach or perspective. For example, if we are interested in the “success” of an intervention, or how beneficial it is, we may find Treatment A is quite good, with an absolute risk of improvement of 0.96, whilst Treatment B is even more attractive with a risk of 0.99. The RR of this improvement is 0.97 when comparing Treatment A to B. If, however, you are interested in the other side of the coin - the failure of the intervention or treatment - then the seemingly small absolute difference, when looked at in terms of RR, can become quite pronounced, with Treatment A 4x as likely to fail as B. If the outcome of interest is success then RR = 0.96/0.99 = 0.97. If the outcome of interest is failure then RR = 0.04/0.01 = 4.
76
Absolute Risk Difference
Is the absolute additional risk of an event due to an exposure. Risk in exposed group minus risk in unexposed (or differently exposed) group. Absolute risk reduction (ARR) = Pexposed - Punexposed. If the absolute risk is increased by an exposure we sometimes use the term Absolute Risk Increase (ARI). Another useful statistic is the absolute risk difference: rather than express the risks in the exposed and unexposed groups as a ratio, these same ‘risks’ are simply expressed as a difference, a/(a+c) - b/(b+d). Absolute Risk Reduction (ARR) is the absolute difference in rates of harmful outcomes between the experimental and control groups when the experimental intervention reduces the risk; Absolute Risk Increase (ARI) is the corresponding difference when the experimental intervention increases the risk. ARI is calculated as the risk of the harmful outcome in the experimental group minus the risk of the harmful outcome in the control group: a/(a+c) - b/(b+d).
77
For example ‘Risk’ of improvement on magnesium = 12/ 15 = 0.80
From the previous example of comparing magnesium therapy and placebo: ‘Risk’ of improvement on magnesium = 12/15 = 0.80. ‘Risk’ of improvement on placebo = 3/17 = 0.18. Absolute risk reduction = 0.80 - 0.18 = 0.62. For example, follow as per text on slide.
78
Number Needed to Treat The additional number of people you would need to give a new treatment to in order to cure one extra person compared to the old treatment. For a harmful exposure, the number needed to harm is the additional number of individuals who need to be exposed to the risk in order to have one extra person develop the disease, compared to the unexposed group. Number needed to treat = 1 / ARR. Number needed to harm = 1 / ARI (equivalently, 1 / ARR ignoring the negative sign). The number needed to treat is the additional number of people you would need to give a new treatment to in order to cure one extra person compared to the old treatment. Alternatively, for a harmful exposure, the number needed to treat becomes the number needed to harm: the additional number of individuals who need to be exposed to the risk in order to have one extra person develop the disease, compared to the unexposed group. NNH can be expressed as above, or as the inverse of the absolute risk increase (ARI) expressed as a percentage (100/ARI).
79
For example From the previous example of comparing magnesium therapy and placebo: ‘Risk’ of improvement on magnesium = 12/15 = 0.80. ‘Risk’ of improvement on placebo = 3/17 = 0.18. Absolute risk reduction = 0.80 - 0.18 = 0.62. Number needed to treat (to benefit) = 1 / 0.62 = 1.61 ≈ 2. Thus on average one would give magnesium to 2 patients in order to expect one extra patient (compared to placebo) to feel better. Again, in our example with the magnesium treatment for chronic fatigue syndrome, following calculation of the risks in each group and the absolute risk reduction, the NNT can be easily calculated as 1/ARR, as seen in red.
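A minimal sketch (ours, not from the JBI materials) of the ARR and NNT calculations from the same example:

import math

risk_mg, risk_placebo = 12 / 15, 3 / 17

arr = risk_mg - risk_placebo          # absolute risk reduction, ~0.62
nnt = 1 / arr                         # ~1.61; round up to whole patients
print(round(arr, 2), math.ceil(nnt))  # 0.62 2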
80
Odds Odds = # times something happens / # times it does not happen
What are the odds of birthing a boy? For every 2 births, one is a boy and one isn’t: 1/1 = 1. That is: the odds are even. One of every 100 persons treated has a side-effect: 1/99 = .0101. The odds refers to the probability an event will occur against the probability that it will not occur, or the number of times something happens over the number of times it does not happen. Discuss as per examples on slide.
81
Odds Ratio Ratio of odds for exposed group to the odds for not exposed group: {Pexposed / (1 - Pexposed)} / {Punexposed / (1 - Punexposed)} Earlier on, we met the odds, which refers to the probability an event will occur against the probability that it will not occur. The odds ratio is simply the ratio of the odds for the exposed group to the odds for the not exposed group.
82
For example From the previous example of comparing magnesium therapy and placebo: Odds of improvement on magnesium = 12/3 = 4.0. Odds of improvement on placebo = 3/14 = 0.21. Odds ratio (of Mg2+ vs placebo) = 4.0 / 0.21 = 19.0. Therefore, improvement was 19 times more likely in the Mg2+ group than the placebo group. Again, in our clinical trial examining whether patients with chronic fatigue syndrome improved 6 weeks after treatment with intramuscular magnesium, the group who received the magnesium were compared to a group who received a placebo and the outcome was feeling better. The odds of improvement with and without the treatment are calculated; the OR is the ratio of these two odds: 4.0/0.21 = 19 times the odds of improvement on the treatment!
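The odds-based version of the same calculation, again as an illustrative sketch of our own:

# Magnesium trial: 12 improved / 3 did not on magnesium; 3 improved / 14 did not on placebo
odds_mg      = 12 / 3                # 4.0
odds_placebo = 3 / 14                # ~0.21

odds_ratio = odds_mg / odds_placebo
print(round(odds_ratio))             # ~19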
83
Relative Risk and Odds Ratio
The odds ratio can be interpreted as a relative risk when an event is rare and the two are often quoted interchangeably. This is because when the event is rare, (b+d) → d and (a+c) → c. Relative risk = [a/(a+c)] / [b/(b+d)]. Odds ratio = [a/c] / [b/d]. When comparing the RR and OR and their uses: the odds ratio and relative risk are the same when an event is rare, therefore the two are often quoted interchangeably. This is because if the event ‘a’ is rare, (a+c) will effectively equal c, and (b+d) will be no different from d, i.e. those who did not develop any sign of the disease. The RR then reduces to the same expression as the OR.
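To see the rare-event behaviour numerically, here is a short sketch of our own, with the 2x2 cells named as on the earlier slides (a/c = events/non-events in the exposed group, b/d in the unexposed group):

def rr_and_or(a, b, c, d):
    # RR compares risks; OR compares odds
    rr = (a / (a + c)) / (b / (b + d))
    or_ = (a / c) / (b / d)
    return round(rr, 2), round(or_, 2)

# Rare event (10 vs 5 events per 10,000): RR and OR are nearly identical
print(rr_and_or(10, 5, 9990, 9995))  # (2.0, 2.0)
# Common event (80 vs 40 events per 100): they diverge sharply
print(rr_and_or(80, 40, 20, 60))     # (2.0, 6.0)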
84
Relative Risk and Odds Ratio
For case-control studies it is not possible to calculate the RR and thus the OR is used. For cohort and cross-sectional studies, both can be derived. ORs have mathematical properties which make them more often quoted for formal statistical analyses. When can you expect to see these various measures appear in the literature? There are differences and preferences dependent on the study design used. For case-control studies it is not possible to calculate the RR and thus the OR is used. For cohort and cross-sectional studies, both can be derived. If it is unclear which is the causal variable and which is the outcome, the OR should be used as it is symmetrical, in that it gives the same answer if the causal and outcome variables are swapped. ORs have mathematical properties which make them more often quoted for formal statistical analyses.
85
Continuous data Means, averages, change scores etc.
E.g. BP, plasma protein concentration - any value, often within a specified range. Mean, standard deviation, N. Often only the standard error, SE, is presented: SD = SE x √N. The data required for meta-analysis of continuous outcomes includes the sample size, the mean response and the standard deviation (SD) for both comparison groups, treatment and control. If only the standard error (SE) is reported, the SD can be calculated from the SE, as long as the sample size (n) is known. The equation for converting from SE to SD is simply: SD = SE x √n. Extra points: Combining dichotomous and continuous outcomes - dichotomise continuous outcomes, e.g. a 10-point pain scale (continuous data) can be dichotomised into pain vs no pain with a cut-off point of 5. However, the problem with cut-off points is that they are arbitrary. There are statistical formulas to convert different effect-size indices (odds ratio, risk ratio, mean difference) to a common effect-size index.
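The SE-to-SD conversion is a one-liner; this sketch (ours, with made-up numbers) shows it:

import math

def sd_from_se(se, n):
    # A paper reporting only the SE: recover SD = SE * sqrt(n)
    return se * math.sqrt(n)

print(sd_from_se(0.5, 36))  # SE 0.5 with n = 36 gives SD 3.0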
86
MAStARI Data Extraction Instrument
This is the data extraction form that is used for experimental studies. This is the paper based version that does not appear exactly the same in JBI-MAStARI, but it is the same information that is being extracted. The first boxes are to include the details of the paper - authors, journal, and year. The record number is purely for your own use, as a reference number for future access. The reviewer is you. You should be as specific as possible when extracting the data; it can save a lot of effort and time in having to read over the paper again, when you could have put the information on the data extraction sheet. The method is to describe what they used, i.e. randomised controlled trial. The setting describes the geographical location, as well as the environment, e.g. large metropolitan teaching hospital in Adelaide, Australia. The location can be important if you are considering comparing two studies - one located in a large New York hospital, and the other located in a community hospital in Botswana? The populations are likely to be quite different. The participants should be identified in as much detail as possible, for example: African-American women who are pregnant. It is also helpful to include the total number of participants in this section. The number of participants in each group should be entered into the boxes. Group A refers to the treatment or intervention group, and group B is the control or placebo group. The interventions are then listed for each of the groups. Group A may be receiving massage therapy, and Group B, the control, receives no treatment, or the standard (if it is the standard, then what the standard actually is should be described). Occasionally, a trial will have 2 intervention groups, and then one intervention would be group A, the other group B, and the control would be group C. Data extraction continues over
87
Data Extraction Form Cont
You now need to enter the results from the study that are appropriate to your review question. There are three separate tables to enter results: one for the outcomes measured in the study, one for results with dichotomous data, and one for results with continuous data. Outcome measures require a description of the outcome and the specific scale or measurement used. For example, the outcome may be anxiety and the specific measure is the State-Trait Anxiety Inventory. Another example may be weight, measured in kgs. Dichotomous data is the sort of data that is either one or the other, i.e. yes/no, male/female, black/white. Within the table you are required to enter the outcome measure (e.g. infection), and then, across the line within the treatment group, enter the number of participants with the outcome (those that have an infection) over the total number of study participants. The same is done for the control group. Continuous data uses a scale for its outcome, such as weight, blood pressure, pulse. The outcome measure is again entered (e.g. cholesterol), and then in the treatment group you enter the mean and standard deviation (SD), and again in the control column. The Authors Conclusion is to provide what the authors have surmised about their paper. The Reviewers Conclusions is then an area for you to enter comments you have in relation to the paper - for example, were the author’s conclusions in line with the results? It is the sort of thing that will remind you of the contents of the paper.
88
Group Work 3 Working in pairs: Reporting Back
Extract the data from the two papers in your workbook Reporting Back Now, working in pairs, you will have some time to extract the relevant outcome data from the experimental/RCT papers you appraised earlier. We will use this same data for our synthesis/analysis using JBI MASTARI software tomorrow.
89
Session 5: Protocol development
Allow time for participants to conclude preparing their protocols.
90
Program Overview Day 2
Time / Session / Group Work
0900 Overview of Day 1
0915 Session 6: Data analysis and meta-analysis
1030 Morning Tea
1100 Session 7: Appraisal, extraction and synthesis using JBI-MAStARI - Group Work 4: MAStARI trial; report back
1230 Lunch
1330 Session 8: Protocol Development - Protocol development
1415 Session 9: Assessment - MCQ Assessment
1445 Afternoon tea
1500 Session 10: Protocol Presentations - Protocol Presentations
1700 End
Timetable for Day 2 - the focus will be on data synthesis and how to do meta-analysis, use of the CReMS and MAStARI software, and on completing and presenting your protocols.
91
Overview Recap Day 1 Critical appraisal Study design
Type of studies (experimental and observational) Data extraction Today the focus is on data analysis and synthesis. Recap yesterday briefly and look at important issues pertinent to the appraisal process and quality assessment. Today the focus will be on data analysis and issues around meta-analysis, on an introduction to the MAStARI software, and on completing and presenting your protocols for a systematic review of quantitative data.
92
Session 6: Data Analysis and Meta-synthesis/Meta-analysis
Give a current progress update. So far you have developed your protocol, done the search and retrieved the appropriate papers. Yesterday, we focussed on the critical appraisal of, and extraction of data from, the papers you may encounter when undertaking a quantitative review of effects - papers which differ both in their study design and in their topic of interest, for example diagnosis and prognosis. Now we will move on to the important combination of outcome data using meta-synthesis/meta-analysis of extracted data…
93
General Analysis - What Can be Reported and How
What interventions/activities have been evaluated The effectiveness/appropriateness/feasibility of the intervention/activity Contradictory findings and conflicts Limitations of study methods Issues related to study quality The use of inappropriate definitions Specific populations excluded from studies Future research needs Prior to pooling the data in a meta-analysis, there are a number of things to be considered in the general analysis.
94
Meta Analysis
[Study flow from the example review: 1004 references identified; 172 duplicates removed, leaving 832 references whose titles/abstracts were scanned; 715 did not meet the inclusion criteria, leaving 117 studies retrieved; 82 did not meet the criteria, leaving 35 studies for critical appraisal; 9 studies were excluded, leaving 26 studies included in the review - 6 studies included in the meta-analysis and 20 in the narrative summary.] From the studies in our ‘example’ review, it may be that not all report on the same outcome - only 6 here will make it into our meta-analysis whilst the rest will just be reported on in the text. Remember we need at least 2 studies to be able to do statistical combination. Talk here about meta-analysis. It is a systematic procedure for summarising and pooling the results from 2 or more research studies. We combine the results of similar individual studies to give a cumulative result, which may then differ from the results of the individual studies. If possible, this is then reported in a visual representation, as a meta-view (MAStARI-view). We will now delve further into meta-analysis.
95
Statistical methods for meta-analysis
Quantitative method of combining results of independent studies Aim is to increase precision of overall estimate Investigate reasons for differences in risk estimates between studies Discover patterns of risk amongst studies It is a systematic procedure for summarising and pooling the results from 2 or more research studies. We combine the results of similar individual studies to give a cumulative result, which may then differ from the results of the individual studies. Meta-analysis of RCTs aims to improve the precision of the overall estimate of ‘effect’. For observational designs, the aim will be to increase the strength of correlation or association between exposure and outcome, to investigate reasons for differences in risk estimates, or to discover patterns of risk amongst studies.
96
When is meta-analysis useful?
If studies report different treatment effects. If studies are too small (insufficient power) to detect meaningful effects. Single studies rarely, if ever, provide definitive conclusions regarding the effectiveness of an intervention. When there is variation in the effect size - that is, if each study reports a difference in treatment effects, such as the percentage reduction in the incidence of a condition. Often a study does not carry enough weight to reach statistical significance because of its sample size. If we are able to combine multiple, good quality, similar small studies, this can increase the power and possibly provide a meaningful result. We should not use the results of one single study to provide us with a definitive conclusion on the effectiveness of an intervention, but rather pool the results of multiple similar studies to confirm the effectiveness.
97
When meta-analysis can be used
Meta analysis can be used if studies: have the same population use the same intervention administered in the same way measure the same outcomes Homogeneity: studies are sufficiently similar to estimate an average effect. We have seen from the previous slide when meta-analysis can be useful; now we need to see in what situations it can be used. Studies to be included in meta-analysis should be similar to each other so that generalisation of results is valid. This is referred to as homogeneity, and is assessed in MAStARI using Chi-square. The four main criteria that must be considered are: patient population (e.g. is it valid to combine the results of studies on different races of people, or different aged people?) outcome (e.g. is it valid to combine studies that have measured pain via a visual analogue scale with those that have used a pain diary?) intervention (e.g. are the interventions being given to the ‘treatment’ group in each study similar enough to allow meta-analysis?) control (e.g. are the control groups in each study receiving treatment similar enough to warrant combination and meta-analysis?) The questions raised above can be very difficult to answer and often can involve subjective decision making. Involvement of experienced systematic reviewers and/or researchers with a good understanding of the clinical question being investigated should help in situations where judgement is required. These situations should be clearly described and discussed in the systematic review report.
98
Calculating an Overall Effect Estimate
Odds Ratio for dichotomous data eg. the outcome present or absent 51/49 = 1.04 (no difference between groups = 1) Weighted mean difference Continuous data, such as weight (no difference between groups = 0) Confidence Interval The range in which the real result lies, with the given degree of certainty The odds ratio is the measure used for dichotomous data (sometimes referred to as binary data), which we have explained is data where the outcome is either present or not. The odds of an event are calculated as the number of participants experiencing an outcome divided by the number of participants who do not. For example, in a study of 100 patients, 10 of 50 patients receiving treatment A develop an infection, while 16 of 50 patients receiving treatment B develop an infection. The odds of developing infection are calculated as 10/40 (0.250) for treatment A and 16/34 (0.471) for B. The OR for developing an infection on treatment A compared to treatment B is calculated by dividing the odds of infection on treatment A by the odds of infection on treatment B (ie 0.250/0.471, or 0.531). For dichotomous data, there is no difference between the groups when the OR equals 1, i.e. the line of no effect is 1. We will discuss the line of no effect on the next slide. Because the OR is less than 1, it is apparent that the odds of infection are less on treatment A than B. An OR greater than 1 would indicate that the odds of infection are greater on treatment A than B. An OR of 0.531 is equivalent to a reduction in the odds of infection of 46.9% (ie 100 x (1 - 0.531)). With continuous data, we use the weighted mean difference, where zero is the line of no effect. The Weighted Mean Difference measures the difference in means of each study when all outcome measurements are made using the same scale. It then calculates an overall difference in means for all studies (this is equivalent to the effect size) based on a weighted average of all studies. The confidence interval…
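The infection example above, worked through as a short sketch of our own:

# Treatment A: 10 of 50 infected; Treatment B: 16 of 50 infected
odds_a = 10 / 40           # 0.250
odds_b = 16 / 34           # ~0.471

or_ab = odds_a / odds_b    # ~0.531, i.e. lower odds of infection on A
print(round(or_ab, 3))
print(round((1 - or_ab) * 100, 1), "% reduction in the odds of infection")  # 46.9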
99
Confidence Intervals Confidence intervals are an indication of how precise the findings are Sample size greatly impacts the CI the larger the sample size the smaller the CI, the greater the power and confidence of the estimate The Confidence Interval is a measure of how confident we are in the observed effect. A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. Confidence intervals are usually calculated so that this percentage is 95%, but we can produce 90%, 99% or 99.9% confidence intervals for the unknown parameter. The width of the confidence interval gives us some idea about how uncertain we are about the unknown parameter. A very wide interval may indicate that more data should be collected before anything very definite can be said about the parameter.
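One common way to obtain such an interval for an odds ratio is via the log odds ratio and its standard error; here is a sketch of our own (not the MAStARI algorithm) using the magnesium 2x2 table:

import math

def or_with_ci(a, b, c, d, z=1.96):
    # a/c = events/non-events in the treated group, b/d in the control group
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Magnesium example: OR ~18.7, 95% CI roughly 3 to 110 - wide, but it excludes 1
print(or_with_ci(12, 3, 3, 14))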
100
CIs indicate: When calculated for OR, the CI provides the upper and lower limit of the odds that a treatment may or may not work If the odds ratio is 1, odds are even and therefore, not significantly different recall the odds of having a boy When calculated for an OR, the CI provides the upper and lower limit of the odds that a treatment may or may not work. As we have already mentioned, if the calculated odds of an outcome following an intervention/exposure equals 1, then the odds are even, and there is no significant difference (p > 0.05).
101
You will now be presented with the metaview graph
You will now be presented with the metaview graph. From the graph you are able to see the two studies that were included, and the meta-analysis (referring to the graphical representation rather than the numbers). The first study is situated on the left hand side of the graph, favouring music, with the confidence interval quite narrow, and not crossing the centre line – the line of no effect. This study, Winter et al, is therefore statistically significant. The second study here, Augustin & Hains, is on the right hand side of the graph, thus favours no music. This had a wide CI, which crosses the line of no effect, and is therefore not statistically significant. We mentioned previously about each study being allocated a weighted percentage. This can depend on the number of participants, the number of events, and the level of variance. The program works this weight out for us. The % of weight is shown numerically; here it is 94.36% for study 1, and 5.64% for study 2. This weight is then also proportionate to the size of the square used. You are able to see the exact numbers for the WMD and the CI on the right of the graph, and you could use those numbers to work out their effect and significance also. As predicted on the previous slide using the figures we gained from the meta-analysis, the combined data, represented by the diamond, falls on the left hand side of the graph, and does not cross the line, and is thus statistically significant in favour of music. Along the bottom of the graph is a Test for Heterogeneity. Looking at the P value in the brackets: we don’t want this to be less than 0.05. Here it is equal to 0.12, and thus the studies are not heterogeneous. Below this is the test for overall effect, with Z = 3.62 and P < 0.05, so there is overall statistical significance.
102
The Meta-view Graph Results of different studies combined
Favours treatment Favours control No effect Results of different studies combined When a meta-analysis is complete, it is often then represented as a metaview graph. Showing the results as a metaview graph can often be helpful to increase the understanding of the results. The vertical line of the metaview graph represents when the treatment and control show the same effect, and thus there is no statistical difference between them. This is what we referred to as the line of no effect, where we said that for dichotomous data it is 1, and for continuous data it is 0. The horizontal line at the bottom is the scale measuring the treatment. Usually, the left hand side of the graph favours the treatment outcome, and the right hand side favours the control outcome. You do however need to be careful to read the labels, as those on the left may not always refer to treatment. The top 5 horizontal lines within the graph are the confidence intervals of five individual studies. In this example, 3 of these lines cross over the vertical line, the line of no effect, and are therefore not statistically significant. Two of the studies do not cross the line, so are statistically significant. Each of the studies is given a square that falls on either side of the graph, and the side it falls on states which effect it favours. The size of the square is related to the percentage of weight allocated to that study. When the multiple studies are pooled together, the results are shown at the bottom of the metaview (the line closest to the bottom of the graph), with a diamond representing the combined studies. The horizontal width of the diamond relates to the confidence interval. In our slide here, the CI does not cross the line of no effect, so it can be said that after pooling the studies together, the overall effect is statistically significant (in favour of the treatment).
103
Heterogeneity Is it appropriate to combine or pool results from various studies? Different methodologies? Different outcomes measured? Problem greater in observational than clinical studies Studies need to be homogeneous to be able to be combined in meta-analysis. The first question is generally easier to answer when considering RCTs alone and experiments which are conducted in a similar way. Observational studies introduce new difficulties: different methodologies can lead to greater variation between studies - greater than would be expected due to sampling alone. It can be a major concern when several study designs are combined together.
104
Difference between studies
Heterogeneity Favours treatment Favours control No effect Difference between studies Here you see multiple studies have vastly different outcomes, falling on both sides of the metaview. This is referred to as being heterogeneous. You can see this by a poor overlap of the confidence intervals. When used in relation to meta-analysis, the term ‘heterogeneity’ refers to the amount of variation in the results of included studies. For example, if you have three studies to be included in a meta-analysis, do they each show a similar effect (say, a better outcome for patients in the treatment group), or do the different studies show different patterns? While some variation between the results of studies will always occur due to chance alone, heterogeneity is said to occur if there are significant differences between studies, and under these circumstances meta-analysis is not valid and should not be undertaken. But how does one tell whether or not differences are significant? Visual inspection of the meta-analysis is the first stage of assessing heterogeneity. A metaview plots the results of individual studies and thus indicates the magnitude of any effect between the treatment and control groups. Do the individual studies show a similar direction and magnitude of effect – ie are the rectangular symbols at similar positions on the X-axis? A formal statistical test of the similarity of studies is provided by the test of homogeneity. This test calculates a probability (P value) from a chi-square statistic calculated using estimates of the individual studies’ weights, effect sizes and the overall effect size. When combining these studies, you need to be cautious, and consider why the results of the studies are so different - was it the outcome, or the population? Statistical heterogeneity, as designated by significance (either P < 0.05 or 0.10, depending on the level that you are comfortable with) in the test of homogeneity, suggests that there is not one single treatment effect but rather a range of treatment effects, and thus it is not appropriate to attempt to calculate a single treatment effect. Deeks, Higgins and Altman (2006) provide six alternatives to conducting meta-analysis when heterogeneity is indicated: 1. Double-check the data extracted from included studies. Possible errors could include using the incorrect units, not standardising units of measurement between your studies, or using standard errors instead of standard deviations. 2. Do not proceed with the meta-analysis. Consider alternatives such as summarising the results of included studies in a narrative summary. 3. Explore the heterogeneity, either through a stratified analysis (also known as sub-group analysis) or meta-regression. Further information about sub-group analysis and meta-regression is included in the Cochrane Handbook for Systematic Reviews of Interventions. 4. Perform a random effects meta-analysis. A random effects model does not assume homogeneity, and thus can be used as a way of examining the meta-analysis of heterogeneous studies. Note, however, that the studies’ results are still heterogeneous, with all of the implications about whether it is valid to combine the results of heterogeneous studies. In other words, the random effects model does not analyse out the heterogeneity, and the results of random effects meta-analysis of heterogeneous studies should be interpreted very cautiously. 5. Change the effect measure. You should check that you have used the correct effect measure.
For example, if you are combining the effects of continuous studies that have measured the same outcome using a different measurement scale, the standardised mean difference is the appropriate effect measure. Similarly, check that the effect measure selected for dichotomous outcomes (odds ratio and relative risk) is appropriate for the included studies. 6. Exclude studies. In general, it is difficult to justify the exclusion of a study that would otherwise have been included, solely on the basis that it adds heterogeneity to the data set. Rather, it is preferable to conduct a sub-group analysis which excludes suspect studies and compare these results to the entire analysis. Reference: Deeks JJ, Higgins JPT, Altman DG. Analysing and presenting results; Section 8. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions [updated September 2006]. Chichester: John Wiley & Sons, Ltd; 2006.
105
Tests of Heterogeneity
Measure the extent to which observed study outcomes differ from the calculated overall outcome Visually inspect Forest Plot - size of CIs χ² test for homogeneity (Q test) can be used - low power (use p < 0.1 or 0.2) One part of your combination of the results of the various studies using meta-analysis will be to test for heterogeneity. This is essentially a test to measure the extent to which observed study outcomes differ from the calculated overall outcome. By virtue of the search and selection process established in your PICO, you begin your meta-analysis on the assumption that all studies in the systematic review are essentially the same. If you are using observational studies in your meta-analysis, however, you will expect some heterogeneity. There are things we can do at this stage to determine if there is any heterogeneity present in the results of the review; the first step is to visually inspect the Forest plot. Larger CIs will indicate a less certain estimate, and if CIs do not overlap, then it is unlikely these studies have sampled from the same population. SEE NEXT SLIDE FOR ELABORATION. We can also test for heterogeneity statistically. Refer to the music Forest Plot in your workbooks, looking at the P value in the brackets: we don’t want this to be less than 0.05. Here it is equal to 0.12, and thus the result is not heterogeneous. This test is often considered to be ‘weak’ or ‘not very powerful’, due to the low degrees of freedom involved. Because of this, a non-significant p value does not always mean there is no heterogeneity present; therefore, particularly where observational studies are involved, a more conservative p value may be used. Your class may notice the I² value - I² quantifies the inconsistency in the analysis and indicates the impact of the heterogeneity. It represents the percentage of variability in the effect estimates that is due to heterogeneity rather than chance alone (sampling error).
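For those who want to see the mechanics, a minimal sketch of our own computing Cochran's Q and I² from per-study effect estimates and their variances (e.g. log odds ratios; the values below are illustrative):

def q_and_i2(effects, variances):
    # Inverse-variance weights, fixed-effect pooled estimate, then Q and I^2
    w = [1 / v for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - pooled) ** 2 for wi, ei in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Three similar studies: Q falls well below df, so I^2 = 0 (no detected heterogeneity)
print(q_and_i2([0.4, 0.6, 0.5], [0.04, 0.09, 0.05]))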
106
Studies too small to detect any effect
Insufficient Power Favours treatment Favours control No effect Studies too small to detect any effect Often you can have a group of studies that are all too small to show any statistical significance, but all favour a particular treatment, as shown in this slide. By combining these studies together, the meta-analysis can provide more statistical power than any one individual result. As a result of this increase in statistical power, combination of studies in meta analysis may see the overall effect measure increase in precision and, if not already, show statistical significance.
107
Meta-analysis Overall summary measure is a weighted average of study outcomes. Weight indicates influence of study Study on more subjects is more influential CI is measure of precision CI should be smaller in summary measure The overall estimate of effect is a weighted average of study outcomes. Weight indicates the influence of a study or, in other terms, studies are weighted by the amount of information they contribute to the overall analysis. This is calculated statistically by the inverse variance of the effect estimate. Put simply, if there is little variance or dispersion, the inverse of this will be large, and hence the study will carry more weight; so the more precise the study, with narrower CIs, the more weight it will carry. Studies with greater sample sizes, and also those which measure more of the effect of interest, will carry more weight, as they contribute more information to the overall analysis. The CI of the overall estimate of effect should be smaller than those of the individual studies.
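A sketch of our own of the inverse-variance weighted average just described (illustrative values, not review data):

import math

def fixed_effect_pool(effects, variances):
    # Each study's weight is 1/variance: precise studies (narrow CIs) dominate
    w = [1 / v for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))  # pooled SE is smaller than any single study's
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

print(fixed_effect_pool([0.4, 0.6, 0.5], [0.04, 0.09, 0.05]))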
108
Subgroup analysis
Some types of participant, intervention or outcome may be likely to be quite different to the others Should be specified in advance in the protocol Only if there are good clinical reasons Two types Between trial – trials classified into subgroups Within trial – each trial contributes to all subgroups If there were some types of participant, intervention or outcome you thought were likely to be quite different to the others, you might plan a subgroup analysis. The number of planned subgroup analyses should be kept to a minimum to avoid spurious findings. Where there is significant heterogeneity in the results and no subgroup analysis has been stated a priori, subgroup analysis may be used, but the results interpreted with extreme caution. Two types of subgroup analyses: Between trial (trials classified into subgroups) Within trial (each trial contributes to all subgroups)
109
Example subgroup analysis
Example of how a subgroup analysis may look - split between trials and cohort studies on the basis of study design. Also a nice example of some of the issues with meta-analysis of observational studies and the differences in results that may be seen! This is described in more detail below if it is something you wish to explore further with your group of participants. Meta-analysis of the association between ß carotene intake and cardiovascular mortality: results from observational studies show considerable benefit, whereas the findings from randomised controlled trials show an increase in the risk of death. Meta-analysis is by fixed effects model. Observational studies have consistently shown that people eating more fruits and vegetables, which are rich in ß carotene, and people having higher serum ß carotene concentrations, have lower rates of cardiovascular disease and cancer [27]. ß carotene has antioxidant properties and could thus plausibly be expected to prevent carcinogenesis and atherogenesis by reducing oxidative damage to DNA and lipoproteins [27]. Contrary to many other associations found in observational studies, this hypothesis could be, and was, tested in experimental studies. The findings of four large trials have recently been published. The results were disappointing and even - for the two trials conducted in men at high risk (smokers and workers exposed to asbestos) [28, 29] - disturbing. We performed a meta-analysis of the findings for cardiovascular mortality, comparing the results from the six observational studies recently reviewed by Jha et al [27] with those from the four randomised trials. For the observational studies the results relate to a comparison between groups with high and low ß carotene intake or serum ß carotene concentration, whereas in the trials the participants randomised to ß carotene supplements were compared with those randomised to placebo. With a fixed effects model, the meta-analysis of the cohort studies shows a significantly lower risk of cardiovascular death (relative risk reduction 31% (95% confidence interval 41% to 20%, P<0.0001)) (fig 2). The results from the randomised trials, however, show a moderate adverse effect of ß carotene supplementation (relative increase in the risk of cardiovascular death 12% (4% to 22%, P=0.005)). Similarly discrepant results between epidemiological studies and trials were observed for the incidence of and mortality from cancer. This example illustrates that in meta-analyses of observational studies, the analyst may well be simply producing tight confidence intervals around spurious results. Taken from Egger, M. et al. BMJ 1998;316:
110
Sensitivity Analysis Exclude and/or include individual studies in the analysis Establish whether the assumptions or decisions we have made have a major effect on the results of the review ‘Are the findings robust to the method used to obtain them?’ The process of undertaking a systematic review and meta-analysis involves many decisions. Ideally, most of these are made while designing the protocol. The role of a sensitivity analysis is to determine whether the assumptions or decisions we have made do in fact have a major effect on the results of the review. A sensitivity analysis addresses the question ‘Are the findings robust to the method used to obtain them?’ Sensitivity analyses involve comparing the results of two or more meta-analyses calculated using different assumptions. If a study is of doubtful eligibility for the systematic review, then comparing meta-analyses excluding and including that study might be undertaken as a sensitivity analysis (Higgins 2008). Results may be calculated using all studies and then excluding poorer quality studies. Both fixed and random effects meta-analyses (discussed later) might be undertaken to assess the robustness of the results to the method used. If a study appears to be an outlier (has results very different from the rest of the studies), then its influence on a meta-analysis might be assessed by excluding it.
111
Meta-analysis Statistical methods Fixed effects model
Random effects model There are different statistical methods and models which are used in meta-analysis. The most common model used is the fixed effects model, but the random effects model also has its place. These models of meta-analysis differ on the basis of the assumptions they place on the data being analysed…
112
Fixed Effects Model All included studies measure same outcome
Assume any difference observed is due to chance no inherent variation in source population variation within study, not between studies Inappropriate where there is heterogeneity present CI of summary measure reflects variability between patients within sample The major assumption of the FEM is that the true effect of the treatment is the same value in all included studies, and any differences observed between treatment and control are the same, or ‘fixed’, in each study. Any differences observed are due to random error or chance, not heterogeneity. When the test of heterogeneity is significant, these assumptions are questionable as the treatment effects no longer appear identical. The CI of the summary measure reflects variability between patients within the sample, or the individual study, and NOT between studies.
113
Random Effects Model Assumed studies are different and outcome will fluctuate around own true value true values drawn randomly from population variability between patients within study and from differences between studies Overall summary outcome is estimate of mean from which sample of outcomes was drawn More commonly used with observational studies due to heterogeneity The major assumption of the REM is that the treatment effects for the individual studies vary around some overall average treatment effect. Rather than one true effect value, values are drawn from a normal probability distribution within the population. Here the variation accounted for arises from within-study variability, as in the FEM, and also from between-study variability.
114
Random Effects Model Summary value will often have wider CI than with fixed effects model Where no heterogeneity results of two methods will be similar If heterogeneity present may be best to do solely narrative systematic review In the REM the CI is often wider than in the FEM and therefore more accurately reflects unaccounted-for sources of variation in the study results. Generally, because the assumptions of the analysis are less stringent in the REM, the results are harder to interpret. Switching between the two methods of analysis and seeing no difference in results is a good indicator that there is no or very small heterogeneity present in the analysis, hence the same conclusions are drawn with both models. If heterogeneity is present it may be best to do a solely narrative systematic review.
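To illustrate how a random effects model widens the interval, here is a sketch of our own of the widely used DerSimonian-Laird approach (one common random effects method; we are not asserting it is the method MAStARI uses):

import math

def random_effects_pool(effects, variances):
    # Estimate between-study variance tau^2 from Cochran's Q,
    # then re-weight each study by 1 / (within-study variance + tau^2)
    w = [1 / v for v in variances]
    fe = sum(wi * ei for wi, ei in zip(w, effects)) / sum(w)
    q = sum(wi * (ei - fe) ** 2 for wi, ei in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * ei for wi, ei in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# With homogeneous studies tau^2 -> 0 and this matches the fixed effects result
print(random_effects_pool([0.4, 0.6, 0.5], [0.04, 0.09, 0.05]))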
115
Session 7: Appraisal, extraction and synthesis using JBI-MAStARI
Now that we have discussed some of the concepts of synthesis and analysis of extracted data for a systematic review, we will attempt a meta-analysis with the data we extracted yesterday. To do this we will now introduce the JBI-MAStARI software; you will have a chance first to see how the appraisal and extraction work online, and then we will follow with group work where you will get to use the software yourselves, using the data you have already extracted.
116
Meta Analysis of Statistics Assessment and Review Instrument (MAStARI)
We will now demonstrate the JBI MAStARI program.
117
Once you are logged on, you will be taken to the main JBI-MAStARI page
Once you are logged on, you will be taken to the main JBI-MAStARI page. This page consists of three parts: the main menu, reviews radio buttons and a summary of all reviews in the system. These parts will be described in more detail later. Main menu The main menu (Reviews, Studies, Logout, About) is located across the top of all JBI-MAStARI pages. Reviews Each review has a ‘primary’ and ‘secondary’ reviewer. A primary reviewer leads the review and has rights to add, edit or delete reviews; a secondary reviewer assesses every paper selected for critical appraisal, and assists the primary reviewer in conducting the review. The Reviews page lists all reviews that are in the system. This page is used to add new, edit existing or delete reviews. Information about each review is presented, including its title, a description of the review, the year the review was commenced, the names of the primary and secondary reviewers and the current status of progress in the review. Only a reviewer can view the information on this page, and only reviews that correspond to the selected ‘Reviews radio buttons’ will be displayed. In the example shown, only reviews in which the user is the primary reviewer and which are currently being conducted (ie are ‘open’) will be displayed. To access reviews as the secondary reviewer, click on the <Secondary> radio button on the left of screen. To access completed reviews, click on the <Closed> radio button. Click on <Add> to add a new review (next slide).
118
Adding a new review via JBI-MAStARI
Enter the relevant information: review name, review description and the year the review commenced. The person who has logged on will automatically be entered as primary reviewer. Select the secondary reviewer from the drop down box titled <Secondary Reviewer>. Click the <Update> button to save. Pressing <Cancel> returns the user to the Reviews page and does not save the information. Enter details here for a new review, or conversely click <edit> in the Actions column on the main review screen to open this window. This slide shows the new review details have been entered.
119
This slide shows the new Reviews page with the updated details, containing the review which has just been added. To view the studies for a specific review, click on the review title: “The effectiveness of SUMARI v5.0 vs…” Click “delete” to delete the review project; you will be asked for the primary reviewer’s login confirmation to complete this operation. Click “edit” to go through to the same screen as the previous slide to modify the review title, question details etc.
120
Studies page A ‘study’ is the basic unit of a review; in general, each review will assess many studies. You may also hear the term ‘paper’ or ‘publication’ used to refer to a study. To access the studies page, either click on the title of the systematic review on the reviews page or select ‘Study’ from the main menu. The studies page lists the first ten publications that have so far been added to the review, including the author, title, journal, year of publication, status and stage of assessment. If there are more than ten studies, these can be viewed by clicking on the ‘Next 10 rows’ hyperlink at the bottom of the screen. The studies page allows users to view, add, edit and delete studies from their review project. Studies can be either manually added or imported into JBI-CReMS from EndNote (Thomson) or other bibliographic software. Once these studies have been assigned to JBI-MAStARI in JBI-CReMS, they will be automatically exported into JBI-MAStARI. Note, however, that studies exported into JBI-MAStARI from JBI-CReMS will require assignment of a study design before they can be critically appraised. Click <Delete> in the Actions column to delete a study from the review. A warning will appear if there are outcomes and results already assigned and extracted for the study. Studies can also be manually added directly into JBI-MAStARI. Click on <Add> to add a new study.
121
Study Details Studies can be manually entered into JBI-MAStARI using the <Add> button at the bottom of the studies page. Fill out the standard bibliographic details of the publication (Author(s), Title, Journal, Year, Volume, Issue (if relevant) and Page number). The study design field is discussed below. The ‘tab’ key can be used to move to the next field; holding down the ‘shift’ and ‘tab’ keys together will move backwards between fields. Studies manually entered into JBI-MAStARI will be automatically uploaded into the JBI-CReMS file for that review. Assigning a study design All studies – whether exported from JBI-CReMS or manually entered into JBI-MAStARI – must be assigned a study design. This is important because the study design determines the critical appraisal tool and criteria that will be used to appraise the study. There are three options for study design that can be selected from the pull down menu: Randomised Controlled Trial/Pseudo-randomised trial; Comparable Cohort/Case Control studies; and Descriptive/Case Series studies. If a study is not assigned a study design, it will not be possible to critically appraise it.
122
Assessment of studies A study is critically appraised (or assessed) to determine if it is of sufficient quality and has included the correct patient population, interventions and outcomes for inclusion in the systematic review. Studies of low quality, or which have not included the correct patient population, intervention or outcome, are excluded from the review. A study must be read and fully understood by each reviewer before it can be critically appraised. Critical appraisal is first conducted by the primary reviewer, and then conducted independently by the secondary reviewer. The primary reviewer then conducts the final assessment, based on the two previous assessments, to determine whether the study is included or excluded. Any disagreement between reviewers should be resolved by a third reviewer. Click on <Add PRIMARY> to go to the critical appraisal criteria. Note that once studies are entered into the system, the secondary reviewer can begin appraising studies immediately, without needing the primary reviewer to have completed their appraisal first.
123
Critical Appraisal criteria
As previously mentioned, the critical appraisal criteria used in JBI-MAStARI vary with different study designs. Each of the criteria must be addressed using the Yes/No/Unclear/NA radio buttons next to the criteria. After all criteria have been addressed, the reviewer must decide whether to include or exclude the study from the review. The decision to include or exclude is based on pre-determined requirements (for example, all included studies must have adequate randomisation and blinding of participants, allocators and assessors, or all included studies must have a minimum number of ‘yes’ scores). The reviewer then selects ‘Yes’, ‘No’, or ‘Undefined’ from the <Include> drop down box. This process is then repeated by the secondary reviewer.
124
Final decision on inclusion/exclusion
Once the critical appraisal process has been completed by both reviewers, the primary reviewer re-assesses each study and makes a final assessment. This is the stage in the critical appraisal process where discussion between appraisers (primary & secondary reviewers) is commonly initiated, and where a third appraiser may need to adjudicate if consensus cannot be reached. The final decision should be made in the drop down menu. If exclude is selected, the entry here will appear in the table of excluded studies beneath the reference in the appendices in CReMS. Once this is complete for each study it is time to move on to data extraction.
125
Extraction Details The extraction form is first and foremost an online record of the important information from the study. The extraction details page lists a range of fields which describe the study: method, setting, participants, number of participants, interventions, author’s conclusions and reviewer conclusions. These fields are present in each of the three different study design types that can be included in a systematic review using JBI-MAStARI; however, the exact details differ slightly for different study designs. In the interventions field, note that in cases where a new or modified intervention (the ‘treatment’) is being compared to a traditional or existing procedure (the ‘control’), it is conventional to include the treatment(s) as intervention A (whose participants are in group A) and the control as intervention B (whose participants are in group B). Once these fields are filled out, click ‘Yes’ on the <Complete> drop down menu at the bottom of the page, and then click on the <Save Details> button. This will take you to the results page, where you will be invited to add an outcome that has been measured in the study. The asterisk (*) on the extraction form indicates which of the fields will be exported to your report appendices in CReMS, which you can view in the “Report View”.
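One way to picture the extraction form is as a record with a fixed set of fields. The sketch below models it as a Python data structure; the field names mirror the description above, but the structure itself is an illustration, not MAStARI's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ExtractionDetails:
    """Illustrative record of the extraction fields described above."""
    method: str
    setting: str
    participants: str
    n_participants: int
    intervention_a: str        # the treatment group (group A), by convention
    intervention_b: str        # the control/comparator group (group B)
    authors_conclusions: str
    reviewer_conclusions: str
    complete: bool = False     # set True once all fields are filled out

# Hypothetical example entry
record = ExtractionDetails(
    method="RCT", setting="Tertiary hospital",
    participants="Adults following surgery", n_participants=120,
    intervention_a="New dressing", intervention_b="Standard care",
    authors_conclusions="...", reviewer_conclusions="...", complete=True)
```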
126
Results/outcomes These are the existing outcomes for this study. Click <Add New Outcome> to create a new one. If you click <Delete Outcome> at the end of the row, this action will remove this study from this outcome (and any data it contributes, if data has been entered at this stage).
127
New Outcome If you have not previously conducted data extraction on your outcome of interest, you will need to generate a new outcome. You need to include a title for the outcome, a description of the outcome, the units or scale that the outcome is measured in, and whether the data is dichotomous (ie can only take one of two possible values, for example yes/no, dead/alive, disease cured/not cured) or continuous (ie measured on a continuum or scale using a number, for example body mass in kg, blood pressure in mm Hg, number of infections per year). Note the title of the outcome and its description for future reference. Click on <Save Details> once this is complete. If you have previously conducted data extraction on your outcome of interest, you can simply select the appropriate outcome from the <Existing titles> drop down box. Click on <Delete Review Outcome> in this box to remove the outcome from the review project completely. (Compare this to the previous delete, which simply removed one study from contributing data to the outcome.)
128
Outcomes The new outcome for the review will be added to the list. This slide shows the outcomes listed for this study. Once the outcomes are entered, the specific interventions need to be entered. To enter the interventions, click on the <Add> button under Intervention A. Once the intervention has been added it will be displayed there. Clicking on an existing intervention enables you to edit it. Click on any of the interventions listed in the Intervention A column; this takes you to the New Intervention screen.
129
New Intervention This slide shows the screen for adding and editing an intervention. Any pre-existing interventions will be viewable from the <Existing interventions> drop down menu; otherwise add an intervention from the study. Add in a description of the intervention and an abbreviation, and note these for future reference. Once the intervention has been added or altered, click <Save Details>.
130
Results Once the process above has been repeated for intervention B, the ‘Results’ link in the far right column will become active; clicking this link will take you to the results entry page, where the original data from the study is to be entered.
131
Dichotomous Results Adding results The results entry page will differ according to whether the data are continuous or dichotomous. For dichotomous data the required data are ‘n’ and ‘N’, where ‘n’ is the number of participants having the outcome of interest and ‘N’ is the total number of participants in the group. Once the data have been entered, they must be re-entered via the <DBL Data Entry> button, exactly as first entered and with the same number of decimal places; otherwise an error will be displayed. Double data entry reduces the risk of a transcription error. For continuous data, the required data are the sample size (entered during the extraction process), the mean and the standard deviation. The next slide shows the added results. Click <Delete Results> to clear all of the entered values from the screen.
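To show what the ‘n’ and ‘N’ entries feed into, here is a minimal Python sketch with hypothetical counts: a simple double-entry check in the spirit of the <DBL Data Entry> step, followed by a risk ratio with its 95% CI, one of the effect measures that can be computed from dichotomous counts. This illustrates the arithmetic only and is not MAStARI's code.

```python
import math

def double_entry_check(first, second):
    """Both entries must match exactly, mimicking double data entry."""
    if first != second:
        raise ValueError(f"Entries differ: {first} vs {second}")
    return first

# Group A (treatment) and group B (control): (n events, N participants)
n_a, N_a = double_entry_check((12, 60), (12, 60))
n_b, N_b = double_entry_check((24, 58), (24, 58))

rr = (n_a / N_a) / (n_b / N_b)                          # risk ratio
se_log_rr = math.sqrt(1/n_a - 1/N_a + 1/n_b - 1/N_b)    # SE on the log scale
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```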
132
Similarly, this example shows the screen with results added for continuous data; notice that the entry boxes on the results page include Mean, SD and N.
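For continuous data, the Mean, SD and N entries are all that is needed to compute a mean difference and its confidence interval. A minimal sketch with hypothetical values (not taken from any real study):

```python
import math

mean_a, sd_a, n_a = 72.0, 10.0, 60   # treatment group
mean_b, sd_b, n_b = 78.0, 12.0, 58   # control group

md = mean_a - mean_b                                  # mean difference
se_md = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)      # its standard error
print(f"MD = {md:.2f} (95% CI {md - 1.96*se_md:.2f} to {md + 1.96*se_md:.2f})")
```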
133
Meta analysis The default entry screen does not include the meta-analysis, but rather only a presentation of each of the individual studies. For example, due to issues of heterogeneity you may not wish to do the statistical combination, but still present the individual studies. The meta-analysis module is made up of a number of drop down menus that allow the user to specify the comparison required (ie which intervention group is to be compared to which control group), the outcome to be included and the statistical tests to be used. The user must specify the correct data type (continuous/dichotomous), the required effects model to be used (random/fixed), the statistical method of meta-analysis required and the size of confidence limits to be included in the calculations. The method to be used will depend on the data type. For more information see the MAStARI user guide. To see the statistical combination click <analyse results>.
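One way to judge whether statistical combination is sensible before clicking <analyse results> is a heterogeneity check. The sketch below computes Cochran's Q and I-squared from hypothetical study estimates; it illustrates the standard formulas, not MAStARI's output.

```python
import math

estimates = [-0.40, -0.05, 0.35]   # hypothetical per-study effects
ses       = [0.16, 0.22, 0.19]     # their standard errors

w = [1 / se ** 2 for se in ses]
pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q
df = len(estimates) - 1
i2 = max(0.0, (q - df) / q) * 100   # % of variation due to heterogeneity
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

A high I-squared is one reason to prefer a random effects model, restrict the analysis to subgroups, or present the studies without statistical combination.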
134
Meta analysis This slide shows the meta-analysis produced by clicking <analyse results>, using the drop down menus described on the previous slide to specify the comparison, outcome, data type, effects model, statistical method and confidence limits.
135
Subgroup analysis To conduct a subgroup analysis, click “Results” in the left menu column and then click on the name of the existing “Outcome” you wish to perform the subgroup analysis on.
136
Subgroup analysis At the bottom you will now see an option to enter the name of a subgroup, or to assign the study to an existing subgroup from the dropdown list. Enter or select the name of the subgroup to assign this individual study to. As with outcomes, the first time this is done in the review for any individual study it “creates” the subgroup; other studies within this outcome can then be assigned to the same subgroup. NB clicking on <Delete> alongside the subgroup will remove the subgroup from the analysis.
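Conceptually, a subgroup analysis simply pools the studies carrying the same subgroup label separately. A minimal Python sketch with hypothetical labels and values, as an illustration only:

```python
import math
from collections import defaultdict

# Hypothetical studies: (name, subgroup label, effect estimate, standard error)
studies = [
    ("Study A", "RCT",          -0.45, 0.15),
    ("Study B", "RCT",          -0.30, 0.20),
    ("Study C", "quasi-random",  0.10, 0.25),
]

# Group the studies by their subgroup label
by_subgroup = defaultdict(list)
for name, subgroup, est, se in studies:
    by_subgroup[subgroup].append((est, se))

# Pool each subgroup separately with inverse-variance weights
for subgroup, data in by_subgroup.items():
    w = [1 / se ** 2 for _, se in data]
    pooled = sum(wi * e for wi, (e, _) in zip(w, data)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    print(f"{subgroup}: {pooled:.3f} (SE {se_pooled:.3f})")
```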
137
In the <filter by outcome> dropdown list, click on the name of the outcome of interest. If subgroups have been made under this outcome, another window will open asking if you wish to analyse one or both subgroups. For example, here select “analysis” for the RCT subgroup and click <Ok>.
138
So only the RCT subgroup is “meta-analysed”; the top group, without a label, is presented without meta-analysis.
If you click <analyse results> on the right hand side….
139
If you click <analyse results> on the right hand side….
An overall analysis will appear for all 5 studies. The output here is the .jpg image produced by clicking <Save graph to disk>.
140
Sensitivity Analysis To perform a SENSITIVITY analysis, from the main MAStARI screen, alongside “Filter by Studies”, click “Select Studies for Outcome”.
141
Sensitivity Analysis Here un-tick the check box to “exclude” the study from the analysis. The study will immediately be removed from the analysis so the impact/influence this study has on the overall analysis can be readily assessed. The next screen shows the difference in the analysis of the RCT subgroup when the study is excluded.
142
In the upper panel, all 3 studies are included in the analysis
In the upper panel, all 3 studies are included in the analysis. In the lower panel the first study, which carried the most weight, has been removed from the analysis. You can see this has had a significant effect on the overall results of the meta-analysis!
143
Group Work 4 MAStARI Trial and Meta Analysis
Now follow your workbook to use the MAStARI software.
144
Session 8: Protocol development
Allow time for participants to conclude preparing their protocols.
145
Session 9: Assessment Allow 20 minutes for participants to do their MCQs, then run through the answers with the group.
146
Session 10: Protocol Presentations
Open discussion session of participants’ protocols.