There are always numerous opinions regarding the relevance and quality of clinical studies. As we seek to determine the validity of conclusions given by authors, it is important to develop a consistent approach to the evaluation of the evidence. Here is one such method:
1) What is the question being asked? This is important to understand before proceeding further because it provides a context for the evidence you are about to evaluate.
- Is it a patient-oriented outcome? Is the study focused on a lab value (clotting time), a vital sign (blood pressure), or a pathology finding (colonies of bacteria), or is it focused on a patient-oriented outcome (death in 30 days, loss of limb within 6 months, improved physical strength within 1 year, etc.)?
- Non-patient-oriented outcomes require another step in the evaluation process and are less useful clinically. These types of results need further study to determine the relevance of the finding to clinical medicine. We have to determine whether the non-clinical measure is an appropriate surrogate for the clinical outcome we care about. That decision is made by evaluating other existing evidence.
- An example: A study is conducted on the effect of a medication on clotting time. The outcome is that the medication improves clotting time, and the stated conclusion is that it therefore reduces death from hemorrhage. Although the study outcome may be scientifically valid, the conclusion is not necessarily true. Improvement in clotting time does not necessarily result in fewer deaths from hemorrhage. However, this kind of study does provide evidence for the decision to move forward and test the hypothesis that the drug reduces mortality from hemorrhage.
- Is the study a “post-hoc” analysis, meaning the analysis was performed on the data after the original study was completed? If so, further study is likely necessary, since the original study was not designed to ask this question.
2) Is the finding statistically significant? This is represented as a P value: the probability of seeing a result at least as extreme as the one observed if there were truly no effect. We commonly use 0.05 as a cutoff because, by convention, results below that level are considered unlikely to be due to random chance alone.
- The P value cutoff had to be set somewhere. Setting it too low makes us disregard real effects as random chance, and setting it too high makes us draw false conclusions when the change is due to chance alone.
- Statistical significance does not equate to clinical significance (see above).
- “Multiplicity” refers to running multiple tests on the same collected data. The more questions asked, the more likely it is that one will show statistical significance by random chance alone. In these scenarios, lowering the P value cutoff further may be necessary.
- Sample size is an important consideration when judging statistical significance as well. Small samples may fail to show significance even when a real effect exists, and very large samples may show statistical significance for differences too small to matter clinically.
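The multiplicity and sample size points above can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the tests are assumed to be independent, the Bonferroni correction is one common way (not the only one) to lower the cutoff, and the event counts are made-up numbers, not data from any study.

```python
import math


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    assuming no real effect exists and each test uses the cutoff alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the overall false positive rate at or below alpha."""
    return alpha / n_tests


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided P value from a pooled two-proportion z-test (normal approximation).

    Assumes the combined data contain at least one event and one non-event.
    """
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (rate_a - rate_b) / std_err
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Multiplicity: 20 questions asked of the same data at a 0.05 cutoff
    print(familywise_error_rate(0.05, 20))
    print(bonferroni_cutoff(0.05, 20))

    # Sample size: the same 10% vs 8% event rates at two hypothetical trial sizes
    print(two_proportion_p_value(10, 100, 8, 100))
    print(two_proportion_p_value(1000, 10000, 800, 10000))
```

Under these assumptions, asking 20 questions at a 0.05 cutoff gives roughly a 64% chance of at least one spurious "significant" result, and the same 10% versus 8% difference falls well short of significance with 100 patients per group but is clearly significant with 10,000 per group.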
3) What is the risk reduction? (aka: Is it clinically significant?) These calculations quantify the effect seen and come in two forms: absolute and relative risk reduction. Understanding the difference between the two will prevent some common errors in interpretation.
- Absolute Risk Reduction is the difference in the event rate between the treatment group and the control group.
- Relative Risk Reduction is the absolute risk reduction divided by the risk in the control group.
- Why does this matter? An example helps: A drug is theorized to reduce stroke from all causes. In a randomized controlled trial, the group receiving the drug (treatment group) has an absolute risk of stroke of 1/10,000, or 0.01%. The control group has an absolute risk of stroke of 3/10,000, or 0.03%. The absolute risk reduction is the difference between the two, which is 0.02%. But the relative risk reduction is 67% (that is, (0.03 - 0.01)/0.03). Although this may seem like an impressive reduction in relative risk, the absolute reduction of only 0.02% makes this change clinically insignificant. Do not be swayed by a large relative risk reduction without looking at the absolute risk reduction.
- Another way of bringing the clinical significance into focus is to calculate the Number Needed to Treat (NNT). This calculation is 1/ARR, with the ARR expressed as a proportion rather than a percentage, and gives the number of patients who must be treated for one patient to benefit. In the stroke example, the NNT is 1/0.0002 = 5,000. The closer this number is to one, the better. (More at TheNNT.com)
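The three calculations in this section can be written out in a few lines. This is a minimal Python sketch using the stroke example's numbers; the function name `risk_reduction` and its argument names are hypothetical, and it assumes the treatment group has the lower event rate.

```python
def risk_reduction(control_events: int, control_n: int,
                   treated_events: int, treated_n: int):
    """Return (ARR, RRR, NNT) from event counts in each group.

    ARR and RRR are returned as proportions, not percentages.
    """
    control_risk = control_events / control_n
    treated_risk = treated_events / treated_n

    # Absolute risk reduction: difference in event rates between the groups
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("treatment group does not have a lower event rate")

    # Relative risk reduction: ARR divided by the risk in the control group
    rrr = arr / control_risk

    # Number needed to treat: 1 / ARR
    nnt = 1.0 / arr
    return arr, rrr, nnt


if __name__ == "__main__":
    # Stroke example: 3/10,000 events in the control group, 1/10,000 with the drug
    arr, rrr, nnt = risk_reduction(3, 10_000, 1, 10_000)
    print(f"ARR = {arr:.4%}, RRR = {rrr:.0%}, NNT = {nnt:.0f}")
```

For the stroke example this gives an ARR of about 0.02%, an RRR of about 67%, and an NNT of about 5,000, which shows how a large relative reduction can sit alongside a very small absolute benefit.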
4) What kind of bias is present, and how much? Notice that this question does not read “Is there bias present?” Bias is common and is a natural product of the clinical trial environment. Controlling for every possible variable, though ideal, is not practical. Our best tool for combating bias is randomization. Many forms of bias exist but familiarity with these major forms and their methods of prevention is helpful when evaluating any trial:
- Sampling bias: This bias refers to the problem of target population selection. If I seek to develop a drug to reduce stroke risk in the general population, but I only study the effect in white middle-aged men, I have introduced a large sampling bias and reduced the generalizability (application) of the results to the general population.
- Selection bias: This error occurs when the treatment group and control group are not composed of similar populations, casting doubt on the results of the study. Both groups should be similarly composed in order to detect a true effect. Randomization and attention to the characteristics of group members are effective methods of combating this bias.
- Hawthorne effect: This term refers to a change in behavior brought about by being observed. In a study this can apply to the patients (symptoms improve because a patient receives attention and frequent follow-up) or it can apply to the observer (physicians change their behavior while being studied, although the study is focused on the effect of a clinical prediction rule). In either case, adequate blinding of the patients and observers should prevent this problem.
- Publication bias: This refers to the tendency for negative studies to be less likely to be published, whether because authors do not submit them or because the journals to which they are submitted do not accept them.
- Recall bias: This occurs when patients are asked to report symptoms and are aware of their treatment group. Patients may be more likely to report negative symptoms or lack of improvement in placebo groups, and more likely to report positive changes in treatment groups if not adequately blinded.
- Observer bias: This occurs when observers are not blinded to treatment assignment, so that their expectations may influence how they measure or report outcomes.
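Randomization, named above as the best tool for combating bias, can be sketched in its simplest form as a random shuffle followed by an even split. This Python sketch is illustrative only: the function name `randomize` and the integer patient IDs are hypothetical, and it shows simple (unrestricted) randomization rather than the block or stratified schemes that trials often use.

```python
import random


def randomize(patient_ids, seed=None):
    """Randomly split patients into a treatment group and a control group.

    A seed can be passed to make the allocation reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)

    # First half goes to treatment, the remainder to control
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    treatment, control = randomize(range(1, 21), seed=42)
    print("Treatment:", sorted(treatment))
    print("Control:  ", sorted(control))
```

Because assignment depends only on chance, neither the investigators nor the patients' characteristics decide who receives the treatment, which is what lets the two groups be similar on average.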
Asking these four questions when evaluating a clinical study will go a long way toward helping you judge the strength of its conclusion. For further reading, I recommend the book Evidence-Based Emergency Care (2nd ed) by Jesse Pines et al.
Tripepi G, Jager KJ, Dekker FW, Wanner C, Zoccali C. Bias in clinical research. Kidney Int. 2008;73(2):148-53.
Carneiro AV. Bias in clinical studies. Rev Port Cardiol. 2011;30(2):235-42.
Pannucci CJ, Wilkins EG. Identifying and avoiding bias in research. Plast Reconstr Surg. 2010;126(2):619-25.
Pines JM, et al, eds. Evidence-Based Emergency Care: Diagnostic Testing and Clinical Decision Rules. John Wiley & Sons, Incorporated; 2012.