Fundamentals of Clinical Research
1 Department of Diagnostic Radiology, London Health Sciences Center-University Campus, Rm. 2MR21, 339 Windermere Rd., London, Ontario, N6A 5A5 Canada.
Received October 20, 2000;
accepted after revision December 4, 2000.
Address correspondence to S. J. Karlik.
Introduction
A hypothetical example perhaps, but consider the outcome of this quandary. I could simply administer the contrast material, but do I know the limitations and actual measurable flow rates attainable with its use? What would be the outcome of negative findings? What patients would be the best subjects for this contrast material? Who would benefit the most from the injection? Is there sufficient scientific backup to justify this usage? Unfortunately, many choices in radiology rest on such slim justifications and unknowns. How many times have radiologists succumbed to a manufacturer's glossy brochure or an impressive pilot study presented at a meeting by a colleague with the promise of the "holy grail" of imaging advances without solid, statistically verified scientific support of the advantages of the latest and greatest? When faced with such a quandary, radiologists should consider all the options, evaluate the existing evidence, and possibly investigate the problem themselves. The purpose of this module is to introduce the concepts involved in turning an interesting and valuable question into a reasonable and effective research protocol. I will briefly introduce some essential concepts that will be expanded in detail in later modules in the series. At the end, I will use the preceding clinical scenario to focus my ideas and generate a summary of my research protocol.
Motivation is a significant additional component of the personal nature of the research. The definition of a research question is based on knowledge, skills, and the perceived issue. An inquiring mind would probably see questions in the special areas of interest, asking "I wonder if there is a better way to do this?" Choosing the topic is a matter of interest, perceived need, and remembering the fact that research requires time, effort, and money (TEaM) to succeed.
Radiology research questions fall into four general categories: evaluation of equipment (e.g., technology assessment, as in the value of helical CT), discovery of and evaluation of techniques (e.g., platinum embolization coils or accuracy of an imaging sign), reevaluation of old techniques or procedures (e.g., the assessment of ionic and nonionic contrast agents or cost-effectiveness of an evaluative pathway), and application of radiologic techniques to investigate changes in treatment (e.g., the use of diffusion MR imaging in early stroke treatment). All topics can provide significant opportunities to contribute to the advancement of the discipline of radiology.
How can the question be evaluated and put in perspective? How is the "so what" challenge met? The questions can come from many places: an interesting patient, a new piece of equipment, a new contrast agent, or a clinical collaborator. Once the problem becomes interesting, radiologists must evaluate its value to the patient population and their discipline. The investment in TEaM places the decision directly on the potential investigators. A thorough review of all existing available literature is essential, and "Module 7" will address the issues related to an effective critical review of the literature. Obviously, existing studies should not be repeated if they are well done and give an adequate answer to the question. Unfortunately, the radiology research literature has often not met this criterion [1].
One good way to approach a research inquiry is to think from the beginning about publication because peer review is a critical filter for research. Does the project warrant a paper to describe the results? Is the work trivial, predictable, or unoriginal? Sometimes the issue could be outdated or irrelevant. Does the study show true innovation? Similarly, a study with a narrow interest or directed at a highly specialized target population may be of less interest. All studies must have clinical importance, whether directly or indirectly, with significant implications for patients. In the discipline of radiology, it is important to ask if a new technique or procedure carries additional risk factors that make a study of marginal importance a poor choice. A summary of the key considerations for assessing a research protocol includes the following: a strong personal interest and motivation; a determination of originality, relevance, and lack of triviality or predictability; wide potential interest; definite clinical importance; and risk factors addressed. In the selection of this list, other key factors beyond importance, novelty, and answerability have been emphasized [2]. A recent editorial in Radiology, written to offer a series of guidelines for manuscript review, addressed the elements of both substance and style [3]. It would be wise to consider the strengths and weaknesses of the protocol and advances in knowledge mentioned in this article when planning a project.
Sometimes the question just does not seem to warrant publication, yet is still important to the investigator. An example might be the usefulness of a new piece of equipment brought to the practice, such as an add-on stereotaxic unit for mammography. Does it improve diagnostic ability compared with the previous technique and equipment? These data could be valuable to practice management, perhaps without a wider range of applicability or publication. However, the same scientific skills required for publication-quality research should be used in this investigation.
This loop is the foundation for our research work. After a discussion of the individual background components below, I will return to the initial quandary about sonographic contrast material and use the information to structure an appropriate research protocol.
What is the best way to answer the question "What proportion of patients who receive contrast agents will have a serious reaction?" If "best" means most accurate, then logging every reaction for every procedure for every bottle of contrast agent manufactured would be the best way. Although this procedure would be ideal, it is obviously not practical. For most, "best" means as accurate as one can afford to be, and accuracy can be expensive in time, effort, and money (TEaM). Therefore, generalizations are usually made from incomplete information.
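One way to see what generalizing from incomplete information costs is to attach a confidence interval to a proportion estimated from a sample. The following is only a minimal sketch, assuming the Wilson score interval as the method; the function name and the audit counts are hypothetical and are not taken from any study.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score interval for a proportion estimated from a sample."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = events / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical audit: 4 serious reactions logged in 2000 injections.
low, high = wilson_interval(events=4, n=2000)
print(f"observed rate {4 / 2000:.4f}, 95% CI {low:.4f} to {high:.4f}")
```

With the same observed rate, a tenfold larger sample gives a narrower interval, which is the trade-off between accuracy and TEaM described above.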
The sensitivity (or more properly, the power) of statistical methods depends on the amount of data collected. Because statistical conclusions are based on incomplete information, studies with small samples can fail to determine that a large observed difference is statistically significant. Similarly, using a large sample size can also make a small difference statistically significant. After doing the statistical analysis, radiologists still must judge their results and those of others in terms of the clinical significance of the investigation. There might be highly important differences between our groups, but the sample size is too small to detect them. An example was the need to use large numbers of cases to compare the incidence of adverse effects in nonionic and ionic contrast agents because the actual incidences were small. In a paper that finds no significant difference, did the study have sufficient numbers to determine if a truly important difference existed? Conversely, studies with large samples can reveal significant results that have no substance. Thus, in a study reporting statistical significance, is the result statistical in origin and possibly not important [4, 5]? This latter scenario refers to the "so what" challenge on a completed protocol, but not on a new one.
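The dependence of power on sample size can be illustrated numerically. The following is a minimal sketch, assuming a two-sided z-test for two independent proportions under the normal approximation; the function name and the event rates are hypothetical illustrations, not data from the contrast agent studies mentioned above.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error of the difference under H0 (pooled) and under H1.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    diff = abs(p1 - p2)
    # Probability that the observed difference falls in either rejection tail.
    upper = 1 - norm.cdf((z_crit * se_null - diff) / se_alt)
    lower = norm.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


# Hypothetical: a large difference (30% vs 10%) with only 20 patients per group.
print(f"large difference, small sample: {power_two_proportions(0.30, 0.10, 20):.2f}")
# Hypothetical: a small difference (3.0% vs 2.5%) with 50,000 patients per group.
print(f"small difference, large sample: {power_two_proportions(0.030, 0.025, 50_000):.2f}")
```

Under these assumptions the small study is more likely than not to miss the large difference, whereas the very large study will almost always declare the small one significant; whether either result matters clinically is a separate judgment.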
Medical science has adopted the scientific method for determining differences between groups by testing statistical hypotheses. Usually, the question of interest is divided into two competing hypotheses, and a study must be designed to provide evidence for choosing between them. These are the null hypothesis (H0) and the alternative hypothesis (H1). Additionally, if the null hypothesis is to be disproved, studies must be designed so that it cannot be rejected unless the evidence is sufficiently strong. For example, the hypothesis that there is no difference in the adverse reactions between nonionic and ionic contrast agents (H0) is opposed to the hypothesis that there is a difference (H1).
Formulating Hypotheses for Testing
To simplify the interpretation of the results of any statistical test, what
is being compared and the expected outcome, if possible, must be clearly
defined. The rule to follow is to assume that no difference exists between
treatments, groups, and procedures. Assume that any difference that does exist
between the groups is entirely attributable to chance (sampling error, in
particular) [7]. This
assumption will be maintained until a statistical test can show that it is
unlikely that chance alone can account for the difference. This reasoning is
analogous to a court of law in which someone is innocent until proven guilty.
Because absolute proof is rare in the courts, guilt that is shown beyond a
reasonable doubt is good enough. So it is in statistical analysis. Absolute
proof that a difference between groups is not due to chance is rare, so
thresholds are set beyond which one can no longer reasonably believe that the
difference is due to chance alone. Conventionally, the scientific community
has used a p value less than 0.05 as sufficiently small to call a
result statistically significant.
The statement that the groups do not differ is called the null hypothesis (H0). If the null hypothesis is shown to be sufficiently unlikely, the belief to which one switches is called the alternative hypothesis (H1) [8]. The final outcome of a hypothesis test is to either reject or not reject H0. Statisticians give the null hypothesis priority over the alternative hypothesis as it relates to the statement being tested. Often the null hypothesis is set up as a straw man to be rejected in the study. However, if H0 is not rejected, the data from the experiment do not prove that the null hypothesis is true; the data show only that there is not sufficient evidence against H0 in favor of H1.
A type I error occurs in a hypothesis test when a true null hypothesis is rejected (false-positive). An example would be if a study reported a difference between MR imaging and sonography for the evaluation of carotid stenosis when, in fact, there was no difference. A type II error occurs when the null hypothesis is not rejected when it should be (false-negative). A type II error would occur if it were concluded that two MR imaging contrast agents produced the same enhancement when, in fact, they produced different effects. A small sample size frequently leads to a type II error. For a given sample size, type I and type II errors are inversely related: that is, a smaller risk of one type is accompanied by a higher risk of the other. The objective is to obtain the lowest chance of a type I error, while minimizing the possibility of a type II error.
The type I error is generally treated as the more serious of the two and, therefore, is the one controlled first. Thus, when an experiment is proposed, the hypothesis test procedure is adjusted to produce a low probability of incorrectly rejecting H0. The probability of a type I error is the "significance level" (commonly 0.05 or 5%). Therefore, a significance level of 0.05 defines the probability that we accept of mistakenly rejecting the null hypothesis. The way statistical science limits a type I error to 5% is to reject the null hypothesis only if a statistic called the p value is less than 0.05. The p value is the probability of observing the data, or something more extreme, assuming that the null hypothesis is true. The null hypothesis is rejected when the data would be a rare event under it (i.e., when p is small). The smaller the p value, the more it suggests that the null hypothesis is unlikely to be correct and should be rejected. How small is small? Because we consider the significance level as 5%, an event that occurs one in 20 times is rare enough to make us reject the null hypothesis. Examples of rare events are the following: being hit by lightning, one in 2,000,000; winning a state lottery, one in 14,000,000; or being killed in an automobile accident, one in 5000. All these rare events are substantially less frequent than the one in 20 criterion for a rare event in scientific research.
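To make the p value concrete, the following is a minimal sketch of a two-sided test of H0 that two groups share the same adverse reaction rate. It assumes the pooled two-proportion z-test (a normal approximation that is reasonable only when the counts are not very small); the function name and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p value for H0: both groups have the same true event rate."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no events (or all events) in both groups: no evidence either way
    z = (events_a / n_a - events_b / n_b) / se
    # Probability of a difference at least this extreme if H0 were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # conventional significance level

# Hypothetical counts: 30 of 1000 reactions with agent A, 12 of 1000 with agent B.
p = two_proportion_p_value(30, 1000, 12, 1000)
print(f"p = {p:.4f}; reject H0: {p < ALPHA}")

# The same kind of comparison in a hypothetical pilot of 20 patients per group.
p_small = two_proportion_p_value(3, 20, 1, 20)
print(f"p = {p_small:.4f}; reject H0: {p_small < ALPHA}")
```

Failing to reject H0 in the small pilot does not show that the agents are equivalent; it shows only that the data are compatible with chance.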
Type II errors occur when the null hypothesis is accepted as true, although it is false. Suppose MR angiography was compared with angiography for detection of carotid stenosis. A type II error would occur if we concluded that the two imaging modalities were the same when, in fact, the performance was different. A strategy to minimize the type II error is to have sufficient numbers of studies or patients. Obtaining larger study groups is a two-edged sword because the larger the numbers, the more likely it is that even a clinically trivial difference will reach statistical significance. The size of the risk of a type II error is β, and the power of the study (the probability of concluding that a difference exists when one truly does) is 1 - β. Table 1 shows these concepts in a manner familiar to radiologists, the two-by-two diagram; the power of the study is analogous to the sensitivity of a diagnostic test [7]. Alongside the convention that accepts a type I error of 5%, the conventional acceptable β error is 20% (risk of finding no difference when one exists), and the power, 1 - β, is an 80% chance of finding a statistically significant difference when one exists.
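The conventional 5% significance level and 80% power can be turned into a rough sample size before a study begins. The following is a minimal sketch, assuming the standard normal-approximation formula for comparing two independent proportions; the function name and the anticipated rates are hypothetical, and a statistician should confirm the method for any real protocol.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect p1 vs p2 (two-sided test)."""
    if p1 == p2:
        raise ValueError("the two anticipated rates must differ")
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # controls the type I error
    z_beta = norm.inv_cdf(power)           # controls the type II error (power = 1 - beta)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical anticipated rates; neither pair comes from a published study.
print("30% vs 10%:", n_per_group(0.30, 0.10), "patients per group")
print("3.0% vs 2.5%:", n_per_group(0.030, 0.025), "patients per group")
```

The second comparison needs far more patients than the first, which is why studies of rare adverse events require large numbers of cases.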
Primary and Secondary Hypotheses
The discussion so far has concentrated on the concept of testing one
hypothesis. A scientific protocol is divided into primary and secondary
hypotheses. A hypothesis can be expressed in "guiding" terms:
CT is better than MR imaging for spinal disease.
or in "testable" terms:
CT is superior to MR imaging for lumbar spinal stenosis in asymptomatic individuals.
A study can be designed to investigate more than one hypothesis. For example, a study comparing the effectiveness of sonography versus MR angiography for carotid stenosis could have a primary null hypothesis that
MR imaging and sonography are equivalent for the diagnosis of carotid stenosis.
Perhaps secondary hypotheses could include a comparison of enhanced sonography and enhanced MR imaging on the evaluation:
Enhanced sonography is equivalent to enhanced MR angiography for the evaluation of carotid stenosis.
or that there is equivalence only for certain degrees of stenosis:
Enhanced sonography is equivalent to enhanced MR angiography for the evaluation of carotid disease in the range of 50-80% stenosis.
Perhaps the patient's medical condition or symptoms could also be the focus of a secondary hypothesis:
Enhanced sonography is equivalent to enhanced MR angiography for the evaluation of carotid stenosis in patients with bruits.
Each one of these new ideas potentially adds to the TEaM. Sometimes, simpler is better. Answer one hypothesis, go entirely through our scientific loop as shown in the Appendix, propose a second hypothesis on the basis of the results, and continue the scientific progression [6]. A statistician collaborator should assist in making that determination on the basis of the study in question.
Similarly, specific aims should be identifiable for each of the protocol hypotheses. For example, if we hypothesize that enhanced sonography is equivalent to enhanced MR angiography for the evaluation of carotid stenosis, then we need to understand that a specific aim also should be considered, perhaps something like the following: to perform contrast-enhanced MR angiography and sonography on 100 consecutive patients with suspected carotid stenosis using carotid angiography as a standard of reference (previously called the gold standard). Subsequent other secondary hypotheses should also have identifiable associated aims.
The researcher has a valid clinical question and a specific and relevant issue in the practice of sonography. Evaluation of portal perfusion posttransplantation is a reasonable and valuable clinical diagnostic test for an important patient population.
Our Basic Query
Can enhanced sonography help detect low flow rates in vessels that are
apparently below the detection threshold for conventional Doppler sonography?
Why else would the manufacturers invest so much time and money in the
development of these agents? However, is the clinical use scientifically proven?
Some Other Relevant Questions
What is the minimal flow level at which the contrast agent will work? Is
the effectiveness of the contrast agent machine dependent? What are the best
techniques for visualization of low flow? Does the contrast agent work for all
vessels, or are there anatomic limitations? Are there specific patients who
should not have this contrast material?
Assessing the Existing Evidence
A contrast agent that permits visualization and quantification of low flow
velocities could potentially improve examination on sonography of the patient
with a transplanted liver. Unfortunately, portal venous thrombosis is a common
complication of liver transplantation, leading to high mortality rates,
difficult surgeries, and more postoperative complications
[9]. In the diagnostic
armamentarium, contrast-enhanced studies have proved effective in the
assessment of hepatic allografts with MR imaging and angiography
[10]. Although MR angiography
has been compared with unenhanced sonography in the examination of liver
transplants [10], only
preliminary studies have been performed with sonographic contrast agents to
determine the blood flow in the portal circulation
[11,
12]. Contrast-enhanced MR
imaging has already been used with Doppler sonography in the preoperative
assessment of the portal venous system
[13]. Clearly, potential
exists for the use of sonographic agents for the examination of the portal
venous system after transplantation in the patient. Therefore, this new
technique should be applied to the assessment of the transplanted hepatic
allograft, especially in the patients in whom a conventional unenhanced
sonogram detects low flow or fails to detect perfusion at all. Such an added
discrimination could prevent unneeded surgical procedures, such as
mesoportal jump graft or splanchnic tributary, in lieu of thrombectomy
[9].
This statement seems reasonable; however, this hypothesis can be tested only with great difficulty because the statement is too generic. Some defining questions are the following: in what patients, tissue, or structures? What does low flow mean? These issues are addressed in hypothesis 2.
Hypothesis 2: Enhanced sonography is better than unenhanced sonography for the detection of greater than 50% thrombosis in the portal venous system.
This hypothesis is better, but questions remain. For example, what does "better" mean? Does it mean less expensive, faster, more specific, more sensitive, easier, or less risky to the patient? In the discipline of radiology, the value of a diagnostic test must rest solidly on the concepts of sensitivity and specificity (to be discussed in "Module 11") [14, 15]. A procedure is valueless if it does not show significant sensitivity and specificity. In this instance, the technique must be sensitive to flow rates currently undetected by conventional means, a valuable extension of the existing technology. This consideration leads us to hypothesis 3.
Hypothesis 3: Enhanced sonography is more sensitive than unenhanced sonography for the detection of stenotic vessels (greater than 50% stenosis) in portal venous vessels.
If the determination of sensitivity and specificity is added to the protocol, it is essential to propose some type of a standard of reference. This can be a difficult issue in radiology; a discussion of this topic will be found in "Module 9." The determination of a standard of reference for a diagnostic procedure usually involves postsurgical examination of the relevant tissues. However, other diagnostic tests with established sensitivity and specificity have also been used. In appropriate conditions, follow-up clinical diagnosis may also be appropriate. These considerations speak directly to the relevant knowledge of the investigators and their ability to choose an appropriate standard of reference and lead to hypothesis 4:
Hypothesis 4: Enhanced sonography is more sensitive than unenhanced sonography for the detection of greater than 50% stenosis in portal venous vessels, in which angiography is used as the standard of reference.
Do normal livers have stenoses? The original inquiry and postulate concerned a transplanted liver. This problem is relevant and gives the opportunity to generate a final testable hypothesis.
Hypothesis 5: Enhanced sonography is more sensitive than unenhanced sonography for the detection of greater than 50% stenosis in liver allograft portal vessels, in which conventional angiography is used as the standard of reference.
With this hypothesis, the specific aim can be defined, incorporating a patient population with a transplanted liver, sonographic investigation with and without contrast agents, quantification of stenosis with sonography and angiography, and determination of sensitivity and specificity. As experts in the field, radiologists know the patients and appropriate radiologic measures. However, the statistical methods and sample size that will achieve the desired power must be determined. This stage is absolutely critical in the design of the study. If an investigator does not have the competence in statistical design, a statistician should be consulted to determine how the observations will be compared and how many subjects will be needed.
Aim 1. To determine and compare the sensitivity and specificity of enhanced and unenhanced sonography for the detection of portal venous stenosis in patients with transplanted livers with angiography as the standard of reference.
Additional aims from the same study could be the following:
Aim 2. To determine the highest degree of stenosis on sonography and contrast-enhanced sonography when flow is still visible.
Aim 3. To evaluate the incremental cost and benefit of the addition of contrast material to the routine examination of newly transplanted livers.
Aim 4. To determine the predictive value of the detection of stenoses below the threshold of conventional Doppler sonography to the failure of hepatic allografts.
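Aim 1 reduces to filling in a two-by-two table for each technique against the angiographic standard of reference. The following is a minimal sketch of that bookkeeping; the class name and every count are hypothetical and are used only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one test judged against the standard of reference."""
    true_pos: int   # test positive, reference positive
    false_neg: int  # test negative, reference positive
    false_pos: int  # test positive, reference negative
    true_neg: int   # test negative, reference negative

    @property
    def sensitivity(self):
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self):
        return self.true_neg / (self.true_neg + self.false_pos)


# Hypothetical cohort: 40 allografts with stenosis on angiography, 60 without.
unenhanced = TwoByTwo(true_pos=24, false_neg=16, false_pos=6, true_neg=54)
enhanced = TwoByTwo(true_pos=36, false_neg=4, false_pos=9, true_neg=51)

for name, table in (("unenhanced", unenhanced), ("enhanced", enhanced)):
    print(f"{name}: sensitivity {table.sensitivity:.2f}, "
          f"specificity {table.specificity:.2f}")
```

In this invented example the gain in sensitivity comes with a small loss in specificity, which is why both measures are reported together.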
The final step in the definition of our research protocol is to generate a research plan that incorporates the relevant experiments needed to fulfill the aims of the study. The following is an example of a research plan to fulfill the primary aims of our project.
Hypothesis
Enhanced sonography is more sensitive than unenhanced sonography for the
detection of greater than 50% stenosis in liver allograft portal vessels,
in which conventional angiography is used as the standard of reference.
Specific Aims
Aim 1. To determine and compare the sensitivity and
specificity of sonography with and without contrast agents for the detection
of portal venous stenosis in patients with transplanted livers with
angiography as the standard of reference.
Aim 2. To determine the largest stenosis on sonography and enhanced sonography when flow is still visible.
Aim 3. To evaluate the incremental cost and benefit of the addition of contrast medium administration to the routine examination of newly transplanted livers.
Aim 4. To determine the predictive value of stenoses below the threshold of conventional Doppler sonography for the failure of hepatic allografts.
Research Plan
In consecutive patients referred to the sonography service for routine
examination of a liver allograft, a conventional Doppler sonogram, a
contrast-enhanced sonogram (with 10 mg/kg Dopplerview), and a conventional
angiogram with 30 mL radiographic contrast agent will be obtained. Inclusion
and exclusion criteria for patient participation will be defined. The degree
of stenosis will be determined for all three modalities, and the sensitivities
and specificities for enhanced and unenhanced sonography will be determined
and compared with an angiogram as the standard of reference. Scatter-plots of
detection and stenosis will be used to establish lower cutoff levels for
stenosis detection with and without contrast administration. A percentage
stenosis cutoff level will be established to perform a receiver operating
characteristic curve analysis of the ability of contrast-enhanced sonography
to reveal pathologically important portal flow. The costs of the procedures
will be established and compared. Patients will be followed up clinically for
6 months to determine the relationship between allograft survival and stenosis
detected. A significance level of p less than 0.05 will be used to
evaluate the differences with receiver operating characteristic curve,
chi-square, regression, and t tests, if appropriate.
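Because every patient in this plan undergoes both sonographic examinations, the two sensitivities are paired rather than independent. One option for comparing them, not named in the plan and offered here only as an assumption, is an exact McNemar test on the patients with angiographically confirmed stenosis in whom the two examinations disagree. The function name and counts below are hypothetical.

```python
from math import comb


def mcnemar_exact_p(only_enhanced_pos, only_unenhanced_pos):
    """Two-sided exact McNemar p value from the two discordant-pair counts."""
    n = only_enhanced_pos + only_unenhanced_pos
    if n == 0:
        return 1.0  # the two examinations never disagreed
    k = min(only_enhanced_pos, only_unenhanced_pos)
    # Under H0 each discordant pair is equally likely to favor either examination,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical: among stenosed allografts, 13 were detected only with contrast
# enhancement and 1 was detected only without it.
p = mcnemar_exact_p(13, 1)
print(f"exact McNemar p = {p:.4f}; significant at 0.05: {p < 0.05}")
```

Patients in whom both examinations agree do not enter the calculation, so the number of discordant pairs, not the total enrollment, determines how much evidence the comparison can provide.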
Acknowledgments
We thank Donal Downey and Craig Beam for their helpful comments.