AJR 2001; 177:993-999
© American Roentgen Ray Society


Fundamentals of Clinical Research
for Radiologists

Population and Sample

Ella A. Kazerooni1

1 Department of Radiology, 2910 Taubman Center, University of Michigan Medical Center, 1500 E. Medical Center Dr., Ann Arbor, MI 48109-0326.

Received April 6, 2001; accepted after revision April 24, 2001.

 
Series editors: Craig A. Beam, C. Craig Blackmore, Stephen J. Karlik, and Caroline Reinhold.

This is the fifth in the series designed by the American College of Radiology (ACR), the Canadian Association of Radiologists, and the American Journal of Roentgenology. The series, which will ultimately comprise 22 articles, is designed to progressively educate radiologists in the methodologies of rigorous clinical research, from the most basic principles to a level of considerable sophistication. The articles are intended to complement interactive software that permits the user to work with what he or she has learned, which is available on the ACR Web site (www.acr.org).

Project coordinator: Bruce J. Hillman, Chair, ACR Commission on Research and Technology Assessment.

Address correspondence to E. A. Kazerooni.

The reader's attention is directed to the earlier articles in the Clinical Research Series: Introduction, which appeared in the February 2001 issue; Framework, April 2001; Protocol, June 2001; and Data Collection, October 2001.


Introduction
The design of clinical research begins with the formulation of a research question. As radiologists, we ask many questions about the diagnostic imaging tests we perform and interpret, particularly as new tests are introduced. Can we see a disease on an imaging test at all (technical efficacy)? What are the imaging findings of that disease (description)? Can these findings be used to distinguish between the disease in question and the condition of no disease (accuracy) or distinguish between different diseases (discrimination)? Is a newly introduced imaging test as good as or better than existing tests (comparison)? Can the test be performed in a technically adequate manner in most clinical circumstances (technical reproducibility)? Will the same radiologist interpreting an imaging study today and the same study again next month come to the same conclusion (intraobserver agreement), and will a group of radiologists of varying expertise interpret the same study the same way (interobserver agreement)? What is patient preference when given the option of two or more competing tests? How cost-effective is the test? How does the test affect treatment outcome?

Substantial research questions deal with matters of vital relevance to important groups, or populations of individuals. However, important populations are generally large and, because of numerous practicalities (economy, time, and ethics), researchers often find they cannot afford to study all members of interesting populations. The time-honored scientific solution to this problem is to draw a representative subset, or sample, from the population and to base conclusions about the population on conclusions drawn from the sample. Statistical science is then used to assess and manage the uncertainties inherent in this process of scientific inference.

The goal of this article is to review the distinction made by modern scientific thought between population and sample, and to review considerations applicable to the identification and selection of population and sample in clinical radiology research.

Conventional science distinguishes three groups of individuals (Fig. 1). The goal of the series that includes this article is to bring clinical research in radiology more in line with mainstream medical research. Researchers in radiology should therefore adhere to the modern concepts of target population, study population, and sample when designing and writing about their research. Introductory statistical texts serve to codify current concepts in mainstream scientific thinking. The following excerpt, representative of many, is taken from one such widely used text [1].



Fig. 1. Graphic shows relationships among target population, study population, and sample. Conventional science distinguishes three groups of individuals. Target population is population of ultimate clinical interest. But, because of practicalities, entire target population often cannot be studied. Study population is subset of target population that can be studied. Samples are subsets of study populations used in clinical research because often not every member of study population can be measured.

 
We must also carefully distinguish between the TARGET POPULATION and the STUDY POPULATION. The target population is the whole group of [individuals] to which we are interested in applying our conclusions. The study population, on the other hand, is the group of [individuals] to which we can legitimately apply our conclusions. Unfortunately the target population is not always readily accessible and we can only study that part of it that is available. If, for example, we are conducting a telephone interview...we do not have access to those individuals without a telephone.

Further on in the same text, the authors identify "sample" [1]:

There are many ways to collect information about the study population. One way is to conduct a complete CENSUS, by collecting data for every [individual] in it.... A more practical approach is to study some fraction, or SAMPLE, of the population.

Before selecting a sample, the investigator first must determine whether a need really exists for the information that will come from the investigation. The question being asked is intimately related to the selection of a sample that can provide the answer, and to the size of the sample needed to answer the question. The sample composition impacts the generalizability of the results to the study population; the composition of the study population impacts further generalization to the target population. The biases that might be introduced in the selection of the sample impact the confidence in the conclusions that can be drawn from a research study. In discussing the sample necessary to answer different questions, examples have been taken from this author's subspecialty of thoracic radiology, particularly the use of CT pulmonary angiography for the diagnosis of acute pulmonary embolism and lung cancer.


Definition of Sample
The sample is described thoroughly in terms of clinical and demographic characteristics in the methods section of a research article so that others can draw conclusions, apply the results, and compare one investigation with another. It is not the target population, but rather a group of patients or individuals who are actually studied. The target population consists of all the individuals in the world, or in the United States, with the same characteristics as the sample to which we would like to apply the conclusions of a study. Because it is unrealistic to perform research on all individuals on earth or in the United States or in one state, we settle on a subset, or a sample, with defined inclusion and exclusion criteria. However, the results drawn from the investigation of the sample are interpreted and applied directly only to the study population. For example, to evaluate the accuracy of CT and MR imaging for lung cancer staging, it is not possible to perform CT and MR imaging on all patients diagnosed with lung cancer in the United States. The Radiologic Diagnostic Oncology Group [2] reported the accuracy of CT and MR imaging in 170 patients with "known or suspected" non-small cell lung cancer who were "considered to be surgical candidates on the basis of general health and pulmonary function." The sample was the 170 patients, and the target population was all patients with known or suspected lung cancer who were surgical candidates in the United States. A third group must be defined, however: the study population. This population includes the sample and all other patients with the same characteristics as the sample who did not participate in the study, but are in the same geographic location during the same time period of the study. For example, in the Radiologic Diagnostic Oncology Group study of 170 patients, 250 patients in total met eligibility criteria. The study population includes those 80 patients who were excluded for various reasons. 
Some patients might have declined to be studied, others might have dropped out after enrollment. How they differ from those who agreed to participate might introduce bias, which is discussed later.

If a group of patients in clinical practice meets the same inclusion and exclusion criteria as the sample, then we apply the conclusions drawn from the sample to these patients from the study population with confidence. The more a patient differs from the sample, the more likely it is that the results from the sample do not apply to this patient.


Can a Disease Be Detected on an Imaging Test, and What Does It Look Like?
If the intended purpose of proposed research is to introduce a new concept to the literature, then a sample of one or a few might be sufficient. This approach might be useful when a new technology is applied to a disease or clinical circumstance, or when the imaging findings of a specific disease are being described. This type of research is called descriptive research, and it is used in most of the published radiology articles [3,4,5,6]. Descriptive research is the lowest on the hierarchy of studies at providing information that can be used to evaluate the efficacy of a diagnostic test in actual clinical practice [7], but for rarely occurring diseases it might be difficult to do anything more. However, these studies are a necessary first step along the way to evaluating efficacy. They are the easiest to perform, use the least amount of resources, and in the circumstance of a single case report, are usually the hardest to publish. Without knowing what a disease looks like, the next step (determining whether a test can distinguish between disease and no disease, can discriminate between diseases, and, if so, how accurately and reproducibly) cannot be done.

For example, in the early to mid 1980s, several groups of researchers reported on CT and pulmonary embolism [8,9,10,11,12,13]. Those articles were case reports and small case series that for the first time documented that pulmonary embolism could be seen on IV contrast-enhanced CT. Although this simple concept might appear obvious to someone looking at the CT technology of today, it was not apparent before that time. The purpose of these reports by several investigators was to confirm the observation and to generate a database of knowledge that could lead to the generation of more complex scientific hypotheses. The early observations did not show the technical limitations of the technique or reveal the parameters necessary to optimize the technique. They did not show the accuracy of CT compared with a known reference standard such as conventional pulmonary angiography, and they did not show the accuracy of CT compared with other diagnostic tests, such as ventilation-perfusion scintigraphy alone or in combination with lower extremity sonography. They did not show whether observers of varying expertise could agree on the diagnosis reproducibly or evaluate patient preference for one diagnostic test or another. These observations were simply the first step in a series of steps that need to occur before it can be determined whether a new technology has a role in medical practice and, if so, what that role is.


Selection Bias and How to Select an Unbiased Population
When looking for a population of patients with a specific disease for which the findings of that disease are to be described, or to compare the accuracy of one test against another, it might seem straightforward to generate a list of all patients with the disease who have undergone the test or tests of interest over a specified period of time. However, who is chosen impacts to whom the results can be generalized. Many times in descriptive series a statement is made in the methods section that all patients with a specific disease imaged with a specific test formed the sample. Or, when comparing one test against another, such as CT versus MR imaging, all patients who underwent CT were compared against all patients who underwent MR imaging. What does this really mean? It is important that the population studied is thoroughly described, so that readers can compare the results of one study against another, particularly when results appear to be in conflict. Several biases can be introduced; the major issues of concern are sampling bias, the exclusion of patients, the use of a retrospective sample versus a prospectively collected sample, consecutive versus nonconsecutive patient enrollment, and selection based on the availability of imaging rather than the clinical presentation or clinical question.

Sampling Bias
The best sample is one that has the same characteristics as the study population to which the investigator wishes the results to be applied. The choice of a control group might introduce bias. A control group made up of normal volunteers recruited from a newspaper advertisement or a notice on a bulletin board is likely to be healthier than disease-free patients being seen in a medical clinic, which will make a diagnostic test appear more specific [14]. For example, if the intent is to investigate the diagnostic accuracy of a test, such as positron emission tomography, to distinguish between lung cancer and no lung cancer, the appropriate group to study is all patients with suspected lung cancer, not patients with lung cancer and healthy volunteers. In actual clinical practice, the diagnostic test would not be applied to normal healthy volunteers but instead to patients with, for example, a solitary nodule detected on a chest radiograph, some of whom will have lung cancer and some of whom will not.

No matter what population is studied, it is important to thoroughly describe it. It is equally important to describe the sample. Although age and sex are usually specified, other factors, such as racial mix, inner city versus rural setting, or type of medical center in which the investigation was performed, often are not. Diseases might look different in populations of different ethnic backgrounds, and therefore diagnostic tests might perform differently. Patients referred to a tertiary academic medical center might have more severe disease than patients treated for the same disease in a community hospital. This factor might make a diagnostic test appear to be more sensitive than it is in actual community practice, because more severe disease is generally easier to detect [14]. It is also important to report comorbidities. For example, the accuracy of CT pulmonary angiography for pulmonary embolism might be different in outpatients, who in general are less sick and more likely to be able to hold their breath for a CT examination, than in hospitalized patients, particularly intensive care unit patients, who are more likely to have lung disease. In this example, reporting the frequency of pleural effusions, lung abnormalities, pulmonary function test results, and the percentage of patients who are ventilator-dependent might be crucial to understanding the population studied and how the results could be applied in clinical practice.

Exclusions and Omission of Uninterpretable Results
As important as it is to describe who was studied, it is also important to describe patients who were excluded from the study or who declined to participate, because they might be different from the patients actually studied [15]. Some exclusions are random: for example, an optical disk on which a CT scan of a patient was stored is corrupted and the hard-copy images for that case are lost, or a patient died an unrelated death as a result of an airplane crash. Other exclusions are not random, and might introduce bias. For example, if patients with early stage lung cancer manifesting predominantly as a solitary pulmonary nodule declined to participate in a CT study designed to evaluate lung cancer staging, the sensitivity of CT staging might be artificially high and the population studied might be biased to patients with relatively obvious metastatic disease. On the other hand, if patients with advanced metastatic lung cancer declined to participate in the study because they felt too sick, then the sensitivity of CT staging might be artificially low because the patients with the most obvious disease were not included. For these reasons, it is important to describe the patients studied as well as the patients who were not studied, and to compare them to determine whether inherent differences exist.

Consider the Radiologic Diagnostic Oncology Group lung cancer staging study [2] in which 80 of the 250 eligible patients were excluded from the analysis. The report states that 43 of these patients did not undergo a surgical staging procedure, and "20 of these were considered to have extensive disease on the basis of imaging studies (six of these had T3 or T4 lesions)." Therefore, six (7.5%) of 80 patients excluded had T3 or T4 lesions, compared with 48 of the 170 studied, or 28% [2]. In general, the higher the T level, the more likely that metastatic lymph nodes are present and that these lymph nodes are larger in size and greater in number than for lower level T lesions, and therefore easier to identify. If the sample is skewed toward patients with more severe disease, then the sensitivity might be overestimated. On the other hand, for the other 14 of 20 excluded for extensive disease, it is not stated in the published report what the extensive disease was. It is logical to think it might have been metastatic disease or M1 disease because patients with all levels of nodal or N disease were reported. If this is correct, then 14 (17.5%) of 80 excluded patients had metastatic disease. Because it is more likely that patients with metastatic disease have larger lymph nodes than patients without metastatic disease, selecting out more obvious cases of lymph node metastases might artificially reduce the reported sensitivity for lymph node staging compared with a group of all patients with known or suspected lung cancer selected to undergo imaging. So within the same study there are reasons to think that the reported sensitivity of CT and MR imaging for staging the lymph nodes could be either exaggerated or underestimated. The more thoroughly the sample and the excluded patients are defined, the easier it is to know whether they are similar or dissimilar and how that might impact these reported measures of test performance.
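The proportions quoted in this comparison can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the function and variable names are illustrative and do not come from the published report.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken from the discussion of the staging study [2].
eligible = 250
studied = 170
excluded = eligible - studied        # patients excluded from the analysis

t3_t4_excluded = 6                   # excluded patients with T3 or T4 lesions
t3_t4_studied = 48                   # studied patients with T3 or T4 lesions
unspecified_extensive = 20 - 6       # excluded for extensive disease, type not stated

pct_t3_t4_excluded = percent(t3_t4_excluded, excluded)
pct_t3_t4_studied = percent(t3_t4_studied, studied)
pct_unspecified = percent(unspecified_extensive, excluded)

print(pct_t3_t4_excluded, pct_t3_t4_studied, pct_unspecified)
```

Placing the two T3 or T4 percentages side by side makes the difference in case mix between excluded and studied patients explicit, which is the comparison the text recommends reporting.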

Omitting the results of studies that are technically inadequate and therefore uninterpretable, or including in a study only patients who can cooperate sufficiently to produce a technically optimal diagnostic test can lead to an overestimate of the test's sensitivity. For example, one cause of suboptimal-quality CT pulmonary angiography for acute pulmonary embolism is respiratory motion, because many patients with suspected pulmonary embolism are short of breath. If the sample is selected using clinical and demographic characteristics, and then the examinations of suboptimal quality are excluded from the final analysis, the reported sensitivity will be higher than if these patients were included in the analysis as cases in which no pulmonary embolism was detected on these studies (i.e., as negatives).
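A minimal numeric sketch may make the size of this effect concrete. All counts below are hypothetical and chosen only for illustration; they are not taken from any study cited here.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity = TP / (TP + FN), computed among patients who have the disease."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts: 100 patients with proven pulmonary embolism.
detected = 72          # embolism seen on a technically adequate scan
missed = 8             # technically adequate scan, embolism not seen
uninterpretable = 20   # motion-degraded scan, embolism could not be seen

# Analysis 1: suboptimal examinations are dropped before analysis.
sens_excluding = sensitivity(detected, missed)

# Analysis 2: suboptimal examinations are counted as negatives.
sens_including = sensitivity(detected, missed + uninterpretable)

print(f"excluding uninterpretable scans: {sens_excluding:.0%}")
print(f"counting them as negatives:      {sens_including:.0%}")
```

Under these assumed counts, the same set of scans yields a noticeably higher sensitivity when the uninterpretable examinations are left out of the denominator.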

Using another CT pulmonary angiography example, Remy-Jardin et al. [16] compared the findings in 20 patients who underwent CT pulmonary angiography studies using 3-mm collimation, pitch of 1.7, and 1.0 sec per rotation with findings in 20 patients who underwent CT pulmonary angiography studies using 2-mm collimation, pitch of 2, and 0.75 sec per rotation. Remy-Jardin et al. stated the purpose of their study was to "analyze the influence of collimation on identification of segmental and subsegmental pulmonary arteries." The frequency of arteries that were sufficiently well seen to be analyzable for emboli was reported for both groups, with statistically significantly more segmental and subsegmental arteries seen with the thinner collimation protocol. When the sample is scrutinized, the scans included in the study had to be "technically acceptable," with strict inspiratory apnea and good or excellent arterial contrast opacification. Patients with prior lung surgery, lung distortion, or parenchymal infiltration on CT were excluded. Thirty-five patients were evaluated for suspected pulmonary embolism, all of whom had negative findings for pulmonary embolism on CT pulmonary angiography; the other five patients (12.5%) were not scanned because of suspected pulmonary embolism. In other words, the CT scans were much closer to ideal than they would be in a consecutive group of patients being scanned for pulmonary embolism, who are commonly short of breath and might have lung parenchymal or pleural abnormalities, or alterations in cardiac function that might reduce the technical adequacy of the study. Although this study of collimation showed that with thinner collimation more small vessels were well seen, it is unclear whether this finding would translate to a more realistic clinical population.


Retrospective Versus Prospective Selection
When patients are selected retrospectively, it is important to know why they were selected for imaging. Rather than representing all patients with a suspected disease or all patients in a specific clinical circumstance who presented for evaluation, it is more likely that patients might have been sent for imaging for clinical reasons that make them different than if the diagnostic test had been applied to all patients with the same disease or symptoms. Biases will be introduced by such patient selection that might overestimate the value of the diagnostic test being studied or the frequency with which specific abnormal findings are reported.

When looking at pulmonary embolism, the sensitivity of CT pulmonary angiography for small emboli has been questioned, leading investigators to look at the frequency with which isolated subsegmental or smaller pulmonary emboli occur. Reported percentages have ranged broadly from 4% to 36% [17,18,19,20]. In one study, consecutive patients undergoing conventional angiography were studied, and 30% were found to have emboli in only subsegmental or smaller pulmonary arteries [20]. As the methods stated, these were consecutive patients undergoing pulmonary angiography, not consecutive patients with suspected pulmonary embolism. In fact, Oser et al. [20] stated in the discussion of their publication that

... the vast majority of our patients had intermediate-probability lung scans; thus, the patients with a larger embolic burden, namely, those with high-probability scans, were potentially excluded. This selection bias is difficult to avoid in a retrospective series, as it reflects the hospital referral pattern.

With regard to CT and pulmonary embolism, in order to know the sensitivity of CT pulmonary angiography for pulmonary embolism in the general population of patients presenting with suspected pulmonary embolism, a prospective investigation of all patients with suspected pulmonary embolism is necessary, using a reference standard such as conventional pulmonary angiography. The goal should be to prospectively recruit all patients with suspected pulmonary embolism and have all patients undergo both the test under evaluation (CT pulmonary angiography) and the reference test (conventional angiography). Consider the impact of retrospective selection of the sample on diagnostic accuracy in the following scenarios. If all patients undergoing both CT pulmonary angiography and conventional pulmonary angiography over the previous 2-year period formed the sample, the reasons that patients underwent both tests, and not just CT pulmonary angiography, impact sensitivity. If a large proportion of the conventional angiograms were obtained because of inconclusive findings or a technically poor CT pulmonary angiogram, then the sensitivity of CT pulmonary angiography will appear artificially low compared with sensitivity in the general population. If a normal CT pulmonary angiogram is the predominant reason for obtaining conventional angiograms, the sensitivity of CT pulmonary angiography will again be low. In this case, the frequency of subsegmental emboli found at angiography will also be higher than would be found in the general population of patients with pulmonary embolism because patients with larger and more obvious emboli will not have undergone conventional angiography.
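The last scenario can be illustrated with a small worked example. The counts are hypothetical and assume, purely for illustration, that almost every patient with a normal CT went on to conventional angiography while only a minority of patients with a positive CT did.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical cohort: 100 patients who truly have pulmonary embolism.
ct_positive = 90   # embolism detected on CT pulmonary angiography
ct_negative = 10   # embolism missed on CT pulmonary angiography

# Prospective design: every patient undergoes both tests.
prospective = sensitivity(ct_positive, ct_negative)

# Retrospective design: only patients who happened to have both tests are
# included. Assumed referral pattern: a normal CT usually prompted angiography.
angio_after_positive_ct = 18   # 18 of the 90 CT-positive patients
angio_after_negative_ct = 9    # 9 of the 10 CT-negative patients
retrospective = sensitivity(angio_after_positive_ct, angio_after_negative_ct)

print(f"prospective sensitivity:   {prospective:.0%}")
print(f"retrospective sensitivity: {retrospective:.0%}")
```

The test itself performs identically in both analyses; only the rule that decided who received the reference test differs, and that alone lowers the apparent sensitivity.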

Which physicians accept and begin to use a new imaging test might also bias the results. For example, if physicians in the emergency department began using CT pulmonary angiography before most of the physicians taking care of inpatients, then the sensitivity of CT pulmonary angiography might be high, but would be biased by the type of patients that are seen in the emergency department, who in general might be healthier, younger, able to hold their breath better, or have less lung disease than hospitalized patients. On the other hand, if critical care medicine physicians accept CT pulmonary angiography earlier for intensive care unit patients, the sensitivity of CT pulmonary angiography might appear low because of the extensive parenchymal consolidation and pleural effusions that are often present in this population of patients who are often ventilator-dependent. In this way, the spectrum of disease or the case mix in the sample impacts the measured accuracy of the diagnostic test in question. This point reinforces the need to thoroughly describe the patient population studied.
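One way to see how case mix alone shifts measured accuracy is to treat overall sensitivity as a weighted average of the sensitivities in each patient group. The sketch below assumes hypothetical sensitivities for emergency department and intensive care unit patients; neither value comes from the literature cited here.

```python
def pooled_sensitivity(strata):
    """Pool sensitivity over strata given (patients_with_disease, sensitivity) pairs."""
    total = sum(n for n, _ in strata)
    detected = sum(n * s for n, s in strata)
    return detected / total

# Assumed per-group sensitivities (hypothetical values for illustration).
SENS_ED = 0.95    # emergency department patients, better breath-holding
SENS_ICU = 0.70   # intensive care unit patients, more lung disease

# Same test, two samples that differ only in case mix (100 diseased patients each).
ed_heavy = pooled_sensitivity([(80, SENS_ED), (20, SENS_ICU)])
icu_heavy = pooled_sensitivity([(20, SENS_ED), (80, SENS_ICU)])

print(f"mostly emergency department patients: {ed_heavy:.0%}")
print(f"mostly intensive care unit patients:  {icu_heavy:.0%}")
```

Reporting the composition of the sample lets a reader judge which of these two figures is closer to what would be seen in a given practice.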

Retrospective studies also suffer from recall bias. Suppose an investigator wants to determine the severity of dyspnea in patients with suspected pulmonary embolism, hypothesizing that patients with more severe dyspnea have a higher frequency of pulmonary embolism than patients with lesser degrees of dyspnea or no dyspnea at all. The investigator might be approaching this as a way to evaluate the likelihood of a patient's having pulmonary embolism and thus to triage patients to a diagnostic test within 1 hr versus within 4-6 hr, given the available imaging facilities. If an investigator questions all patients evaluated over the past year for suspected pulmonary embolism about their dyspnea, it is likely that the patients who were diagnosed with pulmonary embolism and hospitalized for treatment will remember their dyspnea more vividly and rate it as more severe than patients not diagnosed with pulmonary embolism who were sent home. This would exaggerate the difference in reported dyspnea in the two groups, compared with what would be seen if all of the patients were asked about dyspnea before undergoing any diagnostic test for pulmonary embolism and would thereby increase the likelihood that the investigator's hypothesis would be proven correct on analysis.


Consecutive Versus Nonconsecutive Selection
If patients are selected in a nonconsecutive manner, they might be inherently different from a population of all patients who meet inclusion criteria for a study. Suppose that the strategy were to recruit only the first patient seen each day who met the inclusion criteria for the study. It is possible that patients who are able to come for a 7:00 A.M. clinic appointment are different from patients who come later in the day. Perhaps they are less sick, resulting in a bias toward milder disease. Suppose that the strategy were to recruit only those patients meeting the inclusion criteria who are seen Monday to Friday between 8:00 A.M. and 5:00 P.M. If the study were looking at lung cancer staging accuracy, there might be little, if any, bias. However, in other circumstances, the patients might be inherently different from patients presenting to the emergency department in the evening with the same symptom complex. For example, if the study involved suspected myocardial infarction, the patients coming to the emergency department in the evening after a day of work might have had chest pain all day long and sought medical attention hours after the onset of the acute event, whereas patients coming during the day might have had symptoms of shorter duration. Because the time from onset of symptoms is critical to outcome after a myocardial infarction, patients presenting during the day might have a better outcome than patients presenting at night, independent of any therapeutic intervention.

Reference Standard
The choice of a reference standard impacts measurements of test accuracy. In contrast to the ideal scenario for evaluating the accuracy of CT pulmonary angiography described in the previous section, a methods section might read: "All patients with pulmonary embolism confirmed at autopsy who had undergone CT pulmonary angiography formed the sample." In this case, the sensitivity of CT pulmonary angiography might be higher than in the general population because patients dying from pulmonary embolism might have larger emboli than patients not dying from pulmonary embolism.

Another problem is commonly referred to as "workup bias" [21]. Whenever the reference test is selectively applied only to patients with a positive result on the test in question—for example, only patients with a positive CT pulmonary angiography—the reported sensitivity of CT pulmonary angiography will be artificially high at 100%, whereas the specificity will be artificially low.
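A short sketch shows why sensitivity rises to 100% and specificity collapses under this design. The 2 x 2 counts are hypothetical; the point is that when only test-positive patients are verified, false negatives and true negatives are never observed.

```python
def accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Hypothetical cohort in which every patient receives the reference test.
tp, fp, fn, tn = 45, 10, 15, 130
full_sens, full_spec = accuracy(tp, fp, fn, tn)

# Workup bias: only patients with a positive CT are verified, so the
# CT-negative cells (fn and tn) contain no verified patients.
biased_sens, biased_spec = accuracy(tp, fp, 0, 0)

print(f"full verification:    sensitivity {full_sens:.0%}, specificity {full_spec:.0%}")
print(f"positives only:       sensitivity {biased_sens:.0%}, specificity {biased_spec:.0%}")
```

With these assumed counts the fully verified cohort gives a sensitivity of 75%, whereas verifying only the positive results reports 100% sensitivity and 0% specificity from the same underlying test.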

When a new technology is compared with accepted reference tests or gold standards, the accuracy of the reference test is often called into question [22,23,24,25,26,27]. In the example of CT pulmonary angiography, the validity of conventional pulmonary angiography has been questioned. Several studies have reported poor interobserver agreement as to the presence or absence of emboli in subsegmental pulmonary arteries on conventional angiography. The Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) investigators [28] found only 66% agreement among observers for isolated subsegmental emboli, compared with 98% at the lobar level and 90% at the segmental artery level. Similarly, Diffin et al. [17] reported interobserver agreement of only 45% for isolated subsegmental emboli at conventional angiography. If observers cannot agree on the gold standard, how can the new test, CT pulmonary angiography, be compared with it? This problem might lead investigators to look for a new sample population and apply a new gold standard. To do so might require an animal study with autopsy confirmation as the reference standard. For CT pulmonary angiography, Baile et al. [27] did just that. To compare the accuracy of CT pulmonary angiography and conventional angiography, these investigators instilled colored methacrylate beads into the pulmonary artery circulation of pigs, with a methacrylate cast of the pulmonary arteries used as the reference standard. These researchers found no statistically significant difference between CT pulmonary angiography and conventional angiography for the detection of emboli. However, if conventional angiography were used as the reference standard to which 1-mm CT pulmonary angiography was compared, conventional angiography would, by definition as the reference test, be 100% sensitive with a 100% positive predictive value, whereas CT pulmonary angiography would be considered only 76% sensitive with a positive predictive value of only 86%.

If the sensitivity of a test is in question, surrogate measurements, such as patient outcome, might be used to support the value of a negative test. For CT pulmonary angiography, most investigators have looked at series of patients gathered retrospectively with negative findings for pulmonary embolism on CT pulmonary angiography, and looked at the incidence of pulmonary embolism over the next 3-12 months. These studies have shown that pulmonary embolism occurs with the same frequency after negative findings on CT pulmonary angiography as after negative findings on conventional angiography [29, 30].

Imaging-Based Selection
It is often convenient to select patients who have undergone an imaging test, or patients who are going to be sent for imaging, to form a sample. This is referred to as imaging-based selection. However, patients who undergo imaging might not be representative of all patients with a specific diagnosis or symptom. Consider describing the appearance of lung cancer on MR imaging. Investigators could generate a list of all patients at their facility with a diagnosis of lung cancer who underwent thoracic MR imaging in the past or will undergo MR imaging over the next year. A fairly high proportion of these patients will likely have masses that abut or invade the mediastinum. This does not mean that the same proportion of all patients presenting with lung cancer have mediastinal invasion. Patients undergoing MR imaging for lung cancer are usually preselected because of a suspicion of mediastinal invasion on CT, so the high proportion should not be surprising. To know the appearance of lung cancer on MR imaging, or to determine the accuracy with which MR imaging can detect lung cancer, requires that all consecutive patients with a diagnosis of lung cancer over a specified period of time undergo MR imaging. Although this example might seem fairly obvious, the literature is full of examples in which this type of selection bias affects study results, although the effect might be less obvious than in this example and not initially apparent.
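
A minimal numeric sketch of imaging-based selection follows. All counts are hypothetical and are not taken from any study. As a simplification, the sketch assumes that referral for MR imaging depends directly on whether mediastinal invasion is present, standing in for suspicion of invasion on CT.

```python
def proportion(n_with: int, n_without: int) -> float:
    """Proportion of a group that has the finding of interest."""
    return n_with / (n_with + n_without)


# Hypothetical consecutive cohort of 1000 patients with lung cancer.
cohort_with_invasion = 100
cohort_without_invasion = 900

# Hypothetical referral pattern: MR imaging is requested mainly when
# mediastinal invasion is suspected, so most referred patients have it.
mr_with_invasion = 80      # 80 of the 100 patients with invasion undergo MR
mr_without_invasion = 45   # 45 of the 900 patients without invasion undergo MR

cohort_proportion = proportion(cohort_with_invasion, cohort_without_invasion)
mr_sample_proportion = proportion(mr_with_invasion, mr_without_invasion)

print(f"Invasion in all consecutive patients: {cohort_proportion:.0%}")
print(f"Invasion in the MR-selected sample:   {mr_sample_proportion:.0%}")
```

With these assumed numbers, mediastinal invasion appears in 64% of the MR-selected sample but in only 10% of the consecutive cohort. The difference arises entirely from how the sample was assembled.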

Generalizability
Who was studied determines to whom the results can be applied. If all patients presenting with suspected pulmonary embolism undergo a diagnostic test, the results will be different from those obtained if only patients with acute right heart failure and suspected massive pulmonary embolism are studied, or if only patients who have an inconclusive result from another diagnostic test, such as a ventilation-perfusion scan, are studied. Similarly, how the test performs in inpatients or intensive care unit patients might be different from how it performs in outpatients or patients presenting to an emergency department, who are less likely to have coexisting lung disease or abnormal chest radiographic findings. In selecting a population to study for an investigation, it is important to consider to whom the information derived from that investigation can be applied.

For example, the prevalence of isolated subsegmental pulmonary embolism has recently been debated as part of the question of how accurate CT pulmonary angiography needs to be for the detection of subsegmental pulmonary embolism. If isolated subsegmental pulmonary embolism rarely occurs, then the technology might not need to be accurate for vessels of this size. However, if isolated subsegmental emboli are commonly seen, then the technology might need to be accurate. In one study, isolated subsegmental pulmonary embolism was reported to occur in 36% of patients diagnosed with pulmonary embolism [19]. In another study, isolated subsegmental pulmonary embolism was reported to occur in only 6% of patients diagnosed with pulmonary embolism [18, 31]. Which more realistically represents a population of all patients with suspected pulmonary embolism? The former study was performed to prospectively compare helical CT with pulmonary angiography for the detection of pulmonary embolism in patients with an unresolved clinical and ventilation-perfusion scan diagnosis of pulmonary embolism. Patients with either a normal perfusion scan or a high-probability scan (the two groups in whom pulmonary embolism was considered absent and definite, respectively, and perhaps the easiest patients for CT to evaluate) were not studied with CT. Therefore, it is likely that 36% is an overestimate of the frequency with which isolated subsegmental pulmonary embolism occurs. The latter study was the PIOPED study [18, 28], in which patients with suspected pulmonary embolism were prospectively enrolled at multiple medical centers, and all patients underwent ventilation-perfusion scanning and conventional pulmonary angiography.
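
The effect of restricting a sample to patients with unresolved scan results can be sketched as follows. The stratum counts are invented for illustration and do not reproduce the data of Goodman et al. or PIOPED. The stratum names and the helper function are likewise assumptions of this sketch.

```python
# Hypothetical numbers of patients with proven pulmonary embolism (PE) in each
# ventilation-perfusion scan category, and how many of those had emboli
# confined to subsegmental arteries.
strata = {
    "normal": {"pe": 0, "isolated_subsegmental": 0},
    "high_probability": {"pe": 120, "isolated_subsegmental": 2},
    "unresolved": {"pe": 50, "isolated_subsegmental": 12},
}


def isolated_subsegmental_fraction(strata: dict, included) -> float:
    """Fraction of PE patients with isolated subsegmental emboli,
    computed only over the scan categories listed in `included`."""
    pe = sum(strata[name]["pe"] for name in included)
    isolated = sum(strata[name]["isolated_subsegmental"] for name in included)
    return isolated / pe


all_suspected = isolated_subsegmental_fraction(strata, list(strata))
unresolved_only = isolated_subsegmental_fraction(strata, ["unresolved"])

print(f"All patients with suspected PE: {all_suspected:.0%}")
print(f"Unresolved scans only:          {unresolved_only:.0%}")
```

Under these assumed counts, the fraction is 24% when only the unresolved group is studied and about 8% across all scan categories. An estimate from the restricted sample therefore applies only to patients like those in that sample.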

The results described by Goodman et al. [19] can be generalized only to patients with an unresolved clinical suspicion for pulmonary embolism after ventilation-perfusion scanning who underwent CT, as the title of that investigation states clearly. The results can also be generalized only to patients undergoing CT with the technique that was reported (5-mm collimation, pitch of 1:1, covering 12 cm of the thorax, and viewed on hard-copy film). Imaging technology rapidly evolves. Several researchers after Goodman et al. have reported on CT pulmonary angiography at 3-mm collimation [32,33,34]. Multidetector CT pulmonary angiography of the entire thorax using 1.25-mm collimation is now possible, and interpretation on workstations has been shown to improve detection of pulmonary embolism compared with film-based interpretation [35]. However, the published literature lags behind the capabilities of current technology. As investigators plan to study a new technology, they should consider ways to recruit a larger number of patients more quickly to answer the question they propose before the technology is outdated [36].

Several studies have reported the findings of pulmonary embolism detected incidentally on CT scans obtained for other reasons [13, 35, 37,38,39]. It would be incorrect to conclude that the anatomic distribution of pulmonary emboli in these patients is the same as in a population of patients presenting with clinical signs or symptoms of pulmonary embolism. In one series of nine patients, no incidentally detected emboli were seen beyond the segmental arteries [39]. This result does not mean that subsegmental pulmonary embolism does not occur as an incidental finding. The CT scans in this study might have been obtained with protocols used for general thoracic CT rather than with a thin-section protocol using rapid IV contrast injection, or the researchers might not have used a workstation for interpretation. Both factors improve the accuracy of CT pulmonary angiography for pulmonary embolism, particularly in small arteries.


Conclusion
This article has reviewed the current concepts of target population, study population, and sample. These terms need to be used appropriately in the design, execution, and reporting of clinical research in radiology. The article also has discussed considerations for the definition and selection of these entities. Other considerations, such as randomization, statistical power, and sample size, that are relevant specifically to the selection of sample, will be the subject of future articles in this series.


References
  1. Elston R, Johnson W. Populations, samples and study design. In: Essentials of biostatistics. 2nd ed. Philadelphia: Davis, 1994:15-16
  2. Webb WR, Gatsonis C, Zerhouni EA, et al. CT and MR imaging in staging non-small cell bronchogenic carcinoma: report of the Radiologic Diagnostic Oncology Group. Radiology 1991;178:705-713
  3. Blackmore CC, Black WC, Jarvik JG, Langlotz CP. A critical synopsis of the diagnostic and screening radiology outcomes literature. Acad Radiol 1999;6[suppl 1]:S8-S18
  4. Hillman BJ. Research in radiology departments. Invest Radiol 1993;28[suppl 2]:S44-S48
  5. Applegate KE. Study design: pros and cons. In: 2000 annual meeting scientific session. Oak Brook, IL: Society of Health Services Research in Radiology, 2000
  6. Holman BL. The research that radiologists do: perspective based on a survey of the literature. Radiology 1990;176:329-332
  7. Green SB, Byar DP. Using observational data from registries to compare treatments: the fallacy of omnimetrics. Stat Med 1984;3:361-373
  8. Godwin JD, Webb WR, Gamsu G, Ovenfors CO. Computed tomography of pulmonary embolism. AJR 1980;135:691-695
  9. Sinner WN. Computed tomography of pulmonary thromboembolism. Eur J Radiol 1982;2:8-13
  10. Ovenfors CO, Godwin JD, Brito AC. Diagnosis of peripheral pulmonary emboli by computed tomography in the living dog. Radiology 1981;141:519-523
  11. Cholankeril JV, Ketyer S, Ramamurti S, Millman AE. Pulmonary embolism demonstrated by computerized tomography. J Comput Assist Tomogr 1982;6:135-139
  12. Breatnach E, Stanley RJ. CT diagnosis of segmental pulmonary artery embolus. J Comput Assist Tomogr 1984;8:762-764
  13. Allen BT, Day DL, Dehner LP. CT demonstration of asymptomatic pulmonary emboli after bone marrow transplantation: case report. Pediatr Radiol 1987;17:65-67
  14. Browner WS, Newman TB, Cummings SR. Designing a new study. III. Diagnostic tests. In: Hulley SB, Cummings SR, eds. Designing clinical research. Baltimore: Williams & Wilkins, 1988:87-97
  15. Hulley SB, Gove S, Browner WS, Cummings SR. Choosing the study subjects: specification and sampling. In: Hulley SB, Cummings SR, eds. Designing clinical research. Baltimore: Williams & Wilkins, 1988:10-30
  16. Remy-Jardin M, Remy J, Artaud D, Deschildre F, Duhamel A. Peripheral pulmonary arteries: optimization of the spiral CT acquisition protocol. Radiology 1997;204:157-163
  17. Diffin DC, Leyendecker JR, Johnson SP, Zucker RJ, Grebe PJ. Effect of anatomic distribution of pulmonary emboli on interobserver agreement in the interpretation of pulmonary angiography. AJR 1998;171:1085-1089
  18. Stein PD, Henry JW. Prevalence of acute pulmonary embolism in central and subsegmental pulmonary arteries and relation to probability interpretation of ventilation/perfusion lung scans. Chest 1997;111:1246-1248
  19. Goodman LR, Curtin JJ, Mewissen MW, et al. Detection of pulmonary embolism in patients with unresolved clinical and scintigraphic diagnosis: helical CT versus angiography. AJR 1995;164:1369-1374
  20. Oser RF, Zuckerman DA, Gutierrez FR, Brink JA. Anatomic distribution of pulmonary emboli at pulmonary angiography: implications for cross-sectional imaging. Radiology 1996;199:31-35
  21. Begg CB, McNeil BJ. Assessment of radiologic tests: control of bias and other design considerations. Radiology 1988;167:565-569
  22. Chugh SK. Stress echo training: need for a better gold standard—the invasive viewpoint. Eur Heart J 2000;21:859-860
  23. Shah A, Wagner GS, Granger CB, et al. Prognostic implications of TIMI flow grade in the infarct related artery compared with continuous 12-lead ST-segment resolution analysis: reexamining the "gold standard" for myocardial reperfusion assessment. J Am Coll Cardiol 2000;35:666-672
  24. Koretz RL. Prospective randomized controlled trials: when the gold in the gold standard isn't pure. (commentary) J Parenter Enteral Nutr 2000;24:5-6
  25. Rolfe MW, Solomon DA. Lower extremity venography: still the gold standard. (editorial) Chest 1999;116:853-854
  26. Kalodiki E, Nicolaides AN, Al-Kutoubi A, Cunningham DA, Mandalia S. How "gold" is the standard? interobservers' variation on venograms. Int Angiol 1998;17:83-88
  27. Baile EM, King GG, Muller NL, et al. Spiral computed tomography is comparable to angiography for the diagnosis of pulmonary embolism. Am J Respir Crit Care Med 2000;161:1010-1015
  28. The PIOPED Investigators. Value of the ventilation/perfusion scan in acute pulmonary embolism: results of the prospective investigation of pulmonary embolism diagnosis (PIOPED). JAMA 1990;263:2753-2759
  29. Goodman LR, Lipchik RJ, Kuzo RS, Liu Y, McAuliffe TL, O'Brien DJ. Subsequent pulmonary embolism: risk after a negative helical CT pulmonary angiogram—prospective comparison with scintigraphy. Radiology 2000;215:535-542
  30. Garg K, Sieler H, Welsh CH, Johnston RJ, Russ PD. Clinical validity of helical CT being interpreted as negative for pulmonary embolism: implications for patient treatment. AJR 1999;172:1627-1631
  31. Worsley DF, Alavi A. Comprehensive analysis of the results of the PIOPED study: prospective investigation of pulmonary embolism diagnosis study. J Nucl Med 1995;36:2380-2387
  32. Garg K, Welsh CH, Feyerabend AJ, et al. Pulmonary embolism: diagnosis with spiral CT and ventilation-perfusion scanning—correlation with pulmonary angiographic results or clinical outcome. Radiology 1998;208:201-208
  33. Mayo JR, Remy-Jardin M, Muller NL, et al. Pulmonary embolism: prospective comparison of spiral CT with ventilation-perfusion scintigraphy. Radiology 1997;205:447-452
  34. Remy-Jardin M, Remy J, Deschildre F, et al. Diagnosis of pulmonary embolism with spiral CT: comparison with pulmonary angiography and scintigraphy. Radiology 1996;200:699-706
  35. Gosselin MV, Rubin GD, Leung AN, Huang J, Rizk NW. Unsuspected pulmonary embolism: prospective detection on routine helical CT scans. Radiology 1998;208:209-215
  36. Baum RA, Rutter CM, Sunshine JH, et al. Multicenter trial to evaluate vascular magnetic resonance angiography of the lower extremity: American College of Radiology Rapid Technology Assessment Group. JAMA 1995;274:875-880
  37. Verschakelen JA, Vanwijck E, Bogaert J, Baert AL. Detection of unsuspected central pulmonary embolism with conventional contrast-enhanced CT. Radiology 1993;188:847-850
  38. Winston CB, Wechsler RJ, Salazar AM, Kurtz AB, Spirn PW. Incidental pulmonary emboli detected at helical CT: effect on patient care. Radiology 1996;201:23-27
  39. Romano WM, Cascade PN, Korobkin MT, Quint LE, Francis IR. Implications of unsuspected pulmonary embolism detected by computed tomography. Can Assoc Radiol J 1995;46:363-367




