1 Department of Radiology, Johns Hopkins University School of Medicine, 600 N. Wolfe St., Baltimore, MD 21287.
2 Department of Emergency Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287.
3 Present address: Department of Bioengineering, University of Pittsburgh School of Medicine, 3550 Terrace St., Pittsburgh, PA 15213.
4 Present address: Department of Orthopedics, University of Medicine and Dentistry of New Jersey, New Jersey Medical School, 185 S. Orange Ave., Newark, NJ 07103.
Received March 2, 2000;
accepted after revision April 3, 2000.
J. Eng is a fellow of the General Electric-Association of University
Radiologists Radiology Research Academic Fellowship.
Abstract
MATERIALS AND METHODS. A sample of four faculty emergency medicine physicians, four emergency medicine residents, four faculty radiologists, and four radiology residents participated in our study. Each physician interpreted 120 radiographs, approximately half containing a clinically important index finding. Radiographs were interpreted using the original films and high-resolution digital monitors. Accuracy of radiograph interpretation was measured as the area under the physicians' receiver operating characteristic (ROC) curves.
RESULTS. The area under the ROC curve was 0.15 (95% confidence interval [CI], 0.10-0.20) greater for radiologists than for emergency medicine physicians, 0.07 (95% CI, 0.02-0.12) greater for faculty than for residents, and 0.07 (95% CI, 0.02-0.12) greater for films than for video monitors. Using these results, we estimated that teleradiology coverage by faculty radiologists would add 0.09 (95% CI, 0.03-0.15) to the area under the ROC curve for radiograph interpretation by emergency medicine faculty alone, and radiology resident coverage would add 0.08 (95% CI, 0.02-0.14) to this area.
CONCLUSION. We observed significant differences between the interpretation of radiographs on film and on digital monitors. However, we observed differences of equal or greater magnitude associated with the training level and physician specialty of each observer. In evaluating teleradiology services, observer characteristics must be considered in addition to the quality of image display.
Several strategies are available to reduce the time separation between medical decision making in the emergency department and radiograph interpretation by a radiologist: full-time, on-site coverage of the emergency department by a staff radiologist; coverage of the emergency department with teleradiology; coverage of the emergency department by radiology house staff during off-hours; or elimination of radiologists' overreading of emergency department radiographs. In exploring the implications of these options, it is necessary to determine the potential differences in the accuracy of radiograph interpretation associated with each option. These options differ in image display, physician training, and physician specialty. Therefore, we conducted a study to compare the accuracy of radiograph interpretation associated with three factors: interpretation by viewing conventional film versus viewing a digital teleradiology video display; interpretation by a faculty physician versus a house staff physician; and interpretation by an emergency medicine physician versus a radiologist. In particular, we focused on the relative strength of these three factors in influencing the accuracy of interpretation.
Materials and Methods
Each positive case was chosen either because it contained a finding originally overlooked by a physician or because it contained a finding that tends to be overlooked by inexperienced observers. The positive cases each contained one index finding of significant clinical consequence, such as a fracture, pneumothorax, lung mass, pulmonary infiltrate, pneumoperitoneum, or small-bowel obstruction. Of the positive cases, 51% (31/61) contained an index finding that required immediate action by a physician (e.g., a pneumothorax). The true diagnosis of each case was confirmed by a consensus panel of two emergency medicine physicians and two radiologists for a preliminary qualitative study of different observers [1]. Diagnostically difficult cases and an approximately equal distribution of positive and negative cases were chosen to optimize the discriminatory power of the subsequent observer comparisons. In all positive cases, the index finding was directly visible, and correct diagnosis did not solely depend on secondary findings. For example, the fracture line was visible on all fracture radiographs; no fracture case relied only on associated findings such as fat pad displacement to make the diagnosis. Similarly, pneumothoraces were all directly visible, and their diagnosis did not rely only on the presence of subcutaneous air or acute rib fractures.
Physicians
A sample of 16 physician volunteers participated as image interpreters:
four faculty emergency medicine physicians, four emergency medicine residents,
four faculty radiologists, and four radiology residents. All physicians were
affiliated with our institution. All faculty physicians were board-certified
and actively practicing in their respective specialties (emergency medicine or
radiology). The emergency medicine faculty physicians possessed 2, 7, 10, and
12 years of faculty-level experience. The radiology faculty physicians
possessed 2, 3, 4, and 28 years of faculty-level experience. Therefore, the
combined experience of the two faculty physician groups was similar. The
clinical practice of each faculty radiologist involved a significant amount of
general radiology, including the supervision of radiology residents in the
emergency department.
The emergency medicine residents were all in the first half of their postgraduate year (PGY) 3, and the radiology residents were in the first half of PGY4. The PGYs were chosen so that each resident had completed half of their specialty training in either emergency medicine or radiology. At our institution, the first postgraduate year of both residency programs involves general internship training outside either specialty. Therefore, the start of PGY3 of the emergency medicine residency (requiring 3 PGYs to complete) and the start of PGY4 of the radiology residency (requiring 5 PGYs to complete) mark the midpoints of specialty training in emergency medicine and radiology, respectively. The radiology residency program at our institution includes a total of approximately 8 weeks of radiology training in the emergency department during PGYs 2 and 3 and a total of approximately 16 weeks of night call in a hospital emergency department setting during PGYs 3 and 4.
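The midpoint argument above is simple arithmetic. A minimal sketch, assuming, as stated, that PGY1 is a general internship and that specialty training runs from the start of PGY2 to the end of the final PGY; the function name is hypothetical:

```python
def specialty_midpoint_pgy(total_pgys: int) -> float:
    """PGY at whose start half of the specialty training has been completed.

    Assumes PGY1 is a general internship outside the specialty, so specialty
    training runs from the start of PGY2 to the end of PGY<total_pgys>.
    """
    specialty_years = total_pgys - 1
    return 2 + specialty_years / 2


print(specialty_midpoint_pgy(3))  # emergency medicine program (3 PGYs)
print(specialty_midpoint_pgy(5))  # radiology program (5 PGYs)
```

For the 3-PGY emergency medicine program this gives the start of PGY3, and for the 5-PGY radiology program the start of PGY4.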
Radiograph Presentation
Radiographs were presented either as original films on two conventional 17
x 14 inch (43.2 x 35.6 cm) illuminated viewboxes or as digitized
images on a computer workstation with two 20 x 16 inch (50.8 x
40.6 cm) high-resolution video monitors (Model DX 5000 Plus; Kodak Health
Imaging Systems, Richardson, TX) capable of displaying 2560 x
2048 pixels. To produce the digital images, the original radiographs were
scanned with a laser digitizer (Model FD-2000; DuPont,
Wilmington, DE) with a spot size of 210 µm and a gray-scale resolution of
2048 levels. The digital images could be displayed on the video monitors at
full spatial resolution without the need for a computerized magnification
function. The resolution of the video monitors was 2.5 line pairs per
millimeter, and the luminance was 380 candelas per square meter (cd/m²). These
specifications meet the standards for primary teleradiology interpretation
developed by the American College of Radiology [2]. By comparison,
conventional radiographic film has a typical resolution of 5-7 line pairs per
millimeter, and a conventional illuminated viewbox has a typical luminance of
2000-3000 cd/m². (In comparing luminance specifications, note that the visual
perception of luminance is nonlinear and approximately logarithmic.)
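The monitor figures quoted above are mutually consistent, which a short calculation can confirm. A minimal sketch, assuming the quoted 20 x 16 inch size is the active image area and using the Nyquist relation (one line pair needs at least two pixels); the function name is hypothetical:

```python
import math

MM_PER_INCH = 25.4


def nyquist_lp_per_mm(pixels: int, extent_inch: float) -> float:
    """Highest displayable spatial frequency: one line pair needs two pixels."""
    pixels_per_mm = pixels / (extent_inch * MM_PER_INCH)
    return pixels_per_mm / 2


# 2560 x 2048 pixels on a 20 x 16 inch image area (assumed to be the active area).
print(f"horizontal: {nyquist_lp_per_mm(2560, 20):.2f} lp/mm")
print(f"vertical:   {nyquist_lp_per_mm(2048, 16):.2f} lp/mm")

# Monitor (380 cd/m^2) vs viewbox (2000-3000 cd/m^2) on a log10 scale.
for viewbox in (2000, 3000):
    print(f"viewbox {viewbox} cd/m^2: {math.log10(viewbox / 380):.2f} log10 units brighter")
```

Both axes give about 2.5 line pairs per millimeter, matching the stated monitor resolution, and on a logarithmic scale the viewbox is roughly 0.7 to 0.9 log10 units brighter than the monitor.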
Interpretation Protocol
Each physician interpreted 60 of 120 original radiographs on illuminated
viewboxes during two sessions of 30 cases each. Each physician interpreted the
remaining 60 radiographs displayed on video monitors during two additional
sessions of 30 cases each. Case order was randomized for each session, and the
number of positive and negative cases was approximately equal for each
session. Each radiograph was viewed only once by each observer, either on film
or on monitor, but not on both. This procedure prevented recall of previous
findings. The observers were asked to interpret the images as they would in
their usual clinical practice. They were not asked to determine the presence
or absence of a list of specific findings. The observers were unaware of the
proportion of positive and negative cases. An appropriate but nonspecific
clinical history was provided for each case. The clinical information did not
indicate the type or location of abnormalities to be detected, but the
information was not deliberately misleading.
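The text does not say how the 120 cases were allocated to film and monitor sessions. One counterbalanced scheme consistent with the stated constraints (60 cases per display in two 30-case sessions, roughly equal positives per session, randomized order, and two matched observers who between them read every case on both displays) is sketched below; the scheme and all names are hypothetical.

```python
import random


def split_stratified(cases, truth, rng):
    """Split cases into two halves (equal size when the total is even).

    Positives (shuffled) are listed before negatives (shuffled), then dealt
    alternately, so the halves differ by at most one positive case.
    """
    pos = [c for c in cases if truth[c]]
    neg = [c for c in cases if not truth[c]]
    rng.shuffle(pos)
    rng.shuffle(neg)
    ordered = pos + neg
    return ordered[::2], ordered[1::2]


def assign_pair(cases, truth, seed=0):
    """Hypothetical film/monitor schedule for two matched observers.

    Observer 1 reads half 1 on film and half 2 on monitor; observer 2 reads
    the reverse, so together they read every case once on each display.
    Each 60-case block is split into two 30-case sessions in random order.
    """
    rng = random.Random(seed)
    half1, half2 = split_stratified(cases, truth, rng)

    def two_sessions(block):
        s1, s2 = split_stratified(block, truth, rng)
        rng.shuffle(s1)  # randomize reading order within each session
        rng.shuffle(s2)
        return [s1, s2]

    return {
        "observer_1": {"film": two_sessions(half1), "monitor": two_sessions(half2)},
        "observer_2": {"film": two_sessions(half2), "monitor": two_sessions(half1)},
    }
```

With 61 positive and 59 negative cases, this yields four 30-case sessions per observer, each with 15 or 16 positives.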
The observers were asked to interpret each radiograph by recording any and all clinically significant findings. The observers were also asked to assign confidence ratings of low, moderate, or high to each of their findings. When combined with the positive or negative classification of each radiograph, these confidence ratings provided six categories for subsequent receiver operating characteristic (ROC) analysis. The location of findings was considered. A positive radiograph was scored as correct only if the observer identified the presence of the index finding and specified its correct location. Identification of findings unrelated to the index finding (or potential index finding) did not affect the scoring of radiographs. Before the interpretation sessions, each observer was given an individual training session in which the recording of data on study forms and the use of the image display workstation were explained.
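The six ROC categories can be represented as a single ordinal scale. A minimal sketch, assuming the categories run from a high-confidence negative reading (1) to a high-confidence positive reading (6); this ordering convention and the function name are assumptions, and the location requirement for positive cases is not modeled here.

```python
CONFIDENCE_LEVELS = ("low", "moderate", "high")


def rating_category(called_positive: bool, confidence: str) -> int:
    """Map a positive/negative call plus its confidence to an ordinal 1-6 rating.

    1 = negative, high confidence ... 3 = negative, low confidence;
    4 = positive, low confidence ... 6 = positive, high confidence.
    Raises ValueError for an unrecognized confidence label.
    """
    level = CONFIDENCE_LEVELS.index(confidence)  # 0, 1, or 2
    return 4 + level if called_positive else 3 - level
```

Higher values then mean more confidence that a finding is present, which is the orientation ROC analysis expects.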
Data Analysis
Each observer interpreted each radiograph only once, interpreting half of
the images on films and half on video monitors. The resulting 16 independent
data sets were grouped into eight pairs to produce eight correlated data sets
representing pairs of observers who together had read every case in both
display methods. Each pair was matched according to physician specialty and
training level (e.g., each emergency medicine resident was matched with
another emergency medicine resident). The eight correlated data sets were
treated as the interpretations of eight pseudoobservers, each of whom had
interpreted all the radiographs with both display methods. The pairing of data
sets to form pseudoobservers simplifies subsequent analysis, but this
simplification assumes that the two observers composing each pseudoobserver
actually have the same performance. In statistical simulations, we confirmed
that this assumption is conservative. Any systematic difference between the
two observers composing a pseudoobserver decreases the amount of explained
variance and decreases the statistical significance of any associations found
between observer factors and the dependent variable (observer
performance).
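The pairing step amounts to merging two partial rating tables. A hypothetical sketch, assuming each observer's readings are stored as a dict keyed by (case, display); it also checks the stated requirement that the pair together covers every case on both displays exactly once.

```python
def merge_pseudoobserver(obs_a, obs_b, cases, displays=("film", "monitor")):
    """Combine two matched observers' half-readings into one pseudoobserver.

    obs_a, obs_b: dicts mapping (case_id, display) -> rating.
    Raises ValueError if a (case, display) reading is duplicated or missing.
    """
    overlap = set(obs_a) & set(obs_b)
    if overlap:
        raise ValueError(f"{len(overlap)} (case, display) readings duplicated")
    merged = {**obs_a, **obs_b}
    missing = {(c, d) for c in cases for d in displays} - set(merged)
    if missing:
        raise ValueError(f"{len(missing)} (case, display) readings missing")
    return merged
```

Treating the merged table as one observer is exactly the simplification described above; any systematic difference between the two source observers is absorbed into the pseudoobserver's ratings.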
In the statistical analysis, the area under the ROC curve was used as the primary measure of observer performance. The area under the ROC curve is a commonly used index of accuracy in medical imaging trials and is equivalent to the probability of correctly classifying a random pair of images in which one is positive and one is negative [3]. For an observer performing at least at chance level, the area under an ROC curve ranges from 0.50 (random guessing) to 1.00 (perfect diagnostic performance).
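The pairwise-probability interpretation can be checked directly on rating data. A minimal sketch, assuming ordinal ratings where higher means more confidence that a finding is present; ties count one half, which gives the nonparametric (empirical) area rather than the area under a fitted smooth curve. The function name is hypothetical.

```python
from itertools import product


def empirical_auc(pos_ratings, neg_ratings):
    """Probability that a random positive case is rated above a random
    negative case, counting ties as one half (Mann-Whitney form)."""
    wins = 0.0
    for p, q in product(pos_ratings, neg_ratings):
        if p > q:
            wins += 1.0
        elif p == q:
            wins += 0.5
    return wins / (len(pos_ratings) * len(neg_ratings))
```

An observer whose ratings systematically invert the truth would score below 0.50, which is why the useful range starts at chance level.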
Using conventional ROC analysis, a separate ROC curve could be fitted for each of the eight pseudoobservers, and areas under these curves could be compared using a number of statistical techniques [4,5]. However, such comparisons would be statistically inefficient because they involve comparisons of single pairs of ROC curves and do not incorporate all the available data simultaneously. A more efficient approach is a multivariate analysis such as an analysis of variance (ANOVA), which allows simultaneous comparison of the multiple observer characteristics associated with each ROC curve, but conventional multivariate analysis is not directly applicable to the confidence rating data collected for ROC analysis. The jackknife method developed by Dorfman et al. [6] provides the necessary bridge between ROC analysis and conventional ANOVA. Their method was extended for this study to include comparison of observer characteristics and image display method.
With eight pseudoobservers, 120 radiographs, and two display methods, the data set for this study contained 1920 observations. The jackknife method of Dorfman et al. [6] involved transforming these data into a corresponding data set of 1920 pseudovalues using formulas established for the generalized jackknife method. To generate a pseudovalue from a real observation, the following procedure was used. First, the real observation was omitted from the data. This omission left 119 real observations for the pseudoobserver and display method corresponding to the omitted observation. Second, an ROC curve was calculated for these 119 real observations, and a specially scaled form of the area under this ROC curve was recorded as the pseudovalue corresponding to the omitted observation. Third, the omitted real observation was placed back into the original data set. The entire procedure was repeated for each of the 1920 real observations.
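The leave-one-out procedure can be sketched for a single pseudoobserver and display method. This sketch assumes the usual jackknife scaling, pseudovalue = N*A - (N - 1)*A_k, where A is the area from all N cases and A_k the area with case k omitted, as the "specially scaled form"; it uses the nonparametric (empirical) area for simplicity, whereas the study's own areas came from its modified LABMRMC software, which this does not reproduce. Function names are hypothetical.

```python
from itertools import product


def empirical_auc(pos, neg):
    """Mann-Whitney estimate of the ROC area; ties count one half."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def jackknife_pseudovalues(ratings, truth):
    """Leave-one-case-out pseudovalues of the ROC area for one reading condition.

    ratings: ordinal confidence ratings, one per case.
    truth:   True for positive cases, False for negative cases.
    Returns one pseudovalue per case: N * A_all - (N - 1) * A_without_case.
    Assumes at least two positive and two negative cases, so every
    leave-one-out subset still contains both kinds.
    """
    n = len(ratings)

    def area(indices):
        pos = [ratings[i] for i in indices if truth[i]]
        neg = [ratings[i] for i in indices if not truth[i]]
        return empirical_auc(pos, neg)

    a_all = area(range(n))
    return [n * a_all - (n - 1) * area([i for i in range(n) if i != k])
            for k in range(n)]
```

In the study's layout, running this once per pseudoobserver and display method (8 x 2 = 16 conditions of 120 cases) gives the 1920 pseudovalues.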
The jackknife procedure resulted in a data set in which each pseudovalue could be treated as the area under the ROC curve associated with each real observation. These pseudovalues were then used as the continuous dependent variable in a conventional ANOVA with physician specialty, training level, and display method serving as independent categorical variables in a linear model [6]. The ANOVA was formulated to account for statistical correlation originating from the use of the same radiographs for all the observers. The software LABMRMC (version 1.55; Metz CE, University of Chicago, Chicago, IL) was modified to generate the pseudovalues, and ANOVA of the pseudovalues was performed with SAS software (version 6.12; SAS Institute, Cary, NC).
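In a balanced layout like this one (2 specialties x 2 training levels x 2 pseudoobservers per cell x 2 displays x 120 cases = 1920 pseudovalues), the least-squares estimate of each main effect in an additive model equals a simple difference of marginal means. The sketch below computes only those point estimates, assuming a hypothetical 0/1 coding; it does not reproduce the study's F tests or its adjustment for correlation among readings of the same radiographs.

```python
from statistics import mean

# Position of each factor in the key tuple (specialty, training, observer, display, case).
FACTORS = {"specialty": 0, "training": 1, "display": 3}


def main_effects(pseudovalues):
    """Difference in marginal means (level 1 minus level 0) for each factor.

    pseudovalues: dict mapping (specialty, training, observer, display, case)
    -> pseudovalue, with each factor coded 0/1 (hypothetical coding:
    specialty 1 = radiology, training 1 = faculty, display 1 = film).
    """
    effects = {}
    for name, pos in FACTORS.items():
        level1 = mean(v for k, v in pseudovalues.items() if k[pos] == 1)
        level0 = mean(v for k, v in pseudovalues.items() if k[pos] == 0)
        effects[name] = level1 - level0
    return effects
```

Because every cell contains the same cases, case-to-case variation in the pseudovalues cancels out of these differences.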
Results
The trends suggested in Table 1 are also evident in the unfitted, empiric ROC curves [7] obtained from the confidence ratings for each pseudoobserver for each radiograph (Figs. 1 and 2); estimates for the area under each ROC curve are also given in Table 1. The empiric ROC curves suggest that for both film and monitor interpretation, the performance of the four observer groups as measured by the area under the ROC curve ranked as follows, from best to worst: radiology faculty, radiology residents, emergency department faculty, and emergency department residents. These qualitative differences in the ROC curves can be quantified using the ANOVA of jackknifed pseudovalues, which provides a method for pooling the ROC data across all observer subgroups (substantially increasing the statistical power) and making statistical comparisons involving each of the observer factors while adjusting for the other independent variables. The ANOVA results (Table 2) reveal statistically significant differences in observer performance: the area under the ROC curve was 0.15 (95% CI, 0.10-0.20) greater for radiologists than for emergency department physicians, 0.07 (95% CI, 0.02-0.12) greater for the interpretation of film than for the interpretation of high-resolution digital images, and 0.07 (95% CI, 0.02-0.12) greater for faculty physicians than for residents. Therefore, the training level effect was approximately equal to the display method effect, and the physician specialty effect was approximately twice either of the other two effects.
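Empiric ROC operating points can be read directly from six-category ratings by sweeping a decision threshold. A minimal sketch, assuming ratings coded 1 to 6 with higher values meaning more confidence that the index finding is present; the function names are hypothetical.

```python
def empiric_roc_points(ratings, truth, n_categories=6):
    """(FPF, TPF) points from calling a case positive when rating >= threshold."""
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    for threshold in range(n_categories, 0, -1):  # strictest to most lenient
        tp = sum(1 for r, t in zip(ratings, truth) if t and r >= threshold)
        fp = sum(1 for r, t in zip(ratings, truth) if not t and r >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under the straight-line path joining consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Joining the points with straight lines and summing trapezoids gives the empirical area, which equals the pairwise (Mann-Whitney) probability computed from the same ratings.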
No statistically significant variance was found among the pseudoobservers after adjustment for physician specialty, training level, and display method (p = 0.66 for observer variable in Table 2). Two-way ANOVA was performed among all independent variables to evaluate all possible two-way interactions, but none of the combinations was associated with a statistically significant effect on the dependent variable.
Using the estimated effects from the ANOVA, comparisons can be made among the different strategies for radiology coverage in the emergency department. Table 3 shows all pairwise comparisons between four coverage strategies: on-site staff radiologist, teleradiology, radiology resident, or emergency department physician only. Three of the comparisons in Table 3 examine trade-offs between competing physician factors. In the first trade-off comparison, representing teleradiology coverage, emergency department faculty physicians interpreting film (strategy 4) were compared with faculty radiologists interpreting images on video monitors (strategy 2). In this comparison, the performance of faculty radiologists was statistically significantly better than that of the emergency department faculty physicians (p = 0.007). In the second trade-off comparison, representing radiology resident coverage, radiology residents (strategy 3) were found to have statistically significantly better performance than the emergency department faculty (strategy 4) after adjustment for display method (p = 0.01). In the third trade-off comparison, representing a comparison of the two preceding coverage strategies, no statistically significant difference (p = 0.8) was found between radiology faculty interpreting images on video monitors (strategy 2) and radiology residents interpreting film (strategy 3).
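The trade-off comparisons follow from adding and subtracting the rounded effects in Table 2 under an additive model (no interactions, consistent with the interaction analysis above). A sketch, assuming strategy 1 means a faculty radiologist reading film, strategy 2 a faculty radiologist reading on monitors, strategy 3 a radiology resident reading film, and strategy 4 emergency medicine faculty reading film; the names are hypothetical.

```python
from itertools import combinations

# Rounded main effects from Table 2, in units of ROC area.
SPECIALTY = 0.15  # radiologist minus emergency medicine physician
TRAINING = 0.07   # faculty minus resident
DISPLAY = 0.07    # film minus video monitor

# strategy number -> (label, radiologist?, faculty?, film?)  (assumed mapping)
STRATEGIES = {
    1: ("on-site staff radiologist", True, True, True),
    2: ("teleradiology (faculty radiologist, monitor)", True, True, False),
    3: ("radiology resident on film", True, False, True),
    4: ("emergency department faculty on film", False, True, True),
}


def relative_area(radiologist, faculty, film):
    """Additive-model ROC area relative to an emergency medicine resident on a monitor."""
    return SPECIALTY * radiologist + TRAINING * faculty + DISPLAY * film


for a, b in combinations(STRATEGIES, 2):
    _, *factors_a = STRATEGIES[a]
    _, *factors_b = STRATEGIES[b]
    diff = relative_area(*factors_a) - relative_area(*factors_b)
    print(f"strategy {a} vs {b}: {diff:+.2f}")
```

With these rounded values, strategies 2 and 3 each exceed strategy 4 by about 0.08, and strategies 2 and 3 differ by essentially zero, in line with the nonsignificant comparison (p = 0.8). The 0.09 reported in the abstract for teleradiology coverage presumably reflects unrounded estimates.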
Discussion
Studies of high-resolution image display methods have concentrated almost exclusively on comparing the display methods themselves, focusing on the comparison of digital images with radiographic film, predominantly for imaging the chest [8,9,10,11,12,13,14] and skeleton [15,16]. Such studies have reported either no differences or small statistically significant differences (on the order of 0.04) in the area under the ROC curve [8,9,10,11,15] for interpreting radiographic film compared with high-resolution digital images.
The interest in examining the diagnostic accuracy of digital imaging is driven by the notion of using teleradiology to provide the expertise of radiologists to geographically remote sites around the clock. One commonly proposed application of teleradiology is in the emergency department, a setting that often requires rapid clinical decision making but in which a radiologist may not always be immediately available.
For the emergency department radiographs in our study, interpretation by a radiologist was significantly more accurate (as measured by area under the ROC curve) than interpretation by an emergency medicine physician, by 0.15 (Table 2). In many practice settings, on-site coverage of the emergency department by a radiologist may be economically difficult, especially outside standard office hours. Teleradiology has been promoted as a solution to this economic difficulty by allowing one radiologist to cover more than one emergency department simultaneously. However, from our study's results, we estimate that digital teleradiology coverage would diminish the incremental diagnostic performance of a radiologist by approximately 50% (ROC area difference for specialty minus that of display method in Table 2, or comparison of strategies 2 versus 4 in Table 3), even when using equipment meeting the teleradiology standards set by the American College of Radiology. In the comparison of film and digital display methods, qualitatively similar differences were observed in a preliminary study [1], but that study used low-resolution digital equipment not meeting teleradiology standards. More important, the preliminary study could not quantify differences in accuracy between observer groups because appropriate statistical analysis methods were unavailable.
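The approximately 50% figure can be reconstructed from the rounded Table 2 effects, taking emergency medicine faculty reading film as the reference:

```latex
\begin{aligned}
\text{faculty radiologist on film vs. reference} &= \Delta_{\text{specialty}} = 0.15 \\
\text{faculty radiologist on monitor vs. reference} &= \Delta_{\text{specialty}} - \Delta_{\text{display}} = 0.15 - 0.07 = 0.08 \\
\text{fraction of advantage lost} &= \frac{\Delta_{\text{display}}}{\Delta_{\text{specialty}}} = \frac{0.07}{0.15} \approx 0.47
\end{aligned}
```

That is, a faculty radiologist reading on monitors retains a little more than half (0.08 of 0.15) of the radiologist's advantage over emergency medicine faculty reading film.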
In many academic medical centers, radiology house staff provide initial radiology interpretation in the emergency department in lieu of radiology faculty. We estimate that radiology house staff coverage of the emergency department results in a performance improvement similar in amount to that of teleradiology coverage by a faculty radiologist (strategy 2 versus strategy 4 in Table 3 compared with strategy 3 versus strategy 4). Although we acknowledge that radiology house staff are always supervised by radiology faculty, emergency department physicians often make clinical decisions when only the house staff interpretation is available. Our results also suggest that teleradiology coverage by a faculty radiologist offers no better performance than that of a radiology resident viewing the original radiographs in the emergency department (Table 3, strategy 2 versus strategy 3).
Because the images selected for this study were purposely difficult to interpret, the estimates of accuracy obtained in this study are subject to a form of bias analogous to spectrum bias [17]. If the images had been easier to interpret, then it is likely that all observers would have been more accurate and that smaller differences in accuracy would have been observed between the pairs of physician specialties, training levels, and display methods examined in this study. Therefore, the true differences in accuracy associated with the physician characteristics examined in this study are likely to be lower in the general population of emergency department radiographs. In this study we are, in effect, purposely using a form of spectrum bias to magnify any differences in the observers. Because of this spectrum bias, our estimates of the absolute accuracy of the observer groups cannot be generalized to the general population of emergency department radiographs. However, we do not expect conclusions based on the relative differences in accuracy to be similarly affected. For example, we would still expect the physician specialty effect to be more important than either the training or display effect. If the images were too easy to interpret, then the skill of the observer would not be important, so it would become difficult even to differentiate a highly accomplished observer from a naive observer such as a medical student. The emphasis on difficult radiographs is also important because skill in their interpretation is part of the expertise patients seek when obtaining medical care.
Because there was a large proportion of positive radiographs in this study, the estimates of accuracy may be inflated because of context bias [18]. If positive radiographs were rarer in our study, as in the general population of radiographs, then we would expect to have a lower general observer accuracy. However, we do not expect context bias to affect the various observer groups differentially; therefore, conclusions based on differences in accuracy should not be affected. Our study's emphasis on differences and relative differences is justified as long as the focus is on comparing observer factors. Our objective was not to estimate the absolute accuracy of radiography in the diagnosis of conditions generally encountered in the emergency department.
This study was an evaluation of radiograph interpretation, so it was necessary to isolate this activity from other components of diagnostic decision making and from potential clinical outcomes. All interpretations were performed without the benefit of associated clinical information such as detailed history, physical examination, or laboratory results. Interpreting radiographs in the setting of detailed clinical information, as is routinely done in the emergency department, would raise the accuracy of the interpretations. Therefore, it is emphasized that the results of this study are applicable to the performance of radiograph interpretation alone and not to the overall accuracy of diagnosis of conditions. Additionally, this study's emphasis on diagnostic accuracy does not include the value of interactive consultation between the emergency medicine physician and the radiologist.
The observers in this study were given only conventional radiographs to interpret. Radiographs remain the most common imaging examinations, and they are usually interpreted by physicians who are not radiologists but who have enough confidence in their radiographic interpretations to guide clinical decision making. No examinations involving IV contrast material, CT, sonography, or MR imaging were used in this study. It is logical to expect that the differences between the interpretations of the radiologists and of the emergency department physicians would have been greater if these more complex examinations were included in the data set. In practice, emergency department physicians do not commonly act on their own interpretations of these examinations, so these types of images were not included in the data set.
Because none of the physicians in our study had any significant prior experience interpreting radiographs on a digital display, inexperience with the appearance of digital radiographs may be an important contributor to the lower accuracy observed with video monitors compared with film. As primary interpretation of emergency department radiographs from video monitors becomes more routine, a follow-up study of physicians experienced in soft-copy interpretation is needed to determine the importance of soft-copy experience relative to issues of image quality, such as the lower brightness and resolution of video monitors compared with illuminated viewboxes. The loss in accuracy associated with video monitors in this study may also have been less if computed radiography had been used instead of film digitization. A follow-up study using images acquired directly with computed radiography would be needed to detect the potential loss of accuracy caused by imperfections of film digitization compared with computed radiography.
The accuracy of radiograph interpretation is the main outcome being considered in this study. Although we believe all radiologic examinations should be interpreted with the highest available accuracy, accuracy itself is not a true patient outcome. It is also important to consider whether the accurate interpretation of medical images results in improved patient outcome; however, this question is beyond the scope of our work.
The nonprobabilistic sampling of physicians is a major potential limitation to the external validity of this study. Although all faculty physicians were board-certified and all house staff physicians were participating in accredited residency programs, such certifications may be insensitive to moderate variations in interpretive skill. Although we are unable to prove that our sample of physicians is representative of all academic institutions, we do not expect that differences among academic institutions would affect the relative differences in interpretive accuracy observed in this study. However, in the community setting, differences between emergency medicine physicians and radiologists may be smaller than we observed because emergency medicine physicians in these settings may have more experience interpreting radiographs without immediate radiology consultation, which is often readily available in academic centers.
In conclusion, we observed statistically significant differences in interpretive performance between digital display and conventional film presentation of radiographs, even when the comparison involved equipment generally thought to be appropriate for digital teleradiology. However, we found differences of equal or greater magnitude associated with the observer's training level and physician specialty. Therefore, in the evaluation of emergency department radiograph interpretation, observer factors must be considered at least as important as the quality of image display.