AJR 2005; 185:194-198
© American Roentgen Ray Society
Performance and Reproducibility of a Computerized Mass Detection Scheme for Digitized Mammography Using Rotated and Resampled Images: An Assessment
Bin Zheng,
Glenn S. Maitz,
Marie A. Ganott,
Gordon Abrams,
Joseph K. Leader and
David Gur
Department of Radiology, University of Pittsburgh, 300 Halket St., Ste.
4200, Pittsburgh, PA 15213-3180.
Received July 26, 2004;
accepted after revision October 1, 2004.
Address correspondence to B. Zheng
(zhengb@upmc.edu).
Abstract
OBJECTIVE. Our objective was to compare the performance and
reproducibility of a computer-aided detection (CAD) scheme that uses multiple
rotated and resampled images with an in-house-developed CAD scheme
(single-image-based) and a commercial CAD product in detecting masses depicted
on digitized mammograms.
MATERIALS AND METHODS. Ninety-two film mammograms (acquired from 23
patients) were selected. Forty-four mass regions associated with malignancy
were visually identified. A commercial CAD system was used to scan and process
each image four times, for a total of 368 digitized images depicting 176 mass
regions. Images were processed using two CAD schemes developed in our
laboratory. One uses the detection results generated from a single image, and
the other averages five detection scores generated after processing the
originally digitized image and four slightly rotated and resampled images. A
region-based analysis was used to compare reproducibility and performance
levels among the two in-house schemes and the commercial system.
RESULTS. The commercial system detected a total of 98 mass regions
(55.7% sensitivity) and 136 false-positive regions (an average of 0.37 per
image). Among the detected mass regions, 76 represented 19 regions that were
detected on all four scans and 22 represented 10 regions that were not fully
reproducible. Eighty-eight false-positive detections represented 22
reproducible detections on all four scans. Our single-image-based scheme
identified 87 mass regions and 160 false-positive regions. Seventeen mass
regions and 28 false-positive regions were detected on all four scans. The
multiple-image-based scheme identified 98 mass regions and 132 false-positive
regions. Twenty-three mass regions were detected on all four scans. One
hundred twelve of the 132 false-positive regions represented 28 reproducible
detections.
CONCLUSION. Averaging detection scores from multiple rotated and
resampled images generated from a single digitization of a film can reduce
variations in detection scores. Our multiple-image-based scheme improved both
performance and reproducibility over the single-image-based scheme. The
multiple-image-based scheme yielded an overall performance comparable to that
of the commercial system but with improved reproducibility.
Introduction
Computer-aided detection (CAD) systems are routinely used in many
medical institutions around the world. Radiologists' confidence in CAD results
is one of the most important factors in determining whether the use of these
systems actually improves diagnostic performance
[1, 2]. Several studies have
suggested that both the performance levels and the reproducibility of these
systems can affect radiologists' confidence in and reliance on CAD results
[3-7].
The reproducibility of CAD schemes rarely has been reported, partially because
a comprehensive assessment of reproducibility is tedious and difficult,
requiring repeated digitization of a large number of film mammograms.
Otherwise, the results may be unreliable
[6]. In addition, the original
mammograms are not always available to the researcher for this purpose. The
reproducibility of true-positive findings (i.e., actual masses and
microcalcification clusters) is generally substantially better than that of
false-positive findings, suggesting that averaging detection results
(CAD-generated likelihood scores for positive findings) obtained from
repeatedly digitized films may improve reproducibility and overall performance
(i.e., may increase sensitivity or decrease false-positive detection rate)
[3, 8].
Although optical and electronic noise can change the pixel value
distribution of a digitized image, it is believed that small shifts in film
positioning between digitizations are also important factors affecting pixel
values, resulting in poor reproducibility
[6]. That effect is independent
of the specific digitizer being used. In an attempt to improve the
reproducibility and performance of an in-house-developed CAD scheme, we
developed a method that generates multiple images from a single digitization
of a film using small rotations and interpolations (resampling). This approach
intends to simulate small shifts in film positioning during repeated
digitizations [9].
In this study, we digitized a set of mammograms four times and used these
digitized images to evaluate the reproducibility and performance of a
multiple-image-based CAD scheme that averages scores detected from five
matched regions depicted on the original digitized image and four rotated and
resampled images. The results are compared with those for a single-image-based
scheme previously developed in our laboratory and a commercial CAD system.
Materials and Methods
Twenty-three four-view mammographic examinations were selected for the
study. Each examination depicted a visible mass. Biopsy reports confirmed that
all depicted masses were associated with cancer. Twenty-one of the 23 masses
were visible in both craniocaudal and mediolateral oblique views, and two were
visible in only the mediolateral oblique view because the masses were too
interior to be imaged on the craniocaudal view. Hence, 44 mass regions were
depicted in this data set. The locations of these mass regions were marked on
the appropriate images by an experienced radiologist aided by the latest and prior
images and all relevant radiology and pathology reports. Each of the 92
original films was scanned (digitized) and processed four times using a
commercial CAD system (SecondLook, CADx Medical Systems; software version 6.0,
iCAD). CAD results were saved, and the digitized images were transferred to a
server in our laboratory. Figure
1 shows the distribution of measured effective sizes and contrast
levels for the true-positive mass regions. Effective size was defined as the
square root of the product of the longest and shortest axes across the
depicted mass region [10], and
contrast was defined as the difference between the mean pixel values inside
the mass region and the surrounding background
[11].
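The two region descriptors defined above can be written down in a few lines. The following is a minimal sketch, not the authors' implementation: the function names, the boolean-mask representation of the mass and background, and the sign convention (inside minus background) are assumptions made for illustration, and how the surrounding background is delineated is left to the caller.

```python
import math

import numpy as np


def effective_size(longest_axis: float, shortest_axis: float) -> float:
    """Square root of the product of the longest and shortest axes."""
    return math.sqrt(longest_axis * shortest_axis)


def contrast(image: np.ndarray, mass_mask: np.ndarray,
             background_mask: np.ndarray) -> float:
    """Mean pixel value inside the mass minus mean of the background.

    The masks are boolean arrays of the same shape as `image`; the sign
    convention and the choice of background area are assumptions.
    """
    return float(image[mass_mask].mean() - image[background_mask].mean())
```

For a region with axes of 16 mm and 9 mm, `effective_size(16, 9)` gives 12 mm.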
Simulation of repeated digitization using rotation and resampling of one
digitized image is based on the hypothesis that small shifts in film
positioning substantially contribute to poor reproducibility of CAD schemes
[6]. The algorithm for rotating
and resampling images has been described elsewhere
[9]. In brief, each digitized
image is first subsampled by averaging digital values. Each subsampled image
is then automatically cropped to remove the majority of background pixels
while retaining the entire area of breast tissue in the image. The resulting
image ($M$ columns × $N$ rows) is then rotated slightly
four times with rotation angles of $\theta = \pm 0.4°$ and $\pm 0.8°$. The
rotation center is located outside the image at $(M - 573, N/2)$, where the
origin (0, 0) of the coordinate system is at the
top left corner of the image. During each rotation, the center pixel at the
right edge of the image $(M, N/2)$ is shifted by four pixels in the
vertical direction, which represented a maximum linear displacement of 1.6 mm
over the entire image. After rotation, the digital value of each pixel is
resampled (interpolated) on the basis of four partially covered pixels in the
initial digitized image:

$$I'_{x,y} = \sum_{i=1}^{2} \sum_{j=1}^{2} S_{i,j} \, I_{i,j}$$

where $S_{i,j}$ is a coverage ratio of the partial area ($0 \le S_{i,j} \le 1$
and $S_{1,1} + S_{1,2} + S_{2,1} + S_{2,2} = 1$), $I_{i,j}$ is the digital
value of a pixel $(i,j)$ in the original digitized image, and
$I'_{x,y}$ is the digital value of a pixel $(x,y)$ in the rotated and
resampled image. In
this manner, we generate from a single digitized image four images, each with
a slightly different pixel value distribution.
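The rotation and resampling step can be illustrated with a short sketch. This is not the published algorithm [9]: it approximates the four coverage ratios by bilinear weights (which sum to 1, as required), clamps source coordinates at the image border, and picks an arbitrary sign convention for the angle. The function names `rotate_and_resample` and `rotated_set` are hypothetical.

```python
import numpy as np


def rotate_and_resample(image, angle_deg, center_xy):
    """Rotate a 2-D image about center_xy = (x, y) and resample it.

    Each output pixel is a weighted sum of the four source pixels it
    partially covers. Bilinear weights stand in for the coverage
    ratios; this is an approximation, not the authors' exact method.
    """
    n_rows, n_cols = image.shape
    theta = np.deg2rad(angle_deg)
    cx, cy = center_xy
    ys, xs = np.mgrid[0:n_rows, 0:n_cols].astype(float)

    # Inverse mapping: where in the source does each output pixel fall?
    dx, dy = xs - cx, ys - cy
    src_x = cx + np.cos(theta) * dx + np.sin(theta) * dy
    src_y = cy - np.sin(theta) * dx + np.cos(theta) * dy

    # Assumption: clamp at the border so the 2x2 neighborhood exists.
    src_x = np.clip(src_x, 0, n_cols - 1)
    src_y = np.clip(src_y, 0, n_rows - 1)
    x0 = np.minimum(np.floor(src_x).astype(int), n_cols - 2)
    y0 = np.minimum(np.floor(src_y).astype(int), n_rows - 2)
    fx, fy = src_x - x0, src_y - y0

    # Four weights in [0, 1] that sum to 1 for every output pixel.
    s11 = (1 - fx) * (1 - fy)
    s12 = fx * (1 - fy)
    s21 = (1 - fx) * fy
    s22 = fx * fy
    return (s11 * image[y0, x0] + s12 * image[y0, x0 + 1]
            + s21 * image[y0 + 1, x0] + s22 * image[y0 + 1, x0 + 1])


def rotated_set(image, angles_deg=(-0.8, -0.4, 0.4, 0.8), offset=573):
    """Four rotated copies, rotation center at (M - offset, N/2)."""
    n_rows, n_cols = image.shape
    center = (n_cols - offset, n_rows / 2)
    return [rotate_and_resample(image, a, center) for a in angles_deg]
```

Because the weights sum to 1, a uniform image is unchanged by the operation, and a zero angle returns the input image.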
In this study, we compared the reproducibility and performance of three CAD
schemes. The first scheme was that currently used in a commercial system
(SecondLook, software version 6.0); the second scheme was a single-image-based
scheme developed in our laboratory
[12]. This scheme uses three
stages to identify and classify suggestive mass regions. First, we use image
subtraction after processing by two Gaussian filters with a large difference
in the kernel sizes, followed by thresholding to identify between 10 and 30
suggestive regions per image. Second, an adaptive region growth algorithm
defines three topographic layers for each region on the basis of local
contrast measurement. Simple intralayer rules on growth ratio and shape factor
are used to eliminate as many as 75-85% of the identified regions. Third, a
set of features is also computed for each region, and the features are used as
input values in an artificial neural network. The region is classified as
positive or negative on the basis of the region-specific artificial neural
network-generated detection score. The third scheme was a multiple-image-based
scheme that uses the average of five detection scores for all matched regions
after application of the second scheme to the originally digitized image and
four rotated and resampled images. The region-matching criterion is as
follows. If the distance between the centers of gravity of two regions is
smaller than the maximum radial length of either of the two regions in
question, they are considered to be matched
[9]. The radial length is the
computed distance from the region center of gravity to a pixel on the boundary
(contour) [13]. If a matched
region is not identified, a zero score is assigned to the region for this
image. This matching and scoring is done automatically, without human
intervention.
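The matching and averaging rule can be sketched as follows. Several details are assumptions rather than statements from the study: the `Region` record and its field names are hypothetical, "either of the two regions" is read as the larger of the two maximum radial lengths, the nearest candidate is chosen when several regions match, and region coordinates from the rotated images are assumed to be directly comparable with those of the original image.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float                 # center of gravity, x
    cy: float                 # center of gravity, y
    max_radial_length: float  # largest center-to-boundary distance
    score: float              # detection score for this region


def center_distance(a: Region, b: Region) -> float:
    return math.hypot(a.cx - b.cx, a.cy - b.cy)


def is_matched(a: Region, b: Region) -> bool:
    """Matched if the centers are closer than either maximum radial length."""
    return center_distance(a, b) < max(a.max_radial_length,
                                       b.max_radial_length)


def averaged_score(reference: Region, detections_per_image) -> float:
    """Average the reference score with one score per additional image.

    An image with no matched region contributes a score of zero.
    """
    scores = [reference.score]
    for detections in detections_per_image:
        candidates = [d for d in detections if is_matched(reference, d)]
        if candidates:
            nearest = min(candidates,
                          key=lambda d: center_distance(reference, d))
            scores.append(nearest.score)
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```

With a reference score of 0.8 and matched scores of 0.7 and 0.6 in two of the four rotated images, the averaged score is (0.8 + 0.7 + 0.6 + 0 + 0) / 5 = 0.42, which falls below a threshold that the single-image score alone would pass.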
The three schemes were applied to all 368 images in an attempt to detect
the 176 depicted mass regions. The overall performance and reproducibility of
the three schemes were compared. A region-based analysis was performed, in
which each depicted mass region was considered an independent observation.
Similar to the commercial CAD product, one predetermined and fixed threshold
was used in both in-house schemes. Suggestive regions with detection scores
greater than this threshold were considered to be positive; otherwise, the
regions were discarded. The same threshold (0.55) was used in this study as in
a previous study to compare the performance of two commercial CAD systems and
our single-image-based scheme
[14].
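The fixed-threshold decision and the region-based performance measures amount to simple arithmetic. The sketch below uses hypothetical function names; the counts in the usage note are those reported for the commercial system in this article.

```python
THRESHOLD = 0.55  # fixed threshold used for both in-house schemes


def is_positive(score: float, threshold: float = THRESHOLD) -> bool:
    """A region is kept only if its score exceeds the threshold."""
    return score > threshold


def sensitivity(detected_mass_regions: int, total_mass_regions: int) -> float:
    """Region-based sensitivity: each depicted mass region counts once."""
    return detected_mass_regions / total_mass_regions


def false_positives_per_image(false_positive_regions: int,
                              n_images: int) -> float:
    return false_positive_regions / n_images
```

For the commercial system, `sensitivity(98, 176)` is about 0.557 and `false_positives_per_image(136, 368)` is about 0.37.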
Results
Figure 2 compares the
reproducibility of true-positive detections of mass regions for the commercial
CAD system, our single-image-based scheme, and our multiple-image-based
scheme. A reproducible detection was defined as a mass region that was either
detected or missed on all four scans. If a region was detected on only one,
two, or three scans, it was considered nonreproducible. Of the 44 depicted
mass regions, the commercial system generated 34 reproducible detections
(including 19 that were actually detected and 15 that were totally missed).
Using our single-image-based scheme, 35 mass regions were reproducible
(including 17 that were actually detected and 18 that were totally missed).
The multiple-image-based scheme generated 41 reproducible detections
(including 23 that were actually detected and 18 that were totally missed).
Hence, using the multiple-image-based scheme, the nonreproducible detections
of true-positive findings were reduced by 67% (from 9 to 3) when compared with
the single-image-based scheme.
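The reproducibility definition used here reduces to a three-way classification of each region's detection pattern across the four scans. A minimal sketch, with hypothetical function names and labels:

```python
def classify_region(detected_on_scans) -> str:
    """Classify one region from its per-scan detection flags."""
    hits = sum(bool(d) for d in detected_on_scans)
    if hits == len(detected_on_scans):
        return "detected on all scans"   # reproducible
    if hits == 0:
        return "missed on all scans"     # reproducible
    return "nonreproducible"             # detected on 1, 2, or 3 scans


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before
```

Going from 9 nonreproducible mass regions (44 minus 35) to 3 (44 minus 41) gives `relative_reduction(9, 3)`, or about 0.67.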
Figure 3 compares the
reproducibility of false-positive detections for the 368 images. The
commercial CAD system detected a total of 136 false-positive regions
representing 47 independent regions (different locations). Twenty-two (46.8%)
of these 47 regions were reproducible. The single-image-based scheme detected
a total of 160 false-positive regions in 63 different locations. Twenty-eight
(44.4%) of these 63 regions were reproducible. Twenty-five (71.4%) of 35
nonreproducible regions were detected only once in four scans. The
multiple-image-based scheme detected a total of 132 false-positive regions in
38 different locations. Twenty-eight (73.7%) of these regions were
reproducible.
Tables 1, 2, and 3 compare the region-based
performance levels among the three CAD schemes.
Table 1 summarizes the total
number of true- and false-positive mass regions detected in each of the four
digitizations of the 92 images by these three schemes. It also summarizes the
performance levels of these schemes in detecting the 176 mass regions depicted
on all 368 digitized images. Compared with the single-image-based scheme, the
multiple-image-based scheme detected 11 additional mass regions (from 87 to
98) and reduced false-positive detections by 28 (from 160 to 132). Compared
with the multiple-image-based scheme, the commercial CAD system detected the
same number of mass regions and four additional false-positive regions. Tables
2 and 3 show that the
multiple-image-based scheme yielded higher reproducibility than did either the
commercial CAD system or the single-image-based scheme. The
multiple-image-based scheme detected four more reproducible mass regions than
did the commercial CAD system (23 vs 19). The multiple-image-based scheme also
generated six more reproducible false-positive detections than did the
commercial CAD system (28 vs 22). As a result, the multiple-image-based scheme
detected fewer independent regions than did either the commercial CAD system
or the single-image-based scheme (Table
3).
TABLE 2: Number of True- and False-Positive Regions Detected All Four Times When
the Three CAD Schemes Were Applied to 92 Images
Only four repeated digitizations of each image were used in this study.
Therefore, we computed the largest variation among the detection scores for
the 26 true-positive regions that were detected at least once (in four scans)
by both the single-image-based and the multiple-image-based CAD schemes
(Table 3). The average
variations in scores for regions detected by the single-image-based and the
multiple-image-based schemes were 0.088 and 0.042, respectively.
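The score-variation measure can be sketched as below. The text does not spell out the formula, so this assumes that the "largest variation" for a region is the difference between its highest and lowest detection scores over the four scans, and that the reported figures are the means of that quantity over the 26 regions. The function names are hypothetical.

```python
def largest_variation(scores_over_scans) -> float:
    """Assumed definition: highest minus lowest score across scans."""
    return max(scores_over_scans) - min(scores_over_scans)


def mean_largest_variation(scores_by_region) -> float:
    """Average the per-region largest variation over all regions."""
    variations = [largest_variation(s) for s in scores_by_region]
    return sum(variations) / len(variations)


# Toy input for illustration only; these are not scores from the study.
example_scores = [
    [0.60, 0.65, 0.58, 0.70],
    [0.80, 0.80, 0.82, 0.79],
]
example_mean = mean_largest_variation(example_scores)
```

In this toy example the per-region variations are 0.12 and 0.03, so the mean is 0.075.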
Discussion
Pixel value variations in repeated digitizations of film mammograms are
caused mainly by slight shifts in film positioning and noise from the optical
and electronic components of the digitizer
[6]. These variations may
affect computed feature values of suspected regions, potentially resulting in
differences in the detection scores generated by CAD schemes. Current CAD
schemes use a binary threshold to determine which region will be prompted
(detected) or not prompted (discarded); hence, regions with detection scores
close to this threshold (e.g., 0.55) could be detected in one scan and missed
in another. One approach to improve the reproducibility of detection is to
reduce possible variations in the detection scores. In this study, a 52%
reduction (from 0.088 to 0.042) in the variation of detection scores was
achieved. We demonstrated a practical approach in which a set of rotated and
resampled images could replace the tedious task of repeating digitization of
the same films to assess the reproducibility of CAD schemes. What is perhaps
just as important is that previous studies indicated that false-positive
detections were substantially less reproducible than true-positive detections
[3, 8]. Therefore, using the
average of detection scores generated from one digitized image and four
corresponding rotated and resampled images, we could substantially improve the
overall performance of our own CAD scheme (i.e., a 12.6% increase in
sensitivity and a 17.5% reduction in the false-positive detection rate).
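The three percentages quoted in this paragraph are relative changes and follow directly from the reported values. Because both in-house schemes were tested on the same 176 mass regions and 368 images, the ratios of the raw counts equal the ratios of the rates:

```latex
\begin{align*}
\text{score variation:} \quad & \frac{0.088 - 0.042}{0.088} \approx 0.52 \\
\text{sensitivity:} \quad & \frac{98 - 87}{87} \approx 0.126 \\
\text{false-positive rate:} \quad & \frac{160 - 132}{160} = 0.175
\end{align*}
```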
Assessment of the performance of a CAD scheme (i.e., sensitivity and
false-positive rate) and assessment of its reproducibility are two different
tasks. We noted that the number of true- and false-positive regions detected
by the different schemes could be comparable but that the actual regions
(locations) might be different. For example, the number of true- and
false-positive regions detected by the commercial scheme was comparable to
that detected by our multiple-image-based scheme. However, the two schemes
demonstrated substantially different reproducibilities. Sixty-six percent
(19/29) of the true-positive regions and 47% (22/47) of the false-positive
regions were detected during all four digitizations by the commercial system,
whereas 89% (23/26) of the true-positive regions and 74% (28/38) of the
false-positive regions were detected four times by the multiple-image-based
scheme.
The effective size and contrast levels of the mass regions, combined with
the region-based sensitivity (55.7%) we obtained with the commercial system,
suggest that our data set was not particularly difficult. Similar results
(ranging from 52% to 56%) were found in two previous studies testing the
performance of two leading commercial CAD systems
[5, 14]. Hence, our relatively
small data set is quite representative of the distribution of cases we
sequentially ascertained from a large screening population
[14].
The images in this preliminary study were digitized using a Multi-RAD 861
(Howtek Devices). Our previous tests indicated that this digitizer produces
higher noise levels than do other digitizers used in other CAD systems.
Because the pixel value variations in repeatedly digitized images are
generated by both the small shifts of film positioning and the inherent noise
of the digitizer, the improvements in performance with this method may vary
for different digitizers and we expect that this method would perform
somewhat better for noisier images. Additional data are needed in this
regard.
Using a desktop PC with a 1.8-GHz central processing unit (Athlon XP 2200,
AMD) and 512 MB of random-access memory, our CAD scheme (for which the source
codes have not been optimized for the computational speed) takes approximately
8-10 sec to complete the process on a set of five images (including the time
required to rotate and resample the original digitized image four times).
Compared with the time required to digitize film mammograms using current
commercial CAD systems (approximately 45 sec to 1 min per image), the increase
in computing time should have little impact on the overall efficiency of the
CAD system.
Our measured reproducibility for the commercial system was comparable to
that reported for another commercial system
[3, 6, 8] despite the use of different
types of digitizers and algorithms. We therefore believe that, although tested
on images digitized by one commercial system, the concept of averaging
detection scores of matched regions as depicted in a set of rotated and
resampled images should be applicable to other CAD schemes that use digitized
mammograms to identify suggestive regions based on features that are at least
partially derived from the local pixel value distribution.
Acknowledgments
This work was supported in part by grants CA77850 and CA101733 to the
University of Pittsburgh from the National Cancer Institute, National
Institutes of Health.
References
1. D'Orsi CJ. Computer-aided detection: there is no free lunch. Radiology 2001; 221:585-586
2. Guenin MA. How not to assess computer-aided detection for mammography. (letter) AJR 2004; 182:1599
3. Malich A, Azhari T, Bohm T, Fleck M, Kaiser WA. Reproducibility: an important factor determining the quality of computer aided detection (CAD) systems. Eur J Radiol 2000; 36:170-174
4. Moberg K, Bjurstam N, Wilczek B, Rostgard L, Egge E, Muren C. Computer assisted detection of interval breast cancers. Eur J Radiol 2001; 39:104-110
5. Zheng B, Ganott MA, Britton CA, et al. Soft-copy mammographic readings with different computer-assisted detection cueing environments: preliminary findings. Radiology 2001; 221:633-640
6. Taylor CG, Champness J, Reddy M, Taylor P, Potts HW, Given-Wilson R. Reproducibility of prompts in computer-aided detection (CAD) of breast cancers. Clin Radiol 2003; 58:733-738
7. Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection and mammography recall rates after the introduction of a computer-aided detection system. J Natl Cancer Inst 2004; 96:185-190
8. Zheng B, Hardesty LA, Poller WR, Sumkin JH, Golla S. Mammography with computer-aided detection: reproducibility assessment - initial experience. Radiology 2003; 228:58-62
9. Zheng B, Gur D, Good WF, Hardesty LA. A method to test the reproducibility and to improve performance of computer-aided detection schemes for digitized mammograms. Med Phys 2004; 31:2964-2972
10. Nishikawa RM, Giger ML, Doi K, et al. Effect of case selection on the performance of computer-aided detection schemes. Med Phys 1994; 21:265-269
11. Zheng B, Chang YH, Gur D. On the reporting of mass contrast in CAD research. Med Phys 1996; 23:2007-2009
12. Zheng B, Sumkin JH, Good WF, Maitz GS, Chang YH, Gur D. Applying computer-assisted detection schemes to digitized mammograms after JPEG data compression: an assessment. Acad Radiol 2000; 7:595-602
13. Li L, Zheng Y, Zhang L, Clark RA. False-positive reduction in CAD mass detection using a competitive classification strategy. Med Phys 2001; 28:250-258
14. Gur D, Stalder JS, Hardesty LA, et al. Computer-aided detection performance in mammographic examination of masses: assessment. Radiology 2004; 233:418-423
