AJR Women's Imaging Online
AJR 2004; 182:579-583
© American Roentgen Ray Society


Computer-Aided Detection Schemes: The Effect of Limiting the Number of Cued Regions in Each Case

Bin Zheng¹, Joseph K. Leader, Gordon Abrams, Betty Shindel, Victor Catullo, Walter F. Good and David Gur

¹ All authors: Department of Radiology, Imaging Research, Magee-Women's Hospital, University of Pittsburgh, 300 Halket St., Ste. 4200, Pittsburgh, PA 15213-3180.

Received May 7, 2003; accepted after revision September 11, 2003.

 
Address correspondence to B. Zheng (zhengb@msx.upmc.edu).

The information contained in this article does not necessarily reflect the position or the policy of the United States government, and no official endorsement should be inferred.

Supported in part by grants CA85241, CA77850, and CA80836 from the National Cancer Institute of the National Institutes of Health, and by contract DAMD17-00-1-0410 from the United States Army Medical Research Acquisition Center at Fort Detrick, MD.


Abstract
 
OBJECTIVE. We assessed performance changes of a mammographic computer-aided detection scheme when we restricted the maximum number of regions that could be identified (cued) as showing positive findings in each case.

MATERIALS AND METHODS. A computer-aided detection scheme was applied to 500 cases (or 2,000 images), including 300 cases in which mammograms showed verified malignant masses. We evaluated the overall case-based performance of the scheme using a free-response receiver operating characteristic approach, and we measured detection sensitivity at a fixed false-positive detection rate of 0.4 per image after gradually reducing the maximum number of cued regions allowed for each case from seven to one.

RESULTS. The original computer-aided detection scheme achieved a maximum case-based sensitivity of 97% at 3.3 false-positive detected regions per image. For a detection decision score set at 0.565, the scheme had a 79% (237/300) case-based sensitivity, with 0.4 false-positive detected regions per image. After limiting the number of maximum allowed cued regions per case, the false-positive rates decreased faster than the true-positive rates. At a maximum of two cued regions per case, the false-positive rate decreased from 0.4 to 0.21 per image, whereas detection sensitivity decreased from 237 to 220 masses. To maintain sensitivity at 79%, we reduced the detection decision score to as low as 0.36, which resulted in a reduction of false-positive detected regions from 0.4 to 0.3 per image and a reduction in region-based sensitivity from 66.1% to 61.4%.

CONCLUSION. Limiting the maximum number of cued regions per case can improve the overall case-based performance of computer-aided detection schemes in mammography.


Introduction
 
Computer-aided detection systems are routinely used in a number of medical institutions around the world to assist radiologists in the detection of abnormalities depicted on mammograms. The number of mammograms scanned through commercial computer-detection systems has been rapidly increasing. Although no general agreement has been reached on how computer-aided detection affects radiologists' performance in terms of sensitivity and specificity [1–4], there are indications that the performance of the computer-aided detection scheme itself has an impact on radiologists' performance in detecting abnormalities [5, 6], and that observer confidence levels in accepting the cues generated by these systems increase with higher performance levels of the scheme [7, 8]. Several commercial computer-aided detection systems have been approved by the United States Food and Drug Administration, and the relative performance levels of such systems have been compared [9, 10]. All commercial computer-aided detection systems use specific threshold values to determine whether an identified suspicious region is ultimately cued as a positive finding, and the performance of these systems is frequently evaluated on the basis of the case-based sensitivity achieved at a given false-positive detection rate. In a case-based (or a breast-based) analysis, sensitivity is based on the correct detection of at least one true-positive region on either the craniocaudal or mediolateral oblique mammographic view or on both [1].

Evaluation of computer-aided detection performance is not a simple matter. Previous studies have shown that performance can vary widely depending on which scoring method is used, and there is no general agreement on which scoring method should be used for this purpose [11, 12]. One study showed that at approximately the same false-positive rate (e.g., 1.5 per image), the measured sensitivity for the detection of microcalcification clusters ranged between 45% and 85% depending on which of three different assessment methods was used [11].

In addition, computer-aided detection performance depends on the composition of the image database used [13]. In general, computer-aided detection schemes may identify a large number of suspicious regions on some images (e.g., images depicting dense tissue patterns), but only a few suspicious regions on other images (e.g., images dominated by fatty tissue) [14]. Therefore, limiting the maximum number of suspicious regions allowed to be cued for one case could potentially reduce the false-positive rate with a relatively small decrease in sensitivity. This approach is used in commercially available systems, but to the best of our knowledge, the effect of implementing the approach on image- and case-based sensitivity and false-positive detection rates has not been described in detail. This study was performed to assess this issue.


Materials and Methods
 
We selected 500 cases (or 2,000 digitized mammograms) from a large image database available in our laboratory. Among these cases, verified malignant masses were depicted in 300 cases, and the remaining 200 cases had negative findings. In all cases with positive findings, a panel of radiologists identified the locations of the mass regions on the images using the original diagnostic and biopsy reports. The central coordinates (x and y) of each mass region were visually identified, marked, and saved in a "truth file." In this data set, mass regions were visible on both the craniocaudal and mediolateral oblique mammographic views in 270 cases and were only visible on one of the two views in 30 cases. Thus, 570 mass regions were identified on the images in this study. Figure 1 shows the size distribution of the 300 masses in the data set.



 
Fig. 1. Bar graph shows size distribution of 300 masses depicted in data set. Mass size is represented by larger depicted area on either craniocaudal or mediolateral oblique mammographic view.

 

A computer program determined the size of each mass region by counting the total number of pixels inside the identified boundary contour of the region (multiplied by 0.0016 cm2 per pixel). The size of a mass was represented by the larger computed area on either the craniocaudal or mediolateral oblique mammogram. For each identified mass region, the panel of radiologists assigned a subjective rating of subtlety using a 5-point rating scale that ranged from 1 (very easily visible) to 5 (very subtly visible). Figure 2 shows the distribution of assigned subtlety ratings in this data set. Subtlety of a mass was represented by the lower rating assigned to either the craniocaudal or mediolateral oblique mammographic view. We verified all cases with negative (or benign) findings by reviewing the available diagnostic information and the data from a follow-up examination with negative results, confirming a minimum of one disease-free year.
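The size and subtlety bookkeeping described above can be sketched as follows. This is only an illustration: the function names and the example pixel counts are hypothetical, and the only values taken from the text are the 0.0016 cm2 per pixel factor and the rule of keeping the larger area and the lower subtlety rating across views.

```python
PIXEL_AREA_CM2 = 0.0016  # area per pixel stated in the text


def region_area_cm2(pixel_count):
    """Area of one mass region from the pixel count inside its contour."""
    return pixel_count * PIXEL_AREA_CM2


def mass_summary(views):
    """Summarize one mass from its depictions.

    views: list of (pixel_count, subtlety_rating) tuples, one per view on
    which the mass is visible (hypothetical input layout).
    Returns (size_cm2, subtlety): the larger area and the lower rating.
    """
    size = max(region_area_cm2(n) for n, _ in views)
    subtlety = min(rating for _, rating in views)
    return size, subtlety


# Hypothetical mass visible on both views.
size, subtlety = mass_summary([(500, 3), (625, 2)])
print(round(size, 4), subtlety)
```

Taking the maximum area and the minimum rating per mass is what makes the case-level distributions look larger and less subtle than a per-region tally would.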



 
Fig. 2. Bar graph shows distribution of subjectively rated subtlety of 300 masses depicted in data set. Subtlety of each identified mass was rated on 5-point scale, ranging from 1 (very easily visible) to 5 (very subtly visible). Mass subtlety is represented by lower-rated depiction on either craniocaudal or mediolateral oblique mammographic view.

 

A computer-aided detection scheme developed previously in our laboratory [15] was applied to the 2,000 images in the data set. Because we only examined computer-aided detection performance for mass detection in this study, each image was first reduced by pixel averaging (a factor of 8 in both x and y directions), increasing the effective pixel size from 50 × 50 µm in the original digitized image to 400 × 400 µm. The mass detection scheme then identified between 10 and 30 suspicious regions in each image depending on the regional tissue patterns. For each identified region, a multilayer regional growth algorithm [16] was applied to define the contours of the region as depicted in the image. If the region met simple growth criteria, a set of features from the interior and surrounding background of the region was computed by the scheme. Otherwise, the region was considered to have negative findings and was deleted. Finally, a feature-based artificial neural network classified each suspicious region as showing positive or negative findings by assigning a detection (or probability) score. In a manner similar to the commercial computer-aided detection products, our detection scheme identified a region as having a positive finding if the detection score exceeded a predetermined threshold. If the detection score did not exceed the threshold, the region was not cued and was considered to be a negative finding.
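The pixel-averaging step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the list-of-rows image layout, and the choice to drop trailing rows or columns that do not fill a complete block are all assumptions made for the example.

```python
def block_average(image, factor=8):
    """Reduce a 2-D image by averaging non-overlapping factor x factor blocks.

    image: list of equal-length rows of pixel values (assumed layout).
    Rows or columns that do not fill a whole block are dropped (assumption).
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    reduced = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block_sum = 0
            for y in range(r * factor, (r + 1) * factor):
                for x in range(c * factor, (c + 1) * factor):
                    block_sum += image[y][x]
            out_row.append(block_sum / (factor * factor))
        reduced.append(out_row)
    return reduced


# Toy 16 x 16 image whose pixel value equals its row index.
toy = [[y for _ in range(16)] for y in range(16)]
small = block_average(toy, factor=8)
print(small)  # 2 x 2 result

# Effective pixel size after reduction, in micrometers.
effective_pixel_um = 50 * 8
```

With a factor of 8, each output pixel covers 8 × 8 input pixels, which is how 50 µm pixels become 400 µm pixels.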

After processing all images, we compared the regions with detected positive findings with the results saved in the truth file. To determine whether a detected region was considered a true-positive finding, we applied the following criterion: If the distance between the computed center of a detected region and the visually marked coordinate on a mammogram was shorter than the effective radius (the average radial length computed by the computer-aided detection scheme), the region was considered to be a match to a true-positive mass. Otherwise, the region was considered a false-positive detection.
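The matching criterion reduces to a single distance comparison. The sketch below assumes that the detected center, the marked truth coordinate, and the effective radius are all expressed in the same pixel units; the function name and the example coordinates are hypothetical.

```python
import math


def is_true_positive(detected_center, effective_radius, truth_center):
    """Return True if the detected region matches the marked mass.

    A match requires the center-to-center distance to be strictly shorter
    than the effective radius of the detected region.
    """
    dx = detected_center[0] - truth_center[0]
    dy = detected_center[1] - truth_center[1]
    return math.hypot(dx, dy) < effective_radius


# Hypothetical region centered at (10, 10); truth mark at (13, 14) is 5 pixels away.
print(is_true_positive((10, 10), 6.0, (13, 14)))  # inside the radius
print(is_true_positive((10, 10), 5.0, (13, 14)))  # exactly on the radius, not a match
```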

To show the original performance of the computer-aided detection scheme when applied to this data set, we plotted free-response receiver operating characteristic curves for both case-based and region-based scores. In the case-based performance curve, sensitivity was assessed on the basis of the correct marking of at least one true-positive region in either (or both) of the two mammographic views, and if two regions were detected, the higher score was selected to represent the mass. In the region-based performance curve, if the same mass was depicted on both craniocaudal and mediolateral oblique views, we considered these two images to represent two independent regions.
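The difference between the two scoring conventions can be made concrete with a small sketch. The data layout (one score per view on which the mass is visible, with None when no detected region matched the mass) and all names and values are assumptions for illustration only.

```python
def sensitivities(masses, threshold):
    """Compute case-based and region-based sensitivity at one threshold.

    masses: dict mapping case id -> list of scores for the true mass, one
    entry per view on which the mass is visible; None means the scheme
    produced no region matching the mass on that view (assumed layout).
    Returns (case_based, region_based).
    """
    detected_cases = 0
    detected_regions = 0
    total_regions = 0
    for view_scores in masses.values():
        total_regions += len(view_scores)
        hits = [s for s in view_scores if s is not None and s > threshold]
        detected_regions += len(hits)
        if hits:  # at least one view cued correctly counts the case
            detected_cases += 1
    return detected_cases / len(masses), detected_regions / total_regions


# Hypothetical scores for four positive cases.
example = {
    "A": [0.80, 0.40],   # cued on one of two views
    "B": [0.60, 0.70],   # cued on both views
    "C": [0.30],         # visible on one view only, not cued
    "D": [None, 0.90],   # matched on one view only
}
case_sens, region_sens = sensitivities(example, threshold=0.565)
print(case_sens, region_sens)
```

Because a case counts as detected when any one view is cued, case-based sensitivity is never lower than region-based sensitivity on the same data.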

We applied a threshold score to the artificial neural network results to evaluate the sensitivity of the scheme at different false-positive rates. We also adjusted the threshold value to produce a false-positive rate comparable to that of the leading commercial computer-aided detection systems (e.g., a false-positive rate of 0.4 regions per image [2]). By changing the total number of cued regions permitted in each case to anywhere from seven to one, we compared the change in performance levels (including both sensitivity and false-positive rate). The scores generated by the artificial neural networks for all detected regions were sorted by value from the highest to the lowest, and the regions with higher scores were selected sequentially until the predetermined limit of cued regions per case was reached. In addition, we kept the case-based sensitivity constant by reducing the detection threshold and assessed the changes in false-positive rates and image-based sensitivity as the total number of allowed cues per case was reduced from seven to two.
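The per-case cue limit described above amounts to thresholding, sorting by score, and truncating. A minimal sketch follows, with hypothetical region identifiers and scores; only the 0.565 threshold is taken from the article.

```python
def cue_regions(case_regions, threshold, max_cues=None):
    """Select the regions to cue for one case.

    case_regions: list of (region_id, score) pairs pooled over all images
    of the case (assumed layout).
    Regions above the threshold are ranked from highest to lowest score,
    and at most max_cues of them are kept (None means no limit).
    """
    above = [r for r in case_regions if r[1] > threshold]
    above.sort(key=lambda r: r[1], reverse=True)
    return above if max_cues is None else above[:max_cues]


# Hypothetical regions from the views of one case.
regions = [("cc-1", 0.62), ("mlo-1", 0.91), ("cc-2", 0.40), ("mlo-2", 0.75)]
cued_all = cue_regions(regions, threshold=0.565)
cued_two = cue_regions(regions, threshold=0.565, max_cues=2)
print(cued_all)
print(cued_two)
```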


Results
 
Figure 3 shows two computed free-response receiver operating characteristic curves after the application of our computer-aided detection scheme to this data set. One is a case-based free-response receiver operating characteristic performance curve; the other is a region-based curve. Setting the threshold value of the artificial neural network detection scores at 0.565 generated a decision threshold line, as shown in Figure 3. At this level, the computer-aided detection scheme identified 79% of the malignant masses with 0.4 false-positive regions per image being cued. At this threshold, the scheme did not detect any false-positive regions in 33.2% (166/500) of the cases.
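Each point on a case-based free-response curve pairs a false-positive rate per image with a case-based sensitivity at one threshold. The sketch below shows one way such points could be tabulated; the data layout, names, and toy scores are assumptions, and four images per case is inferred from the 500 cases and 2,000 images reported.

```python
def froc_points(cases, thresholds, images_per_case=4):
    """Tabulate (false positives per image, case-based sensitivity) pairs.

    cases: list of dicts with 'positive' (bool) and 'regions', a list of
    (score, is_true_mass) pairs (assumed layout).
    """
    n_images = len(cases) * images_per_case
    n_positive = sum(1 for c in cases if c["positive"])
    points = []
    for t in thresholds:
        false_pos = 0
        detected = 0
        for case in cases:
            cued = [r for r in case["regions"] if r[0] > t]
            false_pos += sum(1 for _, is_mass in cued if not is_mass)
            if any(is_mass for _, is_mass in cued):
                detected += 1
        points.append((false_pos / n_images, detected / n_positive))
    return points


# Toy data: two positive cases and one negative case.
toy_cases = [
    {"positive": True, "regions": [(0.90, True), (0.60, False)]},
    {"positive": True, "regions": [(0.50, True), (0.70, False)]},
    {"positive": False, "regions": [(0.58, False), (0.20, False)]},
]
points = froc_points(toy_cases, thresholds=[0.565, 0.3])
print(points)
```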



 
Fig. 3. Graph illustrates overall performance of computer-aided detection scheme when applied to database of 2,000 mammograms (500 cases) with no limitation on number of cued regions. Detection decision threshold line is represented by dotted line. ♢ = case-based free-response receiver operating characteristic curve; the other plotted curve is the image-based free-response receiver operating characteristic curve.

 

Table 1 provides the performance levels of the computer-aided detection scheme when we limited the maximum number of cued regions allowed in one case at this threshold level (0.565). The false-positive detection rate decreased substantially faster than the case-based sensitivity. For example, when we limited the maximum number of cued regions to two per case, the detection sensitivity decreased by 7.2% in relative terms (from 237/300 to 220/300 cases), whereas the false-positive detection rate decreased by 47.3% (from 0.40 to 0.21 per image). In 65% of the true-positive cases, the region with the highest artificial neural network score was the malignant mass region (Table 1).
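As a quick check, the percentages quoted above read as relative decreases. Using only the rounded figures given in the text:

```latex
\[
\frac{237 - 220}{237} = \frac{17}{237} \approx 0.072,
\qquad
\frac{0.40 - 0.21}{0.40} = 0.475
\]
```

The first value matches the reported 7.2%. The second is close to, but not identical with, the reported 47.3%; a plausible explanation (an assumption here, not stated in the article) is that the reported figure was computed from unrounded false-positive rates.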



 
TABLE 1 Performance Levels of Computer-Aided Detection as a Function of the Maximum Number of Cued Regions Allowed per Case

 

Figure 4 shows five free-response receiver operating characteristic curves generated when the maximum allowed number of cues per case was limited to between seven and two. As the maximum number of allowed cues was reduced, the free-response receiver operating characteristic curves tended to become steeper. Table 2 summarizes the results after limiting the maximum number of cued regions and changing the threshold value of the artificial neural network detection scores to maintain a 79% case-based sensitivity. The table shows that we were able to reduce the false-positive rates while maintaining a constant sensitivity. For example, by limiting the maximum allowed number of cues to two per case and adjusting the artificial neural network threshold to 0.36, we reduced the false-positive rate from 0.4 to 0.3 regions per image.
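The constant-sensitivity comparison can be sketched as a search for the highest threshold that, under a given cue limit, still reaches the target case-based sensitivity. Everything below is illustrative: the function names, the candidate thresholds, and the toy cases are hypothetical and are not the study data.

```python
def evaluate(cases, threshold, max_cues, images_per_case=4):
    """Return (case-based sensitivity, false positives per image).

    cases: list of dicts with 'positive' (bool) and 'regions', a list of
    (score, is_true_mass) pairs pooled over the case (assumed layout).
    """
    false_pos = 0
    detected = 0
    n_positive = sum(1 for c in cases if c["positive"])
    for case in cases:
        above = sorted((r for r in case["regions"] if r[0] > threshold),
                       key=lambda r: r[0], reverse=True)
        cued = above if max_cues is None else above[:max_cues]
        false_pos += sum(1 for _, is_mass in cued if not is_mass)
        if any(is_mass for _, is_mass in cued):
            detected += 1
    return detected / n_positive, false_pos / (len(cases) * images_per_case)


def highest_threshold_for(cases, target_sensitivity, max_cues, candidates):
    """Highest candidate threshold that still reaches the target sensitivity."""
    for t in sorted(candidates, reverse=True):
        sens, fp_rate = evaluate(cases, t, max_cues)
        if sens >= target_sensitivity:
            return t, sens, fp_rate
    return None


# Toy data: three positive cases and one negative case.
toy = [
    {"positive": True, "regions": [(0.90, True), (0.80, False), (0.70, False)]},
    {"positive": True, "regions": [(0.75, False), (0.70, False), (0.60, True)]},
    {"positive": True, "regions": [(0.45, True), (0.40, False)]},
    {"positive": False, "regions": [(0.65, False), (0.58, False), (0.57, False)]},
]
baseline = evaluate(toy, 0.565, max_cues=None)
capped = highest_threshold_for(toy, baseline[0], max_cues=2,
                               candidates=[0.565, 0.5, 0.4, 0.36])
print(baseline, capped)
```

In this toy example the capped configuration reaches the same case-based sensitivity with fewer false positives per image, but it does so by losing the mass in the second case and gaining the one in the third, which mirrors the observation that the two approaches do not detect exactly the same masses.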



 
Fig. 4. Graph shows five plots depicting free-response receiver operating characteristic curves generated by different maximum numbers of cued regions allowed per case. Maximum number of cued regions indicated by ♢ = no limit, ■ = ≤ 7, ▲ = ≤ 3, ○ = ≤ 2; the remaining curve corresponds to ≤ 5.

 


 
TABLE 2 Performance Levels of Computer-Aided Detection with Constant Sensitivity of 79% as a Function of the Maximum Number of Cued Regions Allowed per Case

 

One interesting finding was that the two approaches did not detect an identical set of 237 masses: 17 of the masses detected by each approach were missed by the other. When the maximum number of cued regions was limited to two per case, 17 masses with artificial neural network scores higher than 0.565 (range, 0.57–0.77) were eliminated. Reducing the threshold score to 0.36 resulted in the identification of 17 different masses with artificial neural network scores in the range between 0.36 and 0.51. Figure 5 shows the distribution of mass sizes and subtlety ratings of the 34 masses that were each missed by one of the two approaches. The results suggest that the 17 masses that were detected only when the number of allowed cues was limited to two per case and the threshold was lowered tended to be somewhat smaller. All 34 masses were actually positive findings. At this time, the follow-up period on these patients has not been long enough to assess the difference (if any) in clinical impact of the two approaches.



 
Fig. 5. Scatterplot shows sizes and subtlety ratings distributions for 34 masses that were undetected by both case-based and image-based scoring methods. ♢ = no limit to number of regions in each case that may be cued as showing positive findings; the other symbol = maximum number of regions that may be cued is ≤ 2.

 


Discussion
 
Case distributions and rating methods could have a significant effect on the evaluation of computer-aided detection performance levels [11–13]. In this study, we tested a simple scoring method that alters measured performance. The method of limiting the maximum number of cued regions allowed per case is commonly used in commercial computer-aided detection products. However, the actual scores for each region are not available to users. Therefore, several related issues (such as the effect of this approach on overall performance and on the detection, or the missed detection, of specific masses) have not, to our knowledge, been described in the past.

Our study showed that by limiting the maximum number of allowed regions to be cued in each case, a substantial fraction of false-positive regions can be eliminated with only a small decrease in sensitivity. If one wishes to maintain sensitivity, threshold values can be appropriately adjusted for this purpose. Because most masses were visible on both the craniocaudal and mediolateral oblique mammograms and because the detection performance of computer-aided detection systems is commonly evaluated using case-based sensitivity, our results are quite encouraging. It appears that this approach could reduce the false-positive detection rate of the scheme and possibly eliminate some true-positive region-based detections while retaining the initial (unrestricted number of cues) case-based sensitivity. Although the sensitivity can be maintained using this approach (changing the threshold levels for detection), one does not detect exactly the same true-positive masses. We found that limiting the maximum number of cues allowed per case and adjusting the threshold appropriately increased computer-aided detection sensitivity in the subset of smaller masses. In general, this effect is desirable in that it could reduce the number of regions that have to be ruled out by the radiologist. We caution that the use of this approach may not yield improvements of similar magnitude in the clinical environment with a substantially different distribution of truly positive and truly negative cases.

It should be noted that the size and subtlety ratings of masses in the data set were somewhat conservative. In Figures 1 and 2, we used the larger of the sizes computed for a mass from the two mammographic views and presented the less subtle rating for the same mass. Hence, distribution based on image or region would show a somewhat smaller average mass size and a more subtle data set.

Only malignant masses were considered true-positive identifications in this study. In visually assessing the false-positive regions with higher scores (e.g., > 0.7), we found that 19% (40/213) of these regions represented well-defined benign masses (i.e., round benign masses with high contrast and relatively sharp margins). Considering the detection of benign masses as either true-positive or false-positive may have a substantial impact on the evaluation of computer-aided detection performance levels. Because of the approach we used to reduce the number of cued regions per case and because of the size and diversity of the data set used, we believe that our results are not unique to our own computer-aided detection scheme.


References
 

  1. Warren Burhenne LJ, Wood SA, D'Orsi CJ, et al. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000; 215:554–562
  2. Freer TW, Ulissey MJ. Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast center. Radiology 2001; 220:781–786
  3. Brem RF, Schoonjans JM. Radiologist detection of microcalcifications with and without computer-aided detection: a comparative study. Clin Radiol 2001; 56:150–154
  4. Ciatto S, Turco MR, Risso G, et al. Comparison of standard reading and computer aided detection (CAD) on a national proficiency test of screening mammography. Eur J Radiol 2003; 45:135–138
  5. Malich A, Azhari T, Bohm T, Fleck M, Kaiser WA. Reproducibility: an important factor determining the quality of computer aided detection (CAD) systems. Eur J Radiol 2000; 36:170–174
  6. Zheng B, Ganott MA, Britton CA, et al. Soft-display mammographic reading with different computer-assisted detection cueing environments: preliminary findings. Radiology 2001; 221:633–640
  7. Moberg K, Bjurstam N, Wilczek B, Rostgard L, Egge E, Muren C. Computer assisted detection of interval breast cancers. Eur J Radiol 2001; 39:104–110
  8. D'Orsi CJ. Computer-aided detection: there is no free lunch. Radiology 2001; 221:585–586
  9. Malich A, Marx C, Facius M, Boehm T, Fleck M, Kaiser WA. Tumor detection rate of a new commercially available computer-aided detection system. Eur Radiol 2001; 11:2454–2459
  10. Hoffmeister JW, Rogers SK, DeSimio MP, Brem R. Determining efficacy of mammographic CAD systems. J Digit Imaging 2002; 15:198–200
  11. Nishikawa RM, Yarusso LM. Variations in measured performance of CAD schemes due to database composition and scoring protocol. Proc SPIE 1998; 3338:840–844
  12. Kallergi M, Carney GM, Gaviria J. Evaluating the performance of detection algorithms in digital mammography. Med Phys 1999; 26:267–275
  13. Nishikawa RM, Giger ML, Doi K, et al. Effect of case selection on the performance of computer-aided detection schemes. Med Phys 1994; 21:265–269
  14. Zheng B, Chang YH, Gur D. Adaptive computer-aided diagnosis scheme of digitized mammograms. Acad Radiol 1996; 3:806–814
  15. Zheng B, Sumkin JH, Good WF, Maitz GS, Chang YH, Gur D. Applying computer-assisted detection schemes to digitized mammograms after JPEG data compression: an assessment. Acad Radiol 2000; 7:595–602
  16. Zheng B, Chang YH, Gur D. Computerized detection of masses in digitized mammograms using single image segmentation and multilayer topographic feature analysis. Acad Radiol 1995; 2:959–966




