AJR 2005; 184:687-690
© American Roentgen Ray Society

Automated Computer-Assisted Categorization of Radiology Reports

Bijoy J. Thomas1, Hugue Ouellette, Elkan F. Halpern and Daniel I. Rosenthal

1 All authors: Division of Musculoskeletal Radiology, Department of Radiology, Massachusetts General Hospital, 32 Fruit St., YAW 6E, Boston, MA 02114.

Received October 30, 2003; accepted after revision June 30, 2004.

 
Address correspondence to B. J. Thomas.


Abstract
OBJECTIVE. The objective of our study was to create and validate an automated computerized method for the categorization of narrative text radiograph reports.

MATERIALS AND METHODS. Using commercially available software with embedded Boolean logic, we created a text search algorithm to categorize reports of radiography examinations into "fracture," "normal," and "neither normal nor fracture." The algorithm was refined and optimized through repeated testing on 512 consecutive ankle radiography reports from a single clinical imaging center. The final algorithm was applied on a different set of 750 consecutive radiography reports of the spine and extremities produced at three different clinical imaging sites and interpreted by 44 different radiologists. Expert reviewers assessed the accuracy of the final classification. The chi-square test or Fisher's exact test was performed to determine the reproducibility of results across different clinical imaging sites.

RESULTS. The computerized classification was highly accurate for the classification of radiography reports into "normal" (specificity, 91.6%; sensitivity, 91.3%), "neither normal nor fracture" (sensitivity, 87.8%; specificity, 94.9%), and "fracture" (sensitivity, 94.1%; specificity, 98.1%) categories. This performance showed no significant difference across the three sites (p > 0.05).

CONCLUSION. Computerized categorization of narrative-text radiography reports is highly sensitive and specific and can be used to classify reports from different imaging sites generated by different radiologists. This method can be an extremely powerful tool in future cost-effectiveness studies, health care policy studies, operations assessments, and quality control.


Introduction
Medical language processing systems have been used successfully in the extraction of information from textual reports [1–8]. Computerized textual analysis has found applications in the field of radiology for the categorization and classification of radiology reports [9–23]. Boolean language analysis has been used successfully for categorization of narrative radiology reports [24, 25].

Boolean analysis of text reports is a binary-based data-mining technique that uses dependency and association rules. It enables complex search strategies using Boolean operators, such as "AND," "OR," "NOT," and so on, which can be combined with specific words (e.g., "fracture") either in isolation or in sequence. The occurrence of two words within a predetermined number of words of each other can be identified and either included or excluded. For example, the search string "no fracture"/4 identifies the ordered occurrence of the words "no" and "fracture" within four words of each other. This search will thus identify statements such as "no fracture," "no new fracture," "no visible fracture," "no acute fracture," and "no evidence of fracture."
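The proximity operator can be illustrated with a short Python sketch. This is not the commercial search software used in the study. The function names `tokenize` and `ordered_within` are hypothetical, and the sketch assumes that "within four words" means that both words fit inside a run of four consecutive words, which is consistent with the example phrases listed above.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ordered_within(text, first, second, window):
    """Return True if `second` occurs after `first` and both words fit
    inside a run of `window` consecutive words (assumed semantics)."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        # tokens[i + 1 : i + window] holds the window - 1 words that follow `first`
        if tok == first and second in tokens[i + 1 : i + window]:
            return True
    return False


phrases = [
    "no fracture",
    "no new fracture",
    "no visible fracture",
    "no acute fracture",
    "no evidence of fracture",
]
for phrase in phrases:
    print(phrase, ordered_within(phrase, "no", "fracture", 4))
```

Because the match is ordered, a phrase such as "fracture, no dislocation" is not matched by this sketch.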

The purpose of this study is to create and validate Boolean language search strings capable of classifying radiography reports into one of three categories: normal, fracture, and "neither normal nor fracture." Although some of the clinical "rules" look only at normal and fracture outcomes [24], we have decided to include a "neither normal nor fracture" category because of our belief in the significance of other types of abnormalities.


Materials and Methods
Radiology reports from an urban teaching hospital and associated clinics were used as the source of data. All radiology reports from 1988 to the present (> 4 million reports) have been archived on an information system (IDXrad, IDX Systems). For ease of search and retrieval, all reports are also transferred to a nonrelational database (Folio, Open Market Inc.) and are made available over the Web on a PC server. Briefly, data are stored in discrete fields (Demographics, History, Body, Impression) that are individually searchable. Each word of each report is indexed, allowing almost instantaneous retrieval of reports containing a single search term. The software permits complex and sophisticated searches using Boolean operators, allowing the user to find reports that include certain words in certain sequences while excluding others.

Creating the Search Strategy
This study was limited to radiographs of the ankle obtained in 2001 and all radiographs of the spine and extremities obtained in 2003. Construction of a search algorithm was an iterative process, as we attempted to deal with idiosyncrasies of syntax and with certain classification difficulties on the boundary between normal and abnormal.

A consecutive series of 512 radiography reports from ankle examinations performed at a single health center in 2001 was used as the test set. An initial pass through the data identified fracture cases if the word "fracture" was actually used in the Impression field, while excluding those reports with a negative statement ("no evidence of fracture," "without evidence of fracture," and so on). For example, the search string "fracture NOT 'no fracture'/8" identifies all reports that have the word "fracture" in them and excludes, from the selected set, the reports that have the words "no" and "fracture" within eight words of each other. Thus, all reports that say "No fracture," "No acute fracture," "No evidence of fracture," "No radiographic evidence of acute or displaced fracture," and so on are excluded, yielding a set of reports with positive findings for fracture.
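A minimal Python sketch of how a search string such as "fracture NOT 'no fracture'/8" might behave is given here. It is an illustration under stated assumptions, not the study's software: the helper names are hypothetical, the exclusion is applied to the whole report as the description above suggests, the eight-word window is taken to mean a run of eight consecutive words, and only the exact word "fracture" is matched (whether the original software also matched inflected forms is not stated).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has_negated(tokens, term, window=8):
    """True if "no" is followed by `term` inside a run of `window` words."""
    return any(
        tok == "no" and term in tokens[i + 1 : i + window]
        for i, tok in enumerate(tokens)
    )


def is_fracture_report(impression, window=8):
    """Emulate: fracture NOT 'no fracture'/window (report-level exclusion)."""
    tokens = tokenize(impression)
    return "fracture" in tokens and not has_negated(tokens, "fracture", window)


print(is_fracture_report("Displaced fracture of the distal fibula."))
print(is_fracture_report("No radiographic evidence of acute or displaced fracture."))
```

The first call returns True and the second returns False, because "No" and "fracture" span exactly eight words in the second report.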

Cases were classified as neither normal nor fracture using a list of terms indicating generic abnormalities, not necessarily specific to the ankle. Terms ending in -osis, -otic, -itis, -itic, and so on were used for this purpose. For example, the search string "itis NOT 'no *itis'/8" detects reports that have words ending in -itis, such as arthritis, myositis, tendinitis, and bursitis, and excludes negative constructions as described earlier.
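The suffix-based search can be sketched in the same way. In this hypothetical Python example, the tuple `SUFFIXES` contains only the four endings named above (the complete term list is not given in the text), and the negation window of eight words is assumed to mean a run of eight consecutive words.

```python
import re

# Only the endings named in the text; the study's full list is not reproduced here.
SUFFIXES = ("osis", "otic", "itis", "itic")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_nnnf_report(text, window=8):
    """True if some word ends in a listed suffix and no such word
    follows "no" inside a run of `window` words."""
    tokens = tokenize(text)
    has_term = any(tok.endswith(SUFFIXES) for tok in tokens)
    negated = any(
        tok == "no" and any(t.endswith(SUFFIXES) for t in tokens[i + 1 : i + window])
        for i, tok in enumerate(tokens)
    )
    return has_term and not negated


print(is_nnnf_report("Moderate osteoarthritis of the ankle joint."))
print(is_nnnf_report("No evidence of arthritis or bursitis."))
```

A pure suffix match is deliberately crude: any word with one of these endings counts, whether or not it names an abnormality, so a real term list would need review by a radiologist.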

If neither fracture nor one of the "neither normal nor fracture" diagnoses applied, the case was classified as normal.

The validity of the classification was tested by two authors who subsequently read and manually classified the reports. From this analysis, several findings were quickly discovered. First, certain observations did not make sense when classified in this manner. Although perhaps not normal, they were unlikely to justify the imaging examination, either because of their prevalence in the asymptomatic population (heel spurs, normal variants, osteopenia, enthesopathy, calcified vessels) or because they would have been known without imaging (soft-tissue swelling, old or healed fractures). These findings were specifically excluded from subsequent searches.

Second, because different radiologists have different dictation styles, the optimal proximity between words—for example, between "no" and "fracture" as described earlier—had to be obtained by trial and error. For example, by testing different proximities between four and 12, it was determined that permitting an eight-word separation between "no" and "fracture" gave the greatest overall accuracy. Shorter separations missed some negative constructions; longer ones included too many coincidental occurrences, and thus missed some fractures.
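The trial-and-error search for the best proximity can be sketched as a simple sweep over candidate windows. The four labeled reports in `TOY_REPORTS` are invented for illustration only and are not study data; the function names are hypothetical, and the window is again assumed to mean a run of consecutive words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_fracture_report(text, window):
    """Emulate: fracture NOT 'no fracture'/window (report-level exclusion)."""
    tokens = tokenize(text)
    negated = any(
        tok == "no" and "fracture" in tokens[i + 1 : i + window]
        for i, tok in enumerate(tokens)
    )
    return "fracture" in tokens and not negated


def accuracy(reports, window):
    """Fraction of (text, is_fracture) pairs classified correctly."""
    correct = sum(is_fracture_report(text, window) == label for text, label in reports)
    return correct / len(reports)


def best_window(reports, windows=range(4, 13)):
    """Smallest window with the highest accuracy over the candidates."""
    return max(windows, key=lambda w: accuracy(reports, w))


# Invented examples, NOT data from the study.
TOY_REPORTS = [
    ("No fracture.", False),
    ("No radiographic evidence of acute or displaced fracture.", False),
    ("No effusion is seen in the joint, but a fracture of the fibula is present.", True),
    ("Fracture of the distal tibia.", True),
]

for w in range(4, 13):
    print(w, accuracy(TOY_REPORTS, w))
print("best window:", best_window(TOY_REPORTS))
```

On this toy set, short windows miss the long negative construction and long windows catch the coincidental "No ... fracture" pair, which mirrors the trade-off described above.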

Third, reports that applied to more than one examination ("associated" examinations) had to be excluded because one examination might be positive and one negative. For example, a single report for a foot and an ankle examination might say "No fractures of the ankle. Fracture of the base of the fifth metatarsal present." Classification of such a report proved to be impossible by our method.

Fourth, in some instances, important findings were omitted from the Impression field. In others, no impression was given. Therefore, the search algorithm was applied to both the Impression and the Body fields of the report in a sequential manner to pick as many positive reports as possible in each category.

Fifth, certain errors of classification persisted, despite repeated modifications to the search algorithm. For example, a report that says, "No evidence of effusion. Fracture of the fifth metatarsal present" is excluded from the fracture category in our search, because the words "no" and "fracture" are within eight words of each other, even though these words are in different sentences.

Ultimately, a cascade of search strings was used. Based on the presence or absence of text in the Impression section, the ankle X-ray reports were classified into reports with an impression and reports without an impression. The reports with impression were initially categorized into "Fracture 1" and "Filtered Reports 1" using a search of the Impression field. Further filtration of the "Filtered Reports 1" category, searching the Body section of the report, was done to repatriate erroneously categorized fracture cases to the "Fracture 2" category, thereby generating the "Filtered Reports 2" category. Then we performed a search of the Impression field in the "Filtered Reports 2" category to obtain "Neither Normal Nor Fracture 1" (NNNF 1) and "Filtered Reports 3" categories. A search of the Body field was done in the "Filtered Reports 3" category to pick "Neither Normal Nor Fracture 2" (NNNF 2) cases erroneously categorized in the "Filtered Reports 3" category, thereby generating the "Normal 1" category. For the group of reports without an impression, a sequential search was done on the Body section of the report alone, thereby generating the "Fracture 3," "Neither Normal Nor Fracture 3" (NNNF 3), and "Normal 2" categories, and the results were added to each other. The sequence of search can be better understood with the help of Figure 1.



Fig. 1. Schematic shows sequence of application of search strings. Total no. of cases of fractures = Fracture 1 + Fracture 2 + Fracture 3; total no. of cases of "neither normal nor fracture" (NNNF) = NNNF 1 + NNNF 2 + NNNF 3; total no. of cases with normal findings = Normal 1 + Normal 2.
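The cascade shown in Figure 1 can be expressed as a sequence of tests on the Impression and Body fields. The following Python sketch is a simplified illustration, not the search strings actually used: the predicates `is_fracture` and `is_nnnf` are hypothetical stand-ins, the suffix list is partial, the findings that the study excluded (for example, old or healed fractures) are not modeled, and a report is assumed to be a dictionary with "impression" and "body" keys.

```python
import re

SUFFIXES = ("osis", "otic", "itis", "itic")  # partial list, for illustration


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def found(text, matches, window=8):
    """True if a matching word is present and none follows "no"
    inside a run of `window` words."""
    tokens = tokenize(text)
    negated = any(
        tok == "no" and any(matches(t) for t in tokens[i + 1 : i + window])
        for i, tok in enumerate(tokens)
    )
    return any(matches(t) for t in tokens) and not negated


def is_fracture(text):
    return found(text, lambda t: t == "fracture")


def is_nnnf(text):
    return found(text, lambda t: t.endswith(SUFFIXES))


def classify(report):
    """Apply the searches in the order described for Figure 1."""
    impression = report.get("impression", "").strip()
    body = report.get("body", "")
    if impression:
        if is_fracture(impression):
            return "Fracture 1"
        if is_fracture(body):
            return "Fracture 2"
        if is_nnnf(impression):
            return "NNNF 1"
        if is_nnnf(body):
            return "NNNF 2"
        return "Normal 1"
    # Reports without an impression: search the Body field alone.
    if is_fracture(body):
        return "Fracture 3"
    if is_nnnf(body):
        return "NNNF 3"
    return "Normal 2"


def final_category(label):
    """Collapse a cascade label such as "NNNF 2" into its final category."""
    return label.rsplit(" ", 1)[0]


example = {"impression": "See above.", "body": "Oblique fracture of the distal fibula."}
print(classify(example), "->", final_category(classify(example)))
```

Testing for fracture before testing for other abnormalities means that a report mentioning both is counted as a fracture, which follows the order of the searches in the figure.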

 

Testing the Search Strategy
The validity of the final computer classification was tested against an entirely new set of consecutive cases, consisting of radiography reports of the spine and extremities drawn from three different clinical sites: a hospital emergency department, a community-based outpatient health center, and a hospital-based walk-in department. A different panel of radiologists served each site, although there was some overlap. A consecutive series of 750 reports from the year 2003 was identified.

Each report was read by two of the authors. Their consensus classification as normal, fracture, or "neither normal nor fracture" was used as the gold standard for comparison. The final computerized classification algorithm was run for the same cases, and the results were compared with our gold standard to calculate specificity, sensitivity, positive predictive value, and negative predictive value.

To determine whether there were significant differences in the accuracy of the search strings in the three different clinical settings, we used the chi-square test, or Fisher's exact test if the chi-square test was not valid owing to an inadequate sample size.
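A comparison of this kind can be sketched with a chi-square test on a table of three sites by two outcomes (correctly or incorrectly classified). The per-site counts below are hypothetical, because the text does not report them, and the table layout is an assumption about how the comparison was set up. The validity check (all expected counts of at least 5) is a common rule of thumb and is also an assumption; Fisher's exact test is not implemented in this sketch.

```python
import math


def chi_square_sites(table):
    """Pearson chi-square for a 3 x 2 table given as [(correct, incorrect), ...].

    Returns (statistic, p_value, valid). The p-value uses the closed form
    exp(-x / 2), which holds only for 2 degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for row, exp_row in zip(table, expected)
        for o, e in zip(row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df != 2:
        raise ValueError("closed-form p-value in this sketch requires df == 2")
    p_value = math.exp(-statistic / 2)
    # Rule-of-thumb validity check (assumption): every expected count >= 5.
    valid = all(e >= 5 for exp_row in expected for e in exp_row)
    return statistic, p_value, valid


# Hypothetical counts for three sites of 250 reports each, NOT study data.
hypothetical = [(230, 20), (225, 25), (235, 15)]
statistic, p_value, valid = chi_square_sites(hypothetical)
print(round(statistic, 3), round(p_value, 3), valid)
```

When `valid` is False, an exact test would be the appropriate substitute, as described above.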


Results
In our test set of 750 cases, the expert reviewers classified 311 cases (41.5%) as normal, 119 (15.9%) as examples of fracture, and 320 cases (42.7%) as neither normal nor fracture. The search algorithm identified normal examinations with a high degree of accuracy. It correctly identified 285 of a possible 311 reports, yielding a specificity of 91.6% (95% confidence interval [CI], 88.0–94.5%). Overall, the sensitivity for any abnormality was 91.3%. However, the sensitivity depended on the form of abnormality (chi-square test, p = 0.0161). Although 96.6% (95% CI, 91.6–99.1%) of the fracture cases were identified as abnormal, only 89.4% (95% CI, 85.5–92.5%) of the "neither normal nor fracture" (NNNF) cases were recognized. The negative predictive value was 88.2%, and the positive predictive value was 93.9%. There was no difference in the ability to distinguish between normal and abnormal (combined fracture and "neither normal nor fracture") examinations among the three sites (chi-square test, p = 0.0550).

Similarly, the search results for "neither normal nor fracture" were compared with the gold standard and yielded 281 correctly categorized reports of a possible 320. The sensitivity of the search was 87.8% (95% CI, 83.7–91.2%), the overall specificity was 94.9%, the positive predictive value was 92.7%, and the negative predictive value was 91.3%. There was no difference in the ability to identify "neither normal nor fracture" examinations among the three sites (chi-square test, p = 0.2321).

When compared with our gold standard, the search algorithm for the fracture category identified correctly 112 of 119 fractures, resulting in a sensitivity of 94.1% (95% CI, 88.3–97.6%), an overall specificity of 98.1%, a positive predictive value of 90.3%, and a negative predictive value of 98.9%. There was no difference in the ability to identify fracture examinations among the three sites (Fisher's exact test, p = 0.3252).
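The four measures reported for the fracture category can be reproduced from a two-by-two table. In the Python sketch below, the true-positive count (112) and the number of gold-standard fractures (119) come from the text; the false-positive count of 12 is not stated in the text and is inferred here from the reported specificity of 98.1% among the 631 non-fracture reports, so it should be read as an assumption. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Fracture category: 112 of 119 fractures found (from the text).
# fp = 12 is inferred from the reported specificity, not stated in the text.
tp, fn = 112, 119 - 112
fp = 12
tn = 750 - 119 - fp

for name, value in diagnostic_metrics(tp, fp, fn, tn).items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the sketch gives 94.1%, 98.1%, 90.3%, and 98.9%, which agrees with the values reported above.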

No significant difference was detected in the ability of the search algorithm to separate normal from abnormal (combined fracture and "neither normal nor fracture") or to separate cases of "neither normal nor fracture" and fracture across the three sites.


Discussion
We have shown that it is possible to classify radiography reports into one of three categories with a high degree of accuracy using an automated search strategy based on Boolean analysis of unstructured dictated text. Our method successfully used reports generated by 44 different radiologists practicing in three different work environments with different prior probabilities of abnormal examinations.

Because the use and associated costs of medical imaging have increased, controversy has developed over its role in the delivery of health care services. Efforts to control costs have focused on the notion of "appropriateness," which is an outgrowth of the traditional medical concept of "indication." Various attempts have been made to limit imaging to those indications deemed appropriate by various authorities [26], but these efforts have had limited success, in part because of their subjectivity. Further, appropriate applications of technology are constantly evolving. Finally, the appropriateness of a particular examination may depend on the availability of alternative resources. For example, in certain instances, imaging may serve as a substitute for a referral to a specialist.

An alternative approach has been to limit the use of imaging to those clinical instances in which the expected yield of positive findings is above certain thresholds. Various clinical "rules" have been developed to determine when imaging should be used [27].

In theory, evaluation of the results of imaging could help to determine indications and whether the use of particular techniques by individual physicians or groups of physicians is consistent with that of their peers. Unfortunately, large-scale applications of this method are laborious, because they require ongoing evaluation of the outcomes of large numbers of imaging studies. Manual coding ("results coding") by trained personnel is rarely performed because it is time-consuming and costly [9], and coding by radiologists has been generally resisted by the radiology community [9].

A successful approach to this problem has been the use of natural language analysis [9–23] and Boolean language analysis [24, 25] to classify text reports into a structured outcome format to make the data useful for clinical research.

Such natural language analysis has been recognized as a promising research tool [5]. The challenge of this approach is to devise a search strategy that can classify results into the desired categories with a high degree of accuracy across a broad range of users and body parts. Different authors have described the use of natural language analysis for the detection of patients with suspected disease conditions such as pneumonia [9, 15–17], inhalational anthrax [18], tuberculosis [19, 20], and breast cancer [21]. This type of analysis has also been successfully used to study an emergency radiology report database [24], a head trauma database [25], a stroke database [22], and a ventilation–perfusion lung scan report database [23].

Our results are comparable to those of previously described natural language processing systems. For example, the natural language processor described by Hripcsak et al. [13] showed a sensitivity of 81% and a specificity of 99% in coding narrative chest radiography reports. Similarly, the natural language processing system described by Zingmond et al. [14] had a sensitivity of 90% and a specificity of 82% in the classification of narrative chest radiography reports.

We believe that our method can, with further refinement and validation, be used to provide data about the effectiveness with which medical imaging is used. Thus, individual practices or groups with a high percentage of normal imaging studies may be overusing a technique. An important caveat is that a high percentage of follow-up examinations will tend to cause an over-estimation of the contribution of the radiographs. For example, an orthopedic practice in which every patient has a fracture will produce an extremely small number of normal radiographs.

In summary, we have shown that an automated analysis of radiology reports of radiography examinations can classify findings into normal and significantly abnormal categories. Additional work will be required to determine whether it can be used to perform comparisons of ordering practices among physician groups.


References

  1. Friedman C. Towards a comprehensive medical language processing system: methods and issues. Proc AMIA Annu Fall Symp 1997:595–599
  2. Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000:270–274
  3. Spyns P. Natural language processing in medicine: an overview. Methods Inf Med 1996; 35:285–301
  4. Sager N, Lyman M, Bucknall C, Nhan N, Tick LJ. Natural language processing and the representation of clinical data. J Am Med Inform Assoc 1994; 1:142–160
  5. Gabrieli ER, Speth DJ. Automated analysis of medical text. I. Clue gathering. J Med Syst 1990; 14:71–91
  6. Baud RH, Rassinoux AM, Scherrer JR. Natural language processing and semantical representation of medical texts. Methods Inf Med 1992; 31:117–125
  7. Taira RK, Soderland SG. A statistical natural language processor for medical reports. Proc AMIA Symp 1999:970–974
  8. Taira RK, Soderland SG. Automatic structuring of radiology free-text reports. RadioGraphics 2001; 21:237–245
  9. Fiszman M, Chapman WW, Aronsky D, Evans RS, Haug PJ. Automatic detection of acute bacterial pneumonia from chest X-ray reports. J Am Med Inform Assoc 2000; 7:593–604
  10. Haug PJ, Ranum DL, Frederick PR. Computerized extraction of coded findings from free-text radiologic reports: work in progress. Radiology 1990; 174:543–548
  11. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc 1994; 1:161–174
  12. Hripcsak G, Friedman C, Alderson PO, DuMouchel W, Johnson SB, Clayton PD. Unlocking clinical data from narrative reports: a study of natural language processing. Ann Intern Med 1995; 122:681–688
  13. Hripcsak G, Austin JH, Alderson PO, Friedman C. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 2002; 224:157–163
  14. Zingmond D, Lenert LA. Monitoring free-text data using medical language processing. Comput Biomed Res 1993; 26:467–481
  15. Chapman WW, Haug PJ. Comparing expert systems for identifying chest X-ray reports that support pneumonia. Proc AMIA Symp 1999:216–220
  16. Chapman WW, Fiszman M, Chapman BE, Haug PJ. A comparison of classification algorithms to automatically identify chest X-ray reports that support pneumonia. J Biomed Inform 2001; 34:4–14
  17. Fiszman M, Chapman WW, Evans SR, Haug PJ. Automatic identification of pneumonia related concepts on chest x-ray reports. Proc AMIA Symp 1999:67–71
  18. Chapman WW, Cooper GF, Hanbury P, Chapman BE, Harrison LH, Wagner MM. Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders. J Am Med Inform Assoc 2003; 10:494–503
  19. Hripcsak G, Knirsch CA, Jain NL, Pablos-Mendez A. Automated tuberculosis detection. J Am Med Inform Assoc 1997; 4:376–381
  20. Jain NL, Knirsch CA, Friedman C, Hripcsak G. Identification of suspected tuberculosis patients based on natural language processing of chest radiograph reports. Proc AMIA Annu Fall Symp 1996:542–546
  21. Jain NL, Friedman C. Identification of findings suspicious for breast cancer based on natural language processing of mammogram reports. Proc AMIA Annu Fall Symp 1997:829–833
  22. Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000; 33:1–10
  23. Fiszman M, Haug PJ, Frederick PR. Automatic extraction of PIOPED interpretations from ventilation/perfusion lung scan reports. Proc AMIA Symp 1998:860–864
  24. Lee SI, Chew FS. 1998 ARRS Executive Council Award: radiology in the emergency department: technique for quantitative description of use and results. AJR 1998; 171:559–564
  25. Imberman SP, Domanski B, Thompson HW. Using dependency/association rules to find indications for computed tomography in a head trauma dataset. Artif Intell Med 2002; 26:55–68
  26. Sistrom CL, Honeyman JC. Relational data model for the American College of Radiology Appropriateness Criteria. J Digit Imaging 2002; 15:216–225
  27. Stiell I, Wells G, Laupacis A, et al. Multicentre trial to introduce the Ottawa ankle rules for use of radiography in acute ankle injuries: Multicentre Ankle Rule Study Group. BMJ 1995; 311:594–597



This article has been cited by other articles:

M. Torriani, B. J. Thomas, M. A. Bredella, and H. Ouellette. MRI of Metatarsal Head Subchondral Fractures in Patients with Forefoot Pain. Am. J. Roentgenol., March 1, 2008; 190(3): 570-575.

B. de Bruijn, A. Cranney, S. O'Donnell, J. D. Martin, and A. J. Forster. Identifying Wrist Fracture Patients with High Accuracy by Automatic Categorization of X-ray Reports. J. Am. Med. Inform. Assoc., November 1, 2006; 13(6): 696-698.

J. H. Thrall. Reinventing Radiology in the Digital Age: Part II. New Directions and New Stakeholder Value. Radiology, October 1, 2005; 237(1): 15-18.

