1
Department of Radiology, University of Texas Medical Branch, 301 University
Blvd., Galveston, TX 77555-0465.
2
Present address: Department of Radiology, University of California, S.F.G.H.
(Bldg. NH, Rm. G100), 1001 Potrero Ave., San Francisco, CA 94110.
3
Present address: Department of Radiology, Brigham and Women's Hospital,
Harvard Medical School, 75 Francis St., Boston, MA 02115.
Received February 11, 1999;
accepted after revision August 10, 1999.
Presented at the annual meeting of the American Roentgen Ray Society, San
Francisco, April-May 1998.
Abstract
MATERIALS AND METHODS. Data from 5072 reports generated in our MR imaging section during a 9-month period after the implementation of a commercial continuous speech recognition system were compared with 4552 reports produced during the same period 1 year earlier. Information pertaining to the use of continuous speech recognition, report turnaround time, word recognition rate, report appearance, and equipment costs was collected.
RESULTS. After system installation, continuous speech recognition was used to dictate 81.8% of all reports. The mean report turnaround time decreased from 87.8 to 43.6 hr, and report availability at 24 hr increased from 10.5% to 62.5%. The system was found to have an average word recognition accuracy of 92.7% for spontaneous dictation. Mean report length declined from 95 to 60 words, with an increase in spacing errors from 0.3 to 8.0 per 1000 words and a decrease in spelling errors from 3.0 to 0.8 per 1000 words. Initial hardware and software costs were approximately $10,000, compared with a yearly cost of $12,000 for human transcription.
CONCLUSION. Although the technology is still evolving and was evaluated in its earliest implementation stages, continuous speech recognition nonetheless markedly improved report turnaround time and proved cost-effective.
More recent articles have described systems capable of recognizing larger vocabularies with less training [4, 5]; however, these systems still require discrete speech. Only recently has computer hardware and software technology improved to the point that affordable commercial continuous speech recognition systems are now available for radiology dictation [6, 7]. By enabling the radiologist to speak at a more natural and faster pace, these systems offer a palatable alternative to discrete speech recognizers.
Such a commercial continuous speech recognition system has been implemented in the MR imaging section of our radiology department for 9 months at the time of this writing. The purpose of this article is to describe our initial experience with the system in daily clinical use and to identify its main advantages, disadvantages, and costs compared with those of more conventional transcription methods. Considering these data, we provide practical recommendations for the appropriate application of this new technology.
Before February 1997, all dictation in the MR imaging section was performed using a single conventional dictation station (VoiceWriter; Lanier, Atlanta, GA). Similar dictation stations are located throughout the department; all are linked to a central archive in which dictations are stored until they can be transcribed by a member of the department's central transcription pool. At that time, this pool consisted of nine transcriptionists responsible for more than 20,000 reports a month.
In February 1997, a commercially available continuous speech recognition hardware and software package (MedSpeak/Radiology version 1.2; International Business Machines, Armonk, NY) was installed in the MR imaging section. This system contains a 25,000-word vocabulary designed specifically for the dictation of radiology reports, with the capability to learn new words as needed. Before using the system, each radiologist underwent an individualized training session in which he or she learned the basic commands needed to dictate, edit, and sign reports. Included in this session was a process known as enrollment, in which the user trained the system to his or her voice by dictating 200 standard sentences. The session took approximately 1 hr and was conducted by a member of the hospital's information system service.
Both the transcription pool and the continuous speech recognition unit are interfaced over a local area network to the department's radiology information system (IDXrad version 9.0; IDXrad, Burlington, VT). This radiology information system allows referring clinicians to obtain access to reports from terminals located throughout the hospital as soon as they are transcribed. In addition, the radiology information system stores a variety of information about all radiologic examinations performed in the department.
Data Acquisition
The radiology information system was used retrospectively to obtain a list
of all adult MR imaging examinations performed during the 9-month period after
the installation of continuous speech recognition (February 1-October 31,
1997). Additional data acquired for each examination included the time of
examination completion, the time of report transcription, the report length,
the dictating and signing radiologists, and whether the report was transcribed
by a transcriptionist or the continuous speech recognition system. Similar
data were acquired for all adult MR imaging examinations performed within the
same 9-month period of the preceding year (February 1-October 31, 1996).
Available information regarding the cost of the equipment used in this study was also obtained. In addition, subjective information and ideas were informally collected from radiologists at weekly users' meetings and departmental meetings to gauge their acceptance of the system and are presented in this article.
Radiologists
During the 1996 study period, 4552 adult MR imaging reports were dictated
by 39 radiologists; during the 1997 study period, 5072 reports were dictated
by 33 radiologists. Because of the change of house staff between the two
periods, 44 different radiologists were involved in the study overall, with 17
radiologists (three faculty and 14 house staff) accounting for 90% of all
dictation. These radiologists represented a wide range of radiology experience
(residents, fellows, and faculty), computer experience, and speech patterns.
Two of the three neuroradiology faculty were born abroad, with English learned
as a second language.
Turnaround Time
The report generation process at our institution was divided into six
component parts (Fig. 1).
Although the most direct measure of the effect of continuous speech
recognition would be the time between report dictation (step 3) and report
finalization (step 6), the radiology information system records only the time
of examination completion (step 1), the time of report transcription (step 4),
and the time of report finalization (step 6). During the 1997 study period, a
program was instituted to expedite the signing of reports through electronic
monitoring of each radiologist's signing habits. As a result, radiologists
became much more aware of the need to edit and sign reports expeditiously.
Because this action would spuriously improve the time of report finalization
(cosigned by a staff radiologist) for 1997, we decided to measure turnaround
time from the time of examination completion (step 1) to the time of report
transcription and preliminary report availability (step 4).
On the basis of this analysis, turnaround time may be expressed as time1-2 + time2-3 + time3-4 (Fig. 1). Of these factors, continuous speech recognition would be expected to influence only the time between report dictation and report transcription (time3-4). To ensure that the implementation of continuous speech recognition was the only factor that would significantly influence the turnaround times between the two study periods, the other factors that affect turnaround time were examined. The time between examination completion and the delivery of images to the reading room (time1-2) was considered minimal and was therefore ignored.
The time (time2-3) between the delivery of images to the reading room and their interpretation is influenced by three factors: the frequency of interpretation sessions, the number of examinations performed on weeknights and weekends (because these off-hour studies are not interpreted officially until the next working day), and the number of examinations removed by clinicians before they can be interpreted (because these are not usually recovered or reprinted and interpreted until several days or weeks later). The frequency of interpretation sessions was constant and was ignored. The proportion of off-hours examinations remained similar, with 45% and 42% of examinations performed during off-hours in 1996 and 1997, respectively. The number of examinations removed by clinicians also remained similar, with 1.85% of examinations taken and left uninterpreted for an average of 5.0 weeks in 1996, and 1.46% of examinations taken and left uninterpreted for an average of 5.7 weeks in 1997. These removed examinations raised both the 1996 and 1997 mean turnaround times by approximately 14 hr. Because these values remained similar between the two study periods, we feel confident in attributing any changes in turnaround time primarily to the introduction and use of continuous speech recognition.
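As a rough illustration of how the two turnaround measures used in this study (mean turnaround time from step 1 to step 4, and report availability at 24 hr) could be derived from radiology information system timestamps, consider the following sketch. The record layout, function names, and sample timestamps are hypothetical and are not taken from the study data.

```python
from datetime import datetime

def turnaround_hours(completed, transcribed):
    """Hours from examination completion (step 1) to report transcription (step 4)."""
    return (transcribed - completed).total_seconds() / 3600.0

def summarize(records, cutoff_hr=24.0):
    """Return (mean turnaround in hr, percentage of reports available within cutoff_hr)."""
    hours = [turnaround_hours(c, t) for c, t in records]
    mean_hr = sum(hours) / len(hours)
    pct_available = 100.0 * sum(h <= cutoff_hr for h in hours) / len(hours)
    return mean_hr, pct_available

# Hypothetical (completion, transcription) timestamp pairs, for illustration only.
sample = [
    (datetime(1997, 3, 3, 9, 0), datetime(1997, 3, 3, 15, 0)),   # 6 hr
    (datetime(1997, 3, 3, 10, 0), datetime(1997, 3, 4, 16, 0)),  # 30 hr
    (datetime(1997, 3, 7, 20, 0), datetime(1997, 3, 10, 8, 0)),  # 60 hr, Friday-night study
]
mean_hr, pct_24 = summarize(sample)
```

In this made-up sample, one long weekend delay pulls the mean well above the typical case, which is the same mechanism by which removed examinations raised the mean turnaround times in both study periods.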
Accuracy
System accuracy was assessed by measuring its word recognition rate
[8]. A representative sample of
five radiologists (one faculty, one fellow, one fourth-year resident, and two
third-year residents) was chosen on the basis of their availability on the MR
service. All five radiologists are American-born speakers of English with no
significant accent or speech impediment, and all underwent the routine
training and enrollment process. Three radiologists had extensive system
experience (dictating several hundred reports each), whereas two radiologists
(the third-year residents) had never used the system for actual report
dictation.
The radiologists were asked to spontaneously dictate cases using continuous speech recognition while being observed by two of the authors. (The radiologists were aware of being observed.) Each was observed for a single readout session during which 15-20 cases (mean, 16) were dictated. For each radiologist, the total number of words spoken and the number of words correctly recognized by the system were recorded. Subsequently, a 124-word test report was developed that contained a variety of words commonly used in MR dictation (Appendix 1). Each radiologist was asked to read this test report three times, and a similar determination of word recognition rate was performed. All testing was performed in our designated MR reading room, with no attempt made to control background noise or interruptions.
For the purposes of accuracy determination, punctuation marks and formatting commands (e.g., "new paragraph") were considered words. Numbers, dates, and other compound words were counted as single words, regardless of the number of utterances it took to produce them. A word was considered correctly recognized only if it was transcribed exactly as intended. Errors caused by mispronunciations, homonyms (words with similar pronunciations but different spellings, such as "to," "too," and "two"), and out-of-vocabulary words (words not included in the computer's 25,000-word vocabulary) were considered incorrect.
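The word recognition rate described above reduces to a simple ratio of correctly recognized words to words spoken, once words have been counted under the stated rules. A minimal sketch follows; the function name and the tally shown are hypothetical and are not the study's per-radiologist counts.

```python
def word_recognition_rate(words_spoken, words_correct):
    """Percentage of spoken words that were transcribed exactly as intended.

    Counts are assumed to follow the rules in the text: punctuation marks and
    formatting commands count as words, and numbers, dates, and other compound
    words count as single words.
    """
    if words_spoken <= 0:
        raise ValueError("words_spoken must be positive")
    if not 0 <= words_correct <= words_spoken:
        raise ValueError("words_correct must lie between 0 and words_spoken")
    return 100.0 * words_correct / words_spoken

# Hypothetical tally for one readout session, for illustration only.
rate = word_recognition_rate(words_spoken=1000, words_correct=927)
```

Under this definition, an error of any origin (mispronunciation, homonym, or out-of-vocabulary word) lowers the numerator by one word.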
Report Appearance
Data were collected from the radiology information system regarding the
length (in lines, words, and characters) of each report dictated during both
the continuous speech recognition and the conventional transcription study
periods. Additionally, 100 reports from each period were chosen randomly and
reviewed by one of the authors for word errors. These errors were classified
as errors in spacing (i.e., more than one space between words), spelling, or
word omission or duplication; these data were then scored per 1000 words of
report text.
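Scoring errors per 1000 words, as described above, is a straightforward normalization. The sketch below shows one way to do it; the function name and the error counts are hypothetical and serve only to illustrate the arithmetic.

```python
def errors_per_1000_words(error_count, word_count):
    """Normalize a raw error count to errors per 1000 words of report text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000.0 * error_count / word_count

# Hypothetical counts from a reviewed sample of 4000 words, for illustration only.
reviewed_words = 4000
raw_counts = {"spacing": 10, "spelling": 2, "omission_or_duplication": 1}
rates = {kind: errors_per_1000_words(n, reviewed_words) for kind, n in raw_counts.items()}
```

Normalizing in this way allows samples of different sizes to be compared directly, which matters here because reports became shorter after the switch to continuous speech recognition.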
We also found marked improvement in report availability after the implementation of continuous speech recognition (Fig. 3). Most notably, report availability at 24 hr increased from 10.5% to 62.5% (Table 1). The mean turnaround time declined from 87.8 hr in 1996 to 43.6 hr in 1997, a 50.3% reduction (Table 1). During the last 3 months of the 1997 study period, in which continuous speech recognition use peaked, these figures improved further, with a mean turnaround time of 32.3 hr and a 24-hr report availability of 71.1%.
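The 50.3% reduction follows directly from the two reported means:

```latex
\[
\frac{87.8\ \text{hr} - 43.6\ \text{hr}}{87.8\ \text{hr}} \times 100\%
  = \frac{44.2}{87.8} \times 100\%
  \approx 50.3\%
\]
```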
During spontaneous dictation, the word recognition rate of our continuous speech recognition system for the five radiologists tested varied from 91.9% to 93.3%, with a mean accuracy of 92.7% (n = 4689 words spoken in 80 reports) (Table 2). A typical report examined during this testing is shown in Appendix 2. During reading of the standardized MR test report (Appendix 1), a higher word recognition rate was achieved, varying from 94.1% to 98.9% with a mean of 96.5% (n = 1860 words spoken in 15 readings). After implementation of continuous speech recognition, the mean report length declined by 37%, from 95 words (13.6 lines) to 60 words (7.9 lines) (Table 3). Irregular spacing increased from 0.3 (n = 9266 words) to 8.0 per 1000 words (n = 6297 words), and word omissions and duplications increased from 0.3 to 1.0 per 1000 words. However, spelling errors did decrease from 3.0 to 0.8 per 1000 words.
The equipment cost of our continuous speech recognition system was approximately $10,000 (Table 4). Because system maintenance and training are performed by several existing members of our hospital's information system department (occupying only a small amount of each employee's time), we could not accurately estimate these costs. Further cost analysis is therefore recommended. Before the implementation of continuous speech recognition, MR report transcription occupied 5.5% of the effort of the entire transcription pool (on a per-character basis), which has an annual operating budget of $220,000. On the basis of these figures, we estimate the annual cost of human transcription for our MR imaging section to be $12,000.
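The cost estimate above can be reproduced with a short calculation. The sketch below also derives a simple payback period, which is an illustrative assumption: it treats the full annual transcription cost as saved and ignores the maintenance and training costs that could not be estimated. The function names are hypothetical.

```python
def annual_transcription_cost(pool_budget, effort_fraction):
    """Share of the transcription pool's annual budget attributable to one section."""
    return pool_budget * effort_fraction

def payback_months(equipment_cost, annual_savings):
    """Months needed for annual savings to cover a one-time equipment cost."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return 12.0 * equipment_cost / annual_savings

# Figures from the text: $220,000 pool budget, 5.5% MR share, $10,000 equipment cost.
mr_cost = annual_transcription_cost(220_000, 0.055)  # about $12,100, reported as $12,000
months = payback_months(10_000, mr_cost)             # just under 10 months
```

Including maintenance and training effort would lengthen the payback period, which is why a fuller cost analysis is needed before the figure is relied on.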
Because continuous speech recognition transcribes reports in real time (i.e., as they are being dictated), the third and fourth steps of the report generation process are combined, eliminating a significant amount of time (time3-4) that would have transpired with conventional dictation. A tangible benefit to our radiologists of the improvement in turnaround time has been a virtual elimination of incoming telephone calls for preliminary reports (which were cited as having been time-consuming). In contrast, a report by Seltzer et al. [9] published in 1997 examined a different manufacturer's technology. These researchers did not find continuous speech recognition to be an effective component of a total quality approach to reducing turnaround time in a large teaching hospital. This discrepancy in findings suggests that significant differences may exist in various continuous speech recognition technologies or dictating patterns or both.
Because a transcribed report is available to the dictating radiologist as the images are being read, he or she can edit and sign the report immediately [2, 4, 6], combining step 5 (when the report is dictated by a house staff radiologist) or steps 5 and 6 (when the report is dictated by faculty) into the dictation process; this can be expected to eliminate a significant amount of time (time4-5 and time5-6) [7]. The resulting timesaving was not evaluated in this study because of the confounding effect of other programs recently instituted to encourage radiologists to sign their reports expeditiously.
Although a variety of measures have been proposed to reduce turnaround time [9], continuous speech recognition has a distinct benefit in that a report can be edited and signed on-line on a case-by-case basis (not in a batch mode) while the images are still being viewed [6]. We feel it is reasonable to expect that such immediacy will lead to a reduction in major dictation and transcription errors (i.e., those that change the meaning of a report) as have occurred when reports were not edited until several hours or days after the images were viewed. Examples of such errors encountered include patient misidentification, misstatements by the radiologist (such as transposition of the words "right" and "left" during dictation), and misinterpretation of a critical word by the transcriptionist. Immediate editing and signing is considered a great benefit of the continuous speech recognition system, obviating correcting and signing reports afterward as compared with conventional transcription.
Disadvantages of Continuous Speech Recognition
The major objection given by our radiologists for not using continuous
speech recognition is an increased editing burden. The amount of editing
required by a continuous speech recognition system may be estimated by its
accuracy rate [6]. Although the
92.7% word recognition rate of our system may seem high, all five of the
radiologists examined subjectively thought that this value belied the amount
of editing required [4]. Actual
editing times were not measured, because they varied widely depending on a
radiologist's dictation style (i.e., report length and complexity),
familiarity with computer word processing, ability to type, and tolerance for
minor errors (i.e., those that do not change the meaning of a report)
[6]. In addition, most users
found the editing process to be distracting from image interpretation [1, 10]
and wanted a user interface more closely simulating conventional dictation
[7]. This finding suggests that
any changes in practice patterns should be minimized for a smooth
transition.
Spontaneous speech is rife with what we will refer to in this article as "disfluencies" (e.g., stammering, slurring, hesitation, filled pauses, repairs, fragments, and interruptions). Human listeners (particularly transcriptionists) are accustomed to these disfluencies and can accommodate them. However, continuous computer speech recognition systems often interpret these sounds literally, resulting in reduced accuracy [8, 11, 12]. This fact is reflected in the higher word recognition rate (96.5%) achieved by our radiologists when reading the standardized test report. To minimize disfluencies during dictation, we suggest to our residents that they thoughtfully formulate the entire report before beginning to speak, rather than using a "thinking out loud" approach. Additionally, these data suggest that one should be cautious when evaluating a continuous speech recognition system using a prepared text because diminished accuracy is likely to be encountered during actual practice. Another limiting factor of the continuous speech recognition system is that a minor illness such as pharyngitis results in decreased accuracy.
With increased continuous speech recognition experience, an individual speaker's accuracy usually improves as he or she adapts his or her speech pattern to the system, and concurrently as the system adapts to the individual's respective speech pattern [8]. To eliminate this effect, two radiologists with no continuous speech recognition experience were included in our study, both of whom performed as well as the more experienced speakers.
A limitation of this study is that all five radiologists examined are American-born speakers of English with no significant accent or speech impediment. An extensive evaluation of a wide range of speakers would be useful in estimating the accuracy that could be expected by speakers with different backgrounds.
Presumably to accommodate the increased amount of editing, our radiologists began to shorten their reports (by an average of 37%). This shortening was done primarily by minimizing the clinical history and technique sections and by limiting descriptive detail. In another study, the researchers found this forced conciseness to be beneficial [13], whereas other researchers found physicians were indignant at having to adapt their dictation style to accommodate a computer [10].
It has been suggested that a human transcriptionist performs significant functions in addition to rote transcription, including formatting and grammar correction, that continuous speech recognition systems are not yet capable of handling [6] (although transcriptionists do occasionally make spelling or omission errors, or cannot understand sections of dictated material). The subsequent transfer of this clerical burden to the radiologist could then be expected to lead to an increase in these types of errors. Our data verify this, showing a significant increase in spacing and wording errors after the switch to continuous speech recognition.
Spacing errors were primarily extra spaces between words that appear to accumulate during the editing process as radiologists unfamiliar with computer word processing delete words but not the adjacent space. Word omissions and duplications resulted from dictation that was not fluent. When questioned about their errors, most radiologists indicated they considered the errors minor and had either overlooked them or ignored them in the interest of time. The automatic spell-checking provided by the continuous speech recognition system did result in a significant decrease in spelling errors.
Costs of Continuous Speech Recognition
The costs of a continuous speech recognition installation can vary
depending on the size of the practice and the preexisting infrastructure,
including conventional dictation equipment, networks, computers, and
personnel. A detailed cost analysis should therefore be performed before such
an installation is considered. In our case, the start-up costs of the
continuous speech recognition equipment were found to be favorable compared
with those of human transcription, with the costs recovered in less than a
year. A similar time for cost recovery was recently reported by Rosenthal et
al. [7]. The maintenance of the
continuous speech recognition system was provided by in-house hospital
information system personnel, which is a cost to the hospital that was not
directly measured in this initial study.
Recommendations
Although continuous speech recognition techniques hold great promise for
the future of radiology transcription, at present the technology requires
further evolution, with a need for increased accuracy, improved grammatical
sense, and a more radiologist-friendly user interface. However, current
commercial units (such as described here) do provide distinct benefits. Use of
such a unit should be considered in environments in which one or more of the
following circumstances exist: the current turnaround time is considered too
long; there is a high demand for preliminary reports by radiologists or
clinicians; reports are short or highly standardized (e.g., screening
mammography); the radiologists involved are interested in using the system,
are familiar with or willing to learn computer word processing, and speak
clearly and succinctly, with background noise and interruptions kept to a
minimum and significant computer expertise available.
In our own MR imaging section, this technology has proven to be viable and cost-effective. Because of the improved turnaround time, continuous speech recognition has all but replaced more traditional transcription methods. Our department's transcription pool decreased from nine individuals to one. A limitation of this study is the lack of specific data on cases imaged and reported during normal daily working hours. We believe that the mean turnaround time for such reports generated using continuous speech recognition would be much lower. The continuous speech recognition technology does occasionally malfunction, and a backup transcriptionist is recommended. Templates of common reports are being implemented in our department and are sure to expedite the reporting process even further.
Acknowledgments
We thank Ralph Farr, Tom Epley, and Beth Hill for their efforts in helping
to install and maintain the equipment used in this study, and for training its
users. We thank Laverne Earles for preparation of the manuscript.