AJR 2003; 181:51-55
© American Roentgen Ray Society


Towards Complete and Accurate Reporting of Studies of Diagnostic Accuracy: The STARD Initiative

Patrick M. Bossuyt1, Johannes B. Reitsma, David E. Bruns, Constantine A. Gatsonis, Paul P. Glasziou, Les M. Irwig, Jeroen G. Lijmer, David Moher, Drummond Rennie and Henrica C. W. de Vet for the STARD group

1 Department of Clinical Epidemiology and Biostatistics, Academic Medical Center-University of Amsterdam, P. O. Box 22700, 1100 DE Amsterdam, The Netherlands.



 
Reprinted from www.consort-statement.org/stardstatement.htm. Accessed April 8, 2003.

First official version, January 2003.

Address correspondence to P. M. Bossuyt.


Abstract
OBJECTIVE. To improve the accuracy and completeness of reporting of studies of diagnostic accuracy in order to allow readers to assess the potential for bias in a study and to evaluate the generalisability of its results.

METHODS. The Standards for Reporting of Diagnostic Accuracy (STARD) steering committee searched the literature to identify publications on the appropriate conduct and reporting of diagnostic studies and extracted potential items into an extensive list. Researchers, editors, and members of professional organisations shortened this list during a 2-day consensus meeting with the goal of developing a checklist and a generic flow diagram for studies of diagnostic accuracy.

RESULTS. The search for published guidelines about diagnostic research yielded 33 previously published checklists, from which we extracted a list of 75 potential items. At the consensus meeting, participants shortened the list to a 25-item checklist, by using evidence whenever available. A prototype of a flow diagram provides information about the method of recruitment of patients, the order of test execution and the numbers of patients undergoing the test under evaluation, the reference standard, or both.

CONCLUSIONS. Evaluation of research depends on complete and accurate reporting. If medical journals adopt the checklist and the flow diagram, the quality of reporting of studies of diagnostic accuracy should improve to the advantage of clinicians, researchers, reviewers, journals, and the public.


Introduction
The world of diagnostic tests is highly dynamic. New tests are developed at a fast rate, and the technology of existing tests is continuously being improved. Exaggerated and biased results from poorly designed and reported diagnostic studies can trigger the premature dissemination of a test and lead physicians into making incorrect treatment decisions. A rigorous evaluation process of diagnostic tests before introduction into clinical practice could not only reduce the number of unwanted clinical consequences related to misleading estimates of test accuracy, but also limit health care costs by preventing unnecessary testing. Studies to determine the diagnostic accuracy of a test are a vital part of this evaluation process [1–3].



Fig. 1. Prototypical flow diagram of a diagnostic accuracy study.

 

In studies of diagnostic accuracy, the outcomes from one or more tests under evaluation are compared with outcomes from the reference standard, both measured in individuals who are suspected of having the condition of interest. The term test refers to any method for obtaining additional information on a patient's health status. It includes information from history and physical examination, laboratory tests, imaging tests, function tests, and histopathology. The condition of interest or target condition can refer to a particular disease or to any other identifiable condition that may prompt clinical actions, such as further diagnostic testing, or the initiation, modification or termination of treatment. In this framework, the reference standard is considered to be the best available method for establishing the presence or absence of the condition of interest. The reference standard can be a single method, or a combination of methods, to establish the presence of the target condition. It can include laboratory tests, imaging tests, and pathology, but also dedicated clinical follow-up of participants. The term accuracy refers to the amount of agreement between the information from the test under evaluation, referred to as the index test, and the reference standard. Diagnostic accuracy can be expressed in many ways, including sensitivity and specificity, likelihood ratios, diagnostic odds ratio, and the area under a receiver operating characteristic (ROC) curve [4–6].
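As an illustration of the measures named above, the following minimal sketch computes sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio from a 2 × 2 cross-classification of index test results against the reference standard. The class and function names and the example counts are hypothetical and are not part of the STARD statement; the sketch assumes that all four cells are non-zero and does not cover the area under the ROC curve, which requires results at multiple thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test results cross-classified against the reference standard."""
    tp: int  # index test positive, target condition present
    fp: int  # index test positive, target condition absent
    fn: int  # index test negative, target condition present
    tn: int  # index test negative, target condition absent


def accuracy_measures(table: TwoByTwo) -> dict:
    """Return common accuracy measures; assumes every cell count is > 0."""
    sensitivity = table.tp / (table.tp + table.fn)
    specificity = table.tn / (table.tn + table.fp)
    false_positive_rate = table.fp / (table.fp + table.tn)
    false_negative_rate = table.fn / (table.fn + table.tp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Likelihood ratio of a positive and of a negative index test result
        "lr_positive": sensitivity / false_positive_rate,
        "lr_negative": false_negative_rate / specificity,
        # Diagnostic odds ratio as the cross-product ratio of the table
        "diagnostic_odds_ratio": (table.tp * table.tn) / (table.fp * table.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    example = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
    for name, value in accuracy_measures(example).items():
        print(f"{name}: {value:.3f}")
```

All of these measures share the same four counts, which is why a report needs to give the underlying cross-classification and not only the derived estimates.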

Several potential threats to the internal and external validity of a study of diagnostic accuracy exist. A survey of studies of diagnostic accuracy published in four major medical journals between 1978 and 1993 revealed that the methodological quality was mediocre at best [7]. However, assessments were hampered because many reports lacked information on key elements of design, conduct, and analysis of diagnostic studies [7]. The absence of essential information about the design and conduct of diagnostic studies has been confirmed by authors of meta-analyses [8, 9]. As in any other type of research, flaws in study design can lead to biased results. One report showed that diagnostic studies with specific design features are associated with biased, optimistic estimates of diagnostic accuracy compared with studies without such deficiencies [10].

At the 1999 Cochrane Colloquium meeting in Rome, the Cochrane Diagnostic and Screening Test Methods Working Group discussed the low methodological quality and substandard reporting of diagnostic test evaluations. The Working Group felt that the first step towards correcting these problems was to improve the quality of reporting of diagnostic studies. Following the successful CONSORT (consolidated standards of reporting trials) initiative [11–13], the Working Group aimed to develop a checklist of items that should be included in the report of a study of diagnostic accuracy.

The objective of the Standards for Reporting of Diagnostic Accuracy (STARD) initiative is to improve the quality of reporting of studies of diagnostic accuracy. Complete and accurate reporting allows the reader to detect the potential for bias in the study (internal validity) and to assess the generalisability and applicability of the results (external validity).


METHODS
The STARD steering committee (see appendix for membership) started with an extensive search to identify publications on the conduct and reporting of diagnostic studies. This search included MEDLINE, EMBASE, BIOSIS, and the methodological database from the Cochrane Collaboration up to July 2000. In addition, the members of the steering committee examined reference lists of retrieved articles, searched personal files, and contacted other experts in the field of diagnostic research. They reviewed all relevant publications and extracted an extended list of potential checklist items.

Subsequently, the STARD steering committee convened a 2-day consensus meeting for invited experts from the following interest groups: researchers, editors, methodologists, and professional organisations. The aim of the conference was to reduce the extended list of potential items, as appropriate, and to discuss the optimum format and phrasing of the checklist. The selection of items to retain was based on evidence whenever possible.

The meeting format consisted of a mixture of small group sessions and plenary sessions. Each small group focused on a group of related items of the list. The suggestions of the small groups were then discussed in plenary sessions. Overnight, a first draft of the STARD checklist was assembled on the basis of suggestions from the small groups and additional remarks from the plenary sessions. All meeting attendees discussed this version the next day and made additional changes. The members of the STARD group could suggest further changes through a later round of comments by email.

Potential users field-tested the conference version of the checklist and flow diagram, and additional comments were collected. This version was placed on the CONSORT website with a call for comments. The STARD steering committee discussed all comments and assembled the final checklist.


RESULTS
The search for published guidelines for diagnostic research yielded 33 checklists. Based on these published guidelines and on input of steering and STARD group members, the steering committee assembled a list of 75 items. During the consensus meeting on 16-17 September 2000, participants consolidated and eliminated items to form the 25-item checklist. Conference members made major revisions to the phrasing and format of the checklist.

The STARD group received valuable comments and remarks during the various stages of evaluation after the conference, which resulted in the version of the STARD checklist that appears in Table 1.


Table 1. STARD checklist for the reporting of studies of diagnostic accuracy.

 


DISCUSSION
The purpose of the STARD initiative is to improve the quality of reporting of diagnostic studies. The items in the checklist and the flowchart can help authors to describe essential elements of the design and conduct of their study, the execution of tests, and their results.

We arranged the items under the usual headings of a medical research article, but this is not intended to dictate the order in which they have to appear within an article.

The guiding principle in the development of the STARD checklist was to select items that would help readers to judge the potential for bias in the study and to appraise the applicability of the findings. Two other general considerations shaped the content and format of the checklist. First, the STARD group believes that one general checklist for studies of diagnostic accuracy, rather than different checklists for each speciality, is likely to be more widely disseminated and perhaps accepted by authors, peer reviewers, and journal editors. Although the evaluation of an imaging test differs from that of a test in the laboratory, we felt that these differences were more of degree than in kind. The second consideration was the development of a checklist specifically aimed at studies of diagnostic accuracy. We did not include general issues in the reporting of research findings, such as the recommendations contained in the Uniform Requirements for Manuscripts Submitted to Biomedical Journals [14].

Wherever possible, the STARD group based the decision to include an item on evidence linking the item to biased estimates (internal validity) or to variations in measures of diagnostic accuracy (external validity). The evidence varied from narrative articles that explained theoretical principles and papers that presented the results from statistical modelling to empirical evidence derived from diagnostic studies. For several items, the evidence was rather limited.

A separate background document explains the meaning and rationale of each item and briefly summarises the type and amount of evidence [15]. This background document should enhance the use, understanding and dissemination of the STARD checklist.

The STARD group put considerable effort into the development of a flow diagram for diagnostic studies. A flow diagram has the potential to communicate vital information about the design of a study and the flow of participants in a transparent manner [16]. A comparable flow diagram has become an essential element in the CONSORT standards for reporting of randomised trials [12, 16]. The flow diagram could be even more essential in diagnostic studies, in view of the variety of designs employed in diagnostic research. Flow diagrams in the reports of studies of diagnostic accuracy indicate the process of sampling and selecting participants (external validity), the flow of participants in relation to the timing and outcomes of tests, the number of participants who do not receive the index test, the reference standard, or both (potential for verification bias [17–19]), and the number of patients at each stage of the study, which provides the correct denominator for proportions (internal consistency).
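The bookkeeping that such a flow diagram implies can be sketched in a few lines. The example below is only an illustration under stated assumptions: the field names, the four-way split of enrolled participants, and the counts are hypothetical and do not reproduce the prototype in Fig. 1. It checks that the numbers at each stage add up (internal consistency) and reports the share of index-tested participants without a reference standard result, the group that gives rise to the potential for verification bias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantFlow:
    """Hypothetical participant counts at each stage of an accuracy study."""
    eligible: int
    excluded: int
    both_tests: int       # index test and reference standard
    index_test_only: int  # index test, no reference standard
    reference_only: int   # reference standard, no index test
    neither_test: int     # enrolled but received neither test

    @property
    def enrolled(self) -> int:
        return self.eligible - self.excluded

    def check_consistency(self) -> None:
        """Raise ValueError if the stage counts do not sum to the enrolled total."""
        accounted = (self.both_tests + self.index_test_only
                     + self.reference_only + self.neither_test)
        if accounted != self.enrolled:
            raise ValueError(
                f"{accounted} participants accounted for, {self.enrolled} enrolled"
            )

    def unverified_fraction(self) -> float:
        """Fraction of index-tested participants lacking a reference standard."""
        index_tested = self.both_tests + self.index_test_only
        return self.index_test_only / index_tested


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    flow = ParticipantFlow(eligible=500, excluded=40, both_tests=400,
                           index_test_only=45, reference_only=5, neither_test=10)
    flow.check_consistency()
    print(f"enrolled: {flow.enrolled}")
    print(f"unverified among index-tested: {flow.unverified_fraction():.1%}")
```

Only the participants counted under `both_tests` can contribute to the 2 × 2 table, so that count, and not the number enrolled, is the denominator for accuracy estimates.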

The STARD group plans to measure the impact of the statement on the quality of published reports on diagnostic accuracy using a before-and-after assessment [13]. Updates of the STARD initiative's documents will be provided when new evidence on sources of bias or variability becomes available. We welcome any comments, whether on content or form, to improve the current version.


Acknowledgments
 
Members of the STARD Steering Committee

Patrick Bossuyt

Academic Medical Center, Dept. of Clinical Epidemiology, Amsterdam, The Netherlands

Constantine Gatsonis

Brown University, Centre for Statistical Sciences, Providence, RI, United States of America

Les Irwig

Screening and Test Evaluation Program, School of Public Health, University of Sydney, Australia

David Moher

Chalmers Research Group, Ottawa, ON, Canada

Riekie de Vet

Free University, Institute for Research in Extramural Medicine, Amsterdam, The Netherlands

David Bruns

Clinical Chemistry, Charlottesville, VA, United States of America

Paul Glasziou

Mayne Medical School, Dept. of Social & Preventive Medicine, Herston, Australia

Jeroen Lijmer

Academic Medical Center, Dept. of Clinical Epidemiology, Amsterdam, The Netherlands

Drummond Rennie

Journal of the American Medical Association, Chicago, IL, United States of America

Members of the STARD Group

Doug Altman

Institute of Health Sciences, Centre for Statistics in Medicine, Oxford, United Kingdom

Colin Begg

Memorial Sloan-Kettering Cancer Center, Dept. Epidemiology & Biostatistics, New York, NY, United States of America

Harry Büller

Academic Medical Center, Dept. of Vascular Medicine, Amsterdam, The Netherlands

Frank Davidoff

Annals of Internal Medicine, Philadelphia, PA, United States of America

Paul Dieppe

Dept. Social Medicine, University of Bristol, Bristol, United Kingdom

Rijk van Ginkel

Academic Medical Center, Dept. of Clinical Epidemiology, Amsterdam, The Netherlands

Gordon Guyatt

McMaster University, Clinical Epidemiology and Biostatistics, Hamilton, ON, Canada

Richard Horton

The Lancet, London, United Kingdom

Stuart Barton

British Medical Journal, BMA House, London, United Kingdom

William Black

Dartmouth Hitchcock Medical Center, Dept. of Radiology, Lebanon, NH, United States of America

Gregory Campbell

US FDA, Center for Devices and Radiological Health, Rockville, MD, United States of America

Jon Deeks

Institute of Health Sciences, Centre for Statistics in Medicine, Oxford, United Kingdom

Kenneth Fleming

John Radcliffe Hospital, Oxford, United Kingdom

Afina Glas

Academic Medical Center, Dept. of Clinical Epidemiology, Amsterdam, The Netherlands

James Hanley

McGill University, Dept. Epidemiology & Biostatistics, Montreal, QC, Canada

Myriam Hunink

Erasmus Medical Center, Dept. Epidemiology & Biostatistics, Rotterdam, The Netherlands

Jos Kleijnen

NHS Centre for Reviews and Dissemination, York, United Kingdom

Erik Magid

Amager Hospital, Dept. Clinical Biochemistry, Copenhagen, Denmark

Matthew McQueen

Hamilton Civic Hospitals, Dept. of Laboratory Medicine, Hamilton, ON, Canada

John Overbeke

Nederlands Tijdschrift voor Geneeskunde, Amsterdam, The Netherlands

Anthony Proto

Radiology, Editorial Office, Richmond, VA, United States of America

David Sackett

Trout Research and Education Centre, Irish Lake, ON, Canada

Harold Sox

Annals of Internal Medicine, Philadelphia, PA, United States of America

Stephan Walter

McMaster University, Clinical Epidemiology and Biostatistics, Hamilton, ON, Canada

Andre Knottnerus

Maastricht University, Netherlands School of Primary Care Research, Maastricht, The Netherlands

Barbara McNeil

Harvard Medical School, Dept. of Health Care Policy, Boston, MA, United States of America

Andrew Onderdonk

Channing Laboratory, Boston, MA, United States of America

Christopher Price

St Bartholomew's - Royal London School of Medicine and Dentistry, London, United Kingdom

Hans Reitsma

Academic Medical Center, Dept. of Clinical Epidemiology, Amsterdam, The Netherlands

Gerard Sanders

Academic Medical Center, Dept. of Clinical Chemistry, Amsterdam, The Netherlands

Sharon Straus

Mt. Sinai Hospital, Toronto, ON, Canada


References

  1. Guyatt GH, Tugwell PX, Feeny DH, Haynes RB, Drummond M. A framework for clinical evaluation of diagnostic technologies. Can Med Assoc J 1986;134:587–594.
  2. Fryback DG, Thornbury JR. The efficacy of diagnostic imaging. Med Decis Making 1991;11:88–94.
  3. Kent DL, Larson EB. Disease, level of impact, and quality of research methods. Three dimensions of clinical efficacy assessment applied to magnetic resonance imaging. Invest Radiol 1992;27:245–254.
  4. Griner PF, Mayewski RJ, Mushlin AI, Greenland P. Selection and interpretation of diagnostic tests and procedures. Principles and applications. Ann Intern Med 1981;94:557–592.
  5. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. The selection of diagnostic tests. In: Sackett D, editor. Clinical epidemiology, 2nd edn. Boston/Toronto/London: Little, Brown and Company, 1991:47–57.
  6. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283–298.
  7. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645–651.
  8. Nelemans PJ, Leiner T, de Vet HCW, van Engelshoven JMA. Peripheral arterial disease: Meta-analysis of the diagnostic performance of MR angiography. Radiology 2000;217:105–114.
  9. Devries SO, Hunink MGM, Polak JF. Summary receiver operating characteristic curves as a technique for meta-analysis of the diagnostic performance of duplex ultrasonography in peripheral arterial disease. Acad Radiol 1996;3:361–369.
  10. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–1066.
  11. Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–639.
  12. Moher D, Schulz KF, Altman D. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987–1991.
  13. Moher D, Jones A, Lepage L. Use of the CONSORT statement and quality of reports of randomized trials. A comparative before-and-after evaluation. JAMA 2001;285:1992–1995.
  14. International Committee of Medical Journal Editors. Uniform Requirements for manuscripts submitted to biomedical journals. JAMA 1997;277:927–934. Also available at: http://www.acponline.org.
  15. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
  16. Egger M, Juni P, Bartlett C. Value of flow diagrams in reports of randomized controlled trials. JAMA 2001;285:1996–1999.
  17. Knottnerus JA. The effects of disease verification and referral on the relationship between symptoms and diseases. Med Decis Making 1987;7:139–148.
  18. Panzer RJ, Suchman AL, Griner PF. Workup bias in prediction research. Med Decis Making 1987;7:115–119.
  19. Begg CB. Biases in the assessment of diagnostic tests. Stat Med 1987;6:411–423.
