AJR 2000; 174:1241-1244
© American Roentgen Ray Society


Computers in Radiology

A Simple Method for Obtaining Original Data from Published Graphs and Plots

Chris L. Sistrom and Patricia J. Mergo

Both authors: Department of Radiology, University of Florida College of Medicine, P. O. Box 100374, Gainesville, FL 32610-0374.

Received August 3, 1999; accepted after revision September 29, 1999.

 
Address correspondence to C. L. Sistrom.


Abstract
Top
Abstract
Introduction
Scanning the Graph or...
Recording Locations of Data...
Conversion of Raw Value...
Assessment of Method Error
Discussion
References
 
OBJECTIVE. To describe a method for deriving original data values from scanned images of graphs and scatterplots published in the medical literature.

CONCLUSION. The procedure is simple, reproducible, and relatively error free (when performed carefully). This method is useful in converting published graphic material into numeric data for various uses when the original data are unavailable directly from the authors.


Introduction
There are many situations in which it is desirable to obtain the original data values that were used to create graphs or scatterplots published in the medical literature. These situations include meta-analysis, production of slide presentations, preparation of material for computer-based teaching, and writing textbooks or review articles. Having the original data values rather than an image of the graph or plot is advantageous for several reasons. These reasons include the ability to manipulate the data, combine the data with other sources for meta-analysis, store the data more compactly, and gain flexibility in reproduction method and style. Graphs or plots from different sources can be reproduced in presentations and texts in a uniform format, enhancing continuity and readability. Photographic methods are being replaced by digital means of transmission and reproduction, and graphs or plots from articles are increasingly being scanned into digital form. Our method converts scanned images of plots or graphs into data values, which take much less storage space than a high-resolution picture file. The only specialized piece of software needed is NIH (National Institutes of Health) Image, which can be downloaded free of charge from the Internet in both IBM-compatible (www.scioncorp.com) and Macintosh-compatible (rsb.info.nih.gov/nihimage or www.scioncorp.com) formats.

The most accurate and desirable means of obtaining original data is by contacting the author of the article or chapter in question; however, this may be impractical or impossible, especially if the material is somewhat dated. In our experience, many authors cannot readily obtain the original data. If the method we describe is used, readers should be aware of the possibility of error between derived and original values of the data. As we show, with careful attention to detail, the magnitude of such errors should be rather small. Strict attention to relevant copyright laws and general tenets of professional courtesy is important in any use or reproduction of published scientific material.


Scanning the Graph or Plot
Almost all consumer-grade flatbed scanners can produce images of sufficient quality and resolution to yield accurate results. We used a CanoScan 600 scanner (Canon Computer Systems, Costa Mesa, CA). An original copy of the journal or book is best, though a high-quality photocopy made with the page perfectly flat on the copy glass can be used. The same care should be used in positioning the material on the scanner. Additionally, the abscissa and ordinate of the graph or plot must be parallel to the x-axis and y-axis of the scanner. The scan is made in gray scale and the dots per inch (DPI) are adjusted so that the resulting image will have at least 1000 pixels in each axis covering the graph or plot being analyzed. We used 600 DPI. Once an adequate scan of the graph or plot has been obtained, it is saved in a tagged image file (.TIF) format (gray scale of 256 shades).
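
The resolution requirement above reduces to simple arithmetic: pixels covered equals printed length in inches times DPI. The following is a minimal sketch of that calculation; the function names and the 3.0 x 2.0 inch plot size are hypothetical and chosen only for illustration.

```python
import math


def min_dpi(plot_width_in, plot_height_in, min_pixels=1000):
    """Smallest whole-number DPI that puts at least `min_pixels` pixels
    along the shorter side of the plot area (dimensions in inches)."""
    shorter = min(plot_width_in, plot_height_in)
    if shorter <= 0:
        raise ValueError("plot dimensions must be positive")
    return math.ceil(min_pixels / shorter)


def pixels_covered(length_in, dpi):
    """Number of pixels spanning `length_in` inches at the given DPI."""
    return round(length_in * dpi)


# Hypothetical plot area measuring 3.0 x 2.0 inches on the printed page.
width_in, height_in = 3.0, 2.0
print("minimum DPI:", min_dpi(width_in, height_in))
print("pixels at 600 DPI:",
      pixels_covered(width_in, 600), "x", pixels_covered(height_in, 600))
```

For a plot of this assumed size, the 600 DPI setting we used leaves a comfortable margin over the 1000-pixel minimum on both axes.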


Recording Locations of Data Points and Axes
A copy of NIH Image (either the PC or Macintosh version) is required to perform this part of the procedure. Installation consists of running the provided setup program. In addition, the Microsoft DirectX (Microsoft, Redmond, WA) drivers must be installed on your computer. Often, these are already present and registered. If not, they can be obtained from the Microsoft Web site (www.microsoft.com) and installed. It is best to select a high-resolution setting in your graphics card software (1024 x 768 or more). Also, the mouse sensitivity, speed, and acceleration should be set to low values to allow accurate and reproducible cursor positioning. Figure 1 is a representation of a scanned scatterplot with original and scanned coordinate systems shown. The conversion of graph or plot points to pixel values is performed with the following steps.



Fig. 1. —Graph shows scanned plot with scan and plot coordinate systems labeled. The .TXT file produced by NIH Image will contain numbers in scan coordinate space (X and Y pixel values). Spreadsheet operations serve to convert these into plot coordinates (Abs = abscissa, Ord = ordinate, Org = origin, Max = maximum), which are original data values.

 

  1. Start NIH Image.
  2. Open the .TIF file made when the graph or plot was scanned. The image can be scaled to fit your screen by setting the checkbox under the Edit menu.
  3. Invert the image (command found under the Options menu). This is useful because NIH Image places a single-pixel black dot at each location measured, and this can be seen only with the original image inverted (white on black).
  4. Under the Analyze menu, select Options, check X-Y Center, and check Wand Auto-Measure.
  5. Under the Analyze menu, select Set Scale and Set Units = Pixels.
  6. On the toolbar, select Wand (cross with a tiny circle in the middle).
  7. Click once at the graph or plot origin (AbsOrg, OrdOrg in Fig. 1).
  8. Click once at the highest value of the abscissa and the lowest value of the ordinate (AbsMax, OrdOrg in Fig. 1).
  9. Click once at the lowest value of the abscissa and the highest value of the ordinate (AbsOrg, OrdMax in Fig. 1).
  10. Click once at every point along the graph or each plotted value (1, 2, 3,... i in Fig. 1). The Info window will show the coordinates of the points as they are registered and Count will show the number of points stored.
  11. Under the File menu, select Export and check Measurements. Set Save as File Type = Text.
  12. Specify a meaningful file name (include .TXT at the end) and directory location for storage. Click Save.

The resulting text file may be examined in a text editor before closing NIH Image to be sure it contains the numbers you obtained. The first two columns of Table 1 show the first few sets of raw pixel values contained in such a file.
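
For readers who prefer a script to a text editor, the exported file can also be checked programmatically. The sketch below assumes the file holds two whitespace- or tab-separated numeric columns (X pixel, then Y pixel), with the three axis clicks recorded first; the exact layout written by NIH Image may differ, and the function names and sample text are hypothetical.

```python
def parse_measurements(text):
    """Parse two-column text into a list of (x_pixel, y_pixel) floats.

    Blank lines and non-numeric rows (for example a header) are skipped.
    """
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            x, y = float(fields[0]), float(fields[1])
        except ValueError:
            continue
        points.append((x, y))
    return points


def split_calibration(points):
    """Separate the three axis clicks (origin, abscissa maximum, ordinate
    maximum, in the order they were clicked) from the data points."""
    if len(points) < 4:
        raise ValueError("need three axis clicks and at least one data point")
    origin, abs_max, ord_max = points[:3]
    return origin, abs_max, ord_max, points[3:]


# Hypothetical file contents, not taken from Table 1.
sample = "100\t900\n900\t900\n100\t100\n300\t700\n500\t500\n"
origin, abs_max, ord_max, data = split_calibration(parse_measurements(sample))
print("axis clicks:", origin, abs_max, ord_max)
print("data points recorded:", len(data))
```

To use it on a real export, read the file with `open(path).read()` and pass the resulting string to `parse_measurements`.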


TABLE 1 Spreadsheet Operations Required to Convert Raw Pixel Location Values into Original Data Values

 


Conversion of Raw Value to Original Data Values
This part of the process may be performed using any commercially available spreadsheet program. We used Lotus 123 for Windows (Lotus Development, Cambridge, MA) to perform the following steps.

Start your spreadsheet program. Import the .TXT file produced by NIH Image. It should form two columns of numbers with the left-most column (usually A) containing the raw pixel values representing the abscissa (x-axis) and the right-hand column (usually B) containing the raw pixel values for the ordinate (y-axis). Check that A1 and A3 are nearly equal. Check that B1 and B2 are nearly equal. This is to ensure that the original was not tilted or distorted during scanning. In the first row of the third column (usually C) place the following formula:

(A1-value of A1)/[(value of A2-value of A1)/abscissa span]+lowest value of the abscissa

In the first row of the fourth column (usually D) place the following formula:

(value of B1-B1)/[(value of B1-value of B3)/ordinate span]+lowest value of the ordinate

Copy C1 and paste it into the rest of the C column. Copy D1 and paste it into the rest of the D column. Check the formulae in C2 through Ci to ensure that the numbers are incremented correctly in the formula variable (A2 through Ai). Check the formulae in D2 through Di to ensure that the numbers are incremented correctly in the formula variable (B2 through Bi). Column C should now contain the original abscissa values. Column D should now contain the original ordinate values. Check that C1 equals abscissa origin, C2 equals abscissa maximum, D1 equals ordinate origin, and D3 equals ordinate maximum. If they do not, you have entered the formulae incorrectly and need to correct them. Copy columns C and D and paste into any application desired. It may be necessary to Export them as text values for Import into some applications (such as your spreadsheet). This is because the Copy and Paste operation with some combinations of spreadsheets and other applications actually transfers the formulae rather than the calculated values.

Table 1 lists raw pixel values, formulae, and calculated values from sample data. In this example, the abscissa went from 20 to 100 (span, 80) while the ordinate went from 0 to 300 (span, 300). Therefore, the calculated values for the abscissa were corrected by adding 20 and no correction was made to ordinate values. Furthermore, A1 equals A3 and B1 equals B2, indicating a perfectly straight scan.
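
The same column C and column D formulae can be written as a short script, which may be convenient when many plots are processed. This is a sketch rather than a replacement for the spreadsheet procedure: the function names are hypothetical, and the pixel rows are invented for illustration (they are not the values in Table 1), although the axis limits (20 to 100 and 0 to 300) match the example described above.

```python
def pixels_to_data(points, abs_min, abs_max, ord_min, ord_max):
    """Convert (x_pixel, y_pixel) rows to (abscissa, ordinate) values.

    Rows 0, 1, and 2 must be the origin, abscissa-maximum, and
    ordinate-maximum clicks, in that order (A1/B1, A2/B2, A3/B3).
    """
    x_org, y_org = points[0]
    x_max = points[1][0]
    y_max = points[2][1]
    # Pixels per data unit along each axis (the bracketed divisor in the text).
    px_per_abs = (x_max - x_org) / (abs_max - abs_min)
    # Scan y grows downward, so the origin has the larger y pixel value.
    px_per_ord = (y_org - y_max) / (ord_max - ord_min)
    return [((x - x_org) / px_per_abs + abs_min,
             (y_org - y) / px_per_ord + ord_min) for x, y in points]


def alignment_offsets(points):
    """Return (A3 - A1, B2 - B1) in pixels; both are near 0 for a straight scan."""
    return points[2][0] - points[0][0], points[1][1] - points[0][1]


# Hypothetical rows: origin, abscissa max, ordinate max, then two data points.
rows = [(100, 900), (900, 900), (100, 100), (300, 700), (500, 500)]
values = pixels_to_data(rows, abs_min=20, abs_max=100, ord_min=0, ord_max=300)
print("alignment offsets (pixels):", alignment_offsets(rows))
for pixel, value in zip(rows, values):
    print(pixel, "->", value)
```

As in the spreadsheet, the first three converted rows serve as a check: they should reproduce the axis limits (abscissa origin, abscissa maximum, ordinate maximum).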


Assessment of Method Error
We applied the method described in this article to a scatterplot already published [1]. This scatterplot (Fig. 2) depicted the weight in pounds versus the age in years for 227 patients being studied to determine factors influencing the prevertebral soft-tissue thickness on cervical spine radiographs. We reproduced the scatterplot (Fig. 3) by graphing values for age and weight derived from the scanned image by means of the steps already described. Microsoft PowerPoint (Microsoft) was used to generate Figure 3, whereas Harvard Graphics for Windows (Software Publishing, Nashua, NH) was used to make the original (Fig. 2). Note that distribution of data is indistinguishable though the axis scaling, labeling, and data point markers are different.



Fig. 2. —Reproduction of image file produced by scanning original scatterplot used to test our method. Weight of 227 patients was plotted against age in study of prevertebral soft-tissue thickness on cervical spine radiographs.

 


Fig. 3. —Scatterplot produced after importing derived data values for weight and age into graphing program (PowerPoint; Microsoft, Redmond, WA). Note that distribution of data points is visually indistinguishable from original material in Figure 2.

 

We were able to quantitatively compare the results derived by our method with the original data values that were still available to us. This comparison followed the method developed by Bland and Altman [2] for comparing measurements of the same quantity made with two instruments. In their analyses, values obtained with a reference instrument (ref) are compared with those obtained with a test instrument (test). This comparison is done by plotting ref - test on the ordinate and (test + ref)/2 on the abscissa. Lines representing the mean of ref - test and ±2 standard deviations are plotted as well to show central tendency of error and the limits of agreement.

For purposes of this analysis, the original data for weight and age were treated as if they represented reference values, and the derived values for age and weight were treated in the same manner as test instrument results. Rather than plotting the average of derived and original values on the abscissa, we simply plotted the original values. When we generated error plots for age and weight, we found no errors for age (Fig. 4). Errors for weight exceeded 1 lb (0.454 kg) for only a single data point. The limits of agreement for weight were ±1 lb (0.454 kg) (Fig. 5).
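
The agreement statistics described above can be computed in a few lines. The sketch below is an illustration under stated assumptions, not the analysis code used for Figures 4 and 5: the function name and the four weight values are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed because the text does not specify which form was used.

```python
from statistics import mean, stdev


def agreement(ref, test):
    """Return differences (ref - test), their mean, and the limits of
    agreement (mean difference +/- 2 sample standard deviations)."""
    if len(ref) != len(test) or len(ref) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    diffs = [r - t for r, t in zip(ref, test)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD; an assumption, see lead-in
    return {
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 2 * sd,
        "upper": bias + 2 * sd,
        # Classic Bland-Altman abscissa; we plotted the original values instead.
        "pair_means": [(r + t) / 2 for r, t in zip(ref, test)],
    }


# Hypothetical original (ref) and derived (test) weights in pounds.
original = [150.0, 160.0, 170.0, 180.0]
derived = [149.0, 161.0, 170.0, 180.0]
result = agreement(original, derived)
print("mean error:", result["bias"])
print("limits of agreement:", result["lower"], "to", result["upper"])
```

Plotting `result["diffs"]` against `original` gives the variant used here; plotting against `result["pair_means"]` gives the form described by Bland and Altman.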



Fig. 4. —Graph shows error terms for patient age (in years) derived by conversion of scanned plot (Fig. 2) then compared with original data. Difference between original and derived ages (orig - deriv) is plotted against original age values. Derived age equaled original age for every point.

 


Fig. 5. —Graph shows error terms for subject weight (in pounds) derived by conversion of scanned plot in Figure 2 then compared with original data. Difference between original and derived weights (orig - deriv) is plotted against original weight values. Mean error (solid line) and limits of agreement (±2 SD, dotted line) are shown.

 

To test the effect of improper scanning technique, we deliberately rotated the original plot by 3° (clockwise) before making a second scan. All other parameters were the same as before. We then performed the same steps to derive the original data values and the same error analysis. When the derived values for age and weight were plotted and visually compared with the original scan, it was hard to distinguish any difference. However, quantitative analysis revealed that the mean error for age was -2 years with limits of agreement of -1 to -3 years. The mean error for weight was +6 lb (2.722 kg) and the limits of agreement were 2 to 10.5 lb (0.907 to 4.763 kg). This result emphasizes the importance of having the original material (or a high-quality copy) and performing the scan with the plot or graph perfectly aligned with the scanner x- and y-axes.
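
The reason a small tilt matters can be seen in a simple geometric simulation: once the plot is rotated, each x pixel value also depends on the point's height on the page, and the axis-aligned conversion cannot remove that dependence. The sketch below is a toy model under assumptions of our own (an invented 800 x 800 pixel plot frame, rotation about the frame center, hypothetical function names). It is not a reconstruction of the test scan and is not expected to reproduce the error values reported above.

```python
import math


def to_pixels(a, o, axes, frame):
    """Ideal (untilted) pixel position of data value (a, o)."""
    abs_min, abs_max, ord_min, ord_max = axes
    x_org, y_org, x_max, y_max = frame
    x = x_org + (a - abs_min) * (x_max - x_org) / (abs_max - abs_min)
    y = y_org - (o - ord_min) * (y_org - y_max) / (ord_max - ord_min)
    return x, y


def rotate_clockwise(point, center, degrees):
    """Rotate a pixel position clockwise as seen on screen (y grows downward)."""
    t = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))


def tilt_errors(data, axes, frame, degrees):
    """Errors (original - derived) when a tilted scan is converted with the
    axis-aligned formulae, using the three tilted axis clicks as reference."""
    abs_min, abs_max, ord_min, ord_max = axes
    center = ((frame[0] + frame[2]) / 2, (frame[1] + frame[3]) / 2)

    def scanned(a, o):
        return rotate_clockwise(to_pixels(a, o, axes, frame), center, degrees)

    org = scanned(abs_min, ord_min)
    amax = scanned(abs_max, ord_min)
    omax = scanned(abs_min, ord_max)
    px_per_abs = (amax[0] - org[0]) / (abs_max - abs_min)
    px_per_ord = (org[1] - omax[1]) / (ord_max - ord_min)
    errors = []
    for a, o in data:
        x, y = scanned(a, o)
        derived_a = (x - org[0]) / px_per_abs + abs_min
        derived_o = (org[1] - y) / px_per_ord + ord_min
        errors.append((a - derived_a, o - derived_o))
    return errors


axes = (20, 100, 0, 300)       # abscissa min/max, ordinate min/max
frame = (100, 900, 900, 100)   # hypothetical x_org, y_org, x_max, y_max pixels
data = [(30, 120), (60, 150), (90, 220)]  # hypothetical data values
for degrees in (0, 3):
    print(degrees, "degrees:", tilt_errors(data, axes, frame, degrees))
```

With no rotation the errors vanish apart from floating-point rounding, whereas a 3° tilt produces errors of up to several data units in this toy frame, which is why the A1 versus A3 and B1 versus B2 checks are worth making before any values are used.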


Discussion
The method we describe for deriving original data values from published graphs or plots was initially developed by one of the authors during preparation of a textbook [3] that included many such illustrations. In some cases, regressions or other equations based on these data were listed in the text of the paper. When these functions were recalculated from the derived values, the results were almost always in very close numeric agreement with those published. Using the same graphing program to redraw all the plots and graphs with the derived data enabled the production of illustrations with a consistent appearance. Careful visual comparison of these illustrations with the original material ensured that the reproductions were accurate and complete. In some instances, data from different sources could be combined into single graphs or plots, thus visually showing relationships in ways that were not otherwise possible. Permission to reproduce the figures was requested from the corresponding author and journal publisher in all cases. We have completely detailed the method in stepwise fashion with illustrative sample data (Table 1) so that readers may perform it themselves.


References

  1. Sistrom CL, Southall P, Peddada SD, Shaffer HH. Factors affecting the thickness of cervical prevertebral soft tissue. Skeletal Radiol 1993;22:167-172
  2. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-310
  3. Keats TE, Sistrom CL. Atlas of Radiologic Measurement, 7th ed. Philadelphia: Mosby Year Book, 2000
