P. Joanne Cornbleet, Nathan Gochman, When Linear Regression Gets Out of Line: Finding the Fix, Clinical Chemistry, Volume 66, Issue 9, September 2020, Pages 1238–1239, https://doi.org/10.1093/clinchem/hvaa163
Featured Article: Cornbleet PJ, Gochman N. Incorrect least-squares regression coefficients in method-comparison analysis. Clin Chem 1979;25:432-8.
More than 40 years ago, as a (much younger!) clinical pathology resident at University of California, San Diego, I was looking for a research project during my clinical chemistry rotation. A college course in statistics had stimulated my interest in data evaluation. I noted medical journal articles that were critical of common statistical calculations that depended on a gaussian or “normal” distribution (e.g., t-tests or reference ranges for laboratory values). To obtain accurate results, the authors of one article recommended the use of nonparametric methods, which were independent of data distribution requirements (1).
My resident training years were a time of laboratory automation. Implementation required method comparison studies, and linear regression was commonly used for data analysis. If a linear relationship between the test and the reference method could be defined, then the slope and the intercept of this line provided estimates of the proportional and constant error between the 2 methods.
But least squares regression analysis of actual laboratory comparison data was sometimes confusing. Analytes with closely clustered data values or with imprecise manual methods as the reference method (X variable) seemed to give spurious results. In these situations, when the X (reference) and Y (test) results were swapped as the independent and dependent variables in least squares regression (reverse regression), 2 different lines were obtained. Were underlying assumptions for least squares regression analysis violated?
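The discrepancy is easy to reproduce. Below is a minimal sketch (in Python, not the Minitab used in the original work; all variable names and parameter values are illustrative) of how ordinary least squares of Y on X and the inverted regression of X on Y give two different slopes when the X variable carries measurement error:

```python
# Sketch: reverse regression yields a different line when X has measurement error.
# Parameter values (means, SDs, sample size) are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

true_x = rng.normal(loc=100.0, scale=10.0, size=200)   # true analyte values
x = true_x + rng.normal(scale=5.0, size=200)           # reference method, with error
y = 0.9 * true_x + rng.normal(scale=5.0, size=200)     # test method: slope 0.9, intercept 0

# Ordinary least squares slope of y on x: b = cov(x, y) / var(x)
b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Reverse regression: regress x on y, then invert to express y as a function of x
b_xy = np.cov(x, y)[0, 1] / np.var(y, ddof=1)
b_reverse = 1.0 / b_xy

print(f"y-on-x slope: {b_yx:.3f}; inverted x-on-y slope: {b_reverse:.3f}")
# The two slopes bracket the true value of 0.9: error in X attenuates b_yx
# toward zero, while the inverted x-on-y slope is inflated above the truth.
```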
Three factors commonly seen in laboratory method comparisons appeared to be suspect: (a) imprecision in the measurement of the reference method or X variable, (b) proportional (CV) rather than constant measurement error in the test method or Y variable, and (c) the large influence of outlier data points. I decided to explore nonparametric statistical methods to see if they could help determine the correct linear relationship when these confounding factors were present. A search of the literature revealed 3 alternative methods that might be useful in addressing this problem: methods proposed by Deming (2), Mandel (3), and Bartlett (4).
With the help of a computer program (Minitab®) installed on our laboratory mainframe, I generated X and Y comparison data (both gaussian and log-gaussian) with a known slope (0.9) and intercept (0) and then subjected the data to both constant and proportional error.
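A hedged reconstruction of that simulation design, in modern Python rather than Minitab, might look like the following. The sample size, location, and scale parameters are assumptions for illustration; the essential point is the contrast between constant-SD and proportional (CV) error applied to both measurements:

```python
# Sketch of the simulation design described above: gaussian "true" values on a
# known line (slope 0.9, intercept 0), perturbed by constant or proportional error.
import numpy as np

rng = np.random.default_rng(1)

def simulate(n=100, slope=0.9, intercept=0.0, cv=None, sd=1.0):
    """Return (x, y) comparison data with proportional (cv) or constant (sd) error."""
    true = rng.normal(loc=50.0, scale=15.0, size=n)
    if cv is not None:                      # proportional error: SD scales with value
        x = true + rng.normal(size=n) * cv * true
        y = (slope * true + intercept) + rng.normal(size=n) * cv * true
    else:                                   # constant error: fixed SD everywhere
        x = true + rng.normal(scale=sd, size=n)
        y = (slope * true + intercept) + rng.normal(scale=sd, size=n)
    return x, y

x, y = simulate(cv=0.10)   # e.g., 10% proportional measurement error
```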
The results from the study discussed in this article showed that the method of Deming (2) was the most useful to obtain the expected slope and intercept. The data also indicated when the least squares method was adequate vs when the method of Deming was preferable. Within the range of measurement error likely to be encountered in laboratory tests (CV up to 20%), the least squares method still calculated the correct line, even with proportional measurement error. For closely clustered data, significant error in least squares slope estimation occurred when the ratio of the SD of measurement of an X value close to the mean of the data set to the SD of the entire X data set exceeded 0.2. In these cases, the method of Deming should be used. Finally, errors in least squares coefficients attributable to outliers could be avoided by eliminating data points with vertical distance from the regression line that exceeded 4 times Sy.x, as originally proposed by Draper and Smith (5). In least squares linear regression, Sy.x is the SD of the estimate (also known as the SD of the regression residuals), obtained by calculating the SD of the distances between the observed Y values and those predicted by the regression line.
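The two remedies named above are compact enough to sketch. The following is an illustrative implementation, not the authors' original code: a closed-form Deming fit (which assumes the ratio of Y-to-X error variances, lam, is known or estimated from replicate measurements) and the 4 × Sy.x outlier screen of Draper and Smith:

```python
# Sketch: Deming regression and the 4*Sy.x outlier screen. Variable names
# are illustrative; lam is an assumed (or replicate-estimated) variance ratio.
import numpy as np

def deming(x, y, lam=1.0):
    """Deming fit; lam = (error variance of Y) / (error variance of X)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    # Closed-form slope minimizing error-weighted perpendicular distances
    b = (syy - lam * sxx
         + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    a = np.mean(y) - b * np.mean(x)
    return a, b

def drop_outliers(x, y, k=4.0):
    """Remove points whose vertical distance from the OLS line exceeds k * Sy.x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = np.mean(y) - b * np.mean(x)
    resid = y - (a + b * x)
    s_yx = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))  # SD of regression residuals
    keep = np.abs(resid) <= k * s_yx
    return x[keep], y[keep]
```

A note on the design choice: unlike ordinary least squares, the Deming slope is symmetric in X and Y, so swapping the independent and dependent variables recovers the same line, which is exactly the property the reverse-regression discrepancy described earlier was lacking.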
Armed with criteria to detect when least squares regression was invalid and with a remedial method, Dr. Gochman and I applied these findings to laboratory comparison data for which measurements of X and Y were both subject to error. It was easy to find examples of 2 disparate regression lines depending on which data set was used as the independent variable. In these cases, omission of identified outliers or the use of the Deming method provided a solution to this dilemma, yielding 1 regression line between X and Y, no matter which was used as the independent variable. The Deming method still survives as the method most used for laboratory comparison data when the least squares method is inaccurate.
After my residency in laboratory medicine, my career took a different path as head of the hematology laboratory at Stanford University Medical Center. But I continued to assess published data with an enlightened view toward appropriate statistical analysis and experimental design.
Footnotes
This article has been cited more than 525 times since publication.
Author Contributions
All authors confirmed they have contributed to the intellectual content of this paper and have met the following 4 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved.
Authors' Disclosures or Potential Conflicts of Interest
No authors declared any potential conflicts of interest.