Ionut Nistor, Davide Bolignano, Maria C. Haller, Evi Nagler, Sabine N. van der Veer, Kitty Jager, Adrian Covic, Angela Webster, Wim Van Biesen, Why creating standardized core outcome sets for chronic kidney disease will improve clinical practice, Nephrology Dialysis Transplantation, Volume 32, Issue 8, August 2017, Pages 1268–1273, https://doi.org/10.1093/ndt/gfv365
Abstract
Chronic kidney disease (CKD) is common and is associated with increased mortality, morbidity and cost. However, insufficient high-quality trial data are available to answer many relevant clinical questions in this field. In addition, a wide range of variable outcomes are used in studies, and they are often incompletely reported. Furthermore, there is a lack of patient-relevant outcomes, such as mortality, morbidity, quality of life, pain, need for dialysis or costs. Common problems with outcome reporting are as follows: choosing the wrong domains to measure; within domains, choosing the wrong measures (invalid surrogates, composite, non-patient-relevant); within measures, choosing the wrong or variable metrics; and within metrics, choosing variable presentation methods. With this article, we aim to underline why standardized outcome reporting is key to achieving evidence-based guidance and improving clinical care for patients; to highlight the frameworks available for achieving core outcome sets; and, starting from these frameworks, to propose the steps needed to develop a core outcome set in the field of CKD. We hope that standardized core outcome sets for nephrology will lead to the most important outcome of guideline production: improving outcomes for our patients.
OUTCOMES IN NEPHROLOGY: WHAT IS THE PROBLEM?
Chronic kidney disease (CKD) [estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² for >3 months] is common, affecting one in eight persons and their families, and is associated with increased mortality, morbidity, patient burden and cost [1–3]. Despite the important health and financial burden of CKD, and the substantial amount of nephrology-oriented research, insufficient high-quality data are available to answer many, often very relevant, clinical questions. A poor evidence base may result in suboptimal guidance for clinical decision-making in caring for people with CKD. Fewer randomized clinical trials (RCTs) are published in nephrology than in any other medical specialty [4]. Existing trials are commonly at high risk of bias, and so may not provide valid estimates of the effects of interventions, further diminishing the evidence that can be derived from them. Often, outcomes are reported in different formats, limiting mathematical synthesis, or are incompletely reported. In addition, only 17.4% of reported studies include outcomes relevant to patients; the remainder rely on surrogate markers instead. An example of a surrogate outcome is serum parathyroid hormone (PTH), used as a surrogate marker for mortality in trials on the management of chronic kidney disease–mineral bone disease (CKD-MBD). As a result, informative data on hard outcomes are sparse, hampering clinical decision-making in everyday clinical practice.
This article is focussed on how outcome reporting influences medical decision-making and why the creation of standardized core outcome sets can improve everyday clinical decision-making and thus patient outcomes. We aim to explain (i) what the potential pitfalls of outcome reporting are, (ii) why good-quality standardized outcome reporting is necessary to achieve evidence-based guidance, (iii) which frameworks are available to reflect outcome reporting and, finally, (iv) we propose a strategy to define standardized core outcome sets in nephrology, using progression of CKD as an example.
THE PITFALLS OF ‘OUTCOME REPORTING’
The quality of outcome reporting can be considered over three levels: (i) the completeness and variability of reporting the outcome, (ii) the selection of the outcome domain(s) and (iii) the selection of the outcome measure(s).
Incomplete and variable reporting
Reporting the results of a trial can be problematic for several reasons. The most frequent problem is that of selective outcome reporting. Up to two-thirds of studies do not report the outcomes of all study participants [5], resulting in problems of internal validity and unreliable estimates of effect. As a consequence, physicians might have unreliable information about the effects of an intervention they prescribe to their patients. In principle, a minimal requirement would be that the primary outcome is reported for all participants, allowing an intention-to-treat analysis. If authors of a randomized trial cannot provide the primary outcome for all participants, serious questions can arise about the quality of the study. A limited degree of missing data for secondary outcomes can be accepted, provided that the missing data, and the reasons they are missing, are distributed equally between study groups. Statistical techniques to account for missing data, such as imputation, can only be accepted under these premises.
Incomplete reporting can also impact the extent to which the study results can be generalized to other populations (external validity), if the baseline characteristics of the study become unclear, e.g. when data for diabetics are lacking to a greater degree than for non-diabetics. The Consolidated Standards of Reporting Trials (CONSORT) statement helps evidence users judge reporting completeness, as CONSORT suggests inclusion of a flow diagram accounting for all study participants from the initial screening for eligibility through to the final analysis.
Some studies do not report all outcomes they originally planned in the protocol, often because of negative (no difference) results. Incomplete outcome reporting may introduce bias, as only study outcomes that are statistically significant are made public. This selective publication results in overestimation of the effect and might cause harm as clinicians are misled on interventions they advocate to their patients. Mandatory trial registration and pre-publication of study protocols at least offer the opportunity to reveal where selective outcome reporting has occurred. However, despite the fact that most journal editors endorse the CONSORT statement, many study reports do not adhere well to the CONSORT recommendations. Authors still fail to report all pre-specified end points, and editors still accept papers for publication without pre-published protocols [5–7]. The other side of this is the reporting of post hoc analyses, including outcomes that were not pre-specified, just because they show statistical significance or desired effects [8].
Lastly, the way the effect size of study results is reported can cause unnecessary ambiguity. Problems arise when study results are reported in a qualitative rather than a quantitative way (e.g. Treatment A was better than Treatment B) without clear definition of the meaning or threshold for ‘better’. Many authors of RCTs choose to report only relative effects while omitting absolute effect sizes. This may inflate the apparent effect size and result in presentation bias. Reporting that an event is twice as likely to happen with one treatment versus another (relative risk = 2) may translate into the event occurring in 1/100 000 versus 2/100 000 (absolute risk difference = 1/100 000). Similarly, a 10% survival advantage for someone who has a life expectancy of 3 years translates into 3.6 months, while for someone expected to live another 25 years, this would translate into 2.5 years.
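To make the distinction concrete, the sketch below simply restates the arithmetic of this paragraph in Python; the event rates and life expectancies are the hypothetical figures quoted above, not data from any trial.

```python
# Illustrative only: the hypothetical numbers from the text, showing how a
# relative risk of 2 can correspond to a negligible absolute risk difference.
control_risk = 1 / 100_000      # event rate with treatment A
treatment_risk = 2 / 100_000    # event rate with treatment B

relative_risk = treatment_risk / control_risk              # 2.0 ("twice as likely")
absolute_risk_difference = treatment_risk - control_risk   # 1 per 100 000

print(f"Relative risk: {relative_risk:.1f}")
print(f"Absolute risk difference: {absolute_risk_difference:.6f} (1 per 100 000)")

# The same 10% relative survival advantage translates very differently
# depending on baseline life expectancy (3 years versus 25 years).
for life_expectancy_years in (3, 25):
    gain_years = 0.10 * life_expectancy_years
    print(f"Life expectancy {life_expectancy_years} y -> "
          f"gain {gain_years:.1f} y ({gain_years * 12:.1f} months)")
```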
Selection of incorrect outcome domains (what is measured)
Not all results of a study are equally important for the decision-making process, as physicians and patients might give more weight to some events or outcomes than to others. Measuring outcomes that stakeholders (patients, clinicians, policymakers, payers) do not find relevant is a waste of time and money. Moreover, reporting unimportant outcomes results in ‘anchoring’ [9], the subliminal psychological process whereby our decisions are unconsciously influenced by a (positive) result, even if we objectively know that the outcome is not relevant.
Patient-relevant outcomes, also called hard clinical end points [10, 11], are outcomes that matter to patients or society, such as mortality, morbidity, quality of life, pain, need for dialysis or costs. However, measuring hard clinical end points often requires a longer follow-up time and/or larger patient numbers to obtain sufficient power to show a difference [12]. As a consequence, only a limited number of RCTs use patient-relevant hard clinical outcomes as their primary end point [13]. In an attempt to increase event rates (and thus power), many studies use composite outcomes, whereby multiple outcomes are considered as one end point. However, this assumes that all the individual outcomes in the composite are equally important or relevant to the patient. A composite end point can inflate effect estimates by including less relevant outcomes that move in the same direction in response to the intervention. The use of composite outcomes can also result in false-negative effects by including less relevant outcomes that move in the opposite direction in response to the intervention compared with the hard outcome. Where the use of composite outcomes is unavoidable, all outcomes included within the composite measure should also be reported separately.
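As a purely hypothetical numerical illustration of why the components of a composite should also be reported separately, the sketch below uses invented event counts in which the intervention has no effect on mortality but reduces a softer component; the composite nevertheless appears to show a benefit.

```python
# Hypothetical illustration (invented numbers): a composite end point can appear
# to show benefit even when the hardest component (death) is unchanged.
n_per_arm = 1000

# Assumed event counts per component and per arm
events = {
    "death":           {"control": 50,  "treatment": 50},   # no effect on mortality
    "hospitalization": {"control": 150, "treatment": 100},  # effect on the softer outcome
}

def composite_rate(arm):
    # Simplification: assume no patient has both events, so counts can be summed.
    return sum(component[arm] for component in events.values()) / n_per_arm

for outcome, counts in events.items():
    print(f"{outcome:>15}: control {counts['control'] / n_per_arm:.1%}, "
          f"treatment {counts['treatment'] / n_per_arm:.1%}")

print(f"{'composite':>15}: control {composite_rate('control'):.1%}, "
      f"treatment {composite_rate('treatment'):.1%}")
# The composite falls from 20.0% to 15.0%, but the reduction is driven entirely
# by hospitalization; reporting the components separately makes this visible.
```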
Another way of manipulating sample size and power calculations to make trials more efficient is to use surrogate outcomes. Surrogate outcomes are (assumed to be) associated with a hard clinical end point but can be measured more easily and usually within a shorter time frame. Examples in nephrology are cholesterol levels, haemoglobin levels, vascular calcification markers, calcium phosphorus product and PTH, all of which are used as surrogate markers of downstream events such as cardiovascular morbidity or mortality. A surrogate outcome needs to fulfil some assumptions before it can be considered valid: the association between intervention, surrogate and clinical end point needs to be consistent (i.e. the association is always present, always in the same direction and always the same order of magnitude), and the surrogate has to be on the causal pathway between the intervention and the hard clinical end point [10]. Ideally, the surrogate outcome should have demonstrated validity through use in an RCT with a hard end point. An example of the difficulties that can arise through the use of surrogate outcomes is illustrated by the evolution of the evidence base for erythropoietins. While erythropoietins correct haemoglobin values effectively, their use has not translated as expected into improvement in patient-important outcomes such as mortality [14, 15]. Another example is the use of calcimimetics. Given the association between lower PTH and improved survival (and other hard outcomes) in people with CKD [16, 17], it seemed reasonable to accept that actively reducing PTH would translate into better survival. However, recently published results of the EVOLVE study, including nearly 4000 patients, showed no effect of cinacalcet on hard clinical end points such as mortality, despite substantial improvements in PTH [18, 19]. The validity of surrogate markers can also be population specific. For example, reducing serum cholesterol concentration translates into improved survival in the general population; however, the same does not seem to be true for chronic dialysis patients [20]. While cardiovascular mortality in the general population is mainly due to atherosclerosis, and thus causally related to hypercholesterolaemia, cardiovascular mortality in dialysis patients can be due to congestive heart failure and therefore not directly causally related to cholesterol levels.
Selection of incorrect outcome measures (how it is measured)
The instrument or metric used to measure an outcome can influence the usefulness of the findings to evidence-based decision-making. An outcome measure should be valid, discriminative and feasible. Validity of an outcome measure refers to whether that measure really reflects the core of the outcome (face validity) and whether it really measures what it claims to represent, e.g. does eGFR based on serum creatinine measurement really reflect measured GFR. Discriminative refers to the fact that the measure can discriminate between different stages of a disease, either cross-sectionally in a group of people (categorization, classification, prognostication) or longitudinally in the same patient (sensitivity to change). The way an outcome is measured (the metric) can also lead to differences in the perceived importance, e.g. reporting the need for dialysis as time to event or as the percentage of patients starting dialysis within a certain timeframe might create a different perception of the effect size of an intervention, as was the case in the trials of angiotensin II receptor blockers to prevent progression of diabetic nephropathy [21]. In this case, doubling of serum creatinine was used as a dichotomous surrogate outcome rather than the change in measured glomerular filtration rate (GFR). As a result, minor differences in the absolute change in kidney function (e.g. all patients from 1 to 1.99 mg/dL in the intervention and from 1 to 2.01 mg/dL in the placebo group) resulted in an inflated impression of the effect when the surrogate outcome (doubling of serum creatinine) was used (no events in the intervention group versus 100% events in the placebo group).
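The hypothetical creatinine example above can be restated in a few lines; all patient numbers and laboratory values below are the illustrative assumptions from the text, not trial data.

```python
# Hypothetical illustration: a dichotomous surrogate ("doubling of serum
# creatinine", i.e. reaching >= 2x baseline) can suggest a dramatic effect even
# when the continuous change is almost identical between groups.
baseline = 1.0  # mg/dL in both groups

# Final creatinine values assumed in the text's example
intervention_final = [1.99] * 100  # every intervention patient ends at 1.99 mg/dL
placebo_final = [2.01] * 100       # every placebo patient ends at 2.01 mg/dL

def doubling_rate(final_values, baseline, threshold=2.0):
    """Proportion of patients whose creatinine at least doubled."""
    events = sum(1 for v in final_values if v / baseline >= threshold)
    return events / len(final_values)

print(f"Doubling events, intervention: {doubling_rate(intervention_final, baseline):.0%}")  # 0%
print(f"Doubling events, placebo:      {doubling_rate(placebo_final, baseline):.0%}")       # 100%

def mean_change(final_values):
    """Mean absolute change in creatinine from baseline."""
    return sum(final_values) / len(final_values) - baseline

# The groups differ by only 0.02 mg/dL on the continuous scale, yet the
# dichotomous surrogate suggests a 100-percentage-point difference in events.
print(f"Mean change, intervention: {mean_change(intervention_final):.2f} mg/dL")
print(f"Mean change, placebo:      {mean_change(placebo_final):.2f} mg/dL")
```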
WHY STANDARDIZED OUTCOME REPORTING IS ESSENTIAL FOR EVIDENCE-BASED GUIDANCE
Inadequate reporting or reporting of inadequate outcomes is problematic for clinical decision-making and for generating evidence-based guidance. The solution is standardized reporting of outcomes. ‘Standardized’ means that for each condition, there is a predefined set of outcomes (what to measure) and outcome measures (how to measure) on which there is broad consensus on relevance and validity.
By using core outcome sets, investigators can be more certain that their results can be used in creating guidance. Core outcome sets can help prevent studies with irrelevant or invalid outcome measures and result in less waste of research effort and cost. Furthermore, a standardized outcome set allows pooling results of different studies more easily in meta-analyses and comparative effectiveness analyses and reduces potential bias arising from selective outcome reporting.
The process of establishing a core outcome set can be valuable as it forces stakeholders to consider exactly what is relevant and important. In conclusion, there is a clear need to standardize which and how outcomes should be reported to ensure univocal interpretation and reduce uncertainty and cost.
WHAT WORK HAS ALREADY BEEN DONE ON CORE OUTCOME SETS?
A good framework on how to think about outcomes and how outcomes can be reported in medicine was proposed by Zarin et al. [22]. As an example of how this framework can be applied to nephrology topics, we have developed a similar format for reporting measures of progression of kidney failure (Figure 1).
FIGURE 1: An example of the four levels of specification in reporting outcome measures (adapted after Zarin et al. [22]). RRT, renal replacement therapy; eGFR, estimated glomerular filtration rate.
Level 1: selecting the correct domains to be evaluated, i.e. those with relevance to patient or society. Consequence of failure: outcomes become irrelevant for evidence-based guidance, creation of noise that might blur the true signal, and anchoring on irrelevant results.
Level 2: choosing valid, discriminative and feasible instruments to measure the outcome. Consequence of failure: remaining uncertainty about the direction and size of the actual effect on hard patient-important outcomes.
Level 3: choosing one relevant metric to express the outcome. Consequences of failure: comparison across studies is complicated when different studies report different outcome measures, and different expressions of the outcome measure might have different interpretations/consequences in different situations (context specificity).
Level 4: choosing a correct and valid way to express the metric. Consequences of failure: difficulties in interpreting the results or translating them to the clinical level; biased interpretation (framing) of the results.
One early initiative on standardization of outcomes in clinical trials was the Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) initiative (www.omeract.org). Through repeated meetings over the past 20 years, the OMERACT initiative has managed to achieve consensus on a standardized core outcome set for intervention studies in rheumatology and to have it used in all trials. To date, >200 studies have been published incorporating the OMERACT standardized outcome set. Recently, OMERACT introduced a new conceptual framework for outcome reporting [23]. Two important differences from the model of Zarin et al. are the incorporation of ‘core areas’ and the necessity to specify whether undesired outcomes are attributable to the intervention (side effects) or to the disease itself. The former introduces the need to consider the importance and relevance of each outcome domain by attributing it to one of the four core areas (death, life impact, pathophysiologic manifestation and resource use). Most surrogate markers will fall under ‘pathophysiologic manifestations’ and are thereby automatically separated from more patient-relevant outcomes, such as mortality or quality of life. As the framework states that at least one domain has to be selected from each core area to develop a core outcome set, the major relevant outcomes of mortality and quality of life will always be represented in the set. The second difference intrinsically allows the pros and cons of an intervention to be balanced against the natural evolution and prognosis of a condition. For example, less tight glycaemic control might result in more diabetic microvascular disease (an outcome related to the disease), whereas more intense glycaemic control might lead to more hypoglycaemia-induced mortality (a side effect of the intervention). This approach forces people to take a holistic view and avoids tunnel vision [24].
The Core Outcome Measures in Effectiveness Trials (COMET) initiative was launched in Liverpool in January 2010 to develop core outcome sets for different research areas. Data-driven recommendations have been prepared and updated by expert working groups that involve consumers (patients), clinical experts, methodologists and statisticians as key stakeholders, and which first established consensus on the domains and concepts that should be measured. More than 120 published or ongoing studies about ‘core outcome measures’ can be found on the COMET website (www.cometinitiative.org).
As part of the COMET initiative, there are ongoing projects to develop core outcome sets for cancers, such as for colorectal, head and neck, breast and prostate, some of which have already been used as core outcome sets in clinical trials [25, 26].
DEFINING A STANDARDIZED CORE OUTCOME SET IN NEPHROLOGY
Thus far, no such ‘core outcome set’ has been agreed upon in the field of nephrology. There have been some initiatives suggesting common definitions of conditions such as CKD, acute kidney injury or vascular access [27], but these are only very preliminary steps in creating consensus on a set of relevant and unambiguous outcomes. Recently, the SONG-HD initiative was launched to define a core outcome set for haemodialysis, and the first results are expected in 2 years [28].
There is a pressing need for an international initiative to develop, propose and reach consensus on a core outcome set for future trials, systematic reviews and meta-analyses in renal medicine. Certainly, researchers would still have the freedom to investigate and report other outcomes, but reporting outcomes from the standardized core set would be a prerequisite.
How this could work is illustrated using the progression of CKD (Table 1).
Table 1. Example of the steps through which core outcome sets can be developed in nephrology following the OMERACT 2.0 filter (hypothetical example; none of the steps has actually been formally performed thus far).
A representative core group (or groups) should decide the health conditions for which they want to define core outcome sets, in this case prevention of progression of CKD. As a second step, the group needs to decide by a consensus model, e.g. the Delphi procedure [29], which domains within the four core areas are of importance, and rank them to obtain a core domain set for that specific health condition. Domains represent the concepts one wants to assess (what to measure); in the example of CKD, these could be death (area: death), quality of life or capacity to remain employed (area: life impact), and renal function or bone mineral disease (area: pathophysiological manifestation). By the nature of the OMERACT 2.0 framework, it becomes more apparent that outcome domains such as bone mineral disease are less important, because they appear under pathophysiological manifestations, unless they have been proved to be valid surrogate markers. The list of candidate domains for the Delphi procedure is best informed by a systematic literature review of which outcome domains have been reported for this health condition.
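Purely as an illustration, a single Delphi rating round could be summarized as in the sketch below; the candidate domains, the panel ratings and the consensus rule (at least 70% of panellists scoring a domain 7–9 on a 9-point importance scale) are assumptions for the example, and a real Delphi exercise would run over several rounds with structured feedback between them.

```python
# Minimal sketch of one hypothetical Delphi rating round for candidate outcome
# domains in progression of CKD. Domains, ratings and the consensus rule are
# illustrative assumptions only.
ratings = {
    "mortality":            [9, 9, 8, 9, 7, 8, 9, 9],
    "quality of life":      [8, 9, 7, 8, 9, 7, 8, 8],
    "renal function":       [7, 8, 6, 9, 8, 7, 5, 8],
    "bone mineral disease": [4, 6, 5, 7, 3, 5, 6, 4],
}

def reaches_consensus(scores, threshold=0.70):
    """True if at least `threshold` of panellists rated the domain 7-9."""
    critical = sum(1 for s in scores if s >= 7)
    return critical / len(scores) >= threshold

core_domains = [domain for domain, scores in ratings.items()
                if reaches_consensus(scores)]
print(core_domains)  # ['mortality', 'quality of life', 'renal function']
```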
Next, for each domain, instruments that can assess this domain (how to measure) need to be explored and evaluated for validity, discriminative power and feasibility by a systematic appraisal of the literature to select the most appropriate one(s) for a core outcome measure set. As all these can be context specific, it is of importance that the core group specifies which (sub)groups of patients (elderly, paediatric, CKD Stage 4, etc.) and in which conditions (region, health care setting, etc.) the measures can be appropriately used. For example, quality of life can be measured with different rating instruments, or questionnaires, and in early CKD, scoring instruments from the general population might be applicable, but specific kidney disease questionnaires might be more appropriate in advanced stages of kidney disease. Thus, a core set specifies at least one domain in each area and at least one valid measuring instrument in each domain.
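As an illustration of this structural rule only, such a core set could be represented as a nested mapping in which every core area contributes at least one domain and every domain carries at least one agreed instrument; all domain and instrument names below are hypothetical placeholders, not an agreed core outcome set.

```python
# Hypothetical sketch of how a core outcome set for progression of CKD might be
# structured under the OMERACT 2.0 rule: >=1 domain per core area and >=1
# agreed measuring instrument per domain. Names are placeholders only.
CORE_AREAS = ("death", "life impact", "pathophysiologic manifestation", "resource use")

core_outcome_set = {
    "death": {
        "all-cause mortality": ["time to death"],
    },
    "life impact": {
        "quality of life": ["generic QoL questionnaire (early CKD)",
                            "kidney-specific QoL questionnaire (advanced CKD)"],
    },
    "pathophysiologic manifestation": {
        "kidney function": ["measured or estimated GFR slope"],
    },
    "resource use": {
        "need for renal replacement therapy": ["time to dialysis or transplantation"],
    },
}

def is_valid_core_set(core_set):
    """Check the structural rule: >=1 domain per core area, >=1 instrument per domain."""
    for area in CORE_AREAS:
        domains = core_set.get(area, {})
        if not domains or any(not instruments for instruments in domains.values()):
            return False
    return True

print(is_valid_core_set(core_outcome_set))  # True
```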
By producing such uniform core sets following this framework, most appropriate domains and metrics can be obtained in an evidence-based manner. By adopting these core sets, the nephrology community will not only benefit from more studies with relevant outcomes, but also from the increased possibility of aggregating data in a meta-analysis. As a result of the increased availability of relevant, high-quality evidence, guideline development will be more robust, and patient care will improve. In line with the mission of European Renal Best Practice (ERBP), we hope standardized core outcome sets for nephrology will lead to the most important outcome of guideline production: improving the outcomes of our patients.
AUTHORS’ ROLES
I.N. researched, wrote and revised the manuscript. D.B. researched and revised the manuscript. M.C.H. researched and revised the manuscript. E.N. researched, wrote and revised the manuscript. S.N.V. researched, wrote and revised the manuscript. K.J. researched and revised the manuscript. A.C. researched and revised the manuscript. A.W. researched and revised the manuscript. W.V.B. conceived the idea for the article and wrote and revised the manuscript.
ACKNOWLEDGEMENTS
The concept of this article has been approved by the advisory board of ERBP (in alphabetical order: D. Abramovic, J. Cannata, P. Cochat, A. Covic, K.-U. Eckhardt, D. Fouque, O. Heimburger, K. Jager, S. Jenkins, E. Lindley, F. Locatelli, G. London, A. McLeod, G. Spasovski, J. Tattersall, R. Vanholder, W. Van Biesen, C. Wanner, A. Wiecek and C. Zocalli) during different meetings.
FUNDING
ERBP is funded by the ERA-EDTA.
CONFLICT OF INTEREST STATEMENT
Detailed declarations of interest for members of ERBP are available at http://european.renal.best.practice.org. A.W. has no financial disclosures.