Longitudinal varying coefficient single-index model with censored covariates

TABLE 1

Summary statistics (count and percent) and estimation results (point estimate and 95% CI) of linear coefficients for medical cost trajectory from SEER-Medicare prostate cancer data.

Characteristic	Count	Percent	Estimate	95% CI
Time to death ≤40 quarters (N = 134 163)
Comorbidity score
0–1	90 130	67.2	–	–
>1	44 033	32.8	0.632	[0.613, 0.651]
Age at baseline
65–74	84 969	63.3	–	–
≥75	49 194	36.7	−0.336	[−0.360, −0.313]
Race
Non-Hispanic White	101 810	75.9	–	–
Non-Hispanic Black	15 466	11.5	0.266	[0.231, 0.302]
Others	16 887	12.6	0.135	[0.097, 0.173]
Initial treatment within 12 months after cancer diagnosis
Radiotherapy	57 474	42.8	0.629	[0.608, 0.649]
Surgery	38 964	29.0	0.051	[0.025, 0.078]
Time to death >40 quarters (N = 27 467)
Comorbidity score
0–1	22 162	80.7	–	–
>1	5305	19.3	0.994	[0.991, 0.997]
Age at baseline
65–74	19 768	72.0	–	–
≥75	7699	28.0	0.013	[−0.015, 0.041]
Race
Non-Hispanic White	21 928	79.8	–	–
Non-Hispanic Black	2400	8.7	−0.055	[−0.088, −0.022]
Others	3139	11.5	−0.038	[−0.067, −0.009]
Initial treatment within 12 months after cancer diagnosis
Radiotherapy	14 499	52.8	0.075	[0.053, 0.097]
Surgery	8975	32.7	−0.042	[−0.063, −0.021]

4.2 Nomogram on total cost summary

Because the estimated linear coefficients θ (Table 1) lack a direct interpretation as covariate effects in a regression model, we applied the concept of nomograms to place θ in the context of covariate effects on costs. Specifically, Figure 1 uses a nomogram to summarize covariate effects through index points on total cost given survival time. Nomograms are widely used to visualize the relationship between covariates and outcomes for complex statistical models such as the Cox proportional hazards model and the logistic regression model. Over the past 2 decades, nomograms have been widely used in clinical decision-making for prostate cancer patients, for example, to assess the risk of PSA recurrence (Kattan, 2003) and, in the form of the Partin Table, to predict pathological cancer stage (Partin et al., 1993). As illustrated in Figure 1, patients in the reference group have 0 total points; on average, they cost $144K, $167K, or $182K if they died 16, 24, or 32 quarters after cancer diagnosis, respectively. With other conditions unchanged, a similar non-Hispanic Black patient has 0.266 total points, and the mean total costs are $161K, $186K, or $203K, respectively, higher than those of the non-Hispanic White counterpart. If this patient also received radiotherapy as initial treatment, the total points become 0.895 (0.266 + 0.629), and the mean total costs reach $199K, $230K, or $254K, respectively. By comparison, the mean total cost for LTS in the reference group is $106K. With other conditions fixed, a similar non-Hispanic Black LTS has −0.055 total points and therefore costs $2K less on average. Receiving radiotherapy as initial treatment barely changes the total cost for LTS, whereas a comorbidity score over 1 is associated with a $43K increase in total cost. Note that the mean total cost reported in the nomogram is the area under the corresponding cost trajectory curve. Nomograms thus translate a complicated single-index regression model into graphical presentations that lay audiences and policymakers can more easily use for cost estimation and evaluation.
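Because every covariate in Table 1 is a binary indicator, reading the points axis of the nomogram amounts to summing the coefficients of the non-reference levels a patient has. The Python sketch below reproduces the total points quoted above from the Table 1 estimates. The dictionary keys are hypothetical labels of our own, and mapping points to dollars would additionally require the fitted trajectory functions, which are not reproduced here.

```python
# Linear index coefficients from Table 1; reference levels contribute 0 points.
# The key names are hypothetical labels, not variable names from the authors' code.
THETA_DEATH_LE40 = {  # time to death <= 40 quarters
    "comorbidity_gt1": 0.632,
    "age_ge75": -0.336,
    "black": 0.266,
    "other_race": 0.135,
    "radiotherapy": 0.629,
    "surgery": 0.051,
}
THETA_LTS = {  # time to death > 40 quarters (long-term survivors)
    "comorbidity_gt1": 0.994,
    "age_ge75": 0.013,
    "black": -0.055,
    "other_race": -0.038,
    "radiotherapy": 0.075,
    "surgery": -0.042,
}


def total_points(theta, levels):
    """Single index x'theta for a patient with the given non-reference levels.

    With binary covariates, x'theta is the sum of the coefficients whose
    indicator equals 1, which is what the nomogram's points axis adds up.
    """
    unknown = set(levels) - set(theta)
    if unknown:
        raise KeyError(f"unknown covariate level(s): {sorted(unknown)}")
    return round(sum(theta[name] for name in levels), 3)


if __name__ == "__main__":
    profiles = {
        "reference": [],
        "Black": ["black"],
        "Black + radiotherapy": ["black", "radiotherapy"],
        "Black + radiotherapy + comorbidity > 1": ["black", "radiotherapy", "comorbidity_gt1"],
    }
    for label, levels in profiles.items():
        print(f"{label:40s} death <= 40q: {total_points(THETA_DEATH_LE40, levels):6.3f}"
              f"   LTS: {total_points(THETA_LTS, levels):6.3f}")
```

For example, a non-Hispanic Black patient who received radiotherapy has 0.266 + 0.629 = 0.895 points in the death-within-40-quarters part, and adding a comorbidity score over 1 gives 1.527.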

FIGURE 1

SEER-Medicare data application results of the cost nomogram. Instructions for policymakers: first, locate the patient’s baseline covariates on the top axes and draw a vertical line straight up to the points axis to determine how many points each contributes. Second, sum the points for all covariates and locate this sum on the single index axis below. Third, draw a vertical line straight down to find the patient’s total cost (in $1000s), assuming the time to death is 16, 24, 32, or over 40 quarters. For LTS, the total cost covers only the first 40 quarters. Green: time to death ≤ 40 quarters; orange: time to death > 40 quarters.

4.3 Varying coefficient and linear index associated with cost

Figure 2 shows that the index values were positively associated with the cost trajectory and that the relations were nonlinear. This aligns with the hypothesis test result (H₀: μ₁ₖ(t, s) is constant, k = 1, 2; P < 0.0001), which suggests that a simpler model such as (1) is inadequate to capture the complex relationship between covariates and the cost trajectory. Figure 2 also shows a dramatic elevation of the cost trajectory right after cancer diagnosis, especially for subjects who received radiotherapy within the first year of diagnosis. Among patients who died 16 quarters after cancer diagnosis, first-quarter costs were highest for non-Hispanic Black patients ($9292; $8985–$9599), intermediate for others ($8383; $8074–$8692), and lowest for non-Hispanic White patients ($7446; $7117–$7775). After the first year following diagnosis, costs decreased gradually, and the trajectories for different index values largely overlap in the continuing care phase. Within 1 year before death, the cost trajectory increases quickly, and the rate of increase is generally higher for higher index values. For example, for an LTS who was non-Hispanic Black, aged 65–74, had a comorbidity score greater than 1, and received radiotherapy as initial treatment, the single index value was 1.014 (0.994 − 0.055 + 0.075) and the mean total cost was $150K.

FIGURE 2

SEER-Medicare data application results of the cost trajectories under different index values when survival time is 16, 24, 32, or over 40 quarters. A single index of 0 corresponds to the reference group. For patients who died within 40 quarters after cancer diagnosis (values for LTS in parentheses), the single index is −0.336 (0.013) for patients aged 75 or older; 0.266 (−0.055) for non-Hispanic Black patients; −0.07 (−0.042) for non-Hispanic Black patients aged 75 or older; 0.895 (0.13) for non-Hispanic Black patients who received radiotherapy as initial treatment; and 1.527 (1.014) for non-Hispanic Black patients with a comorbidity score over 1 who received radiotherapy as initial treatment. All unmentioned covariates are at their reference levels. For time to death ≤ 40 quarters, single index = −0.336 (blue), −0.07 (green), 0 (lime), 0.266 (yellow), 0.895 (orange), and 1.527 (red); for time to death > 40 quarters, single index = −0.055 (blue), −0.042 (green), 0 (lime), 0.013 (yellow), 0.13 (orange), and 1.014 (red).

In Table 1, among patients who died within 40 quarters, we observed a higher mean cost trajectory for patients who had higher comorbidity scores, were 65–74 years of age, were non-Hispanic Black, or received radiotherapy or surgery as initial treatment. This result is consistent with findings published in Schmid et al. (2016) and Trogdon et al. (2019). Because all covariates in our regression model are binary and the linear coefficients are constrained to have unit norm, we can compare the relative effects of the covariates. For example, in Table 1, the effect of being non-Hispanic Black (0.266; 0.231–0.302) is about twice as large as that of being in the “other” race category (0.135; 0.097–0.173). Table 1 also shows some interesting findings for patients who survived over 40 quarters (LTS): costs were slightly lower for patients who were non-Hispanic Black or of other races and for those who received surgery within the first 12 months after cancer diagnosis. Among patients who died within 40 quarters, the significantly positive coefficients for non-Hispanic Black and other races relative to non-Hispanic White suggest that policymakers should explore factors that may contribute to the higher Medicare costs of racial/ethnic minority patients.
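The unit-norm reading of the coefficients can be checked directly from the reported estimates. The short Python check below (our own, not part of the authors' software) computes the Euclidean norm of each coefficient vector in Table 1; both are within 0.001 of 1, consistent with the usual single-index identifiability constraint ‖θ‖ = 1. It also computes the ratio 0.266/0.135 ≈ 1.97 behind the "about twice as large" comparison.

```python
import math

# Point estimates from Table 1, ordered as: comorbidity > 1, age >= 75,
# non-Hispanic Black, other race, radiotherapy, surgery.
COEF_LE40 = [0.632, -0.336, 0.266, 0.135, 0.629, 0.051]  # death <= 40 quarters
COEF_LTS = [0.994, 0.013, -0.055, -0.038, 0.075, -0.042]  # LTS


def l2_norm(v):
    """Euclidean norm; a single-index coefficient vector is scaled to norm 1."""
    return math.sqrt(sum(x * x for x in v))


def relative_effect(a, b):
    """How many times larger coefficient a is than coefficient b on the index scale."""
    return a / b


if __name__ == "__main__":
    print(f"norm (death <= 40 quarters): {l2_norm(COEF_LE40):.4f}")
    print(f"norm (LTS):                  {l2_norm(COEF_LTS):.4f}")
    print(f"Black vs other race:         {relative_effect(0.266, 0.135):.2f}")
```

Because all coefficients share one scale, such ratios are meaningful within each part of the model, but not across the two parts.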

4.4 Reference cancer care cost

Putting together the reference trajectories at survival quarters s = 1, 2, …, we obtain a visually smooth bivariate surface, shown in Figure 3b. Because the measurement time is t = 1, 2, …, s, the surface lies on a triangular region. The reference medical cost trajectory depends nonlinearly on the time after diagnosis, and this dependence is highly heterogeneous across survival times. For a fixed time to death, the trajectories are roughly U-shaped: patients are likely to receive more care right after the initial cancer diagnosis, as well as more intensive care toward the end of life (EOL).
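The triangular support follows from the constraint t ≤ s. As a small illustration, assuming for concreteness survival quarters s = 1, …, 40 (the non-LTS range in the data example), the Python sketch below enumerates the (t, s) grid on which the reference surface is evaluated; it contains 40 × 41 / 2 = 820 points.

```python
import numpy as np


def triangular_grid(s_max):
    """All (t, s) pairs with 1 <= t <= s <= s_max, in quarters.

    Column 0 holds the measurement time t, column 1 the survival time s.
    """
    quarters = np.arange(1, s_max + 1)
    s, t = np.meshgrid(quarters, quarters, indexing="ij")  # s varies by row, t by column
    keep = t <= s                                          # costs exist only up to death
    return np.column_stack([t[keep], s[keep]])


if __name__ == "__main__":
    grid = triangular_grid(40)
    print(grid.shape)
    print(grid[:3])
```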

FIGURE 3

SEER-Medicare data application results for the estimated reference cost trajectories: (a) 2D heatmaps and (b) 3D surface, with the upper and lower 95% confidence bounds shown as gray meshes.

In Figure 3a, looking upward (bottom to top) at the trajectory surface within 1 year after diagnosis, we see that costs are lower for patients who survived longer. The treatment guideline for prostate cancer published by the American Urological Association (AUA) bases its recommendations on a risk assessment that incorporates factors such as cancer stage, serum prostate-specific antigen (PSA), cancer grade, and tumor volume on biopsy (Eastham et al., 2022). Patients who survived longer were likely healthier and at lower risk of cancer progression, and thus suitable candidates for observation or active surveillance, which is less expensive than definitive treatment options (eg, surgery or radiotherapy). The oscillation near the boundary τ may be due to the limited sample size and the spline approximation there. Looking backward (right to left) from the time of death, there is a change point around four quarters before death, which aligns with the definition of the terminal phase in NCI's cost reporting (Mariotto et al., 2020). The higher terminal care cost in Figure 3a is a common pattern among the elderly, owing to the use of all services in EOL care, including hospitalizations related and unrelated to cancer, hospice care, and outpatient services (ie, office or emergency room visits and hospital outpatient procedures) (Duncan et al., 2019). However, among LTS the trajectory is L-shaped on average, possibly because their observed costs do not all come from EOL care.

Apart from the costs in the first few months after diagnosis and the last few months before death, patients with longer survival have lower average quarterly costs in the middle of their survival period. For example, the costs at the 8th, 12th, 16th, and 12th quarters after diagnosis for patients who died at 16, 24, 32, and over 40 quarters follow a decreasing trend: $5888, $4007, $3864, and $3787, respectively. However, the cumulative cost over this middle period may be higher for patients who survive longer, because the period itself lasts longer.

These results reveal complex nonlinear interdependencies among time after diagnosis, time to death, and costs. In addition, the estimated compound symmetry correlation for patients who died within 40 quarters after diagnosis is 0.07, indicating a weak positive within-subject correlation in costs. The corresponding correlation is slightly lower for LTS (0.05), because LTS include subjects with different survival times and their costs are therefore more variable.

5 SIMULATION STUDY

In this section, we study the finite-sample performance of the proposed method through two simulation examples.

Example 1

The outcome Y is sampled from a normal distribution with baseline mean trajectory μ₀₁(t, s) = cos(2πt/s) + 1 to mimic U-shaped trajectories for non-LTS subjects (0 < s ≤ τ), and μ₀₂(t) = cos(2πt) × 1(t ≤ 0.5) − 1(t > 0.5) + 1 to mimic L-shaped trajectories for LTS (s > τ), where 1(·) is the indicator function. The variance is 0.25, with a compound symmetry correlation structure and correlation 0.2.
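To make the Example 1 design concrete, the Python sketch below draws baseline trajectories from a multivariate normal distribution with compound symmetry covariance 0.25 × [(1 − 0.2)I + 0.2·11ᵀ]. It sets x = (0, 0), as in Figure 4, so the index term drops out under our assumption that the mean is the baseline plus the varying coefficient times the index. The authors used the R package simstudy; this NumPy version, the function names, and the measurement grid are illustrative assumptions of ours.

```python
import numpy as np


def mu01(t, s):
    """Non-LTS baseline mean cos(2*pi*t/s) + 1: U-shaped on 0 <= t <= s."""
    return np.cos(2 * np.pi * np.asarray(t, dtype=float) / s) + 1


def mu02(t):
    """LTS baseline mean: cos(2*pi*t) + 1 for t <= 0.5, and 0 afterwards (L-shaped)."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 0.5, np.cos(2 * np.pi * t) + 1, 0.0)


def cs_cov(m, var=0.25, rho=0.2):
    """m x m compound symmetry covariance: var on the diagonal, rho * var elsewhere."""
    return var * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))


def simulate_example1_baseline(t, n, rng, s=None):
    """n correlated normal trajectories at times t; s=None means an LTS subject."""
    t = np.asarray(t, dtype=float)
    mean = mu02(t) if s is None else mu01(t, s)
    return rng.multivariate_normal(mean, cs_cov(t.size), size=n)


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    t = np.linspace(0.05, 0.4, 8)  # hypothetical measurement grid on (0, s]
    y = simulate_example1_baseline(t, n=5000, rng=rng, s=0.4)
    print(np.round(y.mean(axis=0) - mu01(t, 0.4), 3))      # should be close to 0
    print(np.round(np.corrcoef(y[:, 0], y[:, 1])[0, 1], 3))  # should be close to 0.2
```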

Example 2

The outcome Y is sampled from a gamma distribution whose shape and scale parameters are chosen so that the baseline mean trajectory follows μ₀₁(t, s) = exp(t/s), mimicking increasing trajectories for non-LTS subjects (0 < s ≤ τ), and μ₀₂(t) = exp(t) for LTS (s > τ). The variance depends on the mean as var(Y) = μ². To simulate zero-inflation, the outcome Y is multiplied by 10/9, and then 10% of the outcome values are randomly set to 0, which leaves the mean unchanged (0.9 × 10/9 = 1).
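For a gamma distribution with shape k and scale b, the mean is kb and the variance is kb², so var(Y) = μ² forces k = 1 and b = μ. The Python sketch below generates one draw per mean value in this way and then applies the stated zero-inflation. For simplicity it draws values independently, ignoring the within-subject correlation of the actual design. The text does not say whether exactly 10% of values or each value with probability 0.1 is set to 0; we assume the latter.

```python
import numpy as np


def mu01_ex2(t, s):
    """Example 2 non-LTS baseline mean exp(t/s), increasing in t."""
    return np.exp(np.asarray(t, dtype=float) / s)


def zero_inflated_gamma(mean, rng, p_zero=0.1):
    """Draw Y with E[Y] = mean: gamma with var = mean**2, inflated, then zeroed.

    shape = 1 and scale = mean give E = mean and Var = mean**2 before inflation.
    Each value is independently set to 0 with probability p_zero (an assumption).
    """
    mean = np.asarray(mean, dtype=float)
    y = rng.gamma(shape=1.0, scale=mean, size=mean.shape)
    y = y / (1.0 - p_zero)  # multiply by 10/9 when p_zero = 0.1
    zeroed = rng.random(size=mean.shape) < p_zero
    return np.where(zeroed, 0.0, y)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    mean = np.full(100_000, mu01_ex2(0.2, 0.4))  # exp(0.5), about 1.65
    y = zero_inflated_gamma(mean, rng)
    print(round(y.mean(), 3), round(float(mean[0]), 3), round((y == 0).mean(), 3))
```

The rescaling by 1/(1 − p_zero) is what keeps the marginal mean equal to the target trajectory, so an identity-link mean model remains correctly specified even though the distribution is skewed and zero-inflated.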

For both examples, the covariates X = [X₁, X₂] are independently generated from Uniform[0, 1] and Bernoulli(0.5), respectively. The mean index functions are Xᵀθ₁ = Xᵀθ₂ = (X₁ + X₂)/√2, and the varying coefficients are μ₁₁(t, s) = μ₁₂(t, s) = 1 + κt/s, where κ controls the strength of nonconstancy. κ = 0 means that the covariates have an additive effect, as in model (1), and a larger κ means more interaction between the covariate effect and the trajectory shape. The compound symmetry correlation structure with α = 0.2 is generated using the R package simstudy (Goldfeld and Wujciak-Jens, 2020). The failure time of each subject is sampled such that the baseline hazard corresponds to a Weibull distribution (scale = 1/3 and shape = 1) and the log relative hazard is 2X₁ + X₂. Independent censoring times are drawn from an exponential distribution with rate 2, and all survival times are administratively censored at 1. This yields 4% LTS and 55% of subjects censored before τ, mimicking the motivating data example. We set n = 2000 and 4000 and repeated the simulations 1000 times for each scenario. To obtain smooth estimates of the trajectory curves, we use 5 equally spaced knots and a truncated quadratic basis. The penalty parameter is chosen by the proposed GCV-based criterion through a grid search on five simulated datasets. We evaluate the bias, the mean squared error (MSE), and the average coverage probability (CP) of the estimators of the index coefficients θ and the nonparametric functions μ₀ₖ, k = 1, 2. The results are summarized in Table 2 and Figures 4 and 5.
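The covariate and survival-time design can be sketched in Python as follows. Two assumptions are ours and are labeled in the code. First, we read "Weibull (scale = 1/3, shape = 1)" in the R convention S₀(t) = exp{−(t/b)ᵏ}, which gives an exponential baseline with rate 3; simstudy may parametrize the scale differently. Second, we take the mean to be μ₀ₖ(t, s) + μ₁ₖ(t, s)·Xᵀθₖ, which is consistent with the statement that κ = 0 reduces to an additive effect. Because the value of τ is not given in this section, the sketch does not classify subjects as LTS and makes no attempt to reproduce the 4% and 55% figures.

```python
import numpy as np

THETA = np.array([1.0, 1.0]) / np.sqrt(2.0)  # theta_1 = theta_2 = (1, 1)/sqrt(2)


def simulate_subjects(n, rng, base_rate=3.0, cens_rate=2.0, admin=1.0):
    """Covariates, observed times, and event indicators for n subjects.

    ASSUMPTION: exponential baseline hazard base_rate (Weibull shape 1,
    scale 1/3 read as S0(t) = exp(-t / scale)); proportional hazards with
    log relative hazard 2*X1 + X2.
    """
    x = np.column_stack([rng.uniform(0.0, 1.0, n), rng.binomial(1, 0.5, n)])
    log_rh = 2.0 * x[:, 0] + x[:, 1]
    # Exponential PH model: T = E / (rate * exp(log_rh)) with E ~ Exp(1).
    t_fail = rng.exponential(1.0, n) / (base_rate * np.exp(log_rh))
    c = np.minimum(rng.exponential(1.0 / cens_rate, n), admin)  # rate 2, cut at 1
    obs = np.minimum(t_fail, c)
    delta = (t_fail <= c).astype(int)  # 1 = death observed
    return x, obs, delta


def varying_coef(t, s, kappa):
    """mu_1k(t, s) = 1 + kappa * t / s; kappa = 0 gives a constant (additive) effect."""
    return 1.0 + kappa * np.asarray(t, dtype=float) / s


def mean_given_x(mu0, t, s, x, kappa):
    """ASSUMED mean form: mu_0k(t, s) + mu_1k(t, s) * x'theta."""
    return mu0 + varying_coef(t, s, kappa) * (x @ THETA)


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    x, obs, delta = simulate_subjects(2000, rng)
    print(f"observed deaths: {delta.mean():.2f}, max follow-up: {obs.max():.2f}")
```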

FIGURE 4

Simulation results of estimated trajectories for normal data (top) and zero-inflated gamma data (bottom); n = 4000, x = (0, 0), s = 0.4. Left: pointwise biases; middle: pointwise mean squared errors; right: pointwise empirical coverage probabilities. Lines compare methods using all data (red solid) and partial data (blue dashed).

FIGURE 5

Simulation results of estimated power curves for normal data (top) and zero-inflated gamma data (bottom); n = 4000, x = (0, 0), s = 0.4. Lines compare methods using all data (red solid) and partial data (blue dashed).

TABLE 2

Bias, MSE and CP of linear coefficients in the simulation studies.

		Example 1: normal data
		All data			Partial data
n	Parameter	Bias	MSE (× 10⁻²)	CP	Bias	MSE (× 10⁻²)	CP
2000	θ₁₁	−0.001	0.018	0.935	0.001	0.019	0.951
	θ₁₂	−0.001	0.018	0.935	0.001	0.019	0.951
	θ₂₁	−0.001	0.079	0.937	−0.003	0.158	0.937
	θ₂₂	0.001	0.077	0.941	0.001	0.150	0.938
4000	θ₁₁	−0.001	0.008	0.949	−0.001	0.009	0.947
	θ₁₂	0.001	0.008	0.946	0.001	0.009	0.948
	θ₂₁	−0.001	0.040	0.935	−0.001	0.074	0.947
	θ₂₂	0.001	0.040	0.932	0.001	0.073	0.944
		Example 2: zero-inflated gamma data
		All data			Partial data
n	Parameter	Bias	MSE (× 10⁻²)	CP	Bias	MSE (× 10⁻²)	CP
2000	θ₁₁	−0.011	0.526	0.942	−0.016	0.830	0.907
	θ₁₂	0.004	0.459	0.935	0.005	0.702	0.887
	θ₂₁	−0.013	0.922	0.921	−0.029	2.032	0.901
	θ₂₂	0.001	0.808	0.916	0.003	1.567	0.888
4000	θ₁₁	−0.005	0.257	0.930	−0.008	0.376	0.913
	θ₁₂	0.002	0.241	0.933	0.003	0.335	0.926
	θ₂₁	−0.005	0.403	0.935	−0.012	0.875	0.925
	θ₂₂	−0.001	0.382	0.938	−0.001	0.781	0.925

Data were generated from a normal distribution (Example 1) and a zero-inflated gamma distribution (Example 2).

In Table 2, when the outcomes follow a multivariate normal distribution, the coverage probabilities for the index parameter estimates are close to the nominal 95% level. The bias is small, and the MSE decreases as the sample size increases. Compared with the method using only uncensored subjects and LTS (partial data), the method using all data reduces the MSE. For example, when n = 2000, the MSE of θ₂₁ is reduced by 50% (from 0.158 × 10⁻² to 0.079 × 10⁻²) when cost data from censored subjects are used. This indicates that properly accounting for censoring is particularly beneficial for efficiency when the number of observations is relatively small.
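The efficiency gain can be read off Table 2 as a relative MSE reduction, 1 − MSE_all/MSE_partial. The short Python helper below (ours, for illustration) applies it to the n = 2000 normal-data rows: the reduction is 50% for θ₂₁ and about 49% for θ₂₂, but only about 5% for θ₁₁ and θ₁₂.

```python
# MSE (x 10^-2) for Example 1 (normal data), n = 2000, copied from Table 2.
MSE_ALL = {"theta11": 0.018, "theta12": 0.018, "theta21": 0.079, "theta22": 0.077}
MSE_PARTIAL = {"theta11": 0.019, "theta12": 0.019, "theta21": 0.158, "theta22": 0.150}


def relative_mse_reduction(mse_all, mse_partial):
    """Fraction of the partial-data MSE removed by also using censored subjects."""
    return {name: 1.0 - mse_all[name] / mse_partial[name] for name in mse_all}


if __name__ == "__main__":
    for name, r in relative_mse_reduction(MSE_ALL, MSE_PARTIAL).items():
        print(f"{name}: {100 * r:.1f}% lower MSE with all data")
```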

When the outcomes follow a zero-inflated gamma distribution, the bias of the index parameter estimates is small, and the MSE decreases as the sample size increases. This suggests that the proposed estimation under the identity link function is robust to skewness and zero-inflation, because it does not require modeling the full distribution of the cost data. The coverage probabilities for the method using partial data fall below the nominal 95% level, to about 89%–91% when the sample size is 2000. Coverage is generally closer to 95% when censored cost data are used, and it improves as the sample size increases.

Figure 4 visualizes the simulation results for the estimated baseline trajectories in Examples 1 and 2 when n = 2000, at terminal event time s = 0.4 and baseline covariates x = (0, 0). The coverage probabilities are close to the nominal 95% level. The pointwise bias and MSE are both small, suggesting that the proposed estimator fits the baseline trajectory well. As expected, the MSE curve for the method using all data is lower. At a few boundary points, the MSE is larger because of the limited sample size in those regions.

Lastly, we evaluate the proposed Wald test statistic using the above-mentioned examples. As described in Section 3.6, the test serves two purposes: to test whether the shapes of the trajectory curves are constant, and to test whether the covariate index parameters equal zero. Figure 5 shows the power curves under the given significance levels. When the outcome is normal, Figure 5a and b show that power increases rapidly with the nonconstant strength κ for μ₁₁(t, s) and μ₁₂(t, s), the varying coefficient functions in the proposed 2-part model. The method using all data has better statistical power. When the effect is close to 0, the empirical test sizes are all approximately at the significance level. Figure 5c and d show the power curves for the index coefficients θ₁₂ and θ₂₂, the second index parameter in each part of the model. Because the partial-data method still has a large sample for estimating θ₁₂, its power is similar to that of the all-data method. The second row of Figure 5 shows similar findings for zero-inflated gamma data. Additional simulations in the Web Appendix for multivariate gamma and zero-inflated normal data show unbiased estimation of both the index coefficients and the trajectory functions, and simulations that evaluate the proposed method under different censoring rates are presented in Web Appendix Section 3.
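The exact statistic is defined in Section 3.6. As a generic illustration only, a Wald test of a linear hypothesis Cβ = 0 uses W = (Cβ̂)ᵀ(C V̂ Cᵀ)⁻¹(Cβ̂), referred to a χ² distribution with as many degrees of freedom as C has rows. Testing that a varying coefficient is constant can be written this way by letting C select the nonconstant spline coefficients (our assumption about the parametrization), and testing θ₁₂ = 0 by letting C select that single coefficient. Empirical size and power are then rejection rates over simulation replicates. The Python sketch below uses scipy.stats.chi2 and, for illustration, treats estimates as draws from their asymptotic normal distribution; it is not the authors' implementation.

```python
import numpy as np
from scipy import stats


def wald_test(beta_hat, cov_hat, C, alpha=0.05):
    """Wald test of H0: C @ beta = 0, assuming C has full row rank.

    Returns (W, p-value, reject?), with W referred to chi-square(df = rows of C).
    """
    C = np.atleast_2d(np.asarray(C, dtype=float))
    r = C @ np.asarray(beta_hat, dtype=float)
    v = C @ np.asarray(cov_hat, dtype=float) @ C.T
    w = float(r @ np.linalg.solve(v, r))
    p = float(stats.chi2.sf(w, df=C.shape[0]))
    return w, p, p < alpha


def rejection_rate(estimates, cov_hat, C, alpha=0.05):
    """Empirical size (under H0) or power (under H1) across simulation replicates."""
    return float(np.mean([wald_test(b, cov_hat, C, alpha)[2] for b in estimates]))


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    cov = 0.01 * np.eye(3)            # hypothetical covariance of 3 spline coefficients
    C = np.array([[0.0, 1.0, 0.0],    # H0: both non-intercept coefficients are 0,
                  [0.0, 0.0, 1.0]])   # i.e. the hypothetical varying coefficient is constant
    for effect in (0.0, 0.1, 0.2, 0.3):
        draws = rng.multivariate_normal([1.0, effect, effect], cov, size=2000)
        print(f"effect {effect:.1f}: rejection rate {rejection_rate(draws, cov, C):.3f}")
```

At effect 0 the rejection rate estimates the test size, and tracing it over increasing effects gives a power curve of the kind shown in Figure 5.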

6 DISCUSSION

In this paper, we propose a longitudinal varying coefficient single-index model to detect and test for complicated nonlinear relationships between the longitudinal medical cost trajectory and baseline covariates in the presence of right-censoring. This model helps health services researchers and policymakers understand how baseline patient and initial treatment characteristics affect the shape of subsequent cost trajectories given survival time. Because healthcare costs are related both to the time since initial diagnosis and to the survival time, which is subject to right-censoring, the model must account for both. To our knowledge, no statistical methods for this problem have been published. Our model formulation balances the research objective, flexibility, parsimony, and interpretability. The estimation does not rely on a distributional assumption for the cost data and is robust to misspecification of the within-subject correlation. From a methodological perspective, the approach extends GEE to incorporate censored covariates.

One advantage of the proposed method is that a consistent initial estimator can be obtained by analyzing only the uncensored subjects and LTS. This is helpful computationally because data analysts can quickly fit the model using standard software. Estimation efficiency and statistical power can then be substantially improved by using data from censored subjects, who can make up a large proportion of real-world datasets. This final analysis can be completed with the method proposed in this paper and our specialized program. We will make the R code that implements the proposed method for the simulation study and data analysis publicly available. Furthermore, results from the proposed method can be interpreted not only through numeric tables and 2D/3D figures but also through an extended use of nomograms, which serve as a translational tool for presenting the proposed model in a visually interpretable way.

Some covariate effects differ between patients who died within 40 quarters and those who survived longer (eg, surgery as initial treatment) in the data example, possibly because multiple factors dilute the cost across patients’ lifespans. Thus, evaluating the relationship between the cost trajectory and time-dependent covariates, such as cancer progression and definitive treatments, will be important for subsequent research. The statistical literature on cost-effectiveness has rarely addressed potential changes in cost-effectiveness over time. To address this gap, we put forth a semiparametric estimator and its variance, which can serve as a foundation for further research into time-varying cost trajectories in relation to baseline covariates. Future work in this area might develop methodology for measuring cost-effectiveness.

Funding

This research is supported by NIH grants R01CA225646 and P30CA016672, and CPRIT grant RP210130.

Conflict of interest

None declared.

Data availability

The data that support the findings of this paper are available from SEER-Medicare. Restrictions apply to the availability of these data, which were used under license for this paper. Data are available at https://healthcaredelivery.cancer.gov/seermedicare/ with the permission of SEER-Medicare.

References

Bai, Y., Fung, W. K. and Zhu, Z. Y. (2009). Penalized quadratic inference functions for single-index models with longitudinal data. Journal of Multivariate Analysis, 100, 152–161.

Bang, H. (2005). Medical cost analysis: application to colorectal cancer data from the SEER Medicare database. Contemporary Clinical Trials, 26, 586–597.

Baser, O., Gardiner, J. C., Bradley, C. J., Yüce, H. and Given, C. (2006). Longitudinal analysis of censored medical cost data. Health Economics, 15, 513–525.

Chen, J., Liu, L., Shih, Y. T., Zhang, D. and Severini, T. A. (2016). A flexible model for correlated medical costs, with application to medical expenditure panel survey data. Statistics in Medicine, 35, 883–894.

Cronin, K. A., Lake, A. J., Scott, S., Sherman, R. L., Noone, A.-M. et al. (2018). Annual report to the nation on the status of cancer, part I: National cancer statistics. Cancer, 124, 2785–2800.

Duncan, I., Ahmed, T., Dove, H. and Maxwell, T. L. (2019). Medicare cost at end of life. American Journal of Hospice and Palliative Medicine, 36, 705–710.

Eastham, J. A., Auffenberg, G. B., Barocas, D. A., Chou, R., Crispino, T., Davis, J. W. et al. (2022). Clinically localized prostate cancer: AUA/ASTRO guideline, part I: Introduction, risk assessment, staging and risk-based management. Journal of Urology, 208, 1–18.

Fan, J. and Zhang, W. (2008). Statistical methods with varying coefficient models. Statistics and Its Interface, 1, 179–195.

Goldfeld, K. and Wujciak-Jens, J. (2020). simstudy: Illuminating research methods through data generation. Journal of Open Source Software, 5, 2763.

Kattan, M. W. (2003). Nomograms are superior to staging and risk grouping systems for identifying high-risk patients: preoperative application in prostate cancer. Current Opinion in Urology, 13, 111–116.

Kauermann, G. (2005). Penalized spline smoothing in multivariable survival models with varying coefficients. Computational Statistics & Data Analysis, 49, 169–186.

Klabunde, C. N., Legler, J. M., Warren, J. L., Baldwin, L.-M. and Schrag, D. (2007). A refined comorbidity measurement algorithm for claims-based studies of breast, prostate, colorectal, and lung cancer patients. Annals of Epidemiology, 17, 584–590.

Klabunde, C. N., Potosky, A. L., Legler, J. M. and Warren, J. L. (2000). Development of a comorbidity index using physician claims data. Journal of Clinical Epidemiology, 53, 1258–1267.

Li, L., Wu, C., Ning, J., Huang, X., Shih, Y. T. and Shen, Y. (2018). Semiparametric estimation of longitudinal medical cost trajectory. Journal of the American Statistical Association, 113, 582–592.

Lin, D. Y. (2003). Regression analysis of incomplete medical cost data. Statistics in Medicine, 22, 1181–1200.

Lin, D. Y., Feuer, E. J., Etzioni, R. and Wax, Y. (1997). Estimating medical costs from incomplete follow-up data. Biometrics, 53, 419–434.

Liu, L. (2009). Joint modeling longitudinal semi-continuous data and survival, with application to longitudinal medical cost data. Statistics in Medicine, 28, 972–986.

Liu, L., Wolfe, R. A. and Kalbfleisch, J. D. (2007). A shared random effects model for censored medical costs and mortality. Statistics in Medicine, 26, 139–155.

Mariotto, A., Enewold, L., Zhao, J., Zeruto, C. A. and Yabroff, K. R. (2020). Medical care costs associated with cancer survivorship in the United States. Cancer Epidemiology and Prevention Biomarkers, 29, 1304–1312.

Mariotto, A., Shao, Y., Feuer, E. J. and Brown, M. L. (2011). Projections of the cost of cancer care in the United States: 2010–2020. Journal of the National Cancer Institute, 103, 117–128.

Partin, A. W., Yoo, J., Carter, H. B., Pearson, J. D., Chan, D. W., Epstein, J. I. et al. (1993). The use of prostate specific antigen, clinical stage and Gleason score to predict pathological stage in men with localized prostate cancer. The Journal of Urology, 150, 110–114.

Schmid, M., Meyer, C. P., Reznor, G., Choueiri, T. K., Hanske, J., Sammon, J. D. et al. (2016). Racial differences in the surgical care of Medicare beneficiaries with localized prostate cancer. JAMA Oncology, 2, 85–93.

Siegel, D. A., O’Neil, M. E., Richards, T. B., Dowling, N. F. and Weir, H. K. (2020). Prostate cancer incidence and survival, by stage and race/ethnicity—United States, 2001–2017. Morbidity and Mortality Weekly Report, 69, 1473.

Trogdon, J. G., Falchook, A. D., Basak, R., Carpenter, W. R. and Chen, R. C. (2019). Total Medicare costs associated with diagnosis and treatment of prostate cancer in elderly men. JAMA Oncology, 5, 60–66.

Wang, J. L., Xue, L., Zhu, L. and Chong, Y. S. (2010). Estimation for a partial-linear single-index model. The Annals of Statistics, 38, 246–274.

Wang, S., Shen, Y., Shih, Y. T., Xu, Y. and Li, L. (2020). Statistical modeling of longitudinal medical cost trajectory: renal cell cancer care cost analyses. Biostatistics, 24, 244–261.

Wu, J., Peng, H. and Tu, W. (2019). Large-sample estimation and inference in multivariate single-index models. Journal of Multivariate Analysis, 171, 382–396.

Yu, Y. and Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–1054.