Nonparametric predictive model for sparse and irregular longitudinal data

Wang, Shixuan; Kim, Seonjin; Ryan Cho, Hyunkeun; Chang, Won

doi:10.1093/biomtc/ujad023

Abstract

We propose a kernel-based estimator to predict the mean response trajectory for sparse and irregularly measured longitudinal data. The kernel estimator is constructed by imposing weights based on the subject-wise similarity on L₂ metric space between predictor trajectories, where we assume that an analogous fashion in predictor trajectories over time would result in a similar trend in the response trajectory among subjects. In order to deal with the curse of dimensionality caused by the multiple predictors, we propose an appealing multiplicative model with multivariate Gaussian kernels. This model is capable of achieving dimension reduction as well as selecting functional covariates with predictive significance. The asymptotic properties of the proposed nonparametric estimator are investigated under mild regularity conditions. We illustrate the robustness and flexibility of our proposed method via extensive simulation studies and an application to the Framingham Heart Study.

distance, kernel estimation, longitudinal data analysis, trajectory prediction

1 INTRODUCTION

Longitudinal studies in which subjects are measured repeatedly over a period of time |${\cal T}$| allow us to examine associations between a response variable and a set of predictors at distinct time points. We consider prediction of the response variable over a period of time |${\cal T}_2 \subset {\cal T}$| based on repeated measures of the predictors over another period of time |${\cal T}_1 \subset {\cal T}$|⁠. We denote Y(s) by the response variable at time |$s \in {\cal T}_2$| and x_r = {x_r(t₁), |$\ldots,$| x_r(t_m)} for r = 1, |$\dots,$| p by a vector of the rth predictor sparsely collected at m time points t₁, |$\ldots,$| t_m. Then, |${\cal T}_1$| can be defined as [t₁, t_m] with respect to the measurement times of predictors. It is of primary interest to predict the mean response trajectory over |${\cal T}_2$| given a history of the repeatedly observed predictors x₁, |$\dots,$| x_p over |${\cal T}_1$|

$$\begin{eqnarray} &&E\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace\\ &&=E\lbrace Y(s)\mid {\bf X}_{1}={\bf x}_1,\dots ,{\bf X}_{p}={\bf x}_p\rbrace , \quad s \in {\cal T}_2. \end{eqnarray}$$

(1)

Here, the response variable Y(s) can be either continuous or discrete. For instance, if prediction of obesity incidence is of interest, the response variable is dichotomous, that is, Y(s) = I(Z(s) > 30 kg/m²), where Z(s) is body mass index (BMI) at time s and I(·) is an indicator function. If prediction of the mean trajectory of BMI is of interest, the response variable Y(s) = Z(s) is continuous. If |${\cal T}_2$| is a future time period (i.e., t_m < s for any |$s \in {\cal T}_2$|⁠), Equation 1 can be viewed as the future mean trajectory of the response variable given the history of the past exogenous variables.

In recent years, various nonparametric methods have been studied to predict the future longitudinal response trajectory based on earlier predictors, including ranking-tracking probability approaches (Wu and Tian, 2013; Wu et al., 2020) and predictive longitudinal models (Kim et al., 2021a, 2021b). However, these methods are limited in the view of longitudinal study because prediction is performed based on predictors evaluated at a certain time in the past. In the literature on function data analysis, many studies have poured attention on predicting the mean response profiles based on repeated measures of functional predictors; see Ivanescu et al. (2015), Scheipl, Staicu, and Greven (2015), and Sun et al. (2018). A famous and classic functional linear regression for sparse longitudinal data proposed by Yao et al. (2005a) achieves a function-on-function or function-on-scalar estimation of the mean response trajectory given one functional predictor. It is extended to multiple functional predictors and nonlinear predictors via the functional additive model (James and Silverman 2005, Müller and Yao 2008, McLean et al. 2014, Fan et al. 2014, Kim et al. 2018). Even though they improve flexibility of regression to functional predictors, the complexity of estimation increases substantially, including estimation of smooth functions in the additive model, the choice of basis functions and knots, and regularization. The complexity may cause a convergence issue, especially in sparse and irregular longitudinal data (Scheipl et al. 2015). As an alternative, we propose a novel nonparametric approach with a kernel-weighted estimator.

We devote ourselves to the core idea of the kernel regression that prediction of the response variable over time is weighted to reflect the similarity of the history of the predictors. Due to the inherent difficulty of longitudinal data, where observations are often measured irregularly at different time points among subjects, measurement-wise distances between subjects cannot be employed directly. Instead, we adopt the L₂ metric based on individual trajectories of predictors estimated by the functional principal component analysis (Hall et al. 2008) to measure the similarity between histories of repeatedly measured predictors. Our methodology assumes that the more similar the predictors’ trajectories are to each other, the more likely their response trajectories travel in a similar fashion.

In order to deal with the curse of dimensionality in pure-nonparametric approaches caused by multiple predictors, we propose a multiplicative model with multivariate Gaussian kernel function. The logarithm transformation turns the multiplicative model into a linear model of the L₂ metric between individual response trajectories over |${\cal T}_2$| on the L₂ metrics between individual predictor trajectories over |${\cal T}_1$| based on elegant mathematical properties of Gaussian kernel function. This model not only circumvents the curse of dimensionality but also enables us to evaluate the association between predictor trajectories over |${\cal T}_1$| and response trajectory over |${\cal T}_2$|⁠. More specifically, the model investigates the similarity of which predictor’s trajectory leads to similar response trajectories. Furthermore, the asymptotic properties of the proposed estimator is studied, and a bootstrap-based simultaneous confidence band (Kim et al. 2021a) is adopted for statistical inferences. An accompanying R package longke (Wang et al. 2023) is developed for the proposed model.

2 METHODOLOGY

In this section, we discuss in detail technical challenges for the nonparametric kernel estimation of the conditional mean trajectory in Equation 1 due to sparse and irregular measurements over subjects in the longitudinal data and the curse of dimensionality. These issues are circumvented by modeling and predicting the L₂ metric between response trajectories over |${\cal T}_2$| based on the L₂ metrics between predictor trajectories over |${\cal T}_1$|⁠.

2.1 Kernel estimation and distance

The longitudinal random sample is denoted by [{X_i(t_ij), t_ij}, {Y_i(s_ik), s_ik}; i = |$1,\ldots ,$| n, j = 1|$,\ldots ,$| m_i, k = 1|$, \ldots ,$| n_i], where X_i(t) = {X_i1(t)|$, \ldots ,$| X_ip(t)} and Y_i(s) are a vector of predictors and the response variable of subject i measured at sparse time points |$t_{i1},\dots ,t_{im_i}\in {\cal T}_3 \subset {\cal T}$| and |$s_{i1},\dots ,s_{in_i}\in {\cal T}_4 \subset {\cal T}$|⁠, respectively. Here, the two time periods of interest |${\cal T}_1$| and |${\cal T}_2$| defined in Equation 1 are subsets of the time periods of the measurements in the longitudinal sample, that is, |${\cal T}_1\subset {\cal T}_3$| and |${\cal T}_2\subset {\cal T}_4$|⁠. The number of measurements m₁|$,\ldots ,$| m_n and n₁|$, \ldots ,$| n_n are independent and identically distributed positive integer-valued random variables with E(m_i) < ∞ and E(n_i) < ∞, respectively. A longitudinal sample from a typical setting of longitudinal studies, where the response and predictors are measured concurrently is a special case of the above longitudinal sample, where t_ij = s_ik for all i and j = k. For simplicity, we denote |${\bf X}_{ir}=\lbrace X_{ir}(t_{i1}),\dots ,X_{ir}(t_{im_i})\rbrace$|⁠, r = 1|$, \ldots ,$| p, and |${\bf Y}_{i}=(Y_{i1},\dots ,Y_{in_i})$| with Y_ik = Y_i(s_ik).

In order to predict the mean response trajectory, the existing work, including the functional data analysis, the rank tracking-based approaches, and the predictive longitudinal model approaches discussed in Section 1, focuses on identifying the relationship between Y(s) and X(t) for s ≠ t. It is quite challenging to impose a structural assumption on the relationship, especially in cases with sparse longitudinal data having multiple predictors repeatedly and irregularly measured over time. Thus, we rather devote our attention on the similarity among subjects in terms of their observed values of predictors over |${\cal T}_1$| and contrive a way to aggregate the information from the predictors.

Based on the assumption that measurement time points of the response variable |$s_{i1},\dots ,s_{in_i}$| are independent of predictors X_i1, |$\ldots,$| X_ip, a nonparametric kernel estimator of E{Y(s)∣x₁ |$, \ldots ,$| x_p} in (1) over |$s \in {\cal T}_2$| can be obtained as

$$\begin{eqnarray} &&\hat{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace\\ &&=\frac{\sum _{i=1}^nK_X\lbrace D_i({\bf x}_{1},\dots ,{\bf x}_{p})\rbrace n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }{\sum _{i=1}^nK_X\lbrace D_i({\bf x}_{1},\dots ,{\bf x}_{p})\rbrace n_i^{-1}\sum _{k=1}^{n_i}K_{ik}(s)}, \end{eqnarray}$$

(2)

where K_ik(s) = K{(s_ik − s)/b_s}, K(·) and K_X(·) are kernel functions and b_s is a bandwidth. D_i(x₁ x_p) is an overall distance between subject i’s predictors X_i1|$\, \ldots ,$| X_ip and a history of predictors of interest x₁|$,\ldots ,$| x_p in |${\cal T}_1$|⁠. The responses from subject i are more weighted if subject i’s observed predictors are more similar to x_{1|$, \ldots ,$|}x_p. In like manner, if its measurement time s_ik is closer to s, the kernel function K_ik(s) outweighs Y_ik in estimation of E{Y(s)∣x₁|$, \ldots ,$| x_p}.

As the first step of developing a method of measuring D_i(x₁|$, \ldots ,$| x_p), we adopt the L₂ metric between the individual trajectories of the rth predictor of subject i and subject i′ over |${\cal T}_1$|⁠:

$$\begin{eqnarray} &&D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})\\ &&=\sqrt{\int _{{\cal T}_1} \left[E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace - E\lbrace X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r}\rbrace \right]^2dt }. \end{eqnarray}$$

(3)

It resolves the issue from irregular measurements over subjects because the distance is computed based on two trajectories over the same time period instead of the actually measurements. |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| indicates the individual trajectories of the rth predictor of subject i and subject i′ over |${\cal T}_1$| are identical, and the larger |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})$| the more dissimilarity between the two trajectories. The overall distance D_i(x₁|$, \ldots ,$| x_p) in Equation 2 should be defined as an increasing function of |$D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_1),\dots ,D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_p)$|⁠. As a result, the kernel function K_X(·) puts more weight on subjects having a similar history with x₁, …, x_p.

Following the traditional multivariate kernel approach,

$$\begin{eqnarray} &&K_X\lbrace D_i({\bf x}_{1},\dots ,{\bf x}_{p})\rbrace\\ &&= K_1\lbrace D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})\rbrace \cdots K_p\lbrace D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})\rbrace , \end{eqnarray}$$

(4)

a naive pure nonparametric estimator of Equation 1 for |$s \in {\cal T}_2$| is

$$\begin{eqnarray} &&\tilde{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace \cr &&=\frac{\sum _{i=1}^nK_1\lbrace D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})\rbrace \cdots K_p\lbrace D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})\rbrace n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }{\sum _{i=1}^nK_1\lbrace D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})\rbrace \cdots K_p\lbrace D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})\rbrace n_i^{-1}\sum _{k=1}^{n_i}K_{ik}(s) } , \end{eqnarray}$$

(5)

where for r = 1, |$\ldots,$| p,

$$\begin{eqnarray} K_r\lbrace D_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r})\rbrace &=&\frac{1}{b_r\sqrt{2\pi }}\exp \left\lbrace \frac{-D_{{\cal T}_1}^2({\bf X}_{ir},{\bf x}_{r})}{2b_r^2}\right\rbrace \cr &=&\frac{1}{b_r\sqrt{2\pi }}\exp \left(\frac{-\int _{{\cal T}_1} \left[E\lbrace X_{ir}(t)|{\bf X}_{ir}\rbrace - E\lbrace X_{r}(t)|{\bf x}_{r}\rbrace \right]^2dt }{2b_r^2}\right). \end{eqnarray}$$

(6)

Here, a Gaussian kernel function is a natural choice based on the fact that the L₂ metric is employed, and b₁, …, b_p are bandwidths.

In particular, |$\tilde{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace$| is an estimator of |$E\lbrace Y(s)\mid D_{{\cal T}_1}({\bf X}_{1},{\bf x}_{1})=\cdots =D_{{\cal T}_1}({\bf X}_{p},{\bf x}_{p})=0\rbrace = E[Y(s)\mid E\lbrace X_1(t)\mid {\bf x}_{1}\rbrace ,\dots ,E\lbrace X_p(t)\mid {\bf x}_{p}\rbrace ]$|⁠. The L₂ metric in Equation 3 used to measure the distance between two individual trajectories is a semi-metric on the space consisting of random vectors {X_ir} because |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| implies |$E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace = E\lbrace X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r}\rbrace$| for all |$t\in {\cal T}_1$|⁠, but it is not a sufficient condition to imply |${\bf X}_{ir}={\bf X}_{i^{\prime }r}$|⁠. In other words, even though observed predictors from two subjects are different, if their predictor trajectories are identical, both subjects would have the same mean response trajectory in |${\cal T}_2$|⁠. However, for i ≠ i′, |$P[E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace = E\lbrace X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r}\rbrace ]=0 \mbox{ for all } t\in {\cal T}_1$|⁠. In addition, the trajectory is of interest rather than observations themselves, especially in longitudinal data analysis with irregular measurement times across subjects. Thus, the proposed approach based on the distance between trajectories is reasonable and practical. An alternative way to measure a distance for sparse and irregular data is proposed in Peng and Müller (2008).

Remark 1

To avoid the complexity of notation, we present that the predictors from the same subject in the longitudinal sample are measured concurrently at the beginning of this section. However, it is not necessary since the trajectory of each predictor is estimated separately. The proposed approach can be applicable to an even more general longitudinal sample allowing different measurement times for predictors, that is, denoted by {|$(X_{ ir}(t_{ irj}),\, t_{ irj}), (Y_{ i}(s_{ ik}),s_{ ik}); i=1,\, \dots, n, r=1,\, \dots, p,j=1 \,\dots, m_{ ir}, k=1,\, \dots, n_{ i}$|}. In the same fashion, repeatedly observed predictors of interest |$\chi_{ 1},\, \ldots, \, \chi_{ p}$| are not necessary to be measured at the same time points. However, we use the simpler notation at the beginning of this section for ease of presentation.

2.2 Multiplicative model of D_i(x₁, …, x_p) with its proxy variable

Even though the repeated measurements of predictors are aggregated with the Euclidean distance in Equation 3, the estimator |$\tilde{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace$| in Equation 5 still suffers from the curse of dimensionality when p ≥ 2, considering that the total number of kernel functions is p + 1, including K_ik(s). To overcome the curse of dimensionality, we identify a proxy variable of D_i(x₁|$, \ldots ,$| x_p) and predict the proxy by a novel multiplicative model based on |$D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1}),\dots ,D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})$|⁠. When the predictor trajectories over |${\cal T}_1$| have predictive significance in the mean response trajectory over |${\cal T}_2$|⁠, it is natural that subjects with comparable history of predictors over |${\cal T}_1$| are likely to have similar response trajectories over |${\cal T}_2$|⁠. Based on this supposition, we propose the following multiplicative model motivated by Equation 4:

$$\begin{eqnarray} &&K_Y\lbrace D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})\rbrace\\ && =c K_1\lbrace D_{{\cal T}_1}({\bf X}_{i1},{\bf X}_{i^{\prime }1})\rbrace \cdots K_p\lbrace D_{{\cal T}_1}({\bf X}_{ip},{\bf X}_{i^{\prime }p})\rbrace e^{\epsilon _{ii^{\prime }}}, \end{eqnarray}$$

where K_Y(·) is a Gaussian kernel with bandwidth b_Y, |$\epsilon _{ii^{\prime }}$| are independent and identically distributed random errors with mean 0 and a finite variance, c is a constant, and

$$\begin{eqnarray} D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})=\sqrt{\int _{{\cal T}_2} \left[E\lbrace Y_i(s)|{\bf Y}_{i}\rbrace - E\lbrace Y_{i^{\prime }}(s)|{\bf Y}_{i^{\prime } }\rbrace \right]^2ds}. \end{eqnarray}$$

The logarithm transformation leads to

$$\begin{eqnarray} &&\log K_Y\lbrace D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})\rbrace \\ &&= \log c +\log K_1\lbrace D_{{\cal T}_1}({\bf X}_{i1},{\bf X}_{i^{\prime }1})\rbrace\\ && +\cdots +\log K_p\lbrace D_{{\cal T}_1}({\bf X}_{ip},{\bf X}_{i^{\prime }p})\rbrace +\epsilon _{ii^{\prime }}. \end{eqnarray}$$

It follows from the Guassian kernels as in Equation 6

$$\begin{eqnarray} \log \frac{1}{b_Y}&-&\frac{D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})}{2b_Y^2} =\log c+ \log \frac{1}{b_1}-\frac{D_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1})}{2b_1^2}\\ &+&\cdots +\log \frac{1}{b_p}-\frac{D_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})}{2b_p^2}+\epsilon _{ii^{\prime }}. \end{eqnarray}$$

It can be reparameterized as

$$\begin{eqnarray} D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})&=& \alpha _0+\alpha _1D_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1})\\ &&+\cdots +\alpha _pD_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p}) + \varepsilon _{ii^{\prime }}, \end{eqnarray}$$

(7)

where α₀, α₁|$, \ldots ,$| α_p are non-negative coefficients. |$D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| is inclined to increase along with the predictor distances |$D_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1}),\dots ,D_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})$|⁠. In particular, it is likely that contribution from each predictor distance to |$D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| varies across predictors. The coefficients |$\alpha _r=b_Y^2/b_r^2$|⁠, r = 1|$, \ldots ,$| p become smaller as bandwidth b_r gets larger. The larger bandwidth of the corresponding predictor indicates the lower contribution on prediction of the response variable. The model in Equation 7 not only aggregates the distances from multiple predictors successfully resolving the curse of dimensionality, but also selectively chooses predictors that have a significant impact on prediction of the mean response trajectory.

In order to estimate the coefficients α₀, α₁|$, \ldots ,$| α_p, we first estimate the distances of individual predictor trajectories over |${\cal T}_1$| and the distance of individual response trajectories over |${\cal T}_2$| between all subjects in the longitudinal data. See Section 3.1 for implementation. Then, the data structure is as follows:

Open in new tab

|$\hat{{\bf D}}_{Y}^2$|

|$\hat{{\bf D}}_{X_1}^2$|

…

|$\hat{{\bf D}}_{X_p}^2$|

|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{2})$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{21})$|

|$\ldots$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{2p})$|

|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{3})$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{31})$|

|$\ldots$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{3p})$|

|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{4})$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{41})$|

|$\ldots$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{4p})$|

⋮

|$\ddots$|

⋮

|$\hat{D}_{{\cal T}_2}^2({\bf Y}_{n-1},{\bf Y}_{n})$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{n-1,1},{\bf X}_{n1})$|

|$\ldots$|

|$\hat{D}_{{\cal T}_1}^2({\bf X}_{n-1,p},{\bf X}_{np})$|

The coefficients α₀, α₁|$, \ldots ,$| α_p can be estimated by non-negative least squares regression of |$\hat{{\bf D}}_{Y}^2$| on |$\hat{{\bf D}}_{X_1}^2,\dots ,\hat{{\bf D}}_{X_p}^2$| because |$\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1}),\dots ,\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})$|⁠, and |$\hat{D}_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| are non-negative.

Recall that x₁|$, \ldots ,$| x_p are histories of observed predictors of interest over |${\cal T}_1$|⁠, so that for each i, |$D_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1}),\dots , D_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})$| can be estimated. Then, |$D_{{\cal T}_2}({\bf Y}_i,\dot{{\bf y}})$|⁠, the distance between the individual response trajectories of subject i and unobserved response of interest, |$\dot{{\bf y}}$|⁠, over |${\cal T}_2$|⁠, can be predicted as follows:

$$\begin{eqnarray} \tilde{D}_{{\cal T}_2}^2({\bf Y}_i,\dot{{\bf y}})= \hat{\alpha }_0+\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p}), \end{eqnarray}$$

where |$\hat{\alpha }_0$| is the estimated mean of |$\tilde{D}_{{\cal T}_2}^2({\bf Y}_i,\dot{{\bf y}})$| when |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})=\cdots =\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})=0$|⁠. We assure that |$\tilde{D}_{{\cal T}_2}^2({\bf Y}_i,\dot{{\bf y}})-\hat{\alpha }_0=\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})$| is a reasonable proxy of |$D_i^2({\bf x}_1,\dots ,{\bf x}_p)$| as well as an aggregating measure of the distance taking into account multiple predictor trajectories since it is a nonnegative increasing function of |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1}),\dots ,\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})$| and it is equal to 0 when |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})=\cdots =\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})=0$|⁠.

2.3 Proposed estimator and asymptotic results

We propose a nonparametric kernel estimator of E{Y(s)∣x₁|$, \ldots ,$| x_p} replacing D_i(x₁|$, \ldots ,$| x_p) in Equation 2 by its estimated proxy in Section 2.2

$$\begin{eqnarray} \hat{\mu }(s,0)=\frac{\sum _{i=1}^n K_{w}( \hat{w}_i)n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }{\sum _{i=1}^n K_{w}(\hat{w}_i)n_i^{-1}\sum _{k=1}^{n_i}K_{ik}(s)}, \quad s \in {\cal T}_2, \\ \end{eqnarray}$$

(8)

where K_w(·) is a kernel with bandwidth b_w and

$$\begin{eqnarray} \hat{w}_i&=&\hat{w}_i({\bf x}_1,\dots ,{\bf x}_p)\\ &=&e_i\sqrt{ \hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})}, \end{eqnarray}$$

e_i is a discrete random variable with P(e_i = 1) = P(e_i = −1) = 0.5 and consequently, |$\hat{w}_i^2= \hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{p})$|⁠. The random variable e_i plays an important role of dealing with the positive support of the proxy of |$D_i^2({\bf x}_1,\dots ,{\bf x}_p)$| by allowing the density function of w_i to be defined in the neighborhood of 0. As alternative to using e_i, asymmetric kernel functions can be used to handle the positive support (Ferraty and Vieu 2006). Let |$w=w({\bf x}_1,\dots ,{\bf x}_p)=e\sqrt{\alpha _1D_{{\cal T}_1}^2({\bf X}_{1},{\bf x}_{1})+\cdots +\alpha _p D_{{\cal T}_1}^2({\bf X}_{p},{\bf x}_{p})}$|⁠. We let μ(s, 0) be the mean response profiles over |${\cal T}_2$| given a set of predictor trajectories measured over |${\cal T}_1$|⁠, i.e.

$$\begin{eqnarray} \mu (s,0)&=&E\lbrace Y(s)\mid w=0\rbrace =E\lbrace Y(s)\mid D_{{\cal T}_1} ({\bf X}_{1},{\bf x}_{1})\\ &=&\cdots =D_{{\cal T}_1}({\bf X}_{p},{\bf x}_{p})=0\rbrace \\ &=& E[Y(s)\mid E\lbrace X_1(t)\mid {\bf x}_{1}\rbrace ,\dots ,E\lbrace X_p(t)\mid {\bf x}_{p}\rbrace ],\\ && s\in {\cal T}_2. \end{eqnarray}$$

The following theorem states the asymptotic results of the resultant estimator |$\hat{\mu }(s,0)$|⁠.

Theorem 1

Suppose the model in Equation 7 is correctly specified and let s₀ be an interior point in |${\cal T}_2$|⁠. Under Assumptions (1)–(5) in the Appendix, as nb_sb_w → ∞, and |$\sup _n nb_sb_w(b_s^4+b_w^4)\lt \infty$|⁠, we have

$$\begin{eqnarray} &&\sqrt{(nb_sb_w)}\left\lbrace \hat{\mu }(s_0,0) -\mu (s_0,0)-\frac{\rho (s_0,0)}{f_s(s_0)f_w(0)}\right\rbrace \stackrel{d}{\rightarrow }\\ &&N\left\lbrace 0,\frac{\tau \sigma ^2(s_0)\psi _K^2}{f_s(s_0)f_w(0)}\right\rbrace , \end{eqnarray}$$

where |$f_{ s}(\cdot)$| and |$f_{ w}(\cdot)$| are the density functions of |$s_{ ik}$| and w_i, respectively, σ²(s) is the variance of |$Y(S)$|⁠, |$\psi_{ k}$| |$= \int \!\!K^{ 2}(v)dv,$||$\rho (s,0)=b_s^2f_w(0)\left\lbrace \mu ^{\prime }_s(s,0)f^{\prime }_s(s)+\mu ^{\prime \prime }_w(s,0)f_s(s)/2\right\rbrace \phi _K+ b_w^2f_s(s)\left\lbrace \mu ^{\prime }_w(s,0)f^{\prime }_w(0)+\mu ^{\prime \prime }_s(s,0)f_w(0)/2\right\rbrace \phi _K,$| and τ = E(1/n₁). Here, ϕ_K = ∫v²K(v)dv, |$\mu ^{\prime }_s(s_0,0)=\partial \mu (s_0,0)/\partial s$|⁠, |$\mu ^{\prime \prime }_s(s_0,0)=\partial ^2 \mu (s_0,0)/\partial s^2$|⁠, |$\mu ^{\prime }_w(s_0,0)=\partial \mu (s_0,0)/\partial w$| and |$\mu ^{\prime \prime }_w(s_0,0)=\partial ^2 \mu (s_0,0)/\partial w^2$|⁠.

Theorem 1 shows that the proposed estimator is a consistent estimator of μ(s, 0) and states its asymptotic distribution. The bias and limiting variance contains many terms that cannot be easily estimated. Especially, because 0 is not an interior point of w, whose range is [0, ∞), estimation of f_w(0) is challenging. For statistical inference about μ(s, 0), we suggest adopting a bootstrap approach to construct a simultaneous confidence band (SCB). Kim et al. (2021a) propose a bootstrap SCB that meets the nominal coverage probability and, in other words, contains |$(1-\alpha )100\%$| of bootstrap trajectory estimates over |${\cal T}_2$|⁠. The bootstrap method is implemented here by sampling |$\lbrace ({\bf Y}_i,\hat{w}_i)\rbrace _i$| with replacement.

Remark 2

When the longitudinal data are regularly measured at the same time from all the subjects, |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})$| in Equation 3 can be redefined measurement-wisely as

$$\begin{eqnarray} D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=\sqrt{\sum _{j=1}^m\lbrace X_{ir}(t_{j})-X_{i^{\prime }r}(t_{j})\rbrace ^2}, \end{eqnarray}$$

and it is a metric. As a result, |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| implies |${\bf X}_{ir}={\bf X}_{i^{\prime }r}$|⁠, and |$\hat{\mu }(s,0)$| is a consistent estimator of E{Y(s)∣x₁|$, \ldots ,$| x_p}.

Remark 3

When forecasting the future mean response trajectory is of main interest (i.e. for any |$s\in {\cal T}_2$|⁠, t_m < s), the observed response in |${\cal T}_1$| is often employed as one of the predictors. A simple and special case is to predict the future mean response trajectory based only on its own past observations

$$\begin{eqnarray} E[g\lbrace Y(s)\rbrace \mid y(t_{1}),\dots ,y(t_{m})], \quad s \in {\cal T}_2. \end{eqnarray}$$

In this case, the use of proxy variable in Equation 7 is not necessary because the curse of dimensionality is not seriously problematic anymore based on the fact that the response in the past is the only predictor. The future mean trajectory |$E\lbrace Y(s)\mid D_{{\cal T}_1}({\bf Y}_i,{\bf y})=0\rbrace =E[Y(s)\mid E\lbrace Y(t)\mid {\bf y}\rbrace ]$|⁠, where y = {y(t₁)|$, \ldots ,$| y(t_m)} is observed in |${\cal T}_1$|⁠, can be readily estimated by

$$\begin{eqnarray} &&\hat{E}[Y(s)\mid E\lbrace Y(t)\mid {\bf y}\rbrace ]\\ &&=\frac{\sum _{i=1}^nK_Y\lbrace \hat{D}_{{\cal T}_1}({\bf Y}_i,{\bf y})\rbrace n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }{\sum _{i=1}^nK_Y\lbrace \hat{D}_{{\cal T}_1}({\bf Y}_i,{\bf y})\rbrace n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }, \quad s \in {\cal T}_2. \end{eqnarray}$$

3 IMPLEMENTATION

3.1 Estimation of individual trajectories and distance

Estimation of individual trajectories has been actively studied during recent decades (Yao 2007, Di et al. 2009, Leroux et al. 2018). We employ the nonparametric functional principle component analysis (FPCA) approach to estimate individual trajectories of predictors and response variables (Hall et al. 2008). Under an assumption that the individual trajectory follows a latent Gaussian process model, the Karhunen–Loève expansion is employed

$$\begin{eqnarray} X_{ir}(t_{ij})=\mu _r(t_{ij})+\sum _{k=1}^\infty \zeta _{ir k}\eta _{r k}(t_{ij})+\varepsilon _{ijr}, \end{eqnarray}$$

where η_rk(t_ij) are orthonormal eigenfunctions, scores ζ_irk serve as random effects, and ε_ijr is an independent and identically distributed random error. As a result, the individual trajectory is

$$\begin{eqnarray} E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace =\mu _r(t)+\sum _{k=1}^\infty E(\zeta _{ir k}\mid {\bf X}_{ir})\eta _{r k}(t). \end{eqnarray}$$

This approach is flexible and developed for sparse and irregular longitudinal data. The estimated individual trajectories of predictors and response variables, that is,

$$\begin{eqnarray} &&\hat{E}\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace ,~r=\times1,\dots ,p,~ t \in {\cal T}_1,\\ && \mbox{and}~ \hat{E}\lbrace Y_{i}(s)\mid {\bf Y}_{i}\rbrace ,~s \in {\cal T}_2, \end{eqnarray}$$

(9)

are obtained for all subjects (Yao et al. 2005a) in the statistical software R using the FPCA function from the package fdapace (Zhou et al. 2022). Let v₁|$, \ldots ,$| v_L and u₁|$, \ldots ,$| u_H be evenly spaced time points on |${\cal T}_1$| and |${\cal T}_2$|⁠, respectively. The L₂ metrics between individual trajectories can be estimated as for any i < i′

$$\begin{eqnarray} \hat{D}_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})&=& \sqrt{L^{-1}\sum _{l=1}^{L} [\hat{E}\lbrace X_{ir}(v_l)|{\bf X}_{ir}\rbrace -\hat{E}\lbrace X_{i^{\prime }r}(v_l)|{\bf X}_{i^{\prime }r}\rbrace ]^2},\\ && r=1,\dots ,p\\ \hat{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})&=& \sqrt{H^{-1}\sum _{h=1}^{H} [\hat{E}\lbrace Y_{i}(u_h)|{\bf Y}_{i}\rbrace -\hat{E}\lbrace Y_{i^{\prime }}(u_h)|{\bf Y}_{i^{\prime }}\rbrace ]^2}. \end{eqnarray}$$

3.2 Estimation of tuning parameters

We first obtain the estimates of the coefficients in Equation 7, |$\hat{\alpha }_0,\hat{\alpha }_1,\dots ,\hat{\alpha }_p$|⁠, by non-negative least squares regression of |$\hat{{\bf D}}_{Y}^2$| on |$\hat{{\bf D}}_{X_1}^2,\dots ,\hat{{\bf D}}_{X_p}^2$| defined in Section 2.2. To select the bandwidths b_w and b_s in Equation 8, we adopt the leave-one-subject-out cross-validation approach that minimizes the prediction error of Y_ik. For each subject i, we obtain the predicted L₂ metric |$D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})$| denoted by |$\tilde{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})$| using the estimated coefficients

$$\begin{eqnarray} \tilde{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})&=&\hat{\alpha }_0+\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1})\\ &&+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p}), \quad i\ne i^{\prime }. \end{eqnarray}$$

Then, we compute |$\hat{w}_{ii^{\prime }}=e_{ii^{\prime }}\sqrt{\tilde{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})-\hat{\alpha }_0}$|⁠, where |$e_{ii^{\prime }}$| is a discrete random variable with |$P(e_{ii^{\prime }}=1)=P(e_{ii^{\prime }}=-1)=0.5$|⁠. Then, using all subjects except for the ith subject, we have a prediction of Y_i(s_ik), |$s_{ik}\in {\cal T}_2$| as follows

$$\begin{eqnarray} &&\hat{E}^{(-i)}\lbrace Y_{i}(s_{ik})\mid {\bf X}_{i1},\dots ,{\bf X}_{ip}\rbrace \\ &&=\frac{\sum _{i^{\prime }=1}^n I(i^{\prime }\ne i)K_{w}(\hat{w}_{ii^{\prime }})n_{i^{\prime }}^{-1}\sum _{k^{\prime }=1}^{n_{i^{\prime }}}Y_{i^{\prime }k^{\prime }}K_{i^{\prime }k^{\prime }}(s_{ik}) }{\sum _{i^{\prime }=1}^nI(i^{\prime }\ne i)K_{w}(\hat{w}_{ii^{\prime }})n_{i^{\prime }}^{-1}\sum _{k^{\prime }=1}^{n_{i^{\prime }}}K_{i^{\prime }k^{\prime }}(s_{ik}) }. \end{eqnarray}$$

The optimal bandwidths are selected by minimizing

$$\begin{eqnarray} (b_w^{*},b_s^{*})&=&\mathop {\mbox{argmin}}_{b_w,b_s}\sum _{i=1}^n\sum _{k=1}^{n_i}I(s_{ik}\in {\cal T}_2)\\ &&\times\, [Y_{ik}-\hat{E}^{(-i)}\lbrace Y_{i}(s_{ik})\mid {\bf X}_{i1},\dots ,{\bf X}_{ip}\rbrace ]^2. \end{eqnarray}$$

3.3 Estimation of mean response trajectory

Recall that x_r = {x_r(t₁)|$, \ldots ,$| x_r(t_m)}, for r = 1|$, \ldots ,$| p, are vectors of the observer predictors of interest at times t₁|$, \ldots ,$| t_m. Since all the tuning parameters α₀, α₁|$, \ldots ,$| α_p, b_Y and b_s are estimated in Section 3.2, the mean response trajectory μ(s, 0), |$s \in {\cal T}_2$|⁠, can be readily predicted through the following steps:

Step 1: Estimate individual trajectories E{X_r(t)∣x_r}, r = 1|$, \ldots ,$| p, over |$t\in {\cal T}_1=[t_1,t_m]$| with the fitted model in Equation 9.

Step 2: Compute the distances from individual predictor trajectories based on x₁|$, \ldots ,$| x_p to them of subjects in the longitudinal data:

$$\begin{eqnarray} \hat{D}_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r}) &=& \sqrt{L^{-1}\sum _{l=1}^{L} [\hat{E}\lbrace X_{ir}(v_l)|{\bf X}_{ir}\rbrace -\hat{E}\lbrace X_{r}(v_l)|{\bf x}_{r})\rbrace ]^2},\\ && r=1,\dots ,p. \end{eqnarray}$$

Step 3: Compute

|$\hat{w}_i=e_i\sqrt{\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})}$|⁠.

Step 4: μ(s, 0) can be predicted with |$b_s^{*}$| and |$b_w^{*}$| by the proposed approach

$$\begin{eqnarray} \hat{\mu }(s,0)=\frac{\sum _{i=1}^nK_{w}( \hat{w}_i)n_i^{-1}\sum _{k=1}^{n_i}Y_{ik}K_{ik}(s) }{\sum _{i=1}^nK_{w}(\hat{w}_i)n_i^{-1}\sum _{k=1}^{n_i}K_{ik}(s)}, \quad s \in {\cal T}_2. \end{eqnarray}$$

4 REAL DATA ANALYSIS

4.1 Data description and objective of analysis

The Framingham Heart Study (FHS) is one of the most well-known longitudinal investigations that began in 1948. The main goal of the study is to identify the risk factors of cardiovascular diseases, including obesity, hypertension, and myocardial infarction, and to estimate corresponding growth curves over time. The FHS consists of 5209 participants of Framingham cohort and 5124 of their second generation cohort with spouses who took physical examinations every 3 or 4 years, but the participants may miss measurements due to various reasons, and therefore the measurements are sparse and irregular (O’Donnell and Elosua 2008).

In our analysis, we demonstrate the functionality of our proposed method by evaluating the prediction significance of body mass index (BMI), high-density lipoprotein (HDL), systolic blood pressure (SBP), diastolic blood pressure (DBP), and total cholesterol (TOT) over |${\cal T}_1=[25,40]$| in forecasting the BMI trajectories over |${\cal T}_2=[41,60]$|⁠. We chose this specific time period so that participants who just turned middle-aged from young adulthood can adjust their dietary structures or receive timely medication care if overweight and obesity were predicted. Since the five predictors have right-skewed distributions, logarithm transformation is applied. They are denoted by LBMI, LHDL, LSBP, LDBP, and LTOT, respectively. To obtain a robust interpolation when estimating individual trajectories by sparse FPCA, the cohort data are pre-processed to let each subject have at least 7 measurements over |${\cal T}=[25,60]$|⁠, so that the domain plane is densely populated with the design points as shown in Figure 1 (Yao et al. 2005b).

$Design plot of 541 subjects in Framingham cohort data for sparse FPCA over ${\cal T} = [25,60]$. The plot contains assembled pairs (Ageij, Ageik), i = 1, …, n, j, k = 1, …, mi, where mi is the number of measurements from the ith subject.$

FIGURE 1

Design plot of 541 subjects in Framingham cohort data for sparse FPCA over |${\cal T} = [25,60]$|⁠. The plot contains assembled pairs (Age_ij, Age_ik), i = 1, …, n, j, k = 1, …, m_i, where m_i is the number of measurements from the ith subject.

Open in new tab Download slide

We compare our method with three functional regression models: the function-on-function linear regression (FLR) model

$$\begin{eqnarray} Y(s)&=&\beta _0(s)+\sum _{i=1}^{p-1}\int _{{\cal T}_1}X_i(t)\beta _i(t,s)dt\\ && + \, \int _{{\cal T}_1}Y(t)\beta _p(t,s)dt+\epsilon (s), \end{eqnarray}$$

the function-on-function additive regression (FAR) model in Scheipl et al. (2015)

$$\begin{eqnarray} Y(s)&=&\beta _0(s)+\sum _{i=1}^{p-1}\int _{{\cal T}_1}\beta _i\lbrace t,s,X_i(t)\rbrace dt\\ && + \, \int _{{\cal T}_1}\beta _p\lbrace t,s,Y(t)\rbrace dt+\epsilon (s), \end{eqnarray}$$

and a dynamic prediction in Leroux et al. (2018) based on the functional concurrent regression (FCR) model

$$\begin{eqnarray} Y(t)=\beta _0(t)+\sum _{i=1}^{p-1}\beta _i(t)X_i(t)+b(t)+\epsilon (t). \end{eqnarray}$$

Leroux et al. (2018) enables to predict subject-specific future trajectories using the mixed effect representation based on their past data. We remark while our proposed method, FLR and FAR models are marginal models, the dynamic FCR model is a mixed effects model, that takes into account the random effects. Furthermore, our proposed method, FLR and FAR models can employ the response variable measured in |${\cal T}_1$| directly as a predictor. In contrast, the dynamic FCR model adopts the response information in |${\cal T}_1$| via the functional random effect b(t). The FLR and FAR are implemented using refund package (Goldsmith et al. 2023) in R and the FCR model is implemented using fcr package (Leroux et al. 2022) in R.

4.2 Results

The point estimates by non-negative least squares and their 95|$\%$| bootstrap confidence intervals for the coefficients in Equation 7 are shown in Table 1. The 95|$\%$| bootstrap confidence interval is obtained based on 1000 bootstrap random samples from the distance data shown in Section 2.2. We demonstrate the performance of the proposed bootstrap confidence interval via simulation studies and the results are reported in supplementary materials. It is expected that a lower HDL and higher SBP, BMI, and TOT in young adulthood result in a higher risk of obesity in the middle-aged.

TABLE 1

Open in new tab

Coefficient estimation for the prediction of future BMI trajectory.

Predictor distance	Parameter	Estimate	\|$95\%$\| Bootstrap CI
Intercept	α₀	0.0199	(0.0195, 0.0203)
\|$\hat{{\bf D}}_{LSBP}^2$\|	α₁	0.0445	(0.0371, 0.0521)
\|$\hat{{\bf D}}_{LHDL}^2$\|	α₂	0.0011	(0.0003, 0.0021)
\|$\hat{{\bf D}}_{LBMI}^2$\|	α₃	0.8630	(0.8568, 0.8693)
\|$\hat{{\bf D}}_{LDBP}^2$\|	α₄	0.0000	(0.0000, 0.0000)
\|$\hat{{\bf D}}_{LTOT}^2$\|	α₅	0.0065	(0.0044, 0.0088)

Predictor distance	Parameter	Estimate	\|$95\%$\| Bootstrap CI
Intercept	α₀	0.0199	(0.0195, 0.0203)
\|$\hat{{\bf D}}_{LSBP}^2$\|	α₁	0.0445	(0.0371, 0.0521)
\|$\hat{{\bf D}}_{LHDL}^2$\|	α₂	0.0011	(0.0003, 0.0021)
\|$\hat{{\bf D}}_{LBMI}^2$\|	α₃	0.8630	(0.8568, 0.8693)
\|$\hat{{\bf D}}_{LDBP}^2$\|	α₄	0.0000	(0.0000, 0.0000)
\|$\hat{{\bf D}}_{LTOT}^2$\|	α₅	0.0065	(0.0044, 0.0088)

TABLE 1

Open in new tab

Coefficient estimation for the prediction of future BMI trajectory.

Predictor distance	Parameter	Estimate	\|$95\%$\| Bootstrap CI
Intercept	α₀	0.0199	(0.0195, 0.0203)
\|$\hat{{\bf D}}_{LSBP}^2$\|	α₁	0.0445	(0.0371, 0.0521)
\|$\hat{{\bf D}}_{LHDL}^2$\|	α₂	0.0011	(0.0003, 0.0021)
\|$\hat{{\bf D}}_{LBMI}^2$\|	α₃	0.8630	(0.8568, 0.8693)
\|$\hat{{\bf D}}_{LDBP}^2$\|	α₄	0.0000	(0.0000, 0.0000)
\|$\hat{{\bf D}}_{LTOT}^2$\|	α₅	0.0065	(0.0044, 0.0088)

Predictor distance	Parameter	Estimate	\|$95\%$\| Bootstrap CI
Intercept	α₀	0.0199	(0.0195, 0.0203)
\|$\hat{{\bf D}}_{LSBP}^2$\|	α₁	0.0445	(0.0371, 0.0521)
\|$\hat{{\bf D}}_{LHDL}^2$\|	α₂	0.0011	(0.0003, 0.0021)
\|$\hat{{\bf D}}_{LBMI}^2$\|	α₃	0.8630	(0.8568, 0.8693)
\|$\hat{{\bf D}}_{LDBP}^2$\|	α₄	0.0000	(0.0000, 0.0000)
\|$\hat{{\bf D}}_{LTOT}^2$\|	α₅	0.0065	(0.0044, 0.0088)

Based on the results in Table 1, we can conclude that the trajectories of SBP, BMI, and TOT in |${\cal T}_1$| are highly significant predictors for the trajectories of BMI in |${\cal T}_2$|⁠. As expected, BMI in the past has the strongest association with the BMI in the future. Even though TOT effect is significant, its effect is minimal compared to the effects of BMI and SBP. Thus, it is shown that the future BMI trajectory is predicted mainly by the past trajectories of BMI and SBP. In contrast, the CI for the parameter regarding HDL is barely >0 and the estimated coefficient for DBP is effectively 0. This can be explained by the fact that (i) HDL is referred to as good cholesterol and hence not considered as a risk factor for increase in BMI, and (ii) SBP and DBP are usually correlated but SBP serves as a more significant indicator of obesity-related conditions, and hence the effect of DBP is ruled out by our model. We refer to Nielsen et al. (1997), He and Whelton (1999), and Mourad (2008) for more detailed discussion.

The predicted future BMI trajectories of 4 patients over |${\cal T}_2=[41,60]$|⁠, computed by the proposed kernel estimator (KE), FCR, FLR, and FAR models are displayed on Figure 2. These 4 patients are selected from a testing set to visually compare the model performance. Because all 4 estimation approaches can decently capture the overall trend of BMI over |${\cal T}_2$|⁠, we further conduct 5 repeated 10-fold cross validations to quantify their performance via the average and standard deviation of mean squared prediction errors (AMSPE and SD, respectively) from 50 splits in total. The results are reported in Table 2. The four methods are quite comparable, but FCR performs best in terms of AMSPE. The proposed method performs prediction more accurately (with a smaller AMSPE) than FLR and FAR, but less precisely (with a larger SD) compared to its competitors.

$Predicted BMI trajectories over ${\cal T}_2=[41,60]$ by the proposed approach (solid), FCR (dotted and dashed), FLR (dashed), and FAR (dotted).$

FIGURE 2

Predicted BMI trajectories over |${\cal T}_2=[41,60]$| by the proposed approach (solid), FCR (dotted and dashed), FLR (dashed), and FAR (dotted).

Open in new tab Download slide

TABLE 2

Open in new tab

Average and standard deviation (SD) of MSPE from 5 repeated 10-fold cross validations (×100) in the future BMI prediction.

	Full data				Reduced data
	KE	FCR	FLR	FAR	KE	FCR	FLR	FAR
AMSPE	1.057	0.985	1.086	1.094	1.043	0.981	1.072	1.084
SD	0.232	0.227	0.200	0.212	0.162	0.197	0.201	0.194

	Full data				Reduced data
	KE	FCR	FLR	FAR	KE	FCR	FLR	FAR
AMSPE	1.057	0.985	1.086	1.094	1.043	0.981	1.072	1.084
SD	0.232	0.227	0.200	0.212	0.162	0.197	0.201	0.194

TABLE 2

Open in new tab

Average and standard deviation (SD) of MSPE from 5 repeated 10-fold cross validations (×100) in the future BMI prediction.

	Full data				Reduced data
	KE	FCR	FLR	FAR	KE	FCR	FLR	FAR
AMSPE	1.057	0.985	1.086	1.094	1.043	0.981	1.072	1.084
SD	0.232	0.227	0.200	0.212	0.162	0.197	0.201	0.194

	Full data				Reduced data
	KE	FCR	FLR	FAR	KE	FCR	FLR	FAR
AMSPE	1.057	0.985	1.086	1.094	1.043	0.981	1.072	1.084
SD	0.232	0.227	0.200	0.212	0.162	0.197	0.201	0.194

The kernel-based prediction is vulnerable to unusual predictor values because it predicts the response based on similarity of predictors without imposing a structural assumption on the relationship between the response and predictors. It is found that there are a few subjects having extremely large BMI values in |${\cal T}_1$| and their poor prediction leads to relatively large standard deviation. After removing subjects with LBMI>4 (BMI>54.6) in |${\cal T}_1$|⁠, we perform 5 repeated 10-fold cross-validations again, and the results are also reported in Table 2. This result shows that our proposed method may outperform its competitors in the prediction for future BMI trajectories of subjects with more ordinary past biometric trajectories, but an extra caution is required if a subject’s past biometric information is extraordinary. The fact that FCR performs very well with this real data enables us to conjecture that the relationship between the response BMI and the predictors are quite linear concurrently based on the simulation studies in the next section.

We provide more comprehensive figures for the four subjects in Figure 2. Figures 3 and 4 show observed values of SBP, DBP, TOT, HDL, and BMI, as well as their estimated trajectories in |${\cal T}_1=[25,40]$|⁠. They also show observed values of BMI, its predicted trajectories, and a 95% simultaneous confidence band computed by the proposed method in |${\cal T}_2=[41,60]$|⁠. In Figure 3, it is observed that the subjects have similar SBP, TOT, and BMI trajectories and this results in similar future BMI trajectories. However, the sudden increase in BMI of Subject 48 in |${\cal T}_2$| is hard to predict based on the patients’ past biometric information, so none of the four methods considered are successfully to predict it in Figure 2. We remark that the trajectories are the conditional mean of the future response and the band is a confidence band. Thus, there might be some discrepancy between real observations and the predicted trajectories. It should be noted that the SCBs become wider as age increases, that makes sense because uncertainty enlarges when making predictions for further future. In Figure 4, we observe that there is more prediction variability for obese subjects, and even though BMI of Subject 36 decreased in the past, the proposed method successfully predicts the increasing trend in future BMI based on increasing trend in the past trajectories of other predictors.

$Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in ${\cal T}_1=[25,40]$, and observed values of BMI and its predicted BMI trajectory (solid) over ${\cal T}_2=[41,60]$ by proposed approach with 95$\%$ bootstrap SCB (dotted).$

FIGURE 3

Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in |${\cal T}_1=[25,40]$|⁠, and observed values of BMI and its predicted BMI trajectory (solid) over |${\cal T}_2=[41,60]$| by proposed approach with 95|$\%$| bootstrap SCB (dotted).

Open in new tab Download slide

$Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in ${\cal T}_1=[25,40]$, and observed values of BMI and its predicted BMI trajectory (solid) over ${\cal T}_2=[41,60]$ by proposed approach with 95$\%$ bootstrap SCB (dotted).$

FIGURE 4

Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in |${\cal T}_1=[25,40]$|⁠, and observed values of BMI and its predicted BMI trajectory (solid) over |${\cal T}_2=[41,60]$| by proposed approach with 95|$\%$| bootstrap SCB (dotted).

Open in new tab Download slide

5 SIMULATION STUDIES

In this section, we conduct simulation studies to compare the predictive performance of the proposed kernel estimator with three competing methods and to justify the consistency of the proposed method in Theorem 1. The training sets are generated over |${\cal T}=[1,50]$| by the following three concurrent time-varying coefficient regression models, a linear model with Equation 10, a quadratic model with Equation 11, and an interaction model with Equation 12:

$$\begin{eqnarray} X_i(t_{ij})& =& \alpha _{0i} + \mu _{X}(t_{ij})+L_{ij}, \\ Z_i(t_{ij})& =& \alpha _{0i}+ \mu _{Z}(t_{ij})+G_{ij}, \\ Y_i(t_{ij}) &=& \frac{X_i(t_{ij})}{5} + Z_i(t_{ij})+D_{ij}, \end{eqnarray}$$

(10)

$$\begin{eqnarray} Y_i(t_{ij}) &=& \frac{X_i(t_{ij})}{5} + Z_i(t_{ij})+\frac{X_i^2(t_{ij})}{50}+D_{ij}, \end{eqnarray}$$

(11)

$$\begin{eqnarray} Y_i(t_{ij}) &=& \frac{X_i(t_{ij})}{5} + Z_i(t_{ij})+\frac{X_i(t_{ij})Z_i(t_{ij})}{50}+D_{ij}, \end{eqnarray}$$

(12)

where μ_X(t) = t and μ_Z(t) = 5cos (2πt/50) are marginal mean functions of measurement time t, α_0i ∼ U( − 3.5, 3.5) is a random subject effect, which adds a dependence between the predictors X_i(t) and Z_i(t). The irregular measurement times are generated by t_ij = 5(j − 1) + DU(1, 5), j = 1|$, \ldots ,$| 10 with a discrete uniform distribution DU(·, ·). Here, the discrete uniform distribution is employed because refund packages (Goldsmith et al. 2023) in R requires a set of measurements at discrete time points in order to compute the covariance matrix for FLR model and FAR model. The errors L_ij, G_ij, and D_ij are generated from Gaussian processes to account for dependence within subject, i.e.

$$\begin{eqnarray} {L}_i \sim \mathcal {GP}(0,\Sigma _i), \quad {G}_i \sim \mathcal {GP}(0,\Sigma _i), \quad {D}_i \sim \mathcal {GP}(0,\Sigma _i), \end{eqnarray}$$

where the covariance matrix Σ_i is defined by the squared exponential covariance function k(t, t′) = σ²exp{ − (t − t′)²/(2l²)} with the variance parameter σ² = 1 and the length scale parameter l = 50. Two additional distributions are considered to generate errors of Y_i(t_ij), that is, D_ij: multivariate t-distribution with 10° of freedom and lognormal distribution by exponentiating Gaussian process D_i with the variance σ² = 0.25.

Testing sets are generated in |${\cal T}_1=[1,25]$| via the data-generating mechanism in Equations 10, 11, and 12. Then, using the generated training longitudinal data, we perform a prediction of the future response Y_i(t) for |$t \in {\cal T}_2=[26,50]$|⁠, based on the past observed predictors Y_i(t_ij), X_i(t_ij), and Z_i(t_ij) for |$t_{ij} \in {\cal T}_1=[1,25]$|⁠. It is worthwhile to note that FLR, FAR, and the proposed approach are misspecified because the data are generated from the concurrent regression models. FCR may outperform for the linear model in Equation 10 because it is correctly specified, but it is also tested under the incorrectly specified scenarios in Equations 11 and 12. We train the proposed method and its competitors with sample sizes of 300, 500, and 800 and test their prediction performance with 50 subjects. The optimal bandwidths are obtained by the proposed leave-one-subject-our cross validation in Section 3.2.

To evaluate the prediction performance, we conduct 500 simulation runs and adopt the same criteria of average and standard deviation of mean squared prediction error (AMSPE and SD) as we have done in real data analysis. The results with Gaussian process errors are reported in Table 3. AMSPE and corrected SD (CSD) of our method decreases as the number of subjects increases, which justifies the consistency of proposed method in Theorem 1. Corrected AMSPE (CAMSPE) is computed after dropping extreme MSPEs (MSPE > 5) that are observed, especially in FCR and FAR due to the convergence issue (Scheipl et al. 2015). The considerable difference between AMSPE and CAMSPE indicates the existence of prediction failures due to the convergence issue. When the sample size is relatively small, the FLR and FAR sometimes fail in prediction, but the proposed approach does not fail with any data set generated in this simulation study. In contrast, FCR struggles with the convergence issue most frequently and the failure rates are mostly between 1 and 3% because the number of observations within subjects, that is, 10, are relatively small to compute the random effect.

TABLE 3

Open in new tab

Normal error case: average (AMSPE), standard deviation (SD), corrected average (CAMSPE), and corrected standard deviation (CSD) of MSPE, when n = 300, 500, and 800 with the failure rate (FR).

			AMSPE	SD	CAMSPE	CSD	FR (⁠\|$\%$\|⁠)
Linear	300	KE	0.4215	0.1126	0.4228	0.1129	0.0
		FCR	0.5773	3.2819	0.3164	0.5751	2.0
		FLR	0.4163	0.2367	0.4176	0.2375	0.0
		FAR	0.4148	0.1570	0.4160	0.1563	0.0
	500	KE	0.3785	0.1197	0.3782	0.1202	0.0
		FCR	0.3738	0.9245	0.3092	0.6362	1.0
		FLR	0.4471	0.6310	0.4145	0.1480	0.2
		FAR	0.4297	0.2785	0.4172	0.1508	0.2
	800	KE	0.3393	0.1080	0.3390	0.1084	0.0
		FCR	0.4211	1.4461	0.2962	0.5822	1.0
		FLR	0.4351	0.1385	0.4346	0.1390	0.0
		FAR	0.4351	0.1358	0.4347	0.1363	0.0
Quadratic	300	KE	0.5219	0.1092	0.5232	0.1097	0.0
		FCR	0.6545	3.2673	0.3787	0.6131	2.2
		FLR	0.4728	0.2903	0.4618	0.1630	0.2
		FAR	0.5049	0.5968	0.4674	0.1694	0.4
	500	KE	0.4803	0.1070	0.4804	0.1073	0.0
		FCR	0.4401	0.9373	0.3761	0.6530	1.0
		FLR	0.5921	2.4062	0.4767	0.1680	0.2
		FAR	11.9083	250.8289	0.4782	0.1633	0.6
	800	KE	0.4350	0.0953	0.4346	0.0956	0.0
		FCR	0.4760	1.4241	0.3540	0.5997	0.2
		FLR	0.4879	0.1645	0.4874	0.1652	0.0
		FAR	0.4851	0.1448	0.4848	0.1453	0.0
Interaction	300	KE	0.6624	0.1307	0.6631	0.1317	0.0
		FCR	1.6664	4.5351	1.2379	0.6994	3.2
		FLR	0.8295	4.4788	0.5753	0.2028	0.4
		FAR	0.7765	2.7949	0.5921	0.2531	0.6
	500	KE	0.6251	0.1349	0.6255	0.1353	0.0
		FCR	1.3794	1.3184	1.2012	0.6185	2.6
		FLR	0.5857	0.1838	0.5842	0.1831	0.0
		FAR	0.6161	0.4347	0.5932	0.2476	0.4
	800	KE	0.5791	0.1139	0.5802	0.1138	0.0
		FCR	1.4297	1.9109	1.2040	0.6556	2.2
		FLR	0.5832	0.1738	0.5832	0.1753	0.0
		FAR	0.5816	0.1733	0.5816	0.1747	0.0

			AMSPE	SD	CAMSPE	CSD	FR (⁠\|$\%$\|⁠)
Linear	300	KE	0.4215	0.1126	0.4228	0.1129	0.0
		FCR	0.5773	3.2819	0.3164	0.5751	2.0
		FLR	0.4163	0.2367	0.4176	0.2375	0.0
		FAR	0.4148	0.1570	0.4160	0.1563	0.0
	500	KE	0.3785	0.1197	0.3782	0.1202	0.0
		FCR	0.3738	0.9245	0.3092	0.6362	1.0
		FLR	0.4471	0.6310	0.4145	0.1480	0.2
		FAR	0.4297	0.2785	0.4172	0.1508	0.2
	800	KE	0.3393	0.1080	0.3390	0.1084	0.0
		FCR	0.4211	1.4461	0.2962	0.5822	1.0
		FLR	0.4351	0.1385	0.4346	0.1390	0.0
		FAR	0.4351	0.1358	0.4347	0.1363	0.0
Quadratic	300	KE	0.5219	0.1092	0.5232	0.1097	0.0
		FCR	0.6545	3.2673	0.3787	0.6131	2.2
		FLR	0.4728	0.2903	0.4618	0.1630	0.2
		FAR	0.5049	0.5968	0.4674	0.1694	0.4
	500	KE	0.4803	0.1070	0.4804	0.1073	0.0
		FCR	0.4401	0.9373	0.3761	0.6530	1.0
		FLR	0.5921	2.4062	0.4767	0.1680	0.2
		FAR	11.9083	250.8289	0.4782	0.1633	0.6
	800	KE	0.4350	0.0953	0.4346	0.0956	0.0
		FCR	0.4760	1.4241	0.3540	0.5997	0.2
		FLR	0.4879	0.1645	0.4874	0.1652	0.0
		FAR	0.4851	0.1448	0.4848	0.1453	0.0
Interaction	300	KE	0.6624	0.1307	0.6631	0.1317	0.0
		FCR	1.6664	4.5351	1.2379	0.6994	3.2
		FLR	0.8295	4.4788	0.5753	0.2028	0.4
		FAR	0.7765	2.7949	0.5921	0.2531	0.6
	500	KE	0.6251	0.1349	0.6255	0.1353	0.0
		FCR	1.3794	1.3184	1.2012	0.6185	2.6
		FLR	0.5857	0.1838	0.5842	0.1831	0.0
		FAR	0.6161	0.4347	0.5932	0.2476	0.4
	800	KE	0.5791	0.1139	0.5802	0.1138	0.0
		FCR	1.4297	1.9109	1.2040	0.6556	2.2
		FLR	0.5832	0.1738	0.5832	0.1753	0.0
		FAR	0.5816	0.1733	0.5816	0.1747	0.0

TABLE 3

Open in new tab

Normal error case: average (AMSPE), standard deviation (SD), corrected average (CAMSPE), and corrected standard deviation (CSD) of MSPE, when n = 300, 500, and 800 with the failure rate (FR).

			AMSPE	SD	CAMSPE	CSD	FR (⁠\|$\%$\|⁠)
Linear	300	KE	0.4215	0.1126	0.4228	0.1129	0.0
		FCR	0.5773	3.2819	0.3164	0.5751	2.0
		FLR	0.4163	0.2367	0.4176	0.2375	0.0
		FAR	0.4148	0.1570	0.4160	0.1563	0.0
	500	KE	0.3785	0.1197	0.3782	0.1202	0.0
		FCR	0.3738	0.9245	0.3092	0.6362	1.0
		FLR	0.4471	0.6310	0.4145	0.1480	0.2
		FAR	0.4297	0.2785	0.4172	0.1508	0.2
	800	KE	0.3393	0.1080	0.3390	0.1084	0.0
		FCR	0.4211	1.4461	0.2962	0.5822	1.0
		FLR	0.4351	0.1385	0.4346	0.1390	0.0
		FAR	0.4351	0.1358	0.4347	0.1363	0.0
Quadratic	300	KE	0.5219	0.1092	0.5232	0.1097	0.0
		FCR	0.6545	3.2673	0.3787	0.6131	2.2
		FLR	0.4728	0.2903	0.4618	0.1630	0.2
		FAR	0.5049	0.5968	0.4674	0.1694	0.4
	500	KE	0.4803	0.1070	0.4804	0.1073	0.0
		FCR	0.4401	0.9373	0.3761	0.6530	1.0
		FLR	0.5921	2.4062	0.4767	0.1680	0.2
		FAR	11.9083	250.8289	0.4782	0.1633	0.6
	800	KE	0.4350	0.0953	0.4346	0.0956	0.0
		FCR	0.4760	1.4241	0.3540	0.5997	0.2
		FLR	0.4879	0.1645	0.4874	0.1652	0.0
		FAR	0.4851	0.1448	0.4848	0.1453	0.0
Interaction	300	KE	0.6624	0.1307	0.6631	0.1317	0.0
		FCR	1.6664	4.5351	1.2379	0.6994	3.2
		FLR	0.8295	4.4788	0.5753	0.2028	0.4
		FAR	0.7765	2.7949	0.5921	0.2531	0.6
	500	KE	0.6251	0.1349	0.6255	0.1353	0.0
		FCR	1.3794	1.3184	1.2012	0.6185	2.6
		FLR	0.5857	0.1838	0.5842	0.1831	0.0
		FAR	0.6161	0.4347	0.5932	0.2476	0.4
	800	KE	0.5791	0.1139	0.5802	0.1138	0.0
		FCR	1.4297	1.9109	1.2040	0.6556	2.2
		FLR	0.5832	0.1738	0.5832	0.1753	0.0
		FAR	0.5816	0.1733	0.5816	0.1747	0.0

			AMSPE	SD	CAMSPE	CSD	FR (⁠\|$\%$\|⁠)
Linear	300	KE	0.4215	0.1126	0.4228	0.1129	0.0
		FCR	0.5773	3.2819	0.3164	0.5751	2.0
		FLR	0.4163	0.2367	0.4176	0.2375	0.0
		FAR	0.4148	0.1570	0.4160	0.1563	0.0
	500	KE	0.3785	0.1197	0.3782	0.1202	0.0
		FCR	0.3738	0.9245	0.3092	0.6362	1.0
		FLR	0.4471	0.6310	0.4145	0.1480	0.2
		FAR	0.4297	0.2785	0.4172	0.1508	0.2
	800	KE	0.3393	0.1080	0.3390	0.1084	0.0
		FCR	0.4211	1.4461	0.2962	0.5822	1.0
		FLR	0.4351	0.1385	0.4346	0.1390	0.0
		FAR	0.4351	0.1358	0.4347	0.1363	0.0
Quadratic	300	KE	0.5219	0.1092	0.5232	0.1097	0.0
		FCR	0.6545	3.2673	0.3787	0.6131	2.2
		FLR	0.4728	0.2903	0.4618	0.1630	0.2
		FAR	0.5049	0.5968	0.4674	0.1694	0.4
	500	KE	0.4803	0.1070	0.4804	0.1073	0.0
		FCR	0.4401	0.9373	0.3761	0.6530	1.0
		FLR	0.5921	2.4062	0.4767	0.1680	0.2
		FAR	11.9083	250.8289	0.4782	0.1633	0.6
	800	KE	0.4350	0.0953	0.4346	0.0956	0.0
		FCR	0.4760	1.4241	0.3540	0.5997	0.2
		FLR	0.4879	0.1645	0.4874	0.1652	0.0
		FAR	0.4851	0.1448	0.4848	0.1453	0.0
Interaction	300	KE	0.6624	0.1307	0.6631	0.1317	0.0
		FCR	1.6664	4.5351	1.2379	0.6994	3.2
		FLR	0.8295	4.4788	0.5753	0.2028	0.4
		FAR	0.7765	2.7949	0.5921	0.2531	0.6
	500	KE	0.6251	0.1349	0.6255	0.1353	0.0
		FCR	1.3794	1.3184	1.2012	0.6185	2.6
		FLR	0.5857	0.1838	0.5842	0.1831	0.0
		FAR	0.6161	0.4347	0.5932	0.2476	0.4
	800	KE	0.5791	0.1139	0.5802	0.1138	0.0
		FCR	1.4297	1.9109	1.2040	0.6556	2.2
		FLR	0.5832	0.1738	0.5832	0.1753	0.0
		FAR	0.5816	0.1733	0.5816	0.1747	0.0

In addition, the CSD of the proposed method is substantially smaller than its competitors’. In terms of CAMSPE, FCR performs best in the linear and quadratic models, and the proposed approach gets improved quickly and eventually becomes comparable to FCR and outperforms FLR and FAR as the sample size gets larger. Like any kernel-based approach, the proposed method’s performance relies on the quality and quantity of local observations, similar to the trajectories of predictors of interest. As the sample size increases, more various trajectories can be observed in the training set and it leads to a better prediction by our method. Any of the 4 methods does not outperform with the interaction model because the interaction term is not considered as the predictor in the prediction, but FCR struggles more than the others since it is developed based on the time-varying coefficient model depending relatively more on the linear structure. The results with t-distribution errors and lognormal distribution errors are similar and reported in supplementary materials. In summary, we assert from these extensive simulation studies that the proposed method provides a robust and reliable estimation even with a small sample, while maintaining a comparable prediction accuracy to the existing methods with a small or medium sample size. With a large sample, it could outperform the competing methods by taking advantage of abundant patterns of trajectories.

6 DISCUSSION

In this article, a nonparametric kernel approach has been developed to predict the mean response trajectory by integrating sparse and irregular longitudinal covariates. Our methodology provides new perspectives for predictive longitudinal modeling by utilizing subject-wise similarities. The major contribution is that we address the long-standing curse of dimensionality issue in the functional kernel regression by a novel multiplicative model with the multivariate Gaussian kernel. Compared with other functional data analysis tools, the proposed kernel estimator is more interpretable and easier to implement. The application to FHS and the simulation study show that our method is preferable, especially when the sample size is large enough for the training set to have various patterns of predictor trajectories. L² metric is employed for the convenience of estimating the distance between trajectories conditioning on sparse measurements, but semi-metric approaches such as dynamic time warping could also serve as a useful alternative. In addition, since the data of distances in Section 2.2 could be very large n(n − 1)/2 >>p, modern statistical models such as a deep neural network model could be used to capture the nonlinear relationship between response trajectory and predictor trajectories. Because the theoretical difficulty is high, we use a bootstrap-based SCB to make inferences about μ(s, 0). Establishing a maximal deviation result for |$\hat{\mu }(s,0)$| over |$s \in {\cal T}_2$| or validating the bootstrap approach is potentially future work.

FUNDING

None declared.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

The Framingham heart study cohort data are available in the Biologic Specimen and Data Repository Information Coordinating Center of the National Heart, Lung, and Blood Institute at https://biolincc.nhlbi.nih.gov/studies/framoffspring/.

References

Di

C. Z.

,

Crainiceanu

C. M.

,

Caffo

B. S.

(

2009

).

Multilevel functional principal component analysis

.

The Annals of Applied Statistics

,

3

,

458

–

488

.

Fan

Y.

,

Foutz

N.

,

James

G. M.

,

Jank

W.

(

2014

).

Functional response additive model estimation with online virtual stock markets

.

The Annals of Applied Statistics

,

8

,

2435

–

2460

.

Google Scholar

Crossref

WorldCat

Ferraty

F.

,

Vieu

P.

(

2006

).

Nonparametric Functional Data Analysis: Theory and Practice

.

New York

:

Springer

.

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Goldsmith

J.

,

Scheipl

F.

,

Huang

L.

,

Wrobel.

J.

,

Di.

C.

,

Gellar.

J.

, et al. (

2023

).

Regression with Functional Data

.

CRAN

,

https://cran.r-project.org/web/packages/refund/refund.pdf

Google Scholar

OpenURL Placeholder Text

WorldCat

He

J.

,

Whelton

P. K.

(

1999

).

Elevated systolic blood pressure and risk of cardiovascular and renal disease: overview of evidence from observational epidemiologic studies and randomized controlled trials

.

American Heart Journal

,

138

,

S211

–

S219

.

Google Scholar

Crossref

WorldCat

Hall

P.

,

Müller

H. G.

,

Yao

F.

(

2008

).

Modelling sparse generalized longitudinal observations with latent Gaussian processes

.

Journal of the Royal Statistical Society: Series B

,

70

,

703

–

723

.

Google Scholar

Crossref

WorldCat

Ivanescu

A. E.

,

Staicu

A. M.

,

Scheipl

F.

,

Greven.

S.

(

2015

).

Penalized function-on-function regression

.

Computational Statistics

,

30

,

539

–

568

.

Google Scholar

Crossref

WorldCat

James

G. M.

,

Silverman

B. W.

(

2005

).

Functional adaptive model estimation

.

Journal of the American Statistical Association

,

100

,

565

–

576

.

Google Scholar

Crossref

WorldCat

Kim

J. S.

,

Staicu

A. M.

,

Maity

A.

,

Carroll.

R. J

,

Ruppert.

D.

(

2018

).

Additive function-on-function regression

.

Journal of Computational and Graphical Statistics

,

27

,

234

–

244

.

Kim

S.

,

Cho

H.

,

Kim

M. O.

(

2021a

).

Predictive generalized varying-coefficient longitudinal model

.

Statistics in Medicine

,

40

,

6243

–

6259

.

Kim

S.

,

Cho

H.

,

Wu

C.

(

2021b

).

Risk-predictive probabilities and dynamic nonparametric conditional quantile models for longitudinal analysis

.

Statistica Sinica

,

31

,

1415

–

1439

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Leroux

A.

,

Xiao

L.

,

Crainiceanu

C.

,

Checkley.

W.

(

2018

).

Dynamic prediction in functional concurrent regression with an application to child growth

.

Statistics in Medicine

,

37

,

1376

–

1388

.

Leroux

A.

,

Xiao

L.

,

Crainiceanu

C.

,

Checkly.

W.

(

2022

).

Functional Concurrent Regression for Sparse Data

.

CRAN

,

https://cran.r-project.org/web/packages/fcr/fcr.pdf

Google Scholar

OpenURL Placeholder Text

WorldCat

McLean

M. W.

,

Hooker

G.

,

Staicu

A. M.

,

Scheipl.

F.

,

Ruppert.

D.

2014

).

Functional generalized additive models

.

Journal of Computational and Graphical Statistics

,

23

,

249

–

269

.

Müller

H. G.

,

Yao

F.

(

2008

).

Functional additive models

.

Journal of the American Statistical Association

,

103

,

1534

–

1544

.

Google Scholar

Crossref

WorldCat

Mourad

J. J.

(

2008

).

The evolution of systolic blood pressure as a strong predictor of cardiovascular risk and the effectiveness of fixed-dose ARB/CCB combinations in lowering levels of this preferential target

.

Vascular Health and Risk Management

,

4

,

1315

–

1325

.

Nielsen

W. B.

,

Lindenstrøm

E.

,

Vestbo

J.

,

Jensen.

G. B.

(

1997

).

Is diastolic hypertension an independent risk factor for stroke in the presence of normal systolic blood pressure in the middle-aged and elderly?

.

American Journal of Hypertension

,

10

,

634

–

639

.

O’Donnell

C. J.

,

Elosua

R.

(

2008

).

Cardiovascular risk factors. Insights from framingham heart study

.

Revista Española de Cardiología (English Edition)

,

61

,

299

–

310

.

Google Scholar

Crossref

WorldCat

Peng

J.

,

Müller

H. G.

(

2008

).

Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions

.

The Annals of Applied Statistics

,

2

,

1056

–

1077

.

Google Scholar

Crossref

WorldCat

Scheipl

F.

,

Staicu

A. M.

,

Greven

S.

(

2015

).

Functional additive mixed models

.

Journal of Computational and Graphical Statistics

,

24

,

477

–

501

.

Sun

X.

,

Du

P.

,

Wang

X.

,

Ma.

P.

(

2018

).

Optimal penalized function-on-function regression under a reproducing kernel Hilbert space framework

.

Journal of the American Statistical Association

,

113

,

1601

–

1611

.

Wang

S.

,

Kim

S.

,

Cho

H.

,

Chang.

W.

(

2023

).

longke: Nonparametric Predictive Model for Sparse and Irregular Longitudinal Data

.

CRAN

,

https://CRAN.R-project.org/package=longke

Google Scholar

OpenURL Placeholder Text

WorldCat

Wu

C. O.

,

Tian

X.

(

2013

).

Nonparametric estimation of conditional distributions and rank-tracking probabilities with time-varying transformation models in longitudinal studies

.

Journal of the American Statistical Association

,

108

,

971

–

982

.

Google Scholar

Crossref

WorldCat

Wu

C. O.

,

Tian

X.

,

Tian

L.

et al. (

2020

).

Nonparametric estimation of risk tracking indices for longitudinal studies

.

Statistical Methods in Medical Research

,

29

,

481

–

497

.

Yao

F.

(

2007

).

Functional principal component analysis for longitudinal and survival data

.

Statistica Sinica

,

17

,

965

–

983

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Yao

F.

,

Müller

H. G.

,

Wang

J. L.

(

2005a

).

Functional data analysis for sparse longitudinal data

.

Journal of the American Statistical Association

,

100

,

577

–

590

.

Google Scholar

Crossref

WorldCat

Yao

F.

,

Müller

H. G.

,

Wang

J. L.

(

2005b

).

Functional linear regression analysis for longitudinal data

.

Annals of Statistics

,

33

,

2873

–

2903

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Zhou

Y.

,

Bhattacharjee

S.

,

Carroll

C.

,

Chen.

Y.

,

Dai.

X.

,

Fan.

J.

, et al. (

2022

).

Functional Data Analysis and Empirical Dynamics

.

CRAN

,

https://cran.r-project.org/web/packages/fdapace/fdapace.pdf

Google Scholar

OpenURL Placeholder Text

WorldCat

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Download all slides

Month:	Total Views:
February 2024	140
March 2024	183
April 2024	92
May 2024	21
June 2024	12
July 2024	13
August 2024	11
September 2024	15
October 2024	13
November 2024	10
December 2024	18
January 2025	11
February 2025	23
March 2025	127
April 2025	113
May 2025	20

Article Contents

Nonparametric predictive model for sparse and irregular longitudinal data

Abstract

1 INTRODUCTION

2 METHODOLOGY

2.1 Kernel estimation and distance

2.2 Multiplicative model of D_i(x₁, …, x_p) with its proxy variable

2.3 Proposed estimator and asymptotic results

3 IMPLEMENTATION

3.1 Estimation of individual trajectories and distance

3.2 Estimation of tuning parameters

3.3 Estimation of mean response trajectory

4 REAL DATA ANALYSIS

4.1 Data description and objective of analysis

4.2 Results

5 SIMULATION STUDIES

6 DISCUSSION

FUNDING

CONFLICT OF INTEREST

DATA AVAILABILITY

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Related articles in

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Nonparametric predictive model for sparse and irregular longitudinal data

Abstract

1 INTRODUCTION

2 METHODOLOGY

2.1 Kernel estimation and distance

2.2 Multiplicative model of Di(x1, …, xp) with its proxy variable

2.3 Proposed estimator and asymptotic results

3 IMPLEMENTATION

3.1 Estimation of individual trajectories and distance

3.2 Estimation of tuning parameters

3.3 Estimation of mean response trajectory

4 REAL DATA ANALYSIS

4.1 Data description and objective of analysis

4.2 Results

5 SIMULATION STUDIES

6 DISCUSSION

FUNDING

CONFLICT OF INTEREST

DATA AVAILABILITY

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Related articles in

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only

2.2 Multiplicative model of D_i(x₁, …, x_p) with its proxy variable