Abstract

We propose a kernel-based estimator to predict the mean response trajectory for sparse and irregularly measured longitudinal data. The kernel estimator is constructed by imposing weights based on subject-wise similarity, measured by an L2 metric between predictor trajectories, under the assumption that subjects whose predictor trajectories evolve similarly over time tend to have similar trends in their response trajectories. To deal with the curse of dimensionality caused by multiple predictors, we propose an appealing multiplicative model with multivariate Gaussian kernels. This model achieves dimension reduction and selects functional covariates with predictive significance. The asymptotic properties of the proposed nonparametric estimator are investigated under mild regularity conditions. We illustrate the robustness and flexibility of the proposed method via extensive simulation studies and an application to the Framingham Heart Study.

1 INTRODUCTION

Longitudinal studies, in which subjects are measured repeatedly over a period of time |${\cal T}$|, allow us to examine associations between a response variable and a set of predictors at distinct time points. We consider prediction of the response variable over a period of time |${\cal T}_2 \subset {\cal T}$| based on repeated measures of the predictors over another period of time |${\cal T}_1 \subset {\cal T}$|. We denote by Y(s) the response variable at time |$s \in {\cal T}_2$| and by xr = {xr(t1), …, xr(tm)}, for r = 1, …, p, a vector of the rth predictor sparsely collected at m time points t1, …, tm. Then, |${\cal T}_1$| can be defined as [t1, tm] with respect to the measurement times of the predictors. It is of primary interest to predict the mean response trajectory over |${\cal T}_2$| given a history of the repeatedly observed predictors x1, …, xp over |${\cal T}_1$|:

$$E\{Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\}, \qquad s \in {\cal T}_2. \qquad (1)$$

Here, the response variable Y(s) can be either continuous or discrete. For instance, if prediction of obesity incidence is of interest, the response variable is dichotomous, that is, Y(s) = I(Z(s) > 30 kg/m2), where Z(s) is body mass index (BMI) at time s and I(·) is an indicator function. If prediction of the mean trajectory of BMI is of interest, the response variable Y(s) = Z(s) is continuous. If |${\cal T}_2$| is a future time period (i.e., tm < s for any |$s \in {\cal T}_2$|⁠), Equation 1 can be viewed as the future mean trajectory of the response variable given the history of the past exogenous variables.

In recent years, various nonparametric methods have been studied to predict the future longitudinal response trajectory based on earlier predictors, including ranking-tracking probability approaches (Wu and Tian, 2013; Wu et al., 2020) and predictive longitudinal models (Kim et al., 2021a, 2021b). However, these methods are limited from a longitudinal-study viewpoint because prediction is based on predictors evaluated at a single time point in the past. In the functional data analysis literature, many studies have focused on predicting the mean response profile based on repeated measures of functional predictors; see Ivanescu et al. (2015), Scheipl, Staicu, and Greven (2015), and Sun et al. (2018). The classical functional linear regression for sparse longitudinal data proposed by Yao et al. (2005a) achieves function-on-function or function-on-scalar estimation of the mean response trajectory given one functional predictor. It has been extended to multiple functional predictors and nonlinear predictors via functional additive models (James and Silverman, 2005; Müller and Yao, 2008; McLean et al., 2014; Fan et al., 2014; Kim et al., 2018). Although these extensions improve the flexibility of regression with functional predictors, the complexity of estimation increases substantially, including estimation of smooth functions in the additive model, the choice of basis functions and knots, and regularization. This complexity may cause convergence issues, especially with sparse and irregular longitudinal data (Scheipl et al., 2015). As an alternative, we propose a novel nonparametric approach based on a kernel-weighted estimator.

Our approach builds on the core idea of kernel regression: in predicting the response trajectory over time, subjects are weighted according to the similarity of their predictor histories. Because of the inherent difficulty of longitudinal data, where observations are often measured irregularly at different time points across subjects, measurement-wise distances between subjects cannot be employed directly. Instead, we adopt the L2 metric between individual predictor trajectories estimated by functional principal component analysis (Hall et al., 2008) to measure the similarity between histories of repeatedly measured predictors. Our methodology assumes that the more similar the predictor trajectories of two subjects are, the more likely their response trajectories are to evolve in a similar fashion.

To deal with the curse of dimensionality in pure nonparametric approaches caused by multiple predictors, we propose a multiplicative model with a multivariate Gaussian kernel function. Owing to an elegant property of the Gaussian kernel, a logarithm transformation turns the multiplicative model into a linear model that regresses the L2 metric between individual response trajectories over |${\cal T}_2$| on the L2 metrics between individual predictor trajectories over |${\cal T}_1$|. This model not only circumvents the curse of dimensionality but also enables us to evaluate the association between the predictor trajectories over |${\cal T}_1$| and the response trajectory over |${\cal T}_2$|. More specifically, the model identifies which predictors' trajectory similarities lead to similar response trajectories. Furthermore, the asymptotic properties of the proposed estimator are studied, and a bootstrap-based simultaneous confidence band (Kim et al., 2021a) is adopted for statistical inference. An accompanying R package, longke (Wang et al., 2023), is developed for the proposed model.

2 METHODOLOGY

In this section, we discuss in detail technical challenges for the nonparametric kernel estimation of the conditional mean trajectory in Equation 1 due to sparse and irregular measurements over subjects in the longitudinal data and the curse of dimensionality. These issues are circumvented by modeling and predicting the L2 metric between response trajectories over |${\cal T}_2$| based on the L2 metrics between predictor trajectories over |${\cal T}_1$|⁠.

2.1 Kernel estimation and distance

The longitudinal random sample is denoted by [{Xi(tij), tij}, {Yi(sik), sik}; i = 1, …, n, j = 1, …, mi, k = 1, …, ni], where Xi(t) = {Xi1(t), …, Xip(t)} and Yi(s) are a vector of predictors and the response variable of subject i measured at sparse time points |$t_{i1},\dots ,t_{im_i}\in {\cal T}_3 \subset {\cal T}$| and |$s_{i1},\dots ,s_{in_i}\in {\cal T}_4 \subset {\cal T}$|, respectively. Here, the two time periods of interest |${\cal T}_1$| and |${\cal T}_2$| defined in Equation 1 are subsets of the time periods of the measurements in the longitudinal sample, that is, |${\cal T}_1\subset {\cal T}_3$| and |${\cal T}_2\subset {\cal T}_4$|. The numbers of measurements m1, …, mn and n1, …, nn are independent and identically distributed positive integer-valued random variables with E(mi) < ∞ and E(ni) < ∞, respectively. A longitudinal sample from a typical setting of longitudinal studies, where the response and predictors are measured concurrently, is a special case of the above longitudinal sample with tij = sik for all i and j = k. For simplicity, we denote |${\bf X}_{ir}=\lbrace X_{ir}(t_{i1}),\dots ,X_{ir}(t_{im_i})\rbrace$|, r = 1, …, p, and |${\bf Y}_{i}=(Y_{i1},\dots ,Y_{in_i})$| with Yik = Yi(sik).

In order to predict the mean response trajectory, the existing work, including the functional data analysis, rank-tracking, and predictive longitudinal model approaches discussed in Section 1, focuses on identifying the relationship between Y(s) and X(t) for s ≥ t. It is quite challenging to impose a structural assumption on this relationship, especially with sparse longitudinal data having multiple predictors repeatedly and irregularly measured over time. Thus, we instead devote our attention to the similarity among subjects in terms of their observed predictor values over |${\cal T}_1$| and contrive a way to aggregate the information from the predictors.

Based on the assumption that the measurement time points of the response variable |$s_{i1},\dots ,s_{in_i}$| are independent of the predictors Xi1, …, Xip, a nonparametric kernel estimator of E{Y(s)∣x1, …, xp} in Equation 1 over |$s \in {\cal T}_2$| can be obtained as

$$\hat{E}\{Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\}=\frac{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\,K_X\{D_i({\bf x}_1,\dots ,{\bf x}_p)\}\,Y_{ik}}{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\,K_X\{D_i({\bf x}_1,\dots ,{\bf x}_p)\}}, \qquad (2)$$

where Kik(s) = K{(sik − s)/bs}, K(·) and KX(·) are kernel functions, and bs is a bandwidth. Di(x1, …, xp) is an overall distance between subject i's predictors Xi1, …, Xip and the history of predictors of interest x1, …, xp in |${\cal T}_1$|. The responses from subject i receive more weight if subject i's observed predictors are more similar to x1, …, xp. In like manner, if a measurement time sik is closer to s, the kernel function Kik(s) gives more weight to Yik in the estimation of E{Y(s)∣x1, …, xp}.

As the first step of developing a method of measuring Di(x1|$, \ldots ,$|  xp), we adopt the L2 metric between the individual trajectories of the rth predictor of subject i and subject i′ over |${\cal T}_1$|⁠:

$$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=\left[\int_{{\cal T}_1}\left\{E(X_{ir}(t)\mid {\bf X}_{ir})-E(X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r})\right\}^2 dt\right]^{1/2}. \qquad (3)$$

This resolves the issue of irregular measurement times across subjects because the distance is computed between two trajectories over the same time period rather than between the actual measurements. |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| indicates that the individual trajectories of the rth predictor of subject i and subject i′ over |${\cal T}_1$| are identical, and the larger |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})$| is, the more dissimilar the two trajectories are. The overall distance Di(x1, …, xp) in Equation 2 should be defined as an increasing function of |$D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_1),\dots ,D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_p)$|. As a result, the kernel function KX(·) puts more weight on subjects whose histories are similar to x1, …, xp.
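As a concrete illustration, once two trajectories have been evaluated on a common grid over |${\cal T}_1$|, the L2 metric in Equation 3 can be approximated numerically. The sketch below is a minimal R example; traj1, traj2, and grid are hypothetical fitted trajectory values and grid points, not objects defined in this article.

```r
# Minimal sketch: approximate the L2 distance between two fitted trajectories
# evaluated on a common, evenly spaced grid over T_1 (hypothetical inputs).
l2_distance <- function(traj1, traj2, grid) {
  sq_diff <- (traj1 - traj2)^2                 # squared pointwise differences
  widths  <- diff(grid)                        # grid spacings
  integral <- sum(widths * (sq_diff[-1] + sq_diff[-length(sq_diff)]) / 2)  # trapezoidal rule
  sqrt(integral)
}

# Toy usage with two trajectories on a grid of 51 points over [0, 15]
grid  <- seq(0, 15, length.out = 51)
trajA <- sin(grid / 3)
trajB <- sin(grid / 3) + 0.2
l2_distance(trajA, trajB, grid)   # approximately 0.2 * sqrt(15)
```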

Following the traditional multivariate kernel approach,

$$K_X\{D_i({\bf x}_1,\dots ,{\bf x}_p)\}=\prod_{r=1}^{p}K_{X_r}\{D_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r})\}, \qquad (4)$$

 a naive pure nonparametric estimator of Equation 1 for |$s \in {\cal T}_2$| is

$$\tilde{E}\{Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\}=\frac{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\prod_{r=1}^{p}K_{X_r}\{D_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r})\}\,Y_{ik}}{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\prod_{r=1}^{p}K_{X_r}\{D_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r})\}}, \qquad (5)$$

where for r = 1, |$\ldots,$|  p,

$$K_{X_r}\{D_{{\cal T}_1}({\bf X}_{ir},{\bf x}_{r})\}=\exp\left\{-\frac{D_{{\cal T}_1}^2({\bf X}_{ir},{\bf x}_{r})}{2b_r^2}\right\}. \qquad (6)$$

Here, a Gaussian kernel function is a natural choice based on the fact that the L2 metric is employed, and b1, …, bp are bandwidths.

In particular, |$\tilde{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace$| is an estimator of |$E\lbrace Y(s)\mid D_{{\cal T}_1}({\bf X}_{1},{\bf x}_{1})=\cdots =D_{{\cal T}_1}({\bf X}_{p},{\bf x}_{p})=0\rbrace = E[Y(s)\mid E\lbrace X_1(t)\mid {\bf x}_{1}\rbrace ,\dots ,E\lbrace X_p(t)\mid {\bf x}_{p}\rbrace ]$|. The L2 metric in Equation 3 used to measure the distance between two individual trajectories is a semi-metric on the space of random vectors {Xir} because |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| implies |$E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace = E\lbrace X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r}\rbrace$| for all |$t\in {\cal T}_1$|, but it is not sufficient to imply |${\bf X}_{ir}={\bf X}_{i^{\prime }r}$|. In other words, even though the observed predictors from two subjects are different, if their predictor trajectories are identical, both subjects would have the same mean response trajectory in |${\cal T}_2$|. Note, however, that for i ≠ i′, |$P[E\lbrace X_{ir}(t)\mid {\bf X}_{ir}\rbrace = E\lbrace X_{i^{\prime }r}(t)\mid {\bf X}_{i^{\prime }r}\rbrace \mbox{ for all } t\in {\cal T}_1]=0$|. In addition, the trajectory is of greater interest than the observations themselves, especially in longitudinal data analysis with irregular measurement times across subjects. Thus, the proposed approach based on the distance between trajectories is reasonable and practical. An alternative way to measure distance for sparse and irregular data is proposed in Peng and Müller (2008).

 
Remark 1

To avoid notational complexity, we assumed at the beginning of this section that the predictors from the same subject in the longitudinal sample are measured concurrently. However, this is not necessary because the trajectory of each predictor is estimated separately. The proposed approach is applicable to an even more general longitudinal sample allowing different measurement times across predictors, denoted by {|$(X_{ir}(t_{irj}),\, t_{irj}), (Y_{i}(s_{ik}),s_{ik}); i=1,\dots ,n,\ r=1,\dots ,p,\ j=1,\dots ,m_{ir},\ k=1,\dots ,n_{i}$|}. In the same fashion, the repeatedly observed predictors of interest x1, …, xp need not be measured at the same time points.

2.2 Multiplicative model of Di(x1, …, xp) with its proxy variable

Even though the repeated measurements of the predictors are aggregated through the L2 metric in Equation 3, the estimator |$\tilde{E}\lbrace Y(s)\mid {\bf x}_{1},\dots ,{\bf x}_{p}\rbrace$| in Equation 5 still suffers from the curse of dimensionality when p ≥ 2, considering that the total number of kernel functions is p + 1, including Kik(s). To overcome the curse of dimensionality, we identify a proxy variable for Di(x1, …, xp) and predict the proxy by a novel multiplicative model based on |$D_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1}),\dots ,D_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})$|. When the predictor trajectories over |${\cal T}_1$| have predictive significance for the mean response trajectory over |${\cal T}_2$|, it is natural that subjects with comparable histories of predictors over |${\cal T}_1$| are likely to have similar response trajectories over |${\cal T}_2$|. Based on this supposition, we propose the following multiplicative model motivated by Equation 4:

where KY(·) is a Gaussian kernel with bandwidth bY, |$\epsilon _{ii^{\prime }}$| are independent and identically distributed random errors with mean 0 and a finite variance, c is a constant, and

The logarithm transformation leads to

It follows from the Gaussian kernels as in Equation 6 that

It can be reparameterized as

$$D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})=\alpha_0+\alpha_1 D_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1})+\cdots +\alpha_p D_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})+\epsilon_{ii^{\prime }}^{*}, \qquad (7)$$

where α0, α1, …, αp are non-negative coefficients. |$D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| tends to increase with the predictor distances |$D_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1}),\dots ,D_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})$|. In particular, the contribution of each predictor distance to |$D_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| is likely to vary across predictors. The coefficients |$\alpha _r=b_Y^2/b_r^2$|, r = 1, …, p, become smaller as the bandwidth br gets larger; a larger bandwidth for a predictor indicates a lower contribution to prediction of the response variable. The model in Equation 7 not only aggregates the distances from multiple predictors, resolving the curse of dimensionality, but also selects predictors that have a significant impact on prediction of the mean response trajectory.
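For completeness, the following display sketches how Equation 7 arises from the multiplicative model, assuming every kernel is Gaussian, K(u) ∝ exp(−u2/2); normalizing constants are absorbed into α0 and the error terms are written generically.

```latex
% Sketch of the derivation, assuming Gaussian kernels K(u) \propto \exp(-u^2/2)
\begin{align*}
K_Y\!\left\{\tfrac{D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i'})}{b_Y}\right\}
  &= c\prod_{r=1}^{p} K_{X_r}\!\left\{\tfrac{D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i'r})}{b_r}\right\}
     \times (\text{error}),\\
-\frac{D^2_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i'})}{2b_Y^2}
  &= \log c' - \sum_{r=1}^{p}\frac{D^2_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i'r})}{2b_r^2} + \text{error},\\
D^2_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i'})
  &= \alpha_0 + \sum_{r=1}^{p}\alpha_r\, D^2_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i'r}) + \text{error},
  \qquad \alpha_r = b_Y^2/b_r^2 .
\end{align*}
```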

In order to estimate the coefficients α0, α1, …, αp, we first estimate the distances between individual predictor trajectories over |${\cal T}_1$| and the distances between individual response trajectories over |${\cal T}_2$| for all pairs of subjects in the longitudinal data; see Section 3.1 for implementation. The data then have the following structure:

|$\hat{{\bf D}}_{Y}^2$|   |$\hat{{\bf D}}_{X_1}^2$|   …   |$\hat{{\bf D}}_{X_p}^2$|
|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{2})$|   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{21})$|   …   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{2p})$|
|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{3})$|   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{31})$|   …   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{3p})$|
|$\hat{D}_{{\cal T}_2}^2({\bf Y}_1,{\bf Y}_{4})$|   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{11},{\bf X}_{41})$|   …   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{1p},{\bf X}_{4p})$|
⋮   ⋮   ⋱   ⋮
|$\hat{D}_{{\cal T}_2}^2({\bf Y}_{n-1},{\bf Y}_{n})$|   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{n-1,1},{\bf X}_{n1})$|   …   |$\hat{D}_{{\cal T}_1}^2({\bf X}_{n-1,p},{\bf X}_{np})$|

The coefficients α0, α1|$, \ldots ,$| αp can be estimated by non-negative least squares regression of |$\hat{{\bf D}}_{Y}^2$| on |$\hat{{\bf D}}_{X_1}^2,\dots ,\hat{{\bf D}}_{X_p}^2$| because |$\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf X}_{i^{\prime }1}),\dots ,\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf X}_{i^{\prime }p})$|⁠, and |$\hat{D}_{{\cal T}_2}^2({\bf Y}_i,{\bf Y}_{i^{\prime }})$| are non-negative.
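A minimal R sketch of this estimation step is given below. It assumes the squared pairwise distances have already been assembled into a vector DY2 and a matrix DX2 with one column per predictor (hypothetical objects), and uses the nnls package with a column of ones for the intercept.

```r
# Minimal sketch: non-negative least squares fit of Equation 7.
# DY2: vector of squared response distances; DX2: n(n-1)/2 x p matrix of
# squared predictor distances (hypothetical objects).
library(nnls)

fit_alpha <- function(DY2, DX2) {
  A <- cbind(1, DX2)          # first column carries the intercept alpha_0
  fit <- nnls::nnls(A, DY2)   # all coefficients constrained to be non-negative
  alpha <- fit$x
  names(alpha) <- c("alpha0", paste0("alpha", seq_len(ncol(DX2))))
  alpha
}
```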

Recall that x1, …, xp are the histories of observed predictors of interest over |${\cal T}_1$|, so that for each i, |$D_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1}),\dots , D_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})$| can be estimated. Then, |$D_{{\cal T}_2}({\bf Y}_i,\dot{{\bf y}})$|, the distance between the individual response trajectory of subject i and the unobserved response of interest |$\dot{{\bf y}}$| over |${\cal T}_2$|, can be predicted as follows:

where |$\hat{\alpha }_0$| is the estimated mean of |$\tilde{D}_{{\cal T}_2}^2({\bf Y}_i,\dot{{\bf y}})$| when |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})=\cdots =\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})=0$|. We note that |$\tilde{D}_{{\cal T}_2}^2({\bf Y}_i,\dot{{\bf y}})-\hat{\alpha }_0=\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})$| is a reasonable proxy for |$D_i^2({\bf x}_1,\dots ,{\bf x}_p)$|, as well as an aggregate measure of distance that takes multiple predictor trajectories into account, since it is a non-negative increasing function of |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1}),\dots ,\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})$| and equals 0 when |$\hat{D}_{{\cal T}_1}({\bf X}_{i1},{\bf x}_{1})=\cdots =\hat{D}_{{\cal T}_1}({\bf X}_{ip},{\bf x}_{p})=0$|.

2.3 Proposed estimator and asymptotic results

We propose a nonparametric kernel estimator of E{Y(s)∣x1, …, xp} obtained by replacing Di(x1, …, xp) in Equation 2 with its estimated proxy from Section 2.2:

$$\hat{\mu }(s,0)=\frac{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\,K_w(\hat{w}_i/b_w)\,Y_{ik}}{\sum_{i=1}^{n}\sum_{k=1}^{n_i}K_{ik}(s)\,K_w(\hat{w}_i/b_w)}, \qquad s \in {\cal T}_2, \qquad (8)$$

where Kw(·) is a kernel with bandwidth bw and
$$\hat{w}_i=e_i\sqrt{\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})}.$$
Here, ei is a discrete random variable with P(ei = 1) = P(ei = −1) = 0.5 and, consequently, |$\hat{w}_i^2= \hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})$|. The random variable ei plays the important role of dealing with the positive support of the proxy of |$D_i^2({\bf x}_1,\dots ,{\bf x}_p)$| by allowing the density function of wi to be defined in a neighborhood of 0. As an alternative to using ei, asymmetric kernel functions can be used to handle the positive support (Ferraty and Vieu, 2006). Let |$w=w({\bf x}_1,\dots ,{\bf x}_p)=e\sqrt{\alpha _1D_{{\cal T}_1}^2({\bf X}_{1},{\bf x}_{1})+\cdots +\alpha _p D_{{\cal T}_1}^2({\bf X}_{p},{\bf x}_{p})}$|. We let μ(s, 0) be the mean response profile over |${\cal T}_2$| given a set of predictor trajectories measured over |${\cal T}_1$|, that is, |$\mu (s,0)=E\lbrace Y(s)\mid w({\bf x}_1,\dots ,{\bf x}_p)=0\rbrace$|.

The following theorem states the asymptotic results of the resultant estimator |$\hat{\mu }(s,0)$|⁠.

 
Theorem 1
Suppose the model in Equation 7 is correctly specified and let s0 be an interior point in |${\cal T}_2$|⁠. Under Assumptions (1)–(5) in the Appendix, as nbsbw → ∞, and |$\sup _n nb_sb_w(b_s^4+b_w^4)\lt \infty$|⁠, we have
where |$f_s(\cdot)$| and |$f_w(\cdot)$| are the density functions of |$s_{ik}$| and wi, respectively, σ2(s) is the variance of Y(s), |$\psi _K=\int K^{2}(v)dv$|, |$\rho (s,0)=b_s^2f_w(0)\left\lbrace \mu ^{\prime }_s(s,0)f^{\prime }_s(s)+\mu ^{\prime \prime }_w(s,0)f_s(s)/2\right\rbrace \phi _K+ b_w^2f_s(s)\left\lbrace \mu ^{\prime }_w(s,0)f^{\prime }_w(0)+\mu ^{\prime \prime }_s(s,0)f_w(0)/2\right\rbrace \phi _K$|, and τ = E(1/n1). Here, ϕK = ∫v2K(v)dv, |$\mu ^{\prime }_s(s_0,0)=\partial \mu (s_0,0)/\partial s$|, |$\mu ^{\prime \prime }_s(s_0,0)=\partial ^2 \mu (s_0,0)/\partial s^2$|, |$\mu ^{\prime }_w(s_0,0)=\partial \mu (s_0,0)/\partial w$|, and |$\mu ^{\prime \prime }_w(s_0,0)=\partial ^2 \mu (s_0,0)/\partial w^2$|.

Theorem 1 shows that the proposed estimator is a consistent estimator of μ(s, 0) and gives its asymptotic distribution. The bias and limiting variance contain many terms that cannot be estimated easily. In particular, because 0 is not an interior point of the range of w, [0, ∞), estimation of fw(0) is challenging. For statistical inference about μ(s, 0), we suggest adopting a bootstrap approach to construct a simultaneous confidence band (SCB). Kim et al. (2021a) propose a bootstrap SCB that attains the nominal coverage probability, that is, it contains |$(1-\alpha )100\%$| of the bootstrap trajectory estimates over |${\cal T}_2$|. The bootstrap method is implemented here by sampling |$\lbrace ({\bf Y}_i,\hat{w}_i)\rbrace _i$| with replacement.
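A minimal R sketch of this bootstrap step is given below; mu_hat is a hypothetical function that evaluates the kernel estimator in Equation 8 on a grid of s values from a list of (Yi, ŵi) pairs, and the band width is chosen so that the band contains (1 − α)100% of the bootstrap trajectories, in the spirit of Kim et al. (2021a).

```r
# Minimal sketch of the bootstrap SCB: resample subjects (Y_i, w_hat_i) with
# replacement and keep an envelope containing (1 - alpha)100% of the bootstrap
# trajectory estimates. mu_hat() is a hypothetical function evaluating the
# kernel estimator in Equation 8 on a grid s_grid from a list of subject pairs.
bootstrap_scb <- function(data, s_grid, mu_hat, B = 1000, alpha = 0.05) {
  n <- length(data)                                  # data: list of (Y_i, w_hat_i) pairs
  boot_est <- replicate(B, {
    idx <- sample(seq_len(n), n, replace = TRUE)     # resample subjects
    mu_hat(s_grid, data[idx])
  })                                                 # length(s_grid) x B matrix
  est <- mu_hat(s_grid, data)
  max_dev <- apply(abs(boot_est - est), 2, max)      # sup deviation of each bootstrap curve
  c_alpha <- quantile(max_dev, 1 - alpha)            # width covering (1 - alpha)100% of curves
  list(lower = est - c_alpha, upper = est + c_alpha)
}
```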

 
Remark 2
When the longitudinal data are regularly measured at the same times for all subjects, |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})$| in Equation 3 can be redefined measurement-wise as
and it is a metric. As a result, |$D_{{\cal T}_1}({\bf X}_{ir},{\bf X}_{i^{\prime }r})=0$| implies |${\bf X}_{ir}={\bf X}_{i^{\prime }r}$|⁠, and |$\hat{\mu }(s,0)$| is a consistent estimator of E{Y(s)∣x1|$, \ldots ,$|  xp}.
 
Remark 3
When forecasting the future mean response trajectory is of main interest (i.e., tm < s for any |$s\in {\cal T}_2$|), the observed response in |${\cal T}_1$| is often employed as one of the predictors. A simple and special case is to predict the future mean response trajectory based only on its own past observations:

In this case, the proxy variable in Equation 7 is not necessary because the curse of dimensionality is no longer a serious problem: the past response is the only predictor. The future mean trajectory |$E\lbrace Y(s)\mid D_{{\cal T}_1}({\bf Y}_i,{\bf y})=0\rbrace =E[Y(s)\mid E\lbrace Y(t)\mid {\bf y}\rbrace ]$|, where y = {y(t1), …, y(tm)} is observed in |${\cal T}_1$|, can be readily estimated by

3 IMPLEMENTATION

3.1 Estimation of individual trajectories and distance

Estimation of individual trajectories has been actively studied in recent decades (Yao, 2007; Di et al., 2009; Leroux et al., 2018). We employ the nonparametric functional principal component analysis (FPCA) approach to estimate the individual trajectories of the predictors and the response variable (Hall et al., 2008). Under the assumption that the individual trajectory follows a latent Gaussian process model, the Karhunen–Loève expansion is employed:

where ηrk(tij) are orthonormal eigenfunctions, scores ζirk serve as random effects, and εijr is an independent and identically distributed random error. As a result, the individual trajectory is

This approach is flexible and developed for sparse and irregular longitudinal data. The estimated individual trajectories of predictors and response variables, that is,

(9)

are obtained for all subjects (Yao et al., 2005a) in the statistical software R using the FPCA function from the package fdapace (Zhou et al., 2022). Let v1, …, vL and u1, …, uH be evenly spaced time points on |${\cal T}_1$| and |${\cal T}_2$|, respectively. The L2 metrics between individual trajectories can then be estimated, for any i < i′, as
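A minimal R sketch of this step is shown below, assuming Ly and Lt are lists of each subject's measurements and measurement times (hypothetical objects) and that the fdapace output components mu, phi, xiEst, and workGrid are available as in current releases of the package; the pairwise L2 distances are then approximated by a Riemann sum over the working grid.

```r
# Minimal sketch: sparse FPCA fit with fdapace and pairwise L2 distances of the
# fitted trajectories on an evenly spaced grid. Ly and Lt are hypothetical
# lists of per-subject measurements and measurement times.
library(fdapace)

fpca_fit <- FPCA(Ly, Lt, optns = list(dataType = "Sparse", methodSelectK = "FVE"))

# Fitted trajectories on the working grid: mean plus scores times eigenfunctions.
traj_hat <- t(fpca_fit$mu + t(fpca_fit$xiEst %*% t(fpca_fit$phi)))

# Approximate squared pairwise L2 distances by a Riemann sum over the grid.
grid_step <- diff(fpca_fit$workGrid)[1]
D2 <- as.matrix(dist(traj_hat))^2 * grid_step
```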

3.2 Estimation of tuning parameters

We first obtain the estimates of the coefficients in Equation 7, |$\hat{\alpha }_0,\hat{\alpha }_1,\dots ,\hat{\alpha }_p$|⁠, by non-negative least squares regression of |$\hat{{\bf D}}_{Y}^2$| on |$\hat{{\bf D}}_{X_1}^2,\dots ,\hat{{\bf D}}_{X_p}^2$| defined in Section 2.2. To select the bandwidths bw and bs in Equation 8, we adopt the leave-one-subject-out cross-validation approach that minimizes the prediction error of Yik. For each subject i, we obtain the predicted L2 metric |$D_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})$| denoted by |$\tilde{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})$| using the estimated coefficients

Then, we compute |$\hat{w}_{ii^{\prime }}=e_{ii^{\prime }}\sqrt{\tilde{D}_{{\cal T}_2}({\bf Y}_i,{\bf Y}_{i^{\prime }})-\hat{\alpha }_0}$|⁠, where |$e_{ii^{\prime }}$| is a discrete random variable with |$P(e_{ii^{\prime }}=1)=P(e_{ii^{\prime }}=-1)=0.5$|⁠. Then, using all subjects except for the ith subject, we have a prediction of Yi(sik), |$s_{ik}\in {\cal T}_2$| as follows

The optimal bandwidths are selected by minimizing
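In practice, this amounts to a grid search over candidate pairs (bs, bw); the sketch below is a minimal R illustration, where Y is a list of the subjects' response vectors and predict_loso is a hypothetical function returning the leave-one-subject-out predictions of Yi(sik) for subject i at the given bandwidths.

```r
# Minimal sketch of the leave-one-subject-out cross-validation for (b_s, b_w).
# predict_loso() is a hypothetical function returning predictions of Y_i(s_ik)
# for subject i using all other subjects, given candidate bandwidths.
select_bandwidths <- function(Y, bs_grid, bw_grid, predict_loso) {
  cv_error <- function(bs, bw) {
    errs <- sapply(seq_along(Y), function(i) {
      pred <- predict_loso(bs, bw, i)
      mean((Y[[i]] - pred)^2)              # prediction error for subject i
    })
    mean(errs)
  }
  grid <- expand.grid(bs = bs_grid, bw = bw_grid)
  grid$cv <- mapply(cv_error, grid$bs, grid$bw)
  grid[which.min(grid$cv), c("bs", "bw")]  # optimal pair (b_s*, b_w*)
}
```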

3.3 Estimation of mean response trajectory

Recall that xr = {xr(t1), …, xr(tm)}, for r = 1, …, p, are vectors of the observed predictors of interest at times t1, …, tm. Since all the tuning parameters α0, α1, …, αp, bw, and bs are estimated in Section 3.2, the mean response trajectory μ(s, 0), |$s \in {\cal T}_2$|, can be readily predicted through the following steps, with a minimal R sketch given after Step 4:

Step 1: Estimate individual trajectories E{Xr(t)∣xr}, r = 1|$, \ldots ,$|  p, over |$t\in {\cal T}_1=[t_1,t_m]$| with the fitted model in Equation 9.

Step 2: Compute the distances from the individual predictor trajectories based on x1, …, xp to those of the subjects in the longitudinal data:

Step 3: Compute

|$\hat{w}_i=e_i\sqrt{\hat{\alpha }_1\hat{D}_{{\cal T}_1}^2({\bf X}_{i1},{\bf x}_{1})+\cdots +\hat{\alpha }_p\hat{D}_{{\cal T}_1}^2({\bf X}_{ip},{\bf x}_{p})}$|⁠.

Step 4: μ(s, 0) can be predicted with |$b_s^{*}$| and |$b_w^{*}$| by the proposed approach
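A minimal R sketch of Step 4 is given below, assuming Gaussian kernels; Y and S are hypothetical lists of each subject's responses and measurement times in |${\cal T}_2$|, and w_hat is the vector of ŵi values from Step 3.

```r
# Minimal sketch of the proposed kernel estimator in Step 4, assuming Gaussian
# kernels. Y and S are hypothetical lists of each subject's responses and
# measurement times in T_2, and w_hat holds the values w_hat_i from Step 3.
predict_mu <- function(s_grid, Y, S, w_hat, bs, bw) {
  gauss <- function(u) exp(-u^2 / 2)
  sapply(s_grid, function(s) {
    num <- 0; den <- 0
    for (i in seq_along(Y)) {
      Kw <- gauss(w_hat[i] / bw)            # similarity weight for subject i
      Ks <- gauss((S[[i]] - s) / bs)        # time weights for subject i's measurements
      num <- num + sum(Ks * Y[[i]]) * Kw
      den <- den + sum(Ks) * Kw
    }
    num / den
  })
}
```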

4 REAL DATA ANALYSIS

4.1 Data description and objective of analysis

The Framingham Heart Study (FHS) is one of the most well-known longitudinal investigations; it began in 1948. The main goal of the study is to identify risk factors for cardiovascular diseases, including obesity, hypertension, and myocardial infarction, and to estimate the corresponding growth curves over time. The FHS consists of 5209 participants in the original Framingham cohort and 5124 participants in the second-generation cohort, including spouses, who took physical examinations every 3 or 4 years; participants may miss measurements for various reasons, and therefore the measurements are sparse and irregular (O'Donnell and Elosua, 2008).

In our analysis, we demonstrate the functionality of the proposed method by evaluating the predictive significance of body mass index (BMI), high-density lipoprotein (HDL), systolic blood pressure (SBP), diastolic blood pressure (DBP), and total cholesterol (TOT) over |${\cal T}_1=[25,40]$| in forecasting the BMI trajectories over |${\cal T}_2=[41,60]$|. We chose this specific time period so that participants entering middle age from young adulthood could adjust their diet or receive timely medical care if overweight or obesity were predicted. Since the five predictors have right-skewed distributions, a logarithm transformation is applied; the transformed variables are denoted by LBMI, LHDL, LSBP, LDBP, and LTOT, respectively. To obtain a robust interpolation when estimating individual trajectories by sparse FPCA, the cohort data are pre-processed so that each subject has at least 7 measurements over |${\cal T}=[25,60]$|, so that the domain plane is densely populated with design points, as shown in Figure 1 (Yao et al., 2005b).
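For reference, the pre-processing just described amounts to restricting to ages 25–60, log-transforming the five markers, and dropping subjects with fewer than 7 visits. The sketch below is a minimal R illustration on a hypothetical long-format data frame fhs with columns id, age, bmi, hdl, sbp, dbp, and tot.

```r
# Minimal sketch of the pre-processing, on a hypothetical long-format data
# frame fhs with columns id, age, bmi, hdl, sbp, dbp, tot.
fhs <- subset(fhs, age >= 25 & age <= 60)                 # restrict to T = [25, 60]
for (v in c("bmi", "hdl", "sbp", "dbp", "tot")) {
  fhs[[paste0("l", v)]] <- log(fhs[[v]])                  # right-skewed, so log-transform
}
visits <- table(fhs$id)
fhs <- subset(fhs, id %in% names(visits)[visits >= 7])    # keep subjects with >= 7 visits
```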

FIGURE 1

Design plot of 541 subjects in Framingham cohort data for sparse FPCA over |${\cal T} = [25,60]$|⁠. The plot contains assembled pairs (Ageij, Ageik), i = 1, …, n, j, k = 1, …, mi, where mi is the number of measurements from the ith subject.

We compare our method with three functional regression models: the function-on-function linear regression (FLR) model

the function-on-function additive regression (FAR) model in Scheipl et al. (2015)

and a dynamic prediction in Leroux et al. (2018) based on the functional concurrent regression (FCR) model

The approach of Leroux et al. (2018) predicts subject-specific future trajectories using a mixed-effects representation based on their past data. We remark that, while our proposed method and the FLR and FAR models are marginal models, the dynamic FCR model is a mixed-effects model that takes random effects into account. Furthermore, our proposed method and the FLR and FAR models can employ the response variable measured in |${\cal T}_1$| directly as a predictor; in contrast, the dynamic FCR model incorporates the response information in |${\cal T}_1$| via the functional random effect b(t). The FLR and FAR models are implemented using the refund package (Goldsmith et al., 2023) in R, and the FCR model is implemented using the fcr package (Leroux et al., 2022).

4.2 Results

The point estimates by non-negative least squares and their 95|$\%$| bootstrap confidence intervals for the coefficients in Equation 7 are shown in Table 1. The 95|$\%$| bootstrap confidence intervals are obtained from 1000 bootstrap samples drawn from the distance data described in Section 2.2. We demonstrate the performance of the proposed bootstrap confidence interval via simulation studies, with the results reported in the supplementary materials. It is expected that lower HDL and higher SBP, BMI, and TOT in young adulthood result in a higher risk of obesity in middle age.

TABLE 1

Coefficient estimation for the prediction of future BMI trajectory.

Predictor distance            Parameter   Estimate   95% Bootstrap CI
Intercept                     α0          0.0199     (0.0195, 0.0203)
|$\hat{{\bf D}}_{LSBP}^2$|    α1          0.0445     (0.0371, 0.0521)
|$\hat{{\bf D}}_{LHDL}^2$|    α2          0.0011     (0.0003, 0.0021)
|$\hat{{\bf D}}_{LBMI}^2$|    α3          0.8630     (0.8568, 0.8693)
|$\hat{{\bf D}}_{LDBP}^2$|    α4          0.0000     (0.0000, 0.0000)
|$\hat{{\bf D}}_{LTOT}^2$|    α5          0.0065     (0.0044, 0.0088)

Based on the results in Table 1, we can conclude that the trajectories of SBP, BMI, and TOT in |${\cal T}_1$| are highly significant predictors of the trajectories of BMI in |${\cal T}_2$|. As expected, past BMI has the strongest association with future BMI. Even though the TOT effect is significant, it is minimal compared to the effects of BMI and SBP. Thus, the future BMI trajectory is predicted mainly by the past trajectories of BMI and SBP. In contrast, the CI for the parameter regarding HDL is barely above 0, and the estimated coefficient for DBP is effectively 0. This can be explained by the facts that (i) HDL is referred to as good cholesterol and hence is not considered a risk factor for an increase in BMI, and (ii) SBP and DBP are usually correlated, but SBP serves as a more significant indicator of obesity-related conditions, so the effect of DBP is ruled out by our model. We refer to Nielsen et al. (1997), He and Whelton (1999), and Mourad (2008) for more detailed discussion.

The predicted future BMI trajectories of 4 patients over |${\cal T}_2=[41,60]$|, computed by the proposed kernel estimator (KE) and the FCR, FLR, and FAR models, are displayed in Figure 2. These 4 patients are selected from a testing set to visually compare model performance. Because all 4 approaches capture the overall trend of BMI over |${\cal T}_2$| reasonably well, we further conduct 5 repeated 10-fold cross-validations to quantify their performance via the average and standard deviation of mean squared prediction errors (AMSPE and SD, respectively) from the 50 splits in total. The results are reported in Table 2. The four methods are quite comparable, but FCR performs best in terms of AMSPE. The proposed method predicts more accurately (with a smaller AMSPE) than FLR and FAR, but less precisely (with a larger SD) than its competitors.

FIGURE 2

Predicted BMI trajectories over |${\cal T}_2=[41,60]$| by the proposed approach (solid), FCR (dotted and dashed), FLR (dashed), and FAR (dotted).

TABLE 2

Average and standard deviation (SD) of MSPE from 5 repeated 10-fold cross validations (×100) in the future BMI prediction.

                 Full data                            Reduced data
         KE      FCR     FLR     FAR          KE      FCR     FLR     FAR
AMSPE    1.057   0.985   1.086   1.094        1.043   0.981   1.072   1.084
SD       0.232   0.227   0.200   0.212        0.162   0.197   0.201   0.194

The kernel-based prediction is vulnerable to unusual predictor values because it predicts the response based on the similarity of predictors without imposing a structural assumption on the relationship between the response and the predictors. A few subjects have extremely large BMI values in |${\cal T}_1$|, and their poor predictions lead to a relatively large standard deviation. After removing subjects with LBMI > 4 (BMI > 54.6) in |${\cal T}_1$|, we perform the 5 repeated 10-fold cross-validations again, and the results are also reported in Table 2. These results show that our proposed method may outperform its competitors in predicting future BMI trajectories for subjects with more ordinary past biometric trajectories, but extra caution is required if a subject's past biometric information is extraordinary. The fact that FCR performs very well on these real data, together with the simulation studies in the next section, suggests that the concurrent relationship between the response BMI and the predictors is approximately linear.

We provide more comprehensive figures for the four subjects in Figure 2. Figures 3 and 4 show observed values of SBP, DBP, TOT, HDL, and BMI, as well as their estimated trajectories in |${\cal T}_1=[25,40]$|. They also show observed values of BMI, its predicted trajectories, and 95% simultaneous confidence bands computed by the proposed method in |${\cal T}_2=[41,60]$|. In Figure 3, the subjects have similar SBP, TOT, and BMI trajectories, and this results in similar future BMI trajectories. However, the sudden increase in BMI of Subject 48 in |${\cal T}_2$| is hard to predict from the patient's past biometric information, so none of the four methods successfully predicts it in Figure 2. We remark that the predicted trajectories are conditional means of the future response and the band is a confidence band; thus, there may be some discrepancy between actual observations and the predicted trajectories. It should also be noted that the SCBs become wider as age increases, which makes sense because uncertainty grows when predicting further into the future. In Figure 4, we observe more prediction variability for obese subjects, and even though the BMI of Subject 36 decreased in the past, the proposed method successfully predicts the increasing trend in future BMI based on the increasing trends in the past trajectories of the other predictors.

FIGURE 3

Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in |${\cal T}_1=[25,40]$|⁠, and observed values of BMI and its predicted BMI trajectory (solid) over |${\cal T}_2=[41,60]$| by proposed approach with 95|$\%$| bootstrap SCB (dotted).

FIGURE 4

Observed values of SBP, DBP, TOT, HDL, and BMI and their estimated trajectories (dashed) in |${\cal T}_1=[25,40]$|⁠, and observed values of BMI and its predicted BMI trajectory (solid) over |${\cal T}_2=[41,60]$| by proposed approach with 95|$\%$| bootstrap SCB (dotted).

5 SIMULATION STUDIES

In this section, we conduct simulation studies to compare the predictive performance of the proposed kernel estimator with the three competing methods and to corroborate the consistency of the proposed method established in Theorem 1. The training sets are generated over |${\cal T}=[1,50]$| by the following three concurrent time-varying coefficient regression models: a linear model in Equation 10, a quadratic model in Equation 11, and an interaction model in Equation 12:

(10)
(11)
(12)

where μX(t) = t and μZ(t) = 5cos(2πt/50) are marginal mean functions of measurement time t, and α0i ∼ U(−3.5, 3.5) is a random subject effect, which induces dependence between the predictors Xi(t) and Zi(t). The irregular measurement times are generated by tij = 5(j − 1) + DU(1, 5), j = 1, …, 10, with a discrete uniform distribution DU(·, ·). The discrete uniform distribution is employed because the refund package (Goldsmith et al., 2023) in R requires measurements at discrete time points in order to compute the covariance matrix for the FLR and FAR models. The errors Lij, Gij, and Dij are generated from Gaussian processes to account for within-subject dependence, that is,

where the covariance matrix Σi is defined by the squared exponential covariance function k(t, t′) = σ2exp{−(t − t′)2/(2l2)} with variance parameter σ2 = 1 and length-scale parameter l = 50. Two additional distributions are considered for the errors of Yi(tij), that is, Dij: a multivariate t-distribution with 10 degrees of freedom and a lognormal distribution obtained by exponentiating a Gaussian process Di with variance σ2 = 0.25.
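The within-subject Gaussian process errors can be generated as in the minimal R sketch below, which uses the squared exponential covariance with σ2 = 1 and l = 50 and the rmvnorm function from the mvtnorm package (an implementation choice not tied to this article).

```r
# Minimal sketch: generate within-subject Gaussian process errors with the
# squared exponential covariance k(t, t') = sigma^2 * exp(-(t - t')^2 / (2 l^2)).
library(mvtnorm)

gp_error <- function(times, sigma2 = 1, l = 50) {
  Sigma <- sigma2 * exp(-outer(times, times, "-")^2 / (2 * l^2))
  as.vector(mvtnorm::rmvnorm(1, sigma = Sigma))
}

# Toy usage: errors at one subject's 10 irregular measurement times
tij <- 5 * (0:9) + sample(1:5, 10, replace = TRUE)   # t_ij = 5(j - 1) + DU(1, 5)
eij <- gp_error(tij)
```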

Testing sets are generated in |${\cal T}_1=[1,25]$| via the data-generating mechanisms in Equations 10, 11, and 12. Then, using the generated training longitudinal data, we predict the future response Yi(t) for |$t \in {\cal T}_2=[26,50]$| based on the past observed predictors Yi(tij), Xi(tij), and Zi(tij) for |$t_{ij} \in {\cal T}_1=[1,25]$|. It is worthwhile to note that FLR, FAR, and the proposed approach are misspecified because the data are generated from concurrent regression models. FCR may outperform the others for the linear model in Equation 10 because it is correctly specified there, but it is also tested under the incorrectly specified scenarios in Equations 11 and 12. We train the proposed method and its competitors with sample sizes of 300, 500, and 800 and test their prediction performance on 50 subjects. The optimal bandwidths are obtained by the proposed leave-one-subject-out cross-validation in Section 3.2.

To evaluate the prediction performance, we conduct 500 simulation runs and adopt the same criteria, the average and standard deviation of the mean squared prediction error (AMSPE and SD), as in the real data analysis. The results with Gaussian process errors are reported in Table 3. The AMSPE and corrected SD (CSD) of our method decrease as the number of subjects increases, which supports the consistency of the proposed method stated in Theorem 1. The corrected AMSPE (CAMSPE) is computed after dropping extreme MSPEs (MSPE > 5), which are observed especially for FCR and FAR due to convergence issues (Scheipl et al., 2015). A considerable difference between AMSPE and CAMSPE indicates the existence of prediction failures due to the convergence issue. When the sample size is relatively small, FLR and FAR sometimes fail in prediction, but the proposed approach does not fail for any data set generated in this simulation study. In contrast, FCR struggles with the convergence issue most frequently, with failure rates mostly between 1 and 3%, because the number of observations within subjects, that is, 10, is relatively small for computing the random effects.
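For clarity, the corrected summaries are obtained from the raw MSPE values as in the minimal R sketch below, where mspe is a hypothetical vector of MSPE values over the simulation runs and runs with MSPE > 5 are treated as prediction failures.

```r
# Minimal sketch: corrected summaries after dropping prediction failures
# (runs with MSPE > 5), where mspe is a hypothetical vector of MSPE values.
summarize_mspe <- function(mspe, threshold = 5) {
  kept <- mspe[mspe <= threshold]
  c(AMSPE  = mean(mspe),
    SD     = sd(mspe),
    CAMSPE = mean(kept),
    CSD    = sd(kept),
    FR     = 100 * mean(mspe > threshold))   # failure rate in percent
}
```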

TABLE 3

Normal error case: average (AMSPE), standard deviation (SD), corrected average (CAMSPE), and corrected standard deviation (CSD) of MSPE, when n = 300, 500, and 800 with the failure rate (FR).

Model         n    Method   AMSPE     SD         CAMSPE    CSD       FR (%)
Linear        300  KE       0.4215    0.1126     0.4228    0.1129    0.0
                   FCR      0.5773    3.2819     0.3164    0.5751    2.0
                   FLR      0.4163    0.2367     0.4176    0.2375    0.0
                   FAR      0.4148    0.1570     0.4160    0.1563    0.0
              500  KE       0.3785    0.1197     0.3782    0.1202    0.0
                   FCR      0.3738    0.9245     0.3092    0.6362    1.0
                   FLR      0.4471    0.6310     0.4145    0.1480    0.2
                   FAR      0.4297    0.2785     0.4172    0.1508    0.2
              800  KE       0.3393    0.1080     0.3390    0.1084    0.0
                   FCR      0.4211    1.4461     0.2962    0.5822    1.0
                   FLR      0.4351    0.1385     0.4346    0.1390    0.0
                   FAR      0.4351    0.1358     0.4347    0.1363    0.0
Quadratic     300  KE       0.5219    0.1092     0.5232    0.1097    0.0
                   FCR      0.6545    3.2673     0.3787    0.6131    2.2
                   FLR      0.4728    0.2903     0.4618    0.1630    0.2
                   FAR      0.5049    0.5968     0.4674    0.1694    0.4
              500  KE       0.4803    0.1070     0.4804    0.1073    0.0
                   FCR      0.4401    0.9373     0.3761    0.6530    1.0
                   FLR      0.5921    2.4062     0.4767    0.1680    0.2
                   FAR      11.9083   250.8289   0.4782    0.1633    0.6
              800  KE       0.4350    0.0953     0.4346    0.0956    0.0
                   FCR      0.4760    1.4241     0.3540    0.5997    0.2
                   FLR      0.4879    0.1645     0.4874    0.1652    0.0
                   FAR      0.4851    0.1448     0.4848    0.1453    0.0
Interaction   300  KE       0.6624    0.1307     0.6631    0.1317    0.0
                   FCR      1.6664    4.5351     1.2379    0.6994    3.2
                   FLR      0.8295    4.4788     0.5753    0.2028    0.4
                   FAR      0.7765    2.7949     0.5921    0.2531    0.6
              500  KE       0.6251    0.1349     0.6255    0.1353    0.0
                   FCR      1.3794    1.3184     1.2012    0.6185    2.6
                   FLR      0.5857    0.1838     0.5842    0.1831    0.0
                   FAR      0.6161    0.4347     0.5932    0.2476    0.4
              800  KE       0.5791    0.1139     0.5802    0.1138    0.0
                   FCR      1.4297    1.9109     1.2040    0.6556    2.2
                   FLR      0.5832    0.1738     0.5832    0.1753    0.0
                   FAR      0.5816    0.1733     0.5816    0.1747    0.0

In addition, the CSD of the proposed method is substantially smaller than those of its competitors. In terms of CAMSPE, FCR performs best under the linear and quadratic models, while the proposed approach improves quickly as the sample size grows, eventually becoming comparable to FCR and outperforming FLR and FAR. Like any kernel-based approach, the proposed method's performance relies on the quality and quantity of local observations, that is, subjects whose trajectories are similar to the predictor trajectories of interest. As the sample size increases, a wider variety of trajectories is observed in the training set, which leads to better prediction by our method. None of the 4 methods performs well under the interaction model because the interaction term is not included as a predictor, but FCR struggles more than the others since it is built on a time-varying coefficient model that relies relatively more on a linear structure. The results with t-distribution errors and lognormal errors are similar and are reported in the supplementary materials. In summary, we assert from these extensive simulation studies that the proposed method provides robust and reliable estimation even with a small sample, while maintaining prediction accuracy comparable to the existing methods for small or medium sample sizes. With a large sample, it can outperform the competing methods by taking advantage of the abundant patterns of trajectories.

6 DISCUSSION

In this article, a nonparametric kernel approach has been developed to predict the mean response trajectory by integrating sparse and irregular longitudinal covariates. Our methodology provides a new perspective for predictive longitudinal modeling by utilizing subject-wise similarities. The major contribution is that we address the long-standing curse of dimensionality in functional kernel regression with a novel multiplicative model based on the multivariate Gaussian kernel. Compared with other functional data analysis tools, the proposed kernel estimator is more interpretable and easier to implement. The application to the FHS and the simulation studies show that our method is preferable, especially when the sample size is large enough for the training set to contain various patterns of predictor trajectories. The L2 metric is employed for the convenience of estimating the distance between trajectories conditional on sparse measurements, but semi-metric approaches such as dynamic time warping could also serve as useful alternatives. In addition, since the distance data in Section 2.2 can be very large (n(n − 1)/2 ≫ p), modern statistical models such as deep neural networks could be used to capture a nonlinear relationship between the response trajectory and the predictor trajectories. Because of the theoretical difficulty, we use a bootstrap-based SCB to make inferences about μ(s, 0). Establishing a maximal deviation result for |$\hat{\mu }(s,0)$| over |$s \in {\cal T}_2$| or validating the bootstrap approach is potential future work.

FUNDING

None declared.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

The Framingham heart study cohort data are available in the Biologic Specimen and Data Repository Information Coordinating Center of the National Heart, Lung, and Blood Institute at https://biolincc.nhlbi.nih.gov/studies/framoffspring/.

References

Di, C. Z., Crainiceanu, C. M., Caffo, B. S. (2009). Multilevel functional principal component analysis. The Annals of Applied Statistics, 3, 458–488.

Fan, Y., Foutz, N., James, G. M., Jank, W. (2014). Functional response additive model estimation with online virtual stock markets. The Annals of Applied Statistics, 8, 2435–2460.

Ferraty, F., Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. New York: Springer.

Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Di, C., Gellar, J., et al. (2023). refund: Regression with Functional Data. CRAN.

He, J., Whelton, P. K. (1999). Elevated systolic blood pressure and risk of cardiovascular and renal disease: overview of evidence from observational epidemiologic studies and randomized controlled trials. American Heart Journal, 138, S211–S219.

Hall, P., Müller, H. G., Yao, F. (2008). Modelling sparse generalized longitudinal observations with latent Gaussian processes. Journal of the Royal Statistical Society: Series B, 70, 703–723.

Ivanescu, A. E., Staicu, A. M., Scheipl, F., Greven, S. (2015). Penalized function-on-function regression. Computational Statistics, 30, 539–568.

James, G. M., Silverman, B. W. (2005). Functional adaptive model estimation. Journal of the American Statistical Association, 100, 565–576.

Kim, J. S., Staicu, A. M., Maity, A., Carroll, R. J., Ruppert, D. (2018). Additive function-on-function regression. Journal of Computational and Graphical Statistics, 27, 234–244.

Kim, S., Cho, H., Kim, M. O. (2021a). Predictive generalized varying-coefficient longitudinal model. Statistics in Medicine, 40, 6243–6259.

Kim, S., Cho, H., Wu, C. (2021b). Risk-predictive probabilities and dynamic nonparametric conditional quantile models for longitudinal analysis. Statistica Sinica, 31, 1415–1439.

Leroux, A., Xiao, L., Crainiceanu, C., Checkley, W. (2018). Dynamic prediction in functional concurrent regression with an application to child growth. Statistics in Medicine, 37, 1376–1388.

Leroux, A., Xiao, L., Crainiceanu, C., Checkley, W. (2022). fcr: Functional Concurrent Regression for Sparse Data. CRAN.

McLean, M. W., Hooker, G., Staicu, A. M., Scheipl, F., Ruppert, D. (2014). Functional generalized additive models. Journal of Computational and Graphical Statistics, 23, 249–269.

Müller, H. G., Yao, F. (2008). Functional additive models. Journal of the American Statistical Association, 103, 1534–1544.

Mourad, J. J. (2008). The evolution of systolic blood pressure as a strong predictor of cardiovascular risk and the effectiveness of fixed-dose ARB/CCB combinations in lowering levels of this preferential target. Vascular Health and Risk Management, 4, 1315–1325.

Nielsen, W. B., Lindenstrøm, E., Vestbo, J., Jensen, G. B. (1997). Is diastolic hypertension an independent risk factor for stroke in the presence of normal systolic blood pressure in the middle-aged and elderly? American Journal of Hypertension, 10, 634–639.

O'Donnell, C. J., Elosua, R. (2008). Cardiovascular risk factors. Insights from Framingham Heart Study. Revista Española de Cardiología (English Edition), 61, 299–310.

Peng, J., Müller, H. G. (2008). Distance-based clustering of sparsely observed stochastic processes, with applications to online auctions. The Annals of Applied Statistics, 2, 1056–1077.

Scheipl, F., Staicu, A. M., Greven, S. (2015). Functional additive mixed models. Journal of Computational and Graphical Statistics, 24, 477–501.

Sun, X., Du, P., Wang, X., Ma, P. (2018). Optimal penalized function-on-function regression under a reproducing kernel Hilbert space framework. Journal of the American Statistical Association, 113, 1601–1611.

Wang, S., Kim, S., Cho, H., Chang, W. (2023). longke: Nonparametric Predictive Model for Sparse and Irregular Longitudinal Data. CRAN.

Wu, C. O., Tian, X. (2013). Nonparametric estimation of conditional distributions and rank-tracking probabilities with time-varying transformation models in longitudinal studies. Journal of the American Statistical Association, 108, 971–982.

Wu, C. O., Tian, X., Tian, L., et al. (2020). Nonparametric estimation of risk tracking indices for longitudinal studies. Statistical Methods in Medical Research, 29, 481–497.

Yao, F. (2007). Functional principal component analysis for longitudinal and survival data. Statistica Sinica, 17, 965–983.

Yao, F., Müller, H. G., Wang, J. L. (2005a). Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100, 577–590.

Yao, F., Müller, H. G., Wang, J. L. (2005b). Functional linear regression analysis for longitudinal data. Annals of Statistics, 33, 2873–2903.

Zhou, Y., Bhattacharjee, S., Carroll, C., Chen, Y., Dai, X., Fan, J., et al. (2022). fdapace: Functional Data Analysis and Empirical Dynamics. CRAN.
