Abstract

The least trimmed squares (LTS) estimator is a popular robust regression estimator. It finds a subsample of h ‘good’ observations among n observations and applies least squares on that subsample. We formulate a model in which this estimator is maximum likelihood. The model has ‘outliers’ of a new type, where the outlying observations are drawn from a distribution with values outside the realized range of h ‘good’, normal observations. The LTS estimator is found to be h^{1/2}-consistent and asymptotically standard normal in the location-scale case. Consistent estimation of h is discussed. The model differs from the commonly used ϵ-contamination models and opens the door for statistical discussion on contamination schemes, new methodological developments on tests for contamination as well as inferences based on the estimated good data.

1 Introduction

The LTS estimator suggested by Rousseeuw (1984) is a popular robust regression estimator. It is defined as follows. Consider a sample with n observations, where some are thought to be ‘good’ and others are ‘outliers’. The user specifies that there are h ‘good’ observations. The LTS estimator finds the subsample of h observations with the smallest residual sum of squares. Rousseeuw (1984) developed LTS in the tradition of Huber (1964) and Hampel (1971), who were instrumental in formalizing robust statistics. Huber suggested a framework of i.i.d. errors from an ϵ-contaminated normal distribution. Hampel formalized robustness and breakdown points. The present work, with its focus on maximum likelihood, deviates from this tradition.

Specifically, we propose a model in which the LTS estimator is maximum likelihood, a new approach to robust statistics. In this model, we first draw h ‘good’ regression errors from a normal distribution. Conditionally on these ‘good’ errors, we draw n−h ‘outlier’ errors from a distribution with support outside the range of the drawn ‘good’ errors. The model is semi-parametric, so we apply a general notion of maximum likelihood. We provide an asymptotic theory for a location-scale model and find that the LTS estimator is h^{1/2}-consistent and asymptotically standard normal. More than 50% contamination can be allowed under mild regularity conditions on the distribution function for the ‘outliers’. Interestingly, the associated scale estimator does not require a consistency factor. In practice, h is typically unknown. We suggest a consistent estimator for the proportion of ‘good’ observations, h/n.

The approach of asking in which model the LTS estimator is maximum likelihood is similar to that taken by Gauss in 1809 (Hald, 2007, Section 5.5, 7.2). In the terminology of Fisher (1922), Gauss asked in which continuous i.i.d. location model is the arithmetic mean the maximum-likelihood estimator and found the answer to be the normal model. Maximum likelihood often brings a host of attractive features. The model provides interpretation and reveals the circumstances under which an estimator works well in terms of nice distributional results and optimality properties. Further, we have a framework for testing the goodness-of-fit, which leads to the possibility of first refuting and then improving a model.

To take advantage of these attractive features of the likelihood framework, we follow Gauss and suggest a model in which the LTS estimator is maximum likelihood. The model is distinctive in that the errors are not i.i.d. Rather, the h ‘good’ errors are i.i.d. normal, whereas the n−h ‘outlier’ errors are i.i.d., conditionally on the ‘good’ errors, with distributions assigning zero probability to the realized range of the ‘good’ errors. When h = n, we have a standard i.i.d. normal model, just as the LTS estimator reduces to the ordinary least squares (OLS) estimator. The model is semi-parametric, so we use an extension of traditional likelihoods, in which we compare pairs of probability measures (Kiefer & Wolfowitz, 1956) and consider probabilities of small hyper-cubes including the data point (Fisher, 1922; Scholz, 1980).

In practice, it is of considerable interest to develop a theory for inference for LTS. Within a framework of i.i.d. ϵ-contaminated errors, the asymptotic theory depends on the contamination distribution and the scale estimator requires a consistency correction (Butler, 1982; Čížek, 2005; Croux & Rousseeuw, 1992; Rousseeuw, 1984; Víšek, 2006). Since the contamination distribution is unknown in practice, inference is typically done using the asymptotic distribution of the LTS estimator derived as if there is no contamination. This seems fine for an infinitesimal deviation from the central normal model (Huber & Ronchetti, 2009, Section 12). However, these approaches would lead to invalid inference in case of stronger contamination—see Section 7.1. Within the present framework, the asymptotic theory is simpler. Specifically, we derive the asymptotic properties of the LTS estimator for a location-scale version of the presented model and find that the LTS estimator has the same asymptotic theory as the infeasible OLS estimator computed from the ‘good’ data, when it is known which data are ‘good’. As the asymptotic distribution does not depend on the contamination distribution, inference is much simpler.

In regression, a major concern is that the OLS estimator may be very sensitive to inclusion or omission of particular data points, referred to as ‘bad’ leverage points. LTS provides one of the first high-breakdown point estimators suggested for regression that also avoids the problem of ‘bad’ leverage. The presented model allows ‘bad’ leverage points.

Another key feature of the LTS model presented here is a separation of ‘good’ and ‘outlying’ errors, where the ‘outliers’ are placed outside the realized range of the ‘good’, normal observations. This way, the asymptotic error in estimating the set of ‘good’ observations is so small that it does not influence the asymptotic distribution of the LTS estimator. This throws light on the discussion regarding estimators’ ability to separate overlapping populations of ‘good’ and ‘outlying’ observations (Doornik, 2016; Riani et al., 2014).

LTS is widely used in its own right and often as a starting point for algorithms such as the MM estimator (Yohai, 1987) and the Forward Search (Atkinson et al., 2010). Many variants of LTS have been developed: nonlinear regression in time series (Čížek, 2005), algorithms for fraud detection (Rousseeuw et al., 2019) and sparse regression (Alfons et al., 2013).

In all cases, the practitioner must choose h, the number of ‘good’ observations. In our reading, this remains a major issue in robust statistics. We propose a consistent estimator for the proportion of ‘good’ observations, h/n, in a location-scale model and study its finite sample properties by simulation. We apply it in the empirical example in Section 8.

The least median of squares (LMS) estimator, also suggested by Rousseeuw (1984), is closely related to LTS. The LMS estimator seemed numerically more attractive than LTS until the advent of the fast LTS approximation to LTS (Rousseeuw & van Driessen, 2000). Nonetheless, LMS remains of considerable interest. Replacing the normal distribution in the LTS model with a uniform distribution gives a model in which LMS is maximum likelihood. In the Online supplementary material, we show that the LMS estimator is h-consistent and asymptotically Laplace in the location-scale case. This is at odds with the slow n^{1/3} consistency rate found in the context of i.i.d. models.

We start by presenting the LTS estimator in Section 2. The LTS regression model is given in Section 3. The general maximum likelihood concept and its application to LTS are presented in Section 4 with details in Appendix  A. Section 5 presents an asymptotic theory for LTS in the location-scale case with proofs in Appendix  B. Estimation of h is discussed in Section 6. Monte Carlo simulations are presented in Section 7. An empirical illustration is given in Section 8.

In a supplement, the LMS estimator is analysed and detailed derivations of some identities in the LTS proof are given. All codes for simulations and empirical illustration are available from https://users.ox.ac.uk/˜nuff0078/Discuss and as supplementary files to the paper.

Notation: Vectors are column vectors. The transpose of a vector v is denoted v'.

2 The LTS estimator

The available data are a scalar y_i and a p-vector of regressors x_i for i = 1,…,n. We consider a regression equation y_i = β'x_i + σε_i with regression parameter β and scale σ.

The least trimmed squares (LTS) estimator suggested by Rousseeuw (1984) is defined as follows. Given a value of β, the residuals are r_i(β) = y_i − β'x_i. The ordered squared residuals are denoted r_{(1)}^2(β) ≤ ⋯ ≤ r_{(n)}^2(β). The user chooses an integer h ≤ n. Given that choice, the sum of the h smallest squared residuals is obtained. Minimizing over β gives the LTS estimator

(2.1) \hat\beta = \arg\min_\beta \sum_{i=1}^{h} r_{(i)}^2(\beta).

The LTS minimization classifies the observations as ‘good’ or ‘outliers’. The set of indices of the h ‘good’ observations is an h-subset of (1,…,n). We let ζ̂ denote the set of indices i of the estimated ‘good’ observations satisfying r_i^2(β̂) ≤ r_{(h)}^2(β̂). Rousseeuw and van Driessen (2000) point out that β̂ is a minimizer over least-squares estimators, that is β̂ = β̂_ζ̂, where

(2.2) \hat\beta_\zeta = \Big(\sum_{i\in\zeta} x_i x_i'\Big)^{-1} \sum_{i\in\zeta} x_i y_i.

In the model proposed below, the maximum-likelihood estimator for the scale is the residual variance of the estimated ‘good’ observations

(2.3) \hat\sigma^2 = h^{-1} \sum_{i\in\hat\zeta} (y_i - \hat\beta' x_i)^2.

We show the consistency of σ̂² in the location-scale case. The problem of estimating the scale is intricately linked to the choice of model for the innovations ε_i. Previously, Croux and Rousseeuw (1992) considered σ̂² in the context of a regression model with i.i.d. innovations with distribution function F. For that model, they found that σ̂² should be divided by a consistency factor defined as the conditional variance of ε_i given that |F(ε_i) − 1/2| ≤ h/(2n). In practice, F is unknown, so the normal distribution is often chosen for the consistency factor.
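As an illustration of (2.1)–(2.3), the estimator can be computed exactly for a very small sample by enumerating all h-subsets and applying least squares to each. The following is a minimal base-R sketch with simulated, illustrative data; it ignores the no-repetition constraint discussed in Section 4 and is only feasible for small n.

```r
## Exact LTS by exhaustive search: run OLS on every h-subset and keep the
## subset with the smallest residual sum of squares (equivalent to (2.1)).
set.seed(1)
n <- 12; h <- 9
x <- cbind(1, rnorm(n))                      # regressors, including an intercept
y <- drop(x %*% c(1, 2)) + rnorm(n)
y[10:12] <- y[10:12] + 10                    # three contaminated observations

best <- list(rss = Inf)
for (idx in combn(n, h, simplify = FALSE)) {
  fit <- lm.fit(x[idx, , drop = FALSE], y[idx])
  rss <- sum(fit$residuals^2)
  if (rss < best$rss) best <- list(rss = rss, idx = idx, beta = fit$coefficients)
}
beta_hat   <- best$beta      # LTS estimator, the minimizer over subsets as in (2.2)
sigma2_hat <- best$rss / h   # residual variance (2.3), no consistency factor
zeta_hat   <- best$idx       # estimated 'good' observations
```

For realistic sample sizes, the fast LTS algorithm of Rousseeuw and van Driessen (2000), as implemented in ltsReg, replaces this exhaustive search.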

3 The LTS regression model

We formulate the LTS model where h ‘good’ errors are assumed i.i.d. normal, while n−h ‘outliers’ are located outside the realized range of the ‘good’ errors. This differs from the ϵ-contaminated normal model of Huber (1964), where i.i.d. errors are normal with probability 1−ϵ and otherwise drawn from a contamination distribution. The supports of the normal and the contamination distributions overlap in the ϵ-contamination case, whereas ‘good’ and ‘outlying’ observations are separated in the below LTS model.

LTS regression model

 
Model 3.1

Consider the regression model y_i = β'x_i + σε_i for data y_i, x_i with i = 1,…,n. Let h ≤ n be given. Let x_1,…,x_n be deterministic. Let ζ be a set with h elements from 1,…,n.

For i ∈ ζ, let ε_i be i.i.d. N(0,1) distributed.

For j ∉ ζ, let ξ_j be independent with distribution functions G_j(z) for z ∈ ℝ, where G_j are continuous at 0. The ‘outlier’ errors are defined by

(3.1) \varepsilon_j = \Big(\max_{i\in\zeta} \varepsilon_i + \xi_j\Big) 1(\xi_j > 0) + \Big(\min_{i\in\zeta} \varepsilon_i + \xi_j\Big) 1(\xi_j < 0).

The parameters are β ∈ ℝ^{dim x}, σ > 0, ζ which is any h-subset of 1,…,n, and G_j which are n−h arbitrary distributions on ℝ that are continuous at 0.

Finally, suppose Σ_{i∈ζ} x_i x_i' is invertible for all ζ.

The ‘outlier’ distributions G_j are parameters in the LTS model. They are constrained to be continuous at zero to avoid overlap of ‘good’ and ‘outlier’ errors. Although the regressors are deterministic in Model 3.1, randomness of x_1,…,x_n can easily be accommodated by a conditional model where the distributions G_j(z) are replaced by conditional distributions G_j(z|x_j). The likelihood below then becomes a conditional likelihood.
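A draw from Model 3.1 can be sketched in a few lines of R. In this illustrative sketch all centred ‘outlier’ distributions G_j are taken to be N(1,1), an arbitrary choice that is continuous at zero; the placement outside the realized range of the ‘good’ errors follows (3.1).

```r
## Simulate one sample from the LTS regression model (Model 3.1).
set.seed(2)
n <- 50; h <- 40
x <- cbind(1, runif(n)); beta <- c(1, 2); sigma <- 0.5
zeta <- sort(sample(n, h))                  # indices of the 'good' observations
eps  <- numeric(n)
eps[zeta] <- rnorm(h)                       # 'good' errors, i.i.d. N(0,1)
xi <- rnorm(n - h, mean = 1)                # centred 'outlier' draws, continuous at 0
eps[-zeta] <- ifelse(xi > 0,                # place 'outliers' outside the realized
                     max(eps[zeta]) + xi,   # range of the 'good' errors, as in (3.1)
                     min(eps[zeta]) + xi)
y <- drop(x %*% beta) + sigma * eps
```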

The LTS model allows leverage points, since Gj can vary with j. Informally, an observation is a ‘bad’ leverage point for the OLS estimator, if that estimator is very sensitive to inclusion or omission of that observation (Rousseeuw & Leroy, 1987, Section 1.1). An example is the star data shown in Figure 1 and used in the empirical illustration in Section 8.

Figure 1. Star data and fit by LTS with different values of h. Log light intensity against log temperature. Solid dots are estimated ‘good’ observations for h = 42.

A consequence of the model is that the ‘good’ errors must have consecutive order statistics. Randomly, according to the choice of Gj, some ‘outlier’ errors are to the left of the ‘good’ errors. The count of left ‘outlier’ errors is the random variable

(3.2) \delta = \sum_{j\notin\zeta} 1(\xi_j < 0).

Thus, the ordered errors satisfy

(3.3) \varepsilon_{(1)} < \cdots < \varepsilon_{(\delta)} < \varepsilon_{(\delta+1)} < \cdots < \varepsilon_{(\delta+h)} < \varepsilon_{(\delta+h+1)} < \cdots < \varepsilon_{(n)}.

The set ζ collects the indices of the observations corresponding to ε_(δ+1),…,ε_(δ+h). The difficulty in robust regression is of course that the errors are unknown. We note that the extreme ‘good’ errors, ε_(δ+1) and ε_(δ+h), are finite in any realization. As the ‘good’ errors are normal, the extremes grow at a (2 log h)^{1/2} rate for large h, see Example 5.1 below. Likewise, the ‘outlier’ errors are also finite, but located outside the realized range from ε_(δ+1) to ε_(δ+h).

4 Maximum-likelihood estimation in the LTS model

The LTS model is semi-parametric. We therefore start with a general maximum likelihood concept before proceeding to analysis of the LTS model.

4.1 A general definition of maximum likelihood

Traditional parametric maximum likelihood is defined in terms of densities, which are not well-defined here. Thus, we follow the generalization proposed by Scholz (1980), which has two ingredients. First, it uses pairwise comparison of measures, as suggested by Kiefer and Wolfowitz (1956), see Johansen (1978) and Gissibl et al. (2021) for applications. This way, a dominating measure is avoided. Second, it compares probabilities of small sets that include the data point, following the informality of Fisher (1922). This way, densities are not needed. Scholz’ approach is suited to the present situation, where the candidate maximum-likelihood estimator is known and we are searching for a model.

We consider data in ℝ^n and can therefore simplify the approach of Scholz. Let 𝒫 be a family of probability measures on the Borel sets of ℝ^n. Given a (data) point z ∈ ℝ^n and a distance ϵ > 0, define the hypercube C_z^ϵ = (z_1−ϵ, z_1] × ⋯ × (z_n−ϵ, z_n], which is a Borel set.

 
Definition 4.1

For P, Q ∈ 𝒫 write P <_z Q if limsup_{ϵ→0} {P(C_z^ϵ)/Q(C_z^ϵ)} < 1 and P ≤_z Q if limsup_{ϵ→0} {P(C_z^ϵ)/Q(C_z^ϵ)} ≤ 1, where by convention 0/0 = 1.

Following Scholz, define P, Q to be equivalent at z and write P =_z Q if P ≤_z Q and Q ≤_z P. We get that (i) P =_z Q if and only if lim_{ϵ→0} {P(C_z^ϵ)/Q(C_z^ϵ)} exists and equals 1; (ii) P <_z Q and Q <_z R imply P <_z R (transitivity); and (iii) P =_z P for all P ∈ 𝒫 (reflexivity).

 
Definition 4.2

Let 𝒫 = {P_θ | θ ∈ Θ} be a parametrized family of probability measures. Then L^ϵ(θ) = P_θ(C_y^ϵ) is the ϵ-likelihood at the data point y. Further, θ̂ is a maximum-likelihood estimator if P_θ ≤_y P_θ̂ for all θ ∈ Θ.

The maximum-likelihood estimator is unique if P <_y P_θ̂ for all P ≠ P_θ̂ in 𝒫. As for Scholz, traditional maximum likelihood is a special case. Further, the empirical distribution function is the maximum-likelihood estimator in a model of i.i.d. draws from an unknown distribution.

4.2 The LTS likelihood

To obtain the ϵ-likelihood for the LTS regression Model 3.1, we find the probability that the random n-vector of observations y_i belongs to an ϵ-cube C_z^ϵ. Throughout the argument, the regressors x_i are kept fixed. Conditioning ‘outliers’ on ‘good’ observations, we get

(4.1) \mathsf{P}(y \in C_z^\epsilon) = \prod_{i\in\zeta} \mathsf{P}\{y_i \in (z_i-\epsilon, z_i]\} \prod_{j\notin\zeta} \mathsf{P}\{y_j \in (z_j-\epsilon, z_j] \mid (y_i,\, i\in\zeta)\}.

For the first product for ‘good’ observations, we note that ε_i = (y_i − β'x_i)/σ is standard normal and define z_{iβσ} = (z_i − β'x_i)/σ and z_{iβσϵ} = z_{iβσ} − ϵ/σ. Then, the factors of the first product in (4.1) are Φ(z_{iβσ}) − Φ(z_{iβσϵ}), which we denote Δ_ϵΦ(z_{iβσ}).

For the second product, let y_{iβσ} = (y_i − β'x_i)/σ = ε_i and introduce

\tilde z_{j\beta\sigma} = \Big(z_{j\beta\sigma} - \max_{i\in\zeta}\varepsilon_i\Big) 1\Big(z_{j\beta\sigma} > \max_{i\in\zeta}\varepsilon_i\Big) + \Big(z_{j\beta\sigma} - \min_{i\in\zeta}\varepsilon_i\Big) 1\Big(z_{j\beta\sigma} < \min_{i\in\zeta}\varepsilon_i\Big),

and a similar expression z̃_{jβσϵ} for z_{jβσϵ}, noting that z̃_{jβσϵ} = 0 if z_{jβσϵ} falls within the range from min_{i∈ζ} ε_i to max_{i∈ζ} ε_i. A derivation in Appendix A shows that the factors of the second product in (4.1) are Δ_ϵG_j(z̃_{jβσ}) = G_j(z̃_{jβσ}) − G_j(z̃_{jβσϵ}). With this notation, the probability (4.1) of the ϵ-cube is

(4.2) \mathsf{P}(y \in C_z^\epsilon) = \prod_{i\in\zeta} \Delta_\epsilon\Phi(z_{i\beta\sigma}) \prod_{j\notin\zeta} \Delta_\epsilon G_j(\tilde z_{j\beta\sigma}).

The ϵ-likelihood arises from the probability (4.2) evaluated in the data point. That is, we replace the vector z by the data vector y. We define ỹ_{jβσ} and Δ_ϵG_j(ỹ_{jβσ}) in a similar fashion as before. Thus, we get the ϵ-likelihood

(4.3) L^\epsilon(\beta, \sigma, \zeta, G_j \text{ for } j\notin\zeta) = \prod_{i\in\zeta} \Delta_\epsilon\Phi(y_{i\beta\sigma}) \prod_{j\notin\zeta} \Delta_\epsilon G_j(\tilde y_{j\beta\sigma}).

We note that the ‘good’ errors cannot be repeated due to the assumptions that the ‘good’ observations are continuous and that the centred ‘outliers’ are continuous at zero. Thus, parameter values resulting in repetitions of the ‘good’ errors are to be ignored. As an example, suppose we have n = 10 observations and we have chosen a β so that the residuals y_i − β'x_i take the ordered values: 1, 1, 1, 2, 3, 6, 6, 7, 8, 9. The values 1 and 6 are repetitions and cannot be ‘good’. Thus, for h = 3, we can only select ζ as the index triplet 7, 8, 9. All other choices are assigned a zero likelihood. The issue is essentially the same as in OLS theory, where the least-squares estimator may be useful when there are repetitions, but it cannot be maximum likelihood, as repetitions happen with probability zero under normality.

4.3 Maximum-likelihood estimation

The two products in the ϵ-likelihood (4.3) resemble a standard normal likelihood and a product of n−h likelihoods, each with one observation from an arbitrary distribution. We will exploit those examples using profile likelihood arguments.

First, suppose β, σ, ζ are given. Then, the first product in the LTS ϵ-likelihood (4.3) is constant. In the second product, the jth factor only depends on G_j. Since the G_j functions vary in a product space, we maximize each factor separately. The maximum value for Δ_ϵG_j(ỹ_{jβσ}) is unity for any ϵ > 0 and regardless of the value of β, σ, ζ. The maximum is attained for any distribution function that is continuous at zero and that rises from zero to unity over the interval from ỹ_{jβσϵ} to ỹ_{jβσ}. This set of distribution functions includes the distribution with a point mass at ỹ_{jβσ}, which satisfies the requirement of continuity at zero, because the ‘outliers’ are separated from the ‘good’ observations. Moreover, the point mass distribution is unique in the limit for vanishing ϵ. Thus, maximizing (4.3) over G_j gives the profile ϵ-likelihood

(4.4) L_G^\epsilon(\beta, \sigma, \zeta) = \prod_{i\in\zeta} \Delta_\epsilon\Phi(y_{i\beta\sigma}).

Second, suppose ζ is given. We argue that the unique maximum-likelihood estimator in the sense of Definition 4.2 is given by

(4.5) \hat\beta_\zeta = \arg\min_\beta \sum_{i\in\zeta} (y_i - \beta' x_i)^2, \qquad \hat\sigma_\zeta^2 = h^{-1} \sum_{i\in\zeta} (y_i - \hat\beta_\zeta' x_i)^2.

We need to show that limsup_{ϵ→0} {L_G^ϵ(β, σ, ζ)/L_G^ϵ(β̂_ζ, σ̂_ζ, ζ)} < 1 for all (β, σ) ≠ (β̂_ζ, σ̂_ζ). Note that ϵ^{−h} L_G^ϵ(β, σ, ζ) converges to a standard likelihood for a regression with normal errors. Therefore, the condition is satisfied as long as Σ_{i∈ζ} x_i x_i' is invertible. Thus, maximizing (4.4) over (β, σ) gives a profile ϵ-likelihood for ζ satisfying, for ϵ → 0,

(4.6) \hat L_\zeta^\epsilon = \max_{\beta, \sigma, G} L^\epsilon(\beta, \sigma, \zeta, G) \sim \epsilon^h (2\pi e\, \hat\sigma_\zeta^2)^{-h/2}.
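To spell out the limit used in this argument, divide each ‘good’ factor in (4.4) by ϵ and let ϵ → 0:

\epsilon^{-h} L_G^{\epsilon}(\beta,\sigma,\zeta) = \prod_{i\in\zeta} \frac{\Phi(y_{i\beta\sigma}) - \Phi(y_{i\beta\sigma} - \epsilon/\sigma)}{\epsilon} \;\to\; \prod_{i\in\zeta} \sigma^{-1}\varphi(y_{i\beta\sigma}),

which is the Gaussian likelihood maximized by the least-squares estimators in (4.5).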

Third, the profile likelihood is maximized by choosing ζ so that σ^ζ is as small as possible. We note that the maximization is subject to the constraint that none of the ‘good’ residuals are repeated. Apart from the latter constraint, this is the LTS estimator described in (2.2). We summarize.

 
Theorem 4.1

The LTS regression Model 3.1 has ϵ-likelihood L^ϵ(β, σ, ζ, G_j for j ∉ ζ) defined in (4.3). The maximum-likelihood estimator is found as follows. For any h-subsample ζ, let β̂_ζ and σ̂_ζ be the least-squares estimators in (4.5). Let ζ̂ = argmin_ζ σ̂²_ζ subject to the constraint that ε̂_i ≠ ε̂_ℓ for i ∈ ζ, 1 ≤ ℓ ≤ n and i ≠ ℓ, where ε̂_i = y_i − β̂_ζ'x_i. Then β̂ = β̂_ζ̂ and σ̂ = σ̂_ζ̂.

5 Asymptotic theory for the location-scale model

We present an asymptotic theory of the LTS estimator in the special case of a location-scale model y_i = μ + σε_i. In this model, the observations y_i and the unknown errors ε_i have the same ranks. Thus, the ordering in (3.3) is observable and we only need to consider values of ζ corresponding to indices of observations of the form y_(δ+1),…,y_(δ+h) for some δ = 0,…,n−h. Following Rousseeuw and Leroy (1987, p. 171), the LTS estimators then reduce to μ̂ = μ̂_δ̂ and σ̂ = σ̂_δ̂, where

(5.1) \hat\mu_\delta = h^{-1} \sum_{i=\delta+1}^{\delta+h} y_{(i)}, \qquad \hat\sigma_\delta^2 = h^{-1} \sum_{i=\delta+1}^{\delta+h} \big(y_{(i)} - \hat\mu_\delta\big)^2, \qquad \hat\delta = \arg\min_{0 \le \delta \le n-h} \hat\sigma_\delta^2.
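In this case the estimator can be computed by a single pass over the n − h + 1 contiguous blocks of order statistics. A minimal sketch with illustrative data (separated right ‘outliers’) is:

```r
## Location-scale LTS: scan the contiguous blocks of h order statistics (5.1).
lts_loc_scale <- function(y, h) {
  ys <- sort(y); n <- length(y)
  best <- list(s2 = Inf)
  for (delta in 0:(n - h)) {
    block <- ys[(delta + 1):(delta + h)]
    m  <- mean(block)
    s2 <- mean((block - m)^2)               # divisor h, no consistency factor
    if (s2 < best$s2) best <- list(s2 = s2, mu = m, delta = delta)
  }
  best                                      # sigma2-hat, mu-hat and delta-hat
}
set.seed(3)
y <- c(rnorm(80), 6 + abs(rnorm(20)))       # 80 'good' observations, 20 right 'outliers'
lts_loc_scale(y, h = 80)
```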

5.1 Sequence of data generating processes

We consider a sequence of LTS models indexed by n, so that h_n → ∞ as n → ∞. In this sequence, the ‘outliers’ have a common distribution G, which is continuous at zero. If h_n = n, the LTS estimator reduces to the full sample least-squares estimator with standard asymptotic theory. Here, we choose

(5.2) h_n/n \to \lambda \quad \text{as } n \to \infty,

where λ is the asymptotic proportion of ‘good’ observations. The parameters ζ_n, δ_n vary with n, while μ, σ, G are constant in n.

We reparametrize G in terms of separate distributions for left and right ‘outliers’. Let

(5.3) \rho = G(0), \qquad \underline G(x) = \frac{G(0) - G(-x)}{G(0)}, \qquad \bar G(x) = \frac{G(x) - G(0)}{1 - G(0)} \quad \text{for } x \ge 0.

Thus, as ξ_j is G-distributed, we have that ε̲_j = −ξ_j is G̲-distributed when ξ_j < 0 and ε̄_j = ξ_j is Ḡ-distributed when ξ_j > 0. This leads to

\varepsilon_j = \Big(\max_{i\in\zeta} \varepsilon_i + \bar\varepsilon_j\Big) 1(\xi_j > 0) + \Big(\min_{i\in\zeta} \varepsilon_i - \underline\varepsilon_j\Big) 1(\xi_j < 0).

This way, the ‘outliers’, ε_j for j ∉ ζ_n, can be constructed through a binomial experiment as follows. Draw n−h independent Bernoulli(ρ) variables. If the jth variable is unity, then ε_j = min_{i∈ζ} ε_i − ε̲_j. If it is zero, then ε_j = max_{i∈ζ} ε_i + ε̄_j. Hence, the number of left ‘outliers’ is

(5.4) \delta_n = \sum_{j\notin\zeta_n} 1(\xi_j < 0), \qquad \text{with } \delta_n/(n - h_n) \to \rho = G(0) \ \text{a.s.}

by the Law of Large Numbers. In summary, the sequence of data generating processes is defined by μ, σ, ρ, G̲, Ḡ, and h_n, δ_n.

5.2 The main result

We show that the asymptotic distribution of the LTS estimator is the same as it would have been, if it had been known which observations were ‘good’. Here, we consider the case where the ‘good’ observations are normal and there are more ‘good’ observations than ‘outliers’. The result does not require any regularity conditions for the common ‘outlier’ distribution G.

 
Theorem 5.1
Consider the sequence of LTS location-scale models, where h_n/n → λ with 1/2 < λ < 1. Suppose that ε_i for i ∈ ζ_n are i.i.d. N(0,1) distributed. Then, for any η > 0,

We expect that Theorem 5.1 would generalize to the LTS regression Model 3.1. That is, the LTS regression estimator will have the same asymptotic theory as if we knew which observations were ‘good’. The proof of such a result is, however, an open problem.

Theorem 5.1 for the LTS location-scale model differs from the known results for i.i.d. models. Butler (1982) proved a general result for the model with i.i.d. errors, showing that n^{1/2}(μ̂ − μ) is asymptotically normal when the errors are symmetric with a strongly unimodal distribution. Further, the asymptotic variance involves an efficiency factor that differs from unity even in the normal case. He also gives an asymptotic theory for nonsymmetric errors. Further, Víšek (2006) analysed the regression estimator for i.i.d. errors. Johansen and Nielsen (2016b, Theorem 5) provide an asymptotic expansion of the scale estimator under normality.

A corresponding result for the LMS estimator is given in the supplement. The LMS estimator is maximum likelihood in a model where the ‘good’ observations are uniform. When there are no outliers, n = h, the LMS estimator is a Chebychev estimator. Knight (2017) provides asymptotic analysis of Chebychev estimators in regression models. Coinciding with that theory, Online Supplementary Material, Theorem C.2 shows that the LMS estimator is super-consistent at an n rate in that model, which contrasts with the well-known n^{1/3} rate for i.i.d. models. However, Online Supplementary Material, Theorem C.2 does require regularity conditions for the ‘outliers’ to give sufficient separation from the ‘good’ observations.

5.3 Extensions of the main result

5.3.1 Nonnormal errors

We start by relaxing the normality assumption. In the spirit of OLS asymptotics, we present sufficient conditions for the ‘good’ errors: 2+ω moments, symmetry, and exponential tails.

 
Assumption 5.1

Suppose ε_i for i ∈ ζ_n are i.i.d. with distribution F satisfying

  (i) E ε_i = 0 and E ε_i² = 1 for i ∈ ζ_n;

  (ii) E|ε_i|^{2+ω} < ∞ for i ∈ ζ_n and some ω > 0;

  (iii) F has infinite support: inf{x: F(x) > 0} = −∞ and sup{x: F(x) < 1} = ∞;

  (iv) ε_(δ_n+1)/ε_(δ_n+h_n) + 1 = o_P(1);

  (v) for all 0 < η < 1 there exists C_η < 1 such that ε_(δ_n+h_n^{1−η})/ε_(δ_n+1), ε_(δ_n+h_n−h_n^{1−η})/ε_(δ_n+h_n) ≤ C_η + o_P(1).

 
Example 5.1

The normal distribution satisfies Assumption 5.1.

For (iv) use that a_n{ε_(δ_n+h_n) − b_n} converges to a type I extreme value distribution, where a_n ∼ b_n ∼ (2 log h_n)^{1/2}, see Leadbetter et al. (1982, Theorem 1.5.3). Here, ∼ denotes asymptotic equivalence. In particular, ε_(δ_n+h_n)/b_n → 1 in probability.

For (v) use Mills' ratio: 1 − Φ(x) ∼ φ(x)/x for x → ∞. Take logarithms to see that Φ^{−1}(1 − 1/s_n) ∼ (2 log s_n)^{1/2} for s_n → ∞. We find C_η ∼ (2 log h_n^{1−η})^{1/2}/(2 log h_n)^{1/2} = (1−η)^{1/2} < 1.

 
Example 5.2

Assumption 5.1 is satisfied by the Laplace distribution and the double geometric distribution with probabilities (1−p)^{|x|} p/(2−p) for x ∈ ℤ. The latter is not in the domain of attraction of an extreme value distribution (Leadbetter et al., 1982, Theorem 1.7.13).

Assumption 5.1(iv) is not satisfied by t_d distributions with d ≥ 1 degrees of freedom. These are in the domain of attraction of extreme value distributions of type II. Gumbel and Keeney (1950) show that ε_(δ_n+1)/ε_(δ_n+h_n) converges to a nondegenerate distribution with median 1.

 
Theorem 5.2

Consider the sequence of LTS location-scale models. Let 1/2<λ<1 and suppose Assumption 5.1. Then, the limiting results in Theorem 5.1 apply.

5.3.2 More ‘outliers’ than ‘good’ observations

In this section, we allow for more ‘outliers’ than ‘good’ observations. Although in the traditional breakdown point analysis there have to be more ‘good’ observations than ‘outliers’ (Rousseeuw & Leroy, 1987, Section 3.4), in practice, LTS estimators are sometimes used with more ‘outliers’ than ‘good’ observations, as in the Forward Search algorithm (Atkinson et al., 2010), for instance. We show that, within the LTS model framework, it is possible to allow for more ‘outliers’ than ‘good’ observations, but in this case, we need some regularity conditions on the ‘outliers’ to make sure the ‘good’ observations can be found.

Let the proportion of ‘good’ observations satisfy 0 < λ < 1. Recall that δ_n/(n−h_n) → ρ = G(0) a.s. is the proportion of the ‘outliers’ that are to the left. The number of ‘outliers’ to the right is n̄ = n − h_n − δ_n. Define also the proportions of left and right ‘outliers’ relative to the number of ‘good’ observations through

\underline\omega = \lim_{n\to\infty} \delta_n/h_n = (1-\lambda)\rho/\lambda, \qquad \bar\omega = \lim_{n\to\infty} (n - h_n - \delta_n)/h_n = (1-\lambda)(1-\rho)/\lambda.

Regularity conditions are needed for the ‘outlier’ distribution when ω̲ ≥ 1 or ω̄ ≥ 1. Note that ω̲ = ω̄ < 1 when the proportion of ‘good’ observations is λ > 1/3 and the proportions of left and right ‘outliers’ are the same, so that ρ = 1/2. The following definition is convenient for analysing the empirical distribution function of the ‘outliers’, evaluated at a random quantile.

 
Definition 5.1

A distribution function H is said to be regular, if it is twice differentiable on an open interval S = ]s̲, s̄[ with s̲ < s̄ so that H(s̲) = 0 and H(s̄) = 1, and the density h and its derivative ḣ satisfy

  • sup_{x∈S} h(x) < ∞ and sup_{x∈S} H(x){1 − H(x)}|ḣ(x)|/{h(x)}² < ∞;

  • if lim_{x↓s̲} h(x) = 0, then h is nondecreasing on an interval to the right of s̲;

  • if lim_{x↑s̄} h(x) = 0, then h is nonincreasing on an interval to the left of s̄.

 
Example 5.3

The normal distribution is regular. Apply Mills' ratio to see this. The exponential distribution with H(x) = 1 − exp(−x) is also regular with S = ℝ₊ and with H(x){1 − H(x)}|ḣ(x)|/{h(x)}² = H(x) ≤ 1, while h(x) is decreasing as x → ∞.

The following assumption requires that the ‘outlier’ distribution is regular and that the ‘outliers’ are more spread out than the ‘good’ observations.

 
Assumption 5.2

Let q > 4. Let ε̲₁ be G̲-distributed and ε̄₁ be Ḡ-distributed, see (5.3).

  (i) If ω̄ ≥ 1, suppose Ḡ is regular with ∫₀^∞ x^q dḠ(x) < ∞ and v̄ = min_{ω̄^{−1} ≤ ς ≤ 1} Var{ε̄₁ | ς − ω̄^{−1} ≤ Ḡ(ε̄₁) ≤ ς} > 1.

  (ii) If ω̲ ≥ 1, suppose G̲ is regular with ∫₀^∞ x^q dG̲(x) < ∞ and v̲ = min_{ω̲^{−1} ≤ ς ≤ 1} Var{ε̲₁ | ς − ω̲^{−1} ≤ G̲(ε̲₁) ≤ ς} > 1.

 
Theorem 5.3

Consider the sequence of LTS location-scale models. Let 0<λ<1 and suppose Assumptions 5.1, 5.2. Then, the limiting results in Theorem 5.1 apply.

Given the novelty (and perhaps at first surprising content) of Theorem 5.3, a discussion of this result and its relationship with Theorem 5.1 seems in order. Recall that Theorem 5.1 has no regularity conditions on the ‘outliers’. The result exploits that there are more than 50% ‘good’ observations and the thin normal tails of these ‘good’ observations separate them sufficiently from the ‘outliers’. In Theorem 5.3, we allow less than 50% ‘good’ observations but require regularity conditions on the outliers to ensure sufficient separation. More specifically, the proof of Theorem 5.3 covers two situations. First, if the proportions of left and right ‘outliers’ are equal and more than one-third of the observations are ‘good’, then ω̄ = ω̲ < 1, so that Assumption 5.2 is nonbinding. Second, in general, Assumption 5.2(i, ii) implies that the ‘outliers’ are spread out more than the ‘good’ observations.

This has a potential parallel with the breakdown point analysis of Rousseeuw and Leroy (1987, Section 3.4), which shows that for a given dataset the LTS estimator remains bounded with less than 50% contamination of an arbitrary type. It may still be possible to obtain the same result allowing for more than 50% contamination that satisfies regularity conditions of the type considered in Theorem 5.3, but we leave this as an open question.

5.4 The OLS estimator in the LTS model

We show that the least-squares estimator μ̄ = n^{−1} Σ_{i=1}^n y_i can diverge in the sequence of LTS models. This implies that the least-squares estimator is not robust within the LTS model in the sense of Hampel (1971). We assume all ‘outliers’ are to the right, so that ρ = 0.

 
Theorem 5.4
Consider the sequence of LTS location-scale models. Let 0 < λ < 1 and ρ = 0. Suppose ε_i for i ∈ ζ_n are i.i.d. with E ε_i = 0, Var ε_i = 1, and infinite support (Assumption 5.1 i, iii). Suppose Ḡ has finite expectation μ_G = ∫₀^∞ {1 − Ḡ(x)} dx. Then, μ̄ diverges. Noting that ε_(h_n) → ∞ in probability, we have that

6 Estimating the proportion of ‘good’ observations

The LTS regression model takes the number of good observations, h, as given. In practice, an investigator has to choose h. Estimation of h is a difficult problem, and methods for estimating h are scarce in the literature. In this section, we start by reviewing a commonly used method for choosing h and discuss its asymptotic properties. We will then propose some new methods for consistently estimating the proportion λ = h/n of ‘good’ observations. The best of these methods is consistent at a log h rate. This is a slow rate and the method may be imprecise about the number h of ‘good’ observations even in large samples. Notwithstanding, to the best of our knowledge, this is the only consistent method available in the literature. Further research and improvements in this area are needed.

6.1 The index plot method

A common method for estimating h is to apply a high breakdown point LTS estimator selecting approximately n/2 observations, then compute scaled residuals for all observations and keep those observations for which the scaled residuals are less than 2.5 in absolute value. This is conveniently done using an index plot (Rousseeuw & Hubert, 1997; Rousseeuw & Leroy, 1987). More specifically, scaled residuals ε̂_{iς} are plotted against the index i. Here, the consistency factor ς = 0.615 is the conditional standard deviation of ε_i given that 1/4 < Φ(ε_i) < 3/4 (Croux & Rousseeuw, 1992). Bands at ±2.5 are displayed on the plot and observations corresponding to scaled residuals ε̂_{iς} beyond these bands are declared outliers. Hence, the estimator of the number of outliers is n − ĥ_IP = Σ_{i=1}^n 1(|ε̂_{iς}| > 2.5). We note that when scaling the residuals, the consistency factor is based on the assumption that all errors are i.i.d. normal.

We can analyse the asymptotic properties of the index plot method using Theorems 4, 5, 7 in Johansen and Nielsen (2016b). This will show that, in a clean, normal sample, the index plot method asymptotically finds γ̂_IP = (n − ĥ_IP)/n →_P γ = P(|ε_i| > 2.5) = 2{1 − Φ(2.5)} = 1.24% outliers. In the terminology of Castle et al. (2011) and Hendry and Santos (2010), γ is the asymptotic gauge of the procedure. Thus, the index plot method will on average estimate a nonzero proportion of outliers, even in uncontaminated samples.
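The gauge calculation is easy to illustrate in a clean sample. The sketch below sidesteps the robust fit and the consistency factor ς and simply uses full-sample standardized residuals, which behave like correctly scaled residuals when there is no contamination.

```r
## Index plot rule in a clean N(0,1) sample: flag |scaled residual| > 2.5.
set.seed(4)
n <- 100000
y <- rnorm(n)
scaled <- (y - mean(y)) / sd(y)             # stand-in for correctly scaled residuals
mean(abs(scaled) > 2.5)                     # observed proportion flagged
2 * (1 - pnorm(2.5))                        # asymptotic gauge, about 1.24%
```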

6.2 Methods based on the LTS model

We consider methods for consistent estimation of the proportion of ‘good’ observations in the LTS model. This is a semi-parametric model selection problem where the dimension of the parameter space increases with decreasing h, since the number of G_j functions increases when the number of ‘outliers’ increases. We consider three estimation methods. In all cases, suppose h ≥ h̲ where h̲ > dim x_i is a lower bound chosen by the user.

Maximum likelihood. We argue that ĥ = h̲ is the maximum-likelihood estimator for h. Let L^ϵ(β, σ, ζ, G_j, h) be the ϵ-likelihood given in (4.3) for each h. Let σ̂_h denote the maximum-likelihood estimator in Theorem 4.1 for each h. By (4.6), the profile likelihood for h is

We note that σ̂²_h > 0 for all h ≥ h̲, so that ϵ^{h̲−h} L̂_h^ϵ/L̂_{h̲}^ϵ is bounded uniformly in h. Thus, L̂_h^ϵ/L̂_{h̲}^ϵ → 0 as ϵ → 0 whenever h > h̲, so that ĥ = h̲.

An information criterion. Penalizing log σ̂²_h gives the information criterion

(6.1) IC_h = \log \hat\sigma_h^2 + f(n)\, (n - h)/n,

for a penalty f. Let ĥ_IC be the minimizer. Then λ̂_IC = ĥ_IC/n estimates the proportion of ‘good’ observations, λ say. Below, we argue, for the location-scale case, that λ̂_IC is consistent if the penalty is chosen so that f(n) → ∞ for increasing n, but f(n) ≤ (1 − λ)^{−1} log log n, where 0 < λ ≤ 1. If, for instance, it is expected that more than half of the observations are ‘good’, so that λ > 1/2, the penalty can be chosen as f(n) = 2 log log n. The intuition is as follows.

Consider data generating processes as in Section 5.1 and let h_n denote the number of ‘good’ observations, so that h_n/n → λ. We want to show that λ̂_IC = λ in the limit. Theorem 5.1 shows that σ̂²_{h_n} is consistent for σ² when h = h_n. Thus, we consider sequences h_n′ so that h_n′/n → λ′ ≠ λ and argue that IC_{h_n′} − IC_{h_n} ≈ log(σ̂²_{h_n′}/σ²) + f(n)(λ − λ′) has a positive limit, so that the minimizer satisfies λ̂_IC = λ in the limit.

Suppose λ′ < λ. In that case, one could expect that, asymptotically, the estimated set of ‘good’ observations is a subset of the good observations, so that ζ̂_{h′} ⊂ ζ. Further, σ̂²_{h′} converges to σ²_{λ′}, say, the variance of a truncated normal distribution as analysed by Butler (1982). Then, IC_{h_n′} − IC_{h_n} ≈ log(σ²_{λ′}/σ²) + f(n)(λ − λ′), which is positive in the limit when f(n) diverges. Thus, λ̂_IC ≠ λ′ in the limit.

Suppose λ′ > λ. Then IC_{h_n′} − IC_{h_n} ≈ log(σ̂²_{h_n′}/σ²) − f(n)(λ′ − λ), where the penalty term is negative, so that f(n) must be chosen carefully. With the normal LTS model, σ̂²_{h_n′} must diverge. The reason is that, for λ′ > λ, the estimated set of ‘good’ observations ζ̂_{h′} includes both ‘good’ observations and ‘outliers’. The ‘outliers’ diverge at a (2 log h_n)^{1/2} rate, see Example 5.1, so that σ̂²_{h_n′} must diverge at a 2 log h_n rate. Noting that h_n ∼ nλ, we get IC_{h_n′} − IC_{h_n} ≈ log log n − f(n)(λ′ − λ). When 1/2 < λ < λ′ ≤ 1, we have λ′ − λ < 1/2, and the log log n term dominates when f(n) ≤ 2 log log n. Thus, λ̂_IC ≠ λ′ in the limit.

Combining these arguments indicates that λ̂_IC = λ in the limit. Unfortunately, in simulations that are not reported here, we find that a 2 log log n penalty grows so slowly in n that for some specifications the consistency is only realized for extremely large samples.

Cumulant based normality test. A more useful estimator for the proportion of ‘good’ observations λ is the minimizer ĥ_T, say, of the normality test statistic

(6.2)

based on third and fourth cumulants of the estimated ‘good’ residuals. We argue that ĥ_T/n is consistent for λ. This is supported by a simulation study in Section 7.2.

The intuition of the consistency argument is similar to that for the information criterion. First, for h = h_n, Theorem 5.1 may be extended to show that T_{h_n} is asymptotically χ²₂. For h < h_n, the sample moments may converge to the moments of a truncated normal distribution, see Berenguer-Rico and Nielsen (2017). Thus, T_h would diverge as it is normalized using the normal distribution. For h > h_n, the estimated set of ‘good’ observations ζ̂_h contains both ‘good’ and ‘outlying’ observations, so that T_h diverges at a logarithmic rate, instead of the iterated logarithmic rate for the information criterion. A formal proof is left for future work.
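Since (6.2) is not reproduced above, the sketch below uses the familiar Bowman–Shenton/Jarque–Bera combination of squared skewness and squared excess kurtosis as a stand-in cumulant-based statistic (also asymptotically χ²₂ under normality) and minimizes it over h in the location-scale case.

```r
## Estimate lambda = h/n by minimizing a cumulant-based normality statistic
## computed from the estimated 'good' residuals of the location-scale LTS fit.
Th <- function(y, h) {
  ys <- sort(y); n <- length(y)
  s2 <- sapply(0:(n - h), function(d) {
    block <- ys[(d + 1):(d + h)]
    mean((block - mean(block))^2)
  })
  block <- ys[which.min(s2) - 1 + (1:h)]    # estimated 'good' observations
  e <- (block - mean(block)) / sqrt(mean((block - mean(block))^2))
  h * (mean(e^3)^2 / 6 + (mean(e^4) - 3)^2 / 24)   # stand-in for (6.2)
}
set.seed(5)
y <- c(rnorm(160), 5 + abs(rnorm(40)))      # 80% 'good', separated right 'outliers'
hs <- 120:180                               # search h between 60% and 90% of n = 200
h_hat <- hs[which.min(sapply(hs, function(h) Th(y, h)))]
h_hat / length(y)                           # estimate of lambda, typically near 0.8
```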

7 Simulations

7.1 Inference in the location-scale model

In the location-scale model y_i = μ + σε_i, we study the finite sample properties of tests for μ = 0 using four different test statistics and six different data generating processes, which we describe below. We consider sample sizes n = 25, 100, 400, 1600 and let h/n = λ = 0.8.

Tests. We consider four different estimators: LTS, LMS, OLS and SLTS (scale-corrected LTS estimator). For each estimator, s say, we compute t-type statistics, t_s = (μ̂_s − μ)/se_s, with associated asymptotic 95% quantiles q_s.

For the OLS estimator, we have μ̂_OLS = n^{−1} Σ_{i=1}^n y_i and se²_OLS = σ̂²_OLS/n with σ̂²_OLS = n^{−1} Σ_{i=1}^n (y_i − ȳ)². The asymptotic quantile q_OLS is standard normal.

For the LTS estimator, we apply the estimators μ̂_LTS and σ̂_LTS given in (5.1). By Theorem 5.2, we get that se²_LTS = σ̂²_LTS/h. The asymptotic quantile q_LTS is standard normal.

For the LMS estimator, we apply the estimators μ̂_LMS and σ̂_LMS given in Online Supplementary Material, (C.4). Online Supplementary Material, Theorem C.2 gives that se²_LMS = σ̂²_LMS/h². The asymptotic quantile q_LMS is standard Laplace.

Finally, SLTS uses the usual approach to LTS estimation with consistency and efficiency correction factors arising from truncation in a standard normal distribution, as outlined at the end of Section 2. Let ς²_{h/n} = ∫_{−c}^{c} x²φ(x) dx / ∫_{−c}^{c} φ(x) dx with c chosen so that ∫_{−c}^{c} φ(x) dx = h/n, which gives ς²_{0.8} = 0.438. Then, we have estimators μ̂_SLTS = μ̂_LTS and σ̂²_SLTS = σ̂²_LTS/ς²_{h/n}, while se²_SLTS = σ̂²_SLTS/(n ς²_{h/n}). The asymptotic quantile q_SLTS is standard normal.
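The truncated-normal factor is easy to verify numerically; a short check of ς²_{0.8} is:

```r
## Consistency factor for SLTS: variance of a standard normal truncated to its
## central h/n probability mass.
lambda <- 0.8
cc <- qnorm(0.5 + lambda / 2)                                   # central 80% lies in (-cc, cc)
integrate(function(x) x^2 * dnorm(x), -cc, cc)$value / lambda   # about 0.438
1 - 2 * cc * dnorm(cc) / lambda                                 # same value in closed form
```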

Data generating processes (DGPs). Table 1 gives an overview of the DGPs. The first three DGPs are examples of the LTS Model 3.1 and the LMS Online Supplementary Material, Model C.1. The proportion of ‘good’ observations is λ = 80%. The ‘good’ errors ε_i are i.i.d. normal N(0,1) in DGP1 and DGP2 and i.i.d. uniform U[−1,1] in DGP3. The ‘outlier’ errors are defined as in (3.1), so that ε_j = (max_{i∈ζ} ε_i + ξ_j)1(ξ_j > 0) + (min_{i∈ζ} ε_i + ξ_j)1(ξ_j < 0), where ξ_j − ν₊1(ξ_j > 0) + ν₋1(ξ_j < 0) are i.i.d. normal N(0,1) and ν₊ and ν₋ separate ‘good’ and ‘outlying’ observations. The separators are ν₊ = ν₋ = 0 in DGP1 and ν₊ = 3, ν₋ = 1 in DGP2 and DGP3.

Table 1.

Data generating processes

        Model             MLE    ‘good’ error   ‘outliers’
DGP1    LTS               LTS    N(0,1)         ν₊ = ν₋ = 0
DGP2    LTS               LTS    N(0,1)         ν₊ = 3, ν₋ = 1
DGP3    LMS               LMS    U(−1,1)        ν₊ = 3, ν₋ = 1
DGP4    ϵ-contamination   OLS    N(0,1)         N(0,1)
DGP5    ϵ-contamination          N(0,1)         N(0,3)
DGP6    ϵ-contamination          N(0,1)         N(2,1)

Note. The proportion of good errors is λ = 0.8. Columns 2 and 3 show the model and indicate which estimator is maximum likelihood. Columns 4 and 5 indicate how the ‘good’ and the ‘outlying’ errors are chosen. For DGP1–DGP3: λn ‘good’ observations; the ‘outliers’ have ξ_j − ν₊1(ξ_j > 0) + ν₋1(ξ_j < 0) distributed N(0,1). For DGP4–DGP6: the distribution is λN(0,1) + (1−λ)H.


The last three DGPs are examples of ϵ-contamination (Huber, 1964). We draw n observations from the distribution function 0.8Φ+0.2H, where Φ is standard normal and H represents contamination. In DGP4, H=Φ, giving a standard i.i.d. normal model. In DGP5 and DGP6, H is N(0,3) and N(2,1), giving symmetric and nonsymmetric mixtures, respectively.

We have different maximum-likelihood estimators for the different models. These are LTS for DGP1 and DGP2, LMS for DGP3, and OLS for DGP4. None of the considered estimators are maximum likelihood for DGP5 and DGP6.

Table 2 reports results from 10^6 repetitions. The Monte Carlo standard error is 0.001.

Table 2.

Simulated rejection frequencies for nominal 5% tests on intercept

                          OLS                                            LTS
n        DGP1   DGP2   DGP3   DGP4   DGP5   DGP6       DGP1   DGP2   DGP3   DGP4   DGP5   DGP6
25       0.092  0.084  0.084  0.067  0.066  0.359      0.255  0.081  0.072  0.371  0.337  0.388
100      0.083  0.129  0.199  0.054  0.054  0.887      0.180  0.058  0.055  0.383  0.345  0.518
400      0.100  0.321  0.664  0.051  0.051  1.000      0.110  0.052  0.051  0.389  0.349  0.827
1600     0.159  0.745  0.998  0.050  0.050  1.000      0.071  0.050  0.050  0.390  0.349  0.998

                          LMS                                            SLTS
25       0.720  0.489  0.070  0.641  0.631  0.702      0.011  0.000  0.000  0.034  0.027  0.063
100      0.961  0.785  0.054  0.836  0.831  0.901      0.002  0.000  0.000  0.041  0.028  0.116
400      0.999  0.936  0.051  0.931  0.929  0.982      0.000  0.000  0.000  0.047  0.031  0.373
1600     1.000  0.992  0.050  0.972  0.971  0.999      0.000  0.000  0.000  0.049  0.032  0.941

The OLS statistic is maximum likelihood with DGP4 and performs well. It performs equally well with the symmetric, i.i.d. DGP5. For DGP1, it is slowly diverging, possibly because the absolute sample mean is diverging with n. OLS performs poorly with the nonsymmetric DGP2, DGP3, and DGP6.

The LTS statistic is maximum likelihood with DGP1 and DGP2. The asymptotic theory also applies for DGP3. The convergence is slow for DGP1, where there is no separation. The LTS statistic does not perform well with ϵ-contamination in DGP4, DGP5, and DGP6.

The LMS statistic is maximum likelihood with DGP3 and performs well with that DGP, but poorly with all other DGPs. The SLTS statistic is not maximum likelihood for any of the considered models. It is calibrated to be asymptotically unbiased for DGP4 and performs well for that model, but poorly with all other DGPs.

Overall, we see that it is a good idea to apply maximum likelihood but this does require that the model specification is checked. In particular, the LTS estimator is best in DGP1–DGP3, although with some finite sample distortion with DGP1 where ‘good’ and ‘outlying’ observations are not well-separated. The LTS estimator does not work well for the ϵ-contaminated models, where a model dependent scale correction is needed. The usual approach of using the normal scale correction as in SLTS does not work well in general. All estimators are poor for asymmetric ϵ-contamination.

7.2 h estimation

Next, we study the finite sample properties of estimating h using the cumulant-based normality test statistic T_h in (6.2). Results are reported in Table 3, based on 10^3 repetitions.

Table 3.

Estimating h by minimizing the T_h statistic in (6.2)

                     DGP1                                     DGP2
            λ = 0.7           λ = 0.8              λ = 0.7           λ = 0.8
n        ĥ_bias   ĥ_sd   ĥ_r   ĥ_bias   ĥ_sd   ĥ_r   ĥ_bias   ĥ_sd   ĥ_r   ĥ_bias   ĥ_sd   ĥ_r
25         1.2     2.5          1.0      2.5          0.3      2.1          1.5      2.1
100        9.6     8.6          3.7      6.0          0.2      4.3          1.1      3.0
400       41.3    35.8          4.5     12.4          0.8      2.1          0.8      2.0
1600      92.8   143.1          0.4      3.5          0.8      2.0          0.8      2.0
12800      0.7     4.2          0.6      4.4          1.2      3.2          1.3      3.2

Note. Here, ĥ_bias and ĥ_sd report the simulated bias and standard deviation for ĥ, while ĥ_r is ticked if all simulated values of ĥ are interior to the range from 60% to 90%.


In each repetition, we compute the T_h statistic for each h in the range from 60% to 90% of n. The estimator of h is the minimizer of T_h over that range.

The data generating processes are DGP1 and DGP2 from above. These are examples of the LTS model, so that DGP1 has symmetric ‘outliers’ that are not separated from the ‘good’ observations, while DGP2 has asymmetric ‘outliers’ that are separated from the ‘good’ observations. For each DGP, we consider cases with 70% or 80% ‘good’ observations.

Table 3 reports three quantities for each of the DGPs. First, ĥ_bias and ĥ_sd are the Monte Carlo average and standard deviation of the estimation error ĥ − h. Further, ĥ_r is a binary variable, which is checked if ĥ is in the interior of the range from 60% to 90% of n for all 10^3 simulations. The theory suggests that the proportion h/n of ‘good’ observations is consistently estimated, whereas h is not consistently estimated. Thus, we would expect ĥ_bias and ĥ_sd to grow more slowly than linearly in n, but not to vanish.

For all four set-ups, the simulations confirm that ĥ/n is consistent for λ. In DGP1, where there is only little separation between ‘good’ and ‘outlying’ observations, the performance differs substantially between the cases λ = 0.7 and λ = 0.8. We do not have an explanation for this difference. Nonetheless, the estimation works much better for DGP2, with its separation between ‘good’ observations and ‘outliers’, in both cases λ = 0.7 and λ = 0.8, matching the theoretical discussion in the sections above.

We also considered the information criterion IC_h in (6.1), but omit the results. The performance of IC_h is poor, quite possibly due to the log log n rates involved.

8 Empirical illustrations

We provide two empirical illustrations using the stars data of Rousseeuw and Leroy (1987) and the state infant mortality data of Wooldridge (2015). Both analyses illustrate the estimation of h. The second case also illustrates inference in the LTS model. Therefore, we must determine empirically if the LTS model is appropriate. Indeed, there could be a contamination pattern that is consistent with the ϵ-contamination, with the LTS model, or with neither of those. For both illustrations, we study the source of the data to arrive at reasonable empirical models.

Throughout, we use R version 4.0.2 (R Core Team, 2020), estimating LTS using ltsReg from the robustbase package. Before each LTS call we apply set.seed(0). R code with embedded data is available as Supplementary material.

8.1 The stars data

For this empirical illustration, we consider the data on log light intensity and log temperature for the Hertzsprung–Russell diagram of the star cluster CYG OB1 containing n=47 stars as reported by Rousseeuw and Leroy (1987, Table 2.3). Figure 1 shows a cross plot of the variables, where the log temperature axis is reversed. The majority of observations follow a steep band called the main sequence. Rousseeuw and Leroy (1987) refer to the four stars to the top right of Figure 1 as ‘outliers’.

By consulting the original source of the data in Humphreys (1978), see also (Vansina & De Grève, 1982, Appendix  A), we found that the 4 stars to the right of Figure 1 are of M-type (observations 11, 20, 30, 34) and they are red supergiants. Further, the fifth star from the right is of F-type (observation 7, called 44 Cyg). The next 31 stars (1 doublet) from the right are of B-type and the remaining 11 stars (1 doublet) furthest to the left are of the O-type. The doublets are not exact doublets in Humphreys’ original data, so this in itself should not be seen as evidence against a normality assumption.

We fitted the linear model log light_i = β₁ + β₂ log temperature_i + σε_i. Table 4, left panel, shows LTS estimates for different h values. For h = n = 47, the LTS estimator is the full sample OLS estimator. The β estimates are the ‘raw coefficients’ found by ltsReg, while σ̂ is computed directly from (2.3) without any consistency correction. Figure 1 shows LTS fits for selected values of h. It is seen that the fits rotate when h increases. Table 4 also reports the T_h criterion as a function of h. It is minimized for h = 42, pointing at five ‘outliers’: the four M-stars and the F-star. Figure 1 indicates estimated ‘good’ observations and ‘outliers’ with solid and open dots.
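A sketch of how such a table can be produced is given below. It assumes the starsCYG data shipped with the robustbase package (variables log.Te and log.light for the same 47 stars), uses the raw ltsReg coefficients, and computes σ̂ from (2.3) without a consistency correction; the mapping from the alpha argument to the subset size used internally by ltsReg may differ slightly from h/n, and the T_h column is not reproduced.

```r
## LTS fits of log light intensity on log temperature for a range of h.
library(robustbase)
data(starsCYG)                                  # assumed to match the star data used here
n <- nrow(starsCYG)
for (h in c(25, 36:47)) {
  set.seed(0)                                   # as in the empirical illustrations
  fit <- ltsReg(log.light ~ log.Te, data = starsCYG, alpha = h / n)
  idx <- fit$best                               # estimated 'good' observations
  res <- starsCYG$log.light[idx] -
    cbind(1, starsCYG$log.Te[idx]) %*% fit$raw.coefficients
  cat(h, round(fit$raw.coefficients, 2), round(sqrt(mean(res^2)), 2), "\n")
}
```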

Table 4.

Estimates by LTS and T_h criterion for selecting h

              Full sample                        Sub sample
h        β̂₁       β̂₂      σ̂     T_h        β̂₁       β̂₂      σ̂     T_h
25     −13.62     4.22    0.18   1.72     −13.62     4.22    0.18   1.72
36     −11.49     3.71    0.27   1.98     −11.49     3.71    0.27   1.98
37      −9.00     3.16    0.28   2.49      −9.00     3.16    0.28   2.49
40      −8.58     3.07    0.31   2.13      −8.58     3.07    0.31   2.13
41      −8.50     3.05    0.33   1.26      −8.50     3.05    0.33   1.26
42      −7.40     2.80    0.37   0.39      −7.40     2.80    0.37   0.39
43      −4.06     2.05    0.40   0.69       7.88    −0.65    0.49   2.57
44       1.89     0.70    0.49   0.49       7.74    −0.62    0.51   2.76
45       7.34    −0.53    0.51   2.94       7.58    −0.59    0.53   2.73
46       6.92    −0.44    0.53   2.74       7.12    −0.49    0.55   2.83
47       6.79    −0.41    0.55   2.75

Note. Left panel has full sample. Right panel excludes the F-type star.


The estimation of h by the statistic T_h appears a little shaky in Table 4, left panel. The lowest value is obtained for h = 42, while the value for h = 44 is nearly as low. The slope coefficients β̂₂ change gradually for h > 42 with no obvious choice of h for h > 42. Table 4, right panel, shows corresponding results when dropping the F star from the sample. The results are the same for h ≤ 42. We see that h = 42 is now a clear minimum identifying the four M-type stars as ‘outliers’. The M stars appear to have a masking effect, where after their deletion, the F star emerges as very influential in the sense of Rousseeuw and Leroy (1987, p. 81). Perhaps for this reason, different conclusions are reached by different traditional methods. An LTS index plot points at 5 ‘outliers’, see Section 6.1, while an MM index plot (using lmrob in the R package robustbase) and LMS residuals (using lmsreg in the R package MASS) both point at 4 ‘outliers’, not detecting the F star as ‘outlying’.

Figure 2 shows kernel density plots for the scaled residuals for h = 25, 42, 47. The black, thin lines give kernel densities for the full sample. The red, dashed lines give kernel densities for the estimated ‘good’ observations. The standard normal distribution is shown with a blue, thick line. For h = 42, the red, blue, and part of the black lines coincide, which indicates the normality of the ‘good’ observations. The full sample kernel density has a probability mass in the right tail corresponding to the four giants. There is a slight discrepancy between the full sample and the ‘good’ kernel densities in the region from 2 to 4 corresponding to the F star. By construction this 43rd residual will be outside the range of the 42 ‘good’ residuals, but not by far. Kernel densities are very sensitive to the choice of kernel and bandwidth. For illustration, we chose a Gaussian kernel and bandwidth 1.5h^{−1/5} to get the best match of the red and blue curves for h = 42. With that choice, we can more clearly see discrepancies between the kernel density for the ‘good’ observations and the normal curve for h = 25, 47, that is, for the LTS with a high breakdown point and the OLS estimator, respectively.

Figure 2. Scaled LTS residuals for h = 25, 42, 47. Kernel densities for all residuals (thin) and for ‘good’ residuals (dashed) with fitted standard normal density (thick).

8.2 State infant mortality rates

We consider 1990 data for the United States on infant mortality rates by state, including Washington DC, which has a particularly high infant mortality rate (Wooldridge, 2015, p. 299). The data are available as infmrt in the R package wooldridge and are taken from U.S. Bureau of the Census (1994). We analyse two models.

The first model follows Wooldridge. It is a linear regression of the number of deaths within the first year per 1,000 live births, infmort, on the log of per capita income, lpcinc, the log of physicians per 100,000 members of the civilian population, lphysic, and the log of population in thousands, lpopul. In Figure 3, the graph using circle symbols shows the cumulant-based criterion T_h as a function of h. The OLS regression is obtained for h = n = 51 and has a rather large value of T_h (notice the gap in the T_h-axis), indicating that the full sample model is mis-specified. Choosing h = 50 would lead to one ‘outlier’, which is Washington DC. However, the T_h function is minimized at h = 45, indicating six ‘outliers’: Delaware, Washington DC, Georgia, Texas, California, and Alaska (U.S. Bureau of the Census, 1994, Table 123). The interpretation is not obvious and could be due to a missing regressor.

Figure 3. State infant mortality rates. T_h criterion as a function of h. Model for infmrt shown with circles. Model for loginfmrt shown with squares.

The second model differs in two respects. It applies logarithms to the regressand to stabilize rates close to zero as well as for Washington DC. It also includes a regressor for the log proportion of black people in the population, lblack (U.S. Bureau of the Census, 1992, Table 255), since infant mortality is quite different for white and black infants in most states (U.S. Bureau of the Census, 1994, Table 123). In Figure 3, the graph using square symbols shows T_h versus h for this model. The minimizer is h = 50 with Washington DC as the ‘outlier’. The minimum of 0.08 is small compared with a χ²₂ distribution, so there is no evidence against the LTS model. We note that the T_h function is also quite low for h in the left side of the plot, albeit not as low as for h = 50. The estimated LTS model for h = 50 is

(8.1)

The standard errors are the usual OLS standard errors as the LTS model appears to apply. We conclude that the variables lpcinc and lpopul are not significant.

Changing the regressand to be measured on the original scale introduces a second ‘outlier’, South Dakota, which has one of the lowest infant mortality rates for the black population.

9 Concluding remarks

Estimating the proportion of ‘good’ observations. We consider a few issues regarding the estimation of h and the adequacy of the model in the two empirical examples.

First, the sample sizes, n=47 and n=50, are rather small. The results in Section 6.2 indicate that the estimation of λ=h/n based on the T_h criterion is log h consistent; hence, it may be imprecise in small samples.

Second, the results in Section 6.2 focus on the location-scale LTS model. The empirical illustrations above consider models with regressors. We believe the estimation of λ based on the Th criterion extends to a well-specified multiple LTS regression model. However, as pointed out in Section 8.2, the omission of regressors could interfere with the estimation of λ. This aspect needs further investigation.

Third, the results in Section 6.2 refer to the LTS model, where, once ‘outliers’ have been removed, the remaining ‘good’ observations look normal. In practice, the adequacy of the model in a given data set has to be studied. A formal testing procedure is not yet available for the present model. Estimating λ by the Th criterion is unlikely to be consistent in a model where ‘outliers’ are generated by ϵ-contamination.

Fourth, notwithstanding the above issues, the suggested procedure for selecting h seems to work well in the two empirical illustrations in this section and helped in finding satisfactory LTS models for the data.

Other models of the LTS type. New models and estimators can be generated by replacing the normal assumption in the LTS models with some other distribution. For instance, the uniform leads to the least median of squares (LMS) estimator analysed in Online Supplementary Material, Appendix C, while the Laplace distribution leads to the least trimmed sum of absolute deviations (LTA) estimator (Dodge & Jurečková, 2000; Hawkins & Olive, 1999; Hössjer, 1994, Section 2.7). Ron Butler has suggested to us that the approach in this paper could also be applied to the minimum covariance determinant (MCD) estimator for multivariate location and scale (Butler et al., 1993; Rousseeuw, 1985).

Alternative models for ‘outliers’. The maximum likelihood argument in Section 4 would also work if the ‘outlier’ errors εj rather than ξj have distribution Gj. The analysis would be related to the trimmed likelihood argument of Gallegos and Ritter (2009). However, the resulting LTS estimator would have less attractive asymptotic properties in that model compared to those derived in this paper.

Inference requires a model for both 'good' and 'outlying' observations. In the presented theory, the 'good' and the 'outlying' observations are separated. The traditional approach, as advocated by Huber (1964), is to consider mixture distributions formed by mixing a reference distribution with a contamination distribution. Any subsequent inference on the regression parameter β would require a specific formulation of the ϵ-contaminated distribution. This is a nontrivial practical problem. Instead, the focus has been on showing that the bias of estimators is bounded under contamination (Huber & Ronchetti, 2009, Section 4), while inference is conducted using the asymptotic distribution that assumes all observations are i.i.d. normal, implying that the 'good' observations are truncated normal. This gives a different distribution theory for inference compared with the one presented here, and the simulations in Section 7.1 indicate that it will not control size in general. It is therefore of interest to formulate models allowing 'outliers' under which consistency can be proved. The LTS model in this paper does so. The presented simulations show that the two inferential theories are very different. In practice, LTS estimation should therefore be evaluated in the context of a particular model and inference should be conducted accordingly.

Alternative estimators of h. We have proposed consistent estimators for h/n, but it would be useful to investigate their performance further. In a regression context, it may be worth considering the Forward Search algorithm (Atkinson et al., 2010). Omitted regressors and 'outliers' may confound each other, so a simultaneous search over these may be useful, as in the Autometrics algorithm (Castle et al., 2021; Hendry & Doornik, 2014). Some asymptotic theory for these algorithms is provided in Johansen and Nielsen (2016a, 2016b).

Misspecification tests can be developed for the present model. The asymptotic theory developed here shows that standard normality tests can be applied to the set of estimated ‘good’ observations. Other tests could also be investigated, in particular those that are concerned with functional form or omission of regressors.
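For instance, a normality check of the estimated 'good' residuals can be carried out directly in base R; here res_good is a hypothetical vector holding the h estimated 'good' scaled residuals, and the tests shown are standard tools rather than procedures developed in this paper.

## Normality checks on the estimated 'good' residuals (hypothetical vector res_good).
shapiro.test(res_good)                # Shapiro-Wilk test of normality
qqnorm(res_good); qqline(res_good)    # normal quantile-quantile plot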

More ‘outliers’ than ‘good’ observations. This is allowed in the LTS model and supported by the asymptotic analysis under regularity conditions for the ‘outliers’. This lends some support to the practice of starting the Forward Search algorithm by an LTS estimator for fewer than half of the observations (Atkinson et al., 2010). Whether it makes more sense to model the ‘outliers’ or the ‘good’ observations as normally distributed in this situation must rest on a careful consideration of the data and the substantive context.

Acknowledgments

Comments from R. Butler, Y. Gao, O. Hao, C. Henning, referees, and the associate editors are gratefully acknowledged.

Data availability

All the data that we use are publicly available; the precise sources are cited in the article. The source code files in R are available as supplementary material.

Supplementary material

Supplementary material is available at Journal of the Royal Statistical Society: Series B online.

Appendix A. Derivation of the LTS likelihood

We prove that P(z_j − ϵ < y_j ≤ z_j | y_i for i ∈ ζ) = Δ_ϵ G_j(z̃_{jβσ}) = G_j(z̃_{jβσ}) − G_j(z̃^ϵ_{jβσ}), where z̃^ϵ_{jβσ} denotes z̃_{jβσ} evaluated with z_j replaced by z_j − ϵ.

First, we prove that P_0 = P(y_j ≤ z_j | ·) = G_j(z̃_{jβσ}), where the dot indicates the conditioning set. Since ε_j = (y_j − β′x_j)/σ and z_{jβσ} = (z_j − β′x_j)/σ, we have that P_0 = P(ε_j ≤ z_{jβσ} | ·).

Recall from (3.1) that the 'outlier' errors are ε_j = (max_{i∈ζ} y_{iβσ} + ξ_j)1(ξ_j > 0) + (min_{i∈ζ} y_{iβσ} + ξ_j)1(ξ_j < 0), where y_{iβσ} = ε_i. Further, ξ_j has distribution function G_j, which is continuous at zero. As a consequence, P(min_{i∈ζ} y_{iβσ} ≤ ε_j ≤ max_{i∈ζ} y_{iβσ} | ·) = 0 and P(ε_j ≤ min_{i∈ζ} y_{iβσ} | ·) = P(ξ_j < 0) = G_j(0). Thus, P_0 = P(ε_j ≤ z_{jβσ} | ·) = G_j(0) for min_{i∈ζ} y_{iβσ} ≤ z_{jβσ} ≤ max_{i∈ζ} y_{iβσ}.

Let P_0 = P_1 + P_2, where P_1 = P{(ε_j ≤ z_{jβσ}) ∩ (ξ_j < 0) | ·} and P_2 = P{(ε_j ≤ z_{jβσ}) ∩ (ξ_j > 0) | ·}.

By (3.1), on the event (ξ_j < 0) we have (ε_j ≤ z_{jβσ}) = (ξ_j ≤ z_{jβσ} − min_{i∈ζ} y_{iβσ}), so that P_1 can be written as P{ξ_j ≤ min(z_{jβσ} − min_{i∈ζ} y_{iβσ}, 0) | ·}. Hence, P_1 = G_j{min(z_{jβσ} − min_{i∈ζ} y_{iβσ}, 0)}.

Similarly, P_2 = P{(ξ_j ≤ z_{jβσ} − max_{i∈ζ} y_{iβσ}) ∩ (ξ_j > 0) | ·} by (3.1). If z_{jβσ} < max_{i∈ζ} y_{iβσ}, then the intersection is empty. If instead z_{jβσ} > max_{i∈ζ} y_{iβσ}, then the intersection is the set (0 < ξ_j ≤ z_{jβσ} − max_{i∈ζ} y_{iβσ}). Hence, P_2 = {G_j(z_{jβσ} − max_{i∈ζ} y_{iβσ}) − G_j(0)}1(z_{jβσ} > max_{i∈ζ} y_{iβσ}).

Note also that if z_{jβσ} < min_{i∈ζ} y_{iβσ}, then z_{jβσ} < max_{i∈ζ} y_{iβσ}. And, if z_{jβσ} > max_{i∈ζ} y_{iβσ}, then z_{jβσ} > min_{i∈ζ} y_{iβσ}. In combination, we have P_0 = G_j{min(z_{jβσ} − min_{i∈ζ} y_{iβσ}, 0)} + {G_j(z_{jβσ} − max_{i∈ζ} y_{iβσ}) − G_j(0)}1(z_{jβσ} > max_{i∈ζ} y_{iβσ}).

Recall the notation z̃_{jβσ} = (z_{jβσ} − min_{i∈ζ} ε_i)1(z_{jβσ} < min_{i∈ζ} ε_i) + (z_{jβσ} − max_{i∈ζ} ε_i)1(z_{jβσ} > max_{i∈ζ} ε_i). We then get P_0 = P(y_j ≤ z_j | ·) = G_j(z̃_{jβσ}).

Second, and similarly, P(y_j ≤ z_j − ϵ | ·) = G_j(z̃^ϵ_{jβσ}). Combining, the desired result follows.

Appendix B. Proofs of asymptotic theory for the LTS model

We start with some preliminary extreme value results, which allow analysis of the case with more 'good' observations than 'outliers'. Then follow some results on empirical processes, which are needed for the general case.

B.1 Extreme values

For a distribution function F, define the quantile function Q(ψ) = inf{c: F(c) ≥ ψ}.

 
Lemma B.1

Suppose F(c) = 0 for c < 0. Let ψ_n = o_P(1). Then Q_n(ψ_n) = O_P(1).

 
Proof.

Let a small ϵ > 0 be given. Then a finite x₀ exists so that f = F(x₀) ≥ 1 − ϵ. We show P_n = P(A_n) ≤ 2ϵ, where A_n = {Q_n(ψ_n) > x₀}. Applying F_n, we get A_n = {ψ_n > F_n(x₀)}. By the Law of Large Numbers, F_n(x₀) = f + o_P(1). Hence, if B_n = {F_n(x₀) > f − ϵ}, then P(B_n) > 1 − ϵ for large n. Since A_n = (A_n ∩ B_n) ∪ (A_n ∩ B_nᶜ), we have A_n ⊂ (A_n ∩ B_n) ∪ B_nᶜ. Here, P(B_nᶜ) ≤ ϵ by construction. Moreover, A_n ∩ B_n ⊂ (ψ_n > f − ϵ), where P(ψ_n > f − ϵ) ≤ ϵ for large n since ψ_n = o_P(1) by assumption. Thus, P_n ≤ 2ϵ. □

 
Lemma B.2

Suppose ε_1, …, ε_n are i.i.d. with E|ε_i|^q < ∞ for some q > 0. Then ε_(n) = o_P(n^{1/p}) for any p < q.

 
Proof.

We show P_n = P{ε_(n) ≥ ϵn^{1/p}} → 0 for any ϵ > 0. Write P_n = P{⋃_{i=1}^n (ε_i ≥ ϵn^{1/p})}. Boole's inequality gives P_n ≤ Σ_{i=1}^n P(ε_i ≥ ϵn^{1/p}) = nP(ε_1 ≥ ϵn^{1/p}). Markov's inequality gives P_n ≤ ϵ^{−q} n^{1−q/p} E|ε_1|^q, which vanishes for p < q.          □
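As an informal numerical illustration of Lemma B.2 (not part of the proof), the following R sketch draws errors with a finite third moment (so q = 3 in the lemma) and shows the scaled maximum ε_(n)/n^{1/2} shrinking for p = 2 < q; the error distribution is an illustrative choice.

## Illustration of Lemma B.2: for i.i.d. errors with E|eps|^3 finite,
## the maximum eps_(n) is o_P(n^(1/2)).
set.seed(1)
for (n in c(1e3, 1e4, 1e5, 1e6)) {
  eps <- abs(rt(n, df = 4))                       # finite moments up to order < 4
  cat("n =", n, " max(eps)/sqrt(n) =", max(eps) / sqrt(n), "\n")
}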

B.2 Some initial properties of the LTS estimator

We consider the LTS estimator under the sequence of data generating processes defined in Section 5.1. The main challenge is to show that δ̂ is close to the Binomial(n−h_n, ρ)-distributed variable δ_n = Σ_{j∉ζ_n} 1(ε_j < min_{i∈ζ_n} ε_i). In the following lemmas, we condition on the sequence δ_n, so that the randomness stems from the 'good' errors ε_(δ_n+1), …, ε_(δ_n+h_n) and the magnitudes of the 'outliers', ε̲_(δ_n+1−j) for j ≤ δ_n and ε̄_(j−δ_n−h_n) for j > δ_n + h_n. The unconditional statements in the theorems about δ̂ are then derived as follows. If P(δ̂ − δ_n ∈ I_n | δ_n) → 1 for an interval I_n and a sequence δ_n, then by the law of iterated expectations

(B1) P(δ̂ − δ_n ∈ I_n) = E{P(δ̂ − δ_n ∈ I_n | δ_n)} → 1,

due to the dominated convergence theorem, because P(δ̂ − δ_n ∈ I_n | δ_n) is bounded.

We give detailed proofs for the case δ̂ > δ_n, so that some of the small 'good' observations are considered left 'outliers' and some of the small right 'outliers' are considered 'good'. The case δ̂ < δ_n is analogous, since we can multiply all observations by −1 and relabel left and right. When considering δ̂ > δ_n, we note that δ̂ − δ_n ≤ n̄, the number of right 'outliers'. Recall ρ = G(0). Hence, if ρ = 1, then all outliers are to the left. In particular, due to the binomial construction of δ_n, we have n̄ = 0 a.s. when ρ = 1, so that the event δ̂ > δ_n is a null set. Thus, when analysing (δ̂ − δ_n)/h_n, it suffices to consider δ̂ > δ_n and ρ < 1.
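To fix ideas, a minimal R sketch of the location-scale data generating process of (3.1) and of the count δ_n of left 'outliers' follows. The choice of G (a two-sided exponential with mass ρ below zero) is purely illustrative and not taken from the paper.

## Location-scale LTS model: h 'good' N(0,1) errors, n - h 'outliers' outside their range.
set.seed(2)
n <- 100; h <- 60; rho <- 0.3
eps_good <- rnorm(h)
sgn <- ifelse(runif(n - h) < rho, -1, 1)      # P(xi_j < 0) = rho, G continuous at zero
xi  <- sgn * rexp(n - h)                      # illustrative 'outlier' magnitudes
eps_out <- ifelse(xi > 0, max(eps_good) + xi, min(eps_good) + xi)
delta_n <- sum(eps_out < min(eps_good))       # number of left 'outliers', Binomial(n - h, rho)
delta_n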

The next lemma concerns cases where ŝ = δ̂ − δ_n is small. We show that the LTS estimators are close to the infeasible OLS estimators for the 'good' observations. We note that, by Lemma B.2 and Assumption 5.1(ii), which requires E|ε_i|^{2+ω} < ∞, we can find a small η > 0 so that ε_(δ_n+1), ε_(δ_n+h_n) = o_P(h_n^{1/2−η}). We write ε_(δ̂+i)^p for {ε_(δ̂+i)}^p.

 
Lemma B.3

Consider the LTS estimator under Assumption 5.1(ii). Let δ̂ = δ_n + O_P(h_n^η) for some small η > 0 as defined above. Then, h_n^{1/2}(μ̂ − μ̂_{δ_n}) and σ̂² − σ̂²_{δ_n} are o_P(1).

 
Proof.
The estimators μ̂ and σ̂² are formed from the sample moments of ε_(δ̂+1), …, ε_(δ̂+h_n). Let E_n^p = Σ_{i=1}^{h_n} ε_(δ̂+i)^p − Σ_{i=1}^{h_n} ε_(δ_n+i)^p. We need to show that E_n^1 = o_P(h_n^{1/2}) and E_n^2 = o_P(h_n). As remarked above, we condition on δ_n and consider δ̂ > δ_n while assuming ρ < 1. Then
The first and the fourth sums cancel. In the second sum, change the summation index to j = i − h_n − δ_n + δ̂, so that δ̂ + i = δ_n + h_n + j, and replace ε_(δ_n+h_n+j) = ε_(δ_n+h_n) + ε̄_(j). This gives
Here, ε_(δ_n+h_n) is the maximum and ε_(δ_n+i) is the ith order statistic of the 'good' errors.
For p = 1, we find ε_(δ_n+i) ≥ ε_(δ_n+1) and ε̄_(i) ≤ ε̄_(δ̂−δ_n), so that
Here, ε̄_(δ̂−δ_n) is the (δ̂−δ_n)/n̄ empirical quantile of the Ḡ distribution. By assumption, δ̂ − δ_n = O_P(h_n^η) and n̄/h_n → ω̄ = (1−λ)(1−ρ)/λ > 0, so that (δ̂−δ_n)/n̄ = O_P(h_n^{η−1}) = o_P(1). Lemma B.1 then shows ε̄_(δ̂−δ_n) = O_P(1). Further, Lemma B.2, using Assumption 5.1(ii) that E|ε_i|^{2+ω} < ∞, shows that ε_(δ_n+1), ε_(δ_n+h_n) are o_P(h_n^{1/2−η}) for some η > 0. In combination, E_n^1 = O_P(h_n^η){o_P(h_n^{1/2−η}) + O_P(1)} = o_P(h_n^{1/2}).
For p = 2, we find similarly, using the inequality (x+y)² ≤ 2(x²+y²),
Apply the above bounds δ̂ − δ_n = O_P(h_n^η), ε_(δ_n+h_n) = o_P(h_n^{1/2−η}) and ε̄_(δ̂−δ_n) = O_P(1) to get that E_n^2 = O_P(h_n^η)[o_P{h_n^{2(1/2−η)}} + O_P(1)] = o_P(h_n).             □

The next lemma is the main ingredient in showing consistency of δ̂. It is convenient to define the sequences

(B2) s̲_n = h_n^η and s̄_n = |ε_(δ_n+h_n)|^{−1/2} h_n, for a small η > 0.

We note that, by Assumption 5.1(iii), ε_i has a distribution function with infinite support. Hence, the extreme value ε_(δ_n+h_n) diverges to infinity and ε_(δ_n+h_n)^{−1} = o_P(1). By Assumption 5.1(ii), Lemma B.2 applies, so that ε_(δ_n+h_n) = o_P(h_n^{1/2−η}) for some η > 0. Then, for large h_n and small η, 0 < s̲_n < s̄_n, where s̄_n/h_n = o_P(1).

 
Lemma B.4

Suppose Assumption 5.1 holds and let ρ < 1 and 0 < λ < 1. Then, for any η > 0, we have min_{s̲_n ≤ s ≤ h_n−s̄_n} h_n^{1−η}(σ̂²_{δ_n+s} − σ̂²_{δ_n}) → ∞ in probability.

 
Proof.
We condition on δ_n, the number of left 'outliers'. Recall that the ordered 'good' observations are ε_(δ_n+1), …, ε_(δ_n+h_n). Let ε_(δ_n+h_n+j) = ε_(δ_n+h_n) + ε̄_(j) for 1 ≤ j ≤ n̄. Expand, see Online Supplementary Material, Section D,
(B3)
with coefficients A_n = A_{n1} − A_{n2} + 2A_{n3} − 2A_{n4}, where
We find a lower bound for A_n. Notice that A_{n1}, A_{n3} ≥ 0. Further, A_{n2} ≤ h_n^{−1} Σ_{i=1}^s ε_(δ_n+i)² = B_{n2}, say. For A_{n4}, use Jensen's inequality, add further summands, and use the Law of Large Numbers under Assumption 5.1(i) for the unordered normal 'good' errors ε_{δ_n+i} to get
(B4)
Further, we have h_n^{−1} Σ_{i=1}^s {ε_(δ_n+h_n) − ε_(δ_n+i)} ≤ (s/h_n){ε_(δ_n+h_n) − ε_(δ_n+1)} = B_{n4}, say, so that |A_{n4}| ≤ B_{n4}{1 + o_P(1)}, where the remainder term from (B4) is uniform in s. In combination,
(B5)
where the o_P(1) term is uniform in s. We analyse the ranges s̄_n ≤ s ≤ h_n − s̄_n and s̲_n ≤ s ≤ s̄_n separately.

1. Consider s̄_n ≤ s ≤ h_n − s̄_n, where s̄_n/h_n = |ε_(δ_n+h_n)|^{−1/2}. We start by finding a lower bound to (s/h_n)(1 − s/h_n)ε_(δ_n+h_n)². The range of interest is s̄_n/h_n ≤ s/h_n ≤ 1 − s̄_n/h_n. As argued above, s̄_n/h_n = o_P(1). The function x(1−x) is concave with roots at x = 0 and x = 1. Then, for x₀ ≤ x ≤ 1 − x₀ with 0 < x₀ < 1/2, the function x(1−x) is minimized at the endpoints x₀ and 1 − x₀. Therefore, x(1−x) ≥ x₀(1−x₀) ≥ x₀/2. Thus, 2(s/h_n)(1 − s/h_n) ≥ s̄_n/h_n = |ε_(δ_n+h_n)|^{−1/2} on the considered range.

Next, we bound B_{n2} ≤ 1 + o_P(1) uniformly in s, as in (B4).

For B_{n4} = (s/h_n){ε_(δ_n+h_n) − ε_(δ_n+1)}, use s/h_n ≤ 1, so that B_{n4} ≤ ε_(δ_n+h_n) − ε_(δ_n+1).

Thus, (B5) reduces to 2S_s ≥ |ε_(δ_n+h_n)|^{−1/2}ε_(δ_n+h_n)² − 2{1 + 2ε_(δ_n+h_n) − 2ε_(δ_n+1)}{1 + o_P(1)}. Since ε_(δ_n+h_n) → ∞ in probability and ε_(δ_n+1)/ε_(δ_n+h_n) = −1 + o_P(1) by Assumption 5.1(iv), we get that min_{s̄_n≤s≤h_n−s̄_n} 2S_s ≥ |ε_(δ_n+h_n)|^{3/2}{1 + o_P(1)}. In particular, min_{s̄_n≤s≤h_n−s̄_n} h_n^{1−η}S_s → ∞ in probability.

2. Consider s̲_n ≤ s ≤ s̄_n, where s̲_n = h_n^η for any η > 0 and s̄_n = |ε_(δ_n+h_n)|^{−1/2} h_n, see (B2). We find bounds for the B_n terms in (B5).

First, B_{n2} = h_n^{−1} Σ_{i=1}^s ε_(δ_n+i)². Write B_{n2} = h_n^{−1}{Σ_{i=1}^{r_n} ε_(δ_n+i)² + Σ_{i=r_n+1}^s ε_(δ_n+i)²}, where r_n = h_n^{η/2}. In the left tail, the squared order statistics decrease with increasing index with large probability. Hence, we can bound B_{n2} ≤ h_n^{−1}{r_n ε_(δ_n+1)² + (s−r_n)ε_(δ_n+r_n)²}. Bounding (s−r_n) ≤ s and r_n/s ≤ r_n/s̲_n = h_n^{−η/2}, we get B_{n2} ≤ (s/h_n){h_n^{−η/2}ε_(δ_n+1)² + ε_(δ_n+r_n)²}. By Assumption 5.1(iv,v), ε_(δ_n+1)/ε_(δ_n+h_n) = −1 + o_P(1) and ε_(δ_n+r_n)/ε_(δ_n+1) ≤ C_η + o_P(1) for some C_η < 1. Then, B_{n2} ≤ (s/h_n)ε_(δ_n+h_n)²[h_n^{−η/2}{1 + o_P(1)} + {C_η² + o_P(1)}], which reduces to (s/h_n)ε_(δ_n+h_n)²{C_η² + o_P(1)}.

Second, B_{n4} = (s/h_n){ε_(δ_n+h_n) − ε_(δ_n+1)} by definition.

Insert the bounds for B_{n2}, B_{n4} into (B5), along with s/h_n ≤ s̄_n/h_n = |ε_(δ_n+h_n)|^{−1/2}, to get
Using that ε_(δ_n+1)/ε_(δ_n+h_n) = −1 + o_P(1) and ε_(δ_n+h_n)^{−1} = o_P(1) by Assumption 5.1(iii), we get that S_s ≥ (s/h_n)ε_(δ_n+h_n)²{1 − C_η² + o_P(1)}. Further, s ≥ s̲_n, where s̲_n/h_n = h_n^{η−1}. Thus, min_{s̲_n≤s≤s̄_n} h_n^{1−η}S_s → ∞ in probability.  □

B.3 Fewer ‘outliers’ than ‘good’ observations

 
Proof of Theorem 5.2.

First, we show that ŝ = δ̂ − δ_n = o_P(h_n^η) for any η > 0. It suffices to show ŝ = O_P(h_n^η) for each η.

As discussed above, we consider the case δ̂ > δ_n, so that some of the small 'good' observations are considered left 'outliers' and some of the small right 'outliers' are considered 'good'. The case δ̂ < δ_n is analogous. When considering δ̂ > δ_n, we note that ŝ = δ̂ − δ_n ≤ n̄, the number of right 'outliers'. Moreover, ŝ ≤ n̄ ≤ n − h_n, since there are n̄ right 'outliers' and n − h_n 'outliers' in total.

Choose s̲_n = h_n^η and s̄_n = |ε_(δ_n+h_n)|^{−1/2} h_n as in (B2). Since ŝ/h_n ≤ (n−h_n)/h_n converges to a value less than unity, then ŝ/h_n ≤ 1 − s̄_n/h_n for large n. Thus, if we show P(s̲_n ≤ ŝ ≤ h_n − s̄_n) → 0, then P(ŝ < s̲_n) → 1 as desired. Now, ŝ is the minimizer of S_s = σ̂²_{δ_n+s} − σ̂²_{δ_n}. Since S_0 = 0, then P(s̲_n ≤ ŝ ≤ h_n − s̄_n) → 0 if min_{s̲_n≤s≤h_n−s̄_n} h_n^{1−η}(σ̂²_{δ_n+s} − σ̂²_{δ_n}) → ∞ in probability. This follows from Lemma B.4 using Assumption 5.1.

Second, since δ̂ − δ_n = o_P(h_n^η), Lemma B.3 using Assumption 5.1(ii) shows that h_n^{1/2}(μ̂ − μ̂_{δ_n}) and σ̂² − σ̂²_{δ_n} are o_P(1).

Third, the i.i.d. Law of Large Numbers and Central Limit Theorem using Assumption 5.1(i) show that h_n^{1/2}(μ̂_{δ_n} − μ)/σ is asymptotically normal, while σ̂²_{δ_n} is consistent for σ².      □

B.4 Marked empirical processes evaluated at quantiles

We start with some preliminary results on marked empirical processes evaluated at quantiles.

Consider random variables ε_i for i = 1, …, n and define the marked empirical distribution and its expectation, for c ≥ 0, by

(B6) F_n^p(c) = n^{−1} Σ_{i=1}^n ε_i^p 1(ε_i ≤ c) and F̄^p(c) = E{ε_1^p 1(ε_1 ≤ c)}.

For p = 0, let F_n^0 = F_n. We also define the quantile function Q(ψ) = inf{c: F(c) ≥ ψ} and the empirical quantile function Q_n(ψ) = inf{c: F_n(c) ≥ ψ}. The first result follows from the theory of empirical quantile processes.
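For concreteness, a small R sketch of these objects follows, for illustrative positive errors; Fnp implements the marked empirical distribution F_n^p(c) as written in (B6) and Qn the empirical quantile Q_n(ψ), both hypothetical helper names.

## Marked empirical distribution and empirical quantile (illustrative errors).
Fnp <- function(eps, c, p = 0) mean(eps^p * (eps <= c))
Qn  <- function(eps, psi) quantile(eps, probs = psi, type = 1, names = FALSE)
set.seed(3)
eps <- rexp(1000)
Fnp(eps, c = 1, p = 2)   # quadratic marks, evaluated at c = 1
Qn(eps, 0.9)             # empirical 90% quantile, inf{c: F_n(c) >= psi}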

 
Lemma B.5

Suppose F is regular (Definition 5.1). Then, for all ζ > 0,

  (a) n^{1/2}[F_n{Q(ψ)} − ψ] converges in distribution on D[0,1] to a Brownian bridge;

  (b) sup_{0≤ψ≤1} |n^{1/2}[F{Q_n(ψ)} − ψ] + n^{1/2}[F_n{Q(ψ)} − ψ]| = o(n^{ζ−1/4}) a.s.

 
Proof.

(a) This is Billingsley (1968, Theorem 16.4).

(b) Let D(ψ) = f{Q(ψ)} n^{1/2}{Q_n(ψ) − Q(ψ)} and write the object of interest as the sum of n^{1/2}[F{Q_n(ψ)} − ψ] − D(ψ) and n^{1/2}[F_n{Q(ψ)} − ψ] + D(ψ). These two terms are o(n^{ζ−1/4}) a.s. by Csörgö (1983, Corollaries 6.2.1, 6.2.2), uniformly in ψ.  □

We need the following Glivenko–Cantelli result.

 
Lemma B.6

Let ε_i be i.i.d. continuous, positive random variables with E|ε_i|^p < ∞. Then sup_{c>0} |F_n^p(c) − F̄^p(c)| = o_P(1).

 
Proof.

We note that F̄^p is nondecreasing with F̄^p(∞) < ∞. Since F̄^p is continuous, for any δ > 0 there exist a finite integer K ∈ N and chaining points 0 = c_0 < c_1 < ⋯ < c_K = ∞ so that max_{1≤k≤K}{F̄^p(c_k) − F̄^p(c_{k−1})} ≤ δ.

Since F_n^p and F̄^p are both nondecreasing, we get for c_{k−1} < c ≤ c_k the bounds
In combination, |F_n^p(c) − F̄^p(c)| ≤ max_{j∈{k−1,k}} |F_n^p(c_j) − F̄^p(c_j)| + {F̄^p(c_k) − F̄^p(c_{k−1})}. The last term is bounded by δ, so that sup_{c>0} |F_n^p(c) − F̄^p(c)| ≤ max_{1≤k≤K} |F_n^p(c_k) − F̄^p(c_k)| + δ. For each k, F_n^p(c_k) − F̄^p(c_k) = o_P(1) by the Law of Large Numbers, requiring ε_i to be i.i.d. with E|ε_i|^p < ∞. Since K is finite, the maximum over k also vanishes in probability. By choosing δ sufficiently small, the overall bound is seen to be o_P(1).  □

The next result is inspired by Johansen and Nielsen (2016a, Lemma D.11).

 
Lemma B.7

Let p ∈ N₀. Suppose ε_i is positive, regular and Eε_i^q < ∞ for some q > 2p. Then, sup_{1/(n+1)≤ψ≤n/(n+1)} |F_n^p{Q_n(ψ)} − F̄^p{Q(ψ)}| = o_P(1).

 
Proof.
For p = 0, ϕ_n = n^{1/2}[F_n^p{Q_n(ψ)} − F̄^p{Q(ψ)}] satisfies ϕ_n = n^{1/2}[F{Q_n(ψ)} − ψ]. Lemma B.5 shows that ϕ_n converges in distribution to a Brownian bridge as a process in ψ. By Billingsley (1968, pp. 142–143), sup_{0≤ψ≤1} ϕ_n converges in distribution, so that sup_{0≤ψ≤1} ϕ_n = O_P(1) and the result follows. For p ∈ N, add and subtract F̄^p{Q_n(ψ)} to get
Term 1. This is o_P(1), since sup_{c>0} |F_n^p(c) − F̄^p(c)| = o_P(1) by Lemma B.6.
Term 2. Write S_n(ψ) = F̄^p{Q_n(ψ)} − F̄^p{Q(ψ)} as an integral. First, change variable u = F(c), du = f(c)dc, so that c = Q(u). Then apply the mean value theorem, so that
for an intermediate point ψ* such that Q(ψ*) belongs to the interval from Q(ψ) to Q_n(ψ). Since sup_{0≤ψ≤1} ϕ_n = O_P(1), we must show that {Q(ψ*)}^p = o_P(n^{1/2}). It suffices to show that Q(ψ) and Q_n(ψ) are o_P{n^{1/(2p)}} for ψ = n/(n+1).

Consider Q_n(ψ). Write Q_n(ψ) = max_{1≤i≤n} ε_i. Lemma B.2 shows Q_n(ψ) = o_P{n^{1/(2p)}} for 2p < q, since Eε_i^q < ∞ by assumption.

Consider Q(ψ). Bound |Q(ψ)| ≤ c_n, where c_n satisfies (n+1)^{−1} = P(ε_1 > c_n). We must show c_n = o{n^{1/(2p)}}. It suffices that c_n^q = O(n) for q > 2p. By the Markov inequality, P(|ε_1| > c_n) ≤ c_n^{−q} Eε_1^q, so that c_n^q ≤ (n+1) Eε_1^q = O(n).  □

B.5 More ‘outliers’ than ‘good’ observations

The next lemma is needed when more than half of the observations are 'outliers'. As σ̂²_δ is not diverging, additional regularity conditions are needed to ensure that σ̂²_δ > σ̂²_{δ_n}.

 
Lemma B.8

Suppose Assumption 5.2(i) holds. Let 1 ≤ ω̄ = (1−ρ)(1−λ)/λ < ∞. Recall s̄_n = |ε_(δ_n+h_n)|^{−1/2} h_n from (B2). Then, conditional on δ_n, an ϵ > 0 exists so that min_{h_n−s̄_n≤s≤n̄} (σ̂²_{δ_n+s} − σ̂²_{δ_n}) ≥ ϵ + o_P(1) for large n.

 
Proof.

The errors ε_(δ_n+i) are standard normal order statistics for 1 ≤ i ≤ h_n and ε_(δ_n+h_n+j) = ε_(δ_n+h_n) + ε̄_(j) for 1 ≤ j ≤ n̄, where ε̄_j is Ḡ-distributed. We let ε̄_(0) = 0.

It suffices to show that σ̂²_{δ_n+s}/σ² ≥ 1 + ϵ + o_P(1) uniformly in s, since σ̂²_{δ_n}/σ² = 1 + o_P(1) by the Law of Large Numbers using Assumption 5.1(i) applied to the sample variance of the 'good' errors. We consider separately the cases h_n ≤ s ≤ n̄ and h_n − s̄_n ≤ s < h_n.

1. Consider h_n ≤ s ≤ n̄. In this case, σ̂²_{δ_n+s} is the sample variance of ε_(δ_n+s+j) = ε_(δ_n+h_n) + ε̄_(s−h_n+j) for 1 ≤ j ≤ h_n. Sample variances are location invariant, so that σ̂²_{δ_n+s} = h_n^{−1} Σ_{j=1}^{h_n} ε̄_(s−h_n+j)² − {h_n^{−1} Σ_{j=1}^{h_n} ε̄_(s−h_n+j)}². Let A_{s/n̄} = {s/n̄ − ω̄^{−1} < Ḡ(ε̄_1) ≤ s/n̄}.

We argue that min_{h_n≤s≤n̄} σ̂²_{δ_n+s}/σ² ≥ v̄ + o_P(1), where v̄ = min_{ω̄^{−1}≤ς≤1} Var(ε̄_1 | A_ς). This suffices, as v̄ > 1 by Assumption 5.2(i). Write Σ_{j=1}^{h_n} ε̄_(s−h_n+j)^p = Σ_{k=1}^{n̄} ε̄_k^p 1{ε̄_(s−h_n) < ε̄_k ≤ ε̄_(s)} and let Ḡ_n^p(c) = n̄^{−1} Σ_{i=1}^{n̄} ε̄_i^p 1(ε̄_i ≤ c) and Ḡ_n^{−1}(ψ) = inf{c: Ḡ_n(c) ≥ ψ}, so that ε̄_(k) = Ḡ_n^{−1}(k/n̄). Then,
Apply Lemma B.7 with F = Ḡ and n = n̄, requiring moments of order larger than 4 and the regularity of Assumption 5.2(i), so that, uniformly in h_n ≤ s ≤ n̄,
Now, h_n/n̄ → ω̄^{−1}, where ω̄ ≥ 1 by construction. Then, Eε̄_1^p 1{s/n̄ − h_n/n̄ < Ḡ(ε̄_1) ≤ s/n̄} = Eε̄_1^p 1_{A_{s/n̄}} + o_P(1) uniformly in h_n ≤ s ≤ n̄. Noting that h_n/n̄ = n̄^{−1} Σ_{j=1}^{h_n} ε̄_(s−h_n+j)^0 = E1_{A_{s/n̄}} + o_P(1), we get
so that σ̂²_{δ_n+s}/σ² = E(ε̄_1² | A_{s/n̄}) + o_P(1) − {E(ε̄_1 | A_{s/n̄}) + o_P(1)}². Since Eε̄_1 1_{A_{s/n̄}} ≤ Eε̄_1 < ∞ and E1_{A_{s/n̄}} = ω̄^{−1} > 0 uniformly in s, we get E(ε̄_1 | A_{s/n̄}) ≤ ω̄ Eε̄_1 < ∞. Thus,
Since the remainder terms are uniform in s, we get min_{h_n≤s≤n̄} σ̂²_{δ_n+s}/σ² ≥ v̄ + o_P(1) as desired.
2. Consider h_n − s̄_n ≤ s < h_n, where s̄_n = (2 log h_n)^{−1/4} h_n, see (B2). In this case, among the h_n retained observations there are at least h_n − s̄_n 'outliers' and at most s̄_n 'good' observations. Expand,
(B7)
see Online Supplementary Material, Section D, where
We note that A_{n1}, A_{n2}, A_{n3}, A_{n4} ≥ 0. Therefore, σ̂²_{δ_n+s}/σ² ≥ A_{n2}.

We argue, as in part 1, that A_{n2} ≥ v̄ + o_P(1). Indeed, since 1 > s/h_n ≥ 1 − s̄_n/h_n → 1, we have n̄^{−1} Σ_{j=1}^{h_n−s̄_n} ε̄_(j)^p ≤ n̄^{−1} Σ_{j=1}^{s} ε̄_(j)^p ≤ n̄^{−1} Σ_{j=1}^{h_n} ε̄_(j)^p. Both bounds equal Eε̄_1^p 1_{A_{ω̄^{−1}}} + o_P(1), so that we can proceed as in part 1.    □

 
Proof of Theorem 5.3.

We will show that ŝ = δ̂ − δ_n = o_P(h_n^α) for any α > 0. It suffices to show that P(δ̂ − δ_n > h_n^α) → 0 for ρ < 1, as remarked in Section 5.1. We consider λ, ρ < 1, so that ω̄ = (1−ρ)(1−λ)/λ satisfies 0 < ω̄ < ∞. The case ω̄ < 1 was covered in the proof of Theorem 5.2. Thus, suppose ω̄ ≥ 1.

Recall s̲_n, s̄_n from (B2), and choose η < α. In particular, h_n^α > s̲_n for large n, so that P(ŝ > h_n^α) ≤ P(ŝ ≥ s̲_n). We show that the latter probability vanishes. Note that ŝ ≤ n̄.

We have that ŝ is the minimizer of σ̂²_{δ_n+s} − σ̂²_{δ_n}, which is zero for s = 0, that is, for δ = δ_n. Lemmas B.4 and B.8, using Assumption 5.2(i), show that σ̂²_{δ_n+s} − σ̂²_{δ_n} asymptotically has a uniform, positive lower bound on each of the intervals s̲_n ≤ s ≤ h_n − s̄_n and h_n − s̄_n ≤ s ≤ n̄. Thus, σ̂²_{δ_n+s} − σ̂²_{δ_n} is bounded away from zero for s ≥ s̲_n, so that P(ŝ ≥ s̲_n) → 0.

A similar argument applies for δ̂ − δ_n < −h_n^α using Assumption 5.2(ii). The limiting results for μ̂, σ̂² then follow as in the proof of Theorem 5.2.                      □

B.6 The OLS estimator in the LTS model

 
Proof of Theorem 5.4.
The sample average satisfies (μ̄ − μ)/σ = n^{−1} Σ_{i=1}^n ε_i. Since ρ = 0, there are only right 'outliers'. Separate the 'good' observations ε_i for i = 1, …, h_n, with maximum ε_(h_n), from the 'outliers' ε_{h_n+j} = ε_(h_n) + ε̄_j for j = 1, …, n − h_n, to get
(B8) (μ̄ − μ)/σ = n^{−1} Σ_{i=1}^{h_n} ε_i + {(n−h_n)/n} ε_(h_n) + n^{−1} Σ_{j=1}^{n−h_n} ε̄_j.
Under Assumption 5.1(i,iii), the first sum vanishes by the Law of Large Numbers, while ε_(h_n) → ∞ in probability. Further, (n−h_n)/n → 1 − λ. The second sum converges to (1−λ)μ_G by the Law of Large Numbers. Combine to see that |ε_(h_n)|^{−1}(μ̂_OLS − μ)/σ = 1 − λ + o_P(1).            □
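A quick numerical illustration of this limit follows (an informal check only, with an illustrative choice of Ḡ). For ρ = 0 and λ = 0.7, the scaled OLS location error should approach 1 − λ = 0.3, although slowly, since μ_G/ε_(h_n) vanishes only at a (2 log h_n)^{−1/2} rate.

## Theorem 5.4 illustration: |eps_(h)|^{-1} times the mean error is close to 1 - lambda.
set.seed(4)
n <- 1e5; lambda <- 0.7; h <- round(lambda * n)
eps_good <- rnorm(h)                          # 'good' errors
eps_out  <- max(eps_good) + rexp(n - h)       # right 'outliers' only (rho = 0); mu_G = 1
eps <- c(eps_good, eps_out)
mean(eps) / max(eps_good)                     # roughly 1 - lambda = 0.3 (slow convergence)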

References

Alfons, A., Croux, C., & Gelper, S. (2013). Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248. https://doi.org/10.1214/12-AOAS575

Atkinson, A. C., Riani, M., & Cerioli, A. (2010). The forward search: Theory and data analysis (with discussion). Journal of the Korean Statistical Society, 39(2), 117–134. https://doi.org/10.1016/j.jkss.2010.02.007

Berenguer-Rico, V., & Nielsen, B. (2017). Marked and weighted empirical processes of residuals with applications to robust regressions (Discussion Paper 841). Oxford: Dept. Econ.

Billingsley, P. (1968). Convergence of probability measures. John Wiley & Sons.

Butler, R. W. (1982). Nonparametric interval and point prediction using data trimmed by a Grubbs-type outlier rule. The Annals of Statistics, 10(1), 197–204. https://doi.org/10.1214/aos/1176345702

Butler, R. W., Davies, P. L., & Jhun, M. (1993). Asymptotics for the minimum covariance determinant estimator. The Annals of Statistics, 21(3), 1385–1400. https://doi.org/10.1214/aos/1176349264

Castle, J. L., Doornik, J. A., & Hendry, D. F. (2011). Evaluating automatic model selection. Journal of Time Series Econometrics, 3(1), 1–33, Article 8. https://doi.org/10.2202/1941-1928.1097

Castle, J. L., Doornik, J. A., & Hendry, D. F. (2023). Robust discovery of regression models. Econometrics and Statistics, 26, 31–51. https://doi.org/10.1016/j.ecosta.2021.05.004

Čížek, P. (2005). Least trimmed squares in nonlinear regression under dependence. Journal of Statistical Planning and Inference, 136(11), 3967–3988. https://doi.org/10.1016/j.jspi.2005.05.004

Croux, C., & Rousseeuw, P. J. (1992). A class of high-breakdown scale estimators based on subranges. Communications in Statistics - Theory and Methods, 21(7), 1935–1951. https://doi.org/10.1080/03610929208830889

Csörgö, M. (1983). Quantile processes with statistical applications (Vol. 42, CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM.

Dodge, Y., & Jurečková, J. (2000). Adaptive regression. Springer.

Doornik, J. A. (2016). An example of instability: Discussion of the paper by Søren Johansen and Bent Nielsen. Scandinavian Journal of Statistics, 43(2), 357–359. https://doi.org/10.1111/sjos.12207

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A, 222(602), 309–368. https://doi.org/10.1098/rsta.1922.0009

Gallegos, M. T., & Ritter, G. (2009). Trimmed ML estimation of contaminated mixtures. Sankhyā: The Indian Journal of Statistics, Series A, 71(2), 164–220.

Gissibl, N., Klüppelberg, C., & Lauritzen, S. (2021). Identifiability and estimation of recursive max-linear models. Scandinavian Journal of Statistics, 48(1), 188–211. https://doi.org/10.1111/sjos.12446

Gumbel, E. J., & Keeney, R. D. (1950). The extremal quotient. The Annals of Mathematical Statistics, 21(4), 523–538. https://doi.org/10.1214/aoms/1177729749

Hald, A. (2007). A history of parametric statistical inference from Bernoulli to Fisher, 1713–1935. Springer.

Hampel, F. R. (1971). A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6), 1887–1896. https://doi.org/10.1214/aoms/1177693054

Hawkins, D. M., & Olive, D. J. (1999). Improved feasible solution algorithms for high breakdown estimation. Computational Statistics & Data Analysis, 30(1), 1–11. https://doi.org/10.1016/S0167-9473(98)00082-6

Hendry, D. F., & Doornik, J. A. (2014). Empirical model discovery and theory evaluation: Automatic selection methods in econometrics. MIT Press.

Hendry, D. F., & Santos, C. (2010). An automatic test of super exogeneity. In M. V. Watson, T. Bollerslev, & J. Russell (Eds.), Volatility and time series econometrics. Oxford University Press.

Hössjer, O. (1994). Rank-based estimates in the linear model with high breakdown point. Journal of the American Statistical Association, 89(425), 149–158. https://doi.org/10.1080/01621459.1994.10476456

Huber, P. J. (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1), 73–101. https://doi.org/10.1214/aoms/1177703732

Huber, P. J., & Ronchetti, E. M. (2009). Robust statistics (2nd ed.). John Wiley & Sons.

Humphreys, R. M. (1978). Studies of luminous stars in nearby galaxies. I. Supergiants and O stars in the Milky Way. Astrophysical Journal Supplement Series, 38(4), 309–350. https://doi.org/10.1086/190559

Johansen, S. (1978). The product limit estimator as maximum likelihood estimator. Scandinavian Journal of Statistics, 5(4), 195–199.

Johansen, S., & Nielsen, B. (2016a). Analysis of the forward search using some new results for martingales and empirical processes. Bernoulli, 22(2), 1131–1183. Corrigendum (2019) 25, 3201. https://doi.org/10.3150/14-BEJ689

Johansen, S., & Nielsen, B. (2016b). Asymptotic theory of outlier detection algorithms for linear time series regression models (with discussion). Scandinavian Journal of Statistics, 43(2), 321–348. https://doi.org/10.1111/sjos.12174

Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. The Annals of Mathematical Statistics, 27(4), 887–906. https://doi.org/10.1214/aoms/1177728066

Knight, K. (2017). On the asymptotic distribution of the l estimator in linear regression. Mimeo.

Leadbetter, M. R., Lindgren, G., & Rootzén, H. (1982). Extremes and related properties of random sequences and processes. Springer.

R Core Team (2020). R: A language and environment for statistical computing. Vienna, Austria.

Riani, M., Atkinson, A. C., & Perrotta, D. (2014). A parametric framework for the comparison of methods of very robust regression. Statistical Science, 29(1), 128–143. https://doi.org/10.1214/13-STS437

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association, 79(388), 871–880. https://doi.org/10.1080/01621459.1984.10477105

Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In W. Grossmann, G. Pflug, I. Vincze, & W. Wertz (Eds.), Mathematical statistics and applications (pp. 283–297). Reidel.

Rousseeuw, P. J., & Hubert, M. (1997). Recent developments in PROGRESS. In Y. Dodge (Ed.), L1-statistical procedures and related topics (Lecture Notes–Monograph Series, Vol. 31, pp. 201–214). Institute of Mathematical Statistics.

Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. John Wiley & Sons.

Rousseeuw, P. J., Perrotta, D., Riani, M., & Hubert, M. (2019). Robust monitoring of time series with application to fraud detection. Econometrics and Statistics, 9, 108–121. https://doi.org/10.1016/j.ecosta.2018.05.001

Rousseeuw, P. J., & van Driessen, K. (2000). An algorithm for positive-breakdown regression based on concentration steps. In W. Gaul, O. Opitz, & M. Schader (Eds.), Data analysis: Scientific modeling and practical application (pp. 335–346). Springer Verlag.

Scholz, F. W. (1980). Towards a unified definition of maximum likelihood. Canadian Journal of Statistics, 8(2), 193–203. https://doi.org/10.2307/3315231

U.S. Bureau of the Census (1992). 1990 census of population: General population statistics, CP-1-1. Washington DC.

U.S. Bureau of the Census (1994). Statistical abstract of the United States: 1994. Washington DC.

Vansina, F., & De Grève, J. P. (1982). Close binary systems before and after mass transfer. III. Spectroscopic binaries. Astrophysics and Space Science, 87(1-2), 377–401. https://doi.org/10.1007/BF00648931

Víšek, J. A. (2006). The least trimmed squares; part III: Asymptotic normality. Kybernetika, 42(2), 203–224.

Wooldridge, J. M. (2015). Introductory econometrics (6th ed.). Cengage Learning.

Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics, 15(2), 642–656. https://doi.org/10.1214/aos/1176350366

Author notes

Conflict of interest: None declared.
