-
PDF
- Split View
-
Views
-
Cite
Cite
Zoë L de Beurs, Andrew Vanderburg, Erica Thygesen, Joseph E Rodriguez, Xavier Dumusque, Annelies Mortier, Luca Malavolta, Lars A Buchhave, Christopher J Shallue, Sebastian Zieba, Laura Kreidberg, John H Livingston, R D Haywood, David W Latham, Mercedes López-Morales, André M Silva, Characterization of K2-167 b and CALM, a new stellar activity mitigation method, Monthly Notices of the Royal Astronomical Society, Volume 529, Issue 2, April 2024, Pages 1047–1066, https://doi.org/10.1093/mnras/stae207
- Share Icon Share
ABSTRACT
We report precise radial velocity (RV) observations of HD 212657 (= K2-167), a star shown by K2 to host a transiting sub-Neptune-sized planet in a 10 d orbit. Using Transiting Exoplanet Survey Satellite (TESS) photometry, we refined the planet parameters, especially the orbital period. We collected 74 precise RVs with the HARPS-N spectrograph between August 2015 and October 2016. Although this planet was first found to transit in 2015 and validated in 2018, excess RV scatter originally limited mass measurements. Here, we measure a mass by taking advantage of reductions in scatter from updates to the HARPS-N Data Reduction System (2.3.5) and our new activity mitigation method called CCF Activity Linear Model (CALM), which uses activity-induced line shape changes in the spectra without requiring timing information. Using the CALM framework, we performed a joint fit with RVs and transits using exofastv2 and find Mp = |$6.3_{-1.4}^{+1.4}$| |$\, M_{\hbox{$\oplus $}}$| and Rp = |$2.33^{+0.17}_{-0.15}$| |$\, R_{\hbox{$\oplus $}}$|, which places K2-167 b at the upper edge of the radius valley. We also find hints of a secondary companion at a ∼22 d period, but confirmation requires additional RVs. Although characterizing lower mass planets like K2-167 b is often impeded by stellar variability, these systems especially help probe the formation physics (i.e. photoevaporation, core-powered mass-loss) of the radius valley. In the future, CALM or similar techniques could be widely applied to FGK-type stars, help characterize a population of exoplanets surrounding the radius valley, and further our understanding of their formation.
1 INTRODUCTION
The study of exoplanets has the ultimate goal of understanding the formation and evolution of planets. In our own Solar system, we have a class of dense smaller rocky planets with iron cores and a class of large gas giants that have extended H/He envelopes. However, the HARPS spectrograph (Mayor et al. 2003) and Kepler mission (Borucki et al. 2010) found hundreds of planets (Mayor et al. 2011; Batalha et al. 2013) that fall between these two categories and with it brought many new questions: What is the composition of these in-between planets? Does liquid water exist on their surface? Do they have gaseous envelopes? If so, do their envelopes mainly consist of H/He? How did this class of planets form and evolve? And why do we not see any examples of them in our own Solar system? To answer these questions, we need to characterize these types of systems and constrain their possible compositions through precise radius and mass measurements.
Precise radial velocity (RV) observations revealed that transiting planets larger than 1.6 |$\, R_{\hbox{$\oplus $}}$| are typically inconsistent with rocky compositions (Rogers 2015), so these intermediate planets are often sub-divided into two classes super-Earths (|$1 \, R_{\hbox{$\oplus $}}$| ≲ |$R_p \lesssim 1.6 \, R_{\hbox{$\oplus $}}$|) and sub-Neptunes (|$1.6 \, R_{\hbox{$\oplus $}}\lesssim R_p \lesssim 4 \, R_{\hbox{$\oplus $}}$|). A gap in the occurrence of planets with radii between these two populations was predicted by several researchers based on the effect of photoevaporation (Lopez & Fortney 2013; Owen & Wu 2013; Jin et al. 2014). This gap is often described as the ‘radius valley’ or ‘radius gap’ and describes a paucity of planets between 1.5–2 |$\, R_{\hbox{$\oplus $}}$|. This bimodality in the occurence rates of small planets has been detected (Fulton et al. 2017) and further refined and characterized using parallax and asteroseismic measururements (Berger et al. 2018; Van Eylen et al. 2018; Petigura 2020).
Recently, Petigura et al. (2022) investigated the location of the radius gap as a function of stellar mass, metallicity, age, and orbital period and found that the radius gap location follows Rp ∝ Pm, where m = −0.10 ± 0.03. Dattilo, Batalha & Bryson (2023) also found that the location of the radius gap varies depending on stellar type for FGK stars. However, the exact underlying physics that explains the origins of the radius gap is still somewhat debated (e.g. Ginzburg, Schlichting & Sari 2018; Gupta & Schlichting 2020; Lee, Karalis & Thorngren 2022) and determining which mechanism is primarily responsible requires precise characterization of a larger population of exoplanets (Rogers et al. 2021). Precise radius and mass measurements of planets at the boundaries of the radius valley can especially help refine our understanding of the physics by constraining the bulk planetary composition and inferring either the presence or absence of a thick volatile envelope.
Unfortunately, the difficulty of modelling stellar variability in RV observations has limited the study of these important systems. In particular, spurious RV shifts due to stellar activity signals can mask or mimic planetary signals. In the past decade, stellar activity has been identified as the limiting factor in RV precision across the HR diagram (National Academies of Sciences, Engineering, and Medicine and others 2018; Haywood et al. 2020), but it is especially problematic for young stellar hosts, including those where planets might still be undergoing photoevaporation. Finding new ways to mitigate stellar activity is critical for our understanding of planetary composition and evolution.
Stellar activity mitigation methods have traditionally focused on using indicators of the level of stellar activity, which can either be focused on measuring the level of magnetic activity on a star (Bonfils et al. 2007; Boisse et al. 2009; Dumusque et al. 2011; Robertson et al. 2014) or focused on detecting the presence (or lack thereof) of spots and plagues on the surface (Figueira et al. 2013; Milbourne et al. 2021; Wise et al. 2022). The activity indicators that correlate with magnetic activity include spectral features such as the emission in the core of Ca ii H&K lines (Saar, Butler & Marcy 1998; Saar & Fischer 2000; Meunier & Lagrange 2013), the Hα line (Bonfils et al. 2007; Robertson et al. 2014), and the Ca infrared triplet. Recently, photospheric unsigned magnetic flux has been shown to be strongly correlated with RV perturbations caused by solar surface activity (Haywood et al. 2022) and new methods for deriving proxies for magnetic flux using least-squares deconvolution similarly show some of the strongest correlations among activity indicators (Lienhard et al. 2023). Beyond these types of activity indicators, properties in cross-correlation function (CCFs) have been used to decorrelate stellar activity from RVs. CCFs are computed by cross-correlating the spectra with a binary mask which contains delta functions at the location of spectral lines. Conceptually, a CCF represents an average of all line shapes in a star’s spectrum and is therefore sensitive to line shape changes that persist in most stellar lines. The line shape changes introduced by stellar activity that cause asymmetry in the CCF are commonly measured using a CCF Bisector Inverse Span (e.g. Queloz et al. 2001; Lovis et al. 2011) or the FWHM of the CCF (e.g. Queloz et al. 2009; Hébrard et al. 2010). Although the bisector and FWHM have the advantage that they can easily be computed and provide only two free parameters in RV models, much of the information present in the CCF is lost by only using these metrics to measure line shape changes. In addition, these traditional stellar activity indicators have been insufficient to reach the sub |$\rm m\, s^{-1}$| RV measurement precision necessary to detect Earth-twins.
There have been recent developments in using advanced flexible models like Gaussian Process (GP) regression to model quasi-periodic stellar activity signals and reveal planet signals (Haywood et al. 2014; Rajpaul et al. 2015; Jones et al. 2017; Delisle et al. 2018; Barragán et al. 2019; Gilbertson et al. 2020; Nicholson & Aigrain 2022; Barragán et al. 2023; Tran et al. 2023). However, even with the current state-of-the-art stellar variability mitigation techniques (Zhao et al. 2022), our RV measurement precision must still improve by an order of magnitude to detect the 10 |$\rm cm\, s^{-1}$| signals induced by Earth-mass exoplanets in the habitable zones of Sun-like stars. To work towards achieving this goal, we can first focus on precise mass measurements of slightly more massive planets like super-Earths and sub-Neptunes. This requires us to continue developing new methods to mitigate the impact of stellar activity on exoplanet mass measurements.
Inspired by the successful implementation of a neural network stellar activity mitigation technique on solar observations (de Beurs et al. 2022) and other uses of machine and deep-learning methods for modelling stellar activity (Colwell, Timmaraju & Wise 2023; Perger et al. 2023), we designed a simplified method called CCF Activity Linear Model (CALM), which can be applied to other stars with smaller RV data sets observed at nighttime. Our method focuses on exploiting the information content present in the CCFs and traces the line shape changes introduced by stellar activity to separate Doppler reflex motion from signals from the star. We have already successfully applied this method to RVs for four stars observed with the EXPRES spectrograph and reduced the scatter by about a factor of two for the more active stars (Zhao et al. 2022). In this paper, we apply this same method to one star observed by HARPS-N: K2-167.
K2-167 b is a planet located just above the radius valley and this transiting sub-Neptune was first found by the K2 mission (Vanderburg et al. 2016; Mayo et al. 2018). The planetary signal was re-detected (Ikwut-Ukwa et al. 2020) by the Transiting Exoplanet Survey Satellite (TESS; Ricker et al. 2015), and later again in its first extended mission (Thygesen et al. 2023). HARPS-N observations of K2-167, a relatively active, yet slowly rotating F7-type star, began in late 2015, but in the intervening years, stellar activity had prevented a confident detection of the planet’s mass. Recently, Bonomo et al. (2023) analysed the K2-167 system as part of a larger sample and reported a mass of |$6.5_{-1.5}^{+1.6}$| |$\, M_{\hbox{$\oplus $}}$|. Here, we perform a more in-depth analysis of the RV time series than this previous work (which was included as part of a catalog paper). Using our CALM framework, we performed a joint fit of the transits and RVs, constraining the radius and mass, and enabling future studies of its composition which may provide insight into its formation.
Our paper is organized as follows. In Section 2, we describe the the spectroscopic observations from HARPS-N and the photometric observations from K2 and TESS. In Section 3, we describe our development of the CALM stellar activity mitigation method, how we prevent overfitting on our data, and introduce a new metric to measure RV scatter. In Section 4, we describe the transit, spectroscopic, and RV analysis and resulting system parameters of K2-167. In Section 5, we discuss the implications of these results and we conclude in Section 6.
2 DATA
2.1 HARPS-N spectroscopy
K2-167 was observed with the HARPS-N Spectrograph on the 3.58-m Telescopio Nazionale Galileo (TNG). Designed specifically to yield precise RV measurements, HARPS-N is a vacuum-enclosed cross-dispersed echelle spectrograph with temperature and pressure stabilization (Cosentino et al. 2012). We collected 76 precise RV observations of K2-167 (Appendix Table A1) with typical integration times of 15 min1 with the HARPS-N spectrograph between August 2015 and October 2016. For the first season of observations (2015 August 20–2015 December 27), a ThAr lamp served as a calibration source. For the second season (2016 June 20–2016 October 12), a stabilized Fabry Perot was used as the calibration source. Most of the time, we obtained only one observation per night, but between July and November 2016, we often obtained two spectra per night, separated by a few hours, in order to average RV variations due to granulation.
The 15 min exposures of K2-167 yielded typical formal measurement uncertainties of 1.4 |$\rm m\, s^{-1}$|. For a bright star like K2-167 (mv = 8.24), we expect this uncertainty to be a combination of instrumental systematics and the photon-limited uncertainty. We performed our analysis on an older version of the HARPS-N pipeline (DRS 3.7) and the latest version of the pipeline (DRS 2.3.5) (Dumusque et al. 2021) and found significantly larger scatter for the older DRS pipeline, which we discuss extensively in Section 3. There were numerous improvements between this older and new DRS to correct for systematics and instrumental effects. In addition, in the new DRS, different stellar masks were used that focus on spectral lines that are less sensitive to stellar activity and to more closely match the stellar type of the targets. In the case of K2-167 (F7-type star), DRS 3.7 uses a K5-type binary mask and the DRS 2.3.5 uses a G9-type binary mask. For both of these pipeline versions, the data reduction was carried out by cross-correlating the observations with one of these binary masks. For the DRS 3.7, this produces a 161 element CCF, whereas for the newer 2.3.5 DRS, the sampling was reduced and the pipelines yields a 49 element CCF. These resulting CCFs are used as an input representation for our CALM method to track the activity-induced line shape changes in the spectra. The DRS measures the radial velocities of each observation by fitting the CCF with a Gaussian function. However, a Gaussian is simple and symmetric and therefore unable to model the small CCF shape changes introduced by stellar activity. Thus, these radial velocities will include contributions from both Keplerian shifts and stellar activity signals. The full HARPS-N RV time series from the new DRS is shown in Fig. 1.

RVs from the HARPS-N old DRS (3.7) in blue and the new DRS (2.3.5) in orange. (a) The RVs are plotted over time. There appears to be a slight upward trend toward the end of the time series. (b) Histogram of the old and new DRS. The new DRS has less scatter than the old DRS.
2.2 Photometry
2.2.1 K2
K2-167 b was observed by K2, Kepler’s extended mission which achieved similar precision to the original Kepler mission after applying systematic corrections (Vanderburg et al. 2016). K2-167 was observed during K2 Campaign 3 from 2014 November 17 to 2015 January 23 in K2’s long-cadence mode (29.4 min exposure times). We used the light curve produced by Mayo et al. (2018). In brief, they extracted the light curve for K2-167 from the target pixel files, which were accessed through the Mikulski Archive for Space Telescopes (MASTs)2. The light curves were reprocessed to simultaneously fit the known planet transit, remove mechanical K2 systematics, and fit stellar variability by using the methods described in Vanderburg & Johnson (2014a) and Vanderburg et al. (2016). Due to the unusual brightness of the host star, Mayo et al. (2018) used larger photometric apertures than their standard analysis for this star. These apertures did not require dilution corrections because K2-167 is sufficiently isolated from other stars or background objects (Ligi et al. 2018). To remove high-frequency variability, we divided the simultaneous-fit light curve by the best fit spline component. The K2 phase-folded transit light curves are shown in the top panel of Fig. 2. For computational efficiency, we trimmed the light curve away from transit for our global fit, and only included a baseline of one transit duration on either side of the full transit (See subsection 4.2).

K2 (top) and TESS (bottom) light curves phase-folded on the period and centred at mid-transit from K2 Campaign 3 and TESS Sectors 2, 28, 42. The purple lines shows the best-fitting transit models.
2.2.2 TESS
K2-167 was pre-selected for two-minute cadence observations by the TESS mission (Ricker et al. 2015) and observed by Camera 1 during TESS Sector 2 from 2018 August 22 to September 20, Sector 28 from 2020 July 31 to 2020 August 25, and Sector 42, from 2021 August 20 to 2021 September 16. The TESS light curve was generated by NASA’s Science Processing Operations Centre (SPOC) pipeline using the Lightkurve package (Lightkurve Collaboration et al. 2018). SPOC received raw data and processed the images to remove systematic errors and extract photometry (Jenkins et al. 2016). Using the SPOC Transiting Planet Search (Jenkins 2002), the light curves are searched for transit events. The Pre-search Data Conditioned Simple Aperture Photometry (PDCSAP) flux uses the optimal TESS aperture to extract the flux, corrects for systematics using PDC, and produces the light curves (Smith et al. 2012; Stumpe et al. 2012, 2014). From these light curves we remove low-frequency trends that correspond to astrophysical variability and remove remaining systematics based on the out-of-transit photometry using keplerspline.3 (Vanderburg & Johnson 2014b; Shallue & Vanderburg 2018). The TESS phase-folded transit light curves are shown in the bottom panel of Fig. 2.
2.2.3 Spitzer
To perform follow-up observations on the K2 transits, one transit of K2-167 b was observed with the Spitzer InfraRed Array Camera (IRAC; Fazio et al. 2004) on 2016 March 1 as part of a larger K2 follow-up programme (Programme ID: 11026; PI Michael Werner). The observations lasted for 10.16 h and were taken with 0.4 s exposures using channel 2 on IRAC (2–4 μm). The observations are further described in detail in Duck et al. (2021) who found that the complete egress was missing due to the uncertainty in scheduling of the observations. Nonetheless, we reduced the data using BiLinearly Interpolated Subpixel Sensitivity (BLISS) mapping (Stevenson et al. 2012) and Pixel Level Decorrelation (Deming et al. 2015; Garhart et al. 2020) and saw a feature consistent with a partial transit, but with low enough SNR that it was not constraining to our model. We therefore exclude the Spitzer observations from our final analysis.
3 RV ANALYSIS
We performed an extensive series of analyses to determine the mass of K2-167 b. The RV time series is shown in Fig. 1 where there appears to be a slight long-term upward trend in the RVs that may be the result of long-term activity cycles. For our activity analyses, we first performed a simple RV analysis where we fitted a Keplerian signal and did not model the stellar variability. We ran Markov Chain Monte Carlo (MCMC) to sample the parameter space and model the Keplerian using radvel’s kepler.rv_drive function4 (Fulton et al. 2018). We use the differential evolution (DE) MCMC package edmcmc5 (Vanderburg 2021), which is an implementation of MCMC that incorporates DE (Ter Braak 2006) to solve the problem of choosing an appropriate scale and orientation for the jumping distribution, enabling a significantly faster fitting process. We computed 5000 MCMC chains, discard the first 25 per cent as burn-in, and required that our parameters converged using the Gelman–Rubin statistic (Gelman & Rubin 1992), where our Gelman–Rubin threshold <1.01. In this analysis, we observed that there is a large difference in RV scatter6 between the two DRS versions (3.95 |$\rm m\, s^{-1}$| and 3.01 |$\rm m\, s^{-1}$| for the old and new DRS, respectively) and that the uncertainties on the estimated semi-amplitude (K) for the Keplerian signal are larger for the old DRS (|$2.0_{-0.7}^{+0.7}$| |$\rm m\, s^{-1}$|) than the new DRS (|$2.0_{-0.5}^{+0.5}$| |$\rm m\, s^{-1}$|). The expected scatter due to instrument systematics (photon noise, drift noise, and wavelength calibration errors) for the both DRS versions is ∼1 |$\rm m\, s^{-1}$|, which is significantly less than the scatter that we measured. For this chromospherically active star K2-167 (|$\textrm {log~R}_{\textrm {HK}}^{^{\prime }} = -4.66$|), stellar variability could be causing a part of this excess scatter (and the relatively large uncertainty in K). Thus, we sought to develop new methods that could model this stellar variability by tracing shape changes in the CCF. For each of our RV analyses, we performed an independent analysis using only the 3.7 DRS and an independent analysis using only the 2.3.5 DRS observations. We included both DRS versions in our RV analyses since we found interesting differences in the effectiveness of our activity mitigation methods for the two versions as discussed extensively in subsection 3.6.
3.1 CALM method
In order to account for the stellar variability we observe in the RV measurements of K2-167, we developed a simplified machine learning model CALM that allows us to separate the stellar activity from the Doppler reflex motion. This simplified method is inspired by the previous implementation of a neural network-based stellar activity mitigation method for solar observations using line shape changes to predict stellar activity signals (de Beurs et al. 2022). To extend this approach to extrasolar observations, a new simplified method was required for two primary reasons:
RV data sets for other stars rarely have as many observations as the solar data set from the HARPS-N Solar Telescope used in de Beurs et al. (2022), which consisted of ∼600 d of observations.
We lack a firm ‘ground-truth’ for the stellar activity signals of other stars. For the sun, we can easily remove the RV contributions of the known planets in the Solar system, but this is not as straightforward for other stars. This is because there may always be unmodelled planetary signals that fall below the detection threshold. These unmodelled planetary signals may add noise to our estimates of the stellar variability signal of a given star.
To overcome these challenges, we made two modifications to our method. First, we greatly simplified the machine learning models for use on stars other than the sun. Instead of using complex and flexible models like our convolutional neural networks, we use a linear regression model with a significantly smaller input representation. In particular, the input representation consists of 5–25 CCF activity indicators7 instead of the entire CCF array, which contains 49 elements for the 2.3.5 DRS and 161 elements for the 3.7 DRS. Essentially, we have sacrificed some fidelity in our stellar activity corrections in exchange for immediate applicability to a much larger sample of stars. Eventually, it may be possible to train a more complex model on the entire ensemble of stars observed, but our simplified approach is a first step towards this goal.
The second modification we made to our technique for use on stars other than the sun involves fitting for planet signals simultaneously. We cannot a priori remove all planetary signals from the data set (since the amplitude and period of such signals are unknown and/or they may be hidden by stellar activity). Thus, we must fit any remaining planetary signals simultaneously with the stellar activity correction. Although this increases the computational difficulty of our method (because we must test many different possible planetary signals to discover new planets), the increase in computational difficulty is offset by the simplicity of our linear regression model. An overview of our simplified approach is shown in Fig. 3.

Schematic of our CALM pipeline. We use the changing shape of spectral features (measured by the changes in the CCF) to predict and remove stellar activity signals. We use a linear regression model with 5–25 indexes from the CCF residuals used as inputs. We also simultaneously fit Keplerians along with the activity corrections to predict the overall RV signal. The data shown in (b) and (d) are ΔCCFs and the RVs of K2-167 respectively for the old DRS (3.7).
We can describe this simplified approach as simultaneously fitting for n Keplerian signal(s) and a linear regression model with m CCF activity indicators for stellar activity. The number of CCF activity indicators m is determined by finding a balance between optimizing goodness-of-fit and preventing overfitting. In equation form, our model is given as
where CCFm is the value of the CCF at index m, wm is a weight parameter for CCFm, Keplern is the nth fitted Keplerian, and bn is this Keplerian’s weight parameter. This Keplerian weight parameter bn is equivalent to a planet’s semi-amplitude (K), but parametrizing the model in this way allows us to do a linear least-squares regression fit where we can simultaneously model planetary and stellar variability contributions to the overall RV signal.
This section describes our new CALM method in detail. Subsection 3.2 describes the pre-processing we must apply to run our CALM method; in particular, we first must prepare the CCF input representation for our model by centring the data and choosing the most information-rich subset of the CCF to feed into the model. Before running our models, we check several diagnostics to prevent overfitting, which are described in subsection 3.4. Later, we sample the parameter space using a DE-based MCMC method described in subsection 3.3. Finally, we evaluate our models and compare them with several traditional activity indicators using a new scatter metric described in subsection 3.5 and describe our RV analysis results in subsection 3.6.
3.2 Preparing the CCF input representation
3.2.1 Computing and centring our ΔCCFs
For our CALM model, we have to pre-process our data products into a uniform format which makes capturing and learning features in our data easier. We design the input representation such that the model becomes sensitive to shape changes due to stellar activity, not translational shifts caused by true Doppler shifts. Simply put, we want the CALM model to find the difference between a Gaussian fit to the CCF and the true centre-of-mass velocity shift. In this way, our model will be able to disentangle stellar activity signals from true Keplerian shifts.
To frame the problem in this way, we shift every CCF such that the velocity predicted by the Gaussian fit to the CCF is at the median velocity across all observations. We note that we took precautions to prevent improperly centring the CCFs in a way that planetary reflex motion remains in the input data. This could cause overfitting and we further describe our precautionary steps in subsection 3.4. Later, we subtract a median CCF from each CCF to compute the residual CCF8 (ΔCCF), which are visualized in Fig. 4, where they are colour-coded based on how blueshifted or redshifted the RV signals are. There exists a strong signal in shape changes based on these RV signals for the old DRS and a hint of a dependence for the new DRS. This difference may be explained by the new DRS using spectral lines less sensitive to stellar variability and the difference in stellar type mask used. Overall, we want our CALM model to learn how these shape changes relate to the stellar activity contributions to RVs so that we can model and mitigate the stellar variability using the ΔCCFs.

Computed residual CCFs (ΔCCFs) for both the older (3.7) and newer DRS (2.3.5). ΔCCFs are computed by subtracting the mean CCF and highlight differences in features between CCFs. The model predicted stellar activity contribution to the RVs is indicated by its colour (red = redshifted, blue = blueshifted).
After computing the ΔCCFs, we normalize the observations to ensure the scale of variations across each input parameter is approximately equal. A subset of indexes across these normalized ΔCCFs are then fed into the CALM model. We choose to only include a subset of indexes (5–259 indexes) rather than the entire normalized ΔCCFs (49 indexes or 160 indexes for DRS 2.3.5 and DRS 3.7, respectively) because we only have ∼80 RV observations and the entire normalized ΔCCFs would provide too many free parameters. We describe how we select which CCF indexes to use in subsection 3.2.2.
3.2.2 Selecting the most significant indexes of the ΔCCF
To ensure that our number of free parameters is significantly less than our number of observations (74), we include only a subset of the CCF rather than the entire 49-element array of the CCF10 We choose the subset of the CCF by checking which indexes provide the most statistically significant contribution in equation (1). We check for statistical significance by sampling the parameter space with DE MCMC (described in subsection 3.3) and estimating uncertainties on the CCF weights (w1, w2,..., wn). Commonly, when including ≥30 CCF indexes, the information at these indexes has degeneracies which dilutes the significance, resulting in the highest significance indexes only having 1–2σ significance. This means that for these indexes the value of the weights for CCFi is one to two times its corresponding uncertainty. We then choose the 5–25 most statistically significant CCF indexes for our final model. This exact number depends on the DRS version as explained in detail in subsection 3.6.1. When only including those ‘significant indexes’ in our model, the significance increases to 3–5σ. Generally, we find that the estimated semi-amplitude (K) of the Keplerian signal of K2-167 b is consistent for any of the number of CCF indexes tested (Fig. 5).

Estimated semi-amplitude for various model configurations. (a) Estimated semi-amplitude (K) as function of number CCF indexes for both the old DRS (grey) and new DRS (blue). As the number of CCF indexes included increases, there a slight positive trend in K. (b) Estimated K for K2-167 b in a two-planet fit model as described in subsection 3.7. Generally, the estimates of K agree within the errors for any of these model configurations.
3.3 Running our RV models with DE MCMC
Once we have chosen our CCF indexes and the number of Keplerians to include in our model, we sample our parameter space using DE MCMC implemented in edmcmc11 (Vanderburg 2021). We implemented Gaussian priors to the P and tc parameters based on the exofastv2 transit fit results (Table 1), which are discussed in subsection 4.2. We note that without including priors on P and tc, we recover the period and time of conjunction in the RVs. However, we include the priors in our analysis since they allow us to get the tightest constraints on the mass of K2-167 b. We model the Keplerian signal of K2-167 b by using radvel’s kepler.rv_drive function (Fulton et al. 2018). After performing the MCMC fit, we determine convergence of the parameters by using the Gelman–Rubin statistic (Gelman & Rubin 1992), where our Gelman–Rubin threshold < 1.01. We computed 5000 MCMC chains and discard the first 25 per cent as burn-in before producing our posterior samples. We compute the median and 68.3 per cent confidence intervals to derive our final parameter values. The results of our DE-MCMC analysis of the RVs are described in subsection 3.6.
Photometric properties and median values and 68 per cent confidence interval for global exofastv2 Models for the K2-167 system. See Table 3 in Eastman et al. () for a detailed description of all parameters.
Parameter . | Units . | Values . |
---|---|---|
Observed stellar parameters | ||
log g.......... | Spectroscopic surface gravity (cgs).......... | 4.10 ± 0.05 |
Teff.......... | Effective temperature (K).......... | 6083 ± 90 |
[Fe/H].......... | Metallicity (dex).......... | −0.34 ± 0.09 |
vsin i.......... | Projected rotational velocity (|$\rm km\, s^{-1}$|).......... | 3.8 ± 0.5 |
Derived stellar parameters | ||
M⋆.......... | Mass (M⊙).......... | |$1.109^{+0.11}_{-0.091}$| |
R⋆.......... | Radius (R⊙).......... | |$1.554^{+0.051}_{-0.050}$| |
L⋆.......... | Luminosity (L⊙).......... | |$3.08^{+0.16}_{-0.17}$| |
ρ⋆.......... | Density (cgs).......... | |$0.419^{+0.065}_{-0.059}$| |
Age.......... | Age (Gyr)a.......... | |$5.6^{+2.2}_{-2.0}$| |
log g.......... | Model-derived surface gravity (cgs).......... | |$4.102^{+0.052}_{-0.055}$| |
u1.......... | Kepler-band linear limb-darkening coeff .......... | 0.342 ± 0.050 |
u1.......... | TESS-band linear limb-darkening coeff .......... | 0.242 ± 0.030 |
u2.......... | Kepler-band quadratic limb-darkening coeff .......... | |$0.311^{+0.049}_{-0.050}$| |
u2.......... | TESS-band quadratic limb-darkening coeff .......... | 0.298 ± 0.029 |
Planetary parameters: | b | |
P.......... | Period (days).......... | |$9.978543^{+0.000023}_{-0.000020}$| |
MP.......... | Mass (|$\, M_{\hbox{$\oplus $}}$|).......... | |$7.0^{+1.7}_{-1.7}$| |
RP.......... | Radius (|$\, R_{\hbox{$\oplus $}}$|) b.......... | |$2.33^{+0.17}_{-0.15}$| |
TC.......... | Time of conjunction (|$\rm {BJD_{TDB}}$|)c.......... | 2456979.9329 ± 0.0016 |
a.......... | Semimajor axis (au).......... | |$0.0939^{+0.0029}_{-0.0027}$| |
i.......... | Inclination (degrees).......... | |$86.61^{+1.5}_{-0.53}$| |
e.......... | Eccentricity .......... | |$0.30^{+0.16}_{-0.19}$| |
ω⋆.......... | Argument of periastron (degrees) .......... | |$99^{+44}_{-37}$| |
ecos ω*.......... | .......... | −0.03 ± 0.14 |
esin ω*.......... | .......... | |$0.26^{+0.17}_{-0.23}$| |
MP sin i.......... | Minimum mass (|$\, M_{\hbox{$\oplus $}}$|) .......... | |$6.962^{+1.653}_{-1.621}$| |
MP/M*.......... | Mass ratio .......... | 0.0000173 ± 0.0000044 |
K.......... | RV semi-amplitude (|$\rm m\, s^{-1}$|) .......... | |$1.88^{+0.49}_{-0.48}$| |
Teq.......... | Equilibrium temperature (K)d.......... | |$1203^{+25}_{-27}$| |
RP/R⋆.......... | Radius of planet in stellar radii .......... | |$0.01444^{+0.00077}_{-0.00057}$| |
a/R⋆.......... | Semimajor axis in stellar radii .......... | 13.01 ± 0.64 |
d/R⋆.......... | Planet/star separation at mid transit .......... | |$9.5^{+3.0}_{-2.3}$| |
δ.......... | Transit depth in Kepler (Rp/R⋆)2.......... | |$0.0002299^{+0.0000086}_{-0.0000085}$| |
δ.......... | Transit depth in TESS (Rp/R⋆)2 e.......... | |$0.0002235^{+0.0000097}_{-0.0000096}$| |
T14.......... | Total transit duration (days).......... | |$0.1507^{+0.0037}_{-0.0039}$| |
b.......... | Transit Impact parameter .......... | |$0.60^{+0.18}_{-0.37}$| |
〈F〉.......... | Incident Flux (109 erg s−1 cm−2).......... | |$0.425^{+0.054}_{-0.045}$| |
Parameter . | Units . | Values . |
---|---|---|
Observed stellar parameters | ||
log g.......... | Spectroscopic surface gravity (cgs).......... | 4.10 ± 0.05 |
Teff.......... | Effective temperature (K).......... | 6083 ± 90 |
[Fe/H].......... | Metallicity (dex).......... | −0.34 ± 0.09 |
vsin i.......... | Projected rotational velocity (|$\rm km\, s^{-1}$|).......... | 3.8 ± 0.5 |
Derived stellar parameters | ||
M⋆.......... | Mass (M⊙).......... | |$1.109^{+0.11}_{-0.091}$| |
R⋆.......... | Radius (R⊙).......... | |$1.554^{+0.051}_{-0.050}$| |
L⋆.......... | Luminosity (L⊙).......... | |$3.08^{+0.16}_{-0.17}$| |
ρ⋆.......... | Density (cgs).......... | |$0.419^{+0.065}_{-0.059}$| |
Age.......... | Age (Gyr)a.......... | |$5.6^{+2.2}_{-2.0}$| |
log g.......... | Model-derived surface gravity (cgs).......... | |$4.102^{+0.052}_{-0.055}$| |
u1.......... | Kepler-band linear limb-darkening coeff .......... | 0.342 ± 0.050 |
u1.......... | TESS-band linear limb-darkening coeff .......... | 0.242 ± 0.030 |
u2.......... | Kepler-band quadratic limb-darkening coeff .......... | |$0.311^{+0.049}_{-0.050}$| |
u2.......... | TESS-band quadratic limb-darkening coeff .......... | 0.298 ± 0.029 |
Planetary parameters: | b | |
P.......... | Period (days).......... | |$9.978543^{+0.000023}_{-0.000020}$| |
MP.......... | Mass (|$\, M_{\hbox{$\oplus $}}$|).......... | |$7.0^{+1.7}_{-1.7}$| |
RP.......... | Radius (|$\, R_{\hbox{$\oplus $}}$|) b.......... | |$2.33^{+0.17}_{-0.15}$| |
TC.......... | Time of conjunction (|$\rm {BJD_{TDB}}$|)c.......... | 2456979.9329 ± 0.0016 |
a.......... | Semimajor axis (au).......... | |$0.0939^{+0.0029}_{-0.0027}$| |
i.......... | Inclination (degrees).......... | |$86.61^{+1.5}_{-0.53}$| |
e.......... | Eccentricity .......... | |$0.30^{+0.16}_{-0.19}$| |
ω⋆.......... | Argument of periastron (degrees) .......... | |$99^{+44}_{-37}$| |
ecos ω*.......... | .......... | −0.03 ± 0.14 |
esin ω*.......... | .......... | |$0.26^{+0.17}_{-0.23}$| |
MP sin i.......... | Minimum mass (|$\, M_{\hbox{$\oplus $}}$|) .......... | |$6.962^{+1.653}_{-1.621}$| |
MP/M*.......... | Mass ratio .......... | 0.0000173 ± 0.0000044 |
K.......... | RV semi-amplitude (|$\rm m\, s^{-1}$|) .......... | |$1.88^{+0.49}_{-0.48}$| |
Teq.......... | Equilibrium temperature (K)d.......... | |$1203^{+25}_{-27}$| |
RP/R⋆.......... | Radius of planet in stellar radii .......... | |$0.01444^{+0.00077}_{-0.00057}$| |
a/R⋆.......... | Semimajor axis in stellar radii .......... | 13.01 ± 0.64 |
d/R⋆.......... | Planet/star separation at mid transit .......... | |$9.5^{+3.0}_{-2.3}$| |
δ.......... | Transit depth in Kepler (Rp/R⋆)2.......... | |$0.0002299^{+0.0000086}_{-0.0000085}$| |
δ.......... | Transit depth in TESS (Rp/R⋆)2 e.......... | |$0.0002235^{+0.0000097}_{-0.0000096}$| |
T14.......... | Total transit duration (days).......... | |$0.1507^{+0.0037}_{-0.0039}$| |
b.......... | Transit Impact parameter .......... | |$0.60^{+0.18}_{-0.37}$| |
〈F〉.......... | Incident Flux (109 erg s−1 cm−2).......... | |$0.425^{+0.054}_{-0.045}$| |
Notes. (a) We note that age measurements derived from isochrones have been found to have large systematic uncertainties (Torres, Andersen & Giménez 2010).
(b) We note that this is in equatorial earth radii.
(c) Time of conjunction is commonly reported as the ‘transit time’
(d) Assumes no albedo and perfect redistribution.
(e) The difference in transit depths for Kepler and TESS is is due to limb darkening effects.
Photometric properties and median values and 68 per cent confidence interval for global exofastv2 Models for the K2-167 system. See Table 3 in Eastman et al. () for a detailed description of all parameters.
Parameter . | Units . | Values . |
---|---|---|
Observed stellar parameters | ||
log g.......... | Spectroscopic surface gravity (cgs).......... | 4.10 ± 0.05 |
Teff.......... | Effective temperature (K).......... | 6083 ± 90 |
[Fe/H].......... | Metallicity (dex).......... | −0.34 ± 0.09 |
vsin i.......... | Projected rotational velocity (|$\rm km\, s^{-1}$|).......... | 3.8 ± 0.5 |
Derived stellar parameters | ||
M⋆.......... | Mass (M⊙).......... | |$1.109^{+0.11}_{-0.091}$| |
R⋆.......... | Radius (R⊙).......... | |$1.554^{+0.051}_{-0.050}$| |
L⋆.......... | Luminosity (L⊙).......... | |$3.08^{+0.16}_{-0.17}$| |
ρ⋆.......... | Density (cgs).......... | |$0.419^{+0.065}_{-0.059}$| |
Age.......... | Age (Gyr)a.......... | |$5.6^{+2.2}_{-2.0}$| |
log g.......... | Model-derived surface gravity (cgs).......... | |$4.102^{+0.052}_{-0.055}$| |
u1.......... | Kepler-band linear limb-darkening coeff .......... | 0.342 ± 0.050 |
u1.......... | TESS-band linear limb-darkening coeff .......... | 0.242 ± 0.030 |
u2.......... | Kepler-band quadratic limb-darkening coeff .......... | |$0.311^{+0.049}_{-0.050}$| |
u2.......... | TESS-band quadratic limb-darkening coeff .......... | 0.298 ± 0.029 |
Planetary parameters: | b | |
P.......... | Period (days).......... | |$9.978543^{+0.000023}_{-0.000020}$| |
MP.......... | Mass (|$\, M_{\hbox{$\oplus $}}$|).......... | |$7.0^{+1.7}_{-1.7}$| |
RP.......... | Radius (|$\, R_{\hbox{$\oplus $}}$|) b.......... | |$2.33^{+0.17}_{-0.15}$| |
TC.......... | Time of conjunction (|$\rm {BJD_{TDB}}$|)c.......... | 2456979.9329 ± 0.0016 |
a.......... | Semimajor axis (au).......... | |$0.0939^{+0.0029}_{-0.0027}$| |
i.......... | Inclination (degrees).......... | |$86.61^{+1.5}_{-0.53}$| |
e.......... | Eccentricity .......... | |$0.30^{+0.16}_{-0.19}$| |
ω⋆.......... | Argument of periastron (degrees) .......... | |$99^{+44}_{-37}$| |
ecos ω*.......... | .......... | −0.03 ± 0.14 |
esin ω*.......... | .......... | |$0.26^{+0.17}_{-0.23}$| |
MP sin i.......... | Minimum mass (|$\, M_{\hbox{$\oplus $}}$|) .......... | |$6.962^{+1.653}_{-1.621}$| |
MP/M*.......... | Mass ratio .......... | 0.0000173 ± 0.0000044 |
K.......... | RV semi-amplitude (|$\rm m\, s^{-1}$|) .......... | |$1.88^{+0.49}_{-0.48}$| |
Teq.......... | Equilibrium temperature (K)d.......... | |$1203^{+25}_{-27}$| |
RP/R⋆.......... | Radius of planet in stellar radii .......... | |$0.01444^{+0.00077}_{-0.00057}$| |
a/R⋆.......... | Semimajor axis in stellar radii .......... | 13.01 ± 0.64 |
d/R⋆.......... | Planet/star separation at mid transit .......... | |$9.5^{+3.0}_{-2.3}$| |
δ.......... | Transit depth in Kepler (Rp/R⋆)2.......... | |$0.0002299^{+0.0000086}_{-0.0000085}$| |
δ.......... | Transit depth in TESS (Rp/R⋆)2 e.......... | |$0.0002235^{+0.0000097}_{-0.0000096}$| |
T14.......... | Total transit duration (days).......... | |$0.1507^{+0.0037}_{-0.0039}$| |
b.......... | Transit Impact parameter .......... | |$0.60^{+0.18}_{-0.37}$| |
〈F〉.......... | Incident Flux (109 erg s−1 cm−2).......... | |$0.425^{+0.054}_{-0.045}$| |
Parameter . | Units . | Values . |
---|---|---|
Observed stellar parameters | ||
log g.......... | Spectroscopic surface gravity (cgs).......... | 4.10 ± 0.05 |
Teff.......... | Effective temperature (K).......... | 6083 ± 90 |
[Fe/H].......... | Metallicity (dex).......... | −0.34 ± 0.09 |
vsin i.......... | Projected rotational velocity (|$\rm km\, s^{-1}$|).......... | 3.8 ± 0.5 |
Derived stellar parameters | ||
M⋆.......... | Mass (M⊙).......... | |$1.109^{+0.11}_{-0.091}$| |
R⋆.......... | Radius (R⊙).......... | |$1.554^{+0.051}_{-0.050}$| |
L⋆.......... | Luminosity (L⊙).......... | |$3.08^{+0.16}_{-0.17}$| |
ρ⋆.......... | Density (cgs).......... | |$0.419^{+0.065}_{-0.059}$| |
Age.......... | Age (Gyr)a.......... | |$5.6^{+2.2}_{-2.0}$| |
log g.......... | Model-derived surface gravity (cgs).......... | |$4.102^{+0.052}_{-0.055}$| |
u1.......... | Kepler-band linear limb-darkening coeff .......... | 0.342 ± 0.050 |
u1.......... | TESS-band linear limb-darkening coeff .......... | 0.242 ± 0.030 |
u2.......... | Kepler-band quadratic limb-darkening coeff .......... | |$0.311^{+0.049}_{-0.050}$| |
u2.......... | TESS-band quadratic limb-darkening coeff .......... | 0.298 ± 0.029 |
Planetary parameters: | b | |
P.......... | Period (days).......... | |$9.978543^{+0.000023}_{-0.000020}$| |
MP.......... | Mass (|$\, M_{\hbox{$\oplus $}}$|).......... | |$7.0^{+1.7}_{-1.7}$| |
RP.......... | Radius (|$\, R_{\hbox{$\oplus $}}$|) b.......... | |$2.33^{+0.17}_{-0.15}$| |
TC.......... | Time of conjunction (|$\rm {BJD_{TDB}}$|)c.......... | 2456979.9329 ± 0.0016 |
a.......... | Semimajor axis (au).......... | |$0.0939^{+0.0029}_{-0.0027}$| |
i.......... | Inclination (degrees).......... | |$86.61^{+1.5}_{-0.53}$| |
e.......... | Eccentricity .......... | |$0.30^{+0.16}_{-0.19}$| |
ω⋆.......... | Argument of periastron (degrees) .......... | |$99^{+44}_{-37}$| |
ecos ω*.......... | .......... | −0.03 ± 0.14 |
esin ω*.......... | .......... | |$0.26^{+0.17}_{-0.23}$| |
MP sin i.......... | Minimum mass (|$\, M_{\hbox{$\oplus $}}$|) .......... | |$6.962^{+1.653}_{-1.621}$| |
MP/M*.......... | Mass ratio .......... | 0.0000173 ± 0.0000044 |
K.......... | RV semi-amplitude (|$\rm m\, s^{-1}$|) .......... | |$1.88^{+0.49}_{-0.48}$| |
Teq.......... | Equilibrium temperature (K)d.......... | |$1203^{+25}_{-27}$| |
RP/R⋆.......... | Radius of planet in stellar radii .......... | |$0.01444^{+0.00077}_{-0.00057}$| |
a/R⋆.......... | Semimajor axis in stellar radii .......... | 13.01 ± 0.64 |
d/R⋆.......... | Planet/star separation at mid transit .......... | |$9.5^{+3.0}_{-2.3}$| |
δ.......... | Transit depth in Kepler (Rp/R⋆)2.......... | |$0.0002299^{+0.0000086}_{-0.0000085}$| |
δ.......... | Transit depth in TESS (Rp/R⋆)2 e.......... | |$0.0002235^{+0.0000097}_{-0.0000096}$| |
T14.......... | Total transit duration (days).......... | |$0.1507^{+0.0037}_{-0.0039}$| |
b.......... | Transit Impact parameter .......... | |$0.60^{+0.18}_{-0.37}$| |
〈F〉.......... | Incident Flux (109 erg s−1 cm−2).......... | |$0.425^{+0.054}_{-0.045}$| |
Notes. (a) We note that age measurements derived from isochrones have been found to have large systematic uncertainties (Torres, Andersen & Giménez 2010).
(b) We note that this is in equatorial earth radii.
(c) Time of conjunction is commonly reported as the ‘transit time’
(d) Assumes no albedo and perfect redistribution.
(e) The difference in transit depths for Kepler and TESS is is due to limb darkening effects.
3.4 Overfitting diagnostics
Overfitting is a significant problem in RV analyses and the state-of the art methods that are widely used to model and predict stellar variability often do not agree within error. The severity of this problem is particularly evident in recent work by Zhao et al. (2022) where 11 teams were invited to use their 22 different methods on ultra precise EXPRES radial velocities of four stars that encompass various levels of activity. There is a concerning lack of agreement between these methods across these targets, which perhaps can be attributed to some of the methods overfitting to noise in the observations. Recently, Blunt et al. (2023) demonstrated that GP regression, one of the most common and state-of-the-art methods of modelling stellar variability, can be susceptible to overfitting. We wanted to prevent overfitting and thus set out to develop various methods to mitigate the risk of overfitting.
We developed several diagnostic plots as illustrated in Fig. 6. We developed these diagnostics to prevent two common types of overfitting concerns:
Not properly centring CCFs such that planetary reflex motion remains in the input data.
Using too many CCFs indexes such that there are too many free parameters compared to number of observations

Overfitting diagnostics for stellar activity RV analysis using the new DRS (2.3.5) HARPS-N ΔCCFs. The three columns in this Figure show the ΔCCFs, the weight parameters corresponding to the ΔCCFs, and the phase-folded RVs for three scenarios respectively. For each of the Figures in the third column (c, f, i), we provide a legend which includes the scatter of the radial velocities from the HARPS-N pipeline (raw rvs, std) in |$\rm m\, s^{-1}$|, the scatter of the activity corrected radial velocities (corr rvs, std) in |$\rm m\, s^{-1}$|, and the predicted semi-amplitude of the planet (planet preds, K) in |$\rm m\, s^{-1}$|. The first two rows corresponds to two different overfitting concerns and the last row demonstrates a case where both of these concerns are addressed. In the first row (a, b, c), we plot the ΔCCFs that have not been shifted to be centred at the median velocity. Without centring the ΔCCFs, the algorithm will be able to access translational shift (i.e. Doppler shifts) information and attribute potential planet signals to stellar activity signals, significantly attenuating their amplitude. In the second row (d, e, f), we show a case where the ΔCCFs are shifted but the number of indexes is too large. This results in overfitting as seen in the weights plot for the second row. In the third row (g, h, i), we have only fed in the ΔCCF indexes that are considered significant (as described in subsection 3.2.2) and use the properly shifted ΔCCFs.
To address the first concern, we plot the ΔCCFs in the first column of Fig. 6 to check whether all translation shifts (i.e. planetary reflex motion) were removed and the CCFs were properly centred. To illustrate how this problem appears, we intentionally did not centre the ΔCCFs in Fig. 6a. We thus observe a sinusoidal-like pattern for the red ΔCCFs which correspond to more redshifted RVs and the opposite pattern for more blue ΔCCFs. This is a tell-tale sign of a translation shift still remaining within the ΔCCFs. We can understand this shape by reminding ourselves of the definition of a derivative where we define the median CCF as f(x) and a redshifted CCF as f(x − h) and thus can write
Thus, the shape expected from the difference between the redshifted CCF and the median CCF is simply the derivative of a Gaussian. This is illustrated in Appendix Fig. A1 where we simulated a few Gaussian curves and added white Gaussian noise. We show the expected residuals for improperly centred and properly centred CCFs in Appendix Figs A1a, b, c, and d, respectively. When ΔCCFs are properly shifted to the centre, this pattern disappears as seen in both the simulated example in Appendix Fig. A1c, d and in the real data in Fig. 6d, g.
To address the second concern, we created diagnostic plots that include the weights for the ΔCCFs as seen in the second column of Fig. 6. When too many CCF weights are included, we tend to see structure in these weights plots that is caused by overfitting as one weight is compensating for its neighbouring weight. We illustrate this in the second row of Fig. 6 where we show a case where the ΔCCFs are shifted properly but the number of indexes is too large. Although this result in quite a low rms of 2.3 |$\rm m\, s^{-1}$|, the large number of free parameters results in overfitting on the noise in the data. We note that even when this overfitting occurs, the planetary signal remains preserved with a semi-amplitude of K ∼ 2.2 |$\rm m\, s^{-1}$|.
Finally, we show an example in the third row of Fig. 6 where both the ΔCCFs are properly centred and the number of indexes (5) is small enough to not result in overfitting. Both the derivative-of-a-Gaussian pattern in the ΔCCFs and the structure in the weights disappear.
3.5 Introducing a new performance metric: unexplained scatter
To measure the original scatter in the data before applying any activity correction, we compute the raw scatter, σraw, which is defined as the sample mean standard deviation of the RVs minus the expected measurement error such that
where SD is the sample standard deviation, RVHARPS are the raw radial velocities from the HARPS DRS pipeline, and σmeasurement is the median estimated RV errors from the DRS which includes photon noise, drift noise, and errors in the wavelength calibration (Dumusque et al. 2021). Defining the scatter in this way assumes that we can approximate the source of RV variations as Gaussian noise and that they can be approximated as independent scatter. Both of these assumptions are not true but this provides a sufficient approximation of the noise. We then evaluate the reduction in RV scatter using two scatter metrics: the activity corrected scatter (σac) and the unexplained scatter (σus). We define σac as
where RVSA are the stellar activity contributions to the RVs as predicted by the given activity model. This type of scatter metric12 is sometimes used to evaluate and compare stellar activity models, but this metric does not account for the scatter that can be caused by planet signal(s). Focusing on just reducing this scatter metric could unintentionally incentivize removing planet signals rather than preserving them with these activity correction methods. To partially address this concern, we introduce a metric that subtracts modelled planet signals so we primarily evaluate the activity model correction method on unexplained scatter rather than any scatter. We define the unexplained scatter (σus) as
where RVplanet is the contribution of modelled planets to the RV measurements. We note that there can evidently still remain un-modelled planet signals so this scatter metric does not fully address the concern of incentivizing removal of planet signals to reduce RMS. However, this can be a step towards finding better metrics for evaluating our RV models.
3.6 MCMC results
For both the new and old DRS, the MCMC successfully converged to a solution for the stellar activity model and the planet parameters.
For the old DRS (3.7), we found that CALM stellar activity mitigation method yields a significant improvement in both the RV scatter (from σraw = 4.02 |$\rm m\, s^{-1}$| to σus = 2.04 |$\rm m\, s^{-1}$|) and the precision with which we recover the mass of the transiting planet (improving a 3.0σ significance detection into a 4.3σ significance detection; see Fig. 7a, b). With our MCMC fit, we derive a mass of |$6.3_{-1.4}^{+1.4}$| |$\, M_{\hbox{$\oplus $}}$| and find that the orbital period and phase of the detection agrees with the known 9.978543 d period from transit observations.

Phase-folded RVs before and after applying our activity correction model. From the first to second column, we compare the phase-folded RVs for the old (3.7) and new (2.3.5) DRS. (a) Before applying our linear regression activity correction, the RV scatter is σraw = 4.02 |$\rm m\, s^{-1}$| and we detect the semi-amplitude of the planet with 3.0σ confidence in the old DRS (3.7) RVs. (b) When we simultaneously fit our CALM activity model based on shape changes in the CCF with a Keplerian, the detection significance increases to 4.3σ. (c) For the new DRS, the raw scatter is much lower (σraw = 3.01 |$\rm m\, s^{-1}$|) and without applying an activity model, we get a 4.3σ detection. (d) With an activity model, the raw scatter decreases to σraw = 2.41 |$\rm m\, s^{-1}$| and the detection significance remains the same at 4.3σ. For a description of the scatter metrics, σraw and σus, see subsection 3.5.
For the new DRS (2.3.5), we find that the CALM method does improve the scatter in the RVs from from σraw = 3.01 |$\rm m\, s^{-1}$| to σus = 2.41 |$\rm m\, s^{-1}$|. However, the level of stellar activity is already sufficiently low in the new DRS that even without an activity model, we can detect the planet signal at 4.3σ significance (Fig. 7c, d). When applying our CALM framework, the detection significance remains at 4.3σ. Although this does not provide a notable improvement, this result does demonstrate that our method effectively preserves planetary Doppler signals rather than removing them to achieve a lower RV scatter. We note that after applying our activity mitigation method, the final unexplained scatter in the new DRS (σus = 2.41 |$\rm m\, s^{-1}$|) is marginally higher than in the old DRS (σus = 2.04 |$\rm m\, s^{-1}$|). This may be due to a difference in line lists between the old and new DRS where less activity sensitive lines were used to derive the CCFs and RVs in the new DRS. Although this may result in cleaner RVs, this may also make it more challenging to remove remaining stellar activity signals with the CCFs in the new DRS. In the future, using custom line lists that are especially sensitive to stellar activity to compute custom CCFs may allow us to mitigate the stellar activity more effectively using CALM.
3.6.1 Results in Fourier space: activity signal peaks decrease and planetary signal emerges
For both the old DRS and new DRS, we investigate the behaviour of raw and stellar activity corrected HARPS RVs in the Fourier domain to investigate which (quasi)-periodic signals are being modelled and removed. To do this, we compute the Lomb–Scargle Periodogram (Lomb 1976; Scargle 1982) of the RVs before and after applying the CALM activity correction. We used the periodogram functions in astropy.timeseries (VanderPlas et al. 2012; VanderPlas & Ivezić 2015) to implement a generalized Lomb–Scargle Periodogram, where the periodograms are normalized according to the formalism in Zechmeister & Kürster (2009). In the periodograms corresponding to the old DRS (Fig. 8) and the new DRS (Fig. 9), we indicate the expected planet period (9.978543 d) from transits in yellow. Based on the vsin i = 3.8 km s−1 derived from the SPC analysis of HARPS-N spectra, we expect the rotation period of the star to be ∼15 d and half the rotation period to be at ∼7.5 d as indicated by Prot and Prot/2 in grey in Figs 8, 9.

Old DRS (3.7) Periodograms as a function of number of CCF indexes used in the stellar activity correction model. HARPS-N Raw (a) and Corrected (b) RVs in Fourier space for 5 CCF indexes. In each panel, the 1.0 per cent and 10.0 per cent false alarm probabilities (FAPs) computed using the bootstrap method are indicated with green and black dotted lines, respectively. The long-term activity signals in panel (a) decrease in magnitude after applying the activity correction in panel (b). The suspected rotation period and half the rotation period are indicated in grey. The planet period is indicated in yellow. As we increase the number of included CCF indexes to N=15 (c) and N = 25 (d), the peak corresponding to the star’s rotation period (Prot) decreases significantly in magnitude after applying the stellar activity corrections and a planet signal emerges at 9.978543 d.
For the old DRS (3.7), we plot Lomb–Scargle Periodograms of the RVs before and after applying the activity correction models with increasing number of CCF indexes in Fig. 8. As we increase the number of CCF indexes (N) included in the CALM model from N = 5 (Fig. 8b) to N = 15 (Fig. 8c), and finally to N = 25 (Fig. 8d), we see that the peaks corresponding to Prot, Prot/2, and long-term stellar activity decrease in magnitude. A strong signal emerges instead at the the expected planet period (9.978543 d). Thus, a larger number of CCF indexes (25) clearly more effectively separates stellar activity signals from Doppler reflex motion and reveals planetary signals. However, this does not imply that we should infinitely increase the number of CCF indexes. As discussed in subsection 3.4, increasing the number of CCF indexes can result in too many free parameters compared to the number of observations and overfitting on noise in the data. We explored using 5–40 CCF indexes and find that ∼25 CCF indexes strikes the balance between preventing overfitting and still modelling the stellar activity signals effectively.
For the new DRS (2.3.5), we plot periodograms of the RVs before and after applying the activity correction model with 5 CCF indexes in Fig. 9. We do not include several activity models with increasing number of CCF indexes since we did not find that increasing the number of CCF indexes made any notable difference in the periodograms for the new DRS. Although we do see the magnitude of the peaks corresponding to Prot, Prot/2, and long-term stellar activity decrease marginally in magnitude and a slightly stronger peak corresponding to the planet period (9.978543 d) emerge in 9b, this effect is much less pronounced compared to the old DRS. We expect that this is due to a difference in the use of linelists in the new and old DRS to compute the CCFs and RVs as discussed in subsection 3.6.

New DRS (2.3.5) Periodograms for 5 CCF indexes. HARPS-N Raw (a) and Corrected (b) RVs in Fourier space for 5 CCF indexes. The 1.0 per cent and 10.0 per cent FAPs computed using the bootstrap method are indicated with green and black dotted lines, respectively. The long-term activity signals in panel (a) decrease in magnitude after applying the activity correction in panel (b). The suspected rotation period and half the rotation period are indicated in grey. The planet period is indicated in yellow. From (a) to (b), the peak corresponding to a planet signal at 9.978543 d increases slightly in magnitude.
3.7 Assessing the possibility of a second planetary companion
In our activity corrected periodograms (bottom panels of Figs 8, 9), we noticed a peak arising at ∼22 d that could be hinting at a secondary companion. This period does not correspond to sidereal day, sidereal year, or lunation period aliases of K2-167 b or the star’s rotation period and aliases. Thus, we performed a 2 planet fit with our CCF linear activity model where K2-167 b is modelled as a Keplerian and the possible companion is modelled as a cosine function to keep free parameters to a minimum. Although our DE-MCMC model converged (Gelman Rubin values of <1.009) and the fit looks relatively convincing as seen in Fig. 10, we do not find a significant detection for this potential second companion. From the corner plot showing the distribution of possible periods (P2) in Appendix Fig. A2, we see that multiple periods between 21.6 and 22.8 could fit the observed data, which may help explain the lack of significant detection. Additional RV measurements are necessary to constrain the period and confirm or rule out this companion’s existence.

Phasefolded two planet fit. (a) The RV model for K2-167 b is plotted in black and the binned RVs are plotted in yellow. The data is phase-folded on K2-167 b’s period of 9.978543 d. (b) RV model for a hypothetical outer companion is plotted in blue and the binned RVs are plotted in yellow. The data is phase-folded on the fitted period of 22.01 d.
3.7.1 Comparing CALM to traditional activity indicators
For the old DRS, which shows a larger scatter, we compared our CCF activity indicators with traditional activity indicators such as s-index, Hα, NaD, and other CCF metrics such as bisector (BIS), full width at half-maximum (FWHM), and contrast. In Fig. 11, we evaluate these methods and compare them with our CALM method by both examining how well the planet signal is preserved and what the final remaining scatter in the RV measurements is. We use the scatter metrics defined in detail in subsection 3.5 for this comparison. We find that using any combination of activity indexes and/or BIS, FWHM results in a higher final unexplained scatter and a less significant detection of K2-167 b’s mass. In particular, using only the BIS, FWHM, and contrast results in the highest final unexplained scatter of 3.18 |$\rm m\, s^{-1}$| with the lowest significance of detection of 2.92σ. The other activity indicator models perform marginally better and yield detections of 3.16σ–3.25σ. However, when we include CCFs, we see a significant reduction in unexplained scatter and increased detection significance. Both the 25 CCF CALM model and the 25 CCF CALM model with activity indexes (s-index, H α, NaD) yield a σus = 2.0 |$\rm m\, s^{-1}$|, which is the lowest σus we find. The 25 CCF CALM model has the higher detection significance with 4.29σ compared to 3.95σ for the CCF with activity indexes model. Thus, overall the 25 CCF CALM model leads to the lowest σus and the highest detection significance among these methods.

Comparison between using CCFs, CCF metrics (FWHM, bisector, contrast), and activity indexes (s-index, Ca ii, NaD, Hα) as activity correction models for the Old DRS (3.7). On the bottom panel, the corrected scatter metrics (σac, σus) for each of the activity models are plotted. The activity corrected standard deviation σac measures the scatter in the activity corrected RV measurements. The unexplained scatter (σus) measures the scatter remaining after removing all known contributions (stellar activity, planet signal, estimated instrumental noise, photon noise) to the RV signal. These scatter metrics are defined in more detail in subsection 3.5. In the top panel, we include the estimated semi-amplitude of K2-167 b for each activity correction model and the corresponding error bars and detection significance to check how well the planet signal is preserved for each activity mitigation method.
3.8 Final RV model
For our final RV model, we use the new DRS (2.3.5) RVs and the 5 CCF activity indicator CALM activity correction model since this yields the lowest final unexplained scatter (σus). The CCF activity indicators are then used in the global transit, RV, stellar parameter fit to decorrelate the RVs as further described in subsection 4.2. We do not include the possible additional planet that we investigated as described in subsection 3.7 in our final RV model.
4 K2-167 SYSTEM PARAMETERS
In addition to performing the RV activity mitigation analysis, we also measure the spectroscopic properties and use the CALM activity indicators to perform a global transit and stellar parameter analysis for deriving the refined K2-167 system parameters as described in this section.
4.1 Spectroscopic parameters
Using all 74 of the HARPS-N spectra, we measure the spectroscopic properties of K2-167 using three independent methods: ARES + MOOG13,14 (Sousa 2014), CCFpams15 (Malavolta et al. 2017), and Stellar Parameter Classification (SPC) (Buchhave et al. 2014).
First, we measured the spectroscopic parameters using ARES + MOOG, which is a curve-of-growth method that relies on neutral and ionized iron lines and is further described by Mortier et al. (2013). We co-added all the spectra, used ARES2 (Sousa 2014; Sousa et al. 2015) to measured equivalent widths of iron lines, and used MOOG (Sneden 1973) to determine atmospheric parameters. We accounted for systematic effects in ARES + MOOG by using the surface gravity corrected method from Mortier et al. (2014). Our systematic errors were added to our precision errors in quadrature. These precision errors are intrinsic to the method. The systematic errors are 60 K for the the effective temperature, 0.1 dex for the surface gravity, and 0.04 dex for the metallicity (Sousa et al. 2011). This analysis yielded an effective temperature Teff, MOOG = 6115 ± 69 K, surface gravity log gcgs, MOOG = 4.12 ± 0.13, and an iron abundance [Fe/H]MOOG = −0.32 ± 0.04.
The CCFpams method (Malavolta et al. 2017), which relies on an empirical calibration that uses the width of CCFs to determine the effective temperature, surface gravity, and iron abundance. Using the same systematic errors for effective temperature, surface gravity, and metallicity as used for ARES + MOOG from Sousa et al. (2011), this method computed an effective temperature Teff, CCFpams = 6092 ± 68 K, surface gravity log gcgs, CCFpams = 4.09 ± 0.12, and an iron abundance [Fe/H]CCFpams = −0.34 ± 0.05.
We used the SPC method (Buchhave et al. 2014), which determines stellar atmospheric parameters by cross-correlating a stellar spectrum with synthetic spectra from Kurucz (1992) model atmospheres and then interpolating correlation peaks to find the effective temperature, surface gravity, and metallicity. We ran SPC on all our HARPS-N spectra. Using a prior on gravity of log gcgs, SPC = 4.12, we measured a temperature of Teff, SPC = 6047 ± 49 K, a surface gravity of log gcgs, SPC = 4.12 ± 0.10, an iron abundance [Fe/H]SPC = −0.37 ± 0.08, and the projected rotational velocity vsin iSPC = 3.8 ± 0.5 km s−1.
Lastly, we computed the weighted average across all three methods. The weighted average yields an effective temperature Teff, avg = 6083 ± 90 K, a surface gravity of log gcgs, avg = 4.10 ± 0.05, and an iron abundance [Fe/H]avg = −0.34 ± 0.09. We adopt these values and include Teff, avg and [Fe/H]avg as Gaussian priors in our global transit and stellar parameter analysis.
4.2 Global transit/stellar parameter analysis
To refine the ephemerides and system parameters of K2-167 b, we used the fitting software exofastv2 (Eastman, Gaudi & Agol 2013; Eastman 2017), which allows us to combine transit observations from several missions with our spectroscopic parameters and archival photometry. In particular, we simultaneously fit the TESS, K2 photometry, RVs, and stellar parameters (see Section 2). In addition to including the RVs, we include the CCF-based activity indicators derived from the CALM model (described in subsection 3.2) and use these to detrend the RVs in the exofastv2 fit. Since this detrending is identical to performing a linear regression, incorporating the CCF activity indicators in exofastv2 is identical to using our CALM model for the RVs. In this fit, we constrain the possible mass, radius, and age of the host star according to the MESA Isochrones and Stellar Tracks (MISTs) and stellar evolution models (Paxton et al. 2011, 2013, 2015; Choi et al. 2016; Dotter 2016).
We required a Gelman–Rubin statistic (Gelman & Rubin 1992) of less than 1.01 and at least 1000 independent draws in each parameter to ensure strict convergence criteria. We imposed Gaussian priors on [Fe/H] and Teff using the weighted averages derived from our spectroscopic analysis described in subsection 4.1. The results of our fit are shown in Fig. 2 and Table 1. In the exofastv2 fit, we find Mp = |$7.0^{+1.7}_{-1.7}$| |$\, M_{\hbox{$\oplus $}}$| and thus we find close agreement between the CALM estimate of mass (|$6.3_{-1.4}^{+1.4}$| |$\, M_{\hbox{$\oplus $}}$|) and the mass measured by Bonomo et al. (2023) (|$6.5_{-1.5}^{+1.6}$| |$\, M_{\hbox{$\oplus $}}$|).
4.3 Stellar variability signatures in the photometry
We examined both the K2 and TESS light curves to look for stellar rotation signals. We did not find significant rotation signals in the TESS data. The K2 light curves have higher photometric precision and probe a bluer wavelength region compared to TESS and thus should be able to probe more stellar variability. For both the TESS and K2 data, We used the unflattened light curve that is not corrected for systematics since the systematic corrections often also remove stellar variability signals. We observe less than 0.1 per cent variability in the K2 light curves. In Fig. 12, we examine the K2 light curve in Fourier space and find no significant signals at the expected rotation period (Prot ∼15 d). We find that the periodogram is dominated by K2 instrumental systematics. Significant signal loss is expected for K2 due to long-term systematic noise (e.g. differential velocity aberration) for periods greater than 15 d (Van Cleve et al. 2016) and make it challenging to detect oscillations and rotation signals.

Periodogram of K2 photometry. In grey, the expected stellar rotation period (Prot) of ∼15 d is indicated and the instrumental systematics that dominate signals with P >20 d are indicated. The instrumental systematics dominate the Fourier spectrum of the K2 light curve.
5 DISCUSSION
In this work, we have measured the mass of K2-167 b as |$6.3_{-1.4}^{+1.4}$| |$\, M_{\hbox{$\oplus $}}$| by taking advantage of the reduction in scatter resulting from updates to the HARPS-N Data Reduction System (2.3.5) and our new stellar activity mitigation method (CALM). This measurement agrees within error with the recently published mass measurement of Bonomo et al. (2023) (|$6.5_{-1.5}^{+1.6}$| |$\, M_{\hbox{$\oplus $}}$|), which was part of a larger statistical sample of exoplanet systems. Together with our radius measurement (|$2.33^{+0.17}_{-0.15}$| |$\, R_{\hbox{$\oplus $}}$|), we can use our precise mass measurement to place constraints on K2-167 b’s composition. We compared our measured mass and radius to mass–radius relations for planets of different compositions from Zeng et al. (2016). In Fig. 13, we demonstrate this newly determined mass in a mass/radius diagram. The mass and composition of K2-167 b is not consistent with an Earth-like composition (32.5 per cent Iron core, 67.5 per cent silicate mantle). Rather, K2-167 b is less dense than earth-like planets and falls closer to the pure ice (H2O) mass–radius relation from Zeng et al. (2016).

Residual CCFs (ΔCCFs) from the new DRS (2.3.5) for two example stars, K2-2 (Thygeson et al. in prep) and LTT 3780 (Cloutier et al. 2020). Although the new DRS (2.3.5) did not show a clear pattern in shape changes for K2-167, we can see that other targets do clearly exhibit shape changes corresponding to stellar variability and could potentially benefit from our CALM method.
This leaves several possibilities for the exact composition of K2-167 b. This planet could be a mixture of rock, iron, and hydrogen/helium (H/He). One possibility would be that K2-167 b has an earth-like iron core and mantle that is surrounded by a volatile envelope, which is believed to be a common composition for super-Earths and sub-Neptunes (Weiss & Marcy 2014; Dressing et al. 2015; Rogers 2015). Another possibility is that K2-167 b is a planet with a much smaller iron core than earth and mostly or entirely consists of water or methane or some other heavy volatiles.
Despite the fact that this planet was originally detected over 8 yr ago, only recently has it been possible to make such inferences about its composition. Like many other low-mass planets, K2-167 b was first detected using the transit method and was validated several years ago but stellar activity prevented astronomers from measuring its mass until recently. To ensure the robustness of K2-167 b’s mass measurement, we performed an extensive RV analysis where we compared our CALM framework to traditional activity mitigation methods (H-α, s-index, FWHM, BIS, etc). Instead of using these indicators, our method exploits the activity-induced line shape changes in the spectra without requiring timing information like GP regression. We trace these shape changes using CCFs and find that this method either outperforms or produces comparable results to traditional methods.
The fact that different data-reduction software versions, which use different linelists, yield radial velocities with significantly distinct stellar activity characteristics suggests interesting future directions in terms of mitigating stellar activity. The effectiveness of spectra-based activity mitigation methods has already been shown to be highly dependent on linelists and McWilliam et al. (in preparation) recently found that using chromatic CCFs, which are CCFs that use only part of the spectral orders, add complementary information for stellar activity mitigation for the Sun. Thus, in the future, we plan to generate our our line profiles generated using primarily activity sensitive lines such that we can apply CALM to these profiles rather than to CCFs from the HARPS-N DRS, which may greatly increase the effectiveness of spectra-based activity mitigation techniques like CALM.
5.1 Prospects for using CALM on other stars with the new DRS
In the future, we plan to apply our CALM method to variety of others stars of different stellar types and different ages. Although the new DRS (2.3.5) RVs for K2-167 did not require our method to derive a mass a measurement, this is not the case for other targets. From preliminary analyses, we have found that CALM can be effective at predicting and mitigating stellar variability for several stars with data from the new DRS (2.3.5). In Fig. 14, we show two examples of the residual CCF observations for K2-2 (Thygeson at el. submitted), a K0-type star, and LTT 3790 (Cloutier et al. 2020), an M4-type star. Both these targets have planets confirmed with transits and we find that CALM can reduce the RV scatter while preserving and increasing the signal strength of the planetary reflex motion.

The mass/radius diagram for small exoplanets. Planet masses and radii come from the NASA Exoplanet Archive (Akeson et al. 2013), accessed on 2023 May 7. The darkness of each of the symbols is directly proportional to the precision of the mass and radii measurements. The coloured lines represent theoretical mass/radius relations for solid planets of various compositions from (Zeng, Sasselov & Jacobsen 2016) and for cold hydrogen planets from (Seager et al. 2007). The planets in our Solar system are plotted in blue and K2-167 b is plotted in purple with their corresponding uncertainties.
In addition to using CALM to model magnetic activity, we are hoping to develop methods to model granulation, which is especially relevant for stars with high-surface gravity. The granulation-induced RV variation is anticorrelated with flicker, which are low-level photometric variations induced by granulation (Bastien et al. 2013). Recently, Lakeland et al. (2023) found that granulation and supergranulation are some of the main remaining limiting factors in reaching tens of |$\rm cm\, s^{-1}$| level for solar observations. Thus, it is critical that we find ways to model and predict granulation and supergranulation, especially for solar-like stars.
5.2 Applying CALM to young planetary systems
Improved activity mitigation methods are especially interesting for helping to constrain the mechanisms involved in creating the radius valley. One of the best ways to distinguish the most commonly proposed mechanisms, photoevaporation, and core-powered mass-loss, is by investigating the time-scale on which mass-loss takes place. Since photoevaporation is expected to proceed on time-scales shorter than 100 million years, measuring the compositions of planets younger than this age is of strong interest. Unfortunately, stars this young typically have very high levels of stellar activity, making their precise characterization difficult and making them prone to overfitting using current state-of-the art techniques like GPs (Blunt et al. 2023). Applying CALM or similar techniques to these young stars could open the door to measuring more masses and densities for transiting planets orbiting bright young stars (i.e. EPIC 247 589 423 (Mann et al. 2018), HD 283 869 (Vanderburg et al. 2018), V1298 Tau (David et al. 2019), DS Tuc A (Benatti et al. 2019; Newton et al. 2019)) and provide insight into the time-scales of mass-loss for these planets.
5.3 Future further characterization of K2-167 b
In the future, additional RV observations of K2-167 b could allow us to use a larger portion of the average line spectrum and potentially improve our modelling of the stellar variability. If timed during transit, additional RV measurements could allow us to measure the spin-orbit angle using the Rossiter–McLaughlin (RM) effect (McLaughlin 1924; Rossiter 1924). We compute the expected RM RV deviation ΔRVRM using the approximation from Seager & Dotson (2010) that
For K2-167 b, the expected RM RV deviation is ΔRVRM ∼ 20.1 |$\rm m\, s^{-1}$| and should be measure-able with precision and high-precision spectographs. RM measurements could provide insight into the angular momentum history and thereby dynamical history of the K2-167 system.
For future atmospheric target selection studies, we calculated the Transmission Spectroscopy Metric (TSM; Kempton et al. 2018) and the Emission Spectroscopy Metric (ESM; Kempton et al. 2018) for K2-167 b. We find that TSM = 6.06 and ESM = 4.12, which means that K2-167 b is not an ideal target for transmission and a moderately favorable target for emission.
6 CONCLUSION
In this paper, we have developed a new spectral-based activity mitigation method called CALM and used it to characterize the K2-167 system, which has a planet at the edge of the radius valley. In-depth characterization of systems like this can be especially critical in probing the formation physics of the radius valley. In this paper, we have specifically obtained the following results:
(1) We performed a detailed RV analysis, where we compared CALM to various traditional activity indicators, evaluated various HARPS-N DRS versions, and tested a range of number of CCF activity indicators within the CALM framework. Across all of these analysis choices, we derive a consistent mass within uncertainty and find that the period agrees with the transit period of P = 9.978543 d. Our final CALM model derives a mass of Mp = |$6.3_{-1.4}^{+1.4}$| |$\, M_{\hbox{$\oplus $}}$|.
(2) We find that the effectiveness of CALM at reducing the RV scatter is highly dependent on the version of the HARPS-N DRS used, which suggests that this is highly dependent on the choice of linelists. In the future, custom linelists to compute CCFs may greatly increase the the effectiveness of spectra based techniques like CALM.
(3) We investigated our RV results in Fourier space using periodograms to understand which periodic signals were responsible for the observed decrease in RV scatter. Using CALM, we find that both stellar rotation signals and long-term magnetic activity signals decrease in magnitude while preserving the planet signal. This effect is most prominent in the old DRS (3.7). In the new DRS (2.3.5), we do not see the signal strength of activity signals signal decrease significantly, but the planet signal is preserved nonetheless. We expect that this difference is the result of the different linelists used in the different DRS versions.
(4) We performed a joint fit with the RVs and transit using exofastv2. We found consistent mass estimates and refined the radius to Rp = |$2.33^{+0.17}_{-0.15}$| |$\, R_{\hbox{$\oplus $}}$|, which places K2-167 b at the upper edge of the radius valley. We refined the stellar parameters using three spectroscopic parameter estimation methods, and computed the weighted average across these methods, which yields an effective temperature Teff, avg = 6083 ± 90 K, a surface gravity of log gcgs, avg = 4.10 ± 0.05, and an iron abundance [Fe/H]avg = −0.34 ± 0.09.
(5) We investigated the possibility of a secondary companion at a ∼22 d orbital period, but we found degeneracies in the possible periods and no significant detection. Further RV follow-up may help refine the period and confirm or rule out a possible companion.
(6) We introduced a new scatter metric called the unexplained scatter that accounts for both modelled stellar variability and modelled planetary signals to asses model performance. We introduce this metric since commonly used scatter metrics do not account for scatter originating from unmodelled planets and sometimes inadvertently incentivize removing planet signals rather than preserving them.
(7) We compared our CALM model to traditional activity indicators (s-index, H α, NaD, bisector of the CCF, FWHM of the CCF, contrast of the CCF) and find that our models outperform any combination of these indicators for the old DRS. Our CCF-based CALM model both reduces the unexplained RV scatter and results in more significant mass detections.
(8) Combining our mass and radius measurements, we place constraints on K2-167 b’s composition and find that it is less dense than would be expected for an earth-like composition. This means that it could have either (i) an earth-like iron core and mantle with a volatile envelope, or (ii) a much smaller iron core and further consist of water, methane, or other heavy volatiles.
(9) Furthermore, we demonstrate that although CALM only yields a more significant mass detection for the old DRS (3.7) and not the new DRS (2.3.5), this is not the case for several other stars (e.g. K-type stars K2-2, M-type star LTT 3790) where we do reduce the RV scatter and increase the mass detection significance using the new DRS.
In the future, continued development and broad deployment of stellar activity mitigation algorithms will have wide-ranging applications in exoplanet science. Methods like these could unlock the ability to measure masses of planets orbiting active stars spanning the HR diagram and pave the way towards precise characterization of larger exoplanet populations in the super-Earth and sub-Neptune regime.
ACKNOWLEDGEMENTS
We acknowledge helpful conversations and feedback from members of Dave Latham’s Coffee Club. We thank Dr Sai Ravela for fruitful discussions on the best ways to choose the model input parameters and suggesting future directions for refining our process. The HARPS-N project has been funded by the Prodex Programme of the Swiss Space Office (SSO), the Harvard University Origins of Life Initiative (HUOLI), the Scottish Universities Physics Alliance (SUPA), the University of Geneva, the Smithsonian Astrophysical Observatory (SAO), and the Italian National Astrophysical Institute (INAF), the University of St Andrews, Queen’s University Belfast, and the University of Edinburgh.
ZLD would like to thank the generous support of the MIT Presidential Fellowship, the MIT Collamore-Rogers Fellowship and to acknowledge that this material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant Number 1745302. ZLD and AV acknowledge support from the TESS Guest Investigator Programme under NASA grant number 80NSSC19K0388. ZLD and AV acknowledge support from the D.17 Extreme Precision Radial Velocity Foundation Science programme under NASA grant number 80NSSC22K0848.
AMS would like to acknowledge that this work was supported by FCT – Fundação para a Ciência e a Tecnologia through national funds by these grants: UIDB/04434/2020; UIDP/04434/2020”. Funded/Co-funded by the European Union (ERC, FIERCE, 101052347). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. AMS acknowledges support from the Fundação para a Ciência e a Tecnologia (FCT) through the Fellowship 2020.05387.BD.
RDH is funded by the UK Science and Technology Facilities Council (STFC)’s Ernest Rutherford Fellowship (grant number ST/V004735/1).
This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Programme.
This paper includes data collected by the Kepler mission and obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the Kepler mission is provided by the NASA Science Mission Directorate. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5–26555.
This paper includes data collected with the TESS mission, obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the TESS mission is provided by the NASA Explorer Program. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5–26555.
This research has made use of the VizieR catalogue access tool, CDS, Strasbourg, France (DOI : 10.26093/cds/vizier). The original description of the VizieR service was published in 2000, A&AS 143, 23.
DATA AVAILABILITY
All RVs and activity indicators used in our analyses will be available via VizieR at CDS. The TESS and K2 photometry data are accesible via the MASTs portal at https://mast.stsci.edu/portal/ Mashup/Clients/Mast/Portal.html.
Footnotes
All observations had 15 min integration times except for one 20 min exposure on 2016 September 26 and two 30-min observations on 2015 November 28 and 2016 July 9, respectively.
mast.stsci.edu/portal/Mashup/Clients/Mast/Portal.html
We note that we accounted for the way that radvel used ωp rather than ω* in its implementation (Householder & Weiss 2022) and only report ω* in this paper.
Here we report the remaining RV scatter after removing our modelled Keplerian signal. In this way, we report only the unmodelled variation in the RVs as further detailed in subsection 3.5.
We tested a range of input representation sizes as discussed in subsection 3.2.2 and ultimately find that 5 CCF activity indicators to balances the risk of overfitting and underfitting to the data for the 2.3.5 DRS. For the 3.7 DRS, we find that 25 CCF indexes balances these concerns.
We note that subtracting this median is purely for visualization purposes and not required for the regression algorithm.
The exact number is determined by optimizing the fit while preventing overfitting as described in subsection 3.4
the entire array of the CCF contains 161 values for the old DRS (3.7). This was adjusted to 49 in the new DRS (2.3.5) to correct for oversampling of the CCF.
Sometimes the convention is to not subtract the measurement errors from the instrument (σmeasurement) in calculating this scatter metric.
MOOG- 2017: http://www.as.utexas.edu/ chris/moog.html
References
APPENDIX
In this appendix, we include the RV measurements included in this analysis (Table A1). In addition, we provide a supplementary figure that visually explains the CCF shape diagnostics used to prevent overfitting in the RV analysis (Fig. A1. Lastly, we provide an MCMC corner plot for the two-planet RV fit where the the period of a hypothetical companion is poorly constrained (Fig. A2).

Understanding CCF Shape Diagnostics. (a) Simulated CCFs that are red-shifted (red), blue-shifted (blue), and the median CCF (grey) are plotted. These CCFs are not shifted to the centre. (b) The residual CCF (simulated CCFs minus the median CCF) are plotted and are equivalent to the derivatives of a Gaussian. (c) A red-shifted (red), blue-shifted (blue), and median CCF (grey) are plotted. However, in this case they are all centred at the median RV and we simulated some white Gaussian noise as a simple approximation to shape changes induced by stellar activity. (d) The residual CCF (simulated CCFs minus the median CCF) are plotted and do not resemble the derivatives of a Gaussian.

DE-MCMC corner plot for two planet fit. For K2-167 b the free parameters are semi-amplitude (Amplitude), period (P), Time of periastron passage (Tp). Eccentricity is frozen at e = 0 for this fit. The potential companion is modelled by a cosine curve and thus the free parameters are semi-amplitude (K2), period (P2), and phase (phi2). For P2, we can clearly see that the data can be consistent with multiple possible periods.
RV observations and estimated instrumental errors for K2-167 using the HARPS-N DRS 2.3.5. The full table is available online.
BJD . | RV [|$\rm m\, s^{-1}$|] . | RVerror [|$\rm m\, s^{-1}$|] . |
---|---|---|
2457618.62 | −17549.45 | 0.89 |
2457618.55 | −17548.22 | 1.17 |
2457270.54 | −17555.49 | 0.79 |
2457269.56 | −17555.69 | 1.2 |
2457574.74 | −17552.52 | 0.89 |
. | . | . |
. | . | . |
. | . | . |
2457565.71 | −17553.78 | 1.29 |
2457562.71 | −17551.56 | 1.2 |
2457322.45 | −17548.57 | 0.81 |
2457356.3 | −17548.43 | 1.41 |
2457354.36 | −17546.95 | 1.26 |
BJD . | RV [|$\rm m\, s^{-1}$|] . | RVerror [|$\rm m\, s^{-1}$|] . |
---|---|---|
2457618.62 | −17549.45 | 0.89 |
2457618.55 | −17548.22 | 1.17 |
2457270.54 | −17555.49 | 0.79 |
2457269.56 | −17555.69 | 1.2 |
2457574.74 | −17552.52 | 0.89 |
. | . | . |
. | . | . |
. | . | . |
2457565.71 | −17553.78 | 1.29 |
2457562.71 | −17551.56 | 1.2 |
2457322.45 | −17548.57 | 0.81 |
2457356.3 | −17548.43 | 1.41 |
2457354.36 | −17546.95 | 1.26 |
RV observations and estimated instrumental errors for K2-167 using the HARPS-N DRS 2.3.5. The full table is available online.
BJD . | RV [|$\rm m\, s^{-1}$|] . | RVerror [|$\rm m\, s^{-1}$|] . |
---|---|---|
2457618.62 | −17549.45 | 0.89 |
2457618.55 | −17548.22 | 1.17 |
2457270.54 | −17555.49 | 0.79 |
2457269.56 | −17555.69 | 1.2 |
2457574.74 | −17552.52 | 0.89 |
. | . | . |
. | . | . |
. | . | . |
2457565.71 | −17553.78 | 1.29 |
2457562.71 | −17551.56 | 1.2 |
2457322.45 | −17548.57 | 0.81 |
2457356.3 | −17548.43 | 1.41 |
2457354.36 | −17546.95 | 1.26 |
BJD . | RV [|$\rm m\, s^{-1}$|] . | RVerror [|$\rm m\, s^{-1}$|] . |
---|---|---|
2457618.62 | −17549.45 | 0.89 |
2457618.55 | −17548.22 | 1.17 |
2457270.54 | −17555.49 | 0.79 |
2457269.56 | −17555.69 | 1.2 |
2457574.74 | −17552.52 | 0.89 |
. | . | . |
. | . | . |
. | . | . |
2457565.71 | −17553.78 | 1.29 |
2457562.71 | −17551.56 | 1.2 |
2457322.45 | −17548.57 | 0.81 |
2457356.3 | −17548.43 | 1.41 |
2457354.36 | −17546.95 | 1.26 |
Author notes
Quad Fellow