Enhancing 3D planetary atmosphere simulations with a surrogate radiative transfer model

Tara P A Tahseen, João M Mendonça, Kai Hou Yip, Ingo P Waldmann

Monthly Notices of the Royal Astronomical Society, Volume 535, Issue 3, December 2024, Pages 2210–2227, https://doi.org/10.1093/mnras/stae2461
ABSTRACT
This work introduces an approach to enhancing the computational efficiency of 3D atmospheric simulations by integrating a machine-learned surrogate model into the oasis global circulation model (GCM). Traditional GCMs, which are based on repeatedly numerically integrating physical equations governing atmospheric processes across a series of time-steps, are time-intensive, leading to compromises in the spatial and temporal resolution of simulations. This research improves upon this limitation, enabling higher resolution simulations within practical time frames. Speeding up 3D simulations holds significant implications in multiple domains. First, it facilitates the integration of 3D models into exoplanet inference pipelines, allowing for robust characterization of exoplanets from a previously unseen wealth of data anticipated from JWST and post-JWST instruments. Second, acceleration of 3D models will enable higher resolution atmospheric simulations of Earth and Solar system planets, enabling more detailed insights into their atmospheric physics and chemistry. Our method replaces the radiative transfer module in oasis with a recurrent neural network-based model trained on simulation inputs and outputs. Radiative transfer is typically one of the slowest components of a GCM, and thus provides the largest scope for overall model speed-up. The surrogate model was trained and tested on the specific test case of the Venusian atmosphere, to benchmark the utility of this approach for non-terrestrial atmospheres. This approach yields promising results, with the surrogate-integrated GCM demonstrating above 99.0 per cent accuracy and a factor of 147 speed-up of the entire simulation executed on one graphics processing unit (GPU), compared to the matched original GCM under Venus-like conditions.
1 INTRODUCTION
3D atmospheric models, commonly known as global circulation models (GCMs), are key tools for studying the climates of Solar system planets including the Earth, and are becoming increasingly important in the characterization of exoplanet atmospheres. GCMs consist of multiple components, which individually model different atmospheric processes: each component numerically solves equations governing an atmospheric process across elements of a grid and across many time-steps until simulation convergence criteria are reached. Due to the large number of numerical integrations involved in GCM simulations, producing a simulated atmospheric state for a given input set of planetary and stellar parameters is incredibly computationally expensive and time-consuming.
GCMs are applied in exoplanet science in the forward modelling component of Bayesian atmospheric retrieval pipelines. To retrieve posterior distributions on planetary and stellar parameters from a single transit spectrum, the Bayesian retrieval framework requires of the order of tens of thousands of forward simulations corresponding to different samples of input parameter space. With current state-of-the-art 3D modelling techniques, the compute resources and time needed to produce the required number of 3D simulations prohibit statistically rigorous inference of data from the JWST (Gardner et al. 2006) and further next-generation observational instruments yet to come, namely the Ariel Space Telescope (Tinetti et al. 2022). Acceleration without compromising the accuracy of GCMs is thus one clear method of facilitating inference using JWST and post-JWST data in exoplanet science.
Reducing GCM simulation time would also incur benefits in climate science of Earth and other Solar system planets. Increased speed of computation would enable simulations to be run at greater resolution, and/or with more physics included, thus enabling more realistic climate simulations to be achieved.
The past few years have yielded work in both exoplanet science and Earth climate science to accelerate 3D models. Much of this work in exoplanet science involves extending 1D models with extra parameters to reflect certain 3D atmospheric variations deemed necessary to account for 3D effects in phase-curve data (Changeat & Al-Refaie 2020; Feng, Line & Fortney 2020; Irwin et al. 2020; Chubb & Min 2022; Nixon & Madhusudhan 2022; Himes, Harrington & Baydin 2023). This approach is prone to introducing biases to the simulated atmospheric states produced, and a fully 3D model parametrization is essential to ensuring that simulations are robust against both known and unknown biases resulting from model oversimplifications (Pluriel et al. 2020). Earth climate science, benefiting from a wealth of high-resolution observational measurements, has had a different set of methods employed, namely machine-learned surrogate models trained on such observations (Ukkonen 2022; Yao et al. 2023). Surrogate models have not been broadly explored outside of Earth climate science, but demonstrate the potential of speed-up without requiring an oversimplified model parametrization (Ukkonen 2022; Yao et al. 2023).
Of the components within GCMs, radiative transfer (RT) is often the slowest or least resolved process. In the oasis GCM (Mendonca et al. 2014), the RT component contributes to 50–99 per cent of the total simulation runtime (depending on the temporal resolution of the RT, see Section 4.3 for more details) for massive and complex atmospheres such as Venus (Mendonça & Buchhave 2020). There is thus key scope to substantially improve GCM computational efficiency by targeting the RT component specifically.
To assess the effectiveness of our new approach, we are testing it on Venus’s atmospheric conditions. Simulating the Venus atmosphere in 3D is very computationally intensive due to the need for a complex RT model to accurately represent the energy balance in the atmosphere (e.g. Eymet et al. 2009; Lee & Richardson 2012; Mendonca et al. 2015). Venus has a substantial amount of CO|$_2$|, which generates a strong greenhouse effect (Sagan 1962) and is covered by highly reflective sulphuric acid clouds, obscuring the planet’s surface. Only about 2.5 per cent of the incoming solar radiation reaches the surface (e.g. Tomasko et al. 1980; Mendonca et al. 2015). The radiation model also requires fine spectral resolution to capture the spectral windows impacting energy exchange between the deep atmosphere and the upper layers above the clouds. Additionally, due to the high thermal inertia of the massive CO|$_2$| atmosphere, the models need to be integrated over a long period to reach a statistically steady state (Mendonça & Read 2016). Using a surrogate model to represent the RT is key to enhancing the performance of 3D simulations while enabling a more realistic depiction of radiative processes at minimal cost. Therefore, Venus presents a significant challenge and serves as a benchmark for the complexities our new modelling approach may encounter in future applications.
2 DATA AND CODES
2.1 OASIS
oasis is a planetary climate model composed of different coupled modules representing physical and chemical processes within planetary atmospheres (Mendonça & Buchhave 2020). The mathematics and assumptions of the RT component of oasis are detailed in Mendonca et al. (2015).
The equations in oasis are discretized over concentric icosahedral spatial grids (Mendonça et al. 2016; Deitrick et al. 2020). For the 3D simulations in this study, we set the grid to approximately 2 degrees in the horizontal and 49 vertical layers. This grid configuration results in a total of 10 242 columns covering the entire model domain. Our simulations started from the converged state obtained in Mendonça & Buchhave (2020) and were integrated for 5 Venus solar days (approximately 117 Earth days) using a time-step of 15 s. The model configuration is similar to the simulations in Mendonça & Buchhave (2020), and in the following sections, we describe the main physical modules relevant to this work.
2.1.1 OASIS-RT
The RT module of oasis (henceforth referred to as OASIS-RT) models the interaction of radiation with gas and cloud species in the atmosphere in a two-stream manner. Radiation from two different sources are modelled separately: these are solar radiation (|$0.1\!\!-\!\!5.5 \ \mu \text{m}$|) and thermal radiation (|$1.7\!\!-\!\!260 \ \mu \text{m}$|). The solar radiation code utilizes the |$\delta$|-Eddington approximation (Joseph, Wiscombe & Weinman 1976) in combination with an adding-layer method (Liu & Weng 2006; Mendonca et al. 2015), while the thermal code considers absorptivity and emissivity (Mendonca et al. 2015). The radiation scheme uses the k-distribution method to represent the gas absorption cross-sections, which are integrated over 353 spectral bands and 20 Gaussian points (Mendonça & Buchhave 2020).
In the oasis code used for this project, the spatial distribution and radiative properties of the clouds, which are composed of sulphuric acid and water, were taken from Crisp (1986) and Mendonca et al. (2015). The main cloud deck is located between roughly |$45 \ \mathrm{ and} \ 65 \ \text{km}$| altitude with layers of sub-micron particles below and above (Knollenberg & Hunten 1980). More details on the cloud properties can be found in Mendonca et al. (2015).
The RT code involves two steps of computation for each time-step; these are:

(i) Computing the gas optics: computing the optical properties of layer boundaries from the thermodynamic variables.

(ii) Computing the flow of radiation: computing the upward- and downward-welling fluxes at each layer boundary, from the incident flux at the top of the column in combination with the optical properties at each layer boundary.
The model uses a k-distribution table (Lacis & Oinas 1991) to calculate the optical properties of each layer across the wavelength bands using pre-computed wavelength-dependent absorption coefficients for |$\text{CO}_2$|, |$\text{SO}_2$|, and |$\text{H}_2\text{O}$|. These coefficients are combined with a continuum absorption, mostly from |$\text{CO}_2$|–|$\text{CO}_2$| collisions, and Rayleigh scattering from |$\text{CO}_2$| and |$\text{N}_2$| (Mendonca et al. 2015).
The RT model takes inputs of the density |$\rho$|, pressure p, temperature T, and chemical composition per grid element as outputted from the dynamical core THOR1 (Mendonça et al. 2016). Heating rates are then calculated per grid element from fluxes outputted from the RT model, and these heating rates update the temperature profile of the atmosphere, which serves as input back into the dynamical core. Flux is used to compute the heating rate |$Q$| per layer according to equation (1),
$$Q = \frac{1}{\rho c_p}\,\frac{\mathrm{d}F^{\mathrm{net}}}{\mathrm{d}z}, \tag{1}$$
where |$\mathrm{ d}F^{\mathrm{ net}}$| is the spectral-integrated net radiative flux (|$\text{W}\, \text{m}^{-2}$|), |$\rho$| is the atmospheric density (|$\text{kg} \, \text{m}^{-3}$|), and |$c_p$| is the specific heat capacity at constant pressure (900 |$\text{J}\, \text{kg}^{-1}\, \text{K}^{-1}$|).
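For illustration, a minimal NumPy sketch of this heating-rate calculation is given below; the function name, array layout, and sign convention for the net flux are our assumptions rather than the oasis implementation.

```python
import numpy as np

def heating_rate(F_net, z, rho, cp=900.0):
    """Heating rate (K s^-1) per layer from the net radiative flux.

    F_net : net radiative flux at the layer boundaries (W m^-2), length n + 1
    z     : boundary altitudes (m), length n + 1
    rho   : layer densities (kg m^-3), length n
    cp    : specific heat capacity at constant pressure (J kg^-1 K^-1)
    """
    # Finite-difference form of equation (1): dF_net/dz divided by rho * cp.
    return np.diff(F_net) / (rho * cp * np.diff(z))
```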
In order to improve the computational efficiency of 3D simulations with RT, Venus GCMs traditionally do not update the radiative fluxes from the solar and thermal schemes at every time-step (Lebonnois, Sugimoto & Gilli 2016; Mendonça & Read 2016; Mendonça & Buchhave 2020). In the 3D simulations with explicit RT used in this study, the fluxes calculated from the solar radiation scheme were updated every 2880 steps, and the thermal radiation fluxes were updated every 320 steps. These values are adjusted by the model user and are specific to Venus’s simulations. Larger values could potentially cause model instabilities. Although this approach introduces some inaccuracies in the heating/cooling rates, the robustness of the simulation is not compromised because the composition of the atmosphere and clouds remains constant over time. With the efficiency of the new surrogate model described in the next section, we are now able to update radiative fluxes at every time-step. To enhance the stability of the model with the new surrogate model, we also apply the three-step Adams–Bashforth method to the heating rate calculated from the surrogate radiative fluxes.
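The three-step Adams–Bashforth update referred to above takes the standard form sketched below; the function and variable names are illustrative and not the oasis code.

```python
def ab3_step(T, Q_n, Q_nm1, Q_nm2, dt):
    """Advance the temperature field with the three-step Adams-Bashforth scheme.

    T                 : current temperature field (K)
    Q_n, Q_nm1, Q_nm2 : heating rates (K s^-1) from the three most recent
                        surrogate flux evaluations, newest first
    dt                : physical time-step (s), 15 s in the simulations here
    """
    return T + dt * (23.0 * Q_n - 16.0 * Q_nm1 + 5.0 * Q_nm2) / 12.0
```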
2.2 Data
The data used for this project are 3D data simulated by oasis, using OASIS-RT, for input parameters corresponding to Venus (Mendonça & Buchhave 2020). The data consist of 1000 time-steps of simulation, with the time interval between consecutive time-steps within the recorded data set, |$\Delta t$|, set to 12 h. These data correspond to time-steps of a full simulation for which the RT update has been executed. The data cover the entire icosahedral grid with dimensions 10 242 columns |$\times$| 49 layers.
At the sample level (per atmospheric column), the data comprise the following quantities: pressure p, temperature T, and gas density |$\rho$|, all defined per layer; upwelling short-wave flux |$F^{\text{SW}, \uparrow }$|, downwelling short-wave flux |$F^{\text{SW}, \downarrow }$|, upwelling long-wave flux |$F^{\text{LW}, \uparrow }$|, and downwelling long-wave flux |$F^{\text{LW}, \downarrow }$|, all defined per layer boundary; and cosine of the solar zenith angle |$\mu$|, short-wave surface albedo |$\alpha _{\text{SW}}$|, long-wave surface albedo |$\alpha _{\text{LW}}$|, and surface temperature |$T_0$|, all defined per column.
Fig. E1 and Fig. E2 illustrate the distribution of samples within the data set as a function of |$\cos \mu$| and surface temperature.
3 METHODS
3.1 Surrogate modelling
Surrogate models, or emulators, are approximate mathematical models that model outcomes of interest, whereby the emulator mechanism does not necessarily reflect the physical mechanism that produces the outcomes. Surrogate models are useful in cases where the physical mechanism producing such outcomes is not well understood, or in cases where modelling the physical mechanism is excessively computationally demanding; in the case of the latter, the aim is to produce a surrogate model that is computationally efficient compared to the physical model while maintaining accuracy. Machine learning provides a framework by which surrogate models can be produced: deep neural networks adhere to the Universal Approximation Theorem, and can (in theory) model any arbitrarily complex non-linear relationship, thus providing a function space whereby a suitable surrogate function is almost certain to exist (Goodfellow, Bengio & Courville 2016).
Research into surrogate modelling of exoplanetary atmospheres is relatively nascent (Himes et al. 2022, 2023; Unlu et al. 2023). The use of surrogate models within Earth climate science, though still new, is much more established and well explored (Ukkonen 2022; Mukkavilli et al. 2023; Yao et al. 2023). The development of surrogate models for atmospheric models of exoplanets can thus be informed by the development of such surrogate models for the parameter space of Earth.
An example where surrogate modelling has already proved valuable within 3D modelling of exoplanetary atmospheres is work by Schneider et al. (2024), who utilized DeepSets to achieve fast and accurate mixing of correlated-k opacities. Their work exemplifies how machine learning can be leveraged to successfully speed up a single process of a GCM while retaining accuracy, thus establishing a basis for further exploration into the integration of surrogates within GCM frameworks.
In this work, two surrogate models were produced for 3D modelling of Venus using oasis: one surrogate model to emulate the short-wave RT schema, and one to emulate the long-wave RT schema (see Section 2.1.1 for details on the computations for the two radiation schemas).
3.2 Data pre-processing
Data were pre-processed separately for the long-wave and short-wave regimes. Each model took input of two types of data: variables defined at the grid-element level, referred to as vector variables of dimension |$(n_{\text{layers}},) = (49,)$|, and variables defined at the column level, referred to as scalar variables.
The short-wave surrogate model took scalar inputs of surface temperature, |$T_0$|; gas density of the lowest altitude layer, |$\rho _\mathrm{ S}$|; pressure of the lowest altitude layer, |$p_\mathrm{ S}$|; cosine of the solar zenith angle |$\mu$|; and short-wave surface albedo |$\alpha _{\mathrm{ SW}}$|. The long-wave surrogate model took scalar inputs of surface temperature, |$T_0$|; gas density of the lowest altitude layer, |$\rho _\mathrm{ S}$|; pressure of the lowest altitude layer, |$p_\mathrm{ S}$|; and long-wave surface albedo |$\alpha _{\mathrm{ LW}}$|.
Both models took input of the same vector variables, which were as follows: temperature T; pressure p; and gas density |$\rho$|. Vector variables were scaled as
for |$x\in \lbrace T, p\rbrace$|, and
for |$x=\rho$|, for the ith column and jth atmospheric level.
Scalar variables were then re-scaled as
for |$x \in \lbrace T_0, p_0, \rho _0\rbrace$|, where |$S_{\text{train}}$| is the training set.
Targets were scaled as follows:
for |$y_j \in \lbrace F_{j}^{\text{LW}, \uparrow }, F_{j}^{\text{LW}, \downarrow } \rbrace$| and
for |$y_j \in \lbrace F_{j}^{\text{SW}, \uparrow }, F_{j}^{\text{SW}, \downarrow } \rbrace$|, for the jth altitude level (|$j \in [0,49]$|, where 0 indexes the ground level and 49 indexes the top level of the atmospheric column), where |$A^{\text{LW}}$| and |$B^{\text{SW}}$| are scaling factors linear in |$T_0$| and |$\mu$|, respectively, and fitted from the data:
$$A^{\text{LW}}(T_0) = a_1 T_0 + a_2, \qquad B^{\text{SW}}(\mu) = b_1 \mu + b_2,$$
where |$(a_1, a_2)$| are constants fitted using |$y^{\text{LW}, \uparrow }_{0}$| and |$(b_1, b_2)$| are fitted using |$y^{\text{SW}, \downarrow }_{49}$| across all columns of the training set |$S_{\text{train}}$|. Values of |$(a_1, a_2, b_1, b_2)$| are retained for model prediction post-processing. Residuals between the targets |$y^{\text{LW}, \uparrow }_{0}$| and |$y^{\text{SW}, \downarrow }_{49}$| and the respective scaling factors approximating them are displayed in Fig. 1. These simple linear scaling methods were chosen for data pre-processing instead of exact computations of |$y^{\text{LW}, \uparrow }_{0}$| and |$y^{\text{SW}, \downarrow }_{49}$|, as the latter are much more involved, and the simpler computations produce fluxes with an acceptably small marginal difference.
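As a sketch of how such linear scaling factors might be fitted and applied (function and variable names are ours, and we assume the flux profiles are simply divided by the fitted factor; the exact normalization used may differ):

```python
import numpy as np

def fit_linear_scaling(x, y):
    """Least-squares fit of a linear scaling factor, y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

# Long-wave: fit A_LW(T0) to the surface upwelling long-wave flux over the training set.
# a1, a2 = fit_linear_scaling(T0_train, F_lw_up_surface_train)
# Short-wave: fit B_SW(mu) to the top-of-column downwelling short-wave flux.
# b1, b2 = fit_linear_scaling(mu_train, F_sw_down_top_train)

def scale_lw_column(F_lw, T0, a1, a2):
    """Scale one column's long-wave flux profile by A_LW(T0) (assumed division)."""
    return F_lw / (a1 * T0 + a2)
```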

Figure 1. Scatter plots illustrating the residuals between scaling factors |$A^{\text{LW}}(T_0)$| and |$B^{\text{SW}}(\mu)$| and their respective targets |$y^{\text{LW}, \uparrow }_{0}$| and |$y^{\text{SW}, \downarrow }_{49}$|, plotted over 1 epoch of 10 242 test columns covering the entire icosahedral grid. Left: residuals between the scaling factors |$B^{\text{SW}}(\mu _i)$| used for the short-wave targets of a given column i and the downward-welling short-wave flux at the top of the column |$F^{\text{SW}, \downarrow }_{i,49}$|. |$B^{\text{SW}}(\mu _i)$|, a linear function of the cosine of the solar zenith angle |$\mu _i$|, approximates |$F^{\text{SW}, \downarrow }_{i,49}$|. Right: residuals between the scaling factors |$A^{\text{LW}}(T_{i,0})$| used for the long-wave targets of a given column i and the upward-welling long-wave flux at the ground level |$F^{\text{LW}, \uparrow }_{i,0}$|. Here, |$A^{\text{LW}}(T_{i,0})$| approximates |$F^{\text{LW}, \uparrow }_{i,0}$| as a linear function of surface temperature |$T_{i,0}$|. Both: in both the long-wave and short-wave cases, the preferred quantities |$F^{\text{LW}, \uparrow }_{i,0}$| and |$F^{\text{SW}, \downarrow }_{i,49}$| for scaling flux profiles across atmospheric levels involve complex calculations; the plots demonstrate that the simple linear functions |$A^{\text{LW}}(T_0)$| and |$B^{\text{SW}}(\mu)$| yield close approximations with low residuals, making them suitable scaling factors instead.
Columns across all epochs were shuffled and split into train, test, and validation data sets, in the ratio |$70\!\!:\!\!15\!\!:\!\!15$|.
3.3 Model architecture
Model architecture was chosen to be based on recurrent neural networks (RNNs),2 as RNNs structurally incorporate the spatial dependence of the training data, which fits naturally in this scenario. RNN layers were implemented in the form of gated recurrent units (GRUs; for further details on GRUs, see Cho et al. 2014). Simple RNNs are susceptible to short-term memory problems, whereby information propagated forwards diminishes quickly. A common implementation of RNNs is the GRU that utilizes a more sophisticated mechanism for propagating information ‘memory’ forwards through a sequence: this is the base of the model architecture we employ in this work. The specific architecture of the benchmark was chosen to be that used by Ukkonen (2022) (illustrated in Fig. 2), which utilized a bidirectional RNN-based architecture to create a two-stream RT emulator for Earth, trained using observational data. Ukkonen’s model performed with |$\le\!\! 0.5{{\ \rm per\, cent}}$| mean absolute error (MAE) for the upwelling and downwelling fluxes on the test set (Ukkonen 2022), suggesting its potential efficacy for developing surrogates trained on analogous simulated data. The models used in this work were constructed and trained using tensorflow version 2.12.0 (Abadi et al. 2016), and converted into Open Neural Network Exchange (ONNX) format for integration within oasis. Input data to the surrogate model are detailed in Table 1, and surrogate model parameters are detailed in Table C1 and Table C2.

Figure 2. Architecture of the learned surrogate models used in this work. Surrogate models for both the long-wave and short-wave schemas took scaled vector inputs |$\boldsymbol{ X}_j = (\tilde{p}_j, \tilde{T}_j, \tilde{\rho }_j)$| for the jth layer, where |$\tilde{p}_j, \tilde{T}_j,\ \mathrm{ and} \ \tilde{\rho }_j$| are scaled pressure, scaled temperature, and scaled density of the jth layer of the input atmospheric column, respectively (see Section 3.2 for details on scaling). The surrogate models took scalar inputs |$\boldsymbol{ \beta} $|, where |$\boldsymbol{ \beta}_{\mathrm{ SW}} = (\mu , p_0, T_0, \rho _0, \alpha _{\mathrm{ SW}})$| for the short-wave surrogate model and |$\boldsymbol{ \beta}_{\mathrm{ LW}} = (p_0, T_0, \rho _0, \alpha _{\mathrm{ LW}})$| for the long-wave surrogate model, where |$\mu$| and |$\alpha$| denote the cosine of the solar zenith angle and surface albedo, respectively. Vector inputs are passed through a GRU layer starting from the top atmospheric layer (layer 48) and moving downwards to the ground layer (layer 0). Scalar inputs are concatenated with the output of the downward GRU and input into a dense layer, which then serves as the first input to an upward-moving GRU layer. The outputs of the two GRU layers are then concatenated and passed through a dense layer, which produces the model outputs of shape |$(n_{\text{levels}}, n_{\text{outputs}}) = (50, 2)$|, corresponding to scaled upward and downward flux for each atmospheric level. Rectified linear unit (ReLU, Agarap 2019) activation functions were used for both GRU layers and |$\text{Dense}_1$|, while a sigmoid activation function was used for the |$\text{Dense}_{\text{out}}$| layer.
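As a concrete illustration, the sketch below builds a Keras model matching this description for the short-wave case; the layer names, the exact wiring of the two GRU streams, and the use of the reversed downward sequence are our reading of Fig. 2 together with the layer shapes and parameter counts in Table C1, not code released by the authors.

```python
import tensorflow as tf
from tensorflow.keras import layers

N_LAYERS, N_UNITS = 49, 128          # short-wave sizes (Table C1); the long-wave model uses 32 units

def build_sw_surrogate():
    # Vector inputs (scaled p, T, rho per layer), assumed ordered from the top layer down.
    x = layers.Input(shape=(N_LAYERS, 3), name="main_inputs")
    # Scalar inputs (mu, p0, T0, rho0, alpha_SW) per column.
    beta = layers.Input(shape=(5,), name="auxiliary_inputs")

    # Downward GRU: sweeps from the top of the atmosphere towards the ground.
    seq_dn, state_dn = layers.GRU(N_UNITS, activation="relu", return_sequences=True,
                                  return_state=True, name="gru_down")(x)

    # Dense layer mixing the downward state with the column scalars;
    # its output seeds the upward sweep at the surface.
    h0 = layers.Dense(N_UNITS, activation="relu", name="dense_1")(
        layers.Concatenate()([state_dn, beta]))

    # Ground-to-top sequence of 50 elements: surface seed + reversed downward outputs.
    seq_up_in = layers.Concatenate(axis=1)(
        [layers.Reshape((1, N_UNITS))(h0),
         layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(seq_dn)])

    # Upward GRU: sweeps from the ground towards the top of the atmosphere.
    seq_up = layers.GRU(N_UNITS, activation="relu", return_sequences=True,
                        name="gru_up")(seq_up_in)

    # Per-level head: scaled (downwelling, upwelling) fluxes at the 50 layer boundaries.
    out = layers.Dense(2, activation="sigmoid", name="dense_out")(
        layers.Concatenate(axis=-1)([seq_up_in, seq_up]))

    return tf.keras.Model(inputs=[x, beta], outputs=out)

model = build_sw_surrogate()
model.summary()   # about 167 810 trainable parameters, as in Table C1
```

The trained Keras model can then be exported to ONNX for loading within the oasis C/C++ framework, for example with the tf2onnx package (tf2onnx.convert.from_keras).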
Table 1. Surrogate model inputs: Surrogate models for both the long-wave and short-wave schemas took scaled vector inputs |$\boldsymbol{ X}_i = (\tilde{p}_i, \tilde{T}_i, \tilde{\rho }_i)$| for the ith layer, where |$\tilde{p}_i, \tilde{T}_i,\ \mathrm{ and} \ \tilde{\rho }_i$| are scaled pressure, scaled temperature, and scaled density of the ith layer of the input atmospheric column, respectively (see Section 3.2 for details on scaling). The surrogate models took scalar inputs |$\boldsymbol{ \beta}$|, where |$\boldsymbol{ \beta}_{\mathrm{ SW}} = (\mu , p_0, T_0, \rho _0, \alpha _{\mathrm{ SW}})$| for the short-wave surrogate model, and |$\boldsymbol{ \beta}_{\mathrm{ LW}} = (p_0, T_0, \rho _0, \alpha _{\mathrm{ LW}})$| for the long-wave surrogate model, where |$\mu$| and |$\alpha$| represent the cosine of the solar zenith angle and surface albedo, respectively.

| Surrogate schema | Short-wave | Long-wave |
|---|---|---|
| Vector inputs | $\boldsymbol{X}_i = (\tilde{p}_i, \tilde{T}_i, \tilde{\rho}_i)$ | $\boldsymbol{X}_i = (\tilde{p}_i, \tilde{T}_i, \tilde{\rho}_i)$ |
| Scalar inputs | $\boldsymbol{\beta}_{\mathrm{SW}} = (\mu, p_0, T_0, \rho_0, \alpha_{\mathrm{SW}})$ | $\boldsymbol{\beta}_{\mathrm{LW}} = (p_0, T_0, \rho_0, \alpha_{\mathrm{LW}})$ |
3.4 Model training
Both models were trained in a supervised, end-to-end fashion. An Adam (Kingma & Ba 2017) optimizer was used in combination with a cyclical learning rate. Models were trained using tensorflow on an nvidia A100 GPU.
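A minimal sketch of this optimizer set-up is given below; the triangular schedule and its bounds are illustrative choices, since the cyclical learning-rate parameters are not specified here.

```python
import tensorflow as tf

class TriangularCLR(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Standard triangular cyclical learning rate (illustrative bounds and period)."""

    def __init__(self, lr_min=1e-4, lr_max=1e-3, step_size=2000):
        self.lr_min, self.lr_max, self.step_size = lr_min, lr_max, step_size

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        cycle = tf.floor(1.0 + step / (2.0 * self.step_size))
        x = tf.abs(step / self.step_size - 2.0 * cycle + 1.0)
        return self.lr_min + (self.lr_max - self.lr_min) * tf.maximum(0.0, 1.0 - x)

optimizer = tf.keras.optimizers.Adam(learning_rate=TriangularCLR())
```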
3.4.1 Loss function
The loss function was constructed as a combination of the mean percentage error (MPE) between the predictions and targets, per output. The mean error (ME) for the ith test column and kth target variable is defined as follows:
where |$\hat{y}_{i,j,k}$| is the target for the ith test column, jth atmospheric level, and kth target variable, where |$k=0$| corresponds to downwelling flux and |$k=1$| corresponds to upwelling flux, and |$n_{\text{levels}}$| is the number of atmospheric levels (|$n_{\text{levels}}=50$| in this work). Normalization factors were defined as
such that MPE of the ith test column and kth target variable can be expressed as
The loss function per sample was then defined as
with the total loss defined as the sum of the per-sample losses over all test samples.
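A minimal TensorFlow sketch of a loss of this kind follows, reusing the model and optimizer from the sketches above; the use of the column-mean absolute target flux as the normalization factor is our assumption, so this approximates the loss described here rather than reproducing its exact definition.

```python
import tensorflow as tf

def mpe_loss(y_true, y_pred, eps=1e-6):
    """Percentage-error loss over the two flux outputs.

    y_true, y_pred: (batch, n_levels, 2) tensors of scaled (downwelling, upwelling)
    fluxes. The normalization by the column-mean absolute target is an assumption.
    """
    me = tf.reduce_mean(y_pred - y_true, axis=1)           # mean error per column and output
    norm = tf.reduce_mean(tf.abs(y_true), axis=1) + eps    # assumed normalization factor
    mpe = 100.0 * tf.abs(me) / norm                        # mean percentage error
    return tf.reduce_sum(mpe, axis=-1)                     # combine the two outputs per sample

# model.compile(optimizer=optimizer, loss=mpe_loss)        # using the pieces sketched above
```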
3.4.2 Hyperparameter tuning
Multiple models were trained corresponding to different hyperparameter values. The number of neurons in all RNN layers was varied across the values |$[16, 32, 64, 128]$| for both surrogate models. The best candidate models were those with 128 neurons per RNN layer for the short-wave surrogate model and 32 neurons per RNN layer for the long-wave surrogate model.
3.5 Performance analysis
Below, we detail the metrics used to analyse the performance of the surrogate models on the test set. In the results (Section 4), different aggregations of absolute error (equation 15 for raw model outputs and equation 14 for post-processed model outputs) are used to investigate the performance of both the long-wave and short-wave surrogate models.
where |$\hat{y}_{i,j,k}$| are the post-processed model predictions, and |$\tilde{y}_{i,j,k}$| are the unscaled target variables.
where |$\hat{u}_{i,j,k}$| are the raw model predictions, and |$\tilde{u}_{i,j,k}$| are target variables, which have been scaled to lie in the interval [0, 1] using the scaling methods detailed in Section 3.2.
Column-aggregated error quantities are defined as follows:
where |$n_{\text{levels}}$| is the number of atmospheric levels.
Error quantities averaged across samples per altitude level are defined as follows:
where N is the number of test samples.
Mean flux for the kth target variable is defined as
and the MAE for the kth target variable aggregated across all altitude levels is calculated as
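For concreteness, these error aggregations can be sketched in NumPy as follows; treating every aggregation as a simple mean is our assumption, and the exact definitions (equations 14–21) may use different normalizations.

```python
import numpy as np

def error_summaries(y_pred, y_true):
    """Aggregate absolute errors for arrays of shape (N, n_levels, 2):
    test columns x altitude levels x (downwelling, upwelling) fluxes.
    All aggregations below are simple means (an assumption)."""
    abs_err = np.abs(y_pred - y_true)
    cae = abs_err.mean(axis=1)              # column-aggregated error, per column and output
    mae_level = abs_err.mean(axis=0)        # MAE per altitude level and output
    mean_flux = y_true.mean(axis=(0, 1))    # mean flux per output
    mae_k = abs_err.mean(axis=(0, 1))       # MAE per output over all columns and levels
    return cae, mae_level, mean_flux, mae_k
```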
4 RESULTS AND DISCUSSION
4.1 Model performance on test set
Table 2 summarizes the MAE (equation 21) across the four target variables, and relative to the mean values of these four variables, across the test set. These MAE values are in line with those achieved using similar surrogate modelling methods for RT within the Earth’s atmosphere, with Ukkonen (2022) quoting MAE for short-wave fluxes of around 1 per cent or less. For both the long-wave and short-wave regimes, the MAE is higher for the upwelling fluxes as compared to downwelling fluxes: this is expected as physical computation of the upwelling flux depends on the computation of the downwelling flux, thus meaning that this is a more complicated mapping to emulate. The percentage errors for the short-wave targets are a factor of 2–3 greater than those for the long-wave targets: this is to be expected as the magnitude of the short-wave targets is smaller, and the supervised learning task set for the short-wave model in this work is a more complex mapping from inputs to outputs as compared to that for the long-wave model.
Table 2. MAE (equation 21) across the four target variables on the test set, alongside the mean flux of each target variable.

| Regime | Stream | Mean flux $\bar{\boldsymbol{F}}_k$ ($\text{W}\,\text{m}^{-2}$) | $\text{MAE}_k$ ($\text{W}\,\text{m}^{-2}$) | $\text{MAE}_k / \bar{\boldsymbol{F}}_k$ (per cent) |
|---|---|---|---|---|
| Long-wave | Upwelling | 4211.0 | 18.8 | 0.45 |
| Long-wave | Downwelling | 4139.3 | 16.9 | 0.41 |
| Short-wave | Upwelling | 577.9 | 6.4 | 1.11 |
| Short-wave | Downwelling | 707.8 | 7.7 | 1.09 |
In the following sections, we visualize and interrogate the variation of model prediction errors with altitude and with scalar variables: |$\cos \mu$| for both long-wave and short-wave model predictions, and surface temperature for long-wave model predictions only. For completeness, further plots of error as a function of the remaining input scalar variables (surface temperature, surface pressure, and surface gas density) are contained in Appendix F.
4.1.1 Short-wave surrogate model
Variation of error versus altitude: Fig. 3 displays the average absolute error of predictions at different altitude levels across test samples. Below around |$65\ \text{km}$|, the average error increases roughly in proportion to the average target fluxes. Above this altitude, the average error decreases for both target variables, even though the average values of these variables continue to rise with altitude. This change in the trend of the average error occurs around the top of the cloud deck.
Figure 3. Plots A and B display the |$\widehat{\text{MAE}}_{j,k}$| (defined in equation 19) for test samples that have column-aggregated absolute error (|$\widehat{\text{CAE}}_{i,k}$|, defined in equation 17) in the |$[5, 95]$| percentile interval (that is, test samples with errors falling in the smallest 5 per cent and largest 5 per cent of the test set have been discarded from this aggregate statistic, in order to better illustrate typical model performance). Plots D and E display the |$\text{MAE}_{j,k}$| (defined in equation 18) of surrogate model predictions across the two short-wave target variables, for test samples that lie within 1 angular degree of the substellar point. Plots G and H display the |$\text{MAE}_{j,k}$| of surrogate model predictions across the two short-wave target variables, for test samples that lie within 0.1 angular degree of the day-night terminator. Plot C displays the scaled target short-wave flux profiles averaged across the test samples with |$\widehat{\text{CAE}}_{i,k}$| within the chosen percentile interval. Plot F displays the target short-wave flux profiles averaged across samples that lie within 1 angular degree of the substellar point. Plot I displays the target short-wave flux profiles averaged across samples that lie within 1.1 angular degree of the terminator. The shaded regions in all plots represent the interval |$[\max {(0,\theta _j-\sigma _j)},\, \, \theta _j+\sigma _j]$|, where |$\theta _j$| and |$\sigma _j$| denote the mean and standard deviation of the plotted quantities for the jth altitude level.
Potential explanations as to why there is lower error in predicting targets above |$65\ \text{km}$| are as follows: above the top of the cloud deck, there is less complexity in mapping from input variables to output fluxes, and so this can be naively assumed to be an easier task to learn; there is also tighter variance in the target variables within the test set above this altitude threshold.
Figs 3(A) and (B) display MEs of less than 3 per cent across all altitude levels, averaged across test samples in the [5, 95] percentile interval of column-aggregated errors. This accuracy falls within an acceptably small margin of error.
Variation of error versus |$\cos \mu$|: Fig. 5 displays the distribution of test errors as a function of |$\cos \mu$| of the test column. Data in these plots have been binned to more simply display the spread of errors for a given |$\cos \mu$| interval. For values of |$\cos \mu$| close to 1, there is a narrower distribution of error, with test samples being predicted reliably more accurately than test samples corresponding to lower values of |$\cos \mu$|.
This variation in test error distribution as a function of |$\cos \mu$| may be attributable to the approximations used during target pre-processing when training the model (see Section 3.2); if indeed this is the case, then there is scope to mitigate against this by refining data pre-processing methods when producing future iterations of this surrogate model. This error variation with |$\cos \mu$| may otherwise be due to the model not exactly capturing the complexities in how |$\cos \mu$| is used in mapping inputs to outputs, which may be an acceptable and necessary trade-off in using a surrogate model for the purposes of model speed-up.
4.1.2 Long-wave surrogate model
Variation of error versus altitude: Fig. 4 displays the absolute error of predictions at each altitude level, averaged across test samples. A similar trend to that described in Section 4.1.1 can be seen in these plots: the average magnitude of error roughly follows the same trend as the average magnitude of the target variables, except between |$40 \ \mathrm{ and} \ 70 \ \text{km}$| altitude (roughly the interval of the main cloud deck), where the error rises significantly at the top of the cloud deck and decreases going deeper into the cloud deck, for both target variables.
Figure 4. Plots A and B display the |$\widehat{\text{MAE}}_{j,k}$| (defined in equation 19) and plots D and E display the |$\text{MAE}_{j,k}$| (defined in equation 18) of surrogate model predictions across the two long-wave target variables, for test samples that have column-aggregated absolute error (|$\widehat{\text{CAE}}_{i,k}$|, |$\text{CAE}_{i,k}$|) in the |$[5, 95]$| percentile interval (that is, test samples with errors falling in the smallest 5 per cent and largest 5 per cent of the test set have been discarded from this aggregate statistic, in order to better illustrate typical model performance). Plot C displays the scaled target long-wave flux profiles averaged across the test samples with |$\widehat{\text{CAE}}_{i,k}$| within the chosen percentile interval. Plot F displays the target long-wave flux profiles averaged across the test samples with |${\text{CAE}}_{i,k}$| within the chosen percentile interval (not necessarily the same subset displayed in plot C). The shaded regions in all plots represent the interval |$[\max {(0,\theta _j-\sigma _j)},\, \, \theta _j+\sigma _j]$|, where |$\theta _j$| and |$\sigma _j$| denote the mean and standard deviation of the plotted quantities for the jth altitude level.
Below the cloud deck, it can be seen from Figs 4(A) and (B) that the ME in target predictions is less than 5 per cent of mean target magnitude, for the given percentile interval of test samples. Above the cloud deck, the long-wave fluxes tend to zero; so, though the percentage errors increase for increasing altitude, these accuracies still fall within the reasonable margin of error for the model.
Variation of error versus |$\cos \mu$| / surface temperature: Variations in test error versus surface temperature (displayed in Fig. 5) appear to follow the same trend as the residuals in the scaling factor used in pre-processing (displayed in Fig. 1). This is promising, as it may indicate that scaling residuals are the limiting factor in model accuracy, and these scaling residuals were initially deemed acceptably small for the purpose of this work. Considering variation in test error versus the cosine of the solar zenith angle (|$\cos \mu$|) (also displayed in Fig. 5), there is no discernible trend in the error variation across |$\cos \mu$|, though it appears that the error distribution is narrowest for |$\cos \mu$| approaching 1. This may reflect the variability in target flux profiles in both the train and test sets for different |$\cos \mu$| bins, as plotted in Appendix G, whereby targets for higher values of |$\cos \mu$| (i.e. approaching the substellar point) are more constrained, and thus are intuitively easier to predict.

Figure 5. Variation in |$\widehat{\text{CAE}}_{i,k}$| (as defined in equation 17) across the long-wave and short-wave schemas, as functions of the cosine of the solar zenith angle (|$\cos \mu$|) and of surface temperature |$T_0$| (for the long-wave schema only). The extent of the plots on the y-axis covers the full range of the error quantity in each plot, with white spaces corresponding to values falling in the lowest value bin. Top row: error distribution of test samples per |$\cos \mu$| bin, across both short-wave target variables. Test samples have been binned into 20 bins of |$\cos \mu$| and 20 bins of |$\widehat{\text{CAE}}_{i,k}$|. Bins have been normalized to percentages by the total number of test columns in each |$\cos \mu$| bin. Middle and bottom rows: error distribution of test samples across both long-wave target variables, as a function of |$\cos \mu$| (middle row) and |$T_0$| (bottom row). |$\cos \mu$|, |$T_0$|, and |$\widehat{\text{CAE}}_{i,k}$| have been divided linearly into 20 bins; the plots display the proportion of test samples falling into each error bin relative to the total samples in a given |$\cos \mu$| or |$T_0$| bin.
4.2 Model performance in simulation
To evaluate the performance of our new surrogate model on 3D simulations of Venus’s atmosphere, we have run two oasis simulations: one using OASIS-RT for the RT scheme and the other using the new surrogate model presented in this work. In order to efficiently run the 3D simulation with OASIS-RT, as detailed in Section 2.1.1, we updated the radiative fluxes for solar radiation every 2880 steps, and the thermal radiation fluxes were updated every 320 steps. Both simulations run for 5 Venus solar days, each of which is approximately 117 Earth days. Fig. 6 displays the temperature of the simulation using OASIS-RT averaged over the last simulated Venus day in the left plot, and the percentage difference in this quantity between the two simulations in the right plot. Beneath the bottom of the cloud deck, the percentage difference between the time-averaged temperature profiles produced by the two simulations is below 1.5 per cent; in the interval of the cloud deck, below 2.7 per cent; and above the cloud deck, below 4.0 per cent. These deviations in the final temperature profiles fall within the range of uncertainties in the measurements. This is reasonable due to two main factors: first, there will be inherent discrepancies between simulated and real temperatures arising from assumptions made in the physical model, and secondly, deviations will also naturally arise between the physical model and real temperature profiles due to the spatial resolution of the simulation. Also, we expected differences to arise due to the frequency at which the radiative fluxes are updated. In the case of the surrogate model, it is possible to update them at every physical time-step. Fig. B1 displays the contrast of temperature profiles of simulations using OASIS-RT and the surrogate models, whereby the frequency of updating radiative fluxes in the latter matches that of the former (every 2880 steps for the short-wave regime and every 320 steps for the long-wave regime). There is negligible discrepancy between the final temperature profile of Venus as simulated using the surrogate-integrated schema for both frequencies of executing the RT update. A large difference between the plots was not expected because, in both cases, the RT updates are performed at intervals shorter than the radiative time-scale. This means that similar results would occur if the simulation were executed using full RT updates at every time-step, as compared to the simulation executed using RT updates at every 2880/320 time-steps for SW/LW, respectively. Despite this small difference in the resulting temperature profile corresponding to the different update frequencies, our future simulations will avoid using large step updates since these compromise the accuracy of the atmospheric physics, computed by the dynamical core, and would also compromise the future implementation of radiatively active clouds in the 3D simulations.

Figure 6. Left: simulated atmospheric temperature profile (in units of K) of Venus averaged over the final Venus solar day of the simulation. The simulation was run using oasis and OASIS-RT for 5 Venus solar days (each Venus solar day is equivalent to approximately 117 Earth days). Right: percentage difference in the simulated atmospheric temperature profiles of Venus averaged across the final Venus day of the simulation, for two simulations using (1) OASIS-RT and (2) the surrogate models presented in this work to model the RT. Both: in both plots, the main cloud deck extends from approximately |$10^2$| to |$2\times 10^3 \, \text{mbar}$|.
4.3 Simulation runtime
To benchmark the speed-up achieved using the surrogate RT schema, we ran four simulations of the Venus atmosphere. All four simulations span 1000 time-steps, with each time-step corresponding to an increment of 15 s. Simulations were run on one NVIDIA Tesla V100 GPU. The simulations were as follows:

(i) Simulation with no RT update at any time-step: 95 s runtime.

(ii) Simulation with RT executed in the first time-step, and then updated every 2880 time-steps (equivalent to a time-step of 12 h) for the short-wave regime, and every 320 time-steps (equivalent to a time-step of 1 h 20 min) for the long-wave regime: 195 s.

(iii) Simulation with RT updated every time-step: 27 046 s.

(iv) Simulation using the surrogate models to update the RT every time-step: 184 s.
Simulation (ii) uses the typical RT update frequency employed for using oasis to simulate Venus: this satisfies a trade-off between sufficient temporal resolution of the RT, which must be modelled with a resolution smaller than the radiative time constant |$\tau _\mathrm{ R}$| of the atmosphere (|$\tau _\mathrm{ R} \approx 43 \text{ h}$| at the altitude in the atmosphere where it takes its smallest value),3 and overall simulation runtime. For clarity, in simulation (ii), the short-wave RT computation is executed once and the long-wave RT computation is executed three times. As mentioned in Section 4.2, less frequent RT updates limit the accuracy of modelling the atmospheric physics, and so a higher frequency of RT updates is preferred. Simulation (iii), which updates the RT at every 15 s time-step, illustrates how increasing the frequency of RT updates enormously increases the simulation runtime; here, by a factor of 92 compared to simulation (ii) with less frequent RT updates, and by a factor of 285 compared to simulation (i) with no RT update at all.
Comparing simulation (iv) to simulation (ii) with infrequent RT updates, we see a speed-up of |$\sim\!\! 6{{\ \rm per\, cent}}$| while achieving a higher temporal resolution of RT. Comparing simulation (iv) to simulation (iii), we see a factor of |$147\times$| speed-up of the entire simulation runtime.
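For reference, these two factors follow directly from the runtimes listed above:
$$\frac{27\,046\ \text{s}}{184\ \text{s}} \approx 147, \qquad \frac{195\ \text{s} - 184\ \text{s}}{195\ \text{s}} \approx 6\ \text{per cent}.$$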
In addition to the reduction in simulation runtime, simulation (iv) is much more memory-efficient than simulations (i) and (ii), as the surrogate RT models do not require storing opacity cross-sections from the gas absorption or clouds. Furthermore, with the potential to update the radiative fluxes on every physical time-step now computationally feasible, our new approach allows for the inclusion of dynamical cloud feedback at a small computational cost. This addresses one of the main limitations of current 3D Venus atmospheric models.
4.4 Limitations
The surrogate models produced in this work do not take explicit input of the structure of cloud and gas absorber constituents of the atmosphere being modelled, except for the input of gas density, |$\rho _{i,j}$|. This means that the learning objective for surrogate models in this work is to approximate the mapping from input thermodynamic column variables to output columnar flux profiles, conditioned on a specific cloud and absorber structure. Consequently, this means that the models produced in this work are only applicable for planets corresponding to the planetary parameters and cloud and absorber structure specific to Venus. This is a limitation in terms of generalizability of the surrogate-integrated GCM to other types of atmosphere.
5 CONCLUSIONS
This work introduces a surrogate model approach to replacing numerical simulations of short-wave and long-wave computations in a two-stream RT model, aimed at accelerating the GCM oasis. The results show a significant GCM speed-up by a factor of 147 when run on a GPU, with surrogate models for both long-wave and short-wave regimes achieving test set accuracies of approximately 99 per cent. Additionally, this approach replicates the temperature profile of the original Venus simulations, averaged across a Venus solar day, to within 4 per cent after 5 Venus solar days of simulation.
This work is significant in that it enables
|$\sim\!\! 150\times$| faster simulations of planets with massive atmospheres that require complex RT schemes, such as the Venus atmosphere.
Longer simulations with a much higher spatial resolution (|$\sim\!\! 10\times$|).
Improved representation of the temperature evolution of short-term physical phenomena in the atmosphere. These can be atmospheric waves with time-scales shorter than the period at which the radiative fluxes are updated in the simulation. In the case of our Venus simulations, we can measure the temperature change of atmospheric waves with time-scales |$<\!\! 12$| h.
A model free of manual tuning to optimize performance, such as tuning the frequency at which the radiative fluxes are updated.
The inclusion in 3D simulations of cloud dynamical feedback or higher order, more complex radiative schemes with a small extra computational cost.
This achievement of faster and/or higher spatial resolution atmospheric simulations will facilitate better insight into the nature of the atmosphere of Venus, as well as benchmarking the utility and applicability of such modelling techniques for use in exoplanet science.
ACKNOWLEDGEMENTS
We thank Dr Ahmed Al-Refaie, Nikita Pond, and Max Hart for helpful conversations on integrating tensorflow models within a c/c++ framework.
The authors acknowledge the use of the High Performance Computing facilities of University College London to carry out this work, specifically the Hypatia cluster. This work used computing equipment funded by the Research Capital Investment Fund (RCIF) provided by UKRI, and partially funded by the UCL Cosmoparticle Initiative. This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 758892/ExoAI), and from the Science and Technology Facilities Council (STFC; grant no. ST/W00254X/1 and grant no. ST/W50788X/1). JMM acknowledges support from the Horizon Europe Guarantee Fund, grant EP/Z00330X/1.
DATA AVAILABILITY
The code for this project will be made publicly available at https://github.com/ttahseen/oasis-rt-surrogate on acceptance of this paper.
Footnotes
1. The model component that governs the resolved 3D fluid flow evolution.

2. See Appendix A for more details on neural networks and RNNs.

3. Radiative time constants for the Venus atmosphere can be seen in table 2 of Pollack & Young (1975).

4. Below, |$\bigcirc _{k=0}^{i} (\alpha_k \circ \mathcal{M}_k)$| represents the composition of the functions |$(\alpha_k \circ \mathcal{M}_k)$| from |$k = 0$| to |$k = i$|.
REFERENCES
APPENDIX A: MODEL ARCHITECTURE
A1 Feedforward neural networks
A neural network is a type of function f that, given an input feature vector |$\mathcal {X}$|, can be implemented as a combination of matrix transformations, |$\lbrace \mathcal {M}_k \in \mathbb {R}^{m_k \times n_k}\rbrace$|, and non-linear transformations, |$\lbrace \alpha _k\rbrace$|, where each |$\alpha _k$| is a dimension-preserving mapping, i.e. |$\alpha _k: \mathbb {R}^{m \times n} \rightarrow \mathbb {R}^{m \times n}$|. A simple feedforward neural network implements the transformation4
$$f(\mathcal{X}) = \left(\bigcirc_{k=0}^{i} (\alpha_k \circ \mathcal{M}_k)\right)(\mathcal{X}).$$
The matrices |$\lbrace \mathcal {M}_k\rbrace$| are fitted as an optimization task of a chosen objective function |$\mathcal {L}$| given a set of examples |$\mathcal {S} = \lbrace \hat{\mathcal {Y}}, \hat{\mathcal {X}}\rbrace$| to constrain the network to approximate some given mapping |$\tilde{f}$| captured by |$\mathcal {S}$|. Neural networks as described above are universal function approximators, and so constitute an appropriate function space within which to seek an approximate function f for the target function |$\tilde{f}$|. The matrices |$\lbrace \mathcal {M}_k\rbrace$| are often referred to as network weights or network parameters; the dimension of each |$\mathcal {M}_k$| is free in one dimension [except for |$\mathcal {M}_i$| for which the dimension is fully constrained by the dimension of |$\mathcal {Y}$| and the dimension of |$\left(\bigcirc _{k=0}^{i-1} (\alpha _k \circ \mathcal {M}_k) \right)(\mathcal {X})$|]. This free dimension |$m_k$| of each |$\mathcal {M}_k$| we can ascribe as the number of neurons of the layer|$\mathcal {M}_k$|, to be consistent with machine learning terminology. For further details about the mathematics and implementation of neural networks, see chapters 3 and 4 of Prince (2023).
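As a minimal illustration of this composition (omitting biases, as in the formulation above; the function names and example shapes are ours), in Python:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def feedforward(x, matrices, activations):
    """Apply the composition (alpha_i o M_i) o ... o (alpha_0 o M_0) to a feature vector x."""
    for M, alpha in zip(matrices, activations):
        x = alpha(M @ x)
    return x

# Example: a two-layer network mapping R^3 -> R^4 -> R^2.
rng = np.random.default_rng(0)
y = feedforward(rng.normal(size=3),
                [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))],
                [relu, relu])
```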
A2 Recurrent neural networks
RNNs are function spaces defined using similar concepts to simple feedforward neural networks, designed to incorporate the structural dependence of sequential input feature vectors. Given a set of input feature vectors |$\lbrace X_k\rbrace \, \forall \, k \in [0, N]$|, with some relation between consecutive feature vectors |$X_k$| and |$X_{k+1}$|, a simple RNN operates as follows:
|$r_k = \alpha \left(\mathcal {M}_x X_k + \mathcal {M}_r r_{k-1}\right),$|
|$Y_k = \mathcal {M}_y r_k,$|
where |$r_k$| is defined as the hidden state of the network for the kth input, |$\mathcal {M}_x$|, |$\mathcal {M}_r$|, and |$\mathcal {M}_y$| are weight matrices, and |$\alpha$| is a non-linear transformation. The above formulation of the simple recurrent network illustrates how information from preceding input feature vectors contributes to the function output corresponding to the kth input.
In practice, the RNN implementation is more complicated than the two equations above, but these equations capture the core operation of the RNN. For an in-depth treatment of the mathematics and implementation pragmatics of RNNs, see Géron (2019).
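A minimal NumPy sketch of this recurrence is given below; the dimensions, the tanh non-linearity, and the weight-matrix names are illustrative assumptions rather than the architecture used in this work.

```python
# Minimal NumPy sketch of a simple (Elman-style) recurrent step: the hidden
# state r_k summarizes the inputs X_0, ..., X_k and determines the output Y_k.
# Dimensions and the tanh non-linearity are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out, n_steps = 3, 8, 2, 5

M_x = rng.standard_normal((d_hidden, d_in)) * 0.1      # input-to-hidden weights
M_r = rng.standard_normal((d_hidden, d_hidden)) * 0.1  # hidden-to-hidden weights
M_y = rng.standard_normal((d_out, d_hidden)) * 0.1     # hidden-to-output weights

X = rng.standard_normal((n_steps, d_in))               # sequence {X_k}
r = np.zeros(d_hidden)                                 # initial hidden state

outputs = []
for X_k in X:
    r = np.tanh(M_x @ X_k + M_r @ r)   # hidden-state update
    outputs.append(M_y @ r)            # output Y_k for the kth input
Y = np.stack(outputs)                  # shape (n_steps, d_out)
print(Y.shape)
```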
APPENDIX B: RESULTS
Below, we compare the temperature profiles of simulations using OASIS-RT with those using the surrogate models, both when the frequency of updating the radiative fluxes in the latter matches that of the former (every 2880 steps for the short-wave regime and every 320 steps for the long-wave regime) and when the fluxes are updated at every time-step. The discrepancy between the final temperature profiles of Venus simulated with the original and surrogate-integrated schemas is negligible for both frequencies of executing the RT update.

Both: The figures display the percentage difference in the simulated atmospheric temperature profile of Venus, averaged across the final Venus day of the simulation, between two simulations modelling the RT with (1) OASIS-RT and (2) the surrogate models presented in this work. Left: in the surrogate simulation, the short-wave radiative fluxes are updated every 2880 steps and the long-wave radiative fluxes every 320 steps. Right: in the surrogate simulation, the short-wave and long-wave radiative fluxes are updated at every time-step.
APPENDIX C: MODEL PARAMETERS
The tables below display the number of model parameters per layer of the two surrogate models produced in this work.
C1 Short-wave surrogate model
This table displays the number of parameters across model layers for the short-wave surrogate model, as well as the output shapes of each layer. n denotes the number of atmospheric columns passed as input to the model; layers correspond to those displayed in Fig. 2.

Layer | Output shape | Number of parameters
--- | --- | ---
Main inputs | [(n, 49, 3)] | 0
Auxiliary inputs | [(n, 5)] | 0
|$\text{GRU}_{\downarrow }$| | [(n, 49, 128), (n, 128)] | 51 072
|$\text{Dense}_{1}$| | [(n, 128)] | 17 152
|$\text{GRU}_{\uparrow }$| | [(n, 50, 128)] | 99 072
|$\text{Dense}_{\text{out}}$| | [(n, 50, 2)] | 514
Total number of model parameters: | | 167 810
C2 Long-wave surrogate model
This table displays the number of parameters across model layers for the long-wave surrogate model, as well as the output shapes of each layer. n denotes the number of atmospheric columns passed as input to the model; layers correspond to those displayed in Fig. 2.

Layer | Output shape | Number of parameters
--- | --- | ---
Main inputs | [(n, 49, 3)] | 0
Auxiliary inputs | [(n, 4)] | 0
|$\text{GRU}_{\downarrow }$| | [(n, 49, 32), (n, 32)] | 3552
|$\text{Dense}_{1}$| | [(n, 32)] | 1184
|$\text{GRU}_{\uparrow }$| | [(n, 50, 32)] | 6336
|$\text{Dense}_{\text{out}}$| | [(n, 50, 2)] | 130
Total number of model parameters: | | 11 202
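To illustrate how the layer shapes and parameter counts in the tables above can arise, the following Keras sketch builds a column-wise model whose per-layer parameter counts match those of the short-wave surrogate when UNITS = 128 and N_AUX = 5, and those of the long-wave surrogate when UNITS = 32 and N_AUX = 4. The wiring assumed here (concatenating the auxiliary inputs with the final |$\text{GRU}_{\downarrow }$| state, repeating the |$\text{Dense}_{1}$| output over the 50 flux levels as input to |$\text{GRU}_{\uparrow }$|, and a skip connection into |$\text{Dense}_{\text{out}}$|), together with the layer names and activation choice, is an assumption made only so that the parameter counts are reproduced; the actual architecture is that displayed in Fig. 2.

```python
# Hypothetical Keras sketch reproducing the layer shapes and parameter counts
# of the surrogate models (Tables C1-C2). The wiring is an assumption, not the
# published architecture (see Fig. 2 of the paper).
import tensorflow as tf
from tensorflow.keras import layers

N_LEVELS = 49       # atmospheric layers per input column
N_FLUX_LEVELS = 50  # flux levels (layer interfaces) predicted per column
UNITS = 128         # 128 (short-wave) or 32 (long-wave)
N_AUX = 5           # 5 auxiliary scalars (short-wave) or 4 (long-wave)

main_in = layers.Input(shape=(N_LEVELS, 3), name="main_inputs")
aux_in = layers.Input(shape=(N_AUX,), name="auxiliary_inputs")

# Downward sweep over the column; returning both the per-level sequence and the
# final state mirrors the [(n, 49, 128), (n, 128)] output shape in the table
# (only the final state is used downstream in this sketch).
seq_down, state_down = layers.GRU(
    UNITS, return_sequences=True, return_state=True, name="gru_down")(main_in)

# Combine the column summary with the auxiliary scalars (17 152 parameters
# in the short-wave configuration).
dense_1 = layers.Dense(UNITS, activation="relu", name="dense_1")(
    layers.Concatenate()([state_down, aux_in]))

# Upward sweep producing one hidden vector per flux level (99 072 parameters).
seq_up = layers.GRU(UNITS, return_sequences=True, name="gru_up")(
    layers.RepeatVector(N_FLUX_LEVELS)(dense_1))

# Assumed skip connection so Dense_out sees 2 * UNITS features per level,
# matching its 514-parameter (short-wave) / 130-parameter (long-wave) count.
merged = layers.Concatenate()(
    [seq_up, layers.RepeatVector(N_FLUX_LEVELS)(dense_1)])
out = layers.TimeDistributed(layers.Dense(2), name="dense_out")(merged)

model = tf.keras.Model(inputs=[main_in, aux_in], outputs=out)
model.summary()  # total parameters: 167 810 (short-wave configuration)
```

For reference, a Keras GRU layer (with the default reset_after=True) has 3 × units × (input dimension + units + 2) parameters; for instance, |$\text{GRU}_{\downarrow }$| with 128 units acting on 3 input features gives 3 × 128 × (3 + 128 + 2) = 51 072 parameters, matching the table above.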
APPENDIX D: FURTHER AGGREGATE STATISTICS OF MODEL TEST SET PERFORMANCE
The figures included in this section display the |$\widehat{\text{MAE}}_{j,k}$| (defined in equation 19) of the entire test set for the long-wave (Fig. D1) and short-wave (Fig. D2) schemas. These figures are included for completeness, and are analogous to Figs 3(A)–(C) and 4(A)–(C), respectively, but for the entire test sets instead of the [5, 95] percentile interval subset of the test sets.

The above plots display the |$\widehat{\text{MAE}}_{j,k}$| (defined in equation 19) of the long-wave surrogate model predictions. Plot C displays the target scaled flux profiles averaged across the test set. Plots A and B display the MAE between the scaled predictions and targets across the test set.

The above plots display the |$\widehat{\text{MAE}}_{j,k}$| (defined in equation 19) of the short-wave surrogate model predictions. Plot C displays the target scaled flux profiles averaged across the test set. Plots A and B display the MAE between the scaled predictions and targets across the test set.
APPENDIX E: DISTRIBUTION OF SAMPLES
The plots in this section display the distribution of the entire data set across surface temperature and cosine of the solar zenith angle.

Top row: The figure displays the distributions of surface temperature (left tile) and cosine of the solar zenith angle, |$\cos \mu$| (right tile), across the data set used for this work. Bottom row: The figure displays the distributions of surface temperature across samples falling within the dayside (left tile) and nightside (right tile).

The figure displays the co-variation of surface temperature and |$\cos \mu$| across dayside samples within the data set.
APPENDIX F: DISTRIBUTION OF ERRORS WITH SCALAR PARAMETERS
The plots in this section are analogous to those displayed in Fig. 5 and display the variation in |$\widehat{\text{CAE}}_{i,k}$| (as defined in equation 17) of short-wave (Fig. F1) and long-wave model predictions (Fig. F2) versus scalar model variables. These figures are included for completeness.

The figure displays variation in |$\widehat{\text{CAE}}_{i,k}$| (as defined in equation 17) of short-wave model predictions as a function of surface temperature (top row), surface pressure (middle row), and surface gas density (bottom row). The y-axis of each plot covers the full range of the error quantity, with white space corresponding to values falling in the lowest-value bin. Bins have been normalized to percentages by the total number of test columns in each x-axis bin.

The figure displays variation in |$\widehat{\text{CAE}}_{i,k}$| (as defined in equation 17) of long-wave model predictions as a function of surface pressure (top row) and surface gas density (bottom row). The y-axis of each plot covers the full range of the error quantity, with white space corresponding to values falling in the lowest-value bin. Bins have been normalized to percentages by the total number of test columns in each x-axis bin.
APPENDIX G: VARIATION OF ERROR VERSUS cos |$\mu$| IN THE SHORT-WAVE REGIME
To investigate the amount of variability in the targets versus |$\cos \mu$|, we define the column-aggregated quantity |$\sigma _{i, k}$| (equation G1) and plot it for the train and test sets in Fig. G1. The plots in Fig. G1 illustrate that variation in the flux–altitude profiles is lower both at higher values of |$\cos \mu$| and in the bin with the lowest values of |$\cos \mu$| (i.e. at the terminator).

The above figure shows the distribution of the column-aggregated error |$\sigma _{i, k}$| (computed according to equation G1) per |$\cos \mu$| bin per short-wave target, for both the train set (top row) and test set (bottom row). These plots illustrate that variation in the flux–altitude profiles is lower both at higher values of |$\cos \mu$| and in the bin with the lowest values of |$\cos \mu$| (i.e. at the terminator).