Cyrille Ahmed Midingoyi, Christophe Pradal, Ioannis N Athanasiadis, Marcello Donatelli, Andreas Enders, Davide Fumagalli, Frédérick Garcia, Dean Holzworth, Gerrit Hoogenboom, Cheryl Porter, Hélène Raynal, Peter Thorburn, Pierre Martre, Reuse of process-based models: automatic transformation into many programming languages and simulation platforms, in silico Plants, Volume 2, Issue 1, 2020, diaa007, https://doi.org/10.1093/insilicoplants/diaa007
Abstract
The diversity of plant and crop process-based modelling platforms in terms of implementation language, software design and architectural constraints limits the reusability of the model components outside the platform in which they were originally developed, making model reuse a persistent issue. To facilitate the intercomparison and improvement of process-based models and the exchange of model components, several groups in the field joined to create the Agricultural Model Exchange Initiative (AMEI). Agricultural Model Exchange Initiative proposes a centralized framework for exchanging and reusing model components. It provides a modular and declarative approach to describe the specification of unit models and their composition. A model algorithm is associated with each model specification, which implements its mathematical behaviour. This paper focuses on the expression of the model algorithm independently of the platform specificities, and how the model algorithm can be seamlessly integrated into different platforms. We define CyML, a Cython-derived language with minimum specifications to implement model component algorithms. We also propose CyMLT, an extensible source-to-source transformation system that transforms CyML source code into different target languages such as Fortran, C#, C++, Java and Python, and into different programming paradigms. CyMLT is also able to generate model components to target modelling platforms such as DSSAT, BioMA, Record, SIMPLACE and OpenAlea. We demonstrate our reuse approach with a simple unit model and the capacity to extend CyMLT with other languages and platforms. The approach we present here will help to improve the reproducibility, exchange and reuse of process-based models.
1. INTRODUCTION
Process-based crop models (PBMs) are increasingly developed for a wide range of applications and research purposes. Even though there are key biophysical processes in PBM such as phenology, soil water balance or biomass production, their modelling differs from one model to another according to the biological details, influenced by the availability of input data and final use of the model. The choice of modelling approaches to represent processes and combine them is also one of the main reasons which led to the development of multiple PBM to simulate the same crops (Jones et al. 2017). They have often been written repeatedly in several different languages with different software architectures. For example, the WOFOST model is implemented in Fortran in the WOFOST Control Centre (WCC) package, in Python in the Python Crop Simulation Environment framework, in Java in the Wageningen Integrated Systems Simulator framework (WISS), in C# in the Biophysical Models Application (BioMA) framework and in C++ in the Crop Growth Monitoring System (CGMS) (de Wit et al. 2019; van Kraalingen et al. 2020).
The diversity of PBM has motivated the development of different initiatives that intend to compare their performance and improve them by integrating new scientific knowledge to target the next generation of crop models (Rosenzweig et al. 2013; Bindi et al. 2015). Process-based crop model intercomparison studies (Palosuo et al. 2011; Rötter et al. 2011; Asseng et al. 2013; Aslam et al. 2017) have pointed out the variability in model outputs but often without quantifying the sources of uncertainty or analysing the processes involved. These studies showed the potential and limits of PBM and highlighted the need to evaluate them at the process level, but also to exchange model parts (components) between models (Donatelli et al. 2014; Muller and Martre 2019). Process-based crop models are increasingly implemented as autonomous components describing each biophysical process. However, there is currently little exchange and reuse of PBM components between modelling groups despite theoretical and application interests (Holzworth et al. 2014a). The main limitation comes from compatibility issues between PBM platforms (frameworks) resulting from differences in the programming languages that are used and their specificities.
The modelling frameworks used in agricultural modelling depend on the programming language in which they have been implemented, the software design and code conventions they use. For example, the crop modelling frameworks APSIM Next Generation (Holzworth et al. 2018) and BioMA (Donatelli et al. 2010) are based on component-oriented techniques and require models to be developed in C#. DSSAT (Jones et al. 2003; Hoogenboom et al. 2019) and STICS (Brisson et al. 1998) provide generic crop modules in Fortran with a procedural approach that can be specialized for different species. Simplace (Enders et al. 2010) uses the Java language, while Record (Bergez et al. 2016) uses C++; both require that their components share a built-in interface. Therefore, model components can be reused in a given platform but their reuse in other platforms remains difficult. Existing solutions that couple models written in different languages are rather technical (generation of wrappers) or low level (reading and writing in files). We propose here an abstraction, a sharing language and a transformation system, based on the scientific content of the model, i.e. its algorithms. Multilanguage and integrated modelling frameworks like OpenAlea (Pradal et al. 2008, 2015) and yggdrasil (Lang 2019) offer a language binding approach to provide third-party developers with a choice of languages (Villa 2001; Lang 2019). Therefore, they overcome the difficulty of implementing algorithms efficiently in high-level languages. However, they do not provide a solution to the reuse or exchange of models between frameworks. In these platforms, models are reused as black boxes and the integrated models, therefore, lack the required transparency. Moreover, this approach requires knowledge of the frameworks they integrate and the deployment of the core of each framework. 
Domain-specific programming languages that are agnostic to a specific programming language have also been proposed as a solution to the problem (Athanasiadis and Villa 2013; Villa et al. 2017) aiming to support interoperability with rich semantics.
To facilitate PBM component exchange, several groups in the field have joined forces to create the Agricultural Model Exchange Initiative (AMEI; Martre et al. 2018). AMEI brings together some of the most widely used crop modelling and simulation platforms, including APSIM, BioMA, DSSAT, OpenAlea, RECORD, Simplace and other crop models such as STICS and SiriusQuality (Martre et al. 2006). The vision of AMEI is to (i) increase capabilities and responsiveness to model developers’ needs; (ii) use modular modelling to share knowledge and rapidly develop operational tools; (iii) reuse model parts to leverage the expertise of third parties; (iv) renovate legacy code; and (v) realize the benefit of sharing and complementing different expertise.
Based on a declarative modelling approach (Athanasiadis et al. 2011), AMEI proposes a centralized framework (Crop2ML; Midingoyi et al. 2020) to exchange and reuse model components. Crop2ML provides a meta-language based on shared concepts between crop simulation platforms to describe specifications of model components and compositions. A model algorithm describes the behaviour of the component in terms of the sequence of inputs, successive rules or actions, conditions or a flow of instructions from inputs to outputs including mathematical expressions. A model algorithm is associated with each model specification. After a modeller has represented the specifications of its model, two relevant questions remain to be answered: (i) How can a model algorithm be described independently of the platform specificities? (ii) How can it be seamlessly integrated into existing simulation platforms?
Similar approaches have been used in the Systems Biology community where several domain-specific modelling standard languages including SBML, CellML and NeuroML have been designed to exchange and store models (Cuellar et al. 2006; Gleeson et al. 2010; Hucka et al. 2015). These XML-based languages provide specific elements to describe model structure and equations using Mathematical Markup Language (MathML; Ausbrooks et al. 2003), which describes mathematical notation and captures both its structure and content. However, these languages are limited to specific formalisms (e.g. chemical reactions, differential equations) and cannot be easily extended to represent crop models in their full complexity and diversity. Systems Biology languages support model transformation from one standard to another (e.g. from CellML to SBML; Schilstra et al. 2006) and from XML to executable code. In contrast, Crop2ML provides models as components that can be integrated into simulation platforms. Therefore, our design choice was to introduce a general programming language to represent complex control flow such as loops or conditional statements.
In this paper, we present CyML, a Cython-derived language (Behnel et al. 2011) with minimum meta-specifications to implement algorithms of Crop2ML models. This language allows encoding the model algorithm independently of any crop modelling platform and implementation language. We also propose CyMLT, a source-to-source transformation system. This one-to-many transpiler transforms CyML source code into different target languages such as Fortran, C#, C++, Java and Python. CyMLT is also able to directly generate components to target modelling platforms such as DSSAT, BioMA, Record, SIMPLACE and OpenAlea. Differences between platforms are not only due to the languages used to implement models but also to the software architectural design choices and modelling conventions. For instance, model components in the Plant Modelling Framework (PMF) (APSIM Next Generation) and in BioMA are written in C# in both platforms, but the reuse of PMF components in BioMA (and vice versa) can only be done at the level of binaries and, therefore, as black boxes. CyMLT takes into account platform requirements to generate model components that are compliant with existing platforms. Source-to-source transformation is a well-established solution used to address software reuse issues (Plaisted 2013; Fernique and Pradal 2017). It transforms source code from one high-level language to another. However, to the best of our knowledge, no solution exists that targets PBM component reuse using automated source-to-source transformation. In this paper we address this issue by focusing on code reuse and reproducibility to enhance collaboration between crop modellers and to facilitate model coding for non-programmers, while keeping the transparency of model constructs.
Different source-to-source transformation systems are available for different purposes, both commercial (e.g. Baxter et al. 2004) and open source (Quan and Hui 2011). Some lessons can be learned from these approaches. Many source-to-source transformation systems take as input a subset of one language and transform it to a single target language with specific transformation purposes without showing their extensibility (Akeret et al. 2015; Bysiek et al. 2017; Misse-chanabier et al. 2019). Few one-to-many (Plaisted 2013; Schaub and Malloy 2016) and many-to-many (Baxter et al. 2004) solutions have been proposed. They usually define a subset of language features and are based on a common intermediate representation of the languages provided from their similarities. However, they do not consider transformation between different programming paradigms. For instance, to our knowledge, there is no system that transpiles from a procedural algorithm to both a procedural and an object-oriented programme. To avoid losing assumptions or domain knowledge such as code documentation or variable units, a PBM source-to-source transformation should also integrate domain-specific knowledge to generate code that is easy to read, following developer guidelines specific to each language.
First, we present the design and implementation of the CyML language and the one-to-many transformation workflow. Then we demonstrate the use of CyML and CyMLT for a simple model component, which simulates wheat shoot number, and the extensibility of CyMLT to new languages or simulation platforms. Finally, we discuss our results and present some perspectives. This paper is not intended to provide a full description of the language and its transformation but uses them to demonstrate that a model algorithm can be implemented once and be used to generate reusable and reproducible model components in different target languages and platforms.
2. METHODS
2.1 Brief overview of Crop2ML
Crop2ML has been developed to offer the crop modelling community a common framework for crop model component development, exchange and reuse. It provides a model component specification language based on an XML meta-language. It consists of unified concepts and elements that allow a biophysical process to be described regardless of the simulation platform. A Crop2ML model is an abstract model that may be either a unit model with fine granularity or a composite model represented as a graph of unit models connected by their inputs and outputs to manage model complexity. Crop2ML separates model specification from model algorithm. A model specification contains formal descriptions of the model, the inputs, outputs, state variable initializations, auxiliary functions and a set of parameters and unit tests. Thus, it allows for checking that a model reproduces the expected output values with a given precision. It supports multiple tests associated with one or more sets of parameter values. However, baseline parameter sweeps are not supported due to limited support in various languages and unit test frameworks. The specification also contains the algorithm written in CyML and any auxiliary functions called from the model algorithms or from other functions. Auxiliary functions reduce code length and, therefore, improve the readability of the model algorithm by promoting reuse and increasing abstraction. They include mathematical functions such as interpolation, and lower and upper bound functions.
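To make the role of unit tests more concrete, the following is a minimal Python sketch of how expected outputs might be checked against a model with a given precision. It is not the Crop2ML test format: the function `run_unit_test`, the toy model `thermal_time_model` and the interpretation of precision as a number of decimal places are all assumptions made for illustration.

```python
def thermal_time_model(temperature, base_temperature):
    # Hypothetical unit model: daily thermal time above a base temperature.
    return {"thermal_time": max(0.0, temperature - base_temperature)}


def run_unit_test(model, inputs, expected, precision):
    """Return a dict of failing outputs; an empty dict means the test passed.

    `precision` is interpreted here (an assumption) as the number of
    decimal places to which outputs must match the expected values.
    """
    outputs = model(**inputs)
    tolerance = 10 ** (-precision)
    failures = {}
    for name, target in expected.items():
        if abs(outputs[name] - target) > tolerance:
            failures[name] = (outputs[name], target)
    return failures


# One parameter set with its expected output.
passed = run_unit_test(
    thermal_time_model,
    {"temperature": 12.5, "base_temperature": 0.0},
    {"thermal_time": 12.5},
    precision=3,
)
```

Because the same inputs and expected values can be attached to the code generated for each target language, a check of this kind can in principle be repeated after every transformation.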
All model units and composite models are then transformed into different languages or simulation platforms to be incorporated into modelling platforms.
The source code (https://github.com/AgriculturalModelExchangeInitiative/Crop2ML) and full documentation (https://crop2ml.readthedocs.io/en/latest/) of Crop2ML are available on GitHub.
2.2 Requirements and CyML design choices
We designed the CyML language to meet the following requirements.
2.2.1 Keep compatibility with programming languages of crop simulation platforms.
A model can be reused if it can be separated from its original platform and expressed using equivalent and explicit constructs available in all supported programming languages and platforms. Therefore, a sublanguage needs to be identified that is minimal enough to express biophysical processes in all platforms but expressive enough to capture the complexity of most models. The resulting code must be removed from the technical subtleties of the platform but it will still depend on the platform language. In fact, most of these languages are direct descendants of the C language from which they inherit some constructs. Thus, they provide some similarities such as statements, the sequencing controlled by loop and conditional constructs, and functions that foster programme modularization (Akin 2003). This leads to the ability to define a common language based on their common features. This language must be chosen in such a way that all its constructs are mapped to the constructs of the target languages, thus producing a fully automated source-to-source transformation. It must also provide some mathematical standard functions that have their equivalents in the language of the modelling platforms.
2.2.2 Link model specification and model algorithm to keep domain knowledge.
As the model specification language is separated from the language of the algorithms in Crop2ML, it is necessary to provide and link domain knowledge information, including the context or decisions underlying the algorithm and its implementation in the language. It is also important to reduce the coding role of modellers in the implementation of model algorithms so that they can focus on the scientific knowledge (Brown et al. 2018). Our hypothesis is that model reuse can be achieved if its algorithm is closely associated with its specification. The model specification can thereby be used to generate a function signature or domain class from the description of inputs and outputs. The specification must also allow documentation to be passed through into the translated source code, and model algorithms to be validated with the unit tests it incorporates.
2.2.3 Cover the domain of interest.
The abstract language must be sufficient to implement a biophysical process. This means that it must include all relevant and minimal features such as data types, modularity and structures to encode any model algorithm. For example, in order to encode a model algorithm based on a set of mathematical expressions, a simple pseudo-code described as a sequence of assignment statements is suggested. Like the model specification, this language must be modular. Model algorithms must be self-contained and reusable within a composite model.
2.2.4 Have a gentle learning curve.
An important aspect of the language is its learning curve, which must be gentle and allow modellers to focus on the science of the model rather than on its implementation. Thus, CyML must enable an optimal model developer experience with a learning curve that does not intimidate new users. The algorithm language must be expressive and enable users to write efficient source code that is easily understandable with minimal syntax. It must also produce readable source code within the target simulation platforms. The translated programme must be a stand-alone programme that is independent of the transformation system.
2.2.5 Validate correctness using unit tests.
Given that CyML is built to serve as an intermediate representation of a set of languages, its validity is practically proved if all unit tests written in CyML succeed in all languages after transformation. This involves testing the generated code either in a multilanguage run-time environment or in the run-time environment of each language to ensure that the language features are well-defined and that their emulation in other languages is correct.
To satisfy the above requirements, we identify common patterns often used in crop modelling simulation platforms to implement model components. They result from the intersection of a set of minimal features of different languages used by the platforms (Fig. 1, left part). We used these features to propose a shared modelling language. An additional design choice is to use a subset of an existing language that can satisfy our requirements and provide the common selected features. Python was a good candidate language to fit our design considerations. It is an expressive and high-level programming language that allows writing short source code and has a gentler learning curve than C, C#, Java or C++ (Linge and Langtangen 2016). However, its dynamic typing can make transformation into programming languages with static typing ambiguous. Therefore, we proposed to add an explicit type declaration to the Python language, which led us to choose Cython (Behnel et al. 2011). Cython is a high-level programming language that combines the power of Python with C function calls and C type declarations on variables and class attributes. It is compiled directly into efficient C code, which improves run-time speed and allows it to interact with C, C++ and Fortran source code. However, not all Cython syntax can be directly translated into all target languages. For instance, the yield statement and anonymous functions are not supported by Fortran. Therefore, we defined CyML as a subset of Cython to address the implementation of the model algorithm (Fig. 1, right part). CyML does not cover some features such as class definition, nested functions, exception handling, anonymous functions, and reading and writing files. These features are handled by the platforms in their programming language.

Figure 1. From the intersection of a set of language features to the definition of an abstract language, CyML, defined as a subset of Cython. Lang_i corresponds to a minimal language supported by a crop simulation platform ‘i’. The number of circles (n) on the left corresponds to the number of platforms.
2.3 CyML language
CyML is designed as a subset of the Cython language based on a language specialization approach. This involves removing undesirable syntactic or/and semantic features of Cython that may not be easily transformed into many different languages or are not required to implement PBM algorithms. The conformance to the subset of Cython features is guaranteed through a semantic analysis. The main concepts supported by CyML are represented in Fig. 2.

Figure 2. Main concepts supported by the CyML language. Black diamonds indicate composition (‘contains’) relationships and white diamonds indicate a specialization (‘is-a’).
2.3.1 Declaration: basic types and collection.
Unlike Cython, CyML requires explicit type declarations. All variables have to be declared before they are used and the declared type is immutable. A variable can be initialized during or after its declaration. In the case of model algorithm implementation, a variable can be either a model input, a model output or a local variable required for the implementation. Explicit static typing is enforced by the semantic analysis step of CyMLT (Fig. 3). CyML supports basic types (e.g. integer, real, logical and string) and two sequence types (list and array) with dynamic or fixed length. Each element of a sequence must have the same type. Moreover, since time is an important variable in the definition of discrete-time processes, CyML provides datetime types in terms of year, month, day, hour, minute and second. CyML supports commonly used binary (numerical and boolean), unary and comparison operators, as well as casting operators for basic types and sequence operators such as length or sum.
2.3.2 Statements.
A statement can be an assignment, an expression or a control structure. An assignment assigns to a variable the value of a mathematical expression, another variable or a constant using an assignment operator (e.g. ‘=’). An assignment statement can, therefore, express the relationships between model inputs and outputs when those are described only by simple equations. An expression is commonly defined as a construct made up of variables, operators or function calls that can be evaluated to a value. In CyML, an expression is distinguished from an assignment by the fact that, in an assignment, the result of evaluating an expression is assigned to a variable. An expression can contain standard mathematical functions such as exponential, maximum, minimum and power functions. Expression statements have no assignment operator: they are calls to built-in functions that perform an operation (e.g. collection operations such as adding or removing an element in a sequence). CyML supports structured control flow statements that can be nested. Control flow statements include conditional branching (if, elseif and else) and loop statements (for-in-range, for-each, iterating over several collections, and while).
2.3.3 Function.
CyML uses the definition of a Python function to code the model algorithm and to represent external functions whose arguments have explicit data types. A function is composed of a set of statements in its body grouped under a def statement with a signature consisting of the name of the function, its input arguments and its return values. A function may call other functions that can be provided by an import mechanism to ensure modularity. CyML also supports recursion, which means that a function can call itself in its definition.
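As a rough illustration of this style, the sketch below uses plain Python type annotations in place of CyML's Cython-style declarations, so it is an approximation rather than valid CyML. The function name `square` is borrowed from the example of Fig. 4; `sum_of_squares` and `factorial` are hypothetical names introduced here.

```python
from typing import List


def square(x: int) -> int:
    # Local variable declared with an explicit type before use.
    result: int = x ** 2
    return result


def sum_of_squares(values: List[int]) -> int:
    # A function may call other functions; every element of the
    # sequence has the same type, as required for CyML sequences.
    total: int = 0
    for v in values:
        total = total + square(v)
    return total


def factorial(n: int) -> int:
    # Recursion: the function calls itself in its own definition.
    if n <= 1:
        return 1
    return n * factorial(n - 1)
```

In this sketch every input, output and local variable carries one fixed type, which is the property that makes a later translation to statically typed languages unambiguous.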
2.3.4 Module and package.
A module is a file containing a set of functions that can be reused in models and functions. A package contains a set of modules and models in a set of files. These concepts allow external dependencies to be managed.
2.4 CyMLT design
The CyMLT architecture is composed of two main parts: the front-end and the back-end (Fig. 3).

Figure 3. Design architecture of the one-to-many CyML transformer (CyMLT). It takes as input a model unit algorithm implemented in CyML with associated model specifications and applies a transformation workflow to produce crop model components or source code in different languages for different platforms.
The front-end consists of a Model Parser, a Cython Parser, and a Semantic Analysis component.
The Model Parser checks the model specification based on the Crop2ML grammar and generates a logical object allowing access and manipulation of the model.
The Cython Parser provides a lexical and syntactic analysis of the source code. It detects syntactic errors and generates an Abstract Syntax Tree (AST). The AST is a data structure representing the syntactic structure of the source code as a tree where the nodes represent the syntactic components (e.g. FunctionDefinition, Assignment, If-Block…) of the grammar. Figure 4 shows an example of AST generated from a square function. The design choice of CyML relies on the legacy Cython parser. This parser uses all the syntactic components of Cython instead of a restricted grammar. To restrict Cython grammar, the generated Cython AST is processed to ensure that it incorporates only syntactic components defined in CyML.
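The idea of parsing with a full grammar and then rejecting constructs outside the supported subset can be sketched with Python's standard `ast` module. This is only an analogy: CyMLT relies on the Cython parser, whose node classes differ from those of `ast`, and the `ALLOWED` set and the helper `unsupported_nodes` below are assumptions made for illustration.

```python
import ast

SOURCE = """
def square(x):
    return x ** 2
"""

# Hypothetical whitelist of node types accepted by the restricted language.
ALLOWED = {
    "Module", "FunctionDef", "arguments", "arg", "Return",
    "BinOp", "Pow", "Name", "Load", "Constant",
}


def node_types(source):
    # Parse the source with the full grammar and list every AST node type.
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def unsupported_nodes(source, allowed=ALLOWED):
    # Post-process the AST: report node types outside the allowed subset.
    return sorted({name for name in node_types(source) if name not in allowed})


rejected = unsupported_nodes("f = lambda x: x")
```

With this whitelist, the `square` function is accepted, whereas the anonymous function is reported because `Lambda` (among other node types) is not in the allowed subset.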

Figure 4. Example of AST and ASG. (A) Definition of function ‘square’ in CyML. (B) Simplified view of the AST of function ‘square’, where the internal nodes in black represent Cython constructs and the final nodes in blue represent a variable or constant. (C) Simplified view of the ASG of function ‘square’ with the new annotated nodes. The internal nodes in black are non-terminal symbols of the Cython grammar, whereas the end blue nodes are terminal symbols, essentially variables and constants. A child node (c) can be accessed from its parent node (p) through an attribute (𝑝 𝑐).
The AST Transformation transforms the generated AST to a self-contained representation of the source code called Abstract Semantic Graph (ASG), which is independent of the source language.
The Semantic Analysis operates during the AST transformation to perform semantic checks on the AST. It consists of various checks such as type consistency, declaration of variables before their use or consistency of elements in a list. This analysis checks that the input and output data types in model specifications are well-defined in relation to the model algorithm. The semantic analysis generates error messages if the verification fails. Note that, unlike the AST, each node of the ASG is labelled with at least its type and its pseudo-type (Fig. 4C). The pseudo-type is the expected type of a node and strengthens code generation by reducing the number of ASG traversals. For example, in Fig. 4C a node of type ‘Function’ follows the ‘Module’ node and has a pseudo-type [‘Function’, ‘int’, ‘int’]. This pseudo-type corresponds to the function signature, meaning that this function takes as input one argument of type ‘int’ and returns one value of type ‘int’. Note also that, unlike the AST, the type of internal nodes of the ASG may be different from the non-terminal symbols of the grammar. Another type of node is built that preserves the intention of the source code instead of the code structure. For example, the binary operator node ‘PowNode’ in Fig. 4B is replaced in Fig. 4C by a ‘standard call’ node, which takes as arguments the operands of the binary operation.
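The replacement of a power operator node by a call node can be imitated with Python's standard `ast.NodeTransformer`. The sketch below works on the Python AST rather than on the Cython AST or the CyMLT ASG, and the class name `PowToCall` and the helper `rewrite` are hypothetical; `ast.unparse` requires Python 3.9 or later.

```python
import ast


class PowToCall(ast.NodeTransformer):
    """Rewrite `a ** b` into the call `pow(a, b)`."""

    def visit_BinOp(self, node):
        # Transform the operands first so nested powers are handled too.
        self.generic_visit(node)
        if isinstance(node.op, ast.Pow):
            call = ast.Call(
                func=ast.Name(id="pow", ctx=ast.Load()),
                args=[node.left, node.right],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def rewrite(source):
    tree = PowToCall().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = rewrite("y = x ** 2")
```

Representing the operation as a call keeps the intention (raise to a power) while leaving each back-end free to choose how that call is spelled in its target language.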
The back-end of CyMLT is responsible for Code Generation (Fig. 3). It is independent of the front-end. It takes as input the ASG generated by the front-end and works in relation with the Doc and Interface Generation and Transformation Rules components.
The Code Generation component transforms the annotated ASG into different readable source code or platform components. It consists of two integrated subcomponents: a Language Generation and a Platform Generation. A Language Generation emits the source code in a specific language with a specific programming paradigm. This source code does not contain any simulation platform features. A Platform Generation emits a model component based on the requirements of a platform such as its implementation language, software design and code conventions.
A Transformation Rule is a function that takes as input a node of the ASG and generates a new node based on a specific structure of the target language. Transformation Rules are applied on the ASG for Code Generation. The code generation is generally described by straightforward transformations of the ASG. However, some nodes of the ASG require non-trivial transformations to produce new nodes. For example, the transformation of the declaration node in Fig. 4C consists of replacing the basic type int by the Java basic type integer without the cdef statement to reproduce Java integer variable declaration, whereas the generation of the power call function requires applying a casting function (int) to preserve type compatibility.
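A transformation rule of this kind can be sketched as a small Python function that maps a typed node to target-language text. The type table `JAVA_TYPES` and the helpers `java_declaration` and `java_pow` are assumptions made for illustration and do not reproduce the mappings used by CyMLT; the sketch only relies on the fact that Java's `Math.pow` returns a `double`, so an integer result needs an explicit cast.

```python
# Hypothetical mapping from CyML basic types to Java types.
JAVA_TYPES = {"int": "int", "float": "double", "bool": "boolean", "str": "String"}


def java_declaration(name, cyml_type):
    # The declaration keeps the variable name, swaps the type and
    # drops the Cython `cdef` keyword.
    return f"{JAVA_TYPES[cyml_type]} {name};"


def java_pow(base, exponent, result_type):
    # Math.pow returns a double in Java, so an integer pseudo-type
    # requires a cast to preserve type compatibility.
    call = f"Math.pow({base}, {exponent})"
    if result_type == "int":
        return f"(int) {call}"
    return call


declaration = java_declaration("y", "int")
power = java_pow("x", "2", "int")
```

Here the pseudo-type carried by the node decides whether the cast is emitted, so the rule does not need to re-inspect the rest of the graph.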
The Doc and Interface Generation component generates documentation in the target language from the model specification. It embeds all the semantics of model inputs and outputs, and then integrates the model knowledge in the code generated.
Finally, the Notebook Generator transforms generated source code or model components into Jupyter notebook (Kluyver et al. 2016) to interactively test and validate the transformation.
2.5 CyMLT implementation
CyMLT proposes a unique approach to transform an ASG into many programming languages. It is implemented around the main classes shown in Fig. 5. A set of classes (suffixed by Generator) generates the code for each language and platform: a subclass of PlatformGenerator or of LanguageGenerator has been implemented for each supported platform and language. A PlatformGenerator class inherits the attributes and properties of the LanguageGenerator class related to the language used by the platform. For example, as BioMA uses the C# language, the BioMAGenerator class (i.e. the class that generates BioMA components) inherits from the CsharpGenerator class that generates the source code in C#. Each class contains a visitor method for each ASG node type. Each visitor method name is composed of ‘visit_’ followed by the type of the node. A visitor method emits code fragments. All LanguageGenerator subclasses provide the same visitor method names given that the same ASG is used. A LanguageGenerator class also inherits from two classes: CodeGenerator and LanguageRule. The CodeGenerator class contains the factorized methods shared by all LanguageGenerator classes, including the methods used for code emission and code formatting. This class inherits from the super class of the transformation process, called NodeVisitor. CyMLT implements the Visitor design pattern (Gamma et al. 1995) to avoid a procedural implementation approach. NodeVisitor contains a dispatch method that enables recursive traversal through the nodes. During traversal, the appropriate visitor method corresponding to the type of the current node is called in LanguageGenerator or PlatformGenerator and the associated code fragment is emitted. Before emitting the code fragment, some nodes undergo a transformation from the LanguageRule class. 
This class is implemented for each language as a mapping where keys correspond to the different methods, data types and operators of CyML, and values are their emulation in the target languages, provided by their standard libraries [see Supporting Information—Tables S1–S5]. Given that the CyML language is similar to Python, it is straightforward to yield Python code through one ASG traversal. This is not the case for all target languages, some of which require more traversals to support specific features derived from the analysis of the ASG. For example, a first traversal could detect that it is necessary to declare other variables in the generated code. These additional operations have been implemented in the Adapter class, which contains methods to traverse the ASG and, where the conditions have been defined, to retrieve the new features required in LanguageGenerator. Likewise, the Model object generated by the model parser is used in LanguageGenerator to generate the model interface with accessor and mutator methods for object-oriented languages, or to add additional semantics to variables based on platform conventions. This separation of model specification from model algorithm enables CyMLT to transform a model algorithm from a procedural approach to an object-oriented approach with different software designs. Finally, LanguageGenerator and PlatformGenerator use DocGenerator to integrate model documentation into generated model components. DocGenerator extracts all information from the model specification and presents it in different formats according to the language and the platform.
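The class layout described above can be illustrated with a minimal sketch. It is not the CyMLT source: the dictionary-based node layout, the `operators` attribute and the three node types are assumptions made for illustration, and only the class names NodeVisitor, CodeGenerator and LanguageRule come from the text.

```python
class NodeVisitor:
    """Dispatches each node to the method named visit_<node type>."""

    def visit(self, node):
        method = getattr(self, "visit_" + node["type"], None)
        if method is None:
            raise NotImplementedError("no visitor for " + node["type"])
        return method(node)


class LanguageRule:
    """Mapping from CyML operators to their target-language equivalent."""
    operators = {}  # empty mapping: operators are emitted unchanged


class FortranRule(LanguageRule):
    operators = {"!=": "/=", "and": ".AND.", "or": ".OR."}


class CodeGenerator(NodeVisitor):
    """Visitor methods shared by all language generators (assumed subset)."""

    def visit_int(self, node):
        return str(node["value"])

    def visit_name(self, node):
        return node["id"]

    def visit_binary_op(self, node):
        # The rule class is consulted before the fragment is emitted.
        op = self.operators.get(node["op"], node["op"])
        return f"({self.visit(node['left'])} {op} {self.visit(node['right'])})"


class PythonGenerator(CodeGenerator, LanguageRule):
    pass


class FortranGenerator(CodeGenerator, FortranRule):
    pass


node = {
    "type": "binary_op",
    "op": "!=",
    "left": {"type": "name", "id": "x"},
    "right": {"type": "int", "value": 0},
}
python_code = PythonGenerator().visit(node)
fortran_code = FortranGenerator().visit(node)
```

The same tree is traversed by both generators; only the rule mapping differs, which is what keeps the addition of a language local to one rule class and one generator class.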

Class diagram illustrating the implementation of the one-to-many CyML transformer (CyMLT).
2.6 Case study
Phenology, the timing of crop development and the simulation of phase durations and crop stages, is sometimes thought of as the core for most crop growth PBMs and an essential component of most crop modelling platforms. In order to illustrate how a model is written in CyML and the functionalities of the language, we transformed the BioMA phenology component (Manceau and Martre 2018) of the wheat PBM SiriusQuality (He et al. 2012) into a Crop2ML composite model and wrote the algorithms of the model in CyML. The shootnumber, a model unit of this component, is presented in Supporting Information—Listing S1.
3. RESULTS
3.1 Model algorithm implemented in CyML
The shootnumber model is implemented in CyML as a function that includes all the meta information provided by the model specifications [see Supporting Information—Listing S2]. The model documentation is generated from the model specification and is shown in red. It contains the name of the model, its version, its time step (in days) and other descriptions such as the authors’ names and the reference for the model.
The algorithm of the shootnumber unit model requires an external function, Fibonacci, which is implemented outside of the model algorithm (see Supporting Information—Listing S2, Line 35) to make the code shorter and more readable. This mathematical function is used to compute the shoot production from the number of emerged leaves on shoots (see Supporting Information—Listing S2, Line 22). We implement the code using conditional (if, Line 26) and loop (for, Line 29) control structures. Table 1 gives the meaning of the CyML built-in functions that are used to implement the shoot number model.
Function | Description |
---|---|
max | Largest item in a sequence |
min | Smallest item in a sequence |
ceil | Smallest integer greater than or equal to the parameter |
append | Add an element at the end of a dynamic array (list) |
len | Number of elements in a sequence (array or list) |
range | Generate a list of integers from a start value to a stop value with a step |
integer | Update the actual state variable from its previous value and the rate |
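As an illustration of how several of these built-in functions combine, the following plain-Python sketch uses a recursive Fibonacci helper together with ceil, range, append, len, max and min. It is a toy example and not the shootnumber algorithm of Listing S2; the function `demo` and its argument are hypothetical, and actual CyML code would additionally carry type declarations.

```python
from math import ceil


def fibonacci(n):
    """Recursive Fibonacci, kept outside the main function as a helper."""
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def demo(leaf_number):
    """Toy use of the built-ins; not the published model algorithm."""
    values = []
    # ceil gives the smallest integer >= leaf_number; range excludes its stop.
    for i in range(0, int(ceil(leaf_number))):
        values.append(fibonacci(i))
    return len(values), max(values), min(values)


result = demo(4.2)
```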
3.2 Transformation of CyML source code to different languages and platforms
Currently, CyMLT supports Python, Java, C#, C++ and Fortran languages. It also has the capability of generating a model algorithm in conformance with crop simulation platform requirements. Therefore, it handles different programming paradigms such as procedural, functional and object-oriented programming by associating model specifications to the transformation workflow.
3.2.1 Structure of generated source code.
Although CyML provides a procedural mechanism to implement a model algorithm, the programming languages supported by CyMLT can be classified into procedural and object-oriented programming paradigms. Some languages are designed to support only the object-oriented paradigm (C# and Java). Fortran and C are procedural languages even though they can ‘mimic’ some object-oriented features to support an object-oriented programming style (Cary et al. 1997). Python and C++ support both object-oriented and procedural paradigms. CyMLT uses the procedural paradigm for Python and the object-oriented paradigm for C++, as these are the most often used approaches in these languages. However, CyMLT can also be extended to generate models in Python with an object-oriented approach and in C++ with a procedural approach.
For the C++, C# and Java languages, a model algorithm implemented in CyML is transformed into a class (Listing 1) that encapsulates both the algorithm and the scientific knowledge related to the model through the integrated documentation. A class, in software engineering terms, is a data structure defining a set of common properties and methods of an object. The generated source code contains methods to access and mutate model inputs and outputs, a constructor method to create and initialize an instance of the model (object) and a calculation method encapsulating the procedural logic of the model algorithm. First, variables are used to access model input (Listing 2) values before transforming the set of instructions of the model algorithm into the new language. Then, mutator methods are applied to update the model outputs (Listing 3). Model inputs and outputs are used to build a class of objects passed in argument of the calculation method. External functions are transformed into static methods of the model class (Listing 1).
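The class structure described above (accessors and mutators, a constructor, a calculation method and a static method for the external function) can be sketched as follows. Python is used here only for consistency with the other sketches; the class name, the `max_shoots` parameter and the toy calculation are hypothetical and do not reproduce the generated code of Listing 1.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical container for model inputs and outputs."""
    leaf_number: float = 0.0
    shoot_number: int = 0


class ShootNumberSketch:
    def __init__(self, max_shoots=10):
        # Constructor: creates and initializes an instance of the model.
        self._max_shoots = max_shoots

    @property
    def max_shoots(self):  # accessor
        return self._max_shoots

    @max_shoots.setter
    def max_shoots(self, value):  # mutator
        self._max_shoots = value

    def calculate(self, state):
        # 1. local variables read the inputs
        leaf_number = state.leaf_number
        # 2. procedural logic of the (toy) algorithm
        shoot_number = min(self.fibonacci(int(leaf_number)), self._max_shoots)
        # 3. outputs are written back to the state object
        state.shoot_number = shoot_number

    @staticmethod
    def fibonacci(n):
        """External function turned into a static method of the class."""
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a


state = State(leaf_number=6.0)
model = ShootNumberSketch(max_shoots=5)
model.calculate(state)
```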

Structure of generated source code in Java, C#, Fortran and C++.

Access to input variables (in Java). s and s1 correspond to two instances of the class of state variables, used to manage the previous and current states. CyMLT generates variables to access the fields of these instances and uses them in the procedural logic.

Update output variables in Java. s corresponds to an instance of current state variable.
The current version of CyMLT supports Fortran 90. This Fortran version exposes low-level features (pointers, allocation), which makes some transformations difficult but ensures higher portability. In Fortran, the model algorithm corresponds to a subroutine, whereas external functions are subroutines, functions or recursive functions. CyMLT makes this choice automatically. In our case study, the Fibonacci function is transformed into a recursive function, which keeps the structure of the original code. In Python, the generated source code has the same structure as the CyML function. However, CyMLT can also generate Python code with an object-oriented approach.
3.2.2 Data type and variable declaration.
In addition to the programming paradigms, languages supported by CyMLT can be classified by their type system, in particular their type expression (explicit or implicit). This can affect the quality of the generated code. Although some languages (e.g. C# and C++) allow both implicit and explicit type expression, we chose to provide explicit typing. Basic types (integer, logical, character and real) are built-in data types in all languages. However, other more complex types like datetime or sequence are supported but require external or standard libraries. Moreover, various libraries exist to handle the same data structure. CyMLT’s data types map appropriately to target languages by using their standard library [see Supporting Information—Table S1].
Some compromises have been made for the transformation of complex types. CyML arrays are modelled on a standard Python list. However, the size of list data type variables is not fixed. We propose to use the NumPy array in the next version of CyMLT. In Fortran, CyMLT generates allocatable arrays to map to CyML list data types and provides some functions to handle them. These functions are extracted from the CyMLT library and integrated into the generated code to make it independent of the transformation library. In C++, datetime type handling is not easy. A datetime is converted into a string, which can be split for processing. CyML arrays without a specified size in the function parameter are mapped to C++ arrays using templates (Listing 6, Line 1). In Java, there are many standard Time APIs (e.g. Date, LocalDateTime) depending on the version of Java. We have chosen to use the Date Library in Java and the DateTime Library in C#.
3.2.3 Type and intent preservation.
Most of the target languages provide built-in methods matching the CyML built-in functions. However, their names or return types may differ. This is considered in the generated source code. As an example, consider the statement at Listing 2 on Line 29, where the purpose is to find the smallest integer value that is larger than or equal to the leaf number. The method ceil in the C++ Math library corresponds to the CyML ceil function but returns a floating-point value. In this case, CyMLT preserves the original type (integer) by applying an explicit type conversion (Listing 4, Line 1).
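One way to express such a rule is a per-language template for the call, as in the sketch below. The template dictionary and the `emit_ceil` helper are hypothetical; the exact text emitted by CyMLT may differ.

```python
# Hypothetical per-language templates for the CyML ceil built-in.
CEIL_TEMPLATES = {
    "python": "ceil({arg})",     # math.ceil already returns an int in Python 3
    "cpp": "(int) ceil({arg})",  # the C++ ceil returns a floating-point value
}


def emit_ceil(target, arg):
    """Return the target-language fragment that keeps an integer result."""
    return CEIL_TEMPLATES[target].format(arg=arg)


cpp_fragment = emit_ceil("cpp", "leafNumber")
python_fragment = emit_ceil("python", "leafNumber")
```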

Type preservation in the transformation of CyML to C++. An int cast is applied to the result returned by the ceil function.
The generated code preserves the intent of the original code, using the information provided by the ASG. Listing 5 illustrates this intent preservation in the transformation of the CyML For-loop construct (Listing 4, Line 1), where the consecutive iteration is expressed efficiently in Fortran with the DO sequence (Listing 5, Line 1). However, sequence indexing differs between CyML and Fortran. The last parameter of the CyML range function is not contained in the CyML sequence, whereas the upper bound of the Fortran DO sequence is. This is managed by subtracting 1 from this parameter in the generated code, thereby providing a sequence of the same length. Likewise, arrays in Fortran are indexed from 1 by default and this is considered during the transformation of all array operations.
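The two adjustments (exclusive versus inclusive upper bound, and 0-based versus 1-based indexing) can be sketched as small emitter functions. The helper names and the exact formatting of the emitted Fortran are assumptions and do not reproduce Listing 5.

```python
def range_to_fortran_do(index, start, stop, body):
    """Emit a DO loop equivalent to `for index in range(start, stop)`."""
    lines = [f"DO {index} = {start}, {stop}-1"]  # DO bounds are inclusive
    lines += ["    " + line for line in body]
    lines.append("END DO")
    return "\n".join(lines)


def index_to_fortran(array, index):
    """Shift a 0-based CyML index to Fortran's default 1-based indexing."""
    return f"{array}({index}+1)"


def fortran_do_values(start, upper):
    """Values taken by a DO index with inclusive bounds (for checking)."""
    return list(range(start, upper + 1))


loop = range_to_fortran_do(
    "i", 0, "n", ["total = total + " + index_to_fortran("x", "i")]
)
```

With a stop value of 5, the DO loop runs from 0 to 4 inclusive, which is the same sequence as `range(0, 5)`.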

From CyML for-loop to Fortran do-loop. The subroutine Add is generated to expand leaf tiller number array.
3.2.4 Preservation of the scope of variables.
CyMLT considers the scope of the variables in the different target languages. The scope of a variable refers to a region of the code where the variable is visible. Some languages like Java, C++ and C# manage variable scope differently and this variability is handled by CyML.
Consider the transformation of a simple CyML function that calculates the sum of the elements of an array x with undefined size (Listing 6). The generated code in Fortran requires the declaration of a new variable i_cyml to map the For-loop construct. However, the generation of a new variable in Java, C++ and C# preserves the scope of the variable i. In these languages, the scope of the iteration index on an array variable in a For-loop construct is limited to the loop, whereas it extends to the whole function in CyML and Python. If this iteration index is reused after the loop in the original code, the generated code would fail to compile in the target languages unless the transformation handles this scoping issue by declaring another variable.
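The function-wide scope of the loop index can be observed directly in Python, as in the short sketch below. The function is a hypothetical example written for this purpose and is not Listing 6.

```python
def last_index_and_sum(x):
    """Sum a non-empty list and reuse the loop index after the loop."""
    total = 0.0
    for i in range(len(x)):
        total += x[i]
    # i is still visible here in Python; in Java, C++ or C# an index
    # declared in the loop header would be out of scope at this point.
    return i, total


index, total = last_index_and_sum([1.0, 2.0, 3.5])
```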

CyML code of a function that computes the sum of the elements of a list transformed using CyMLT in Python, C++, C#, Java and Fortran.
3.2.5 Transformation to simulation platforms.
The transformation of a CyML code to target languages can generate a model component in different ways. These transformations have been designed to be close to the philosophy of each target language. However, from the perspective of crop model component development, high-level programming languages are the lowest level of abstraction with respect to simulation platforms and frameworks. Additional constraints in crop modelling platforms include a specific programming paradigm, software design and code conventions. These different features give them capabilities to provide code introspection and reflection support, which allows them to dynamically extract and change information or knowledge about the code at runtime. Thus, the code generation should extend language code generation by considering platform coding constraints, which are often implicit. The design of programming languages is formalized using grammars and is unambiguous. Platforms use design and architectural patterns without the use of an explicit formalism. This implies adapting the transformation to each platform taking into account their specificities. The current version of CyMLT generates model components compatible with BioMA, DSSAT, Record, OpenAlea and Simplace platforms, which support C#, Fortran, C++, Python and Java, respectively.
3.2.6 Generation of object-oriented components.
An object-oriented platform provides features such as inheritance, polymorphism and software design used to implement models. Polymorphism allows a model programmer to provide a generic interface to a number of related functions, and, thus, to propose different strategies to implement a model with different assumptions. For instance, this provides the possibility to include new physiological processes that are shared among different crop types. For this, object-oriented platforms define an abstract class that specifies the interface of all model components, which implement all the abstract methods of the abstract class. Two different approaches are used for model components to inherit from an abstract class. Some platforms offer an abstract class and all model components implement and extend this class. This is the case for Simplace and Record, which provide the FWSimComponent (Listing 7) and DiscreteTimeDyn interface, respectively. Another approach followed by platforms is component-based programming. A model developer creates a component that inherits from an interface provided by the platform. Thus, model components inherit from this component interface. For example, BioMA provides the IStrategy interface. The current version of CyMLT generates a component interface in addition to the generation of model components. The abstract methods depend on the platform and include a method that encapsulates the algorithm of the model.

Structure of the ShootNumber component in Simplace. A model unit in Simplace implements and extends an abstract class called FWSimComponent. Then, a model component overrides its abstract methods including init (model initialization), clone (deep clone of the model) and process (model algorithm). The structure of the abstract class is used to define a model skeleton in CyMLT to generate a model that conforms to the platform requirements.
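The abstract-class approach can be sketched with Python's abc module. `SimComponent` is a hypothetical stand-in for a platform base class such as FWSimComponent, with the three method names taken from the caption above; the `Counter` component is a toy and the real Simplace signatures are not reproduced.

```python
import copy
from abc import ABC, abstractmethod


class SimComponent(ABC):
    """Hypothetical platform base class defining the component skeleton."""

    @abstractmethod
    def init(self):
        """Model initialization."""

    @abstractmethod
    def process(self):
        """Model algorithm, called at each time step."""

    @abstractmethod
    def clone(self):
        """Deep clone of the model."""


class Counter(SimComponent):
    """Toy component that fills in the skeleton."""

    def init(self):
        self.count = 0

    def process(self):
        self.count += 1

    def clone(self):
        return copy.deepcopy(self)


component = Counter()
component.init()
component.process()
duplicate = component.clone()
duplicate.process()
```

A generator only has to fill the bodies of the overridden methods, because the skeleton is fixed by the platform.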
3.2.7 Generation of stateless and stateful unit models.
A model algorithm is implemented in CyML as a function. However, CyMLT generates both stateless and stateful components. A stateless component is an immutable object whose field values do not change when its methods are invoked. CyMLT allows searching and extracting state variables from a model specification to perform code generation according to each platform.
In DSSAT and OpenAlea, a model algorithm is implemented as a stateless functional component (declarative paradigm). The Fortran code generated by CyMLT is compatible with DSSAT. In this platform, the calculation of rates of change and the integration of state processes are sometimes separated with the use of a control variable. In CyML, we introduce two variables that define the previous and current values of a state variable, which avoids misuse of the state variable. Although OpenAlea offers capabilities to benefit from the object-oriented features of Python, OpenAlea components can be defined as pure Python functions, which are already generated by CyMLT. However, model specifications need to be transformed into an OpenAlea component specification for unit and composite nodes (Pradal et al. 2008).
BioMA uses the strategy design pattern to create a library of simple strategies (equivalent to Crop2ML unit models) and composite strategies for model composition. The simple strategy leads to the implementation of a model unit as a stateless component. Thus, an instance of model unit class is a stateless object since it contains only model parameters (if any) as attributes which do not change during the simulation. The method of computation is comparable to a function that takes an object as an argument (i.e. higher-order function). Concretely, these objects are instances of domain classes. Domain class contains the values and the attributes for all variables defined in model specifications. To handle the change of state variables, the method of computation of each class takes as arguments two instances of state variables domain class reproduced by CyMLT (Listing 8), one for the current value and the other one for the previous one. This is made possible by the fact that the previous state is emulated in the CyML function with variable suffixed with ‘_t1’.
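The stateless strategy with two state instances can be sketched as follows. The class names, the `increment` parameter and the toy update rule are hypothetical; only the roles of s (current state), s1 (previous state) and the ‘_t1’ suffix follow the description above.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical state domain class."""
    leaf_number: float = 0.0
    shoot_number: int = 0


class ShootStrategy:
    """Stateless strategy: it stores parameters only, never state."""

    def __init__(self, increment=1):
        self.increment = increment

    def estimate(self, s, s1):
        # s1 holds the previous time step, s the current one.
        shoot_number_t1 = s1.shoot_number  # '_t1' marks the previous value
        s.shoot_number = shoot_number_t1 + self.increment  # toy update


previous = State(leaf_number=3.0, shoot_number=2)
current = State(leaf_number=4.0)
strategy = ShootStrategy(increment=3)
strategy.estimate(current, previous)
```

Because the strategy object holds no state, the same instance can be reused across time steps and simulations.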

Fragments of code in C# with BioMA guidelines generated with CyMLT. s1 is an instance of state domain class used for previous time, s is an instance of state domain class used for current time. This shows that leaf number has been calculated by another model at the current time step, whereas the other variables are those calculated at the previous time step.
Finally, in Record and Simplace, unlike BioMA, a model unit class contains all state variables. In Simplace, there is no convention to distinguish previous and current state variables. Thus, CyMLT considers them as distinct fields in the generated Simplace component. The Record platform handles variable history (time series) by suffixing state variable with an operator () in the code. Thus, in this case, CyMLT generates current state variables with the suffix () and previous state variables with (-1).
3.2.8 Generation of platform-specific types and data structures.
Some platforms define their own types by providing a generic class to handle model variables and parameters. A generic class is either a class or an interface that can be parameterized over the language data types. It contains a specific number of methods including methods to access or update variables. In this case, CyML data types map to the generic types of the framework.
Unlike BioMA, where inputs and outputs are C# data types extended with the generation of accessors and mutators, Simplace and Record provide their own class or interface to declare model inputs and outputs. To generate a Simplace component, the process of transformation consists of declaring model variables with the specialized class FWSimVariable. Then, CyMLT generates other variables declared with Java data types, which are used to access the values of the FWSimVariable instances (Listing 9). This allows expressing the model algorithm in pure Java but requires the use of a mutator method of the generic class to update the outputs (Listing 10). Likewise, the generated Record component implements the DiscreteTimeDyn class provided by the vle package of Record to encode discrete-time model algorithms.
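The pattern of reading platform variables into plain local variables, computing, and writing back through a mutator can be sketched as below. `SimVariable` is a hypothetical stand-in for a generic variable class such as FWSimVariable, and the method names and toy computation are assumptions rather than the Simplace API.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class SimVariable(Generic[T]):
    """Hypothetical generic wrapper around a model variable."""

    def __init__(self, name: str, value: T):
        self.name = name
        self._value = value

    def get_value(self) -> T:
        return self._value

    def set_value(self, value: T) -> None:
        self._value = value


def process(leaf_number: SimVariable[float], shoot_number: SimVariable[int]):
    # Plain local variables (prefixed by t) read the wrapped values ...
    t_leaf_number = leaf_number.get_value()
    # ... the algorithm is written with native types only (toy rule) ...
    t_shoot_number = int(t_leaf_number) * 2
    # ... and the output is updated through the mutator of the wrapper.
    shoot_number.set_value(t_shoot_number)


leaf = SimVariable("LeafNumber", 3.7)
shoots = SimVariable("ShootNumber", 0)
process(leaf, shoots)
```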

Generation of other variables to access Simplace component variables. These variables are prefixed by t.

Update of the variables of the shootnumber unit model generated by CyMLT following Simplace specifications.
3.3 Extensibility
The number of languages and platforms that CyMLT supports can be extended due to its modular structure. The explicit separation between the production of the annotated ASG and its transformation into readable source code for the target languages and platforms provides great flexibility to add new target languages. The addition of a new language requires only a mapping of this intermediate representation into a set of compatible instructions based on the standard library of the language. The generated code must be independent of the transformer, clear and easy to read while preserving the knowledge expressed in the original code. We present the steps for the extension of CyMLT with the R language (R Core Team 2017) and the PMF.
3.3.1 Supporting a new language: R.
R is a popular language used for statistical analyses and data visualization. Many modellers use R to start the development of their model (Zhao et al. 2019). Thus, with this extension, modellers can in the same environment conduct the first steps for model development and the implementation in a simulation platform, and analyse model outputs. The extension of CyMLT for R relies on the implementation of RGenerator and RRules classes that emit fragments of code in R and define transformation rules between CyML and the desired R constructs, respectively.
3.3.2 Implementation of transformation rules for R.
Transformation rules define the mapping of CyML operators, built-in functions and methods to their equivalent in R. R is a dynamically typed language and, as with Python, the type of variables is ignored.
Operators mapping. Listing 11 declares the mapping between CyML and R operators. Only the operators that differ between CyML and R are shown. During the ASG traversal, the visit method considers these mappings to emit code fragments.
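Such a mapping could take the form of a dictionary from Python-style operators to R operators, as sketched below. The R operators shown are standard R; which operators CyML actually exposes, and the names of the dictionaries, are assumptions, so this is not a reproduction of Listing 11.

```python
# Hypothetical rule tables: only operators that differ need an entry.
R_BINARY_OPERATORS = {
    "and": "&&",  # logical AND on scalars
    "or": "||",   # logical OR on scalars
    "**": "^",    # exponentiation
}
R_UNARY_OPERATORS = {
    "not": "!",   # logical negation
}


def to_r_operator(op, table=R_BINARY_OPERATORS):
    """Return the R operator, or the operator itself when it is identical."""
    return table.get(op, op)


power = to_r_operator("**")
plus = to_r_operator("+")
negation = to_r_operator("not", R_UNARY_OPERATORS)
```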

Adapting standard functions. CyML defines three standard libraries (i.e. math, system and io) to provide mathematical, system and file management functions in the different languages. A mapping is needed to link these functions to native R ones for each library. Some functions are identical between CyML and R, like min or max. Others require a transformation to another type of node. It is useful for model developers to observe the generated ASG of each CyML construct in order to define the equivalent of the construct. For example, the construct of a modulo binary operation in CyML is a standard_call node in the ASG whose namespace is system, the function is modulo and the arguments are the two operands. This node is transformed into a binary_op node (binary operation) with the function ‘translateModulo’ (Listing 12). The new node is visited to produce R fragment code.
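The node rewriting described for the modulo operation can be sketched with dictionary-based nodes. The field names (`type`, `namespace`, `function`, `args`, `op`, `left`, `right`) and the small emitter are assumptions; only the idea of turning a standard_call node into a binary_op node comes from the text, and `%%` is the R modulo operator.

```python
def translate_modulo(node):
    """Rewrite a standard_call modulo node into a binary_op node for R."""
    left, right = node["args"]
    return {"type": "binary_op", "op": "%%", "left": left, "right": right}


def emit_r(node):
    """Tiny emitter covering only the node types used in this sketch."""
    if node["type"] == "name":
        return node["id"]
    if node["type"] == "binary_op":
        return f"{emit_r(node['left'])} {node['op']} {emit_r(node['right'])}"
    raise ValueError("unsupported node type: " + node["type"])


call = {
    "type": "standard_call",
    "namespace": "system",
    "function": "modulo",
    "args": [{"type": "name", "id": "a"}, {"type": "name", "id": "b"}],
}
r_code = emit_r(translate_modulo(call))
```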

Standard methods mapping. Standard methods are functions applied to a particular data type of CyML language (Listing 13). Thus, a set of methods is provided for each CyML data type. Their equivalents in R language are defined using the same mapping mechanism used for standard functions. In Listing 13 at Line 9 the append method applied to a list is transformed to an assignment node whose value is a function c that takes as arguments the name of the variable of type list (receiver) and the argument of the append method (args). The definition of these rules limits the use of conditional statements in the implementation of the visit methods and facilitates the extension of CyMLT.
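The rule for append can be sketched in the same style: the method call becomes an assignment whose value is a call to the R function c. The node layout and helper names are assumptions and the sketch does not reproduce Listing 13.

```python
def translate_append(node):
    """Rewrite `receiver.append(arg)` into `receiver <- c(receiver, arg)`."""
    receiver = node["receiver"]
    (arg,) = node["args"]  # append takes exactly one argument
    return {
        "type": "assignment",
        "target": receiver,
        "value": {"type": "call", "function": "c", "args": [receiver, arg]},
    }


def emit_r(node):
    """Tiny emitter covering only the node types used in this sketch."""
    kind = node["type"]
    if kind == "name":
        return node["id"]
    if kind == "int":
        return str(node["value"])
    if kind == "call":
        args = ", ".join(emit_r(a) for a in node["args"])
        return f"{node['function']}({args})"
    if kind == "assignment":
        return f"{emit_r(node['target'])} <- {emit_r(node['value'])}"
    raise ValueError("unsupported node type: " + kind)


append_call = {
    "type": "method_call",
    "method": "append",
    "receiver": {"type": "name", "id": "tillers"},
    "args": [{"type": "int", "value": 3}],
}
r_code = emit_r(translate_append(append_call))
```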

3.3.3 Implementation of an R code generator.
The RGenerator class inherits the RRules class. It implements a family of visit methods, such as visit_assignment and visit_bool, related to all types of nodes provided by the ASG. These methods emit fragments of code, which are joined to produce formatted source code in R. The properties that enable write and format functions for these fragments are implemented in a class named CodeGenerator, which is inherited by RGenerator. Additionally, CodeGenerator abstracts the common behaviour of the target languages by providing other properties and visit methods common to all of them. Some methods are redefined in a language generator when the language has particular features. The developer of the R code generator implemented the different visit methods without having to deal with the dispatching mechanism provided by the NodeVisitor class. A visit() method is called for all composite child nodes, while a write() method is invoked for a terminal or single node to emit the code fragment. For example, a boolean value is a terminal node. Thus, the visit_bool method, which generates the corresponding boolean value in R, only consists in converting the CyML logical value to uppercase (Listing 14).

The assignment node is a composite node that contains a target node and a value node. Either of these two nodes can itself be a composite node, so both are visited by the visit_assignment() method (Listing 15).
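The difference between terminal and composite nodes can be sketched with a small generator that accumulates fragments. The class is a hypothetical simplification: the node layout and the `code` helper are assumptions, and only the method names visit_bool, visit_assignment, visit and write follow the text.

```python
class RGeneratorSketch:
    """Simplified generator: write() emits, visit() recurses."""

    def __init__(self):
        self.fragments = []

    def write(self, text):
        self.fragments.append(text)

    def visit(self, node):
        getattr(self, "visit_" + node["type"])(node)

    def visit_bool(self, node):
        # Terminal node: True/False becomes TRUE/FALSE in R.
        self.write(str(node["value"]).upper())

    def visit_name(self, node):
        self.write(node["id"])

    def visit_assignment(self, node):
        # Composite node: both children are visited in turn.
        self.visit(node["target"])
        self.write(" <- ")
        self.visit(node["value"])

    def code(self):
        return "".join(self.fragments)


generator = RGeneratorSketch()
generator.visit({
    "type": "assignment",
    "target": {"type": "name", "id": "has_flag_leaf"},
    "value": {"type": "bool", "value": True},
})
r_code = generator.code()
```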

All target language generators share the principle of implementing a visitor method for standard function or standard method call nodes, and this method is therefore implemented in the CodeGenerator class. The properties of the node are used to access the equivalent function in the dictionary of functions in the transformation rules class. Listing 16 shows the implementation of the standard function call node where its properties such as namespace and function are used to access the equivalent function.
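A shared visitor for standard function calls could look up the target function by namespace and function name, as in this sketch. The rule dictionary, its entries and the node fields are assumptions and do not reproduce Listing 16; `ceiling` is the R counterpart of ceil.

```python
# Hypothetical rule dictionary: namespace -> CyML function -> R function.
R_STANDARD_FUNCTIONS = {
    "math": {"ceil": "ceiling", "sqrt": "sqrt", "exp": "exp"},
}


def visit_name(node):
    return node["id"]


def visit_standard_call(node, rules=R_STANDARD_FUNCTIONS):
    """Emit a call to the target-language equivalent of a standard function."""
    equivalent = rules[node["namespace"]][node["function"]]
    args = ", ".join(visit_name(arg) for arg in node["args"])
    return f"{equivalent}({args})"


call = {
    "type": "standard_call",
    "namespace": "math",
    "function": "ceil",
    "args": [{"type": "name", "id": "leaf_number"}],
}
r_code = visit_standard_call(call)
```

Because the lookup is data-driven, the visitor itself contains no language-specific conditionals; a new language only supplies another dictionary.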

This implementation approach is followed for all types of nodes and can be done gradually according to the expected R constructs. Given that there are several ways to implement an algorithm, it is the responsibility of the extension developer to provide the corresponding semantics for each particular node of the ASG and to validate the transformation with unit tests.
3.3.4 Supporting a new simulation platform: APSIM-PMF.
APSIM (Holzworth et al. 2014a) is one of the most widely used PBM platforms for simulating the performance of a wide range of cropping systems. It has undergone a major evolution by providing the PMF (Brown et al. 2014). Plant Modelling Framework is used to build models that represent plant components of a crop composed of identical plants. It is based on the structure of a generic plant and a wide range of processes involved in plant growth and development. However, the composition and parameterization needed to build a particular crop model are not specified and are left to model developers. Plant Modelling Framework, therefore, allows great flexibility in its approach for implementing biophysical processes by separating model set-up and assembly. The PMF concepts and processes are implemented as generic classes at different organizational levels (Brown et al. 2014).
The extension of CyMLT to PMF consists in adding the capacity to generate a model component in C# that fulfils PMF requirements. The developer implements a PMF generator class that extends the C# generator class. This class contains some PMF requirements: (i) the generated model component is a C# class that inherits the Model class, and (ii) it contains the getter and setter methods of all model variables and parameters with the algorithm implemented in C#.
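The two requirements can be sketched as a platform generator that specializes a C# generator. Both classes are hypothetical simplifications written in Python, the emitted C# uses auto-implemented properties as one possible form of getters and setters, and the actual PMF generator may be organized differently.

```python
class CsharpGeneratorSketch:
    """Hypothetical C# generator emitting a class header and properties."""

    base_class = None  # plain C# classes have no required parent

    def class_header(self, name):
        if self.base_class:
            return f"public class {name} : {self.base_class}"
        return f"public class {name}"

    def property(self, ctype, name):
        return f"public {ctype} {name} {{ get; set; }}"

    def skeleton(self, name, variables):
        lines = [self.class_header(name), "{"]
        lines += ["    " + self.property(t, v) for t, v in variables]
        lines.append("}")
        return "\n".join(lines)


class PMFGeneratorSketch(CsharpGeneratorSketch):
    """Platform generator: same language rules, plus the PMF base class."""

    base_class = "Model"


code = PMFGeneratorSketch().skeleton("ShootNumber", [("double", "LeafNumber")])
```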
4. DISCUSSION
The CyML language provides a relatively simple structure with few specifications that can express the algorithm of a biophysical process involved in crop growth and development. The real interest of this language is to provide a common method to describe a process with the capacity to be integrated automatically in various platforms. CyMLT provides export capabilities in many languages and platforms, enabling users to focus on the scientific aspect of their model rather than on the internal knowledge of platforms’ specificities. A model component can be reused, improved, integrated and simulated in various platforms. This improves the diffusion of models by sharing them as software and scientific artefacts, and thus enhances the transparency and reproducibility of crop models. Moreover, with CyML, model development may become a collaborative task of different groups of model builders with the possibility to compose different model units provided by different platforms.
For crop modellers, learning a new language with its own learning curve adds a level of complexity to an already complex landscape of languages and tools. We designed CyML to minimize this added complexity by choosing a language that is very close to existing languages. The main source of complexity is in the model specification. The modeller has to specify the type of inputs and outputs, the documentation and unit tests. While this increases the complexity of the design of a new model, it provides an explicit and rigorous specification and enhances the transparency of the model and its reproducibility and reusability in different contexts. A transformation system embeds platform specificities to automatically generate model components that conform to specific platforms. This makes the complexity of component integration identical across platforms and widens the availability of components.
Several approaches and solutions exist to transform source code from one language to many higher-level programming languages (Baxter et al. 2004; Plaisted 2013; Schaub and Malloy 2016). They demonstrate the usefulness of source-to-source transformation systems in the development of reusable software libraries. For instance, Nunnari and Heloir (2018) allow for the implementation of motion controllers of virtual humans, which are reused in multiple game engines. Their system is based on Haxe, a language that offers the capability to transform Haxe code into many programming languages. However, like most available code transformation systems, the generated code depends on the transformation system. Likewise, Cython generates code into the C and C++ languages that have a high performance but the generated code has a low readability, therefore, making it difficult to understand and to maintain. To our knowledge, no solution exists to transform PBM algorithms in different languages considering the specificities of different modelling platforms. This transformation is useful in the sense that model components are not just code but embed scientific knowledge that should be preserved. In this work, we also propose a system that includes algorithm error checking with explicit error messages to guide developers. CyML addresses several issues encountered in current PBM frameworks, namely:
- reproducibility: a crop model or algorithm can be written once and automatically made available in different languages and platforms;
- reusability: a model can be reused and composed with other models of a specific platform;
- transparency: model algorithms are implemented using a common approach regardless of the crop simulation platform, and maintain the biophysical process knowledge.
Our approach and strategy should greatly reduce the implementation errors and improve model reproducibility. However, neither the definition of a language nor its transformation is approached without certain constraints, essentially due to the trade-offs between generality and abstraction.
4.1 CyML transformation challenges
We provide a new language with a transformation system designed to produce correct code. However, some inconsistencies or complexities can appear depending on the target language. First, the current version of CyML does not handle type overflow, which means that overflow errors cannot be detected at the CyML system level. For example, generating the recursive Fibonacci function in Python by simply removing type declarations could crash the program because of the Python recursion limit, whereas the generated Java code will not raise any error but its result will rapidly overflow. A method to detect overflow could be implemented to avoid this type of error at run time. Moreover, CyML can be extended to support the 64-bit C double type. Second, CyML provides primitive types whose equivalents in some platforms are objects with their own properties. This means that coding an existing model algorithm in CyML could require an additional external CyML function to emulate the properties of these objects. Third, CyML has some limitations with data type conversion. For example, the Datetime type is not supported in Fortran or C++; in this case, CyML converts it into strings. However, the translator could be extended to rely on the specific libraries used by simulation platforms to perform the transformation. Finally, some platforms are close to the philosophy of their underlying language (e.g. DSSAT, BioMA, OpenAlea), whereas others extend their language with high-level specificities (Record, SIMPLACE) that require a complex transformation.
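The two failure modes of the Fibonacci example can be illustrated with a minimal Python sketch. It is not part of CyML or CyMLT: the function names (`fib`, `wrap_int32`, `fib_int32`, `exceeds_recursion_limit`) are hypothetical, and the 32-bit wraparound only emulates, under that assumption, the behaviour of a Java `int`.

```python
import sys


def fib(n):
    """Naive recursive Fibonacci, as it might look once type declarations are removed."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def exceeds_recursion_limit(n):
    """Return True if evaluating fib(n) hits the interpreter's recursion limit."""
    try:
        fib(n)
    except RecursionError:
        return True
    return False


def wrap_int32(x):
    """Emulate two's-complement wraparound of a 32-bit signed integer (assumption)."""
    return (x + 2**31) % 2**32 - 2**31


def fib_int32(n):
    """Iterative Fibonacci in which every addition is wrapped to 32 bits."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, wrap_int32(a + b)
    return a


if __name__ == "__main__":
    # The first chain of fib(n - 1) calls is deeper than the limit, so this fails fast.
    print(exceeds_recursion_limit(sys.getrecursionlimit() + 100))
    # fib(46) still fits in 32 bits; fib(47) silently wraps to a negative value.
    print(fib_int32(46), fib_int32(47))
```

A run-time overflow check of the kind suggested above could compare the unwrapped sum with `wrap_int32` of the same sum and raise an explicit error when they differ.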
4.2 Lower the barrier of crop simulation platforms
The main barrier to the exchange and reuse of model components between simulation platforms is the platform specificities embedded in the algorithm implementation. CyML intends to lower this barrier. Our analysis of several platforms showed that each platform adopts a standard to implement model algorithms that does not vary from one implementation to another. Knowledge of these platform requirements makes it possible to integrate them into CyMLT in order to make model components available to many modelling platforms. We did not conduct a performance analysis, but the cost of implementation is reduced by an order of magnitude compared with the time needed to manually re-encode the same model into each platform, without even considering the errors inherently introduced during that process. CyML not only supports the transformation of the algorithms of unit models, but also the evaluation of composite models by calling, in sequential order, the models encapsulated in them. It also proposes a way to produce unit tests for each unit model algorithm in different languages based on the specifications of the inputs, outputs and parameter values. These tests check the validity of the generated source code by ensuring that all transformations give the same results. It should be noted that CyML adds unit test functionality to platforms that do not use test-driven development.
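The sequential evaluation of a composite model and a specification-driven unit test can be sketched in Python as follows. This is only an illustration under stated assumptions: the two unit models (`daily_mean_temperature`, `thermal_time`), the helper names and the layout of `TEST_CASE` are hypothetical and do not reproduce the Crop2ML specification format or the code generated by CyMLT.

```python
import math


def daily_mean_temperature(state):
    """Hypothetical unit model: mean of daily minimum and maximum temperature."""
    return {"tmean": (state["tmin"] + state["tmax"]) / 2.0}


def thermal_time(state):
    """Hypothetical unit model: daily thermal time above a base temperature."""
    return {"dtt": max(0.0, state["tmean"] - state["tbase"])}


def run_composite(units, inputs):
    """Call the unit models in sequential order, feeding outputs to later units."""
    state = dict(inputs)
    for unit in units:
        state.update(unit(state))
    return state


# Hypothetical test case: inputs and parameters, expected outputs, tolerance.
TEST_CASE = {
    "inputs": {"tmin": 8.0, "tmax": 20.0, "tbase": 5.0},
    "expected": {"tmean": 14.0, "dtt": 9.0},
    "tolerance": 1e-9,
}


def check(units, case):
    """Return True if every expected output is reproduced within the tolerance."""
    result = run_composite(units, case["inputs"])
    return all(
        math.isclose(result[name], value, abs_tol=case["tolerance"])
        for name, value in case["expected"].items()
    )


if __name__ == "__main__":
    print(check([daily_mean_temperature, thermal_time], TEST_CASE))
```

Running the same test case against each generated implementation is one way to confirm that all transformations give the same results.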
4.3 CyML for model reuse and reproducibility
CyML implements PBM components with a functional and procedural approach. A component describing a biophysical process (e.g. phenology, soil water balance, photosynthesis) can be decomposed into independent components, which can be implemented and composed in CyML. Components implemented at a high granularity embed more scientific knowledge, but they become less reusable. The implementation of a component as small functions (unit models) enhances its readability, reduces the distance between its expression as equations or mathematical expressions and its implementation, and reduces its maintenance cost. CyML is designed to tackle the reproducibility of PBM components. Although PBMs are described in scientific publications and their code is increasingly publicly accessible, the reproducibility of the results remains a fundamental issue. Their implementation requires a procedural or functional language that is shared between simulation platforms to ensure their reproducibility. It is, therefore, useful to provide code that is written in the language of the target platforms and that follows their specifications. The automatic transformation of model algorithms into different languages and simulation platforms is essential for interoperability and code reuse. CyML users can implement a model in CyML and transform the algorithms into various targets by using CyMLT. Hence, CyML aims at promoting PBM reusability and interoperability through a transformation system that parses model specifications and the knowledge needed to transform algorithms.
4.4 Scope of CyML language
CyML is a subset of the Cython language. Thus, it does not include many features found in general-purpose programming languages. This choice of language limitation has its strengths and weaknesses. The method presented herein differs from existing model interchange platforms in that it generates source code with different programming paradigms and it associates model specifications with algorithms to enhance code analysis. It allows a common implementation of the dynamics of biophysical processes by removing the specificities of the languages and platforms. It improves the readability of the code since the structure of the code and the characteristics of languages are shared by modelling platforms. It ensures the mapping of the abstract representation to other languages or platforms. Indeed, this language limitation reduces ambiguity in the language transformation since the base language (Cython) has some features that cannot be transformed into some target languages. With CyML, different processes provided by different platforms can be represented and composed regardless of the platforms, which makes it possible to define a new white-box component that is reusable by other platforms. CyMLT provides a reuse approach that is opposite to a black-box approach where the composition of model components is bound to the execution platform targeted by its modules (Van Evert et al. 2005).
CyML does not interact with the simulation paradigms of the platforms. Its sole concern is to represent and transform the process models. Its evaluation capabilities are only used to check the correctness of the transformation. Moreover, CyML does not provide a formalism to link model components with data to build a modelling solution. Thus, the processes that read inputs and parameter values and write output values to a file are separated from the algorithm implementation, because mixing them with the algorithm reduces reusability.
Although CyML focuses on the implementation and reuse of biophysical models, it could be used for general purposes. Thus, any code that can be implemented with CyML features can be transformed into different languages without associated specification files.
4.5 Towards a standard language
The development of CyML and its transformation system addresses the need of the plant and crop modelling community to enhance research collaboration by improving the capacity to exchange and reuse PBM components. The theoretical interest of providing a common approach to implement model response has been demonstrated (Holzworth et al. 2014b). However, despite the success of simulation platforms around which different communities are built, and some proposals for declarative language implementations, the lack of a shared standard limits model reusability. This issue hampers PBM intercomparison and improvement. The availability of CyMLT through AMEI will allow a large community to be built around this system and can make CyML a standard language providing a means to seamlessly compare independent biophysical processes or promote alternative approaches.
4.6 Future developments
Several modellers have expressed their interest in extending CyMLT with other languages used by the plant and crop modelling community. The use of a well-annotated ASG with model specifications provides an intuitive representation of the model algorithms. This abstraction enables various analyses of the source code and the generation of different source code based on the target language features, software design and code conventions. With the flexibility offered by the ASG, future work can explore the extension of CyMLT with other imperative programming languages such as Matlab, Julia and JavaScript, or with other modelling platforms that use imperative languages.
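The idea of generating target code from an intermediate representation can be illustrated with a small Python sketch. It uses Python's standard `ast` module as a stand-in for the ASG and emits a Java-like method; it is not the CyMLT implementation. The function names (`emit_expr`, `to_java`), the assumption that every variable is a `double`, and the restriction to a single `return` statement are all simplifications made for this example.

```python
import ast

# Arithmetic operators supported by this sketch and their Java spelling.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def emit_expr(node):
    """Emit a fully parenthesised Java expression for a small expression subset."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left, right = emit_expr(node.left), emit_expr(node.right)
        return f"({left} {_OPS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        # Emit numbers as floating-point literals so that "/" keeps the same
        # meaning in the target (no integer division).
        return repr(float(node.value))
    # Explicit error message for anything outside the supported subset.
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def to_java(source):
    """Transform a one-line Python function into a static Java method (sketch)."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise NotImplementedError("expected a function definition")
    if len(func.body) != 1 or not isinstance(func.body[0], ast.Return):
        raise NotImplementedError("expected a single return statement")
    if func.body[0].value is None:
        raise NotImplementedError("expected a returned expression")
    params = ", ".join(f"double {arg.arg}" for arg in func.args.args)
    body = emit_expr(func.body[0].value)
    return f"public static double {func.name}({params}) {{ return {body}; }}"


if __name__ == "__main__":
    src = "def gdd(tmax, tmin, tbase):\n    return (tmax + tmin) / 2 - tbase"
    print(to_java(src))
```

Supporting another target language in such a design amounts to adding another emitter over the same tree, which is the kind of extension discussed above.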
Reuse of legacy PBM model components without the need to encode them into CyML could reduce the investment in model exchange and could increase the interest of the platforms. Therefore, the next step would be to provide a transpiler that transforms legacy model components from various languages and simulation platforms into CyML code automatically. Such a many-to-many transformer would provide a complete system of interoperability of languages and simulation platforms.
CyMLT aims to enable the exchange and reuse of components between modelling platforms, notably between PBM and functional-structural plant modelling (FSPM) platforms. While crop growth models simulate plant growth and development at the scale of the canopy (m²) or average plant level, FSPMs are individual-based models at the scale of the organ. The exchange (sharing) of model components between PBMs and FSPMs would allow an efficient coupling of these two modelling approaches to model crop species or variety mixtures by capturing spatial heterogeneities and quantifying plant traits involved in crop mixture performance (Gaudio et al. 2019). Another application is the use of FSPMs in a model-driven phenotyping approach, where plant structural traits are estimated by reverse engineering an FSPM (Liu et al. 2019) and are then used as crop model input parameters to simulate the behaviour of genotypes in target agro-climatic scenarios. Currently, CyML only allows for the representation of processes as functions and does not consider the plant’s structure. Extending CyML to the FSPM community will require extending the CyML language and CyMLT to support complex data structures such as 3D geometry and topology.
The convergence of our approach to model reuse and reproducibility with other collaborations, like the Crops in Silico collaboration (Marshall-Colon et al. 2017), would greatly accelerate the development of the next generation of PBMs. The Crops in Silico collaboration aims at integrating model frameworks to build a complete crop in silico from the level of the genes to the level of the field or ecosystem using a software package, Yggdrasil (Lang 2019). Yggdrasil connects PBMs across programming languages by running models asynchronously in parallel. It requires writing wrappers in the different languages to process the asynchronous messages that manage model inputs and outputs. CyMLT may interact with Yggdrasil (i) by making model components available in the languages supported by Yggdrasil together with their wrappers; (ii) by producing efficient component source code in various languages in order to improve the performance of the simulation in Yggdrasil; and (iii) by validating each component with unit tests before its integration. The interaction between CyML and Yggdrasil could enhance the integration of PBMs across different languages and scales. A complementary approach to the one presented here was demonstrated for the automated transformation of input files of four agricultural models (Samourkasidis and Athanasiadis 2020), enabling the discovery and reuse of data across modelling solutions. Together with AMEI, these approaches could ensure that a complete model implementation and its accompanying data can be transformed between modelling solutions.
5. CONCLUSIONS
In this study, we defined a minimal language based on the Cython language to implement biophysical processes involved in plant and crop growth and development. We designed a system that transforms CyML source code into many target languages and simulation platforms. The association of model specifications in an XML-based format with the description of the model algorithm based on CyML specifications makes it possible to annotate each variable used in the algorithm. With this approach we can produce code with different programming paradigms, including an object-oriented approach, and with different software designs. We showed that this language is sufficient to express biophysical processes and to transform them into different target languages and simulation platforms. We argue that the abstract language offers a trade-off between the generality due to the convergence of the platforms and the complexity hidden in each platform. Crop modellers need some programming skills to implement a model in CyML, but no other skills are needed to automatically produce model component source code in various languages and platforms. This reuse approach will help modellers to improve the reproducibility and reuse of their models and should enhance research collaborations and model improvement and use.
SUPPORTING INFORMATION
The following additional information is available in the online version of this article:
Table S1. Mapping of basic data types between CyML and the languages supported by CyMLT.
Table S2. Mapping of arithmetic operators between CyML and the languages supported by CyMLT.
Table S3. Precedence pecking order in CyML language and the languages currently supported by CyMLT.
Table S4. Mapping of built-in functions between CyML and the languages supported by CyMLT.
Table S5. Mapping of flow control statements between CyML and the languages supported by CyMLT.
Listing S1. A Crop2ML model specification for the shoot number model.
Listing S2. CyML code of the shootnumber unit model of the WheatPhenology composite model.
ACKNOWLEDGEMENTS
C.A.M. acknowledges the support of INRAE Division AgroEcoSystem and NUM. P.M. acknowledges the support of INRAE Division AgroEcoSystem.
DATA AVAILABILITY
The CyMLT source code is publicly available on GitHub at https://github.com/AgriculturalModelExchangeInitiative/PyCrop2ML. Full documentation for CyML and CyMLT can be found at https://pycrop2ml.readthedocs.io.
SOURCE OF FUNDING
C.A.M. was supported through a PhD scholarship from the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004. C.P. was partially supported by the H2020 IPM Decision #817617. I.N.A. was partially supported by the European Union Horizon 2020 Research and Innovation program (grant #810775, DRAGON). The work of CREA was carried out in the frame of the project AGRIDIGIT – Digital Agriculture, funded by the Italian Ministry of Agriculture.
CONFLICT OF INTEREST
None declared.
CONTRIBUTIONS BY THE AUTHORS
C.A.M. (Methodology; Investigation; Software; Writing – original draft). C.P. (Conceptualization; Software; Supervision; Writing – original draft; Writing – review & editing). I.N.A. (Conceptualization; Writing – review & editing). M.D. (Conceptualization; Writing – review & editing). A.E. (Conceptualization; Software; Writing – review & editing). D.F. (Conceptualization; Software; Writing – review & editing). F.G. (Supervision; Writing – review & editing). D.H. (Conceptualization; Software; Writing – review & editing). G.H. (Conceptualization; Writing – review & editing). C.P. (Conceptualization; Software; Writing – review & editing). H.R. (Conceptualization; Software; Writing – review & editing). P.T. (Conceptualization; Writing – review & editing). P.M. (Conceptualization; Supervision; Project administration; Writing – original draft; Writing – review & editing).