Decentralized Data Flows for the Functional Scalability of Service-Oriented IoT Systems

Analysis of related work.

Approach	Separation of data and control	Functional scalability	Decentralized dataflows
Orchestration with coordinated data exchanges	$\times$	$\times$	$✓$
P2P dataflows	$\times$	$\times$	$✓$
P2P choreography	$\times$	$\times$	$✓$
Our approach	$✓$	$✓$	$✓$

Approach	Separation of data and control	Functional scalability	Decentralized dataflows
Orchestration with coordinated data exchanges	$\times$	$\times$	$✓$
P2P dataflows	$\times$	$\times$	$✓$
P2P choreography	$\times$	$\times$	$✓$
Our approach	$✓$	$✓$	$✓$

Table 1

Analysis of related work.

Approach	Separation of data and control	Functional scalability	Decentralized dataflows
Orchestration with coordinated data exchanges	$\times$	$\times$	$✓$
P2P dataflows	$\times$	$\times$	$✓$
P2P choreography	$\times$	$\times$	$✓$
Our approach	$✓$	$✓$	$✓$

Approach	Separation of data and control	Functional scalability	Decentralized dataflows
Orchestration with coordinated data exchanges	$\times$	$\times$	$✓$
P2P dataflows	$\times$	$\times$	$✓$
P2P choreography	$\times$	$\times$	$✓$
Our approach	$✓$	$✓$	$✓$

In the analysis above, we did not consider decentralized orchestration, coordinated dataflows and choreographed orchestration, since those approaches do not support decentralized data flows between atomic services. Nevertheless, for completeness, we analyse those categories below.

In approaches built on top of decentralized orchestration, multiple composite services coordinate system execution [9, 10, 12, 32]. In particular, approaches like [32] persist data on distributed data spaces, which may become a bottleneck in IoT environments where services exchange huge volumes of data continuously. Although some works (e.g. [12]) solve the bottleneck issue by persisting references instead of values, they require the maintenance of tuple spaces and databases for moving references and storing data, respectively. Moving references is necessary because decentralized orchestration does not separate data and control. Paradoxically, decentralized orchestration does not support decentralized dataflows between atomic services.

In coordinated dataflows [33], a master composite coordinates data flows between slave composites. As data pass through multiple mediators, this approach introduces unnecessary network hops (like decentralized orchestration). This problem is also present in [34], which introduces the notion of abstract data types for moving data outside services. Discovery Net [35] and Kepler [36] are workflow systems that implement the semantics of coordinated dataflows, which provide mechanisms to coordinate the execution of (disjoint) data flow graphs. Discovery Net allows the definition of a control flow graph where nodes exchange control tokens instead of data values. Nodes can be data flow graphs per se. In Kepler, a director is a semantic entity that controls data exchanges from outside a data flow graph. These systems do not support decentralized data flows because their workflow execution mechanisms are centralized [9].

In choreographed orchestration, data pass through multiple orchestrator participants leading to network degradation [37]. To separate control and data, [8] proposes a choreographed orchestration approach that allows the manual modelling of cross-partner data dependencies. However, there are no decentralized data flows between atomic services, but only between partners.

1.2. Paper contributions

This paper proposes a functionally scalable approach that semantically separates data and control, in order to decentralize data flows between atomic services in IoT systems. Accordingly, this paper makes the following contributions:

We propose an (implementation-independent) service composition model that semantically separates data and control by so-called data forests. The semantics allows the encapsulation of explicit control flow graphs and explicit data dependency graphs. As graphs are orthogonal, the semantics enables a separate reasoning and analysis of data and control.
We propose an efficient algorithm that leverages the model semantics for a compositional analysis of data forests. The algorithm analyses a data dependency graph, without considering control flow, for the automatic construction of a decentralized mapping between data consumers and data producers. The algorithm reduces both the number of data flows and network traffic of an IoT system.
Unlike the state-of-the-art on decentralized data flows, we evaluate the proposed approach in terms of functional scalability. The evaluation is implementation-independent and shows that the approach scales well with the number of services by reducing the number of data flows linearly and network traffic logarithmically.

1.3. Paper organisation

The rest of the paper is organized as follows. Section 2 presents a motivating scenario to explain why decentralized data flows are a crucial desideratum in the IoT domain. Section 3 presents an overview of the model semantics and a detailed description of the proposed approach. Section 4 presents the implementation of the proposed solution. Section 5 presents a case study. Section 6 presents a comparative evaluation of decentralized data flows versus distributed data flows in the case study. Section 7 outlines an evaluation of the proposed approach in terms of functional scalability. Section 8 outlines the conclusions. Appendices A and B formalise our motivating scenario. Appendix C describes an example of the deployment-time process. Appendix D presents a source code fragment of our implementation. Appendix E presents the notation used throughout the paper.

2. WHY DO WE NEED DECENTRALIZED DATA FLOWS IN IOT?

This section discusses the rationale for enabling decentralized data flows in IoT, with the help of a scenario based on the smart parking system presented in [1] and real-world occupancy data from car parks in Birmingham, UK (Appendix A). The system is used by drivers who want to find the nearest parking space while they drive around a smart city. The workflow for this scenario is a combination of the reduce and the sequential data patterns and involves the services shown in Figure 1. Although data pass through operations, we consider that it is routed between services as there is a one-to-one relationship between services and operations. For simplicity, we assume that services are deployed on different things, e.g. the Availability Checking Service is deployed on an infostation while the Space Finding Service is deployed on an edge router. An infostation is an urban infrastructure device that collects up-to-date status from all occupancy sensors in range. We assume that all parking spaces are equipped with occupancy sensors whose functionality is abstracted by a Sensor Service.

Figure 1

Workflow of the smart parking system.

The workflow of our scenario is triggered by a driver’s request, and it starts with two parallel tasks. In the first one, the Data Reducer 2 receives the driver’s coordinates from the GPS Service. In the second one, the Data Reducer 1 pulls the states from multiple Sensor services to create a list that shows the status of parking spaces. Then, the Availability Checking Service uses the list to determine which parking spaces are free. As some spaces can be reserved in advance, the Reservation Service determines which free parking spaces are unreserved and passes its resulting list to the Data Reducer 2. Once the Data Reducer 2 receives all its inputs, it forwards them to the Mapping Service. The Mapping Service then computes the distance between each (free and unreserved) space and the driver’s location and appends the distances to the list. Finally, the Space Finding Service determines the nearest parking space for the driver, using the calculations from the Mapping Service. The coordinates of the nearest parking space are the final result of the workflow and are passed to the car control application.

As the number of occupancy sensors could be huge, especially in large cities, the services of the smart parking system may potentially exchange vast amounts of data continuously. This is because there is a one-to-one mapping between sensors and services, and occupancy sensors are frequently producing data. Thus, functional scalability becomes a challenging concern. To understand how we can address this problem, we analyse the three existing approaches for passing data between the services of our scenario (Fig. 2). For simplicity, we do not show control flow and we assume that there are no data required for both pulling sensor data and starting the execution of the system (from the car control application).

Figure 2

Data flow approaches for the smart parking system.

In our scenario, there are 373 drivers arriving to the car park BHMBCCMKT01 between 07:59:45 and 16:26:47 on a given day (Appendix A). This car park has 577 occupancy sensors each producing 100 bytes, so there are $577 \cdot 100 = 57.7$ KB of sensor data per driver request. The number of bytes returned by the Availability Checking Service, the Reservation Service and the Mapping Service are computed by the functions $S_{A} (t)$ ⁠, $S_{R} (t)$ and $S_{M} (t)$ ⁠, whose definitions are presented in Appendix B. For simplicity, we assume that the GPS Service and the Space Finding Service always return $40$ bytes for the coordinates of a driver and a nearest space, respectively.

We refer the reader to Appendix A for details about the data set and the formalisms used in our scenario. Appendix B presents the calculations for the total data transmitted over the network, considering that there are $373$ drivers between 07:59:45 and 16:26:47. These calculations are presented for each data exchange approach depicted in Figure 2. Below we interpret the results.

A centralized approach [5, 38, 39] depends on a single coordinator for passing data between services. Figure 2(a) shows how a central coordinator mediates data streams for the smart parking system. The main drawback of this approach is that such a coordinator is a potential bottleneck (as huge amounts of data are exchanged) and introduces an extra network hop for passing data. This approach is therefore a threat for scalability as pointed out by many researchers [7, 8, 11, 14]. Furthermore, IoT may potentially require the deployment of coordinators on resource-constrained things (e.g. edge devices), which can lead to bottlenecks due to the presence of low computing power and poor network resources [7]. For instance, the central coordinator of our scenario is deployed on an edge device, which manages $\sim$ 102.98 MB of data over the network using a poor connection (see Equation 5 in Appendix B). Certainly, this impacts the overall performance of the system negatively. This is especially true in large cities with multiple drivers requesting a space at the same time.

Although a distributed approach [33, 40, 41] removes the central coordinator, it introduces unnecessary network traffic as data passes through multiple mediators, even if data is unimportant for them. For example, in Figure 2(b) the data generated by the Availability Checking Service goes through Coordinator1 and then through Coordinator2, before reaching the service that really needs the data (i.e. the Reservation Service). These additional network hops negatively impact the overall performance of the system.

For our scenario, the distribution of data load among three coordinators avoids a single bottleneck. However, as coordinators mediate interactions, $\sim$ 123.36 MB of data is exchanged over the network (see Equation 5 in Appendix B), leading to $\sim$ 19.80% more network traffic than the centralized approach (see at the bottom of Appendix B). In fact, this indicates that the network traffic may increase linearly with the number of coordinators.

The decentralized data exchange approach [7, 8, 11, 13] is the most efficient one, since data are passed directly to the services that actually need it. Particularly, it requires only one network hop to pass data from a data producer to a data consumer, as depicted in Figure 2(c). The decentralized version of our scenario requires a total data transfer of $\sim$ 51.49 MB over the network (see Equation 7 in Appendix B), i.e. $\sim$ 50.00% less network traffic than the centralized approach and $\sim$ 58.26% less traffic than the distributed one (see at the bottom of Appendix B).

3. THE DX-MAN MODEL

To realize decentralized data flows, we propose to extend the semantics of DX-MAN [42–44], which is an algebraic model for building IoT systems, where services and exogenous composition operators are first-class semantic entities (Fig. 3).

Figure 3

DX-MAN Model.

A service $S \in S$ is a stateless distributed unit of composition that can be atomic or composite. Its semantics is a workflow space $W \in W$ ⁠, which is a (finite or infinite) set of workflow control flow variants $w_{i \in [1, \infty]} \in W$ that represent alternative service behaviours:

\begin{aligned} S \equiv W \equiv {w_{1}, w_{2}, \dots} \end{aligned}

(1)

where $S$ is the service type and $W$ is the workflow space type.

An atomic service is syntactically formed by connecting an invocation connector with a computation unit which encapsulates the implementation of multiple operations and it is not allowed to invoke other units (Fig. 3(a)). Semantically, this is equivalent to constructing an atomic workflow space whose variants invoke different operations in the computation unit via the invocation connector. An operation performs some computation and has an input parameter and an output parameter.

The DX-MAN model relies on algebraic composition for defining complex services. Algebraic composition is the process by which a composition operator hierarchically composes multiple services of type $S$ into a composite service of type $S$ which, in turn, can be further composed into even more complex composites (Fig. 3(b)) [44]. A composition operator defines control flows exogenously (i.e. outside the composed services) to avoid service dependencies. Formally, it is a function $◯$ for $n$ services with the following type:

\begin{aligned} ◯ : S^{n} \to S \end{aligned}

(2)

A composite service is syntactically formed by connecting a composition operator with multiple (atomic and/or composite) services, which has an input parameter and an output parameter (Fig. 3(c)). Semantically, it is a composite workflow space produced by a composition operator taking sub-workflow spaces as operands. There are composition operators for sequencing (i.e. sequencer), branching (i.e. inclusive selector and exclusive selector) and parallelism (i.e. parallelizer). A sequencer or a parallelizer defines an infinite workflow space, whilst a branching operator defines a finite one.

The control flow structure of a composite service is represented by an abstract workflow tree whose leaves are operations in atomic sub-services, whole composite sub-services or any combination thereof (Fig. 4). An abstract workflow tree allows the definition of a particular variant from a composite workflow space. To do so, a concrete workflow tree must be created, which is a selection function over a set of workflow variants, and it is therefore isomorphic to an abstract workflow tree. The edges of a concrete workflow tree are labelled according to the composition operator used. In particular, the label of a sequencer edge is an ordered list of natural numbers and the label of a parallelizer edge is a natural number (representing the amount of parallel tasks). For the edges of an inclusive selector and an exclusive selector, the labels are conditions and boolean values, respectively. An exclusive selector also has an associated global condition. For further details on workflow selection, see [42, 43].

$Abstract workflow trees for the composite service shown in Figure 3, when $\bigcirc _C$ is (a) a sequencer operator, (b) a parallelizer operator, (c) an inclusive selector operator or (d) an exclusive selector operator.$

Figure 4

Abstract workflow trees for the composite service shown in Figure 3, when $◯_{C}$ is (a) a sequencer operator, (b) a parallelizer operator, (c) an inclusive selector operator or (d) an exclusive selector operator.

3.1. Separation of control flows and data flows

IoT service composition can be endogenous or exogenous, depending on control flow origin [45]. Endogenous composition is used by P2P choreographies, which mix service computation (i.e. operations) with control flow constructs (Fig. 5(a)). By contrast, exogenous composition relies on coordinators that define control flow outside the computation of composed services (Fig. 5(b)). This composition approach is used by orchestration and by DX-MAN. Unlike its counterpart, exogenous composition avoids control flow dependencies between services and facilitates reuse at scale [29, 36, 44, 46, 47]. It also avoids application logic being embedded in the computation of multiple atomic services, thereby facilitating tracking and monitoring, which are crucial desiderata of functional scalability [1, 48].

Figure 5

Endogenous composition versus exogenous composition.

Despite the above advantages, exogenous composition suffers from performance issues when data follow control [11, 14, 49]. This is because coordinators mediate data streams even for data that are unimportant for them (Figs 2(a) and (b)).² To address this problem, we propose to define control flow and data flow as orthogonal dimensions (Fig. 6). The idea is that coordinators (i.e. composition operators) never exchange data during workflow execution, but only control.

Figure 6

Possible DX-MAN dimensions. This paper proposes an approach for realizing (b).

In DX-MAN, a workflow is defined by a concrete workflow tree (Fig. 7(b)). It has an input parameter and an output parameter and describes a series of steps for the invocation of operations in atomic services, sub-workflows or any combination thereof.³ Figure 7(c) shows an example of a sequential workflow for the invocation of the operation opA (provided by the atomic service $S_{A}$ ⁠) and the operation opB (provided by the atomic service $S_{B}$ ⁠). This workflow variant results from the composition depicted in Figure 7(a) where the atomic services $S_{B}$ and $S_{C}$ are composed into the composite $S_{D}$ by the sequencer operator $◯_{D}$ ⁠. Likewise, the sequencer operator $◯_{E}$ composes $S_{A}$ and $S_{D}$ into the top-level composite $S_{E}$ ⁠. For each composite, we select a workflow with a different concrete workflow tree. Note that a DX-MAN composition is done in a hierarchical bottom-up manner and, although there are infinite sequential workflow variants, we describe only one of them for the sake of this paper. Also note that the workflow invoking opB is a sub-workflow of the top-level one. This is because $S_{D}$ is composed into $S_{E}$ ⁠.

Figure 7

Relationship between DX-MAN compositions and IoT workflows.

A workflow execution traverses a control flow tree from top to bottom and then returns backwards (Fig. 8). Thus, the execution of the workflow depicted in Figure 7(c) starts with the activation of $◯_{E}$ ⁠, which sequentially passes control to the invocation connector $A$ and then to $◯_{D}$ ⁠. The latter just forwards control to the invocation connector $B$ ⁠. Once they are triggered, invocation connectors execute an operation in their computation unit.

Figure 8

Possible executions of the workflow shown in Figure 7(c) when control flows and data flows are (a) mixed or (b) orthogonal. In (a), composition operators (i.e. coordinators) mediate data streams, even if data are unimportant for them. In (b), data are passed directly between atomic services. This paper proposes an approach to realize (b).

If data follow control, it passes through the composition connectors $◯_{E}$ and $◯_{D}$ (Fig. 8(a)). Thus, to avoid such inefficient executions, we propose an approach that enables decentralized data exchanges between atomic services by the separation of control and data (Fig. 8(b)). Our approach is described below.

3.2. Design-time: (semantic) data forest definition

A DX-MAN system is a workflow control flow variant with an orthogonal data forest. The latter is a graph of directed data dependency trees defined on service composition at design-time, whose edges define an explicit relationship between parameters. Formally, a data forest is a graph $F := (V (F), E (F))$ s.t. $V (F)$ is a (finite or infinite) set of parameter vertices and $E (F)$ is a (finite or infinite) set of edges. As a vertex $v \in V (F)$ is of type $D$ ⁠, an edge $e \in E (F)$ is a tuple of type $D \times D$ where $D$ is the parameter type.

For a parallelizer, the workflow input is connected to the inputs of all the invoked elements (i.e. operations or sub-workflows) whose outputs are, in turn, connected to the workflow output. In this paper, we only describe how parallel and sequential data forests are formed, since decentralized data flows are only meaningful for parallelism and sequencing. For a sequencer, the connection of vertices forms a data pipeline, so the output of an invoked element is connected to the input of the next invoked element. Additionally, the workflow input is connected to the input of the first invoked element, and the output of the last invoked element is connected to the workflow output.

Figure 9 analyses thoroughly the composition depicted in Figure 7. It shows the sequential workflow variants and the data forests for the composites $S_{D}$ and $S_{E}$ ⁠. Such forests conform to the rules described for a sequencer. The analysis is segmented because every composite in the DX-MAN model is a black box that can have any possible behaviour out of the alternative ones. Thus, as there are no (bridge) connections between data dependency trees of different forests, data are encapsulated in composite services. Therefore, composites (with all their workflows and data forests) are reusable across different IoT systems.

Figure 9

Separation of control and data in sequential composites. Control constructs and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.

Reusability is also present in the composite depicted in Figure 10(a), which shows an example where the parallelizer $◯_{Z}$ composes the atomic services $S_{X}$ and $S_{Y}$ into the composite $S_{Z}$ ⁠. In this example, we define a concrete workflow tree for executing all the sub-service operations in parallel. The resulting workflow has a fork construct to pass control in parallel to opX and opY, and a join construct to synchronise control flow. Choosing this workflow implies that there is a data forest conforming to the rules previously described for a parallelizer (Fig. 10(b)).

Figure 10

Separation of control and data in a parallel composite. Parallel constructs (i.e. fork and join) and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.

Although decentralized data flows are only meaningful for parallelism and sequencing, for completeness we describe how branchial data forests are formed. For both exclusive and inclusive selectors, the corresponding connection scheme is isomorphic to the one defined for a parallelizer in which the workflow input is connected to the inputs of all the invoked elements, and the outputs of the elements are connected to the workflow output (Fig. 10(b)). This connection is done to ensure that there is an output value no matter the execution branch taken.

3.3. Deployment-time: data forest refinement

At deployment-time, a data forest can be (manually) refined in three different ways: (i) by adding or removing edges, (ii) by adding input data into exogenous operators or (iii) by introducing data processors. These refinements are entirely optional with the overall aim of providing extra flexibility for system modellers. To understand them, we consider the data forest of the composite $S_{E}$ (shown in Fig. 9):

\begin{array}{r} F_{E} := ({i E, o E, i A, o A, i D, o D}, \\ {(i E, i A), (o A, i D), (o D, o E)}) \end{array}

where $i E, o E, i A, o A, i D, o D \in D$ ⁠.

For all the scenarios, we show how edges and vertices change with respect to the original data forest (Fig. 11(a)).

Figure 11

Data forest refinement.

3.3.1. Edges refinement

Refining edges is useful for changing data relationships and optimizing data flows. For instance, the edge $(i E, i A)$ can be removed when the operation opA does not need any data to perform computation. Similarly, by replacing $(o A, i D)$ with $(i E, i D)$ ⁠, the sub-workflow $S_{D}$ receives data from the top-level input instead of getting it from opA (Fig. 11(b)). Formally, $E (F_{E})$ becomes $E (F_{E}) ∖ {(i E, i A), (o A, i D)} \cup {(i E, i D)}$ ⁠.

3.3.2. Exogenous operator refinement

Although they do not perform any computation, exogenous operators might require input values to evaluate boolean conditions at run-time. By default, exogenous operators do not have any parameters, but parameters can be added for a specific workflow. To do so, new vertices are added into the respective data forest and connected manually at the discretion of the modeller. Edge connections must conform to the rules described in [49]. Figure 11(d) shows an example of this kind of refinement, where $i O \in D$ represents the input parameter of the composition operator that coordinates the execution of the workflow $S_{E}$ ⁠. Formally, $V (F_{E})$ becomes $V (F_{E}) \cup {i O}$ and $E (F_{E})$ becomes $E (F_{E}) \cup {(i E, i O)}$ ⁠.

3.3.3. Data processor refinement

Data processors can be introduced in a data forest to perform intermediate processing. A data processor is a function that receives at least one input data, performs some custom computation and stores result(s) in at least one output (Fig. 11(c)). By introducing data processors, we avoid tangling with the original workflow control flow of an IoT system. Although the DX-MAN model currently supports the most common data patterns (i.e. map-reduce and filtering), other data processors can be defined using the same semantics. For instance, a replicator could be introduced to copy data to multiple parameters.

Figure 11(e) customises $F_{E}$ by filtering the data generated by opA before sending it to the sub-workflow $S_{D}$ ⁠. Formally, $V (F_{E})$ becomes $V (F_{E}) \cup {i F, o F}$ and $E (F_{E})$ becomes $E (F_{E}) ∖ {(o A, i D)} \cup {(o A, i F), (o F, i D)}$ s.t. $i F, o F \in D$ are the filter parameters. Figure 11(f) shows another refinement, where a mapper collects data from both opA and the parent workflow to create a list of transformed key-value pairs. The list is then processed by a reducer which produces an output in a suitable format for $S_{D}$ ⁠. Formally, $V (F_{E})$ becomes $V (F_{E}) \cup {i M, i M^{'}, o M, i R, o R}$ and $E (F_{E})$ becomes $E (F_{E}) ∖ {(i E, i A), (o A, i D)} \cup {(i E, i M^{'}), (o A, i M), (o M, i R), (o R, i D)}$ s.t. $i M, i M^{'}, o M \in D$ are the mapper parameters and $i R, o R \in D$ are the reducer parameters.

3.4. Deployment-time: data forest analysis

Once data forests have been defined and refined (if needed), a direct mapping between consumer parameters and producer parameters is created. To describe our approach, some formal notations are provided.

Notation 1.

Let $Π_{1} : D \times D \to D$ be the tail map of a forest edge.

Notation 2.

Let $Π_{2} : D \times D \to D$ be the head map of a forest edge.

Notation 3.

Let $PI$ be the processor input type, $PO$ the processor output type, $OI$ the operation input type, $OO$ the operation output type, $WI$ the (top-level) workflow input type, $WO$ the (top-level) workflow output type and $EI$ the type of an exogenous operator input s.t. $PI, PO, OI, OO, WI, WO, EI \subset D$ ⁠.

Notation 4.

Let $V_{c}$ be the type of vertices that consume data on workflow execution, namely service operation inputs, exogenous operator inputs, data processor inputs and top-level workflow outputs s.t. $V_{c} := OI \cup EI \cup PI \cup WO$ ⁠. Top-level workflow outputs contain the resulting data from an IoT system execution.

Notation 5.

Let $V_{p}$ be the type of vertices that produce data on workflow execution (i.e. operation outputs and data processor outputs) as well as vertices representing input data of top-level workflows s.t. $V_{p} := OO \cup PO \cup WI$ ⁠. Top-level workflow inputs are the data needed for an IoT system execution.

Notation 6.

Let $ς \subset (V_{c} \times V_{p})$ be a binary relation (i.e. a mapping) between some consumer parameters and some producer parameters.

Notation 7.

Let $ρ \subset (V_{p} \times V_{c})$ be a binary relation (i.e. a mapping) between some producer parameters and some consumer parameters.

Our approach analyses the edges of all the data forests of an IoT system, from bottom to top with a complexity of $O (k)$ s.t. $k$ is the total number of edges in a multi-level composition (Fig. 12). As they do not require any data manipulation, workflow parameters (i.e. composite service parameters) are discarded by transforming the binary relations $ς$ and $ρ$ each time an edge is analysed. When all edges have been processed, $ς$ is the result of our approach, i.e. a direct mapping from data consumers to data producers.

Figure 12

Bottom-up analysis of data forests: from level 0 to level $L$ ⁠. The algorithm analyses the data forests of the composite services at level $0$ ⁠, then the data forests of the composite services at level $1$ ⁠, and so on until reaching the data forests of the top-level composite at level $L$ ⁠.

We have designed Algorithm 1 to analyse individual data forest edges. The algorithm uses the sets $X_{p} \subset D$ and $Y_{c} \subset D$ to process the endpoints of an edge $e \in D \times D$ ⁠. The elements of these sets are determined according to two fundamental rules. The first one is shown in lines 4–7. It states that $X_{p}$ is the set of vertices connected to the tail parameter $Π_{1} (e)$ if the tail is not a data processor parameter and has incoming connections; otherwise, $X_{p}$ only contains the tail. The second rule is shown in lines 8–13. It states that if the head parameter $Π_{2} (e)$ is not in a data processor and has outgoing connections, then (i) $Y_{c}$ has the vertices connected from the head parameter and (ii) $X_{p}$ (without the head parameter) is the set of additional producers for each consumer $y \in Y_{c}$ ⁠. Otherwise, $Y_{c}$ just contains the head parameter $Π_{2} (e)$ and the elements of $X_{p}$ are additional producers of $Π_{2} (e)$ ⁠. After processing the above rules, the relation $ρ$ is updated to map each producer $x \in X_{p}$ to each consumer $y \in Y_{c}$ ⁠. Appendix C presents a simple example using this algorithm. A more detailed example is presented in Section 5.

4. IMPLEMENTATION

The DX-MAN platform [50] implements the semantics of the DX-MAN model and provides Java APIs to algebraically compose, deploy and execute IoT systems. Our approach is implemented on top of that platform and uses the Blockchain as the underlying decentralized data space. Please note that a Blockchain platform is just a possible implementation of a decentralized data space. Other possible implementations include OpenLink Data Spaces (ODS), ZeroMQ messaging, a shared memory for IoT or even Apache Storm.⁴ We chose a Blockchain implementation to ensure data ownership for every service and data provenance for discovering data flow histories. Furthermore, we guarantee an extra layer of performance, security and auditability. It is important to note that we do not claim any contributions in terms of implementation. Our implementation is just for validation purposes.

Our approach uses Hyperledger Fabric 1.2 as the underlying Blockchain infrastructure and Hyperledger Composer 0.20.0 to define a Blockchain model based on three smart contracts (Fig. 13). CreateParameters initialises the consumer parameters of an IoT system, whereas UpdateParameters and ReadParameters change and retrieve a list of parameter values from the data space, respectively. The implementation logic of the contracts is written in JavaScript and it is shown in Appendix D.

Figure 13

Blockchain model.

The DX-MAN platform provides the programming abstractions for composing IoT services (at design-time) and selecting IoT workflows (at deployment-time). When a workflow is selected, a data forest is automatically created for the respective control flow. Workflow selection is out of the scope of this paper; on this matter, we refer the reader to [42]. Also, to guarantee openness and replicability, the source code of the DX-MAN platform is available at https://github.com/damianarellanes/dxman.

Once workflows have been selected, Algorithm 1 analyses the edges of all the data forests involved to build a Java Hashmap (i.e. a binary relation $ς$ ⁠) with a consumer parameter UUID as the key and a list of producer parameter UUIDs as the value. The entries of the map are stored as Parameter assets in the Blockchain using the transaction CreateParameters. In particular, the field parameterId is the hashmap key and the field producers is the hashmap value (see lines 7 and 12 in Fig. 13).

At run-time, exogenous operators coordinate an IoT system execution by passing control only via CoAP messages.⁵ When an exogenous operator receives control, it performs one of the processes shown in Figure 14(b and c) only if input data are needed; otherwise, steps 1–2 are omitted. When control triggers an invocation connector, the latter performs the process shown in Figure 14(a). Firstly, the connector executes the transaction ReadParameters to retrieve the necessary inputs from the data space. For each input, the Blockchain directly consumes values from the producer list.

Figure 14

Steps performed by (a) an invocation connector and (b–c) an exogenous operator after receiving control. (b–c) are only applicable for exogenous operators that require input data. Other operators ignore the steps 1 and 2.

Producer data are stored in the Blockchain as soon as it is available, even before control reaches consumer services. To avoid run-time synchronisation problems, control flow blocks in an invocation connector until all needed input values are retrieved. Next, the connector executes the computation of an operation by passing it the respective data. The result is stored in the Blockchain via the transaction UpdateParameters.

UpdateParameters raises an UpdateParameterEvent after changing a parameter value. This is useful for data processor instances that subscribe to events notified by producer parameters. Thus, a data processor instance performs a user-defined computation after receiving all the events for its inputs. Currently, our implementation supports mappers, reducers and filters, but further processors (e.g. a replicator) can be implemented conforming to the semantics presented in Section 3.3.

Consumer parameters are unaware of data producers, and vice versa, since data references are stored in the Blockchain. Furthermore, the mapping generated by Algorithm 1 avoids the inefficient approach of passing data values through exogenous operators at run-time. Therefore, as consumer parameters read data directly from producers and they are unaware of each other, our approach enables a (transparent) decentralized data exchange.

5. CASE STUDY

This section describes how to enable decentralized data flows in the smart parking scenario presented in Section 2, by covering the three phases of a DX-MAN system life-cycle.

5.1. Design-time

At design-time, we use the syntactic constructs presented in Section 3 to define a multi-level composition structure for our smart parking application (Fig. 15). This process is done in a hierarchical bottom-up manner, starting with the composition of Sensor Services into SensorNet, via the parallelizer operator $◯_{N}$ ⁠. Afterwards, we use the sequencer operator $◯_{C}$ to compose the SensorNet composite, the Availability Checking Service and the Reservation Service into CityManagement. In the next level, the GPS Service and CityManagement are composed into Information via the parallelizer operator $◯_{I}$ ⁠. Finally, we use the sequencer $◯_{T}$ to compose Information, Mapping Service and Space Finding Service into CarControlApp (i.e. the top-level composite). Note that, according to the semantics presented in Section 3, each composite service is a workflow space with a (finite or infinite) set of workflow variants and a (finite or infinite) set of data forest variants. For clarity and conciseness, we do not present them. Instead, we refer the reader to our previous work on this matter [42, 43].

Figure 15

DX-MAN composition for the smart parking scenario.

5.2. Deployment-time

For each composite, we define a concrete workflow tree to explicitly choose a workflow and a data forest from the respective space (Section 3). In particular, for SensorNet there is a workflow that executes all sensor operations in parallel. For CityManagement, there is a workflow that sequentially invokes the SensorNet sub-workflow, the getFree operation and the getUnreser operation, in that order. For Information, there is a workflow that synchronises the execution of the getLocation operation and the CityManagement sub-workflow. Finally, in the top-level composite, the Information sub-workflow, the computeDis operation and the getNearest operation are triggered sequentially. The concrete workflow trees for each composite are shown in Figure 16 and the whole workflow of our scenario is depicted in Figure 17(a). The latter is just a concatenation of individual workflows.

Figure 16

Selecting workflows explicitly and data forests implicitly for the smart parking scenario.

Figure 17

Orthogonality between control flow (i.e. workflow) and data dependencies (i.e. data forests) in the smart parking scenario.

In Section 3, we mentioned that data forests are implicitly selected when workflows are chosen. On that basis, the resulting data forests for each composite service are shown in Figure 16 and the complete concatenation of data forests is illustrated in Figure 17(b). Note that both workflows and data forests are not manually created, but just selected using the syntactic representation of concrete workflow trees.

Once data dependencies are in place, some data forests are refined (Fig. 18(a)). In particular, we remove the edges between the inputs of the parallel composites (i.e. SensorNet and Information) and the inputs of their respective sub-services. Similarly, we remove the edges connecting the inputs of the sequential composites (i.e. CarControlApp and CityManagement) and the inputs of the parallel composites. Note that this does not mean that edges are always refined. In some scenarios, refinement never takes place.

Figure 18

Transforming data forests into a binary relation between consumer parameters and producer parameters.

After refinement, Algorithm 1 analyses the edges of each data forest from the bottom-level composite (i.e. SensorNet) to the top-level one (i.e. CarControlApp). The resulting binary relation between consumer parameters and producer parameters is illustrated in Figure 18(b).

5.3. Run-time

At run-time, composition operators coordinate the execution of the selected workflow by passing control only (Fig. 19(a)), while atomic services exchange data directly (Fig. 19(b)). In particular, when a driver requests a nearest parking space, $◯_{T}$ passes control to the $◯_{I}$ operator which, in turn, forks control towards the invocation connector $G$ and the $◯_{C}$ operator. When $G$ is triggered, it activates the getLocation operation before passing control back to $◯_{I}$ ⁠. Meanwhile, $◯_{C}$ forwards control to $◯_{N}$ for activating the invocation connectors of all the sensors. After receiving control from all of them, $◯_{N}$ notifies the $◯_{C}$ operator for executing the operations getFree and getUnreser via their invocation connectors. Once control returns from $R$ ⁠, $◯_{C}$ sends control back to $◯_{I}$ ⁠. If the latter has received control from $G$ and $◯_{C}$ ⁠, it notifies $◯_{T}$ for invoking the operations computeDis and getNearest via $M$ and $S$ ⁠, respectively. The execution of the smart parking app terminates when $◯_{T}$ receives control from $S$ ⁠.

Figure 19

Orthogonal control flows and data flows in the smart parking scenario.

Figure 19 illustrates the execution of the smart parking scenario, where it is clear that data never passes alongside control, but it flows in a decentralized manner according to the binary relation shown in Figure 18(b). It is important to note that control blocks in an invocation connector until all producer values become available (Section 4). For instance, $A$ waits until all sensor data have been produced, before executing the getFree operation; thereby, guaranteeing synchronisation between the Availability Checking Service and all Sensor Services. Situations like this can be common since control generally travels faster than data over the network, and they may happen even in sequential workflows. This implies that an operation can be executing while control waits in the invocation connector of the next service in a pipeline.

6. DISTRIBUTED DATA FLOWS VERSUS DECENTRALIZED DATA FLOWS IN THE SMART PARKING SCENARIO

This section presents a quantitative comparison between the proposed approach (when data are separated from control) and distributed data flows (when data follow control), in terms of network traffic generated and response time perceived. Our evaluation is based on two OMNET++ simulations in which composition operators and invocation connectors are OMNET++ modules, whereas control flow and data flow are channels between modules.⁶ For fairness, we assume ideal wireless channels communicating over IEEE 802.11a links at a rate of $54$ Mbps, with no delay and no bit error rate. The configuration of our simulations is shown in Figure 20.

Figure 20

OMNET++ configuration for the smart parking system. The simulation on distributed data flows do not use the channels between invocation connectors, whereas the simulation on decentralized data flows use all visible channels.

An OMNET++ experiment simulates drivers using the smart parking application between 07:59:45 and 16:26:47 (Appendix A). In all the experiments, we assume that there is no network traffic for invoking operations, since invocation connectors reside in the same device as their connected operations. Also, composition operators and invocation connectors do not incur any processing time since they only forward data or control. The latter is just a 1-byte signal that activates an operator or a connector.

For simplicity and clarity, we also assume that IoT devices hosting atomic services have a fixed data processing time of $100 μ$ seconds. This assumption is valid since having equal data processing rates does not influence the network traffic generated by atomic services.

Considering the above assumptions, the first experiment simulates decentralized data flows in the smart parking system. Here, control is exchanged among composition operators and invocation connectors, while data are passed between producer connectors and consumer connectors via ideal OMNET++ channels (Fig. 20). These channels simulate read-write operations on a decentralized space (regardless its implementation). These actions are performed by invocation connectors on behalf of their connected operations. In this experiment, the total network traffic from 07:59:45 to 16:26:47 is $51.93$ MB (Table 2). The additional traffic with respect to the calculation described in Appendix B corresponds to the bytes required for control exchange (i.e. $0.44$ MB).⁷ This extra traffic is minimal since control is just a 1-byte signal that does not carry any additional data.

Table 2

Network traffic in the smart parking scenario (from 07:59:45 to 16:26:47).

	Data traffic (MB)	Control traffic (MB)	Total (MB)
Decentralized	51.49	0.44	51.93
Distributed	140.41	0.22	140.63

Table 2

Network traffic in the smart parking scenario (from 07:59:45 to 16:26:47).

	Data traffic (MB)	Control traffic (MB)	Total (MB)
Decentralized	51.49	0.44	51.93
Distributed	140.41	0.22	140.63

The second simulation considers a distributed version of the smart parking scenario, in which data pass alongside control through invocation connectors and composition operators. The only exception is when control traverses from $◯_{T}$ to $G$ ⁠, and from $◯_{I}$ to the invocation connectors $1$ and 577 (Fig. 21(b)). This is because the GPS Service and the Sensor Services do not require any input data from the CarControlApp composite (Fig. 18(a)). Using distributed data flows, the total network traffic from 07:59:45 to 16:26:47 is $140.63$ MB (Table 2).

Figure 21

Decentralized data flows versus distributed data flows in the smart parking scenario.

Although our approach requires 50% more control messages than its counterpart, the network traffic is dramatically minimized by a rate of $2.71$ times, i.e. $(1 - \frac{51.93}{140.63}) \cdot 100 \approx 63.07 %$ less network traffic than the distributed version. Particularly, the main benefit occurs when data are passed directly from the Reservation Service to the Mapping Service, since the distributed approach requires four network hops whereas our approach requires only one (Fig. 21).

Reducing network traffic improves the overall response time of the smart parking system by an average factor of $4$ ⁠. This factor is computed as follows:

\begin{aligned} \frac{\sum_{i = 1}^{373} \frac{ψ_{i}^{α}}{ψ_{i}^{β}}}{373} \end{aligned}

(3)

where $ψ_{i}^{α}$ is the response time perceived by driver $i$ when requesting a parking space under distributed data flows and $ψ_{i}^{β}$ is the response time perceived by driver $i$ when requesting a parking space under decentralized data flows. The response times (in seconds) are individually collected from our simulations for the 373 drivers arriving between 07:59:45 and 16:26:47.⁸

As an example, Figure 21 shows the time window for the driver’s request at 08:26 (i.e. 1621 seconds after 07:59:45), where it is clear that the system is executed in 0.08 seconds using the distributed approach and in $0.02$ seconds with the decentralized version. This improvement is due to the fact that data and control travel independently over the network when decentralized data flows are used, so that control arrives before data in certain cases (e.g. see the timeline of $R$ ⁠) and data arrives before control in others (e.g. see the timeline of $M$ ⁠).

Table 2 presents the results of a quantitative evaluation that considers a few services. However, the benefit of our approach becomes more evident as the number of services increases. To evaluate this property, known as functional scalability, we conducted an experiment in which we gradually increase the data being processed by the smart parking application through the continuous addition of sensor services (up to 100 000). The aim of this experiment is to analyse network traffic in the order of Gigabytes. The results are shown in Figure 22.

Figure 22

Impact of increasing the number of sensor services in the smart parking scenario.

Figure 22 shows that our approach outperforms distributed data flows in terms of data traffic, whereas the distributed version outperforms our approach in terms of control traffic. Despite this, the total network traffic is linearly improved by our approach since control messages are minimal (i.e. they always oscillate below $0.08$ GB). Consequently, the total traffic and the data traffic are almost identical (cf., Fig. 22(a) and (c)).

The next section presents a generic evaluation of functional scalability that is independent of any implementation or specific case studies. In this evaluation, we do not only increase services horizontally (for the same composite) but also vertically (by increasing the number of composition levels).

7. EVALUATION OF FUNCTIONAL SCALABILITY

This section presents a comparative evaluation between decentralized data flows (when control flow and data flow are separated—Fig. 8(b)) and distributed data flows (when data follows control—Fig. 8(a)), in terms of functional scalability. To do so, two major research questions are studied: (RQ1) Does the proposed approach scale with the number of services? and (RQ2) Under which conditions are decentralized data flows beneficial?

To answer the above questions, we conducted a series of experiments based on the following statements.

Definition 7.1.

$G_{α} = (V_{α}, E_{α}, Ω_{α})$ is a weighted graph of distributed data flows in which $V_{α}$ is a set of parameters, $E_{α}$ is a set of directed data flows (between parameters) and $Ω_{α} : E_{α} \to R_{> 0}$ is a function that maps a distributed data flow to a network communication cost s.t. $V_{α} \subset (V_{p} \cup V_{c})$ ⁠.

Remark 1.

$V_{p}$ is the type of a producer parameter and $V_{c}$ is the type of a consumer parameter.

Definition 7.2.

$M = ⟨ e_{1}, \dots, e_{l} ⟩$ is the type of a finite walk in $G_{α}$ for moving a data value through $l$ data flows, from a producer parameter $v_{1} \in V_{p}$ to a consumer parameter $v_{l + 1} \in V_{c}$ s.t. $e_{j \in [1, l]} \in E_{α}$ and $v_{j \in [1, l + 1]} \in V_{α}$ ⁠, $\forall p \in (V_{α} \cap V_{p}) \exists M$ ⁠.

Definition 7.3.

Given

G_{α}

⁠,

α : M \to R_{> 0}

is the function that computes the total network cost of routing data from a producer parameter

p

to a consumer parameter

q

⁠, via distributed data flows. The function

α

is given as follows:

\begin{aligned} α (M^{'}) = \sum_{j = 1}^{| E (M^{'}) |} Ω_{α} (e_{j}) \end{aligned}

where

M^{'} \in M

⁠,

e_{j \in [1, | E (M^{'}) |]} \in E (M^{'})

⁠,

E (M^{'}) \subseteq E_{α}

⁠,

p \in (V_{α} \cap V_{p})

and

q \in (V_{α} \cap V_{c})

⁠.

Notation 8.

Let $ω : V_{p} \to R_{> 0}$ be the function that computes the network cost of writing/producing a parameter value on a data space.

Notation 9.

Let $Γ : V_{c} \to R_{> 0}$ be the function that computes the network cost of reading/consuming a parameter value from a data space.

Definition 7.4.

G_{β} = (V_{β}, E_{β}, Ω_{β})

is a weighted graph of decentralized data flows in which

V_{β}

is a set of parameters,

E_{β}

is a set of directed data flows (between parameters) and

Ω_{β} : E_{β} \to R_{> 0}

is a function that maps a decentralized data flow to a network communication cost s.t.

V_{β} \subseteq (V_{p} \cup V_{c})

⁠. The function

Ω_{β}

is defined as follows:

\begin{aligned} Ω_{β} (e) = (ω \circ Π_{1}^{β}) (e) + (Γ \circ Π_{2}^{β}) (e) \end{aligned}

where

e \in E_{β}

⁠,

Π_{1}^{β} (e) \in (V_{β} \cap V_{p})

and

Π_{2}^{β} (e) \in (V_{β} \cap V_{c})

⁠.

Remark 2.

The functions $Π_{1}^{β} : E_{β} \to V_{β}$ and $Π_{2}^{β} : E_{β} \to V_{β}$ are the tail map and the head map of an edge in $G_{β}$ ⁠, respectively.

Definition 7.5.

Given

G_{β}

⁠,

β : E_{β} \to R_{> 0}

is the total network cost of routing data from a producer parameter

p

to a consumer parameter

q

⁠, via a decentralized data flow. The function

β

is notably given as follows:

\begin{aligned} β (e) = Ω_{β} (e) \end{aligned}

where

e \in E_{β}

⁠,

p = Π_{1}^{β} (e)

⁠,

q = Π_{2}^{β} (e)

⁠,

p \in (V_{β} \cap V_{p})

and

q \in (V_{β} \cap V_{c})

⁠.

We assume that $Ω_{α}$ ⁠, $Γ$ and $ω$ include all network communication factors, such as marshalling, unmarshalling, latency and bandwidth, just to name a few.

7.1. Initial experimental setting

The worst performance of a distributed approach occurs when all composite services are sequential. So, for our evaluation we consider the DX-MAN composition shown in Figure 23, which has four hierarchy levels, four atomic services and three composites. The sequential workflow variants selected for each composite and their respective data forests are also presented in the figure.

Figure 23

Initial experimental setting.

Figure 24 shows the resulting workflow control flow and the data exchange approaches that we use in our experiments. For the distributed approach (Fig. 24(b)), we consider the graph $G_{α}$ in which data follows control and passes through workflow parameters via nine data flow edges with a $Ω_{α} (e_{j})$ cost each s.t. $e_{j \in [1, 9]} \in E_{α}$ ⁠. Particularly, $G_{α}$ has five data flow walks since the number of producer parameters is directly proportional to the number of walks: $M_{i G} = ⟨ e_{1} ⟩ \in M$ ⁠, $M_{o A} = ⟨ e_{2}, e_{3} ⟩ \in M$ ⁠, $M_{o B} = ⟨ e_{4}, e_{5} ⟩ \in M$ ⁠, $M_{o C} = ⟨ e_{6} ⟩ \in M$ and $M_{o D} = ⟨ e_{7}, e_{8}, e_{9} ⟩ \in M$ s.t. $i G, o A, o B, o C, o D \in (V_{α} \cap V_{p})$ and $e_{j \in [1, 9]} \in E_{α}$ ⁠.

Figure 24

Distributed data flows versus decentralized data flows.

For the decentralized approach (Fig. 24(c)), we consider the graph $G_{β}$ in which data are separated from control and passes from producer parameters to consumer parameters directly. In this case, there are five data flows each with a network cost of $Ω_{β} (e_{j})$ s.t. $e_{j \in [1, 5]} \in E_{β}$ ⁠. For example, the cost of routing the data produced by oA is $Ω_{β} ((o A, i B)) = ω (o A) + Γ (i B)$ ⁠. As $G_{β}$ is constructed with Algorithm 1 and does not consider any data exchanges through workflow parameters, the number of edges is directly proportional to the number of producer parameters (i.e. $| E_{β} | = | V_{β} \cap V_{p} | = 5$ ⁠) and the number of vertices is twice the number of producers (i.e. $| V_{β} | = 2 | V_{β} \cap V_{p} | = 10$ ⁠). Therefore, $| V_{β} | < | V_{α} | \land | E_{β} | < | E_{α} | ∵ | V_{α} | = 14 \land | E_{α} | = 9$ ⁠.

7.2. RQ1: does the proposed approach scale with the number of services?

We conducted an experiment to dynamically increase the number of composites in the (four-level) composition shown in Figure 23. The experiment is carried out in $τ = 200000$ steps with $| E_{α} | \in (N \cap [9, f (τ)])$ and $| E_{β} | \in (N \cap [5, g (τ)])$ s.t. $f (c) = 4 c + 9$ is the total number of distributed data flows and $g (c) = 2 c + 5$ is the total number of decentralized data flows, when there are $c \in (N \cap [0, τ])$ new composites in $S_{F}$ ⁠.

At each step of the experiment, we compose a replica of $S_{E}$ into $S_{F}$ and define a sequential workflow variant for the invocation of the operations opC and opD (Fig. 23). At the end of the experiment, there is a data pipeline in $G_{α}$ for $τ + 1$ composites in $S_{F}$ ⁠. In the case of $G_{β}$ ⁠, there are additional data flows between the parameters of the atomic services added.

As they capture the growth pattern of data flows, we use the functions $f (c)$ and $g (c)$ to analytically compare $| E_{α} |$ and $| E_{β} |$ as the number of composites increases. Figure 25 shows the result of our analysis under the assumption that the weight functions of $E_{α}$ and $E_{β}$ are $Ω_{α} : E_{α} \to {1}$ and $Ω_{β} : E_{β} \to {1}$ ⁠. On that basis, it is clear that the number of data exchanges grows linearly with the number of composite services and that the number of distributed data flows grows twice faster than its counterpart. The latter can be strictly explained by the average rate of change $\frac{Δ f}{Δ g} = 2$ in the interval $[0, τ]$ ⁠.

Figure 25

Impact of data flows when increasing the number of composite services in $S_{F}$ ⁠.

7.3. RQ2: under which conditions are decentralized data flows beneficial?

We conducted two experiments to determine if there is a benefit of decentralized data flows as the number of levels increases in a DX-MAN composition. To do so, we consider the total network costs of passing the output oD (of the atomic service $S_{D}$ ⁠) to an output oT (of a top-level composite), using both the distributed approach and the decentralized one s.t. $o T \in V_{c}$ ⁠. For both experiments, we assume that $i m (Ω_{α}), i m (ω), i m (Γ) \in R \cap (0, 1]$ ⁠.

For each experiment, we compute ten samples over 100 levels each. At each sample, we increase the number of composition levels by $1$ and define random network costs for $E_{α}$ and $E_{β}$ using $Ω_{α}$ and $Ω_{β}$ ⁠, respectively. The decentralized approach yields a graph with the same number of edges no matter the composition levels, whereas the distributed approach increases the number of edges by $1$ at every level. Thus, $| M_{o D} |$ also increases by $1$ ⁠. The improvement rate of the decentralized approach is given by $1 - \frac{β ((o D, o T))}{α (M_{o D})}$ ⁠.

Figure 26 shows the result of our experiments, where it is clear that the benefit of decentralized data flows is logarithmic and, as such, it becomes more evident as the number of levels of a DX-MAN composition increases. This is because $| M_{o D} |$ increases at every level and so $α (M_{o D})$ ⁠. In the experiments, we allocate network costs using random values uniformly distributed in $(0, 1] \in R$ ⁠. However, if the cost of performing operations on a data space (i.e. $β ((o D, o T))$ ⁠) is more expensive than the total cost of routing data through composite coordinators (i.e. $α (M_{o D})$ ⁠), the decentralized approach is outperformed by the distributed one. Thus, a DX-MAN composition only benefits from the proposed approach iff $β ((o D, o T)) < α (M_{o D})$ ⁠.

$Improvement rate of decentralized data flows when increasing the number of levels in a DX-MAN composition: $\xi _1$ and $\xi _2$ correspond to Experiment 1 and Experiment 2, respectively.$

Figure 26

Improvement rate of decentralized data flows when increasing the number of levels in a DX-MAN composition: $ξ_{1}$ and $ξ_{2}$ correspond to Experiment 1 and Experiment 2, respectively.

https://tools.ietf.org/html/rfc7252

8. CONCLUSIONS

In this paper, we presented a functionally scalable approach that semantically separates control and data in DX-MAN compositions for the realization of decentralized data flows in service-oriented IoT systems. At design-time, the algebraic semantics of the model allows the definition of data forests that denote data dependencies between service operation parameters. At deployment-time, data forests are (manually) refined and (automatically) analysed by an algorithm that leverages the separation of control and data. The algorithm works for any workflow in a composite workflow space and produces a direct mapping from consumer parameters to producer parameters. Thus, preventing coordinators (i.e. exogenous operators) from passing data at run-time. In fact, exogenous operators coordinate an IoT system execution by passing control only. To realize our approach, we implemented it on top of the DX-MAN platform, which uses the Blockchain as the underlying data space for persisting direct mappings and managing data at run-time. It is important to mention that our approach is not exclusive to a specific data space implementation (e.g. Blockchain or ODS). For that reason, we evaluated the benefits of the proposed approach rather than the benefits of a particular implementation.

Unfortunately, as other composition models have their own constructs and do not define composition operators like DX-MAN, it is not possible to extend other composition model semantics using our proposed approach. Instead, a custom extension per model would be required to enable the semantic separation of data and control. Another interesting future direction is to explore the possibility of incorporating stateful services into the semantics of DX-MAN.

DX-MAN is a service composition model that semantically separates data, control and computation for separate reasoning, monitoring, maintenance and evolution of such dimensions. More concretely, this separation allows passing data from producer parameters to consumer parameters directly and enables the use of distinct technologies to manage control flows and data flows separately. For example, in our implementation we use CoAP to pass control between exogenous operators and the Blockchain for handling data flows.

Our experimental results confirm that our approach scales well with the number of services, by reducing the number of data flows by an average factor of two with respect to a distributed approach (where data follows control). They also show that our approach scales well with the number of levels of a DX-MAN composition. From our results, we found that the proposed approach provides the best performance when the cost of performing operations on a data space is less than the cost of exchanging data over the network. Thus, our solution is potentially beneficial for large-scale IoT systems in which loosely coupled services exchange huge amounts of data continuously.

DATA AVAILABILITY

The data underlying this article are available in UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham#.

FUNDING

This work was partially supported by the I-BiDaaS project, funded by the European Union's Horizon 2020 Research and Innovation programme under grant agreement No. 780787.

CONFLICT OF INTEREST

The authors declare that they have no known conflict of interest.

Footnotes

For the rest of the paper, the terms IoT system and service-oriented IoT system are used interchangeably.

Section 2 provides a running example to describe the problem of coordinated data exchanges when using exogenous composition.

As a DX-MAN workflow is semantically equivalent to an orchestration, a DX-MAN composite is equivalent to a (potentially) infinite number of orchestrations.

A pipeline of different technologies can be used to efficiently implement data exchanges between services [51].

https://omnetpp.org/

For increased clarity in our motivation example (Section 2), the calculations in Appendix B do not consider control flow.

Appendix A presents the computation for the number of drivers.

REFERENCES

[1]

Arellanes

and

Lau

K.-K.

(

2020

)

Evaluating IoT service composition mechanisms for the scalability of IoT systems

Fut. Gener. Comput. Syst.

108

827

–

848

[2]

Sarkar

Nambi

A.U.

Prasad

R.V.

Rahim

Neisse

and

Baldini

(

2015

)

DIAT: a scalable distributed architecture for IoT

IEEE Internet Things J.

230

–

239

. https://www.businessinsider.com/internet-of-things-report?r=US&IR=T (

[3]

Newman

(

2020

). The Internet of Things

2020

accessed December 14, 2021

[4]

Arellanes

(

2021

)

Self-organizing software models for the internet of things: complex software structures that emerge without a central controller

IEEE Syst. Man Cybernet. Mag.

–

[5]

Botta

Donato

Persico

and

Pescapé

(

2016

)

Integration of cloud computing and internet of things: A survey

Fut. Gener. Comput. Syst.

684

–

700

[6]

Giang

N.K.

Lea

and

Leung

V.C.M.

(

2020

)

Developing applications in large scale, dynamic fog computing: a case study

Software

519

–

532

[7]

Paniagua

Eliasson

and

Delsing

(

2020

)

Efficient device-to-device service invocation using arrowhead orchestration

IEEE Internet Things J.

429

–

439

[8]

Hahn

Breitenbücher

Kopp

and

Leymann

(

2018

)

Modeling and execution of data-aware choreographies: an overview

Comput. Sci.

329

–

340

[9]

Jaradat

Dearle

and

Barker

(

2016

)

Towards an autonomous decentralized orchestration system

Concurr. Comput.

3164

–

3179

[10]

Pantazoglou

Pogkas

and

Tsalgatidou

(

2014

)

Decentralized enactment of BPEL processes

IEEE Trans. Serv. Comput.

184

–

197

[11]

Barker

Weissman

J.B.

and

Van Hemert

J.I.

(

2012

)

Reducing data transfer in service-oriented architectures: the circulate approach

IEEE Trans. Serv. Comput.

437

–

449

[12]

Sonntag

Gorlach

Karastoyanova

Leymann

and

Reiter

(

2010

)

Process space-based scientific workflow enactment

Int. J. Bus. Process Integr. Manage.

–

[13]

Barker

Walton

C.D.

and

Robertson

(

2009

)

Choreographing web services

IEEE Trans. Serv. Comput.

152

–

166

[14]

Binder

Constantinescu

and

Faltings

(

2009

)

Service invocation triggers: A lightweight routing infrastructure for decentralised workflow orchestration

Int. J. High Performance Comput. Netw.

–

[15]

Hahn

Breitenbücher

Leymann

Wurster

, and

Yussupov

(

2018

)

Modeling Data Transformations in Data-Aware Service Choreographies

International Enterprise Distributed Object Computing Conference (EDOC)

Stockholm, Sweden

, pp.

–

IEEE

[16]

T.-X.

and

Kim

(

2017

)

Control and data plane separation architecture for supporting multicast listeners over distributed mobility management

ICT Express

–

[17]

Mohamed

Onireti

Imran

M.A.

Imran

and

Tafazolli

(

2016

)

Control-data separation architecture for cellular radio access networks: a survey and outlook

IEEE Commun. Surv. Tutor.

446

–

465

[18]

Filippini

Redondi

A.E.C.

and

Capone

(

2017

)

Beyond cellular green generation: potential and challenges of the network separation

Mobile Inform. Syst.

–

[19]

Liu

(

2002

) Data-flow Distribution in FICAS Service Composition Infrastructure.

International Conference on Parallel and Distributed Computing Systems (PDCS)

, ISCA,

Lousville, KY

, pp.

–

[20]

Barker

Weissman

J. B.

, and

van

Hemert

(

2008

) Eliminating the middleman: peer-to-peer dataflow.

International Symposium on High Performance Distributed Computing (HPDC)

Boston, MA

June

, p.

–

ACM

[21]

Barker

Weissman

J. B.

, and

van

Hemert

(

2008

) Orchestrating data-centric workflows.

International Symposium on Cluster Computing and the Grid (CCGRID)

Lyon, France

, pp.

210

–

217

IEEE

[22]

Aalst

W.M.P.V.D.

Aldred

Dumas

and

Hofstede

A.H.M.T.

(

2004

) Design and implementation of the YAWL system. In

Persson

Stirna

(eds)

Advanced Information Systems Engineering, Lecture Notes in Computer Science

(Vol.

3084

Springer

Berlin, Heidelberg

. http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html (

[23]

OASIS

(

2007

Web services business process execution language version 2.0

accessed December 14, 2021

[24]

Morrison

J.P.

(

1978

)

Data stream linkage mechanism

IBM Syst. J.

383

–

408

[25]

Kahn

and

Macqueen

(

1977

) Coroutines and networks of parallel processes.

International Federation for Information Processing (IFIP)

Toronto, Canada

, pp.

993

–

998

IFIP

[26]

Cherrier

Salhi

Ghamri-Doudane

Y.M.

Lohier

and

Valembois

(

2014

)

BeC 3: behaviour crowd centric composition for IoT applications

Mobile Netw. Appl.

–

[27]

Seeger

Deshmukh

R.A.

Sarafov

and

Bröring

(

2019

)

Dynamic IoT choreographies

IEEE Pervasive Comput.

–

[28]

Macker

J.P.

and

Taylor

(

2017

)

Orchestration and analysis of decentralized workflows within heterogeneous networking infrastructures

Fut. Gener. Comput. Syst.

388

–

401

[29]

Giang

N.K.

Lea

and

Leung

V.C.M.

(

2018

)

Exogenous coordination for building fog-based cyber physical social computing and networking systems

IEEE Access

31740

–

31749

[30]

Arellanes

and

Lau

K.-K.

(

2018

) Analysis and classification of service interactions for the scalability of the internet of things.

International Congress on Internet of Things (ICIOT)

San Francisco, CA

, pp.

–

IEEE

[31]

Autili

Inverardi

and

Tivoli

(

2018

)

Choreography realizability enforcement through the automatic synthesis of distributed coordination delegates

Sci. Comput. Progr.

160

–

[32]

Wutke

Martin

, and

Leymann

(

2008

) Model and infrastructure for decentralized workflow enactment.

ACM Symposium on Applied Computing (SAC)

Fortaleza, Ceara, Brazil

, pp.

–

ACM

[33]

Giang

N. K.

Blackstock

Lea

, and

Leung

V. C. M.

(

2015

) Developing IoT applications in the Fog: A Distributed Dataflow approach.

International Conference on the Internet of Things (IOT)

Seoul, Korea (South)

, pp.

155

–

162

IEEE

[34]

Arbab

(

2005

)

Abstract behavior types: A foundation model for components and their composition

Sci. Comput. Progr.

–

[35]

Ghanem

Curcin

Wendel

and

Guo

(

2008

) Building and using analytical workflows in discovery net. In

Dubitzky

(ed)

Data Mining Techniques on the Grid

Wiley Publishing

New York

[36]

Ludäscher

Altintas

Berkley

Higgins

Jaeger

Jones

Lee

E.A.

Tao

and

Zhao

(

2005

)

Scientific workflow management and the Kepler system

Concurr. Comput.

1039

–

1065

[37]

Decker

Kopp

and

Barros

(

2008

)

An introduction to service choreographies

Inform. Technol.

122

–

127

[38]

Open JS Foundation

(

2020

)

Node-RED: flow-based programming for the internet of things

. https://www.nodered.org (accessed December 14, 2021).

[39]

Ngu

Gutierrez

Metsis

Nepal

and

Sheng

(

2017

)

IoT middleware: A survey on issues and enabling technologies

IEEE Internet Things J.

–

[40]

Xue

Liu

and

Yao

(

2019

)

A process partitioning technique for constructing decentralized web service compositions

Softw.: Pract. Exp.

1550

–

1570

[41]

Cheng

Solmaz

Cirillo

Kovacs

Terasawa

and

Kitazawa

(

2018

)

FogFlow: easy programming of IoT services over cloud and edges for smart cities

IEEE Internet Things J.

696

–

707

[42]

Arellanes

and

Lau

K.-K.

(

2019

)

Workflow Variability for Autonomic IoT Systems

International Conference on Autonomic Computing (ICAC)

Umea, Sweden

, pp.

–

IEEE

[43]

Arellanes

and

Lau

K.-K.

(

2018

) Algebraic service composition for user-centric IoT applications. In

Georgakopoulos

Zhang

L.-J.

(eds)

Internet of Things - ICIOT 2018

Lecture Notes in Computer Science

(Vol.

10972

Springer International Publishing

Cham

[44]

Arellanes

and

Lau

K.-K.

(

2017

) Exogenous Connectors for Hierarchical Service Composition.

International Conference on Service-Oriented Computing and Applications (SOCA)

Kanazawa, Japan

, pp.

125

–

132

IEEE

[45]

Lau

K.-K.

and

Di Cola

(

2017

)

An Introduction to Component-based Software Development

(1st edn).

World Scientific

Singapore

[46]

Rana

Bangash

Y.A.

Baz

Rana

T.A.

and

Imran

M.A.

(

2020

)

Incremental composition process for the construction of component-based management systems

Sensors

–

[47]

Arbab

Autili

Inverardi

and

Tivoli

(

2019

) Different glasses to look into the three cs: component, connector, coordination. In

Boreale

Corradini

Loreti

Pugliese

(eds)

Models, Languages, and Tools for Concurrent and Distributed Programming

Springer International Publishing

Cham

[48]

Netflix

(

2020

). Conductor. https://netflix.github.io/conductor/ (

accessed December 14, 2021

[49]

Arellanes

and

Lau

K.-K.

(

2019

) Decentralized data flows in algebraic service compositions for the scalability of IoT systems.

World Forum on Internet of Things (WF-IoT)

Limerick, Ireland

April

, pp.

668

–

673

IEEE

[50]

Arellanes

and

Lau

K.-K.

(

2017

) D-XMAN: a platform for total compositionality in service-oriented architectures.

International Symposium on Cloud and Service Computing (SC2)

Kanazawa, Japan

, pp.

283

–

286

IEEE

[51]

and

Soman

(

2021

) Real-time data infrastructure at Uber.

International Conference on Management of Data (SIGMOD/PODS)

China

, pp.

2503

–

2516

ACM

[52]

Stolfi

D.H.

Alba

and

Yao

(

2017

) Predicting car park occupancy rates in smart cities. In

Alba

Chicano

Luque

(eds)

Smart Cities

Lecture Notes in Computer Science

(Vol.

10268

Springer International Publishing

Cham

[53]

Morrison

J.P.

(

2010

)

Flow-Based Programming: A New Approach to Application Development

(2nd edn).

CreateSpace

USA

[54]

Hahn

Breitenbucher

Leymann

and

Weiss

(

2017

) TraDE - a transparent data exchange middleware for service choreographies. In

Panetto

Debruyne

Ardagna

C.A.

Gaaloul

Papazoglou

Paschke

Meersman

(eds)

On the Move to Meaningful Internet Systems

Lecture Notes in Computer Science

(Vol.

10573

Springer International Publishing

[55]

Guimaraes

F. P.

Kuroda

E. H.

, and

Batista

D. M.

(

2012

)

Performance Evaluation of Choreographies and Orchestrations with a New Simulator for Service Compositions

International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD)

Barcelona, Spain

, pp.

140

–

144

IEEE

[56]

IoT

Analytics

(

2018

). State of the IoT 2018: Number of IoT devices now at 7B - Market accelerating. https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/ (

accessed December 14, 2021

[57]

Nanda

M. G.

Chandra

, and

Sarkar

(

2004

) Decentralizing execution of composite web services.

ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA)

Vancouver, BC, Canada

, pp.

170

–

187

ACM

[58]

Kleinfeld

Steglich

Radziwonowicz

, and

Doukas

(

2014

) Glue.things: a mashup platform for wiring the internet of things with the internet of services.

International Workshop on Web of Things (WoT)

Cambridge, MA

, pp.

–

ACM

[59]

Chafle

Chandra

Mann

, and

Nanda

M. G.

(

2004

) Decentralized orchestration of composite web services.

International World Wide Web conference (WWW)

New York, NY

, pp.

134

–

143

ACM

[60]

Zhou

Leppanen

Harjula

Ylianttila

Ojala

Jin

, and

Yang

L. T.

(

2013

) CloudThings: a common architecture for integrating the internet of things with cloud computing.

Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD)

Whistler, BC, Canada

, pp.

651

–

657

IEEE

A. APPENDIX

This appendix presents the notation required for the analysis of the IoT scenario described in Section 2. This scenario considers a real-world data set publicly available at https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham#, which collects occupancy status in car parks operated by the Birmingham City Council, from 07:59:45 to 16:26:47 on October 22, 2016 [52]. Table A3 presents the data obtained for the car park BHMBCCMKT01 whose total capacity is $577$ ⁠. It particularly shows the measurement time, the $t$ seconds elapsed after 07:59:45, the number of occupied spaces $O (t)$ and the rate of available parking spaces (calculated by $1 - \frac{O (t)}{577}$ ⁠).

Table A.1

Occupancy in the car park BHMBCCMKT01.

Measurement time	$t$	$O (t)$	Availability rate
07:59:45	0	57	0.90
08:26:46	1621	59	0.90
08:59:44	3599	72	0.88
09:26:47	5222	93	0.84
09:59:44	7199	130	0.77
10:26:47	8822	156	0.73
10:59:47	10802	198	0.66
11:32:43	12778	239	0.59
11:59:46	14401	261	0.55
12:32:47	16382	324	0.44
12:59:47	18002	359	0.38
13:32:43	19978	380	0.34
13:59:48	21603	408	0.29
14:39:47	24002	430	0.25
14:59:48	25203	424	0.27
15:26:49	26824	403	0.30
15:59:47	28802	384	0.33
16:26:47	30422	340	0.41

Measurement time	$t$	$O (t)$	Availability rate
07:59:45	0	57	0.90
08:26:46	1621	59	0.90
08:59:44	3599	72	0.88
09:26:47	5222	93	0.84
09:59:44	7199	130	0.77
10:26:47	8822	156	0.73
10:59:47	10802	198	0.66
11:32:43	12778	239	0.59
11:59:46	14401	261	0.55
12:32:47	16382	324	0.44
12:59:47	18002	359	0.38
13:32:43	19978	380	0.34
13:59:48	21603	408	0.29
14:39:47	24002	430	0.25
14:59:48	25203	424	0.27
15:26:49	26824	403	0.30
15:59:47	28802	384	0.33
16:26:47	30422	340	0.41

Table A.1

Occupancy in the car park BHMBCCMKT01.

Measurement time	$t$	$O (t)$	Availability rate
07:59:45	0	57	0.90
08:26:46	1621	59	0.90
08:59:44	3599	72	0.88
09:26:47	5222	93	0.84
09:59:44	7199	130	0.77
10:26:47	8822	156	0.73
10:59:47	10802	198	0.66
11:32:43	12778	239	0.59
11:59:46	14401	261	0.55
12:32:47	16382	324	0.44
12:59:47	18002	359	0.38
13:32:43	19978	380	0.34
13:59:48	21603	408	0.29
14:39:47	24002	430	0.25
14:59:48	25203	424	0.27
15:26:49	26824	403	0.30
15:59:47	28802	384	0.33
16:26:47	30422	340	0.41

Measurement time	$t$	$O (t)$	Availability rate
07:59:45	0	57	0.90
08:26:46	1621	59	0.90
08:59:44	3599	72	0.88
09:26:47	5222	93	0.84
09:59:44	7199	130	0.77
10:26:47	8822	156	0.73
10:59:47	10802	198	0.66
11:32:43	12778	239	0.59
11:59:46	14401	261	0.55
12:32:47	16382	324	0.44
12:59:47	18002	359	0.38
13:32:43	19978	380	0.34
13:59:48	21603	408	0.29
14:39:47	24002	430	0.25
14:59:48	25203	424	0.27
15:26:49	26824	403	0.30
15:59:47	28802	384	0.33
16:26:47	30422	340	0.41

Notation 10.

Let

T = {0, 1621, 3599, 5222, 7199, 8822, 10802, 12778, 14401, 16382, 18002, 19978, 21603, 24002, 25203, 26824, 28802, 30422}

be the set of measurement times in our sample. Then,

Δ O (t_{i}, t_{i + 1})

is the function that computes the number of drivers arriving to the car park BHMBCCMKT01 between

t_{i}

and

t_{i + 1}

seconds:

\begin{aligned} Δ O (t_{i}, t_{i + 1}) = {\begin{cases} O (t_{i + 1}) - O (t_{i}) & O (t_{i + 1}) > O (t_{i}) \\ 0 & o t h e r w i s e \end{cases} \end{aligned}

(4)

s.t.

\sum_{i = 1}^{| T | - 1} Δ O (t_{i}, t_{i + 1}) = 373

and

t_{i}, t_{i + 1} \in T

⁠. This means that there are

373

drivers arriving to the car park BHMBCCMKT01 between 07:59:45 and 16:26:47.

Considering the columns 2 and 4 from our data set, we performed a cubic regression to approximate a function $A (t)$ for the rate of available parking spaces at $t$ seconds. The resulting equation has an average relative error of $1.98 %$ and it is defined as follows:

\begin{aligned} A (t) = 0.00000000000009516 t^{3} - 0.00000000363813682 t^{2} \\ + 0.00000641771909518 t + 0.89631130855805541 \end{aligned}

where $0 \leq t \leq 30422$ ⁠.

The probability of a parking space being unreserved at $t$ seconds is then $A (t)$ ⁠. Following the same trend of demand for parking spaces, the rate of parking spaces that are both free and unreserved at $t$ seconds is given by the function $U (t)$ as follows:

\begin{aligned} U (t) = A (t)^{2} \end{aligned}

where $0 \leq t \leq 30422$ ⁠.

B. APPENDIX

This appendix presents the result of the calculations for the total data transmitted over the network in our IoT scenario (Section 2 and Appendix A). The calculations are presented for each data exchange approach, i.e. centralized data flows, distributed data flows and decentralized data flows. For all of them, we assume that:

There are 577 occupancy sensors near a driver, corresponding to each space in the car park BHMBCCMKT01.
Each Sensor Service is hosted in an occupancy sensor and produces 100 bytes.
Once all sensor data are collected by a reducer, it is processed by the Availability Checking Service.
The Mapping Service appends 20 bytes to each parking space that is both free and unreserved. These extra bytes correspond to the distance between the driver’s location and each occupancy sensor.
The Space Finding Service produces 40 bytes for the coordinates of the nearest parking space.
There is a driver requesting a nearest parking space every $\frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})}$ seconds between $t_{i} \in T$ and $t_{i + 1} \in T$ (Notation .15 and Equation 4).
The GPS service produces 40 bytes for the coordinates of the driver.
The driver always finds a nearest parking space, since the car park is never full (see $O (t)$ in Table A1).

Notation 11.

Considering assumptions (1) and (2), let $θ = 577 \cdot 100 = 57700$ be the constant for the total sensor data (in bytes) produced for a driver’s request.

For the sake of clarity and conciseness, we use the functions below to compute the number of bytes returned by the Availability Checking Service, the Reservation Service, the Mapping Service and the Space Finding Service, respectively.

Notation 12.

Let $S_{A} (t) = θ \cdot A (t)$ be the function that computes the number of bytes returned by the Availability Checking Service. These bytes correspond to the list of parking spaces that are available $t$ seconds after 07:59:45 s.t. $0 \leq t \leq 30422$ ⁠.

Notation 13.

Let $S_{R} (t) = θ \cdot U (t)$ be the function that computes the number of bytes returned by the Reservation Service. These bytes correspond to the list of parking spaces that are both available and unreserved $t$ seconds after 07:59:45 s.t. $0 \leq t \leq 30422$ ⁠.

Notation 14.

Considering assumptions (1) and (4), let $S_{M} (t) = S_{R} (t) + 577 \cdot U (t) \cdot 20$ be the function that computes the number of bytes returned by the Mapping Service, where $0 \leq t \leq 30422$ ⁠. These bytes correspond to the list of parking spaces that are both available and unreserved $t$ seconds after 07:59:45, plus the appended bytes for the distance between each parking space in the list and the driver’s location.

Notation 15.

Considering the assumption (5), let $S_{S} = 40$ be the constant for the number of bytes returned by the Space Finding Service, which represent the coordinates of the nearest parking space.

Notation 16.

Considering the assumption (7), let $S_{G} = 40$ be the constant for the number of bytes returned by the GPS Service, which represent the coordinates of the driver.

A centralized approach for exchanging data requires two data flows for moving data from a service producer to a service consumer. Therefore, $B_{γ}$ denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a centralized approach is used:

\begin{aligned} B_{γ} = \sum_{i = 1}^{| T | - 1} \sum_{j = 1}^{Δ O (t_{i}, t_{i + 1})} 2 S_{G} + 2 θ + 2 S_{A} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) \\ + 2 S_{R} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + 2 S_{M} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + 2 S_{S} \\ \approx 102.98 M B \end{aligned}

(5)

For the distributed approach, each coordinator requires two data flows for passing data between a service producer and a service consumer. In addition, Coordinator1 requires a data flow for moving data to the Coordinator2 which, in turn, requires another data flow for passing data to the Coordinator3. Thus, the exchanges between the Availability Checking Service and the Reservation Service are done with three data flows. Likewise, there are three data flows for passing data from the Reservation Service to the Mapping Service. The symbol $B_{α}$ denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a distributed approach is used:

\begin{aligned} B_{α} = \sum_{i = 1}^{| T | - 1} \sum_{j = 1}^{Δ O (t_{i}, t_{i + 1})} 2 S_{G} + 2 θ + 3 S_{A} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) \\ + 3 S_{R} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + 2 S_{M} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + 2 S_{S} \\ \approx 123.36 M B \end{aligned}

(6)

A decentralized approach is the most efficient since it requires only one data flow for moving data from a service producer to a service consumer. Based on this premise, $B_{β}$ denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a decentralized approach is used:

\begin{aligned} B_{β} = \sum_{i = 1}^{| T | - 1} \sum_{j = 1}^{Δ O (t_{i}, t_{i + 1})} S_{G} + θ + S_{A} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) \\ + S_{R} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + S_{M} (t_{i} + \frac{t_{i + 1} - t_{i}}{Δ O (t_{i}, t_{i + 1})} j) + S_{S} \\ \approx 51.49 M B \end{aligned}

(7)

Taking into account the above calculations, the distributed approach leads to $(\frac{B_{α}}{B_{γ}} - 1) \cdot 100 \approx 19.80 %$ more network traffic than the centralized approach. Likewise, the decentralized approach produces $(1 - \frac{B_{β}}{B_{γ}}) \cdot 100 \approx 50.00 %$ less network traffic than the centralized approach and $(1 - \frac{B_{β}}{B_{α}}) \cdot 100 \approx 58.26 %$ less traffic than the distributed one.

C. APPENDIX

This appendix presents an example of how data forests are analysed in a bottom-up fashion using the Algorithm 1 at deployment-time. Our example considers the three-level data forest depicted in Figure C.2, which obeys the connection rules described in Section 3.2. Here, G is the forest of a parallel workflow that simultaneously executes the corresponding workflow for F and an atomic operation (with input iD and output oD). The workflow for F sequentially executes the workflow for E and then an atomic operation (with input iC and output oC). To pre-process data, F has a filter that takes the data produced by E before sending it to the corresponding operation. In our example, E is the lowest-level data forest defined by a branchial workflow that uses the input parameter iS to choose an operation to execute (out of two possible ones).

Figure B.1

Deployment of the data forest shown in Figure C.2.

Figure C.2

Three-level data forest.

As described in Section 3.4, the deployment process begins at the bottom-level. So, we first analyse the edges of the branchial data forest E via the Algorithm 1. The resulting binary relation is shown in Figure B.1.(a).

In the next step, we analyse the edges of the sequential data forest F to yield the binary relation described in Figure B.1.(b). Here, we can notice that the relations from the output oE are now the relations from iP. This is because the filter input does not require to read from oE. Instead, the filter reads data directly from any of the actual producers, viz. the operation output oA or the operation output oB. Likewise, the input parameters iA, iB and iS do not read data from iE since a new (intermediate) top-level data producer has been found (i.e. iF). Complementarily, the operation input iC reads data directly from the filter output, whereas the (intermediate) top-level output oF reads data from oC.

Finally, in the last deployment step, the edges of the data forest G are analysed to yield the binary relation depicted in Figure B.1.(c). As iG is the new top-level data producer, the relations from consumers iA, iB and iS are updated accordingly. Similarly, as G is a parallel forest, the top-level output oG takes both the data source from oF (i.e. oC) and the output oD. At the end of the deployment process, we have a mapping scheme that discards unnecessary relations to allow data consumers read values directly from data producers.

D. APPENDIX

Figure D.3. presents the JavaScript source code of the three smart contracts used in our implementation. The source code follows the contract model definition described in Section 4.

Figure D.3

Smart contracts logic.

E. APPENDIX

This appendix presents the notation used throughout the paper.

E.1

IOT SCENARIO

O (t)

Number of occupied parking spaces at

t

seconds after 07:59:45

T

Set of measurement times (in seconds)

Δ O (t_{i}, t_{i + 1})

Number of drivers requesting a nearest space between

t_{i} \in T

and

t_{i + 1} \in T

seconds

A (t)

Rate of free spaces at

t

seconds after 07:59:45

U (t)

Rate of spaces that are both free and unreserved at

t

seconds after 07:59:45

θ

Total sensor data (in bytes) produced for a driver’s request

S_{A} (t)

Number of bytes produced by the Availability Checking Service for a driver’s request

S_{R} (t)

Number of bytes produced by the Reservation Service for a driver’s request

S_{M} (t)

Number of bytes produced by the Mapping Service for a driver’s request

S_{S}

Fixed number of bytes produced by the Space Finding Service

S_{G}

Fixed number of bytes produced by the GPS Service

B_{γ}

Number of bytes exchanged in our scenario using a centralized approach

B_{α}

Number of bytes exchanged in our scenario using a distributed approach

B_{β}

Number of bytes exchanged in our scenario using a decentralized approach

E.2

The DX-MAN model

S

The type of a service

S \in S

A service

S

W

The type a workflow space

W \in S

A workflow space

W

◯

The type of a composition operator

◯_{Z}

A composition operator defining a composite service

Z

D

The type of a parameter

F

A data forest

F

V (F)

The set of parameter vertices in a data forest

F

v \in V (F)

A parameter

v

in a data forest

F

E (F)

The set of edges in a data forest

F

e \in E (F)

An edge

e

(i.e. a pair of parameters) in a data forest

F

E.3

Algorithm for edge analysis

Π_{1} (e)

The tail of a data forest edge

e

Π_{2} (e)

The head of a data forest edge

e

PI

The type of a processor input

PO

The type of a processor output

OI

The type of an operation input

OO

The type of an operation output

WI

The type of a (top-level) workflow input

WO

The type of a (top-level) workflow output

EI

The type of an exogenous operator input

V_{c}

The type of a vertex parameter that consumes data during workflow execution

v \in V_{c}

A consumer parameter

v

V_{p}

The type of a vertex parameter that produces data during workflow execution

v \in V_{c}

A producer parameter

v

ς

A binary relation between some consumer parameters and some producer parameters

d o m (ς)

The domain of the relation

ς

ρ

A binary relation between some producer parameters and some consumer parameters

d o m (ρ)

The domain of the relation

ρ

k

The sum of the number of edges in all the data forests involved in a multi-level composition

X_{p}

A set of producer parameters

Y_{c}

A set of consumer parameters

E.4

Evaluation (preliminaries)

G_{α}

A weighted graph of distributed data flows

V_{α}

The set of parameters in

G_{α}

v \in (V_{α} \cap V_{p})

A producer parameter

v

G_{α}

v \in (V_{α} \cap V_{c})

A consumer parameter

v

G_{α}

E_{α}

The set of directed data flows (between parameters) in

G_{α}

e \in E_{α}

A distributed data flow

Ω_{α} (e)

The function that maps a distributed data flow

e \in E_{α}

to a network communication cost

Ω_{α} ((o A, i B))

The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow

M

The type of a finite walk of data flows in

G_{α}

M^{'} \in M

A finite walk of data flows in

G_{α}

α (M^{'})

The function that maps a finite walk

M^{'} \in M

to the total network cost of routing data from a producer parameter

p \in (V_{α} \cap V_{p})

to a consumer parameter

q \in (V_{α} \cap V_{c})

ω (v)

The network cost of writing a value produced by

v \in V_{p}

Γ (v)

The network cost of reading a value for

v \in V_{c}

G_{β}

A weighted graph of decentralized data flows

V_{β}

The set of parameters in

G_{β}

v \in (V_{β} \cap V_{p})

A producer parameter

v

G_{β}

v \in (V_{β} \cap V_{c})

A consumer parameter

v

G_{β}

E_{β}

The set of directed data flows (between parameters) in

G_{β}

e \in E_{α}

A decentralized data flow

Ω_{β} (e)

The function that maps a decentralized data flow

e \in E_{β}

to a network communication cost

Ω_{β} ((o A, i B))

The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow

\circ

The operator for function composition

Π_{1}^{β} (e)

The tail of a decentralized data flow

e \in E_{β}

Π_{2}^{β} (e)

The head of a decentralized data flow

e \in E_{β}

β (e)

The total network cost of routing data from a producer parameter

Π_{1}^{β} (e)

to a consumer parameter

Π_{2}^{β} (e)

s.t.

e \in E_{β}

E.4

Evaluation (preliminaries)

G_{α}

A weighted graph of distributed data flows

V_{α}

The set of parameters in

G_{α}

v \in (V_{α} \cap V_{p})

A producer parameter

v

G_{α}

v \in (V_{α} \cap V_{c})

A consumer parameter

v

G_{α}

E_{α}

The set of directed data flows (between parameters) in

G_{α}

e \in E_{α}

A distributed data flow

Ω_{α} (e)

The function that maps a distributed data flow

e \in E_{α}

to a network communication cost

Ω_{α} ((o A, i B))

The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow

M

The type of a finite walk of data flows in

G_{α}

M^{'} \in M

A finite walk of data flows in

G_{α}

α (M^{'})

The function that maps a finite walk

M^{'} \in M

to the total network cost of routing data from a producer parameter

p \in (V_{α} \cap V_{p})

to a consumer parameter

q \in (V_{α} \cap V_{c})

ω (v)

The network cost of writing a value produced by

v \in V_{p}

Γ (v)

The network cost of reading a value for

v \in V_{c}

G_{β}

A weighted graph of decentralized data flows

V_{β}

The set of parameters in

G_{β}

v \in (V_{β} \cap V_{p})

A producer parameter

v

G_{β}

v \in (V_{β} \cap V_{c})

A consumer parameter

v

G_{β}

E_{β}

The set of directed data flows (between parameters) in

G_{β}

e \in E_{α}

A decentralized data flow

Ω_{β} (e)

The function that maps a decentralized data flow

e \in E_{β}

to a network communication cost

Ω_{β} ((o A, i B))

The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow

\circ

The operator for function composition

Π_{1}^{β} (e)

The tail of a decentralized data flow

e \in E_{β}

Π_{2}^{β} (e)

The head of a decentralized data flow

e \in E_{β}

β (e)

The total network cost of routing data from a producer parameter

Π_{1}^{β} (e)

to a consumer parameter

Π_{2}^{β} (e)

s.t.

e \in E_{β}

E.5