Abstract

Horizontal and vertical scalability have been widely studied in the context of computational resources. However, with the exponential growth in the number of connected objects, functional scalability (in terms of the size of software systems) is rapidly becoming a central challenge for building efficient service-oriented Internet of Things (IoT) systems that generate huge volumes of data continuously. As systems scale up, a centralized approach for moving data between services becomes infeasible because it leads to a single performance bottleneck. A distributed approach avoids such a bottleneck, but it incurs additional network traffic as data streams pass through multiple mediators. Decentralized data exchange is the only solution for realizing totally efficient IoT systems, since it avoids a single performance bottleneck and dramatically minimizes network traffic. In this paper, we present a functionally scalable approach that separates data and control for the realization of decentralized data flows in service-oriented IoT systems. Our approach is evaluated empirically, and the results show that it scales well with the size of IoT systems by substantially reducing both the number of data flows and network traffic in comparison with distributed data flows.

1. INTRODUCTION

The Internet of Things (IoT) is penetrating essential domains of our daily life such as healthcare, security or city management. This emerging paradigm promises the seamless interconnection of any physical object (i.e. thing) through innovative distributed services, leading to service-oriented IoT systems.1 A proper service composition mechanism is thus required for the integration of IoT services into workflows that consist of both control flow and data flow [1]. Control flow refers to the order in which services are executed, whilst data flow defines how services move data over the network.

As of early 2021, there are over 20 billion connected things and it is predicted that this number will exponentially grow in the next few years [2–4]. Hence, as IoT systems may potentially consist of an overwhelmingly large number of services, functional scalability becomes a challenging concern. Unlike vertical and horizontal scalability, functional scalability accommodates growth in terms of the number of services composed in an IoT system [1]. To tackle the functional scalability challenge, a service composition mechanism must provide the means to compose loosely coupled services that exchange data as efficiently as possible over the network.

IoT services can exchange data in three different ways: (i) with a centralized coordinator, (ii) with multiple distributed coordinators or (iii) in a purely decentralized manner. A centralized approach [5] relies on a single coordinator to mediate data streams between services. Although this is viable for enterprise scenarios where small amounts of data are involved, the coordinator would easily become a bottleneck in IoT systems that generate data in the order of Petabytes. To avoid the central bottleneck, a distributed approach [6] can be used for balancing loads over multiple coordinators. However, this would cause network performance degradation as data passes through multiple mediators before reaching the actual data consumers.

A decentralized approach is the most efficient way to exchange data as it enables direct data exchanges from producers to consumers, thereby decreasing latency and maximising throughput [7–14]. However, exchanging data among loosely coupled IoT services is not trivial, especially in resource-constrained environments where things have poor network connection and low-disk space [7]. Furthermore, building data dependency graphs is hard when control flow and data flow are tightly coupled (i.e. when data follows control). To overcome such issues, recent research [1, 8, 15–18] shows that the separation of control and data can be useful since it allows independent reasoning, monitoring, maintenance and evolution of control flow and data flow. Consequently, an efficient data exchange approach can be defined without considering control, so a reduced number of messages can be transmitted over the network.

1.1. Related work

This section presents and analyses related work on decentralized data flows in service-oriented systems. We classify the existing approaches into three major categories, depending on the composition mechanism they are built on: (i) orchestration (with coordinated data exchanges), (ii) P2P dataflows and (iii) P2P choreography.

In orchestration-based approaches [11, 19–21], a central orchestrator coordinates system execution by passing data references alongside control. Although data values do not follow control like in traditional orchestration [22, 23], additional network traffic is introduced since references and acknowledgement messages are routed via the network. This additional network traffic arises from the fact that orchestration-based approaches do not separate data and control.

Dataflows is a composition mechanism that builds a graph of data transformations, where vertices receive data, perform some computation and pass the result(s) to other vertices via data flow edges [24, 25]. Such data exchanges can be done with or without mediators. In P2P dataflows [14, 26–29], services exchange data with no mediators between them [1]. As control flow is implicit in the collaborative exchange of data, P2P dataflows do not separate data and control.

Service choreography describes decentralized interactions among participants using a well-defined public protocol. In P2P choreography, atomic services are participants that exchange data via direct message passing [30]. Some approaches [13, 31] introduce proxies to coordinate the invocation of services and the exchange of data alongside control. In any case, P2P choreography does not separate data and control.

Table 1 summarizes our analysis of related work, where it is clear that there is no approach supporting the separation of control and data for realizing decentralized data flows between atomic services, in order to efficiently support environments that generate huge volumes of data continuously. Furthermore, existing approaches only consider vertical and horizontal scalability. It is important to mention that the separation of concerns does not imply decentralization but provides a number of benefits. Particularly, the separation is necessary to avoid passing references alongside control during system execution, thus reducing the number of messages transmitted over the network. Also, the orthogonality of data and control enables separate reasoning, monitoring, maintenance, reuse and evolution of those concerns, as discussed in other studies [1, 8].

Table 1

Analysis of related work.

ApproachSeparation of data and controlFunctional scalabilityDecentralized dataflows
Orchestration with coordinated data exchanges××
P2P dataflows××
P2P choreography××
Our approach
ApproachSeparation of data and controlFunctional scalabilityDecentralized dataflows
Orchestration with coordinated data exchanges××
P2P dataflows××
P2P choreography××
Our approach
Table 1

Analysis of related work.

ApproachSeparation of data and controlFunctional scalabilityDecentralized dataflows
Orchestration with coordinated data exchanges××
P2P dataflows××
P2P choreography××
Our approach
ApproachSeparation of data and controlFunctional scalabilityDecentralized dataflows
Orchestration with coordinated data exchanges××
P2P dataflows××
P2P choreography××
Our approach

In the analysis above, we did not consider decentralized orchestration, coordinated dataflows and choreographed orchestration, since those approaches do not support decentralized data flows between atomic services. Nevertheless, for completeness, we analyse those categories below.

In approaches built on top of decentralized orchestration, multiple composite services coordinate system execution [9, 10, 12, 32]. In particular, approaches like [32] persist data on distributed data spaces, which may become a bottleneck in IoT environments where services exchange huge volumes of data continuously. Although some works (e.g. [12]) solve the bottleneck issue by persisting references instead of values, they require the maintenance of tuple spaces and databases for moving references and storing data, respectively. Moving references is necessary because decentralized orchestration does not separate data and control. Paradoxically, decentralized orchestration does not support decentralized dataflows between atomic services.

In coordinated dataflows [33], a master composite coordinates data flows between slave composites. As data pass through multiple mediators, this approach introduces unnecessary network hops (like decentralized orchestration). This problem is also present in [34], which introduces the notion of abstract data types for moving data outside services. Discovery Net [35] and Kepler [36] are workflow systems that implement the semantics of coordinated dataflows, which provide mechanisms to coordinate the execution of (disjoint) data flow graphs. Discovery Net allows the definition of a control flow graph where nodes exchange control tokens instead of data values. Nodes can be data flow graphs per se. In Kepler, a director is a semantic entity that controls data exchanges from outside a data flow graph. These systems do not support decentralized data flows because their workflow execution mechanisms are centralized [9].

In choreographed orchestration, data pass through multiple orchestrator participants leading to network degradation [37]. To separate control and data, [8] proposes a choreographed orchestration approach that allows the manual modelling of cross-partner data dependencies. However, there are no decentralized data flows between atomic services, but only between partners.

1.2. Paper contributions

This paper proposes a functionally scalable approach that semantically separates data and control, in order to decentralize data flows between atomic services in IoT systems. Accordingly, this paper makes the following contributions:

  • We propose an (implementation-independent) service composition model that semantically separates data and control by so-called data forests. The semantics allows the encapsulation of explicit control flow graphs and explicit data dependency graphs. As graphs are orthogonal, the semantics enables a separate reasoning and analysis of data and control.

  • We propose an efficient algorithm that leverages the model semantics for a compositional analysis of data forests. The algorithm analyses a data dependency graph, without considering control flow, for the automatic construction of a decentralized mapping between data consumers and data producers. The algorithm reduces both the number of data flows and network traffic of an IoT system.

  • Unlike the state-of-the-art on decentralized data flows, we evaluate the proposed approach in terms of functional scalability. The evaluation is implementation-independent and shows that the approach scales well with the number of services by reducing the number of data flows linearly and network traffic logarithmically.

1.3. Paper organisation

The rest of the paper is organized as follows. Section 2 presents a motivating scenario to explain why decentralized data flows are a crucial desideratum in the IoT domain. Section 3 presents an overview of the model semantics and a detailed description of the proposed approach. Section 4 presents the implementation of the proposed solution. Section 5 presents a case study. Section 6 presents a comparative evaluation of decentralized data flows versus distributed data flows in the case study. Section 7 outlines an evaluation of the proposed approach in terms of functional scalability. Section 8 outlines the conclusions. Appendices  A and  B formalise our motivating scenario. Appendix  C describes an example of the deployment-time process. Appendix  D presents a source code fragment of our implementation. Appendix  E presents the notation used throughout the paper.

2. WHY DO WE NEED DECENTRALIZED DATA FLOWS IN IOT?

This section discusses the rationale for enabling decentralized data flows in IoT, with the help of a scenario based on the smart parking system presented in [1] and real-world occupancy data from car parks in Birmingham, UK (Appendix  A). The system is used by drivers who want to find the nearest parking space while they drive around a smart city. The workflow for this scenario is a combination of the reduce and the sequential data patterns and involves the services shown in Figure 1. Although data pass through operations, we consider that it is routed between services as there is a one-to-one relationship between services and operations. For simplicity, we assume that services are deployed on different things, e.g. the Availability Checking Service is deployed on an infostation while the Space Finding Service is deployed on an edge router. An infostation is an urban infrastructure device that collects up-to-date status from all occupancy sensors in range. We assume that all parking spaces are equipped with occupancy sensors whose functionality is abstracted by a Sensor Service.

Workflow of the smart parking system.
Figure 1

Workflow of the smart parking system.

The workflow of our scenario is triggered by a driver’s request, and it starts with two parallel tasks. In the first one, the Data Reducer 2 receives the driver’s coordinates from the GPS Service. In the second one, the Data Reducer 1 pulls the states from multiple Sensor services to create a list that shows the status of parking spaces. Then, the Availability Checking Service uses the list to determine which parking spaces are free. As some spaces can be reserved in advance, the Reservation Service determines which free parking spaces are unreserved and passes its resulting list to the Data Reducer 2. Once the Data Reducer 2 receives all its inputs, it forwards them to the Mapping Service. The Mapping Service then computes the distance between each (free and unreserved) space and the driver’s location and appends the distances to the list. Finally, the Space Finding Service determines the nearest parking space for the driver, using the calculations from the Mapping Service. The coordinates of the nearest parking space are the final result of the workflow and are passed to the car control application.

As the number of occupancy sensors could be huge, especially in large cities, the services of the smart parking system may potentially exchange vast amounts of data continuously. This is because there is a one-to-one mapping between sensors and services, and occupancy sensors are frequently producing data. Thus, functional scalability becomes a challenging concern. To understand how we can address this problem, we analyse the three existing approaches for passing data between the services of our scenario (Fig. 2). For simplicity, we do not show control flow and we assume that there are no data required for both pulling sensor data and starting the execution of the system (from the car control application).

Data flow approaches for the smart parking system.
Figure 2

Data flow approaches for the smart parking system.

In our scenario, there are 373 drivers arriving to the car park BHMBCCMKT01 between 07:59:45 and 16:26:47 on a given day (Appendix  A). This car park has 577 occupancy sensors each producing 100 bytes, so there are 577100=57.7 KB of sensor data per driver request. The number of bytes returned by the Availability Checking Service, the Reservation Service and the Mapping Service are computed by the functions SA(t), SR(t) and SM(t), whose definitions are presented in Appendix  B. For simplicity, we assume that the GPS Service and the Space Finding Service always return 40 bytes for the coordinates of a driver and a nearest space, respectively.

We refer the reader to Appendix  A for details about the data set and the formalisms used in our scenario. Appendix  B presents the calculations for the total data transmitted over the network, considering that there are 373 drivers between 07:59:45 and 16:26:47. These calculations are presented for each data exchange approach depicted in Figure 2. Below we interpret the results.

A centralized approach [5, 38, 39] depends on a single coordinator for passing data between services. Figure 2(a) shows how a central coordinator mediates data streams for the smart parking system. The main drawback of this approach is that such a coordinator is a potential bottleneck (as huge amounts of data are exchanged) and introduces an extra network hop for passing data. This approach is therefore a threat for scalability as pointed out by many researchers [7, 8, 11, 14]. Furthermore, IoT may potentially require the deployment of coordinators on resource-constrained things (e.g. edge devices), which can lead to bottlenecks due to the presence of low computing power and poor network resources [7]. For instance, the central coordinator of our scenario is deployed on an edge device, which manages 102.98 MB of data over the network using a poor connection (see Equation 5 in Appendix  B). Certainly, this impacts the overall performance of the system negatively. This is especially true in large cities with multiple drivers requesting a space at the same time.

Although a distributed approach [33, 40, 41] removes the central coordinator, it introduces unnecessary network traffic as data passes through multiple mediators, even if data is unimportant for them. For example, in Figure 2(b) the data generated by the Availability Checking Service goes through Coordinator1 and then through Coordinator2, before reaching the service that really needs the data (i.e. the Reservation Service). These additional network hops negatively impact the overall performance of the system.

For our scenario, the distribution of data load among three coordinators avoids a single bottleneck. However, as coordinators mediate interactions, 123.36 MB of data is exchanged over the network (see Equation 5 in Appendix  B), leading to 19.80% more network traffic than the centralized approach (see at the bottom of Appendix  B). In fact, this indicates that the network traffic may increase linearly with the number of coordinators.

The decentralized data exchange approach [7, 8, 11, 13] is the most efficient one, since data are passed directly to the services that actually need it. Particularly, it requires only one network hop to pass data from a data producer to a data consumer, as depicted in Figure 2(c). The decentralized version of our scenario requires a total data transfer of 51.49 MB over the network (see Equation 7 in Appendix  B), i.e. 50.00% less network traffic than the centralized approach and 58.26% less traffic than the distributed one (see at the bottom of Appendix  B).

3. THE DX-MAN MODEL

To realize decentralized data flows, we propose to extend the semantics of DX-MAN [42–44], which is an algebraic model for building IoT systems, where services and exogenous composition operators are first-class semantic entities (Fig. 3).

DX-MAN Model.
Figure 3

DX-MAN Model.

A service  SS is a stateless distributed unit of composition that can be atomic or composite. Its semantics is a workflow space  WW, which is a (finite or infinite) set of workflow control flow variants wi[1,]W that represent alternative service behaviours:

(1)

where S is the service type and W is the workflow space type.

An atomic service is syntactically formed by connecting an invocation connector with a computation unit which encapsulates the implementation of multiple operations and it is not allowed to invoke other units (Fig. 3(a)). Semantically, this is equivalent to constructing an atomic workflow space whose variants invoke different operations in the computation unit via the invocation connector. An operation performs some computation and has an input parameter and an output parameter.

The DX-MAN model relies on algebraic composition for defining complex services. Algebraic composition is the process by which a composition operator hierarchically composes multiple services of type S into a composite service of type S which, in turn, can be further composed into even more complex composites (Fig. 3(b)) [44]. A composition operator defines control flows exogenously (i.e. outside the composed services) to avoid service dependencies. Formally, it is a function for n services with the following type:

(2)

A composite service is syntactically formed by connecting a composition operator with multiple (atomic and/or composite) services, which has an input parameter and an output parameter (Fig. 3(c)). Semantically, it is a composite workflow space produced by a composition operator taking sub-workflow spaces as operands. There are composition operators for sequencing (i.e. sequencer), branching (i.e. inclusive selector and exclusive selector) and parallelism (i.e. parallelizer). A sequencer or a parallelizer defines an infinite workflow space, whilst a branching operator defines a finite one.

The control flow structure of a composite service is represented by an abstract workflow tree whose leaves are operations in atomic sub-services, whole composite sub-services or any combination thereof (Fig. 4). An abstract workflow tree allows the definition of a particular variant from a composite workflow space. To do so, a concrete workflow tree must be created, which is a selection function over a set of workflow variants, and it is therefore isomorphic to an abstract workflow tree. The edges of a concrete workflow tree are labelled according to the composition operator used. In particular, the label of a sequencer edge is an ordered list of natural numbers and the label of a parallelizer edge is a natural number (representing the amount of parallel tasks). For the edges of an inclusive selector and an exclusive selector, the labels are conditions and boolean values, respectively. An exclusive selector also has an associated global condition. For further details on workflow selection, see [42, 43].

Abstract workflow trees for the composite service shown in Figure 3, when $\bigcirc _C$ is (a) a sequencer operator, (b) a parallelizer operator, (c) an inclusive selector operator or (d) an exclusive selector operator.
Figure 4

Abstract workflow trees for the composite service shown in Figure 3, when C is (a) a sequencer operator, (b) a parallelizer operator, (c) an inclusive selector operator or (d) an exclusive selector operator.

3.1. Separation of control flows and data flows

IoT service composition can be endogenous or exogenous, depending on control flow origin [45]. Endogenous composition is used by P2P choreographies, which mix service computation (i.e. operations) with control flow constructs (Fig. 5(a)). By contrast, exogenous composition relies on coordinators that define control flow outside the computation of composed services (Fig. 5(b)). This composition approach is used by orchestration and by DX-MAN. Unlike its counterpart, exogenous composition avoids control flow dependencies between services and facilitates reuse at scale [29, 36, 44, 46, 47]. It also avoids application logic being embedded in the computation of multiple atomic services, thereby facilitating tracking and monitoring, which are crucial desiderata of functional scalability [1, 48].

Endogenous composition versus exogenous composition.
Figure 5

Endogenous composition versus exogenous composition.

Despite the above advantages, exogenous composition suffers from performance issues when data follow control [11, 14, 49]. This is because coordinators mediate data streams even for data that are unimportant for them (Figs 2(a) and (b)).2 To address this problem, we propose to define control flow and data flow as orthogonal dimensions (Fig. 6). The idea is that coordinators (i.e. composition operators) never exchange data during workflow execution, but only control.

Possible DX-MAN dimensions. This paper proposes an approach for realizing (b).
Figure 6

Possible DX-MAN dimensions. This paper proposes an approach for realizing (b).

In DX-MAN, a workflow is defined by a concrete workflow tree (Fig. 7(b)). It has an input parameter and an output parameter and describes a series of steps for the invocation of operations in atomic services, sub-workflows or any combination thereof.3 Figure 7(c) shows an example of a sequential workflow for the invocation of the operation opA (provided by the atomic service SA) and the operation opB (provided by the atomic service SB). This workflow variant results from the composition depicted in Figure 7(a) where the atomic services SB and SC are composed into the composite SD by the sequencer operator D. Likewise, the sequencer operator E composes SA and SD into the top-level composite SE. For each composite, we select a workflow with a different concrete workflow tree. Note that a DX-MAN composition is done in a hierarchical bottom-up manner and, although there are infinite sequential workflow variants, we describe only one of them for the sake of this paper. Also note that the workflow invoking opB is a sub-workflow of the top-level one. This is because SD is composed into SE.

Relationship between DX-MAN compositions and IoT workflows.
Figure 7

Relationship between DX-MAN compositions and IoT workflows.

A workflow execution traverses a control flow tree from top to bottom and then returns backwards (Fig. 8). Thus, the execution of the workflow depicted in Figure 7(c) starts with the activation of E, which sequentially passes control to the invocation connector A and then to D. The latter just forwards control to the invocation connector B. Once they are triggered, invocation connectors execute an operation in their computation unit.

Possible executions of the workflow shown in Figure 7(c) when control flows and data flows are (a) mixed or (b) orthogonal. In (a), composition operators (i.e. coordinators) mediate data streams, even if data are unimportant for them. In (b), data are passed directly between atomic services. This paper proposes an approach to realize (b).
Figure 8

Possible executions of the workflow shown in Figure 7(c) when control flows and data flows are (a) mixed or (b) orthogonal. In (a), composition operators (i.e. coordinators) mediate data streams, even if data are unimportant for them. In (b), data are passed directly between atomic services. This paper proposes an approach to realize (b).

If data follow control, it passes through the composition connectors E and D (Fig. 8(a)). Thus, to avoid such inefficient executions, we propose an approach that enables decentralized data exchanges between atomic services by the separation of control and data (Fig. 8(b)). Our approach is described below.

3.2. Design-time: (semantic) data forest definition

A DX-MAN system is a workflow control flow variant with an orthogonal data forest. The latter is a graph of directed data dependency trees defined on service composition at design-time, whose edges define an explicit relationship between parameters. Formally, a data forest is a graph F:=(V(F),E(F)) s.t. V(F) is a (finite or infinite) set of parameter vertices and E(F) is a (finite or infinite) set of edges. As a vertex vV(F) is of type D, an edge eE(F) is a tuple of type D×D where D is the parameter type.

For a parallelizer, the workflow input is connected to the inputs of all the invoked elements (i.e. operations or sub-workflows) whose outputs are, in turn, connected to the workflow output. In this paper, we only describe how parallel and sequential data forests are formed, since decentralized data flows are only meaningful for parallelism and sequencing. For a sequencer, the connection of vertices forms a data pipeline, so the output of an invoked element is connected to the input of the next invoked element. Additionally, the workflow input is connected to the input of the first invoked element, and the output of the last invoked element is connected to the workflow output.

Figure 9 analyses thoroughly the composition depicted in Figure 7. It shows the sequential workflow variants and the data forests for the composites SD and SE. Such forests conform to the rules described for a sequencer. The analysis is segmented because every composite in the DX-MAN model is a black box that can have any possible behaviour out of the alternative ones. Thus, as there are no (bridge) connections between data dependency trees of different forests, data are encapsulated in composite services. Therefore, composites (with all their workflows and data forests) are reusable across different IoT systems.

Separation of control and data in sequential composites. Control constructs and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.
Figure 9

Separation of control and data in sequential composites. Control constructs and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.

Reusability is also present in the composite depicted in Figure 10(a), which shows an example where the parallelizer Z composes the atomic services SX and SY into the composite SZ. In this example, we define a concrete workflow tree for executing all the sub-service operations in parallel. The resulting workflow has a fork construct to pass control in parallel to opX and opY, and a join construct to synchronise control flow. Choosing this workflow implies that there is a data forest conforming to the rules previously described for a parallelizer (Fig. 10(b)).

Separation of control and data in a parallel composite. Parallel constructs (i.e. fork and join) and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.
Figure 10

Separation of control and data in a parallel composite. Parallel constructs (i.e. fork and join) and data dependencies are explicitly defined by exogenous composition operators and data forests, respectively.

Although decentralized data flows are only meaningful for parallelism and sequencing, for completeness we describe how branchial data forests are formed. For both exclusive and inclusive selectors, the corresponding connection scheme is isomorphic to the one defined for a parallelizer in which the workflow input is connected to the inputs of all the invoked elements, and the outputs of the elements are connected to the workflow output (Fig. 10(b)). This connection is done to ensure that there is an output value no matter the execution branch taken.

3.3. Deployment-time: data forest refinement

At deployment-time, a data forest can be (manually) refined in three different ways: (i) by adding or removing edges, (ii) by adding input data into exogenous operators or (iii) by introducing data processors. These refinements are entirely optional with the overall aim of providing extra flexibility for system modellers. To understand them, we consider the data forest of the composite SE (shown in Fig. 9):

where iE,oE,iA,oA,iD,oDD.

For all the scenarios, we show how edges and vertices change with respect to the original data forest (Fig. 11(a)).

Data forest refinement.
Figure 11

Data forest refinement.

3.3.1. Edges refinement

Refining edges is useful for changing data relationships and optimizing data flows. For instance, the edge (iE,iA) can be removed when the operation opA does not need any data to perform computation. Similarly, by replacing (oA,iD) with (iE,iD), the sub-workflow SD receives data from the top-level input instead of getting it from opA (Fig. 11(b)). Formally, E(FE) becomes E(FE){(iE,iA),(oA,iD)}{(iE,iD)}.

3.3.2. Exogenous operator refinement

Although they do not perform any computation, exogenous operators might require input values to evaluate boolean conditions at run-time. By default, exogenous operators do not have any parameters, but parameters can be added for a specific workflow. To do so, new vertices are added into the respective data forest and connected manually at the discretion of the modeller. Edge connections must conform to the rules described in [49]. Figure 11(d) shows an example of this kind of refinement, where iOD represents the input parameter of the composition operator that coordinates the execution of the workflow SE. Formally, V(FE) becomes V(FE){iO} and E(FE) becomes E(FE){(iE,iO)}.

3.3.3. Data processor refinement

Data processors can be introduced in a data forest to perform intermediate processing. A data processor is a function that receives at least one input data, performs some custom computation and stores result(s) in at least one output (Fig. 11(c)). By introducing data processors, we avoid tangling with the original workflow control flow of an IoT system. Although the DX-MAN model currently supports the most common data patterns (i.e. map-reduce and filtering), other data processors can be defined using the same semantics. For instance, a replicator could be introduced to copy data to multiple parameters.

Figure 11(e) customises FE by filtering the data generated by opA before sending it to the sub-workflow SD. Formally, V(FE) becomes V(FE){iF,oF} and E(FE) becomes E(FE){(oA,iD)}{(oA,iF),(oF,iD)} s.t. iF,oFD are the filter parameters. Figure 11(f) shows another refinement, where a mapper collects data from both opA and the parent workflow to create a list of transformed key-value pairs. The list is then processed by a reducer which produces an output in a suitable format for SD. Formally, V(FE) becomes V(FE){iM,iM,oM,iR,oR} and E(FE) becomes E(FE){(iE,iA),(oA,iD)}{(iE,iM),(oA,iM),(oM,iR),(oR,iD)} s.t. iM,iM,oMD are the mapper parameters and iR,oRD are the reducer parameters.

3.4. Deployment-time: data forest analysis

Once data forests have been defined and refined (if needed), a direct mapping between consumer parameters and producer parameters is created. To describe our approach, some formal notations are provided.

 

Notation 1.

Let Π1:D×DD be the tail map of a forest edge.

 

Notation 2.

Let Π2:D×DD be the head map of a forest edge.

 

Notation 3.

Let PI be the processor input type, PO the processor output type, OI the operation input type, OO the operation output type, WI the (top-level) workflow input type, WO the (top-level) workflow output type and EI the type of an exogenous operator input s.t. PI,PO,OI,OO,WI,WO,EID.

 

Notation 4.

Let Vc be the type of vertices that consume data on workflow execution, namely service operation inputs, exogenous operator inputs, data processor inputs and top-level workflow outputs s.t. Vc:=OIEIPIWO. Top-level workflow outputs contain the resulting data from an IoT system execution.

 

Notation 5.

Let Vp be the type of vertices that produce data on workflow execution (i.e. operation outputs and data processor outputs) as well as vertices representing input data of top-level workflows s.t. Vp:=OOPOWI. Top-level workflow inputs are the data needed for an IoT system execution.

 

Notation 6.

Let ς(Vc×Vp) be a binary relation (i.e. a mapping) between some consumer parameters and some producer parameters.

 

Notation 7.

Let ρ(Vp×Vc) be a binary relation (i.e. a mapping) between some producer parameters and some consumer parameters.

Our approach analyses the edges of all the data forests of an IoT system, from bottom to top with a complexity of O(k) s.t. k is the total number of edges in a multi-level composition (Fig. 12). As they do not require any data manipulation, workflow parameters (i.e. composite service parameters) are discarded by transforming the binary relations ς and ρ each time an edge is analysed. When all edges have been processed, ς is the result of our approach, i.e. a direct mapping from data consumers to data producers.

Bottom-up analysis of data forests: from level 0 to level $L$. The algorithm analyses the data forests of the composite services at level $0$, then the data forests of the composite services at level $1$, and so on until reaching the data forests of the top-level composite at level $L$.
Figure 12

Bottom-up analysis of data forests: from level 0 to level L. The algorithm analyses the data forests of the composite services at level 0, then the data forests of the composite services at level 1, and so on until reaching the data forests of the top-level composite at level L.

We have designed Algorithm 1 to analyse individual data forest edges. The algorithm uses the sets XpD and YcD to process the endpoints of an edge eD×D. The elements of these sets are determined according to two fundamental rules. The first one is shown in lines 4–7. It states that Xp is the set of vertices connected to the tail parameter  Π1(e) if the tail is not a data processor parameter and has incoming connections; otherwise, Xp only contains the tail. The second rule is shown in lines 8–13. It states that if the head parameter  Π2(e) is not in a data processor and has outgoing connections, then (i) Yc has the vertices connected from the head parameter and (ii) Xp (without the head parameter) is the set of additional producers for each consumer yYc. Otherwise, Yc just contains the head parameter Π2(e) and the elements of Xp are additional producers of Π2(e). After processing the above rules, the relation ρ is updated to map each producer xXp to each consumer yYc. Appendix  C presents a simple example using this algorithm. A more detailed example is presented in Section 5.

graphic

4. IMPLEMENTATION

The DX-MAN platform [50] implements the semantics of the DX-MAN model and provides Java APIs to algebraically compose, deploy and execute IoT systems. Our approach is implemented on top of that platform and uses the Blockchain as the underlying decentralized data space. Please note that a Blockchain platform is just a possible implementation of a decentralized data space. Other possible implementations include OpenLink Data Spaces (ODS), ZeroMQ messaging, a shared memory for IoT or even Apache Storm.4 We chose a Blockchain implementation to ensure data ownership for every service and data provenance for discovering data flow histories. Furthermore, we guarantee an extra layer of performance, security and auditability. It is important to note that we do not claim any contributions in terms of implementation. Our implementation is just for validation purposes.

Our approach uses Hyperledger Fabric 1.2 as the underlying Blockchain infrastructure and Hyperledger Composer 0.20.0 to define a Blockchain model based on three smart contracts (Fig. 13). CreateParameters initialises the consumer parameters of an IoT system, whereas UpdateParameters and ReadParameters change and retrieve a list of parameter values from the data space, respectively. The implementation logic of the contracts is written in JavaScript and it is shown in Appendix  D.

Blockchain model.
Figure 13

Blockchain model.

The DX-MAN platform provides the programming abstractions for composing IoT services (at design-time) and selecting IoT workflows (at deployment-time). When a workflow is selected, a data forest is automatically created for the respective control flow. Workflow selection is out of the scope of this paper; on this matter, we refer the reader to [42]. Also, to guarantee openness and replicability, the source code of the DX-MAN platform is available at https://github.com/damianarellanes/dxman.

Once workflows have been selected, Algorithm 1 analyses the edges of all the data forests involved to build a Java Hashmap (i.e. a binary relation ς) with a consumer parameter UUID as the key and a list of producer parameter UUIDs as the value. The entries of the map are stored as Parameter assets in the Blockchain using the transaction CreateParameters. In particular, the field parameterId is the hashmap key and the field producers is the hashmap value (see lines 7 and 12 in Fig. 13).

At run-time, exogenous operators coordinate an IoT system execution by passing control only via CoAP messages.5 When an exogenous operator receives control, it performs one of the processes shown in Figure 14(b and c) only if input data are needed; otherwise, steps 1–2 are omitted. When control triggers an invocation connector, the latter performs the process shown in Figure 14(a). Firstly, the connector executes the transaction ReadParameters to retrieve the necessary inputs from the data space. For each input, the Blockchain directly consumes values from the producer list.

Steps performed by (a) an invocation connector and (b–c) an exogenous operator after receiving control. (b–c) are only applicable for exogenous operators that require input data. Other operators ignore the steps 1 and 2.
Figure 14

Steps performed by (a) an invocation connector and (b–c) an exogenous operator after receiving control. (b–c) are only applicable for exogenous operators that require input data. Other operators ignore the steps 1 and 2.

Producer data are stored in the Blockchain as soon as it is available, even before control reaches consumer services. To avoid run-time synchronisation problems, control flow blocks in an invocation connector until all needed input values are retrieved. Next, the connector executes the computation of an operation by passing it the respective data. The result is stored in the Blockchain via the transaction UpdateParameters.

UpdateParameters raises an UpdateParameterEvent after changing a parameter value. This is useful for data processor instances that subscribe to events notified by producer parameters. Thus, a data processor instance performs a user-defined computation after receiving all the events for its inputs. Currently, our implementation supports mappers, reducers and filters, but further processors (e.g. a replicator) can be implemented conforming to the semantics presented in Section 3.3.

Consumer parameters are unaware of data producers, and vice versa, since data references are stored in the Blockchain. Furthermore, the mapping generated by Algorithm 1 avoids the inefficient approach of passing data values through exogenous operators at run-time. Therefore, as consumer parameters read data directly from producers and they are unaware of each other, our approach enables a (transparent) decentralized data exchange.

5. CASE STUDY

This section describes how to enable decentralized data flows in the smart parking scenario presented in Section 2, by covering the three phases of a DX-MAN system life-cycle.

5.1. Design-time

At design-time, we use the syntactic constructs presented in Section 3 to define a multi-level composition structure for our smart parking application (Fig. 15). This process is done in a hierarchical bottom-up manner, starting with the composition of Sensor Services into SensorNet, via the parallelizer operator N. Afterwards, we use the sequencer operator C to compose the SensorNet composite, the Availability Checking Service and the Reservation Service into CityManagement. In the next level, the GPS Service and CityManagement are composed into Information via the parallelizer operator I. Finally, we use the sequencer T to compose Information, Mapping Service and Space Finding Service into CarControlApp (i.e. the top-level composite). Note that, according to the semantics presented in Section 3, each composite service is a workflow space with a (finite or infinite) set of workflow variants and a (finite or infinite) set of data forest variants. For clarity and conciseness, we do not present them. Instead, we refer the reader to our previous work on this matter [42, 43].

DX-MAN composition for the smart parking scenario.
Figure 15

DX-MAN composition for the smart parking scenario.

5.2. Deployment-time

For each composite, we define a concrete workflow tree to explicitly choose a workflow and a data forest from the respective space (Section 3). In particular, for SensorNet there is a workflow that executes all sensor operations in parallel. For CityManagement, there is a workflow that sequentially invokes the SensorNet sub-workflow, the getFree operation and the getUnreser operation, in that order. For Information, there is a workflow that synchronises the execution of the getLocation operation and the CityManagement sub-workflow. Finally, in the top-level composite, the Information sub-workflow, the computeDis operation and the getNearest operation are triggered sequentially. The concrete workflow trees for each composite are shown in Figure 16 and the whole workflow of our scenario is depicted in Figure 17(a). The latter is just a concatenation of individual workflows.

Selecting workflows explicitly and data forests implicitly for the smart parking scenario.
Figure 16

Selecting workflows explicitly and data forests implicitly for the smart parking scenario.

Orthogonality between control flow (i.e. workflow) and data dependencies (i.e. data forests) in the smart parking scenario.
Figure 17

Orthogonality between control flow (i.e. workflow) and data dependencies (i.e. data forests) in the smart parking scenario.

In Section 3, we mentioned that data forests are implicitly selected when workflows are chosen. On that basis, the resulting data forests for each composite service are shown in Figure 16 and the complete concatenation of data forests is illustrated in Figure 17(b). Note that both workflows and data forests are not manually created, but just selected using the syntactic representation of concrete workflow trees.

Once data dependencies are in place, some data forests are refined (Fig. 18(a)). In particular, we remove the edges between the inputs of the parallel composites (i.e. SensorNet and Information) and the inputs of their respective sub-services. Similarly, we remove the edges connecting the inputs of the sequential composites (i.e. CarControlApp and CityManagement) and the inputs of the parallel composites. Note that this does not mean that edges are always refined. In some scenarios, refinement never takes place.

Transforming data forests into a binary relation between consumer parameters and producer parameters.
Figure 18

Transforming data forests into a binary relation between consumer parameters and producer parameters.

After refinement, Algorithm 1 analyses the edges of each data forest from the bottom-level composite (i.e. SensorNet) to the top-level one (i.e. CarControlApp). The resulting binary relation between consumer parameters and producer parameters is illustrated in Figure 18(b).

5.3. Run-time

At run-time, composition operators coordinate the execution of the selected workflow by passing control only (Fig. 19(a)), while atomic services exchange data directly (Fig. 19(b)). In particular, when a driver requests a nearest parking space, T passes control to the I operator which, in turn, forks control towards the invocation connector G and the C operator. When G is triggered, it activates the getLocation operation before passing control back to I. Meanwhile, C forwards control to N for activating the invocation connectors of all the sensors. After receiving control from all of them, N notifies the C operator for executing the operations getFree and getUnreser via their invocation connectors. Once control returns from R, C sends control back to I. If the latter has received control from G and C, it notifies T for invoking the operations computeDis and getNearest via M and S, respectively. The execution of the smart parking app terminates when T receives control from S.

Orthogonal control flows and data flows in the smart parking scenario.
Figure 19

Orthogonal control flows and data flows in the smart parking scenario.

Figure 19 illustrates the execution of the smart parking scenario, where it is clear that data never passes alongside control, but it flows in a decentralized manner according to the binary relation shown in Figure 18(b). It is important to note that control blocks in an invocation connector until all producer values become available (Section 4). For instance, A waits until all sensor data have been produced, before executing the getFree operation; thereby, guaranteeing synchronisation between the Availability Checking Service and all Sensor Services. Situations like this can be common since control generally travels faster than data over the network, and they may happen even in sequential workflows. This implies that an operation can be executing while control waits in the invocation connector of the next service in a pipeline.

6. DISTRIBUTED DATA FLOWS VERSUS DECENTRALIZED DATA FLOWS IN THE SMART PARKING SCENARIO

This section presents a quantitative comparison between the proposed approach (when data are separated from control) and distributed data flows (when data follow control), in terms of network traffic generated and response time perceived. Our evaluation is based on two OMNET++ simulations in which composition operators and invocation connectors are OMNET++ modules, whereas control flow and data flow are channels between modules.6 For fairness, we assume ideal wireless channels communicating over IEEE 802.11a links at a rate of 54Mbps, with no delay and no bit error rate. The configuration of our simulations is shown in Figure 20.

OMNET++ configuration for the smart parking system. The simulation on distributed data flows do not use the channels between invocation connectors, whereas the simulation on decentralized data flows use all visible channels.
Figure 20

OMNET++ configuration for the smart parking system. The simulation on distributed data flows do not use the channels between invocation connectors, whereas the simulation on decentralized data flows use all visible channels.

An OMNET++ experiment simulates drivers using the smart parking application between 07:59:45 and 16:26:47 (Appendix  A). In all the experiments, we assume that there is no network traffic for invoking operations, since invocation connectors reside in the same device as their connected operations. Also, composition operators and invocation connectors do not incur any processing time since they only forward data or control. The latter is just a 1-byte signal that activates an operator or a connector.

For simplicity and clarity, we also assume that IoT devices hosting atomic services have a fixed data processing time of 100μ seconds. This assumption is valid since having equal data processing rates does not influence the network traffic generated by atomic services.

Considering the above assumptions, the first experiment simulates decentralized data flows in the smart parking system. Here, control is exchanged among composition operators and invocation connectors, while data are passed between producer connectors and consumer connectors via ideal OMNET++ channels (Fig. 20). These channels simulate read-write operations on a decentralized space (regardless its implementation). These actions are performed by invocation connectors on behalf of their connected operations. In this experiment, the total network traffic from 07:59:45 to 16:26:47 is 51.93MB (Table 2). The additional traffic with respect to the calculation described in Appendix  B corresponds to the bytes required for control exchange (i.e. 0.44MB).7 This extra traffic is minimal since control is just a 1-byte signal that does not carry any additional data.

Table 2

Network traffic in the smart parking scenario (from 07:59:45 to 16:26:47).

Data traffic (MB)Control traffic (MB)Total (MB)
Decentralized51.490.4451.93
Distributed140.410.22140.63
Data traffic (MB)Control traffic (MB)Total (MB)
Decentralized51.490.4451.93
Distributed140.410.22140.63
Table 2

Network traffic in the smart parking scenario (from 07:59:45 to 16:26:47).

Data traffic (MB)Control traffic (MB)Total (MB)
Decentralized51.490.4451.93
Distributed140.410.22140.63
Data traffic (MB)Control traffic (MB)Total (MB)
Decentralized51.490.4451.93
Distributed140.410.22140.63

The second simulation considers a distributed version of the smart parking scenario, in which data pass alongside control through invocation connectors and composition operators. The only exception is when control traverses from T to G, and from I to the invocation connectors 1 and 577 (Fig. 21(b)). This is because the GPS Service and the Sensor Services do not require any input data from the CarControlApp composite (Fig. 18(a)). Using distributed data flows, the total network traffic from 07:59:45 to 16:26:47 is 140.63MB (Table 2).

Decentralized data flows versus distributed data flows in the smart parking scenario.
Figure 21

Decentralized data flows versus distributed data flows in the smart parking scenario.

Although our approach requires 50% more control messages than its counterpart, the network traffic is dramatically minimized by a rate of 2.71 times, i.e. (151.93140.63)10063.07% less network traffic than the distributed version. Particularly, the main benefit occurs when data are passed directly from the Reservation Service to the Mapping Service, since the distributed approach requires four network hops whereas our approach requires only one (Fig. 21).

Reducing network traffic improves the overall response time of the smart parking system by an average factor of 4. This factor is computed as follows:

(3)

where ψiα is the response time perceived by driver i when requesting a parking space under distributed data flows and ψiβ is the response time perceived by driver i when requesting a parking space under decentralized data flows. The response times (in seconds) are individually collected from our simulations for the 373 drivers arriving between 07:59:45 and 16:26:47.8

As an example, Figure 21 shows the time window for the driver’s request at 08:26 (i.e. 1621 seconds after 07:59:45), where it is clear that the system is executed in 0.08 seconds using the distributed approach and in 0.02 seconds with the decentralized version. This improvement is due to the fact that data and control travel independently over the network when decentralized data flows are used, so that control arrives before data in certain cases (e.g. see the timeline of R) and data arrives before control in others (e.g. see the timeline of M).

Table 2 presents the results of a quantitative evaluation that considers a few services. However, the benefit of our approach becomes more evident as the number of services increases. To evaluate this property, known as functional scalability, we conducted an experiment in which we gradually increase the data being processed by the smart parking application through the continuous addition of sensor services (up to 100 000). The aim of this experiment is to analyse network traffic in the order of Gigabytes. The results are shown in Figure 22.

Impact of increasing the number of sensor services in the smart parking scenario.
Figure 22

Impact of increasing the number of sensor services in the smart parking scenario.

Figure 22 shows that our approach outperforms distributed data flows in terms of data traffic, whereas the distributed version outperforms our approach in terms of control traffic. Despite this, the total network traffic is linearly improved by our approach since control messages are minimal (i.e. they always oscillate below 0.08 GB). Consequently, the total traffic and the data traffic are almost identical (cf., Fig. 22(a) and (c)).

The next section presents a generic evaluation of functional scalability that is independent of any implementation or specific case studies. In this evaluation, we do not only increase services horizontally (for the same composite) but also vertically (by increasing the number of composition levels).

7. EVALUATION OF FUNCTIONAL SCALABILITY

This section presents a comparative evaluation between decentralized data flows (when control flow and data flow are separated—Fig. 8(b)) and distributed data flows (when data follows control—Fig. 8(a)), in terms of functional scalability. To do so, two major research questions are studied: (RQ1) Does the proposed approach scale with the number of services? and (RQ2) Under which conditions are decentralized data flows beneficial?

To answer the above questions, we conducted a series of experiments based on the following statements.

 

Definition 7.1.

Gα=(Vα,Eα,Ωα) is a weighted graph of distributed data flows in which Vα is a set of parameters, Eα is a set of directed data flows (between parameters) and Ωα:EαR>0 is a function that maps a distributed data flow to a network communication cost s.t. Vα(VpVc).

 

Remark 1.

Vp is the type of a producer parameter and Vc is the type of a consumer parameter.

 

Definition 7.2.

M=e1,,el is the type of a finite walk in Gα for moving a data value through l data flows, from a producer parameter v1Vp to a consumer parameter vl+1Vc s.t. ej[1,l]Eα and vj[1,l+1]Vα, p(VαVp)  M.

 

Definition 7.3.
Given Gα, α:MR>0 is the function that computes the total network cost of routing data from a producer parameter p to a consumer parameter q, via distributed data flows. The function α is given as follows:
where MM, ej[1,|E(M)|]E(M), E(M)Eα, p(VαVp) and q(VαVc).

 

Notation 8.

Let ω:VpR>0 be the function that computes the network cost of writing/producing a parameter value on a data space.

 

Notation 9.

Let Γ:VcR>0 be the function that computes the network cost of reading/consuming a parameter value from a data space.

 

Definition 7.4.
Gβ=(Vβ,Eβ,Ωβ) is a weighted graph of decentralized data flows in which Vβ is a set of parameters, Eβ is a set of directed data flows (between parameters) and Ωβ:EβR>0 is a function that maps a decentralized data flow to a network communication cost s.t. Vβ(VpVc). The function Ωβ is defined as follows:
where eEβ, Π1β(e)(VβVp) and Π2β(e)(VβVc).

 

Remark 2.

The functions Π1β:EβVβ and Π2β:EβVβ are the tail map and the head map of an edge in Gβ, respectively.

 

Definition 7.5.
Given Gβ, β:EβR>0 is the total network cost of routing data from a producer parameter p to a consumer parameter q, via a decentralized data flow. The function β is notably given as follows:
where eEβ, p=Π1β(e), q=Π2β(e), p(VβVp) and q(VβVc).

We assume that Ωα, Γ and ω include all network communication factors, such as marshalling, unmarshalling, latency and bandwidth, just to name a few.

7.1. Initial experimental setting

The worst performance of a distributed approach occurs when all composite services are sequential. So, for our evaluation we consider the DX-MAN composition shown in Figure 23, which has four hierarchy levels, four atomic services and three composites. The sequential workflow variants selected for each composite and their respective data forests are also presented in the figure.

Initial experimental setting.
Figure 23

Initial experimental setting.

Figure 24 shows the resulting workflow control flow and the data exchange approaches that we use in our experiments. For the distributed approach (Fig. 24(b)), we consider the graph Gα in which data follows control and passes through workflow parameters via nine data flow edges with a Ωα(ej) cost each s.t. ej[1,9]Eα. Particularly, Gα has five data flow walks since the number of producer parameters is directly proportional to the number of walks: MiG=e1M, MoA=e2,e3M, MoB=e4,e5M, MoC=e6M and MoD=e7,e8,e9M s.t. iG,oA,oB,oC,oD(VαVp) and ej[1,9]Eα.

Distributed data flows versus decentralized data flows.
Figure 24

Distributed data flows versus decentralized data flows.

For the decentralized approach (Fig. 24(c)), we consider the graph Gβ in which data are separated from control and passes from producer parameters to consumer parameters directly. In this case, there are five data flows each with a network cost of Ωβ(ej) s.t. ej[1,5]Eβ. For example, the cost of routing the data produced by oA is Ωβ((oA,iB))=ω(oA)+Γ(iB). As Gβ is constructed with Algorithm 1 and does not consider any data exchanges through workflow parameters, the number of edges is directly proportional to the number of producer parameters (i.e. |Eβ|=|VβVp|=5) and the number of vertices is twice the number of producers (i.e. |Vβ|=2|VβVp|=10). Therefore, |Vβ|<|Vα|  |Eβ|<|Eα|  |Vα|=14  |Eα|=9.

7.2. RQ1: does the proposed approach scale with the number of services?

We conducted an experiment to dynamically increase the number of composites in the (four-level) composition shown in Figure 23. The experiment is carried out in τ=200000 steps with |Eα|(N[9,f(τ)]) and |Eβ|(N[5,g(τ)]) s.t. f(c)=4c+9 is the total number of distributed data flows and g(c)=2c+5 is the total number of decentralized data flows, when there are c(N[0,τ]) new composites in SF.

At each step of the experiment, we compose a replica of SE into SF and define a sequential workflow variant for the invocation of the operations opC and opD (Fig. 23). At the end of the experiment, there is a data pipeline in Gα for τ+1 composites in SF. In the case of Gβ, there are additional data flows between the parameters of the atomic services added.

As they capture the growth pattern of data flows, we use the functions f(c) and g(c) to analytically compare |Eα| and |Eβ| as the number of composites increases. Figure 25 shows the result of our analysis under the assumption that the weight functions of Eα and Eβ are Ωα:Eα{1} and Ωβ:Eβ{1}. On that basis, it is clear that the number of data exchanges grows linearly with the number of composite services and that the number of distributed data flows grows twice faster than its counterpart. The latter can be strictly explained by the average rate of change ΔfΔg=2 in the interval [0,τ].

Impact of data flows when increasing the number of composite services in $S_F$.
Figure 25

Impact of data flows when increasing the number of composite services in SF.

7.3. RQ2: under which conditions are decentralized data flows beneficial?

We conducted two experiments to determine if there is a benefit of decentralized data flows as the number of levels increases in a DX-MAN composition. To do so, we consider the total network costs of passing the output oD (of the atomic service SD) to an output oT (of a top-level composite), using both the distributed approach and the decentralized one s.t. oTVc. For both experiments, we assume that im(Ωα),im(ω),im(Γ)R(0,1].

For each experiment, we compute ten samples over 100 levels each. At each sample, we increase the number of composition levels by 1 and define random network costs for Eα and Eβ using Ωα and Ωβ, respectively. The decentralized approach yields a graph with the same number of edges no matter the composition levels, whereas the distributed approach increases the number of edges by 1 at every level. Thus, |MoD| also increases by 1. The improvement rate of the decentralized approach is given by 1β((oD,oT))α(MoD).

Figure 26 shows the result of our experiments, where it is clear that the benefit of decentralized data flows is logarithmic and, as such, it becomes more evident as the number of levels of a DX-MAN composition increases. This is because |MoD| increases at every level and so α(MoD). In the experiments, we allocate network costs using random values uniformly distributed in (0,1]R. However, if the cost of performing operations on a data space (i.e. β((oD,oT))) is more expensive than the total cost of routing data through composite coordinators (i.e. α(MoD)), the decentralized approach is outperformed by the distributed one. Thus, a DX-MAN composition only benefits from the proposed approach iff β((oD,oT))<α(MoD).

Improvement rate of decentralized data flows when increasing the number of levels in a DX-MAN composition: $\xi _1$ and $\xi _2$ correspond to Experiment 1 and Experiment 2, respectively.
Figure 26

Improvement rate of decentralized data flows when increasing the number of levels in a DX-MAN composition: ξ1 and ξ2 correspond to Experiment 1 and Experiment 2, respectively.

8. CONCLUSIONS

In this paper, we presented a functionally scalable approach that semantically separates control and data in DX-MAN compositions for the realization of decentralized data flows in service-oriented IoT systems. At design-time, the algebraic semantics of the model allows the definition of data forests that denote data dependencies between service operation parameters. At deployment-time, data forests are (manually) refined and (automatically) analysed by an algorithm that leverages the separation of control and data. The algorithm works for any workflow in a composite workflow space and produces a direct mapping from consumer parameters to producer parameters. Thus, preventing coordinators (i.e. exogenous operators) from passing data at run-time. In fact, exogenous operators coordinate an IoT system execution by passing control only. To realize our approach, we implemented it on top of the DX-MAN platform, which uses the Blockchain as the underlying data space for persisting direct mappings and managing data at run-time. It is important to mention that our approach is not exclusive to a specific data space implementation (e.g. Blockchain or ODS). For that reason, we evaluated the benefits of the proposed approach rather than the benefits of a particular implementation.

Unfortunately, as other composition models have their own constructs and do not define composition operators like DX-MAN, it is not possible to extend other composition model semantics using our proposed approach. Instead, a custom extension per model would be required to enable the semantic separation of data and control. Another interesting future direction is to explore the possibility of incorporating stateful services into the semantics of DX-MAN.

DX-MAN is a service composition model that semantically separates data, control and computation for separate reasoning, monitoring, maintenance and evolution of such dimensions. More concretely, this separation allows passing data from producer parameters to consumer parameters directly and enables the use of distinct technologies to manage control flows and data flows separately. For example, in our implementation we use CoAP to pass control between exogenous operators and the Blockchain for handling data flows.

Our experimental results confirm that our approach scales well with the number of services, by reducing the number of data flows by an average factor of two with respect to a distributed approach (where data follows control). They also show that our approach scales well with the number of levels of a DX-MAN composition. From our results, we found that the proposed approach provides the best performance when the cost of performing operations on a data space is less than the cost of exchanging data over the network. Thus, our solution is potentially beneficial for large-scale IoT systems in which loosely coupled services exchange huge amounts of data continuously.

DATA AVAILABILITY

The data underlying this article are available in UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham#.

FUNDING

This work was partially supported by the I-BiDaaS project, funded by the European Union's Horizon 2020 Research and Innovation programme under grant agreement No. 780787.

CONFLICT OF INTEREST

The authors declare that they have no known conflict of interest.

Footnotes

1

For the rest of the paper, the terms IoT system and service-oriented IoT system are used interchangeably.

2

Section 2 provides a running example to describe the problem of coordinated data exchanges when using exogenous composition.

3

As a DX-MAN workflow is semantically equivalent to an orchestration, a DX-MAN composite is equivalent to a (potentially) infinite number of orchestrations.

4

A pipeline of different technologies can be used to efficiently implement data exchanges between services [51].

7

For increased clarity in our motivation example (Section 2), the calculations in Appendix  B do not consider control flow.

8

Appendix  A presents the computation for the number of drivers.

REFERENCES

[1]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2020
)
Evaluating IoT service composition mechanisms for the scalability of IoT systems
.
Fut. Gener. Comput. Syst.
,
108
,
827
848
.

[2]

Sarkar
,
C.
,
Nambi
,
A.U.
,
Prasad
,
R.V.
,
Rahim
,
A.
,
Neisse
,
R.
and
Baldini
,
G.
(
2015
)
DIAT: a scalable distributed architecture for IoT
.
IEEE Internet Things J.
,
2
,
230
239
.

[3]

Newman
,
P.
(
2020
). The Internet of Things  
2020
. https://www.businessinsider.com/internet-of-things-report?r=US&IR=T (
accessed December 14, 2021
).

[4]

Arellanes
,
D.
(
2021
)
Self-organizing software models for the internet of things: complex software structures that emerge without a central controller
.
IEEE Syst. Man Cybernet. Mag.
,
7
,
4
9
.

[5]

Botta
,
A.
,
de
 
Donato
,
W.
,
Persico
,
V.
and
Pescapé
,
A.
(
2016
)
Integration of cloud computing and internet of things: A survey
.
Fut. Gener. Comput. Syst.
,
56
,
684
700
.

[6]

Giang
,
N.K.
,
Lea
,
R.
and
Leung
,
V.C.M.
(
2020
)
Developing applications in large scale, dynamic fog computing: a case study
.
Software
,
50
,
519
532
.

[7]

Paniagua
,
C.
,
Eliasson
,
J.
and
Delsing
,
J.
(
2020
)
Efficient device-to-device service invocation using arrowhead orchestration
.
IEEE Internet Things J.
,
7
,
429
439
.

[8]

Hahn
,
M.
,
Breitenbücher
,
U.
,
Kopp
,
O.
and
Leymann
,
F.
(
2018
)
Modeling and execution of data-aware choreographies: an overview
.
Comput. Sci.
,
33
,
329
340
.

[9]

Jaradat
,
W.
,
Dearle
,
A.
and
Barker
,
A.
(
2016
)
Towards an autonomous decentralized orchestration system
.
Concurr. Comput.
,
28
,
3164
3179
.

[10]

Pantazoglou
,
M.
,
Pogkas
,
I.
and
Tsalgatidou
,
A.
(
2014
)
Decentralized enactment of BPEL processes
.
IEEE Trans. Serv. Comput.
,
7
,
184
197
.

[11]

Barker
,
A.
,
Weissman
,
J.B.
and
Van Hemert
,
J.I.
(
2012
)
Reducing data transfer in service-oriented architectures: the circulate approach
.
IEEE Trans. Serv. Comput.
,
5
,
437
449
.

[12]

Sonntag
,
M.
,
Gorlach
,
K.
,
Karastoyanova
,
D.
,
Leymann
,
F.
and
Reiter
,
M.
(
2010
)
Process space-based scientific workflow enactment
.
Int. J. Bus. Process Integr. Manage.
,
5
,
32
44
.

[13]

Barker
,
A.
,
Walton
,
C.D.
and
Robertson
,
D.
(
2009
)
Choreographing web services
.
IEEE Trans. Serv. Comput.
,
2
,
152
166
.

[14]

Binder
,
W.
,
Constantinescu
,
I.
and
Faltings
,
B.
(
2009
)
Service invocation triggers: A lightweight routing infrastructure for decentralised workflow orchestration
.
Int. J. High Performance Comput. Netw.
,
6
,
81
90
.

[15]

Hahn
,
M.
,
Breitenbücher
,
U.
,
Leymann
,
F.
,
Wurster
,
M.
, and
Yussupov
,
V.
(
2018
)
Modeling Data Transformations in Data-Aware Service Choreographies
.
International Enterprise Distributed Object Computing Conference (EDOC)
,
Stockholm, Sweden
, pp.
28
34
.
IEEE
.

[16]

Do
,
T.-X.
and
Kim
,
Y.
(
2017
)
Control and data plane separation architecture for supporting multicast listeners over distributed mobility management
.
ICT Express
,
3
,
90
95
.

[17]

Mohamed
,
A.
,
Onireti
,
O.
,
Imran
,
M.A.
,
Imran
,
A.
and
Tafazolli
,
R.
(
2016
)
Control-data separation architecture for cellular radio access networks: a survey and outlook
.
IEEE Commun. Surv. Tutor.
,
18
,
446
465
.

[18]

Filippini
,
I.
,
Redondi
,
A.E.C.
and
Capone
,
A.
(
2017
)
Beyond cellular green generation: potential and challenges of the network separation
.
Mobile Inform. Syst.
,
13
,
1
11
.

[19]

Liu
,
D.
(
2002
) Data-flow Distribution in FICAS Service Composition Infrastructure.
International Conference on Parallel and Distributed Computing Systems (PDCS)
, ISCA,
Lousville, KY
, pp.
1
6
.

[20]

Barker
,
A.
,
Weissman
,
J. B.
, and
van
 
Hemert
,
J.
(
2008
) Eliminating the middleman: peer-to-peer dataflow.
International Symposium on High Performance Distributed Computing (HPDC)
,
Boston, MA
,
June
, p.
55
64
.
ACM
.

[21]

Barker
,
A.
,
Weissman
,
J. B.
, and
van
 
Hemert
,
J.
(
2008
) Orchestrating data-centric workflows.
International Symposium on Cluster Computing and the Grid (CCGRID)
,
Lyon, France
, pp.
210
217
.
IEEE
.

[22]

Aalst
,
W.M.P.V.D.
,
Aldred
,
L.
,
Dumas
,
M.
and
Hofstede
,
A.H.M.T.
(
2004
) Design and implementation of the YAWL system. In
Persson
,
A.
,
Stirna
,
J.
(eds)
Advanced Information Systems Engineering, Lecture Notes in Computer Science
(Vol.
3084
).
Springer
,
Berlin, Heidelberg
.

[23]

OASIS
(
2007
).
Web services business process execution language version 2.0
. http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.html (
accessed December 14, 2021
).

[24]

Morrison
,
J.P.
(
1978
)
Data stream linkage mechanism
.
IBM Syst. J.
,
17
,
383
408
.

[25]

Kahn
,
G.
and
Macqueen
,
D.
(
1977
) Coroutines and networks of parallel processes.
International Federation for Information Processing (IFIP)
,
Toronto, Canada
, pp.
993
998
.
IFIP
.

[26]

Cherrier
,
S.
,
Salhi
,
I.
,
Ghamri-Doudane
,
Y.M.
,
Lohier
,
S.
and
Valembois
,
P.
(
2014
)
BeC 3: behaviour crowd centric composition for IoT applications
.
Mobile Netw. Appl.
,
19
,
18
32
.

[27]

Seeger
,
J.
,
Deshmukh
,
R.A.
,
Sarafov
,
V.
and
Bröring
,
A.
(
2019
)
Dynamic IoT choreographies
.
IEEE Pervasive Comput.
,
18
,
19
27
.

[28]

Macker
,
J.P.
and
Taylor
,
I.
(
2017
)
Orchestration and analysis of decentralized workflows within heterogeneous networking infrastructures
.
Fut. Gener. Comput. Syst.
,
75
,
388
401
.

[29]

Giang
,
N.K.
,
Lea
,
R.
and
Leung
,
V.C.M.
(
2018
)
Exogenous coordination for building fog-based cyber physical social computing and networking systems
.
IEEE Access
,
6
,
31740
31749
.

[30]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2018
) Analysis and classification of service interactions for the scalability of the internet of things.
International Congress on Internet of Things (ICIOT)
,
San Francisco, CA
, pp.
80
87
.
IEEE
.

[31]

Autili
,
M.
,
Inverardi
,
P.
and
Tivoli
,
M.
(
2018
)
Choreography realizability enforcement through the automatic synthesis of distributed coordination delegates
.
Sci. Comput. Progr.
,
160
,
3
29
.

[32]

Wutke
,
D.
,
Martin
,
D.
, and
Leymann
,
F.
(
2008
) Model and infrastructure for decentralized workflow enactment.
ACM Symposium on Applied Computing (SAC)
,
Fortaleza, Ceara, Brazil
, pp.
90
94
.
ACM
.

[33]

Giang
,
N. K.
,
Blackstock
,
M.
,
Lea
,
R.
, and
Leung
,
V. C. M.
(
2015
) Developing IoT applications in the Fog: A Distributed Dataflow approach.
International Conference on the Internet of Things (IOT)
,
Seoul, Korea (South)
, pp.
155
162
.
IEEE
.

[34]

Arbab
,
F.
(
2005
)
Abstract behavior types: A foundation model for components and their composition
.
Sci. Comput. Progr.
,
55
,
3
52
.

[35]

Ghanem
,
M.
,
Curcin
,
V.
,
Wendel
,
P.
and
Guo
,
Y.
(
2008
) Building and using analytical workflows in discovery net. In
Dubitzky
,
W.
(ed)
Data Mining Techniques on the Grid
.
Wiley Publishing
,
New York
.

[36]

Ludäscher
,
B.
,
Altintas
,
I.
,
Berkley
,
C.
,
Higgins
,
D.
,
Jaeger
,
E.
,
Jones
,
M.
,
Lee
,
E.A.
,
Tao
,
J.
and
Zhao
,
Y.
(
2005
)
Scientific workflow management and the Kepler system
.
Concurr. Comput.
,
18
,
1039
1065
.

[37]

Decker
,
G.
,
Kopp
,
O.
and
Barros
,
A.
(
2008
)
An introduction to service choreographies
.
Inform. Technol.
,
52
,
122
127
.

[38]

Open JS Foundation
(
2020
)
Node-RED: flow-based programming for the internet of things
. https://www.nodered.org (accessed December 14, 2021).

[39]

Ngu
,
A.
,
Gutierrez
,
M.
,
Metsis
,
V.
,
Nepal
,
S.
and
Sheng
,
Q.
(
2017
)
IoT middleware: A survey on issues and enabling technologies
.
IEEE Internet Things J.
,
4
,
1
20
.

[40]

Xue
,
G.
,
Liu
,
D.
,
Liu
,
J.
and
Yao
,
S.
(
2019
)
A process partitioning technique for constructing decentralized web service compositions
.
Softw.: Pract. Exp.
,
49
,
1550
1570
.

[41]

Cheng
,
B.
,
Solmaz
,
G.
,
Cirillo
,
F.
,
Kovacs
,
E.
,
Terasawa
,
K.
and
Kitazawa
,
A.
(
2018
)
FogFlow: easy programming of IoT services over cloud and edges for smart cities
.
IEEE Internet Things J.
,
5
,
696
707
.

[42]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2019
)
Workflow Variability for Autonomic IoT Systems
.
International Conference on Autonomic Computing (ICAC)
,
Umea, Sweden
, pp.
24
30
.
IEEE
.

[43]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2018
) Algebraic service composition for user-centric IoT applications. In
Georgakopoulos
,
D.
,
Zhang
,
L.-J.
(eds)
Internet of Things - ICIOT 2018
.
Lecture Notes in Computer Science
(Vol.
10972
).
Springer International Publishing
,
Cham
.

[44]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2017
) Exogenous Connectors for Hierarchical Service Composition.
International Conference on Service-Oriented Computing and Applications (SOCA)
,
Kanazawa, Japan
, pp.
125
132
.
IEEE
.

[45]

Lau
,
K.-K.
and
Di Cola
,
S.
(
2017
)
An Introduction to Component-based Software Development
(1st edn).
World Scientific
,
Singapore
.

[46]

Rana
,
T.
,
Bangash
,
Y.A.
,
Baz
,
A.
,
Rana
,
T.A.
and
Imran
,
M.A.
(
2020
)
Incremental composition process for the construction of component-based management systems
.
Sensors
,
20
,
1
18
.

[47]

Arbab
,
F.
,
Autili
,
M.
,
Inverardi
,
P.
and
Tivoli
,
M.
(
2019
) Different glasses to look into the three cs: component, connector, coordination. In
Boreale
,
M.
,
Corradini
,
F.
,
Loreti
,
M.
,
Pugliese
,
R.
(eds)
Models, Languages, and Tools for Concurrent and Distributed Programming
.
Springer International Publishing
,
Cham
.

[48]

Netflix
(
2020
). Conductor. https://netflix.github.io/conductor/ (
accessed December 14, 2021
).

[49]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2019
) Decentralized data flows in algebraic service compositions for the scalability of IoT systems.
World Forum on Internet of Things (WF-IoT)
,
Limerick, Ireland
,
April
, pp.
668
673
.
IEEE
.

[50]

Arellanes
,
D.
and
Lau
,
K.-K.
(
2017
) D-XMAN: a platform for total compositionality in service-oriented architectures.
International Symposium on Cloud and Service Computing (SC2)
,
Kanazawa, Japan
, pp.
283
286
.
IEEE
.

[51]

Fu
,
Y.
and
Soman
,
C.
(
2021
) Real-time data infrastructure at Uber.
International Conference on Management of Data (SIGMOD/PODS)
,
China
, pp.
2503
2516
.
ACM
.

[52]

Stolfi
,
D.H.
,
Alba
,
E.
and
Yao
,
X.
(
2017
) Predicting car park occupancy rates in smart cities. In
Alba
,
E.
,
Chicano
,
F.
,
Luque
,
G.
(eds)
Smart Cities
.
Lecture Notes in Computer Science
(Vol.
10268
).
Springer International Publishing
,
Cham
.

[53]

Morrison
,
J.P.
(
2010
)
Flow-Based Programming: A New Approach to Application Development
(2nd edn).
CreateSpace
,
USA
.

[54]

Hahn
,
M.
,
Breitenbucher
,
U.
,
Leymann
,
F.
and
Weiss
,
A.
(
2017
) TraDE - a transparent data exchange middleware for service choreographies. In
Panetto
,
H.
,
Debruyne
,
C.
,
Ardagna
,
C.A.
,
Gaaloul
,
W.
,
Papazoglou
,
M.
,
Paschke
,
A.
,
Meersman
,
R.
(eds)
On the Move to Meaningful Internet Systems
.
Lecture Notes in Computer Science
(Vol.
10573
).
Springer International Publishing
.

[55]

Guimaraes
,
F. P.
,
Kuroda
,
E. H.
, and
Batista
,
D. M.
(
2012
)
Performance Evaluation of Choreographies and Orchestrations with a New Simulator for Service Compositions
.
International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD)
,
Barcelona, Spain
, pp.
140
144
.
IEEE
.

[56]

IoT
 
Analytics
(
2018
). State of the IoT 2018: Number of IoT devices now at 7B - Market accelerating. https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/ (
accessed December 14, 2021
).

[57]

Nanda
,
M. G.
,
Chandra
,
S.
, and
Sarkar
,
V.
(
2004
) Decentralizing execution of composite web services.
ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA)
,
Vancouver, BC, Canada
, pp.
170
187
.
ACM
.

[58]

Kleinfeld
,
R.
,
Steglich
,
S.
,
Radziwonowicz
,
L.
, and
Doukas
,
C.
(
2014
) Glue.things: a mashup platform for wiring the internet of things with the internet of services.
International Workshop on Web of Things (WoT)
,
Cambridge, MA
, pp.
16
21
.
ACM
.

[59]

Chafle
,
G.
,
Chandra
,
S.
,
Mann
,
V.
, and
Nanda
,
M. G.
(
2004
) Decentralized orchestration of composite web services.
International World Wide Web conference (WWW)
,
New York, NY
, pp.
134
143
.
ACM
.

[60]

Zhou
,
J.
,
Leppanen
,
T.
,
Harjula
,
E.
,
Ylianttila
,
M.
,
Ojala
,
T.
,
Yu
,
C.
,
Jin
,
H.
, and
Yang
,
L. T.
(
2013
) CloudThings: a common architecture for integrating the internet of things with cloud computing.
Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD)
,
Whistler, BC, Canada
, pp.
651
657
.
IEEE
.

A. APPENDIX

This appendix presents the notation required for the analysis of the IoT scenario described in Section 2. This scenario considers a real-world data set publicly available at https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham#, which collects occupancy status in car parks operated by the Birmingham City Council, from 07:59:45 to 16:26:47 on October 22, 2016 [52]. Table A3 presents the data obtained for the car park BHMBCCMKT01 whose total capacity is 577. It particularly shows the measurement time, the t seconds elapsed after 07:59:45, the number of occupied spaces O(t) and the rate of available parking spaces (calculated by 1O(t)577).

Table A.1

Occupancy in the car park BHMBCCMKT01.

Measurement timetO(t)Availability rate
07:59:450570.90
08:26:461621590.90
08:59:443599720.88
09:26:475222930.84
09:59:4471991300.77
10:26:4788221560.73
10:59:47108021980.66
11:32:43127782390.59
11:59:46144012610.55
12:32:47163823240.44
12:59:47180023590.38
13:32:43199783800.34
13:59:48216034080.29
14:39:47240024300.25
14:59:48252034240.27
15:26:49268244030.30
15:59:47288023840.33
16:26:47304223400.41
Measurement timetO(t)Availability rate
07:59:450570.90
08:26:461621590.90
08:59:443599720.88
09:26:475222930.84
09:59:4471991300.77
10:26:4788221560.73
10:59:47108021980.66
11:32:43127782390.59
11:59:46144012610.55
12:32:47163823240.44
12:59:47180023590.38
13:32:43199783800.34
13:59:48216034080.29
14:39:47240024300.25
14:59:48252034240.27
15:26:49268244030.30
15:59:47288023840.33
16:26:47304223400.41
Table A.1

Occupancy in the car park BHMBCCMKT01.

Measurement timetO(t)Availability rate
07:59:450570.90
08:26:461621590.90
08:59:443599720.88
09:26:475222930.84
09:59:4471991300.77
10:26:4788221560.73
10:59:47108021980.66
11:32:43127782390.59
11:59:46144012610.55
12:32:47163823240.44
12:59:47180023590.38
13:32:43199783800.34
13:59:48216034080.29
14:39:47240024300.25
14:59:48252034240.27
15:26:49268244030.30
15:59:47288023840.33
16:26:47304223400.41
Measurement timetO(t)Availability rate
07:59:450570.90
08:26:461621590.90
08:59:443599720.88
09:26:475222930.84
09:59:4471991300.77
10:26:4788221560.73
10:59:47108021980.66
11:32:43127782390.59
11:59:46144012610.55
12:32:47163823240.44
12:59:47180023590.38
13:32:43199783800.34
13:59:48216034080.29
14:39:47240024300.25
14:59:48252034240.27
15:26:49268244030.30
15:59:47288023840.33
16:26:47304223400.41

 

Notation 10.
Let T={0,1621,3599,5222,7199,8822,10802,12778,14401,16382,18002,19978,21603,24002,25203,26824,28802,30422} be the set of measurement times in our sample. Then, ΔO(ti,ti+1) is the function that computes the number of drivers arriving to the car park BHMBCCMKT01 between ti and ti+1 seconds:
(4)
s.t. i=1|T|1ΔO(ti,ti+1)=373 and ti,ti+1T. This means that there are 373 drivers arriving to the car park BHMBCCMKT01 between 07:59:45 and 16:26:47.

Considering the columns 2 and 4 from our data set, we performed a cubic regression to approximate a function A(t) for the rate of available parking spaces at t seconds. The resulting equation has an average relative error of 1.98% and it is defined as follows:

where 0t30422.

The probability of a parking space being unreserved at t seconds is then A(t). Following the same trend of demand for parking spaces, the rate of parking spaces that are both free and unreserved at t seconds is given by the function U(t) as follows:

where 0t30422.

B. APPENDIX

This appendix presents the result of the calculations for the total data transmitted over the network in our IoT scenario (Section 2 and Appendix  A). The calculations are presented for each data exchange approach, i.e. centralized data flows, distributed data flows and decentralized data flows. For all of them, we assume that:

  1. There are 577 occupancy sensors near a driver, corresponding to each space in the car park BHMBCCMKT01.

  2. Each Sensor Service is hosted in an occupancy sensor and produces 100 bytes.

  3. Once all sensor data are collected by a reducer, it is processed by the Availability Checking Service.

  4. The Mapping Service appends 20 bytes to each parking space that is both free and unreserved. These extra bytes correspond to the distance between the driver’s location and each occupancy sensor.

  5. The Space Finding Service produces 40 bytes for the coordinates of the nearest parking space.

  6. There is a driver requesting a nearest parking space every ti+1tiΔO(ti,ti+1) seconds between tiT and ti+1T (Notation  .15 and Equation 4).

  7. The GPS service produces 40 bytes for the coordinates of the driver.

  8. The driver always finds a nearest parking space, since the car park is never full (see O(t) in Table A1).

 

Notation 11.

Considering assumptions (1) and (2), let θ=577100=57700 be the constant for the total sensor data (in bytes) produced for a driver’s request.

For the sake of clarity and conciseness, we use the functions below to compute the number of bytes returned by the Availability Checking Service, the Reservation Service, the Mapping Service and the Space Finding Service, respectively.

 

Notation 12.

Let SA(t)=θA(t) be the function that computes the number of bytes returned by the Availability Checking Service. These bytes correspond to the list of parking spaces that are available t seconds after 07:59:45 s.t. 0t30422.

 

Notation 13.

Let SR(t)=θU(t) be the function that computes the number of bytes returned by the Reservation Service. These bytes correspond to the list of parking spaces that are both available and unreserved t seconds after 07:59:45 s.t. 0t30422.

 

Notation 14.

Considering assumptions (1) and (4), let SM(t)=SR(t)+577U(t)20 be the function that computes the number of bytes returned by the Mapping Service, where 0t30422. These bytes correspond to the list of parking spaces that are both available and unreserved t seconds after 07:59:45, plus the appended bytes for the distance between each parking space in the list and the driver’s location.

 

Notation 15.

Considering the assumption (5), let SS=40 be the constant for the number of bytes returned by the Space Finding Service, which represent the coordinates of the nearest parking space.

 

Notation 16.

Considering the assumption (7), let SG=40 be the constant for the number of bytes returned by the GPS Service, which represent the coordinates of the driver.

A centralized approach for exchanging data requires two data flows for moving data from a service producer to a service consumer. Therefore, Bγ denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a centralized approach is used:

(5)

For the distributed approach, each coordinator requires two data flows for passing data between a service producer and a service consumer. In addition, Coordinator1 requires a data flow for moving data to the Coordinator2 which, in turn, requires another data flow for passing data to the Coordinator3. Thus, the exchanges between the Availability Checking Service and the Reservation Service are done with three data flows. Likewise, there are three data flows for passing data from the Reservation Service to the Mapping Service. The symbol Bα denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a distributed approach is used:

(6)

A decentralized approach is the most efficient since it requires only one data flow for moving data from a service producer to a service consumer. Based on this premise, Bβ denotes the total data exchanged over the network from 07:59:45 to 16:26:47 when a decentralized approach is used:

(7)

Taking into account the above calculations, the distributed approach leads to (BαBγ1)10019.80% more network traffic than the centralized approach. Likewise, the decentralized approach produces (1BβBγ)10050.00% less network traffic than the centralized approach and (1BβBα)10058.26% less traffic than the distributed one.

C. APPENDIX

This appendix presents an example of how data forests are analysed in a bottom-up fashion using the Algorithm 1 at deployment-time. Our example considers the three-level data forest depicted in Figure C.2, which obeys the connection rules described in Section 3.2. Here, G is the forest of a parallel workflow that simultaneously executes the corresponding workflow for F and an atomic operation (with input iD and output oD). The workflow for F sequentially executes the workflow for E and then an atomic operation (with input iC and output oC). To pre-process data, F has a filter that takes the data produced by E before sending it to the corresponding operation. In our example, E is the lowest-level data forest defined by a branchial workflow that uses the input parameter iS to choose an operation to execute (out of two possible ones).

Deployment of the data forest shown in Figure C.2.
Figure B.1

Deployment of the data forest shown in Figure C.2.

Three-level data forest.
Figure C.2

Three-level data forest.

As described in Section 3.4, the deployment process begins at the bottom-level. So, we first analyse the edges of the branchial data forest E via the Algorithm 1. The resulting binary relation is shown in Figure B.1.(a).

In the next step, we analyse the edges of the sequential data forest F to yield the binary relation described in Figure B.1.(b). Here, we can notice that the relations from the output oE are now the relations from iP. This is because the filter input does not require to read from oE. Instead, the filter reads data directly from any of the actual producers, viz. the operation output oA or the operation output oB. Likewise, the input parameters iA, iB and iS do not read data from iE since a new (intermediate) top-level data producer has been found (i.e. iF). Complementarily, the operation input iC reads data directly from the filter output, whereas the (intermediate) top-level output oF reads data from oC.

Finally, in the last deployment step, the edges of the data forest G are analysed to yield the binary relation depicted in Figure B.1.(c). As iG is the new top-level data producer, the relations from consumers iA, iB and iS are updated accordingly. Similarly, as G is a parallel forest, the top-level output oG takes both the data source from oF (i.e. oC) and the output oD. At the end of the deployment process, we have a mapping scheme that discards unnecessary relations to allow data consumers read values directly from data producers.

D. APPENDIX

Figure D.3. presents the JavaScript source code of the three smart contracts used in our implementation. The source code follows the contract model definition described in Section 4.

Smart contracts logic.
Figure D.3

Smart contracts logic.

E. APPENDIX

This appendix presents the notation used throughout the paper.

E.1

IOT SCENARIO

O(t)Number of occupied parking spaces at t seconds after 07:59:45
TSet of measurement times (in seconds)
ΔO(ti,ti+1)Number of drivers requesting a nearest space between tiT and ti+1T seconds
A(t)Rate of free spaces at t seconds after 07:59:45
U(t)Rate of spaces that are both free and unreserved at t seconds after 07:59:45
θTotal sensor data (in bytes) produced for a driver’s request
SA(t)Number of bytes produced by the Availability Checking Service for a driver’s request
SR(t)Number of bytes produced by the Reservation Service for a driver’s request
SM(t)Number of bytes produced by the Mapping Service for a driver’s request
SSFixed number of bytes produced by the Space Finding Service
SGFixed number of bytes produced by the GPS Service
BγNumber of bytes exchanged in our scenario using a centralized approach
BαNumber of bytes exchanged in our scenario using a distributed approach
BβNumber of bytes exchanged in our scenario using a decentralized approach
O(t)Number of occupied parking spaces at t seconds after 07:59:45
TSet of measurement times (in seconds)
ΔO(ti,ti+1)Number of drivers requesting a nearest space between tiT and ti+1T seconds
A(t)Rate of free spaces at t seconds after 07:59:45
U(t)Rate of spaces that are both free and unreserved at t seconds after 07:59:45
θTotal sensor data (in bytes) produced for a driver’s request
SA(t)Number of bytes produced by the Availability Checking Service for a driver’s request
SR(t)Number of bytes produced by the Reservation Service for a driver’s request
SM(t)Number of bytes produced by the Mapping Service for a driver’s request
SSFixed number of bytes produced by the Space Finding Service
SGFixed number of bytes produced by the GPS Service
BγNumber of bytes exchanged in our scenario using a centralized approach
BαNumber of bytes exchanged in our scenario using a distributed approach
BβNumber of bytes exchanged in our scenario using a decentralized approach
E.1

IOT SCENARIO

O(t)Number of occupied parking spaces at t seconds after 07:59:45
TSet of measurement times (in seconds)
ΔO(ti,ti+1)Number of drivers requesting a nearest space between tiT and ti+1T seconds
A(t)Rate of free spaces at t seconds after 07:59:45
U(t)Rate of spaces that are both free and unreserved at t seconds after 07:59:45
θTotal sensor data (in bytes) produced for a driver’s request
SA(t)Number of bytes produced by the Availability Checking Service for a driver’s request
SR(t)Number of bytes produced by the Reservation Service for a driver’s request
SM(t)Number of bytes produced by the Mapping Service for a driver’s request
SSFixed number of bytes produced by the Space Finding Service
SGFixed number of bytes produced by the GPS Service
BγNumber of bytes exchanged in our scenario using a centralized approach
BαNumber of bytes exchanged in our scenario using a distributed approach
BβNumber of bytes exchanged in our scenario using a decentralized approach
O(t)Number of occupied parking spaces at t seconds after 07:59:45
TSet of measurement times (in seconds)
ΔO(ti,ti+1)Number of drivers requesting a nearest space between tiT and ti+1T seconds
A(t)Rate of free spaces at t seconds after 07:59:45
U(t)Rate of spaces that are both free and unreserved at t seconds after 07:59:45
θTotal sensor data (in bytes) produced for a driver’s request
SA(t)Number of bytes produced by the Availability Checking Service for a driver’s request
SR(t)Number of bytes produced by the Reservation Service for a driver’s request
SM(t)Number of bytes produced by the Mapping Service for a driver’s request
SSFixed number of bytes produced by the Space Finding Service
SGFixed number of bytes produced by the GPS Service
BγNumber of bytes exchanged in our scenario using a centralized approach
BαNumber of bytes exchanged in our scenario using a distributed approach
BβNumber of bytes exchanged in our scenario using a decentralized approach
E.2

The DX-MAN model

SThe type of a service
SSA service S
WThe type a workflow space
WSA workflow space W
The type of a composition operator
ZA composition operator defining a composite service Z
DThe type of a parameter
FA data forest F
V(F)The set of parameter vertices in a data forest F
vV(F)A parameter v in a data forest F
E(F)The set of edges in a data forest F
eE(F)An edge e (i.e. a pair of parameters) in a data forest F
SThe type of a service
SSA service S
WThe type a workflow space
WSA workflow space W
The type of a composition operator
ZA composition operator defining a composite service Z
DThe type of a parameter
FA data forest F
V(F)The set of parameter vertices in a data forest F
vV(F)A parameter v in a data forest F
E(F)The set of edges in a data forest F
eE(F)An edge e (i.e. a pair of parameters) in a data forest F
E.2

The DX-MAN model

SThe type of a service
SSA service S
WThe type a workflow space
WSA workflow space W
The type of a composition operator
ZA composition operator defining a composite service Z
DThe type of a parameter
FA data forest F
V(F)The set of parameter vertices in a data forest F
vV(F)A parameter v in a data forest F
E(F)The set of edges in a data forest F
eE(F)An edge e (i.e. a pair of parameters) in a data forest F
SThe type of a service
SSA service S
WThe type a workflow space
WSA workflow space W
The type of a composition operator
ZA composition operator defining a composite service Z
DThe type of a parameter
FA data forest F
V(F)The set of parameter vertices in a data forest F
vV(F)A parameter v in a data forest F
E(F)The set of edges in a data forest F
eE(F)An edge e (i.e. a pair of parameters) in a data forest F
E.3

Algorithm for edge analysis

Π1(e)The tail of a data forest edge e
Π2(e)The head of a data forest edge e
PIThe type of a processor input
POThe type of a processor output
OIThe type of an operation input
OOThe type of an operation output
WIThe type of a (top-level) workflow input
WOThe type of a (top-level) workflow output
EIThe type of an exogenous operator input
VcThe type of a vertex parameter that consumes data during workflow execution
vVcA consumer parameter v
VpThe type of a vertex parameter that produces data during workflow execution
vVcA producer parameter v
ςA binary relation between some consumer parameters and some producer parameters
dom(ς)The domain of the relation ς
ρA binary relation between some producer parameters and some consumer parameters
dom(ρ)The domain of the relation ρ
kThe sum of the number of edges in all the data forests involved in a multi-level composition
XpA set of producer parameters
YcA set of consumer parameters
Π1(e)The tail of a data forest edge e
Π2(e)The head of a data forest edge e
PIThe type of a processor input
POThe type of a processor output
OIThe type of an operation input
OOThe type of an operation output
WIThe type of a (top-level) workflow input
WOThe type of a (top-level) workflow output
EIThe type of an exogenous operator input
VcThe type of a vertex parameter that consumes data during workflow execution
vVcA consumer parameter v
VpThe type of a vertex parameter that produces data during workflow execution
vVcA producer parameter v
ςA binary relation between some consumer parameters and some producer parameters
dom(ς)The domain of the relation ς
ρA binary relation between some producer parameters and some consumer parameters
dom(ρ)The domain of the relation ρ
kThe sum of the number of edges in all the data forests involved in a multi-level composition
XpA set of producer parameters
YcA set of consumer parameters
E.3

Algorithm for edge analysis

Π1(e)The tail of a data forest edge e
Π2(e)The head of a data forest edge e
PIThe type of a processor input
POThe type of a processor output
OIThe type of an operation input
OOThe type of an operation output
WIThe type of a (top-level) workflow input
WOThe type of a (top-level) workflow output
EIThe type of an exogenous operator input
VcThe type of a vertex parameter that consumes data during workflow execution
vVcA consumer parameter v
VpThe type of a vertex parameter that produces data during workflow execution
vVcA producer parameter v
ςA binary relation between some consumer parameters and some producer parameters
dom(ς)The domain of the relation ς
ρA binary relation between some producer parameters and some consumer parameters
dom(ρ)The domain of the relation ρ
kThe sum of the number of edges in all the data forests involved in a multi-level composition
XpA set of producer parameters
YcA set of consumer parameters
Π1(e)The tail of a data forest edge e
Π2(e)The head of a data forest edge e
PIThe type of a processor input
POThe type of a processor output
OIThe type of an operation input
OOThe type of an operation output
WIThe type of a (top-level) workflow input
WOThe type of a (top-level) workflow output
EIThe type of an exogenous operator input
VcThe type of a vertex parameter that consumes data during workflow execution
vVcA consumer parameter v
VpThe type of a vertex parameter that produces data during workflow execution
vVcA producer parameter v
ςA binary relation between some consumer parameters and some producer parameters
dom(ς)The domain of the relation ς
ρA binary relation between some producer parameters and some consumer parameters
dom(ρ)The domain of the relation ρ
kThe sum of the number of edges in all the data forests involved in a multi-level composition
XpA set of producer parameters
YcA set of consumer parameters
E.4

Evaluation (preliminaries)

GαA weighted graph of distributed data flows
VαThe set of parameters in Gα
v(VαVp)A producer parameter v in Gα
v(VαVc)A consumer parameter v in Gα
EαThe set of directed data flows (between parameters) in Gα
eEαA distributed data flow
Ωα(e)The function that maps a distributed data flow eEα to a network communication cost
Ωα((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow
MThe type of a finite walk of data flows in Gα
MMA finite walk of data flows in Gα
α(M)The function that maps a finite walk MM to the total network cost of routing data from a producer parameter p(VαVp) to a consumer parameter q(VαVc)
ω(v)The network cost of writing a value produced by vVp
Γ(v)The network cost of reading a value for vVc
GβA weighted graph of decentralized data flows
VβThe set of parameters in Gβ
v(VβVp)A producer parameter v in Gβ
v(VβVc)A consumer parameter v in Gβ
EβThe set of directed data flows (between parameters) in Gβ
eEαA decentralized data flow
Ωβ(e)The function that maps a decentralized data flow eEβ to a network communication cost
Ωβ((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow
The operator for function composition
Π1β(e)The tail of a decentralized data flow eEβ
Π2β(e)The head of a decentralized data flow eEβ
β(e)The total network cost of routing data from a producer parameter Π1β(e) to a consumer parameter Π2β(e) s.t. eEβ
GαA weighted graph of distributed data flows
VαThe set of parameters in Gα
v(VαVp)A producer parameter v in Gα
v(VαVc)A consumer parameter v in Gα
EαThe set of directed data flows (between parameters) in Gα
eEαA distributed data flow
Ωα(e)The function that maps a distributed data flow eEα to a network communication cost
Ωα((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow
MThe type of a finite walk of data flows in Gα
MMA finite walk of data flows in Gα
α(M)The function that maps a finite walk MM to the total network cost of routing data from a producer parameter p(VαVp) to a consumer parameter q(VαVc)
ω(v)The network cost of writing a value produced by vVp
Γ(v)The network cost of reading a value for vVc
GβA weighted graph of decentralized data flows
VβThe set of parameters in Gβ
v(VβVp)A producer parameter v in Gβ
v(VβVc)A consumer parameter v in Gβ
EβThe set of directed data flows (between parameters) in Gβ
eEαA decentralized data flow
Ωβ(e)The function that maps a decentralized data flow eEβ to a network communication cost
Ωβ((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow
The operator for function composition
Π1β(e)The tail of a decentralized data flow eEβ
Π2β(e)The head of a decentralized data flow eEβ
β(e)The total network cost of routing data from a producer parameter Π1β(e) to a consumer parameter Π2β(e) s.t. eEβ
E.4

Evaluation (preliminaries)

GαA weighted graph of distributed data flows
VαThe set of parameters in Gα
v(VαVp)A producer parameter v in Gα
v(VαVc)A consumer parameter v in Gα
EαThe set of directed data flows (between parameters) in Gα
eEαA distributed data flow
Ωα(e)The function that maps a distributed data flow eEα to a network communication cost
Ωα((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow
MThe type of a finite walk of data flows in Gα
MMA finite walk of data flows in Gα
α(M)The function that maps a finite walk MM to the total network cost of routing data from a producer parameter p(VαVp) to a consumer parameter q(VαVc)
ω(v)The network cost of writing a value produced by vVp
Γ(v)The network cost of reading a value for vVc
GβA weighted graph of decentralized data flows
VβThe set of parameters in Gβ
v(VβVp)A producer parameter v in Gβ
v(VβVc)A consumer parameter v in Gβ
EβThe set of directed data flows (between parameters) in Gβ
eEαA decentralized data flow
Ωβ(e)The function that maps a decentralized data flow eEβ to a network communication cost
Ωβ((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow
The operator for function composition
Π1β(e)The tail of a decentralized data flow eEβ
Π2β(e)The head of a decentralized data flow eEβ
β(e)The total network cost of routing data from a producer parameter Π1β(e) to a consumer parameter Π2β(e) s.t. eEβ
GαA weighted graph of distributed data flows
VαThe set of parameters in Gα
v(VαVp)A producer parameter v in Gα
v(VαVc)A consumer parameter v in Gα
EαThe set of directed data flows (between parameters) in Gα
eEαA distributed data flow
Ωα(e)The function that maps a distributed data flow eEα to a network communication cost
Ωα((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a distributed data flow
MThe type of a finite walk of data flows in Gα
MMA finite walk of data flows in Gα
α(M)The function that maps a finite walk MM to the total network cost of routing data from a producer parameter p(VαVp) to a consumer parameter q(VαVc)
ω(v)The network cost of writing a value produced by vVp
Γ(v)The network cost of reading a value for vVc
GβA weighted graph of decentralized data flows
VβThe set of parameters in Gβ
v(VβVp)A producer parameter v in Gβ
v(VβVc)A consumer parameter v in Gβ
EβThe set of directed data flows (between parameters) in Gβ
eEαA decentralized data flow
Ωβ(e)The function that maps a decentralized data flow eEβ to a network communication cost
Ωβ((oA,iB))The network cost of passing the value from the output parameter oA to the input parameter iB, via a decentralized data flow
The operator for function composition
Π1β(e)The tail of a decentralized data flow eEβ
Π2β(e)The head of a decentralized data flow eEβ
β(e)The total network cost of routing data from a producer parameter Π1β(e) to a consumer parameter Π2β(e) s.t. eEβ
E.5

Evaluation (experiments)

τThe number of steps in the RQ1 experiment
f(c)The number of distributed data flows when there are c(N[0,τ]) new composites composed by the service SF
g(c)The number of decentralized data flows when there are c(N[0,τ]) new composites composed by the service SF
im(Ωα)The image of the function Ωα
im(ω)The image of the function ω
im(Γ)The image of the function Γ
τThe number of steps in the RQ1 experiment
f(c)The number of distributed data flows when there are c(N[0,τ]) new composites composed by the service SF
g(c)The number of decentralized data flows when there are c(N[0,τ]) new composites composed by the service SF
im(Ωα)The image of the function Ωα
im(ω)The image of the function ω
im(Γ)The image of the function Γ
E.5

Evaluation (experiments)

τThe number of steps in the RQ1 experiment
f(c)The number of distributed data flows when there are c(N[0,τ]) new composites composed by the service SF
g(c)The number of decentralized data flows when there are c(N[0,τ]) new composites composed by the service SF
im(Ωα)The image of the function Ωα
im(ω)The image of the function ω
im(Γ)The image of the function Γ
τThe number of steps in the RQ1 experiment
f(c)The number of distributed data flows when there are c(N[0,τ]) new composites composed by the service SF
g(c)The number of decentralized data flows when there are c(N[0,τ]) new composites composed by the service SF
im(Ωα)The image of the function Ωα
im(ω)The image of the function ω
im(Γ)The image of the function Γ
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.