-
PDF
- Split View
-
Views
-
Cite
Cite
Gianluca Mastrantonio, The Modelling of Movement of Multiple Animals that Share Behavioural Features, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 71, Issue 4, August 2022, Pages 932–950, https://doi.org/10.1111/rssc.12561
- Share Icon Share
Abstract
In this work, we propose a model that can be used to infer the behaviour of multiple animals. Our proposal is defined as a set of hidden Markov models that are based on the sticky hierarchical Dirichlet process, with a shared base-measure, and a step and turn with an attractive point (STAP) emission distribution. The latent classifications are representative of the behaviour assumed by the animals, which is described by the STAP parameters. Given the latent classifications, the animals are independent. As a result of the way we formalize the distribution over the STAP parameters, the animals may share, in different behaviours, the set or a subset of the parameters, thereby allowing us to investigate the similarities between them. The hidden Markov models, based on the Dirichlet process, allow us to estimate the number of latent behaviours for each animal, as a model parameter. This proposal is motivated by a real data problem, where the global positioning system (GPS) coordinates of six Maremma Sheepdogs have been observed. Among the other results, we show that four dogs share most of the behaviour characteristics, while two have specific behaviours.
1 INTRODUCTION
Movement data are often based on a time-series of two-dimensional spatial coordinates recorded using a global positioning system (GPS) device attached to an animal. Since the first paper by Dunn and Gipson (1977), the statistical models used to analyse such data have become increasingly popular, and are used to understand different aspects of the movement of animals, ranging from habitat selection (Hebblewhite & Merrill, 2008) to behaviour analysis (Anderson & Lindzey, 2003; Maruotti et al., 2016; Mastrantonio, 2018; Merrill & David Mech, 2000); for a detailed review, the reader may refer to Hooten et al. (2017).
Three major categories of movement-description models can be identified in the behaviour modelling framework: biased random walks (BRWs), correlated random walks (CRWs), and bias and correlated random walks (BCRWs). In a BRW, the animal movement is attracted (or biased) toward a point in space, which is called center-of-attraction (see, e.g., Blackwell, 1997; Dunn & Gipson, 1977). The center-of-attraction can be interpreted as a proxy of the home-range (Christ et al., 2008) or it can describe a movement toward a patch of space (McClintock et al., 2012). In a CRW, the movement direction, at any given time, depends on the previous direction. This characteristic is called directional persistence (Jonsen et al., 2005) and it is useful to describe a constant change in direction between consecutive observations. If both directional persistence and attractors are used to describe a movement, the model is a BCRW (Codling et al., 2008; Fortin et al., 2005; McClintock et al., 2012). A movement-description model is generally used as an emission distribution of a mixture-type model, where a latent cluster-membership variable is used to identify the behaviour assumed by an animal. If the observed time-window is wide enough, the use of a mixture-type model is justified by the assumption that an animal exhibits more than one behaviour during the day (Patterson et al., 2008), e.g., sleeping and hunting. The switching between behaviours is often temporally structured and, if formulated in a discrete-time framework, the model is usually the hidden Markov model (HMM) (Langrock et al., 2012; Michelot et al., 2016).
The literature on the modelling of multiple animals is not as extensive as that on single individuals, even though coordinates of different animals are often recorded. Nonetheless, interest in this topic is increasing (see, e.g., Westley et al., 2018) since, as shown by (Jonsen, 2016), the joint modelling of multiple animals often increases the precision of the estimates. By adopting the classification given by Scharf and Buderman (2020), it is possible to model multiple animals using two approaches. In the first approach, called indirect, the parameters that govern the behaviours are seen as random effects across animals, that is, they come from a common distribution, whose parameters must be estimated, and the animals are conditionally independent (see, e.g., Buderman et al., 2018; McClintock et al., 2013; Michelot et al., 2017). However, in the direct appr-oach, the dependence between animals is described by an unobserved graph or social network, see, e.g., Calabrese et al. (2018), Hooten et al. (2018), Milner et al. (2021) and Niu et al. (2020).
We here propose a Bayesian model which can be used to describe multiple animals that share certain movement characteristics, observed over different time-windows. The model is based on the hierarchical Dirichlet process (HDP) (Teh et al., 2006) and it is a generalization of the sticky hierarchical Dirichlet process HMM (sHDP-HMM) of Fox et al. (2011). In the model, given the latent classification and likelihood parameters, the animals are independent and the behaviours are described by the five parameters of the step and turn with an attractive point (STAP) distribution, which is a BCRW emission-distribution that has recently been proposed by Mastrantonio (2020). The main contributions of the present proposal are the possibility of estimating the number of latent behaviours of each animal as model parameters, and of introducing the sharing of parameters between behaviours and animals in HMMs based on Dirichlet processes (DPs). The former is a by-product of the DP modelling, which allows us to avoid the use of information criteria to select the number of behaviours, which has been shown to be problematic in this context (Pohle et al., 2017), or a trans-dimensional Markov chain Monte Carlo (MCMC) algorithm, such as the reversible jump MCMC (RJMCMC), which presents challenges in its implementation (Hastie & Green, 2012). The sharing of parameters is introduced in the lower level of the model hierarchy, where the distribution over the STAP parameters is defined by combining five different DPs. This distribution is discrete, with a countable number of atoms being defined so that they can share some of their multivariate components. This approach is similar to the one proposed by Mastrantonio et al. (2021), which was used to model climate data in a change-point framework. Therefore, the behaviours within or between animals can have the same value of an STAP parameter, and this allows us to investigate similarities and differences between the analysed animals. Other approaches also exist that have the sharing of parameters as one of their characteristics (see, e.g., Jonsen (2016) or Milner et al. (2021)), but they require one to select, a priori, what parameters are allowed to change and the number of values that a parameter can assume. However, in our proposal, everything is done during the model fitting and driven by the information within the data.
Our proposal has been used to model the trajectories of 6 Maremma Sheepdogs, observed in Australia with recorded coordinates every 30 min. These dogs are used all over Europe and Asia to protect livestock from possible predators and, in recent years, also in Australia (see, e.g., van Bommel & Johnson, 2016; Gehring et al., 2017). Maremma Sheepdogs are able to work in synergy with the shepherd to keep the stock together but this is not always possible when the property is too large. For this reason, the dogs are often left alone and are rarely visited by the shepherd. The owner has no supervision over the dogs and it is therefore interesting to analyse and understand their behaviour. The used dataset was taken from the movebank repository (www.movebank.org) and is described in detail in van Bommel and Johnson (2014a) and van Bommel and Johnson (2014b). With our model, we have identified many similarities and some specific features between dogs, that are easy to interpret and which give a better insight into the behaviour of the dogs. Two competitive models have also been estimated on the same data and the results are compared with our proposal.
The paper is organized as follows. We introduce the STAP density in Section 2, and the hierarchical formalization of our proposal in Section 3, while Section 4 contains the results of the real data application. The paper ends with some conclusive remarks in Section 5. The Web-based supporting materials, available on the web page of the journal, contain details of the MCMC algorithm and on the results of the competitive models.
2 THE STAP DISTRIBUTION
With the aim of better understanding the results of the real data application (Section 4) and the formalization of our proposal, we briefly describe the STAP distribution, which was introduced in Mastrantonio (2020), and its parameters; for a more detailed description the reader may refer to Mastrantonio (2020).
We assume we have a time-series of two-dimensional spatial locations that represent an animal's path, where , and is a temporal index. The coordinates are recorded without any measurement error and the time difference between consecutive points is constant. In order to formalize the STAP, we introduce the bearing angle
and the rotation matrix
where is the two-argument tangent function (Jammalamadaka & Kozubowski, 2004). The bearing-angle measures the direction of the movement between time and , and the rotation matrix can be used to perform a rotation in a two-dimensional space, so that if it multiplies a two-dimensional vector, the vector is rotated anti-clockwise by an angle x. The conditional distribution of is assumed to be second-order Markovian, with the following specification:
where , τ ∈ (0, 1), ρ ∈ [0, 1], and ∑ is a two-dimensional covariance matrix. The location is fixed, and is another parameters that is needed to compute in the conditional distribution of . If the path follows Equation (1), we write , with θ = (μ, η, ∑, τ, ρ).
The movement described by the STAP can have directional-persistence and attraction to a point in space, therefore, the STAP is a BCRW. To better understand these two properties and how they are formalized in Equation (1), we introduce the vector , which is a vector with initial and terminal points equal to and respectively. This vector represents the expected movement between time and , since its initial point is the previously observed location and the terminal one is equal to . If ρ = 0, the STAP reduces to a two-dimensional AR(1) (a BRW), and points to the spatial location μ, which is therefore the attractor. The length of is , which shows that τ measures how much of the total distance between the last observation () and the attractor (μ) is expected to be covered or, in other words, how strong the attraction to μ is. If ρ = 1, the STAP reduces to a CRW, based on a normal density. In this case, the direction of is the same as the direction of , which depends on the previous bearing angle , and thus induces a directional-correlation between consecutive points. If ρ ∈ (0, 1), is a weighted mean between its value in a BRW and a CRW, with weights given by (1 − ρ) and ρ respectively. The covariance matrix of the conditional distribution of is fixed in a BRW (), while it rotates with the bearing-angle for any ρ > 0 (): for more details, the reader may refer to Mastrantonio (2020).
We show examples of STAP densities, and the associated BRW (ρ = 0) and CRW (ρ = 1) in Figure 1, to better understand the differences between the BRW, CRW and BCRW. The dashed arrow in the figure is the movement between times and , the solid arrow is , and the ellipse is a contour of the conditional distribution of with a constant density containing 95% of the total probability mass. From 1 (a), we can see that the direction of and the ellipse change according to in a CRW. However, and the ellipse are independent from the previous direction in a BRW (Figure 1b), and points to the spatial attractor . When ρ ∈ (0, 1), the ellipse and are dependent on both the previous direction and the spatial attractor, see Figures 1c and d.
![Graphical representation of the conditional distribution of si+1 ((a) CRW, (b) BRW, (c), (d) BCRW), for different possible values of si and the previous directions. The dashed arrow represents the movement between si−1 and si. The solid arrow is F→i, while the ellipse is the area containing 95% of the probability mass of the conditional distribution of si+1. μ=(0,0)′, η=(0,6)′, τ = 0.25, ∑=0.2001 in all figures, and the central dot is the location μ, which is the attractor in the biased random walks and the bias and correlated random walks [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/4/10.1111_rssc.12561/1/m_rssc12561-fig-1.jpeg?Expires=1750514669&Signature=p7h0IxPhJdvz1sr~ZM6w5DjNj~ndWCq4-y7Ef3bXnbr9xs09zo4bv0jNLyMlcdaT3lb7iwsF7CIuacTaa8USKRgxuiacT~SHZcCJ9TrYAH4QegbjNPe3gIenNpUfxh-yeJ2bro6Bdfgv5DoppUeVRMy3A~6dIaRl0VLv3XGTaIyTrlLNe620MWgfJLqGy95VHjD3pS6v8o3x6NlyBD6czzlGbc3SLwE8W6CNE1zXEK160TXJi6TVE9dLm8ZYn7nlLlSG9Z2Gm0wbfqpsDaFNkLH2wPUv-TYoXGYWZ9I72dN0fqh3HCudwKkK2KXwOTUizg0k4sP1LVFWsYun0qifcQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Graphical representation of the conditional distribution of ((a) CRW, (b) BRW, (c), (d) BCRW), for different possible values of and the previous directions. The dashed arrow represents the movement between and . The solid arrow is , while the ellipse is the area containing 95% of the probability mass of the conditional distribution of . , , τ = 0.25, in all figures, and the central dot is the location μ, which is the attractor in the biased random walks and the bias and correlated random walks [Colour figure can be viewed at https://dbpia.nl.go.kr]
3 THE PROPOSED MODEL
In this section, we introduce the components of the model and how they are used to introduce the characteristics of our proposal. We extend the notation of the previous section to describe the path of m animals, and to allow changes in the behaviour to be considered.
We indicate the path of the j th animal with , where j = 1, …, m, and is the set of temporal points, equally spaced in time, where the position of the j th dog is recorded. The sets and can contain different time-points, but the time difference must be constant across animals, that is, for all j = 1, …, m and . We introduce a discrete random variable to represent the animal behaviour at time , where indicates that animal j follows behaviour k at time . Given the behaviour assumed by each animal, the paths are independent and
where . In other words, if the j th animal is following the k th behaviour at time (i.e. ), the path is described by the set of STAP parameters. It should be noted that the k th behaviours are represented by the same set of parameters for all animals.
Let , , , and , then the model we propose is
where , , and . A full description of the components of the model and how they are used to introduce the main novelties of our model is given below.
The DPs. In order to simplify the description of the lower levels of the model hierarchy, we use χ as a variable that can be μ, η, ∑, ν, or ρ, and it is used when whatever is described can be applied to any of the five parameters. In Equation (9) the values of the STAP parameter are sampled from the distribution , and the infinite-dimensional vector of probabilities , associated with parameter χ, is Gem distributed (Gnedin et al., 2001) with scaling parameter. The scaling parameter can easily be interpreted with the stick-breaking representation of the GEM distribution, defined as follows:
From Equation (10), we see that the smaller is, and the smaller is the number of elements of that contains most of the probability mass with and for all p≠1. The vectors and can be used to define the discrete distribution
where is the Dirac delta function. Equation (11) is a draw from a , and thus we can equivalently describe and as the components of a sample from , or as Equations (8) and (9). The vectors and contain, respectively, the values that the parameters can assume () and the ‘base’ probability () that a particular value of the parameter is selected in a behaviour (see Equation (15) below).
The functionsand. The function and (Equations (6) and (7)) introduce the sharing of parameters between behaviours, which is one of the novelties of our proposal. The role of function is to produce the set of STAP parameters , where we remind the reader that . The set is comprised of all the possible combinations of the 5 STAP parameters, without repetitions. This means that , if , but we can have a subset of elements that has the same value, for example, . Hence, since each behaviour selects its STAP parameters in , different behaviours can share parameters, even though they are described by a different .
Function is closely related to since, if we introduce the new variables , , , and that represent what parameter is in , that is,
we can associate a value to which is computed as
The set is the output of and it is a probability vector, since and . We can define the discrete distribution
where, similarly to Equation (11), its atoms contain all the possible values that can assume and is connected to the expected value of the probability of selection as the vector of parameter in a behaviour (see Equation (14) below). This way to define the distribution is closely related to the shared base-distribution of the change-point model of Mastrantonio et al. (2021).
Behaviour switching. Let be the matrix that has as lth row. Matrix rules the switching between the behaviours of animal j (Equation (4)) and if the jth animal is following behaviour l at time , the probability of switching to behaviour k is given by the element of in row l and column k. Hence, the time evolution of is modeled by a discrete first-order Markov process, which defines an HMM with transition matrix and initial state . The initial state is drawn from a Geometric distribution with parameter ε, which is defined as the number of Bernoulli trials needed to have one success. The row is DP distributed (see Equation (5)) and the expected value of is equal to
(see Fox et al., 2011), where δ(l, k) is equal to 1 if l = k, and 0 otherwise. From Equation (14), we can see that the kth element of β is associated with the expected value of , for all the animals (j = 1, …, m) and . Hence, a larger increases the probability of switching from any behaviour l to the kth, described by . However, Equation (14) can also be stated as
which highlights how the value is connected to all , with and j = 1, …, m, so that . Therefore, a larger increases the expected values of all these probabilities, and for this reason we call the ‘base probability’ as . The variable α is the scaling parameter of the DP of Equation (5), which has the same interpretation of , while ν is a weight that is added to the self transitions to increase their expected value, see Equation (14), which in turn is used to reduce the tendency of the HDP-HMM to create redundant behaviours, that is, behaviours with similar parameter vectors. For a more detailed description of parameters α and ν, the reader may refer to Fox et al. (2011).
It should be noted that, in most applications, see, e.g., McClintock et al. (2012) and Leos-Barajas et al. (2017), is assumed, where indicates the maximum number of behaviours, while we have in this work, since we define the HMM using DPs. Thus, the model assumes an infinite and countable number of possible behaviours for each animal, but, since we have a finite number of observed time-points, only a finite number of them can be ‘occupied’; these are generally called ‘non-empty states’ (Frühwirth-Schnatter & Malsiner-Walli, 2019), or, in this context, ‘non-empty behaviours’. The random variable is used to estimate the number of latent behaviours of the jth animal. Parameters α, ν and γ (through β) determine the number of non-empty states , since they are responsible with the total mass associated with each of the elements.
The emission-distribution. The model specification is concluded with the emission distribution, which is given by Equations (2) and (3). It should be noted that, given the latent behaviours, we consider the animals independent but, since they share the same set of atoms , the behaviours of the different animals can be described by the same STAP distribution. From Equation (12), we know that and can have common components and therefore, when an animal changes behaviour, it is not necessary for all the parameters to change and, more importantly, we can also identify the features that two animals share for different behaviours, for example, two behaviours can have the same attractive-point (), even though the strength of attraction () is different. This feature is one of the main novelties of our proposal, and, although other approaches have a similar characteristic (see, e.g., Jonsen (2016) or Milner et al. (2021)), they do not allow the number of latent behaviours to be estimated, which is the other main novelty of our proposal, the possibility of evaluating, during model fitting, what parameters are allowed to change, or the number of values that a parameter can assume. The sharing of a subset of the parameters between states is new in the context of HMMs based on DPs.
Connection to the sHDP-HMM. To conclude this section, we would like to show that our proposal can be considered as a generalization of the sHDP-HMM. The model of Fox et al. (2011) is defined for a single time-series, and is a draw from a DP. It is easy to see that, if we consider only one animal and use only one multivariate parameter in Equation (9), for example, , with the associated vector of probability , the distribution is a draw from a DP, since and . Hence, the model reduces to a sHDP-HMM.
4 REAL DATA APPLICATION
We have the recorded coordinates of 6 dogs, taken every 30 min at the Heatherlie property in Australia, between 2012-11-10 15:30 and 2012-08-02 15:30. The data1 consist of 4801 observations for each dog, with less than 1% of missing points. To facilitate the specification of the prior the coordinates are centered using the bivariate sample mean and scaled with a common standard deviation, computed using both the X and Y coordinates, to maintain the relative scale between the two coordinates; the recorded locations are shown in Figure 2. The dogs are called Woody, Sherlock, Alvin, Rosie, Bear, and Lucy. Rosie and Lucy are female, while the other four are male, and Woody, Sherlock, Bear and Lucy form a cohesive group, which is responsible for livestock protection, while Rosie, due to her advanced age, is solitary, and Alvin suffers from social exclusion, which restricted his movement (van Bommel & Johnson, 2016).

Maremma Sheepdogs originate from Europe, and have been used for centuries to protect livestock from potential predators (Gehring et al., 2017). They are trained to live with the livestock from birth and, as a result, they develop a strong bond with them and an instinct to protect them. They can be fence-trained, but are generally allowed to move freely. The use of livestock guardian dogs is relatively new outside Europe, especially in Australia, and, due to their effectiveness, interest in their use is increasing (van Bommel & Invasive Animals Cooperative Research Centre, 2010; van Bommel & Johnson, 2016). Since the extension of properties in Australia can be as much as several thousand hectares, it is hard for the owner to supervise the dogs (van Bommel & Johnson, 2012) and to have information about their behaviour (van Bommel & Invasive Animals Cooperative Research Centre, 2010).
4.1 Comparison of the model and implementation details
We compare the predictive performances between our proposal (M1) and two competitive models (M2 and M3), using the integrated completed likelihood (ICL) (Biernacki et al., 2000) and the deviance information criteria (DIC) and (Celeux et al., 2006). In the first competitive model (M2), we assume that only the entire vector of STAP parameters can be shared between animals, which means that each time-series is an sHDP-HMM with a share-based distribution. In terms of model formalization, we assume
for the set of atoms and weights of (Equation (13)). In the second competitor (M3), each animal follows the model of Mastrantonio (2020), which means they are completely independent, and there is no sharing of parameters. Hence, we substitute with the animal-specific distribution
where
and then
By changing the way distribution is defined, we aim to show that the main feature of our proposal, that is, the sharing of sets and subsets of parameters, improves the model fitting and leads to a better description of the data.
The models are implemented assuming , , , and . The distribution is assumed to be a mixture of a U(0,1) and two bulks of probability on 0 and 1, with the 3 mixture weights equal to 1/3. This allows (in M1 and M2) and (in M3) to be, a posteriori, equal to 0 or 1 with a greater probability than 0, which allows us to detect a pure CRW or BRW behaviour. We assume and ν/(α + ν) ∼ B(1, 1), which allows us to easily sample from their full conditionals, see Fox et al. (2011) and Section A of the Web-based Supporting Materials. All the distributions are chosen to be weakly informative. The domain is a square [−5, 5] × [−5, 5] and the parameter ε of the Geometric distribution is equal to 0.00001, which means that the distribution over the initial state, , is approximately uniform over the positive integers. Posterior estimates are obtained with 15,000 iterations, burnin 7500, thin 3, and thus 2500 samples are available for posterior inference. Convergence has been checked by means of a visual inspection of the posterior chains and using the statistics (Gelman et al., 2013) Details on the MCMC algorithm, implemented in Julia 1.3 (Bezanson et al., 2017), can be found in the Web-based Supporting Materials, Section A, and the codes used to replicate the results, tables, and figures are available at https://github.com/GianlucaMastrantonio/multiple_animals_movement_model.
In Table 1, we can see that the three indices indicate that our model is the one with the best fit,2 model M2 is the second, while M3 is always the last. Therefore, the joint modelling of the six dogs improves the performances of the model (since M2 is always preferable to M3), but the sharing of a subset of parameters also leads to a better description of the data (since M1 is better than M2). We provide a description of the results obtained with M2 and M3 in the Web-based Supporting Materials, Section B.
Information criteria for the proposed model (M1), sHDP-HHMMs with a common (M2), sHDP-HHMMs with animal-specific (M3). The model selected by each index is indicated in bold
. | M1 . | M2 . | M3 . |
---|---|---|---|
ICL | 209593 | 201880 | 192145 |
DIC5 | −457502 | −441264 | −407823 |
DIC7 | −417544 | −400786 | −376037 |
. | M1 . | M2 . | M3 . |
---|---|---|---|
ICL | 209593 | 201880 | 192145 |
DIC5 | −457502 | −441264 | −407823 |
DIC7 | −417544 | −400786 | −376037 |
Information criteria for the proposed model (M1), sHDP-HHMMs with a common (M2), sHDP-HHMMs with animal-specific (M3). The model selected by each index is indicated in bold
. | M1 . | M2 . | M3 . |
---|---|---|---|
ICL | 209593 | 201880 | 192145 |
DIC5 | −457502 | −441264 | −407823 |
DIC7 | −417544 | −400786 | −376037 |
. | M1 . | M2 . | M3 . |
---|---|---|---|
ICL | 209593 | 201880 | 192145 |
DIC5 | −457502 | −441264 | −407823 |
DIC7 | −417544 | −400786 | −376037 |
4.2 Description and interpretation of the output
Using the algorithm proposed by Wade and Ghahramani (2018), we find a representative behaviour associated with each animal and time, that we indicate as MAP behaviour. We indicate the kth behaviour of the jth dog based on as , and let be the number of times we have , without any loss of generality, we assume , that is, the behaviours are ordered with respect to the number of times they are observed. It should be noted that is not the same as , if , and therefore, to avoid confusion, we indicate the vector of STAP parameters for with . For easiness of interpretation, we only discuss behaviours that have been observed, on average, at least once a day (), thus obtaining then 3 behaviours for each dogs, with the exception of Rosie (dog 4) that has 2 behaviours; see Tables B.1–B.6 in the Web-based Supporting Materials, where the posterior means () and credible intervals (CIs) for the STAP parameters, , and the transition probabilities for all the dogs and behaviours are shown. Using similar pictures to the ones used in Figure 1, we show a graphical description of the behaviours found by the model in Figures 3 and 4; the behaviours are represented on different spatial scales. From the model output, we computed the posterior mean of the variable , which is the posterior probability that a STAP parameter assumes the same value in and . These probabilities are depicted in Figure 5 for all possible combinations of behaviours and animals. To take into account that identifiability for and is only granted if and , respectively (see Equation (1)), we assume , if or , and if or , for .
![Graphical representation of the conditional distribution of sj,tj,i+1for the first three dogs ((a)(b)(c) Woody, (d)(e)(f) Sherlock, (g)(h)(i) Alvin), for different possible values of sj,tj,iand previous directions. The images has been obtained by using the posterior values that maximize the data likelihood of each animal, given the representative clusterization ẑj,tj,i. The dashed arrow represents the movement between sj,tj,i−1and sj,tj,i. The solid arrow is F→j,tj,i, while the ellipse is an area containing 95% of the probability mass of the conditional distribution of sj,tj,i+1. The asterisk represents the attractor, and it is only shown for behaviours that have posterior values of ρj,k<0.9and τj,k>0.1 [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/4/10.1111_rssc.12561/1/m_rssc12561-fig-3.jpeg?Expires=1750514669&Signature=Lksfw1Dt27kQJLpQvID~EYW~ljpYDJcKgFzYMyJF6YQv6pBpAKVDdKvC9G0yup46iEKLOyoXTZ3u0g~h0KtsDc8E61HvjiWahDgiMtzKUnggboqqJhbUZqslf5U9rZFY-e-2~ft4GWc~f~kAOf3j51hM-GmZrTNEKcqU0xQLZLGNJ~2wE3OM52WM9tzM~EfXeDiZ3NKBH~HKkbF8nYIuSjQMb1nixL6EFP2XEY2ZuqO2vgWGAtvNXELKn-o7y79E0lWbsHOfUxY9cMIF7dRN9wj4kkCiQdb0XRZAOa5Aaxgm2oJSzi5bAf5XX9z-tkPT2Wajn0-3d7XSJYVcKwa18g__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Graphical representation of the conditional distribution of for the first three dogs ((a)(b)(c) Woody, (d)(e)(f) Sherlock, (g)(h)(i) Alvin), for different possible values of and previous directions. The images has been obtained by using the posterior values that maximize the data likelihood of each animal, given the representative clusterization . The dashed arrow represents the movement between and . The solid arrow is , while the ellipse is an area containing 95% of the probability mass of the conditional distribution of . The asterisk represents the attractor, and it is only shown for behaviours that have posterior values of and [Colour figure can be viewed at https://dbpia.nl.go.kr]
![Graphical representation of the conditional distribution of sj,tj,i+1 for the last three dogs ((a)(b) Rosie, (c)(d)(e) Bear, (f)(g)(h) Lucy), for different possible values of sj,tj,i and previous directions. The images has been obtained by using the posterior values that maximize the data likelihood of each animal, given the representative clusterization ẑj,tj,i. The dashed arrow represents the movement between sj,tj,i−1 and sj,tj,i. The solid arrow is F→j,tj,i, while the ellipse is an area containing 95% of the probability mass of the conditional distribution of sj,tj,i+1. The asterisk represents the attractor, and it is only shown for behaviours that have posterior values of ρj,k<0.9 and τj,k>0.1 [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/4/10.1111_rssc.12561/1/m_rssc12561-fig-4.jpeg?Expires=1750514669&Signature=pT08SRHvD90UtfmbzPrlXLU6sqf~zmqraRtTXgXThGACRLcu8oOzFL2lshNTMo7HF5kMa1OnwQLAJF33~Cm2JxPWOagqx6XwuhVQCj3pUNnS6dS-FdCf-d7irggGhUMDIR3lT8cqPSmT3ThA7RY2n-aCCK1E-Z-XkjRHhoanSvphDfNSpbStEWkFLncc~IodvHSlMQOPUYVZozSAIA28o-wvI9ditvGs1oO1iTDAJAdkfi4W4AzVOqjHQWpTrPOHp85nbUAoJx9MpAW3VnAMDvctHNl0JJmCF9a-LwWHKmwN65rJkyeUOZXil73nYl8GhOLAYJR~OTLvC~KNj1InYw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Graphical representation of the conditional distribution of for the last three dogs ((a)(b) Rosie, (c)(d)(e) Bear, (f)(g)(h) Lucy), for different possible values of and previous directions. The images has been obtained by using the posterior values that maximize the data likelihood of each animal, given the representative clusterization . The dashed arrow represents the movement between and . The solid arrow is , while the ellipse is an area containing 95% of the probability mass of the conditional distribution of . The asterisk represents the attractor, and it is only shown for behaviours that have posterior values of and [Colour figure can be viewed at https://dbpia.nl.go.kr]
![Graphical representation of the posterior mean of δ(·,·), which represents the probability of the parameters in its argument ((a) μj,k, (b) ηj,k, (c) τj,k, (d) ∑j,k, (e) ρj,k) having the same value for two behaviours [Colour figure can be viewed at https://dbpia.nl.go.kr]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssc/71/4/10.1111_rssc.12561/1/m_rssc12561-fig-5.jpeg?Expires=1750514669&Signature=DNl70W~eYeV2hfHA8UAncDszMQlzddYjn4rqiaIgEdW3WQsYWfiN6rS~lVHoO5IqB1ocZZFxZD7QKv6bOAiluItcetzcHSuhzlshQqwnxlGOPmyMDMEZ65B7Zu7GMpVu4CiOkGWlS8iOmv7TCoyLF~FrOfdh2Fqg5~31yeOtdgINxTpTt5rAkjfOysojs~J977TRjzg5ycmUMpJae6wN0jLv~-uvm0wmdlFdSObJliQzNnhT9DOaJt7Mq-xvq1Osn95eJyzjcG8mVGeUooIyz0jH5vuJGTN5bVeQowMs1l2leWw7Dafq4jm7--dTC9CO2UwDJKzlvIAalG1NewMNDw__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Graphical representation of the posterior mean of δ(·,·), which represents the probability of the parameters in its argument ((a) , (b) , (c) , (d) , (e) ) having the same value for two behaviours [Colour figure can be viewed at https://dbpia.nl.go.kr]
Similarities between the MAP behaviours. One of the assumptions of our proposal is that the temporal evolution of the behaviours are independent. To have a posterior confirmation that this hypothesis is true, we computed the Adjusted Rand Index (Gates & Ahn, 2017) for the MAP behaviours of each pair of animals. The Adjusted Rand Index, which is a measure of similarity (or agreement) between two clusterizations, has a value close to one, if there is a strong agreement, while its value is close to zero (even negative) if the clusterization is very dissimilar. It should be noted that we are able to compute the index because the animals are observed in the same temporal-window. The results in Table 2 show that the values of the index are very low, with the exception of dogs 1 and 6, where the index is 0.414. We can conclude that our hypothesis is reasonable.
. | Woody . | Sherlock . | Alvin . | Rosie . | Bear . | Lucy . |
---|---|---|---|---|---|---|
Woody | 1.000 | 0.094 | 0.006 | 0.030 | 0.181 | 0.414 |
Sherlock | 0.094 | 1.000 | −0.004 | 0.015 | 0.125 | 0.081 |
Alvin | 0.006 | −0.004 | 1.000 | 0.047 | −0.002 | 0.003 |
Rosie | 0.030 | 0.015 | 0.047 | 1.000 | 0.021 | 0.021 |
Bear | 0.181 | 0.125 | −0.002 | 0.021 | 1.000 | 0.159 |
Lucy | 0.414 | 0.081 | 0.003 | 0.021 | 0.159 | 1.000 |
. | Woody . | Sherlock . | Alvin . | Rosie . | Bear . | Lucy . |
---|---|---|---|---|---|---|
Woody | 1.000 | 0.094 | 0.006 | 0.030 | 0.181 | 0.414 |
Sherlock | 0.094 | 1.000 | −0.004 | 0.015 | 0.125 | 0.081 |
Alvin | 0.006 | −0.004 | 1.000 | 0.047 | −0.002 | 0.003 |
Rosie | 0.030 | 0.015 | 0.047 | 1.000 | 0.021 | 0.021 |
Bear | 0.181 | 0.125 | −0.002 | 0.021 | 1.000 | 0.159 |
Lucy | 0.414 | 0.081 | 0.003 | 0.021 | 0.159 | 1.000 |
. | Woody . | Sherlock . | Alvin . | Rosie . | Bear . | Lucy . |
---|---|---|---|---|---|---|
Woody | 1.000 | 0.094 | 0.006 | 0.030 | 0.181 | 0.414 |
Sherlock | 0.094 | 1.000 | −0.004 | 0.015 | 0.125 | 0.081 |
Alvin | 0.006 | −0.004 | 1.000 | 0.047 | −0.002 | 0.003 |
Rosie | 0.030 | 0.015 | 0.047 | 1.000 | 0.021 | 0.021 |
Bear | 0.181 | 0.125 | −0.002 | 0.021 | 1.000 | 0.159 |
Lucy | 0.414 | 0.081 | 0.003 | 0.021 | 0.159 | 1.000 |
. | Woody . | Sherlock . | Alvin . | Rosie . | Bear . | Lucy . |
---|---|---|---|---|---|---|
Woody | 1.000 | 0.094 | 0.006 | 0.030 | 0.181 | 0.414 |
Sherlock | 0.094 | 1.000 | −0.004 | 0.015 | 0.125 | 0.081 |
Alvin | 0.006 | −0.004 | 1.000 | 0.047 | −0.002 | 0.003 |
Rosie | 0.030 | 0.015 | 0.047 | 1.000 | 0.021 | 0.021 |
Bear | 0.181 | 0.125 | −0.002 | 0.021 | 1.000 | 0.159 |
Lucy | 0.414 | 0.081 | 0.003 | 0.021 | 0.159 | 1.000 |
The dogs in the cohesive group. We can clearly see, from Figures 3 and 4, that the four dogs that form a cohesive group (dog 1 Woody, dog 2 Sherlock, dog 5 Bear and dog 6 Lucy) have similar behaviours. In behaviour , the length of is ≈0, which means that the distribution of is centered on the previous location (), there is no a preferable direction, since the ellipses are very close to a circle, the speed is very low (see the size of the ellipses), and there is no attractor. Hence, the movement is only determined by the covariance matrix , which is the same for all the first behaviours (), see Figure 5d. The first behaviour of the cohesive group can easily be interpreted as boundary-patrolling- or scent-marking-behaviour, which is a common behaviour in this dog breed, and it has already been observed and with similar movement characteristics (Mastrantonio, 2020; McGrew & Blakesley, 1982).
As in , the length of is ≈0 in , there is no attractor but, since the ellipses rotate according to the previous directions, there is directional persistence. The major and minor axes of the ellipses have different lengths, with the major one in the same direction as , which means that we can expect to observe more movements on a straight line, that is, in the same direction as the previous bearing angle , or in direction . The strength of directional persistence, measured by parameter , is very similar for all the , as we can see from Figure 5e, where the probability values are close to 1. The movement speed in increases compared to , and is fully characterized by and . The strong directional persistence, the movement along a straight line, and the higher speed can lead to being interpreted as a defending-behaviour, where the dog defends the territory and livestock from predators, that is, mostly wild dogs and foxes that are present in the area (see Brook et al., 2012; Walton et al., 2017), or an explore-behaviour (van Bommel & Invasive Animals Cooperative Research Centre, 2010; Mastrantonio, 2020).
The third behaviour is a BRW, since the CIs of are very close to 0 and the ellipses are independent of the previous direction (see Figures 3 and 4). The CIs of , which are in [0.1, 0.28], indicate a moderate attraction to , see Tables B.1, B.2, B.5 and B.6 in the Web-based Supporting Materials. The four dogs have the same spatial attractor, , the same covariance matrix, , the same parameter and, with the exception of the second dog, the same , as we can see from Figures 3–5. The spatial attractor, due to the large variance of the movement (the ellipses size), can be considered as a tendency of these dogs to move to the central patch of the space and, since we can see from figure c2 of van Bommel and Johnson (2012) that the attractor is close to where the livestock is, it is easy to interpret this behaviour as the dog attending livestock.
It is interesting to note that the parameters that define the three behaviours, that is, in , () in , () in , have a high probability of being the same between dogs, see Figure 5, which means that they behave in a similar manner. In transition matrix terms, we can see, from Tables B.1, B.2, B.5, B.6 in the Web-based Supporting Materials, that and . Then, after patrolling (), it is more probable that the dog begins to explore the space, or to defend the property from predators spotted during patrolling (), than to guard livestock (). After attending livestock, it is more probable that the dog switches to .
The socially excluded dog and the old one. The socially excluded dog (j = 3) has 3 behaviours, one of which, , is different from all the other dogs’ behaviours, while the other two, and , are similar to the ones of the cohesive group, for example, is similar to and is similar to , with j = 1, 2, 5, 6, see Figures 3 and 4. is only characterized by the covariance matrix , since like in the cohesive group, is ≈0 and there is no attractor or directional persistence. Therefore, due to the high probability of having the same value as , with j = 1, 2, 5, 6, see Figure 5d, may be interpreted in the same way as the first behaviours of the cohesive group, that is, a boundary-patrolling behaviour.
is similar to the second behaviour of the cohesive group in terms of covariance matrix, see Figure 5d, but there is a lack of directional persistence, and a slight bias toward an attractor located in the central area, see the direction of . With this behaviour, the dog is exploring but, at the same time, staying in the proximity of the sheep paddock, see Figure 2c.
The CI of is very close to 1 and therefore has a strong attraction to the coordinates , as we can see from Table B.3 of the Web-based Supporting Materials. The CIs of and are very small and equal to (0.575, 0.576) and (−0.381, 0.380), respectively, which means that the attractor is well localized in space. We can see from figure c2 of van Bommel and Johnson (2012), that is close to the owner's homestead. This behaviour cannot be interpreted as the dog attending livestock since, once it reaches the spatial attractor, it does not move very much, that is, the sizes of the ellipses are very small. Hence, this is probably a behaviour in which the dog stays close to the owner's house, and rests.
For the last dog, that is the old one, the model has found only two behaviours. has the same characteristics as the first behaviour of the cohesive group, with the same values of the covariance matrix, while is similar to the second behaviour of the cohesive group, with the same covariance matrix and similar directional persistence, see Figures 3–5. Hence, they can be interpreted as the first and second behaviours of the cohesive group. It is interesting to note that, since this dog is very old, she is no longer able to attend livestock, which is the activity that requires the most energy (higher speed), but she is still working, that is, checking the boundaries (), exploring the space, and defending the territory ().
5 FINAL REMARKS
In this work, we have proposed a new HMM which can be used to model trajectory-tracking data of multiple animals and which, according to the classification given by Scharf and Buderman (2020), is part of the indirect approach. Our model allows subsets of parameters to be shared between animals and behaviours, and the number of latent behaviours to be selected during the model fitting. The emission distribution is the STAP, but other distributions can be used by changing the model formalization accordingly. The model was used to help understand the behaviour of 6 Maremma Sheepdogs, observed in a property in Australia. The results show that there are many common features between the animals, such as the attractive point, and most of them share the same number of behaviours as well as the same parameter values. The obtained results are easily interpretable, and the rich output offers an insight into the similarities between animals.
As a possible extension, we are currently exploring the use of covariates to model the probability that behaviours share parameters, and we are working on a different formalization that makes the model able to detect whether different animals tend to follow the same behaviours at the same time-points.
DATA AVAILABILITY STATEMENT
The dataset is freely available from the movebank repository https://www.datarepository.movebank.org/handle/10255/move.395
SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.
The dataset is freely available from the movebank repository https://www.datarepository.movebank.org/handle/10255/move.395
It should be noted that a higher ICL and a lower DIC indicate a better fit.
ACKNOWLEDGEMENTS
The authors thank the Editor-in-Chief, the Associate Editor and the two anonymous reviewers for their comments that have greatly improved the manuscript. The work of the author has partially been developed under the MIUR grant Dipartimenti di Eccellenza 2018—2022 (E11G18000350001), conferred to the Dipartimento di Scienze Matematiche—DISMA,