Yuping Hu, Ye Li, Chen Yuan, Helai Huang, Modeling conflict risk with real-time traffic data for road safety assessment: a copula-based joint approach, Transportation Safety and Environment, Volume 4, Issue 3, September 2022, tdac017, https://doi.org/10.1093/tse/tdac017
Abstract
This study proposes a conflict-based traffic safety assessment method that associates conflict frequency and severity with short-term traffic characteristics. Instead of analysing historical crash data, it employs microscopic trajectory data to quantify the relationship between conflict risk and traffic characteristics. The time-to-collision (TTC) index is used to detect conflicts, and a severity index (SI) is proposed on the basis of time-integrated TTC. Using SI, the k-means algorithm is applied to classify the conflict severity level, and the severity of regional conflict risk is divided into three levels. Zero-truncated Poisson regression and ordered logit regression are employed to estimate the effects of short-term traffic characteristics on conflict frequency and severity, respectively. Furthermore, a copula-based joint modelling method is applied to explore the potential non-linear dependency between the conflict risk outcomes. A total of 18 copula models are tested to select the optimal ones. The HighD dataset from Germany is used to examine the proposed framework, and both between-lane and within-lane factors are considered. Results show that the correlations between traffic characteristics and conflict risk are significant, and that the dependency between conflict outcomes varies among severity levels. The difference in speed variation between lanes significantly influences conflict frequency and severity simultaneously. The findings indicate that the proposed method can be used to assess real-time traffic safety within a specific region from short-term (30-second interval) traffic characteristics. By evaluating road safety on the basis of conflict risk and considering different severity levels, this study also contributes to the development of targeted proactive safety strategies.
1. Introduction
Because crashes are a public health problem, a considerable number of studies have explored crash mechanisms and ways to improve road safety. Most previous safety analyses are based on crash data and have indicated that traffic characteristics significantly influence road crashes [1]. With the development of intelligent transportation systems, there is an urgent demand for proactive safety techniques. Recent studies have proposed traffic conflicts as an effective surrogate safety measure [2]. The traffic conflict technique (TCT) can overcome some limitations of historical crash data, such as under-reporting and long collection periods [3, 4]. Conflict-based research has the potential to detect unsafe traffic dynamics before a crash occurs, which also benefits proactive safety management.
However, current conflict-based analyses mostly take a microscopic perspective on traffic safety. They often emphasize individual conflict occurrence, investigated mainly with trajectory data and micro-simulation data [5, 6]. Little work has applied TCT to assess real-time road safety from a macroscopic perspective. Conflicts happen more frequently than crashes, so conflict data can be collected over a shorter period. In particular, crash data are difficult to collect on a newly opened road segment. It may therefore be practicable to estimate collision risk by quantifying the relationship between traffic characteristics and conflict risk. Additionally, the time-to-collision (TTC) index has been widely used to identify traffic conflicts [7]. A smaller TTC threshold detects fewer but more severe conflicts. Frequency and severity are thus both important aspects that need to be considered together in conflict-based analysis. As in previous crash-based studies [8–10], effective statistical methods can be employed to examine the relationship between them. Joint modelling of crash frequency and severity has been widely discussed, and it provides a more comprehensive and systematic understanding of the influential mechanism. It is also a promising approach for conflict-based research, but related work is limited.
Therefore, this study aims to design an approach to real-time conflict risk assessment from a new perspective. Investigating the relationship between conflicts and real-time traffic characteristics can fill the current research gap. Rather than individual conflict occurrence, this study focuses on lane-level conflict risk within a specific time and space. If that relationship is validated, the mechanism by which short-term traffic characteristics influence road safety will be better understood. The objectives are: (i) to associate vehicular trajectories with regional conflict risk; (ii) to explore the effects of short-term traffic characteristics on conflict risk; and (iii) to identify the potential dependence between conflict frequency and severity. This study provides a new way of evaluating real-time road safety and also contributes to the design of proactive countermeasures.
2. Literature review
Traditionally, researchers have explored the influential factors of crash frequency or severity at a macroscopic level [8, 9]. Crash frequency is analysed with count-data models (Poisson, negative binomial, etc.), for which Lord and Mannering [10] provided a systematic review. For count outcomes that are discrete and non-zero, Poisson and zero-truncated Poisson (ZTP) models have been compared, and the ZTP model was shown to have a considerable advantage [11]. Crash severity measures the societal impact and harm of a crash, and it is mostly modelled with logistic regression. For example, Jung et al. [12] combined vehicle-to-vehicle crash frequency and severity estimations to examine factor impacts on highway safety in rainy weather, using negative binomial regression for crash frequency and multinomial logit regression for crash severity. As crash data are often classified by injury severity or collision type, some researchers have explored this interdependence and developed joint modelling methods to overcome the limitations of previous analyses [13–15]. The copula-based approach has been verified as an efficient way to realize bivariate joint modelling.
The major strength of copulas is that the estimation of the marginal distributions is separate from the estimation of the dependence structure. Zou et al. [16] considered underreporting in wildlife-vehicle collision (WVC) data and proposed a copula regression model linking WVC and the underreporting outcome, which combined logistic and negative binomial regression. Stipancic et al. [17] incorporated GPS-derived surrogate safety measures (SSMs) as predictive variables, then developed a full Bayes spatial negative binomial model for crash frequency and a fractional multinomial logit model for crash severity. Yang et al. [18] proposed a multivariate copula-based framework to model crash count and conflict risk measures jointly, which improved the understanding of exposure and traffic risk factors and their heterogeneous impact on crash count and conflict risk measures. However, copula-based modelling of regional conflict risk has not yet been conducted in the safety literature.
Previous studies demonstrated a strong relationship between crashes and near crashes, and showed that using near crashes as a crash surrogate for risk assessment is beneficial [19, 20]. Nevertheless, conflict-based analysis has mostly focused on vehicular safety. Recently, some researchers have paid more attention to associating conflicts with the evaluation of segment-level safety. El-Basyouny and Sayed [21] proposed a two-phase model, where a conflict-based negative binomial (NB) safety performance function (SPF) was employed to predict collisions in the second phase. Arun et al. [22] estimated the frequency of severe and non-severe crashes by jointly modelling the indicators of crash frequency; bivariate peak-over-threshold models for both TTC and Delta-V were estimated. As for conflict severity, different thresholds of conflict indicators have been proposed to measure the severity of a single conflict event [23]. How to measure conflict severity within a specific unit therefore still needs to be clarified. This study considers conflict counts and severity simultaneously, which also promotes understanding of the systematic effects of real-time traffic characteristics.
3. Methodology
The proposed analysis comprises two stages: conflict frequency and severity are quantified independently in the first stage, and the interdependence between the two outcome variables is explored in the second stage. The overall framework of this study, including data processing and data analysis, is presented in Fig. 1.

3.1 Conflict risk indicators

Overview of the key indicators: (a) sketch diagram of TTC calculation; (b) graphical representation of TIT.
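As a minimal sketch of the two indicators named in the caption, assuming the common car-following conventions (which may differ in detail from the formulation used in this study): TTC is the gap divided by the closing speed, and time-integrated TTC (TIT) accumulates the shortfall of TTC below a threshold over time. The function names, the 4 s default threshold and the 25 Hz time step are illustrative assumptions, and the SI built on TIT is not reconstructed here.

```python
import math


def time_to_collision(gap_m, v_follow_mps, v_lead_mps):
    """TTC in seconds for a follower/leader pair (assumed convention).

    Returns infinity when the follower is not closing in on the leader.
    """
    closing_speed = v_follow_mps - v_lead_mps
    if closing_speed <= 0:
        return math.inf
    return gap_m / closing_speed


def time_integrated_ttc(ttc_series, threshold_s=4.0, dt_s=1.0 / 25.0):
    """TIT: sum of (threshold - TTC) * dt over frames with 0 < TTC < threshold."""
    return sum(
        (threshold_s - ttc) * dt_s
        for ttc in ttc_series
        if 0.0 < ttc < threshold_s
    )


# Hypothetical example: follower at 30 m/s, leader at 25 m/s, 10 m gap.
example_ttc = time_to_collision(10.0, 30.0, 25.0)
example_tit = time_integrated_ttc([5.0, 3.0, 2.0, math.inf])
```

Only frames whose TTC falls strictly between zero and the threshold contribute to TIT, so a longer or deeper excursion below the threshold yields a larger value.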
Compared with other clustering techniques such as k-median and hierarchical clustering, k-means is more widely used [6, 30, 31] and is easy to interpret. The k-means method minimizes the differences within each cluster and maximizes the differences between clusters. Given the SI values, the severity level can be classified directly with the k-means algorithm.
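The clustering step can be illustrated with a small one-dimensional k-means (Lloyd's algorithm) sketch. This is not the authors' implementation: the deterministic initialization, the function names and the sample SI values are assumptions made only for illustration.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar values with Lloyd's algorithm; returns sorted centres."""
    data = sorted(values)
    # Assumed deterministic start: spread initial centres over the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Keep the old centre if a cluster happens to be empty.
        new_centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)


def severity_level(si, centres):
    """Severity level 1..k of an SI value, by nearest (ascending) centre."""
    return 1 + min(range(len(centres)), key=lambda j: abs(si - centres[j]))


# Hypothetical SI values, for illustration only.
sample_si = [0.3, 0.5, 1.0, 1.2, 20.0, 22.0, 25.0, 80.0, 90.0, 94.1]
sample_centres = kmeans_1d(sample_si, k=3)
```

Sorting the centres lets the cluster index double as an ordered severity level, with level 1 the lowest.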
3.2 Modeling the effects of traffic characteristics on conflict risk
3.2.1 Conflict frequency model
3.2.2 Conflict severity model
3.3 Joint modelling of conflict risk level and count data
Different families of bivariate copula models were compared, including Gaussian, Clayton, Frank, Gumbel, Ali-Mikhail-Haq (AMH) and Farlie-Gumbel-Morgenstern (FGM). The characteristics of these copulas, introduced in previous studies [16], are summarized in Table 1.
| Model | Copula $C(u,v;\theta)$ | Parameter range of $\theta$ |
|---|---|---|
| Gaussian | $\phi_2\left(\phi^{-1}(u),\phi^{-1}(v),\theta\right)$ | $\theta \in (-1,1)$; $\theta = 0$ is independence |
| Clayton | $\left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}$ | $\theta \in (0,\infty)$; $\theta \to 0$ is independence |
| Frank | $-\frac{1}{\theta}\ln\left(1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right)$ | $\theta \in (-\infty,\infty)\backslash\{0\}$; $\theta \to 0$ is independence |
| Gumbel | $\exp\left(-\left[(-\ln u)^{\theta} + (-\ln v)^{\theta}\right]^{1/\theta}\right)$ | $\theta \in [1,\infty)$; $\theta = 1$ is independence |
| AMH | $\frac{uv}{1 - \theta(1-u)(1-v)}$ | $\theta \in [-1,1)$; $\theta \to 0$ is independence |
| FGM | $uv\left[1 + \theta(1-u)(1-v)\right]$ | $\theta \in (-1,1)$; $\theta = 0$ is independence |
Note: $\phi_2$ represents the cdf of the standard bivariate normal distribution and $\phi^{-1}$ denotes the inverse cdf of the standard univariate normal distribution.
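The copula functions in Table 1 can be evaluated directly, which makes the stated independence cases easy to check numerically. The sketch below uses only the Python standard library; the function names are illustrative, and the Gaussian copula is approximated by midpoint integration of the conditional normal cdf (a numerical shortcut assumed here, not a method taken from this study).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def clayton(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def frank(u, v, theta):
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def gumbel(u, v, theta):
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def amh(u, v, theta):
    return u * v / (1.0 - theta * (1.0 - u) * (1.0 - v))


def fgm(u, v, theta):
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def gaussian(u, v, rho, steps=2000):
    """Gaussian copula for 0 < u, v < 1, via midpoint integration over (0, u)."""
    b = _STD_NORMAL.inv_cdf(v)
    scale = math.sqrt(1.0 - rho * rho)
    h = u / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        a = _STD_NORMAL.inv_cdf(s)
        total += _STD_NORMAL.cdf((b - rho * a) / scale)
    return total * h


# At (or near) the independence parameter, every family reduces to u * v.
u0, v0 = 0.3, 0.7
independence_values = {
    "gaussian": gaussian(u0, v0, 0.0),
    "clayton": clayton(u0, v0, 1e-6),
    "frank": frank(u0, v0, 1e-6),
    "gumbel": gumbel(u0, v0, 1.0),
    "amh": amh(u0, v0, 0.0),
    "fgm": fgm(u0, v0, 0.0),
}
```

Each entry of `independence_values` should be close to 0.3 × 0.7 = 0.21, matching the last column of Table 1.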
3.4 Model evaluation indices
Kendall's tau (τ) is also used to characterize and compare the dependency structure in different copula models. A Kendall's τ of zero stands for independence (i.e. no dependency). It is an effective indicator that satisfies the properties required for assessing the dependency between random variables; details can be found in a previous study [15].
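For orientation, Kendall's τ can be computed either from data or, for several copula families with continuous margins, from the copula parameter in closed form. The sketch below shows both; the function names are illustrative, the empirical version is the simple tau-a variant, and the Frank and AMH families are omitted because their expressions are more involved.

```python
import math


def kendall_tau_empirical(xs, ys):
    """Tau-a: (concordant - discordant) pairs divided by all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            # Tied pairs contribute zero.
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def kendall_tau_from_theta(family, theta):
    """Closed-form Kendall's tau for selected families (continuous margins)."""
    if family == "gaussian":
        return 2.0 / math.pi * math.asin(theta)
    if family == "clayton":
        return theta / (theta + 2.0)
    if family == "gumbel":
        return 1.0 - 1.0 / theta
    if family == "fgm":
        return 2.0 * theta / 9.0
    raise ValueError(f"no closed form implemented for {family!r}")
```

At the independence parameter of each family (for example θ = 1 for Gumbel), the closed form returns zero, in line with the interpretation given above.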
4. Data
4.1 Study area
The vehicle trajectory data were taken from the HighD dataset, which provides post-processed trajectories of 110,000 vehicles from 60 sub-datasets (4K resolution, 25 Hz) at six different locations on freeways near Cologne, Germany, during 2017 and 2018 [33]. All data were collected during sunny and windless weather from 8 AM to 5 PM to maximize data quality. The average length of the road segments at each location is about 420 m. The detailed information includes the recorded frame (25 frames per second), vehicle ID, vehicle type, longitudinal (x) and lateral (y) positions, velocities, driving direction, surrounding vehicles’ IDs, TTC and lane position. More details can be found on the HighD website [34].
A 30-second time interval was used to collect short-term vehicle information at the lane level, without any on-ramps or off-ramps. Fig. 3 displays the schematic diagram of data extraction. Two freeway types are included in the HighD dataset, so five lane types were defined. For instance, as shown in Fig. 3, the Inner_1 variable denotes the inner lane of a six-lane two-way freeway, with 11 as the corresponding type ID. The Middle_1 (12), Outer_1 (13), Inner_2 (21) and Outer_2 (22) variables are defined similarly. The selected variables include traffic characteristics (e.g. volume, speed) and other factors. Lane-based data were collected from 325 lanes in the HighD dataset at 30-second intervals.
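The 30-second lane-level aggregation described above could be sketched as follows. The record layout, the zero-based frame index, the use of unique vehicles per window as a stand-in for the midsection-line volume, and the population standard deviation are all assumptions made for illustration; they are not taken from the HighD format or from this study.

```python
from collections import defaultdict
from statistics import mean, pstdev

FRAME_RATE_HZ = 25
WINDOW_S = 30
FRAMES_PER_WINDOW = FRAME_RATE_HZ * WINDOW_S  # 750 frames per 30 s window


def aggregate_lane_windows(records):
    """Aggregate per-frame records into per-lane, per-window features.

    records: iterable of (frame, vehicle_id, lane_id, speed_mps, is_truck),
    a hypothetical layout with a zero-based frame index.
    """
    speeds_by_vehicle = defaultdict(lambda: defaultdict(list))
    trucks = set()
    for frame, vehicle_id, lane_id, speed, is_truck in records:
        window = frame // FRAMES_PER_WINDOW
        speeds_by_vehicle[(lane_id, window)][vehicle_id].append(speed)
        if is_truck:
            trucks.add(vehicle_id)

    features = {}
    for key, vehicles in speeds_by_vehicle.items():
        # One mean speed per vehicle, then statistics across vehicles.
        speeds = [mean(s) for s in vehicles.values()]
        avg = mean(speeds)
        std = pstdev(speeds)
        features[key] = {
            "volume": len(vehicles),
            "avg_speed": avg,
            "std_speed": std,
            "cv_speed": std / avg if avg > 0 else 0.0,
            "prop_truck": sum(v in trucks for v in vehicles) / len(vehicles),
        }
    return features


# Hypothetical records: two vehicles in the first window, one in the second.
demo_features = aggregate_lane_windows([
    (0, 1, 2, 30.0, False),
    (1, 1, 2, 30.0, False),
    (0, 2, 2, 20.0, True),
    (750, 3, 2, 25.0, False),
])
```

Each key of the result is a `(lane_id, window)` pair, so one dictionary entry corresponds to one lane-level observation of the kind analysed in this section.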

4.2 Data description
A TTC threshold of 4 s was used in this study, so a conflict event is detected if 0 < TTC < 4 s. Given the focus of this study, a total of 1,749 observations involving conflicts were analysed; observations without conflict occurrence were not included. Table 2 displays the amount of original data and filtered data collected from each lane type, together with the corresponding number of data points. For the original data, there are 47 video recordings of six-lane two-way roads, and the number of data points is the same for the three corresponding lane types. There are 13 video recordings of four-lane two-way roads, and the number of data points is likewise the same for the two corresponding lane types. For the filtered data, the counts differ among lane types.
| Description | Lane type (type ID) | Data points | Original data | Filtered data |
|---|---|---|---|---|
| Inner lane at six-lane two-way road | Inner_1 (11) | 91 | 3,129 | 216 |
| Middle lane at six-lane two-way road | Middle_1 (12) | 91 | 3,129 | 951 |
| Outer lane at six-lane two-way road | Outer_1 (13) | 91 | 3,129 | 319 |
| Inner lane at four-lane two-way road | Inner_2 (21) | 13 | 756 | 61 |
| Outer lane at four-lane two-way road | Outer_2 (22) | 13 | 756 | 202 |
| Total | | 325 | 10,899 | 1,749 |
Using the SI values, three severity levels were classified by the k-means clustering algorithm (details of the clustering results are given in Appendix A). The SI values range from 0.25 to 94.1 overall, and the range of each severity level is marked in Fig. 4. Level 1 denotes the lowest severity level, level 2 the medium level and level 3 the highest. There are 997 samples at level 1, 610 at level 2 and 142 at level 3. The conflict frequency distribution at each severity level is also presented in Fig. 4. It indicates that conflict frequency ranges from 1 to 13, and that samples with only one conflict account for a large proportion.

SI range and conflict frequency distribution of each severity level.
The descriptive statistics of the variables and details of the conflict severity clusters are presented in Table 3. Between-lane factors were considered. For two-way four-lane roads, the between-lane difference in traffic characteristics is straightforward to calculate. For two-way six-lane roads, if the subject lane is the middle lane, the difference uses an average value that considers the lanes on both sides. In the lane-based traffic dataset, cross-section volume varies from 4 to 43 vehicles per 30 s, and vehicles’ average speed ranges from 2.72 m/s to 45.87 m/s. The coefficient of variation of speed remains small, and the proportion of trucks ranges from 0 to 0.93. Additionally, the data were collected on weekdays only; weekends were not included.
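The between-lane factors can be illustrated with a short sketch. It assumes the difference is taken as the subject-lane value minus the adjacent-lane value (the sign convention is an assumption), and that for a middle lane the two adjacent lanes are averaged; the function name and feature keys are hypothetical.

```python
def between_lane_differences(subject, neighbours, keys):
    """Subject-lane value minus the mean adjacent-lane value, per feature.

    subject: dict of features for the subject lane.
    neighbours: list with one dict (inner/outer lane) or two dicts (middle lane).
    """
    if not neighbours:
        raise ValueError("at least one adjacent lane is required")
    return {
        f"diff_{key}": subject[key]
        - sum(lane[key] for lane in neighbours) / len(neighbours)
        for key in keys
    }


# Hypothetical middle lane with one adjacent lane on each side.
middle = {"volume": 20, "avg_speed": 28.0}
sides = [
    {"volume": 18, "avg_speed": 33.0},
    {"volume": 16, "avg_speed": 25.0},
]
demo_diff = between_lane_differences(middle, sides, ("volume", "avg_speed"))
```

Averaging the two one-sided differences gives the same result as subtracting the average of the two adjacent lanes, so either reading of the middle-lane rule leads to this computation.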
Statistics of continuous variables

| Variables | Description | Min | Max | Mean | Std |
|---|---|---|---|---|---|
| Outcome1 | Number of conflicts, response variable | 1 | 13 | 1.25 | 0.76 |
| Volume | Number of vehicles detected at the midsection line (veh/30 s) | 4 | 43 | 17.94 | 6.00 |
| Avg_speed | Average speed of vehicles (m/s) | 2.72 | 45.87 | 28.06 | 6.04 |
| Std_speed | Standard deviation of vehicles’ speed (m/s) | 0.52 | 8.50 | 3.07 | 1.16 |
| Cv_speed | Coefficient of variation of vehicles’ speed | 0.02 | 1.02 | 0.12 | 0.08 |
| Prop_truck | The proportion of trucks among all vehicles | 0 | 0.93 | 0.21 | 0.23 |
| Diff_V | The difference of volume between subject lane and the adjacent lane (veh/30 s) | −16 | 21 | 0.08 | 6.05 |
| Diff_AvgS | The difference of Avg_speed between subject lane and the adjacent lane (m/s) | −19.09 | 12.96 | −3.57 | 4.17 |
| Diff_StdS | The difference of Std_speed between subject lane and the adjacent lane (m/s) | −7.01 | 4.71 | 0.26 | 1.24 |
| Diff_CvS | The difference of Cv_speed between subject lane and the adjacent lane | −0.47 | 0.76 | 0.02 | 0.66 |
| Diff_Ptruck | The difference of Prop_truck between subject lane and the adjacent lane | −0.83 | 0.9 | 0.16 | 0.25 |

Statistics of categorical variables (proportion of each category)

| Variables | Description | Proportion of each category |
|---|---|---|
| Lane type | The type of subject lane (11 = Inner_1: inner lane at six-lane two-way freeway; 12 = Middle_1; 13 = Outer_1; 21 = Inner_2; 22 = Outer_2) | 11: 12.35%; 12: 54.37%; 13: 18.24%; 21: 2.49%; 22: 11.55% |
| Time of day | Whether after or before noon (1 = PM, 0 = AM) | 1: 26.34%; 0: 73.76% |
| Day of week | Day of week indicator (1 = Monday, 2 = Tuesday, 3 = Wednesday, 4 = Thursday, 5 = Friday) | 1: 35.73%; 2: 2.97%; 3: 23.16%; 4: 20.75%; 5: 17.38% |
| Outcome2 | Severity of conflict risk (1/2/3: low/medium/high level) | 1: 57.00%; 2: 34.88%; 3: 8.12% |
5. Results and discussions
5.1 Conflict risk independent modelling results
The results of the independent modelling process are presented in Table 4. Only significant variables are included in the final models. The relationships between the influential factors and the two outcome variables are quantified by independent ZTP and OL models, respectively. Based on the estimated coefficients, six variables were found to be significant in the ZTP model and five in the OL model. Among them, Diff_StdS and Diff_CvS were statistically significant in both models, but with opposite coefficient signs. If the difference in the standard deviation of speed between the subject lane and the adjacent lane increases, the conflict frequency on the subject lane tends to be higher, but the probability of high-severity conflict risk decreases. Conversely, the effect of Diff_CvS was negative on conflict frequency and positive on severity. In the ZTP model, the coefficients of Volume and Cv_speed were positive, while those of Prop_truck and Diff_AvgS were negative. In the conflict severity model, the effects of Std_speed and Diff_CvS were positive, lane type (Middle_1) was also statistically significant with a positive coefficient, and the effect of Avg_speed was negative. Day of week and Time of day were not significant, so they were not included in the final models.
| Type | Variable | Coef. | Std. Err. | z value | Pr(>\|z\|) |
|---|---|---|---|---|---|
| Conflict risk outcome variable 1 (frequency modeling) | Intercept | −1.257 | 0.064 | −19.49 | <0.001 |
| | Volume | 0.519*** | 0.052 | 9.97 | <0.001 |
| | Cv_speed | 0.445*** | 0.033 | 13.28 | <0.001 |
| | Prop_truck | −0.193* | 0.079 | −2.43 | 0.015 |
| | Diff_AvgS | −0.486*** | 0.074 | −6.56 | <0.001 |
| | Diff_StdS | 0.203** | 0.066 | 3.07 | 0.002 |
| | Diff_CvS | −0.174*** | 0.045 | −3.85 | <0.001 |
| Conflict risk outcome variable 2 (severity modeling) | Avg_speed | −0.285*** | 0.059 | −4.85 | <0.001 |
| | Std_speed | 0.155* | 0.064 | 2.41 | 0.016 |
| | Diff_StdS | −0.267** | 0.076 | −3.44 | 0.001 |
| | Diff_CvS | 0.278** | 0.083 | 3.34 | 0.001 |
| | Lanetype_12 | 0.364* | 0.160 | 2.28 | 0.023 |
| | Threshold1 | 0.494 | 0.147 | | |
| | Threshold2 | 2.719 | 0.165 | | |
Note:*, **, *** refer to p-value <0.05, <0.01, <0.001 significance level respectively.
In this study, the independent modelling process quantified two relationships: the effects of traffic characteristics on conflict frequency and on conflict severity. The two outcomes are influenced by different factors, and some shared factors act in opposite directions. An increase in traffic volume or in the coefficient of variation of vehicle speed increases conflict frequency, but neither variable is significant for severity. A higher standard deviation of vehicle speed raises the probability of higher severity. The differences in speed features between the subject lane and the adjacent lane were identified as significant predictors; for example, Diff_StdS has a positive coefficient for frequency (0.203) but a negative one for severity (−0.267). Higher between-lane speed variation may be caused by trucks, which tend to be slower than the rest of the traffic [35].
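To make the Table 4 coefficients easier to read, the following minimal sketch converts the ZTP coefficients into rate ratios and the ordered logit thresholds into baseline category probabilities. It rests on assumptions not confirmed in this section: a log link for the ZTP rate, the common parameterization P(Y ≤ j) = logistic(θ_j − xβ) for the ordered logit, and a "baseline" in which every covariate equals zero (whether the covariates were standardized is not stated here). The function names are hypothetical.

```python
import math

# Coefficients copied from Table 4 (independent models).
ZTP_INTERCEPT = -1.257
ZTP_COEF = {
    "Volume": 0.519, "Cv_speed": 0.445, "Prop_truck": -0.193,
    "Diff_AvgS": -0.486, "Diff_StdS": 0.203, "Diff_CvS": -0.174,
}
OL_THRESHOLDS = (0.494, 2.719)


def rate_ratio(coef):
    """Multiplicative change in the Poisson rate per one-unit increase
    of a covariate (assumes a log link)."""
    return math.exp(coef)


def ztp_mean(lam):
    """Mean of a zero-truncated Poisson whose untruncated rate is lam."""
    return lam / (1.0 - math.exp(-lam))


def ordered_logit_probs(eta, thresholds=OL_THRESHOLDS):
    """Category probabilities for linear predictor eta, assuming
    P(Y <= j) = logistic(threshold_j - eta)."""
    cdf = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[j] - cdf[j - 1] for j in range(1, len(cdf))]


for name, coef in ZTP_COEF.items():
    print(f"{name}: rate ratio = {rate_ratio(coef):.3f}")

baseline_rate = math.exp(ZTP_INTERCEPT)  # all covariates set to zero
print(f"baseline ZTP mean = {ztp_mean(baseline_rate):.3f}")
print("baseline severity probabilities:",
      [round(p, 3) for p in ordered_logit_probs(0.0)])
```

Under the assumed sign convention, a positive ordered logit coefficient shifts probability towards higher severity levels, which matches the interpretation of Std_speed given in the text.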
5.2 Copula-based joint modelling results
Considering the possible interdependence between conflict frequency and the different severity levels, a copula method was then employed to explore that relation. Six families of copula structures were investigated: Gaussian, Clayton, Frank, Gumbel, AMH and FGM. As mentioned above, the conflict severity outcome variable was split into three binary variables, so a total of 18 models (six copulas for each of the three severity levels) were tested. Table 5 shows the performance of all the models. The goodness-of-fit measures differ only slightly across models. For each severity level, the structure with the lowest AIC and BIC was selected as the most suitable model. Accordingly, the Gaussian copula outperforms the other structures for jointly modelling the count data with level 1 and with level 3, while the Frank copula is the best for level 2.
Copula | Level 1 & frequency: AIC | Level 1 & frequency: BIC | Level 2 & frequency: AIC | Level 2 & frequency: BIC | Level 3 & frequency: AIC | Level 3 & frequency: BIC |
---|---|---|---|---|---|---|
Gaussian | 4,066.470 | 4,159.406 | 4,012.602 | 4,100.071 | 2,714.553 | 2,774.688 |
Clayton | 4,127.374 | 4,220.310 | 4,011.322 | 4,098.791 | 2,720.456 | 2,780.591 |
Frank | 4,068.179 | 4,161.115 | 4,009.911 | 4,097.380 | 2,716.977 | 2,777.112 |
Gumbel | 4,127.374 | 4,220.310 | 4,014.310 | 4,101.778 | 2,714.739 | 2,774.874 |
AMH | 4,068.062 | 4,160.997 | 4,010.573 | 4,098.041 | 2,718.830 | 2,778.965 |
FGM | 4,067.298 | 4,160.234 | 4,010.129 | 4,097.598 | 2,727.420 | 2,836.756 |
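The selection rule described above (lowest AIC and BIC within each severity level) can be sketched as follows. The values are copied from Table 5; the dictionary layout and the function name `select_copula` are hypothetical and only illustrate the comparison, not the authors' software.

```python
# (AIC, BIC) pairs copied from Table 5, keyed by severity level and copula.
TABLE5 = {
    "Level 1": {
        "Gaussian": (4066.470, 4159.406), "Clayton": (4127.374, 4220.310),
        "Frank": (4068.179, 4161.115), "Gumbel": (4127.374, 4220.310),
        "AMH": (4068.062, 4160.997), "FGM": (4067.298, 4160.234),
    },
    "Level 2": {
        "Gaussian": (4012.602, 4100.071), "Clayton": (4011.322, 4098.791),
        "Frank": (4009.911, 4097.380), "Gumbel": (4014.310, 4101.778),
        "AMH": (4010.573, 4098.041), "FGM": (4010.129, 4097.598),
    },
    "Level 3": {
        "Gaussian": (2714.553, 2774.688), "Clayton": (2720.456, 2780.591),
        "Frank": (2716.977, 2777.112), "Gumbel": (2714.739, 2774.874),
        "AMH": (2718.830, 2778.965), "FGM": (2727.420, 2836.756),
    },
}


def select_copula(scores):
    """Return the copula names with the lowest AIC and the lowest BIC."""
    best_aic = min(scores, key=lambda name: scores[name][0])
    best_bic = min(scores, key=lambda name: scores[name][1])
    return best_aic, best_bic


for level, scores in TABLE5.items():
    best_aic, best_bic = select_copula(scores)
    # Gap to the runner-up shows how close the candidates are.
    ranked = sorted(scores[name][0] for name in scores)
    print(f"{level}: AIC -> {best_aic}, BIC -> {best_bic}, "
          f"AIC gap to runner-up = {ranked[1] - ranked[0]:.3f}")
```

The printed gaps make the remark that the goodness-of-fit measures differ only slightly concrete: within each level the best and second-best AIC values are very close.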
These results indicate that the correlation between conflict frequency and severity varies among levels. For level 1 and level 3, the dependence between the two response variables is best described by the Gaussian copula, while for level 2 it is best described by the Frank copula. Gaussian and Frank copulas are both referred to as ‘comprehensive copulas’ because they can parameterize the full range of stochastic dependency, allowing positive and negative dependence with symmetry in both tails. Compared with the Gaussian copula, however, the Frank copula is characterized by stronger dependence in the middle of the distribution and weaker dependence in the tails. Such correlations may arise from omitted or unobserved variables, measurement errors, or simultaneous causality [36].
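As an illustration of why the Frank copula is called comprehensive, the sketch below evaluates its standard bivariate distribution function for positive and negative values of the dependence parameter. The parameter values are arbitrary and are not estimates from this study; the function name `frank_cdf` is hypothetical.

```python
import math


def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for u, v in [0, 1].

    theta > 0 gives positive dependence, theta < 0 negative dependence,
    and theta -> 0 approaches independence (C = u * v).
    """
    if abs(theta) < 1e-12:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


u = v = 0.5
for theta in (-5.0, 0.0, 5.0):  # arbitrary illustrative values
    print(f"theta = {theta:+.1f}: C(0.5, 0.5) = {frank_cdf(u, v, theta):.4f}")

# Radial symmetry (same behaviour in both tails):
# C(u, v) = u + v - 1 + C(1 - u, 1 - v)
lhs = frank_cdf(0.2, 0.3, 4.0)
rhs = 0.2 + 0.3 - 1.0 + frank_cdf(0.8, 0.7, 4.0)
print(f"radial symmetry gap = {abs(lhs - rhs):.2e}")
```

Values of C(0.5, 0.5) above 0.25 indicate positive dependence and values below 0.25 indicate negative dependence, so a single family covers both signs.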
Table 6 shows the coefficient estimates of the optimal copula models. Kendall's |$\tau $| values, which measure the level of dependence, are also reported in Table 6. The dependency between level 1 and the count data is negative (|$\tau $| = −0.234), whereas it is positive in the other two models (0.152 for level 2 and 0.168 for level 3). The variables that are significant in the independent models are also statistically significant in the copula models. The impact of explanatory variables on frequency follows the same trend in the independent ZTP model and the three copula models. However, the magnitudes of the estimated coefficients and the effects of variables on the severity levels differ. For low severity (level 1), Diff_StdS and Diff_CvS each act in the same direction on the two outcome variables. If Diff_StdS increases, both the frequency and the probability of low-severity risk increase. Conversely, both the frequency and the probability of low-severity risk tend to be lower with higher Diff_CvS.
Parameter | Level 1 & frequency (Gaussian copula): binary logit | Level 1 & frequency (Gaussian copula): ZTP | Level 2 & frequency (Frank copula): binary logit | Level 2 & frequency (Frank copula): ZTP | Level 3 & frequency (Gaussian copula): binary logit | Level 3 & frequency (Gaussian copula): ZTP |
---|---|---|---|---|---|---|
Intercept | 0.493 (0.147) | −1.248 (0.064) | −1.023 (0.157) | −1.257 (0.064) | −2.506 (0.093) | −1.251 (0.064) |
Volume | — | 0.508*** (0.051) | — | 0.522*** (0.052) | — | 0.507*** (0.052) |
Avg_speed | 0.258*** (0.060) | — | −0.131* (0.056) | — | −0.322*** (0.078) | — |
Std_speed | −0.137* (0.067) | — | 0.118* (0.065) | — | — | — |
Cv_speed | — | 0.436*** (0.033) | — | 0.443*** (0.033) | — | 0.442*** (0.033) |
Prop_truck | — | −0.189* (0.079) | — | −0.187* (0.079) | — | −0.194* (0.079) |
Diff_AvgS | — | −0.457*** (0.073) | — | −0.486*** (0.074) | — | −0.465*** (0.073) |
Diff_StdS | 0.288** (0.088) | 0.200** (0.066) | −0.131* (0.063) | 0.208** (0.066) | — | 0.200** (0.065) |
Diff_CvS | −0.291** (0.104) | −0.165*** (0.045) | — | −0.179*** (0.045) | 0.189** (0.069) | −0.169*** (0.045) |
Lanetype_12 | −0.354* (0.161) | — | 0.518** (0.171) | — | — | — |
Lanetype_22 | — | — | 0.460* (0.219) | — | — | — |
Kendall's $\tau$ | −0.234 (−0.290, −0.166) | | 0.152 (0.096, 0.215) | | 0.168 (0.098, 0.241) | |
Note: entries are coefficients with standard errors in parentheses. Kendall's $\tau$ is reported once per joint model, with its interval in parentheses.
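For readers who want to relate the reported Kendall's τ values to copula parameters, the sketch below uses two textbook relations: ρ = sin(πτ/2) for the Gaussian copula and τ = 1 − (4/θ)(1 − D₁(θ)) for the Frank copula, where D₁ is the first-order Debye function. Both relations hold for continuous margins, so with the discrete margins used here (counts and binary indicators) they should be read as describing the underlying copula only. Whether the authors derived τ this way is not stated; all function names are hypothetical.

```python
import math


def gaussian_rho_from_tau(tau):
    """Gaussian copula correlation implied by Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


def debye1(theta, n=2000):
    """D1(theta) = (1/theta) * integral_0^theta t / (e^t - 1) dt,
    approximated with the midpoint rule (never evaluates t = 0)."""
    h = theta / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t / math.expm1(t)
    return total * h / theta


def frank_tau(theta):
    """Kendall's tau of a Frank copula with parameter theta != 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def frank_theta_from_tau(tau, lo=1e-6, hi=50.0, tol=1e-10):
    """Invert frank_tau by bisection; valid for positive tau only."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Kendall's tau values copied from Table 6.
print("level 1 (Gaussian): rho =", round(gaussian_rho_from_tau(-0.234), 4))
print("level 3 (Gaussian): rho =", round(gaussian_rho_from_tau(0.168), 4))
print("level 2 (Frank): theta =", round(frank_theta_from_tau(0.152), 4))
```

The implied parameters are modest in magnitude, in line with the moderate dependence that the τ values suggest.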
Fewer variables are significant in the high-severity model than in the lower-severity ones. In the second model (level 2), Diff_CvS is not significant for severity, and the variables that are also significant in the first model (Avg_speed, Std_speed, Diff_StdS and Lanetype_12) have coefficients of the opposite sign. In addition, Lanetype_22 is statistically significant in the second model. In the third joint model, only two variables (Avg_speed and Diff_CvS) significantly influence the probability of high-severity risk, and only Diff_CvS influences both outcome variables simultaneously: an increase in Diff_CvS is associated with a higher probability of high-severity risk but a lower frequency. Consistent with previous studies, speed-related variables greatly influence traffic safety [37, 38]. In summary, the mechanism influencing conflict risk varies with severity level, and the proposed joint model provides a more reasonable means of modelling conflict risk.
6. Conclusions and recommendations
This paper focused on modelling conflict risk, including quantifying the effects of traffic characteristics on conflicts and revealing the dependency between conflict frequency and severity levels. Trajectory data from the HighD dataset from Germany were used to collect lane-based data at a 30-second time interval. In other words, this paper aimed to estimate conflict risk on a target lane within a specific spatio-temporal unit. The time-to-collision (TTC) index was used to detect conflicts, and a severity index (SI) was proposed on the basis of the time-integrated TTC (TIT) indicator. The severity of regional conflict risk was defined with the SI, and three severity levels were then classified by the k-means clustering method. Conflict severity was regarded as an ordinal variable in the independent model, while each level was separately set as a binary variable in the joint modelling process. A zero truncated Poisson (ZTP) model was established to quantify the relationship between traffic characteristics and conflict frequency, and an ordered logit (OL) model was used to estimate the effects of the same factors on conflict severity. The copula-based joint modelling method was then applied to explore the potential non-linear dependency of conflict risk outcomes, and different copula structures were tested. The joint modelling process mainly served to reveal the different correlations between conflict frequency and each severity level, along with the different impacts of explanatory variables.
The main findings can be summarized as follows:
The severity of regional conflict risk can be identified and classified by using the proposed severity index (SI). The effects of traffic characteristics on conflict severity can be well estimated by the ordered logit model. On the subject lane, the standard deviation of vehicle speed has a positive effect on severity, while the between-lane difference in the standard deviation of speed has a negative effect.
The zero truncated Poisson model can well quantify the relationship between traffic characteristics and conflict frequency. Traffic volume has the largest positive effect on conflict frequency. The coefficient of variation of speed and the between-lane difference in the standard deviation of speed also influence the frequency positively.
The correlation between frequency and severity varies across severity levels. For levels 1 and 3, the Gaussian copula best captures the dependency with frequency, while for level 2 the optimal joint model uses the Frank copula. The impact of explanatory variables on the frequency follows the same trend in the independent ZTP model and the three copula models. The joint models reveal the different effects of variables on each severity level.
Since the relationship may differ across severity levels, it is essential to know the dependency between conflict counts and each severity level. The copula-based method is shown to be an effective way of reflecting the dependency between conflict counts and the three severity levels. In this study, the independent models are mainly used to understand the effects of traffic characteristics on the two conflict outcomes separately, and the potential dependency in each case is then considered in the joint model. This is a first attempt to estimate regional conflict risk while considering the different traffic interactions among low-, medium- and high-severity levels. For practical applications, the copula-based approach can provide a new perspective on real-time traffic safety evaluation. With the calibrated joint model, conflict risk can be assessed from short-term traffic characteristics collected from detectors. The traffic factors that significantly affect conflict risk can differ between severity levels, so targeted countermeasures can be implemented by controlling the key factors at each level. A large number of conflicts at the high severity level is undesirable, so developing a proactive safety strategy to reduce this risk is urgent.
Limitations of the present study offer some directions for future research. First, the study area was limited to freeways in Germany during clear and windless weather, and the length of each spatial slice was limited to 420 m by the HighD dataset. Second, the problem was simplified by splitting the severity outcome into three binary variables, one per level, and estimating three separate models may not be enough to fully capture the interdependence between conflict frequency and severity. Finally, it would be valuable to consider a more feasible indicator for measuring the severity of regional risk, and to validate the transferability and temporal stability of the current findings.
ACKNOWLEDGEMENTS
This research was sponsored by the National Natural Science Foundation of China (Grant Nos. 71901223, 71971222); the Natural Science Foundation of Hunan Province (Grant No. 2021JJ40746), and the Fundamental Research Funds for the Central Universities of Central South University (Grant No. 1053320214771). The authors are also grateful for the assistance of Dr. Pengpeng Xu.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
Appendix A.
As the clustering object, SI is a one-dimensional numerical value. To classify the severity of conflict risk, the k-means method was applied using SPSS software. Fig. A1(a) displays the sum of the squared errors (SSE) for different numbers of clusters. SSE declines sharply while the number of clusters is below 3, and the decline slows once the number exceeds 3. Because the silhouette coefficient is a key index describing the contrast between cohesion within a cluster and separation from other clusters, its curve is presented in Fig. A1(b). A larger silhouette coefficient represents a better clustering effect. Although the coefficient peaks when the number of clusters is 2, it changes rapidly afterwards, so two clusters may not be appropriate. The overall results show that the optimum number of clusters is three. Therefore, three severity levels of conflict risk are further analysed in this study.

Fig. A1. Clustering effect of the k-means method: (a) change of SSE with the number of clusters; (b) silhouette coefficient diagram.
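The cluster-number check described in this appendix can be sketched in a few lines. The authors used SPSS; the pure-Python version below is a stand-in, and the SI values are synthetic (three hypothetical groups drawn from normal distributions), so the printed numbers illustrate the procedure only and do not reproduce Fig. A1.

```python
import random


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm for 1-D data; centres start at evenly spaced quantiles."""
    xs = sorted(values)
    centres = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda j: (x - cs[j]) ** 2) for x in values]

    for _ in range(iters):
        labels = assign(centres)
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:
            break
        centres = updated
    labels = assign(centres)
    sse = sum((x - centres[lab]) ** 2 for x, lab in zip(values, labels))
    return centres, labels, sse


def mean_silhouette(values, labels, k):
    """Average silhouette coefficient using absolute distance (k >= 2)."""
    clusters = [[x for x, lab in zip(values, labels) if lab == j] for j in range(k)]
    scores = []
    for x, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) < 2:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        b = min(sum(abs(x - y) for y in other) / len(other)
                for j, other in enumerate(clusters) if j != lab and other)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic SI values: three hypothetical severity groups (not study data).
rng = random.Random(42)
si_values = ([rng.gauss(0.5, 0.3) for _ in range(150)]
             + [rng.gauss(3.0, 0.3) for _ in range(100)]
             + [rng.gauss(8.0, 0.3) for _ in range(50)])

results = {}
for k in range(1, 7):
    centres, labels, sse = kmeans_1d(si_values, k)
    sil = mean_silhouette(si_values, labels, k) if k > 1 else None
    results[k] = (sse, sil)
    print(k, round(sse, 2), None if sil is None else round(sil, 3))
```

Reading the output mirrors the reasoning above: look for the number of clusters after which SSE stops falling sharply, then check that the silhouette coefficient at that number is still acceptable.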