Predicting and Interpreting Spatial Accidents through MDLSTM

Predicting and interpreting the spatial location and causes of traffic accidents is a current hot topic in traffic safety. This research proposes a multi-dimensional long-short term memory neural network model (MDLSTM) to fit the non-linear relationships between traffic accident characteristics and land use properties, which are further interpreted to form local and general rules. Compared with previous studies, more variables are taken into account as the input land use properties and the output traffic accident characteristics. Five types of traffic accident characteristics are predicted simultaneously with higher accuracy, and three levels of interpretation are analyzed: the hidden factor (the traffic accident potential), the factors that determine the potential, which vary between grid cells, and the general rules across the whole study area. Based on the model, some interesting insights are revealed, including a dividing line in the traffic accident potential in Shenyang (China). It is also proposed that the relationship between land use and accidents differs from previous research in the neighboring and regional aspects. Neighboring grids have strong spatial connections, so the relationship of accidents in a continuous area is relatively similar. In a larger region, the spatial location is found to have a great influence on traffic accidents and to show a strong directionality.


Introduction
According to the report published by the World Health Organization (WHO), road traffic crashes result in the deaths of approximately 1.35 million people around the world each year and leave between 20 and 50 million people with non-fatal injuries [1]. Factors affecting traffic accidents can be divided into subjective and objective aspects at the macroscopic level. The objective aspects mainly include regional characteristics, road network characteristics, climate characteristics and so on. The subjective aspects mainly include human operation errors, violations of regulations, negligence, vehicle technical reasons and so on. The involvement of multiple influencing factors complicates the prediction and analysis of traffic accidents, and makes it difficult to strip out the influence of any one of these factors. Although current research is centred on quantitatively analyzing the conditions of different influencing factors and elucidating the most influential factors [2], gaps in this area of knowledge remain.
The significant spatial autocorrelation in traffic accidents revealed by spatial analysis provides an inspiration: since the multiple causes of traffic accidents are also spatially aggregated, the spatial influence on traffic accidents must contain many valuable factors that are not directly observed. Hence, local land use characteristics and spatial correlation are analyzed concurrently in this paper, using the multi-dimensional long-short term memory neural network model (MDLSTM). The method greatly improves the accuracy of traffic accident prediction by responding to multivariate inputs with non-linear relationships. Compared with existing research, more indicators are taken into account in the model input and output, which is also an advantage of the MDLSTM model. In addition, this method can capture relationships between variables that traditional models consider to be unrelated.
Lovegrove et al. [16] analyzed the feasibility of applying the macro-safety model to evaluate traffic improvement schemes in the traffic analysis zone (TAZ) in a case study.

Influencing Factors of Traffic Accidents
Different factors have different effects on traffic accidents. Previous studies on the influencing factors of traffic accidents mainly focused on the attributes of personnel [17], vehicles [18], roads [19] and environment [20]. For example, Liu and Fan took traffic accidents from 2005 to 2013 in North Carolina as a sample and found that drunk driving had a large impact on traffic accidents [21]. Kelley et al. studied the crash data in the CIREN database from 1998 to 2012 and found that side impact could be an important influencing factor [22]. Cheng et al. examined traffic accident data from San Francisco from 2008 to 2013 and found that severe weather could be related to serious traffic accidents [23]. Few existing studies looked at the causes of traffic accidents from the aspects of urban zoning differences [24], road network topology [25], etc. In our study, factors such as plot ratio, points of interest and congestion ratio, which represent urban zoning differences and road network topology, are used to find more specific causes of traffic accidents.
Researchers in the field of traffic safety have used spatial distribution as a clue to trace the causes of traffic accidents, focusing on environmental factors. Decades ago, among the objective factors, researchers also examined the impact of road network layout, road and traffic design, traffic control, active risk management and environmental conditions on traffic safety, as well as traffic accidents caused by land use. There are still few studies on traffic accidents related to land use, and this topic is becoming increasingly important.

Data Sources
The land use properties and traffic accident data both come from the City of Shenyang in China, which is also the study area of this paper. The land use dataset is compiled from point of interest (POI) data, evening peak traffic flow data and road maps; the road maps are collected from OpenStreetMap (OSM). Since POI data focus more on commercial service facilities, such as catering and entertainment, the residential POI data are verified against the residential area information on the Anjuke platform. The fourteen basic types of POI are: Catering, Hotel, Shopping, Life Services, Tourism, Leisure and Entertainment, Sports and Fitness, Education, Medical, Transportation Facilities, Finance, Residential, Companies, and Government Organizations. The POI and evening peak traffic flow data are gathered from the Baidu Map API (Baidu, Beijing, China); the traffic flow value is the average of the traffic state data of the evening rush hours from 14 June 2019 to 21 June 2019. Table 1 provides an overview of the land use dataset. The accident dataset is based on the statistics of traffic accidents in Shenyang from January 2015 to December 2017; an overview is provided in Table 2. The following fields are included: text description of the accident location, date and time, isolation of the road and cross-sectional location in the road. These indicators are all converted to numerical form to accurately model the occurrence and characteristics of traffic accidents.

Distribution of Accident Characteristics
The text description of each accident location is matched to its latitude and longitude through the Baidu Map API, so that all accidents can be traced. To describe traffic accidents from a macroscopic perspective, the traffic accidents within 3000 m of each grid cell are recorded as the "traffic accident counts" indicator of that grid cell. The height values in the right-hand diagram of Figure 1a represent the locations of the accidents. The values show the number of traffic accidents that took place within a radius of 3000 m from the center of the grid cell, as shown in Figure 1b.
The accident date and accident time are processed to linearize their relationship with the accident frequency. The date of the traffic accident is converted to the number of days until winter (represented by the winter solstice on 22 December). Since Shenyang has more road icing in winter, winter is the season in which most traffic accidents occur, as shown in Figure 2a, where the three peaks in the distribution of traffic accident data correspond to the three winters of 2015, 2016 and 2017.
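The "traffic accident counts" indicator described above can be sketched as a radius query around each grid-cell center. This is only a minimal illustration: it assumes that accident and cell-center coordinates have already been projected to planar x/y coordinates in meters, and the function name `count_within_radius` and the sample coordinates are hypothetical.

```python
import numpy as np

def count_within_radius(centers_m, accidents_m, radius_m=3000.0):
    """Count accidents within radius_m of each grid-cell center.

    Both inputs are (n, 2) arrays of planar x/y coordinates in meters
    (assumed to be projected from latitude/longitude beforehand).
    """
    centers_m = np.asarray(centers_m, dtype=float)
    accidents_m = np.asarray(accidents_m, dtype=float)
    # Pairwise differences via broadcasting: (n_cells, n_accidents, 2).
    diff = centers_m[:, None, :] - accidents_m[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    return (dist <= radius_m).sum(axis=1)

# Hypothetical toy data: three cell centers and three accidents.
centers = np.array([[0.0, 0.0], [400.0, 0.0], [10_000.0, 0.0]])
accidents = np.array([[100.0, 0.0], [3_200.0, 0.0], [9_000.0, 500.0]])
counts = count_within_radius(centers, accidents)
```

For a grid of roughly twelve thousand cells and a multi-year accident record, the full pairwise distance matrix may be large; a spatial index would be a natural replacement for the broadcasting step.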
In Figure 2b, the accident time data are illustrated in the same way as the accident date data. Traffic accidents occur most frequently between 13:00 and 17:00, so the time distance to 15:00 is taken as the value of the indicator. In the figure, "ante meridiem" and "post meridiem" denote the periods before noon (0:00-12:00) and after noon (12:00-24:00).
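The date and time indicators can be sketched as two small conversions. The text does not say whether "days till winter" counts forward to the next solstice or measures the distance to the nearest one; the sketch below assumes a forward count to the next 22 December, and measures the time distance to 15:00 in hours. Both function names are hypothetical.

```python
from datetime import date, time

def days_until_winter(d):
    """Days from d to the next 22 December (assumed forward-looking count)."""
    solstice = date(d.year, 12, 22)
    if d > solstice:
        # Past this year's solstice: count to next year's.
        solstice = date(d.year + 1, 12, 22)
    return (solstice - d).days

def hours_from_15(t):
    """Absolute distance, in hours, between a time of day and 15:00."""
    return abs(t.hour + t.minute / 60.0 - 15.0)

example_days = days_until_winter(date(2016, 12, 12))
example_hours = hours_from_15(time(13, 30))
```

If the nearest-solstice interpretation is intended instead, the first function would take the minimum of the distances to the previous and next 22 December.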
As shown in Figure 3a, the isolation of the road is one of the factors influencing traffic accidents. There are 4 levels of road isolation: "Center isolation and motor vehicle-non-motor vehicle isolation", "Center isolation", "Motor vehicle and non-motor vehicle isolation" and "None", denoted 4, 3, 2 and 1, respectively. Figure 3b shows that the cross-sectional location is another key feature of traffic accidents. There are 5 levels of cross-sectional location: "Motor vehicle lane", "Motor vehicle and non-motor vehicle mixed lane", "Non-motor vehicle lane", "sidewalk" and "cross walk", corresponding to 5, 4, 3, 2 and 1, respectively. The spatial distributions of these two indicators are as follows.
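The two ordinal encodings above amount to simple lookup tables. The sketch below uses the level names and codes given in the text; the record field names `isolation` and `cross_section` and the function `encode_accident` are hypothetical.

```python
# Ordinal codes as stated in the text.
ISOLATION_LEVELS = {
    "Center isolation and motor vehicle-non-motor vehicle isolation": 4,
    "Center isolation": 3,
    "Motor vehicle and non-motor vehicle isolation": 2,
    "None": 1,
}
CROSS_SECTION_LEVELS = {
    "Motor vehicle lane": 5,
    "Motor vehicle and non-motor vehicle mixed lane": 4,
    "Non-motor vehicle lane": 3,
    "sidewalk": 2,
    "cross walk": 1,
}

def encode_accident(record):
    """Map the two categorical fields of one accident record to their codes."""
    return (
        ISOLATION_LEVELS[record["isolation"]],
        CROSS_SECTION_LEVELS[record["cross_section"]],
    )

# Hypothetical record for illustration.
codes = encode_accident({"isolation": "Center isolation", "cross_section": "sidewalk"})
```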

Rasterization
The data processing in this study aims to connect the traffic accident data with the land use properties. Spatial autocorrelation is included to model the less obvious effects. To achieve this, rasterization is used to break up the land use data into raster cells. The traffic accident data and raster data are then matched spatially, so that the MDLSTM model can capture the spatial relation between accidents and land use.
The study area of this paper is the urban area of the City of Shenyang, as shown in Figure 4a. Similar to Liu [26] and Yue [27], the rectangular region is rasterized into grid cells at a scale of around 400 m × 444 m with the usual method [28,29], as shown in Figure 4b. In total, 12,110 grid cells (about 96 rows and 125 columns) were collected in the Shenyang urban area. To speed up learning and convergence when training the model, layer normalization was performed to scale the data into the range [0, 1], as studied by Ba J L [30].
Data processing has several steps, including map acquisition, data matching, grid transformation, window sampling and batch splitting. After batch splitting, a sliding window (9 × 9 grid cells) was used to sample the data and reshape them into bi-dimensional tensors, resulting in 10,092 windows in the study area. As shown in Figure 5, the windows were selected by a zero rate index: a window is marked as unusable when 80% of the data in the window is missing for lack of information. Among the windows, 100 randomly selected windows are used as the test dataset, and the remaining 9992 windows are used as the training dataset.
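The window sampling and zero rate filtering can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: it assumes a stride of one cell, treats a zero entry as "missing", and drops windows whose zero rate reaches 0.8. The names `usable_windows`, `split_windows` and the toy grid are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def usable_windows(grid, size=9, max_zero_rate=0.8):
    """Cut a (rows, cols, features) grid into size x size windows and
    keep those whose share of zero entries is below max_zero_rate."""
    # Result shape: (rows - size + 1, cols - size + 1, features, size, size).
    windows = sliding_window_view(grid, (size, size), axis=(0, 1))
    # Move the feature axis last: (..., size, size, features).
    windows = np.moveaxis(windows, 2, -1)
    windows = windows.reshape(-1, size, size, grid.shape[2])
    zero_rate = (windows == 0).mean(axis=(1, 2, 3))
    return windows[zero_rate < max_zero_rate]

def split_windows(windows, n_test=100, seed=0):
    """Randomly hold out n_test windows; return (train, test)."""
    order = np.random.default_rng(seed).permutation(len(windows))
    return windows[order[n_test:]], windows[order[:n_test]]

# Toy grid: 12 x 12 cells, 2 features, data only in the first two columns.
demo_grid = np.zeros((12, 12, 2))
demo_grid[:, :2, :] = 1.0
kept = usable_windows(demo_grid)
train, test = split_windows(kept, n_test=1)
```

In the toy grid only the windows that start in the first column contain enough data, so most candidate windows are filtered out.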

Validation of the Spatial Autocorrelation
The premise of this study is that traffic accidents have significant spatial autocorrelation, which gives rise to the assumption that the multiple causes of traffic accidents are also spatially aggregated, and that the spatial influence on such traffic accidents contains many valuable factors that are not directly observed. In this section, the spatial autocorrelation is first validated to show that these indicators do have spatial correlation.
The spatial dependency was tested using the Global Moran's I and Global Geary's C statistics. The results are shown in Table 3. Statistically significant spatial clustering was found, and both results are significant at the p < 0.001 level.
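For reference, both global statistics can be computed directly on a rasterized indicator. The text does not state which spatial weights were used; the sketch below assumes binary rook contiguity (cells sharing an edge are neighbors) and omits the significance test. The function name `morans_i_gearys_c` is hypothetical.

```python
import numpy as np

def morans_i_gearys_c(grid):
    """Global Moran's I and Geary's C for a 2-D grid,
    assuming binary rook-contiguity weights."""
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()
    n = x.size
    rows, cols = x.shape
    # Each adjacent pair appears twice in the symmetric weight matrix.
    n_pairs = rows * (cols - 1) + (rows - 1) * cols
    w_sum = 2.0 * n_pairs
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    sq_diff = 2.0 * (((x[:, :-1] - x[:, 1:]) ** 2).sum()
                     + ((x[:-1, :] - x[1:, :]) ** 2).sum())
    denom = (z ** 2).sum()
    moran = (n / w_sum) * cross / denom
    geary = ((n - 1) / (2.0 * w_sum)) * sq_diff / denom
    return moran, geary

# A smooth gradient is positively autocorrelated (I > 0, C < 1);
# a checkerboard is perfectly negatively autocorrelated (I = -1).
gradient = np.add.outer(np.arange(5), np.arange(5))
checker = np.indices((4, 4)).sum(axis=0) % 2
moran_g, geary_g = morans_i_gearys_c(gradient)
moran_c, geary_c = morans_i_gearys_c(checker)
```

Values of I above zero together with values of C below one indicate clustering, which is the pattern the validation above reports for the accident indicators.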

MDLSTM Model
The basic model underlying MDLSTM is the recurrent neural network (RNN), which was developed to model the regularities of sequence data. RNNs are widely applied in natural language processing (NLP), since they are good at fitting the non-linear relationship between the occurrence of a word at a specific location and the other words in the context. The advantage of this model is that it can retain the information transferred between distant words. Take Figure 6 as an example.

Figure 6. Application of the recurrent neural network in natural language processing. "t" means the word step, while "t − 1" means the step before "t". "h(t)" means the output of step "t" and "x(t)" means the input of step "t". The "?" marks the word that needs to be predicted, corresponding to the predicted word "Lied".
The word "Dentist" is the input of the second step (t − 1) of the model, and the next word "Lied" is the expected output. In this process, the occurrence of "Lied" is affected not only by the word "Dentist" but also by the previous inputs, such as "The". MDLSTM is the bi-dimensional version of an extended form of the RNN. As shown in Figure 7a, it improves on a basic RNN in two respects. The first is the strengthened long-distance influence through the widely known "gate" structure, which gave rise to the Long-Short Term Memory neural network model in 1997 [30]. The second is the expansion of the LSTM to multiple dimensions in 2007 [31], which made the model more suitable for spatial analysis. In the traffic accident context, every cell in the bi-dimensional network represents a grid cell in the urban area, as shown in Figure 7a.
Influences among the grid cells are further expanded in the following figures. Figure 8 represents the cell (t, s) in the MDLSTM model shown in Figure 7a, with the input $x_{t,s}$ and the output $h_{t,s}$. It also represents the grid cell located at (t, s) in the urban area shown in Figure 7b; since all grid cells share the same trained parameters, the unit A is duplicated in every cell. The structure consists of input, output and transfer.
In the urban safety context, the input $x_{t,s}$ is the land use properties, including the plot ratio, number of types of POIs, centrality, distance to the CBD, number of surrounding road sections and the congestion ratio of the grid cell located at (t, s). The output $h_{t,s}$ is the accident characteristics, including the accident counts, date, time, isolation and cross-sectional location of the grid cell (t, s). The others are intermediate variables, including $S_{t,s}$, $i_{t,s}$, $f_{t,s,j}$ and $o_{t,s}$, which vary between grid cells. The weights and biases, including $W_C$, $b_C$, $W_i$, $b_i$, $W_{f,1}$, $b_{f,1}$, $W_{f,2}$, $b_{f,2}$, $W_o$ and $b_o$, are the same for every cell in the entire network.
Based on the three steps, which outline the basic flow of the model, the relationship among the land use properties of every grid cell, the accident characteristics of the surrounding cells and the accident characteristics of the current cell (t, s) is as follows:
Transfer: $$C_{t,s} = i_{t,s} \cdot S_{t,s} + f_{t,s,1} \cdot C_{t-1,s} + f_{t,s,2} \cdot C_{t,s-1}$$
Output: $$h_{t,s} = o_{t,s} \cdot \tanh(C_{t,s}) \qquad (3)$$
where $S_{t,s}$ is the state from the local land use and $C_{t,s}$ is the total state. $\tanh$ is the hyperbolic tangent function commonly used in machine learning. $W_C$ and $b_C$ are the weight matrix and bias matrix of the state $S_{t,s}$. $x_{t,s}$ is the input of the grid cell (t, s); $h_{t-1,s}$ and $h_{t,s-1}$ are the outputs of grid cells (t − 1, s) and (t, s − 1). $I_{t,s}$ is the integrated matrix that includes the input $x_{t,s}$ of grid cell (t, s) and the outputs $h$ of grid cells (t − 1, s) and (t, s − 1). $i_{t,s}$, $f_{t,s,1}$, $f_{t,s,2}$ and $o_{t,s}$ are the intermediate variables of grid cell (t, s).
$C_{t,s}$ is transformed into the output $h_{t,s}$, which represents the traffic accident characteristics, through an output rate $o_{t,s}$. $C_{t,s}$ can be interpreted as the traffic accident potential. Within $C_{t,s}$, the elements corresponding to the accident counts, date, time, isolation and cross-sectional location can be viewed as the most dangerous location, date, time, isolation and cross-sectional location. If the second element in $C_{t,s}$ grows larger, the potential of the current grid cell moves to a date closer to winter, meaning that traffic accidents are more likely to happen in winter.
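The transfer and output steps can be sketched as a forward sweep over a window of grid cells. The transfer and output lines follow the equations in the text; the concatenation used for the integrated input, the tanh form of the local state and the logistic-sigmoid gates are assumptions borrowed from the standard LSTM convention, and the function name, parameter dictionary keys and random weights are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mdlstm_forward(x, params):
    """Forward sweep of a 2-D LSTM over x with shape (n_t, n_s, n_in).

    Returns the outputs h and potentials C, each of shape (n_t, n_s, n_out).
    """
    n_t, n_s, _ = x.shape
    n_out = params["b_C"].shape[0]
    # Zero padding: cells in the first row/column see empty neighbours.
    h = np.zeros((n_t + 1, n_s + 1, n_out))
    C = np.zeros((n_t + 1, n_s + 1, n_out))
    for t in range(1, n_t + 1):
        for s in range(1, n_s + 1):
            # Assumed integrated input: local input plus both neighbour outputs.
            I = np.concatenate([x[t - 1, s - 1], h[t - 1, s], h[t, s - 1]])
            # Assumed forms of the local state and the gates.
            S_loc = np.tanh(params["W_C"] @ I + params["b_C"])
            i = sigmoid(params["W_i"] @ I + params["b_i"])
            f1 = sigmoid(params["W_f1"] @ I + params["b_f1"])
            f2 = sigmoid(params["W_f2"] @ I + params["b_f2"])
            o = sigmoid(params["W_o"] @ I + params["b_o"])
            # Transfer and output steps as given in the text.
            C[t, s] = i * S_loc + f1 * C[t - 1, s] + f2 * C[t, s - 1]
            h[t, s] = o * np.tanh(C[t, s])
    return h[1:, 1:], C[1:, 1:]

# Six land use inputs and five accident outputs, as listed in the text.
rng = np.random.default_rng(0)
n_in, n_out = 6, 5
params = {}
for name in ("C", "i", "f1", "f2", "o"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(n_out, n_in + 2 * n_out))
    params[f"b_{name}"] = np.zeros(n_out)
h_out, C_out = mdlstm_forward(rng.random((9, 9, n_in)), params)
```

The same weight matrices are reused in every cell, which mirrors the statement that the weights and biases are shared across the entire network while the intermediate variables differ from cell to cell.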
The intermediate variables can be interpreted as follows.
$o_{t,s}$ shows the proportion of potential traffic accidents that manifest as real traffic accidents. $i_{t,s}$ shows the proportion of land use properties that affects the traffic accident characteristics, and $f_{t,s,j}$ shows the proportion of surrounding traffic accident characteristics that has an impact on the traffic accident characteristics of the current cell. In the training process, the intermediate variables are not trained directly; the trainable parameters are the weights and biases. Through these parameters, every grid cell resolves its own values of the intermediate variables $S_{t,s}$, $i_{t,s}$, $f_{t,s,j}$ and $o_{t,s}$, where $W_i$ and $b_i$, $W_{f,j}$ and $b_{f,j}$, and $W_o$ and $b_o$ are the weight matrices and bias matrices of the intermediate variables.
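The symbols defined above are consistent with the usual LSTM gate equations. As a sketch under that assumption (the logistic sigmoid $\sigma$, the tanh form of the local state and the concatenation form of the integrated input are assumptions from the standard LSTM convention, not taken from the text), the intermediate variables could be written as:

```latex
\begin{aligned}
I_{t,s}   &= \left[\, x_{t,s};\ h_{t-1,s};\ h_{t,s-1} \,\right] \\
S_{t,s}   &= \tanh\!\left(W_C\, I_{t,s} + b_C\right) \\
i_{t,s}   &= \sigma\!\left(W_i\, I_{t,s} + b_i\right) \\
f_{t,s,j} &= \sigma\!\left(W_{f,j}\, I_{t,s} + b_{f,j}\right), \qquad j = 1, 2 \\
o_{t,s}   &= \sigma\!\left(W_o\, I_{t,s} + b_o\right)
\end{aligned}
```

Under this form each gate takes values between 0 and 1, which matches the interpretation of $i_{t,s}$, $f_{t,s,j}$ and $o_{t,s}$ as proportions.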

Discussion
The discussion section is organized as follows. Section 4.1 first presents the validation of the model in comparison with LSTM, RNN and BPNN. This demonstrates the effectiveness of the model and shows its advantages over other neural network structures. Section 4.2 interprets the state $C_{t,s}$ of each grid cell to show the characteristics of the traffic accident potential. The spatial aggregation of the traffic accident count, date, time, isolation and cross-sectional location is explained to discuss the accident potential. Section 4.3 explains the intermediate variables in detail in the urban safety context to reveal the factors influencing these characteristics of the traffic accident potential. For example, it can be concluded that in grid cells with a higher $o_{t,s}$, potential traffic accidents are more likely to occur. Section 4.4 summarizes the position of all grid cells, and some general rules are proposed based on interpretations of the weights and biases. The potential of the accident date is found to be largely influenced by the local indicators, whereas the potential of the cross-sectional location is found to be less influenced by the local land use properties.
Corresponding to the three levels, Section 4.2 focuses on the spatial distribution of the potential by explaining the accident potential. Section 4.3 focuses on an example grid cell by discussing the intermediate variables that influence the potential. Section 4.4 focuses on a general rule for the entire urban area through interpreting the weight matrix.

Validation of the MDLSTM Model
Before explaining the mechanism of the model, its accuracy and reliability are first tested in comparison with other neural network models. In this section, the backpropagation neural network (BPNN), recurrent neural network (RNN), long-short term memory neural network (LSTM) and multi-dimensional long-short term memory neural network (MDLSTM) are used to show the differences in modeling the land use properties and accident characteristics. Figure 9 shows the mean square error (MSE) of the MDLSTM, LSTM, RNN and BPNN models on the training dataset. In the MDLSTM model, a 3 × 3 window at the center of the 9 × 9 windows introduced in Section 3.1 is selected as the object for calculating the MSE. This greatly reduces the impact of window sampling on the accuracy of the model. The same windows are applied in the LSTM, RNN and BPNN models, so that the accuracy can be compared fairly. The results show that MDLSTM not only converges faster than the other three models on the training dataset, but also reaches a higher accuracy. To check whether the model overfits, the performance of MDLSTM, LSTM, RNN and BPNN on the testing dataset is also compared. MDLSTM performs better, as shown in Table 4.
Figure 9. Performance of MDLSTM, LSTM, RNN and BPNN on the training dataset. MDLSTM means the "multi-dimensional long-short term memory neural network". LSTM means the "long-short term memory neural network". RNN means the "recurrent neural network". BPNN means the "backpropagation neural network".
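The center-window MSE described above can be sketched as follows. The array layout (windows, rows, columns, features) and the function name `central_mse` are assumptions made for illustration.

```python
import numpy as np

def central_mse(pred, target, window=9, core=3):
    """MSE restricted to the core x core block at the center of each window.

    pred and target have shape (n_windows, window, window, n_features).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    start = (window - core) // 2
    stop = start + core
    err = pred[:, start:stop, start:stop, :] - target[:, start:stop, start:stop, :]
    return float((err ** 2).mean())

# Toy check: an error of 3 in the central cell, a large error on the border.
target = np.zeros((2, 9, 9, 5))
pred = np.zeros((2, 9, 9, 5))
pred[:, 4, 4, :] = 3.0     # inside the central 3 x 3 block
pred[:, 0, 0, :] = 100.0   # outside the block, ignored by the metric
score = central_mse(pred, target)
```

Restricting the metric to the center means that every scored cell has neighbors on all sides within the window, so cells at the window border, which lack part of their spatial context, do not distort the comparison.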

Characteristics of Traffic Accident Potential
The traffic accident potential has a spatial distribution that discloses essential information about where, and which kinds of, accidents could take place. According to the model structure, the potential $C_{t,s}$ of a grid cell is based on the input land use properties $x_{t,s}$, the surrounding accident characteristics $h_{t-1,s}$ and $h_{t,s-1}$, and the intermediate variables $S_{t,s}$, $i_{t,s}$, $f_{t,s,j}$ and $o_{t,s}$. The accident characteristics $h_{t,s}$ are determined by the potential $C_{t,s}$ and the intermediate variable $o_{t,s}$.
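To make the roles of these variables concrete, the sketch below implements one step of a standard two-dimensional LSTM cell. It assumes the usual formulation (a tanh candidate state, sigmoid gates, and an output of the form o · tanh(C)); the paper's own equations may differ in detail. The function name `mdlstm_cell`, the parameter dictionary, the choice to feed the neighboring outputs into every gate, and the mapping of $(t-1, s)$ to the north neighbor and $(t, s-1)$ to the west neighbor are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mdlstm_cell(x, h_north, h_west, c_north, c_west, params):
    """One step of a 2D LSTM cell (standard form, assumed here).

    x        : land use properties of the grid cell
    h_north  : accident characteristics of the (t-1, s) neighbor
    h_west   : accident characteristics of the (t, s-1) neighbor
    c_north, c_west : accident potentials of the same two neighbors
    """
    z = np.concatenate([x, h_north, h_west])              # shared input of all gates
    S = np.tanh(params["W_C"] @ z + params["b_C"])        # local state, in (-1, 1)
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # share of S that is kept
    f1 = sigmoid(params["W_f1"] @ z + params["b_f1"])     # transfer from the north
    f2 = sigmoid(params["W_f2"] @ z + params["b_f2"])     # transfer from the west
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # share of C that is realized
    C = i * S + f1 * c_north + f2 * c_west                # accident potential
    h = o * np.tanh(C)                                    # accident characteristics
    return h, C

# Hypothetical sizes: 6 land use properties, 5 accident characteristics.
n_in, n_out = 6, 5
rng = np.random.default_rng(0)
params = {}
for name in ("C", "i", "f1", "f2", "o"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(n_out, n_in + 2 * n_out))
    params[f"b_{name}"] = np.zeros(n_out)

h, C = mdlstm_cell(rng.random(n_in), np.zeros(n_out), np.zeros(n_out),
                   np.zeros(n_out), np.zeros(n_out), params)
print(h.shape, C.shape)
```

In this form the potential is the sum of a local term (i · S) and two transferred terms (f1 and f2 times the neighbor potentials), which is the split between local and spatial effects that the interpretation in the following sections relies on.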
In Figure 10, the two axes (length and width) indicate the spatial location of the grid cell, and the vertical axis shows the size of the hidden danger for each traffic accident characteristic. For example, the higher black points are grid cells with high-quality isolation, such as center isolation.
As for the accident count, the area where accident potential gathers is restricted to a certain region because of the training input and output data. However, compared with the distribution of the observed accidents, this area is much larger (see Figure 3). In Figure 11a, the accident count potential is concentrated at several locations within the whole traffic accident potential area. Apart from the scattered points, the horizontal line at the 64th row corresponds to "Hunnan middle road", where hidden dangers of traffic accidents concentrate.
As for the accident date shown in Figure 11b, the dividing line between values lower than 10 and higher than 10 is at a similar position to the 64th row, which means that accidents are more likely to take place closer to winter on the north side of the line, and less likely on the south side. In addition, the trend shows that traffic accidents in the southeast of the urban center are more likely to occur in winter, so specific measures should be taken there.
As for the accident time, shown in Figure 12a, 15:00 is found to be the period of high accident potential (see Section 3.1.2), except in regions close to the dividing line. This indicates comparatively more accidents in the daytime. The northern and western parts of the urban area are also more dangerous at times. Since the isolation is decided by the existing facilities, the results in Figure 12b only show the distribution of the facilities, such as the isolation form of each road.

The Impact of Land Use Properties and Spatial Effect on the Traffic Accident
For example, the first cell in this table, 0.3, means that 30% of the accident potential caused by the land use properties joins the potential calculation and is transferred to the final number of accidents. The value 0.12 in the second row of the first column shows that every 1-unit change in land use properties causes a 0.12-unit change in land use potential, regardless of $i_{t,s}$. The third and fourth values in the first column, 0.74 and 0.05, show that 74% and 5% of the accident potential can be transferred from the north and west neighboring grid cells, respectively. The value 0.6 in the last row of the first column shows that at least 60% of the accident count potential will take place in reality. Since these variables come from either the σ function (1st, 3rd, 4th and 5th rows) or the tanh function (2nd row), the 2nd row has both negative and positive values, while the other rows lie between 0 and 1. For the grid cell (50, 40), among all accident characteristics, the accident date has the highest proportion (0.77) of accident potential depending on the land use properties. In contrast, the accident date is negatively affected by the land use properties (−0.27). The highest conversion rate (0.97) is from land use properties to the potential of accident time. In terms of direction, $f_{t,s,j}$ is clearly much larger in the first dimension than in the second. This may reflect the road form, since this grid cell is near a high-level vertical road.
In addition, $o_{t,s}$ shows that about 60% of the traffic accident potential on the plot will result in accidents. The date and time of the accident are relatively closer to winter and night (0.56 and 0.62, where 0 is the closest and 1 is the farthest).
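As a worked illustration of how the first-column values for the accident count combine, assume the standard multi-dimensional LSTM update in which the local and transferred contributions add. The exact form of the authors' Equation (2) is not reproduced in this section, so the expression below is an assumption, as is the mapping of $(t-1, s)$ to the north neighbor and $(t, s-1)$ to the west neighbor. The neighbor potentials $C_{t-1,s}$ and $C_{t,s-1}$ are left symbolic because their values are not reported here.

```latex
\begin{aligned}
C_{t,s} &= i_{t,s}\, S_{t,s} + f_{t,s,1}\, C_{t-1,s} + f_{t,s,2}\, C_{t,s-1} \\
        &= 0.3 \times 0.12 + 0.74\, C_{t-1,s} + 0.05\, C_{t,s-1} \\
        &= 0.036 + 0.74\, C_{t-1,s} + 0.05\, C_{t,s-1}
\end{aligned}
```

Under this reading, the local land use term contributes only 0.036, so the accident count potential of grid cell (50, 40) is dominated by what is transferred from the north neighbor.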
The regional regularity is based on the 30 usable windows behind the grid cell (50, 40), as shown in Figure 13.

General Rules Based on the Interpretation of the Weight Matrix
As discussed in Section 3.3, the weight matrix shows the basic rule that traffic accidents obey. By sorting out and summarizing the relationship of each grid cell among the land use properties, intermediate variables, traffic accident potential and traffic accident characteristics, a general rule can be devised.

Relationship between Land Use Properties $x_{t,s}$ and Accident Potential $S_{t,s}$
$W_C$ plays an essential role in the model, since it transforms a grid cell's land use properties into the accident potential. Meanwhile, $W_C$ itself is also generated based on the land use properties. In this section, the elements of $W_C$ are first explored and interpreted to show the basic relationship between land use and traffic accident potential. Table 6 shows $W_C$ after training.
Negative values in $W_C$ suggest that land use properties contribute negatively to the traffic accident potential. For example, the most negative effect is the impact of the number of surrounding road sections on the accident cross-section location (−0.57), which indicates that higher accessibility may lead to a higher possibility of traffic accidents in non-motorized lanes than in motorized lanes.
Positive values suggest that the land use properties have a positive impact on the accident characteristics. For example, the most positive effect is that of the congestion ratio on the accident date (2.11), which indicates that more congested areas may see accidents occur further away from winter. Comparing the numbers in the "Sum" row indicates the relative impact of all the chosen land use properties on the accident potential. The accident date is found to be largely influenced by the land use properties (2.93), which shows that the dates of accident potential vary greatly between regions and that more targeted measures are needed in seasonal control. Every type of land use property has a positive impact on the accident date, except for the distance to CBD. In contrast, the accident cross-sectional location is negatively affected by the land use properties (−0.59).
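The reading procedure used for the weight matrices (locate the most negative and most positive entries, then compare the "Sum" row) can be sketched as below. The matrix values are placeholders invented for the example and are not the values of Table 6; the function name `summarize_weights`, the shortened labels and the row and column order are likewise assumptions.

```python
import numpy as np

# Rows: land use properties; columns: accident characteristics (order assumed).
PROPERTIES = ["plot ratio", "POI types", "centrality",
              "distance to CBD", "road sections", "congestion ratio"]
CHARACTERISTICS = ["count", "date", "time", "isolation", "cross-section"]

def summarize_weights(W, rows, cols):
    """Return the extreme entries of W and its column sums (the 'Sum' row)."""
    W = np.asarray(W, dtype=float)
    r_min, c_min = np.unravel_index(np.argmin(W), W.shape)
    r_max, c_max = np.unravel_index(np.argmax(W), W.shape)
    return {
        "most_negative": (rows[r_min], cols[c_min], float(W[r_min, c_min])),
        "most_positive": (rows[r_max], cols[c_max], float(W[r_max, c_max])),
        "column_sums": {c: float(v) for c, v in zip(cols, W.sum(axis=0))},
    }

# Placeholder weights, NOT the trained matrix reported in the paper.
W_demo = [
    [ 0.1,  0.2,  0.0,  0.1,  0.0],
    [ 0.0,  0.3, -0.1,  0.2,  0.1],
    [ 0.2,  0.1,  0.1,  0.0, -0.2],
    [-0.1, -0.2,  0.0,  0.1,  0.1],
    [ 0.1,  0.4,  0.2, -0.3, -0.5],
    [-0.3,  1.5,  0.1,  0.2,  0.0],
]

summary = summarize_weights(W_demo, PROPERTIES, CHARACTERISTICS)
print(summary["most_negative"])
print(summary["most_positive"])
print(summary["column_sums"])
```

The column sums treat each land use property as contributing equally per unit, which is only meaningful if the inputs were normalized to comparable scales before training.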

Accident Potential $C_{t,s}$ Based on the Local One $S_{t,s}$
The gate $i_{t,s}$ gives the proportion of the local state $S_{t,s}$ that influences the accident potential $C_{t,s}$. This $i_{t,s}$ is in turn generated from the input weight $W_i$. Therefore, the elements of $W_i$ reflect the impact of land use properties on the proportion of hidden traffic accidents caused by local land use properties. Table 7 shows $W_i$ after training.
In $W_i$, unlike in $W_C$, negative values suggest that the land use properties make a lower contribution to the traffic accident potential. For example, the most negative effect is the impact of the congestion ratio on the accident count (−1.34), which indicates that the higher the level of congestion, the lower the number of accidents affected by local land use properties. Positive values, such as that of the congestion ratio on the accident date (3.24), indicate that traffic accidents in more congested areas may occur further away from winter.
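The effect of a negative entry in $W_i$ on the gate can be illustrated with a small sketch. It assumes that the gate is a sigmoid of a weighted sum of the land use properties and, purely for illustration, sets every weight except the quoted congestion-ratio weight (−1.34) to zero and ignores the bias and the neighboring inputs. The function name `input_gate` and the input scaling (congestion ratio between 0 and 1) are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gate(x, w, b=0.0):
    """Proportion of the local state that enters the potential, in (0, 1)."""
    return float(sigmoid(np.dot(w, x) + b))

# Six land use properties; index 5 is taken to be the congestion ratio.
w_count = np.zeros(6)
w_count[5] = -1.34          # congestion ratio -> accident count, as quoted in the text

x_low = np.zeros(6)         # hypothetical cell with no congestion
x_high = np.zeros(6)
x_high[5] = 1.0             # hypothetical cell with maximal congestion

gate_low = input_gate(x_low, w_count)
gate_high = input_gate(x_high, w_count)
print(gate_low, gate_high)
```

With these simplifications the gate falls from 0.5 to a smaller value as congestion rises, so a smaller share of the local state reaches the accident count potential, which matches the interpretation given for the negative weight.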
Comparing the numbers in the "Sum" row indicates the relative impact of all the chosen land use properties on the accident potential. In accordance with Equation (2), the impact of $W_i$ is very similar to that of $W_C$, since the accident potential is affected by the product of $i_{t,s}$ and $S_{t,s}$.
As discussed in Section 3.3, $W_{f,j}$ represents the impact of land use properties on the accident potential transferred from the neighboring grid cells. This is the main step for considering spatial effects in the model. Since the unit values of the spatial effect in the east-west and north-south directions are assumed to be equivalent, the focus is placed on $W_{f,1\&2}$, the sum of $W_{f,1}$ and $W_{f,2}$. Table 8 shows the first part of the transfer weight matrix $W_{f,1\&2}$ after training. In $W_{f,1\&2}$, negative values suggest that the land use properties make a lower contribution to the transfer ratio of traffic accident potential. For example, the most negative effect is the impact of the congestion ratio on the accident count (−1.55), which indicates that the higher the level of congestion, the lower the number of accidents affected by the surrounding accident potential. Positive values, such as that of the congestion ratio on the accident isolation (4.15), indicate that the isolation form is affected more by the neighboring grid cells.
Moreover, comparing the numbers in the "Sum" row leads to a similar conclusion: the potential of accident count is less influenced by the neighboring accident potential. The value 3.04, corresponding to the accident isolation, shows that the isolation form of every grid cell is greatly influenced by the spatial effect.

Proportion of Accident Potential $C_{t,s}$ That Leads to an Accident $h_{t,s}$
The gate $o_{t,s}$ represents the proportion of accident potential, $C_{t,s}$, that leads to an accident, $h_{t,s}$. $W_o$, as the corresponding weight, represents the impact of land use properties on $o_{t,s}$. A higher $o_{t,s}$ means that a larger proportion of the accident potential results in traffic accidents, which also reflects the factors in the local area that help to avoid traffic accidents. Table 9 shows the first part of the input weight matrix $W_o$ after training. In $W_o$, negative values again suggest that the land use properties make a lower contribution to $o_{t,s}$, the proportion of traffic accident potential that eventually occurs.
For example, the most negative effect is the impact of the distance to CBD on accident isolation (−1.07), which indicates that the farther a cell is from the CBD, the less likely the hidden isolation-related potential is to result in accidents. Positive values, such as that of the distance to CBD on the accident cross-section location (0.64), indicate that the farther a cell is from the CBD, the more significant the cross-sectional location becomes as a factor in turning accident potential into accidents.
Moreover, comparing the numbers in the "Sum" row shows that the accident time deviates further from its potential as the land use properties increase. The accident count also becomes more predictable and explainable by the traffic accident potential as the land use properties increase.

Conclusions
This study focuses on the interpretation and application of the multi-dimensional long-short term memory neural network model (MDLSTM) in modelling the relationship between traffic accidents and selected land use properties.
The idea is to divide the influencing factors of traffic accident into two categories: a spatial category and a local category. The local category considers land use properties, which include the plot ratio, number of types of POIs, centrality, distance to CBD, number of surrounding road sections and congestion ratio. Other parameters are considered in the spatial category.
Some interesting insights are found.
(1) The spatial distribution of accident potential reveals a dividing line, on either side of which the accident potential differs significantly.
(2) The spatial effect differs strongly between the north-south and west-east directions, especially for characteristics related to physical infrastructure, such as the isolation form.
(3) The potential of accident date is found to be largely influenced by the local indicators, and the potential of cross-sectional location is found to be less affected by the local land use properties. The potential of isolation form is highly spatially correlated, while the accident count is not. As for the proportion of accident potential that causes real-life traffic accidents, the accident count shows better interpretability, while higher land use property values lead to lower accuracy in accident time prediction.
Based on the findings above, several recommendations can be made to urban managers and researchers. Predicting the location of traffic accidents is a practical problem for urban managers, especially those in Shenyang. The results show that "Hunnan middle road" is an essential accident potential hotspot. They also illustrate the form of the potential, which further shows that accidents might be a critical problem in some regions near the city center. At the level of the whole urban area, attention needs to be paid to accidents in non-motorized lanes, especially in suburban areas with simple isolation facilities. In addition, traffic accidents around congested areas are also important, since congestion is positively correlated with the plot ratio. Winter accidents may occur far away from the city center. Therefore, targeted measures are needed in seasonal accident control.
The innovations of this paper are:
1. Multiple local and surrounding influence factors are considered, and an appropriate model is used to capture their influence. The model separates spatial influence factors from local influence factors, which greatly improves the interpretability of traffic accident analysis models.
2. The multi-dimensional long-short term memory neural network (MDLSTM) model is used to explore the relationship between input and output, with higher accuracy and computational efficiency.
3. An interpretation of the relationship between land use properties and traffic accidents is proposed, using a three-level explanation method. The hidden factor, accident potential, is identified, containing the local and spatial effects. Finally, the general rules linking land use properties to the traffic accident characteristics are interpreted in detail to provide guidance for policy making.