GIS-based spatial modelling of COVID-19 death incidence in São Paulo, Brazil

To understand the socio-spatial behaviour of the COVID-19 virus in the most impacted area in Brazil, we analysed five spatial regression models to assess the disease distribution in the affected territory. Results obtained using the Spearman correlation test provided evidence for the correlation between COVID-19 death incidence and social aspects such as population density, average people per household, and informal urban settlements. More importantly, all analysed models, using four selected explanatory variables, explained at least 85 per cent of the variance in reported deaths at the district level. Overall, our results demonstrate that the geographically weighted regression (GWR) model best explains the spatial distribution of COVID-19 in the city of São Paulo, highlighting the spatial aspects of the data. Spatial analysis has shown the spread of COVID-19 in areas with highly vulnerable populations. Our findings corroborate reports from the recent literature, pointing out the need for special attention in peripheral areas and informal settlements.


I. INTRODUCTION
The World Health Organization (WHO) has characterized the outbreak of the new coronavirus as a pandemic, (1) and as of 22 June 2020, SARS-CoV-2 had infected more than 9.1 million people worldwide, causing more than 473,000 deaths. (2) Brazil ranked second worldwide in both total COVID-19 cases and total deaths in June. São Paulo State accounted for more than 20 per cent of total confirmed cases in Brazil, and the city of São Paulo for almost 50 per cent of total confirmed cases in São Paulo State. (3) Understanding the incidence of COVID-19 is critical for effectively controlling transmission. Therefore, the aim of this study was to conduct a GIS-based spatial modelling analysis of COVID-19 in the city of São Paulo, considering both environmental and social factors.
Spatial regression models have recently been reported as a useful approach for assessing the spatial distribution of SARS-CoV-2. (4) While local models represent the spatial relationship among the variables, global models offer the advantage of a simple application for scenario assessment. In the global models, the relationships between explanatory and dependent variables are assumed to be constant across space. (5) Ordinary least squares (OLS) is a regression model that evaluates the relationship between explanatory and dependent variables and does not consider the spatial dependency among observations. The spatial lag model (SLM) incorporates spatial dependency into the OLS by adding a spatially lagged dependent variable. The spatial error model (SEM) assumes spatial dependency in the error term. (6) In the local models, the geographical context is considered, with specific parameters estimated for each location. (7) Geographically weighted regression (GWR) considers the spatial heterogeneity of the data. Multiscale GWR (MGWR) additionally allows the studied relationships to vary at different spatial scales. (8)
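For orientation, the five model families described above are commonly written as follows. This is a sketch in standard textbook notation rather than the notation of the cited sources. The symbols are assumptions introduced here: $y$ is the vector of district-level deaths, $X$ the matrix of explanatory variables, $W$ a row-standardized spatial weights matrix, $\rho$ and $\lambda$ the spatial autoregressive coefficients, $(u_i, v_i)$ the coordinates of district $i$, and $bw_k$ the bandwidth associated with variable $k$.

```latex
\begin{align}
  \text{OLS:}  \quad & y = X\beta + \varepsilon \\
  \text{SLM:}  \quad & y = \rho W y + X\beta + \varepsilon \\
  \text{SEM:}  \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
  \text{GWR:}  \quad & y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \\
  \text{MGWR:} \quad & y_i = \beta_{bw_0}(u_i, v_i) + \sum_{k} \beta_{bw_k}(u_i, v_i)\, x_{ik} + \varepsilon_i
\end{align}
```

In the global models a single $\beta$ applies to every district, whereas in GWR and MGWR each district receives its own coefficient estimates. In GWR all coefficients share one bandwidth, while in MGWR each variable may have its own.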

II. BACKGROUND
As of 22 June 2020, Brazil had 1,106,470 confirmed COVID-19 cases and 51,271 deaths, and it still ranked second worldwide on both counts. (9) São Paulo is the largest city in Brazil, and also the largest city in Latin America, with a population of 12.3 million people. (10) It plays an important role in the industrial and service sectors. (11) There are important socioeconomic and demographic differences among the 96 districts of this mega-city (Map S1 and Table S1 in the online supplementary information). The central-western region is home to an elderly population with higher incomes per capita, while a younger population with a higher unemployment rate lives in the peripheral areas. (12) The urban inequalities between the centre and the peripheral areas of São Paulo have been documented for over 20 years, (13) and still constitute a problem to overcome. Budds and Teixeira indicated in 2005 that one-third of the inhabitants in the city of São Paulo lived in substandard settlements. (14) Recently, Wilkinson has suggested that the population living in these types of settlements is more susceptible to COVID-19, because social isolation cannot be practised easily in the living environments that tend to accompany poor economic conditions. (15) There are different types of informal settlements in São Paulo, with distinct features. (16) A considerable number of informal settlements still present conditions of extreme deprivation, although infrastructure improved during the 1990s and 2000s. (17) It is important to note that a centre-periphery model is not capable of explaining the complexity of peripheral areas, as there are spatial discontinuities caused by sub-centres, i.e. heterogeneous urban spaces constituting a sort of patchwork. (18) This demonstrates the need for models that identify the relationship between socioeconomic spatial differences and the spread of diseases.
[...] of the city of São Paulo has been publishing fortnightly bulletins reporting the number of COVID-19 cases by district, as well as the number of deaths (both confirmed and suspected) due to COVID-19. At first glance, the data show that the number of cases is higher in the central area, while the number of deaths is higher in the peripheral area, which suggests that COVID-19 cases are underreported in the peripheral area.
In general, in São Paulo, as in Brazil more generally, COVID-19 testing has been conducted primarily among hospitalized people, (21) although wealthy people may also pay for precautionary testing. (22) Therefore, the numbers of confirmed and suspected deaths from COVID-19 seem to better represent the situation in the city of São Paulo.

III. MATERIALS AND METHODS
Drawing on data made available by São Paulo State (23) and on a digital map of the city of São Paulo, (24) we compiled a list of 18 demographic and socio-environmental explanatory variables. The following variables were considered: population (people); population density (people per square kilometre); population 60 years and older (people); population density of those 60 years and older (people per square kilometre); number of households per district; average number of people per household; number of informal urban settlements per district; illiteracy (%); graduates of higher education (%); average monthly income (R$ per capita); access to potable water (%); access to both a toilet and potable water (%); population living in a household with two or more people per bedroom (%); population living in an urban household with solid waste collection (%); population living in a household with access to electricity (%); population living in a household with inadequate sanitation (%); population living in substandard settlements (walls not made of brick or wood) (%); and the Municipal Human Development Index score. (25) Test results for COVID-19 incidence fail to accurately represent the actual distribution in peripheral areas because access to diagnostic tests is more challenging there. (26) This makes it difficult to compare rates in different parts of the city. Therefore, in an attempt to reduce this discrepancy, this study focused instead on the numbers of both confirmed and suspected deaths from COVID-19, which were analysed from 8 March to 18 June 2020. (27) The Spearman correlation test at the 5 per cent significance level (α = 0.05) was used to assess the correlation between cumulative deaths (confirmed and suspected) and socio-environmental variables.
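The Spearman coefficient used above is the Pearson correlation computed on ranks. The following is a minimal sketch in pure Python, not the authors' implementation. The district values and the names `density` and `deaths` are hypothetical, and the significance test against α = 0.05 is omitted.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for six districts (not data from the study).
density = [1200.0, 8500.0, 4300.0, 15000.0, 9800.0, 2100.0]
deaths = [12, 75, 40, 150, 60, 30]
rho = spearman_rho(density, deaths)
```

Because only ranks enter the calculation, the coefficient captures monotonic associations and is insensitive to the very different units of the 18 candidate variables.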
To understand the socio-spatial behaviour of COVID-19 in the most impacted area in Brazil, we analysed five spatial regression (28) models to assess the disease distribution in the affected territory. These comprised three global models, namely ordinary least squares (OLS), the spatial error model (SEM) and the spatial lag model (SLM), and two local models, namely geographically weighted regression (GWR) and multiscale geographically weighted regression (MGWR). (29) The stepwise forward procedure (30) and the Spearman correlation test were used to determine the most relevant explanatory variables for the regression models. The software programs GeoDa 1.14 and MGWR 2.2 were used to run the models, and the software program [...]
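A forward stepwise procedure can be sketched as a greedy loop that adds one explanatory variable at a time. The version below is illustrative only. The text does not state the entry criterion, so the use of adjusted R² with a minimum gain (`min_gain`) is an assumption, and the variable names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def adjusted_r2(columns, y):
    """Fit OLS with an intercept on the given predictor columns; return adjusted R²."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (rss / (n - k - 1)) / (tss / (n - 1))

def forward_select(candidates, y, min_gain=0.01):
    """Greedily add the variable that most improves adjusted R²; stop on small gains."""
    selected, best = [], float("-inf")
    remaining = dict(candidates)
    while remaining:
        scores = {
            name: adjusted_r2([candidates[s] for s in selected] + [col], y)
            for name, col in remaining.items()
        }
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break
        selected.append(name)
        best = scores[name]
        del remaining[name]
    return selected, best

# Hypothetical data for eight districts: deaths depend on two of the three variables.
candidates = {
    "elderly": [1, 2, 3, 4, 5, 6, 7, 8],
    "density": [2, 1, 4, 3, 6, 5, 8, 7],
    "unrelated": [5, 3, 6, 2, 7, 1, 8, 4],
}
deaths = [3 * e + 2 * d for e, d in zip(candidates["elderly"], candidates["density"])]
selected, score = forward_select(candidates, deaths)
```

In this toy example the loop picks `elderly` first, then `density`, and stops before adding the unrelated variable because the gain in adjusted R² falls below the threshold.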

IV. RESULTS
Using the Spearman correlation test, significant positive correlations were found between deaths (confirmed and suspected as being due to COVID-19) and population (total and 60 years and older), population density (people per square kilometre), number of households, average people per household and informal urban settlements (Table 1). Using both the stepwise forward procedure and the Spearman correlation test, the following explanatory variables were selected as the most relevant for the regression models: population 60 years and older (people), population density (people per square kilometre), average people per household (people), and Municipal Human Development Index (MHDI), all at the district level.
Table notes: R$: reais, Brazilian currency. US$ 1 = R$ 5.32, as of 10 September 2020. *Less than 10⁻⁶.
In the OLS, SLM and SEM regression models, the selected explanatory variables were positively associated with COVID-19 deaths (confirmed and suspected) (p-value < 0.01), except for the MHDI, which indicates a weak relation between this index and the dependent variable. Because spatial dependence is incorporated into the SLM and SEM models, these regression models performed slightly better than the OLS model (Table 2). Summary statistics of the three global models are presented in Tables S2 and S3 in the online supplement. The (M)GWR models fitted the data better, as their AICc values were considerably lower than the values obtained for the global models (Table 2).
Overall, the adjusted R² values indicate that the selected explanatory variables explain between 85.7 per cent and 90.0 per cent of the variance in COVID-19 death incidence (confirmed and suspected).
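The two fit criteria used for the comparison can be sketched as below. The values n = 96 districts and k = 4 explanatory variables come from the text, while the residual and total sums of squares are hypothetical. The AICc shown is the generic Gaussian least-squares form, up to an additive constant. This is an assumption: GWR software typically reports a variant based on the effective number of parameters, so these values are not expected to match Table 2.

```python
from math import log

def adjusted_r2(rss, tss, n, k):
    """Adjusted R² for a model with k explanatory variables plus an intercept."""
    return 1 - (rss / (n - k - 1)) / (tss / (n - 1))

def aicc(rss, n, p):
    """Small-sample corrected AIC for a least-squares fit with p estimated parameters."""
    aic = n * log(rss / n) + 2 * p
    return aic + (2 * p * (p + 1)) / (n - p - 1)

n, k = 96, 4              # districts and explanatory variables, as in the text
rss, tss = 10.0, 100.0    # hypothetical sums of squares (plain R² = 0.90)

adj = adjusted_r2(rss, tss, n, k)
score = aicc(rss, n, p=k + 1)  # four slopes plus the intercept (counting convention assumed)
```

Lower AICc indicates a better trade-off between fit and complexity. Adjusted R² is always at most the plain R², because it penalises each additional explanatory variable.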
Among the global models, judging by both the R² and the AICc, the SEM model fitted the analysed data best (Table 2). A map of reported COVID-19 cases, SEM-predicted COVID-19 cases, and residuals (observed minus predicted) is presented in Maps 1A-1C. The highest residuals were: i) positive in the districts with more reported deaths and in districts in the historic centre; and ii) negative in districts with low death incidence compared to their neighbours. Moreover, using the overlay procedure to show the locations of informal urban settlements, we observed that the districts with the most deaths are those with higher numbers of informal urban settlements.
The models based on geographically weighted regression (GWR) provide a spatial evaluation of distinct variables, as the spatial heterogeneity of the process is considered, (32) unlike in the global models. Although the multiscale variation (M) of the GWR model is expected to display better outputs, (33) our results show comparable performances of the GWR and MGWR models. Because the GWR is more suitable for a detailed assessment at the district scale, the GWR model was used to analyse the selected variables influencing the spread of COVID-19 in São Paulo. Spatial variations of the selected variables can be detected by analysing their coefficients in the GWR model (Maps 2A-2D).
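The idea of mapping local coefficients can be sketched with a deliberately small GWR: one explanatory variable, an intercept, and a weighted least-squares fit at each location. The Gaussian kernel with a fixed bandwidth is an assumption made for illustration, since the text does not state the kernel or bandwidth used. The coordinates and values are hypothetical.

```python
from math import exp, hypot

def gaussian_weights(coords, focal, bandwidth):
    """Weight each observation by its distance to the focal point (Gaussian kernel)."""
    return [
        exp(-0.5 * (hypot(cx - focal[0], cy - focal[1]) / bandwidth) ** 2)
        for cx, cy in coords
    ]

def local_fit(x, y, weights):
    """Weighted least squares for y = intercept + slope * x; returns (intercept, slope)."""
    sw = sum(weights)
    xw = sum(w * a for w, a in zip(weights, x)) / sw
    yw = sum(w * b for w, b in zip(weights, y)) / sw
    sxy = sum(w * (a - xw) * (b - yw) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - xw) ** 2 for w, a in zip(weights, x))
    slope = sxy / sxx
    return yw - slope * xw, slope

def gwr(coords, x, y, bandwidth):
    """Estimate one (intercept, slope) pair per location."""
    return [local_fit(x, y, gaussian_weights(coords, c, bandwidth)) for c in coords]

# Hypothetical districts: the effect of x on y is 1 in the west and 3 in the east.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1, 2, 3, 4, 3, 6, 9, 12]

local = gwr(coords, x, y, bandwidth=1.5)
slope_west, slope_east = local[0][1], local[-1][1]
```

A global model would return a single slope for all eight locations. The local fits recover the two regimes, which is the kind of spatial variation the coefficient maps are meant to reveal.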
In the northwestern region of São Paulo, the population density, average people per household and MHDI variables represent the number of deaths well, while in the southwestern region of the city, the number of people aged 60 years and older proved to influence the number of deaths in densely populated districts with numerous informal urban settlements. In the central area of the city, the MHDI is inversely related to the number of deaths.

V. DISCUSSION
MAP 2. The effects of people aged 60 years and older (2A), population density (2B), average people per household (2C) and MHDI (2D) in describing death incidence by COVID-19 in São Paulo using the GWR model

Our results have demonstrated that the GWR model best explained the spatial distribution of COVID-19 in the city of São Paulo, highlighting the spatial aspects of the data. Mollalo et al. selected the MGWR as the best model for data from the United States (R² = 0.68). (34) Because spatial heterogeneity is considered in the local models in order to evaluate the influence of each variable on the final output, a more detailed analysis is possible. Hence, we suggest a preliminary analysis of the models when feasible, or the combined use of global and local models, to better understand the spread of COVID-19 and other infectious diseases in large cities. Analysing the number of deaths (confirmed and suspected) is imperative to understand the spread of COVID-19 in the city of São Paulo. In Brazil, even in wealthier cities, the lack of testing for COVID-19 has been an issue, (35) and diagnosis may be especially difficult among disadvantaged people. (36) For example, the district with the highest number of confirmed cases on 30 April 2020 had 200 cases and 11 deaths, while the district with the highest number of deaths had 103 deaths and 130 confirmed cases. This discrepancy probably arises because high-income residents have access to paid tests, and it indicates underreporting of cases in peripheral areas. Furthermore, in Brazil, the vulnerability of the health system is also a reason for concern. Hospital beds, for example, are inefficiently distributed, (37) and support measures for vulnerable populations are inadequate. It is also important to emphasize the effectiveness of social isolation measures in the metropolitan area of São Paulo as a means of controlling the load of COVID-19 on the healthcare system, especially the public system. One recent study reported that the social isolation rate (38) was negatively correlated with cumulative confirmed COVID-19 cases. (39) The studied models fit the real data on confirmed and suspected deaths due to COVID-19 in the city of São Paulo very well (R² greater than 0.85).
However, death incidence per 1,000 people is high in some districts in the historic centre (Table S1 in the online supplement), although the number of deaths is lower than in other districts. The GWR model explicitly shows this discrepancy, which is also evidenced by the MHDI's influence (Map 2). Households known as cortiços (40) in the historic centre of the city of São Paulo present significant differences compared to nearby districts because of their precarious conditions and high social vulnerability. (41) Results obtained using the Spearman correlation test have made evident the correlation between COVID-19 death incidence and social aspects such as population density, average people per household, and informal urban settlements. Spatial analysis has shown the spread of COVID-19 in areas with highly vulnerable populations (Map 1). Recently, Corburn et al. and Wilkinson have reported that social distancing and self-quarantine may be impractical for vulnerable populations because of the following factors: i) high numbers of informal workers, leading to financial vulnerability; ii) space constraints; iii) inadequate access to water and sanitation; and iv) a lack of secure and adequate housing. (42) Furthermore, it is worth mentioning a cultural feature that may also influence adherence to social distancing: the Brazilian jeitinho, described as a social mechanism of breaking rules to deal with unexpected situations. (43) Therefore, governmental (44) and social (45) initiatives (46) have sought to communicate better with the population in São Paulo as a means of raising awareness of the importance of social isolation measures and thus increasing adherence to social distancing. Our results are consistent with vulnerability aspects reported by Wilkinson, notably population density and household conditions. (47) The number of people aged 60 years and older has proven to be a determining variable influencing the spread of COVID-19, corroborating data presented by [...] Even though informal settlements in São Paulo present a variety of conditions, the average number of people per household is similar to that in formal households, while the population density in areas of informal settlements is two to five times higher than in areas of formal households. (49)

VI. CONCLUSIONS
In this study, we have used spatial modelling to assess the spread of COVID-19 in a metropolitan area in the global South. All analysed spatial regression models, using four selected explanatory variables, explained at least 85 per cent of the variance in reported deaths at the district level. Overall, the local GWR model has shown the best fit to the analysed data (R² = 0.900, AICc = 78.76). More importantly, our study has revealed a significant correlation between COVID-19 death incidence and social aspects such as population density, people aged 60 years and older, and informal urban settlements. Our findings have shown that the high population density found in informal settlements in the city of São Paulo is a determining factor influencing the spread of COVID-19, possibly because people living in socioeconomic vulnerability may not be able to adhere to social distancing measures. Our study corroborates reports from the recent literature, pointing to the need for special attention in peripheral areas and informal settlements.

SUPPLEMENTAL MATERIAL
Supplemental material for this article is available online.