Chapter 7 Global and local spatial autocorrelation
This session we begin to explore global and local spatial autocorrelation statistics. Spatial autocorrelation is correlation among data values that is attributable strictly to the proximity of the objects the data refer to. Remember in the earlier weeks when we spoke about Tobler’s first law of geography - “everything is related to everything else, but near things are more related than distant things”? Spatial autocorrelation is the measure of this correlation between near things. If correlation is not a familiar term, there is a recommended reading for you on Canvas to refresh your memory, under ‘preparatory material’ for this week.
We’ll be making use of the following packages:
dplyr, janitor, sf, spdep, tmap and readxl
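If they are not already attached, you can load them at the start of your session (this assumes they are installed; use install.packages() first if not):

```r
# Load the packages used in this chapter (install first with
# install.packages() if you don't already have them)
library(dplyr)    # data manipulation verbs
library(janitor)  # clean_names() for tidy column names
library(sf)       # reading and manipulating simple features (spatial) data
library(spdep)    # neighbour lists, spatial weights and Moran's I
library(tmap)     # thematic mapping
library(readxl)   # reading Excel files
```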
7.1 Get data
So, let’s start by getting some data. We are going to take some of the data from past weeks. In getting the data ready you will have one more opportunity to practice how to read data into R but also how to perform some basic spatial checks, transformations and operations. It may seem like you have already done some of this stuff. But that’s the point: to practice. The more you do this stuff, the easier it will be and eventually things will click and become second nature.
So let’s go. First let’s get the LSOA boundary data.
#The following assumes you have a subdirectory called BoundaryData in your data folder; if not, change the file path to wherever you store your LSOA shapefile
# import data
manchester_lsoa <- st_read("data/BoundaryData/england_lsoa_2021.shp")
Now check the coordinate system.
## Coordinate Reference System:
## User input: OSGB36 / British National Grid
## wkt:
## PROJCRS["OSGB36 / British National Grid",
## BASEGEOGCRS["OSGB36",
## DATUM["Ordnance Survey of Great Britain 1936",
## ELLIPSOID["Airy 1830",6377563.396,299.3249646,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4277]],
## CONVERSION["British National Grid",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",49,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",-2,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996012717,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",400000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",-100000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["Engineering survey, topographic mapping."],
## AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
## BBOX[49.75,-9.01,61.01,2.01]],
## ID["EPSG",27700]]
You can see we are in projected coordinates of British National Grid.
Now let’s add the burglary data from Greater Manchester. We have practiced this code in previous sessions so we won’t go over it in detail again, but try to remember and understand what each line of code is doing rather than blindly cutting and pasting. If you don’t understand what any of these lines of code is doing, raise your hand to call us over to help in the lab.
# step1: import excel file with read_xlsx()
crimes <- read_xlsx("data/gmp_crimes_2021.xlsx") %>% clean_names()
#step2: filter out to select burglary
burglary <- filter(crimes, crime_type == "Burglary")
#step3: transform into spatial object
burglary_spatial = st_as_sf(burglary, coords = c("longitude", "latitude"),
crs = 4326)
#step4: reproject to BNG
burglary_spatial <- st_transform(burglary_spatial, st_crs(27700))
#step5: select burglaries that intersect with the Manchester city LSOA map - i.e. they fall within Manchester borough.
bur_mc_intersects <- st_intersects(manchester_lsoa, burglary_spatial)
bur_mc <- burglary_spatial[unlist(bur_mc_intersects),]
#step6: check results
plot(st_geometry(bur_mc))
We now have the burglaries which took place in Manchester borough. Let’s now count how many burglaries there are within each LSOA. This is a point in polygon operation that we covered in week 4, for example when we counted the number of violent crimes in the buffers around licensed premises. If the code or the notion does not make much sense to you, make sure you review the relevant session from week 4 and ask us about it in the lab, or ask for it to be covered again in the feedback session.
#step7: point in polygon spatial operation (be patient, this can take time)
burglaries_per_lsoa <- bur_mc %>%
st_join(manchester_lsoa, ., left = FALSE) %>%
count(lsoa21cd)
#step8: let's rename the column with the count of burglaries (n) into something more meaningful
burglaries_per_lsoa <- rename(burglaries_per_lsoa, burglary = n)
# Now there are two LSOAs that have gone missing from the original file. This is because they had no burglaries
# Let's add them back with a value of 0 for the burglary variable
missing_lsoas <- manchester_lsoa %>%
filter(!lsoa21cd %in% burglaries_per_lsoa$lsoa21cd) %>%
dplyr::select(lsoa21cd) %>%
mutate(burglary = 0)
burglaries_per_lsoa <- rbind(burglaries_per_lsoa, missing_lsoas)
#step9: plot with tmap
tm_shape(burglaries_per_lsoa) +
tm_fill("burglary",
fill.scale = tm_scale_intervals(style = "quantile", values = "brewer.reds"),
fill.legend = tm_legend(title = "Burglary counts",
position = c("right", "bottom"),
title.size = 0.8,
text.size = 0.6)) +
tm_borders(col_alpha = 0.1) 
Do you see any patterns? Are burglaries randomly spread around the map? Or would you say that areas that are closer to each other tend to be more alike? Is there evidence of clustering? Do burglaries seem to appear in certain pockets of the map?
In this session we are going to discuss ways in which you can quantify the answer to this question. We will discuss measures of global spatial autocorrelation, which essentially quantify the degree to which areas that are near each other tend to be alike. We say global because we are interested in the overall degree of clustering, not in the location of the clusters. Later we will also cover techniques to identify local clusters of autocorrelation, but for now we will focus on quantifying whether areas are (on average) similar to their neighbours.
7.1.1 What is a neighbour?
Previously we asked whether areas are similar to their neighbours or to areas that are close. But what is a neighbour? What do we mean by close? How can one define a set of neighbours for each area? If we want to know whether what we measure in a particular area is similar to what happens in its neighbouring areas, we first need to establish what we mean by a neighbour.
7.1.1.1 Contiguity-based neighbours
There are various ways of defining a neighbour. Most rely on topological or geometrical relationships among the areas. First, we can say that two areas are neighbours if they share boundaries, that is, if they are next to each other. In this case we talk of neighbours by contiguity. Contiguity can be defined as covering only areas that share common boundaries (what we call contiguity using the rook criteria, like in chess), or as covering areas that share common boundaries or common corners, that is, that have any point in common (and we call this contiguity using the queen criteria). When we use these criteria we can refine our definition by specifying the intensity of neighbourliness “as a function of the length of the common border as a proportion of the total border” (Haining and Li 2020, 89). Given that in criminology we mostly work with irregular lattice data (such as police districts or census geographies), the queen criteria makes more sense. There is little theoretical justification for why, say, a police district would only exhibit dependence according to the rook criteria.
When defining neighbours by contiguity, we may also specify the order of contiguity. First order contiguity means that we focus on areas that are immediately contiguous (those in dark blue in the figure below). Second order means that we consider as neighbours only those areas that are immediately contiguous to our first order neighbours (the light blue areas in the figure below), and you could go on and on. Look at the graphic below for clarification:

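To make the rook/queen distinction concrete, here is a small base-R sketch (a toy illustration, not part of the chapter’s analysis pipeline) that counts neighbours on a hypothetical 3 by 3 grid of square cells:

```r
# Toy 3x3 grid of square cells: rook neighbours share an edge, queen
# neighbours share an edge or a corner
grid <- expand.grid(row = 1:3, col = 1:3)

adjacency <- function(criterion = c("rook", "queen")) {
  criterion <- match.arg(criterion)
  n <- nrow(grid)
  m <- matrix(0L, n, n)
  for (i in 1:n) for (j in 1:n) {
    dr <- abs(grid$row[i] - grid$row[j])
    dc <- abs(grid$col[i] - grid$col[j])
    is_neighbour <- if (criterion == "rook") dr + dc == 1 else max(dr, dc) == 1
    if (is_neighbour) m[i, j] <- 1L
  }
  m
}

rowSums(adjacency("rook"))   # corner cells have 2 neighbours, the centre has 4
rowSums(adjacency("queen"))  # corner cells have 3 neighbours, the centre has 8
```

Notice how the queen criterion always yields at least as many neighbours as the rook criterion, since every shared edge is also a shared point.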
7.1.1.2 Distance-based neighbours
We may also define neighbours by geographical distance: you can consider as neighbours those areas that are close to each other distance-wise, regardless of whether boundaries are shared. The distance metric will often be Euclidean, but other commonly used metrics (for example, Manhattan) could also be employed. Often one takes the centroid of the polygon as the location to measure distance from. More sophisticated approaches may use a population-weighted centroid, where the location of the centroid depends on the distribution of the population within the area.
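As a rough base-R sketch of the idea (with made-up centroid coordinates, not our LSOA data), areas can be declared neighbours whenever their centroids fall within some threshold distance:

```r
# Made-up centroid coordinates for three areas (in metres, say)
xy <- rbind(c(0, 0), c(6000, 0), c(20000, 0))

# Euclidean distance matrix between the centroids
d <- as.matrix(dist(xy))

# Neighbours = centroids within 10km of each other (self-pairs excluded)
w_dist <- (d > 0 & d <= 10000) * 1
w_dist
```

Here areas 1 and 2 are 6km apart and so count as neighbours, while area 3 is too far from both.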
Other approaches to define neighbours are graph-based, attribute-based, or interaction-based (e.g., flow of goods or people between areas). Contiguity and distance are the two most commonly used methods, though.
7.2 Putting ‘neighbourness’ in our analysis - constructing a spatial weight matrix
You will come across the term spatial weight matrix at some point or, using mathematical notation, \(W\). Essentially the spatial weight matrix is an \(n\) by \(n\) matrix with ones and zeroes (in the case of contiguity-based definitions, as opposed to distance-based ones) identifying whether any two observations are neighbours (1) or not (0). You can think of the spatial weight matrix as a new data table that we construct with our definition of neighbours (whichever definition that is).
How can we build such a matrix? To illustrate, let’s focus on Manchester’s City Centre. Calculating a spatial weights matrix is a computationally intensive process, which means it can take a long time (especially on older laptops!). The larger the study area (and the more LSOAs it contains), the longer this will take.
7.2.1 Activity 1: Burglaries in Manchester City Centre ward
We will use familiar code to clip the spatial object with the counts of burglaries to only those LSOAs that intersect with the City Centre ward. Again, we have covered this code elsewhere, so we won’t explain it here in detail. But don’t just cut and paste: if there is anything in this code you don’t fully understand, you are expected to ask us.
#Read a geojson file with Manchester wards piccadilly and deansgate joined together to create a city centre ward
# if you don't have this you can refer back to week 4 when we created it
# or if you're short on time you can download it from canvas
manchester_ward <- st_read("data/mcr_city_centre.geojson")
## Reading layer `mcr_city_centre' from data source
## `/Users/user/Desktop/resquant/crime_mapping_textbook/data/mcr_city_centre.geojson'
## using driver `GeoJSON'
## Simple feature collection with 1 feature and 0 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 382619.4 ymin: 397083 xmax: 385703.7 ymax: 399529.9
## Projected CRS: OSGB36 / British National Grid
#Intersect
cc_intersects <- st_contains(manchester_ward, burglaries_per_lsoa)
cc_burglary <- burglaries_per_lsoa[unlist(cc_intersects),]
#Plot with tmap
tmap_mode("view")
## ℹ tmap modes "plot" - "view"
tm_shape(cc_burglary) +
tm_fill(
"burglary",
fill.scale = tm_scale_intervals(style = "quantile", values = "brewer.reds"),
id = "lsoa21cd"
) +
tm_borders() +
tm_title("Burglary counts", size = 0.7) +
tm_view(legend.position = c("right", "top"))
So now we have a new spatial object cc_burglary with the 13 LSOA units that compose the City Centre ward of Manchester. By focusing on a smaller subset of areas we can perhaps understand a bit better what comes next. Do you perceive some degree of spatial autocorrelation here?
7.2.2 Activity 2: Manually explore neighbours
The id argument in tm_fill ensures that when you click on any of the areas, you see not only the count of burglaries in that LSOA (the quantity we are mapping) displayed within a bubble, but also the code that identifies that LSOA.
Move your cursor over the westernmost LSOA in Deansgate (it covers the MOSI and Castlefield Bowl, if you know these); its code is E01033681. If you click on it, you will see this area had 21 burglaries in 2021. Using the rook criteria, identify the first order neighbours of this LSOA among the LSOAs we have included in our sample here, and list their identifiers. Are things different if we use the queen criteria? If so, how does it change? Think hard about what the lecture by Luc Anselin discussed. Have you identified all the neighbours of this area? (There are multiple ways of answering this question; just make sure you reason your answer.)
7.3 Creating a list of neighbours
It would be very, very tedious to identify the neighbours of all the areas in our study area by hand, in the way we did above in Activity 2. That’s why we love computers: we can automate tedious work and have more time for the fun stuff. We can use code to get the computer to establish which areas are next to each other (in this case we are using a contiguity-based definition of a neighbour).
7.3.1 Activity 3: Creating a neighbour list
In order to identify neighbours we will use the poly2nb() function from the spdep package that we loaded at the beginning of our session. The spdep package provides basic functions for building neighbour lists and spatial weights, tests for spatial autocorrelation for areal data like Moran’s I (more on this below), and functions for fitting spatial regression models.
This function builds a neighbours list based on regions with contiguous boundaries. If you look at the documentation you will see that you can pass a “queen” argument that takes TRUE or FALSE as options. If you do not specify this argument the default is set to TRUE. That is, if you don’t specify queen = FALSE this function will return a list of first order neighbours using the Queen criteria. If TRUE, a single shared boundary point meets the contiguity condition.
w <- poly2nb(cc_burglary, row.names = cc_burglary$code)
class(w)
## [1] "nb"
This has created an object of class nb - which is a neighbour list object. We can get some idea of what’s there if we ask for a summary.
summary(w)
## Neighbour list object:
## Number of regions: 13
## Number of nonzero links: 52
## Percentage nonzero weights: 30.76923
## Average number of links: 4
## Link number distribution:
##
## 2 3 4 5 6
## 3 2 3 2 3
## 3 least connected regions:
## 234 241 257 with 2 links
## 3 most connected regions:
## 233 237 253 with 6 links
This is basically telling us that, using this criteria, each LSOA polygon has an average of 4 neighbours (when we just focus on the city centre) and that every area has at least one neighbour (there are no islands). The link number distribution gives you the number of links (neighbours) per area. So here we have 3 polygons with 2 neighbours, 2 with 3, 3 with 4, 2 with 5, and 3 with 6. The summary function also identifies the areas sitting at both extremes of the distribution.
For more details we can look at the structure of w.
str(w)
## List of 13
## $ : int [1:4] 2 3 4 6
## $ : int [1:6] 1 3 5 6 8 9
## $ : int [1:2] 1 2
## $ : int [1:5] 1 6 9 12 13
## $ : int [1:6] 2 7 8 9 10 11
## $ : int [1:4] 1 2 4 9
## $ : int [1:2] 5 8
## $ : int [1:3] 2 5 7
## $ : int [1:6] 2 4 5 6 11 12
## $ : int [1:2] 5 11
## $ : int [1:5] 5 9 10 12 13
## $ : int [1:4] 4 9 11 13
## $ : int [1:3] 4 11 12
## - attr(*, "class")= chr "nb"
## - attr(*, "region.id")= chr [1:13] "59" "233" "234" "236" ...
## - attr(*, "call")= language poly2nb(pl = cc_burglary, row.names = cc_burglary$code)
## - attr(*, "type")= chr "queen"
## - attr(*, "snap")= num 0.01
## - attr(*, "sym")= logi TRUE
## - attr(*, "ncomp")=List of 2
## ..$ nc : int 1
## ..$ comp.id: int [1:13] 1 1 1 1 1 1 1 1 1 1 ...
We can graphically represent the links using the following code:
#We first plot the boundaries
plot(st_geometry(cc_burglary), col='gray', border='blue', lwd=2)
#Then we use the st_centroid and st_coordinates functions to obtain the coordinates of the polygon centroids
xy <- st_coordinates(st_centroid(cc_burglary))
## Warning: st_centroid assumes attributes are constant over geometries
#Then we draw lines between the polygon centroids for neighbours that are listed as linked in w
plot(w, xy, col='red', lwd=2, add=TRUE)
7.4 Generating the weight matrix
From this neighbourhood list we can specify our weight matrix.
7.4.1 Activity 4: Building a spatial weights matrix
We can transform our neighbourhood list object w into a spatial weights matrix. A spatial weights matrix reflects the intensity of the geographic relationship between observations. For this we use the spdep function nb2mat().
wm <- nb2mat(w, style='B')
wm
## 59 233 234 236 237 240 241 243 253 257 258 287 288
## 59 0 1 1 1 0 1 0 0 0 0 0 0 0
## 233 1 0 1 0 1 1 0 1 1 0 0 0 0
## 234 1 1 0 0 0 0 0 0 0 0 0 0 0
## 236 1 0 0 0 0 1 0 0 1 0 0 1 1
## 237 0 1 0 0 0 0 1 1 1 1 1 0 0
## 240 1 1 0 1 0 0 0 0 1 0 0 0 0
## 241 0 0 0 0 1 0 0 1 0 0 0 0 0
## 243 0 1 0 0 1 0 1 0 0 0 0 0 0
## 253 0 1 0 1 1 1 0 0 0 0 1 1 0
## 257 0 0 0 0 1 0 0 0 0 0 1 0 0
## 258 0 0 0 0 1 0 0 0 1 1 0 1 1
## 287 0 0 0 1 0 0 0 0 1 0 1 0 1
## 288 0 0 0 1 0 0 0 0 0 0 1 1 0
## attr(,"call")
## nb2mat(neighbours = w, style = "B")
Starting from our neighbours list (w), in which regions are either listed as neighbours or absent (thus not in the set of neighbours under our definition), the function creates an \(n\) by \(n\) weights matrix (where \(n\) is the number of areas) with values given by the coding scheme style chosen (specified with the style= parameter of the nb2mat() function). Here we specify B, which results in basic binary coding. That means this matrix has values of 0 or 1 indicating whether the elements listed in the rows are adjacent to each other (using our definition, which in this case was the queen criteria).
The diagonal is full of zeroes. This is because we exclude “self-influence” and specify that an area cannot be a neighbour of itself (formally \(w_{ii} = 0\) for all \(i = 1,...,n\)). Then, for each other cell, the matrix contains the “spatial influence” of unit \(j\) on unit \(i\).
For example, the first row corresponds to the first LSOA in our neighbour list object (w). Look back to when we printed the list of neighbours with the str(w) function, just above. You can see, for the first element, the output was: $ : int [1:4] 2 3 4 6. So this first LSOA has 4 neighbours: the 2nd, 3rd, 4th and 6th LSOAs in the list. Sure enough, if you look across the first row of our weights matrix (wm), you will see that every value is a 0 except for four 1s: one in the 2nd column, one in the 3rd, one in the 4th and one in the 6th. The 1 is telling us that yes, this is a neighbour (while the 0s tell us these are not neighbours), as we used the binary coding scheme (style = 'B').
When our spatial weights matrix looks like this, it is symmetrical. That is, the influence of one area on its neighbour is the same as that neighbour’s influence on it: if they are neighbours, the weight is 1 in both directions; if not, it is 0 in both.
7.4.2 Activity 5: Row standardised spatial weights matrix
So with the binary coding scheme we get a spatial influence of 1 if the LSOA is a neighbour, and 0 if it is not. But we can be more informative than that.
In many computations we will see that the matrix is row standardised. We can obtain a row standardised matrix by changing the style= parameter to style='W':
nb2mat(w, style='W')
## 59 233 234 236 237 240 241
## 59 0.0000000 0.2500000 0.2500000 0.2500000 0.0000000 0.2500000 0.0000000
## 233 0.1666667 0.0000000 0.1666667 0.0000000 0.1666667 0.1666667 0.0000000
## 234 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 236 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2000000 0.0000000
## 237 0.0000000 0.1666667 0.0000000 0.0000000 0.0000000 0.0000000 0.1666667
## 240 0.2500000 0.2500000 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000
## 241 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.0000000
## 243 0.0000000 0.3333333 0.0000000 0.0000000 0.3333333 0.0000000 0.3333333
## 253 0.0000000 0.1666667 0.0000000 0.1666667 0.1666667 0.1666667 0.0000000
## 257 0.0000000 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.0000000
## 258 0.0000000 0.0000000 0.0000000 0.0000000 0.2000000 0.0000000 0.0000000
## 287 0.0000000 0.0000000 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000
## 288 0.0000000 0.0000000 0.0000000 0.3333333 0.0000000 0.0000000 0.0000000
## 243 253 257 258 287 288
## 59 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00
## 233 0.1666667 0.1666667 0.0000000 0.0000000 0.0000000 0.00
## 234 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00
## 236 0.0000000 0.2000000 0.0000000 0.0000000 0.2000000 0.20
## 237 0.1666667 0.1666667 0.1666667 0.1666667 0.0000000 0.00
## 240 0.0000000 0.2500000 0.0000000 0.0000000 0.0000000 0.00
## 241 0.5000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00
## 243 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.00
## 253 0.0000000 0.0000000 0.0000000 0.1666667 0.1666667 0.00
## 257 0.0000000 0.0000000 0.0000000 0.5000000 0.0000000 0.00
## 258 0.0000000 0.2000000 0.2000000 0.0000000 0.2000000 0.20
## 287 0.0000000 0.2500000 0.0000000 0.2500000 0.0000000 0.25
## 288 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.00
## attr(,"call")
## nb2mat(neighbours = w, style = "W")
Row standardisation of a matrix ensures that the values across each row sum to 1. So, for example, if an area has four neighbours, each of them receives a weight of 1 divided by 4, which gives you 0.25. If we look back at our first row, you can see in the 2nd, 3rd, 4th and 6th columns a 0.25 instead of the 1 we had with the binary coding, representing the spatial influence of each of the four neighbours.
Row standardisation ensures that all weights lie between 0 and 1. This facilitates the interpretation of operations with the weights matrix as an averaging of neighbouring values, and allows the spatial parameters used in our analyses to be comparable between models.
In this case our spatial weights matrix is not symmetrical. The spatial influence of one neighbourhood on another depends on how many neighbours it has in total. For example, while the influence on LSOA 59 of LSOA 233 is 0.25 (since LSOA 59 has 4 neighbours), the influence of LSOA 59 on LSOA 233 is only about 0.17 (1/6), since LSOA 233 has 6 neighbours.
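The arithmetic behind row standardisation can be sketched in base R with a toy three-area matrix (hypothetical, not one of our LSOA matrices):

```r
# Toy binary matrix: area 1 neighbours areas 2 and 3; areas 2 and 3 each
# neighbour only area 1
b <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)

w_std <- b / rowSums(b)  # divide every row by its row sum

rowSums(w_std)  # each row now sums to 1
w_std[1, 2]     # influence of area 2 on area 1: 1/2 = 0.5
w_std[2, 1]     # influence of area 1 on area 2: 1/1 = 1, so not symmetric
```

The asymmetry appears simply because the two areas have different numbers of neighbours to share their row total of 1 across.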
7.4.3 Activity 6: Spatial weights list
Besides a spatial weights matrix, we can also have a spatial weights list.
Before we can use the functions from spdep to compute the global Moran’s I we need to create a listw-type spatial weights object (instead of the matrix we used above). To get the same value as above with the wm object, we use style='B' for binary weights.
ww <- nb2listw(w, style='B')
ww
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 13
## Number of nonzero links: 52
## Percentage nonzero weights: 30.76923
## Average number of links: 4
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 13 169 52 104 944
7.5 Moran’s I
The most well known measure of spatial autocorrelation is Moran’s I. It was developed by Patrick Alfred Pierce Moran, an Australian statistician. You can find the formula and some explanation in the Wikipedia article. The video lecture by Luc Anselin covers an explanation of Moran’s I, and we strongly recommend you watch it. You may also find this link helpful if things are still unclear. The formula may look intimidating, but it is nothing but the formula for standard correlation expanded to incorporate the spatial weights matrix.
\[I = \frac{n}{\sum_i \sum_j w_{ij}} \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i(x_i - \bar{x})^2}\]
To compute this in R, we can use the moran() function. Have a look at ?moran to see a description of this in R. (Note: S0 refers to the sum of the weights, which is the same as \(\sum_i \sum_j w_{ij}\))
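To see that the formula really is just a weighted covariance over neighbouring pairs, here is a direct base-R translation on toy data (a sketch for intuition only; the activity below uses spdep’s moran() on the real data):

```r
# Moran's I computed directly from the formula. x: a numeric vector,
# w: a weights matrix with a zero diagonal
moran_i <- function(x, w) {
  n <- length(x)
  s0 <- sum(w)               # sum of all weights, i.e. S0
  z <- x - mean(x)           # deviations from the mean
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Four areas in a line (1-2, 2-3 and 3-4 are neighbours) with a smooth trend:
w_line <- matrix(0, 4, 4)
w_line[cbind(1:3, 2:4)] <- 1
w_line <- w_line + t(w_line)    # symmetric binary weights

moran_i(c(1, 2, 3, 4), w_line)  # positive: similar values sit next to each other
```

The term sum(w * outer(z, z)) picks up the cross-product \((x_i - \bar{x})(x_j - \bar{x})\) only for pairs where \(w_{ij} = 1\), which is exactly the double sum in the numerator of the formula.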
7.5.1 Activity 7: Calculate Moran’s I
As the function is defined as moran(x, listw, n, S0), we will need to supply the arguments x, our variable of interest; listw, our weights list; n, the number of observations; and S0, the sum of all weights. (Note that with row-standardised weights S0 equals n, so that part effectively cancels out of the equation, but we still must supply it.) To obtain S0 we can use the Szero() function (in the spdep package) and pass it our weights object (ww). You might find it odd to supply n and S0 when they can be derived from ww, which already contains that information. Nevertheless, we supply them and it works; there are probably cases where it makes sense to use other values.
cc_moran_i <- moran(cc_burglary$burglary, listw = ww, n=length(ww$neighbours), S0=Szero(ww))
cc_moran_i
## $I
## [1] 0.05277692
##
## $K
## [1] 5.107427
I is the Moran’s I value, while K gives the sample kurtosis.
So the Moran’s I here is 0.0527769, which is not very large. The Index is usually bounded by -1 and 1 (the exact bounds depend on the weights matrix used - when the matrix is row-standardised it will always be between -1 and 1), so 0.0527769 is not large, but it is still above zero, so we might want to explore whether this is significant using a test of statistical significance for this statistic.
The Spatial Autocorrelation (Global Moran’s I) tool is an inferential statistic, which means that the results of the analysis are always interpreted within the context of its null hypothesis. For the Global Moran’s I statistic, the null hypothesis states that the attribute being analysed is randomly distributed among the features in your study area; said another way, the spatial process promoting the observed pattern of values is random chance. Imagine that you could pick up the values for the attribute you are analysing and throw them down onto your features, letting each value fall where it may. This process (picking up and throwing down the values) is an example of a random chance spatial process. When the p-value returned by this tool is statistically significant, you can reject the null hypothesis.
In some software you can use statistical tests invoking asymptotic theory, but like Luc Anselin mentions in his videos, the most appropriate way of doing these tests is by using a computational approach, such as a Monte Carlo procedure. The way Monte Carlo works is that the values of burglary are randomly assigned to the polygons, and the Moran’s I is computed. This is repeated several times to establish a distribution of expected values under the null hypothesis. The observed value of Moran’s I is then compared with the simulated distribution to see how likely it is that the observed values could be considered a random draw.
If confused, watch this quick video on monte carlo simulations.
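The Monte Carlo logic can also be sketched in base R on toy data (the real test below uses spdep’s moran.mc()): shuffle the values across the areas many times, recompute the statistic each time, and see where the observed value ranks.

```r
# Manual Moran's I (same direct formula as before), for this sketch only
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy data: four areas in a line with a smooth trend in the variable
w_line <- matrix(0, 4, 4)
w_line[cbind(1:3, 2:4)] <- 1
w_line <- w_line + t(w_line)
x <- c(1, 2, 3, 4)

observed <- moran_i(x, w_line)

# Shuffle the values over the areas 99 times and recompute the statistic
set.seed(1234)
sims <- replicate(99, moran_i(sample(x), w_line))

# Pseudo p-value: share of draws (plus the observed one) at least as large
p_value <- (sum(sims >= observed) + 1) / (99 + 1)
p_value
```

This is exactly the rank-based pseudo p-value that moran.mc() reports for its chosen number of simulations.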
We use the function moran.mc() to run a permutation test for Moran’s I statistic calculated by using some number of random permutations of our numeric vector, for the given spatial weighting scheme, to establish the rank of the observed statistic in relation to the simulated values.
We need to specify our variable of interest (burglary), the listw object we created earlier (ww), and the number of permutations we want to run (here we choose 99).
set.seed(1234) # The seed number you choose is the starting point used in the generation of a sequence of random numbers, which is why (provided you use the same pseudo-random number generator) you'll obtain the same results given the same seed number.
burg_moranmc_results <- moran.mc(cc_burglary$burglary, ww, nsim=99)
burg_moranmc_results
##
## Monte-Carlo simulation of Moran I
##
## data: cc_burglary$burglary
## weights: ww
## number of simulations + 1: 100
##
## statistic = 0.052777, observed rank = 90, p-value = 0.1
## alternative hypothesis: greater
So, the probability of observing this Moran’s I if the null hypothesis were true is 0.1. This is higher than our alpha level of 0.05, so we cannot reject our null hypothesis of spatial randomness and must conclude that there is no significant global spatial autocorrelation in our data. One thing to note, though, is that we have a very small sample size (and are looking over a small geographical area), which is something to take into account when you write up such findings.
7.5.2 Activity 8: Moran scatter plot
We can make a “Moran scatter plot” to visualize spatial autocorrelation. Note the row standardisation of the weights matrix.
rwm <- mat2listw(wm, style='W')
# Checking if rows add up to 1 (they should)
mat <- listw2mat(rwm)
#This code adds up the values in each row to check that each row sums to one
apply(mat, 1, sum)
## 59 233 234 236 237 240 241 243 253 257 258 287 288
## 1 1 1 1 1 1 1 1 1 1 1 1 1
Now we can plot, using the moran.plot() function from the spdep package:
moran.plot(cc_burglary$burglary, rwm)

The X axis represents the values of our burglary variable in each unit (each LSOA) and the Y axis represents a spatial lag of this variable. A spatial lag in this context is simply the weighted average value of the burglary count in the areas that are considered neighbours of each LSOA.
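In base R terms, with a row-standardised matrix the spatial lag is just a matrix product (toy numbers here, not our burglary data):

```r
# Row-standardised weights: area 1 neighbours areas 2 and 3 (weight 0.5 each);
# areas 2 and 3 each have area 1 as their only neighbour (weight 1)
w_toy <- matrix(c(0, 0.5, 0.5,
                  1, 0,   0,
                  1, 0,   0), nrow = 3, byrow = TRUE)

x <- c(10, 20, 30)      # the variable of interest in each area

as.vector(w_toy %*% x)  # the spatial lag: area 1 gets mean(20, 30) = 25
```

So each element of the lag is the average of that area’s neighbours, which is exactly what the Y axis of the Moran scatter plot shows.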
So we are plotting the value of burglary against the weighted average value of burglary in the neighbours, and you can see the relationship is almost flat here. As with any correlation measure, you can get positive spatial autocorrelation: as you move further to the right on the X axis, the surrounding areas also have higher levels of burglary. This is what we see here, but the correlation is fairly low and, as we saw, not statistically significant. You can also obtain negative spatial autocorrelation: areas with high levels of crime would tend (it’s all about the global average effect!) to be surrounded by areas with low levels of crime. This is clearly not what we see here.
It is very important to understand that global statistics of spatial autocorrelation (Global Moran’s I, Geary’s C) assess the overall pattern and trend of your data. They are most effective when the spatial pattern is consistent across the study area. In other words, you may have clusters (local pockets of autocorrelation) without having clustering (global autocorrelation). This may happen if the signs of the clusters cancel each other out. In general, Moran’s I is a better measure of global autocorrelation, as Geary’s C is more sensitive to local correlations.
7.5.3 A note: the rgeoda and sfweight packages
Traditionally, the generation of a weights matrix was done with functions in the spdep package, and this is what we have introduced above. We do want to note, however, that more recently the functionality of GeoDa, a very well-known point and click interface for spatial statistics originally developed by Luc Anselin (a key figure in the field of spatial statistics), has been brought to R via the rgeoda package. This package is fast, though “less flexible when modifications or enhancements are desired” (Pebesma and Bivand 2021). The nomenclature of its functions is slightly more intuitive, and there are subtle variations in some of the formulas. It works with sf objects, so it can integrate easily into our analysis. For now it is enough that you know the content of this chapter and one path to implementing it with the spdep package; this should equip you to explore these other packages as well, which have different ways of achieving the same outcomes.
For example, to create queen contiguity weights, you can use the queen_weights() function from the rgeoda package, and as an argument you only need to pass the name of your polygon object (cc_burglary).
## Loading required package: digest
##
## Attaching package: 'rgeoda'
## The following object is masked from 'package:spdep':
##
## skater
## name value
## 1 number of observations: 13
## 2 is symmetric: TRUE
## 3 sparsity: 0.307692307692308
## 4 # min neighbors: 2
## 5 # max neighbors: 6
## 6 # mean neighbors: 4
## 7 # median neighbors: 4
## 8 has isolates: FALSE
7.6 Local spatial autocorrelation
So now we know about global measures of spatial association, in particular Moran's I, which provide a mechanism to make inferences about a population from a sample. While this is useful for assessing quantitatively whether crime events cluster in a non-random manner, in the words of Jerry Ratcliffe, "this simply explains what most criminal justice students learn in their earliest classes." For example, consider the study of robberies in Philadelphia:

Aggregated to the police districts, this returns a global Moran's I value of 0.56 (the statistic ranges roughly from -1 to 1), which suggests that there is clustering of robbery within police sectors (Ratcliffe, Jerry. "Crime mapping: spatial and temporal challenges." Handbook of quantitative criminology. Springer, New York, NY, 2010. 5-24.). While this should hardly be surprising given the above map, we as humans are likely to see patterns even where there are none, and that is why these tests are helpful to verify (or dispel) our assumptions.
But, as already mentioned, it is very important to understand that global statistics like Global Moran's I assess the overall pattern and trend of your data. Don't just take our word for how important this is, or how commonly it is applied in criminological research. Instead, now that you have gone through how to do this and have begun to develop an understanding, read the following paper at https://link.springer.com/article/10.1023/A:1007592124641, where the authors make use of Moran's I to explain spatial characteristics of homicide. You will likely see this in other papers as well, and now you will know what it means and why it matters.
So again: the global autocorrelation statistic only tells us about overall pattern, whether there is any clustering. It does not tell us about where this clustering is, or what it looks like (hotspots? cold spots? outliers?).
In this section we will learn about local indicators of spatial association (LISA) and show how they allow for the decomposition of global indicators, such as Moran’s I, into the contribution of each observation. The LISA statistics serve two purposes. On one hand, they may be interpreted as indicators of local pockets of nonstationarity, or hot spots. On the other hand, they may be used to assess the influence of individual locations on the magnitude of the global statistic and to identify “outliers” (Anselin, Luc. “Local indicators of spatial association—LISA.” Geographical analysis 27.2 (1995): 93-115.).
7.6.1 Activity 9: Get data and weights for all Manchester
To explore local indicators of spatial correlation, let’s go back to using all of Manchester, rather than just the City Centre ward. We want to have enough data to see local variation, and while the code may take slightly longer to run, we can have a go at more meaningful stats.
So this is our burglaries_per_lsoa object that we’re referring back to. To check what our data look like, we can always plot again with tmap:
## ℹ tmap modes "plot" - "view"
tm_shape(burglaries_per_lsoa) +
  tm_fill("burglary",
          fill.scale = tm_scale_intervals(style = "quantile", values = "brewer.reds"),
          fill.legend = tm_legend(title = "Burglary counts",
                                  position = c("right", "bottom"),
                                  title.size = 0.8,
                                  text.size = 0.6)) +
  tm_borders(col_alpha = 0.1)
Looks good! Now that we have the data we can generate the weight matrix. Again, what we do here is stuff we did above for the global correlation.
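As a reminder, a sketch of those steps with spdep (assuming queen contiguity and row standardisation, as before; ww is the name the listw object goes by in the code that follows):

```r
library(spdep)

# Build queen contiguity neighbours from the LSOA polygons
nb_manchester <- poly2nb(burglaries_per_lsoa, queen = TRUE)

# Turn the neighbour list into a row-standardised spatial weights list
ww <- nb2listw(nb_manchester, style = "W")
```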
7.6.2 Activity 10: Exploring a Local Indicator of Spatial Association: Local Moran’s I.
There is also a local version of Moran's I, which allows for the decomposition of global indicators of spatial autocorrelation, such as the global Moran's I discussed earlier, into the contribution of each observation. This allows us to elaborate on the location and nature of the clustering within our study area, once we have found that autocorrelation occurs globally.
Let's first look at the Moran scatterplot for our data. Remember from earlier that the Moran scatterplot plots the values of the variable of interest, in this case the burglary count in each LSOA, against its spatially lagged value. For creating the Moran scatterplot we generally use row standardisation in the weight matrix, so that the y and x axes are more comparable.
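A sketch of how to draw the scatterplot with spdep (assuming the row-standardised listw object ww built above):

```r
# Draw the Moran scatterplot: burglary counts against their spatial lag
moran.plot(burglaries_per_lsoa$burglary, ww,
           xlab = "Burglary count",
           ylab = "Spatially lagged burglary count")
```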

Notice how the plot is split into four quadrants structured around the centred mean for the value of burglary and its spatial lag. The top right corner belongs to areas that have high levels of burglary and are surrounded by areas with above-average levels of burglary. These are high-high locations. The bottom left corner belongs to the low-low areas: areas with low levels of burglary surrounded by areas with below-average levels of burglary. Both the high-high and low-low locations represent clusters. A high-high cluster is what you may refer to as a hot spot, and the low-low clusters represent cold spots.
In the opposite diagonal we have spatial outliers. They are not outliers in the standard statistical sense, as in extreme observations. Instead, they are outliers in that they are surrounded by areas that are very unlike them. So you could have high-low spatial outliers, areas with high levels of burglary and low levels of surrounding burglary, or low-high spatial outliers, areas that have themselves low levels of burglary (or whatever else we are mapping) and that are surrounded by areas with above average levels of burglary. This would suggest pockets of non-stationarity. Anselin (1996) warns these can sometimes surface as a consequence of problems with the specification of the weight matrix.
The slope of the line in the scatterplot gives you a measure of the global spatial autocorrelation.
You can also see here that the positive spatial autocorrelation is more noticeable when we focus on the whole of Manchester, unlike what we observed when we only looked at the city centre. You can check this by running the global Moran's I.
mcr_wide_moran_i <- moran(burglaries_per_lsoa$burglary, ww, n=length(ww$neighbours), S0=Szero(ww))
mcr_wide_moran_i
## $I
## [1] 0.3549558
##
## $K
## [1] 29.3079
You can see that the global Moran's I statistic is 0.3549558. We can generate a pseudo p-value with our computational Monte Carlo method.
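A sketch of that permutation test (assuming ww as before; nsim = 99999 matches the 1e+05 simulations reported below):

```r
# Monte Carlo (permutation) test for the global Moran's I
moran.mc(burglaries_per_lsoa$burglary, ww, nsim = 99999)
```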
##
## Monte-Carlo simulation of Moran I
##
## data: burglaries_per_lsoa$burglary
## weights: ww
## number of simulations + 1: 1e+05
##
## statistic = 0.35496, observed rank = 1e+05, p-value = 1e-05
## alternative hypothesis: greater
This is highly significant (much smaller than the 0.05 cutoff). There is indeed global spatial autocorrelation when we look at all of Manchester (not just the city centre). Knowing this, we can try to decompose it and figure out what is driving this global measure.
So that’s the global Moran’s I. As we noted earlier, the local Moran’s I helps you to identify the locations that weigh more heavily in the computation of the global Moran’s I. The object generated by the moran.plot() function includes a measure of this leverage, a hat value, for each location in your study area and we can plot these values if we want to geographically visualise those locations:
# We save the plot info into an object
m_plot_burglary <- moran.plot(burglaries_per_lsoa$burglary, ww, plot=FALSE)
# We extract the influence measure and place it
burglaries_per_lsoa$lm_hat <- m_plot_burglary$hat
# Then we can plot this measure of influence
tm_shape(burglaries_per_lsoa) + tm_fill("lm_hat")
To compute the local Moran’s I, we can use the localmoran() function from the spdep package.
Note: it is necessary to adjust for multiple comparisons. The unadjusted p-value for the test is stored in the fifth column of the object of class localmoran that we generated with the localmoran() function. As before we can extract this element and pass it as an argument to the p.adjust() function in order to obtain p-values with some method of adjustment for the problem of multiple comparisons.
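A sketch of those two steps (the object name locm_burglary is our choice; ww is the row-standardised listw object from earlier):

```r
# Compute the local Moran's I for each LSOA
locm_burglary <- localmoran(burglaries_per_lsoa$burglary, ww)

# Count areas significant at 0.05 before any adjustment
# (the unadjusted p-values sit in the fifth column)
sum(locm_burglary[, 5] < 0.05)
```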
## [1] 39
NOTE: by default, the localmoran() function does not adjust for the problem of multiple comparisons. Keep this in mind, as it might lead us to erroneously conclude we have many hot or cold spots.
In our example, if we do not adjust for multiple comparisons, there are 39 areas with a significant p-value. Let's adjust using the Bonferroni correction, which is quite conservative: it works by multiplying each p-value by the number of comparisons. See the help file for p.adjust() for the other methods available.
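A sketch of the adjustment (assuming the output of localmoran() is stored in an object we call locm_burglary):

```r
# Bonferroni-adjust the unadjusted p-values (fifth column) and recount
sum(p.adjust(locm_burglary[, 5], method = "bonferroni") < 0.05)
```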
## [1] 18
After applying the Bonferroni correction, these get reduced to only 18.
Another approach is to use conditional permutation for inference purposes, instead of the analytical methods used above. In this case we would need to invoke the localmoran_perm() function, also from the spdep package. This approach is based on simulations, so we also have to specify an additional argument, nsim, setting the number of simulations. Bivand and Wong (2018) suggest that "increasing the number of draws beyond 999 has no effect" (p. 741). The iseed argument ensures we use the same random seed and get reproducible results. As before, adjusting for multiple comparisons notably reduces the number of significant clusters.
locm_burglary_perm <- localmoran_perm(burglaries_per_lsoa$burglary,
                                      ww,
                                      nsim = 499,
                                      alternative = "two.sided",
                                      iseed = 1)
p_values_lmp <- locm_burglary_perm[,5]
sum(p_values_lmp < 0.05)
## [1] 40
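And, as before, a sketch of the Bonferroni recount on the permutation-based p-values:

```r
# Adjust the permutation-based p-values and recount significant areas
sum(p.adjust(p_values_lmp, method = "bonferroni") < 0.05)
```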
## [1] 17
We go from 40 to 17 significant regions! This really illustrates the perils of not correcting for multiple comparisons!
7.6.3 Creating a LISA map
In order to map these Local Indicators of Spatial Association (LISA), we need to do some prep work.
First we must scale the variable of interest. As noted earlier, when we scale burglary, we are re-scaling the values so that the mean is zero. We use the scale() function, a generic function whose default method centers and/or scales the variable. This gives us the variable s_burglary. We also now want to add a column of Bonferroni-adjusted p-values; let's call it localmp_bonf.
# Scale the count of burglary
burglaries_per_lsoa$s_burglary <- scale(burglaries_per_lsoa$burglary)
# Create Bonferroni-adjusted p-values
burglaries_per_lsoa$localmp_bonf <- p.adjust(p_values_lmp, "bonferroni")
To produce the LISA maps, we also need to generate a spatial lag: the average value of the burglary count in the areas that are considered neighbours of each LSOA.
So what do we mean by spatial dependence? When a value observed in one location depends on the values observed at neighbouring locations, there is spatial dependence. Spatial data may show spatial dependence in both the variables and the error terms.
Why should spatial dependence occur? There are two reasons commonly given. First, data collection of observations associated with spatial units may reflect measurement error. This happens when the boundaries for which information is collected do not accurately reflect the nature of the underlying process generating the sample data. A second reason for spatial dependence is that the spatial dimension of a social or economic characteristic may be an important aspect of the phenomenon. For example, based on the premise that location and distance are important forces at work, regional science theory relies on notions of spatial interaction and diffusion effects, hierarchies of place and spatial spillovers.
There are two types of dependence, spatial error and spatial lag. Here we focus on spatial lag.
Spatial lag dependence is when the dependent variable y in place i is affected by the values of the dependent variable in the neighbouring places j (and, through them, by the independent variables in both place i and j).

This will be important to keep in mind when considering spatial regression. With spatial lag in ordinary least square regression, the assumption of uncorrelated error terms is violated, because near things will have associated error terms. Similarly, the assumption of independent observations is also violated, as the observations are influenced by the other observations near them. As a result, the estimates are biased and inefficient. Spatial lag is suggestive of a possible diffusion process – events in one place predict an increased likelihood of similar events in neighboring places.
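In the usual notation, the spatial lag model described above is commonly written as (a sketch, with $\rho$ the spatial autoregressive parameter and $w_{ij}$ the spatial weights):

$$y_i = \rho \sum_{j} w_{ij} y_j + x_i \beta + \varepsilon_i$$

The sum $\sum_j w_{ij} y_j$ is the spatial lag of $y$: with row-standardised weights, it is simply the average of the outcome in the neighbours of area $i$.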
So to be able to account for the spatial lag in our model, we must create a variable to account for this. For this we need our listw object, which is the object created earlier, when we generated the list with weights using row standardisation. We then pass this listw object into the lag.listw() function, which computes the spatial lag of a numeric vector using a listw sparse representation of a spatial weights matrix.
#create a spatial lag variable and save it to a new column
burglaries_per_lsoa$lag_s_burglary <- lag.listw(ww, burglaries_per_lsoa$s_burglary)
Make sure to check the summaries to ensure the numbers are in line with our expectations, and nothing weird is going on. Some examples of things to check: are there any NA values? Are the ranges of values similar to what we would expect?
We are now going to create a new variable to identify the quadrant in which each observation falls within the Moran scatterplot, so that we can tell apart the high-high, low-low, high-low, and low-high areas. We will only identify those that are significant according to the p-value provided by the local Moran function, adjusted for multiple comparisons with our Bonferroni adjustment. The data we need for each observation, in order to identify whether it belongs to the high-high, low-low, high-low, or low-high quadrant, are the standardised burglary score (positive or negative), the spatial lag score, and the p-value.
Essentially all we'll be doing is assigning each observation a value based on where in the plot it sits. For example, if it's in the upper right, it is high-high and has values larger than 0 for both the burglary and the spatial lag values. If it's in the upper left, it is low-high and has a value larger than 0 for the spatial lag, but lower than 0 for burglary. And so on. Here's an image to illustrate:

So let's first initialise this variable. In this instance we are creating a new column in our object and calling it "quad_sig". We use the mutate() function to create this new variable. Inside that, we use the case_when() function to assign the values based on the conditions we set. These conditions check whether the adjusted p-value is significant and, if it is, which quadrant the specific LSOA falls within (i.e. whether the lagged X and the standardised X are smaller or greater than 0). This lets us decide whether it is a high-high, high-low, low-high, or low-low area.
burglaries_per_lsoa <- burglaries_per_lsoa %>%
  mutate(quad_sig = case_when(localmp_bonf >= 0.05 ~ "Not significant",
                              s_burglary[,1] > 0 &
                                lag_s_burglary[,1] > 0 &
                                localmp_bonf < 0.05 ~ "High-high",
                              s_burglary[,1] < 0 &
                                lag_s_burglary[,1] < 0 &
                                localmp_bonf < 0.05 ~ "Low-low",
                              s_burglary[,1] < 0 &
                                lag_s_burglary[,1] > 0 &
                                localmp_bonf < 0.05 ~ "Low-high",
                              s_burglary[,1] > 0 &
                                lag_s_burglary[,1] < 0 &
                                localmp_bonf < 0.05 ~ "High-low"))
Now we can have a look at what this returns:
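For example, a simple frequency table of the new variable:

```r
# Tabulate how many LSOAs fall into each quadrant category
table(burglaries_per_lsoa$quad_sig)
```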
##
## High-high Low-high Not significant
## 13 4 278
So the 17 significant clusters we found split into 13 high-high and 4 low-high.
Let's put them on a map, using the standard colours for this kind of map:
tm_shape(burglaries_per_lsoa) +
  tm_fill("quad_sig",
          fill_alpha = 0.5,
          fill.legend = tm_legend(position = c("right", "bottom"),
                                  title.size = 0.8,
                                  text.size = 0.5)) +
  tm_borders(col_alpha = 0.5) +
  tm_layout(frame = FALSE) +
  tm_title("LISA with the p-values")
There we have it: our LISA map of Manchester LSOAs, identifying local spatial autocorrelation in burglaries.
Very nice!
So how do we interpret these results? Well keep in mind:
- The LISA value for each location is determined from its individual contribution to the global Moran’s I calculation.
- Whether or not this value is statistically significant is assessed by comparing the actual value to the value calculated for the same location by randomly reassigning the data among all the areal units and recalculating the values each time (the Monte Carlo simulation approach discussed earlier).
So essentially this map now tells us where there is statistically significant clustering of burglaries in Manchester. When reporting your results, report at least the Moran's I test statistic and the p-value. So, for this test, you would report Moran's I = 0.35, p < .001. Including the LISA cluster map is also a great way of showing how the attribute actually clusters.
7.7 Summary
This week we learned how to move from seeing apparent crime patterns on a map to testing whether they are more clustered than we would expect by chance. We practised defining “neighbours” using contiguity (rook vs queen), and turning those neighbour relations into spatial weights (binary and row-standardised). With these weights, we calculated and interpreted Global Moran’s I and used permutation (Monte Carlo) tests to assess statistical significance, noting how results can change with study area size and sample size. We then extended this logic to local spatial autocorrelation using Local Moran’s I and LISA mapping to identify where clustering occurs (e.g. high–high hot spots and low–low cold spots) and where spatial outliers appear (e.g. high-low and low-high areas), while also emphasising the need to correct for multiple comparisons so we do not overstate the evidence for local clusters. Taken together, these tools put you in a strong position to pinpoint priority areas for burglary prevention and policing, and to justify why those places merit attention using clear statistical evidence.