Chapter 4 Performing spatial operations in R

By now you have come a long way in terms of taking your spatial data, and visualising it using maps, and being able to present the values of a variable using thematic maps. You have had some practice in taking data which has a spatial component, and joining it to a shapefile, using the common column, in order to be able to visually demonstrate variation on something, such as the crime rate, across space.

I hope that you are finding this to be really exciting stuff, and an opportunity to get yourselves accustomed to spatial data. If there is anything you are unsure about, or want to catch up on, please do not hesitate to revisit older material, and ask us questions about it. We build on each week acquiring knowledge cumulatively, so don’t let yourself get stuck anywhere down the line. But, if you’re ready, today we will go a step further, and get your hands dirty with spatial manipulation of your data.

Thus far, our data manipulation exercises were such that you might be familiar with, from your earlier exposures to data analysis. Linking datasets using a common column, calculating a new variable (new column) from values of existing variables, these are all tasks which you can perform on spatial or non-spatial data. However today we will explore some exercises in data manipulation which are specific to spatial data analysis. After this session you can truly say you are masters of spatial data manipulation. So let’s get started with that!

The main objectives for this session are that by the end you will have:

  • used geocoding methods to translate postcodes into geographic coordinates
  • made interactive point map with leaflet
  • met a new format of spatial shape file called geojson
  • subset points that are within a certain area using a spatial operation
  • created new polygons by generating buffers around points
  • counted the number of points that fall within a polygon (known as points in polygon)

These are all very useful tools for the spatial crime analyst, and we will hope to demonstrate this by working through an example project, where you would make use of all of these tools.

The packages we will use today are:

  • readr
  • dplyr
  • rjson
  • purrr
  • leaflet
  • RColorBrewer
  • sf
  • janitor

Let’s consider the assumption that licenced premises which serve alcohol are associated with increased crimes. We might have some hypotheses about why this may be.

One theory might be that some of these serve as crime attractors.

Crime attractors are particular places, areas, neighbourhoods, districts which create well-known criminal opportunities to which strongly motivated, intending criminal offenders are attracted because of the known opportunities for particular types of crime. Examples might include bar districts; prostitution areas; drug markets; large shopping malls, particularly those near major public transit exchanges; large, insecure parking lots in business or commercial areas. The intending offender goes to rough bars looking for fights or other kinds of ‘action.’

On the other hand, it is possible that these areas are crime generators.

Crime generators are particular areas to which large numbers of people are attracted for reasons unrelated to any particular level of criminal motivation they might have or to any particular crime they might end up committing. Typical examples might include shopping precincts; entertainment districts; office concentrations; or sports stadiums.

(To read further in crime attractors vs crime generators turn to your recommended reading of Brantingham, P., & Brantingham, P. (1995). Criminality of place. European journal on criminal policy and research, 3(3), 5-26.). There have since been more developments, for example about crime radiators and absorbers as well (watch this Risky Places lecture from Kate Bowers: to learn more!)

It’s possible that some licensed premises attract crimes, due to their reputation. However it is also possible that some of them are simply located in areas that are busy, attracts lots of people for lots of reasons, and crimes occurr as a result of an abundance of opportunities instead.

In any case, what we want to do is to examine whether certain outlets have more crimes near them than others. We can do this using open data, some R code, and the spatial operations discussed above. So let’s get to it!

4.1 Getting some (more) data

Manchester City Council have an Open Data Catalogue on their website, which you can use to browse through what sorts of data they release to the public. There are a some more and some less interesting data sets made available here. It’s not quite as impressive as the open data from some of the cities in the US such as New York or Dallas but we’ll take it.

One interesting data set, especially for our questions about the different alcohol outlets is the Licensed Premises data set. This details all the currently active licenced premises in Manchester. You can see there is a link to download now.

As always, there are a few ways you can download this data set. On the manual side of things, you can simply right click on the download link from the website, save it to your computer, and read it in from there, by specifying the file path. Remember, if you save it in your working directory, then you just need to specify the file name, as the working directory folder is where R will first look for this file.

4.1.0.1 Activity 1: Reading data in from the web

But, programmers are lazy, and the whole point of using code-based interfaces is that we get to avoid doing unneccessary work, like point-and-click downloading of files. And when data exists online in a suitable format, we can tell R to read the data in from the web directly, and cut out the middle man (that being ourseves in our pointing-and-clicking activity).

How can we do this? Well think about what we do when we read in a file. We say, hello R, i would like to create a new object please and I will call this new object my_data. We do this by typing the name we are giving the object and the assignment function <-. Right? Then on the right hand side of the assignment function, there is the value that we are assigning the variable. So it could be a bit of text (such as when you’re creating a shp_name object and you pass it the string "path to my file"), or it could be some function, for example when you read a csv file with the read.csv() function.

So if we’re reading a csv, we also need to specity where to read the csv from. Where should R look to find this data? This is where normally you are putting in the path to your file, right? Something like:

my_data <- read_csv("path to my file here")

Well what if your data does not live on your laptop or PC? Well, if there is a way that R can still access this data just by following a path, then this approach will still work! So how can we apply this to getting the Licensed Premises data from the web?

You know when you right click on the link, and select “Save As…” or whatever you click on to save? You could, also select “Copy Link Address.” This just copies the webpage where this data is stored. Give it a go! Copy the address, and then paste it into your browser. It will take you to a blank page where a forced download of the data will begin. So what if you pasted this into the read.csv() function?

my_data <- read_csv("www.data.com/the_data_i_want")

Well in this case, the my_data object would be assigned the value returned from the read.csv() function reading in the file from the url you provided. File path is no mysterious thing, file path is simply the path to the file you want to read. If this is a website, then so be it.

So without dragging this on any further, let’s read in the licensed premises data directly from the web:

library(readr)
lic_prem <- read_csv("http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   CASEREFERENCE = col_double(),
##   LATENIGHTREFRESHMENT = col_logical(),
##   ALCOHOLSUPPLY = col_logical(),
##   OPENINGHOURS = col_logical(),
##   LICENCEENDDATE = col_logical(),
##   ONPREMISESALCOHOLSALE = col_logical(),
##   OFFPREMISESALCOHOLSALE = col_logical(),
##   SUPERVISORADDRESS = col_logical(),
##   SUPERVISORPOSTCODE = col_logical(),
##   CONDITIONS = col_logical(),
##   DECIDEDBY = col_logical(),
##   LICENCEURL = col_logical(),
##   UPRN = col_logical()
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 63264 parsing failures.
##  row col   expected     actual                                                                                file
## 2272  -- 36 columns 23 columns 'http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv'
## 2273  -- 36 columns 23 columns 'http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv'
## 2274  -- 36 columns 23 columns 'http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv'
## 2275  -- 36 columns 23 columns 'http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv'
## 2276  -- 36 columns 23 columns 'http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv'
## .... ... .......... .......... ...................................................................................
## See problems(...) for more details.

You can always check if this worked by looking to your global environment on the righ hand side and seeing if this ‘lic_prem’ object has appeared. If it has, you should see it has 65535 observations (rows), and 36 variables (columns).

Let’s have a look at what this data set looks like. You can use the View() function for this:

View(lic_prem)

We can see there are some interesting and perhaps less interesting columns in there. There are quite a lot of venues in this list as well. Let’s think about subsetting them. Let’s say we’re interested in city centre manchester. We can see that there is a column for postcodes. We know (from our local domain knowledge) That city centre postcodes are M1-M4. So let’s start by subsetting the data to include these.

4.1.1 Subsetting using pattern matching

We could use spatial operations here, and geocode all the postcodes at this point, then use a spatial file of city centre to select only the points contained in this area. The only reason we’re not doing this is because the geocode function takes a bit of time to geocode each address. It would only be about 10 - 15 minutes, but we don’t want to leave you sitting around in the lab for this long, so instead we will try to subset the data using pattern matching in text. In particular we will be using the grepl() function. This function takes a pattern and looks for it in some text. If it finds the pattern, it returns TRUE, and if it does not, it returns FALSE. So you have to pass two parameters to the grepl() function, one of them being the pattern that you want it to look for, and the other being the object in which to search.

So for example, if we have an object that is some text, and we want to find if it contains the letter “a,” we would pass those inside the grepl() function, which would tell us TRUE (yes it’s there) or FALSE (no it’s not there):

some_text <- "this is some text that has some letter 'a's"

grepl("a", some_text)
## [1] TRUE

You can see this returns TRUE, because there is at least one occurrence of the letter a. If there wasn’t, we’d get FALSE:

some_text <- "this is some text tht hs some letters"

grepl("a", some_text)
## [1] FALSE

4.1.1.1 Activity 2: Pattern matching to find city centre premises

So we can use this, to select all the cases where we find the pattern “M1” in the postcode. NOTICE the space in our search pattern. It’s not “M1” it’s “M1.” Can you guess why?

Well, M1 will be found in M1 but also in M13, which is the University of Manchester’s postcode, and not the part of city centre in which we are interested.

So let’s subset our data by creating a new object city_centre_prems, and using the piping (%>%) and filter() functions from the dplyr package:

#remember to load dplyr package if you haven't already: 
library(dplyr)

#then create the city_centre_prems object:
city_centre_prems <- lic_prem %>%
  filter(grepl("M1 ", POSTCODE) )

Now we only have 353 observations (see your global environment), which is a much more manageable number.

4.1.2 Geocoding from an address

Great OK so we have this list of licensed premises, and we have their address, which is clearly some sort of spatial information, but how would you put this on a map?

Any ideas?

We can, at the most basic, geocode the postcode. This will put all the establisments to the centroid of the postcode. Postcodes are used in the United Kingdom as alphanumeric codes, that were devised by Royal Mail. A full postcode is known as a “postcode unit” and designates an area with a number of addresses or a single major delivery point. You can search the Royal Mail for information on post codes here..

Here is a map of the postcode areas in Greater Manchester:

Now the centroid of the post code area represents the central point of the shapefile. For example, here you can see some polygons with their centroids illustrated by points:

This is not quite as precise as geocoding the actual address, but let’s just stick with this approach for now.

So we need something that will help us get the coordinates for the relevant post code centroid. For this we can use the Open postcode geo from data.gov.uk. Open Postcode Geo is a postcode dataset and API optimised for geocoding applications. You can use Open Postcode Geo to geocode a dataset, geocode user input, and therefore build a proximity search. Data is derived from the Office for National Statistics postcode database and is free to use, subject to including attributions to ONS, OS (Ordinance Survey) and Royal Mail.

Postcodes can be entered at area, district, sector, and unit level - see Postcode map for the geographical relationship between these. We can use the Application Programme Interface (API) to query postcodes and read them directly into R, attaching a latitude and a longitude to our dataframe.

4.1.2.1 Activity 3: Getting address from post code using an API

What is an API? I once gave an hour long demo on using APIs, if you would like to watch you can see here: APIs demo.

But essentially, the way we use it here, an API is a way for us to query data from the web using an url. In this case, we will use the Open Postcode Geo API (detailed above), and give it an address. In the URL. Then, it will return to us the coordinates of that address.

Try this in a browser. Open up chrome, or firefox, or whatever you use and type “http://api.getthedata.com/postcode/” plus your post code, but instead of the space add a plus sign.

For example, the uni postcode is “M13 9PL.” So to query the coordinates for the university we use the url: “http://api.getthedata.com/postcode/M13+9PL

You should see a result like this:

This is called a JSON file, you can see it has lots of info about this post code, for example it tells us the country is England, and it gives us a value for latitude and for longitude.

It’s all very well seeing this in a browser, but how can we get this into R? Well we can use a function called fromJSON() from the rjson package, which reads in JSON files and turns them into the nice dataframes we know and love. Inside the fromJSON() we use the readlines() function to get the data from the URL. We save all this into a geocode_result object:

library(rjson)

geocode_result <- fromJSON(readLines("http://api.getthedata.com/postcode/M13+9PL"))
## Warning in readLines("http://api.getthedata.com/postcode/M13+9PL"): incomplete
## final line found on 'http://api.getthedata.com/postcode/M13+9PL'

We get a warning about an incomplete final line. For now ignore this, as it seems we get our data anyway. So what do we get? Well let’s see this geocode_result object:

geocode_result
## $status
## [1] "match"
## 
## $match_type
## [1] "unit_postcode"
## 
## $input
## [1] "M13 9PL"
## 
## $data
## $data$postcode
## [1] "M13 9PL"
## 
## $data$status
## [1] "live"
## 
## $data$usertype
## [1] "large"
## 
## $data$easting
## [1] 384591
## 
## $data$northing
## [1] 396711
## 
## $data$positional_quality_indicator
## [1] 1
## 
## $data$country
## [1] "England"
## 
## $data$latitude
## [1] "53.466926"
## 
## $data$longitude
## [1] "-2.233578"
## 
## $data$postcode_no_space
## [1] "M139PL"
## 
## $data$postcode_fixed_width_seven
## [1] "M13 9PL"
## 
## $data$postcode_fixed_width_eight
## [1] "M13  9PL"
## 
## $data$postcode_area
## [1] "M"
## 
## $data$postcode_district
## [1] "M13"
## 
## $data$postcode_sector
## [1] "M13 9"
## 
## $data$outcode
## [1] "M13"
## 
## $data$incode
## [1] "9PL"
## 
## 
## $copyright
## [1] "Contains OS data (c) Crown copyright and database right 2021"                 
## [2] "Contains Royal Mail data (c) Royal Mail copyright and database right 2021"    
## [3] "Contains National Statistics data (c) Crown copyright and database right 2021"

Ahh it contains all the information we saw earlier in the browser. How nice!

This is great for one postcode at a time, but remember what I said about being lazy, and automating. We want the computer to do the work for us. To do this, we need to set up our query (the url) in a way that lets us give many postcodes in, and get many results out. For that we can use the paste0() function and the gsub() function. paste0() can be used to paste together different bits of text, while gsub() substitutes certain characters (in this case spaces) for other characters (in this case +). So to get from M13 9PL to M13+9PL we use gsub(" ", "+", "M13 9PL"), first saying what to replace, then what to replace it with, and finally in what object (or character string).

So let’s say we assign our postcode to an object called “address”:

address <- "M13 9PL"

We can then use this in the combination of paste0 and gsub to build the URL for our query:

geocode_result <- fromJSON(readLines(paste0("http://api.getthedata.com/postcode/",gsub(" ", "+", address))))
## Warning in readLines(paste0("http://api.getthedata.com/postcode/", gsub(" ", :
## incomplete final line found on 'http://api.getthedata.com/postcode/M13+9PL'

You can see we get the same result as above. And if we want only the coordinates we can call them:

geocode_result$data$latitude
## [1] "53.466926"
geocode_result$data$longitude
## [1] "-2.233578"

So how can we apply this to a whole dataframe? Well here I build two functions geocode_addys_getlng() to get the longitudes and geocode_addys_getlat() to get the latitudes. You can unpick this code if you like, but if you want, you’re welcome to just run these to create the functions in your environment and use

geocode_addys_getlng <- function(x){
  
  geocode_result <- fromJSON(readLines(paste0("http://api.getthedata.com/postcode/",gsub(" ", "", x))))
  return(ifelse(!is.null(geocode_result$data$longitude), geocode_result$data$longitude, NA))
}

geocode_addys_getlat <- function(x){
  
  geocode_result <- fromJSON(readLines(paste0("http://api.getthedata.com/postcode/",gsub(" ", "", x))))
  return(ifelse(!is.null(geocode_result$data$latitude), geocode_result$data$latitude, NA))
}

Now to apply these functions to the whole dataframe, we will use the mutate() function to create a new column for longitue and one for latitude, and for each one apply the relevant function with the map_chr function from the purr package.

library(purrr)

city_centre_prems <- city_centre_prems %>% 
  mutate(longitude = map_chr(POSTCODE, geocode_addys_getlng),
         latitude = map_chr(POSTCODE, geocode_addys_getlat))

Be patient, this will take a while, each postcode has to be referenced against their database and the relevant coordinates extracted. For each point you will see a note appear in red, and while R is working you will see the red stop sign on the top right corner of the Console window:

Also think about how incredibly fast and easy this actually is, if you consider a potential alternative where you have to manualy find some coordinates for each address. That sounds pretty awful, doesn’t it? Compared to that, setting the above functions running, and stepping away to make a cup of tea is really a pretty excellent alternative, no?

Right so hopefully that is done now, and you can have a look at your data again to see what this new column looks like. Remember you can use the View() function to make your data appear in this screen.

View(city_centre_prems)

And now we have a column called longitude for longitude and a column called latitude for latitude. Neat! I know there was a lot in there, so don’t worry about asking lots of questions on this, but also don’t worry too much if you just run the functions and get the coordinates, as long as you know where the coords come from!

4.2 Making interactive maps with leaflet

Thus far we have explored a few approaches to making maps. We made great use of the tmaps package for example in the past few weeks.

But now, we are going to step up our game, and introduce Leaflet as one way to easily make some neat maps. It is the leading open-source JavaScript library for mobile-friendly interactive maps. It is very most popular, used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB, some of who’s names you’ll recognise from the various basemaps we played with in previous labs.

In this section of the lab we will learn how to make really flashy looking maps using leaflet.

If you haven’t already, you will need to have installed the following packages to follow along:

install.packages("leaflet") #for mapping
install.packages("RColorBrewer") #for getting nice colours for your maps

Once you have them installed, load them up with the library() function:

4.2.1 Activity 4: Making an interactive map

To make a map, just load the leaflet library:

library(leaflet)

You then create a map with this simple bit of code:

m <- leaflet() %>%
  addTiles()  

And just print it:

m  

Not a super usseful map, definitely won’t win map of the week, but it was really easy to make!

You might of course want to add some content to your map.

4.2.1.1 Adding points manually:

You can add a point manually:

m <- leaflet() %>%
  addTiles()  %>% 
  addMarkers(lng=-2.230899, lat=53.464987, popup="You would be here if it weren't 2021")
m  

Or many points manually, with some popup text as well:

latitudes = c(53.464987, 53.472726, 53.466649) 
longitudes = c(-2.230899, -2.245481, -2.243421) 
popups = c("You are (not) here", "Here is another point", "Here is another point") 
df = data.frame(latitudes, longitudes, popups)      

m <- leaflet(data = df) %>%
  addTiles()  %>%  
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
m  

4.2.1.2 Adding data from elsewhere

Last time around we added crime data to our map. In this case, we want to be mapping our licensed premises in the city centre, right? So let’s do this:

city_centre_prems$latitude <- as.numeric(city_centre_prems$latitude)
city_centre_prems$longitude <- as.numeric(city_centre_prems$longitude)

m <- leaflet(data = city_centre_prems) %>%
  addProviderTiles("Stamen.Toner") %>% 
  addMarkers(lng=~longitude, lat=~latitude, popup=~as.character(PREMISESNAME), label = ~as.character(PREMISESNAME))
## Warning in validateCoords(lng, lat, funcName): Data contains 4 rows with either
## missing or invalid lat/lon values and will be ignored
m  

Should be looking familiar as well. Now let’s say you wanted to save this map. You can do this by clicking on the export button at the top of the plot viewer, and choose the Save as Webpage option saving this as a .html file:

Then you can open this file with any type of web browser (safari, firefox, chrome) and share your map that way. You can send this to your friends not on this course, and make them jealous of your fancy map making skills.

One thing you might have noticed is that we still have some points that are not in Manchester. This should illustrate that the pattern matching approach is really just a work-around. Instead, what we really should be doing to subset our data spatially is to use spatial operations. So now we’ll learn how to do some of these in the next section.

4.3 Spatial operations

Spatial operations are a vital part of geocomputation. Spatial objects can be modified in a multitude of ways based on their location and shape. For a comprehensive overview of spatial operations in R I would recommend the relevant chatper Chapter 4: Spatial Operations from the project of Robin Lovelace and Jakub Nowosad, Geocomputation with R.

Spatial operations differ from non-spatial operations in some ways. To illustrate the point, imagine you are researching road safety. Spatial joins can be used to find road speed limits related with administrative zones, even when no zone ID is provided. But this raises the question: should the road completely fall inside a zone for its values to be joined? Or is simply crossing or being within a certain distance sufficent? When posing such questions it becomes apparent that spatial operations differ substantially from attribute operations on data frames: the type of spatial relationship between objects must be considered.

So you can see we can do exciting spatial operations with our spatial data, which we cannot with the non-spatial stuff.

For our spatial operations we will be using functions that belong to the sf package. So make sure you have this loaded up:

library(sf)

4.3.1 Coordinate reference systems revisited

One important note before we begin to do this brings us back to some of the learning from the second session on map projections and coordinate reference systems, like we discussed in the lecture today. We spoke about all the ways of flattening out the earth, and ways of making sense what that means for the maps, and also how to be able to point to specific locations within these. The latter refers to the Coordinate Reference System or CRS the most common ones we will use are WGS 84 and British National Grid.

So why are we talking about this?

It is important to note that spatial operations that use two spatial objects rely on both objects having the same coordinate reference system

If we are looking to carry out operations that involve two different spatial objects, they need to have the same CRS!!! Funky weird things happen when this condition is not met, so beware!

So how do we know what CRS our spatial objects are? Well the sf package contains a handy function called st_crs() which let’s us check. All you need to pass into the brackets of this function is the name of the object you want to know the CRS of.

So let’s check what is the CRS of our licenced premises:

st_crs(city_centre_prems)
## Coordinate Reference System: NA

You can see that we get the CRS returned as NA. Can you think of why? Have we made this into a spatial object? Or is this merely a dataframe with a latitude and longitude column? The answer is really in the question here.

So we need to convert this to a sf object, or a spatial object, and make sure that R knows that the latitude and the longitude columns are, in fact, coordinates.

In the st_as_sf() function we specify what we are transforming (the name of our dataframe), the column names that have the coordinates in them (longitude and latitude), the CRS we are using (4326 is the code for WGS 84, which is the CRS that uses latitude and longitude coordinates (remember BNG uses Easting and Northing)), and finally agr, the attribute-geometry-relationship, specifies for each non-geometry attribute column how it relates to the geometry, and can have one of following values: “constant,” “aggregate,” “identity.” “constant” is used for attributes that are constant throughout the geometry (e.g. land use), “aggregate” where the attribute is an aggregate value over the geometry (e.g. population density or population count), “identity” when the attributes uniquely identifies the geometry of particular “thing,” such as a building ID or a city name. The default value, NA_agr_, implies we don’t know.

cc_spatial <- st_as_sf(city_centre_prems, coords = c("longitude", "latitude"), 
                 crs = 4326, agr = "constant", na.fail = FALSE)

Now let’s check the CRS of this spatial version of our licensed premises:

st_crs(cc_spatial)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

We can now see that we have this coordinate system as WGS 84. We need to then make sure that any other spatial object with which we want to perform spatial operations is also in the same CRS.

4.3.2 Meet a new format of shapefile: geojson

GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JSON, the JavaScript Object Notation. It is a format for encoding a variety of geographic data structures.

Geometries are shapes. All simple geometries in GeoJSON consist of a type and a collection of coordinates. The features include points (therefore addresses and locations), line strings (therefore streets, highways and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types. GeoJSON features need not represent entities of the physical world only; mobile routing and navigation apps, for example, might describe their service coverage using GeoJSON.

To tinker with GeoJSON and see how it relates to geographical features, try geojson.io, a tool that shows code and visual representation in two panes.

Let’s read in a geoJSON spatial file, again from the web. This particular geojson represents the wards of Greater Manchester.

manchester_ward <- st_read("https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson")
## Reading layer `wards' from data source `https://raw.githubusercontent.com/RUMgroup/Spatial-data-in-R/master/rumgroup/data/wards.geojson' using driver `GeoJSON'
## Simple feature collection with 215 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 351664 ymin: 381168.6 xmax: 406087.5 ymax: 421039.8
## Projected CRS: OSGB 1936 / British National Grid

Let’s select only the city centre ward, using the filter() function from dplyr

city_centre <- manchester_ward %>%
  filter(wd16nm == "City Centre")

Let’s see how this looks, using the plot() function:

plot(st_geometry(city_centre))

Now we could use this to make sure that our points included in cc_spatial are in fact only licensed premises in the city centre. This will be your first spatial operation. Excited? Let’s do this!

4.3.2.1 Activity 5: Subset points to those within a polygon

So we have our polygon, our spatial file of the city centre ward. We now want to subset our point data, the cc_spatial data, which has points representing licensed premises.

First things first, we check whether they have the same crs.

st_crs(city_centre) == st_crs(cc_spatial)
## [1] FALSE

Uh oh! They do not! So what can we do? Well we already know that cc_spatial is in WGS 84, because we made it so a little bit earlier. What about this new city_centre polygon?

st_crs(city_centre) 
## Coordinate Reference System:
##   User input: OSGB 1936 / British National Grid 
##   wkt:
## PROJCRS["OSGB 1936 / British National Grid",
##     BASEGEOGCRS["OSGB 1936",
##         DATUM["OSGB 1936",
##             ELLIPSOID["Airy 1830",6377563.396,299.3249646,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4277]],
##     CONVERSION["British National Grid",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",49,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",-2,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996012717,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",400000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",-100000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E"],
##         BBOX[49.75,-9.2,61.14,2.88]],
##     ID["EPSG",27700]]

Aha, the key is in the 27700. This code in fact stands for…. British National Grid…!

So what can we do? We can transform our spatial object. Yepp, we can convert between CRS.

So let’s do this now. To do this, we can use the st_transform() function.

cc_WGS84 <- st_transform(city_centre, 4326)

Let’s check that it worked:

st_crs(cc_WGS84) 
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

Looking good. Triple double check:

st_crs(cc_WGS84) == st_crs(cc_spatial)
## [1] TRUE

YAY!

Now we can move on to our spatial operation, where we select only those points within the city centre polygon. To do this, we can use the st_intersects() function:

# intersection
cc_intersects <- st_intersects(cc_WGS84, cc_spatial)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
# subsetting
cc_intersects <- cc_spatial[unlist(cc_intersects),]

have a look at this new cc_intersects object in your environment. How many observations does it have? Is this now fewer than the previous cc_spatial object? Why do you think this is?

(hint: you’re removing everything that is outside the city centre polygon)

We can plot this too to have a look:

# plot
plot(st_geometry(cc_WGS84), border="#aaaaaa")
plot(st_geometry(cc_intersects), col = "red", add=T)

COOL, we have successfully performed our first spatial operation, we managed to subset our points data set to include only those points which are inside the polgon for city centre. See how this was much easier, and more reliable than the hacky workaround using pattern matching? Yay!

4.3.2.2 Activity 6: Building buffers

Right, but what we want to do really to go back to our original question. We want to know about crime in and around out areas of interest, in this case our licensed premises. But how can we count this?

Well first we will need crime data. Let’s use the same data set from last week. I’m not going over the detail of how to read this in, if you forgot, go back to the notes from last week.

crimes <- read_csv("data/2019-06-greater-manchester-street.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   `Crime ID` = col_character(),
##   Month = col_character(),
##   `Reported by` = col_character(),
##   `Falls within` = col_character(),
##   Longitude = col_double(),
##   Latitude = col_double(),
##   Location = col_character(),
##   `LSOA code` = col_character(),
##   `LSOA name` = col_character(),
##   `Crime type` = col_character(),
##   `Last outcome category` = col_character(),
##   Context = col_logical()
## )

Notice that in this case the columns are spelled with upper case “L.” You should always familiarise yourself with your data set to make sure you are using the relevant column names. You can see just the column names using the names() function like so :

names(crimes)
##  [1] "Crime ID"              "Month"                 "Reported by"          
##  [4] "Falls within"          "Longitude"             "Latitude"             
##  [7] "Location"              "LSOA code"             "LSOA name"            
## [10] "Crime type"            "Last outcome category" "Context"

Arg so messy! Let’s use our handy helpful clean_names() function from the janitor package:

library(janitor)
crimes <- crimes %>% clean_names()
names(crimes)
##  [1] "crime_id"              "month"                 "reported_by"          
##  [4] "falls_within"          "longitude"             "latitude"             
##  [7] "location"              "lsoa_code"             "lsoa_name"            
## [10] "crime_type"            "last_outcome_category" "context"

Or you can have a look at the first 6 lines of your dataframe with the head() function:

head(crimes)
## # A tibble: 6 x 12
##   crime_id  month reported_by falls_within longitude latitude location lsoa_code
##   <chr>     <chr> <chr>       <chr>            <dbl>    <dbl> <chr>    <chr>    
## 1 <NA>      2019… Greater Ma… Greater Man…     -2.46     53.6 On or n… E01004768
## 2 aa1cc4cb… 2019… Greater Ma… Greater Man…     -2.44     53.6 On or n… E01004768
## 3 e513df63… 2019… Greater Ma… Greater Man…     -2.44     53.6 On or n… E01004768
## 4 6ed763df… 2019… Greater Ma… Greater Man…     -2.44     53.6 On or n… E01004768
## 5 780d55b8… 2019… Greater Ma… Greater Man…     -2.45     53.6 On or n… E01004768
## 6 753fa25f… 2019… Greater Ma… Greater Man…     -2.44     53.6 On or n… E01004768
## # … with 4 more variables: lsoa_name <chr>, crime_type <chr>,
## #   last_outcome_category <chr>, context <lgl>

Or you can view, with the View() function.

Now, we have our points that are crimes, right? Well… How do we connect them to our points that are licensed premises?

First things first, let’s make sure again that R is aware that this is a spatial set of points, and that the columns latitude and longitude are used to create a geometry.

crimes_spatial <- st_as_sf(crimes, coords = c("longitude", "latitude"), 
                 crs = 4326, agr = "constant")

Next, we should find a way to link each crime to the licenced premise which we might count it against.

One approach is to build a buffer around our licensed premises, and say that we will count all the crimes which fall within a specific radius of this licensed premise. What should this radius be? Well this is where your domain knowledge as criminologist comes in. How far away would you consdier a crime to still be related to this pub? 400 meters? 500 meters? 900 meters? 1 km? What do you think? This is again one of them it depends questions. Whatever buffer you choose you should justify, and make sure that you can defend when someone might ask about it, as the further your reach obviously the more crimes you will include, and these might alter your results.

So, let’s say we are interested in all crimes that occur within 400 meters of each licensed premise. We chose 400m here as this is the recommended distance for accessible bus stop guidance, so basically as far as people should walk to get to a bus stop (TfL, 2008). So in this case, we want to take our points, which represent the licensed premises, and build buffers of 400 meters around them.

You can do with the st_buffer() function:

prem_buffer <- st_buffer(cc_intersects, 1)
## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).

You should get a warning here, like I did above. This message indicates that sf assumes a distance value is given in degrees. This is because we have lat/long data (WSG 48)

One quick fix to avoid this message, is to convert to BNG:

prem_BNG <- st_transform(cc_intersects, 27700)

Now we can try again, with meters

prem_buffer <- st_buffer(prem_BNG, 400)

Let’s see how that looks:

plot(st_geometry(prem_buffer))
plot(st_geometry(prem_BNG), add = T)

That should look nice and squiggly. But also it looks like there is quite a lot of overlap here. Should we maybe consider smaller buffers? Let’s look at 100 meter buffers:

prem_buffer_100 <- st_buffer(prem_BNG, 100)
plot(st_geometry(prem_buffer_100))
plot(st_geometry(prem_BNG), add = T)

Still quite a bit of overlap, but this is possibly down to all the licensed premises being very densely close together in the city centre.

Well now let’s have a look at our crimes. I think it might make sense (again using domain knowledge) to restrict the analysis to violent crime. So let’s do this:

violent_spatial <- crimes_spatial %>%
  filter(crime_type=="Violence and sexual offences")

Now, remember the CRS is WGS 48 here, so we will need to convert our buffer layer back to this:

buffer_WGS84 <- st_transform(prem_buffer_100, 4326)

Now let’s just have a look:

plot(st_geometry(buffer_WGS84))
plot(st_geometry(violent_spatial), col = 'red', add = T)

OKAY, so some crimes fall inside some buffers, others not so much. Well, let’s get to our last spatial operation of the day, the famous points in polygon, to get to answering which licensed premises have the most violent crimes near them.

4.3.2.3 Activity 7: Counting Points in Polygon

When you have a polygon layer and a point layer - and want to know how many or which of the points fall within the bounds of each polygon, you can use this method of analysis. In computational geometry, the point-in-polygon (PIP) problem asks whether a given point in the plane lies inside, outside, or on the boundary of a polygon. As you can see, this is quite relevant to our problem, wanting to count how many crimes (points) fall within 100 meters of our licensed premises (our buffer polygons).

crimes_per_prem <- violent_spatial %>% 
  st_join(buffer_WGS84, ., left = FALSE) %>% 
  count(PREMISESNAME)

You now have a new dataframe, crimes_per_prem which has a column for the name of the premises, a column for the number of violend crimes that fall within the buffer, and a column for the geometry.

Take a moment to look at this table. Use the View() function. Which premises have the most violent crimes? Are you surprised?

Now as a final step, let’s plot this, going back to leaflet. We can shade by the number of crimes within the buffer, and include a little popup label with the name of the establishment:

pal <- colorBin("RdPu", domain = crimes_per_prem$n, bins = 5, pretty = TRUE)
leaflet(crimes_per_prem) %>% 
  addTiles() %>% 
  addPolygons(fillColor = ~pal(n), fillOpacity = 0.8,
              weight = 1, opacity = 1, color = "black",
              label = ~as.character(PREMISESNAME)) %>% 
  addLegend(pal = pal, values = ~n, opacity = 0.7, 
            title = 'Violend crimes', position = "bottomleft")