Chapter 3 Basic geospatial operations in R

3.1 Introduction

In this chapter we get our hands dirty with spatial manipulation of data. Thus far, our data manipulation exercises (using dplyr) were ones you might be familiar with from earlier exposure to data analysis. For example, linking datasets using a common column is a task which you can perform on spatial or non-spatial data. These are referred to as attribute operations. In this chapter, however, we will explore some exercises in data manipulation which are specific to spatial data analysis. We will be learning some key spatial operations: a set of functions that allow you to create new spatial data and manipulate existing spatial data.

The main objectives for this chapter are that by the end you will have:

  • met a new format for accessing boundary data, called GeoJSON.
  • carried out spatial operations such as:
    • subsetting points that are within a certain area,
    • creating new polygons by generating buffers around points,
    • counting the number of points that fall within a polygon (known as points in polygon),
    • finding the nearest feature in one data set to observations in another data set, and
    • measuring distance between points on a map.
  • made an interactive point map with leaflet.
  • used geocoding methods to translate text fields such as addresses into geographic coordinates.

These are all very useful tools for the spatial crime analyst, and we hope to demonstrate this by working through an example project. The packages we will use in this chapter are:

# Packages for reading data and data carpentry
library(readr)
library(dplyr)
library(janitor)
library(units)
library(purrr)

# Packages for handling spatial data and for geospatial carpentry
library(sf)
library(tidygeocoder)
library(crsuggest)

# Packages for mapping and visualisation
library(leaflet)
library(RColorBrewer)

# Packages providing access to spatial data
library(osmdata)

3.2 Exploring the relationship between alcohol outlets and crime

The main example, which we will work through for most of the chapter, considers the assumption that licensed premises which serve alcohol are associated with increased crime. We might have some hypotheses about why this may be.

One theory might be that some of these serve as crime attractors.

“Crime attractors are particular places, areas, neighbourhoods, districts which create well-known criminal opportunities to which strongly motivated, intending criminal offenders are attracted because of the known opportunities for particular types of crime. Examples might include bar districts; prostitution areas; drug markets; large shopping malls, particularly those near major public transit exchanges; large, insecure parking lots in business or commercial areas. The intending offender goes to rough bars looking for fights or other kinds of ‘action’” (Brantingham and Brantingham 1995, 7).

On the other hand, it is possible that these areas are crime generators.

“Crime generators are particular areas to which large numbers of people are attracted for reasons unrelated to any particular level of criminal motivation they might have or to any particular crime they might end up committing. Typical examples might include shopping precincts; entertainment districts; office concentrations; or sports stadiums” (Brantingham and Brantingham 1995, 8).

It is possible that some licensed premises attract crimes due to their reputation. However, it is also possible that some of them are simply located in areas that are busy and attract lots of people for lots of reasons, and crimes occur as a result of an abundance of opportunities instead.

Whatever the mechanism, the first step to identifying crime places is to examine whether certain outlets have more crimes near them than others. We can do this using open data, some R code, and the spatial operations discussed above. We will return to data from Manchester, UK for this example, however as we will be using Open Street Map, you can easily replicate this for any other location where you have point-level crime data.

3.3 Acquiring relevant data

We will be using three different sources of data in this chapter. First, we will acquire our crime data, which is what we used in the previous chapter, so this should be familiar. Then we will meet a new format for boundary data, GeoJSON. And finally, we will look at Open Street Map for data on our points of interest.

3.3.1 Reading in crime data

The point-level crime data for Greater Manchester can be found in the file 2019-06-greater-manchester-street.csv we used in Chapter 1. We can import it with the read_csv() function from the readr package.

crimes <- read_csv("data/2019-06-greater-manchester-street.csv")

If you replicate the exercises for getting to know the data set from the previous chapter, you might notice that in this case the column names are slightly different. For example, Latitude and Longitude are spelled with an upper case “L”. You should always familiarise yourself with your data set to make sure you are using the relevant column names. You can see just the column names using the names() function like so:

names(crimes)
##  [1] "Crime ID"              "Month"                
##  [3] "Reported by"           "Falls within"         
##  [5] "Longitude"             "Latitude"             
##  [7] "Location"              "LSOA code"            
##  [9] "LSOA name"             "Crime type"           
## [11] "Last outcome category" "Context"

This is because last time, we cleaned these names using the clean_names() function from the janitor package. Let’s do this again, as these variable names are not ideal, with their capital letters and spacing…so messy!

# clean up the variable names
crimes <- crimes %>% clean_names()

#print them again to see
names(crimes)
##  [1] "crime_id"              "month"                
##  [3] "reported_by"           "falls_within"         
##  [5] "longitude"             "latitude"             
##  [7] "location"              "lsoa_code"            
##  [9] "lsoa_name"             "crime_type"           
## [11] "last_outcome_category" "context"

Much better! Now let’s get some boundary data for Manchester.

3.3.2 Meet a new format: GeoJSON

GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JSON, the JavaScript Object Notation. It is a format for encoding a variety of geographic data structures and is the most common format for representing geographical data on the web. Unlike ESRI shapefiles, which spread data across several files, GeoJSON stores everything in a single file.

Geometries are shapes. All simple geometries in GeoJSON consist of a type and a collection of coordinates. The features include points (therefore addresses and locations), line strings (therefore streets, highways and boundaries), polygons (countries, provinces, tracts of land), and multi-part collections of these types. GeoJSON features need not represent entities of the physical world only; mobile routing and navigation apps, for example, might describe their service coverage using GeoJSON. To tinker with GeoJSON and see how it relates to geographical features, try geojson.io, a tool that shows code and visual representation in two panes.
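To give a flavour of the format, here is a minimal, hypothetical GeoJSON file describing a single point feature with one attribute (the coordinates are approximate and purely illustrative); real files follow the same pattern, just with more features:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-2.2426, 53.4808]},
      "properties": {"name": "Manchester city centre"}
    }
  ]
}

Note that GeoJSON coordinates are given longitude first, then latitude, and the current specification assumes the WGS 84 coordinate reference system.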

Let’s read in a GeoJSON spatial file from the data provided with this book. This particular GeoJSON represents the wards of Greater Manchester.

manchester_ward <- st_read("data/wards.geojson")

Let’s now select only the city centre ward, using the filter() function from the dplyr package. Notice how we’re using the values in the data (or attribute) table, therefore this is an attribute operation.

city_centre <- manchester_ward %>%
  filter(wd16nm == "City Centre")

Let’s see how this looks. In Chapter 1 we learned about plotting maps using the ggplot() function from the ggplot2 package. Here, let’s look at a different way, which can be useful for quick mapping - usually used when just checking in with our data to make sure it looks like what we were expecting.

With the st_geometry() function in the sf package, we can extract only the geometry of our object, stripping away the attributes. If we include this inside the plot() function, we get a quick, minimalist plot of the geometry of our object.

#extract geometry only
city_centre_geometry <- st_geometry(city_centre)

#plot geometry object
plot(city_centre_geometry)
A thin black line tracing the outline of the Manchester City Centre ward.

FIGURE 3.1: Plot City Centre ward geometry

Now we could use this geometry to make sure that our points are in fact only licensed premises in the city centre. This will be your first spatial operation. Excited? Let’s do this!

3.3.3 Open Street Map and points of interest

To map licensed premises we will be accessing data from Open Street Map, a database of geospatial information built by a community of mappers, enthusiasts and members of the public, who contribute and maintain data about all sorts of environmental features, such as roads, green spaces, restaurants and railway stations. You can explore Open Street Map on their online mapping platform. One feature of Open Street Map, unlike Google Maps, is that the underlying data are openly available to download for free. In R, we can take advantage of a package written specifically for querying Open Street Map’s API, called osmdata. For more detail on how to use this, see Langton and Solymosi (2018), and Appendix C: sourcing geographical data for crime analysis of this book.

If we load the package osmdata we can use its functions to query the Open Street Map API. To find out more about the capabilities of this package, see Padgham et al. (2017). While this is outside the scope of our chapter here, you may want to explore osmdata more, as it is an international database, and has lots of data that may come in handy for research and analysis.

Here we focus specifically on Manchester. To retrieve data for a specific area, we must create a bounding box. You can think of the bounding box as a box drawn around the area that we are interested in (in this case, Manchester, UK) which tells the Open Street Map API that we want everything inside the box, but nothing outside the box.

So, how can we specify a bounding box to define the study region? One way to do this is through a search term. Here, we want to select Greater Manchester, so we can use the search term “greater manchester united kingdom” within the getbb() function (which stands for ‘get bounding box’). Using this function we can also specify what format we want the data to be in. In this case, we want a spatial object, specifically an sf polygon object (from the sf package), which we name bb_sf.

bb_sf <- getbb(place_name = "greater manchester united kingdom", 
               format_out = "sf_polygon")

We can see what this bounding box looks like by plotting it (notice this time we combine the plot() and st_geometry() functions to save needing to create a new object):

plot(st_geometry(bb_sf))
A thin black line tracing the outline of Greater Manchester.

FIGURE 3.2: Outline of Greater Manchester boundary

We can see the bounding box takes the form of Greater Manchester. We can now use this to query data from the Open Street Map API using the opq() function. The function name is short for ‘Overpass query’, which is how users can query the Open Street Map API using search criteria.

Besides specifying what area we want to query with our bounding box object(s) in the opq() function, we must also define the feature which we want returned. Features in Open Street Map are defined through ‘keys’ and ‘values’. Keys are used to describe a broad category of features (e.g. highway, amenity), and values are more specific descriptions (e.g. cycleway, bar). These are tags which contributors to Open Street Map have defined. A useful way to explore these is by using the comprehensive Open Street Map Wiki page on map features (see also Appendix C: sourcing geographical data for crime analysis of this book).

We can select what features we want using the add_osm_feature() function, specifying our key as ‘amenity’ and our value as ‘bar’. We also want to specify what sort of object (what class) to get our data into, and as we are still working with spatial data, we stick to the sf format, for which the function is osmdata_sf(). Here, we specify our bounding box as the bb_sf object we created above. If you use the bounding box obtained through getbb(), you can subsequently trim down the outputs from add_osm_feature() using trim_osmdata(). For instance, we could add trim_osmdata(bb_poly = bb_sf) to our initial query.

osm_bar_sf <- opq(bbox = bb_sf) %>%    # select bounding box
  add_osm_feature(key = 'amenity', value = 'bar') %>% # select features
  osmdata_sf()          # specify class

The resulting object osm_bar_sf contains lots of information. We can view the contents of the object by simply typing the object name into the Console.

osm_bar_sf

This confirms details like the bounding box coordinates, but also provides information on the features collected from the query. As one might expect, most information relating to bar locations has been recorded using points (i.e. two-dimensional vertices, coordinates) of which we have 763 at the time of writing. We also have around fifty polygons. For now, let’s extract the point information.

osm_bar_sf <- osm_bar_sf$osm_points 

We now have an sf object with all the bars in our study region mapped by Open Street Map volunteers, along with ~90 variables of auxiliary data, such as characteristics of the bar (e.g. brewery or cocktail bar) as well as address and contact information, amongst many others. Of course, it is up to the volunteers whether they collect all these data, and in many cases they have not added information (you may see lots of missing values if you look at the data). Given that the work relies on volunteers, there are unavoidably some accuracy issues (including how up to date the information may be). Nevertheless, when the details are recorded, they provide rich insight and local knowledge that we may otherwise be unable to obtain.

One column we should consider is name, which tells us the name of the bar. There are missing values here as well, and for this example we will choose to exclude those rows where no name is included, as we would like at least a little bit of context around our bars. To do this, we perform another attribute operation using the filter() function:

osm_bar_sf <- osm_bar_sf %>% filter(!is.na(name))

We are still left with 283 bars in our data set.

3.4 Attribute operations

We mentioned above we were using attribute operations. These are changes to the data which we make based on manipulation of elements in the attribute table. For example, the use of filter() is an attribute operation, because we rely on the data in the attribute table in order to accomplish this task.

For example, let’s say we want to focus only on violent crime. To do this, we use the information in the attribute table, namely the values for the crime_type variable for each observation (crime) in our data set.

crimes <- crimes %>% 
  filter(crime_type == "Violence and sexual offences")

With the above, we select only those crimes (rows of the attribute table) where the crime_type variable meets a certain criteria (takes the value of “Violence and sexual offences”). Spatial operations on the other hand manipulate the geometry part of our data. We rely on the spatial information to accomplish the tasks of interest. In the next section, we will work through some examples of these.

3.5 Spatial operations

Spatial operations are a vital part of geocomputation. Spatial objects can be modified in a multitude of ways based on their location and shape. For a comprehensive overview of spatial operations in R we recommend Chapter 4 of Lovelace, Nowosad, and Muenchow (2019).

“Spatial operations differ from non-spatial operations in some ways. To illustrate the point, imagine you are researching road safety. Spatial joins can be used to find road speed limits related with administrative zones, even when no zone ID is provided. But this raises the question: should the road completely fall inside a zone for its values to be joined? Or is simply crossing or being within a certain distance sufficient? When posing such questions it becomes apparent that spatial operations differ substantially from attribute operations on data frames: the type of spatial relationship between objects must be considered.” Lovelace, Nowosad, and Muenchow (2019).

So you can see we can do exciting spatial operations with our spatial data, which we cannot with the non-spatial stuff.

3.5.1 Reprojecting coordinates

It is important to recall here some of the learning from the previous chapter on map projections and coordinate reference systems. We learned about ways of flattening out the earth, and about what that means for how we point to specific locations in our maps. A Coordinate Reference System, or CRS, is the method by which we refer to locations in our data. You might use a Geographic Coordinate System, which tells you where your data are located on the surface of the Earth. The most commonly used one is WGS 84, where we define our locations with latitude and longitude points. The other type is a Projected Coordinate System, which tells the data how to be drawn on a flat, 2-dimensional surface (such as a computer screen). In our case here, we will often encounter the British National Grid when working with British data. Here our locations are defined with Eastings and Northings.

So why are we talking about this?

It is important to note that spatial operations that use two spatial objects rely on both objects having the same coordinate reference system.

If we are looking to carry out operations that involve two different spatial objects, they need to have the same CRS! Funky weird things happen when this condition is not met, so beware! So how do we know what CRS our spatial objects are? Well, the sf package contains a handy function called st_crs() which lets us check. All you need to pass into the brackets of this function is the name of the object you want to know the CRS of.

Let’s check the CRS of our crimes data:

st_crs(crimes)
## Coordinate Reference System: NA

You can see that we get the CRS returned as NA. Can you think of why? Have we made this into a spatial object? Or is this merely a dataframe with a latitude and longitude column? The answer is really in the question here. We need to convert this to an sf object, a spatial object, and make sure that R knows that the latitude and longitude columns are, in fact, coordinates.

crimes <- st_as_sf(crimes, coords = c("longitude", "latitude"), 
                 crs = 4326, agr = "constant", na.fail = FALSE)

In the st_as_sf() function we specify what we are transforming (the name of our dataframe), the column names that hold the coordinates (longitude and latitude), and the CRS we are using (4326 is the EPSG code for WGS 84, the CRS that uses latitude and longitude coordinates; remember that BNG uses Easting and Northing instead).

The agr argument, the attribute-geometry-relationship, specifies for each non-geometry attribute column how it relates to the geometry, and can take one of the following values: "constant", "aggregate", "identity". The option "constant" is used for attributes that are constant throughout the geometry (e.g. land use), "aggregate" where the attribute is an aggregate value over the geometry (e.g. population density or population count), and "identity" when the attribute uniquely identifies the geometry of a particular “thing”, such as a building ID or a city name. The default value, NA_agr_, implies we don’t know.

Finally, we can specify the parameter na.fail, which decides what to do when one or more of our observations have missing (NA) values in their coordinates. The default setting is for the entire process to fail in this case. However, if we set this to FALSE, it simply means we will lose those observations. Essentially, we are getting rid of those violent crimes which have not been geocoded, excluding them from our analysis.
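A quick sanity check that the conversion worked is to look at the class of the object, which should now include “sf”:

# check the object is now simple features (sf)
class(crimes)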

Now let’s check the CRS of this spatial version of our crimes data:

st_crs(crimes)

We can now see that we have this coordinate system as WGS 84. We need to then make sure that any other spatial object with which we want to perform spatial operations is also in the same CRS. Let’s look at our city centre ward boundary file:

st_crs(city_centre)

We see that this is in fact in a projected coordinate system, namely the British National Grid we mentioned. To make them align, we can re-project this object into the WGS84 geographic coordinate system. To do this, we can use the st_transform() function.

city_centre <- st_transform(city_centre, crs = 4326)

Now we can check the projection again:

st_crs(city_centre)

And we can also check whether the CRS of the two objects match:

st_crs(crimes) == st_crs(city_centre)
## [1] TRUE

It is true! Finally, to check our bar data from Open Street Map:

st_crs(osm_bar_sf)

You will see this is also in WGS84. Since all our data are in this CRS, we can now move on to carry out some spatial operations.

But before doing so, we need to raise an important point. R is still undergoing the adaptation to new standards in the way CRS data are stored and handled. Before 2020 you could specify the CRS with an EPSG code or with a proj4string reference, which looks more like gibberish to the non-initiated but provided more flexibility (see Lovelace, Nowosad, and Muenchow (2019) for details). But there were problems with the old PROJ4. As we noted in the first chapter, it only allows for static reference frames (when on earth everything is moving all the time). So PROJ, the main library for performing conversions between systems, has changed. It now discourages the old proj4string format and has moved to OGC WKT2. When you consult the https://epsg.io/ site you can see, whenever you select a projection, how it translates to different standards (including OGC WKT). This is one of those changes that is really good in the long term, but because of cross-dependencies between spatial R packages that were being updated at different speeds, it had an impact on usability. Although we will stick to the more familiar EPSG notation, we strongly recommend reading the following technical notes by Pebesma and Bivand and being particularly alert to this issue: Pebesma and Bivand (2020), Bivand (2019), Bivand (2020).
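To see what this means in practice, st_crs() returns an object from which we can extract both the EPSG code and the WKT representation (the exact WKT text printed will depend on your versions of sf and PROJ):

# the EPSG code associated with the CRS (if any)
st_crs(crimes)$epsg

# the full WKT representation used under the hood
st_crs(crimes)$wkt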

3.5.2 Subsetting points

Recall above that we wanted to focus our efforts on the City Centre ward of Manchester, however for our bounding box to download OSM data we used Greater Manchester. If we were to plot our bars, we would see that we have many which fall outside of the City Centre ward:

plot(st_geometry(osm_bar_sf), col = 'red')
plot(st_geometry(city_centre),  add = TRUE)
Many red circles are spread out over an area, with concentrations here and there. The largest concentration lies within a black outline of the City Centre ward.

FIGURE 3.3: Many bars (in red) fall outside City Centre ward

This is also the case for our crimes data:

plot(st_geometry(osm_bar_sf), col = 'red')
plot(st_geometry(crimes), col = 'blue', add = TRUE)
plot(st_geometry(city_centre), add = TRUE)
In addition to the red circles from the previous figure, now way too many overlapping blue circles have shaded large parts of the figure blue, including the inside of the City Centre ward outline.

FIGURE 3.4: Most crimes (in blue) fall outside City Centre ward

So if we really want to focus on City Centre, we should create spatial objects for the crimes and the bars which include only those which fall within the City Centre ward boundary.

First things first, we check whether they have the same CRS.

st_crs(city_centre) == st_crs(crimes)
## [1] TRUE

We do indeed, as we made sure in the previous section. Now we can move on to our spatial operation, where we select only those points within the city centre polygon. To do this, we first make a list of the points which intersect the polygon, using the st_intersects() function. This function takes two arguments: first the polygon within which we want to subset our points, and second, the points which we want to subset. We then use the resulting “cc_crimes” object to subset the crimes object to include only those which intersect (return TRUE for intersects):

# intersection
cc_crimes <- st_intersects(city_centre, crimes)
# subsetting
cc_crimes <- crimes[unlist(cc_crimes),]

Have a look at this new “cc_crimes” object in your environment. How many observations does it have? Are there fewer than in the previous “crimes” object? Why do you think this is?

(hint: you’re removing everything that is outside the city centre polygon)
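As an aside, sf also supports spatial subsetting with the bracket operator, which uses st_intersects behind the scenes by default, so the following one-liner should give an equivalent result to the two steps above:

# spatially subset crimes by the city centre polygon in one step
cc_crimes_alt <- crimes[city_centre, ]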

We can plot this again to have a look:

plot(st_geometry(city_centre))
plot(st_geometry(cc_crimes), col = 'blue',  add = TRUE)
An outline of the City Centre ward, with numerous blue circles spread across its interior. The far east and west areas are more sparse with circles than the rest.

FIGURE 3.5: Crimes inside the City Centre ward

We have successfully performed our first spatial operation: we managed to subset our points data set of crimes to include only those crimes which are located inside the polygon for the city centre.

We can do the same for the bars:

# intersection
cc_bars <- st_intersects(city_centre, osm_bar_sf)
# subsetting
cc_bars <- osm_bar_sf[unlist(cc_bars),]
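We can quickly confirm how many bars remain using nrow():

nrow(cc_bars)
## [1] 115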

So of the previous 283 bars, 115 are within the City Centre ward. We can plot our data now:

plot(st_geometry(city_centre))
plot(st_geometry(cc_bars), col = 'red', add = TRUE)
plot(st_geometry(cc_crimes), col = 'blue', add = TRUE)
An outline of the City Centre ward, with numerous red and blue circles spread across its interior. The far east and west are more sparse with both types of circles compared to the centre and northeast.

FIGURE 3.6: Bars and crimes within City Centre

3.5.3 Building buffers

So we now have our bars and our violent crimes in Manchester City Centre. Let’s go back to our original question. We want to know about crime in and around our areas of interest, in this case our bars. But how can we count this? We have our points that are crimes, right? Well… How do we connect them to our points that are licensed premises?

One approach is to build a buffer around our bars, and say that we will count all the crimes which fall within a specific radius of each bar. What should this radius be? Well, this is where your domain knowledge as a criminologist or crime analyst comes in. How far away would you consider a crime to still be related to this bar? 400 meters? 500 meters? 900 meters? 1 km? What do you think? This is one of those it depends questions, where there is no universal right answer; instead it will depend on the environment, the question, and contextual factors. Theory will have an important role to play, for example whether the processes are conceptualised as micro-level or meso-level. Whatever buffer you choose, you should justify it, and make sure that you can defend it when someone asks about it, as the further you reach, the more crimes you will obviously include, and these might alter your results.

So, let’s say we are interested in all crimes that occur within 400 meters of each licensed premise, within the study area. We chose 400m here as this is often the recommended distance for accessible bus stop guidance, so basically as far as people should walk to get to a bus stop. So in this case, we want to take our points, which represent the licensed premises, and build buffers of 400 meters around them.

You can do this with the st_buffer() function. We pass two arguments to our function: the item which we want to buffer (the points in our cc_bars object) and the size of this buffer. Let’s quickly illustrate:

prem_buffer <- st_buffer(cc_bars, 1)

You might get a warning here, saying “st_buffer does not correctly buffer longitude/latitude data” and “dist is assumed to be in decimal degrees (arc_degrees)”. This message indicates that sf assumes the distance value (our size of the buffer, specified as ‘1’ above) is given in degrees. This is because we have our data in a Geographic Coordinate System (lat/long data in WGS 84).

If we want to calculate the size of our buffer in a meaningful distance on our 2D surfaces, we can transform to a Projected Coordinate System, such as British National Grid. Let’s do this now:

#The code for BNG is 27700
bars_bng <- st_transform(cc_bars, 27700) 

Now we can try again, with meters, specifying our indicated 400m radius:

bars_buffer <- st_buffer(bars_bng, 400)

Let’s see how that looks:

plot(st_geometry(bars_buffer))
plot(st_geometry(bars_bng), add = T)
Small black circles are the centrepoints for large black circles, leading to a tangled mess of circles that seems to have a concentration near the middle.

FIGURE 3.7: Bars with 400m buffers

That should look nice and squiggly. We can see it looks like there is quite a lot of overlap here. Should we maybe consider smaller buffers? Let’s look at 100 meter buffers:

bar_buffer_100 <- st_buffer(bars_bng, 100) # create 100m buffer

# plot new buffers
plot(st_geometry(bar_buffer_100))
plot(st_geometry(bars_bng), add = T)
Small black circles are now the centrepoints for medium black circles. Compared to the previous figure, they are easier to discern, and concentrations are easier to place.

FIGURE 3.8: Bars with 100m buffers

Still quite a bit of overlap, but this is possibly down to all the licensed premises being densely packed together in the city centre. We will discuss how to deal with this later on. For now, let’s go with these 100m buffers, and where a crime falls into an area of overlap, we will count it towards both premises. That is, where a crime falls within multiple buffers it will count towards all the bars associated with those buffers.
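If you want to put a number on the overlap, one quick check is to count, for each buffer, how many buffers it intersects (a value of 1 means a buffer overlaps nothing but itself):

# how many buffers does each buffer intersect (including itself)?
n_overlaps <- lengths(st_intersects(bar_buffer_100, bar_buffer_100))
summary(n_overlaps)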

The next step will be to count the number of crimes which fall into each buffer. Before we move on though, remember the CRS for our crimes is WGS 84 here, so we will need to convert our buffer layer back to this:

buffer_WGS84 <- st_transform(bar_buffer_100, 4326)

Now let’s just have a look:

plot(st_geometry(buffer_WGS84))
plot(st_geometry(cc_crimes), col = 'blue', add = T)
Medium sized black circles, from the previous figure, are now joined by small blue circles. Some concentrations of the two types of circles align.

FIGURE 3.9: Crimes around the 100m buffer polygons

Okay, so some crimes fall inside some buffers, others not so much. Well, let’s get to our next spatial operation, to be able to determine how many crimes happened in the 100m radius of each bar in Manchester City Centre.

3.5.4 Counting Points within a Polygon

When you have a polygon layer and a point layer, and want to know how many or which of the points fall within the bounds of each polygon, you can use this method of analysis. In computational geometry, the point-in-polygon (PIP) problem asks whether a given point in the plane lies inside, outside, or on the boundary of a polygon. As you can see, this is quite relevant to our problem, wanting to count how many crimes (points) fall within 100 meters of our licensed premises (our buffer polygons). We can achieve this with the st_join() function, which spatially joins the bar name to each crime, and then we count the frequency of each name (hence the count(name)). This returns a frequency table of the number of crimes within the buffer of each bar, saved in the crimes_per_prem object.

crimes_per_prem <- cc_crimes %>% 
  st_join(buffer_WGS84, ., left = FALSE) %>% 
  count(name)

You now have a new dataframe, crimes_per_prem, which has a column for the name of the bar, a column for the number of violent crimes that fall within its buffer, and a column for the geometry. Take a moment to look at this table, using the View() function. Which premises have the most violent crimes? Let’s see the bar with the most crimes:

crimes_per_prem %>% 
  filter(n == max(n)) %>% 
  select(name, n)

The bar with the highest number of crimes is Crafty Pig with 59 crimes. Keep this in mind for the next section…!

So in this case, we used the point-in-polygon approach, and counted the number of points which fell into each polygon. We saw earlier, with the buffers, that they often overlapped with one another. This means that a crime may have been counted multiple times. The resulting data therefore tell us how many crimes happened within 100 meters of each bar. This is one way to approach the problem, but not the only way. In our next spatial operation, we will calculate distances in order to explore another way.

3.5.5 Distances: Finding the nearest point

Another way to solve this problem is to assign each crime (point) to the closest possible bar (another point). That is, for each crime, look at the distances between its location and the locations of all the bars in Manchester, and then choose the bar which is the closest. Then, we can assign this bar as the location for that crime.

We can achieve this using the st_nearest_feature() function. This function takes our two sf objects, and for each row of the first one (x = cc_crimes), simply returns us the index of the nearest features from the second one (y = cc_bars). We combine with the mutate() function in order to create a new variable which contains this index for each crime. Let’s illustrate:

crime_w_bars <- cc_crimes %>% 
  mutate(nearest_bar = st_nearest_feature(cc_crimes, cc_bars))

If we now have a look at this new object “crime_w_bars”, we can see it is our crimes data, but with a new column right at the end, which contains the index of the closest bar in the cc_bars dataframe. For example, for the first point, the nearest bar is the one at index 84 (vectors in R are 1-indexed, not 0-indexed like many other languages). If we wanted to look at what is on the 84th row we may call:

cc_bars[84, ]

Note: you may get a different number if, since the time of writing, new bars have been added to or removed from the OSM database. You can continue with your own number to follow along!

However, this returns all the 90+ variables for this row. If we want only the name, we can query for the 84th row and the 2nd column (which is name):

cc_bars[84, 2]

You can see the name is “Dive”. For that first crime, in our data set, the nearest bar is “Dive” bar. Now, instead of going through this process manually for each point, we can use the index to subset within our mutate() function:

crime_w_bars  <- cc_crimes %>% 
  mutate(nearest_bar = cc_bars[st_nearest_feature(cc_crimes, cc_bars),2])

Now we have new information in this nearest_bar column: the name of the nearest bar, and its geometry. We actually don’t need the geometry for now, as we will simply be counting the frequency of each bar and joining the counts back to our cc_bars object, which already has a geometry. So we can keep the $name element only, and remove the geometry, like so:

crimes_per_prem_2 <- crime_w_bars %>%  # create new crimes_per_prem_2 object
  st_drop_geometry() %>%              # drop (remove) the geometry
  group_by(nearest_bar$name) %>%      # group by to find frequency of each bar 
  summarise(num_crimes = n()) %>%     # count number of crimes 
  rename(name = `nearest_bar$name`)   # rename variable to 'name' 

To tie this back to our spatial object “cc_bars” we can use the left_join() function:

crimes_per_prem_2 <- left_join(cc_bars, crimes_per_prem_2, 
                               by = c("name" = "name"))

Let’s see the bar with the most crimes with this approach:

crimes_per_prem_2 %>% 
  filter(num_crimes == max(num_crimes, na.rm = TRUE)) %>% 
  select(name, num_crimes)
## Simple feature collection with 1 feature and 2 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -2.236 ymin: 53.48 xmax: -2.236 ymax: 53.48
## Geodetic CRS:  WGS 84
##         name num_crimes             geometry
## 1 Crafty Pig         50 POINT (-2.236 53.48)

The bar with the highest number of crimes is still Crafty Pig, but now with 50 crimes. This means there is a difference in the number of crimes attributed to this bar between the two approaches. Clearly there are 9 crimes which fell within the buffer in the first approach, but were closer to another bar in the dataset, and were instead attributed to that one using this approach.

So which is better? This is once again up to you as the researcher and analyst to decide. They do slightly different things, and so will answer slightly different questions. With the nearest feature approach, instead of talking about the number of crimes within some distance of a bar, we are talking about the closest venue to each crime. This might mean that we attribute crimes that happen quite far from a venue to it, just because it is the closest within our data set. However, we are counting each crime only once. Pros and cons need to be weighed up to make decisions like these.

3.5.6 Measuring distances

Let’s have a look at this bar called “Crafty Pig”. We can select it from the cc_bars object, the buffers, and the crimes:

cp <- cc_bars %>% filter(name == "Crafty Pig") 

cp_buffer <- bar_buffer_100 %>% filter(name == "Crafty Pig") %>% 
  st_transform(4326) # transform CRS

cp_crimes <- crime_w_bars %>% filter(nearest_bar$name  == "Crafty Pig")  

We can use mapply() and the st_union() function to draw a line between each crime and the closest bar (Crafty Pig in this case):

dist_lines <- st_sfc( # Create simple feature geometry list column
  mapply(   # apply function to multiple objects 
    function(a,b){              # specify function parameters
      st_cast(st_union(a,b),"LINESTRING") # specify function
      }, 
    cp_crimes$geometry,  # input a for function
    cp_crimes$nearest_bar$geometry,  # input b for function
    SIMPLIFY=FALSE)) # don't attempt to reduce the result 

We can then plot these to get an idea of what we’re looking at:

plot(st_geometry(cp_buffer))
plot(st_geometry(cp), col = "black", add = TRUE)
plot(st_geometry(cp_crimes), col = "blue", add = TRUE)
plot(st_geometry(dist_lines), add = TRUE)
A small black circle is the centrepoint for a very large black circle. A small blue circle is within this larger circle, and a line connects it to the small black circle.

FIGURE 3.10: Crimes near the Crafty Pig bar in Manchester

Note: make sure to run all 4 lines above in one batch.

So we can see that all these crimes happened at a single location, which is within the 100 meter buffer. But how far away exactly are they?

You can use the st_distance() function to answer this question. We wrap this in the mutate() function in order to create a new column called distance which will contain for each row (each crime) the distance between that and its nearest bar (in this case Crafty Pig).

cp_crimes <- cp_crimes %>% 
  mutate(distance = st_distance(geometry, nearest_bar$geometry))

Having a look at our newly created variable, we can see that the crime locations are 67.09 metres away from the Crafty Pig bar.

One thing you might find strange about the data is why all these crimes are geocoded on top of one another. This is how the open data are released, using geo-masking by snapping crime locations to a geomask (a set of points). This is done to ensure anonymity in the data (see Tompson et al. (2015) for more detail on this). In non-anonymised data you might expect to see a little less overlap in your crime locations. With more variation in the distances between crimes and their nearest bars, we could use these distances to inform a buffer width, for example.
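As a sketch of what that might look like, st_distance() takes a by_element argument which returns one distance per row rather than a full matrix, so we could compute the distance from every crime to its assigned nearest bar and summarise the distribution (this assumes the crime_w_bars object created above):

# distance from each crime to its assigned nearest bar
nearest_dists <- st_distance(crime_w_bars$geometry,
                             crime_w_bars$nearest_bar$geometry,
                             by_element = TRUE)

# distribution of crime-to-nearest-bar distances (in metres)
summary(nearest_dists)

We will return to distances a little later with a better data set. But now, let’s move on to putting these outcomes on a map, which will help us further investigate the case of the Crafty Pig!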

3.6 Plotting interactive maps with leaflet

In the first chapter, we introduced the ggplot2 package for making maps in R. In this chapter, we are going to introduce leaflet as one way to easily make some neat maps. It is the leading open-source JavaScript library for mobile-friendly interactive maps. It is very popular, used by websites ranging from The New York Times and The Washington Post to GitHub and Flickr, as well as GIS specialists like OpenStreetMap, Mapbox, and CartoDB. We will also make use of the RColorBrewer package.

To make a map, load the leaflet and RColorBrewer libraries.

library(leaflet) #for mapping
library(RColorBrewer) #for getting nice colours for your maps

You then create a map with this simple bit of code:

m <- leaflet() %>% addTiles()  

And just print it:

m  
A featureless world map in the Mercator projection is zoomed out to an extent where continents repeat. The top left has plus and minus buttons meant for zooming. The bottom right contains the text 'Leaflet', in addition to a copyright notice.

FIGURE 3.11: A first interactive map

Not a super useful map, but it was really easy to make! You might of course want to add some content to your map.

You can add a point manually:

m <- leaflet() %>% addTiles()  %>% 
  addMarkers(lng=-2.230899,  # longitude
             lat=53.464987,  # latitude
             popup="University of Manchester") # text for popup
m  
A leaflet street map is centred around the University of Manchester, which is indicated by a blue pin marker. The usual leaflet decorators including zoom buttons and copyright notice are present.

FIGURE 3.12: Mapping the University of Manchester

If you click over the highlighted point you will read our input text “University of Manchester”.

You can add many points manually, with some popup text as well:

# create dataframe of latitude, longitude, and popups
latitudes <- c(53.464987, 53.472726, 53.466649) 
longitudes <- c(-2.230899, -2.245481, -2.243421) 
popups <- c("You are here", "Here is another point", "Here is another point") 
df <- data.frame(latitudes, longitudes, popups)      

# create leaflet map
m <- leaflet(data = df) %>% addTiles()  %>%  
  addMarkers(lng=~longitudes, lat=~latitudes, popup=~popups)
#print leaflet map
m  
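When plotting many points, pin markers can quickly clutter the map. As an alternative sketch, leaflet’s addCircleMarkers() function draws small circles whose size and colour you can control; using the same df from above:

# same data as above, but with circle markers instead of pins
leaflet(data = df) %>% addTiles() %>% 
  addCircleMarkers(lng = ~longitudes, lat = ~latitudes, popup = ~popups,
                   radius = 5, color = "navy", fillOpacity = 0.7)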

We can also map polygons, not just points. To illustrate, let’s plot our crimes on or near bars. To do this, we can return to our buffers, where we counted the number of crimes within 100m of each bar/licensed premise (the “crimes_per_prem” object).

First, let’s pick a colour palette. We do this with the colorBin() function. We will discuss colour choices in maps in Chapter 5; for now, let’s just pick the “RdPu” palette. We should also specify the domain = parameter (what value to use for shading, in this case n, the number of crimes), the bins = parameter (the number of bins, in this case 5 - we will discuss this in detail in the coming chapters as well), and pretty = to use pretty breaks (this may actually mess with the number of bins specified in the bins parameter, but again, for now this is OK).

Let’s create this palette and save in an object called pal for palette:

pal <- colorBin("RdPu", domain = crimes_per_prem$n, bins = 5, pretty = TRUE)

Now we can make a leaflet map, where we add these polygons (the buffers) with the addPolygons() function, and call our palette, specifying again the variable to use for shading, as well as some other parameters. One to highlight specifically is the label parameter. This allows us to use a variable as a label which appears when a user hovers over our polygon (buffer). Here we specify the name of the bar with label = ~as.character(name). This way we not only shade each buffer with the number of crimes which fall inside it, but also include a little label with the name of the establishment:

leaflet(crimes_per_prem) %>% 
  addTiles() %>% 
  addPolygons(fillColor = ~pal(n), fillOpacity = 0.8,
              weight = 1, opacity = 1, color = "black",
              label = ~as.character(name)) %>% 
  addLegend(pal = pal, values = ~n, opacity = 0.7, 
            title = 'Violent crimes', position = "bottomleft") 

FIGURE 3.13: Mapping crimes around bars in Manchester with leaflet

It’s not the neatest of maps, with all these overlaps, but we will talk about prettifying maps further down the line. You can, however, pan and zoom, and investigate to find our most high-crime venue, the Crafty Pig. And here, with this background information, we can solve the puzzle. You see, the Crafty Pig appears to be the nearest venue to an area in Manchester City Centre called Piccadilly Gardens, which is an area known for high levels of crime and anti-social behaviour. Therefore it is likely that we are erroneously attributing many of these crimes to the Crafty Pig venue, as they may be taking place in Piccadilly Gardens instead. It is important, therefore, to think about any unintended consequences of the spatial operations we carry out, and how these might affect the conclusions which we draw from our crime mapping exercises.

Finally, let’s say we want to save our interactive map, while keeping it interactive. You can do this by clicking on the Export button at the top of the plot viewer, and choosing the Save as Web Page option, saving this as a .html file:

A screenshot from the plot viewer has been cropped, showing the tabs and buttons near the top menu. The export menu is active, with the current selection being 'Save as Web Page'.

FIGURE 3.14: Save leaflet map as interactive html document

Then you can open this file with any type of web browser (safari, firefox, chrome) and share your map that way. You can send this to your friends, and make them jealous of your fancy map making skills.
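You can also do this from code. A minimal sketch using the saveWidget() function from the htmlwidgets package (assuming your finished map is stored in an object called m):

library(htmlwidgets)

# save the interactive leaflet map as a standalone html file
saveWidget(m, file = "my_map.html")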

3.7 Geocoding

We were making use of point-of-interest data from Open Street Map above, but it is possible that we have a data set of bars from official, administrative sources which is not geocoded. In this case, we may have a list of bars with an associated address, which is clearly some sort of spatial information, but how would you put this on a map?

One solution to this problem is to geocode these addresses. We can use the package tidygeocoder to achieve this. This package takes an address given as character values, for example “221B Baker Street, Marylebone, London NW1 6XE” and returns coordinates, geocoding this address. Let’s say we have a dataframe of addresses (in this case only one observation):

addresses <- data.frame(name = "Sherlock Holmes", 
                        address = "221B Baker Street, London, UK")

We can then use the geocode() function from the tidygeocoder package to get coordinates for this address. We have to specify the column which has the address (in this case address), and the method to use for geocoding. See the help file for the function for the many options. For example, if you are in the USA you may use “census”. Since we are global, we will use “osm”, which uses Nominatim (OSM) to provide worldwide coverage. So given the above example:

addresses %>% geocode(address, method = 'osm')
## Passing 1 address to the Nominatim single address geocoder
## Query completed in: 1 seconds
## # A tibble: 1 × 4
##   name            address                    lat   long
##   <chr>           <chr>                    <dbl>  <dbl>
## 1 Sherlock Holmes 221B Baker Street, Lond…  51.5 -0.158

To illustrate at scale, let’s have a look at another source of data on bars in Manchester. Manchester City Council have an Open Data Catalogue on their website, which you can use to browse through what sort of data they release to the public. Like in many city open data portals, there are some more and some less interesting data sets made available here. It’s not quite as impressive as the open data from some of the cities in the US, such as New York or Dallas, but we’ll take it.

One interesting data set, especially for our questions about the different alcohol outlets, is the Licensed Premises data set. This details all the currently active licensed premises in Manchester. There is a URL link provided to download these.

There are a few ways we can download this data set. On the manual side of things, you can simply right click on the download link on the website, save the file to your computer, and read it in from there by specifying the file path. Remember, if you save it in your R working directory, then you just need to specify the file name, as the working directory folder is where R will first look for this file.

Another way is to read into R directly from the URL provided. Here, let’s read in the licensed premises data directly from the web using the URL "http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv":

data_url <- "http://www.manchester.gov.uk/open/download/downloads/id/169/licensed_premises.csv"

lic_prem <- read_csv(data_url) %>% clean_names()

You will likely get some warnings when reading this data, but you can safely ignore those. You can always check if this worked by looking to your global environment on the right hand side and seeing if this lic_prem object has appeared. If it has, you should see it has 65535 observations (rows), and 36 variables (columns).
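If you are curious about what those warnings flagged, readr records parsing issues which you can inspect with its problems() function:

# inspect the parsing issues readr flagged on import
problems(lic_prem)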

Let’s have a look at what this data set looks like. You can use the View() function for this:

View(lic_prem)

We see that there is a field for “premisesname” which is the name of the premise, and two fields, “locationtext” and “postcode” which refer to address information. To geocode these, let’s create a new column which combines the address and post code, and then use the geocode() function introduced above. This will take a while for the whole 65535 addresses data set, so just for illustration purposes, we take the first 30.

lic_prem <- lic_prem %>% 
  slice(1:30) %>%    # Select first 30 venues
  # Create new complete_address column from locationtext and postcode using paste()
  mutate(complete_address = paste(locationtext,  postcode,  sep=", ")) %>%     
  geocode(complete_address, method = 'osm')  # geocode with osm method
## Passing 30 addresses to the Nominatim single address geocoder
## Query completed in: 30.2 seconds

Now we have these licensed premises geocoded, with brand new latitude and longitude information! We can use this to make a leaflet map of our venues!

# make sure coordinates are numeric values
lic_prem$latitude <- as.numeric(lic_prem$lat)
lic_prem$longitude <- as.numeric(lic_prem$long)

# create map
leaflet(data = lic_prem) %>%
  addTiles() %>% 
  addMarkers(lng=~longitude,  lat=~latitude, popup=~as.character(premisesname), 
             label = ~as.character(premisesname))

FIGURE 3.15: Map of (some) geocoded licenced premises in Manchester

Geocoding may come in handy when we have address data, or something similar but no geometry to use to map it.

3.8 Measuring distance more thoroughly

Before we end the chapter, we want to return to the spatial operation of measuring the distance between points. This may be important, for example, to researchers studying the journey to crime by offenders. In this area, a common parameter of interest is the average distance from offenders’ home locations to their crimes. In order to estimate such parameters, we first need a way to generate the distances. In this section, we will use another data set (this time from Madrid, Spain) to look at the issue of geographical distance through a simpler example.

3.8.1 How far are police stations in Madrid?

To illustrate how to measure distance we will download data from the city of Madrid in Spain. Specifically, we will obtain a csv file with the latitude and longitude of the police stations and a GeoJSON file with the administrative boundary for the city of Madrid. Both are available from the data provided with this book (see Preamble section). We will also turn the .csv into a sf object with the appropriate coordinate reference system for this data, following the steps we’ve outlined in Chapter 1 and earlier in this chapter.

#read csv data
comisarias <- read_csv("data/nationalpolice.csv")
#set crs, read into sf object, and assign crs
polCRS <- st_crs(4326)
comisarias_sf <- st_as_sf(comisarias, coords = c("X", "Y"), crs = polCRS)

#create unique id for each row
comisarias_sf$id <- as.numeric(rownames(comisarias_sf))

#Read as sf boundary data for Madrid city
madrid <- st_read("data/madrid.geojson") 

We went through a lot of steps there, so it’s worth checking in and plotting our data, to make sure that everything looks the way we expect. To practice with leaflet maps some more, let’s plot using the leaflet() function:

leaflet(comisarias_sf) %>% addTiles() %>%
  addMarkers(data = comisarias_sf) %>%
  addPolygons(data = madrid)

FIGURE 3.16: Leaflet map of our Madrid data

We can clearly see here that there are areas of the municipality that are far away from any national police station: the North West part of the city, which you can see is a green area, but noticeably also the South East, which is mostly urban and in fact is the location of a known shanty town and open drug market (“Cañada Real”; you can read about it in the award-winning book by Briggs and Monge-Gamero (2017)).

3.8.2 Distance in geographical space

There are many definitions of distance in data science and spatial data science. A common definition is the Euclidean distance, which is simply the length of a segment connecting two points in a two-dimensional plane. Because of the distortions caused by projections on a flat surface, a straight line on a map is not necessarily the shortest distance. Thus, another common definition used in geography is the great circle distance, which corresponds to an arc linking two points on a sphere and takes into account the spherical shape of the world. The great circle distance is useful, for example, to evaluate the shortest path when intercontinental distances are concerned. For applications which require a network distance measure, Euclidean distance is generally inadequate. Instead, Manhattan distance (also called Taxicab, Rectilinear, or City block metric) is a commonly used metric. We return to dealing with spatial data on networks in Chapter 8.

We can compute both with the st_distance() function of the sf package. This function can be used to measure the distance between two points, between one point and several others, or between all pairs of points. In the latter case we obtain a symmetric matrix of distances (\(N \times N\)), taken pairwise between the points in our dataset. The diagonal holds the distance between each point and itself, so it contains all zeros.
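For instance, here is a small sketch comparing the two; the coordinates below are approximate points in Madrid, chosen purely for illustration. With the points stored in a geographic CRS, sf computes a great-circle (geodesic) distance; after projecting, it computes a planar Euclidean distance:

# two approximate points in Madrid (longitude, latitude), WGS 84
p1 <- st_sfc(st_point(c(-3.7038, 40.4168)), crs = 4326)
p2 <- st_sfc(st_point(c(-3.6883, 40.4530)), crs = 4326)

# great-circle (geodesic) distance, returned in metres
st_distance(p1, p2)

# planar (Euclidean) distance after projecting to EPSG 2062
st_distance(st_transform(p1, 2062), st_transform(p2, 2062))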

Say we want to measure the distance between the main police headquarters (“Jefatura Superior de Policia”, row 34) and three other stations (say row 1, row 10, and row 25 in our dataset). We could use the following code for that:

# calculate distance
dist_headquarters <- st_distance(slice(comisarias_sf, 34), 
                              slice(comisarias_sf, c(1, 10, 25)))

dist_headquarters # distance in meters
## Units: [m]
##      [,1] [,2] [,3]
## [1,] 8124 5086 8555

The result is a matrix with a single row or column (depending on the order of the spatial objects) with a class of units. Often we may want to re-express these distances in a different unit. For this purpose the units package offers useful functionality, through the set_units() function.

set_units(dist_headquarters, "km")
## Units: [km]
##       [,1]  [,2]  [,3]
## [1,] 8.124 5.086 8.555

We can compute the distance between all police stations as well.

# calculate distance
m_distance <- st_distance(comisarias_sf)

# matrix dimensions
dim(m_distance)
## [1] 34 34

If you want to preview the top of the matrix you can use:

head(m_distance)
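One thing such a matrix makes easy is finding each station’s nearest neighbouring station. A sketch: strip the units with drop_units() from the units package, blank out the zero diagonal, and take the row-wise minimum:

# plain numeric copy of the distance matrix
dm <- drop_units(m_distance)

# ignore self-distances on the diagonal
diag(dm) <- NA

# index of the nearest other station, for each station
nearest_station <- apply(dm, 1, which.min)
head(nearest_station)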

3.8.3 A practical example to evaluate distance

For this practical example we will look at the Madrid data. Earlier, we read in the relevant GeoJSON file and stored it in the object madrid. The CRS is EPSG 4326 and therefore it is a geographic coordinate system, with distance expressed in degrees.

st_crs(madrid)

For this we will transform and reproject to EPSG 2062, which is one of the appropriate projected coordinate systems for Madrid. How do you figure this out if you don’t want to check the EPSG site? There is a convenient package, crsuggest, developed by Kyle Walker (2021) that does this job for you. The function suggest_crs() from this package will attempt to list candidate projections for your sf object. And we will see that at the top of that list is EPSG 2062.

suggested_crs <- suggest_crs(madrid)
head(suggested_crs, n = 1)
## # A tibble: 1 × 6
##   crs_code crs_name          crs_type crs_gcs crs_units
##   <chr>    <chr>             <chr>      <dbl> <chr>    
## 1 2062     Madrid 1870 (Mad… project…    4903 m        
## # … with 1 more variable: crs_proj4 <chr>

Now we can transform our madrid object appropriately:

madrid_meters <- st_transform(madrid, crs = 2062)

Earlier we saw that some areas of Madrid are nowhere near a police station. Let’s say we want to get precise about this: we want to know how far different parts of the city of Madrid are from a police station, and we want to be able to show this in a map. Solving this means we have to define “parts of the city”. What we will do is divide the city of Madrid into cells of 250 meters within a grid, using the st_make_grid() function of the sf package.

madrid_grid <- st_make_grid(madrid_meters, cellsize = 250)

# keep only the grid cells that fall within the limits of Madrid
madrid_grid <- st_intersection(madrid_grid, madrid_meters)
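
As an aside, st_make_grid() is not limited to square cells: passing square = FALSE produces a hexagonal grid instead, which some analysts prefer because every neighbouring cell centre is equidistant. A sketch, should you want to try it:

# hexagonal alternative (not used further here)
madrid_hex <- st_make_grid(madrid_meters, cellsize = 250, square = FALSE)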

We can plot the results, to see that everything has gone according to plan:

plot(madrid_grid)
A black, filled shape of Madrid.

FIGURE 3.17: Madrid divided into 250m grid cells

With so many “small” cells the paper-printed version of this map may just look completely black, but it is simply composed of many small cells. So how do we look at distance from police stations here? We can measure the distance from each grid cell to all 34 police stations, and then, for each cell, take the minimum of these distances, i.e. the distance to the nearest station.

# project the stations to the same CRS as the grid
comisarias_sf_meters <- st_transform(comisarias_sf, crs = 2062)

# distance from every station to the centroid of every grid cell
distances <- st_distance(comisarias_sf_meters, 
                         st_centroid(madrid_grid)) %>% 
  as_tibble()

If you view the new object distances you will see there is a row for each police station and a column for each of the 10082 cells in our grid. To use these distances in a leaflet map we will project the grid back into EPSG 4326, and then compute the shortest distance for each cell.

# Compute distances
police_distances <- data.frame(
  # We want grids in a WGS 84 CRS:
  us = st_transform(madrid_grid, crs = 4326),
  # Extract minimum distance for each grid
  distance_km = map_dbl(distances, min)/1000,
  # Extract the value's index for joining with the location info
  location_id = map_dbl(distances, function(x) match(min(x), x))) %>% 
  # Join with the police station table
  left_join(comisarias_sf, by = c("location_id" = "id"))
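
An alternative route to the nearest station and its distance is sf’s st_nearest_feature() function, which returns, for each feature in one layer, the index of the nearest feature in another. A sketch of the equivalent computation using the grid centroids (object names here are our own):

# index (into comisarias_sf_meters) of the nearest station per grid cell
grid_centroids <- st_centroid(madrid_grid)
nearest_idx <- st_nearest_feature(grid_centroids, comisarias_sf_meters)

# element-wise distances to those nearest stations
nearest_dist <- st_distance(grid_centroids, 
                            comisarias_sf_meters[nearest_idx, ], 
                            by_element = TRUE)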

We now have a dataframe of distances, for each grid cell, to the nearest police station. We can have a look at the distribution of these distances by plotting them on a histogram.

# Plot and examine distances

hist(police_distances$distance_km, main = 'Distance to nearest police station')
A bar chart, titled 'Distance to nearest police station'. The vertical axis is labeled Frequency, and ranges from zero to fifteen hundred, with notches at steps of five hundred. The horizontal axis does not have a nice label, but clearly represents distance in kilometres, with one bar for each kilometre range, and labels every five steps from zero to fifteen. The plot peaks at the second bar, one to two kilometres, at around seventeen hundred. The first and third bars are next, at around thirteen hundred. From there, the bars taper down as distance increases, with possible plateaus from five to nine kilometres, and again at nine to fourteen. The last bar, corresponding to eighteen to nineteen kilometres, is very short.

FIGURE 3.18: Histogram of distance to nearest police station
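
As the description above notes, the default x-axis label hist() produces is not very informative. If you want a tidier version, you can set it explicitly:

# same histogram with a clearer axis label
hist(police_distances$distance_km, 
     main = 'Distance to nearest police station', 
     xlab = 'Distance (km)')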

Now we are ready to use this data to plot a map. We can get creative and use the makeIcon() function from leaflet to assign an image as our icon, rather than the regular pointer icons we have been seeing. To do this, first we will adjust some aesthetics.

We get the URL for the icon (broken into part 1 (_pt1) and part 2 (_pt2) so it fits on the page), and pass this to the makeIcon() function.

# Create a more appropriate icon, taking it from Wikimedia Commons
icon_url_pt1 <- "https://upload.wikimedia.org/wikipedia/commons/"
icon_url_pt2 <- "a/ad/189-woman-police-officer-1.svg"

# create icon, adjusting size 
police_icon <- makeIcon(paste0(icon_url_pt1, icon_url_pt2), 
                        iconWidth = 12, iconHeight = 20)

Now we also want to create a colour scale. We can, for example, group our grids into quantiles. To do so, we can use the quantile() function, which by default returns five cut points (the minimum, the three quartiles, and the maximum), giving us four bins for colorBin() to interpolate our colours across:

# Bin ranges for a nicer color scale
bins <- quantile(police_distances$distance_km)
# Create a binned color palette
pal <- colorBin(c("#0868AC", "#43A2CA", "#7BCCC4", "#BAE4BC", "#F0F9E8"), 
                domain = police_distances$distance_km, 
                bins = bins, reverse = TRUE)

Now let’s create the map.

full_map <- leaflet() %>% 
  addTiles() %>% 
  addMarkers(data = comisarias_sf, icon = ~police_icon, 
             group = "Police stations") %>% 
  addPolygons(data = police_distances[[1]], 
              fillColor = pal(police_distances$distance_km), 
              fillOpacity = 0.8, weight = 0, opacity = 1, color = "transparent", 
              group = "Distances",
              highlight = highlightOptions(weight = 2.5, color = "#666", 
                                           bringToFront = TRUE, opacity = 1), 
              popupOptions = popupOptions(autoPan = FALSE, closeOnClick = TRUE,
                                          textOnly = TRUE)) %>% 
  addLegend(pal = pal, values = police_distances$distance_km, 
            opacity = 0.8, title = "Distance (km)", position = "bottomright") 

# print the map
full_map

FIGURE 3.19: Map of distance to nearest police station across Madrid
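
One small extension worth knowing about: we assigned the markers and the polygons to named groups (“Police stations” and “Distances”) but did not use the groups for anything. They become useful with addLayersControl(), which adds a widget letting readers toggle each layer on and off; a sketch:

# add a toggle for each named group
full_map %>% 
  addLayersControl(overlayGroups = c("Police stations", "Distances"),
                   options = layersControlOptions(collapsed = FALSE))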

And there you go. Just remember something: it is easy to misinterpret data and maps. You always need to care a great deal about measurement, the quality of your data, and other potential issues affecting interpretation. When it comes to distance and the movements of people and law enforcement personnel, for example, physical distance is not trivial, but time to arrival also matters, and this is determined by factors other than Euclidean distance (e.g., availability and speed of transport, physical barriers, etc.). Our representation is only as good as the data we have. In Spain there are two other police forces (the Guardia Civil, which patrols rural areas, and the municipal police, with jurisdiction over local administrative enforcement) that we are not representing here (that is, our data is incomplete). And we are not plotting the police stations in the nearby municipalities that are part of the Madrid metropolitan area, around the edges.

3.9 Summary and further reading

In this chapter we explored the differences between attribute and spatial operations, and made great use of the latter in practising spatial manipulation of data. These spatial operations allow us to manipulate our data in a way that lets us study what is going on with crime at micro places. At the start of the chapter we introduced the idea of different crime places. To read further about crime attractors vs crime generators turn to the recommended readings by Brantingham and Brantingham (1995) and Newton (2018). There have since been further developments, for example around crime radiators and absorbers (see Bowers (2021) to learn more).

We covered how to subset points within an area, build buffers around geometries, count the number of points in a point layer that fall within each polygon in a polygon layer, find the nearest feature to a set of features, and turn non-spatial information such as an address into something we can map using geocoding. These spatial operations are just some of many, but we chose to cover them as they have the most frequent application in crime analysis.

For those interested in learning more, spatial operations are typically discussed in standard GIS textbooks, such as those we have recommended in previous chapters. You could see, for example, Chapter 9 of Bolstad (2019). But probably the best follow-up to what we discuss here is Chapters 4 and 5 of Lovelace, Nowosad, and Muenchow (2019), for they provide a systematic introduction to how to perform these spatial operations with R. There is an online book in development by two giants of the R spatial community, Edzer Pebesma and Roger Bivand, which at the time of writing is best cited as Pebesma and Bivand (2021). The first five chapters of that book provide a strong backbone for understanding sf objects in greater detail, coordinate systems, and key concepts for spatial data science.

References

Bivand, Roger. 2019. “ECS530: (III) Coordinate Reference Systems.” https://rsbivand.github.io/ECS530_h19/ECS530_III.html.
———. 2020. “Migration to PROJ6/GDAL3.” http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.
Bolstad, Paul. 2019. GIS Fundamentals: A First Text on Geographic Information Systems. 6th ed. White Bear Lake, MN: Eider Press.
Bowers, Kate. 2021. “Risky Places: Crime Absorbers, Crime Radiators as Risky Places.” https://play.kth.se/media/Risky+PlacesA+Crime+absorbers,+crime+radiators+as+risky+places,+Prof+Kate+Bowers,+UCL/0_zya4j354.
Brantingham, Patricia, and Paul Brantingham. 1995. “Criminality of Place: Crime Generators and Crime Attractors.” European Journal on Criminal Policy and Research 3: 5–26.
Briggs, Daniel, and Rubén Monge-Gamero. 2017. Dead-End Lives: Drugs and Violence in the City Shadows. Bristol, UK: Policy Press.
Langton, Samuel, and Reka Solymosi. 2018. “Open Data for Crime and Place Research: A Practical Guide in R.” In The Study of Crime and Place: A Methods Handbook, edited by Elisabeth Groff and Corey Haberman. Philadelphia: Temple University Press.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. Boca Raton, FL: Chapman and Hall/CRC Press.
Newton, Andrew. 2018. “Macro-Level Generators of Crime, Including Parks, Stadiums, and Transit Stations.” In The Oxford Handbook of Environmental Criminology, edited by Gerben Bruinsma and Shane Johnson, 497–517. Oxford: Oxford University Press.
Padgham, Mark, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2017. “Osmdata.” The Journal of Open Source Software 2 (14). https://doi.org/10.21105/joss.00305.
Pebesma, Edzer, and Roger Bivand. 2020. “R Spatial Follows GDAL and PROJ Development.” https://r-spatial.org/r/2020/03/17/wkt.html.
———. 2021. “Spatial Data Science with Applications in R.” https://keen-swartz-3146c4.netlify.app/.
Tompson, Lisa, Shane Johnson, Matthew Ashby, Chloe Perkins, and Philip Edwards. 2015. “UK Open Source Crime Data: Accuracy and Possibilities for Research.” Cartography and Geographic Information Science 42 (2): 97–111.
Walker, Kyle. 2021. Crsuggest: Obtain Suggested Coordinate Reference System Information for Spatial Data. https://CRAN.R-project.org/package=crsuggest.