You can use this template script file to work through the lesson.

Spatial Data

In this lesson, students will learn how to work with spatial data, or data with geographic information. There are different packages that in R that handle spatial data. In these packages, spatial data is stored as a special class (ex. character, number, etc.) which stores the geographic information.

To start, let’s plot North Carolina with county lines. ggplot2 contains some mapping data that you can use without having to import from your computer. Here, we will use the function map_data to get the USA map data, and then filter by “north carolina” as you usually would with data frames in dplyr.

library(ggplot2)
library(dplyr)

map <- map_data("county")
nc <- map |>
  filter(region =="north carolina")

head(nc)
##        long      lat group order         region subregion
## 1 -79.53800 35.84424  1857 54915 north carolina  alamance
## 2 -79.54372 35.89008  1857 54916 north carolina  alamance
## 3 -79.53800 35.98175  1857 54917 north carolina  alamance
## 4 -79.52081 36.23385  1857 54918 north carolina  alamance
## 5 -79.26298 36.23385  1857 54919 north carolina  alamance
## 6 -79.27444 35.90726  1857 54920 north carolina  alamance

In this data, there are points that will be connected into polygons to plot each county based on the “group”. In the aesthetics, we set x to the longitude and y to the latitude.

ggplot(data = nc, aes(x = long, y = lat, group = group)) +
  geom_polygon() 

There are different types of geographic information that we can plot.

When plotting data as a polygon, like we are doing with the North Carolina map, points are connected with lines to make a shape. You can plot the different types of spatial data, like points, lines, and polygons, using ggplot layers. With the polygon, we will use geom_polygon()

You can edit the plot in the same way that you did other plots in ggplot. For example, now we can color each of the the counties, which are in the “subregion” column, a different color and change the theme.

ggplot(data = nc, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes( fill = subregion))+
  theme_void()+
  theme(legend.position = 'none') 

With plotting spatial data, you have to consider the coordinate system, which determines how the spherical data from the earth is plotted flattened as a 2-D map. There are different coordinate systems that you can use. The default is cartesian coordinates, where x and y coordinates are 1:1.

See what happens when you change the coordinate systems by adding some of the following layers:

  1. coord_map(): Mercator projection
  2. coord_map("conic", lat0 = 30)
  3. coord_map('gilbert') 1.coord_map('orthographic')

Plotting polygons in sf

In the last example, we plotted a polygon simply by connecting points with lines. However, other data sources have polygon spatial data stored differently. In the next example, we will plot using a shape (.shp) file in the package sf. We will plot the major rivers or streams in Wake County. The file is already in your Posit Cloud workspace. To load in the shape file using sf, you will use the function st_read.

library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
rivers <- st_read('data/Major_Rivers/')
## Reading layer `Major_Rivers' from data source 
##   `/cloud/project/lessons/mapping/data/Major_Rivers' using driver `ESRI Shapefile'
## replacing null geometries with empty geometries
## Simple feature collection with 9421 features and 5 fields (with 42 geometries empty)
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 612248 ymin: 197064.4 xmax: 677032.4 ymax: 257607
## Projected CRS: NAD83 / North Carolina

You can already see that this is different than a regular csv file, with information about the geographic data shown as you import the data. Specifically, you can see that the coordinates are XY, a “bounding box” or dimensions, and a coordinate system (Projected CRS).

You can also see in the data a column for the geometry. This determines the shape of each stream.

head(rivers)
## Simple feature collection with 6 features and 5 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 615889.6 ymin: 218627.2 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
##   OBJECTID        FTYPE FCODE         NAME       RCH_CODE
## 1        1 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 2        2 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 3        3 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 4        4 STREAM/RIVER 46000 Beaver Creek 03030002000011
## 5        5    CONNECTOR 33400 Beaver Creek 03030002000011
## 6        6    CONNECTOR 33400 Beaver Creek 03030002000012
##                         geometry
## 1 LINESTRING (616459.2 219804...
## 2 LINESTRING (616355.1 219793...
## 3 LINESTRING (616225.6 219798...
## 4 LINESTRING (615944.7 218709...
## 5 LINESTRING (615895.2 218702...
## 6 LINESTRING (616331.7 218627...
rivers$geometry[1]
## Geometry set for 1 feature 
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 616369 ymin: 219792.1 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
## LINESTRING (616459.2 219804.9, 616452.3 219801....

You can use the functions st_bbox to find the boundaries of the file and st_crs to view the coordinate system information.

st_bbox(rivers)
##     xmin     ymin     xmax     ymax 
## 612248.0 197064.4 677032.4 257607.0
st_crs(rivers)
## Coordinate Reference System:
##   User input: NAD83 / North Carolina 
##   wkt:
## PROJCRS["NAD83 / North Carolina",
##     BASEGEOGCRS["NAD83",
##         DATUM["North American Datum 1983",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4269]],
##     CONVERSION["SPCS83 North Carolina zone (meters)",
##         METHOD["Lambert Conic Conformal (2SP)",
##             ID["EPSG",9802]],
##         PARAMETER["Latitude of false origin",33.75,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8821]],
##         PARAMETER["Longitude of false origin",-79,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8822]],
##         PARAMETER["Latitude of 1st standard parallel",36.1666666666667,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",34.3333333333333,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["Easting at false origin",609601.22,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8826]],
##         PARAMETER["Northing at false origin",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8827]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["USA - North Carolina"],
##         BBOX[33.83,-84.33,36.59,-75.38]],
##     ID["EPSG",32119]]

To plot the streams, you can use the layer geom_sf. You do not need to specify any x and y values here as you did in the last example. Because there is already a coordinate system associated with the file, ggplot will automatically plot in the given coordinate system.

ggplot() +
  geom_sf(data = rivers, color = 'blue')

Although the data has associated geometries, you can still manipulate the data frame, for example, using dplyr.

We can use distinct to find each stream in the data or can filter for specific streams.

rivers |>
  distinct(NAME)
##                               NAME
## 1                     Reedy Branch
## 2                     Beaver Creek
## 3                       Big Branch
## 4                     Middle Creek
## 5              Little Beaver Creek
## 6                     Thomas Creek
## 7                   Terrible Creek
## 8                    Kenneth Creek
## 9                     Mills Branch
## 10                    Hector Creek
## 11          Little White Oak Creek
## 12                 White Oak Creek
## 13                   Little Branch
## 14                  Buckhorn Creek
## 15                   Haleys Branch
## 16                       Kit Creek
## 17                   Buffalo Creek
## 18                    Little River
## 19                     Marks Creek
## 20                     Utley Creek
## 21                     Basal Creek
## 22                    Neills Creek
## 23                     Black Creek
## 24              Little Black Creek
## 25                 Beaverdam Creek
## 26                     Neuse River
## 27                     Horse Creek
## 28                      Mud Branch
## 29                    Lowery Creek
## 30                 New Light Creek
## 31                      Water Fork
## 32          Little Beaverdam Creek
## 33              Lower Barton Creek
## 34              Upper Barton Creek
## 35                 Honeycutt Creek
## 36                    Pierce Creek
## 37                      Lick Creek
## 38                    Guffy Branch
## 39                    Little Creek
## 40                     Swift Creek
## 41                   Mahlers Creek
## 42                  Whiteoak Creek
## 43                    Poplar Creek
## 44                     Cedar Creek
## 45                  Panther Branch
## 46                  Juniper Branch
## 47                   Sanford Creek
## 48                    Harris Creek
## 49                Dutchmans Branch
## 50                  Richland Creek
## 51                    Austin Creek
## 52                     Perry Creek
## 53                     Smith Creek
## 54                    Hominy Creek
## 55                      Cedar Fork
## 56                  Moccasin Creek
## 57              Fowlers Mill Creek
## 58                  Crabtree Creek
## 59                    Turkey Creek
## 60                    Coles Branch
## 61                    Walnut Creek
## 62                     Camp Branch
## 63                    Rocky Branch
## 64                     Lynn Branch
## 65                  Speight Branch
## 66                     Long Branch
## 67                    Snipes Creek
## 68                    Nancy Branch
## 69                   Morris Branch
## 70                 Bachelor Branch
## 71                     Jack Branch
## 72                     Reedy Creek
## 73                     House Creek
## 74 Southwest Prong Beaverdam Creek
## 75                   Panther Creek
## 76                     Brier Creek
## 77             Pigeon House Branch
## 78                  Wildcat Branch
## 79                Hare Snipe Creek
## 80                      Toms Creek
## 81                 Northeast Creek
## 82                    Indian Creek
## 83                     Pots Branch
## 84                     Rocky Creek
## 85                     Marsh Creek
## 86                  Bridges Branch
## 87                 Cemetery Branch
## 88                     Mango Creek
## 89                     Mingo Creek
## 90                   Cabtree Creek
## 91                     UTLEY CREEK
## 92                 Crabtree Branch
rivers |>
  filter(NAME == 'Cemetery Branch') 
## Simple feature collection with 33 features and 5 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 643169 ymin: 225367.5 xmax: 643744.3 ymax: 227726.8
## Projected CRS: NAD83 / North Carolina
## First 10 features:
##    OBJECTID        FTYPE FCODE            NAME       RCH_CODE
## 1      2129    CONNECTOR 33400 Cemetery Branch 03020201002398
## 2      2130    CONNECTOR 33400 Cemetery Branch 03020201002398
## 3      2132    CONNECTOR 33400 Cemetery Branch 03020201002398
## 4      2133    CONNECTOR 33400 Cemetery Branch 03020201002398
## 5      2135 STREAM/RIVER 46000 Cemetery Branch 03020201002398
## 6      2137    CONNECTOR 33400 Cemetery Branch 03020201002398
## 7      2138    CONNECTOR 33400 Cemetery Branch 03020201002398
## 8      2139    CONNECTOR 33400 Cemetery Branch 03020201002398
## 9      2140    CONNECTOR 33400 Cemetery Branch 03020201002398
## 10     2577 STREAM/RIVER 46000 Cemetery Branch 03020201002398
##                          geometry
## 1  LINESTRING (643609.2 227309...
## 2  LINESTRING (643557.5 227237...
## 3  LINESTRING (643521.4 227042...
## 4  LINESTRING (643403.4 226693...
## 5  LINESTRING (643397 226439.2...
## 6  LINESTRING (643294.6 226121...
## 7  LINESTRING (643214.5 225898...
## 8  LINESTRING (643170.5 225626...
## 9  LINESTRING (643240.6 225510...
## 10 LINESTRING (643605 227283.4...

For fun, let’s plot just Cemetery Branch over the other streams in a different color to highlight it. Here, we can also use the theme_void to remove gridlines and axes.

cemetery_branch<- rivers |>
  filter(NAME == 'Cemetery Branch')

ggplot() +
  geom_sf(data = rivers, color ='cornflowerblue') +
  geom_sf(data = cemetery_branch, color = 'red') + 
  theme_void()

Practice

In addition, you can layer multiple shape files. In your data folder, you also have a shape file for the Wake County Line.

Plot the Wake County line with the stream data. This file is called “Wake_County_Line-shp”.

You will be able to get something that looks like this:

Plotting points and changing coordinate systems

It is pretty straightforward to plot points on maps since you can plot based on the x and y coordinates. You can plot points in the same way that you have done before with x and y values. With GPS coordinates, the x is longitude and the y is latitude.

You can also use the function st_point in the sf package for geographic points.

park <- st_point(c(-78.625, 35.795))

To add the point to the plot, again use geom_sf since you have make stored the point as an sf object.

What happens when you add the point to the previous plot?

When plotting different layers, you have to make sure the coordinate systems are the same. You can see that the units for each are still not the same.

st_bbox(park)
##    xmin    ymin    xmax    ymax 
## -78.625  35.795 -78.625  35.795
st_bbox(rivers)
##     xmin     ymin     xmax     ymax 
## 612248.0 197064.4 677032.4 257607.0

We can change the coordinate systems using st_transform. We will do this for each of the objects to make sure they are on the same system. Most coordinate systems have a unique EPSG number. Here, we will use EPSG 4269, which is the same North American Datum 1983 system as before, but by transforming it, it will turn units to GPS coordinates. We also have to set the crs when making the point in for the point by setting the crs with st_stc.

rivers <- st_transform(rivers, crs = 4269)
park <- st_sfc(park,crs = 4269)

Now, plot again.

ggplot() +
  geom_sf(data = rivers, color = 'blue') + 
  geom_sf(data = park, color = 'red') +
  theme_void()

This previous way of plotted points used a spatial data object for the points. However, it can also be done by simply using geom_point(). This does require again that the units of the axes be in GPS coordinates.

ggplot() +
  geom_sf(data = rivers, color = 'blue')+
  geom_point(aes(x = -78.62475, y = 35.79474), color = 'red') +
    theme_void()

Working with multiple shape files

We are only scratching the surface of all you can do with spatial data. For example, you may want to work with multiple shape files or different types of spatial data. There are many functions in sf to work with multiple files including:

  • st_intersection
  • st_join
  • st_crop
  • st_mask
  • st_union

If we decide to work on spatial project, we can investigate these more.

Water Quality Data

In class, we will work on plotting the streams that we are working on for the water quality in Raleigh Project.

When we learned about joins, we used a data file that had the Site IDs matched to a stream name. This file also has latitude and longitude coordinates for where streams were measured.

In this part of the lesson, we will make a map showing where stream site locations.

Import in the stream_codes data file into R.

stream_codes <- read.csv('https://maryglover.github.io/bio331/open_data/stream_codes.csv')

head(stream_codes)
##   Site           Stream      lat      long
## 1 BDB1 Beaverdam Branch 35.82430 -78.65056
## 2  BB2       Big Branch 35.82264 -78.62973
## 3 BBS3 Big Branch South 35.74339 -78.56879
## 4  CC4   Crabtree Creek 35.84380 -78.72660
## 5  CC5   Crabtree Creek 35.79123 -78.58823
## 6 HSC6 Hare Snipe Creek 35.84504 -78.68847

Based on what you have learned so far, what do you think you would need to do to add the site locations to the map of streams in Wake County?

Let’s plot this together, thinking about how to make the map visually appealing.

Plotting data on maps

With the City of Raleigh water quality data set, choose 1 parameter to investigate.

  1. Group_by the sites and summarize the parameter to get the average across all of the years (or filter for a subset of the years).
  2. Join in the latitude and longitude data from the stream codes data with the summary table.
  3. Plot the streams from the stream shape file and the points for each of the stream sites.
  4. Change the color of the plot based on the parameter that you summarized.

New functions

ggplot2

  • geom_polygon()
  • coord_map()

sf

  • st_read()
  • st_transform()
  • st_bbox()
  • st_crs()
  • st_sfc()
  • geom_sf()

Resources

Road map layers

In the maps we have been making so far, we are simply plotting on an empty map. However, it is nice to add in some map features in the background, like roads or sateline images. To do this we can use the ggspatial package which has some “map tiles” loaded in. We can do this with the annotation_map_tile function. There are different types of tiles you can use including:

  • cartolight
  • osm
  • opencycle
  • cartodark
  • loviniacycle

Here, I am plotting the Raleigh stream system, that we used before, and the ‘cartolight’ map.

library(ggspatial) 
ggplot() +
annotation_map_tile(type = 'cartolight', zoomin = -1)+
geom_sf(data = rivers, color = 'darkblue', linewidth = 1.5) +
theme_void()  
## Zoom: 10

There are additional map layers that you can find to use as base plot, but it may take some searching to find 1) free ones and 2) how to download them.