You can use this template script file to work through the lesson.
In this lesson, students will learn how to work with spatial data, or data with geographic information. There are different packages that in R that handle spatial data. In these packages, spatial data is stored as a special class (ex. character, number, etc.) which stores the geographic information.
To start, let’s plot North Carolina with county lines.
ggplot2 contains some mapping data that you can use without
having to import from your computer. Here, we will use the function
map_data to get the USA map data, and then filter by “north
carolina” as you usually would with data frames in
dplyr.
library(ggplot2)
library(dplyr)
map <- map_data("county")
nc <- map |>
filter(region =="north carolina")
head(nc)
## long lat group order region subregion
## 1 -79.53800 35.84424 1857 54915 north carolina alamance
## 2 -79.54372 35.89008 1857 54916 north carolina alamance
## 3 -79.53800 35.98175 1857 54917 north carolina alamance
## 4 -79.52081 36.23385 1857 54918 north carolina alamance
## 5 -79.26298 36.23385 1857 54919 north carolina alamance
## 6 -79.27444 35.90726 1857 54920 north carolina alamance
In this data, there are points that will be connected into polygons to plot each county based on the “group”. In the aesthetics, we set x to the longitude and y to the latitude.
ggplot(data = nc, aes(x = long, y = lat, group = group)) +
geom_polygon()
There are different types of geographic information that we can plot.
When plotting data as a polygon, like we are doing with the North
Carolina map, points are connected with lines to make a shape. You can
plot the different types of spatial data, like points, lines, and
polygons, using ggplot layers. With the polygon, we will use
geom_polygon()
You can edit the plot in the same way that you did other plots in
ggplot. For example, now we can color each of the the
counties, which are in the “subregion” column, a different color and
change the theme.
ggplot(data = nc, aes(x = long, y = lat, group = group)) +
geom_polygon(aes( fill = subregion))+
theme_void()+
theme(legend.position = 'none')
With plotting spatial data, you have to consider the coordinate system, which determines how the spherical data from the earth is plotted flattened as a 2-D map. There are different coordinate systems that you can use. The default is cartesian coordinates, where x and y coordinates are 1:1.
See what happens when you change the coordinate systems by adding some of the following layers:
coord_map(): Mercator projectioncoord_map("conic", lat0 = 30)coord_map('gilbert')
1.coord_map('orthographic')In the last example, we plotted a polygon simply by connecting points
with lines. However, other data sources have polygon spatial data stored
differently. In the next example, we will plot using a shape (.shp) file
in the package sf. We will plot the major rivers or streams
in Wake County. The file is already in your Posit Cloud workspace. To
load in the shape file using sf, you will use the function
st_read.
library(sf)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
rivers <- st_read('data/Major_Rivers/')
## Reading layer `Major_Rivers' from data source
## `/cloud/project/lessons/mapping/data/Major_Rivers' using driver `ESRI Shapefile'
## replacing null geometries with empty geometries
## Simple feature collection with 9421 features and 5 fields (with 42 geometries empty)
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: 612248 ymin: 197064.4 xmax: 677032.4 ymax: 257607
## Projected CRS: NAD83 / North Carolina
You can already see that this is different than a regular csv file, with information about the geographic data shown as you import the data. Specifically, you can see that the coordinates are XY, a “bounding box” or dimensions, and a coordinate system (Projected CRS).
You can also see in the data a column for the geometry. This determines the shape of each stream.
head(rivers)
## Simple feature collection with 6 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 615889.6 ymin: 218627.2 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
## OBJECTID FTYPE FCODE NAME RCH_CODE
## 1 1 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 2 2 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 3 3 STREAM/RIVER 46000 Reedy Branch 03030002002226
## 4 4 STREAM/RIVER 46000 Beaver Creek 03030002000011
## 5 5 CONNECTOR 33400 Beaver Creek 03030002000011
## 6 6 CONNECTOR 33400 Beaver Creek 03030002000012
## geometry
## 1 LINESTRING (616459.2 219804...
## 2 LINESTRING (616355.1 219793...
## 3 LINESTRING (616225.6 219798...
## 4 LINESTRING (615944.7 218709...
## 5 LINESTRING (615895.2 218702...
## 6 LINESTRING (616331.7 218627...
rivers$geometry[1]
## Geometry set for 1 feature
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 616369 ymin: 219792.1 xmax: 616459.2 ymax: 219804.9
## Projected CRS: NAD83 / North Carolina
## LINESTRING (616459.2 219804.9, 616452.3 219801....
You can use the functions st_bbox to find the boundaries
of the file and st_crs to view the coordinate system
information.
st_bbox(rivers)
## xmin ymin xmax ymax
## 612248.0 197064.4 677032.4 257607.0
st_crs(rivers)
## Coordinate Reference System:
## User input: NAD83 / North Carolina
## wkt:
## PROJCRS["NAD83 / North Carolina",
## BASEGEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]],
## CONVERSION["SPCS83 North Carolina zone (meters)",
## METHOD["Lambert Conic Conformal (2SP)",
## ID["EPSG",9802]],
## PARAMETER["Latitude of false origin",33.75,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8821]],
## PARAMETER["Longitude of false origin",-79,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8822]],
## PARAMETER["Latitude of 1st standard parallel",36.1666666666667,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",34.3333333333333,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["Easting at false origin",609601.22,
## LENGTHUNIT["metre",1],
## ID["EPSG",8826]],
## PARAMETER["Northing at false origin",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8827]]],
## CS[Cartesian,2],
## AXIS["easting (X)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["northing (Y)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["USA - North Carolina"],
## BBOX[33.83,-84.33,36.59,-75.38]],
## ID["EPSG",32119]]
To plot the streams, you can use the layer geom_sf. You
do not need to specify any x and y values here as you did in the last
example. Because there is already a coordinate system associated with
the file, ggplot will automatically plot in the given coordinate
system.
ggplot() +
geom_sf(data = rivers, color = 'blue')
Although the data has associated geometries, you can still manipulate
the data frame, for example, using dplyr.
We can use distinct to find each stream in the data or
can filter for specific streams.
rivers |>
distinct(NAME)
## NAME
## 1 Reedy Branch
## 2 Beaver Creek
## 3 Big Branch
## 4 Middle Creek
## 5 Little Beaver Creek
## 6 Thomas Creek
## 7 Terrible Creek
## 8 Kenneth Creek
## 9 Mills Branch
## 10 Hector Creek
## 11 Little White Oak Creek
## 12 White Oak Creek
## 13 Little Branch
## 14 Buckhorn Creek
## 15 Haleys Branch
## 16 Kit Creek
## 17 Buffalo Creek
## 18 Little River
## 19 Marks Creek
## 20 Utley Creek
## 21 Basal Creek
## 22 Neills Creek
## 23 Black Creek
## 24 Little Black Creek
## 25 Beaverdam Creek
## 26 Neuse River
## 27 Horse Creek
## 28 Mud Branch
## 29 Lowery Creek
## 30 New Light Creek
## 31 Water Fork
## 32 Little Beaverdam Creek
## 33 Lower Barton Creek
## 34 Upper Barton Creek
## 35 Honeycutt Creek
## 36 Pierce Creek
## 37 Lick Creek
## 38 Guffy Branch
## 39 Little Creek
## 40 Swift Creek
## 41 Mahlers Creek
## 42 Whiteoak Creek
## 43 Poplar Creek
## 44 Cedar Creek
## 45 Panther Branch
## 46 Juniper Branch
## 47 Sanford Creek
## 48 Harris Creek
## 49 Dutchmans Branch
## 50 Richland Creek
## 51 Austin Creek
## 52 Perry Creek
## 53 Smith Creek
## 54 Hominy Creek
## 55 Cedar Fork
## 56 Moccasin Creek
## 57 Fowlers Mill Creek
## 58 Crabtree Creek
## 59 Turkey Creek
## 60 Coles Branch
## 61 Walnut Creek
## 62 Camp Branch
## 63 Rocky Branch
## 64 Lynn Branch
## 65 Speight Branch
## 66 Long Branch
## 67 Snipes Creek
## 68 Nancy Branch
## 69 Morris Branch
## 70 Bachelor Branch
## 71 Jack Branch
## 72 Reedy Creek
## 73 House Creek
## 74 Southwest Prong Beaverdam Creek
## 75 Panther Creek
## 76 Brier Creek
## 77 Pigeon House Branch
## 78 Wildcat Branch
## 79 Hare Snipe Creek
## 80 Toms Creek
## 81 Northeast Creek
## 82 Indian Creek
## 83 Pots Branch
## 84 Rocky Creek
## 85 Marsh Creek
## 86 Bridges Branch
## 87 Cemetery Branch
## 88 Mango Creek
## 89 Mingo Creek
## 90 Cabtree Creek
## 91 UTLEY CREEK
## 92 Crabtree Branch
rivers |>
filter(NAME == 'Cemetery Branch')
## Simple feature collection with 33 features and 5 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 643169 ymin: 225367.5 xmax: 643744.3 ymax: 227726.8
## Projected CRS: NAD83 / North Carolina
## First 10 features:
## OBJECTID FTYPE FCODE NAME RCH_CODE
## 1 2129 CONNECTOR 33400 Cemetery Branch 03020201002398
## 2 2130 CONNECTOR 33400 Cemetery Branch 03020201002398
## 3 2132 CONNECTOR 33400 Cemetery Branch 03020201002398
## 4 2133 CONNECTOR 33400 Cemetery Branch 03020201002398
## 5 2135 STREAM/RIVER 46000 Cemetery Branch 03020201002398
## 6 2137 CONNECTOR 33400 Cemetery Branch 03020201002398
## 7 2138 CONNECTOR 33400 Cemetery Branch 03020201002398
## 8 2139 CONNECTOR 33400 Cemetery Branch 03020201002398
## 9 2140 CONNECTOR 33400 Cemetery Branch 03020201002398
## 10 2577 STREAM/RIVER 46000 Cemetery Branch 03020201002398
## geometry
## 1 LINESTRING (643609.2 227309...
## 2 LINESTRING (643557.5 227237...
## 3 LINESTRING (643521.4 227042...
## 4 LINESTRING (643403.4 226693...
## 5 LINESTRING (643397 226439.2...
## 6 LINESTRING (643294.6 226121...
## 7 LINESTRING (643214.5 225898...
## 8 LINESTRING (643170.5 225626...
## 9 LINESTRING (643240.6 225510...
## 10 LINESTRING (643605 227283.4...
For fun, let’s plot just Cemetery Branch over the other streams in a
different color to highlight it. Here, we can also use the
theme_void to remove gridlines and axes.
cemetery_branch<- rivers |>
filter(NAME == 'Cemetery Branch')
ggplot() +
geom_sf(data = rivers, color ='cornflowerblue') +
geom_sf(data = cemetery_branch, color = 'red') +
theme_void()
In addition, you can layer multiple shape files. In your data folder, you also have a shape file for the Wake County Line.
Plot the Wake County line with the stream data. This file is called “Wake_County_Line-shp”.
You will be able to get something that looks like this:
It is pretty straightforward to plot points on maps since you can plot based on the x and y coordinates. You can plot points in the same way that you have done before with x and y values. With GPS coordinates, the x is longitude and the y is latitude.
You can also use the function st_point in the
sf package for geographic points.
park <- st_point(c(-78.625, 35.795))
To add the point to the plot, again use geom_sf since
you have make stored the point as an sf object.
What happens when you add the point to the previous plot?
When plotting different layers, you have to make sure the coordinate systems are the same. You can see that the units for each are still not the same.
st_bbox(park)
## xmin ymin xmax ymax
## -78.625 35.795 -78.625 35.795
st_bbox(rivers)
## xmin ymin xmax ymax
## 612248.0 197064.4 677032.4 257607.0
We can change the coordinate systems using st_transform.
We will do this for each of the objects to make sure they are on the
same system. Most coordinate systems have a unique EPSG number. Here, we
will use EPSG 4269, which is the same
North American Datum 1983 system as before, but by transforming it, it
will turn units to GPS coordinates. We also have to set the crs when
making the point in for the point by setting the crs with
st_stc.
rivers <- st_transform(rivers, crs = 4269)
park <- st_sfc(park,crs = 4269)
Now, plot again.
ggplot() +
geom_sf(data = rivers, color = 'blue') +
geom_sf(data = park, color = 'red') +
theme_void()
This previous way of plotted points used a spatial data object for
the points. However, it can also be done by simply using
geom_point(). This does require again that the
units of the axes be in GPS coordinates.
ggplot() +
geom_sf(data = rivers, color = 'blue')+
geom_point(aes(x = -78.62475, y = 35.79474), color = 'red') +
theme_void()
We are only scratching the surface of all you can do with spatial
data. For example, you may want to work with multiple shape files or
different types of spatial data. There are many functions in
sf to work with multiple files including:
If we decide to work on spatial project, we can investigate these more.
In class, we will work on plotting the streams that we are working on for the water quality in Raleigh Project.
When we learned about joins, we used a data file that had the Site IDs matched to a stream name. This file also has latitude and longitude coordinates for where streams were measured.
In this part of the lesson, we will make a map showing where stream site locations.
Import in the stream_codes data file into R.
stream_codes <- read.csv('https://maryglover.github.io/bio331/open_data/stream_codes.csv')
head(stream_codes)
## Site Stream lat long
## 1 BDB1 Beaverdam Branch 35.82430 -78.65056
## 2 BB2 Big Branch 35.82264 -78.62973
## 3 BBS3 Big Branch South 35.74339 -78.56879
## 4 CC4 Crabtree Creek 35.84380 -78.72660
## 5 CC5 Crabtree Creek 35.79123 -78.58823
## 6 HSC6 Hare Snipe Creek 35.84504 -78.68847
Based on what you have learned so far, what do you think you would need to do to add the site locations to the map of streams in Wake County?
Let’s plot this together, thinking about how to make the map visually appealing.
With the City of Raleigh water quality data set, choose 1 parameter to investigate.
ggplot2
sf
In the maps we have been making so far, we are simply plotting on an
empty map. However, it is nice to add in some map features in the
background, like roads or sateline images. To do this we can use the
ggspatial package which has some “map tiles” loaded in. We
can do this with the annotation_map_tile function. There
are different types of tiles you can use including:
Here, I am plotting the Raleigh stream system, that we used before, and the ‘cartolight’ map.
library(ggspatial)
ggplot() +
annotation_map_tile(type = 'cartolight', zoomin = -1)+
geom_sf(data = rivers, color = 'darkblue', linewidth = 1.5) +
theme_void()
## Zoom: 10
There are additional map layers that you can find to use as base plot, but it may take some searching to find 1) free ones and 2) how to download them.