Spatial Data with R

Paritosh Kumar Roy

https://paritoshkroy.quarto.pub/

Spatial Data

  • Spatial data inherently describe the location and geometric properties of objects in space.
  • All spatial data include positional information to define their locations.
  • Positional information enables us to:
    • Understand where objects are located relative to one another.
    • Determine their position on the surface (Earth’s surface or any-other surface).

Types of Spatial Data

Spatial data can be represented using vector and raster data.

Vector Data

  • Represents features using points, lines, and polygons.
    • Example: A map of roads (lines), cities (points), and administrative boundary of a country (polygons).

Raster Data

  • Represents a surface as a grid of cells (pixels), where each cell has one or more values corresponding to a specific attribute. For example:
    • TV displays \(1080 \times 1920\) raster images
      • the geometry is the coordinate of each pixel
      • the attributes are red, green and blue values
    • Satellite imagery or digital elevation models
  • A raster is like a matrix, \(\cdots\) with some positional information attached

Components of Positional Information

Positional information in spatial data typically includes:

Coordinates

  • Numerical values that define the location of points, lines, or polygons on a two-dimensional (2D) or three-dimensional (3D) plane.
    • Example: Latitude and longitude in a geographic coordinate system.

Geographic Coordinate Reference System

  • Uses latitude and longitude based on a spherical model of the Earth (e.g., WGS84).
    • Projected Coordinate System: Uses a flat, 2D representation of the Earth (e.g., Universal Transverse Mercator, UTM).

Elevation or Depth

  • In 3D data, positional information includes a vertical component (e.g., altitude or depth).

Coordinates

Latitude

Longitude

  • Latitude is a geographic coordinate that measures a point’s north-south position on Earth, ranging from \(0^\circ\) at the Equator to \(90^\circ\) at the poles.
  • Longitude is a geographic coordinate measuring a point’s east-west position on Earth, ranging from \(0^\circ\) at the Prime Meridian to \(\pm 180^\circ\).
  • Latitude and longitude of any place are based on the sizes of two angles that originate at the center of Earth.

Coordinates

Source: Encyclopædia Britannica, Inc.

Source: Encyclopædia Britannica, Inc.
  • The coordinates for London is c(-0.1, 51.5) means that its location is -0.1 degrees east and 51.5 degrees north of the origin.
  • The origin in this case is at 0 degrees longitude (Prime Meridian) and 0 degrees latitude (Equator) in a geographic (lon/lat) CRS.

Coordinates Reference System (CRS)

  • World Geodetic System 1984: The coordinates for London is (-0.1, 51.5) means that its location is -0.1 degrees east and 51.5 degrees north of the origin, (0,0).

  • Universal Transverse Mercator: Uses different origin ensuring all locations in the UK have positive Easting and Northing values and relative distances can be measured in meter.

Information on CRS

Popular online resource providing information and reference materials on CRS are:

Introduction to Simple Features: Using sf Package

  • sf stands for simple feature defined by the OpenGIS abstract specification to have both spatial (geometry) and non-spatial attributes
  • sf uses well-known text (WKT) representation of coordinate reference systems
  • Three primitive spatial data type in sf package are:

Introduction to Simple Features: Using sf Package

  • Multi-part geometries spatial data type

Create points object in sf objects

library(sf)
pts1 <- st_point(c(10,40)); pts2 <- st_point(c(40,30))
st_geometry(pts1)
Geometry set for 1 feature 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10 ymin: 40 xmax: 10 ymax: 40
CRS:           NA
pts3 <- st_point(c(20,20)); pts4 <- st_point(c(30,10))
pts <- st_multipoint(cbind(c(10,40,20,30),c(40,30,20,10)))
st_geometry(pts)
Geometry set for 1 feature 
Geometry type: MULTIPOINT
Dimension:     XY
Bounding box:  xmin: 10 ymin: 10 xmax: 40 ymax: 40
CRS:           NA

Create points object in sf objects

ggplot(pts) + geom_sf(col = "blue", size = 2) + theme_bw()

Create lines object sf objects

coords1 <- cbind(c(10,20,10), c(10,20,40))
ls1 <- st_linestring(coords1)
st_geometry(ls1)
Geometry set for 1 feature 
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 10 ymin: 10 xmax: 20 ymax: 40
CRS:           NA
coords2 <- cbind(c(40,30,40,30), c(40,30,20,10))
ls2 <- st_linestring(coords2)
ls <- st_multilinestring(list(coords1, coords2))
st_geometry(ls)
Geometry set for 1 feature 
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 10 ymin: 10 xmax: 40 ymax: 40
CRS:           NA

Create lines object sf objects

ggplot(ls) + geom_sf(col = "blue") + theme_bw()

Create polygons object sf objects

cor1 <- list(cbind(c(30,45,10,30),c(20,40,40,20)))
pol1 <- st_polygon(cor1)
st_geometry(pol1)
Geometry set for 1 feature 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 10 ymin: 20 xmax: 45 ymax: 40
CRS:           NA
cor2 <- list(cbind(c(15,40,10,5,15),c(5,10,20,10,5)))
pol2 <- st_polygon(cor2)
pol <- st_multipolygon(list(cor1, cor2))
st_geometry(pol)
Geometry set for 1 feature 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 5 ymin: 5 xmax: 45 ymax: 40
CRS:           NA

Create polygons object sf objects

ggplot(pol) + geom_sf(aes(geometry = geometry), col = "blue",
                      linewidth = 1, fill = "gray90") + theme_bw()

Create polygons object sf objects

cor1 <- list(cbind(c(40,20,45,40),c(40,45,30,40)))
cor21 <- list(cbind(c(20,10,10,30,45,20),c(35,30,10,5,20,35)))
cor22 <- list(cbind(c(30,20,20,30),c(20,15,25,20)))
pol <- st_multipolygon(list(cor1, c(cor21,cor22)))
st_geometry(pol)
Geometry set for 1 feature 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 10 ymin: 5 xmax: 45 ymax: 45
CRS:           NA

Create polygons object sf objects

ggplot(pol) + geom_sf(aes(geometry = geometry), col = "blue",
                      linewidth = 1, fill = "gray90") + theme_bw()

Elements of a sf object

  • The sf object is a data.frame containing a collection of simple features (rows) and attributes (columns) plus a list-column with the geometry of each feature.
  • A sf object contains the following objects of class sf, sfc and sfg:
    • sf (simple feature): each row of the data.frame is a single simple feature consisting of attributes and geometry.
    • sfc (simple feature geometry list-column): the geometry column of the data.frame is a list-column of class sfc with the geometry of each simple feature.
    • sfg (simple feature geometry): each of the rows of the sfc list-column corresponds to the simple feature geometry (sfg) of a single simple feature.

Some operations on point object using sf

Create simple feature geometry (sfc) list column

Toronto <- st_point(c(-79.3819, 43.6525))
Montreal <- st_point(c(-73.5833, 45.50))

Create geometry set for 2 features

sfc <- st_sfc(Toronto, Montreal) 
sfc_dt <- data.frame(Name = c("Toronto", "Montreal"),  geom = sfc)
cities <- st_sf(sfc_dt)
cities
Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -79.3819 ymin: 43.6525 xmax: -73.5833 ymax: 45.5
CRS:           NA
      Name                 geometry
1  Toronto POINT (-79.3819 43.6525)
2 Montreal    POINT (-73.5833 45.5)

Some operations on point object using sf

ggplot() + 
  geom_sf(data=cities, shape=1, size=5, col=2) + theme_bw()

Some operations on point object using sf

Retrieve coordinate reference system and set CRS

st_crs(4326)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["unknown"],
        AREA["World"],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Some operations on point object using sf

Retrieve coordinate reference system and set CRS

st_crs(2958)
Coordinate Reference System:
  User input: EPSG:2958 
  wkt:
PROJCRS["NAD83(CSRS) / UTM zone 17N",
    BASEGEOGCRS["NAD83(CSRS)",
        DATUM["NAD83 Canadian Spatial Reference System",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4617]],
    CONVERSION["UTM zone 17N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-81,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["Canada - 84°W to 78°W"],
        BBOX[41.67,-84,84,-78]],
    ID["EPSG",2958]]

Some operations on point object using sf

Retrieve coordinate reference system and set CRS

cities_sf <- st_set_crs(cities, value = st_crs(4326))
ggplot(cities_sf) + geom_sf() + theme_bw()

Some operations on point object using sf

CRS transformation

Suppose we want to convert points form long-lat to UTM

  • use the st_transform function
  • \(\cdots\) and specify a new CRS
cities_sf <- st_set_crs(cities, value = st_crs(4326))
cities_utm <- st_transform(cities_sf, crs = st_crs(2958))
cities_utm
Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 630485.3 ymin: 4834550 xmax: 1079413 ymax: 5065319
Projected CRS: NAD83(CSRS) / UTM zone 17N
      Name                 geometry
1  Toronto POINT (630485.3 4834550)
2 Montreal  POINT (1079413 5065319)

Exploring elements of sf object

Return bounding of a simple feature or simple feature set

st_bbox(cities_sf)
    xmin     ymin     xmax     ymax 
-79.3819  43.6525 -73.5833  45.5000 
st_bbox(cities_utm)
     xmin      ymin      xmax      ymax 
 630485.3 4834550.0 1079413.3 5065319.0 

Exploring elements of sf object

Retrieve coordinates in matrix form

st_coordinates(cities_sf)
            X       Y
[1,] -79.3819 43.6525
[2,] -73.5833 45.5000
st_coordinates(cities_utm)
             X       Y
[1,]  630485.3 4834550
[2,] 1079413.3 5065319

Some operations on point object using sf

Compute distances

sqrt(sum((st_coordinates(cities_utm)[1,] - st_coordinates(cities_utm)[2,])^2))
[1] 504767.9
dist(st_coordinates(cities_utm))
         1
2 504767.9
st_distance(cities_utm, which = "Euclidean")
Units: [m]
         1        2
1      0.0 504767.9
2 504767.9      0.0
st_distance(cities_sf, which = "Great Circle")
Units: [m]
         [,1]     [,2]
[1,]      0.0 502976.2
[2,] 502976.2      0.0

Create sf point object from data.frame

To convert a data frame object to sf, use st_as_sf()

stuff <- data.frame(lon = c(-79.3819, -73.5833), lat = c(43.6525, 45.5))
stuff_sf <- st_as_sf(x = stuff, coords =  c("lon", "lat"), crs = st_crs(4326))
stuff_sf
Simple feature collection with 2 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -79.3819 ymin: 43.6525 xmax: -73.5833 ymax: 45.5
Geodetic CRS:  WGS 84
                  geometry
1 POINT (-79.3819 43.6525)
2    POINT (-73.5833 45.5)

Present these points through map

library(leaflet)
leaflet(cities_sf) %>% addTiles() %>% addMarkers()

Some operations on polygon object using sf

  • Create sf object from polygon
pol1 <- st_polygon(list(cbind(c(30,20,10,30), c(10,40,20,10))))
pol2 <- st_polygon(list(cbind(c(30,40,20,30), c(10,40,40,10))))
pol_sfc <- st_sfc(pol1,pol2)
pol_sf <- st_sf(data.frame(Name = c("A","B"), geom = pol_sfc))
ggplot(pol_sf) + geom_sf(aes(fill = Name)) + theme_bw()

Some operations on polygon object using sf

  • Add an attribute (column or variable) to the data
pol_sf <- pol_sf %>% mutate(Value = runif(2))
ggplot(pol_sf) + geom_sf(aes(fill = Value)) + theme_bw()

Some operations on polygon object using sf

  • st_centroid gives the centroid of a geometry
pol_centroid <- st_centroid(pol_sf)
ggplot(pol_sf) + geom_sf(fill = "gray90") + 
  geom_sf(data = pol_centroid, col = "blue") + theme_bw()

Some operations on polygon object using sf

  • Select a single simple feature or some simple features that match a condition.
pol_A_sf <- pol_sf %>% filter(Name == "A")
ggplot(pol_A_sf) + geom_sf(fill = "gray90") + theme_bw()

Some operations on polygon object using sf

  • st_union unions the input geometries to produce a single geometry
pol_union_sf <- st_union(pol_sf)
ggplot(pol_union_sf) + geom_sf(fill = "gray90") + theme_bw()

Some operations on polygon object using sf

  • st_buffer unions the input geometries, merging geometry to produce a result geometry with no overlaps
pol_buffer_sf <- st_buffer(pol_union_sf, dist = 1)
ggplot() + geom_sf(data = pol_union_sf, fill = "gray90") +
  geom_sf(data = pol_buffer_sf, fill = NA, col = "blue", linewidth = 1) + 
  theme_bw()

Some operations on polygon object using sf

  • st_make_grid create a regular tiling over the bounding box of an sf or sfc object
pol_grid_sf <- st_make_grid(pol_union_sf, n = c(10,10), what = "polygons")
corner_grid_sf <- st_make_grid(pol_union_sf, n = c(10,10), what = "corners")
center_grid_sf <- st_make_grid(pol_union_sf, n = c(10,10), what = "centers")

Some operations on polygon object using sf

ggplot() + geom_sf(data = pol_union_sf, fill = "gray90") +
  geom_sf(data = pol_grid_sf, fill = NA, col = "green") + 
  geom_sf(data = corner_grid_sf, col = "red") + 
  geom_sf(data = center_grid_sf, col = "blue") + theme_bw()

Some operations on polygon object using sf

  • st_intersection create a regular tiling over the bounding box of an sf or sfc object
center_in_pol_sf <- st_intersection(pol_union_sf, center_grid_sf)
corner_in_pol_sf <- st_intersection(pol_union_sf, corner_grid_sf)

Some operations on polygon object using sf

ggplot() + geom_sf(data = pol_union_sf, fill = "gray90") +
  geom_sf(data = center_in_pol_sf, col = "red") + 
  geom_sf(data = corner_in_pol_sf, col = "blue") + theme_bw()

File Format for Spatial Data

  • Shapefile is the standard file format for spatial point or polygon data which actually 4 or more files comprise a single spatial dataset
    • .shp file contains the geometry
    • .dbf file contains attributes
    • .shx links the geometry to attributes
    • .prj gives the projection
  • GeoPackage is an open, standards-based, platform-independent, portable, self-describing, compact format (.gpkg) for transferring geospatial information.
  • GeoJSON is a format for encoding a variety of geographic data structures (.geojson).
  • read_sf() function of sf is used to read these files.
  • write_sf() function of sf is used to save these files.

Making maps with R

Consider a map of the world, where each county is a polygon.

world <- ne_countries(scale = "medium", type = "countries", returnclass = "sf")
class(world)
[1] "sf"         "data.frame"
st_geometry(world)
Geometry set for 242 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -89.99893 xmax: 180 ymax: 83.59961
Geodetic CRS:  WGS 84
First 5 geometries:
str(world$admin)
 chr [1:242] "Zimbabwe" "Zambia" "Yemen" "Vietnam" "Venezuela" "Vatican" ...

Making maps with R

ggplot(world) + geom_sf(fill = NA) +  coord_sf() + theme_bw()

Making maps with R

canada <- world %>% filter(admin == "Canada")
class(canada)
[1] "sf"         "data.frame"
st_geometry(canada)
Geometry set for 1 feature 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -141.0021 ymin: 41.67485 xmax: -52.65366 ymax: 83.11611
Geodetic CRS:  WGS 84

Making maps with R

ggplot(canada) + geom_sf(fill = NA) + coord_sf() + theme_bw()

Making maps with R

library(leaflet)
leaflet(canada) %>% addTiles() %>% 
  addPolygons(fillColor = "red", weight = 1, color = "black")

Making maps with R

Ontario

on <- read_sf("Province/Province.shp")
st_crs(on)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]

Making maps with R

Ontario

on_sf <- st_transform(on, crs = st_crs(4326))
on_utm <- st_transform(on, crs = st_crs(6836))

Making maps with R

Ontario

library(leaflet)
leaflet(on_sf) %>% addTiles() %>% 
  addPolygons(fillColor = "red", weight = 1, color = "blue")

Acknowledgement