eBird Status and Trends links bird sightings from hundreds of thousands of eBirders with habitat information from satellites to create predictions of the broad scale patterns of distribution and abundance of bird species. These products allow us to explore movements, range boundaries, and the spatiotemporal pattern of abundance for over 600 North American birds across the entire Western Hemisphere at high spatial and temporal resolution. In addition to exploring the various Status and Trends products through the website, the underlying data are available for download and analysis in R.

In this lab, we’ll learn how to access, analyze, and visualize the Status and Trends data within R using the package ebirdst. As an example, we’ll focus on the Baltimore Oriole, a familiar songbird that breeds in open woodlands throughout the Northeastern United States. Start by opening the Status and Trends page for Baltimore Oriole and explore the products that are available, paying particular attention to the weekly abundance animation and the seasonal abundance maps.

Now that you’re familiar with the products available through Status and Trends, let’s dive into the raw data! We’ll start by learning how to download the data, then we’ll produce maps of occurrence and abundance for a given week.

Setup

Before we dive into the material, there’s some setup that needs to be done to ensure we have all the necessary tools. First, install R and RStudio, which we’ll be using to interact with R. It’s important to have the most recent versions of both R and RStudio, so if you haven’t installed or updated them within the last 6 months, re-install them now. Next, open RStudio and run the following code to install the add on R packages required for this lab, you only need to run this line of code once:

install.packages("remotes")
remotes::install_github("CornellLabofOrnithology/ebirdst")

To get started with this lab, open up RStudio and create a new project from the File menu. This will create a new directory on your computer in which will contain your code and data for this project. Create a new R script (File > New File) to contain all the code in this lab and save it in your project directory. Finally, download this file of GIS data, which contains some contextual GIS layers that we’ll use when making maps. Place this file in a data/ sub-directory of your project.

Download data

The Status and Trends website is a great resources for exploring patterns of bird movement throughout North America, but if we want to make custom visualizations or perform analyses we’ll need to access the data underlying the Status and Trends products. Fortunately, all this raw data is stored in the cloud on the Amazon Web Services Registry of Open Data and the R package ebirdst gives you access to these data. To see a list of all the species available through ebirdst you can explore the ebirdst_runs data frame. Before doing so, we’ll need to attach the add on R packages that we’ll be using throughout this lab using the library() command.

library(raster)
library(sf)
library(dplyr)
library(scales)
library(ggplot2)
library(ebirdst)

glimpse(ebirdst_runs)
#> Observations: 610
#> Variables: 14
#> $ species_code                    <chr> "bbwduc", "fuwduc", "empgoo", "snogoo…
#> $ run_name                        <chr> "bbwduc-ERD2018-EBIRD_SCIENCE-2019110…
#> $ scientific_name                 <chr> "Dendrocygna autumnalis", "Dendrocygn…
#> $ common_name                     <chr> "Black-bellied Whistling-Duck", "Fulv…
#> $ breeding_start_dt               <date> 2018-05-24, 2018-05-10, 2018-05-24, …
#> $ breeding_end_dt                 <date> 2018-08-03, 2018-08-17, 2018-06-21, …
#> $ nonbreeding_start_dt            <date> 2018-01-18, 2018-11-30, NA, 2018-12-…
#> $ nonbreeding_end_dt              <date> 2018-03-01, 2018-02-15, NA, 2018-01-…
#> $ postbreeding_migration_start_dt <date> 2018-08-10, 2018-08-24, NA, 2018-08-…
#> $ postbreeding_migration_end_dt   <date> 2018-01-11, 2018-11-23, NA, 2018-12-…
#> $ prebreeding_migration_start_dt  <date> 2018-03-08, 2018-02-22, NA, 2018-01-…
#> $ prebreeding_migration_end_dt    <date> 2018-05-17, 2018-05-03, NA, 2018-05-…
#> $ year_round_start_dt             <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ year_round_end_dt               <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, …

You can run View(ebirdst_runs) to explore this table interactively within RStudio. The function ebirdst_download() downloads data for a single species; all you need to provide is the name of the species.

species <- "Baltimore Oriole"
dl_path <- ebirdst_download(species)

The variable dl_path now contains the location on your computer where the data were downloaded to. By default, all Status and Trends data are downloaded to a special directory buried within your file system. This is done by design because it’s critical that you don’t change the file structure of the downloaded data package otherwise ebirdst won’t be able to work with the data; however, you can use print(dl_path) to see where the files are.

Let’s use list.files() to see what files were downloaded:

list.files(dl_path, recursive = TRUE)
#>  [1] "data/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_config.rds"                                                   
#>  [2] "data/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_srd_raster_template.tif"                                      
#>  [3] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_lower.tif"                          
#>  [4] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_median.tif"                         
#>  [5] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_seasonal_breeding.tif"              
#>  [6] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_seasonal_nonbreeding.tif"           
#>  [7] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_seasonal_postbreeding_migration.tif"
#>  [8] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_seasonal_prebreeding_migration.tif" 
#>  [9] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_abundance_upper.tif"                          
#> [10] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_count_median.tif"                             
#> [11] "results/tifs/balori-ERD2018-EBIRD_SCIENCE-20191025-bb8f7498_hr_2018_occurrence_median.tif"                        
#> [12] "results/tifs/band_dates.csv"

Notice that most of these files end in .tif. These are GeoTIFFs, which is the most widespread format for storing spatial raster data, i.e. where space is divided up into a regular grid of cells each of which is assigned some quantity such as abundance. The ebirdst website contains a description of what data each of the files contains.

Loading data into R

Now that we’ve downloaded the Baltimore Oriole data, let’s load some into our R session so we can work with it. The R package raster provides tools for working with raster data of the form stored in the GeoTIFFs we just downloaded. The ebirdst function load_raster() loads one of the GeoTIFFs into an object the raster package can work with. For example, to load the weekly abundance and occurrence data, we can use:

occ <- load_raster("occurrence", path = dl_path)
abd <- load_raster("abundance", path = dl_path)

The occurrence layer represents the expected probability of occurrence of the species, ranging from 0 to 1, on an eBird Traveling Count by a skilled eBirder starting at the optimal time of day with the optimal search duration and distance that maximizes detection of that species in a region. The abundance layer represents the expected relative abundance of Baltimore Orioles on an similarly optimal eBird Traveling Count. The abd and occ variables store RasterStack objects and we can use some functions from the raster package to get information about them. Specifically, note that the size of each cell (aka the resolution) is 2.96 km (2,962 m) and that there are 52 layers in each RasterStack corresponding to the estimates of abundance and occurrence for each of the 52 weeks of the year.

res(abd)
#> [1] 2962.807 2962.809
nlayers(abd)
#> [1] 52

The ebirdst function parse_raster_dates() gives us the mid point of the week that each layer in the RasterStack corresponds to.

parse_raster_dates(abd)
#>  [1] "2018-01-04" "2018-01-11" "2018-01-18" "2018-01-25" "2018-02-01"
#>  [6] "2018-02-08" "2018-02-15" "2018-02-22" "2018-03-01" "2018-03-08"
#> [11] "2018-03-15" "2018-03-22" "2018-03-29" "2018-04-05" "2018-04-12"
#> [16] "2018-04-19" "2018-04-26" "2018-05-03" "2018-05-10" "2018-05-17"
#> [21] "2018-05-24" "2018-05-31" "2018-06-07" "2018-06-14" "2018-06-21"
#> [26] "2018-06-28" "2018-07-06" "2018-07-13" "2018-07-20" "2018-07-27"
#> [31] "2018-08-03" "2018-08-10" "2018-08-17" "2018-08-24" "2018-08-31"
#> [36] "2018-09-07" "2018-09-14" "2018-09-21" "2018-09-28" "2018-10-05"
#> [41] "2018-10-12" "2018-10-19" "2018-10-26" "2018-11-02" "2018-11-09"
#> [46] "2018-11-16" "2018-11-23" "2018-11-30" "2018-12-07" "2018-12-14"
#> [51] "2018-12-21" "2018-12-28"

When you explored the Status and Trends website you likely saw static seasonal abundance maps in addition to the animated weekly maps. These seasonal maps are created by taking the average of the weekly layers for all weeks within the given season (e.g. “breeding”), where the date boundaries of the season are defined through a process of expert review. The raw data underlying these seasonal maps is also contained within the downloaded data. Load the seasonal data now with:

abd_seasonal <- load_raster("abundance_seasonal", path = dl_path)
# the season that each layer corresponds to can be accessed with
names(abd_seasonal)
#> [1] "breeding"               "nonbreeding"            "postbreeding_migration"
#> [4] "prebreeding_migration"

Mapping occurrence and abundance

Let’s start by mapping some of these data! We’ll pick a week and produce maps comparing the occurrence and abundance of Baltimore Oriole for this week. For the sake of this example, we’ll choose the week centered on June 14 (week 24), which is during the breeding season for Baltimore Oriole.

# identify which layer our focal week corresponds to
week_date <- as.Date("2018-06-14")
week_index <- which(parse_raster_dates(abd) == week_date)
week_index
#> [1] 24

We can map a quick map of the abundance data using the plot() function from the raster package.

plot(abd[[week_index]])

Ok, so we have a map, but it’s not very useful. It’s clear that to make a good map there’s some data preparation that needs to be done. Specifically,

  1. All Status and Trends data are provided for the whole Western Hemisphere, but for any given week most of the mapped area will be have a predicted occurrence of zero (light grey on the above map). The ebirdst function calc_full_extent() calculates the extent of the non-zero data, which we can then use to focus our maps.
  2. You may notice the shape of the continents in the above map looks a little “weird”. That’s because the Status and Trends data are provided in the sinusoidal equal area projection used by the underlying environmental predictor data. This projection is valid for the whole globe and for most analytical uses it functions well; however, it’s heavily distorted and when producing maps there are much better options. Fortunately, as part of the Status and Trends process an optimal projection for mapping each species is calculated and made available through the function load_fac_map_parameters().
  3. By default, plot() assigns colors to the data using linear bins; however, the abundance data in particular is heavily left skewed, and linear bins usually result is a loss of resolution in the pattern of abundance. Quantile bins, which each have an equal number of observations, are typically a much better choice than linear bins, which are all of equal size. The ebirdst function calc_bins() calculates quantile bins for Status and Trends data.

The following code performs these data preparation steps.

# get species-specific map projection and full annual cycle extent
map_pars <- load_fac_map_parameters(dl_path)
proj <- map_pars$custom_projection
fac_ext <- map_pars$fa_extent
# generate a spatial polygon of the extent
fac_ext_sf <- fac_ext %>% 
  as("SpatialPolygons") %>% 
  st_as_sf() %>% 
  st_set_crs(proj) %>% 
  st_geometry()

# project the abundance and occurrence data for the given week
# crop to the pre-calculated full annual extent to speed up projection
occ_proj <- occ[[week_index]] %>% 
  crop(map_pars$fa_extent_sinu + 2e6) %>% 
  projectRaster(crs = proj, method = "ngb")
abd_proj <- abd[[week_index]] %>% 
  crop(map_pars$fa_extent_sinu + 2e6) %>% 
  projectRaster(crs = proj, method = "ngb")

# determine non-zero extent for the focal week
week_ext <- calc_full_extent(occ_proj)

# calculate quantile bins
occ_bins <- calc_bins(occ_proj, method = "quantile")$bins
abd_bins <- calc_bins(abd_proj, method = "quantile")$bins

We can now make a much improved version of the map. Note that the below map uses the color palette from the online Status and Trends maps, accessed via abundance_palette(), which is much better than the default palette used by raster. Again, we’ll only map the abundance data for now.

par(mar = c(0.25, 0.25, 0.25, 0.5))
pal <- abundance_palette(n = length(abd_bins) - 1)
plot(abd_proj, 
     breaks = abd_bins, col = pal,
     maxpixels = ncell(abd_proj),
     ext = week_ext,
     axes = FALSE)

This looks much better, but note that this map only show the relative abundance within areas where Baltimore Orioles are predicted to occur. The Status and Trends models also make predictions for where the species is absent and it will be valuable to also show this information. In addition, the data layers make a distinction between areas where the species is predicted to be absent (cell values of 0) and areas where no prediction was made because there wasn’t enough data to fit the model (cell values of NA). The predicted zeros were hidden in the above map because abd_bins starts at the lowest non-zero value. We can show these predicted zeros by adding an additional bin to capture zero, which we’ll assign to grey in the color palettes.

# extend bins
occ_bins_z <- c(0, occ_bins)
abd_bins_z <- c(0, abd_bins)
# add grey to palette
pal_z <- c("#e6e6e6", pal)

# produce map
par(mar = c(0.25, 0.25, 1.5, 0.5))
title <- paste0("Relative Abundance\n", species, 
               ", ", format(week_date, "%b %d"))
plot(abd_proj, 
     breaks = abd_bins_z, col = pal_z, 
     # only show 3 decimals of precision in legend
     lab.breaks = comma(abd_bins_z, 0.001),
     maxpixels = ncell(abd_proj),
     ext = week_ext,
     main = title, cex.main = 1,
     axes = FALSE, box = FALSE)

Looks great! The final step is to add some contextual information: state borders, country borders, and a background layer showing the land to show regions where no prediction was made. We can get all these layers from Natural Earth. These layers can be downloaded with the the R package rnaturalearth; however, for this lab we’ve put these layers in a GeoPackage file for easy access. Unlike the Status and Trends data, which is in raster format, these borders are in the form of spatial lines and polygons, which we can work with using the sf package in R. Let’s load these data now. If you’re interested in working with spatial data in R, the free online book Geocomputation with R is an excellent resources.

The GIS data are provided for the whole earth and we’ll need to crop them to the non-zero extent prior to transforming them to the species-specific, custom projection.

f_gis <- "data/gis-data.gpkg"
# create a polygon to crop the gis data to
fac_ext_ll <- fac_ext_sf %>% 
  # tranform to lat-lon to match gis layers
  st_transform(crs = 4326) %>% 
  # buffer by ~ 3000 km
  st_buffer(30)

# western hemisphere land boundary
ne_land <- read_sf(f_gis, "ne_land") %>% 
  # crop to full annual cycle non-zero extent
  st_crop(fac_ext_ll) %>% 
  # transform to custom projection
  st_transform(crs = proj) %>% 
  st_geometry()
# country lines
ne_country_lines <- read_sf(f_gis, "ne_country_lines") %>% 
  #st_intersection(fac_ext_ll) %>% 
  st_transform(crs = proj)

# state lines
ne_state_lines <- read_sf(f_gis, "ne_state_lines") %>% 
  #st_intersection(fac_ext_ll) %>% 
  st_transform(crs = proj)

Now we can make the final maps, building up the different components in layers with multiple calls to plot(). At this stage we’ll make occurrence and abundance maps so we can compare them. You’ll notice there’s some additional code below to produce legends, this is necessary because we’re using quantile bins and we just want to show the minimum, maximum, and midpoint on the color bar.

par(mar = c(0.25, 0.25, 2, 0.25), mfrow = c(2, 1))

# abundance map
title <- paste0(species, ", ", format(week_date, "%b %d"), "\n",
                "Relative Abundance")
# add land background/area of no prediction
plot(ne_land, col = "#cfcfcf", border = FALSE,
     main = title, cex.main = 1,
     xlim = c(week_ext@xmin, week_ext@xmax),
     ylim = c(week_ext@ymin, week_ext@ymax))
# add abundance data including predicted zeros
plot(abd_proj, 
     breaks = abd_bins_z, col = pal_z,
     maxpixels = ncell(abd_proj),
     legend = FALSE, add = TRUE)
# country and state boundaries
plot(ne_state_lines, col = "white", lwd = 0.7, add = TRUE)
plot(ne_country_lines, col = "white", lwd = 1.2, add = TRUE)
# legend
lbl_brks <- seq(0, max(abd_bins), length.out = length(abd_bins_z))
lbls_at <- c(0, quantile(lbl_brks[-1], c(0, 0.5, 1)))
lbls <- comma(c(0, quantile(abd_bins, c(0, 0.5, 1))), accuracy = 0.001)
plot(abd_proj, legend.only = TRUE,
     breaks = lbl_brks, col = pal_z,
     smallplot = c(0.88, 0.90, 0.15, 0.85),
     axis.args = list(at = lbls_at, labels = lbls))

# occurrence map
par(mar = c(0.25, 0.25, 1, 0.25))
# add land background/area of no prediction
plot(ne_land, col = "#cfcfcf", border = FALSE,
     main = "Occurrence", cex.main = 1,
     xlim = c(week_ext@xmin, week_ext@xmax),
     ylim = c(week_ext@ymin, week_ext@ymax))
# add occurrence data including predicted zeros
plot(occ_proj, 
     breaks = occ_bins_z, col = pal_z,
     maxpixels = ncell(occ_proj),
     legend = FALSE, add = TRUE)
# country and state boundaries
plot(ne_state_lines, col = "white", lwd = 0.7, add = TRUE)
plot(ne_country_lines, col = "white", lwd = 1.2, add = TRUE)
# legend
lbl_brks <- seq(0, max(occ_bins), length.out = length(occ_bins_z))
lbls_at <- c(0, quantile(lbl_brks[-1], c(0, 0.5, 1)))
lbls <- comma(c(0, quantile(occ_bins, c(0, 0.5, 1))), accuracy = 0.001)
plot(occ_proj, legend.only = TRUE,
     breaks = lbl_brks, col = pal_z,
     smallplot = c(0.88, 0.90, 0.15, 0.85),
     axis.args = list(at = lbls_at, labels = lbls))

Abundance trajectories

The above maps visualize a single week of data across the whole range of the species. Alternatively, we can visualize all 52 weeks of data for a single location to examine the phenology of this species. We often call these plots trajectories, and they show how abundance changes over the course of the year at a location.

For this example, we’ll take Cornell campus (42.448 N, 76.483 W) as our location of interest. We need to convert the latitude and longitude to spatial features, then we can use the extract() function from raster to extract the the values for all the weeks from the raster layers at the given point. Finally, we’re going to use some new data that we haven’t worked with before: the upper and lower confidence limits on the abundance estimates. Specifically, abundance_lower and abundance_upper are the 10th and 90th quantile of the expected relative abundance estimates, respectively. This will give us a sense of the uncertainty in the abundance estimates.

# create a point for cornell campus
pt <- st_point(c(-76.483, 42.448)) %>% 
  st_sfc(crs = 4326) %>%
  # transform to raster projection
  st_transform(crs = projection(abd)) %>% 
  as_Spatial()

# load uncertainty layers
abd_lower <- load_raster("abundance_lower", dl_path)
abd_upper <- load_raster("abundance_upper", dl_path)

# extract weekly occurrence and abundance values
abd_traj <- extract(abd, pt)[1, ]
lower_traj <- extract(abd_lower, pt)[1, ]
upper_traj <- extract(abd_upper, pt)[1, ]

Now that we’ve extract the data, let’s plot them with ggplot2.

# create a data frame of extract values
week_data <- data.frame(week = parse_raster_dates(abd),
                        abd = unname(abd_traj),
                        lower = lower_traj,
                        upper = upper_traj)

# plot
ggplot(week_data) +
  aes(x = week, y = abd, ymin = lower, ymax = upper) +
  geom_ribbon(fill = "grey70") +
  geom_line() +
  labs(x = "Week", y = "Relative Abundance") +
  scale_x_date(date_labels = "%b")

Seasonal abundance

For the final exercise, we’ll map relative abundance during the breeding and non-breeding season. We already loaded this data at the start of the lab. As with the weekly data, we need to do some preparation before we can make a map.

# project layers
seasons <- c("breeding", "nonbreeding")
abd_seasonal_proj <- abd_seasonal[[seasons]] %>% 
  crop(map_pars$fa_extent_sinu + 2e6) %>% 
  projectRaster(crs = proj, method = "ngb") %>% 
  setNames(seasons)
# get precalculated bins from load_fac_map_parameters()
bins_seasonal <- map_pars$abundance_bins
# add zero to show it in grey
bins_seasonal <- c(0, bins_seasonal)
# s&t color palette
pal_seasonal <- abundance_palette(n = length(bins_seasonal) - 1)
# add grey for zero
pal_seasonal <- c("#e6e6e6", pal_seasonal)

The above weekly mapping code can be re-purposed to produce seasonal abundance maps.

par(mar = c(0.25, 0.25, 1, 0.25), mfrow = c(2, 1))

# breeding map
# add land background/area of no prediction
plot(ne_land, col = "#cfcfcf", border = FALSE,
     main = "Relative Abundance, Breeding", cex.main = 1,
     xlim = c(fac_ext@xmin, fac_ext@xmax),
     ylim = c(fac_ext@ymin, fac_ext@ymax))
# add abundance data including predicted zeros
plot(abd_seasonal_proj[["breeding"]], 
     breaks = bins_seasonal, col = pal_seasonal,
     maxpixels = ncell(abd_seasonal_proj),
     legend = FALSE, add = TRUE)
# country and state boundaries
plot(ne_state_lines, col = "white", lwd = 0.7, add = TRUE)
plot(ne_country_lines, col = "white", lwd = 1.2, add = TRUE)
# legend
lbl_brks <- seq(0, max(bins_seasonal), length.out = length(bins_seasonal))
lbls_at <- c(0, quantile(lbl_brks[-1], c(0, 0.5, 1)))
lbls <- comma(c(0, quantile(bins_seasonal[-1], c(0, 0.5, 1))), 
              accuracy = 0.001)
plot(abd_proj, legend.only = TRUE,
     breaks = lbl_brks, col = pal_seasonal,
     smallplot = c(0.86, 0.88, 0.05, 0.95),
     axis.args = list(at = lbls_at, labels = lbls))

# non-breeding map
par(mar = c(0.25, 0.25, 1, 0.25))
# add land background/area of no prediction
plot(ne_land, col = "#cfcfcf", border = FALSE,
     main = "Relative Abundance, Non-breeding", cex.main = 1,
     xlim = c(fac_ext@xmin, fac_ext@xmax),
     ylim = c(fac_ext@ymin, fac_ext@ymax))
# add abundance data including predicted zeros
plot(abd_seasonal_proj[["nonbreeding"]], 
     breaks = bins_seasonal, col = pal_seasonal,
     maxpixels = ncell(abd_seasonal_proj),
     legend = FALSE, add = TRUE)
# country and state boundaries
plot(ne_state_lines, col = "white", lwd = 0.7, add = TRUE)
plot(ne_country_lines, col = "white", lwd = 1.2, add = TRUE)
# legend
plot(abd_proj, legend.only = TRUE,
     breaks = lbl_brks, col = pal_seasonal,
     smallplot = c(0.86, 0.88, 0.05, 0.95),
     axis.args = list(at = lbls_at, labels = lbls))