Lesson 1 Introduction
The contents of this website comprise the notes for a workshop on best practices for using eBird data. Part I of this workshop will focus on extracting and processing eBird data, while Part II will cover using these data to model species distributions. The two parts are intentionally designed to be independent, which allows you to attend Part I, Part II, or both depending on your skills, background, and interests.
1.1 Data
To follow along with the lessons in this workshop download the workshop data package. This package contains a variety of datasets for Parts I and II of this workshop. Instructions for how to use each dataset will be given in the relevant lesson.
1.2 Format
This format of this workshop will be loosely based on Software Carpentry, the gold standard for workshops teaching scientific computing. As much as possible, the instructor will work through the lessons in real time, entering code live, while you code-along. Interspersed with the live coding will be exercises designed to give you a chance to practice on your own. This approach, known as participatory live coding, has been shown to be a much more effective means of learning to code than using slides or a pre-written script.
Another Software Carpentry technique that we’ve adopted is the use of sticky notes. You should have two sticky notes at your desk, one blue and one yellow. Throughout the workshop, if you’ve completed an exercise or have passed a checkpoint, put the blue sticky note on your laptop to indicate that you’re done. Similarly, if you’re lost, stuck, or have a problem and need help, place the yellow sticky on your laptop.
In addition to the instructor, we have several helpers roaming the room helping to troubleshoot problems. The sticky notes are the most effective way of flagging a helper down.
Checkpoint
Let’s stop here for a quick round on introductions, let us know who you are and how you hope to use eBird data!
1.3 Setup
Before we dive into writing code, let’s take a few minutes to ensure our systems are properly set up with all the correct software and R packages. Devoting some time to this up front will reduce errors and make troubleshooting easier later in the workshop. If you followed the pre-workshop setup instructions, most of this setup should already be complete; however, we include it here to ensure everyone’s on the same page.
Start by opening a browser window with four tabs pointing to the following websites:
- The shared Google Doc for this workshop (your instructor will provide a link). This will act as a collaborative notepad, which we can use to share code and links. Make sure you can edit the document.
- The eBird homepage
- The auk website. This
auk
R package is used to access eBird data and we’ll be using the website to view the documentation. - The online lessons for this workshop.
Checkpoint
Are all tabs correctly opened? Can you edit the shared notepad?
Next install or update RStudio, then open it. Look at the top line of the console, which gives your R version. If you have a version older than 3.5.0, you should consider updating R.
Checkpoint
Is your R version at least 3.5.0? Do you need help updating R?
Create a new RStudio project called ebird-best-practices
. Next, install all the packages required for this workshop and Part II, by running the following code:
The auk
package uses the unix command line tool AWK to access eBird data. AWK comes installed by default on Mac OS and Linux systems, but Windows users will need to install it. To do so, install the Cygwin software making sure to use the default install location.
Checkpoint
Is AWK installed?
Run the following code to test that auk
is installed correctly and AWK is working:
library(auk)
library(tidyverse)
tf <- tempfile()
system.file("extdata/ebd-sample.txt", package = "auk") %>%
auk_ebd() %>%
auk_species(species = c("Canada Jay", "Blue Jay")) %>%
auk_country(country = c("US", "Canada")) %>%
auk_bbox(bbox = c(-100, 37, -80, 52)) %>%
auk_date(date = c("2012-01-01", "2012-12-31")) %>%
auk_time(start_time = c("06:00", "09:00")) %>%
auk_duration(duration = c(0, 60)) %>%
auk_complete() %>%
auk_filter(tf) %>%
read_ebd() %>%
pull(common_name) %>%
message()
unlink(tf)
It should print Blue Jay
.
Checkpoint
Did “Blue Jay” print without errors?
If you’re running into any setup issues that can’t be resolved, use RStudio Cloud for this workshop instead. Your instructor will explain how to connect to RStudio Cloud.
1.4 Tidyverse
Throughout this workshop, we’ll be using functions from the Tidyverse. This is an opinionated set of packages for working with data in R. Packages such as dplyr
, ggplot2
, and purrr
are part of the Tidyverse. We’ll try to explain any functions as they come up; however, there’s one important operator from the Tidyverse that needs to be explained up front: the pipe operator %>%
. The pipe operator takes the expression to the left of it and “pipes” it into the first argument of the expression on the right.
The value of the pipe operator becomes clear when we have several operations in a row. Using the pipe makes code easier to read and reduces the need for intermediate variables.
# without pipes
set.seed(1)
ran_norm <- rnorm(10, sd = 5)
ran_norm_pos <- abs(ran_norm)
ran_norm_sort <- sort(ran_norm_pos)
ran_norm_round <- round(ran_norm_sort, digits = 1)
ran_norm_round
#> [1] 0.9 1.5 1.6 2.4 2.9 3.1 3.7 4.1 4.2 8.0
# with pipes
set.seed(1)
rnorm(10, sd = 5) %>%
abs() %>%
sort() %>%
round(digits = 1)
#> [1] 0.9 1.5 1.6 2.4 2.9 3.1 3.7 4.1 4.2 8.0
For those that have never used the pipe, it probably looks strange, but if you stick with it, you’ll quickly come to appreciate it.
Checkpoint
Any questions about the pipe?
Exercise
Rewrite the following code using pipes: