The ARUtools package aims to make processing of large quantities of acoustic recordings easier through automation of metadata processing and sub-sampling of recordings.
Before working on your ARU recordings or metadata, you must:
Know your goals for interpreting the recordings
Transfer your recordings to a suitable, organized location for processing
Prepare your site information so it can be linked with the recordings
This introduction will walk through the first few steps of extracting the metadata, adding site information, and calculating sunrise and sunset information.
Let’s use some example data to get started.
head(example_files)
#> [1] "a_BARLT10962_P01_1/P01_1_20200502T050000-0400_ARU.wav"
#> [2] "a_BARLT10962_P01_1/P01_1_20200503T052000-0400_ARU.wav"
#> [3] "a_S4A01234_P02_1/P02_1_20200504T052500_ARU.wav"
#> [4] "a_S4A01234_P02_1/P02_1_20200505T073000_ARU.wav"
#> [5] "a_BARLT10962_P03_1/P03_1_20200506T100000-0400_ARU.wav"
#> [6] "a_BARLT11111_P04_1/P04_1_20200506T050000-0400_ARU.wav"
This is a list of hypothetical ARU files from different sites, recorded with different ARUs. The data are fairly messy: the folders have no clear structure and the file names appear to contain unneeded characters. However, given the standard structure of site names, ARU ID codes, and datetime stamps, we can extract that information from the file paths alone.
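To make that idea concrete, here is a minimal base R sketch of how such pieces could be pulled out of a file path with regular expressions. The patterns and the helper extract_first() are hypothetical and written for these example paths only; this is not a description of how clean_metadata() works internally.

```r
# Two of the example paths from the listing
paths <- c(
  "a_BARLT10962_P01_1/P01_1_20200502T050000-0400_ARU.wav",
  "a_S4A01234_P02_1/P02_1_20200504T052500_ARU.wav"
)

# Hypothetical patterns, assumed for this example only
pat_aru  <- "BARLT[0-9]+|S4A[0-9]+"  # ARU ID codes
pat_site <- "P[0-9]{2}_[0-9]"        # site IDs such as P01_1
pat_dt   <- "[0-9]{8}T[0-9]{6}"      # datetime stamps such as 20200502T050000

# Return the first match in each string, or NA when there is none
extract_first <- function(x, pattern) {
  out <- rep(NA_character_, length(x))
  hit <- regexpr(pattern, x)
  found <- !is.na(hit) & hit > 0
  out[found] <- regmatches(x, hit)
  out
}

info <- data.frame(
  aru_id = extract_first(paths, pat_aru),
  site_id = extract_first(paths, pat_site),
  # UTC is only a placeholder here; timezones are dealt with later
  date_time = as.POSIXct(extract_first(paths, pat_dt),
                         format = "%Y%m%dT%H%M%S", tz = "UTC"),
  stringsAsFactors = FALSE
)
info
```

Returning NA for paths that do not match keeps one row per file, which makes it easy to spot files that break the naming convention.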
First things first, we’ll clean up the metadata associated with the files.
m <- clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
Because our example files follow the standard formats for site ID, ARU ID, and date/time, we can extract all the information without having to change any of the default arguments.
m
#> # A tibble: 42 × 11
#> file_name type path aru_id manufacturer model aru_type site_id tz_offset
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 P01_1_202005… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 2 P01_1_202005… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 3 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 <NA>
#> 4 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 <NA>
#> # ℹ 38 more rows
#> # ℹ 2 more variables: date_time <dttm>, date <date>
If you were reading directly from files, you would assign a base directory and then have clean_metadata() read the files in that folder and its sub-folders.
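As a rough illustration of what reading "that folder and sub-folders" produces, the following base R sketch builds a small throwaway directory and lists its recordings recursively. The directory layout and the empty .wav file are made up for the example, and list.files() is used here only to illustrate the idea; see ?clean_metadata for the argument that sets the base directory.

```r
# Build a temporary, hypothetical project directory with one empty recording
base_dir <- file.path(tempdir(), "aru_demo")
rec_dir <- file.path(base_dir, "a_BARLT10962_P01_1")
dir.create(rec_dir, recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(rec_dir, "P01_1_20200502T050000-0400_ARU.wav")))

# List every .wav file under the base directory, including sub-folders.
# Paths are returned relative to base_dir, like the entries in example_files.
files <- list.files(base_dir, pattern = "\\.wav$", recursive = TRUE)
files
```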
Next, we want to add our coordinates to this data.
If your data has GPS logs included, they would have been detected in the above step and you could now use g <- clean_gps(m) to create a list of GPS coordinates.
However, many ARU models do not have an internal GPS, and those that do may not accurately record the location where the ARU is deployed. Therefore, we recommend that you create a site index file to manually record deployment locations, like this one.
example_sites
#> Sites Date_set_out Date_removed ARU lon lat Plots Subplot
#> 1 P01_1 2020-05-01 2020-05-03 BARLT10962 -85.03 50.010 Plot1 a
#> 2 P02_1 2020-05-03 2020-05-05 S4A01234 -87.45 52.680 Plot1 a
#> 3 P03_1 2020-05-05 2020-05-06 BARLT10962 -90.38 48.990 Plot2 a
#> 4 P04_1 2020-05-05 2020-05-07 BARLT11111 -85.53 45.000 Plot2 a
#> 5 P05_1 2020-05-06 2020-05-07 BARLT10962 -88.45 51.050 Plot3 b
#> 6 P06_1 2020-05-08 2020-05-09 BARLT10962 -90.08 52.000 Plot1 a
#> 7 P07_1 2020-05-08 2020-05-10 S4A01234 -86.03 50.450 Plot1 a
#> 8 P08_1 2020-05-10 2020-05-11 BARLT10962 -84.45 48.999 Plot2 a
#> 9 P09_1 2020-05-10 2020-05-11 S4A02222 -91.38 45.000 Plot2 a
#> 10 P10_1 2020-05-10 2020-05-11 S4A03333 -90.00 50.010 Plot3 b
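If you keep your site index as a CSV file, a minimal base R sketch of writing and reading one might look like this. The file name is hypothetical and only the first two rows of the example are reproduced; the column names simply follow example_sites.

```r
# The first two rows of the example site index, typed in by hand
site_index <- data.frame(
  Sites = c("P01_1", "P02_1"),
  Date_set_out = c("2020-05-01", "2020-05-03"),
  Date_removed = c("2020-05-03", "2020-05-05"),
  ARU = c("BARLT10962", "S4A01234"),
  lon = c(-85.03, -87.45),
  lat = c(50.01, 52.68),
  stringsAsFactors = FALSE
)

# Write to a temporary file (hypothetical name) and read it back
csv_path <- file.path(tempdir(), "site_index.csv")
write.csv(site_index, csv_path, row.names = FALSE)
sites_raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# read.csv() leaves the dates as text, so convert them explicitly
sites_raw$Date_set_out <- as.Date(sites_raw$Date_set_out)
sites_raw$Date_removed <- as.Date(sites_raw$Date_removed)
str(sites_raw)
```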
While you can simply specify a single date, it is recommended that you use both a start date and an end date for the best matching. This is critical if you are moving your ARUs during a season.
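To see why both dates matter when an ARU moves, consider this small base R sketch. It matches recordings to deployments by ARU ID and date range for one unit that served two sites. It is only an illustration of the idea under made-up recording dates, not the matching that ARUtools itself performs.

```r
# One ARU deployed at two sites in the same season (rows 1 and 3 of the example)
deployments <- data.frame(
  site_id = c("P01_1", "P03_1"),
  aru_id = "BARLT10962",
  date_start = as.Date(c("2020-05-01", "2020-05-05")),
  date_end = as.Date(c("2020-05-03", "2020-05-06")),
  stringsAsFactors = FALSE
)

# Two hypothetical recordings from that ARU
recordings <- data.frame(
  aru_id = "BARLT10962",
  date = as.Date(c("2020-05-02", "2020-05-06")),
  stringsAsFactors = FALSE
)

# Pair every recording with every deployment of the same ARU ...
candidates <- merge(recordings, deployments, by = "aru_id")

# ... then keep only the pairs where the recording falls inside the deployment
in_range <- candidates$date >= candidates$date_start &
  candidates$date <= candidates$date_end
matched <- candidates[in_range, ]
matched <- matched[order(matched$date), c("aru_id", "date", "site_id")]
matched
```

The ARU ID alone leaves four candidate pairs; the date ranges are what narrow these down to one site per recording.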
Now let’s clean up this list so we can add these sites to our metadata.
sites <- clean_site_index(example_sites)
#> Error in `clean_site_index()`:
#> ! Problems with data `site_index`:
#> • Column 'site_id' does not exist
#> • Column 'date' does not exist
#> • Column 'aru_id' does not exist
#> • Column 'longitude' does not exist
#> • Column 'latitude' does not exist
#> • See ?clean_site_index
Oops! We can see right away that clean_site_index() expects the data to be in a particular format. Luckily, we can let it know if we’ve used a different format.
sites <- clean_site_index(example_sites,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed"),
  name_coords = c("lon", "lat")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`
Hmm, that’s an interesting message! This means that some of our deployment dates overlap. ARUtools assumes that if you set out an ARU on a specific day, you probably didn’t set it out at midnight (i.e. the very start of that day). Since we assume you are likely using ARUs for recording in the early morning or late at night, we shift the start/end times of each deployment to noon as an estimate of when the ARU was likely deployed.
If your ARU was deployed at midnight, use resolve_overlaps = FALSE. Or, if you know the exact time your ARU was deployed, use a date/time rather than just a date in your site index.
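The effect of the noon shift can be sketched in base R with two rows from the example site index, where the same ARU is removed from P03_1 and set out at P05_1 on the same day. This is an illustration of the reasoning only; the helper noon() and the choice of an exclusive end time are assumptions, not the package's implementation.

```r
# BARLT10962 leaves P03_1 and arrives at P05_1 on 2020-05-06
deploy <- data.frame(
  site_id = c("P03_1", "P05_1"),
  date_start = as.Date(c("2020-05-05", "2020-05-06")),
  date_end = as.Date(c("2020-05-06", "2020-05-07")),
  stringsAsFactors = FALSE
)

# A recording made at 10:00 on the changeover day
rec_time <- as.POSIXct("2020-05-06 10:00:00", tz = "UTC")
rec_date <- as.Date(rec_time)

# Matching on dates alone: both deployments include 2020-05-06
by_date <- deploy$site_id[rec_date >= deploy$date_start &
                            rec_date <= deploy$date_end]

# Hypothetical helper: turn a date into a date/time at noon
noon <- function(d) as.POSIXct(paste(d, "12:00:00"), tz = "UTC")

# Matching on noon-shifted date/times: the morning recording belongs to P03_1
by_noon <- deploy$site_id[rec_time >= noon(deploy$date_start) &
                            rec_time < noon(deploy$date_end)]

by_date
by_noon
```

With dates alone the recording matches both sites; once the start and end are shifted to noon, it matches only the site where the ARU spent that morning.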
sites
#> # A tibble: 10 × 8
#> site_id aru_id date_time_start date_time_end date_start date_end
#> <chr> <chr> <dttm> <dttm> <date> <date>
#> 1 P01_1 BARLT10… 2020-05-01 12:00:00 2020-05-03 12:00:00 2020-05-01 2020-05-03
#> 2 P02_1 S4A01234 2020-05-03 12:00:00 2020-05-05 12:00:00 2020-05-03 2020-05-05
#> 3 P03_1 BARLT10… 2020-05-05 12:00:00 2020-05-06 12:00:00 2020-05-05 2020-05-06
#> 4 P04_1 BARLT11… 2020-05-05 12:00:00 2020-05-07 12:00:00 2020-05-05 2020-05-07
#> # ℹ 6 more rows
#> # ℹ 2 more variables: longitude <dbl>, latitude <dbl>
Note that we’ve lost a couple of non-standard columns: Plots and Subplot. We can retain these by specifying name_extra.
sites <- clean_site_index(example_sites,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed"),
  name_coords = c("lon", "lat"),
  name_extra = c("Plots", "Subplot")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`
sites
#> # A tibble: 10 × 10
#> site_id aru_id date_time_start date_time_end date_start date_end
#> <chr> <chr> <dttm> <dttm> <date> <date>
#> 1 P01_1 BARLT10… 2020-05-01 12:00:00 2020-05-03 12:00:00 2020-05-01 2020-05-03
#> 2 P02_1 S4A01234 2020-05-03 12:00:00 2020-05-05 12:00:00 2020-05-03 2020-05-05
#> 3 P03_1 BARLT10… 2020-05-05 12:00:00 2020-05-06 12:00:00 2020-05-05 2020-05-06
#> 4 P04_1 BARLT11… 2020-05-05 12:00:00 2020-05-07 12:00:00 2020-05-05 2020-05-07
#> # ℹ 6 more rows
#> # ℹ 4 more variables: longitude <dbl>, latitude <dbl>, Plots <chr>,
#> # Subplot <chr>
We can even be fancy and rename them for consistency by using named vectors.
sites <- clean_site_index(example_sites,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed"),
  name_coords = c("lon", "lat"),
  name_extra = c("plot" = "Plots", "subplot" = "Subplot")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`
sites
#> # A tibble: 10 × 10
#> site_id aru_id date_time_start date_time_end date_start date_end
#> <chr> <chr> <dttm> <dttm> <date> <date>
#> 1 P01_1 BARLT10… 2020-05-01 12:00:00 2020-05-03 12:00:00 2020-05-01 2020-05-03
#> 2 P02_1 S4A01234 2020-05-03 12:00:00 2020-05-05 12:00:00 2020-05-03 2020-05-05
#> 3 P03_1 BARLT10… 2020-05-05 12:00:00 2020-05-06 12:00:00 2020-05-05 2020-05-06
#> 4 P04_1 BARLT11… 2020-05-05 12:00:00 2020-05-07 12:00:00 2020-05-05 2020-05-07
#> # ℹ 6 more rows
#> # ℹ 4 more variables: longitude <dbl>, latitude <dbl>, plot <chr>,
#> # subplot <chr>
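If the named-vector convention is unfamiliar, this base R sketch shows the general idea: the names are the new column names and the values are the existing ones. The small data frame is made up for the example, and the code only mimics the renaming, not clean_site_index() itself.

```r
# A tiny stand-in for the site index
example <- data.frame(
  Sites = c("P01_1", "P02_1"),
  Plots = c("Plot1", "Plot1"),
  Subplot = c("a", "a"),
  stringsAsFactors = FALSE
)

# New name on the left, existing column name on the right
extra <- c("plot" = "Plots", "subplot" = "Subplot")

# Select the existing columns by value, then rename them using the names
kept <- example[, unname(extra), drop = FALSE]
names(kept) <- names(extra)
kept
```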
Now let’s add this site-related information to our metadata.
m <- add_sites(m, sites)
m
#> # A tibble: 42 × 15
#> file_name type path aru_id manufacturer model aru_type site_id tz_offset
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 P01_1_202005… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 2 P01_1_202005… wav a_BA… BARLT… Frontier La… BAR-… BARLT P01_1 -0400
#> 3 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 <NA>
#> 4 P02_1_202005… wav a_S4… S4A01… Wildlife Ac… Song… SongMet… P02_1 <NA>
#> # ℹ 38 more rows
#> # ℹ 6 more variables: date_time <dttm>, date <date>, longitude <dbl>,
#> # latitude <dbl>, plot <chr>, subplot <chr>
Great! We have all the site-related information to describe each recording.
Now to prepare for our selection procedure, the last thing we need to do is calculate the time to sunrise or sunset.
Here we need to be clear about which timezone each ARU used when it recorded times.
There are two options.
The first option is that all ARUs were set up at home base before deployment. In this case, it’s possible they were deployed in a location with a different timezone than the one they were recording in. This doesn’t matter, as long as you specify the programmed timezone here. In this case, use tz = "America/Toronto", or whichever timezone was used. Note that timezones must be one of the names returned by OlsonNames().
The second option is that each ARU was set up to record in the local timezone where it was placed. If this is the case, specify tz = "local" and the calc_sun() function will use coordinates to determine local timezones.
(See the Dealing with Timezones vignette for more details.)
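To see why this choice matters, here is a small base R sketch that interprets the same clock reading in two timezones. The two zones are picked purely for illustration and are not taken from the example data.

```r
# The same clock reading, as stamped on a recording
clock_time <- "2020-05-02 05:00:00"

# Interpreted as Toronto time versus Winnipeg time
as_toronto <- as.POSIXct(clock_time, tz = "America/Toronto")
as_winnipeg <- as.POSIXct(clock_time, tz = "America/Winnipeg")

# How far apart are those two moments?
gap_hours <- as.numeric(difftime(as_winnipeg, as_toronto, units = "hours"))
gap_hours

# Both names must be valid Olson names to be usable as a timezone
valid <- c("America/Toronto", "America/Winnipeg") %in% OlsonNames()
valid
```

The two interpretations are a full hour apart, so specifying the wrong timezone would shift every time-to-sunrise value by the same amount.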
In our example, let’s assume that the ARUs were set up at each location where they were deployed. So we’ll use tz = "local", the default setting.
m <- calc_sun(m)
dplyr::glimpse(m)
#> Rows: 42
#> Columns: 18
#> $ file_name <chr> "P01_1_20200502T050000-0400_ARU.wav", "P01_1_20200503T052…
#> $ type <chr> "wav", "wav", "wav", "wav", "wav", "wav", "wav", "wav", "…
#> $ path <chr> "a_BARLT10962_P01_1/P01_1_20200502T050000-0400_ARU.wav", …
#> $ aru_id <chr> "BARLT10962", "BARLT10962", "S4A01234", "S4A01234", "BARL…
#> $ manufacturer <chr> "Frontier Labs", "Frontier Labs", "Wildlife Acoustics", "…
#> $ model <chr> "BAR-LT", "BAR-LT", "Song Meter 4", "Song Meter 4", "BAR-…
#> $ aru_type <chr> "BARLT", "BARLT", "SongMeter", "SongMeter", "BARLT", "BAR…
#> $ site_id <chr> "P01_1", "P01_1", "P02_1", "P02_1", "P03_1", "P04_1", "P0…
#> $ tz_offset <chr> "-0400", "-0400", NA, NA, "-0400", "-0400", "-0400", "-04…
#> $ date_time <dttm> 2020-05-02 05:00:00, 2020-05-03 05:20:00, 2020-05-04 05:…
#> $ date <date> 2020-05-02, 2020-05-03, 2020-05-04, 2020-05-05, 2020-05-…
#> $ longitude <dbl> -85.03, -85.03, -87.45, -87.45, -90.38, -85.53, -85.53, -…
#> $ latitude <dbl> 50.010, 50.010, 52.680, 52.680, 48.990, 45.000, 45.000, 5…
#> $ plot <chr> "Plot1", "Plot1", "Plot1", "Plot1", "Plot2", "Plot2", "Pl…
#> $ subplot <chr> "a", "a", "a", "a", "a", "a", "a", "b", "a", "a", "a", "a…
#> $ tz <chr> "America/Toronto", "America/Toronto", "America/Toronto", …
#> $ t2sr <dbl> -74.933333, -53.216667, -47.250000, 79.616667, 207.133333…
#> $ t2ss <dbl> 479.9500, 498.4167, 483.4167, 606.6833, -685.8833, 486.18…
Tada! Now we have a complete set of cleaned metadata associated with each recording.
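The new t2sr and t2ss columns appear to be signed differences, in minutes, between each recording and the local sunrise or sunset: in the output above, the 05:00 recording has a negative t2sr (it starts before sunrise), while its t2ss is positive (check ?calc_sun to confirm the exact definition). A minimal base R sketch of that kind of calculation follows. The sunrise time is hypothetical, chosen only so that the result lines up with the first row of the output, and is not computed from the coordinates.

```r
# First recording in the example, in its local timezone
recording <- as.POSIXct("2020-05-02 05:00:00", tz = "America/Toronto")

# Hypothetical sunrise time, assumed for illustration only
sunrise <- as.POSIXct("2020-05-02 06:14:56", tz = "America/Toronto")

# Signed difference in minutes: negative means the recording starts before sunrise
t2sr <- as.numeric(difftime(recording, sunrise, units = "mins"))
t2sr
```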
This is a very simple example and much of the pain in large projects comes from complications, so be sure to check out vignette("customizing") and vignette("spatial") to dig into some of these issues.
Now that we have a set of cleaned metadata, the next step is to select recordings. To do this using a random sampling approach, check out the subsampling article, vignette("SubSample").