The ridigbio package can be used to obtain records from the iDigBio APIs, including both the Search API and the Media API.
In this demo we will cover how to:
- Install and load ridigbio
- Download records with `idig_search_records()`
- Download media records with `idig_search_media()`
First, you must install the ridigbio package. If you are new to R and RStudio, please refer to our QUBES module to get started: Introduction to R with Biodiversity Data, doi:10.25334/84FC-TE88.
The latest version of our R package can be installed from CRAN.
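A minimal sketch of the installation step, assuming a standard R setup with access to a CRAN mirror:

```r
# Install the released version of ridigbio from CRAN (only needed once per R library).
install.packages("ridigbio")
```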
Before downloading any records, you must load the ridigbio package.
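Loading can be done with `library()`; this sketch assumes the package is already installed:

```r
# Attach ridigbio so its functions are available in the current session.
library(ridigbio)
```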
To download records from the Search API, we will use the function `idig_search_records()`. Here the `rq`, or record query, indicates we want to download all the records where the `scientificname` is equal to Galax urceolata.
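The call described above might look like the following sketch. The object name `galax_records` is chosen here for illustration, and the call needs an internet connection to reach the iDigBio API:

```r
library(ridigbio)

# Record query (rq): match records whose scientificname is "Galax urceolata".
galax_records <- idig_search_records(rq = list(scientificname = "Galax urceolata"))

# List the columns returned when `fields` is not specified.
colnames(galax_records)
```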
```
## [1] "uuid" "occurrenceid" "catalognumber"
## [4] "family" "genus" "scientificname"
## [7] "country" "stateprovince" "geopoint.lon"
## [10] "geopoint.lat" "data.dwc:eventDate" "data.dwc:year"
## [13] "data.dwc:month" "data.dwc:day" "collector"
## [16] "recordset"
```
When fields are not specified, default columns include the following:
Column | Description |
---|---|
uuid | Universally Unique IDentifier assigned by iDigBio |
occurrenceid | identifier for the occurrence, https://rs.tdwg.org/dwc/terms/occurrenceID |
catalognumber | identifier for the record within the collection, https://rs.tdwg.org/dwc/terms/catalogNumber |
family | scientific name of the family, https://rs.tdwg.org/dwc/terms/family |
genus | scientific name of the genus, https://rs.tdwg.org/dwc/terms/genus |
scientificname | scientific name, https://rs.tdwg.org/dwc/terms/scientificName |
country | country, https://rs.tdwg.org/dwc/terms/country |
stateprovince | name of the next smaller administrative region than country, https://rs.tdwg.org/dwc/terms/stateProvince |
geopoint.lon | equivalent to decimalLongitude, https://rs.tdwg.org/dwc/terms/decimalLongitude |
geopoint.lat | equivalent to decimalLatitude, https://rs.tdwg.org/dwc/terms/decimalLatitude |
datecollected | Modified field and could lack biological meaning |
data.dwc:eventDate | equivalent to eventDate, https://dwc.tdwg.org/list/#dwc_eventDate |
data.dwc:year | year of collection event, https://dwc.tdwg.org/list/#dwc_year |
data.dwc:month | month of collection event, https://dwc.tdwg.org/list/#dwc_month |
data.dwc:day | day of collection event |
collector | equivalent to recordedBy, https://rs.tdwg.org/dwc/terms/recordedBy |
recordset | indicates the iDigBio recordset the observation belongs to |
In addition to `scientificname`, the record query may be based on many other fields. For example, you can use the `family` field to search for all members of the family Diapensiaceae:
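A family-level search could be sketched as follows. The object name `diapensiaceae_records` and the `limit` value of 1000 are assumptions made for this example, not values taken from the text:

```r
library(ridigbio)

# Record query on the family field; `limit` caps the number of rows returned
# (1000 is an arbitrary value chosen for this sketch).
diapensiaceae_records <- idig_search_records(
  rq = list(family = "Diapensiaceae"),
  limit = 1000
)

# Check how many records came back.
nrow(diapensiaceae_records)
```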
What if you want to read in all the points for a family within an extent?
Hint: Use the iDigBio portal to determine the bounding box for your region of interest.
The bounding box delimits the geographic extent.
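```r
# Keep records that have a scientific name, belong to Diapensiaceae,
# and fall inside the bounding box.
rq_input <- list(
  "scientificname" = list("type" = "exists"),
  "family" = "Diapensiaceae",
  geopoint = list(
    type = "geo_bounding_box",
    top_left = list(lon = -98.16, lat = 48.92),
    bottom_right = list(lon = -64.02, lat = 23.06)
  )
)
```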
Search using the input you just made:
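One way to run that search is sketched below. The query is repeated so the snippet runs on its own; the object name `diapensiaceae_extent` and the `limit` value are assumptions made for illustration:

```r
library(ridigbio)

# Same query as defined above, repeated so this snippet is self-contained.
rq_input <- list(
  "scientificname" = list("type" = "exists"),
  "family" = "Diapensiaceae",
  geopoint = list(
    type = "geo_bounding_box",
    top_left = list(lon = -98.16, lat = 48.92),
    bottom_right = list(lon = -64.02, lat = 23.06)
  )
)

# Pass the query list as rq; limit = 1000 is an arbitrary cap for this sketch.
diapensiaceae_extent <- idig_search_records(rq = rq_input, limit = 1000)

# Preview the first few rows.
head(diapensiaceae_extent)
```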
To download media records from the Media API, we will use the function `idig_search_media()`. Here the `rq`, or record query, indicates we want to download all the records where the `scientificname` is equal to Galax urceolata.
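A sketch of that media search follows. The object name `galax_media` is an assumption made here for illustration, and the call needs an internet connection:

```r
library(ridigbio)

# Media search filtered by a record query (rq) on scientificname.
galax_media <- idig_search_media(rq = list(scientificname = "Galax urceolata"))

# List the columns returned when `fields` is not specified.
colnames(galax_media)
```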
```
## [1] "accessuri" "datemodified" "dqs" "etag"
## [5] "flags" "format" "hasSpecimen" "licenselogourl"
## [9] "mediatype" "modified" "recordids" "records"
## [13] "recordset" "rights" "tag" "type"
## [17] "uuid" "version" "webstatement" "xpixels"
## [21] "ypixels"
```
When fields are not specified, default columns include the following:
Column | Description |
---|---|
accessuri | Unique identifier for a resource, https://ac.tdwg.org/termlist/#ac_accessURI |
datemodified | date last modified, which is assigned by iDigBio |
dqs | data quality score assigned by iDigBio |
etag | tag assigned by iDigBio |
flags | data quality flag assigned by iDigBio |
format | media format, https://purl.org/dc/terms/format |
hasSpecimen | TRUE or FALSE, indicates if there is an associated record for this media |
licenselogourl | media license, https://ac.tdwg.org/termlist/#ac_licenseLogoURL |
mediatype | media object type |
modified | date modified, https://purl.org/dc/terms/modified |
recordids | list of UUIDs for associated records |
records | UUID for the associated record. Use this field to connect Record downloads with Media downloads |
recordset | indicates the iDigBio recordset the observation belongs to |
rights | media rights, https://purl.org/dc/terms/rights |
tag | general keywords or tags, https://rs.tdwg.org/ac/terms/tag |
type | media type, https://purl.org/dc/terms/type |
uuid | Universally Unique IDentifier assigned by iDigBio |
version | media record version assigned by iDigBio |
webstatement | media rights, https://developer.adobe.com/xmp/docs/XMPNamespaces/xmpRights/ |
xpixels | as defined by EXIF, x dimension in pixels |
ypixels | as defined by EXIF, y dimension in pixels |
The media search above returned 341 rows; however, some of these observations have no information in the `accessuri` field. To obtain only records with `accessuri`, we request only records where `data.ac:accessURI` exists by setting `mq`, or media query, as follows:
```r
galax_media2 <- idig_search_media(
  rq = list(scientificname = "Galax urceolata"),
  mq = list("data.ac:accessURI" = list("type" = "exists"))
)
```
Now we have 327 observations with `accessuri`!
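The effect of the media query can be checked by comparing row counts with and without it. This is only a sketch: the name `galax_media` for the unfiltered result is an assumption, and the counts may differ from those quoted above as iDigBio data change over time:

```r
library(ridigbio)

rq <- list(scientificname = "Galax urceolata")

# Unfiltered media search versus the search restricted to existing accessURI values.
galax_media <- idig_search_media(rq = rq)
galax_media2 <- idig_search_media(
  rq = rq,
  mq = list("data.ac:accessURI" = list(type = "exists"))
)

# Compare the number of rows returned by each search.
nrow(galax_media)
nrow(galax_media2)

# Count rows in the filtered result whose accessuri is still missing.
sum(is.na(galax_media2$accessuri))
```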