Data Resource is a simple format to describe a data resource such as an individual table or file, including its name, format, path, etc.
In this document we use the terms “package” for Data Package, “resource” for Data Resource, “dialect” for Table Dialect, and “schema” for Table Schema.
Frictionless supports reading, manipulating and writing resources, but much of its functionality is limited to Tabular Data Resources.
`resources()` lists all resources in a package:
library(frictionless)
package <- example_package()
# List the resources
resources(package)
#> [1] "deployments" "observations" "media"
`read_resource()` reads data from a tabular resource to a data frame:
read_resource(package, "deployments")
#> # A tibble: 3 × 5
#> deployment_id longitude latitude start comments
#> <chr> <dbl> <dbl> <date> <chr>
#> 1 1 4.62 50.8 2020-09-25 <NA>
#> 2 2 4.64 50.8 2020-10-01 "On \"forêt\" road."
#> 3 3 4.65 50.8 2020-10-05 "Malfunction/no photos, data"
Frictionless does not support reading data from non-tabular resources.
`remove_resource()` removes a resource (of any type):
remove_resource(package, "deployments")
#> A Data Package with 2 resources:
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
# This and many other functions return "package", which you can update with
# package <- remove_resource(package, "deployments")
`add_resource()` adds or replaces a tabular resource. The provided data must be a data frame or a tabular data file (e.g. CSV):
# Add a resource with data from a data frame
add_resource(package, "iris", data = iris)
#> A Data Package with 4 resources:
#> • deployments
#> • observations
#> • media
#> • iris
#> Use `unclass()` to print the Data Package as a list.
# Replace a resource with one where data is stored in a tabular file
path <- system.file("extdata", "deployments.csv", package = "frictionless")
add_resource(package, "deployments", data = path, replace = TRUE)
#> A Data Package with 3 resources:
#> • deployments
#> • observations
#> • media
#> Use `unclass()` to print the Data Package as a list.
Note that you can pipe most functions (see `vignette("data-package")`).
`write_package()` writes a package to disk as a `datapackage.json` file. This file includes the metadata of all the resources. `write_package()` also writes resource data to CSV files, unless the data are referred to by URL or are inline. See the function documentation for details.
`name` is required. It is used to identify a resource in `read_resource()`, `add_resource()` and `remove_resource()` (always as the second argument).
`add_resource()` sets `name` to the provided `resource_name`.
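As a minimal sketch of how `name` is used, the example below reads a resource by name and then checks the `name` property that `add_resource()` stores. The resource name `"iris"` and the variable names are chosen for illustration only.

```r
library(frictionless)

package <- example_package()

# The resource name is always the second argument
deployments <- read_resource(package, "deployments")

# add_resource() stores the provided resource_name in the "name" property
package <- add_resource(package, "iris", data = iris)

# The new resource is appended at the end of the resource list
new_resource <- package$resources[[length(package$resources)]]
new_resource$name
```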
`path` or `data` (see further) is required. Providing both is not allowed.
`path` is for data in files (e.g. a CSV file). It can be a local path or URL. Supported protocols are `http`, `https`, `ftp` and `sftp`. Absolute paths (`/`) and relative parent paths (`../`) are not allowed, to avoid security vulnerabilities.
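The path rules above can be sketched in base R. The helpers `is_url()` and `is_allowed_path()` are hypothetical names written for this illustration; they are not part of frictionless and only approximate the checks described here.

```r
# Hypothetical helper: TRUE for paths that start with a supported protocol
is_url <- function(path) {
  grepl("^(https?|ftp|sftp)://", path)
}

# Hypothetical helper: TRUE for URLs and for local paths that are neither
# absolute ("/...") nor relative parent paths ("../...")
is_allowed_path <- function(path) {
  is_absolute <- startsWith(path, "/")
  is_parent <- startsWith(path, "../")
  is_url(path) | !(is_absolute | is_parent)
}

paths <- c(
  "deployments.csv",
  "https://example.com/deployments.csv",
  "/etc/passwd",
  "../deployments.csv"
)
allowed <- is_allowed_path(paths)
allowed
#> [1]  TRUE  TRUE FALSE FALSE
```

This is a simplification: it only looks at the start of the path and does not, for example, inspect `../` segments further along.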
When multiple paths are provided (`"path": ["myfile1.csv", "myfile2.csv"]`), the files are expected to have the same structure. `read_resource()` merges these into a single data frame in the order the paths are provided (using `dplyr::bind_rows()`):
# The "observations" resource has multiple files in path
package$resources[[2]]$path
#> [1] "observations_1.tsv" "observations_2.tsv"
# These are combined into a single data frame when reading
read_resource(package, "observations")
#> # A tibble: 8 × 7
#> observation_id deployment_id timestamp scientific_name count
#> <chr> <chr> <dttm> <chr> <dbl>
#> 1 1-1 1 2020-09-28 00:13:07 Capreolus capreolus 1
#> 2 1-2 1 2020-09-28 15:59:17 Capreolus capreolus 1
#> 3 1-3 1 2020-09-28 16:35:23 Lepus europaeus 1
#> 4 1-4 1 2020-09-28 17:04:04 Lepus europaeus 1
#> 5 1-5 1 2020-09-28 19:19:54 Sus scrofa 2
#> 6 2-1 2 2021-10-01 01:25:06 Sus scrofa 1
#> 7 2-2 2 2021-10-01 01:25:06 Sus scrofa 1
#> 8 2-3 2 2021-10-01 04:47:30 Sus scrofa 1
#> # ℹ 2 more variables: life_stage <fct>, comments <chr>
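To illustrate the merging step in isolation, the sketch below combines two small data frames with `dplyr::bind_rows()`. The data frames `part_1` and `part_2` are made up for this example and stand in for the contents of two files with the same structure.

```r
library(dplyr)

# Two invented data frames with the same columns, as if read from two files
part_1 <- data.frame(observation_id = c("1-1", "1-2"), count = c(1, 1))
part_2 <- data.frame(observation_id = c("2-1"), count = c(2))

# Rows are stacked in the order the data frames are provided
combined <- bind_rows(list(part_1, part_2))
combined$observation_id
#> [1] "1-1" "1-2" "2-1"
```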
`add_resource()` sets `path` to the path(s) provided in `data`.
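A minimal sketch of this behaviour, assuming a small CSV file written to a temporary location: the file contents, the file name and the resource name `"small_table"` are invented for the example.

```r
library(frictionless)

package <- example_package()

# Write a small CSV file to a temporary location (illustrative data)
csv_file <- file.path(tempdir(), "small_table.csv")
write.csv(
  data.frame(id = 1:2, label = c("a", "b")),
  csv_file,
  row.names = FALSE
)

# Provide the file path in `data`
package <- add_resource(package, "small_table", data = csv_file)

# The resource's "path" property holds the path that was provided
new_resource <- package$resources[[length(package$resources)]]
new_resource$path
```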
Note: Support for inline `data` is currently limited, e.g. JSON objects and strings are not supported and `schema`, `mediatype` and `format` are ignored.
`data` is for inline data (included in the `datapackage.json`). `read_resource()` attempts to read `data` if it is provided as a JSON array:
# The "media" resource has inline data
str(package$resources[[3]]$data)
#> List of 3
#> $ :List of 5
#> ..$ media_id : chr "aed5fa71-3ed4-4284-a6ba-3550d1a4de8d"
#> ..$ deployment_id : chr "1"
#> ..$ observation_id: chr "1-1"
#> ..$ timestamp : chr "2020-09-28 02:14:59+02:00"
#> ..$ file_path : chr "https://multimedia.agouti.eu/assets/aed5fa71-3ed4-4284-a6ba-3550d1a4de8d/file"
#> $ :List of 5
#> ..$ media_id : chr "da81a501-8236-4cbd-aa95-4bc4b10a05df"
#> ..$ deployment_id : chr "1"
#> ..$ observation_id: chr "1-1"
#> ..$ timestamp : chr "2020-09-28 02:15:00+02:00"
#> ..$ file_path : chr "https://multimedia.agouti.eu/assets/da81a501-8236-4cbd-aa95-4bc4b10a05df/file"
#> $ :List of 5
#> ..$ media_id : chr "0ba57608-3cf1-49d6-a5a2-fe680851024d"
#> ..$ deployment_id : chr "1"
#> ..$ observation_id: chr "1-1"
#> ..$ timestamp : chr "2020-09-28 02:15:01+02:00"
#> ..$ file_path : chr "https://multimedia.agouti.eu/assets/0ba57608-3cf1-49d6-a5a2-fe680851024d/file"
read_resource(package, "media")
#> # A tibble: 3 × 5
#> media_id deployment_id observation_id timestamp file_path
#> <chr> <chr> <chr> <chr> <chr>
#> 1 aed5fa71-3ed4-4284-a6ba-3550… 1 1-1 2020-09-… https://…
#> 2 da81a501-8236-4cbd-aa95-4bc4… 1 1-1 2020-09-… https://…
#> 3 0ba57608-3cf1-49d6-a5a2-fe68… 1 1-1 2020-09-… https://…
`add_resource()` adds the provided data frame to `data`:
df <- data.frame("col_1" = c(1, 2), "col_2" = c("a", "b"))
package <- add_resource(package, "df", df)
package$resources[[4]]$data
#> col_1 col_2
#> 1 1 a
#> 2 2 b
`write_package()` writes that data frame to a CSV file, adds its path to `path` and removes `data`.
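The round trip can be sketched as follows. The output directory under `tempdir()` is an arbitrary choice for this example, and `read_package()` is used here only to inspect what was written; the exact file name `df.csv` is an assumption based on the resource name.

```r
library(frictionless)

package <- example_package()
df <- data.frame(col_1 = c(1, 2), col_2 = c("a", "b"))
package <- add_resource(package, "df", data = df)

# Write the package to a temporary directory (arbitrary location)
directory <- file.path(tempdir(), "my_package")
write_package(package, directory)

# Read the written datapackage.json back and look at the "df" resource
written <- read_package(file.path(directory, "datapackage.json"))
df_resource <- written$resources[[length(written$resources)]]

df_resource$path          # path to the CSV file written for this resource
is.null(df_resource$data) # TRUE if the inline data were removed
```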
`profile` is required to have the value `"tabular-data-resource"`. `add_resource()` sets `profile` to that value.
`schema` is required. It is used by `read_resource()` to parse data types and missing values. It can either be a JSON object or a path or URL referencing a JSON object. See `vignette("table-schema")` for details.
`dialect` is used by `read_resource()` to parse a tabular data file. It can either be a JSON object or a path or URL referencing a JSON object. See `vignette("table-dialect")` for details.
`title` is ignored by `read_resource()` and not set by `add_resource()`, unless provided.
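As a hedged sketch of providing a `title`: the example assumes the installed version of frictionless lets `add_resource()` accept extra metadata properties as additional named arguments. The title text is invented for illustration.

```r
library(frictionless)

package <- example_package()

# Assumption: extra named arguments (here `title`) are stored as resource
# properties; check ?add_resource for the installed version
package <- add_resource(
  package,
  "iris",
  data = iris,
  title = "Iris flower measurements"
)

new_resource <- package$resources[[length(package$resources)]]
new_resource$title
```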
`description` is ignored by `read_resource()` and not set by `add_resource()` unless provided (cf. `title`).
`format` is ignored by `read_resource()`. `add_resource()` sets `format` when data are provided as a file, based on the provided `delim`:
delim | format |
---|---|
"," (default) | "csv" |
"\t" | "tsv" |
any other value | "csv" |
path <- system.file("extdata", "observations_1.tsv", package = "frictionless")
package <- add_resource(package, "observations", data = path, delim = "\t", replace = TRUE)
package$resources[[2]]$format
#> [1] "tsv"
`add_resource()` leaves `format` undefined when data are provided as a data frame. `write_package()` sets it to `"csv"` when writing to disk.
`mediatype` is ignored by `read_resource()`. `add_resource()` sets `mediatype` when data are provided as a file, based on the provided `delim`:
delim | mediatype |
---|---|
"," (default) | "text/csv" |
"\t" | "text/tab-separated-values" |
any other value | "text/csv" |
path <- system.file("extdata", "observations_1.tsv", package = "frictionless")
package <- add_resource(package, "observations", data = path, delim = "\t", replace = TRUE)
package$resources[[2]]$mediatype
#> [1] "text/tab-separated-values"
`add_resource()` leaves `mediatype` undefined when data are provided as a data frame. `write_package()` sets it to `"text/csv"` when writing to disk.
`encoding` (e.g. `"windows-1252"`) is used by `read_resource()` to parse the file. It defaults to UTF-8 if no `encoding` is provided or if it cannot be recognized. The returned data frame is always UTF-8.
`add_resource()` guesses the `encoding` (using `readr::guess_encoding()`) when data are provided as a file. It leaves the `encoding` undefined when data are provided as a data frame. `write_package()` sets it to `"utf-8"` when writing to disk.
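To give an idea of what the guessing step returns, the sketch below calls `readr::guess_encoding()` directly on a small file written in Latin-1. The file contents are invented, and the guesses (and their confidence) depend on the input, so no particular result is shown.

```r
library(readr)

# Write a small CSV file with non-ASCII characters encoded as Latin-1
csv_file <- tempfile(fileext = ".csv")
text <- c("id,comment", "1,On for\u00eat road", "2,Caf\u00e9 near the bridge")
writeLines(iconv(text, from = "UTF-8", to = "latin1"), csv_file, useBytes = TRUE)

# Returns a data frame of candidate encodings with a confidence score
guesses <- guess_encoding(csv_file)
guesses
```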
`bytes` is ignored by `read_resource()` and not set by `add_resource()` unless provided (cf. `title`).
`hash` is ignored by `read_resource()` and not set by `add_resource()` unless provided (cf. `title`).
`sources` is ignored by `read_resource()` and not set by `add_resource()` unless provided (cf. `title`).
`licenses` is ignored by `read_resource()` and not set by `add_resource()` unless provided (cf. `title`).
`compression` (a recipe) is ignored by `read_resource()` and not set by `add_resource()`. Compression is derived from the provided `path` instead. If the `path` ends in `.gz`, `.bz2`, `.xz`, or `.zip`, the files are automatically decompressed by `read_resource()` (using default `readr::read_delim()` functionality). Only `.gz` files can be read directly from URL `path`s.
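The readr behaviour this relies on can be sketched on its own: readr compresses on write and decompresses on read, based on the file extension. The file and its contents below are invented for the example.

```r
library(readr)

# write_csv() compresses the output because the file name ends in .gz
gz_file <- tempfile(fileext = ".csv.gz")
write_csv(data.frame(id = 1:3, value = c("a", "b", "c")), gz_file)

# read_delim() decompresses the file automatically, based on the extension
df <- read_delim(gz_file, delim = ",", show_col_types = FALSE)
names(df)
#> [1] "id"    "value"
```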