The goal of ucimlrepo
is to download and import data
sets directly into R from the UCI
Machine Learning Repository.
[!IMPORTANT]
This package is an unofficial port of the Python
ucimlrepo
package.
[!NOTE]
Want datasets that ship with their own help documentation?
Check out the
{ucidata}
R package! It provides a small selection of data sets from the UC Irvine Machine Learning Repository, each accompanied by a help entry.
You can install the development version of ucimlrepo from GitHub with:
# install.packages("remotes")
remotes::install_github("coatless-rpkg/ucimlrepo")
To use ucimlrepo, load the package using:
library(ucimlrepo)
With the package now loaded, we can download a dataset using the
fetch_ucirepo()
function or use the
list_available_datasets()
function to view a list of
available datasets.
For example, to download the iris
dataset, we can
use:
# Fetch a dataset by name
iris_by_name <- fetch_ucirepo(name = "iris")
names(iris_by_name)
#> [1] "data" "metadata" "variables"
There are many levels to the data returned. For example, we can
extract the original data frame containing the iris
dataset
using:
iris_uci <- iris_by_name$data$original
head(iris_uci)
#>   sepal length sepal width petal length petal width       class
#> 1          5.1         3.5          1.4         0.2 Iris-setosa
#> 2          4.9         3.0          1.4         0.2 Iris-setosa
#> 3          4.7         3.2          1.3         0.2 Iris-setosa
#> 4          4.6         3.1          1.5         0.2 Iris-setosa
#> 5          5.0         3.6          1.4         0.2 Iris-setosa
#> 6          5.4         3.9          1.7         0.4 Iris-setosa
Alternatively, we could retrieve two data frames, one for the features and one for the targets:
iris_features <- iris_by_name$data$features
iris_targets <- iris_by_name$data$targets
We can then view the first few rows of each data frame:
head(iris_features)
#>   sepal length sepal width petal length petal width
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3.0          1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> 5          5.0         3.6          1.4         0.2
#> 6          5.4         3.9          1.7         0.4
head(iris_targets)
#>         class
#> 1 Iris-setosa
#> 2 Iris-setosa
#> 3 Iris-setosa
#> 4 Iris-setosa
#> 5 Iris-setosa
#> 6 Iris-setosa
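Because the features and targets come back as separate data frames, it can be handy to recombine them before modelling. The sketch below is a minimal illustration using base R's cbind(). To stay runnable offline, it builds a small hypothetical stand-in (iris_mock) that only mirrors the data$features and data$targets structure shown above, rather than calling fetch_ucirepo().

```r
# Hypothetical stand-in for the object returned by fetch_ucirepo();
# only the data$features / data$targets shape shown above is assumed.
iris_mock <- list(
  data = list(
    features = data.frame(
      `sepal length` = c(5.1, 4.9, 4.7),
      `sepal width`  = c(3.5, 3.0, 3.2),
      check.names = FALSE  # keep the spaces in the column names
    ),
    targets = data.frame(class = rep("Iris-setosa", 3))
  )
)

# Bind the feature columns and the target column side by side
iris_combined <- cbind(iris_mock$data$features, iris_mock$data$targets)

names(iris_combined)
```

With a real download, the same cbind() call could be applied to iris_features and iris_targets.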
You can also fetch a dataset directly by its ID, which can be found by
using list_available_datasets()
or by looking up the
dataset on the UCI ML Repo website:
# Fetch a dataset by id
iris_by_id <- fetch_ucirepo(id = 53)
We can also view a list of data sets available for download using the
list_available_datasets()
function:
# List available datasets
list_available_datasets()
[!NOTE]
Not all 600+ datasets on UCI ML Repo are available for download using the package. The current list of available datasets can be viewed here.
If you would like to see a specific dataset added, please submit a comment on an issue ticket in the upstream repository.
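Since some datasets cannot be downloaded, a script that requests several of them may want to fail gracefully. The following is a minimal sketch, assuming fetch_ucirepo() signals a regular R error when a dataset is unavailable (the text above does not confirm this). safe_fetch() and failing_fetch() are hypothetical helpers for illustration, not part of the package.

```r
# Hypothetical helper: call a fetch function and return NULL instead of
# stopping when it raises an error.
safe_fetch <- function(fetch_fun, ...) {
  tryCatch(
    fetch_fun(...),
    error = function(e) {
      message("Could not fetch dataset: ", conditionMessage(e))
      NULL
    }
  )
}

# Stub standing in for a failing download, so the sketch runs offline
failing_fetch <- function(id) stop("dataset ", id, " is not available")
result <- safe_fetch(failing_fetch, id = 1)
is.null(result)

# With the package loaded, the same helper could wrap the real function:
# iris_by_id <- safe_fetch(fetch_ucirepo, id = 53)
```

Passing the fetch function as an argument keeps the helper independent of the package, so it can be tried with a stub before any network access is involved.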