Function | Works |
---|---|
tidypredict_fit(), tidypredict_sql(), parse_model() | ✔ |
tidypredict_to_column() | ✗ |
tidypredict_test() | ✗ |
tidypredict_interval(), tidypredict_sql_interval() | ✗ |
parsnip | ✔ |

Here is a simple ranger() model using the iris dataset:

library(dplyr)
library(tidypredict)
library(ranger)
model <- ranger(Species ~ ., data = iris, num.trees = 100)

The parser is based on the output from the ranger::treeInfo() function. It returns as many decision paths as there are non-NA rows in the prediction field, that is, one path per terminal node.

treeInfo(model) %>%
head()
#> nodeID leftChild rightChild splitvarID splitvarName splitval terminal
#> 1 0 1 2 3 Petal.Width 0.75 FALSE
#> 2 1 NA NA NA <NA> NA TRUE
#> 3 2 3 4 3 Petal.Width 1.75 FALSE
#> 4 3 5 6 0 Sepal.Length 7.10 FALSE
#> 5 4 NA NA NA <NA> NA TRUE
#> 6 5 7 8 3 Petal.Width 1.65 FALSE
#> prediction
#> 1 <NA>
#> 2 setosa
#> 3 <NA>
#> 4 <NA>
#> 5 virginica
#> 6 <NA>
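
To make the link between terminal nodes and decision paths concrete, here is a minimal sketch in base R. It uses a small hand-built table shaped like the treeInfo() output (the table and the helper path_to() are hypothetical and are not part of tidypredict), and it assumes the `<` / `>=` convention seen in the printed formulas in this article. It is an illustration of the idea, not the parser's actual implementation.

```r
# Hypothetical three-leaf tree, laid out like ranger::treeInfo() output.
tree <- data.frame(
  nodeID       = 0:4,
  leftChild    = c(1, NA, 3, NA, NA),
  rightChild   = c(2, NA, 4, NA, NA),
  splitvarName = c("Petal.Width", NA, "Petal.Width", NA, NA),
  splitval     = c(0.75, NA, 1.75, NA, NA),
  prediction   = c(NA, "setosa", NA, "versicolor", "virginica"),
  stringsAsFactors = FALSE
)

# Walk from a terminal node back to the root, collecting one condition per split.
# Assumption: left child means "<", right child means ">=".
path_to <- function(node_id, tree) {
  conds <- character(0)
  repeat {
    is_left  <- which(tree$leftChild == node_id)
    is_right <- which(tree$rightChild == node_id)
    if (length(is_left) == 0 && length(is_right) == 0) break  # reached the root
    parent <- if (length(is_left) > 0) is_left else is_right
    op     <- if (length(is_left) > 0) "<" else ">="
    conds  <- c(conds, paste(tree$splitvarName[parent], op, tree$splitval[parent]))
    node_id <- tree$nodeID[parent]
  }
  paste(conds, collapse = " & ")
}

# One path per row with a non-NA prediction.
is_leaf <- !is.na(tree$prediction)
paths   <- vapply(tree$nodeID[is_leaf], path_to, character(1), tree = tree)
names(paths) <- tree$prediction[is_leaf]
paths
```

Each element of paths is the condition for one terminal node, so its length equals the number of non-NA prediction rows.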
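The output from parse_model() is transformed into a dplyr, a.k.a. Tidy Eval, formula. The entire decision tree becomes one dplyr::case_when() statement. For a ranger model the result is a list of such statements, so the [1] in the call below shows only the first tree.
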
tidypredict_fit(model)[1]
#> [[1]]
#> case_when(Petal.Width < 0.75 ~ "setosa", Petal.Width >= 1.75 &
#> Petal.Width >= 0.75 ~ "virginica", Sepal.Length >= 7.1 &
#> Petal.Width < 1.75 & Petal.Width >= 0.75 ~ "virginica", Petal.Length >=
#> 5.35 & Petal.Width < 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "virginica", Sepal.Width < 2.75 &
#> Petal.Width >= 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "virginica", Sepal.Width >=
#> 2.75 & Petal.Width >= 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "versicolor", Petal.Width <
#> 1.45 & Petal.Length < 5.35 & Petal.Width < 1.65 & Sepal.Length <
#> 7.1 & Petal.Width < 1.75 & Petal.Width >= 0.75 ~ "versicolor",
#> Petal.Length < 5 & Petal.Width >= 1.45 & Petal.Length < 5.35 &
#> Petal.Width < 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "versicolor", Sepal.Length <
#> 6.15 & Petal.Length >= 5 & Petal.Width >= 1.45 & Petal.Length <
#> 5.35 & Petal.Width < 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "versicolor", Sepal.Length >=
#> 6.15 & Petal.Length >= 5 & Petal.Width >= 1.45 & Petal.Length <
#> 5.35 & Petal.Width < 1.65 & Sepal.Length < 7.1 & Petal.Width <
#> 1.75 & Petal.Width >= 0.75 ~ "virginica")
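
Because each tree is an ordinary R expression, it can be evaluated like any other quoted call. The sketch below assumes a hand-shortened, hypothetical three-branch tree (tree_expr is written out by hand and is not the model's actual output) and injects it into dplyr::mutate() with `!!`.

```r
library(dplyr)

# Hypothetical, shortened tree in the same case_when() form as above.
tree_expr <- quote(case_when(
  Petal.Width < 0.75 ~ "setosa",
  Petal.Width >= 1.75 & Petal.Width >= 0.75 ~ "virginica",
  Petal.Width < 1.75 & Petal.Width >= 0.75 ~ "versicolor"
))

# Inject the quoted expression so mutate() evaluates it against iris columns.
scored <- mutate(iris, tree_pred = !!tree_expr)

# Cross-tabulate the true species against the single-tree prediction.
count(scored, Species, tree_pred)
```

The same pattern should apply to any single element of the list returned by tidypredict_fit().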
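From there, the Tidy Eval formula can be used anywhere it can be evaluated. tidypredict provides three paths:

- Use directly inside dplyr, mutate(iris, !! tidypredict_fit(model))
- Use tidypredict_to_column(model) in a piped set of commands to add the prediction as a new column (listed in the table above as not supported for ranger models)
- Use tidypredict_sql(model, con), where con is a database connection, to retrieve the SQL statement

tidypredict also supports ranger model objects fitted via the parsnip package.
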
library(parsnip)
parsnip_model <- rand_forest(mode = "classification") %>%
  set_engine("ranger") %>%
  fit(Species ~ ., data = iris)

tidypredict_fit(parsnip_model)[[1]]
#> case_when(Petal.Width < 0.7 ~ "setosa", Petal.Length < 4.85 &
#> Petal.Width >= 0.7 ~ "versicolor", Petal.Width < 1.75 & Petal.Length >=
#> 4.85 & Petal.Width >= 0.7 ~ "versicolor", Petal.Width >=
#> 1.75 & Petal.Length >= 4.85 & Petal.Width >= 0.7 ~ "virginica")
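
As a rough illustration of the SQL path mentioned earlier, the tree printed above can be translated with dbplyr. This is a sketch under assumptions: it copies the printed case_when() by hand into tree_expr (a hypothetical name), and it uses dbplyr::translate_sql() with a simulated connection from dbplyr::simulate_dbi() as a stand-in for a real database. The exact SQL that tidypredict_sql() produces may differ.

```r
library(dbplyr)

# The single tree printed above, copied by hand as a quoted expression.
tree_expr <- quote(case_when(
  Petal.Width < 0.7 ~ "setosa",
  Petal.Length < 4.85 & Petal.Width >= 0.7 ~ "versicolor",
  Petal.Width < 1.75 & Petal.Length >= 4.85 & Petal.Width >= 0.7 ~ "versicolor",
  Petal.Width >= 1.75 & Petal.Length >= 4.85 & Petal.Width >= 0.7 ~ "virginica"
))

# Simulated connection: no database is needed to inspect the generated SQL.
con <- simulate_dbi()

# Translate the R expression into a SQL CASE expression.
tree_sql <- translate_sql(!!tree_expr, con = con)
tree_sql
```

With a real connection, the resulting CASE expression could be placed in the SELECT clause of a query so that scoring runs inside the database.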