This vignette is an introduction to condvis2. This package constructs shiny-based interactive visualisations of prediction models.
We will use the airquality data built in to R.
ozone <- na.omit(airquality)
The first fit we consider uses Wind only to predict ozone. Here we construct the fit and plot the results.
fit1 <- loess(Ozone~Wind, data=ozone)
plot(Ozone~Wind, data=ozone, xlim=c(1.5, 21), ylim=c(-5,175))
wind <- seq(min(ozone$Wind), max(ozone$Wind), length.out=30)
lines(wind, predict(fit1, data.frame(Wind=wind)))
If we add in a second predictor, we can either move to plotting the surface, or plot how the fit relates to one predictor, for a fixed value of another.
fit2 <- loess(Ozone~Wind+Solar.R, data=ozone)
par(mfrow=c(1,3))
par(mar=c(5,5,3,1))
for (s in quantile(ozone$Solar.R, c(.25,.5,.75))){
  plot(Ozone~Wind, data=subset(ozone, Solar.R <= s+20 & Solar.R >= s-20),
       xlim=c(1.5, 21), ylim=c(-5,175), main=paste0("Solar.R=",s))
  lines(wind, predict(fit2, data.frame(Wind=wind, Solar.R=s)))
}
Notice that in the above plots we plot only the observations whose Solar.R values are close to the value used for prediction.
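The selection of nearby observations can be written as a small helper. The following is a minimal sketch in base R. The function name near_section and its width argument are illustrative assumptions, not part of condvis2, and the default half-width of 20 simply mirrors the window used in the loop above.

```r
# Hypothetical helper (not a condvis2 function): keep the rows whose value of
# `var` lies within `width` of the conditioning value `value`.
near_section <- function(data, var, value, width = 20) {
  data[abs(data[[var]] - value) <= width, , drop = FALSE]
}

ozone <- na.omit(airquality)
s <- median(ozone$Solar.R)
section <- near_section(ozone, "Solar.R", s)

# Offsets of the retained observations from the conditioning value,
# and the number of observations kept.
range(section$Solar.R - s)
nrow(section)
```

A hard window like this treats every retained observation equally. A smoother alternative is to weight observations by their distance from the conditioning value.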
These plots can be constructed interactively in condvis2:
suppressMessages(library(condvis2))
condvis(ozone, fit2,sectionvars="Wind", conditionvars="Solar.R")
This gives the following display:
The section variable can be changed using the Choose a sectionvar menu. The display lets us see how well the curve fits the observations in its section.
If we include another predictor in the fit
fit3 <- loess(Ozone~Wind+Solar.R+Temp, data=ozone)
and invoke condvis
condvis(ozone, fit3, sectionvars="Wind", conditionvars=c("Solar.R", "Temp"))
the result is shown in the screenshot below.
You can also pick a second section variable, to see how the fitted surface looks as a function of two variables, while the third is modified interactively.
The image shows the levels of the fitted surface as a function of two predictors. The points shown are those near the fixed conditioning values. In an image plot, the point colour represents the observed response, using the same colour scale as the fitted surface. The size of a point conveys the distance of that observation's conditioning values from the fixed conditioning values: larger points are closer. You will see some points in the main display whose colour does not match the image colour, for example, the two darker points around Wind=10, with low Solar.R. As these points are large, they lie close to the conditioning values, so they are poorly fit.
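The idea of drawing closer observations with larger points can be sketched in base R as follows. This is an illustration under stated assumptions, not the condvis2 implementation: the linear weighting, the threshold of one standard deviation, and the size mapping are all choices made here for the example.

```r
ozone <- na.omit(airquality)

# Fixed conditioning value for Temp (the median, chosen only for illustration).
temp0 <- median(ozone$Temp)

# Distance of each observation from the conditioning value, in standard
# deviation units.
d <- abs(ozone$Temp - temp0) / sd(ozone$Temp)

# Assumed weighting: 1 at the conditioning value, falling linearly to 0 at
# distance `threshold`. Observations beyond the threshold are not drawn.
threshold <- 1
w <- pmax(0, 1 - d / threshold)
shown <- w > 0

# Point size grows with the weight: larger points are closer to temp0.
plot(Wind ~ Solar.R, data = ozone[shown, ], cex = 0.5 + 1.5 * w[shown],
     main = paste0("Temp=", temp0))
```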
There is very little data at low Temp, and none of these observations have a Wind value below 7. So the steep incline at the left hand side of the main plot is not supported by the data. Prediction in this area is extrapolation.
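Whether a region of the predictor space is supported by data can also be checked numerically. The sketch below assumes that "low Temp" means the lowest 10% of observed temperatures. Both that cutoff and the query value wind_query are hypothetical choices for illustration.

```r
ozone <- na.omit(airquality)

# Assumed definition of "low Temp": at or below the 10% quantile.
cutoff <- quantile(ozone$Temp, 0.10)
low_temp <- subset(ozone, Temp <= cutoff)

# How many observations support the fit in this region, and which Wind
# values do they cover?
nrow(low_temp)
range(low_temp$Wind)

# Hypothetical check: does predicting at this Wind value, for low Temp,
# fall outside the Wind range spanned by the low-Temp observations?
wind_query <- 5
extrapolating <- wind_query < min(low_temp$Wind) ||
  wind_query > max(low_temp$Wind)
extrapolating
```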
Alternatively, the relationship between Ozone and (Wind, Solar.R) for fixed Temp may be displayed as a 3-d surface. The points near the selected value of Temp are dark blue, lightening in colour as the distance increases. The residual sizes are shown via line segments.
We could fit a second model to the same data and compare the results.
library(e1071)
fit4 <- svm(Ozone~Wind+Solar.R+Temp, data=ozone)
condvis(ozone, list(loess=fit3,svm=fit4), sectionvars="Wind", conditionvars=c("Solar.R", "Temp"))
We see big differences between the fits, especially for low Wind.
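Differences between two fits along a section can also be summarised numerically. The sketch below is an illustration only: to stay within base R it compares the loess fit with a linear model rather than the svm fit, and it fixes the conditioning variables at their medians, which is an arbitrary choice.

```r
ozone <- na.omit(airquality)
fit_loess <- loess(Ozone ~ Wind + Solar.R + Temp, data = ozone)
fit_lm <- lm(Ozone ~ Wind + Solar.R + Temp, data = ozone)  # stand-in for svm

# A section along Wind at fixed (median) values of the other predictors.
grid <- data.frame(
  Wind = seq(min(ozone$Wind), max(ozone$Wind), length.out = 30),
  Solar.R = median(ozone$Solar.R),
  Temp = median(ozone$Temp)
)

# Difference between the two fitted curves along the section.
gap <- predict(fit_loess, grid) - predict(fit_lm, grid)

# Wind value at which the two fits disagree most along this section.
grid$Wind[which.max(abs(gap))]
```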
Note: condvis uses CVpredict to extract predictions from fitted models. By default CVpredict uses predict, but there are customised versions of CVpredict for many fit types. (See the help page.)
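The pattern described in this note, a generic that falls back to predict but can be customised per fit type, can be sketched with S3 dispatch. The generic below is deliberately named my_predict and is hypothetical: it illustrates the mechanism only and is not the CVpredict source.

```r
# Hypothetical generic, named my_predict to avoid confusion with CVpredict.
my_predict <- function(fit, newdata, ...) UseMethod("my_predict")

# Default method: defer to the model's own predict method.
my_predict.default <- function(fit, newdata, ...) {
  predict(fit, newdata, ...)
}

# Customised method for loess fits: always return a plain numeric vector,
# even if standard errors were requested through `...`.
my_predict.loess <- function(fit, newdata, ...) {
  p <- predict(fit, newdata, ...)
  if (is.list(p)) p$fit else p
}

ozone <- na.omit(airquality)
new <- data.frame(Wind = c(5, 10, 15))

# Dispatches to the default method.
p_lm <- my_predict(lm(Ozone ~ Wind, data = ozone), new)
# Dispatches to the loess method.
p_lo <- my_predict(loess(Ozone ~ Wind, data = ozone), new, se = TRUE)
```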
condvis2 supports confidence intervals for fits that offer them.
For an lm fit, use:
fit5 <- lm(Ozone~Wind+Solar.R+Temp, data=ozone)
condvis(ozone, fit5,
        sectionvars="Wind", conditionvars=c("Solar.R", "Temp"), predictArgs=list(list(pinterval="confidence")))
The loess fit does not provide confidence intervals but it does offer standard errors, which can be used for confidence bounds:
# Copy of the loess fit whose predictions are the upper bound
fitu <- fit3
class(fitu) <- c("upper", class(fitu))
CVpredict.upper <- function(f, newdata, ...){
  p <- predict(f, newdata, se=TRUE)
  p$fit + 2*p$se.fit
}
# Copy of the loess fit whose predictions are the lower bound
fitl <- fit3
class(fitl) <- c("lower", class(fitl))
CVpredict.lower <- function(f, newdata, ...){
  p <- predict(f, newdata, se=TRUE)
  p$fit - 2*p$se.fit
}
condvis(ozone, list(loess=fit3,lower=fitl,upper=fitu),
sectionvars="Wind", conditionvars=c("Solar.R", "Temp"),
linecols=c("red", "blue","blue"))
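The same bounds can be checked outside condvis2 with a static plot. The sketch below is a simplified illustration in base R: it uses a loess fit on Wind alone rather than fit3, and keeps the multiplier of 2 used above, which gives roughly 95% bounds if the predictions are approximately normal.

```r
ozone <- na.omit(airquality)
fit <- loess(Ozone ~ Wind, data = ozone)
wind <- seq(min(ozone$Wind), max(ozone$Wind), length.out = 30)

# With se = TRUE, predict returns a list that includes fit and se.fit.
p <- predict(fit, data.frame(Wind = wind), se = TRUE)
upper <- p$fit + 2 * p$se.fit
lower <- p$fit - 2 * p$se.fit

# Fitted curve in red, bounds in blue, matching the linecols used above.
plot(Ozone ~ Wind, data = ozone)
lines(wind, p$fit, col = "red")
lines(wind, lower, col = "blue")
lines(wind, upper, col = "blue")
```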
As the conditioning dimension increases, it might be easier to view sections that are chosen automatically. The tour controls offer several options.
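Use Tour Step to watch the tour.
The tour length is set with the Tour Length slider.
The number of interpolation steps can be increased from 0 with Interp steps.
library(ks)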
data(iris)
irisf <- kde(x=iris[,1:3])
condvis(data = iris, model = list(kde=irisf),
sectionvars= c("Sepal.Length", "Sepal.Width"),
conditionvars= "Petal.Length", density=T)
The result is shown in the screenshot below. It shows the estimated density of two variables conditional on the third.
Catherine B. Hurley, Mark O’Connell, Katarina Domijan. (2021) Interactive slice visualization for exploring machine learning models. arXiv 2101.06986.
Mark O’Connell, Catherine Hurley, Katarina Domijan. (2017) Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R. Journal of Statistical Software 81(5) 1–20.