This vignette explains nls14, an R package that aims to bring the R function nls() up to date and to extend the symbolic and automatic derivative tools in R.
A particular goal in nls14 is to use analytic or automatic derivatives wherever possible; nls() uses a rather weak forward-difference approximation to the derivatives. A second objective is to use a Marquardt stabilization of the Gauss-Newton equations to avoid the commonly encountered “singular gradient” failure of nls(), which refers to the loss of rank of the Jacobian at the parameters supplied for evaluation. The particular stabilization also incorporates a very simple trick to avoid very small diagonal elements of the Jacobian inner product, though in the present implementation this is accomplished indirectly. See the section Implementation of method below.
A large part of the work of this package was carried out by Duncan Murdoch. Without his input, much of the capability of the package would not exist.
The package and this vignette are a work in progress, and assistance and examples are welcome. Note that there is similar work in the package Deriv (Andrew Clausen and Serguei Sokol (2015), Symbolic Differentiation, version 2.0, http://CRAN.R-project.org/package=Deriv).
coef() extracts and displays the coefficients for a model estimated by nlxb() or nlfb() in the nls14 structured object.
Functions Deriv and fnDeriv are designed as replacements for the stats package functions D and deriv respectively, though the argument lists do not match exactly.
This function finds common subexpressions in an expression vector so that duplicate computation can be avoided.
These functions create functions to evaluate residuals or sums of squares at particular parameter locations.
These functions create functions to evaluate residuals or sums of squares at particular parameter locations. The attempt is to have dot-args available. This is experimental.
Outputs a description of the model and the data that was used at the time of its creation, to the console and optionally to a file. The purpose of this function is to provide a record of the details underlying the function.
Given a nonlinear model expressed as an expression of the form lhs ~ formula_for_rhs and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.
Given a nonlinear model expressed as an expression of the form lhs ~ formula_for_rhs and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.
Given a nonlinear model expressed as an expression of the form lhs ~ formula_for_rhs and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.
Print summary output (but involving some serious computations!) of an object of class nls14 from nls14xb or nls14fb from package nls14.
For a nonlinear model originally expressed as an expression of the form lhs ~ formula_for_rhs, assume we have a resfn and jacfn that compute the residuals and the Jacobian at a set of parameters. This routine computes the gradient, that is, t(Jacobian) %*% residuals.
Functions to simplify expressions. Simplify() simplifies expressions according to rules held in the package's simplification tables.
Provide a summary output (but involving some serious computations!) of an object of class nls14 from nls14xb or nls14fb from package nls14.
Given a nonlinear model expressed as an expression of the form
lhs ~ formula_for_rhs
and a start vector where parameters used in the model formula are named, attempts to find the minimum of the residual sum of squares using the Nash variant (Nash, 1979) of the Marquardt algorithm, where the linear sub-problem is solved by a qr method.
Multi-line functions present a challenge. This is in part because the chain rule may have to be applied backwards (last line first), but also because there may be structures that are not always differentiable, such as if statements.
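As a small illustration of the difficulty, consider the following two-line function (f2 is our own hypothetical example, not from the package). To differentiate the value of its last line with respect to x, the chain rule must first be applied to the assignment in the line above it, and the if() branch has no defined derivative at the switch point x = 0.
f2 <- function(x) {
   u <- if (x > 0) log(x) else x # branching structure, not smooth at x = 0
   u^2 # differentiating this needs d(u)/dx from the previous line
}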
We would like to be able to find the Jacobian or gradient of functions that have as their parameters a vector, e.g., prm. At time of writing (January 2015) we cannot specify such a vector within nls14. The following script shows things that work or fail.
# tryhobbsderiv.R
hobbs.res<-function(x){ # Hobbs weeds problem -- residual
# This variant is vectorized (no explicit loop)
# if(length(x) != 3) stop("hobbs.res -- parameter vector n!=3")
y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
75.995, 91.972)
t<-1:12
# if(abs(12*x[3])>50) {
# res<-rep(Inf,12)
# } else {
res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y
# }
res # return the residual vector explicitly
}
# test it
start1 <- c(200, 50, .3)
cat("residuals at start1:")
## residuals at start1:
r1 <- hobbs.res(start1)
print(r1)
## [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
## [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
require(nls14)
## Loading required package: nls14
Jr1a <- try(fnDeriv(r1, "x"))
y<-c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558, 50.156, 62.948,
75.995, 91.972)
t <- 1:12
hobbs1 <- function(x){ res<-x[1]/(1+x[2]*exp(-x[3]*t)) - y }
hobbs1m <- function(x){ res<-x001/(1+x002*exp(-x003*t)) - y }
Jr11m <- try(fnDeriv(hobbs1m, c("x001", "x002", "x003")))
hobbs1me <- function(x){ expression(x001/(1+x002*exp(-x003*t)) - y) }
Jr11me <- try(fnDeriv(hobbs1me, c("x001", "x002", "x003")))
Jr11ex <- try(fnDeriv(expression(x001/(1+x002*exp(-x003*t)) - y)
, c("x001", "x002", "x003")))
Jr11ex
## {
## .expr1 <- -x003
## .expr2 <- .expr1 * t
## .expr3 <- exp(.expr2)
## .expr4 <- x002 * .expr3
## .expr5 <- 1 + .expr4
## .expr6 <- .expr5^2
## .value <- x001/(.expr5) - y
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x001",
## "x002", "x003")))
## .grad[, "x001"] <- 1/.expr5
## .grad[, "x002"] <- -(x001 * .expr3/.expr6)
## .grad[, "x003"] <- -(x001 * (x002 * (.expr3 * -t))/.expr6)
## attr(.value, "gradient") <- .grad
## .value
## }
x001 <- start1[1]
x002 <- start1[2]
x003 <- start1[3]
print(hobbs1m(start1)) # start1 actually ignored
## [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
## [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
print(eval(hobbs1me(start1))) # start1 actually ignored
## [1] -0.05050236 -0.20779506 -0.26086802 -0.41247546 -0.61690702
## [6] -1.60525418 -3.36423901 -2.43016453 -4.28734016 -5.63079076
## [11] -5.67542775 -7.44779537
print(try(eval(Jr11ex)))
## [1] "Error in array(0, c(length(.value), 2L), list(NULL, c(\"x001\", \"x002\", : \n length of 'dimnames' [2] not equal to array extent\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in array(0, c(length(.value), 2L), list(NULL, c("x001", "x002", "x003"))): length of 'dimnames' [2] not equal to array extent>
resx <- expression(x001/(1+x002*exp(-x003*t)) - y)
res1 <- Deriv(resx, "x001", do_substitute=FALSE)
res1
## expression(1/(1 + x002 * exp(-x003 * t)))
col1 <- eval(res1)
res2 <- Deriv(resx, "x002", do_substitute=FALSE)
res2
## expression(-(x001 * exp(-x003 * t)/(1 + x002 * exp(-x003 * t))^2))
col2 <- eval(res2)
res3 <- Deriv(resx, "x003", do_substitute=FALSE)
res3
## expression(-(x001 * (x002 * (exp(-x003 * t) * -t))/(1 + x002 *
## exp(-x003 * t))^2))
col3 <- eval(res3)
hobJac <- cbind(col1, col2, col3)
print(hobJac)
## col1 col2 col3
## [1,] 0.02628749 -0.1023858 5.119291
## [2,] 0.03516102 -0.1356989 13.569891
## [3,] 0.04688566 -0.1787496 26.812437
## [4,] 0.06226762 -0.2335615 46.712293
## [5,] 0.08226046 -0.3019747 75.493681
## [6,] 0.10793373 -0.3851362 115.540847
## [7,] 0.14039380 -0.4827335 168.956738
## [8,] 0.18063918 -0.5920347 236.813864
## [9,] 0.22934330 -0.7069798 318.140911
## [10,] 0.28658605 -0.8178179 408.908969
## [11,] 0.35159786 -0.9119072 501.548971
## [12,] 0.42262102 -0.9760500 585.629985
Jacobians, the matrices of partial derivatives of the residuals with respect to the parameters, therefore take a vector of parameters as input.
Nonlinear least squares methods are mostly founded on some variant of the Gauss-Newton algorithm. The function we wish to minimize is the sum of squares of the (nonlinear) residuals r(x), where there are m observations (elements of r) and n parameters x. Hence the function is
f(x) = sum(r_k(x)^2)
Newton’s method starts with an initial set of parameters x[0]. At a given iteration, which could be the first, we want to solve
x[k+1] = x[k] - H^(-1) g
where H is the Hessian and g is the gradient at x[k]. We can rewrite this as a solution, at each iteration, of
H delta = -g
with
x[k+1] = x[k] + delta
For the particular sum of squares, the gradient is
g(x) = 2 * J' r
and
H(x) = 2 * (J' J + sum(r_k * Z_k))
where J is the Jacobian (first derivatives of r w.r.t. x) and Z is the tensor of second derivatives of r w.r.t. x. Note that J' is the transpose of J.
The primary simplification of the Gauss-Newton method is to assume that the second term above is negligible. As there is a common factor of 2 on each side of the Newton iteration after the simplification of the Hessian, the Gauss-Newton iteration equation is
J' J delta = - J' r
This iteration frequently fails. The approximation of the Hessian by the Jacobian inner-product is one reason, but there is also the possibility that the sum of squares function is not “quadratic” enough that the unit step reduces it. Hartley introduced a line search along delta, while Marquardt suggested replacing J’ J with (J’ J + lambda * D) where D is a diagonal matrix intended to partially approximate the omitted portion of the Hessian.
Marquardt suggested D = I (a unit matrix) or D = (diagonal part of J' J). With the former choice, when lambda is large enough the iteration is essentially
delta = - g / lambda
which is a version of the steepest descents algorithm. Using the diagonal of J' J, we have a scaled version of this (see http://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm).
Nash (1977) ??ref?? found that, on low-precision machines, it was common for diagonal elements of J' J to underflow. A very small modification avoids this: we solve
(J' J + lambda * (D + phi * I)) * delta = - g
where phi is a small number (phi = 1 seems to work quite well) ?? check??.
In Nash (1979), the iteration equation was solved as stated. However, this involves forming the matrix of sums of squares and cross-products of J, a process that loses some numerical precision. A better way to solve the linear equations is to apply the QR decomposition to the matrix J itself. However, we still need to incorporate the lambda * I or lambda * D adjustments. This is done by appending rows to J that are the square roots of the “pieces”: we add one row for each diagonal element of I and each diagonal element of D (and correspondingly append zeros to the residual vector).
At each iteration, we reduce the lambda parameter before solving. If the resulting sum of squares is not reduced, lambda is increased; otherwise we move to the next iteration.
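To make the augmented-matrix idea concrete, here is a minimal sketch of a single Marquardt step (our own illustration; marqstep is a hypothetical name, not the nls14 internals). J is the m x n Jacobian and r the residual vector at the current parameters.
marqstep <- function(J, r, lambda, phi = 1) {
   n <- ncol(J)
   # square roots of the stabilizing "pieces": lambda * (diag(J'J) + phi * I)
   Dhalf <- sqrt(lambda * (diag(crossprod(J)) + phi))
   Ja <- rbind(J, diag(Dhalf, n)) # append one row per parameter
   ra <- c(r, rep(0, n)) # pad the residuals with zeros
   qr.solve(Ja, -ra) # delta from the augmented least-squares problem
}
A full solver wraps such a step in the loop just described, decreasing lambda after a successful step and increasing it after a failure.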
Almost all statistical functions have exogenous data that is needed to compute residuals or likelihoods and is not dependent on the model parameters. (This section starts from Notes140806.)
model2rjfun does NOT have ... (dot) arguments.
Should it? That is, consider a problem where we are fitting a set of time series, one for each plant/animal, with some sort of fixed input for each that is NOT estimated (e.g., pH of soil, some index of health).
The difficulty in such a problem is that the residuals then form a matrix, so that nlfb rather than nlxb is the better approach. However, fitting one series would still need these data, and the example nls14tryextraparms.txt shows that the extra parameter (ms in this case) needs to be in the user's globalenv.
rm(list=ls())
require(nls14)
# want to have data AND extra parameters (NOT to be estimated)
traceval <- TRUE # traceval set TRUE to debug or give full history
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1 <- c(b1=1, b2=1, b3=1)
startf1 <- c(b1=1, b2=1, b3=.1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
cat("weeddata contains the original data\n")
## weeddata contains the original data
ms <- 2 # define the external parameter here
cat("wdata scales y by ms =",ms,"\n")
## wdata scales y by ms = 2
wdata <- data.frame(y=ydat/ms, tt=tdat)
wdata
## y tt
## 1 2.6540 1
## 2 3.6200 2
## 3 4.8190 3
## 4 6.4330 4
## 5 8.5345 5
## 6 11.5960 6
## 7 15.7215 7
## 8 19.2790 8
## 9 25.0780 9
## 10 31.4740 10
## 11 37.9975 11
## 12 45.9860 12
cat("estimate the UNSCALED model with SCALED data\n")
## estimate the UNSCALED model with SCALED data
anls14xbs <- try(nls14xb(eunsc, start=start1, trace=traceval, data=wdata))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "tt"
## Data variable y : [1] 2.6540 3.6200 4.8190 6.4330 8.5345 11.5960 15.7215 19.2790
## [9] 25.0780 31.4740 37.9975 45.9860
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x150b258>
## Starting pnum=b1 b2 b3
## 1 1 1
## Start:lamda: 1e-04 SS= 5676.925 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 6088.946 at b1 = 25.44327 b2 = -119.8584 b3 = -185.954 2 / 1
## lamda: 0.01 SS= 6088.946 at b1 = 25.09223 b2 = -94.84981 b3 = -166.6058 3 / 1
## lamda: 0.1 SS= 6088.946 at b1 = 23.49226 b2 = -12.1639 b3 = -96.94824 4 / 1
## lamda: 1 SS= 6088.946 at b1 = 19.10491 b2 = 15.51374 b3 = -31.7677 5 / 1
## lamda: 10 SS= 6077.975 at b1 = 9.663924 b2 = 2.341445 b3 = -1.074466 6 / 1
## <<lamda: 4 SS= 5096.507 at b1 = 2.508867 b2 = 0.9465915 b3 = 1.140924 7 / 1
## <<lamda: 1.6 SS= 4086.732 at b1 = 5.539492 b2 = 1.127669 b3 = 0.9866082 8 / 2
## lamda: 16 SS= 5979.157 at b1 = 10.70911 b2 = 2.436093 b3 = -0.3579844 9 / 3
## <<lamda: 6.4 SS= 3871.108 at b1 = 6.275743 b2 = 1.194974 b3 = 0.9504393 10 / 3
## <<lamda: 2.56 SS= 3427.063 at b1 = 7.913975 b2 = 1.462101 b3 = 0.7695198 11 / 4
## <<lamda: 1.024 SS= 2714.043 at b1 = 11.24279 b2 = 2.292268 b3 = 0.4124356 12 / 5
## <<lamda: 0.4096 SS= 1624.322 at b1 = 17.58076 b2 = 3.978355 b3 = 0.5133315 13 / 6
## <<lamda: 0.16384 SS= 1595.968 at b1 = 25.77752 b2 = 7.844312 b3 = 0.2536126 14 / 7
## <<lamda: 0.065536 SS= 389.9944 at b1 = 36.5797 b2 = 11.35334 b3 = 0.4014417 15 / 8
## <<lamda: 0.0262144 SS= 227.356 at b1 = 47.30675 b2 = 19.47462 b3 = 0.3253081 16 / 9
## <<lamda: 0.01048576 SS= 30.76802 at b1 = 61.38048 b2 = 30.70084 b3 = 0.3549442 17 / 10
## <<lamda: 0.004194304 SS= 7.535561 at b1 = 71.37593 b2 = 39.43433 b3 = 0.3435968 18 / 11
## <<lamda: 0.001677722 SS= 2.40805 at b1 = 80.80069 b2 = 44.72922 b3 = 0.3344339 19 / 12
## <<lamda: 0.0006710886 SS= 1.188571 at b1 = 88.88428 b2 = 47.00714 b3 = 0.3237746 20 / 13
## <<lamda: 0.0002684355 SS= 0.7263783 at b1 = 94.69062 b2 = 48.29672 b3 = 0.3169928 21 / 14
## <<lamda: 0.0001073742 SS= 0.6497016 at b1 = 97.37788 b2 = 48.92225 b3 = 0.3142756 22 / 15
## <<lamda: 4.294967e-05 SS= 0.6468358 at b1 = 98.02182 b2 = 49.07495 b3 = 0.3136437 23 / 16
## <<lamda: 1.717987e-05 SS= 0.6468194 at b1 = 98.08991 b2 = 49.09088 b3 = 0.3135732 24 / 17
## <<lamda: 6.871948e-06 SS= 0.6468193 at b1 = 98.09306 b2 = 49.09162 b3 = 0.3135698 25 / 18
print(anls14xbs)
## nls14 class object: x
## residual sumsquares = 0.64682 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 98.0931 5.653 17.35 3.164e-08 -1.337e-07 505.4
## b2 49.0916 1.688 29.08 3.281e-10 3.017e-08 0.2344
## b3 0.31357 0.006863 45.69 5.768e-12 -1.398e-05 0.04632
escal <- y ~ ms*b1/(1+b2*exp(-b3*tt))
cat("estimate the SCALED model with scaling provided in the call (ms=0.5)\n")
## estimate the SCALED model with scaling provided in the call (ms=0.5)
anls14xbh <- try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1, ms=0.5))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "ms" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable ms :[1] 2
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x2e30208>
## Starting pnum=b1 b2 b3
## 1 1 1
print(anls14xbh)
## [1] "Error in resfn(pnum, ...) : unused argument (ms = 0.5)\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in resfn(pnum, ...): unused argument (ms = 0.5)>
cat("\n scaling is now using the globally defined value of ms=",ms,"\n")
##
## scaling is now using the globally defined value of ms= 2
anls14xb1a <- try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "ms" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable ms :[1] 2
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x231a3d8>
## Starting pnum=b1 b2 b3
## 1 1 1
## Start:lamda: 1e-04 SS= 22708 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 25.472 b2 = -122.14 b3 = -187.69 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 25.327 b2 = -113.2 b3 = -180.68 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 24.283 b2 = -57.588 b3 = -135.68 4 / 1
## lamda: 1 SS= 24356 at b1 = 20.254 b2 = 14.461 b3 = -54.127 5 / 1
## lamda: 10 SS= 24355 at b1 = 10.077 b2 = 4.8657 b3 = -4.5699 6 / 1
## <<lamda: 4 SS= 20252 at b1 = 2.5977 b2 = 0.83113 b3 = 1.4117 7 / 1
## <<lamda: 1.6 SS= 16143 at b1 = 5.7092 b2 = 1.3529 b3 = 0.86529 8 / 2
## lamda: 16 SS= 22536 at b1 = 11.269 b2 = 3.0589 b3 = -0.12239 9 / 3
## <<lamda: 6.4 SS= 15214 at b1 = 6.5093 b2 = 1.428 b3 = 0.86138 10 / 3
## <<lamda: 2.56 SS= 13336 at b1 = 8.2714 b2 = 1.7659 b3 = 0.74306 11 / 4
## <<lamda: 1.024 SS= 10290 at b1 = 11.801 b2 = 2.7987 b3 = 0.46561 12 / 5
## <<lamda: 0.4096 SS= 5999.3 at b1 = 18.477 b2 = 4.9891 b3 = 0.46298 13 / 6
## <<lamda: 0.16384 SS= 3360.8 at b1 = 27.609 b2 = 9.7076 b3 = 0.35379 14 / 7
## <<lamda: 0.065536 SS= 872.75 at b1 = 40.302 b2 = 16.934 b3 = 0.38655 15 / 8
## <<lamda: 0.026214 SS= 308.99 at b1 = 51.624 b2 = 26.628 b3 = 0.36281 16 / 9
## <<lamda: 0.010486 SS= 64.882 at b1 = 63.651 b2 = 36.557 b3 = 0.35778 17 / 10
## <<lamda: 0.0041943 SS= 19.992 at b1 = 73.76 b2 = 43.123 b3 = 0.3463 18 / 11
## <<lamda: 0.0016777 SS= 8.8066 at b1 = 83.053 b2 = 46.013 b3 = 0.33222 19 / 12
## <<lamda: 0.00067109 SS= 4.1734 at b1 = 91.086 b2 = 47.556 b3 = 0.32103 20 / 13
## <<lamda: 0.00026844 SS= 2.7218 at b1 = 96.072 b2 = 48.622 b3 = 0.31554 21 / 14
## <<lamda: 0.00010737 SS= 2.5891 at b1 = 97.8 b2 = 49.023 b3 = 0.31386 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 98.074 b2 = 49.087 b3 = 0.31359 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 98.093 b2 = 49.091 b3 = 0.31357 24 / 17
print(anls14xb1a)
## nls14 class object: x
## residual sumsquares = 2.5873 on 12 observations
## after 17 Jacobian and 24 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 98.0925 5.651 17.36 3.153e-08 -9.046e-06 1011
## b2 49.0915 1.688 29.09 3.27e-10 6.839e-06 0.4687
## b3 0.31357 0.006863 45.69 5.769e-12 -0.003104 0.09268
ms <- 1
cat("\n scaling is now using the globally re-defined value of ms=",ms,"\n")
##
## scaling is now using the globally re-defined value of ms= 1
anls14xb1b <- try(nls14xb(escal, start=start1, trace=traceval, data=weeddata1))
## formula: y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "ms" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "ms" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable ms :[1] 1
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ ms * b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x2d36710>
## Starting pnum=b1 b2 b3
## 1 1 1
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
print(anls14xb1b)
## nls14 class object: x
## residual sumsquares = 2.5873 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.164e-08 -2.663e-07 1011
## b2 49.0916 1.688 29.08 3.281e-10 1.59e-07 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 -5.531e-05 0.04715
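A plausible workaround, sketched here but not verified against the package (weeddata1ms is our own name), is to carry the fixed scaling as a column of the data frame, since data.frame() recycles the scalar ms to all rows and nls14xb looks for model variables in the supplied data:
weeddata1ms <- data.frame(y=ydat, tt=tdat, ms=2) # ms recycled to 12 rows
anls14xbms <- try(nls14xb(escal, start=start1, trace=FALSE, data=weeddata1ms))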
Note that this derivatives table is documented by example. We would like to have as many examples of different ways to use the tools as possible, in order to provide a good understanding of the tools and their appropriate usage. In particular, we want to know of errors. We compare the derivative tools Deriv() and fnDeriv() with the stats package tools D(), deriv() and deriv3().
See specific notes either in comments or at the end.
library(nls14)
# Various derivatives
new <- fnDeriv(quote(1 + x + y), c("x", "y"))
old <- deriv(quote(1 + x + y), c("x", "y"))
print(new)
## {
## .value <- 1 + x + y
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
## "y")))
## .grad[, "x"] <- 1
## .grad[, "y"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## }
# Following generates a very long line on output of knitr (for markdown)
class(new)
## [1] "{"
str(new)
## language { .value <- 1 + x + y; .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", "y"; ))); .grad[, "x"] <- 1; .grad[, "y"] <- 1; attr(.value, "gradient") <- .grad; .value }
as.expression(new)
## expression({
## .value <- 1 + x + y
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
## "y")))
## .grad[, "x"] <- 1
## .grad[, "y"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## })
print(old)
## expression({
## .value <- 1 + x + y
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
## "y")))
## .grad[, "x"] <- 1
## .grad[, "y"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## })
class(old)
## [1] "expression"
str(old)
## expression({ .value <- 1 + x + y .grad <- array(0, c(length(.value), 2L), list(NULL, c("x", "y"))) .grad[, "x"] <- 1 .grad[, "y"] <- 1 attr(.value, "gradient") <- .grad .value })
Unfortunately, the inputs and outputs are not always easily transformed so that the symbolic derivatives can be found. (?? Need to codify this and provide filters so we can get things to work nicely.)
As an example, how could we take the object new and embed it in a function we can then use in R? We can certainly copy and paste the output into a function template, as follows:
fnfromnew <- function(x,y){
.value <- 1 + x + y
.grad <- array(0, c(length(.value), 2L), list(NULL, c("x",
"y")))
.grad[, "x"] <- 1
.grad[, "y"] <- 1
attr(.value, "gradient") <- .grad
.value
}
print(fnfromnew(3,5))
## [1] 9
## attr(,"gradient")
## x y
## [1,] 1 1
However, we would ideally like to be able to automate this, to generate functions and gradients for nonlinear least squares and optimization calculations. The same criticism applies to the object old. One semi-automatic possibility is sketched below.
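One semi-automatic route (a sketch using only standard R tools; exprToFun is our own name, not part of nls14) is to attach formal arguments to the generated code programmatically:
exprToFun <- function(dcode, argnames) {
   if (is.expression(dcode)) dcode <- dcode[[1]] # deriv() returns an expression
   f <- function() NULL
   # create one empty formal argument per name
   formals(f) <- setNames(rep(alist(x = ), length(argnames)), argnames)
   body(f) <- dcode
   f
}
fnew <- exprToFun(new, c("x", "y"))
print(fnew(3, 5)) # should reproduce fnfromnew(3, 5)
The same call applied to old, exprToFun(old, c("x", "y")), handles the expression wrapper via the first line.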
If we have x and y set such that the function is not admissible, then both our old and new functions give a gradient that is seemingly reasonable. While the gradient of this simple function could be considered to be defined for ANY values of x and y, I (JN) am sure most users would wish for a warning at the very least in such cases.
x <- NA
y <- Inf
print(eval(new))
## [1] NA
## attr(,"gradient")
## x y
## [1,] 1 1
print(eval(old))
## [1] NA
## attr(,"gradient")
## x y
## [1,] 1 1
We could define a way to avoid the issue of character vs. expression (and possibly other classes) as follows:
safeD <- function(obj, var) {
# safeguarded D() function for symbolic derivs
if (! is.character(var) ) stop("The variable var MUST be character type")
if (is.character(obj) ) {
eobj <- parse(text=obj)
result <- D(eobj, var)
} else {
result <- D(obj, var)
}
result # return the derivative explicitly
}
lxy2 <- expression(log(x+y^2))
clxy2 <- "log(x+y^2)"
print(D(clxy2, "y"))
## [1] NA
print(D(lxy2, "y"))
## 2 * y/(x + y^2)
print(safeD(clxy2, "y"))
## 2 * y/(x + y^2)
print(safeD(lxy2, "y"))
## 2 * y/(x + y^2)
## Try different ways to supply the log function
aDeriv <- Deriv(log(x), "x")
class(aDeriv)
## [1] "call"
aDeriv
## 1/x
aderiv <- deriv( ~ log(x), "x")
class(aderiv)
## [1] "expression"
aderiv
## expression({
## .value <- log(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/x
## attr(.value, "gradient") <- .grad
## .value
## })
aD <- D(expression(log(x)), "x")
class(aD)
## [1] "call"
aD
## 1/x
cat("but \n")
## but
D( "~ log(x)", "x") # fails -- gives NA rather than expected answer due to quotes
## [1] NA
try(D( ~ log(x), "x"))
interm <- ~ log(x)
interm
## ~log(x)
class(interm)
## [1] "formula"
interme <- as.expression(interm)
class(interme)
## [1] "expression"
try(D(interme, "x"))
try(deriv(interme, "x"))
try(deriv(interm, "x"))
## expression({
## .value <- log(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/x
## attr(.value, "gradient") <- .grad
## .value
## })
Deriv(log(x, base=3), "x" ) # OK
## 1/(x * 1.09861228866811)
try(D(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
try(deriv(~ log(x, base=3), "x" )) # fails - only single-argument calls supported
try(deriv(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
try(deriv3(expression(log(x, base=3)), "x" )) # fails - only single-argument calls supported
fnDeriv(quote(log(x, base=3)), "x" )
## {
## .value <- log(x, base = 3)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1/(x * 1.09861228866811)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(exp(x), "x")
## exp(x)
D(expression(exp(x)), "x") # OK
## exp(x)
deriv(~exp(x), "x") # OK, but much more complicated
## expression({
## .expr1 <- exp(x)
## .value <- .expr1
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- .expr1
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(exp(x)), "x")
## {
## .expr1 <- exp(x)
## .value <- .expr1
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- .expr1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(sin(x), "x")
## cos(x)
D(expression(sin(x)), "x")
## cos(x)
deriv(~sin(x), "x")
## expression({
## .value <- sin(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- cos(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(sin(x)), "x")
## {
## .value <- sin(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- cos(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(cos(x), "x")
## -sin(x)
D(expression(cos(x)), "x")
## -sin(x)
deriv(~ cos(x), "x")
## expression({
## .value <- cos(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- -sin(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(cos(x)), "x")
## {
## .value <- cos(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- -sin(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(tan(x), "x")
## 1/cos(x)^2
D(expression(tan(x)), "x")
## 1/cos(x)^2
deriv(~ tan(x), "x")
## expression({
## .value <- tan(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/cos(x)^2
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(tan(x)), "x")
## {
## .value <- tan(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1/cos(x)^2
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(sinh(x), "x")
## cosh(x)
D(expression(sinh(x)), "x")
## cosh(x)
deriv(~sinh(x), "x")
## expression({
## .value <- sinh(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- cosh(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(sinh(x)), "x")
## {
## .value <- sinh(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- cosh(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(cosh(x), "x")
## sinh(x)
D(expression(cosh(x)), "x")
## sinh(x)
deriv(~cosh(x), "x")
## expression({
## .value <- cosh(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- sinh(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(cosh(x)), "x")
## {
## .value <- cosh(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- sinh(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(sqrt(x), "x")
## 0.5/sqrt(x)
D(expression(sqrt(x)), "x")
## 0.5 * x^-0.5
deriv(~sqrt(x), "x")
## expression({
## .value <- sqrt(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 0.5 * x^-0.5
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(sqrt(x)), "x")
## {
## .expr1 <- sqrt(x)
## .value <- .expr1
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 0.5/.expr1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(pnorm(q), "q")
## dnorm(q)
D(expression(pnorm(q)), "q")
## dnorm(q)
deriv(~pnorm(q), "q")
## expression({
## .value <- pnorm(q)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("q")))
## .grad[, "q"] <- dnorm(q)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(pnorm(q)), "q")
## {
## .value <- pnorm(q)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "q"))
## .grad[, "q"] <- dnorm(q)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(dnorm(x, mean), "mean")
## dnorm(x - mean) * (x - mean)
D(expression(dnorm(x, mean)), "mean")
## [1] 0
deriv(~dnorm(x, mean), "mean")
## expression({
## .value <- dnorm(x, mean)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("mean")))
## .grad[, "mean"] <- 0
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(dnorm(x, mean)), "mean")
## {
## .expr1 <- x - mean
## .value <- dnorm(x, mean)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "mean"))
## .grad[, "mean"] <- dnorm(.expr1) * .expr1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(asin(x), "x")
## 1/sqrt(1 + x^2)
D(expression(asin(x)), "x")
## 1/sqrt(1 - x^2)
deriv(~asin(x), "x")
## expression({
## .value <- asin(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/sqrt(1 - x^2)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(asin(x)), "x")
## {
## .value <- asin(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1/sqrt(1 + x^2)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(acos(x), "x")
## -1/sqrt(1 + x^2)
D(expression(acos(x)), "x")
## -(1/sqrt(1 - x^2))
deriv(~acos(x), "x")
## expression({
## .value <- acos(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- -(1/sqrt(1 - x^2))
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(acos(x)), "x")
## {
## .value <- acos(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- -1/sqrt(1 + x^2)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(atan(x), "x")
## 1/(1 + x^2)
D(expression(atan(x)), "x")
## 1/(1 + x^2)
deriv(~atan(x), "x")
## expression({
## .value <- atan(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/(1 + x^2)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(atan(x)), "x")
## {
## .value <- atan(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1/(1 + x^2)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(gamma(x), "x")
## gamma(x) * digamma(x)
D(expression(gamma(x)), "x")
## gamma(x) * digamma(x)
deriv(~gamma(x), "x")
## expression({
## .expr1 <- gamma(x)
## .value <- .expr1
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- .expr1 * digamma(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(gamma(x)), "x")
## {
## .expr1 <- gamma(x)
## .value <- .expr1
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- .expr1 * digamma(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(lgamma(x), "x")
## digamma(x)
D(expression(lgamma(x)), "x")
## digamma(x)
deriv(~lgamma(x), "x")
## expression({
## .value <- lgamma(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- digamma(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(lgamma(x)), "x")
## {
## .value <- lgamma(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- digamma(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(digamma(x), "x")
## trigamma(x)
D(expression(digamma(x)), "x")
## trigamma(x)
deriv(~digamma(x), "x")
## expression({
## .value <- digamma(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- trigamma(x)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(digamma(x)), "x")
## {
## .value <- digamma(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- trigamma(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(trigamma(x), "x")
## psigamma(x, 2L)
D(expression(trigamma(x)), "x")
## psigamma(x, 2L)
deriv(~trigamma(x), "x")
## expression({
## .value <- trigamma(x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- psigamma(x, 2L)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(trigamma(x)), "x")
## {
## .value <- trigamma(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- psigamma(x, 2L)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(psigamma(x, deriv = 5), "x")
## psigamma(x, 6)
D(expression(psigamma(x, deriv = 5)), "x")
## psigamma(x, 6L)
deriv(~psigamma(x, deriv = 5), "x")
## expression({
## .value <- psigamma(x, deriv = 5)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- psigamma(x, 6L)
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(psigamma(x, deriv = 5)), "x")
## {
## .value <- psigamma(x, deriv = 5)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- psigamma(x, 6)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(x*y, "x")
## y
D(expression(x*y), "x")
## y
deriv(~x*y, "x")
## expression({
## .value <- x * y
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- y
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(x*y), "x")
## {
## .value <- x * y
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- y
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(x/y, "x")
## 1/y
D(expression(x/y), "x")
## 1/y
deriv(~x/y, "x")
## expression({
## .value <- x/y
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1/y
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(x/y), "x")
## {
## .value <- x/y
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1/y
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(x^y, "x")
## y * x^(y - 1)
D(expression(x^y), "x")
## x^(y - 1) * y
deriv(~x^y, "x")
## expression({
## .value <- x^y
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- x^(y - 1) * y
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(x^y), "x")
## {
## .value <- x^y
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- y * x^(y - 1)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv((x), "x")
## [1] 1
D(expression((x)), "x")
## [1] 1
deriv(~(x), "x")
## expression({
## .value <- (x)
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote((x)), "x")
## {
## .value <- (x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(+x, "x")
## [1] 1
D(expression(+x), "x")
## [1] 1
deriv(~ +x, "x")
## expression({
## .value <- +x
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(+x), "x")
## {
## .value <- +x
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(-x, "x")
## [1] -1
D(expression(- x), "x")
## -1
deriv(~ -x, "x")
## expression({
## .value <- -x
## .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
## .grad[, "x"] <- -1
## attr(.value, "gradient") <- .grad
## .value
## })
fnDeriv(quote(-x), "x")
## {
## .value <- -x
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- -1
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(abs(x), "x")
## sign(x)
try(D(expression(abs(x)), "x")) # 'abs' not in derivatives table
try(deriv(~ abs(x), "x"))
fnDeriv(quote(abs(x)), "x")
## {
## .value <- abs(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- sign(x)
## attr(.value, "gradient") <- .grad
## .value
## }
Deriv(sign(x), "x")
## [1] 0
try(D(expression(sign(x)), "x")) # 'sign' not in derivatives table
try(deriv(~ sign(x), "x"))
fnDeriv(quote(sign(x)), "x")
## {
## .value <- sign(x)
## .grad <- array(0, c(length(.value), 2L), list(NULL, "x"))
## .grad[, "x"] <- 0
## attr(.value, "gradient") <- .grad
## .value
## }
Notes:
the base tool deriv (and likely deriv3 ?? need to explore and explain) and nls14::fnDeriv are intended to output code to compute a derivative: deriv generates an expression object, while fnDeriv generates a language object. Do we need to explain this more?? Note that the input to deriv is a tilde expression with no left-hand side, while fnDeriv uses a quoted expression. ?? We should explore more ways to input things to these tools.
the base tool D and nls14::Deriv generate expressions, but D requires an expression as input, while Deriv can handle an unquoted expression without a wrapper. ?? Do we need to discuss more??
nls14 includes abs(x) and sign(x) in the derivatives table despite conventional wisdom that these are not differentiable. However, abs(x) clearly has a defined derivative everywhere except at x = 0, where assigning the value 0 to the derivative is almost certainly acceptable in computations. Similarly for sign(x).
Note also that the Deriv() and fnDeriv() results shown above for asin(x) and acos(x) involve sqrt(1 + x^2), whereas the correct derivatives (as returned by D() and deriv()) involve sqrt(1 - x^2); this looks like an error in the nls14 derivatives table.
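As a quick check of the abs/sign convention just mentioned (our own test, following the eval() pattern used earlier): sign(0) is 0 in R, so the tabulated derivative of abs(x) evaluates to 0 at x = 0.
x <- 0
gabs <- eval(fnDeriv(quote(abs(x)), "x"))
attr(gabs, "gradient") # 0 at x = 0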
nls14 also includes some tools for the simplification of algebraic expressions. Such simplification does not appear to be available / exposed in the stats package derivative tools.
# Various simplifications
Simplify(quote(+(a+b)))
## a + b
Simplify(quote(-5))
## [1] -5
Simplify(quote(--(a+b)))
## a + b
Simplify(quote(exp(log(a+b))))
## a + b
Simplify(quote(exp(1)))
## [1] 2.7183
Simplify(quote(log(exp(a+b))))
## a + b
Simplify(quote(log(1)))
## [1] 0
Simplify(quote(!TRUE))
## [1] FALSE
Simplify(quote(!FALSE))
## [1] TRUE
Simplify(quote((a+b)))
## a + b
Simplify(quote(a + b + 0))
## a + b
Simplify(quote(0 + a + b))
## a + b
Simplify(quote((a+b) + (a+b)))
## 2 * (a + b)
Simplify(quote(1 + 4))
## [1] 5
Simplify(quote(a + b - 0))
## a + b
Simplify(quote(0 - a - b))
## -a - b
Simplify(quote((a+b) - (a+b)))
## [1] 0
Simplify(quote(5 - 3))
## [1] 2
Simplify(quote(0*(a+b)))
## [1] 0
Simplify(quote((a+b)*0))
## [1] 0
Simplify(quote(1L * (a+b)))
## a + b
Simplify(quote((a+b) * 1))
## a + b
Simplify(quote((-1)*(a+b)))
## -(a + b)
Simplify(quote((a+b)*(-1)))
## -(a + b)
Simplify(quote(2*5))
## [1] 10
Simplify(quote((a+b) / 1))
## a + b
Simplify(quote((a+b) / (-1)))
## -(a + b)
Simplify(quote(0/(a+b)))
## [1] 0
Simplify(quote(1/3))
## [1] 0.33333
Simplify(quote((a+b) ^ 1))
## a + b
Simplify(quote(2^10))
## [1] 1024
Simplify(quote(log(exp(a), 3)))
## a/1.09861228866811
Simplify(quote(FALSE && b))
## [1] FALSE
Simplify(quote(a && TRUE))
## a
Simplify(quote(TRUE && b))
## b
Simplify(quote(a || TRUE))
## [1] TRUE
Simplify(quote(FALSE || b))
## b
Simplify(quote(a || FALSE))
## a
Simplify(quote(if (TRUE) a+b))
## a + b
Simplify(quote(if (FALSE) a+b))
## NULL
Simplify(quote(if (TRUE) a+b else a*b))
## a + b
Simplify(quote(if (FALSE) a+b else a*b))
## a * b
Simplify(quote(if (cond) a+b else a+b))
## a + b
# This one was wrong...
Simplify(quote(--(a+b)))
## a + b
There is at least one other symbolic package for R. Here we look at Ryacas. The following example was provided by Gabor Grothendieck.
library(nls14)
dnls14 <- Deriv(sin(x+y), "x")
print(dnls14)
## cos(x + y)
class(dnls14)
## [1] "call"
library(Ryacas)
##
## Attaching package: 'Ryacas'
##
## The following object is masked from 'package:nls14':
##
## Simplify
x <- Sym("x")
y <- Sym("y")
dryacas <- deriv(sin(x+y), x)
print(dryacas)
## [1] "Starting Yacas!"
## Cos(x+y);
class(dryacas)
## [1] "Sym" "character"
We use the Hobbs Weeds problem (Nash, 1979 and Nash, 2014). Note that nls() fails from start1.
require(nls14)
traceval <- FALSE
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
weeddata2 <- data.frame(y=1.5*ydat, tt=tdat)
anls14xb1 <- try(nls14xb(eunsc, start=start1, trace=TRUE, data=weeddata1,
control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x20de638>
## Starting pnum=b1 b2 b3
## 1 1 1
## Start:lamda: 1e-04 SS= 23521 at b1 = 1 b2 = 1 b3 = 1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 50.886 b2 = -240.72 b3 = -372.91 2 / 1
## lamda: 0.01 SS= 24356 at b1 = 50.183 b2 = -190.69 b3 = -334.2 3 / 1
## lamda: 0.1 SS= 24356 at b1 = 46.97 b2 = -25.309 b3 = -194.81 4 / 1
## lamda: 1 SS= 24356 at b1 = 38.096 b2 = 29.924 b3 = -64.262 5 / 1
## lamda: 10 SS= 24353 at b1 = 18.798 b2 = 3.551 b3 = -2.9011 6 / 1
## <<lamda: 4 SS= 21065 at b1 = 4.1015 b2 = 0.86688 b3 = 1.3297 7 / 1
## <<lamda: 1.6 SS= 16852 at b1 = 10.248 b2 = 1.2302 b3 = 1.0044 8 / 2
## lamda: 16 SS= 24078 at b1 = 20.944 b2 = 2.9473 b3 = -0.40959 9 / 3
## <<lamda: 6.4 SS= 15934 at b1 = 11.771 b2 = 1.3084 b3 = 0.98087 10 / 3
## <<lamda: 2.56 SS= 14050 at b1 = 15.152 b2 = 1.6468 b3 = 0.80462 11 / 4
## <<lamda: 1.024 SS= 10985 at b1 = 22.005 b2 = 2.6632 b3 = 0.45111 12 / 5
## <<lamda: 0.4096 SS= 6427.1 at b1 = 35.128 b2 = 4.7135 b3 = 0.50974 13 / 6
## <<lamda: 0.16384 SS= 4725 at b1 = 52.285 b2 = 9.2948 b3 = 0.31348 14 / 7
## <<lamda: 0.065536 SS= 1132.2 at b1 = 76.446 b2 = 15.142 b3 = 0.41095 15 / 8
## <<lamda: 0.026214 SS= 518.76 at b1 = 96.924 b2 = 24.422 b3 = 0.35816 16 / 9
## <<lamda: 0.010486 SS= 79.464 at b1 = 122.18 b2 = 35.262 b3 = 0.36484 17 / 10
## <<lamda: 0.0041943 SS= 26.387 at b1 = 141.55 b2 = 42.387 b3 = 0.35215 18 / 11
## <<lamda: 0.0016777 SS= 11.339 at b1 = 160.02 b2 = 45.519 b3 = 0.33738 19 / 12
## <<lamda: 0.00067109 SS= 5.2767 at b1 = 176.92 b2 = 47.037 b3 = 0.32443 20 / 13
## <<lamda: 0.00026844 SS= 2.9656 at b1 = 189.12 b2 = 48.279 b3 = 0.31712 21 / 14
## <<lamda: 0.00010737 SS= 2.6003 at b1 = 194.72 b2 = 48.92 b3 = 0.31429 22 / 15
## <<lamda: 4.295e-05 SS= 2.5873 at b1 = 196.04 b2 = 49.075 b3 = 0.31364 23 / 16
## <<lamda: 1.718e-05 SS= 2.5873 at b1 = 196.18 b2 = 49.091 b3 = 0.31357 24 / 17
## <<lamda: 6.8719e-06 SS= 2.5873 at b1 = 196.19 b2 = 49.092 b3 = 0.31357 25 / 18
print(anls14xb1)
## nls14 class object: x
## residual sumsquares = 2.5873 on 12 observations
## after 18 Jacobian and 25 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 196.186 11.31 17.35 3.164e-08 -2.663e-07 1011
## b2 49.0916 1.688 29.08 3.281e-10 1.59e-07 0.4605
## b3 0.31357 0.006863 45.69 5.768e-12 -5.531e-05 0.04715
anlsb1 <- try(nls(eunsc, start=start1, trace=TRUE, data=weeddata1))
## 23521 : 1 1 1
print(anlsb1)
## [1] "Error in nls(eunsc, start = start1, trace = TRUE, data = weeddata1) : \n singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = start1, trace = TRUE, data = weeddata1): singular gradient>
A different start causes nls14xb to return a large sum of squares. Note that nls() again fails.
startf1 <- c(b1=1, b2=1, b3=.1)
anlsf1 <- try(nls(eunsc, start=startf1, trace=TRUE, data=weeddata1))
## 23756 : 1.0 1.0 0.1
print(anlsf1)
## [1] "Error in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1) : \n singular gradient\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in nls(eunsc, start = startf1, trace = TRUE, data = weeddata1): singular gradient>
anls14xf1 <- try(nls14xb(eunsc, start=startf1, trace=TRUE, data=weeddata1,
control=list(watch=FALSE)))
## formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## lower:[1] -Inf -Inf -Inf
## upper:[1] Inf Inf Inf
## $watch
## [1] FALSE
##
## $phi
## [1] 1
##
## $lamda
## [1] 1e-04
##
## $offset
## [1] 100
##
## $laminc
## [1] 10
##
## $lamdec
## [1] 4
##
## $femax
## [1] 10000
##
## $jemax
## [1] 5000
##
## $rofftest
## [1] TRUE
##
## $smallsstest
## [1] TRUE
##
## vn:[1] "y" "b1" "b2" "b3" "tt"
## datvar:[1] "y" "tt"
## Data variable y : [1] 5.308 7.240 9.638 12.866 17.069 23.192 31.443 38.558 50.156 62.948
## [11] 75.995 91.972
## Data variable tt : [1] 1 2 3 4 5 6 7 8 9 10 11 12
## Finished masks check
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
## trjfn:
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x32405f0>
## Starting pnum= b1 b2 b3
## 1.0 1.0 0.1
## Start:lamda: 1e-04 SS= 23756 at b1 = 1 b2 = 1 b3 = 0.1 1 / 0
## lamda: 0.001 SS= 24356 at b1 = 416.72 b2 = 821.43 b3 = -40.851 2 / 1
## lamda: 0.01 SS= 196562 at b1 = 162.93 b2 = 368.16 b3 = 7.5932 3 / 1
## <<lamda: 0.004 SS= 10597 at b1 = 24.764 b2 = 108.11 b3 = 31.926 4 / 1
## <<lamda: 0.0016 SS= 9205.5 at b1 = 35.486 b2 = 108.11 b3 = 31.926 5 / 2
## <<lamda: 0.00064 SS= 9205.4 at b1 = 35.532 b2 = 108.11 b3 = 31.926 6 / 3
## <<lamda: 0.000256 SS= 9205.4 at b1 = 35.532 b2 = 108.11 b3 = 31.926 7 / 4
print(anls14xf1)
## nls14 class object: x
## residual sumsquares = 9205.4 on 12 observations
## after 4 Jacobian and 7 function evaluations
## name coeff SE tstat pval gradient JSingval
## b1 35.5321 NA NA NA -6.684e-07 3.464
## b2 108.109 NA NA NA -1.464e-11 5.015e-11
## b3 31.9262 NA NA NA 1.583e-09 6.427e-27
# anls14xb2 <- try(nls14xb(eunsc, start=start1, trace=FALSE, data=weeddata2))
# print(anls14xb2)
We can quickly discover the difficulty here by computing the Jacobian at this “solution” and checking its singular values.
cf1 <- coef(anls14xf1)
print(cf1)
## b1 b2 b3
## 35.532 108.109 31.926
## attr(,"pkgname")
## [1] "nls14"
jf1 <- anls14xf1$jacobian
svals <- svd(jf1)$d
print(svals)
## [1] 3.4641e+00 5.0146e-11 6.4267e-27
Here we see that the Jacobian is only rank 1, even though there are 3 coefficients. It is therefore not surprising that our nonlinear least squares program has concluded we are unable to make further progress.
We can run the same example as above using R functions rather than expressions, but now we need to have a gradient function as well as one to compute residuals. nls14 has tools to create these functions from expressions, as we shall see here. First we again set up the data and load the package.
require(nls14)
traceval <- FALSE
# Data for Hobbs problem
ydat <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
38.558, 50.156, 62.948, 75.995, 91.972) # for testing
tdat <- seq_along(ydat) # for testing
# A simple starting vector -- must have named parameters for nls14xb, nls, wrap14nls.
start1 <- c(b1=1, b2=1, b3=1)
eunsc <- y ~ b1/(1+b2*exp(-b3*tt))
cat("LOCAL DATA IN DATA FRAMES\n")
## LOCAL DATA IN DATA FRAMES
weeddata1 <- data.frame(y=ydat, tt=tdat)
weedrj <- model2rjfun(modelformula=eunsc, pvec=start1, data=weeddata1)
## model2rjfun: modelformula = y ~ b1/(1 + b2 * exp(-b3 * tt))
## [1] "formula"
weedrj
## function (prm)
## {
## if (is.null(names(prm)))
## names(prm) <- names(pvec)
## localdata <- list2env(as.list(prm), parent = data)
## eval(residexpr, envir = localdata)
## }
## <environment: 0x2a44e98>
modeldoc(weedrj) # Note how useful this is to report status
## FUNCTION weedrj
## Formula: y ~ b1/(1 + b2 * exp(-b3 * tt))
## Code: expression({
## .expr3 <- exp(-b3 * tt)
## .expr5 <- 1 + b2 * .expr3
## .expr10 <- .expr5^2
## .value <- b1/.expr5 - y
## .grad <- array(0, c(length(.value), 3L), list(NULL, c("b1",
## "b2", "b3")))
## .grad[, "b1"] <- 1/.expr5
## .grad[, "b2"] <- -(b1 * .expr3/.expr10)
## .grad[, "b3"] <- b1 * (b2 * (.expr3 * tt))/.expr10
## attr(.value, "gradient") <- .grad
## .value
## })
## Parameters: b1, b2, b3
## Data: y, tt
##
## VALUES
## Observations: 12
## Parameters:
## b1 b2 b3
## 1 1 1
## Data (length 12):
## y tt
## 1 5.308 1
## 2 7.240 2
## 3 9.638 3
## 4 12.866 4
## 5 17.069 5
## 6 23.192 6
## 7 31.443 7
## 8 38.558 8
## 9 50.156 9
## 10 62.948 10
## 11 75.995 11
## 12 91.972 12
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443, 38.558,
50.156, 62.948, 75.995, 91.972)
tt <- seq_along(y) # for testing
mydata <- data.frame(y = y, tt = tt)
f <- y ~ 100*b1/(1 + 10*b2 * exp(-0.1 * b3 * tt))
p <- c(b1 = 1, b2 = 1, b3 = 1)
rjfn <- model2rjfun(f, p, data = mydata)
## model2rjfun: modelformula = y ~ 100 * b1/(1 + 10 * b2 * exp(-0.1 * b3 * tt))
## [1] "formula"
rjfn(p)
## [1] 4.64386 3.64458 2.25518 0.11562 -2.91533 -7.77921 -14.68094
## [8] -20.35397 -30.41538 -41.57497 -52.89343 -67.04642
## attr(,"gradient")
## b1 b2 b3
## [1,] 9.9519 -8.9615 0.89615
## [2,] 10.8846 -9.6998 1.93997
## [3,] 11.8932 -10.4787 3.14361
## [4,] 12.9816 -11.2964 4.51856
## [5,] 14.1537 -12.1504 6.07520
## [6,] 15.4128 -13.0373 7.82235
## [7,] 16.7621 -13.9524 9.76668
## [8,] 18.2040 -14.8902 11.91213
## [9,] 19.7406 -15.8437 14.25933
## [10,] 21.3730 -16.8050 16.80496
## [11,] 23.1016 -17.7647 19.54122
## [12,] 24.9256 -18.7127 22.45528
myfn <- function(p, resfn=rjfn){
val <- resss(p, resfn)
val # return the sum of squares explicitly
}
p <- c(b1 = 2, b2=2, b3=1)
a1 <- optim(p, myfn, control=list(trace=0))
a1
## $par
## b1 b2 b3
## 1.9618 4.9092 3.1358
##
## $value
## [1] 2.5873
##
## $counts
## function gradient
## 200 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
We can also embed the function directly.
a2 <- optim(p, function(p,resfn=rjfn){resss(p,resfn)}, control=list(trace=0))
a2
## $par
## b1 b2 b3
## 1.9618 4.9092 3.1358
##
## $value
## [1] 2.5873
##
## $counts
## function gradient
## 200 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
One of the common needs for optimization computations is the availability of accurate gradients of the objective functions. While differentiation is relatively straightforward, it is tedious and error-prone.
As of 2015-01-24, I have not determined how to automate the use of the output of the derivatives generated by nls14 to create a working gradient function. However, the following write-up shows how such a function can be generated in a semi-automatic way.
We use an example that appeared on the R-help mailing list on Jan 14, 2015. Responses by Ravi Varadhan and others, along with some modifications I made, gave the following negative log-likelihood function to be minimized.
y <- c(5, 11, 21, 31, 46, 75, 98, 122, 145, 165, 195, 224, 245, 293, 321, 330, 350, 420) # data set
Nweibull2 <- function(x, prm){
   la <- prm[1]
   al <- prm[2]
   be <- prm[3]
   val2 <- la*be*(x/al)^(be-1)* exp( (x/al)^be + la*al*(1 - exp((x/al)^be)) )
   val2
}
LL2J <- function(par, y) {
   R <- Nweibull2(y, par)
   -sum(log(R))
}
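As a quick sanity check, LL2J() should return a finite value at a plausible start; the parameter values used here anticipate the start1 vector defined below.
LL2J(c(lambda = 0.01, alpha = 340, beta = 0.8), y)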
We want the gradient of LL2J() with respect to par, and first compute the derivatives of Nweibull2() with respect to the parameters prm.
We start with the central expression in Nweibull2() and compute its partial derivatives. The expression is:
la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) )
# Put in the main expression for the Nweibull pdf.
require(nls14)
## we generate the three gradient components
g1n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "la")
g1n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## (al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be +
## la * al * (1 - exp((x/al)^be)))
g2n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "al")
g2n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## (be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) *
## (be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) +
## la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) *
## exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
g3n <- Deriv(la*be*(x/al)^(be-1)* exp( (x/al)^be+la*al*(1-exp((x/al)^be) ) ), "be")
g3n
## la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
## ((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be *
## log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) +
## la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
By copying and pasting the output above into a function structure, we get Nwei2g() below.
Nwei2g <- function(x, prm){
la <- prm[1]
al <- prm[2]
be <- prm[3]
g1v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
(al * (1 - exp((x/al)^be)))) + be * (x/al)^(be - 1) * exp((x/al)^be +
la * al * (1 - exp((x/al)^be)))
g2v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
(be * (x/al)^(be - 1) * -(x/al^2) + (la * al * -(exp((x/al)^be) *
(be * (x/al)^(be - 1) * -(x/al^2))) + la * (1 - exp((x/al)^be))))) +
la * be * ((be - 1) * (x/al)^(be - 1 - 1) * -(x/al^2)) *
exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
g3v <- la * be * (x/al)^(be - 1) * (exp((x/al)^be + la * al * (1 - exp((x/al)^be))) *
((x/al)^be * log(x/al) + la * al * -(exp((x/al)^be) * ((x/al)^be *
log(x/al))))) + (la * be * ((x/al)^(be - 1) * log(x/al)) +
la * (x/al)^(be - 1)) * exp((x/al)^be + la * al * (1 - exp((x/al)^be)))
gg <- matrix(data = c(g1v, g2v, g3v), ncol = 3)
gg
}
We can check this gradient function using the grad() function from package numDeriv.
start1 <- c(lambda=.01,alpha=340,beta=.8)
start2 <- c(lambda=.01,alpha=340,beta=.7)
require(numDeriv)
## Loading required package: numDeriv
ganwei <- Nwei2g(y, prm=start1)
Nw <- function(x, y) {
   Nweibull2(y, x)
} # swap the argument order so grad() differentiates with respect to the parameters
gnnwei <- matrix(NA, nrow = length(y), ncol = 3)
for (i in seq_along(y)){
   gnrow <- grad(Nw, x = start1, y = y[i])
   gnnwei[i, ] <- gnrow
}
gnnwei
## [,1] [,2] [,3]
## [1,] 1.5080091 7.5763e-06 -4.4573e-02
## [2,] 1.0470119 4.3477e-06 -2.1663e-02
## [3,] 0.6473921 1.6572e-06 -7.3707e-03
## [4,] 0.4021585 1.7784e-07 -9.4940e-04
## [5,] 0.1635446 -1.0061e-06 3.5893e-03
## [6,] -0.0815624 -1.6713e-06 6.0571e-03
## [7,] -0.1689654 -1.5386e-06 5.8709e-03
## [8,] -0.2048835 -1.1819e-06 5.0149e-03
## [9,] -0.2077443 -7.9701e-07 4.0222e-03
## [10,] -0.1955391 -4.9171e-07 3.1859e-03
## [11,] -0.1644138 -1.3523e-07 2.1110e-03
## [12,] -0.1297028 8.2595e-08 1.3249e-03
## [13,] -0.1055059 1.7196e-07 9.0488e-04
## [14,] -0.0598567 2.2613e-07 3.1939e-04
## [15,] -0.0406518 2.0242e-07 1.4872e-04
## [16,] -0.0355949 1.9079e-07 1.1187e-04
## [17,] -0.0261120 1.6182e-07 5.3031e-05
## [18,] -0.0075317 6.8245e-08 -1.1995e-05
ganwei
## [,1] [,2] [,3]
## [1,] 1.5080091 7.5763e-06 -4.4573e-02
## [2,] 1.0470119 4.3477e-06 -2.1663e-02
## [3,] 0.6473921 1.6572e-06 -7.3707e-03
## [4,] 0.4021585 1.7784e-07 -9.4940e-04
## [5,] 0.1635446 -1.0061e-06 3.5893e-03
## [6,] -0.0815624 -1.6713e-06 6.0571e-03
## [7,] -0.1689654 -1.5386e-06 5.8709e-03
## [8,] -0.2048835 -1.1819e-06 5.0149e-03
## [9,] -0.2077443 -7.9701e-07 4.0222e-03
## [10,] -0.1955391 -4.9171e-07 3.1859e-03
## [11,] -0.1644138 -1.3523e-07 2.1110e-03
## [12,] -0.1297028 8.2595e-08 1.3249e-03
## [13,] -0.1055059 1.7196e-07 9.0488e-04
## [14,] -0.0598567 2.2613e-07 3.1939e-04
## [15,] -0.0406518 2.0242e-07 1.4872e-04
## [16,] -0.0355949 1.9079e-07 1.1187e-04
## [17,] -0.0261120 1.6182e-07 5.3031e-05
## [18,] -0.0075317 6.8245e-08 -1.1995e-05
cat("max(abs(gnnwei - ganwei))= ", max(abs(gnnwei - ganwei)),"\n")
## max(abs(gnnwei - ganwei))= 2.8041e-11
Now we can build the gradient of the objective function. This requires an application of the chain rule to the sum of the logs of the elements of R. Since the derivative of log(R) with respect to R is simply 1/R, the gradient of LL2J() is -t(1/R) %*% gNN, where gNN is the matrix of partial derivatives of R, so the coding is relatively simple. However, I have not found how to automate this step.
## and now we can build the gradient of LL2J
LL2Jg <- function(prm, y) {
   R <- Nweibull2(y, prm)
   gNN <- Nwei2g(y, prm)
   # print(str(gNN))
   gL <- -as.vector(t(1/R) %*% gNN)
   gL
}
# test
gaLL2J <- LL2Jg(start1, y)
gaLL2J
## [1] 3365.78244 -0.01441 -18.63351
gnLL2J <- grad(LL2J, start1, y=y)
gnLL2J
## [1] 3365.78244 -0.01441 -18.63351
cat("max(abs(gaLL2J-gnLL2J))= ", max(abs(gaLL2J-gnLL2J)), "\n" )
## max(abs(gaLL2J-gnLL2J))= 1.8254e-08
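Having verified the analytic gradient, we could supply it to a gradient-based optimizer. A hedged sketch (not run above; the method and the name fitw are illustrative only):
fitw <- optim(start1, LL2J, gr = LL2Jg, y = y, method = "BFGS")
fitw$par # estimates of lambda, alpha, beta
fitw$convergence # 0 indicates the method reported success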
One of the key tasks with tools for derivatives is taking an object in one or another form (that is, R class) and using it as input to a symbolic function. The object may, of course, be the output of another such function, and this is one of the reasons we need such transformations.
We also note that the different tools for symbolic derivatives use slightly different inputs. For example, for the derivative of log(x), we have
require(nls14)
dlogx <- nls14::Deriv(log(x), "x")
str(dlogx)
## language 1/x
print(dlogx)
## 1/x
Unfortunately, there are complications when we have an expression object: we must specify that the substitute() function is NOT to be applied. Here we show how to do this, first with the expression given inline and then with an explicit expression object.
dlogxs <- nls14::Deriv(expression(log(x)), "x", do_substitute=FALSE)
str(dlogxs)
## expression(1/x)
print(dlogxs)
## expression(1/x)
cat(as.character(dlogxs), "\n")
## 1/x
fne <- expression(log(x))
dlogxe <- Deriv(fne, "x", do_substitute=FALSE)
str(dlogxe)
## expression(1/x)
print(dlogxe)
## expression(1/x)
# base R
dblogx <- D(expression(log(x)), "x")
str(dblogx)
## language 1/x
print(dblogx)
## 1/x
require(Deriv)
## Loading required package: Deriv
##
## Attaching package: 'Deriv'
##
## The following object is masked from 'package:Ryacas':
##
## Simplify
##
## The following objects are masked from 'package:nls14':
##
## Deriv, Simplify
ddlogx <- Deriv::Deriv(expression(log(x)), "x")
str(ddlogx)
## expression(1/x)
print(ddlogx)
## expression(1/x)
cat(as.character(ddlogx), "\n")
## 1/x
ddlogxf <- ~ ddlogx
str(ddlogxf)
## Class 'formula' length 2 ~ddlogx
## ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
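A common follow-on task is to evaluate such a derivative object numerically. A minimal base-R sketch (dlogx_fun is a helper named here for illustration; dblogx is the language object 1/x obtained above):
dlogx_fun <- function(x) eval(dblogx, list(x = x))
dlogx_fun(2) # 0.5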
Erin Hodgess on R-help in January 2015 raised the issue of taking the derivative of an expression that contains an indexed variable. We show the example and its resolution, then give an explanation.
zzz <- expression(y[3]*r1 + r2)
try(deriv(zzz,c("r1","r2")))
require(nls14)
try(nls14::Deriv(zzz, c("r1","r2")))
## Warning in if (as.character(expr) == name) return(1) else return(0): the
## condition has length > 1 and only the first element will be used
## [1] 0
try(fnDeriv(zzz, c("r1","r2")))
newDeriv(`[`(x,y), stop("no derivative when indexing"))
try(nls14::Deriv(zzz, c("r1","r2")))
## Warning in if (as.character(expr) == name) return(1) else return(0): the
## condition has length > 1 and only the first element will be used
## [1] 0
try(nls14::fnDeriv(zzz, c("r1","r2")))
## {
## .expr1 <- y[3]
## .expr2 <- stop("no derivative when indexing")
## .expr3 <- .expr2 * r1
## .value <- .expr1 * r1 + r2
## .grad <- array(0, c(length(.value), 2L), list(NULL, c("r1",
## "r2")))
## .grad[, "r1"] <- .expr1 + .expr3
## .grad[, "r2"] <- .expr3 + 1
## attr(.value, "gradient") <- .grad
## .value
## }
Richard Heiberger pointed out that internally, R stores
y[3]
as
"["(y,3)
that is, as a call to a function. Duncan Murdoch pointed out that nls14 provides newDeriv(), which can be used to redefine the "[" function for the purposes of derivatives.
This is not an ideal resolution, especially as we would like to be able to get the gradients of functions with respect to vectors of parameters (noted also by Serguei Sokol in the manual for package Deriv). The following examples illustrate this.
try(nls14::Deriv(zzz, "y[3]"))
## [1] 0
try(nls14::Deriv(y3*r1+r2,"y3"))
## r1
try(nls14::Deriv(y[3]*r1+r2,"y[3]"))
## stop("no derivative when indexing") * r1
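A hedged workaround, following the y3 example just above, is to substitute a plain name for the indexed element before differentiating, then supply the indexed value when evaluating:
d_r1 <- nls14::Deriv(y3 * r1 + r2, "r1") # the derivative is y3
eval(d_r1, list(y3 = y[3])) # numeric value of the derivative w.r.t. r1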