The distfromq
package provides functions for
constructing an approximation to a probability distribution from a
finite set of quantiles of the distribution. The package provides four
main functions:
make_p_fn
returns a function that is analogous to
pnorm
, and can be used to evaluate the approximated
cumulative distribution function (CDF).make_d_fn
returns a function that is analogous to
dnorm
, and can be used to evaluate the approximated
probability density function (PDF).make_q_fn
returns a function that is analogous to
qnorm
, and can be used to evaluate the approximated
quantile function (QF), i.e., the inverse CDF.make_r_fn
returns a function that is analogous to
rnorm
, and can be used to simulate a pseudo-random sample
from the approximated distribution.All four functions take a set of probability levels (ps
)
and the corresponding quantiles (qs
) of the distribution to
be approximated as inputs, as well as arguments specifying the methods
for interpolating those quantiles to estimate the interior of the
distribution, and for extrapolating into the tails of the distribution
using a specified parametric family.
The next section of the vignette describes the methods that are used for approximating the distribution, and the following section gives examples.
Suppose that we have a collection of pairs \((q_k, \tau_k)\), \(k = 1, \ldots, K\), where for each \(k\), \(\tau_k = F(q_k)\) or \(q_k = F^{-1}(\tau_k)\) for some distribution with CDF \(F\) and quantile function \(F^{-1}\). We allow for the distribution to have regions with no density or with discrete point masses, corresponding to regions where \(F\) is flat or has a vertical step. In these instances, the points \((q_k, \tau_k)\) may fall within the flat region or partway up the vertical step, as will be illustrated below. Our goal is to reconstruct an estimate of \(F\), the corresponding PDF \(f\) (if the distribution is continuous), and the QF \(F^{-1}\), as well as a method for sampling from the distribution.
Our approach first splits the provided quantiles into a discrete part of the distribution and a continuous part. Then, for the continuous part it takes different strategies for interpolating on the interior of the provided quantiles and for extrapolating into the tails. We outline these procedures briefly here and give more detail below:
dup_tol
controls identification of quantiles that are
“close enough” to identify a point mass.zero_tol
controls identification of quantiles
that are “close enough to zero” to identify a point mass.To estimate the quantile function (QF), we invert the estimate of the CDF. To estimate the probability density function (PDF), we use the derivative of the spline-based approximation to the CDF if there are no discrete components. This function does not support estimation of a density function if the distribution has any discrete components.
The continuous distribution is estimated based on a set of adjusted quantile pairs that is obtained by eliminating duplicate quantiles, subtracting the cumulative probabilities associated with the discrete distribution, and renormalizing by the total probability mass associated with the discrete distribution. To approximate the CDF of the continuous distribution component at interior points, we fit a monotonic cubic Hermite spline that interpolates the set of “observations” \(\{(q_1, \tau_1), \ldots, (q_K, \tau_K)\}\) (where these now denote the adjusted quantiles for the continuous distribution component). This spline fit yields a “stage 1” CDF estimate, \(\hat{F}^{(1)}\).
To address issues with numerical stability in inverting the spline in the process of obtaining an estimate of the QF, we evaluate this stage 1 estimate at a grid of quantile values that are inserted between each consecutive pair of input quantiles (the number of new quantile values to use is an argument to the functions). We then use a “stage 2” CDF estimate that linearly interpolates this augmented grid of \((q, \tau)\) pairs. This behavior can be disabled if desired, though this is not recommended.
We estimate the distribution for the left and right tails separately, assuming that they come from a specified location-scale family. The distributional family is assumed to be the same in each tail, but the location and scale parameters may differ in the left and right tails. Setting notation, suppose that \(Y = a + b \cdot Z\) where the random variable \(Z\) has a specified distribution. Recall that at the probability level \(\tau\), a quantile of \(Y\) can be calculated in terms of the corresponding quantile of \(Z\) via \(F_Y^{-1}(\tau) = a + b \cdot F_Z^{-1}(\tau)\). Using the quantiles at two probability levels \(\tau_i\) and \(\tau_j\), we can calculate the value of \(b\) using
\[\begin{align*} \frac{F_Y^{-1}(\tau_i) - F_Y^{-1}(\tau_j)}{F_Z^{-1}(\tau_i) - F_Z^{-1}(\tau_j)} &= \frac{a + b \cdot F_Z^{-1}(\tau_i) - (a + b \cdot F_Z^{-1}(\tau_j))} {F_Z^{-1}(\tau_i) - F_Z^{-1}(\tau_j)} \\ &= b \cdot \frac{F_Z^{-1}(\tau_i) - F_Z^{-1}(\tau_j)}{F_Z^{-1}(\tau_i) - F_Z^{-1}(\tau_j)} \\ &= b \end{align*}\]
Similarly, we can calculate the value of \(a\) as
\[F_Y^{-1}(\tau_i) - b \cdot F_Z^{-1}(\tau_i) = a + b \cdot F_Z^{-1}(\tau_i) - b \cdot F_Z^{-1}(\tau_i) = a\]
In the above expressions, we use the two smallest quantiles when estimating the lower tail and the two largest quantiles when estimating the upper tail. With these choices, by construction the lower tail integrates to \(\tau_1\) on the interval \((-\infty, q_1]\) and the upper tail integrates to \(1 - \tau_K\) on the interval \([q_K, \infty)\).
As an alternative, we also allow for the use of a log-normal distribution for extrapolation into the tails. The parameters of the log-normal distribution are estimated by applying the approach outlined above to the logarithms of the quantiles.
To estimate the QF, we invert the CDF estimate, allowing for jumps in
regions where the CDF is flat. As noted above, this can be numerically
unstable if n_grid = NULL
(thus using the spline fit
directly), so it is recommended that the n_grid
parameter
be set to an integer if QF estimates are desired.
To estimate the PDF in instances where the CDF estimate corresponds to a continuous distribution, we differentiate the CDF estimate. If the stage 2 CDF estimate is used, this yields a “histogram density” estimate, which is discontinuous and piecewise constant. If this presents a challenge, the grid size can be increased or the stage 1 CDF estimate used instead.
Random deviates are obtained by sampling \(U \sim \text{Unif}(0,1)\) and then evaluating the approximated quantile function: \(Y = F_Y^{-1}(U)\).
We illustrate the package functionality with a series of four examples. The first example is a setting where the distribution is for a continuous random variable with a defined probability density function. The remaining examples illustrate the behavior of the functions in settings where the provided quantiles are the same at multiple probability levels or a lognormal distribution is used for tail extrapolation and there is at least one quantile at zero, indicating that there is some discrete component to the distribution.
library(distfromq)
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
In our first example, we work with quantiles from the distribution \(Y \sim \text{log normal}(4, 0.5)\). We compare this predictive distribution with three approximations of it that are derived from the quantiles using different assumptions for the tail behavior:
quantile_probs <- seq(from = 0.1, to = 0.9, by = 0.1)
meanlog <- 4.0
sdlog <- 0.5
q_lognormal <- qlnorm(quantile_probs, meanlog = meanlog, sdlog = sdlog)
x <- seq(from = 0.0, to = 400.0, length = 501)
cdf_lognormal <- plnorm(x, meanlog = meanlog, sdlog = sdlog)
p_lognormal_approx <- make_p_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "lnorm")
cdf_lognormal_approx <- p_lognormal_approx(x)
# note that `tail_dist = "norm"` is the default; we specify it here for clarity
p_normal_approx <- make_p_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "norm")
cdf_normal_approx <- p_normal_approx(x)
p_cauchy_approx <- make_p_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "cauchy")
cdf_cauchy_approx <- p_cauchy_approx(x)
dplyr::bind_rows(
data.frame(
x = x,
y = cdf_lognormal,
dist = "Log normal"
),
data.frame(
x = x,
y = cdf_lognormal_approx,
dist = "Spline interpolation,\nlog-normal tails"
),
data.frame(
x = x,
y = cdf_normal_approx,
dist = "Spline interpolation,\nnormal tails"
),
data.frame(
x = x,
y = cdf_cauchy_approx,
dist = "Spline interpolation,\nCauchy tails"
)
) %>%
ggplot() +
geom_line(
mapping = aes(x = x, y = y, color = dist, linetype = dist),
size = 0.8
) +
geom_point(
data = data.frame(q = q_lognormal, p = quantile_probs),
mapping = aes(x = q, y = p),
size = 1.2
) +
scale_color_viridis_d(
"Distribution",
end = 0.9
) +
scale_linetype_discrete("Distribution") +
ylab("Probability") +
xlab("") +
theme_bw()
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
The following illustrates differences in the density estimates.
d_lognormal_approx <- make_d_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "lnorm")
d_normal_approx <- make_d_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "norm")
d_cauchy_approx <- make_d_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "cauchy")
pdf_lognormal <- dlnorm(x, meanlog = meanlog, sdlog = sdlog)
pdf_lognormal_approx <- d_lognormal_approx(x)
pdf_normal_approx <- d_normal_approx(x)
pdf_cauchy_approx <- d_cauchy_approx(x)
dplyr::bind_rows(
data.frame(
x = x,
y = pdf_lognormal,
dist = "Log normal"
),
data.frame(
x = x,
y = pdf_lognormal_approx,
dist = "Spline interpolation,\nlog-normal tails"
),
data.frame(
x = x,
y = pdf_normal_approx,
dist = "Spline interpolation,\nnormal tails"
),
data.frame(
x = x,
y = pdf_cauchy_approx,
dist = "Spline interpolation,\nCauchy tails"
)
) %>%
ggplot() +
geom_vline(
data = data.frame(q = q_lognormal),
mapping = aes(xintercept = q),
size = 0.2
) +
geom_line(
mapping = aes(x = x, y = y, color = dist, linetype = dist),
size = 0.8
) +
scale_color_viridis_d(
"Distribution",
end = 0.9
) +
scale_linetype_discrete("Distribution") +
ylab("Probability Density") +
xlab("") +
theme_bw()
We emphasize that the density is a step function. We illustrate this
by setting the n_grid
parameter to the artificially low
value of 1:
d_lognormal_approx_n_grid_1 <- make_d_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "lnorm",
interior_args = list(n_grid = 1))
pdf_lognormal_approx_n_grid_1 <- d_lognormal_approx_n_grid_1(x)
dplyr::bind_rows(
data.frame(
x = x,
y = pdf_lognormal_approx,
dist = "Spline interpolation,\n n_grid = 20"
),
data.frame(
x = x,
y = pdf_lognormal_approx_n_grid_1,
dist = "Spline interpolation,\n n_grid = 1"
)
) %>%
ggplot() +
geom_vline(
data = data.frame(q = q_lognormal),
mapping = aes(xintercept = q),
size = 0.2
) +
geom_line(
mapping = aes(x = x, y = y, color = dist),
size = 0.8
) +
scale_color_viridis_d(
"Distribution",
end = 0.7
) +
scale_linetype_discrete("Distribution") +
ylab("Probability Density") +
xlab("") +
theme_bw()
We can generate random deviates from the distribution estimates as follows:
r_normal_approx <- make_r_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "norm")
r_lognormal_approx <- make_r_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "lnorm")
r_cauchy_approx <- make_r_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "cauchy")
normal_approx_sample <- r_normal_approx(n = 10000)
lognormal_approx_sample <- r_lognormal_approx(n = 10000)
cauchy_approx_sample <- r_cauchy_approx(n = 10000)
bind_rows(
data.frame(x = normal_approx_sample, dist = "Spline interpolation,\nnormal tails"),
data.frame(x = lognormal_approx_sample, dist = "Spline interpolation,\nlog-normal tails"),
data.frame(x = cauchy_approx_sample, dist = "Spline interpolation,\nCauchy tails")
) %>%
ggplot() +
geom_density(mapping = aes(x = x, color = dist, linetype = dist)) +
scale_color_viridis_d(
"Distribution",
end = 0.9
) +
scale_linetype_discrete("Distribution") +
theme_bw()
The heavy tails of the Cauchy distribution dominate the display. Here we restrict the range of the horizontal axis:
bind_rows(
data.frame(x = normal_approx_sample, dist = "Spline interpolation,\nnormal tails"),
data.frame(x = lognormal_approx_sample, dist = "Spline interpolation,\nlog-normal tails"),
data.frame(x = cauchy_approx_sample, dist = "Spline interpolation,\nCauchy tails")
) %>%
ggplot() +
geom_density(mapping = aes(x = x, color = dist, linetype = dist)) +
scale_color_viridis_d(
"Distribution",
end = 0.9
) +
scale_linetype_discrete("Distribution") +
xlim(-100, 300) +
theme_bw()
#> Warning: Removed 293 rows containing non-finite outside the scale range
#> (`stat_density()`).
Finally, we display quantile function estimates:
ps <- seq(from = 0.01, to = 0.99, by = 0.01)
q_normal_approx <- make_q_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "norm")
q_lognormal_approx <- make_q_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "lnorm")
q_cauchy_approx <- make_q_fn(ps = quantile_probs,
qs = q_lognormal,
tail_dist = "cauchy")
quantiles_lognormal <- qlnorm(ps, meanlog = meanlog, sdlog = sdlog)
quantiles_normal_approx <- q_normal_approx(ps)
quantiles_lognormal_approx <- q_lognormal_approx(ps)
quantiles_cauchy_approx <- q_cauchy_approx(ps)
dplyr::bind_rows(
data.frame(
x = ps,
y = quantiles_lognormal,
dist = "Log normal"
),
data.frame(
x = ps,
y = quantiles_normal_approx,
dist = "Spline interpolation,\nnormal tails"
),
data.frame(
x = ps,
y = quantiles_lognormal_approx,
dist = "Spline interpolation,\nlognormal tails"
),
data.frame(
x = ps,
y = quantiles_cauchy_approx,
dist = "Spline interpolation,\nCauchy tails"
)
) %>%
ggplot() +
geom_line(
mapping = aes(x = x, y = y, color = dist, linetype = dist),
size = 0.8
) +
scale_color_viridis_d(
"Distribution",
end = 0.9
) +
scale_linetype_discrete("Distribution") +
ylab("Quantile") +
xlab("Probability Level") +
theme_bw()
A lognormal distribution specifies that \(P(X \leq 0) = 0\), i.e., the random variable will always non-negative values. If a lognormal tail distribution is used, e.g. to ensure that the distribution’s support is non-negative and to capture a long right tail, the package inserts a point mass at 0 if any provided quantiles are 0. We illustrate this behavior in this example.
# mixture of a LogNormal(4, 0.5) distribution with weight 0.8 and
# a point mass at 0 with weight 0.2
# probabilities and quantiles for the lognormal component
lnorm_ps <- seq(from = 0.1, to = 0.9, by = 0.1)
lnorm_qs <- qlnorm(lnorm_ps, meanlog = 4.0, sdlog = 0.5)
adj_lnorm_ps <- 0.2 + lnorm_ps * 0.8
# quantile at 0 with probability 0.2 for the point mass at 0
point_p <- 0.2
point_q <- 0.0
ps <- c(point_p, adj_lnorm_ps)
qs <- c(point_q, lnorm_qs)
Note that the CDF estimate is zero below 0, jumps to 0.2 at 0, and is continuous thereafter.
x <- seq(from = -100.0, to = 400.0, length = 501)
p_lognormal_approx <- make_p_fn(ps = ps,
qs = qs,
tail_dist = "lnorm")
cdf_lognormal_approx <- p_lognormal_approx(x)
data.frame(
x = x,
y = cdf_lognormal_approx
) %>%
ggplot() +
geom_line(
mapping = aes(x = x, y = y),
size = 0.8
) +
geom_point(
data = data.frame(q = qs, p = ps),
mapping = aes(x = q, y = p),
size = 1.2
) +
ylim(0, 1) +
ylab("Probability") +
xlab("") +
theme_bw()
As expected, the quantile function inverts the CDF.
plot_ps <- seq(from = 0.00, to = 0.99, by = 0.001)
q_lognormal_approx <- make_q_fn(ps = ps,
qs = qs,
tail_dist = "lnorm")
qf_lognormal_approx <- q_lognormal_approx(plot_ps)
data.frame(
x = plot_ps,
y = qf_lognormal_approx
) %>%
ggplot() +
geom_line(
mapping = aes(x = x, y = y),
size = 0.8
) +
geom_point(
data = data.frame(q = ps, p = qs),
mapping = aes(x = q, y = p),
size = 1.2
) +
xlim(0, 1) +
xlab("Probability") +
ylab("") +
theme_bw()
We can verify this by plotting the quantile function estimate with inverted axes on the same display as the CDF estimate:
dplyr::bind_rows(
data.frame(
x = x,
y = cdf_lognormal_approx,
method = "CDF Estimate"
),
data.frame(
x = qf_lognormal_approx,
y = plot_ps,
method = "Flipped QF Estimate"
)
) %>%
ggplot() +
geom_line(
mapping = aes(x = x, y = y, color = method, linetype = method),
size = 0.8
) +
geom_point(
data = data.frame(q = qs, p = ps),
mapping = aes(x = q, y = p),
size = 1.2
) +
ylim(0, 1) +
ylab("Probability") +
xlab("") +
theme_bw()
Samples from this distribution estimate will include a concentration of values at 0:
r_fn <- make_r_fn(ps = ps, qs = qs, tail_dist = "lnorm")
sampled_values_df <- data.frame(x = r_fn(10000))
ggplot(sampled_values_df) +
geom_histogram(mapping = aes(x = x), bins = 100) +
theme_bw()
Because the distribution has a discrete component, we cannot produce a density estimate:
d_fn_lnorm <- make_d_fn(ps = ps, qs = qs, tail_dist = "lnorm")
#> Error in make_d_fn(ps = ps, qs = qs, tail_dist = "lnorm"): make_d_fn requires the distribution to be continuous, but a discrete component was detected
Note that we do not get an error when estimating the density with a normal tail distribution, because in this case we are not forced to assume that there is a point mass at 0:
When quantile values are the same at multiple probability levels, the
distribution assigns non-zero probability to that value, corresponding
to a “step” or “jump” in the cumulative distribution function. Again, we
note that in this case, make_d_fn
will throw an error
because a continuous density does not exist. We illustrate the behavior
of the estimates provided by make_p_fn
and
make_q_fn
in the presence of repeated quantiles in this and
the following example.
quantile_probs <- seq(from = 0.1, to = 0.9, by = 0.1)
quantile_values <- c(1.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0, 8.0, 9.0)
d_normal_approx <- make_d_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
#> Error in make_d_fn(ps = quantile_probs, qs = quantile_values, tail_dist = "norm"): make_d_fn requires the distribution to be continuous, but a discrete component was detected
p_normal_approx <- make_p_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
q_normal_approx <- make_q_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
r_normal_approx <- make_r_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
x <- seq(from = 0.0, to = 20.0, length = 5001)
cdf_normal_approx <- p_normal_approx(x)
ggplot() +
geom_line(data = data.frame(x = x, y = cdf_normal_approx),
mapping = aes(x = x, y = y)) +
geom_point(data = data.frame(x = quantile_values, y = quantile_probs),
mapping = aes(x = x, y = y))
ps <- seq(from = 0.0, to = 1.0, length = 5001)
qf_normal_approx <- q_normal_approx(ps)
ggplot() +
geom_line(data = data.frame(p = ps, y = qf_normal_approx),
mapping = aes(x = p, y = y)) +
geom_point(data = data.frame(x = quantile_probs, y = quantile_values),
mapping = aes(x = x, y = y))
samples_normal_approx <- r_normal_approx(n = 10000)
mean(samples_normal_approx == 3.0)
#> [1] 0.3989
We confirm here that the estimated CDF and QF are indeed inverses to a high degree of numerical precision in regions where the distribution is continuous:
ps <- seq(from = 0.0, to = 1.0, length = 101)
out_ps <- p_normal_approx(q_normal_approx(ps))
out_ps
#> [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
#> [16] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
#> [31] 0.30 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70
#> [46] 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70
#> [61] 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.71 0.72 0.73 0.74
#> [76] 0.75 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
#> [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
ps[(ps < 0.3) | (ps > 0.7)] - out_ps[(ps < 0.3) | (ps > 0.7)]
#> [1] 0.000000e+00 -1.561251e-17 -3.469447e-18 -1.040834e-17 0.000000e+00
#> [6] -2.775558e-17 2.775558e-17 0.000000e+00 0.000000e+00 0.000000e+00
#> [11] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [16] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 -2.775558e-17
#> [21] 0.000000e+00 0.000000e+00 2.775558e-17 0.000000e+00 0.000000e+00
#> [26] 0.000000e+00 5.551115e-17 0.000000e+00 0.000000e+00 0.000000e+00
#> [31] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [36] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
#> [41] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.110223e-16
#> [46] 0.000000e+00 0.000000e+00 -1.110223e-16 0.000000e+00 -1.110223e-16
#> [51] 0.000000e+00 0.000000e+00 1.110223e-16 0.000000e+00 0.000000e+00
#> [56] 0.000000e+00 0.000000e+00 1.110223e-16 0.000000e+00 0.000000e+00
#> [61] 0.000000e+00
As described earlier, we use a piecewise linear representation of the
CDF to avoid numerical issues that arise when inverting the monotonic
spline. We illustrate these problems by manually setting
n_grid = NULL
, i.e., using the stage 1 CDF estimate based
directly on splines. The two representations of the CDF are generally
quite similar, but the QF estimates can differ substantially.
p_normal_approx_spline <- make_p_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm",
interior_args = list(n_grid = NULL))
q_normal_approx_spline <- make_q_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm",
interior_args = list(n_grid = NULL))
x <- seq(from = 0.0, to = 20.0, length = 5001)
plot_df <- rbind(
data.frame(
x = x,
cdf = p_normal_approx(x),
type = "piecewise linear"
),
data.frame(
x = x,
cdf = p_normal_approx_spline(x),
type = "spline"
)
)
ggplot() +
geom_line(data = plot_df,
mapping = aes(x = x, y = cdf, color = type, linetype = type)) +
geom_point(data = data.frame(x = quantile_values, y = quantile_probs),
mapping = aes(x = x, y = y))
ps <- seq(from = 0.0, to = 1.0, length = 5001)
plot_df <- rbind(
data.frame(
p = ps,
qf = q_normal_approx(ps),
type = "piecewise linear"
),
data.frame(
p = ps,
qf = q_normal_approx_spline(ps),
type = "spline"
)
)
ggplot() +
geom_line(data = plot_df,
mapping = aes(x = p, y = qf, color = type, linetype = type)) +
geom_point(data = data.frame(x = quantile_probs, y = quantile_values),
mapping = aes(x = x, y = y))
We therefore recommend using the piecewise linear representation when using a quantile function or generating random deviates, which uses the quantile function in the background.
Our final example illustrates the behavior when there are duplicated
quantiles at the edges. In these cases, we ignore the specified
tail_dist
, and assume that the point mass at the edge
contains all of the tail probability.
quantile_probs <- seq(from = 0.1, to = 0.9, by = 0.1)
quantile_values <- c(1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0, 9.0, 9.0)
p_normal_approx <- make_p_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
p_normal_approx_lin <- make_p_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm",
interior_args = list(n_grid = 20))
q_normal_approx <- make_q_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm")
q_normal_approx_lin <- make_q_fn(ps = quantile_probs,
qs = quantile_values,
tail_dist = "norm",
interior_args = list(n_grid = 20))
x <- seq(from = 0.0, to = 20.0, length = 5001)
plot_df <- rbind(
data.frame(
x = x,
cdf = p_normal_approx(x),
type = "spline"
),
data.frame(
x = x,
cdf = p_normal_approx_lin(x),
type = "piecewise linear"
)
)
ggplot() +
geom_line(data = plot_df,
mapping = aes(x = x, y = cdf, color = type, linetype = type)) +
geom_point(data = data.frame(x = quantile_values, y = quantile_probs),
mapping = aes(x = x, y = y))
ps <- seq(from = 0.0, to = 1.0, length = 5001)
plot_df <- rbind(
data.frame(
p = ps,
qf = q_normal_approx(ps),
type = "spline"
),
data.frame(
p = ps,
qf = q_normal_approx_lin(ps),
type = "piecewise linear"
)
)
ggplot() +
geom_line(data = plot_df,
mapping = aes(x = p, y = qf, color = type, linetype = type)) +
geom_point(data = data.frame(x = quantile_probs, y = quantile_values),
mapping = aes(x = x, y = y))