Added option noFixed
to function rbnpsd
to redraw loci that were drawn fixed for a single allele. These loci are
not polymorphic so they would normally not be considered in
analyses.
Added function fixed_loci
to test for fixed loci
within rbnpsd.
coanc_to_kinship
to easily obtain
kinship matrices from coancestry matrices.qis
now returns a numeric admixture proportions matrix
(used to be logical).q1d
and q1dc
now handle
sigma = 0
special case.q1d
and q1dc
now provide more informative
out-of-bounds messages when sigma
is missing (and
s
is provided)sigma
root finding in q1d
and
q1dc
(when s
is provided) is now more robust,
explicitly tested at boundaries (min s > 0
achieved at
sigma = 0
and max s = 1
achieved at
sigma = Inf
).
interval
and tol
from
both q1d
and q1dc
(users would never need to
set them now that procedure is more robust).coanc
-> coanc_admix
q1d
-> admix_prop_1d_linear
q1dc
-> admix_prop_1d_circular
qis
-> admix_prop_indep_subpops
rpanc
-> draw_p_anc
rpint
-> draw_p_subpops
rpiaf
-> make_p_ind_admix
rgeno
-> draw_genotypes_admix
rbnpsd
-> draw_all_admix
fst
-> fst_admix
(no deprecated version
available in this case, to eliminate conflict with
popkin::fst
)Q
-> admix_proportions
F
-> coanc_subpops
(if general matrix
is accepted), inbr_subpops
(vector or scalar versions
required)s
-> bias_coeff
w
-> weights
Theta
-> coancestry
m
-> m_loci
n
-> n_ind
k
-> k_subpops
pAnc
-> p_anc
B
-> p_subpops
P
-> p_ind
sigma = 0
bug in
admix_prop_1d_circular
.draw_all_admix
(compared to
deprecated rbnpsd
, which retains old defaults):
require_polymorphic_loci
(old noFixed
) is
now TRUE
by default.want_p_ind
and want_p_subpops
(old
wantP
and wantB
) are now FALSE
by
default.draw_p_subpops
now admits scalar inputs
p_anc
and inbr_subpops
, while number of loci
and number of subpopulations can be provided as additional options.coanc_subpops
) or if the diagonal matrix case is required
(specified as vector or scalar inbr_subpops
).admix_prop_1d_circular
) to
prevent overlapping individuals on the edges, and to better agree
visually with the linear version
(admix_prop_1d_linear
).draw_genotypes_admix
low_mem
option could be set but filled slowly by locus
only.draw_all_admix
is also now automatically low-memory
whenever want_p_ind = FALSE
, and the explicit
low_mem
option has also been removed.beta
in function draw_p_anc
to trigger a symmetric Beta distribution for the ancestral allele
frequencies, with the desired shape parameter. The beta
option can also be set on the wrapper function
draw_all_admix
. This option allows simulation of a
distribution heavier on rare variants (when beta
is much
smaller than 1), more similar to real human data.q1dc
,
q1d
, qis
, coanc
,
rbnpsd
, rgeno
, rpanc
,
rpint
, rpiaf
.man/figures/
bias_coeff_admix_fit
, which caused
it to die if the desired bias coefficient was an extreme value
(particularly 1
). The error message was:
f() values at end points not of opposite sign
. The actual
bug was not observed in the regular R build, but rather in a limited
precision setting where R was configured with
--disable-long-double
.p_anc
to function
draw_all_admix
, to specify desired ancestral allele
frequencies instead of having the code generate it randomly
(default).draw_p_subpops.R
, clarifying that input p_anc
can be scalar.draw_all_admix
: when option p_anc
is provided as scalar and want_p_anc = TRUE
, now the return
value is always a vector (in this case the input scalar value repeated
m_loci
times). The previous behavior was to return
p_anc
as scalar if that was the input, which could be
problematic for downstream applications.admix_prop_1d_linear
and
admix_prop_1d_circular
had these changes:
bias_coeff
,
coanc_subpops
and fst
now have default values
(of NA
, NULL
, and NA
,
respectively) instead of missing, and these “missing” values can be
passed to get the same behavior as if they hadn’t been passed at
all.bias_coeff = 1
(to fix an issue only observed on Apple
M1).admix_prop_indep_subpops
: default value for
the optional parameter subpops
is now made more clear in
arguments definition.DESCRIPTION
,
README.md
and the vignette, to point to the published
method in PLoS Genetics.draw_p_subpops_tree
is the tree version of
draw_p_subpops
.coanc_tree
calculates the true coancestry
matrix corresponding to the subpopulations related by a tree.draw_all_admix
has new argument
tree_subpops
that can be used in place of
inbr_subpops
(to simulated subpopulation allele frequencies
using draw_p_subpops_tree
instead of
draw_p_subpops
).coanc_subpops
) as input, so they work if they are passed
the matrix that coanc_tree
returns:
coanc_admix
, fst_admix
,
admix_prop_1d_linear
,
admix_prop_1d_circular
.admix_prop_1d_linear
and
admix_prop_1d_circular
, when sigma
is missing
(and therefore fit to a desired coanc_subpops
,
fst
, and bias_coeff
), now additionally return
multiplicative factor
used to rescale
coanc_subpops
.It’s Fangorn Forest around here with all the tree updates!
fit_tree
for fitting trees to coancestry matrices!scale_tree
to easily scale coancestry trees and check
for out-of-bounds values.tree_additive
for calculating “additive” edges for
probabilistic edge coancestry trees, and also the reverse function .
coanc_tree
, but now it’s renamed, exported, and well
documented!$root.edge
to tree phylo
objects passed to these functions:
coanc_tree
: edge is a shared covariance value affecting
all subpopulations.draw_all_admix
and draw_p_subpops_tree
: if
root edge is present, functions warn that it will be ignored.admix_prop_1d_linear
and
admix_prop_1d_circular
: debugged an edge case where
sigma
is small but not zero and numerically-calculated
densities all come out to zero in a given row of the
admix_proportions
matrix (for
admix_prop_1d_circular
infinite values also arise), which
used to lead to NAs upon row normalization; now for those rows, the
closest ancestry (by coordinate distance) gets assigned the full
admixture fraction (just as for independent
subpopulations/sigma = 0
).admix_prop_1d_linear
,
admix_prop_1d_circular
now copy names from the input
coanc_subpops
(vector and matrix versions, only required
when fitting bias_coeff
) to the columns of the output
admix_proportions
matrix.draw_genotypes_admix
now copies row and column
names from input matrix p_ind
(or rownames from
p_ind
and column names from the rownames of
admix_proportions
when the latter is provided) to output
genotype matrixdraw_p_subpops
now copies names from
p_anc
to rows, names from inbr_subpops
to
columns, when present and of the right dimensions.draw_p_subpops_tree
now copies names from
p_anc
to rows. Names from tree_subpops
were
already copied to columns before.coanc_admix
and fst_admix
stop
if the column names of admix_proportions
and the names of
coanc_subpops
disagree.draw_all_admix
stops if the column names of
admix_proportions
and the names of either
inbr_subpops
or tree_subpops
disagree.draw_genotypes_admix
, when
admix_proportions
is passed, stops if the column names of
admix_proportions
and p_ind
disagree.make_p_ind_admix
stops if the column names of
admix_proportions
and p_subpops
disagree.tree_additive
now has option
force
, which when TRUE
simply proceeds without
stopping if additive edges were already present (in
tree$edge.length.add
, which is ignored and
overwritten).New functions and bug fixes dealing with reordering tree edges and tips.
tree_reindex_tips
for ensuring that tip
order agrees in both the internal labels vector and the edge matrix.
Such lack of agreement is generally possible (technically the tree is
the same for arbitrary orders of edges in the edge matrix). However,
such a disagreement causes visual disagreement in plots (for example,
trees are plotted in the order of the edge matrix, versus coancestry
matrices are ordered as in the tip labels vector instead), which can now
be fixed in general.tree_reorder
for reordering tree edges
and tips to agree as much as possible with a desired tip order. The
heuristic finds the exact solution if it exists, otherwise returns a
reasonable order close to the desired order. Tip order in labels and
edge matrix agree (via tree_reindex_tips
).fit_tree
now outputs trees with tip order that
better agrees with the input data, and tip order in labels vector and
edge matrix now agree (via tree_reorder
).tree_additive
. Before this bug fix, some trees
could trigger the error message “Error: Node index 6 was not assigned
coancestry from root! (unexpected)”, where “6” could be other
numbers.draw_p_subpops_tree
. Before this bug fix, some
trees could trigger the error message “Error: The root node index in
tree_subpops$edge
(9) does not match
k_subpops + 1
(6) where k_subpops
is the
number of tips! Is the tree_subpops
object malformed?”,
where “9” and “6” could be other numbers. Other possible error messages
contain “Parent node index 6 has not yet been processed …” or “Child”
instead of “Parent”, where “6” could be other numbers.fit_tree
had related fixes,
but overall fit_tree
appears to have had no bugs because
users cannot provide trees, and the tree-building algorithm does not
produce scrambled edges that would have caused problems.fixed_loci
and draw_all_admix
have a new parameter maf_min
that, when greater than zero,
allows for treating rare variants as fixed. In
draw_all_admix
, this now allows for simulating loci with
frequency-based ascertainment bias.draw_all_admix
that could cause a
“stack overflow” error. The function used to call itself recursively if
require_polymorphic_loci = TRUE
, and in cases where there
are very rare allele frequencies or high maf_min
the number
of recursions could be so large that it triggered this error. Now the
function has a while
loop, and does not recurse more than
one level at the time; there is no limit to the number of iterations and
no errors occur inherently due to large numbers of iterations.fit_tree
internally simplified to use
stats::hclust
, which also results in a small runtime gain.
The new code (when method = "mcquitty"
, which is default)
gives the same answers as before (in other words, the original algorithm
was a special case of hierarchical clustering).
method
is passed to hclust
.
Although all hclust
methods are allowed, for this
application the only ones that make sense are “mcquitty” (WPGMA) and
“average” (UPGMA). In internal evaluations, both algorithms had similar
accuracy and runtime, but only “mcquitty” exactly recapitulates the
original algorithm.inst/CITATION
(missed last time I
updated them in other locations).undiff_af
for creating
“undifferentiated” allele frequency distributions based on real data but
with a lower variance (more concentrated around 0.5) according to a
given FST, useful for simulating data trying to match real data.LICENSE.md
.NEWS.md
slightly to improve its
automatic parsing.undiff_af
:
F_max
, V_in
, V_out
,
V_mix
, and alpha
.distr = "auto"
cases where mixing variance
ended up being smaller than required due to roundoff errors
(alpha
is now larger than given in direct formula by
eps = 10 * .Machine$double.eps
, which is also a new
option.draw_all_admix
added option
p_anc_distr
for passing custom ancestral allele frequency
distributions (as vector or function). This differs from the similar
preexisting option p_anc
, which fixed ancestral allele
frequencies per locus to those values. These two options behave
differently when loci have to be re-drawn due to being fixed or having
too-low MAFs: passing p_anc
never changes those values,
whereas passing p_anc_distr
results in drawing new values
as necessary. The new option is more natural biologically and results in
re-drawing fixed loci less often.undiff_af
renamed parameter F
to
kinship_mean
, and updated all documentation to reflect the
correction that this parameter is the mean kinship and not FST (the
complete derivation will appear in a manuscript).
F_max
is similarly now kinship_mean_max
.bias_coeff_admix_fit
) shared by
admix_prop_1d_linear
and
admix_prop_1d_circular
for edge cases.
Inf
, but instead an error was encountered.stats::uniroot
)