The bnpsd (“Balding-Nichols Pritchard-Stephens-Donnelly”) R package is for simulating admixed populations. More specifically, bnpsd facilitates
construction of admixed population structures and simulation of allele
frequencies and genotypes from the BN-PSD admixture model. This model
combines the Balding-Nichols (BN) allele frequency model for the
intermediate subpopulations with the Pritchard-Stephens-Donnelly (PSD)
model of individual-specific admixture proportions. This model enables
the simulation of complex population structures, ideal for illustrating
challenges in kinship coefficient and FST estimation. Note that
simulated loci are drawn independently (in linkage equilibrium).
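As a rough illustration of the generative steps described above, the following base R sketch draws data by hand without calling bnpsd. The admixture matrix `Q`, the range of ancestral allele frequencies, and all variable names are illustrative assumptions, and the code is not meant to mirror the package's internals.

```r
set.seed(1)
m_loci <- 1000                      # number of loci
inbr_subpops <- c(0.1, 0.3)         # FST of each intermediate subpopulation
# hypothetical admixture proportions: rows are individuals, columns are subpops
Q <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))

# ancestral allele frequencies (range chosen arbitrarily for this sketch)
p_anc <- runif(m_loci, 0.05, 0.95)

# BN step: subpopulation AFs are Beta-distributed with mean p_anc
# and variance p_anc * (1 - p_anc) * FST
p_subpops <- sapply(inbr_subpops, function(f) {
  nu <- (1 - f) / f
  rbeta(m_loci, p_anc * nu, (1 - p_anc) * nu)
})

# PSD step: individual-specific AFs are admixture-weighted averages
p_ind <- p_subpops %*% t(Q)         # m_loci x n_ind

# genotypes: two independent allele draws per locus and individual
X <- matrix(rbinom(length(p_ind), 2, p_ind), nrow = m_loci)
```

Each locus is drawn without reference to any other, which is what linkage equilibrium means here.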
The stable version of the package is on CRAN and can be installed using:
install.packages("bnpsd")
The current development version can be installed from the GitHub repository using devtools:
install.packages("devtools") # if needed
library(devtools)
install_github('StoreyLab/bnpsd', build_opts = c())
You can see the package vignette, which has more detailed documentation, by typing this into your R session:
vignette('bnpsd')
This is a quick overview of the main bnpsd functions.
Define the population structure (in this case for a 1D admixture scenario).
library(bnpsd)
# dimensions of data/model
# number of loci
m_loci <- 10
# number of individuals
n_ind <- 5
# number of intermediate subpops
k_subpops <- 2

# define population structure
# FST values for k=2 subpopulations
inbr_subpops <- c(0.1, 0.3)
# admixture proportions from 1D geography
admix_proportions <- admix_prop_1d_linear(n_ind, k_subpops, sigma = 1)
# also available:
# - admix_prop_1d_circular
# - admix_prop_indep_subpops

# get pop structure parameters of the admixed individuals
# the coancestry matrix
coancestry <- coanc_admix(admix_proportions, inbr_subpops)
# FST of admixed individuals
Fst <- fst_admix(admix_proportions, inbr_subpops)
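To make the two structure parameters concrete, here is a minimal base R sketch of the underlying formulas for independent subpopulations: the coancestry of admixed individuals is the admixture matrix times the diagonal matrix of subpopulation FST values times the transposed admixture matrix, and FST is the mean of its diagonal. The matrix `Q` is a hand-written, hypothetical example, and uniform individual weights are assumed. This writes out the model rather than reproducing how the package computes it.

```r
inbr_subpops <- c(0.1, 0.3)         # FST of each intermediate subpopulation
# hypothetical admixture proportions: rows are individuals, columns are subpops
Q <- rbind(c(1, 0), c(0.5, 0.5), c(0, 1))

# coancestry of admixed individuals (n_ind x n_ind)
coancestry <- Q %*% diag(inbr_subpops) %*% t(Q)

# FST as the uniformly weighted mean of the inbreeding coefficients
# (the diagonal of the coancestry matrix)
Fst <- mean(diag(coancestry))
```

In this toy case the unadmixed individuals inherit the FST of their subpopulation, while the evenly admixed individual has a lower inbreeding coefficient than the average of the two.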
Draw random allele frequencies and genotypes from this population structure.
# draw all random allele freqs and genotypes
out <- draw_all_admix(admix_proportions, inbr_subpops, m_loci)
# genotypes
X <- out$X
# ancestral allele frequencies (AFs)
p_anc <- out$p_anc

# OR... draw each vector or matrix separately
# provided for additional flexibility
# ancestral AFs
p_anc <- draw_p_anc(m_loci)
# independent subpops (intermediate) AFs
p_subpops <- draw_p_subpops(p_anc, inbr_subpops)
# individual-specific AFs
p_ind <- make_p_ind_admix(p_subpops, admix_proportions)
# genotypes
X <- draw_genotypes_admix(p_ind)
The intermediate subpopulations can also be related by a tree, which allows for correlated subpopulations (the previous examples had independent subpopulations).
# best to start by specifying tree in Newick string format
tree_str <- '(S1:0.1,(S2:0.1,S3:0.1)N1:0.1)T;'
# and turn it into `phylo` object using the `ape` package
library(ape)
tree_subpops <- read.tree( text = tree_str )
# true coancestry matrix corresponding to this tree
coanc_subpops <- coanc_tree( tree_subpops )

# admixture proportions from 1D geography
# (constructed again but for k=3 tree)
k_subpops <- nrow( coanc_subpops )
admix_proportions <- admix_prop_1d_linear( n_ind, k_subpops, sigma = 0.5 )

# get pop structure parameters of the admixed individuals
# the coancestry matrix
coancestry <- coanc_admix( admix_proportions, coanc_subpops )
# FST of admixed individuals
Fst <- fst_admix( admix_proportions, coanc_subpops )

# draw all random allele freqs and genotypes, tree version
out <- draw_all_admix( admix_proportions, tree_subpops = tree_subpops, m_loci = m_loci )
# genotypes
X <- out$X
# ancestral allele frequencies (AFs)
p_anc <- out$p_anc

# OR... draw tree subpops (intermediate) AFs separately
p_subpops_tree <- draw_p_subpops_tree( p_anc, tree_subpops )
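To see why a tree induces correlated subpopulations, the base R sketch below works out a coancestry matrix by hand for the Newick tree used above. It assumes that each edge length is the FST of a node relative to its parent, so that values along a path compose as one minus the product of (1 - edge). That convention is an assumption of this sketch, as are the helper `total_inbr` and the variable names; compare against the output of `coanc_tree` before relying on it.

```r
# edge lengths from '(S1:0.1,(S2:0.1,S3:0.1)N1:0.1)T;'
f_S1 <- 0.1   # T  -> S1
f_N1 <- 0.1   # T  -> N1
f_S2 <- 0.1   # N1 -> S2
f_S3 <- 0.1   # N1 -> S3

# hypothetical helper: total inbreeding accumulated along a path from the root,
# assuming each edge is an FST relative to the parent node
total_inbr <- function(edges) 1 - prod(1 - edges)

tips <- c('S1', 'S2', 'S3')
coanc_by_hand <- matrix(0, 3, 3, dimnames = list(tips, tips))
# diagonal: path from the root T to each tip
coanc_by_hand['S1', 'S1'] <- total_inbr(f_S1)
coanc_by_hand['S2', 'S2'] <- total_inbr(c(f_N1, f_S2))
coanc_by_hand['S3', 'S3'] <- total_inbr(c(f_N1, f_S3))
# off-diagonal: path from the root to the most recent shared ancestor (N1)
coanc_by_hand['S2', 'S3'] <- coanc_by_hand['S3', 'S2'] <- total_inbr(f_N1)
```

Under this assumption S2 and S3 share the edge leading to N1 and therefore have nonzero coancestry, whereas S1 shares only the root with them and stays uncorrelated.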
Alejandro Ochoa, John D Storey. 2021. “Estimating FST and kinship for arbitrary population structures.” PLoS Genet 17(1): e1009241. PubMed ID 33465078. doi:10.1371/journal.pgen.1009241. bioRxiv doi:10.1101/083923 2016-10-27.
Alejandro Ochoa, John D Storey. 2016. “FST And Kinship for Arbitrary Population Structures I: Generalized Definitions.” bioRxiv doi:10.1101/083915.