Animal breeding relies on predicting the breeding values of selection candidates and then selecting animals for breeding in a way that restricts the rate of inbreeding and maintains the genetic originality of the breed. The methods of choice use pedigree data, genetic marker data, or a combination of both. This section covers evaluations based on pedigree data.
Note that the calculation of breeding values is not included because several R packages already exist for this purpose. Recommended are the free package MCMCglmm for small and medium-sized data sets, and package asreml for large data sets.
A pedigree is a data frame or data table with the first three columns being the individual ID, the sire ID, and the dam ID. Depending on the intended evaluation, the pedigree may also provide the names of the breeds (column Breed), the years of birth or generation numbers (column Born), and the sexes (column Sex). A toy example of a pedigree may look like this:
Pedigree <- data.frame(
Indiv= c("Iffes", "Peter", "Anna-Lena", "Kevin", "Horst"),
Sire = c("Kevin", "Kevin", NA, 0, "Horst"),
Dam = c("Chantalle", "Angelika", "Chantalle", "", NA),
Breed= c("Angler", "Angler", "Angler", "Holstein", "Angler"),
Born = c(2015, 2016, 2011, 2010, 2015)
)
Pedigree
## Indiv Sire Dam Breed Born
## 1 Iffes Kevin Chantalle Angler 2015
## 2 Peter Kevin Angelika Angler 2016
## 3 Anna-Lena <NA> Chantalle Angler 2011
## 4 Kevin 0 Holstein 2010
## 5 Horst Horst <NA> Angler 2015
Individual IDs must be unique and should not contain blank spaces. The above pedigree contains a loop, so it is not a valid pedigree (Horst is an ancestor of himself). Function prePed will prepare the pedigree and turn it into a valid one. The function
- renames the first three columns to Indiv, Sire, Dam (if needed),
- replaces unknown parents in columns Sire and Dam by NA,
- adds column Sex, with sexes denoted as male and female,
- adds columns numIndiv, numSire, and numDam with numeric IDs of the individuals (if addNum=TRUE is specified),
- sets the breed name of founders from thisBreed that are born after lastNative to unknown (if these parameters are specified).
## Pedigree loops were detected. We recommend to correct them manually before
## using prePed(). The parents of the following individuals are set to unknown
## to remove the loops.
## Sire Dam
## Horst Horst <NA>
The result is:
## Sire Dam Sex Breed Born I Offspring
## Chantalle <NA> <NA> female Angler NA NA TRUE
## Angelika <NA> <NA> female Angler NA NA TRUE
## Kevin <NA> <NA> male Holstein 2010 NaN TRUE
## Horst <NA> <NA> <NA> Pedigree Error 2015 NaN FALSE
## Anna-Lena <NA> Chantalle <NA> Angler 2011 NaN FALSE
## Iffes Kevin Chantalle <NA> Angler 2015 5 FALSE
## Peter Kevin Angelika <NA> Angler 2016 6 FALSE
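Two of the preparation steps can be illustrated in base R. The following is only a minimal sketch under stated assumptions, not the code used by prePed: it treats "" and "0" as codes for unknown parents, and the object names missing_codes, founders, and Cleaned are hypothetical. It recodes unknown parents as NA and adds one line for each parent that has no line of its own.

```r
# Toy pedigree with three different codes for unknown parents: NA, "0", and ""
Pedigree <- data.frame(
  Indiv = c("Iffes", "Peter", "Anna-Lena", "Kevin"),
  Sire  = c("Kevin", "Kevin", NA, "0"),
  Dam   = c("Chantalle", "Angelika", "Chantalle", ""),
  stringsAsFactors = FALSE
)

# Assumption: these strings denote an unknown parent
missing_codes <- c("", "0")
for (col in c("Sire", "Dam")) {
  Pedigree[[col]][Pedigree[[col]] %in% missing_codes] <- NA
}

# Parents that appear only as sire or dam get their own line as founders
parents  <- c(Pedigree$Sire, Pedigree$Dam)
parents  <- unique(parents[!is.na(parents)])
founders <- setdiff(parents, Pedigree$Indiv)

# Founders are placed first so that ancestors precede their descendants
Cleaned <- rbind(
  data.frame(Indiv = founders, Sire = NA_character_, Dam = NA_character_,
             stringsAsFactors = FALSE),
  Pedigree
)
Cleaned
```

After this step every non-missing parent ID also occurs in column Indiv, which is what later recursive computations rely on.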
Plots can be created to show specified individuals and their relatives. In the first step, function subPed is used below to create a small pedigree that includes only the individuals to be plotted, which are "Chantalle" and "Angelika", their ancestors up to prevGen previous (ancestral) generations, and their descendants up to succGen succeeding (descendant) generations. The pedigree is then plotted with function pedplot. It is basically a wrapper function for function plot.pedigree from package kinship2, but has additional arguments. Parameter label contains the columns of the pedigree to be used for labeling. By default, symbols of individuals included in vector keep are plotted in black, and for individuals from other breeds the symbol is crossed out.
sPed <- subPed(Pedig, keep = c("Chantalle","Angelika"), prevGen = 2, succGen = 1)
pedplot(sPed, label = c("Indiv", "Born", "Breed"), cex = 0.55)
Here, SAnna-Lena denotes the unknown sire of Anna-Lena.
The next step is to find out if the completeness of the pedigree is sufficient for the intended evaluation, and to identify individuals with sufficient pedigree information. This will be demonstrated using the example of a pedigree of Hinterwald cattle. Data frame Phen contains selection candidates with breeding values in column BV, and data frame PedigWithErrors contains their pedigree. The column with individual IDs is called Indiv in both data sets.
Function completeness can be used to determine the proportion of known ancestors of each specified individual in each ancestral generation:
## Indiv Generation Completeness
## 269 276000891901739 0 1.0
## 270 276000891901739 1 0.5
## 279 276000891471209 0 1.0
## 280 276000891471209 1 0.5
## 283 276000891862755 0 1.0
## 284 276000891862755 1 0.5
or the mean completeness of the pedigrees of specified individuals within sexes:
compl <- completeness(Pedig, keep=Phen$Indiv, by="Sex")
library("ggplot2")
ggplot(compl, aes(x=Generation, y=Completeness, col=Sex)) + geom_line()
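What the proportion of known ancestors per generation means can be shown on a toy pedigree in base R. This is a hedged sketch, not the implementation of function completeness; the helper completeness_by_generation and the five-animal pedigree are made up for illustration. Generation g can hold at most 2^g ancestors, so the completeness in generation g is the number of known ancestors divided by 2^g. The sum over the ancestral generations gives the number of equivalent complete generations.

```r
# Toy pedigree: the dam of D is unknown
ped <- data.frame(
  Indiv = c("A", "B", "C", "D", "E"),
  Sire  = c(NA, NA, "A", "A", "C"),
  Dam   = c(NA, NA, "B", NA, "D"),
  stringsAsFactors = FALSE
)
sire <- setNames(ped$Sire, ped$Indiv)
dam  <- setNames(ped$Dam, ped$Indiv)

# Hypothetical helper: completeness of generations 0..max_gen for one individual
completeness_by_generation <- function(id, max_gen) {
  comp <- numeric(max_gen + 1)
  comp[1] <- 1                      # generation 0 is the individual itself
  current <- id
  for (g in seq_len(max_gen)) {
    parents <- unname(c(sire[current], dam[current]))
    current <- parents[!is.na(parents)]   # known ancestors in generation g
    comp[g + 1] <- length(current) / 2^g  # at most 2^g ancestors are possible
  }
  setNames(comp, 0:max_gen)
}

compl_E <- completeness_by_generation("E", max_gen = 3)
compl_E

# Equivalent complete generations: sum over the ancestral generations
equi_gen_E <- sum(compl_E[-1])
equi_gen_E
```

Ancestors are counted once per pedigree path, so A is counted twice among the grandparents of E.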
The completeness of the pedigree of each individual can be summarized with function summary:
## equiGen fullGen maxGen PCI Inbreeding
## 276000812496744 6.032959 3 12 0.9677419 0.008903503
## 276000812202159 5.789307 1 12 0.7692308 0.027274609
## 276000812749837 4.419678 2 12 0.8333333 0.005828857
## 276000891618444 4.234375 1 12 0.5454545 0.000000000
## 276000812922523 5.575928 2 12 0.8333333 0.003438234
## 276000891862786 6.565186 3 12 0.9677419 0.013700247
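The following parameters were computed for each individual:
- equiGen: number of equivalent complete generations,
- fullGen: number of fully traced generations,
- maxGen: maximum number of generations traced,
- PCI: index of pedigree completeness, which is the harmonic mean of the pedigree completeness of the two parents,
- Inbreeding: inbreeding coefficient.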
The best way to characterize the completeness of pedigree information by a single value is the number of equivalent complete generations, averaged over all individuals of the current breeding population that were included in the evaluations.
The relevant parameter to identify individuals with insufficient pedigree information to estimate inbreeding, however, is the PCI. This is because inbreeding can be detected only if both maternal and paternal ancestries are known. The PCI is a harmonic mean of the completeness of the paternal and maternal pedigrees, which ensures that the less complete of the two is weighted more heavily, so the PCI equals zero when either parent is unknown. Inbreeding coefficients can be valid despite small PCIs if the most recent founders were indeed unrelated, e.g. because they were from other breeds.
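The effect of the harmonic mean can be checked with a few numbers. The function below is a sketch based only on the description above; the name pci_sketch and the inputs c_sire and c_dam (completeness of the paternal and maternal pedigree, each between 0 and 1) are assumptions and not part of the package.

```r
# Harmonic mean of the completeness of the paternal and maternal pedigree
pci_sketch <- function(c_sire, c_dam) {
  if (c_sire + c_dam == 0) return(0)   # both ancestries unknown
  2 * c_sire * c_dam / (c_sire + c_dam)
}

pci_sketch(1.0, 1.0)  # both pedigrees complete
pci_sketch(0.9, 0.3)  # pulled towards the less complete pedigree
pci_sketch(0.9, 0.0)  # zero as soon as one parent is unknown
```

For completeness values of 0.9 and 0.3 the arithmetic mean would be 0.6, whereas the harmonic mean is 0.45.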
Note that inbreeding coefficients can be computed faster with function pedInbreeding.
The inbreeding coefficient of an individual is the probability that two alleles chosen at random from the maternal and paternal haplotypes are identical by descent (IBD). This parameter estimates the extent to which the individual may suffer from inbreeding depression and predicts the homogeneity of its offspring. It can be calculated with
## [1] 0.01943394
Whether mating to a sound inbred individual should be favored or avoided depends on whether the breeder wishes offspring with uniform or heterogeneous breeding values. More important than the inbreeding coefficient of an animal itself, however, is the expected inbreeding coefficient of the offspring, which should be low. The expected inbreeding coefficient of the offspring is equal to the kinship of the parents.
The kinship between two individuals is the probability that two alleles randomly chosen from both individuals are IBD. A matrix containing the kinship between all pairs of individuals can be computed with function pedIBD. It is half the numerator relationship matrix. The R code below computes the kinship between the female with ID 276000812750188 and all male selection candidates that have a breeding value larger than 1.0 and a pedigree with more than 5 equivalent complete generations.
pKin <- pedIBD(Pedig, keep.only=Phen$Indiv)
isMale <- Pedig$Sex=="male" & (Pedig$Indiv %in% Phen$Indiv[Phen$BV>1.0])
males <- Pedig$Indiv[isMale & summary(Pedig)$equiGen>5]
pKin[males, "276000812750188", drop=FALSE]
## 276000812750188
## 276000811902819 0.28955731
## 276000812749676 0.06626139
## 276000812689003 0.03807358
## 276000812750190 0.18099209
## 276000813155662 0.17488755
## 276000812688988 0.16645344
## 276000812771544 0.01719556
The males with lowest kinship should be favoured for mating.
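The relation between kinship and inbreeding can be verified on a toy pedigree with the classical tabular method. The code is a minimal base-R sketch, not the algorithm used by pedIBD, and it assumes that the pedigree is sorted so that parents precede their offspring. The pedigree describes a mating between the full sibs C and D, which produces E.

```r
# Toy pedigree: C and D are full sibs, E is their offspring
ped <- data.frame(
  Indiv = c("A", "B", "C", "D", "E"),
  Sire  = c(NA, NA, "A", "A", "C"),
  Dam   = c(NA, NA, "B", "B", "D"),
  stringsAsFactors = FALSE
)
n   <- nrow(ped)
kin <- matrix(0, n, n, dimnames = list(ped$Indiv, ped$Indiv))

for (i in seq_len(n)) {
  s <- ped$Sire[i]
  d <- ped$Dam[i]
  # Kinship with every earlier individual j: mean kinship of j with the parents
  for (j in seq_len(i - 1)) {
    ks <- if (is.na(s)) 0 else kin[j, s]
    kd <- if (is.na(d)) 0 else kin[j, d]
    kin[i, j] <- kin[j, i] <- 0.5 * (ks + kd)
  }
  # Self-kinship: 0.5 * (1 + inbreeding), inbreeding = kinship of the parents
  kin[i, i] <- if (is.na(s) || is.na(d)) 0.5 else 0.5 * (1 + kin[s, d])
}

kin
inbreeding_E <- 2 * kin["E", "E"] - 1
inbreeding_E
```

The inbreeding coefficient of E equals the kinship between its parents C and D, and multiplying kin by 2 gives the numerator relationship matrix.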
Mating decisions should not only depend on the breeding value of the male and the kinship between male and female, but also on the native contribution of the male. Many endangered breeds have been graded up with commercial high-yielding breeds. These increasing contributions from other breeds displace the original genetic background of the endangered breed, decrease the genetic contribution from native ancestors, and reduce the conservation value of the breed.
For computing the breed composition of individuals, it should be taken into account that the breed of founders born recently is in fact unknown, even if they have been classified as purebred. Hence, for these individuals, the breed name should be changed to "unknown". Below, the breed name of founders born after 1970 is changed to "unknown" if they had been classified as Hinterwald cattle. Thereafter, function pedBreedComp is used to estimate the contribution each individual has from each foreign breed and from native founders. The contribution an individual has from native founders is called the native contribution NC of the individual. Finally, column NC containing the native contributions is added to the pedigree.
cont <- pedBreedComp(Pedig, thisBreed="Hinterwaelder")
Pedig$NC <- cont$native
head(cont[rev(Phen$Indiv), 2:6])
## native unknown unbek0 Fleckvieh Vorderwaelder
## 276000812749835 0.6096039 0.09570312 0.03677368 0.11021423 0.14770508
## 276000812749569 0.5945129 0.14160156 0.04071045 0.11599731 0.10717773
## 276000891724277 0.0000000 0.50000000 0.50000000 0.00000000 0.00000000
## 276000891920412 0.5501404 0.12109375 0.03527832 0.10354614 0.18994141
## 276000812496874 0.3734741 0.41601562 0.03430176 0.08294678 0.09326172
## 276000811287745 0.5804138 0.09765625 0.03259277 0.08963013 0.19970703
The columns are ordered so that the most influential foreign breeds come first. It can be seen that the contribution from native founders varies considerably between individuals. Individuals with high genetic contribution from native founders should be favored for mating provided that the kinship between male and female is sufficiently low.
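How breed contributions propagate through a pedigree can be sketched in base R: a founder carries its own breed with contribution 1, and every other individual receives the average of the contributions of its parents. This is an illustration under simplifying assumptions (a sorted pedigree in which each individual has either both parents known or none), not the code of pedBreedComp, and the toy animals are made up.

```r
# Toy pedigree: two native founders, one Fleckvieh founder, and two descendants
ped <- data.frame(
  Indiv = c("N1", "N2", "F1", "X", "Y"),
  Sire  = c(NA, NA, NA, "F1", "X"),
  Dam   = c(NA, NA, NA, "N1", "N2"),
  Breed = c("native", "native", "Fleckvieh", NA, NA),
  stringsAsFactors = FALSE
)
breeds <- unique(ped$Breed[!is.na(ped$Breed)])
cont   <- matrix(0, nrow(ped), length(breeds),
                 dimnames = list(ped$Indiv, breeds))

for (i in seq_len(nrow(ped))) {
  if (is.na(ped$Sire[i]) && is.na(ped$Dam[i])) {
    # Founder: full contribution from its own breed
    cont[i, ped$Breed[i]] <- 1
  } else {
    # Assumption: both parents are known and precede the individual
    cont[i, ] <- 0.5 * (cont[ped$Sire[i], ] + cont[ped$Dam[i], ])
  }
}
cont
```

Individual Y has one Fleckvieh grandparent, so its native contribution is 0.75.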
Since animals with high native contributions tend to be related, the inbreeding level could increase considerably when introgressed genetic material is removed from the population. This could be avoided by restricting the increase in kinship at native alleles in the population. The kinship at native alleles is also called the native kinship. An R object containing the information needed to estimate native kinship from the pedigree can be obtained as:
For pairs of individuals the native kinship can be estimated as:
## [1] 0.182625
However, the kinship at native alleles between two individuals says nothing about the amount of genetic material the individuals have from native ancestors. It only quantifies how different these native alleles are. Hence, in any case, the native contributions of the individuals should also be considered:
## Born NC
## 276000891862786 2004 0.6164551
## 276000812497659 2004 0.6274414
The native contributions of both individuals are larger than the mean NC of the phenotyped individuals, which is 0.44. However, their native kinship is larger than the average, which is
## [1] 0.0776925
The correlation between the kinship and the native kinship is high (as expected):
subKin <- pKin[Phen$Indiv, Phen$Indiv]
subNatKin <- pKinatN$of[Phen$Indiv, Phen$Indiv]
diag(subKin) <- NA
diag(subNatKin) <- NA
cor(c(subKin), c(subNatKin), use="complete.obs")
## [1] 0.8916129
The correlation is even higher if only individuals with high native contributions are considered.
The genetic diversity of a population is the probability that two alleles chosen at random from the population are not IBD. It is one minus the average kinship of the individuals. A simple estimate can be obtained as
## [1] 0.9755853
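The definition can be illustrated with a small, made-up kinship matrix. The sketch assumes that the two alleles are sampled with replacement, so the self-kinships on the diagonal are included in the mean; the values in kin_toy are invented for illustration.

```r
# Hypothetical kinship matrix of three individuals (self-kinships on the diagonal)
kin_toy <- matrix(c(0.5, 0.1, 0.0,
                    0.1, 0.5, 0.2,
                    0.0, 0.2, 0.5),
                  nrow = 3, byrow = TRUE)

mean_kinship <- mean(kin_toy)      # probability that two random alleles are IBD
diversity    <- 1 - mean_kinship   # probability that they are not IBD
diversity
```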
The diversity of this population is high. A high genetic diversity makes it possible to avoid inbreeding and ensures that polygenic traits have a high additive variance, which is required to achieve genetic gain. However, the estimated diversity of this population seems to be extremely high. This could have two reasons:
The diversity is indeed very high because the individuals are in fact crossbred individuals with genetic contributions from many unrelated breeds, or
The completeness of the pedigrees is insufficient for such an evaluation.
For this breed it is unclear whether pedigree completeness is low because pedigrees of individuals from other breeds were cut, or because previously unregistered animals have been registered. Hence, to reduce this uncertainty, pedigrees of introduced individuals from other breeds should not be cut.
A parameter which depends not so much on the completeness of the pedigrees is the diversity at native alleles.
The kinship at native alleles in the population is the probability that two alleles chosen at random from the population are IBD, given that both alleles originate from native founders. The mean kinship at native alleles is
## [1] 0.0776925
The genetic diversity at native alleles is one minus the kinship at native alleles (named conditional gene diversity in Wellmann, Hartwig, and Bennewitz 2012). Thus, it can be calculated as
## [1] 0.9223075
Note that incomplete pedigrees result in an overestimation of the genetic diversity. This is not the case for the genetic diversity at native alleles because alleles originating from founders born after 1970 were classified to be non-native. Hence, the diversity at alleles originating from these individuals does not affect the estimate.
A high diversity at native alleles is important if a goal of the breeding program is to remove introgressed genetic material from the population. Without maintenance of a high diversity at native alleles, inbreeding coefficients will soon rise to an unreasonable level.
The native effective size of a population is defined as the size of an idealized random mating population for which the genetic diversity decreases as fast as the diversity at native alleles decreases in the population under study (Wellmann, Hartwig, and Bennewitz 2012). Thus, the native effective size quantifies how fast the diversity at native alleles decreases. In contrast, the effective size quantifies how fast the genetic diversity decreases.
In a population without historic introgression the native effective size (native Ne) is equal to the effective size (Ne) as estimated by Wellmann and Bennewitz (2011). But in a population with historic introgression it can be argued that the effective size is not useful to describe the history of a population because even a small amount of introgression with unrelated individuals prevents a drop of genetic diversity, so the effective size would be infinite.
For estimating the native effective size, the diversity at native alleles needs to be estimated at various times, and then the native effective size is estimated from the slope of a regression function.
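The idea of estimating an effective size from a regression slope can be sketched as follows. The code is not the estimator implemented in the package; it assumes the textbook model in which the diversity of an idealized population shrinks by the factor 1 - 1/(2Ne) per generation, so that the logarithm of the diversity is linear in the generation number. The diversity values are simulated, and true_Ne, gen, and Ne_hat are hypothetical names.

```r
# Simulated diversity at native alleles over 10 generations (assumed Ne of 50)
true_Ne   <- 50
gen       <- 0:10
diversity <- 0.95 * (1 - 1 / (2 * true_Ne))^gen

# log(diversity) = intercept + gen * log(1 - 1/(2*Ne)), so Ne follows from the slope
fit    <- lm(log(diversity) ~ gen)
slope  <- unname(coef(fit)["gen"])
Ne_hat <- 1 / (2 * (1 - exp(slope)))
Ne_hat
```

With real data, time is usually recorded in years, so the years would first have to be converted to generations by dividing by the generation interval.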
An estimate of the native effective size is automatically provided when the kinship at native alleles is computed and parameter is specified:
## Number of Migrant Founders: 237
## Number of Native Founders: 150
## Individuals in Pedigree : 1658
## Native effective size : 49.5
The native Ne is stored as an attribute of the result:
## [1] 49.5
Thus, the diversity at native alleles decreases as fast as in an idealized population consisting of 49.5 individuals.
The effective size Ne of a population is the size of an idealized random mating population for which the genetic diversity decreases as fast as in the population under study. It is commonly estimated from the mean rate of increase in coancestry (Cervantes et al. 2011), whereby the increase in coancestry between any pair of individuals \(i\) and \(j\) is computed as \(\Delta c_{ij}=1-\sqrt[\frac{g_i+g_j}{2}]{1-c_{ij}}\), where \(c_{ij}\) is the kinship between \(i\) and \(j\), and \(g_i,g_j\) are the numbers of equivalent complete generations of individuals \(i\) and \(j\). The effective size is then estimated as \(N_e=\frac{1}{2\overline{\Delta c}}\). Thus, the effective size can be estimated as:
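# Selection candidates with at least 4 equivalent complete generations
id <- Summary$Indiv[Summary$equiGen>=4 & Summary$Indiv %in% Phen$Indiv]
g <- Summary[id, "equiGen"]
N <- length(g)
# n[i,j] = (g_i + g_j)/2, the mean number of equivalent complete generations of the pair
n <- (matrix(g, N, N, byrow=TRUE) + matrix(g, N, N, byrow=FALSE))/2
# Increase in coancestry for each pair, and the resulting effective size
deltaC <- 1 - (1-pKin[id,id])^(1/n)
Ne <- 1/(2*mean(deltaC))
Ne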
## [1] 97.95922
However, the above formula assumes that ancestors are missing at random. In populations with historic introgression, pedigrees of individuals from foreign breeds are often cut, so their parents are not missing at random. In fact, even a small amount of introgression suffices to prevent a decrease of genetic diversity, so that the effective size of such a population is \(\infty\).
The native genome equivalent NGE of a population is the minimum number of founders that would be needed to create a population consisting of unrelated individuals that has the same diversity at native alleles as the population under study (Wellmann, Hartwig, and Bennewitz 2012). It is assumed that the individuals were unrelated in base year base, which could be well before pedigree recording had started. Between the base year and the year lastNative in which the last native founder was born, the population had a historical effective size of histNe. This assumption implies that the founders of the pedigree were related due to ancestors whose pedigrees had not been recorded.
The decrease of native genome equivalents is estimated below for the time since lastNative=1970 by assuming that individuals were unrelated in year base=1800 and that the historical effective size was histNe=150 between 1800 and 1970. Since the computation for the full pedigree could take some time, the parameters are estimated below from a subset of the pedigree, which are the individuals included in vector keep. Vector keep contains 50 randomly sampled Hinterwald cattle from each birth cohort.
data("PedigWithErrors")
Pedig <- prePed(PedigWithErrors, thisBreed="Hinterwaelder", lastNative=1970)
set.seed(0)
keep <- sampleIndiv(Pedig[Pedig$Breed=="Hinterwaelder",], from="Born", each=50)
cand <- candes(phen = Pedig[keep,],
pKin = pedIBD(Pedig, keep.only=keep),
pKinatN= pedIBDatN(Pedig, thisBreed="Hinterwaelder", keep.only=keep),
quiet=TRUE, reduce.data=FALSE)
##
## t I pKin pKinatN Ne NGE
## 1970 1970 5.46 0.004 0.005 55.4 5.06
## 1971 1971 5.44 0.004 0.006 55.4 5.01
## 1972 1972 5.42 0.005 0.007 55.4 4.95
## 1973 1973 5.39 0.006 0.009 55.3 4.88
## 1974 1974 5.37 0.006 0.011 55.1 4.79
## 1975 1975 5.35 0.007 0.014 55.0 4.66
## 1976 1976 5.33 0.008 0.015 54.7 4.61
## 1977 1977 5.31 0.009 0.017 54.0 4.54
## 1978 1978 5.29 0.010 0.018 52.7 4.52
## 1979 1979 5.27 0.010 0.018 50.7 4.49
## 1980 1980 5.25 0.011 0.020 48.2 4.44
## 1981 1981 5.23 0.011 0.021 45.5 4.40
## 1982 1982 5.22 0.012 0.023 42.9 4.33
## 1983 1983 5.20 0.012 0.024 40.7 4.29
## 1984 1984 5.18 0.013 0.028 38.9 4.18
## 1985 1985 5.17 0.014 0.030 37.8 4.09
## 1986 1986 5.16 0.015 0.033 37.5 4.01
## 1987 1987 5.15 0.017 0.037 38.0 3.90
## 1988 1988 5.14 0.018 0.042 39.5 3.78
## 1989 1989 5.14 0.017 0.044 42.1 3.71
## 1990 1990 5.14 0.018 0.048 46.2 3.62
## 1991 1991 5.14 0.018 0.049 51.8 3.59
## 1992 1992 5.14 0.017 0.049 59.2 3.58
## 1993 1993 5.15 0.018 0.052 68.7 3.54
## 1994 1994 5.16 0.019 0.052 80.3 3.53
## 1995 1995 5.17 0.017 0.051 93.4 3.56
## 1996 1996 5.18 0.017 0.051 106.6 3.56
## 1997 1997 5.20 0.016 0.050 117.7 3.57
## 1998 1998 5.22 0.014 0.051 124.8 3.55
## 1999 1999 5.24 0.013 0.052 127.8 3.53
## 2000 2000 5.26 0.012 0.054 127.3 3.49
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
Not only the native genome equivalents (column NGE), but also the native effective size (column Ne), the mean kinship (column pKin), the mean kinship at native alleles (column pKinatN), and the generation interval (column I) have been estimated for all birth cohorts t.
It is possible that the diversity at native alleles increases for a short period of time, in which case the estimate of the native effective size would be NA. Choose df<4 to get a smooth estimate for the native effective size.
For monitoring the introgression from other breeds, the contributions of foreign breeds to all birth cohorts can be estimated with function conttac and then plotted, e.g. with function ggplot.