The CFAcoop package equips you with functions to analyze data from the clonogenic assay (also called ‘Colony Formation Assay’) in presence or absence of cellular cooperation. Thus, this package allows you to robustly extract clonogenic survival information for your cell lines under a given treatment.
This vignette is meant to enable you to process your data following the method first presented in Brix et al., Radiation Oncology, 2020: “The clonogenic assay: robustness of plating efficiency-based analysis is strongly compromised by cellular cooperation”. Therefore, the data presented in Figure 2 in Brix et al., which is provided within the package, is used to illustrate how to use the CFAcoop package. Further, it is shown, how the survival fractions of cooperative cell lines would look like, if cellular cooperation was ignored.
In order to avoid confusion, cellular cooperation should be defined. Understanding this concept comes easy, when starting from the opposite. Clonogenic growth of non-cooperative cells is independent of cell density. Thus, there is a constant relationship between cells seeded \(S\) and colonies counted \(C\): \[C = a \cdot S,\] where \(a\) corresponds just to the conventional plating efficiency (\(PE = \frac{C}{S}\)).
Now, cellular cooperation refers to the benefit cells can have from their surrounding neighbors which results in a non-constant relation of cells seeded and colonies counted (For details, see Brix et al., Radiation Oncology, 2020). The probability of clonogenic growth for a single cell increases with the number of surrounding cells to cooperate with. It has turned out that generalizing the equation above by a parameter \(b\) adequately models the colonies counted of cooperative and non-cooperative cell lines: \[C = a \cdot S^b.\] In this model, \(b = 1\) gives the non-cooperative case and \(b > 1\) corresponds to cooperative growth.
In short, a cell line is called cooperative, if \(b > 1\).
Conventionally, clonogenic survival at a given treatment \(x\) was determined as the ratio of colonies counted \(C_x\) and the cells seeded \(S_x\) scaled to the plating efficiency of a reference \(PE_0 = \frac{C_0}{S_0}\) \[SF'_x = \frac{\frac{C_x}{S_x}}{PE_0}.\]
The new method now shifts the focus of the survival fraction directly to the number of cells needed to be seeded under the two conditions (treated and untreated) in order to achieve an identical expectation of the number of colonies formed \(C\). Essentially, the new method does not focus on the number of colonies formed after growth in different cell densities, but on the number of seeded single cells with clonogenic potential before growing to identical colony numbers.
\[SF_x(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x}\right)\]
Obviously, for \(b_x = b_0 = 1\) the equivalence \(SF_x(C) \equiv SF'\) holds for all \(C\), and thus, the non-cooperative case is well covered by the new method. Importantly, the conventional determination of clonogenic survival is heavily compromised by cellular cooperation, if present.
The data as presented in Figure 2 in Brix et al. is included in the package in form of a `data.frame`
CFAdata. It can be loaded and summarized by:
library(CFAcoop)
data("CFAdata")
summary(CFAdata)
#> cell.line Cells seeded 0 Gy 1 Gy
#> T47D :54 Min. : 14.68 Min. : 0.00 Min. : 0.00
#> MDA-MB231:84 1st Qu.: 146.78 1st Qu.: 8.00 1st Qu.: 10.00
#> A549 :72 Median : 1000.00 Median : 35.50 Median : 35.00
#> HCC1806 :48 Mean : 3980.14 Mean : 79.55 Mean : 83.97
#> SKBR3 :80 3rd Qu.: 4641.59 3rd Qu.:114.00 3rd Qu.:125.50
#> SKLU1 :56 Max. :31622.78 Max. :535.00 Max. :510.00
#> BT20 :60 NA's :214 NA's :214
#> 2 Gy 4 Gy 6 Gy 8 Gy
#> Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
#> 1st Qu.: 10.00 1st Qu.: 7.50 1st Qu.: 4.00 1st Qu.: 1.00
#> Median : 35.00 Median : 29.00 Median : 19.00 Median : 7.00
#> Mean : 85.31 Mean : 70.01 Mean : 56.17 Mean : 30.21
#> 3rd Qu.:119.00 3rd Qu.:105.00 3rd Qu.: 73.25 3rd Qu.: 30.00
#> Max. :480.00 Max. :400.00 Max. :450.00 Max. :250.00
#> NA's :219 NA's :215 NA's :214 NA's :213
The shortcut to analyze data, is using the wrapper function `analyze_survival(RD, name, xtreat)`
where RD is a `data.frame`
or `matrix`
containing your numbers of seeded cells (first column) and numbers of colonies counted under the treatments (numeric argument, e.g. the dose applied `xtreat = c(0,1,2,4,6,8)`
).
The returned objects should be concatenated in a list-object and can be plotted by `plot_sf()`
.
data("CFAdata")
<- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "T47D")
data1 <- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data2 <- vector("list", 2)
SF 1]] <- analyze_survival(
SF[[RD = data1[, c("Cells seeded","0 Gy","1 Gy","2 Gy","4 Gy","6 Gy","8 Gy")],
name = as.character(data1[1,1]),
xtreat = c(0, 1, 2, 4, 6, 8),
C = 20)
2]] <- analyze_survival(
SF[[RD = data2[,-1],
name = as.character(data2[1,1]),
xtreat = c(0, 1, 2, 4, 6, 8))
plot_sf(SF = SF)
Raw data from single replicates is plotted as ‘\(+\)’-symbols. Corresponding regression lines are calculated using the mean values of replicates of identical numbers of cells seeded, which are plotted as dots. The color indicates the treatment (irradiation with 0 to 8 Gy) and links the numbers of colonies counted from the upper plot to the calculated clonogenic survival in the lower plot. Shaded areas indicate the span from \(C = 5\) to \(C = 100\), which is within the target region of good experimental practice. Dashed lines show regression lines with a slope of \(b=1\) (at \(log10(a)\) varying from 0 to 4) for orientation, so that any substantially non-linear relation (i.e. \(b \neq 1\)) between the number of colonies counted (\(C\)) and the number of cells seeded (\(S\)) can be spotted easily.
The dots in the treatment response curves correspond to the survival fractions at \(C = 20\) with error bars indicating the uncertainty of the estimated survival fractions in terms of its standard deviation. This uncertainty is calculated via First-Order-Taylor-Series-Approximation of \(SF_x(C)\).
All information used for plotting is contained in the objects returned by `analyze_survival`
.
A `data.frame`
with a summary of the estimated survival fractions can be generated by
<- export_sf(SF) summary_df
to export this `data.frame`
in a csv-File, execute: ` write.csv(x = summary_df,file = "CFAcoopResult.csv") `
The `data.frame`
includes the following columns
colnames(summary_df)
#> [1] "cell.line" "treatment" "ln.a" "sd.ln.a" "b"
#> [6] "sd.b" "r.ln.a.b" "SF" "sd.SF" "lb.SF"
#> [11] "ub.SF" "log10.SF" "sd.log10.SF" "lb.log10.SF" "ub.log10.SF"
head(summary_df)
#> cell.line treatment ln.a sd.ln.a b sd.b r.ln.a.b SF sd.SF lb.SF
#> 1 T47D 0 -1.040 0.108 1.024 0.020 -0.983 1.0000 NA NA
#> 2 T47D 1 -1.277 0.094 1.015 0.016 -0.986 0.7630 0.0340 0.6992
#> 3 T47D 2 -1.984 0.163 1.055 0.026 -0.987 0.4595 0.0251 0.4129
#> 4 T47D 4 -2.375 0.377 0.960 0.054 -0.990 0.1914 0.0189 0.1578
#> 5 T47D 6 -4.247 0.359 0.996 0.044 -0.993 0.0358 0.0023 0.0315
#> 6 T47D 8 -5.460 0.873 0.882 0.098 -0.994 0.0035 0.0005 0.0027
#> ub.SF log10.SF sd.log10.SF lb.log10.SF ub.log10.SF
#> 1 NA 0.0000 NA NA NA
#> 2 0.8325 -0.1175 0.0193 -0.1554 -0.0796
#> 3 0.5114 -0.3377 0.0237 -0.3842 -0.2913
#> 4 0.2322 -0.7180 0.0428 -0.8020 -0.6341
#> 5 0.0407 -1.4460 0.0285 -1.5018 -1.3902
#> 6 0.0047 -2.4526 0.0622 -2.5746 -2.3307
summary(summary_df)
#> cell.line treatment ln.a sd.ln.a
#> Length:12 Min. :0.0 Min. :-16.049 Min. :0.0940
#> Class :character 1st Qu.:1.0 1st Qu.:-12.333 1st Qu.:0.3100
#> Mode :character Median :3.0 Median : -7.246 Median :0.8135
#> Mean :3.5 Mean : -7.676 Mean :0.7391
#> 3rd Qu.:6.0 3rd Qu.: -2.277 3rd Qu.:1.0542
#> Max. :8.0 Max. : -1.040 Max. :1.7440
#>
#> b sd.b r.ln.a.b SF
#> Min. :0.882 Min. :0.01600 Min. :-0.9940 Min. :0.0035
#> 1st Qu.:1.010 1st Qu.:0.03950 1st Qu.:-0.9915 1st Qu.:0.1717
#> Median :1.407 Median :0.10450 Median :-0.9900 Median :0.4451
#> Mean :1.502 Mean :0.09942 Mean :-0.9891 Mean :0.4708
#> 3rd Qu.:2.069 3rd Qu.:0.14800 3rd Qu.:-0.9868 3rd Qu.:0.7641
#> Max. :2.155 Max. :0.21000 Max. :-0.9830 Max. :1.0000
#>
#> sd.SF lb.SF ub.SF log10.SF
#> Min. :0.00050 Min. :0.0027 Min. :0.0047 Min. :-2.4526
#> 1st Qu.:0.01792 1st Qu.:0.1016 1st Qu.:0.1727 1st Qu.:-0.7757
#> Median :0.02415 Median :0.2696 Median :0.3951 Median :-0.3517
#> Mean :0.03517 Mean :0.3030 Mean :0.4420 Mean :-0.6104
#> 3rd Qu.:0.04188 3rd Qu.:0.5000 3rd Qu.:0.7431 3rd Qu.:-0.1169
#> Max. :0.11330 Max. :0.6992 Max. :1.0249 Max. : 0.0000
#> NA's :2 NA's :2 NA's :2
#> sd.log10.SF lb.log10.SF ub.log10.SF
#> Min. :0.01930 Min. :-2.5746 Min. :-2.3307
#> 1st Qu.:0.03207 1st Qu.:-1.0118 1st Qu.:-0.7703
#> Median :0.04445 Median :-0.5906 Median :-0.4231
#> Mean :0.04452 Mean :-0.8198 Mean :-0.6452
#> 3rd Qu.:0.05860 3rd Qu.:-0.3034 3rd Qu.:-0.1361
#> Max. :0.06790 Max. :-0.1554 Max. : 0.0107
#> NA's :2 NA's :2 NA's :2
All information of this `data.frame`
is also accessible directly in the object returned by `analyze_survival`
. For instance, the information about the regression of the 0 Gy reference of the cell line BT20 or the survival fractions of the 5 treatments for T47D (at \(C = 20\)) can be recalled by:
2]]$fit[1]
SF[[#> [[1]]
#>
#> Call:
#> lm(formula = "lnC ~ 1 + lnS", data = x)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -0.60869 -0.30974 0.07812 0.28621 0.59499
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -9.0318 0.7538 -11.98 2.17e-06 ***
#> lnS 1.7586 0.1173 14.99 3.86e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.4088 on 8 degrees of freedom
#> Multiple R-squared: 0.9656, Adjusted R-squared: 0.9613
#> F-statistic: 224.8 on 1 and 8 DF, p-value: 3.865e-07
1]]$SF
SF[[#> [1] 0.76295935 0.45950014 0.19141889 0.03581058 0.00352676
Key to the robust analysis of clonogenic analysis data is the modeling of the cellular cooperation. We assume that the underlying functional dependency of seeded cells and counted colonies is of the form \[C = a \cdot S^{b},\] where \(b\) indicates the degree of cellular cooperation (\(b = 1\) is implicitly assumed for the PE-based approach).
The coefficient \(b\) is estimated in a linear regression model \[log(C) = log(a) + b \cdot log(S) + \varepsilon, \varepsilon \sim \mathcal{N}(0,\sigma^2). \]
The function `pwr_reg(seede, counted)`
provides this regression and returns a `summary.lm`
object.
Note that the analysis of cellular cooperation is restricted to the range of seeded cells, where at least one colony was observed. Outside this range, the attempt of studying clonogenic survival based on no observed colony counts is not reasonable and thus, `pwr_reg`
will remove those data points from analysis. Thus, it is strongly recommended to use the averaged data for regression. By doing so, the range of the independent variable of the regression is widened. (Removing only those replicates with no colonies at one or few specific cell densities would bias the model fitting.)
<- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
data <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`0 Gy`)
par_0 $coefficients
par_0#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -9.031773 0.7538225 -11.98130 2.169497e-06
#> lnS 1.758593 0.1172839 14.99433 3.864744e-07
plot(x = log10(data$`Cells seeded`), y = log10(data$`0 Gy`),xlim = c(2,3.5))
abline(a = log10(exp(1)) * par_0$coefficients[1, 1], b = par_0$coefficients[2, 1])
With the results of this function, we can also test for cellular cooperation. Note, that the p-value in the coefficients table corresponds to the null hypothesis \(b = 0\), but we are interested in the null hypothesis of \(b = 1\). Thus, we find our p-value of interest by computing
<-
p_value 1 - pt(
(q = (par_0$coefficients[2, 1] - 1) / par_0$coefficients[2, 2],
df = par_0$df[2]
))
Thus, BT20 is higly cooperative (\(b = 1.76\), \(\hat{\sigma}_b = 0.12\), \(p < 0.001\)).
In this package, the survival fraction \(SF(C)\) for clonogenic survival is calculated as the number of cells that need to be seeded without treatment divided by the number of cells needed to be seeded with treatment for obtaining the same expectation of colonies counted \(C\). \[ SF(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x} \right)\]
Given two parameter sets of clonogenic assay data, the clonogenic survival can be calculated as:
<- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
data <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`0 Gy`)
par_0 <- pwr_reg(seeded = data$`Cells seeded`, counted = data$`4 Gy`)
par_4 calculate_sf(par_ref = par_0, par_treat = par_4, C = 20)
#> 20
#> 0.4308404
Note that in case of cooperative cell lines, the parameter \(a\) does not correspond to a plating efficiency as for non-cooperative cell lines. The concept of a characteristic plating efficiency does not apply to cooperative cell lines.
The survival fraction at \(C\) for treatment \(x\) is calculated by the function \[SF_x(C) = \frac{S_0(C)}{S_x(C)} = exp\left( \frac{log\left(\frac{C}{a_0}\right)}{b_0} - \frac{log\left(\frac{C}{a_x}\right)}{b_x}\right).\]
Since the SF-values are solely dependent on the estimated parameters in the power regression (and the chosen \(C\)), the inherent uncertainty can be assessed via parametric bootstrapping (e.g. using the package mvtnorm to generate parameter sets according to the variance-covariance matrix of the fit), or by following the laws of error propagation (First-Order Taylor-Series Approximation).
We choose the analytic approximation. In order to build meaningful uncertainty intervals (i.e. respect that survival fractions will never be below zero), we work on the log-scale and transform the boundaries to the linear scale at the end.
For the sake of a shorter notation, we write: \[g = log(SF_x(C)) = \frac{d-\alpha_0}{b_0} - \frac{d-\alpha_x}{b_x} \]
According to \(\Sigma_g \approx J \Sigma_pJ^T\), where \(J\) denotes the Jacobian of \(g\) and \(\Sigma_p\) the variance-covariance matrix of the estimated parameters \(\alpha = log(a)\) and \(b\) at \(0\) and \(x\) Gy, we find \[\sigma_g^2 \approx \frac{1}{b_0^2} z_0 \Sigma_0 z_0^T + \frac{1}{b_x^2} z_x \Sigma_x z_x^T \] with \[ z_x = \left(\begin{matrix}1 & \frac{d-\alpha_x}{b_x}\end{matrix}\right) \] and \[\Sigma_x = \left(\begin{matrix} \sigma_{\alpha_x}^2 & \sigma_{\alpha_x b_x}\\ \sigma_{\alpha_x b_x} & \sigma_{b_x}^2 \end{matrix} \right).\]
In short: The plating efficiency frequently is not as constant as it needs to be in order to serve as an adequate normalization factor.
To illustrate this, we compare the PE-based calculated \(SF'\)-values with the \(SF(C = 20)\)-values calculated with the new method for (1) the non-cooperative cell line T47D and (2) the cooperative cell line (BT20).
For calculating the survival fraction, plating efficiencies are required. Plating efficiencies (\(C/S\)) are calculated easily as:
data(CFAdata)
<- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "T47D")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
data <- data$`4 Gy` / data$`Cells seeded`
PE_x <- data$`0 Gy` / data$`Cells seeded`
PE_0 plot(x = rep(c(0, 1), each = 18), y = c(PE_0, PE_x), lty = 0, ylim = c(0,0.5),xlim = c(-0.1,1.1),
xlab = "treatment", ylab = "C / S", axes = FALSE, main = "T47D")
axis(side = 1,at = c(0,1),labels = c("0 Gy","4 Gy"))
axis(side = 2,at = seq(0,0.5,0.1))
Now, which values are to be compared?
When there is no effect of the cell density, as assumed by the conventional approach (not taking cellular cooperation into account), each combination (of a \(C_{0 Gy}/S_{0 Gy}\)- and \(C_{4 Gy}/S_{4 Gy}\)-value) is equally reliable.
Thus, the full set of all combinations is:
<- rep(PE_x, each = length(PE_0)) / rep(PE_0, times = length(PE_x))
SF_resample hist(SF_resample, breaks = 25,xlim = c(0.12,0.25),xlab = "(C_4/S_4) / (C_0/S_0)",
main = "valid PE-based SF'-values")
Without the assessment of cellular cooperation, conventional calculation of an \(SF'\)-value, corresponds to picking randomly a sample from the distribution shown in the histogram above.
A comparison of the range of this distribution and the calculated uncertainties of the new method shows, that for non-cooperative cell lines such as T47D, there is no big difference in this variability/uncertainty.
range(SF_resample,na.rm = TRUE)
#> [1] 0.1322039 0.2383178
<- analyze_survival(RD = data[,-1],C = 20)
as_nc_0 $uncertainty[4,c(3,5,6)]
as_nc_0#> SF lb.SF ub.SF
#> 4 0.1914189 0.1577691 0.2322457
Now, making the same comparison as in (1) for the cooperative cell line BT20 shows the disastrous effect of ignoring the coefficient \(b\), when it is in fact different from \(1\).
data(CFAdata)
<- subset.data.frame(x = CFAdata, subset = CFAdata$cell.line == "BT20")
data <- aggregate(x = data[, -1], by = list(data$`Cells seeded`), FUN = "mean", na.rm = TRUE)
data <- data$`4 Gy` / data$`Cells seeded`
PE_x <- data$`0 Gy` / data$`Cells seeded`
PE_0 plot(x = rep(c(0, 1), each = length(PE_x)),
y = c(PE_0, PE_x), lty = 0, ylim = c(0,0.08),xlim = c(-0.1,1.1),
xlab = "treatment", ylab = "C / S", axes = FALSE, main = "BT20")
axis(side = 1,at = c(0,1),labels = c("0 Gy","4 Gy"))
axis(side = 2,at = seq(0,0.08,0.02),las = 1)
<- rep(PE_x, each = length(PE_0)) / rep(PE_0, times = length(PE_x))
SF_resample hist(SF_resample, breaks = 100,xlim = c(0,10),xlab = "(C_4/S_4) / (C_0/S_0)",
main = "valid PE-based SF'-values")
The range of the PE-based \(SF'\)-values does not correspond to the uncertainty of the \(SF(20)\)-values:
range(SF_resample,na.rm = TRUE)
#> [1] 0.02511682 8.53525274
<- analyze_survival(RD = data[,-1],C = 20)
as_c_4 $uncertainty[4,c(3,5,6)]
as_c_4#> SF lb.SF ub.SF
#> 4 0.4308404 0.3518497 0.5275645
Even though the survival fraction can be accurately estimated under consideration of cellular cooperation the PE-based approach fails in returning a trustworthy estimate of the fraction of cells losing their potential due to the treatment (see histogram).
In particular, the average of PE-based \(SF'\) calculations does not asymptotically tend to a meaningful value. In case of strong cellular cooperation, the PE-based calculated \(SF'\)-value is heavily affected by this cellular cooperation and the treatment effect of interest is degraded to a side effect.
Before calculating PE-based survival fractions, one must check whether there is cellular cooperation or not.
Essentially, to decide, whether you have a cooperativity issue or not, you need to conduct the same analysis and to generate the same data that is necessary to solve this issue anyways.