Goodness of fit Tests

Arnau Garcia

2024-10-25

Goodness of fit tests for complete and right-censored data

Let \(T\) denote the time until the occurrence of an event of interest, whose distribution function we denote by \(F\). The GofCens package offers various goodness-of-fit techniques to assess whether a univariate sample from \(T\), either complete or right-censored, i.e., the observed times are smaller than the actual times of interest, comes from a specified (continuous) distribution \(F_0(t;\theta)\), where \(\theta\) represents a vector of unknown parameters. Formally, the null hypothesis goodness-of-fit test is \(H_0: F(t)=F_0(t;\theta)\), \(\theta\) representing a vector of unknown parameters. Specifically, the GofCens package provides implementations of well-known tests such as the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests based on the empirical distribution function for complete data and their extensions for right-censored data. Additionally, GofCens includes a chi-squared-type test based on the squared differences between observed and expected counts using random cells, with an extension tailored for right-censored data.

We will illustrate how the functions of the package works using the following simulated survival times. We generate \(300\) survival times from a log-normal distribution with location parameter \(\mu = 2\) and scale parameter \(\beta=1\), i.e. \(T\sim LN(2,1)\), and \(300\) censoring times from an exponential distribution with scale parameter \(\beta=20\), i.e. \(C\sim Exp(20)\):

set.seed(123)
survt <- round(rlnorm(300, 2, 1), 2)
censt <- round(rexp(300, 1 / 20), 2)

The observed right-censored survival times, \(Y = min(T, C)\), and the corresponding event indicators, \(\delta = \boldsymbol{1}\{T \leq C\}\), are created as follows:

times <- pmin(survt, censt)
delta <- as.numeric(survt <= censt)

In total, \(106\) (\(35.3\%\)) of the survival times of the generated sample are right-censored:

table(delta)
#> delta
#>   0   1 
#> 106 194

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test adapted to right-censored data is implemented in the function KScens() of the GofCens package. This function provides the observed Kolmogorov-Smirnov statistic adapted to right-censored data and the p-value that can be computed either via the theoretical approximation or via bootstrap methods.

We run the KScens() function to assess the goodness of fit of the log-normal and the Weibull distributions. The p-value obtained with the log-normal distribution is, as expected, quite large (\(0.578\)), whereas the low p-value (\(0.041\)) in the second example provides large evidence against the Weibull distribution.

KScens(times, delta, distr = "lognormal")
#> Distribution: lognormal 
#> 
#> KS Test results:
#>       A p-value 
#>   0.781   0.116 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 1.988 (0.058)     0.883 (0.045)     
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

Note that in this second example, with apply the internal function print.KScens(), which allows us to change the default list output to a table format (outp = "table").

set.seed(123)
print(KScens(times, delta, distr = "weibull"), outp = "table")
#> Distribution: weibull 
#> 
#> KS Test results:
#> ------- | -------
#> Metric  | Value  
#> ------- | -------
#> A       | 1.391  
#> p-value | 0.001  
#> ------- | -------
#> 
#> Parameter estimates:
#> --------- | --------- | ---------
#> Parameter | Value     | s.e.     
#> --------- | --------- | ---------
#> shape     | 1.273     | 0.067    
#> scale     | 10.98     | 0.621    
#> --------- | --------- | ---------
#> 
#> AIC: 1304.046 
#> BIC: 1311.453

By default the KScens() function computes the p-value via bootstrap methods, nonetheless if the argument boot=FALSE is used, the computation of the p-value is done using the theoretical approximation. Let us see an example:

KScens(times, delta, distr = "lognormal", boot = FALSE)
#> Distribution: lognormal 
#> 
#> KS Test results:
#>       A p-value   F(ym)      ym 
#>   0.781   0.576   0.972  45.570 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 1.988 (0.058)     0.883 (0.045)     
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

Cramér-von Mises test

The function CvMcens() of the GofCens package provides both the observed Cramér-von Mises statistic adapted to right-censored data and the p-value obtained via bootstrap methods.

We run the CvMcens() function to assess the goodness of fit of the log-normal and the Weibull distributions, as before. Hence, the null hypotheses in the illustrations remain the same as before, as do the conclusions: we would conclude that the data may come from a log-normal distribution, but not from a Weibull distribution.

set.seed(123)
CvMcens(times, delta, distr = "lognormal")
#> Distribution: lognormal 
#> 
#> CvM Test results:
#>     CvM p-value 
#>   0.054   0.450 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 1.988 (0.058)     0.883 (0.045)     
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

And for the Weibull distribution:

set.seed(123)
CvMcens(times, delta, distr = "weibull")
#> Distribution: weibull 
#> 
#> CvM Test results:
#>     CvM p-value 
#>   0.376   0.001 
#> 
#> Parameter estimates (se):
#> shape             scale             
#> 1.273 (0.067)     10.98 (0.621)     
#> 
#> AIC: 1304.046 
#> BIC: 1311.453

Anderson-Darling

The function ADcens() of the GofCens package provides both the observed Anderson-Darling statistic adapted to right-censored data and the p-value obtained via bootstrap methods.

For example, we can test the null hypothesis that the data come from a log-normal distribution with location and scale parameters equal to \(\mu=2\) and \(\beta=1.5\), respectively. In this case, as shown below, the Anderson-Darling test clearly rejects the null hypothesis (\(p = 0.001\)). Note that the output now displays both the parameters of the null hypothesis and the estimated parameters

set.seed(123)
print(ADcens(times, delta, distr = "lognormal", 
             params0 = list(location = 2, scale = 1.5)), outp = "table")
#> Distribution: lognormal 
#> 
#> Null hypothesis:
#> --------- | ---------
#> Parameter | Value    
#> --------- | ---------
#> location  | 2        
#> scale     | 1.5      
#> --------- | ---------
#> 
#> AD Test results:
#> ------- | -------
#> Metric  | Value  
#> ------- | -------
#> AD      | 9.536  
#> p-value | 0.001  
#> ------- | -------
#> 
#> Parameter estimates:
#> --------- | --------- | ---------
#> Parameter | Value     | s.e.     
#> --------- | --------- | ---------
#> location  | 1.988     | 0.058    
#> scale     | 0.883     | 0.045    
#> --------- | --------- | ---------
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

Gofcens function

There is a function implemented computing the three p-values for the three previous tests simultaneously by means of bootstrapping methods. The function gofcens() computes the test statistics of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests adapted to right-censored data and returns the corresponding p values computed via bootstrap methods.

Let us expose a couple of examples with the gofcens() function:

set.seed(123)
gofcens(times, delta, distr = "lognormal")
#> Distribution: lognormal 
#> 
#> Test statistics
#>    KS   CvM    AD 
#> 0.781 0.054 0.500 
#> 
#> p-values
#>    KS   CvM    AD 
#> 0.115 0.450 0.520 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 1.988 (0.058)     0.883 (0.045)     
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

Now we use the function to assess whether the Weibull distribution fits the data, and we print the results in the table format.

set.seed(123)
print(gofcens(times, delta, distr = "weibull"), outp = "table")
#> Distribution: weibull 
#> 
#> Tests results:
#> ---------------- | ---------------- | ----------------
#> Test             | Statistics value | p-value         
#> ---------------- | ---------------- | ----------------
#> KS               | 1.391            | 0.001           
#> CvM              | 0.376            | 0.001           
#> AD               | 2.828            | 0.001           
#> ---------------- | ---------------- | ----------------
#> 
#> Parameter estimates:
#> --------- | --------- | ---------
#> Parameter | Value     | s.e.     
#> --------- | --------- | ---------
#> shape     | 1.273     | 0.067    
#> scale     | 10.98     | 0.621    
#> --------- | --------- | ---------
#> 
#> AIC: 1304.046 
#> BIC: 1311.453

Chi-squared type test

Chi-squared type tests are also implemented, the function chisqcens() of the GofCens package uses bootstrap techniques to compute the p-value.

In this function two random cell numbers are provided: the number chosen by the user (Original) and the final number (Final), which might be smaller than the previous one because of right-censored data. For example, in the following illustrations with the data of the right-censored sample from the log-normal distribution, we choose \(M = 8\) random cells, but as shown in the output, the number is reduced to \(M = 7\).

set.seed(123)
chisqcens(times, delta, M = 8, distr = "lognormal")
#> Distribution: lognormal 
#> 
#> Chi-squared Test results:
#> Statistic   p-value 
#>     8.657     0.307 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 1.988 (0.058)     0.883 (0.045)     
#> 
#> Cell numbers:
#> Original    Final 
#>        8        7 
#> 
#> AIC: 1266.899 
#> BIC: 1274.307

Now we use the function for the Weibull distribution and we print it in the table format.

set.seed(123)
print(chisqcens(times, delta, M = 8, distr = "weibull"), outp = "table")
#> Distribution: weibull 
#> 
#> Chi-squared Test results:
#> --------- | ---------
#> Metric    | Value    
#> --------- | ---------
#> Statistic | 24.297   
#> p-value   | 0.021    
#> --------- | ---------
#> 
#> Parameter estimates:
#> --------- | --------- | ---------
#> Parameter | Value     | s.e.     
#> --------- | --------- | ---------
#> shape     | 1.273     | 0.067    
#> scale     | 10.98     | 0.621    
#> --------- | --------- | ---------
#> 
#> Cell numbers:
#> Original  | Final    
#> --------- | ---------
#> 8         | 7        
#> --------- | ---------
#> 
#> AIC: 1304.046 
#> BIC: 1311.453

Again, based on the outputs, we would choose the log-normal distribution instead of the Weibull distribution, because its value of the test statistic is clearly smaller and the p-value is far larger.

Real data example: Survival times of retired NBA players

In this section, we apply the above functions of the GofCens package to determine which parametric model fits best to the survival times of former NBA players.

The data frame nba comes with the GofCens package and contains the survival times (variable survtime) of all \(3962\) former players of the of the National Basketball Association (NBA) until July 2019. Survival times are measured as the elapsed time (in years) from the end of the NBA career until either death (cens == 1) or July 31, 2019 (cens == 1). By this date, \(864\) (\(21.8\%\)) of the former players had died with uncensored post-NBA survival times ranging from a few days until nearly 70 years.

We apply the gofcens() function to the logistic and normal distributions in order to see the results of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling tests. To reduce the computation times, which would otherwise be very long, we first select a random sample of survival times of size \(n=500\).

data("nba")
set.seed(123)
nbasamp <- nba[sample(nrow(nba), 500), ]
set.seed(123)
gofcens(Surv(survtime, cens) ~ 1, nbasamp, distr = "logistic")
#> Distribution: logistic 
#> 
#> Test statistics
#>    KS   CvM    AD 
#> 0.911 0.229 2.450 
#> 
#> p-values
#>    KS   CvM    AD 
#> 0.017 0.123 0.056 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 51.882 (1.204)     8.608 (0.579)     
#> 
#> AIC: 1104.212 
#> BIC: 1112.641

Now for the normal distribution:

set.seed(123)
gofcens(Surv(survtime, cens) ~ 1, nbasamp, distr = "normal")
#> Distribution: normal 
#> 
#> Test statistics
#>    KS   CvM    AD 
#> 1.207 0.432 3.787 
#> 
#> p-values
#>    KS   CvM    AD 
#> 0.001 0.046 0.027 
#> 
#> Parameter estimates (se):
#> location          scale             
#> 51.704 (1.359)     16.187 (0.969)     
#> 
#> AIC: 1111.097 
#> BIC: 1119.526

The test statistics of all three tests are smaller in the case of the logistic distribution and the corresponding p-values are larger. Thus, we would select the logistic distribution over the normal distribution.