Let us set up the notation first. Suppose there exists a partition of a region \({\rm D} \subset {\cal R}^2\) (e.g., a city). This partition is denoted by \(A_i\), \(i = 1, \ldots, n\). Moreover, there exists another partition of the same city, denoted \(B_j\), \(j = 1, \ldots, m\). These partitions can be seen as two different administrative divisions within the same city. It is common for different government agencies to release data for different divisions of the same city, country, or state.
Assume we observe a random variable \(Y(\cdot)\) over each region \(A_i\) and we are interested in predicting/estimating this variable over each of the regions \(B_j\). Now suppose the random variable \(Y(\cdot)\) varies continuously over \({\rm D}\) and is defined as follows \[ Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \, \mathbf{s} \in {\rm D} \subset {\cal R}^2, \] where \[ S(\cdot) \sim {\rm GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)) \; \text{ and } \; \varepsilon(\cdot) \overset{{\rm i.i.d.}}{\sim} {\rm N}(0, \tau), \] with \(S\) and \(\varepsilon\) independent of each other. For now, let us make the unrealistic assumption that all these parameters are known. Then, our assumption is that the observed data are as follows \[\begin{align*} Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, {\rm d} \mathbf{s} \\ & = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) + \varepsilon(\mathbf{s})] \, {\rm d} \mathbf{s} \\ & = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) \, {\rm d} \mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) \, {\rm d} \mathbf{s}, \end{align*}\] where \(\lvert \cdot \rvert\) returns the area of a polygon. Furthermore, using Fubini's theorem and some algebraic manipulation, it can be shown that \[ {\rm Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}' + \mathbf{I}(i = j) \frac{\tau}{\lvert A_i \rvert}, \] where \(\rho(\cdot ; \, \phi, \kappa)\) is a positive definite correlation function. Now, let \({\rm R}_{\kappa}(\phi)\) be a correlation matrix such that \[ {\rm R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, {\rm d} \mathbf{s} \, {\rm d} \mathbf{s}'; \] thus, \[ Y(A_1, \ldots, A_n) \sim {\rm N}( \mu \mathbf{1}_n, \sigma^2 {\rm R}_{\kappa}(\phi) + \tau \, {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})). \] Then, if we assume \((Y^{\top}(A_1, \ldots, A_n), Y^{\top}(B_1, \ldots, B_m))^{\top}\) to be jointly normal, we can use the conditional mean of \(Y(B_1, \ldots, B_m)\) given \(Y(A_1, \ldots, A_n)\) to predict the observed random variable over the partition \(B_1, \ldots, B_m\).
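The double integrals defining \({\rm R}_{\kappa}(\phi)\) rarely admit closed forms, so in practice they are approximated numerically, e.g., by averaging the correlation function over points sampled uniformly within each polygon. Below is a minimal Python sketch of this Monte Carlo approximation and of the conditional-mean predictor; it assumes an exponential correlation \(\rho(d; \, \phi) = \exp(-d/\phi)\) for concreteness, and all function names are illustrative rather than taken from any particular package.

```python
import numpy as np

def block_corr_matrix(samples, phi):
    """Monte Carlo approximation of R_kappa(phi)_{ij}: the average of
    rho(||s - s'||; phi) over points sampled uniformly from polygons
    i and j. `samples` is a list of (k x 2) arrays of such points;
    rho is taken to be exponential, exp(-d / phi), as an example."""
    n = len(samples)
    R = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            # pairwise distances between the two point clouds
            d = np.linalg.norm(samples[i][:, None, :] - samples[j][None, :, :],
                               axis=-1)
            R[i, j] = R[j, i] = np.exp(-d / phi).mean()
    return R

def conditional_mean(y_A, mu, Sigma_AA, Sigma_BA):
    """E[Y(B) | Y(A)] for jointly Gaussian blocks:
    mu + Sigma_BA Sigma_AA^{-1} (y_A - mu)."""
    return mu + Sigma_BA @ np.linalg.solve(Sigma_AA, y_A - mu)
```

Here \(\Sigma_{AA} = \sigma^2 {\rm R}_{\kappa}(\phi) + \tau \, {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})\), and \(\Sigma_{BA}\) collects the cross-covariances between the \(B\) and \(A\) block averages, computed with the same Monte Carlo scheme.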
Now, suppose the parameters \(\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)\) are unknown. The likelihood of \(Y(A_1, \ldots, A_n)\) can still be computed.
In particular, if we use the parametrization \(\nu = \tau / \sigma^2\), we have closed-form expressions for the maximum likelihood estimators of both \(\mu\) and \(\sigma^2\). Thus, we can numerically optimize the profile likelihood with respect to \(\phi\) and \(\nu\). Then, we rely on the properties of the conditional normal distribution again to compute predictions over a new set of regions.
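To make the profiling step concrete: writing \(V(\phi, \nu) = {\rm R}_{\kappa}(\phi) + \nu \, {\rm diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})\), the closed-form estimators are \(\hat\mu = (\mathbf{1}^{\top} V^{-1} \mathbf{y}) / (\mathbf{1}^{\top} V^{-1} \mathbf{1})\) and \(\hat\sigma^2 = (\mathbf{y} - \hat\mu \mathbf{1})^{\top} V^{-1} (\mathbf{y} - \hat\mu \mathbf{1}) / n\). A hedged Python sketch of the resulting profile likelihood follows; the helper `block_corr` (mapping \(\phi\) to \({\rm R}_{\kappa}(\phi)\), e.g., the Monte Carlo version above) is an assumption of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def neg_profile_loglik(log_params, y, block_corr, inv_areas):
    """Negative profile log-likelihood in (log phi, log nu);
    mu and sigma^2 are profiled out in closed form."""
    phi, nu = np.exp(log_params)          # optimize on the log scale
    n = len(y)
    V = block_corr(phi) + nu * np.diag(inv_areas)
    L = np.linalg.cholesky(V)
    solve_V = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    mu_hat = solve_V(y).sum() / solve_V(np.ones(n)).sum()   # closed-form MLE of mu
    r = y - mu_hat
    sigma2_hat = r @ solve_V(r) / n                         # closed-form MLE of sigma^2
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * (n * np.log(sigma2_hat) + log_det + n)

# numerical optimization over (phi, nu), e.g.:
# res = minimize(neg_profile_loglik, x0=np.log([1.0, 0.1]),
#                args=(y, block_corr, inv_areas), method="Nelder-Mead")
```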
Areal interpolation is a nonparametric approach that interpolates the \(Y(A_j)\)'s to construct the \(Y(B_i)\)'s. Define an \(m \times n\) matrix \(\mathbf{W} = \{ w_{ij} \}\), where \(w_{ij}\) is the weight associated with the polygon \(A_j\) in constructing \(Y(B_i)\). The weights are \(w_{ij} = \lvert A_j \cap B_i \rvert / \lvert B_i \rvert\) (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation \(\hat Y(B_1, \ldots, B_m)\) is constructed as \[\begin{equation} \label{eq:np-est} \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). \end{equation}\] The expectation and variance of the predictor are, respectively, \[ {\rm E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} {\rm E}[Y(A_1, \ldots, A_n)] \] and \[\begin{equation} \label{eq:np-matcov} \textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}. \end{equation}\] In practice, the covariance matrix \(\textrm{Var}[Y(A_1, \ldots, A_n)]\) is unknown and, consequently, needs to be estimated.
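A minimal sketch of the weight computation in Python, assuming the polygons are available as `shapely` geometries (the representation is an assumption; any library exposing intersection areas would do):

```python
import numpy as np
from shapely.geometry import Polygon

def interpolation_weights(A_polys, B_polys):
    """m x n weight matrix with w_ij = |A_j ∩ B_i| / |B_i|:
    the fraction of target polygon B_i covered by source polygon A_j."""
    W = np.zeros((len(B_polys), len(A_polys)))
    for i, B in enumerate(B_polys):
        for j, A in enumerate(A_polys):
            W[i, j] = A.intersection(B).area / B.area
    return W

# areal interpolation: y_B_hat = interpolation_weights(A_polys, B_polys) @ y_A
```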
The variance of each predictor, \(\text{Var}[\hat Y(B_i)]\), is needed as an uncertainty measure. It relies on both the variances of the \(Y(A_j)\)'s and their covariances: \[\begin{align} \label{eq:np-single-var} \textrm{Var}[\hat{Y}(B_i)] = \sum_{j = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_j) \right ] + 2 \sum_{j < l} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_j), Y(A_l) \right]. \end{align}\] The variances are often reported in survey data, but the covariances are not. For practical purposes, we propose an approximation for \(\textrm{Cov}[ Y(A_j), Y(A_l)]\) based on Moran's I, a global measure of spatial autocorrelation. Specifically, let \(\rho_I\) be the Moran's I calculated with a weight matrix constructed from first-degree neighbors. That is, \(\rho_I\) is the average of the pairwise correlations over all neighboring pairs. For regions \(A_j\) and \(A_l\) that are neighbors of each other, our approximation is \[\begin{align} \label{eq:cova} \textrm{Cov} \left[ Y(A_j), Y(A_l) \right] = \rho_I \sqrt{\text{Var}[Y(A_j)] \, \text{Var}[Y(A_l)]}. \end{align}\] The covariance between non-neighboring \(A_j\) and \(A_l\) is set to zero; consequently, the final approximation to \(\textrm{Var}[\hat{Y}(B_i)]\) will be an underestimate. Alternatively, we can derive an upper bound for the variance of the estimates through a simple application of the Cauchy–Schwarz inequality, in which case \(\rho_I\) is replaced with 1.
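This approximation translates directly into code. The sketch below (illustrative names, not from any package) builds the approximate covariance matrix of the \(Y(A_j)\)'s from the reported variances, Moran's I, and a first-degree adjacency matrix, then applies \(\mathbf{W} \Sigma \mathbf{W}^{\top}\); passing `rho_I = 1` yields the Cauchy–Schwarz upper bound instead.

```python
import numpy as np

def predictor_variances(W, var_A, rho_I, neighbors):
    """Approximate Var[Y_hat(B_i)] for all target regions.
    `var_A` holds the reported variances of the Y(A_j)'s and
    `neighbors` is a boolean n x n first-degree adjacency matrix.
    Cov[Y(A_j), Y(A_l)] is approximated by rho_I * sd_j * sd_l for
    neighboring pairs and set to zero otherwise."""
    sd = np.sqrt(var_A)
    Sigma = np.where(neighbors, rho_I * np.outer(sd, sd), 0.0)
    np.fill_diagonal(Sigma, var_A)      # keep the observed variances
    return np.diag(W @ Sigma @ W.T)     # diagonal of W Sigma W^T

# Cauchy–Schwarz upper bound: predictor_variances(W, var_A, 1.0, neighbors)
```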