KriSp

An R Package for Covariance Tapered Kriging
of Large Datasets Using Sparse Matrix Techniques

Tutorial

Reinhard FURRER
Mathematical and Computer Sciences Department
Colorado School of Mines
Golden, CO, 80401
rfurrer@mines.edu
November 21, 2006

Introduction

Interpolation of a spatially correlated random process is used in many scientific areas. The best unbiased linear predictor (BLUP), often called kriging predictor in geostatistics, requires the solution of a linear system based on the (estimated) covariance matrix of the observations. Frequently, the most interesting spatial problems involve large datasets and their analysis overwhelms traditional implementations of spatial statistics. Furrer et al. (2006) show that tapering the correct covariance matrix with an appropriate compactly supported covariance function reduces the computational burden significantly and still results in an asymptotic optimal mean squared error. The effect of tapering is to create a sparse approximate linear system that can then be solved using sparse matrix algorithms. This package provides a suite of functions for the R statistical computing software (R, 2004; Ihaka and Gentleman, 1996) to perform interpolation of large or even massive datasets using covariance tapering (R is an open source implementation of the § language, Chambers, 1998; Chambers and Hastie, 1992).

Throughout this document, packages, programs and external functions are written in sans serif font. R input commands, R functions and their arguments are typed in slanted typewriter font, corresponding output, if any, in upright typewriter font.

The package KriSp (Kriging with Sparse matrices) is considered as an add-on to the package fields. KriSp has a similar class to one of fields and uses some of its methods. Further, KriSp uses the package SparseM to handle the sparse matrix techniques. Thus the package is not exhaustive in its functionality compared to other geostatistical packages like geoR. Also, most functions are not fully optimized in order to enhance readability of the code.

The reader should be familiar with R as well as with standard geostatistical terms and modeling (see for instance Cressie, 1993 or Stein, 1999, for a detailed discussion on that topic). Insight of the packages fields and SparseM is beneficial but not mandatory. This tutorial should give insight in how to use the different functions and methods of KriSp. Typically, default arguments are used for the function calls. The user is strongly encouraged to examine the function arguments using the the R functions args and help (the help files of major functions included in KriSp are given in the Appendix). The commands used in this tutorial are also available at http://www.mines.edu/~rfurrer/software/KriSp/KriSp.tutorial.R. Please send any comments concerning this document or the package to rfurrer@mines.edu.

Getting Started

Download the package KriSp_0.4.tar.gz to your local file system and install it using the following command at your Unix prompt

$ R CMD INSTALL -l   /path/to/library  KriSp_0.4.tar.gz

For Windows, the equivalent command to be executed at the DOS prompt is

$ Rcmd INSTALL -l   /path/to/library  KriSp_0.4.tar.gz

Alternatively, the package can also be installed within an R session (see the R manual for more details). Start by opening an R session (version $\geq$ 1.7) and load the package

> library("KriSp", lib.loc="/path/to/library")

The required libraries fields and SparseM are automatically loaded, provided they have been previously installed.

The main class of KriSp is sparse. In order use the methods defined for the Krig class in fields, the class sparse has the secondary class Krig. The main functions of the package KriSP are

> Krig.simple.sparse()
> Krig.sparse()
> predict()

The first function performs simple kriging, the second universal kriging. The third function is a method to perform predictions on a given set of locations. In order to simplify the coding, the simple and the universal kriging approach have been separated. Both approaches are illustrated in the subsequent sections. In Section 4, the tapering technique is further illustrated with two examples.

Included Spatial Models

Spatial Model with Zero Mean

To illustrate the capacity of sparse matrix techniques, we start with the simple spatial model

$\displaystyle Y(\x)=Z(\x)+\varepsilon(\x),$

where

is mean zero process with covariance function

and $\varepsilon$ is a white noise with variance $\sigma^2$ . In other words, we observe the process

at some locations, say $\x_1,\dots,\x_n$ , with a measurement error of variance $\sigma^2$ . Then the best linear unbiased prediction (BLUP) of

at an (unobserved) location $\x^*$ is then

$\displaystyle \hat Z(\x^*)= \c^*\T(\C+\sigma^2\I)^{-1}\Z ,$

(1)

where $\Z=\bigl(Z(\x_1),\dots,Z(\x_n)\bigr)\T$ , $\C_{ij}=K(\x_i,\x_j)$ , $\c_{i}^*=K(\x_i,\x^*)$ and $\I$ is the identity matrix. In geostatistical literature, (1) is referred to as simple kriging (e.g. Cressie, 1993). When predicting on many points, a fine regular grid, spatial field or lattice, the vector $\c^*$ in (1) is replaced by a matrix $\C^*$ containing as columns the respective vectors $\c^*$ for the different points on the grid. The functions Krig.simple.sparse and predict calulate the BLUP (1) for large datasets and large spatial fields.

**Figure 1:** Dataset `simple.data`: locations on the left, histogram of the observations on the right.
$\includegraphics[height=6cm]{figures/simdata}$ $\includegraphics[height=6cm]{figures/simhist}$

There are a few datasets included in the package distribution. To illustrate the simple kriging approach, we use the dataset simple.data. This artificial dataset consists of 100 locations randomly distributed in a unit longitude-latitude square. The spatial Gaussian process has an exponential covariance structure with a range of 10 miles and a sill of 1 and no measurement error. The data are loaded with

> data(simple)
> attach(simple.data)

Figure 1 shows the locations and a histogram of the values.

> look <- as.image( Y, x=x)
> image.plot( look)
> hist( Y)

Kriging in KriSp is performed by creating a sparse object with a call to a kriging function such as Krig.simple.sparse and a subsequent call to a prediction function like predict.

> obj <- Krig.simple.sparse(x, Y)

The variable obj contains among other quantities the vector $(\C+\sigma^2\I)^{-1}\Z$ and the predictions at the observed locations. As there is no measurement error, the predictions are identic to the observations. Prediction of the proces

at an unobserved location $\x^\ast=(-105.5,40.5)$ is obtained simply by

> pre <- predict(obj, x=cbind(-105.5,40.5))

For prediction at several points, we just specify the arguments x with the locations in a $m\times2$ matrix.

**Figure 2:** Predictions on a fine grid.
$\includegraphics[height=6cm]{figures/simpred}$

**Figure 3:** `plot` method for a `sparse` object.
$\includegraphics[height=6cm]{figures/simkrig}$

If we want to predict on a fine grid, we use fields function predict.surface (Figure 2).

> surf <- predict.surface( obj)
> image.plot( surf)

There exists a rudimentary plott and print methods for the sparse object (Figure 3).

> plot( obj)
> obj

Call:
  Krig.simple.sparse(x = x, Y = Y)

Covariance: expo.cov
Taper: spher.cov with range 10

Number of obs. = 100

The method summary is just a more extended version of print. Other methods for the sparse class include residuals, fitted and coef. Those are included, as simple and universal kriging can be considered as a linear model.

KriSp is not about parameter estimation, thus we use our a priori knowledge for the parameter specification in the kriging routines. Typically, the covariance parameters are different to the default values. The exponential covariance and its parameters range=10, sill=1 and nugget=0 are the default values for the argument cov.fun="expo.cov" and cov.fun.args respectively. We also used the default taper function and taper parameter. The previous Krig.simple.sparse is acutally identical to

> obj <- Krig.simple.sparse(x, Y, cov.fun = "expo.cov", 
+      cov.fun.args = list(range = 10, sill = 1, nugget = 0),
+      taper.fun = "spher.cov", taper.fun.args = list(range = 50))

Note that the variance of the measurement error is given by the nugget argument in cov.fun.args.

Too small values for the arguments tmpmax (working array for chol) or nnzmax (upper bound of nonzero elements in $\C$ ) result in a warning or core dump respectively. Too big values do not hurt, just consume some computing time (see alse Section 4.2).

A little clean up, prior to the universal kriging example.

> detach(simple.data)
> rm("obj","pre","look","surf")

Spatial Model with a Drift or Trend

In this section we discuss an example where we have a spatial process of the form

$\displaystyle U(\x)=\m(\x)\T\bbeta+Z(\x),$

(2)

where $\m$ is a known function in $\IR^p$ and $\bbeta$ is an unknown parameter in $\IR^p$ . Suppose we observe the process

locations $\x_1,\dots,\x_n$ with a mesurement error having variance $\sigma^2$ . Similar to equation (1), the BLUP of $U(\x^*)$ is given by

$\displaystyle \hat U(\x^*)$

$\displaystyle = \c\T(\C+\sigma^2\I)^{-1}(\Y-\bfM\hat\bbeta)+\m(\x_0)\T\hat\bbeta,$

(3)

where

$\displaystyle \hat\bbeta=(\bfM\T(\C+\sigma^2\I)^{-1}\bfM)^{-1}\bfM\T(\C+\sigma^2\I)^{-1}\Y$

(4)

with $\bfM=\bigl(\m(\x_1),\dots,\m(\x_n)\bigr)\T$ . In geostatistical literature, (3) is referred to as universal kriging. The sparse matrix approach is used with an iterative procedure illustrated as follows. We estimate the mean structure, i.e. the vector $\bbeta$ in (2), via ordinary least squares (OLS), then $\Y-\bfM\hat{\bbeta}{}^\ast$ is kriged yielding $\Z^\ast$ . With OLS on $\Y-\Z^\ast$ we obtain a second estimate $\hat{\bbeta}{}^\ast$ and so forth. This convenient back-fitting procedure converges to the BLUP and a few iterations usually suffice to obtain precise results. Note that (4) is the solution of the weighted least squares. If

is not too big, the BLUP could be also obtained by solving

linear systems as given by equation (3) using a sparse approach.

The function Krig.sparse loops over the regression and simple kriging steps until convergence. Consider the example dataset universal.data. The artificial data is similar to simple.data except that we have a linear trend (Figure 4).

> data(universal)
> attach(universal.data)
> look <- as.image( Y, x=x)
> image.plot( look)

We create a sparse object by calling

> obj <- Krig.sparse(x, Y, 
+           cov.fun.args=list(range=10,sill=.9,nugget=.1))

The error norm between the consecutive coefficients $\hat{\bbeta}{}^\ast$ are in out$coef

**Figure 4:** Dataset `universal.data`.
$\includegraphics[height=6cm]{figures/unidata}$

The universal kriging object obj contains more information than in the simple kriging case.

> summary( obj)

Call:
  Krig.sparse(x = x, Y = Y, 
    cov.fun.args = list(range = 10, sill = 0.9, nugget = 0.1))

Covariance: expo.cov
Taper: spher.cov with range 10

Number of obs. =100
nnz(sigma)     =820, (8.2%)
nnz(Chol sigma)=645, (6.45%)

Spatial trend:
  order m=2, betahat=(-0.5179,4.319,-166.8438)
  convergence with MSE=0.008 after 25 steps
  (criterion maxiter>=25, or MSE<0.001)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.168041 -0.045088  0.002130  0.047612  0.189869

Timing:
Covariance   Cholesky  Backsolve Iterations 
      0.07       0.05       0.00       0.07

The predicted surface is in out$fitted, which is the sum of the trend, out$trend, and the spatial field, out$spatial.

As in the simple kriging case, the predict method returns a vector, the prediction on some specified locations. If trend.only=TRUE, predict returns the mean structure of the field only (Figure 5).

> image.plot( predict.surface(obj))
> image.plot( predict.surface(obj, trend.only=TRUE))

**Figure 5:** Prediction with universal kriging (left) and fitted trend structure (right) for `universal.data`.
$\includegraphics[height=6cm]{figures/unikrig}$ $\includegraphics[height=6cm]{figures/unitrend}$

Finally, proper clean up.

> detach(universal.data)
> rm("obj","look","surf")

Illustration of The Tapering Technique

Comparisation with Other Interpolation Methods

In this section we use the universal dataset to compare interpolation results obtained from different approaches, namely the straightforward naive approach (i.e. using equations (3) and (4)), fields' Krig and KriSps Krig.sparse. Refer to the fields for a detailed discussion of the use of Krig function.

To get started, we load the data and and set the covariance parameters..

> data(universal)
> attach(universal.data)
> range <- 10
> nugget <- 0.1
> sill <- 0.9

Prediction is performed on a $50\times 50$ grid (depending on the available computing power, you might want to lower the grid size to $30\times30$ ).

> nx <- ny <- 50
> xgrid <- make.surface.grid( grid.list=list(lon='x',lat='y'),
+                            X=x, nx=nx, ny=ny)

To predict the surface using fields, we type

> out.fields <- Krig( x, Y, expo.earth.cov, 
+                    sigma2=nugget, rho=sill, lambda=nugget/sill)
> fields <- predict.surface( out.fields, nx=nx, ny=ny)

The sparse approach (with taper range 30) is

> out.sparse <- Krig.sparse( x, Y,cov.fun.args=list(range = range,
+          sill=sill, nugget=nugget),
+          taper.fun.args = list(range = 30),
+          scale.type ="range")
> sparse <- predict.surface( out.sparse, nx=nx, ny=ny)

We used the scale.type argument to transform the locations to obtain better numerical stability. The naive approach is somewhat longer to calculate.

> lagC <- rdist.earth(x)
> lagc <- rdist.earth(x, xgrid)
> C <- expo.cov( lagC, range=range, sill=sill, nugget=nugget)
> c <- expo.cov( lagc, range=range, sill=sill, nugget=nugget)
> 
> invC <- solve(C)
> M <- out.sparse$xM
> scaledxgrid <- scale(xgrid, center = out.sparse$x.center,
+                      scale = out.sparse$x.scale)
> 
> m0 <- t(cbind(scaledxgrid, 1))
> 
> betahat <- solve(t(M) %*% invC %*% M)  %*% t(M) %*% invC  %*% Y
> naive <- c( t(c) %*%  invC %*% (Y-M%*%betahat )+t(m0)%*%betahat )

The results and the differences in the predicted fields are obtained with the following commands. (Figures 6 and 7).

> fields.naive <- fields
> fields.naive$z <- fields$z - array( naive, c(nx,ny))
> fields.sparse <- fields
> fields.sparse$z <- fields$z - sparse$z
> sparse.naive <- fields
> sparse.naive$z <- sparse$z - array(naive, c(nx,ny)) 
> 
> image.plot( sparse)
> image.plot( fields)
> 
> image.plot( fields.naive)
> image.plot( fields.sparse)
> image.plot( sparse.naive)

**Figure 6:** Predicted ozone on a $50\times 50$ grid obtained with fields' `Krig` (left) and KriSps `Krig.sparse` (right).
$\includegraphics[height=6cm]{figures/fieldsfit}$ $\includegraphics[height=6cm]{figures/sparsefit}$

**Figure 7:** Differences in prediction between approaches fields and naive (left), between approaches fields and KriSp (right) and between approaches KriSp and naive (lower panel).
$\includegraphics[height=6cm]{figures/fnresid}$ $\includegraphics[height=6cm]{figures/fsresid}$ $\includegraphics[height=6cm]{figures/snresid}$

The differences seem nevertheless remarkably high. With higher taper ranges, the difference between approaches KriSp and naive can be considerably lowered. Note the difference in the fitted coefficients for the mean structure.

> c(betahat)

[1] -0.1607431  4.0057863 60.2022324

> out.sparse$coef

[1] -0.5928071  4.2595088 60.3194296

For this illustrative example we used a small dataset such that the computing performance of KriSp is not as impressive as it could be.

Large Data

**Figure 8:** Measurement locations of precipitation anomalies for April 1948.
$\includegraphics[height=8cm]{figures/precip}$

In this section we look at the large dataset (anomaly.data) to illustrate the capacity of KriSp. The data are anomalies of aggregated monthly precipitation for April 1948 at 11,918 stations in the US (for more details about the data refer to Johns et al., 2003 and Furrer et al., 2006). We assume that the data is second order stationary. To simplify the document, we also suppose that the underlying isotropic structure is a mixture of two exponential covariance fucntions with range parameters of 40 and 520 miles with respective sill of 0.28 and 0.72.

> data(anomaly)
> attach(anomaly.data)
> US()            
> points(x, pch=".", col=3)

According to Furrer et al. (2006) a tapering radius with 16 to 24 points within the support is a sufficiently precise approximation. With our observation density, a tapering range of 40 miles is sufficient (cf. Figure 8 and 10). We first create the covariance function, being a mixture of two exponential ones. Typical KriSp covariance functions take the arguments distance, range and eps. If some are not used, just include the ``dot'' construction ... in the function definition. The mixture covariance could be defined by

> exp.mix.cov <- function(distance, ...)
+  0.28 * exp( -abs(distance)/40) + 0.72 * exp( -abs(distance)/520)

The sparse object is created with the following arguments.

> timing <- numeric(3)
> dummy <- gc()           # clean up before the heavy work...
> obj <- Krig.simple.sparse(x, Y, cov.fun = "exp.mix.cov",
+                    taper.fun.args = list(range=40), covfun = T,
+                    nnzmax = 320000, tmpmax = 12000)
> timing[1] <- (proc.time()-prtm)[1]
> summary( obj)

Call:
  Krig.simple.sparse(x = x, Y = Y, cov.fun = "exp.mix.cov", 
    taper.fun.args = list(range = 40), 
    nnzmax = 320000, tmpmax = 15000, covfun = T)

Covariance: exp.mix.cov
Taper: spher.cov with range 40

Number of obs. =11918
nnz(sigma)     =317832, (0.2238%)
nnz(Chol sigma)=720234, (0.5071%)


Residuals:
       Min         1Q     Median         3Q        Max 
-2.146e-04 -2.220e-16  0.000e+00  2.220e-16  5.786e-05

Timing:
Covariance   Cholesky  Backsolve    Fitting 
     21.38       1.50       0.03      21.06

Note that we had to increase the values of nnzmax and tmpmax to perform the kriging.

Be aware that the next step might be very time consuming. We perform a prediction and visualisation step (Figure 9).

> prtm <- proc.time()
> lon=seq( -126, -66, length=500)
> lat=seq( 24, 50, length=500)
> pred.surf <- predict.surface(obj,
+                grid.list = list(lon=lon, lat=lat), extrap=T)
> timing[2] <- (proc.time()-prtm)[1]
> 
> prtm <- proc.time()
> image.plot(pred.surf, xaxt="n", yaxt="n")
> US(add=T)
> timing[3] <- (proc.time()-prtm)[1]
> timing   # in seconds

[1]  43.99 308.37   0.98

(The calculations were performed on a Linux powered Xeon processor with 2Gbytes RAM.)

Even if extrap or chull.mask is altered, the computation time is not reduced as fields' predict.surface calculates the values for the entire grid and sets the values not falling in the chull.mask region to NA afterwards.

For readers that are familiar with the storage structure of sparse matrices, the following lines create Figure 10.

> ia <- slot( obj$sigma, "ia")
> n.tap <- diff( ia) - 1
> hist( n.tap)

**Figure 9:** Predicted precipitation anomaly for April 1948 ( $500\times 350$ grid).
$\includegraphics[height=8cm]{figures/precipfit}$

**Figure 10:** Histogram of the number of observations within the taper range. The median is 22 observations and the mean is slightly over 26. One location had just one, twelve locations had two observations within the range.
$\includegraphics[height=6cm]{figures/preciphist}$

As the we have 11.913 observations, the full covariance matrix takes more than 1136 Mbytes compared to 3.86 Mbytes, if we have ``typical'' precision with 8-byte reals and 4-byte integers.

Computational Issues

The function Krig.simple.sparse consists basically in calculating the covariance matrix $\C$ with a Fortran subroutine, using the Cholesky factorization of SparseM and in performing a backsolve operation with the observations. The function Krig.sparse uses a backfitting approach that loops over a regression and a kriging step. The regression step, a simple ordinary least squares estimation, does not use sparse matrix techniques. The kriging step is essentially identical to a call to Krig.simple.sparse.

The method predict is essentially a wrapper to a Fortran function, which loops over all points at which to predict. For each point it calculates $\c^*\T\v$ , where $\v=\C^{-1}\Z$ given by a sparse object. There are several ways to optimize this costly procedure:

use a fast Fourier transform (FFT) approach similar to the function krig.image in fields.
take account of some a priori knowledge of the locations and rewrite the Fortran functions. For example, if the points are on a regular grid, the operation can be done in (essentially) instead of operations ( the number of points to predict).

Evaluating the Matérn covariance function is computationally heavy. R essentially uses the Netlib function rkbesl.f. Instead of using the same or similar functions in the Fortran code of KriSp, a linear approximation scheme is used. All covariances are evaluated on a fine grid with R functions such as mater.cov, expo.cov, etc. Those function values are then passed to the corresponding Fortran routine in where a linear interpolation is made. Of course, this method could be refined to a cubic spline approximation (using fields' css.f) or some other interpolation scheme. This approach also eliminates the need of providing a large database of covariance functions coded in Fortran. Note that in Furrer et al. (2006), no covariance approximations were made. The the following lines display the logarithm of the maximum error of the linear covariance approximation as a function of the number of grid points (Figure 11).

> nseq <- seq(10,1000,by=25)
> maxerr <- sapply(nseq, "covapprox.error")
> plot( nseq, log(unlist(maxerr[1,]),10), type="l",
+          ylab="log( max error), base 10")

**Figure 11:** Error of the linear covariance approximation.
$\includegraphics[height=6cm]{figures/coverror}$

The functions of KriSp are written in a linear, sequential way, calling as few functions as possible in order to save memory. Of course, there would be further gain in ``unrolling'' some of the remaining functions. When dealing with massive data sets, I recommend to use the source of KriSp and to hard code the functions to completely adjust it to the specific problem. In such cases, it is possible to work with datasets involving up to 400,000 locations (Sain and Furrer, 2004).

Outlook

The package KriSp should provide an insight on how to interpolate large or massive spatial datasets. Current work focuses on applying tapering techniques to microarray data and to maximum likelihood covariance parameter estimation.

A possible extension might be to uniquely use S4 classes and methods. As the current version of fields uses S3 classes, KriSp does not entirely use the new class concept.

The next major version of fields (version>3.2) will contain a sparse matrix module and extends the capacities of KriSp. Therefore, KriSp will no longer be significantly extended and future releases will most likely consist of bug fixes only.

Another possible extension might be to uniquely use S4 classes and methods. As the current version of fields uses S3 classes, KriSp does not entirely use the new class concept.

Disclaimer

This is software for statistical research and not for commercial uses. The author does not guarantee the correctness of any function or program in this package. Any changes to the software should not be made without the authors permission.

Acknowledgments

Many thanks to Doug Nychka, Stephen R. Sain, Tim Hoar and Eva Maria Furrer for many valuable remarks on the programming of the package and the writing of this document.

Bibliography

Chambers, J. M. (1998).: Programming with Data: A Guide to the S Language.
Springer-Verlag.
Chambers, J. M. and Hastie T. J. (1992).: Statistical Models in S.
Wadsworth and Crooks/Cole.
Cressie, N. A. C. (1993).: Statistics for Spatial Data.
John Wiley & Sons Inc., New York, revised reprint.
Furrer, R., Genton, M. G., and Nychka, D. (2006).: Covariance Tapering for Interpolation of Large Spatial Datasets.
Journal of Computational and Graphical Statistics, 15, 502-523.
Ihaka, R. and Gentleman, R. (1996).: R: A language for data analysis and graphics.
Journal of Computational and Graphical Statistics, 5, 299-314.
Johns, C., Nychka, D., Kittel, T., and Daly, C. (2003).: Infilling sparse records of spatial fields.
Journal of the American Statistical Association, 98, 796-806.
R Development Core Team. (2004).: R: A language and environment for statistical computing,
R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org.
Sain, S. and Furrer, R., (2004).: Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis.
Proceedings Interface 2004.
Stein, M. L. (1999).: Interpolation of Spatial Data.
Springer-Verlag, New York.

Furrer 2006-11-21