# Algorithm 798: High-Dimensional Interpolation Using the Modified Shepard Method.

1. INTRODUCTIONComputer simulation is an important and necessary method for large-scale regional assessments of ecological processes. Recently, a hierarchy of simulations has been used to assess forest responses to changing climate, air quality, and land use across 13 southeastern states of the United States [Luxmoore et al. 1999]. This assessment, referred to as the Integrated Modeling Project (or IMP), considers the responses of southern pine species and eastern deciduous forests over the next several decades using multiscale computer simulations and geographic information system capability. Simulation outputs such as forest production, evapotranspiration, and carbon pools can be analyzed for various equilibrium and transient scenarios.

The IMP, sponsored by the Southern Global Change Program of the USDA Forest Service [Mickler and Fox 1998], has developed an approach for incorporating physiological responses of forests to environmental stresses and disturbances into a regional forest assessment. In scaling up from physiological to landscape scales, one must be able (1) to account for relevant processes that control the behavior of the soil-plant-atmosphere system at particular scales and (2) transfer relevant information from one scale to the next. The Linked Dynamic Model (or LDM) described in Luxmoore et al. [1999] constitutes a hierarchical collection of simulation models designed to propagate the ecophysiological response of trees to climate and air quality to the scale of forest stands for plantation management. Each simulator in the LDM is run many times with uncertainty analysis using the mean and variance of input variables to generate output distributions of desired responses. Frequency distributions of responses to various combinations of

(1) temperature,

(2) precipitation,

(3) nitrogen deposition,

(4) atmospheric carbon dioxide, and

(5) tropospheric ozone exposure

are stored in response surfaces for simulators not directly used in the IMP regional assessment. The use of response surfaces avoids the need to run small-scale simulators in all regional assessments and allows flexibility in the choice of forest management scenarios used by the IMP.

Since all possible combinations of the five environmental conditions (listed above) are not explicitly simulated in the LDM, the need for 5D interpolation of sparsely populated 5D hypervolumes is required. The interpolated hypervolumes (one for each mean, standard deviation, maximum, and minimum of critical LDM responses) are used to obtain response values for environmental conditions not specifically simulated. The use of 5D interpolation in this context reflects the multivariate nature of dynamic environments and is appropriate where simulated responses to changing environmental conditions change smoothly. Previous forestry assessments have not employed such multidimensional approaches to model changing environmental conditions [Luxmoore et al. 1999].

The need to produce 5D hypervolumes of response values reflecting a complete set of environmental conditions led to a reinvestigation of the multivariate interpolation methods for scattered data. The modification of Shepard's method described in Renka [1988] had demonstrated both efficiency and accuracy (comparable to other alternative methods) in 3D tests. The natural question that arose in the context of the IMP was how would the modified Shepard method perform in 5 dimensions? Hence, the focus of this brief article is to report on a new implementation of the QSHEP3D method described in Renka [1988] for 5D interpolation. The next section briefly reviews the modified Quadratic Shepard method followed by the design/testing of the C+ + implementation (QSHEP5D) in Section 3. Section 4 illustrates the performance of QSHEP5D with respect to accuracy and speed for a sample 5D test function, and a brief summary is provided in Section 5.

2. MODIFIED QUADRATIC SHEPARD METHOD

Following Renka [1988], given the function f with known values [f.sub.i] at the nodes ([x.sub.i], [y.sub.i], [z.sub.i], [v.sub.i], [w.sub.i]), for i = 1, ... , N, define

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the nodal functions [Q.sub.k] are local approximations to the 5D function f at ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]). Also, [Q.sub.k] is a five-variate quadratic function such that

[Q.sub.k]([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) = [f.sub.k],

and fits the values of f on neighboring nodes in a weighted least-squares sense.

The weight functions [W.sub.k](x, y, z, v, w) are defined by

(2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [d.sub.k](x, y, z, v, w) is the Euclidean distance between the points (x, y, z, v, w) and ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]), and where [R.sub.w] is a radius of influence about the node ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]). As explained in Renka [1988], data at ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) can only influence interpolated values at points within the radius [R.sub.w]. The function F in (1) maintains the local properties of the nodal functions [Q.sub.k](x, y, z, v, w), and if f is quadratic then [Q.sub.k](x, y, z, v, w) = f(x, y, z, v, w) for all k. By construction,

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the coefficients ([bar] [c.sub.kf]'s) minimize

(4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Here, [R.sub.q] is the radius of influence about node ([x.sub.i], [y.sub.i], [z.sub.i], [v.sub.i], [w.sub.i]), and the nodes within the radius [R.sub.q] of ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) are the only nodes contributing to the least-squares fit in (4) to define [Q.sub.k]. If [R.sub.q] is not constant for all i, then the cost for testing all possible nodes for inclusion in the least-squares fit for [Q.sub.k] can be ?? (N).

2.1 Parameter Descriptions

To minimize the complexity of constructing the nodal functions, [Q.sub.k], a fixed [R.sub.q] can be used for each [Q.sub.k] in conjunction with an efficient search method for determining nearest neighbors of ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]). The approach used in our 5D interpolation is the same as that described in Renka [1988], namely, choose [R.sub.w] and [R.sub.q] large enough to include NW and NQ nodes, respectively, where NW and NQ are fixed. For example, [R.sub.w] is defined as the distance from node k ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) to the jth closest node (j as small as possible) for j [is greater than] NW. An additional constraint imposed is that the jth closest node must be much farther from ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) than the (j - 1)st node. [R.sub.q] is computed in similar fashion.

Determining the nearest neighbors for ([x.sub.k], [y.sub.k], [z.sub.k], [v.sub.k], [w.sub.k]) in a 5D hypervolume can exploit the same cell method developed for QSHEP3D. The hypervolume is partitioned into NR x NR X NR x NR x NR uniform grids of cells with the indices of the nodes contained in each cell stored in arrays. For QSHEP5D, the number of cells is NR5 with cell density C = N/[NR.sup.5]. See Renka [1988] for a more complete description of the cell method (for 2D interpolation).

2.2 Parameter Selection

From previous 3D interpolation experiments [Renka 1988], the recommended values for NQ range from 13 to 17. For an M-dimensional hypervolume with N known nodes (i.e., where the function f is defined), the original software for QSHEP3D required

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

For 5D interpolation with QSHEP5D, the minimum and maximum values of 20 and 50, respectively, were used.

The recommended values [Renka 1988] for NW are 19 and 32 for M = 2 and M = 3, respectively, with the following constraint in QSHEP3D:

1 [is less than or equal to] NW [is less than or equal to] min(50, N - 1).

For QSHEP5D, larger radii of influence (ranging from NW = 32 to NW = 50) were used to construct the weight functions [W.sub.k](x, y, z, v, w).

Following QSHEP3D, the length of each axis in the 5D hypervolume searched by the cell method (for determining nearest neighbors) is defined by

(5) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In all 5D interpolation experiments where N = 4000, NR was chosen to vary between 1 and 4, inclusive. For all other values of N only one value of NR was used according to (5).

3. SOFTWARE DEVELOPMENT AND TESTING

In this section, we briefly describe the design and testing methodology for QSHEP5D. Techniques for assessing the accuracy of 5D interpolation are presented as well.

3.1 Code Specifications

Whereas the original Fortran program (QSHEP3D) was primarily designed for 3D implementation, the new C++ implementation of QSHEP5D is much more general in that any number of dimensions (M) can be specified with perhaps nonuniform axis lengths across all dimensions. The storage requirements of QSHEP5D were, of course, much greater than QSHEP3D, since 21N storage locations are required for the coefficients of [Q.sub.k] (see (3)) in addition to the 6N storage locations required for [x.sub.i], [y.sub.i], [z.sub.i], vi, [w.sub.i], and [f.sub.i].

To process the response surfaces described in Section 1, QSHEP5D was specifically designed to read and write netCDF files.(1) netCDF (NETwork Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. netCDF software was originally developed at the Unidata Program Center in Boulder, Colorado.

3.2 Data and Error Visualization

The accuracy of 3D interpolation by QSHEP3D was determined (see Renka [1988]) by computing the maximum ([e.sub.max]) and mean ([bar] e) errors defined by

(6) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where [f.sub.i] and [f.sub.i] are approximated and true values of the function f, respectively, and P is the total number of nodes interpolated. These same errors are computed in QSHEP5D for 5D interpolation and do provide global information on the goodness of fit. By exploiting the netCDF data format, however, QSHEP5D can facilitate the visualization of any 2D planes of the interpolated 5D hypervolume. Using a data viewer called Ncview, 2D slices of the original and interpolated hypervolumes can be viewed (see Section 4). Ncview is a visual browser(2) written by David W. Pierce at the Scripps Institution of Oceanography. The premise of Ncview is to get a quick and easy, push-button look at netCDF files. One can view simple movies of the data, view along various dimensions, take a look at the actual data values, change color maps, and invert the data. The viewer executes on UNIX platforms under X11, R4 or higher. The interpolation errors in (6) can also be visualized using Ncview. One can use QSHEP5D to produce an appropriate netCDF file of the error(s) at each node in the 5D hypervolume, and then navigate any selected 2D slice of the error hypervolume using Ncview.

3.3 Data Generation

In testing QSHEP5D, randomly scattered data points were generated for the known nodes or points (N) within each 5D hypervolume containing a total of P points. The N random (known) points, were generated by using the UNIX random-number functions srand and rand with the current wall-clock time used as the seed. Each generated point (indexed between 0 and P - 1) was matched with a set of relative coordinates in the 5D hypervolume. A selected test function f was then evaluated at each of the N known nodes ([x.sub.i], [y.sub.i], [z.sub.i], [v.sub.i], [w.sub.i]). In preparation for the frequency distributions of the five environmental conditions (not yet available) mentioned in Section 1, QSHEP5D was evaluated using modified test functions described in Renka [1988]. In this way, the true accuracy of the Modified Quadratic Shepard Method in 5D could be assessed using [e.sub.max] and [bar] e in (6).

3.4 Experiments

For testing purposes, only 10 grid points were used per axis for the 5D hypervolumes (compared to 20 grid points per axis for 3D interpolation in Renka [1988]). Hence, each interpolated 10 x 10 x 10 x 10 x 10 hyper-volume comprised exactly P = 100,000 nodes. In the sample results provided in Section 4, we used modified saddle functions similar to the 3D test function F3 from Renka [1988]. Specifically, the 5D function f defined by

(7) f(x, y, z, v, w) = (1.25 + cos(5.4w)) x cos(6x) x cos(6y) x cos(6z) =6 + 6[(3v - 1).sup.2]

was evaluated at N = 2000, 4000, 10000, and 25,000 (random) known nodes prior to each interpolation by QSHEP5D. The minimum and maximum values of NQ and NW discussed in Section 2 were used in different combinations, and the formula in (5) was used to determine NR for each data set. Using Ncview, the minimum number of known nodes (N) which produced acceptable results (from a rough visual perspective) was N = 4000. Further analysis for the N = 4000 case included an evaluation of how various parameter combinations affected performance (elapsed CPU minutes) and accuracy ([e.sub.max]).

Since the function f in (7) is relatively smooth, modifications to this function were also considered to reflect potential holes or cliffs which were anticipated in the initial 5D response surfaces produced by the IMP [Luxmoore et al. 1999]. In other words, the 5D hypervolumes for interpolation may well contain regions of missing data or perhaps sharp contrasts in functional values. To simulate such possibilities, constraints were introduced both on the independent variables and the function f in (7).

The sample data constraint

(8) f(x, y, z, v, w) = { f(1,1,1,1,1) if 1 [is less than] y [is less than] 5 f(x, y, z, v, w) otherwise

was used to simulate missing or unavailable data (cliffs), and the conditional function

(9) f(x, y, z, v, w) = {0.1 if 3 [is less than] x, y, z, v, w [is less than] 7 f(x, y, z, v, w) otherwise

was used to simulate sharp functional transitions (holes).

The same data sets and parameter combinations mentioned in Section 3 were also used for the constrained cases above. Comparisons of elapsed time and accuracy versus number of known nodes (N) between the unconstrained, constrained, and functional constrained input data sets (hypervolumes) are presented in the next section.

4. PERFORMANCE RESULTS

In this section, we assess both the efficiency and accuracy of the QSHEP5D implementation (in C+ +) for the interpolation of 5D hypervolumes. Although not reported here, some of the same experiments reported in Renka [1988] for selected 3D test functions were conducted with the same QSHEP5D software (adjusted for 3D, of course). The recommended values NQ = 13 and NW = 19 generally produced accuracies on the order of [e.sup.max] = [10.sup.-2] in under 140 and 20 CPU seconds(3) for NR = 1 and NR = 6, respectively.

For the 5D interpolation experiments presented below, the maximum value NQ = 50 and minimum value NW = 32 (see Section 2) generally required the least execution time for accuracy on the order of [e.sub.max] = [10.sup.-2]. As mentioned in Section 3, 5D data sets having known nodes ranging from N = 2000 to N = 25,000 were used to produce complete (or interpolated) hypervolumes of P = 100,000 nodes.

4.1 Unconstrained Case

Figure 1 illustrates the y-w plane(4) in the true 100,000-node hyper- volume for the function f defined in (7). Figures 2(a)-(d) depict the corresponding y-w planes of the interpolated 5D hypervolume.(5) By visual inspection for these particular planes (using Ncview), the minimum number of known nodes (N) which produced a recognizably similar image to that in Figure 1 is N = 4000. The cases of N = 10,000 and N = 25,000 produced slightly better accuracy but were too costly in runtime. Focusing just on the N = 4000 example, different combinations of the QSHEP5D parameters (NQ, NW, and NR) were used to study effects on both timing and accuracy. The graph in Figure 3 shows the results of these combinations.

[Figures 1-3 ILLUSTRATION OMITTED]

For this test case, all three parameters affected the performance and the accuracy of the 5D (unconstrained) interpolation. With the 4000-node data set, the best results (in terms of elapsed CPU minutes and minimum [e.sub.max]) were obtained with NQ = 50, NW = 32, and NR = 3. For the same choices of NQ and NW, QSHEP5D with N = 4000 was 5.5 and 16.6 times faster than the cases when N = 10,000 and 25,000, respectively. Certainly, the choice of NQ affects the computational complexity of QSHEP5D such that the fewer the nodes required to construct the nodal functions [Q.sub.k] in (3) the faster the interpolation (see Figure 3), but at the risk of reduced accuracy. The choice of NW directly affects the runtime also, but with little affect on accuracy. The choice of NR for defining the partitions (cells) for nearest-neighbor searching (see Section 2) also affects the overall QSHEP5D execution time, with no significant impact on accuracy.

4.2 Data and Functional Constraints

The image in Figure 4 represents the y-w plane of the true P = 100,000-node data-constrained hypervolume defined by (8), and Figures 5(a)-(d) show the corresponding planes of the interpolated hypervolume generated by QSHEP5D.

[Figures 4-5 ILLUSTRATION OMITTED]

As with the unconstrained example (Section 4), the N = 4000-node hypervolume was still acceptable (in a very crude sense) from visual inspection when the same parameter combinations described in Section 4 were used. The accuracy (in terms of [e.sub.max]) obtained for this 5D interpolation, however (see Figure 6), was significantly worse than that obtained for the unconstrained test function f in (7). Accuracy greatly improved, however, when NQ = 50 was used to construct the nodal functions [Q.sub.k]. Notice that the curves in Figure 6 are very similar to those in Figure 3 with the exception in accuracy.

[Figure 6 ILLUSTRATION OMITTED]

For the functional constraint test example described by (9), Figure 7 shows the y-w plane of the true P = 100,000-node 5D hypervolume. Figures 8(a)-(d) show the corresponding y-w planes produced by QSHEP5D. As with the previous test case, the same parameter combinations for the N = 4000-node hypervolume produced similar results to the unconstrained example. However, for sharp transitional values (shown as the hole in Figure 7) the accuracy deteriorated rapidly. The runtime performance remained approximately the same as that of the previous test cases.

[Figures 7-8 ILLUSTRATION OMITTED]

The curves in Figure 9 demonstrate the performance of QSHEP5D for such nonsmooth functions as that defined by (9). The timing results are quite similar to those shown in Figures 3 and 6. However, the accuracy for this 5D interpolation never approaches that of the unconstrained and data-constrained interpolations. Unless NQ and NW are chosen relatively large (e.g., NQ = 50 and NW = 32), the accuracy of QSHEP5D for such nonsmooth functions f (or similar scattered data patterns) could be unacceptable. Figure 10 summarizes the different accuracies obtained with QSHEP5D across the three test cases.

[Figures 9-10 ILLUSTRATION OMITTED]

5. SUMMARY

A translated and enhanced implementation of the Modified Quadratic Shepard method first discussed in Renka [1988] has been presented. This software, referred to as QSHEP5D, is written in C + + and has been tested for both 3D and 5D interpolation of randomly scattered data. For 5D interpolation experiments, optimal performance in time and accuracy was generally obtained for the parameter set {NQ = 50, NW = 32}. Maximum interpolation errors ([e.sub.max]) on the order of [10.sup.-2] were obtainable for two of the three 5D interpolation test cases. The accuracies obtained for a contrived test case with functional constraints (yielding nonsmooth transitions) were on the order of [10.sup.-1] at best.

(1) See http: //www.unidata.ucar.edu/packages/netcdf and ftp://ftp.unidata.ucar. edu/pub/netcdf (June 5, 1998).

(2) See http: //meteora.ucsd.edu/-pierce/ncview_home_page.html and ftp:// cirrus.ucsd.edu/pub/ncview/ncview-1.72.tar.z (June 5, 1998).

(3) Executing on one 167MHz processor of a Sun Ultra-1 SPARCstation having 256MB of RAM.

(4) All 2D planes shown were obtained using Ncview with x = z = v = 0.

(5) All images created with Ncview are in color. For printing purposes, only corresponding gray-scale images are used. Generally, the darker shades of gray depict smaller numerical values, and the brighter shades denote the larger values. Note that different colors can be represented with the same intensity level of gray.

REFERENCES

LUXMOORE, R. J., HARGROVE, W. W., THARP, M. L., POST, W. M., BERRY, M. W., MINSER, K. S., CROPPER, W. P., JOHNSON, D. W., ZEIDE, B., AMATEIS, R. L., BURKHART H. E., BALDWIN, V. C., AND PETERSON, K. D. 1999. Signal-transfer modeling for regional assessment of forest responses to environmental and land-use changes in the southern United States. Environ. Model. Assess. In press.

MICKLER, R. AND FOX, S. 1998. The Productivity and Sustainability of Southern Forest: Ecosystems in a Changing Environment. Springer-Verlag, Vienna, Austria.

RENKA, R. J. 1988. Multivariate interpolation of large sets of scattered data. ACM Trans. Math. Softw. 14, 2 (June), 139-148.

Received: July 1998; accepted: June 1999

MICHAEL W. BERRY and KAREN S. MINSER University of Tennessee

This work was supported by the USDA Forest Service under Subcontract Nos. 29-1286-96 and SRS-CA-96-067 as part of the the Integrated Modeling Project (IMP) for the Southern Global Change Program.

Authors' address: Department of Computer Science, University of Tennessee, 107 Ayres Hall, Knoxville, TN 37996-1301; email: berry@cs.utk.edu; minser@cs.utk.edu. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

Printer friendly Cite/link Email Feedback | |

Author: | BERRY, MICHAEL W.; MINSER, KAREN S. |
---|---|

Publication: | ACM Transactions on Mathematical Software |

Geographic Code: | 1USA |

Date: | Sep 1, 1999 |

Words: | 3799 |

Previous Article: | Algorithm 797: Fortran Subroutines for Approximate Solution of Graph Planarization Problems Using GRASP. |

Next Article: | Beware of Linear Congruential Generators with Multipliers of the Form [Alpha] = [+ or -] [2.sup.q] [+ or -] [2.sup.r]. |

Topics: |