Publication Date:
2012-10-19
Description:
Historically, observing snow depth over large areas has been difficult. When snow depth observations are sparse, regression models can be used to infer the snow depth over a given area. Data sparsity has also left many important questions about such inference unexamined. Improved inference, or estimation, of snow depth and its spatial distribution from a given set of observations can benefit a wide range of applications from water resource management, to ecological studies, to validation of satellite estimates of snow pack. The development of LiDAR technology has provided non-sparse snow depth measurements which we use in this study to address fundamental questions about snow depth inference using both sparse and non-sparse observations. For example, when are more data needed and when are data redundant? Results apply to both traditional, manual snow depth measurements and to LiDAR observations. Through sampling experiments on high-resolution LiDAR snow depth observations at six separate 1.17 km² sites in the Colorado Rocky Mountains, we provide novel perspectives on a variety of issues affecting the regression estimation of snow depth from sparse observations. We measure the effects of observation count, random selection of observations, quality of predictor variables, and cross-validation procedures using three skill metrics: percent error in total snow volume, root mean squared error (RMSE), and R². Extremes of predictor quality are used to understand the range of its effect: how do predictors downloaded from the internet perform against more accurate predictors measured by LiDAR? While cross validation remains the only option for validating inference from sparse observations, in our experiments the full set of LiDAR-measured snow depths can be considered the “true” spatial distribution and used to understand cross-validation bias at the spatial scale of inference.
We model at the 30 m resolution of readily-available predictors which is a popular spatial resolution in the literature. Three regression models are also compared and we briefly examine how sampling design affects model skill. Results quantify the primary dependence of each skill metric on observation count which ranges over 3 orders of magnitude, doubling at each step from 25 up to 3200. While uncertainty (resulting from random selection of observations) in percent error of true total snow volume is typically well constrained by 100-200 observations, there is considerable uncertainty in the true spatial distribution (R²) even at medium observation counts (200-800). We show that percent error in total snow volume is not sensitive to predictor quality, though RMSE and R² (measures of spatial distribution) often depend critically on it. Inaccuracies of downloaded predictors (most often the vegetation predictors) can easily require a quadrupling of observation count to match RMSE and R² scores obtained by LiDAR-measured predictors. Under cross validation, the RMSE and R² skill measures are consistently biased towards poorer results than the true validation. This is primarily a result of greater variance at the spatial scales of point observations used for cross validation than at the 30 m resolution of the model. The magnitude of this bias depends on individual site characteristics, observation count (for our experimental design), and on sampling design. Sampling designs which maximize independent information maximize cross-validation bias but also maximize true R². The bagging tree model is found to generally out-perform the other regression models in the study on several criteria. Finally, we discuss and recommend use of LiDAR in conjunction with regression modeling to advance understanding of snow depth spatial distribution at spatial scales of thousands of square kilometers. Copyright © 2012 John Wiley & Sons, Ltd.
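The three skill metrics and the doubling sequence of observation counts named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the signed percent-error convention, the 900 m² cell area (taken from the 30 m model resolution), and the use of the coefficient of determination for R² are all assumptions made for illustration; the abstract does not state the exact definitions used.

```python
import math


def skill_metrics(true_depth, est_depth, cell_area_m2=900.0):
    """Return (percent error in total volume, RMSE, R^2) for gridded depths.

    Assumptions (not from the source): depths are in metres on equal-area
    cells, percent error is signed (estimate minus truth), and R^2 is the
    coefficient of determination 1 - SS_res / SS_tot.
    """
    n = len(true_depth)
    if n == 0 or n != len(est_depth):
        raise ValueError("inputs must be non-empty and of equal length")

    # Total snow volume is the sum of depth times cell area.
    true_vol = sum(true_depth) * cell_area_m2
    est_vol = sum(est_depth) * cell_area_m2
    pct_err = 100.0 * (est_vol - true_vol) / true_vol

    # Root mean squared error over all cells.
    ss_res = sum((t - e) ** 2 for t, e in zip(true_depth, est_depth))
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination against the mean of the true depths.
    mean_true = sum(true_depth) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_depth)
    r2 = 1.0 - ss_res / ss_tot

    return pct_err, rmse, r2


def observation_counts(start=25, stop=3200):
    """Observation counts that double at each step from start up to stop."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts
```

With these definitions, an estimate that equals the mean true depth everywhere has zero percent error in total volume but an R² of zero. That is consistent with the abstract's distinction between the volume metric and the measures of spatial distribution.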
Print ISSN:
0885-6087
Electronic ISSN:
1099-1085
Topics:
Architecture, Civil Engineering, Surveying; Geography