Introduction

When planning subsurface constructions below the water table, risks associated with groundwater drainage must be considered. A particularly important risk in this context is land subsidence induced by groundwater drawdown in areas with compressible clay deposits (Sundell 2016). Many examples have been documented where groundwater drawdown-induced land subsidence has led to severe consequences, including Shanghai (Xue et al. 2005) and Beijing (Zhu et al. 2015) in China, Mexico City (Ortega-Guerrero et al. 1999), Bangkok in Thailand (Phien-wej et al. 2006), Las Vegas (Burbey 2002) and Los Angeles (Bryan et al. 2018) in the USA, Stockholm and Gothenburg in Sweden, and Oslo in Norway (Karlsrud 1999; Olofsson 1994).

In infrastructure projects, damage can be avoided by implementing risk-reduction measures such as improved sealing to avoid leakage and artificial infiltration of water into the aquifer to maintain groundwater levels. To evaluate the effect of a planned measure properly, the relevant properties of the hydrogeological system need to be sufficiently understood. A hydrogeological system is often characterized by heterogeneous and anisotropic materials as well as high temporal and spatial variability in water balance conditions. Because of these characteristics, and since field investigations are both costly and time-consuming, the system cannot be exhaustively investigated, which means that decisions regarding the need for risk-reduction measures must be taken under uncertainty. To decide what is “sufficient”, a trade-off between the benefits of increased knowledge to reduce the risk of inappropriate decisions and the cost of new information can be made in accordance with the principles of Value of Information Analysis (VOIA). VOIA is a cost-benefit analysis (CBA) where the cost of collecting new information is compared with the expected benefits of a reduced risk of making an erroneous decision relative to a reference alternative. The result of the VOIA, from an economic perspective, is a selection of the most appropriate information collection alternatives.

VOIA as a decision support method in hydrogeological systems was introduced into a framework by Freeze et al. (1990) and further described by Freeze et al. (1992). Within this framework, the costs and benefits of alternative designs are compared using hydrogeological simulation models that account for uncertainties. If a hydrogeological system cannot be investigated in all its aspects, the problem is ill-posed, meaning many plausible models can be sufficiently consistent with available observations (Beven 2006; Carrera and Neuman 1986). Inverse probabilistic calibration to identify plausible models can be used to handle such a model structure with more independent rather than dependent parameters (Burrows and Doherty 2015; Carrera et al. 2005; Doherty 2003; Li and Zhang 2018; Siade et al. 2017; Sun 1999; Tonkin and Doherty 2009). The need for inverse calibration combined with VOIA was identified early on by Freeze et al. (1992) but no such method was present at that time. Recently, Kitanidis (2015) found it surprising that the issue had not received more attention given its importance, but points out that the topic is both conceptually difficult and computationally challenging.

A common tool for inverse probabilistic calibration of groundwater models in the numerical code MODFLOW (Harbaugh 2005) is PEST (Parameter ESTimation code). PEST allows inverse calibration of many plausible model parameterizations to be carried out based on a user-defined range. Using these calibrations, model parameter uncertainties can be estimated. Although these parameter uncertainties can be useful, it is the effects (e.g. changed groundwater heads, stream flows and water balance conditions) of a planned disturbance (e.g. groundwater extraction or infiltration) that are of primary interest. Relative parameter values that do not change over time, the future effect of a disturbance event cannot be measured at present; however, the future effect is dependent on the properties of the system described by the model and its parameters in the present state. Therefore, the hypothesis proposed here is that model parameters can serve as proxies for such effects; furthermore, whether investigations to reduce uncertainties in model parameters also reduce uncertainty regarding future effects is studied. Finally, whether additional investigations of these parameters can change which design alternative to recommend is investigated with VOIA.

The main objective of this paper is to suggest a procedure for VOIA in order to assess the need for additional information when assessing the effect of several design alternatives with regard to future disturbances in hydrogeological systems. The procedure is shown for a specific situation, i.e. planning risk-reduction measures for groundwater drawdown in infrastructure projects in subsidence-sensitive areas where the economic cost of failure is associated with exceeding acceptable groundwater drawdown magnitudes. The general procedure recommended by Freeze et al. (1992) is updated using a VOIA method capable of assessing several design alternatives, and where hydrogeological parameters are used as proxies for failure (in this case critical lowering of groundwater heads). The uncertainty estimation is based on an inverse probabilistic calibration using PEST and MODFLOW. The result introduces spatially distributed VOIA maps as a decision support tool for planning additional investigations. The procedure is demonstrated with the aid of a case study of a planned tunnel in central Stockholm, Sweden, which will be built in crystalline bedrock below soil layers with coarse-grained materials and postglacial clay. First, the general strategy is presented together with the groundwater modelling and the VOIA procedures. Then the Stockholm case study is described.

Method

General strategy

The overall framework (Fig. 1) of the proposed method covers the two steps in VOIA: (1) prior analysis and (2) pre-posterior analysis (dashed box, Fig. 1). Within the scope of decision support, there are two additional steps, which are followed by steps (3) decision on whether to proceed with the design alternative or additional investigations and (4) posterior analysis (Freeze et al. 1992). VOIA evaluates whether additional investigations are worthwhile as a basis for choosing between design alternatives. This evaluation is initiated using a prior analysis based on groundwater modelling results and using the current stage of available information (step 1). The prior analysis is a cost-benefit analysis (CBA) where the investment cost (ci) of design alternatives (Ai, i = 1,…,m represents the numbering of the alternatives) is compared with the expected benefits of reduced risks relative to a reference alternative A0. All alternatives, including the reference alternative, involve changed drainage conditions, which are expected to disturb the current water balance and groundwater head situation. The prior analysis is initiated by assigning prior estimates to groundwater modelling parameters (1a in Fig. 1). With the prior parameter estimates as initial values, a randomized inverse groundwater model calibration in PEST results in several calibrated model solutions representing plausible conditions (equally acceptable parameter settings of the ill-posed problem) for the current undisturbed situation (1b in Fig. 1). The repeated randomizations in the PEST calibrations result in updated posterior parameter estimates (c in Fig. 1). In each of the calibrated solutions, different design alternatives (including A0) that disrupt the current situation are modelled. From these models, the effect on the water balance and groundwater heads is evaluated. The probability of failure, P(F), for each alternative is calculated by comparing the difference in head relative to the current situation, with a failure criterion defined using risk areas of acceptable groundwater drawdown magnitudes (Sundell et al. 2017). Within the risk areas, the drawdown is not permitted to exceed a certain magnitude. Exceeding this failure criterion results in an economic cost of failure kF. By multiplying kF by P(F), the risk cost (R), i.e. the expected failure cost, is set for each alternative i. The reduction in risk cost relative to the reference alternative is a benefit which is is compared to the cost of obtaining the new information and the best prior alternative is identified as the alternative with the highest expected net benefit (step 1d in Fig. 1; e.g. Zetterlund et al. 2015). In the pre-posterior analysis (step 2 in Fig. 1), the expected information gain from a planned investigation is calculated by comparing the monetary benefit of the expected information with the cost of conducting the investigation. Based on the result of the VOIA, investigations (step 3a in Fig. 1) or the best prior alternative (step 3b in Fig. 1) are carried out. In the final posterior analysis, the model is updated with the new information and the decision alternatives are re-evaluated.

Fig. 1
figure 1

Principle of the valuation of information analysis (VOIA) for groundwater models as decision support for alternative designs. Modified from Zetterlund et al. (2015)

Prior analysis using a groundwater model

The groundwater modelling process follows general principles (e.g. Freeze et al. 1990; LeGrand and Rosén 2000; Reilly 2001) including: definition of the project goal, data collection, development of a conceptual model, development of a numerical model, model parameterization, calibration, assessment of a problem using a simulation model and, in the final stage, a prior design suggestion based on the model results. In addition to these steps, the whole process can be repeated when new information is available. In the case study, the numerical model is constructed in MODFLOW (Harbaugh 2005) using the NWT solver (Niswonger et al. 2011) together with the PEST sub-space Monte Carlo (SSMC) (Tonkin and Doherty 2009) technique in the GMS graphic user interface (Aquaveo 2017).

In Fig. 2, the groundwater modelling in the prior analysis is initiated with a definition of a conceptual model with three soil stratification layers and fractured bedrock at the bottom. Based on these materials, the numerical model can be discretized with different layers in a three-dimensional (3D) grid. The model is then parameterized with properties such as recharge (RCH) and hydraulic conductivity (K) for the different materials. Since significant heterogeneity and anisotropy is expected within different materials, fields of material properties are modelled with the aid of pilot points (Doherty 2003) in the different layers. Pilot points (PPs) are a two-dimensional (2D) scatter-point set representing different locations within a material. As recommended in Doherty (2003), PPs should be placed with a high spatial density throughout the model domain to encapsulate heterogeneity and avoid numerical instability. From the PP, a spatially distributed parameter field is modelled using kriging (Matheron 1963) along with a variogram that approximates the heterogeneity of the parameter.

Fig. 2
figure 2

Procedure for model conceptualization, definition of numerical model with grid layers, and randomized inverse calibration using PEST

Prior parameter estimates

Based on expected variability, prior estimates of material properties are assigned to the PPs (Fig. 2). If significant material heterogeneity is expected and few or no investigations of material properties are made, the set variability span should be quite large. For locations with hydrogeological investigations such as slug tests, pumping tests or screening curves of soil fractions, a smaller variability span can be set.

Inverse probabilistic calibration with PEST

From observations such as groundwater heads, the model is calibrated using PEST, resulting in posterior estimates at the individual PP. The theoretical considerations of the tools included in the PEST software suite (PEST 2018) are documented over a wide range of literature (Burrows and Doherty 2015; Doherty 2003, 2011; Doherty and Hunt 2010; Fienen et al. 2013, 2009; Moore and Doherty 2005; Rossi et al. 2014; Tonkin and Doherty 2005, 2009; Woodward et al. 2016). The goal of an inverse calibration using PEST SSMC is to find the parameter combinations that meet the calibration criterion. Although the process is conditioned on the calibration criterion, it is possible that some randomizations do not fulfil this criterion or result in an unreasonable water balance. In these cases, it is important that the modeller reviews these solutions based on expert knowledge, see discussion on “hydrosense” in Hunt and Zheng (2012). Two specific criteria are used here: solutions where the difference between simulated and measured head for any observation well is greater than 1.5 m, and solutions where the difference in water balance between the inflow and outflow of water is greater than 10%, are ignored.

Posterior parameter estimates

An example with three randomized calibrated solutions is shown in Fig. 3a. Each solution consists of posterior-calibrated parameter combinations of PPs for RCH and K in the different layers. The groundwater heads are calibrated for each observation well between the three solutions although the values of the PPs between the solutions vary. After the process is repeated for n calibrations, posterior parameter ranges can be calculated. In the example for one PP in Fig. 2, the SSMC process in PEST results in a narrower span for the posterior parameter range compared with its prior estimate.

Fig. 3
figure 3

a Three calibrated randomized solutions with hydraulic conductivity and groundwater heads. b Modelled effect of a design alternative on groundwater heads in each calibrated solution

Best prior design alternative

The effect of changed drainage conditions is modelled using the calibrated randomized solutions. These changes include groundwater leakage into a planned tunnel, as well as safety measures to reduce the effect of the leakage on the water balance and groundwater heads. The changes are represented by different design alternatives (Ai, with index i = 1,…,m), where A0 is the reference alternative and m is the number of alternatives. The effect of an alternative simulated in each calibrated solution is illustrated in Fig. 3b. For a model with n calibrated randomized parameter solutions, the difference between these simulations results in a range of possible effects on the water balance and groundwater heads for each design alternative. From these, the likelihood of a certain groundwater level at a specific location can be calculated.

To investigate whether a design alternative is acceptable, the simulations are compared with a failure criterion. Commonly, this criterion is defined by a regulation authority. In Sweden, limits for groundwater drainage for major subsurface and water supply projects are decided in the environmental court. In this process, the decision is based on the consequences of the drawdown. In the case study, land subsidence is the main consequence of drawdown. Risk areas for groundwater drawdown-induced land subsidence are defined from locations where the 95th percentile of simulations of subsidence exceed 2 cm for groundwater drawdown magnitudes of 0.5, 1 and 2 m in the confined soil aquifer. These simulations are based on a probabilistic method, where a geostatistics-based soil stratification model (Sundell et al. 2016) is combined with a one-dimensional (1D) elasto-plastic model for the calculation of consolidation settlements (Larsson and Sällfors 1986). See Sundell et al. (2017) for a complete presentation of the method and results. In order to not exceed 2 cm of subsidence the groundwater drawdown within the risk area (gwaccept) should not exceed 1 m.

The calculation of a design alternative aimed at a failure criterion defined by the risk area for gwaccept is illustrated by the upper “Prior analysis” part of Fig. 4 (the risk area is also illustrated in Figs. 3 and 4. To facilitate comparison between the risk area and the modelled changes in groundwater heads, and to enable field evaluations, the comparison is realized for groundwater observation wells within and close to the risk area. Although measurements of groundwater heads exist, the comparison is made between the simulated value in the calibrated model and the simulated value in the corresponding modified model, since every calibrated solution is a plausible representation of the reality as discussed earlier. In Fig. 4, the difference is calculated between each calibrated solution and the corresponding simulation using a design alternative. If the difference is less than gwaccept the simulation is deemed not to have failed (Fc), otherwise it has failed (F). This process is repeated for all n simulations for each alternative design, and the ratio of failed simulations corresponds to P(F). Figure 4a illustrates the maximum groundwater drawdown at any observation well in three design alternatives (A0, A1, A2) compared with gwaccept.

Fig. 4
figure 4

a Prior analysis with a comparison of differences in groundwater heads at observation wells, representing the risk area between a calibrated solution and the corresponding simulation for a design alternative. The ratio of unacceptable solutions (difference in any observation well >gwaccept) results in the probability of failure P(F). The graph illustrates the maximum groundwater drawdown at any observation well for three design alternatives (A0, A1, A2). b The initial step of the pre-posterior analysis. For each parameter, the posterior parameter estimate is grouped into frequency functions for failed and nonfailed simulations. The overlap of these results in the overlap coefficient (OVL)

Exceeding the limits of the failure criterion means that the associated consequences can take effect. If a regulation authority conditions the limit, exceedance can result in fines and delay costs in addition to costs for possible damage. The sum of these costs is the failure cost (kF).

The risk (R) for each alternative (including A0) is given by:

$$ {R}_i=P\left({F}_i\right)\times {k}_{\mathrm{F}} $$
(1)

The benefit of an alternative (Bi) is given by:

$$ {B}_i={R}_0-{R}_i $$
(2)

Each alternative (i) has an investment cost (ci). The net benefit of an alternative is given by:

$$ {\varPhi}_i={B}_i-{c}_i $$
(3)

Subsequently, the best prior alternative (ϕi) is the alternative with the greatest value:

$$ {\varPhi}_{\mathrm{prior}}=\max \left({\varPhi}_i\right) $$
(4)

Pre-posterior analysis

If additional information changes the recommended design alternative, it is evaluated in the subsequent pre-posterior analysis. This process is initiated by separating parameter values (Ɵ) of failed simulations from nonfailed simulations for each design alternative and parameter (“Pre-posterior analysis” part of Fig. 4 illustrated for a parameter represented by a PP named “K3a”). The division results in the relative frequency functions f(Ɵ|F) and f(Ɵ|Fc), where the integral of each of these functions equals 1. The overlap coefficient (OVL, Fig. 4) is used to measure the agreement between these two distribution functions (Inman and Bradley 1989). If OVL = 1 the parameters are identical, meaning that additional investigations to find the actual parameter value will not help to determine whether or not a design alternative will meet the tolerability criterion. OVL = 1 results in a probability of detecting failure, given failure P(D|F) = 0, and a probability of not detecting failure given nonfailure P(Dc|Fc) = 0. On the contrary, if OVL = 0 an additional investigation that can detect the true parameter value with a high degree of accuracy will determine whether the alternative will meet the tolerability criterion, resulting in P(D|F) = 1 and P(Dc|Fc) = 1.

Since the parameters are represented by spatially distributed pilot points, OVL can be mapped for each parameter group (RCH and K for the different layers) and design alternative. Locations with low OVL values indicate that additional investigations can detect whether the design alternative will fail or not.

For cases with an OVL between 0 and 1, and in the parameter value range where the functions overlap, it is uncertain whether a sampled value belongs to f(Ɵ|F) or f(Ɵ|Fc). In this range, a critical parameter value, Ɵc, is selected. For the example in Fig. 4, it is assumed that all values below Ɵc belong to f(Ɵ|F) and above Ɵc it is assumed that all values belong to f(Ɵ|Fc), meaning that D = [Ɵ < Ɵc] and Dc = [Ɵ ≥ Ɵc]. The opposite condition takes affect if it is f(Ɵ|F) instead of f(Ɵ|Fc) that contains the highest values. Detection of failure can either result in a true detection, P(D|F) or a false detection (type I error), P(D|Fc). Similarly, Dc can either result in P(Dc|Fc) or the type II error P(Dc|F). If Ɵc is chosen as the intersection point of the two functions, as illustrated in Fig. 4, the sum of the type I and type II errors is minimized. With this choice, OVL =  P(Dc|F) +  P(D|Fc) (Fig. 5).

Fig. 5
figure 5

Choice of critical parameter value (Ɵc) based on OVL together with the type II P(Dc|F) and type I errors P(D|Fc)

For each parameter and alternative, OVL and the Ɵc are calculated by testing each position along the parameter value axis (Ɵ). In each test, P(Dc|F) is calculated for f(Ɵ|F) and P(D|Fc) is calculated for f(Ɵ|Fc). The position that minimizes P(Dc|F) + P(D|Fc) gives OVL and Ɵc.

In the subsequent step, P(D|F) is compared between the alternatives. This comparison is complicated by the different positions of Ɵc between the alternatives, resulting in a different detection event (Di) for each alternative, as illustrated in Fig. 6. For the case with three alternatives, there are 23 = 8 different detection possibilities:

$$ {\displaystyle \begin{array}{l}{D}_0\cap {D}_1\cap {D}_2\\ {}{D}_0\cap {D}_1\cap {D}_2^c\\ {}{D}_0\cap {D}_1^c\cap {D}_2\\ {}{D}_0^c\cap {D}_1\cap {D}_2\\ {}{D}_0\cap {D}_1^c\cap {D}_2^c\\ {}{D}_0^c\cap {D}_1\cap {D}_2^c\\ {}{D}_0^c\cap {D}_1^c\cap {D}_2\\ {}{D}_0^c\cap {D}_1^c\cap {D}_2^c\end{array}} $$
Fig. 6
figure 6

Example of the location of Ɵc for three alternatives (A0, A1, A2) together with the detection intervals (D x j )

To make the comparison between the different detection events for each alternative plausible, detection intervals D x j are introduced based on the different positions of Ɵci:

$$ {D}_3^x={D}_0\cap {D}_1\cap {D}_2 $$
(5)

where failure is identified in all three alternatives,

$$ {D}_2^x=\left({D}_0\cap {D}_1\cap {D}_2^c\right)\cup \left({D}_0\cap {D}_1^c\cap {D}_2\right)\cup \left({D}_0^c\cap {D}_1\cap {D}_2\right) $$
(6)

where failure is identified in two alternatives,

$$ {D}_1^x=\left({D}_0\cap {D}_1^c\cap {D}_2^c\right)\cup \left({D}_0^c\cap {D}_1\cap {D}_2^c\right)\cup \left({D}_0^c\cap {D}_1^c\cap {D}_2\right) $$
(7)

where failure is identified in one alternative,

$$ {D}_0^x={D}_0^{\mathrm{c}}\cap {D}_1^c\cap {D}_2^c $$
(8)

where failure is not identified for any alternative.

For the example in Fig. 6:

$$ {D}_3^x={D}_0\cap {D}_1\cap {D}_2 $$
(9)
$$ {D}_2^x={D}_0\cap {D}_1\cap {D}_2^c $$
(10)
$$ {D}_1^x={D}_0^c\cap {D}_1\cap {D}_2^c $$
(11)
$$ {D}_0^x={D}_0^c\cap {D}_1^c\cap {D}_2^c $$
(12)

The other detection events for failure of three and two alternatives remain empty for the example.

Based on Eqs. (5)–(8), P(Fi|D x j ) and P(D x j ) are calculated for each alternative, Ai and detection event, D x j from the locations of Ɵci and the previous grouping of failed and nonfailed parameter values. From these calculations, the posterior net present value is calculated for each alternative and detection event by making a comparison with the reference alternative:

$$ {\varPhi}_i\left({D}_j^x\right)={B}_i\left({D}_j^x\right)-{c}_i={k}_F\left[P\left({F}_0|{D}_j^x\right)-P\left({F}_i|{D}_j^x\right)\right]-{c}_i $$
(13)

The best net posterior present value for a detection interval is:

$$ {\varPhi}_{\mathrm{posterior}}\left({D}_j^x\right)=\max \left[{\varPhi}_i\left({D}_j^x\right)\right] $$
(14)

The pre-posterior present value is the sum of the best posterior present values for all detection events multiplied by their probability:

$$ {\varPhi}_{\mathrm{pre}-\mathrm{posterior}}\left({D}_j^x\right)={\sum}_j{\varPhi}_{\mathrm{posterior}}\left({D}_j^x\right)\times P\left({D}_j^x\right) $$
(15)

Finally, the expected net value of information (EVI) is calculated as the difference between the pre-posterior and the prior net present values:

$$ \mathrm{EVI}={\varPhi}_{\mathrm{pre}-\mathrm{posterior}}-{\varPhi}_{\mathrm{prior}} $$
(16)

In these calculations, information is only of value if the investigation has the potential to change the decision on which alternative to recommend. EVI does not take into account the data collection cost (cp). To do so, the expected net value (ENV) is calculated:

$$ \mathrm{ENV}=\mathrm{EVI}-{c}_{\mathrm{p}} $$
(17)

Commonly, several investigations are suggested as part of an investigation programme. With the procedure presented, combinations of several investigations are not considered. Nevertheless, it is reasonable to recommend several locations with a high Фpre-posterior as potential sampling locations unless parameters within the same group are in close proximity.

Posterior analysis

If the pre-posterior result suggests carrying out the investigation programme, the programme is realized, and new information is obtained, the model is updated in a posterior analysis. This updating will follow the same procedure as the calibration in the prior analysis. After this step, the process loop continues according to Fig. 1 until a design alternative has been executed.

Case study

The method is applied in an area in central Stockholm, where a utility tunnel for power lines, 5 m in diameter, is modelled. The soil layers in the area are characterized by the Stockholm Esker with coarse-grained material as an open aquifer in the western part and confined by clay in the eastern part (Fig. 7). The pre-Cambrian crystalline bedrock has dominant fracture zones in an E–W, WNW and NW direction (Persson 1998). Although important data in the area is available from the investigations of the planned tunnel such as a model for soil and bedrock stratification, groundwater head measurements and hydraulic tests, there are significant uncertainties. These uncertainties include information on groundwater recharge (from precipitation and leakage from fresh water and sewage water distribution), drainage into existing tunnels, and K values of materials and locations not tested or not tested with sufficient accuracy. Based on an inverse calibrated steady-state SSMC model, three design alternatives for the planned tunnel are modelled. A0 (reference alternative) includes a tunnel without sealing in fracture zones; A1, a tunnel with sealing in fracture zones to K = 10–8 m/s; and A2, same as A1 but with three injection wells in the bedrock. Details of the case study are referred to as supporting information.

Fig. 7
figure 7

Map of the model area, including observation wells in the confined aquifer and observation wells in bedrock. Groundwater pressure levels in the confined aquifer are illustrated by iso-lines interpolated from observation data (Sundkvist 2015). The broken blue lines show a groundwater watershed and the blue arrows show the flow direction in the confined aquifer. The background map illustrates major fracture zones in bedrock, bedrock outcrops, Quaternary soil deposits, and later-age filling material (Stockholms stad 2014). The double lines in a N–S direction indicate the location of the planned tunnel

Groundwater model

The groundwater model is constructed in nine different layers with five different materials in bedrock and soil. Based on major expected differences in K, the soil materials are divided into three categories (Fig. 8): filling material, clay and coarse-grained soil (esker material and glacial till). The bedrock is divided into three categories (Fig. 8): a more fractured top layer, vertical fracture zones, and less fractured bedrock. Each of these materials is assigned prior parameter distributions (see supporting information), and the materials with a dominant response to the effect of the design alternatives are modelled using PPs. The model is calibrated against the observation wells presented in Fig. 7 and is further described in supporting information.

Fig. 8
figure 8

a Three-dimensional model grid and materials. b Groundwater heads of the initial calibration and locations of the observation wells in layer 3 (black dots)

Initial calibration

In a first step, the model is calibrated by manually adjusting the parameter sets to close proximity between the observed and modelled heads. Based on this model, a model parameter field with minimum error variance is calibrated using PEST. In these two steps a conductance value for all tunnels is set at a fixed value, 5 × 10−8 m2/s/m, to represent a reasonable inflow for tunnels in bedrock with cement injection. The K value for the less fractured bedrock is set at a fixed value of 10−8 m/s. Since the K of the other materials is several orders of magnitude higher, particularly in the fracture zones that dominate the inflow into the tunnels in bedrock, and it determines the connectivity with overburden layers, variability of K in the less fractured bedrock is of lesser importance for the result. The K value in the filling material is set at a fixed value of 1 × 10−5 m2/s. The resulting heads can be observed in Fig. 8 and fields of RCH and K in Fig. 9. For five wells in soil, the residuals between observed and modelled heads are between 0.2–0.9 m. In all the other wells, the residual is less than 0.2 m. One observation well in bedrock, 13CW424Hb, is excluded from calibration as it is not sectioned for different fracture zones, which means that it is not possible to determine what the measured head represents. When calibrating the model, the residuals in 13CW424Hb were always greater than 1.5 m, irrespective of the adjustments made.

Fig. 9
figure 9

Recharge (RCH) and hydraulic conductivity (K) fields in each layer for the initial calibration

Inverse probabilistic calibration

Using the result from the PEST calibration, variograms are modelled for the logarithms of each PP. In the SSMC step, the PP is randomized based on log-normal distributions, where the result of the initial PEST calibration represents mean values. The prior uncertainties are defined by the bounds of minimum and maximum values, see Appendix section ‘Material properties’. Because of this definition, a high value for the standard deviations of each parameter set (SD = 1.95) is selected, resulting in prior distributions truncated by their respective bounds. From these definitions of prior distributions, 1,000 SSMC runs are made, which was found to be the practical limit for the power station used for computation in this study. Out of the 1,000 runs, 747 were remaining after nonconverged solutions were removed, 670 were remaining after removing differences in simulated and observed heads greater than 1.5 m, and 563 were remaining after removing solutions with a water balance discrepancy between inflow and outflow greater than 10%. The accepted solutions demonstrate homoscedastic errors without any systematic under—or overestimation between observed and modelled heads. The resulting water balance for the accepted solutions is presented in Appendix section ‘Water balance: accepted SSMC solution’.

Prior analysis

From the calibrated SSMC solutions, changes in groundwater head and water balance are simulated for the three different design alternatives. The planned tunnel is located in layer 8. In A0 (reference alternative), sealing is represented by a conductance value of 5 × 10−8 m2/s/m, as assumed for the other tunnels in the area. In A1, the cells representing adjacent fracture zones are sealed to K = 10−8 m/s. A2 is the same as A1 but with three injection wells in the fracture zones in the bedrock, layer 5 (Fig. 10). In A2, flow into the wells is conditioned to a stop criterion that equals the surface level.

Fig. 10
figure 10

5th, 50th and 95th percentiles for simulated groundwater drawdowns in A0, A1 and A2. A negative value indicates a higher level relative to the calibrated model. The tick marks are located at 10-m intervals

Out of the 563 calibrated solutions, simulations that did not converge in any of the alternatives were removed from further analysis, resulting in 407 remaining solutions. Water balances for the different alternatives are presented in Appendix section ‘Water balance: design alternatives’.

The difference in groundwater level between the calibrated models and each alternative is presented in Fig. 10. In A0, A1 and A2, the median groundwater change is less than 0.5 m for each alternative (P50 in Fig. 10). The 5th percentiles show increased levels in all the alternatives as the injected tunnel can create a barrier in some cases. In the 95th percentiles, all alternatives have reduced levels. The failure criterion is defined for seven observation wells within or in close vicinity to the risk area: 14CW415U, 13CW414U, 15SW01R, 14CW416U, 77C75, 13CW462HB and 13CW415U. These wells are tested for a gwaccept of 0.5 and 1 m respectively. If the groundwater drawdown in any of the wells is higher than gwaccept, the simulation is regarded as a failure.

Although a detailed estimation of kF is beyond the scope of this paper, an overall principle with reasonable estimates is given here. Failure costs can be both direct and indirect. Direct costs refer to costs for repairing the damage, whereas indirect costs include project delays, a lower market value of the damaged buildings, or inconvenience for the tenants. The buildings within the risk area are founded on friction piles in concrete or steel although the basement floors have shallow foundations. The main concern regarding subsidence is not an additional pile load but damage to the basement floor (case M 2772–15, Land and Environment Court at the District Court in Nacka, 2016-11-30). Damage to the basement floor can be aesthetic or functional but is not assumed to result in structural damage affecting the stability of the buildings. With an assumed cost of SEK 4,000/m2 and a building area of 500 m2, kF is estimated at SEK 2 million (SEK 10 = approximately €1). If instead kF is assumed to result in delay costs because of not meeting the acceptance criterion, a delay of 1 month is estimated to cost SEK 20 million (case M 2772–15). During this month, the contractor is supposed to improve fracture sealing or infiltration and then continue with the project. This assumption implies that the modelled groundwater head changes in the steady-state model occur and can be observed within a short period of time after leakage into the bedrock tunnel begins.

The failure ratio together with a prior CBA analysis of the different alternatives is presented in Table 1 for 1 and 0.5 m as the failure criterion. The cost of sealing a fracture zone is estimated at SEK 100,000. With three zones crossed in A1, ci is calculated. In A2, the cost of sealing in A1 added to the installation cost of a permanent infiltration well is estimated at SEK 0.5 million. For the three wells in A2, a cost for infiltrating 0.4 l/sec of freshwater with a unit price of SEK 20/m3 for a 100-year period with discount rate of 3.5%, ci is calculated for this alternative. With 1 m as the failure criterion, A0 is identified as the best prior alternative although the difference is small in comparison with A1. If kF is reduced to SEK 2 million, A0 is still the best prior alternative. With 0.5 m as the failure criterion, A1 is identified as the best prior alternative. If kF is reduced to SEK 2 million, A0 is the best alternative.

Table 1 Cost-benefit analysis (CBA) for the different alternatives, A0, A1 and A2, with 1 m and 0.5 m groundwater drawdown as the failure criterion

Pre-posterior analysis

Following the description in section ‘Prior parameter estimates’, OVL and EVI are calculated in each of the PP sets (Figs. 11 and 12). OVL is presented as the sum of OVL for all three alternatives, resulting in a theoretical variation of OVL between 0 and 3. The result of the SSMC calibration shows that some PP sets have several values (Ɵ) equal to the bounds of the prior parameter estimates, resulting in fewer unique values (Ɵ) within the entire distribution (red dots in Fig. 11). Since Ɵ and the bounds have the same value, they cannot be ranked and differences between high and low Ɵ cannot be observed for (Ɵ|Fc) and (Ɵ|F). Nevertheless, these Ɵ are ranked in the calculations, which often result in cases with a low OVL and a high EVI, which is an artificial result. To eliminate such cases, only PPs with more than 300 unique Ɵ’s are presented in Fig. 12. This elimination affects K to a large extent for coarse-grained material and fracture zones, indicated by the white areas at the location of eliminated PPs in Fig. 12 (compared with Fig. 9). OVL is in general lower, with 1 m compared to 0.5 m as a failure criterion. Low P(F) values at 1 m result in fewer values in the failure category. Fewer values in one of the categories increases the possibility of both type I and II errors if these values are clustered by chance. Type I errors (false confirmation of a large OVL) occur for clusters close to the bounds of the value ranges for a PP, which can be observed by the scattered presence of low OVL values for 1 m in RCH and K for clay in Fig. 12. For K, uppermost bedrock and fractured bedrock spatial clusters of lower OVL values are observed both for 0.5 and 1 m. These clusters are correlated with locations close to settings that have a direct influence on failure (risk area, tunnel and infiltration wells). With these consistencies, it is reasonable to expect that future observations at the locations of these clusters can improve the possibility of determining failure or no failure (F or Fc) in an alternative.

Fig. 11
figure 11

Correlation between EVI (in thousands SEK: KSEK) and OVL for three classes of unique values (1–300, 300–400 and 400–407 respectively) with a 0.5 m and b 1 m as a failure criterion

Fig. 12
figure 12

OVL and EVI (in thousands SEK: KSEK) for gwaccept of 0.5 m and 1 m in each of the PP sets (RCH and K for the different materials)

Although a low OVL indicates that F can be detected, it is possible that sampling at this location has no or low EVI if the sampling does not change the recommended alternative from the prior analysis. The stability of the result is tested using a bootstrap analysis, where the null hypothesis, H0, states that EVI is independent of the sample size. Sample sizes of 100, 150, 200, 250, 300, 350 and 407 are tested. In sample sizes <300, EVI calculations yield unstable results since P(F) is low, which results in no observations of (Ɵ|F). With 0.5 m as a failure criterion, sample sizes >300 typically result in ± SEK 25000 for an 80% confidence interval of future observations. This result can be compared with the scatter plot in Fig. 11, which shows a deviation in this range for PPs with the same OVL value. With 1 m as a failure criterion, the difference between A0 and A1 is very small (A2 is never the preferred pre-posterior alternative), which in general results in a small EVI. Since zero is often within the 80% confidence interval of future observations, the result indicates that additional sampling is not beneficial with 0.5 m as a failure criterion.

Even if the number of realizations for 0.5 m as the failure criterion is sufficient to detect a positive EVI, the values are low and no specific PP is a major contribution to the pre-posterior analysis. This is a result of a cause-effect chain (initiated by groundwater leakage and infiltration, groundwater drawdown in different layers in bedrock, and failure determined by drawdown in the soil layer) that is more dependent on the interactions in the whole system rather than specific features. For other problems, where both the cause and the effect are closely related, higher EVI values are expected at PPs connected to the studied phenomena. This situation is likely to occur in water supply studies where failure is related to extracted groundwater quantities, which are in return related to K in the pumped layer. Even if the EVIs are reliable in the presented case study, the highest values (with more than 300 unique Ɵ’s) are about SEK 50000. This means that cp must be higher than this value to create a positive ENV. In addition, the investigation method needs to be sufficiently accurate to detect the actual value of the PPs. The previous investigations in the case study are both more expensive and quite inaccurate (see the Appendix), meaning that none of these conditions are assumed to be met. Low ENV values do not mean that the model is unreliable, more that additional information is not expected to be worthwhile. Furthermore, it implies that the recommendation in the prior analysis is sufficient. Since the EVI is calculated for one investigation at a time (a limitation of the method presented), it is possible that combinations of several investigations would produce another result.

Despite large variations in parameter values and water balance, the different realizations indicate low failure rates. The main reason for this is that the confined aquifer is very conductive and has a large groundwater flow relative to the bedrock layers and inflow into the tunnel. Nevertheless, additional investigations do not need to be worthless if they are able to change the conceptual understanding of the system and not only parameter uncertainties that are addressed in the presented method.

Conclusion

This article presents a novel method for VOIA to assess the need for additional information when estimating the effect of multiple design alternatives on future disturbances of hydrogeological systems. The method presented here facilitates a spatial VOIA of hydrogeological information for multiple design alternatives, which to the knowledge of the authors has previously not been possible. With this method, the economic benefit of additional investigation is presented in maps, which can be an important decision support tool with regard to additional investigations. The case study results indicate low expected value of information because of low failure rates for the studied alternatives, and a complex cause-effect chain where the resulting failure probability is more dependent on the interactions within the whole system rather than on specific features. This result means that no additional investigation can be recommended at any specific location, and that the recommendation from the prior analysis is sufficient. It should be emphasized that this conclusion is site-specific and that the value of hydrogeological information in projects relating to groundwater drawdown-induced land subsidence is expected to exhibit a large degree of variation between different projects and different hydrogeological settings.

Future research on implementing the method in less complex cause-effect chains, where both the cause and the effect are more closely related than the presented case study, is recommended for calculation of the relationship between the ability of hydrogeological information to represent the failure criterion and the value of additional hydrogeological information. Furthermore, this ability is expected to improve with additional inverse calibrations. Finally, it is recommended to examine the possibility of calculating the economic benefits of combined investigation programmes and not only one investigation programme at a time.