1 Introduction

The rainy season, or rain band, of the East Asian summer monsoon is called the Baiu in Japan, the Mei-yu in China and the Changma in Korea. During this season, the rain band or rain front (the Baiu rain band) stagnates over the Yangtze River valley, with its eastern edge passing over the Japanese Islands (Ninomiya and Akiyama 1992). The onset and withdrawal dates of the Baiu season depend on location (Wang and Ho 2002). The main Baiu season in Japan and Korea starts in June and ends in July. During this period, very heavy precipitation events occur frequently and often lead to natural disasters. Future change in precipitation and its intensity is therefore a critical issue for people living in East Asia.

Dai (2006) investigated the reproducibility of precipitation intensity simulated by the Atmosphere–Ocean General Circulation Models (AOGCMs) that participated in the Fourth Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC 2007). These models are also called the Coupled Model Intercomparison Project phase 3 (CMIP3) models. He found that the models underestimate the frequency of heavy rainfall and overestimate the frequency of light rainfall. Although he examined precipitation intensity as well as precipitation climatology from a variety of viewpoints, the analysis was based on annual means from a global perspective. In addition, only four AOGCMs were analyzed, because few models provided daily precipitation output at the time of the analysis. Tu et al. (2009) investigated the reproducibility of extreme precipitation in 25 CMIP3 models, focusing on China. They found that the models reproduce intense precipitation reasonably well in northern China but underestimate it in southern China. Their results are based on annual statistics, and the target domain is restricted to the land area of China.

The future projections by the Multi-Model Dataset (MMD) of AOGCMs in IPCC (2007) show that precipitation increases over East Asia in all seasons by the end of the twenty-first century. Kripalani et al. (2007) investigated the future change of precipitation in the summertime East Asian rainy season with 22 CMIP3 models. They found a significant increase of precipitation over Korea, Japan and north China. IPCC (2007) gives no specific description of the change in precipitation intensity for the East Asian rainy season. Based on projections by a specific AOGCM called MIROC, which participated in IPCC (2007), Kimoto et al. (2005) reported that precipitation intensity increases around Japan in summer (June to August).

One of the major sources of uncertainty in AOGCM simulations is modeling uncertainty. A Multi-Model Ensemble (MME) average can be expected to outperform individual models in present-day climate simulations (Lambert and Boer 2001; Gleckler et al. 2008; Reichler and Kim 2008) as well as in seasonal forecasts (Palmer et al. 2004; Hagedorn et al. 2005). For global warming projections, IPCC (2007) summarizes model performance and future climate change using the MME approach. Kimoto (2005) and Kripalani et al. (2007) also applied the MME approach to the evaluation of CMIP3 models and their future projections for the East Asian summer monsoon. They found a significant increase of precipitation over most of East Asia. Li et al. (2011) investigated future change in precipitation extremes in July and August over China using an MME of 24 CMIP3 models. They found an increase of extreme precipitation over the land area of China. All these studies use a simple (unweighted) MME average in which all models are treated equally.

Giorgi and Mearns (2002) introduced weights into the MME for assessing regional climate change. They defined the model weights as a measure of a model’s ability to simulate the observed climate. Applying this weighted MME method to the Coupled Model Intercomparison Project phase 2 (CMIP2) models, Min et al. (2004) found an increase of precipitation in the East Asian summer monsoon. Applying a similar weighted MME method to CMIP3 models, Kitoh and Uchiyama (2006) also reported an increase of precipitation in the Baiu rain band. However, the change in precipitation intensity in the East Asian summer monsoon has not yet been investigated with the weighted MME method for CMIP3 models.

Reproducing the intense precipitation of the summertime East Asian rainy season requires a model with higher horizontal resolution (Kusunoki et al. 2006). Using a 20-km mesh atmospheric global model, Kusunoki et al. (2006) and Kusunoki and Mizuta (2008) showed that the intensity of summertime precipitation will increase in the future over East Asia. Feng et al. (2011) investigated the change in precipitation intensity over China with a 40-km mesh atmospheric global model. They found a significant increase of extreme precipitation over southeastern China. Although higher-resolution models reproduce intense rainfall well, they are computationally so expensive that the number of studies is too limited to draw reliable and robust conclusions.

The purpose of this paper is to investigate future changes in precipitation intensity in the East Asian summer monsoon with a weighted MME average approach using CMIP3 models. The target region covers Japan, Korea and part of China, including the ocean areas of East Asia.

Section 2 contains a brief description of the models and dataset. Section 3 verifies the precipitation climatology and precipitation intensity in the present-day climate simulations. Section 4 shows the future change in precipitation intensity as well as precipitation climatology. Section 5 discusses the reliability of the future projections. The paper is concluded in Sect. 6.

2 Models

In response to a proposed activity of the World Climate Research Programme’s (WCRP’s) Working Group on Coupled Modeling (WGCM), the CMIP3 model outputs are archived at the Program for Climate Model Diagnosis and Intercomparison (PCMDI). This dataset is called the “WCRP CMIP3 multi-model dataset” (http://www-pcmdi.llnl.gov/ipcc/about_ipcc.php), which includes climate model output for IPCC AR4. In this paper, we refer to the models included in the WCRP CMIP3 multi-model dataset as the CMIP3 models.

The models and data used in this study are listed in Table 1. These models are a subset of the CMIP3 models. We selected only the models that archived daily precipitation data, and we excluded models with a 360-day calendar (30 days in every month). The horizontal resolution of the models at 35°N ranges from about 450 km (G23) to about 100 km (T106). For the present-day climate, we used the Twentieth Century Climate in Coupled Models (20C3M) simulations. The target period for the present-day climate simulations is the 10 years from 1991 to 2000 at the end of the twentieth century. Some models cover only 8 or 9 years, because their simulations end in 1998 or 1999. Model climatology is generally evaluated using averages over 20–30 years. However, because we focus on precipitation intensity, we have to use daily observed precipitation data, which start in 1997. This is why we limit the target period to the last 10 years of the twentieth century. For the future climate simulations, the target period is the 10 years from 2091 to 2100 in the A1B emission scenario projections. Two models cover 8 years, because their simulations end in 2098. The target months are June and July, because most of the precipitation and intense precipitation in the rainy season over Japan and Korea is concentrated in these months. We selected the area (110–150°E, 20–50°N) as our analysis domain, because the Baiu rain band stagnates over this area in June and July.

Table 1 Specifications of daily precipitation data provided by 15 CMIP3 models used in this study

3 Present-day climate simulations

3.1 Verification data

To verify the simulated precipitation, we used the One-Degree Daily (1DD) data of GPCP V1.1 compiled by Huffman et al. (2001). The horizontal resolution is one degree in longitude and latitude, corresponding to a grid spacing of about 90 km over Japan. The data cover 12 years from 1997 to 2008. Dai (2006) used the daily precipitation data from the Tropical Rainfall Measuring Mission (TRMM, http://trmm.gsfc.nasa.gov) to verify the models. However, we did not use these data, because their coverage is restricted to 37.5°S–37.5°N and therefore does not include the whole of our target region over East Asia.

3.2 Index of precipitation intensity

We used the Simple Daily precipitation Intensity Index (SDII) of Frich et al. (2002). SDII is defined as the total precipitation in June and July divided by the number of rainy days (precipitation ≥ 1 mm/day). If there is no rainy day at a grid point, we assigned a missing flag to that grid point. SDII is widely used in model studies such as Dai (2006) and Chapter 10, “Global Climate Projections”, of IPCC (2007).

To evaluate the uncertainty arising from the choice of metric for precipitation intensity, we introduced a second precipitation intensity index: the number of heavy rain days (precipitation ≥ 30 mm/day) in June and July (R30).
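
The two indices can be written down compactly. The following is a minimal sketch for a single grid point, assuming the input is a plain list of daily precipitation values (mm/day) for June and July of one year. The function names `sdii` and `r30` are illustrative and do not come from the original analysis, and NaN stands in for the missing flag.

```python
import math

RAINY_DAY_MM = 1.0    # rainy-day threshold used in the SDII definition (mm/day)
HEAVY_DAY_MM = 30.0   # heavy-rain threshold used for R30 (mm/day)


def sdii(daily_precip):
    """Total precipitation divided by the number of rainy days.

    Returns NaN (standing in for the missing flag) when no day reaches
    the rainy-day threshold.
    """
    rainy_days = sum(1 for p in daily_precip if p >= RAINY_DAY_MM)
    if rainy_days == 0:
        return math.nan
    return sum(daily_precip) / rainy_days


def r30(daily_precip):
    """Number of days with precipitation of at least 30 mm/day."""
    return sum(1 for p in daily_precip if p >= HEAVY_DAY_MM)


# Toy series for illustration only (not observed or simulated data).
toy = [0.0, 0.5, 2.0, 40.0, 10.0]
print(sdii(toy), r30(toy))
```

Following the wording of the definition above, the numerator here is the total over all days, including days below the 1 mm/day threshold. A variant that sums only rainy-day precipitation would differ slightly.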

3.3 Precipitation climatology for June to July

Before investigating precipitation intensity, we first verified the precipitation climatology. Figure 1 compares the simulated precipitation climatology with observation for June to July in the present-day climate simulations. We calculated the skill score S proposed by Taylor (2001) to evaluate each model’s reproducibility of the observed climatology. S is defined by

$$ S = \frac{4(1 + R)}{\left( \sigma + 1/\sigma \right)^{2} \left( 1 + R_{0} \right)} $$

where R is the spatial correlation coefficient between observation and simulation, σ is the spatial standard deviation of the simulation divided by that of the observation, and R₀ is the maximum attainable correlation. Here we assumed R₀ = 1. S thus evaluates both the spatial correlation coefficient and the spatial standard deviation. Model data were interpolated to the 1-degree grid of GPCP 1DD. In Fig. 1, the values of root mean square (RMS) error, R and S are shown for each model. Most models underestimate the amount of precipitation over China, Korea and Japan. PCM_T42 (o) lacks the Baiu rain band, resulting in the largest RMS error and the only negative spatial correlation coefficient among all models. In contrast, MIROC_T106 (l) simulates the Baiu rain band well, but it overestimates precipitation over southern China and Taiwan.
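
As an illustration of how these statistics relate, here is a minimal sketch that computes the bias, RMS error, R, σ and S for two fields flattened to equal-length lists on a common grid. It assumes unweighted grid points, because the text does not say whether area weighting was applied. The names `pattern_statistics` and `taylor_skill` are hypothetical.

```python
import math


def pattern_statistics(sim, obs):
    """Return (bias, rms_error, R, sigma) for two fields on the same grid.

    sigma is the spatial standard deviation of the simulation divided by
    that of the observation. Grid points are treated as equally weighted,
    which is an assumption of this sketch.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    var_s = sum((s - mean_s) ** 2 for s in sim) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    rms = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / n)
    r = cov / math.sqrt(var_s * var_o)
    sigma = math.sqrt(var_s / var_o)
    return mean_s - mean_o, rms, r, sigma


def taylor_skill(r, sigma, r0=1.0):
    """Skill score S = 4(1 + R) / ((sigma + 1/sigma)^2 (1 + R0))."""
    return 4.0 * (1.0 + r) / ((sigma + 1.0 / sigma) ** 2 * (1.0 + r0))


# Toy fields for illustration only: the simulation equals the observation
# plus a constant offset of 1.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
bias, rms, r, sigma = pattern_statistics(sim, obs)
print(bias, rms, taylor_skill(r, sigma))
```

In the toy case a constant offset changes the bias and the RMS error but leaves R, σ and therefore S unchanged, since S is built only from centered statistics.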

Fig. 1 Observed (top panel) and simulated (a–o) precipitation climatology for June to July. The unit is mm/day. The observation is the average from 1997 to 2008 (12 years) of the GPCP 1DD V1.1 dataset (Huffman et al. 2001). Most model simulations are the average from 1991 to 2000 (10 years). The values of root mean square (RMS) error, spatial correlation coefficient (R) and skill score S by Taylor (2001) verified against observation are shown at the right of each panel. p Simple average of all 15 models (MME15). q S-weighted average of all 15 models (MME15W). r Simple average of the five best models based on S (MME05; models e, f, j, l, n). s S-weighted average of the five best models based on S (MME05W)

The use of a multi-model ensemble (MME) is effective in reducing the errors and uncertainties of individual models (Giorgi and Mearns 2002; Min et al. 2004). We calculated a simple average of all 15 models (MME15), an S-weighted average of all 15 models (MME15W), a simple average of the five best models based on S (MME05), and an S-weighted average of the five best models based on S (MME05W). The five best models are MIROC_T106 (l, S = 0.863), CNRM_T42 (e, 0.765), GISS-AOM_G29 (j, 0.765), MRI_T42F (n, 0.761) and CSIRO_T63 (f, 0.730). The precipitation climatology of the MMEs is shown in Fig. 1p–s. The reproducibility of MME05W (s) is higher than that of any other MME (p, q, r).
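
The four ensemble averages differ only in which models enter and how they are weighted. The sketch below shows one way to express this, assuming each model field is a list of grid-point values on a common grid and that the weights are simply proportional to S. The helper names are illustrative, not the authors’ code.

```python
def weighted_mean_field(fields, weights=None):
    """Grid-point-wise weighted mean of model fields.

    fields  : list of equal-length lists, one per model
    weights : one non-negative weight per model; None gives a simple average
    """
    if weights is None:
        weights = [1.0] * len(fields)
    total = sum(weights)
    npts = len(fields[0])
    return [sum(w * f[i] for w, f in zip(weights, fields)) / total
            for i in range(npts)]


def best_models(skill, n_best=5):
    """Names of the n_best models with the highest skill score."""
    return sorted(skill, key=skill.get, reverse=True)[:n_best]


# Skill scores S of the five best models for precipitation climatology,
# as quoted in the text (keys are the panel letters).
s_top5 = {"l": 0.863, "e": 0.765, "j": 0.765, "n": 0.761, "f": 0.730}
total = sum(s_top5.values())
normalized = {name: s / total for name, s in s_top5.items()}
print(normalized)
```

With these five S values the normalized weights range from about 0.19 to about 0.22, close to the equal weight of 0.20, so under this assumed weighting an S-weighted average of the five best models cannot differ much from their simple average.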

Figure 2 quantifies and visualizes the skill of the models. Since the Taylor diagram (Taylor 2001) is based on the bias-corrected RMS difference, we also plotted the bias and RMS error in Fig. 2a for a more detailed evaluation of model performance. ‘Bias’ is defined as the domain-averaged difference between the model climatological value and the observed climatological value. The underestimation of precipitation appears as a negative bias in Fig. 2a. The highest performance of MIROC_T106 (l) is evident from the Taylor diagram in Fig. 2b in terms of the skill score S (contour).
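
For reference, the link between the quantities in the two panels can be stated explicitly. The notation here follows Taylor (2001) rather than this paper and is an assumption of this note: E is the RMS error, Ē the bias, E′ the bias-corrected (centered) RMS difference, and σ_f and σ_r the spatial standard deviations of the simulated and observed fields, so that σ = σ_f/σ_r.

$$ E^{2} = \bar{E}^{2} + E'^{2}, \qquad E'^{2} = \sigma_{f}^{2} + \sigma_{r}^{2} - 2\sigma_{f}\sigma_{r}R $$

The Taylor diagram displays only E′, σ_f/σ_r and R. The bias therefore has to be shown separately, as in panel a.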

Fig. 2 Skill of precipitation climatology for June to July simulated by models verified against the GPCP 1DD V1.1 data (Fig. 1, top panel). The target domain is the same as in Fig. 1 (110–150°E, 20–50°N). a Root mean square errors (RMSE) and biases. The unit is mm/day. The domain average of the observation is shown above the panel. b Taylor diagram for displaying pattern statistics (Taylor 2001). The standard deviation of the observation in the domain is shown above the panel. The contour shows the measure of skill “S”, which evaluates both the standard deviation and the correlation coefficient. MME15 denotes a simple average of 15 models. MME15W denotes an S-weighted average of 15 models. MME05 denotes a simple average of the five best models evaluated by S. MME05W denotes an S-weighted average of the five best models

Skill scores for the MMEs are also plotted in Fig. 2. The RMS errors of MME05 and MME05W are smaller than those of all individual models. The RMS errors of MME15 and MME15W (mark +) are comparable to those of the most skillful individual models, but they are larger than the RMS errors of MME05 and MME05W (mark x). The advantage of MME05 and MME05W over MME15 and MME15W is reasonable, because poorly performing models are excluded from MME05 and MME05W. For bias, the advantage of the multi-model ensemble is not as clear as for RMS error. The RMS error and bias of the MMEs with weights (red mark x) are slightly smaller than those without weights (black mark x).

In terms of skill S (Fig. 2b), the skill of MME05 and MME05W is higher than that of MME15 and MME15W. The skill S of the MMEs with weights (red mark) is higher than that of the MMEs without weights (black mark). These tendencies are generally similar to those for RMS error and bias (Fig. 2a), but the MMEs cannot outperform the most skillful model, MIROC_T106 (l).

The advantage of the weighted MME over the simple average with equal weights is not as remarkable as we might expect. Following the definition of Taylor (2001), we calculated the skill S after removing model biases. As a result, the differences in skill S among models are much smaller than the differences in their original performance.

Inoue and Ueda (2009) showed the advantage of the CMIP3 MME over individual models in simulating observed summertime precipitation over a broad area of East Asia (40–160°E, 20°S–50°N). The advantage of the MME is less striking in our study, possibly because our domain (110–150°E, 20–50°N) is smaller and specific to the Baiu rain band. The smoothing effect of the MME might blur the concentrated small-scale structure of the Baiu rain band.

3.4 Precipitation intensity for June to July

Figure 3 compares the simulated SDII with observation for June to July in the present-day climate simulations. Most models underestimate intense precipitation over China and East Asia Sea. MIROC_T106 (l) shows the best performance among all the models, but it still underestimates precipitation intensity. The SDII distributions of the MMEs are shown in Fig. 3p–s. The five best models are MIROC_T106 (l, S = 0.769), MRI_T42F (n, 0.530), MIROC_T42 (m, 0.409), CSIRO_T63 (f, 0.383) and CNRM_T42 (e, 0.344). Note that the five best models for SDII differ slightly from those for precipitation climatology: GISS-AOM_G29 is among the five best models for precipitation climatology but not for SDII, whereas MIROC_T42 is among the five best models for SDII but not for precipitation climatology. The reproducibility of MME05W (s) is higher than that of any other MME (p, q, r).

Fig. 3 Same as Fig. 1, but for the Simple Daily precipitation Intensity Index (SDII) for June to July. The climatology is calculated only if SDII data exist for all years of the target period at each grid point. Unshaded regions denote missing data. The five best models based on S of SDII are models e, f, l, m and n

The underestimation of precipitation intensity appears as a negative bias in Fig. 4a. As for precipitation climatology in Figs. 1 and 2, the highest performance of MIROC_T106 (l) is evident from the Taylor diagram in Fig. 4b for SDII. For RMS error and bias (Fig. 4a), the advantage of the MMEs is not as clear as for precipitation climatology (Fig. 2a). In terms of skill S (Fig. 4b), the skill of MME05 and MME05W (mark x) is higher than that of MME15 and MME15W (mark +). The skill S of the MMEs with weights (red mark) is higher than that of the MMEs without weights (black mark), but the MMEs cannot outperform the most skillful model, MIROC_T106 (l). These tendencies are generally similar to those for precipitation climatology (Fig. 2b).

Fig. 4 Same as Fig. 2, but for SDII for June to July

The horizontal resolution of the atmospheric component of MIROC_T106 (l) is the highest among all 15 models. MIROC_T106 (l) shows the highest skill score S, but the second-highest-resolution model, CCSM_T85 (b), shows a very low skill score S (Fig. 4). Models with higher horizontal resolution (smaller grid spacing) tend to have relatively higher skill scores. The correlation coefficient between grid spacing and skill score S is −0.407, but this value is not statistically significant at the 95% level. The sample size of 15 models is too small to draw a reliable conclusion. This conclusion does not change if we use the RMS error instead of S.
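
The significance statement can be checked with a standard test. The sketch below assumes a two-sided Student’s t test on the correlation coefficient with n − 2 degrees of freedom. The text does not state which test was actually used, and the function names are illustrative.

```python
import math

# Two-sided 95% critical value of Student's t for 13 degrees of freedom
# (n = 15 models, so n - 2 = 13).
T_CRIT_95_DF13 = 2.160


def correlation_t_statistic(r, n):
    """t statistic for testing a sample correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n=15, t_crit=T_CRIT_95_DF13):
    """True if |t| exceeds the critical value (valid for n = 15 by default)."""
    return abs(correlation_t_statistic(r, n)) > t_crit


print(correlation_t_statistic(-0.407, 15), is_significant(-0.407))
```

Under this assumed test, r = −0.407 with 15 models gives t of about −1.61, inside the ±2.160 bounds, which is consistent with the statement that the correlation is not significant. The same test applied to the +0.636 correlation reported for Fig. 5 gives t of about 2.97, which exceeds the critical value.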

For a single model, some studies indicate that a higher-resolution version performs better than a lower-resolution version for East Asian summer monsoon precipitation. Kimoto et al. (2005) reported that the reproducibility of precipitation intensity by MIROC_T106 (l) is higher than that of MIROC_T42 (m), although their target months and precipitation intensity index differ from ours. Judging from Fig. 3, our results are consistent with those of Kimoto et al. (2005). Using several versions of a single atmospheric model with different horizontal resolutions, Kusunoki et al. (2006) showed that higher-resolution versions tend to improve the reproducibility of heavy precipitation in the Baiu rain band. Our result is qualitatively consistent with that of Kusunoki et al. (2006).

Figure 5 illustrates the relation between the skill for precipitation climatology and the skill for SDII. Models with higher reproducibility of precipitation climatology tend to show higher reproducibility of SDII. The correlation coefficient between the two skills is +0.636, which is statistically significant at the 95% level. This suggests that improving a model’s precipitation climatology itself is necessary for better reproducing intense precipitation.

Fig. 5 Relationship between reproducibility of precipitation climatology and precipitation intensity (SDII). The horizontal axis is the skill S of precipitation climatology. The vertical axis is the skill S of SDII. Open circles denote models with flux adjustment. The correlation coefficient between climatology skill and SDII skill is +0.636, which is statistically significant at the 95% level

In Fig. 5, models using flux adjustment are denoted by circles. Although MRI_T42F (n), which uses flux adjustment, shows the second-highest skill score, some models with flux adjustment show very low skill scores. The advantage of flux adjustment cannot be established because of the small sample size of four. The three best models based on the SDII skill score S are MIROC_T106 (l, S = 0.769), MRI_T42F (n, 0.530) and MIROC_T42 (m, 0.409), all of which use the Arakawa-Schubert (AS) scheme for deep convection (Table 1). This suggests some advantage of the AS scheme over other schemes. However, model b, which also uses the AS scheme, shows a very low skill score. Because the horizontal resolution differs among the fifteen models, the advantage of the AS scheme cannot be separated from the dependence of skill on horizontal resolution. As with flux adjustment, the small sample of models prevents us from drawing a definitive conclusion.

Using another precipitation intensity index, the number of heavy rain days (precipitation ≥ 30 mm/day) in June and July (R30), we repeated exactly the same calculations as in Figs. 3, 4 and 5. Qualitatively similar results were obtained for R30 (figure not shown).

4 Future climate simulations

4.1 Precipitation climatology

Figure 6 illustrates the future change in precipitation climatology. Precipitation around Japan increases in models a, b, c, g, i, l and m, whereas it decreases in models e and j. In general, the tendency for precipitation to increase over East Asia is stronger than the tendency to decrease, although the differences among models are large.

Fig. 6 Future changes (F: 2091–2100) in precipitation climatology for June to July relative to present-day climatology (P: 1991–2000). Change ratios (F − P)/P are shown (%). Red contours show the 95% confidence level based on Student’s t test. The skill score S for the present-day climate simulations is shown at the left of each panel

The reliability of projections might be improved if models are weighted according to a measure of their ability to simulate the observed climate. By reducing the biases and uncertainties of individual models, the MME average approach provides improved ‘best estimate’ projections (Giorgi and Mearns 2002; Min et al. 2004; Kitoh and Uchiyama 2006; IPCC 2010). Figure 7 shows the future change in precipitation climatology for four different MME averages. The simple average of all 15 models shows a statistically significant increase of precipitation over China and Japan (Fig. 7a). The distribution of precipitation change with skill S as the weighting factor (Fig. 7b) is very similar to that of the simple average (Fig. 7a). Figure 7c shows the precipitation change for the unweighted average of the five best models for reproducing the observed climate. A decrease of precipitation is found over central China and over Taiwan, but these changes are not statistically significant. The area of statistically significant increase is smaller than in Fig. 7a, b, because the number of degrees of freedom for five models is much smaller than that for 15 models. If we introduce weights into the ensemble average of the five best models (Fig. 7d), the distribution of precipitation change is almost the same as that without weights (Fig. 7c). The distributions of precipitation change for all models (Fig. 7a, b) are much smoother than those for five models (Fig. 7c, d) because of the larger ensemble size. The precipitation change projected by the top model, MIROC_T106 (Fig. 6l), shows a statistically significant increase over the northern part of Korea and the northern part of Japan (Hokkaido). The contribution of these changes to the weighted average in Fig. 7d is recognizable, owing to both the large change projected by MIROC_T106 and its large weighting factor.
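
The effect of ensemble size on significance can be illustrated with a short sketch. It assumes an unweighted one-sample Student’s t test on the across-model changes at one grid point. The test applied in the paper may be formulated differently (the caption of Fig. 7 notes that standard deviations are weighted by S), and all names and input values below are illustrative.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom:
# 4 for an ensemble of five models, 14 for an ensemble of fifteen.
T_CRIT_95 = {4: 2.776, 14: 2.145}


def change_ratio_percent(future, present):
    """Change ratio (F - P) / P expressed in percent."""
    return 100.0 * (future - present) / present


def ensemble_t(changes):
    """One-sample t statistic testing the ensemble-mean change against zero."""
    n = len(changes)
    mean = sum(changes) / n
    var = sum((c - mean) ** 2 for c in changes) / (n - 1)
    return mean / math.sqrt(var / n)


def is_significant(changes):
    """Compare |t| with the critical value for len(changes) - 1 degrees of freedom."""
    return abs(ensemble_t(changes)) > T_CRIT_95[len(changes) - 1]


# Toy changes (%) from five hypothetical models, for illustration only.
toy_changes = [2.0, 4.0, 6.0, 8.0, 10.0]
print(ensemble_t(toy_changes), is_significant(toy_changes))
```

The critical value rises from 2.145 for 15 models to 2.776 for five, so the same mean change and spread is less likely to pass the test with the smaller ensemble.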

Fig. 7 Comparison of future changes in precipitation climatology among different ensemble average methods. a The simple ensemble average of all fifteen models. b The S-weighted ensemble average of all fifteen models. The skill score S is based on the present-day climate simulations (Fig. 1). c The simple ensemble average of the five best models: MIROC_T106 (l, S = 0.863), CNRM_T42 (e, 0.765), GISS-AOM_G29 (j, 0.765), MRI_T42F (n, 0.761), CSIRO_T63 (f, 0.730). d The S-weighted ensemble average of the five best models. Red contours show the 95% confidence level based on Student’s t test. In the statistical significance calculation of Student’s t test, standard deviations are also weighted by S

4.2 Precipitation intensity

Figure 8 shows the future change in precipitation intensity measured by SDII. For individual models, the distribution of the change in SDII is qualitatively similar to that of precipitation climatology (Fig. 6) in most cases. The increase of SDII projected by CSIRO_T63 (Fig. 8f) is much larger than that of the other models. We found that this is due to a much larger reduction of rainy days, by 30–40%, than in the other models. This leads to a very large increase of SDII, because SDII is inversely proportional to the number of rainy days. However, it is not easy to identify why CSIRO_T63 projects far fewer rainy days than the other models.

Fig. 8 Same as Fig. 6, but for SDII

MME averages for SDII are shown in Fig. 9. The ensemble averages of all models show a statistically significant increase of precipitation intensity over almost the whole domain (Fig. 9a, b). The difference between the average without weights (Fig. 9a) and with weights (Fig. 9b) is small. Changes in SDII for the five best models also show an increase of precipitation intensity over almost the whole domain (Fig. 9c, d), but the area of statistically significant change is smaller than that for the all-model averages (Fig. 9a, b). Note that the five best models for SDII (Fig. 9) are not the same as those for precipitation climatology (Fig. 7). The contribution of the change projected by MIROC_T106 (Fig. 8l) is evident over the East China Sea and Japan in Fig. 9d, as in the case of precipitation climatology in Fig. 7d. For the five best models, the difference between the average without weights (Fig. 9c) and with weights (Fig. 9d) is also small, as for all models (Fig. 9a, b). The distributions of change for all models (Fig. 9a, b) are much smoother than those for five models (Fig. 9c, d) because of the larger ensemble size. In summary, we conclude that precipitation intensity will increase over almost all regions of East Asia in the rainy season. Our results are qualitatively consistent with projections by high-resolution atmospheric models (Kusunoki et al. 2006; Kusunoki and Mizuta 2008).

Fig. 9 Same as Fig. 7, but for SDII. The five best models are MIROC_T106 (l, S = 0.769), MRI_T42F (n, 0.530), MIROC_T42 (m, 0.409), CSIRO_T63 (f, 0.383) and CNRM_T42 (e, 0.344)

Figure 10 shows the dependence of the MME average of SDII change on the skill measure used as weights. When all fifteen models are used (Fig. 10a, c, e, g), the dependence on the skill measure is very small, and SDII increases over almost all regions of East Asia. When the five best models for each skill measure are used (Fig. 10b, d, f), the differences in SDII change are larger than those among the all-model averages (Fig. 10a, c, e, g). This is due to the different choice of the five best models for each skill measure and to the small number of models being averaged. Nevertheless, SDII increases over almost all regions of East Asia, with statistically significant increases over China, Korea and Japan.
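
The three weighting choices compared in Fig. 10 can be summarized in a few lines. The sketch below uses the weight definitions given in the caption of Fig. 10 (S itself, 1/RMS, and (R + 1)/2). The normalization step and the function names are assumptions made for illustration.

```python
def weight_from_skill(measure, value):
    """Raw model weight for a given skill measure.

    "S"   : the skill score itself
    "RMS" : inverse of the RMS error
    "R"   : (R + 1) / 2, which maps a correlation in [-1, 1] onto [0, 1]
    """
    if measure == "S":
        return value
    if measure == "RMS":
        return 1.0 / value
    if measure == "R":
        return (value + 1.0) / 2.0
    raise ValueError("unknown skill measure: " + measure)


def normalized_weights(measure, values):
    """Weights scaled to sum to one (the scaling is an assumption of this sketch)."""
    raw = [weight_from_skill(measure, v) for v in values]
    total = sum(raw)
    return [w / total for w in raw]


# Toy RMS errors (mm/day) for three hypothetical models, illustration only.
print(normalized_weights("RMS", [1.0, 2.0, 4.0]))
```

All three mappings give larger weights to better-performing models, and the (R + 1)/2 form keeps the weight non-negative even for a model with a negative spatial correlation.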

Fig. 10 Dependence of the MME average of SDII change on the skill measure used as weights. Red contours show the 95% confidence level. a The S-weighted ensemble average of all fifteen models. Same as Fig. 9b. b The S-weighted ensemble average of the five best models (models e, f, l, m, n in Table 1 and Fig. 3). Same as Fig. 9d. c Same as (a) but for root mean square (RMS) error. Weights are given by 1/RMS. d Same as (b) but for RMS error. The five best models are g, k, l, m and n. e Same as (a) but for spatial correlation coefficient (R). Weights are given by (R + 1)/2. f Same as (b) but for R. The five best models are e, f, k, l and m. g The simple ensemble average of all fifteen models

To investigate the sensitivity of the results to the choice of metric for precipitation intensity, we repeated the calculations of Fig. 10 using the number of heavy rain days, R30. Different metrics give different rankings of models (Gleckler et al. 2008), but the selected five best models are largely the same as in the SDII case. Figure 11 shows the dependence of the MME average of R30 change on the skill measure used as weights. When all fifteen models are used (Fig. 11a, c, e, g), the dependence on the skill measure is very small, and R30 increases over almost all regions of East Asia. Compared with the SDII case (Fig. 10a, c, e, g), the statistically significant area to the south of Japan is smaller in the R30 case (Fig. 11a, c, e, g). When the five best models are used (Fig. 11b, d, f), the increase in precipitation intensity is not statistically significant over southern Japan and to the south of Japan. In general, the area of statistically significant increase of precipitation intensity measured by R30 (Fig. 11) is much smaller than that measured by SDII (Fig. 10).

Fig. 11 Same as Fig. 10, but for the number of heavy rain days (precipitation ≥ 30 mm/day) in June and July (R30). b The five best models for S are e, g, l, m and n in Table 1. d The five best models for RMS are e, g, l, m and n. f The five best models for R are a, e, l, m and n

5 Discussion

5.1 Contributions to SDII change

Both the change in total precipitation and the change in the number of rainy days contribute to the future change of SDII, since SDII is defined as total precipitation divided by the number of rainy days. We estimated the relative contributions of these two changes to the total SDII change. In Fig. 12, the left column shows the SDII change projected by the five best models for reproducing precipitation climatology. The center column shows the contribution of the change in total precipitation: the future SDII was calculated from the future climatology of total precipitation, assuming that the future number of rainy days is the same as in the present-day climate. The right column shows the contribution of the change in rainy days: the future SDII was calculated from the future climatology of rainy days, assuming that the future total precipitation is the same as in the present-day climate. For MIROC_T106 (a, f, k), the spatial distribution and amplitude of the SDII change in (f) are almost the same as in (a), while the spatial distribution in (k) is the opposite of that in (f), with weaker amplitude. This suggests that the contribution of total precipitation is much larger than that of rainy days, and that the contribution of rainy days counteracts that of total precipitation. Note that the spatial distribution of (f) coincides with that of the change in precipitation climatology itself (Fig. 6l), except for amplitude. The spatial distribution of the change in rainy days of MIROC_T106 is qualitatively similar to (k) with opposite sign, because SDII is inversely proportional to the number of rainy days. For CNRM_T42 (b, g, l), GISS-AOM_G29 (c, h, m) and MRI_T42F (d, i, n), the contribution of total precipitation is also larger than that of rainy days. In contrast, for CSIRO_T63 (e, j, o), the contribution of rainy days dominates over that of total precipitation. This is because CSIRO_T63 projects a much larger reduction of rainy days (30–40%) than the other models.
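
The decomposition described above can be sketched for a single grid point as follows. The function name and the input values are illustrative. The only assumption is the definition of SDII as total precipitation divided by the number of rainy days.

```python
def sdii_change_contributions(total_p, days_p, total_f, days_f):
    """Percent SDII change and its two single-factor contributions.

    total_p, days_p : present-day total precipitation and rainy-day count
    total_f, days_f : future total precipitation and rainy-day count
    Returns (full change, change from total precipitation only,
             change from rainy days only), all in percent.
    """
    sdii_p = total_p / days_p
    full = (total_f / days_f) / sdii_p - 1.0
    by_total = (total_f / days_p) / sdii_p - 1.0   # rainy days held at present
    by_days = (total_p / days_f) / sdii_p - 1.0    # total held at present
    return 100.0 * full, 100.0 * by_total, 100.0 * by_days


# Toy numbers for illustration only: unchanged total, 30% fewer rainy days.
print(sdii_change_contributions(300.0, 30.0, 300.0, 21.0))
```

Because SDII is inversely proportional to the number of rainy days, a 30–40% reduction in rainy days with unchanged total precipitation would by itself raise SDII by roughly 43–67%. The two single-factor contributions combine multiplicatively rather than additively: (1 + full) equals (1 + by_total)(1 + by_days) when the changes are written as fractions.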

Fig. 12 Contribution of future changes in precipitation climatology (f–j) and rainy days (k–o) to changes in SDII (a–e). The five best models for reproducing precipitation climatology (Fig. 7) are selected: MIROC_T106 (a, f, k, S = 0.863), CNRM_T42 (b, g, l, 0.765), GISS-AOM_G29 (c, h, m, 0.765), MRI_T42F (d, i, n, 0.761), CSIRO_T63 (e, j, o, 0.730). a–e SDII change, same as in Fig. 8. f–j SDII changes calculated only from the future climatology of total precipitation. k–o SDII changes calculated only from the future climatology of rainy days

5.2 Interpretation of SDII change

In order to interpret the future changes in precipitation climatology and SDII, we have calculated the change in the horizontal transport of moisture. In Fig. 13, the right column shows the change in vertically integrated water vapor flux and its convergence. The selected models are the same as in Fig. 10. In the case of MIROC_T106 (a, f, k), the spatial distributions of (a) and (f) and of the convergence in (k) are qualitatively similar. This means that the changes in precipitation climatology and SDII can be interpreted as a change in moisture convergence associated with the change in the horizontal transport of moisture. Considering that the future change in rainy days is small (Fig. 12k), it is reasonable to attribute the increase in daily precipitation to the enhancement of moisture transport. The clockwise water vapor flux change in (k) to the south of Japan is due to the intensification of the subtropical high (figure not shown). A similar interpretation also applies to GISS-AOM_G29 (c, h, m) and MRI_T42F (d, i, n). The intensification of the subtropical high is often projected by CMIP3 models (Kimoto et al. 2005; Kripalani et al. 2007) and by atmospheric models with higher horizontal resolution (Kusunoki et al. 2006, 2011; Kusunoki and Mizuta 2008).

Fig. 13

Comparison among changes in precipitation climatology (a–e), SDII (f–j) and vertically integrated water vapor flux (k–o; arrows, kg/m/s) and its convergence (k–o; shading, mm/day). Selected models are the same as in Fig. 12. The unit of convergence is converted to mm/day assuming a liquid water density of 1 g cm⁻³. Note that the displayed region is extended toward the south and east by 10° compared with Figs. 6 and 8 to cover the subtropical high area. Hotelling’s T² statistic (Storch and Zwiers 1999; Wilks 2011) was applied to test the statistical significance of the vector change in (k–o). Red contours and red arrows indicate the 95% confidence level

In the cases of CNRM_T42 (b, g, l) and CSIRO_T63 (e, j, o), the changes in precipitation climatology and SDII can also be interpreted as a change in moisture convergence, but the striking difference from the previous three models is the weakening of the subtropical high, which is associated with an anticlockwise vapor flux change (l, o). Li et al. (2011) have indicated that GFDL_G47 projects an anticlockwise circulation change over the western Pacific Ocean, which leads to an increase in extreme precipitation over China.
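As an illustration of the convergence diagnostic and of the unit conversion mentioned in the caption of Fig. 13, the following Python sketch computes the convergence of a vertically integrated water vapor flux on a regular latitude-longitude grid. It is only a sketch under stated assumptions: the function name, the array layout (latitude, longitude), the Earth radius value and the use of centered finite differences via `numpy.gradient` are choices made here, not details taken from the analysis above.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6    # assumed mean Earth radius (m)
SECONDS_PER_DAY = 86400.0

def flux_convergence_mm_per_day(qx, qy, lat_deg, lon_deg):
    """Convergence of vertically integrated vapor flux, in mm/day.

    qx, qy  : eastward and northward flux (kg/m/s), shape (n_lat, n_lon)
    lat_deg : latitudes in degrees, shape (n_lat,)
    lon_deg : longitudes in degrees, shape (n_lon,)
    """
    lat = np.deg2rad(lat_deg)
    lon = np.deg2rad(lon_deg)
    cos_lat = np.cos(lat)[:, None]
    # Divergence in spherical coordinates:
    # div Q = [d(qx)/d(lon) + d(qy cos(lat))/d(lat)] / (a cos(lat))
    dqx_dlon = np.gradient(qx, lon, axis=1)
    dqycos_dlat = np.gradient(qy * cos_lat, lat, axis=0)
    divergence = (dqx_dlon + dqycos_dlat) / (EARTH_RADIUS_M * cos_lat)  # kg/m^2/s
    # With a liquid water density of 1 g/cm^3 (1000 kg/m^3), 1 kg/m^2 of water
    # corresponds to a depth of 1 mm, so kg/m^2/s times 86400 gives mm/day.
    return -divergence * SECONDS_PER_DAY

# Idealized check: a flux field built to have a uniform divergence of 1e-5 kg/m^2/s.
lat_deg = np.arange(20.0, 50.1, 2.5)
lon_deg = np.arange(110.0, 150.1, 2.5)
k = 1.0e-5
qx = k * EARTH_RADIUS_M * np.cos(np.deg2rad(lat_deg))[:, None] * np.deg2rad(lon_deg)[None, :]
qy = np.zeros_like(qx)
conv = flux_convergence_mm_per_day(qx, qy, lat_deg, lon_deg)
print(conv.mean())  # approximately -0.864 mm/day, i.e. divergence
```

Positive values indicate moisture convergence, which is the sign convention assumed here for comparison with precipitation changes.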

5.3 Reliability of future projections

The reliability of the future climate projected by models is often assessed according to their ability to reproduce the observed climate. An alternative method to infer the reliability of future projections was proposed by Whetton et al. (2007) and Abe et al. (2009). They evaluated the reliability of future projections using the inter-model similarity in both present-day and future climate simulations, assuming that models which are more similar to one another in present-day climate simulations also respond similarly in future climate simulations. This approach corresponds to the perfect model method recommended by IPCC (2010).

Figure 14 shows scatter plots of the inter-model spatial correlation coefficients for the present-day climate against those for the future climate. In the case of precipitation climatology (Fig. 14a), we first calculated the spatial correlation coefficient between the present-day climatology of one model and that of another model. We then repeated this calculation for all 105 (= 15C2) possible pairs among the 15 models. Finally, we made the same calculation for the future climatology. In Fig. 14a, pairs of models with high similarity (high correlation coefficient) in the present-day climate simulation tend to show high similarity (high correlation coefficient) in the future climate simulation. The correlation coefficient r between the coefficients for the present-day simulation and those for the future simulation, which we here refer to as the ‘present-future correlation coefficient’, is 0.802. It is noteworthy that the inter-model correlation coefficients among the five best models (red mark x) are higher than those of the remaining pairs for both the present-day climate and the future climate. In other words, the red marks are located much closer to the top-right corner of the panel (Fig. 14a) than the black marks are. This means that skillful models tend to show relatively good agreement on the future spatial distribution of precipitation, which can be considered a kind of measure of the reliability of future projections.
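The three steps above can be sketched in Python as follows. The function names and the synthetic input fields are hypothetical. The sketch further assumes that all models are already on a common grid and that the spatial correlation is unweighted, which may differ from the actual calculation (for example, if area weighting was applied).

```python
from itertools import combinations
import numpy as np

def pairwise_spatial_correlations(fields):
    """Spatial correlation for every pair of models.

    fields : array of shape (n_models, n_lat, n_lon) on a common grid.
    Returns a 1-D array with n_models * (n_models - 1) / 2 coefficients.
    """
    flat = fields.reshape(fields.shape[0], -1)
    return np.array([
        np.corrcoef(flat[i], flat[j])[0, 1]
        for i, j in combinations(range(flat.shape[0]), 2)
    ])

def present_future_correlation(present, future):
    """Correlation between present-day and future inter-model similarity."""
    r_present = pairwise_spatial_correlations(present)
    r_future = pairwise_spatial_correlations(future)
    # Same pair ordering in both arrays, so they can be correlated directly.
    r = np.corrcoef(r_present, r_future)[0, 1]
    return r, r_present, r_future

# Synthetic stand-in for 15 model climatologies (illustration only).
rng = np.random.default_rng(0)
present = rng.normal(size=(15, 13, 17))
future = present + 0.3 * rng.normal(size=(15, 13, 17))
r, r_present, r_future = present_future_correlation(present, future)
print(len(r_present), r)  # 105 pairs for 15 models
```

Replacing `future` with the change `future - present` in the second call corresponds to the variant in which the change, rather than the absolute future climatology, is compared across models.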

Fig. 14

Relationship between the inter-model spatial correlation coefficients for the present-day climate and those for the future climate. The number of all possible pairs of models is 15C2 = 15 × 14/2 = 105. The target area is the same as in Fig. 1 (110–150°E, 20–50°N). The correlation coefficient r between the correlations for the present-day climate and those for the future climate is shown above each panel. a Precipitation climatology for the present-day and future climate simulations. Red marks denote the correlations among the five best models (Fig. 7c, d). b SDII for the present-day and future climate simulations. Red marks denote the correlations among the five best models (Fig. 9c, d). c Precipitation climatology for the present-day climate and its change ratio in the future climate simulations. d SDII for the present-day climate and its change ratio in the future climate simulations

In the case of SDII (Fig. 14b), the present-future correlation coefficient of 0.823 is higher than that of precipitation climatology (0.802, Fig. 14a). However, the similarity among skillful models (red mark x) is not as high as in the precipitation climatology case (Fig. 14a) for either the present-day climate or the future climate.

If we use the future change instead of the absolute value of the future climatology in the calculation of the inter-model correlation, the relation between the correlation coefficients for the present-day climate and those for the future climate disappears, as shown in Fig. 14c, d. In fact, the present-future correlation coefficients r are almost zero. This suggests that the present-future correlation coefficient for the change in precipitation climatology and SDII is a less effective metric for evaluating the reliability of future change than that for their absolute values. Models tend to show lower skill for a small local target area than for global and hemispheric domains. The effectiveness of the ‘present-future correlation coefficient’ as a metric for estimating the reliability of future projections was originally demonstrated for seasonal-average meteorological variables over global and hemispheric domains (Whetton et al. 2007; Abe et al. 2009). Their inter-model ensemble approach might have some limitations when applied to a local area such as East Asia.

In Fig. 14a, b, the inter-model spatial correlation coefficients for the present-day climate are positive. This suggests that the MME cannot be regarded as a random sample distributed around the observation. This distribution skewed toward positive values is a manifestation of similar bias patterns among the models (Figs. 2a, 4a). Knutti et al. (2010) claimed that the model spread spanned by the CMIP3 MME is too narrow and that the MME average does not always cancel errors, because the biases of the CMIP3 models are positively correlated. Our results are consistent with the findings of Knutti et al. (2010). Although Annan and Hargreaves (2010) stressed that the CMIP3 MME can be regarded as a statistically indistinguishable ensemble, the range spanned by the MME is not designed to sample uncertainties in a systematic way, partly because the models are not fully independent (IPCC 2010). The results of the present study therefore still include uncertainty originating from the sampling problem of the MME.

6 Conclusion

Simulations of the end of the twentieth century and projections for the end of the twenty-first century by CMIP3 models are analyzed to investigate the projected future change in precipitation intensity during the East Asian summer monsoon. The target months are June and July, which are the main rainy season over Japan and Korea.

In the present-day climate simulations, we have quantitatively evaluated the models’ reproducibility of precipitation climatology and precipitation intensity by calculating the bias, the root mean square error and the skill S proposed by Taylor (2001). Most models underestimate precipitation climatology and precipitation intensity over the East Asian region (110–150°E, 20–50°N). Based on S for precipitation climatology, the five best models are MIROC_T106 (S = 0.863), CNRM_T42 (0.765), GISS-AOM_G29 (0.765), MRI_T42F (0.761) and CSIRO_T63 (0.730). Based on S for SDII, the five best models are MIROC_T106 (S = 0.769), MRI_T42F (0.530), MIROC_T42 (0.409), CSIRO_T63 (0.383) and CNRM_T42 (0.344). The reproducibility of the MME average using the five best models is better than that using all models. Introducing weighting factors based on the reproducibility of observations improves the performance of the multi-model ensemble average. Nevertheless, the MME using the five best models with weighting factors cannot outperform the best model. Models that show high reproducibility for precipitation climatology also show high reproducibility for precipitation intensity.

In the future climate simulations, the MME using all models shows a statistically significant increase in precipitation intensity over most of East Asia. In the case of the MME with the five best models, precipitation intensity increases over most of East Asia, with stronger local features originating from the large change and the large weight of the best model. In particular, the contribution of the change in MIROC_T106 is evident over the East China Sea and Japan. The difference in geographical distribution between the multi-model ensembles with and without weights is small, both for the MME with all models and for the MME with the five best models. When another precipitation intensity index is introduced, namely the number of heavy rain days (precipitation ≥ 30 mm/day) in June and July (R30), the geographical distribution of the change in precipitation intensity is qualitatively similar to that for SDII, with a smaller area of statistically significant increase.

We have estimated the relative contributions of the changes in total precipitation and in rainy days to the total SDII change, because SDII is defined as total precipitation divided by the number of rainy days. In four of the five best models for reproducing precipitation climatology, the contribution of the change in total precipitation is larger than that of rainy days.

In order to interpret the future changes in precipitation climatology and SDII, we have calculated the change in the vertically integrated horizontal transport of moisture. The changes in precipitation climatology and SDII can be interpreted as a change in moisture convergence associated with the change in the horizontal transport of moisture. Clockwise moisture transport associated with the intensification of the subtropical high is found in three of the five best models for reproducing precipitation climatology, whereas the other two models show anticlockwise moisture transport associated with the weakening of the subtropical high.