1 Introduction

Seasonal forecasts provide estimates of statistics of weather on seasonal time scales placing them between conventional weather forecasts and climate projections. The physical basis for these forecasts comes from the effect of predictable seasonal-timescale signals arising from the ocean, and to a lesser extent the land surface, on the atmosphere (Palmer and Anderson 1994). Loosely speaking, the information we can get from seasonal forecasting is on the likelihood that the coming season will be wetter, drier, warmer or colder than the mean climatology for a given time of the year. To date, the reliability and the skill of seasonal forecasts varies considerably in the different regions of the world (e.g. Weisheimer and Palmer (2014)), moreover scores are subject to considerable sampling uncertainty. Nonetheless, recent efforts in improving seasonal forecast systems have shown a fairly consistent picture of the pattern of grid-point skill over the European region, including marked seasonal variation (Stockdale et al. 2018). In this context, some authors have reported how teleconnections may condition the prediction skill, as it happens with El Niño Southern Oscillation (ENSO) and the Madden–Julian oscillation (MJO) (Hendon et al. 2000). In particular, the reproduction of teleconnections in climate predictions may grant higher predictability (Miller and Wang 2019).

Weather regimes (WRs) are defined as persistent and recurrent atmospheric patterns that can last from a few days to two or three weeks at given geographic locations. They are generally used to study associated surface and atmospheric conditions in specific seasons and geographic domains (Michelangeli et al. 1995; Straus et al. 2007; Franzke et al. 2011; Hannachi et al. 2017). WRs are well established in the Euro-Atlantic (EAT) domain (Cassou 2004), and there is growing evidence of their importance in modulating European weather (Ferranti et al. 2015; Matsueda and Palmer 2018). Further, it has been shown that WRs can be associated to global teleconnections (Cassou 2008; Lin et al. 2009) making them suitable for determining the presence of global teleconnections and in assessing the prediction skill of seasonal forecast systems, henceforth called prediction systems.

EAT regimes have been used to study the atmospheric circulation and recurrent climate conditions of the Mediterranean region specifically (Ullmann et al. 2014; Zampieri et al. 2017). We focus our study of WRs on the Mediterranean region exclusively, as it has been done by Rojas et al. (2013), because they provide an effective way to look into seasonal variability in this region having identified four Weather Regimes: Meridional, Zonal, Anticyclonic, and Cyclonic.

The Mediterranean experiences an array of atmospheric circulation patterns whose dynamical interactions with the surface environment have high spatial variability. In particular the western and eastern sectors of the region show marked differences to consider the naming Eastern and Western Mediterranean (WM and EM, respectively) as homogeneous sub-regions of influence (Alpert et al. 2006; Ulbrich et al. 2012). Moreover, this region is influenced by a large number of teleconnection patterns, including many kinds of covariation, which are often far from being completely understood in terms of circulation dynamics (Ulbrich et al. 2012). The North Atlantic Oscillation (NAO), the Arctic Oscillation (AO), the East Atlantic (EA), the Scandinavian Pattern, and snow cover over northern Eurasia are important teleconnections influencing the Mediterranean (Ulbrich et al. 2012). In addition to extra-tropical circulation, other links of Mediterranean climate variability have been identified to tropical and sub-tropical systems: the El Niño Southern Oscillation (ENSO), and the South Asian Monsoon (SAM) (Alpert et al. 2006; Xoplaki et al. 2012). Among these teleconnections, particular attention has been given to NAO and ENSO, although the focus has typically been geared towards the European continent (Brönnimann 2007; Dunstone et al. 2016) as opposed to the Mediterranean region specifically.

The NAO represents a pattern of North Atlantic climate variability and has long been identified as an influencing factor on the Mediterranean climate variability, especially during winter (Ulbrich et al. 2012). The positive phase (NAO+) is characterized by stronger-than average westerlies associated with warm and wet winters over Northern Europe, and to a drier weather into the Mediterranean; conversely, the negative phase (NAO–) is associated with cold and dry winters in Northern Europe, and moist air (increased precipitations) into the Mediterranean (Hurrell and VanLoon 1997).

The ENSO is recognized as a major source of global climate variability (Halpert and Ropelewski 1992). Although related to the warming of the eastern Pacific SSTs (for extended periods of 6–12 months, and sometimes longer (Alpert et al. 2006)), ENSO induces climate anomalies in many parts of the planet and plays an important role in seasonal prediction (Brönnimann 2007; Goddard et al. 2012). The influence of ENSO in the North Atlantic-European area is more likely to be found during ENSO extreme events and during the winter (Pozo-Vázquez et al. 2001). The relationship of ENSO with weather and climate variability over the Mediterranean has most relevant influence at inter-annual timescales. While ENSO is associated to extreme precipitation in northern Europe during DJF for both phases, in the eastern Mediterranean this link occurs during the autumn (SON) for the El Niño phase (Sun et al. 2015). However, the interannual variability induced by the atmospheric circulation over the North Atlantic and Euro-Mediterranean region is large, making it hard to detect a noticeable ENSO signal (Kamil et al. 2017).

The overarching aim of this study is to investigate the prediction quality of five state of the art prediction systems in reproducing the seasonal variability of weather regimes over the Mediterranean during the winter (DJF). To address this aim, the objectives of this work are:

  • Identify major teleconnections associated to winter Mediterranean weather regimes.

  • Assess the ability of the C3S state of the art prediction systems in reproducing WRs and the spatial pattern of teleconnections.

  • Assess C3S ability in reproducing seasonal WRs frequencies.

We provide an overview of the dataset in Sect. 2, we describe results in Sect. 3, and summarize them in the concluding Sect. 4.

2 Data and methods

We use daily geopotential height at 500 hPa and sea surface temperature data from the ERAInterim reanalysis (Dee et al. 2011) as reference dataset to (1) define the weather regimes, (2) identify teleconnections and (3) assess the seasonal forecast skill of the C3S prediction systems.

The Copernicus Climate Change ServiceFootnote 1 (C3S—implemented by ECMWF on behalf of the European Commission as part of the Copernicus Programme) produces seasonal forecast products from state of the art systems: ECMWF SEAS5 (Johnson et al. 2019), the UK Met-Office GloSea5 (MacLachlan et al. 2015), Meteo-France System6 (Dorel et al. 2017), the German Weather Service DWD GCFS2.0 (Fröhlich et al. 2020) and the Euro-Mediterranean Centre on Climate Change CMCC SPSv3 (Sanna et al. 2017). C3S provides a multi-system seasonal forecast service, where data is collected, processed and combined to enable user-relevant applications. The data used in this study (Table 1) refers to the hindcast period 1993–2016, common to all C3S systems, and pertains atmospheric conditions globally, in particular: daily geopotential heights at 500 hPa for computing the weather regimes, and monthly sea surface temperature used for assessing the teleconnections. The use of multiple systems within C3S allows for accounting for model deficiencies, while having multiple runs per model allows for quantifying the effect of errors due to uncertainty in the initial conditions (Krishnamurti et al. 2000; Palmer et al. 2004; Hagedorn et al. 2005). Indeed, the C3S systems build their ensemble runs producing a set of slightly different realisations of the same prediction, which are considered equally likely. The different members of the ensemble forecasts are built in either “burst mode”: all members are initialized with conditions on the same start date, but from perturbed initial states, intended to sample the uncertainty in observations (e.g. ECMWF, all members initialized on 1st of November); or in “lagged” mode: members are initialized on different start dates, sufficiently close to each other (e.g. UKMO, seven members initialized on the 1st, 9th, 17th, 25th of the month). The prediction systems are operated by different modelling groups but are not always totally independent, e.g. they may share the same ocean or sea-ice component (Tebaldi and Knutti 2007). In this study, 25 ensemble members for every C3S are used, although some systems have more (up to 40).

Table 1 Hindcast configurations of the C3S systems, all run at a 1° × 1° spatial, 6 hourly temporal resolutions

For the assessment of the atmospheric conditions associated to each WR we use daily fields of precipitation from the Global Precipitation Climatology (GPCP) v1.2 (Huffman et al. 2001) and two meters temperature (t2m) from ERAInterim, while for exploring the reanalysis climate teleconnections outside the Mediterranean we use sea surface temperature monthly fields from the HadISST v2 dataset (Titchner and Rayner 2014).

All reanalysis and model outputs are interpolated at the same 2.5º grid resolution.

To identify the weather regimes, we use the Python package WRtool (Fabiano et al. 2020). The calculation is performed on the daily geopotential height field at 500 hPa through the following steps:

  • The seasonal cycle is calculated separately for each model applying a day-by-day average of the field, smoothed with a 20-days running mean. This seasonal cycle is removed from the daily fields to obtain daily anomalies.

  • Anomalies corresponding to the winter (DJF) season are selected over the Mediterranean, defined as the domain comprised between the latitudes 25 and 50 North, and longitudes − 10 and 40 East.

  • For the observations, we calculate the Empirical Orthogonal Functions (EOFs) of the daily anomalies and retain the 4 leading Principal Components (PCs), which account for about 80% of the sectorial variance. The 4 leading EOFs define our reference phase space, which is used for all further calculations. The model anomalies are projected onto the reference phase space, to obtain a set of 4 pseudo-PCs (see Fabiano et al. 2020 for more details on this point).

WRs are calculated in two ways: raw and projected as detailed below. This distinction is made because while assessing the WR patterns we use each model’s own clustering with the purpose of evaluating models’ capacity to simulate the observed regimes in their own phase space, while for assessing model skill in reproducing WR frequencies, upon clustering we “re-project” to the reference phase space in order to have a common ground for comparison for observed and modelled data:

  • Raw regimes A K-means clustering algorithm is applied to each set of model pseudo-PCs (or PCs for the observations), to group them into four clusters representing the Weather Regimes. The models’ regimes are finally reordered to give the best match with the observed ones in terms of pattern correlation.

  • Projected regimes The K-means is only applied to the observations. Each day in the model pseudo-PCs timeseries is simply assigned to the regime corresponding to the closest reference centroid in phase space.

For each model, the raw regimes are computed considering together all members and starting dates. The raw regimes patterns are compared to the observed ones to assess the model ability to reproduce them (Sect. 3.2). Conversely, the projected regimes are calculated for each member (25 per model) individually while assessing teleconnections (Sect. 3.3) and skill (Sect. 3.4) of the models.

We adopt four clusters on the basis of the current understanding of weather regimes in the Mediterranean and Europe for boreal winter (Rojas et al. 2013; Ullmann et al. 2014).

The weather regimes consist of a daily cluster index over the DJF season that is then aggregated monthly to form a vector of length 72 (3 months *24 years) in order to calculate composites with monthly SSTs and reveal the presence of teleconnections outside the domain of study.

3 Results

Four weather regimes from the 500 hPa geopotential height anomalies of the reanalysis data are identified (Fig. 1). The spatial patterns are consistent with the Mediterranean Weather regimes identified by Rojas et al. (2013) over a similar domain using 700 hPa geopotential height anomalies, and named as follows: WR1-Meridional, WR2-Zonal, WR3-Anticyclonic, and WR4-Cyclonic. WRs are sorted from largest to smallest frequency of occurrence ranging from 29.6% for the most frequent, WR1, to 21.3% for the least frequent, WR4.

Fig. 1
figure 1

Geopotential height anomaly for the four weather regimes. The white box indicates the domain on which the WRs are calculated

We provide a description of each of them considering the corresponding temperature (Fig. 2) and precipitation patterns (Fig. 3) computed as the composites of the fields with the daily regime index (the time series of WR ranging from one to four). This was possible over the common window Dec-1996–Feb2015 for which gridded daily precipitation is available. For consistency, the same time window was used for temperature.

Fig. 2
figure 2

Temperature (ERAInt tas) patterns corresponding to each WR

Fig. 3
figure 3

Precipitation (GPCP) patterns corresponding to each WR

Correspondence between these WRs and those obtained over the East Atlantic domain (EAT) was sought in order to find similarities of the regimes in the two domains (details are reported in the Supporting Information, henceforth SI).

3.1 Mediterranean weather regimes

The WR1, or meridional, is the most frequent regime with 29.6% of occurrences in the season, it features a geopotential height dipole with positive anomaly over the northwest of the domain and a negative anomaly over the eastern half of the domain, which generate the temperature conditions depicted in Fig. 2a, i.e. a cold spell originating in northeast Europe that propagates over the whole Mediterranean region, and a precipitation dipole (Fig. 3a) with a mostly dry west caused by the anticyclonic conditions over the northwest and wet east of the domain. Total precipitation of this regime contributes to up to 50% of the total winter precipitation with maximum contributions over the Alps and Turkey (Rojas et al. 2013).

WR1 has no univocal correspondence with any EAT regime, but rather it is split into mostly the EAT Scandinavian Blocking and Atlantic Ridge regimes (see Section S1 and Fig. S1–S2 in SI).

WR2 is named zonal as strong westerlies bring humid air from the Atlantic to the Mediterranean and follows the meridional regime with 25.7% of occurrences in the season. Opposite to WR1, WR2 features a negative anomaly over the northwest and a positive anomaly over the eastern half of the domain (Fig. 1-b) to which corresponds a warm anomaly propagating from the northeast to the whole Mediterranean region, particularly intense in the Italian peninsula and eastern Europe (Fig. 2-b), and wet (dry) conditions in the west (east) of the domain (Fig. 3-b). Total precipitation of this regime contributes 30–50% of the total winter precipitation over France and the Iberian Peninsula (Rojas et al. 2013).

WR2, similarly to WR1, has no direct match in any of the EAT regimes, showing contributions mainly from three regimes: the NAO+, the Scandinavian Blocking, and the NAO– (Section S1 of the SI). Thus far the Scandinavian Blocking contributes to both WR1 and WR2, indicating that the Mediterranean region has features of its own that deserve to be characterized with ad hoc clustering.

WR3 amounts to 23.5% of occurrences in the season and features a pronounced positive geopotential anomaly centred over southern France—northwest Italy (Fig. 1-c). This regime is associated with anticyclonic conditions inducing positive anomalies of surface temperature over most of the domain, except the southeast (Fig. 2-c), and very dry conditions over the whole domain (Fig. 3-c). Of all four regimes, WR3 is the one with greatest similarity to an EAT regime: the NAO+ (Section S1, SI), which is characterized by stronger-than-average westerlies associated with warm and wet winters over Northern Europe, and to drier weather in the Mediterranean (Hurrell and VanLoon 1997). In particular, this regime is associated with a strengthening of the Icelandic low and with a north-eastward shift of the Azores High leading to weak northerly winds in the Mediterranean region, typical of stable anticyclonic conditions (Ullmann et al. 2014).

WR4 features, in contrast to WR3, a pronounced negative system over most of the domain associated to cyclonic conditions (Fig. 1-d) inducing negative anomalies of surface temperature over most of the domain (Fig. 2-d). Although this is the least frequent regime (21.3%), it is associated to generally wet conditions from west to east (Fig. 3-d) with a large contribution to total seasonal precipitations (e.g., the Alps and Italy receive 50–60% (Rojas et al. 2013)).

This regime is mainly associated with the EAT regime NAO- (Section S1, SI), which brings cold and dry winters in northern Europe and moist air into the Mediterranean (Hurrell and VanLoon 1997). WR4 is also associated, although to a smaller degree, with the EAT Atlantic Ridge regime.

The comparison between DJF weather regimes over the Mediterranean and the East Atlantic domain carried out in S1.1 of the SI with the aim of quantifying similarities between the two domains indicates that only one Mediterranean regime (WR3-Anticyclonic) has marked similarity (i.e. is strongly related) with an EAT regime (NAO+), while the remaining three WRs result as hybrids of multiple EAT regimes, indicating that the Mediterranean region, although very close and partly overlapping the EAT domain, deserves to be analysed within its own boundaries as weather patterns are specific to this area.

Finally, the time persistence of the regimes, which can be expressed as the average number of consecutive days associated with a given WR, is shown in Fig. 4. For all WRs half of the sequences are shorter than 3 consecutive days. WR3-anticyclonic and WR1-meridional have the highest persistence on average, so they tend to last longer (4.7 and 4 on average) than cyclonic and zonal (3.7 and 3.4 days respectively). The range of sequence lengths is larger for WR3—anticyclonic (1–30 consecutive days) and WR2—zonal (1–29) than it is for WR1–meridional and WR4–cyclonic (1–17 and 1–15, respectively). The zonal regime, although it is the second most frequent WR (556 days out of 2166), tends to have shorter sequences than the anticyclonic (508 days) and cyclonic (461 days) regimes, indicating a pronounced transitional tendency. Interestingly, the highest outlier in Fig. 4 (WR3, 30 days) refers to winter 2013/2014, which was dominated by WR3, with January having 100% of WR3 days (December and February having 55% and 42%, respectively). This is consistent with Ferranti et al. (2018), who describe that the winter of 2013/2014 was dominated by NAO+ and westerly flow anomalies across the Atlantic and that these anomalous flow conditions yielded a series of storms and severe rainfall but rather mild temperatures over Europe. Regime persistence for both EAT and Mediterranean domains is shown in Figure S3 of the SI.

Fig. 4
figure 4

Box plots of single or consecutive days associated with each WR over the 24 seasons analysed. Bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The central line indicates the median, while the asterisk (with inset number) the mean. The ends of the whiskers correspond to the lowest (highest) value within the 2 inter-quartile range of the lower (upper) quartile. The outliers are plotted individually as ‘ + ’

3.2 WRs in the C3S seasonal prediction systems

The C3S prediction systems have shown a good consistency in reproducing raw WR patterns obtained with the ERAInterim reanalysis dataset. The Taylor diagrams (Taylor 2001) help summarize the relative skill with which the prediction systems simulate the raw WRs spatial patterns. In these diagrams the similarity between two patterns is quantified using their correlation, their centred root-mean-square difference and the amplitude of their variations as their standard deviations (Fig. 5). Overall, all C3S prediction systems capture well the spatial pattern of the reference (black dot) for the different regimes with the DWD system lying the farthest in all cases. In particular, WR3 and WR4 are the best captured regimes, showing good alignment in the standard deviation with ERA-Interim and generally smaller RMSE values, while WR1 and WR2, although with generally good correlation values, exhibit greater standard deviations compared to the reference dataset. It is therefore apparent that the C3S prediction systems are better at reproducing spatial patterns of Anticyclonic and Cyclonic than Meridional and Zonal regimes.

Fig. 5
figure 5

Taylor Diagrams relative to the geopotential height at 500 HPa anomalies (1993–2016 period) of each weather regime (Meridional, Zonal, Anticyclonic, Cyclonic), for the five C3S prediction systems (each comprising 25 members) and for the reference field, the ERAInterim reanalysis. The diagrams are function of the root mean square (dashed contours), the correlation coefficient (grey straight lines), and the standard deviation (x-axis or radial distance from the origin)

The ranking of seasonal frequency of the raw regimes is relatively well captured by the C3S prediction systems. In Table 2 the ranking of the regimes follows the reanalysis order from dark to pale: all systems rank unanimously WR4 as least frequent regime, while only ECMWF and CMCC rank WR1 as the most frequent regime. For instance, the second most frequent regime in the C3S systems is the Anticyclonic, whereas it is actually the third most frequent in the reanalysis. It should be noted that some systems like UKMO and MeteoFrance present similar values of the frequency (26%) in all regimes except WR4, so they do not show clear difference in the ranking. It is worth noting, as shown in Fig. 6, that C3S prediction systems underestimate regime frequencies for the WR1-meridional and overestimate for WR3-anticyclonic, while for the other two regimes the biases are less pronounced.

Table 2 Winter mean frequency (in % of total days) of raw weather regimes in ERA-interim and individual C3S prediction systems computed over the period 1993–2016
Fig. 6
figure 6

C3S prediction systems bias in regime frequencies relative to the reference ERAInt

3.3 Teleconnections

We explore teleconnections outside the domain of study using SST monthly fields. The SST teleconnections are firstly obtained for the reanalysis using the HadISST dataset as follows (as outlined in Figure S4 of the SI):

  1. 1.

    The WR frequency is computed monthly (DJF-1993-2016)

  2. 2.

    The 85th percentile of the WR frequency time series is taken as the threshold for WR intense months (or monthly regime extreme).

  3. 3.

    The monthly SST anomalies are selected whenever the frequency exceeds the threshold.

  4. 4.

    The composite is obtained by averaging the selected SSTs.

Secondly, SST teleconnections are sought for the C3S models computing the threshold using a single distribution comprising the projected WR frequencies of all of the ensemble members (25). The selection of SSTs is then carried out on a per member basis using this common threshold and yielding 275 SST occurrences (months). Ultimately, the SSTs obtained are averaged together to form the composite of each C3S model.

The choice of compositing over WR intense months with a threshold is owed to the large internal variability that would otherwise attenuate the signal. One expects that if a teleconnection exists, it will first appear in the case of extreme monthly anomalies, i.e. in months dominated by a single WR. However, it is worth noting that lowering the threshold a consistent signal in terms of pattern is found but the anomalies are weaker.

As shown in Figure S5 (in the SI) across all WRs the Pacific Ocean sector shows prominent teleconnection signals, particularly the Pacific Decadal Oscilllation (PDO) (Mantua et al. 1997; Mantua and Hare 2002) and ENSO regions. In WR1 a clear Niña/PDO– signal is found that is also persistent as it shows earlier than DJF when considering earlier time windows (NDJ and OND, not shown). The same applies to WR3 and the Niño signal. Conversely, the other WRs show weaker and short-lived signals: WR2 with a PDO+ signal and WR4 with a La Niña signal. These are consistent with the teleconnections documented in the literature as discussed in the Introduction.

After seeking teleconnections in the reanalysis for all WRs, we focus on the two regimes for which teleconnections are more pronounced, namely WR1—Meridional (Fig. 7) and WR3—Anticyclonic (Fig. 8) and assess how well the C3S prediction systems reproduce them. Interestingly, as seen in Fig. 6, these two WRs are the ones that are particularly affected by sizable frequency biases in the models.

Fig. 7
figure 7

DJF composites of reanalysis (top left) and C3S prediction systems obtained by averaging SST anomalies above the 85th percentile of the WR1 (meridional) frequencies, revealing teleconnections globally

Fig. 8
figure 8

Same as Fig. 7 but for DJF WR3 (anticyclonic)

The signals in the C3S prediction systems are generally weaker than in the reanalysis, note that the colour bar lies in the ± 0.5 °C range, whereas in the reanalysis case the range is ± 1 °C. This is to be somehow expected because the composites for the C3S prediction systems are the result of averaging across multiple members, whose spread might be non-negligible. Although this can also reflect deficiencies in model simulations.

However, some systems capture fairly well the spatial patterns of teleconnection. In particular, with the exception of CMCC showing an opposite signal, the C3S prediction systems capture well the La Niña pattern of WR1, although for MeteoFrance and UKMO the signal is weak and partly offset. Only ECMWF and DWD seem to capture the PDO− signal (the warm-centred eye in the north pacific) that is present in the reanalysis (Fig. 7). The El Niño pattern of WR3 is captured by the ECMWF and UKMO systems and only mildly by MeteoFrance, while neither CMCC and DWD are able to reproduce this signal (Fig. 8). Composites for WR2 and WR4 can be found in the SI (Figure S6 and Figure S7, respectively).

3.4 Predictive capability of C3S prediction systems

We assess the models’ predictive skillby considering WR frequencies in the reanalysis and looking at how well they are captured by the C3S prediction systems, using the projected regimes. We keep our focus on the WR1 and WR3 regimes. Figure 9 shows the seasonal frequency of the reanalysis (black) and that of each C3S prediction system mean (colour). A first positive result is that the overall mean of the C3S projected WR frequencies throughout the period is consistent with that of the reanalysis (Fig. 9 inset row one), i.e. ~ 30% for WR1 and ~ 23% for WR3. Secondly, the WR frequency correlations between the reanalysis and each C3S prediction system (Fig. 9 inset row two) are relatively low (in the 0.15–0.38 range) and not statistically significant but they are all positive, indicating an overall encouraging behaviour of the prediction systems in reproducing observed seasonal variations. Results for regimes WR2 and WR4 are shown in Figure S8 of the SI.

Fig. 9
figure 9

Seasonal WR frequency for ERAInt (black) and mean of 25 members for each C3S prediction systems (colour) over the period 1993–2016. Inset numbers report the mean frequency (row 1) and the Pearson’s correlation to the ERAInt (row 2) and to the Niño 3.4 (row 3) time series. Significant correlations are in bold

Lastly, comparing the WR frequency with the Niño 3.4 time series (Fig. 9 last inset row) yields a negative (positive) correlation in the WR1 (WR3) reanalysis that is consistent with the teleconnections seen in Fig. 7 (Fig. 8). Interestingly, correlations of the C3S with Niño 3.4 are higher than those with ERA (reaching ± 0.62), indicating that overall the right responses to ENSO are captured by the systems, although in some cases the models are overly sensitive (DWD in WR1) or simply not sensitive at all (CMCC in both WRs and DWD in WR3).

Furthermore, we attempt to ascertain the ability of the C3S prediction systems in capturing observed periods of WR prominence by using all of the members. If we consider terciles, there is no marked difference with climatology, that is, counting for each prediction system the number of members that fall into the reanalysis’ terciles we find values around 33%, although slightly higher generally (Table S2-a). Instead, considering the four (out of 24) WR-intense seasons, the share of members above the 85th percentile (Q85) ranges from 0% in winter 2001–2002 of WR1 for MeteoFrance to 64% in winter 2015–2016 of WR3 for UKMO as shown in Fig. 10. Overall, ECMWF and UKMO show markedly higher skill than the climatology. Indeed, the members of these two models predict frequencies above Q85 up to 50% more than expected in the case of no skill (15%, i.e. climatology) as summarised in Table S2-b.

Fig. 10
figure 10

For each C3S prediction systems (colour) percentage of members (out of 25) falling above the 85th percentile at each of the four WR-intense seasons (out of 24) over the period 1993–2016. Grey bars are seasonal WR frequency anomalies for ERAInt (grey), while red lines are the percentiles Q85 (top) and Q15 (bottom)

The overall moderate skill of the C3S in reproducing WR frequencies leads to a last step in our assessment, which is based on the fact that the WRs are teleconnected to climate signals that C3S are able to capture (ENSO), as seen in Sect. 3.3.

Therefore, we explore whether C3S models increase their prediction skill in presence of intense ENSO signals. Throughout the period of analysis 1993–2016, we thus isolate and plot the frequencies that correspond to intense DJF ENSO yearsFootnote 2 considering the Niño 3.4 index with a threshold equal to 0.5 K° i.e. 10 events for La Niña (≤ − 0.5): 1996, 1997, 1999, 2000, 2001, 2006, 2008, 2009, 2011, 2012; and 8 events for El Niño (≥ 0.5): 1995, 1998, 2003, 2005, 2007, 2010, 2015, 2016.

For the reanalysis Fig. 11 (left panel) shows a negative median value of the El Niño WR1 frequencies and a positive median value for frequencies occurring during strong La Niña events. This is consistent with our previous finding on WR1 being linked to La Niña. The C3S ensemble, denoted “full-ENS” behaves similarly: the distributions of the frequency anomalies of the ensemble considering only El Niño (red) or La Niña (blue) seasons—show an increase during La Niña signal (Fig. 11) as the reanalysis, with a concurrent decrease during El Niño. Conversely, for WR3, El Niño increases the frequency anomaly while La Niña reduces it. This time too results are in agreement with the El Niño teleconnection found in WR3. It should be noted that not only do the WR frequencies increase when the ENSO phase agrees with those observed (Figs. 7, 8), but also that the frequency is reduced with the opposite phase. Arguably, this is a sign of good predictability (e.g. WR1 the Niña signal increases, but also the Niño signal decreases considerably).

Fig. 11
figure 11

Boxplots of WR frequency anomalies considering La Niña (blue) and El Niño (red) intense years. Inset values indicate median of the data. The entire C3S prediction system ensemble is noted as “full-ENS”, while “sel-ENS” refers to the ECMWF, UKMO, and MeteoFrance systems. Inset stars flag significant difference against the sample with all years using the Kolmogorov–Smirnov test at the 5% significance level

We select the best three models among the C3S prediction systems (henceforth named “sel-ENS”) as those closest to the observations (reanalysis) across the four WRs in the Taylor diagrams shown in Fig. 5, namely ECMWF, UKMO, and MeteoFrance. The increase in WR frequency during intense ENSO years becomes more evident when considering this ensemble subset in Fig. 11—for which medians increase in value going from 0.52 to 0.82 for La Niña in WR1, and from 0.03 to 0.86 for El Niño in WR3. This corroborates the hypothesis that the systems are sensitive to ENSO and increase their skill when ENSO signal is intense.

The difference between the entire ensemble sample and its La Niña and El Niño sub-samples tested statistically significant (Table 3) in virtually all cases of the full and the selected ensembles, with smaller p-values for the latter. Equal medians, using the Wilcoxon ranksum test (W) (Wilcoxon 1945), and same parent distributions, using the two-sided Kolmogorov–Smirnov test (KS) (Massey 1951), were tested at the 5% significance level. WR3 full ensemble vs El Niño sub-sample is the only case in which the KS null hypothesis was not rejected (difference between the two samples is not statistically significant). The difference in the samples is less pronounced in WR3 than it is in WR1.

Table 3 Comparison between WR frequency anomalies for the full C3S prediction system ensemble and the sel-ENS (ECMWF, UKMO, MeteoFrance): absolute distances between group medians and two-sample statistical tests on WR frequency anomalies

Interestingly, the median values of the frequency of La Niña (El Niño) for WR1 (WR3) are enhanced for the sel-ENS compared to the full C3S prediction system ensemble, indicating that choosing the best performing models does reproduce better the ENSO effect seen in the reanalysis. This is further quantified in Table 3 (El Niño vs. La Niña row 3 and 6) in the distance between medians for which greater values are found in the sel-ENS: in WR1 3.29 as opposed to 2.71 for the full ensemble, and in WR3 3.88 with the full ensemble yielding 2.56. The significance of the difference between samples is also higher (smaller p-values than those of the full ensemble).

4 Summary and conclusions

The aim of this paper was to assess the ability of the C3S suite of prediction systems in reproducing Mediterranean WRs temporal variability and teleconnections associated.

After computing four weather regimes over the Mediterranean domain in the winter season, we described climatic conditions associated to each WR and identified major SST teleconnections globally. On the basis of these findings obtained using reanalysis data, we performed the same analysis using the C3S prediction systems to assess their ability to reproduce the weather regimes and the teleconnections. Finally, focussing on the two regimes with stronger teleconnection signal (WR1-meridional linked to La Niña, WR3-anticyclonic linked to El Niño) we considered the weather frequencies to evaluate the predictive skill of the C3S prediction systems.

We have focussed on the boreal winter that is typically the season in which medium to long range dynamical forecast systems have higher skill in predicting atmospheric variables (Neal et al. 2016). Using the WR approach, further research should focus on assessing predictability in other seasons, possibly extending the period of analysis to an entire year as done by Grams et al. (2017) with seven WRs in the Atlantic European region.

We can summarize our main findings as follows.

  • Weather regimes in the Mediterranean correspond to different weather conditions than those of the standard Euro-Atlantic ones. Only the regime associated to the NAO+ shows similarities between the WRs of the two domains. Mediterranean WRs also have a shorter time persistence (days) than that of the Euro-Atlantic domain (approximately 3–5 days and 4–6 days, respectively), indicating a higher tendency to transition from one WR to another.

  • Two WRs show a clear teleconnection to the ENSO region: WR1—meridional and WR3—anticyclonic. Although with a smaller amplitude compared to the reanalysis data, most C3S prediction systems are able to reproduce this teleconnection, confirming how the large number of ensemble members allows to increase the signal-to-noise ratio and separate the SST-forced variability from the atmospheric internal variability (Feng et al. 2019).

  • The C3S prediction systems show moderate predictive skill in capturing changes in the seasonal frequencies of weather regimes: correlations with the reanalysis’ WR frequency are small but always positive. The ability to predict the reanalysis’ terciles is barely above the climatology, however, models in the sel-ENS group grant higher predictive skill when the WR is prominent (WR frequencies above the 85th percentile) with probabilities of up to 50% higher than the climatology (i.e. 15%).

  • The predictive skill of the C3S prediction systems increases when considering only Niña (WR1) and Niño (WR3) intense years, indicating a good sensitivity of the models to the ENSO signal. In fact, the frequency distributions change significantly and, importantly, the significance increases when using the sel-ENS. This is consistent with Shukla et al. (2000), Peng et al. (2011) who report how substantial skill has been achieved in forecasting seasonal mean values, with skill consistently higher in the tropics and in regions with strong teleconnections with ENSO and notably decreased skill during neutral ENSO conditions.

The weather regimes approach provides an effective framework for exploring the relationship of the WRs with temperature and precipitation patterns e.g. Cipolla et al. (2020), Zhang and Villarini (2019) in the Mediterranean. For their societal and economic relevance particular focus should be given to extreme events like extreme dry spells (Raymond et al. 2016) and meteorological droughts (Richardson et al. 2020), assessing the ability of seasonal forecasting ensembles to forecast them (Bloomfield et al. 2020). In this context, state of the art seasonal prediction systems like the ones that belong to the Copernicus C3S, provide a valuable data set that is constantly updated for inter-model comparisons; their currently moderate seasonal predictive skill will benefit from a continual effort in adopting common verification diagnostics (Doblas-Reyes et al. 2013). As suggested by Weisheimer and Palmer (2014), the goodness of seasonal forecasts should be assessed primarily in terms of the probabilistic reliability of ensemble forecasts considering a small ensemble spread as an indicator of low ensemble forecast error.