The implications of initial model drift for decadal climate predictability using EC-Earth

The large heat capacity of the ocean as compared to the atmosphere provides a memory in the climate system that might have the potential for skilful climate predictions a few years ahead. However, experiments so far have only found limited predictability after accounting for the deterministic forcing signal provided by increased greenhouse gas concentrations. One of the problems is the drift that occurs when the model moves away from the initial conditions towards its own climate. This drift is often larger than the decadal signal to be predicted. In this paper we describe the drift occurring in the North Atlantic Ocean 5 in the EC-Earth climate model and relate it to the lack of decadal predictability in that region. While this drift may be resolution dependent and disappear in higher resolution models, we identify a second reason for the low predictability. A subsurface heat content anomaly can only influence de atmosphere if (deep) convection couples it to the surface, but the occurrence of deep convection events is random and probably mainly determined by unpredictable atmospheric noise.


Introduction
The heat capacity of the ocean is much larger than that of the atmosphere.The energy needed to change the temperature of the upper 4 m or so of the ocean is enough to change the temperature of the whole atmosphere by the same amount.Therefore the ocean provides a memory that potentially makes multi-year climate forecasts possible.Imagine, for instance, that a sizeable volume of the North Atlantic Ocean is warmer than normal.Releasing the extra energy over the course of a few years would systematically warm the overlying atmosphere, and the prevailing westerly winds would advect the warmer air into Europe.
This could make possible a prediction of the kind "the next few years will be warmer than average" in western Europe.
Recent years have seen a large amount of research into decadal prediction, i.e., prediction for up to ten years ahead (see the review by Meehl et al., 2014).Typically, climate models are initialized with observed past ocean states, and their ability to reproduce observed climate anomalies is assessed.
Climate variability can be externally forced or produced internally.The most important external forcing arises from the increasing greenhouse gas (GHG) concentrations, and the prediction "the next years will be warmer than average" is nearly always correct.This deterministic signal has to be removed from the model output to assess the predictability of internal variability arising from the ocean memory.After subtraction of the GHG signal, Van Oldenborgh et al. (2012) and Karspeck et al. (2015) found no predictability signal except for in the Labrador Sea.That decadal predictability is low after accounting 1 Ocean Sci.Discuss., doi:10.5194/os-2016Discuss., doi:10.5194/os- -27, 2016 Manuscript under review for journal Ocean Sci.Published: 30 May 2016 c Author(s) 2016.CC-BY 3.0 License.
for GHG forcing was also noted by other authors, e.g., Chikamoto et al. (2013) and Boer et al. (2013).Predictability is further reduced if the model is driven without known volcanic aerosol loadings (Van Oldenborgh et al., 2012).
In short, the predictability of internally generated variability appears to be low in current climate models.In this paper we try to shed some light on this apparent lack of decadal predictability by analysing the initial drift occurring in the ocean part the EC-Earth climate model (Hazeleger et al., 2012;Sterl et al., 2012) and investigating its effect on the atmosphere.We concentrate on the North Atlantic Ocean because this is the region where most models find some kind of predictability.

Experimental set-up
The decadal prediction experiments described in this paper are performed with EC-Earth v2.3 (Hazeleger et al., 2012;Sterl et al., 2012).EC-Earth is a climate model which uses NEMO (Madec, 2008) in the ORCA1 configuration as its ocean component.This configuration has an average horizontal resolution of 1 • , which, due to the curvilinear grid employed, increases to ≈ 0.5 • in the Labrador Sea region.The initialization procedure described in Wouters et al. (2013) is used.Ocean initial conditions are taken from ECMWF's ORAS4 ocean reanalysis (Balmaseda et al., 2013), and atmosphere initial conditions are from ERA-Interim (Dee et al., 2011).ORAS4 comprises five members, and five atmosphere initial conditions were created from ERA-Interim using singular vector perturbation.Three of the ORAS4 members are used and combined with all five atmosphere conditions, creating three ensembles differing by their ocean initial conditions.Hindcasts are started on the 1st of November of the years 1960, 1965, 1970, 1985, 1990, 1995, 2000, and 2005.Unfortunately, sea-ice restart files for 1975 and 1980 as used by Wouters et al. (2013) where not available any more.The forecast length is 10 years.
The original aim of this research was to investigate how initialization in different ocean regions impacts prediction quality.
To test this two ensembles were performed in which the ocean restart files were downgraded in specific regions.The impact turned out to be low, and results from these ensembles are presented here alongside those using the complete restart files.
Essentially, the low impact of the degraded restarts was the reason to write the present paper.
The decadal prediction runs are compared to the ORAS4 ocean reanalysis (Balmaseda et al., 2013).The comparison concentrates on the North Atlantic Ocean, as earlier work (e.g., Van Oldenborgh et al.;2012, Hazeleger et al., 2013a) ) has shown that this is one of the few regions in the world where useful skill exists beyond the trend caused by the greenhouse effect.

Initialization strategies
To perform a decadal climate prediction the climate model has to be initialized.The initialization of the atmosphere component is not crucial as the atmosphere time scale is only a few weeks.The atmosphere "forgets" its initial state quickly and cannot contribute to the ability of the model to forecast the climate years ahead, as is the purpose of decadal climate prediction.Any model skill must come from the ocean with its large thermal inertia and correspondingly long time scales.Initialization of the ocean component of the climate model is therefore crucial.
There are basically two methods to initialize the ocean, namely full-field and anomaly initialization.In full-field initialization an oceanic state is constructed from observations and used as the initial state.In anomaly initialization an observation-based anomaly field is added to the model climatology.In theory the full-field initialization should be preferable as the prediction run starts from a situation close to the real oceanic state.In practice, however, this "real" state is often incompatible with the model's own climatology.As a result the model immediately starts to drift away from the initial state towards its own climate.
The drift may be large (e.g., Hazeleger et al., 2013b;Huang et al., 2015;Sanches-Gomez et al., 2016) and can obscure the climate signal to be predicted.In practice it turns out that the skill of both initialization strategies is comparable (e.g., Polkova et al., 2014;Hazeleger et al., 2013b;Smith et al., 2013).
This paper aims at describing the initial drift and its implications for model performance and decadal predictions.To show that this drift is indeed towards the "model's own climate", comparisons with results from the PD and PI runs of Sterl et al. (2012) will be made.Those runs were performed with constant present-day (PD) and pre-industrial (PI) forcing, using the same version of EC-Earth (v2.3).

Determining the drift
To isolate the climate signal from full-field initialized decadal prediction runs, the model drift has to be determined and subtracted from the individual forecasts.Following the recommendation prepared for CMIP5 by the CMIP-WGCM-WGSIP Decadal Climate Prediction Panel (ICPO, 2011), the drift is assumed to be independent of the start date of the forecast, and only dependent on lead time (= time since start date).Thus the assumption is that for an ensemble of forecasts the development of any variable T can be written as where t s denotes the start date, t l the lead time, and i the ensemble member.T drift is the common drift, depending on lead time, but not on start date, T signal is the signal, which is the same for each member with the same start date, and T noise is the residual.
Averaging (1) over all start dates and ensemble members gives an estimate of T drift (t l ), where n is the number of start dates, T s = {t s } is the set of all start dates, and m denotes the number of ensemble members.
In our case n = 8, 1965, 1970, 1985, 1990, 1995, 2000, 2005}, and m = 5.In deriving (2) we have assumed that T signal is uncorrelated between start dates, and that the noise is uncorrelated between start dates and between members of the same start date.
An estimate of the signal can now be obtained by subtracting the estimated drift and averaging over the ensemble members, Note that this procedure naturally removes the annual cycle, as the latter is contained in Tdrift (t l ).Therefore, drift-corrected model output will be compared to anomalies from ORAS4, where anomalies are calculated in the usual way by subtracting the mean annual cycle.
It is considered good practise (ICPO, 2011) to apply (2) in a cross-validated manner by excluding the current start date from the calculation of the drift, As this paper is mainly concerned with the drift itself rather than with the correction of the forecast, ( 2) is used throughout.

Overturning Stream Function and Subpolar Gyre strength
The drift in EC-Earth is large and affects the whole North Atlantic basin.We illustrate this by investigating the changes in the Atlantic Meridional Overturning Circulation (AMOC) and the barotropic circulation in the basin.The AMOC is described by the overturning stream function where x W and x E are respectively the western and eastern boundaries of the basin, and the barotropic circulation by the barotropic stream function where z B is the ocean depth.
The upper panel of Fig. 1 shows the time-evolution of the AMOC strength, smoothed by a 12 month running mean (12mrm) filter.The AMOC strength is defined as the maximum of Φ between 20  The strength of the subpolar gyre (SPG), measured as the maximum of the barotropic stream function in a box over the Labrador Sea (LS, see Fig. 3 below), shows a behaviour that is similar to that of the AMOC (Fig. 2).Its evolution is dominated by a strong drift of about 10 Sv, and after drift-subtraction no signal remains.The remaining variability is far below that of ORAS4.
Figure 3 shows that there are substantial differences between the overturning and barotropic stream functions of ORAS4 and the PD run of Sterl et al. (2012).In PD the MOC cell is shallower than in ORAS4, and the SPG does not enter the Labrador Sea any more.Furthermore, the strengths of the AMOC, the SPG and the GS are lower than those in ORAS4, and the high  correlation between them is lost (see also Tab. 1). Figure 4 shows how the stream functions develop in one of the hindcast ensembles.At the beginning of the hindcast both the overturning and the barotropic stream functions resemble those from ORAS4 (Fig. 3, left), while after ten years they resemble those from PD (Fig. 3, right).Obviously, the model drifts quickly, within less than 10 years, to its own preferred climatology as represented by PD.
Although the initial drift found in EC-Earth is large, it is comparable to the drift occurring in other models.For instance,  1. Pairwise correlations between the detrended time series for the strengths of the MOC, the SPG and the GS (lower row of Fig. 3).
Values above the diagonal are for ORAS4, below for PD.In ORAS4 the correlation between SPG and GS strengths is maximum (−0.73) for SPG strength leading by 10 months.For the other pairs the maximum correlation occurs at lag 0.
Figure 5. Ensemble-mean sea surface salinity difference between year 9 and year 1 for two different ensembles, distinguished by colours and contours, respectively.The close resemblance between the two ensembles shows that the drift is robust and not dependent on the initialization.to 0.25 • .Although the circulation in the LS improves with higher resolution, even their 0.25 • model shows large differences with observations.Delworth et al. (2012) show that their model (CSM2.1),which has a horizontal resolution of about 10 km in the LS, can maintain a current around the LS, but that the coarser version CSM2.5 (100 km resolution) cannot.The higherresolution version also has a stronger MOC.
The inability to maintain the currents and the ensuing large drift is probably one reason for the lack of predictability found here.To confirm this suspicion, the experiments would have to be repeated with an ocean model of ≈ 0.1 • , which unfortunately is beyond our current capabilities.

Vertical structure of the drift
As shown above, the drift in the decadal prediction runs comprises of a large signal that exceeds the model's natural variability.
Of course, the drift is not confined to the overturning strength or the SPG, but these changes are accompanied by other large scale changes.Convective activity moves from the LS into a region south of Greenland and increases in the NE-Atlantic (west of Scotland).This is mirrored by changes in mixed-layer depth (MLD), temperature and salinity fields that are all close to the model climate at the end of the prediction runs (not shown).
The sea surface salinity (SSS) change consists of a freshening of the SPG (Fig. 5).This pattern resembles the SSS change pattern found by Huang et al. (2014), who ascribe it to excessive sea-ice melt.In our runs the sea ice volume increases during the first six or so years of the integration (not shown), so excessive ice melt cannot be the reason for the freshening of the SPG.SSS starts to decline immediately in the LS and after three years also in the central North Atlantic (near 40 • W, 45 • N).Areas of SSS decrease (increase) coincide with areas of MLD deepening (shoaling) (not shown) as surface freshening (salinification) inhibits (favours) deep convection.The intimate coupling between gyre strength, SSS and MLD in the LS found in the model is in qualitative agreement with the four-box model of Born and Stocker (2014).
At depth the drift of the T and S fields occurs highly coherent between ensemble members, but it is superimposed by weather noise near the surface.This is illustrated in Fig. 6 with the help of the pattern correlation coefficient (PCC; upper) and the rootmean-square difference (RMSD; lower).For each pair of runs from one ensemble and one start date (here: 1995) the PCC and RMSD are calculated and averaged over all pairs.For other start dates the plots look similarly.Above about 3 km depth the PCC rapidly decreases and the RMSD increases.The change occurs nearly stepwise during winter, when deep convection takes place.This is a random process, occurring at different places in different ensemble members.In regions without deep convection the development is much smoother, and the decorrelation only affects the mixed layer (not shown).
Near the surface decorrelation already takes place in the first year.Correlation recovers in the second winter, when convection brings water up from lower layers that have not been affected by weather noise during the first year (Alexander and Deser, 1995).The basic idea of decadal prediction is that the large heat capacity of the ocean provides a memory which systematically influences the atmosphere for several years.The ocean communicates with the atmosphere via its surface, especially the surface heat flux, which is related to SST.The decorrelation of the SST signal between members is fast, so one might anticipate the same for the heat flux.That this is the case is shown in Fig. 7, which shows the heat flux averaged over the SPG (black box in Fig. 3).There is a large spread between ensemble members, and large changes in heat flux occur during winter, when deep convection is randomly distributed between members (see above).Subtraction of the common drift (Fig. 7, middle) does not change the picture of large variations between ensemble members.The ensemble spread is larger than the variability of one member or that of ERA-Interim.
Despite the large intra-ensemble spread there does exist a distinguished heat flux signal (lower panel of Fig. 7).Averaged over the SPG, the surface heat flux drops by nearly 10 W/m 2 within two years.The drop is due to the cessation of the deep convection in the LS: less heat is extracted from the ocean and transferred to the atmosphere during winter.
The quick decorrelation in the upper parts of the ocean and the resulting large between-member variance of the heat flux provide a second explanation for the low predictability: To transmit an ocean signal to the atmosphere, the deep oceanic heat reservoir has to be tapped.Whether or not this happens depends on the occurrence of deep convection events, which is largely governed by chaotic atmospheric processes.In other words, a large heat content anomaly in the ocean can only influence the atmosphere when it can communicate with the latter through the surface heat flux.Note that this argument is not only applicable to the SPG with its deep convection events, but also to those parts of the ocean where vertical mixing usually reaches less deep.Also there the occurrence of deeper-than-usual vertical mixing predominantly depends on the atmospheric forcing.

Atmospheric drift
Until now we have only considered the drift in the ocean.As Fig. 8 shows, also the atmosphere exhibits a large drift.The drift patterns for 2-metre temperature (T2m) show some resemblance with the differences between the EC-Earth climatology and observations as presented in Hazeleger et al. (2012).The Northern Hemisphere mid-latitude continents warm up, most of the ocean equatorward of ±40 • cool, and a warming develops around Antarctica.For sea level pressure (SLP) the resemblance is smaller.So at least for T2m we can describe the atmospheric drift as a "return to model's own climatology".However, unlike the oceanic drift, the atmospheric drift takes place nearly immediately.In the Northern Hemisphere most of the change takes place during the first year.This can be seen from Fig. 9, which shows the evolution of the ensemble-mean of

Discussion
Full-field initialized decadal prediction runs with EC-Earth exhibit a large drift in the ocean.In the North Atlantic Ocean, the overturning strength reduces by more than 20%, a value far exceeding the model's natural variability.At the same time deep convection in the LS ceases, and the strength of the SPG reduces by a similar fraction.The cessation of the deep convection reduces the heat flux averaged over the SPG by ≈ 10 W/m 2 .Let us assume that cold air from the North American continent needs one day to cross the area of the SPG.The drift-related heat flux reduction of 10 W/m 2 is enough to reduce the heating of a 1 km column of air by slightly more than 1 K. Due to the westerly flow over the North Atlantic this signal should be advected to Europe and lead to cooling there.Europe indeed experiences a cooling (Figs. 8 and 9).However, this cooling is strongest in southern Europe and northern Africa and therefore probably not caused by advection from the SPG.Furthermore, it occurs abruptly in the first year, while the largest heat flux reduction only takes place after two years (Fig. 7).We are forced to conclude that the drift in the atmosphere is not caused by the drift in the ocean, but is an autonomous process.Given the timing of the drifts in ocean and atmosphere the cause-effect relation may even be the other way around: The initial fast drift in the atmosphere causes the drift in the ocean which has a longer duration due to the slower oceanic time scales.Although this suspicion needs confirmation by dedicated model simulations it is backed by available evidence.In their forced NEMO runs, Hirschi et al. (2013) find that "to a large extent the MOC directly reflects the atmospheric forcing on sub-to interannual timescales", and Stepanov and Haines (2014) find that MOC variability is well-represented in a forced model run and follows wind variations.
To summarize, an ocean signal that far exceeds the internal variability of the model is not able to impact the atmosphere.This raises serious doubts on the feasibility of decadal predictions, at least using EC-Earth.However, proven predictability is also low in other models (e.g., Oldenborgh et al., 2012;Hazeleger et al., 2013a;Karspeck et al., 2015).
Apart from causing doubts on the feasibility of decadal predictions, this result also raises questions on the mechanism assumed to cause decadal predictability, i.e., the large heat content of the ocean, which, upon release to the atmosphere, would systematically influence the atmosphere.Obviously this mechanism does not work, at least not in the models.One possible explanation is the unpredictability of deep convection which has to tap into the oceanic heat reservoir to transmit the ocean signal to the atmosphere.

Summary
We have performed decadal prediction runs using EC-Earth with full-field initialization.As these observation-based fields are not compatible with the model climatology, the model starts to drift away from the initial condition towards its own climatology.
In the ocean the drift signal is much larger than the model's internal variability, and the drift is discernible over several years.In Ocean Sci. Discuss., doi:10.5194/os-2016-27, 2016 Manuscript under review for journal Ocean Sci.Published: 30 May 2016 c Author(s) 2016.CC-BY 3.0 License.contrast, the drift in the atmosphere is small and of short duration.It is obviously not caused by the large drift in the ocean.That the large ocean drift in not able to systematically impact the atmosphere causes doubts on the feasibility of decadal climate predictions.
Beyond the large drift which obscures a possible signal we have identified another mechanism that limits decadal predictability.To be able to influence the atmosphere, oceanic heat content anomalies must be seen by the atmosphere.This is only possible if convection or vertical mixing reaches deep enough.However, vertical mixing (be it wind or heat flux driven) is mainly atmospherically forced and thus unpredictable.This argument is valid everywhere, not only in areas of large drift, and can explain the lack of decadal predictability found so far in coupled models.
Fig.1for ORAS4.The initial model drift is large, but within observational constraints.On the other hand the model variability

Figure 1 .
Figure 1.AMOC strength (in Sv), 12mrm-smoothed.Upper panel: full strength; thick black line: ORAS4, thin coloured lines: 15 individual members from three ensembles, thick coloured lines: ensemble averages.That the experimental lines do not start at (or close to) the ORAS4 lines is an artifact of the applied 12mrm smoothing.Middle panel: anomalies for ORAS4, drift-subtracted for experiments.Lower panel: The drift as a function of lead time for five ensembles.

Figure 2 .
Figure 2. Strength of the SPG (in Sv), 12mrm-smoothed.Upper panel: full strength; thick black line: ORAS4, thin coloured lines: 10 individual members of two ensembles, thick coloured lines: ensemble averages.That the experimental lines do not start at (or close to) the ORAS4 lines is an artifact of the smoothing.Middle panel: anomalies for ORAS4, drift-subtracted for experiments.Lower panel: The drift as a function of lead time for two ensembles.

6Figure 3 .Figure 4 .
Figure 3.Comparison between (left) ORAS4 and (right) the PD run ofSterl et al., (2012).Upper row: Atlantic overturning stream function (the panel for PD is Fig.15afromSterl et al. (2012)), middle row: barotropic stream function, lower row: detrended time series of overturning strength (black), SPG strength (red) and GS strength (blue).Time series are 12mrm-smoothed.To fit to scale the SPG and GS series are divided by 5 in case of ORAS4 and by 2 for PD, and the sign of the GS strength is inverted.The correlations between the different curves are given in Table1.The blue box in the plot for the ORAS4 barotropic stream function denotes the region over which the strength of the SPG is determined, the red box the region to determine GS strength, and the black box the region over which the heat flux is averaged in Fig.7.

Figure 6 .
Figure 6.PCC (upper) and RMSD (lower) for temperature (colours) and salinity (contours) in the SPG, averaged over combinations of runs with start date 1995, and after subtraction of the common drift.

Figure 7 .
Figure 7. Surface heat flux (W/m 2 ) averaged over the SPG (black box in Fig. 3), 12mrm-smoothed.The thick black line is for ERA-40, the thick red line for ERA-Interim.The thin lines are for different ensemble members.Upper panel: Full signal.For the two reanalyses a constant offset of 20 W/m 2 had to be applied to bring the values into the same range as the model output.Middle panel: Anomalies for ERA-40 and ERA-interim, drift-corrected for model.Lower panel: Drift of the surface heat flux as a function of lead time.The different lines are for five different ensembles.

Figure 8 .
Figure8.Ensemble-mean differences between year 9 and year 1 for SLP (upper, in hPa) and T2m (lower, in K) for two different ensembles, distinguished by colours and contours, respectively.

Figure 9 .
Figure 9. Ensemble-mean development of T2m anomalies, averaged over the longitude band spanning Western Europe (10 • W-10 • E), for two ensembles that are distinguished by colours and contours.