Application of EnOI Assimilation in BCC _ CSM 1 . 1 : Twin 1 Experiments for Assimilating Sea Surface Data and T / S Profiles 2

Abstract. We applied an Ensemble Optimal Interpolation (EnOI) data assimilation method in the BCC_CSM1.1 to investigate the impact of ocean data assimilations on seasonal forecasts in an idealized twin-experiment framework. Pseudo-observations of sea surface temperature (SST), sea surface height (SSH), sea surface salinity (SSS), temperature and salinity (T/S) profiles were first generated in a free model run. Then, a series of sensitivity tests initialized with predefined bias were conducted for a one-year period; this involved a free run (CTR) and seven assimilation runs. These tests allowed us to check the analysis field accuracy against the truth . As expected, data assimilation improved all investigated quantities; the joint assimilation of all variables gave more improved results than assimilating them separately. One-year predictions initialized from the seven runs and CTR were then conducted and compared. The forecasts initialized from joint assimilation of surface data produced comparable SST root mean square errors to that from assimilation of T/S profiles, but the assimilation of T/S profiles is crucial to reduce subsurface deficiencies. The ocean surface currents in the tropics were better predicted when initial conditions produced by assimilating T/S profiles, while surface data assimilation became more important at higher latitudes, particularly near the western boundary currents. The predictions of ocean heat content and mixed layer depth are significantly improved initialized from the joint assimilation of all the variables. Finally, a central Pacific El Nino was well predicted from the joint assimilation of surface data, indicating the importance of joint assimilation of SST, SSH, and SSS for ENSO predictions.


Introduction
Oceans play a key role in the predictability of the climate system due to their tremendous thermal inertia compared to atmosphere or land (Counillon et al. 2014).
Accuracy of the ocean initialization during modeling can significantly impact seasonal to decadal climate predictions (Alves et al. 2011;Zheng and Zhu 2015).A common strategy to obtain the optimal initialization is to assimilate available ocean observations into ocean models, which aim to produce the best estimates of ocean states.
There have been many advances in data assimilation techniques ranging from the relatively simple optimum interpolation (OI) and three-dimensional variational methods (3DVAR) to more sophisticated four-dimensional variational methods (4DVAR) and the Ensemble Kalman Filter (EnKF) approach.The OI and 3DVAR based schemes are computationally cheap to perform and have been widely used in operational ocean forecasting systems.However, both OI and 3DVAR use the time-invariant background error covariance, which tends to produce inaccurate analyses in areas with highly nonlinear flows.This problem can be partly solved by using the flow-dependent error covariance adopted in EnKF and 4DVAR.
Although EnKF and 4DVAR have been used in many studies, practical problems still exist for realistic ocean applications, especially for operational global ocean data assimilation systems.One disadvantage is that EnKF and 4DVAR are computationally expensive to perform.For example, the computational costs of EnKF Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.increase linearly with the ensemble number N. A value of N >20 is unaffordable for operational forecasting given our current limited computer resources, while EnKF usually requires more than 20 ensemble members (e.g., Miyazawa et al. 2012;Xu et al. 2013;Xu and Oey 2014).
We adopt a computationally inexpensive Ensemble Optimal Interpolation (EnOI) approach in this study.EnOI runs only a single model member every time and has no risk of ensemble collapse (Pan et al. 2014).Its analysis formula is identical to that of the Local Ensemble Transform Kalman Filter (LETKF, refer to Miyazawa et al. 2012 for details), except that its background error covariance is advanced from a prescribed 100 static ensemble members instead of a flow-dependent ensemble.In general, EnOI has many attractive characteristics such as multivariate assimilation and inhomogeneous and anisotropic covariance.In addition, the static ensembles for EnOI can be time-dependent (e.g., Oke et al. 2005Oke et al. , 2013;;Fu et al. 2008) or seasonally varying.Consequently, EnOI has been used in many operational ocean forecast systems such as BODAS (Bluelink Ocean Data Assimilation System) at the Bureau of Meteorology in Australia (Oke et al. 2013).
A new generation of climate forecast system at the Beijing Climate Center is under development (Beijing Climate Center Climate System Model, BCC_CSM1.1)(e.g., Wu et al. 2010;Wu et al. 2014).BCC_CSM1.1 is a fully coupled climate system consisting of atmosphere, land, ocean, and sea ice components.The primary objective in regard to developing BCC_CSM1.1 is to generate a high-quality Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.reanalysis dataset and improve predictions from sub-seasonal, seasonal, and up to decadal time scales.The development of a data assimilation system is crucial for this objective.One purpose of this study is to introduce the new ocean data assimilation system that is going to be adopted in the BCC_CSM1.1.
The other purpose of this study is to investigate the impact of data assimilation of various available observations on seasonal forecasts.For this purpose, the individual and combined contributions of sea surface satellite data to forecasting, such as sea surface temperature (SST), sea surface height (SSH), sea surface salinity (SSS), and temperature and salinity (T/S) profiles were evaluated.Model generated SST, SSH, and SSS were taken as pseudo-observations of satellites, and T/S profiles close to locations of Argo floats were chosen to represent pseudo-observations of Argo.The satellite sea surface data and Argo float data are major observational data sources nowadays, with global coverage and widespread availability in most of the ocean observing network.The satellite SST observations have been widely used in ocean assimilation applications since SST is a key geophysical variable in air-sea exchanges of heat (e.g., Tang et al. 2004).The SSS plays an important role in surface mixed layer dynamics, water mass formation, and global ocean circulation (Vernieres et al., 2014).The satellite observations of SSS have been available since the first satellite was launched by the European Space Agency (ESA) to monitor SSS (Boutin et al. 2016).Global SSH data from TOPEX/Poseidon altimeters have been available since October 1992.The dynamic topography depicts the surface geostrophic flow field.For example, assimilation of SSH contributes to better understanding of the tropical Pacific variability (Carton et al. 2008) and El Niño forecasting (Ji et al. 2000).One major concern in assimilating SSH is how to project the surface information downward to subsurface quantities (Fu et al. 2011).At present, SSH data are assimilated into ocean models either by developing a statistical relationship between SSH and subsurface temperature/salinity (Behringer et al. 1998;Yan et al. 2004) or by the inherent multivariate relation derived from the ensembles by using some ensemble-based data assimilation methods (Oke et al. 2008;Zheng et al. 2015).T/S profiles improve the representation of seawater density, which dictates water mass.In addition, T profiles have a direct influence on ocean heat content.
EnOI is implemented in a global ocean model (about 110 km in the horizontal) based on MOM4.0, which is the ocean model used in BCC_CSM1.1.An idealized twin experiment was carried out to test the assimilation and prediction system in a situation where the "truth" was known.The "observed" SST, SSH, SSS, and T/S were derived from free mode simulations and considered as the "truth."This paper is organized as follows.Section 2 presents a brief introduction of the EnOI data assimilation system and experimental setup.The assimilation and forecast results of all experiments are presented in Section 3. The discussion and summary are given in Section 4.

EnOI
EnOI is a simplified form of EnKF, which uses a stationary historical ensemble of model states to represent the background covariance matrix instead of time-dependent ensembles for EnKF.Consequently, it is more computationally efficient than EnKF, but is still multivariate and three-dimensional.In this study, we derive EnOI based on LETKF, an advanced version of EnKF (Miyoshi et al. 2010).
Here, the calculation equations of EnOI are given below (Oke et al. 2013).
where φ  is an m-dimensional vector representing the model analysis, φ  is an m-dimensional vector representing the model forecast, and  is a scaling factor used to represent the instantaneous forecast error variance, which is usually less than the historical error variance over a long time period. is in the range between 0 and 1, and it was set to 0.5 here by tuning the assimilation results.A is the historical ensemble composed of model states, and A ′ is the centered historical ensemble (i.e., =  ̅ ).= ∑ , and  represents the number of the historical ensembles. (A ′ , ) is an m × N matrix.  is an N-dimensional vector calculated from the observational data, model forecast, and historical ensemble model simulations; it can be computed as follows: where d represents the measurements, H is the measurement operator that interpolates the model space into the observational space, and R is the measurement error covariance.
Localized use of observation data is important in the method.The primary benefit of localization is to increase the rank of the forecast covariance, thus resulting in analysis fields that fit well with the observations (Oke et al. 2007).Localization is implemented explicitly in consideration of observational data from a region surrounding the target model grids.We define two localized scale parameters following Miyoshi et al. (2010) and Miyazawa et al. (2012): (3) where   (the number of surrounding grids) and   (meters) are the horizontal and vertical localization scales, respectively.The localization scale is chosen to correspond to the distance at which the Gaussian function drops to  -0.5 (Miyoshi et al. 2010).Observational data far from the target grid with horizontal distances larger than   or vertical distances larger than   are not used.A factor, exp( .5 * (( )), is multiplied to enhance observational errors of data far from the target grid (Miyazawa et al. 2012).The resulting localization scales are approximately 110 km and 2000 m in the horizontal and vertical, respectively.

The global ocean model
We have implemented the EnOI algorithm in MOM4, which was originally developed at the Geophysical Fluid Dynamics Laboratory (Griffies et al. 2003).The model covers the global ocean with a horizontal resolution of 1° and at 50 vertical levels.In the meridional direction, the resolution increases to 1/3° within 10° of the equator, and it smoothly reduces down to 1° poleward of 30°.To avoid a singularity at the North Pole, tripolar grids are adopted (Griffies et al. 2005).The physical parameterization schemes used in the simulation include the K-profile parameterization vertical mixing scheme, the isopycnal tracer mixing and diffusion, and the Laplace horizontal friction scheme, etc., the same as described in Griffies et al. (2005).
The model is driven by wind stress and heat fluxes estimated from 6-hourly atmospheric variables obtained from the National Centers for Environmental with restoring time scales of 90 and 120 days, respectively.Tidal forcing is not included.Sea ice is simulated with the Sea Ice Simulator (SIS) (Griffies et al. 2011).
In this study, the model was first spun up from 1948 to 2000, and a statistically quasi-equilibrium ocean field was established.This run was then continued from January 1, 1990 through 2009.One hundred ensemble members for estimates of the

Twin-experiment setup
An experiment (denoted as TRU), which was allowed to freely run from January generally vary from 1 cm to 4 cm (Chambers et al. 2003), and thus, the pseudo SSH error was specified as 3 cm.The SST error was set to be 0.3°C according to Guan and Kawamura (2004).The SSS error was set to be 0.1 PSU in consideration of the rapid development in inversion algorithms for satellite salinity (Peng et al. 2016).Similarly, for T/S profiles, the temperature and salinity errors were prescribed to be the same as the SST error and the SSS error, respectively.
Assimilation experiments, E01-E07 initialized as CTR, assimilated pseudo-observations from January 1, 2005 to December 31, 2005 (Table 1).E01 assimilated SST only, E02 assimilated SSH only, E03 assimilated both SST and SSH, E04 assimilated SSS, E05 assimilated all the SST, SSH, and SSS data, E06 assimilated T and S, and E07 assimilated all the variables.Then, we conducted seven 12-month test forecasts starting from January 1, 2006 corresponding to the seven analyses.By comparing the model states from CTR, E01-E07 against the known "true" states, we were able to investigate the performance of the assimilation system and forecast skills.

Assimilation performance measures
To evaluate the performance of data assimilation experiments, we examined the RMSEs by half (Fig. 2a).These results indicate that assimilation of SSH and SSS alone do not contribute to the SST analysis in the system.The SSS RMSEs of E04, E05, and E07 were reduced by about 80% compared to CTR (Fig. 2b).The RMSEs of E03 and E06 were reduced by about 30%.E01 and E02 only slightly improved SSS estimates.So experiments with SSS assimilation improve the SSS the most, while assimilating T/S profiles alone (E06) or SSH and SST (E03) can only improve SSS to a limited extent.For SSH, E02, E03, E05, and E07, the RMSE of SSH was reduced by about 82%, thus indicating the importance of the SSH assimilation (Fig. 2c).The T/S profile assimilation (E06) reduced the SSH RMSE by about half.SST (E01) and SSS (E04) only improved the SSH by about 20%.For the analysis of temperature and salinity at depth, all experiments showed improvements (Fig. 2d-g).E07 had the smallest RMSEs among all experiments.The RMSEs obtained when assimilating SST alone (E01) were larger than those obtained when assimilating T/S profiles (E06).
Similarly, the RMSEs obtained when assimilating SSS alone (E04) were larger than those obtained when assimilating T/S profiles (E06).These results demonstrate the importance of the assimilation of T/S profiles in the global data assimilation system.

Predictions
The impacts of data assimilation on seasonal forecasts were investigated by conducting a 12-month forecast initialized from restart files produced by CTR, E01-E07.The forcing was identical for all cases.
The time series of spatial RMSEs for temperature and salinity among all forecast showed some deficiencies, such as E01 for the SSS forecast (Fig. 3b) and E04 for the SSH forecast (Fig. 3c), etc.In contrast, in deep water (500-1500 m), the persistence beat the model forecasts because of the large bias from the initialization starting on June 1 st in the CTR run and all assimilation runs (Fig. 3e&g).These results demonstrate that the deep ocean bias cannot be completely corrected after one-year assimilations, though improvements are possible.
In addition to the aforementioned temporal variability, the spatial variability of Figure 4 shows the spatial distribution of the RMSE for SST from CTR and the differences with respect to E01-E07 over the prediction period (months 13-24).
Reductions of the RMSE were generally found in all experiments in regions where the RMSE was relatively high in CTR, such as the tropical eastern Pacific, the subarctic, and the Southern Ocean.The domain-averaged RMSE from E07, which was about 0.11°C, was the smallest.Interestingly, the experiment initialized from the assimilation of T/S profiles (E06) produced a comparable RMSE to the one from the experiment initialized from the joint assimilation of all surface data (E05).The values of the domain-averaged RMSEs from these two experiments were much smaller than those of E01, E02, and E04.These results indicate the importance of multivariate assimilation on initial conditions, and subsequently the forecasting.
Noticeably, in the Southern Ocean south of Africa ([0°E-60°E], [48°S-60°S], the black box in Fig. 4), the RMSEs for SST were much larger in E01, E02, and E03 than that in CTR.To explore the reasons for the high RMSEs, we examined the time evolution of vertical profiles of temperature averaged over the high RMSE region.
Figure 5 shows the vertical profiles of temperature and the corresponding RMSEs in January and September from all experiments.In January, SST estimates in E01, E03, E05, and E07 were much better than that in CTR (Fig. 5a,c).Conversely, in the subsurface layer (50-150 m), the values of RMSE in E01 and E03 were about 0.3°C larger than that in the CTR.In September, a thick mixed layer about 100 m deep developed after the austral winter (Fig. 5b,d).A pre-existing subsurface bias of The subsurface bias probably came from inaccurate estimates of the background error covariance in the multi-fronts Antarctic circumpolar region during the surface data assimilation.

Ocean surface currents
Large-scale ocean circulation is primarily geostrophic a few degrees away from the equator.Because of the availability of long-term satellite altimeter data, geostrophic parts of any model generated currents can be easily evaluated by using altimetry data.It is hence interesting to assess all forecasting experiments in this regard.Figure 6 compares the RMSEs of the predicted SSH from all experiments.
Compared to CTR, significant improvements of SSH were found in E02, E03, E05, E06, and E07.Similar to the SST RMSE reduction, the large reduction primarily occurred in regions where the RMSE was large in CTR.However, E01 and E04 showed almost no reduction, thus indicating that assimilation of SST or SSS alone cannot largely improve SSH forecasting.and so on.Comparisons of E01-E05 revealed that SSH assimilation was the dominant factor accounting for the forecast improvements.In contrast, major improvements in E06 were observed in the tropical regions, and these were more prominent than those in E05.These results indicate that the ocean surface currents in the tropics are better predicted when initial conditions are produced by assimilating T/S profiles, while surface data assimilation becomes more important at higher latitudes, particularly near the western boundary currents.This can be attributed to the dominant effects of SSH assimilation on geostrophic parts of surface currents away from the equator.

OHC
Ocean heat content is an important variable in climate studies, and it reflects the internal energy that the ocean has.To assess the standalone and joint effects of assimilation of surface data and T/S profiles on ocean predictions, we explored the upper 700 m OHC estimates from all experiments.Figure 8 shows the global distribution of the time-averaged upper 700 m OHC per unit area from all forecast experiments relative to that from the "truth."E07 had the smallest RMSE for OHC compared to TRU (Fig. 8h).The RMSE in E06 was about 1.2 × 10 8 J m -2 larger than that in E05, and this was primarily caused by the large bias in the subpolar regions (Fig. 8f,g), where the T/S profiles were relatively sparse compared to the gridded satellite surface data.In the lower latitudes, the difference between E06 and TRU was smaller than that in E05.Interestingly, none of the standalone assimilation of surface data experiments (E01, E02, or E04) significantly improved the OHC estimates (Fig. 8b,c,e).The joint assimilation of SST and SSH had already reduced the deficiency of OHC predictions to a large extent (Fig. 8d).When SSS was assimilated, the reduction was more significant (Fig. 8f).Thus, both surface variables and T/S profiles are important for OHC predictions.

MLD
Mixed layer depth is one of the most important quantities in the upper ocean.
Here, we define MLD following Breugem et al. (2008) as the depth (z) at which the potential density is  = ( .), where θ 10m and S 10m are the potential temperature and salinity at a depth of 10 m, respectively, and σ is the potential density.
We examined the MLD in the tropical Pacific since its variability is closely related to the El Niño Southern Oscillation (ENSO).experiments, but the improvements were most significant in E06 and E07 (Fig. 9g,h).
Noticeably, the differences of MLD from E02 were much smaller than those from E01, which suggests that assimilation of SSH instead of SST is important for To apply the data assimilation method in actual operations, we can decrease the forecast run number to 1 in LETKF.This time evolving run is combined with 100 static ensemble members to build the background error covariance for data assimilation.We found that increasing the number of time evolving ensemble members from 1 to 5 was not effective for obtaining more accurate ensemble covariance using the LETKF assimilation scheme.If the number increases to 20, the covariance is more accurate.This has been demonstrated in many previous studies such as Miyazawa et al. (2012), Xu et al. (2013), and Xu and Oey (2014).However, the computer costs of more than 10 ensemble numbers limits the ability of this approach in operational applications.
The inclusion of large errors in the initial conditions was aimed at testing the ability of the ocean assimilation system to correct the errors.We also conducted a series of experiments similar to the above experiments but initialized from January 5, 1990 to reduce initial errors.Note that TRU was initialized starting from January 1, 1990.The results showed improvements of the analysis and forecasts as well, though not as significant as those from the experiments listed in Table 1.
The newly developed system was tested in a twin-experiment framework.This approach allowed for extensive tests of system accuracy, and such an approach has been widely used in data assimilation studies (e.g., Counillon et al. 2014;Zhou et al. 2016).Even though the results were encouraging, our plan is to conduct a comprehensive test in a realistic framework before the system is put into operation.Finally, coupled assimilation with ice properties and atmospheric fluxes has been 429 shown to be advantageous in climate system analyses (e.g., Zhang et al. 2009 Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.Furthermore, large-scale variability of SSH has close connections with climate signals.
Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.background error covariance were sampled from the free-run simulation at time intervals of 25 days from January 1, 1995 to December 31, 2009.
1, 2005 to December 31, 2006, was designed to produce pseudo-observations and make comparisons with the assimilative analysis and model predictions.Another free model run (denoted as CTR), which was the same as TRU but initialized on a start date of June 1, 1990, was used to create large biases for initial conditions.The sea surface pseudo-observations including SST, SSH, and SSS were selected every three points in the model grids meridionally and zonally.The pseudo T/S profiles were selected at model grids that were as close to the locations of Argo floats on June 1, 2005 as possible (Fig.1).This time was chosen because it was the median time of the assimilation period(January 1, 2005-December 31, 2005).Considering the slow drifts of most Argo floats, the locations of pseudo T/S profiles did not change with time because of the relatively low model resolution of about 1° × 1°.The vertical levels are set as the same as the model levels.The altimeter SSH errors Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.
domain-averaged root-mean-square error (RMSE) of SST, SSH, SSS, temperature, and salinity in the upper ocean (0-500 m) and the deep ocean (500-1500 m) with respect to the TRU experiment from months 1 to 12 (Fig. 2).All assimilative experiments generally showed improvements over CTR, but the improvements varied among different experiments.The SST RMSEs of E02 and E04 were comparable with that of CTR, while the other assimilation experiments approximately reduced the Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.
Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.experiments and TRU are shown in Fig. 3.All RMSEs from the forecasts initialized from the assimilated runs were smaller than that from CTR.The E07 forecast, initialized from joint assimilation of SST, SSS, SSH, and T/S profiles, had the smallest RMSEs for temperature and salinity compared to the others.Figure 3 also shows the "persistence" curves (black lines) based on TRU, i.e., the temperature, salinity, and SSH from TRU (January 1, 2005-December 31, 2005) were assumed as "repeat" for the subsequent 12 months.The model forecasts from E03, E05, and E07 beat the persistence in the upper ocean (Fig. 3a-d,f), while the other experiments ocean predictions was evaluated in different experiments as well.The SST and surface currents were compared at first.Ocean heat content (OHC) and mixed layer depth (MLD) were used to examine the subsurface predictions.Besides, the values of the Niño 3.4 index were compared too, because it is an important climate signal in the tropical Pacific.SST Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.
Fig. 4 mainly developed as a result of the large temperature bias in the subsurface.
Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.The RMSEs of the predicted surface current speed from all experiments are compared in Fig. 7.The largest reduction of the overall RMSE was observed in E07.E02, E03, and E05 showed clear improvements near the tropical Pacific and the western boundary current regions such as the Gulf Stream, the Kuroshio extension, Ocean Sci.Discuss., https://doi.org/10.5194/os-2017-31Manuscript under review for journal Ocean Sci. Discussion started: 6 June 2017 c Author(s) 2017.CC BY 3.0 License.

Figure 9
shows the spatial distribution of differences of the time-averaged MLD (over months 13-24) from all experiments relative to TRU in the tropical Pacific.A weak Central Pacific (CP) El Niño appeared in TRU.Compared to CTR, improvements in MLD were seen in all seven