Multivariate analysis of extreme storm surges in a semi-enclosed bay

The prediction of extreme storm surges is a critical task for coastal area protection. This study 10 examines extreme storm surges in Beibu Bay, a semi-enclosed bay in the South China Sea, and their joint probabilities. A method for the advanced prediction of the extreme storm surges is proposed using a multivariate extreme statistical method. We further present practical guidelines of the proposed multivariate analysis method, including guidelines for simulation. The simulation can be extended to multidimensional data to simplify computation, so the proposed approach can be 15 extended to use more points’ data from the semi-enclosed bay for predicting extreme storm surges probabilities. A practical case study illustrates the application of the proposed techniques for extreme storm surges prediction. A comparison of the conditional probabilities obtained from observations and simulation data show that the proposed method is effective.


Introduction
The prediction of extreme storm surges due to typhoons is a critical problem for low-lying coastal areas.This problem is expected to increase because of climate change (Lowe et al., 2010;Weisse et al., 2012;Weisse et al., 2014;Mcinnes et al., 2003;Tebaldi et al., 2012;Arns et al., 2015).
Inundation caused by storm surges in the Chinese coastal region severely impacts and poses a 25 continual threat to the activities and safety of the people living there.Between 1990 and 2010, storm surge disasters caused economic losses of approximately RMB 10.5 billion and 148 deaths, and affected 11.5 million annually.(Gao et al., 2014).Thus, storm surge prediction and rapid disaster Ocean Sci.Discuss., doi:10.5194/os-2016Discuss., doi:10.5194/os- -94, 2017 Manuscript under review for journal Ocean Sci.Published: 3 February 2017 c Author(s) 2017.CC-BY 3.0 License. 2 information dissemination are vital for facilitating evacuation and disaster mitigation in China.
This study focuses on extreme storm-surge probability prediction in Beibu Bay in the South 30 China Sea (SCS) using a multivariate extreme statistical method called multivariate generalized Pareto distribution (MGPD).Recently, research on the application of multivariate extreme values (EVs) has increased.In these studies, several possible probability distribution functions have been used to characterize extreme sea level events: the Copula function (Salvadori et al., 2015;Corbella and Stretch, 2012;Michele et al., 2007;Salvadori et al., 2013;Tao et al., 2013), multivariate EV 35 function (Morton and Bowers, 1996;Coles and Tawn, 1994;Bhunya, et al., 2011), and MGPD function (which is a type of multivariate EV function) (Falk et al., 2004;Rootzé n and Tajvidi, 2006).
In fact, multivariate EV analysis requires a reasonable extrapolation from observed states to unobserved states.Therefore, in EV analysis, these distribution functions are not only fitted by observed states, but they also predict unobserved states by reasonable extrapolation.The 40 multivariate EV and MGPD functions are derived from EV theory, a branch of probability theory (Coles, 2001;Beirlant et al., 2005).However, only the MGPD function is the natural distribution of the multivariate peak over threshold (MPOT) sampling method, which retains more extreme information from the raw data than the annual maxima method (Zhang et al., 2000;Yap and Zhu, 2014;Mené ndez and Woodworth,2010).The annual maxima method often ignores multiple severe 45 storm waves that occur in the same year, which may be much larger than the annual largest waves in many other years.Consequently, the annual maxima method may underestimate extreme variables (You and Yin, 2006).Moreover, the uncertainty of the return level estimation by the peak over threshold method is smaller than that by the annual maxima method under different sample lengths (Yao et al., 2012).50 In this paper, we do not specifically address MGPD theory; its details can be found in (Falk, 2004;Rootzé n and Tajvidi, 2006;Beirlant et al., 2005;Tajvidi, 1996).We instead present our developed MGPD procedures and show how the methodology can be exploited for the analysis of extreme surges at two adjacent sites.The proposed theory and its statistical methodology are presented in Section 2. In MGPD, determining the joint threshold and estimating the joint density 55 are critical tasks.These aspects are presented via an example in Sections 3 and 4, respectively.
Finally, MPOT advantages and application possibilities based on Monte Carlo simulation are outlined in the conclusion in Section 5.

G x x
 can greatly enrich the expression of W(X) for various dependence relationships (Falk 70 et al., 2004).Among them, W(X) of the logistic type is widely used in hydrology, finance, and other fields because of the favorable statistical properties of the following Pickands dependence function: which means the MGPD cumulative distribution function can be rewritten as Correlation parameter r can be evaluated using a step-by-step method, i.e., we first evaluate 80 two marginal distributions and then evaluate r within MGPD including the marginal distributions.
Alternatively, r can be evaluated using a global method in which r is directly estimated using the maximum likelihood for density function w .The global method more reliably evaluates the results because of its final function form; however, its evaluation processes are more complex.The maximum-likelihood function is 85

Simulation method
The Monte Carlo simulation method for a multivariate distribution is complex owing to the generation of multivariate random vectors.In general, the variables firstly are transformed into independent variables, for which random vectors are generated for each variable.The final random 90 vectors of the multivariate distribution are obtained by an inverse transformation.The MGPD simulation method was suggested in Michel (2007).
To more effectively demonstrate the MGPD simulation method, we use p T to transform vector 1 ( ,..., ) x into polar coordinates as follows:

   
1 1 1 , , ,, Then  depends only on z and, therefore, we put     and constant 0 0 c  exists and is close to zero, we can prove that c is a uniform distribution P (C > c) = µ|c| on 0 ( ,0) c . For details see Falk et al. (2004).So MGPD simulation method (details in Michel, 2007): 1) generates random numbers uniformly distributed on 105 unit simplex in Pickands polar coordinates and an acceptance-rejection method, 3) generates random numbers uniformly distributed on   0 ,0 c , and 4) calculates vector , which is a random vector that satisfies the multivariate over threshold distribution.Constant c0 is the joint threshold in the MGPD method, determined in this 110 study using the principle proposed in (Coles and Tawn, 1994).

Data
The data used in this study were provided by the Joint Archive for Sea Level (JASL) of the University of Hawaii Sea Level Center (http://uhslc.soest.hawaii.edu/home).The data consist of 115 simultaneous hourly sea-level observations at Beihai and Dongfang stations, which are located on the Beibu Bay coast in SCS (Fig. 1) and were collected from June 1975 to December 1997.The data can be used for the analysis of extreme surges because hourly sampling sufficiently captures high water levels.The missing values for Beihai correspond to only 0.023% of the data, while for Dongfang, 0.173% of the data was eliminated from the sample because of gauge malfunctions or 120 other issues.201,578 and 201,275 hourly values for Beihai and Dongfang, respectively, were to be processed, and the available data was judged sufficient to perform EV analysis of storm surges.

Data Analysis and Declustering
Beihai City, located in the south of Guangxi province of China, is a beach city on the coast of Ocean Sci.Discuss., doi:10.5194/os-2016-94,2017 Manuscript under review for journal Ocean Sci.Published: 3 February 2017 c Author(s) 2017.CC-BY 3.0 License.
Beibu Bay, which is a shallow, semi-closed bay.Owing to Beibu Bay's special geomorphologies, 125 its typhoon surges are violent and can cause flooding in the city.The surge levels at the site are defined as the residuals after the astronomically induced tidal components have been removed from the sea-level observations.The tidal component is cyclical and does not satisfy the basic hypothesis of a random variable.Godin's method (Godin, 1972) was used for tidal analysis.
The first stage in EV analysis is declustering, specifically, identifying a set of independent 130 events.Declustering is performed to make adjacent elements of a sample that consists of the maxima of events independent of each other.s.Our approach is to use a moving window of sufficient width to ensure that the extremes of each variable from a single 'meteorological event' are certain to fall within a single window (Coles and Tawn, 1994).Declustering techniques by Morton and Bowers (1996) were used.The features of storms and storm surges differ in each place.In Beibu Bay, the 135 main meteorological event, can generate extreme wave height, is a typhoon.The declustering of a sequence of surges is illustrated in Fig. 2. The duration of a typhoon surge in the Beibu Gulf is approximately 100 h.The components of each vector are the maximum surge at each site over a 100-h event.The peak events for the vectors in the cluster occurred at different times.The peak surges arrived at Dongfang 3-5 h before they arrived at Beihai, as shown in Fig. 2. 140 The main purpose of this study was to analyze the relationship between the extreme surges at both sites, and then predict extreme storm surge probabilities at one site.Therefore, the declustering method was judged to be appropriate because all the main peaks were included in these clusters.
The 100-h cluster interval enabled the surge peaks at both sites from the same typhoon to be in the same cluster.According to the above principles, the total number of independent events N was 2,016.145

Constructing Conditional Probability Functions
Extreme surges in the Beibu Gulf is are predominantly caused by typhoons from lower-latitude areas in the SCS.Typhoons typically move through the Beibu Gulf from south to north, with a small number of them moving from east to west.An extreme surge at Dongfang, which is to the south of Beihai, can serve as an early warning signal for Beihai.Multivariate EV analysis can be used to 150 provide such a warning.
The other four CP distributions (Y conditional on X) can be deduced by swapping two variables.

Extreme-value analysis
In this section, we focus on problems in extreme surge prediction that can be solved using the 160 statistical methodologies of EV analysis.These problems include joint threshold analysis, stochastic simulation, and the statistics of multivariate extreme surges.Finally, the interpretation of the statistical results for the extreme surges at the two locations is briefly discussed.

Marginal Transformation and Joint Threshold
Generalized Extreme Value Distribution (GEVD) includes EV ⅠⅡⅢ type distribution and 165 can describe accurately more EV events than any single component.So we choose that the marginal distributions of the 2,016 independent events could be described by the following GEVD: where ξ, σ, and μ are three GEVD parameters.They are estimated using maximum likelihood, which was suggested in Coles (2001) and Beirlant et al. (2005).Fig. 3 shows the probability plots 170 (including the 95% confidence intervals) of the marginal distributions before MGPD is fit and As described in Section 2.1, the MGPD variables must be close to zero in the negative quadrant.
The margins of a bivariate distribution can be transformed into uniform ones as suggested by Michel (2007).To standardize the margins, the marginal MGPD distribution must be a negative exponential 175 distribution.Using the Taylor expansion, we can transform the data onto (−1, 0) by computing ( ) (1 ( ) 1) ( ) 1 where F(xi) is the GEVD of index i (i = 1,2), indicating Beihai or Dongfang.In Eq. ( 11), F(xi) is close to one because this study focuses on extreme observations.
Many dependence models between extreme variables such as logistic, bilogistic, and Dirichlet 180 models have been suggested.However, the choice of dependence model is not usually critical to the accuracy of the final model (Morton and Bowers, 1996).Therefore, a simple bivariate logistic generalized Pareto distribution was selected.The MGPD model of this paper is based on a multivariate EV distribution.Its joint threshold can be calculated by the method from Coles and Tawn (1994).The joint threshold is c0 = −0.28,and there are 218 combinations of Beihai and 185 Dongfang with  >  0 .Fig. 4(a) shows the samples that are over the threshold value.In the left subfigure, c0 = −0.28 is a curve, and all observations on the right side of this curve are greater than c0.The converted data is shown in the right subfigure, where c0 = −0.28 is a line.
The correlation parameter r of the dependence function is estimated using maximum likelihood and is r = 2.15.Using the obtained estimates for all parameters, the joint extreme probability 190 distribution function is illustrated in Fig. 4(b).

Comparison of Stochastic Simulation Results
Using the simulation method in Section 2.2, we generated a very large sample of bivariate extreme storm surges.In this section, we compare the CP results obtained by simulation and direct calculation.Fig. 5 shows the results of the stochastic simulation for N = 10,000 and 100,000.The 195 simulation results are in basic conformity with the observations, which shows that the MGPD simulation method is effective.The scatter diagrams directly show the simulation result; however, they require further quantitative analysis to objectively show their differences.
We use two CPs: CP1 P(X > x | Y > y) and CP4 P(X < x | Y < y).Here, CP1 is the probability Ocean Sci.Discuss., doi:10.5194/os-2016-94,2017 Manuscript under review for journal Ocean Sci.Published: 3 February 2017 c Author(s) 2017.CC-BY 3.0 License.that the surge in Beihai exceeds x conditional on the surge in Dongfang exceeding y, and CP4 is the 200 probability that the surge in Beihai does not exceed x conditional on the surge in Dongfang not exceeding y.Fig. 6 shows that the relative difference in value between the simulation and direct calculation depends on the number of simulations N. The subplots in Fig. 6 clearly show that the relative difference is reduced as N increases.When N reaches 1.5×10 6 , the maximum relative error between the simulated and calculated results is 8.61%, which we consider satisfactory.205 In addition, we conducted runtime experiments in which 1×10 4 , 5×10 4 , 10×10 4 , 1×10 6 , and 1.5×10 6 random vectors were generated on a desktop PC with an Intel Core i7 3.4 GHz processor.
To estimate the M-year surge of Beihai and Dongfang by means of a univariate analysis, the Poisson-Gumbel distribution was used (Quek and Cheong, 1992;Naffa et al., 1991), which can be 210 described by , where () Gx is the Gumbel distribution.
Tables 2 and 3 show the values of CP1 and CP4 based on the results from 1.5×10 6 simulations.
The tables list the calculated and stochastic simulation results for five CP1 and CP4 groups for 215 different combinations of M-year surges at Dongfang and Beihai.The two results are very similar.
For instance, the directly calculated P(X > x50 | Y > y10) is 12.94% and its simulation result is 14.06%.
Their relative error is 8.61%, which is the maximum relative error for CP1.Moreover, the directly calculated P(X < x5 | Y<y50) is 99.35% and its simulation result is 94.13%.Their relative error is 5.25%, which is the maximum relative error for CP4.220

Prediction of Extreme Surges
The relationship between extreme surges at Beihai and Dongfang can be analyzed using the CP.Because the peak surges at Dongfang occur earlier than those at Beihai, we use CP1: P(X > x | Y > y) to warn of an extreme surge at Beihai.Tab. 2 shows that, when a 50-year surge appears at Dongfang, the probability of a greater than 50-year surge occurring at Beihai is 94.55%.225 Using long-term surge records, we can determine the relationship between extreme surges at Ocean Sci.Discuss., doi:10.5194/os-2016Discuss., doi:10.5194/os- -94, 2017 Manuscript under review for journal Ocean Sci.Published: 3 February 2017 c Author(s) 2017.CC-BY 3.0 License.
Beihai and Dongfang.Additionally, because of the special geographical relationship between the two locations, a peak surge at Dongfang is a signal that predicts a peak surge at Beihai.We can therefore predict the probability of different surges at Beihai and then take preemptive measures to reduce or eliminate the associated damages.230

Conclusion
The primary aim of this study is to illustrate how recent MGPD developments can be applied to marine disaster prediction.We not only developed a process for determining a joint threshold and simulation, but we conducted an analysis for extreme surge warnings using MGPD.MGPD is the 235 natural distribution of the MPOT method, which can cull extreme information from raw data.A model based on multivariate EV theory and the intrinsic properties of EVs were also considered.
A Monte Carlo MGPD simulation was used to determine the conditional probability of two extreme surges, and the proposed approach was judged to give satisfactory results by comparing the relative differences in conditional probabilities obtained from observations and simulation data.240

New Possibilities Based on Monte Carlo Simulation
Extreme surge warnings would be more reliable if the relationships of extreme surges at three or more sites could be established.Using more related (affected by the same typhoon) sites' storm surge real time data for storm surge warnings of Beihai city, the forecast result may be more believable by the statistical method presented in the paper.In this paper, the theory of MGPD 245 and its simulation was derived for multidimensional variables.The methodology could be extrapolated to higher dimensional space.Thus, difficulty of solving a procedure for MGPD cannot be exacerbated by the high dimensionality of the variables.A potentially more effective warning approach could be based on a Monte Carlo simulation.Once long-term (e.g., thousands of years) sea state data has been simulated, several ocean environmental factors can be quickly assessed by 250 the law of large numbers.
where U is a neighborhood of zeros in the negative quadrant ( ,0) d  n

Table 3
Comparison of the results for CP4 (%)