Evaluation of Peaks-Over-Threshold Method

Two extreme wave analysis models, namely Peaks-Over-Threshold (POT) and Generalized Pareto Distribution (GPD), were developed in order to improve the POT model and highlight merits and limitations of the two models. Studies have shown that the POT model was not equipped with a suitable approach to determine a true threshold value. This paper proposed an approach to specify the most suitable threshold value for the POT model, which is called Hybrid method. In addition, until now the MIR (minimum ratio of residual correlation coefficient) criterion has been used as a goodness-of-fit 5 method in the POT model. However, the examinations on the method represented that MIR is not always a stable approach in determining true distribution function. This paper proposed an alternative approach instead of the MIR criterion method, it is called Norm of Residuals, and its credibility was examined by the Chi-Square test. The results drawn from this study also demonstrated that the Hybrid method completely matched with the POT model, and the threshold obtained by this method is credible, moreover, the Norm of Residuals method is completely stable in determining the best fitting distribution for the POT 10 model.


Introduction
Currently, many types of structures are designed for different purposes in coastal and offshore regions.The force of storm waves usually influences the life span of these structures, compelling engineers to design structures to withstand these destructive waves.Therefore, the estimation of extreme waves is important in the design phase of structures in the marine areas.However, prior to the phase of designing, the probability of storm waves during the shelf life of structures should be projected.The estimation of a probable wave height is called return period of storms, and the employed method is generally known as extreme value analysis.Fundamentally, the aim of extreme value analysis is to specify the probabilities of exceedance and non-exceedance of wave heights, and non-exceedance probabilities are known as return periods.
There are several methods to estimate extreme values; two commonly used models, namely Peaks-Over-Threshold (POT) and Generalized Pareto Distribution (GPD) were developed in this study.
One crucial step in developing the POT model is to select the best fitting distribution function for dataset.To determine a best fitting distribution, two methods were introduced by Goda (1988); namely correlation coefficient method, and the MIR 1 Ocean Sci.Discuss., doi:10.5194/os-2016Discuss., doi:10.5194/os- -47, 2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.
(minimum ratio of residual correlation coefficient) criterion method.However, examinations represented that the MIR criterion method has shown some instabilities in determining correct results when it comes to choosing a true distribution function.This study proposed an alternative approach instead of the MIR criterion method, it is called norm of residual (N.o.R).
Selecting an appropriate threshold value is important because the estimated extreme events by the POT model are sensitive to the changes in threshold value.Goda (2000) pointed out that in selecting a suitable threshold, users should refer to previous studies and engineering judgment.Li et al. (2012) followed the same procedure to determine a true threshold value.The authors provided a list of several extreme analysis studies, which had been conducted on the same data, and came to a conclusion with one of the thresholds used by one of those studies.However, in absence of a reliable threshold method, the other authors in the list of Li et al. (2012) could claim of the credibility of their threshold values.Moreover, if there is no preceding study for a particular data, engineering judgment alone may not be sufficient when selecting a suitable threshold.At present, there is still no appropriate approach for selecting a threshold value for the POT model.A hybrid method consists of a two-part procedure; a mean residual life (MRL) plot, and a graph of several thresholds, was proposed in this paper to specify a suitable threshold value for the POT model.
Based on the definition by Grimshaw (1993), the GPD is a two-parameter family of distribution, which is applied to a data model exceeding a known threshold value.The model was introduced for the first time by Pickands (1975) and is known to be a stable distribution for exceedances beyond a threshold.The model has been presented by many authors; such as, Davison (1984), Smith (1985), Van Montfort & Witter (1985), Hosking & Wallis (1987), and Coles (2001).
The data that was applied in this study comprised a set of wave heights gathered from shipboard observations over a 41-year period from 1949 to 1989 in the South China Sea.Since 1989, the process of data acquisition was stopped, and the modern observation was replaced by satellites.In this study, only the observed data during the 41-year period has been used, due to the focus of this study on the merits and limitations of the two models, and improving the POT model.Moreover, there is a previous study on the same data, which could be referenced and compared with the new findings.The Labuan data are a set of wave height records in which the time intervals between the successive data are considered to be at least three days to secure the required independence of the data.To fill the large gaps between the successive data; e.g., more than 4 days, the linear interpolation method was used since the data varied relatively slowly.A time interval of two to four days between successive peaks was recommended by Mathiesen et al. (1994) to maintain the condition of independency.The northeast monsoon waves (same direction with the smaller arrow in the magnified part of the map in (Fig. 1)) were the predominant waves among the wave heights for the four seasons in the region.About 65 percent of the waves during this season approached from between 30 to 60 degrees (Syed Abdullah, 1992).Therefore, to fulfill the requirement for homogeneity in the data, only storms in the compass direction of 30 to 60 degrees (northeast monsoon waves) were employed in the analysis.
The purpose of the foregoing tasks were to obtain the conditions of independent and identically distributed (iid) measurement for the Labuan data.The Labuan data was evaluated and used for the first time by Coastal and Offshore Engineering Institute, Universiti Teknologi Malaysia.( 2009) as a report on extreme wave analysis for the Marine Department of Malaysia.The results of the report (Table 1) represent the estimated extreme wave heights for several return periods.The authors employed only one distribution function (Gumbel distribution) in their analysis without considering any threshold value.The following objectives were discussed in this paper: i) developing the proposed Hybrid method to specify true threshold 10 value for the data for the POT model; ii) conducting the Norm of Residual method for the POT model to select the best fitting distribution function for the dataset; and iii) developing the modified POT (POT with the two proposed methods by this study) and GPD models, and comparing their results to evaluate the proposed methods and highlight merits and limitations of the two models.
This paper has been divided into four sections.Following this introduction, Section 2 provides summaries about the two models, and proposed methods for the POT model.Discussion about the return values and figures for both models, the goodnessof-fit tests, and the proposed methods are presented in Section 3. Finally, Section 4 contains the conclusions drawn from this study.

Generalized Pareto Distribution Model
Generalized Pareto Distribution model could be fitted for the observed wave height data with considering the following assumptions: The data are considered as exceedances from a specific threshold, which are a sequence of independent and identically distributed (iid) measurements, x (1) , . . ., x (k) .If exceeded data over a particular threshold u are defined as y (j) = x (j) − u for j = 1, . . ., k, therefore, by definition, a random variable of y (j) may be regarded as independent realization whose distribution is approximated by one of the members of the Generalized Pareto family (Coles, 2001).
The model has been evaluated and used by many authors: Smith (1984) developed and presented the model in a form of comparison method with a statistical model, Van Montfort & Witter (1985) embedded exponential distribution in GPD to check the adequacy of the exponential distribution for a set of data, Hosking & Wallis (1987) studied about several estimators for the GPD model, Moharram et al. (1993) estimated the river flood by the GPD model, Castillo & Hadi (1997) used the model for estimating wave height, Pandey et al. (2001) studied extreme wave velocity by the model, Coles et al. (2003) employed extreme analysis for estimating rainfall, Pandey et al. (2004) used the model to evaluate the sea level, Öztekin (2005) in estimating river discharge, Jagger & Elsner (2006) in studying hurricanes, and Moisello (2007) employed the model to estimate the extreme rainfall.
The generalized Pareto distribution function has the following formula, In general, the two parameters of GPD are known as: the scale parameter σ (σ > 0), and the shape parameter ξ (−∞ < ξ < ∞).The shape parameter, ξ has remarkable effects of specifying qualitative behavior of Generalized Pareto distribution.In this paper, the shape parameter is considered as ξ = 0.The case, ξ = 0 considered as the exponential distribution which is not discussed here.

Threshold Value
First Technique: With regard to the above assumptions, specifying a suitable threshold is the first step of developing the model.Two methods are available to this aim: first, mean residual life (MRL) plot is applied as an exploratory technique, which is carried out prior to the model estimation (Fig. 2).The method is described as a sequence of x (1) , .considered as n (u) observations, exceeded over a threshold u, and x max is the greatest value of x i .Therefore, Eq. ( 2) is termed as the MRL formula, and denoted as, Second Technique of Determining Threshold Value: The second method to determine threshold value was introduced as the complementary technique to evaluate the stability of parameter estimation (Coles, 2001).In general, the method of work is to fit GPD for a range of thresholds, and searching for the stability of parameter estimates.Therefore, if GPD is a reasonable model for data exceeding a threshold u, then, the model should be reasonable for higher threshold u 1 .The argument suggests two graphs of shape and modified scale against different threshold values with confidence intervals for each of these quantities.
The two graphs are called probability and quantile plots.The technique is carried out after the model estimation (Figs. 4,5).

Determining The Parameters
Once the threshold value has been specified, estimating the shape and scale parameters would be the next step of developing the model.Based on the size of the data, different parameters estimators methods can be applied to estimate the GPD parameters.Mackay et al. (2011) conducted a comparison of several estimators by Monte Carlo simulation, and argued that likelihoodmoment (LM) estimators is the most substantial method with the lowest bias and variance.Zhang & Stephens (2009) proposed an estimation procedure, and called it modified likelihood moment estimators (LME).Dupuis & Tsao (1998) proposed a hybrid estimators of two well-known estimators: method of moments (MOM), and probability weighted moments (PWM).
The method was derived by incorporating a simple auxiliary limitation of the estimates.
In addition, de Zea Bermudez & Kotz (2010) carried out an extensive study on several types of methods for estimating the GPD parameters.The authors argued that the success of GPD on a set of data depends substantially on the process of parameter estimate, and maximum likelihood (ML) estimators has been the most popular method among other estimators.However, the ML estimators exists only for the shape parameter ξ ≤ 1 because for ξ > 1 the log-likelihood becomes infinity.In this study, the maximum likelihood (ML) estimators was employed to estimate the parameters.The log-likelihood formula for ξ = 0 was derived from Eq. (1) as follows, As foregoing discussion, numerical techniques are required to estimate the shape and scale parameters.

Return Level and Confidence Intervals
Subsequent to the estimation of the shape and scale parameters, the return levels should be estimated.Coles (2001) argued that the interpretation of extreme value models in terms of return levels are usually more convenient than using individual parameter values.Therefore, we assume that GPD with defined shape and scale parameters is an appropriate model for the exceeded data over a threshold u to estimate m-observation return level, x m , where σ and ξ are the scale and shape parameters, respectively.And ζ u is described as ζ u = P r{x i > u}, in which x i stands for the wave heights, and u is the threshold value.Consequently, the return level x m , which is exceeded on average once every m observations being the solution of Eq. ( 4).Also, the parameters m and ζ u can be evaluated by Eqs.(5, 6) as follows, in which ARI and N t represent, respectively, the Average Recurrence Interval (return period), and the total number of data during a period of k years.Equation ( 6) can be employed to compute ζ u , By substituting Eqs.(5, 6) into Eq.( 4), x m can be computed as, where N is the number of events exceeding a threshold u.Subsequently, the confidence interval for the return graph should be computed.There are several methods to derive confidence intervals for x m , variance-covariance matrix, and delta methods are reliable methods (Davison & Smith, 1990;Coles, 2001;Pickands, 1975).
To check the quality of the modeling, probability plot, return level plot and quantile plot are usually evaluated (Coles, 2001).

Peaks-Over-Threshold Model (Goda Method)
The Peaks-Over-Threshold (POT) model was introduced by Goda (1988), and has been developed in this study for further evaluation.The model can be developed with a long length dataset to obtain the results in a smaller range of confidence intervals (Goda, 2010).Following is a summary of developing the model which were described by Goda (1988Goda ( , 2000Goda ( , 2010)), as well as the proposed methods added by this study in order to improving the model: 1.After arranging the data in descending order, specifying the threshold value is the next step of the analysis.
2. Due to the lack of a suitable method to determine a true threshold value, this study introduced a reliable approach, which is called Hybrid method.The Hybrid method consists of two graphs, as following descriptions: Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.
At the first stage, the method of mean residual life (MRL) plot (Fig. 2) is recommended to be evaluated for determining a true threshold value.The plot is a part of the GPD model for determining threshold value which was described in the previous section.In this study, for the first time, the MRL plot was employed for determining threshold value for the POT model.The prerequisite condition to employ the MRL plot is to secure the iid measurement for the data contributing in the analysis (Coles, 2001).The MRL plot is based on the determination of deviation from mean in the population.
It is obvious that the extreme value analysis methods, such as the POT model, deal with standard deviation (errors of residuals) from the mean.Accordingly, the MRL plot can be employed in the POT analysis as the first part of Hybrid method to determine a true threshold value.
Second, the second part of the Hybrid method is done after determining the best fitting distribution function.Once the best fitting distribution function has been selected, several graphs of fitting for different threshold values should be drawn.For instance, in this study, several fitted Weibull distributions with shape parameter k = 1.4 have been depicted for different threshold values (Fig. 12).Ultimately, the most suitable threshold candidate(s) obtained from the first part (the MRL plot), should be assessed and evaluated.The thresholds, which were used in the figure can be chosen by considering the order and type of data.It is important to select the threshold values with considering the candidate(s) obtained from the MRL plot.Usually, 5 to 7 threshold values are sufficient to guide the user to make a suitable decision.
In most cases, the results of the first part of the Hybrid method determines a true threshold, however, the second part can be used for removing any misinterpretation of the MRL plot (more details in step (10)).
3. During evaluating a sample of extreme data, sometimes, an individual data exhibits a value much greater than other data.
In the process of fitting the sample of data to a candidate distribution, that particular data is plotted at a position far above the line of the fitted distribution of return period curve.Such data is known as outlier(s).Therefore, the sample of extreme data contributing in the analysis should be examined to explore for the presence of outlier(s).The process of searching for outlier(s) can be done by several methods such as a statistical examination by Bamnett & Lewis (1994) or using of DOL (Deviation of OutLier) criterion method Goda (2000); Goda & Kobune (1993).
4. Calculating the two following essential sample parameters: (a) Mean Rate λ, this parameter determines the number of observed data per year, λ = N T /k.Where N T , is the number of events and k, is the period of observed wave heights, in year.
(b) Censoring ν, it determines a fraction of data that contributes in the analysis, ν = N/N T , in which N is the number of data applied in the analysis and N T , number of the total data.
5. In this case, the POT model is applied with nine distribution candidates including FT-I, four FT-II distributions with fixed shape parameters 2.5, 3.33, 5, 10, and four Weibull distributions with fixed shape parameters 0.75, 1, 1.4, 2.Then, the Where A and B are the scale and location parameters, respectively; k is the shape parameter (dimensionless); and x stands for the extreme variate (wave height).
6.In the next step, the assignment of the non-exceedance probability F(m) , or plotting position for the nine distribution functions are determined (Eqs. 11,12,13).Gringorten (1963) was the first who introduced a formula to FT-I and claimed that "It yields no bias".Goda (1988) examined the formula and concluded the same result as, For the Weibull distribution, Petrauskas & Aagaard (1971) introduced a formula (known as P &A formula).Seventeen years later, Goda (1988) argued that the formula is biased, and proposed a modified version to rectify its biasness.Thus, the Goda or the modified P &A formula is written as follows, as follows, . The next step is to calculate the reduced variate y (m) for the m th ordered data.Equations (14, 15, 16) are the reduced variate for the three distribution families.
W eibull : 8. After determining the threshold value, the exceeded data over a certain threshold are employed in the analysis.Then, the plotting positions in Eqs.(11,12,13) are evaluated to use in Eqs.(14,15,16).The next step would be the estimation of the parameters A and B by means of the Least Squares method.The input data used in the Least Squares method are the extreme variates with the corresponding reduced variates for each distribution function (Goda, 2000).

Best Fitting for the Dataset
In this step, three methods are discussed: Correlation Coefficient method and MIR criteria which are recommended by Goda (2000), and the method of Norm of Residuals (N.o.R), presented by this paper.In this study, during several examinations, the results of MIR indicated some instabilities in obtaining the best fitting distribution function.
(c) Due to instabilities come with results of the MIR criterion, this study proposed an alternative method instead of the MIR criterion method.The method is called Norm of Residuals (N.o.R), as the following formula, where RSS is the residual sum of squares, y i and ŷi are the original data values and modeled values, respectively.
The value of N.o.R represents how well dataset fit with a statistical model.It varies between zero and infinity; however, values closer to zero indicate better fit.
(d) R square (R 2 ) is also calculated for each distribution.R 2 is always ranged between 0 and 1 (including both numbers), the closer value to 1 indicates better fit, it has the following formula, where TSS is the total sum of squares, and ȳ stands as the mean of y.
10. Second Part of Hybrid Method: Once the best fitting distribution for the dataset has been verified, the second part of Hybrid method is done to examine the reliability of the obtained threshold from the first part of conducting the Hybrid method.
In this step, two criteria, recommended by Goda (2000) are imposed on the candidates (the most probable true threshold values) obtained from the first part of Hybrid method.The two criteria are as follows: First, the threshold would be better set to maximize the number of storm.Second, if the threshold value is set too high, the number of exceedances data becomes too small.If it is considered too low, a sequence of several storm events could be judged as one event and the number of data will be decreased.
The two recommendations (criteria) should be considered during the visual assessment of the graphs in (Fig. 12) to choose the most suitable candidate obtained from the first part of Hybrid method.
11.In this study, the credibility of the N.o.R method was examined by another goodness-of-fit test .The Chi-Square test has been a reliable method to examine the distribution goodness-of-fit (Romeu, 2003).It uses the following formula, 12.The return values were estimated using Eqs.(22 -25), which is the next step of developing the POT model, where Â and B are the estimated scale and location parameters, respectively.Xm is the return value, and y R is the reduced variate, which is calculated as a function of the return period.In other words, by using the following equations, the return values are estimated by extrapolating the best-fit probability distribution.
where λ is the mean rate of the sample.
13.The 90% confidence intervals band was calculated by empirical formula introduced by Goda (1992).The empirical formula were categorized and described several years later as a textbook (Goda, 2000).

Results and Discussions
3.1 The GPD Model

Threshold Value
Coles (2001) recommended to use two techniques to determine the threshold value for the GPD model; the plot of mean excess against threshold (Fig. 2), and the complementary technique to fit GPD at a range of thresholds (Figs. 4, 5).
The first method, Fig.
(2) represents that the graph follows a linear trend until the threshold exceeds 2 m.After which, from u = 2.2 m, it is approximately linear until u = 2.9 m.If we choose u = 2.9 m, there will be only 43 exceedances of the threshold, not quite enough to make inferences, and the analysis leads to high variances.Moreover, the information in the plot with a large threshold value could be unreliable due to the low number of data on which the estimate of return periods and confidence interval are based.Accordingly, it is better to conclude u = 2 m as the threshold value for the data.

Parameter Estimation
The ML estimators is applied to estimate the shape and scale parameters for the exceedances data, which are equal to 263 independent storm events.The defined threshold is u = 2 m and the estimated parameters are ξ = −0.1442and σ = 0.6451.
Numerical method was used to calculate the parameters.However, because of converging the estimation to problem, care must be taken to avoid instabilities in numerical work around ξ = 0.As discussed in the preceding section, ξ = 0 is the exponential distribution, in which the shape of generalized Pareto distribution could be heavier or lighter tailed alternatives.Hence, the restricted attention should be assigned to the case −1/2 < ξ < 1/2, except ξ = 0.

Return Plot and Confidence Band
The return plot of the GPD model in  The dotted lines indicate the upper and lower confidence intervals band.In this research, the confidence intervals were calculated using the variance-covariance matrix and the delta method (Coles, 2001).
To provide the variance-covariance matrix the following evaluations are required: The first and second derivatives of the log-likelihood formula with respect to the parameter σ can be obtained as, The first and second derivatives of the log-likelihood formula with respect to parameter ξ can also be obtained as, To calculate the second derivatives of log-likelihood formula with respect to σ and ξ, the first derivative of the log-likelihood formula should be derived with respect to parameter ξ.The second derivative of Eq. ( 28) with respect to σ can be derived as, The results of Eqs.(27,29,30) are substituted in the following matrix, Ultimately, the confidence intervals band could be obtained as the following procedure: after substituting the calculated values in the M matrix, the inversion of the matrix should then be calculated.The return matrix will be the Variance-Covariance matrix for the exceeded data over a threshold u.Then, the delta method can easily be calculated based on the instructions provided by Coles (2001).

Complementary Technique (Threshold Value)
The plots of modified scale (σ * = σ − ξu) and shape against threshold values are the complementary technique to fit GPD at a range of threshold values, and to check the stability of parameter estimates.If GPD is a reasonable model for exceedances data over a threshold u, then excesses of a higher threshold, u 1 should also follow a generalized Pareto distribution.It should be noted that the shape parameters of the two situations are identical (for u and u 1 ).The GPD modified scale parameter and shape parameter are plotted against different threshold values (Figs. 4, 5).
The figures represent that the modified scale and shape parameters are approximately constant until u = 2m, after which the graphs slightly rise and decline, respectively.Thus, the threshold value for the Labuan dataset is selected as u = 2m for this GPD analysis.

Sensitivity of GPD With Different Threshold
The examinations on the model of Generalized Pareto Distribution indicated that the model is sensitive with any change in the value of threshold.The threshold value at 2.9 m represents a sharp increase in graphs 5, 10, 25 and 50 years.The instability in the graphs at 2.9 m occurred due to the low number of exceeded data contributing in the analysis.In addition, low sensitivity for 5 and 10 years return periods can be observed for different threshold values.
However, for higher return periods the GPD model represents different behaviors (higher sensitivity) in estimating the significant wave height.

Best Fitting Distribution (POT model)
In this study, based on Goda (2000) recommendation, the Correlation Coefficient (C.C) and MIR criterion were calculated to 5 find the best fitting distribution.Table (2

Determining Outliers
Usually, at this stage of developing the POT model, data should be assessed in order to find out the possible outlier(s) among the  Table 2.The results of three goodness of fit methods; proposed by Goda (2000), and this paper.outlier(s) using suitable methods.In this study the recommended DOL criterion method by Goda (2000) was employed.By definition, if the DOL method detected any outlier, it is recommended to check the process of data acquisition.If no error was occurred, therefore, instead of removing the outlier, the first fitting distribution should be removed from the list of distribution candidates.Then, the second best fitting distribution should be chosen for the analysis.
In this study, the DOL method has found no any outlier among the data.However, to explain the proposed methods and to

5
show the instability of the MIR method, it is assumed that there has been an outlier among the dataset.As foregoing discussions,
if DOL detected any outlier, the data should be checked to find out human errors or instrument malfunction in the process of data collection.If the quality of data has proved that there is no error in the data acquisition process, then instead of removing the data, the first fitting distribution should be removed from the analysis.Table (2) shows that the first fitting distribution is Weibull (with k=1.4).
Therefore, the second candidate shows its importance, for instance, in this case, the C.C, R 2 and N.o.R methods represent Weibull (k = 1) as the second candidate (Table 2).However, the MIR criterion method represents a different candidate (FT-II with k=10).Means that the methods obtained different parent distribution as the best fitting parent distribution function.
Therefore, another goodness-of-fit method should be used to specify a suitable parent distribution.In the next section, the Chi-Square test is developed to figure out the best parent distribution function.

Developing the Chi-Square Method
Evaluating the underlying distribution by a goodness-of-fit method such as the Chi-Square method offers a suitable process to examine the accuracy of selected distribution as the best fitting for a set of data.By definition, in extreme value analysis, a particular distribution is usually considered as the underlying distribution, and it is assumed that the data should follow this pre-specific distribution.Corder & Foreman (2009) and Romeu (2003) in separate literatures argued that this assumption could be risky if the considered underlying distribution is not correct, and consequences of incorrectly identifying a distribution may prove costly.Vivekanandan (2015) employed the Chi-Square test to evaluate the adequacy of fitting of probability distributions for their data and noted that the Chi-Square method is a credible approach.
The results of Chi-Square test were listed in Table (3).First, the dataset should be divided into 9 bin-sectors (k = 9 subintervals) by end points 2.12, 2.2, 2.3, 2.5, 2.7, 2.85, 3.1 and 3.6, which are listed in the second column of the table.By applying the probability density function (PDF) of Weibull distribution, the non-exceedance probabilities are calculated in third column.
In fourth column, the probabilities of the individual sub-interval from the third column are listed.Then, in the column fifth, the results in column four should be multiplied by the total sample size contributed in the analysis.In the sixth column, the number of samples from each sub-intervals are listed.Finally, in the last column the test parameters are calculated.
The Chi-Square table (http://www.unc.edu/farkouh/usefull/chi.html) includes the information of probabilities of several percentage of confidence intervals and values of freedom where k is the number of bins and n is the number of parameters for the assumed distribution (Romeu, 2003).
According to the table of Chi-Square, the crosspoint of two values consisting of 95% probability, and DF = 6, spots the value of 12.53.The calculated Chi-Square statistic value for Weibull distribution based on Table (3) is 10.66, which is less than 12.53.Therefore, it is assumed that the distribution of the population originating from the data is Weibull, which proves the accuracy of the Norm of Residual and Correlation Coefficient methods, and shows the instability of the MIR criterion method.In this study, for the first time, the sensitivity of the POT model with different threshold values was scrutinized (Fig. 12).The

Second Part of the Hybrid Method
Once the best fitting distribution for the dataset has been verified, it is important to examine the reliability of the threshold obtained through the first part of the Hybrid method.To select suitable threshold value, Goda (2000) noted that the threshold would be better set to maximize the number of storm wave heights.However, if the threshold is considered too low, a sequence of several storm events could be judged as one event and the number of data will be decreased, and the inference leads to bias.On the other hand, if the threshold value is set too high, the number of exceedance data becomes too small, again it leads to bias. Figure ( 12) represents several return periods corresponding to different thresholds (0.5, 1, 1.5, 2, 2.6, and 3 m).The results from the first part of the Hybrid method can be visually judged on Fig. ( 12) with considering the above-mentioned two criteria for lower and higher thresholds.For instance, in this case, in the first part of Hybrid method, u = 2.9 m was one of the another possible threshold candidate (Fig. 2), however, based on the second limitation and visually judgment, u = 2.9 m is not an appropriate threshold value because it is too high.Thus u = 2 m is selected as the best choice.

Return Plot
The return plot in Fig. ( 13) represents that the Weibull distribution with k = 1.4 is sufficiently fitted for the date.The graph was depicted based on average recurrence intervals on the x axis to evaluate the significant wave heights in different return periods.

Comparison of the Two Models
The higher number of data contributing in the analysis for the models of POT and GPD provide narrower confidence intervals band.Therefore, choosing lower threshold value (if possible), could be an easiest way to increase the number of data, consequently, the narrower confidence intervals band will be obtained (for both models).Thompson et al. (2009) proposed a method to determine threshold value for the GPD model, and argued that lower threshold value yields narrower confidence intervals band.It means that the higher number of data returns lower uncertainties in the GPD model.However, if the threshold value was selected too low, it is likely to violate the asymptotic basis of the model and the model leads to bias (Coles, 2001).In contrast, if the threshold value was chosen too high, little excessive data remain to estimate the model, and the inference leads to high variance.Therefore, a suitable selection of threshold value makes a balance between bias and variance in GPD model (Scarrott & MacDonald, 2012).
The comparison of Fig. ( 13) with Fig.
(3) discloses that the 90% confidence intervals band obtained by the POT model is narrower than the one estimated by the GPD model.It means that there are lower uncertainties in using the POT model.With regard to the importance of lower uncertainties as a significant criterion for designing maritime structures, narrower confidence intervals band is desirable.There are many reasons that can change the uncertainties of confidence intervals band (in both models) such as the number of data contributing in the analysis, the accuracy of the estimators methods (the measurement of their biasedness), the accuracy of choosing threshold value, and the compatibility of the data with the extreme model (it directly or indirectly relates to the iid measurement of the data).Therefore, since the two models have used the same dataset in this study, the narrower confidence band is an advantage for the POT model.However, with regard to the POT inferences, the fixed shape parameter can cause unrealistic narrower confidence band, but for this model, it has not been proven, yet.
The estimated return values for the GPD and POT models, as well as the report of the Coastal and Offshore Engineering Institute, Universiti Teknologi Malaysia are shown in Table (4 However, it should be noted that the Gumbel distribution function was one of the POT distribution candidates (Fig. 9).Based on the results of the N.o.R method, it was not a best fitting distribution for the dataset (Table 2).Thus, the use of a such stable method (norm of residual) to determine the best fitting distribution shows its importance, again.

Conclusions
In this study, two commonly used extreme wave analysis models, namely Peaks-Over-Threshold (POT) and Generalized Pareto Distribution (GPD) were developed in order to highlight their merits and limitations, and compare them together for evaluating The data were collected within the Marsden Square Numbers 2554, 2555, 2564 and 2565 (the rectangular area within the magnified section of Fig. (1)) in the Federal Territory of Labuan off the coast of Sabah, Malaysia.The Labuan metropolitan coast is located at the South China Sea on the continental shelf of Malaysia.National Climatic Data Center, USA compiled the dataset, which is known as Labuan data.The Labuan data are available from the wave database of the Drainage and Irrigation Department, Malaysia.

Figure 1 .
Figure 1.Location of Federal Territory of Labuan, and the surrounding offshore area.

8
23 √ k Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.The plotting position empirical formula for the FT-II distribution function, was introduced byGoda & Onozawa (1990) (a) The Correlation Coefficient of the order statistics (data) x m and their corresponding reduced variates y (m) should be computed to determine the best fitting distribution for the dataset.The largest value represents a distribution function with the best fit for the dataset.In some literatures, it has been shown by R 2 .The correlation coefficient value is the square root of R 2 .(b)The MIR criterion is a method to evaluate the reliability of the results of correlation coefficient (the best fitting distribution is the one with the smallest ratio).However,Goda (2000) recommended that the criterion of the correlation Ocean Sci.Discuss., doi:10.5194/os-2016Discuss., doi:10.5194/os--47, 2016     Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.coefficient (CC.) produces better results than that of the MIR criterion, and concluded that further examination on the MIR criterion is required.
21) Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.Where o i is the observed number of data in the specific bin, e i is the expected number of data in bin i, and k stands for the total number of bins.k − 1 − nep, is the Chi-Square degrees of freedom, in which nep is the abbreviation of Number of Estimated Parameters.The Chi-Square test is not part of the POT model.In this study, it has exceptionally been used to examine the reliability of the N.o.R method.
Fig. (3) represents that the GPD model was fitted for the wave heights data.The abscissa of the graph represents the average recurrence intervals (ARI) from 1 to 100 years against the significant wave height (SWH) 10 at y axis.

Figure 3 .
Figure 3.Return plot for the GPD model, the graph sufficiently fitted for the wave height data.
Figure (6) represents the sensitivity of return values (SWH) of the GPD model with different thresholds (u) in a range of return periods (ARIs).10 14 Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.

Figure 6 .
Figure 6.Sensitivity of the GPD model with different threshold u.
Examinations on GPD ModelQuantile-quantile (QQ), probability-probability (PP) plots(Figs.7, 8), and return plot (Fig.3) leave no doubt that the generalized Pareto distribution model is fitted for the dataset.The three described plots represent the quality of the fitted GPD for the dataset.If the GPD model is reasonable for the data excesses, the points in the QQ and PP plots should be close to the diagonal line(Figs.7, 8), meaning that they should be arranged in a linear manner.In both plots, the points are sufficiently close to the 10 diagonal line to lend support to the fitted model.

Figure 7 .
Figure 7. Probability plot for the GPD model.

Figure 8 .
Figure 8. Quantile plot for the GPD model.

Figure 9 .
Figure 9. Fitting FT-I (Gumbel) distribution function for the data, the threshold value u = 2 m, and the horizontal axis for the fitted graph is Reduced Variate (dimensionless).

Figure 10 .
Figure 10.Fitting four FT-II (Frechét) distribution functions for the data, the threshold value u = 2m, and the horizontal axis for the fitted graphs are Reduced Variates (dimensionless).
) shows the results of Correlation Coefficient and MIR criterion, as well as the results of the Norm of Residuals (N.o.R) method.All three methods (MIR, N.o.R and C.C) show that the Weibull distribution (with k=1.4) is the best fitting distribution function among nine distribution candidates.
10 dataset.The possibility of existing any outlier among the data could early be anticipated by doing data analysis and drawing box plot to check outlier(s).For instance, if the box plot represented any outlier, the data should be re-evaluated for any plausible Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.

Figure 11 .
Figure 11.Fitting four Weibull distribution functions for the data, the threshold value u = 2 m, and the horizontal axis for the fitted graphs are Reduced Variates (dimensionless).
figure represents the best fittings of Weibull distribution with shape parameter k = 1.4 for exceeded data over several thresholds versus reduced variate.The advantage of using reduced variate for abscissa rather than ARI is that the graphs can separately be drawn, consequently, they could be evaluated easily.The figure shows that lower thresholds yield higher return values due 5

Figure 12 .
Figure 12.Sensitivity of the POT model in estimating 100 years return values when different threshold values are employed, [The top end of the estimated lines indicate the return values estimated for 100-year return periods (the graphs are fitted Weibull distribution with k = 1.4,and u = 0.5, 1, 1.5, 2, 2.6 and 3)].

Figure 13 .
Figure 13.The return plot of the POT model with 90% confidence intervals band upper and lower the estimated line.
).The obtained return values by the GPD model are remarkably close to those estimated by the POT model.The proximity of the return values represents that the proposed methods -Norm of Residual and Hybrid methods -have improved the POT model.However, the third row of Table (4) represents the return values obtained by the Gumbel distribution, and using the same dataset.It indicates a higher return value for 100 years return period.
OceanSci.Discuss., doi:10.5194/os-2016-47,2016   Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.their results.The results of examinations represented that the POT model introduced by Goda in 1988 was not equipped with a suitable approach to determine threshold value, moreover, the MIR criterion method introduced by the author to determine the best fitting distribution function has shown some instabilities in determining the results, especially where the parent distribution function was unknown.To deal with the limitations, two methods, namely Hybrid, and Norm of Residuals were proposed in this study.The drawn results are summarized as follows: i) the proposed Norm of Residuals method is a stable approach with reasonable results to specify the best fitting distribution function, and its credibility has been proven by Chi-Square test.ii) the proposed Hybrid method to select a suitable threshold value for the model of POT is reliable, feasible and completely matched with the model.iii) the obtained confidence intervals band by the POT model is narrower than the one obtained by the GPD model.The narrower band provides lower uncertainties which is desirable for the designers.The lower uncertainties of the POT model is related to several reasons: number of data contributing in the analysis, the employed distribution function, the 10 estimators method, and the approach of determining threshold value.iv) the proximity of the obtained return values of the two models (POT and GPD) demonstrated that the two models are credible and sufficiently fitted for the data, and the proposed approaches (N.o.R and Hybrid method) have strikingly improved the POT model as well.Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.Ocean Sci.Discuss., doi:10.5194/os-2016-47,2016 Manuscript under review for journal Ocean Sci.Published: 7 July 2016 c Author(s) 2016.CC-BY 3.0 License.

Table 1 .
Extreme wave heights estimated by Gumbel distribution and the Labuan data (UTM Archive).

Table 3 .
Intermediate values for the Weibull Chi-Square test.Sensitivity of the POT Model With Changes in Threshold Value

Table 4 .
The results of significant wave heights for a range of several return periods obtained by this study, and the Institute of Coastal and Offshore Engineering, Universiti Teknologi Malaysia (2009).