Spatial Bayesian hierarchical modelling of extreme sea states

A Bayesian hierarchical framework is used to model extreme sea states, incorporating a latent spatial process to more eﬀectively capture the spatial variation of the extremes. The model is applied to a 34-year hindcast of signiﬁcant wave height oﬀ the west coast of Ireland. The generalised Pareto distribution is ﬁtted to declustered peaks over a threshold given by the 99.8th percentile of the data. Return levels of signiﬁcant wave height are computed and compared against those from a model based on the commonly-used maximum likelihood inference method. The Bayesian spatial model produces smoother maps of return levels. Furthermore, this approach greatly reduces the uncertainty in the estimates, thus providing information on extremes which is more useful for practical ap-plications.


Introduction
A detailed knowledge of the extreme sea states affecting a region is essential for any marine activity. For shipping, offshore and coastal installations, or the deployment of devices such as wave energy converters, it is crucial to have accurate information on the extremes likely to be encountered during operational 5 lifetimes. These are typically expressed in terms of return levels and periods; for example, the level of significant wave height which is likely to occur on average once every 100 years. Extreme value theory provides statistical tools for such an analysis (Coles, 2001) and the methods have been widely applied in studies of ocean waves; reviews may be found in Vanem (2011) and Jonathan 10 and Ewans (2013). The background theory for this extreme value analysis is outlined in Section 2 below.
Models of extremes are often fitted to data-sets using a maximum likelihood approach. Although straightforward to implement, this can lead to large uncertainties in the parameter estimations and subsequent return levels 15 (Vanem, 2015). Obviously, we wish to reduce the levels of uncertainty and obtain meaningful results which are of practical use. Bayesian inference allows for a more detailed analysis of this uncertainty, by providing complete probability distributions for the parameters given the data (Gelman et al., 2013).
Our aim in this paper is to use Bayesian techniques to model the spatial vari- 20 ability of ocean wave extremes. We follow the approach of Cooley et al. (2007), who include a latent spatial process within a Bayesian hierarchical framework to capture the spatial dependence of precipitation extremes. This is described in detail in Section 3. Such a model has been applied to the study of temperature A description of the domain and data under study, along with model imple-2 mentation details, is given in Section 4. The results are presented in Section 5 35 with a discussion and conclusions in Section 6.

Background theory
There are a number of possible approaches to extreme value analysis. An introduction to the field may be found in Coles (2001). One fundamental 40 method is the block maxima approach. We consider a sequence of independent and identically-distributed random variables, Z 1 , Z 2 , . . ., and let M n = max (Z 1 , . . . , Z n ) be the maximum over a block of n values; for example, we may take M n to be the annual maxima in a multi-year set of significant wave height data. The extremal types theorem states that, under certain regularity 45 conditions, the distribution function of the M n will converge to a specific threeparameter form, known as the generalised extreme value (GEV) distribution.
A major disadvantage to this approach is the fact that, by using only the maxima from a given block size, we are discarding a lot of data. In this work we consider a data-set of hourly significant wave height, H s . Modelling with, 50 for example, annual maxima would be quite wasteful. An alternative is to model the excesses over a given threshold (Pickands, 1975). We assume that our sequence of independent random variables, Z 1 , Z 2 , . . ., satisfies the extremal types theorem described above. For large enough threshold u, the distribution function of the exceedances Y = Z − u, conditional on Z > u, is approximately 55 given by the generalised Pareto distribution (GPD) defined on the set {y : y > 0 and (1 + ξy/σ) > 0}. Here, ξ and σ are known as the shape and scale parameters, respectively, and have ranges −∞ < ξ < ∞ and σ > 0. For the limiting value when ξ = 0, we get the exponential distribution

65
Once we have the parameters of a distribution, we may compute the N -year return levels. For the GPD in (1), we have We write ζ u = P (Z > u) and can then find the return level z m , the level which is exceeded on average once every m observations, by solving Letting m = N n y , where n y is the number of observations per year, we arrive at the following expression for the N -year return level: For the case of the exponential distribution with ξ = 0, we have Given a set of data, we may fit one of the models described above. The maximum likelihood (ML) estimation method is commonly used. We can consider a set of n independent values, z 1 , . . . , z n , to which we wish to fit a probability density function f (z; θ), where θ is a parameter of the distribution. The likelihood function is given by The maximum likelihood estimatorθ is found by maximising the above likelihood function or, more usually, the logarithm of L(θ). Asymptotic properties of the ML estimate, which assume Gaussian behaviour, may then be used to compute confidence intervals. Furthermore, the so-called delta method provides confidence intervals for quantities derived from the parameter estimates; for ex-75 ample, the return levels in (3). Details of these are given in Coles (2001), along with a discussion of other methods for fitting and analysing uncertainty, such as the profile likelihood method.
A further alternative is to use Bayesian inference for parameter estimation (Gelman et al., 2013). Continuing the above example, we use Bayes' Theorem Thus, we arrive at a posterior distribution, f (θ|z), from a combination of the likelihood of the data and a given prior distribution f (θ). Whereas the ML method gives a point estimate of a parameter, with the Bayesian approach the parameter is described by a complete distribution. This allows us to characterise 85 the uncertainty in a natural way. Rather than appealing to asymptotic theory for confidence intervals, we may use, for example, the percentiles of the posterior distribution. For studies involving extreme value modelling of significant wave height, the global or regional data-sets used have come from satellites (Vinoth and Young, it was assumed that regional extreme precipitation is driven by a latent spatial process, defined by geographical and climatological covariates, and that effects not fully captured by the covariates are captured by the spatial structure in the 125 hierarchies, using Gaussian processes. Inference was then conducted using an MCMC algorithm. This approach has since been used in oceanographic applications by Oliver et al. (2014), to analyse extremes of sea surface temperatures.
In this current work, we apply a similar model to extreme significant wave

Model Details
The aim is to produce N -year return levels of significant wave height, H s . 135 We described earlier how these could be computed using (3). Given a dataset, we require a suitable threshold u, the parameters from the generalised Pareto distribution (GPD) for modelling the exceedances, and their probability of occurrence ζ u . The choice of threshold will be discussed later in Section 4.
For the exceedances and the probability ζ u , we follow the approach of Cooley The second describes the latent spatial process underlying the extremes in the region while the third layer consists of the prior distributions on the parameters controlling the second.

145
Using Bayes theorem, under a three-layer hierarchical model the inference for the vector of parameters θ 1 (for the GPD of exceedances or the probabilities ζ u ) is given by where the p j are the probability densities with indices associated with the levels of the hierarchy and Z(x) specifies the data at a given location x. We now 150 describe the two hierarchical models. A directed acyclic graph (DAG) depicting the hierarchal structure of the models in detail is given in Fig. 1. we reparameterise with φ = log σ. At this level we thus have two spatiallyvarying parameters for the distribution, which we collectively write as The first term in the hierarchy (5) is then derived from the density function for the GPD and given by the likelihood function where the indices i and k are such that z k (x i ) refers to the k-th exceedance 155 at grid-point x i . We have denoted the number of grid-points by n x and the number of exceedances at each point x i is then n i .

Process layer
Both φ(x) and ξ(x) are modelled as Gaussian processes (Banerjee et al., 2014) and so the second term in (5) will take the form A similar expression is used for p ξ (ξ(x)|µ ξ , Σ ξ ). Here | . | denotes the determinant and θ 2 above represents all of the hyperparameters for µ φ , µ ξ , Σ φ and Σ ξ , 160 to be discussed below.
A Gaussian process characterises an infinite-dimensional smooth surface such that any finite collection of n x points on the surface follows a multivariate normal distribution (above) of dimension n x . Such a smooth surface is an appropriate choice for the model parameters as we expect similar wave climates at nearby 165 locations.
In addition to distance, the effect of any other covariates may be readily incorporated into the model. For m covariates c (1) , . . . , c (m) , we write the mean vector in the general form In this work we have 170 used m = 3: the longitude, latitude and depth of a grid-point, and therefore we will have four coefficients α φ,j .
The use of the Gaussian process also offers great flexibility through the choice of the covariance matrix Σ φ . Here, we use the matrix given by where I is the identity matrix. The matrix E is given by an exponential corre-175 lation function and has components For two grid-points x i and x j , the vector d(i, j) has two components given by the differences in longitude and latitude between x i and x j .
The 2 × 2 matrix β is symmetric positive definite. Its entries measure how quickly spatial dependence drops off in the two different directions. The other 180 parameters of the covariance matrix appearing in (7) are the partial sill ς 2 φ and the nugget parameter τ 2 φ . Further details on variogram analysis may be found in Zimmerman and Li (2012).
As mentioned, we assume the same Gaussian process model for the shape parameter ξ. Similar expressions as those above are used for µ ξ and Σ ξ .

Priors layer
Finally for the third layer in the hierarchical model, priors must be assigned to the hyperparameters, which are assumed to be independent. For those in (6), a normal distribution with large variance was selected: the covariates were re-scaled to be centred on zero and priors α φ,i , α ξ,i ∼ N(0, 50) were used. A 190 lognormal prior was employed for the positive ς 2 φ and τ 2 φ parameters in (7); that is, their logarithm was assumed to have the normal distribution N(0,10).
For the entries of the matrix β in (8)

Modelling the frequency of exceedances
We now turn to ζ u , which is defined as the probability that the threshold u is exceeded and is needed in (3) to compute return levels. For a given choice of 200 threshold (discussed in Section 4), we let ζ(x i ) be the exceedance probability at the location x i . It is again assumed that there is a latent spatial process driving this and a hierarchical model is used, with data, process and prior layers.
At the data layer it is assumed that, at each grid-point i, the number of declustered threshold exceedances N i is a binomial random variable with m i trials (the total number of observations), each with a probability ζ(x i ) of being a cluster maximum. That is, The process layer is similar to that of the GPD parameter φ(x). Following Diggle et al. (1998), ζ(x i ) is first transformed using a logit transformation, where This is then modelled as a Gaussian process as before, with mean vector µ ζ and covariance matrix Σ ζ taking the same form as in (6) and (7), respectively.

205
The hyperparameters are then given the same prior distributions as described above in Section 3.1.3.

Data
The data used in this study comes from the 34-year hindcast described in using a data-set from a coarser-resolution hindcast.
Following on from this, we will focus on the region off the west coast indicated in Fig. 2. This domain contains n x = 334 nodes and has a depth ranging from 230 39m to 1902m (see Fig. 2(b)). In Fig. 3  We wish to apply the GPD model of threshold exceedance to this data-set.
The In this study we apply a similar, though slightly stricter, method of declus-270 tering to that of Caires (2011). Two successive sequences of exceedances are considered to be part of the same cluster and system if the time series drops below the threshold for 48 hours or less. We then use the peaks of each cluster for modelling with the GPD.

Model fitting 275
The GPD was fitted to the data-set discussed above using the spatial Bayesian  Fig. 2(b). Their locations and some hindcast details are listed in Table 1.

300
As discussed in Section 2.2 above, the ML approach produces a single estimate of a given parameter and confidence intervals may be derived from its asymptotic properties. We will consider 95% confidence intervals for our estimates and present the lower and upper bounds of these intervals. The Bayesian model, on the other hand, yields a distribution for the parameter. From the 305 values simulated by the MCMC algorithm, we will present the median value of this distribution and, again, confidence intervals bounded by the 2.5th and 97.5th percentiles.

310
In Figs. 4 and 5 we compare the estimates of the GPD shape, ξ, and scale, σ, parameters, respectively. In both cases, the BHM shows less variation over the domain when compared with the ML. Additionally, in particular for σ, the spatial variation is noticeably smoother in the BHM. This is to be expected given the latent spatial processes in the BHM, with the covariances of parameters at 315 different grid-points, given by (7) and (8) can be seen in more detail in Fig. 8, in which we show parameter estimates 330 with confidence intervals for both models for the four points detailed in Table 1.
Looking at Figs. 8(a) and 8(b) for shape and scale, respectively, we see that at each of these four points the confidence intervals for the BHM are contained within the larger intervals for the ML; in fact, this is true for σ at every gridpoint and for ξ at all but five grid-points. 335 We note that the estimates in Fig. 4 show a positive shape parameter in general. Looking at the confidence intervals in Fig. 6 and also, in particular, given the data. The appropriateness of this can be addressed when validating 345 the model. This will be discussed in detail in Section 6 below.

Return levels
We now turn to the N -year return levels of significant wave height, which we denote by H sN . In Figs. 9 and 10, we present spatial plots of H s20 and H s100 , respectively. The estimates are given along with the lower and upper bounds 350 of the 95% confidence interval. The overall patterns of the return levels are broadly similar for the estimates with both models (top panels of Figs. 9 and 10). The main differences can be seen in the size of the confidence intervals, which again are much narrower in the BHM. This is even more evident for H s100 in Fig. 10. showed the most extreme sea states.

370
Return level plots, as described in Coles (2001), are a useful diagnostic tool for assessing the fit of a model, in addition to illustrating the model estimates.
The return level estimates, along with the bounds of the confidence intervals, are plotted against the return period. Empirical return level estimates from the data are added as follows. Given any value z (i) in the data set, the return period is given by whereF (z (i) ) is the empirical distribution function; c.f. Sections 2.6.7 and 3.3.5 of Coles (2001) for more details.
These are shown in Fig. 11 for the four locations in Table 1  The Bayesian spatial model is further validated in Fig. 12 using a posterior predictive distribution . This has been constructed as follows: at a given grid-point i, we have n i exceedances over the threshold.
We then randomly draw n i values for the shape and scale parameters from the respective estimated posterior distributions; these describe n i separate GPDs.

385
For each of these, one value is generated. We now have n i observed and n i predicted exceedances. After ordering, the two sets should match since a good showed much narrower intervals throughout, yielding much higher levels of certainty with the results. This is of crucial importance, as a single point estimate alone is of little practical value without a meaningful measure of uncertainty.

430
Indeed, looking at the 1000-year return levels in Fig. 11, we see upper bounds of nearly 35m for the ML, which seem physically unrealistic.
The threshold chosen for this work was the 99.8th percentile of the H s data at each grid-point. As noted, this approach has been used by a number of authors, with various ranges of percentiles tested. Initially, we fitted the model using 435 lower percentiles: the 97th, following previous work in Clancy et al. (2015), and then the 99th. However, when analysing the validity of the fit, as discussed in Section 5.3, we found that both models (ML and BHM) greatly underestimated the higher return levels when using the 97th. Subsequently in an early draft of this work, modelling using the 99th percentile still showed a bias with the 440 return level intervals for the Bayesian model still failing to capture the higher extremes. Moving to the higher 99.8th as a threshold improved this greatly, as seen from Fig. 11.
As discussed earlier, a very high threshold is necessary for the asymptotic validity of the GPD model, but can result in too few points with which to fit. Increasing the threshold was also found to have a noticeable effect on the shape parameter. With the lower thresholds, the shape was negative, similar to other reported results (for example Caires, 2011            Observed H s (m)