Closed-skew Distributions
Simulation, Inversion and Parameter Estimation
Daniel Høyer Iversen
Master of Science in Physics and Mathematics
Submission date: June 2010
Supervisor: Karl Henning Omre, MATH
Norwegian University of Science and Technology
Department of Mathematical Sciences
Problem Description
To further develop theory based on the closed-skew Gaussian distribution and, if possible, generalize it to represent heavy-tailed, skew phenomena.
Particular issues:
- improved simulation algorithm
- improved parameter estimators
- definitions of spatially, stationary models
- generalizations
This MSc thesis is to be carried out at the Department of Mathematical Sciences under the guidance of Professor Henning Omre.
Assignment given: 18. January 2010
Supervisor: Karl Henning Omre, MATH
Preface
This report is the result of my work in the TMA4905 Master Thesis in the final semester of my Master of Science degree in Industrial Mathematics. The study was performed at the Department of Mathematical Sciences, NTNU, during the spring semester 2010.
I would like to thank my supervisor, Professor Henning Omre, for his helpful advice and guidance through the entire year. I would also like to thank my "officemates" Eivind Gausland and Morten Valberg.
Daniel Høyer Iversen
Trondheim, June 14, 2010
Abstract
Bayesian closed-skew Gaussian inversion is defined as a generalization of traditional Bayesian Gaussian inversion. Bayesian inversion is often used in seismic inversion, and the closed-skew model is able to capture the skewness in the variable of interest. Different stationary prior models are presented, but the generalization comes at a cost: simulation from high-dimensional pdfs and parameter inference from data are more complicated. An efficient algorithm to generate realizations from the high-dimensional closed-skew Gaussian distribution is presented. A full likelihood is used for parameter estimation of stationary prior models under an exponential dependence structure. The simulation algorithms and estimators are evaluated on synthetic examples. A closed-skew T-distribution is also presented to include heavy tails in the pdf, and the model is illustrated with some examples. In the last part the simulation algorithm, the different prior models and the parameter estimators are demonstrated on real data from a well in the Sleipner Øst field. The full likelihood estimator appears to be the best estimator for data with an exponential dependence structure.
Contents

Preface
Abstract
1 Introduction
2 Methodology
  2.1 The CS-Gaussian Distribution
  2.2 Properties of the CS-Gaussian Distribution
  2.3 Statistics for the CS-Gaussian Distribution
    2.3.1 Location Statistics
    2.3.2 Spread Statistics
  2.4 Examples of CS-Gaussian Distributions
  2.5 Bayesian CS-Gaussian Inversion
    2.5.1 Center of Likelihood Model
3 Simulation
  3.1 Algorithm A: Brute Force Simulation
  3.2 Algorithm B: Two Steps Simulation
  3.3 Algorithm C: Iterative Simulation
  3.4 Empirical Study
4 Stationary Prior Models
  4.1 x-space Prior
  4.2 tv-space Prior
  4.3 Comparing the Priors
5 Parameter Estimation in Prior Models
  5.1 x-space Prior
    5.1.1 Full Likelihood Model
    5.1.2 Localized Pseudo Likelihood Model
    5.1.3 Empirical Test for fL and Localized pL
  5.2 tv-space Prior
    5.2.1 Full Likelihood Model
    5.2.2 Pseudo Likelihood Model
    5.2.3 Empirical Test for pL
  5.3 Closing Remarks
6 The CST-distribution
  6.1 Multivariate T-distribution
  6.2 The CST-distribution
  6.3 Properties of the CST-distribution
  6.4 Examples of the CST-distribution
  6.5 Bayesian CST Inversion
7 Inversion Case Study
  7.1 Seismic Inversion Model
  7.2 Inversion Results
8 Closing Remarks and Further Work
Bibliography
A Proof of Properties for the CST-distribution
  A.1 Closed Under Linear Transformations
  A.2 Closed Under Conditioning
Chapter 1
Introduction
The objective of inverse problems is to obtain information about a parameterized physical system from observational data, based on prior information and theoretical relationships between model parameters and data. Inverse problem theory is used in geophysics to get information about the Earth's properties from physical measurements at the surface. An important inverse problem is found in seismology, where recorded seismic waves at the Earth's surface are used to estimate subsurface parameters. These inverse problems are often multidimensional and strongly affected by noise and measurement uncertainty.

One important subject is prediction of the P-wave impedance, z, in reservoirs from seismic stacked data. The impedance is defined as the product of the P-wave velocity, vp, and the density, ρ. A P-wave is a longitudinal wave, which means that the particles in the material move parallel to the direction of travel of the wave energy. The inverted properties are used to get information about reservoir characteristics such as lithology-fluid and porosity-permeability. Knowledge about these characteristics is important for the development and production of petroleum reservoirs. Oil companies rely on seismic analysis before selecting sites for exploratory oil wells. The seismic waves are generated by an explosion, and geophones are used to detect the signal. More information about seismology can be found in Sheriff et al. (1995).
From a statistical point of view, the solution of an inverse problem is represented by a posterior probability distribution. The aim of the inversion is not only to obtain a best solution, but also to characterize the uncertainty in the solution. A Bayesian setting makes it possible to combine available prior knowledge with the information from the measured data. In Mosegaard and Tarantola (1995) a model that can only be assessed by iterative algorithms is used. This approach is too computationally demanding to be applied in large-scale 3D studies. The approach in Buland and Omre (2003) relies on an analytical evaluation, hence it is extremely computer efficient.

The approach in Buland and Omre (2003) relies heavily on Gaussian assumptions, so it can be termed Bayesian Gaussian inversion. It is assumed that the prior model of the log-transformed elastic material properties is Gaussian. The log-transform is needed to obtain a linear relation between the seismic data and the elastic properties. A convolved linearized Zoeppritz relation with an additive Gaussian error term defines the likelihood model. These assumptions provide a Gaussian posterior model with model parameters analytically assessable from the prior and the likelihood model. The posterior model can then be determined for high-dimensional 3D problems, but some numerical approximations may be needed to assess the numerical values of the solution.
In Karimi et al. (2009) the Gaussian assumption in Buland and Omre (2003) is replaced by a closed-skew (CS) Gaussian assumption, which introduces skewness in the model for the variables involved. Under the assumption of a CS-Gaussian prior, the posterior model will also be CS-Gaussian, with model parameters analytically assessable.

The generalization to the CS-Gaussian distribution comes at a cost. The number of parameters in the model is larger than for the Gaussian distribution, and the inference from data is more complicated. In this study we focus on the inference of the CS-Gaussian distribution, different stationary priors and parameter estimation. Algorithms to generate realizations from the CS-Gaussian distribution are also studied.

The thesis proceeds as follows: In Chapter 2 the CS-Gaussian distribution is introduced and properties of this distribution are discussed. Bayesian CS-Gaussian inversion is also presented. In Chapter 3 different algorithms to generate realizations from the high-dimensional CS-Gaussian distribution are given. Different stationary prior models are defined in Chapter 4. Parameter estimation for the different priors is discussed in Chapter 5. In Chapter 6 a closed-skew T-distribution is defined, with properties and some examples. In Chapter 7 the simulation algorithms, prior models and parameter estimators are tested on real seismic data in a Bayesian setting. Finally we draw some conclusions and outline possible further work in Chapter 8.
Chapter 2
Methodology
2.1 The CS-Gaussian Distribution
The multivariate closed-skew (CS) Gaussian distribution was first introduced in
González-Farías et al. (2004). It is a generalization of the Gaussian distribution,
and the most important extension is that it adds skewness into the Gaussian model.
The most convenient way to define the CS-Gaussian distribution is to consider a
random vector t ∈ Rn and a random vector v ∈ Rq . Let the joint probability
density function (pdf) of (t, v) be multivariate Gaussian as:
$$\begin{bmatrix} t \\ v \end{bmatrix} \sim N_{n+q}\left( \begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v \end{bmatrix} \right), \qquad (2.1)$$
where µt and µv are expectation vectors and Σt, Σv and Γvt define covariance
matrices with proper dimensions. The multivariate Gaussian distribution of a
y ∈ Rk with dimension k, expectation µy ∈ Rk and covariance Σy ∈ Rk×k is
defined by the pdf:
$$y \sim N_k(\mu_y, \Sigma_y) = \phi_k(y; \mu_y, \Sigma_y) = \frac{1}{(2\pi)^{k/2}\,|\Sigma_y|^{1/2}}\exp\left\{-\frac{1}{2}(y - \mu_y)^T\Sigma_y^{-1}(y - \mu_y)\right\}.$$
Then the CS-Gaussian variable of interest x ∈ Rⁿ is defined as

$$x = [t \mid v \ge 0],$$

where a ≥ b for vectors with elements ai and bi is defined as ai ≥ bi for all elements i. Let p(x) be the generic term for a pdf of the argument, and let p(x|d) represent the conditional pdf of x given d. The pdf of the CS-Gaussian distribution is then given as:

$$x \sim p(x) = p(t \mid v \ge 0) = \frac{p(v \ge 0 \mid t)\,p(t)}{p(v \ge 0)} = \left[1 - \Phi_q(0; \mu_v, \Sigma_v)\right]^{-1}\left[1 - \Phi_q(0; \mu_{v|t}, \Sigma_{v|t})\right]\phi_n(t; \mu_t, \Sigma_t), \qquad (2.2)$$
with Φn(·; µ, Σ) and φn(·; µ, Σ) being the cumulative distribution function (cdf) and the pdf of an n-dimensional Gaussian distribution. So the CS-Gaussian distribution can be seen as a Gaussian distribution which is skewed by multiplication with a cumulative Gaussian distribution. The pdf will be unimodal, since it is a product of a unimodal function and a monotone function. The usual parameterization of the CS-Gaussian distribution is

$$x \sim CSN_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta) = \psi_{n,q}(x; \mu, \Sigma, \Gamma, \nu, \Delta) = \left[\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)\right]^{-1}\Phi_q(\Gamma(x - \mu); \nu, \Delta)\,\phi_n(x; \mu, \Sigma), \qquad (2.3)$$

with µ = µt, Σ = Σt, Γ = Γvt Σt⁻¹, ν = −µv, Δ = Σv − Γvt Σt⁻¹ Γtv, and with Ψn,q(x; µ, Σ, Γ, ν, Δ) being the cdf of the CS-Gaussian distribution. Δ and Σ must be positive definite matrices. Γ is called the skewness parameter, and Γ = 0 entails that the CS-Gaussian distribution coincides with the Gaussian distribution with parameters (µ, Σ). Hence the CS-Gaussian distribution should be seen as a generalization of the Gaussian distribution, and not as an alternative distribution. The parameter q may be seen as the degree of skewness freedom in the pdf, and q = 0 also entails that the CS-Gaussian distribution coincides with the Gaussian distribution. Note also that Δ = Σv|t, where Σv|t is the covariance of v given t.
There exists no analytical solution for the Gaussian cdf, Φq(·; ·, ·), and it is also computationally hard to estimate the value of Φq(·; ·, ·). Hence it is not possible to obtain an analytical expression for the CS-Gaussian pdf. This causes problems in several cases, and is the underlying problem in this thesis.
The transformation between the CS-Gaussian distribution for x and the Gaussian distribution for (t, v) is one-to-one, given by:

$$x = [t \mid v \ge 0] \sim CSN_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta) \iff \begin{bmatrix} t \\ v \end{bmatrix} \sim N_{n+q}\left(\begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v \end{bmatrix}\right) \qquad (2.4)$$

with

$$\begin{aligned}
\mu &= \mu_t \in \mathbb{R}^n & \mu_t &= \mu \in \mathbb{R}^n \\
\Sigma &= \Sigma_t \in \mathbb{R}^{n\times n} & \Sigma_t &= \Sigma \in \mathbb{R}^{n\times n} \\
\Gamma &= \Gamma_{vt}\Sigma_t^{-1} \in \mathbb{R}^{q\times n} \quad\iff\quad & \Gamma_{vt} &= \Gamma_{tv}^T = \Gamma\Sigma \in \mathbb{R}^{q\times n} \\
\nu &= -\mu_v \in \mathbb{R}^q & \mu_v &= -\nu \in \mathbb{R}^q \\
\Delta &= \Sigma_v - \Gamma_{vt}\Sigma_t^{-1}\Gamma_{tv} \in \mathbb{R}^{q\times q} & \Sigma_v &= \Delta + \Gamma\Sigma\Gamma^T \in \mathbb{R}^{q\times q}.
\end{aligned}$$
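This one-to-one mapping is straightforward to implement. The following is a minimal sketch, assuming numpy; the function names are ours and not from the thesis:

```python
import numpy as np

def csn_to_gauss(mu, Sigma, Gamma, nu, Delta):
    """Map CSN parameters (mu, Sigma, Gamma, nu, Delta) to the parameters
    of the underlying Gaussian vector (t, v), following Expression (2.4)."""
    mu_t = mu
    Sigma_t = Sigma
    Gamma_vt = Gamma @ Sigma                 # = Gamma_tv'
    mu_v = -nu
    Sigma_v = Delta + Gamma @ Sigma @ Gamma.T
    return mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt

def gauss_to_csn(mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt):
    """Inverse map, from (t, v)-space to CSN parameters."""
    Sigma_t_inv = np.linalg.inv(Sigma_t)
    Gamma = Gamma_vt @ Sigma_t_inv
    Delta = Sigma_v - Gamma_vt @ Sigma_t_inv @ Gamma_vt.T
    return mu_t, Sigma_t, Gamma, -mu_v, Delta
```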
2.2 Properties of the CS-Gaussian Distribution
Some favorable characteristics of the CS-Gaussian distribution that are relevant
for inversion problems are proved in González-Farías et al. (2004):
1. Linear combinations of the components of CS-Gaussian random variables are also CS-Gaussian random variables. Let x ∼ CSNn,q(µ, Σ, Γ, ν, Δ), let A be a deterministic l × n matrix with l ≤ n, and let c be a deterministic l-dimensional vector. Then

$$[y = Ax + c] \sim CSN_{l,q}(\mu_y, \Sigma_y, \Gamma_y, \nu_y, \Delta_y),$$

where

$$\begin{aligned}
\mu_y &= A\mu + c, \\
\Sigma_y &= A\Sigma A^T, \\
\Gamma_y &= \Gamma\Sigma A^T\Sigma_y^{-1}, \\
\nu_y &= \nu, \\
\Delta_y &= \Delta + \Gamma\Sigma\Gamma^T - \Gamma\Sigma A^T\Sigma_y^{-1}A\Sigma\Gamma^T.
\end{aligned}$$

So the CS-Gaussian distribution is closed under linear transformations.
2. The composition of two independent CS-Gaussian variables will also be CS-Gaussian. If x1 and x2 are two independent CS-Gaussian variables with distributions $x_1 \sim CSN_{n_{x_1},q_{x_1}}(\mu_{x_1}, \Sigma_{x_1}, \Gamma_{x_1}, \nu_{x_1}, \Delta_{x_1})$ and $x_2 \sim CSN_{n_{x_2},q_{x_2}}(\mu_{x_2}, \Sigma_{x_2}, \Gamma_{x_2}, \nu_{x_2}, \Delta_{x_2})$, then

$$y = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim CSN_{n,q}(\mu_y, \Sigma_y, \Gamma_y, \nu_y, \Delta_y),$$

where

$$n = n_{x_1} + n_{x_2}, \qquad q = q_{x_1} + q_{x_2},$$

$$\mu_y = \begin{bmatrix} \mu_{x_1} \\ \mu_{x_2} \end{bmatrix}, \quad
\Sigma_y = \begin{bmatrix} \Sigma_{x_1} & 0 \\ 0 & \Sigma_{x_2} \end{bmatrix}, \quad
\Gamma_y = \begin{bmatrix} \Gamma_{x_1} & 0 \\ 0 & \Gamma_{x_2} \end{bmatrix}, \quad
\nu_y = \begin{bmatrix} \nu_{x_1} \\ \nu_{x_2} \end{bmatrix}, \quad
\Delta_y = \begin{bmatrix} \Delta_{x_1} & 0 \\ 0 & \Delta_{x_2} \end{bmatrix}.$$
3. CS-Gaussian random variables conditioned on a subset of the components are also CS-Gaussian variables. Let x ∼ CSNn,q(µ, Σ, Γ, ν, Δ), with x1 ∈ Rᵏ and x2 ∈ Rⁿ⁻ᵏ being two subvectors of x = [x1, x2]. Further let the parameters associated with the subvectors be:

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \quad
\Gamma = \begin{bmatrix} \Gamma_1 & \Gamma_2 \end{bmatrix}.$$

Then

$$[x_1 \mid x_2] \sim CSN_{k,q}(\mu_{1|2}, \Sigma_{1|2}, \Gamma_{1|2}, \nu_{1|2}, \Delta_{1|2}),$$

where

$$\begin{aligned}
\mu_{1|2} &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}, \\
\Gamma_{1|2} &= \Gamma_1, \\
\nu_{1|2} &= \nu - (\Gamma_2 + \Gamma_1\Sigma_{12}\Sigma_{22}^{-1})(x_2 - \mu_2), \\
\Delta_{1|2} &= \Delta.
\end{aligned}$$
Note that a consequence of property 1 is that the marginal pdfs will also be CS-Gaussian, obtained by choosing A = b(i)ᵀ, with b(i) an n × 1 vector with zero entries except for element number i, so that xi = b(i)ᵀx. A consequence of properties 1 and 2 is that a sum of independent CS-Gaussian variables will also be CS-Gaussian.
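As a small illustration of property 1, the sketch below (assuming numpy; the helper names are ours) computes the parameters of y = Ax + c, and extracts the marginal of component i via A = b(i)ᵀ:

```python
import numpy as np

def csn_linear_transform(mu, Sigma, Gamma, nu, Delta, A, c):
    """Parameters of y = A x + c when x ~ CSN(mu, Sigma, Gamma, nu, Delta),
    following property 1."""
    Sigma_y = A @ Sigma @ A.T
    Sigma_y_inv = np.linalg.inv(Sigma_y)
    mu_y = A @ mu + c
    Gamma_y = Gamma @ Sigma @ A.T @ Sigma_y_inv
    Delta_y = (Delta + Gamma @ Sigma @ Gamma.T
               - Gamma @ Sigma @ A.T @ Sigma_y_inv @ A @ Sigma @ Gamma.T)
    return mu_y, Sigma_y, Gamma_y, nu, Delta_y

def csn_marginal(mu, Sigma, Gamma, nu, Delta, i):
    """Marginal of component i, using A = b(i)' and c = 0."""
    n = len(mu)
    A = np.zeros((1, n))
    A[0, i] = 1.0
    return csn_linear_transform(mu, Sigma, Gamma, nu, Delta, A, np.zeros(1))
```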
2.3 Statistics for the CS-Gaussian Distribution
We will here discuss location statistics and spread statistics.
2.3.1 Location Statistics

The mode, mean and median are three location statistics that are useful in characterizing the distribution. These statistics can be used for prediction and are, for x ∼ p(x), defined as:

- Mode: Mod(x) = arg maxₓ p(x).
- Mean: E(x) = [E(x1), E(x2), ..., E(xn)], where $E(x_i) = \int x_i\,p(x_i)\,dx_i$.
- Median: Med(x) = [Med(x1), Med(x2), ..., Med(xn)], where Med(xi) is defined by $\int_{-\infty}^{\mathrm{Med}(x_i)} p(x_i)\,dx_i = 0.5$.
For the Gaussian distribution these three statistics are identical, but for a CS-Gaussian distribution they are in general different, due to the skewness in the CS-Gaussian model. For the Gaussian distribution these statistics are also analytically obtainable, since they are given by the model parameter µ. It would be useful to be able to calculate the mean, mode and median for the CS-Gaussian distribution.
Mode. Since we know that the CS-Gaussian pdf is unimodal, the mode is found as the x that satisfies

$$\nabla_x\,p(x) = 0, \qquad \nabla_x = \left[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, ..., \frac{\partial}{\partial x_n}\right].$$

Let p(t | v ≥ 0) be as given in Expression (2.2). The differentiation gives

$$\begin{aligned}
\nabla_t\,p(t \mid v \ge 0) = {} & c_1\int_0^\infty \exp\left\{-\frac{1}{2}(v - \mu_{v|t})^T\Sigma_{v|t}^{-1}(v - \mu_{v|t})\right\}(v - \mu_{v|t})^T\Sigma_{v|t}^{-1}\Gamma_{vt}\Sigma_t^{-1}\,dv \cdot \exp\left\{-\frac{1}{2}(t - \mu_t)^T\Sigma_t^{-1}(t - \mu_t)\right\} \\
& + c_2\left(1 - \Phi_q(0; \mu_{v|t}, \Sigma_{v|t})\right)\exp\left\{-\frac{1}{2}(t - \mu_t)^T\Sigma_t^{-1}(t - \mu_t)\right\}(t - \mu_t)^T\Sigma_t^{-1} \\
= {} & 0.
\end{aligned}$$
It is hard to solve this equation with respect to t, hence an analytical solution for the mode is complicated to obtain. Numerical methods can be used instead. For example, stochastic simulation can be used, by generating realizations from the CS-Gaussian distribution and choosing the area with the highest density of points as the mode. More traditional optimization methods are complicated to use, due to the difficulty of analytically computing the value of the CS-Gaussian pdf. These methods are not investigated any further in this project.
Mean. The mean is given as the first moment. If x ∼ CSNn,q(µ, Σ, Γ, ν, Δ), then the moment generating function of x is given in González-Farías et al. (2004) as

$$M_x(s) = \frac{\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma^T)}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}\cdot\exp\left\{s^T\mu + \frac{1}{2}s^T\Sigma s\right\}, \qquad s \in \mathbb{R}^n.$$

The mean of the CS-Gaussian distribution is then given as $E(x) = \nabla_s M_x(s)\big|_{s=0}$. We have

$$\begin{aligned}
\nabla_s M_x(s) = {} & \Sigma\Gamma^T\,\frac{\nabla_s\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma^T)}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}\cdot\exp\left\{s^T\mu + \frac{1}{2}s^T\Sigma s\right\} \\
& + \frac{\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma^T)}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}\cdot\exp\left\{s^T\mu + \frac{1}{2}s^T\Sigma s\right\}\cdot(\mu + \Sigma s).
\end{aligned}$$

So the mean value of x is

$$E(x) = \nabla_s M_x(s)\big|_{s=0} = \mu + \Sigma\Gamma^T\eta, \qquad (2.5)$$

where

$$\eta = \frac{\nabla_s\Phi_q(s; \nu, \Delta + \Gamma\Sigma\Gamma^T)\big|_{s=0}}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}$$
(Dominguez-Molina et al. 2003). For a high-dimensional problem it is complicated to compute the mean, since there exists no analytical solution for Φq(·; ·, ·). But for q = 1 the mean will be

$$E(x) = \mu + \Sigma\Gamma^T\,\frac{\phi_1(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}{\Phi_1(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}.$$

To obtain the mean for a higher-dimensional problem, a numerical method can be used to compute Expression (2.5). Alternatively, $E(x_i) = \int x_i\,p_i(x_i)\,dx_i$ can be approximated numerically, for example by stochastic integration as

$$\widehat{E}(x_i) = \frac{1}{N}\sum_{j=1}^{N} x_i^{(j)},$$
where $x_i^{(j)}$ is realization number j of element xi from simulation of the CS-Gaussian distribution.
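For q = 1 the closed-form mean above is simple to evaluate. A minimal sketch, assuming numpy and scipy (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def csn_mean_q1(mu, Sigma, Gamma, nu, Delta):
    """Mean of a CSN_{n,1} variable via the closed form below Expression (2.5).
    Gamma is a (1, n) row vector; nu and Delta are scalars."""
    s2 = Delta + float(Gamma @ Sigma @ Gamma.T)   # variance in the cdf factor
    s = np.sqrt(s2)
    # phi_1(0; nu, s2) / Phi_1(0; nu, s2)
    ratio = norm.pdf(0.0, loc=nu, scale=s) / norm.cdf(0.0, loc=nu, scale=s)
    return mu + (Sigma @ Gamma.T).ravel() * ratio
```

As a check, for the standard skew-normal case µ = 0, Σ = 1, Γ = λ, ν = 0, Δ = 1 this reduces to the known value λ√(2/π)/√(1 + λ²).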
Median. The median is often used as a location measure instead of the mean. The median is not simple to obtain analytically, since it requires an inversion of the cdf of the CS-Gaussian distribution. Therefore it is necessary to estimate the median by numerical methods, for example by

$$\widehat{\mathrm{Med}}(x_i) = \mathrm{Med}\left\{x_i^{(1)}, x_i^{(2)}, ..., x_i^{(j)}, ..., x_i^{(N)}\right\},$$

where $x_i^{(j)}$ is realization number j of element xi from the CS-Gaussian distribution. It is possible to estimate the mean and the median from each marginal distribution, but since the marginal distribution depends on the full v-vector it will be faster to generate x⁽ʲ⁾ from the joint distribution.
2.3.2 Spread Statistics

The covariance and the quantiles are two spread statistics that are suitable for characterizing the variability in the model. These statistics can be used to describe the uncertainty in the predicted values and are defined as:

- Covariance: Cov(x) = E((x − E(x))(x − E(x))ᵀ).
- Quantiles: the α-quantile is Qα(x) = [Qα(x1), Qα(x2), ..., Qα(xn)], where Qα(xi) is defined by $\int_{-\infty}^{Q_\alpha(x_i)} p(x_i)\,dx_i = \alpha$.

For the Gaussian distribution the covariance is given as the model parameter Σ, while the quantiles must be found by numerical methods. It would be useful to be able to calculate these spread statistics for the CS-Gaussian distribution.
Covariance. The covariance is given as Cov(x) = E((x − E(x))(x − E(x))ᵀ) = E(xxᵀ) − E(x)E(x)ᵀ, so it can be derived from the first and second moments. The first moment is already found, and the second moment can be calculated as

$$E(xx^T) = \nabla_s\nabla_s^T M_x(s)\big|_{s=0} = \Sigma + \mu\mu^T + \mu\eta^T\Gamma\Sigma + \Sigma\Gamma^T\eta\mu^T + \Sigma\Gamma^T\Lambda\Gamma\Sigma,$$

where

$$\Lambda = \frac{\nabla_s\nabla_s^T\,\Phi_q(s; \nu, \Delta + \Gamma\Sigma\Gamma^T)\big|_{s=0}}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma^T)}.$$

So the covariance of x is given as

$$\mathrm{Cov}(x) = E(xx^T) - E(x)E(x)^T = \Sigma + \Sigma\Gamma^T\Lambda\Gamma\Sigma - \Sigma\Gamma^T\eta\eta^T\Gamma\Sigma$$

(Dominguez-Molina et al. 2003). It is not possible to compute the covariance analytically for a high-dimensional problem, so we need to use numerical approximations.
Quantiles. First note that Q0.5(x) is identical to the median, so inversion of the CS-Gaussian cdf is required to obtain the quantiles. As for the median, it seems that a numerical approximation is best for computing the quantiles, and also here it is natural to use stochastic simulation.
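All of these sample-based statistics reduce to one-liners once realizations are available. A minimal sketch, assuming numpy and an array X of N realizations (one row per realization) generated with one of the algorithms in Chapter 3:

```python
import numpy as np

def sample_statistics(X, alpha=0.1):
    """Location and spread statistics estimated from N realizations
    X of shape (N, n) drawn from the CS-Gaussian distribution."""
    mean = X.mean(axis=0)
    median = np.median(X, axis=0)
    cov = np.cov(X, rowvar=False)
    lower = np.quantile(X, alpha / 2, axis=0)        # Q_{alpha/2}
    upper = np.quantile(X, 1 - alpha / 2, axis=0)    # Q_{1-alpha/2}
    return mean, median, cov, (lower, upper)
```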
2.4 Examples of CS-Gaussian Distributions
The CS-Gaussian distribution is closely related to the Gaussian distribution and therefore has many similar properties. To illustrate the characteristics of the CS-Gaussian distribution, some examples will be shown. We will first consider a CS-Gaussian x ∈ R¹, with base case

$$\mu = 5, \quad \Sigma = 9, \quad \Gamma = 1, \quad \nu = 0, \quad \Delta = 0.05. \qquad (2.6)$$

Note that Σ, Γ and Δ here are one-dimensional matrices.
From Expression (2.3) we know that the CS-Gaussian distribution is proportional to a product of a Gaussian pdf with variance Σ and a Gaussian cdf with variance Δ. A lower variance, Δ, will give more skewness in the model. A higher variance, Σ, for the Gaussian pdf will also lead to more skewness. This can be demonstrated by considering the base case, but with three different values for Σ and Δ:

Σ1 = 9, Δ1 = 0.05;  Σ2 = 6, Δ2 = 0.5;  Σ3 = 3, Δ3 = 5.
[Figure 2.1: Pdf for different Σ and Δ, and µ = 5, Γ = 1 and ν = 0.]
The pdfs for these three cases are displayed in Figure 2.1.
The skewness in the model is not only determined by Δ and Σ; it is also influenced by ν. Even for a very small Δ and a very large Σ there will be no skewness in the model if ν → −∞. This effect is demonstrated by considering a CS-Gaussian x with the parameters given in Expression (2.6), but with three different values of ν:

ν1 = 0;  ν2 = −3;  ν3 = −9.
The three different pdfs are displayed in Figure 2.2. In the first case the pdf is very skewed, and in the third case the pdf is almost Gaussian. From these two examples we observe that a higher Σ will give a longer tail on the right side and a small Δ will give a short left tail. The CSN pdfs can be seen as a Gaussian pdf that here is cut off at the left side. The position of the cut-off is decided by ν and the sharpness is decided by Δ.

[Figure 2.2: Pdf for different ν, and µ = 5, Σ = 9, Γ = 1 and Δ = 0.05.]
The parameter µ is the only parameter that does not influence the skewness in the model. In an equivalent way as for µ in the Gaussian distribution, µ in the CS-Gaussian distribution only shifts the location of the distribution. The translation effect is demonstrated by considering a CS-Gaussian x with the parameters given in Expression (2.6), but with three different values of µ:

µ1 = 5;  µ2 = 7;  µ3 = 2.

All three pdfs are identical except for a shift in the density, as displayed in Figure 2.3.
[Figure 2.3: Pdf for different µ, and Σ = 9, Γ = 1, ν = 0 and Δ = 0.05.]
The same situations will occur for higher-dimensional problems, and we will here also study a particular CS-Gaussian distributed variable x with n = q = 2 and parameters

$$\mu_1 = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad
\Sigma_1 = \begin{bmatrix} 1 & 0.2 \\ 0.2 & 4 \end{bmatrix}, \quad
\Gamma_1 = \begin{bmatrix} 4 & 0 \\ 0 & 5 \end{bmatrix}, \quad
\nu_1 = \begin{bmatrix} -2 \\ 6 \end{bmatrix}, \quad
\Delta_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

The pdf of this two-dimensional CS-Gaussian distribution is shown in Figure 2.4.
Another example is shown in Figure 2.5 with

$$\mu_2 = \mu_1, \quad \Sigma_2 = \Sigma_1, \quad \Gamma_2 = \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix}, \quad \nu_2 = \nu_1, \quad \Delta_2 = \Delta_1.$$
Note that the only difference between these two examples is one element in Γ2. It is observed from Figure 2.4 and Figure 2.5 that there is more skewness in the first example. Setting one diagonal element in Γ2 to zero makes the element t2 almost uncorrelated with v, hence the marginal of t2 will be almost Gaussian with mean 7. This shows that Γ controls the skewness in the model. It is also observed that this small change in Γ also changes the location of the distribution in the x2-dimension. In the first example the mode is about (5, 9) and in the second example it is about (5, 7).

These examples show that Σ, Γ, ν and Δ all influence the location, spread and skewness in the model. This makes the inference in the model quite complicated.
[Figure 2.4: First bivariate CS-Gaussian example. Bivariate pdf and marginal pdfs.]
[Figure 2.5: Second bivariate CS-Gaussian example. Bivariate pdf and marginal pdfs.]
2.5 Bayesian CS-Gaussian Inversion
Let x ∈ Rⁿˣ be the variable of interest and consider a linear model

$$[d \mid x] = Hx + e, \qquad (2.7)$$

where d ∈ Rⁿᵈ is measured data with error e ∈ Rⁿᵈ. The objective is to assess x given d, and there exist several methods to do this. Here we will consider Bayesian inversion, so we assume a probability distribution for x and e. In Bayesian inversion we are interested in the probability distribution of x given some observed data d. Bayes' rule gives the posterior model as

$$p(x \mid d) = c \cdot p(d \mid x) \cdot p(x),$$

where c is a constant, p(d|x) is the likelihood and p(x) represents the prior knowledge about the variable of interest. The prior model for x is defined as

$$x \sim CSN_{n_x,q_x}(\mu_x, \Sigma_x, \Gamma_x, \nu_x, \Delta_x).$$
The error, e, is assumed to be independent of x and distributed as

$$e \sim p(e) = CSN_{n_d,q_d}(0, \Sigma_e, \Gamma_e, 0, I),$$

where I is the identity matrix of proper dimension. This gives

$$[d \mid x] \sim p(d \mid x) = CSN_{n_d,q_d}(Hx, \Sigma_e, \Gamma_e, 0, I).$$

Then the posterior model p(x|d) is given by

$$[x \mid d] \sim p(x \mid d) = CSN_{n_x,q_x+q_d}(\mu_{x|d}, \Sigma_{x|d}, \Gamma_{x|d}, \nu_{x|d}, \Delta_{x|d}),$$
where the parameters are given in Karimi et al. (2009) as

$$\begin{aligned}
\mu_{x|d} &= \mu_x + \Sigma_x H^T\left[H\Sigma_x H^T + \Sigma_e\right]^{-1}(d - H\mu_x) \\
\Sigma_{x|d} &= \Sigma_x - \Sigma_x H^T\left[H\Sigma_x H^T + \Sigma_e\right]^{-1}H\Sigma_x \\
\Gamma_{x|d} &= \left(\begin{bmatrix} \Gamma_x\Sigma_x \\ 0 \end{bmatrix} - \begin{bmatrix} \Gamma_x\Sigma_x H^T \\ \Gamma_e\Sigma_e \end{bmatrix}\left[H\Sigma_x H^T + \Sigma_e\right]^{-1}H\Sigma_x\right)\Sigma_{x|d}^{-1} \\
\nu_{x|d} &= \begin{bmatrix} \nu_x \\ 0 \end{bmatrix} - \begin{bmatrix} \Gamma_x\Sigma_x H^T \\ \Gamma_e\Sigma_e \end{bmatrix}\left[H\Sigma_x H^T + \Sigma_e\right]^{-1}(d - H\mu_x) \\
\Delta_{x|d} &= \begin{bmatrix} \Delta_x + \Gamma_x\Sigma_x\Gamma_x^T & 0 \\ 0 & I + \Gamma_e\Sigma_e\Gamma_e^T \end{bmatrix} - \begin{bmatrix} \Gamma_x\Sigma_x H^T \\ \Gamma_e\Sigma_e \end{bmatrix}\left[H\Sigma_x H^T + \Sigma_e\right]^{-1}\begin{bmatrix} \Gamma_x\Sigma_x H^T \\ \Gamma_e\Sigma_e \end{bmatrix}^T - \Gamma_{x|d}\Sigma_{x|d}\Gamma_{x|d}^T.
\end{aligned}$$
This is a generalization of traditional Bayesian Gaussian inversion, as presented in Mosegaard and Tarantola (1995) and in Buland and Omre (2003). This can be seen by setting Γe = 0 in the likelihood and Γx = 0 in the prior model. In Bayesian Gaussian inversion the prior is Gaussian and the likelihood is Gauss-linear; the posterior model is then Gaussian with parameters µx|d and Σx|d. The posterior model, p(x|d), can be used to predict x based on the data, d.
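To make the algebra concrete, the following is a minimal numpy sketch of the posterior parameter computation above; the function and variable names are ours, and the code directly mirrors the expressions rather than being an optimized implementation:

```python
import numpy as np

def csn_posterior(mu_x, Sigma_x, Gamma_x, nu_x, Delta_x, H, Sigma_e, Gamma_e, d):
    """Posterior CSN parameters for the model [d|x] = Hx + e, following
    the expressions from Karimi et al. (2009) above."""
    n_x = len(mu_x)
    q_x, q_d = Gamma_x.shape[0], Gamma_e.shape[0]
    A = np.linalg.inv(H @ Sigma_x @ H.T + Sigma_e)   # [H Sigma_x H' + Sigma_e]^{-1}
    resid = d - H @ mu_x
    mu_post = mu_x + Sigma_x @ H.T @ A @ resid
    Sigma_post = Sigma_x - Sigma_x @ H.T @ A @ H @ Sigma_x
    # Stacked helper blocks
    B1 = np.vstack([Gamma_x @ Sigma_x, np.zeros((q_d, n_x))])
    B2 = np.vstack([Gamma_x @ Sigma_x @ H.T, Gamma_e @ Sigma_e])
    Gamma_post = (B1 - B2 @ A @ H @ Sigma_x) @ np.linalg.inv(Sigma_post)
    nu_post = np.concatenate([nu_x, np.zeros(q_d)]) - B2 @ A @ resid
    D = np.block([
        [Delta_x + Gamma_x @ Sigma_x @ Gamma_x.T, np.zeros((q_x, q_d))],
        [np.zeros((q_d, q_x)), np.eye(q_d) + Gamma_e @ Sigma_e @ Gamma_e.T],
    ])
    Delta_post = D - B2 @ A @ B2.T - Gamma_post @ Sigma_post @ Gamma_post.T
    return mu_post, Sigma_post, Gamma_post, nu_post, Delta_post
```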
All three location statistics presented in Section 2.3 can be used as predictors for x given the data d, but we have chosen to use the median. Hence the prediction is $\hat{x} = \mathrm{Med}\{x \mid d\} = Q_{0.5}\{x \mid d\}$. The median is, as argued in Karimi et al. (2009), a good choice since it is logically related to the natural definition of the (1 − α)-prediction interval, $[Q_{\alpha/2}\{x \mid d\}, Q_{1-\alpha/2}\{x \mid d\}]$. The median is also equivariant with respect to any monotone function h(·), i.e. $Q_{0.5}\{h(x) \mid d\} = h(Q_{0.5}\{x \mid d\})$. Then, for example, $\hat{z} = \mathrm{Med}\{z \mid d\} = \exp\{\mathrm{Med}\{x \mid d\}\}$, where z = exp{x}. The median as a predictor is also robust with respect to deviations in the tail of the posterior model p(x|d). Note that the median is frequently used as a predictor for skew distributions instead of the mean, since the mean is less robust towards long tails than the median.
2.5.1 Center of Likelihood Model
The error term, e, of the likelihood model is not centered for all choices of the parameters Σe and Γe. It is normal to assume an error term with location measure zero, and with Γe = 0 the distribution reduces to a Gaussian distribution with location zero. Another way to make sure that the error term has zero median is to introduce a centered error, ec. The centered error term has the same parameters Σe and Γe as e, but includes an additional µec. The µec parameter should translate the distribution such that the median is zero, and is therefore defined as

$$\mu_{e_c} = -\mathrm{Med}(e).$$

The median of e must be obtained by a numerical method.
Chapter 3
Simulation
Simulation-based inference for the CS-Gaussian distribution will often be required. In order to estimate the median and the quantiles, as seen in Section 2.3, it is useful to generate random values, x, from the multivariate CS-Gaussian distribution. Even to make density plots, as shown in Figure 2.4 and Figure 2.5, it is useful to be able to simulate from the CS-Gaussian distribution, since it is computationally hard to compute the density directly. In this chapter three different algorithms to generate realizations from the multivariate CS-Gaussian distribution will be presented. An empirical study comparing the algorithms is given at the end of the chapter.
3.1 Algorithm A: Brute Force Simulation
The simplest way to simulate x from a CS-Gaussian distribution is to generate (t, v) from the multivariate Gaussian distribution given in Expression (2.1) with rejection sampling until v ≥ 0.
Algorithm A: Brute Force Simulation

- Repeat until v ≥ 0:
  Generate $\begin{bmatrix} t \\ v \end{bmatrix} \sim N_{n+q}\left(\begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v \end{bmatrix}\right)$
- x ← t

Then x will be a realization of the CS-Gaussian distribution.
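A minimal sketch of Algorithm A, assuming numpy (the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_csn_brute_force(mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt):
    """Algorithm A: draw (t, v) jointly Gaussian and accept when v >= 0."""
    n = len(mu_t)
    mean = np.concatenate([mu_t, mu_v])
    cov = np.block([[Sigma_t, Gamma_vt.T],
                    [Gamma_vt, Sigma_v]])
    while True:
        tv = rng.multivariate_normal(mean, cov)
        if np.all(tv[n:] >= 0.0):      # accept only if every v_i >= 0
            return tv[:n]              # x <- t
```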
This algorithm is not efficient for µv ≪ 0 or when elements in v are highly negatively correlated, since it is then very unlikely to generate vi ≥ 0; i = 1, ..., q at the same time. When v is high-dimensional it will be almost impossible to generate all vi ≥ 0; i = 1, ..., q simultaneously.
3.2 Algorithm B: Two Steps Simulation
A more efficient algorithm is to first generate v until v ≥ 0, and then generate t conditional on v. Consider the joint distribution p(t, v); then

$$p(v \mid v \ge 0) = \frac{p(v, v \ge 0)}{p(v \ge 0)} = c \cdot p(v) \cdot I(v \ge 0),$$

where c = [p(v ≥ 0)]⁻¹ is a constant and I(·) is an indicator function taking value 1 if the argument is true and 0 otherwise. Further we have that

$$p(t, v \mid v \ge 0) = \frac{p(t, v, v \ge 0)}{p(v \ge 0)} = c \cdot p(t, v) \cdot I(v \ge 0).$$

Then the distribution of x can be written as

$$x = [t \mid v \ge 0] \sim p(t \mid v \ge 0) = \int_{-\infty}^{\infty} p(t, v \mid v \ge 0)\,dv = \int_{-\infty}^{\infty} c \cdot p(t, v) \cdot I(v \ge 0)\,dv = \int_0^{\infty} c \cdot p(t, v)\,dv.$$
We want to generate x from this distribution, which is the same as first generating

$$v \sim p(v \mid v \ge 0),$$

and then generating x as

$$x = [t \mid v] \sim p(t \mid v).$$

This is valid since x is then generated from

$$x \sim \int_{-\infty}^{\infty} p(t \mid v) \cdot p(v \mid v \ge 0)\,dv = \int_{-\infty}^{\infty} p(t \mid v) \cdot c \cdot p(v) \cdot I(v \ge 0)\,dv = \int_0^{\infty} c \cdot p(t, v)\,dv = p(t \mid v \ge 0).$$

Since p(t, v) is Gaussian, p(t|v) will also be Gaussian, with mean and variance given as

$$E(t \mid v) = \mu + \Gamma_{tv}\Sigma_v^{-1}(v + \nu), \qquad \mathrm{Var}(t \mid v) = \Sigma_t - \Gamma_{tv}\Sigma_v^{-1}\Gamma_{tv}^T.$$
Thus we can first generate v until all elements vi; i = 1, ..., q are greater than zero, and then generate t from the corresponding Gaussian distribution. Hence we reduce the rejection problem to a lower dimension. The algorithm is:

Algorithm B: Two Steps Simulation

- Repeat until v ≥ 0:
  Generate $v \sim N_q(-\nu, \Delta + \Gamma\Sigma\Gamma^T)$
- $x \leftarrow [t \mid v] \sim N_n\left(\mu + \Gamma_{tv}\Sigma_v^{-1}(v + \nu),\ \Sigma_t - \Gamma_{tv}\Sigma_v^{-1}\Gamma_{tv}^T\right)$

Then x will be a realization of the CS-Gaussian distribution.

This is a more efficient algorithm than the first, since we generate q values instead of n + q values in each step. But also this algorithm requires that vi ≥ 0 for all i happens simultaneously.
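A minimal sketch of Algorithm B, assuming numpy and the CSN parameterization (µ, Σ, Γ, ν, Δ); the function name is ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_csn_two_steps(mu, Sigma, Gamma, nu, Delta):
    """Algorithm B: rejection-sample v ~ N(-nu, Delta + Gamma Sigma Gamma'),
    then draw t | v from the conditional Gaussian."""
    Sigma_v = Delta + Gamma @ Sigma @ Gamma.T
    Gamma_tv = Sigma @ Gamma.T                  # = Gamma_vt'
    Sigma_v_inv = np.linalg.inv(Sigma_v)
    cond_cov = Sigma - Gamma_tv @ Sigma_v_inv @ Gamma_tv.T
    while True:
        v = rng.multivariate_normal(-nu, Sigma_v)
        if np.all(v >= 0.0):
            break
    cond_mean = mu + Gamma_tv @ Sigma_v_inv @ (v + nu)
    return rng.multivariate_normal(cond_mean, cond_cov)
```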
3.3 Algorithm C: Iterative Simulation
An even more efficient method can be constructed by sequential rejection sampling combined with an MCMC approach. We will use the same principle as in the previous algorithm, by first generating v ≥ 0 and then generating t. But instead of generating all vi ≥ 0; i = 1, ..., q simultaneously, each vi will be generated sequentially by an iterative method.

To generate vi ≥ 0, a rejection sampling approach as described in Robert (1995) is used. First we present the general rejection algorithm to generate realizations from h(w). Choose a distribution g(w) that is similar to h(w), but easier to simulate from. Choose c to be a constant such that h(w) ≤ c · g(w) for every w in the support of h(w). Then a w is generated from g(w) and a u from the uniform distribution between 0 and 1, U1(0, 1). The w will be a realization from h(w) when w and u satisfy u ≤ h(w)/(c · g(w)).
Let h(v; µ⁺, σ⁺, v0) be the truncated univariate Gaussian distribution with truncation point v0; the density is then

$$h(v; \mu^+, \sigma^+, v_0) = N_1^{tr}(\mu^+, \sigma^+, v_0) = \begin{cases} \dfrac{\exp\{-(v - \mu^+)^2/2\sigma^+\}}{\sqrt{2\pi\sigma^+}\left(1 - \Phi\left((v_0 - \mu^+)/\sqrt{\sigma^+}\right)\right)}, & v \ge v_0 \\[1ex] 0, & v < v_0, \end{cases}$$

where v0 → −∞ gives the Gaussian distribution with mean µ⁺ and variance σ⁺.
In the rest of the algorithm description we assume, without loss of generality, that µ⁺ = 0 and σ⁺ = 1, since v0 can be standardized as v0 ← (v0 − µ⁺)/σ⁺; the algorithm is then performed with µ⁺ = 0 and σ⁺ = 1, and afterwards v is updated as v ← µ⁺ + v · σ⁺.
A possible choice for the proposal function g(·) is a translated exponential distribution with density

$$g(v; \alpha, v_0) = \begin{cases} \alpha\exp\{-\alpha(v - v_0)\}, & v \ge v_0 \\ 0, & v < v_0. \end{cases}$$

Translated exponentially distributed realizations can be generated by the inversion method as v = −(1/α) ln(u) + v0, where u is generated from the uniform distribution U1(0, 1), and α is a parameter that must be set. For v ≥ v0 we have that

$$\alpha > v_0:\quad \exp\{\alpha(v - v_0)\}\exp\{-v^2/2\} \le \exp\{\alpha^2/2 - v_0\alpha\} = c,$$
$$\alpha \le v_0:\quad \exp\{\alpha(v - v_0)\}\exp\{-v^2/2\} \le \exp\{-v_0^2/2\} = c,$$

which gives the acceptance probability

$$\frac{h(v)}{c \cdot g(v)} = \begin{cases} \exp\{-(v - \alpha)^2/2\}, & \alpha > v_0 \\ \exp\{(v_0 - \alpha)^2/2 - (v - \alpha)^2/2\}, & \alpha \le v_0. \end{cases}$$

From Robert (1995) it is given that

$$\alpha = \frac{v_0 + \sqrt{v_0^2 + 4}}{2}$$

is the optimal factor, which gives the highest probability of acceptance, and hence the most efficient algorithm.
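A minimal sketch of this rejection sampler for the truncated Gaussian, assuming numpy; the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def rtruncnorm_robert(v0):
    """One realization from a standard Gaussian truncated to [v0, inf),
    using the translated-exponential rejection sampler of Robert (1995)."""
    alpha = 0.5 * (v0 + np.sqrt(v0**2 + 4.0))
    while True:
        v = -np.log(rng.uniform()) / alpha + v0   # translated exponential proposal
        log_rho = -0.5 * (v - alpha)**2           # log acceptance probability
        if alpha <= v0:
            log_rho += 0.5 * (v0 - alpha)**2
        if np.log(rng.uniform()) <= log_rho:
            return v

def rtruncnorm(mu, sigma2, v0):
    """General case: N(mu, sigma2) truncated to [v0, inf), by standardizing."""
    sd = np.sqrt(sigma2)
    return mu + sd * rtruncnorm_robert((v0 - mu) / sd)
```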
We now have an algorithm to generate a value from a truncated univariate Gaussian distribution, but we want to simulate from a truncated multivariate Gaussian distribution. This can be done by an MCMC approach with Gibbs steps, where v⁽ʲ⁾ in state j is updated as

$$\begin{aligned}
v_1^{(j)} &\sim N_1^{tr}\left(E(v_1 \mid v_2^{(j-1)}, ..., v_q^{(j-1)}),\ \mathrm{Var}(v_1 \mid v_2^{(j-1)}, ..., v_q^{(j-1)}),\ v_{0,1}\right) \\
v_2^{(j)} &\sim N_1^{tr}\left(E(v_2 \mid v_1^{(j)}, v_3^{(j-1)}, ..., v_q^{(j-1)}),\ \mathrm{Var}(v_2 \mid v_1^{(j)}, v_3^{(j-1)}, ..., v_q^{(j-1)}),\ v_{0,2}\right) \\
&\ \ \vdots \\
v_q^{(j)} &\sim N_1^{tr}\left(E(v_q \mid v_1^{(j)}, ..., v_{q-1}^{(j)}),\ \mathrm{Var}(v_q \mid v_1^{(j)}, ..., v_{q-1}^{(j)}),\ v_{0,q}\right),
\end{aligned}$$

where v0,i is the truncation point for element vi. Let v¬i = [v1, ..., v_{i−1}, v_{i+1}, ..., vq]. Then

$$E(v_i \mid v_{\neg i}) = -\nu_i + \Sigma_{v,i\neg i}\Sigma_{v,\neg i\neg i}^{-1}(v_{\neg i} + \nu_{\neg i}), \qquad \mathrm{Var}(v_i \mid v_{\neg i}) = \sigma_{v,ii} - \Sigma_{v,i\neg i}\Sigma_{v,\neg i\neg i}^{-1}\Sigma_{v,i\neg i}^T,$$

where Σv,¬i¬i is the (q − 1) × (q − 1) matrix derived from Σv, with elements σv,ij, by eliminating row i and column i, and Σv,i¬i is the (q − 1)-vector derived from the ith column of Σv by removing the ith row term. It is also important to note that it is possible to calculate Σv,¬i¬i⁻¹ without performing a full inversion. Let V = Σv⁻¹; then Σv,¬i¬i⁻¹ = V¬i¬i − Vi¬iVi¬iᵀ/vii, where vii is element (i, i) of V. So the inversion of Σv in full dimension is the only inversion that needs to be done. An initial state has to be chosen for v⁽⁰⁾, and a natural choice is v⁽⁰⁾ = max{µv, v0} = max{−ν, v0}, where max{·, ·} acts componentwise on the vectorial arguments.

We now have everything needed to define an efficient algorithm to generate realizations from the CS-Gaussian distribution.
Algorithm C: Iterative Simulation

- V ← Σv⁻¹
- v ← max{−ν, v0}
- Iterate until convergence:
  for i = 1, ..., q:
    Calculate Σv,¬i¬i⁻¹ ← V¬i¬i − Vi¬iVi¬iᵀ/vii
    Calculate the mean of vi|v¬i: µ⁺ ← −νi + Σv,i¬iΣv,¬i¬i⁻¹(v¬i + ν¬i)
    Calculate the variance of vi|v¬i: σ⁺ ← σv,ii − Σv,i¬iΣv,¬i¬i⁻¹Σv,i¬iᵀ
    Standardize the truncation point: v0 ← (v0,i − µ⁺)/√σ⁺
    Calculate the parameter of the proposal distribution: α ← (v0 + √(v0² + 4))/2
    Repeat until u ≤ exp{−(v − α)²/2} for v0 < α, and u ≤ exp{(v0 − α)²/2 − (v − α)²/2} otherwise:
      Generate v ∼ Exp(α) + v0
      Generate u ∼ U1(0, 1)
    vi ← µ⁺ + v·√σ⁺
- $x \leftarrow [t \mid v] \sim N_n\left(\mu + \Gamma_{tv}\Sigma_v^{-1}(v + \nu),\ \Sigma_t - \Gamma_{tv}\Sigma_v^{-1}\Gamma_{tv}^T\right)$

Then x will be a realization of the CS-Gaussian distribution.
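A compact sketch of the Gibbs sweep, assuming numpy and scipy; scipy.stats.truncnorm is used here as a stand-in for the Robert (1995) sampler sketched above, and the function name is ours:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

def gibbs_truncated_mvn(nu, Sigma_v, n_sweeps=50):
    """Gibbs sampler for v ~ N(-nu, Sigma_v) conditioned on v >= 0.
    One full-dimension inversion of Sigma_v; per-element corrections only."""
    q = len(nu)
    V = np.linalg.inv(Sigma_v)
    v = np.maximum(-nu, 0.0)                     # initial state v(0)
    for _ in range(n_sweeps):
        for i in range(q):
            mask = np.arange(q) != i
            # Sigma_{v,-i-i}^{-1} via the rank-one correction of V
            Sinv = V[np.ix_(mask, mask)] - np.outer(V[mask, i], V[mask, i]) / V[i, i]
            s_i = Sigma_v[i, mask]
            mu_plus = -nu[i] + s_i @ Sinv @ (v[mask] + nu[mask])
            sd = np.sqrt(Sigma_v[i, i] - s_i @ Sinv @ s_i)
            a = (0.0 - mu_plus) / sd             # standardized truncation point
            v[i] = truncnorm.rvs(a, np.inf, loc=mu_plus, scale=sd, random_state=rng)
    return v
```

A CS-Gaussian realization then follows by drawing t | v from the conditional Gaussian, as in Algorithm B.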
Contrary to the other algorithms, which require generating vi ≥ 0 for all i simultaneously, this algorithm generates vi ≥ 0 one by one. But it requires an iterative method for each realization, which can be too computationally demanding. It might be possible to generate v simultaneously and without iterations, for example with rejection sampling from a multivariate exponential distribution as proposal distribution, thus avoiding an MCMC method.

The number of iterations necessary for convergence will depend on the problem. But in general it seems that the realizations have a relatively low correlation, since only v⁽ʲ⁾ depends on the previous state; the variable of interest x = [t | v ≥ 0] is not directly dependent on the previous state, as displayed in Figure 3.1. The truncated multivariate Gaussian distribution is unimodal, hence we avoid problems due to convergence and multiple maxima in the MCMC method. This is further discussed in the next section and in Section 7.2.
[Figure 3.1: Dependency graph: v(0) → v(1) → v(2) → ··· → v(N), with each t(j) generated from v(j).]
3.4 Empirical Study
To compare the algorithms, an inverse problem as discussed in Section 2.5 is constructed with x ∈ R³. A stationary model CSN₃,₃(·, ·, ·, ·, ·) is assumed. The parameters are chosen as µx· = 2, σx· = 0.3, νx· = 1, δx· = 1, Σe = 10⁻²(HHᵀ + I), Γe = 0 · I, and five different values of γx· ∈ {−1.5, −1, 0, 1, 1.5}. An exponential dependency function is used, c(h) = exp{−h/5}. The H matrix in the likelihood model is chosen to be

$$H = \begin{bmatrix} 0.75 & 0.25 & 0 \\ 0.20 & 0.60 & 0.20 \\ 0 & 0.25 & 0.75 \end{bmatrix}.$$

First reference values x are generated from the prior, then d ∈ R³ is generated as [d|x] = Hx + e. The posterior model can be found as described in Section 2.5. 50000 realizations are simulated from the posterior model with algorithms A, B and C, and the average time over 50 different reference data sets is used to compare the algorithms.
To study the convergence of the Iterative Simulation algorithm, a traceplot of the L2-norm of each step is presented in Figure 3.2. Three different initial states are used: v⁽⁰⁾ = −ν, v⁽⁰⁾ = [−2, −2, −2, −2, −2, −2] and v⁽⁰⁾ = [4, 4, 4, 4, 4, 4]. As seen in Figure 3.2, the algorithm appears to converge almost immediately. In Figure 3.3 the first two components of v (v1, v2) are displayed, and this also confirms that the algorithm seems to converge fast for this case. For the Iterative Simulation algorithm, a burn-in period of 50 realizations is used to be sure that the algorithm has converged.
[Figure 3.2: Traceplot for the length of v.]

[Figure 3.3: Plot of v1 and v2 for the three different initial states.]
γx·     Brute Force Simulation   Two Steps Simulation   Iterative Simulation
−1.5    42 s                     30 s                   6.4 s
−1      259 s                    191 s                  6.5 s
0       679 s                    494 s                  6.5 s
1       274 s                    204 s                  6.5 s
1.5     42 s                     31 s                   6.5 s

Table 3.1: Runtime in seconds (s) for the algorithms.
As seen in Table 3.1, the Iterative Simulation algorithm is the most efficient one. As expected, the Brute Force Simulation algorithm has a somewhat longer runtime than the Two Steps Simulation algorithm. The runtime for the Iterative Simulation algorithm appears almost constant for these values of γx·. For a higher-dimensional problem the difference in runtime will be even greater. The runtime of algorithms A and B is particularly influenced by

$$\nu_{x|d} = \begin{bmatrix} \nu_{x\cdot}1 - \gamma_{x\cdot}\sigma_{x\cdot}\,C H^T(H\Sigma_x H^T + \Sigma_e)^{-1}(d - H\mu_x) \\ 0 \end{bmatrix}$$
for γe· = 0, since −νx|d ≪ 0 will shift the density such that the probability of generating v ≥ 0 is low. The sign of the elements in νx|d is given by the sign of γx·, by ẽ = d − Hµx and by the value of νx·. Some elements ẽi will be negative and some will be positive, hence νx|d will have both positive and negative values. For γx· ≪ 0 the density will shift such that most elements in v are likely to be positive, and for γx· ≫ 0 most elements in v will be less than zero, so a low absolute value of γx· gives that some vi are very unlikely to be generated greater than zero. The Brute Force Simulation and Two Steps Simulation algorithms are then expected to propose many times before a realization is accepted. The Iterative Simulation algorithm is not affected by this in the same way, since the rejection sampling uses a translated distribution for the proposal; the proposal distribution shifts as νx|d changes. The algorithms give the same predicted mean and median, but the realizations from algorithm C are not independent, as the realizations from algorithms A and B are. The estimated cdfs and pdfs from these three algorithms are almost identical.
Since the Iterative Simulation algorithm is the fastest, it is studied in more detail to see how different parameters influence the runtime. It is shown in Figure 3.4 that different values of σe· and σx· do not affect the runtime much. For σe· and σx· between 10⁻⁵ and 5 the maximum change in runtime is only 1.4 seconds.
[Figure 3.4: Runtime in seconds for algorithm C for different values of σx· and σe·.]
The effect of different values of γx· and γe· is displayed in Figure 3.5. The runtime is highest when both |γx·| and |γe·| are large. The covariance of v|d is

$$\Sigma_{v|d} = \Delta_{x|d} + \Gamma_{x|d}\Sigma_{x|d}\Gamma_{x|d}^T = \begin{bmatrix} \delta_{x\cdot}I + \gamma_{x\cdot}^2\Sigma_x - \gamma_{x\cdot}^2\Sigma_x H^T A H\Sigma_x & -\gamma_{x\cdot}\gamma_{e\cdot}\,\Sigma_x H^T A\,\Sigma_e \\ -\gamma_{x\cdot}\gamma_{e\cdot}\,\Sigma_e A H\Sigma_x & I + \gamma_{e\cdot}^2\Sigma_e - \gamma_{e\cdot}^2\Sigma_e A\,\Sigma_e \end{bmatrix},$$

where A = [HΣxHᵀ + Σe]⁻¹. So when γx· and γe· have high absolute values, the correlation between the vx- and ve-components of v will be high. This causes a higher runtime, but even for very high values of γx· and γe· the runtime is only 1.5 seconds higher than the lowest runtime.
The runtime for the Iterative Simulation algorithm will increase as the dimension grows. Let T = nx; then q = qx + qd = 2T, so the total dimension is 3T.

[Figure 3.5: Runtime in seconds for algorithm C for different values of γx· and γe·.]
Let all the parameters be as before, with γx· = 3, and extend the H-matrix as

$$H = \begin{bmatrix}
0.75 & 0.25 & 0 & \cdots & \cdots & 0 \\
0.20 & 0.60 & 0.20 & 0 & \cdots & 0 \\
0 & 0.20 & 0.60 & 0.20 & & \vdots \\
\vdots & & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & 0.20 & 0.60 & 0.20 \\
0 & \cdots & \cdots & 0 & 0.25 & 0.75
\end{bmatrix} \in \mathbb{R}^{T\times T}.$$
Then we have a posterior model that can be scaled up to any dimension T. To check how the runtime of the algorithm changes for different values of T, the average time to generate 1000 realizations with 20 different data sets for T ∈ {20, 40, 60, ..., 420} is used. It is observed that the runtime increases a bit faster than linearly as a function of T, as displayed in Figure 3.6. This can be a problem for an extremely large trace or in a 3D setting. The runtime of inverting Σv is asymptotically T log T, and will be a problem for large T. So an approximation should be made such that inversion of the full Σv is avoided.
[Figure 3.6: Runtime as a function of the dimension, T, of the problem.]
Chapter 4
Stationary Prior Models
A stationary prior will not change when shifted in space, so all the marginal, bivariate and higher-dimensional pdfs must be shift invariant. As shown in Expression (2.4) there exists a one-to-one transformation between the CS-Gaussian distribution for x and the Gaussian distribution for (t, v). Hence it is possible to specify a prior model in the x-space and an alternative model in the tv-space. We will here present two different priors, and it is possible to extend them to higher-dimensional reference spaces. For both priors we use a shift-invariant dependence matrix C ∈ Rⁿˣ×ⁿˣ given as
invariant dependence matrix C ∈ Rnx×nx given as
1
c(1)
c(nx − 1)
c(1)
1
C =
1
c(1)
c(nx − 1)
c(1)
1
Where c(i, j) = c(|i − j|) is a spatial positive definite dependence measure with
c(0) = 1. Here we assume that c(·) is parameterized with given parameters, but
in theory it should be possible to also estimate the parameters of the dependence
measure.
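For the exponential dependence measure used in the empirical studies, c(h) = exp{−h/r}, the matrix C can be built in a few lines. A sketch assuming numpy (the function name is ours):

```python
import numpy as np

def dependence_matrix(n_x, r):
    """Shift-invariant dependence matrix C with the exponential
    dependence measure c(h) = exp(-h/r)."""
    h = np.abs(np.subtract.outer(np.arange(n_x), np.arange(n_x)))
    return np.exp(-h / r)
```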
4.1 x-space Prior
In this prior, x ∈ Rⁿˣ is assumed to be CS-Gaussian, as

$$x \sim CSN_{n_x,q_x}(\mu_x, \Sigma_x, \Gamma_x, \nu_x, \Delta_x).$$

Let 1 be a vector containing only ones; then the parameters can be specified in the x-space as:

$$\begin{aligned}
\mu_x &= \mu_{x\cdot}\,1 \in \mathbb{R}^{n_x} & \mu_t &= \mu_{x\cdot}\,1 \in \mathbb{R}^{n_x} \\
\Sigma_x &= \sigma_{x\cdot}\,C \in \mathbb{R}^{n_x\times n_x} & \Sigma_t &= \Sigma_x = \sigma_{x\cdot}\,C \in \mathbb{R}^{n_x\times n_x} \\
\Gamma_x &= \gamma_{x\cdot}\,I \in \mathbb{R}^{n_x\times n_x} \quad\Leftrightarrow\quad & \Gamma_{vt} &= \Gamma_x\Sigma_x = \gamma_{x\cdot}\sigma_{x\cdot}\,C \in \mathbb{R}^{q_x\times n_x} \\
\nu_x &= \nu_{x\cdot}\,1 \in \mathbb{R}^{q_x} & \mu_v &= -\nu_x \in \mathbb{R}^{q_x} \\
\Delta_x &= \delta_{x\cdot}\,I \in \mathbb{R}^{q_x\times q_x} & \Sigma_v &= \Delta_x + \Gamma_x\Sigma_x\Gamma_x^T = \delta_{x\cdot}\,I + \gamma_{x\cdot}^2\sigma_{x\cdot}\,C \in \mathbb{R}^{q_x\times q_x}.
\end{aligned}$$

Let nx = qx, so the free parameters are the scalars µx·, σx·, γx·, νx· and δx·.
The parameters of the marginals can be calculated as described in Section 2.2:

$$x_i \sim CSN_{1,q_x}(\mu_{x_i}, \Sigma_{x_i}, \Gamma_{x_i}, \nu_{x_i}, \Delta_{x_i})$$

with

$$\begin{aligned}
\mu_{x_i} &= b(i)^T\mu_x = \mu_{x\cdot}, \\
\Sigma_{x_i} &= b(i)^T\Sigma_x b(i) = \sigma_{x\cdot}, \\
\Gamma_{x_i} &= \Gamma_x\Sigma_x b(i)\,\frac{1}{\sigma_{x\cdot}} = \gamma_{x\cdot}\sigma_{x\cdot}\,C b(i)\,\frac{1}{\sigma_{x\cdot}} = \gamma_{x\cdot}\,C(i), \\
\nu_{x_i} &= \nu_x, \\
\Delta_{x_i} &= \Delta_x + \Gamma_x\Sigma_x\Gamma_x^T - \Gamma_x\Sigma_x b(i)\,\frac{1}{\sigma_{x\cdot}}\,b(i)^T\Sigma_x\Gamma_x^T = \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}\,C - \gamma_{x\cdot}^2\sigma_{x\cdot}\,C(i)C(i)^T,
\end{aligned}$$

where C(i) is column i of the dependence matrix C. By transforming the distribution to the corresponding Gaussian distribution of (ti, v) we obtain the parameter values

$$\Gamma_{vt_i} = \Gamma_{x_i}\Sigma_{x_i} = \gamma_{x\cdot}\sigma_{x\cdot}\,C(i), \qquad \Sigma_v = \Delta_{x_i} + \Gamma_{x_i}\Sigma_{x_i}\Gamma_{x_i}^T = \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}\,C.$$
Since C(i) is almost identical for every i, except for some deviations caused by the boundary, the dominating elements will be identical for the marginals in a high-dimensional problem. The marginals will be almost identical, except for relatively few variables close to the border. The marginal of xi is then

$$p(x_i) = p(t_i \mid v \ge 0).$$

So ti depends on all vj; j = 1, ..., qx, and not only on vi.
It is possible to constrain the prior by setting νx· = 0 and δx· = 1. This gives a simpler prior, where the skewness can be controlled by γx·. The free parameters are then µx·, σx· and γx·. For the model to be valid it is necessary to require that σx· > 0. Note that the model is valid for all values of γx· and µx·.
4.2 tv-space Prior
An alternative prior can be specified in the tv-space, where (t, v) ∈ Rⁿˣ⁺ᑫˣ is assumed to be Gaussian, as

$$\begin{bmatrix} t \\ v \end{bmatrix} \sim N_{n_x+q_x}\left(\begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v \end{bmatrix}\right), \qquad (4.1)$$

and the parameters can be specified in the tv-space as:

$$\begin{aligned}
\mu_t &= \mu_{t\cdot}\,1 \in \mathbb{R}^{n_x} & \mu_x &= \mu_{t\cdot}\,1 \in \mathbb{R}^{n_x} \\
\Sigma_t &= \sigma_{t\cdot}\,C \in \mathbb{R}^{n_x\times n_x} & \Sigma_x &= \Sigma_t = \sigma_{t\cdot}\,C \in \mathbb{R}^{n_x\times n_x} \\
\Gamma_{vt} &= \gamma_{tv\cdot}\,I \in \mathbb{R}^{q_x\times n_x} \quad\Leftrightarrow\quad & \Gamma_x &= \frac{\gamma_{tv\cdot}}{\sigma_{t\cdot}}\,C^{-1} \in \mathbb{R}^{q_x\times n_x} \\
\mu_v &= \mu_{v\cdot}\,1 \in \mathbb{R}^{q_x} & \nu_x &= -\mu_{v\cdot}\,1 \in \mathbb{R}^{q_x} \\
\Sigma_v &= \sigma_{v\cdot}\,I \in \mathbb{R}^{q_x\times q_x} & \Delta_x &= \sigma_{v\cdot}\,I - \frac{\gamma_{tv\cdot}^2}{\sigma_{t\cdot}}\,C^{-1} \in \mathbb{R}^{q_x\times q_x}.
\end{aligned}$$

Let nx = qx, so the free parameters are the scalars µt·, σt·, γtv·, µv· and σv·.
The parameters of the marginals can be calculated as described in Section 2.2:

$$x_i \sim CSN_{1,q_x}(\mu_{x_i}, \Sigma_{x_i}, \Gamma_{x_i}, \nu_{x_i}, \Delta_{x_i})$$

with

$$\begin{aligned}
\mu_{x_i} &= b(i)^T\mu_x = \mu_{t\cdot}, \\
\Sigma_{x_i} &= b(i)^T\Sigma_x b(i) = \sigma_{t\cdot}, \\
\Gamma_{x_i} &= \Gamma_x\Sigma_x b(i)\,\frac{1}{\sigma_{t\cdot}} = \frac{\gamma_{tv\cdot}}{\sigma_{t\cdot}}\,C^{-1}\sigma_{t\cdot}C\,b(i)\,\frac{1}{\sigma_{t\cdot}} = \frac{\gamma_{tv\cdot}}{\sigma_{t\cdot}}\,b(i), \\
\nu_{x_i} &= -\mu_{v\cdot}\,1, \\
\Delta_{x_i} &= \Delta_x + \Gamma_x\Sigma_x\Gamma_x^T - \Gamma_x\Sigma_x b(i)\,\frac{1}{\sigma_{t\cdot}}\,b(i)^T\Sigma_x\Gamma_x^T = \sigma_{v\cdot}\,I - \frac{\gamma_{tv\cdot}^2}{\sigma_{t\cdot}}\,b(i)b(i)^T.
\end{aligned}$$

By transforming the distribution to the corresponding Gaussian distribution of (ti, v) we obtain

$$\Gamma_{vt_i} = \Gamma_{x_i}\Sigma_{x_i} = \gamma_{tv\cdot}\,b(i), \qquad \Sigma_v = \Delta_{x_i} + \Gamma_{x_i}\Sigma_{x_i}\Gamma_{x_i}^T = \sigma_{v\cdot}\,I.$$

So

$$x_i \sim CSN_{1,1}\left(\mu_{t\cdot},\ \sigma_{t\cdot},\ \frac{\gamma_{tv\cdot}}{\sigma_{t\cdot}},\ -\mu_{v\cdot},\ \sigma_{v\cdot} - \frac{\gamma_{tv\cdot}^2}{\sigma_{t\cdot}}\right).$$

The marginal of xi is then

$$p(x_i) = p(t_i \mid v \ge 0) = p(t_i \mid v_i \ge 0),$$

so each ti only depends on vi. Hence this prior model is also stationary, since p(xi) = p(xj) for all i and j. It is possible to constrain the prior by setting µv· = 0 and σv· = 1. This gives a simpler prior, where the skewness can be controlled by γtv·. The free parameters are then µt·, σt· and γtv·. For the model to be valid it is necessary to require that

$$\begin{bmatrix} \sigma_{t\cdot}\,C & \gamma_{tv\cdot}\,I \\ \gamma_{tv\cdot}\,I & \sigma_{v\cdot}\,I \end{bmatrix}$$

is positive definite. This requirement can often be difficult to meet.
4.3 Comparing the Priors
For the x-space prior, ti is connected to all of v, but for the tv-space prior ti is only connected to vi. The latter provides a more logically constructed prior, hence it will also be easier to understand the effects of the different parameters. The marginals for the tv-space prior have q = 1, but the marginals for the x-space prior have q = nx, so the marginals for the tv-space prior are easier to calculate. But the x-space prior has less complicated validity requirements for the parameters than the tv-space prior. For the tv-space prior it is hard to specify valid parameters, since a determinant has to be calculated to check whether a parameter set is valid.
Chapter 5
Parameter Estimation in Prior Models
For an inverse problem in an empirical Bayes spirit, we estimate the parameters in the prior model from observations of a comparable phenomenon. Previous work has considered parameter estimation of the one-dimensional skew-normal distribution, see Pewsey (2000). In that paper, method of moments and maximum likelihood methods are used to estimate the parameters. The method of moments does not work well for very skewly distributed data sets, and the maximum likelihood method must be solved by a numerical approximation. In Pewsey (2000) the problem of finding the global maximum is solved by a grid of different initial values. Here we will present alternative methods to estimate the parameters of the CS-Gaussian distribution for the two different priors presented in the previous chapter. We will assume that observations xᵒ ∈ Rⁿˣ of a comparable one-dimensional phenomenon are available, for example observations along a well trace.
5.1 x-space Prior
For the x-space prior we will present two different estimation methods and compare them empirically. The first method, with a full likelihood model, requires an exponential dependency function.
5.1.1 Full Likelihood Model
For the x-space prior the full likelihood (fL) is defined as:

$$\begin{aligned}
L(\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot} \mid x^o) = {} & \left[\Phi_{q_x}(0;\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C)\right]^{-1}\cdot\Phi_{q_x}\!\left(\gamma_{x\cdot}(x^o - \mu_{x\cdot}1);\ \nu_{x\cdot}1,\ \delta_{x\cdot}I\right)\cdot\phi_{n_x}(x^o; \mu_{x\cdot}1, \sigma_{x\cdot}C) \qquad (5.1) \\
= {} & \left[\Phi_{q_x}(0;\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C)\right]^{-1}\cdot\prod_{i=1}^{q_x}\Phi_1\!\left(\gamma_{x\cdot}(x_i - \mu_{x\cdot});\ \nu_{x\cdot},\ \delta_{x\cdot}\right)\cdot\phi_{n_x}(x^o; \mu_{x\cdot}1, \sigma_{x\cdot}C).
\end{aligned}$$

Here we have used that $\Phi_{q_x}(\gamma_{x\cdot}(x^o - \mu_{x\cdot}1); \nu_{x\cdot}1, \delta_{x\cdot}I)$ can be factorized as $\prod_{i=1}^{q_x}\Phi_1(\gamma_{x\cdot}(x_i - \mu_{x\cdot}); \nu_{x\cdot}, \delta_{x\cdot})$, since the covariance δx·I is diagonal. $\phi_{n_x}(\cdot;\cdot,\cdot)$ is analytically obtainable and Φ1(·; ·, ·) is fast to approximate for a one-dimensional Gaussian variable. But the normalization constant $\Phi_{q_x}(\cdot;\cdot,\cdot)$ can in general be computationally hard to obtain, so we will here look at a special case where an efficient solution can be found. The maximum likelihood (ML) estimate is then given as

$$(\hat\mu_{x\cdot}, \hat\sigma_{x\cdot}, \hat\gamma_{x\cdot}, \hat\nu_{x\cdot}, \hat\delta_{x\cdot}) = \arg\max_{\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot}}\,L(\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot} \mid x^o).$$
Calculation of the normalization constant. The normalization constant is given as

$$\Phi_n(0;\ 0,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C) = p(x \le 0) = \int_{-\infty}^0 \phi_n(x;\ 0,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C)\,dx,$$

where x is Gaussian distributed as $N_n(0,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C)$, so the covariance matrix can be written as Σ = α·I + β·C, where α and β are constants and C is a correlation matrix. This is in general a very complicated problem, but we will study the special case where C is assumed to be constructed from an exponential correlation function, c(|i − j|) = exp{−|i − j|/r}.

This case can be modeled with an x⁰ and ϵ, as shown in Figure 5.1, where x⁰ ∼ Nn(0, β·C), ϵ ∼ Nn(0, α·I) and (x⁰, ϵ) are independent. Then

$$x = x^0 + \epsilon \sim N_n(0,\ \alpha I + \beta C).$$
[Figure 5.1: Dependency graph.]
Define a likelihood model such that

$$[y_i \mid x_i, \epsilon_i] = \begin{cases} 1, & x_i + \epsilon_i \le 0 \\ 0, & x_i + \epsilon_i > 0. \end{cases} \qquad (5.2)$$

Hence Φn(0; 0, α·I + β·C) = p(x ≤ 0) = p(y = 1).

In Rybicki and Press (1995) it is shown that x⁰ has the Markov property, such that p(x⁰) = p(xn | xn−1) · ... · p(x2 | x1) · p(x1), where xi here denotes component i of x⁰. This requires that C is defined from an exponential correlation function. The pdf p(xi | xi−1) for i ≥ 2 is the conditional Gaussian distribution, given as $\phi_1(x_i;\ e^{-1/r}x_{i-1},\ \beta(1 - e^{-2/r}))$, and p(x1) = φ1(x1; 0, β).
Then p(y) = p(y1, y2, ..., yn) can be written as:

$$\begin{aligned}
p(y_1, ..., y_n) &= p(y_n \mid y_1, ..., y_{n-1}) \cdots p(y_2 \mid y_1)\,p(y_1) \\
&= \int_{x^0} p(y_n \mid y_1, ..., y_{n-1}, x^0) \cdots p(y_2 \mid y_1, x^0)\,p(y_1 \mid x^0)\,p(x^0)\,dx^0 \\
&= \int_{x^0} p(y_n \mid x_n) \cdots p(y_2 \mid x_2)\,p(y_1 \mid x_1)\,p(x_n \mid x_{n-1}) \cdots p(x_2 \mid x_1)\,p(x_1)\,dx^0 \\
&= \int_{x^0}\int_{\epsilon} p(y_n \mid x_n, \epsilon_n) \cdots p(y_1 \mid x_1, \epsilon_1)\,p(x_n \mid x_{n-1}) \cdots p(x_2 \mid x_1)\,p(x_1)\,p(\epsilon_n) \cdots p(\epsilon_1)\,d\epsilon\,dx^0 \\
&= \int_{x_n}\int_{\epsilon_n} p(y_n \mid x_n, \epsilon_n)\,p(\epsilon_n)\,d\epsilon_n\int_{x_{n-1}}\int_{\epsilon_{n-1}} p(y_{n-1} \mid x_{n-1}, \epsilon_{n-1})\,p(x_n \mid x_{n-1})\,p(\epsilon_{n-1})\,d\epsilon_{n-1} \cdots \\
&\qquad\cdot\int_{x_1}\int_{\epsilon_1} p(y_1 \mid x_1, \epsilon_1)\,p(x_1)\,p(\epsilon_1)\,d\epsilon_1\,dx_1 \cdots dx_n,
\end{aligned}$$

where in the last step each integral has been moved as far inside as possible.
In our case we want to calculate p(y = 1). Since p(yi | xi, ϵi) is zero for xi + ϵi > 0, as given in Expression (5.2), we get the following recursive expression:

$$\begin{aligned}
p(y = 1) &= \int_{x_n}\int_{\epsilon_n < -x_n} p(\epsilon_n)\,d\epsilon_n\int_{x_{n-1}} p(x_n \mid x_{n-1})\int_{\epsilon_{n-1} < -x_{n-1}} p(\epsilon_{n-1})\,d\epsilon_{n-1} \cdots \\
&\qquad\cdot\int_{x_1} p(x_2 \mid x_1)\,p(x_1)\int_{\epsilon_1 < -x_1} p(\epsilon_1)\,d\epsilon_1\,dx_1 \cdots dx_n \\
&= \int_{x_n}\Phi_1(-x_n; 0, \alpha)\int_{x_{n-1}}\phi_1(x_n;\ e^{-1/r}x_{n-1},\ \beta(1 - e^{-2/r}))\,\Phi_1(-x_{n-1}; 0, \alpha) \cdots \\
&\qquad\cdot\int_{x_1}\phi_1(x_2;\ e^{-1/r}x_1,\ \beta(1 - e^{-2/r}))\,\phi_1(x_1; 0, \beta)\,\Phi_1(-x_1; 0, \alpha)\,dx_1 \cdots dx_n.
\end{aligned}$$
To calculate this, a numerical approximation is needed, so let $\tilde\phi(\cdot;\cdot,\cdot)$ be a discrete approximation of the normal density φ(·; ·, ·). Hence

$$\begin{aligned}
p(y = 1) \approx \sum_{x_n}\Phi_1(-x_n; 0, \alpha)\sum_{x_{n-1}}\tilde\phi_1(x_n;\ e^{-1/r}x_{n-1},\ \beta(1 - e^{-2/r}))\,\Phi_1(-x_{n-1}; 0, \alpha) \cdots \\
\cdot\sum_{x_1}\tilde\phi_1(x_2;\ e^{-1/r}x_1,\ \beta(1 - e^{-2/r}))\,\tilde\phi_1(x_1; 0, \beta)\,\Phi_1(-x_1; 0, \alpha).
\end{aligned}$$

The sums can be computed recursively and relatively fast, hence it is possible to estimate Φn(0; 0, α·I + βC) relatively fast. This also gives a method to calculate other Gaussian quantiles, by changing the likelihood model given in Expression (5.2), so the method can be very useful in many different situations. The method presented here is much faster than simulating from the corresponding distribution and counting the number of points in the desired region.
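A numerical sketch of this forward recursion, assuming numpy and scipy; the grid construction, tolerances and function name are ours, and the grid must be fine enough to resolve the conditional variance:

```python
import numpy as np
from scipy.stats import norm

def gauss_orthant_prob(n, alpha, beta, r, grid_size=400, x_lim=8.0):
    """Recursive approximation of Phi_n(0; 0, alpha*I + beta*C) for an
    exponential correlation matrix C(i,j) = exp(-|i-j|/r), via the
    Markov representation x = x0 + eps."""
    rho = np.exp(-1.0 / r)                       # AR(1) coefficient of x0
    cond_var = beta * (1.0 - rho**2)             # Var(x0_i | x0_{i-1})
    x = np.linspace(-x_lim, x_lim, grid_size) * np.sqrt(beta)
    dx = x[1] - x[0]
    # P(eps_i <= -x) = Phi_1(-x; 0, alpha), evaluated on the grid
    accept = norm.cdf(-x, scale=np.sqrt(alpha))
    # forward recursion: f_1(x) = phi(x; 0, beta) * accept(x)
    f = norm.pdf(x, scale=np.sqrt(beta)) * accept
    # transition density matrix K[k, j] = phi(x_k; rho * x_j, cond_var)
    K = norm.pdf(x[:, None], loc=rho * x[None, :], scale=np.sqrt(cond_var))
    for _ in range(n - 1):
        f = accept * (K @ f) * dx                # f_i = accept * int K f_{i-1}
    return f.sum() * dx                          # p(y = 1)
```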
5.1.2 Localized Pseudo Likelihood Model
Here we will approximate the likelihood with a pseudo likelihood, given as:

$$\begin{aligned}
pL(\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot} \mid x^o) = {} & \left[\Phi_{q_x}(0;\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C)\right]^{-n_x} \\
& \cdot\prod_{i=1}^{n_x}\Phi_{q_x}\!\left(\gamma_{x\cdot}C(i)(x_i - \mu_{x\cdot});\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C - \gamma_{x\cdot}^2\sigma_{x\cdot}C(i)C(i)^T\right)\cdot\prod_{i=1}^{n_x}\phi_1(x_i;\ \mu_{x\cdot},\ \sigma_{x\cdot}).
\end{aligned}$$

The high-dimensional Gaussian quantile $\Phi_{q_x}(\gamma_{x\cdot}C(i)(x_i - \mu_{x\cdot});\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}C - \gamma_{x\cdot}^2\sigma_{x\cdot}C(i)C(i)^T)$ is hard to calculate. The method to calculate quantiles for the Gaussian distribution presented in the previous section does not work here; it is not possible to express the x-variable as a Markov chain, even for a correlation matrix C constructed from an exponential correlation function. So an approximated marginal needs to be used. Consider the equivalent parameterization of (t, v) for x, as given in Expression (2.4). The marginal variable ti is normally most correlated with vi, vi+1, vi−1, vi+2, vi−2, ..., in decreasing order, such that an approximation can be defined by setting the correlation between ti and vj to zero for |j − i| > ⌊q̃x/2⌋, for a given q̃x ≤ qx, where ⌊·⌋ rounds the argument to the nearest integer towards zero. Thus the marginal distribution can be approximated as

$$CSN_{1,q_x}(\mu_{x\cdot}, \sigma_{x\cdot}, \Gamma_{x_i}, \nu_x, \Delta_{x_i}) \approx CSN_{1,\tilde q_x}(\mu_{x\cdot}, \sigma_{x\cdot}, \tilde\Gamma_{x_i}, \tilde\nu_{x_i}, \tilde\Delta_{x_i}), \qquad (5.3)$$
where

$$\tilde\Gamma_{x_i} = \gamma_{x\cdot}\begin{bmatrix} c(\lfloor\tilde q_x/2\rfloor) \\ \vdots \\ c(1) \\ c(0) \\ c(1) \\ \vdots \\ c(\lfloor\tilde q_x/2\rfloor) \end{bmatrix} = \gamma_{x\cdot}\,\tilde C(i) \in \mathbb{R}^{\tilde q_x}, \qquad
\tilde\nu_{x_i} = \begin{bmatrix} \nu \\ \vdots \\ \nu \end{bmatrix} \in \mathbb{R}^{\tilde q_x}, \qquad
\tilde\Delta_{x_i} = \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}\tilde C - \gamma_{x\cdot}^2\sigma_{x\cdot}\tilde C(i)\tilde C(i)^T \in \mathbb{R}^{\tilde q_x\times\tilde q_x},$$

with C̃ the corresponding q̃x × q̃x submatrix of C around index i. Hence the marginals can be approximated as

$$p(x_i) = p(t_i \mid v \ge 0) \approx p\left(t_i \,\middle|\, v_{i-\lfloor\tilde q_x/2\rfloor} \ge 0, ..., v_{i-1} \ge 0,\ v_i \ge 0,\ v_{i+1} \ge 0, ..., v_{i+\lfloor\tilde q_x/2\rfloor} \ge 0\right).$$
By using this approximated marginal distribution, the pseudo likelihood becomes

$$\begin{aligned}
pL^*(\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot} \mid x^o) = {} & \left[\Phi_{\tilde q_x}(0;\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}\tilde C)\right]^{-n_x} \\
& \cdot\prod_{i=1}^{n_x}\Phi_{\tilde q_x}\!\left(\gamma_{x\cdot}\tilde C(i)(x_i - \mu_{x\cdot});\ \nu_{x\cdot}1,\ \delta_{x\cdot}I + \gamma_{x\cdot}^2\sigma_{x\cdot}\tilde C - \gamma_{x\cdot}^2\sigma_{x\cdot}\tilde C(i)\tilde C(i)^T\right)\cdot\prod_{i=1}^{n_x}\phi_1(x_i;\ \mu_{x\cdot},\ \sigma_{x\cdot}).
\end{aligned}$$

Then the maximum pseudo likelihood (MpL) estimate is given as

$$(\hat\mu_{x\cdot}, \hat\sigma_{x\cdot}, \hat\gamma_{x\cdot}, \hat\nu_{x\cdot}, \hat\delta_{x\cdot}) = \arg\max_{\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot}}\,pL^*(\mu_{x\cdot}, \sigma_{x\cdot}, \gamma_{x\cdot}, \nu_{x\cdot}, \delta_{x\cdot} \mid x^o).$$
It is observed that this MpL is not defined for all possible choices of parameters; this problem is avoided by requiring σx· > 0 and δx· > 0. Then, from the definition of the CS-Gaussian distribution, we know that δx·I + γx·²σx·C − γx·²σx·C(i)C(i)ᵀ must be positive definite if C is positive definite, so a CS-Gaussian distribution will be defined for the estimated parameters. A numerical method can be used to solve the optimization that determines the parameters, and with the constrained prior, νx· = 0 and δx· = 1, it is possible to find the global maximum. The localized pseudo likelihood will work for every dependency function c(·), so an exponential dependency function is not required as for the full likelihood.
5.1.3 Empirical Test for fL and Localized pL
We have constructed a test example with µx· = 0, σx· = 1 and γx· = 3 to compare the two presented estimators. For different range parameters, r, we have generated 100 data points, and the estimated parameters are compared in Table 5.1. The tests are repeated 100 times for each parameter set and used to estimate an empirical 90% confidence interval.

For the case with r = 0.00001 the estimated parameters are identical, which is natural since the data points are almost independent; the pseudo likelihood is then almost identical to the full likelihood. For a higher range parameter the full likelihood gives better results than the pseudo likelihood. That is also expected, since the approximation of independence in the pseudo likelihood will be less correct for a higher range parameter. The µx· parameter seems to be biased for the pseudo likelihood and highly correlated data; for the case with r = 30, the correct value of µx· is not in the empirical confidence interval. The σx· parameter is estimated quite reliably for all range values, but the precision is lower for the higher range parameters. The skewness parameter, γx·, is estimated with a high bias and low precision by the pL estimator; for the pL with range parameter larger than 1 the correct value of γx· is not in the empirical confidence interval. It is natural that the full likelihood will also be less precise for a higher range parameter, since there is less information in a highly correlated data set than in an uncorrelated data set. Note that when the distance between two points is 3r, the correlation is still about 5%. So the practical range with r = 30 is usually considered to be 90; hence the data is highly correlated.
True values: µx· = 0, σx· = 1, γx· = 3.

r = 0.00001
  fL:  µ̂x· = 0.02 [−0.17, 0.19]   σ̂x· = 0.98 [0.65, 1.33]   γ̂x· = 3.18 [1.96, 4.50]
  pL:  µ̂x· = 0.02 [−0.17, 0.19]   σ̂x· = 0.98 [0.65, 1.33]   γ̂x· = 3.18 [1.96, 4.50]
r = 1
  fL:  µ̂x· = 0.17 [−0.10, 0.38]   σ̂x· = 0.85 [0.52, 1.17]   γ̂x· = 2.87 [0.83, 5.04]
  pL:  µ̂x· = −0.06 [−0.21, 0.08]  σ̂x· = 1.18 [0.89, 1.50]   γ̂x· = 5.52 [3.07, 9.91]
r = 10
  fL:  µ̂x· = −0.12 [−0.49, 0.21]  σ̂x· = 0.95 [0.77, 1.13]   γ̂x· = 2.78 [0.70, 5.61]
  pL:  µ̂x· = 0.23 [−0.01, 0.55]   σ̂x· = 1.45 [0.85, 2.18]   γ̂x· = 6.14 [4.13, 8.40]
r = 30
  fL:  µ̂x· = −0.29 [−1.05, 0.19]  σ̂x· = 0.95 [0.73, 1.15]   γ̂x· = 2.49 [0.63, 4.49]
  pL:  µ̂x· = 0.43 [0.18, 0.73]    σ̂x· = 0.99 [0.54, 1.57]   γ̂x· = 6.90 [4.26, 11.52]

Table 5.1: Estimated parameters from 100 data points with exponential correlation, C(i, j) = exp{−|i − j|/r}. The estimation is repeated 100 times for each case, and the empirical mean of the estimates is presented with an empirical 90% confidence interval.
5.2 tv-space Prior
For this prior we will also define a full likelihood and a pseudo likelihood for
parameter estimation.
5.2.1 Full Likelihood Model
For the tv-space prior the full likelihood is given as

$$L(\mu_t, \sigma_t, \gamma_{tv}, \mu_v, \sigma_v \mid x^o) = \left[ \Phi_{q_x}\!\left(0;\, -\mu_v \mathbf{1},\, \sigma_v^2 I\right) \right]^{-1} \cdot \Phi_{q_x}\!\left( \frac{\gamma_{tv}}{\sigma_t^2} C^{-1} (x^o - \mu_t \mathbf{1});\, -\mu_v \mathbf{1},\, \sigma_v^2 I - \frac{\gamma_{tv}^2}{\sigma_t^2} C^{-1} \right) \cdot \phi_{n_x}(x^o;\, \mu_t \mathbf{1},\, \sigma_t^2 C). \tag{5.4}$$
The problem here is to evaluate $\Phi_{q_x}\!\left( \frac{\gamma_{tv}}{\sigma_t^2} C^{-1} (x^o - \mu_t \mathbf{1});\, -\mu_v \mathbf{1},\, \sigma_v^2 I - \frac{\gamma_{tv}^2}{\sigma_t^2} C^{-1} \right)$. There is no Markov property with this covariance, hence the method presented earlier for Gaussian quantiles does not work here, and we need to use a pseudo likelihood approach.
5.2.2 Pseudo Likelihood Model
For the prior specified in the tv-space we have to approximate the likelihood with
a pseudo likelihood:
$$pL(\mu_t, \sigma_t, \gamma_{tv}, \mu_v, \sigma_v \mid x^o) = \left[ \Phi_1(0;\, -\mu_v,\, \sigma_v^2) \right]^{-n_x} \cdot \prod_{i=1}^{n_x} \Phi_1\!\left( \frac{\gamma_{tv}}{\sigma_t^2} (x_i - \mu_t);\, -\mu_v,\, \sigma_v^2 - \frac{\gamma_{tv}^2}{\sigma_t^2} \right) \cdot \prod_{i=1}^{n_x} \phi_1(x_i;\, \mu_t, \sigma_t^2).$$
i=1
Note that this pseudo likelihood can be calculated without any approximations,
since this pL factorize, as is required with the previous prior. A numerical method
is used to solve the optimization to determine the parameters, but it is often
difficult to ensure that the global maximum is identified. Therefore a constrained
pseudo likelihood approach with µv
=0 and σv
=1 can be used as described in
Section 4.2. It is observed that this pL is not defined for all possible choices of
parameters, and this problem is avoided by requiring σt
> 0 and σv
σt.
> 0.
With σv
= 1 it is sufficient to require σt
>
> 0. This ensures that the
tv.
marginals have valid parameters. But to be sure that the simultaneous distribution
has valid parameters, is it necessary to require that
]
[σtC γtv
.I
(5.5)
γtv
.I σv.I
is positive definite.
Then the maximum pseudo likelihood, MpL, is given as

$$(\hat{\mu}_t, \hat{\sigma}_t, \hat{\gamma}_{tv}, \hat{\mu}_v, \hat{\sigma}_v) = \arg\max_{\mu_t, \sigma_t, \gamma_{tv}, \mu_v, \sigma_v} pL(\mu_t, \sigma_t, \gamma_{tv}, \mu_v, \sigma_v \mid x^o).$$
With $\mu_t$, $\sigma_t$ and $\gamma_{tv}$ as the free parameters, this is almost the same problem as described in Pewsey (2000). The numerical methods that find the maximum of the pL require initial values for the parameters, but instead of using a grid of initial values we will here present an alternative algorithm to find the global maximum. It seems that two maxima often occur for the log-pseudo likelihood: one gives an almost Gaussian fit ($\gamma_{tv} \approx 0$) and one has skewness. To understand this, we first look at the log-pseudo likelihood:
$$\log\left(pL(\mu_t, \sigma_t, \gamma_{tv}, \mu_v, \sigma_v \mid x^o)\right) = -n_x \log\left(\Phi_1(0;\, 0, \sigma_v^2)\right) + \sum_{i=1}^{n_x} \log \Phi_1\!\left( \frac{\gamma_{tv}}{\sigma_t^2} (x_i - \mu_t);\, 0,\, 1 - \frac{\gamma_{tv}^2}{\sigma_t^2} \right) + \sum_{i=1}^{n_x} \log\left(\phi_1(x_i;\, \mu_t, \sigma_t^2)\right)$$

$$= n_x \log(2) + \sum_{i=1}^{n_x} \log \Phi_1\!\left( \frac{\gamma_{tv}}{\sigma_t^2} (x_i - \mu_t);\, 0,\, 1 - \frac{\gamma_{tv}^2}{\sigma_t^2} \right) - \frac{1}{2} \sum_{i=1}^{n_x} \left( \frac{x_i - \mu_t}{\sigma_t} \right)^2 - n_x \log\left(\sqrt{2\pi} \cdot \sigma_t\right).$$

The sum $\sum_{i=1}^{n_x} \log \Phi_1\!\left( \frac{\gamma_{tv}}{\sigma_t^2} (x_i - \mu_t);\, 0,\, 1 - \frac{\gamma_{tv}^2}{\sigma_t^2} \right)$ is unimodal. The last part, $-\frac{1}{2} \sum_{i=1}^{n_x} \left( \frac{x_i - \mu_t}{\sigma_t} \right)^2 - n_x \log(\sqrt{2\pi} \cdot \sigma_t)$, is recognized from the Gaussian log-likelihood and is also unimodal. So the log pseudo likelihood seems to have two modes.
If two modes exist, the parameters can be estimated by initiating $\mu_t$ and $\sigma_t$ with the standard estimators for the Gaussian distribution. Since $\gamma_{tv}$ is limited to the interval between $-\sigma_t$ and $\sigma_t$, we run the numerical method twice, with the Gaussian estimates for $(\mu_t, \sigma_t)$ and the two extreme values of $\gamma_{tv}$ as initial values.
Algorithm: Maximum $pL(\mu_t, \sigma_t, \gamma_{tv}, 0, 1 \mid x^o)$

• Initiate the parameters:
  ⋆ $\mu_{t,0} = \frac{1}{n_x} \sum_{i=1}^{n_x} x_i$
  ⋆ $\sigma_{t,0} = \sqrt{\frac{1}{n_x - 1} \sum_{i=1}^{n_x} (x_i - \mu_{t,0})^2}$
  ⋆ $\gamma_{tv,0} = \sigma_{t,0}$
• Do the optimization to identify $(\mu_t, \sigma_t, \gamma_{tv})^+$. Make sure that the parameters meet the positive definite requirement given in Expression (5.5).
• Initiate the parameters $\mu_{t,0}$, $\sigma_{t,0}$ and $\gamma_{tv,0} = -\sigma_{t,0}$.
• Do the optimization to identify $(\mu_t, \sigma_t, \gamma_{tv})^-$. Make sure that the parameters meet the positive definite requirement given in Expression (5.5).
• Choose the parameter set with the highest likelihood, $(\hat{\mu}_t, \hat{\sigma}_t, \hat{\gamma}_{tv})$; a code sketch of this two-start procedure is given below.
The two maxima are illustrated by an example in Figure 5.2, where the likelihood along a profile between the two maxima is displayed. The two maxima are found by the given algorithm, and the value of the likelihood on a linear transect between these maxima is calculated. Note that the non-Gaussian maximum has the highest likelihood and is also the solution closest to the correct values of the example, which are $\mu_t = 0$, $\sigma_t = 2$ and $\gamma_{tv} = -0.75$.
Figure 5.2: Log-pL along a profile between the two maxima for an example with real values $\mu_t = 0$, $\sigma_t = 2$ and $\gamma_{tv} = -0.75$. The two maxima found are $(\mu_t = 0.08, \sigma_t = 2.09, \gamma_{tv} = -0.78)$ and $(\mu_t = -0.54, \sigma_t = 1.70, \gamma_{tv} = 0.00)$.
5.2.3 Empirical Test for pL
The estimation procedure was tested on the same parameters as in the empirical test for the x-space prior in Section 5.1.3, but it seems impossible to have very skew data with high correlation and still meet the requirements in Expression (5.5). So we study a different test example than for the x-space prior. We have constructed a test example with 100 data points with $\mu_t = 0$, $\sigma_t = 4$ and range parameter $r = 3$. We have done parameter estimation with different values of $\gamma_{tv}$ between $-0.8$ and $0.8$. The test is repeated 100 times for each parameter set. The empirical mean of the parameter estimates is shown in Table 5.2, with an empirical 90% confidence interval. All parameter estimates satisfy the constraint in Expression (5.5).
Real values: µt = 0, σt = 4

γtv =  0.8:  µt =  0.01 [−0.76, 0.62]   σt = 3.96 [2.11, 5.95]   γtv =  0.75 [0.30, 0.96]
γtv =  0.5:  µt =  0.21 [−0.65, 1.40]   σt = 3.93 [2.35, 5.53]   γtv =  0.22 [−0.90, 0.94]
γtv =  0.1:  µt = −0.03 [−1.07, 1.11]   σt = 4.24 [2.90, 5.93]   γtv =  0.18 [−0.92, 0.97]
γtv = −0.1:  µt =  0.02 [−1.32, 1.14]   σt = 4.16 [2.73, 5.53]   γtv = −0.12 [−0.96, 0.89]
γtv = −0.5:  µt = −0.19 [−1.35, 0.87]   σt = 4.10 [2.66, 5.93]   γtv = −0.16 [−0.94, 0.91]
γtv = −0.8:  µt =  0.01 [−0.77, 0.76]   σt = 3.81 [2.61, 5.22]   γtv = −0.72 [−0.93, 0.10]

Table 5.2: Parameter estimation with different real values of $\gamma_{tv}$, from 100 data points with $\mu_t = 0$, $\sigma_t = 4$ and $r = 3$.
It is observed that for data with low skewness the precision of the estimation is very low. Especially $\gamma_{tv}$ seems hard to estimate. For small values of $|\gamma_{tv}|$ the values of the pL at the two maxima are often close to each other, so the wrong maximum ($\gamma_{tv} \approx 0$) is chosen more often. But the answer for small values of $|\gamma_{tv}|$ is not affected as much as for higher values of $|\gamma_{tv}|$. So it seems that the cases with $\gamma_{tv} = 0.5$ and $\gamma_{tv} = -0.5$ are influenced the most.
5.3 Closing Remarks
For parameter estimation it seems hard to fulfill the requirements in Expression (5.5) for the tv-space prior; especially combining high skewness and high correlation seems really difficult. So the x-space prior with the full likelihood estimator appears to be the best choice, since the requirements on the parameters are easier to fulfill. The full likelihood also takes the correlation in the data into account. But the full likelihood requires an exponential correlation measure; for other correlation functions the pL must be used. It would also be possible to let the range parameter be a free parameter, and then optimize the likelihood with respect to $\mu_x$, $\sigma_x$, $\gamma_x$ and $r$, but the optimization would be much more computationally demanding.
Chapter 6
The CST-distribution
In Karimi et al. (2009) the inversion is done for the elastic material properties: P-wave velocity, S-wave velocity and density. The results show that the distribution of the P-wave velocity has somewhat heavier tails than the CSN-distribution. Therefore it would be useful to have a skew distribution with heavier tails. To meet this requirement the multivariate closed-skew (CS) T-distribution is defined.
The CST-distribution is a generalization of the T-distribution; in a similar way as the CS-Gaussian distribution, it adds skewness to the model. First we present some properties of the multivariate T-distribution.
6.1 Multivariate T-distribution
The multivariate T-distribution is a generalization of the Student's t-distribution. A random vector $y \in \mathbb{R}^n$ is multivariate T-distributed,

$$y \sim T_n(\mu, \Omega, \eta),$$

if the pdf is

$$\tau_n(y;\, \mu, \Omega, \eta) = \frac{\Gamma\!\left(\frac{\eta+n}{2}\right)}{\Gamma\!\left(\frac{\eta}{2}\right) (\eta \cdot \pi)^{n/2}\, |\Omega|^{1/2}} \left( 1 + \frac{1}{\eta} (y - \mu)' \Omega^{-1} (y - \mu) \right)^{-\frac{\eta+n}{2}}$$

where $\Gamma(\cdot)$ is the gamma function, $\mu \in \mathbb{R}^n$ is a centering vector, $\Omega \in \mathbb{R}^{n \times n}$ is a positive definite dependence matrix and $\eta \in \mathbb{R}^+$ is the degrees of freedom.
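As a quick illustration, the density above can be transcribed directly in Python and cross-checked against scipy's built-in multivariate t; the example values of $\mu$, $\Omega$, $y$ and $\eta$ below are ours, chosen only for the check.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_t

def tau_pdf(y, mu, Omega, eta):
    """The density tau_n(y; mu, Omega, eta) written out term by term."""
    n = len(mu)
    d = y - mu
    q = d @ np.linalg.solve(Omega, d)        # (y - mu)' Omega^{-1} (y - mu)
    log_c = (gammaln((eta + n) / 2) - gammaln(eta / 2)
             - 0.5 * n * np.log(eta * np.pi)
             - 0.5 * np.linalg.slogdet(Omega)[1])
    return np.exp(log_c - 0.5 * (eta + n) * np.log1p(q / eta))

mu = np.array([5.0, 7.0])
Omega = np.array([[1.0, 0.2], [0.2, 4.0]])
y = np.array([4.0, 8.0])
print(tau_pdf(y, mu, Omega, eta=3))
print(multivariate_t(loc=mu, shape=Omega, df=3).pdf(y))   # should agree
```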
Some relevant characteristics are given in Roislien and Omre (2006):

1. Special case of the T-distribution: the univariate $T_1(0, 1, \eta)$ is the standard Student's t-distribution with $\eta$ degrees of freedom.

2. Moments of the T-distribution: let $x \sim T_n(\mu, \Omega, \eta)$, then

$$\mathrm{E}(x) = \mu; \quad \eta \geq 2$$
$$\mathrm{Cov}(x) = \frac{\eta}{\eta - 2} \cdot \Omega; \quad \eta \geq 3$$
3. Linear combinations of T-distributed random variables are also T-distributed. Let $x \sim T_n(\mu, \Omega, \eta)$ and let $A$ be a deterministic $l \times n$ matrix; then

$$[Ax] \sim T_l(A\mu,\, A\Omega A',\, \eta).$$
4. T-distributed random variables conditioned on components are also T-distributed. Let $x \sim T_n(\mu, \Omega, \eta)$, with $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{n-k}$ being two subvectors of $x = [x_1, x_2]$. Further let the parameters associated with the subvectors be

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Omega = \begin{bmatrix} \Omega_{11} & \Gamma_{12} \\ \Gamma_{21} & \Omega_{22} \end{bmatrix}.$$

Then

$$[x_1 \mid x_2] \sim T_k\!\left( \mu_1 + \Gamma_{12}\Omega_{22}^{-1}(x_2 - \mu_2),\; \kappa(\eta) \cdot (\Omega_{11} - \Gamma_{12}\Omega_{22}^{-1}\Gamma_{21}),\; \eta + n - k \right)$$

where

$$\kappa(\eta) = \frac{1 + \frac{1}{\eta} (x_2 - \mu_2)' \Omega_{22}^{-1} (x_2 - \mu_2)}{1 + \frac{n-k}{\eta}}.$$
5. Components of T-distributed random variables are non-independent. Let $x = [x_1, x_2]$ be as above; then

$$T_n(\mu, \Omega, \eta) \neq T_k(\mu_1, \Omega_{11}, \eta) \cdot T_{n-k}(\mu_2, \Omega_{22}, \eta); \quad \eta < \infty$$

for all eligible $(\mu, \Omega)$. This holds even for the particular case where $\Gamma_{12}$ and $\Gamma_{21}$ are zero-matrices: the subvectors $x_1$ and $x_2$ are not independent even if $\Omega$ is a block matrix. When $\eta \to \infty$ we get the multivariate Gaussian case, and independence can be obtained.
6.2 The CST-distribution
The closed-skew T-distribution adds skewness to the T-distribution. The most convenient way to define the CST-distribution is to consider a random vector $t \in \mathbb{R}^n$ and a random vector $v \in \mathbb{R}^q$. Let the joint pdf of $(t, v)$ be multivariate T-distributed as

$$\begin{bmatrix} t \\ v \end{bmatrix} \sim T_{n+q}\!\left( \begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Omega_t & \Gamma_{tv} \\ \Gamma_{vt} & \Omega_v \end{bmatrix}, \eta \right).$$

Then the CST-distributed variable of interest $x \in \mathbb{R}^n$ is defined as

$$x = [t \mid v \geq 0].$$
Let $p(x)$ be the generic term for a pdf of the argument, and let $p(x \mid d)$ represent the conditional pdf of $x$ given $d$. The pdf of $x$ is then given as

$$x \sim p(x) = p(t \mid v \geq 0) = \frac{p(v \geq 0 \mid t)\, p(t)}{p(v \geq 0)} = \left[ 1 - T_q(0;\, \mu_v, \Omega_v, \eta) \right]^{-1} \left[ 1 - T_q(0;\, \mu_{v|t}, \Omega_{v|t}, \eta + n) \right] \tau_n(t;\, \mu_t, \Omega_t, \eta),$$

where

$$\mu_{v|t} = \mu_v + \Gamma_{vt}\Omega_t^{-1}(t - \mu_t)$$
$$\Omega_{v|t} = \frac{1 + \frac{1}{\eta}(t - \mu_t)'\Omega_t^{-1}(t - \mu_t)}{1 + \frac{n}{\eta}} \left( \Omega_v - \Gamma_{vt}\Omega_t^{-1}\Gamma_{tv} \right).$$

Here $T_q(\cdot;\, \mu, \Omega, \eta)$ is the $q$-dimensional cumulative T-distribution. The closed-skew T-distribution can be parameterized as

$$\mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta): \quad \left[ T_q(0;\, \nu, \Delta + \Gamma\Omega\Gamma', \eta) \right]^{-1} T_q(\Gamma(x - \mu);\, \nu, \Delta, \eta + n)\, \tau_n(x;\, \mu, \Omega, \eta)$$

with $\mu = \mu_t$, $\Omega = \Omega_t$, $\Gamma = \Gamma_{vt}\Omega_t^{-1}$, $\nu = -\mu_v$ and $\Delta = \Omega_v - \Gamma_{vt}\Omega_t^{-1}\Gamma_{tv}$.
The transformation between the CST-distribution for $x$ and the T-distribution for $(t, v)$ is one-to-one, given by

$$x = [t \mid v \geq 0] \sim \mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta) \;\Longleftrightarrow\; \begin{bmatrix} t \\ v \end{bmatrix} \sim T_{n+q}\!\left( \begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Omega_t & \Gamma_{tv} \\ \Gamma_{vt} & \Omega_v \end{bmatrix}, \eta \right) \tag{6.1}$$

with

$$\begin{aligned} \mu &= \mu_t \in \mathbb{R}^n & &\Longleftrightarrow & \mu_t &= \mu \in \mathbb{R}^n \\ \Omega &= \Omega_t \in \mathbb{R}^{n \times n} & &\Longleftrightarrow & \Omega_t &= \Omega \in \mathbb{R}^{n \times n} \\ \Gamma &= \Gamma_{vt}\Omega_t^{-1} \in \mathbb{R}^{q \times n} & &\Longleftrightarrow & \Gamma_{vt} &= \Gamma_{tv}' = \Gamma\Omega \in \mathbb{R}^{q \times n} \\ \nu &= -\mu_v \in \mathbb{R}^q & &\Longleftrightarrow & \mu_v &= -\nu \in \mathbb{R}^q \\ \Delta &= \Omega_v - \Gamma_{vt}\Omega_t^{-1}\Gamma_{tv} \in \mathbb{R}^{q \times q} & &\Longleftrightarrow & \Omega_v &= \Delta + \Gamma\Omega\Gamma' \in \mathbb{R}^{q \times q} \end{aligned}$$
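The mapping in Expression (6.1) suggests a direct, if naive, way to draw from the CST-distribution: sample $(t, v)$ jointly T-distributed and keep $t$ whenever $v \geq 0$. Below is a minimal sketch in Python, assuming the standard Gaussian scale-mixture representation of the multivariate T; like the Brute Force algorithm in the CS-Gaussian case, it becomes impractical when $q$ is large.

```python
import numpy as np

def rcst(n_samples, mu, Omega, Gamma, nu, Delta, eta, seed=None):
    """Naive rejection sampler for CST_{n,q} via Expression (6.1)."""
    rng = np.random.default_rng(seed)
    n, q = len(mu), len(nu)
    mu_tv = np.concatenate([mu, -nu])              # mu_v = -nu
    GO = Gamma @ Omega                             # Gamma_vt = Gamma Omega
    Sigma = np.block([[Omega, GO.T],
                      [GO, Delta + Gamma @ Omega @ Gamma.T]])
    L = np.linalg.cholesky(Sigma)
    out = []
    while len(out) < n_samples:
        w = eta / rng.chisquare(eta)               # T as a Gaussian scale mixture
        z = mu_tv + np.sqrt(w) * (L @ rng.standard_normal(n + q))
        if np.all(z[n:] >= 0):                     # accept t when v >= 0
            out.append(z[:n])
    return np.array(out)
```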
6.3 Properties of the CST-distribution
Some favorable characteristics of the CST-distribution that are relevant for inver-
sion problems are given here and proven in Appendix A:
1. Linear combinations of components of CST-distributed random variables are also CST-distributed. If $x \sim \mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta)$ and $A$ is a deterministic $l \times n$ matrix with $l \leq n$, then

$$[y = Ax] \sim \mathrm{CST}_{l,q}(\mu_y, \Omega_y, \Gamma_y, \nu_y, \Delta_y, \eta_y),$$

where

$$\mu_y = A\mu, \quad \Omega_y = A\Omega A', \quad \Gamma_y = \Gamma\Omega A'\Omega_y^{-1}, \quad \nu_y = \nu, \quad \Delta_y = \Delta + \Gamma\Omega\Gamma' - \Gamma\Omega A'\Omega_y^{-1}A\Omega\Gamma', \quad \eta_y = \eta.$$

So the CST-distribution is closed under linear transformations. This is proven in Appendix A.1. Note that, as a result, marginal pdfs are also CST: choose $A = b_{(i)}'$ with $b_{(i)}$ an $n \times 1$ vector with zero entries except for element number $i$, so that $x_i = b_{(i)}'x$.
2. Components of CST-distributed random variables are presumably non-independent. From property 5 in Section 6.1 we know that components of T-distributed random variables are non-independent; hence it is unlikely that it is possible to construct a CST-distributed random variable with independent components. We have not been able to prove this.
3. CST-distributed random variables conditioned on components are also CST-distributed. Let $x \sim \mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta)$, with $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{n-k}$ being two subvectors of $x = [x_1, x_2]$. Further let the parameters associated with the subvectors be

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Omega = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}, \quad \Gamma = [\Gamma_1, \Gamma_2].$$

Then

$$[x_1 \mid x_2] \sim \mathrm{CST}_{k,q}(\mu_{1|2}, \Omega_{1|2}, \Gamma_{1|2}, \nu_{1|2}, \Delta_{1|2}, \eta_{1|2}),$$

where

$$\begin{aligned} \mu_{1|2} &= \mu_1 + \Omega_{12}\Omega_{22}^{-1}(x_2 - \mu_2), \\ \Omega_{1|2} &= \kappa(\eta)(\Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}), \\ \Gamma_{1|2} &= \Gamma_1, \\ \nu_{1|2} &= \nu - \left( \Gamma_2 + \Gamma_1\Omega_{12}\Omega_{22}^{-1} \right)(x_2 - \mu_2), \\ \Delta_{1|2} &= \kappa(\eta)\Delta, \\ \eta_{1|2} &= \eta. \end{aligned}$$
This is proven in Appendix A.2. So the CST-distribution is closed under linear
transformation and conditioning.
6.4 Examples of the CST-distribution
The CST-distribution is closely related to the CSN-distribution and the T-distribution. To illustrate the characteristics of the CST-distribution, some examples of the distribution will be shown. We first consider a CST-distributed $x \in \mathbb{R}^1$, with the same base case as in Section 2.4:

$$\mu = 5, \quad \Omega = 9, \quad \Gamma = 1, \quad \nu = 0, \quad \Delta = 0.05. \tag{6.2}$$

Note that $\Omega$, $\Gamma$ and $\Delta$ here are one-dimensional matrices. In Figure 6.1 the density is displayed for four different degrees of freedom, $\eta$. For $\eta = \infty$ we have a CSN distribution and the same density as displayed in Figure 2.1. It is observed that lower degrees of freedom give heavier tails.
Figure 6.1: Pdf for degrees of freedom $\eta = \infty$, $\eta = 3$, $\eta = 2$ and $\eta = 1$, with $\mu = 5$, $\Omega = 9$, $\Gamma = 1$, $\nu = 0$ and $\Delta = 0.05$.
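A minimal sketch of the univariate ($n = q = 1$) CST density behind Figure 6.1, taking the parameterization in Section 6.2 at face value; the default arguments are the base-case values of Expression (6.2), and the function name is ours.

```python
import numpy as np
from scipy.stats import t as student_t

def cst_pdf_1d(x, mu=5.0, Om=9.0, Ga=1.0, nu=0.0, De=0.05, eta=2.0):
    """Univariate CST density following the parameterization in Section 6.2."""
    # Normalizing constant T_1(0; nu, Delta + Gamma Omega Gamma', eta):
    c = student_t.cdf((0.0 - nu) / np.sqrt(De + Ga * Om * Ga), df=eta)
    # Skewing factor T_1(Gamma (x - mu); nu, Delta, eta + 1):
    skew = student_t.cdf((Ga * (x - mu) - nu) / np.sqrt(De), df=eta + 1)
    # Symmetric base density tau_1(x; mu, Omega, eta):
    base = student_t.pdf((x - mu) / np.sqrt(Om), df=eta) / np.sqrt(Om)
    return skew * base / c

xs = np.linspace(4, 13, 200)
for eta in (1, 2, 3, 1e6):        # 1e6 approximates the CSN limit eta -> infinity
    dens = cst_pdf_1d(xs, eta=eta)
```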
The same situation occurs for higher-dimensional problems, and we will here also study a particular CST-distributed variable $x$ with $n = q = 2$ and parameters

$$\mu = \begin{bmatrix} 5 \\ 7 \end{bmatrix}, \quad \Omega = \begin{bmatrix} 1 & 0.2 \\ 0.2 & 4 \end{bmatrix}, \quad \Gamma = \begin{bmatrix} 4 & 0 \\ 0 & 5 \end{bmatrix}, \quad \nu = \begin{bmatrix} -2 \\ 6 \end{bmatrix}, \quad \Delta = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

The pdf of this two-dimensional CST distribution is shown in Figure 6.2. The CSN-distribution with $\eta = \infty$ is displayed in Figure 6.2a and the CST-distribution with $\eta = 2$ is displayed in Figure 6.2b. It is observed that the distribution with $\eta = 2$ has heavier tails than the other.
Figure 6.2: Bivariate pdf and marginal pdfs for a CST-distributed x. (a) η = ∞; (b) η = 2.
6.5 Bayesian CST Inversion
Let $x \in \mathbb{R}^{n_x}$ be the variable of interest and consider a linear model

$$d = Hx,$$

where $d \in \mathbb{R}^{n_d}$ is measured data, $n_d < n_x$. Note that there cannot be an error term, $e$: from property 2 in Section 6.3 we know that it is probably not possible to specify a multivariate CST-distribution of independent vectors $Hx$ and $e$, hence it is not possible to have an independent error term in the model. The prior model for $x$ is defined as

$$x \sim \mathrm{CST}_{n_x, q_x}(\mu_x, \Omega_x, \Gamma_x, \nu_x, \Delta_x, \eta).$$
Let this define a T-distributed vector:

$$\begin{bmatrix} t \\ r = Ht \\ v \end{bmatrix} \sim T_{n_x + n_d + q_x}\!\left( \begin{bmatrix} \mu_x \\ H\mu_x \\ -\nu_x \end{bmatrix}, \begin{bmatrix} \Omega_x & \Omega_x H' & \Omega_x \Gamma_x' \\ H\Omega_x & H\Omega_x H' & H\Omega_x \Gamma_x' \\ \Gamma_x \Omega_x & \Gamma_x \Omega_x H' & \Delta_x + \Gamma_x \Omega_x \Gamma_x' \end{bmatrix}, \eta \right)$$

where

$$x = [t \mid v \geq 0], \quad t \in \mathbb{R}^{n_x}, \quad v \in \mathbb{R}^{q_x},$$
$$d = [r \mid v \geq 0], \quad r \in \mathbb{R}^{n_d}.$$
Then

$$\begin{bmatrix} t \mid r \\ v \mid r \end{bmatrix} \sim T_{n_x + q_x}\!\left( \begin{bmatrix} \mu_{t|r} \\ \mu_{v|r} \end{bmatrix}, \begin{bmatrix} \Omega_{t|r} & \Gamma_{tv|r} \\ \Gamma_{vt|r} & \Omega_{v|r} \end{bmatrix}, \eta + n_d \right),$$

where

$$\begin{aligned} \mu_{t|r} &= \mu_x + \Omega_x H'(H\Omega_x H')^{-1}(r - H\mu_x) \\ \mu_{v|r} &= -\nu_x + \Gamma_x \Omega_x H'(H\Omega_x H')^{-1}(r - H\mu_x) \\ \Omega_{t|r} &= \kappa(\eta + n_d)\left( \Omega_x - \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \right) \\ \Omega_{v|r} &= \kappa(\eta + n_d)\left( \Delta_x + \Gamma_x \Omega_x \Gamma_x' - \Gamma_x \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \Gamma_x' \right) \\ \Gamma_{vt|r} &= \kappa(\eta + n_d)\left( \Gamma_x \Omega_x - \Gamma_x \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \right). \end{aligned}$$
Then from the definition of the CST-distribution we have

$$[x \mid d] = [t \mid r, v \geq 0] \sim \mathrm{CST}_{n_x, q_x}(\mu_{x|d}, \Omega_{x|d}, \Gamma_{x|d}, \nu_{x|d}, \Delta_{x|d}, \eta + n_d + q_x)$$

where

$$\begin{aligned} \mu_{x|d} &= \mu_x + \Omega_x H'(H\Omega_x H')^{-1}(r - H\mu_x) \\ \Omega_{x|d} &= \kappa(\eta + n_d + q_x)\left( \Omega_x - \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \right) \\ \Gamma_{x|d} &= \kappa(\eta + n_d + q_x)\left( \Gamma_x \Omega_x - \Gamma_x \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \right) \Omega_{x|d}^{-1} \\ \nu_{x|d} &= \nu_x - \Gamma_x \Omega_x H'(H\Omega_x H')^{-1}(r - H\mu_x) \\ \Delta_{x|d} &= \kappa(\eta + n_d + q_x)\left( \Delta_x + \Gamma_x \Omega_x \Gamma_x' - \Gamma_x \Omega_x H'(H\Omega_x H')^{-1} H\Omega_x \Gamma_x' \right) - \Gamma_{x|d} \Omega_{x|d} \Gamma_{x|d}'. \end{aligned}$$

In the limit $\eta \to \infty$ we get $\kappa(\eta) \to 1$, and the parameters are identical to those presented in Section 2.5 for the CS-Gaussian distribution. Note that there is only one parameter for the degrees of freedom, so it is not possible to have different degrees of freedom for different marginals. It might be possible to construct a CST-distribution with a vector of degrees of freedom, but we have not had time to study this here.
Chapter 7
Inversion Case Study
In Buland and Omre (2003) Bayesian Gaussian inversion is used on data from the
Sleipner Øst field in the North Sea. The Sleipner Øst field is a gas condensate
field in the southern part of the North Sea. The depth of the reservoir is in the range of 2270 to 2500 meters sub-sea. The prior model is inferred from elastic properties in a well, and the assumptions for a Gaussian model are justified by a probability plot. The data do not fit a Gaussian model very well, but the authors state that assuming a Gaussian prior model is necessary to make the inversion
analytically tractable. In Karimi et al. (2009) it is shown that a CS-Gaussian
approach makes it possible to capture the skewness in the data and still obtain
analytical tractability.
The same data set that is used in Karimi et al. (2009) is used here to generate
realizations from the posterior model and to test the efficiency of the simulation
algorithms. Then the algorithms can be tested with real data and real parameters.
The two different prior models and the different parameter estimators are tested
with real data.
We have limited our study to consider the CS-Gaussian distribution. The
CST-distribution is more general and could possibly give even better results.
7.1 Seismic Inversion Model
The data is collected from a seismic vessel generating waves that propagate through the water and the seabed. In the layers of the seabed the properties of the medium change, and portions of the waves are reflected back to the surface, where hydrophones on the seismic streamers behind the vessel register the amplitudes of the reflected waves. This results in a set of stacked seismic data, and gives a convolved linear relation between the data and the reflection coefficient, $r$,

$$d = Wr + e,$$

where $W$ is a wavelet matrix, shown in Figure 7.1. The seismic observation error is defined as $e = We_1 + e_2$, where $e_1$ and $e_2$ are independent Gaussian distributed with mean zero and variance $\sigma_e^2$. Hence $e$ is wavelet colored with mean zero and covariance $\Sigma_e = \sigma_e^2(WW' + I)$. Since the CS-Gaussian distribution is a generalization of the Gaussian distribution, $e$ is also CS-Gaussian distributed as $\mathrm{CSN}(0, \Sigma_e, \cdot, \cdot, \cdot)$, where the three latter parameters are unspecified since $q = 0$.
Figure 7.1: Wavelet.
The reflection coefficient for layer $j$ can be written as $r_j = a_j^{(r)}/a_j^{(i)}$, where $a_j^{(r)}$ is the reflected amplitude of the wave and $a_j^{(i)}$ is the amplitude of the incident wave between layers $j$ and $j+1$. Let $v_j$ be the velocity of the P-wave and let $\rho_j$ be the density in layer $j$, as shown in Figure 7.2. Then the impedance in layer $j$ is $z_j = v_j \rho_j$. The parameters $v$ and $\rho$ from a well at the Sleipner Øst field are used to calculate the impedance, $z$, and the resulting data are displayed in Figure 7.3. The depth interval used is [2050, 2378] milliseconds with a two-millisecond sampling interval, making the number of data $T = 165$.
For normal incidence the Zoeppritz equations reduce to a simple form, as described in Sheriff et al. (1995). This gives

$$r_j = \frac{a_j^{(r)}}{a_j^{(i)}} = \frac{v_{j+1}\rho_{j+1} - v_j\rho_j}{v_{j+1}\rho_{j+1} + v_j\rho_j} = \frac{z_{j+1} - z_j}{z_{j+1} + z_j} \approx \frac{\Delta z_j}{2 z_j} \approx \frac{1}{2}\Delta\ln(z_j).$$
Figure 7.2: Layers j and j + 1, with velocities $v_j$, $v_{j+1}$, densities $\rho_j$, $\rho_{j+1}$, and incident and reflected amplitudes $a_j^{(i)}$, $a_j^{(r)}$.
Here $\ln(\cdot)$ is the natural logarithm and $\Delta \ln z_j = \ln z_{j+1} - \ln z_j$. Then the linear relation between $d$ and $r$ becomes

$$d = Wr + e = WD\ln z + e = Hx + e, \tag{7.1}$$

where $x = \ln z$, $H = WD$ and $D$ is given as

$$D = \frac{1}{2} \begin{bmatrix} -1 & 1 & 0 & \cdots & \cdots & 0 \\ 0 & -1 & 1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & \cdots & 0 & -1 & 1 \end{bmatrix}.$$
Then the model is on the same form as in Expression (2.7). From the model described in Expression (7.1) and the observed impedances, synthetic seismic data, $d$, can be created. The error variance is set to $\sigma_e^2 = 5 \cdot 10^{-5}$, as used in Karimi et al. (2009). The observed impedance, $z^o$, is displayed in Figure 7.3 and the synthetic seismic data are displayed in Figure 7.4.
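A minimal sketch of this forward model in Python, assuming the wavelet matrix $W$ and the observed log-impedance are given; the function names and random-number handling are ours.

```python
import numpy as np

def difference_matrix(T):
    """The (T-1) x T first-difference operator D of Expression (7.1)."""
    D = np.zeros((T - 1, T))
    i = np.arange(T - 1)
    D[i, i], D[i, i + 1] = -0.5, 0.5
    return D

def synthetic_seismic(W, log_z, sigma_e2=5e-5, seed=None):
    """d = W D ln(z) + e with wavelet-colored error e = W e1 + e2."""
    rng = np.random.default_rng(seed)
    H = W @ difference_matrix(len(log_z))
    e1 = rng.normal(0.0, np.sqrt(sigma_e2), W.shape[1])
    e2 = rng.normal(0.0, np.sqrt(sigma_e2), W.shape[0])
    return H @ log_z + W @ e1 + e2
```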
Figure 7.3: Material property observations from a well: P-wave velocity (m/s), density (kg/m³) and impedance (kg/m³ · m/s).
Figure 7.4: Synthetic seismic data.
In order to solve the inverse problem a prior model for x must be determined.
It is natural to assume a stationary prior as described in Section 4.1. The used
dependency measure is c(h) = exp{−h/15}. The empirical correlation function
and the used dependency measure are displayed in Figure 7.5.
Figure 7.5: Fitted correlation function of the log-impedance ln(z) (x-axis: lag).
As described in Chapter 4, there are three different priors that can be used. For the x-space prior with the pseudo likelihood estimator, $q_x$ is set to 3. The results from the different priors are:
x-space prior:
  fL: µx = 15.85,  σx = 0.0351,  γx = −15.32
  pL: µx = 15.78,  σx = 0.0398,  γx = −32.73
tv-space prior:
  pL: µt = 15.61,  σt = 0.0110,  γtv = −0.019
Gaussian prior:
  pL: µt = 15.59,  σt = 0.0109.
The different prior models are displayed together with the observations in Figure 7.6.
Figure 7.6: Estimated CS-Gaussian model and histogram of the log-impedance ln(z). Solid line (—-) is the fL with the x-space prior. Dashed line (− − −) is the pL with the x-space prior. Dotted line (· · ·) is the pL with the tv-space prior.
Because of the high range parameter and the positive definite restrictions in Expression (5.5), it is not possible to capture the skewness in the data with this tv-space model. Hence the parameters estimated with the tv-space prior are almost Gaussian.
7.2 Inversion Results
To predict the impedance, $z$, from the seismic data, the posterior distribution of $x$ is found as described in Section 2.5. As displayed in Figure 7.7, the Iterative Simulation algorithm appears to converge after 15 iterations, because the distribution is unimodal, as discussed before. After the burn-in period, 1000 realizations are generated from the posterior model. Realizations number 100, 400, 700 and 1000 are displayed in Figure 7.8, with parameters from the x-space prior and fL. The median of $z$ is calculated from these 1000 realizations, and the impedance is predicted as $[\hat{z} \mid d] = \exp([\hat{x} \mid d]) = \exp(Q_{0.5}(x \mid d))$. In Karimi et al. (2009) the prediction from the posterior is done by an approximation, since they are not able to generate from the full posterior distribution; with the Iterative Simulation algorithm presented in this thesis it is possible to generate realizations from the full posterior distribution. In Figure 7.9 the true impedance is compared with the predicted impedance for the x-space prior with fL, and it is observed that the predicted values fit the true data quite well. The generated realizations are also used to make 80%-prediction intervals, as displayed in Figure 7.10 for the x-space prior with the fL estimator. The results seem to fit the real values very well. In Figure 7.11 the results for the x-space prior with the pL estimator are displayed; it is observed that the result is heavily shifted.
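The prediction and interval construction amount to pointwise quantiles of the realizations; a minimal sketch, assuming X holds the 1000 posterior realizations of $x = \ln z$ as rows.

```python
import numpy as np

def predict_impedance(X, level=0.80):
    """Pointwise median prediction and prediction interval on impedance scale."""
    lo = (1.0 - level) / 2.0
    z_hat = np.exp(np.median(X, axis=0))             # exp(Q_0.5(x|d))
    z_lo = np.exp(np.quantile(X, lo, axis=0))        # e.g. the 10% quantile
    z_hi = np.exp(np.quantile(X, 1.0 - lo, axis=0))  # e.g. the 90% quantile
    return z_hat, z_lo, z_hi
```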
Figure 7.7: Traceplot for convergence of v (x-axis: iteration number).
Figure 7.8: Realizations number 100, 400, 700 and 1000 from the posterior model. The x-space prior and the full likelihood estimator are used.
Figure 7.9: Inversion results from Bayesian CS-Gaussian inversion: the true impedance and the predicted impedance. The x-space prior and the full likelihood estimator are used.
Figure 7.10: 80%-prediction intervals (thin dotted) and predictions (thick solid). The x-space prior and the full likelihood estimator are used.
Figure 7.11: 80%-prediction intervals (thin dotted) and predictions (thick solid). The x-space prior and the pseudo likelihood estimator are used.
Table 7.1 lists the fraction of true values that lie below the corresponding predicted quantiles. The x-space prior with the fL estimator gives the best results, but the predicted interval is a bit too wide. The predicted impedance is not centered for the x-space prior with pL; that can be a result of the approximation in the parameter estimation and a too low $q_x$. Using a higher $q_x$ requires that a $q_x$-dimensional Gaussian cdf is evaluated in every step of the numerical optimization. The tv-space prior with the pL estimator gives almost the same results as the Gaussian prior; the bias is a bit higher than for the x-space fL estimator, and the predicted interval is wider.
Theoretical:                    0.1     0.5     0.9
Estimated, x-space prior fL:    0.048   0.482   0.956
Estimated, x-space prior pL:    0.006   0.030   0.446
Estimated, tv-space prior pL:   0.036   0.619   0.960
Estimated, Gaussian prior pL:   0.036   0.590   0.970

Table 7.1: Fraction of observations below predicted quantiles.
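The fractions in Table 7.1 can be reproduced from the realizations as below; a sketch, assuming X holds the posterior realizations of $x = \ln z$ and x_true the observed log-impedance.

```python
import numpy as np

def quantile_coverage(X, x_true, probs=(0.1, 0.5, 0.9)):
    """Fraction of true values lying below each pointwise predicted quantile."""
    return {p: float(np.mean(x_true < np.quantile(X, p, axis=0)))
            for p in probs}
```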
The Iterative Simulation algorithm uses only about 35 seconds to generate 1015 realizations and estimate the median, while the Brute Force Simulation algorithm and the Two Steps Simulation algorithm were not able to generate a single realization in three weeks.
Chapter 8
Closing Remarks and Further Work
In this thesis the properties of the CS-Gaussian distribution have been studied, with a special focus on Bayesian inversion. Bayesian inversion and parameter inference from data are more complicated than for the Gaussian distribution, and it is therefore necessary to generate realizations from the CS-Gaussian distribution. An efficient simulation algorithm to generate realizations from a high-dimensional problem has been obtained. In contrast to Karimi et al. (2009), it is here possible to generate realizations of the posterior model without any approximations. Different stationary prior models and different methods for parameter estimation have been developed. For an exponential dependence structure it is possible to use a full-likelihood estimator, and hence also capture the dependency in the data.
The simulation algorithm for the CS-Gaussian distribution was implemented and shown to work well on synthetic seismic data inspired by the Sleipner Øst field. The different prior models and estimators for the CS-Gaussian distribution were also tested on this data set. The full-likelihood estimator seems to be the best estimator for data with exponential dependence structure.
A skew distribution with heavy tails, the CST-distribution, is presented with some properties and examples, but there is still much to study about this distribution. A sampling algorithm should be developed, and it would also be useful to have different degrees of freedom for each dimension of the CST-distribution. The Bayesian CS-Gaussian inversion could also be extended to handle 3D problems.
Bibliography
Buland, A. and Omre, H. (2003). Bayesian linearized AVO inversion, Geophysics 68: 185.
Dominguez-Molina, J., González-Farías, G., and Gupta, A. (2003). The multivariate closed skew normal distribution, Technical Report 03.
González-Farías, G., Domínguez-Molina, J., and Gupta, A. (2004). The closed skew-normal distribution, in Skew-elliptical Distributions and Their Applications: A Journey Beyond Normality, Chapman & Hall/CRC, Boca Raton, FL, pp. 25-42.
Karimi, O., Omre, H., and Mohammadzadeh, M. (2009). Bayesian closed-skew Gaussian inversion of seismic AVO data into elastic material properties.
Mosegaard, K. and Tarantola, A. (1995). Monte Carlo sampling of solutions to inverse problems, J. Geophys. Res. 100: 12431-12447.
Pewsey, A. (2000). Problems of inference for Azzalini's skew-normal distribution, Journal of Applied Statistics 27(7): 859-870.
Robert, C. (1995). Simulation of truncated normal variables, Statistics and Computing 5(2): 121-125.
Roislien, J. and Omre, H. (2006). T-distributed random fields: A parametric model for heavy-tailed well-log data, Mathematical Geology 38(7): 821-849.
Rybicki, G. B. and Press, W. H. (1995). Class of fast methods for processing irregularly sampled or otherwise inhomogeneous one-dimensional data, Phys. Rev. Lett. 74(7): 1060-1063.
Sheriff, R., Geldart, L., and Lees, J. (1995). Exploration Seismology, Cambridge University Press, Cambridge.
Appendix A
Proof of Properties for the
CST-distribution
A.1 Closed Under Linear Transformations
Let $x = [t \mid v \geq 0] \sim \mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta)$, and let $A$ be a deterministic $l \times n$ matrix with $l \leq n$. If we transform $x$ to the tv-space as given in Expression (6.1), we obtain

$$\begin{bmatrix} t \\ v \end{bmatrix} \sim T_{n+q}\!\left( \begin{bmatrix} \mu \\ -\nu \end{bmatrix}, \begin{bmatrix} \Omega & \Omega\Gamma' \\ \Gamma\Omega & \Delta + \Gamma\Omega\Gamma' \end{bmatrix}, \eta \right).$$

Hence the distribution of $(At, v)$ follows from property 3 in Section 6.1:

$$\begin{bmatrix} At \\ v \end{bmatrix} \sim T_{l+q}\!\left( \begin{bmatrix} A\mu \\ -\nu \end{bmatrix}, \begin{bmatrix} A\Omega A' & A\Omega\Gamma' \\ \Gamma\Omega A' & \Delta + \Gamma\Omega\Gamma' \end{bmatrix}, \eta \right).$$

Then we can transform back to the x-space and obtain

$$y = Ax = [At \mid v \geq 0] \sim \mathrm{CST}_{l,q}(\mu_y, \Omega_y, \Gamma_y, \nu_y, \Delta_y, \eta_y),$$

where the parameters are given as

$$\mu_y = A\mu, \quad \Omega_y = A\Omega A', \quad \Gamma_y = \Gamma\Omega A'\Omega_y^{-1}, \quad \nu_y = \nu, \quad \Delta_y = \Delta + \Gamma\Omega\Gamma' - \Gamma\Omega A'\Omega_y^{-1}A\Omega\Gamma', \quad \eta_y = \eta.$$
A.2 Closed Under Conditioning
Let $x \sim \mathrm{CST}_{n,q}(\mu, \Omega, \Gamma, \nu, \Delta, \eta)$, with $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{n-k}$ being two subvectors of $x = [x_1, x_2]$. Further let the parameters associated with the subvectors be

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Omega = \begin{bmatrix} \Omega_{11} & \Omega_{12} \\ \Omega_{21} & \Omega_{22} \end{bmatrix}, \quad \Gamma = [\Gamma_1, \Gamma_2].$$

If we transform $x$ to the tv-space as given in Expression (6.1), we obtain

$$\begin{bmatrix} t_1 \\ t_2 \\ v \end{bmatrix} \sim T_{n+q}\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \\ -\nu \end{bmatrix}, \begin{bmatrix} \Omega_{11} & \Omega_{12} & \Omega_{11}\Gamma_1' + \Omega_{12}\Gamma_2' \\ \Omega_{21} & \Omega_{22} & \Omega_{21}\Gamma_1' + \Omega_{22}\Gamma_2' \\ \Gamma_1\Omega_{11} + \Gamma_2\Omega_{21} & \Gamma_1\Omega_{12} + \Gamma_2\Omega_{22} & \Delta + \Gamma\Omega\Gamma' \end{bmatrix}, \eta \right).$$
Hence the distribution of $[(t_1, v) \mid t_2]$ follows from property 4 in Section 6.1:

$$\begin{bmatrix} \mu_{t_1|t_2} \\ \mu_{v|t_2} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ -\nu \end{bmatrix} + \begin{bmatrix} \Omega_{12} \\ \Gamma_2\Omega_{22} + \Gamma_1\Omega_{12} \end{bmatrix} \Omega_{22}^{-1}(t_2 - \mu_2) = \begin{bmatrix} \mu_1 + \Omega_{12}\Omega_{22}^{-1}(t_2 - \mu_2) \\ -\nu + \left( \Gamma_2 + \Gamma_1\Omega_{12}\Omega_{22}^{-1} \right)(t_2 - \mu_2) \end{bmatrix}$$

and the covariance matrix of $(t_1, v)$ given $t_2$ is

$$\begin{bmatrix} \Omega_{t_1|t_2} & \Gamma_{t_1v|t_2} \\ \Gamma_{vt_1|t_2} & \Omega_{v|t_2} \end{bmatrix} = \kappa(\eta)\left( \begin{bmatrix} \Omega_{11} & \Omega_{11}\Gamma_1' + \Omega_{12}\Gamma_2' \\ \Gamma_1\Omega_{11} + \Gamma_2\Omega_{21} & \Delta + \Gamma\Omega\Gamma' \end{bmatrix} - \begin{bmatrix} \Omega_{12} \\ \Gamma_2\Omega_{22} + \Gamma_1\Omega_{12} \end{bmatrix} \Omega_{22}^{-1} \begin{bmatrix} \Omega_{21} & \Omega_{21}\Gamma_1' + \Omega_{22}\Gamma_2' \end{bmatrix} \right)$$

$$= \kappa(\eta)\left( \begin{bmatrix} \Omega_{11} & \Omega_{11}\Gamma_1' + \Omega_{12}\Gamma_2' \\ \Gamma_1\Omega_{11} + \Gamma_2\Omega_{21} & \Delta + \Gamma\Omega\Gamma' \end{bmatrix} - \begin{bmatrix} \Omega_{12}\Omega_{22}^{-1}\Omega_{21} & \Omega_{12}\Gamma_2' + \Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1' \\ \Gamma_2\Omega_{21} + \Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21} & \Gamma_2\Omega_{22}\Gamma_2' + \Gamma_2\Omega_{21}\Gamma_1' + \Gamma_1\Omega_{12}\Gamma_2' + \Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1' \end{bmatrix} \right).$$

Then we can transform back to the x-space and obtain

$$[x_1 \mid x_2] \sim \mathrm{CST}_{k,q}(\mu_{1|2}, \Omega_{1|2}, \Gamma_{1|2}, \nu_{1|2}, \Delta_{1|2}, \eta_{1|2}),$$
where

$$\begin{aligned} \mu_{1|2} &= \mu_{t_1|t_2} = \mu_1 + \Omega_{12}\Omega_{22}^{-1}(x_2 - \mu_2), \\ \Omega_{1|2} &= \Omega_{t_1|t_2} = \kappa(\eta)(\Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}), \\ \Gamma_{1|2} &= \Gamma_{vt_1|t_2}\Omega_{t_1|t_2}^{-1} = \kappa(\eta)\left( \Gamma_1\Omega_{11} + \Gamma_2\Omega_{21} - \Gamma_2\Omega_{21} - \Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21} \right) \frac{1}{\kappa(\eta)}\left( \Omega_{11} - \Omega_{12}\Omega_{22}^{-1}\Omega_{21} \right)^{-1} = \Gamma_1, \\ \nu_{1|2} &= -\mu_{v|t_2} = \nu - \left( \Gamma_2 + \Gamma_1\Omega_{12}\Omega_{22}^{-1} \right)(x_2 - \mu_2), \\ \Delta_{1|2} &= \Omega_{v|t_2} - \Gamma_{vt_1|t_2}\Omega_{t_1|t_2}^{-1}\Gamma_{t_1v|t_2} \\ &= \kappa(\eta)\left( \Delta + \Gamma_1\Omega_{11}\Gamma_1' + \Gamma_1\Omega_{12}\Gamma_2' + \Gamma_2\Omega_{21}\Gamma_1' + \Gamma_2\Omega_{22}\Gamma_2' - \Gamma_2\Omega_{22}\Gamma_2' - \Gamma_2\Omega_{21}\Gamma_1' - \Gamma_1\Omega_{12}\Gamma_2' - \Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1' - \Gamma_1(\Omega_{11}\Gamma_1' - \Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1') \right) \\ &= \kappa(\eta)\Delta, \\ \eta_{1|2} &= \eta. \end{aligned}$$