Closed-skew Distributions

Simulation, Inversion and Parameter Estimation

Daniel Høyer Iversen

Master of Science in Physics and Mathematics

Submission date:

June 2010

Supervisor:

Karl Henning Omre, MATH

Norwegian University of Science and Technology

Department of Mathematical Sciences

Problem Description

To further develop theory based on the closed-skew Gaussian distribution and, if possible, generalize it to represent heavy-tailed, skew phenomena.

Particular issues:

- improved simulation algorithm

- improved parameter estimators

- definitions of spatially, stationary models

- generalizations

This MSc thesis is to be carried out at the Department of Mathematical Sciences under the guidance of Professor Henning Omre.

Assignment given: 18 January 2010

Supervisor: Karl Henning Omre, MATH

Preface

This report is the result of my work on the TMA4905 Master Thesis during the final semester of my Master of Science degree in Industrial Mathematics. The study was performed at the Department of Mathematical Sciences, NTNU, during the spring semester of 2010.

I would like to thank my supervisor, Professor Henning Omre, for his helpful advice and guidance throughout the entire year. I would also like to thank my "officemates" Eivind Gausland and Morten Valberg.

Daniel Høyer Iversen

Trondheim, June 14, 2010


Abstract

Bayesian closed-skew Gaussian inversion is defined as a generalization of traditional Bayesian Gaussian inversion. Bayesian inversion is often used in seismic inversion, and the closed-skew model is able to capture the skewness of the variable of interest. Different stationary prior models are presented, but the generalization comes at a cost: simulation from high-dimensional pdfs and parameter inference from data are more complicated. An efficient algorithm to generate realizations from the high-dimensional closed-skew Gaussian distribution is presented. A full likelihood is used for parameter estimation of stationary prior models under an exponential dependence structure. The simulation algorithms and estimators are evaluated on synthetic examples. A closed-skew T-distribution is also presented to include heavy tails in the pdf, and the model is illustrated with some examples. In the last part the simulation algorithm, the different prior models and the parameter estimators are demonstrated on real data from a well in the Sleipner Øst field. The full-likelihood estimator appears to be the best estimator for data with an exponential dependence structure.


Contents

Preface
Abstract
1 Introduction
2 Methodology
  2.1 The CS-Gaussian Distribution
  2.2 Properties of the CS-Gaussian Distribution
  2.3 Statistics for the CS-Gaussian Distribution
    2.3.1 Location Statistics
    2.3.2 Spread Statistics
  2.4 Examples of CS-Gaussian Distributions
  2.5 Bayesian CS-Gaussian Inversion
    2.5.1 Center of Likelihood Model
3 Simulation
  3.1 Algorithm A: Brute Force Simulation
  3.2 Algorithm B: Two Steps Simulation
  3.3 Algorithm C: Iterative Simulation
  3.4 Empirical Study
4 Stationary Prior Models
  4.1 x-space Prior
  4.2 tv-space Prior
  4.3 Comparing the Priors
5 Parameter Estimation in Prior Models
  5.1 x-space Prior
    5.1.1 Full Likelihood Model
    5.1.2 Localized Pseudo Likelihood Model
    5.1.3 Empirical Test for fL and Localized pL
  5.2 tv-space Prior
    5.2.1 Full Likelihood Model
    5.2.2 Pseudo Likelihood Model
    5.2.3 Empirical Test for pL
  5.3 Closing Remarks
6 The CST-distribution
  6.1 Multivariate T-distribution
  6.2 The CST-distribution
  6.3 Properties of the CST-distribution
  6.4 Examples of the CST-distribution
  6.5 Bayesian CST Inversion
7 Inversion Case Study
  7.1 Seismic Inversion Model
  7.2 Inversion Results
8 Closing Remarks and Further Work
Bibliography
A Proof of Properties for the CST-distribution
  A.1 Closed Under Linear Transformations
  A.2 Closed Under Conditioning

Chapter 1

Introduction

The objective of an inverse problem is to obtain information about a parameterized physical system from observational data, based on prior information and theoretical relationships between model parameters and data. Inverse problem theory is used in geophysics to obtain information about the Earth's properties from physical measurements at the surface. An important inverse problem is found in seismology, where seismic waves recorded at the Earth's surface are used to estimate subsurface parameters. These inverse problems are often multidimensional and strongly affected by noise and measurement uncertainty.

One important subject is the prediction of the P-wave impedance, z, in reservoirs from seismic stacked data. The impedance is defined as the product of the P-wave velocity, vp, and the density, ρ. A P-wave is a longitudinal wave, which means that the particles in the material move parallel to the direction of travel of the wave energy. The inverted properties are used to obtain information about reservoir characteristics such as lithology-fluid and porosity-permeability. Knowledge about these characteristics is important for the development and production of petroleum reservoirs. Oil companies rely on seismic analysis before selecting sites for exploratory oil wells. The seismic waves are generated by an explosion, and geophones are used to detect the signal. More information about seismology can be found in Sheriff et al. (1995).

From a statistical point of view the solution of an inverse problem is represented by a posterior probability distribution. The aim of the inversion is not only to obtain a best solution, but also to characterize the uncertainty in the solution. A Bayesian setting makes it possible to combine available prior knowledge with the information from the measured data. In Mosegaard and Tarantola (1995) a model that can only be assessed by iterative algorithms is used. This approach is too computationally demanding to be applied in large-scale 3D studies. The approach in Buland and Omre (2003) relies on an analytical evaluation, and is hence extremely computationally efficient.

The approach in Buland and Omre (2003) relies heavily on Gaussian assumptions, so it can be termed Bayesian Gaussian inversion. It is assumed that the prior model of the log-transformed elastic material properties is Gaussian. The log-transform is needed to obtain a linear relation between the seismic data and the elastic properties. A convolved linearized Zoeppritz relation with an additive Gaussian error term defines the likelihood model. These assumptions provide a Gaussian posterior model with model parameters analytically assessable from the prior and the likelihood model. The posterior model can then be determined for high-dimensional 3D problems, but some numerical approximations may be needed to assess the numerical values of the solution.

In Karimi et al. (2009) the Gaussian assumption in Buland and Omre (2003) is replaced by a closed-skew (CS) Gaussian assumption, which introduces skewness in the model for the variables involved. Under the assumption of a CS-Gaussian prior, the posterior model will also be CS-Gaussian, with model parameters analytically assessable.

The generalization to the CS-Gaussian distribution comes at a cost. The number of parameters in the model is larger than for the Gaussian distribution, and inference from data is more complicated. In this study we focus on inference for the CS-Gaussian distribution, different stationary priors and parameter estimation. Algorithms to generate realizations from the CS-Gaussian distribution are also studied.

The thesis proceeds as follows. In Chapter 2 the CS-Gaussian distribution is introduced, its properties are discussed, and Bayesian CS-Gaussian inversion is presented. In Chapter 3 different algorithms to generate realizations from high-dimensional CS-Gaussian distributions are given. Different stationary prior models are defined in Chapter 4, and parameter estimation for the different priors is discussed in Chapter 5. In Chapter 6 a closed-skew T-distribution is defined, with properties and some examples. In Chapter 7 the simulation algorithms, prior models and parameter estimators are tested on real seismic data in a Bayesian setting. Finally, we draw some conclusions and outline possible further work in Chapter 8.


Chapter 2

Methodology

2.1 The CS-Gaussian Distribution

The multivariate closed-skew (CS) Gaussian distribution was first introduced in González-Farías et al. (2004). It is a generalization of the Gaussian distribution, and the most important extension is that it adds skewness to the Gaussian model. The most convenient way to define the CS-Gaussian distribution is to consider a random vector $t \in \mathbb{R}^n$ and a random vector $v \in \mathbb{R}^q$. Let the joint probability density function (pdf) of $(t, v)$ be multivariate Gaussian:

\[
\begin{bmatrix} t \\ v \end{bmatrix} \sim N_{n+q}\left( \begin{bmatrix} \mu_t \\ \mu_v \end{bmatrix}, \begin{bmatrix} \Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v \end{bmatrix} \right), \tag{2.1}
\]

where $\mu_t$ and $\mu_v$ are expectation vectors and $\Sigma_t$, $\Sigma_v$ and $\Gamma_{vt}$ define covariance matrices of proper dimensions. The multivariate Gaussian distribution of $y \in \mathbb{R}^k$ with dimension $k$, expectation $\mu_y \in \mathbb{R}^k$ and covariance $\Sigma_y \in \mathbb{R}^{k \times k}$ is defined by the pdf:

\[
y \sim N_k(\mu_y, \Sigma_y):\quad \phi_k(y; \mu_y, \Sigma_y) = \frac{1}{(2\pi)^{k/2}|\Sigma_y|^{1/2}}\exp\left\{-\frac{1}{2}(y - \mu_y)'\Sigma_y^{-1}(y - \mu_y)\right\}.
\]

The CS-Gaussian variable of interest $x \in \mathbb{R}^n$ is then defined as

\[
x = [t \mid v \ge 0],
\]

where $a \ge b$ for vectors with elements $a_i$ and $b_i$ is defined as $a_i \ge b_i$ for all elements $i$. Let $p(x)$ be the generic term for a pdf of the argument, and let $p(x \mid d)$ represent the conditional pdf of $x$ given $d$. The pdf of the CS-Gaussian distribution is then given as:

\[
x \sim p(x) = p(t \mid v \ge 0) = \frac{p(v \ge 0 \mid t)\,p(t)}{p(v \ge 0)}
= \left[1 - \Phi_q(0; \mu_v, \Sigma_v)\right]^{-1}\left[1 - \Phi_q(0; \mu_{v|t}, \Sigma_{v|t})\right]\phi_n(t; \mu_t, \Sigma_t), \tag{2.2}
\]


with $\Phi_n(\cdot; \mu, \Sigma)$ and $\phi_n(\cdot; \mu, \Sigma)$ being the cumulative distribution function (cdf) and the pdf of an $n$-dimensional Gaussian distribution. The CS-Gaussian distribution can thus be seen as a Gaussian distribution that is skewed by multiplication with a cumulative Gaussian distribution. The pdf will be unimodal, since it is a product of a unimodal function and a monotone function. The usual parameterization of the CS-Gaussian distribution is

\[
x \sim \mathrm{CSN}_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta):\quad
\psi_{n,q}(x; \mu, \Sigma, \Gamma, \nu, \Delta)
= \left[\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')\right]^{-1}\Phi_q(\Gamma(x - \mu); \nu, \Delta)\,\phi_n(x; \mu, \Sigma), \tag{2.3}
\]

with $\mu = \mu_t$, $\Sigma = \Sigma_t$, $\Gamma = \Gamma_{vt}\Sigma_t^{-1}$, $\nu = -\mu_v$, $\Delta = \Sigma_v - \Gamma_{vt}\Sigma_t^{-1}\Gamma_{tv}$, and $\Psi_{n,q}(x; \mu, \Sigma, \Gamma, \nu, \Delta)$ being the cdf of the CS-Gaussian distribution. $\Delta$ and $\Sigma$ must be positive definite matrices. $\Gamma$ is called the skewness parameter, and $\Gamma = 0$ entails that the CS-Gaussian distribution coincides with the Gaussian distribution with parameters $(\mu, \Sigma)$. Hence the CS-Gaussian distribution should be seen as a generalization of the Gaussian distribution, not as an alternative distribution. The $q$-parameter may be seen as the degree of skewness freedom in the pdf, and $q = 0$ also entails that the CS-Gaussian distribution coincides with the Gaussian distribution. Note also that $\Delta = \Sigma_{v|t}$, where $\Sigma_{v|t}$ is the covariance of $v$ given $t$.

There exists no analytical solution for the Gaussian cdf, $\Phi_q(\cdot; \cdot, \cdot)$, and it is also computationally hard to estimate the value of $\Phi_q(\cdot; \cdot, \cdot)$. Hence it is not possible to obtain an analytical expression for the CS-Gaussian pdf. This causes problems in several cases, and is the underlying problem in this thesis.

The transformation between the CS-Gaussian distribution for $x$ and the Gaussian distribution for $(t, v)$ is one-to-one, given by:

\[
x = [t \mid v \ge 0] \sim \mathrm{CSN}_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta)
\iff
\begin{bmatrix}t \\ v\end{bmatrix} \sim N_{n+q}\left(\begin{bmatrix}\mu_t \\ \mu_v\end{bmatrix}, \begin{bmatrix}\Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v\end{bmatrix}\right) \tag{2.4}
\]

with

\[
\begin{aligned}
\mu &= \mu_t \in \mathbb{R}^n & \mu_t &= \mu \in \mathbb{R}^n \\
\Sigma &= \Sigma_t \in \mathbb{R}^{n \times n} & \Sigma_t &= \Sigma \in \mathbb{R}^{n \times n} \\
\Gamma &= \Gamma_{vt}\Sigma_t^{-1} \in \mathbb{R}^{q \times n} \quad\iff\quad & \Gamma_{vt} &= \Gamma_{tv}' = \Gamma\Sigma \in \mathbb{R}^{q \times n} \\
\nu &= -\mu_v \in \mathbb{R}^q & \mu_v &= -\nu \in \mathbb{R}^q \\
\Delta &= \Sigma_v - \Gamma_{vt}\Sigma_t^{-1}\Gamma_{tv} \in \mathbb{R}^{q \times q} & \Sigma_v &= \Delta + \Gamma\Sigma\Gamma' \in \mathbb{R}^{q \times q}
\end{aligned}
\]
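The two columns of the mapping (2.4) are mechanical to implement. A minimal sketch (assuming NumPy; the function names are illustrative, not from the thesis), verified by a round trip on the bivariate parameter values used later in Section 2.4:

```python
import numpy as np

def csn_to_gaussian(mu, Sigma, Gamma, nu, Delta):
    """Map CSN_{n,q}(mu, Sigma, Gamma, nu, Delta) to the joint Gaussian
    parameters (mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt) of (t, v)."""
    mu_t = mu
    mu_v = -nu
    Sigma_t = Sigma
    Gamma_vt = Gamma @ Sigma                       # Gamma Sigma, in R^{q x n}
    Sigma_v = Delta + Gamma @ Sigma @ Gamma.T
    return mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt

def gaussian_to_csn(mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt):
    """Inverse map: joint Gaussian parameters of (t, v) to CSN parameters."""
    St_inv = np.linalg.inv(Sigma_t)
    mu = mu_t
    nu = -mu_v
    Sigma = Sigma_t
    Gamma = Gamma_vt @ St_inv
    Delta = Sigma_v - Gamma_vt @ St_inv @ Gamma_vt.T
    return mu, Sigma, Gamma, nu, Delta

# round trip on the bivariate example parameters from Section 2.4
mu = np.array([5.0, 7.0]); nu = np.array([-2.0, 6.0])
Sigma = np.array([[1.0, 0.2], [0.2, 4.0]])
Gamma = np.array([[4.0, 0.0], [0.0, 5.0]])
Delta = np.eye(2)
back = gaussian_to_csn(*csn_to_gaussian(mu, Sigma, Gamma, nu, Delta))
```

The round trip recovering the original parameters exactly is a direct consequence of the one-to-one property of (2.4).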


2.2 Properties of the CS-Gaussian Distribution

Some favorable characteristics of the CS-Gaussian distribution that are relevant for inversion problems are proved in González-Farías et al. (2004):

1. Linear combinations of the components of CS-Gaussian random variables are also CS-Gaussian random variables. Let $x \sim \mathrm{CSN}_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta)$, let $A$ be a deterministic $l \times n$ matrix with $l \le n$, and let $c$ be a deterministic $l$-dimensional vector. Then

\[
[y = Ax + c] \sim \mathrm{CSN}_{l,q}(\mu_y, \Sigma_y, \Gamma_y, \nu_y, \Delta_y),
\]

where

\[
\begin{aligned}
\mu_y &= A\mu + c, \\
\Sigma_y &= A\Sigma A', \\
\Gamma_y &= \Gamma\Sigma A'\Sigma_y^{-1}, \\
\nu_y &= \nu, \\
\Delta_y &= \Delta + \Gamma\Sigma\Gamma' - \Gamma\Sigma A'\Sigma_y^{-1}A\Sigma\Gamma'.
\end{aligned}
\]

So the CS-Gaussian distribution is closed under linear transformations.
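Property 1 translates directly into code. A minimal NumPy sketch (function name illustrative), with the identity transform as a sanity check, since $A = I$, $c = 0$ must leave every parameter unchanged:

```python
import numpy as np

def csn_linear_transform(A, c, mu, Sigma, Gamma, nu, Delta):
    """Parameters of y = A x + c for x ~ CSN_{n,q}(mu, Sigma, Gamma, nu, Delta);
    A is a deterministic l x n matrix (full rank, l <= n) and c an l-vector."""
    Sigma_y = A @ Sigma @ A.T
    Sy_inv = np.linalg.inv(Sigma_y)
    GSA = Gamma @ Sigma @ A.T                # Gamma Sigma A', reused below
    mu_y = A @ mu + c
    Gamma_y = GSA @ Sy_inv
    nu_y = nu
    Delta_y = Delta + Gamma @ Sigma @ Gamma.T - GSA @ Sy_inv @ GSA.T
    return mu_y, Sigma_y, Gamma_y, nu_y, Delta_y

# sanity check: the identity transform must leave all parameters unchanged
mu = np.array([5.0, 7.0]); nu = np.array([-2.0, 6.0])
Sigma = np.array([[1.0, 0.2], [0.2, 4.0]])
Gamma = np.array([[4.0, 0.0], [0.0, 5.0]])
Delta = np.eye(2)
ident = csn_linear_transform(np.eye(2), np.zeros(2), mu, Sigma, Gamma, nu, Delta)
```

Note that for $A = I$ the correction term $\Gamma\Sigma A'\Sigma_y^{-1}A\Sigma\Gamma'$ cancels $\Gamma\Sigma\Gamma'$ exactly, which is why $\Delta_y = \Delta$.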

2. The composition of two independent CS-Gaussian variables is also CS-Gaussian. If $x_1$ and $x_2$ are two independent CS-Gaussian variables with distributions $x_1 \sim \mathrm{CSN}_{n_{x_1}, q_{x_1}}(\mu_{x_1}, \Sigma_{x_1}, \Gamma_{x_1}, \nu_{x_1}, \Delta_{x_1})$ and $x_2 \sim \mathrm{CSN}_{n_{x_2}, q_{x_2}}(\mu_{x_2}, \Sigma_{x_2}, \Gamma_{x_2}, \nu_{x_2}, \Delta_{x_2})$, then

\[
y = \begin{bmatrix}x_1 \\ x_2\end{bmatrix} \sim \mathrm{CSN}_{n,q}(\mu_y, \Sigma_y, \Gamma_y, \nu_y, \Delta_y),
\]

where

\[
\begin{aligned}
n &= n_{x_1} + n_{x_2}, \qquad q = q_{x_1} + q_{x_2}, \\
\mu_y &= \begin{bmatrix}\mu_{x_1} \\ \mu_{x_2}\end{bmatrix}, \quad
\Sigma_y = \begin{bmatrix}\Sigma_{x_1} & 0 \\ 0 & \Sigma_{x_2}\end{bmatrix}, \quad
\Gamma_y = \begin{bmatrix}\Gamma_{x_1} & 0 \\ 0 & \Gamma_{x_2}\end{bmatrix}, \quad
\nu_y = \begin{bmatrix}\nu_{x_1} \\ \nu_{x_2}\end{bmatrix}, \quad
\Delta_y = \begin{bmatrix}\Delta_{x_1} & 0 \\ 0 & \Delta_{x_2}\end{bmatrix}.
\end{aligned}
\]

3. CS-Gaussian random variables conditioned on a subset of the components are also CS-Gaussian. Let $x \sim \mathrm{CSN}_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta)$, with $x_1 \in \mathbb{R}^k$ and $x_2 \in \mathbb{R}^{n-k}$ being two subvectors of $x = [x_1, x_2]'$. Further, let the parameters associated with the subvectors be

\[
\mu = \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \quad
\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}, \quad
\Gamma = [\Gamma_1, \Gamma_2].
\]

Then

\[
[x_1 \mid x_2] \sim \mathrm{CSN}_{k,q}(\mu_{1|2}, \Sigma_{1|2}, \Gamma_{1|2}, \nu_{1|2}, \Delta_{1|2}),
\]

where

\[
\begin{aligned}
\mu_{1|2} &= \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \\
\Sigma_{1|2} &= \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}, \\
\Gamma_{1|2} &= \Gamma_1, \\
\nu_{1|2} &= \nu - \left(\Gamma_2 + \Gamma_1\Sigma_{12}\Sigma_{22}^{-1}\right)(x_2 - \mu_2), \\
\Delta_{1|2} &= \Delta.
\end{aligned}
\]

Note that a consequence of property 1 is that the marginal pdfs are also CS-Gaussian, obtained by choosing $A = b(i)'$ with $b(i)$ an $n \times 1$ vector with zero entries except for element number $i$, so that $x_i = b(i)'x$. A consequence of properties 1 and 2 is that a sum of independent CS-Gaussian variables is also CS-Gaussian.


2.3 Statistics for the CS-Gaussian Distribution

Here we discuss location statistics and spread statistics.

2.3.1 Location Statistics

The mode, mean and median are three location statistics that are useful for characterizing the distribution. These statistics can be used for prediction and are, for $x \sim p(x)$, defined as:

• Mode, $\mathrm{Mod}(x) = \arg\max_x p(x)$.

• Mean, $E(x) = [E(x_1), E(x_2), \ldots, E(x_n)]'$, where $E(x_i) = \int x_i\, p(x_i)\, dx_i$.

• Median, $\mathrm{Med}(x) = [\mathrm{Med}(x_1), \mathrm{Med}(x_2), \ldots, \mathrm{Med}(x_n)]'$, where $\mathrm{Med}(x_i)$ is defined by $\int_{-\infty}^{\mathrm{Med}(x_i)} p(x_i)\, dx_i = 0.5$.

For the Gaussian distribution these three statistics are identical, but for a CS-Gaussian distribution they are in general different, due to the skewness in the CS-Gaussian model. For the Gaussian distribution these statistics are also analytically obtainable, since they are all given by the model parameter $\mu$. It would be useful to be able to calculate the mean, mode and median for the CS-Gaussian distribution as well.

Mode. Since we know that the CS-Gaussian pdf is unimodal, the mode is found as the $x$ that satisfies

\[
\nabla_x\, p(x) = 0, \qquad \text{where} \qquad
\nabla_x = \left[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \ldots, \frac{\partial}{\partial x_n}\right]'.
\]

Let $p(t \mid v \ge 0)$ be as given in Expression (2.2), and let $\mu_{v|t} = \mu_v + \Gamma_{vt}\Sigma_t^{-1}(t - \mu_t)$. Differentiation gives

\[
\begin{aligned}
\nabla_t\, p(t \mid v \ge 0) ={}& c_1 \int_{-\infty}^{0} \exp\left\{-\tfrac{1}{2}(v - \mu_{v|t})'\Sigma_{v|t}^{-1}(v - \mu_{v|t})\right\}(v - \mu_{v|t})'\Sigma_{v|t}^{-1}\Gamma_{vt}\Sigma_t^{-1}\, dv \\
&\quad\cdot \exp\left\{-\tfrac{1}{2}(t - \mu_t)'\Sigma_t^{-1}(t - \mu_t)\right\} \\
&+ c_2\left(1 - \Phi_q(0; \mu_{v|t}, \Sigma_{v|t})\right)(t - \mu_t)'\Sigma_t^{-1}\exp\left\{-\tfrac{1}{2}(t - \mu_t)'\Sigma_t^{-1}(t - \mu_t)\right\} \\
={}& 0.
\end{aligned}
\]

7

CHAPTER 2. METHODOLOGY

It is hard to solve this equation with respect to $t$; hence an analytical solution for the mode is complicated to obtain, and numerical methods must be used instead. For example, stochastic simulation can be used: generate realizations from the CS-Gaussian distribution and choose the area with the highest density of points as the mode. More traditional optimization methods are complicated to use, due to the difficulty of computing the value of the CS-Gaussian pdf analytically. These methods are not investigated any further in this project.

Mean. The mean is given as the first moment. If $x \sim \mathrm{CSN}_{n,q}(\mu, \Sigma, \Gamma, \nu, \Delta)$, the moment generating function of $x$ is given in González-Farías et al. (2004) as

\[
M_x(s) = \frac{\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma')}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')}\exp\left\{s'\mu + \tfrac{1}{2}s'\Sigma s\right\}, \qquad s \in \mathbb{R}^n.
\]

The mean of the CS-Gaussian distribution is then given as $E(x) = \nabla_s M_x(s)\big|_{s=0}$, with

\[
\begin{aligned}
\nabla_s M_x(s) ={}& \Sigma\Gamma'\,\frac{\left[\nabla\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma')\right]'}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')}\exp\left\{s'\mu + \tfrac{1}{2}s'\Sigma s\right\} \\
&+ \frac{\Phi_q(\Gamma\Sigma s; \nu, \Delta + \Gamma\Sigma\Gamma')}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')}\exp\left\{s'\mu + \tfrac{1}{2}s'\Sigma s\right\}(\mu + \Sigma s).
\end{aligned}
\]

So the mean value of $x$ is

\[
E(x) = \nabla_s M_x(s)\big|_{s=0} = \mu + \Sigma\Gamma'\eta, \tag{2.5}
\]

where

\[
\eta = \frac{\left[\nabla_s\Phi_q(s; \nu, \Delta + \Gamma\Sigma\Gamma')\right]'}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')}\bigg|_{s=0}
\]

(Dominguez-Molina et al. 2003). For a high-dimensional problem it is complicated to compute the mean, since no analytical solution exists for $\Phi_q(\cdot; \cdot, \cdot)$. But for $q = 1$ the mean is

\[
E(x) = \mu + \Sigma\Gamma'\,\frac{\phi_1(0; \nu, \Delta + \Gamma\Sigma\Gamma')}{\Phi_1(0; \nu, \Delta + \Gamma\Sigma\Gamma')}.
\]
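For $q = 1$ the closed-form mean above can be cross-checked against a direct numerical integration of the univariate pdf from Expression (2.3). A minimal sketch (assuming NumPy is available; the base case (2.6) from Section 2.4 is used as the example):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def Phi(z):  # standard Gaussian cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):  # standard Gaussian pdf
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def csn1_mean(mu, Sigma, Gamma, nu, Delta):
    """Closed-form mean for n = q = 1: mu + Sigma*Gamma*phi1(0)/Phi1(0)."""
    s = sqrt(Delta + Gamma * Sigma * Gamma)      # sd of N(nu, Delta + G S G')
    return mu + Sigma * Gamma * (phi(-nu / s) / s) / Phi(-nu / s)

def csn1_pdf(x, mu, Sigma, Gamma, nu, Delta):
    """Univariate CSN pdf from Expression (2.3)."""
    s = sqrt(Delta + Gamma * Sigma * Gamma)
    return (Phi((Gamma * (x - mu) - nu) / sqrt(Delta))
            * phi((x - mu) / sqrt(Sigma)) / sqrt(Sigma)) / Phi(-nu / s)

# base case (2.6): mu = 5, Sigma = 9, Gamma = 1, nu = 0, Delta = 0.05
m_formula = csn1_mean(5.0, 9.0, 1.0, 0.0, 0.05)
xs = np.linspace(-25.0, 40.0, 65001)
dens = np.array([csn1_pdf(x, 5.0, 9.0, 1.0, 0.0, 0.05) for x in xs])
dx = xs[1] - xs[0]
total_mass = float(np.sum(dens) * dx)
m_numeric = float(np.sum(xs * dens) * dx)
```

With the base case the mean lands noticeably above $\mu = 5$, reflecting the strong right skew of the pdf.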

To obtain the mean for a higher-dimensional problem, a numerical method can be used to compute Expression (2.5). Alternatively, $E(x_i) = \int x_i\, p(x_i)\, dx_i$ can be approximated numerically, for example by stochastic integration as

\[
\hat{E}(x_i) = \frac{1}{N}\sum_{j=1}^{N} x_i^{(j)},
\]

where $x_i^{(j)}$ is realization number $j$ of element $x_i$ from simulation of the CS-Gaussian distribution.

Median. The median is often used as a location measure instead of the mean. The median is not simple to obtain analytically, since it requires an inversion of the cdf of the CS-Gaussian distribution, so it must be estimated by numerical methods. For example,

\[
\widehat{\mathrm{Med}}(x_i) = \mathrm{Med}\left\{x_i^{(1)}, x_i^{(2)}, \ldots, x_i^{(j)}, \ldots, x_i^{(N)}\right\},
\]

where $x_i^{(j)}$ is realization number $j$ of element $x_i$ from the CS-Gaussian distribution.

It is possible to predict the mean and the median from each marginal distribution, but since each marginal distribution depends on the full $v$-vector, it is faster to generate $x^{(j)}$ from the joint distribution.
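Given realizations $x^{(j)}$, the estimators above are elementwise empirical statistics. A minimal sketch (NumPy assumed; for the sample generator it simply uses the defining construction $x = [t \mid v \ge 0]$ with $n = q = 1$ and illustrative covariance numbers, not values from the thesis):

```python
import numpy as np

def empirical_statistics(samples, alpha=0.1):
    """samples: N x n array of realizations x^(j); returns elementwise
    median, mean and the (alpha/2, 1 - alpha/2) quantiles."""
    med = np.median(samples, axis=0)
    mean = np.mean(samples, axis=0)
    lo = np.quantile(samples, alpha / 2.0, axis=0)
    hi = np.quantile(samples, 1.0 - alpha / 2.0, axis=0)
    return med, mean, (lo, hi)

rng = np.random.default_rng(0)
# draw (t, v) jointly Gaussian and keep t where v >= 0 (illustrative numbers)
t, v = rng.multivariate_normal([5.0, 0.0], [[9.0, 4.0], [4.0, 2.0]], size=200000).T
x = t[v >= 0.0].reshape(-1, 1)
med, mean, (lo, hi) = empirical_statistics(x)
```

Because the positive correlation between $t$ and $v$ makes the conditioned variable right-skewed, the empirical mean ends up above the empirical median, in line with the discussion above.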

2.3.2 Spread Statistics

The covariance and the quantiles are two spread statistics that are suitable for characterizing the variability in the model. These statistics can be used to describe the uncertainty in the predicted values and are defined as:

• Covariance, $\mathrm{Cov}(x) = E\left((x - E(x))(x - E(x))'\right)$.

• Quantiles: the $\alpha$-quantile, $Q_\alpha(x) = [Q_\alpha(x_1), Q_\alpha(x_2), \ldots, Q_\alpha(x_n)]'$, where $Q_\alpha(x_i)$ is defined by $\int_{-\infty}^{Q_\alpha(x_i)} p(x_i)\, dx_i = \alpha$.

For the Gaussian distribution the covariance is given as the model parameter $\Sigma$, while the quantiles must be found by numerical methods. It would be useful to be able to calculate these spread statistics for the CS-Gaussian distribution.

Covariance. The covariance is given as $\mathrm{Cov}(x) = E\left((x - E(x))(x - E(x))'\right) = E(xx') - E(x)E(x')$, so it can be obtained from the first and second moments. The first moment is already found, and the second moment can be calculated as

\[
E(xx') = \nabla_s\nabla_s' M_x(s)\big|_{s=0} = \Sigma + \mu\mu' + \mu\eta'\Gamma\Sigma + \Sigma\Gamma'\eta\mu' + \Sigma\Gamma'\Lambda\Gamma\Sigma,
\]

where

\[
\Lambda = \frac{\nabla_s\nabla_s'\,\Phi_q(s; \nu, \Delta + \Gamma\Sigma\Gamma')}{\Phi_q(0; \nu, \Delta + \Gamma\Sigma\Gamma')}\bigg|_{s=0}.
\]

So the covariance of $x$ is given as

\[
\mathrm{Cov}(x) = E(xx') - E(x)E(x') = \Sigma + \Sigma\Gamma'\Lambda\Gamma\Sigma - \Sigma\Gamma'\eta\eta'\Gamma\Sigma
\]

(Dominguez-Molina et al. 2003). It is not possible to compute the covariance analytically for a high-dimensional problem, so numerical approximations are needed.

Quantiles. First note that $Q_{0.5}(x)$ is identical to the median, so inversion of the CS-Gaussian cdf is required to obtain the quantiles. As for the median, a numerical approximation seems best for computing the quantiles, and here too it is natural to use stochastic simulation.

2.4 Examples of CS-Gaussian Distributions

The CS-Gaussian distribution is closely related to the Gaussian distribution and therefore has many similar properties. To illustrate the characteristics of the CS-Gaussian distribution, some examples will be shown. We first consider a CS-Gaussian $x \in \mathbb{R}^1$, with base case

\[
\mu = 5, \quad \Sigma = 9, \quad \Gamma = 1, \quad \nu = 0, \quad \Delta = 0.05. \tag{2.6}
\]

Note that $\Sigma$, $\Gamma$ and $\Delta$ here are one-dimensional matrices.

From Expression (2.3) we know that the CS-Gaussian pdf is proportional to the product of a Gaussian pdf with variance $\Sigma$ and a Gaussian cdf with variance $\Delta$. A lower variance $\Delta$ will give more skewness in the model, and a higher variance $\Sigma$ for the Gaussian pdf will also lead to more skewness. This can be demonstrated by considering the base case, but with three different pairs of values for $\Sigma$ and $\Delta$:

\[
\Sigma_1 = 9,\; \Delta_1 = 0.05; \qquad \Sigma_2 = 6,\; \Delta_2 = 0.5; \qquad \Sigma_3 = 3,\; \Delta_3 = 5.
\]

Figure 2.1: Pdf for different Σ and Δ, with µ = 5, Γ = 1 and ν = 0.

The pdfs for these three cases are displayed in Figure 2.1.

The skewness in the model is not determined by $\Delta$ and $\Sigma$ alone; it is also influenced by $\nu$. Even for $\Delta$ very small and $\Sigma$ very large there will be no skewness in the model if $\nu = -\infty$. This effect is demonstrated by considering a CS-Gaussian $x$ with the parameters given in Expression (2.6), but with three different values of $\nu$:

\[
\nu_1 = 0, \qquad \nu_2 = -3, \qquad \nu_3 = -9.
\]

The three different pdfs are displayed in Figure 2.2. In the first case the pdf is very skewed, and in the third case the pdf is almost Gaussian. From these two examples we observe that a higher $\Sigma$ gives a longer tail on the right side, and a small $\Delta$ gives a short left tail. The CSN pdf can be seen as a Gaussian pdf that is cut off at the left side; the position of the cut-off is decided by $\nu$, and its sharpness by $\Delta$.

Figure 2.2: Pdf for different ν, with µ = 5, Σ = 9, Γ = 1 and Δ = 0.05.

The parameter $\mu$ is the only parameter that does not influence the skewness of the model. As for the Gaussian distribution, $\mu$ in the CS-Gaussian distribution only shifts the location of the distribution. The translation effect is demonstrated by considering a CS-Gaussian $x$ with the parameters given in Expression (2.6), but with three different values of $\mu$:

\[
\mu_1 = 5, \qquad \mu_2 = 7, \qquad \mu_3 = 2.
\]

All three pdfs are identical except for a shift in location, as displayed in Figure 2.3.

Figure 2.3: Pdf for different µ, with Σ = 9, Γ = 1, ν = 0 and Δ = 0.05.

The same situations occur for higher-dimensional problems, and we will also study a particular CS-Gaussian distributed variable $x$ with $n = q = 2$ and parameters

\[
\mu_1 = \begin{bmatrix}5 \\ 7\end{bmatrix}, \quad
\Sigma_1 = \begin{bmatrix}1 & 0.2 \\ 0.2 & 4\end{bmatrix}, \quad
\Gamma_1 = \begin{bmatrix}4 & 0 \\ 0 & 5\end{bmatrix}, \quad
\nu_1 = \begin{bmatrix}-2 \\ 6\end{bmatrix}, \quad
\Delta_1 = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}.
\]

The pdf of this two-dimensional CS-Gaussian distribution is shown in Figure 2.4. Another example is shown in Figure 2.5, with

\[
\mu_2 = \mu_1, \quad \Sigma_2 = \Sigma_1, \quad
\Gamma_2 = \begin{bmatrix}4 & 0 \\ 0 & 0\end{bmatrix}, \quad
\nu_2 = \nu_1, \quad \Delta_2 = \Delta_1.
\]

Note that the only difference between these two examples is one element in $\Gamma_2$. It is observed from Figure 2.4 and Figure 2.5 that there is more skewness in the first example. Setting one diagonal element of $\Gamma_2$ to zero makes the element $t_2$ almost uncorrelated with $v$; hence the marginal of $t_2$ will be almost Gaussian with mean 7. This shows that $\Gamma$ controls the skewness in the model. It is also observed that this small change in $\Gamma$ changes the location of the distribution in the $x_2$-dimension: in the first example the mode is about $(5, 9)$, and in the second example it is about $(5, 7)$.

These examples show that $\Sigma$, $\Gamma$, $\nu$ and $\Delta$ all influence the location, spread and skewness in the model. This makes inference in the model quite complicated.

Figure 2.4: First bivariate CS-Gaussian example. Bivariate pdf and marginal pdfs.

Figure 2.5: Second bivariate CS-Gaussian example. Bivariate pdf and marginal pdfs.

2.5 Bayesian CS-Gaussian Inversion

Let $x \in \mathbb{R}^{n_x}$ be the variable of interest and consider the linear model

\[
[d \mid x] = Hx + e, \tag{2.7}
\]

where $d \in \mathbb{R}^{n_d}$ is the measured data with error $e \in \mathbb{R}^{n_d}$. The objective is to assess $x$ given $d$, and several methods exist to do this. Here we consider Bayesian inversion, so we assume probability distributions for $x$ and $e$. In Bayesian inversion we are interested in the probability distribution of $x$ given some observed data $d$. Bayes' rule gives the posterior model as

\[
p(x \mid d) = c \cdot p(d \mid x) \cdot p(x),
\]

where $c$ is a constant, $p(d \mid x)$ is the likelihood, and $p(x)$ represents the prior knowledge about the variable of interest. The prior model for $x$ is defined as

\[
x \sim \mathrm{CSN}_{n_x, q_x}(\mu_x, \Sigma_x, \Gamma_x, \nu_x, \Delta_x).
\]


The error $e$ is assumed to be independent of $x$ and distributed as

\[
e \sim p(e) = \mathrm{CSN}_{n_d, q_d}(0, \Sigma_e, \Gamma_e, 0, I),
\]

where $I$ is the identity matrix of proper dimension. This gives

\[
[d \mid x] \sim p(d \mid x) = \mathrm{CSN}_{n_d, q_d}(Hx, \Sigma_e, \Gamma_e, 0, I).
\]

The posterior model $p(x \mid d)$ is then given by

\[
[x \mid d] \sim p(x \mid d) = \mathrm{CSN}_{n_x, q_x + q_d}(\mu_{x|d}, \Sigma_{x|d}, \Gamma_{x|d}, \nu_{x|d}, \Delta_{x|d}),
\]

where the parameters are given in Karimi et al. (2009) as

\[
\begin{aligned}
\mu_{x|d} &= \mu_x + \Sigma_x H'\left[H\Sigma_x H' + \Sigma_e\right]^{-1}(d - H\mu_x) \\
\Sigma_{x|d} &= \Sigma_x - \Sigma_x H'\left[H\Sigma_x H' + \Sigma_e\right]^{-1}H\Sigma_x \\
\Gamma_{x|d} &= \left(\begin{bmatrix}\Gamma_x\Sigma_x \\ 0\end{bmatrix} - \begin{bmatrix}\Gamma_x\Sigma_x H' \\ \Gamma_e\Sigma_e\end{bmatrix}\left[H\Sigma_x H' + \Sigma_e\right]^{-1}H\Sigma_x\right)\Sigma_{x|d}^{-1} \\
\nu_{x|d} &= \begin{bmatrix}\nu_x \\ 0\end{bmatrix} - \begin{bmatrix}\Gamma_x\Sigma_x H' \\ \Gamma_e\Sigma_e\end{bmatrix}\left[H\Sigma_x H' + \Sigma_e\right]^{-1}(d - H\mu_x) \\
\Delta_{x|d} &= \begin{bmatrix}\Delta_x + \Gamma_x\Sigma_x\Gamma_x' & 0 \\ 0 & I + \Gamma_e\Sigma_e\Gamma_e'\end{bmatrix} - \begin{bmatrix}\Gamma_x\Sigma_x H' \\ \Gamma_e\Sigma_e\end{bmatrix}\left[H\Sigma_x H' + \Sigma_e\right]^{-1}\begin{bmatrix}\Gamma_x\Sigma_x H' \\ \Gamma_e\Sigma_e\end{bmatrix}' - \Gamma_{x|d}\Sigma_{x|d}\Gamma_{x|d}'.
\end{aligned}
\]
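The posterior update is a direct transcription of these formulas. A minimal NumPy sketch (the function name is illustrative, not from the thesis); as a sanity check, setting $\Gamma_x = \Gamma_e = 0$ must reproduce the Bayesian Gaussian update for $\mu_{x|d}$ and $\Sigma_{x|d}$:

```python
import numpy as np

def csn_posterior(d, H, mu_x, Sigma_x, Gamma_x, nu_x, Delta_x, Sigma_e, Gamma_e):
    """Posterior CSN parameters for d = H x + e, following Karimi et al. (2009)."""
    S = H @ Sigma_x @ H.T + Sigma_e
    K = np.linalg.solve(S, np.eye(len(d)))            # [H Sx H' + Se]^{-1}
    resid = d - H @ mu_x
    mu_post = mu_x + Sigma_x @ H.T @ K @ resid
    Sigma_post = Sigma_x - Sigma_x @ H.T @ K @ H @ Sigma_x
    q_d = Gamma_e.shape[0]
    B = np.vstack([Gamma_x @ Sigma_x @ H.T, Gamma_e @ Sigma_e])
    top = np.vstack([Gamma_x @ Sigma_x, np.zeros((q_d, Sigma_x.shape[0]))])
    Gamma_post = (top - B @ K @ H @ Sigma_x) @ np.linalg.inv(Sigma_post)
    nu_post = np.concatenate([nu_x, np.zeros(q_d)]) - B @ K @ resid
    D11 = Delta_x + Gamma_x @ Sigma_x @ Gamma_x.T
    D22 = np.eye(q_d) + Gamma_e @ Sigma_e @ Gamma_e.T
    D = np.block([[D11, np.zeros((D11.shape[0], D22.shape[1]))],
                  [np.zeros((D22.shape[0], D11.shape[1])), D22]])
    Delta_post = D - B @ K @ B.T - Gamma_post @ Sigma_post @ Gamma_post.T
    return mu_post, Sigma_post, Gamma_post, nu_post, Delta_post

# Gaussian special case (Gamma_x = Gamma_e = 0): reduces to the usual
# Bayesian Gaussian update mu + Sx H'(H Sx H' + Se)^{-1}(d - H mu)
out = csn_posterior(np.array([2.0]), np.array([[2.0]]),
                    np.array([0.0]), np.array([[1.0]]), np.array([[0.0]]),
                    np.array([0.0]), np.array([[1.0]]),
                    np.array([[1.0]]), np.array([[0.0]]))
```

In the Gaussian special case the skewness parameter of the posterior vanishes, which is the observation behind the generalization claim discussed next.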

This is a generalization of the traditional Bayesian Gaussian inversion presented in Mosegaard and Tarantola (1995) and in Buland and Omre (2003), as can be seen by setting $\Gamma_e = 0$ in the likelihood and $\Gamma_x = 0$ in the prior model. In Bayesian Gaussian inversion the prior is Gaussian and the likelihood is Gauss-linear; the posterior model is then Gaussian with parameters $\mu_{x|d}$ and $\Sigma_{x|d}$. The posterior model $p(x \mid d)$ can be used to predict $x$ based on the data $d$.

All three location statistics presented in Section 2.3 can be used as predictors for $x$ given the data $d$, but we have chosen to use the median; hence $\hat{x} = \mathrm{Med}\{x \mid d\} = Q_{0.5}\{x \mid d\}$. The median is, as argued in Karimi et al. (2009), a good choice since it is logically related to the natural definition of the $(1 - \alpha)$-prediction interval, $[Q_{\alpha/2}\{x \mid d\}, Q_{1-\alpha/2}\{x \mid d\}]$. The median also commutes with any monotone function $h(\cdot)$, i.e. $Q_{0.5}\{h(x) \mid d\} = h(Q_{0.5}\{x \mid d\})$; then, for example, $\hat{z} = \exp\{\hat{x}\}$ is the median predictor of $z = \exp\{x\}$. The median as a predictor is also robust with respect to deviations in the tail of the posterior model $p(x \mid d)$. Note that the median is frequently used as a predictor for skew distributions instead of the mean, since the mean is less robust towards long tails than the median.

2.5.1 Center of Likelihood Model

The error term $e$ of the likelihood model is not centered for all choices of the parameters $\Sigma_e$ and $\Gamma_e$. It is common to assume an error term with location measure zero, and with $\Gamma_e = 0$ the distribution reduces to a Gaussian distribution with location zero. Another way to ensure that the error term has zero median is to introduce a centered error, $e_c$. The centered error term has the same parameters $\Sigma_e$ and $\Gamma_e$ as $e$, but includes an additional location parameter $\mu_{e_c}$. The $\mu_{e_c}$ parameter should translate the distribution such that the median becomes zero, and is therefore defined as

\[
\mu_{e_c} = -\mathrm{Med}(e).
\]

The median of $e$ must be obtained by a numerical method.
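For a low-dimensional error term the centering shift can be estimated by simulation. A minimal sketch with a 1-D CSN error and illustrative parameter values (not from the thesis), using the defining construction $e = [t \mid v \ge 0]$:

```python
import numpy as np

def centered_error_shift(Sigma_e, Gamma_e, n_samples, rng):
    """Estimate mu_ec = -Med(e) for e ~ CSN_{1,1}(0, Sigma_e, Gamma_e, 0, 1)."""
    Gamma_vt = Gamma_e * Sigma_e                    # Gamma * Sigma
    Sigma_v = 1.0 + Gamma_e * Sigma_e * Gamma_e     # Delta = 1, nu = 0
    cov = np.array([[Sigma_e, Gamma_vt], [Gamma_vt, Sigma_v]])
    tv = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    e = tv[tv[:, 1] >= 0.0, 0]                      # keep t where v >= 0
    return -float(np.median(e))

rng = np.random.default_rng(5)
shift = centered_error_shift(1.0, 2.0, 200000, rng)
```

With $\Gamma_e > 0$ the error is skewed to the right, so $\mathrm{Med}(e) > 0$ and the estimated shift $\mu_{e_c}$ is negative.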


Chapter 3

Simulation

Simulation-based inference for the CS-Gaussian distribution will often be required. In order to estimate the median and the quantiles, as seen in Section 2.3, it is useful to be able to generate random values $x$ from the multivariate CS-Gaussian distribution. Even to make density plots, as shown in Figures 2.4 and 2.5, it is useful to be able to simulate from the CS-Gaussian distribution, since it is computationally hard to compute the density directly. In this chapter three different algorithms for generating realizations from the multivariate CS-Gaussian distribution are presented, and an empirical study comparing them is given at the end of the chapter.

3.1 Algorithm A: Brute Force Simulation

The simplest way to simulate $x$ from a CS-Gaussian distribution is to generate $(t, v)$ from the multivariate Gaussian distribution given in Expression (2.1), with rejection sampling until $v \ge 0$.

Algorithm A: Brute Force Simulation

• Repeat until $v \ge 0$:
  generate
  \[
  \begin{bmatrix}t \\ v\end{bmatrix} \sim N_{n+q}\left(\begin{bmatrix}\mu_t \\ \mu_v\end{bmatrix}, \begin{bmatrix}\Sigma_t & \Gamma_{tv} \\ \Gamma_{vt} & \Sigma_v\end{bmatrix}\right)
  \]

• $x \leftarrow t$

Then $x$ will be a realization of the CS-Gaussian distribution.

This algorithm is not efficient when $\mu_v \ll 0$ or when the elements of $v$ are highly negatively correlated, since it is then very unlikely to generate $v_i \ge 0$, $i = 1, \ldots, q$, at the same time. When $v$ is high-dimensional it will be almost impossible to generate all $v_i \ge 0$, $i = 1, \ldots, q$, simultaneously.
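Algorithm A is a few lines of code. A minimal NumPy sketch (function name illustrative), run on the 1-D base case of Section 2.4, for which $\Gamma_{vt} = \Gamma\Sigma = 9$ and $\Sigma_v = \Delta + \Gamma\Sigma\Gamma' = 9.05$:

```python
import numpy as np

def brute_force_csn(mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt, rng, max_tries=100000):
    """Algorithm A: draw (t, v) jointly Gaussian, reject until v >= 0, return t."""
    n = len(mu_t)
    mean = np.concatenate([mu_t, mu_v])
    cov = np.block([[Sigma_t, Gamma_vt.T], [Gamma_vt, Sigma_v]])
    for _ in range(max_tries):
        tv = rng.multivariate_normal(mean, cov)
        if np.all(tv[n:] >= 0.0):
            return tv[:n]                           # x <- t
    raise RuntimeError("acceptance rate too low; use Algorithm B or C")

rng = np.random.default_rng(1)
mu_t = np.array([5.0]); mu_v = np.array([0.0])      # nu = 0
Sigma_t = np.array([[9.0]]); Gamma_vt = np.array([[9.0]])
Sigma_v = np.array([[9.05]])                        # Delta + Gamma*Sigma*Gamma'
xs = np.array([brute_force_csn(mu_t, mu_v, Sigma_t, Sigma_v, Gamma_vt, rng)[0]
               for _ in range(4000)])
```

In this 1-D example roughly every second proposal is accepted; the sample mean should land near the closed-form value of about 7.39 from Section 2.3.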


3.2 Algorithm B: Two Steps Simulation

A more efficient algorithm is to first generate $v$ until $v \ge 0$, and then generate $t$ conditional on $v$. Consider the joint distribution $p(t, v)$; then

\[
p(v \mid v \ge 0) = \frac{p(v, v \ge 0)}{p(v \ge 0)} = c \cdot p(v) \cdot I(v \ge 0),
\]

where $c = [p(v \ge 0)]^{-1}$ is a constant and $I(\cdot)$ is an indicator function taking the value 1 if the argument is true and 0 otherwise. Further, we have that

\[
p(t, v \mid v \ge 0) = \frac{p(t, v, v \ge 0)}{p(v \ge 0)} = c \cdot p(t, v) \cdot I(v \ge 0).
\]

The distribution of $x$ can then be written as

\[
x = [t \mid v \ge 0] \sim p(t \mid v \ge 0) = \int_{-\infty}^{\infty} p(t, v \mid v \ge 0)\, dv
= \int_{-\infty}^{\infty} c \cdot p(t, v) \cdot I(v \ge 0)\, dv
= \int_{0}^{\infty} c \cdot p(t, v)\, dv.
\]

We want to generate $x$ from this distribution, which is the same as first generating

\[
v \sim p(v \mid v \ge 0),
\]

and then generating $x$ as

\[
x = [t \mid v] \sim p(t \mid v).
\]

This is valid since $x$ is then generated from

\[
x \sim \int_{-\infty}^{\infty} p(t \mid v) \cdot p(v \mid v \ge 0)\, dv
= \int_{-\infty}^{\infty} p(t \mid v) \cdot c \cdot p(v) \cdot I(v \ge 0)\, dv
= \int_{0}^{\infty} c \cdot p(t, v)\, dv
= p(t \mid v \ge 0).
\]

Since $p(t, v)$ is Gaussian, $p(t \mid v)$ is also Gaussian, with mean and variance given as

\[
\begin{aligned}
E(t \mid v) &= \mu + \Gamma_{tv}\Sigma_v^{-1}(v + \nu), \\
\mathrm{Var}(t \mid v) &= \Sigma_t - \Gamma_{tv}\Sigma_v^{-1}\Gamma_{vt}.
\end{aligned}
\]


We can then first generate $v$ until all elements $v_i$, $i = 1, \ldots, q$, are greater than zero, and then generate $t$ from the corresponding Gaussian distribution. Hence we reduce the rejection problem to a lower dimension. The algorithm is:

Algorithm B: Two Steps Simulation

• Repeat until $v \ge 0$:
  generate $v \sim N_q(-\nu, \Delta + \Gamma\Sigma\Gamma')$

• $x \leftarrow [t \mid v] \sim N_n\left(\mu + \Gamma_{tv}\Sigma_v^{-1}(v + \nu),\; \Sigma_t - \Gamma_{tv}\Sigma_v^{-1}\Gamma_{vt}\right)$

Then $x$ will be a realization of the CS-Gaussian distribution.

This is a more efficient algorithm than Algorithm A, since we generate $q$ values instead of $n + q$ values in each step. But also this algorithm requires that $v_i \ge 0$ for all $i$ occurs simultaneously.
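Algorithm B in code: the rejection now happens in the $q$-dimensional $v$-space only. A minimal NumPy sketch (function name illustrative), again on the 1-D base case of Section 2.4:

```python
import numpy as np

def two_step_csn(mu, Sigma, Gamma, nu, Delta, rng, max_tries=100000):
    """Algorithm B: draw v ~ N_q(-nu, Delta + G S G') until v >= 0, then
    draw x from the Gaussian p(t | v)."""
    Gamma_vt = Gamma @ Sigma                        # Gamma_tv = Gamma_vt'
    Sigma_v = Delta + Gamma @ Sigma @ Gamma.T
    Sv_inv = np.linalg.inv(Sigma_v)
    for _ in range(max_tries):
        v = rng.multivariate_normal(-nu, Sigma_v)
        if np.all(v >= 0.0):
            break
    else:
        raise RuntimeError("acceptance rate too low; use Algorithm C")
    cond_mean = mu + Gamma_vt.T @ Sv_inv @ (v + nu)
    cond_cov = Sigma - Gamma_vt.T @ Sv_inv @ Gamma_vt
    return rng.multivariate_normal(cond_mean, cond_cov)

rng = np.random.default_rng(2)
xs = np.array([two_step_csn(np.array([5.0]), np.array([[9.0]]), np.array([[1.0]]),
                            np.array([0.0]), np.array([[0.05]]), rng)[0]
               for _ in range(4000)])
```

The output distribution is the same as for Algorithm A, so the sample mean should again land near 7.39, but each rejection step only draws a $q$-vector.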

3.3 Algorithm C: Iterative Simulation

An even more efficient method can be constructed by sequential rejection sampling combined with an MCMC approach. We use the same principle as in the previous algorithm, first generating $v \ge 0$ and then generating $t$. But instead of generating all $v_i \ge 0$, $i = 1, \ldots, q$, simultaneously, each $v_i$ is generated sequentially by an iterative method.

To generate $v_i \ge 0$, a rejection sampling approach, as described in Robert (1995), is used. First we present the general rejection algorithm for generating realizations from a density $h(w)$. Choose a distribution $g(w)$ that is similar to $h(w)$ but easier to simulate from, and choose a constant $c$ such that $h(w) \le c \cdot g(w)$ for every $w$ in the support of $h(w)$. Then generate $w$ from $g(w)$ and $u$ from the uniform distribution between 0 and 1, $U_1(0, 1)$. The $w$ is accepted as a realization from $h(w)$ when $w$ and $u$ satisfy $u \le h(w)/(c \cdot g(w))$.

Let $h(v; \mu_+, \sigma_+, v_0)$ be the truncated univariate Gaussian density with truncation point $v_0$:

\[
h(v; \mu_+, \sigma_+, v_0) = N_1(\mu_+, \sigma_+, v_0) =
\begin{cases}
\dfrac{\exp\left(-(v - \mu_+)^2/2\sigma_+\right)}{\sqrt{2\pi\sigma_+}\left(1 - \Phi\left((v_0 - \mu_+)/\sigma_+\right)\right)}, & v \ge v_0 \\[1ex]
0, & v < v_0.
\end{cases}
\]

Letting v₀ → −∞ gives the Gaussian distribution with mean µ₊ and variance σ₊. In the rest of the algorithm description we assume, without loss of generality, that µ₊ = 0 and σ₊ = 1, since v₀ can be updated as v₀ ← (v₀ − µ₊)/σ₊ before the algorithm is performed with µ₊ = 0 and σ₊ = 1; afterwards v is updated as v ← µ₊ + v·σ₊.

A possible choice for the proposal function g(·) is a translated exponential distribution with density

g(v; α, v₀) = { α·exp(−α(v − v₀)),   v ≥ v₀
              { 0,                    v < v₀.

Translated exponentially distributed realizations can be generated by the inversion method as v = −(1/α)·ln(u) + v₀, where u is generated from the uniform distribution U₁(0, 1) and α is a parameter that must be set. For v ≥ v₀ we have that

α > v₀:  exp(α(v − v₀))·exp(−v²/2) ≤ exp(α²/2 − v₀α) = c
α ≤ v₀:  exp(α(v − v₀))·exp(−v²/2) ≤ exp(−v₀²/2) = c,

then

h(v)/(c·g(v)) = { exp(−v²/2 + α(v − v₀) − α²/2 + αv₀) = exp(−(v − α)²/2),              if α > v₀
                { exp(−v²/2 + α(v − v₀) + v₀²/2) = exp(−(v − α)²/2 + (v₀ − α)²/2),     otherwise.

Then from Robert (1995) it is given that

α = (v₀ + √(v₀² + 4)) / 2

is the optimal choice, which gives the highest acceptance probability and hence the most efficient algorithm.
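The rejection step can be sketched in code as follows, under the standardization µ₊ = 0 and σ₊ = 1. This is a minimal sketch with our own function name; by default it uses the optimal rate above.

```python
import math
import random

def rtruncnorm(v0, alpha=None, rng=random):
    """Standard Gaussian truncated to [v0, inf) by rejection sampling with a
    translated exponential proposal (Robert, 1995). If alpha is None, the
    optimal rate alpha = (v0 + sqrt(v0^2 + 4)) / 2 is used."""
    if alpha is None:
        alpha = (v0 + math.sqrt(v0 * v0 + 4.0)) / 2.0
    while True:
        v = -math.log(rng.random()) / alpha + v0        # inversion method
        if alpha > v0:                                   # acceptance ratio h/(c*g)
            ratio = math.exp(-(v - alpha) ** 2 / 2.0)
        else:
            ratio = math.exp(-(v - alpha) ** 2 / 2.0 + (v0 - alpha) ** 2 / 2.0)
        if rng.random() <= ratio:
            return v
```

For a general mean and variance, the returned value is transformed back as v ← µ₊ + v·σ₊, as described above.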

Then we have an algorithm to generate a value from a truncated univariate Gaussian distribution, but we want to simulate from a truncated multivariate Gaussian distribution. This can be done by an MCMC approach with a Gibbs step, where v^(j) in state j is updated as

v₁^(j) ∼ N₁(E(v₁|v₂^(j−1), v₃^(j−1), ..., v_q^(j−1)), Var(v₁|v₂^(j−1), v₃^(j−1), ..., v_q^(j−1)), v_{0,1})
v₂^(j) ∼ N₁(E(v₂|v₁^(j), v₃^(j−1), ..., v_q^(j−1)), Var(v₂|v₁^(j), v₃^(j−1), ..., v_q^(j−1)), v_{0,2})
⋮
v_q^(j) ∼ N₁(E(v_q|v₁^(j), v₂^(j), ..., v_{q−1}^(j)), Var(v_q|v₁^(j), v₂^(j), ..., v_{q−1}^(j)), v_{0,q}),

where v_{0,i} is the truncation point for element v_i. Let v_{¬i} = [v₁, ..., v_{i−1}, v_{i+1}, ..., v_q]′. Then

E(v_i|v_{¬i}) = −ν_i + Σ_{v,i¬i} Σ_{v,¬i¬i}⁻¹ (v_{¬i} + ν_{¬i})
Var(v_i|v_{¬i}) = σ_{v,ii} − Σ_{v,i¬i} Σ_{v,¬i¬i}⁻¹ Σ_{v,i¬i}′,

where Σ_{v,¬i¬i} is the (q − 1) × (q − 1) matrix derived from Σ_v, with elements σ_{v,ij}, by eliminating row i and column i, and Σ_{v,i¬i} is the (q − 1) vector derived from the ith column of Σ_v by removing the ith element. It is also important to note that Σ_{v,¬i¬i}⁻¹ can be calculated without performing a full inversion: let V = Σ_v⁻¹, then Σ_{v,¬i¬i}⁻¹ = V_{¬i¬i} − V_{¬i,i}·V_{i,¬i}/v_{ii}, where v_{ii} is element (i, i) in V. So the inversion of Σ_v in full dimension is the only inversion that needs to be done. An initial state has to be chosen for v^(0), and a natural choice is v^(0) = max{µ_v, v₀} = max{−ν, v₀}, where max{·, ·} acts component-wise on the vector arguments.

Then we have everything needed to define an efficient algorithm to generate

realizations from the CS-Gaussian distribution.


Algorithm C: Iterative Simulation

• V ← Σ_v⁻¹
• v ← max{−ν, v₀}
• Iterate until convergence:
    for i = 1, ..., q:
      ⋆ Calculate Σ_{v,¬i¬i}⁻¹ ← V_{¬i¬i} − V_{¬i,i}·V_{i,¬i}/v_{ii}
      ⋆ Calculate the mean of v_i|v_{¬i}: µ₊ ← −ν_i + Σ_{v,i¬i} Σ_{v,¬i¬i}⁻¹ (v_{¬i} + ν_{¬i})
      ⋆ Calculate the variance of v_i|v_{¬i}: σ₊ ← σ_{v,ii} − Σ_{v,i¬i} Σ_{v,¬i¬i}⁻¹ Σ_{v,i¬i}′
      ⋆ Standardize the truncation point: v₀ ← (v_{0,i} − µ₊)/σ₊
      ⋆ Calculate the parameter of the proposal distribution: α ← (v₀ + √(v₀² + 4))/2
      ⋆ Repeat until u ≤ exp(−(v − α)²/2) if α > v₀, and u ≤ exp(−(v − α)²/2 + (v₀ − α)²/2) otherwise:
          Generate v ∼ Exp(α) + v₀
          Generate u ∼ U₁(0, 1)
      ⋆ Set v_i ← µ₊ + v·σ₊
• x ← [t|v] ∼ N_n(µ + Γ_tv Σ_v⁻¹(v + ν), Σ_t − Γ_tv Σ_v⁻¹ Γ_tv′)

Then x will be a realization of the CS-Gaussian distribution.
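A compact numerical sketch of the iterative algorithm is given below. This is our own illustration, not the thesis code: it assumes truncation points v₀ = 0 and applies the conditional scale as a standard deviation.

```python
import numpy as np

def _rtruncnorm_std(v0, rng):
    """Standard Gaussian truncated to [v0, inf); Robert (1995) rejection."""
    alpha = (v0 + np.sqrt(v0 ** 2 + 4.0)) / 2.0
    while True:
        v = -np.log(rng.random()) / alpha + v0
        log_ratio = -(v - alpha) ** 2 / 2.0
        if alpha <= v0:
            log_ratio += (v0 - alpha) ** 2 / 2.0
        if np.log(rng.random()) <= log_ratio:
            return v

def rcsn_iterative(mu, Sigma, Gamma, nu, Delta, n_iter=100, rng=None):
    """Iterative (Gibbs) sketch: update each v_i from its truncated univariate
    conditional, using the partitioned inverse of Sigma_v, then draw t | v."""
    rng = np.random.default_rng(rng)
    q = len(nu)
    Gamma_tv = Sigma @ Gamma.T
    Sigma_v = Delta + Gamma @ Sigma @ Gamma.T
    V = np.linalg.inv(Sigma_v)                  # the only full inversion needed
    v = np.maximum(-nu, 0.0)                    # initial state max{-nu, v0}
    for _ in range(n_iter):
        for i in range(q):
            ni = np.r_[0:i, i + 1:q]
            # inverse of Sigma_v with row/column i removed, directly from V
            Sinv = V[np.ix_(ni, ni)] - np.outer(V[ni, i], V[i, ni]) / V[i, i]
            m = -nu[i] + Sigma_v[i, ni] @ Sinv @ (v[ni] + nu[ni])
            s = np.sqrt(Sigma_v[i, i] - Sigma_v[i, ni] @ Sinv @ Sigma_v[ni, i])
            v[i] = m + s * _rtruncnorm_std((0.0 - m) / s, rng)
    Sv_inv = np.linalg.inv(Sigma_v)
    cond_mean = mu + Gamma_tv @ Sv_inv @ (v + nu)
    cond_cov = Sigma - Gamma_tv @ Sv_inv @ Gamma_tv.T
    return rng.multivariate_normal(cond_mean, cond_cov)
```

The partitioned-inverse update inside the loop avoids inverting a (q − 1) × (q − 1) matrix at every step, as described above.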

Contrary to the other algorithms, which require all v_i ≥ 0 to be generated simultaneously, this algorithm generates the v_i ≥ 0 one by one. But it requires an iterative method for each realization, which can be computationally demanding. It might be possible to generate v simultaneously and without iterations, for example with rejection sampling using a multivariate exponential distribution as proposal distribution, thus avoiding an MCMC method.

The number of iterations necessary for convergence will depend on the problem, but in general it seems that the realizations have a relatively low correlation. Only v^(j) depends on the previous state; the variable of interest, x = [t|v ≥ 0], is not directly dependent on the previous state, as displayed in Figure 3.1. The truncated multivariate Gaussian distribution is unimodal, hence we avoid problems with convergence and multiple maxima in the MCMC method. This is further discussed in the next section and in Section 7.2.

Figure 3.1: Dependency graph: v^(0) → v^(1) → v^(2) → ··· → v^(N), with each t^(j) generated from v^(j).

3.4 Empirical Study

To compare the algorithms an inverse problem, as discussed in Section 2.5, is constructed with x ∈ R³. A stationary model CSN₃,₃(·, ·, ·, ·, ·) is assumed. The parameters are chosen as µ_x = 2, σ_x = 0.3, ν_x = 1, δ_x = 1, Σ_e = 10⁻²(HH′ + I), Γ_e = 0·I and five different values of γ_x ∈ {−1.5, −1, 0, 1, 1.5}. An exponential dependency function is used, c(h) = exp{−h/5}. The H matrix in the likelihood model is chosen to be

    ⎡ 0.75  0.25  0    ⎤
H = ⎢ 0.20  0.60  0.20 ⎥ .
    ⎣ 0     0.25  0.75 ⎦

First a reference value x is generated from the prior, then d ∈ R³ is generated as [d|x] = Hx + e. The posterior model can be found as described in Section 2.5. 50000 realizations are simulated from the posterior model with algorithms A, B and C, and the average time of repeating this for 50 different reference data sets is used to compare the algorithms.

To study the convergence of the Iterative Simulation algorithm, a traceplot of the L₂-norm of each step is presented in Figure 3.2. Three different initial states are used: v^(0) = −ν, v^(0) = [−2, −2, −2, −2, −2, −2]′ and v^(0) = [4, 4, 4, 4, 4, 4]′. As seen in Figure 3.2 the algorithm appears to converge almost immediately. In Figure 3.3 the first two components of v (v₁, v₂) are displayed, which also confirms that the algorithm converges fast for this case. For the Iterative Simulation algorithm a burn-in period of 50 iterations is used to be sure that the algorithm has converged.

Figure 3.2: Traceplot for the length (L₂-norm) of v against iteration number (0 to 60).

Figure 3.3: Plot of v₁ and v₂ for the three different initial states.


γ_x    Brute Force Simulation   Two Steps Simulation   Iterative Simulation
−1.5   42s                      30s                    6.4s
−1     259s                     191s                   6.5s
0      679s                     494s                   6.5s
1      274s                     204s                   6.5s
1.5    42s                      31s                    6.5s

Table 3.1: Runtime in seconds (s) for the algorithms.

As seen in Table 3.1 the Iterative Simulation algorithm is the most efficient one. As expected, the Brute Force Simulation algorithm has a somewhat longer runtime than the Two Steps Simulation algorithm. The runtime for the Iterative Simulation algorithm appears almost constant for these values of γ_x. For a higher dimensional problem the difference in runtime will be even greater.

The runtime of algorithms A and B is particularly influenced by

          ⎡ ν_x·1 − γ_xσ_x·CH′(HΣ_xH′ + Σ_e)⁻¹(d − Hµ_x) ⎤
ν_{x|d} = ⎢                                              ⎥ ,
          ⎣ 0                                             ⎦

for γ_e = 0, since −ν_{x|d} ≪ 0 shifts the density such that the probability of generating v ≥ 0 is low. The sign of the elements in ν_{x|d} is then given by the sign of γ_x and by the values of ν_x and e = d − Hµ_x. Some elements e_i will be negative and some will be positive, hence ν_{x|d} will have both positive and negative values. For γ_x ≪ 0 the density shifts such that most elements in e are likely to be positive, and for γ_x ≫ 0 most elements in e will be less than zero. So a low absolute value of γ_x means that some v_i are very unlikely to be generated greater than zero. The Brute Force Simulation algorithm and the Two Steps Simulation algorithm are then expected to propose many times before a realization is accepted. The Iterative Simulation algorithm is not affected by this in the same way, since the rejection sampling uses a translated distribution for the proposal, so the proposal distribution shifts as ν_{x|d} changes. The algorithms give the same predicted mean and median, but the realizations from algorithm C are not independent, as the realizations from algorithms A and B are. The estimated cdfs and pdfs from these three algorithms are almost identical.

Since the Iterative Simulation algorithm is the fastest, it is studied in more detail to see how different parameters influence the runtime. It is shown in Figure 3.4 that different values of σ_e and σ_x do not affect the runtime much: for σ_e and σ_x between 10⁻⁵ and 5 the maximum change in runtime is only 1.4 seconds.

Figure 3.4: Runtime in seconds for algorithm C for different values of σ_x and σ_e (varied over 5·10⁻⁵ to 5).

The effect of different values of γ_x and γ_e is displayed in Figure 3.5. The runtime is highest when both |γ_x| and |γ_e| are large. The covariance of v|d is

Σ_{v|d} = Δ + ΓΣΓ′
        ⎡ δ_x·I + γ_x²Σ_x − γ_x²Σ_xH′AHΣ_x    −γ_xγ_e·Σ_xH′AΣ_e            ⎤
      = ⎢                                                                  ⎥ ,
        ⎣ −γ_xγ_e·(Σ_xH′AΣ_e)′                δ_e·I + γ_e²Σ_e − γ_e²Σ_eAΣ_e ⎦

where A = [HΣ_xH′ + Σ_e]⁻¹. So when γ_x and γ_e have high absolute values, the correlation between v_x and v_e is high. This causes a higher runtime, but even with very high values of γ_x and γ_e the runtime is only 1.5 seconds higher than the lowest runtime.

The runtime for the Iterative Simulation algorithm will increase as the dimen-

sion grows. Let T = nx then q = qx + qd = 2T , so the total dimension is 3T . Let

Figure 3.5: Runtime in seconds for algorithm C for different values of γ_x and γ_e (both between −100 and 100).

all the parameters be as before with γ_x = 3, and the H matrix extended as

    ⎡ 0.75  0.25  0     0     ···   0    ⎤
    ⎢ 0.20  0.60  0.20  0     ···   0    ⎥
H = ⎢ 0     0.20  0.60  0.20  ···   0    ⎥ ∈ R^{T×T}.
    ⎢ ⋮           ⋱     ⋱     ⋱     ⋮    ⎥
    ⎣ 0     ···         0     0.25  0.75 ⎦

Then we have a posterior model that can be scaled up to any dimension T. To check how the runtime of the algorithm changes for different values of T, the average time to generate 1000 realizations with 20 different data sets for T ∈ {20, 40, 60, ..., 420} is used. It is observed that the runtime increases a bit faster than linearly as a function of T, as displayed in Figure 3.6. This can be a problem for an extremely large trace or in a 3D setting. The runtime of inverting Σ_v is asymptotically T log T, and will be a problem for large T. So an approximation should be made such that inversion of the full Σ_v is avoided.

Figure 3.6: Runtime as a function of the dimension, T, of the problem.

Chapter 4

Stationary Prior Models

A stationary prior will not change when shifted in space, so all the marginal, bivariate and higher dimensional pdfs must be shift invariant. As shown in Expression (2.4) there exists a one-to-one transformation between the CS-Gaussian distribution for x and the Gaussian distribution for (t, v). Hence it is possible to specify a prior model in the x-space and an alternative model in the tv-space. We will here present two different priors, and it is possible to extend them to higher dimensional reference spaces. For both priors we use a shift invariant dependence matrix C ∈ R^{n_x×n_x} given as

    ⎡ 1           c(1)   ···   c(n_x − 1) ⎤
    ⎢ c(1)        1      ⋱     ⋮          ⎥
C = ⎢ ⋮           ⋱      ⋱     c(1)       ⎥ ,
    ⎣ c(n_x − 1)  ···    c(1)  1          ⎦

where c(i, j) = c(|i − j|) is a spatially positive definite dependence measure with c(0) = 1. Here we assume that c(·) is parameterized with given parameters, but in principle it should be possible to also estimate the parameters of the dependence measure.

4.1 x-space Prior

In this prior x ∈ R^{n_x} is assumed to be CS-Gaussian, as

x ∼ CSN_{n_x,q_x}(µ_x, Σ_x, Γ_x, ν_x, Δ_x).

Let 1 be a vector containing only ones; then the parameters can be specified in the x-space (left), with the corresponding tv-space parameters (right), as:

µ_x = µ_x·1 ∈ R^{n_x}        ⇔   µ_t = µ_x·1 ∈ R^{n_x}
Σ_x = σ_x·C ∈ R^{n_x×n_x}    ⇔   Σ_t = Σ_x = σ_x·C ∈ R^{n_x×n_x}
Γ_x = γ_x·I ∈ R^{n_x×n_x}    ⇔   Γ_vt = Γ_xΣ_x = γ_xσ_x·C ∈ R^{q_x×n_x}
ν_x = ν_x·1 ∈ R^{q_x}        ⇔   µ_v = −ν_x ∈ R^{q_x}
Δ_x = δ_x·I ∈ R^{q_x×q_x}    ⇔   Σ_v = Δ_x + Γ_xΣ_xΓ_x′ = δ_x·I + γ_x²σ_x·C ∈ R^{q_x×q_x}.

Let n_x = q_x, so the free parameters are the scalars µ_x, σ_x, γ_x, ν_x and δ_x. The parameters of the marginals can be calculated as described in Section 2.2:

x_i ∼ CSN_{1,q_x}(µ_{x_i}, Σ_{x_i}, Γ_{x_i}, ν_{x_i}, Δ_{x_i})

µ_{x_i} = b(i)′µ_x = µ_x,
Σ_{x_i} = b(i)′Σ_x b(i) = σ_x,
Γ_{x_i} = Γ_xΣ_x b(i)·(1/σ_x) = γ_xσ_x·C b(i)·(1/σ_x) = γ_x·C(i),
ν_{x_i} = ν_x,
Δ_{x_i} = Δ_x + Γ_xΣ_xΓ_x′ − Γ_xΣ_x b(i)·(1/σ_x)·b(i)′Σ_xΓ_x′
        = δ_x·I + γ_x²σ_x·C − γ_x²σ_x·C(i)C(i)′,

where C(i) is column i of the dependence matrix C. By transforming the distribution to the corresponding Gaussian distribution of (t, v) we obtain the parameter values

Γ_{v_it_i} = Γ_{x_i}Σ_{x_i} = γ_xσ_x·C(i)
Σ_{v_i} = Δ_{x_i} + Γ_{x_i}Σ_{x_i}Γ_{x_i}′ = δ_x·I + γ_x²σ_x·C.

Since C(i) is almost identical for every i, except for some deviations caused by the boundary, the dominating elements will be identical for the marginals of a high-dimensional problem. The marginals will be almost identical, except for relatively few variables close to the border. The marginal of x_i is then

p(x_i) = p(t_i|v ≥ 0).

So t_i depends on all v_j; j = 1, ..., q_x, and not only on v_i.

It is possible to constrain the prior by setting ν_x = 0 and δ_x = 1. This gives a simpler prior, where the skewness can be controlled by γ_x. Hence the free parameters are µ_x, σ_x and γ_x. For the model to be valid it is necessary to require that σ_x > 0. Note that the model is valid for all values of γ_x and µ_x.
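To make the specification concrete, the x-space prior with an exponential dependence function can be assembled as in the following sketch; the function names are our own.

```python
import numpy as np

def exp_corr_matrix(n, r):
    """Exponential dependence matrix with c(h) = exp(-h/r), so c(0) = 1."""
    idx = np.arange(n)
    return np.exp(-np.abs(idx[:, None] - idx[None, :]) / r)

def xspace_prior(n, mu_x, sigma_x, gamma_x, nu_x, delta_x, r):
    """Stationary x-space prior parameters of CSN_{n,n}, following the
    parameterization in the text (a sketch, not the thesis code)."""
    C = exp_corr_matrix(n, r)
    mu = mu_x * np.ones(n)
    Sigma = sigma_x * C
    Gamma = gamma_x * np.eye(n)
    nu = nu_x * np.ones(n)
    Delta = delta_x * np.eye(n)
    return mu, Sigma, Gamma, nu, Delta
```

The implied covariance of v is then Σ_v = Δ + ΓΣΓ′, which for this prior equals δ_x·I + γ_x²σ_x·C.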

4.2 tv-space Prior

An alternative prior can be specified in the tv-space, where (t, v) ∈ R^{n_x+q_x} is assumed to be Gaussian, as

⎡ t ⎤                ⎛ ⎡ µ_t ⎤   ⎡ Σ_t   Γ_tv ⎤ ⎞
⎢   ⎥ ∼ N_{n_x+q_x} ⎜ ⎢     ⎥ , ⎢            ⎥ ⎟ ,                (4.1)
⎣ v ⎦                ⎝ ⎣ µ_v ⎦   ⎣ Γ_vt  Σ_v  ⎦ ⎠

then the parameters can be specified in the tv-space (left), with the corresponding x-space parameters (right), as:

µ_t = µ_t·1 ∈ R^{n_x}        ⇔   µ_x = µ_t·1 ∈ R^{n_x}
Σ_t = σ_t·C ∈ R^{n_x×n_x}    ⇔   Σ_x = Σ_t = σ_t·C ∈ R^{n_x×n_x}
Γ_vt = γ_tv·I ∈ R^{q_x×n_x}  ⇔   Γ_x = (γ_tv/σ_t)·C⁻¹ ∈ R^{q_x×n_x}
µ_v = µ_v·1 ∈ R^{q_x}        ⇔   ν_x = −µ_v·1 ∈ R^{q_x}
Σ_v = σ_v·I ∈ R^{q_x×q_x}    ⇔   Δ_x = σ_v·I − (γ_tv²/σ_t)·C⁻¹ ∈ R^{q_x×q_x}

Let n_x = q_x, so the free parameters are the scalars µ_t, σ_t, γ_tv, µ_v and σ_v. The parameters of the marginals can be calculated as described in Section 2.2:

x_i ∼ CSN_{1,q_x}(µ_{x_i}, Σ_{x_i}, Γ_{x_i}, ν_{x_i}, Δ_{x_i})

µ_{x_i} = b(i)′µ_x = µ_t,
Σ_{x_i} = b(i)′Σ_x b(i) = σ_t,
Γ_{x_i} = Γ_xΣ_x b(i)·(1/σ_t) = (γ_tv/σ_t)·C⁻¹·σ_tC·b(i)·(1/σ_t) = (γ_tv/σ_t)·b(i),
ν_{x_i} = −µ_v·1,
Δ_{x_i} = Δ_x + Γ_xΣ_xΓ_x′ − Γ_xΣ_x b(i)·(1/σ_t)·b(i)′Σ_xΓ_x′ = σ_v·I − (γ_tv²/σ_t)·b(i)b(i)′.

By transforming the distribution to the corresponding Gaussian distribution of (t, v) we obtain

Γ_{v_it_i} = Γ_{x_i}Σ_{x_i} = γ_tv·b(i)
Σ_{v_i} = Δ_{x_i} + Γ_{x_i}Σ_{x_i}Γ_{x_i}′ = σ_v·I.

So

x_i ∼ CSN_{1,1}(µ_t, σ_t, γ_tv/σ_t, −µ_v, σ_v − γ_tv²/σ_t).

The marginal of x_i is then

p(x_i) = p(t_i|v ≥ 0) = p(t_i|v_i ≥ 0),

so each t_i only depends on v_i. Hence this prior model is also stationary, since p(x_i) = p(x_j) for all i and j. It is possible to constrain the prior by setting µ_v = 0 and σ_v = 1. This gives a simpler prior, where the skewness can be controlled by γ_tv; hence the free parameters are µ_t, σ_t and γ_tv. For the model to be valid it is necessary to require that

⎡ σ_t·C    γ_tv·I ⎤
⎢                 ⎥
⎣ γ_tv·I   I      ⎦

is positive definite. This requirement can often be difficult to meet.
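The requirement can be checked numerically by attempting a Cholesky factorization of the joint covariance; a small sketch with our own function name:

```python
import numpy as np

def tv_prior_valid(sigma_t, gamma_tv, C, sigma_v=1.0):
    """Check that the joint (t, v) covariance [[sigma_t*C, gamma_tv*I],
    [gamma_tv*I, sigma_v*I]] is positive definite, via a Cholesky attempt."""
    n = C.shape[0]
    I = np.eye(n)
    J = np.block([[sigma_t * C, gamma_tv * I],
                  [gamma_tv * I, sigma_v * I]])
    try:
        np.linalg.cholesky(J)
        return True
    except np.linalg.LinAlgError:
        return False
```

By the Schur complement, the joint matrix is positive definite exactly when σ_v > 0 and σ_t·C − (γ_tv²/σ_v)·I is positive definite, which shows why strong skewness (large |γ_tv|) combined with strong correlation (small eigenvalues of C) is hard to achieve.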

4.3 Comparing the Priors

For the x-space prior t_i is connected to all of v, but for the tv-space prior t_i is only connected to v_i. The latter is a more logically constructed prior, hence it is also easier to understand the effects of the different parameters. The marginals for the tv-space prior have q = 1, while the marginals for the x-space prior have q = n_x, so the marginals for the tv-space prior are easier to calculate. But the x-space prior has less complicated validity requirements on the parameters than the tv-space prior. For the tv-space prior it is hard to specify valid parameters, since a determinant has to be calculated to check whether a parameter set is valid.


Chapter 5

Parameter Estimation in Prior Models

For an inverse problem in an empirical Bayes spirit, we estimate the parameters in the prior model from observations of a comparable phenomenon. Previous work has considered parameter estimation for the one-dimensional skew-normal distribution, see Pewsey (2000). In that paper "method of moments" and "maximum likelihood" estimators are used. The method of moments does not work well for very skewly distributed data sets, and the maximum likelihood method must be solved by numerical approximation; in Pewsey (2000) the problem of finding the global maximum is handled by a grid of different initial values. Here we will present alternative methods to estimate the parameters of the CS-Gaussian distribution for the two priors presented in the previous chapter. We assume that observations x_o ∈ R^{n_x} of a comparable one-dimensional phenomenon are available, for example observations along a well trace.

5.1 x-space Prior

For the x-space prior we will present two different methods for estimation, and

compare them empirically. The first method, with a full likelihood model, requires

an exponential dependency function.

5.1.1 Full Likelihood Model

For the x-space prior the full likelihood (fL) is defined as:

L(µ_x, σ_x, γ_x, ν_x, δ_x | x_o) = [Φ_{q_x}(0; ν_x·1, δ_x·I + γ_x²σ_x·C)]⁻¹
      · Φ_{q_x}(γ_x(x_o − µ_x·1); ν_x·1, δ_x·I)
      · φ_{n_x}(x_o; µ_x·1, σ_x·C)                                           (5.1)
    = [Φ_{q_x}(0; ν_x·1, δ_x·I + γ_x²σ_x·C)]⁻¹
      · ∏_{i=1}^{n_x} Φ₁(γ_x(x_i − µ_x); ν_x, δ_x)
      · φ_{n_x}(x_o; µ_x·1, σ_x·C).

Here we have used that Φ_{q_x}(γ_x(x_o − µ_x·1); ν_x·1, δ_x·I) can be factorized as ∏_{i=1}^{q_x} Φ₁(γ_x(x_i − µ_x); ν_x, δ_x), since the covariance δ_x·I is diagonal. φ_{n_x}(·; ·, ·) is analytically obtainable and Φ₁(·; ·, ·) is fast to approximate for a one-dimensional Gaussian variable. But the normalization constant Φ_{q_x}(·; ·, ·) can in general be computationally hard to obtain, so we will here look at a special case where an efficient solution can be found. The maximum likelihood (ML) estimate is then given as

(µ̂_x, σ̂_x, γ̂_x, ν̂_x, δ̂_x) = arg max_{µ_x,σ_x,γ_x,ν_x,δ_x} L(µ_x, σ_x, γ_x, ν_x, δ_x | x_o).

Calculation of the normalization constant. The normalization constant is given as

Φ_n(0; 0, δ_x·I + γ_x²σ_x·C) = p(x ≤ 0) = ∫_{−∞}^{0} φ_n(x; 0, δ_x·I + γ_x²σ_x·C) dx,

where x is Gaussian distributed as N_n(0, δ_x·I + γ_x²σ_x·C), so the covariance matrix can be written as Σ = α·I + β·C, where α and β are constants and C is a correlation matrix. This is in general a very complicated problem, but we will study the special case where C is assumed to be constructed from an exponential correlation function, c(|i − j|) = exp{−|i − j|/r}.

This case can be modeled with an x₀ and ϵ, as shown in Figure 5.1, where x₀ ∼ N_n(0, β·C), ϵ ∼ N_n(0, α·I) and (x₀, ϵ) are independent. Then

x = x₀ + ϵ ∼ N_n(0, α·I + β·C).

Figure 5.1: Dependency graph: each y_i depends on x_i and ϵ_i.

Define a likelihood model such that

[y_i|x_i, ϵ_i] = { 1,   x_i + ϵ_i ≤ 0                              (5.2)
                 { 0,   x_i + ϵ_i > 0.

Hence Φ_n(0; 0, α·I + β·C) = p(x ≤ 0) = p(y = 1).

In Rybicki and Press (1995) it is shown that x₀ has the Markov property, such that p(x₀) = p(x_n|x_{n−1}) · ... · p(x₂|x₁) · p(x₁). This requires that C is defined from an exponential correlation function. The pdf p(x_i|x_{i−1}), for i ≥ 2, is the conditional Gaussian distribution φ₁(x_i; exp(−1/r)·x_{i−1}, β − β·exp(−2/r)), and p(x₁) = φ₁(x₁; 0, β).


Then p(y) = p(y₁, y₂, ..., y_n) can be written as:

p(y₁, y₂, ..., y_n) = p(y_n|y₁, ..., y_{n−1}) · ... · p(y₂|y₁) · p(y₁)

= ∫_{x₀} p(y_n|y₁, ..., y_{n−1}, x₀) · ... · p(y₂|y₁, x₀) · p(y₁|x₀) · p(x₀) dx₀

= ∫_{x₀} p(y_n|x_n) · ... · p(y₂|x₂) · p(y₁|x₁) · p(x_n|x_{n−1}) · ... · p(x₂|x₁) · p(x₁) dx₀

= ∫_{x₀} ∫_{ϵ} p(y_n|x_n, ϵ_n) · ... · p(y₂|x₂, ϵ₂) · p(y₁|x₁, ϵ₁) · p(x_n|x_{n−1}) · ... · p(x₂|x₁) · p(x₁)
    · p(ϵ_n) · ... · p(ϵ₂) · p(ϵ₁) dϵ dx₀

= ∫_{x_n} [∫_{ϵ_n} p(y_n|x_n, ϵ_n) · p(ϵ_n) dϵ_n]
  · ∫_{x_{n−1}} [∫_{ϵ_{n−1}} p(y_{n−1}|x_{n−1}, ϵ_{n−1}) · p(x_n|x_{n−1}) · p(ϵ_{n−1}) dϵ_{n−1}] · ...
  · ∫_{x₂} [∫_{ϵ₂} p(y₂|x₂, ϵ₂) · p(x₃|x₂) · p(ϵ₂) dϵ₂]
  · ∫_{x₁} [∫_{ϵ₁} p(y₁|x₁, ϵ₁) · p(x₁) · p(ϵ₁) dϵ₁] dx₁ dx₂ · ... · dx_n.

In our case we want to calculate p(y = 1). Since p(y_i|x_i, ϵ_i) is zero for x_i + ϵ_i > 0, as given in Expression (5.2), we get the following recursive expression:

p(y = 1) = ∫_{x_n} [∫_{ϵ_n<−x_n} p(ϵ_n) dϵ_n] ∫_{x_{n−1}} p(x_n|x_{n−1}) · [∫_{ϵ_{n−1}<−x_{n−1}} p(ϵ_{n−1}) dϵ_{n−1}] · ...
  · ∫_{x₂} p(x₃|x₂) · [∫_{ϵ₂<−x₂} p(ϵ₂) dϵ₂] · ∫_{x₁} p(x₂|x₁) · p(x₁) · [∫_{ϵ₁<−x₁} p(ϵ₁) dϵ₁] dx₁ · ... · dx_n

= ∫_{x_n} Φ₁(−x_n; 0, α) ∫_{x_{n−1}} φ₁(x_n; exp(−1/r)·x_{n−1}, β − β·exp(−2/r)) · Φ₁(−x_{n−1}; 0, α) · ...
  · ∫_{x₂} φ₁(x₃; exp(−1/r)·x₂, β − β·exp(−2/r)) · Φ₁(−x₂; 0, α)
  · ∫_{x₁} φ₁(x₂; exp(−1/r)·x₁, β − β·exp(−2/r)) · φ₁(x₁; 0, β) · Φ₁(−x₁; 0, α) dx₁ · ... · dx_n.

To calculate this a numerical approximation is needed, so let φ̃(·; ·, ·) be a discrete approximation of the normal distribution φ(·; ·, ·). Hence

p(y = 1) ≈ Σ_{x_n} Φ₁(−x_n; 0, α) Σ_{x_{n−1}} φ̃₁(x_n; exp(−1/r)·x_{n−1}, β − β·exp(−2/r)) · Φ₁(−x_{n−1}; 0, α) · ...
  · Σ_{x₂} φ̃₁(x₃; exp(−1/r)·x₂, β − β·exp(−2/r)) · Φ₁(−x₂; 0, α)
  · Σ_{x₁} φ̃₁(x₂; exp(−1/r)·x₁, β − β·exp(−2/r)) · φ̃₁(x₁; 0, β) · Φ₁(−x₁; 0, α).

The sums can be computed recursively and relatively fast, hence it is possible to estimate Φ_n(0; 0, α·I + βC) relatively fast. This also gives a method to calculate other Gaussian quantiles, by changing the likelihood model given in Expression (5.2); hence the method can be useful in many different situations. The method presented here is much faster than simulating from the corresponding distribution and counting the number of points in the desired region.
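The recursion can be sketched as below, using a discretized grid for the latent Markov chain. This is our own illustration of the idea, with grid size and width as tuning choices; the inner noise integrals are computed as P(ϵ_i < −x_i).

```python
import numpy as np
from scipy.stats import norm

def orthant_prob_exp(n, alpha, beta, r, ngrid=201, width=6.0):
    """Approximate Phi_n(0; 0, alpha*I + beta*C) for exponential correlation
    C(i, j) = exp(-|i - j|/r), using the AR(1) (Markov) factorization of
    x0 ~ N_n(0, beta*C) and a forward recursion on a discretized grid."""
    rho = np.exp(-1.0 / r)                      # lag-1 correlation of x0
    sd0 = np.sqrt(beta)
    x = np.linspace(-width * sd0, width * sd0, ngrid)
    dx = x[1] - x[0]
    cond_sd = np.sqrt(beta * (1.0 - rho ** 2))  # sd of x_i | x_{i-1}
    lik = norm.cdf(-x, scale=np.sqrt(alpha))    # P(eps_i < -x_i)
    f = norm.pdf(x, scale=sd0) * lik            # f_1(x_1) = p(x_1) P(eps_1 < -x_1)
    # transition kernel T[j, i] = phi(x_j; rho * x_i, cond_sd^2)
    T = norm.pdf(x[:, None], loc=rho * x[None, :], scale=cond_sd)
    for _ in range(n - 1):                      # f_{k+1} = lik * int p(x'|x) f_k dx
        f = (T @ f) * dx * lik
    return float(f.sum() * dx)
```

For n = 1 the result is Φ₁(0; 0, α + β) = 0.5 by symmetry, which provides a simple sanity check; the cost grows linearly in n, in contrast to generic multivariate orthant computations.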

5.1.2 Localized Pseudo Likelihood Model

Here we will approximate the likelihood with a pseudo likelihood given as:

pL(µ_x, σ_x, γ_x, ν_x, δ_x | x_o) = [Φ_{q_x}(0; ν_x·1, δ_x·I + γ_x²σ_x·C)]^{−n_x}
    · ∏_{i=1}^{n_x} Φ_{q_x}(γ_x·C(i)·(x_i − µ_x); ν_x·1, δ_x·I + γ_x²σ_x·C − γ_x²σ_x·C(i)C(i)′)
    · ∏_{i=1}^{n_x} φ₁(x_i; µ_x, σ_x).

i=1

The high-dimensional Gaussian quantile, Φ_{q_x}(γ_x·C(i)·(x_i − µ_x); ν_x·1, δ_x·I + γ_x²σ_x·C − γ_x²σ_x·C(i)C(i)′), is hard to calculate. The method for Gaussian quantiles presented in the previous section does not work here: it is not possible to express the x-variable as a Markov chain, even for a correlation matrix C constructed from an exponential correlation function. So an approximated marginal needs to be used. Consider the equivalent parameterization of (t, v) for x, as given in Expression (2.4). The marginal variable t_i is normally most correlated with v_i, v_{i+1}, v_{i−1}, v_{i−2}, v_{i+2}, ..., in decreasing order, such that an approximation can be defined by setting the correlation between t_i and v_j to zero for |j − i| > ⌊q_x∗/2⌋, for a given q_x∗, where ⌊·⌋ rounds towards zero. Thus the marginal distribution can be approximated as

CSN_{1,q_x}(µ_x, σ_x, Γ_{x_i}, ν_{x_i}, Δ_{x_i}) ≈ CSN_{1,q_x∗}(µ_x, σ_x, Γ∗_{x_i}, ν∗_{x_i}, Δ∗_{x_i}),        (5.3)


where

Γ∗_{x_i} = γ_x · [C(⌊q_x∗/2⌋), ..., C(1), C(0), C(1), ..., C(⌊q_x∗/2⌋)]′ = γ_x · C∗(i) ∈ R^{q_x∗},
ν∗_{x_i} = [ν_x, ..., ν_x]′ = ν_x·1 ∈ R^{q_x∗},
Δ∗_{x_i} = δ_x·I + γ_x²σ_x·C∗ − γ_x²σ_x·C∗(i)C∗(i)′ ∈ R^{q_x∗×q_x∗},

with C∗ the q_x∗ × q_x∗ dependence matrix of the local window and C∗(i) its central column.

Hence the marginals can be approximated as

p(x_i) = p(t_i|v ≥ 0) ≈ p(t_i | v_{i−⌊q_x∗/2⌋} ≥ 0, ..., v_{i−1} ≥ 0, v_i ≥ 0, v_{i+1} ≥ 0, ..., v_{i+⌊q_x∗/2⌋} ≥ 0).

By using this approximated marginal distribution, the pseudo likelihood becomes

pL∗(µ_x, σ_x, γ_x, ν_x, δ_x | x_o) = [Φ_{q_x∗}(0; ν_x·1, δ_x·I + γ_x²σ_x·C∗)]^{−n_x}
    · ∏_{i=1}^{n_x} Φ_{q_x∗}(γ_x·C∗(i)·(x_i − µ_x); ν_x·1, δ_x·I + γ_x²σ_x·C∗ − γ_x²σ_x·C∗(i)C∗(i)′)
    · ∏_{i=1}^{n_x} φ₁(x_i; µ_x, σ_x).

Then the maximum pseudo likelihood (MpL) estimate is given as

(µ̂_x, σ̂_x, γ̂_x, ν̂_x, δ̂_x) = arg max_{µ_x,σ_x,γ_x,ν_x,δ_x} pL∗(µ_x, σ_x, γ_x, ν_x, δ_x | x_o).

It is observed that this MpL is not defined for all possible choices of parameters; this problem is avoided by requiring σ_x > 0 and δ_x > 0. Then, from the definition of the CS-Gaussian distribution, we know that δ_x·I + γ_x²σ_x·C − γ_x²σ_x·C(i)C(i)′ must be positive definite if C is positive definite, so a CS-Gaussian distribution is defined for the estimated parameters. A numerical method can be used to solve the optimization, and with the constrained prior, ν_x = 0 and δ_x = 1, it is possible to find the global maximum. The localized pseudo likelihood works for any dependency function c(·), so an exponential dependency function is not required as for the full likelihood.

5.1.3 Empirical Test for fL and Localized pL

We have constructed a test example with µ_x = 0, σ_x = 1 and γ_x = 3 to compare the two presented estimators. For different range parameters, r, we have generated 100 data points, and the estimated parameters are compared in Table 5.1. The tests are repeated 100 times for each parameter set, and used to estimate an empirical 90% confidence interval.

For the case with r = 0.00001 the estimated parameters are identical, which is natural since the data points are almost independent; hence the pseudo likelihood is almost identical to the full likelihood. For a higher range parameter the full likelihood gives better results than the pseudo likelihood. That is also expected, since the independence approximation in the pseudo likelihood is less correct for a higher range parameter. The µ_x parameter seems to be biased for the pseudo likelihood and highly correlated data; for the case with r = 30, the correct value of µ_x is not in the empirical confidence interval. The σ_x parameter is estimated quite reliably for all range values, but the precision is lower for the higher range parameters. The skewness parameter, γ_x, is estimated with a high bias and low precision by the pL estimator; for the pL with range parameter larger than 1, the correct value of γ_x is not in the empirical confidence interval. It is natural that also the full likelihood is less precise for a higher range parameter, since there is less information in a highly correlated data set than in an uncorrelated one. Note that when the distance between two points is 3r, the correlation is still about 5%. So the practical range with r = 30 is usually considered to be 90, hence the data is highly correlated.


Real values: µ_x = 0, σ_x = 1, γ_x = 3.

r = 0.00001   fL    µ̂_x = 0.02 [−0.17, 0.19]    σ̂_x = 0.98 [0.65, 1.33]    γ̂_x = 3.18 [1.96, 4.50]
              pL∗   µ̂_x = 0.02 [−0.17, 0.19]    σ̂_x = 0.98 [0.65, 1.33]    γ̂_x = 3.18 [1.96, 4.50]
r = 1         fL    µ̂_x = 0.17 [−0.10, 0.38]    σ̂_x = 0.85 [0.52, 1.17]    γ̂_x = 2.87 [0.83, 5.04]
              pL∗   µ̂_x = −0.06 [−0.21, 0.08]   σ̂_x = 1.18 [0.89, 1.50]    γ̂_x = 5.52 [3.07, 9.91]
r = 10        fL    µ̂_x = −0.12 [−0.49, 0.21]   σ̂_x = 0.95 [0.77, 1.13]    γ̂_x = 2.78 [0.70, 5.61]
              pL∗   µ̂_x = 0.23 [−0.01, 0.55]    σ̂_x = 1.45 [0.85, 2.18]    γ̂_x = 6.14 [4.13, 8.40]
r = 30        fL    µ̂_x = −0.29 [−1.05, 0.19]   σ̂_x = 0.95 [0.73, 1.15]    γ̂_x = 2.49 [0.63, 4.49]
              pL∗   µ̂_x = 0.43 [0.18, 0.73]     σ̂_x = 0.99 [0.54, 1.57]    γ̂_x = 6.90 [4.26, 11.52]

Table 5.1: Estimated parameters from 100 data points with exponential correlation, C(i, j) = exp{−|i − j|/r}. The estimation is repeated 100 times for each case, and the empirical means of the parameters are presented with empirical 90% confidence intervals.

5.2 tv-space Prior

For this prior we will also define a full likelihood and a pseudo likelihood for

parameter estimation.

5.2.1 Full Likelihood Model

For the tv-space prior the full likelihood is given as:

L(µ_t, σ_t, γ_tv, µ_v, σ_v | x_o) = [Φ_{q_x}(0; −µ_v·1, σ_v·I)]⁻¹
    · Φ_{q_x}((γ_tv/σ_t)·C⁻¹·(x_o − µ_t·1); −µ_v·1, σ_v·I − (γ_tv²/σ_t)·C⁻¹)
    · φ_{n_x}(x_o; µ_t·1, σ_t·C).                                            (5.4)

The problem here is to evaluate Φ_{q_x}((γ_tv/σ_t)·C⁻¹·(x_o − µ_t·1); −µ_v·1, σ_v·I − (γ_tv²/σ_t)·C⁻¹). There is no Markov property with this covariance, hence the method for Gaussian quantiles presented earlier does not work here, and we need to use a pseudo likelihood approach.

5.2.2 Pseudo Likelihood Model

For the prior specified in the tv-space we have to approximate the likelihood with a pseudo likelihood:

pL(µ_t, σ_t, γ_tv, µ_v, σ_v | x_o) = [Φ₁(0; −µ_v, σ_v)]^{−n_x}
    · ∏_{i=1}^{n_x} Φ₁((γ_tv/σ_t)·(x_i − µ_t); −µ_v, σ_v − γ_tv²/σ_t)
    · ∏_{i=1}^{n_x} φ₁(x_i; µ_t, σ_t).

Note that this pseudo likelihood can be calculated without any approximations, since this pL factorizes; for the previous prior an approximation was required to achieve this. A numerical method is used to solve the optimization, but it is often difficult to ensure that the global maximum is identified. Therefore a constrained pseudo likelihood approach with µ_v = 0 and σ_v = 1 can be used, as described in Section 4.2. It is observed that this pL is not defined for all possible choices of parameters; this problem is avoided by requiring σ_t > 0 and σ_v − γ_tv²/σ_t > 0. With σ_v = 1 it is sufficient to require σ_t > γ_tv². This ensures that the marginals have valid parameters. But to be sure that the simultaneous distribution has valid parameters, it is necessary to require that

⎡ σ_t·C    γ_tv·I ⎤
⎢                 ⎥                                                          (5.5)
⎣ γ_tv·I   σ_v·I  ⎦

is positive definite.

Then the maximum pseudo likelihood (MpL) estimate is given as

(µ̂_t, σ̂_t, γ̂_tv, µ̂_v, σ̂_v) = arg max_{µ_t,σ_t,γ_tv,µ_v,σ_v} pL(µ_t, σ_t, γ_tv, µ_v, σ_v | x_o).

With µ_t, σ_t and γ_tv as the free parameters, this is almost the same problem as described in Pewsey (2000). The numerical methods to find the maximum of the pL require initial values for the parameters, but instead of having a grid of initial values, we will here present an alternative algorithm to find the global maximum.

It seems that there will often occur two maxima for the log-pseudo likelihood: one gives an almost Gaussian fit (γ_tv ≈ 0) and one has skewness. To understand this we first look at the log-pseudo likelihood (with µ_v = 0):

log(pL(µ_t, σ_t, γ_tv, µ_v, σ_v | x_o)) = −n_x·log(Φ₁(0; 0, σ_v))
    + Σ_{i=1}^{n_x} log(Φ₁((γ_tv/σ_t)·(x_i − µ_t); 0, 1 − γ_tv²/σ_t))
    + Σ_{i=1}^{n_x} log(φ₁(x_i; µ_t, σ_t))
  = n_x·log(2)
    + Σ_{i=1}^{n_x} log(Φ₁((γ_tv/σ_t)·(x_i − µ_t); 0, 1 − γ_tv²/σ_t))
    − (1/2)·Σ_{i=1}^{n_x} ((x_i − µ_t)/√σ_t)² − n_x·log(√(2π·σ_t)),

where the last equality uses σ_v = 1. The middle term, Σ_{i=1}^{n_x} log(Φ₁((γ_tv/σ_t)·(x_i − µ_t); 0, 1 − γ_tv²/σ_t)), is unimodal. The last part, −(1/2)·Σ_{i=1}^{n_x}((x_i − µ_t)/√σ_t)² − n_x·log(√(2π·σ_t)), is recognized from the Gaussian log-likelihood and is also unimodal. So the log-pseudo likelihood can have two modes.

If there exist two modes, the parameters can be estimated by initiating µ_t and σ_t with the standard estimators for the Gaussian distribution. Since γ_tv is limited to values between −√σ_t and √σ_t, we run the numerical method twice, with the Gaussian estimates for (µ_t, σ_t) and the two extreme values of γ_tv as initial values.


Algorithm: Maximum pL(µ_t, σ_t, γ_tv, 0, 1 | x_o)

• Initiate the parameters:
  ⋆ µ_{t,0} = (1/n_x)·Σ_{i=1}^{n_x} x_i
  ⋆ σ_{t,0} = (1/(n_x − 1))·Σ_{i=1}^{n_x} (x_i − µ_{t,0})²
  ⋆ γ_{tv,0} = √σ_{t,0}
• Do the optimization to identify (µ_t, σ_t, γ_tv)₊. Make sure that the parameters meet the positive definite requirement given in Expression (5.5).
• Initiate the parameters µ_{t,0}, σ_{t,0} and γ_{tv,0} = −√σ_{t,0}.
• Do the optimization to identify (µ_t, σ_t, γ_tv)₋. Make sure that the parameters meet the positive definite requirement given in Expression (5.5).
• Choose the parameter set with the highest likelihood, (µ̂_t, σ̂_t, γ̂_tv).
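The two-start strategy can be sketched with a general-purpose optimizer as below. This is our own illustration, under the constraints µ_v = 0 and σ_v = 1; we start γ_tv at ±0.9·√σ_{t,0} rather than exactly at the boundary, to keep the variance term of the Φ₁ factor positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_pl(theta, x):
    """Negative log pseudo likelihood for the constrained tv-space prior
    (mu_v = 0, sigma_v = 1); theta = (mu_t, sigma_t, gamma_tv)."""
    mu_t, sigma_t, gamma_tv = theta
    if sigma_t <= gamma_tv ** 2:               # validity: sigma_t > gamma_tv^2
        return np.inf
    n = len(x)
    z = gamma_tv / sigma_t * (x - mu_t)
    s = np.sqrt(1.0 - gamma_tv ** 2 / sigma_t)
    ll = (n * np.log(2.0)
          + norm.logcdf(z, scale=s).sum()
          + norm.logpdf(x, loc=mu_t, scale=np.sqrt(sigma_t)).sum())
    return -ll

def fit_tv_prior(x):
    """Two starts: Gaussian estimates for (mu_t, sigma_t), and gamma_tv
    started near the two extreme values +/- sqrt(sigma_t)."""
    mu0, s0 = x.mean(), x.var(ddof=1)
    fits = [minimize(neg_log_pl, [mu0, s0, sgn * 0.9 * np.sqrt(s0)],
                     args=(x,), method="Nelder-Mead") for sgn in (1, -1)]
    return min(fits, key=lambda f: f.fun).x
```

Returning the start with the smaller negative log-likelihood implements the final "choose the parameter set with the highest likelihood" step.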

The two maxima are illustrated by an example in Figure 5.2, where the likelihood along a profile between the two maxima is displayed. The two maxima are found by the given algorithm, and the value of the likelihood on a linear transect between these maxima is calculated. Note that the non-Gaussian maximum has the highest likelihood and is also the solution closest to the correct values of the example, which are µt. = 0, σt. = 2 and γtv. = −0.75.

Figure 5.2: Log-pL on a linear transect between the two maxima, for an example with real values µt. = 0, σt. = 2 and γtv. = −0.75. The two maxima are at (µt. = 0.08, σt. = 2.09, γtv. = −0.78) and (µt. = −0.54, σt. = 1.70, γtv. = 0.00).


5.2.3 Empirical Test for pL

The estimation procedure was tested on the same parameters as in the empirical test for the x-space prior in Section 5.1.3, but it seems to be impossible to have very skew data with a high correlation and still meet the requirements in Expression (5.5). We therefore study a different test example than for the x-space prior. We have constructed a test example with 100 data points with µt. = 0, σt. = 4 and range parameter r = 3. We have done parameter estimation with different values of γtv. between −0.8 and 0.8. The test is repeated 100 times for each parameter set. The empirical means of the parameter estimates are shown in Table 5.2, with empirical 90% confidence intervals. All parameter estimates satisfy the constraint in Expression (5.5).

Real values: µt. = 0, σt. = 4

γtv. = 0.8 (real):   µt. = 0.01   [−0.76, 0.62]   σt. = 3.96   [2.11, 5.95]   γtv. = 0.75   [0.30, 0.96]
γtv. = 0.5 (real):   µt. = 0.21   [−0.65, 1.40]   σt. = 3.93   [2.35, 5.53]   γtv. = 0.22   [−0.90, 0.94]
γtv. = 0.1 (real):   µt. = −0.03  [−1.07, 1.11]   σt. = 4.24   [2.90, 5.93]   γtv. = 0.18   [−0.92, 0.97]
γtv. = −0.1 (real):  µt. = 0.02   [−1.32, 1.14]   σt. = 4.16   [2.73, 5.53]   γtv. = −0.12  [−0.96, 0.89]
γtv. = −0.5 (real):  µt. = −0.19  [−1.35, 0.87]   σt. = 4.10   [2.66, 5.93]   γtv. = −0.16  [−0.94, 0.91]
γtv. = −0.8 (real):  µt. = 0.01   [−0.77, 0.76]   σt. = 3.81   [2.61, 5.22]   γtv. = −0.72  [−0.93, 0.10]

Table 5.2: Parameter estimation with different values of γtv.; 100 data points with µt. = 0, σt. = 4 and r = 3.
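The summaries in Table 5.2 can be produced by a small helper that, for each parameter, reports the empirical mean of the repeated estimates together with an empirical 90% interval. This is a sketch; the helper name `summarize` is illustrative.

```python
# Empirical mean and 90% interval of repeated parameter estimates, as in Table 5.2.
import numpy as np

def summarize(estimates):
    """estimates: (n_rep,) array of one parameter's estimates over the repetitions."""
    mean = float(np.mean(estimates))
    lo, hi = np.quantile(estimates, [0.05, 0.95])
    return mean, (float(lo), float(hi))

# toy usage with ten hypothetical estimates of gamma_tv
est = np.array([0.7, 0.8, 0.75, 0.9, 0.6, 0.85, 0.8, 0.7, 0.95, 0.65])
mean, (lo, hi) = summarize(est)
```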

It is observed that for data with low skewness the precision of the estimation is very low. Especially γtv. seems to be hard to estimate. For small values of |γtv.| the values of the pL at the two maxima are often close to each other; hence the wrong maximum (γtv. ≈ 0) is chosen more often for small values of |γtv.|. But the result for small values of |γtv.| will not be affected as much as for higher values of |γtv.|. So it seems that the cases with γtv. = 0.5 and γtv. = −0.5 are influenced the most.


5.3. CLOSING REMARKS

5.3 Closing Remarks

For parameter estimation it seems to be hard to fulfill the requirements in Expression (5.5) for the tv-space prior. In particular, combining high skewness and high correlation seems to be very difficult. So it seems that the x-space prior with the full likelihood estimator is the best choice, since the requirements on the parameters are easier to fulfill. The full likelihood also takes the correlation in the data into account, but it requires an exponential correlation measure; for other correlation functions the pL must be used. It would also be possible to let the range parameter be a free parameter, and then optimize the likelihood with respect to µx., σx., γx. and r. The optimization would then be much more computer demanding.


Chapter 6

The CST-distribution

In Karimi et al. (2009) the inversion is done for the elastic material properties: P-wave velocity, S-wave velocity and density. The results show that the distribution of the P-wave velocity has somewhat heavier tails than the CSN-distribution. Therefore it would be useful to have a skew distribution with heavier tails. To meet this requirement the multivariate closed-skew (CS) T-distribution is defined.

The CST-distribution is a generalization of the T-distribution. In a similar way as the CS-Gaussian distribution, it adds skewness to the model. First we present some properties of the multivariate T-distribution:

6.1 Multivariate T-distribution

The multivariate T-distribution is a generalization of the Student’s t-distribution

and a random vector y ∈ R^n is multivariate T-distributed,

y ∼ T_n(µ, Ω, η),

if the pdf is

$$
\tau_n(y;\mu,\Omega,\eta)
=\frac{\Gamma\!\left(\frac{\eta+n}{2}\right)}{\Gamma\!\left(\frac{\eta}{2}\right)(\eta\cdot\pi)^{n/2}}\,
|\Omega|^{-\frac{1}{2}}
\left(1+\frac{1}{\eta}(y-\mu)'\Omega^{-1}(y-\mu)\right)^{-\frac{\eta+n}{2}}
$$

where Γ(·) is the gamma function, µ ∈ Rn is a centering vector, Ω ∈ Rn×n is a

positive definite dependence matrix and η ∈ R+ is the degree of freedom.

Some relevant characteristics are given in Roislien and Omre (2006):

1. Special case of the T-distribution: for n = 1, µ = 0 and Ω = 1 the pdf τ_n reduces to that of the standard Student's t-distribution with η degrees of freedom.


2. Moments of the T-distribution: let x ∼ T_n(µ, Ω, η); then

$$\mathrm{E}(x)=\mu;\quad \eta\ge 2$$
$$\mathrm{Cov}(x)=\frac{\eta}{\eta-2}\cdot\Omega;\quad \eta\ge 3$$

3. Linear combinations of T-distributed random variables are also T-distributed. Let x ∼ T_n(µ, Ω, η) and let A be a deterministic l × n matrix; then

$$[Ax]\sim T_l(A\mu,\,A\Omega A',\,\eta)$$

4. T-distributed random variables conditional on components are also T-distributed. Let x ∼ T_n(µ, Ω, η), with x1 ∈ R^k and x2 ∈ R^{n−k} being two subvectors of x = [x1, x2]′. Further let the parameters associated with the subvectors be:

$$\mu=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\qquad
\Omega=\begin{bmatrix}\Omega_{11}&\Gamma_{12}\\ \Gamma_{21}&\Omega_{22}\end{bmatrix}$$

Then

$$[x_1|x_2]\sim T_k\Big(\mu_1+\Gamma_{12}\Omega_{22}^{-1}(x_2-\mu_2),\;
\kappa(\eta)\cdot\big(\Omega_{11}-\Gamma_{12}\Omega_{22}^{-1}\Gamma_{21}\big),\;
\eta+n-k\Big)$$

where

$$\kappa(\eta)=\frac{1+\frac{1}{\eta}(x_2-\mu_2)'\Omega_{22}^{-1}(x_2-\mu_2)}{1+\frac{n-k}{\eta}}$$

5. Components of T-distributed random variables are non-independent. Let x = [x1, x2]′ be as above; then

$$\tau_n(x;\mu,\Omega,\eta)\neq
\tau_k(x_1;\mu_1,\Omega_{11},\eta)\cdot\tau_{n-k}(x_2;\mu_2,\Omega_{22},\eta);\quad \eta<\infty$$

for all eligible (µ, Ω). This holds even in the particular case where Γ12 and Γ21 are zero-matrices: the subvectors x1 and x2 will not be independent even if Ω is block-diagonal. When η → ∞ we obtain the multivariate Gaussian case, and independence can then be obtained.
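Property 5 is easy to illustrate numerically: with η < ∞ and Ω = I, the joint T pdf still does not factor into the product of its marginals (which, for unit shape, are standard Student's t with the same η).

```python
# Numeric illustration of property 5: the joint T pdf with block-diagonal Omega
# does not equal the product of the marginal pdfs when eta < infinity.
import numpy as np
from scipy.stats import multivariate_t, t

eta = 3.0
joint = multivariate_t(loc=[0.0, 0.0], shape=np.eye(2), df=eta)
x = np.array([1.5, -0.5])
log_joint = joint.logpdf(x)
log_product = t(df=eta).logpdf(x[0]) + t(df=eta).logpdf(x[1])
assert not np.isclose(log_joint, log_product)   # dependent despite zero correlation
```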

6.2 The CST-distribution

The closed-skew T-distribution adds skewness into the T-distribution. The most

convenient way to define the CST-distribution is to consider a random vector


t ∈ R^n and a random vector v ∈ R^q. Let the joint pdf of (t, v) be multivariate T-distributed as

$$\begin{bmatrix}t\\ v\end{bmatrix}\sim
T_{n+q}\left(\begin{bmatrix}\mu_t\\ \mu_v\end{bmatrix},
\begin{bmatrix}\Omega_t&\Gamma_{tv}\\ \Gamma_{vt}&\Omega_v\end{bmatrix},\eta\right)$$

Then the CST-distributed variable of interest x ∈ Rn is defined as

x=[t|v≥0].

Let p(x) be the generic term for a pdf of the argument, and the term p(x|d)

represents the conditional pdf of x given d. The pdf of x is then given as:

$$x\sim p(x)=p(t\mid v\ge 0)=\frac{p(v\ge 0\mid t)\,p(t)}{p(v\ge 0)}
=\big[1-T_q(0;\mu_v,\Omega_v,\eta)\big]^{-1}
\big[1-T_q(0;\mu_{v|t},\Omega_{v|t},\eta+n)\big]\,
\tau_n(t;\mu_t,\Omega_t,\eta)$$

where

$$\mu_{v|t}=\mu_v+\Gamma_{vt}\Omega_t^{-1}(t-\mu_t)$$
$$\Omega_{v|t}=\frac{1+\frac{1}{\eta}(t-\mu_t)'\Omega_t^{-1}(t-\mu_t)}{1+\frac{n}{\eta}}
\big(\Omega_v-\Gamma_{vt}\Omega_t^{-1}\Gamma_{tv}\big)$$

Here T_q(·; µ, Ω, η) is the q-dimensional cumulative T-distribution. The closed-skew T-distribution can be parameterized as:

$$CST_{n,q}(\mu,\Omega,\Gamma,\nu,\Delta,\eta)=
\big[T_q(0;\nu,\Delta+\Gamma\Omega\Gamma',\eta)\big]^{-1}
T_q\big(\Gamma(x-\mu);\nu,\Delta,\eta+n\big)\,\tau_n(x;\mu,\Omega,\eta)$$

with µ = µt, Ω = Ωt, Γ = ΓvtΩt^{−1}, ν = −µv and Δ = Ω_{v|t}.

The transformation between the CST-distribution for x and the T-distribution for (t, v) is one-to-one, given by:

$$x=[t\mid v\ge 0]\sim CST_{n,q}(\mu,\Omega,\Gamma,\nu,\Delta,\eta)
\iff
\begin{bmatrix}t\\ v\end{bmatrix}\sim
T_{n+q}\left(\begin{bmatrix}\mu_t\\ \mu_v\end{bmatrix},
\begin{bmatrix}\Omega_t&\Gamma_{tv}\\ \Gamma_{vt}&\Omega_v\end{bmatrix},\eta\right)
\qquad(6.1)$$


with

µ = µt ∈ R^n                         ⇔  µt = µ ∈ R^n
Ω = Ωt ∈ R^{n×n}                     ⇔  Ωt = Ω ∈ R^{n×n}
Γ = ΓvtΩt^{−1} ∈ R^{q×n}             ⇔  Γvt = Γtv′ = ΓΩ ∈ R^{q×n}
ν = −µv ∈ R^q                        ⇔  µv = −ν ∈ R^q
Δ = Ωv − ΓvtΩt^{−1}Γtv ∈ R^{q×q}     ⇔  Ωv = Δ + ΓΩΓ′ ∈ R^{q×q}

6.3 Properties of the CST-distribution

Some favorable characteristics of the CST-distribution that are relevant for inver-

sion problems are given here and proven in Appendix A:

1. Linear combinations of components of CST-distributed random variables are also CST-distributed. If x ∼ CST_{n,q}(µ, Ω, Γ, ν, Δ, η) and A is a deterministic l × n matrix with l ≤ n, then

[y = Ax] ∼ CST_{l,q}(µy, Ωy, Γy, νy, Δy, ηy),

where

µy = Aµ,
Ωy = AΩA′,
Γy = ΓΩA′Ωy^{−1},
νy = ν,
Δy = Δ + ΓΩΓ′ − ΓΩA′Ωy^{−1}AΩΓ′,
ηy = η.

So the CST-distribution is closed under linear transformations. This is proven in Appendix A.1. Note that as a consequence the marginal pdfs are also CST, by choosing A = b(i)′ with b(i) an n × 1 vector with zero entries except for element number i, so that xi = b(i)′x.

2. Components of CST-distributed random variables are presumably non-independent. From property 5 in Section 6.1 we know that components of T-distributed random variables are non-independent; hence it is unlikely that it is possible to construct a CST-distributed random variable with independent components. We have not been able to prove this.


3. CST-distributed random variables conditional on components are also CST-distributed. Let x ∼ CST_{n,q}(µ, Ω, Γ, ν, Δ, η), with x1 ∈ R^k and x2 ∈ R^{n−k} being two subvectors of x = [x1, x2]′. Further let the parameters associated with the subvectors be:

$$\mu=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\qquad
\Omega=\begin{bmatrix}\Omega_{11}&\Omega_{12}\\ \Omega_{21}&\Omega_{22}\end{bmatrix},\qquad
\Gamma=[\Gamma_1,\Gamma_2].$$

Then

[x1|x2] ∼ CST_{k,q}(µ_{1|2}, Ω_{1|2}, Γ_{1|2}, ν_{1|2}, Δ_{1|2}, η_{1|2}),

where

µ_{1|2} = µ1 + Ω12Ω22^{−1}(x2 − µ2),
Ω_{1|2} = κ(η)(Ω11 − Ω12Ω22^{−1}Ω21),
Γ_{1|2} = Γ1,
ν_{1|2} = ν − (Γ2 + Γ1Ω12Ω22^{−1})(x2 − µ2),
Δ_{1|2} = κ(η)Δ,
η_{1|2} = η.

This is proven in Appendix A.2. So the CST-distribution is closed under linear

transformation and conditioning.
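The closure under linear maps in property 1 gives the CST parameters of y = Ax directly; the following helper (the name `cst_linear_map` is illustrative, not from the thesis) computes them, with the marginal of a single component obtained by taking A as a unit row vector.

```python
# CST parameters of y = Ax from the linear-transformation closure (property 1).
import numpy as np

def cst_linear_map(A, mu, Omega, Gamma, nu, Delta):
    Omega_y = A @ Omega @ A.T
    Gamma_y = Gamma @ Omega @ A.T @ np.linalg.inv(Omega_y)
    Delta_y = (Delta + Gamma @ Omega @ Gamma.T
               - Gamma @ Omega @ A.T @ np.linalg.inv(Omega_y) @ A @ Omega @ Gamma.T)
    return A @ mu, Omega_y, Gamma_y, nu, Delta_y

# marginal of component i = 1: A is the corresponding unit row vector b(i)'
n, q = 3, 2
rng = np.random.default_rng(2)
B = rng.normal(size=(n, n))
Omega = B @ B.T + np.eye(n)
Gamma = rng.normal(size=(q, n))
A = np.zeros((1, n)); A[0, 1] = 1.0
mu_y, Om_y, Ga_y, nu_y, De_y = cst_linear_map(A, np.zeros(n), Omega, Gamma,
                                              np.zeros(q), np.eye(q))
assert Om_y.shape == (1, 1) and np.isclose(Om_y[0, 0], Omega[1, 1])
```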

6.4 Examples of the CST-distribution

The CST-distribution is closely related to the CSN-distribution and the T-distribution.

To illustrate the characteristics of the CST-distribution, some examples of the distribution will be shown. We first consider a CST-distributed x ∈ R^1, with a similar base case as in Section 2.4:

µ = 5,  Ω = 9,  Γ = 1,  ν = 0,  Δ = 0.05.        (6.2)

Note that Ω, Γ and Δ here are one-dimensional matrices. In Figure 6.1 the density is displayed for four different degrees of freedom, η. For η = ∞ we have a CSN-distribution and the same density as displayed in Figure 2.1. It is observed that lower degrees of freedom give heavier tails.


Figure 6.1: Pdf for different degrees of freedom, η ∈ {∞, 3, 2, 1}. µ = 5, Ω = 9, Γ = 1, ν = 0 and Δ = 0.05.
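The densities in Figure 6.1 can be evaluated directly from the conditional construction p(x) = p(v ≥ 0 | t = x) p(t = x) / p(v ≥ 0) for the base case (6.2). This is a sketch (the name `cst1_pdf` is illustrative); a numerical check confirms each density integrates to one.

```python
# Univariate CST pdf from the conditional construction, for the base case (6.2).
import numpy as np
from scipy.stats import t
from scipy.integrate import quad

def cst1_pdf(x, mu=5.0, omega=9.0, gamma=1.0, nu=0.0, delta=0.05, eta=3.0):
    # kappa(eta) scale factor from conditioning v on t (n = 1)
    kappa = (1 + (x - mu)**2 / (omega * eta)) / (1 + 1 / eta)
    mu_vt = -nu + gamma * (x - mu)          # E(v | t = x), since Gamma_vt Omega^{-1} = gamma
    p_v_pos = t(df=eta + 1, loc=0.0, scale=np.sqrt(kappa * delta)).cdf(mu_vt)
    base = t(df=eta, loc=mu, scale=np.sqrt(omega)).pdf(x)
    norm = t(df=eta, loc=nu, scale=np.sqrt(delta + gamma**2 * omega)).cdf(0.0)
    return p_v_pos * base / norm            # p(v>=0|t) p(t) / p(v>=0)

for eta in (1.0, 2.0, 3.0, 30.0):
    total, _ = quad(cst1_pdf, -np.inf, np.inf, args=(5.0, 9.0, 1.0, 0.0, 0.05, eta))
    assert abs(total - 1.0) < 1e-2          # each density integrates to one
```

Lowering η fattens both the body and the tails, which is exactly the heavier-tail behavior observed in Figure 6.1.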

The same situation occurs for higher-dimensional problems, and we will here also study a particular CST-distributed variable x with n = q = 2 and parameters

$$\mu=\begin{bmatrix}5\\7\end{bmatrix},\quad
\Omega=\begin{bmatrix}1&0.2\\0.2&4\end{bmatrix},\quad
\Gamma=\begin{bmatrix}4&0\\0&5\end{bmatrix},\quad
\nu=\begin{bmatrix}-2\\6\end{bmatrix},\quad
\Delta=\begin{bmatrix}1&0\\0&1\end{bmatrix}.$$

The pdf of this two-dimensional CST distribution is shown in Figure 6.2. The CSN-distribution with η = ∞ is displayed in Figure 6.2a and the CST-distribution with η = 2 is displayed in Figure 6.2b. It is observed that the distribution with η = 2 has heavier tails than the other.


Figure 6.2: Bivariate pdf and marginal pdfs for a CST-distributed x. (a) η = ∞; (b) η = 2.


6.5 Bayesian CST Inversion

Let x ∈ R^{nx} be the variable of interest and consider a linear model

d = Hx

where d ∈ R^{nd} is measured data, nd < nx. Note that there cannot be an error term e: from property 2 in Section 6.3 we know that it is probably not possible to specify a multivariate CST-distribution of independent vectors Hx and e, hence it is not possible to have an independent error term in the model. The prior model for x is defined as

x ∼ CST_{nx,qx}(µx, Ωx, Γx, νx, Δx, η).

Let this define a T-distributed vector as:

$$\begin{bmatrix}t\\ r=Ht\\ v\end{bmatrix}\sim
T_{n_x+n_d+q_x}\left(
\begin{bmatrix}\mu_x\\ H\mu_x\\ -\nu_x\end{bmatrix},
\begin{bmatrix}
\Omega_x&\Omega_xH'&\Omega_x\Gamma_x'\\
H\Omega_x&H\Omega_xH'&H\Omega_x\Gamma_x'\\
\Gamma_x\Omega_x&\Gamma_x\Omega_xH'&\Delta_x+\Gamma_x\Omega_x\Gamma_x'
\end{bmatrix},\eta\right)$$

where

x = [t | v ≥ 0],  t ∈ R^{nx},  v ∈ R^{qx},
d = [r | v ≥ 0],  r ∈ R^{nd}.

Then

$$\begin{bmatrix}t|r\\ v|r\end{bmatrix}\sim
T_{n_x+q_x}\left(
\begin{bmatrix}\mu_{t|r}\\ \mu_{v|r}\end{bmatrix},
\begin{bmatrix}\Omega_{t|r}&\Gamma_{tv|r}\\ \Gamma_{vt|r}&\Omega_{v|r}\end{bmatrix},
\eta+n_d\right),$$

where

µ_{t|r} = µx + ΩxH′(HΩxH′)^{−1}(r − Hµx)
µ_{v|r} = −νx + ΓxΩxH′(HΩxH′)^{−1}(r − Hµx)
Ω_{t|r} = κ(η + nd)(Ωx − ΩxH′(HΩxH′)^{−1}HΩx)
Ω_{v|r} = κ(η + nd)(Δx + ΓxΩxΓx′ − ΓxΩxH′(HΩxH′)^{−1}(ΓxΩxH′)′)
Γ_{vt|r} = κ(η + nd)(ΓxΩx − ΓxΩxH′(HΩxH′)^{−1}HΩx).

Then from the definition of the CST-distribution we have:

[x|d] = [t|r, v ≥ 0] ∼ CST_{nx,qx}(µ_{x|d}, Ω_{x|d}, Γ_{x|d}, ν_{x|d}, Δ_{x|d}, η + nd + qx)

where

µ_{x|d} = µx + ΩxH′(HΩxH′)^{−1}(r − Hµx)
Ω_{x|d} = κ(η + nd + qx)(Ωx − ΩxH′(HΩxH′)^{−1}HΩx)
Γ_{x|d} = κ(η + nd + qx)(ΓxΩx − ΓxΩxH′(HΩxH′)^{−1}HΩx)Ω_{x|d}^{−1}
ν_{x|d} = νx − ΓxΩxH′(HΩxH′)^{−1}(r − Hµx)
Δ_{x|d} = κ(η + nd + qx)(Δx + ΓxΩxΓx′ − ΓxΩxH′(HΩxH′)^{−1}(ΓxΩxH′)′) − Γ_{x|d}Ω_{x|d}Γ_{x|d}′.

In the limit η → ∞ we get κ(η) → 1 and the parameters become identical to those presented in Section 2.5 for the CS-Gaussian distribution. Note that there is only one parameter for the degrees of freedom, so it is not possible to have different degrees of freedom for the different marginals. It might be possible to define a CST-distribution with a vector of degrees of freedom, but we have not had time to study this here.
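The posterior update above is pure linear algebra once H and the prior parameters are given. The following is a sketch under stated assumptions: the function name `cst_posterior` is illustrative, and the κ factor is implemented as (1 + quad/η)/(1 + nd/η) from the conditional T-distribution, with the observed data inserted for r.

```python
# Sketch of the Bayesian CST posterior parameter update (illustrative names).
import numpy as np

def cst_posterior(mu_x, Omega_x, Gamma_x, nu_x, Delta_x, eta, H, d):
    nd, qx = len(d), len(nu_x)
    r = d                                      # data inserted for r
    G = H @ Omega_x @ H.T                      # H Omega_x H'
    K = Omega_x @ H.T @ np.linalg.inv(G)       # gain Omega_x H' (H Omega_x H')^{-1}
    resid = r - H @ mu_x
    quad = resid @ np.linalg.solve(G, resid)
    kappa = (1 + quad / eta) / (1 + nd / eta)  # conditional T scale factor
    mu_post = mu_x + K @ resid
    Omega_post = kappa * (Omega_x - K @ H @ Omega_x)
    GO = Gamma_x @ Omega_x
    Gamma_post = kappa * (GO - GO @ H.T @ np.linalg.inv(G) @ H @ Omega_x) \
                 @ np.linalg.inv(Omega_post)
    nu_post = nu_x - Gamma_x @ K @ resid
    Delta_post = (kappa * (Delta_x + GO @ Gamma_x.T
                           - GO @ H.T @ np.linalg.solve(G, (GO @ H.T).T))
                  - Gamma_post @ Omega_post @ Gamma_post.T)
    return mu_post, Omega_post, Gamma_post, nu_post, Delta_post, eta + nd + qx
```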


Chapter 7

Inversion Case Study

In Buland and Omre (2003) Bayesian Gaussian inversion is used on data from the Sleipner Øst field, a gas condensate field in the southern part of the North Sea. The depth of the reservoir is in the range 2270 to 2500 meters sub-sea. The prior model is inferred from elastic properties in a well, and the assumptions for a Gaussian model are justified by a probability plot. The data do not fit a Gaussian model very well, but the authors state that assuming a Gaussian prior model is necessary to make the inversion analytically tractable. In Karimi et al. (2009) it is shown that a CS-Gaussian approach makes it possible to capture the skewness in the data and still obtain analytical tractability.

The same data set that is used in Karimi et al. (2009) is used here to generate

realizations from the posterior model and to test the efficiency of the simulation

algorithms. Then the algorithms can be tested with real data and real parameters.

The two different prior models and the different parameter estimators are tested

with real data.

We have limited our study to consider the CS-Gaussian distribution. The

CST-distribution is more general and could possibly give even better results.

7.1 Seismic Inversion Model

The data are collected from a seismic vessel generating waves that propagate through the water and the seabed. In the layers in the seabed the properties of the medium change, and portions of the waves are reflected to the surface, where hydrophones on the seismic streamers behind the vessel register the amplitudes of the reflected waves. This results in a set of stacked seismic data, giving a convolved linear relation between the data and the reflection coefficients, r,

d = Wr + e,


where W is a wavelet matrix shown in Figure 7.1. The seismic observation error is defined as e = We1 + e2, where e1 and e2 are independent Gaussian distributed with mean zero and variance σe. Hence e is wavelet-colored with mean zero and covariance Σe = σe(WW′ + I). Since the CS-Gaussian distribution is a generalization of the Gaussian distribution, e is also CS-Gaussian distributed as CSN(0, Σe, ·, ·, ·), where the three latter parameters are unspecified since q = 0.

Figure 7.1: Wavelet.

The reflection coefficient for layer j can be written as r_j = a_j^{(r)}/a_j^{(i)}, where a_j^{(r)} is the reflected amplitude of the wave and a_j^{(i)} is the amplitude of the incident wave between layer j and j + 1. Let v_j be the velocity of the P-wave and let ρ_j be the density in layer j, as shown in Figure 7.2. The impedance in layer j is then z_j = v_j ρ_j. The parameters v and ρ from a well at the Sleipner Øst field are used to calculate the impedance, z, and the resulting data are displayed in Figure 7.3. The depth interval used is [2050, 2378] milliseconds with a two-millisecond sampling interval, making the number of data T = 165.

For normal incidence Zoeppritz' equations reduce to a simple form, as described in Sheriff et al. (1995). This gives

$$r_j=\frac{a_j^{(r)}}{a_j^{(i)}}
=\frac{v_{j+1}\rho_{j+1}-v_j\rho_j}{v_{j+1}\rho_{j+1}+v_j\rho_j}
=\frac{z_{j+1}-z_j}{z_{j+1}+z_j}
\approx\frac{\Delta z_j}{2z_j}
\approx\frac{1}{2}\Delta\ln(z_j).$$


Figure 7.2: Layer j and j + 1, with incident amplitude a_j^{(i)}, reflected amplitude a_j^{(r)}, velocities v_j, v_{j+1} and densities ρ_j, ρ_{j+1}.

where ln(·) is the natural logarithm and Δ ln z_j = ln z_{j+1} − ln z_j. The linear relation between d and r then becomes

d = Wr + e = WD ln z + e = Hx + e,        (7.1)

where x = ln z, H = WD and D is given as

$$D=\frac{1}{2}\begin{bmatrix}
-1&1&0&\cdots&\cdots&0\\
0&-1&1&0&\cdots&0\\
\vdots&&\ddots&\ddots&&\vdots\\
0&\cdots&\cdots&0&-1&1
\end{bmatrix}$$

The model is then on the same form as in Expression (2.7). From the model described in Expression (7.1) and the observed impedances, synthetic seismic data, d, can be created. The error variance is set to σe = 5 · 10^{−5}, as used in Karimi et al. (2009). The observed impedance, zo, is displayed in Figure 7.3 and the synthetic seismic data are displayed in Figure 7.4.
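The differencing matrix D and the linearization can be checked numerically: (1/2) D ln(z) should reproduce the exact reflection coefficients (z_{j+1} − z_j)/(z_{j+1} + z_j) to high accuracy for realistic impedance contrasts. The helper name `diff_matrix` is illustrative.

```python
# Building D and checking the linearized reflection coefficients against the exact form.
import numpy as np

def diff_matrix(T):
    """(T-1) x T matrix with rows (1/2)(-1, 1) on the diagonal band."""
    D = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    D[idx, idx] = -0.5
    D[idx, idx + 1] = 0.5
    return D

z = np.array([5.0e6, 5.2e6, 5.1e6, 5.4e6])      # toy impedance profile
r_lin = diff_matrix(len(z)) @ np.log(z)          # (1/2) * Delta ln(z)
r_exact = (z[1:] - z[:-1]) / (z[1:] + z[:-1])
assert np.allclose(r_lin, r_exact, rtol=1e-3)    # linearization error is tiny
```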

Figure 7.3: Material property observations from a well: P-wave velocity (m/s), density (kg/m³) and impedance (kg/m³ · m/s).

Figure 7.4: Synthetic seismic data.


In order to solve the inverse problem a prior model for x must be determined. It is natural to assume a stationary prior, as described in Section 4.1. The dependency measure used is c(h) = exp{−h/15}. The empirical correlation function and the used dependency measure are displayed in Figure 7.5.

Figure 7.5: Fitted correlation function of the log-impedance ln(z).

As described in Chapter 4 there are three different priors that can be used. For the x-space prior with the pseudo-likelihood estimator, qx is set to 3. The results from the different priors are:

x-space prior:
  fL:  µx. = 15.85,  σx. = 0.0351,  γx. = −15.32
  pL∗: µx. = 15.78,  σx. = 0.0398,  γx. = −32.73
tv-space prior:
  pL:  µt. = 15.61,  σt. = 0.0110,  γtv. = −0.019
Gaussian prior:
  pL:  µt. = 15.59,  σt. = 0.0109.

The different prior models are displayed together with the observations in Figure 7.6.

Figure 7.6: Estimated CS-Gaussian models and histogram of the log-impedance ln(z). Solid line (—) is with the fL and the x-space prior. Dashed line (− − −) is the pL∗ with the x-space prior. Dotted line (· · ·) is the pL with the tv-space prior.

Because of the high range parameter and the positive definite restrictions in Expression (5.5) it is not possible to capture the skewness in the data with this tv-space model. Hence the parameters estimated with the tv-space prior are almost Gaussian.


7.2 Inversion Results

To predict the impedance, z, from the seismic data, the posterior distribution of x is found as described in Section 2.5. As displayed in Figure 7.7, the Iterative Simulation algorithm appears to converge after 15 iterations, due to the fact that the distribution is unimodal, as discussed before. After the burn-in period 1000 realizations are generated from the posterior model. Realizations number 100, 400, 700 and 1000 are displayed in Figure 7.8, with parameters from the x-space prior and fL. The median of x is calculated from these 1000 realizations, and the impedance is then predicted as ẑ|d = exp(x̂|d) = exp(Q_{0.5}(x|d)). In Karimi et al. (2009) the prediction from the posterior is done by an approximation, since the authors are not able to generate from the full posterior distribution; with the Iterative Simulation algorithm presented in this thesis it is possible to generate realizations from the full posterior distribution. In Figure 7.9 the true impedance is compared with the predicted impedance for the x-space prior with fL, and it is observed that the predicted values fit the true data quite well. The generated realizations are also used to make the 80%-prediction interval displayed in Figure 7.10 for the x-space prior with the fL estimator. The results seem to fit the real values very well. In Figure 7.11 the results for the x-space prior with the pL∗ estimator are displayed, and it is observed that the result is heavily shifted.

Figure 7.7: Traceplot for convergence of v (iteration number on the horizontal axis).


Figure 7.8: Realizations number 100 (a), 400 (b), 700 (c) and 1000 (d) from the posterior model. The x-space prior and the full likelihood estimator are used.


Figure 7.9: Inversion results from Bayesian CS-Gaussian inversion: the true impedance and the predicted impedance. The x-space prior and the full likelihood estimator are used.

Figure 7.10: 80%-prediction intervals (thin dotted) and predictions (thick solid). The x-space prior and the full likelihood estimator are used.


Figure 7.11: 80%-prediction intervals (thin dotted) and predictions (thick solid). The x-space prior and the pseudo likelihood estimator are used.

Table 7.1 lists the fraction of true values that lie below the corresponding predicted quantiles. The x-space prior with the fL estimator gives the best results, but the prediction interval is a bit too wide. The predicted impedance is not centered for the x-space prior with pL∗; that can be a result of the approximation in the parameter estimation and a too low qx. Using a higher qx requires that a qx-dimensional Gaussian cdf is evaluated at every step of the numerical optimization. The tv-space prior with the pL estimator gives almost the same results as the Gaussian prior; the bias is a bit higher than for the x-space fL estimator, and the prediction interval is wider.

Theoretical:                    0.1     0.5     0.9
Estimated, x-space prior fL:    0.048   0.482   0.956
Estimated, x-space prior pL∗:   0.006   0.030   0.446
Estimated, tv-space prior pL:   0.036   0.619   0.960
Estimated, Gaussian prior pL:   0.036   0.590   0.970

Table 7.1: Fraction of observations below predicted quantiles.
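The coverage fractions in Table 7.1 can be computed from the posterior realizations as the fraction of true values lying below each predicted quantile. This sketch (the name `coverage_fractions` is illustrative) uses a toy, well-calibrated posterior to show the mechanics.

```python
# Fraction of true values below predicted quantiles, as reported in Table 7.1.
import numpy as np

def coverage_fractions(realizations, truth, probs=(0.1, 0.5, 0.9)):
    """realizations: (n_sim, T) posterior draws; truth: (T,) true values."""
    quantiles = np.quantile(realizations, probs, axis=0)   # (len(probs), T)
    return [float(np.mean(truth < q)) for q in quantiles]

rng = np.random.default_rng(0)
m = rng.normal(size=200)                                   # latent posterior means
truth = m + rng.normal(scale=0.5, size=200)                # truth behaves like one draw
sims = m + rng.normal(scale=0.5, size=(1000, 200))         # posterior realizations
frac = coverage_fractions(sims, truth)
assert abs(frac[0] - 0.1) < 0.08 and abs(frac[2] - 0.9) < 0.08
```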


The Iterative Simulation algorithm uses only about 35 seconds to generate 1015 realizations and estimate the median, while the Brute Force Simulation algorithm and the Two Steps Simulation algorithm were not able to generate a single realization in three weeks.


Chapter 8

Closing Remarks and Further Work

In this thesis the properties of the CS-Gaussian distribution have been studied, with a special focus on Bayesian inversion. Bayesian inversion and parameter inference from data are more complicated than for the Gaussian distribution; it is therefore necessary to generate realizations from the CS-Gaussian distribution. An efficient simulation algorithm to generate realizations for high-dimensional problems has been obtained. In contrast to Karimi et al. (2009), it is here possible to generate realizations of the posterior model without any approximations. Different stationary prior models and different methods for parameter estimation have been developed. For an exponential dependence structure it is possible to use a full-likelihood estimator, and hence also capture the dependency in the data.

The simulation algorithm for the CS-Gaussian distribution was implemented and shown to work well on synthetic seismic data inspired by the Sleipner Øst field. The different prior models and estimators for the CS-Gaussian distribution were also tested on the data set. The full-likelihood estimator seems to be the best estimator for data with an exponential dependence structure.

A skew distribution with heavy tails, the CST-distribution, is presented with some properties and examples, but there is still much to study about this distribution. A sampling algorithm should be developed, and it would also be useful to have different degrees of freedom for each dimension of the CST-distribution. The Bayesian CS-Gaussian inversion could also be extended to handle 3D problems.


Bibliography

Buland, A. and Omre, H. (2003). Bayesian linearized AVO inversion, Geophysics 68(1): 185-198.

Dominguez-Molina, J., González-Farías, G., and Gupta, A. (2003). The multivariate closed skew normal distribution, Technical Report 03.

González-Farías, G., Domínguez-Molina, J., and Gupta, A. (2004). The closed skew-

normal distribution, Skew-elliptical distributions and their applications: a journey

beyond normality, Chapman & Hall/CRC, Boca Raton, FL, pp. 25-42.

Karimi, O., Omre, H., and Mohammadzadeh, M. (2009). Bayesian closed-skew Gaussian

inversion of seismic AVO data into elastic material properties.

Mosegaard, K. and Tarantola, A. (1995). Monte Carlo sampling of solutions to inverse problems, J. Geophys. Res. 100: 12,431-12,447.

Pewsey, A. (2000). Problems of inference for Azzalini's skew-normal distribution, Journal of Applied Statistics 27(7): 859-870.

Robert, C. (1995). Simulation of truncated normal variables, Statistics and computing

5(2): 121-125.

Roislien, J. and Omre, H. (2006). T-distributed Random Fields: A Parametric Model for Heavy-tailed Well-log Data, Mathematical Geology 38(7): 821-849.

Rybicki, G. B. and Press, W. H. (1995). Class of fast methods for processing irregu-

larly sampled or otherwise inhomogeneous one-dimensional data, Phys. Rev. Lett.

74(7): 1060-1063.

Sheriff, R., Geldart, L., and Lees, J. (1995). Exploration seismology, Cambridge Univer-

sity Press Cambridge.


Appendix A

Proof of Properties for the

CST-distribution

A.1 Closed Under Linear Transformations

Let x = [t|v ≥ 0] ∼ CST_{n,q}(µ, Ω, Γ, ν, Δ, η), and let A be a deterministic l × n matrix with l ≤ n. If we do the transformation of x to the tv-space as given in Expression (6.1) we obtain

$$\begin{bmatrix}t\\ v\end{bmatrix}\sim
T_{n+q}\left(\begin{bmatrix}\mu\\ -\nu\end{bmatrix},
\begin{bmatrix}\Omega&\Omega\Gamma'\\ \Gamma\Omega&\Delta+\Gamma\Omega\Gamma'\end{bmatrix},\eta\right)$$

Hence the distribution of (At, v) is, by property 3 in Section 6.1,

$$\begin{bmatrix}At\\ v\end{bmatrix}\sim
T_{l+q}\left(\begin{bmatrix}A\mu\\ -\nu\end{bmatrix},
\begin{bmatrix}A\Omega A'&A\Omega\Gamma'\\ \Gamma\Omega A'&\Delta+\Gamma\Omega\Gamma'\end{bmatrix},\eta\right)$$

Then we can do the transformation back to the x-space and obtain:

y = Ax = [At | v ≥ 0] ∼ CST_{l,q}(µy, Ωy, Γy, νy, Δy, ηy),

where the parameters are given as

µy = Aµ,
Ωy = AΩA′,
Γy = ΓΩA′Ωy^{−1},
νy = ν,
Δy = Δ + ΓΩΓ′ − ΓΩA′Ωy^{−1}AΩΓ′,
ηy = η.


A.2 Closed Under Conditioning

Let x ∼ CST_{n,q}(µ, Ω, Γ, ν, Δ, η), with x1 ∈ R^k and x2 ∈ R^{n−k} being two subvectors of x = [x1, x2]′. Further let the parameters associated with the subvectors be:

$$\mu=\begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix},\qquad
\Omega=\begin{bmatrix}\Omega_{11}&\Omega_{12}\\ \Omega_{21}&\Omega_{22}\end{bmatrix},\qquad
\Gamma=[\Gamma_1,\Gamma_2].$$

If we do the transformation of x to the tv-space as given in Expression (6.1) we obtain

$$\begin{bmatrix}t_1\\ t_2\\ v\end{bmatrix}\sim
T_{n+q}\left(\begin{bmatrix}\mu_1\\ \mu_2\\ -\nu\end{bmatrix},
\begin{bmatrix}
\Omega_{11}&\Omega_{12}&\Omega_{11}\Gamma_1'+\Omega_{12}\Gamma_2'\\
\Omega_{21}&\Omega_{22}&\Omega_{21}\Gamma_1'+\Omega_{22}\Gamma_2'\\
\Gamma_1\Omega_{11}+\Gamma_2\Omega_{21}&\Gamma_1\Omega_{12}+\Gamma_2\Omega_{22}&\Delta+\Gamma\Omega\Gamma'
\end{bmatrix},\eta\right)$$

Hence the distribution of [(t1, v) | t2] is, by property 4 in Section 6.1, given by the mean

$$\begin{bmatrix}\mu_{t_1|t_2}\\ \mu_{v|t_2}\end{bmatrix}
=\begin{bmatrix}\mu_1\\ -\nu\end{bmatrix}
+\begin{bmatrix}\Omega_{12}\\ \Gamma_1\Omega_{12}+\Gamma_2\Omega_{22}\end{bmatrix}
\Omega_{22}^{-1}(t_2-\mu_2)
=\begin{bmatrix}\mu_1+\Omega_{12}\Omega_{22}^{-1}(t_2-\mu_2)\\
-\nu+(\Gamma_2+\Gamma_1\Omega_{12}\Omega_{22}^{-1})(t_2-\mu_2)\end{bmatrix}$$

and the covariance matrix of (t1, v) given t2 is

$$\begin{bmatrix}\Omega_{t_1|t_2}&\Gamma_{t_1v|t_2}\\ \Gamma_{vt_1|t_2}&\Omega_{v|t_2}\end{bmatrix}
=\kappa(\eta)\left(
\begin{bmatrix}\Omega_{11}&\Omega_{11}\Gamma_1'+\Omega_{12}\Gamma_2'\\
\Gamma_1\Omega_{11}+\Gamma_2\Omega_{21}&\Delta+\Gamma\Omega\Gamma'\end{bmatrix}
-\begin{bmatrix}\Omega_{12}\\ \Gamma_1\Omega_{12}+\Gamma_2\Omega_{22}\end{bmatrix}
\Omega_{22}^{-1}
\begin{bmatrix}\Omega_{12}\\ \Gamma_1\Omega_{12}+\Gamma_2\Omega_{22}\end{bmatrix}'\right)$$

$$=\kappa(\eta)\left(
\begin{bmatrix}\Omega_{11}&\Omega_{11}\Gamma_1'+\Omega_{12}\Gamma_2'\\
\Gamma_1\Omega_{11}+\Gamma_2\Omega_{21}&\Delta+\Gamma\Omega\Gamma'\end{bmatrix}
-\begin{bmatrix}\Omega_{12}\Omega_{22}^{-1}\Omega_{21}&\Omega_{12}\Gamma_2'+\Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1'\\
\Gamma_2\Omega_{21}+\Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21}&\Gamma_2\Omega_{22}\Gamma_2'+\Gamma_2\Omega_{21}\Gamma_1'+\Gamma_1\Omega_{12}\Gamma_2'+\Gamma_1\Omega_{12}\Omega_{22}^{-1}\Omega_{21}\Gamma_1'\end{bmatrix}\right)$$

Then we can do the transformation back to the x-space and obtain

[x1|x2] ∼ CST_{k,q}(µ_{1|2}, Ω_{1|2}, Γ_{1|2}, ν_{1|2}, Δ_{1|2}, η_{1|2}).


Here

µ_{1|2} = µ_{t1|t2} = µ1 + Ω12Ω22^{−1}(x2 − µ2),

Ω_{1|2} = Ω_{t1|t2} = κ(η)(Ω11 − Ω12Ω22^{−1}Ω21),

Γ_{1|2} = Γ_{vt1|t2}Ω_{t1|t2}^{−1}
        = κ(η)(Γ1Ω11 + Γ2Ω21 − Γ2Ω21 − Γ1Ω12Ω22^{−1}Ω21)[κ(η)(Ω11 − Ω12Ω22^{−1}Ω21)]^{−1}
        = Γ1,

ν_{1|2} = −µ_{v|t2} = ν − (Γ2 + Γ1Ω12Ω22^{−1})(x2 − µ2),

Δ_{1|2} = Ω_{v|t2} − Γ_{vt1|t2}Ω_{t1|t2}^{−1}Γ_{t1v|t2}
        = κ(η)(Δ + Γ1Ω11Γ1′ + Γ1Ω12Γ2′ + Γ2Ω21Γ1′ + Γ2Ω22Γ2′
               − Γ2Ω22Γ2′ − Γ2Ω21Γ1′ − Γ1Ω12Γ2′ − Γ1Ω12Ω22^{−1}Ω21Γ1′
               − Γ1(Ω11Γ1′ − Ω12Ω22^{−1}Ω21Γ1′))
        = κ(η)Δ,

η_{1|2} = η.
