
I'm simulating a simple Ornstein-Uhlenbeck process

$dx=-x dt+\sqrt{2}dW$

which is well-known to have a steady state distribution of

$p_s(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$

Here's my matlab code to run the simulation

    dt = 0.01; T = 5000;
    X = zeros(size(0:dt:T));
    i = 1;
    for t = dt:dt:T
        % Euler-Maruyama scheme
        X(i+1) = X(i) - X(i)*dt + sqrt(2*dt)*randn;
        % Exact update (alternative):
        % X(i+1) = X(i)*exp(-dt) + sqrt(1-exp(-2*dt))*randn;
        i = i + 1;
    end

Then I do the Kolmogorov–Smirnov test to check the normality of the resulting distribution

[h,p] = kstest(X);

but I always get h = 1, which rejects the null hypothesis that the data come from a standard normal distribution.
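For readers without MATLAB, here is a rough Python equivalent of the same experiment that I put together for illustration (numpy only; the one-sample KS statistic against the standard normal CDF is computed by hand rather than by a library call, so the function names below are my own):

```python
import math
import numpy as np

def simulate_ou_path(T=5000.0, dt=0.01, seed=0):
    """One path of dx = -x dt + sqrt(2) dW, using the exact one-step update."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.zeros(n + 1)
    a = math.exp(-dt)                        # decay factor over one step
    s = math.sqrt(1.0 - math.exp(-2.0 * dt)) # exact one-step noise scale
    z = rng.standard_normal(n)
    for i in range(n):
        x[i + 1] = x[i] * a + s * z[i]
    return x

def ks_statistic_vs_std_normal(sample):
    """Sup-distance between the empirical CDF and the N(0,1) CDF."""
    xs = np.sort(np.asarray(sample))
    n = len(xs)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(xs / math.sqrt(2.0)))
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(n) / n)
    return max(d_plus, d_minus)

X = simulate_ou_path()
D = ks_statistic_vs_std_normal(X)
# Approximate 5% critical value for large n: 1.36 / sqrt(n).
print(D, 1.36 / math.sqrt(len(X)))
```

On a single correlated path the statistic D comes out far above the critical value, so this hand-rolled test rejects as well, matching what kstest reports.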

To find out the reason, I generate normally distributed random numbers

    x = randn(size(X));

and compare their empirical CDF with the standard normal CDF:

    pd = makedist('normal',0,1);
    [fX,t] = ecdf(X); y = cdf(pd,t);
    plot(t,fX-y)
    [fx,t] = ecdf(x); hold on; plot(t,fx-y);

The simulation-generated X and the MATLAB-generated x show very similar shapes.
Then I do the two-sample KS test,

[h,p] = kstest2(X,x)

It returns h = 0 (X and x are from the same distribution). So I'm really confused here. Why can't the simulated X pass the normality test, with a p-value far below the significance level 0.05?

  • How many scenarios have you tried? (2017-01-11)
  • No less than 10. The results of kstest2 vary from run to run, but the rejection of X being normally distributed is consistent. (2017-01-11)
  • Ultimately, you would like to show that $x_{5000}$ is normally distributed. I am not familiar with Matlab, and I have no idea how kstest actually handles $X$. Yet it seems that $X$ is a matrix, with $X(i,j)$ the $i$-th sample of the OU process at time $t_j$. Why don't you generate the scenarios yourself? For instance, take 1000 scenarios and build the histogram of $\{X(1,5000),X(2,5000),\dots,X(1000,5000)\}$. (2017-01-11)
  • What you mean is an ensemble average over different scenarios. What I did is a temporal average over a single but long-enough scenario; they are equivalent if the system is ergodic. I did draw the histograms, and they look very similar to a normal distribution, but that judgement is subjective. kstest (the Kolmogorov–Smirnov test) is a statistical test of whether data obey a specified distribution. [link](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) (2017-01-11)
  • Is your sample the vector $U=\{X(1,1),\dots,X(1,5000)\}$? (2017-01-11)
  • Yes, it is my sample. (2017-01-11)
  • You would agree with me that $U_k \sim \mathcal{N}(0,\,1-e^{-2t_k})$, where $t_k=k\,dt$. This means $U$ is a collection of normally distributed variables with different variances, so the vector $X$ you generated has no reason to behave like a standard Gaussian sample. Note that the Matlab documentation says "h = kstest(x) returns a test decision for the null hypothesis that the data in vector x comes from a standard normal distribution". (2017-01-11)
  • Even assuming the variance quickly converges to 1, the way you build the sample imposes a correlation between $X(1,i+1)$ and $X(1,i)$. (2017-01-11)
  • The simulation does impose a correlation between X(1,i+1) and X(1,i), but does that have anything to do with the distribution of the values along X(1,:)? (2017-01-11)

1 Answer


The process $x(t)$ can be written as $$x(t)=x(s)e^{-(t-s)}+\sqrt{2}\int_{s}^{t}e^{-(t-u)}\,dW_u,$$ with $s < t$.

Let $t_k = k\,dt$ be the sampling times, with $dt=0.01$. Since $x(t_0)=x(0)=0$, each marginal is $x(t_k) \sim \mathcal{N}(0,\,1-e^{-2t_k})$.
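These marginal variances are easy to check by Monte Carlo (a Python sketch I wrote for illustration; it uses the exact marginal law $x(t)\sim\mathcal{N}(0,1-e^{-2t})$, so one Gaussian draw per scenario suffices, and the function name is my own):

```python
import math
import numpy as np

def marginal_var_mc(t, N=200_000, seed=1):
    """Monte-Carlo estimate of var(x(t)) for the OU process started at 0.

    Uses the exact marginal x(t) ~ N(0, 1 - exp(-2 t)), one draw per scenario.
    """
    rng = np.random.default_rng(seed)
    return float(np.var(math.sqrt(1.0 - math.exp(-2.0 * t))
                        * rng.standard_normal(N)))

for t in (0.01, 0.5, 5.0):  # i.e. t_1, t_50, t_500 when dt = 0.01
    print(t, marginal_var_mc(t), 1.0 - math.exp(-2.0 * t))
```

The estimates land close to the theoretical values $0.0198$, $0.6321$ and $0.99995$ quoted below.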

Obviously, one would expect that for large $t_n$, $x(t_n) \sim \mathcal{N}(0,1)$ approximately.

This can be tested by building a sample $U=\{x(t_n,\omega_1),...,x(t_n,\omega_N)\}$, with $x(t_k,\omega _l)$ the value of $x(t_k)$ for the $l^{th}$ scenario.

We can pick $t_n=5000$ as you did, and a decent number of scenarios, $N=1000$ for example. The MATLAB function kstest applied to $U$ should then not reject, because all the elements are drawn from a single distribution: that of $x(t_n)$.
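As a sanity check, here is a Python sketch of that ensemble test (the KS statistic is computed by hand against the N(0,1) CDF instead of calling MATLAB's kstest, and I use $t_n=50$, which already gives variance $1$ to machine precision):

```python
import math
import numpy as np

def ks_stat_vs_std_normal(sample):
    """Sup-distance between the empirical CDF and the N(0,1) CDF."""
    xs = np.sort(np.asarray(sample))
    n = len(xs)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(xs / math.sqrt(2.0)))
    return max(np.max(np.arange(1, n + 1) / n - cdf),
               np.max(cdf - np.arange(n) / n))

rng = np.random.default_rng(2)
N, t_n = 1000, 50.0
# Ensemble sample: one independent draw of x(t_n) per scenario,
# using the exact marginal x(t_n) ~ N(0, 1 - exp(-2 t_n)).
U = math.sqrt(1.0 - math.exp(-2.0 * t_n)) * rng.standard_normal(N)
D = ks_stat_vs_std_normal(U)
print(D, 1.36 / math.sqrt(N))  # D should stay below the 5% critical value
```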

What you tried instead is to test the sample $U'=\{x(t_1,\omega_1),\dots,x(t_n,\omega_1)\}$, which is basically one observation from each of the $n$ distributions $x(t_k)$. The elements of $U'$ are not drawn from one distribution but from $n$ different ones. For example, $\operatorname{var}(x(t_1)) = 1-e^{-0.02} \approx 0.0198$ while $\operatorname{var}(x(t_{50})) = 1-e^{-1} \approx 0.6321$; the variance only reaches $0.999$-ish from $t_{500}$ on.

That means the initial stretch of your sample, up to $t_{500}=5$ out of $T=5000$ (about $0.1\%$ of the points), comes from distributions with variances noticeably below $1$.

Let us assume this bogus part of the sample is irrelevant. We still have an issue with the covariance.

Indeed, by definition, a sample must come from i.i.d. random variables. In your case we have $$\operatorname{cov}(x(t),x(s))=\operatorname{cov}\left(\sqrt{2}\int_{0}^{t}e^{-(t-u)}\,dW_u,\ \sqrt{2}\int_{0}^{s}e^{-(s-u)}\,dW_u\right)$$ $$=2\int_{0}^{s}e^{-(t-u)}e^{-(s-u)}\,du=e^{s-t}-e^{-(s+t)}\approx e^{s-t}$$ for $s,t$ not too small.

By construction of $U'$, we have nonzero correlation between its elements, even for large time points. Take $x(1000)$ and $x(1001)$: both variances are essentially one, but the correlation is $e^{-1}\approx 0.3679$.
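That lag-1 correlation can be confirmed numerically (a Python sketch under the assumption that the pair starts in the stationary law $\mathcal{N}(0,1)$, so $\operatorname{corr}(x(t),x(t+1))=e^{-1}$ exactly):

```python
import math
import numpy as np

rng = np.random.default_rng(3)
N = 200_000  # number of independent scenarios

# Start in the stationary law N(0,1), then apply the exact OU update
# over a lag of 1 time unit: x(t+1) = x(t) e^{-1} + sqrt(1 - e^{-2}) Z.
x0 = rng.standard_normal(N)
x1 = x0 * math.exp(-1.0) + math.sqrt(1.0 - math.exp(-2.0)) * rng.standard_normal(N)
corr = float(np.corrcoef(x0, x1)[0, 1])
print(corr, math.exp(-1.0))  # sample correlation vs e^{-1}
```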

The conclusion is that one should not expect kstest to accept the sample $U'$.

You can try taking a subset of your initial sample, starting from $t=50$ for example and keeping one value every $30$ time units: $U''=\{x(50,\omega_1),x(80,\omega_1),\dots,x(4970,\omega_1),x(5000,\omega_1)\}$. At that spacing the residual correlation $e^{-30}$ is negligible. It is just a suggestion; I do not guarantee results.
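The thinning idea can be sketched like this in Python (a single path simulated with the exact update, then subsampled every 30 time units from $t=50$; the spacing is the suggestion above, and the function name is mine):

```python
import math
import numpy as np

def ou_path(T=5000.0, dt=0.01, seed=4):
    """One OU path dx = -x dt + sqrt(2) dW via the exact one-step update."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    x = np.zeros(n + 1)
    a = math.exp(-dt)
    s = math.sqrt(1.0 - math.exp(-2.0 * dt))
    z = rng.standard_normal(n)
    for i in range(n):
        x[i + 1] = x[i] * a + s * z[i]
    return x

dt = 0.01
x = ou_path(dt=dt)
# Keep x(50), x(80), ...: lag correlation e^{-30} is essentially zero,
# and the variance has long since reached 1, so U2 is close to i.i.d. N(0,1).
step = int(round(30.0 / dt))
U2 = x[int(round(50.0 / dt))::step]
print(len(U2), float(np.var(U2)))
```

With only about 166 nearly independent points the test loses power, but at least its i.i.d. assumption is now roughly satisfied.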

  • You are welcome. (2017-01-11)