Generation of Truncated Gaussian Samples

When generating test samples, it is sometimes necessary to restrict values to a specific range. For normally distributed values this is not straightforward to implement efficiently, since the truncation must not distort the shape of the distribution within the bounds.
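To see why efficiency is the issue, here is a minimal sketch of the naive rejection approach (this is not Mazet's algorithm; the function name and interface are mine): keep drawing from the untruncated normal and discard everything outside the bounds. The result is exact, but the acceptance rate collapses as soon as the interval lies far out in the tails, which is what dedicated algorithms avoid.

```python
import numpy as np

def rejection_truncnorm(a, b, mu=0.0, sigma=1.0, size=1, rng=None):
    """Naive rejection sampler for a normal truncated to [a, b].

    Draws from N(mu, sigma^2) and keeps only the samples inside the
    interval. The kept samples follow the truncated distribution
    exactly, but each round accepts only the probability mass that
    falls inside [a, b], so performance degrades for narrow or
    far-out intervals.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(size)
    filled = 0
    while filled < size:
        draws = rng.normal(mu, sigma, size=size)
        kept = draws[(draws >= a) & (draws <= b)]
        n = min(kept.size, size - filled)
        out[filled:filled + n] = kept[:n]
        filled += n
    return out
```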

In Python it is possible to generate values following a truncated normal distribution with scipy.stats.truncnorm. Out of interest (and, of course, speed issues :-P), I was looking for alternatives and came across this page by Vincent Mazet. There, he presents an algorithm and implementation for simulating a truncated Gaussian distribution. The Python version by N. Chopin did not work for me (update: I used the wrong method! It works, but is a lot slower than Mazet's version), so I converted his most recent (very nice to read!) Matlab implementation to Python. It is slower than the scipy one, but might be interesting for theoretical aspects and leaves room for a lot of optimization.

The implementation is available on GitHub.

import matplotlib.pyplot as plt
import rtnorm as rt
from scipy.stats import truncnorm as tn

# Mazet's version
%timeit rtvals = rt.rtnorm(-5, 5, sigma=3., size=10000)
# 10 loops, best of 3: 167 ms per loop
rtvals = rt.rtnorm(-5, 5, sigma=3., size=10000)
plt.hist(rtvals)

# Scipy version
# The parameterization is different here!
scipytn = tn(-5/3., 5/3., scale=3)
# C implementation :) -> 1000 loops, best of 3: 528 µs per loop
%timeit k = scipytn.rvs(10000)
scipyvals = scipytn.rvs(10000)
plt.hist(scipyvals)

[Histograms: Mazet's algorithm (left), Scipy implementation (right)]
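To make the parameterization difference explicit: scipy's truncnorm expects its clip points in standard-deviation units relative to loc and scale, i.e. a = (lower - loc) / scale, which is why -5/3. and 5/3. appear above. A small sketch of the conversion (variable names are mine):

```python
from scipy.stats import truncnorm

# Truncation bounds in the original data units.
lower, upper = -5.0, 5.0
loc, scale = 0.0, 3.0  # mean and standard deviation

# Convert the bounds to standard units, as truncnorm expects.
a, b = (lower - loc) / scale, (upper - loc) / scale

dist = truncnorm(a, b, loc=loc, scale=scale)
samples = dist.rvs(10000, random_state=0)
```

With symmetric bounds around loc, the median of the truncated distribution stays at loc, which makes for a quick sanity check via dist.ppf(0.5).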

Update 11/27/2014:

  1. Updated the link to Vincent Mazet's homepage.
  2. Removed the static link to the implementation .zip file and moved the source to GitHub.
