When generating test samples, it is sometimes necessary to restrict values to a specific range. Doing this efficiently for normally distributed values is not straightforward, since the resulting values should still follow the (truncated) normal distribution.
In Python, values following a truncated normal distribution can be generated with scipy.stats.truncnorm. Out of interest (and, of course, speed issues :-P), I was looking for alternatives and came across this page by Vincent Mazet, where he presents an algorithm and an implementation for simulating a truncated Gaussian distribution. The Python version by N. Chopin did not work for me (update: I used the wrong method! It works, but is a lot slower than Mazet's version), so I ported his most recent (very readable!) Matlab implementation to Python. It is slower than the scipy one, but it might be interesting for theoretical aspects and leaves room for a lot of optimization.
The implementation is available on GitHub.
import matplotlib.pyplot as plt
import rtnorm as rt
from scipy.stats import truncnorm as tn

# Mazet's version
%timeit rtvals = rt.rtnorm(-5, 5, sigma=3., size=10000)
# 10 loops, best of 3: 167 ms per loop
rtvals = rt.rtnorm(-5, 5, sigma=3., size=10000)
plt.hist(rtvals)

# Scipy version
# The parameterization is different here!
scipytn = tn(-5/3., 5/3., scale=3)
# C implementation :) -> 1000 loops, best of 3: 528 us per loop
%timeit k = scipytn.rvs(10000)
scipyvals = scipytn.rvs(10000)
plt.hist(scipyvals)
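Since scipy's parameterization trips people up, here is a small sketch of the conversion it expects: truncnorm takes its bounds in units of the standard deviation, i.e. a = (lower - loc) / scale and b = (upper - loc) / scale, which is why -5/3. and 5/3. appear above. The variable names here are my own, for illustration.

```python
from scipy.stats import truncnorm

# Desired truncation range and normal parameters
lower, upper = -5.0, 5.0
mu, sigma = 0.0, 3.0

# Convert the bounds into standard-deviation units, as truncnorm expects
a = (lower - mu) / sigma
b = (upper - mu) / sigma
dist = truncnorm(a, b, loc=mu, scale=sigma)

samples = dist.rvs(10000)
# All samples now lie inside [lower, upper]
print(samples.min(), samples.max())
```

Forgetting this conversion (e.g. passing -5 and 5 directly as a and b) silently truncates at -5σ and 5σ instead, which for σ = 3 is almost no truncation at all.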
- Updated the link to Vincent Mazet's homepage.
- Removed the static link to the implementation .zip file and moved the source to github.