I want to replicate MATLAB's randperm()
with NumPy.
Currently, to get randperm(n, k)
I use np.random.permutation(n)[:k]
. The problem is it allocates an array of size n
then takes only k
entries of it.
Is there a more memory efficient way to directly create the array?
Best Answer
numpy.random.choice(n, k, replace=False)
is no more memory efficient than numpy.random.permutation(n)[:k]
. It too creates an n
-item temporary list, shuffles that list, and takes k
items from that list. See:
- Comparison of np.random.choice vs np.random.shuffle for samples without replacement
However, numpy.random.*
functions, such as numpy.random.choice
and numpy.random.permutation
, have become legacy functions as of NumPy 1.17, and their algorithms — inefficiencies and all — are expected to remain as they are for backward compatibility reasons (see the recent RNG policy for NumPy).
Fortunately, NumPy since version 1.17 has an alternative:numpy.random.Generator.choice
, which uses a much more efficient implementation, as can be seen below:
In [227]: timeit np.random.choice(4000000, 48, replace = False) 163 ms ± 19.3 ms per loop (mean ± std. Dev. Of 7 runs, 1 loop each)In [228]: timeit np.random.permutation(4000000)[:48] 178 ms ± 22.5 ms per loop (mean ± std. Dev. Of 7 runs, 1 loop each)In [229]: r=numpy.random.default_rng() In [230]: timeit r.choice(4000000,48,replace=False) 14.5 µs ± 28.9 ns per loop (mean ± std. Dev. Of 7 runs, 100000 loops each)
If you use NumPy 1.17 or later, you should make use of the new pseudorandom number generation system introduced in version 1.17, including numpy.random.Generator
, in newer applications.
I can recommend you np.random.choice(n, k, replace = False)
.Yet, I am not sure about memory efficiency.Please refer to docs
Based on @TaQ answer:
np.random.choice(n, k, replace = False)
Is the equivalent to MATLAB's randperm()
.
Update: I will update his answer as well to mark it.