How does the numpy function `array_split` work mathematically?

Question

I need to write a Python function that when passed an array, and an integer N, returns the contents of the array divided into N sub-arrays of equal size.

If the length of the array cannot be divided equally by N, the final sub-arrays must be of suitable length to accommodate the remaining elements.

Example:split_array(array=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n=4)

Should output: [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]

My research indicated that the numpy.array_split function does exactly that and I looked at the source code on GitHub and found that first it composes an array containing all the sizes of the sub-arrays which it then iterates over to split the original array.

Abridged sample from numpy.array_split

def array_split(ary, indices_or_sections, axis=0):# indices_or_sections is a scalar, not an array.Nsections = int(indices_or_sections)if Nsections <= 0:raise ValueError('number sections must be larger than 0.')Neach_section, extras = divmod(Ntotal, Nsections)section_sizes = ([0] +extras * [Neach_section+1] +(Nsections-extras) * [Neach_section])div_points = _nx.array(section_sizes, dtype=_nx.intp).cumsum()sub_arys = []sary = _nx.swapaxes(ary, axis, 0)for i in range(Nsections):st = div_points[i]end = div_points[i + 1]sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))return sub_arys

The only thing I'm struggling to understand is how the variable section_sizes is created mathematically.For the example split_array(array=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n=4) it builds a list of sizes which would be [3, 3, 2, 2] which is exactly what I need but I don't understand why it works.

I understand that divmod(Ntotal, Nsections) will give you the quotient(Neach_section) and remainder(extras) of a division calculation.

But why does quotient * [remainder+1] always give you the exact right number of correctly-sized "quotient" sub-array sizes (In the case of this example [3, 3])?

Why does [quotient-remainder] * quotient give you the exact right number of correctly-sized "remainder" sub-array sizes (In the case of this example [2, 2])?

Could someone even just tell me what this kind of operation is called or what branch of mathematics this deals with as it's not something I've come across before.

Best Answer

Accepted Answer

For clarity, I'll refer to this:

Neach_section, extras = divmod(Ntotal, Nsections)section_sizes = ([0] +extras * [Neach_section+1] +(Nsections-extras) * [Neach_section])

as

quotient, remainder = divmod(Ntotal, Nsections)section_sizes = ([0] +remainder * [quotient+1] +(Nsections- remainder) * [quotient])

First lets imagine a similar case to the one shown in your question. (Modified for Quotient != remainder)

print(np.array_split(np.arange(1,15),4) >>>[array([1, 2, 3, 4]), array([5, 6, 7, 8]), array([ 9, 10, 11]), array([12, 13, 14])]

Its easier to think it in terms of the division that this ultimately represents.

14 = 4*3 + 2

Which is also the same as

14 = (3 + 3 + 3 + 3) + 2

= (3 + 3 + 3 + 3) + (1 + 1)

And critically we can add those ones to the first two terms in the first bracket.

14 = 4 + 4 + 3 + 3

In general what we've done is we're adding one to the first (Remainder) terms of the output list leaving us with the snippet of code

...remainder * [quotient+1]...

Out of the (Quotient) terms in the output we have added the first (remainder) terms leaving us with the next (quotient-remainder) terms to fill

...(Nsections- remainder) * [quotient])

Leaving us with the final code.

Could someone even just tell me what this kind of operation is called or what branch of mathematics this deals with as it's not something I've come across before.

I believe this is loosely related to number theory and the quotient-remainder theorem is probably one of the first things you learn for it.

Anyway I hope that helped :)

How does the numpy function `array_split` work mathematically?

Best Answer

Random Posts