Euclidean distance between two arrays of points using a vectorized approach

Question

I have two large NumPy arrays and want to calculate the Euclidean distance between their corresponding rows using sklearn. The MRE below produces the result I want, but since my real-life data is large, I really want a vectorized solution rather than a for loop.

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

n = 3             # number of features (columns)
sample_size = 5   # number of rows

X = np.random.randint(0, 10, size=(sample_size, n))
Y = np.random.randint(0, 10, size=(sample_size, n))

lst = []
for f in range(0, sample_size):
    # distance between row f of X and row f of Y, returned as a 1x1 matrix
    ed = euclidean_distances([X[f]], [Y[f]])
    lst.append(ed[0][0])

print(lst)

Best Answer


euclidean_distances computes the distance for every combination of rows in X and Y, producing a full n × m matrix. That matrix grows quadratically in memory and is completely unnecessary if you only want the distance between corresponding rows. sklearn provides a separate function, paired_distances, that does exactly that:

from sklearn.metrics.pairwise import paired_distances

d = paired_distances(X, Y)  # metric="euclidean" by default
# example output (values depend on the random X and Y):
# array([5.83095189, 9.94987437, 7.34846923, 5.47722558, 4.        ])
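A quick way to convince yourself that paired_distances reproduces the question's loop is to run both on the same seeded data. This is a minimal check, assuming scikit-learn and NumPy are installed; the seed and the helper name loop_distances are illustrative choices, not part of the original post.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, paired_distances


def loop_distances(X, Y):
    """Row-by-row distances, as in the question's for loop (hypothetical helper)."""
    out = []
    for i in range(X.shape[0]):
        # euclidean_distances returns a 1x1 matrix for a single pair of rows
        out.append(euclidean_distances([X[i]], [Y[i]])[0][0])
    return np.array(out)


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
X = rng.integers(0, 10, size=(5, 3))
Y = rng.integers(0, 10, size=(5, 3))

d_loop = loop_distances(X, Y)
d_paired = paired_distances(X, Y)  # one call, no Python-level loop

print(d_paired.shape)                 # (5,)
print(np.allclose(d_loop, d_paired))  # True
```

The loop makes one sklearn call per row, while paired_distances handles all rows in a single vectorized call and only ever allocates a length-n result.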

If you need the full pairwise distance matrix anyway, the paired distances are simply its diagonal (as pointed out in the comments):

d = euclidean_distances(X,Y).diagonal()
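The catch with the diagonal approach is that it still builds the full n × m matrix and then discards everything off the diagonal. The NumPy-only sketch below makes the shapes explicit; it uses a plain broadcasting formulation for illustration, not scikit-learn's actual implementation (note that this broadcasting version even creates an n × m × d temporary, which is heavier still).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.integers(0, 10, size=(4, 3)).astype(float)
Y = rng.integers(0, 10, size=(4, 3)).astype(float)

# All-pairs distances via broadcasting:
# X[:, None, :] has shape (4, 1, 3) and Y[None, :, :] has shape (1, 4, 3),
# so the difference has shape (4, 4, 3) and the summed result has shape (4, 4).
full = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))

# Row-wise (paired) distances only need a result of shape (4,).
paired = np.sqrt(((X - Y) ** 2).sum(axis=1))

print(full.shape)                            # (4, 4)
print(paired.shape)                          # (4,)
print(np.allclose(full.diagonal(), paired))  # True
```

To put numbers on it: with n = 100,000 rows, the full float64 matrix alone needs 100,000² × 8 bytes = 80 GB, whereas the paired result needs 100,000 × 8 bytes = 800 kB.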

Lastly: X and Y are NumPy arrays, so it is useful to know how to do this with the NumPy API directly (which is probably close to what sklearn does under the hood). Here are two equivalent examples:

d = np.linalg.norm(X - Y, axis=1)
d = np.sqrt(np.sum((X - Y)**2, axis=1))
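Here is a small worked example with hand-checkable values, running both NumPy one-liners side by side. It also adds an np.einsum variant, which is an extra formulation not in the original answer: it computes each row's dot product with itself, so no separate squared-difference array is needed. The input values are made up for illustration.

```python
import numpy as np

# Two rows per array; the expected distances are easy to check by hand:
# row 0: (0, 0) -> (3, 4) gives sqrt(9 + 16) = 5
# row 1: (1, 1) -> (1, 1) gives 0
X = np.array([[0, 0], [1, 1]])
Y = np.array([[3, 4], [1, 1]])

D = X - Y  # row-wise differences, shape (2, 2)

d_norm = np.linalg.norm(D, axis=1)               # Euclidean norm of each row
d_sum = np.sqrt(np.sum(D ** 2, axis=1))          # same thing, spelled out
d_einsum = np.sqrt(np.einsum("ij,ij->i", D, D))  # row-wise dot product of D with itself

print(d_norm)  # [5. 0.]
```

All three give the same vector; np.linalg.norm is the most readable, while the explicit sum and einsum forms make the arithmetic visible.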
