How do I find the Euclidean distance between two vectors:
x1 <- rnorm(30)
x2 <- rnorm(30)
Best Answer
Use the dist() function, but you need to form a matrix from the two inputs for the first argument to dist():
dist(rbind(x1, x2))
For the input in the OP's question we get:
> dist(rbind(x1, x2))
        x1
x2 7.94821
a single value that is the Euclidean distance between x1 and x2.
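As a quick sanity check, the sketch below compares the dist() result with the explicit formula and shows one way to pull a bare number out of the returned "dist" object. The seed is an arbitrary choice for reproducibility and is not part of the original question, so the value will differ from the one printed above.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x1 <- rnorm(30)
x2 <- rnorm(30)

d <- dist(rbind(x1, x2))     # object of class "dist" holding one pairwise distance
d_num <- as.numeric(d)       # drop the class and labels to get a plain number

# compare against the textbook definition
manual <- sqrt(sum((x1 - x2)^2))
all.equal(d_num, manual)
```

Using as.numeric() is handy when the distance feeds into further arithmetic and the row/column labels are not needed.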
As defined on Wikipedia, this should do it.
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2) ^ 2))
There's also the rdist function in the fields package that may be useful. See here.
EDIT: Changed ** operator to ^. Thanks, Gavin.
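A small worked example may help confirm that euc.dist behaves as expected. The 3-4-5 triangle gives a distance that is easy to check by hand. The length-checked variant, euc.dist.checked, is a hypothetical helper added here for illustration and is not part of the original answer; it guards against R recycling a shorter vector (which happens silently when one length is a multiple of the other).

```r
euc.dist <- function(x1, x2) sqrt(sum((x1 - x2)^2))

a <- c(0, 0)
b <- c(3, 4)
euc.dist(a, b)   # 5, the hypotenuse of a 3-4-5 right triangle

# hypothetical variant: refuse inputs of different lengths instead of recycling
euc.dist.checked <- function(x1, x2) {
  stopifnot(length(x1) == length(x2))
  sqrt(sum((x1 - x2)^2))
}
euc.dist.checked(a, b)
```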
Try using this:
sqrt(sum((x1-x2)^2))
If you want to use less code, you can also use the norm function in the base package (the 'F' stands for Frobenius; for a single-column matrix the Frobenius norm is the Euclidean norm):
norm(matrix(x1-x2), 'F')
While this may look a bit neater, it's not faster. Indeed, a quick test on very large vectors shows little difference, though so12311's method is slightly faster. We first define:
set.seed(1234)
x1 <- rnorm(300000000)
x2 <- rnorm(300000000)
Then testing for time yields the following:
> system.time(a<-sqrt(sum((x1-x2)^2)))
   user  system elapsed
   1.02    0.12    1.18
> system.time(b<-norm(matrix(x1-x2), 'F'))
   user  system elapsed
   0.97    0.33    1.31
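To see why the norm() call above gives the Euclidean distance, it can help to look at a tiny case: matrix(x1 - x2) produces a single-column matrix, and the Frobenius norm is the square root of the sum of its squared entries. The vectors below are made-up values chosen so the answer is easy to verify by hand.

```r
x1 <- c(1, 2, 3)
x2 <- c(4, 6, 3)

diff_col <- matrix(x1 - x2)      # single-column (3 x 1) matrix
dim(diff_col)

frob   <- norm(diff_col, "F")    # sqrt(9 + 16 + 0)
manual <- sqrt(sum((x1 - x2)^2))
c(frob, manual)                  # both are 5
```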
If you need to quickly calculate the Euclidean distance between one vector and a matrix of many vectors, then you can use the tcrossprod method from this answer:
bench=function(...,n=1,r=3){
  a=match.call(expand.dots=F)$...
  t=matrix(ncol=length(a),nrow=n)
  for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
  o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
  round(100*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}

es=3:6
r=sapply(es,function(e){
  m=matrix(rnorm(10^e),ncol=10)
  v=rnorm(10)
  bench(n=10,
    tcrossprod={sqrt(outer(rowSums(m^2),rowSums(t(v)^2),"+")-tcrossprod(m,2*t(v)))},
    Rfast_dista={Rfast::dista(m,t(v))},
    vectorized={sapply(colSums((v-t(m))^2),sqrt)},
    regular={apply(m,1,function(x)sqrt(sum((v-x)^2)))},
    dotproduct={apply(m,1,function(x){q=v-x;sqrt(q%*%q)})},
    norm={apply((v-t(m)),2,function(x)norm(as.matrix(x),"F"))},
    rbind_single={apply(m,1,function(x)dist(rbind(v,x))[1])},
    rbind_all={if(e<=5)unname(as.matrix(dist(rbind(v,m)))[1,-1])}
  )[,1]
})
colnames(r)=paste0("1e",es)
r[8,4]=NA
round(r,3)
Output:
               1e3   1e4     1e5     1e6
tcrossprod   0.004 0.012   0.093   0.972
Rfast_dista  0.003 0.013   0.115   1.148
vectorized   0.006 0.038   0.366   3.835
regular      0.021 0.181   1.901  20.437
dotproduct   0.020 0.183   2.010  24.560
norm         0.057 0.562   6.017  62.532
rbind_single 0.159 1.592  16.982 181.834
rbind_all    0.036 3.493 530.259      NA
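The tcrossprod entry rests on expanding the squared distance as ||m_i - v||^2 = ||m_i||^2 + ||v||^2 - 2 (m_i . v), which turns the row-by-row loop into a single matrix product. The sketch below is a simplified illustration of that identity on a small made-up matrix, not the benchmarked code itself: it uses %*% rather than tcrossprod, and the pmax() clamp is an addition here to guard against tiny negative values from rounding.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
m <- matrix(rnorm(50), ncol = 10)    # 5 row vectors of length 10
v <- rnorm(10)

# ||m_i - v||^2 = ||m_i||^2 + ||v||^2 - 2 * (m_i . v), for every row i at once
sq <- rowSums(m^2) + sum(v^2) - 2 * drop(m %*% v)
d_expanded <- sqrt(pmax(sq, 0))      # clamp rounding noise below zero

# reference: the plain row-by-row computation
d_rowwise <- apply(m, 1, function(x) sqrt(sum((v - x)^2)))
all.equal(d_expanded, d_rowwise)
```

One caveat with the expanded form is that it subtracts quantities of similar size, so it can lose precision when a row is very close to v relative to their norms; the row-wise version does not have that issue.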