[Scilab-users] Pairwise distance of a huge amount of points

classic Classic list List threaded Threaded
11 messages Options
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

[Scilab-users] Pairwise distance of a huge amount of points

Hello,

I need to compute the pairwise distance of a huge amount of points, namely n = 49545.

I re-read a previous discussion
http://mailinglists.scilab.org/Plot-overlays-on-images-tp2617675p4030599.html

So, I'm trying yo avoid loops,
but I need then at least to have a vectors, or sparse matrices, containing n*(n+1)/2 elements,
i.e. 1.227D+09, which of course exceeds the maximum stacksize.

Any hint?

FYI, three vectors of size n called X0, Y0 and Z0,
the following procedure works well for small vectors
(I tried with n = 1000)

// **********
n = size(X0, "r"); // amount of points

uns = sparse(triu(ones(n, n))); // sparse upper triangle matrix of ones

Xd = meshgrid(X0).*uns;
Yd = meshgrid(Y0).*uns;
Zd = meshgrid(Z0).*uns;
Xf = meshgrid(X0)'.*uns;
Yf = meshgrid(Y0)'.*uns;
Zf = meshgrid(Z0)'.*uns;
D = sqrt((Xf - Xd).^2 + (Yf - Yd).^2 + (Zf - Zd).^2);
// **********

in my case, it stops at step 2.

The limiting operation is ones(n, n), which requires a n*n matrix.
Using directly sparse() also requires a n^2 vector.

It might be possible to work with higher values of n by generating a sparse matrix without generating a n*n matrix (or a n^2 vector), as there are only n*(n+1)/2 non-zeroes values -- Is it possible?

(Anyway, in my case, it is still too big).

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

______________________________________________________________________

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
______________________________________________________________________
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Samuel GOUGEON Samuel GOUGEON
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

Hello Christophe,

Le 28/08/2014 16:22, Dang, Christophe a écrit :
> Hello,
>
> I need to compute the pairwise distance of a huge amount of points, namely n = 49545.
>
> I re-read a previous discussion
> http://mailinglists.scilab.org/Plot-overlays-on-images-tp2617675p4030599.html
>
> So, I'm trying yo avoid loops,
I'm afraid you can't. You will need one loop over slices along one of
both dimensions.
Such a slicing algorithm in an identical situation is used inside
members.sci.
You may edit it and see.

HTH
Samuel

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

Hello,

> De Samuel Gougeon
> Envoyé : jeudi 28 août 2014 23:59
>
> You will need one loop over slices along one of both dimensions.
> Such a slicing algorithm in an identical situation is used inside members.sci.

OK, I just overlooked the file but am lazy to read a 400 lines script and prefered to write from scratch (-:

My script so far looks like

// **********

n = size(X0, "r"); // amount of points

dmax = 0;

for i = 1:n-1
    d = max(sqrt((X0(1:$-i) - X0(i+1:$)).^2 + (Y0(1:$-i) - Y0(i+1:$)).^2 +...
        (Z0(1:$-i) - Z0(i+1:$)).^2));
    if d > dmax then
        dmax = d;
    end
end

// **********

tested on a subset of 1000 data, it gives the same result as the naive n^2 calculation, and is 64 times faster.

Thanks for the help.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

______________________________________________________________________

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
______________________________________________________________________
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

Just to close the subject:

I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm.

If I generate the sparse matrix from a n*(n-1)/2 vector,
it is even worse: less efficient than the naive algorithm (1.1 times longer),
this only to gain a factor 2 on the amount of points that can be handeled.

Conclusion: forget the sparse matrices for this application.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

______________________________________________________________________

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
______________________________________________________________________
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
shorne shorne
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

Jumping in late --
If your goal is to find nearest the N neighbors of a given point, look up the K-D Tree algorithm --( I used it some decades ago...  in my case the points were each of higher dimension -- around 9 or so.)

And if that is the application, and you want to go fast - consider using something other than the L-2 norm to set up the tree.  L-1 should be much faster - sum of absolute values of component distances; no squaring or square roots -- "Manhattan norm"
Once you have built the tree and retrieved a bunch of nearest neighbors, you can of course compute the L-2 distances over that much smaller set, if necessary...

S




From: "Dang, Christophe" <[hidden email]>
To: "International users mailing list for Scilab." <[hidden email]>,
Date: 08/29/2014 06:58 AM
Subject: Re: [Scilab-users] Pairwise distance of a huge amount of points
Sent by: "users" <[hidden email]>





Just to close the subject:

I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm.

If I generate the sparse matrix from a n*(n-1)/2 vector,
it is even worse: less efficient than the naive algorithm (1.1 times longer),
this only to gain a factor 2 on the amount of points that can be handeled.

Conclusion: forget the sparse matrices for this application.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

______________________________________________________________________

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
______________________________________________________________________
_______________________________________________
users mailing list
[hidden email]
http://cp.mcafee.com/d/1jWVIe418SyNsQsITd79EVpKrKrsKDtCXUUSVteXaaapJOWtSrLL6T4S7bLzC3t2toMY5v3UAvD8mVsTYfyh-sxrBPrTSvLtx_HYCCyyyyqeuLsKyqerTVBdV4sMZORQr8EGTsjVkffGhBrwqrhdECXYCej79zANOoUTsS03fBiteFqh-Ae00U9GX33VkDa3JsrFYq6SWv6xsxlK5LE2zVkDjGmAvF3w09JZdNcS2_id41Fr6bifQwnrFYq6BQQg1rphciFVEwgBiuMJVEwGWq8dd41dIzVEwRmHs9Cy2HFEwmHa4W6T6jp-xPMhmc


_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Samuel GOUGEON Samuel GOUGEON
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

In reply to this post by Christophe Dang Ngoc Chan
Le 29/08/2014 09:29, Dang, Christophe a écrit :
> Hello,
>
>> De Samuel Gougeon
>> Envoyé : jeudi 28 août 2014 23:59
>>
>> You will need one loop over slices along one of both dimensions.
>> Such a slicing algorithm in an identical situation is used inside members.sci.
> OK, I just overlooked the file but am lazy to read a 400 lines script and prefered to write from scratch (-:
Yes. Actually the algo in members()  defines and runs over multiline
slices, that shortens a lot the number of iterations in the explicit
loop (in order to maximize the vectorized (and so faster) processing of
the bloc/slice), but is more complex to implement.
Since using single lines in the loop already saves a lot of time for
your case, and is enough, simplicity is smarter.

Samuel

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Samuel GOUGEON Samuel GOUGEON
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

In reply to this post by Christophe Dang Ngoc Chan
Le 29/08/2014 12:58, Dang, Christophe a écrit :
> Just to close the subject:
>
> I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm.
Yes, it was somewhat expected. Also for memory consumption, the sparse
encoding becomes interesting vs the dense encoding only when the
sparsity becomes smaller than .. 50%.
There was a quite recent thread on bugzilla, about the relevance of
keeping a sparse encoding for the result of cos(SP) where SP is a sparse
(so with a good fraction of null values).
It was finally -- wisely -- decided (after some tests) to switch to a
dense encoding for that case (as it should be for any function turning 0
into a non-null value).

Samuel

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

In reply to this post by shorne
Hello,

> De : [hidden email]
> Envoyé : vendredi 29 août 2014 16:38
>
> If your goal is to find nearest the N neighbors of a given point,
> look up the K-D Tree algorithm

This is not the case, but I keep your advice in mind.
Tree algorithms are often an elegant and efficient way to solve problems,
but are less intuitive to me so I don't thin about them first.

More generally speaking, I should keep in mind maybe to ask more general questions
(i.e. presenting the problem to be solved instead of a solution used to solve it).

In my case, I have coordinates of points of an elastic body obtained by finite elemnts analysis.

The absolute elongation should be below a given value
(this is imho not a relevant criterion, but that's the way it is),
so I compute the difference of pairwise distances and look for the greater one.

Anyway, tha,ks for the tip.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer
This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

In reply to this post by Samuel GOUGEON
Hello,

> De : Samuel Gougeon
> Envoyé : vendredi 29 août 2014 23:51
>
> Actually the algo in members()  defines and runs over multiline slices,
> that shortens a lot the number of iterations in the explicit loop [...]
> but is more complex to implement.

Sounds interesting.
I'll have a look at it. Thanks.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer
This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
ezequiel soule ezequiel soule
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

In reply to this post by Samuel GOUGEON
Hi, I am working in something related, which is deteriming overlaps in a
sistem of spheres with random positions. So I have to compute the
distance between centers of spheres, and compare it with the contact
distance. I implemented a cell algorithm, which is used in molecular
dynamics (you can look it up in internet if you don´t know it):
essentially you divide the space in "cells", with a size that has to be
larger than the maximum posible distance, then you identify in which
cell each point is located, and then you compute the distance between
each point and the points located in the same and the neigbhouring
cells. The speed of the algorithm scales with n, and the computed
distance matrix has n*c*b non-zero elements, where "c" is the average
number of points in a cell and "b" is the number of neiborghs of each
cells, plus one. "c*b" might be orders of magnitude smaller than "n" so
there is a big advantaje in using sparse matrices. I used this with
hundred thousands spheres without exeding the maximum stacksize.
The only problem is that you need to know the maximum possible distance
in order to define the cell size (if your cell is to small you can miss
relevant meassurements, if it is too large, you lose efficiency); I
don´t know if this is possible in your application.

El 29/08/2014 07:01 p.m., Samuel Gougeon escribió:

> Le 29/08/2014 12:58, Dang, Christophe a écrit :
>> Just to close the subject:
>>
>> I tried to implement the algorithm with sparse matrices, and it is
>> less efficient than scanning over one dimension: 7 times faster than
>> the naive algorithm.
> Yes, it was somewhat expected. Also for memory consumption, the sparse
> encoding becomes interesting vs the dense encoding only when the
> sparsity becomes smaller than .. 50%.
> There was a quite recent thread on bugzilla, about the relevance of
> keeping a sparse encoding for the result of cos(SP) where SP is a
> sparse (so with a good fraction of null values).
> It was finally -- wisely -- decided (after some tests) to switch to a
> dense encoding for that case (as it should be for any function turning
> 0 into a non-null value).
>
> Samuel
>
> _______________________________________________
> users mailing list
> [hidden email]
> http://lists.scilab.org/mailman/listinfo/users

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Christophe Dang Ngoc Chan Christophe Dang Ngoc Chan
Reply | Threaded
Open this post in threaded view
|

Re: Pairwise distance of a huge amount of points

Hello,

> De : Ezequiel Soule
> Envoyé : lundi 1 septembre 2014 15:36
>
> I implemented a cell algorithm, which is used in molecular dynamics [...]
> you divide the space in "cells", with a size that has to be larger than the maximum
> posible distance, then you identify in which cell each point is located,
> and then you compute the distance between each point and the points located
> in the same and the neigbhouring cells.
> The speed of the algorithm scales with n, [...]
> there is a big advantaje in using sparse matrices.

That sounds a nice solution for some cases; I'll keep it in mind.
Thanks for the information.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer
This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users