# [Scilab-users] Pairwise distance of a huge amount of points

11 messages
Open this post in threaded view
|

## [Scilab-users] Pairwise distance of a huge amount of points

 Hello, I need to compute the pairwise distance of a huge amount of points, namely n = 49545. I re-read a previous discussion http://mailinglists.scilab.org/Plot-overlays-on-images-tp2617675p4030599.htmlSo, I'm trying yo avoid loops, but I need then at least to have a vectors, or sparse matrices, containing n*(n+1)/2 elements, i.e. 1.227D+09, which of course exceeds the maximum stacksize. Any hint? FYI, three vectors of size n called X0, Y0 and Z0, the following procedure works well for small vectors (I tried with n = 1000) // ********** n = size(X0, "r"); // amount of points uns = sparse(triu(ones(n, n))); // sparse upper triangle matrix of ones Xd = meshgrid(X0).*uns; Yd = meshgrid(Y0).*uns; Zd = meshgrid(Z0).*uns; Xf = meshgrid(X0)'.*uns; Yf = meshgrid(Y0)'.*uns; Zf = meshgrid(Z0)'.*uns; D = sqrt((Xf - Xd).^2 + (Yf - Yd).^2 + (Zf - Zd).^2); // ********** in my case, it stops at step 2. The limiting operation is ones(n, n), which requires a n*n matrix. Using directly sparse() also requires a n^2 vector. It might be possible to work with higher values of n by generating a sparse matrix without generating a n*n matrix (or a n^2 vector), as there are only n*(n+1)/2 non-zeroes values -- Is it possible? (Anyway, in my case, it is still too big). -- Christophe Dang Ngoc Chan Mechanical calculation engineer ______________________________________________________________________ This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. ______________________________________________________________________ _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 Hello Christophe, Le 28/08/2014 16:22, Dang, Christophe a écrit : > Hello, > > I need to compute the pairwise distance of a huge amount of points, namely n = 49545. > > I re-read a previous discussion > http://mailinglists.scilab.org/Plot-overlays-on-images-tp2617675p4030599.html> > So, I'm trying yo avoid loops, I'm afraid you can't. You will need one loop over slices along one of both dimensions. Such a slicing algorithm in an identical situation is used inside members.sci. You may edit it and see. HTH Samuel _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 Hello, > De Samuel Gougeon > Envoyé : jeudi 28 août 2014 23:59 > > You will need one loop over slices along one of both dimensions. > Such a slicing algorithm in an identical situation is used inside members.sci. OK, I just overlooked the file but am lazy to read a 400 lines script and prefered to write from scratch (-: My script so far looks like // ********** n = size(X0, "r"); // amount of points dmax = 0; for i = 1:n-1     d = max(sqrt((X0(1:\$-i) - X0(i+1:\$)).^2 + (Y0(1:\$-i) - Y0(i+1:\$)).^2 +...         (Z0(1:\$-i) - Z0(i+1:\$)).^2));     if d > dmax then         dmax = d;     end end // ********** tested on a subset of 1000 data, it gives the same result as the naive n^2 calculation, and is 64 times faster. Thanks for the help. -- Christophe Dang Ngoc Chan Mechanical calculation engineer ______________________________________________________________________ This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. ______________________________________________________________________ _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 Just to close the subject: I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm. If I generate the sparse matrix from a n*(n-1)/2 vector, it is even worse: less efficient than the naive algorithm (1.1 times longer), this only to gain a factor 2 on the amount of points that can be handeled. Conclusion: forget the sparse matrices for this application. -- Christophe Dang Ngoc Chan Mechanical calculation engineer ______________________________________________________________________ This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. ______________________________________________________________________ _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

Jumping in late --
If your goal is to find nearest the N neighbors of a given point, look up the K-D Tree algorithm --( I used it some decades ago...  in my case the points were each of higher dimension -- around 9 or so.)

And if that is the application, and you want to go fast - consider using something other than the L-2 norm to set up the tree.  L-1 should be much faster - sum of absolute values of component distances; no squaring or square roots -- "Manhattan norm"
Once you have built the tree and retrieved a bunch of nearest neighbors, you can of course compute the L-2 distances over that much smaller set, if necessary...

S

 From: "Dang, Christophe" <[hidden email]> To: "International users mailing list for Scilab." <[hidden email]>, Date: 08/29/2014 06:58 AM Subject: Re: [Scilab-users] Pairwise distance of a huge amount of points Sent by: "users" <[hidden email]>

Just to close the subject:

I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm.

If I generate the sparse matrix from a n*(n-1)/2 vector,
it is even worse: less efficient than the naive algorithm (1.1 times longer),
this only to gain a factor 2 on the amount of points that can be handeled.

Conclusion: forget the sparse matrices for this application.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

______________________________________________________________________

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.
______________________________________________________________________
_______________________________________________
users mailing list
[hidden email]
http://cp.mcafee.com/d/1jWVIe418SyNsQsITd79EVpKrKrsKDtCXUUSVteXaaapJOWtSrLL6T4S7bLzC3t2toMY5v3UAvD8mVsTYfyh-sxrBPrTSvLtx_HYCCyyyyqeuLsKyqerTVBdV4sMZORQr8EGTsjVkffGhBrwqrhdECXYCej79zANOoUTsS03fBiteFqh-Ae00U9GX33VkDa3JsrFYq6SWv6xsxlK5LE2zVkDjGmAvF3w09JZdNcS2_id41Fr6bifQwnrFYq6BQQg1rphciFVEwgBiuMJVEwGWq8dd41dIzVEwRmHs9Cy2HFEwmHa4W6T6jp-xPMhmc

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 In reply to this post by Christophe Dang Ngoc Chan Le 29/08/2014 09:29, Dang, Christophe a écrit : > Hello, > >> De Samuel Gougeon >> Envoyé : jeudi 28 août 2014 23:59 >> >> You will need one loop over slices along one of both dimensions. >> Such a slicing algorithm in an identical situation is used inside members.sci. > OK, I just overlooked the file but am lazy to read a 400 lines script and prefered to write from scratch (-: Yes. Actually the algo in members()  defines and runs over multiline slices, that shortens a lot the number of iterations in the explicit loop (in order to maximize the vectorized (and so faster) processing of the bloc/slice), but is more complex to implement. Since using single lines in the loop already saves a lot of time for your case, and is enough, simplicity is smarter. Samuel _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 In reply to this post by Christophe Dang Ngoc Chan Le 29/08/2014 12:58, Dang, Christophe a écrit : > Just to close the subject: > > I tried to implement the algorithm with sparse matrices, and it is less efficient than scanning over one dimension: 7 times faster than the naive algorithm. Yes, it was somewhat expected. Also for memory consumption, the sparse encoding becomes interesting vs the dense encoding only when the sparsity becomes smaller than .. 50%. There was a quite recent thread on bugzilla, about the relevance of keeping a sparse encoding for the result of cos(SP) where SP is a sparse (so with a good fraction of null values). It was finally -- wisely -- decided (after some tests) to switch to a dense encoding for that case (as it should be for any function turning 0 into a non-null value). Samuel _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 In reply to this post by shorne Hello, > De : [hidden email] > Envoyé : vendredi 29 août 2014 16:38 > > If your goal is to find nearest the N neighbors of a given point, > look up the K-D Tree algorithm This is not the case, but I keep your advice in mind. Tree algorithms are often an elegant and efficient way to solve problems, but are less intuitive to me so I don't thin about them first. More generally speaking, I should keep in mind maybe to ask more general questions (i.e. presenting the problem to be solved instead of a solution used to solve it). In my case, I have coordinates of points of an elastic body obtained by finite elemnts analysis. The absolute elongation should be below a given value (this is imho not a relevant criterion, but that's the way it is), so I compute the difference of pairwise distances and look for the greater one. Anyway, tha,ks for the tip. -- Christophe Dang Ngoc Chan Mechanical calculation engineer This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 In reply to this post by Samuel GOUGEON Hello, > De : Samuel Gougeon > Envoyé : vendredi 29 août 2014 23:51 > > Actually the algo in members()  defines and runs over multiline slices, > that shortens a lot the number of iterations in the explicit loop [...] > but is more complex to implement. Sounds interesting. I'll have a look at it. Thanks. -- Christophe Dang Ngoc Chan Mechanical calculation engineer This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 In reply to this post by Samuel GOUGEON Hi, I am working in something related, which is deteriming overlaps in a sistem of spheres with random positions. So I have to compute the distance between centers of spheres, and compare it with the contact distance. I implemented a cell algorithm, which is used in molecular dynamics (you can look it up in internet if you don´t know it): essentially you divide the space in "cells", with a size that has to be larger than the maximum posible distance, then you identify in which cell each point is located, and then you compute the distance between each point and the points located in the same and the neigbhouring cells. The speed of the algorithm scales with n, and the computed distance matrix has n*c*b non-zero elements, where "c" is the average number of points in a cell and "b" is the number of neiborghs of each cells, plus one. "c*b" might be orders of magnitude smaller than "n" so there is a big advantaje in using sparse matrices. I used this with hundred thousands spheres without exeding the maximum stacksize. The only problem is that you need to know the maximum possible distance in order to define the cell size (if your cell is to small you can miss relevant meassurements, if it is too large, you lose efficiency); I don´t know if this is possible in your application. El 29/08/2014 07:01 p.m., Samuel Gougeon escribió: > Le 29/08/2014 12:58, Dang, Christophe a écrit : >> Just to close the subject: >> >> I tried to implement the algorithm with sparse matrices, and it is >> less efficient than scanning over one dimension: 7 times faster than >> the naive algorithm. > Yes, it was somewhat expected. Also for memory consumption, the sparse > encoding becomes interesting vs the dense encoding only when the > sparsity becomes smaller than .. 50%. > There was a quite recent thread on bugzilla, about the relevance of > keeping a sparse encoding for the result of cos(SP) where SP is a > sparse (so with a good fraction of null values). > It was finally -- wisely -- decided (after some tests) to switch to a > dense encoding for that case (as it should be for any function turning > 0 into a non-null value). > > Samuel > > _______________________________________________ > users mailing list > [hidden email] > http://lists.scilab.org/mailman/listinfo/users_______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Pairwise distance of a huge amount of points

 Hello, > De : Ezequiel Soule > Envoyé : lundi 1 septembre 2014 15:36 > > I implemented a cell algorithm, which is used in molecular dynamics [...] > you divide the space in "cells", with a size that has to be larger than the maximum > posible distance, then you identify in which cell each point is located, > and then you compute the distance between each point and the points located > in the same and the neigbhouring cells. > The speed of the algorithm scales with n, [...] > there is a big advantaje in using sparse matrices. That sounds a nice solution for some cases; I'll keep it in mind. Thanks for the information. -- Christophe Dang Ngoc Chan Mechanical calculation engineer This e-mail may contain confidential and / or privileged information. If you are not the intended recipient (or have received this e-mail in error), please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users