# finding similar values with scilab

9 messages

## finding similar values with scilab

Hello! I have a big question! I have a very large data archive of text files of the form:

```
18.87 2.6 0.00545558
19.98 2.6 0.00225349
18.87 2.6 0.00405905
13.32 2.6 0.01338288
19.98 2.6 0.01537532
18.87 2.6 0.00481375
19.98 2.6 0.00936207
12.21 2.6 0.00558517
```

I need to eliminate extreme values. If I plot a random text file, I obtain an image with some extreme points (plot not shown). If I zoom in with

```scilab
DN = find(DM(:,3) <= 0.0001 & DM(:,3) >= 0.00002);
dn = DM(DN,:);
figure(2)
plot(dn(:,1), dn(:,3), 'O')
```

I get a cleaner picture. My question is: does Scilab have a command that can help me find the "dense data", I mean, find all the values that are similar? I ask because I have about 20,000 text files and it is impossible for me to zoom in on each one by hand. Any suggestion or answer will help me.

## Re: finding similar values with scilab

Hi,

If you want to remove outliers from a data set, you can use the modified Thompson tau method. Basically, tau provides a limit based on the mean and standard deviation, so that values outside (mean +/- n * std dev) are rejected. The method is unfortunately iterative, but it works well. You can remove the located outliers using something like `x = x(find(delta < tau * sd))`, where `delta` is the absolute difference from the mean.

Hope that helps,
Mike.
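As a rough illustration (not from the thread, and in Python rather than Scilab), the modified Thompson tau iteration might look like the sketch below. The threshold formula uses the two-sided Student-t critical value; to stay dependency-free, this sketch takes that value as a fixed parameter instead of recomputing it as n shrinks, which a strict implementation would do.

```python
import math
from statistics import mean, stdev

def thompson_tau(n, t):
    """Modified Thompson tau rejection threshold for sample size n,
    given the two-sided Student-t critical value t at df = n - 2."""
    return t * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t * t))

def remove_outliers(data, t_crit):
    """Iteratively drop the point farthest from the mean while its
    deviation exceeds tau * std dev (modified Thompson tau test).
    Note: t_crit is held fixed here for simplicity; strictly it
    should be re-read from a t-table each time n decreases."""
    data = list(data)
    while len(data) > 2:
        m, s = mean(data), stdev(data)
        deltas = [abs(x - m) for x in data]
        worst = max(range(len(data)), key=deltas.__getitem__)
        if deltas[worst] > thompson_tau(len(data), t_crit) * s:
            del data[worst]          # reject the suspect point and re-test
        else:
            break
    return data

# Third-column values from the original post, plus one artificial
# outlier (0.09); 2.365 is the 95% two-sided t value for df = 9 - 2.
sample = [0.00545558, 0.00225349, 0.00405905, 0.01338288,
          0.01537532, 0.00481375, 0.00936207, 0.00558517, 0.09]
print(remove_outliers(sample, 2.365))
```

On this sample the artificial 0.09 point is rejected and the remaining eight values pass the test.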

## Re: finding similar values with scilab

In reply to this post by samaelkreutz: did you have a look at the "awk" and "sed" tools, which can be used in conjunction with Scilab?

## Re: finding similar values with scilab

I need to compute an iteration, but (obviously) I don't know how to do it =/. Read thousands of text files; for each one, check the mean, standard deviation, and coefficient of variation (std/mean). If the coefficient of variation is <= 0.4, stop the process. If not, eliminate the largest value and recalculate everything. But I'm lost, because I'm reading multiple text files and it gets confusing! Help!

```scilab
clc
clear all
z = [];                   // empty matrix, filled inside the loop
for i = 103               // 35:2:37
    for j = 2.6:0.1:3.9
        DM  = fscanfMat(msprintf("new-C1_%d_%3.1f.txt", i, j));
        //disp(i,j)
        med = mean(DM(:,3));
        std = st_deviation(DM(:,3));
        CV  = std / abs(med);
        Z   = [med std CV CV*100];
        z   = [z; Z];     // matrix with the required information
        //figure(i);
        //plot(DM(:,1), DM(:,3), 'ro*')
    end
end
z
```
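For illustration (in Python rather than Scilab), the "trim until CV <= 0.4" step described above could be sketched as follows; the 0.4 threshold and the drop-the-maximum rule are taken from the post, everything else is an assumption:

```python
from statistics import mean, stdev

def trim_to_cv(values, cv_limit=0.4):
    """Repeatedly drop the largest value until the coefficient of
    variation (std / |mean|) falls to cv_limit or below."""
    values = sorted(values)
    while len(values) > 2:
        m, s = mean(values), stdev(values)
        if s / abs(m) <= cv_limit:
            break
        values.pop()                 # eliminate the current maximum
    return values, mean(values), stdev(values)

# Third-column values from the sample data in the original post.
col3 = [0.00545558, 0.00225349, 0.00405905, 0.01338288,
        0.01537532, 0.00481375, 0.00936207, 0.00558517]
trimmed, m, s = trim_to_cv(col3)
print(len(trimmed), m, s)
```

On the sample data this drops the three largest values before the CV falls below 0.4, leaving five points.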

## Re: finding similar values with scilab

Do you have to do this on a single file and repeat it X times (where X is the thousands of files you're speaking of), or do you have to calculate the deviation and so on over the X files at the same time? Obviously, the first case is much easier than the second one...

## Re: finding similar values with scilab

As I already suggested, "awk"/"sed" and other Linux tools (available under Windows too) can be used in conjunction with Scilab.

1st step: concatenate all the files if necessary (with Scilab, for example). Then:

- Average: http://azimuth.biz/2009/03/30/calculate-average-using-awk/
- Standard deviation: http://azimuth.biz/2010/02/25/calculate-standard-deviation-using-awk/
- Min-Max: http://azimuth.biz/2010/10/20/find-minimum-and-maximum-using-awk/
- Suppress values greater than a threshold (ditto for the min value): http://www.unix.com/shell-programming-scripting/178904-awk-show-lines-where-nth-column-greater-than-some-number.html and http://oreilly.com/catalog/unixnut3/chapter/ch11.html

and so on. I'm not an awk/sed specialist, but I turn to such tools when my input files have millions of lines... and they work fine and fast! Just an idea.

NB, tutorials: http://www.grymoire.com/Unix/Awk.html and http://www.cs.unibo.it/~renzo/doc/awk/nawkA4.pdf etc.

## Re: finding similar values with scilab

It's the second option... with a for loop I'm calling the files; I have about 2,000 files, each one with hundreds of lines. And yes, I manipulate the files with awk; at the beginning I was programming the process in bash, but it becomes slow (it takes about half an hour; OK, that's not so much, but for me it is). Second, I need to eliminate the outliers, because if I take a random file and compute the mean and std, the mean is very sensitive to extreme values, and sometimes those values aren't representative. I have to eliminate them, recalculate without that outlier, and test again: is the coefficient std/mean less than 0.4? If not, find another outlier, eliminate it, and repeat until the coefficient is 0.4 or less. Then save the required information (mean, std, coefficient) into a matrix, take the next text file, repeat the process, and append the results to the matrix. At the end I expect to obtain a representative matrix with the information from all my text files.

As a user posted here, the modified Thompson tau method sounds great; I think I'm going to try it... Seriously, I never thought statistics were useful until my thesis, obviously... =/
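The per-file workflow described above (trim each file's third column until CV <= 0.4, then record one row of mean/std/CV per file) could be sketched like this in Python; the file-name pattern, the whitespace-separated three-column layout, and the drop-the-maximum trimming rule are taken from earlier messages, and the helper names are illustrative:

```python
import glob
from statistics import mean, stdev

def trim_to_cv(values, cv_limit=0.4):
    """Drop the largest value until std / |mean| <= cv_limit."""
    values = sorted(values)
    while len(values) > 2 and stdev(values) / abs(mean(values)) > cv_limit:
        values.pop()
    return values

def summarize_files(paths):
    """One row [mean, std, CV] per file, computed on the trimmed
    third column of each whitespace-separated text file."""
    rows = []
    for path in paths:
        with open(path) as f:
            col3 = [float(line.split()[2]) for line in f if line.strip()]
        t = trim_to_cv(col3)
        m, s = mean(t), stdev(t)
        rows.append([m, s, s / abs(m)])
    return rows

# Example usage with the file-name pattern from the earlier script:
# rows = summarize_files(glob.glob("new-C1_*.txt"))
```

Each row of the result plays the role of one `Z = [med std CV]` row in the Scilab loop posted earlier.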

## Re: finding similar values with scilab

Hello,

> De la part de samaelkreutz
> Envoyé : samedi 22 février 2014 15:01
>
> I need to eliminate the outliers, because if I take a random file and
> compute the mean and std, the mean is very sensitive to extreme values
> and sometimes those values aren't representative.

Another approach is to use a position criterion that is less sensitive to outliers, e.g. the median.

Best regards.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer
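As a sketch of the median-based idea (not something spelled out in the thread), one common robust filter uses the median absolute deviation (MAD); the cutoff of 3 scaled MADs below is a convention, not a value from the post:

```python
from statistics import median

def mad_filter(values, n_mads=3.0):
    """Keep points within n_mads scaled MADs of the median.
    The 1.4826 factor makes the MAD consistent with the std dev
    of a normal distribution."""
    med = median(values)
    mad = 1.4826 * median(abs(x - med) for x in values)
    if mad == 0:
        return list(values)          # degenerate case: no spread
    return [x for x in values if abs(x - med) <= n_mads * mad]

# Sample third-column values plus one artificial outlier (0.09).
col3 = [0.00545558, 0.00225349, 0.00405905, 0.01338288,
        0.01537532, 0.00481375, 0.00936207, 0.00558517, 0.09]
print(mad_filter(col3))
```

Unlike the mean/std iteration, this needs no loop: the median and MAD are barely moved by the outlier, so one pass removes it.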

## Re: finding similar values with scilab

On 2/24/2014 10:36, Dang, Christophe wrote:

> Another approach is to use a position criterion that is less
> sensitive to outliers, e.g. the median.

I agree with Christophe... it sounds like you should take a look at median filtering.

/Claus