finding similar values with scilab

samaelkreutz

Hello! I have a big question.
I have a very large data archive made of text files of the form:

18.87 2.6 0.00545558
19.98 2.6 0.00225349
18.87 2.6 0.00405905
13.32 2.6 0.01338288
19.98 2.6 0.01537532
18.87 2.6 0.00481375
19.98 2.6 0.00936207
12.21 2.6 0.00558517

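I need to eliminate extreme values. For example, if I plot a random text file I get an image like this:

<http://mailinglists.scilab.org/file/n4028792/Captura_de_pantalla_2014-02-21_a_la%28s%29_2.07.54.png>

As you can see, there are extreme points. If I zoom in with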

DN=find(DM(:,3)<=0.0001 & DM(:,3)>=0.00002 ) // rows whose 3rd column lies inside the zoom window
dn=DM(DN,:)                                  // keep only those rows
figure(2)
plot(dn(:,1),dn(:,3),'o')

I get this picture:

<http://mailinglists.scilab.org/file/n4028792/Captura_de_pantalla_2014-02-21_a_la%28s%29_2.10.56.png>

My question is: does Scilab have a command that can help me find the "dense data", I mean all the values that are similar? I ask because I have about 20,000 text files and it is impossible for me to make these "handmade zooms" for each one.

Any suggestion or answer would help me.

Mike Page

Hi,

If you want to remove outliers from a data set, you can use the modified
Thompson Tau method.  Basically tau provides a limit based on mean and
standard deviation so that values outside (mean +/- n * std dev) are
rejected.  The method is unfortunately iterative, but it works well.  You
can remove the located outliers using something like x = x(find(delta < tau
* sd)), where delta is the absolute difference from the mean.
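
A minimal Scilab sketch of the iteration described above, under stated assumptions rather than a definitive implementation: the helper name thompson_tau_keep and the significance level alpha are hypothetical, tau is computed from the Student t quantile with n-2 degrees of freedom (obtained with cdft), and only the single most extreme point is tested on each pass. stdev is the newer name of the st_deviation function used elsewhere in this thread.

```scilab
// Modified Thompson tau (sketch): returns the indices of the values to keep.
// v     : data vector
// alpha : significance level (an assumed choice, e.g. 0.05)
function keep = thompson_tau_keep(v, alpha)
    v = v(:);
    keep = (1:size(v, "*"))';
    while size(keep, "*") > 2
        x = v(keep);
        n = size(x, "*");
        sd = stdev(x);
        if sd == 0 then break; end
        // two-sided Student t critical value with n-2 degrees of freedom
        t = cdft("T", n - 2, 1 - alpha/2, alpha/2);
        tau = t*(n - 1) / (sqrt(n)*sqrt(n - 2 + t^2));
        // most extreme point: largest absolute difference from the mean
        d = abs(x - mean(x));
        k = find(d == max(d), 1);
        if d(k) <= tau*sd then break; end
        keep(k) = []; // reject it and recompute mean, sd and tau
    end
endfunction
```

With the matrix from the original post, something like keep = thompson_tau_keep(DM(:,3), 0.05); dn = DM(keep,:); would then drop the rejected rows.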

Hope that helps,
Mike.


-----Original Message-----
From: samaelkreutz
Sent: 21 February 2014 05:18
Subject: [Scilab-users] finding similar values with scilab

paul.carrico

In reply to this post by samaelkreutz
Have you had a look at the "awk" and "sed" tools, which can be used in conjunction
with Scilab?




samaelkreutz

I need to compute an iteration, but (obviously) I don't know how to do it =/
I read thousands of text files, and for each one I check the mean, the standard deviation and the coefficient of variation (std/mean). If the coefficient of variation is <= 0.4, the process stops. If not, I eliminate the largest value and recalculate everything.

But I'm lost, because I'm reading multiple text files and it is confusing!
Help!




clc
clear
z = []; // empty matrix that the loops below fill row by row
for i = 103 // alternative range: 35:2:37
    for j = 2.6:0.1:3.9
        DM = fscanfMat(msprintf("new-C1_%d_%3.1f.txt", i, j));
        //disp(i,j)
        med = mean(DM(:,3));
        std = st_deviation(DM(:,3)); // called stdev in newer Scilab releases
        CV = std/abs(med);           // coefficient of variation
        Z = [med std CV CV*100];
        z = [z; Z]; // matrix with the required information
        //figure(i);
        //plot(DM(:,1), DM(:,3), 'ro*')
    end
end
z
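
The iteration described in this post (drop the largest value and recompute until std/mean <= 0.4) could be sketched as below. This is only one possible reading: the helper name trim_until_cv is hypothetical, "the big value" is taken to mean the maximum of the column, and a guard (not in the post) stops the loop when only two values remain. stdev is the newer name of st_deviation.

```scilab
// Remove the largest value of v until std/|mean| <= cv_max (sketch).
// row : [mean std CV] of the surviving values
// v   : the surviving values
function [row, v] = trim_until_cv(v, cv_max)
    v = v(:);
    while %t
        med = mean(v);
        sd = stdev(v);
        cv = sd/abs(med);
        // stop when the criterion is met, or when too few values remain
        if cv <= cv_max | size(v, "*") <= 2 then break; end
        k = find(v == max(v), 1); // position of the largest value
        v(k) = [];
    end
    row = [med sd cv];
endfunction
```

Inside the double loop above, the four statistics lines would then reduce to something like [Z, kept] = trim_until_cv(DM(:,3), 0.4); z = [z; Z]; so that each file contributes one row to z.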

paul.carrico

Do you have to do this on a single file and repeat it X times (where X is the thousands of files you mention), or do you have to calculate the deviation and so on over all X files at the same time?

Obviously the first case is much easier than the second one.



-----Original Message-----
From: samaelkreutz
Sent: Saturday 22 February 2014 00:26
Subject: Re: [Scilab-users] finding similar values with scilab

paul.carrico

I have already suggested using "awk" / "sed" and other Linux tools (even under Windows) in conjunction with Scilab.

First step: concatenate all the files if necessary (with Scilab, for example).

Average:
http://azimuth.biz/2009/03/30/calculate-average-using-awk/

Standard deviation:
http://azimuth.biz/2010/02/25/calculate-standard-deviation-using-awk/


Min-Max:
http://azimuth.biz/2010/10/20/find-minimum-and-maximum-using-awk/


Suppress values greater than a threshold (ditto for a minimum value):
http://www.unix.com/shell-programming-scripting/178904-awk-show-lines-where-nth-column-greater-than-some-number.html
http://oreilly.com/catalog/unixnut3/chapter/ch11.html


and so on.

I'm not an awk/sed specialist, but I turn to such tools when my input files have millions of lines... and it works fine and fast!
Just an idea.


NB: tutorials
http://www.grymoire.com/Unix/Awk.html
http://www.cs.unibo.it/~renzo/doc/awk/nawkA4.pdf
etc.




-----Original Message-----
From: Paul CARRICO
Sent: Saturday 22 February 2014 08:48
Subject: Re: [Scilab-users] finding similar values with scilab

samaelkreutz

It is the second option...
I call the files with a for loop. I have about 2,000 files, each one with hundreds of lines. And yes, I manipulate the files with awk. At the beginning I was programming the process in bash, but there were two problems. First, it becomes slow (it takes about half an hour; OK, that is not so much, but for me it is). Second, I need to eliminate the outliers, because if I take a random file and compute the mean and std, the mean is very sensitive to extreme values, and sometimes those values aren't representative.

I have to eliminate them, recalculate without that outlier and test again: is the coefficient std/mean less than 0.4? If not, find another outlier, eliminate it and repeat, and so on until the coefficient is 0.4 or less. Then I save the required information (mean, std, coefficient) into a matrix, take the next text file, repeat the process and save the results in the same matrix. At the end I intend to obtain a representative matrix with the information from all my text files.

As a user posted here, I think the modified Thompson Tau method sounds great... I think I'm going to try it. Seriously, I never thought that statistics were useful, until my thesis obviously...     =/

Christophe Dang Ngoc Chan

Hello,

> On behalf of samaelkreutz
> Sent: Saturday 22 February 2014 15:01
>
> I need to eliminate the outliers, because if I take a random file and
> compute the mean and std, the mean is so sensitive to extreme values
> and sometimes those values aren't representative.

Another approach is to use another position criterion that is less
sensitive to outliers, e.g. the median.
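
A small sketch of what a median-based criterion could look like in Scilab, offered as an illustration rather than as the method meant above: the helper name mad_keep, the use of the median absolute deviation (MAD) as spread, the 1.4826 scale factor and the cutoff of 3 are all assumptions. The test vector is the third column of the sample rows from the first post, with one artificial outlier (0.9) appended.

```scilab
// Indices of the values lying within k scaled MADs of the median (sketch).
function keep = mad_keep(v, k)
    v = v(:);
    m = median(v);
    // MAD, scaled so that it is comparable to a standard deviation for normal data
    s = 1.4826*median(abs(v - m));
    keep = find(abs(v - m) <= k*s);
endfunction

// third column of the sample data, plus one artificial outlier (0.9)
v = [0.00545558; 0.00225349; 0.00405905; 0.01338288; 0.01537532; ..
     0.00481375; 0.00936207; 0.00558517; 0.9];
keep = mad_keep(v, 3);
dn = v(keep);
mprintf("median = %g, mean = %g, kept %d of %d values\n", ..
        median(v), mean(v), size(dn, "*"), size(v, "*"));
```

Unlike the mean, the median of v barely moves when the artificial outlier is appended, which is why no iteration is needed here.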

Best regards.

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer


Claus Futtrup

On 2/24/2014 10:36, Dang, Christophe wrote:

> Another approach is to use another position criterion that is less
> sensitive to outliers, e.g. the median.

I agree with Christophe ... it sounds like you should take a look at
median filtering.
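
For illustration only, a running-median filter could be sketched in Scilab as follows. The helper name median_filter and the window width w are hypothetical, the window is simply shortened at both ends of the vector, and the sketch assumes the values are processed in the order in which they appear in the file.

```scilab
// Running-median filter (sketch): each value is replaced by the median
// of a window of about w samples centred on it.
function y = median_filter(x, w)
    x = x(:);
    n = size(x, "*");
    h = floor(w/2); // half-width of the window
    y = zeros(n, 1);
    for i = 1:n
        lo = max(1, i - h); // the window is clipped at the vector ends
        hi = min(n, i + h);
        y(i) = median(x(lo:hi));
    end
endfunction
```

Applied as y = median_filter(DM(:,3), 5), isolated spikes are replaced by the median of their neighbours; this smooths the series rather than deleting rows.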

/Claus
