[Scilab-users] Count specific values in text file

arctica1963

[Scilab-users] Count specific values in text file

Hello all,

Basic query. I have text files of Pi and e to a million places and I want to
scan the number for the occurrences of particular values and the separation
of those values in the number.

This code snippet works on a vector:

// create vector of elements

A=[1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 4 4 5 5 5 5 8 8];

// Count number of values, e.g. 1

Val_count1= sum(A==1)

disp(Val_count1) // answer 7

I can open the text file with pinum=mopen('pi-million.txt','rt'), but does
it need to be changed to a vector where each digit is an element? At the
moment the text file is just a single line of digits, with no decimal point
at the start (e.g. 314159...).

Any pointers would be good. Sorry if this is a basic one!

Thanks



--
Sent from: http://mailinglists.scilab.org/Scilab-users-Mailing-Lists-Archives-f2602246.html
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Dang Ngoc Chan, Christophe

Re: {EXT} Count specific values in text file

Hello,

> On behalf of arctica1963
> Sent: Tuesday, 11 February 2020 13:11
>
> I have text files of Pi and e to a million places [...]
>
> I can open the text file with pinum=mopen('pi-million.txt','rt') - but does it
> need to be changed to a vector where each value is an element? At the
> moment the text file is just a single line of numbers (no decimal point at the
> start, e.g. 314159...etc).

You might have a look at csvRead and csvTextScan

https://help.scilab.org/docs/6.0.2/en_US/csvRead.html

https://help.scilab.org/docs/6.0.2/en_US/csvTextScan.html

Hope this helps,

regards

--
Christophe Dang Ngoc Chan
Mechanical calculation engineer

Samuel GOUGEON

Re: Count specific values in text file

In reply to this post by arctica1963
Hello,

On 11/02/2020 at 13:10, arctica1963 wrote:
> Hello all,
>
> Basic query. I have text files of Pi and e to a million places and I want to
> scan the number for the occurrences of particular values and the separation
> of those values in the number.

I am afraid I do not clearly understand the purpose.
Could you post a sample and the expected result for it?

You may also have a look at grep().

Regards
Samuel

Carrico, Paul-2

Re: [EXTERNAL] Count specific values in text file

In reply to this post by arctica1963
Hi

Is this what you are expecting, i.e. a way to reshape your matrix, among other things?

Paul
##################################################
mode(0)
A=[1 2 3 4 5 6 7 8 9 1 1 1 1 1 1 4 4 5 5 5 5 8 8];

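// locate and count the occurrences of the value 1
loc = find(A == 1);
occ = size(loc, "*");
printf("Number of occurrences = %d\n",occ);

// reshaping: B holds the same values on two rows, padded with a trailing 0
B=[1 2 3 4 5 6 7 8 9 1 1 1 ;
   1 1 1 4 4 5 5 5 5 8 8 0];
[n,m] = size(B);
// matrix() fills column by column, so transpose B first to keep the row-by-row order
C = matrix(B',(n*m),1);

// then look for your specific values in C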


der_Phil

Re: [EXTERNAL] Count specific values in text file

Hi,

is this what you're looking for?
// path to the txt file
path = 'pathToFile'

// read the file as string
piAsString = csvRead(path, [],['.'],'string')

// split the string at the decimal
[piAsString] = strsplit(piAsString,'.');

// just get the digits
piDigits = strtod(strsplit(piAsString(2)));

// search: how often does a certain value appear in a specific range within the digits
searchRange = 100;
searchVal = 1;

[locations] = find(piDigits(1:searchRange) == searchVal);

printf("The number %d appears %d times in the first %d digits of Pi\n",searchVal,length(locations),searchRange) ;
printf("The number %d appears at following locations: \n", searchVal);
disp(locations');
Best regards,
Philipp





arctica1963

Re: [EXTERNAL] Count specific values in text file

Hello Philipp

Your suggestion is kind of what I am trying to do, but the text file is not
a CSV structure. It is just a single, very big number on one row. A small
chunk:

31415926535897932384626433832795028841971693993751058209749445923078164 etc
(no spaces between digits)

How best to load the text file and then count the number of occurrences of a
specific digit (e.g. 1)? At the moment the file essentially contains a very
big integer.

Interested to see the best method of doing this.

Thanks
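
A minimal sketch of the conversion asked about here, assuming the digit string is already available as a single Scilab string. It reuses the sample chunk quoted above and the counting idiom from the first post; reading the file itself is a separate step and is not shown.

```scilab
// Sample digits copied from the message above (no decimal point)
t = "31415926535897932384626433832795028841971693993751058209749445923078164";

// ascii() maps each character to its character code; subtracting the code
// of "0" turns the characters "0".."9" into the numbers 0..9
A = ascii(t) - ascii("0");

// Same counting idiom as for a hand-written vector
Val_count1 = sum(A == 1);

disp(Val_count1) // 6 for this 71-digit sample
```

Once the digits are in a numeric vector, any of the vector-based suggestions in this thread (sum, find, tabul) apply unchanged.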





Tan Chin Luh

Re: [EXTERNAL] Count specific values in text file

Hi,

The tabul function might help.

Unzip PimultDP.Zip; the following lines do the counting job.

I have not tried it on a million decimal places, good luck.


--> data = mgetl('PI25K_DP.TXT');

--> data2=strsplit(data);

--> tabul(data2)
ans  =


       ans(1)

!9  !
!   !
!8  !
!   !
!7  !
!   !
!6  !
!   !
!5  !
!   !
!4  !
!   !
!3  !
!   !
!2  !
!   !
!1  !
!   !
!0  !


       ans(2)

   2509.
   2465.
   2480.
   2541.
   2567.
   2549.
   2491.
   2403.
   2519.
   2476.





der_Phil

Re: [EXTERNAL] Count specific values in text file

In reply to this post by arctica1963
Hi,

Option 1:
Manually put a "." inside the file and run the script.
Since you only mentioned "pi" and "e", that is little effort.

Option 2:
You can still use csvRead even without a "." as decimal separator.
Note: the search range is adapted to count only the digits after the decimal point.
You may change it as you wish.

Best regards,
Philipp
// path to the txt file
path = 'pathToFile'
// read the file as string
piAsString = csvRead(path, [],[],'string')
// split the string at '' (use no token) 
[piAsString] = strsplit(piAsString,'');
// convert string to double
piDigits = strtod(piAsString);
// search: how often a certain value appears in a specific range within the digits
searchRange = 100;
searchVal = 1;
// adapt the range, to search only places behind the decimal sign
[locations] = find(piDigits(2:searchRange+1) == searchVal);
// uncomment if wished
// printf("location \t number\n");
// for i= 2:searchRange+1
//     printf("%d \t %d \n", i-1, piDigits(i));
// end
printf("The number %d appears %d times in the first %d digits of Pi\n",searchVal,length(locations),searchRange) ;
printf("The number %d appears at following locations: \n", searchVal);
disp(locations');

Samuel GOUGEON

Re: [EXTERNAL] Count specific values in text file

In reply to this post by arctica1963
On 12/02/2020 at 08:57, arctica1963 wrote:

> Hello Philipp
>
> Your suggestion is kind of what I am trying to do, but the text file is not
> a CSV structure. It is just a single, very big number on one row. A small
> chunk:
>
> 31415926535897932384626433832795028841971693993751058209749445923078164 etc
> (no spaces between digits)
>
> How best to load the text file and then count the number of occurrences of a
> specific digit (e.g. 1)? At the moment the file essentially contains a very
> big integer.

--> t = "31415926535897932384626433832795028841971693993751058209749445923078164";
--> length(regexp(t,"/1/"))
  ans  =
    6.

--> length(regexp(t,"/41/"))
  ans  =
    2.
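
The original question also asked for the separation between occurrences. A possible sketch, using strindex and diff rather than regexp (a swapped-in technique, not the method shown above), on the same sample string:

```scilab
// Sample digits copied from the thread (no decimal point)
t = "31415926535897932384626433832795028841971693993751058209749445923078164";

// strindex returns the 1-based start position of every match
pos = strindex(t, "1");

// Number of occurrences of the digit
n = length(pos);

// Separation (in digits) between consecutive occurrences
gaps = diff(pos);

mprintf("The digit 1 occurs %d times\n", n);
disp(pos);
disp(gaps);
```

The same call works for multi-digit patterns such as "41", in which case pos holds the position of the first character of each match.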

arctica1963

Re: [EXTERNAL] Count specific values in text file

In reply to this post by der_Phil
Hi Philipp,

Your script works fine using csvRead.

One odd thing: I tested a few files with pi, and the ones with >= 500,000
digits gave an error:

csvRead: can not read file pi-million.txt: Error in the column structure.

The 500,000-digit file gave the same error, but the 250,000-digit file worked.
Not sure if this is an internal bug/limit in csvRead.

Thanks for the pointers everyone

Lester



Antoine Monmayrant-2

Re: [EXTERNAL] Count specific values in text file

Hello Lester,

Could this be related to http://bugzilla.scilab.org/show_bug.cgi?id=15788 ?
Which version of Scilab are you using (the bug above was fixed in 6.0.2)?

Antoine


Clément DAVID

Re: [EXTERNAL] Count specific values in text file

In reply to this post by arctica1963
Hi all,

Nice challenge. I have a simple answer that displays a histogram (with 10
classes) of the pi digits stored in a pi.txt file (without any dot
separator):

F="pi.txt";
fd = mopen(F, 'r');
histplot(ascii(strcat(string(0:9))), mget(fileinfo(F)(1), "c", fd));
mclose(fd);

The idea is to read the file using mget() and store each ASCII value in a
vector. These data are then plotted as a histogram over the ASCII values of "0" to "9".

--
Clément
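
The same mget() idea can produce the counts as numbers instead of a plot. This is only a sketch: the sample file written to TMPDIR and its name are assumptions made so the snippet runs on its own, and with real data F would point at the digits file.

```scilab
// Write a small sample file so the sketch is self-contained
// (hypothetical file name; the digits are the sample quoted in the thread)
F = fullfile(TMPDIR, "pi_sample.txt");
mputl("31415926535897932384626433832795028841971693993751058209749445923078164", F);

// Read every byte of the file as an unsigned character code
info = fileinfo(F);
nbytes = info(1);
fd = mopen(F, "rb");
codes = mget(nbytes, "uc", fd);
mclose(fd);

// Keep only the codes of "0".."9" (drops the end-of-line and any ".")
keep = (codes >= ascii("0")) & (codes <= ascii("9"));
digitValues = codes(keep) - ascii("0");

// counts(d+1) holds the number of occurrences of digit d
counts = zeros(1, 10);
for d = 0:9
    counts(d + 1) = sum(digitValues == d);
end

disp([0:9; counts]);
```

Filtering on the digit codes means the result does not depend on whether the file ends with a newline or contains a decimal point.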



arctica1963

Re: [EXTERNAL] Count specific values in text file

In reply to this post by Antoine Monmayrant-2
Hello Antoine

I am using version 6.0.2. Not sure if it is the same issue as there is only
one line in the file, but a lot of digits to handle. The error message is
not exactly clear.

Lester



Antoine Monmayrant-2

Re: [EXTERNAL] Count specific values in text file

Hello Lester,

Could you:

(1) Share the file so that we can try to reproduce the bug?
(2) Share a minimal working example that shows the bug / no bug depending on the length of the file?

This could help us confirm/report a bug.

Cheers,

Antoine


arctica1963

Re: [EXTERNAL] Count specific values in text file

PI250KDP.TXT <http://mailinglists.scilab.org/file/t495709/PI250KDP.TXT>  
PI500KDP.TXT <http://mailinglists.scilab.org/file/t495709/PI500KDP.TXT>  

clear
// path to the txt file
path = 'PI250KDP.txt'
// read the file as string
piAsString = csvRead(path, [],[],'string')
// split the string at '' (use no token)
[piAsString] = strsplit(piAsString,'');
// convert string to double
piDigits = strtod(piAsString);
// search: how often appears a certain value in a specific range within the digits
searchRange = 10000;
searchVal = 1;
// adapt the range, to search only places behind the decimal sign
[locations] = find(piDigits(1:searchRange+1) == searchVal); // small tweak here
printf("The number %d appears %d times in the first %d digits of Pi\n",searchVal,length(locations),searchRange) ;
printf("The number %d appears at following locations: \n", searchVal);
disp(locations');

The 250K file works; the 500k file fails



Tan Chin Luh

Re: [EXTERNAL] Count specific values in text file

To add on, it might not just affect csvRead, but other file I/O functions as well.

How to reproduce:

// Not OK
a = ones(1,300000);
b = strcat(string(a));
mputl(b,'test.txt');
c = mgetl('test.txt');


When a has 250,000 elements, mgetl reads the string back correctly, but when a has 300,000 elements it returns an empty string.

On the other hand, a file holding the same data as a single column (one value per line) works fine.

// OK
a = ones(1,300000);
b = string(a);
mputl(b,'test.txt');
c = mgetl('test.txt');
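
To compare several line lengths in one run, the reproduction can be wrapped in a loop. This is only a sketch: the scratch file name is hypothetical, the two sizes are the ones mentioned in this message, and what is read back will depend on the Scilab version.

```scilab
// Hypothetical scratch file in Scilab's temporary directory
f = fullfile(TMPDIR, "test_long_line.txt");

// Line lengths to try; 250,000 and 300,000 are the values reported above
sizes = [250000 300000];
readBack = zeros(1, size(sizes, "*"));

for k = 1:size(sizes, "*")
    b = strcat(string(ones(1, sizes(k))));  // one long line of "1" characters
    mputl(b, f);
    c = mgetl(f);
    readBack(k) = sum(length(c));           // 0 if mgetl returned nothing
    mprintf("wrote %d characters, read back %d\n", length(b), readBack(k));
end
```

A mismatch between the two numbers on any line would indicate that mgetl did not return the full line for that length.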

thanks.

rgds,
CL


Antoine Monmayrant

Re: [EXTERNAL] Count specific values in text file

Well, that sounds like a bug!

Could you report it?

Cheers,

Antoine

Tan Chin Luh

Re: [EXTERNAL] Count specific values in text file

sure, here it is:


rgds,
CL

