Big literal integers for int64() and uint64()

Samuel GOUGEON

Big literal integers for int64() and uint64()

Hello,

Here is an awkward case. I am afraid that it won't be possible to improve the situation, but let's see it:

--> int64(461168601842738791)

 ans  =
  461168601842738816   <<< ..38816 =/= ..38791

I know that this comes from the fact that the literal 461168601842738791 is by default parsed as the decimal number 461168601842738791., and since 1/461168601842738791 < %eps, there is a round-off error, after which the "truncated" decimal number is converted into an int64.
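
The rounding can be reproduced outside Scilab, since it depends only on IEEE 754 double precision. Here is a minimal Python sketch (Python is used only because its integers are exact; this is not what Scilab runs internally). Above 2^53, not every integer has a double representation, and near 4.6e17 neighbouring doubles are 64 apart:

```python
import math

literal = 461168601842738791      # the exact integer, as typed
as_double = float(literal)        # what a parser that stores doubles keeps

# A double has a 53-bit significand, so integers above 2**53 may be rounded.
print(literal > 2**53)            # True
print(math.ulp(as_double))        # 64.0: gap between neighbouring doubles here
print(int(as_double))             # 461168601842738816
print(int(as_double) - literal)   # 25
```

The literal lies 25 above one multiple of 64 and 39 below... rather, 25 below 461168601842738816 and 39 above the previous representable value, so round-to-nearest picks ..38816.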

So, the question is: in the specific cases of int64() and uint64(), would it be possible to parse literal input arguments assuming that they are not decimal?

Before parsing the literal, could the parser be aware that the calling function is int64() or uint64()?

Another solution -- maybe easier to implement -- for passing literal numbers to int64() and uint64() without rounding them could be to support string inputs, as hex2dec() does, for instance. This is presently not the case, and overloading is not enabled:

--> int64("461168601842738791")
int64: Wrong type for input argument #1: integer, boolean or double expected.

Samuel



_______________________________________________
dev mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/dev
Antoine ELIAS-2

Re: Big literal integers for int64() and uint64()

Hello Samuel,

Currently the parser (the lexer, in fact) tries to read a "number" and converts it to a floating-point number; at that moment, we have no idea what this number will finally be used for.
After that, the parser tries to work out what to do with this number (already a double).
So I think the first option is not possible with the current handling of numbers.

For a string argument, I think it cannot be done easily in an overload macro, for the same reason.
But it can be done in the (u)int builtins.
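
To illustrate the point about the lexer, here is a toy Python sketch. It is not Scilab's actual lexer; `lex_number` and `to_int64` are hypothetical names. Once the token is stored as a double, two different literals become indistinguishable, so no later stage can recover the digits that were typed:

```python
import re

# Simplified numeric literal: digits, optional fraction, optional exponent.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]*)?(?:[eE][+-]?[0-9]+)?")

def lex_number(source: str) -> float:
    """Toy lexer step: read a numeric literal and store it as a double."""
    match = NUMBER.match(source)
    if match is None:
        raise ValueError("no numeric literal at start of input")
    return float(match.group(0))

def to_int64(token: float) -> int:
    """Stand-in for int64(): it only ever sees the double."""
    return int(token)

a = lex_number("461168601842738791")
b = lex_number("461168601842738816")
print(a == b)        # True: both literals collapse to the same double
print(to_int64(a))   # 461168601842738816
```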

Regards,
Antoine
On 27/10/2018 at 01:06, Samuel Gougeon wrote:
> [...]
Samuel GOUGEON

Re: Big literal integers for int64() and uint64()

Hello Antoine,

Thanks for your answer.
I have opened a bug report #15837 about this topic.

About the overload:

On 27/10/2018 at 14:00, Antoine ELIAS wrote:
> For string argument, I think it cannot be done easily in an overload macro for the same reason.

In the report, I have posted a proposal for a working 11-line %c_uint64() overload (without the error messages ;)
With it, we get for instance:

--> %c_uint64("9000000000000001000") + [ 1 -1001 ; 4 7]
 ans  =
  9000000000000001001  8999999999999999999
  9000000000000001004  9000000000000001007

whereas
--> uint64(9000000000000001000) + [ 1 -1001 ; 4 7]
 ans  =
  9000000000000001025  9000000000000000023
  9000000000000001028  9000000000000001031

It can process any relevant input array of any number of dimensions.
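
The benefit of going through a string can be sketched in Python. This is not the Scilab overload from the bug report (its code is not shown in this thread); `parse_uint64` is a hypothetical helper. The digits are accumulated in exact integer arithmetic, so no double is ever involved:

```python
UINT64_MAX = 2**64 - 1

def parse_uint64(text: str) -> int:
    """Convert a string of decimal digits to an integer in the uint64 range."""
    if not text or any(ch not in "0123456789" for ch in text):
        raise ValueError("expected a non-empty string of decimal digits")
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))   # exact, no rounding
    if value > UINT64_MAX:
        raise OverflowError("value does not fit in uint64")
    return value

base = parse_uint64("9000000000000001000")
offsets = [[1, -1001], [4, 7]]

exact = [[base + d for d in row] for row in offsets]
rounded = [[int(float(base)) + d for d in row] for row in offsets]

print(exact)    # [[9000000000000001001, 8999999999999999999], [9000000000000001004, 9000000000000001007]]
print(rounded)  # [[9000000000000001025, 9000000000000000023], [9000000000000001028, 9000000000000001031]]
```

The `rounded` row reproduces the second console output above: near 9e18 doubles are 1024 apart, so the literal is stored as 9000000000000001024 before any addition takes place.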

Best regards
Samuel


Antoine ELIAS-2

Re: Big literal integers for int64() and uint64()

Hello again,
I worked on it (challenge accepted!).

I took a look at your implementation. I'm not sure it is necessary to remove trailing or inner white space (" 1 000 000 " is not a correct representation of a number, that's all!).
Otherwise you would also have to handle "\t", "," or any localized separator.

In a lot of languages, converting "123toto" gives 123, not an error.
And not allowing "1.123" is a mistake in my view, since uint64(1.123) -> 1 (ui64).

I made an implementation in the builtins for all (u)int functions that raises an error only on a too-long input string ("10000000000000000000" for example) and on an empty string (but we could convert the latter to 0, like %nan).
I map "nan", "NaN", "%nan" to 0; "inf", "Inf", "%inf" to maxval; and "-inf", "-Inf", "-%inf" to minval.
And, as "icing on the cake", it is a little bit faster than yours :p
(between 2 and 100 times, depending on the input size; C++ vs script)
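
The rules listed above can be sketched in Python as follows. This is not the C++ code under review (it is not shown in this thread), and `str_to_int` is a hypothetical name. Two points are assumptions: "too long" is read as "does not fit in the target range", and a string without leading digits gives 0, as C's strtoll() does. The int64 bounds are used for the demonstration:

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

NAN_NAMES = {"nan", "NaN", "%nan"}
POS_INF_NAMES = {"inf", "Inf", "%inf"}
NEG_INF_NAMES = {"-inf", "-Inf", "-%inf"}

def str_to_int(text: str, lo: int = INT64_MIN, hi: int = INT64_MAX) -> int:
    """Convert a string to an integer in [lo, hi] with lenient parsing."""
    if text == "":
        raise ValueError("empty string")
    if text in NAN_NAMES:
        return 0
    if text in POS_INF_NAMES:
        return hi
    if text in NEG_INF_NAMES:
        return lo
    # Keep only the leading sign and digits: "123toto" -> 123, "1.123" -> 1.
    match = re.match(r"[+-]?[0-9]+", text)
    if match is None:
        return 0   # assumption: no leading digits gives 0
    value = int(match.group(0))
    if not lo <= value <= hi:
        # assumption: this is what "too long" means for the target type
        raise OverflowError("number does not fit in the target integer type")
    return value

print(str_to_int("123toto"))              # 123
print(str_to_int("1.123"))                # 1
print(str_to_int("%nan"))                 # 0
print(str_to_int("461168601842738791"))   # 461168601842738791
```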

(the NRT, changes, ... are still missing, but it is Saturday ^^)
https://codereview.scilab.org/#/c/20587/

Regards,
Antoine
On 27/10/2018 at 17:34, Samuel Gougeon wrote:
> [...]
Clément David-3

Re: Big literal integers for int64() and uint64()

Hello all,

Nice implementation! IMHO `double()` might also be extended to support a string. Do you expect any other functions to be impacted by “string interpretation” on type conversion?

Thanks,

--
Clément

From: dev <[hidden email]> On Behalf Of Antoine ELIAS
Sent: Saturday, October 27, 2018 8:29 PM
To: [hidden email]
Subject: Re: [Scilab-Dev] Big literal integers for int64() and uint64()

 

> [...]
Samuel GOUGEON

Re: Big literal integers for int64() and uint64()

Hello,

On 30/10/2018 at 13:51, Clément David wrote:
> Nice implementation! IMHO `double()` might also be extended to support a string,


The question is: what should it do? With which features?
It would be a nice replacement for strtod(). IMO, it should in no way duplicate it.

But is it really a priority?
There are already too many functions that all do much the same thing, and rather weakly:
strtod(), msscanf(), evstr() (and still eval(), whose removal is not yet merged).

However, there are certainly priorities in matters of string conversion. Notably,
there is currently no way to read the imaginary parts of complex numbers
as text and convert them into a numeric type.

You may have a look at a (still) vain analysis at http://bugzilla.scilab.org/4401
Likewise, support for hexadecimal and octal string input is requested for msscanf()
at http://bugzilla.scilab.org/8905 . These formats are also discussed as possible strtod()
input (with the same tolerance (and output) regarding any trailing part)
in the report for bug 4401.

Best regards
Samuel


Samuel GOUGEON

Re: Big literal integers for int64() and uint64()

On 30/10/2018 at 14:24, Samuel Gougeon wrote:
> However, there are certainly priorities in matters of string conversion. Notably,
> there is currently no way to read the imaginary parts of complex numbers
> as text and convert them into a numeric type.


csvTextScan() (yet another one...) does it. This is definitely not emphasized in its description.
It has only one small example, with no illustration. But it does it.

Samuel

