1 boolean = 4 bytes => 1 byte ?

Samuel GOUGEON

1 boolean = 4 bytes => 1 byte ?

Hello,

As reported at http://bugzilla.scilab.org/12789 in 2013 (2 years before the first 6.0 alpha release), Scilab 5 uses 4 bytes to store each boolean.

That is 4 times more than a simple scheme with 1 byte per boolean, which is easy to store and handle, and 32 times more than the memory optimum of 8 booleans per byte.
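
To make these ratios concrete, here is a minimal C++ sketch of the three storage schemes being compared. The function names are hypothetical and do not come from the Scilab sources; the 4-byte case assumes one 32-bit integer per boolean, as described in the bug report.

```cpp
#include <cstddef>

// Hypothetical helpers (not Scilab API): bytes needed to store n booleans.

// Scheme described in the report: one 32-bit integer per boolean.
constexpr std::size_t bytesAsInt32(std::size_t n) { return n * 4; }

// Simple alternative: one byte per boolean.
constexpr std::size_t bytesAsByte(std::size_t n) { return n; }

// Memory optimum: 8 booleans packed per byte, rounded up to a whole byte.
constexpr std::size_t bytesAsBits(std::size_t n) { return (n + 7) / 8; }
```

For 8 booleans these give 32, 8 and 1 bytes respectively, which is where the factors 4 and 32 come from.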

Since [names, memory]=who(..) is broken in Scilab 6, I did not check whether this poor memory usage persists in 6.0.

Assuming that it does, what would it take to change the storage, say to 1 byte per boolean?

  • in terms of implementation: would it be heavy to implement?
  • in terms of backward compatibility: would it have a big impact?

Regards

Samuel


_______________________________________________
dev mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/dev
Clément David

Re: 1 boolean = 4 bytes => 1 byte ?

Hello Samuel,

After some digging into the ast/types source code and a debugging session, I got some information:

 --> disp(%t)
    // gdb resolved the value as a <types::ArrayOf<int>> where m_iSize = 1
    // (the size of the inner m_pRealData)
 --> disp([%t %f %t %f])
    // gdb resolved the value as a <types::ArrayOf<int>> where m_iSize = 4
    // (the size of the inner m_pRealData)

So in Scilab 6, there are 4 bytes per boolean; to me, the first thing to do before changing the current
implementation is to let `who()` return both the total memory used (including the Scilab header) and the
memory used by the inner data storage.

Note: as discussed on this ML, the overhead of each Scilab datatype (not the inner value) is 208 bytes
per value; to me it is more important to reduce that first, as it will impact all ArrayOf-based
datatypes.
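
The split that `who()` could report can be sketched as follows. This is only an illustration: `dataBytes` and `totalBytes` are hypothetical names, the 208-byte header is the figure quoted in this thread rather than a value measured here, and `std::int32_t` stands in for the `int` element type that gdb showed.

```cpp
#include <cstddef>
#include <cstdint>

// Per-variable overhead quoted in this thread (assumption, not measured here).
constexpr std::size_t kHeaderBytes = 208;

// Memory used by the inner data storage alone: element count times element size.
template <typename Element>
constexpr std::size_t dataBytes(std::size_t count) { return count * sizeof(Element); }

// Total memory of one variable: header plus inner data storage.
template <typename Element>
constexpr std::size_t totalBytes(std::size_t count) { return kHeaderBytes + dataBytes<Element>(count); }
```

With these definitions, `[%t %f %t %f]` stored as `std::int32_t` has 16 bytes of data and 224 bytes in total, so the header dominates for small values while the element size dominates for large arrays.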

Thanks,

--
Clément

On Monday 23 April 2018 at 18:25 +0200, Samuel Gougeon wrote:

> Hello,
> As reported  @ http://bugzilla.scilab.org/12789 in 2013 (so 2 years before the first 6.0 alpha
> release), in Scilab 5 each boolean takes 4 bytes to be stored.
> It is 4 times more than an easy storage and handling with 1 byte per boolean, and 32 times more
> than a memory optimum with 8 booleans per byte.
>
> Since [names, memory]=who(..) is broken in Scilab 6, i did not check that this poor memory usage
> is still actual in 6.0.
>
> Assuming that it is the case, then, what would imply to change the storage -- say with 1 byte per
> boolean --
> in terms of implementation : would it be heavy to implement?
> in terms of back-compatibility : would it have a big impact?
> Regards
> Samuel
Samuel GOUGEON

Re: 1 boolean = 4 bytes => 1 byte ?

Hello Clément,

On 22/05/2018 at 11:12, Clément David wrote:
> Hello Samuel,
>
> After some diving into the ast/types source code and debugging session I got some information :
>
>  --> disp(%t)
>     // gdb resolved the value as a <types::ArrayOf<int>> where m_iSize = 1
>     // (the size of the inner m_pRealData)
>  --> disp([%t %f %t %f])
>     // gdb resolved the value as a <types::ArrayOf<int>> where m_iSize = 4
>     // (the size of the inner m_pRealData)
>
> So in Scilab 6, there is 4 byte per boolean; to me a first thing to do before changing the current
> implementation is to let `who()` return both the memory used (including the Scilab header) and the
> memory used by the inner data storage.
>
> Note: as discussed in this ML, the overhead per for Scilab datatype (not inner value) is 208 byte
> per value, to me it is more important to reduce it first as it will impact all ArrayOf based
> datatype.

So, I understand that it is hard, or even impossible or useless, to assess the
impact of a change in terms of backward compatibility with existing data files.

I also understand that the Scilab dev team has finally decided, at least as a first step,
to restore the former who() behavior for getting the memory, that is, including the Scilab header.
To me, it would in any case be better to drop the "word" unit (a set of 8 bytes) and to
return the memory in bytes. Beyond restoring [x,mem]=who(..),
this would already be an improvement.

As far as I understand it (and I am not sure I do), I am not convinced by the last point, in terms of priority.
The main concern of the initial report is better use of memory,
notably when operating on big arrays with big intermediate boolean arrays.
Some operations may fail because of insufficient intermediate memory.
Now, even if 1000 variables are simultaneously defined in the workspace
(this never happens, but let's assume so),
and each one takes 208 bytes (really per value? I assume it is rather per container. Is that right?),
then this uses 208 kbytes, which is nothing.
Now, if a single boolean array is defined and used to process a whole 1000x1000x4 RGBA image,
it alone uses 16 MB instead of 4 MB, which is far more than 208 kbytes.
Avoiding the waste of these 12 MB was the main aim of my initial report.
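
This arithmetic can be checked with a short C++ sketch. The helper names are hypothetical, and the 208-byte header is the figure quoted earlier in this thread, not a value measured here.

```cpp
#include <cstddef>

// Per-variable header size quoted in this thread (assumption).
constexpr std::size_t kHeaderBytesPerVariable = 208;

// Total header overhead for a given number of workspace variables.
constexpr std::size_t headerOverhead(std::size_t variables)
{
    return variables * kHeaderBytesPerVariable;
}

// Payload of a rows x cols x channels boolean array for a given element size.
constexpr std::size_t booleanArrayBytes(std::size_t rows, std::size_t cols,
                                        std::size_t channels, std::size_t bytesPerBoolean)
{
    return rows * cols * channels * bytesPerBoolean;
}
```

`headerOverhead(1000)` is 208 000 bytes, while `booleanArrayBytes(1000, 1000, 4, 4)` is 16 000 000 bytes against 4 000 000 bytes with 1 byte per boolean, a difference of 12 000 000 bytes.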

Please correct me if my calculations are wrong, with respect to this aim.

Best regards
Samuel


Clément David

Re: 1 boolean = 4 bytes => 1 byte ?

Hello Samuel,

> Now, if a single boolean array is defined and used to process a whole 1000x1000x4 RGBA image,
> it uses alone 16 MB instead of 4MB, that's >>> 208 kbytes.
> Avoiding to waste these 12 MB was the main aim of my initial report.
>
> Please correct me if my calculations are wrong, with respect to this aim.

Your calculations are correct and I fully agree with you: wasting 4 bytes per boolean value might be
an issue for some computations.

To clarify, the idea behind reducing the ArrayOf<> size is to follow PHP's internal value optimization
[1] to improve allocation performance (basically, using less memory means having more Scilab
values stored in the fast smallbin-allocated memory area).

[1]: https://nikic.github.io/2015/05/05/Internal-value-representation-in-PHP-7-part-1.html
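
As a rough illustration of the idea in [1], the following C++ sketch shows a compact tagged value that fits in 16 bytes on common platforms. `CompactValue` and its fields are hypothetical; this is neither the actual PHP structure nor a proposed Scilab layout.

```cpp
#include <cstdint>

// Hypothetical compact value: an 8-byte payload plus two 4-byte fields.
union Payload {
    std::int64_t integer;   // integer stored inline
    double       real;      // floating-point value stored inline
    void*        pointer;   // pointer to heap data for larger types
};

struct CompactValue {
    Payload       payload;  // the value itself, or a pointer to it
    std::uint32_t typeTag;  // records which union member is in use
    std::uint32_t extra;    // spare space, e.g. for flags
};
```

Small scalars then need no separate heap allocation, and a per-value footprint of this kind is far below the 208 bytes mentioned earlier in the thread.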

--
Clément