[Scilab-users] HDF5 save is super slow

Arvid Rosén
Dear Scilab list,

 

We have been using Scilab at my company since version 3. Migrating from one version to another has always taken some work, but going from 5 to 6 seems to be the most difficult so far.

 

One of the problems for us is the new HDF5 format for loading and saving. We have a huge number of old data sets that we need to keep working with, and most of them contain large numbers of state-space systems stored in lists. However, loading or saving these data sets is extremely slow compared to the old binary format. So slow, in fact, that using Scilab 6 is impossible for us at the moment. I have a short test case that demonstrates the problem:

 

/////////////////////////////////
N = 4;
n = 10000;

filters = list();

for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end

tic();
save('filters.dat', filters);
ts1 = toc();

tic();
save('filters.dat', 'filters');
ts2 = toc();

printf("old save %.2fs\n", ts1);
printf("new save %.2fs\n", ts2);
printf("slowdown %.1f\n", ts2/ts1);
/////////////////////////////////

 

Obviously, the code above needs to run in Scilab 5, as it uses both the new and the old method for saving the list of state-space filters. And to be fair, HDF5 saving is a bit faster in Scilab 6, but still orders of magnitude slower than the old format in Scilab 5. As a reference, below is the output on my pretty fast Mac running Scilab 5:

 

Warning: Scilab 6 will not support the file format used.

Warning: Please quote the variable declaration. Example, save('myData.sod',a) becomes save('myData.sod','a').

Warning: See help('save') for the rational.

Warning: file 'filters.dat' already opened in Scilab.

old save 0.03s

new save 20.93s

slowdown 775.0

 

So my questions:

Can and will this be addressed in future versions of Scilab?

Can I store a large number of state-space systems in another way to make this faster?

 

Best Regards,

Arvid


Antoine Monmayrant

Re: HDF5 save is super slow

Hello,

I tried your code in 5.5.1 and the latest nightly build of 6.0: I see a slowdown of around 175x between the old save in 5.5.1 and the new (and only) save in 6.0.
It's really related to the data structure: we use HDF5 read/write a lot here and did not experience significant slowdowns with 6.0.
I think the overhead might come from the translation of your fairly complex variable (a long list of tlists) into the corresponding HDF5 structure.
In the old save, this translation was not necessary.
Maybe you could try to save your data in a different way.
For example:
1) you could save each element of "filters" in a separate file.
2) you could bypass save and write your data into an HDF5 file yourself, using h5open() and h5write() directly (see the sketch after this list). It means you need to write your own load() for your custom file format, but this way you can try to find the best layout for your data in the HDF5 format.
3) in addition to 2), you could save each entry of your "filters" list as one dataset in a given HDF5 file.
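
A minimal sketch of option 2), not from the original post: it assumes the h5open/h5write/h5close calling sequences of Scilab 6's HDF5 module (check help('h5write') for your version), and the file and dataset names are purely illustrative:

/////////////////////////////////
// Sketch only: write each state-space matrix as its own dataset.
// The dataset naming (a_i, b_i, ...) is our own convention; a matching
// custom load routine must read them back and call syslin() again.
f = h5open("filters_raw.h5", "w");
for i = 1:size(filters)
    G = filters(i);
    h5write(f, msprintf("a_%d", i), G.a);
    h5write(f, msprintf("b_%d", i), G.b);
    h5write(f, msprintf("c_%d", i), G.c);
    h5write(f, msprintf("d_%d", i), G.d);
end
h5close(f);
/////////////////////////////////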

Did you search on bugzilla whether this bug was already submitted?
Could you try to report it?


Antoine


-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Antoine Monmayrant LAAS - CNRS
 7 avenue du Colonel Roche
 BP 54200
 31031 TOULOUSE Cedex 4
 FRANCE

 Tel:+33 5 61 33 64 59
 
 email : [hidden email]
 permanent email : [hidden email]

+++++++++++++++++++++++++++++++++++++++++++++++++++++++


Arvid Rosén

Re: HDF5 save is super slow

Hi,

 

Thanks for getting back to me!

Unfortunately, we used Scilab's pretty cool way of doing object orientation, so we have big nested tlist structures with multiple instances of various lists of filters and other structures, as in my example. Saving those structures in some explicit manual way would be extremely complicated. Or is there some way of writing explicit HDF5 saving/loading schemes using overloading? That would be great! I am sure we could find the main culprits and do something explicit for them, but as they can be located anywhere in a big nested structure, it would be painful to do anything at the top level.

 

Another, probably related, problem is that the new file format uses about 15 times as much disk space as the old format (for a typical ill-behaved nested structure). That adds to the save/load time too, I guess, but is probably not the main source here.

 

I think I might have reported this earlier on Bugzilla, but I'm not sure. I'll check and report it if not.

 

Cheers,

Arvid

 

Antoine Monmayrant

Re: HDF5 save is super slow

On 15/10/2018 at 11:55, Arvid Rosén wrote:

Another, probably related, problem is that the new file format uses about 15 times as much disk space as the old format (for a typical ill-behaved nested structure).

Argh, yes: I tested it, and with your example I get a file 8.5x bigger.
I think both increases, in time and in size, are real issues and should be reported as bugs.

By the way, I rewrote your script to run it under both 6.0 and 5.5:

/////////////////////////////////
N = 4;
n = 10000;
filters = list();

for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end
 
ver=getversion('scilab');

if ver(1)<6 then
    tic();
    save('filters_old.dat', filters);
    ts1 = toc();
else
    tic();
    save('filters_new.dat', 'filters');
    ts1 = toc();   
end
 
printf("Time for save %.2fs\n", ts1);
/////////////////////////////////

Hope it helps,

Antoine

 

Stéphane Mottelet

Re: HDF5 save is super slow

Hello,

I looked a little bit at the sources: the evident bottleneck is the nested creation of an HDF5 group each time a container variable is met.
For the given example this is particularly evident: if you replace the syslin structure with the corresponding [A,B;C,D] matrix, the save is ten times faster:

N = 4;
n = 1000;
filters = list();
for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

   0.724754

N = 4;
n = 1000;
filters = list();
for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = [G.a G.b;G.c G.d];
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

   0.082302
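
For completeness, a hedged sketch (not in the original post) of the load side of this flattening, assuming N is known, the systems are continuous-time, and 'filters.dat' holds the flattened matrices:

/////////////////////////////////
// Sketch only: rebuild each syslin from its flattened [A,B;C,D] matrix.
load('filters.dat');              // restores the flattened 'filters' list
rebuilt = list();
for i = 1:size(filters)
    M = filters(i);
    A = M(1:N, 1:N);
    B = M(1:N, N+1:$);
    C = M(N+1:$, 1:N);
    D = M(N+1:$, N+1:$);
    rebuilt($+1) = syslin('c', A, B, C, D);
end
/////////////////////////////////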

Serializing container objects seems to be the solution, but it goes in a direction orthogonal to the HDF5 portability spirit.

S.




-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

Arvid Rosén

Re: HDF5 save is super slow

Hi,

 

Yeah, that makes sense; it is about what I expected, at least. It is a pity though, as handling thousands of filters isn't necessarily a strange thing to do with software like Scilab, and making a special serialization like that would be nothing less than a hack.

 

Do you think there is a way forward under the hood that could make big, deep list structures >10x faster in the future? Otherwise, the whole object-orientation part of Scilab (tlist, mlist, etc.) would be hard to use for anything that comes in large numbers, which would be a shame, especially as it used to work just fine (well, I can see how the old structure wasn't "just fine" in other ways, but still).

 

Cheers,

Arvid

 

 

Stéphane Mottelet

Re: HDF5 save is super slow

On 15/10/2018 at 15:07, Arvid Rosén wrote:

Do you think there is a way forward under the hood that could make big, deep list structures >10x faster in the future?

No. I think HDF5 is not convenient for deeply structured data with small leaves. Some interesting discussions can be found here:

https://cyrille.rossant.net/should-you-use-hdf5/
https://cyrille.rossant.net/moving-away-hdf5/

If you just need to read/write within your own software, serializing should not be an issue. In the example you gave, the structure of each leaf is always the same, so using an array of structs already improves performance a little:

clear
N = 4;
n = 1000;
for i=1:n
   G(i).a=rand(N,N);
   G(i).b=rand(N,1);
   G(i).c=rand(1,N);
   G(i).d=rand(1,1);
end
tic();
save('filters.dat', 'G');
disp(toc());
--> disp(toc());

   0.24133
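
Taking that one step further (a hedged sketch, not from the post): since every leaf has the same shape, the n systems can be packed into four large hypermatrices, so that save() writes a handful of big datasets instead of thousands of small groups. The variable names are ours:

/////////////////////////////////
// Sketch only: one hypermatrix per coefficient, sliced along the third
// dimension, so that save() only sees four large variables.
N = 4;
n = 1000;
A = zeros(N, N, n); B = zeros(N, 1, n);
C = zeros(1, N, n); D = zeros(1, 1, n);
for i = 1:n
    A(:,:,i) = rand(N,N);
    B(:,:,i) = rand(N,1);
    C(:,:,i) = rand(1,N);
    D(:,:,i) = rand(1,1);
end
tic();
save('filters.dat', 'A', 'B', 'C', 'D');
disp(toc());
/////////////////////////////////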


S.

Clément David

Re: HDF5 save is super slow


Hello all,

 

Correct, I experienced such slowness while working on Xcos diagrams for Scilab 5. At first we considered HDF5 for this deeply nested list / mlist data-structure storage; however, after some tests, XML might be better suited for tree-like storage, with HDF5 (or Java type serialization) for big matrices.

 

AFAIK there is currently no easy way to load/save in a format other than HDF5; maybe adding an xmlSave/xmlLoad sci_gateway, to let the user select an XML file format for any Scilab structure, would provide better performance for your use case. JSON might be another candidate to look at for decent serialization support.

 

PS: Scilab 5.5.1 load/save is a direct memory dump, so it is really the fastest you can get from Scilab; the HDF5 binary format is good enough for matrices.

 

--

Clément

 

Arvid Rosén

Re: HDF5 save is super slow

Hi again,

 

I just filed a bug report here:

http://bugzilla.scilab.org/show_bug.cgi?id=15809

 

Would it be possible to bring back the old mem-dump approach in Scilab 6? I mean, could I write a gateway that just takes a pointer to the first byte in memory, figures out the size, and dumps to disk? Or maybe it doesn't work like that. Writing a JSON exporter for storing filter coefficients in a math software package seems a bit ridiculous, but hey, if it works it might be worth it in our case.

 

Cheers,

Arvid

 

 

Antoine Monmayrant

Re: HDF5 save is super slow

On 15/10/2018 at 18:17, Arvid Rosén wrote:

Would it be possible to bring back the old mem-dump approach in Scilab 6?

Couldn't you create your own ATOMS package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

I mean, could I write a gateway that just takes a pointer to the first byte in memory, figures out the size, and dumps to disk?

I was also wondering whether this can be done in HDF5, i.e. do some serialization of your structure and dump it into HDF5?
We use HDF5 with LabVIEW, and for some horrible structures (like arrays of clusters containing lots of elements of different types) we just turn them into a byte stream and dump the stream into an HDF5 dataset.
We then retrieve it and rebuild the structure (knowing its shape).
Could this be implemented in Scilab 6?
What would be missing: the any-variable -> bytestream conversion and the way back?

Antoine

 

Cheers,

Arvid

 

 

From: users [hidden email] on behalf of Clément DAVID [hidden email]
Reply-To: Users mailing list for Scilab [hidden email]
Date: Monday, 15 October 2018 at 15:48
To: Users mailing list for Scilab [hidden email]
Cc: Clément David [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Hello all,

 

Correct, I experienced such a slowness while working with Xcos diagrams for Scilab 5. At first we considered HDF5 for this deep nested list / mlist data-structure storage however after some tests ; XML might be used for tree-like storage and HDF5 (or Java types serialization) for big matrices.

 

AFAIK currently there is no easy way to load/save specifying a format other than HDF5 ; maybe adding xmlSave/xmlLoad sci_gateway to let the user select an xml file format for any Scilab structure might provide better performance on your use-case. JSON might also be another candidate to look at for decent serialization support.

 

PS: Scilab 5.5.1 load/save are direct memory dump so this is really the fastest you can get from Scilab ; HDF5 binary format is good enough for matrices

 

--

Clément

 

From: users [hidden email] On Behalf Of Stéphane Mottelet
Sent: Monday, October 15, 2018 2:36 PM
To: [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Hello,

I looked a little bit in the sources: the evident bottleneck is the nested creation of an hdf5 group each time that a container variable is met.
For the given example, this is particularly evident. If you replace the syslin structure by the corresponding [A,B;C,D] matrix, then save is ten times faster:

N = 4;
n = 1000;
filters = list();
for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

   0.724754

N = 4;
n = 1000;
filters = list()
for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = [G.a G.b;G.c G.d];
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

   0.082302

Serializing container objects seems to be the solution, but it goes towards an orthogonal direction w.r.t. the hdf5 portability spirit.

S.


Le 15/10/2018 à 12:22, Antoine Monmayrant a écrit :

Le 15/10/2018 à 11:55, Arvid Rosén a écrit :

Hi,

 

Thanks for getting back to me!

 

Unfortunately, we used Scilab’s pretty cool way of doing object orientation, so we have big nested tlist structures with multiple instances of various lists of filters and other structures, as in my example. Saving those structures in some explicit manual way would be extremely complicated. Or is there some way of writing explicit HDF5 saving/loading schemes using overloading? That would be great! I am sure we could find the main culprits and do something explicit for them, but as they can be located wherever in a big nested structure, it would be painful to do anything on the top level.

 

Another, related I guess, problem here is that the new file format uses about 15 times as much disk space as the old format (for a typical ill-behaved nested structure). That adds to the save/load time too I guess, but is probably not the main source here.

Argh, yes, I tested it and in your example, I have a file x8.5 bigger.
I think that both increases in time and size are real issues and should be reported as bugs.

By the way, I rewrote your script to run it under both 6.0 and 5.5:

/////////////////////////////////
N = 4;
n = 10000;
filters = list();

for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end
 
ver=getversion('scilab');

if ver(1)<6 then
    tic();
    save('filters_old.dat', filters);
    ts1 = toc();
else
    tic();
    save('filters_new.dat', 'filters');
    ts1 = toc();   
end
 
printf("Time for save %.2fs\n", ts1);
/////////////////////////////////

Hope it helps,

Antoine



 

I think I might have reported this earlier using Bugzilla, but I’m not sure. I’ll check and report it if not.

 

Cheers,

Arvid

 

From: users [hidden email] on behalf of [hidden email] [hidden email]
Reply-To: [hidden email] [hidden email], Users mailing list for Scilab [hidden email]
Date: Monday, 15 October 2018 at 11:08
To: [hidden email] [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Hello,

I tried your code in 5.5.1 and the last nightly-build of 6.0: I see a slowdown of around 175 between old save in 5.5.1 and new (and only) save in 6.0.
It's really related to the data structure, because we use hdf5 read/write a lot here and did not experience significant slowdowns using 6.0.
I think the overhead might come to the translation of your fairly complex variable (a long array of tlist) in the corresponding hdf5 structure.
In the old save, this translation was not necessary.
Maybe you could try to save your data in a different way.
For example:
3) you could save each element of "filters" in a separate file.
2) you could bypass save and directly write your data in a hdf5 file by using h5open(), h5write() directly. It means you need to write your own load() for your custom file format. But this way, you can try to find the best way to layout your data in hdf5 format.
3) in addition to 2) you could try to save each entry of your "filters" array as one dataset in a given hdf5 file.

Did you search on bugzilla whether this bug was already submitted?
Could you try to report it?


Antoine

Le 15/10/2018 à 10:11, Arvid Rosén a écrit :

/////////////////////////////////
N = 4;
n = 10000;

filters = list();

for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end

tic();
save('filters.dat', filters);
ts1 = toc();

tic();
save('filters.dat', 'filters');
ts2 = toc();

printf("old save %.2fs\n", ts1);
printf("new save %.2fs\n", ts2);
printf("slowdown %.1f\n", ts2/ts1);
/////////////////////////////////

 

-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
 Antoine Monmayrant LAAS - CNRS
 7 avenue du Colonel Roche
 BP 54200
 31031 TOULOUSE Cedex 4
 FRANCE
 
 <a href="Tel:+33" moz-do-not-send="true">Tel:+33 5 61 33 64 59
 
 email : [hidden email]
 permanent email : [hidden email]
 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

 

 





_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users

 

-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet


_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users


-- 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Antoine Monmayrant LAAS - CNRS
 7 avenue du Colonel Roche
 BP 54200
 31031 TOULOUSE Cedex 4
 FRANCE

 Tel:+33 5 61 33 64 59
 
 email : [hidden email]
 permanent email : [hidden email]

+++++++++++++++++++++++++++++++++++++++++++++++++++++++


_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Arvid Rosén Arvid Rosén
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

From: users <[hidden email]> on behalf of "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>, Users mailing list for Scilab <[hidden email]>
Date: Tuesday, 16 October 2018 at 09:53
To: "[hidden email]" <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Couldn't you create your own atom package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

 

Yeah, if that were possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for this to be easy to implement, I guess the lists and structures would need to be stored linearly in one consecutive chunk of memory. I don’t know whether that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.

Otherwise, I would need to write some recursive implementation that handles a bunch of different object types. Sounds painful.
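Something along those lines, I suppose: just a rough sketch of the recursive idea, assuming the tree only contains lists, syslin systems and plain numeric matrices, and assuming h5write() creates intermediate groups from a nested path (otherwise they would have to be created first with h5group()):

/////////////////////////////////
// Untested sketch: walk a nested list and write every leaf as a dataset,
// using the position in the tree as the dataset path.
function write_rec(f, path, v)
  select typeof(v)
  case "list" then
    for k = 1:length(v)
      write_rec(f, path + "/" + string(k), v(k));
    end
  case "state-space" then          // lss tlist built by syslin()
    h5write(f, path + "/A", v.A);
    h5write(f, path + "/B", v.B);
    h5write(f, path + "/C", v.C);
    h5write(f, path + "/D", v.D);
  case "constant" then             // plain numeric matrix
    h5write(f, path, v);
  else
    error("write_rec: unhandled type " + typeof(v));
  end
endfunction

f = h5open("filters.h5", "w");
write_rec(f, "filters", filters);
h5close(f);
/////////////////////////////////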

 

Cheers,

Arvid


_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Clément David-2 Clément David-2
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

Hello,

 

My 2 cents: this is probably a poor man’s approach, but Xcos offers the var2vec / vec2var functions, which encode any Scilab datatype passed as argument into a double vector. The encoding duplicates the data in memory, so there might be some overhead.

 

On my machine, I have these timings using the attached script (Antoine’s one edited):

save list of syslins: 1.361704

save list of vec[]: 0.056788

save var2vec(list of syslins): 0.014411

 

Discarding hdf5 group creation is a huge performance win, but it removes any way to create clean hdf5 (e.g. to address subgroups directly).
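The third variant boils down to something like this (a minimal sketch of the idea; the file name and printed label are illustrative, and "filters" is the list built in Antoine’s script):

/////////////////////////////////
// Serialize the whole list into one double vector first, so that save()
// writes a single flat dataset instead of one HDF5 group per list entry.
encoded = var2vec(filters);
tic();
save("filters_vec.sod", "encoded");
printf("save var2vec(list of syslins): %f\n", toc());

// Loading reverses the two steps:
load("filters_vec.sod");           // restores "encoded"
filters2 = vec2var(encoded);       // rebuilds the original list
/////////////////////////////////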

 

Thanks,

 

--

Clément

 

From: users <[hidden email]> On Behalf Of Arvid Rosén
Sent: Tuesday, October 16, 2018 1:01 PM
To: [hidden email]; Users mailing list for Scilab <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow

 

From: users <[hidden email]> on behalf of "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>, Users mailing list for Scilab <[hidden email]>
Date: Tuesday, 16 October 2018 at 09:53
To: "[hidden email]" <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Couldn't you create your own atom package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

 

Yeah, if that were possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for this to be easy to implement, I guess the lists and structures would need to be stored linearly in one consecutive chunk of memory. I don’t know whether that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.

Otherwise, I would need to write some recursive implementation that handles a bunch of different object types. Sounds painful.

 

Cheers,

Arvid


_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users

sample.sce (1K) Download Attachment
mottelet mottelet
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

Hello Clément,

On 18/10/2018 at 14:09, Clément DAVID wrote:

Hello,

 

My 2 cents: this is probably a poor man’s approach, but Xcos offers the var2vec / vec2var functions, which encode any Scilab datatype passed as argument into a double vector. The encoding duplicates the data in memory, so there might be some overhead.

Do you think it would be complicated to write the serialized data to the disk continuously?
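In the crudest form, I am thinking of something like this (just a sketch; var2vec() still duplicates the data in memory first, and a raw dump like this carries no metadata, so it is not a real file format):

/////////////////////////////////
// Dump the serialized vector to a raw binary file in chunks, using
// Scilab's low-level file I/O (mopen/mput/mclose).
vec = var2vec(filters);
fd = mopen("filters.bin", "wb");
chunk = 1e6;
for k = 1:chunk:length(vec)
  mput(vec(k:min(k+chunk-1, length(vec))), "dl", fd);   // "dl": little-endian doubles
end
mclose(fd);
/////////////////////////////////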

 

On my machine, I have these timings using the attached script (Antoine’s one edited):

save list of syslins: 1.361704

save list of vec[]: 0.056788

save var2vec(list of syslins): 0.014411

 

Discarding hdf5 group creation is a huge performance win, but it removes any way to create clean hdf5 (e.g. to address subgroups directly).

 

Thanks,

 

--

Clément

 

From: users [hidden email] On Behalf Of Arvid Rosén
Sent: Tuesday, October 16, 2018 1:01 PM
To: [hidden email]; Users mailing list for Scilab [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

From: users <[hidden email]> on behalf of "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>, Users mailing list for Scilab <[hidden email]>
Date: Tuesday, 16 October 2018 at 09:53
To: "[hidden email]" <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Couldn't you create your own atom package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

 

Yeah, if that were possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for this to be easy to implement, I guess the lists and structures would need to be stored linearly in one consecutive chunk of memory. I don’t know whether that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.

Otherwise, I would need to write some recursive implementation that handles a bunch of different object types. Sounds painful.

 

Cheers,

Arvid



_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users


-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Antoine Monmayrant Antoine Monmayrant
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

In reply to this post by Clément David-2


On 18/10/2018 at 14:09, Clément DAVID wrote:
> Hello,
>
> My 2cents, this is probably a poor man’s approach but Xcos offers vec2var / var2vec functions that encode in a double vector any Scilab datatypes passed as arguments. The encoding duplicates the data in memory so there might be some overhead.
Er, I tried var2vec, but it does not work with structures:
--> typeof(t)
  ans  =
  st

--> var2vec(t)
var2vec: Wrong type for input argument #1: Double, Integer, Boolean, String or List type.

Arghh... so var2vec does not work for just any datatype, right?
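For a struct whose fields are known in advance, I guess one can pack it as a plain list before calling var2vec(). A rough sketch, with the field order hard-coded by hand:

/////////////////////////////////
// var2vec() rejects structs but accepts lists, so pack the fields as
// name/value pairs and rebuild the struct after vec2var().
t = struct("num", 1:3, "den", [1 0.5]);
packed = list("num", t.num, "den", t.den);
vec = var2vec(packed);

p = vec2var(vec);
t2 = struct(p(1), p(2), p(3), p(4));   // same fields and values as t
/////////////////////////////////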

Antoine

>
> On my machine, I have these timings using the attached script (Antoine’s one edited):
> save list of syslins: 1.361704
> save list of vec[]: 0.056788
> save var2vec(list of syslins): 0.014411
>
> Discarding hdf5 groups creation is a huge performance win but remove any way to create clean hdf5 (eg. to address subgroups directly).
>
> Thanks,
>
> --
> Clément
>
> From: users <[hidden email]> On Behalf Of Arvid Rosén
> Sent: Tuesday, October 16, 2018 1:01 PM
> To: [hidden email]; Users mailing list for Scilab <[hidden email]>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
> From: users <[hidden email]<mailto:[hidden email]>> on behalf of "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>
> Reply-To: "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>, Users mailing list for Scilab <[hidden email]<mailto:[hidden email]>>
> Date: Tuesday, 16 October 2018 at 09:53
> To: "[hidden email]<mailto:[hidden email]>" <[hidden email]<mailto:[hidden email]>>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
> Couldn't you create your own atom package that restore this raw memory dump for scilab 6.0?
> I understand why we moved away from this model, but it seems to be key for you.
> There is always a trade-off between portability (and robustness) and raw speed...
>
> Yeah, if that was possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for that to be easy to implement, I guess the lists and structures need to be stored linearly in one consecutive chunk of memory. I don’t know if that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.
> Otherwise, I would need to do some recursive implementation, that handles a bunch of different object types. Sounds painful.
>
> Cheers,
> Arvid

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Clément David-3 Clément David-3
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

Hi Antoine,

That's one point: vec2var has been defined to pass some datatypes from Scilab "ast" (C++ side, data pointers, refcounted) to Scilab "scicos" (C, raw memory allocated once and passed around). Some data structures might not be handled correctly; I was even surprised that mlists worked correctly.

Scilab Structs (or Cells) are missing, as they are more complex datatypes to serialize. Handles are even harder (as you need to list the properties somewhere). Feel free to take a look at the code [1],

[1]: http://cgit.scilab.org/scilab/tree/scilab/modules/scicos/src/cpp/var2vec.cpp?h=6.0#n243

Cheers,
--
Clément

-----Original Message-----
From: antoine monmayrant <[hidden email]>
Sent: Thursday, October 18, 2018 2:47 PM
To: Clément DAVID <[hidden email]>; Users mailing list for Scilab <[hidden email]>
Cc: Clément David <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow



On 18/10/2018 at 14:09, Clément DAVID wrote:
> Hello,
>
> My 2cents, this is probably a poor man’s approach but Xcos offers vec2var / var2vec functions that encode in a double vector any Scilab datatypes passed as arguments. The encoding duplicates the data in memory so there might be some overhead.
Er, I tried var2vec, but it does not work with structures:
--> typeof(t)
  ans  =
  st

--> var2vec(t)
var2vec: Wrong type for input argument #1: Double, Integer, Boolean, String or List type.

Arghh... so var2vec does not work for any datatype right?

Antoine

>
> On my machine, I have these timings using the attached script (Antoine’s one edited):
> save list of syslins: 1.361704
> save list of vec[]: 0.056788
> save var2vec(list of syslins): 0.014411
>
> Discarding hdf5 groups creation is a huge performance win but remove any way to create clean hdf5 (eg. to address subgroups directly).
>
> Thanks,
>
> --
> Clément
>
> From: users <[hidden email]> On Behalf Of Arvid Rosén
> Sent: Tuesday, October 16, 2018 1:01 PM
> To: [hidden email]; Users mailing list for Scilab
> <[hidden email]>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
> From: users
> <[hidden email]<mailto:[hidden email]>
> > on behalf of "[hidden email]<mailto:[hidden email]>"
> <[hidden email]<mailto:[hidden email]>>
> Reply-To:
> "[hidden email]<mailto:[hidden email]>"
> <[hidden email]<mailto:[hidden email]>>, Users
> mailing list for Scilab
> <[hidden email]<mailto:[hidden email]>>
> Date: Tuesday, 16 October 2018 at 09:53
> To: "[hidden email]<mailto:[hidden email]>"
> <[hidden email]<mailto:[hidden email]>>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
> Couldn't you create your own atom package that restore this raw memory dump for scilab 6.0?
> I understand why we moved away from this model, but it seems to be key for you.
> There is always a trade-off between portability (and robustness) and raw speed...
>
> Yeah, if that was possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for that to be easy to implement, I guess the lists and structures need to be stored linearly in one consecutive chunk of memory. I don’t know if that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.
> Otherwise, I would need to do some recursive implementation, that handles a bunch of different object types. Sounds painful.
>
> Cheers,
> Arvid

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
mottelet mottelet
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

Hello again,

On 18/10/2018 at 14:56, Clément David wrote:
> Hi Antoine,
>
> That one point, vec2var has been defined to pass some datatypes from Scilab "ast" (C++ side, data pointers, refcounted) to Scilab "scicos" (C, raw memory allocated once and passed around). Some data structures might not be handled correctly, I was even surprised that mlists worked correctly.
>
> Scilab Struct (or Cell) are missing as they are more complex datatypes to serialize. Handle are even harder (as you need to list the properties somewhere). Feel free to take a look at the code [1],
>
> [1]: http://cgit.scilab.org/scilab/tree/scilab/modules/scicos/src/cpp/var2vec.cpp?h=6.0#n243
Why is the code for structs (lines 242-274) commented out? Is it broken, or something else?

> Cheers,
> --
> Clément
>
> -----Original Message-----
> From: antoine monmayrant <[hidden email]>
> Sent: Thursday, October 18, 2018 2:47 PM
> To: Clément DAVID <[hidden email]>; Users mailing list for Scilab <[hidden email]>
> Cc: Clément David <[hidden email]>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
>
>
> On 18/10/2018 at 14:09, Clément DAVID wrote:
>> Hello,
>>
>> My 2cents, this is probably a poor man’s approach but Xcos offers vec2var / var2vec functions that encode in a double vector any Scilab datatypes passed as arguments. The encoding duplicates the data in memory so there might be some overhead.
> Er, I tried var2vec, but it does not work with structures:
> --> typeof(t)
>    ans  =
>    st
>
> --> var2vec(t)
> var2vec: Wrong type for input argument #1: Double, Integer, Boolean, String or List type.
>
> Arghh... so var2vec does not work for any datatype right?
>
> Antoine
>> On my machine, I have these timings using the attached script (Antoine’s one edited):
>> save list of syslins: 1.361704
>> save list of vec[]: 0.056788
>> save var2vec(list of syslins): 0.014411
>>
>> Discarding hdf5 groups creation is a huge performance win but remove any way to create clean hdf5 (eg. to address subgroups directly).
>>
>> Thanks,
>>
>> --
>> Clément
>>
>> From: users <[hidden email]> On Behalf Of Arvid Rosén
>> Sent: Tuesday, October 16, 2018 1:01 PM
>> To: [hidden email]; Users mailing list for Scilab
>> <[hidden email]>
>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>
>> From: users
>> <[hidden email]<mailto:[hidden email]>
>>> on behalf of "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>
>> Reply-To:
>> "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>, Users
>> mailing list for Scilab
>> <[hidden email]<mailto:[hidden email]>>
>> Date: Tuesday, 16 October 2018 at 09:53
>> To: "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>
>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>
>> Couldn't you create your own atom package that restore this raw memory dump for scilab 6.0?
>> I understand why we moved away from this model, but it seems to be key for you.
>> There is always a trade-off between portability (and robustness) and raw speed...
>>
>> Yeah, if that was possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for that to be easy to implement, I guess the lists and structures need to be stored linearly in one consecutive chunk of memory. I don’t know if that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.
>> Otherwise, I would need to do some recursive implementation, that handles a bunch of different object types. Sounds painful.
>>
>> Cheers,
>> Arvid
> _______________________________________________
> users mailing list
> [hidden email]
> http://lists.scilab.org/mailman/listinfo/users


--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Antoine Monmayrant-2 Antoine Monmayrant-2
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

On 18/10/2018 at 15:00, Stéphane Mottelet wrote:

> Hello again,
>
> On 18/10/2018 at 14:56, Clément David wrote:
>> Hi Antoine,
>>
>> That one point, vec2var has been defined to pass some datatypes from
>> Scilab "ast" (C++ side, data pointers, refcounted) to Scilab "scicos"
>> (C, raw memory allocated once and passed around). Some data
>> structures might not be handled correctly, I was even surprised that
>> mlists worked correctly.
>>
>> Scilab Struct (or Cell) are missing as they are more complex
>> datatypes to serialize. Handle are even harder (as you need to list
>> the properties somewhere). Feel free to take a look at the code [1],
>>
>> [1]:
>> http://cgit.scilab.org/scilab/tree/scilab/modules/scicos/src/cpp/var2vec.cpp?h=6.0#n243
> Why is the code for structs (lines 242--74)  commented out ? Is it
> broken or else ?
If var2vec() / vec2var() could be extended to provide a universal way to
serialize / deserialize really any Scilab variable, that would be really nice.
Could we write a SEP or file a bug as a wish?

Antoine

>
>> Cheers,
>> --
>> Clément
>>
>> -----Original Message-----
>> From: antoine monmayrant <[hidden email]>
>> Sent: Thursday, October 18, 2018 2:47 PM
>> To: Clément DAVID <[hidden email]>; Users
>> mailing list for Scilab <[hidden email]>
>> Cc: Clément David <[hidden email]>
>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>
>>
>>
>> On 18/10/2018 at 14:09, Clément DAVID wrote:
>>> Hello,
>>>
>>> My 2cents, this is probably a poor man’s approach but Xcos offers
>>> vec2var / var2vec functions that encode in a double vector any
>>> Scilab datatypes passed as arguments. The encoding duplicates the
>>> data in memory so there might be some overhead.
>> Er, I tried var2vec, but it does not work with structures:
>> --> typeof(t)
>>    ans  =
>>    st
>>
>> --> var2vec(t)
>> var2vec: Wrong type for input argument #1: Double, Integer, Boolean,
>> String or List type.
>>
>> Arghh... so var2vec does not work for any datatype right?
>>
>> Antoine
>>> On my machine, I have these timings using the attached script
>>> (Antoine’s one edited):
>>> save list of syslins: 1.361704
>>> save list of vec[]: 0.056788
>>> save var2vec(list of syslins): 0.014411
>>>
>>> Discarding hdf5 groups creation is a huge performance win but remove
>>> any way to create clean hdf5 (eg. to address subgroups directly).
>>>
>>> Thanks,
>>>
>>> --
>>> Clément
>>>
>>> From: users <[hidden email]> On Behalf Of Arvid Rosén
>>> Sent: Tuesday, October 16, 2018 1:01 PM
>>> To: [hidden email]; Users mailing list for Scilab
>>> <[hidden email]>
>>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>>
>>> From: users
>>> <[hidden email]<mailto:[hidden email]>
>>>> on behalf of "[hidden email]<mailto:[hidden email]>"
>>> <[hidden email]<mailto:[hidden email]>>
>>> Reply-To:
>>> "[hidden email]<mailto:[hidden email]>"
>>> <[hidden email]<mailto:[hidden email]>>, Users
>>> mailing list for Scilab
>>> <[hidden email]<mailto:[hidden email]>>
>>> Date: Tuesday, 16 October 2018 at 09:53
>>> To: "[hidden email]<mailto:[hidden email]>"
>>> <[hidden email]<mailto:[hidden email]>>
>>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>>
>>> Couldn't you create your own atom package that restore this raw
>>> memory dump for scilab 6.0?
>>> I understand why we moved away from this model, but it seems to be
>>> key for you.
>>> There is always a trade-off between portability (and robustness) and
>>> raw speed...
>>>
>>> Yeah, if that was possible, I would certainly do it. We already have
>>> a bunch of C/C++ binaries that we compile and link dynamically, but
>>> for that to be easy to implement, I guess the lists and structures
>>> need to be stored linearly in one consecutive chunk of memory. I
>>> don’t know if that is the case. Anyone? C++ integrations and
>>> gateways are very poorly documented at the moment.
>>> Otherwise, I would need to do some recursive implementation, that
>>> handles a bunch of different object types. Sounds painful.
>>>
>>> Cheers,
>>> Arvid
>> _______________________________________________
>> users mailing list
>> [hidden email]
>> http://lists.scilab.org/mailman/listinfo/users
>>
>
>

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

  Antoine Monmayrant LAAS - CNRS
  7 avenue du Colonel Roche
  BP 54200
  31031 TOULOUSE Cedex 4
  FRANCE

  Tel:+33 5 61 33 64 59
 
  email : [hidden email]
  permanent email : [hidden email]

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Clément David-2 Clément David-2
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

In reply to this post by mottelet

Hello Stéphane,

 

TL;DR: HDF5 is a cross-platform, cross-language, portable file format used by almost all scientific software these days. Please use this sane default!

 

Writing a custom serialization scheme (like the one provided by var2vec / vec2var) might not be complicated to implement; however, the hard part is maintaining and describing a serialization format that can be used in the long term.

 

In Scilab 5, the “stack” save and load functions were almost trivial, as they mapped memory directly to disk; the format used was “the stack” itself, so it was known and used everywhere (even for custom string encoding). The vec2var serialization is only used internally (to pass block parameters around); it does not follow any described format, is not validated against any documentation, and is not portable; in the long term, I won’t promise it to be stable. Implementing your own serialization scheme will probably lead your software into trouble. Really, it isn’t easy in the long term! The HDF5 format is documented, its serialized data are browsable (through hdfview), and it does not impose low-level requirements.

 

To me, the issue is really a performance bug. We might find a way to fix it within Scilab rather than provide a workaround (with custom encodings). The hdf5 library is a big one; maybe, with a clever understanding of its internal serialization, we can find a better execution path for this use case (without changing the file format).

 

Thanks,

 

--

Clément

 

From: users <[hidden email]> On Behalf Of Stéphane Mottelet
Sent: Thursday, October 18, 2018 2:39 PM
To: [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Hello Clément,

On 18/10/2018 at 14:09, Clément DAVID wrote:

Hello,

 

My 2 cents: this is probably a poor man’s approach, but Xcos offers the var2vec / vec2var functions, which encode any Scilab datatype passed as argument into a double vector. The encoding duplicates the data in memory, so there might be some overhead.

Do you think it would be complicated to write the serialized data to the disk continuously?

 

On my machine, I have these timings using the attached script (Antoine’s one edited):

save list of syslins: 1.361704

save list of vec[]: 0.056788

save var2vec(list of syslins): 0.014411

 

Discarding hdf5 group creation is a huge performance win, but it removes any way to create clean hdf5 (e.g. to address subgroups directly).

 

Thanks,

 

--

Clément

 

From: users [hidden email] On Behalf Of Arvid Rosén
Sent: Tuesday, October 16, 2018 1:01 PM
To: [hidden email]; Users mailing list for Scilab [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

 

From: users <[hidden email]> on behalf of "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>, Users mailing list for Scilab <[hidden email]>
Date: Tuesday, 16 October 2018 at 09:53
To: "[hidden email]" <[hidden email]>
Subject: Re: [Scilab-users] HDF5 save is super slow

 

Couldn't you create your own atom package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

 

Yeah, if that were possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for this to be easy to implement, I guess the lists and structures would need to be stored linearly in one consecutive chunk of memory. I don’t know whether that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.

Otherwise, I would need to write some recursive implementation that handles a bunch of different object types. Sounds painful.

 

Cheers,

Arvid




_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users

 

-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Clément David-2 Clément David-2
Reply | Threaded
Open this post in threaded view
|

Re: HDF5 save is super slow

In reply to this post by mottelet
Hello Stéphane,

It was probably commented out because we have no easy way to extract such data using only C constructs (from a Scicos block). It might be possible to uncomment it and check the counterpart side (vec2var.cpp) to ensure it works correctly.

Thanks,

--
Clément

-----Original Message-----
From: users <[hidden email]> On Behalf Of Stéphane Mottelet
Sent: Thursday, October 18, 2018 3:01 PM
To: [hidden email]
Subject: Re: [Scilab-users] HDF5 save is super slow

Hello again,

On 18/10/2018 at 14:56, Clément David wrote:

> Hi Antoine,
>
> That one point, vec2var has been defined to pass some datatypes from Scilab "ast" (C++ side, data pointers, refcounted) to Scilab "scicos" (C, raw memory allocated once and passed around). Some data structures might not be handled correctly, I was even surprised that mlists worked correctly.
>
> Scilab Struct (or Cell) are missing as they are more complex datatypes
> to serialize. Handle are even harder (as you need to list the
> properties somewhere). Feel free to take a look at the code [1],
>
> [1]:
> http://cgit.scilab.org/scilab/tree/scilab/modules/scicos/src/cpp/var2vec.cpp?h=6.0#n243
Why is the code for structs (lines 242-274) commented out? Is it broken, or something else?

> Cheers,
> --
> Clément
>
> -----Original Message-----
> From: antoine monmayrant <[hidden email]>
> Sent: Thursday, October 18, 2018 2:47 PM
> To: Clément DAVID <[hidden email]>; Users
> mailing list for Scilab <[hidden email]>
> Cc: Clément David <[hidden email]>
> Subject: Re: [Scilab-users] HDF5 save is super slow
>
>
>
> On 18/10/2018 at 14:09, Clément DAVID wrote:
>> Hello,
>>
>> My 2cents, this is probably a poor man’s approach but Xcos offers vec2var / var2vec functions that encode in a double vector any Scilab datatypes passed as arguments. The encoding duplicates the data in memory so there might be some overhead.
> Er, I tried var2vec, but it does not work with structures:
> --> typeof(t)
>    ans  =
>    st
>
> --> var2vec(t)
> var2vec: Wrong type for input argument #1: Double, Integer, Boolean, String or List type.
>
> Arghh... so var2vec does not work for any datatype right?
>
> Antoine
>> On my machine, I have these timings using the attached script (Antoine’s one edited):
>> save list of syslins: 1.361704
>> save list of vec[]: 0.056788
>> save var2vec(list of syslins): 0.014411
>>
>> Discarding hdf5 groups creation is a huge performance win but remove any way to create clean hdf5 (eg. to address subgroups directly).
>>
>> Thanks,
>>
>> --
>> Clément
>>
>> From: users <[hidden email]> On Behalf Of Arvid Rosén
>> Sent: Tuesday, October 16, 2018 1:01 PM
>> To: [hidden email]; Users mailing list for Scilab
>> <[hidden email]>
>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>
>> From: users
>> <[hidden email]<mailto:[hidden email]
>> >
>>> on behalf of "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>
>> Reply-To:
>> "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>,
>> Users mailing list for Scilab
>> <[hidden email]<mailto:[hidden email]>>
>> Date: Tuesday, 16 October 2018 at 09:53
>> To: "[hidden email]<mailto:[hidden email]>"
>> <[hidden email]<mailto:[hidden email]>>
>> Subject: Re: [Scilab-users] HDF5 save is super slow
>>
>> Couldn't you create your own atom package that restore this raw memory dump for scilab 6.0?
>> I understand why we moved away from this model, but it seems to be key for you.
>> There is always a trade-off between portability (and robustness) and raw speed...
>>
>> Yeah, if that was possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for that to be easy to implement, I guess the lists and structures need to be stored linearly in one consecutive chunk of memory. I don’t know if that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.
>> Otherwise, I would need to do some recursive implementation, that handles a bunch of different object types. Sounds painful.
>>
>> Cheers,
>> Arvid
> _______________________________________________
> users mailing list
> [hidden email]
> http://lists.scilab.org/mailman/listinfo/users


--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users