Machine Learning Toolbox

Caioc2

Machine Learning Toolbox

Hi,


I have been thinking about the usability of the toolbox. Independent of which algorithms we are going to have, it would be interesting to have some simplified structure (like TensorFlow).

Although it is a lot of work to have such a structure (data, model, cost function, minimizer), it would make the toolbox easy to use and extend, with minimal impact on usability.

IMHO, this is something that should be defined before any coding starts, and also well explained to the student.

I would like to hear what you think, so we can start a discussion.
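
As a rough illustration of the four-part structure mentioned above (data, model, cost function, minimizer), here is a minimal sketch in Python. It is not a proposed API: every name in it (`linear_model`, `mse_cost`, `gradient_descent`, `fit`) is hypothetical, and the finite-difference minimizer is only the simplest thing that works, chosen to keep the example short.

```python
from typing import Callable, List, Sequence, Tuple

# Every name here is hypothetical and chosen for illustration only.
Params = List[float]
Sample = Tuple[float, float]  # (input x, target y)


def linear_model(params: Params, x: float) -> float:
    """Model: maps parameters and one input to a prediction."""
    w, b = params
    return w * x + b


def mse_cost(model: Callable, params: Params, data: Sequence[Sample]) -> float:
    """Cost function: mean squared error of the model over the data."""
    return sum((model(params, x) - y) ** 2 for x, y in data) / len(data)


def gradient_descent(cost: Callable[[Params], float], start: Params,
                     steps: int = 2000, lr: float = 0.05,
                     eps: float = 1e-6) -> Params:
    """Minimizer: gradient descent using central finite-difference gradients."""
    params = list(start)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = list(params), list(params)
            up[i] += eps
            down[i] -= eps
            grad.append((cost(up) - cost(down)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def fit(data, model, cost, minimizer, start):
    """Single entry point: the user supplies the four pieces and nothing else."""
    return minimizer(lambda p: cost(model, p, data), start)


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
w, b = fit(data, linear_model, mse_cost, gradient_descent, [0.0, 0.0])
print(round(w, 3), round(b, 3))
```

The point of the sketch is that each of the four pieces can be swapped independently: a different model or cost function can be passed to `fit` without touching the minimizer.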


Best,
Caio SOUZA

_______________________________________________
dev mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/dev
Amanda Osvaldo

Re: Machine Learning Toolbox

An idea.

Bindings for machine learning frameworks, not necessarily a full machine learning implementation.

Intel, for example, has an optimized Theano implementation on GitHub for Intel Xeon and Intel Xeon Phi processors.

Binding Scilab to a full, optimized machine learning implementation would allow users to use Scilab from prototyping through to deployment of production software.

-- Amanda Osvaldo


On Wed, 2017-04-26 at 14:32 -0300, Caio Souza wrote:
[...]
Amanda Osvaldo

Re: Machine Learning Toolbox

In reply to this post by Caioc2
Hi Caio, sorry for the late reply.

I think we should ask ourselves what Scilab's focus is and who its audience is.
I feel we lack knowledge of what Scilab users are looking for.

Take me, for example: I want to do everything from prototyping to running the script on hundreds of Intel Xeon servers with the least possible effort.
Even with less effort than it would take if the script were written in Python.

I am sure that new data structures will expand the use of Scilab.

But what advantage will this bring to users?
Python, for example, already has optimized data structures and libraries.

-- Amanda Osvaldo


Caioc2

Re: Machine Learning Toolbox

Hi,

Yann Debray, I think it is great.

Amanda Osvaldo, my first thought is about simplicity and organization; I totally agree that the less effort needed to use it, the better. Although I cited TensorFlow as an example, my idea is not to bring in new data structures, but to follow their organization.

We would pass the data and a model (function) and let Scilab do the work.

Writing it from scratch, we have the freedom to make it the way we want; of course, the performance wouldn't match the best libraries out there. On the other hand, using a library we could get much better performance, but with some restrictions on organization and simplicity.

I'm open to any approach, because I'm a bit biased when it comes to speaking about what the users want/need. :)


On Thu, May 18, 2017 at 1:01 PM, Yann Debray <[hidden email]> wrote:

Dear Caio, Dhruv and Amanda,

I would like to include my colleague Philippe Saadé in the exchanges on Machine Learning for Scilab.

He is an experienced mathematician working with us at ESI Group, and has an interesting vision on the subject.

He will be scientific advisor and mentor for a joint internship on machine learning starting mid-June.

[hidden email]: Could you maybe share with us your view on the subject?

We can keep this exchange public if it is alright with you all, since I believe our success on the subject will depend on our capacity to centralize and merge our community efforts.

You can all collaborate on the project on our forge:

http://forge.scilab.org/index.php/p/machine-learning-toolbox/

Yours

Yann @ Scilab

From: Amanda Osvaldo <[hidden email]>
Date: Friday, 28 April 2017, 01:03
To: List dedicated to the development of Scilab <[hidden email]>, Yann Debray <[hidden email]>, Dhruv Khattar <[hidden email]>
Subject: Re: [Scilab-Dev] Machine Learning Toolbox

 

[...]
Amanda Osvaldo

Re: Machine Learning Toolbox

In reply to this post by Amanda Osvaldo
Hi everybody, may I ask some questions?

First of all, I really agree that Scilab needs a machine learning toolbox.

However, I'm pretty critical of Scilab's limitations.
I see a lot of potential in the software, but its infrastructure needs reform.


So, my questions.

How large a training dataset are we talking about in Scilab?
Even with TensorFlow compatibility, if the whole dataset has to be loaded into RAM, I fear the toolbox's utility will be very limited.
In other words: will the toolbox be able to handle a 250 GB dataset, or just a few GB on a desktop?

Have I read this right?
Are we talking about integrating Scilab with TensorFlow or scikit-learn?
I think it's a good idea; I just want to know if I'm interpreting it correctly.

Does anybody have an idea of how to handle this project from a software engineering perspective?
Just to ensure testing and code quality.


-- Amanda Osvaldo


Dhruv Khattar

Re: Machine Learning Toolbox

In reply to this post by Caioc2
Hi all,

Caio, are you talking about something like an API to call TensorFlow functions in Scilab?
I think it would be better to implement it from scratch, as we can then make functions that are easier for our users to use. We can document it as well.

Dhruv
Tan Chin Luh-2

Re: Machine Learning Toolbox

In reply to this post by Amanda Osvaldo
Hi all Scilab and machine learning enthusiasts,

Great to have this topic on the mailing list, as I have also been
exploring deep learning recently.

From my point of view, there are a few possibilities for building the ML
toolbox in Scilab, namely:
1. Using Scilab matrix operations (Pro: fast for the parts which
allow vectorization. Con: memory issues. Not sure about GPU support.)
2. Using a C/C++ API, such as Caffe, Caffe2, dlib, tiny-dnn...?
3. Using a Python API through PIMS, such as Python with TensorFlow, Keras,
dlib...?
4. Using a Java interface through JIMS, such as...? (I came across a few
but never explored them.)

For small to medium-sized networks such as a conventional FFBP, I think
method 1 would have the advantage, as batch processing could speed up
training and the code is highly "readable" for non-hardcore
programmers. The network weights can be represented simply by
matrices (1-2 hidden layers), which lets users easily visualize the
"internal beauty" of the trained network with Scilab's visualization features.
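
To make the batch-processing point concrete, here is a small sketch of a forward pass in which the whole batch goes through each layer as one matrix product. It is written in Python with plain lists standing in for Scilab matrices; the helper names (`matmul`, `sigmoid_all`, `forward`) and the weight values are made up for illustration, and bias terms are left out to keep it short.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def sigmoid_all(m):
    """Apply the logistic function to every entry of a matrix."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]


def forward(batch, w_hidden, w_out):
    """Forward pass for a whole batch: one matrix product per layer."""
    hidden = sigmoid_all(matmul(batch, w_hidden))  # samples x hidden units
    return sigmoid_all(matmul(hidden, w_out))      # samples x outputs


# 3 samples with 2 features each; the weights are arbitrary example values.
batch = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
w_hidden = [[0.5, -0.5, 1.0], [1.0, 0.5, -1.0]]   # 2 inputs -> 3 hidden units
w_out = [[1.0], [-1.0], [0.5]]                    # 3 hidden units -> 1 output
outputs = forward(batch, w_hidden, w_out)
print(outputs)
```

The same shape also shows the memory concern: `batch` and `hidden` hold every sample at once, so their size grows with the number of samples processed together.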

However, when we move to CNNs or other deep learning networks, I am not
sure whether we could still leverage this advantage. Or at least, it
won't be a "jumpstart" way to build a new ML module.

Given this, a quick "jumpstart" could be to look into methods 2-4.
Then another issue might appear. Each of these has its own
class/structure to hold the complicated deep network architecture, so
how are we going to interface this to Scilab? Should we:
1. Use the objects (Java objects, C++ class objects in Scilab) to access
the network created or loaded through the API?
2. Convert the objects into a Scilab mlist so that it is more readable?

Then, from the Scilab programmer's point of view, if we used
JIMS or PIMS, the Scilab code would end up looking very much like
Python or Java, unless we wrote further macros to wrap all of this
in Scilab style. So far I think the C/C++ API might be the most
"seamlessly" integrated into Scilab, as we could use parts of
the C/C++ libraries while the rest works in Scilab.

Finally, as for the GPU usage concern, using libraries could solve this,
depending on the library being used.

Forgive me if I made any mistakes, just my 2 cents.

Regards,
Tan Chin Luh
Caioc2

Re: Machine Learning Toolbox

Hi,

Thanks for the input.

We have both possibilities, from scratch and/or using a library; it is still an open topic.
For now, whichever way we choose, I think we would be limited to a few GB of data, but we can extend it later if we start with this idea in mind. IMHO, making anything work with hundreds of GB of data could be a separate project on its own :)

My idea is to have a Scilab interface that defines how the toolbox is organized and how it can be used/called; how it works in the backend could be Scilab code, C/C++ code, or calls to other libraries.
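
One way to picture this separation between a fixed interface and interchangeable backends is the sketch below. It is Python, not Scilab, and the names (`register_backend`, `fit_slope`, the backend labels "pure" and "library") are invented for the example; `math.fsum` merely stands in for "calling another library".

```python
import math
from typing import Callable, Dict, List

_BACKENDS: Dict[str, Callable[[List[float], List[float]], float]] = {}


def register_backend(name):
    """Decorator that records an implementation under a backend name."""
    def decorator(fn):
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("pure")
def _fit_slope_pure(xs, ys):
    """Least-squares slope through the origin, written directly."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


@register_backend("library")
def _fit_slope_library(xs, ys):
    """Same computation, delegating the summation to a library routine."""
    return (math.fsum(x * y for x, y in zip(xs, ys))
            / math.fsum(x * x for x in xs))


def fit_slope(xs, ys, backend="pure"):
    """The only function users call; the backend is an implementation detail."""
    try:
        impl = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return impl(xs, ys)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(fit_slope(xs, ys), fit_slope(xs, ys, backend="library"))
```

With this layout, user code written against `fit_slope` does not change when a backend is added or replaced.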


> Does anybody have an idea of how to handle this project from a software engineering perspective?
> Just to ensure testing and code quality.

1) Define our interface (the most important thing); here I think whoever uses the feature most should choose.
2) Start with simple algorithms, but working 100% inside our interface.
3) Demos, tests and documentation for everything.
4) Repeat 2 and 3.

After doing 2 and 3 for the first time, we will be able to advance faster.


> Then, from the Scilab programmer's point of view, if we used JIMS or PIMS, the Scilab code would end up looking very much like Python or Java, unless we wrote further macros to wrap all of this in Scilab style.

I'm not sure what would be more convenient for users, but I would prefer to wrap everything inside Scilab to make a simpler interface.


> So far I think the C/C++ API might be the most "seamlessly" integrated into Scilab, as we could use parts of the C/C++ libraries while the rest works in Scilab.

If we have our own interface, using C/C++, Python or Java would make a difference only in performance, and all those libraries have their core in C/C++, so I would choose simplicity for now.


> Finally, as for the GPU usage concern, using libraries could solve this, depending on the library being used.

I see GPU support as an option once we have something solid already working on the CPU.



Best,
Caio SOUZA

Amanda Osvaldo

Re: Machine Learning Toolbox

Hi everybody. :-D

Well, I think here we are starting "The Hopeless Machine Learning Team". :-D

It's good news for the whole community; we should celebrate. :-D

I'm working on a series of studies to determine whether Scilab can meet the requirements for my job.
Unfortunately, this machine learning toolbox does not fit those requirements, so I cannot help much in the long run. However, perhaps I can do something; I will stay open to it. :-D

A machine learning toolbox is very important; we need to do it. :-O

-- Amanda Osvaldo


On Fri, 2017-05-19 at 11:37 -0300, Caio Souza wrote:
[...]
Tan Chin Luh-2

Re: Machine Learning Toolbox

Hi,

Just to share a little of my experience in this field: I think
it is definitely possible to move on to more advanced networks,
despite some negative feedback.

Firstly, while working on the neural network module
(https://atoms.scilab.org/toolboxes/neuralnetwork/2.0), all the code
was written in Scilab, with vectorized code for batch learning to increase
speed. However, the trade-off is that training cannot handle
large data sets, especially with the LM algorithm. This could be improved
by using online training, which is slower but uses less memory.
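
A minimal sketch of the online-training idea follows, in Python. It is not taken from the neural network module and does not use the LM algorithm: it is plain stochastic gradient descent on a linear model, with a generator (`sample_stream`, a made-up name) standing in for records read one at a time from disk. Only the current sample and the two parameters are ever held in memory.

```python
import random


def sample_stream(n, seed=0):
    """Yield (x, y) pairs one at a time; stands in for reading records from disk."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x - 0.5  # noiseless target y = 3x - 0.5


def train_online(stream, lr=0.1):
    """One pass of stochastic gradient descent on the model y = w*x + b."""
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y   # prediction error for this sample only
        w -= lr * err * x       # gradient of 0.5*err**2 with respect to w
        b -= lr * err           # gradient of 0.5*err**2 with respect to b
    return w, b


w, b = train_online(sample_stream(20000))
print(round(w, 6), round(b, 6))
```

Each update touches a single sample, which is why memory use stays constant while the number of parameter updates, and hence the running time, grows with the size of the data set.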

While exploring modules like SVM and fuzzy logic (perhaps not ML? AI?), both
modules use DLLs from third parties and are seamlessly integrated into
Scilab. Both modules perform well with my "not so big" datasets.

Moving towards deeper networks, I just used the dnn importer from OpenCV
3.2 to import a Caffe model and try to classify an image. The next
bottleneck is whether to keep the loaded model in the shared library,
to be referred to by Scilab later through a pointer, or to import the
model into a Scilab list, which could then be read by the gateway when needed.

Thanks.

regards,
Chin Luh


Amanda Osvaldo

Re: Machine Learning Toolbox

In reply to this post by Amanda Osvaldo

Hi everyone, I think we have nothing on this yet. :-O
So... does anybody have a plan? :-O

-- Amanda Osvaldo


On Mon, 2017-05-29 at 00:04 +0200, Philippe Saadé (ESI INENDI) wrote:
Dear All,

I took some time to jump into the discussion because I wanted to get a better understanding of the current status of your discussions, a better understanding of Mandar's profile and expertise, and also of what is easy/hard to do with Scilab to meet some serious and legitimate demands from Scilab's users.

As I am the last to join the discussion, I will deliberately reset my mind and restart the discussion with you so that we can try to structure the project and converge quickly on an achievable list of goals for this GSoC.

For that purpose, I would like to list a series of questions on which we need to reach shared answers and a common understanding.
This should serve as a basis to decide what to do, how and when.

So, feel free to fill in...

  1. Scilab has a way to use Python: PIMS. It was originally created in August 2014.
    1. How mature do you think it is?
    2. How compatible is it with the potential need to use existing Python-based ML frameworks from within Scilab?
    3. How easy/hard would it be for Mandar to build on what has been done here so that using the ML frameworks from Scilab works well?
  2. Data Management. I think the question of the actual size of the data that Scilab's users might handle is key. Many ML methods (not necessarily "Deep" ones) need to be trained on large data sets. This doesn't mean that everything has to sit in RAM during training or general pre-processing, but it must be possible to handle large data sets.
    1. Do we use only "pointers" from Scilab to give access to the real data structures that are used by the ML frameworks?
    2. Do we want to integrate some or all of the useful data structures as native Scilab data structures?
    3. Do we consider that the execution of ML algorithms should be designed and architected so that it is done "remotely" from the perspective of Scilab?
  3. Use Cases. We need to list some use cases that are typical of what Scilab users do and that make the use of ML an exciting prospect. If we cannot demonstrate that ML within Scilab is possible, easy and really useful on these use cases, I am not sure we will have reached the main target of this GSoC opportunity.
    Can we list use cases together?
    I will start by listing some items, but your input is important here.
    1. Image classification
    2. Object recognition in images and video
    3. Data-driven industrial process control
    4. Anomaly detection
    5. Dimensionality / model reduction
    6. etc.

For sure, these questions do not cover all the important topics for this "ML Toolbox" project, but this is a way to bootstrap.
As we know, we need to be active and efficient for the 30th of May!

Thanks for your feedback and feel free to share your point of view.

Best regards,

Philippe SAADÉ

 

On 18/05/2017 at 21:50, Amanda Osvaldo wrote:
[...]
Caioc2

Re: Machine Learning Toolbox

Hi,

I have received it; I'm planning to answer all those questions by the end of the week. Sorry for the delay.

On Wed, May 31, 2017 at 11:25 AM, Philippe Saade <[hidden email]> wrote:
Hi all
It looks like you didn't receive my first email?

On 31 May 2017 at 16:20, Amanda Osvaldo <[hidden email]> wrote:


I everyone, I think we have nothing about it. <face-surprise.png>
So ... somebody have a plan ? <face-surprise.png>

-- Amanda Osvaldo


On Mon, 2017-05-29 at 00:04 +0200, Philippe Saadé (ESI INENDI) wrote:
Dear All,

I took some time to jump in the discussion due to the fact that I wanted to get a better understanding of the current status of your discussions, a better understanding of Mandar's profile and expertise, and also what is easy/hard to do with Scilab to meet some serious and legitimate demands from Scilab's users.

As I am the last to join the discussion, I will voluntarily reset my mind and start again the discussions with you so that we can try to structure the project and converge quickly on an achievable list of goals for this GSoC.

For that purpose, I would like to list a series of questions on which we need to share a mutual list of answers and common understanding.
This should serve as a basis to decide what to do, how and when.

So, feel free to fill in...

  1. Scilab has a way to use Python : PIMS. Originaly created in August 2014.
    1. How mature do you think it is?
    2. How compatible is it with the potential need of using existing Python-based ML framework from within Scilab?
    3. How easy/hard would it be for Mandar to pursue what has been done here so that using the ML frameworks from Scilab would be working well?
  2. Data Management. I think the questions related to the actual size of the data that would be possibly handled by Scilab's users is key. Many ML methods (not necessarily "Deep" ones) need to be trained on large data sets. It doesn't mean that everything has to sit in RAM during training or general pre-processing but it must be possible to handle large data sets.
    1. Do we use only "pointers" from Scilab to give an access to the real data structures that are used by the ML frameworks?
    2. Do we want to integrate part or all of the data structures that are useful, as native Scilab data structures?
    3. Do we consider that the execution of ML algorithms should be designed and architectured in a way that it is done "remotely" from the perspective of Scilab?
  3. Use Cases. We need to list some use cases that are typical of what Scilab users do and that make the usage of ML an exciting perspective. If we can not demonstrate that ML within Scilab is possible, easy and really useful on these Use cases, I am not sure we will have reached the main target of that GSoC opportunity.
    Can we list use cases together?
    I will start by items some but your input is important here.
    1. image classification
    2. object recognition in images and video
    3. Data Driven Industrial Process Control
    4. Anomaly Detection
    5. Dimensionality / Model reduction
    6. etc.

For sure, these questions do not cover all the important topics for this "ML Toolbox" project but this is a way to bootstrap.
As we know, we need to be active and efficient for the 30th of May!

Thanks for your feedback and feel free to share your point of view.
 

 

Cordialement – Best regards,

 

Philippe SAADÉ

 

Le 18/05/2017 à 21:50, Amanda Osvaldo a écrit :
Hi everybody, can I made some questions ?

First, at all, I really agree that SciLab needs a Machine Learning toolbox.

However, I'm pretty critical about Scilab in your limitations.
I see very potential in the software but require a reform in your infrastructure.


So, my questions.

How large are we talking about the training dataset in scilab ?
Even with Tensorflow compatibility if you need to put all the dataset into the RAM I fear the toolbox utility will be very limited.
In another words: The toolbox will can handle a 250GB dataset or just a few GBs from a desktop ?

Have I read this right?
Are we talking about integrating Scilab with TensorFlow or scikit-learn?
I think it's a good idea; I just want to know if I'm interpreting this correctly.

Does anybody have an idea of how to handle this project from a software engineering perspective?
Just to ensure testing and code quality.


-- Amanda Osvaldo


On Thu, 2017-05-18 at 16:01 +0000, Yann Debray wrote:

Dear Caio, Dhruv and Amanda,

I would like to include my colleague Philippe Saadé in the exchanges on Machine Learning for Scilab.

He is an experienced mathematician working with us at ESI Group, and has an interesting vision on the subject.

He will be the scientific advisor and mentor for a joint internship on machine learning starting mid-June.

[hidden email]: Could you maybe share with us your view on the subject?

We can keep this exchange public if that is all right with you all, since I believe our success on the subject will depend on our capacity to centralize and merge our community efforts.

You can all collaborate on the project on our forge:

http://forge.scilab.org/index.php/p/machine-learning-toolbox/

Yours

Yann @ Scilab

 

From: Amanda Osvaldo [hidden email]
Date: Friday, 28 April 2017 at 01:03
To: List dedicated to the development of Scilab [hidden email], Yann Debray [hidden email], Dhruv Khattar [hidden email]
Subject: Re: [Scilab-Dev] Machine Learning Toolbox

Hi Caio, sorry for the late reply.

I think we should ask ourselves what Scilab's focus and audience are.

I feel we lack knowledge of what Scilab users are looking for.

I, for example, want to do everything from prototyping to running the script on hundreds of Intel Xeon servers with the least possible effort.

Ideally with even less effort than it would take if the script were written in Python.

I am sure that new data structures will expand the use of Scilab.

But what advantage will this bring to users?

Python, for example, already has optimized data structures and libraries.

-- Amanda Osvaldo

 

 

On Wed, 2017-04-26 at 14:32 -0300, Caio Souza wrote:

Hi,

I have been thinking about the usability of the toolbox and, independent of which algorithms we are going to have, it would be interesting to have some simplified structure (like TensorFlow).

Although building such a structure (data, model, cost function, minimizer) is a lot of work, it would make the toolbox easy to use and extend, with minimal impact on usability.
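
As an illustration of the four-part structure (data, model, cost function, minimizer), here is a minimal sketch in plain Python. It only shows how the pieces could be kept independent and then combined. Every name in it (`linear_model`, `mean_squared_error`, `gradient_descent`, `train`) is hypothetical, and it does not reproduce TensorFlow's API or any existing Scilab function.

```python
def linear_model(params, x):
    """Model: weighted sum of the features plus a bias (the last parameter)."""
    return sum(p * xi for p, xi in zip(params, x)) + params[-1]


def mean_squared_error(model, params, data):
    """Cost function: average squared prediction error over the data."""
    return sum((model(params, x) - y) ** 2 for x, y in data) / len(data)


def gradient_descent(cost, model, params, data, lr=0.1, steps=500, eps=1e-6):
    """Minimizer: plain gradient descent using central finite differences."""
    params = list(params)
    for _ in range(steps):
        grad = []
        for j in range(len(params)):
            up = params[:j] + [params[j] + eps] + params[j + 1:]
            down = params[:j] + [params[j] - eps] + params[j + 1:]
            grad.append((cost(model, up, data) - cost(model, down, data)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


def train(data, model, cost, minimizer, initial_params):
    """Glue code: the four pieces are chosen independently and combined here."""
    return minimizer(cost, model, initial_params, data)


# Data: (features, target) pairs sampled from y = 2x + 1.
data = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0), ([3.0], 7.0)]
fitted = train(data, linear_model, mean_squared_error, gradient_descent, [0.0, 0.0])
```

Because `train` only depends on the call signatures, swapping in another cost function or minimizer would not require touching the model or the data, which is the kind of extensibility the paragraph above argues for.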

 

IMHO, this is something that should be defined before any coding starts, and also well explained to the student.

I would like to hear what you think, so we can start a discussion.

Best,

Caio SOUZA

Amanda Osvaldo

Re: Machine Learning Toolbox

In reply to this post by Amanda Osvaldo

Hi, sorry for the delay. :-O
I think the first e-mail was a guideline. :-O

--- Amanda Osvaldo

On Wed, 2017-05-31 at 14:25 +0000, Philippe Saade wrote:
Hi all,
It looks like you didn't receive my first email?

Sent from my mobile

On 31 May 2017 at 16:20, Amanda Osvaldo <[hidden email]> wrote:


Hi everyone, I think we have nothing on this yet. :-O
So... does somebody have a plan? :-O

-- Amanda Osvaldo



