[GSoC2011] Remote file access

M Stefan
[GSoC2011] Remote file access

Hello. I'm Stefan, a 19-year-old student at the Faculty of Computer
Science in Iasi, Romania. I started programming when I was quite young
(about 9) and have enjoyed it ever since. I've worked with many
programming languages (C, C++, Python, JavaScript, PHP, C#, etc.) on
various projects (mostly as a freelancer and on personal projects).
This is my second time participating in GSoC (I also took part last
year for Boost).

I would really like the opportunity to write code for a major
open-source community such as yours. It was great last year, and I'm
sure it will be great this year. Admittedly, the task I'm proposing
isn't very complicated, and even though I'm reasonably good at math,
I'm not sure I have the skills yet (I should at least have read
Numerical Recipes in C) to work on your more complex proposals (maybe
next year?). I'm familiar with SVN, less so with Git (but that
shouldn't be a problem; I was planning to get acquainted with it one of
these days anyway). I haven't worked with Scilab much before, except
for simple tasks such as plotting 1D or 2D functions, some matrix
operations and other simple computations.

I would like to extend the functionality of Scilab's fileio module by
adding network support for protocols such as HTTP and FTP. Internally
I will use libcurl (probably the best available choice), and the
interface will be integrated into fileio's: HTTP/FTP connections will
be represented by the same kind of file descriptors as ordinary files.
The functions in the fileio module that take filename parameters will
also accept URL strings (these will be passed to CURLOPT_URL, which
implements RFC 2396 URLs [see:
http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTURL]).
Additionally, functions such as
http(url[,post_data[,cookie[,username,password]]]) and
ftp(url[,username[,password]]) will be added; they will return file
descriptors usable with the existing read/write functions.
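
To give an idea of how little plumbing the libcurl side needs, here is
a minimal, self-contained sketch of what such a call boils down to
(the URL is just an example, and none of this is existing Scilab
code):

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
        CURL *handle;
        CURLcode rc;

        if (curl_global_init(CURL_GLOBAL_DEFAULT) != 0)
            return 1;
        handle = curl_easy_init();
        if (!handle) {
            curl_global_cleanup();
            return 1;
        }

        /* The URL string would come straight from the Scilab-level
           argument; libcurl parses it according to RFC 2396. */
        curl_easy_setopt(handle, CURLOPT_URL, "http://www.google.com");

        /* Blocking transfer; with no write callback set, the body
           goes to stdout. */
        rc = curl_easy_perform(handle);
        if (rc != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n",
                    curl_easy_strerror(rc));

        curl_easy_cleanup(handle);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }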

Clearly, nonsensical operations, such as using mopen in write or
append mode on an HTTP or read-only FTP resource, will throw an error.
mwrite will be nonblocking (i.e. it will save into a buffer), and
mclose will be blocking (i.e. it will perform the actual FTP upload).
mopen in read mode will be blocking, because all the data (e.g. the
HTTP response) will be read into a buffer on opening. Further read
calls will be nonblocking, as they will work directly on the existing
buffer. This is probably the best approach, since HTTP sockets should
be closed as quickly as possible and libcurl is event-driven (i.e. a
callback is invoked when data arrives, as opposed to libcurl waiting
for the user to ask for data to be read).
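
To make the buffering scheme concrete, here is a small sketch of the
blocking read-on-open using libcurl's write callback. The struct and
function names (netbuf, collect) are mine, purely illustrative of the
idea above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <curl/curl.h>

    /* Growing buffer standing in for the descriptor's internal
       storage. */
    struct netbuf {
        char  *data;
        size_t len;
    };

    /* libcurl invokes this callback as data arrives; each chunk is
       appended to the buffer that later read calls would consume. */
    static size_t collect(char *chunk, size_t size, size_t nmemb,
                          void *userp)
    {
        struct netbuf *buf = userp;
        size_t n = size * nmemb;
        char *grown = realloc(buf->data, buf->len + n);

        if (grown == NULL)
            return 0; /* returning less than n makes libcurl abort */
        memcpy(grown + buf->len, chunk, n);
        buf->data = grown;
        buf->len += n;
        return n;
    }

    int main(void)
    {
        struct netbuf buf = { NULL, 0 };
        CURL *handle;

        curl_global_init(CURL_GLOBAL_DEFAULT);
        handle = curl_easy_init();
        if (!handle)
            return 1;

        curl_easy_setopt(handle, CURLOPT_URL, "http://www.google.com");
        curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(handle, CURLOPT_WRITEDATA, &buf);

        /* This is the blocking "mopen" step: it returns only after
           the whole resource has been fetched into buf. */
        if (curl_easy_perform(handle) == CURLE_OK)
            printf("buffered %zu bytes\n", buf.len);

        free(buf.data);
        curl_easy_cleanup(handle);
        curl_global_cleanup();
        return 0;
    }
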
A make_url function will also be available, to save the user the
trouble of building correctly escaped URLs (this gets nasty when you
want to build an FTP URL whose username and password come from
variables).
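
A sketch of what make_url would have to do for the FTP case, using
libcurl's own curl_easy_escape (the host name and credentials are
made up for illustration):

    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
        CURL *handle = curl_easy_init();
        char url[512];
        char *user, *pass;

        if (!handle)
            return 1;

        /* Percent-encode the credentials so characters such as '@',
           ':' or '/' cannot break the URL structure. */
        user = curl_easy_escape(handle, "john doe", 0);
        pass = curl_easy_escape(handle, "p@ss:w/rd", 0);
        snprintf(url, sizeof url,
                 "ftp://%s:%s@ftp.example.com/readme.txt", user, pass);
        printf("%s\n", url);
        /* -> ftp://john%20doe:p%40ss%3Aw%2Frd@ftp.example.com/readme.txt */

        curl_free(user);
        curl_free(pass);
        curl_easy_cleanup(handle);
        return 0;
    }
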
If you consider it useful, I can write a separate module that works
solely as a wrapper for libcurl, but as long as fileio supports all
the necessary features through its file manipulation functions, I'm
not sure a separate module is needed. libcurl functionality that
cannot be reached through the regular file functions can be exposed
separately, e.g. a net_setproxy function.

If there's time left, it would probably also be nice to implement
readhtml() and writehtml() functions to read matrices from and write
them to HTML tables, as well as readxml() and writexml() functions,
which should be able to serialize and unserialize all data types to
and from an XML file.
A good example of a similar design is PHP, whose fopen() function and
friends support HTTP: both $fh = fopen('http://www.google.com', 'r');
and $data = file_get_contents('http://www.google.com'); work as long
as the allow_url_fopen option is enabled.

My proposal is not yet complete (a timeline and some further
explanations still need to be provided), but any criticism and
suggestions before I post the actual proposal are appreciated.

Yours faithfully,
    Stefan M

Sylvestre
Re: [GSoC2011] Remote file access

Hello,

Please post this on the GSoC website. However, you should consider
another subject; many students have already chosen this one... :/
Sorry,
Sylvestre
On Thursday, 07 April 2011 at 00:05 +0300, M Stefan wrote:
