[Scilab-users] Wanted: Command like regexp accepting a string vector for haystack

Jens

[Scilab-users] Wanted: Command like regexp accepting a string vector for haystack

Hello Scilab experts,
I am looking for a command that

returns the match (or its position) of a character string (the needle) in a vector of strings (the haystack), where the needle may be a regular expression.

The needle occurs only once in any line of the haystack, which may simplify the problem.

[start, final, match] = regexp(input, pattern, 'r') comes very close, but it does not accept a vector of strings as the haystack.

I hope I have overlooked something in my search. Is there such a command?

Vectorisation is essential here, because otherwise the search is too slow for large inputs (up to 10^6 lines).

Kind regards

Jens 

_______________________________________________
users mailing list
[hidden email]
http://lists.scilab.org/mailman/listinfo/users
Samuel GOUGEON

Re: Wanted: Command like regexp accepting a string vector for haystack

Hello Jens,

(In reply to Jens Simon Strom's message of 05/12/2018, 17:47.)

You may concatenate your input array using a glue character that you know does not occur in any of the strings (ascii(10), the line feed, is a likely candidate), and then call regexp() once on the single glued string.
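
A minimal sketch of this glue approach, under stated assumptions: the sample lines, the pattern and all variable names (haystack, joined, lineStart, lineIdx) are made up for illustration and are not from the original posts. It also shows one possible way to map each match position in the glued string back to the index of the line it came from.

```scilab
// Hypothetical sample data: a few GPX-like lines (not from the original post).
haystack = ["<trkpt>"; "<ele>520.0</ele>"; "</trkpt>"; "<ele>523.4</ele>"];

// Glue all lines into one string, separated by a line feed.
// Assumption: the line feed does not occur inside any line.
glue = ascii(10);
joined = strcat(haystack, glue);

// One regexp() call on the glued string. Without the "o" flag,
// regexp() returns every match, not only the first one.
pattern = "/<ele>[^<]*/";
[starts, ends, matches] = regexp(joined, pattern);

// Start position of each original line inside the glued string:
// each line is shifted by the lengths of the previous lines plus one glue character each.
len = length(haystack);
lineStart = cumsum([1; len(1:$-1) + 1]);

// For each match, the line index is the last line starting at or before the match.
lineIdx = zeros(1, size(starts, "*"));
for k = 1:size(starts, "*")
    lineIdx(k) = max(find(lineStart <= starts(k)));
end

disp(lineIdx);
disp(matches);
```

Because the pattern used here cannot match the glue character, no match can span two lines. A pattern that could consume a line feed would need extra care.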

HTH
Samuel


Jens

Re: Wanted: Command like regexp accepting a string vector for haystack

Thanks Samuel,

That works fine.

Regards
Jens
(In reply to Samuel Gougeon's message of 05.12.2018, 21:50.)

Jens

Re: Wanted: Command like regexp accepting a string vector for haystack

Hello Samuel,
Just some further feedback: in my case, when the 7000 lines (a minimal example of a GPX file) are concatenated into one long 1 x 1 string, grep needs three times more computing time than parsing the lines in a loop. Here vectorisation is much slower.
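
A comparison of this kind could be timed roughly as sketched below. This is only an illustration under assumptions: the test data are generated, not a real GPX file, the sketch uses regexp() for both variants rather than grep, and the variable names are hypothetical. Actual timings depend on the data, the pattern and the Scilab version, so no result is claimed here.

```scilab
// Hypothetical test data: 7000 GPX-like lines (the count mirrors the post, the content is made up).
n = 7000;
haystack = "<ele>" + string((1:n)') + "</ele>";
pattern = "/<ele>[^<]*/";

// Variant 1: loop over the lines, one regexp() call per line ("o" = match once).
tic();
loopMatches = emptystr(n, 1);
for k = 1:n
    [s, e, m] = regexp(haystack(k), pattern, "o");
    loopMatches(k) = m;
end
tLoop = toc();

// Variant 2: glue all lines with a line feed and search the single string once.
tic();
joined = strcat(haystack, ascii(10));
[starts, ends, gluedMatches] = regexp(joined, pattern);
tGlued = toc();

mprintf("loop: %.3f s, glued: %.3f s\n", tLoop, tGlued);
```

Both variants should return the same matches, so the measured times can be compared directly on one's own data.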

Kind regards
Jens
(In reply to Jens Simon Strom's message of 06.12.2018, 14:51.)