# [Scilab-users] Error in parameters determined in non-linear least-squares fitting

6 messages
Open this post in threaded view
|

## [Scilab-users] Error in parameters determined in non-linear least-squares fitting

 Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq".Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances?I know this is written in dozens of textbooks, but I am always getting lost.Please provide a simple recipe written in Scilab.Best greetingsHeinz// -- 04/04/2020 14:57:30 -- // //generate the data function y=FF(x, p) y=p(1)*(x-p(2))+p(3)*x.*x endfunction X=[]; Y=[]; pg=[34;12;14] //parameter used to generate data for x=0:.1:3 Y=[Y,FF(x,pg)+100*(rand()-.5)]; X=[X,x]; end Z=[Y;X]; //The criterion function function e=G(p, z), y=z(1),x=z(2); e=y-FF(x,p), endfunction //Solve the problem p0=[3;5;10] [p,err]=datafit(G,Z,p0); scf(0);clf() plot2d(X,FF(X,pg),5) //the curve without noise plot2d(X,Y,-1) // the noisy data plot2d(X,FF(X,p),12) //the solution xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); title("solution set 39.868419 10.312053 11.482521","fontsize",4);_______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: Error in parameters determined in non-linear least-squares fitting

 Hello Heinz, You can have a look at pages 45-49 of my slides on least Squares : http://www.utc.fr/~mottelet/mt94/leastSquares.pdf Page 48 you have an example where the Covariance matrix is approximated for a fitting problem with an ode defined page 42. S. Quoting Heinz Nabielek <[hidden email]>: Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq".   Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances?   I know this is written in dozens of textbooks, but I am always getting lost. Please provide a simple recipe written in Scilab. Best greetings Heinz       // -- 04/04/2020 14:57:30 -- // //generate the data function y=FF(x, p) y=p(1)*(x-p(2))+p(3)*x.*x endfunction X=[]; Y=[]; pg=[34;12;14] //parameter used to generate data for x=0:.1:3 Y=[Y,FF(x,pg)+100*(rand()-.5)]; X=[X,x]; end Z=[Y;X]; //The criterion function function e=G(p, z), y=z(1),x=z(2); e=y-FF(x,p), endfunction //Solve the problem p0=[3;5;10] [p,err]=datafit(G,Z,p0); scf(0);clf() plot2d(X,FF(X,pg),5) //the curve without noise plot2d(X,Y,-1) // the noisy data plot2d(X,FF(X,p),12) //the solution xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); title("solution set 39.868419 10.312053 11.482521","fontsize",4); _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: ?==?utf-8?q? Error in parameters determined in non-linear least-squares fitting

 In reply to this post by Heinz Nabielek-3 Hello Heinz, See below the small example I built and I refer to whenever I need to do some data fitting with confidence intervals for the parameters of the model. It is far from perfect but it might help you untangle the Jacobian and covariance matrix thingy. Just two words of caution: (1) I am clearly less versed in applied maths than Stéphane or Samuel, so take this with a grain of salft; (2) I built this example ages ago, before Samuel improved datafit. Good luck Antoine ////////////////////////////////////////////////////////////////////////////////////// // REFERENCE: //     //    Least Squares Estimation //    SARA A. VAN DE GEER //    Volume 2, pp. 1041–1045 //    in //    Encyclopedia of Statistics in Behavioral Science //    ISBN-13: 978-0-470-86080-9 //    ISBN-10: 0-470-86080-4 //    Editors Brian S. Everitt & David C. Howell //    John Wiley & Sons, Ltd, Chichester, 2005 // //This is a short and definetly incomplete tutorial on data fitting in Scilab using leastsq. // // Basic assumption: this tutorial is for scientists that face this simple problem: // they have an experimental dataset x_epx,y_exp and a certain model y=Model(x,p) to fit thie dataset. //  This tutorial will try to answer the folowing questions: //  1) how do you do that (simply) //  2) how do you do that (more reliably and more quickly) //      a) let's go faster with a Jacobian //      b) how good is your fit? How big is the "error bar" associated with each parameter of your Model? //      c) Can we bullet proof it? //1) How do you do curve fitting in Scilab // //  We need a) a model function b) a dataset c) some more work //  1)a) Let's start with the model function: a sinewave //      here is the formula: y=A*sin(k*(x-x0))+y0; //      here is the function prototypr [y]=SineModel(x,p) with  p=[x0,k,A,y0]' function [y]=SineModel(x,p) //INPUTS:   x 1D vector //          p parameter vector of size [4,1] containing //              x0  : sine origin //              k   : sine spatial frequency ie k=2%pi/Xp with Xp the period //              A   : sine amplitude //              y0  : sine offset //OUTPUTS:   y 1D vector of same size than x such that y=A*sin(k*(x-x0))+y0;       x0=p(1);       k=p(2);       A=p(3);       y0=p(4);       y=y0+A*sin((x-x0)*k); endfunction //  1)b) Let's now have a fake dataset: a sine wave with some noise //  We reuse the Model function we have just created to make this fake dataset x_exp=[-10:0.1:10]'; x0=1.55; k=1*%pi/3; A=4.3; y0=1.1 y_exp=SineModel(x_exp,[x0,k,A,y0])+(2*rand(x_exp)-1)*6/10; //let's check and see what it looks like: scf(); plot(x_exp,y_exp,'k.-'); xlabel('Exparimental X'); ylabel('Exparimental Y'); xtitle('Our fake experimental dataset to fit'); //  1)c) we are not done yet, we need some more work //  First we need an error function that return for a given parameter set param the difference //  between the experimental dataset and the model one: //  Note that this function returns the difference at each point, not the square of the difference, //  nor the sum over each point of the square of the differences function e = ErrorFitSine(param, x_exp, y_exp)    e = SineModel(x_exp,param)-y_exp; endfunction // Now we need a starting point that is not too far away from the solution // Let's just fo it by hand for the moment we'll see later how to make it programmatically // Just go and have a look at the previous plot and "guess", here is mine: p0=[1,2*%pi/6,4,1]; //Ready to go: [f,popt, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); //popt contains the optimal parameter set that fits our dataset // scf(); plot(x_exp,y_exp,'k.'); plot(x_exp,SineModel(x_exp,popt),'r-'); xlabel('Experimental X') ylabel('Experimental Y and best fit'); xtitle([...     "x0="+msprintf('%1.3f     fit value:     \t%1.3f',x0,popt(1));...     "k ="+msprintf('%1.3f     fit value:     \t%1.3f',k,popt(2));...     "A ="+msprintf('%1.3f     fit value:     \t%1.3f',A,popt(3));...     "y0="+msprintf('%1.3f     fit value:     \t%1.3f',y0,popt(4))... ]); //Yep we are done popt is the optimal parameter set that fits our dataset x_exp,y_exp with our // model SineModel // How to assert the quality of our fit? We can use fopt and gropt for that. They should be both as small as possible. // Ideally, the gradient should be zeros for each parameter, otherwise it means we have not found an optimum. //2) How to go beyond that simple example? // Namely: //      a) how to go faster? //      b) how to estimate how good our fit is (aka get error bars on our parameters)? //      c) how to make thinks more reliable with less human guessing and more bulletproofing? //2)a) and also 2)b) //  We need the same extra function in order to speed things up and to estimate //  how the "noise" on the experimental dataset translates in "noise" on each //  individual parameter p(1), ...p($): the Jacobian matrix of our fit model. // Impressive name for a simple idea: providing leastsq with the partial // derivative of the model formula with respect to each parameter p(1)..p($). function g = ErrorJMatSine(p, x, y) // //   g(1,:) = d(SinModel(x,p))/dx0 = d(SinModel(x,p))/d(p(1)) //   g(2,:) = d(SinModel(x,p))/dk = d(SinModel(x,p))/d(p(2)) //   g(3,:) = d(SinModel(x,p))/dA = d(SinModel(x,p))/d(p(3)) //   g(4,:) = d(SinModel(x,p))/dy0 = d(SinModel(x,p))/d(p(4)) //    y=A*sin(k*(x-x0))+y0;    x0=p(1);    k=p(2);    A=p(3);    y0=p(4);    g = [...          (-k)*SineModel(x,[x0-%pi/2/k,k,A,0]'),...          (x-x0).*SineModel(x,[x0-%pi/2/k,k,A,0]'),...          SineModel(x,[x0,k,1,0]'),...          ones(x) ...        ]; endfunction //Ready to go again, and faster this time: [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); disp("This should be ~ [0,0,0] when both fits give exactly the same results") disp(string((popt-popt_fast)./(popt+popt_fast))) //Now we check that we are faster: //  we do it several times to average over timing variations caused by //  other processes running on our computer speedup=zeros(1,10) for i=1:100     tic();     [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0);     t1=toc();     tic();     [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0);     t2=toc();     speedup(i)=t1/t2; end scf(); plot(speedup,'k.'); plot(mean(speedup)*ones(speedup),'r'); xlabel("Fit iteration"); ylabel("SpeedUp factor when using Jacobian Matrix"); xtitle("Here we have a "+msprintf("~%1.2f",mean(speedup))+... " speed improvement using the Jacobian Matrix"); //2)b) How can we estimate "error bars" for each individual parameters? //  We are going to estimate how the "noise" on our dataset turns into //  noise on each parameter by estimating the confidence interval at 95% g = ErrorJMatSine(popt_fast, x_exp, y_exp);//Jacobian matrix of the fit formula //estimate of the initial noise on the dataset to fit sigma2=1/(length(x_exp)-length(popt_fast))*f; //covariance matrix of fitting parameters pcovar=inv(g'*g)*sigma2; //confidence interval for each parameter ci=(1.96*sqrt(diag(pcovar)))'; //Let's present the results of the confidence interval calculation str=msprintf("%4.3f\n", popt_fast')+' +/- '+msprintf("%4.3f\n", ci'); str=[["x0 = ";"k = ";"A = ";"y0 = "]+str]; str=["Fit results:";"y=A*sin(k*(x-x0))+y0";"Param +/- conf. interval @ 95%";str] scf(); plot(x_exp,y_exp,'k.'); plot(x_exp,SineModel(x_exp,popt_fast),'r-'); xlabel('Experimental X') ylabel('Experimental Y and best fit'); xtitle('Fit with confidence intervals at 95%'); xstringb(-6,-3,str,12,6,'fill'); e=gce(); e.box="on"; e.fill_mode="on"; e.background=color("light gray"); //Now we have a fast fit function that can give some estimation on //  how seriously we can believe the fitted parameters. //  You can see that the precision with which we retrieve each parameter varies //  k0 is more precisely determined than y0 (15x more precisely!) //You can play with the amount of noise to see how it affects the retrieved //  parameters and confidence intervals //2)b) How can we bullet proof our fitting script by putting all this stuff //      together is several functions that checks the arguments and modify //      them if needed to avoid difficult to understand complaints from //      leastsq? // test of a sinus fitting routine ////  y=A*sin(k*(x-x0))+y0; //// function [y]=Sine(x,x0,k,A,y0) //// function [y]=Sine(x,p) with  p=[x0,k,A,y0]' //function [y]=Sine(x,varargin) ////    pause //    select length(varargin) //    case 1 then //        //we use form [y]=Sine(x,p) where p=[x0,dx,A,y0]' //        p=varargin(1); //    case 4 then //        //we use form [y]=Sine(x,x0,k,A,y0) //        p=[varargin(1);varargin(2);varargin(3);varargin(4)]; //    else //        //call is not correct, we give up //        y=[];return //    end //      x0=p(1); //      k=p(2); //      A=p(3); //      y0=p(4); //      y=y0+A*sin((x-x0)*k); //endfunction // function [p0,pinf,psup]=EstimateFitSine(x,y)     //  y=A*sin(k*(x-x0))+y0;     y0=mean(y);     A=(max(y)-min(y))/2; //    pause     sy=fftshift(fft(y-mean(y)));     //sy=fft(y-mean(y));     msy=max(abs(sy));rg=find(abs(sy)==msy);     delta=(rg(2)-rg(1))/length(y);     k=delta*%pi/(mean(diff(x)));     x0=mean(abs(atan(imag(sy(rg)),real(sy(rg)))/k))+%pi/2/k;     p0=[x0,k,A,y0];     pinf=[min(x),0      ,-%inf  ,-%inf];     psup=[max(x),%inf   ,%inf   ,%inf]; endfunction function e = ErrorFitSine(param, xi, yi)    e = SineModel(xi,param)-yi; endfunction   function g = dErrorFitSine(param, xi, yi) //    y=A*sin(k*(x-x0))+y0;    x0=param(1);    k=param(2);    A=param(3);    y0=param(4); //   pause    g = [...          (-k)*SineModel(xi,[x0-%pi/2/k,k,A,0]),...          (xi-x0).*SineModel(xi,[x0-%pi/2/k,k,A,0]),...          SineModel(xi,[x0,k,1,0]),...          ones(xi) ...        ]; endfunction function [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0,varargin) // FIT Experimental dataset y(x) with a sine model //      [y]=A*sin(k*(x-x0))+y0; // //INPUTS:    //x             :       experiemental x dataset    //y             :       experimental  y dataset    //p0            :       starting parameter set [x0,k,A,y0]    //varargin      :    //               pinf   inferior limit for popt    //               psup   superior limit for popt //OUTPUTS    //f             :       value of the cost function for the best parameter set    //popt          :       best parameter set    //yopt          :       fit function evaluated on the x dataset    //gropt         :       gradient of the cost function at x    //ci            ;       confidence interval @ 95% on each parameter in popt    //pcovar        ;       covariance matrix of popt        // CHECK x and y are col not row    if  isrow(x) then        x=x.';    end    if  isrow(y) then        y=y.';    end        if (length(varargin) < 2) then       // No constraints on parameter set were provided       [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,p0); //      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0);    else       // Constraints on parameter set were provided       pinf=varargin(1);       psup=varargin(2);       [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,"b",pinf,psup,p0);    end    //normalized best fit evaluated on normalized x    yopt=ErrorFitSine(popt, x, zeros(x));    //rescale best fit param     g = dErrorFitSine(popt, x, y);//Jacobian matrix of the fit formula     //estimate of the noise on the signal to fit     sigma2=1/(length(x)-length(popt))*f;     //covariance matrix of fitting parameters     pcovar=inv(g'*g)*sigma2;     //confidence interval for each parameter     ci=1.96*sqrt(diag(pcovar));     ci=ci.'; endfunction // Let's use our unified and bullet-proof routine x=[-10:0.1:10]; //y=Sine(x,[1,2*%pi/3,4,1])+(2*rand(x)-1)*0.1; y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2; //y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2/10; p0=EstimateFitSine(x,y); [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0); str="$"+["x_0";"k";"A";"y_0"]+"="+string(popt.')+"\pm"+string(ci.')+"$"; str=["$\text{Model: }[y]=A\sin(k(x-x_0))+y_0$";str]; scf(); plot(x,y,'k.'); plot(x,SineModel(x,p0),'g'); plot(x,SineModel(x,popt),'r'); xlabel("$x$"); ylabel("$y$"); xtitle(str) legend(["Data";"Guess";"Fit"]) //////////////////////////////////////////////////////////////////////////////////////   Le Samedi, Avril 04, 2020 15:13 CEST, Heinz Nabielek <[hidden email]> a écrit:   > Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq". > > Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances? > > I know this is written in dozens of textbooks, but I am always getting lost. > Please provide a simple recipe written in Scilab. > Best greetings > Heinz > > > > // -- 04/04/2020 14:57:30 -- // > //generate the data > function y=FF(x, p) >   y=p(1)*(x-p(2))+p(3)*x.*x > endfunction > X=[]; > Y=[]; > pg=[34;12;14] //parameter used to generate data > for x=0:.1:3 >   Y=[Y,FF(x,pg)+100*(rand()-.5)]; >   X=[X,x]; > end > Z=[Y;X]; > //The criterion function > function e=G(p, z), >   y=z(1),x=z(2); >   e=y-FF(x,p), > endfunction > //Solve the problem > p0=[3;5;10] > [p,err]=datafit(G,Z,p0); > scf(0);clf() > plot2d(X,FF(X,pg),5) //the curve without noise > plot2d(X,Y,-1)  // the noisy data > plot2d(X,FF(X,p),12) //the solution > xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); > title("solution set   39.868419    10.312053    11.482521","fontsize",4); _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
Open this post in threaded view
|

## Re: ?==?utf-8?q? Error in parameters determined in non-linear least-squares fitting

 Hi Scilabers Good examples are worth a lot. Maybe this one could be part of the Scilab documentation? Best regards, Claus On 06.04.2020 08:17, Antoine Monmayrant wrote: > Hello Heinz, > > See below the small example I built and I refer to whenever I need to do some data fitting with confidence intervals for the parameters of the model. > It is far from perfect but it might help you untangle the Jacobian and covariance matrix thingy. > Just two words of caution: (1) I am clearly less versed in applied maths than Stéphane or Samuel, so take this with a grain of salft; (2) I built this example ages ago, before Samuel improved datafit. > > Good luck > > Antoine > > ////////////////////////////////////////////////////////////////////////////////////// > // REFERENCE: > // > //    Least Squares Estimation > //    SARA A. VAN DE GEER > //    Volume 2, pp. 1041–1045 > //    in > //    Encyclopedia of Statistics in Behavioral Science > //    ISBN-13: 978-0-470-86080-9 > //    ISBN-10: 0-470-86080-4 > //    Editors Brian S. Everitt & David C. Howell > //    John Wiley & Sons, Ltd, Chichester, 2005 > // > > //This is a short and definetly incomplete tutorial on data fitting in Scilab using leastsq. > // > // Basic assumption: this tutorial is for scientists that face this simple problem: > // they have an experimental dataset x_epx,y_exp and a certain model y=Model(x,p) to fit thie dataset. > //  This tutorial will try to answer the folowing questions: > //  1) how do you do that (simply) > //  2) how do you do that (more reliably and more quickly) > //      a) let's go faster with a Jacobian > //      b) how good is your fit? How big is the "error bar" associated with each parameter of your Model? > //      c) Can we bullet proof it? > > //1) How do you do curve fitting in Scilab > // > //  We need a) a model function b) a dataset c) some more work > //  1)a) Let's start with the model function: a sinewave > //      here is the formula: y=A*sin(k*(x-x0))+y0; > //      here is the function prototypr [y]=SineModel(x,p) with  p=[x0,k,A,y0]' > function [y]=SineModel(x,p) > //INPUTS:   x 1D vector > //          p parameter vector of size [4,1] containing > //              x0  : sine origin > //              k   : sine spatial frequency ie k=2%pi/Xp with Xp the period > //              A   : sine amplitude > //              y0  : sine offset > //OUTPUTS:   y 1D vector of same size than x such that y=A*sin(k*(x-x0))+y0; >        x0=p(1); >        k=p(2); >        A=p(3); >        y0=p(4); >        y=y0+A*sin((x-x0)*k); > endfunction > > //  1)b) Let's now have a fake dataset: a sine wave with some noise > //  We reuse the Model function we have just created to make this fake dataset > > x_exp=[-10:0.1:10]'; > x0=1.55; > k=1*%pi/3; > A=4.3; > y0=1.1 > y_exp=SineModel(x_exp,[x0,k,A,y0])+(2*rand(x_exp)-1)*6/10; > > //let's check and see what it looks like: > scf(); > plot(x_exp,y_exp,'k.-'); > xlabel('Exparimental X'); > ylabel('Exparimental Y'); > xtitle('Our fake experimental dataset to fit'); > > //  1)c) we are not done yet, we need some more work > //  First we need an error function that return for a given parameter set param the difference > //  between the experimental dataset and the model one: > //  Note that this function returns the difference at each point, not the square of the difference, > //  nor the sum over each point of the square of the differences > function e = ErrorFitSine(param, x_exp, y_exp) >     e = SineModel(x_exp,param)-y_exp; > endfunction > // Now we need a starting point that is not too far away from the solution > // Let's just fo it by hand for the moment we'll see later how to make it programmatically > // Just go and have a look at the previous plot and "guess", here is mine: > p0=[1,2*%pi/6,4,1]; > > //Ready to go: > [f,popt, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); > //popt contains the optimal parameter set that fits our dataset > // > scf(); > plot(x_exp,y_exp,'k.'); > plot(x_exp,SineModel(x_exp,popt),'r-'); > xlabel('Experimental X') > ylabel('Experimental Y and best fit'); > xtitle([... >      "x0="+msprintf('%1.3f     fit value:     \t%1.3f',x0,popt(1));... >      "k ="+msprintf('%1.3f     fit value:     \t%1.3f',k,popt(2));... >      "A ="+msprintf('%1.3f     fit value:     \t%1.3f',A,popt(3));... >      "y0="+msprintf('%1.3f     fit value:     \t%1.3f',y0,popt(4))... > ]); > > //Yep we are done popt is the optimal parameter set that fits our dataset x_exp,y_exp with our > // model SineModel > // How to assert the quality of our fit? We can use fopt and gropt for that. They should be both as small as possible. > // Ideally, the gradient should be zeros for each parameter, otherwise it means we have not found an optimum. > > //2) How to go beyond that simple example? > // Namely: > //      a) how to go faster? > //      b) how to estimate how good our fit is (aka get error bars on our parameters)? > //      c) how to make thinks more reliable with less human guessing and more bulletproofing? > > > //2)a) and also 2)b) > //  We need the same extra function in order to speed things up and to estimate > //  how the "noise" on the experimental dataset translates in "noise" on each > //  individual parameter p(1), ...p($): the Jacobian matrix of our fit model. > // Impressive name for a simple idea: providing leastsq with the partial > // derivative of the model formula with respect to each parameter p(1)..p($). > function g = ErrorJMatSine(p, x, y) > // > > //   g(1,:) = d(SinModel(x,p))/dx0 = d(SinModel(x,p))/d(p(1)) > //   g(2,:) = d(SinModel(x,p))/dk = d(SinModel(x,p))/d(p(2)) > //   g(3,:) = d(SinModel(x,p))/dA = d(SinModel(x,p))/d(p(3)) > //   g(4,:) = d(SinModel(x,p))/dy0 = d(SinModel(x,p))/d(p(4)) > > //    y=A*sin(k*(x-x0))+y0; >     x0=p(1); >     k=p(2); >     A=p(3); >     y0=p(4); >     g = [... >           (-k)*SineModel(x,[x0-%pi/2/k,k,A,0]'),... >           (x-x0).*SineModel(x,[x0-%pi/2/k,k,A,0]'),... >           SineModel(x,[x0,k,1,0]'),... >           ones(x) ... >         ]; > endfunction > > > //Ready to go again, and faster this time: > [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > > disp("This should be ~ [0,0,0] when both fits give exactly the same results") > disp(string((popt-popt_fast)./(popt+popt_fast))) > > //Now we check that we are faster: > //  we do it several times to average over timing variations caused by > //  other processes running on our computer > speedup=zeros(1,10) > for i=1:100 >      tic(); >      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); >      t1=toc(); >      tic(); >      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); >      t2=toc(); >      speedup(i)=t1/t2; > end > > scf(); > plot(speedup,'k.'); > plot(mean(speedup)*ones(speedup),'r'); > xlabel("Fit iteration"); > ylabel("SpeedUp factor when using Jacobian Matrix"); > xtitle("Here we have a "+msprintf("~%1.2f",mean(speedup))+... > " speed improvement using the Jacobian Matrix"); > > //2)b) How can we estimate "error bars" for each individual parameters? > //  We are going to estimate how the "noise" on our dataset turns into > //  noise on each parameter by estimating the confidence interval at 95% > > g = ErrorJMatSine(popt_fast, x_exp, y_exp);//Jacobian matrix of the fit formula > //estimate of the initial noise on the dataset to fit > sigma2=1/(length(x_exp)-length(popt_fast))*f; > > //covariance matrix of fitting parameters > pcovar=inv(g'*g)*sigma2; > //confidence interval for each parameter > ci=(1.96*sqrt(diag(pcovar)))'; > > //Let's present the results of the confidence interval calculation > str=msprintf("%4.3f\n", popt_fast')+' +/- '+msprintf("%4.3f\n", ci'); > str=[["x0 = ";"k = ";"A = ";"y0 = "]+str]; > str=["Fit results:";"y=A*sin(k*(x-x0))+y0";"Param +/- conf. interval @ 95%";str] > > scf(); > plot(x_exp,y_exp,'k.'); > plot(x_exp,SineModel(x_exp,popt_fast),'r-'); > xlabel('Experimental X') > ylabel('Experimental Y and best fit'); > xtitle('Fit with confidence intervals at 95%'); > xstringb(-6,-3,str,12,6,'fill'); > e=gce(); > e.box="on"; > e.fill_mode="on"; > e.background=color("light gray"); > > //Now we have a fast fit function that can give some estimation on > //  how seriously we can believe the fitted parameters. > //  You can see that the precision with which we retrieve each parameter varies > //  k0 is more precisely determined than y0 (15x more precisely!) > //You can play with the amount of noise to see how it affects the retrieved > //  parameters and confidence intervals > > > > //2)b) How can we bullet proof our fitting script by putting all this stuff > //      together is several functions that checks the arguments and modify > //      them if needed to avoid difficult to understand complaints from > //      leastsq? > > // test of a sinus fitting routine > > ////  y=A*sin(k*(x-x0))+y0; > //// function [y]=Sine(x,x0,k,A,y0) > //// function [y]=Sine(x,p) with  p=[x0,k,A,y0]' > //function [y]=Sine(x,varargin) > ////    pause > //    select length(varargin) > //    case 1 then > //        //we use form [y]=Sine(x,p) where p=[x0,dx,A,y0]' > //        p=varargin(1); > //    case 4 then > //        //we use form [y]=Sine(x,x0,k,A,y0) > //        p=[varargin(1);varargin(2);varargin(3);varargin(4)]; > //    else > //        //call is not correct, we give up > //        y=[];return > //    end > //      x0=p(1); > //      k=p(2); > //      A=p(3); > //      y0=p(4); > //      y=y0+A*sin((x-x0)*k); > //endfunction > // > function [p0,pinf,psup]=EstimateFitSine(x,y) >      //  y=A*sin(k*(x-x0))+y0; >      y0=mean(y); >      A=(max(y)-min(y))/2; > //    pause >      sy=fftshift(fft(y-mean(y))); >      //sy=fft(y-mean(y)); >      msy=max(abs(sy));rg=find(abs(sy)==msy); >      delta=(rg(2)-rg(1))/length(y); >      k=delta*%pi/(mean(diff(x))); >      x0=mean(abs(atan(imag(sy(rg)),real(sy(rg)))/k))+%pi/2/k; >      p0=[x0,k,A,y0]; >      pinf=[min(x),0      ,-%inf  ,-%inf]; >      psup=[max(x),%inf   ,%inf   ,%inf]; > endfunction > > > function e = ErrorFitSine(param, xi, yi) >     e = SineModel(xi,param)-yi; > endfunction >   > function g = dErrorFitSine(param, xi, yi) > //    y=A*sin(k*(x-x0))+y0; >     x0=param(1); >     k=param(2); >     A=param(3); >     y0=param(4); > //   pause >     g = [... >           (-k)*SineModel(xi,[x0-%pi/2/k,k,A,0]),... >           (xi-x0).*SineModel(xi,[x0-%pi/2/k,k,A,0]),... >           SineModel(xi,[x0,k,1,0]),... >           ones(xi) ... >         ]; > endfunction > > > > function [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0,varargin) > // FIT Experimental dataset y(x) with a sine model > //      [y]=A*sin(k*(x-x0))+y0; > // > //INPUTS: >     //x             :       experiemental x dataset >     //y             :       experimental  y dataset >     //p0            :       starting parameter set [x0,k,A,y0] >     //varargin      : >     //               pinf   inferior limit for popt >     //               psup   superior limit for popt > //OUTPUTS >     //f             :       value of the cost function for the best parameter set >     //popt          :       best parameter set >     //yopt          :       fit function evaluated on the x dataset >     //gropt         :       gradient of the cost function at x >     //ci            ;       confidence interval @ 95% on each parameter in popt >     //pcovar        ;       covariance matrix of popt >     >     // CHECK x and y are col not row >     if  isrow(x) then >         x=x.'; >     end >     if  isrow(y) then >         y=y.'; >     end >     >     if (length(varargin) < 2) then >        // No constraints on parameter set were provided >        [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,p0); > //      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); >     else >        // Constraints on parameter set were provided >        pinf=varargin(1); >        psup=varargin(2); >        [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,"b",pinf,psup,p0); >     end >     //normalized best fit evaluated on normalized x >     yopt=ErrorFitSine(popt, x, zeros(x)); >     //rescale best fit param > >      g = dErrorFitSine(popt, x, y);//Jacobian matrix of the fit formula >      //estimate of the noise on the signal to fit >      sigma2=1/(length(x)-length(popt))*f; > >      //covariance matrix of fitting parameters >      pcovar=inv(g'*g)*sigma2; >      //confidence interval for each parameter >      ci=1.96*sqrt(diag(pcovar)); >      ci=ci.'; > endfunction > > > // Let's use our unified and bullet-proof routine > > x=[-10:0.1:10]; > //y=Sine(x,[1,2*%pi/3,4,1])+(2*rand(x)-1)*0.1; > y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2; > //y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2/10; > > p0=EstimateFitSine(x,y); > [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0); > > > str="$"+["x_0";"k";"A";"y_0"]+"="+string(popt.')+"\pm"+string(ci.')+"$"; > str=["$\text{Model: }[y]=A\sin(k(x-x_0))+y_0$";str]; > > scf(); > plot(x,y,'k.'); > plot(x,SineModel(x,p0),'g'); > plot(x,SineModel(x,popt),'r'); > xlabel("$x$"); > ylabel("$y$"); > xtitle(str) > legend(["Data";"Guess";"Fit"]) > > ////////////////////////////////////////////////////////////////////////////////////// >   > Le Samedi, Avril 04, 2020 15:13 CEST, Heinz Nabielek <[hidden email]> a écrit: >   >> Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq". >> >> Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances? >> >> I know this is written in dozens of textbooks, but I am always getting lost. >> Please provide a simple recipe written in Scilab. >> Best greetings >> Heinz >> >> >> >> // -- 04/04/2020 14:57:30 -- // >> //generate the data >> function y=FF(x, p) >>    y=p(1)*(x-p(2))+p(3)*x.*x >> endfunction >> X=[]; >> Y=[]; >> pg=[34;12;14] //parameter used to generate data >> for x=0:.1:3 >>    Y=[Y,FF(x,pg)+100*(rand()-.5)]; >>    X=[X,x]; >> end >> Z=[Y;X]; >> //The criterion function >> function e=G(p, z), >>    y=z(1),x=z(2); >>    e=y-FF(x,p), >> endfunction >> //Solve the problem >> p0=[3;5;10] >> [p,err]=datafit(G,Z,p0); >> scf(0);clf() >> plot2d(X,FF(X,pg),5) //the curve without noise >> plot2d(X,Y,-1)  // the noisy data >> plot2d(X,FF(X,p),12) //the solution >> xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); >> title("solution set   39.868419    10.312053    11.482521","fontsize",4); > _______________________________________________ > users mailing list > [hidden email] > http://lists.scilab.org/mailman/listinfo/users-- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus_______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
 Hi all, I would appreciate comments/corrections&improvements from you guys as I am far from a specialist in this field! If some of you find this example useful, I can try to work a bit to improve it and push it as an atom module. In fact, I have a bunch of other fit functions (like gaussian, diode-like curve, ...) that follow the same approach than the sinewave example  and I've long wanted to bundle them in a module where one can add a new fit function by providing the model, error, jacobian and estimate functions like in my example... Antoine     Le Lundi, Avril 06, 2020 08:43 CEST, Claus Futtrup <[hidden email]> a écrit:   > Hi Scilabers > > Good examples are worth a lot. Maybe this one could be part of the > Scilab documentation? > > Best regards, > Claus > > On 06.04.2020 08:17, Antoine Monmayrant wrote: > > Hello Heinz, > > > > See below the small example I built and I refer to whenever I need to do some data fitting with confidence intervals for the parameters of the model. > > It is far from perfect but it might help you untangle the Jacobian and covariance matrix thingy. > > Just two words of caution: (1) I am clearly less versed in applied maths than Stéphane or Samuel, so take this with a grain of salft; (2) I built this example ages ago, before Samuel improved datafit. > > > > Good luck > > > > Antoine > > > > ////////////////////////////////////////////////////////////////////////////////////// > > // REFERENCE: > > // > > //    Least Squares Estimation > > //    SARA A. VAN DE GEER > > //    Volume 2, pp. 1041–1045 > > //    in > > //    Encyclopedia of Statistics in Behavioral Science > > //    ISBN-13: 978-0-470-86080-9 > > //    ISBN-10: 0-470-86080-4 > > //    Editors Brian S. Everitt & David C. Howell > > //    John Wiley & Sons, Ltd, Chichester, 2005 > > // > > > > //This is a short and definetly incomplete tutorial on data fitting in Scilab using leastsq. > > // > > // Basic assumption: this tutorial is for scientists that face this simple problem: > > // they have an experimental dataset x_epx,y_exp and a certain model y=Model(x,p) to fit thie dataset. > > //  This tutorial will try to answer the folowing questions: > > //  1) how do you do that (simply) > > //  2) how do you do that (more reliably and more quickly) > > //      a) let's go faster with a Jacobian > > //      b) how good is your fit? How big is the "error bar" associated with each parameter of your Model? > > //      c) Can we bullet proof it? > > > > //1) How do you do curve fitting in Scilab > > // > > //  We need a) a model function b) a dataset c) some more work > > //  1)a) Let's start with the model function: a sinewave > > //      here is the formula: y=A*sin(k*(x-x0))+y0; > > //      here is the function prototypr [y]=SineModel(x,p) with  p=[x0,k,A,y0]' > > function [y]=SineModel(x,p) > > //INPUTS:   x 1D vector > > //          p parameter vector of size [4,1] containing > > //              x0  : sine origin > > //              k   : sine spatial frequency ie k=2%pi/Xp with Xp the period > > //              A   : sine amplitude > > //              y0  : sine offset > > //OUTPUTS:   y 1D vector of same size than x such that y=A*sin(k*(x-x0))+y0; > >        x0=p(1); > >        k=p(2); > >        A=p(3); > >        y0=p(4); > >        y=y0+A*sin((x-x0)*k); > > endfunction > > > > //  1)b) Let's now have a fake dataset: a sine wave with some noise > > //  We reuse the Model function we have just created to make this fake dataset > > > > x_exp=[-10:0.1:10]'; > > x0=1.55; > > k=1*%pi/3; > > A=4.3; > > y0=1.1 > > y_exp=SineModel(x_exp,[x0,k,A,y0])+(2*rand(x_exp)-1)*6/10; > > > > //let's check and see what it looks like: > > scf(); > > plot(x_exp,y_exp,'k.-'); > > xlabel('Exparimental X'); > > ylabel('Exparimental Y'); > > xtitle('Our fake experimental dataset to fit'); > > > > //  1)c) we are not done yet, we need some more work > > //  First we need an error function that return for a given parameter set param the difference > > //  between the experimental dataset and the model one: > > //  Note that this function returns the difference at each point, not the square of the difference, > > //  nor the sum over each point of the square of the differences > > function e = ErrorFitSine(param, x_exp, y_exp) > >     e = SineModel(x_exp,param)-y_exp; > > endfunction > > // Now we need a starting point that is not too far away from the solution > > // Let's just fo it by hand for the moment we'll see later how to make it programmatically > > // Just go and have a look at the previous plot and "guess", here is mine: > > p0=[1,2*%pi/6,4,1]; > > > > //Ready to go: > > [f,popt, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); > > //popt contains the optimal parameter set that fits our dataset > > // > > scf(); > > plot(x_exp,y_exp,'k.'); > > plot(x_exp,SineModel(x_exp,popt),'r-'); > > xlabel('Experimental X') > > ylabel('Experimental Y and best fit'); > > xtitle([... > >      "x0="+msprintf('%1.3f     fit value:     \t%1.3f',x0,popt(1));... > >      "k ="+msprintf('%1.3f     fit value:     \t%1.3f',k,popt(2));... > >      "A ="+msprintf('%1.3f     fit value:     \t%1.3f',A,popt(3));... > >      "y0="+msprintf('%1.3f     fit value:     \t%1.3f',y0,popt(4))... > > ]); > > > > //Yep we are done popt is the optimal parameter set that fits our dataset x_exp,y_exp with our > > // model SineModel > > // How to assert the quality of our fit? We can use fopt and gropt for that. They should be both as small as possible. > > // Ideally, the gradient should be zeros for each parameter, otherwise it means we have not found an optimum. > > > > //2) How to go beyond that simple example? > > // Namely: > > //      a) how to go faster? > > //      b) how to estimate how good our fit is (aka get error bars on our parameters)? > > //      c) how to make thinks more reliable with less human guessing and more bulletproofing? > > > > > > //2)a) and also 2)b) > > //  We need the same extra function in order to speed things up and to estimate > > //  how the "noise" on the experimental dataset translates in "noise" on each > > //  individual parameter p(1), ...p($): the Jacobian matrix of our fit model. > > // Impressive name for a simple idea: providing leastsq with the partial > > // derivative of the model formula with respect to each parameter p(1)..p($). > > function g = ErrorJMatSine(p, x, y) > > // > > > > //   g(1,:) = d(SinModel(x,p))/dx0 = d(SinModel(x,p))/d(p(1)) > > //   g(2,:) = d(SinModel(x,p))/dk = d(SinModel(x,p))/d(p(2)) > > //   g(3,:) = d(SinModel(x,p))/dA = d(SinModel(x,p))/d(p(3)) > > //   g(4,:) = d(SinModel(x,p))/dy0 = d(SinModel(x,p))/d(p(4)) > > > > //    y=A*sin(k*(x-x0))+y0; > >     x0=p(1); > >     k=p(2); > >     A=p(3); > >     y0=p(4); > >     g = [... > >           (-k)*SineModel(x,[x0-%pi/2/k,k,A,0]'),... > >           (x-x0).*SineModel(x,[x0-%pi/2/k,k,A,0]'),... > >           SineModel(x,[x0,k,1,0]'),... > >           ones(x) ... > >         ]; > > endfunction > > > > > > //Ready to go again, and faster this time: > > [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > > > > disp("This should be ~ [0,0,0] when both fits give exactly the same results") > > disp(string((popt-popt_fast)./(popt+popt_fast))) > > > > //Now we check that we are faster: > > //  we do it several times to average over timing variations caused by > > //  other processes running on our computer > > speedup=zeros(1,10) > > for i=1:100 > >      tic(); > >      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); > >      t1=toc(); > >      tic(); > >      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > >      t2=toc(); > >      speedup(i)=t1/t2; > > end > > > > scf(); > > plot(speedup,'k.'); > > plot(mean(speedup)*ones(speedup),'r'); > > xlabel("Fit iteration"); > > ylabel("SpeedUp factor when using Jacobian Matrix"); > > xtitle("Here we have a "+msprintf("~%1.2f",mean(speedup))+... > > " speed improvement using the Jacobian Matrix"); > > > > //2)b) How can we estimate "error bars" for each individual parameters? > > //  We are going to estimate how the "noise" on our dataset turns into > > //  noise on each parameter by estimating the confidence interval at 95% > > > > g = ErrorJMatSine(popt_fast, x_exp, y_exp);//Jacobian matrix of the fit formula > > //estimate of the initial noise on the dataset to fit > > sigma2=1/(length(x_exp)-length(popt_fast))*f; > > > > //covariance matrix of fitting parameters > > pcovar=inv(g'*g)*sigma2; > > //confidence interval for each parameter > > ci=(1.96*sqrt(diag(pcovar)))'; > > > > //Let's present the results of the confidence interval calculation > > str=msprintf("%4.3f\n", popt_fast')+' +/- '+msprintf("%4.3f\n", ci'); > > str=[["x0 = ";"k = ";"A = ";"y0 = "]+str]; > > str=["Fit results:";"y=A*sin(k*(x-x0))+y0";"Param +/- conf. interval @ 95%";str] > > > > scf(); > > plot(x_exp,y_exp,'k.'); > > plot(x_exp,SineModel(x_exp,popt_fast),'r-'); > > xlabel('Experimental X') > > ylabel('Experimental Y and best fit'); > > xtitle('Fit with confidence intervals at 95%'); > > xstringb(-6,-3,str,12,6,'fill'); > > e=gce(); > > e.box="on"; > > e.fill_mode="on"; > > e.background=color("light gray"); > > > > //Now we have a fast fit function that can give some estimation on > > //  how seriously we can believe the fitted parameters. > > //  You can see that the precision with which we retrieve each parameter varies > > //  k0 is more precisely determined than y0 (15x more precisely!) > > //You can play with the amount of noise to see how it affects the retrieved > > //  parameters and confidence intervals > > > > > > > > //2)b) How can we bullet proof our fitting script by putting all this stuff > > //      together is several functions that checks the arguments and modify > > //      them if needed to avoid difficult to understand complaints from > > //      leastsq? > > > > // test of a sinus fitting routine > > > > ////  y=A*sin(k*(x-x0))+y0; > > //// function [y]=Sine(x,x0,k,A,y0) > > //// function [y]=Sine(x,p) with  p=[x0,k,A,y0]' > > //function [y]=Sine(x,varargin) > > ////    pause > > //    select length(varargin) > > //    case 1 then > > //        //we use form [y]=Sine(x,p) where p=[x0,dx,A,y0]' > > //        p=varargin(1); > > //    case 4 then > > //        //we use form [y]=Sine(x,x0,k,A,y0) > > //        p=[varargin(1);varargin(2);varargin(3);varargin(4)]; > > //    else > > //        //call is not correct, we give up > > //        y=[];return > > //    end > > //      x0=p(1); > > //      k=p(2); > > //      A=p(3); > > //      y0=p(4); > > //      y=y0+A*sin((x-x0)*k); > > //endfunction > > // > > function [p0,pinf,psup]=EstimateFitSine(x,y) > >      //  y=A*sin(k*(x-x0))+y0; > >      y0=mean(y); > >      A=(max(y)-min(y))/2; > > //    pause > >      sy=fftshift(fft(y-mean(y))); > >      //sy=fft(y-mean(y)); > >      msy=max(abs(sy));rg=find(abs(sy)==msy); > >      delta=(rg(2)-rg(1))/length(y); > >      k=delta*%pi/(mean(diff(x))); > >      x0=mean(abs(atan(imag(sy(rg)),real(sy(rg)))/k))+%pi/2/k; > >      p0=[x0,k,A,y0]; > >      pinf=[min(x),0      ,-%inf  ,-%inf]; > >      psup=[max(x),%inf   ,%inf   ,%inf]; > > endfunction > > > > > > function e = ErrorFitSine(param, xi, yi) > >     e = SineModel(xi,param)-yi; > > endfunction > >   > > function g = dErrorFitSine(param, xi, yi) > > //    y=A*sin(k*(x-x0))+y0; > >     x0=param(1); > >     k=param(2); > >     A=param(3); > >     y0=param(4); > > //   pause > >     g = [... > >           (-k)*SineModel(xi,[x0-%pi/2/k,k,A,0]),... > >           (xi-x0).*SineModel(xi,[x0-%pi/2/k,k,A,0]),... > >           SineModel(xi,[x0,k,1,0]),... > >           ones(xi) ... > >         ]; > > endfunction > > > > > > > > function [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0,varargin) > > // FIT Experimental dataset y(x) with a sine model > > //      [y]=A*sin(k*(x-x0))+y0; > > // > > //INPUTS: > >     //x             :       experiemental x dataset > >     //y             :       experimental  y dataset > >     //p0            :       starting parameter set [x0,k,A,y0] > >     //varargin      : > >     //               pinf   inferior limit for popt > >     //               psup   superior limit for popt > > //OUTPUTS > >     //f             :       value of the cost function for the best parameter set > >     //popt          :       best parameter set > >     //yopt          :       fit function evaluated on the x dataset > >     //gropt         :       gradient of the cost function at x > >     //ci            ;       confidence interval @ 95% on each parameter in popt > >     //pcovar        ;       covariance matrix of popt > >     > >     // CHECK x and y are col not row > >     if  isrow(x) then > >         x=x.'; > >     end > >     if  isrow(y) then > >         y=y.'; > >     end > >     > >     if (length(varargin) < 2) then > >        // No constraints on parameter set were provided > >        [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,p0); > > //      [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); > >     else > >        // Constraints on parameter set were provided > >        pinf=varargin(1); > >        psup=varargin(2); > >        [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,"b",pinf,psup,p0); > >     end > >     //normalized best fit evaluated on normalized x > >     yopt=ErrorFitSine(popt, x, zeros(x)); > >     //rescale best fit param > > > >      g = dErrorFitSine(popt, x, y);//Jacobian matrix of the fit formula > >      //estimate of the noise on the signal to fit > >      sigma2=1/(length(x)-length(popt))*f; > > > >      //covariance matrix of fitting parameters > >      pcovar=inv(g'*g)*sigma2; > >      //confidence interval for each parameter > >      ci=1.96*sqrt(diag(pcovar)); > >      ci=ci.'; > > endfunction > > > > > > // Let's use our unified and bullet-proof routine > > > > x=[-10:0.1:10]; > > //y=Sine(x,[1,2*%pi/3,4,1])+(2*rand(x)-1)*0.1; > > y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2; > > //y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2/10; > > > > p0=EstimateFitSine(x,y); > > [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0); > > > > > > str="$"+["x_0";"k";"A";"y_0"]+"="+string(popt.')+"\pm"+string(ci.')+"$"; > > str=["$\text{Model: }[y]=A\sin(k(x-x_0))+y_0$";str]; > > > > scf(); > > plot(x,y,'k.'); > > plot(x,SineModel(x,p0),'g'); > > plot(x,SineModel(x,popt),'r'); > > xlabel("$x$"); > > ylabel("$y$"); > > xtitle(str) > > legend(["Data";"Guess";"Fit"]) > > > > ////////////////////////////////////////////////////////////////////////////////////// > >   > > Le Samedi, Avril 04, 2020 15:13 CEST, Heinz Nabielek <[hidden email]> a écrit: > >   > >> Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq". > >> > >> Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances? > >> > >> I know this is written in dozens of textbooks, but I am always getting lost. > >> Please provide a simple recipe written in Scilab. > >> Best greetings > >> Heinz > >> > >> > >> > >> // -- 04/04/2020 14:57:30 -- // > >> //generate the data > >> function y=FF(x, p) > >>    y=p(1)*(x-p(2))+p(3)*x.*x > >> endfunction > >> X=[]; > >> Y=[]; > >> pg=[34;12;14] //parameter used to generate data > >> for x=0:.1:3 > >>    Y=[Y,FF(x,pg)+100*(rand()-.5)]; > >>    X=[X,x]; > >> end > >> Z=[Y;X]; > >> //The criterion function > >> function e=G(p, z), > >>    y=z(1),x=z(2); > >>    e=y-FF(x,p), > >> endfunction > >> //Solve the problem > >> p0=[3;5;10] > >> [p,err]=datafit(G,Z,p0); > >> scf(0);clf() > >> plot2d(X,FF(X,pg),5) //the curve without noise > >> plot2d(X,Y,-1)  // the noisy data > >> plot2d(X,FF(X,p),12) //the solution > >> xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); > >> title("solution set   39.868419    10.312053    11.482521","fontsize",4); > > _______________________________________________ > > users mailing list > > [hidden email] > > http://lists.scilab.org/mailman/listinfo/users> > > > -- > This email has been checked for viruses by Avast antivirus software. > https://www.avast.com/antivirus> > _______________________________________________ > users mailing list > [hidden email] > http://lists.scilab.org/mailman/listinfo/users> _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users
 Antoine, I find your tutorial very useful indeed! Regards, Federico Miyara On 06/04/2020 04:40, Antoine Monmayrant wrote: Hi all, I would appreciate comments/corrections&improvements from you guys as I am far from a specialist in this field! If some of you find this example useful, I can try to work a bit to improve it and push it as an atom module. In fact, I have a bunch of other fit functions (like gaussian, diode-like curve, ...) that follow the same approach than the sinewave example and I've long wanted to bundle them in a module where one can add a new fit function by providing the model, error, jacobian and estimate functions like in my example... Antoine Le Lundi, Avril 06, 2020 08:43 CEST, Claus Futtrup [hidden email] a écrit:  Hi Scilabers Good examples are worth a lot. Maybe this one could be part of the Scilab documentation? Best regards, Claus On 06.04.2020 08:17, Antoine Monmayrant wrote:  Hello Heinz, See below the small example I built and I refer to whenever I need to do some data fitting with confidence intervals for the parameters of the model. It is far from perfect but it might help you untangle the Jacobian and covariance matrix thingy. Just two words of caution: (1) I am clearly less versed in applied maths than Stéphane or Samuel, so take this with a grain of salft; (2) I built this example ages ago, before Samuel improved datafit. Good luck Antoine ////////////////////////////////////////////////////////////////////////////////////// // REFERENCE: // // Least Squares Estimation // SARA A. VAN DE GEER // Volume 2, pp. 1041–1045 // in // Encyclopedia of Statistics in Behavioral Science // ISBN-13: 978-0-470-86080-9 // ISBN-10: 0-470-86080-4 // Editors Brian S. Everitt & David C. Howell // John Wiley & Sons, Ltd, Chichester, 2005 // //This is a short and definetly incomplete tutorial on data fitting in Scilab using leastsq. // // Basic assumption: this tutorial is for scientists that face this simple problem: // they have an experimental dataset x_epx,y_exp and a certain model y=Model(x,p) to fit thie dataset. // This tutorial will try to answer the folowing questions: // 1) how do you do that (simply) // 2) how do you do that (more reliably and more quickly) // a) let's go faster with a Jacobian // b) how good is your fit? How big is the "error bar" associated with each parameter of your Model? // c) Can we bullet proof it? //1) How do you do curve fitting in Scilab // // We need a) a model function b) a dataset c) some more work // 1)a) Let's start with the model function: a sinewave // here is the formula: y=A*sin(k*(x-x0))+y0; // here is the function prototypr [y]=SineModel(x,p) with p=[x0,k,A,y0]' function [y]=SineModel(x,p) //INPUTS: x 1D vector // p parameter vector of size [4,1] containing // x0 : sine origin // k : sine spatial frequency ie k=2%pi/Xp with Xp the period // A : sine amplitude // y0 : sine offset //OUTPUTS: y 1D vector of same size than x such that y=A*sin(k*(x-x0))+y0; x0=p(1); k=p(2); A=p(3); y0=p(4); y=y0+A*sin((x-x0)*k); endfunction // 1)b) Let's now have a fake dataset: a sine wave with some noise // We reuse the Model function we have just created to make this fake dataset x_exp=[-10:0.1:10]'; x0=1.55; k=1*%pi/3; A=4.3; y0=1.1 y_exp=SineModel(x_exp,[x0,k,A,y0])+(2*rand(x_exp)-1)*6/10; //let's check and see what it looks like: scf(); plot(x_exp,y_exp,'k.-'); xlabel('Exparimental X'); ylabel('Exparimental Y'); xtitle('Our fake experimental dataset to fit'); // 1)c) we are not done yet, we need some more work // First we need an error function that return for a given parameter set param the difference // between the experimental dataset and the model one: // Note that this function returns the difference at each point, not the square of the difference, // nor the sum over each point of the square of the differences function e = ErrorFitSine(param, x_exp, y_exp) e = SineModel(x_exp,param)-y_exp; endfunction // Now we need a starting point that is not too far away from the solution // Let's just fo it by hand for the moment we'll see later how to make it programmatically // Just go and have a look at the previous plot and "guess", here is mine: p0=[1,2*%pi/6,4,1]; //Ready to go: [f,popt, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); //popt contains the optimal parameter set that fits our dataset // scf(); plot(x_exp,y_exp,'k.'); plot(x_exp,SineModel(x_exp,popt),'r-'); xlabel('Experimental X') ylabel('Experimental Y and best fit'); xtitle([... "x0="+msprintf('%1.3f fit value: \t%1.3f',x0,popt(1));... "k ="+msprintf('%1.3f fit value: \t%1.3f',k,popt(2));... "A ="+msprintf('%1.3f fit value: \t%1.3f',A,popt(3));... "y0="+msprintf('%1.3f fit value: \t%1.3f',y0,popt(4))... ]); //Yep we are done popt is the optimal parameter set that fits our dataset x_exp,y_exp with our // model SineModel // How to assert the quality of our fit? We can use fopt and gropt for that. They should be both as small as possible. // Ideally, the gradient should be zeros for each parameter, otherwise it means we have not found an optimum. //2) How to go beyond that simple example? // Namely: // a) how to go faster? // b) how to estimate how good our fit is (aka get error bars on our parameters)? // c) how to make thinks more reliable with less human guessing and more bulletproofing? //2)a) and also 2)b) // We need the same extra function in order to speed things up and to estimate // how the "noise" on the experimental dataset translates in "noise" on each // individual parameter p(1), ...p($): the Jacobian matrix of our fit model. // Impressive name for a simple idea: providing leastsq with the partial // derivative of the model formula with respect to each parameter p(1)..p($). function g = ErrorJMatSine(p, x, y) // // g(1,:) = d(SinModel(x,p))/dx0 = d(SinModel(x,p))/d(p(1)) // g(2,:) = d(SinModel(x,p))/dk = d(SinModel(x,p))/d(p(2)) // g(3,:) = d(SinModel(x,p))/dA = d(SinModel(x,p))/d(p(3)) // g(4,:) = d(SinModel(x,p))/dy0 = d(SinModel(x,p))/d(p(4)) // y=A*sin(k*(x-x0))+y0; x0=p(1); k=p(2); A=p(3); y0=p(4); g = [... (-k)*SineModel(x,[x0-%pi/2/k,k,A,0]'),... (x-x0).*SineModel(x,[x0-%pi/2/k,k,A,0]'),... SineModel(x,[x0,k,1,0]'),... ones(x) ... ]; endfunction //Ready to go again, and faster this time: [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); disp("This should be ~ [0,0,0] when both fits give exactly the same results") disp(string((popt-popt_fast)./(popt+popt_fast))) //Now we check that we are faster: // we do it several times to average over timing variations caused by // other processes running on our computer speedup=zeros(1,10) for i=1:100 tic(); [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),p0); t1=toc(); tic(); [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); t2=toc(); speedup(i)=t1/t2; end scf(); plot(speedup,'k.'); plot(mean(speedup)*ones(speedup),'r'); xlabel("Fit iteration"); ylabel("SpeedUp factor when using Jacobian Matrix"); xtitle("Here we have a "+msprintf("~%1.2f",mean(speedup))+... " speed improvement using the Jacobian Matrix"); //2)b) How can we estimate "error bars" for each individual parameters? // We are going to estimate how the "noise" on our dataset turns into // noise on each parameter by estimating the confidence interval at 95% g = ErrorJMatSine(popt_fast, x_exp, y_exp);//Jacobian matrix of the fit formula //estimate of the initial noise on the dataset to fit sigma2=1/(length(x_exp)-length(popt_fast))*f; //covariance matrix of fitting parameters pcovar=inv(g'*g)*sigma2; //confidence interval for each parameter ci=(1.96*sqrt(diag(pcovar)))'; //Let's present the results of the confidence interval calculation str=msprintf("%4.3f\n", popt_fast')+' +/- '+msprintf("%4.3f\n", ci'); str=[["x0 = ";"k = ";"A = ";"y0 = "]+str]; str=["Fit results:";"y=A*sin(k*(x-x0))+y0";"Param +/- conf. interval @ 95%";str] scf(); plot(x_exp,y_exp,'k.'); plot(x_exp,SineModel(x_exp,popt_fast),'r-'); xlabel('Experimental X') ylabel('Experimental Y and best fit'); xtitle('Fit with confidence intervals at 95%'); xstringb(-6,-3,str,12,6,'fill'); e=gce(); e.box="on"; e.fill_mode="on"; e.background=color("light gray"); //Now we have a fast fit function that can give some estimation on // how seriously we can believe the fitted parameters. // You can see that the precision with which we retrieve each parameter varies // k0 is more precisely determined than y0 (15x more precisely!) //You can play with the amount of noise to see how it affects the retrieved // parameters and confidence intervals //2)b) How can we bullet proof our fitting script by putting all this stuff // together is several functions that checks the arguments and modify // them if needed to avoid difficult to understand complaints from // leastsq? // test of a sinus fitting routine //// y=A*sin(k*(x-x0))+y0; //// function [y]=Sine(x,x0,k,A,y0) //// function [y]=Sine(x,p) with p=[x0,k,A,y0]' //function [y]=Sine(x,varargin) //// pause // select length(varargin) // case 1 then // //we use form [y]=Sine(x,p) where p=[x0,dx,A,y0]' // p=varargin(1); // case 4 then // //we use form [y]=Sine(x,x0,k,A,y0) // p=[varargin(1);varargin(2);varargin(3);varargin(4)]; // else // //call is not correct, we give up // y=[];return // end // x0=p(1); // k=p(2); // A=p(3); // y0=p(4); // y=y0+A*sin((x-x0)*k); //endfunction // function [p0,pinf,psup]=EstimateFitSine(x,y) // y=A*sin(k*(x-x0))+y0; y0=mean(y); A=(max(y)-min(y))/2; // pause sy=fftshift(fft(y-mean(y))); //sy=fft(y-mean(y)); msy=max(abs(sy));rg=find(abs(sy)==msy); delta=(rg(2)-rg(1))/length(y); k=delta*%pi/(mean(diff(x))); x0=mean(abs(atan(imag(sy(rg)),real(sy(rg)))/k))+%pi/2/k; p0=[x0,k,A,y0]; pinf=[min(x),0 ,-%inf ,-%inf]; psup=[max(x),%inf ,%inf ,%inf]; endfunction function e = ErrorFitSine(param, xi, yi) e = SineModel(xi,param)-yi; endfunction function g = dErrorFitSine(param, xi, yi) // y=A*sin(k*(x-x0))+y0; x0=param(1); k=param(2); A=param(3); y0=param(4); // pause g = [... (-k)*SineModel(xi,[x0-%pi/2/k,k,A,0]),... (xi-x0).*SineModel(xi,[x0-%pi/2/k,k,A,0]),... SineModel(xi,[x0,k,1,0]),... ones(xi) ... ]; endfunction function [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0,varargin) // FIT Experimental dataset y(x) with a sine model // [y]=A*sin(k*(x-x0))+y0; // //INPUTS: //x : experiemental x dataset //y : experimental y dataset //p0 : starting parameter set [x0,k,A,y0] //varargin : // pinf inferior limit for popt // psup superior limit for popt //OUTPUTS //f : value of the cost function for the best parameter set //popt : best parameter set //yopt : fit function evaluated on the x dataset //gropt : gradient of the cost function at x //ci ; confidence interval @ 95% on each parameter in popt //pcovar ; covariance matrix of popt // CHECK x and y are col not row if isrow(x) then x=x.'; end if isrow(y) then y=y.'; end if (length(varargin) < 2) then // No constraints on parameter set were provided [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,p0); // [f,popt_fast, gropt] = leastsq(list(ErrorFitSine,x_exp,y_exp),ErrorJMatSine,p0); else // Constraints on parameter set were provided pinf=varargin(1); psup=varargin(2); [f,popt, gropt] = leastsq(list(ErrorFitSine,x,y),dErrorFitSine,"b",pinf,psup,p0); end //normalized best fit evaluated on normalized x yopt=ErrorFitSine(popt, x, zeros(x)); //rescale best fit param g = dErrorFitSine(popt, x, y);//Jacobian matrix of the fit formula //estimate of the noise on the signal to fit sigma2=1/(length(x)-length(popt))*f; //covariance matrix of fitting parameters pcovar=inv(g'*g)*sigma2; //confidence interval for each parameter ci=1.96*sqrt(diag(pcovar)); ci=ci.'; endfunction // Let's use our unified and bullet-proof routine x=[-10:0.1:10]; //y=Sine(x,[1,2*%pi/3,4,1])+(2*rand(x)-1)*0.1; y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2; //y=SineModel(x,[5,1*%pi/3,4,1])+(2*rand(x)-1)*2/10; p0=EstimateFitSine(x,y); [f,popt,yopt,gropt,ci,pcovar]=FitSine(x,y,p0); str="$"+["x_0";"k";"A";"y_0"]+"="+string(popt.')+"\pm"+string(ci.')+"$"; str=["$\text{Model: }[y]=A\sin(k(x-x_0))+y_0$";str]; scf(); plot(x,y,'k.'); plot(x,SineModel(x,p0),'g'); plot(x,SineModel(x,popt),'r'); xlabel("$x$"); ylabel("$y$"); xtitle(str) legend(["Data";"Guess";"Fit"]) ////////////////////////////////////////////////////////////////////////////////////// Le Samedi, Avril 04, 2020 15:13 CEST, Heinz Nabielek [hidden email] a écrit:  Scilab friends: the power of Scilab is amazing and I have used it recently for non-linear least-squares fitting, below example from Scilab help function for "datafit". On occasions, I have also used "leastsq". Question: how do I derive the 1sigma standard error in the three parameters p(1), p(2), and p(3)? And, if it is not too complicated, covariances? I know this is written in dozens of textbooks, but I am always getting lost. Please provide a simple recipe written in Scilab. Best greetings Heinz // -- 04/04/2020 14:57:30 -- // //generate the data function y=FF(x, p) y=p(1)*(x-p(2))+p(3)*x.*x endfunction X=[]; Y=[]; pg=[34;12;14] //parameter used to generate data for x=0:.1:3 Y=[Y,FF(x,pg)+100*(rand()-.5)]; X=[X,x]; end Z=[Y;X]; //The criterion function function e=G(p, z), y=z(1),x=z(2); e=y-FF(x,p), endfunction //Solve the problem p0=[3;5;10] [p,err]=datafit(G,Z,p0); scf(0);clf() plot2d(X,FF(X,pg),5) //the curve without noise plot2d(X,Y,-1) // the noisy data plot2d(X,FF(X,p),12) //the solution xgrid();legend("the curve without noise"," the noisy data", "THE FINAL SOLUTION.....",4); title("solution set 39.868419 10.312053 11.482521","fontsize",4);  _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users   -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users  _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users  _______________________________________________ users mailing list [hidden email] http://lists.scilab.org/mailman/listinfo/users