# Fitting Strategies

**pytc** implements a variety of fitting strategies:

- BayesianFitter uses Markov-Chain Monte Carlo to estimate posterior probability distributions for all fit parameters. (Recommended)
- BootstrapFitter samples from uncertainty in each heat and then fits the model to pseudoreplicates using unweighted least-squares regression.
- MLFitter fits the model to the data using least-squares regression weighted by the uncertainty in each heat. (Default)

These are implemented as subclasses of the `pytc.fitters.Fitter` base class.

## Bayesian

Uses Markov-Chain Monte Carlo (MCMC) to sample from the posterior probability
distributions of fit parameters. **pytc** uses the package emcee to do the
sampling. The log likelihood function is:

\[
\ln L = -\frac{1}{2}\sum_{i}\left(\frac{q_{obs,i} - q_{calc,i}}{\sigma_{i}}\right)^{2}
\]

where \(q_{obs,i}\) is an observed heat for a shot, \(q_{calc,i}\) is the heat calculated for that shot by the model, and \(\sigma_{i}\) is the experimental uncertainty on that heat.

The prior distribution is uniform within the specified parameter bounds. If any parameter is outside of its bounds, the log prior is \(-\infty\). Otherwise, the log prior is 0.0 (uniform).

The log posterior probability is the sum of the log prior and the log likelihood.
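As a minimal sketch of this calculation (illustrative only; the function and argument names here are not pytc's internal API):

```python
import numpy as np

def ln_posterior(theta, q_obs, q_calc_fn, sigma, bounds):
    """Log posterior: uniform prior within bounds plus Gaussian log likelihood.

    theta     -- array of fit parameters
    q_obs     -- observed heats
    q_calc_fn -- model function returning calculated heats for theta
    sigma     -- experimental uncertainty on each heat
    bounds    -- (lower, upper) arrays of parameter bounds

    (Hypothetical helper for illustration, not pytc's implementation.)
    """
    lower, upper = bounds
    # Uniform prior: log prior is -inf outside the bounds, 0.0 inside.
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    # Gaussian log likelihood (dropping the constant normalization term).
    residual = (q_obs - q_calc_fn(theta)) / sigma
    return 0.0 - 0.5 * np.sum(residual**2)
```

A function of this shape is what an MCMC sampler such as emcee evaluates at each proposed parameter set.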

### Parameter estimates

Parameter estimates are the means of posterior probability distributions.

### Parameter uncertainty

Parameter uncertainties are estimated by numerically integrating the posterior probability distributions.
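A sketch of how such a summary can be computed from a flat array of posterior samples for one parameter, using percentiles to integrate the distribution (illustrative helper; the `burn_in` fraction mirrors the option described below, but this is not pytc's internal code):

```python
import numpy as np

def summarize_samples(samples, burn_in=0.1):
    """Summarize a 1-D array of MCMC samples for one parameter.

    Discards the first `burn_in` fraction of samples, then reports the
    mean and a 95% credible interval taken from the percentiles of the
    remaining samples. (Hypothetical helper, not pytc's implementation.)
    """
    n_discard = int(burn_in * len(samples))
    kept = samples[n_discard:]
    estimate = np.mean(kept)                       # parameter estimate: posterior mean
    lower, upper = np.percentile(kept, [2.5, 97.5])  # 95% credible interval
    return estimate, lower, upper
```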

### Options

`num_walkers`
: number of MCMC walkers

`initial_walker_spread`
: how much to spread out the initial walkers

`ml_guess`
: whether or not to start the sampler from the ML guess

`num_steps`
: number of steps each walker should take

`burn_in`
: fraction of initial samples to discard from the sampler

`num_threads`
: number of threads to use (not yet implemented)

## Bootstrap

Samples from the experimental uncertainty in each heat and then performs unweighted least-squares regression on each pseudoreplicate using scipy.optimize.least_squares. The residuals function is:

\[
\vec{r}(\vec{\theta}) = \vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})
\]

where \(\vec{q}_{obs}\) is a vector of the observed heats and \(\vec{q}_{calc}(\vec{\theta})\) is a vector of heats calculated by the model with fit parameters \(\vec{\theta}\).

This uses the robust Trust Region Reflective method for the nonlinear regression.

### Parameter estimates

Parameter estimates are the means of bootstrap pseudoreplicate distributions.

### Parameter uncertainty

Parameter uncertainties are estimated by numerically integrating the bootstrap pseudoreplicate distributions.
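The bootstrap procedure above can be sketched as follows (illustrative code, not pytc's implementation; `model` and the summary percentiles are assumptions for the example):

```python
import numpy as np
from scipy.optimize import least_squares

def bootstrap_fit(q_obs, sigma, model, theta0, num_bootstrap=200, seed=0):
    """Bootstrap sketch: perturb each heat by its experimental uncertainty,
    refit each pseudoreplicate, and summarize the parameter distribution.

    model(theta) returns the calculated heats for parameters theta.
    (Hypothetical helper, not pytc's internal API.)
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_bootstrap):
        # Pseudoreplicate: sample each heat from its uncertainty.
        q_pseudo = q_obs + rng.normal(0.0, sigma)
        # Unweighted residuals, fit with the Trust Region Reflective method.
        fit = least_squares(lambda th: q_pseudo - model(th), theta0, method="trf")
        estimates.append(fit.x)
    estimates = np.array(estimates)
    # Estimates are the means of the pseudoreplicate distributions;
    # uncertainties come from the percentiles of those distributions.
    mean = estimates.mean(axis=0)
    lower, upper = np.percentile(estimates, [2.5, 97.5], axis=0)
    return mean, lower, upper
```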

### Options

`num_bootstrap`

: number of bootstrap replicates`perturb_size`

: how much to perturb each heat for random sampling`exp_err`

: use experimental estimates of heat uncertainty. (overrides`perturb_size`

.`verbose`

: how verbose to be during the fit

## Least-squares regression

Weighted least-squares regression using scipy.optimize.least_squares. The residuals function is:

\[
\vec{r}(\vec{\theta}) = \frac{\vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})}{\vec{\sigma}_{obs}}
\]

where \(\vec{q}_{obs}\) is a vector of the observed heats, \(\vec{q}_{calc}(\vec{\theta})\) is a vector of heats calculated by the model with fit parameters \(\vec{\theta}\), \(\vec{\sigma}_{obs}\) are the uncertainties on each observed heat, and the division is element-wise.

This uses the robust Trust Region Reflective method for the nonlinear regression.

### Parameter estimates

The parameter estimates are the maximum-likelihood parameters returned by scipy.optimize.least_squares.

### Parameter uncertainty

We first approximate the covariance matrix \(\Sigma\) from the Jacobian matrix \(J\) estimated by scipy.optimize.least_squares:

\[
\Sigma \approx (J^{T} J)^{-1}
\]

We can then determine the standard deviation on the parameter estimates \(\sigma\) by taking the square-root of the diagonal of \(\Sigma\):

\[
\sigma = \sqrt{\mathrm{diag}(\Sigma)}
\]

Ninety-five percent confidence intervals are estimated using the Z-score assuming a normal parameter distribution with the mean and standard deviations determined above.
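These steps can be sketched with scipy directly (illustrative only; `ml_fit` and `model` are hypothetical names, not pytc's internal API):

```python
import numpy as np
from scipy.optimize import least_squares

def ml_fit(q_obs, sigma, model, theta0):
    """Weighted least-squares fit with approximate parameter uncertainties.

    Residuals are weighted by the uncertainty in each heat; the covariance
    matrix is approximated from the Jacobian at the solution as (J^T J)^-1.
    (Hypothetical helper, not pytc's implementation.)
    """
    fit = least_squares(lambda th: (q_obs - model(th)) / sigma,
                        theta0, method="trf")
    # Approximate covariance matrix from the returned Jacobian.
    cov = np.linalg.inv(fit.jac.T @ fit.jac)
    # Standard deviations: square root of the diagonal of the covariance.
    std = np.sqrt(np.diag(cov))
    # 95% confidence intervals from the Z-score, assuming normal parameters.
    ci_lower = fit.x - 1.96 * std
    ci_upper = fit.x + 1.96 * std
    return fit.x, std, (ci_lower, ci_upper)
```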

**Warning:** Going from \(J\) to \(\Sigma\) is an approximation. It is susceptible to numerical problems and may not always be reliable. Use common sense when evaluating your fit errors or, better yet, do Bayesian integration!