# Fitting Strategies

**pytc** implements a variety of fitting strategies:

- BayesianFitter uses Markov-Chain Monte Carlo to estimate posterior probability distributions for all fit parameters. (Recommended)
- BootstrapFitter samples from uncertainty in each heat and then fits the model to pseudoreplicates using unweighted least-squares regression.
- MLFitter fits the model to the data using least-squares regression weighted by the uncertainty in each heat. (Default)

These are implemented as subclasses of the `pytc.fitters.Fitter` base class.

## Bayesian

Uses Markov-Chain Monte Carlo (MCMC) to sample from the posterior probability
distributions of fit parameters. **pytc** uses the package emcee to do the
sampling. The log likelihood function is:

\[
\ln L = -\frac{1}{2}\sum_{i}\left(\frac{q_{obs,i} - q_{calc,i}}{\sigma_{i}}\right)^{2}
\]

where \(q_{obs,i}\) is an observed heat for a shot, \(q_{calc,i}\) is the heat calculated for that shot by the model, and \(\sigma_{i}\) is the experimental uncertainty on that heat.

The prior distribution is uniform within the specified parameter bounds. If any parameter is outside of its bounds, the log prior is \(-\infty\). Otherwise, the log prior is 0.0 (uniform).

The log posterior probability is the sum of the log prior and the log likelihood.
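As a minimal sketch of this calculation (illustrative only; the function and argument names here are not pytc's internal API):

```python
import numpy as np

def ln_posterior(theta, q_obs, q_calc_fn, sigma, bounds):
    """Log posterior: uniform prior within bounds plus Gaussian log likelihood.

    theta     -- array of fit parameters
    q_obs     -- observed heats
    q_calc_fn -- model function returning calculated heats for theta
    sigma     -- experimental uncertainty on each heat
    bounds    -- (lower, upper) arrays of parameter bounds

    (Hypothetical helper for illustration, not pytc's implementation.)
    """
    lower, upper = bounds
    # Uniform prior: log prior is -inf outside the bounds, 0.0 inside.
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    # Gaussian log likelihood (dropping the constant normalization term).
    residual = (q_obs - q_calc_fn(theta)) / sigma
    return 0.0 - 0.5 * np.sum(residual**2)
```

A function of this shape is what an MCMC sampler such as emcee evaluates at each proposed parameter set.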

### Parameter estimates

Parameter estimates are the means of posterior probability distributions.

### Parameter uncertainty

Parameter uncertainties are estimated by numerically integrating the posterior probability distributions.
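A sketch of how such a summary can be computed from a flat array of posterior samples for one parameter, using percentiles to integrate the distribution (illustrative helper; the `burn_in` fraction mirrors the option described below, but this is not pytc's internal code):

```python
import numpy as np

def summarize_samples(samples, burn_in=0.1):
    """Summarize a 1-D array of MCMC samples for one parameter.

    Discards the first `burn_in` fraction of samples, then reports the
    mean and a 95% credible interval taken from the percentiles of the
    remaining samples. (Hypothetical helper, not pytc's implementation.)
    """
    n_discard = int(burn_in * len(samples))
    kept = samples[n_discard:]
    estimate = np.mean(kept)                       # parameter estimate: posterior mean
    lower, upper = np.percentile(kept, [2.5, 97.5])  # 95% credible interval
    return estimate, lower, upper
```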

### Options

`num_walkers`
: number of MCMC walkers

`initial_walker_spread`
: how much to spread out the initial walkers

`ml_guess`
: whether or not to start the sampler from the ML guess

`num_steps`
: number of steps each walker should take

`burn_in`
: fraction of initial samples to discard from the sampler

`num_threads`
: number of threads to use (not yet implemented)

## Bootstrap

Samples from the experimental uncertainty in each heat and then performs unweighted least-squares regression on each pseudoreplicate using scipy.optimize.least_squares. The residuals function is:

\[
\vec{r}(\vec{\theta}) = \vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})
\]

where \(\vec{q}_{obs}\) is a vector of the observed heats and \(\vec{q}_{calc}(\vec{\theta})\) is a vector of heats calculated by the model with fit parameters \(\vec{\theta}\).

This uses the robust Trust Region Reflective method for the nonlinear regression.

### Parameter estimates

Parameter estimates are the means of bootstrap pseudoreplicate distributions.

### Parameter uncertainty

Parameter uncertainties are estimated by numerically integrating the bootstrap pseudoreplicate distributions.
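The bootstrap procedure above can be sketched as follows (illustrative code, not pytc's implementation; `model` and the summary percentiles are assumptions for the example):

```python
import numpy as np
from scipy.optimize import least_squares

def bootstrap_fit(q_obs, sigma, model, theta0, num_bootstrap=200, seed=0):
    """Bootstrap sketch: perturb each heat by its experimental uncertainty,
    refit each pseudoreplicate, and summarize the parameter distribution.

    model(theta) returns the calculated heats for parameters theta.
    (Hypothetical helper, not pytc's internal API.)
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_bootstrap):
        # Pseudoreplicate: sample each heat from its uncertainty.
        q_pseudo = q_obs + rng.normal(0.0, sigma)
        # Unweighted residuals, fit with the Trust Region Reflective method.
        fit = least_squares(lambda th: q_pseudo - model(th), theta0, method="trf")
        estimates.append(fit.x)
    estimates = np.array(estimates)
    # Estimates are the means of the pseudoreplicate distributions;
    # uncertainties come from the percentiles of those distributions.
    mean = estimates.mean(axis=0)
    lower, upper = np.percentile(estimates, [2.5, 97.5], axis=0)
    return mean, lower, upper
```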

### Options

`num_bootstrap`

: number of bootstrap replicates`perturb_size`

: how much to perturb each heat for random sampling`exp_err`

: use experimental estimates of heat uncertainty. (overrides`perturb_size`

.`verbose`

: how verbose to be during the fit

## Least-squares regression

Weighted least-squares regression using scipy.optimize.least_squares. The residuals function is:

\[
\vec{r}(\vec{\theta}) = \frac{\vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})}{\vec{\sigma}_{obs}}
\]

where \(\vec{q}_{obs}\) is a vector of the observed heats, \(\vec{q}_{calc}(\vec{\theta})\) is a vector of heats calculated by the model with fit parameters \(\vec{\theta}\), \(\vec{\sigma}_{obs}\) are the uncertainties on each observed heat, and the division is element-wise.

This uses the robust Trust Region Reflective method for the nonlinear regression.

### Parameter estimates

The parameter estimates are the maximum-likelihood parameters returned by scipy.optimize.least_squares.

### Parameter uncertainty

We first approximate the covariance matrix \(\Sigma\) from the Jacobian matrix \(J\) estimated by scipy.optimize.least_squares:

\[
\Sigma \approx (J^{T} J)^{-1}
\]

We can then determine the standard deviation on the parameter estimates \(\sigma\) by taking the square-root of the diagonal of \(\Sigma\):

\[
\sigma = \sqrt{\mathrm{diag}(\Sigma)}
\]

Ninety-five percent confidence intervals are estimated using the Z-score assuming a normal parameter distribution with the mean and standard deviations determined above.
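These steps can be sketched with scipy directly (illustrative only; `ml_fit` and `model` are hypothetical names, not pytc's internal API):

```python
import numpy as np
from scipy.optimize import least_squares

def ml_fit(q_obs, sigma, model, theta0):
    """Weighted least-squares fit with approximate parameter uncertainties.

    Residuals are weighted by the uncertainty in each heat; the covariance
    matrix is approximated from the Jacobian at the solution as (J^T J)^-1.
    (Hypothetical helper, not pytc's implementation.)
    """
    fit = least_squares(lambda th: (q_obs - model(th)) / sigma,
                        theta0, method="trf")
    # Approximate covariance matrix from the returned Jacobian.
    cov = np.linalg.inv(fit.jac.T @ fit.jac)
    # Standard deviations: square root of the diagonal of the covariance.
    std = np.sqrt(np.diag(cov))
    # 95% confidence intervals from the Z-score, assuming normal parameters.
    ci_lower = fit.x - 1.96 * std
    ci_upper = fit.x + 1.96 * std
    return fit.x, std, (ci_lower, ci_upper)
```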

**Warning:** Going from \(J\) to \(\Sigma\) is an approximation. It is susceptible to numerical problems and may not always be reliable. Use common sense when evaluating your fit errors or, better yet, do Bayesian integration!