# Fitting Strategies¶

pytc implements a variety of fitting strategies:

• BayesianFitter uses Markov-Chain Monte Carlo to estimate posterior probability distributions for all fit parameters. (Recommended)
• BootstrapFitter samples from uncertainty in each heat and then fits the model to pseudoreplicates using unweighted least-squares regression.
• MLFitter fits the model to the data using least-squares regression weighted by the uncertainty in each heat. (Default)

These are implemented as subclasses of the pytc.fitters.Fitter base class.

## Bayesian¶

Uses Markov-Chain Monte Carlo (MCMC) to sample from the posterior probability distributions of fit parameters. pytc uses the package emcee to do the sampling. The log likelihood function is:

$ln(L) = -0.5 \sum_{i=1}^{N} \Big [ \frac{(q_{obs,i} - q_{calc,i}(\vec{\theta}))^{2}}{\sigma_{i}^{2}} + ln(\sigma_{i}^{2}) \Big ]$

where $$q_{obs,i}$$ is an observed heat for a shot, $$q_{calc,i}$$ is the heat calculated for that shot by the model, and $$\sigma_{i}$$ is the experimental uncertainty on that heat.

The prior distribution is uniform within the specified parameter bounds. If any parameter is outside of its bounds, the log prior is $$-\infty$$. Otherwise, the log prior is 0.0 (uniform).

The log posterior probability is the sum of the log likelihood and the log prior:

$ln(P) = ln(L) + ln(prior)$
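The three functions above can be sketched in plain numpy. This is a minimal illustration of the math, not pytc's actual implementation; the function and argument names (`model`, `bounds`, etc.) are chosen here for clarity.

```python
import numpy as np

def ln_like(theta, q_obs, sigma, model):
    """Log likelihood: weighted squared residuals plus the ln(sigma^2) term."""
    q_calc = model(theta)
    return -0.5 * np.sum((q_obs - q_calc) ** 2 / sigma ** 2 + np.log(sigma ** 2))

def ln_prior(theta, bounds):
    """Uniform prior: log prior is 0.0 inside the bounds, -inf outside."""
    lower, upper = bounds
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    return 0.0

def ln_posterior(theta, q_obs, sigma, model, bounds):
    """Log posterior = log prior + log likelihood."""
    lp = ln_prior(theta, bounds)
    if not np.isfinite(lp):
        return -np.inf  # short-circuit: out-of-bounds parameters never evaluated
    return lp + ln_like(theta, q_obs, sigma, model)
```

A function like `ln_posterior` is what an MCMC sampler such as emcee calls at every step.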

### Parameter estimates¶

Parameter estimates are the means of posterior probability distributions.

### Parameter uncertainty¶

Parameter uncertainties are estimated by numerically integrating the posterior probability distributions.
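Both summaries can be computed directly from the flattened chain of MCMC samples. The sketch below is one plausible way to do this (mean plus a central percentile-based credible interval); the names are illustrative, not pytc's API.

```python
import numpy as np

def summarize_posterior(samples, ci=95.0):
    """Mean and central credible interval from MCMC samples.

    samples: (num_samples, num_params) array with burn-in already discarded.
    Returns (estimate, lower, upper), each of length num_params.
    """
    tail = (100.0 - ci) / 2.0
    estimate = samples.mean(axis=0)
    lower, upper = np.percentile(samples, [tail, 100.0 - tail], axis=0)
    return estimate, lower, upper
```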

### Options¶

• num_walkers: number of MCMC walkers
• initial_walker_spread: how much to spread out the initial walkers
• ml_guess: whether or not to start the sampler from the ML guess
• num_steps: number of steps each walker should take
• burn_in: fraction of initial samples to discard from the sampler
• num_threads: number of threads to use (not yet implemented)

## Bootstrap¶

Samples from the experimental uncertainty in each heat and then performs unweighted least-squares regression on each pseudoreplicate using scipy.optimize.least_squares. The residuals function is:

$\vec{r} = \vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})$

where $$\vec{q}_{obs}$$ is a vector of the observed heats and $$\vec{q}_{calc}(\vec{\theta})$$ is a vector of heats calculated by the model with fit parameters $$\vec{\theta}$$.

This uses the robust Trust Region Reflective method for the nonlinear regression.
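The bootstrap loop can be sketched with scipy as follows. This is a simplified stand-in for pytc's fitter, assuming each heat's uncertainty is available as a per-point standard deviation; the names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def bootstrap_fit(q_obs, sigma, model, theta0, num_bootstrap=100, seed=0):
    """Fit pseudoreplicates of the heats with unweighted least squares.

    Each pseudoreplicate perturbs the observed heats by their uncertainty,
    then fits with the Trust Region Reflective method.
    Returns a (num_bootstrap, num_params) array of fit parameters.
    """
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(num_bootstrap):
        q_pseudo = q_obs + rng.normal(0.0, sigma)     # sample from heat uncertainty
        residuals = lambda th: q_pseudo - model(th)   # unweighted residuals
        fits.append(least_squares(residuals, theta0, method="trf").x)
    return np.array(fits)
```

The distribution of rows in the returned array is what the parameter estimates and uncertainties below are read from.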

### Parameter estimates¶

Parameter estimates are the means of bootstrap pseudoreplicate distributions.

### Parameter uncertainty¶

Parameter uncertainties are estimated by numerically integrating the bootstrap pseudoreplicate distributions.

### Options¶

• num_bootstrap: number of bootstrap replicates
• perturb_size: how much to perturb each heat for random sampling
• exp_err: use experimental estimates of heat uncertainty (overrides perturb_size)
• verbose: how verbose to be during the fit

## Least-squares regression¶

Weighted least-squares regression using scipy.optimize.least_squares. The residuals function is:

$\vec{r} = \frac{\vec{q}_{obs} - \vec{q}_{calc}(\vec{\theta})}{\vec{\sigma}_{obs}}$

where $$\vec{q}_{obs}$$ is a vector of the observed heats, $$\vec{q}_{calc}(\vec{\theta})$$ is a vector of heats calculated by the model with fit parameters $$\vec{\theta}$$, and $$\vec{\sigma}_{obs}$$ are the uncertainties on each heat.

This uses the robust Trust Region Reflective method for the nonlinear regression.
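A weighted fit of this form is a one-liner with scipy. The sketch below is a minimal illustration of the residuals function above, not pytc's actual fitter; the names are chosen here for clarity.

```python
import numpy as np
from scipy.optimize import least_squares

def ml_fit(q_obs, sigma, model, theta0):
    """Maximum-likelihood fit: residuals weighted by each heat's uncertainty,
    minimized with the Trust Region Reflective method."""
    residuals = lambda th: (q_obs - model(th)) / sigma
    return least_squares(residuals, theta0, method="trf")
```

Dividing by $$\sigma_{i}$$ means noisy heats pull on the fit less than precise ones, which is what makes the result a maximum-likelihood estimate under Gaussian errors.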

### Parameter estimates¶

The parameter estimates are the maximum-likelihood parameters returned by scipy.optimize.least_squares.

### Parameter uncertainty¶

We first approximate the covariance matrix $$\Sigma$$ from the Jacobian matrix $$J$$ estimated by scipy.optimize.least_squares:

$\Sigma \approx [2(J^{T} \cdot J)]^{-1}$

We can then determine the standard deviation on the parameter estimates $$\sigma$$ by taking the square-root of the diagonal of $$\Sigma$$:

$\sigma = \sqrt{\mathrm{diag}(\Sigma)}$

Ninety-five percent confidence intervals are then estimated using the Z-score, assuming each parameter is normally distributed with the mean and standard deviation determined above.

Warning

Going from $$J$$ to $$\Sigma$$ is an approximation. This is susceptible to numerical problems and may not always be reliable. Use common sense on your fit errors or, better yet, do Bayesian integration!