Mixed models

Here, we implement generalized linear mixed-effects models (GLMMs), i.e. models of the form:

\begin{align}\mathbb{E}\left[ y_{ij} \mid \boldsymbol \gamma_i \right] &= h \left( \eta_{ij} \right) \\ \eta_{ij} & = \mathbf{x}_{ij}^T\boldsymbol \beta + \mathbf{u}_{ij}^T \boldsymbol \gamma_i \end{align}

for data $(y_{ij}, \mathbf{x}_{ij})$ that are grouped into $m$ groups (or clusters) indexed by $i \in \{1, \dots, m \}$, with $j \in \{1, \dots, n_i\}$ indexing the observations within group $i$. For an introduction to the notation, the usual assumptions, etc., please refer to any book on regression models, as this is not part of this notebook.

In the following we show how $\boldsymbol \beta$ can be estimated and $\boldsymbol \gamma$ can be predicted, both for the Gaussian and the Poisson case. The implementations are not particularly efficient and don't use results from contemporary research (e.g., lme4's PLS and PIRLS). Good references are, for instance, McCulloch and Searle (2001), Pinheiro and Bates (2000) and Jiang (2007).

The relevant code can be found here.

Feedback and comments are welcome!

In [1]:
import pandas as pd
import scipy as sp
import scipy.stats as st
from patsy import dmatrices
from sklearn.preprocessing import LabelEncoder

rmvnorm = st.multivariate_normal.rvs
rpois = st.poisson.rvs

The implementations for fitting GLMMs can be found in these modules:

In [2]:
from lme.ls import working_response, working_weight, irls
from lme.marginal_likelhood import restricted_mll
from lme.optim import optim
from lme.util import block_diag, cholesky_factor, ranef_variance, diag

Gaussian LMMs

The most prominent example of a GLMM is the linear mixed model (LMM) with Gaussian responses $\mathbf{y}$:

\begin{align} y_{ij} \mid \boldsymbol \gamma_i & \sim \mathcal{N}\left(\mathbf{x}_{ij}^T\boldsymbol \beta + \mathbf{u}_{ij}^T\boldsymbol \gamma_i, \sigma^2\right)\\ \boldsymbol \gamma_i & \sim \mathcal{N}\left(\mathbf{0}, \mathbf{Q}\right) \end{align}

where the response function $h$ is the identity function. This section shows how estimation of the parameters is done in the Gaussian case.

We use the sleep study data from lme4 since it suits our purpose well (but any other grouped data set would do, too). The description of the data from the package: "The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject."

In [3]:
tab = pd.read_csv("./data/sleepstudy.csv")
tab.head()
Out[3]:
Reaction Days Subject
0 249.5600 0 308
1 258.7047 1 308
2 250.8006 2 308
3 321.4398 3 308
4 356.8519 4 308

In R's formula notation we are trying to fit a model of the form Reaction ~ Days + (Days | Subject). At the time of writing, patsy doesn't support creating random-effects model matrices, so we need to build them ourselves. However, we can use it for the response vector and the fixed-effects design matrix.

In [4]:
y, X = dmatrices("Reaction ~ Days", tab)

To create the random effects matrix $\mathbf{U}$, let us first consider what $\mathbf{U}$ has to look like: if we stack all cluster-specific responses $\mathbf{y}_i$ and the respective random effects $\boldsymbol \gamma_i$ into vectors, we get

\begin{align} \mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \vdots\\ \mathbf{y}_m\\ \end{pmatrix} = \begin{pmatrix} y_{1,1}\\ \vdots\\ y_{1,n_1}\\ y_{2,1}\\ \vdots\\ y_{m,n_m} \end{pmatrix} ,\qquad \boldsymbol \gamma = \begin{pmatrix} \boldsymbol \gamma_{1}\\ \vdots\\ \boldsymbol \gamma_{m} \end{pmatrix} \end{align}

Thus, in matrix notation the general form of $\mathbf{U}$ needs to be a block diagonal matrix:

\begin{align} \mathbf{U} = \text{blockdiag}\left(\mathbf{U}_1, \dots, \mathbf{U}_i, \dots ,\mathbf{U}_m\right) \end{align}
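
For our random intercept and slope model, every block $\mathbf{U}_i$ contains a column of ones (the intercept) and the Days values of group $i$:

\begin{align} \mathbf{U}_i = \begin{pmatrix} 1 & u_{i,1} \\ \vdots & \vdots \\ 1 & u_{i,n_i} \end{pmatrix} \end{align}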

We can compute $\mathbf{U}$ using the following method:

In [5]:
def build_ranef_model_matrix(tab, factor, ranef):
    """Build the random effects model matrix U = blockdiag(U_1, ..., U_m)."""
    # random intercepts: one indicator column per level of the grouping factor
    inter_tab = tab[[factor, ranef]].copy()
    inter_tab['grp'] = LabelEncoder().fit_transform(tab[factor])
    inter_tab = inter_tab[[factor, "grp"]].pivot(columns=factor)
    inter_tab.values[sp.isfinite(inter_tab.values)] = 1
    inter_tab.values[sp.isnan(inter_tab.values)] = 0

    # random slopes: the covariate values, one column per level
    slope_tab = tab[[factor, ranef]].copy().pivot(columns=factor)
    slope_tab.values[sp.isnan(slope_tab.values)] = 0

    # interleave the columns so that every group gets an adjacent
    # (intercept, slope) pair
    U = pd.concat([inter_tab, slope_tab], axis=1, sort=True)
    U = U.reindex(sorted(U.columns, key=lambda x: x[1]), axis=1)

    return sp.asarray(U.values)
In [6]:
U = build_ranef_model_matrix(tab, "Subject", "Days")
U
Out[6]:
array([[1., 0., 0., ..., 0., 0., 0.],
       [1., 1., 0., ..., 0., 0., 0.],
       [1., 2., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 7.],
       [0., 0., 0., ..., 0., 1., 8.],
       [0., 0., 0., ..., 0., 1., 9.]])

Let's do some checks: $\mathbf{U}$ needs to have as many rows as $\mathbf{X}$ and $\mathbf{y}$, and twice as many columns as there are groups, i.e. one intercept and one slope column for every group.

In [7]:
assert(U.shape[0] == X.shape[0])
assert(U.shape[1] == 2 * len(sp.unique(tab.Subject.values)))

We sample the responses ourselves here so that we can validate the inferences later. We set $\sigma^2 = 0.1$ and $\boldsymbol \beta = \left(2 \ 1\right)^T$.

In [8]:
n, p = X.shape
q = int(U.shape[1] / 2)

sd = 0.1
beta = sp.array([2, 1])

Then we sample every random effects vector $\boldsymbol \gamma_i \sim \mathcal{N}\left(\mathbf{0}, \sigma^2\begin{pmatrix} 1 & 0.25 \\ 0.25 & 1\end{pmatrix}\right)$. Like $\mathbf{U}$, the covariance matrix of $\boldsymbol \gamma$, $\mathbf{G}$, has to be block-diagonal, too:

\begin{align} \mathbf{G} = \text{blockdiag}\left(\mathbf{Q}, \dots, \mathbf{Q}, \dots ,\mathbf{Q}\right) \end{align}
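
The block_diag helper from lme.util creates this matrix. Below is a minimal sketch of how such a helper could look, assuming it merely repeats $\mathbf{Q}$ along the diagonal (the function name is hypothetical; scipy.linalg.block_diag does the actual work):

import scipy.linalg

def block_diag_sketch(Q, m):
    # repeat the per-group covariance Q m times along the diagonal,
    # yielding a (2m x 2m) matrix for a 2-dimensional Q
    return scipy.linalg.block_diag(*([Q] * m))
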
In [9]:
sp.random.seed(23)

Q = sd * sp.array([[1, 0.25], [0.25, 1]])
gamma = rmvnorm(mean=sp.zeros(q * 2), cov=block_diag(Q, q))
gamma.mean()
Out[9]:
0.004416309719067267

We then sample from the conditional distribution $P(\mathbf{y} \mid \boldsymbol \gamma)$ as:

In [10]:
y = rmvnorm(mean=X.dot(beta) + U.dot(gamma), cov=sp.diag(sd * sp.ones(n)))

Fitting an LMM consists of estimating $\sigma^2$, $\boldsymbol \beta$ and $\mathbf{Q}$, and predicting the random effects $\boldsymbol \gamma$. First we estimate $\sigma^2$ and $\mathbf{Q}$ by integrating both $\boldsymbol \beta$ and $\boldsymbol \gamma$ out of the conditional likelihood. To do so, we treat $\boldsymbol \beta$ as a random variable with a flat prior, as in an empirical Bayes approach. Having estimated the variance components, we then estimate $\boldsymbol \beta$ and predict $\boldsymbol \gamma$.

More specifically, we start from the joint likelihood of all parameters $L\left(\boldsymbol \beta, \boldsymbol \gamma, \sigma^2, \mathbf{Q} \right)$. From this we marginalize out the random effects

\begin{align} L\left(\boldsymbol \beta, \sigma^2, \mathbf{Q} \right) = \int L\left(\boldsymbol \beta, \boldsymbol \gamma, \sigma^2, \mathbf{Q} \right) d\boldsymbol \gamma \end{align}

As mentioned above, we now treat $\boldsymbol \beta$ as random and marginalize it out, too:

\begin{align} L\left(\sigma^2, \mathbf{Q} \right) = \int L\left(\boldsymbol \beta, \sigma^2, \mathbf{Q} \right) d\boldsymbol \beta \end{align}
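
In the Gaussian case this integral has a closed form. With the marginal covariance $\mathbf{V} = \mathbf{U}\mathbf{G}\mathbf{U}^T + \sigma^2\mathbf{I}_n$, the restricted log-likelihood reads, up to an additive constant:

\begin{align} \ell_R\left(\sigma^2, \mathbf{Q}\right) = -\frac{1}{2} \log \left| \mathbf{V} \right| - \frac{1}{2} \log \left| \mathbf{X}^T \mathbf{V}^{-1} \mathbf{X} \right| - \frac{1}{2} \left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol \beta}\right)^T \mathbf{V}^{-1} \left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol \beta}\right) \end{align}

where $\hat{\boldsymbol \beta} = \left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{y}$ is the generalized least squares estimate.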

The likelihood above is usually called the restricted likelihood. It needs to be maximized numerically, for instance using Newton's method. Given the variance component estimates, we then solve Henderson's mixed model equations for $\boldsymbol \beta$ and $\boldsymbol \gamma$.
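
With $\mathbf{R} = \sigma^2 \mathbf{I}_n$, Henderson's equations are the joint linear system:

\begin{align} \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{U} \\ \mathbf{U}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{U}^T\mathbf{R}^{-1}\mathbf{U} + \mathbf{G}^{-1} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol \beta} \\ \hat{\boldsymbol \gamma} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{U}^T\mathbf{R}^{-1}\mathbf{y} \end{pmatrix} \end{align}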

We optimize the restricted log-likelihood first. We define it as a lambda which we can optimize using scipy.

In [11]:
pll = restricted_mll("gaussian")
fn = lambda nu, y, X, U: sp.asscalar(pll(nu, y, X, U)[0])

# the first parameter is sigma^2 (bounded away from zero),
# the remaining three parametrize Q
opt = optim(fn,
            sp.array([2, 5, 0.5, 1]),
            ((0.01, None), (None, None), (None, None), (None, None)),
            args=(y, X, U))
sd_hat, nu_hat = opt.x[0], opt.x[1:]

Estimates of $\sigma^2$:

In [12]:
print("sigma:\t\t{}\nsigma_hat:\t{}".format(sd, sd_hat))
sigma:		0.1
sigma_hat:	0.10451774782009256

Estimates of $\mathbf{Q}$:

In [13]:
print("Q:\n{}\nQ_hat:\n{}".format(Q, cholesky_factor(nu_hat)))
Q:
[[0.1   0.025]
 [0.025 0.1  ]]
Q_hat:
[[0.02451861 0.02563742]
 [0.02563742 0.11471649]]

We then use Henderson's equations (see McCulloch & Searle (2001), Eqns 6.24 and 6.42):

In [14]:
G_hat = ranef_variance(nu_hat, q)  # blockdiag covariance of the random effects
R_hat = diag(n, sd_hat)            # residual covariance sigma^2 * I_n
b_hat, gamma_hat = irls(X, U, G_hat, 1/sp.diag(R_hat), y)

Estimates of $\boldsymbol \beta$:

In [15]:
print("beta: {}\nbeta_hat: {}".format(beta, b_hat))
beta: [2 1]
beta_hat: [2.02667544 1.01888476]

Estimates of $\boldsymbol \gamma$:

In [16]:
print("gamma:\n{}\ngamma_hat:\n{}".format(gamma, gamma_hat))
gamma:
[-0.25538236 -0.07811166 -0.16371937  0.15081283  0.37619662  0.01261309
 -0.48256002  0.0082431  -0.04384917 -0.30698673  0.06033511  0.46520567
  0.27633971 -0.09256565  0.52124719  0.04748265 -0.05064887  0.71172263
 -0.08712428 -0.79900486  0.125951    0.0477785  -0.27174966 -0.06332042
 -0.12725791 -0.03387785  0.01130221 -0.04147368  0.22723849  0.29448651
 -0.02159007  0.526561   -0.25158029  0.03071211 -0.17614052 -0.3882979 ]
gamma_hat:
[-0.17565207 -0.10253998 -0.00694584  0.08540393  0.11954361  0.04024997
 -0.06631729 -0.06497658 -0.07201884 -0.31794553  0.1757246   0.42912509
  0.02265338 -0.08137199  0.0316269   0.08725347  0.08311083  0.66610281
 -0.10123518 -0.83483693  0.07952846  0.02856995 -0.11877487 -0.10019238
 -0.0571625  -0.07433132 -0.06748332 -0.07988322  0.20390674  0.26262008
  0.11978779  0.47507794 -0.00283365 -0.02128552 -0.16745874 -0.3970398 ]

Statsmodels

Let's compare our implementation to statsmodels.

In [17]:
import statsmodels.formula.api as smf

We need to set the response to our sampled values first:

In [18]:
tab.Reaction = y

Then we fit the model:

In [19]:
md = smf.mixedlm("Reaction ~ Days", tab, groups=tab["Subject"], re_formula="~ Days")
mdf = md.fit()
print(mdf.summary())
          Mixed Linear Model Regression Results
==========================================================
Model:             MixedLM  Dependent Variable:  Reaction
No. Observations:  180      Method:              REML
No. Groups:        18       Scale:               0.1045
Min. group size:   10       Likelihood:          -110.6631
Max. group size:   10       Converged:           Yes
Mean group size:   10.0
----------------------------------------------------------
                 Coef. Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        2.027    0.058 34.921 0.000  1.913  2.140
Days             1.019    0.080 12.693 0.000  0.862  1.176
Group Var        0.025    0.068
Group x Days Cov 0.026    0.065
Days Var         0.115    0.130
==========================================================

Nice! The fixed effects look good, and so do the random effects:

In [20]:
print("gamma_hat\n{}\ngamma_statsmodels:\n{}".format(
    gamma_hat, sp.concatenate([x.values for x in mdf.random_effects.values()])))
gamma_hat
[-0.17565207 -0.10253998 -0.00694584  0.08540393  0.11954361  0.04024997
 -0.06631729 -0.06497658 -0.07201884 -0.31794553  0.1757246   0.42912509
  0.02265338 -0.08137199  0.0316269   0.08725347  0.08311083  0.66610281
 -0.10123518 -0.83483693  0.07952846  0.02856995 -0.11877487 -0.10019238
 -0.0571625  -0.07433132 -0.06748332 -0.07988322  0.20390674  0.26262008
  0.11978779  0.47507794 -0.00283365 -0.02128552 -0.16745874 -0.3970398 ]
gamma_statsmodels:
[-0.17566149 -0.10253845 -0.00694846  0.08540437  0.11955076  0.04024881
 -0.06632019 -0.06497611 -0.07201578 -0.31794607  0.17572583  0.42912494
  0.02265696 -0.08137258  0.03162687  0.08725348  0.0830998   0.66610469
 -0.10122115 -0.83483931  0.07953317  0.02856918 -0.11878046 -0.10019147
 -0.05716453 -0.07433099 -0.06748593 -0.0798828   0.20391407  0.26261891
  0.11978405  0.4750786  -0.00283331 -0.02128558 -0.16746021 -0.39703961]

Poisson GLMMs

A GLMM that is frequently used in biology, with a response variable in the exponential family, is the Poisson model:

\begin{align} {y}_{ij} \mid \boldsymbol \gamma_i & \sim \text{Pois} \left( \exp \left( \mathbf{x}_{ij}^T\boldsymbol \beta + \mathbf{u}_{ij}^T\boldsymbol \gamma_i \right)\right)\\ \boldsymbol \gamma_i & \sim \mathcal{N}\left(\mathbf{0}, \mathbf{Q}\right) \end{align}

where the response function $h$ is the exponential function. This section shows how estimation of the parameters is done for Poisson variables.

As before, we use the sleep study data from lme4 and sample the responses ourselves, using the same fixed and random effects matrices and parameters as before. We only change the fixed effects to avoid numerical overflows:

In [21]:
beta = sp.array([.25, .5])
In [22]:
y = rpois(mu=sp.exp(X.dot(beta) + U.dot(gamma)))

Fitting an exponential-family GLMM is more involved than its Gaussian counterpart, since the integral that marginalizes out $\boldsymbol \gamma$ in general does not have a closed-form solution and the relationship between the predictor $\boldsymbol \eta$ and the responses $\mathbf{y}$ is not linear. Otherwise, inference is conceptually the same.
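
A standard remedy, which we use below, is a Laplace approximation of the marginal integral around the mode of the integrand:

\begin{align} \int \exp\left(g(\boldsymbol \gamma)\right) d\boldsymbol \gamma \approx (2\pi)^{d/2} \left| -g''(\hat{\boldsymbol \gamma}) \right|^{-1/2} \exp\left(g(\hat{\boldsymbol \gamma})\right) \end{align}

where $g$ is the logarithm of the integrand, $\hat{\boldsymbol \gamma}$ its mode, $g''$ the Hessian, and $d$ the dimension of $\boldsymbol \gamma$.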

We start from the restricted likelihood $L\left(\mathbf{Q} \right)$ and maximize its Laplace approximation to estimate the variance components of $\mathbf{Q}$. Then, analogously to GLMs, we use IRLS on a working response to estimate $\boldsymbol \beta$ and predict $\boldsymbol \gamma$. We alternate these two steps until convergence. You can find more details in the references above.
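
Concretely, for the Poisson model with $h = \exp$ and conditional mean $\mu_{ij} = \exp(\eta_{ij})$, the working response and weights take the standard IRLS form:

\begin{align} \tilde{y}_{ij} = \eta_{ij} + \frac{y_{ij} - \mu_{ij}}{\mu_{ij}}, \qquad w_{ij} = \mu_{ij} \end{align}

since $h'(\eta_{ij}) = \mu_{ij}$ and $\text{Var}\left(y_{ij} \mid \boldsymbol \gamma_i\right) = \mu_{ij}$.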

First, we define the restricted log-likelihood again.

In [23]:
pll = restricted_mll("poisson")
fn = lambda nu, y, X, U, W, b: sp.asscalar(pll(nu, y, X, U, W, b)[0])

We then alternate between estimating the variance components of $\mathbf{Q}$ and estimating the fixed and random effects.

In [24]:
b_tilde = bold = sp.ones(shape=p)
g_tilde = gold = sp.ones(shape=q * 2)

while True:
    # compute the working response and weights from the current estimates
    y_tilde = working_response(y, X, U, b_tilde, g_tilde, sp.exp, sp.exp)
    w_tilde = working_weight(y, X, U, b_tilde, g_tilde, sp.exp, sp.exp)

    # take one optimization step on the Laplace-approximated restricted
    # likelihood, warm-started with the estimates from the Gaussian fit
    nu_hat = optim(fn,
                   nu_hat,
                   bounds=((None, None), (None, None), (None, None)),
                   args=(y_tilde, X, U, sp.diag(w_tilde), b_tilde),
                   iter=1).x

    # update the fixed and random effects given the variance components
    G_hat = ranef_variance(nu_hat, q)
    b_tilde, g_tilde = irls(X, U, G_hat, w_tilde, y_tilde)

    # stop when both estimates have converged
    if sp.sum((b_tilde - bold) ** 2) < 1e-5 and \
       sp.sum((g_tilde - gold) ** 2) < 1e-5:
        break
    bold, gold = b_tilde, g_tilde

Estimates of $\boldsymbol \beta$:

In [25]:
print("beta: {}\nbeta_hat: {}".format(beta, b_tilde))
beta: [0.25 0.5 ]
beta_hat: [0.128871   0.54520462]

Estimates of $\boldsymbol \gamma$:

In [26]:
print("gamma:\n{}\ngamma_hat:\n{}".format(gamma, gamma_hat))

Statsmodels

As far as I know, statsmodels does not implement frequentist GLMMs (only Bayesian mixed GLMs, e.g. PoissonBayesMixedGLM), so we skip the comparison for the Poisson case.