Sequential regression models are a a special case of ordinal models, where the different categories can only be reached one after another, i.e., sequentially. Sequential response mechanisms frequently occur in biological and medical settings, for example titer data or MIC data, but haven’t received much attention in applied research. In this case study, we analyse an ordinal data sets for which an underlying sequential response mechanism can be assumed and compare the sequential model to the more common cumulative model.

The relevant code that is used in this case study is implemented as an R-package. The package relies on rstan for inference and can be found here.

Feedback and comments are welcome!

suppressMessages({
  library(tidyverse)
  library(colorspace)
  library(patchwork)
  library(bayesplot)

  library(rstan)
  library(loo)
  library(rstansequential)
})

options(mc.cores = max(1, parallel::detectCores() - 1))

Cumulative models

The conventional cumulative model introduced by McCullagh (1980) is probably the most frequently encountered ordinal regression model. It assumes a response variable \(Y\) with \(K\) ordered categories \(k = 1, \dots, K\) and ordinal probabilities \(\pi_1, \dots, \pi_K\). The model is motivated by a latent continuous random process \(\tilde{Y}\)

\[\begin{align} \tilde{Y} & = -\mathbf{x}^T \boldsymbol \beta + \epsilon \\ \epsilon & \sim f \end{align} \]

for some density \(f\). The latent variable is related to the observed category with a threshold mechanism

\[\begin{align} Y = k \Leftrightarrow \theta_{k - 1} < \tilde{Y} \le \theta_{k}, \; k = 1, \dots, K \end{align} \]

for a set of latent ordered continuous cutpoints \(\theta_0, \dots, \theta_K\) with \(\theta_0 = -\infty\) and \(\theta_K = \infty\). From this assumption one obtains

\[\begin{align} P(Y \le k) & = P(\tilde{Y} \le \theta_k) \\ & = P(-\mathbf{x}^T \boldsymbol \beta +\epsilon \le \theta_k) \\ & = P(\epsilon \le \theta_k + \mathbf{x}^T \boldsymbol \beta) \\ &= F(\theta_k + \mathbf{x}^T \boldsymbol \beta), \end{align}\]

which gives rise to the name cumulative (because of the cumulative probabilities). Computing the ordinal probabilities form this gives

\[\begin{align} \pi_k & = F(\theta_k + \mathbf{x}^T \boldsymbol \beta) - F(\theta_{k - 1} + \mathbf{x}^T \boldsymbol \beta) \end{align}\]

The choice of the distribution function \(F\) determines which model we obtain. The cumulative logit model (i.e., choosing a logit link function) is obtained if we choose a logistic distribution function

\[\begin{align} P(Y \le k) = \frac{1}{1 + \exp(-\theta_k - \mathbf{x}^T \boldsymbol \beta)} \end{align}\]

or equivalently a logit link which gives us the cumulative log odds

\[\begin{align} F^{-1}\left( P(Y \le k) \right) = \log \frac{P(Y \le k)}{1 - P(Y \le k)} = \theta_k + \mathbf{x}^T \boldsymbol \beta \end{align}\]

If we look at the odds ratio of the event \(Y \le k\) for two populations characterized by covariable values \(\mathbf{x}\) and \(\mathbf{\tilde{x}}\)

\[\begin{align} \frac{P(Y \le k \mid \mathbf{x})\ / \ P(Y > k \mid \mathbf{x}) }{P(Y \le k \mid \mathbf{\tilde{x}})\ / \ P(Y > k \mid \mathbf{\tilde{x}}) } & = \frac{ \exp\left( \theta_k + \mathbf{x}^T \boldsymbol \beta \right)}{\exp\left(\theta_k + \mathbf{\tilde{x}}^T \boldsymbol \beta \right)} \\ & = \exp\left( \left( \mathbf{x} - \mathbf{\tilde{x}}\right)^T \boldsymbol \beta \right) \end{align}\]

we see that the ratio of odds is the same regardless of the category. This assumption is also called proportional odds and the model proportional odds model. The derivation is crucial for the interpretation of coefficients and makes it especially simple, because it says that the ratio of the cumulative odds of two populations are the same irrespective of the category. Hence we can disregard the categories when we interpret coefficients.

Sequential models

In some applications if makes more sense to assume that the categories can only be reached sequentially. For instance, to gain the title of “PhD” in the ordering of degrees “Bachelor < Master < PhD” first Bacherlor and Master titles have to be obtained. Analogously, if the response variable is the number of guitars someone owns, one can assume that the guitars have been bought successively, and not, as a cumulative model assumes, all at the same time. Thus, the number of guitars can be considered to be a “sequential” ordinal random variable. The sequential model has (afaik) been introduced by Tutz (1991). In the literature, multiple names has been used for sequential regression models. Tutz also calls them continuation ratio models. Yee (2010) calls them stopping ratio models which seems to be a better fitting name.

As in the cumulative model, we assume a latent continuous process \(\tilde{Y} = - \mathbf{x}^T \boldsymbol \beta + \epsilon\). For the sequential model it relates to the observed random variable \(Y\) via

\[\begin{align} P(Y = k \mid Y \ge k) = F(\theta_k + \mathbf{x}^T \boldsymbol \beta) \\ \end{align} \]

As above, \(k = 1, \dots, K\), \(\theta_0, \dots, \theta_K\) are cutpoints with \(\theta_0 = -\infty\) and \(\theta_K = \infty\) and \(F\) is a cumulative distribution function.

In order to reach a category \(k\), \(Y\) has to go through all the previous categories. For instance, for any \(Y\), we first compute

\[\begin{align} P(Y = 1) = F(\theta_1 + \mathbf{x}^T \boldsymbol \beta) \end{align}\]

and stop the process if the Bernoulli experiment is successful. If it is not successful, i.e, \(Y \ge 2\), we do another experiment between \(Y=2\) and \(Y >2\):

\[\begin{align} P(Y = 2 \mid Y \ge 2) = F(\theta_2 + \mathbf{x}^T \boldsymbol \beta) \end{align}\]

If this is successful, we stop. Otherwise we continue the process. At the end of the process, the ordinal propability for category \(k\) is

\[\begin{align} \pi_k & = (Y = k \mid Y \ge k) \prod_{l = 1}^{k - 1} 1 - P(Y = l \mid Y \ge l) \\ & = F(\theta_k + \mathbf{x}^T \boldsymbol \beta) \prod_{l = 1}^{k - 1} 1 - F(\theta_l + \mathbf{x}^T \boldsymbol \beta) \end{align}\]

As for the cumulative model, the most frequent choice of \(F\) is probably the cumulative distribution function of the logistic distribution. Hence

\[\begin{align} P(Y = k \mid Y \ge k) = \frac{1}{1 + \exp(-\theta_k - \mathbf{x}^T \boldsymbol \beta)} \end{align}\]

or equivalently the logit

\[\begin{align} F^{-1}\left( Y = k \mid Y \ge k \right) = \log \frac{P(Y = k \mid Y \ge k)}{1 - P(Y = k \mid Y \ge k)} = \theta_k + \mathbf{x}^T \boldsymbol \beta \end{align}\]

The logit is a ratio of the probability that the respose category stops at \(k\) and the probability of the categories \(\{k + 1, \dots, K \}\). As in the cumulative model, the odds ratio does not depend on the category which makes the model a proportional odds model as well.

Wine tasting data

We apply the two models above to a wine data set from Randall (1989) and Tutz and Hennevogl (1996) where nine judges assessed the bitterness of wine. When tasting the wines judges gave a statement about the bitterness of wine on a scale from least to most bitter. I extracted the data from the papers and put added them to the rstansequential package. Let’s visualize the data first.

data(wine)
head(wine)

# A tibble: 6 x 5
  rating temperature contact bottle judge
  <ord>  <fct>       <fct>   <fct>  <fct>
1 2      low         no      1      1    
2 1      low         no      1      2    
3 2      low         no      1      3    
4 3      low         no      1      4    
5 2      low         no      1      5    
6 3      low         no      1      6

The response rating determines the bitterness of the wine from least to most bitter. When the grapes were crushed temperature could be controlled and set to either low or high. Furthermore, contact determines whether grapes had contact to the skin. For each treatment combination two bottles have been fermented.

I first pivot the data frame around the response rating such that we can nicely plot it.

wide.wine <- wine %>%
  dplyr::select(-judge) %>%
  tidyr::pivot_longer(-rating, names_to="covariable", values_to = "value")

g <- ggthemes::theme_tufte() +
  ggplot2::theme(axis.title = element_blank(), 
                 strip.text = element_text(size=12))

ggplot2::ggplot(wide.wine) +
  ggplot2::geom_histogram(
    aes(x=value, fill=rating), 
    color="black", 
    stat="count", 
    position = "dodge") +
  colorspace::scale_fill_discrete_sequential(c1=50, c2=80, l1=30, l2=80, alpha=0.75) + 
  ggplot2::facet_wrap(. ~ covariable, nrow = 3, scales = "free") +
  g

The bars show the ratings of the independent judges given the covariables bottle, contact and temperature. For the second and third plot, i.e., contact=yes and temperature=high the ratings seem higher than for the other levels. In order to account for heterogeneity of the judges, we should assume that each has her own sensitivity for bitterness of the wines resulting in correlation of the errors. Hence, we add a group-level coefficient, such that the latent process becomes

\[\begin{align} \tilde{Y}_j = - \mathbf{x}^T \boldsymbol \beta - \gamma_j + \epsilon, \end{align}\]

for judges \(j = 1, \dots, 9\). The vector \(\mathbf{x}\) is a vector of the dummy encoded covariables contact, bottle and temperature and \(\gamma_j\) are group-specific intercepts. In a recent case study I explained hierarchical models in more detail (link) so have a look if this notation is unclear.

Stan files

The stan models for both the cumulative and the sequential models are with the exception of the likelihood the same. The stan code for the cumulative log probability mass function is fairly short:

cat(
  readLines(
    system.file(file.path("stan", "include", "cumulative.stan"), 
                package = "rstansequential")
  ),
  sep="\n"
)

real marg_prob_cumul_logit(int y, real eta, vector c, int K) {
  real val = 0;
   if (y == 1) {
      val = inv_logit(c[1] - eta);
    }
    else if (1 < y && y < K) {
      val = inv_logit(c[y] - eta) - inv_logit(c[y - 1] - eta);
    }
    else {
      val = 1 - inv_logit(c[y - 1] - eta);
    }

    return val;
}


real cumulative_lpmf(int[] y, vector eta, vector c) {
  int N = num_elements(y);
  int K = num_elements(c) + 1;
  real lpmf = 0;

  for (i in 1:N) {
    lpmf += log(marg_prob_cumul_logit(y[i], eta[i], c, K));
  }

  return lpmf;
}


real cumulative_scalar_lpmf(int y, real eta, vector c) {
  int K = num_elements(c) + 1;
  return log(marg_prob_cumul_logit(y, eta, c, K));
}

Analogously, the probability mass function of the sequential model looks like this:

cat(
  readLines(
    system.file(file.path("stan", "include", "sratio.stan"), 
                package = "rstansequential")
  ),
  sep="\n"
)

real marg_prob_stop_ratio(int y, real eta, vector c, int K) {
  real val = 0;
  if (y < K) {
    val += inv_logit(c[y] - eta);
    if (y > 1) {
      for (i in 1:(y - 1)) {
        val *= 1 - inv_logit(c[i] - eta);
      }
    }
  }
  else {
    val = prod(1 - inv_logit(c[1:(K - 1)] - eta));
  }

  return val;
}


real sratio_lpmf(int[] y, vector eta, vector c) {
  int N = num_elements(y);
  int K = num_elements(c) + 1;
  real lpmf = 0;

  for (i in 1:N) {
    lpmf += log(marg_prob_stop_ratio(y[i], eta[i], c, K));
  }

  return lpmf;
}


real sratio_scalar_lpmf(int y, real eta, vector c) {
  int K = num_elements(c) + 1;
  return log(marg_prob_stop_ratio(y, eta, c, K));
}

We plug each of these files into the main Stan file, which for the cumulative model looks like this:

cat(
  readLines(
    system.file(file.path("stan", "hierarchical_cumulative.stan"), 
                package = "rstansequential")
  ),
  sep="\n"
)

functions {
#include /include/block_diag.stan
#include /include/cumulative.stan
}


data {
  int<lower=1> N;
  int<lower=1> K;
  int<lower=1> P;
  matrix[N, P] X;
  int<lower=1, upper=K> y[N];
  int<lower=1> Q;
  matrix[N, Q] Z;
  int<lower=1> n_coef;
  int<lower=1> n_levels;
}


transformed data {
  int nr = n_coef > 1 ? n_coef : 0;
}


parameters {
  ordered[K - 1] threshold;
  vector[P] beta;

  vector[Q] gamma;
  vector<lower=0>[n_coef] sigma;
  cholesky_factor_corr[nr] Omega;
}


transformed parameters {
  matrix[nr, nr] R;
  if (n_coef > 1) {
    R = diag_pre_multiply(sigma, Omega);
  }
}


model {
  matrix[Q, Q] G;
  if (n_coef > 1) {
    G = block_diag(R, n_levels);
  } else {
    G = diag_matrix(rep_vector(sigma[1]^2, n_levels));
  }

  beta ~ normal(0, 1);
  threshold ~ student_t(3, 0, 10);

  sigma ~ normal(0, 1);
  Omega ~ lkj_corr_cholesky(1);
  gamma ~ multi_normal_cholesky(rep_vector(0, Q), G);

  y ~ cumulative(X * beta + Z * gamma, threshold);
}


generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = cumulative_scalar_lpmf(y[i] | X[i] * beta + Z[i] * gamma, threshold);
  }
}

The data block is fairly obvious: it defines the data passed to Stan. The parameters and transformed parameter blocks defined the model parameters such as coefficients (beta and gamma), standard deviations (sigma) and, in case a group-level effect has multiple coefficients, a correlation matrix (Omega). Note that my implementation could be made more efficient, but it the “general” form of a mixed-model in matrix notation and suffices for demonstration. See, for instance, my case study).

The model block defines the actual model. We use a normal prior for the population-level coefficients, an LKJ prior for the correlation matrix, and a multivariate normal for the group-level coefficients to be able to model their correlations. The likelihood is then the cumulative or the sratio pmf from above.

Cumulative model fit

With the theory derived, the linear predictor constructed and the stan files implemented, we can fit the models. Fitting the model can be done like this using rstansequential (we set refresh=-1 to avoid most of the output of rstan):

wine_fit_cumulative <- rstansequential::fit(
  rating ~ bottle + contact + temperature + (1 | judge), 
  wine, 
  family="cumulative",
  refresh=-1,
  seed=23L
)

print(wine_fit_cumulative, pars = c("threshold", "beta", "gamma", "sigma"))

Inference for Stan model: hierarchical_cumulative.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
threshold[1] -1.86    0.02 0.73 -3.42 -2.33 -1.83 -1.37 -0.52  1577    1
threshold[2]  1.09    0.02 0.62 -0.10  0.69  1.09  1.49  2.36  1392    1
threshold[3]  3.52    0.02 0.74  2.13  3.01  3.48  4.00  5.03  1249    1
threshold[4]  5.30    0.02 0.87  3.69  4.70  5.26  5.89  7.05  1341    1
beta[1]       0.18    0.01 0.43 -0.68 -0.11  0.18  0.46  1.02  3553    1
beta[2]       1.40    0.01 0.44  0.54  1.10  1.39  1.68  2.29  2953    1
beta[3]       2.34    0.01 0.48  1.42  2.01  2.34  2.66  3.30  2177    1
gamma[1]      1.45    0.02 0.77  0.04  0.92  1.42  1.96  3.07  1266    1
gamma[2]     -0.56    0.02 0.72 -2.11 -1.00 -0.51 -0.07  0.74  1791    1
gamma[3]      0.87    0.02 0.71 -0.40  0.37  0.82  1.31  2.42  1437    1
gamma[4]     -0.07    0.02 0.69 -1.45 -0.50 -0.07  0.36  1.32  1929    1
gamma[5]      0.17    0.01 0.68 -1.17 -0.27  0.15  0.59  1.53  2112    1
gamma[6]      0.38    0.02 0.69 -0.97 -0.05  0.35  0.80  1.79  1864    1
gamma[7]     -1.71    0.02 0.85 -3.58 -2.24 -1.65 -1.12 -0.16  1300    1
gamma[8]     -0.26    0.02 0.68 -1.68 -0.67 -0.23  0.17  1.00  1956    1
gamma[9]     -0.50    0.02 0.71 -1.95 -0.95 -0.46 -0.03  0.83  2030    1
sigma[1]      1.05    0.01 0.22  0.61  0.91  1.04  1.19  1.51   919    1

Samples were drawn using NUTS(diag_e) at Thu Jun  4 17:00:37 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The fit shows (from top to bottom), the estimates for the four cutpoints (called threshold here). We obtain \(c=4\) cutpoints since we used \(K = c + 1\) categories. The next set of parameters are the coefficients of the covariables bottle=2, contact=yes and temperature=high. Note that we do not have intercepts in the model matrix. The rest are the coefficients for the group-level intercepts and their standard deviation.

The effective sample sizes are decent given that we used a centered parameterization and the R-hats are optimal. Before we continue, we plot the traces and histograms of posteriors, and HMC energy diagnostics.

trace.plot <- bayesplot::mcmc_trace(wine_fit_cumulative, regex_pars = "beta") 
hist.plot  <- bayesplot::mcmc_hist(wine_fit_cumulative, regex_pars = "beta")

(hist.plot + g) + (trace.plot + g)

The HMC energy diagnostics consists of two histograms per chain. Rougly speaking, if the two histograms overlap the HMC will rapidly explore the target distribution. If they don’t overlap, the sampler may not sufficiently explore the tails of the distribution and autocorrelation of the chain is large.

params <- bayesplot::nuts_params(wine_fit_cumulative)
bayesplot::mcmc_nuts_energy(params) + g

Sequential model fit

We now fit the sequential model. To do that we just change the family to sratio from cumulative.

wine_fit_sratio <- rstansequential::fit(
  rating ~ bottle + contact + temperature + (1 | judge), 
  wine, 
  family="sratio",
  refresh=-1,
  seed=23L
)

print(wine_fit_sratio, pars = c("threshold", "beta", "gamma", "sigma"))

Inference for Stan model: hierarchical_sratio.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
threshold[1] -1.86    0.02 0.69 -3.30 -2.30 -1.84 -1.38 -0.61  1887    1
threshold[2]  0.87    0.01 0.58 -0.24  0.49  0.86  1.24  2.03  1897    1
threshold[3]  2.87    0.02 0.72  1.49  2.40  2.84  3.34  4.33  1709    1
threshold[4]  4.06    0.02 0.91  2.42  3.42  4.01  4.64  5.95  1711    1
beta[1]       0.03    0.01 0.36 -0.66 -0.20  0.03  0.28  0.73  3626    1
beta[2]       1.27    0.01 0.39  0.49  1.00  1.27  1.54  2.04  2710    1
beta[3]       2.19    0.01 0.44  1.37  1.88  2.18  2.48  3.12  2299    1
gamma[1]      1.20    0.02 0.66  0.04  0.75  1.15  1.60  2.64  1912    1
gamma[2]     -0.37    0.01 0.61 -1.67 -0.75 -0.33  0.03  0.78  2408    1
gamma[3]      0.88    0.01 0.64 -0.22  0.43  0.85  1.28  2.27  2082    1
gamma[4]      0.01    0.01 0.60 -1.21 -0.37  0.01  0.38  1.20  2365    1
gamma[5]      0.18    0.01 0.61 -1.00 -0.20  0.16  0.57  1.43  2675    1
gamma[6]      0.34    0.01 0.58 -0.76 -0.04  0.32  0.70  1.55  2331    1
gamma[7]     -1.54    0.02 0.77 -3.19 -2.02 -1.49 -0.98 -0.16  1720    1
gamma[8]     -0.30    0.01 0.59 -1.51 -0.65 -0.27  0.08  0.82  2149    1
gamma[9]     -0.50    0.01 0.62 -1.78 -0.88 -0.46 -0.09  0.67  2045    1
sigma[1]      0.99    0.01 0.21  0.60  0.85  0.98  1.11  1.42  1103    1

Samples were drawn using NUTS(diag_e) at Thu Jun  4 17:00:41 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The number of parameters and their names are the same as for the cumulative model, only their interpretation is different. Before we interpret them, we again plot traces and histograms of the posteriors.

trace.plot <- bayesplot::mcmc_trace(wine_fit_sratio, regex_pars = "beta") 
hist.plot  <- bayesplot::mcmc_hist(wine_fit_sratio, regex_pars = "beta")

(hist.plot + g) + (trace.plot + g)

As before, we check the energy distributions, too.

params <- bayesplot::nuts_params(wine_fit_sratio)
bayesplot::mcmc_nuts_energy(params) + g

Interpretation of coefficients

While the coefficients of the two models are fairly similar, the major difference lies in their interpretation. For comparison the means of the coefficients are shown below:

parse.summary <- function(fit, name) {
  summ <- data.frame(
    mean = rstan::summary(fit, pars="beta")$summary[, "mean"],
    model = name,
    feature = c("bottle=2", "contact=yes", "temperature=high")
  )
  
  summ
}

cumulative.summary  <- parse.summary(wine_fit_cumulative, "cumulative")
sratio.summary  <- parse.summary(wine_fit_sratio, "sequential")

rbind(cumulative.summary, sratio.summary) %>%
  tidyr::pivot_wider(values_from = "mean", names_from="feature")

# A tibble: 2 x 4
  model      `bottle=2` `contact=yes` `temperature=high`
  <chr>           <dbl>         <dbl>              <dbl>
1 cumulative     0.178           1.40               2.34
2 sequential     0.0342          1.27               2.19

For the cumulative model the coefficients refer to the cumulative odds ratio

\[\begin{align} \frac{P(Y \le k \mid \mathbf{x})\ / \ P(Y > k \mid \mathbf{x}) }{P(Y \le k \mid \mathbf{\tilde{x}})\ / \ P(Y > k \mid \mathbf{\tilde{x}}) } & = \exp\left( \left( \mathbf{x} - \mathbf{\tilde{x}}\right)^T \boldsymbol \beta \right) \end{align}\]

while they refer to a stopping odds ratio for the sequential model

\[\begin{align} \frac{P(Y = k \mid Y \ge k, \mathbf{x})\ / \ P( Y > k \mid Y \ge k, \mathbf{x}) }{P(Y = k \mid Y \ge k, \mathbf{\tilde{x}})\ / \ P(Y > k \mid Y \ge k, \mathbf{\tilde{x}}) } & = \exp\left( \left( \mathbf{x} - \mathbf{\tilde{x}}\right)^T \boldsymbol \beta \right) \end{align}\]

For instance, for the cumulative model for the covariable temperature has the odds ratio

\[\begin{align} \frac{P(Y \le k \mid temperature=high)\ / \ P(Y > k \mid temperature=high) }{P(Y \le k \mid temperature=low)\ / \ P(Y > k \mid temperature=low) } & = \exp\left( 2.34 \right) \end{align}\]

which means that for any rating level (from non-bitter to bitter) the odds for a less bitter rating in comparison to a more bitter rating are distictly higher when the grapes of the wine have been crushed with a high temperature, i.e., warmer crushing decreases the chances to be judged as bitter.

For the sequential model the covariable temperature has the odds ratio

\[\begin{align} \frac{P(Y = k \mid Y \ge k, temperature=high)\ / \ P( Y > k \mid Y \ge k, temperature=high) }{P(Y = k \mid Y \ge k, temperature=low)\ / \ P(Y > k \mid Y \ge k, temperature=low) } & = \exp\left( 2.19 \right) \end{align}\]

which means that the odds of stopping the transition to a higher bitterness is greater when the grapes have been crushed under high temperature, i.e., the chances of obtaining higher bitterness levels are lower when the grapes have been crushed when its warm.

Model selection

We do a model comparison using approximate leave-one-out cross-validation using the loo package.

do.loo <- function(fit) {
    ll <- loo::extract_log_lik(fit, merge_chains = FALSE)
    r_eff <- loo::relative_eff(exp(ll))
    loo::loo(ll, r_eff = r_eff)
}

wine_cumulative_loo <- do.loo(wine_fit_cumulative)
wine_sratio_loo <- do.loo(wine_fit_sratio)

loos <- loo::loo_compare(
    list(cumulative=wine_cumulative_loo, sequential=wine_sratio_loo)
)

print(loos, simplify=FALSE)

           elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
sequential   0.0       0.0   -86.6      4.7        11.8   1.4    173.1   9.4   
cumulative  -0.6       0.7   -87.2      4.7        12.2   1.2    174.3   9.4

The expected log predictive density is higher for the sequential model. That’s nice, because it demonstrates (as was the goal of this case study) that the sequential response mechanism is preferred for this data set. The improvement is small, but so is the data set.

Acknowledgements

Thanks to Jana Linnik for introducing me to sequential regression models and for making me aware that many biological data sets can be more appropriately modelled using a sequential response mechanism.

License

The case study is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Session info

sessionInfo()

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 19.10

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_CH.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_CH.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_CH.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstansequential_0.1.0 loo_2.2.0             rstan_2.19.3         
 [4] StanHeaders_2.19.2    bayesplot_1.7.1       patchwork_1.0.0      
 [7] colorspace_1.4-1      forcats_0.5.0         stringr_1.4.0        
[10] dplyr_0.8.5           purrr_0.3.4           readr_1.3.1          
[13] tidyr_1.0.2           tibble_3.0.1          ggplot2_3.3.0        
[16] tidyverse_1.3.0      

loaded via a namespace (and not attached):
  [1] ellipsis_0.3.0       ggridges_0.5.2       brms_2.12.0         
  [4] rsconnect_0.8.16     markdown_1.1         base64enc_0.1-3     
  [7] fs_1.4.1             rstudioapi_0.11      farver_2.0.3        
 [10] DT_0.13              fansi_0.4.1          mvtnorm_1.1-0       
 [13] lubridate_1.7.8      xml2_1.3.2           bridgesampling_1.0-0
 [16] codetools_0.2-16     knitr_1.28           shinythemes_1.1.2   
 [19] jsonlite_1.6.1       broom_0.5.6          dbplyr_1.4.3        
 [22] shiny_1.4.0.2        compiler_4.0.0       httr_1.4.1          
 [25] backports_1.1.6      assertthat_0.2.1     Matrix_1.2-18       
 [28] fastmap_1.0.1        cli_2.0.2            later_1.0.0         
 [31] htmltools_0.4.0      prettyunits_1.1.1    tools_4.0.0         
 [34] igraph_1.2.5         coda_0.19-3          gtable_0.3.0        
 [37] glue_1.4.0           reshape2_1.4.4       ggthemes_4.2.0      
 [40] Rcpp_1.0.4.6         cellranger_1.1.0     vctrs_0.2.4         
 [43] nlme_3.1-147         crosstalk_1.1.0.1    xfun_0.13           
 [46] ps_1.3.2             rvest_0.3.5          mime_0.9            
 [49] miniUI_0.1.1.1       lifecycle_0.2.0      gtools_3.8.2        
 [52] zoo_1.8-8            scales_1.1.0         colourpicker_1.0    
 [55] hms_0.5.3            promises_1.1.0       Brobdingnag_1.2-6   
 [58] parallel_4.0.0       inline_0.3.15        shinystan_2.5.0     
 [61] yaml_2.2.1           gridExtra_2.3        stringi_1.4.6       
 [64] dygraphs_1.1.1.6     checkmate_2.0.0      pkgbuild_1.0.7      
 [67] rlang_0.4.5          pkgconfig_2.0.3      matrixStats_0.56.0  
 [70] evaluate_0.14        lattice_0.20-41      labeling_0.3        
 [73] rstantools_2.0.0     htmlwidgets_1.5.1    processx_3.4.2      
 [76] tidyselect_1.0.0     plyr_1.8.6           magrittr_1.5        
 [79] R6_2.4.1             generics_0.0.2       DBI_1.1.0           
 [82] pillar_1.4.3         haven_2.2.0          withr_2.2.0         
 [85] xts_0.12-0           abind_1.4-5          modelr_0.1.6        
 [88] crayon_1.3.4         utf8_1.1.4           rmarkdown_2.1       
 [91] grid_4.0.0           readxl_1.3.1         callr_3.4.3         
 [94] threejs_0.3.3        reprex_0.3.0         digest_0.6.25       
 [97] xtable_1.8-4         httpuv_1.5.2         stats4_4.0.0        
[100] munsell_0.5.0        shinyjs_1.1

References

McCullagh, Peter. 1980. “Regression Models for Ordinal Data.” Journal of the Royal Statistical Society: Series B (Methodological) 42 (2): 109–27. https://doi.org/10.1111/j.2517-6161.1980.tb01109.x.

Randall, J. H. 1989. “The Analysis of Sensory Data by Generalized Linear Model.” Biometrical Journal 31 (7): 781–93. https://doi.org/10.1002/bimj.4710310703.

Tutz, Gerhard. 1991. “Sequential models in categorical regression.” Computational Statistics & Data Analysis 11 (3): 275–95. https://doi.org/10.1016/0167-9473(91)90086-H.

Tutz, Gerhard, and Wolfgang Hennevogl. 1996. “Random Effects in Ordinal Regression Models.” Computational Statistics & Data Analysis 22 (5): 537–57. https://doi.org/10.1016/0167-9473(96)00004-7.

Yee, Thomas. 2010. “The VGAM Package for Categorical Data Analysis.” Journal of Statistical Software, Articles 32 (10): 1–34. https://doi.org/10.18637/jss.v032.i10.

On sequential regression models

Simon Dirmeier <simon.dirmeier @ web.de>

June 2020