Fits many generalised linear mixed effects models (GLMM) with negative binomial distribution for analysis of overdispersed count data with random effects. Designed for longitudinal analysis of RNA-Sequencing count data.

## Usage

```
glmmSeq(
modelFormula,
countdata,
metadata,
id = NULL,
dispersion = NA,
sizeFactors = NULL,
reduced = NULL,
modelData = NULL,
designMatrix = NULL,
method = c("lme4", "glmmTMB"),
control = NULL,
family = nbinom2,
cores = 1,
removeSingles = FALSE,
zeroCount = 0.125,
verbose = TRUE,
returnList = FALSE,
progress = FALSE,
...
)
```

## Arguments

- modelFormula
the model formula. This must be of the form

`"~ ..."`

where the structure is assumed to be`"counts ~ ..."`

. The formula must include a random effects term. For more information on formula structure for random effects see`lme4::glmer()`

- countdata
the sequencing count data matrix with genes in rows and samples in columns

- metadata
a dataframe of sample information with variables in columns and samples in rows

- id
Optional. Used to specify the column in metadata which contains the sample IDs to be used in repeated samples for random effects. If not specified, the function defaults to using the variable after the "|" in the random effects term in the formula.

- dispersion
a numeric vector of gene dispersion. Not required for

`glmmTMB`

models, as these determine and fit dispersion for each gene.- sizeFactors
size factors (default = NULL). If provided the

`glmer`

offset is set to log(sizeFactors). For more information see```lme4::glmer()`

- reduced
Optional reduced model formula. If this is chosen, a likelihood ratio test is used to calculate p-values instead of the default Wald type 2 Chi-squared test.

- modelData
Optional dataframe. Default is generated by call to

`expand.grid`

using levels of variables in the formula. Used to calculate model predictions (estimated means & 95% CI) for plotting via modelPlot. It can therefore be used to add/remove points in modelPlot.- designMatrix
custom design matrix, used only for prediction

- method
Specifies which package to use for fitting GLMM models. Either "lme4" or "glmmTMB" depending on whether to use lme4::glmer or glmmTMB::glmmTMB to fit GLMM models.

- control
the

`glmer`

optimizer control. If`method = "lme4"`

default is`glmerControl(optimizer = "bobyqa")`

). If`method = "glmmTMB"`

default is`glmmTMBControl()`

- family
Only used with

`glmmTMB`

models. Default is`nbinom2`

. See glmmTMB::nbinom2- cores
number of cores to use. Default = 1.

- removeSingles
whether to remove individuals without repeated measures (default = FALSE)

- zeroCount
numerical value to offset zeroes for the purpose of log (default = 0.125)

- verbose
Logical whether to display messaging (default = TRUE)

- returnList
Logical whether to return results as a list or

`glmmSeq`

object (default = FALSE). Useful for debugging.- progress
Logical whether to display a progress bar

- ...
Other parameters to pass to

`lme4::glmer()`

## Value

Returns an S4 class `GlmmSeq`

object with results for gene-wise
general linear mixed models. A list of results is returned if `returnList`

is `TRUE`

which is useful for debugging. If all genes return errors from
`glmer`

, then an error message is shown and a character vector containing
error messages for all genes is returned.

## Details

This function is a wrapper for `lme4::glmer()`

. By default, p-values for each
model term are computed using Wald type 2 Chi-squared test as per
`car::Anova()`

. The underlying code for this has been optimised for speed.
However, if a reduced model formula is specified by setting `reduced`

, then a
likelihood ratio test is performed instead using stats::anova. This will
double computation time since two GLMM have to be fitted.

Parallelisation is provided using parallel::mclapply on Unix/Mac or parallel::parLapply on PC.

Setting `method = "glmmTMB"`

enables an alternative method of fitting GLMM
using the `glmmTMB`

package. This gives access to a variety of alternative
GLM family functions. Note, `glmmTMB`

negative binomial models are
substantially slower to fit than `glmer`

models with known dispersion due to
the extra time taken by `glmmTMB`

to optimise the dispersion parameter.

The `id`

argument is usually optional. By default the `id`

column in the
metadata is determined as the term after the bar in the random effects term
of the model. Note that `id`

is not passed to `glmer`

or `glmmTMB`

. It is
only really used to remove singletons from the `countdata`

matrix and
`metadata`

dataframe. The `id`

is also stored in the output from `glmmSeq`

and used by plotting function `modelPlot()`

. However, due to its flexible
nature, in theory `glmmSeq`

should allow for more than one random effect
term, although this has not been tested formally. In this case, it is
probably prudent to specify a value for `id`

.

## Examples

```
data(PEAC_minimal_load)
disp <- apply(tpm, 1, function(x) {
(var(x, na.rm = TRUE)-mean(x, na.rm = TRUE))/(mean(x, na.rm = TRUE)**2)
})
MS4A1glmm <- glmmSeq(~ Timepoint * EULAR_6m + (1 | PATID),
countdata = tpm[1:2, ],
metadata = metadata,
dispersion = disp,
verbose = FALSE)
names(attributes(MS4A1glmm))
#> [1] "info" "formula" "stats" "predict" "reduced" "countdata"
#> [7] "metadata" "modelData" "optInfo" "errors" "vars" "class"
```