Fits many linear mixed effects models for analysis of gaussian data with random effects, with parallelisation and optimisation for speed. It is suitable for longitudinal analysis of high dimensional data. Wald type 2 Chi-squared test is used to calculate p-values.

## Usage

```
lmmSeq(
modelFormula,
maindata,
metadata,
id = NULL,
offset = NULL,
test.stat = c("Wald", "F", "LRT"),
reduced = NULL,
modelData = NULL,
designMatrix = NULL,
control = lmerControl(),
cores = 1,
removeSingles = FALSE,
verbose = TRUE,
returnList = FALSE,
progress = FALSE,
...
)
```

## Arguments

- modelFormula
the model formula. This must be of the form

`"~ ..."`

where the structure is assumed to be`"gene ~ ..."`

. The formula must include a random effects term. See formula structure for random effects in`lme4::lmer()`

- maindata
data matrix with genes in rows and samples in columns

- metadata
a dataframe of sample information with variables in columns and samples in rows

- id
Optional. Used to specify the column in metadata which contains the sample IDs to be used in repeated samples for random effects. If not specified, the function defaults to using the variable after the "|" in the random effects term in the formula.

- offset
Vector containing model offsets (default = NULL). If provided the

`lmer()`

offset is set to`offset`

. See`lme4::lmer()`

- test.stat
Character value specifying test statistic. Current options are "Wald" for type 2 Wald Chi square test using code derived and modified from car::Anova to improve speed for matrix tests. Or "F" for conditional F tests using Saiterthwaite's method of approximated Df. This uses lmerTest::lmer and is somewhat slower.

- reduced
Optional reduced model formula. If this is chosen, a likelihood ratio test is used to calculate p-values instead of the default Wald type 2 Chi-squared test.

- modelData
Optional dataframe. Default is generated by call to

`expand.grid`

using levels of variables in the formula. Used to calculate model predictions (estimated means & 95% CI) for plotting via modelPlot. It can therefore be used to add/remove points in modelPlot.- designMatrix
Optional custom design matrix generated by call to

`model.matrix`

using`modelData`

and`FEformula`

. Used to calculate model predictions for plotting.- control
the

`lmer`

optimizer control (default =`lmerControl()`

). See`lme4::lmerControl()`

.- cores
number of cores to use for parallelisation. Default = 1.

- removeSingles
whether to remove individuals with no repeated measures (default = FALSE)

- verbose
Logical whether to display messaging (default = TRUE)

- returnList
Logical whether to return results as a list or lmmSeq object (default = FALSE). Helpful for debugging.

- progress
Logical whether to display a progress bar

- ...
Other parameters passed to

`lmerTest::lmer()`

. Only available if`test.stat = "F"`

.

## Value

Returns an S4 class `lmmSeq`

object with results for gene-wise linear
mixed models; or a list of results if `returnList`

is `TRUE`

, which is
useful for debugging. If all genes return errors from `lmer`

, then an error
message is shown and a character vector containing error messages for all
genes is returned.

## Details

By default, p-values for each model term are computed using Wald type 2
Chi-squared test as per `car::Anova()`

. The underlying code for this has been
optimised for speed. However, if a reduced model formula is specified by
setting `reduced`

, then a likelihood ratio test (LRT) is performed instead
using `anova`

. This will double computation
time since two LMM have to be fitted for each gene. For LRT, models being
compared are optimised by maximum likelihood and not REML (`REML=FALSE`

).

Two key methods are used to speed up computation above and beyond simple
parallelisation. The first is to speed up `lme4::lmer()`

by calling
`lme4::lFormula()`

once at the start and then updating the `lFormula`

output
with new data. The 2nd speed up is through optimised code for repeated type 2
Wald Chi-squared tests (original code was derived from car::Anova). For
example, elements such as the hypothesis matrices are generated only once to
reduce unnecessarily repetitive computation, and the generation of p-values
from Chi-squared values is vectorised and performed at the end. F-tests using
the `lmerTest`

package have not been optimised and are therefore slower.

Parallelisation is performed using parallel::mclapply on unix/mac and parallel::parLapply on windows. Progress bars use pbmcapply::pbmclapply on unix/mac and pbapply::pblapply on windows.

The `id`

argument is usually optional. By default the `id`

column in the
metadata is determined as the term after the bar in the random effects term
of the model. Note that `id`

is not passed to `lmer`

. It is only really used
to remove singletons from the `maindata`

matrix and `metadata`

dataframe. The
`id`

is also stored in the output from `lmmSeq`

and used by plotting function
`modelPlot()`

. However, due to its flexible nature, in theory `lmmSeq`

should
allow for more than one random effect term, although this has not been tested
formally. In this case, it is probably prudent to specify a value for `id`

.

## Examples

```
data(PEAC_minimal_load)
logtpm <- log2(tpm +1)
lmmtest <- lmmSeq(~ Timepoint * EULAR_6m + (1 | PATID),
maindata = logtpm[1:2, ],
metadata = metadata,
verbose = FALSE)
names(attributes(lmmtest))
#> [1] "info" "formula" "stats" "predict" "reduced" "maindata"
#> [7] "metadata" "modelData" "optInfo" "errors" "vars" "class"
```