# Train and test a ZeitZeiger predictor, accounting for batch effects

Source: `R/zeitzeiger_predict.R`

Train and test a predictor on multiple datasets independently, using `sva::ComBat()` to correct for batch effects prior to running `zeitzeiger()`.

## Usage

```
zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)
```

## Arguments

- ematList
Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.

- trainStudyNames
Character vector of names in `ematList` corresponding to datasets for training.

- sampleMetadata
data.frame containing relevant information for each sample across all datasets. Must have a column named `sample`.

- studyColname
Name of column in `sampleMetadata` that contains information about which dataset each sample belongs to.

- batchColname
Name of column in `sampleMetadata` that contains information about which dataset each sample belongs to. This should correspond to the names of `ematList`, and will often be the same as `studyColname`, but doesn't have to be.

- timeColname
Name of column in `sampleMetadata` that contains the values of the periodic variable.

- nKnots
Number of internal knots to use for the periodic smoothing spline.

- nTime
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.

- useSpc
Logical indicating whether to use `PMA::SPC()` (default) or `base::svd()`.

- sumabsv
L1-constraint on the SPCs, passed to `PMA::SPC()`.

- orth
Logical indicating whether to require left singular vectors be orthogonal to each other, passed to `PMA::SPC()`.

- nSpc
Vector of the number of SPCs to use for prediction. If `NA`, `nSpc` will become `1:K`, where `K` is the number of SPCs in `spcResult`. Each value in `nSpc` will correspond to one prediction for each test observation. A value of 2 means that the prediction will be based on the first 2 SPCs.

- timeRange
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.

- covariateName
Name of column(s) in `sampleMetadata` containing information about other covariates for `sva::ComBat()`, besides `batchColname`. If `NA` (default), then there are no other covariates.

- featuresExclude
Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.

- dopar
Logical indicating whether to process the folds in parallel. Use `doParallel::registerDoParallel()` to register the parallel backend.
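
The argument descriptions above imply a particular shape for `ematList` and `sampleMetadata`. The following is a minimal sketch of inputs in that shape, built from simulated data. The study names, the column names `study` and `time`, the feature names, and the wrapper `runZeitzeigerBatch()` are all hypothetical and chosen only for illustration. The wrapper is defined but not called, since it needs the zeitzeiger, sva, and PMA packages to be installed, and the toy data are not meant to produce meaningful predictions.

```r
# Hypothetical toy inputs: 3 studies, 8 samples each, 20 features.
set.seed(1)
studyNames <- c("studyA", "studyB", "studyC")
nSamples <- 8
featureNames <- paste0("feature_", 1:20)

# One data.frame for all samples; the `sample` column is required.
sampleMetadata <- do.call(rbind, lapply(studyNames, function(study) {
  data.frame(
    sample = paste(study, seq_len(nSamples), sep = "_"),
    study = study,
    time = runif(nSamples),  # periodic variable, assumed scaled to [0, 1)
    stringsAsFactors = FALSE
  )
}))

# Named list of matrices: rownames are sample names and colnames are
# feature names, as described for ematList.
ematList <- lapply(studyNames, function(study) {
  md <- sampleMetadata[sampleMetadata$study == study, ]
  emat <- matrix(
    rnorm(nrow(md) * length(featureNames), sd = 0.5),
    nrow = nrow(md),
    dimnames = list(md$sample, featureNames)
  )
  # Add a periodic signal to the first 5 features (recycled down each column).
  emat[, 1:5] <- emat[, 1:5] + sin(2 * pi * md$time)
  emat
})
names(ematList) <- studyNames

# Check that the metadata and the matrices agree before any call.
stopifnot(all(names(ematList) %in% sampleMetadata$study))
stopifnot(all(unlist(lapply(ematList, rownames)) %in% sampleMetadata$sample))

# Hypothetical wrapper around the call; not executed here.
runZeitzeigerBatch <- function() {
  zeitzeiger::zeitzeigerBatch(
    ematList,
    trainStudyNames = c("studyA", "studyB"),
    sampleMetadata = sampleMetadata,
    studyColname = "study",
    batchColname = "study",  # here the batch is assumed to equal the study
    timeColname = "time",
    dopar = FALSE            # avoids needing a registered parallel backend
  )
}
```

Setting `dopar = FALSE` keeps the sketch independent of a parallel backend; with `dopar = TRUE`, a backend would first be registered with `doParallel::registerDoParallel()`.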

## Value

- spcResultList
List of output from `zeitzeigerSpc()`, one for each test dataset.

- timeDepLike
3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of `nSpc`, and each element of `timeRange`.

- mleFit
List (for each element in `nSpc`) of lists (for each test observation) of `mle2` objects.

- timePred
Matrix of predicted times for test observations by values of `nSpc`.
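
To make the relationship between `timeDepLike`, `timeRange`, and `nSpc` concrete, here is a small base R sketch. It uses a mock array filled with random numbers rather than real output, and it assumes the dimension order is observation, then `nSpc`, then `timeRange`, as the description of `timeDepLike` suggests. The name `timeInit` is hypothetical. The sketch picks, for each observation and each value of `nSpc`, the grid time with the highest likelihood, which is the quantity described under `timeRange` as the initial value for the MLE optimizer.

```r
# Default grid from the usage section: 100 values from 0 to 0.99.
timeRange <- seq(0, 1 - 0.01, 0.01)
nSpc <- c(1, 2)
nObs <- 3

# Mock likelihood array (random values, not real zeitzeiger output).
# Assumed dimension order: observation x nSpc x timeRange.
set.seed(1)
timeDepLike <- array(
  runif(nObs * length(nSpc) * length(timeRange)),
  dim = c(nObs, length(nSpc), length(timeRange))
)

# For each observation and each nSpc, take the grid time at which the
# likelihood is highest.
timeInit <- apply(timeDepLike, c(1, 2), function(like) {
  timeRange[which.max(like)]
})

dim(timeInit)  # observations by values of nSpc, the same layout as timePred
```

The values in `timePred` come from the optimizer and need not lie on the `timeRange` grid, so they will generally differ from this grid-based starting point.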