Train and test a ZeitZeiger predictor, accounting for batch effects
Source: R/zeitzeiger_predict.R
zeitzeigerBatch.Rd
Train and test a predictor on multiple datasets independently, using sva::ComBat() to correct for batch effects prior to running zeitzeiger().
Usage
zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)
Arguments
- ematList
Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.
- trainStudyNames
Character vector of names in ematList corresponding to datasets for training.
- sampleMetadata
data.frame containing relevant information for each sample across all datasets. Must have a column named sample.
- studyColname
Name of column in sampleMetadata that contains information about which dataset each sample belongs to.
- batchColname
Name of column in sampleMetadata that contains information about which dataset each sample belongs to. This should correspond to the names of ematList, and will often be the same as studyColname, but doesn't have to be.
- timeColname
Name of column in sampleMetadata that contains the values of the periodic variable.
- nKnots
Number of internal knots to use for the periodic smoothing spline.
- nTime
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.
- useSpc
Logical indicating whether to use PMA::SPC() (default) or base::svd().
- sumabsv
L1-constraint on the SPCs, passed to PMA::SPC().
- orth
Logical indicating whether to require left singular vectors be orthogonal to each other, passed to PMA::SPC().
- nSpc
Vector of the number of SPCs to use for prediction. If NA, nSpc will become 1:K, where K is the number of SPCs in spcResult. Each value in nSpc will correspond to one prediction for each test observation. A value of 2 (the default shown under Usage) means that the prediction will be based on the first 2 SPCs.
- timeRange
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.
- covariateName
Name of column(s) in sampleMetadata containing information about other covariates for sva::ComBat(), besides batchColname. If NA (default), then there are no other covariates.
- featuresExclude
Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.
- dopar
Logical indicating whether to process the folds in parallel. Use doParallel::registerDoParallel() to register the parallel backend.
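As a rough illustration of how the main arguments fit together, the sketch below builds toy inputs in base R. The study names (studyA, studyB, studyC), feature names, metadata column names (study, time), and the random data are all made up for illustration, and the periodic variable is assumed to be scaled to [0, 1), consistent with the default timeRange. The call to zeitzeigerBatch() is left commented out because it needs the zeitzeiger and sva packages and has not been checked against any particular package version.

```r
# Toy inputs for zeitzeigerBatch(); all names and data here are made up.
set.seed(1)

studyNames = c('studyA', 'studyB', 'studyC')  # hypothetical study names
featureNames = paste0('feature', 1:20)        # hypothetical feature names
nSamplesPerStudy = 12

# One matrix per dataset, with rownames as sample names and colnames as
# feature names, as described for ematList.
ematList = lapply(studyNames, function(study) {
  matrix(rnorm(nSamplesPerStudy * length(featureNames)),
         nrow = nSamplesPerStudy,
         dimnames = list(paste0(study, '_', 1:nSamplesPerStudy), featureNames))
})
names(ematList) = studyNames

# One row per sample across all datasets; the sample column is required.
sampleMetadata = data.frame(
  sample = unlist(lapply(ematList, rownames), use.names = FALSE),
  study = rep(studyNames, each = nSamplesPerStudy),
  time = runif(length(studyNames) * nSamplesPerStudy),  # assumed scaled to [0, 1)
  stringsAsFactors = FALSE)

trainStudyNames = c('studyA', 'studyB')  # studyC is held out for testing

# Consistency checks that mirror the argument descriptions.
stopifnot(
  'sample' %in% colnames(sampleMetadata),
  all(trainStudyNames %in% names(ematList)),
  all(sampleMetadata$study %in% names(ematList)),
  setequal(sampleMetadata$sample, unlist(lapply(ematList, rownames))))

# Possible call, using the hypothetical column names defined above:
# result = zeitzeiger::zeitzeigerBatch(
#   ematList, trainStudyNames, sampleMetadata,
#   studyColname = 'study', batchColname = 'study', timeColname = 'time',
#   nSpc = 1:2, dopar = FALSE)
```

Here the same column is passed as both studyColname and batchColname, which the description of batchColname notes is often the case.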
Value
- spcResultList
List of output from zeitzeigerSpc(), one for each test dataset.
- timeDepLike
3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of nSpc, and each element of timeRange.
- mleFit
List (for each element in nSpc) of lists (for each test observation) of mle2 objects.
- timePred
Matrix of predicted times for test observations by values of nSpc.
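One way the timePred matrix might be used is to compare predicted and observed times while respecting the periodicity of the variable. The sketch below is only an illustration under stated assumptions: circDiff is a hypothetical helper written here (not a zeitzeiger function), the period is assumed to be 1, and the timePred and timeObs values are mock numbers standing in for real output.

```r
# Hypothetical helper: signed circular difference (predicted - observed),
# wrapped into [-period / 2, period / 2).
circDiff = function(pred, obs, period = 1) {
  d = (pred - obs) %% period
  ifelse(d >= period / 2, d - period, d)
}

# Mock values standing in for the timePred element of the result: one row per
# test observation, one column per value of nSpc (here nSpc = 1:2).
timePred = matrix(c(0.95, 0.10, 0.52,
                    0.02, 0.12, 0.48),
                  ncol = 2,
                  dimnames = list(paste0('obs', 1:3), c('nSpc1', 'nSpc2')))
timeObs = c(0.05, 0.15, 0.50)  # made-up observed times for the same observations

# timeObs is recycled down each column, so timeErr has the same shape as
# timePred: one error per test observation and per value of nSpc.
timeErr = circDiff(timePred, timeObs)

# Median absolute error for each value of nSpc.
medianAbsErr = apply(abs(timeErr), 2, median)
```

Wrapping the difference matters near the boundary of the period: a prediction of 0.95 for an observed time of 0.05 is off by a tenth of a period, not nine tenths.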