Train and test a ZeitZeiger predictor, accounting for batch effects
Source: R/zeitzeiger_predict.R
Train and test a predictor on multiple datasets independently, using
sva::ComBat() to correct for batch effects prior to running zeitzeiger().
Usage
zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)
Arguments
- ematList
Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.
- trainStudyNames
Character vector of names in ematList corresponding to datasets for training.
- sampleMetadata
data.frame containing relevant information for each sample across all datasets. Must have a column named sample.
- studyColname
Name of column in sampleMetadata that contains information about which dataset each sample belongs to.
- batchColname
Name of column in sampleMetadata that contains information about which dataset each sample belongs to. This should correspond to the names of ematList, and will often be the same as studyColname, but doesn't have to be.
- timeColname
Name of column in sampleMetadata that contains the values of the periodic variable.
- nKnots
Number of internal knots to use for the periodic smoothing spline.
- nTime
Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.
- useSpc
Logical indicating whether to use PMA::SPC() (default) or base::svd().
- sumabsv
L1-constraint on the SPCs, passed to PMA::SPC().
- orth
Logical indicating whether to require left singular vectors to be orthogonal to each other, passed to PMA::SPC().
- nSpc
Vector of the number of SPCs to use for prediction. If NA, nSpc will become 1:K, where K is the number of SPCs in spcResult. Each value in nSpc will correspond to one prediction for each test observation. A value of 2 (the default shown under Usage) means that the prediction will be based on the first 2 SPCs.
- timeRange
Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.
- covariateName
Name of column(s) in sampleMetadata containing information about other covariates for sva::ComBat(), besides batchColname. If NA (default), then there are no other covariates.
- featuresExclude
Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.
- dopar
Logical indicating whether to process the folds in parallel. Use doParallel::registerDoParallel() to register the parallel backend.
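Examples
The following is a minimal sketch of how the inputs described above might be laid out, not an example from the package itself. The study names, feature names, the helper makeStudy(), and the simulated values are all hypothetical, and the periodic variable is assumed to be scaled to [0, 1), as the default timeRange suggests. The call to zeitzeigerBatch() is left commented out because it needs the zeitzeiger and sva packages and realistically sized data.

```r
set.seed(1)
features <- paste0("gene", 1:5)  # hypothetical feature names

# Hypothetical helper: simulate one study as a samples-by-features matrix
# plus the matching rows of sample metadata.
makeStudy <- function(studyName, nSamples) {
  sampleNames <- paste0(studyName, "_s", seq_len(nSamples))
  time <- runif(nSamples)  # periodic variable, assumed scaled to [0, 1)
  emat <- sapply(seq_along(features), function(j) {
    sin(2 * pi * (time - j / 10)) + rnorm(nSamples, sd = 0.2)
  })
  # rownames are sample names, colnames are feature names
  dimnames(emat) <- list(sampleNames, features)
  meta <- data.frame(sample = sampleNames, study = studyName, time = time,
                     stringsAsFactors = FALSE)
  list(emat = emat, meta = meta)
}

studyNames <- c(studyA = "studyA", studyB = "studyB", studyC = "studyC")
studies <- lapply(studyNames, makeStudy, nSamples = 12)

# Named list of matrices, one per dataset
ematList <- lapply(studies, `[[`, "emat")

# One data.frame covering every sample, with the required column `sample`
sampleMetadata <- do.call(rbind, lapply(studies, `[[`, "meta"))
rownames(sampleMetadata) <- NULL

# Two datasets for training; the remaining one is used for testing
trainStudyNames <- c("studyA", "studyB")

# Consistency checks between the arguments
stopifnot(
  all(trainStudyNames %in% names(ematList)),
  setequal(unlist(lapply(ematList, rownames)), sampleMetadata$sample),
  setequal(names(ematList), sampleMetadata$study)
)

# Not run here: requires the zeitzeiger and sva packages.
# result <- zeitzeiger::zeitzeigerBatch(
#   ematList, trainStudyNames, sampleMetadata,
#   studyColname = "study", batchColname = "study", timeColname = "time",
#   dopar = FALSE)
```

In this sketch the same column serves as both studyColname and batchColname, which the argument descriptions say is often the case.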
Value
- spcResultList
List of output from zeitzeigerSpc(), one for each test dataset.
- timeDepLike
3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of nSpc, and each element of timeRange.
- mleFit
List (for each element in nSpc) of lists (for each test observation) of mle2 objects.
- timePred
Matrix of predicted times for test observations by values of nSpc.
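As a rough illustration of the shapes described above, the sketch below uses a mock array in place of timeDepLike (random numbers, not real output) to show how the grid time with the highest likelihood can be read off for each test observation and each value of nSpc, and how a prediction error on a periodic scale might be computed. The helper circDiff() and the observed times are hypothetical, and time is again assumed to lie in [0, 1).

```r
set.seed(2)
timeRange <- seq(0, 1 - 0.01, 0.01)
nSpc <- c(1, 2)
nObs <- 4

# Mock stand-in for timeDepLike: observations x nSpc x timeRange
timeDepLike <- array(runif(nObs * length(nSpc) * length(timeRange)),
                     dim = c(nObs, length(nSpc), length(timeRange)))

# For each observation and each value of nSpc, the grid time with the
# highest likelihood (the kind of value used to start the optimizer)
timeInit <- apply(timeDepLike, c(1, 2), function(lik) timeRange[which.max(lik)])

# Hypothetical helper: signed difference on a circle of circumference 1,
# so that 0.95 vs. 0.05 is a difference of -0.1 rather than 0.9
circDiff <- function(pred, obs) {
  d <- (pred - obs) %% 1
  ifelse(d > 0.5, d - 1, d)
}

# Hypothetical observed times for the four test observations
timeObs <- c(0.1, 0.4, 0.7, 0.95)

# Rows are observations, columns are values of nSpc (same layout as timePred)
timeErr <- circDiff(timeInit, timeObs)
colMeans(abs(timeErr))  # mean absolute error for each value of nSpc
```

The same circDiff() step could be applied to timePred itself, since it has one row per test observation and one column per value of nSpc.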