Train and test a ZeitZeiger predictor, accounting for batch effects

Train and test a predictor on multiple datasets independently, using sva::ComBat() to correct for batch effects prior to running zeitzeiger().

Usage

zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)

Arguments

ematList: Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.
trainStudyNames: Character vector of names in ematList corresponding to datasets for training.
sampleMetadata: data.frame containing relevant information for each sample across all datasets. Must have a column named sample.
studyColname: Name of column in sampleMetadata that contains information about which dataset each sample belongs to.
batchColname: Name of column in sampleMetadata that contains information about which dataset each sample belongs to. This should correspond to the names of ematList, and will often be the same as studyColname, but doesn't have to be.
timeColname: Name of column in sampleMetadata that contains the values of the periodic variable.
nKnots: Number of internal knots to use for the periodic smoothing spline.
nTime: Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.
useSpc: Logical indicating whether to use PMA::SPC() (default) or base::svd().
sumabsv: L1-constraint on the SPCs, passed to PMA::SPC().
orth: Logical indicating whether to require left singular vectors be orthogonal to each other, passed to PMA::SPC().
nSpc: Vector of the number of SPCs to use for prediction. If NA, nSpc will become 1:K, where K is the number of SPCs in spcResult. Each value in nSpc will correspond to one prediction for each test observation. A value of 2 (default) means that the prediction will be based on the first 2 SPCs.
timeRange: Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.
covariateName: Name of column(s) in sampleMetadata containing information about other covariates for sva::ComBat(), besides batchColname. If NA (default), then there are no other covariates.
featuresExclude: Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.
dopar: Logical indicating whether to process the folds in parallel. Use doParallel::registerDoParallel() to register the parallel backend.
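
The input layout described above can be sketched with simulated data. Everything in this example is hypothetical (the study names, the column names study, sample, and time, the feature names, and the simulated values); it only illustrates the expected shapes of ematList and sampleMetadata, and has not been tuned to give meaningful predictions. The call itself is left as a comment because it requires the zeitzeiger and sva packages.

```r
set.seed(1)

# Hypothetical toy data: three studies, 12 samples each, 20 features.
studyNames = c('studyA', 'studyB', 'studyC')
nSamples = 12
nFeatures = 20
featureNames = paste0('feature_', seq_len(nFeatures))

# One row per sample across all studies; time is scaled to [0, 1).
sampleMetadata = do.call(rbind, lapply(studyNames, function(studyName) {
  data.frame(
    study = studyName,
    sample = paste0(studyName, '_', seq_len(nSamples)),
    time = seq(0, 1 - 1 / nSamples, length.out = nSamples),
    stringsAsFactors = FALSE)
}))

# One matrix per study: samples in rows, features in columns.
ematList = lapply(studyNames, function(studyName) {
  md = sampleMetadata[sampleMetadata$study == studyName, ]
  # The first five features oscillate with time; the rest are noise.
  signal = outer(sin(2 * pi * md$time), c(rep(1, 5), rep(0, nFeatures - 5)))
  emat = signal + matrix(rnorm(nSamples * nFeatures, sd = 0.3), nrow = nSamples)
  dimnames(emat) = list(md$sample, featureNames)
  emat
})
names(ematList) = studyNames

# Check that names and sample identifiers line up with the metadata.
stopifnot(
  all(names(ematList) %in% sampleMetadata$study),
  all(unlist(lapply(ematList, rownames)) %in% sampleMetadata$sample))

# Arguments for the call; here the study column doubles as the batch column.
batchArgs = list(
  ematList = ematList,
  trainStudyNames = c('studyA', 'studyB'),
  sampleMetadata = sampleMetadata,
  studyColname = 'study',
  batchColname = 'study',
  timeColname = 'time',
  dopar = FALSE)

# With zeitzeiger and sva installed, the call would then be:
# result = do.call(zeitzeiger::zeitzeigerBatch, batchArgs)
# For dopar = TRUE, register a backend first, e.g.:
# doParallel::registerDoParallel(cores = 2)
```

In this sketch studyC is the only test dataset, since it is the one name in ematList that is not listed in trainStudyNames.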

Value

spcResultList: List of output from zeitzeigerSpc(), one for each test dataset.
timeDepLike: 3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of nSpc, and each element of timeRange.
mleFit: List (for each element in nSpc) of lists (for each test observation) of mle2 objects.
timePred: Matrix of predicted times for test observations by values of nSpc.
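
As a rough illustration of working with output of this shape, the sketch below builds a stand-in for timeDepLike, reads off the grid time with the highest likelihood, and computes a circular prediction error. The array is simulated, circDiff() is a hypothetical helper written for this example, and the dimension order (observation, nSpc, timeRange) is assumed from the description above. With real output, timePred could be compared to the true times in the same way.

```r
# Hypothetical stand-ins: 4 test observations, two values of nSpc,
# and the default timeRange.
timeRange = seq(0, 1 - 0.01, 0.01)
nSpc = 2:3
timeTrue = c(0.10, 0.35, 0.60, 0.95)

# Signed circular difference (pred - obs), wrapped to (-period/2, period/2].
# Hypothetical helper, not part of the documented API.
circDiff = function(pred, obs, period = 1) {
  d = (pred - obs) %% period
  ifelse(d > period / 2, d - period, d)
}

# Simulated likelihood array (observation x nSpc x time),
# peaked at each observation's true time.
timeDepLike = array(
  0, dim = c(length(timeTrue), length(nSpc), length(timeRange)))
for (i in seq_along(timeTrue)) {
  for (j in seq_along(nSpc)) {
    timeDepLike[i, j, ] = exp(-(circDiff(timeRange, timeTrue[i]) / 0.1)^2)
  }
}

# Grid time with the highest likelihood, per observation and value of nSpc.
# This mirrors the initial value described for the MLE optimizer.
timeGrid = apply(
  timeDepLike, c(1, 2), function(lik) timeRange[which.max(lik)])

# Circular error per observation (rows) and value of nSpc (columns).
timeErr = circDiff(timeGrid, timeTrue)

# Median absolute error for each value of nSpc.
medianAbsErr = apply(abs(timeErr), 2, median)
```

Using a circular difference matters for a periodic variable: a prediction of 0.98 for a true time of 0.02 is off by 0.04, not 0.96.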
