Train and test a predictor on multiple datasets independently, using sva::ComBat() to correct for batch effects prior to running zeitzeiger().

Usage

zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)

Arguments

ematList

Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.

trainStudyNames

Character vector of names in ematList corresponding to datasets for training.

sampleMetadata

data.frame containing relevant information for each sample across all datasets. Must have a column named sample.

studyColname

Name of column in sampleMetadata that contains information about which dataset each sample belongs to.

batchColname

Name of column in sampleMetadata that contains information about which dataset each sample belongs to. The values should correspond to the names of ematList. This column will often be the same as studyColname, but doesn't have to be.

timeColname

Name of column in sampleMetadata that contains the values of the periodic variable.

nKnots

Number of internal knots to use for the periodic smoothing spline.

nTime

Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.

useSpc

Logical indicating whether to use PMA::SPC() (default) or base::svd().

sumabsv

L1-constraint on the SPCs, passed to PMA::SPC().

orth

Logical indicating whether to require left singular vectors be orthogonal to each other, passed to PMA::SPC().

nSpc

Vector of the number of SPCs to use for prediction. If NA, nSpc will become 1:K, where K is the number of SPCs in spcResult. Each value in nSpc will correspond to one prediction for each test observation. A value of 2 (the default shown in Usage) means that the prediction will be based on the first 2 SPCs.

timeRange

Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.

covariateName

Name of column(s) in sampleMetadata containing information about other covariates for sva::ComBat(), besides batchColname. If NA (default), then there are no other covariates.

featuresExclude

Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.

dopar

Logical indicating whether to process the folds in parallel. Use doParallel::registerDoParallel() to register the parallel backend.
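
As a rough illustration of how the arguments above fit together, the sketch below builds toy inputs in base R and checks that they agree with each other. The study names, feature names, column names (study, time), and the makeEmat() helper are all hypothetical. The time values are assumed to lie in [0, 1), as the default timeRange suggests. The final call is not run here, because it needs the zeitzeiger, sva, and doParallel packages and realistic data.

```r
set.seed(1)

# Hypothetical toy data: three studies, 6 samples each, 5 features.
studyNames <- c("studyA", "studyB", "studyC")
featureNames <- paste0("gene", 1:5)

# Hypothetical helper: one matrix per study, with samples as rownames
# and features as colnames, as described for ematList.
makeEmat <- function(study, nSamples = 6) {
  sampleNames <- paste(study, seq_len(nSamples), sep = "_")
  matrix(rnorm(nSamples * length(featureNames)),
         nrow = nSamples,
         dimnames = list(sampleNames, featureNames))
}

ematList <- lapply(studyNames, makeEmat)
names(ematList) <- studyNames

# One row per sample across all datasets; the 'sample' column is required.
sampleMetadata <- data.frame(
  sample = unlist(lapply(ematList, rownames), use.names = FALSE),
  study = rep(studyNames, each = 6),
  time = runif(18),  # periodic variable, assumed scaled to [0, 1)
  stringsAsFactors = FALSE
)

trainStudyNames <- c("studyA", "studyB")

# Consistency checks between the inputs.
stopifnot(
  all(trainStudyNames %in% names(ematList)),
  setequal(sampleMetadata$study, names(ematList)),
  setequal(sampleMetadata$sample, unlist(lapply(ematList, rownames))),
  all(sampleMetadata$time >= 0 & sampleMetadata$time < 1)
)

if (FALSE) {
  # Not run: requires the zeitzeiger, sva, and doParallel packages.
  doParallel::registerDoParallel(cores = 2)
  result <- zeitzeiger::zeitzeigerBatch(
    ematList, trainStudyNames, sampleMetadata,
    studyColname = "study", batchColname = "study", timeColname = "time",
    dopar = TRUE
  )
}
```

Here the same column serves as both studyColname and batchColname, which the argument descriptions say is the common case.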

Value

spcResultList

List of output from zeitzeigerSpc(), one for each test dataset.

timeDepLike

3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of nSpc, and each element of timeRange.

mleFit

List (for each element in nSpc) of lists (for each test observation) of mle2 objects.

timePred

Matrix of predicted times for test observations by values of nSpc.
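
Because the predicted variable is periodic, a plain difference between predicted and observed times can be misleading near the wrap-around point. The sketch below shows one way to compute a signed circular error from a matrix shaped like timePred. It assumes times on a period-1 scale; circDiff() is a hypothetical base-R helper, not a function taken from the package, and the toy values stand in for real output.

```r
# Hypothetical helper: signed circular difference, returned in [-period/2, period/2).
circDiff <- function(pred, obs, period = 1) {
  (pred - obs + period / 2) %% period - period / 2
}

# Toy stand-in for result$timePred: 4 test observations, one column per value of nSpc.
timePred <- matrix(c(0.05, 0.50, 0.98, 0.30), ncol = 1)
timeObs <- c(0.95, 0.45, 0.02, 0.30)

# Error of the predictions in the first column of timePred.
err <- circDiff(timePred[, 1], timeObs)

# Median absolute circular error as a single summary.
medAbsErr <- median(abs(err))
```

With these toy values, a prediction of 0.05 against an observed time of 0.95 counts as an error of about 0.1 rather than -0.9.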