Register or synchronize different expression profiles: scale_and_register_data

scale_and_register_data() registers the expression profiles a user wishes to compare. It can optionally scale the data before registration, then searches candidate shifts and stretches, scores each candidate, and applies the best-scoring shift and stretch.
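To make the search space concrete, the sketch below applies every combination of candidate stretch and shift to a toy set of time points and keeps only the combinations that leave enough points overlapping the reference. It is a minimal illustration under stated assumptions, not greatR's implementation: the affine mapping `stretch * t + shift`, the overlap rule (transformed points falling inside the reference time range), and the helper names `transform_timepoints()` and `count_overlapping_points()` are all hypothetical.

```r
# Hypothetical helper: the affine form t' = stretch * t + shift is an
# assumption for illustration, not necessarily greatR's exact transform.
transform_timepoints <- function(timepoints, stretch, shift) {
  stretch * timepoints + shift
}

# Hypothetical helper: count transformed time points that fall inside
# the reference time range (assumed definition of "overlapping").
count_overlapping_points <- function(transformed, reference) {
  sum(transformed >= min(reference) & transformed <= max(reference))
}

ref_times <- c(0, 2, 4, 6, 8, 10)
to_transform_times <- c(0, 1, 2, 3, 4, 5)

# Every (stretch, shift) combination, mirroring the `stretches` and
# `shifts` arguments.
candidates <- expand.grid(stretch = c(2, 1.5, 1), shift = c(-2, 0, 2))

candidates$n_overlap <- mapply(
  function(stretch, shift) {
    count_overlapping_points(
      transform_timepoints(to_transform_times, stretch, shift),
      ref_times
    )
  },
  candidates$stretch,
  candidates$shift
)

# Keep only candidates that satisfy a minimum-overlap rule, in the spirit
# of min_num_overlapping_points.
min_num_overlapping_points <- 5
valid <- candidates[candidates$n_overlap >= min_num_overlapping_points, ]
print(valid)
```

With these toy values, the two candidates that pair shift -2 with stretch 1.5 or 1 leave only four overlapping points and are dropped.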

Usage

scale_and_register_data(
  input_df,
  stretches = NA,
  shifts = NA,
  min_num_overlapping_points,
  maintain_min_num_overlapping_points = FALSE,
  initial_rescale = FALSE,
  do_rescale = TRUE,
  accession_data_to_transform,
  accession_data_ref,
  start_timepoint = c("reference", "transform", "zero"),
  expression_value_threshold = 5,
  is_data_normalised = FALSE,
  optimise_registration_parameters = FALSE,
  num_iterations = 60
)

Arguments

input_df: Input data frame containing all replicates of gene expression in each genotype at each time point.
stretches: Candidate registration stretch factors to apply to the data being transformed.
shifts: Candidate registration shift values to apply to the data being transformed.
min_num_overlapping_points: Minimum number of overlapping time points. A shift is only considered if it leaves at least this many overlapping points after the registration function is applied.
maintain_min_num_overlapping_points: Whether to automatically calculate the extreme (minimum and maximum) shift values that maintain the specified min_num_overlapping_points condition. By default, FALSE.
initial_rescale: If TRUE, scale gene expression prior to registration.
do_rescale: If TRUE, scale gene expression using only the overlapping time points during registration.
accession_data_to_transform: Accession name of the data to be transformed.
accession_data_ref: Accession name of the reference data.
start_timepoint: Which time points to add to both the reference data and the data to transform after shifting and stretching. Must be one of "reference" (the default), "transform", or "zero".
expression_value_threshold: Expression value threshold. Expression profiles whose maximum value is less than the threshold are removed. If NULL, all data are kept.
is_data_normalised: TRUE if the dataset has been normalised prior to the registration process.
optimise_registration_parameters: Whether to optimise registration parameters using simulated annealing. By default, FALSE.
num_iterations: Maximum number of iterations in the simulated annealing optimisation. By default, 60.
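The preprocessing controlled by expression_value_threshold and the rescaling options can be pictured with base R alone. This hypothetical sketch drops genes whose maximum expression is below the threshold and then adds a z-scored sc.expression_value column within each gene and accession. The column names accession, locus_name, timepoint and expression_value, the per-gene filtering rule, and the use of z-score scaling are assumptions made for illustration, not a description of greatR internals.

```r
# Toy long-format data; column names are assumed for illustration.
toy_df <- data.frame(
  accession = rep(c("Col0", "Ro18"), each = 8),
  locus_name = rep(rep(c("GENE_A", "GENE_B"), each = 4), times = 2),
  timepoint = rep(c(0, 2, 4, 6), times = 4),
  expression_value = c(1, 3, 5, 7,  0.1, 0.2, 0.3, 0.4,
                       2, 4, 6, 8,  0.5, 1, 1.5, 2)
)

# Hypothetical threshold filter: drop a gene when its maximum expression
# (taken across all accessions, an assumption) is below the threshold.
# A NULL threshold keeps everything.
filter_low_expression <- function(df, threshold = 5) {
  if (is.null(threshold)) return(df)
  gene_max <- tapply(df$expression_value, df$locus_name, max)
  keep <- names(gene_max)[gene_max >= threshold]
  df[df$locus_name %in% keep, ]
}

# Hypothetical scaling: centre and scale to unit standard deviation within
# each gene and accession, as base R scale() does.
add_scaled_values <- function(df) {
  df$sc.expression_value <- ave(
    df$expression_value, df$locus_name, df$accession,
    FUN = function(x) as.numeric(scale(x))
  )
  df
}

filtered <- filter_low_expression(toy_df, threshold = 5)
scaled <- add_scaled_values(filtered)
print(scaled)
```

Here GENE_B never exceeds 2 and is removed, while the two GENE_A profiles differ only by an offset and therefore become identical after scaling.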

Value

This function returns a list of data frames, containing:

mean_df: a data frame containing mean expression value of each gene and accession for every time point.
mean_df_sc: identical to mean_df, with an additional column sc.expression_value which contains the scaled mean expression values.
to_shift_df: a processed input data frame which is ready to be registered.
best_shifts: a data frame containing the best shift factor for each given stretch.
shifted_mean_df: the registration result after stretching and shifting.
imputed_mean_df: the registration result imputed onto a common set of time points, so that both accessions share the same time points.
all_shifts_df: a table containing candidate registration parameters (stretch and shift factors) and the score obtained after applying each candidate.
model_comparison_df: a table comparing the optimal registration function for each gene (based on the all_shifts_df scores) to a model with no registration applied.
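The relationship between all_shifts_df and best_shifts can be illustrated with a toy score table: for each stretch, keep the shift with the best score, then take the best of those per-stretch winners. The column names and score values below are invented, and treating a lower score as better (for example, a mean squared difference between profiles) is an assumption rather than documented behaviour.

```r
# Hypothetical all_shifts_df-like table; columns and scores are invented.
all_shifts_toy <- data.frame(
  gene = "GENE_A",
  stretch = rep(c(2, 1.5, 1), each = 3),
  shift = rep(c(-1, 0, 1), times = 3),
  score = c(0.4, 0.9, 0.7,  0.5, 0.2, 0.6,  0.8, 0.35, 0.3)
)

# For each stretch, keep the row whose shift gives the lowest score
# (lower assumed better).
best_per_stretch <- do.call(rbind, lapply(
  split(all_shifts_toy, all_shifts_toy$stretch),
  function(d) d[which.min(d$score), ]
))

# The overall optimum is the best of the per-stretch winners.
overall_best <- best_per_stretch[which.min(best_per_stretch$score), ]
print(best_per_stretch)
print(overall_best)
```

Because split() orders groups by stretch value, best_per_stretch lists stretches 1, 1.5 and 2, whose best shifts are 1, 0 and -1; the overall optimum is stretch 1.5 with shift 0.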

Examples

if (FALSE) {
library(greatR)

# Load a data frame from the sample data
all_data_df <- utils::read.csv(
  system.file("extdata/brapa_arabidopsis_all_replicates.csv", package = "greatR")
)

# Running the registration
registration_results <- scale_and_register_data(
  input_df = all_data_df,
  stretches = c(3, 2.5, 2, 1.5, 1),
  shifts = seq(-4, 4, length.out = 33),
  min_num_overlapping_points = 4,
  initial_rescale = FALSE,
  do_rescale = TRUE,
  accession_data_to_transform = "Col0",
  accession_data_ref = "Ro18",
  start_timepoint = "reference"
)
}