scale_and_register_data() registers the expression profiles a user wishes to compare. It can optionally scale the data before registration, finds and scores the optimal shifts and stretches, and applies the best shifts and stretches.

Usage

scale_and_register_data(
  input_df,
  stretches = NA,
  shifts = NA,
  min_num_overlapping_points,
  maintain_min_num_overlapping_points = FALSE,
  initial_rescale = FALSE,
  do_rescale = TRUE,
  accession_data_to_transform,
  accession_data_ref,
  start_timepoint = c("reference", "transform", "zero"),
  expression_value_threshold = 5,
  is_data_normalised = FALSE,
  optimise_registration_parameters = FALSE,
  num_iterations = 60
)

Arguments

input_df

Input data frame containing all replicates of gene expression in each genotype at each time point.

stretches

Candidate registration stretch factors to apply to data to transform.

shifts

Candidate registration shift values to apply to data to transform.

min_num_overlapping_points

Minimum number of overlapping time points. A shift is only considered if it leaves at least this many overlapping points after the registration function is applied.

maintain_min_num_overlapping_points

Whether to automatically calculate the extreme (minimum and maximum) values of shifts so that the specified min_num_overlapping_points condition is maintained. By default, FALSE.

initial_rescale

If TRUE, scale gene expression prior to registration.

do_rescale

If TRUE, scale gene expression using only the overlapping time points during registration.

accession_data_to_transform

Accession name of data which will be transformed.

accession_data_ref

Accession name of reference data.

start_timepoint

Time points to be added in both reference data and data to transform after shifting and stretching. Can be either "reference" (the default), "transform", or "zero".

expression_value_threshold

Expression value threshold. Genes whose maximum expression value is less than the threshold are removed. If NULL, all data are kept.

is_data_normalised

TRUE if the dataset has been normalised prior to the registration process.

optimise_registration_parameters

Whether to optimise registration parameters with Simulated Annealing. By default, FALSE.

num_iterations

Maximum number of iterations in the Simulated Annealing optimisation. By default, 60.
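
As a rough illustration of how stretches, shifts, and min_num_overlapping_points interact, the sketch below builds the candidate grid from the values used in the Examples section and counts how many transformed time points fall inside the reference time range. The time points and the transformation used here (stretch around the first time point, then shift) are assumptions made for illustration only; they are not taken from the greatR implementation, which may define overlap differently.

```r
# Candidate parameters, as in the Examples section
stretches <- c(3, 2.5, 2, 1.5, 1)
shifts <- seq(-4, 4, length.out = 33)  # steps of 0.25

# Every stretch/shift combination
candidates <- expand.grid(stretch = stretches, shift = shifts)

# Hypothetical time points (not from the package's sample data)
ref_timepoints <- seq(11, 35, by = 2)
transform_timepoints <- 7:16

# Assumed transformation: stretch around the first time point, then shift
count_overlap <- function(stretch, shift) {
  t0 <- min(transform_timepoints)
  transformed <- stretch * (transform_timepoints - t0) + t0 + shift
  sum(transformed >= min(ref_timepoints) & transformed <= max(ref_timepoints))
}

candidates$num_overlapping <- mapply(
  count_overlap, candidates$stretch, candidates$shift
)

# Keep only candidates that satisfy the minimum-overlap condition
min_num_overlapping_points <- 4
valid_candidates <- candidates[
  candidates$num_overlapping >= min_num_overlapping_points,
]
```

With maintain_min_num_overlapping_points = TRUE, the function is documented to derive the extreme shift values itself, so a manual check like this would only serve as a sanity check on user-supplied candidates.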

Value

This function returns a list of data frames, containing:

mean_df

a data frame containing mean expression value of each gene and accession for every time point.

mean_df_sc

identical to mean_df, with an additional column sc.expression_value which contains the scaled mean expression values.

to_shift_df

a processed input data frame which is ready to be registered.

best_shifts

a data frame containing the best shift factor for each given stretch.

shifted_mean_df

the registration result - after stretching and shifting.

imputed_mean_df

the imputed registration result, i.e. the registration result transformed onto a common set of time points.

all_shifts_df

a table containing the candidate registration parameters (stretch and shift factors) and the score obtained after applying each candidate.

model_comparison_df

a table comparing the optimal registration function for each gene (based on all_shifts_df scores) to a model with no registration applied.
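
To make the relationship between mean_df and mean_df_sc concrete, the following base R sketch averages replicates per gene, accession, and time point, then adds a scaled column. The toy data, the column names other than sc.expression_value, and the z-score scaling are assumptions for illustration; the package may compute these differently.

```r
# Toy replicate data; column names other than sc.expression_value are assumed
toy_df <- data.frame(
  locus_name = "gene_A",
  accession = rep(c("Ro18", "Col0"), each = 6),
  timepoint = rep(rep(c(11, 13, 15), each = 2), times = 2),
  expression_value = c(1, 3, 4, 6, 9, 11, 2, 2, 5, 7, 8, 8)
)

# Mean over replicates per gene, accession, and time point
mean_df <- aggregate(
  expression_value ~ locus_name + accession + timepoint,
  data = toy_df,
  FUN = mean
)

# Z-score scaling within each gene and accession (assumed scaling method)
mean_df_sc <- mean_df
mean_df_sc$sc.expression_value <- ave(
  mean_df$expression_value,
  mean_df$locus_name, mean_df$accession,
  FUN = function(x) (x - mean(x)) / sd(x)
)
```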

Examples

if (FALSE) {
# Load a data frame from the sample data
all_data_df <- utils::read.csv(
  system.file("extdata/brapa_arabidopsis_all_replicates.csv", package = "greatR")
)

# Running the registration
registration_results <- scale_and_register_data(
  input_df = all_data_df,
  stretches = c(3, 2.5, 2, 1.5, 1),
  shifts = seq(-4, 4, length.out = 33),
  min_num_overlapping_points = 4,
  initial_rescale = FALSE,
  do_rescale = TRUE,
  accession_data_to_transform = "Col0",
  accession_data_ref = "Ro18",
  start_timepoint = "reference"
)
}
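
The call above leaves expression_value_threshold at its default of 5. The effect of that argument can be sketched in base R as follows. The toy data and column names are hypothetical, and applying the threshold to the maximum of the raw values (rather than, say, the per-time-point means) is an assumption, not a statement about the package internals.

```r
expression_value_threshold <- 5

# Hypothetical expression data for two genes
toy_df <- data.frame(
  locus_name = rep(c("gene_A", "gene_B"), each = 3),
  expression_value = c(1, 8, 12, 0.5, 2, 4)
)

# Maximum expression per gene
max_per_gene <- tapply(toy_df$expression_value, toy_df$locus_name, max)

# NULL keeps everything; otherwise drop genes whose maximum is below the threshold
if (is.null(expression_value_threshold)) {
  genes_to_keep <- names(max_per_gene)
} else {
  genes_to_keep <- names(max_per_gene)[max_per_gene >= expression_value_threshold]
}

filtered_df <- toy_df[toy_df$locus_name %in% genes_to_keep, ]
```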