Register or synchronize different expression profiles
Source: R/process_data.R
scale_and_register_data.Rd
scale_and_register_data() registers the expression profiles a user wishes to
compare. It can optionally scale the data before registration, finds and
scores the optimal shifts and stretches, and applies the best shifts and
stretches.
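To make the optional scaling step concrete, the sketch below standardises expression values within each gene and accession using base R. It is an illustration only: this page does not state which scaling greatR applies, so the z-score (subtract the mean, divide by the standard deviation), the toy data, and the helper name scale_values are all assumptions.

```r
# Toy data; the column names are assumed for illustration.
toy_df <- data.frame(
  locus_name = rep(c("gene_a", "gene_b"), each = 3),
  accession = "Col0",
  timepoint = rep(c(7, 8, 9), times = 2),
  expression_value = c(1, 2, 3, 10, 20, 30)
)

# Hypothetical helper (not part of greatR): z-score a numeric vector.
scale_values <- function(x) (x - mean(x)) / stats::sd(x)

# Scale within each gene/accession group so profiles become comparable.
toy_df$sc.expression_value <- ave(
  toy_df$expression_value,
  toy_df$locus_name,
  toy_df$accession,
  FUN = scale_values
)
```

After scaling, the two toy genes have identical profiles even though their raw magnitudes differ tenfold, which is why scaling can help when comparing the shape of expression over time.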
Usage
scale_and_register_data(
  input_df,
  stretches = NA,
  shifts = NA,
  min_num_overlapping_points,
  maintain_min_num_overlapping_points = FALSE,
  initial_rescale = FALSE,
  do_rescale = TRUE,
  accession_data_to_transform,
  accession_data_ref,
  start_timepoint = c("reference", "transform", "zero"),
  expression_value_threshold = 5,
  is_data_normalised = FALSE,
  optimise_registration_parameters = FALSE,
  num_iterations = 60
)
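The vector default for start_timepoint follows a common R idiom: a function lists all allowed choices in its signature and resolves them with match.arg(), which picks the first element when the argument is not supplied. Whether scale_and_register_data() uses match.arg() internally is an assumption; the function choose_start below is hypothetical and only illustrates the idiom.

```r
# Hypothetical function illustrating the idiom; not part of greatR.
choose_start <- function(start_timepoint = c("reference", "transform", "zero")) {
  # match.arg() returns the first choice when the argument is missing,
  # and raises an error for values outside the allowed set.
  match.arg(start_timepoint)
}

default_choice <- choose_start()
explicit_choice <- choose_start("zero")
```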
Arguments
- input_df
Input data frame containing all replicates of gene expression in each genotype at each time point.
- stretches
Candidate registration stretch factors to apply to data to transform.
- shifts
Candidate registration shift values to apply to data to transform.
- min_num_overlapping_points
Minimum number of overlapping time points. A shift is only considered if it leaves at least this many overlapping points after the registration function is applied.
- maintain_min_num_overlapping_points
Whether to automatically calculate the extreme (minimum and maximum) values of shifts needed to maintain the specified min_num_overlapping_points condition. By default, FALSE.
- initial_rescale
If TRUE, scale gene expression prior to registration.
- do_rescale
Whether to scale gene expression using only the overlapping time points during registration.
- accession_data_to_transform
Accession name of data which will be transformed.
- accession_data_ref
Accession name of reference data.
- start_timepoint
Time points to be added in both reference data and data to transform after shifting and stretching. Can be one of "reference" (the default), "transform", or "zero".
- expression_value_threshold
Expression value threshold. Expressions are removed if their maximum is less than the threshold. If NULL, all data are kept.
- is_data_normalised
TRUE if the dataset has been normalised prior to the registration process.
- optimise_registration_parameters
Whether to optimise registration parameters with Simulated Annealing. By default, FALSE.
- num_iterations
Maximum number of iterations in the Simulated Annealing optimisation. By default, 60.
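The interaction between stretches, shifts, and min_num_overlapping_points can be sketched in base R. This is a simplified illustration, not greatR's implementation. It assumes that a stretch scales time around the first time point, that a shift is then added, and that "overlapping points" means the reference time points falling inside the transformed time range. Both helper functions are hypothetical.

```r
# Hypothetical helpers for illustration only; they are not part of greatR.

# Stretch time points around their first value, then shift them.
apply_registration <- function(timepoints, stretch, shift) {
  start <- min(timepoints)
  (timepoints - start) * stretch + start + shift
}

# Count reference time points that fall inside the transformed time range.
count_overlapping_points <- function(ref_timepoints, transformed_timepoints) {
  sum(
    ref_timepoints >= min(transformed_timepoints) &
      ref_timepoints <= max(transformed_timepoints)
  )
}

ref_timepoints <- seq(10, 30, by = 2)
timepoints_to_transform <- 7:16
min_num_overlapping_points <- 4

# Keep only the candidate shifts that leave enough overlapping points.
candidate_shifts <- c(-30, 0, 3, 30)
num_overlaps <- vapply(
  candidate_shifts,
  function(shift) {
    transformed <- apply_registration(
      timepoints_to_transform,
      stretch = 2,
      shift = shift
    )
    count_overlapping_points(ref_timepoints, transformed)
  },
  integer(1)
)
valid_shifts <- candidate_shifts[num_overlaps >= min_num_overlapping_points]
```

In this toy setup the two extreme shifts move the transformed profile entirely outside the reference time range, so only the shifts 0 and 3 remain as candidates.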
Value
This function returns a list of data frames, containing:
- mean_df
a data frame containing mean expression value of each gene and accession for every time point.
- mean_df_sc
identical to mean_df, with an additional column sc.expression_value which contains the scaled mean expression values.
- to_shift_df
a processed input data frame which is ready to be registered.
- best_shifts
a data frame containing best shift factor for each given stretch.
- shifted_mean_df
the registration result after stretching and shifting.
- imputed_mean_df
the imputed (transformed to be the same in a set of common time points) registration result.
- all_shifts_df
a table containing candidates of registration parameters and a score after applying each parameter (stretch and shift factor).
- model_comparison_df
a table comparing the optimal registration function for each gene (based on all_shifts_df scores) to a model with no registration applied.
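The idea behind imputed_mean_df, bringing a registered profile onto a set of common time points, can be illustrated with linear interpolation in base R. This is a sketch under assumptions: the page does not say which imputation method greatR uses, and the toy values and variable names below are invented for illustration.

```r
# Toy registered profile: time points no longer sit on the reference grid.
registered_time <- c(10.5, 12.5, 14.5, 16.5)
registered_expression <- c(1, 3, 5, 7)

# Common time points on which both profiles should be compared.
common_timepoints <- 11:16

# Linear interpolation; points outside the observed range would become NA.
imputed <- stats::approx(
  x = registered_time,
  y = registered_expression,
  xout = common_timepoints
)

imputed_df <- data.frame(
  timepoint = imputed$x,
  expression_value = imputed$y
)
```

Once both profiles are expressed on the same time points, they can be compared point by point.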
Examples
if (FALSE) {
  # Load a data frame from the sample data shipped with greatR
  all_data_df <- utils::read.csv(
    system.file("extdata/brapa_arabidopsis_all_replicates.csv", package = "greatR")
  )

  # Run the registration
  registration_results <- scale_and_register_data(
    input_df = all_data_df,
    stretches = c(3, 2.5, 2, 1.5, 1),
    shifts = seq(-4, 4, length.out = 33),
    min_num_overlapping_points = 4,
    initial_rescale = FALSE,
    do_rescale = TRUE,
    accession_data_to_transform = "Col0",
    accession_data_ref = "Ro18",
    start_timepoint = "reference"
  )
}