Assess normality of traits in a data frame.

Usage
assess_normality(
  raw_data,
  excluded_columns,
  cpus = 1,
  out_prefix = file.path(tempdir(), "metapipe"),
  plots_dir = tempdir(),
  transf_vals = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10),
  alpha = 0.05,
  pareto_scaling = FALSE,
  show_stats = TRUE
)

Arguments

raw_data

Data frame containing the raw data.

excluded_columns

Numeric vector containing the indices of the dataset properties (columns) that are non-numeric and are therefore excluded from the normality assessment.

cpus

Number of CPUs to be used in the computation.

out_prefix

Prefix for output files and plots.

plots_dir

Path to the directory where plots should be stored.

transf_vals

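Numeric vector with the transformation values, used as the base, exponent, and root index of the logarithmic, power, and radical/root transformations, respectively.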

alpha

Significance level for the Shapiro-Wilk normality test.

pareto_scaling

Boolean flag to indicate whether or not to perform Pareto scaling on the normalised data.

show_stats

Boolean flag to indicate whether or not to show the normality assessment statistics (how many traits are normal, how many were transformed/normalised, and which transformations were applied).
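The pareto_scaling option refers to Pareto scaling, which mean-centres each trait and divides it by the square root of its standard deviation. The sketch below illustrates that standard definition in base R; pareto_scale() is a hypothetical helper, not part of MetaPipe, and whether the package handles missing values exactly as shown (na.rm = TRUE) is an assumption.

```r
# Minimal sketch of Pareto scaling (standard definition, not necessarily
# MetaPipe's exact code). pareto_scale() is a hypothetical helper.
pareto_scale <- function(x) {
  # Mean-centre, then divide by the square root of the standard deviation
  (x - mean(x, na.rm = TRUE)) / sqrt(stats::sd(x, na.rm = TRUE))
}

# Toy traits: T2 is exactly ten times T1
traits <- data.frame(T1 = c(1, 2, 3, 4, 5),
                     T2 = c(10, 20, 30, 40, 50))
scaled <- as.data.frame(lapply(traits, pareto_scale))
```

Unlike unit-variance scaling, Pareto scaling keeps part of the original magnitude: each scaled trait ends up with a standard deviation equal to the square root of its original one, so T2 (ten times T1) keeps a standard deviation sqrt(10) times larger than T1 after scaling.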

Value

List with two data frames: norm, containing the normal traits (including those normalised by a transformation), and skew, containing the traits that remain skewed.

Details

The normality of each trait is assessed using a Shapiro-Wilk test, under the following hypotheses:

  • \(H_0:\) the sample comes from a normally distributed population.

  • \(H_1:\) the sample does not come from a normally distributed population.

The test uses the significance level given by alpha (\(\alpha = 0.05\) by default). If the null hypothesis is rejected, i.e. the sample does not appear to come from a normally distributed population, a number of transformations are performed, based on the transformation values passed with transf_vals. By default, the values a = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10) are used with the logarithmic (log_a(x)), power (x^a), and radical/root (x^(1/a)) functions.
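The procedure above can be sketched for a single trait in base R. This is a minimal illustration under stated assumptions, not MetaPipe's implementation: find_transformation() is a hypothetical helper, the order in which the transformations are tried (log, then power, then root) and the rule that the first passing transformation is kept are assumptions, and transformations that produce non-finite values (e.g. the logarithm of a non-positive number) are simply skipped.

```r
# Minimal sketch, NOT MetaPipe's implementation: assess one trait with a
# Shapiro-Wilk test and, if H0 is rejected, try log, power, and root
# transformations until one yields p >= alpha.
# find_transformation() is a hypothetical helper; the search order and the
# "first passing transformation wins" rule are assumptions.
find_transformation <- function(x,
                                transf_vals = c(2, exp(1), 3:10),
                                alpha = 0.05) {
  x <- x[!is.na(x)]

  # Shapiro-Wilk p-value, or NA when the test cannot be run
  # (non-finite values, fewer than 3 observations, constant data, ...)
  sw_p <- function(v) {
    if (any(!is.finite(v))) return(NA_real_)
    tryCatch(stats::shapiro.test(v)$p.value, error = function(e) NA_real_)
  }

  # H0 not rejected: treat the trait as normal without transformation
  p <- sw_p(x)
  if (!is.na(p) && p >= alpha) {
    return(list(f = "none", value = NA_real_, p.value = p))
  }

  # Candidate transformations, parameterised by a value `a`
  transforms <- list(
    log   = function(v, a) log(v, base = a),  # log_a(x)
    power = function(v, a) v^a,               # x^a
    root  = function(v, a) v^(1 / a)          # x^(1/a)
  )

  for (f in names(transforms)) {
    for (a in transf_vals) {
      # Silence "NaNs produced" warnings; non-finite results are skipped
      p <- suppressWarnings(sw_p(transforms[[f]](x, a)))
      if (!is.na(p) && p >= alpha) {
        return(list(f = f, value = a, p.value = p))
      }
    }
  }

  # No transformation passed: the trait remains skewed
  list(f = "skewed", value = NA_real_, p.value = NA_real_)
}

# Deterministic usage: normal scores pass untransformed, while their
# exponential (log-normal shaped) is normalised by log base 2
find_transformation(qnorm(ppoints(30)))$f
find_transformation(exp(qnorm(ppoints(30))))[c("f", "value")]
```

Because the Shapiro-Wilk statistic is invariant to rescaling, log_a(x) gives the same p-value for every base a > 1; the different values in transf_vals matter mainly for the power and root families.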

Examples

# \donttest{
# Toy dataset
example_data <- data.frame(ID = c(1,2,3,4,5), 
                           P1 = c("one", "two", "three", "four", "five"), 
                           T1 = rnorm(5), 
                           T2 = rnorm(5))
out_prefix <- file.path(tempdir(), "metapipe")
plots_dir <- file.path(tempdir(), "plots")
example_data_normalised <- 
  MetaPipe::assess_normality(example_data, 
                             c(1, 2),
                             out_prefix = out_prefix,
                             plots_dir = plots_dir)
#> Total traits (excluding all NAs traits):     2
#> Normal traits (without transformation):      2
#> Normal traits (transformed):                 0
#> Total normal traits:                         2
#> Total skewed traits:                         0
example_data_norm <- example_data_normalised$norm
example_data_skew <- example_data_normalised$skew

# Normal traits
knitr::kable(example_data_norm)
#> 
#> 
#> | ID|         T1|         T2|
#> |--:|----------:|----------:|
#> |  1| -0.1088320| -0.0115961|
#> |  2|  0.6623192|  0.1743725|
#> |  3| -1.2701015|  0.0213287|
#> |  4| -0.5573619|  0.9433218|
#> |  5|  0.2172910| -0.4548338|

# Skewed traits (empty)
# knitr::kable(example_data_skew)


# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
out_prefix <- file.path(tempdir(), "ionomics")
plots_dir <- file.path(tempdir(), "plots")
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na = TRUE,
                                          out_prefix = out_prefix)
ionomics_normalised <- 
  MetaPipe::assess_normality(ionomics_rev,
                             excluded_columns = c(1, 2),
                             out_prefix = out_prefix,
                             plots_dir = plots_dir,
                             transf_vals = c(2, exp(1)))
#> Total traits (excluding all NAs traits):     21
#> Normal traits (without transformation):      2
#> Normal traits (transformed):                 4
#> Total normal traits:                         6
#> Total skewed traits:                         15
#> 
#> Transformations' summary:
#> 	f(x)      Value     # traits  
#> 	log       2         3         
#> 	root      e         1         
                             
ionomics_norm <- ionomics_normalised$norm
ionomics_skew <- ionomics_normalised$skew

# Normal traits
knitr::kable(ionomics_norm[1:5, ])
#> 
#> 
#> |ID    |     Ca44|      B11|     Na23|     Mg26|     Rb85|      Sr88|
#> |:-----|--------:|--------:|--------:|--------:|--------:|---------:|
#> |E_001 | 15894.22| 4.222397| 2.042740| 10.77021| 1.555742|  7.347059|
#> |E_002 | 13155.45| 3.855684| 1.917202| 10.54095| 2.058711|  6.890243|
#> |E_004 | 14182.51| 3.879033| 2.354263| 10.51931| 2.198422|  9.025915|
#> |E_005 | 22550.82| 4.329576| 2.477233| 11.13450| 1.791578| 15.292360|
#> |E_006 | 15982.76| 3.697997| 2.419593| 11.72734| 2.229866| 13.901449|

# Skewed traits (partial output)
knitr::kable(ionomics_skew[1:5, 1:8])
#> 
#> 
#> |ID    |      K39|      P31|       Li7|      Al27|      S34|     Fe54|     Mn55|
#> |:-----|--------:|--------:|---------:|---------:|--------:|--------:|--------:|
#> |E_001 | 5888.311| 1743.118| 0.0128699|  3.845879| 1152.944| 27.59340| 54.53991|
#> |E_002 | 7013.400| 2244.684| 0.0119316|  5.825639| 1600.442| 35.49159| 52.57114|
#> |E_004 | 7966.273| 2311.057| 0.0212316|  8.036047| 1039.098| 39.13434| 36.66475|
#> |E_005 | 7514.089| 2315.675| 0.0233063|  9.482051| 1091.607| 40.22041| 43.24368|
#> |E_006 | 7608.464| 1995.193| 0.0588128| 29.329605| 1096.871| 75.23614| 53.64705|
# }