Assess normality of traits in a data frame.
Data frame containing the raw data.
Numeric vector containing the indices of the non-numeric columns to exclude from the analysis.
Number of CPUs to be used in the computation.
Prefix for output files and plots.
Path to the directory where plots should be stored.
Numeric vector with the transformation values.
Significance level.
Boolean flag indicating whether to apply Pareto scaling to the normalised data.
Boolean flag to indicate whether or not to show the normality assessment statistics (how many traits are normal, how many were transformed/normalised, and which transformations were applied).
List of data frames for the normal (norm) and skewed (skew) traits.
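The Pareto scaling option mentioned in the arguments can be illustrated with a minimal sketch. It assumes the textbook definition (mean-centre each trait, then divide by the square root of its standard deviation); `pareto_scale` is a hypothetical helper written for this illustration and is not necessarily how MetaPipe implements the step.

```r
# Hypothetical helper (not MetaPipe's code): Pareto-scale one numeric vector.
# Assumes the usual definition: (x - mean(x)) / sqrt(sd(x)).
pareto_scale <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sqrt(sd(x, na.rm = TRUE))
}

# Made-up trait values, for illustration only
traits <- data.frame(T1 = c(1, 2, 3, 4, 5),
                     T2 = c(10, 20, 40, 80, 160))

# Apply the scaling column by column
scaled <- as.data.frame(lapply(traits, pareto_scale))
```

After scaling, each column has mean zero and a standard deviation equal to the square root of its original standard deviation, so large-variance traits are damped without being forced to unit variance.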
The normality of each trait is assessed using a Shapiro-Wilk test, under the following hypotheses:
\(H_0:\) the sample comes from a normally distributed population.
\(H_1:\) the sample does not come from a normally distributed population.
The test is carried out at a significance level of \(\alpha = 0.05\). If the
conclusion is that the sample does not come from a normally distributed
population, then a number of transformations are performed, based on the
transformation values passed with transf_vals. By default, the transformation
values a = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10) are used with the
logarithmic (log_a(x)), power (x^a), and radical/root (x^(1/a)) functions.
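The procedure described above can be sketched for a single trait with base R's `shapiro.test`. This is only an illustration: `find_normalising_transform` is a hypothetical helper, and the search order and the "first transformation that passes" rule are assumptions, not a description of MetaPipe's internals.

```r
# Hypothetical helper (not part of MetaPipe): try the log, power and root
# transformations of a numeric vector and return the first one whose
# Shapiro-Wilk p-value is at least alpha.
find_normalising_transform <- function(x, transf_vals = c(2, exp(1), 3),
                                       alpha = 0.05) {
  # Fail to reject H0 at level alpha: treat the raw trait as normal
  if (shapiro.test(x)$p.value >= alpha) {
    return(list(f = "none", a = NA_real_, values = x))
  }
  for (a in transf_vals) {
    candidates <- list(
      log   = suppressWarnings(log(x, base = a)),  # log_a(x)
      power = x^a,                                 # x^a
      root  = suppressWarnings(x^(1 / a))          # x^(1/a)
    )
    for (f in names(candidates)) {
      y <- candidates[[f]]
      # Skip transformations that produce NA, NaN or infinite values
      if (all(is.finite(y)) && shapiro.test(y)$p.value >= alpha) {
        return(list(f = f, a = a, values = y))
      }
    }
  }
  # No transformation passed the test: the trait stays skewed
  list(f = "skewed", a = NA_real_, values = x)
}

set.seed(1)
skewed_trait <- exp(rnorm(50))  # log-normal sample, so a log is a natural candidate
res <- find_normalising_transform(skewed_trait)
res$f  # name of the selected transformation, or "none" / "skewed"
res$a  # transformation value that was used, if any
```

Returning both the function name and the value `a` mirrors the kind of information reported in the transformations' summary in the examples.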
# \donttest{
# Toy dataset
example_data <- data.frame(ID = c(1, 2, 3, 4, 5),
                           P1 = c("one", "two", "three", "four", "five"),
                           T1 = rnorm(5),
                           T2 = rnorm(5))
out_prefix <- file.path(tempdir(), "metapipe")
plots_dir <- file.path(tempdir(), "plots")
example_data_normalised <-
  MetaPipe::assess_normality(example_data,
                             c(1, 2),
                             out_prefix = out_prefix,
                             plots_dir = plots_dir)
#> Total traits (excluding all NAs traits): 2
#> Normal traits (without transformation): 2
#> Normal traits (transformed): 0
#> Total normal traits: 2
#> Total skewed traits: 0
example_data_norm <- example_data_normalised$norm
example_data_skew <- example_data_normalised$skew
# Normal traits
knitr::kable(example_data_norm)
#>
#>
#> | ID| T1| T2|
#> |--:|----------:|----------:|
#> | 1| -0.1088320| -0.0115961|
#> | 2| 0.6623192| 0.1743725|
#> | 3| -1.2701015| 0.0213287|
#> | 4| -0.5573619| 0.9433218|
#> | 5| 0.2172910| -0.4548338|
# Skewed traits (empty)
# knitr::kable(example_data_skew)
# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
out_prefix <- file.path(tempdir(), "ionomics")
plots_dir <- file.path(tempdir(), "plots")
ionomics_rev <- MetaPipe::replace_missing(ionomics,
                                          excluded_columns = c(1, 2),
                                          replace_na = TRUE,
                                          out_prefix = out_prefix)
ionomics_normalised <-
  MetaPipe::assess_normality(ionomics_rev,
                             excluded_columns = c(1, 2),
                             out_prefix = out_prefix,
                             plots_dir = plots_dir,
                             transf_vals = c(2, exp(1)))
#> Total traits (excluding all NAs traits): 21
#> Normal traits (without transformation): 2
#> Normal traits (transformed): 4
#> Total normal traits: 6
#> Total skewed traits: 15
#>
#> Transformations' summary:
#> f(x) Value # traits
#> log 2 3
#> root e 1
ionomics_norm <- ionomics_normalised$norm
ionomics_skew <- ionomics_normalised$skew
# Normal traits
knitr::kable(ionomics_norm[1:5, ])
#>
#>
#> |ID | Ca44| B11| Na23| Mg26| Rb85| Sr88|
#> |:-----|--------:|--------:|--------:|--------:|--------:|---------:|
#> |E_001 | 15894.22| 4.222397| 2.042740| 10.77021| 1.555742| 7.347059|
#> |E_002 | 13155.45| 3.855684| 1.917202| 10.54095| 2.058711| 6.890243|
#> |E_004 | 14182.51| 3.879033| 2.354263| 10.51931| 2.198422| 9.025915|
#> |E_005 | 22550.82| 4.329576| 2.477233| 11.13450| 1.791578| 15.292360|
#> |E_006 | 15982.76| 3.697997| 2.419593| 11.72734| 2.229866| 13.901449|
# Skewed traits (partial output)
knitr::kable(ionomics_skew[1:5, 1:8])
#>
#>
#> |ID | K39| P31| Li7| Al27| S34| Fe54| Mn55|
#> |:-----|--------:|--------:|---------:|---------:|--------:|--------:|--------:|
#> |E_001 | 5888.311| 1743.118| 0.0128699| 3.845879| 1152.944| 27.59340| 54.53991|
#> |E_002 | 7013.400| 2244.684| 0.0119316| 5.825639| 1600.442| 35.49159| 52.57114|
#> |E_004 | 7966.273| 2311.057| 0.0212316| 8.036047| 1039.098| 39.13434| 36.66475|
#> |E_005 | 7514.089| 2315.675| 0.0233063| 9.482051| 1091.607| 40.22041| 43.24368|
#> |E_006 | 7608.464| 1995.193| 0.0588128| 29.329605| 1096.871| 75.23614| 53.64705|
# }