Replace missing values (NAs) in a dataset. The user can choose
between two actions to handle missing data:
Drop traits (variables) whose proportion of missing values (NAs over total observations) is at or above a given threshold, prop_na.
Replace missing values with half of the minimum value within each trait.
Finally, traits for which all entries are missing are removed from the
dataset and stored in an external CSV file called "<out_prefix>_NA_raw_data.csv".
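The threshold rule can be illustrated with base R alone. The following is a minimal sketch, not the MetaPipe implementation: the helper name drop_by_na_rate is hypothetical, and both the default of 0.5 and the "at or above" comparison are assumptions based on the "50% or more" wording of the messages the function prints.

```r
# Hypothetical helper (not part of MetaPipe): drop the traits whose share of
# NAs is at or above prop_na, leaving the excluded columns untouched.
drop_by_na_rate <- function(data, excluded_columns, prop_na = 0.5) {
  traits <- setdiff(seq_along(data), excluded_columns)
  # Proportion of missing values per trait: NAs / total observations
  na_rate <- colMeans(is.na(data[, traits, drop = FALSE]))
  to_drop <- traits[na_rate >= prop_na]
  if (length(to_drop) > 0) {
    data <- data[, -to_drop, drop = FALSE]
  }
  data
}

toy <- data.frame(ID = 1:5,
                  T3 = c(NA, 1, 2, 3, 4),          # 20 % NAs
                  T4 = c(NA, 1.2, -0.5, NA, 0.87), # 40 % NAs
                  T5 = NA)                         # 100 % NAs

names(drop_by_na_rate(toy, excluded_columns = 1))
# "ID" "T3" "T4"
names(drop_by_na_rate(toy, excluded_columns = 1, prop_na = 0.25))
# "ID" "T3"
```

Lowering prop_na makes the filter stricter: at 0.25 the 40 % trait is dropped as well, while the 20 % trait survives.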
Data frame containing the raw data.
Numeric vector containing the indices of the non-numeric dataset properties (the excluded columns).
Prefix for output files and plots.
Proportion of missing to total observations. If a trait reaches this threshold and replace_na = FALSE, it is dropped.
Boolean flag to indicate whether or not missing values should be replaced by half of the minimum value within each trait.
Data frame containing the raw data without missing values.
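The description states that traits whose entries are all missing are moved to "<out_prefix>_NA_raw_data.csv". A rough base R sketch of that step is shown here. The helper split_all_na is hypothetical, and keeping the excluded columns in the saved file is an assumption made so that the rows stay identifiable; the package may lay out the file differently.

```r
# Hypothetical helper (not part of MetaPipe): move the traits that are
# entirely NA into "<out_prefix>_NA_raw_data.csv" and return the rest.
split_all_na <- function(data, excluded_columns, out_prefix) {
  traits <- setdiff(seq_along(data), excluded_columns)
  # A trait is all-NA when it has zero non-missing entries
  all_na <- traits[colSums(!is.na(data[, traits, drop = FALSE])) == 0]
  if (length(all_na) > 0) {
    # Assumption: keep the excluded columns alongside the all-NA traits
    write.csv(data[, c(excluded_columns, all_na), drop = FALSE],
              file = paste0(out_prefix, "_NA_raw_data.csv"),
              row.names = FALSE)
    data <- data[, -all_na, drop = FALSE]
  }
  data
}

toy <- data.frame(ID = 1:3, T1 = c(0.4, NA, 1.1), T5 = NA)
sketch_prefix <- file.path(tempdir(), "sketch")
kept <- split_all_na(toy, excluded_columns = 1, out_prefix = sketch_prefix)
names(kept)
file.exists(paste0(sketch_prefix, "_NA_raw_data.csv"))
```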
# Toy dataset
set.seed(123) # Seed that reproduces the random values printed below
example_data <- data.frame(ID = c(1, 2, 3, 4, 5),
                           P1 = c("one", "two", "three", "four", "five"),
                           T1 = rnorm(5),
                           T2 = rnorm(5),
                           T3 = c(NA, rnorm(4)),             # 20 % NAs
                           T4 = c(NA, 1.2, -0.5, NA, 0.87),  # 40 % NAs
                           T5 = NA)                          # 100 % NAs
out_prefix <- file.path(tempdir(), "metapipe")
MetaPipe::replace_missing(example_data, c(1, 2), out_prefix = out_prefix)
#> The following trait was dropped because it has 50% or more missing values:
#> - T5
#>   ID    P1          T1         T2        T3    T4
#> 1  1   one -0.56047565  1.7150650        NA    NA
#> 2  2   two -0.23017749  0.4609162 1.2240818  1.20
#> 3  3 three  1.55870831 -1.2650612 0.3598138 -0.50
#> 4  4  four  0.07050839 -0.6868529 0.4007715    NA
#> 5  5  five  0.12928774 -0.4456620 0.1106827  0.87
MetaPipe::replace_missing(example_data,
                          c(1, 2),
                          prop_na = 0.25,
                          out_prefix = out_prefix)
#> The following traits were dropped because they have 25% or more missing values:
#> - T4
#> - T5
#>   ID    P1          T1         T2        T3
#> 1  1   one -0.56047565  1.7150650        NA
#> 2  2   two -0.23017749  0.4609162 1.2240818
#> 3  3 three  1.55870831 -1.2650612 0.3598138
#> 4  4  four  0.07050839 -0.6868529 0.4007715
#> 5  5  five  0.12928774 -0.4456620 0.1106827
MetaPipe::replace_missing(example_data,
                          c(1, 2),
                          replace_na = TRUE,
                          out_prefix = out_prefix)
#> The following trait was dropped because it has 100% missing values:
#> - T5
#>   ID    P1          T1         T2         T3    T4
#> 1  1   one -0.56047565  1.7150650 0.05534136 -0.25
#> 2  2   two -0.23017749  0.4609162 1.22408180  1.20
#> 3  3 three  1.55870831 -1.2650612 0.35981383 -0.50
#> 4  4  four  0.07050839 -0.6868529 0.40077145 -0.25
#> 5  5  five  0.12928774 -0.4456620 0.11068272  0.87
# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
out_prefix <- file.path(tempdir(), "ionomics")
ionomics_rev <- MetaPipe::replace_missing(ionomics,
                                          c(1, 2),
                                          out_prefix = out_prefix)
ionomics_rev <- MetaPipe::replace_missing(ionomics,
                                          excluded_columns = c(1, 2),
                                          prop_na = 0.025,
                                          out_prefix = out_prefix)
#> The following trait was dropped because it has 2.5% or more missing values:
#> - Se78
ionomics_rev <- MetaPipe::replace_missing(ionomics,
                                          excluded_columns = c(1, 2),
                                          replace_na = TRUE,
                                          out_prefix = out_prefix)
knitr::kable(ionomics_rev[1:5, 1:8])
#>
#>
#> |ID | SampleWeight| Ca44| K39| P31| Li7| B11| Na23|
#> |:-----|------------:|--------:|--------:|--------:|---------:|--------:|---------:|
#> |E_001 | 79| 15894.22| 5888.311| 1743.118| 0.0128699| 18.66673| 6.970224|
#> |E_002 | 93| 13155.45| 7013.400| 2244.684| 0.0119316| 14.47693| 5.866392|
#> |E_004 | 97| 14182.51| 7966.273| 2311.057| 0.0212316| 14.71313| 10.251955|
#> |E_005 | 82| 22550.82| 7514.089| 2315.675| 0.0233063| 20.10630| 11.773697|
#> |E_006 | 99| 15982.76| 7608.464| 1995.193| 0.0588128| 12.97801| 11.043837|
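The half-minimum replacement applied when replace_na = TRUE can be checked by hand against the toy dataset above. This is a minimal base R sketch, not the package's own code; half_min_impute is a hypothetical name, and leaving an all-NA vector unchanged is an assumption made only to keep the helper safe to call.

```r
# Hypothetical helper (not part of MetaPipe): replace the NAs of one trait
# with half of its smallest observed value.
half_min_impute <- function(x) {
  if (all(is.na(x))) return(x) # No observed value to derive a minimum from
  x[is.na(x)] <- min(x, na.rm = TRUE) / 2
  x
}

t4 <- c(NA, 1.2, -0.5, NA, 0.87) # Same values as T4 in the toy dataset
half_min_impute(t4)
#> [1] -0.25  1.20 -0.50 -0.25  0.87
```

The result matches the T4 column printed for the replace_na = TRUE call. Because the minimum of T4 is negative, half of it (-0.25) is larger than the minimum itself (-0.50), which is worth keeping in mind for traits that can take negative values.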