Load raw data

library(MetaPipe)

MetaPipe only accepts comma-separated values (CSV) files with the following structure:

ID	[Property]_1	…	[Property]_M	[Trait]_1	…	[Trait]_N

where the first column (ID) should be a unique identifier for each entry. If an ID is repeated, MetaPipe aggregates the repeated rows and replaces them with a single row (the mean across entries). The file can contain 0 to M properties, which may be categorical or numerical. Finally, at least one trait is expected.
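To illustrate what "aggregate by the mean" means for repeated IDs, here is a minimal base-R sketch. `collapse_by_id` is a hypothetical helper, not a MetaPipe function, and it assumes that every numeric column other than the ID is a trait; it is not a description of how `load_raw` is implemented internally.

```r
# Hypothetical helper (not part of MetaPipe): collapse rows sharing an ID
# by averaging each numeric trait column. Assumes numeric columns other than
# the ID are traits; non-numeric (categorical) columns are dropped here.
collapse_by_id <- function(df, id_col = "ID") {
  is_trait <- vapply(df, is.numeric, logical(1)) & names(df) != id_col
  # One output row per unique ID; a missing value in a group makes that
  # group's mean NA because mean() is called without na.rm = TRUE
  stats::aggregate(df[is_trait], by = df[id_col], FUN = mean)
}

raw <- data.frame(ID = c(1, 2, 1), T1 = c(1, 4, 3), T2 = c(10, 20, 30))
collapse_by_id(raw)
#>   ID T1 T2
#> 1  1  2 20
#> 2  2  4 20
```

Averaging identical duplicates leaves the values unchanged, which is why the toy example later in this vignette returns the same table for the file with duplicated rows.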

The function call is as follows:

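load_raw(raw_data_filename = "/path/to/FILE.CSV",
         excluded_columns = c(2, 3, ..., M + 1))

where raw_data_filename is the name of the file containing the raw data; either an absolute or a relative path is accepted. The argument excluded_columns is a vector with the indices of the property columns. Because column 1 holds the ID, the M properties occupy columns 2 to M + 1, e.g. c(2, 3, ..., M + 1). In the toy dataset below, the single property P1 is column 2, so excluded_columns = c(2).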
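The column arithmetic can be checked with a short sketch. `property_columns` is a hypothetical helper that returns the vector you would pass as excluded_columns, assuming the ID is column 1 and the M properties follow it; the second part looks the indices up by name in the CSV header instead, using only base R.

```r
# Hypothetical helper (not part of MetaPipe): indices of the property
# columns, given that column 1 is the ID and the M properties follow it
property_columns <- function(M) seq_len(M) + 1L

property_columns(1)
#> [1] 2
property_columns(3)
#> [1] 2 3 4

# Alternatively, look the indices up by name in the CSV header
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(ID = 1:2, P1 = c("a", "b"), T1 = c(0.5, 1.5)),
          csv, row.names = FALSE)
header <- names(read.csv(csv, nrows = 1))
which(header %in% c("P1"))
#> [1] 2
```

Looking the indices up by name is less error-prone when the property columns are renamed or reordered.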

# Toy dataset
set.seed(123)
example_data <- data.frame(ID = c(1,2,3,4,5),
                           P1 = c("one", "two", "three", "four", "five"), 
                           T1 = rnorm(5), 
                           T2 = rnorm(5),
                           T3 = c(NA, rnorm(4)),                     #  20 % NAs
                           T4 = c(NA, 1.2, -0.5, NA, 0.87),          #  40 % NAs
                           T5 = NA)                                  # 100 % NAs

workdir <- tempdir()

## Write to disk
write.csv(example_data, 
          file.path(workdir, "example_data.csv"), 
          row.names = FALSE)
## Create copy with duplicated rows
write.csv(example_data[c(1:5, 1, 2), ], 
          file.path(workdir, "example_data_dup.csv"), 
          row.names = FALSE)

# Load the data
load_raw(file.path(workdir, "example_data.csv"), c(2))
#>   ID    P1 T5          T1         T2        T3    T4
#> 1  1   one NA -0.56047565  1.7150650        NA    NA
#> 2  2   two NA -0.23017749  0.4609162 1.2240818  1.20
#> 3  3 three NA  1.55870831 -1.2650612 0.3598138 -0.50
#> 4  4  four NA  0.07050839 -0.6868529 0.4007715    NA
#> 5  5  five NA  0.12928774 -0.4456620 0.1106827  0.87
load_raw(file.path(workdir, "example_data_dup.csv"), c(2))
#>   ID    P1 T5          T1         T2        T3    T4
#> 1  1   one NA -0.56047565  1.7150650        NA    NA
#> 2  2   two NA -0.23017749  0.4609162 1.2240818  1.20
#> 3  3 three NA  1.55870831 -1.2650612 0.3598138 -0.50
#> 4  4  four NA  0.07050839 -0.6868529 0.4007715    NA
#> 5  5  five NA  0.12928774 -0.4456620 0.1106827  0.87
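Before moving on to missing-data handling, the missing-value shares annotated in the toy dataset (20 %, 40 % and 100 %) can be confirmed with plain base R. This is a sketch for inspection only, not a MetaPipe function; it rebuilds the toy data so it runs on its own.

```r
# Rebuild the toy dataset from this vignette (only the NA pattern matters here)
set.seed(123)
example_data <- data.frame(ID = c(1, 2, 3, 4, 5),
                           P1 = c("one", "two", "three", "four", "five"),
                           T1 = rnorm(5),
                           T2 = rnorm(5),
                           T3 = c(NA, rnorm(4)),
                           T4 = c(NA, 1.2, -0.5, NA, 0.87),
                           T5 = NA)

# Share of missing values per trait (columns 1 and 2 are ID and P1)
na_share <- colMeans(is.na(example_data[, -(1:2)]))
na_share
#>  T1  T2  T3  T4  T5
#> 0.0 0.0 0.2 0.4 1.0
```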

Next, see either Replace Missing Data [Optional] or Assess Normality.