A High-Performance Computing Pipeline for QTL Mapping of Large Ionomic and Metabolomic Datasets • MetaPipe

MetaPipe: A High-Performance Computing Pipeline for QTL Mapping of Large Ionomic and Metabolomic Datasets

Overview

The goal of MetaPipe is to provide an easy to use and powerful tool capable of performing QTL mapping analyses.

Installation

You can install the released version of MetaPipe from CRAN with:

install.packages("MetaPipe")

And the development version from GitHub with:

# install.packages(c("hexSticker", "kableExtra", "qpdf", "remotes")
remotes::install_github("villegar/MetaPipe", build_vignettes = TRUE)

Example

Load raw data

For details about the data structure and extended documentation, see the vignette Load Raw Data.

vignette("load-raw-data", package = "MetaPipe")

Function call

MetaPipe::load_raw(raw_data_filename = "FILE.CSV", excluded_columns = c(...))

where raw_data_filename is the filename containing the raw data, both absolute and relative paths are accepted. Next, the argument excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M).

# F1 Seedling Ionomics dataset
ionomics_path <- system.file("extdata", 
                             "ionomics.csv", 
                             package = "MetaPipe", 
                             mustWork = TRUE)
ionomics <- MetaPipe::load_raw(ionomics_path)
knitr::kable(ionomics[1:5, 1:8])

ID	SampleWeight	Ca44	K39	P31	Li7	B11	Na23
E_199	79	32675.79	6051.023	2679.338	0.1159068	23.32975	9.372606
E_209	81	28467.95	5642.651	2075.403	0.0104801	27.31206	8.787553
E_035	81	27901.35	7357.856	2632.343	0.0561879	16.87480	14.369062
E_197	79	27855.36	5225.275	1761.725	0.0104453	25.34740	11.009597
E_016	79	27377.40	6141.001	2145.715	0.0172996	24.64500	6.999958

Replace missing data

For extended documentation, see the vignette Replace Missing Data.

vignette("replace-missing-data", package = "MetaPipe")

Function call

MetaPipe::replace_missing(raw_data = example_data, 
                          excluded_columns = c(2), 
                          # Optional
                          out_prefix = "metapipe", 
                          prop_na = 0.5, 
                          replace_na = FALSE)

where raw_data is a data frame containing the raw data, as described in Load Raw Data and excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M). The other arguments are optional, out_prefix is the prefix for output files, prop_na is the proportion of NA values (used to drop traits), and replace_na is a logical flag to indicate whether or not NAs should be replace by half of the minimum value within each variable.

# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics, c(1, 2))
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2), 
                                          prop_na =  0.025)
#> The following trait was dropped because it has 2.5% or more missing values: 
#>  - Se78
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na =  TRUE)
knitr::kable(ionomics_rev[1:5, 1:8])

ID	SampleWeight	Ca44	K39	P31	Li7	B11	Na23
E_001	79	15894.22	5888.311	1743.118	0.0128699	18.66673	6.970224
E_002	93	13155.45	7013.400	2244.684	0.0119316	14.47693	5.866392
E_004	97	14182.51	7966.273	2311.057	0.0212316	14.71313	10.251955
E_005	82	22550.82	7514.089	2315.675	0.0233063	20.10630	11.773697
E_006	99	15982.76	7608.464	1995.193	0.0588128	12.97801	11.043837

Assess normality

For extended documentation, see the vignette Assess Normality.

vignette("assess-normality", package = "MetaPipe")

MetaPipe assesses the normality of variables (traits) by performing a Shapiro-Wilk test on the raw data (see Load Raw Data and Replace Missing Data. Based on whether or not the data approximates a normal distribution, an array of transformations will be computed, and the normality assessed one more time.

Function call

MetaPipe::assess_normality(raw_data = raw_data, 
                           excluded_columns = c(2, 3, ..., M), 
                           # Optional
                           cpus = 1, 
                           out_prefix = "metapipe", 
                           plots_dir = tempdir(), 
                           transf_vals = c(2, exp(1), 3, 4, 5, 6, 7, 8, 9, 10),
                           alpha = 0.05,
                           pareto_scaling = FALSE,
                           show_stats = TRUE)

where raw_data is a data frame containing the raw data, as described in Load Raw Data and excluded_columns is a vector containing the indices of the properties, e.g. c(2, 3, ..., M). The other arguments are optional, cpus is the number of cores to use, in other words, the number of concurrent traits to process, out_prefix is the prefix for output files, plots_dir is the output directory where the plots will be stored, transf_vals is a vector containing the transformation values to be used when transforming the original data, alpha is the significance level for the Wilk-Shapiro tests, pareto_scaling is a boolean flag to indicate whether or not to scale the traits to the same scale, and show_stats is a boolean flag to show or hide some general statistics of the normalisation process.

# F1 Seedling Ionomics dataset
data(ionomics) # Includes some missing data
ionomics_rev <- MetaPipe::replace_missing(ionomics, 
                                          excluded_columns = c(1, 2),
                                          replace_na =  TRUE)
ionomics_normalised <- 
  MetaPipe::assess_normality(ionomics_rev,
                             excluded_columns = c(1, 2),
                             transf_vals = c(2, exp(1)),
                             out_prefix = "README-ionomics",
                             plots_dir = "man/figures/",
                             pareto_scaling = FALSE)
#> Total traits (excluding all NAs traits):     21
#> Normal traits (without transformation):      2
#> Normal traits (transformed):                 4
#> Total normal traits:                         6
#> Total skewed traits:                         15
#> 
#> Transformations summary:
#>  f(x)      Value     # traits  
#>  log       2         3         
#>  root      e         1

# Extract normalised features
ionomics_norm <- ionomics_normalised$norm
ionomics_skew <- ionomics_normalised$skew

The function call to MetaPipe::assess_normality will print a summary of the transformations performed (if any), as well as an overview of the number of traits that should be considered normal and skewed. Next, we can preview some of the partial output of the normality assessment process:

# Normal traits
knitr::kable(ionomics_norm[1:5, ])

ID	Ca44	B11	Na23	Mg26	Rb85	Sr88
E_001	15894.22	4.222397	2.042740	10.77021	1.555742	7.347059
E_002	13155.45	3.855684	1.917202	10.54095	2.058711	6.890243
E_004	14182.51	3.879033	2.354263	10.51931	2.198422	9.025915
E_005	22550.82	4.329576	2.477233	11.13450	1.791578	15.292360
E_006	15982.76	3.697997	2.419593	11.72734	2.229866	13.901449

# Skewed traits (partial output)
knitr::kable(ionomics_skew[1:5, 1:8])

ID	K39	P31	Li7	Al27	S34	Fe54	Mn55
E_001	5888.311	1743.118	0.0128699	3.845879	1152.944	27.59340	54.53991
E_002	7013.400	2244.684	0.0119316	5.825639	1600.442	35.49159	52.57114
E_004	7966.273	2311.057	0.0212316	8.036047	1039.098	39.13434	36.66475
E_005	7514.089	2315.675	0.0233063	9.482051	1091.607	40.22041	43.24368
E_006	7608.464	1995.193	0.0588128	29.329605	1096.871	75.23614	53.64705

Among the transformed traits, we have B11 and Na23. Both of which seem to be skewed, but after a simple transformation, can be classify as normalised traits.

QTL mapping

Scan one QTL mapping

qtl_scone <- function(x_data, cpus = 1, ...)

where x_data

# F1 Seedling Ionomics dataset
data(father_riparia) # Genetic map
# Load cross file with genetic map and raw data for normal traits
x <- MetaPipe::read.cross(father_riparia, 
                          ionomics_norm,
                          genotypes = c("nn", "np", "--"))
#>  --Read the following data:
#>   166  individuals
#>   1115  markers
#>   7  phenotypes
#> Warning in summary.cross(cross): Some markers at the same position on chr
#> 1,4,5,7,8,9,10,12,14,15,16,17; use jittermap().
#>  --Cross type: f2
                          
set.seed(123)
x <- qtl::jittermap(x)
x <- qtl::calc.genoprob(x, step = 1, error.prob = 0.001)
x_scone <- MetaPipe::qtl_scone(x, 1, model = "normal", method = "hk")

MetaPipe

Overview

Installation

Example

Load raw data

Function call

Replace missing data

Function call

Assess normality

Function call

QTL mapping

Scan one QTL mapping

Links

License

Citation

Developers

Dev status