R package: blupADC - Overview
R package for animal and plant breeding
Documentation is available in two languages (English and Chinese).
OVERVIEW
blupADC is a useful and powerful tool for handling genomic and pedigree data in animal and plant breeding (traditional BLUP and genomic selection). The package was designed to cover most data analysis problems encountered in breeding, with calculation speed as a key consideration. For speed, the core functions are written in C++ (via Rcpp and RcppArmadillo), and the package also supports parallel computation (through OpenMP) and big-data computation (through the bigmemory package).
blupADC provides functions for every step of animal and plant breeding analysis, including pedigree analysis (tracing and renaming pedigrees, and correcting pedigree errors), genotype data format conversion (Hapmap, Plink, BLUPF90, Numeric, VCF and Haplotype formats), genotype data quality control and imputation, construction of kinship matrices (pedigree, genomic and single-step), and genetic evaluation (through easy-to-use interfaces to two well-known breeding programs, DMU and BLUPF90).
Finally, we provide an easier way to use blupADC: a free website (Shiny app). Several of its functions are still under development, and the website's main limitation is that it cannot handle big data.
😊 Good Luck Charlie! If you have suggestions or questions, please contact: quanshun1994@gmail.com
👨💻 Citation
Quanshun Mei, Chuanke Fu, Jieling Li, Shuhong Zhao, and Tao Xiang. “blupADC: An R package and shiny toolkit for comprehensive genetic data analysis in animal and plant breeding.” bioRxiv (2021), doi: https://doi.org/10.1101/2021.09.09.459557
New features
1.0.3
- Incorporate maternal effect, permanent effect, random regression effect, and social genetic effect models in the genetic evaluation by DMU (2021.8.24)
1.0.4
- Incorporate haplotype format conversion, haplotype-based numeric matrix construction and haplotype-based additive relationship matrix construction (2021.10.8)
- Use bigmemory objects for matrix storage and calculation to handle big data (2021.10.8)
1.0.5
- Incorporate format conversion from BLUPF90 and numeric (0, 1, 2) format to Hapmap format (2021.11.5)
- Support the LR method for evaluating prediction accuracy and Hotelling_test for testing the significance of differences between predictive abilities (by cross-validation)
- Fix dEBV (2021.12.22)
1.0.6
- Support running multiple tasks in DMU and BLUPF90 simultaneously! (2022.05.25)
1.1.0
- Introduce object-oriented programming in running Genomic Prediction (2023.07.17) (see more details)
- Move the example data and software into a separate R package, blupSUP; users need to install this package only once!
- Users can still use the R functions from previous versions of blupADC!
GETTING STARTED
🙊Installation
blupADC links to the R packages Rcpp, RcppArmadillo, data.table and bigmemory. These dependencies should be installed before installing blupADC.
install.packages(c("Rcpp", "RcppArmadillo","RcppProgress","data.table","bigmemory","R6"))
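If some of these dependencies are already present, reinstalling all of them is unnecessary. The following is a minimal sketch, not part of blupADC, that reports which of the packages listed above are missing and installs only those. The helper name `find_missing` is hypothetical and introduced purely for illustration.

```r
# Hypothetical helper (not provided by blupADC): return the packages
# from `pkgs` that cannot be loaded in the current R library paths.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependency list copied from the install.packages() call above
deps <- c("Rcpp", "RcppArmadillo", "RcppProgress",
          "data.table", "bigmemory", "R6")

missing_deps <- find_missing(deps)
print(missing_deps)

# Install only when running interactively, so that sourcing this
# script in batch mode merely reports what is missing.
if (interactive() && length(missing_deps) > 0) {
  install.packages(missing_deps)
}
```

The `interactive()` guard is a design choice for this sketch; remove it if the script should also install packages in non-interactive sessions.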
👉 Note: To run analyses with DMU and BLUPF90, the DMU software (DMU download website) and the BLUPF90 software (BLUPF90 download website) normally need to be downloaded beforehand. For convenience, we have bundled the basic modules of DMU and BLUPF90 in the blupADC package.
For commercial use of DMU and BLUPF90, users must contact the authors of DMU and BLUPF90!
For the latest version of blupADC, users must first install the blupSUP package (only once), which contains the example data and software (e.g. DMU and BLUPF90).
devtools::install_github("TXiang-lab/blupSUP")
Install blupADC via devtools
devtools::install_github("TXiang-lab/blupADC")
👉 Note: If the connection to GitHub is poor (for example, in China), users can install from the following mirror instead:
devtools::install_git("https://gitee.com/qsmei/blupADC")
⚠️During installation, if there are some errors like that: ‘trimatl_ind’ was not declared in this scope, ‘class arma::Mat
After successful installation, the blupADC package can be loaded by typing
library(blupADC)
Note: For relationship matrix construction, we highly recommend Microsoft R Open (many times faster than standard R).
🙊Features
- Feature 1. Genomic data format conversion
- Feature 2. Genomic data quality control and genotype imputation
- Feature 3. Breed composition analysis and duplication detection of genomic data
- Feature 4. Pedigree tracing and analysis
- Feature 5. Pedigree visualization
- Feature 6. Relationship matrix construction(A,G, and H)
- Feature 7. Genetic evaluation with DMU
- Feature 8. Genetic evaluation with BLUPF90
Usage
blupADC provides several dataset objects, including data_hmp and origin_pedigree.
In addition, blupSUP provides several files, which are saved in ~/blupSUP/extdata. We can get the path of these files by typing
system.file("extdata", package = "blupSUP") # path of provided files
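As a small sketch of how this path might be used, the snippet below lists the bundled files. `system.file()` returns an empty string when the package is not installed, so the result is checked first. The variable name `extdata_path` is chosen here for illustration only.

```r
# Locate the example files shipped with blupSUP.
# system.file() returns "" if the package (or directory) is not found.
extdata_path <- system.file("extdata", package = "blupSUP")

if (nzchar(extdata_path)) {
  # Show the names of the provided example files
  print(list.files(extdata_path))
} else {
  message("blupSUP does not appear to be installed; no example files found.")
}
```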
Feature 1. Genomic data format conversion (see more details)
library(blupADC)
format_result=geno_format(
input_data_hmp=example_data_hmp, # provided data variable
output_data_type=c("Plink","BLUPF90","Numeric"),# output data format
output_data_path=getwd(), #output data path
output_data_name="blupADC", #output data name
return_result = TRUE, #save result in R environment
cpu_cores=1 # number of cpu
)
#convert phased VCF data to haplotype format and haplotype-based numeric format
library(blupADC)
data_path=system.file("extdata", package = "blupSUP") # path of example files
phased=geno_format(
input_data_path=data_path, # input data path
input_data_name="example.vcf", # input data name,for vcf data
input_data_type="VCF", # input data type
phased_genotype=TRUE, # whether the vcf data has been phased
haplotype_window_nSNP=5, # define haplotype blocks by number of SNPs
bigmemory_cal=TRUE, # format conversion via bigmemory object
bigmemory_data_path=getwd(), # path of bigmemory data
bigmemory_data_name="test_blupADC", #name of bigmemory data
output_data_type=c("Haplotype","Numeric"),# output data format
return_result=TRUE, #save result in R environment
cpu_cores=1 # number of cpu
)
Feature 2. Genomic data quality control and genotype imputation (see more details)
library(blupADC)
geno_qc_impute(
input_data_hmp=example_data_hmp, #provided data variable
data_analysis_method="QC_Imputation", #analysis method type, QC + imputation
output_data_path=getwd(), #output data path
output_data_name="YY_data", #output data name
output_data_type="VCF" #output data format
)
Feature 3. Breed composition analysis and duplication detection of genomic data (see more details)
library(blupADC)
check_result=geno_check(
input_data_hmp=example_PCA_data_hmp, #provided hapmap data object
duplication_check=FALSE, #whether check the duplication of genotype
breed_check=TRUE, # whether check the record of breed
breed_record=example_PCA_Breed, # provided breed record
output_data_path=getwd(), #output path
return_result=TRUE #save result as an R environment variable
)
Feature 4. Pedigree tracing and analysis (see more details)
library(blupADC)
pedigree_result=trace_pedigree(
input_pedigree=example_ped1, #provided pedigree data variable
trace_generation=3, # trace generation
output_pedigree_tree=TRUE # output pedigree tree
)
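To make the idea of tracing a pedigree over a fixed number of generations concrete, here is a minimal base-R sketch. It is a generic illustration and not the algorithm used inside trace_pedigree; the toy pedigree, its column names (`id`, `sire`, `dam`) and the function `trace_ancestors` are all assumptions made for this example.

```r
# Toy pedigree (made-up data): founders A and B, their offspring C and D,
# and E, the offspring of C and D. NA marks an unknown parent.
ped <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  sire = c(NA,  NA,  "A", "A", "C"),
  dam  = c(NA,  NA,  "B", "B", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect all ancestors of `ids`,
# going back at most `generations` generations.
trace_ancestors <- function(ped, ids, generations) {
  found <- character(0)
  current <- ids
  for (g in seq_len(generations)) {
    rows <- ped[ped$id %in% current, ]
    parents <- unique(c(rows$sire, rows$dam))
    parents <- parents[!is.na(parents)]
    if (length(parents) == 0) break   # reached the founders
    found <- union(found, parents)
    current <- parents                # step back one generation
  }
  found
}

print(trace_ancestors(ped, "E", generations = 1))  # parents only
print(trace_ancestors(ped, "E", generations = 2))  # parents and grandparents
```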
Feature 5. Pedigree visualization (see more details)
library(blupADC)
plot=ggped(
input_pedigree=example_ped2,
trace_id=c("121"),
trace_sibs=TRUE #whether plot the sibs of subset-id
)
Feature 6. Relationship matrix construction(A,G, and H) (see more details)
library(blupADC)
data_path=system.file("extdata", package = "blupSUP") # path of example files
kinship_result=cal_kinship(
input_data_path=data_path, # input data path
input_data_name="example.vcf", # input data name,for vcf data
input_data_type="VCF", # input data type
kinship_type=c("G_A","G_D"), #type of kinship matrix
dominance_type=c("genotypic"), #type of dominance effect
inbred_type=c("Homozygous"), #type of inbreeding coefficients
return_result=TRUE) #save result as an R environment variable
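For readers unfamiliar with what an additive genomic relationship matrix is, the sketch below computes one by hand in base R using a widely used formulation (VanRaden's first method), G = ZZ' / (2 Σ p(1 - p)), where Z is the 0/1/2 genotype matrix centered by twice the allele frequency. This is an illustration under that assumption only; the source does not state which formula cal_kinship uses internally, and the genotype values below are made up.

```r
# Toy genotype matrix (made-up values): 4 individuals (rows) x 5 SNPs
# (columns), coded as the number of copies of one allele (0, 1 or 2).
M <- matrix(c(0, 1, 2, 1, 0,
              1, 1, 0, 2, 1,
              2, 0, 1, 1, 2,
              0, 2, 1, 0, 1),
            nrow = 4, byrow = TRUE)

# Allele frequency of the counted allele at each SNP
p <- colMeans(M) / 2

# Center each SNP column by its expected value 2p
Z <- sweep(M, 2, 2 * p)

# Additive genomic relationship matrix (VanRaden method 1)
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))

print(round(G, 3))
```

Because each column of Z is centered on its own sample mean here, the entries of G sum to zero (up to floating-point error); real analyses often use allele frequencies from a base population instead.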
Feature 7. Genetic evaluation with DMU (see more details)
library(blupADC)
data_path=system.file("extdata", package = "blupSUP") # path of example files
run_DMU(
phe_col_names=c("Id","Mean","Sex","Herd_Year_Season","Litter","Trait1","Trait2","Age"), # colnames of phenotype
target_trait_name=list(c("Trait1")), #trait name
fixed_effect_name=list(c("Sex","Herd_Year_Season")), #fixed effect name
random_effect_name=list(c("Id","Litter")), #random effect name
covariate_effect_name=NULL, #covariate effect name
phe_path=data_path, #path of phenotype file
phe_name="phenotype.txt", #name of phenotype file
integer_n=5, #number of integer variables
analysis_model="PBLUP_A", #model of genetic evaluation
dmu_module="dmuai", #module for estimating variance components
relationship_path=data_path, #path of relationship file
relationship_name="pedigree.txt", #name of relationship file
output_result_path=getwd() # output path
)
Feature 8. Genetic evaluation with BLUPF90 (see more details)
library(blupADC)
data_path=system.file("extdata", package = "blupSUP") # path of example files
run_BLUPF90(
phe_col_names=c("Id","Mean","Sex","Herd_Year_Season","Litter","Trait1","Trait2","Age"), # colnames of phenotype
target_trait_name=list(c("Trait1")), #trait name
fixed_effect_name=list(c("Sex","Herd_Year_Season")), #fixed effect name
random_effect_name=list(c("Id","Litter")), #random effect name
covariate_effect_name=NULL, #covariate effect name
phe_path=data_path, #path of phenotype file
phe_name="phenotype.txt", #name of phenotype file
analysis_model="PBLUP_A", #model of genetic evaluation
relationship_path=data_path, #path of relationship file
relationship_name="pedigree.txt", #name of relationship file
output_result_path=getwd() # output path
)