R package: blupADC - Overview

R package for animal and plant breeding

Documentation is available in two languages (English and Chinese).

OVERVIEW

blupADC is a useful and powerful tool for handling genomic and pedigree data in animal and plant breeding (traditional BLUP and genomic selection). The package was designed to cover most of the data analysis problems that arise in breeding, with calculation speed as a key concern. For speed, the core functions are written in C++ (via Rcpp and RcppArmadillo), and the package also supports parallel computation (via OpenMP) and big-data computation (via the bigmemory package).

blupADC provides functions for every step of an animal or plant breeding analysis, including pedigree analysis (tracing pedigrees, renaming pedigrees, and correcting pedigree errors), genotype data format conversion (Hapmap, Plink, BLUPF90, Numeric, VCF, and Haplotype formats are supported), genotype data quality control and imputation, construction of kinship matrices (pedigree-based, genomic, and single-step), and genetic evaluation (through an easy-to-use interface to two well-known breeding software packages, DMU and BLUPF90).

Finally, we also provide an easier way to use blupADC: a free website (Shiny app). Several of its functions are still under development, and its main limitation is that it cannot handle big data.

😊 Good Luck Charlie! If you have suggestions or questions, please contact: quanshun1994@gmail.com

👨‍💻 Citation

Quanshun Mei, Chuanke Fu, Jieling Li, Shuhong Zhao, and Tao Xiang. “blupADC: An R package and shiny toolkit for comprehensive genetic data analysis in animal and plant breeding.” bioRxiv (2021), doi: https://doi.org/10.1101/2021.09.09.459557

New features

1.0.3

  • Incorporate maternal effect, permanent effect, random regression effect, and social genetic effect models in the genetic evaluation by DMU (2021.8.24)

1.0.4

  • Incorporate haplotype format conversion, haplotype-based numeric matrix construction, and haplotype-based additive relationship matrix construction (2021.10.8)
  • Use bigmemory objects for matrix storage and calculation in order to handle big data (2021.10.8)

1.0.5

  • Incorporate format conversion from BLUPF90 and Numeric (0,1,2) format to Hapmap format (2021.11.5)
  • Support the LR method for evaluating prediction accuracy and Hotelling_test for testing the significance of differences between predictive abilities (via cross-validation)
  • Fix dEBV (2021.12.22)

1.0.6

  • Support running multiple tasks in DMU and BLUPF90 simultaneously! (2022.05.25)

1.1.0

  • Introduce object-oriented programming for running genomic prediction (2023.07.17) (see more details)
  • Move the example data and software into a separate R package, blupSUP, which users only need to install once
  • The R functions from previous versions of blupADC can still be used

GETTING STARTED

🙊 Installation

blupADC depends on the R packages Rcpp, RcppArmadillo, RcppProgress, data.table, bigmemory, and R6. Install these dependencies before installing blupADC.

install.packages(c("Rcpp", "RcppArmadillo","RcppProgress","data.table","bigmemory","R6"))
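If some of these packages are already present, it can be useful to see which ones are still missing before installing. The following is a minimal base-R sketch; the helper missing_packages is hypothetical and not part of blupADC.

```r
# Dependencies listed in the install.packages() call above
deps <- c("Rcpp", "RcppArmadillo", "RcppProgress", "data.table", "bigmemory", "R6")

# Hypothetical helper (not part of blupADC): return the packages in `pkgs`
# that are not installed. system.file() returns "" for an unknown package,
# so nothing is loaded as a side effect of the check.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!is_installed]
}

to_install <- missing_packages(deps)
if (length(to_install) == 0) {
  message("All dependencies are already installed.")
} else {
  message("Missing: ", paste(to_install, collapse = ", "),
          ". Install them with install.packages(to_install).")
}
```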

👉 Note: Analyses with DMU and BLUPF90 normally require downloading the DMU software (DMU download website) and the BLUPF90 software (BLUPF90 download website) beforehand. For convenience, we have encapsulated the basic modules of DMU and BLUPF90 in the blupADC package.

For commercial use of DMU and BLUPF90, users must contact the authors of DMU and BLUPF90!

For the latest version of blupADC, users must first install the blupSUP package (only once), which contains the example data and software (e.g. DMU and BLUPF90):

devtools::install_github("TXiang-lab/blupSUP")

Install blupADC via devtools

devtools::install_github("TXiang-lab/blupADC")

👉 Note: If the connection to GitHub is poor (for example, in China), blupADC can be installed from Gitee instead:

devtools::install_git("https://gitee.com/qsmei/blupADC")

⚠️ If errors such as "‘trimatl_ind’ was not declared in this scope" or "‘class arma::Mat’ has no member named ‘clean’" occur during installation, make sure that the installed version of RcppArmadillo is over 0.9.870.2.0.

After successful installation, the blupADC package can be loaded by typing:
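The RcppArmadillo version can be checked from R before installing blupADC. This is a small sketch using base R only; the helper is_new_enough is hypothetical, and "over" in the note is read here as strictly greater than.

```r
# Version threshold taken from the installation note
required <- package_version("0.9.870.2.0")

# Hypothetical helper: TRUE if `installed` is strictly newer than `required`
is_new_enough <- function(installed, required) {
  as.package_version(installed) > required
}

if (nzchar(system.file(package = "RcppArmadillo"))) {
  installed <- packageVersion("RcppArmadillo")
  if (is_new_enough(installed, required)) {
    message("RcppArmadillo ", installed, " is recent enough.")
  } else {
    message("RcppArmadillo ", installed, " is too old; please update it.")
  }
} else {
  message("RcppArmadillo is not installed.")
}
```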

library(blupADC)

Note: For relationship matrix construction, we highly recommend Microsoft R Open (many times faster than standard R).

🙊 Features

  • Feature 1. Genomic data format conversion
  • Feature 2. Genomic data quality control and genotype imputation
  • Feature 3. Breed composition analysis and duplication detection of genomic data
  • Feature 4. Pedigree tracing and analysis
  • Feature 5. Pedigree visualization
  • Feature 6. Relationship matrix construction (A, G, and H)
  • Feature 7. Genetic evaluation with DMU
  • Feature 8. Genetic evaluation with BLUPF90

Usage

blupADC provides several dataset objects, including data_hmp and origin_pedigree.

In addition, blupSUP provides several files, which are stored in ~/blupSUP/extdata. The path to these files can be obtained by typing:

system.file("extdata", package = "blupSUP") # path of provided files
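To see which example files are actually available, the directory can be listed. This sketch relies only on base R; system.file() returns an empty string when the package (or the directory) cannot be found, so the check below avoids listing the wrong location.

```r
# Locate the example files shipped with blupSUP
ext_path <- system.file("extdata", package = "blupSUP")

if (nzchar(ext_path)) {
  # Show the names of the provided example files
  print(list.files(ext_path))
} else {
  message("blupSUP is not installed, or it has no extdata directory.")
}
```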

Feature 1. Genomic data format conversion (see more details)

library(blupADC)
format_result <- geno_format(
    input_data_hmp = example_data_hmp,                    # provided data object
    output_data_type = c("Plink", "BLUPF90", "Numeric"), # output data formats
    output_data_path = getwd(),                          # output data path
    output_data_name = "blupADC",                        # output data name
    return_result = TRUE,                                # return the result to the R environment
    cpu_cores = 1                                        # number of CPU cores
)

# Convert phased VCF data to haplotype format and haplotype-based numeric format
library(blupADC)
data_path <- system.file("extdata", package = "blupSUP")  # path of example files
phased <- geno_format(
    input_data_path = data_path,                  # input data path
    input_data_name = "example.vcf",              # input data name (VCF file)
    input_data_type = "VCF",                      # input data type
    phased_genotype = TRUE,                       # whether the VCF data has been phased
    haplotype_window_nSNP = 5,                    # number of SNPs that define a haplotype block
    bigmemory_cal = TRUE,                         # convert via a bigmemory object
    bigmemory_data_path = getwd(),                # path of the bigmemory data
    bigmemory_data_name = "test_blupADC",         # name of the bigmemory data
    output_data_type = c("Haplotype", "Numeric"), # output data formats
    return_result = TRUE,                         # return the result to the R environment
    cpu_cores = 1                                 # number of CPU cores
)
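To illustrate what a Numeric (0,1,2) coding represents, the following is a minimal base-R sketch that counts the copies of one allele per SNP in a toy genotype table. The toy data, the helper count_allele, and the choice of which allele is counted are assumptions made for illustration; geno_format may code alleles differently.

```r
# Toy genotypes: rows = SNPs, columns = individuals (two-letter calls)
geno <- matrix(
  c("AA", "AG", "GG",
    "CC", "CT", "CC"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("snp1", "snp2"), c("ind1", "ind2", "ind3"))
)

# Allele whose copies are counted at each SNP (an assumption of this sketch)
counted_allele <- c(snp1 = "G", snp2 = "T")

# Hypothetical helper: number of copies (0, 1 or 2) of `allele` in each call
count_allele <- function(calls, allele) {
  first  <- substr(calls, 1, 1) == allele
  second <- substr(calls, 2, 2) == allele
  as.integer(first) + as.integer(second)
}

# Build the numeric matrix with individuals in rows and SNPs in columns
numeric_geno <- sapply(rownames(geno), function(snp) {
  count_allele(geno[snp, ], counted_allele[[snp]])
})
rownames(numeric_geno) <- colnames(geno)
print(numeric_geno)
```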

Feature 2. Genomic data quality control and genotype imputation (see more details)

library(blupADC)
geno_qc_impute(
    input_data_hmp = example_data_hmp,       # provided data object
    data_analysis_method = "QC_Imputation", # analysis type: QC + imputation
    output_data_path = getwd(),             # output data path
    output_data_name = "YY_data",           # output data name
    output_data_type = "VCF"                # output data format
)
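As a rough picture of what SNP-level quality control involves, the sketch below filters a toy 0/1/2 genotype matrix by call rate and minor allele frequency in base R. The thresholds (0.9 and 0.05) and the toy data are assumptions chosen for illustration only; they are not the settings used by geno_qc_impute, and the imputation step is not shown.

```r
# Toy genotype matrix: individuals in rows, SNPs in columns, NA = missing call
geno <- cbind(
  snp1 = c(0, 1, 2, 1),
  snp2 = c(0, 0, 0, 0),     # monomorphic SNP
  snp3 = c(1, NA, NA, NA),  # mostly missing SNP
  snp4 = c(2, 1, 1, 0)
)
rownames(geno) <- paste0("ind", 1:4)

# Proportion of non-missing calls per SNP
call_rate <- colMeans(!is.na(geno))

# Frequency of the counted allele and the minor allele frequency per SNP
freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(freq, 1 - freq)

# Keep SNPs that pass both (illustrative) thresholds
keep <- call_rate >= 0.9 & maf >= 0.05
geno_qc <- geno[, keep, drop = FALSE]
print(colnames(geno_qc))
```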

Feature 3. Breed composition analysis and duplication detection of genomic data (see more details)

library(blupADC)
check_result <- geno_check(
    input_data_hmp = example_PCA_data_hmp, # provided Hapmap data object
    duplication_check = FALSE,             # whether to check for duplicated genotypes
    breed_check = TRUE,                    # whether to check the breed records
    breed_record = example_PCA_Breed,      # provided breed records
    output_data_path = getwd(),            # output path
    return_result = TRUE                   # return the result to the R environment
)

Feature 4. Pedigree tracing and analysis (see more details)

library(blupADC)
pedigree_result <- trace_pedigree(
    input_pedigree = example_ped1, # provided pedigree data object
    trace_generation = 3,          # number of generations to trace
    output_pedigree_tree = TRUE    # whether to output the pedigree tree
)
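The idea behind tracing a pedigree for a fixed number of generations can be sketched in base R as follows. The toy pedigree, its column names, and the helper trace_ancestors are hypothetical and only illustrate the concept; they do not reproduce the internals or the output format of trace_pedigree.

```r
# Toy pedigree: NA marks an unknown parent
ped <- data.frame(
  id   = c("A", "B", "C", "D", "E"),
  sire = c(NA, NA, "A", "A", "C"),
  dam  = c(NA, NA, "B", "B", "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the pedigree rows of `ids` and of their
# ancestors, going back at most `generations` generations
trace_ancestors <- function(ped, ids, generations) {
  traced <- ids
  current <- ids
  for (g in seq_len(generations)) {
    # Parents of the individuals reached in the previous step
    parents <- unlist(ped[ped$id %in% current, c("sire", "dam")], use.names = FALSE)
    # Drop unknown parents and individuals that were already traced
    parents <- setdiff(parents[!is.na(parents)], traced)
    if (length(parents) == 0) break
    traced <- c(traced, parents)
    current <- parents
  }
  ped[ped$id %in% traced, ]
}

print(trace_ancestors(ped, "E", generations = 1))
```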

Feature 5. Pedigree visualization (see more details)

library(blupADC)
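ped_plot <- ggped(                 # named ped_plot to avoid masking base::plot
    input_pedigree = example_ped2, # provided pedigree data object
    trace_id = c("121"),           # individual(s) to plot
    trace_sibs = TRUE              # whether to plot the sibs of the selected individuals
)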

Feature 6. Relationship matrix construction (A, G, and H) (see more details)

library(blupADC)
data_path <- system.file("extdata", package = "blupSUP")  # path of example files
kinship_result <- cal_kinship(
    input_data_path = data_path,      # input data path
    input_data_name = "example.vcf",  # input data name (VCF file)
    input_data_type = "VCF",          # input data type
    kinship_type = c("G_A", "G_D"),   # types of kinship matrix
    dominance_type = c("genotypic"),  # type of dominance effect
    inbred_type = c("Homozygous"),    # type of inbreeding coefficients
    return_result = TRUE              # return the result to the R environment
)
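For readers unfamiliar with genomic relationship matrices, the sketch below builds an additive G matrix from a toy 0/1/2 genotype matrix in base R, using the widely used VanRaden (method 1) formula G = ZZ' / (2 * sum(p * (1 - p))). It is only an illustration under that assumption; the source does not state which formula cal_kinship uses for "G_A", and the toy data are made up.

```r
# Toy genotype matrix: individuals in rows, SNPs in columns (allele counts)
M <- rbind(
  ind1 = c(0, 1, 2, 1),
  ind2 = c(1, 1, 0, 2),
  ind3 = c(2, 0, 1, 1)
)

# Frequency of the counted allele at each SNP
p <- colMeans(M) / 2

# Center each SNP column by twice its allele frequency
Z <- sweep(M, 2, 2 * p)

# VanRaden method 1: scale ZZ' by the expected genotype variance summed over SNPs
G <- tcrossprod(Z) / (2 * sum(p * (1 - p)))
print(round(G, 3))
```

Because each column of Z is centered, the elements of G sum to zero in this toy example.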

Feature 7. Genetic evaluation with DMU (see more details)

library(blupADC)
data_path=system.file("extdata", package = "blupSUP")  #  path of example files 
  
run_DMU(
    phe_col_names = c("Id", "Mean", "Sex", "Herd_Year_Season", "Litter", "Trait1", "Trait2", "Age"), # column names of the phenotype file
    target_trait_name = list(c("Trait1")),                   # trait name
    fixed_effect_name = list(c("Sex", "Herd_Year_Season")),  # fixed effect names
    random_effect_name = list(c("Id", "Litter")),            # random effect names
    covariate_effect_name = NULL,                            # covariate effect names
    phe_path = data_path,                                    # path of the phenotype file
    phe_name = "phenotype.txt",                              # name of the phenotype file
    integer_n = 5,                                           # number of integer variables
    analysis_model = "PBLUP_A",                              # genetic evaluation model
    dmu_module = "dmuai",                                    # DMU module used to estimate variance components
    relationship_path = data_path,                           # path of the relationship file
    relationship_name = "pedigree.txt",                      # name of the relationship file
    output_result_path = getwd()                             # output path
)

Feature 8. Genetic evaluation with BLUPF90 (see more details)

library(blupADC)
data_path=system.file("extdata", package = "blupSUP")  #  path of example files 
  
run_BLUPF90(
    phe_col_names = c("Id", "Mean", "Sex", "Herd_Year_Season", "Litter", "Trait1", "Trait2", "Age"), # column names of the phenotype file
    target_trait_name = list(c("Trait1")),                   # trait name
    fixed_effect_name = list(c("Sex", "Herd_Year_Season")),  # fixed effect names
    random_effect_name = list(c("Id", "Litter")),            # random effect names
    covariate_effect_name = NULL,                            # covariate effect names
    phe_path = data_path,                                    # path of the phenotype file
    phe_name = "phenotype.txt",                              # name of the phenotype file
    analysis_model = "PBLUP_A",                              # genetic evaluation model
    relationship_path = data_path,                           # path of the relationship file
    relationship_name = "pedigree.txt",                      # name of the relationship file
    output_result_path = getwd()                             # output path
)

Quanshun Mei
Postdoctoral researcher

My research interests include applying genomic selection and machine learning in animal breeding.