R package:blupADC-Feature 3

Overview

👦 Breed composition analysis is a common problem in genomic data analysis. In package:blupADC, users can address it with the geno_check function. The same function can also detect duplicated genotypes in genomic data.

Example

Breed composition analysis

library(blupADC)
check_result=geno_check(
                  input_data_hmp=example_PCA_data_hmp,   # provided hapmap data object
                  duplication_check=FALSE,       # whether to check for duplicated genotypes
                  breed_check=TRUE,              # whether to check the breed record
                  breed_record=example_PCA_Breed,        # provided breed record
                  return_result=TRUE             # whether to return the result
                  )

Check duplication

library(blupADC)
check_result=geno_check(
                  input_data_hmp=example_data_hmp,   # provided hapmap data object
                  duplication_threshold=0.95, # threshold for declaring duplication
                  duplication_check=TRUE,    # whether to check for duplicated genotypes
                  breed_check=FALSE,         # whether to check the breed record
                  return_result=TRUE         # whether to return the result
                  )

Output

The output mainly contains two parts:

  • duplicated_genotype
IND1 IND1 1
IND2 IND2 1
IND3 IND3 1
IND4 IND4 1

The first and second columns give the names of the two individuals being compared; the third column gives the proportion of overlapping genotypes between them.
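As a minimal sketch of how such a three-column table could be inspected, the snippet below builds a toy data frame and keeps only pairs made of two different individuals. The column names Ind1, Ind2 and Overlap and the IND5/IND6 row are hypothetical (the output shown above has no header), so adjust them to whatever the returned object actually contains.

```r
# Toy table in the same three-column layout as duplicated_genotype.
# Column names and the IND5/IND6 row are hypothetical, for illustration only.
duplicated_genotype <- data.frame(
  Ind1    = c("IND1", "IND2", "IND3", "IND4", "IND5"),
  Ind2    = c("IND1", "IND2", "IND3", "IND4", "IND6"),
  Overlap = c(1, 1, 1, 1, 0.98),
  stringsAsFactors = FALSE
)

# Drop rows that compare an individual with itself
true_pairs <- duplicated_genotype[duplicated_genotype$Ind1 != duplicated_genotype$Ind2, ]
print(true_pairs)
```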

  • pca_outlier
Id Breed Expeced_Breed
IND100 LL YY
IND233 DD YY
IND91 LL YY
IND92 LL YY
IND93 LL YY
IND94 LL YY
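To give an intuition for how a table like pca_outlier could arise, here is a rough base-R sketch: individuals are projected onto the first two principal components, each recorded breed gets a centroid, and an individual is flagged when its nearest centroid belongs to a different breed. This is an illustrative assumption, not the algorithm used by geno_check; the genotypes, the make_ind helper and the nearest-centroid rule are all hypothetical.

```r
# Toy genotypes coded 0/1/2 (illustrative data, not from the package)
n_snp <- 20
make_ind <- function(breed, k) {
  # Hypothetical breed pattern: YY carries 2 on the first half of SNPs, LL on the second half
  g <- if (breed == "YY") rep(c(2, 0), each = n_snp / 2) else rep(c(0, 2), each = n_snp / 2)
  g[k] <- 1  # one heterozygous SNP so that individuals are not identical
  g
}

ids <- paste0("IND", 1:10)
true_breed <- rep(c("YY", "LL"), each = 5)
geno <- t(mapply(make_ind, true_breed, 1:10))
rownames(geno) <- ids

# Recorded breeds, with IND10 deliberately mislabelled as YY
recorded <- true_breed
recorded[10] <- "YY"

# Project onto the first two principal components
pcs <- prcomp(geno)$x[, 1:2]

# Centroid of each recorded breed in PC space (rows: breeds, columns: PC1, PC2)
centroids <- apply(pcs, 2, function(pc) tapply(pc, recorded, mean))

# Breed whose centroid is closest to each individual
nearest <- apply(pcs, 1, function(p) {
  d <- rowSums(sweep(centroids, 2, p)^2)
  names(which.min(d))
})

# Individuals whose recorded breed disagrees with the nearest centroid
pca_outlier <- data.frame(
  Id = ids,
  Breed = recorded,
  Expeced_Breed = unname(nearest),  # column name spelled as in the output above
  stringsAsFactors = FALSE
)[recorded != nearest, ]
print(pca_outlier)
```

In this toy data only the mislabelled individual is flagged, because the breed difference dominates the first principal component.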

Figure A shows the PCA result before correcting the breed record; Figure B shows the PCA result after correcting the breed record.

PCA_blupADC
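If one chose to accept the suggested breeds, the correction could be applied to a breed record along the following lines. This is a sketch only: the breed_record and pca_outlier objects here are small hand-made data frames in the layouts shown on this page, and whether geno_check applies such a correction itself is not assumed.

```r
# Hand-made breed record (Id, Breed); illustrative values only
breed_record <- data.frame(
  Id    = c("IND1", "IND91", "IND92", "IND100", "IND233"),
  Breed = c("YY", "LL", "LL", "LL", "DD"),
  stringsAsFactors = FALSE
)

# Hand-made outlier table in the layout of pca_outlier
pca_outlier <- data.frame(
  Id            = c("IND100", "IND233", "IND91", "IND92"),
  Breed         = c("LL", "DD", "LL", "LL"),
  Expeced_Breed = c("YY", "YY", "YY", "YY"),
  stringsAsFactors = FALSE
)

# Locate each outlier in the breed record and overwrite its breed
idx <- match(pca_outlier$Id, breed_record$Id)
found <- !is.na(idx)  # skip outliers that are absent from the record
corrected_record <- breed_record
corrected_record$Breed[idx[found]] <- pca_outlier$Expeced_Breed[found]
print(corrected_record)
```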

Parameter

Many parameters in genotype_data_overlap are the same as in the genotype_data_format_conversion function (see its documentation for more details).

Thus, only the parameters specific to the genotype_data_overlap function are introduced here.

  • 1:selected_snps

Number of SNPs used to detect overlap, numeric class. Default is 1000.

  • 2:overlap_threshold

Threshold for declaring duplicated genotypes, numeric class. Default is 0.95.

  • 3:duplication_check

Whether to check for duplicated genotypes, logical class. Default is TRUE.

  • 4:breed_check

Whether to check the breed record of genotyped individuals, logical class. Default is FALSE.

  • 5:ind_breed

Breed record of individuals, data.frame class.

The format of ind_breed is as follows:

Id Breed
IND1 YY
IND2 YY
IND3 YY
IND4 YY
IND5 YY
IND6 YY
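A data.frame in this two-column layout can be built in several ways; the sketch below reads it from whitespace-delimited text with read.table, which would work the same way for a file on disk. Only the construction of the object is shown, using the example values above.

```r
# Build a breed record with the columns Id and Breed from whitespace-delimited text
ind_breed <- read.table(
  text = "Id Breed
IND1 YY
IND2 YY
IND3 YY
IND4 YY
IND5 YY
IND6 YY",
  header = TRUE,
  stringsAsFactors = FALSE
)
str(ind_breed)
```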

Regarding overlap_threshold: when the proportion of overlapping genotypes between two individuals is larger than this threshold, the two individuals are regarded as the same individual.
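The thresholding rule can be illustrated with a small base-R sketch. It assumes that overlap is measured as the share of SNPs at which two individuals carry the same genotype; the toy genotype matrix and the column names of the result are hypothetical, and the package may compute overlap differently (for example on a subset of SNPs, as selected_snps suggests).

```r
# Toy genotypes coded 0/1/2; rows are individuals, columns are SNPs (illustrative data)
geno <- rbind(
  IND1 = c(0, 1, 2, 2, 0, 1, 1, 0, 2, 1),
  IND2 = c(0, 1, 2, 2, 0, 1, 1, 0, 2, 1),  # identical to IND1
  IND3 = c(2, 1, 0, 2, 1, 1, 0, 0, 2, 2)
)
overlap_threshold <- 0.95

# All unordered pairs of individuals
pairs <- t(combn(rownames(geno), 2))

# Share of SNPs with identical genotypes for each pair
overlap <- apply(pairs, 1, function(p) mean(geno[p[1], ] == geno[p[2], ]))

result <- data.frame(Ind1 = pairs[, 1], Ind2 = pairs[, 2], Overlap = overlap,
                     stringsAsFactors = FALSE)

# Pairs above the threshold are treated as the same individual
duplicated_pairs <- result[result$Overlap > overlap_threshold, ]
print(duplicated_pairs)
```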

Quanshun Mei