R package:blupADC-Feature 3
Table of Contents
Overview
👦 Breed composition analysis is usually a problem in data analysis. In package:blupADC, user can solve this problem by applying geno_check function. In addition, user can detect the duplication of genomic data easily by applying geno_check function.
Example
Breed composition analysis
library(blupADC)
check_result=geno_check(
input_data_hmp=example_PCA_data_hmp, #provided hapmap data object
duplication_check=FALSE, #whether check the duplication of genotype
breed_check=TRUE, # whether check the record of breed
breed_record=example_PCA_Breed, # provided breed record
return_result=TRUE #return result
)
Check duplication
library(blupADC)
check_result=geno_check(
input_data_hmp=example_data_hmp, #provided hapmap data object
duplication_threshold=0.95, #threshold of duplication
duplication_check=TRUE, #whether check the duplication of genotype
breed_check=FALSE, # whether check the record of breed
return_result=TRUE #return result
)
Output
The result of output mainly contains two parts, including:
- duplicated_genotype
| IND1 | IND1 | 1 |
|---|---|---|
| IND2 | IND2 | 1 |
| IND3 | IND3 | 1 |
| IND4 | IND4 | 1 |
The first and the second column is the name of individual, the third column is the percentage of overlap.
- pca_outlier
| Id | Breed | Expeced_Breed |
|---|---|---|
| IND100 | LL | YY |
| IND233 | DD | YY |
| IND91 | LL | YY |
| IND92 | LL | YY |
| IND93 | LL | YY |
| IND94 | LL | YY |
Figure A is the PCA result before correcting breed record , Figure B is the PCA result after correcting breed correcting record

Parameter
Many parameters in genotype_data_overlap are the same as in genotype_data_format_conversion function (see more details).
Thus, we will introduce specific parameters in genotype_data_overlap function.
- 1:selected_snps
Number of SNPs in detecting overlap, numeric class. Default is 1000.
- 2:overlap_threshold
Threshold of duplicate genotype, numeric class. Default is 0.95.
- 3:duplication_check
Whether check duplication of genotype, logical class. Default is TRUE.
- 4:breed_check
Whether check breed record of genotype, logical class. Default is FALSE.
- 5:ind_breed
Breed record of individuals, data.frame class.
The format of ind_breed is showing as follow:
| Id | Breed |
|---|---|
| IND1 | YY |
| IND2 | YY |
| IND3 | YY |
| IND4 | YY |
| IND5 | YY |
| IND6 | YY |
When the proportion of genotype data between two individuals is larger than this threshold, these two individuals will be regarded as the same individual.