Hailiang Song,Qin Zhang and Xiangdong Ding*
Abstract Background:Different production systems and climates could lead to genotype-by-environment(G×E)interactions between populations,and the inclusion of G×E interactions is becoming essential in breeding decisions.The objective of this study was to investigate the performance of multi-trait models in genomic prediction in a limited number of environments with G×E interactions.Results:In total,2,688 and 1,384 individuals with growth and reproduction phenotypes,respectively,from two Yorkshire pig populations with similar genetic backgrounds were genotyped with the PorcineSNP80 panel.Single-and multi-trait models with genomic best linear unbiased prediction(GBLUP)and BayesCπwere implemented to investigate their genomic prediction abilities with 20 replicates of five-fold cross-validation.Our results regarding between-environment genetic correlations of growth and reproductive traits(ranging from 0.618 to 0.723)indicated the existence of G×E interactions between these two Yorkshire pig populations.For single-trait models,genomic prediction with GBLUP was only 1.1%more accurate on average in the combined population than in single populations,and no significant improvements were obtained by BayesCπfor most traits.In addition,single-trait models with either GBLUP or BayesCπproduced greater bias for the combined population than for single populations.However,multi-trait models with GBLUP and BayesCπbetter accommodated G×E interactions,yielding 2.2%-3.8%and 1.0%-2.5%higher prediction accuracies for growth and reproductive traits,respectively,compared to those for single-trait models of single populations and the combined population.The multi-trait models also yielded lower bias and larger gains in the case of a small reference population.The smaller improvement in prediction accuracy and larger bias obtained by the single-trait models in the combined population was mainly due to the low consistency of linkage disequilibrium between the two populations,which also caused the BayesCπmethod to always produce the largest standard error in marker effect estimation for the combined population.Conclusions:In conclusion,our findings confirmed that directly combining populations to enlarge the reference population is not efficient in improving the accuracy of genomic prediction in the presence of G×E interactions,while multi-trait models perform better in a limited number of environments with G×E interactions.
Keywords:Combined population,Genotype-by-environment interaction,Linkage disequilibrium,Multi-trait model,Pig
Genomic selection[1],which relies on linkage disequilibrium(LD)between single nucleotide polymorphisms(SNPs)and causative variants,has become a useful tool in animal breeding[2]and plant breeding[3].Generally,the accuracy of genomic selection increases as the number of animals in the reference population increases[4].For small reference populations,combining populations of the same breed or related breeds reportedly increases the accuracy of genomic selection in cattle and pig[2,5-8].However,genomic prediction using combined populations has not shown a clear advantage over genomic prediction using single populations,possibly because the LD between SNPs and causative variants is not sufficiently consistent between populations[5-7].The failure to consider genotype-by-environment(G×E)interactions between populations is another important reason.Thus,exploiting G×E interactions in combined populations could be an attractive and meaningful approach for increasing the accuracy of genomic prediction.
A G×E interaction is defined as different genotypes reacting unequally to environmental changes[9],and ignoring possible G×E interactions could lead to a reduction in genetic gains.Two models are widely used to detect G×E interactions.One is a multi-trait model,which assumes that the phenotypic expressions of a trait in various environments are different traits[10].In this case,the genetic correlation between traits in different environments is used as the indicator of a G×E interaction[9,11].The other model is the reaction norm model[12],which models the trajectory of animal performance as a function of an environmental gradient.However,in a small number of environments,e.g.,two or three environments,the reaction norm model captures only part of the G×E interaction due to the limited amount of environmental variation.Furthermore,it is difficult to define a suitable environmental covariate in the reaction norm model[12,13].
With the development of genomic selection in pig breeding,more farms could be included to enlarge the reference population size in China,and joint genomic evaluation across countries or breeding organizations is expected.In dairy cattle,G×E interactions are usually ignored in combined-population genomic prediction;for example,Zhou et al.[14]reported that the consistency of LD was very high(0.97)between Chinese and Nordic Holsteins,indicating a high level of genetic similarity between the two populations.Therefore,it is necessary to consider the influence of G×E interactions,as the consistency of LD between pig populations is relatively low compared to that in dairy cattle.The objectives of this study were to evaluate G×E interactions between two Yorkshire pig populations with similar genetic backgrounds and to investigate the performance of multitrait models in genomic prediction in a limited number of environments with G×E interactions.
The whole procedure for collecting pig blood samples was carried out in strict accordance with the protocol approved by the Animal Welfare Committee of China Agricultural University(Permit Number:DK996).
Yorkshire pig populations were sampled from two breeding farms in the north(Beijing)and south(Fujian Province)of China,designated Beijing and Fujian,respectively,for convenience.The pigs in the Beijing and Fujian populations are American Yorkshire progeny with the same genetic background but no pedigree links.Phenotypic records of reproductive traits,namely,the number of piglets born alive(NBA)and the total number of piglets born(TNB),and growth traits,namely,days to 100 kg(AGE)and backfat thickness at 100 kg(BFT),were examined.The measurements of AGE and BFT were reported in Song et al.[7].The phenotypic information is presented in Table 1.For the growth traits,phenotypic information from 28,827 and 13,860 pigs born in 2008-2017 and 2007-2018,respectively,in the Beijing and Fujian populations was available.For the reproductive traits,the farrow records of 5,968 and 3,115 pigs born in 2007-2017 and 2007-2018,respectively,in Beijing and Fujian were obtained.
In pig genomic prediction analyses,corrected phenotypes derived from pedigree-based estimated breeding values(EBVs)are usually used as response variables.Conventional EBVs and genetic parameters were estimated based on a repeatability model of reproductive traits implemented separately in each population.In themodel,the fixed effects included herd,year and season,and the random effects included the additive genetic,random residual and permanent environmental effects.For the growth traits,a bivariate animal model was implemented,and the fixed effects included herd,year,season and sex.In addition to the additive genetic effect of each individual and the random residual effect,litter effect was also taken into account as a random effect.A total of 31,529 and 32,175 animals were traced back to construct a pedigree relationship matrix for the Beijing and Fujian populations,respectively.Afterwards,corrected phenotypic values(yc)for reproductive traits were calculated as the EBV plus the average of estimated residuals over parities for a sow,andycvalues for growth traits were computed as the EBV plus the estimated residual for each individual in each population.EBVs and genetic parameters were computed using the DMUAI procedure in DMU software[15].
Table 1 Summary statistics for the two Yorkshire pig populations,including the numbers of genotyped animals and estimates of heritability(h2)
Genomic DNA was extracted from blood samples using a TIANamp Blood DNA Kit DP348(Tiangen,Beijing,China).Genotyping was performed using the PorcineSNP80 BeadChip(Illumina,San Diego,CA,USA),which includes 68,528 SNPs distributed across the entire pig genome.The number of genotyped animals for each trait is presented in Table 1.
Missing genotypes for SNPs with known chromosomal positions were imputed by BEAGLE with default parameter settings[16],and those for SNPs with unknown positions were discarded.Based on the imputed dataset,SNPs were excluded from the analysis if the minor allele frequency was less than 0.05,the call frequency score was less than 0.90,or the genotype frequencies deviated from Hardy-Weinberg equilibrium with aP-value lower than 10-7.The animals with a call rate less than 0.90 or with an EBV reliability less than 0.3 were removed.After quality control,2,688 and 1,384 genotyped individuals remained for the growth and reproductive traits,respectively,and 56,445 SNPs were ultimately used.
Principal component analysis(PCA)was performed on the G matrix using GCTA software[17].This resulted in a matrix of eigenvectors in descending order that represented principal components(PCs),where PC1 had the largest eigenvalue.The overall structuring of genetic variation was visualized in a scatterplot of the top few PCs.
LD between a pair of SNPs was measured as[18],andwas calculated as follows:
wheref(AB),f(A),f(a),f(B)andf(b)are the observed frequencies of haplotype AB and alleles A,a,B and b,respectively.The consistency of LD in the two populations was measured by the correlation ofrLDvalues of adjacent marker pairs on each autosome between the two populations.
Four methods,namely,single-trait genomic BLUP(GBLUP),multi-trait GBLUP,single-trait BayesCπand multi-trait BayesCπ,were implemented to predict the genomic EBV(GEBV)for each genotyped individual.
Single-trait GBLUP(ST-GBLUP)model
The ST-GBLUP[19]model was used to predict the GEBVs of all genotyped individuals.
where y is the vector of corrected phenotypic values;u is the overall mean;1 is a vector of 1;and a is the vector of genomic breeding values,following a normal distribution N(0,Gσa2),in whichσa2is the variance of the addictive genetic effect and G is the marker-based genomic relationship matrix[19].e is the vector of random errors,following a normal distribution N(0,Iσe2),in which σe2is the residual variance.
Multi-trait GBLUP(MT-GBLUP)model
The MT-GBLUP model was defined as.
Single-trait BayesCπ(ST-BayesCπ)model
In ST-BayesCπ[20],marker effects on phenotypic traits were sampled from a mixture of null and normal distributions,
where yiis a vector of phenotypes,μis a vector of overall means,mijis the genotype covariate at locusjfor individuali(coded as 0,1 and 2),pis the number of genotyped loci,αjis a vector of allele substitution effects,and eiis a vector of random residuals for individualiwith a distributionThe markers in the model shared a common variance.The prior for the allele substitution effects of each molecular markerαjdepends on the varianceand the probabilityπthat markers do not have a genetic effect.
Multi-trait BayesCπ(MT-BayesCπ)model
In MT-BayesCπ,where each locus can have an effect on any combination of traits[21],the prior forαjk,the allele substitution or marker effect on traitkfor locusj,is a mixture with a point mass at zero and univariate normal distribution conditional on
wheremij,jandpare the same as in ST-BayesCπ.yiis a vector of phenotypes ofktraits for individuali(corrected phenotypic values of the same trait in different populations),μis a vector of overall means forktraits,andαjis a vector of marker allele substitution effects onktraits for locusj.The residuals,ei,are a priori assumed to be independently and identically distributed multivariate normal vectors with a null mean and covariance matrix R(R is the same as described for the multi-trait GBLUP model),which,in turn,is assumed to have an inverse Wishart prior distribution,The covariance between effects for traitkandk′at the same locus,i.e.,αjkandαjk′is
Imputation of missing phenotypic data was implemented in each Markov chain Monte Carlo(MCMC)iteration in MT-BayesCπ.
For the ST-BayesCπand MT-BayesCπmethods,the MCMC chain was run for 20,000 cycles,and the first 10,000 cycles were discarded as burn-in.Every 10thsample of the remaining 10,000 iterations was saved to estimate SNP effects and the variance components.The Julia package JWAS was used for BayesCπanalyses[22],and the DMUAI procedure,implemented in DMU software,was used for GBLUP analyses.
In this study,the accuracy and unbiasedness of prediction were obtained through 5-fold cross-validation(CV)for the Beijing and Fujian populations,where the Beijing(Fujian)population was randomly split into 5 folds,predicting one fold based on the other 4 folds,the combined population was divided into 4 folds and another population,or a multi-trait model was used in which the values of the same trait in 4 folds and another population were considered different traits.In each round of CV,phenotypes from one fold(validation population)were removed from the dataset,and the remaining folds(reference population)were used to predict the future performance of animals in the validation population.This 5-fold CV was replicated 20 times,and the results are presented as the mean and standard deviation for the 20 replicates.Here,the single-trait models(ST-GBLUP and ST-BayesCπ)based on single populations and the combined population were termed ST-GBLUP(BayesC π)_single and ST-GBLUP(BayesCπ)_combined,respectively.For ST-GBLUP(BayesCπ)_single,GBLUP(-BayesCπ)_combined and MT-GBLUP(BayesCπ),the validation population was the same for the four approaches in each replicate of 5-fold CV.The accuracy of genomic prediction was evaluated asr(yc,GEBV),the correlation between GEBVs and corrected phenotypic valuesycin the validation population.In addition,b(yc,GEBV),the regression ofycon GEBVs,was also calculated to assess the possible inflation or unbiasedness of predictions.A two-tailed paired t-test was used to compare prediction accuracy between a pair of scenarios.
To assess the population structure of the two Yorkshire pig populations,PCA was performed.Figure 1 illustrates that the genetic backgrounds of the Beijing and Fujian populations were similar,and both populations were composed of American Yorkshire progeny.In addition,the Yorkshire population(874 pigs)with a British origin was selected for PCA,which also showed that Beijing and Fujian had similar genetic backgrounds(Additional file:Fig.S1).However,no pedigree connection between these populations was detected according to their pedigree records due to a lack of genetic exchange.
LD between a pair of SNPs was measured asr2LDandrLDin the Beijing and Fujian populations.The LD between adjacent markers in the Beijing and Fujian populations was also investigated,as shown in Table 2.The meanr2LDof adjacent SNP pairs within a chromosome ranged from 0.52 to 0.63 in the Beijing population and from 0.51 to 0.63 in the Fujian population.The meanr2LDacross all chromosomes was 0.56 in both the Beijing and Fujian populations,also indicating the similar genetic backgrounds of the populations.However,the consistency of LD between the two populations was not high.The correlation ofrLDfor adjacent SNPs between the two populations ranged from 0.47 to 0.60 across all chromosomes,with a mean of 0.55 at an average marker distance of 41 Kb,indicating the insufficient consistency in LD between these two pig populations.
Estimates of heritability for the growth and reproductive traits in the two Yorkshire pig populations are shown in Table 1.In the Beijing population,the heritabilities of AGE and BFT were 0.33 and 0.34,respectively,with a standard error of 0.01.The heritabilities of AGE and BFT were 0.42 and 0.44,respectively,with a standard error of 0.02 in the Fujian population.For NBA and TNB,the estimated heritabilities were 0.07 and 0.08 with a standard error of 0.01 in the Beijing population and 0.09 and 0.11 with a standard error of 0.02 in the Fujian population.
Fig.1 Principal component analysis(PCA)of two Yorkshire pig populations.Beijing and Fujian represent two Yorkshire pig populations from two Chinese pig breeding farms;PC1(3.5%)=first principal component(variance explained by PC1=3.5%);PC2(2.3%)=second principal component(variance explained by PC2=2.3%);PC3(1.3%)=third principal component(variance explained by PC3=1.3%)
Table 2 Distance and LDof adjacent SNPs for each autosome(CHR)
Table 2 Distance and LDof adjacent SNPs for each autosome(CHR)
a Cor:the correlation of rLD values of adjacent SNP pairs between two populations;b Sum over 18 autosomesc Yorkshire pig populations from Beijing and Fujian with similar genetic backgrounds
?
Figure 2 shows the estimated genetic correlations between the Beijing and Fujian populations based on the MT-GBLUP model using all genotyped animals.For AGE and BFT,the estimates of genetic correlations were 0.618 and 0.623,respectively,with standard errors of 0.145 and 0.134.The genetic correlations of NBA and TNB were 0.714 and 0.723,respectively,with standard errors of 0.153 and 0.159.These genetic correlations indicated that G×E interactions most likely exist between the Beijing and Fujian populations,as Robertson[23]suggested that 0.80 was the threshold of biological importance for G×E interactions.
Tables 3 and 4 and Fig.3 demonstrate the accuracies of genomic prediction for growth and reproductive traits achieved by applying single-trait and multi-trait models with GBLUP and BayesCπ.Generally,in the same scenario,the prediction accuracies of each method were the largest for BFT,as it had the highest estimated heritability among the four traits.Lower prediction accuracies were obtained for two reproductive traits,NBA and TNB,due to their low heritabilities.For the two pig populations,higher accuracies for growth traits were acquired for the Beijing population,as it had a much larger reference population than the Fujian population(Table 1),while their small reference populations for reproductive traits resulted in comparably low accuracies of genomic prediction.Tables 3 and 4 and Fig.3 further show the superiority of the multi-trait model for genomic prediction when dealing with G×E interactions,as discussed below.
Table 3 presents the accuracy of genomic prediction for growth and reproductive traits achieved by applying the GBLUP method,as measured by 20 replicates of fivefold CV.For the Beijing population,the prediction accuracies obtained with ST-GBLUP_single were 0.315 and 0.331 for AGE and BFT and 0.142 and 0.172 for NBA and TNB,respectively.When ST-GBLUP_combined was used,the genomic prediction accuracies were improved by 1.1% and 1.7% for NBA and TNB,respectively,compared with those obtained with ST-GBLUP_single,while there were no significant improvements for the growth traits AGE and BFT.In contrast,for the Fujian population,the genomic prediction accuracies for both growth and reproductive traits increased when the single populations were combined to enlarge the reference population.On average,ST_GBLUP_combined yielded 1.1% and 2.0% higher accuracies than STGBLUP_single for the growth and reproductive traits,respectively.Obviously,the gains in accuracy obtained for the Fujian population were larger.
However,the single-trait model with the BayesCπ method did not yield higher prediction accuracies for the combined population compared with the single populations for most traits,even though the reference population was larger.As shown in Table 4,when predicting the traits of pigs from the Beijing population,the prediction accuracies obtained with STBayesCπ_single were 0.306,0.328,0.156 and 0.179 for AGE,BFT,NBA and TNB,respectively.STBayesCπ_combined produced almost the same accuracies as ST-BayesCπ_single in all scenarios,and the accuracy obtained by ST-BayesCπ_combined was even slightly lower in some cases;e.g.,the prediction accuracies of ST-BayesCπ_single and STBayesCπ_combined for NBA were 0.156 and 0.154,respectively.A trend of similar accuracies for STBayesCπbetween the combined and single populations was also found in the Fujian population,as demonstrated in Table 4.The average prediction accuracies obtained with ST-BayesCπ_single and ST-BayesCπ_combined were almost the same for the growth traits and reproductive traits.
Fig.2 Genetic correlations between the two Yorkshire pig populations obtained using the multi-trait GBLUP method with all genotyped animals.AGE:days to 100 kg;BFT:backfat thickness at 100 kg;NBA:number of piglets born alive;TNB:total number of piglets born.The red line represents the threshold of 0.8.The error bar represents the standard error
Table 3 Accuracy and unbiasedness of genomic prediction of growth and reproductive traits performed with the GBLUP method as assessed with 20 replicates of five-fold CV
Table 4 Accuracy and unbiasedness of genomic prediction of growth and reproductive traits performed with the BayesCπmethod as assessed with 20 replicates of five-fold CV
Fig.3 Comparison of genomic prediction accuracies of GBLUP and BayesCπmethods when implementing a single-trait model in single populations and a multi-trait model.a Predicting pigs from the Beijing population using a single-trait model;b predicting pigs from the Fujian population using a single-trait model;c predicting pigs from the Beijing population using a multi-trait model;d predicting pigs from the Fujian population using a multi-trait model.AGE:days to 100 kg;BFT:backfat thickness at 100 kg;NBA:number of piglets born alive;TNB:total number of piglets born;BayesCpi:BayesCπ
The unbiasedness of genomic prediction of growth and reproductive traits as assessed with 20 replicates of five-fold CV is shown in Tables 3 and 4.When predicting the traits of pigs from the Beijing population based on the GBLUP method,larger bias(0.199)was produced by ST-GBLUP_combined than by ST-GBLUP_single.Likewise,unbiasedness was close to 1 with the STGBLUP_single method for growth and reproductive traits when predicting the traits of pigs from the Fujian population,while ST-GBLUP_combined yielded a 0.292 larger bias than ST-GBLUP_single(Table 3).Similarly,ST-BayesCπalso produced more bias for the combined population than for the single populations in most situations(Table 4).
In addition,prediction accuracies obtained with training on the Beijing population and validation in the Fujian population and training on the Fujian population and validation in the Beijing population using STGBLUP and ST-BayesCπwere determined.Low prediction accuracies and large bias were observed in all scenarios(Additional file:Table S1),indicating that using one population to predict another is infeasible,even though the genetic backgrounds of the Beijing and Fujian populations were similar in this study.
Both single-trait and multi-trait models with GBLUP and BayesCπwere implemented to address the combined population in this study.Tables 3 and 4 show the accuracies and unbiasedness of genomic prediction obtained with the multi-trait and single-trait models in the same scenarios.In general,the multi-trait model showed significant superiority over the single-trait model.For the GBLUP method,as shown in Table 3,when predicting the traits of pigs from the Beijing population,the multi-trait GBLUP model(MT-GBLUP)yielded approximately 1.3%(1.2%)and 2.2%(0.8%)higher accuracies than ST-GBLUP_single(ST-GBLUP_combined)for the growth and reproductive traits,respectively.Furthermore,a small amount of bias was obtained with MTGBLUP in most scenarios.A similar trend was found in the Fujian population.As shown in Table 3,MT-GBLUP yielded 3.2%(2.2%)and 3.0%(1.1%)higher accuracies on average for the growth and reproductive traits,respectively,than ST-GBLUP_single(ST-GBLUP_combined).The unbiasedness was close to 1 when using the MTGBLUP methods for growth and reproductive traits in most situations.Again,the gains obtained by MTGBLUP in the Fujian population were larger than those in the Beijing population.
For the multi-trait model with BayesCπ,as shown in Table 4,when predicting the traits of pigs from the Beijing population,MT-BayesCπyielded approximately 1.3%(1.4%)and 2.5%(2.5%)higher accuracies than STBayesCπ_single(ST-BayesCπ_combined)for the growth and reproductive traits,respectively.The largest gain over ST-BayesCπ_single and ST-BayesCπ_combined was 2.7% and 2.9% for NBA,respectively.When predicting the traits of pigs from the Fujian population,MT-BayesCπobtained higher accuracies for all traits(Table 4),yielding 2.7%(2.6%)and 2.4%(2.3%)gains on average over ST-BayesCπ_single(ST-BayesCπ_combined)for the growth and reproductive traits,respectively.MT-BayesCπalso produced bias as low as that of ST-BayesCπ_single in most scenarios.
Figure 3 presents the accuracy of genomic prediction of growth and reproductive traits achieved by applying single-population ST-GBLUP(BayesCπ)and MTGBLUP(BayesCπ)methods.When predicting the traits of pigs from the Beijing population,GBLUP yielded approximately 1.0% higher accuracy than BayesCπregardless of whether the single-or multi-trait model was used for AGE and a bias close to 0 with the GBLUP method in this scenario,while there was no difference in accuracy between GBLUP and BayesCπfor BFT.For reproductive traits in the Beijing population,the accuracy of genomic prediction was increased by 1.3% on average for MT-BayesCπcompared to MT-GBLUP,and no difference in accuracy was found between ST-GBLUP and ST-BayesCπ.When predicting the traits of pigs from the Fujian population,GBLUP and BayesCπproduced similar prediction accuracies for AGE;e.g.,the prediction accuracies with ST-GBLUP_single and ST-BayesCπ_single were 0.245 and 0.243,respectively,while less bias was obtained by GBLUP in this scenario.There was no difference in prediction accuracy or unbiasedness between ST-GBLUP_single and ST-BayesCπ_single for BFT,while MT-GBLUP achieved a 1.3% higher accuracy than MT-BayesCπ.For reproductive traits in the Fujian population,there was no difference in prediction accuracy between GBLUP and BayesCπ,except that MTGBLUP_combined achieved a 1.9% higher accuracy than MT-BayesCπ_combined for TNB.Generally,GBLUP performed comparably to BayesCπin the single-trait and multi-trait models.
In this study,we investigated the accuracy of genomic prediction in two Yorkshire pig populations with similar genetic backgrounds when G×E interactions were taken into account.Our results revealed G×E interactions between the two populations for the reference growth and reproductive traits.We further explored whether directly combining populations to enlarge the reference population is efficient in improving the accuracy of genomic prediction in the presence of G×E interactions.Our results demonstrated the superiority of the multi-trait model for genomic prediction in dealing with G×E interactions,which could better accommodate the G×E interactions.Higher prediction accuracies and lower bias were obtained with the multi-trait model than with the single-trait model when implementing either GBLUP or BayesCπin a given situation.
G×E interactions play an important role in pig populations,and they should be considered in breeding programs to select the best animals in different environments[11,24].The detection of G×E interactions relies on a genetic correlation of 0.8 for one trait in different environments,the threshold suggested by Robertson[23].Accordingly,G×E interactions between two pig populations for growth and reproductive traits(ranging from 0.618 to 0.723)were observed.Similarly,Li et al.[25]reported on the across-country genetic correlations(ranging from 0.604 to 0.726)of dairy cattle,indicating that an important G×E interaction exists between Brazilian and Nordic(or Nordic and French)populations.Hay and Roberts[26]also reported the existence of genotype-by-nutritional environment interactions for growth and carcass traits in beef cattle with genetic correlations below the threshold value of 0.8.In this study,SNP markers were used to construct the relationships(G matrix)between the two populations in order to estimate the genetic correlations,as no linked pedigree was available.Compared with a pedigree used to estimate genetic correlations,the realized relationships among individuals are captured by marker information and achieve more accurate estimates of genetic correlations.However,the number of genotyped animals was not large enough to produce a lower standard error for genetic correlations compared to that obtained by estimating heritability.We therefore used SNP markers to assess the population structure of the two pig populations,and the similar patterns of LD on each chromosome between these two populations further showed that they had similar genetic backgrounds.However,the correlation ofrLDvalues of adjacent SNPs with a mean of 0.55 indicated the insufficient consistency in LD between these two pig populations,which could be due to G×E interactions.Genetic correlations are caused by the pleiotropic action of genes and linkage between genes affecting different traits[9].If the performance values of the same trait in different populations(environments)are regarded as different traits and are affected by different genes(that are of course pleiotropic),the phase of linkage of these genes and the consistency of their linkage(not LD)in the two populations will affect the strength of the genetic correlation and G×E interaction.
The size of the reference population is one key factor in genomic prediction[4].Combining populations is an easy but practical way to improve the accuracy of genomic prediction,and its advantages in practice have been reported by EuroGenomics[8]and North American[2]consortiums.Similar results were shown in this study,in which higher prediction accuracies were obtained by the single-trait model for the combined population than for the single populations in most situations(Table 3),and such gains were greater for the population with a small reference size;e.g.,the prediction accuracy in the combined population was 1.1% higher than that in the single population for the growth trait BFT in the Fujian population but only 0.3% higher in the Beijing population,and the number of genotyped animals in the Beijing population was much larger than that in the Fujian population(Table 2),in accordance with the features of other studies[5,14].However,in some situations,accuracy was not increased or slightly decreased in the combined population;e.g.,the prediction accuracies for AGE in the Beijing population obtained with ST-BayesCπ_combined and ST-BayesCπ_single were 0.306 and 0.304,respectively(Table 4).Generally,in this study,the single-trait model exhibited only a slight gain in genomic prediction accuracy for most traits when using the combined population.In addition,the single-trait model produced more bias for the combined population than for the single populations(Tables 3 and 4).The lower improvement or slight decrease in the combined population compared with the expectation could be due to the weak consistency in LD between populations,as pointed out by Lund et al.[8].
Many studies have shown that inconsistent LD between SNPs and causative variants across populations causes disadvantages compared with using a combined population in genomic prediction[5,6,27,28].In this study,the correlation ofrLDvalues of adjacent SNPs with a mean of 0.55 indicated the insufficient consistency in LD between the Beijing and Fujian populations.Weak consistency in LD will lead to bias in SNP effect estimates when populations are simply combined with a single-trait model,further resulting in fewer improvements in accuracy and larger bias.Therefore,the effect of increasing the consistency in LD or reducing its noise on GEBVs is worth investigating.On the one hand,the consistency in LD between populations could be affected by the density of chip SNPs.De Roos et al.[29]found that markers in LD with causative variants across diverged breeds would require~300,000 SNPs.Hoze et al.[30]reported that 2% higher prediction accuracies were acquired for combined small populations of dairy cattle breeds when using the 777 K panel compared to the 54 K panel,and our previous study showed that imputed whole-genome sequencing data hold the potential to increase the accuracy of genomic prediction for combined populations in pigs[7].
On the other hand,reasonable models should be explored to take advantage of the correlation of LD between populations.The single-trait model ignores the inconsistency in LD between populations,while the multi-trait model treats populations as different environments and uses covariance to consider the correlation between them,which is caused by the consistency(no matter how high or low)in LD between populations.In this study,the multi-trait model performed best in all scenarios(Tables 3 and 4).Similar results were obtained in a Brazilian Holstein population by adding data from Nordic and French Holstein populations and using a multi-trait model[25].In addition,our results also showed that less bias was obtained with the multi-trait model compared to the single-trait model,indicating that the multi-trait model can not only use the combined information of populations to increase prediction accuracy but also reduce the impact of LD inconsistencies.
The multi-trait model and reaction norm model are two prevalent methods for handling G×E interactions in genetic evaluations.Only a few studies have explicated the advantages and disadvantages of these two kinds of methods.Falconer et al.[9]reported that G×E interactions could be detected using a multi-trait model,in which the trait values in each environment are treated as genetically distinct traits.In a limited number of environments(e.g.,in this study,with two breeding farms),a multi-trait model could capture the G×E interaction between these environments.However,the computational demand of multi-trait models will increase rapidly with an increase in the number of environmental levels,as more(co)variance components will have to be estimated and convergence will become increasingly difficult.In contrast to the multitrait model,the reaction norm model can handle a large number of or continuous environmental levels(e.g.,farm,year and seasonal effects),and genetic parameters could be estimated at each environmental level[12].However,the reaction norm model captures only part of the G×E interaction due to the limited number of environmental levels.In this study,the genomic prediction accuracies were not improved by using the reaction norm model with farm,year and seasonal effects as environmental covariates due to the limited number of environments(results not shown).Therefore,the reaction norm model could not perform well in a limited number of environments.In addition,convergence of the reaction norm model could be hindered because of the complexity of the model when the data are less informative[11].Therefore,a multi-trait model is optimal for analyzing G×E interactions in a limited number of environments.Furthermore,the multitrait model is more convenient in practical breeding,and it can accommodate phenotypes that are not measured in exactly the same way,e.g.,different scales of deregressed proofs and the existence of G×E interactions in Chinese and Nordic Holsteins[25,31].Such phenotypes cannot be analyzed by a single-trait model for joint genomic prediction,while the use of a multi-trait model is plausible in such cases.
In this study,GBLUP and BayesCπ,assuming a common(co)variance for all(GBLUP)or a proportion(BayesC π)of the SNP effects,were used in this study.As many studies have reported for real data[32,33],the average prediction accuracies were not significantly different between GBLUP and BayesCπin the single populations and multi-trait model in this study(Fig.3).However,BayesCπ performed worse than GBLUP for the combined population,and no improvements in accuracy were obtained by BayesCπfor most traits compared to that obtained with a single population.In addition,BayesCπproduced a larger bias(Table 4).Table 5 provides the standard error of the marker effect estimated by BayesCπin single and combined populations.The BayesCπmethod for the combined population produced the largest standard error of the marker effect in all situations,implying larger bias when estimating genomic breeding values.A larger standard error for the marker effect might have been caused by the inconsistency in LD between the Beijing and Fujian populations.In contrast,the minimum standard error of the marker effects estimated by the multi-trait BayesCπ model also indicated the superiority of multi-trait BayesC πin genomic prediction.Multi-trait BayesCπwas first proposed by Jia and Jannink[34],with the restrictive assumption that a locus simultaneously affects all the traits or none of them.In this study,multi-trait BayesCπ allowed a locus to affect any combination of traits,which is biologically realistic,especially in multi-trait analyses involving many traits[21].However,it might not be possible to apply multi-trait single-step GBLUP(BayesCπ)[35-37]in this study,as no pedigree link between the two pig populations was available and the BayesCπmethod was too computationally demanding.
Table 5 Mean value of the standard error of marker effects estimated by the BayesCπmethod using all genotyped animals
G×E interactions constitute a potential source of inefficiency in animal breeding.Ignoring possible G×E interactions could lead to a reduction in genetic gains.Our results demonstrated that directly combining populations to enlarge the reference population is not efficient in improving the accuracy of genomic prediction in the presence of G×E interactions in two Yorkshire pig populations,while the multi-trait model performed better in a limited number of environments with weak G×E interactions.This study will be helpful when leveraging G×E interactions in practical genomic selection.
Supplementary informationaccompanies this paper at https://doi.org/10.1186/s40104-020-00493-8.
Additional file 1:Fig.S1.Principal component analysis(PCA)of the Beijing and Fujian Yorkshire pig populations and a British Yorkshire pig population.
Additional file 2:Table S1.Accuracy and unbiasedness of genomic prediction of growth and reproductive traits obtained when using one population to predict another.
Abbreviations
G×E:Genotype-by-environment;GBLUP:Genomic best linear unbiased prediction;LD:Linkage disequilibrium;SNPs:Single nucleotide polymorphisms;Beijing and Fujian represent two Yorkshire pig populations from two Chinese pig breeding farms;AGE:Days to 100 kg;BFT:Backfat thickness at 100 kg;NBA:Number of piglets born alive;TNB:Total number of piglets born;yc:Corrected phenotypic values;GEBV:Genomic EBV;STGBLUP(BayesCπ):Single-trait GBLUP(BayesCπ);MT-GBLUP(BayesCπ):Multitrait GBLUP(BayesCπ);CV:Cross validation;ST-GBLUP(BayesCπ)_single:STGBLUP(BayesCπ)based on single population;ST-GBLUP(BayesC π)_combined:ST-GBLUP(BayesCπ)based on combined population;PCA:Principal component analysis;PC:Principal component;MCMC:Markov chain Monte Carlo
Acknowledgements
The authors are grateful to Beijing Liuma and Fujian Yongcheng Pig Breeding Co.,Ltd.for providing blood samples.
Authors’contributions
HLS and XDD designed the experiments.HLS performed statistical analysis and wrote the manuscript.QZ and XDD revised the manuscript.All authors read and approved the final manuscript.
Funding
This work was supported by grants from the earmarked fund for China Agriculture Research System(CARS-35),Modern Agriculture Science and Technology Key Project of Hebei Province(19226376D),the National Key Research and Development Project(SQ2019YFE00771),the National Natural Science Foundation of China(31671327).Major Project of Selection for New Livestock and Poultry Breeds of Zhejiang Province(2016C02054-5).
Availability of data and materials
The datasets used during the current study are available from the corresponding author on reasonable request.
Ethics approval and consent to participate
The experiments involving animals were approved by the Animal Welfare Committee of China Agricultural University.There was no use of human participants,data or tissues.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1National Engineering Laboratory for Animal Breeding,Laboratory of Animal Genetics,Breeding and Reproduction,Ministry of Agriculture,College of Animal Science and Technology,China Agricultural University,Beijing 100193,China.2Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention,Shandong Agricultural University,Taian 271001,China.
Received:1 March 2020 Accepted:7 July 2020
Journal of Animal Science and Biotechnology2021年1期