The role of genomic structural variation in the genetic improvement of polyploid crops

2019-04-17 01:33:38

The Crop Journal, 2019, Issue 2

Sarah-Veronica Schiessl, Elvis Katche, Elizabeth Ihien, Harmeet Singh Chawla, Annaliese S. Mason*

Department of Plant Breeding, Justus Liebig University, Heinrich-Buff-Ring 26-32, Giessen 35392, Germany

Keywords: Presence-absence variation; Copy-number variation; Homeologous exchanges; Genome structure; Pan-genome

ABSTRACT

Many of our major crop species are polyploids, containing more than one genome or set of chromosomes. Polyploid crops present unique challenges, including difficulties in genome assembly, in discriminating between multiple gene and sequence copies, and in genetic mapping, hindering use of genomic data for genetics and breeding. Polyploid genomes may also be more prone to containing structural variation, such as loss of gene copies or sequences (presence-absence variation) and the presence of genes or sequences in multiple copies (copy-number variation). Although the two main types of genomic structural variation commonly identified are presence-absence variation and copy-number variation, we propose that homeologous exchanges constitute a third major form of genomic structural variation in polyploids. Homeologous exchanges involve the replacement of one genomic segment by a similar copy from another genome or ancestrally duplicated region, and are known to be extremely common in polyploids. Detecting all kinds of genomic structural variation is challenging, but recent advances such as optical mapping and long-read sequencing offer potential strategies to help identify structural variants even in complex polyploid genomes. All three major types of genomic structural variation (presence-absence, copy-number, and homeologous exchange) are now known to influence phenotypes in crop plants, with examples including flowering time, frost tolerance, and adaptive and agronomic traits. In this review, we summarize the challenges of genome analysis in polyploid crops, describe the various types of genomic structural variation and the genomics technologies and data that can be used to detect them, and collate information produced to date related to the impact of genomic structural variation on crop phenotypes. We highlight the importance of genomic structural variation for the future genetic improvement of polyploid crops.

1. Introduction

Polyploidy refers either to the duplication of a single genome (autopolyploidy) or to the combination of two or more different genomes (allopolyploidy) to make a new species [1,2]. Many important domesticated crops have been classified as allopolyploids, such as wheat (Triticum aestivum), tobacco (Nicotiana tabacum), peanut (Arachis hypogaea), and cotton (Gossypium hirsutum) [1]. Another popular example is rapeseed (allotetraploid Brassica napus, 2n = 4x = 38, AACC), which was formed by hybridization between B. rapa (2n = 2x = 20) and B. oleracea (2n = 2x = 18) [3]. By contrast, autopolyploids, such as seedless watermelon (Citrullus lanatus), banana (Musa acuminata), potato (Solanum tuberosum), and alfalfa (Medicago sativa) [1], arise within a single species by genome doubling [4].

In flowering plants in particular, polyploidy and interspecific hybridization have played a major and pervasive role in shaping plant genomes [5]. The initial merger of two genomes is often accompanied by dramatic events such as transposable-element activation (movement and replication of mobile DNA), homeologous exchanges (swapping of DNA between ancestrally related chromosomes), and DNA methylation (which may cause changes in gene expression), while the subsequent path to diploidization involves the loss, retention, or maintenance of duplicate genes with possible neo- and subfunctionalization (respectively, the arising of novel gene functions and the sharing of gene functionality between duplicates) [6-8]. Together with the strongly increased sequence similarity in polyploid genomes, these processes drastically increase the likelihood of occurrence of genomic structural variants in polyploids.

Polyploids harbor great potential for crop improvement. The presence of extra gene copies and alleles can boost allelic heterosis (which confers hybrid vigor) as well as provide gene redundancy [6]. However, the multiple subgenomes and larger genome size of polyploid crops relative to diploid crops pose some challenges to polyploid crop improvement. These challenges include decreased selection efficiency due to the contribution of multiple genes and alleles to each trait, and increased difficulties in obtaining accurate genomic and genotypic data. The latter challenge is particularly relevant for genomic structural variants, which are now known to heavily influence traits. Addressing some of these challenges requires a deep functional and structural understanding of crop genomes [9]. In this review, we present an overview of the types of genomic structural variation present in polyploids and how they can be detected, as well as the documented influence of genomic structural variation on traits in polyploids. We highlight the challenges and opportunities in exploiting the special genomic structure of polyploid crops for evolutionary and breeding research.

2. Genomic structural variation

2.1. Types of genomic structural variation in polyploids

Genomic structural variation includes all variants of the DNA sequence in which sequence blocks larger than 1 kb are transferred to a different genomic context. These transfers can have different outcomes: the sequence block to be transferred can be moved to a new locus (translocation), it can be flipped from a 5′-to-3′ to a 3′-to-5′ orientation in the same location (inversion), it can be lost (deletion), and it can be copied to a new locus (duplication). Although translocations and inversions change only the genomic context and do not affect the number of copies of a sequence present in the genome, deletions and duplications can change the copy number of the genes contained in the affected sequence block. This change can lead to individual variation in the number of copies of a gene, which is called copy-number variation (CNV). If a gene or region is simply missing in some individuals relative to others, we call these presence-absence variants (PAVs).
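The distinction between CNV and PAV drawn above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the gene names and copy-number calls are hypothetical, and the function classify_gene is not taken from any published pipeline. It treats a gene as a PAV when it is missing in some individuals but present in others, and as a CNV when it is present in all individuals but in differing numbers of copies.

```python
from typing import Dict, List


def classify_gene(copies: List[int]) -> str:
    """Classify one gene from its copy number in each individual."""
    if all(c == 0 for c in copies):
        return "absent"        # not found in any individual
    if any(c == 0 for c in copies):
        return "PAV"           # missing in some individuals, present in others
    if len(set(copies)) > 1:
        return "CNV"           # present everywhere, but copy number differs
    return "invariant"         # same non-zero copy number everywhere


# Hypothetical copy-number calls for three individuals (illustrative only).
calls: Dict[str, List[int]] = {
    "geneA": [2, 2, 2],
    "geneB": [2, 4, 2],
    "geneC": [2, 0, 2],
}

labels = {gene: classify_gene(copies) for gene, copies in calls.items()}
print(labels)
```

Under this scheme a PAV is simply the limiting case of a CNV in which the copy number drops to zero in some individuals.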

Polyploid genomes are, owing to their intragenomic homology, prone to so-called homeologous exchanges (HEs), in which homeologous chromosomes (ancestrally related chromosomes from different subgenomes) exchange genetic material. Particularly common in polyploids, these exchanges between homeologous chromosomes during meiosis can result in the appearance of CNVs and PAVs as well as reciprocal translocation events (Fig. 1). Although homeologous exchanges are often classed as either PAVs or CNVs, they differ from the conventional definition of PAVs and CNVs in that one part of the genome is replaced with a copy from another part of the genome, usually a homeologous region, generally conserving gene content (Fig. 1). Although transposable elements have been reported to be a major cause of structural variation [10,11], and CNV and PAV are considered to have greater effects on plant phenotype [12], HEs play a major role in generating genomic structural variation in polyploids. Investigating, capturing, and utilizing the genetic differences arising from this variation will promote the genetic improvement of polyploid crops. In the following sections we will briefly discuss these three main types of structural variation (PAVs, CNVs, and HEs), focusing on the context of polyploid crops.

2.2. Copy-number variation

Copy-number variation refers to the presence of DNA sequences (usually larger than 1 kb) in copies whose number varies between individuals or populations of the same species. Smaller elements are known as insertions/deletions (InDels) [13-15]. In humans, besides the known association with sporadic and Mendelian diseases, CNV has also been associated with complex traits such as autism, susceptibility to HIV, and schizophrenia. However, CNVs do not only play a role in disease and susceptibility to disease: they may also result in the emergence of advantageous traits, and thus be subject to evolutionary pressures such as selection and drift [16,17]. Several mechanisms have been proposed to explain how CNV arises, including non-allelic homologous recombination (NAHR) and non-homologous end joining (NHEJ), which are recombination-based mechanisms, and retrotransposition, which is the activation and insertion of retrotransposons. A novel replication-based mechanism known as fork stalling and template switching (FoSTeS) has been proposed to account for complex rearrangements that cannot be explained by the above mechanisms [16,18]. NAHR occurs between DNA segments of high similarity that are not alleles or homologous sequences, and usually involves low-copy-number repeats (LCRs), which are DNA segments larger than 1 kb probably generated during genome duplication events [19], a major feature of polyploids.

Fig. 1 - Examples of genomic structural variation that can occur in polyploids. Two pairs of homeologous chromosomes A1 and B1 with identical gene order (numbers 1-7) are presented as an example, showing presence-absence and copy-number variation as traditionally defined (respectively, loss and duplication of genes) and some of the variants that may arise from homeologous exchanges: a reciprocal translocation, which is a form of chromosome rearrangement without loss or multiplication of sequences; a non-reciprocal translocation, resulting in a “PAV-like” region with the absence of the B1 homeolog and duplication of the A1 (technically also a CNV, but these are more difficult to detect); and a translocation heterozygote such as may arise by hybridization between an individual with a fixed reciprocal translocation event and an individual without this translocation event, resulting in a 3:1 ratio of A1:B1 chromosome segments over a “CNV-like” region.

CNV has long been known to contribute to phenotypic diversity in humans. More recently, evidence from an increasing number of studies has shown that CNV is prevalent and plays an important role in phenotypic diversity in plants. Studies in different plant species such as maize (Zea mays) [20], Arabidopsis thaliana [21], rice (Oryza sativa) [22], rapeseed [23,24], and wheat [25] all testify to the prevalence of CNV in plants. Interestingly, CNV in plants is generally calculated differently from that in animals. In animals, copy number is calculated as the number of copies per haploid genome, whereas in plants copy number generally refers to the number of copies per diploid genome [18,25,26]. Using comparative genome hybridization, Springer et al. [20] identified 400 CNVs between the maize inbred lines Mo17 and B73. These CNVs were distributed across all maize chromosomes, although several conserved regions (located mostly around the centromeres) showed few or no CNVs. By sequencing 80 Arabidopsis thaliana accessions collected from diverse environments, Cao et al. [21] discovered 1029 CNVs, some of which overlapped with gene coding regions and thus might have an effect on phenotype. Copy-number variation was found to be particularly prevalent in polyploids and polyploid crops, such as allopolyploid wheat [25,27-29] and autotetraploid potato [13], despite the added challenge of discriminating between the multiple gene copies already present as a result of whole-genome duplication and hybridization events. Several authors have also reported CNVs affecting important adaptive and agronomic characteristics such as grain yield, frost tolerance, and flowering time [30-33]; a comprehensive overview of the effect of structural variation on phenotypic traits in polyploids appears in Section 4 of this review.

2.3. Presence-absence variation

Definitions of PAV vary. Whereas Ding et al. [34] define PAV as the presence or absence of genes within a genome or the presence of genes located in different genomic regions between genomes, the more common definition is the presence or absence of a gene in some but not all individuals of a species [20,35]. PAV has also been considered as an extreme form of CNV [14]. With the cost of genome sequencing currently decreasing, the ability to sequence many genomes simultaneously is on the rise, permitting alignment of the genomes of many individuals of the same species for comparison. One common outcome of this comparison is the identification of presence-absence variation (PAV). The extent to which two genomes can vary in terms of PAV has been demonstrated in maize, in which a comparison was made between inbred lines B73 and Mo17. On average, only 50% of sequences were shared between the two lines, while 25% of sequences in homologous locations were present in one inbred line but were absent in the other [36]. This result, coupled with results from other plant species, prompted an extension of the “pan-genome” concept (originally proposed in bacteria) to plants. The pan-genome is composed of the “core” genome (genes or genomic regions present in all individuals of a species) and the “dispensable” genome (present in some individuals of a species) [10,12]. Following the introduction of this concept to plants, the pan-genomes of plant species such as rice, barley, maize, and soybean have been analyzed, and the dispensable genome fraction has been shown to play an important role in evolution as well as in the complex interplay between plant species and the environment [18,37]. As a result, some authors have asked whether the dispensable genome fraction really is dispensable [10,12], or should perhaps instead be thought of as another form of adaptive variation within species.

Polyploidization and subsequent diploidization processes in plants are accompanied by subgenome fractionation, gene loss, and transposable element activation [38], processes that can increase the frequency of presence-absence variation. Despite the inherent difficulty of evaluating PAV in polyploids, several studies have made progress in this direction. Montenegro et al. [35] produced a wheat pan-genome by sequencing 18 elite cultivars and comparing them to an elite spring cultivar, and found that each cultivar had an average of 128,656 genes, with 64.3% of genes shared by all 19 cultivars. The total pan-genome content was 140,500 ± 102 genes, with 39 unique genes per individual. Sequencing the genome of the autotetraploid potato Solanum tuberosum (2n = 4x = 48) and comparing it with a heterozygous diploid genome revealed PAV in 275 genes, with 246 genes specific to the diploid [39]. The assembled pan-genome of Brassica napus from 53 synthetic and non-synthetic lines revealed that 38% of genes showed PAV, many of which were putatively associated with important agronomic traits such as flowering time, disease resistance, and glucosinolate content [23]. These examples highlight the importance of PAV and its potential for the genetic improvement of polyploid crops.
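The core/dispensable split that underlies figures such as "64.3% of genes shared by all 19 cultivars" can be sketched with plain set operations. The following Python is a minimal illustration, assuming each accession has already been reduced to a set of gene identifiers; the accession names, gene names, and the helper split_pan_genome are hypothetical and do not reproduce the method of any study cited here.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, dispensable) gene sets for a group of accessions."""
    pan = set().union(*gene_sets.values())          # present in at least one accession
    core = set.intersection(*gene_sets.values())    # present in every accession
    dispensable = pan - core                        # present in some, absent in others
    return pan, core, dispensable


# Hypothetical gene content of three accessions (illustrative only).
accessions = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g2", "g3", "g5"},
}

pan, core, dispensable = split_pan_genome(accessions)
pav_fraction = len(dispensable) / len(pan)
print(sorted(core), sorted(dispensable), pav_fraction)
```

In this toy case two of five genes are core, so the fraction of genes showing PAV is 0.6. Adding more, or more diverse, accessions can only shrink the core set and grow the pan set, which is why the reported PAV fraction depends on the sampling of individuals.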

2.4. Homeologous exchanges

In allopolyploids, homeologous chromosomes come together in a single genome. Disomic inheritance, which is the result of strict pairing between homologous chromosomes, is sometimes enforced by pairing regulators, such as Ph1 in allopolyploid wheat [40]. However, this process can occasionally fail even in stable allopolyploids, such that homeologous chromosomes pair and exchange genetic information [40-43], undergoing HEs (Fig. 1). Genomic variation resulting from HEs has been reported in rapeseed [44], wheat [18,45], sunflower (Helianthus annuus) [46], and Tragopogon [47,48]. HEs have also been demonstrated to affect phenotypic traits. Although Udall et al. [49] observed no marked effect of HEs in four rapeseed mapping populations, Osborn et al. [41] reported that a homeologous non-reciprocal translocation between chromosomes A7 and C6 in a rapeseed mapping population had a significant effect on seed yield. Other studies have linked HEs to other important traits such as seed quality, flowering time, and fertility [24,50]. HEs have also been reported to be the major cause of gene PAV in B. napus amphiploids [23,51]. Hurgobin et al. [23] assembled the pan-genome of B. napus and reported two types of PAV: non-HE-related PAV and HE-related PAV, the latter referring to the loss of genes by replacement with their corresponding genomic segments from homeologous regions. Of the 53 accessions used to assemble the pan-genome, 30 showed HE-related PAVs, and functional annotation of these HE-related PAVs pointed to their involvement in stress, defense, and auxin pathways [23]. Lloyd et al. [51] also validated the effects on gene expression of 21 HEs between B. napus accessions, demonstrating major effects of some of these HEs on homeologous gene pairs. As HEs have been reported to have both adaptive and agronomic importance, cataloguing these genomic changes could play an important role in the breeding of allopolyploid crops.
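One way to picture the "PAV-like" and "CNV-like" outcomes of HEs shown in Fig. 1 is as a shift in the dosage of two homeologous segments whose combined copy number stays constant. The Python below is a rough sketch under explicit assumptions that go beyond the text: read depth is taken to be proportional to copy number, the two homeologs together are assumed to total four copies (as in an allotetraploid), and the function name and depth values are hypothetical.

```python
from typing import Tuple


def homeolog_dosage(
    depth_a: float, depth_b: float, total_copies: int = 4
) -> Tuple[int, int]:
    """Estimate (A copies, B copies) of a homeologous segment pair.

    Assumes depth scales with copy number and that the pair always sums
    to total_copies, so an HE shifts dosage without changing the total.
    """
    total_depth = depth_a + depth_b
    if total_depth <= 0:
        raise ValueError("no reads over either homeolog")
    copies_a = round(total_copies * depth_a / total_depth)
    return copies_a, total_copies - copies_a


# Hypothetical normalized read depths over an A1 segment and its B1 homeolog.
balanced = homeolog_dosage(20.0, 21.0)   # no exchange expected
cnv_like = homeolog_dosage(30.0, 10.0)   # translocation heterozygote, 3:1
pav_like = homeolog_dosage(41.0, 0.0)    # non-reciprocal translocation, 4:0
print(balanced, cnv_like, pav_like)
```

A real analysis would need to handle mapping ambiguity between the subgenomes, which is exactly the difficulty discussed in Section 3.2, so this should be read as a conceptual aid rather than a detection method.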

3. Techniques for uncovering genomic structural variation in polyploid crops

3.1. Physical and genetic maps

Polyploid genomes are larger and more complex than their related diploid genomes. Larger genomes are more expensive to sequence, and polyploid genomes usually require more bioinformatics expertise than diploid genomes [52]. The assembly of complex and polyploid genomes is still quite challenging, as next-generation sequencing relies on assembling short sequences that are usually much smaller than the size of genomic rearrangements [53], making it difficult to identify these events. To date, numerous genome misassemblies in polyploids have been found to result from structural variation, such as translocations and inversions between the A and D subgenomes in tetraploid cotton [53]. Chromosome rearrangements can also interfere with the construction of genetic maps, mapping of quantitative trait loci (QTL), and marker-assisted selection [54,55]. These rearrangements may also affect the accurate positioning of sequences in polyploid genomes when sequences are aligned to the reference genome of their diploid progenitors [56], complicating sequence-based genotyping approaches. However, the use of genetic mapping populations has shown great potential for addressing sequence-assembly problems in polyploids [57], and genetic and physical maps can be integrated for QTL mapping [58] or even combined to identify the effect of homeologous exchanges on phenotypes [50]. Building on these strategies, new technological developments also show promise in facilitating the identification of structural variants in complex polyploid genomes for use in crop improvement.

3.2. Discriminating between homeologous loci

A major challenge in polyploid crop improvement is in discriminating between homeologous alleles; that is, alleles present at homeologous loci (in different genomic locations), rather than homologous loci (alleles present at the same locus on two homologous chromosomes). Alleles from different homeologous genomic locations are difficult to discriminate from alleles at a single homologous locus, leading to false identification of marker polymorphisms (e.g., single-nucleotide polymorphisms (SNPs)) [59-61]. This confusion also creates difficulty in the development of homeolog-specific and allele-specific markers, complicating the design of primers to amplify specific target regions and not corresponding homeologous regions [54]. In allopolyploids such as peanut in which the two subgenomes are highly similar (with 96% median sequence identity), distinguishing homeologous from allelic SNPs is complicated [62]. Thus, increased genetic similarity between subgenomes within polyploid species exacerbates these challenges. Consequently, the rate of SNP marker development and its application in molecular breeding is slowed in polyploid crops [63]. Polyploid crops often require higher numbers of markers than diploids, ultimately increasing cost [64]. Amplification of homeologous alleles is also an obstacle to the use of SNP arrays for studying structural variation in polyploid crops. SNP arrays can be a valuable tool for detecting structural variation: in general, they can detect PAVs in the form of segregating marker “fails” (failure to amplify an allele by multiple markers physically located contiguously on a chromosome) in a population according to the expected allele ratios [44,65,66]. However, when they are used with polyploid genomes, short oligonucleotide probes anchored onto these arrays often bind to closely related parts of the subgenomes, leading to false SNP calling [66]. Other problems caused by homeologous sequences in polyploid crop improvement include difficulty in genome-wide quantification of homeologous gene expression due to high sequence similarity between homeologous gene pairs, and the detection of more minor than major QTL in QTL mapping of polyploids [67-69]. Despite these challenges, approaches such as SNP array genotyping of doubled-haploid [70] and testcross [71] mapping populations have been highly successful in identifying chromosomal structural rearrangements and the effects of these rearrangements on phenotype [50] in complex polyploids.
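The idea of reading PAVs from runs of physically contiguous marker "fails" can be sketched as a simple run-length scan. The Python below covers only the contiguity step and makes several assumptions not stated in the text: markers are already sorted by physical position on one chromosome, each call is reduced to a success/fail flag for a single individual, and the threshold min_run is an arbitrary illustrative choice. Checking that the fails segregate at the expected ratios across a population would be a separate step.

```python
from typing import List, Tuple


def failed_runs(calls: List[bool], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for runs of failed markers.

    calls[i] is True when marker i gave a genotype call and False when it
    failed; only runs of at least min_run consecutive fails are reported.
    """
    runs = []
    start = None
    for i, ok in enumerate(calls):
        if not ok and start is None:
            start = i                      # a run of fails begins here
        elif ok and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None                   # the run has ended
    if start is not None and len(calls) - start >= min_run:
        runs.append((start, len(calls) - 1))   # run reaches the last marker
    return runs


# Hypothetical calls for 12 position-sorted markers in one individual.
marker_calls = [True, True, False, False, False, False,
                True, False, True, False, False, False]
candidate_deletions = failed_runs(marker_calls)
print(candidate_deletions)
```

Isolated fails, such as the single failed marker at index 7 here, are ignored because they are more plausibly technical failures or primer-site polymorphisms than deletions.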

With the advent of third-generation genomic technologies (first generation: sequencing of single short-read sequences, e.g., Sanger sequencing; second generation: high-throughput multiplex sequencing of short (<150 bp) reads, e.g., Illumina sequencing (https://www.illumina.com/)), it is also now possible to detect structural variants in polyploid genomes with great precision, assisting in the discrimination of homeologous alleles. Unlike their predecessors, third-generation technologies rely mainly on capturing long-range genomic information, and can be broadly classified into two categories: mapping-based and sequencing-based [72]. Third-generation mapping technologies such as Bionano Genomics optical mapping (https://bionanogenomics.com/) provide long-range genome structure information in the form of ordered genomic markers (restriction or marker sites) without sequencing every single nucleotide. By contrast, third-generation sequencing technologies such as Pacific Biosciences (PacBio) Single Molecule Real Time (SMRT) sequencing (https://www.pacb.com/) and the Oxford Nanopore Technologies (https://nanoporetech.com/) sequencing platform provide actual base-pair information for ultra-long DNA molecules.

3.3. Third-generation mapping technologies

Bionano Genomics optical mapping using nano-channel arrays relies on capturing long-range genomic information in the form of restriction sites for the detection of structural variants. Introduced in 2010, it involves imaging of high-molecular-weight fluorescently labeled DNA molecules and creation of large restriction maps represented as stretches of light and dark regions (resembling a barcode), which can then be aligned to an in silico-generated optical map of a reference assembly. Insertions and deletions can be detected if the analyzed genotypes have additional or missing restriction sites compared to a reference assembly, although detection may be confounded by the presence of mutations in the restriction enzyme binding sites. Optical mapping has been used in genome assembly and structural-variation detection approaches in many plant species, such as wheat [73], maize [74], Arabidopsis [75,76], and clover (Trifolium subterraneum L.) [77]. One of the key factors distinguishing this approach from other technologies is that the DNA molecules are not broken into small fragments during the entire process, thus enabling the capture of long-range genomic information stretching up to several hundred kilobases. However, this approach relies, for structural-variation detection, on the availability of a high-quality reference assembly, which is often not yet available for non-model crop species. Furthermore, lack of actual nucleotide information makes it computationally challenging to isolate actual structural variant calls from the noise generated during imaging of the DNA fragments.
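The comparison of an observed map against an in silico-generated map of the reference can be illustrated with a toy example. The Python below is a sketch under stated assumptions: the recognition motif "GATC" and both sequences are invented for illustration, only one strand is scanned, and observed maps are treated as noise-free, which real optical maps are not. It locates motif sites, converts them to inter-site spacings (the "barcode"), and reports where the spacings differ.

```python
from typing import List


def site_positions(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of every occurrence of motif in seq."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def label_spacings(positions: List[int]) -> List[int]:
    """Distances between consecutive sites, i.e. the barcode pattern."""
    return [b - a for a, b in zip(positions, positions[1:])]


MOTIF = "GATC"  # hypothetical recognition site, chosen only for illustration

# Toy reference, and a sample carrying a 4 bp deletion between sites 2 and 3.
reference = "GATC" + "A" * 6 + "GATC" + "A" * 10 + "GATC"
sample = "GATC" + "A" * 6 + "GATC" + "A" * 6 + "GATC"

ref_spacings = label_spacings(site_positions(reference, MOTIF))
sample_spacings = label_spacings(site_positions(sample, MOTIF))
spacing_shift = [s - r for r, s in zip(ref_spacings, sample_spacings)]
print(ref_spacings, sample_spacings, spacing_shift)
```

A negative shift in one interval is consistent with a deletion between the two flanking sites and a positive shift with an insertion. The pairing of intervals by index only works here because no site was gained or lost; a mutated recognition site would change the number of intervals, which is the confounding effect mentioned above.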

3.4. Third-generation sequencing technologies

Two major third-generation sequencing technologies with excellent potential for detecting structural variation in complex polyploid genomes are PacBio SMRT technology and Oxford Nanopore sequencing. Unlike Bionano optical mapping, both techniques generate actual nucleotide sequences rather than only restriction-site information. PacBio SMRT works on the principle of sequencing-by-synthesis. A single-stranded circularized DNA template is fed into a sequencing well. As the DNA polymerase synthesizes the complementary strand base by base, a distinct fluorescent signal is generated for each base. In its current form, PacBio reads range from 2 kb to 100 kb in length. Ultra-long reads can span insertions or deletions of several thousand base pairs, thereby enabling the precise detection and mapping of variation breakpoints. A very recent example underlining the efficacy of PacBio sequencing in assembling complex polyploid genomes was the whole-genome assembly of bread wheat [78]. However, one of the possible reasons preventing more frequent use of this technology to date for detection of structural variation is its currently prohibitive cost. For example, a de novo assembly of a single rapeseed (Brassica napus, 2n = 4x, ~1130 Mb) genome with 80× coverage would currently cost approximately US$25,000 with a service provider (excluding the bioinformatics analysis) on the Sequel System, whereas the system itself costs US$350,000.
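The scale of the rapeseed example can be made explicit with a short calculation. The Python below only rearranges the figures quoted above (genome size, coverage, and the service-provider price); the derived cost per gigabase is an illustrative consequence of those figures, not a price quoted by any provider.

```python
# Figures quoted in the text for a single rapeseed de novo assembly.
genome_size_mb = 1130        # approximate genome size in megabases
coverage = 80                # target sequencing depth (80x)
quoted_cost_usd = 25_000     # service-provider cost, excluding bioinformatics

# Total sequence required = genome size x coverage.
total_gb = genome_size_mb * coverage / 1000

# Implied cost per gigabase of raw sequence (derived, illustrative only).
cost_per_gb = quoted_cost_usd / total_gb

print(f"{total_gb:.1f} Gb of sequence, about US${cost_per_gb:.0f} per Gb")
```

At 80× coverage the project therefore requires roughly 90 Gb of long-read sequence for one genotype, which helps explain why sequencing many individuals of a polyploid species at this depth was considered prohibitive.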

Oxford Nanopore sequencing provides a relatively cheap alternative to PacBio SMRT technology for detection of structural variants in polyploid genomes. The largest difference separating this technology from others is that the cost of the sequencer itself has been reduced to practically zero, making it easy for smaller labs to access this technology. One of the most popular sequencing platforms from Oxford Nanopore Technologies is the MinION, a small handheld device capable of sequencing DNA by measuring the minute disruptions in electric current as a DNA molecule traverses a nanopore. It is capable of delivering a read length similar to that of the PacBio SMRT technology. There are a few examples of this technology being applied to decipher structural variation in plant genomes, such as Solanum pennellii, for which Nanopore reads were used exclusively [79], and Arabidopsis thaliana, for which structural variants associated with plant growth were identified using a single Nanopore flow cell [75]. Although promising, this technology is still in its developmental phase, and high error rates and erratic data yields from Nanopore flow cells continue to be an issue. In fact, high error rates (~15%, compared to <0.5% for Illumina short reads [80]) are a major bottleneck for both the PacBio SMRT and Oxford Nanopore sequencing technologies [81]. However, thanks to the random distribution of errors across the entire length of a read, the error problem can be overcome by increasing the depth of sequencing, albeit making the entire process more expensive.
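Why randomly distributed errors can be averaged out by depth can be illustrated with a simple binomial model. The Python below is a sketch under simplifying assumptions that are not part of the text: every read covering a position is wrong independently with the same probability, the consensus is a plain majority vote, and all erroneous reads are assumed to agree on the same wrong base (a pessimistic choice). Odd depths are used so that ties cannot occur.

```python
from math import comb


def consensus_error(per_read_error: float, depth: int) -> float:
    """Probability that a majority vote over `depth` reads is wrong.

    The consensus fails when more than half of the reads are erroneous;
    the number of erroneous reads is modeled as Binomial(depth, per_read_error).
    """
    first_losing = depth // 2 + 1   # smallest number of bad reads that outvotes
    return sum(
        comb(depth, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (depth - k)
        for k in range(first_losing, depth + 1)
    )


# Per-read error rate of about 15%, as quoted above for long reads.
for depth in (1, 5, 11, 21):
    print(depth, consensus_error(0.15, depth))
```

Under this model the consensus error falls steeply as depth increases, which matches the qualitative argument in the text. The model would not hold for systematic errors that recur at the same position in many reads, since those do not average out with additional coverage.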

3.5. A combined strategy for the detection of structural variation in polyploids

As discussed, every individual technology has its own limitations. Accordingly, for complex genomes it would be advantageous to adopt a hybrid approach using a combination of different technologies. A very commonly used strategy for detection of structural variation in large genomes is to pursue low coverage with longer reads using either PacBio or Nanopore sequencing and to combine these data with a high number of Illumina short reads. Second-generation sequencing technologies such as Illumina together with classical genetic mapping can also be used to detect long-range genomic rearrangements. For example, a combination of short Illumina reads with SNP data from a segregating nested association mapping (NAM) population in Brassica napus revealed genomic deletions associated with disease resistance [65].

As mentioned previously, pan-genomics is another popular choice for the detection of structural variation, in particular PAV. Many studies have shown the power of pan-genomics in detecting PAV, for example in wheat [35], rapeseed [23], cabbage (Brassica oleracea) [82], and rice [83]. However, it is important to note that the amount of information a pan-genome can provide about a species is highly dependent on the number and diversity of the individuals sequenced to create the pan-genome, and sequencing a large number of individuals inevitably incurs high costs. Declining costs combined with increasing throughput of next-generation sequencing technologies during the last decade have already led to the availability of many crop plant reference genomes: for example rice [84], maize [85], sorghum (Sorghum bicolor) [86], cucumber (Cucumis sativus) [87], soybean (Glycine max) [88], potato (Solanum tuberosum) [89], barley (Hordeum vulgare) [90], cotton (Gossypium hirsutum) [91], chickpea (Cicer arietinum) [92], rapeseed (Brassica napus) [3], bread wheat [93], common bean (Phaseolus vulgaris) [94], and pearl millet (Pennisetum glaucum) [95]. However, creating multiple reference assemblies for a single species remains challenging, owing to financial and technological constraints. In addition to the large overhead expenses of creating multiple assemblies, a majority of these sequencing technologies can read only a short stretch of DNA per read. This limitation might not be a problem in dealing with diploid genomes, but it presents an inherent problem for identifying genome structural variation in polyploid crops. Allopolyploid genomes usually comprise two or more very closely related genomes, making the alignment of short DNA reads to a reference genome extremely challenging. Sequence alignment is at the heart of any structural variation detection pipeline, and a spurious alignment may lead to the wrong biological conclusions. However, with declining sequencing costs it is just a matter of time before the term “reference genome” becomes obsolete, and thousands of genome assemblies become available for every plant species, including the complex polyploid crops.

4. The influence of structural variation on traits

4.1. The challenge of linking structural variation to phenotype

Copy-number variation can also be understood as an extreme form of sequence variation. Whereas deletions abolish gene copy function, duplications can lead to variation in expression level [13] and thereby affect gene dosage. Duplications thus have a higher risk of affecting traits than point mutations or InDels, and should accordingly be under strong selection pressure. The adaptive value of gene duplicates is illustrated by the observation that polyploid genomes tend to lose redundant gene copies [52], whereas gene copies from adaptive pathways are retained or even duplicated. For example, in the mustard family, genes from the glucosinolate pathway were highly retained after whole-genome duplications, with a retention rate of 95% compared to 45% across all genes [96]. Despite the strong phenotypic effects expected from CNV, findings linking genomic variation with phenotypes are scarce. This scarcity is due partly to the comparatively large effort needed to confirm CNVs in plant genomes. CNVs can be detected via microscopy-based approaches, e.g., fluorescent in situ hybridization (FISH); quantitative PCR (qPCR); probe-hybridization-based assays; SNP array-based methods; or next-generation sequencing (NGS). Statistically linking CNVs to phenotypes requires analysis of medium- to large-sized populations, and these must undergo CNV detection as well as phenotyping, which can be costly in both money and time. Moreover, some of these methods have specific drawbacks. For example, PCR-based approaches are not robust against sequence variation within primer binding sites, SNP arrays cannot easily detect duplications, and NGS approaches depend on the quality of the reference genome. These difficulties partly explain the lack of data. Another reason may be that both scientists and breeders underestimate the extent of structural variation in polyploid crops and focus rather on classical sequence variants. Recent findings of crop genomic structural variation, however, show that the assumption of a stable genome is questionable. This genomic instability is also shown by several pan-genomic approaches. For example, sequencing 10 different accessions of cabbage, a diploid with multiple polyploid events in its lineage (mesohexaploid), revealed that 18.7% of the gene copies were affected by PAV in at least one of the accessions [82]. Another pan-genome study in the related recent allotetraploid rapeseed, using 53 diverse accessions including resynthesized lines, found that 38.0% of the genes were affected by PAV [23], similar to the value of 35.7% obtained from 18 hexaploid wheat cultivars [35]. These figures illustrate the importance of structural variation in polyploid crops and the pressing need to investigate the influence of structural variants on adaptive and agronomic traits. Although transposable elements are the major source of structural variation in diploids, polyploids present even more possibilities of genomic rearrangement. Homeologous exchanges in meiosis can lead to deletions and duplications in homeologous chromosomes, depending on the degree of sequence similarity and distance from the centromeres [97]. A summary of research studies that have confirmed the phenotypic influence of CNVs in polyploid plants is presented in Table 1.

4.2. The adaptive value of structural variation

The high degree of structural variation observed within species is indicative of its adaptive value. CNVs are raw material for evolutionary adaptation and have been found to underlie several adaptive traits in crops, among them flowering time [27], cold tolerance [103], and boron tolerance [104]. They also seem to be an important mechanism for the development of herbicide resistance in weeds [105], demonstrating the power of structural variation in facilitating plant evolution even on very short time scales. Gaines et al. [105] showed that Palmer amaranth, a weedy species in the southeastern USA, acquired glyphosate resistance through repeated duplication of the gene 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). The enzyme EPSPS is the molecular target of glyphosate, and resistant populations show a 40- to 100-fold increase in EPSPS copy number along with increased protein expression and enzymatic activity [105]. Sequencing of these amplified regions revealed that they were in close vicinity to miniature inverted-repeat transposable elements (MITEs), suggesting that these MITEs have played a mechanistic role in gene amplification [106]. FISH experiments also confirmed that the EPSPS cassette was amplified as an extrachromosomal circular DNA (eccDNA) from one of two genomic copies of EPSPS, a finding that may explain how the species managed to evolve widespread resistance to glyphosate in less than five years [107].

Owing to the technical constraints discussed above, most studies linking CNV and phenotype focus on selected genes or sets of genes. Techniques such as exome capture can overcome this limitation and allow genome-wide analysis. However, owing to the high costs involved, few genome-wide studies linking structural variants to adaptive traits have been performed in polyploids. In one example, 537 diverse accessions of tetraploid and octoploid switchgrass (Panicum virgatum) were sequenced, revealing 9979 genes affected by CNV or PAV [100]. Some of the CNVs could be assigned to specific ecotypes: for example, 62 deletions were specific to upland switchgrass, one of the major ecotypes of the species [100]. Some of the affected genes were also involved in photo-inhibition protection, indicative of a genome-wide adaptive response in either upland or lowland switchgrass [100].

4.3. Effects of structural variation on flowering time

A major component of climatic adaptation is flowering-time regulation, which synchronizes plant development with climatic conditions. In narrow-leafed lupin (Lupinus angustifolius), breeder selection for two major early flowering-time loci (Ku and Julius) allowed expansion of this crop into shorter-season environments: very recently, both loci were found to result from deletions in regulatory regions of a flowering-time gene (LanFTc1, a FLOWERING LOCUS T (FT) homolog) [108]. In wheat, an increase in the copy number of Ppd-B1 conferred day-neutral flowering, whereas accessions with unaltered copy numbers were found to be photoperiod-sensitive [25]. Increased copy numbers of Vrn-A1 were correlated with an increased vernalization requirement. In a large landmark study [27] using a global panel of 1110 diverse wheat cultivars, copy-number variants of these two loci were found to be involved in wheat adaptation worldwide. Based on the phylogeny of the population, the authors concluded that the observed gene duplications arose independently several times, underlining the importance and prevalence of the CNV adaptation mechanism. Ppd-B1 duplications affected 56% of the Chinese subpopulation but only 10% of the total population, suggesting a role in photoperiod adaptation to different daylength regimes. In addition, copy numbers of Vrn-A1 increased from southern to northern Europe. Both CNVs together explained between 3% and 30% of the phenotypic variance, depending on environment.

Other polyploid crops also show widespread CNVs in flowering-time genes. Using a targeted-sequencing approach in a population of 280 diverse accessions of rapeseed, a study [24] of gene copy-number variation showed that CNVs are highly abundant in all 35 flowering-time gene copy groups assessed. Similar results were found for the progenitor species B. rapa and B. oleracea [32]. In particular, two structural rearrangements (HE duplication-deletion events) affecting copies of Bna.PHYA (duplicated from A09, deleted on C08) and Bna.FLC (duplicated from A10, deleted on C09) predominated in swedes (B. napus ssp. napobrassica) [24]. PHYA is a photoreceptor involved in photoperiod-dependent regulation of flowering time, whereas FLC is the main regulator of vernalization in dicots [109]. The subspecies napobrassica, a root vegetable form of B. napus, is extremely vernalization-dependent and flowers only after very long periods of cold, presumably because it has been selected for bolting resistance [24]. These observed HEs and CNVs in wheat and rapeseed were selected unconsciously by early breeders, highlighting the possible gains that could be made by targeted application of knowledge of structural genome variation in modern breeding.
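HE duplication-deletion events of the kind described for Bna.PHYA and Bna.FLC leave a characteristic signature in read-depth data: one homeolog shows roughly doubled coverage while its partner shows almost none, so that total dosage across the pair is conserved. The following is a minimal sketch of that logic, not the procedure used in the cited studies. It assumes read depth has already been normalized so that 1.0 corresponds to the expected copy number of each homeolog, and the function names, the tolerance of 0.3, and the example depths are all hypothetical.

```python
def copy_state(depth, tol=0.3):
    """Map a normalized depth to 0, 1, or 2 relative copies (None if unclear).

    Assumes 1.0 is the expected depth of an unaltered homeolog; the
    tolerance is an arbitrary illustrative choice, not a published value.
    """
    for state in (0, 1, 2):
        if abs(depth - state) <= tol:
            return state
    return None


def classify_pair(depth_a, depth_c, tol=0.3):
    """Classify a homeologous segment pair from A- and C-subgenome depths."""
    a, c = copy_state(depth_a, tol), copy_state(depth_c, tol)
    if a is None or c is None:
        return "ambiguous"
    if (a, c) == (1, 1):
        return "no change"
    if a + c == 2:
        # One homeolog doubled, the other lost: total dosage is conserved.
        return "homeologous exchange"
    if a + c < 2:
        return "deletion"
    return "duplication"


# Invented depths mimicking a segment duplicated on A and deleted on C.
print(classify_pair(1.95, 0.05))
```

Requiring both signals at once is what separates a homeologous exchange from an ordinary deletion or duplication, which would alter the total dosage across the pair.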

Table 1-Studies reliably linking structural variation to trait variation in polyploid crops.

4.4. Effects of structural variation on frost tolerance

Another trait tightly linked to climatic adaptation is frost tolerance. Biennial plants overwinter in the vegetative state and therefore have to withstand temperatures well below 0 °C. This requirement can limit introgression of genetic variance from an annual to a biennial background. In wheat, two major loci controlling this trait have been shown to be linked to CNVs: FR1, containing the aforementioned vernalization regulator Vrn-A1, and FR2, where three C-REPEAT BINDING FACTOR (CBF) gene copies, CBF-A12, CBF-A14, and CBF-A15, are located [101]. Copy numbers of CBF-A12 and CBF-A14 in 65 diverse winter and 81 diverse spring cultivars were significantly correlated with frost tolerance in winter accessions, but not in spring accessions [101]. Furthermore, the phenotypic effect of the CNV depended on the haplotype of the other locus: an increased copy number of Vrn-A1 led to higher frost tolerance in accessions carrying the FR2-A2-T haplotype [101]. Later, Würschum et al. [102] showed that the CBF-A14 CNV accounts for 24.3% of the phenotypic variance of the trait in winter wheat. This example shows that both CNVs and sequence variants should be carefully integrated into the genetic models underlying modern breeding programs.
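Statements that a CNV "accounts for" a given share of phenotypic variance usually rest on a statistical model relating copy number to the trait. As a simplified illustration only, the sketch below computes the proportion of variance explained (R²) by a simple linear regression of a trait score on copy number. The cited studies may have used more elaborate models (for example, with corrections for population structure), and the function name and data are invented for this example.

```python
def variance_explained(copy_numbers, phenotypes):
    """Return R^2 of a simple linear regression of phenotype on copy number.

    Illustrative only: real analyses typically account for population
    structure, environment, and other loci.
    """
    n = len(copy_numbers)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(copy_numbers) / n
    mean_y = sum(phenotypes) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(copy_numbers, phenotypes))
    sxx = sum((x - mean_x) ** 2 for x in copy_numbers)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in copy number or phenotype")
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Invented example: copy number per accession and a frost-tolerance score.
copies = [1, 1, 2, 2]
scores = [1.0, 3.0, 2.0, 4.0]
print(f"R^2 = {variance_explained(copies, scores):.2f}")
```

Because the effect of the Vrn-A1 CNV depended on the FR2 haplotype, a single-predictor model of this kind would miss such interactions. A model with an interaction term would be needed to capture them.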

4.5. Effects of structural variation on other agronomic traits

Although several clear examples of CNVs affecting important agronomic traits, such as disease resistance [72,110] and metal ion tolerance [104,111], are known in diploids, results showing the effect of CNVs on agronomic traits in polyploids are surprisingly scarce. However, the rare examples we have indicate that more research in this field is sorely needed. In allotetraploid rapeseed, Qian et al. [99] used data from a genome-wide SNP array to identify a deletion encompassing a copy of NON-YELLOWING 1 (NYE1), which is involved in chlorophyll degradation during senescence. The deletion was significantly associated with chlorophyll content at two different plant developmental stages in a population of 203 diverse Chinese semi-winter accessions [99]. Haplotype analysis revealed that seven accessions carried the deletion, all of which showed significantly increased chlorophyll content at both seedling and bolting stages [99]. Phylogenetic analysis suggested that the locus had been introgressed into B. napus from the progenitor species B. rapa [99]. Similarly, a deletion in a copy of cytokinin oxidase (CKX), a gene involved in photosynthesis regulation, reduced both chlorophyll content and grain size in a recombinant inbred line (RIL) population derived from a cross between a winter and a spring cultivar of wheat [30].

Stein et al. [50] identified several genomic rearrangements affecting seed quality and other agronomic traits in three different doubled haploid (DH) populations of B. napus, using a variety of methods to validate these results: FISH, whole-genome resequencing, genetic mapping using SNP array data, and sequence capture. Different quantitative trait loci (QTL) for seed fiber content, number of seeds per silique, flowering time, and glucosinolate content could all be traced back to deletions or homeologous exchanges in this study [50]. For example, a QTL for seed color, lignin, and fiber content was associated with a duplicated fragment on chromosome C08, with a corresponding loss of a (homeologous) chromosome A09 fragment in the resynthesized mapping parent [50]. The 173-kb QTL interval was found to harbor three important candidate genes for those traits. A CNV for one of these gene copies was additionally confirmed by sequence capture, indicating that the structural rearrangement indeed affected the genes within the interval [50].

5. Conclusions and perspectives

Today, we know that allelic variants of genes are far from explaining the totality of crop phenotypes. There is an increasing realization that single nucleotide polymorphisms (SNPs) do not represent all existing genetic variation within a species, and also an increasing appreciation of the role of genome structural variation in phenotypic expression [14]. This understanding has led to large-scale characterization of multiple genomes per species at a population level, resulting in the discovery of a large array of structural variants [10]. Changes in gene copy number affect the expression levels of genes, which may in turn alter phenotypes. Owing to their large size relative to SNPs, structural variants most likely account for more heritable differences in phenotype than SNPs [112]. However, until recently, the complexity of genome structural variation was relatively unknown, particularly in plants.

Genome structure is even more complex in polyploids, which have multiple gene copies and are often also the product of ancestral hybridization events between species. Polyploids carry all the same types of genomic structural variation as diploids, but taken to the extreme. Owing to the buffering effect of multiple genomes and hence redundant gene copies, CNV and PAV are hypothetically better tolerated in polyploids than in diploids [7]. Sequence homology shared between two different parts of the genome can initiate non-homologous recombination events in all species, regardless of polyploid status [54]. However, polyploids, with their high proportion of sequence homology (homeology) between different subgenomes, are far more likely to undergo chromosome rearrangements [113]. These rearrangements can be classified into non-reciprocal and reciprocal translocation events, and then into duplications and deletions (Fig. 1), but all arise from exchanges between non-homologous (usually homeologous) chromosomes during meiosis [114]. These homeologous exchanges differ from PAV and CNV as traditionally defined in that a "dosage compensation" effect is usually achieved, such that gene content is relatively conserved [115,116]. For this reason, we propose that homeologous exchanges should be considered as a major, distinct category of genome structural variation, particularly in polyploids.

All three major types of genome structural variation (CNV, PAV, and HE) have been linked to important agronomic traits in polyploid crops (Table 1), despite the fact that genomic resources are still sparse for many of these species. The production of multiple reference genomes per species and the subsequent construction of "pan-genomes" that include all (or almost all) DNA present in all individuals within a species have been under way for only a short time, and most of our important crop species are still lacking pan-genome resources (Table 2). Fortunately, these resources are becoming increasingly available with the advent of new sequencing technologies, and with the corresponding drop in price and time required for genome sequencing. In the short term, detection of genome structural variation at lower cost can be optimized by combining multiple technologies, such as short-read sequencing with longer reads or optical mapping, or by the use of older methods such as molecular cytogenetics or genotyping arrays [50,66].

In this review, we have highlighted the importance of genomic structural variation for polyploid crop improvement. Genomic structural variation in polyploids affects major agronomic traits such as flowering time, frost tolerance, and seed quality traits, as well as adaptation to the environment (Table 1). This effect of structural variation on phenotype emphasizes the necessity of investing in pan-genomics. Emerging research has also pointed to the "dispensable" genome as playing an important role in adaptation within a species to different environmental conditions [10,12], and CNV and PAV have been specifically linked to traits inadvertently selected for by human agriculture in polyploid crops such as wheat [27] and rapeseed [98]. In future, efficient identification and functional validation of genomic structural variation, facilitated by new technologies and pan-genome resources, should play a major role in the genetic improvement of polyploid crops.

Table 2-List of species with assembled pan-genomes.

Acknowledgments

This work was supported by the Deutsche Forschungsgemeinschaft (MA6473/1-1, MA6473/2-1).
