Ling You,Liu Yng,Fngying Liu,Lei Kng,Ho Chen,Bin Yng,b,Qin Yng,Zhongsong Liu,*
a College of Agronomy,Hunan Agricultural University,Changsha 410128,Hunan,China
b Guizhou Institute of Oil Crops,Guizhou Academy of Agricultural Sciences,Guiyang 550000,Guizhou,China
Keywords:Brassica juncea Mitochondrial genome Mitotypes Molecular markers Migration routes
ABSTRACT The release of mitochondrial genome sequences provides the basis for characterizing interspecific and intraspecific variation in Brassica mitochondrial genomes.However,few B.juncea (mustard) mitochondrial genomes have been published.We assembled the mitochondrial genomes of three B.juncea subspecies and compared them with previously published genomes.The genomes were phylogenetically classified into A,B,C,and Bna clades.Two variant sites,a transversion (C →A) at nt 79,573 and a 31-bp copy-number variation between nts 65,564 and 65,596,were identified.Based on these variant sites,mitotype-specific sequence markers were developed to characterize the variation among worldwide 558 B.juncea accessions.Three mitochondrial genome types(mitotypes MT1–MT3)were identified.In terms of geographical distribution,MT1 and MT2 accessions were distributed mainly to the north and MT3 to the south of 34°N.Root mustards carried only MT1,leaf and stem mustards carried mainly MT3,and seed mustards carried all three mitotypes,implying that the mitotypes underwent selection during B.juncea domestication.A new form of oil mustard evolved by hybridization between two gene pools in southwest China.
Mustard (Brassica juncea (L.) Czern.et Coss.) is grown as an oilseed,vegetable,and condiment crop worldwide,and is taxonomically classified into four subspecies:napiformis(root),juncea(seed),integrifolia (leaf) and tsatsai (stem) [1,2].B.juncea (2n=4x=36,AABB)is an allotetraploid that originated from natural hybridization between B.rapa (2n=20,AA) as a maternal parent and B.nigra(2n=16,BB).Mustard was grown 6000–7000 years ago in China[3],and Indian mustard was used by the Indus Valley civilization in the form of oil mustard as early as 2300 to 1750 BCE[4].
Because the cytoplasm is inherited from the maternal parent in Brassica juncea and all others,variation in the cytoplasmic genome can be used to investigate species origin and domestication.The release of organellar genome sequences has provided new insights into the genetic relationships of Brassica crop species [5–13].Comparison showed that B.juncea and B.rapa carry identical mitochondrial (mt) genomes,in agreement with the fact that B.rapa is the cytoplasmic donor of B.juncea [6,9,13].Likewise,mt genome sequencing confirmed that B.nigra is the maternal parent of B.carinata [8].
Intraspecific variation in the mt genome has been frequently observed in Brassica species.There was variation in size and structure of mt genomes between B.oleracea accessions [6,7].An InDel variation was detected between B.nigra mt genomes [8].A SNP variation,C to A,was identified between two B.rapa varieties,Chinese cabbage and Mizuna.This variation also appears in B.juncea[10].Based on 42 mt genes,B.napus accessions were clustered into two clades:B.rapa and B.oleracea clades [14],which were proposed to correspond to pol (Polima) and nap (Napus) cytoplasm,respectively [13].In mustard,mt genome sequences of three subspecies:leaf [6],stem [9] and seed mustards (GenBank ID:MT675103) have been released.The mt genome sequence of root mustard has not been reported.
Mitotype-specific sequence (MSS) markers have been developed to classify mt genomes and permit identification of cytoplasmic types,phylogenetic analysis,and breeding of cytoplasmic male sterility (CMS) lines in Brassica species [15,16].In B.napus,the cytoplasmic types nap,cam(Campestris),pol,and ogu(Ogura)have been distinguished.For accurate and rapid identification of these different cytoplasms,corresponding MSS markers have been developed,and provide a reliable method for the identification of CMS rapeseed hybrids[15].Recently[16],MSS markers were developed from B.napus mitotype-specific sequences and used to identify breeding lines.
With the rapid development of sequencing technologies,mt genome sequences can be assembled using long-read thirdgeneration technologies [17–20].The aim of the present study was to assemble the mt genomes of root,seed,and leaf mustards using PacBio sequencing and to develop MSS markers to characterize B.juncea accessions and mitotypes and to investigate the diversification and spread of B.juncea mitotypes.
Three Brassica juncea accessions:Datoucai (ssp.napiformis),Sichuan Yellow (SY,ssp.juncea),and CR 2493 (ssp.integrifolia)were used for genome sequencing.A panel of 558 B.juncea accessions from 38 countries or regions were used for mitotyping.Their geographic origins are presented in Table S1.The other accessions,including 51 B.napus accessions,5 B.rapa accessions,4 B.oleracea accessions,and 1 accession each of B.nigra and B.carinata(Table S2) were used to investigate mt genome variation.F1and F2progenies from reciprocal crosses between SY and Huayejie and their pooled seed samples(Table S3)were used to test the reliability and accuracy of MSS markers.
Total DNA was isolated from 2-week-old leaves or mature seeds by the CTAB method [21].Species identification of Brassica accessions was performed by multiplex PCR using genome-specific primers [22].
The genomes of Datoucai,Sichuan Yellow,and CR 2493 were sequenced using the PacBio Sequel platform [23] (Table S4).The error correction module in Falcon (github.com/PacificBiosciences/falcon) was used to correct reads.The error-corrected reads were aligned with default parameters against the B.juncea mt genome sequence(JF920288)using blat software(v.3.2.1) [24].Reads with over 80-fold coverage,length >30 kb,and identity of 90%or more were taken as candidate reads from the mt genome.For de novo assembly of the mt genomes of these accessions,the candidate reads were assembled into contigs with Celera Assembler 8.3rc2[25],using default parameters.The longest contigs were aligned(blast.ncbi.nlm.nih.gov/Blast.cgi) against JF920288 to identify the start position of the mt genome.The circular genome was constructed after the overlap at the start and end of the sequence was verified.Finally,SNPs and InDels between the new and the reference (JF920288) mt sequences were identified by Basic Local Alignment Search Tool (BLAST,blast.ncbi.nlm.nih.gov/Blast.cgi)alignment and validated by sequencing of PCR products.
The mt genomes were annotated by BLAST searches against nr/nt (blast.ncbi.nlm.nih.gov/Blast.cgi) to identify known mt protein genes,rRNA genes,and repetitive sequences.To identify tRNA genes,tRNAscan-SE (lowelab.ucsc.edu/tRNAscan-SE/) was used.OGDraw 1.3.1 [26] was used to draw circular mt genome maps.
Twenty-one Brassica mt genome sequences were retrieved from NCBI (www.ncbi.nlm.nih.gov/) and used for clustering with the three sequences assembled above.Each mt genome sequence was simulated by the ART software [27] into 30-fold paired-end sequences (Illumina,San Diego,CA,USA) with an library of 200 bp and reads length of 150 bp.BWA 0.7.16[28]was then used to align the simulated reads against JF920288.SNPs and InDels between the sequences were identified by GATK [29].The identified variant sites were extracted,concatenated,and aligned with ClustalW in MEGA 7.0 [30],The maximum likelihood method was used to construct the phylogenetic tree,with 1000 bootstraps.The phylogenetic tree was drawn with EvolView(www.evolgenius.info/evolview).
To validate mt genome variants,the primers sequences of MSS markers MT.CNV and MT.SNP were redesigned as described previously [23] (Table S5).The PCR reactions contained 10 μL 2× Fast LongTaq PCR PreMix (Innovagene,Changsha,China),100 ng DNA template,and 0.4 μL forward and reverse primers (10 μmol L-1),and were made to 20 μL with sterilized water.The PCR protocol was as follows:94°C for 3 min;34 cycles of 94°C for 30 s,annealing for 30 s with the annealing temperature shown in Table S5,72°C for 30 s;and a final extension at 72°C for 5 min.For detecting MT.SNP markers,PCR amplicon could be digested by the endonuclease EarI when the nucleotide is C at position no.79,573 according to JF920288,whereas it could not be digested when the nucleotide C had undergone transversion to A.All PCR products,whether or not digested,were separated by electrophoresis on 2.0% agarose gels.
The read N50s for the three assembled accessions Datoucai,Sichuan Yellow,and CR 2493 were 16.2,13.6,and 16.0 kb,respectively (Table S4).Their final assembled mt genomes were respectively 219,775 (GenBank ID:MZ671991),219,806 (MZ671990),and 219,775 bp(MZ671992)in length(Table 1).Like the reference(JF920288)[6],these genomes all harbored 55 genes,including 34 protein-coding genes,3 rRNA genes,and 18 tRNA genes.The mt genome of B.juncea Datoucai is shown in Fig.S1.
Table 1 Size of mt genomes in Brassica juncea.
A total of 2710 SNPs and 1371 InDels were detected in 24 Brassica mt genome sequences.These variant sites were used to construct a phylogenetic tree with Arabidopsis thaliana as an outgroup.The mt genomes of Brassicaceae were divided into clades A,B,C,and Bna.All the mt genomes of B.rapa and B.juncea and three mt genomes of pol-like B.napus were clustered in clade A,while clade B included the mt genomes of B.nigra,B.carinata,and Sinapis arvensis.Four B.oleracea mt genomes constituted clade C.Two nap-like B.napus mt genomes formed clade Bna (Fig.1).These results showed that there is divergence among the mt genomes of six Brassica species,and indicate that pol-like B.napus and nap-like B.napus have different cytoplasmic origins.
Fig.1.Phylogenetic tree of Brassicaceae mt genomes.The purple circle size represents the number of 31-bp repeat sequences.The red square indicates that the nucleotide is A,whereas the green and blue squares represent C and T,respectively.The species and serial numbers of the mt genomes used for phylogenetic analysis are Arabidopsis thaliana (NC_001284), Sinapis arvensis(KM851044), Brassica rapa (JF920285,AP017996,AP017997), B.nigra (AP012989,KP030753), B.oleracea (AP012988,KU831325,JF920286,KJ820683), B.juncea(MT675103,KF736093,MZ671990,MZ671991,MZ671992,JF920288,KJ461445),B.napus (FR715249,KM454975,KM454974,AP006444,KP161618),and B.carinata(JF920287)).
One copy-number variation (CNV),four SNPs,and ten InDels were identified among the three new mt genome sequences and the reference JF920288 in B.juncea (Fig.2a;Table S6).All these variant sites were validated by PCR amplification.However,only four variant sites were found among the three new B.juncea mt genome sequences.Compared to Datoucai,SY had a transversion(T →G) at nt 40,702 and two 31-bp repeat sequences between nts 65,564 and 65,596,while CR 2493 had two transversions,G/T at nt 43,641 and C/A at nt 79,573.Sequencing of the PCR amplicons from 30 B.juncea accessions showed that the polymorphic nucleotide G at nt 40,702 kept together with the 31-bp repeats between nts 65,564 and 65,596 (Fig.2b;Table S7),but the variable nucleotide T at nt 43,641 was found only in CR 2493.Accordingly,only MT.CNV (nts 65,564–65,596) and MT.SNP (nt 79,573) were used to develop MSS markers.
A total of 620 Brassica accessions(Tables S1,S2)were identified by genomic-specific markers [22].Among them,558 accessions belong to the species B.juncea.Three types of mt genomes (mitotypes,MT) in these B.juncea accessions were distinguished with the MSS markers MT.CNV and MT.SNP (Table S5):MT1 (one 31-bp repeat,nucleotide C),MT2 (two 31-bp repeats,nucleotide C),and MT3 (one 31-bp repeat,nucleotide A) (Fig.2b).MT1,MT2,and MT3 accounted for 16.3% (91/558),39.6% (221/558),and 44.1% (246/558) (Table S1),respectively.Among the remaining 62 accessions,51 B.napus and 5 B.rapa accessions were distinguished by these MSS markers (Table S2),as shown by the differences in the mt genome sequences of these species (Fig.1).
All 29 root mustards carried only MT1,whereas all 13 stem mustards and 96% (102/106) of leaf mustards carried MT3.Four hundred ten seed mustards carried all three mitotypes,with the three mitotypes accounting for 14.1%,53.9%,and 31.9%,respectively (Fig.3).Seed mustards from different geographic regions carried different mitotypes.Seed mustards from northwest China carried MT2,while those from South Asia carried MT3.The latitude 34°N appeared to be a geographical dividing line distinguishing the mitotypes of 492 B.juncea accessions with known geographical origins (Fig.4;Table S1).MT1 and MT2 occurred more frequently in the north,while MT3 predominated in the south.
The F1and F2progenies from the reciprocal crosses between Sichuan Yellow (MT2) and Huayejie (MT3) displayed the same mitotype as their female parent (Fig.5a).When 13 mixed seed samples of both parents were mitotyped to detect the sensitivity of the MSS markers,the markers MT.CNV and MT.SNP detected at least 5% of seed contamination and could detect contamination in a wide dynamic range of mixed proportion (5%–95%) (Fig.5b).
The clustering of the mt genomes of all B.juncea subspecies with those of B.rapa confirmed that the cytoplasm of B.juncea was directly inherited from the diploid progenitor species B.rapa.We confirmed that B.rapa and B.juncea showed intraspecific divergence in mt genomes[10]and found that MT3,though common to many B.juncea accessions (Tables S1,S2;Fig.1),was present in B.rapa only in the subspecies Nipposinica Mizuna,a unique vegetable cultivated in Kyoto since the 18th and early 19th centuries [32].Like potherb mustard (B.juncea var.multiceps),which dates back to the 16th century [33],Mizuna has deeply lobed,narrow leaves,high tillering ability,and small seeds.Contrary to Hatono et al.[10],we propose that the mt genome of Mizuna was introgressed from mustard.MT1 of B.rapa is phylogenetically the closest to that of B.juncea and thus can be considered the primitive mitotype of B.juncea.
Fig.2.Variation in Brassica juncea mt genomes.(a)The positions of fifteen variants identified among three new mt genome sequences and the reference JF920288 of B.juncea.The orange and blue dotted lines represent the variant sites used for genotyping and not used for genotyping,respectively.(b)The variation and genotype of four mt genomes.Red and green lettering indicate respectively the CNV between nts 65,564 and 65,596 and the single-nucleotide variation at nt 79,573.MT1,mitotype 1;MT2,mitotype 2;MT3,mitotype 3.
Fig.3.Proportions of mitotypes in various subspecies of Brassica juncea. N,root mustard;J,seed mustard;I,leaf mustard;T,stem mustard;MT1,mitotype 1;MT2,mitotype 2;MT3,mitotype 3.
The mt genome evolved from MT1 to MT2 and MT3 via respectively a repeat insertion and a base transversion in B.juncea.B.juncea accessions from northern China,Central Asia,and Europe carried mainly MT1 and MT2,while those from Southern China and South Asia carried MT3 (Fig.4).We propose that B.juncea spread via several routes(Fig.6).The B.juncea accessions carrying MT1 spread eastward into East Asia.Those carrying MT2 spread into Central Asia and northwest China along the northern route suggested by Chen et al.[34].This route partially overlaps with the migration route of MT1.Accessions carrying MT3 spread into South Asia and further southwestern China along the southern route.MT1 and MT2 from northern China and MT3 from southwest China were introduced into the Yangtze River basin,where all three mitotypes of B.juncea are located and new forms of B.juncea evolved.
Fig.4.Geographical distribution of Brassica juncea mitotypes.The green circle,orange circle,and red triangle represent MT1,MT2,and MT3,respectively.**,significant difference at P <0.01 between MT1,MT2 and MT3.MT1,mitotype 1;MT2,mitotype 2;MT3,mitotype 3.The world map was downloaded at the website http://bzdt.ch.mnr.gov.cn,and the map content approval number is GS(2016)1666.
Fig.5.Electrophoregrams of PCR products amplified from mitotype-specific sequence markers in Brassica juncea.(a) Identification of cytoplasmic donor parents of hybrid offspring.(b)Identification of seed mixtures.M,DNA molecular weight ladder;D,a direct cross;R,the reciprocal cross.Lanes 1–2,P1(Sichuan Yellow);Lanes 3–4,F1(P1×P2);Lanes 5–6,F2(P1×P2);Lanes 7–8,P2(Huayejie);Lanes 9–10,F1(P2×P1);Lanes 11–12,F2(P2×P1);Lanes 13–25,100%,99%,95%,90%,80%,70%,50%,30%,20%,10%,5%,1%,and 0% of Sichuan Yellow seed,respectively.
Fig.6.Evolution and migration of Brassica juncea mt genomes.
Taxonomic[35],biochemical[36],and DNA marker[34]studies have shown that the seed mustards fall into two groups:the India-Pakistan and the China-Eastern Europe group.The India-Pakistan group typically has brown seed,is long day-insensitive,and carries the MT3 mitotype,while the China-Eastern Europe group is characterized by both yellow and brown seed,long-day sensitivity,and the MT2 mitotype.Oilseed mustard in southwest China carries all three mitotypes,with MT2 prevalent (Fig.4;Table S1),and shares the yellow-seed characteristic of oilseed mustard in northern China and the early-maturity character of India-Pakistan oilseed mustard.We speculate that a new form of oilseed mustard evolved in southwest China from hybridization between China-Eastern and India-Pakistan gene pools.
Based on the distribution of mitotypes among subspecies,the MSS markers developed in this study may be applied to subspecies identification of B.juncea varieties or accessions;identification of the cytoplasmic donors of hybrid progenies of B.juncea,and determination of seed purity in B.juncea.Besides,the MT.CNV marker distinguished the pol-like from the nap-like mt genome in B.napus(Table S2),indicating that this MSS marker can be used for mitotyping not only in B.juncea but in other Brassica species.
The mt genome of root mustard was assembled.Based on CNV and SNP variation in mt genomes,two MSS markers were developed and three mitotypes identified in B.juncea.Analysis of the geographical and subspecies distributions of these mitotypes showed that B.juncea spread via three routes and that a new form of early maturing yellow-seeded oil mustard evolved in southwestern China by inter-gene-pool hybridization.
CRediT authorship contribution statement
Liang You:Methodology,Validation,Investigation,Data curation,Software,Visualization,Writing– original draft.Liu Yang:Investigation,Data curation,Writing– original draft.Fangying Liu:Software.Lei Kang:Methodology,Resources.Hao Chen:Software.Bin Yang:Resources.Qian Yang:Software.Zhongsong Liu:Conceptualization,Data curation,Formal analysis,Resources,Supervision,Writing– review &editing,Funding acquisition.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by China Agriculture Research System(CARS-12) and National Natural Science Foundation of China(U20A2029).
Appendix A.Supplementary data
Supplementary data for this article can be found online at https://doi.org/10.1016/j.cj.2021.10.004.